Most researchers submit their papers to academic conferences because it is a faster way of making results available. Finding and selecting a suitable conference has always been challenging, especially for young researchers.

However, by drawing on proceedings data from previous conferences, researchers can increase their chances of paper acceptance and publication. We will try to solve this text classification problem with deep learning using BERT.

Almost all of the code was taken from this tutorial; the only difference is the data.

The Data

The dataset contains 2,507 research paper titles that have been manually classified into 5 categories (i.e., conferences). It can be downloaded from here.

Explore and Preprocess

import pandas as pd
import torch
from tqdm.notebook import tqdm

from transformers import BertTokenizer
from torch.utils.data import TensorDataset

from transformers import BertForSequenceClassification

# Load the paper titles and their conference labels, then preview the first rows
df = pd.read_csv('data/title_conference.csv')
df.head()

Table 1

df['Conference'].value_counts()

Figure 1

You may have noticed that our classes are imbalanced, and we will address this later on.
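
To put a number on the imbalance, one option is to look at each class's share of the data rather than its raw count. The snippet below is a minimal sketch on a made-up toy frame, not the real dataset: the `Conference` column name follows the code above, while `toy_df`, the `Title` column, the class names and the counts are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real data: class names and counts are made up.
toy_df = pd.DataFrame({
    "Title": [f"paper {i}" for i in range(10)],
    "Conference": ["A"] * 6 + ["B"] * 3 + ["C"] * 1,
})

# Fraction of rows in each class, sorted from most to least frequent
class_share = toy_df["Conference"].value_counts(normalize=True)

# Ratio between the largest and the smallest class
imbalance_ratio = class_share.max() / class_share.min()

print(class_share)
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

Running the same two lines on `df['Conference']` would give the corresponding shares for the real data. A ratio well above 1 suggests that accuracy alone may be a misleading metric.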

Multi Class Text Classification With Deep Learning Using BERT