Most researchers submit their papers to academic conferences because it is a faster way of making results available. Finding and selecting a suitable conference has always been challenging, especially for young researchers.
However, by drawing on data from previous conference proceedings, researchers can increase their chances of paper acceptance and publication. We will treat conference selection as a text classification problem (given a paper title, predict the conference it best fits) and try to solve it with deep learning using BERT.
Almost all of the code was taken from this tutorial; the only difference is the data.
The dataset contains 2,507 research paper titles that have been manually classified into 5 categories (i.e., conferences). It can be downloaded from here.
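Because a classifier works with integer class ids rather than conference names, the categories usually need to be encoded first. The snippet below is a minimal sketch of that step, not the tutorial's own code: the conference names (`CONF_A` and so on) and the helper `build_label_map` are hypothetical placeholders introduced only for illustration.

```python
# Hypothetical conference names; the real dataset's labels will differ.
labels = ["CONF_A", "CONF_B", "CONF_A", "CONF_C", "CONF_B"]


def build_label_map(names):
    """Map each distinct label to an integer id.

    Sorting the distinct names first keeps the mapping stable across runs.
    """
    return {name: idx for idx, name in enumerate(sorted(set(names)))}


label_map = build_label_map(labels)

# Replace every conference name with its integer id.
encoded = [label_map[name] for name in labels]

print(label_map)
print(encoded)
```

With the real data, the same idea would be applied to the column holding the conference names, and the number of distinct ids would be the number of output classes for the model.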
import pandas as pd
import torch
from tqdm.notebook import tqdm
from transformers import BertTokenizer
from torch.utils.data import TensorDataset
from transformers import BertForSequenceClassification
df = pd.read_csv('data/title_conference.csv')
df.head()
Table 1
df['Conference'].value_counts()
Figure 1
You may have noticed that our classes are imbalanced, and we will address this later on.
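One common way to quantify and compensate for imbalance is to give each class a weight inversely proportional to its frequency. The sketch below illustrates that idea on made-up labels; it is an assumption for illustration and not necessarily the approach this article takes later. The function name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy, deliberately imbalanced labels (not the real dataset).
toy_labels = ["CONF_A"] * 6 + ["CONF_B"] * 3 + ["CONF_C"] * 1

weights = inverse_frequency_weights(toy_labels)
for label, weight in sorted(weights.items()):
    print(label, round(weight, 3))
```

Weights like these can be passed to a weighted loss function, or used simply as a quick summary of how skewed the class distribution is.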