Transformers are the state-of-the-art models for NLP tasks ranging from sentiment analysis to question answering, and they solve these tasks very efficiently. At its core, however, a transformer is just a stack of encoder layers built around the attention mechanism. Implementing one from scratch can be quite difficult and challenging, even with DL frameworks such as PyTorch or TensorFlow. Fortunately, Hugging Face has made it quite easy to work with many different types of transformers. In this article, I am going to show you how, through the Hugging Face library, you can easily implement transformers in TensorFlow (Keras).


What you need:

First, you need to install the Hugging Face library, which is really easy. Simply pip install it:

pip install transformers 

Secondly, you will need a recent TensorFlow version, which can also be installed through pip.
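For example, something along these lines should work (the exact package name may differ if you need a specific GPU build):

pip install --upgrade tensorflow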

Data:

To test and implement the different transformers, I used data from a recent Kaggle competition I took part in, jigsaw-multilingual-toxic-comment-classification. You do not have to use the same data, as the following implementation can easily be adapted to any text data.

The competition provides a set of comments, and the task is to detect whether each comment is toxic or not, so this is a binary classification task.

Heavy Compute Power:

Also note that transformers have millions of parameters, so I used the TPUs provided by Kaggle kernels to train my model. Alternatively, you could use Google Colab to follow along with this article if you do not have a powerful local machine.
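As a rough sketch, the usual Kaggle TPU setup looks something like the following (the exact strategy class name varies slightly between TensorFlow versions; in newer releases it is tf.distribute.TPUStrategy):

import tensorflow as tf

# Detect and connect to the TPU if one is attached; otherwise fall back to the default strategy.
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.get_strategy()  # default strategy for CPU/GPU

print('Number of replicas:', strategy.num_replicas_in_sync)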


Let’s Have Fun Implementing Transformers:


Imports

import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
import tensorflow_hub as hub
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from tqdm import tqdm, tqdm_notebook

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_curve, confusion_matrix, auc, accuracy_score, classification_report
from sklearn.model_selection import train_test_split

from transformers import (AutoTokenizer, TFAutoModel,
                          BertTokenizer, TFBertModel,
                          OpenAIGPTTokenizer, TFOpenAIGPTModel,
                          DistilBertTokenizer, TFDistilBertModel,
                          XLMTokenizer, TFXLMModel)
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, processors

from kaggle_datasets import KaggleDatasets  # input data files are available in the read-only "../input/" directory

from tensorflow.keras.preprocessing.text import Tokenizer  # note: this shadows tokenizers.Tokenizer imported above
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.initializers import Constant

Which Transformers:

The following transformer architectures have been tested in the notebook:

1-BERT

2-OpenAIGPT

3-DistilBERT

4-XLM

5-XLM-RoBERTa Large

Don’t worry about having to implement each of these transformers separately. The implementation is straightforward and essentially the same for all of them, as the sketch below shows.
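A minimal sketch of what "the same for all of them" means, using the classes imported above (the checkpoint names here are the standard ones on the Hugging Face hub; swap in whichever you need):

# Each architecture is loaded the same way: pick the matching tokenizer and TF model class,
# then point both at a pretrained checkpoint from the Hugging Face hub.
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model     = TFBertModel.from_pretrained('bert-base-multilingual-cased')

# DistilBERT works identically, just with its own classes and checkpoint:
# tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
# model     = TFDistilBertModel.from_pretrained('distilbert-base-multilingual-cased')

# Or let the Auto classes pick the right architecture from the checkpoint name:
# tokenizer = AutoTokenizer.from_pretrained('jplu/tf-xlm-roberta-large')
# model     = TFAutoModel.from_pretrained('jplu/tf-xlm-roberta-large')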

Hyperparameters Used:

EPOCHS = 2
max_seq_length = 192
LEARNING_RATE = 1e-5

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',   # stop when the validation loss stops improving
    verbose=1,
    patience=10,
    mode='min',           # val_loss should be minimized, so 'min' rather than 'max'
    restore_best_weights=True)
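Just to show where these hyperparameters plug in, here is a hedged sketch of how they would be passed to a Keras model; model, train_dataset and valid_dataset are placeholders for the transformer-based classifier and tf.data datasets built later in the notebook:

# Hypothetical usage: `model`, `train_dataset` and `valid_dataset` are placeholders.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_dataset,
                    validation_data=valid_dataset,
                    epochs=EPOCHS,
                    callbacks=[early_stopping])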

Encoding Function:

Every transformer encodes each sentence. I hope you understand what it means to encode a sentence; if not, there are numerous good resources on the internet that explain it. At a very basic level, encoding means converting raw text into numerical data by assigning a unique integer to each individual word (token) in the corpus. Transformer encoding is a little more complex, though, because it also uses subword and character-level encoding, where an unknown word is broken into smaller pieces that are then encoded. I will not go further into the details of how transformer encoding works, as it is quite involved. It is enough to say that the next function converts each sentence in your data into a list of special integers that the various transformers understand:

def single_encoding_function(text,tokenizer,name='BERT'):
    input_ids=[]
    if name=='BERT':
        tokenizer.pad_token ='[PAD]'
    elif name=='OPENAIGPT2':
        tokenizer.pad_token='<unk>'
    elif name=='Transformer XL':
        print(tokenizer.eos_token)
        tokenizer.pad_token= tokenizer.eos_token
    elif name=='DistilBert':
        tokenizer.pad_token='[PAD]'

    for sentence in tqdm(text):
        # encode and pad/truncate every sentence to max_seq_length (this is inside the loop)
        encoded = tokenizer.encode(sentence,
                                   max_length=max_seq_length,
                                   pad_to_max_length=True)
        input_ids.append(encoded)
    return input_ids
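A quick, hypothetical usage example; train is assumed to be a DataFrame with a 'comment_text' column, as in the competition data:

# Hypothetical usage: encode the comment texts with a multilingual BERT tokenizer.
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
train_input_ids = np.array(single_encoding_function(train['comment_text'].values,
                                                    tokenizer, name='BERT'))
print(train_input_ids.shape)   # (num_comments, max_seq_length)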
