Fine Tuning BERT for Text Classification and Question Answering using TensorFlow Framework

A Comprehensive Guide To Fine-Tuning BERT For Text Classification And SQuAD Tasks.

Google BERT (Bidirectional Encoder Representations from Transformers) and other transformer-based models further improved the state of the art on eleven natural language processing tasks under broad categories of single text classification (e.g., sentiment analysis), text pair classification (e.g., natural language inference), question answering (like SQuAD 1.1), and text tagging (e.g., named entity recognition).

The BERT model is based on a few key ideas:

  • An attention-only model without RNNs (LSTM/GRU) is computationally more attractive (parallel rather than sequential processing of the input) and even performs better than RNNs, since it can remember information beyond just about 100 words.
  • BERT represents words as subwords or n-grams. On average, a vocabulary of 8k to 30k n-grams can represent any word in a large corpus, which is a significant advantage from a memory perspective (see the tokenization sketch after this list).
  • Eliminates the need for task-specific architectures. A pre-trained BERT model can be used as-is for a wide variety of NLP tasks with fine-tuning. This avoids the need for the task-specific architectures (like ELMo) that we needed before; for example, a model for Q&A would have a very different architecture from a model that solved NER.
  • Word2vec and GloVe word embeddings are context-independent: these models output just one vector (embedding) for each word, combining all the different senses of the word into one vector. Given the abundance of polysemy and complex semantics in natural languages, context-independent representations have obvious limitations. For instance, the word *crane* in the contexts *a crane is flying* and *a crane driver came* has completely different meanings; thus, the same word may be assigned different representations depending on context. BERT generates different embeddings for the same word depending on its context, that is, its position in a sentence (demonstrated in the second sketch below).
  • Unlike the GPT model, which also represents an effort in designing a general, task-agnostic model for context-sensitive representations, BERT encodes context bidirectionally, while GPT, due to the autoregressive nature of language models, only looks forward (left-to-right).
  • Transfer learning. This advantage has nothing directly to do with the model architecture; rather, because these models are trained on a language modeling task (and other tasks too, in the case of BERT), they can be used for downstream tasks that have very little labeled data. During supervised learning of downstream tasks, BERT is similar to GPT in two respects. First, BERT representations are fed into an added output layer, with minimal changes to the model architecture depending on the nature of the task, such as predicting for every token vs. predicting for the entire sequence. Second, all the parameters of the pretrained Transformer encoder are fine-tuned, while the additional output layer is trained from scratch. The last two sketches below illustrate this recipe for text classification and SQuAD-style question answering.
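To make the subword idea concrete, here is a minimal tokenization sketch. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which the article prescribes. BERT's WordPiece vocabulary (roughly 30k units) breaks a rare word into known pieces instead of mapping it to an unknown token:

```python
# Minimal sketch: BERT's subword (WordPiece) tokenization.
# Assumes the Hugging Face "transformers" library is installed.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # ~30k-unit vocab

# A rare word is split into known subword pieces rather than an unknown token;
# continuation pieces are marked with the "##" prefix.
print(tokenizer.tokenize("electroencephalography"))
# e.g. ['electro', '##ence', ...] (the exact split depends on the vocabulary)
```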
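The crane example above can be checked directly. The sketch below (again assuming Hugging Face transformers with TensorFlow weights) extracts the contextual vector for *crane* from two sentences and compares them; since the hidden states depend on the surrounding words, the two vectors differ noticeably:

```python
# Sketch: the same word gets different contextual embeddings from BERT.
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Return the contextual hidden state of `word` (assumed to be a single
    WordPiece token, which holds for a common word like 'crane')."""
    enc = tokenizer(sentence, return_tensors="tf")
    hidden = model(enc).last_hidden_state[0]  # shape: (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].numpy())
    return hidden[tokens.index(word)]

v1 = word_vector("a crane is flying", "crane")
v2 = word_vector("a crane driver came", "crane")

# Cosine similarity is well below 1.0: same word, different representations.
cosine = tf.reduce_sum(v1 * v2) / (tf.norm(v1) * tf.norm(v2))
print(float(cosine))
```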
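For the fine-tuning recipe itself, here is a hedged sketch of single-text classification (e.g., sentiment analysis) in TensorFlow. TFBertForSequenceClassification from Hugging Face transformers is an assumption on my part, not something the article mandates; it adds one output layer that is trained from scratch while every pretrained encoder parameter is fine-tuned at a small learning rate. The texts and labels are toy placeholders:

```python
# Hedged sketch: fine-tuning BERT for text classification in TensorFlow.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # new output layer, trained from scratch
)

texts = ["a truly great movie", "a complete waste of time"]  # hypothetical toy data
labels = tf.constant([1, 0])

# Tokenize into input_ids / attention_mask tensors.
enc = tokenizer(texts, padding=True, truncation=True, max_length=128,
                return_tensors="tf")

# All pretrained encoder parameters are fine-tuned together with the new head;
# a small learning rate is the usual choice for fine-tuning.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(enc), labels, epochs=2, batch_size=2)
```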
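Question answering follows the same recipe with a different output layer: for SQuAD 1.1, the model predicts for every token how likely it is to start or end the answer span (the "predicting for every token" case above). Below is a minimal inference sketch with a hypothetical question/context pair; note that until the model is actually fine-tuned on SQuAD, the span-prediction head is randomly initialized and its answers are meaningless:

```python
# Hedged sketch: SQuAD-style extractive question answering with BERT.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForQuestionAnswering.from_pretrained("bert-base-uncased")

question = "Where do cranes live?"  # hypothetical example
context = "Cranes are large birds that live near wetlands and rivers."

# Question and context are packed into one sequence, separated by [SEP].
inputs = tokenizer(question, context, return_tensors="tf")
outputs = model(inputs)

# One start logit and one end logit per input token; the answer is the span
# between the highest-scoring start and end positions.
start = int(tf.argmax(outputs.start_logits, axis=1)[0])
end = int(tf.argmax(outputs.end_logits, axis=1)[0])
print(tokenizer.decode(inputs["input_ids"][0, start:end + 1]))
```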
