Waves of hype over data science technologies surge day after day. Many of you may be tired of surfing through hypes large and small to find a solution for a new proposal, or to pretend familiarity with new AI technologies in a meeting. This article targets managers without a strong data science background, junior data scientists, and developers interested in NLP. Its goal is to help them feel comfortable with BERT with minimal background knowledge and to ask “Can we use BERT for this project?” intelligently.

Table of Contents

  1. Why should we learn about BERT now and use it?
  2. How does BERT work in layman’s terms?
  3. What value does BERT bring to your projects?
  4. Where can we learn more about BERT?

1. Why Should We Learn about BERT Now and Use it?

It is sometimes a good strategy to postpone learning about new technologies in the data science domain, given the hype and biased success stories. Let’s start with my short answer about “now.” BERT was developed and published by Google in 2018, and two years later the open-source community around it has matured, so we can use the amazing toolboxes it has built. We should learn about BERT because it alleviates the training-efficiency problems of conventional recurrent neural networks, as Culurciello discussed in The fall of RNN / LSTM. We should use BERT because Hugging Face’s framework makes it easy to fine-tune and use models, and because we can switch from BERT to other state-of-the-art NLP models with small modifications to our code, sometimes just a few lines, as the sketch below illustrates.
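As a minimal sketch (the checkpoint names and the two-label classification head are my own assumptions, not from the article), swapping BERT for another model with Hugging Face’s transformers library can be as simple as changing one string:

```python
# Minimal sketch: loading a pretrained model with Hugging Face's transformers.
# Changing `model_name` to e.g. "roberta-base" or "distilbert-base-uncased"
# swaps in a different state-of-the-art model with no other code changes.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # assumed checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("Can we use BERT for this project?", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)  # raw scores for the two (not yet fine-tuned) labels
```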

2. How Does BERT Work in Layman’s Terms?

The official name of BERT is rather long: it stands for Bidirectional Encoder Representations from Transformers. Let’s start with the last word, which is the most important. **Transformer** is a type of neural network architecture, and BERT and its derivatives inherit it. Unlike recurrent neural networks, Transformers do not require the input sequence to be processed in order. This means a Transformer does not need to process the beginning of a sentence before it processes the middle or the end. An RNN, on the other hand, must process the input in order, which creates a bottleneck. This property gives the Transformer far more freedom to parallelize model training, as the toy contrast below shows.
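The following toy contrast (my own illustration in PyTorch, not code from the article) shows the difference: an RNN must loop over time steps, while self-attention, the core building block of the Transformer, handles all positions in a single call.

```python
# Toy contrast: sequential RNN steps vs. one-shot self-attention.
import torch
import torch.nn as nn

seq_len, batch, dim = 8, 2, 16
x = torch.randn(seq_len, batch, dim)

# RNN: each step depends on the previous hidden state, so steps run one by one.
rnn = nn.RNN(input_size=dim, hidden_size=dim)
h = torch.zeros(1, batch, dim)
outputs = []
for t in range(seq_len):
    out, h = rnn(x[t:t + 1], h)   # step t cannot start before step t - 1 finishes
    outputs.append(out)

# Self-attention: every position attends to every other position in one call,
# so the whole sequence can be processed in parallel.
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4)
attn_out, _ = attn(x, x, x)
```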

#data-science #nlp #machine-learning #deep-learning
