Rahul Hickle

Jun 5, 2020

Understanding ULMFiT — The Shift Towards Transfer Learning in NLP
An overview of what ULMFiT is, what it does and how it works, and the radical shift it caused in NLP
Natural language processing has picked up pace over the last decade, and with the increasing ease of implementing deep learning, there have been major…

#transfer-learning #deep-learning #nlp #machine-learning

Jerad Bailey

Aug 31, 2020

Google Reveals “What is being Transferred” in Transfer Learning

Recently, researchers from Google tackled a fundamental open question in the machine learning community: what exactly is being transferred in transfer learning? They presented a set of tools and analyses to address it.

The ability to transfer knowledge from the domain a model was trained on to another where data is scarce is one of the most desired capabilities for machines. Researchers around the globe have been using transfer learning in various deep learning applications, including object detection, image classification, and medical imaging tasks, among others.

#developers corner #learn transfer learning #machine learning #transfer learning #transfer learning methods #transfer learning resources


Evolution of NLP: Introduction to Transfer Learning for NLP

This is the third part of a series of posts showing the improvements in NLP modeling approaches. We have seen the use of traditional techniques like Bag of Words and TF-IDF, then moved on to RNNs and LSTMs. This time we’ll look into one of the pivotal shifts in approaching NLP tasks — Transfer Learning!

The complete code for this tutorial is available at this Kaggle Kernel.

ULMFiT

The idea of using transfer learning is quite new to NLP tasks, although it has long been prominent in computer vision. This new way of looking at NLP was first proposed by Jeremy Howard, and it has transformed the way we look at data!

The core idea is two-fold: generative pre-training of a language model plus task-specific fine-tuning. It was first explored in ULMFiT (Howard & Ruder, 2018), directly motivated by the success of ImageNet pre-training for computer vision tasks. The base model is AWD-LSTM.

A language model is exactly what it sounds like: a model whose output is a prediction of the next word in a sentence. The goal is a model that understands the semantics, grammar, and unique structure of a language.
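
To make this concrete, here is a toy, self-contained sketch (not ULMFiT itself; all names are illustrative) of what “predicting the next word” means mechanically: the model maps a context representation to a probability distribution over the vocabulary.

```python
import torch
import torch.nn as nn

# Toy illustration: a language model turns a context representation into a
# probability distribution over the vocabulary; the "prediction" is simply
# the most probable next word. The hidden state here is random, standing in
# for what an LSTM would produce after reading the context.
vocab = ["the", "movie", "was", "great", "terrible"]
hidden = torch.randn(1, 16)               # stand-in for the LSTM's hidden state
decoder = nn.Linear(16, len(vocab))       # output layer over the vocabulary
probs = decoder(hidden).softmax(dim=-1)   # P(next word | context)
print(vocab[probs.argmax().item()])       # most probable next word
```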

ULMFiT follows three steps to achieve good transfer learning results on downstream language classification tasks (a minimal code sketch of all three steps follows this list):

  1. General language model pre-training: on Wikipedia text.
  2. Target task language model fine-tuning: ULMFiT proposed two training techniques for stabilizing the fine-tuning process (discriminative fine-tuning and slanted triangular learning rates, both discussed below).
  3. Target task classifier fine-tuning: the pretrained LM is augmented with two standard feed-forward layers and a softmax normalization at the end to predict a target label distribution.
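
In code, the three steps map almost one-to-one onto the fast.ai v1 API used in this post. This is a minimal sketch under assumptions: the file name texts.csv, the learning rates, and the epoch counts are placeholders rather than the tutorial’s actual settings.

```python
from fastai.text import (TextLMDataBunch, TextClasDataBunch, AWD_LSTM,
                         language_model_learner, text_classifier_learner)

path = '.'  # hypothetical folder containing a texts.csv with texts and labels

# Step 1 comes for free: AWD_LSTM ships pre-trained on Wikipedia text.
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

# Step 2: fine-tune the language model on the target corpus.
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(1, 1e-3)
learn_lm.save_encoder('ft_enc')  # keep the fine-tuned encoder for step 3

# Step 3: train a classifier head on top of the fine-tuned encoder.
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv',
                                       vocab=data_lm.train_ds.vocab)
learn_clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clf.load_encoder('ft_enc')
learn_clf.fit_one_cycle(1, 1e-2)
```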

Using fast.ai for NLP

fast.ai’s motto — Making neural nets uncool again — tells you a lot about their approach ;) Implementations of these models are remarkably simple and intuitive, and with good documentation you can easily find a solution if you get stuck anywhere. For this reason, and a few others I elaborate on below, I decided to try the fast.ai library, which is built on top of PyTorch, instead of Keras. Despite being used to working in Keras, I didn’t find it difficult to navigate fast.ai, and the learning curve for implementing advanced things is short as well!

In addition to its simplicity, there are some advantages to using fast.ai’s implementation:

  • Discriminative fine-tuning is motivated by the fact that different layers of a language model capture different types of information (see the discussion above). ULMFiT therefore tunes each layer with a different learning rate, {η(1), …, η(ℓ), …, η(L)}, where η(1) is the learning rate of the first layer, η(ℓ) that of the ℓ-th layer, and L is the total number of layers. A short sketch follows the update rule below.

Weight update for stochastic gradient descent (SGD) with discriminative fine-tuning:

θt(ℓ) = θt−1(ℓ) − η(ℓ) · ∇θ(ℓ)J(θ)

where ∇θ(ℓ)J(θ) is the gradient of the loss function with respect to the parameters θ(ℓ) of the ℓ-th layer, and η(ℓ) is that layer’s learning rate.
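
Here is one way to express discriminative fine-tuning directly with PyTorch parameter groups; the three-layer model is a toy stand-in for the LSTM stack, and the 2.6 divisor is the ratio the ULMFiT paper found to work well. In fast.ai the same effect is a one-liner: passing slice(1e-4, 1e-2) as the learning rate to fit_one_cycle spreads rates across the layer groups.

```python
import torch
import torch.nn as nn

# Toy three-layer model standing in for the LSTM stack (illustrative only).
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 10), nn.Linear(10, 2))

base_lr = 1e-2  # learning rate for the top (last) layer
# ULMFiT's empirical rule: each lower layer trains at the rate above it / 2.6.
optimizer = torch.optim.SGD([
    {"params": model[2].parameters(), "lr": base_lr},             # last layer
    {"params": model[1].parameters(), "lr": base_lr / 2.6},       # middle layer
    {"params": model[0].parameters(), "lr": base_lr / 2.6 ** 2},  # first layer
])
```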

  • Slanted triangular learning rates (STLR) refer to a learning rate schedule that first increases the learning rate linearly and then linearly decays it. The increase phase is short, so the model can quickly converge to a region of parameter space suitable for the task, while the decay phase is long, allowing for finer tuning.

[Figure: the learning rate increases until the 200th iteration and then slowly decays. Howard & Ruder (2018), Universal Language Model Fine-tuning for Text Classification.]
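
The STLR schedule itself is only a few lines. This sketch follows the formula in Howard & Ruder (2018), with the paper’s default cut_frac=0.1, ratio=32, and maximum learning rate 0.01; with T=2000 total iterations the peak lands at iteration 200, as in the figure above.

```python
import math

def stlr(t, T, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Slanted triangular learning rate at iteration t of T total iterations."""
    cut = math.floor(T * cut_frac)  # iteration at which the rate peaks
    if t < cut:
        p = t / cut                                     # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

schedule = [stlr(t, T=2000) for t in range(2000)]
assert max(schedule) == stlr(200, T=2000)  # peak at iteration 200
```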

Let’s see how well this approach works for our dataset. I would also like to point out that all of these ideas and this code are available in fast.ai’s free official course on deep learning.

#nlp #machine-learning #transfer-learning #deep-learning


8 Open-Source Tools To Start Your NLP Journey

Teaching machines to understand human context can be a daunting task. In the current evolving landscape, natural language processing (NLP) has turned out to be an extraordinary breakthrough, with its advancements in semantic and linguistic knowledge. NLP is widely leveraged by businesses to build customised chatbots and voice assistants, using optical character and speech recognition techniques along with text simplification.

To address the current requirements of NLP, there are many open-source NLP tools that are free and flexible enough for developers to customise according to their needs. Not only will these tools help businesses analyse the required information from unstructured text, but they will also help in dealing with text analysis problems like classification, word ambiguity, and sentiment analysis.

Here are eight NLP toolkits, in no particular order, that can help any enthusiast start their journey with Natural Language Processing.



1| Natural Language Toolkit (NLTK)

About: Natural Language Toolkit, aka NLTK, is an open-source platform primarily used with Python to analyse human language. The platform provides interfaces to more than 50 corpora and lexical resources, including multilingual WordNet. Along with that, NLTK includes many text-processing libraries that can be used for text classification, tokenisation, parsing, and semantic reasoning, to name a few. The platform is widely used by students, linguists, and educators, as well as researchers, to analyse text and make meaning out of it.
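
As a quick taste of the API, here is a minimal NLTK sketch covering tokenisation, part-of-speech tagging, and a WordNet lookup; the download calls fetch the required resources on first run, and the sample sentence is just an illustration.

```python
import nltk

# One-time resource downloads (skipped if already present).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")

from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet

text = "NLTK makes it easy to analyse human language."
tokens = word_tokenize(text)          # ['NLTK', 'makes', 'it', 'easy', ...]
tags = nltk.pos_tag(tokens)           # [('NLTK', 'NNP'), ('makes', 'VBZ'), ...]
senses = wordnet.synsets("language")  # WordNet senses of the word "language"
print(tags[:3])
print(senses[0].definition())
```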


#developers corner #learning nlp #natural language processing #natural language processing tools #nlp #nlp career #nlp tools #open source nlp tools #opensource nlp tools