Fine-tuning BERT and RoBERTa for high accuracy text classification in PyTorch

As of the time of writing, state-of-the-art results on NLP and NLU tasks are obtained with Transformer models. There is a clear trend of performance improving as models become deeper and larger; GPT-3 comes to mind. Training even small versions of such models from scratch takes a significant amount of time, even with a GPU. This problem can be solved via pre-training, when a model is trained on a large text corpus using a high-performance cluster. Later it can be fine-tuned for a specific task in a much shorter amount of time. During the fine-tuning stage, additional layers can be added to the model for specific tasks, which can differ from those the model was initially trained on. This technique is related to transfer learning, a concept applied to areas of machine learning beyond NLP (see here and here for a quick intro).

In this post, I would like to share my experience of fine-tuning BERT and RoBERTa, available from the transformers library by Hugging Face, for a document classification task. Both models are built on the Transformer architecture, which in its original form consists of two distinct blocks, an encoder and a decoder, each made up of multiple layers built around the attention mechanism; BERT and RoBERTa use only the encoder stack. The encoder processes the input token sequence into a vector of floating-point numbers, a hidden state, which in the full encoder-decoder Transformer is picked up by the decoder. It is this hidden state that encompasses the information content of the input sequence, which makes it possible to represent an entire sequence of tokens with a single dense vector. Two texts or documents that have similar meaning are represented by closely aligned vectors, and comparing those vectors with a metric of choice, for example cosine similarity, quantifies how similar the original pieces of text are.
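
As a small illustration of this idea, here is a minimal sketch (assuming a recent version of the transformers library; the model name, the mean-pooling step and the example sentences are my own choices, not something prescribed above) that encodes two short texts with a pre-trained BERT encoder and compares the resulting vectors with cosine similarity:

import torch
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained encoder; "bert-base-uncased" could be swapped for "roberta-base".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    # Tokenize, run the encoder, and average the last hidden states
    # to get a single dense vector for the whole sequence.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

v1 = embed("The bank approved my loan application.")
v2 = embed("My credit request was accepted by the bank.")

# Cosine similarity between the two document vectors.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())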

#machine-learning #nlp #data-science #text-classification #pytorch

Navigating Between DOM Nodes in JavaScript

In the previous chapters you've learnt how to select individual elements on a web page. But there are many occasions where you need to access a child, parent or ancestor element. See the JavaScript DOM nodes chapter to understand the logical relationships between the nodes in a DOM tree.

A DOM node provides several properties and methods that allow you to navigate or traverse the tree structure of the DOM and make changes very easily. In the following sections we will learn how to navigate up, down, and sideways in the DOM tree using JavaScript.

Accessing the Child Nodes

You can use the firstChild and lastChild properties of a DOM node to access the first and last direct child node of that node, respectively. If the node doesn't have any child nodes, these properties return null.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
console.log(main.firstChild.nodeName); // Prints: #text

var hint = document.getElementById("hint");
console.log(hint.firstChild.nodeName); // Prints: SPAN
</script>

Note: The nodeName is a read-only property that returns the name of the current node as a string. For example, it returns the tag name for an element node, #text for a text node, #comment for a comment node, #document for a document node, and so on.

As you can see in the example above, the nodeName of the first child node of the main DIV element is #text instead of H1. This is because whitespace such as spaces, tabs, and newlines consists of valid characters; it forms #text nodes that become part of the DOM tree. Since the <div> tag contains a newline before the <h1> tag, that newline creates a #text node.

To avoid the issue of firstChild and lastChild returning #text or #comment nodes, you can alternatively use the firstElementChild and lastElementChild properties, which return only the first and last element node, respectively. However, they are not supported in IE 8 and earlier.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
alert(main.firstElementChild.nodeName); // Outputs: H1
main.firstElementChild.style.color = "red";

var hint = document.getElementById("hint");
alert(hint.firstElementChild.nodeName); // Outputs: SPAN
hint.firstElementChild.style.color = "blue";
</script>

Similarly, you can use the childNodes property to access all child nodes of a given element, where the first child node is assigned index 0. Here's an example:

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.childNodes;
    
    // Loop through node list and display node name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

The childNodes property returns all child nodes, including non-element nodes like text and comment nodes. To get a collection of element nodes only, use the children property instead.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.children;
    
    // Loop through node list and display node name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

#javascript 

Vern Greenholt

How to fine-tune BERT on a text classification task?

BERT (Bidirectional Encoder Representations from Transformers) is built on the Transformer architecture, which Google introduced in the 2017 paper "Attention Is All You Need". The BERT model itself was published in 2018 in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". When it was released, it showed state-of-the-art results on the GLUE benchmark.

Introduction

First, I will say a little bit about the BERT architecture, and then move on to the code showing how to use it for a text classification task.

The BERT architecture is a multi-layer bidirectional Transformer encoder, described in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

There are two different architectures proposed in the paper: BERT_base and BERT_large. The BERT_base architecture has L=12, H=768, A=12 and a total of around 110M parameters. Here L refers to the number of transformer blocks, H refers to the hidden size, and A refers to the number of self-attention heads. For BERT_large, L=24, H=1024, A=16, for a total of around 340M parameters.
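
If you want to check these numbers yourself, here is a minimal sketch (assuming the Hugging Face transformers library is installed; the attribute names below belong to the library's config objects) that loads the two configurations and prints L, H and A:

from transformers import AutoConfig

# BERT_base: L=12, H=768, A=12 (~110M parameters).
base = AutoConfig.from_pretrained("bert-base-uncased")
print("L =", base.num_hidden_layers)    # number of transformer blocks
print("H =", base.hidden_size)          # hidden size
print("A =", base.num_attention_heads)  # number of self-attention heads

# BERT_large: L=24, H=1024, A=16.
large = AutoConfig.from_pretrained("bert-large-uncased")
print(large.num_hidden_layers, large.hidden_size, large.num_attention_heads)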


Image: the BERT input format (source: "BERT: State of the Art NLP Model, Explained", https://www.kdnuggets.com/2018/12/bert-sota-nlp-model-explained.html)

The input format of BERT is shown in the image above. I won't go into much detail about it here; you can refer to the link above for a more detailed explanation.

Source Code

The code I will be following can be cloned from Hugging Face's transformers GitHub repo:

https://github.com/huggingface/transformers/

Scripts to be used

We will mainly be modifying and using two scripts for our text classification task: glue.py and run_glue.py. The file glue.py lives under "transformers/data/processors/", and run_glue.py can be found under "examples/text-classification/".
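
If you just want a feel for what run_glue.py does under the hood before modifying it, here is a hedged, heavily simplified sketch of a single fine-tuning step with BertForSequenceClassification (assuming a recent transformers version; the model name, toy batch and hyperparameters are illustrative assumptions, not the script itself):

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["the movie was great", "the movie was terrible"]   # toy batch
labels = torch.tensor([1, 0])                               # toy labels

# Encode the batch into input_ids / attention_mask tensors.
batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)
loss = outputs.loss            # cross-entropy over the two classes
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())

In practice, run_glue.py wraps this kind of loop with data processors, evaluation and checkpointing, which is what we will be adjusting for our own dataset.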

#deep-learning #machine-learning #text-classification #bert #nlp

Alec Nikolaus

Text Classification With PyTorch

A baseline model with LSTMs

The questions remain open: how do we learn semantics? What is semantics? Are DL-based models capable of learning semantics?

Introduction

The aim of this blog is to explain how to build a text classifier based on LSTMs and how it is implemented using the PyTorch framework.

I would like to start with the following question: how do you classify a text? Several approaches have been proposed from different viewpoints and under different premises, but which is the most suitable one? It's interesting to pause for a moment and ask ourselves: how do we as humans classify a text? What do our brains take into account in order to classify a text? Such questions are complex to answer.

Currently, we have access to many different types of text, such as emails, movie reviews, social media posts, books, etc. In this sense, the text classification problem is determined by what is intended to be classified (e.g. is it intended to classify the polarity of a given text? Is it intended to classify a set of movie reviews by category? Is it intended to classify a set of texts by topic?). In this regard, the problem of text classification falls most of the time under the following tasks:

  • Sentiment analysis
  • News categorization
  • Topic analysis
  • Question answering
  • Natural language inference

In order to go deeper into this hot topic, I really recommend taking a look at this paper: Deep Learning Based Text Classification: A Comprehensive Review.

Methodology

The two keys in this model are tokenization and recurrent neural nets. Tokenization refers to the process of splitting a text into a set of sentences or words (i.e. tokens). In this regard, tokenization techniques can be applied at the sentence level or the word level. To understand the basics of tokenization, you can take a look at: Introduction to Information Retrieval.
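
As a toy illustration of word-level tokenization and the index-based vocabulary it produces (a minimal sketch; the tiny corpus and the reserved <pad>/<unk> entries are my own assumptions):

# Minimal word-level tokenization and vocabulary building.
corpus = ["the movie was great", "the plot was weak"]

def tokenize(text):
    return text.lower().split()          # naive whitespace tokenizer

vocab = {"<pad>": 0, "<unk>": 1}
for sentence in corpus:
    for token in tokenize(sentence):
        vocab.setdefault(token, len(vocab))

def encode(text):
    # Map each token to its vocabulary index; unknown words map to <unk>.
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]

print(vocab)
print(encode("the movie was boring"))    # "boring" is out of vocabulary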

On the other hand, RNNs (Recurrent Neural Networks) are a kind of neural network well known to work well on sequential data, such as text. In this case, a special kind of RNN is implemented: the LSTM (Long Short-Term Memory) network. LSTMs are one of the improved versions of RNNs; essentially, LSTMs have shown better performance when working with longer sequences. In order to go deeper into what RNNs and LSTMs are, you can take a look at: Understanding LSTM Networks.

Since the idea of this blog is to present a baseline model for text classification, the text preprocessing phase is based on the tokenization technique: each text is tokenized, and each token is transformed into its index-based representation. Each sequence of token indexes is then passed through an embedding layer, which outputs an embedded representation of each token. These embeddings are passed through a two-stacked LSTM neural net, the last LSTM hidden state is passed through a two-linear-layer neural net, and the output is a single value filtered by a sigmoid activation function. The following image describes the model architecture:

[Image: model architecture]
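
Here is a minimal PyTorch sketch of the architecture just described (an embedding layer, a two-stacked LSTM, a two-linear-layer head and a sigmoid output); the vocabulary size and layer dimensions are illustrative assumptions, not the post's exact values:

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Two stacked LSTM layers over the embedded token sequence.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        # Two linear layers mapping the last hidden state to a single value.
        self.fc1 = nn.Linear(hidden_dim, 64)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):                      # x: (batch, seq_len) token indexes
        embedded = self.embedding(x)           # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)   # hidden: (num_layers, batch, hidden_dim)
        last_hidden = hidden[-1]               # final hidden state of the top LSTM layer
        out = torch.relu(self.fc1(last_hidden))
        return torch.sigmoid(self.fc2(out))    # single value in (0, 1)

model = LSTMClassifier(vocab_size=10000)
dummy_batch = torch.randint(0, 10000, (4, 20))   # 4 sequences of 20 token indexes
print(model(dummy_batch).shape)                  # torch.Size([4, 1])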

#pytorch #text-mining #lstm #text-classification #python

Hana Juali

BERT Text Classification Using PyTorch

Text classification is one of the most common tasks in NLP. It is used in a wide variety of applications, including sentiment analysis, spam filtering, news categorization, etc. Here, we show you how you can detect fake news (classifying an article as REAL or FAKE) using state-of-the-art models, in a tutorial that can be extended to really any text classification task.

The Transformer is the basic building block of most current state-of-the-art NLP architectures. Its primary advantage is its multi-head attention mechanism, which allows for an increase in performance and significantly more parallelization than previous competing models such as recurrent neural networks. In this tutorial, we will use pre-trained BERT, one of the most popular Transformer models, and fine-tune it on fake news detection.
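
To make the setup concrete, here is a hedged sketch of the inference side only: given a BERT model already fine-tuned with two labels (the model name, label order and example headline below are assumptions for illustration, and a recent transformers version is assumed), an article is tokenized and classified as REAL or FAKE:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# In practice, this would be loaded from the directory where the fine-tuned model was saved.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

labels = ["FAKE", "REAL"]   # assumed label order
article = "Scientists discover that drinking coffee makes you immortal."

inputs = tokenizer(article, truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(dim=-1).item()])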

The main source code of this article is available in this Google Colab Notebook.

The preprocessing code is also available in this Google Colab Notebook.

#python #text-classification #pytorch #deep-learning #nlp