Semi-supervised Intent Classification with GAN-BERT

Is it possible to do text classification with 150 target classes using only 10 labelled samples per class and still achieve good performance?

Starting from that simple question, I began researching to find an answer. After several hours, I ended up with GAN-BERT. What is GAN-BERT? What experiment did I run with it? In this article, I will give a brief introduction to GAN-BERT and show how to implement it for intent classification using the CLINC150 dataset.


In the Natural Language Processing (NLP) field, BERT (Bidirectional Encoder Representations from Transformers) is a well-known model based on the Transformer architecture that handles a wide range of tasks, including text classification. However, it performs well only when there is ‘enough’ labelled training data to exploit, while obtaining labelled data is a time-consuming and costly process. A potential solution is a semi-supervised learning approach.

Semi-supervised learning is a machine learning approach that combines labelled and unlabelled data during training. The goal is the same as in supervised learning: to predict a target variable from input features. This approach is crucial when we have only a small amount of labelled data but our model needs a lot of training data to perform well.

In July 2020, a paper titled “GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples” proposed extending the fine-tuning of BERT-like architectures with unlabeled data in a generative adversarial setting. At a high level, the authors enrich the BERT fine-tuning process with an SS-GAN (semi-supervised GAN) perspective.

“In this paper, we extend the BERT training with unlabeled data in a generative adversarial setting. In particular, we enrich the BERT fine-tuning process with an SS-GAN perspective, in the so-called GAN-BERT model”
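
To make the idea concrete: the discriminator classifies a sentence representation into the k real intent classes plus one extra ‘fake’ class, while the generator tries to produce fake representations that fool it. Below is a minimal PyTorch sketch of those two heads with illustrative layer sizes; it is not the authors' reference implementation, just an outline of the setup.

```python
# A minimal sketch (not the authors' reference code) of the SS-GAN heads
# that GAN-BERT attaches to the sentence representation produced by BERT.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps random noise to a fake 'BERT-like' sentence representation."""
    def __init__(self, noise_dim=100, hidden_dim=512, out_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden_dim), nn.LeakyReLU(0.2), nn.Dropout(0.1),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, noise):
        return self.net(noise)

class Discriminator(nn.Module):
    """Classifies a representation into k real intent classes + 1 'fake' class."""
    def __init__(self, in_dim=768, hidden_dim=512, num_labels=150):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.LeakyReLU(0.2), nn.Dropout(0.1),
        )
        self.logits = nn.Linear(hidden_dim, num_labels + 1)  # extra class = fake

    def forward(self, rep):
        features = self.body(rep)
        return features, self.logits(features)

# Usage sketch: real examples go through BERT, fake ones come from the generator,
# and both are scored by the same discriminator.
generator = Generator()
discriminator = Discriminator(num_labels=150)
fake_rep = generator(torch.randn(8, 100))     # batch of 8 fake representations
_, fake_logits = discriminator(fake_rep)      # shape: (8, 151)
```

During training, labelled examples contribute a supervised loss over the k real classes, while unlabelled and generated examples only feed the real-vs-fake signal, which is what lets the unlabelled data shape the classifier.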

#nlp #naturallanguageprocessing #data-science #text-classification #machine-learning #deep learning



A Gentle Introduction to Self-Training and Semi-Supervised Learning

When it comes to machine learning classification tasks, the more data available to train algorithms, the better. In supervised learning, this data must be labeled with respect to the target class — otherwise, these algorithms wouldn’t be able to learn the relationships between the independent and target variables. However, there are a couple of issues that arise when building large, labeled data sets for classification:

  1. **Labeling data can be time-consuming.** Let’s say we have 1,000,000 dog images that we want to feed to a classification algorithm, with the goal of predicting whether each image contains a Boston Terrier. If we want to use all of those images for a supervised classification task, we need a human to look at each image and determine whether a Boston Terrier is present. While I do have friends (and a wife) who wouldn’t mind scrolling through dog pictures all day, it probably isn’t how most of us want to spend our weekend.
  2. **Labeling data can be expensive.** See reason 1: to get someone to painstakingly scour 1,000,000 dog pictures, we’re probably going to have to shell out some cash.

So, what if we only have enough time and money to label some of a large data set, and choose to leave the rest unlabeled? Can this unlabeled data somehow be used in a classification algorithm?

This is where semi-supervised learning comes in. In taking a semi-supervised approach, we can train a classifier on the small amount of labeled data, and then use the classifier to make predictions on the unlabeled data. Since these predictions are likely better than random guessing, the unlabeled data predictions can be adopted as ‘pseudo-labels’ in subsequent iterations of the classifier. While there are many flavors of semi-supervised learning, this specific technique is called self-training.

Self-Training


On a conceptual level, self-training works like this:

**Step 1:** Split the labeled data instances into train and test sets. Then, train a classification algorithm on the labeled training data.

**Step 2:** Use the trained classifier to predict class labels for all of the unlabeled data instances. Of these predicted class labels, the ones with the highest probability of being correct are adopted as ‘pseudo-labels’.

(A couple of variations on Step 2: a) All of the predicted labels can be adopted as ‘pseudo-labels’ at once, without considering probability, or b) The ‘pseudo-labeled’ data can be weighted by confidence in the prediction.)

**Step 3:** Concatenate the ‘pseudo-labeled’ data with the labeled training data. Re-train the classifier on the combined ‘pseudo-labeled’ and labeled training data.

**Step 4:** Use the trained classifier to predict class labels for the labeled test data instances. Evaluate classifier performance using your metric(s) of choice.

(Steps 1 through 4 can be repeated until no more predicted class labels from Step 2 meet a specific probability threshold, or until no more unlabeled data remains.)
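
Here is a minimal scikit-learn sketch of Steps 1 through 4 on synthetic data; the classifier, the 0.9 confidence threshold, and the iteration cap are all illustrative choices rather than prescriptions.

```python
# A minimal sketch of the self-training loop described above, on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
# Keep only 5% of the data as 'labeled'; the rest plays the role of unlabeled data.
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_lab, y_lab, test_size=0.3, random_state=0)

threshold = 0.9  # illustrative confidence threshold for adopting pseudo-labels
for iteration in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)       # Steps 1 / 3
    if len(X_unlab) == 0:
        break
    probs = clf.predict_proba(X_unlab)                                   # Step 2
    confident = probs.max(axis=1) >= threshold
    if not confident.any():
        break
    pseudo_labels = clf.classes_[probs[confident].argmax(axis=1)]
    X_train = np.vstack([X_train, X_unlab[confident]])                   # Step 3
    y_train = np.concatenate([y_train, pseudo_labels])
    X_unlab = X_unlab[~confident]

print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))     # Step 4
```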

Ok, got it? Good! Let’s work through an example.

Example: Using Self-Training to Improve a Classifier

To demonstrate self-training, I’m using Python and the surgical_deepnet data set, available here on Kaggle. This data set is intended to be used for binary classification, and contains data for 14.6k+ surgeries. The attributes are measurements like bmi, age, and a variety of others, while the target variable, complication, records whether the patient suffered complications as a result of surgery. Clearly, being able to accurately predict whether a patient will suffer complications from a surgery would be in the best interest of healthcare and insurance providers alike.
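
As a compact alternative to writing the loop by hand, scikit-learn also ships self-training as a ready-made wrapper. The sketch below assumes the Kaggle file is saved locally as surgical_deepnet.csv and that complication is the binary target column, as described above; the base classifier and threshold are illustrative.

```python
# Sketch: self-training on the surgical_deepnet data with scikit-learn's built-in wrapper.
# Assumes the Kaggle file is saved as 'surgical_deepnet.csv' with a binary
# 'complication' target column, per the dataset description above.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("surgical_deepnet.csv")
X = df.drop(columns=["complication"]).values
y = df["complication"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Pretend most training labels are unknown: -1 marks 'unlabeled' for scikit-learn.
rng = np.random.default_rng(42)
y_semi = y_train.copy()
y_semi[rng.random(len(y_train)) < 0.95] = -1

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X_train, y_semi)
print("Test accuracy:", model.score(X_test, y_test))
```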

#semi-supervised-learning #machine-learning #python #data-science #classification

6 GAN Architectures Every Data Scientist Should Know

Generative Adversarial Networks (GANs) were first introduced in 2014 by Ian Goodfellow et al., and since then the topic has opened up a whole new area of research.

Within a few years, the research community produced plenty of papers on this topic, some of which have very interesting names :). You have CycleGAN, followed by BicycleGAN, followed by RecycleGAN, and so on.

With the invention of GANs, generative models started showing promising results in generating realistic images. GANs have shown tremendous success in computer vision, and more recently they have started showing promising results in audio and text as well.

Some of the most popular GAN formulations are:

  • Transforming an image from one domain to another (CycleGAN),
  • Generating an image from a textual description (text-to-image),
  • Generating very high-resolution images (ProgressiveGAN), and many more.

In this article, we will talk about some of the most popular GAN architectures, specifically six architectures you should know to have diverse coverage of Generative Adversarial Networks (GANs).

Namely:

  • CycleGAN
  • StyleGAN
  • pixelRNN
  • text-2-image
  • DiscoGAN
  • lsGAN
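
Whatever the formulation, all of these architectures share the same underlying adversarial loop between a generator and a discriminator. The toy PyTorch sketch below shows that loop on made-up data; every network size, batch size, and learning rate here is an arbitrary illustration, not a recipe from any of the papers above.

```python
# A minimal, generic GAN training loop (illustrative only): the adversarial game
# shared by the architectures listed above, on a toy 2-D 'real' distribution.
import torch
import torch.nn as nn

data_dim, noise_dim = 2, 16
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0   # toy 'real' samples
    fake = G(torch.randn(64, noise_dim))           # generated samples

    # Discriminator step: label real samples 1 and generated samples 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator call fakes 'real'.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```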

#machine-learning #deep-learning #data-science #gan-algorithm #gans #gan


How to fine-tune BERT on text classification task?

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based architecture. The underlying Transformer was introduced by Google in the 2017 paper **_“Attention Is All You Need”_**, and the BERT model itself was introduced in the paper **_“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”_** (published at NAACL 2019). When it was released, it showed state-of-the-art results on the GLUE benchmark.

Introduction

First, I will tell you a little bit about the BERT architecture, and then move on to the code showing how to use it for a text classification task.

The BERT architecture is a multi-layer bidirectional transformer’s encoder described in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

There are two different architectures proposed in the paper: **BERT_base** and **BERT_large**. The BERT_base architecture has L=12, H=768, A=12 and a total of around 110M parameters, where L is the number of transformer blocks, H is the hidden size, and A is the number of self-attention heads. BERT_large has L=24, H=1024, A=16, with around 340M parameters.
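
If you want to verify L, H, and A yourself, they are stored in the pretrained checkpoint's configuration; a quick sketch with the transformers library (this downloads the config the first time it runs):

```python
# Check L, H, A for the pretrained BERT_base checkpoint via its config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
print("L (transformer blocks):", config.num_hidden_layers)    # 12
print("H (hidden size):       ", config.hidden_size)          # 768
print("A (attention heads):   ", config.num_attention_heads)  # 12
```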


Image source: “BERT: State of the Art NLP Model, Explained” (https://www.kdnuggets.com/2018/12/bert-sota-nlp-model-explained.html)

The input format of BERT is shown in the image above. I won’t go into much detail here; you can refer to the link above for a more detailed explanation.
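
As a quick, concrete look at that input format, the tokenizer adds the special [CLS] and [SEP] tokens and returns the three tensors BERT consumes; the sentences below are made-up examples:

```python
# Illustrating BERT's input format: token ids, segment (token type) ids, attention mask.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("How do I reset my password?",
                    "account management",          # optional second segment
                    padding="max_length", max_length=16, truncation=True)

print(encoded["input_ids"])       # [CLS] sentence A [SEP] sentence B [SEP] + padding
print(encoded["token_type_ids"])  # 0s for segment A, 1s for segment B
print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding
```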

Source Code

The code I will be following can be cloned from HuggingFace’s GitHub repo:

https://github.com/huggingface/transformers/

Scripts to be used

We will mainly be modifying and using two scripts for our text classification task: **_glue.py_** and **_run_glue.py_**. The file glue.py lives under _transformers/data/processors/_, and run_glue.py can be found under _examples/text-classification/_.

#deep-learning #machine-learning #text-classification #bert #nlp #deep learning


Basics of Supervised Learning (Classification)

In this post, we are going to dive into the concepts of supervised learning, focusing on classification, in the domain of machine learning. We will discuss the definition, components, and examples of classification.

Classification can be defined as the task of learning a target function **f** that maps each attribute set **x** to one of the predefined labels **y**.

**Example:** Assigning a piece of news to one of the predefined categories.

In the data science and machine learning community, anything done on data is called **modelling**. In the context of classification, there are two types of modelling:

  1. **Descriptive Modelling:** A classification model can serve as an explanatory tool to distinguish between objects of different classes. **Example:** A model that identifies the type of a vertebrate based on its features.
  2. **Predictive Modelling:** A classification model can also be used to predict the class label of unknown records.

Classification techniques are best suited for predicting or describing data sets with binary or nominal categories. They are less effective for ordinal categories (e.g., classifying a person as a member of a high-, medium-, or low-income group) because they do not consider the implicit order among the categories.
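
To make the predictive-modelling view concrete, here is a tiny scikit-learn example (not from the original post) that learns a target function f mapping attribute sets x to one of a set of nominal labels y:

```python
# A tiny predictive-modelling example: learn f: x -> y on a nominal-label data set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                  # attribute sets x and labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

f = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)   # learn the target function
print("Accuracy:", accuracy_score(y_test, f.predict(X_test)))   # predict unseen records
```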

#data-science #supervised-learning #computer-science #classification #machine-learning