Tia Gottlieb

Finetuning BERT using ktrain for Disaster Tweets Classification

Problem Statement

Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g., disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter. However, identifying such tweets has always been difficult because of the ambiguity in their linguistic structure, so it is not always clear whether an individual’s words are actually announcing a disaster. For example, if a person tweets:

“On the plus side look at the sky last night, it was ablaze”


The person explicitly uses the word “ABLAZE” here but means it metaphorically. A human can understand this fairly easily, especially if some visual aid is provided as well; for machines, however, it is not always clear. Kaggle hosted a challenge named Real or Not whose aim was to use the Twitter data of disaster tweets, originally created by the company Figure Eight, to classify tweets talking about a real disaster against the ones talking about it metaphorically.

Approach

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model developed by Google. Ever since it was open-sourced, it has been adopted by many researchers and industry teams and applied to a wide range of NLP tasks, achieving state-of-the-art performance on most of the problems it has been applied to. ktrain is a lightweight wrapper around the deep learning library TensorFlow Keras (and other libraries) that helps build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, it is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. ktrain provides support for many pre-trained deep learning architectures in the Natural Language Processing domain, and BERT is one of them. To solve this problem, we will use the pre-trained BERT implementation provided by ktrain and fine-tune it to classify whether disaster tweets are real or not.

Solution

The dataset consists of three files:

  1. train.csv
  2. test.csv
  3. sample_submission.csv

The structure of the file train.csv is as follows:


  • id - a unique identifier for each tweet
  • text - the text of the tweet
  • location - the location the tweet was sent from (may be blank)
  • keyword - a particular keyword from the tweet (may be blank)
  • target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)

We are only interested in the text and target columns and will use them for the classification of tweets.
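End to end, the workflow looks roughly like the sketch below. This is a minimal illustration of the ktrain API rather than the article's exact notebook code, and it assumes train.csv and test.csv are in the working directory with the columns described above:

import pandas as pd
import ktrain
from ktrain import text

train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

# Preprocess the tweets for BERT (tokenization, padding, and special tokens are handled by ktrain).
(x_train, y_train), (x_val, y_val), preproc = text.texts_from_df(
    train_df,
    text_column='text',
    label_columns=['target'],
    maxlen=64,                 # tweets are short, so a small sequence length is enough
    preprocess_mode='bert',
    val_pct=0.1)

# Build a classifier on top of pre-trained BERT and wrap it in a ktrain Learner.
model = text.text_classifier('bert', train_data=(x_train, y_train), preproc=preproc)
learner = ktrain.get_learner(model, train_data=(x_train, y_train),
                             val_data=(x_val, y_val), batch_size=32)

# Fine-tune with the one-cycle policy at a small learning rate.
learner.fit_onecycle(2e-5, 3)

# Wrap the trained model in a predictor and score the test tweets.
predictor = ktrain.get_predictor(learner.model, preproc)
preds = predictor.predict(test_df['text'].tolist())

A small learning rate (around 2e-5) and only a few epochs are typically used when fine-tuning BERT, since larger values tend to wipe out the pre-trained weights.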

#deep-learning #bert #twitter #supervised-learning #deep learning

Chloe Butler

PyTorch Unsupervised Sentiment Discovery

** DEPRECATED **

This repo has been deprecated. Please visit Megatron-LM for our up to date Large-scale unsupervised pretraining and finetuning code.

If you would still like to use this codebase, see our tagged releases and install the required software/dependencies that were available publicly at that date.

PyTorch Unsupervised Sentiment Discovery

This codebase contains pretrained binary sentiment and multimodel emotion classification models as well as code to reproduce results from our series of large scale pretraining + transfer NLP papers: Large Scale Language Modeling: Converging on 40GB of Text in Four Hours and Practical Text Classification With Large Pre-Trained Language Models. This effort was born out of a desire to reproduce, analyze, and scale the Generating Reviews and Discovering Sentiment paper from OpenAI.

The techniques used in this repository are general purpose, and our easy-to-use command line interface can be used to train state-of-the-art classification models on your own difficult classification datasets.

This codebase supports mixed precision training as well as distributed, multi-gpu, multi-node training for language models (support is provided based on the NVIDIA APEx project). In addition to training language models, this codebase can be used to easily transfer and finetune trained models on custom text classification datasets.

For example, a Transformer language model for unsupervised modeling of large text datasets, such as the amazon-review dataset, is implemented in PyTorch. We also support other tokenization methods, such as character or sentencepiece tokenization, and language models using various recurrent architectures.

The learned language model can be transferred to other natural language processing (NLP) tasks where it is used to featurize text samples. The featurizations provide a strong initialization point for discriminative language tasks, and allow for competitive task performance given only a few labeled samples. For example, we consider finetuning our models on the difficult task of multimodal emotion classification based on a subset of the plutchik wheel of emotions.

plutchik fig

Created by Robert Plutchik, this wheel is used to illustrate different emotions in a compelling and nuanced way. He suggested that there are 8 primary bipolar emotions (joy versus sadness, anger versus fear, trust versus disgust, and surprise versus anticipation) with different levels of emotional intensity. For our classification task we utilize tweets from the SemEval2018 Task 1E-c emotion classification dataset to perform multilabel classification of anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. This is a difficult task that suffers from real world classification problems such as class imbalance and labeler disagreement.

semeval results

On the full SemEval emotion classification dataset we find that finetuning our model on the data achieves competitive state of the art performance with no additional domain-specific feature engineering.

semeval leaderboard

Setup

Install

Install the sentiment_discovery package with python3 setup.py install in order to run the modules/scripts within this repo.

Python Requirements

At this time we only support python3.

  • numpy
  • pytorch (>= 0.4.1)
  • pandas
  • scikit-learn
  • matplotlib
  • unidecode
  • sentencepiece
  • seaborn
  • emoji

Pretrained models

We've included our sentencepiece tokenizer model and vocab as a zip file:

We've included a transformer language model base as well as a 4096-d mlstm language model base. For examples on how to use these models please see our finetuning and transfer sections. Even though these models were trained with FP16 they can be used in FP32 training/inference.

We've also included classifiers trained on a subset of SemEval emotions corresponding to the 8 plutchik emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust):

Lastly, we've also included already trained classification models for SST and IMDB binary sentiment classification:

To use classification models that reproduce results from our original large batch language modeling paper please use the following commit hash and set of models.

We did not include pretrained models leveraging ELMo. To reproduce our papers' results with ELMo, please see our available resources.

Each file has a dictionary containing a PyTorch state_dict consisting of a language model (lm_encoder keys) trained on Amazon reviews and a classifier (classifier key) as well as accompanying args necessary to run a model with that state_dict.
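As a quick sanity check after downloading, you can load a checkpoint and look at its top-level entries. This is a minimal sketch; 'model.pt' is a placeholder filename for whichever pretrained checkpoint you downloaded:

import torch

# 'model.pt' is a placeholder for the downloaded checkpoint file.
checkpoint = torch.load('model.pt', map_location='cpu')
# Per the description above, expect language model weights (lm_encoder keys),
# classifier weights, and the accompanying args needed to rebuild the model.
print(list(checkpoint.keys()))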

Data Downloads

In the ./data folder we've provided processed copies of the Binary Stanford Sentiment Treebank (Binary SST), IMDB Movie Review, and the SemEval2018 Tweet Emotion datasets as part of this repository. In order to train on the amazon dataset please download the "aggressively deduplicated data" version from Julian McAuley's original site. Access requests to the dataset should be approved instantly. While using the dataset make sure to load it with the --loose-json flag.

Usage

In addition to providing easily reusable code of the core functionalities (models, distributed, fp16, etc.) of this work, we also provide scripts to perform the high-level functionalities of the original paper:

  • sentiment classification of input text
  • unsupervised reconstruction/language modeling of a corpus of text (+ script for launching distributed workers)
  • transfer of learned language model to perform sentiment analysis on a specified corpus
  • sampling from language model to generate text (possibly of fixed sentiment) + heatmap visualization of sentiment in text

Classifying text

Classify an input csv/json using one of our pretrained models or your own. Performs classification on Binary SST by default. Output classification probabilities are saved to a .npy file

python3 run_classifier.py --load_model ama_sst.pt                               # classify Binary SST
python3 run_classifier.py --load_model ama_sst_16.pt --fp16                     # run classification in fp16
python3 run_classifier.py --load_model ama_sst.pt --text-key <text-column> --data <path.csv>     # classify your own dataset

See here for more documentation.

Training Language Models (+ Distributed/FP16 Training)

Train a language model on a csv/json corpus. By default we train a weight-normalized, 4096-d mLSTM, with a 64-d character embedding. This is the first step of a 2-step process to training your own sentiment classifier. Saves model to lang_model.pt by default.

python3 pretrain.py                                                               #train a large model on imdb
python3 pretrain.py --model LSTM --nhid 512                                       #train a small LSTM instead
python3 pretrain.py --fp16 --dynamic-loss-scale                                   #train a model with fp16
python3 -m multiproc pretrain.py                                                  #distributed model training
python3 pretrain.py --data ./data/amazon/reviews.json --lazy --loose-json \       #train a model on amazon data
  --text-key reviewText --label-key overall --optim Adam --split 1000,1,1 
python3 pretrain.py --tokenizer-type SentencePieceTokenizer --vocab-size 32000 \  #train a model with our sentencepiece tokenization
  --tokenizer-type bpe --tokenizer-path ama_32k_tokenizer.model 
python3 pretrain.py --tokenizer-type SentencePieceTokenizer --vocab-size 32000 \  #train a transformer model with our sentencepiece tokenization
  --tokenizer-type bpe --tokenizer-path ama_32k_tokenizer.model --model transformer \
  --decoder-layers 12 --decoder-embed-dim 768 --decoder-ffn-embed-dim 3072 \
  --decoder-learned-pos --decoder-attention-heads 8
bash ./experiments/train_mlstm_singlenode.sh                                      #run our mLSTM training script on 1 DGX-1V
bash ./experiments/train_transformer_singlenode.sh                                #run our transformer training script on 1 DGX-1V 

For more documentation of our language modeling functionality look here

In order to learn about our language modeling experiments and reproduce results see the training reproduction section in analysis.

For information about how we achieve numerical stability with FP16 training see our fp16 training analysis.

Sentiment Transfer

Given a trained language model, this script will featurize text from train, val, and test csv/json files. It then uses sklearn logistic regression to fit a classifier to predict sentiment from these features. Lastly, it performs feature selection to try to fit a regression model to the top n most relevant neurons (features). By default only one neuron is used for this second regression.

python3 transfer.py --load mlstm.pt                                 #performs transfer to SST, saves results to `<model>_transfer/` directory
python3 transfer.py --load mlstm.pt --neurons 5                     #use 5 neurons for the second regression
python3 transfer.py --load mlstm.pt --fp16                          #run model in fp16 for featurization step
bash ./experiments/run_sk_sst.sh                                    #run transfer learning with mlstm on sst dataset
bash ./experiments/run_sk_imdb.sh                                   #run transfer learning with mlstm on imdb dataset
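Conceptually, the transfer step reduces to fitting a logistic regression on the language model's features and then refitting on the most informative neuron(s). The sketch below uses random stand-in features (it is not the repo's actual code) just to show the idea:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-ins for the real 4096-d mLSTM features produced by featurizing each text.
rng = np.random.RandomState(0)
train_feats, train_labels = rng.randn(256, 4096), rng.randint(0, 2, 256)
val_feats, val_labels = rng.randn(64, 4096), rng.randint(0, 2, 64)

# Fit a classifier on all features.
clf = LogisticRegression(max_iter=1000)
clf.fit(train_feats, train_labels)
print('full-feature accuracy:', clf.score(val_feats, val_labels))

# Feature selection: keep the single neuron with the largest weight and refit.
top_neuron = int(np.argmax(np.abs(clf.coef_[0])))
clf_top = LogisticRegression(max_iter=1000)
clf_top.fit(train_feats[:, [top_neuron]], train_labels)
print('top-neuron accuracy:', clf_top.score(val_feats[:, [top_neuron]], val_labels))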

Additional documentation of the command line arguments available for transfer can be found here

Classifier Finetuning

Given a trained language model and classification dataset, this script will build a classifier that leverages the trained language model as a text feature encoder. The difference between this script and transfer.py is that the model training is performed end to end: the loss from the classifier is backpropagated into the language model encoder as well. This script allows one to build more complex classification models, metrics, and loss functions than transfer.py. This script supports building arbitrary multilabel, multilayer, and multihead perceptron classifiers. Additionally, it allows using language modeling as an auxiliary task loss during training and multihead variance as an auxiliary loss during training. Lastly, this script supports automatically selecting classification thresholds from validation performance. To measure validation performance this script includes more complex metrics including: f1-score, Matthews correlation coefficient, Jaccard index, recall, precision, and accuracy.

python3 finetune_classifier.py --load mlstm.pt --lr 2e-5 --aux-lm-loss --aux-lm-loss-weight .02   #finetune mLSTM model on sst (default dataset) with auxiliary loss
python3 finetune_classifier.py --load mlstm.pt --automatic-thresholding --threshold-metric f1     #finetune mLSTM model on sst and automatically select classification thresholds based on the validation f1 score
python3 finetune_classifier.py --tokenizer-type SentencePieceTokenizer --vocab-size 32000 \       #finetune transformer with sentencepiece on SST
  --tokenizer-type bpe --tokenizer-path ama_32k_tokenizer.model --model transformer --lr 2e-5 \
  --decoder-layers 12 --decoder-embed-dim 768 --decoder-ffn-embed-dim 3072 \
  --decoder-learned-pos --decoder-attention-heads 8 --load transformer.pt --use-final-embed
python3 finetune_classifier.py --automatic-thresholding --non-binary-cols l1 l2 l3 --lr 2e-5 \    #finetune multilayer classifier with 3 classes and 4 heads per class on some custom dataset and automatically select classification thresholds
  --classifier-hidden-layers 2048 1024 3 --heads-per-class 4 --aux-head-variance-loss-weight 1.   #`aux-head-variance-loss-weight` is an auxiliary loss to increase the variance between each of the 4 heads' weights
  --data <custom_train>.csv --val <custom_val>.csv --test <custom_test>.csv --load mlstm.pt
bash ./experiments/se_transformer_multihead.sh                                                    #finetune a multihead transformer on 8 semeval categories

See how to reproduce our finetuning experiments in the finetuning reproduction section of analysis.

Additional documentation of the command line arguments available for finetune_classifier.py can be found here

Analysis

Acknowledgement

A special thanks to our amazing summer intern Neel Kant for all the work he did with transformers, tokenization, and pretraining+finetuning classification models.

A special thanks to @csarofeen and @Michael Carilli for their help developing and documenting our RNN interface, Distributed Data Parallel model, and fp16 optimizer. The latest versions of these utilities can be found at the APEx github page.

Thanks to @guillitte for providing a lightweight pytorch port of openai's sentiment-neuron repo.

This project uses the amazon review dataset collected by J. McAuley

Thanks

Want to help out? Open up an issue with questions/suggestions or pull requests ranging from minor fixes to new functionality.

May your learning be Deep and Unsupervised.


Download details:

Author: NVIDIA
Source: https://github.com/NVIDIA/sentiment-discovery

License: View license

#pytorch 


Why Use WordPress? What Can You Do With WordPress?

Can you use WordPress for anything other than blogging? To your surprise, yes. WordPress is more than just a blogging tool, and it has helped thousands of websites and web applications to thrive. WordPress powers around 40% of online projects, and today in our blog, we will visit some amazing uses of WordPress other than blogging.
What Is The Use Of WordPress?

WordPress is the most popular website platform in the world. It is the first choice of businesses that want to set up a feature-rich and dynamic Content Management System. So, if you ask what WordPress is used for, the answer is: everything. It is a super-flexible, feature-rich, and secure platform that offers everything needed to build unique websites and applications. Let’s look at some of these uses:

1. Multiple Websites Under A Single Installation
WordPress Multisite allows you to develop multiple sites from a single WordPress installation. You can download WordPress and start building the websites you want to launch under a single server. Literally speaking, you can handle hundreds of sites from a single dashboard, which deserves applause.
It is a highly efficient platform that allows you to easily run several websites under the same login credentials. One of the best things about WordPress is the themes it has to offer. You can simply download them and plug them into various sites, saving space without losing speed.

2. WordPress Social Network
WordPress can be used for high-end projects such as a social media network. If you don’t have the money and patience to hire a coder and invest months in building a feature-rich social media site, go for WordPress. It is one of the most amazing uses of WordPress: its CMS is unbeatable, and you can build sites comparable to Facebook or Reddit while making the process a lot easier.
To set up a social media network, you would have to download a WordPress Plugin called BuddyPress. It would allow you to connect a community page with ease and would provide all the necessary features of a community or social media. It has direct messaging, activity stream, user groups, extended profiles, and so much more. You just have to download and configure it.
If BuddyPress doesn’t meet all your needs, don’t give up on your dreams. You can try out WP Symposium or PeepSo. There are also several themes you can use to build a social network.

3. Create A Forum For Your Brand’s Community
Communities are very important for your business. They help you stay in constant connection with your users and consumers, and allow you to turn them into a loyal customer base. While there are many good technologies that can be used for building a community page, good old WordPress is still among the best.
It is one of the best community-development technologies. If you want to build your online community, you need to consider all the amazing features you get with WordPress. Plugins such as bbPress provide open-source, template-driven PHP/MySQL forum software. It is very simple and doesn’t hamper the experience of the website.
Other tools such as wpForo and Asgaros Forum are equally good for creating a community blog. They are lightweight tools that are easy to manage and integrate with your WordPress site easily. However, there is only one tiny problem: you need to have some technical knowledge to build a WordPress community blog page.

4. Shortcodes
Since we gave you a problem in the previous section, we will also give you a perfect solution for it. You might not know how to code, but you have shortcodes. Shortcodes help you execute functions without having to code. They are an easy way to build an amazing website, add new features, and customize plugins. Rather than memorizing multiple lines of code, you can have zero technical knowledge and still build a feature-rich website or application.
There are also plugins like Shortcoder, Shortcodes Ultimate, and the Basics available on WordPress that can be used, and you would not even have to remember the shortcodes.

5. Build Online Stores
If you are still wondering why to use WordPress, use it to build an online store and start selling your goods online. It is an affordable technology that helps you build a feature-rich eCommerce store with WordPress.
WooCommerce is an extension of WordPress and is one of the most used eCommerce solutions. WooCommerce holds a 28% share of the global market and is one of the best ways to set up an online store. It allows you to build user-friendly and professional online stores and has thousands of free and paid extensions. Moreover, as an open-source platform, it doesn’t require you to pay for a license.
Apart from WooCommerce, there are Easy Digital Downloads, iThemes Exchange, Shopify eCommerce plugin, and so much more available.

6. Security Features
WordPress takes security very seriously. It offers tons of external solutions that help you safeguard your WordPress site. While there is no way to ensure 100% security, it provides regular updates with security patches and offers several plugins to help with backups, two-factor authorization, and more.
By choosing hosting providers like WP Engine, you can improve the security of the website. It helps with threat detection, patching and updates, internal security audits for customers, and much more.

Read More

#use of wordpress #use wordpress for business website #use wordpress for website #what is use of wordpress #why use wordpress #why use wordpress to build a website

Noah Rowe

Multi Class Text Classification With Deep Learning Using BERT

Most researchers submit their research papers to academic conferences because it is a faster way of making the results available. Finding and selecting a suitable conference has always been challenging, especially for young researchers.

However, based on data from previous conference proceedings, researchers can increase their chances of paper acceptance and publication. We will try to solve this text classification problem with deep learning using BERT.

Almost all the code was taken from this tutorial; the only difference is the data.

The Data

The dataset contains 2,507 research paper titles that have been manually classified into 5 categories (i.e., conferences); it can be downloaded from here.

Explore and Preprocess

import pandas as pd
import torch
from tqdm.notebook import tqdm

from transformers import BertTokenizer
from torch.utils.data import TensorDataset

from transformers import BertForSequenceClassification

df = pd.read_csv('data/title_conference.csv')
df.head()


Table 1

df['Conference'].value_counts()


Figure 1

You may have noticed that our classes are imbalanced, and we will address this later on.
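One common way to deal with this when splitting the data is to encode the labels and use a stratified split so that both splits keep the same class proportions. The sketch below is not necessarily the tutorial's exact code and assumes the columns are named Title and Conference as shown in Table 1:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Encode the 5 conference names as integer labels.
label_encoder = LabelEncoder()
df['label'] = label_encoder.fit_transform(df['Conference'])

# Stratify on the label so the imbalanced class ratios are preserved in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    df['Title'], df['label'],
    test_size=0.15,
    random_state=42,
    stratify=df['label'])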

#machine-learning #nlp #document-classification #nlp-tutorial #text-classification #deep learning

Vern Greenholt

How to fine-tune BERT on text classification task?

BERT (Bidirectional Encoder Representations from Transformers) is built on the transformer architecture, which Google introduced in the 2017 paper “Attention Is All You Need”. The BERT model itself was published in 2019 in the paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. When it was released, it showed state-of-the-art results on the GLUE benchmark.

Introduction

First, I will tell you a little bit about the BERT architecture, and then we will move on to the code showing how to use it for the text classification task.

The BERT architecture is a multi-layer bidirectional transformer’s encoder described in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

There are two architectures proposed in the paper: BERT_base and BERT_large. The BERT_base architecture has L=12, H=768, A=12 and a total of around 110M parameters. Here L refers to the number of transformer blocks, H refers to the hidden size, and A refers to the number of self-attention heads. For BERT_large, L=24, H=1024, A=16.
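These numbers can be checked directly from the released checkpoints. A quick sketch (it fetches the model configs from the Hugging Face hub, so network access is needed):

from transformers import BertConfig

for name in ['bert-base-uncased', 'bert-large-uncased']:
    cfg = BertConfig.from_pretrained(name)
    print(name, 'L =', cfg.num_hidden_layers,
          'H =', cfg.hidden_size, 'A =', cfg.num_attention_heads)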


BERT: State of the Art NLP Model, Explained

Source: https://www.kdnuggets.com/2018/12/bert-sota-nlp-model-explained.html

The input format of BERT is shown in the image above. I won’t go into much detail about it here; you can refer to the link above for a more detailed explanation.

Source Code

The code that I will be following can be cloned from the HuggingFace GitHub repo:

https://github.com/huggingface/transformers/

Scripts to be used

We will mainly be modifying and using two scripts for our text classification task. One is glue.py, and the other is run_glue.py. The file glue.py lives at “transformers/data/processors/”, and run_glue.py can be found at “examples/text-classification/”.
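As a rough illustration of how run_glue.py is invoked (flag names have changed across transformers releases, so treat this as a sketch against an older version of the examples rather than an exact command):

python3 run_glue.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --task_name SST-2 \
  --do_train --do_eval --do_lower_case \
  --data_dir ./data/SST-2 \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./output/sst2

For a custom dataset, the usual approach is to add a new processor class to glue.py (mirroring the existing ones) and register its task name so that run_glue.py can pick it up, which is why we modify both scripts.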

#deep-learning #machine-learning #text-classification #bert #nlp #deep learning