Angela Dickens

BERT: Pre-Training of Transformers for Language Understanding

Pre-trained language models have taken over a majority of tasks in NLP. The 2017 paper “Attention Is All You Need”, which proposed the Transformer architecture, changed the course of NLP. Building on it, several architectures such as BERT and OpenAI GPT evolved by leveraging self-supervised learning.

In this article, we discuss BERT (Bidirectional Encoder Representations from Transformers), which was proposed by Google AI in the paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. It is one of the groundbreaking models that has achieved state-of-the-art results on many downstream tasks and is widely used.

Overview

Figure: BERT pre-training and fine-tuning tasks, from the paper. (We will cover the architecture and specifications in the coming sections; just observe that the same architecture is transferred to the fine-tuning tasks with minimal changes in the parameters.)

BERT leverages a fine-tuning-based approach to applying pre-trained language models: a common architecture is pre-trained on a relatively generic task and then fine-tuned on specific downstream tasks that are more or less similar to the pre-training task.

To achieve this, the BERT paper proposes 2 pre-training tasks:

  1. Masked Language Modeling (MLM)
  2. Next Sentence Prediction (NSP)

and fine-tuning on downstream tasks such as:

  1. Sequence Classification
  2. Named Entity Recognition (NER)
  3. Natural Language Inference (NLI) or Textual Entailment
  4. Grounded Common Sense Inference
  5. Question Answering (QnA)

We will discuss these in-depth in the coming sections of this article.
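To get a concrete feel for the masked language modeling objective before diving into the architecture, here is a minimal sketch using the Hugging Face transformers library; the library and model name are my own choice for illustration, not something prescribed by the paper.

# Minimal MLM illustration, assuming the Hugging Face `transformers` package is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills in [MASK] using context from both the left and the right of the gap.
for prediction in fill_mask("Pre-trained language models have taken over many [MASK] in NLP."):
    print(prediction["token_str"], round(prediction["score"], 3))

Each candidate completion comes with a probability score, which is exactly the kind of bidirectional, context-driven prediction the MLM pre-training task trains for.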

BERT Architecture

“BERT’s model architecture is a multi-layer bidirectional Transformer encoder based on the original implementation described in Vaswani et al.”

— BERT Paper

I have already covered the Transformer architecture in this post. Consider giving it a read if you are interested in learning more about the Transformer.

To elaborate on the BERT-specific architecture, we will compare the encoder and the decoder of the Transformer:

  • **The Transformer encoder** is essentially a bidirectional self-attentive model that uses all the tokens in a sequence to attend to each token in that sequence,

i.e. for a given word, attention is computed using all the words in the sentence, not just the words preceding it in a left-to-right or right-to-left traversal.
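To make the contrast with a left-to-right decoder concrete, the sketch below (my own illustration in PyTorch, not code from the paper) compares the full attention mask an encoder uses with the causal mask a decoder uses:

# Illustrative attention masks, assuming PyTorch is available; not code from the BERT paper.
import torch

seq_len = 5

# Encoder-style (bidirectional): every position may attend to every other position.
bidirectional_mask = torch.ones(seq_len, seq_len).bool()

# Decoder-style (causal): each position may attend only to itself and earlier positions.
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()

print(bidirectional_mask.int())
print(causal_mask.int())

The all-ones mask is what lets BERT's encoder condition every token's representation on both its left and right context.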

#bert #nlp #artificial-intelligence #deep learning

Chloe Butler

PyTorch Unsupervised Sentiment Discovery

** DEPRECATED **

This repo has been deprecated. Please visit Megatron-LM for our up-to-date large-scale unsupervised pretraining and finetuning code.

If you would still like to use this codebase, see our tagged releases and install the required software/dependencies that were publicly available at that date.

PyTorch Unsupervised Sentiment Discovery

This codebase contains pretrained binary sentiment and multimodel emotion classification models as well as code to reproduce results from our series of large scale pretraining + transfer NLP papers: Large Scale Language Modeling: Converging on 40GB of Text in Four Hours and Practical Text Classification With Large Pre-Trained Language Models. This effort was born out of a desire to reproduce, analyze, and scale the Generating Reviews and Discovering Sentiment paper from OpenAI.

The techniques used in this repository are general purpose, and our easy-to-use command line interface can be used to train state-of-the-art classification models on your own difficult classification datasets.

This codebase supports mixed precision training as well as distributed, multi-gpu, multi-node training for language models (support is provided based on the NVIDIA APEx project). In addition to training language models, this codebase can be used to easily transfer and finetune trained models on custom text classification datasets.
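For a rough idea of what mixed precision training involves, here is a generic sketch using PyTorch's torch.cuda.amp utilities; this is only an illustration of the concept, not the repo's APEx-based implementation, and all names in it are placeholders.

# Generic mixed precision training step (illustration only; the repo itself uses NVIDIA APEx).
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss_fn, tokens, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():              # run the forward pass in fp16 where it is safe
        loss = loss_fn(model(tokens), targets)
    scaler.scale(loss).backward()                # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                       # unscale gradients and apply the update
    scaler.update()
    return loss.item()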

For example, a Transformer language model for unsupervised modeling of large text datasets, such as the amazon-review dataset, is implemented in PyTorch. We also support other tokenization methods, such as character or sentencepiece tokenization, and language models using various recurrent architectures.

The learned language model can be transferred to other natural language processing (NLP) tasks where it is used to featurize text samples. The featurizations provide a strong initialization point for discriminative language tasks, and allow for competitive task performance given only a few labeled samples. For example, we consider finetuning our models on the difficult task of multimodal emotion classification based on a subset of the plutchik wheel of emotions.

Figure: Plutchik's wheel of emotions.

Created by Robert Plutchik, this wheel is used to illustrate different emotions in a compelling and nuanced way. He suggested that there are 8 primary bipolar emotions (joy versus sadness, anger versus fear, trust versus disgust, and surprise versus anticipation) with different levels of emotional intensity. For our classification task we utilize tweets from the SemEval2018 Task 1E-c emotion classification dataset to perform multilabel classification of anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. This is a difficult task that suffers from real world classification problems such as class imbalance and labeler disagreement.

Figure: SemEval results.

On the full SemEval emotion classification dataset we find that finetuning our model on the data achieves competitive state of the art performance with no additional domain-specific feature engineering.

Figure: SemEval leaderboard.

Setup

Install

Install the sentiment_discovery package with python3 setup.py install in order to run the modules/scripts within this repo.

Python Requirements

At this time we only support python3.

  • numpy
  • pytorch (>= 0.4.1)
  • pandas
  • scikit-learn
  • matplotlib
  • unidecode
  • sentencepiece
  • seaborn
  • emoji

Pretrained models

We've included our sentencepiece tokenizer model and vocab as a zip file:

We've included a transformer language model base as well as a 4096-d mlstm language model base. For examples on how to use these models please see our finetuning and transfer sections. Even though these models were trained with FP16 they can be used in FP32 training/inference.

We've also included classifiers trained on a subset of SemEval emotions corresponding to the 8 plutchik emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust):

Lastly, we've also included already trained classification models for SST and IMDB binary sentiment classification:

To use classification models that reproduce results from our original large batch language modeling paper please use the following commit hash and set of models.

We did not include pretrained models leveraging ELMo. To reproduce our papers' results with ELMo, please see our available resources.

Each file has a dictionary containing a PyTorch state_dict consisting of a language model (lm_encoder keys) trained on Amazon reviews and a classifier (classifier key) as well as accompanying args necessary to run a model with that state_dict.
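As a quick way to see this structure for yourself, you can load one of the downloaded files and inspect its keys; the sketch below uses ama_sst.pt (the file from the classification example later in this README) purely as an example.

# Inspect a downloaded checkpoint (illustration; substitute whichever model file you downloaded).
import torch

checkpoint = torch.load("ama_sst.pt", map_location="cpu")

print(checkpoint.keys())   # expect language model ("lm_encoder") and "classifier" entries plus the saved args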

Data Downloads

In the ./data folder we've provided processed copies of the Binary Stanford Sentiment Treebank (Binary SST), IMDB Movie Review, and the SemEval2018 Tweet Emotion datasets as part of this repository. In order to train on the amazon dataset please download the "aggressively deduplicated data" version from Julian McAuley's original site. Access requests to the dataset should be approved instantly. While using the dataset make sure to load it with the --loose-json flag.
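If you want to peek at the reviews before training, the --loose-json flag suggests the corpus is stored as one JSON object per line rather than a single JSON array (my reading of the flag, not official documentation of it); a short sketch of loading such a file with pandas:

# Hedged sketch: read a newline-delimited ("loose") JSON corpus.
# The path and keys (reviewText, overall) follow the amazon pretraining example later in this README.
import pandas as pd

reviews = pd.read_json("./data/amazon/reviews.json", lines=True)
print(reviews[["reviewText", "overall"]].head())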

Usage

In addition to providing easily reusable code of the core functionalities (models, distributed, fp16, etc.) of this work, we also provide scripts to perform the high-level functionalities of the original paper:

  • sentiment classification of input text
  • unsupervised reconstruction/language modeling of a corpus of text (+ script for launching distributed workers)
  • transfer of learned language model to perform sentiment analysis on a specified corpus
  • sampling from language model to generate text (possibly of fixed sentiment) + heatmap visualization of sentiment in text

Classifying text

Classify an input csv/json using one of our pretrained models or your own. Performs classification on Binary SST by default. Output classification probabilities are saved to a .npy file.

python3 run_classifier.py --load_model ama_sst.pt                               # classify Binary SST
python3 run_classifier.py --load_model ama_sst_16.pt --fp16                     # run classification in fp16
python3 run_classifier.py --load_model ama_sst.pt --text-key <text-column> --data <path.csv>     # classify your own dataset

See here for more documentation.

Training Language Models (+ Distributed/FP16 Training)

Train a language model on a csv/json corpus. By default we train a weight-normalized, 4096-d mLSTM with a 64-d character embedding. This is the first step of a 2-step process for training your own sentiment classifier. Saves the model to lang_model.pt by default.

python3 pretrain.py                                                               #train a large model on imdb
python3 pretrain.py --model LSTM --nhid 512                                       #train a small LSTM instead
python3 pretrain.py --fp16 --dynamic-loss-scale                                   #train a model with fp16
python3 -m multiproc pretrain.py                                                  #distributed model training
python3 pretrain.py --data ./data/amazon/reviews.json --lazy --loose-json \       #train a model on amazon data
  --text-key reviewText --label-key overall --optim Adam --split 1000,1,1 
python3 pretrain.py --tokenizer-type SentencePieceTokenizer --vocab-size 32000 \  #train a model with our sentencepiece tokenization
  --tokenizer-type bpe --tokenizer-path ama_32k_tokenizer.model 
python3 pretrain.py --tokenizer-type SentencePieceTokenizer --vocab-size 32000 \  #train a transformer model with our sentencepiece tokenization
  --tokenizer-type bpe --tokenizer-path ama_32k_tokenizer.model --model transformer \
  --decoder-layers 12 --decoder-embed-dim 768 --decoder-ffn-embed-dim 3072 \
  --decoder-learned-pos --decoder-attention-heads 8
bash ./experiments/train_mlstm_singlenode.sh                                      #run our mLSTM training script on 1 DGX-1V
bash ./experiments/train_transformer_singlenode.sh                                #run our transformer training script on 1 DGX-1V 

For more documentation of our language modeling functionality look here

In order to learn about our language modeling experiments and reproduce results see the training reproduction section in analysis.

For information about how we achieve numerical stability with FP16 training see our fp16 training analysis.

Sentiment Transfer

Given a trained language model, this script will featurize text from train, val, and test csv/json files. It then uses sklearn logistic regression to fit a classifier that predicts sentiment from these features. Lastly, it performs feature selection to try to fit a regression model to the top n most relevant neurons (features). By default only one neuron is used for this second regression.

python3 transfer.py --load mlstm.pt                                 #performs transfer to SST, saves results to `<model>_transfer/` directory
python3 transfer.py --load mlstm.pt --neurons 5                     #use 5 neurons for the second regression
python3 transfer.py --load mlstm.pt --fp16                          #run model in fp16 for featurization step
bash ./experiments/run_sk_sst.sh                                    #run transfer learning with mlstm on sst dataset
bash ./experiments/run_sk_imdb.sh                                   #run transfer learning with mlstm on imdb dataset

Additional documentation of the command line arguments available for transfer can be found here
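Conceptually, the transfer step amounts to fitting a linear probe on the language model's features and then re-fitting on the most predictive neurons. The sketch below illustrates that idea with scikit-learn; it is my own simplification, not the contents of transfer.py, and the feature/label arrays are placeholders.

# Hedged sketch of the transfer idea: linear probe on LM features, then refit on the top neurons.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sentiment_probe(train_features, train_labels, n_neurons=1):
    # Fit a logistic regression classifier over all feature dimensions.
    full_clf = LogisticRegression(max_iter=1000).fit(train_features, train_labels)

    # Select the n most relevant neurons by coefficient magnitude.
    top_neurons = np.argsort(np.abs(full_clf.coef_[0]))[::-1][:n_neurons]

    # Refit a small model on just those neurons (one neuron by default, as in the script).
    small_clf = LogisticRegression(max_iter=1000).fit(train_features[:, top_neurons], train_labels)
    return full_clf, small_clf, top_neurons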

Classifier Finetuning

Given a trained language model and a classification dataset, this script will build a classifier that leverages the trained language model as a text feature encoder. The difference between this script and transfer.py is that the model training is performed end to end: the loss from the classifier is backpropagated into the language model encoder as well. This script allows one to build more complex classification models, metrics, and loss functions than transfer.py. It supports building arbitrary multilabel, multilayer, and multihead perceptron classifiers. Additionally, it allows using language modeling as an auxiliary task loss during training, and multihead variance as an auxiliary loss during training. Lastly, this script supports automatically selecting classification thresholds from validation performance. To measure validation performance, this script includes more complex metrics including: F1 score, Matthews correlation coefficient, Jaccard index, recall, precision, and accuracy.
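For intuition, the combined objective described above can be sketched roughly as follows; the module and batch names are placeholders, and this is not the script's actual code.

# Hedged sketch of end-to-end finetuning with an auxiliary language modeling loss.
import torch.nn.functional as F

def finetune_step(encoder, classifier_head, lm_head, batch, aux_lm_loss_weight=0.02):
    hidden = encoder(batch["tokens"])                      # language model features, shape (batch, seq, dim)

    # Classification loss, backpropagated all the way into the encoder.
    clf_logits = classifier_head(hidden[:, -1, :])
    clf_loss = F.binary_cross_entropy_with_logits(clf_logits, batch["labels"])

    # Auxiliary language modeling loss: predict the next token at each position.
    lm_logits = lm_head(hidden[:, :-1, :])
    lm_loss = F.cross_entropy(lm_logits.reshape(-1, lm_logits.size(-1)),
                              batch["tokens"][:, 1:].reshape(-1))

    return clf_loss + aux_lm_loss_weight * lm_loss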

python3 finetune_classifier.py --load mlstm.pt --lr 2e-5 --aux-lm-loss --aux-lm-loss-weight .02   #finetune mLSTM model on sst (default dataset) with auxiliary loss
python3 finetune_classifier.py --load mlstm.pt --automatic-thresholding --threshold-metric f1     #finetune mLSTM model on sst and automatically select classification thresholds based on the validation f1 score
python3 finetune_classifier.py --tokenizer-type SentencePieceTokenizer --vocab-size 32000 \       #finetune transformer with sentencepiece on SST
  --tokenizer-type bpe --tokenizer-path ama_32k_tokenizer.model --model transformer --lr 2e-5 \
  --decoder-layers 12 --decoder-embed-dim 768 --decoder-ffn-embed-dim 3072 \
  --decoder-learned-pos --decoder-attention-heads 8 --load transformer.pt --use-final-embed
python3 finetune_classifier.py --automatic-thresholding --non-binary-cols l1 l2 l3 --lr 2e-5 \    #finetune multilayer classifier with 3 classes and 4 heads per class on some custom dataset and automatically select classification thresholds
  --classifier-hidden-layers 2048 1024 3 --heads-per-class 4 --aux-head-variance-loss-weight 1.   #`aux-head-variance-loss-weight` is an auxiliary loss to increase the variance between each of the 4 head's weights
  --data <custom_train>.csv --val <custom_val>.csv --test <custom_test>.csv --load mlstm.pt
bash ./experiments/se_transformer_multihead.sh                                                    #finetune a multihead transformer on 8 semeval categories

See how to reproduce our finetuning experiments in the finetuning reproduction section of analysis.

Additional documentation of the command line arguments available for finetune_classifier.py can be found here

Analysis

Acknowledgement

A special thanks to our amazing summer intern Neel Kant for all the work he did with transformers, tokenization, and pretraining+finetuning classification models.

A special thanks to @csarofeen and @Michael Carilli for their help developing and documenting our RNN interface, Distributed Data Parallel model, and fp16 optimizer. The latest versions of these utilities can be found at the APEx github page.

Thanks to @guillitte for providing a lightweight pytorch port of openai's sentiment-neuron repo.

This project uses the amazon review dataset collected by J. McAuley

Thanks

Want to help out? Open up an issue with questions/suggestions or pull requests ranging from minor fixes to new functionality.

May your learning be Deep and Unsupervised.


Download details:

Author: NVIDIA
Source: https://github.com/NVIDIA/sentiment-discovery

License: View license

#pytorch 

Ananya Gupta


Advantage of C Language Certification Online Training in 2020

C is a procedural, general-purpose programming language. It is mainly used for developing operating systems and other programming languages, and it runs close to the hardware and the operating system. C is also used in many software applications, such as web browsers, MySQL, and Microsoft Office.
**Advantages of doing C language training in 2020:**

  1. Popular programming language: C is used and applied worldwide, and it is adaptable and flexible in nature. Many widely used languages and tools, such as Java, C++, PHP, Python, Perl, JavaScript, Rust, and the C shell, are derived from or influenced by C.

  2. Basic language of all advanced languages: C is the foundation on which many advanced languages are built, so mastering C makes it much easier to learn other languages.

  3. Understanding computer science theory: Subjects such as computer networks, computer architecture, and operating systems are commonly taught and implemented using C.

  4. Fast execution time: C has a small runtime and executes quickly; programs written in C are generally faster than those written in most other programming languages.

  5. Long-term relevance: C is not learned in a short span of time, and building a career around it takes time and energy, but the language has remained in use for decades and is one of the longest-lived languages in computer programming history.

  6. Rich function library: C has a rich set of libraries and built-in functions compared to many other programming languages, and working with them helps build analytical skills.

  7. Great degree of portability: C is often described as a portable assembly language; it has a great degree of portability, and compilers and interpreters of other programming languages are themselves implemented in C.

Demand for C language skills in the IT sector is high and increasing rapidly.

C language online training is suitable for both individuals and professionals. It helps you learn to develop applications, build operating systems and games, and work with file access and memory management, among other skills.

A C language online course provides in-depth knowledge of the functional and logical parts of the language: developing applications, managing memory, understanding command-line arguments, and compiling, running, and debugging C programs.

C language training also provides a basic understanding of creating C applications, applying real-time programming, and writing high-quality code, covering computer programming, C functions, variables, data types, operators, loops, statements, arrays, strings, etc.

Companies using the C language include Amazon, Martin, Apple, Samsung, Google, Oracle, Nokia, IBM, Intel, Novell, Microsoft, Facebook, Bloomberg, VMware, and others. C is used in domains such as banking, IT, insurance, education, gaming, networking, firmware, telecommunications, graphics, management, embedded systems, application development, and driver-level development.

Job opportunities after completing a C language online certification include data scientist, back-end developer, embedded developer, C analyst, software developer, junior programmer, database developer, embedded engineer, programming architect, game programmer, quality analyst, senior programmer, full-stack developer, DevOps specialist, front-end web developer, app developer, Java software engineer, and many more.

#c language online training #c language online course #c language certification online #c language certification #c language certification course #c language certification training

Ananya Gupta


Benefits Of C Language Over Other Programming Languages

C is a middle-level programming language developed by Dennis Ritchie during the early 1970s while working at AT&T Bell Labs in the USA. The objective of its development was the re-design of the UNIX operating system so that it could be used on multiple computers.

Earlier, the language B had been used for improving UNIX. Being an application-oriented language, B allowed much faster production of code than assembly language. Still, B suffered from drawbacks: it did not understand data types and did not provide the use of "structures".

These drawbacks drove Ritchie to develop a new programming language called C. He kept most of B's syntax and added data types and many other required changes. C was developed during 1971-73, containing both high-level functionality and the detailed features required to program an operating system. As a result, many UNIX components, including the UNIX kernel itself, were eventually rewritten in C.

Benefits of C language

As a middle-level language, C combines the features of both high-level and low-level languages. It can be used for low-level programming, and it also supports the functionality of high-level programming languages, such as scripting for software applications.
C is a structured programming language that allows a complex program to be broken into simpler units called functions. It also allows free movement of data across these functions.

Various features of C, including direct access to machine-level hardware APIs, the availability of C compilers, deterministic resource use, and dynamic memory allocation, make C an optimal choice for scripting applications and drivers for embedded systems.

C is case-sensitive, which means lowercase and uppercase letters are treated differently.
C is highly portable and is used for scripting system applications, which form a major part of the Windows, UNIX, and Linux operating systems.

C is a general-purpose programming language and can efficiently handle enterprise applications, games, graphics, applications requiring calculations, and more.
C features a rich library that provides a variety of built-in functions. It also offers dynamic memory allocation.

C implements algorithms and data structures swiftly, facilitating faster computations in programs. This has enabled the use of C in applications requiring heavy calculation, such as MATLAB and Mathematica.

Riding on these advantages, C became dominant and quickly spread beyond Bell Labs, replacing many well-known languages of that time, such as ALGOL, B, PL/I, and FORTRAN. C has become available on a very wide range of platforms, from embedded microcontrollers to supercomputers.

#c language online training #c language training #c language course #c language online course #c language certification course

Simpliv LLC

Body Language course | The Power of Body Language Courses | Simpliv

Description
Imagine that every time you attend a business meeting or event or give a business presentation or media interview, you appear supremely confident, relaxed and authoritative. Every movement your body makes conveys the message you want it to. Imagine never having to wonder what to do with your hands or how to stand or move when you are in a business setting.

You can be a master of your own body language. Today.

In this course you will learn what to do with your hands, body and face so that you will look confident and relaxed in front of business colleagues and groups. You won’t have to wonder if you look nervous, scared or unprofessional any longer. Even if you are scared, you will learn how to look completely relaxed in front of any speaking audience or meeting with clients.

Why go through one more meeting or presentation worried that you don’t look or sound your best? Sign up for this course today.

TJ Walker is a body language/presentation skills coach to CEOs, Presidents of countries and Prime Ministers.

Who is the target audience?

  • Business executives
  • Entrepreneurs
  • Leaders
  • CEOs
  • C-level executives
  • Those who aspire to the C-level
  • This course is NOT for single people looking to help their dating life!

Basic knowledge

  • Cell phone with video camera or web cam

What will you learn

  • Use comfortable and confident body language in every business situation
  • Look relaxed and at ease in meetings, presentations and media interviews
  • Move, gesture and stand with poise and confidence in every business situation


#Top Body Language Courses Online #The Complete Body Language for Business Course #Body Language Training #6 Best Body Language Training & Courses
