Here is a compilation of the top ten alternatives to the popular language model BERT for natural language understanding (NLU) projects.
DeBERTa is the first language model to introduce the disentangled attention mechanism, which represents each token with separate vectors for its content and its position.
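If you want to try DeBERTa as a drop-in BERT replacement, here is a minimal sketch using the Hugging Face transformers library; the "microsoft/deberta-base" checkpoint is the public base release, but any DeBERTa checkpoint would work the same way.

```python
# Minimal sketch: load DeBERTa as a drop-in encoder via transformers.
# Assumes the "microsoft/deberta-base" checkpoint is available from the Hub.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("DeBERTa encodes content and position separately.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```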
Facebook AI introduced Linformer, a Transformer architecture that is more memory- and time-efficient than standard self-attention.
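The efficiency comes from projecting the keys and values down to a fixed length k along the sequence dimension, so attention scales linearly rather than quadratically with sequence length. Below is a rough, single-head sketch of that idea in PyTorch; it illustrates the low-rank projection trick, not Facebook's released implementation.

```python
import torch
import torch.nn as nn

class LinformerStyleAttention(nn.Module):
    """Single-head self-attention with Linformer-style low-rank projections.
    Keys and values are compressed from length n to length k, so the
    attention map is (n x k) instead of (n x n)."""

    def __init__(self, dim, seq_len, k=64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # E and F project the sequence dimension: shape (k, n)
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.scale = dim ** -0.5

    def forward(self, x):                                         # x: (b, n, dim)
        q = self.q_proj(x)                                        # (b, n, dim)
        k = torch.einsum("kn,bnd->bkd", self.E, self.k_proj(x))   # (b, k, dim)
        v = torch.einsum("kn,bnd->bkd", self.F, self.v_proj(x))   # (b, k, dim)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                                           # (b, n, dim)

x = torch.randn(2, 512, 256)
out = LinformerStyleAttention(dim=256, seq_len=512)(x)
print(out.shape)  # torch.Size([2, 512, 256])
```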
Google introduced Smart Scrolling, a new ML-based feature in its Recorder app that automatically marks important sections in the transcript.
Google is extending the capability of BERT to a new domain: patent search. This BERT model is trained exclusively on patent text.
Build a serverless question-answering API using the Serverless Framework, AWS Lambda, AWS EFS, efsync, Terraform, and the transformers library from Hugging Face.
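At its core, the Lambda function behind such an API can be quite small. The sketch below is an illustrative handler, not the article's exact code; it assumes the transformers dependencies and model weights live on the mounted EFS volume, and "distilbert-base-cased-distilled-squad" is just one reasonable checkpoint choice.

```python
import json
from transformers import pipeline

# Load once at cold start; the model files can sit on the mounted EFS volume.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def handler(event, context):
    """AWS Lambda entry point: expects a JSON body with 'question' and 'context'."""
    body = json.loads(event["body"])
    answer = qa(question=body["question"], context=body["context"])
    return {"statusCode": 200, "body": json.dumps(answer)}
```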
Amazon introduced BORT, an optimal subset of the popular BERT architecture extracted via neural architecture search.
The most popular model family in NLP town. If you haven't met it yet and have still somehow stumbled across this article, let me have the honor of introducing you to BERT, the powerful NLP beast.
Check out this guide to choosing and benchmarking BERT models for question answering.
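As a taste of what such a benchmark looks like, here is a minimal sketch that times a couple of question-answering checkpoints with the transformers pipeline; the model names are illustrative picks, and a real benchmark would average over many questions and also track accuracy on a labelled set.

```python
import time
from transformers import pipeline

context = "BERT was introduced by researchers at Google in 2018."
question = "Who introduced BERT?"

# Illustrative checkpoints; swap in whichever QA models you want to compare.
for name in ["distilbert-base-cased-distilled-squad",
             "deepset/roberta-base-squad2"]:
    qa = pipeline("question-answering", model=name)
    start = time.perf_counter()
    result = qa(question=question, context=context)
    elapsed = time.perf_counter() - start
    print(f"{name}: '{result['answer']}' "
          f"(score {result['score']:.3f}) in {elapsed:.2f}s")
```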
In this article, I’ve covered a brief overview of Transformer-based models, the limitations of those models, what BigBird is, and potential applications of BigBird.
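BigBird's headline feature is block-sparse attention, which lets it handle sequences of a few thousand tokens instead of BERT's 512. A minimal loading sketch through transformers is shown below; it assumes the public "google/bigbird-roberta-base" checkpoint and a recent transformers release.

```python
from transformers import AutoTokenizer, BigBirdModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
# block_sparse attention is what allows sequences far longer than BERT's 512.
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base",
                                     attention_type="block_sparse")

long_text = " ".join(["BigBird handles much longer documents than BERT."] * 200)
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=2048)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```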
In this article, we are going to discuss next-sentence prediction, especially when the prediction has to happen on a mobile device, using one of the MobileBERT implementations, MobileBertForNextSentencePrediction.
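For a sense of what the API looks like, here is a minimal sketch using the transformers class of the same name; the "google/mobilebert-uncased" checkpoint is an assumption on my part, though it is the commonly used public MobileBERT release.

```python
import torch
from transformers import AutoTokenizer, MobileBertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertForNextSentencePrediction.from_pretrained(
    "google/mobilebert-uncased")

first = "The weather turned cold overnight."
second = "Snow was falling by morning."
inputs = tokenizer(first, second, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
# Index 0 means "second sentence follows the first"; index 1 means it is random.
print("is next sentence" if logits.argmax().item() == 0 else "not next sentence")
```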
ELECTRA: Pre-Training Text Encoders as Discriminators rather than Generators. What is the difference between ELECTRA and BERT?
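The short answer: BERT learns by reconstructing masked-out tokens, while ELECTRA trains a discriminator to spot which tokens in a sentence have been replaced. The sketch below shows the pre-trained discriminator doing exactly that, assuming the public "google/electra-small-discriminator" checkpoint.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

tokenizer = ElectraTokenizerFast.from_pretrained(
    "google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained(
    "google/electra-small-discriminator")

# "flew" stands in for a generator-produced replacement of a plausible token.
sentence = "The chef flew the meal in the kitchen"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predictions = (torch.sigmoid(logits) > 0.5).long().squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
print(list(zip(tokens, predictions)))  # 1 marks a token judged "replaced"
```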
Step-by-Step BERT Explanation & Implementation, Part 2: Data Formatting & Loading. This is the second installment of the BERT Explanation & Implementation series.
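The heart of this step is turning raw text into padded token IDs and attention masks and wrapping them in a DataLoader. A minimal sketch of that formatting (with made-up example sentences, not the series' dataset) looks like this:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

texts = ["I loved this movie.", "The plot was a complete mess."]  # toy data
labels = [1, 0]

# Tokenize to fixed-length input IDs plus attention masks.
enc = tokenizer(texts, padding="max_length", truncation=True,
                max_length=64, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                        torch.tensor(labels))
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for input_ids, attention_mask, batch_labels in loader:
    print(input_ids.shape, attention_mask.shape, batch_labels)
```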
Zero-Shot Text Classification & Evaluation. In this post, we will see how to use zero-shot text classification with arbitrary labels and explain the model that powers it.
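A minimal example with the transformers zero-shot pipeline looks like the following; "facebook/bart-large-mnli" is the NLI backbone commonly used for this pipeline, though other entailment models work too.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The new phone's battery lasts two full days on a single charge.",
    candidate_labels=["technology", "politics", "sports"],  # any labels you like
)
print(result["labels"][0], round(result["scores"][0], 3))
```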
In this post, I will explain how to fine-tune DistilBERT for a multi-label text classification task. I have also made a GitHub repo containing the complete code, which is explained below.
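For orientation, the core of such a setup is a sequence-classification head trained with a sigmoid/BCE loss so each label is predicted independently. The sketch below is not the repo's code; it assumes a recent transformers version that supports the multi_label_classification problem type, and the four-label setup and example data are invented for illustration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=4,
    problem_type="multi_label_classification",  # switches to BCEWithLogitsLoss
)

texts = ["Great battery life, but the screen scratches far too easily."]
# Multi-hot float targets: an example can belong to several classes at once.
labels = torch.tensor([[1.0, 0.0, 1.0, 0.0]])

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**enc, labels=labels)
outputs.loss.backward()  # one illustrative backward pass
print(outputs.loss.item())
```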
Researchers at Google AI unveiled an extension of the projection attention neural network PRADO, known as pQRNN.
Both GPT-3 and BERT are relatively new to the industry, but their state-of-the-art performance has made them the models to beat across NLP tasks.
MachineHack successfully conducted the latest installment of its weekend hackathon series this Monday. Product Sentiment Classification: Weekend Hackathon #19 gave contestants the opportunity to develop a machine learning model that accurately classifies products into four sentiment classes based on the raw text review provided by the user.
Just when we thought that all name variations of BERT were taken (RoBERTa, ALBERT, FlauBERT, ColBERT, CamemBERT etc.), along comes AMBERT, another incremental iteration on the Transformer Muppet that has taken over natural language understanding. AMBERT was published on August 27 by ByteDance, the developer of TikTok and Toutiao.
For the life of me, I couldn’t understand how BERT or GPT-2 worked. I read articles; followed diagrams; squinted at equations; watched recorded classes; read code documentation; and still struggled to make sense of it all. It wasn’t the math that made it hard.