This paper (“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”) was published at ACL 2019, one of the top NLP conferences, by researchers from Carnegie Mellon University and Google Brain. It proposes Transformer-XL, a new architecture that enables learning dependency beyond a fixed-length context without disrupting temporal coherence. Its key innovations are a segment-level recurrence mechanism and a novel relative positional encoding scheme. Unlike the vanilla Transformer, it can capture longer-term dependency and resolve the context fragmentation problem, the two main limitations of training with fixed-length segments. The experiments show that Transformer-XL learns dependency that is about 80% longer than RNNs and 450% longer than vanilla Transformers, and it achieves state-of-the-art results on large language modeling benchmarks such as enwik8, text8, WikiText-103, One Billion Word, and Penn Treebank.

Paper link: https://www.aclweb.org/anthology/P19-1285.pdf
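
To make the segment-level recurrence idea concrete, below is a minimal, simplified sketch in PyTorch. It is not the authors' implementation: it uses a single layer, standard nn.MultiheadAttention with absolute positions instead of the paper's relative positional encoding, and omits the causal attention mask; the class name RecurrentSegmentLayer and all shapes are illustrative assumptions. What it does show is the core mechanism: hidden states cached from the previous segment are detached from the graph and prepended to the current segment's attention context.

```python
import torch
import torch.nn as nn

class RecurrentSegmentLayer(nn.Module):
    """One self-attention layer that can also attend to a cached memory of the
    previous segment. Simplified: absolute positions via nn.MultiheadAttention
    (the paper uses relative positional encodings) and no causal mask."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                nn.Linear(d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, memory=None):
        # x: (batch, seg_len, d_model); memory: hidden states cached from the
        # previous segment. Gradients do not flow into the cache (detach),
        # which keeps the training cost per segment bounded.
        context = x if memory is None else torch.cat([memory.detach(), x], dim=1)
        h = self.norm1(x + self.attn(x, context, context, need_weights=False)[0])
        h = self.norm2(h + self.ff(h))
        # The new hidden states double as the memory for the next segment.
        return h, h

# Usage: consume a long sequence segment by segment, carrying memory forward,
# so information can propagate beyond a single fixed-length segment.
layer = RecurrentSegmentLayer()
segments = torch.randn(3, 2, 16, 64)   # 3 consecutive segments, batch 2, length 16
memory = None
for seg in segments:
    out, memory = layer(seg, memory)
```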

1. Background

Language modeling is an important topic in natural language processing, and it underlies many unsupervised pre-training methods such as ELMo and BERT. However, modeling long-term dependency remains a challenge. Recurrent neural networks (RNNs), in particular Long Short-Term Memory (LSTM) networks, have been the standard solution to this problem. The gating mechanism in LSTMs, together with gradient clipping, improves the ability to model long-term dependency, but it is not sufficient to fully address the challenge: RNNs remain difficult to optimize for long-range context because of vanishing and exploding gradients.
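
For readers unfamiliar with the gradient clipping technique mentioned above, here is a minimal illustrative sketch of a single training step in PyTorch; the tiny LSTM model, the dummy loss, and the max_norm value of 0.25 are arbitrary placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn

# Tiny illustrative model and data; only the clipping call matters here.
model = nn.LSTM(input_size=8, hidden_size=8, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 20, 8)          # (batch, time, features)
out, _ = model(x)
loss = out.pow(2).mean()           # dummy objective for illustration

loss.backward()
# Rescale gradients whose global norm exceeds the threshold to mitigate explosion.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.25)
optimizer.step()
```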
