The transformer architecture has produced a revolution in the **NLP** field and in deep learning at large. A multitude of applications benefit from the capacity of these models to process sequences in parallel while building a deeper understanding of context through the attention mechanisms they implement. GPT-3, in particular, is currently a hot topic in the deep learning community.

Understanding how the transformer processes sequences can be challenging at first. When tackling a complex model, a useful strategy is to study how the model's computations change the shapes of the tensors that travel through it.
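As a minimal illustration of this shape-tracking approach, the sketch below traces a single self-attention head with NumPy. The sizes (`batch=2`, `seq_len=5`, `d_model=8`) and the random weights are assumptions for demonstration only, not values from the infographic:

```python
import numpy as np

# Assumed illustrative sizes: batch of 2 sequences, 5 tokens each,
# embedding dimension 8.
batch, seq_len, d_model = 2, 5, 8

x = np.random.randn(batch, seq_len, d_model)  # input embeddings (2, 5, 8)

# Projection matrices (randomly initialized here purely for illustration;
# in a trained model these are learned parameters).
W_q = np.random.randn(d_model, d_model)
W_k = np.random.randn(d_model, d_model)
W_v = np.random.randn(d_model, d_model)

Q = x @ W_q  # queries: (2, 5, 8)
K = x @ W_k  # keys:    (2, 5, 8)
V = x @ W_v  # values:  (2, 5, 8)

# Scaled dot-product attention: each token attends to every token.
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_model)  # (2, 5, 5)

# Numerically stable softmax over the last axis.
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)             # rows sum to 1

out = weights @ V  # attention output: back to (2, 5, 8)

print(x.shape, scores.shape, out.shape)
```

Note how the attention scores briefly expand to a `(batch, seq_len, seq_len)` tensor — one weight per token pair — before the output returns to the input's shape, which is exactly the kind of transition the infographic makes visible.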

To that end, I created the X-Ray Transformer infographic, which lets you follow the transformer's computations from beginning to end in both the training and inference phases. Its objective is to provide a quick yet deep understanding of the model's inner computations through the analysis and exploration of a single visual asset.

A link to download a higher-resolution version of the full infographic is available at the end of this article.


X-Ray Transformer Infographic