Thinking Beyond Transformers: Google Introduces Performers

To overcome the limitations of sparse transformers, Google introduced Performers, a Transformer architecture whose attention mechanism scales linearly, enabling faster training while allowing the model to process longer sequences.

The key component of the Transformer architecture is the attention module. Its job is to identify matching pairs in a sequence (think: translation) through similarity scores. As the length of a sequence increases, calculating similarity scores for all pairs becomes inefficient. So researchers came up with sparse attention techniques, which compute scores for only a few pairs and cut down time and memory requirements.
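To see where the inefficiency comes from, here is a minimal sketch of regular (dense) softmax attention in NumPy. The names and shapes are illustrative, not from the Performer codebase: the `scores` matrix has one entry per pair of positions, so its size grows quadratically with sequence length.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    """Regular scaled dot-product attention.

    Builds an n x n similarity matrix, so time and memory
    grow quadratically with the sequence length n.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # shape (n, n): the quadratic bottleneck
    return softmax(scores, axis=-1) @ V

# Toy example: sequence of 8 tokens, 4-dimensional heads.
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = standard_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Sparse attention shrinks the cost by filling in only a subset of the `scores` entries; Performers instead avoid materialising that matrix at all.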

According to Google researchers, sparse attention methods still suffer from a number of limitations:

  • They require efficient sparse-matrix multiplication operations, which are not available on all accelerators.
  • They do not provide rigorous theoretical guarantees for their representation power.
  • They are optimised primarily for Transformer models and generative pre-training.
  • They are difficult to use with other pre-trained models, as they usually stack more attention layers to compensate for sparse representations, thus requiring retraining and significant energy consumption.
  • They are not sufficient to address the full range of problems to which regular attention methods are applied, such as Pointer Networks.

Along with these limitations, some operations cannot be sparsified at all, such as the commonly used softmax operation, which normalises similarity scores in the attention mechanism and is used heavily in industry-scale recommender systems.
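Performers sidestep this by approximating the softmax kernel with random feature maps, so attention can be computed linearly in sequence length. The sketch below illustrates the underlying kernel trick under simplifying assumptions; it is not Google's FAVOR+ implementation, and `favor_features` and `linear_attention` are illustrative names. It relies on the identity exp(q·k) = E[exp(ω·q − |q|²/2) · exp(ω·k − |k|²/2)] for Gaussian ω, which lets the attention matrix be factored and never materialised.

```python
import numpy as np

def favor_features(X, omega):
    """Positive random features approximating the softmax kernel.

    phi(q) . phi(k) approximates exp(q . k) in expectation over
    Gaussian projections omega (simplified version of FAVOR+).
    """
    m = omega.shape[0]
    proj = X @ omega.T                                  # (n, m)
    norm = np.sum(X ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(proj - norm) / np.sqrt(m)             # non-negative features

def linear_attention(Q, K, V, m=256, seed=0):
    """Attention via the factorisation phi(Q) @ (phi(K)^T @ V).

    Cost is O(n * m * d) -- linear in sequence length n -- because
    the n x n similarity matrix is never formed.
    """
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(m, d))
    # Splitting the 1/sqrt(d) scaling between Q and K reproduces
    # the scaled dot-product q.k / sqrt(d) inside the kernel.
    Qf = favor_features(Q / d ** 0.25, omega)           # (n, m)
    Kf = favor_features(K / d ** 0.25, omega)           # (n, m)
    KV = Kf.T @ V                                       # (m, d), independent of n^2
    Z = Kf.sum(axis=0)                                  # (m,) normaliser
    return (Qf @ KV) / (Qf @ Z)[:, None]

n, d = 8, 4
rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Because the random features are non-negative, the implied attention weights stay positive and the normaliser is well behaved; increasing `m` tightens the approximation to regular softmax attention.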
