The key component of the transformer architecture is the attention module. Its job is to identify matching pairs of tokens in a sequence (think of word alignment in translation) through similarity scores. As the length of a sequence increases, computing similarity scores for all pairs becomes inefficient, since the cost grows quadratically with length. To address this, researchers have come up with sparse attention techniques, which compute scores for only a subset of pairs and cut down time and memory requirements.
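The contrast between dense and sparse attention can be sketched in a few lines of NumPy. This is an illustrative toy, not the method from the paper: it uses a simple local-window pattern (each position only scores its `w` nearest neighbours) as one example of a sparsity pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 8, 4                      # sequence length, head dimension
Q = rng.standard_normal((L, d))  # queries
K = rng.standard_normal((L, d))  # keys

# Dense attention: similarity scores for all L*L pairs -> O(L^2) cost.
dense_scores = Q @ K.T / np.sqrt(d)

# Sparse (local-window) attention: each position scores only its
# w nearest neighbours -> O(L*w) entries instead of L^2.
w = 2
mask = np.abs(np.arange(L)[:, None] - np.arange(L)[None, :]) <= w
sparse_scores = np.where(mask, dense_scores, -np.inf)

print(mask.sum(), "of", L * L, "pairs scored")
```

With `L = 8` and `w = 2`, only 34 of the 64 pairs are scored; the saving grows as the sequence gets longer.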
According to Google researchers, sparse attention methods still suffer from a number of limitations:
Along with these limitations, some operations cannot be sparsified at all, such as the widely used softmax operation, which normalises the similarity scores in the attention mechanism and is also used heavily in industry-scale recommender systems.
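To see why softmax resists sparsification, note that it normalises each row of scores over *all* entries in that row, so every score contributes to every weight. A minimal NumPy sketch of the operation:

```python
import numpy as np

def softmax(scores):
    # Subtract the row max before exponentiating, for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    # Each weight depends on the sum over the whole row,
    # which is what couples every entry to every other.
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1]])
weights = softmax(scores)
print(weights)  # each row sums to 1
```

Because the denominator sums over the entire row, dropping entries changes every remaining weight, which is why the normalisation step cannot simply be skipped for unscored pairs.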