SelfExtend: Boost Python Mistral Model with Group Attention

SelfExtend Attention for Mistral

Implementation of the Self-Extend paper that uses group attention to extend context windows of LLMs without fine-tuning/pre-training.


SelfExtend modifies the standard attention mechanism in the Mistral model so that it can attend over sequences longer than its pretraining context window. Nearby tokens keep their normal attention, while distant tokens are handled with grouped attention that maps their positions back into the range seen during training. This is particularly useful for tasks involving long input sequences.


  • Compatibility: Designed to work with the Hugging Face Transformers library.
  • Extended Context: Currently extends Mistral 7B's 8k context window to 16k.
  • Grouped Attention: Utilizes a novel attention mechanism that groups tokens to mitigate the positional out-of-distribution (OOD) issue.
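The grouping idea can be illustrated with a small, dependency-free sketch. It follows the position mapping described in the Self-Extend paper (normal relative positions inside a neighbor window, floor-divided positions plus a shift outside it), not necessarily the exact code in this repository. The function names and the parameters group_size and neighbor_window are hypothetical and chosen only for illustration.

```python
def self_extend_rel_pos(q_pos, k_pos, group_size, neighbor_window):
    """Relative position for one (query, key) pair under a SelfExtend-style scheme.

    Hypothetical helper for illustration; not taken from the repository.
    """
    distance = q_pos - k_pos
    if distance < 0:
        raise ValueError("causal attention: the key must not come after the query")
    if distance < neighbor_window:
        # Close neighbors keep their exact relative position (normal attention).
        return distance
    # Distant tokens share positions in groups of `group_size`. The shift makes
    # the grouped positions continue where the neighbor window ends.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift


def max_rel_pos(seq_len, group_size, neighbor_window):
    """Largest relative position the last token sees over the whole sequence."""
    last = seq_len - 1
    return max(
        self_extend_rel_pos(last, k, group_size, neighbor_window)
        for k in range(seq_len)
    )


if __name__ == "__main__":
    # With plain attention, a 16-token sequence needs relative positions up to 15.
    # With group_size=2 and neighbor_window=4, the largest one needed is 9.
    print(max_rel_pos(16, group_size=2, neighbor_window=4))  # prints 9
```

Because the largest relative position grows roughly with sequence length divided by the group size, a longer input can stay within the range of positions the model saw during pretraining, which is how the approach avoids unseen (out-of-distribution) positions.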


To use this implementation, the following prerequisites must be met:

  • Python 3.10
  • PyTorch
  • Transformers Library
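A quick way to check these prerequisites before patching anything is sketched below. It assumes Python 3.10 is a minimum version rather than an exact requirement, and that PyTorch and Transformers are importable as torch and transformers. The helper name missing_prerequisites is hypothetical.

```python
import sys
from importlib.util import find_spec


def missing_prerequisites(min_python=(3, 10), packages=("torch", "transformers")):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: treats `min_python` as a lower bound (an assumption).
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ expected, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if find_spec(name) is None:
            problems.append(f"package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty result only means the packages can be located; it does not verify that the installed Transformers version is compatible with the copied modeling files.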


Clone the repository to your local machine and copy the modeling files into the transformers/src/transformers/models/mistral directory of your Transformers source tree.

When loading the pretrained weights, specify the self_extend attention implementation as follows:

from transformers import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("hf_mistral-7B-v0.1", attn_implementation="self_extend")

Download Details:

Author: sdan
Source Code: 
License: Apache-2.0 license
