Understanding the Context Window Attention Gradient in LLMs
Have you ever noticed that AI language models sometimes overlook parts of your input? Let’s dive into why that happens.
In OpenAI models such as GPT-4 (the model behind ChatGPT), instructions at the beginning and end of the context window receive the highest attention weights. The attached image illustrates this attention gradient for a dialogue with a lengthy instruction.
Even though modern Large Language Models (LLMs) can process context windows of tens of thousands of tokens, they can still partially or completely ignore certain sections of the input. This stems from how the models were trained and from the varying importance they assign to information depending on its position in the context. Attention gradients can also differ between models trained with different methods.
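One way to see this effect for yourself is a simple "needle in the middle" probe: hide a short fact at the start, middle, or end of a long block of filler text and check whether the model still uses it when answering. The sketch below is only illustrative, not a rigorous benchmark; it assumes the openai Python SDK, a placeholder model name (gpt-4o), and made-up filler text and codeword.

```python
# Toy "lost in the middle" probe (illustrative sketch, not a rigorous benchmark).
# Assumes the openai Python SDK is installed and OPENAI_API_KEY is set;
# the model name, filler text, and codeword are placeholders.
from openai import OpenAI

client = OpenAI()

NEEDLE = "The project codeword is BLUE-FALCON."
FILLER = "This paragraph is unrelated background material. " * 200  # long distractor text

def build_prompt(position: str) -> str:
    """Place the needle at the start, middle, or end of the filler text."""
    if position == "start":
        return NEEDLE + "\n\n" + FILLER
    if position == "end":
        return FILLER + "\n\n" + NEEDLE
    half = len(FILLER) // 2
    return FILLER[:half] + "\n\n" + NEEDLE + "\n\n" + FILLER[half:]

for position in ("start", "middle", "end"):
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat model with a long context window
        messages=[
            {"role": "user", "content": build_prompt(position)},
            {"role": "user", "content": "What is the project codeword?"},
        ],
    )
    answer = response.choices[0].message.content or ""
    print(f"{position:>6}: {'found' if 'BLUE-FALCON' in answer else 'missed'}")
```

In practice, facts placed at the start or end of the prompt tend to be recalled more reliably than facts buried in the middle, which is exactly the gradient described above.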
As the number of messages in a conversation or the length of instructions increases, the “gray zone” in the attention gradient expands. This can lead to LLMs becoming forgetful, even after just a few messages.
In my next post, I’ll share how to effectively craft your prompts to get the best results from LLMs.