While hands-on experience is paramount for mastering Large Language Models, a basic understanding of their internal mechanisms can have a big impact on how effectively you use them.
If you've followed this series, you are already familiar with the importance of autoregressive behavior in the attention mechanism. Today's discussion will focus on tokenization and context size: two additional implementation aspects that will show why shorter prompts are often better prompts.
1. Tokenization
In order for text to be processed by a Large Language Model, it has to be turned into numbers. This process is called tokenization: the text is broken up into smaller units known as tokens. As we will see, the number of tokens in your prompt can have big consequences.
Common words often map to a single token, while longer or rarer words are split into several, so on average a token works out to roughly 3/4 of a word. To get the exact count of tokens in your prompt, you can use OpenAI's tokenizer.
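If you prefer to count tokens programmatically, here is a minimal sketch using OpenAI's open-source tiktoken library. The example sentence and the choice of the gpt-3.5-turbo encoding are just illustrative.

```python
# pip install tiktoken  (OpenAI's open-source tokenizer library)
import tiktoken

# Load the encoding used by gpt-3.5-turbo (illustrative model choice)
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Shorter prompts are often better prompts."
tokens = encoding.encode(prompt)

print(tokens)                                            # list of integer token IDs
print(len(tokens), "tokens for", len(prompt.split()), "words")
```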
2. Context Window Size
Due to memory constraints, computational costs and the use of "positional embeddings" (necessary to keep word ordering), Large Language Models have a fixed context window size: they can only attend to a limited number of tokens when predicting the next one.
Because the model looks at input tokens as well as previously emitted output tokens, both the prompt and its answer have to fit within the context size. ChatGPT 3.5 has a context size of 4k tokens, while ChatGPT 4 comes in 8k and 32k variants.
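To make this budget concrete, the sketch below checks whether a prompt leaves enough room for the model's reply. The 4,096-token window and the 500 tokens reserved for the answer are assumptions chosen for illustration, not fixed values.

```python
import tiktoken

CONTEXT_WINDOW = 4096      # assumed context size, e.g. a 4k model
RESERVED_FOR_ANSWER = 500  # assumed number of tokens to leave free for the reply

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fits_in_context(prompt: str) -> bool:
    """Check whether a prompt leaves enough room for the model's answer."""
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + RESERVED_FOR_ANSWER <= CONTEXT_WINDOW

print(fits_in_context("Summarize the following meeting notes in three bullet points."))
```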
Better Conversations with Fewer Tokens
Each message in a conversation adds to the context size. Exceed the limit, and ChatGPT starts forgetting earlier parts. While it can be tempting to add a lot of data to your prompts, conveying the important information concisely is often a better technique. It not only allows for longer conversations, but it tends to improve the quality of the answers as well.
Here are a few techniques to reduce the size of your prompts:
Pre-summarize data before adding it to your prompt
Request answers in bullet-point form
Condense information and start new conversations regularly, carrying over only the key details (see the sketch after this list)
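The sketch below shows one way to keep a running conversation under a token budget by dropping the oldest messages first, while always preserving an initial message that holds the key details. The budget value and the role/content message format are assumptions for illustration; a real implementation might summarize dropped messages rather than discard them.

```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
TOKEN_BUDGET = 3000  # assumed budget for the conversation history

def count_tokens(message: dict) -> int:
    """Rough token count for a single {'role': ..., 'content': ...} message."""
    return len(encoding.encode(message["content"]))

def trim_history(messages: list) -> list:
    """Drop the oldest messages until the history fits the token budget,
    always keeping the first message, which carries the key details."""
    system, rest = messages[0], messages[1:]
    while rest and count_tokens(system) + sum(count_tokens(m) for m in rest) > TOKEN_BUDGET:
        rest.pop(0)  # forget the oldest exchange first
    return [system] + rest
```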
Understanding how tokens and context size work is key to holding conversations that make full use of an LLM's context window.