Prompt Engineering Course
Tokens and Context Windows
Every interaction with a Large Language Model is constrained by two invisible limits:
tokens and the context window.
Understanding these limits is essential for writing reliable, scalable prompts.
Tokenization as the Model’s Input Language
LLMs do not operate on characters or words.
They operate on tokens — subword units that a tokenizer maps to integer IDs.
A single word may be:
- one token
- multiple tokens
- split differently depending on context
This means visible text length is not the same as computational length.
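A toy greedy tokenizer makes this concrete. The vocabulary below is invented for illustration only; real tokenizers (BPE, WordPiece) learn their vocabularies from data and split text differently:

```python
# Toy greedy longest-match tokenizer. VOCAB is hand-picked for this example;
# it is NOT any real model's vocabulary.
VOCAB = {"Chat", "G", "PT", " is", " power", "ful"}

def tokenize(text, vocab=VOCAB):
    """Split text by repeatedly taking the longest matching vocab piece."""
    max_len = max(len(piece) for piece in vocab)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

print(tokenize("ChatGPT is powerful"))
# -> ['Chat', 'G', 'PT', ' is', ' power', 'ful']
```

Three visible words become six tokens — the computational length diverges from what the eye sees.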
Why Tokens Are Not Predictable by Intuition
Human intuition fails when estimating token count.
For example:
"ChatGPT is powerful"
This short sentence already expands into multiple tokens internally.
A prompt's token count can grow far faster than its visible length suggests.
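Since intuition is unreliable, practitioners often fall back on a rough rule of thumb: for English text, roughly four characters per token. The estimator below is only a heuristic, not a real count — actual numbers depend entirely on the tokenizer:

```python
def estimate_tokens(text):
    # Rough heuristic for English text: ~4 characters per token.
    # Real counts vary by tokenizer and language -- always measure when it matters.
    return max(1, round(len(text) / 4))

print(estimate_tokens("ChatGPT is powerful"))  # 19 chars -> 5
```

Use an estimate like this only for ballpark budgeting; for hard limits, count with the model's own tokenizer.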
The Context Window as Working Memory
The context window defines how many tokens the model can consider at once.
It includes:
- system instructions
- user prompts
- conversation history
- generated responses
Once this limit is exceeded, the oldest tokens are typically dropped, so earlier information simply disappears from the model's view.
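A minimal sketch of this budgeting, assuming a whitespace stand-in for token counting (real systems use the model's tokenizer) and a keep-newest trimming policy (actual truncation strategies vary by system):

```python
def count_tokens(text):
    # Stand-in token counter: whitespace split, for illustration only.
    return len(text.split())

def fit_to_window(system_prompt, history, budget):
    """Keep the system prompt, then as many recent messages as fit the budget."""
    used = count_tokens(system_prompt)
    kept = []
    for message in reversed(history):   # walk newest-first
        cost = count_tokens(message)
        if used + cost > budget:
            break                       # everything older is dropped
        kept.append(message)
        used += cost
    return [system_prompt] + list(reversed(kept))

history = ["first question", "first answer with extra detail", "follow up"]
print(fit_to_window("You are concise.", history, budget=9))
# -> ['You are concise.', 'follow up']
```

Note what survived: the system prompt and the newest message — the older exchange fell outside the budget.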
Why Truncation Breaks Prompt Behavior
When important instructions fall outside the context window:
- the model ignores them
- output quality degrades
- behavior becomes inconsistent
This is not a bug; it is a hard limit of the model's architecture.
Prompt Length vs Attention Allocation
Even inside the context window, attention is not evenly distributed.
Models tend to weight the beginning (and often the end) of the context most reliably.
Instructions buried in the middle receive the least attention.
This creates a practical rule:
Important instructions must appear early and clearly.
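As a hypothetical illustration (the task and wording here are invented), the same constraint can be moved from the tail of a prompt to the front, where it is least likely to be diluted:

```python
# Hypothetical prompts: same task, same constraint, different placement.
# The "..." stands for an elided email body and is left as-is on purpose.
before = (
    "Here is a customer email: ...\n"
    "Summarize it. Also, never include personal data in the summary."
)
after = (
    "Never include personal data in the summary.\n"
    "Summarize the following customer email: ..."
)
print(after.splitlines()[0])  # the hard constraint now leads the prompt
```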
Design Trade-offs Introduced by Context Limits
Prompt engineers constantly balance:
- detail vs brevity
- examples vs instructions
- history vs freshness
There is no “best” length — only optimal design for a goal.
Why Long Prompts Often Fail
Long prompts fail not because models are weak, but because humans overload them.
Common failure patterns include:
- buried constraints
- conflicting examples
- redundant instructions
Good prompting is subtractive, not additive.
Conceptual Processing Flow
Internally, the model workflow resembles:
Text → Tokenization → Context Window → Attention → Output Tokens
Every prompt decision influences this pipeline.
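The pipeline above can be sketched with stand-ins (whitespace "tokenization" and a fixed window; none of this is how a real model works internally):

```python
def run_pipeline(text, window=6):
    """Toy walk through the stages: text -> tokens -> context window."""
    tokens = text.split()           # Tokenization (whitespace stand-in)
    context = tokens[-window:]      # Context window: oldest tokens fall out
    # Attention and output generation happen inside the model; here we just
    # report what the model could still "see" at this point.
    return context

print(run_pipeline("one two three four five six seven eight"))
# -> ['three', 'four', 'five', 'six', 'seven', 'eight']
```

Even this crude sketch shows the key point: anything that falls out of the window never reaches the attention stage at all.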
How You Should Practice This Concept
Do not memorize token counts.
Instead, practice by:
- shortening prompts without losing intent
- moving instructions earlier
- removing redundant phrasing
This builds intuition that scales to real systems.
Practice
What unit do LLMs process internally?
What limits how much information a model can consider?
Where should critical instructions be placed?
Quick Quiz
When prompts exceed limits, what happens?
Token count is based on:
Recap: Tokens and context windows define the hard boundaries of prompt design.
Next: Prompt types — zero-shot, one-shot, and few-shot prompting.