Prompt Engineering Course
Memory Prompting
Memory prompting is the technique of designing prompts and systems that allow a language model to retain, recall, and reuse information across multiple interactions.
Without memory, every prompt starts from zero.
With memory, systems become personalized, contextual, and long-running.
Why Memory Matters in Real Systems
Real applications are not single-turn conversations.
They require the system to remember:
- User preferences
- Past decisions
- Conversation context
- Task progress
Memory prompting is what turns a chatbot into a usable product.
Important Clarification
Language models do not have true long-term memory.
Memory is implemented by:
- Storing information externally
- Injecting it back into prompts
Prompt engineering controls how this injection happens.
Types of Memory
In practice, memory falls into three categories:
- Short-term memory – current conversation context
- Session memory – data remembered within a session
- Long-term memory – stored across sessions
Each type requires a different prompt strategy.
Short-Term Memory via Context
Short-term memory is achieved by passing previous messages in the prompt.
messages = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Nice to meet you, Alex."},
    {"role": "user", "content": "What is my name?"},
]
The model answers correctly because the information exists in the current context window.
Limitations of Context-Based Memory
Context windows are finite.
As conversations grow:
- Older messages are dropped
- Important details are lost
- Costs increase
This is why external memory is necessary.
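Before reaching for external memory, a common mitigation is to trim the history to a budget, keeping only the most recent messages. A minimal sketch, using character counts as a stand-in for real token counts (a production system would count tokens with the model's tokenizer; `trim_history` is a hypothetical helper):

```python
def trim_history(messages, max_chars=500):
    """Keep the most recent messages that fit a rough size budget.

    Character length stands in for token count here; a real system
    would measure tokens with the model's tokenizer.
    """
    kept = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        size = len(msg["content"])
        if total + size > max_chars:
            break  # budget exhausted; drop everything older
        kept.append(msg)
        total += size
    return list(reversed(kept))  # restore chronological order
```

Dropping whole messages from the oldest end preserves the most recent turns intact, which is usually what the next reply depends on.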
Session Memory Pattern
Session memory stores key facts extracted from the conversation and reinserts them as structured context.
System:
User Profile:
- Name: Alex
- Preferred language: English
- Goal: Learn prompt engineering
This keeps prompts small while preserving important context.
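A profile block like the one above can be generated from a plain dictionary of stored facts. A sketch (the `render_profile` helper is hypothetical):

```python
def render_profile(facts):
    """Render stored facts as a structured block for the system prompt."""
    lines = ["User Profile:"]
    for key, value in facts.items():
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)

profile = render_profile({
    "Name": "Alex",
    "Preferred language": "English",
    "Goal": "Learn prompt engineering",
})
```

Generating the block from structured data keeps the format consistent across turns, so the model always sees memory presented the same way.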
How Memory Injection Works
The memory is injected:
- At the system level
- Before user messages
- In a structured format
This ensures the model treats memory as facts, not conversation.
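In message terms, that ordering might look like the following, assuming a chat-style API (`build_messages` is an illustrative helper, not a library function):

```python
def build_messages(memory_block, history, user_input):
    """Place memory at the system level, before the conversation."""
    return (
        [{"role": "system", "content": memory_block}]  # memory first, as facts
        + history                                      # prior conversation turns
        + [{"role": "user", "content": user_input}]    # the new user message
    )
```

Because the memory arrives in the system message rather than as a chat turn, the model treats it as standing facts instead of something the user just said.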
Long-Term Memory Using Storage
Long-term memory is stored outside the model:
- Databases
- Vector stores
- Files
Relevant memory is retrieved and injected dynamically.
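As a sketch, a JSON file can serve as the simplest backing store; a real system would more likely use a database or vector store. The `FileMemoryStore` class below is hypothetical:

```python
import json
from pathlib import Path


class FileMemoryStore:
    """Minimal long-term store backed by a JSON file.

    Illustrative only: a production system would typically use a
    database or vector store with proper concurrency handling.
    """

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        """Return stored facts, or an empty dict if nothing is saved yet."""
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def save(self, facts):
        """Persist the full set of facts, replacing the previous file."""
        self.path.write_text(json.dumps(facts, indent=2))
```

The key property is that the facts outlive any single conversation: the next session can call `load()` and inject the result into its prompts.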
Example: Memory Retrieval Flow
Typical flow:
- User asks a question
- System retrieves relevant memories
- Memories are added to prompt
- Model responds using both memory and input
Memory-Aware Prompt Example
System:
You are a helpful assistant.
Use the user's stored preferences when responding.
Memory:
- User prefers concise explanations
- User is learning Prompt Engineering
User:
Explain memory prompting.
The response adapts automatically to stored preferences.
What Happens Inside the Model
The model:
- Reads memory as ground truth
- Combines it with user input
- Generates context-aware output
It does not know the memory source — only its content.
Common Mistakes
Frequent issues include:
- Injecting too much memory
- Using unstructured text
- Failing to update memory
Bad memory design leads to confusion and drift.
Best Practices
Effective memory prompting:
- Stores only relevant facts
- Uses structured formats
- Updates memory intentionally
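Intentional updates can be as simple as overwriting values and explicitly dropping stale keys, rather than appending raw conversation text. A sketch (`update_memory` is a hypothetical helper):

```python
def update_memory(facts, new_facts, remove=()):
    """Apply deliberate updates: newer values overwrite older ones,
    and stale keys are removed explicitly."""
    updated = {k: v for k, v in facts.items() if k not in set(remove)}
    updated.update(new_facts)
    return updated
```

Returning a new dict rather than mutating in place makes it easy to log or diff memory changes, which helps diagnose the drift mentioned above.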
Real-World Applications
Memory prompting powers:
- Personalized assistants
- Long-running agents
- User onboarding flows
- Adaptive learning platforms
Practice
What enables short-term memory in LLMs?
Where is long-term memory stored?
What must be done with memory before the model can use it?
Quick Quiz
True or false: LLMs store long-term memory internally.
Memory is best injected as:
Which is a valid use of memory?
Recap: Memory prompting enables continuity, personalization, and long-running interactions.
Next up: Multimodal prompting — working across text, images, audio, and more.