Context Engineering: Sessions and Memory
Your agent aced the demo. Then a real user asked: “What did we discuss yesterday?” — and it drew a complete blank. That’s not a bug. That’s a missing discipline.
📑 In This Article:
- The Problem
- The Shift: Prompt Engineering → Context Engineering
- Part 1: Sessions — Short-Term Memory
- Part 2: Memory Types — Long-Term Knowledge
- Part 3: Managing the Context Window
- Part 4: Multi-Agent Context Sharing
- Part 5: Production Best Practices
- The Context Engineering Checklist
- Industry Applications
- Key Takeaways
- References
The Problem
Your agent works great in testing. Single-turn queries? Perfect answers.
Then users have conversations:
- “What did we discuss yesterday?”
- “Update the recommendations based on what I told you earlier.”
- “Remember my preferences for next time.”
Your agent draws a blank. Every conversation starts from zero.
| The Failure Mode | Root Cause |
|---|---|
| 🧠 Mid-Conversation Amnesia | No session management |
| 📅 No Cross-Session Memory | No persistent storage |
| 🔀 Context Overflow | Conversation exceeds token limit |
| 🎭 Lost Personalization | User preferences not retained |
My Take: This Is Engineering Now
Here’s what I’ve come to realize: Context Engineering isn’t about prompting anymore.
When I first started building agents, I thought the skill was in crafting the perfect prompt—choosing the right words, the right tone, the right examples. That’s prompt engineering. It’s a craft.
But as agents grew more complex, I found myself managing an entirely different set of problems:
- Context window limits: How much can I fit? What gets cut?
- Token usage: Every request costs money. Waste compounds at scale.
- Session state: What happened 10 turns ago? Where do I store it?
- Memory persistence: What should survive across conversations?
- Multi-agent context sharing: How do agents pass information to each other?
This isn’t wordsmithing. This is systems design. This is engineering.
The shift from “Prompt Engineering” to “Context Engineering” isn’t just a name change. It’s a recognition that building production-grade agents requires the same rigor we apply to building production-grade platforms: architecture, state management, cost optimization, and observability.
📊 The Reality Check:
| What the Industry Shows | Why It Matters | Source |
|---|---|---|
| 84% of AI project failures are attributed to leadership/architecture, not model quality (Forbes/RAND, 2025) | Most failures trace back to how information reaches the model, not the model itself — making context management the core architecture challenge. | RAND Corporation Report |
| Gartner: 60% of AI projects unsupported by AI-ready data will be abandoned through 2026 (Gartner, 2025) | “AI-ready data” largely means properly structured context. When the data reaching the model is disorganized, even strong models underperform. | 🔒 Gartner Research · Subscription required |
| AI hallucinations cost corporations >$67 billion annually in revenue and legal expenses (AI-TechPark, 2025) | Much of this cost stems from models generating answers without proper grounding. Well-engineered context dramatically reduces hallucination risk. | AI-TechPark Analysis |
The Shift: Prompt Engineering → Context Engineering
Key Insight: What information reaches the model matters more than how you phrase the prompt.
Prompt Engineering focuses on crafting the perfect instruction.
Context Engineering focuses on curating the optimal information for each moment:
- What does the model need to know right now?
- What should be loaded on-demand vs. pre-loaded?
- What should persist across conversations?
Think of it like a football manager’s memory.
During a match, Sir Alex Ferguson held the current game state in immediate memory: the score, who’s tired, who’s booked, what’s working. That’s session context.
But he also drew on decades of accumulated knowledge: this opponent’s weaknesses, how his players perform under pressure, tactical patterns that work in specific situations. That’s long-term memory.
A manager who forgets the current score is useless. A manager who can’t recall that this striker always drifts left is missing crucial context. You need both.
Part 1: Sessions — Short-Term Memory
What is a Session?
A session is the complete context for a single conversation:
- User messages
- Agent responses
- Tool calls and results
- Working state (e.g., items in a cart)
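As a sketch, the four ingredients above can be modeled as an ordered event log plus a working-state dict. The class and field names here are illustrative, not from any specific framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class SessionEvent:
    """One entry in the session log: a user message, agent reply, or tool result."""
    role: str  # "user", "agent", or "tool"
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Session:
    """The complete context for a single conversation."""
    session_id: str
    user_id: str
    events: list[SessionEvent] = field(default_factory=list)
    working_state: dict[str, Any] = field(default_factory=dict)  # e.g. cart items

    def append(self, role: str, content: str) -> None:
        self.events.append(SessionEvent(role, content))


# Usage: every turn appends to the event log; working state evolves alongside it.
s = Session(session_id="sess-1", user_id="user-42")
s.append("user", "Add running shoes to my cart")
s.working_state["cart"] = ["running shoes"]
```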
The Session Lifecycle
A session is created on the first user message, updated after every turn, paused when the user leaves, and eventually archived or expired per retention policy.
Production Session Requirements
| Requirement | Why It Matters |
|---|---|
| Strict Isolation | User A cannot see User B’s session |
| Persistence | Survive server restarts |
| Ordering | Events must be chronological |
| TTL Policy | Sessions expire after inactivity |
| PII Redaction | Remove sensitive data before storage |
📦 Case Study: The ProjectState
In Vibe Product Design, we define the “Context Boundary” explicitly using a TypedDict. This ensures every agent knows exactly what context is available—and what isn’t.
```python
# From studio/vibe-product-design/backend/app/graph/state.py
import operator
from typing import Annotated, Literal, TypedDict

from langchain_core.messages import BaseMessage


class ProjectState(TypedDict):
    """The Single Source of Truth for the session."""

    # 1. EPISODIC MEMORY: The chat history (accumulated)
    messages: Annotated[list[BaseMessage], operator.add]

    # 2. SEMANTIC MEMORY: The generated artifacts (BRD, ERD)
    artifacts: dict[str, str]

    # 3. SESSION STATE: Where are we in the workflow?
    current_step: Literal["STRATEGY", "REQUIREMENTS", "ARCHITECTURE"]

    # 4. HUMAN-IN-THE-LOOP: Explicit approval flag
    human_feedback: Literal["pending", "approved", "rejected"]
```
Why This Works:
- Type Safety: The agent can’t hallucinate a “user_emotion” field that doesn’t exist.
- Persistence: This entire dictionary is serialized to the database after every turn.
- Boundaries: If it’s not in `ProjectState`, the agent doesn’t know it.
Part 2: Memory Types — Long-Term Knowledge
Google’s research defines three types of long-term memory:
The Memory Taxonomy
| Memory Type | What It Stores | Example | Time Horizon |
|---|---|---|---|
| 🧠 Semantic | Facts, knowledge | “The user is a vegetarian” | Permanent |
| 📋 Procedural | How-to knowledge | “How to deploy to production” | Stable |
| 📔 Episodic | Past experiences | “Last week we debugged the login issue” | Decaying |
```mermaid
graph TB
    subgraph Types["Memory Types"]
        SEM["🧠 Semantic<br/>(Facts & Knowledge)"]
        PROC["📋 Procedural<br/>(How-To)"]
        EPIS["📔 Episodic<br/>(Past Events)"]
    end
    subgraph Examples["Examples"]
        S1["User preferences"]
        S2["Company policies"]
        P1["Coding standards"]
        P2["Deploy procedures"]
        E1["Past conversations"]
        E2["Previous decisions"]
    end
    SEM --> S1
    SEM --> S2
    PROC --> P1
    PROC --> P2
    EPIS --> E1
    EPIS --> E2
```
Semantic Memory (Facts)
What the agent knows about the world and the user.
| Source | Examples |
|---|---|
| User Profile | Name, role, preferences, timezone |
| Domain Knowledge | Product catalog, company policies |
| External Knowledge | Via RAG from documents |
Storage: User profiles, vector databases, knowledge graphs.
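As a toy illustration of semantic retrieval, the sketch below ranks stored facts by word overlap with the query. A production system would use embeddings in a vector database, not string matching:

```python
def retrieve_facts(query: str, facts: list[str], top_k: int = 2) -> list[str]:
    """Rank stored facts by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    # Sort facts by how many query words they share, highest first.
    scored = sorted(facts, key=lambda f: len(q & set(f.lower().split())), reverse=True)
    return scored[:top_k]


profile_facts = [
    "the user is a vegetarian",
    "the user lives in Berlin",
    "company policy forbids refunds after 30 days",
]
top = retrieve_facts("is the user a vegetarian", profile_facts)
```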
Procedural Memory (How-To)
What the agent knows how to do.
This maps directly to Skills (see Article 3):
- Coding standards
- Review procedures
- Deployment workflows
Storage: Skill files (.agent/skills/), runbooks, SOPs.
Episodic Memory (Past Events)
What the agent remembers from past interactions.
| Pattern | Implementation |
|---|---|
| Conversation Summaries | Compress old sessions into key points |
| Decision Logs | “On Jan 15, we chose option B because…” |
| Preference Learning | “User consistently prefers concise answers” |
Storage: Summarized session archives, decision logs.
Part 3: Managing the Context Window
The Context Budget
Every model has a finite context window. You must budget it deliberately: allocate tokens across the system prompt, conversation history, retrieved knowledge, and the model’s response.
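One concrete way to budget is to split the window into fixed allocations up front. The categories and percentages below are illustrative assumptions, not a standard:

```python
def context_budget(window: int) -> dict[str, int]:
    """Split a model's context window into fixed token allocations."""
    shares = {
        "system_prompt": 0.10,        # instructions and persona
        "history": 0.40,              # recent conversation turns
        "retrieved_knowledge": 0.30,  # RAG results, long-term memories
        "response_reserve": 0.20,     # room for the model's answer
    }
    return {name: int(window * share) for name, share in shares.items()}


# Every category must fit inside its allocation before the request is sent.
budget = context_budget(8192)
```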
🔍 The “More Context = Better” Fallacy
Here’s the blind spot most teams miss.
The intuition is simple: give the model more information, get better answers. The data says the opposite.
Research on the “Lost in the Middle” phenomenon (Galileo, 2024) demonstrates that LLMs systematically degrade when processing information in the middle of long contexts. They attend strongly to the beginning and end — but the critical facts buried at position 40% through a 100K context? Effectively invisible.
| Context Size | What Happens | The Risk |
|---|---|---|
| Under 4K tokens | Model attends to everything | ✅ Safe zone |
| 4K–32K tokens | Middle content starts degrading | ⚠️ Retrieval accuracy drops |
| 32K+ tokens | Severe “lost in the middle” effect | ❌ Critical facts get ignored |
The counterintuitive conclusion: Aggressively pruning context often produces better results than stuffing the window full. Budget your tokens like you budget your cloud spend — every token should justify its cost.
Context Overflow Strategies
When history exceeds your budget:
| Strategy | How It Works | Trade-off |
|---|---|---|
| Truncation | Keep last N messages | Loses early context |
| Summarization | LLM summarizes old messages | Loses detail, costs tokens |
| Sliding Window | Fixed window that moves | Simple, may miss key context |
| Semantic Selection | Keep most relevant messages | Complex, more accurate |
| Query-Aware Compression | Compress based on current task relevance | Best quality, requires planning |
💡 2025 Update: The Sentinel Framework (May 2025) introduces lightweight, query-aware context compression that outperforms simple summarization. Key insight: compress based on what the model needs now, not just recency.
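A sketch of the sliding-window strategy from the table above, with the system prompt pinned. Token counts are a rough whitespace estimate; production code would use the model’s tokenizer:

```python
def fit_window(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt plus the newest messages that fit the token budget."""
    def tokens(m: dict) -> int:
        return len(m["content"].split())  # crude whitespace token estimate

    system, history = messages[0], messages[1:]
    remaining = budget - tokens(system)
    kept = []
    for m in reversed(history):  # walk newest -> oldest
        cost = tokens(m)
        if cost > remaining:
            break                # older messages fall out of the window
        kept.append(m)
        remaining -= cost
    return [system] + kept[::-1]
```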
The Summarization Pattern
```mermaid
graph LR
    H["📜 History<br/>(10,000 tokens)"] --> S["🤖 Summarize"]
    S --> C["📝 Compressed<br/>(500 tokens)"]
    C --> N["➕ New Messages"]
    N --> CTX["📋 Context Window"]
```
When to Summarize:
- When history reaches 70% of context budget
- At conversation milestones (topic changes)
- Before archiving a session
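The 70% trigger can be wired up as a simple guard. Here the `summarize` callable stands in for an LLM summarization call, and the "keep the last two messages verbatim" choice is an illustrative assumption:

```python
SUMMARIZE_THRESHOLD = 0.70  # compress once history crosses 70% of the budget


def maybe_compress(history: list[str], budget: int, summarize) -> list[str]:
    """Replace all but the most recent messages with a single summary
    once token usage crosses the threshold; otherwise return history as-is."""
    used = sum(len(m.split()) for m in history)  # crude token estimate
    if used < SUMMARIZE_THRESHOLD * budget:
        return history
    old, recent = history[:-2], history[-2:]
    return [summarize(old)] + recent
```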
Part 4: Multi-Agent Context Sharing
In multi-agent systems, context becomes more complex.
Shared vs. Private Context
| Context Type | Who Sees It | Examples |
|---|---|---|
| Global | All agents | User identity, session goals |
| Shared | Agent subsets | Research results, intermediate data |
| Private | Single agent | Internal reasoning, tool credentials |
The Handoff Pattern
When Agent A hands off to Agent B:
- Summarize Agent A’s work
- Transfer relevant context (not everything)
- Preserve the user’s original intent
- Clear Agent A’s private state
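The four steps above can be sketched as a single function that builds the context packet Agent B receives. The state shape and field names are assumptions for illustration:

```python
def handoff(agent_a_state: dict, summarize, user_intent: str) -> dict:
    """Build Agent B's context packet; Agent A's private state stays behind."""
    packet = {
        "summary": summarize(agent_a_state["work_log"]),  # 1. summarize A's work
        "shared": dict(agent_a_state["shared"]),          # 2. relevant context only
        "user_intent": user_intent,                       # 3. preserve original intent
    }
    agent_a_state["private"].clear()                      # 4. clear A's private state
    return packet
```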
Part 5: Production Best Practices
Security & Privacy
| Practice | Implementation |
|---|---|
| PII Redaction | Remove before storage (Model Armor) |
| Strict Isolation | ACLs per user session |
| Encryption | At rest and in transit |
| Audit Logging | Track all context access |
Data Lifecycle
| Stage | Policy |
|---|---|
| Active Session | Full context in working memory |
| Paused Session | Persist to durable storage |
| Archived Session | Summarize + move to cold storage |
| Expired Session | Delete per retention policy |
Performance Optimization
| Technique | Benefit |
|---|---|
| Lazy Loading | Load memories only when needed |
| Caching | Cache frequent retrievals |
| Prefetching | Anticipate likely context needs |
| Compression | Summarize before archiving |
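Lazy loading and caching combine naturally: fetch a memory only on first access, then serve repeats from cache. A sketch using Python’s `functools.lru_cache`, with a counter standing in for a database round-trip:

```python
from functools import lru_cache

calls = {"count": 0}  # tracks how many times the backing store is hit


@lru_cache(maxsize=256)
def load_memory(user_id: str, key: str) -> str:
    """Fetch a memory on demand; repeat lookups are served from the cache."""
    calls["count"] += 1  # stands in for a database round-trip
    return f"memory:{key} for {user_id}"


load_memory("u1", "preferences")  # first access: hits the store
load_memory("u1", "preferences")  # repeat access: served from cache
```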
The Context Engineering Checklist
For Every Agent
- Session Management: How is conversation history persisted?
- Memory Strategy: What’s stored permanently vs. session-scoped?
- Overflow Handling: What happens when context exceeds limits?
- Privacy Controls: Is PII redacted before storage?
- TTL Policies: When do sessions expire?
For Multi-Agent Systems
- Shared State: What context do agents share?
- Handoff Protocol: How is context transferred between agents?
- Isolation: What’s private to each agent?
Industry Applications
Context engineering patterns apply across all domains:
Memory Types by Industry
| Memory Type | 🏦 Banking | 🛒 Retail | 🎓 Education |
|---|---|---|---|
| Semantic | Account preferences, risk profile | Purchase history, size preferences | Learning style, accessibility needs |
| Procedural | KYC verification steps, dispute resolution | Return processing, loyalty rewards | Grading rubrics, lesson planning |
| Episodic | “Last month we discussed refinancing” | “You bought this item before” | “We covered fractions last week” |
Session Examples
🏦 Banking: Customer returns after 3 days. Session restored with: prior questions, account context, and the loan application they started. No need to restate their intent.
🛒 Retail: Shopper returns to abandoned cart. Session recalls: items, applied coupons, shipping preference. Seamless checkout resume.
🎓 Education: Student returns to tutoring session. Context includes: current topic, recent mistakes, learning pace. Agent picks up exactly where they left off.
Key Takeaways
- ✅ Sessions = Short-term: Current conversation state.
- ✅ Memory = Long-term: Semantic (facts), Procedural (how-to), Episodic (past events).
- ✅ Budget your context: Allocate tokens intentionally across system prompt, history, and knowledge.
- ✅ Summarize, don’t truncate: Preserve important context by compressing, not cutting.
- ✅ In multi-agent systems: Define what’s global, shared, and private.
- ✅ Security first: Redact PII, enforce isolation, encrypt storage.
What’s Next
- 📖 Previous article: Skills: Progressive Context Disclosure — On-demand procedural knowledge.
- 📖 Next article: The 9 Principles of Intelligent Agents — Core design principles from Google research.
- 💬 Discuss: How do you handle context overflow in your agents?
References
- Google Cloud Research — Context Engineering: Sessions & Memory (2025). The primary reference for memory types and session management.
- Anthropic — Building Effective Agents (2024). Emphasizes context curation over prompt crafting.
- Google Cloud Research — Introduction to Agents (2025). Defines the role of context in the agentic loop.
- Tulving, E. — Episodic and Semantic Memory (1972). The foundational cognitive science research on memory types.
❓ Frequently Asked Questions
What is context engineering for AI agents?
Context engineering is the discipline of managing an agent's entire context window—including conversation history, tool outputs, retrieved documents, and long-term memory—to optimize reasoning quality across multi-turn sessions.
What are the three types of long-term memory for agents?
Semantic Memory (facts and knowledge via RAG), Procedural Memory (how-to skills via SKILL.md), and Episodic Memory (past interactions for personalization).
How do I handle context window overflow?
Use strategies like summarization (compress old context), sliding window (keep recent N turns), or selective pruning (remove low-relevance content). Never silently truncate important information.
💬 Join the Discussion
Got questions, feedback, or want to share your experience building AI agents? Join our community of architects and engineers.