Context Engineering: Sessions and Memory
“Context engineering is what separates agents that forget mid-conversation from agents that remember you for years.”
The Problem
Your agent works great in testing. Single-turn queries? Perfect answers.
Then users have conversations:
- “What did we discuss yesterday?”
- “Update the recommendations based on what I told you earlier.”
- “Remember my preferences for next time.”
Your agent draws a blank. Every conversation starts from zero.
| The Failure Mode | Root Cause |
|---|---|
| 🧠 Mid-Conversation Amnesia | No session management |
| 📅 No Cross-Session Memory | No persistent storage |
| 🔀 Context Overflow | Conversation exceeds token limit |
| 🎭 Lost Personalization | User preferences not retained |
The Shift: Prompt Engineering → Context Engineering
Key Insight: What information reaches the model matters more than how you phrase the prompt.
Prompt Engineering focuses on crafting the perfect instruction.
Context Engineering focuses on curating the optimal information for each moment:
- What does the model need to know right now?
- What should be loaded on-demand vs. pre-loaded?
- What should persist across conversations?
Part 1: Sessions — Short-Term Memory
What is a Session?
A session is the complete context for a single conversation:
- User messages
- Agent responses
- Tool calls and results
- Working state (e.g., items in a cart)
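A session can be modeled as an ordered event log plus working state. Here is a minimal sketch; the class and field names (`SessionEvent`, `Session`, `kind`) are illustrative, not taken from any particular framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SessionEvent:
    kind: str        # e.g. "user", "agent", "tool_call", "tool_result"
    content: str
    # Timestamps preserve the ordering requirement: events stay chronological.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Session:
    session_id: str
    user_id: str
    events: list[SessionEvent] = field(default_factory=list)
    state: dict = field(default_factory=dict)   # working state, e.g. cart items

    def append(self, kind: str, content: str) -> None:
        self.events.append(SessionEvent(kind, content))

session = Session("sess-1", "user-42")
session.append("user", "Add the blue mug to my cart")
session.state["cart"] = ["blue-mug"]
session.append("agent", "Added the blue mug to your cart.")
```

Everything the agent needs to resume the conversation lives in one object: messages, tool activity, and mutable working state.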
The Session Lifecycle
A session moves through distinct stages: created on the first message, active while the user engages, paused during inactivity, archived once summarized, and finally expired and deleted per retention policy.
Production Session Requirements
| Requirement | Why It Matters |
|---|---|
| Strict Isolation | User A cannot see User B’s session |
| Persistence | Survive server restarts |
| Ordering | Events must be chronological |
| TTL Policy | Sessions expire after inactivity |
| PII Redaction | Remove sensitive data before storage |
Part 2: Memory Types — Long-Term Knowledge
Google’s research defines three types of long-term memory:
The Memory Taxonomy
| Memory Type | What It Stores | Example | Time Horizon |
|---|---|---|---|
| 🧠 Semantic | Facts, knowledge | “The user is a vegetarian” | Permanent |
| 📋 Procedural | How-to knowledge | “How to deploy to production” | Stable |
| 📔 Episodic | Past experiences | “Last week we debugged the login issue” | Decaying |
```mermaid
graph TB
    subgraph Types["Memory Types"]
        SEM["🧠 Semantic<br/>(Facts & Knowledge)"]
        PROC["📋 Procedural<br/>(How-To)"]
        EPIS["📔 Episodic<br/>(Past Events)"]
    end
    subgraph Examples["Examples"]
        S1["User preferences"]
        S2["Company policies"]
        P1["Coding standards"]
        P2["Deploy procedures"]
        E1["Past conversations"]
        E2["Previous decisions"]
    end
    SEM --> S1
    SEM --> S2
    PROC --> P1
    PROC --> P2
    EPIS --> E1
    EPIS --> E2
```
Semantic Memory (Facts)
What the agent knows about the world and the user.
| Source | Examples |
|---|---|
| User Profile | Name, role, preferences, timezone |
| Domain Knowledge | Product catalog, company policies |
| External Knowledge | Via RAG from documents |
Storage: User profiles, vector databases, knowledge graphs.
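A minimal semantic-memory layer stores durable facts and renders them into the system context at session start. The class name, fact keys, and render format below are assumptions for illustration:

```python
class SemanticMemory:
    def __init__(self):
        self.facts: dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value   # newest fact wins on conflict

    def render_for_context(self) -> str:
        # Sorted keys keep the rendered block stable across turns,
        # which also helps prompt caching.
        lines = [f"- {k}: {v}" for k, v in sorted(self.facts.items())]
        return "Known user facts:\n" + "\n".join(lines)

mem = SemanticMemory()
mem.remember("diet", "vegetarian")
mem.remember("timezone", "Europe/Berlin")
```

In practice the same interface would sit in front of a user-profile table, a vector database, or a knowledge graph, as the table above suggests.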
Procedural Memory (How-To)
What the agent knows how to do.
This maps directly to Skills (see Article 3):
- Coding standards
- Review procedures
- Deployment workflows
Storage: Skill files (.agent/skills/), runbooks, SOPs.
Episodic Memory (Past Events)
What the agent remembers from past interactions.
| Pattern | Implementation |
|---|---|
| Conversation Summaries | Compress old sessions into key points |
| Decision Logs | “On Jan 15, we chose option B because…” |
| Preference Learning | “User consistently prefers concise answers” |
Storage: Summarized session archives, decision logs.
Part 3: Managing the Context Window
The Context Budget
Every model has a finite context window; you must budget it deliberately across the system prompt, tool definitions, conversation history, retrieved knowledge, and response headroom.
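One way to keep the budget honest is to write it down as an explicit allocation that must sum to the model window. The segment names and token numbers below are illustrative:

```python
CONTEXT_WINDOW = 128_000  # total tokens available to the model (example size)

# Explicit per-segment allocation; the invariant below catches drift
# whenever any segment is resized.
budget = {
    "system_prompt": 2_000,
    "tool_definitions": 4_000,
    "retrieved_knowledge": 20_000,
    "conversation_history": 90_000,
    "response_reserve": 12_000,
}

assert sum(budget.values()) == CONTEXT_WINDOW
```

When history grows past its allocation, the overflow strategies below decide what to drop or compress.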
Context Overflow Strategies
When history exceeds your budget:
| Strategy | How It Works | Trade-off |
|---|---|---|
| Truncation | Keep last N messages | Loses early context |
| Summarization | LLM summarizes old messages | Loses detail, costs tokens |
| Sliding Window | Fixed window that moves | Simple, may miss key context |
| Semantic Selection | Keep most relevant messages | Complex, more accurate |
| Query-Aware Compression | Compress based on current task relevance | Best quality, requires planning |
💡 2025 Update: The Sentinel Framework (May 2025) introduces lightweight, query-aware context compression that outperforms simple summarization. Key insight: compress based on what the model needs now, not just recency.
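The two simplest strategies from the table can be sketched in a few lines. The `len(m) // 4` token estimate is a rough heuristic standing in for a real tokenizer:

```python
def truncate(messages: list[str], keep_last: int) -> list[str]:
    # Truncation: keep the last N messages, losing early context.
    return messages[-keep_last:]

def sliding_window(messages: list[str], token_budget: int) -> list[str]:
    # Token-aware sliding window: walk newest -> oldest, stop at budget.
    kept, used = [], 0
    for m in reversed(messages):
        cost = len(m) // 4 or 1          # crude token estimate
        if used + cost > token_budget:
            break
        kept.append(m)
        used += cost
    return list(reversed(kept))          # restore chronological order
```

Semantic selection and query-aware compression follow the same shape but replace the recency test with a relevance score against the current task.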
The Summarization Pattern
```mermaid
graph LR
    H["📜 Full History<br/>(10,000 tokens)"] --> S["🤖 Summarize"]
    S --> C["📝 Compressed<br/>(500 tokens)"]
    C --> N["➕ New Messages"]
    N --> CTX["📋 Context Window"]
```
When to Summarize:
- When history reaches 70% of context budget
- At conversation milestones (topic changes)
- Before archiving a session
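The three triggers above combine naturally into a single predicate. The 70% threshold matches the first bullet; the token estimate (`len // 4`) is a rough heuristic, not a real tokenizer:

```python
def estimate_tokens(messages: list[str]) -> int:
    # Crude approximation: ~4 characters per token.
    return sum(len(m) // 4 for m in messages)

def should_summarize(messages: list[str], history_budget: int,
                     threshold: float = 0.7,
                     topic_changed: bool = False,
                     archiving: bool = False) -> bool:
    # Trigger 1: history reached the threshold share of its budget.
    over_budget = estimate_tokens(messages) >= threshold * history_budget
    # Triggers 2 and 3: conversation milestone, or session being archived.
    return over_budget or topic_changed or archiving
```

Checking this predicate after every turn keeps summarization proactive rather than a panic response to a context-length error.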
Part 4: Multi-Agent Context Sharing
In multi-agent systems, context becomes more complex.
Shared vs. Private Context
| Context Type | Who Sees It | Examples |
|---|---|---|
| Global | All agents | User identity, session goals |
| Shared | Agent subsets | Research results, intermediate data |
| Private | Single agent | Internal reasoning, tool credentials |
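The three visibility levels can be enforced with a simple filter per agent. The `Scope` names follow the table above; the entry shape and `visible_to` API are assumptions for illustration:

```python
from enum import Enum

class Scope(Enum):
    GLOBAL = "global"    # all agents
    SHARED = "shared"    # a named subset of agents
    PRIVATE = "private"  # a single owning agent

def visible_to(agent: str, entries: list[dict]) -> list[dict]:
    out = []
    for e in entries:
        if e["scope"] is Scope.GLOBAL:
            out.append(e)
        elif e["scope"] is Scope.SHARED and agent in e["audience"]:
            out.append(e)
        elif e["scope"] is Scope.PRIVATE and e["owner"] == agent:
            out.append(e)
    return out

entries = [
    {"scope": Scope.GLOBAL, "key": "user_goal", "value": "refinance loan"},
    {"scope": Scope.SHARED, "audience": {"research", "writer"}, "key": "findings", "value": "..."},
    {"scope": Scope.PRIVATE, "owner": "research", "key": "api_key", "value": "..."},
]
```

The writer agent sees the goal and the shared findings but never the research agent's credentials, matching the table's intent that tool credentials stay private.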
The Handoff Pattern
When Agent A hands off to Agent B:
- Summarize Agent A’s work
- Transfer relevant context (not everything)
- Preserve the user’s original intent
- Clear Agent A’s private state
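The four steps above can be collapsed into one handoff function. The `summarize` callable and the state/payload shapes are illustrative assumptions, not a standard protocol:

```python
def handoff(agent_a_state: dict, summarize, user_intent: str) -> dict:
    payload = {
        "summary": summarize(agent_a_state["events"]),    # 1. summarize A's work
        "shared_context": dict(agent_a_state["shared"]),  # 2. transfer relevant context only
        "user_intent": user_intent,                       # 3. preserve original intent
    }
    agent_a_state["private"].clear()                      # 4. clear A's private state
    return payload

state = {
    "events": ["researched rates", "compared lenders"],
    "shared": {"best_rate": "4.1%"},
    "private": {"scratch": "internal notes"},
}
payload = handoff(state, summarize=lambda evs: "; ".join(evs), user_intent="refinance loan")
```

Agent B receives a compact, intent-preserving payload rather than Agent A's full transcript, and nothing private leaks across the boundary.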
Part 5: Production Best Practices
Security & Privacy
| Practice | Implementation |
|---|---|
| PII Redaction | Remove before storage (Model Armor) |
| Strict Isolation | ACLs per user session |
| Encryption | At rest and in transit |
| Audit Logging | Track all context access |
Data Lifecycle
| Stage | Policy |
|---|---|
| Active Session | Full context in working memory |
| Paused Session | Persist to durable storage |
| Archived Session | Summarize + move to cold storage |
| Expired Session | Delete per retention policy |
Performance Optimization
| Technique | Benefit |
|---|---|
| Lazy Loading | Load memories only when needed |
| Caching | Cache frequent retrievals |
| Prefetching | Anticipate likely context needs |
| Compression | Summarize before archiving |
The Context Engineering Checklist
For Every Agent
- Session Management: How is conversation history persisted?
- Memory Strategy: What’s stored permanently vs. session-scoped?
- Overflow Handling: What happens when context exceeds limits?
- Privacy Controls: Is PII redacted before storage?
- TTL Policies: When do sessions expire?
For Multi-Agent Systems
- Shared State: What context do agents share?
- Handoff Protocol: How is context transferred between agents?
- Isolation: What’s private to each agent?
Industry Applications
Context engineering patterns apply across all domains:
Memory Types by Industry
| Memory Type | 🏦 Banking | 🛒 Retail | 🎓 Education |
|---|---|---|---|
| Semantic | Account preferences, risk profile | Purchase history, size preferences | Learning style, accessibility needs |
| Procedural | KYC verification steps, dispute resolution | Return processing, loyalty rewards | Grading rubrics, lesson planning |
| Episodic | “Last month we discussed refinancing” | “You bought this item before” | “We covered fractions last week” |
Session Examples
🏦 Banking: Customer returns after 3 days. Session restored with: prior questions, account context, and the loan application they started. No need to re-establish what they were trying to do.
🛒 Retail: Shopper returns to abandoned cart. Session recalls: items, applied coupons, shipping preference. Seamless checkout resume.
🎓 Education: Student returns to tutoring session. Context includes: current topic, recent mistakes, learning pace. Agent picks up exactly where they left off.
Key Takeaways
- ✅ Sessions = Short-term: Current conversation state.
- ✅ Memory = Long-term: Semantic (facts), Procedural (how-to), Episodic (past events).
- ✅ Budget your context: Allocate tokens intentionally across system prompt, history, and knowledge.
- ✅ Summarize, don’t truncate: Preserve important context by compressing, not cutting.
- ✅ In multi-agent systems: Define what’s global, shared, and private.
- ✅ Security first: Redact PII, enforce isolation, encrypt storage.
What’s Next
- 📖 Previous article: MCP Best Practices: Tools That Don’t Overwhelm
- 📖 Next article: A2A Protocol: Agent-to-Agent Collaboration — Google’s protocol for agent interoperability.
- 💬 Discuss: How do you handle context overflow in your agents?
References
- Google Cloud Research — Context Engineering: Sessions & Memory (2025). The primary reference for memory types and session management.
- Anthropic — Building Effective Agents (2024). Emphasizes context curation over prompt crafting.
- Google Cloud Research — Introduction to Agents (2025). Defines the role of context in the agentic loop.
- Tulving, E. — Episodic and Semantic Memory (1972). The foundational cognitive science research on memory types.
❓ Frequently Asked Questions
What is context engineering for AI agents?
Context engineering is the discipline of managing an agent's entire context window—including conversation history, tool outputs, retrieved documents, and long-term memory—to optimize reasoning quality across multi-turn sessions.
What are the three types of long-term memory for agents?
Semantic Memory (facts and knowledge via RAG), Procedural Memory (how-to skills via SKILL.md), and Episodic Memory (past interactions for personalization).
How do I handle context window overflow?
Use strategies like summarization (compress old context), sliding window (keep recent N turns), or selective pruning (remove low-relevance content). Never silently truncate important information.
💬 Join the Discussion
📣 Prefer LinkedIn? Connect and discuss on LinkedIn →