Context Engineering: Sessions and Memory


Your agent aced the demo. Then a real user asked: “What did we discuss yesterday?” — and it drew a complete blank. That’s not a bug. That’s a missing discipline.



The Problem

Your agent works great in testing. Single-turn queries? Perfect answers.

Then users have conversations:

  • “What did we discuss yesterday?”
  • “Update the recommendations based on what I told you earlier.”
  • “Remember my preferences for next time.”

Your agent draws a blank. Every conversation starts from zero.

| Failure Mode | Root Cause |
| --- | --- |
| 🧠 Mid-Conversation Amnesia | No session management |
| 📅 No Cross-Session Memory | No persistent storage |
| 🔀 Context Overflow | Conversation exceeds token limit |
| 🎭 Lost Personalization | User preferences not retained |

My Take: This Is Engineering Now

Here’s what I’ve come to realize: Context Engineering isn’t about prompting anymore.

When I first started building agents, I thought the skill was in crafting the perfect prompt—choosing the right words, the right tone, the right examples. That’s prompt engineering. It’s a craft.

But as agents grew more complex, I found myself managing an entirely different set of problems:

  • Context window limits: How much can I fit? What gets cut?
  • Token usage: Every request costs money. Waste compounds at scale.
  • Session state: What happened 10 turns ago? Where do I store it?
  • Memory persistence: What should survive across conversations?
  • Multi-agent context sharing: How do agents pass information to each other?

This isn’t wordsmithing. This is systems design. This is engineering.

The shift from “Prompt Engineering” to “Context Engineering” isn’t just a name change. It’s a recognition that building production-grade agents requires the same rigor we apply to building production-grade platforms: architecture, state management, cost optimization, and observability.

📊 The Reality Check:

| What the Industry Shows | Why It Matters | Source |
| --- | --- | --- |
| 84% of AI project failures are attributed to leadership/architecture, not model quality — Forbes/RAND, 2025 | Most failures trace back to how information reaches the model, not the model itself — making context management the core architecture challenge. | RAND Corporation Report |
| 60% of AI projects unsupported by AI-ready data will be abandoned through 2026 — Gartner, 2025 | “AI-ready data” largely means properly structured context. When the data reaching the model is disorganized, even strong models underperform. | 🔒 Gartner Research · Subscription required |
| AI hallucinations cost corporations >$67 billion annually in revenue and legal expenses — AI-TechPark, 2025 | Much of this cost stems from models generating answers without proper grounding. Well-engineered context dramatically reduces hallucination risk. | AI-TechPark Analysis |

The Shift: Prompt Engineering → Context Engineering

Key Insight: What information reaches the model matters more than how you phrase the prompt.

Prompt Engineering focuses on crafting the perfect instruction.

Context Engineering focuses on curating the optimal information for each moment:

  • What does the model need to know right now?
  • What should be loaded on-demand vs. pre-loaded?
  • What should persist across conversations?

Think of it like a football manager’s memory.

During a match, Sir Alex Ferguson held the current game state in immediate memory: the score, who’s tired, who’s booked, what’s working. That’s session context.

But he also drew on decades of accumulated knowledge: this opponent’s weaknesses, how his players perform under pressure, tactical patterns that work in specific situations. That’s long-term memory.

A manager who forgets the current score is useless. A manager who can’t recall that this striker always drifts left is missing crucial context. You need both.

```mermaid
flowchart TD
    subgraph PromptEng["❌ Prompt Engineering"]
        P["Craft perfect prompt"]
    end
    subgraph ContextEng["✅ Context Engineering"]
        S["📋 Session State"]
        M["🧠 Long-term Memory"]
        T["🔧 Tool Results"]
        R["📚 Retrieved Knowledge"]
    end
    P --> LLM1["🤖 Model"]
    S --> C["Context Window"]
    M --> C
    T --> C
    R --> C
    C --> LLM2["🤖 Model"]
```

Part 1: Sessions — Short-Term Memory

What is a Session?

A session is the complete context for a single conversation:

  • User messages
  • Agent responses
  • Tool calls and results
  • Working state (e.g., items in a cart)
```mermaid
flowchart TD
    subgraph Session["📋 Session"]
        E1["Event 1: User message"]
        E2["Event 2: Agent response"]
        E3["Event 3: Tool call"]
        E4["Event 4: Tool result"]
        E5["Event 5: Agent response"]
        ST["State: cart items, preferences"]
    end
    E1 --> E2 --> E3 --> E4 --> E5
```
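The event-plus-state structure above can be sketched as a minimal data model. The class and field names here are illustrative, not from any specific framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Event:
    """One entry in the session log: a message, tool call, or tool result."""
    role: str      # "user", "agent", or "tool"
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Session:
    """The complete context for a single conversation."""
    session_id: str
    user_id: str
    events: list[Event] = field(default_factory=list)   # ordered history
    state: dict[str, Any] = field(default_factory=dict) # working state, e.g. cart

    def append(self, role: str, content: str) -> None:
        self.events.append(Event(role, content))

# Usage
s = Session(session_id="sess-1", user_id="u-42")
s.append("user", "Add the red shoes to my cart")
s.state["cart"] = ["red-shoes"]
```

Keeping events and working state in one object is what lets the whole session be persisted or restored atomically.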

The Session Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Created: User starts conversation
    Created --> Active: First message
    Active --> Active: Messages exchanged
    Active --> Paused: User inactive (timeout)
    Paused --> Active: User returns
    Active --> Archived: TTL expires or user ends
    Archived --> [*]
```

Production Session Requirements

| Requirement | Why It Matters |
| --- | --- |
| Strict Isolation | User A cannot see User B’s session |
| Persistence | Survive server restarts |
| Ordering | Events must be chronological |
| TTL Policy | Sessions expire after inactivity |
| PII Redaction | Remove sensitive data before storage |

📦 Case Study: The ProjectState

In Vibe Product Design, we define the “Context Boundary” explicitly using a TypedDict. This ensures every agent knows exactly what context is available—and what isn’t.

```python
# From studio/vibe-product-design/backend/app/graph/state.py
import operator
from typing import Annotated, Literal, TypedDict

from langchain_core.messages import BaseMessage  # LangChain message types


class ProjectState(TypedDict):
    """
    The Single Source of Truth for the session.
    """
    # 1. EPISODIC MEMORY: The chat history (accumulated)
    messages: Annotated[list[BaseMessage], operator.add]

    # 2. SEMANTIC MEMORY: The generated artifacts (BRD, ERD)
    artifacts: dict[str, str]

    # 3. SESSION STATE: Where are we in the workflow?
    current_step: Literal["STRATEGY", "REQUIREMENTS", "ARCHITECTURE"]

    # 4. HUMAN-IN-THE-LOOP: Explicit approval flag
    human_feedback: Literal["pending", "approved", "rejected"]
```

Why This Works:

  • Type Safety: The agent can’t hallucinate a “user_emotion” field that doesn’t exist.
  • Persistence: This entire dictionary is serialized to the database after every turn.
  • Boundaries: If it’s not in ProjectState, the agent doesn’t know it.

Part 2: Memory Types — Long-Term Knowledge

Google’s research defines three types of long-term memory:

The Memory Taxonomy

| Memory Type | What It Stores | Example | Time Horizon |
| --- | --- | --- | --- |
| 🧠 Semantic | Facts, knowledge | “The user is a vegetarian” | Permanent |
| 📋 Procedural | How-to knowledge | “How to deploy to production” | Stable |
| 📔 Episodic | Past experiences | “Last week we debugged the login issue” | Decaying |
```mermaid
flowchart TD
    subgraph Memory["🧠 Long-Term Memory"]
        SEM["📚 Semantic<br/>(Facts & Knowledge)"]
        PROC["📋 Procedural<br/>(How-To)"]
        EPIS["📔 Episodic<br/>(Past Events)"]
    end
    subgraph Examples["Examples"]
        S1["User preferences"]
        S2["Company policies"]
        P1["Coding standards"]
        P2["Deploy procedures"]
        E1["Past conversations"]
        E2["Previous decisions"]
    end
    SEM --> S1
    SEM --> S2
    PROC --> P1
    PROC --> P2
    EPIS --> E1
    EPIS --> E2
```

Semantic Memory (Facts)

What the agent knows about the world and the user.

| Source | Examples |
| --- | --- |
| User Profile | Name, role, preferences, timezone |
| Domain Knowledge | Product catalog, company policies |
| External Knowledge | Via RAG from documents |

Storage: User profiles, vector databases, knowledge graphs.

Procedural Memory (How-To)

What the agent knows how to do.

This maps directly to Skills (see Article 3):

  • Coding standards
  • Review procedures
  • Deployment workflows

Storage: Skill files (.agent/skills/), runbooks, SOPs.

Episodic Memory (Past Events)

What the agent remembers from past interactions.

| Pattern | Implementation |
| --- | --- |
| Conversation Summaries | Compress old sessions into key points |
| Decision Logs | “On Jan 15, we chose option B because…” |
| Preference Learning | “User consistently prefers concise answers” |

Storage: Summarized session archives, decision logs.


Part 3: Managing the Context Window

The Context Budget

Every model has a finite context window. You must budget it:

```mermaid
pie title Context Window Budget (32K tokens)
    "System Prompt" : 500
    "Recent History" : 2000
    "Retrieved Knowledge" : 1500
    "Tool Definitions" : 800
    "Working Memory" : 500
    "Available for Response" : 26700
```
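In code, a budget is just an explicit allocation you check before building the context. A minimal sketch, using the same illustrative numbers as the chart above:

```python
# Illustrative token allocations (matching the 32K-window chart above)
BUDGET = {
    "system_prompt": 500,
    "recent_history": 2000,
    "retrieved_knowledge": 1500,
    "tool_definitions": 800,
    "working_memory": 500,
}

def response_headroom(window: int = 32_000) -> int:
    """Tokens left for the model's response after the fixed allocations."""
    used = sum(BUDGET.values())
    assert used < window, "context budget exceeds the window"
    return window - used

print(response_headroom())  # 26700
```

The point is not the exact numbers but that the allocation is written down: when retrieval wants more room, something else must explicitly give it up.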

🔍 The “More Context = Better” Fallacy

Here’s the blind spot most teams miss.

The intuition is simple: give the model more information, get better answers. The data says the opposite.

Research on the “Lost in the Middle” phenomenon (Galileo, 2024) demonstrates that LLMs systematically degrade when processing information in the middle of long contexts. They attend strongly to the beginning and end — but the critical facts buried at position 40% through a 100K context? Effectively invisible.

| Context Size | What Happens | The Risk |
| --- | --- | --- |
| Under 4K tokens | Model attends to everything | ✅ Safe zone |
| 4K–32K tokens | Middle content starts degrading | ⚠️ Retrieval accuracy drops |
| 32K+ tokens | Severe “lost in the middle” effect | ❌ Critical facts get ignored |

The counterintuitive conclusion: Aggressively pruning context often produces better results than stuffing the window full. Budget your tokens like you budget your cloud spend — every token should justify its cost.

Context Overflow Strategies

When history exceeds your budget:

| Strategy | How It Works | Trade-off |
| --- | --- | --- |
| Truncation | Keep last N messages | Loses early context |
| Summarization | LLM summarizes old messages | Loses detail, costs tokens |
| Sliding Window | Fixed window that moves | Simple, may miss key context |
| Semantic Selection | Keep most relevant messages | Complex, more accurate |
| Query-Aware Compression | Compress based on current task relevance | Best quality, requires planning |
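The sliding-window strategy is the simplest to implement. A sketch under stated assumptions: the chars-divided-by-4 token counter is a rough heuristic, and production code would use the model's actual tokenizer:

```python
def sliding_window(messages: list[dict], max_tokens: int,
                   count_tokens=lambda m: len(m["content"]) // 4) -> list[dict]:
    """Keep the most recent messages that fit the token budget."""
    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):   # walk newest-first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                    # everything older gets dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))      # restore chronological order
```

This is cheap and predictable, but it exhibits exactly the trade-off in the table: a key fact stated early in the conversation silently falls out of the window.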

💡 2025 Update: The Sentinel Framework (May 2025) introduces lightweight, query-aware context compression that outperforms simple summarization. Key insight: compress based on what the model needs now, not just recency.

The Summarization Pattern

```mermaid
flowchart LR
    H["📜 Full History<br/>(10,000 tokens)"] --> S["🤖 Summarize"]
    S --> C["📝 Compressed<br/>(500 tokens)"]
    C --> N["➕ New Messages"]
    N --> CTX["📋 Context Window"]
```

When to Summarize:

  • When history reaches 70% of context budget
  • At conversation milestones (topic changes)
  • Before archiving a session
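The 70%-of-budget trigger can be sketched as a small guard that runs before each turn. The `summarize` callable stands in for an LLM summarization call, and the names are my own:

```python
def maybe_summarize(history: list[str], budget_tokens: int,
                    summarize, threshold: float = 0.7,
                    count=lambda s: len(s) // 4) -> list[str]:
    """If history crosses the threshold, fold all but the last few
    messages into a single summary entry; otherwise leave it alone."""
    used = sum(count(m) for m in history)
    if used < threshold * budget_tokens or len(history) <= 3:
        return history                     # under budget: no compression
    old, recent = history[:-3], history[-3:]
    return [f"[summary] {summarize(old)}"] + recent
```

Keeping the last few turns verbatim while compressing the rest preserves the immediate conversational flow, which simple truncation would destroy.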

Part 4: Multi-Agent Context Sharing

In multi-agent systems, context becomes more complex.

Shared vs. Private Context

| Context Type | Who Sees It | Examples |
| --- | --- | --- |
| Global | All agents | User identity, session goals |
| Shared | Agent subsets | Research results, intermediate data |
| Private | Single agent | Internal reasoning, tool credentials |
```mermaid
flowchart TD
    subgraph Global["🌐 Global Context"]
        G1["User ID"]
        G2["Session Goal"]
    end
    subgraph Shared["🔗 Shared Context"]
        S1["Research Results"]
        S2["Draft Document"]
    end
    subgraph Private["🔒 Private"]
        P1["Agent A Reasoning"]
        P2["Agent B Credentials"]
    end
    A1["🤖 Agent A"] --> Global
    A1 --> Shared
    A1 --> P1
    A2["🤖 Agent B"] --> Global
    A2 --> Shared
    A2 --> P2
```

The Handoff Pattern

When Agent A hands off to Agent B:

  1. Summarize Agent A’s work
  2. Transfer relevant context (not everything)
  3. Preserve the user’s original intent
  4. Clear Agent A’s private state
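The four steps above can be sketched as a single handoff function. Everything here is illustrative: `summarize` stands in for an LLM call, and the context dict shape is an assumption:

```python
def handoff(agent_a_context: dict, user_intent: str, summarize) -> dict:
    """Build the context packet Agent B receives from Agent A."""
    packet = {
        "original_intent": user_intent,                     # 3. preserve user intent
        "summary": summarize(agent_a_context["work_log"]),  # 1. summarize A's work
        "shared": dict(agent_a_context["shared"]),          # 2. transfer only shared context
    }
    agent_a_context["private"].clear()                      # 4. clear A's private state
    return packet
```

Note what is deliberately absent from the packet: Agent A's raw reasoning and credentials never cross the boundary.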

Part 5: Production Best Practices

Security & Privacy

| Practice | Implementation |
| --- | --- |
| PII Redaction | Remove before storage (Model Armor) |
| Strict Isolation | ACLs per user session |
| Encryption | At rest and in transit |
| Audit Logging | Track all context access |
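As a feel for what redaction-before-storage means, here is a deliberately minimal regex sketch covering just emails and US-style phone numbers. Real deployments use dedicated services (e.g. Model Armor or Cloud DLP), which handle far more PII classes:

```python
import re

# Toy patterns: emails and US-style phone numbers only.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder before persisting."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The key discipline is where this runs: on the write path into session storage, so raw PII never lands on disk in the first place.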

Data Lifecycle

| Stage | Policy |
| --- | --- |
| Active Session | Full context in working memory |
| Paused Session | Persist to durable storage |
| Archived Session | Summarize + move to cold storage |
| Expired Session | Delete per retention policy |

Performance Optimization

| Technique | Benefit |
| --- | --- |
| Lazy Loading | Load memories only when needed |
| Caching | Cache frequent retrievals |
| Prefetching | Anticipate likely context needs |
| Compression | Summarize before archiving |
CompressionSummarize before archiving

The Context Engineering Checklist

For Every Agent

  • Session Management: How is conversation history persisted?
  • Memory Strategy: What’s stored permanently vs. session-scoped?
  • Overflow Handling: What happens when context exceeds limits?
  • Privacy Controls: Is PII redacted before storage?
  • TTL Policies: When do sessions expire?

For Multi-Agent Systems

  • Shared State: What context do agents share?
  • Handoff Protocol: How is context transferred between agents?
  • Isolation: What’s private to each agent?

Industry Applications

Context engineering patterns apply across all domains:

Memory Types by Industry

| Memory Type | 🏦 Banking | 🛒 Retail | 🎓 Education |
| --- | --- | --- | --- |
| Semantic | Account preferences, risk profile | Purchase history, size preferences | Learning style, accessibility needs |
| Procedural | KYC verification steps, dispute resolution | Return processing, loyalty rewards | Grading rubrics, lesson planning |
| Episodic | “Last month we discussed refinancing” | “You bought this item before” | “We covered fractions last week” |

Session Examples

🏦 Banking: Customer returns after 3 days. Session restored with: prior questions, account context, and the loan application they started. No need to re-authenticate intent.

🛒 Retail: Shopper returns to abandoned cart. Session recalls: items, applied coupons, shipping preference. Seamless checkout resume.

🎓 Education: Student returns to tutoring session. Context includes: current topic, recent mistakes, learning pace. Agent picks up exactly where they left off.


Key Takeaways

  • Sessions = Short-term: Current conversation state.
  • Memory = Long-term: Semantic (facts), Procedural (how-to), Episodic (past events).
  • Budget your context: Allocate tokens intentionally across system prompt, history, and knowledge.
  • Summarize, don’t truncate: Preserve important context by compressing, not cutting.
  • In multi-agent systems: Define what’s global, shared, and private.
  • Security first: Redact PII, enforce isolation, encrypt storage.

What’s Next


References

  1. Google Cloud Research, *Context Engineering: Sessions & Memory* (2025). The primary reference for memory types and session management.

  2. Anthropic, *Building Effective Agents* (2024). Emphasizes context curation over prompt crafting.

  3. Google Cloud Research, *Introduction to Agents* (2025). Defines the role of context in the agentic loop.

  4. Tulving, E., *Episodic and Semantic Memory* (1972). The foundational cognitive science research on memory types.

❓ Frequently Asked Questions

What is context engineering for AI agents?

Context engineering is the discipline of managing an agent's entire context window—including conversation history, tool outputs, retrieved documents, and long-term memory—to optimize reasoning quality across multi-turn sessions.

What are the three types of long-term memory for agents?

Semantic Memory (facts and knowledge via RAG), Procedural Memory (how-to skills via SKILL.md), and Episodic Memory (past interactions for personalization).

How do I handle context window overflow?

Use strategies like summarization (compress old context), sliding window (keep recent N turns), or selective pruning (remove low-relevance content). Never silently truncate important information.

💬 Join the Discussion

Got questions, feedback, or want to share your experience building AI agents? Join our community of architects and engineers.