Context Engineering: Sessions and Memory


Your agent aced the demo. Then a real user asked: “What did we discuss yesterday?” — and it drew a complete blank. That’s not a bug. That’s a missing discipline.



The Problem

Your agent works great in testing. Single-turn queries? Perfect answers.

Then users have conversations:

  • “What did we discuss yesterday?”
  • “Update the recommendations based on what I told you earlier.”
  • “Remember my preferences for next time.”

Your agent draws a blank. Every conversation starts from zero.

| Failure Mode | Root Cause |
| --- | --- |
| 🧠 Mid-Conversation Amnesia | No session management |
| 📅 No Cross-Session Memory | No persistent storage |
| 🔀 Context Overflow | Conversation exceeds token limit |
| 🎭 Lost Personalization | User preferences not retained |

My Take: This Is Engineering Now

Here’s what I’ve come to realize: Context Engineering isn’t about prompting anymore.

When I first started building agents, I thought the skill was in crafting the perfect prompt—choosing the right words, the right tone, the right examples. That’s prompt engineering. It’s a craft.

But as agents grew more complex, I found myself managing an entirely different set of problems:

  • Context window limits: How much can I fit? What gets cut?
  • Token usage: Every request costs money. Waste compounds at scale.
  • Session state: What happened 10 turns ago? Where do I store it?
  • Memory persistence: What should survive across conversations?
  • Multi-agent context sharing: How do agents pass information to each other?

This isn’t wordsmithing. This is systems design. This is engineering.

The shift from “Prompt Engineering” to “Context Engineering” isn’t just a name change. It’s a recognition that building production-grade agents requires the same rigor we apply to building production-grade platforms: architecture, state management, cost optimization, and observability.

📊 The Reality Check:

| What the Industry Shows | Why It Matters | Source |
| --- | --- | --- |
| 84% of AI project failures are attributed to leadership/architecture, not model quality — Forbes/RAND, 2025 | Most failures trace back to how information reaches the model, not the model itself — making context management the core architecture challenge. | RAND Corporation Report |
| 60% of AI projects unsupported by AI-ready data will be abandoned through 2026 — Gartner, 2025 | “AI-ready data” largely means properly structured context. When the data reaching the model is disorganized, even strong models underperform. | 🔒 Gartner Research · Subscription required |
| AI hallucinations cost corporations >$67 billion annually in revenue and legal expenses — AI-TechPark, 2025 | Much of this cost stems from models generating answers without proper grounding. Well-engineered context dramatically reduces hallucination risk. | AI-TechPark Analysis |

The Shift: Prompt Engineering → Context Engineering

Key Insight: What information reaches the model matters more than how you phrase the prompt.

Prompt Engineering focuses on crafting the perfect instruction.

Context Engineering focuses on curating the optimal information for each moment:

  • What does the model need to know right now?
  • What should be loaded on-demand vs. pre-loaded?
  • What should persist across conversations?

Think of it like a football manager’s memory.

During a match, Sir Alex Ferguson held the current game state in immediate memory: the score, who’s tired, who’s booked, what’s working. That’s session context.

But he also drew on decades of accumulated knowledge: this opponent’s weaknesses, how his players perform under pressure, tactical patterns that work in specific situations. That’s long-term memory.

A manager who forgets the current score is useless. A manager who can’t recall that this striker always drifts left is missing crucial context. You need both.

```mermaid
flowchart TD
    subgraph PromptEng["❌ Prompt Engineering"]
        P["Craft perfect prompt"]
    end
    subgraph ContextEng["✅ Context Engineering"]
        S["📋 Session State"]
        M["🧠 Long-term Memory"]
        T["🔧 Tool Results"]
        R["📚 Retrieved Knowledge"]
    end
    P --> LLM1["🤖 Model"]
    S --> C["Context Window"]
    M --> C
    T --> C
    R --> C
    C --> LLM2["🤖 Model"]
```

Part 1: Sessions — Short-Term Memory

What is a Session?

A session is the complete context for a single conversation:

  • User messages
  • Agent responses
  • Tool calls and results
  • Working state (e.g., items in a cart)
```mermaid
flowchart TD
    subgraph Session["📋 Session"]
        E1["Event 1: User message"]
        E2["Event 2: Agent response"]
        E3["Event 3: Tool call"]
        E4["Event 4: Tool result"]
        E5["Event 5: Agent response"]
        ST["State: cart items, preferences"]
    end
    E1 --> E2 --> E3 --> E4 --> E5
```
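The event-plus-state structure above can be sketched as a minimal data model. The class and field names here are illustrative, not from any specific framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Event:
    """One entry in the session log: a message, tool call, or tool result."""
    role: str      # "user", "agent", or "tool"
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Session:
    """The complete context for a single conversation."""
    session_id: str
    user_id: str
    events: list[Event] = field(default_factory=list)   # ordered history
    state: dict[str, Any] = field(default_factory=dict) # working state, e.g. cart

    def append(self, role: str, content: str) -> None:
        self.events.append(Event(role, content))

# Usage
s = Session(session_id="sess-1", user_id="u-42")
s.append("user", "Add the red shoes to my cart")
s.state["cart"] = ["red-shoes"]
```

Keeping events and working state in one object is what lets the whole session be persisted or restored atomically.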

The Session Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Created: User starts conversation
    Created --> Active: First message
    Active --> Active: Messages exchanged
    Active --> Paused: User inactive (timeout)
    Paused --> Active: User returns
    Active --> Archived: TTL expires or user ends
    Archived --> [*]
```

Production Session Requirements

| Requirement | Why It Matters |
| --- | --- |
| Strict Isolation | User A cannot see User B’s session |
| Persistence | Survive server restarts |
| Ordering | Events must be chronological |
| TTL Policy | Sessions expire after inactivity |
| PII Redaction | Remove sensitive data before storage |

📦 Case Study: The ProjectState

In Vibe Product Design, we define the “Context Boundary” explicitly using a TypedDict. This ensures every agent knows exactly what context is available—and what isn’t.

```python
# From studio/vibe-product-design/backend/app/graph/state.py
import operator
from typing import Annotated, Literal, TypedDict

from langchain_core.messages import BaseMessage  # LangChain message types


class ProjectState(TypedDict):
    """
    The Single Source of Truth for the session.
    """
    # 1. EPISODIC MEMORY: The chat history (accumulated)
    messages: Annotated[list[BaseMessage], operator.add]

    # 2. SEMANTIC MEMORY: The generated artifacts (BRD, ERD)
    artifacts: dict[str, str]

    # 3. SESSION STATE: Where are we in the workflow?
    current_step: Literal["STRATEGY", "REQUIREMENTS", "ARCHITECTURE"]

    # 4. HUMAN-IN-THE-LOOP: Explicit approval flag
    human_feedback: Literal["pending", "approved", "rejected"]
```

Why This Works:

  • Type Safety: The agent can’t hallucinate a “user_emotion” field that doesn’t exist.
  • Persistence: This entire dictionary is serialized to the database after every turn.
  • Boundaries: If it’s not in ProjectState, the agent doesn’t know it.

Part 2: Memory Types — Long-Term Knowledge

Google’s research defines three types of long-term memory:

The Memory Taxonomy

| Memory Type | What It Stores | Example | Time Horizon |
| --- | --- | --- | --- |
| 🧠 Semantic | Facts, knowledge | “The user is a vegetarian” | Permanent |
| 📋 Procedural | How-to knowledge | “How to deploy to production” | Stable |
| 📔 Episodic | Past experiences | “Last week we debugged the login issue” | Decaying |
```mermaid
flowchart TD
    subgraph Memory["🧠 Long-Term Memory"]
        SEM["📚 Semantic<br/>(Facts & Knowledge)"]
        PROC["📋 Procedural<br/>(How-To)"]
        EPIS["📔 Episodic<br/>(Past Events)"]
    end
    subgraph Examples["Examples"]
        S1["User preferences"]
        S2["Company policies"]
        P1["Coding standards"]
        P2["Deploy procedures"]
        E1["Past conversations"]
        E2["Previous decisions"]
    end
    SEM --> S1
    SEM --> S2
    PROC --> P1
    PROC --> P2
    EPIS --> E1
    EPIS --> E2
```

Semantic Memory (Facts)

What the agent knows about the world and the user.

| Source | Examples |
| --- | --- |
| User Profile | Name, role, preferences, timezone |
| Domain Knowledge | Product catalog, company policies |
| External Knowledge | Via RAG from documents |

Storage: User profiles, vector databases, knowledge graphs.

Procedural Memory (How-To)

What the agent knows how to do.

This maps directly to Skills (see Article 3):

  • Coding standards
  • Review procedures
  • Deployment workflows

Storage: Skill files (.agent/skills/), runbooks, SOPs.

Episodic Memory (Past Events)

What the agent remembers from past interactions.

| Pattern | Implementation |
| --- | --- |
| Conversation Summaries | Compress old sessions into key points |
| Decision Logs | “On Jan 15, we chose option B because…” |
| Preference Learning | “User consistently prefers concise answers” |

Storage: Summarized session archives, decision logs.


Part 3: Managing the Context Window

The Context Budget

Every model has a finite context window. You must budget it:

```mermaid
pie title Context Window Budget (32K tokens)
    "System Prompt" : 500
    "Recent History" : 2000
    "Retrieved Knowledge" : 1500
    "Tool Definitions" : 800
    "Working Memory" : 500
    "Available for Response" : 26700
```
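In code, a budget is just an explicit allocation you check before building the context. A minimal sketch, using the same illustrative numbers as the chart above:

```python
# Illustrative token allocations (matching the 32K-window chart above)
BUDGET = {
    "system_prompt": 500,
    "recent_history": 2000,
    "retrieved_knowledge": 1500,
    "tool_definitions": 800,
    "working_memory": 500,
}

def response_headroom(window: int = 32_000) -> int:
    """Tokens left for the model's response after the fixed allocations."""
    used = sum(BUDGET.values())
    assert used < window, "context budget exceeds the window"
    return window - used

print(response_headroom())  # 26700
```

The point is not the exact numbers but that the allocation is written down: when retrieval wants more room, something else must explicitly give it up.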

🔍 The “More Context = Better” Fallacy

Here’s the blind spot most teams miss.

The intuition is simple: give the model more information, get better answers. The data says the opposite.

Research on the “Lost in the Middle” phenomenon (Galileo, 2024) demonstrates that LLMs systematically degrade when processing information in the middle of long contexts. They attend strongly to the beginning and end — but the critical facts buried at position 40% through a 100K context? Effectively invisible.

| Context Size | What Happens | The Risk |
| --- | --- | --- |
| Under 4K tokens | Model attends to everything | ✅ Safe zone |
| 4K–32K tokens | Middle content starts degrading | ⚠️ Retrieval accuracy drops |
| 32K+ tokens | Severe “lost in the middle” effect | ❌ Critical facts get ignored |

The counterintuitive conclusion: Aggressively pruning context often produces better results than stuffing the window full. Budget your tokens like you budget your cloud spend — every token should justify its cost.

Context Overflow Strategies

When history exceeds your budget:

| Strategy | How It Works | Trade-off |
| --- | --- | --- |
| Truncation | Keep last N messages | Loses early context |
| Summarization | LLM summarizes old messages | Loses detail, costs tokens |
| Sliding Window | Fixed window that moves | Simple, may miss key context |
| Semantic Selection | Keep most relevant messages | Complex, more accurate |
| Query-Aware Compression | Compress based on current task relevance | Best quality, requires planning |
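The sliding-window strategy is the simplest to implement. A sketch under stated assumptions: the chars-divided-by-4 token counter is a rough heuristic, and production code would use the model's actual tokenizer:

```python
def sliding_window(messages: list[dict], max_tokens: int,
                   count_tokens=lambda m: len(m["content"]) // 4) -> list[dict]:
    """Keep the most recent messages that fit the token budget."""
    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):   # walk newest-first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                    # everything older gets dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))      # restore chronological order
```

This is cheap and predictable, but it exhibits exactly the trade-off in the table: a key fact stated early in the conversation silently falls out of the window.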

💡 2025 Update: The Sentinel Framework (May 2025) introduces lightweight, query-aware context compression that outperforms simple summarization. Key insight: compress based on what the model needs now, not just recency.

The Summarization Pattern

```mermaid
flowchart LR
    H["📜 Full History<br/>(10,000 tokens)"] --> S["🤖 Summarize"]
    S --> C["📝 Compressed<br/>(500 tokens)"]
    C --> N["➕ New Messages"]
    N --> CTX["📋 Context Window"]
```

When to Summarize:

  • When history reaches 70% of context budget
  • At conversation milestones (topic changes)
  • Before archiving a session
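The 70%-of-budget trigger can be sketched as a small guard that runs before each turn. The `summarize` callable stands in for an LLM summarization call, and the names are my own:

```python
def maybe_summarize(history: list[str], budget_tokens: int,
                    summarize, threshold: float = 0.7,
                    count=lambda s: len(s) // 4) -> list[str]:
    """If history crosses the threshold, fold all but the last few
    messages into a single summary entry; otherwise leave it alone."""
    used = sum(count(m) for m in history)
    if used < threshold * budget_tokens or len(history) <= 3:
        return history                     # under budget: no compression
    old, recent = history[:-3], history[-3:]
    return [f"[summary] {summarize(old)}"] + recent
```

Keeping the last few turns verbatim while compressing the rest preserves the immediate conversational flow, which simple truncation would destroy.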

Part 4: Multi-Agent Context Sharing

In multi-agent systems, context becomes more complex.

Shared vs. Private Context

| Context Type | Who Sees It | Examples |
| --- | --- | --- |
| Global | All agents | User identity, session goals |
| Shared | Agent subsets | Research results, intermediate data |
| Private | Single agent | Internal reasoning, tool credentials |
```mermaid
flowchart TD
    subgraph Global["🌐 Global Context"]
        G1["User ID"]
        G2["Session Goal"]
    end
    subgraph Shared["🔗 Shared Context"]
        S1["Research Results"]
        S2["Draft Document"]
    end
    subgraph Private["🔒 Private"]
        P1["Agent A Reasoning"]
        P2["Agent B Credentials"]
    end
    A1["🤖 Agent A"] --> Global
    A1 --> Shared
    A1 --> P1
    A2["🤖 Agent B"] --> Global
    A2 --> Shared
    A2 --> P2
```

The Handoff Pattern

When Agent A hands off to Agent B:

  1. Summarize Agent A’s work
  2. Transfer relevant context (not everything)
  3. Preserve the user’s original intent
  4. Clear Agent A’s private state
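The four steps above can be sketched as a single handoff function. Everything here is illustrative: `summarize` stands in for an LLM call, and the context dict shape is an assumption:

```python
def handoff(agent_a_context: dict, user_intent: str, summarize) -> dict:
    """Build the context packet Agent B receives from Agent A."""
    packet = {
        "original_intent": user_intent,                     # 3. preserve user intent
        "summary": summarize(agent_a_context["work_log"]),  # 1. summarize A's work
        "shared": dict(agent_a_context["shared"]),          # 2. transfer only shared context
    }
    agent_a_context["private"].clear()                      # 4. clear A's private state
    return packet
```

Note what is deliberately absent from the packet: Agent A's raw reasoning and credentials never cross the boundary.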

Part 5: Production Best Practices

Security & Privacy

| Practice | Implementation |
| --- | --- |
| PII Redaction | Remove before storage (Model Armor) |
| Strict Isolation | ACLs per user session |
| Encryption | At rest and in transit |
| Audit Logging | Track all context access |
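As a feel for what redaction-before-storage means, here is a deliberately minimal regex sketch covering just emails and US-style phone numbers. Real deployments use dedicated services (e.g. Model Armor or Cloud DLP), which handle far more PII classes:

```python
import re

# Toy patterns: emails and US-style phone numbers only.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder before persisting."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The key discipline is where this runs: on the write path into session storage, so raw PII never lands on disk in the first place.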

Data Lifecycle

| Stage | Policy |
| --- | --- |
| Active Session | Full context in working memory |
| Paused Session | Persist to durable storage |
| Archived Session | Summarize + move to cold storage |
| Expired Session | Delete per retention policy |

Performance Optimization

| Technique | Benefit |
| --- | --- |
| Lazy Loading | Load memories only when needed |
| Caching | Cache frequent retrievals |
| Prefetching | Anticipate likely context needs |
| Compression | Summarize before archiving |
CompressionSummarize before archiving

The Context Engineering Checklist

For Every Agent

  • Session Management: How is conversation history persisted?
  • Memory Strategy: What’s stored permanently vs. session-scoped?
  • Overflow Handling: What happens when context exceeds limits?
  • Privacy Controls: Is PII redacted before storage?
  • TTL Policies: When do sessions expire?

For Multi-Agent Systems

  • Shared State: What context do agents share?
  • Handoff Protocol: How is context transferred between agents?
  • Isolation: What’s private to each agent?

Industry Applications

Context engineering patterns apply across all domains:

Memory Types by Industry

| Memory Type | 🏦 Banking | 🛒 Retail | 🎓 Education |
| --- | --- | --- | --- |
| Semantic | Account preferences, risk profile | Purchase history, size preferences | Learning style, accessibility needs |
| Procedural | KYC verification steps, dispute resolution | Return processing, loyalty rewards | Grading rubrics, lesson planning |
| Episodic | “Last month we discussed refinancing” | “You bought this item before” | “We covered fractions last week” |

Session Examples

🏦 Banking: Customer returns after 3 days. Session restored with: prior questions, account context, and the loan application they started. No need to re-authenticate intent.

🛒 Retail: Shopper returns to abandoned cart. Session recalls: items, applied coupons, shipping preference. Seamless checkout resume.

🎓 Education: Student returns to tutoring session. Context includes: current topic, recent mistakes, learning pace. Agent picks up exactly where they left off.


Key Takeaways

  • Sessions = Short-term: Current conversation state.
  • Memory = Long-term: Semantic (facts), Procedural (how-to), Episodic (past events).
  • Budget your context: Allocate tokens intentionally across system prompt, history, and knowledge.
  • Summarize, don’t truncate: Preserve important context by compressing, not cutting.
  • In multi-agent systems: Define what’s global, shared, and private.
  • Security first: Redact PII, enforce isolation, encrypt storage.

What’s Next


References

  1. Google Cloud Research, *Context Engineering: Sessions & Memory* (2025). The primary reference for memory types and session management.

  2. Anthropic, *Building Effective Agents* (2024). Emphasizes context curation over prompt crafting.

  3. Google Cloud Research, *Introduction to Agents* (2025). Defines the role of context in the agentic loop.

  4. Tulving, E., *Episodic and Semantic Memory* (1972). The foundational cognitive science research on memory types.

❓ Frequently Asked Questions

What is context engineering for AI agents?

Context engineering is the discipline of managing an agent's entire context window—including conversation history, tool outputs, retrieved documents, and long-term memory—to optimize reasoning quality across multi-turn sessions.

What are the three types of long-term memory for agents?

Semantic Memory (facts and knowledge via RAG), Procedural Memory (how-to skills via SKILL.md), and Episodic Memory (past interactions for personalization).

How do I handle context window overflow?

Use strategies like summarization (compress old context), sliding window (keep recent N turns), or selective pruning (remove low-relevance content). Never silently truncate important information.

💬 Join the Discussion

Got questions, feedback, or want to share your experience building AI agents? Join our community of architects and engineers.