The 4 Pillars: Persona, Skills, RAG, MCP
“Should I put this in RAG, a Skill, or the Persona?”
Every engineer building agents hits this wall. You have domain knowledge—a PDF, a database, a rule—and you don’t know where it belongs.
Get it wrong, and you get Context Overflow (expensive, slow agents) or Context Amnesia (hallucinations).
📑 In This Article:
- The Problem
- The Concept
- Pillar 1: Persona 🎭
- Pillar 2: Skills 📚
- Pillar 3: RAG 📖
- Pillar 4: MCP 🔌
- The Decision Framework
- Industry Applications
- 📦 Case Study: The Vibe Context Stack
- Key Takeaways
- References
The Problem
You have a PDF with company policies. A database schema. A set of coding standards.
Where do they go?
Into the system prompt? Into RAG? A skill definition?
Most developers treat the context window like a junk drawer—stuffing rules, docs, and schemas into one massive prompt.
This is the “Bruno Fernandes” trap.
Watch Bruno at Manchester United. He’s asked to do everything: press high, track back and defend, orchestrate the midfield, create chances, and score goals. The result? Overload. Burnout. A world-class playmaker reduced to chasing the ball across 70 meters of pitch.
When Bruno plays his natural role—an attacking midfielder focused purely on creativity and final-third damage—he’s explosive. Killer passes. Match-winning assists. That’s his superpower. But you can’t unlock it when he’s also covering for defensive midfielders.
The same principle governs cloud architecture: Choose the right tool for the right job.
You don’t run a data warehouse on a Lambda function. You don’t serve real-time APIs from a batch ETL pipeline. Each service excels at one thing—and the architecture succeeds because of separation of concerns.
LLMs are no different. Cognitive Load is real. Just as Bruno degrades when asked to multitask, LLMs degrade when instructions conflict. We need architecture that lets each component do what it does best.
📊 The Reality Check:
| What the Industry Shows | Why It Matters | Source |
|---|---|---|
| RAG systems achieve 95–99% accuracy vs. 30–50% for LLMs without retrieval | The accuracy gap shows how much grounding matters — retrieval turns unreliable outputs into dependable answers. | 🔒 Enterprise RAG Benchmarks (2025) · Analyst report |
| By the end of 2025, nearly every company had adopted MCP as the standard for tool integration | MCP’s rapid adoption mirrors what happened with container orchestration — the industry converged faster than expected. | The New Stack — MCP (2025) |
| Only 23% of organizations have scaled agentic AI beyond pilots | Most teams get stuck between pilot and production. A structured agent design (Persona, Skills, RAG, MCP) helps bridge that gap. | McKinsey State of AI (2025) |
The Concept
There are four distinct pillars of agent context. Each solves a specific problem — and the key differentiator is not when data is loaded, but what type of knowledge it represents and which direction it flows.
| Pillar | Knowledge Type | Data Direction | The Authority Anchor |
|---|---|---|---|
| 🎭 Persona | Identity (WHO am I) | Static → always in context | “Role Prompting” improves reasoning accuracy (research). |
| 📚 Skills | Procedure (HOW to do) | Read-only → loaded when task type matches | Tool Use / Function Calling standards. |
| 📖 RAG | Facts (WHAT is true) | Read-only → retrieved by semantic similarity | Lewis et al. (2020) original RAG paper. |
| 🔌 MCP | State + Actions (DO something) | Bidirectional → reads live state AND writes/executes | Anthropic’s Model Context Protocol. |
Validated by Google’s Framework
This structure mirrors the cognitive architecture defined in Google’s Context Engineering guide:
| Our Pillar | Google’s Equivalent | The Function |
|---|---|---|
| 🎭 Persona | System Instructions | Defines the “Role” and behavioral constraints. |
| 📚 Skills | Procedural Memory | Stores “How-to” knowledge (tools, code, workflows). |
| 📖 RAG | Semantic Memory | Stores “What-is” knowledge (facts, docs, data). |
| 🔌 MCP | Tool Interoperability | The standardized interface for action. |
The flow, in short: User Query → 🎭 Persona (WHO am I?) → 📚 Skills (HOW do I code?), which draw on 📖 RAG (WHAT is the schema?) and 🔌 MCP (ACT on the DB) → ✅ Result.
Pillar 1: Persona 🎭
Purpose: Define the State Machine and Governance Layer. When: Always present (System Prompt).
Recent research on Role Prompting shows that assigning a specific persona (e.g., “You are a Senior Security Engineer”) significantly improves reasoning capabilities. But at an enterprise level, Persona is much more than a “tone of voice.” It is the agent’s fundamental operating system.
It defines the decision boundaries and risk tolerance. If an agent encounters an ambiguous user request, the Persona dictates whether it should attempt a best-effort guess (high risk tolerance) or halt and escalate to a human (strict governance).
The Mistake: Using Persona for mechanics.
- ❌ “You are an agent that outputs JSON with keys x, y, z…”
- ✅ “You are a pragmatist who values working code over theoretical purity. You will never bypass security checks. If a request violates compliance, you will immediately trigger the escalate_to_human sequence.”
Governance Rule: The Persona defines the values and state transitions the agent uses to make trade-offs.
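As a concrete sketch, assuming a hypothetical role and escalation hook (neither is from a real deployment), a values-based Persona plus a deterministic governance check might look like:

```python
# Illustrative only: the role, the values, and the escalate_to_human hook are
# hypothetical examples, not a recommended production persona.
PERSONA = """\
You are a Senior Security Engineer: a pragmatist who values working code
over theoretical purity.

Values:
- Never bypass security checks, even when asked politely.
- Prefer the simplest design that meets the requirement.

State transitions:
- Ambiguous but low-risk request: make a best-effort attempt and say so.
- Request violates compliance: stop and trigger the escalate_to_human sequence.
"""

def needs_escalation(request: str) -> bool:
    """Toy governance check: route obviously non-compliant requests to a human."""
    banned = ("disable audit logging", "skip security review")
    return any(phrase in request.lower() for phrase in banned)

print(needs_escalation("Please disable audit logging"))  # True
```

The prompt carries values and state transitions; the escalation trigger itself lives in ordinary code, where it can be unit-tested.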
Pillar 2: Skills 📚
Purpose: Establish Deterministic Execution. When: Loaded on demand (Tool Definitions / API Contracts).
Skills are procedural knowledge. If Persona is the “character,” Skills are the “script.” In modern terms, these are the API contracts, Tools, or Functions that the model is allowed to call.
The Mistake: Hardcoding complex, multi-step business logic into the System Prompt.
The Fix: Encapsulate logic in a tool. Instead of wasting 1,000 tokens trying to explain the exact algorithmic steps of “How to validate a SEPA bank transfer,” give the agent a validate_sepa_transfer() tool.
Why? It moves complexity from probabilistic tokens (the LLM guessing how to execute logic) to deterministic code (a Python or TypeScript function executing safely). This drastically reduces failure rates, prevents hallucinations in business logic, and makes the system unit-testable.
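A minimal sketch of such a tool, implementing only the standard ISO 13616 mod-97 check (real SEPA validation covers far more, e.g. BIC and amount rules):

```python
def validate_sepa_transfer(iban: str) -> bool:
    """Deterministic tool: validate an IBAN with the ISO 13616 mod-97 check.

    The agent calls this instead of "reasoning out" the algorithm in tokens.
    """
    iban = iban.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34) or not iban.isalnum():
        return False
    # Move the first four characters to the end, map letters to numbers
    # (A=10 ... Z=35), and check the result modulo 97 equals 1.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

print(validate_sepa_transfer("DE89 3704 0044 0532 0130 00"))  # True
```

One prompt sentence ("call validate_sepa_transfer before executing a transfer") replaces hundreds of tokens of fragile algorithmic instruction, and the logic gets a test suite.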
The SKILL.md Standard
Anthropic formalized the skill pattern through their Skills architecture. Each skill lives in its own folder with a SKILL.md file containing YAML frontmatter and structured instructions:
.agent/skills/
├── code-review/
│ ├── SKILL.md # Main instructions (loaded when triggered)
│ ├── examples.md # Usage examples (loaded as needed)
│ └── scripts/
│ └── lint.py # Utility script (executed, not loaded)
├── database/
│ └── SKILL.md
└── deployment/
├── SKILL.md
└── reference/
├── aws.md # AWS-specific details
└── gcp.md # GCP-specific details
The key insight from Anthropic: The context window is a public good. Your skill shares the context window with everything else — system prompt, conversation history, other skills’ metadata, and the actual request. At startup, only the metadata (name, description) is pre-loaded. Claude reads SKILL.md only when the skill becomes relevant, and reads additional files only as needed.
📖 Cross-Reference: We explore the full progressive loading architecture in Skills: Progressive Context Disclosure.
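A toy sketch of this progressive-disclosure pattern (the frontmatter fields follow the SKILL.md convention described above; the parser and `Skill` class are illustrative, not Anthropic's implementation):

```python
import pathlib
import tempfile

SKILL_MD = """\
---
name: code-review
description: Review pull requests against the team's standards.
---
## Instructions
1. Check naming conventions.
2. Flag functions longer than 50 lines.
"""

class Skill:
    """Pre-load only frontmatter metadata; expose the body on demand."""

    def __init__(self, path: pathlib.Path):
        text = path.read_text()
        # Split "---\n<frontmatter>\n---\n<body>" into its three parts.
        _, frontmatter, self._body = text.split("---", 2)
        self.meta = dict(
            line.split(": ", 1) for line in frontmatter.strip().splitlines()
        )

    def load(self) -> str:
        """Called only when the skill's description matches the task, so the
        body never enters the context window speculatively."""
        return self._body.strip()

with tempfile.TemporaryDirectory() as d:
    skill_file = pathlib.Path(d) / "SKILL.md"
    skill_file.write_text(SKILL_MD)
    skill = Skill(skill_file)
    print(skill.meta["description"])  # cheap: always visible to the agent
    print(skill.load())               # expensive: injected only when triggered
```

The asymmetry is the point: every skill pays a few tokens of metadata rent; only the triggered skill pays for its full instructions.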
Degrees of Freedom
Not every skill needs the same level of control. Anthropic defines a freedom spectrum for skill instructions:
| Freedom Level | When to Use | Example |
|---|---|---|
| 🟢 High (text guidance) | Multiple valid approaches, context-dependent | “Analyze the code structure and suggest improvements” |
| 🟡 Medium (pseudocode) | Preferred pattern exists, some variation OK | “Use this template and customize as needed” |
| 🔴 Low (exact scripts) | Fragile operations, consistency critical | “Run exactly: python scripts/migrate.py --verify --backup” |
The analogy from Anthropic: Think of the agent as a robot on a path. On a narrow bridge with cliffs on both sides, there’s only one safe way forward — provide exact instructions (low freedom). In an open field with no hazards, many paths lead to success — give general direction and trust the agent (high freedom).
Match specificity to task fragility. Database migrations get exact scripts. Code reviews get general guidelines.
The Enterprise Lifecycle
For production deployments, Anthropic recommends a 6-stage skill lifecycle (detailed in their Skills for Enterprise guidance).
Key enterprise governance principles:
- Recall limits: Too many active skills degrade selection accuracy — evaluate as you add
- Start specific, consolidate later: Begin with narrow, workflow-specific skills; merge into role-based bundles only when evaluations confirm equivalent performance
- Separation of duties: Skill authors should not be their own reviewers
- Version pinning: Pin to specific versions in production; run full evaluation suites before promoting
🔗 Practical Resource: For a production-ready skill library, see Antigravity Awesome Skills — a community-curated collection of 889+ battle-tested agentic skills for Claude Code, Antigravity IDE, Cursor, and more. It demonstrates skills at ecosystem scale, with curated role-based bundles for different engineering workflows.
Pillar 3: RAG 📖
Purpose: Inject factual knowledge the agent doesn’t have — read-only context that enriches reasoning. When: Retrieved at query time by semantic similarity to the user’s question.
Patrick Lewis et al. introduced RAG in 2020 to solve the “knowledge cutoff” problem. In modern agentic architectures, RAG is about injecting the exact semantic context needed to solve a problem right now, without poisoning the context window with irrelevant data.
It requires strategic thinking regarding semantic boundaries: understanding when to use Vector Databases (for unstructured semantic search like finding similar policies) versus Graph Databases (for relational pathfinding, like understanding dependencies in a microservice architecture).
The key distinction: RAG gives the agent information to think with — it’s read-only knowledge injection. The agent receives facts, policies, or documents and uses them to reason. RAG never changes external state.
The Enterprise Litmus Test for RAG: The knowledge is factual (not procedural), the corpus is too large to fit in the system prompt, and the agent needs different slices of it depending on the query.
- Company policies that fill 500 pages? RAG — retrieve the 3 relevant sections per query.
- Product catalog with 10,000 SKUs? RAG — retrieve matching products by semantic search.
- Step-by-step deployment procedure? Skills — this is procedural (HOW), not factual (WHAT).
- Current stock price right now? MCP — this requires querying a live API, not a knowledge base.
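A toy sketch of query-time retrieval, with bag-of-words cosine similarity standing in for a real embedding model and vector database:

```python
import math
from collections import Counter

POLICY_SECTIONS = [
    "Refunds are issued within 14 days of purchase with a valid receipt.",
    "Employees must rotate credentials every 90 days.",
    "Travel expenses above 500 EUR require manager approval.",
]

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k most similar sections: read-only context, never an action."""
    q = embed(query)
    return sorted(POLICY_SECTIONS, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

print(retrieve("how do refunds work"))
```

Only the top-scoring sections are injected into the prompt; the corpus itself, whether 3 sections or 500 pages, never enters the context window wholesale.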
Pillar 4: MCP 🔌
Purpose: Give the agent hands — the ability to read live system state and execute actions in the real world. When: Invoked when the agent needs to interact with external systems, not just read static knowledge.
The Model Context Protocol (MCP) is Anthropic’s open standard for connecting AI models to data sources. It is the “USB-C” for agents.
The key distinction: MCP is bidirectional. Unlike RAG (which only reads from a knowledge base), MCP both reads live state AND writes/executes actions. When the agent checks the current deployment status → MCP. When the agent creates a GitHub issue → MCP. When the agent queries a database for the latest row → MCP. The data is live, the interaction is active, and the agent can change things.
RAG vs MCP — the acid test: Can the operation change the world? If yes → MCP. If the agent is only reading knowledge to inform its reasoning → RAG.
Why it matters for Architecture: Before MCP, every agent needed custom, brittle glue code to talk to GitHub, Slack, or Postgres. This tightly coupled the AI logic directly to the infrastructure.
With MCP, you achieve true separation of concerns. The connector is a standardized transport layer. Your infrastructure team can manage, secure, and scale the database connections independently, while the AI team focuses purely on the agent’s reasoning logic. This decoupling is mandatory for scaling agentic systems securely in the enterprise.
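To illustrate that separation in miniature (a plain-Python sketch of the tool-contract idea, not the actual MCP SDK; the tool name and statuses are hypothetical), the model sees only names and descriptions while execution stays in infrastructure-owned code:

```python
import json

TOOLS = {}

def tool(description: str):
    """Register a function as an agent-callable tool with a description."""
    def wrap(fn):
        TOOLS[fn.__name__] = {"description": description, "fn": fn}
        return fn
    return wrap

@tool("Get the current status of a deployment by environment name.")
def deployment_status(env: str) -> str:
    # Infrastructure owns this body; swap the backend without touching the agent.
    return {"prod": "healthy", "staging": "degraded"}.get(env, "unknown")

def tool_manifest() -> str:
    """What the model actually sees: names and descriptions, not implementations."""
    return json.dumps(
        {name: spec["description"] for name, spec in TOOLS.items()}, indent=2
    )

def call(name: str, **kwargs):
    """The transport layer: route a model's tool call to deterministic code."""
    return TOOLS[name]["fn"](**kwargs)

print(tool_manifest())
print(call("deployment_status", env="prod"))  # healthy
```

MCP standardizes exactly this boundary, so the manifest and the call routing no longer have to be rebuilt per integration.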
⚠️ Security Note: MCP gives agents hands — and hands can break things. The official MCP specification identifies 5 protocol-level attack vectors (Confused Deputy, SSRF, Token Passthrough, Session Hijacking, Scope Minimization) that go beyond basic “least privilege” protections. 📖 Deep Dive: For the full security model and production hardening guide, see MCP Best Practices: Tools That Do Not Overwhelm.
🆕 2025 Update: MCP Apps now support UI capabilities, allowing MCP servers to present visual interfaces directly to users — bridging the gap between headless tool invocation and interactive applications. MCP has also been contributed to the Agentic AI Foundation, joining industry efforts alongside Google’s A2A protocol.
The Decision Framework
How do you decide? Use the Knowledge-Type Test first, then confirm with the Change-Frequency Heuristic:
Primary: What type of knowledge is it?
| Ask This Question… | If Yes → | Example |
|---|---|---|
| Does it define who the agent is? | 🎭 Persona | Values, risk tolerance, reasoning style |
| Does it define how to do a task? | 📚 Skills | Procedures, workflows, step-by-step guides |
| Does it provide facts to think with? (read-only) | 📖 RAG | Policies, catalogs, documentation |
| Does it require live interaction or action? | 🔌 MCP | Creating issues, sending messages, deploying code |
Secondary: How often does it change?
| Change Frequency | Confirms… |
|---|---|
| Never (Values, Style) | 🎭 Persona |
| Quarterly (Procedures) | 📚 Skills |
| Daily/Weekly (Knowledge base) | 📖 RAG |
| Real-time (Live system state) | 🔌 MCP |
⚠️ Why frequency alone isn’t enough: A database schema that changes quarterly could be RAG (if you need to look up column types) or Skills (if you need the procedure for migrating it). The knowledge type matters more than the refresh rate.
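The two-step test can be sketched as a routing function; the questions and their order come from the tables above, while the function itself is illustrative:

```python
def choose_pillar(
    defines_identity: bool,
    defines_procedure: bool,
    provides_facts: bool,
    requires_live_action: bool,
) -> str:
    """Knowledge-Type Test, in the order of the table above. Sanity-check the
    answer against change frequency (Persona: never, Skills: quarterly,
    RAG: daily/weekly, MCP: real-time) before committing."""
    if defines_identity:
        return "Persona"
    if defines_procedure:
        return "Skills"
    if provides_facts:
        return "RAG"
    if requires_live_action:
        return "MCP"
    return "unclear: revisit the knowledge-type questions"

# A step-by-step deployment guide is procedural knowledge.
print(choose_pillar(False, True, False, False))  # Skills
```
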
The Grey Zone: When RAG and MCP Overlap
“But a database query can be both RAG and MCP!” — Yes. And that’s fine. Here’s how to think about it.
The same database can serve both pillars. The differentiator is query intent and how the result is used, not the data source:
| Scenario | Query Type | Result Is Used To… | Pillar |
|---|---|---|---|
| ”Find policies similar to this complaint” | Semantic (similarity search) | Enrich reasoning context | 📖 RAG |
| ”What’s the status of order #5678?” | Exact (structured lookup) | Provide the direct answer | 🔌 MCP |
| ”Find past bugs similar to this error” | Semantic (vector search) | Inform the agent’s diagnosis | 📖 RAG |
| ”Show overdue invoices and send reminders” | Exact (SQL query + action) | Trigger downstream action | 🔌 MCP |
The acid test in 2 questions:
- Is the query semantic or exact? Semantic similarity search (finding related knowledge) → RAG. Exact/structured lookup (getting this specific record) → MCP.
- Does the result enrich reasoning, or IS it the answer? If the agent uses the result as context to think harder → RAG. If the result is the answer itself or triggers an action → MCP.
💡 In practice: Many production agents use both for the same data source. A customer service agent might use RAG to retrieve similar past tickets (semantic context) AND MCP to look up this customer’s order status (exact query). The database is the same — Postgres. The pillar depends on the query’s purpose.
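The two acid-test questions can be written down directly; the boolean inputs stand in for the agent's own classification of the query:

```python
def route_query(is_semantic: bool, result_is_answer_or_action: bool) -> str:
    """Grey-zone acid test: same data source, different pillar by intent."""
    if is_semantic and not result_is_answer_or_action:
        return "RAG"   # similarity search that enriches reasoning
    if not is_semantic and result_is_answer_or_action:
        return "MCP"   # exact lookup whose result IS the answer or an action
    return "mixed: split into a RAG retrieval plus an MCP call"

# The table's scenarios:
print(route_query(True, False))   # RAG - "find policies similar to this complaint"
print(route_query(False, True))   # MCP - "what's the status of order #5678?"
```
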
🔍 The Common Traps
The framework looks clean. But here’s where teams actually get it wrong.
Most pillar misplacements happen because teams categorize by format (it’s a document → RAG) instead of behavior (it changes rarely → Skill). Three traps that catch even experienced engineers:
| The Trap | What They Do | Why It Breaks | The Fix |
|---|---|---|---|
| 📄 Coding standards in RAG | ”It’s a document, so it goes in RAG” | Coding standards change quarterly, not daily. RAG retrieval adds latency and may return partial matches. | Move to Skills — stable procedural knowledge loaded on demand |
| 🎭 User preferences in Persona | ”The agent should just know the user likes concise answers” | Persona is static. User preferences evolve per session and per user. Hardcoding them means one-size-fits-all. | Store in Semantic Memory (RAG) — retrieved per user, per session |
| 📋 API docs in System Prompt | ”The agent needs to know what tools it has” | Full API docs in the system prompt waste thousands of tokens. The agent only needs descriptions to decide; the execution happens in code. | Expose via MCP tool definitions — the agent sees what it can do, not how every endpoint works |
💡 The Meta-Principle: Classify by change frequency and access pattern, not by file format. A PDF can be a Skill (if it’s a stable procedure) or RAG (if it’s a living policy document). The content’s behavior determines its pillar, not its container.
Industry Applications
The 4 Pillars apply universally. Here’s how they map across domains:
| Pillar | 🏦 Banking | 🛒 Retail | 🎓 Education |
|---|---|---|---|
| 🎭 Persona | “Risk-aware advisor prioritizing compliance” | “Helpful shopping assistant with brand voice” | “Patient tutor adapting to learning pace” |
| 📚 Skills | Fraud investigation procedures, loan underwriting steps | Return processing workflow, inventory lookup | Lesson plan generation, assessment rubrics |
| 📖 RAG | Lending policies, rate sheets, compliance docs | Product catalog, pricing, promotions | Course materials, student records |
| 🔌 MCP | Core banking API, credit bureaus, fraud detection | Inventory system, payment gateway, shipping | LMS, gradebook, content library |
Education example: each student gets a tailored path. The Assessment Agent identifies gaps; the Curriculum Agent builds the plan; the Content Agent delivers materials at the right level.
📦 Case Study: The Vibe Context Stack
In Vibe Product Design, we implemented the 4 Pillars as a rigorous architectural stack to prevent “context rot” during complex design sessions.
1. 🎭 Persona (The Identity)
We define the Chief Architect not just as “an architect,” but through specific values: Rigorous, First-Principles Thinker, Skeptical of Complexity. This guides every trade-off decision (e.g., rejecting a microservice architecture for a simple MVP).
2. 📚 Skills (The Mechanics)
We extracted 92 procedural skills into markdown files. Instead of a 5,000-token prompt explaining how to write a C4 diagram, we simply load skills/mermaidjs-v11/SKILL.md when the agent reaches the diagramming step.
3. 📖 RAG (The Knowledge Base) - Target State
To prevent “context rot” during complex design sessions, we are moving beyond simple vector search to implement CRAG (Corrective RAG). If the internal LLM-as-Judge determines a generated architecture has low confidence or violates constraints, the agent will autonomously perform targeted re-retrieval to correct its own design.
4. 🔌 MCP (The Action) - Target State
We are designing Enterprise MCP integrations to establish Verified Design Loops. Instead of just generating theoretical diagrams, agents will use MCP tools to validate designs against real-world constraints (e.g., integrating AWS cost calculators, running SOC2/HIPAA compliance checklists) with full authentication and audit logging.
The Payoff: The system prompt remains under 800 tokens. The agent is “smart” because it accesses knowledge just-in-time, not because we stuffed it with facts.
Key Takeaways
- ✅ Don’t clutter context: Use the right pillar to keep the “reasoning brain” clear.
- ✅ Persona is for values: Use it to guide decisions, not just format output.
- ✅ Skills are deterministic: Move complex logic out of prompts and into code.
- ✅ Skills need structure: Follow the SKILL.md standard — YAML frontmatter for discovery, progressive disclosure for loading, and degrees of freedom matched to task fragility.
- ✅ Standardize with MCP: Don’t build custom integrations if an open standard exists.
What’s Next
- 📖 Previous article: The Orchestra: Why Multi-Agent AI Works — Why specialized agents outperform monolithic models.
- 📖 Next article: Skills: Progressive Context Disclosure — Escape the “Prompt Blob Monster” with on-demand procedural knowledge.
- 💬 Discuss: Which pillar is the biggest bottleneck in your current agents?
References
- Google Cloud Research — Context Engineering: Sessions & Memory (2025). Defines the distinction between Procedural Memory (Skills) and Semantic Memory (RAG) in agentic architectures.
- Anthropic — Prompt Engineering Guidelines. Source for Role Prompting effectiveness.
- Lewis et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (NeurIPS 2020).
- Anthropic — Skill Authoring Best Practices (2025). Official guidelines for SKILL.md structure, progressive disclosure, and degrees of freedom.
- Anthropic — Skills for Enterprise (2025). Governance, security review, lifecycle management, and organizing skills at scale.
- Anthropic — The Complete Guide to Building Skills for Claude (2025). Comprehensive PDF guide covering skill structure, testing tiers, and production deployment patterns.
❓ Frequently Asked Questions
What are the 4 pillars of agent context?
Persona (WHO the agent is), Skills (HOW to do tasks), RAG (WHAT knowledge to retrieve), and MCP (ACTION through external tools).
How do I decide between RAG vs Skills for my knowledge?
Use RAG for factual, frequently-updated knowledge (WHAT). Use Skills for procedural, step-by-step instructions (HOW). The Change-Frequency Heuristic: Skills are stable on the order of quarters; RAG content changes daily to weekly.
What is the Model Context Protocol (MCP)?
MCP is Anthropic's open standard for connecting AI agents to external tools and data sources, solving the N×M integration problem.
What is the SKILL.md pattern and why does it matter?
A SKILL.md file is a structured markdown file with YAML frontmatter (name, description) that defines procedural knowledge for an agent. Skills are loaded on-demand using progressive disclosure, keeping the context window clean. Anthropic formalized this pattern through their Skills architecture.
💬 Join the Discussion
Got questions, feedback, or want to share your experience building AI agents? Join our community of architects and engineers.