# Kaggle × Google 5-Day AI Agents Intensive — Full Breakdown, Day by Day
1.5 million learners, 5 days, one roadmap — everything covered in Kaggle's free AI Agents Intensive with Google.
Chatbots are solved. What comes next — systems that plan, call tools, manage memory, and coordinate with other agents to actually get things done — is where most engineers are stuck. This is the course that 1.5 million of them used to bridge that gap.
The 5-Day AI Agents Intensive was originally a live cohort run by Google and Kaggle in November 2025. Every whitepaper, codelab, and recorded livestream is now free and self-paced. This post breaks down exactly what each day covers so you can navigate it with intent.
## 🗺️ The 5-Day Arc — What You’re Actually Learning

The course follows a deliberate progression: foundations → tools → memory → quality → production. Don’t treat it as a playlist — the days build on each other.

```mermaid
flowchart LR
    D1[Day 1<br/>🧠 Agent Foundations<br/>& Architectures]
    D2[Day 2<br/>🔧 Tools & MCP<br/>Interoperability]
    D3[Day 3<br/>💾 Context Engineering<br/>Sessions & Memory]
    D4[Day 4<br/>📊 Agent Quality<br/>Obs & Evaluation]
    D5[Day 5<br/>🚀 Prototype to<br/>Production & A2A]
    D1 --> D2 --> D3 --> D4 --> D5
    style D1 fill:#4A90D9,color:#fff,stroke:none
    style D2 fill:#5BA85A,color:#fff,stroke:none
    style D3 fill:#E8A838,color:#fff,stroke:none
    style D4 fill:#9B6EBD,color:#fff,stroke:none
    style D5 fill:#D9534F,color:#fff,stroke:none
```
| Day | Theme | Core Concept |
|---|---|---|
| 1 | Agent Foundations | What is an agent? How do agentic architectures differ from plain LLM calls? |
| 2 | Tools & MCP | How agents “take action” — custom tools, APIs, Model Context Protocol |
| 3 | Context Engineering | Sessions as short-term memory, persistent memory across conversations |
| 4 | Agent Quality | Observability (logs, traces, metrics), LLM-as-Judge, HITL evaluation |
| 5 | Prototype to Production | Deployment, scaling, Agent2Agent (A2A) Protocol for multi-agent systems |
## 🧠 Day 1 — Introduction to Agents & Agentic Architectures
The first day re-wires how you think about LLM applications. A chatbot is a single call. An agent is a loop — it receives a goal, makes a plan, calls tools, observes results, and replans until it’s done.
📚 Day 1 Resources — 📄 Whitepaper · 🎙 Podcast (~30 min) · 💻 Codelab 1A — First Agent · 💻 Codelab 1B — Multi-Agent · 📺 Livestream
### What the Whitepaper Covers
The Day 1 whitepaper introduces a taxonomy of agent capabilities and argues that agents aren’t just “smarter LLMs” — they represent a different architectural paradigm:
- Perception — what inputs an agent can process (text, code, tool outputs, sensor data)
- Reasoning — how the agent plans and decides (ReAct pattern, chain-of-thought, tree-of-thought)
- Action — what the agent can do (search, code execution, API calls, sub-agent delegation)
- Memory — how state persists across steps (covered deeply in Day 3)
It also introduces the concept of Agent Ops — the discipline of running agents reliably in production — and covers the security challenges of agent identity and constrained policy execution.
### The Agent Loop in Practice

```mermaid
flowchart TD
    Goal([User Goal]) --> Planner[🤔 Planner<br/>LLM Reasoning + ReAct]
    Planner -->|tool_call| Tools[(🔧 Tool Registry<br/>Search / Code / APIs)]
    Tools -->|observation| Observer[👁️ Observer<br/>Result Parsing]
    Observer -->|update context| Memory[(💾 Short-Term Memory<br/>Context Window)]
    Memory --> Planner
    Observer -->|done?| Exit{✅ Done?}
    Exit -->|no| Planner
    Exit -->|yes| Output([📤 Final Answer])
    classDef blue fill:#4A90D9,color:#fff,stroke:none
    classDef green fill:#5BA85A,color:#fff,stroke:none
    classDef orange fill:#E8A838,color:#fff,stroke:none
    classDef purple fill:#9B6EBD,color:#fff,stroke:none
    classDef red fill:#D9534F,color:#fff,stroke:none
    class Goal,Output green
    class Planner blue
    class Tools orange
    class Observer,Memory purple
    class Exit red
```
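To make the loop concrete, here is a deliberately minimal sketch in plain Python. This is not ADK code: the planner is a stub standing in for an LLM call, and the tool registry contains one fake search tool. ADK manages this control flow for you, but the shape is the same.

```python
# Minimal plan -> act -> observe loop (toy illustration, not ADK).

def plan_next_step(goal, history):
    """Stand-in for LLM reasoning: decide the next action or finish."""
    if any(obs.startswith("result:") for _, obs in history):
        return ("finish", None)      # goal satisfied once we have a result
    return ("search", goal)          # otherwise, call the search tool

def run_tool(name, arg):
    """Stand-in tool registry with a single fake search tool."""
    tools = {"search": lambda q: f"result: top hit for {q!r}"}
    return tools[name](arg)

def agent_loop(goal, max_steps=5):
    history = []                     # short-term memory: (action, observation)
    for _ in range(max_steps):
        action, arg = plan_next_step(goal, history)
        if action == "finish":
            return history[-1][1]    # last observation becomes the answer
        observation = run_tool(action, arg)
        history.append((action, observation))
    raise RuntimeError("step budget exhausted")

print(agent_loop("current weather in Paris"))
```

Note the `max_steps` budget: real agent runtimes impose the same cap so a confused planner can’t loop forever.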
### Codelabs: What You Build
- Codelab 1A — Build your first AI agent using Gemini and ADK (Agent Development Kit). Give it access to Google Search so it can answer questions with real-time information.
- Codelab 1B — Build your first multi-agent system using ADK. Create teams of specialized agents and explore different architectural patterns (sequential, parallel, hierarchical).
ADK = Agent Development Kit, Google’s open-source framework for building production agents on top of Gemini. Think of it as the orchestration layer that manages the agent loop for you.
### Key Takeaways from Day 1
| Concept | Explanation |
|---|---|
| Agent vs. LLM app | Agent: goal → loop → tools → output. LLM app: prompt → single call → output |
| ReAct pattern | Reasoning + Acting interleaved: Thought → Action → Observation → Thought… |
| Multi-agent architectures | Specialized agents (planner, executor, critic) coordinated by an orchestrator |
| Agent Ops | Operational discipline for reliability, governance, security in production agents |
Guests on the Day 1 livestream: Kanchana Patlolla, Anant Nawalgaria (course founder), Kristopher Overholt, Hangfei Lin, Alan Blount, Mike Clark, Michael Gerstenhaber, and Antonio Gulli — all from Google.
## 🔧 Day 2 — Agent Tools & Interoperability with MCP
An agent that can only reason but can’t act is just a fancy chatbot. Day 2 is about giving agents real power: the ability to call Python functions, external APIs, databases, and services — and how to do it systematically via the Model Context Protocol (MCP).
📚 Day 2 Resources — 📄 Whitepaper · 🎙 Podcast (~20 min) · 💻 Codelab 2A — Agent Tools · 💻 Codelab 2B — MCP & Best Practices · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 2 whitepaper dives into external tool functions — the mechanism that lets agents interact with the world beyond their training data. Key areas:
- Tool design best practices — how to write tool descriptions that LLMs reliably understand and call correctly
- Tool types — retrieval tools (search, RAG), action tools (APIs, code runners), computation tools
- Model Context Protocol (MCP) — an open standard for how agents discover and communicate with tool servers
- MCP architectural components: host, client, server
- Communication layer: JSON-RPC over stdio or HTTP/SSE
- Enterprise risks and current readiness gaps
### Understanding MCP

MCP is one of the most important emerging standards in the AI agent ecosystem. It decouples tool providers from agent frameworks — any MCP-compatible tool can be plugged into any MCP-compatible agent.

```mermaid
flowchart LR
    Agent[🤖 Agent<br/>ADK / LangChain / etc.] -->|MCP Client| Protocol[⚡ MCP Protocol<br/>JSON-RPC]
    Protocol -->|MCP Server| ToolA[🗂️ File System<br/>MCP Server]
    Protocol -->|MCP Server| ToolB[🌐 Web Search<br/>MCP Server]
    Protocol -->|MCP Server| ToolC[🗄️ Database<br/>MCP Server]
    Protocol -->|MCP Server| ToolD[💻 Code Exec<br/>MCP Server]
    style Agent fill:#4A90D9,color:#fff,stroke:none
    style Protocol fill:#E8A838,color:#fff,stroke:none
    style ToolA fill:#5BA85A,color:#fff,stroke:none
    style ToolB fill:#5BA85A,color:#fff,stroke:none
    style ToolC fill:#5BA85A,color:#fff,stroke:none
    style ToolD fill:#5BA85A,color:#fff,stroke:none
```
MCP was developed by Anthropic and has been adopted across the ecosystem (Claude, ADK, Cursor, Zed, etc.). It’s becoming the USB-C of AI tooling — one standard connector for everything.
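On the wire, an MCP tool invocation is an ordinary JSON-RPC 2.0 request. The `tools/call` method name comes from the MCP spec; the tool name and arguments below are hypothetical, purely for illustration:

```python
import json

# Illustrative shape of an MCP tool invocation (JSON-RPC 2.0).
# "tools/call" is the MCP method; "web_search" is a hypothetical tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "web_search",                 # hypothetical MCP tool
        "arguments": {"query": "ADK docs"},   # schema defined by the server
    },
}

wire = json.dumps(request)  # sent over stdio or HTTP/SSE to the MCP server
print(wire)
```

The host never hard-codes this list: it first calls `tools/list` to discover what the server offers, which is exactly the decoupling the paragraph above describes.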
### Codelabs: What You Build
- Codelab 2A — Turn your own Python functions into agent-callable tools. Learn how to write tool schemas that the LLM reliably calls with correct parameters.
- Codelab 2B — Use MCP to extend your agent’s toolset, and implement long-running operations with a human-in-the-loop approval step: the agent pauses mid-execution, waits for human confirmation, then resumes.
Long-running ops with HITL is a critical production pattern. Never let an agent take irreversible actions (send email, delete data, make payments) without a human checkpoint.
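The checkpoint pattern fits in a few lines. In this sketch, the `approve` callback stands in for a real review UI or approval queue, and the action names are illustrative:

```python
# Sketch of a human-in-the-loop checkpoint for irreversible actions.

IRREVERSIBLE = {"send_email", "delete_data", "make_payment"}

def execute(action, args, approve):
    """Run an action, pausing for human approval when it is irreversible.

    `approve` is a callback standing in for a real review step: the agent
    blocks here until a human confirms or rejects.
    """
    if action in IRREVERSIBLE and not approve(action, args):
        return {"status": "rejected", "action": action}
    return {"status": "done", "action": action}

# A stand-in reviewer that rejects everything:
print(execute("send_email", {"to": "x@example.com"}, approve=lambda a, k: False))
print(execute("search", {"q": "docs"}, approve=lambda a, k: False))
```

In production the "pause" is usually durable (the pending action is persisted and the agent resumes later), but the gate itself looks like this.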
### Tool Design Principles from Day 2
| Principle | Why It Matters |
|---|---|
| Clear function names | LLMs pick tools by name — vague names = wrong tool calls |
| Detailed docstrings | The docstring IS the tool description the LLM reads |
| Narrow scope | One tool, one job — avoids the LLM trying to overload a single tool |
| Typed parameters | Strong typing reduces hallucinated parameter values |
| Idempotent when possible | Safe to retry on failure without side effects |
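Here is what those principles look like in a single tool definition. This is framework-agnostic Python with a stubbed data source; frameworks like ADK typically derive the tool schema the LLM sees from exactly this kind of signature and docstring.

```python
# A tool following the table above: narrow scope, descriptive name,
# typed parameters, and a docstring that doubles as the tool description.

def get_exchange_rate(base_currency: str, target_currency: str) -> float:
    """Return the current exchange rate from base_currency to target_currency.

    Args:
        base_currency: ISO 4217 code to convert from, e.g. "USD".
        target_currency: ISO 4217 code to convert to, e.g. "EUR".
    """
    rates = {("USD", "EUR"): 0.92}  # stubbed data for illustration
    return rates[(base_currency, target_currency)]

print(get_exchange_rate("USD", "EUR"))
```

Contrast this with a vague `do_finance(data: dict)` tool: the LLM has no reliable way to pick it or fill its parameters, which is where most wrong tool calls come from.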
Day 2 livestream guests: Edward Grefenstette, Mike Styer, Oriol Vinyals (Google), and Alex Wissner-Gross (Reified).
## 💾 Day 3 — Context Engineering: Sessions & Memory
An agent that forgets everything after each call can’t handle complex, multi-turn tasks. Day 3 is about making agents stateful — understanding the difference between what lives in the context window (Sessions) and what persists across conversations (Memory).
📚 Day 3 Resources — 📄 Whitepaper · 🎙 Podcast (~25 min) · 💻 Codelabs · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 3 whitepaper introduces context engineering as a first-class discipline: the art of dynamically assembling and managing information in the agent’s context window. Two key abstractions:
- Sessions — the container for a single, immediate conversation’s history. Everything in the current exchange: messages, tool call results, intermediate reasoning steps.
- Memory — the long-term persistence mechanism. Information that survives across sessions and is retrieved when relevant.
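A toy illustration of the split (not the ADK API): a `Session` holds the current conversation’s events, while a `MemoryStore` persists facts that survive across sessions.

```python
# Session = short-term, per-conversation state.
# MemoryStore = long-term state that outlives any single session.

class Session:
    def __init__(self):
        self.events = []  # messages, tool results, reasoning steps

    def append(self, role, content):
        self.events.append({"role": role, "content": content})

class MemoryStore:
    def __init__(self):
        self.facts = {}   # durable key-value facts

    def remember(self, key, value):
        self.facts[key] = value

    def recall(self, key, default=None):
        return self.facts.get(key, default)

memory = MemoryStore()

# Session 1: the user states a preference; we consolidate it into memory.
s1 = Session()
s1.append("user", "Always answer in French.")
memory.remember("language_preference", "French")

# Session 2: a fresh session starts empty, but memory survives.
s2 = Session()
print(len(s2.events), memory.recall("language_preference"))
```

The consolidation step (deciding which session events become durable facts) is the interesting design problem, and it is what Codelab 3B exercises.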
### The Memory Stack

```mermaid
flowchart TD
    Input([New User Message]) --> ContextBuilder[🏗️ Context Builder<br/>Assemble Prompt]
    subgraph Short["⚡ Short-Term (Session)"]
        ConvHistory[Conversation History]
        ToolResults[Tool Call Results]
        ScratchPad[Working Memory / Scratchpad]
    end
    subgraph Long["🗄️ Long-Term (Memory Store)"]
        VectorDB[(Vector DB<br/>Semantic Search)]
        KV[(Key-Value Store<br/>Facts & Preferences)]
        Episodic[(Episodic Memory<br/>Past Sessions)]
    end
    ContextBuilder --> Short
    ContextBuilder -->|semantic retrieval| Long
    Short --> LLM[🤖 LLM Reasoning]
    Long --> LLM
    LLM --> Output([Response + Memory Updates])
    Output -->|important facts| Long
    style Input fill:#5BA85A,color:#fff,stroke:none
    style Output fill:#5BA85A,color:#fff,stroke:none
    style ContextBuilder fill:#4A90D9,color:#fff,stroke:none
    style LLM fill:#9B6EBD,color:#fff,stroke:none
```
### Context Engineering in Practice
This is subtler than it sounds. The context window is finite and expensive — you can’t dump everything in. Context engineering means:
| Strategy | What It Solves |
|---|---|
| Summarization | Compress old conversation history to free up tokens |
| Selective retrieval | Pull only the most relevant memories via embedding search |
| Structured injection | Format retrieved info so the LLM can use it reliably |
| Context windowing | Slide or truncate history to stay within token limits |
| Memory consolidation | Periodically distill conversation history into durable facts |
Common trap: Stuffing everything into the context window. It’s slower, more expensive, and paradoxically makes the agent less focused. Retrieval-augmented memory is almost always better than full history dumps.
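Several of those strategies can be combined in one context builder. The sketch below fakes token counting with word counts and uses naive keyword overlap in place of embedding search; a real system would use a tokenizer, an embedding index, and an LLM-generated summary.

```python
# Budget-aware context assembly: retrieve only relevant memories, keep the
# newest turns verbatim, and replace anything over budget with a summary marker.

def tokens(text):
    return len(text.split())  # crude stand-in for a real tokenizer

def build_context(history, memories, query, budget=50):
    parts, spent = [], 0
    # 1. Selective retrieval: only memories sharing a word with the query.
    for m in memories:
        if set(m.lower().split()) & set(query.lower().split()):
            parts.append(f"[memory] {m}")
            spent += tokens(m)
    # 2. Recency window: keep newest turns until the budget is spent.
    kept = []
    for turn in reversed(history):
        if spent + tokens(turn) > budget:
            kept.append("[summary] earlier turns omitted")
            break
        kept.append(turn)
        spent += tokens(turn)
    parts.extend(reversed(kept))
    parts.append(f"[user] {query}")
    return "\n".join(parts)

ctx = build_context(
    history=["user: hi", "agent: hello", "user: " + "blah " * 60],
    memories=["User prefers metric units"],
    query="What is the forecast in metric units?",
)
print(ctx)
```

The point of the exercise: the oversized turn never reaches the prompt, while the relevant memory always does.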
### Codelabs: What You Build
- Codelab 3A — Make agents stateful by managing conversation history through context engineering in ADK. Implement working memory within a session so the agent holds context across multiple turns.
- Codelab 3B — Give your agent long-term memory that persists across sessions. When you come back a week later, the agent remembers your preferences and past decisions.
Day 3 livestream guests: Steven Johnson, Kimberly Milam, Julia Wiesinger (Google), and Jay Alammar from Cohere — author of the iconic “Illustrated Transformer” blog posts.
## 📊 Day 4 — Agent Quality: Observability & Evaluation
You can’t improve what you can’t see. Day 4 is arguably the most underappreciated day — it’s the one that separates toy agents from deployable ones. If you skip this, you’re flying blind.
📚 Day 4 Resources — 📄 Whitepaper · 🎙 Podcast (~25 min) · 💻 Codelabs · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 4 whitepaper introduces a holistic evaluation framework built on three technical pillars:
- Logs (The Diary) — a timestamped record of every event: tool calls, errors, LLM responses. Essential for post-mortem debugging.
- Traces (The Narrative) — end-to-end causal chains showing how the agent arrived at an answer. Which reasoning steps led where, which tools were called in what order.
- Metrics (The Health Report) — quantified performance: task completion rate, latency, tool call accuracy, error rates.
These three form a continuous feedback loop that drives iterative improvement.
### The Observability Stack

```mermaid
flowchart LR
    Agent[🤖 Agent Execution] --> Logs[📋 Logs<br/>Timestamped Events<br/>Errors & Debug Info]
    Agent --> Traces[🕸️ Traces<br/>Causal Chains<br/>Step-by-Step Narrative]
    Agent --> Metrics[📈 Metrics<br/>Latency / Task Rate<br/>Tool Accuracy]
    Logs --> Dashboard[🖥️ Monitoring<br/>Dashboard]
    Traces --> Dashboard
    Metrics --> Dashboard
    Dashboard --> Eval[⚖️ Evaluation<br/>LLM-as-Judge<br/>HITL Review]
    Eval --> Improve[🔄 Improvement<br/>Prompt / Tool / Architecture]
    Improve --> Agent
    style Agent fill:#4A90D9,color:#fff,stroke:none
    style Logs fill:#5BA85A,color:#fff,stroke:none
    style Traces fill:#E8A838,color:#fff,stroke:none
    style Metrics fill:#9B6EBD,color:#fff,stroke:none
    style Dashboard fill:#D9534F,color:#fff,stroke:none
    style Eval fill:#4A90D9,color:#fff,stroke:none
    style Improve fill:#4A90D9,color:#fff,stroke:none
```
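A minimal sketch of instrumenting a single tool call with all three pillars, using in-memory lists and dicts in place of a real telemetry backend (the codelabs use proper tracing infrastructure):

```python
import functools
import time

# Toy sinks for the three pillars: the diary, the narrative, the health report.
LOGS, TRACES, METRICS = [], [], {}

def observed(fn):
    """Decorator that emits a log entry, a trace span, and a call-count metric."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            LOGS.append({"event": "tool_ok", "tool": fn.__name__})
            return result
        except Exception as e:
            LOGS.append({"event": "tool_error", "tool": fn.__name__, "error": str(e)})
            raise
        finally:
            TRACES.append({"span": fn.__name__,
                           "duration_s": time.perf_counter() - start})
            METRICS[f"{fn.__name__}_calls"] = METRICS.get(f"{fn.__name__}_calls", 0) + 1
    return wrapper

@observed
def search(query: str) -> str:
    return f"results for {query!r}"

search("agent evals")
print(LOGS[0]["event"], METRICS["search_calls"], len(TRACES))
```

Even this toy version answers the questions that matter in a post-mortem: which tool ran, whether it succeeded, and how long it took.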
### Evaluation Strategies: LLM-as-Judge vs. HITL
One of the most practical innovations covered in Day 4 is using an LLM to evaluate another LLM’s outputs at scale — LLM-as-a-Judge. Combined with periodic Human-in-the-Loop checks, this creates a scalable quality pipeline.
| Evaluation Method | Scalability | Accuracy | When to Use |
|---|---|---|---|
| Manual human review | ❌ Low | ✅ High | Gold standard for high-stakes outputs |
| Rule-based assertions | ✅ High | Medium | Structured outputs, format checks |
| LLM-as-a-Judge | ✅ High | ✅ High | Open-ended responses, nuanced quality |
| HITL (Human-in-the-Loop) | Medium | ✅ High | Irreversible actions, ambiguous cases |
| Regression test suite | ✅ High | Medium | Catching regressions during development |
Prioritize evaluation early — on Day 3, not Day 5. Add even basic assertions (golden task checks, format validation) before you start iterating on architecture. Polishing an unverified agent is the #1 time sink in agent development.
LLM-as-Judge pitfall: The judge can share the same biases and blind spots as the agent being evaluated. Always validate your judge’s verdicts against human ratings on a calibration set before trusting it at scale.
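Here is a sketch of that calibration step. The `judge` function below is a trivial stub standing in for a real LLM call; the part worth copying is the agreement check against human labels, which you would run before trusting any judge at scale.

```python
# LLM-as-Judge with a calibration check (judge() is a stub, not an LLM call).

def judge(question, answer):
    """Stub judge: scores 1 if the answer shares any term with the question."""
    overlap = set(question.lower().split()) & set(answer.lower().split())
    return 1 if overlap else 0

def calibrate(judge_fn, labeled_examples):
    """Agreement rate between judge verdicts and human labels (0.0 to 1.0)."""
    hits = sum(judge_fn(q, a) == human for q, a, human in labeled_examples)
    return hits / len(labeled_examples)

calibration_set = [
    ("capital of France", "Paris is the capital of France", 1),  # human: good
    ("capital of France", "I like turtles", 0),                  # human: bad
]
agreement = calibrate(judge, calibration_set)
print(f"judge/human agreement: {agreement:.0%}")
```

If agreement on the calibration set is low, fix the judge (prompt, rubric, model) before scaling it, not after.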
### Codelabs: What You Build
- Codelab 4A — Instrument your agent with logs, traces, and metrics. Get full visibility into the agent’s decision-making process — see exactly why it chose a particular tool call, where it hesitated, and where it failed.
- Codelab 4B — Evaluate your agent systematically: score response quality, measure tool usage accuracy, and set up a feedback loop to guide improvement.
Day 4 livestream guests: Wafae Bakkali, Turan Bulmus, Sian Gooding (Google), and Jiwei Liu from NVIDIA.
## 🚀 Day 5 — Prototype to Production & Agent2Agent (A2A) Protocol
Day 5 closes the loop. You’ve built an agent, given it tools, given it memory, and made it observable. Now: how do you take it from a working notebook to something other people can actually use?
📚 Day 5 Resources — 📄 Whitepaper · 🎙 Podcast (~25 min) · 💻 Codelabs · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 5 whitepaper provides a technical guide to the full operational lifecycle of an AI agent:
- Containerization and deployment — packaging agents as services others can call
- Scaling strategies — handling concurrent requests, load balancing, stateful vs. stateless agent design
- Reliability patterns — retries, fallbacks, circuit breakers for when tool calls fail
- Agent2Agent (A2A) Protocol — an open standard for how multiple agents communicate, delegate tasks, and return results to each other
### The Agent2Agent (A2A) Protocol

A2A is to multi-agent systems what MCP is to tools: a standardized interface that lets agents communicate regardless of what framework they’re built in.

```mermaid
flowchart TD
    User([👤 User Request]) --> Orchestrator[🎯 Orchestrator Agent<br/>Planning & Routing]
    Orchestrator -->|A2A| ResearchAgent[🔍 Research Agent<br/>Web Search + RAG]
    Orchestrator -->|A2A| CodeAgent[💻 Code Agent<br/>Code Gen + Execution]
    Orchestrator -->|A2A| ReviewAgent[✅ Review Agent<br/>Quality Checking]
    ResearchAgent -->|A2A result| Orchestrator
    CodeAgent -->|A2A result| Orchestrator
    ReviewAgent -->|A2A result| Orchestrator
    Orchestrator --> Output([📤 Final Deliverable])
    style User fill:#5BA85A,color:#fff,stroke:none
    style Output fill:#5BA85A,color:#fff,stroke:none
    style Orchestrator fill:#4A90D9,color:#fff,stroke:none
    style ResearchAgent fill:#E8A838,color:#fff,stroke:none
    style CodeAgent fill:#9B6EBD,color:#fff,stroke:none
    style ReviewAgent fill:#D9534F,color:#fff,stroke:none
```
A2A was open-sourced by Google in early 2025. It defines a standard JSON-based message format and a discovery protocol so orchestrators can find and delegate to remote agents — even those running in separate services or clouds.
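For intuition, here is the rough shape of one delegation round trip. The field names are a simplified sketch rather than the normative A2A schema, and the remote research agent is a local stand-in:

```python
import json
import uuid

# Simplified sketch of orchestrator -> remote agent delegation.
# Field names are illustrative, not the normative A2A schema.

def make_task(skill, payload):
    return {
        "taskId": str(uuid.uuid4()),
        "skill": skill,  # a capability the remote agent advertises
        "message": {"role": "user", "parts": [{"type": "text", "text": payload}]},
    }

def research_agent(task):
    """Stand-in remote agent: returns a completed result for the task."""
    query = task["message"]["parts"][0]["text"]
    return {"taskId": task["taskId"], "status": "completed",
            "result": f"3 sources found for {query!r}"}

task = make_task("web_research", "history of the A2A protocol")
reply = research_agent(json.loads(json.dumps(task)))  # JSON round trip = "the wire"
print(reply["status"])
```

The JSON round trip is the whole point: because the envelope is plain JSON with agreed semantics, the orchestrator and the specialist can live in different frameworks, services, or clouds.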
### Production Deployment Checklist
Day 5 also covers the operational patterns that separate demos from real deployments:
| Concern | Pattern |
|---|---|
| Stateless design | Keep agent logic stateless; externalize session/memory to storage |
| Retry logic | Exponential backoff on tool call failures; never fail silently |
| Fallback paths | Define what the agent should do when tools are unavailable |
| Rate limiting | Respect API quotas; implement client-side rate limiting |
| Logging for audit | Every action logged with user ID, timestamp, and outcome |
| Security | Constrained tool permissions; never give agents broader access than needed |
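The retry row from the table, as a small wrapper around a deliberately flaky tool. Note the two rules it encodes: back off exponentially between attempts, and re-raise rather than fail silently when the budget is exhausted.

```python
import time

def with_retries(fn, attempts=4, base_delay=0.01):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error, never swallow it
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s...

calls = {"n": 0}
def flaky_tool():
    """Simulated tool that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky_tool))  # succeeds on the third attempt
```

In a real deployment you would retry only on errors you know are transient (timeouts, 429s, 5xx), which is also why idempotent tool design from Day 2 matters here.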
### Codelabs: What You Build
- Codelab 5A — Implement the A2A Protocol to have agents communicate with each other. Build a multi-agent pipeline where an orchestrator routes tasks to specialist agents and aggregates their results.
- Codelab 5B (Optional) — Deploy your agent to Agent Engine on Google Cloud — go from a local notebook to a scalable, hosted service.
If you only have time for one optional codelab in the entire course, do 5B. Deploying an agent to production — even once — will teach you more about production realities than any whitepaper.
Day 5 livestream guests: Will Grannis, Sokratis Kartakis, Elia Secchi, and Saurabh Tiwary — all from Google.
## 🧭 How to Work Through This Course
A few strategic notes for getting the most out of the self-paced version:
Recommended flow per day:
- Listen to the podcast summary (20–30 min) for the mental model
- Read the whitepaper (1–2 hrs) for depth
- Run the codelabs (2–3 hrs) to build real intuition
- Watch the livestream replay at 1.5× speed for Q&A insights
Before Day 1: Pick a real problem you want to automate — a research workflow, a code review pipeline, a reporting task. Shape the capstone around it. Abstract exercises are forgettable; concrete problems are not.
Don’t skip Day 4. It’s the least flashy day and the most commonly skipped. It’s also the reason most agents fail in production. Add evaluation before you start iterating on architecture.
## 📌 All Resources in One Place
### 🗂️ Course Home & Community
| Resource | Link |
|---|---|
| 🎓 Kaggle Learn Guide (free, self-paced) | kaggle.com/learn-guide/5-day-agents |
| 📺 YouTube Playlist (all 5 livestreams) | 5-Day AI Agents Intensive Playlist |
| 💬 Kaggle Discord | discord.gg/kaggle |
| 🏆 Capstone Projects | Competition discussion |
| 📰 Google’s Recap Blog | blog.google recap |
| ⚙️ Agent Development Kit (ADK) | google.github.io/adk-docs |
| 🔧 Codelab Troubleshooting Guide | Day 0 — FAQs |
### 📅 Per-Day Quick Access
| Day | Whitepaper | Podcast | Codelab A | Codelab B | Livestream |
|---|---|---|---|---|---|
| Day 1 — Intro to Agents | 📄 | 🎙 | First Agent | Multi-Agent | 📺 |
| Day 2 — Tools & MCP | 📄 | 🎙 | Agent Tools | MCP & Best Practices | 📺 |
| Day 3 — Context & Memory | 📄 | 🎙 | Sessions | Long-term Memory | 📺 |
| Day 4 — Agent Quality | 📄 | 🎙 | Observability | Evaluation | 📺 |
| Day 5 — Prototype to Prod | 📄 | 🎙 | A2A Protocol | Deploy to Agent Engine | 📺 |
## 💡 One-Sentence Intuition
An LLM answers questions; an agent completes goals — and understanding the full stack from tool calling to evaluation to deployment is what makes the difference between a demo and a system.
Saving this as a reference for the FYP toolkit — the context engineering and evaluation frameworks in Days 3–4 are directly applicable to building reliable reconstruction pipelines.
📺 Full playlist: 5-Day AI Agents Intensive — All 5 Livestreams