# Kaggle × Google 5-Day AI Agents Intensive — Full Breakdown, Day by Day
1.5 million learners, 5 days, one roadmap — everything covered in Kaggle's free AI Agents Intensive with Google.
Chatbots are solved. What comes next — systems that plan, call tools, manage memory, and coordinate with other agents to actually get things done — is where most engineers are stuck. This is the course that 1.5 million of them used to bridge that gap.
The 5-Day AI Agents Intensive was originally a live cohort run by Google and Kaggle in November 2025. Every whitepaper, codelab, and recorded livestream is now free and self-paced. This post breaks down exactly what each day covers so you can navigate it with intent.
## 🗺️ The 5-Day Arc — What You’re Actually Learning

The course follows a deliberate progression: foundations → tools → memory → quality → production. Don’t treat it as a playlist — the days build on each other.

```mermaid
flowchart LR
    D1[Day 1<br/>🧠 Agent Foundations<br/>& Architectures]
    D2[Day 2<br/>🔧 Tools & MCP<br/>Interoperability]
    D3[Day 3<br/>💾 Context Engineering<br/>Sessions & Memory]
    D4[Day 4<br/>📊 Agent Quality<br/>Obs & Evaluation]
    D5[Day 5<br/>🚀 Prototype to<br/>Production & A2A]
    D1 --> D2 --> D3 --> D4 --> D5
    style D1 fill:#4A90D9,color:#fff,stroke:none
    style D2 fill:#5BA85A,color:#fff,stroke:none
    style D3 fill:#E8A838,color:#fff,stroke:none
    style D4 fill:#9B6EBD,color:#fff,stroke:none
    style D5 fill:#D9534F,color:#fff,stroke:none
```
| Day | Theme | Core Concept |
|---|---|---|
| 1 | Agent Foundations | What is an agent? How do agentic architectures differ from plain LLM calls? |
| 2 | Tools & MCP | How agents “take action” — custom tools, APIs, Model Context Protocol |
| 3 | Context Engineering | Sessions as short-term memory, persistent memory across conversations |
| 4 | Agent Quality | Observability (logs, traces, metrics), LLM-as-Judge, HITL evaluation |
| 5 | Prototype to Production | Deployment, scaling, Agent2Agent (A2A) Protocol for multi-agent systems |
## 🧠 Day 1 — Introduction to Agents & Agentic Architectures
The first day re-wires how you think about LLM applications. A chatbot is a single call. An agent is a loop — it receives a goal, makes a plan, calls tools, observes results, and replans until it’s done.
📚 Day 1 Resources — 📄 Whitepaper · 🎙 Podcast (~30 min) · 💻 Codelab 1A — First Agent · 💻 Codelab 1B — Multi-Agent · 📺 Livestream
### What the Whitepaper Covers
The Day 1 whitepaper introduces a taxonomy of agent capabilities and argues that agents aren’t just “smarter LLMs” — they represent a different architectural paradigm:
- Perception — what inputs an agent can process (text, code, tool outputs, sensor data)
- Reasoning — how the agent plans and decides (ReAct pattern, chain-of-thought, tree-of-thought)
- Action — what the agent can do (search, code execution, API calls, sub-agent delegation)
- Memory — how state persists across steps (covered deeply in Day 3)
It also introduces the concept of Agent Ops — the discipline of running agents reliably in production — and covers the security challenges of agent identity and constrained policy execution.
### The Agent Loop in Practice

```mermaid
flowchart TD
    Goal([User Goal]) --> Planner[🤔 Planner<br/>LLM Reasoning + ReAct]
    Planner -->|tool_call| Tools[(🔧 Tool Registry<br/>Search / Code / APIs)]
    Tools -->|observation| Observer[👁️ Observer<br/>Result Parsing]
    Observer -->|update context| Memory[(💾 Short-Term Memory<br/>Context Window)]
    Memory --> Planner
    Observer -->|done?| Exit{✅ Done?}
    Exit -->|no| Planner
    Exit -->|yes| Output([📤 Final Answer])
    classDef blue fill:#4A90D9,color:#fff,stroke:none
    classDef green fill:#5BA85A,color:#fff,stroke:none
    classDef orange fill:#E8A838,color:#fff,stroke:none
    classDef purple fill:#9B6EBD,color:#fff,stroke:none
    classDef red fill:#D9534F,color:#fff,stroke:none
    class Goal,Output green
    class Planner blue
    class Tools orange
    class Observer,Memory purple
    class Exit red
```
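To make the loop concrete, here is a deliberately minimal sketch in plain Python. This is not ADK code: the planner is a stub standing in for an LLM call, and the tool registry contains one fake search tool. ADK manages this control flow for you, but the shape is the same.

```python
# Minimal plan -> act -> observe loop (toy illustration, not ADK).

def plan_next_step(goal, history):
    """Stand-in for LLM reasoning: decide the next action or finish."""
    if any(obs.startswith("result:") for _, obs in history):
        return ("finish", None)      # goal satisfied once we have a result
    return ("search", goal)          # otherwise, call the search tool

def run_tool(name, arg):
    """Stand-in tool registry with a single fake search tool."""
    tools = {"search": lambda q: f"result: top hit for {q!r}"}
    return tools[name](arg)

def agent_loop(goal, max_steps=5):
    history = []                     # short-term memory: (action, observation)
    for _ in range(max_steps):
        action, arg = plan_next_step(goal, history)
        if action == "finish":
            return history[-1][1]    # last observation becomes the answer
        observation = run_tool(action, arg)
        history.append((action, observation))
    raise RuntimeError("step budget exhausted")

print(agent_loop("current weather in Paris"))
```

Note the `max_steps` budget: real agent runtimes impose the same cap so a confused planner can’t loop forever.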
### Codelabs: What You Build
- Codelab 1A — Build your first AI agent using Gemini and ADK (Agent Development Kit). Give it access to Google Search so it can answer questions with real-time information.
- Codelab 1B — Build your first multi-agent system using ADK. Create teams of specialized agents and explore different architectural patterns (sequential, parallel, hierarchical).
ADK = Agent Development Kit, Google’s open-source framework for building production agents on top of Gemini. Think of it as the orchestration layer that manages the agent loop for you.
### Key Takeaways from Day 1
| Concept | Explanation |
|---|---|
| Agent vs. LLM app | Agent: goal → loop → tools → output. LLM app: prompt → single call → output |
| ReAct pattern | Reasoning + Acting interleaved: Thought → Action → Observation → Thought… |
| Multi-agent architectures | Specialized agents (planner, executor, critic) coordinated by an orchestrator |
| Agent Ops | Operational discipline for reliability, governance, security in production agents |
Guests on the Day 1 livestream: Kanchana Patlolla, Anant Nawalgaria (course founder), Kristopher Overholt, Hangfei Lin, Alan Blount, Mike Clark, Michael Gerstenhaber, and Antonio Gulli — all from Google.
## 🔧 Day 2 — Agent Tools & Interoperability with MCP
An agent that can only reason but can’t act is just a fancy chatbot. Day 2 is about giving agents real power: the ability to call Python functions, external APIs, databases, and services — and how to do it systematically via the Model Context Protocol (MCP).
📚 Day 2 Resources — 📄 Whitepaper · 🎙 Podcast (~20 min) · 💻 Codelab 2A — Agent Tools · 💻 Codelab 2B — MCP & Best Practices · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 2 whitepaper dives into external tool functions — the mechanism that lets agents interact with the world beyond their training data. Key areas:
- Tool design best practices — how to write tool descriptions that LLMs reliably understand and call correctly
- Tool types — retrieval tools (search, RAG), action tools (APIs, code runners), computation tools
- Model Context Protocol (MCP) — an open standard for how agents discover and communicate with tool servers
- MCP architectural components: host, client, server
- Communication layer: JSON-RPC over stdio or HTTP/SSE
- Enterprise risks and current readiness gaps
### Understanding MCP

MCP is one of the most important emerging standards in the AI agent ecosystem. It decouples tool providers from agent frameworks — any MCP-compatible tool can be plugged into any MCP-compatible agent.

```mermaid
flowchart LR
    Agent[🤖 Agent<br/>ADK / LangChain / etc.] -->|MCP Client| Protocol[⚡ MCP Protocol<br/>JSON-RPC]
    Protocol -->|MCP Server| ToolA[🗂️ File System<br/>MCP Server]
    Protocol -->|MCP Server| ToolB[🌐 Web Search<br/>MCP Server]
    Protocol -->|MCP Server| ToolC[🗄️ Database<br/>MCP Server]
    Protocol -->|MCP Server| ToolD[💻 Code Exec<br/>MCP Server]
    style Agent fill:#4A90D9,color:#fff,stroke:none
    style Protocol fill:#E8A838,color:#fff,stroke:none
    style ToolA fill:#5BA85A,color:#fff,stroke:none
    style ToolB fill:#5BA85A,color:#fff,stroke:none
    style ToolC fill:#5BA85A,color:#fff,stroke:none
    style ToolD fill:#5BA85A,color:#fff,stroke:none
```
MCP was developed by Anthropic and has been adopted across the ecosystem (Claude, ADK, Cursor, Zed, etc.). It’s becoming the USB-C of AI tooling — one standard connector for everything.
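On the wire, an MCP tool invocation is an ordinary JSON-RPC 2.0 request. The `tools/call` method name comes from the MCP spec; the tool name and arguments below are hypothetical, purely for illustration:

```python
import json

# Illustrative shape of an MCP tool invocation (JSON-RPC 2.0).
# "tools/call" is the MCP method; "web_search" is a hypothetical tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "web_search",                 # hypothetical MCP tool
        "arguments": {"query": "ADK docs"},   # schema defined by the server
    },
}

wire = json.dumps(request)  # sent over stdio or HTTP/SSE to the MCP server
print(wire)
```

The host never hard-codes this list: it first calls `tools/list` to discover what the server offers, which is exactly the decoupling the paragraph above describes.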
### Codelabs: What You Build
- Codelab 2A — Turn your own Python functions into agent-callable tools. Learn how to write tool schemas that the LLM reliably calls with correct parameters.
- Codelab 2B — Use MCP to extend your agent’s toolset, and implement long-running operations with a human-in-the-loop approval step: the agent pauses mid-execution, waits for human confirmation, then resumes.
Long-running ops with HITL is a critical production pattern. Never let an agent take irreversible actions (send email, delete data, make payments) without a human checkpoint.
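The checkpoint pattern fits in a few lines. In this sketch, the `approve` callback stands in for a real review UI or approval queue, and the action names are illustrative:

```python
# Sketch of a human-in-the-loop checkpoint for irreversible actions.

IRREVERSIBLE = {"send_email", "delete_data", "make_payment"}

def execute(action, args, approve):
    """Run an action, pausing for human approval when it is irreversible.

    `approve` is a callback standing in for a real review step: the agent
    blocks here until a human confirms or rejects.
    """
    if action in IRREVERSIBLE and not approve(action, args):
        return {"status": "rejected", "action": action}
    return {"status": "done", "action": action}

# A stand-in reviewer that rejects everything:
print(execute("send_email", {"to": "x@example.com"}, approve=lambda a, k: False))
print(execute("search", {"q": "docs"}, approve=lambda a, k: False))
```

In production the "pause" is usually durable (the pending action is persisted and the agent resumes later), but the gate itself looks like this.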
### Tool Design Principles from Day 2
| Principle | Why It Matters |
|---|---|
| Clear function names | LLMs pick tools by name — vague names = wrong tool calls |
| Detailed docstrings | The docstring IS the tool description the LLM reads |
| Narrow scope | One tool, one job — avoids the LLM trying to overload a single tool |
| Typed parameters | Strong typing reduces hallucinated parameter values |
| Idempotent when possible | Safe to retry on failure without side effects |
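Here is what those principles look like in a single tool definition. This is framework-agnostic Python with a stubbed data source; frameworks like ADK typically derive the tool schema the LLM sees from exactly this kind of signature and docstring.

```python
# A tool following the table above: narrow scope, descriptive name,
# typed parameters, and a docstring that doubles as the tool description.

def get_exchange_rate(base_currency: str, target_currency: str) -> float:
    """Return the current exchange rate from base_currency to target_currency.

    Args:
        base_currency: ISO 4217 code to convert from, e.g. "USD".
        target_currency: ISO 4217 code to convert to, e.g. "EUR".
    """
    rates = {("USD", "EUR"): 0.92}  # stubbed data for illustration
    return rates[(base_currency, target_currency)]

print(get_exchange_rate("USD", "EUR"))
```

Contrast this with a vague `do_finance(data: dict)` tool: the LLM has no reliable way to pick it or fill its parameters, which is where most wrong tool calls come from.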
Day 2 livestream guests: Edward Grefenstette, Mike Styer, Oriol Vinyals (Google), and Alex Wissner-Gross (Reified).
## 💾 Day 3 — Context Engineering: Sessions & Memory
An agent that forgets everything after each call can’t handle complex, multi-turn tasks. Day 3 is about making agents stateful — understanding the difference between what lives in the context window (Sessions) and what persists across conversations (Memory).
📚 Day 3 Resources — 📄 Whitepaper · 🎙 Podcast (~25 min) · 💻 Codelabs · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 3 whitepaper introduces context engineering as a first-class discipline: the art of dynamically assembling and managing information in the agent’s context window. Two key abstractions:
- Sessions — the container for a single, immediate conversation’s history. Everything in the current exchange: messages, tool call results, intermediate reasoning steps.
- Memory — the long-term persistence mechanism. Information that survives across sessions and is retrieved when relevant.
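A toy illustration of the split (not the ADK API): a `Session` holds the current conversation’s events, while a `MemoryStore` persists facts that survive across sessions.

```python
# Session = short-term, per-conversation state.
# MemoryStore = long-term state that outlives any single session.

class Session:
    def __init__(self):
        self.events = []  # messages, tool results, reasoning steps

    def append(self, role, content):
        self.events.append({"role": role, "content": content})

class MemoryStore:
    def __init__(self):
        self.facts = {}   # durable key-value facts

    def remember(self, key, value):
        self.facts[key] = value

    def recall(self, key, default=None):
        return self.facts.get(key, default)

memory = MemoryStore()

# Session 1: the user states a preference; we consolidate it into memory.
s1 = Session()
s1.append("user", "Always answer in French.")
memory.remember("language_preference", "French")

# Session 2: a fresh session starts empty, but memory survives.
s2 = Session()
print(len(s2.events), memory.recall("language_preference"))
```

The consolidation step (deciding which session events become durable facts) is the interesting design problem, and it is what Codelab 3B exercises.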
### The Memory Stack

```mermaid
flowchart TD
    Input([New User Message]) --> ContextBuilder[🏗️ Context Builder<br/>Assemble Prompt]
    subgraph Short["⚡ Short-Term (Session)"]
        ConvHistory[Conversation History]
        ToolResults[Tool Call Results]
        ScratchPad[Working Memory / Scratchpad]
    end
    subgraph Long["🗄️ Long-Term (Memory Store)"]
        VectorDB[(Vector DB<br/>Semantic Search)]
        KV[(Key-Value Store<br/>Facts & Preferences)]
        Episodic[(Episodic Memory<br/>Past Sessions)]
    end
    ContextBuilder --> Short
    ContextBuilder -->|semantic retrieval| Long
    Short --> LLM[🤖 LLM Reasoning]
    Long --> LLM
    LLM --> Output([Response + Memory Updates])
    Output -->|important facts| Long
    style Input fill:#5BA85A,color:#fff,stroke:none
    style Output fill:#5BA85A,color:#fff,stroke:none
    style ContextBuilder fill:#4A90D9,color:#fff,stroke:none
    style LLM fill:#9B6EBD,color:#fff,stroke:none
```
### Context Engineering in Practice
This is subtler than it sounds. The context window is finite and expensive — you can’t dump everything in. Context engineering means:
| Strategy | What It Solves |
|---|---|
| Summarization | Compress old conversation history to free up tokens |
| Selective retrieval | Pull only the most relevant memories via embedding search |
| Structured injection | Format retrieved info so the LLM can use it reliably |
| Context windowing | Slide or truncate history to stay within token limits |
| Memory consolidation | Periodically distill conversation history into durable facts |
Common trap: Stuffing everything into the context window. It’s slower, more expensive, and paradoxically makes the agent less focused. Retrieval-augmented memory is almost always better than full history dumps.
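Several of those strategies can be combined in one context builder. The sketch below fakes token counting with word counts and uses naive keyword overlap in place of embedding search; a real system would use a tokenizer, an embedding index, and an LLM-generated summary.

```python
# Budget-aware context assembly: retrieve only relevant memories, keep the
# newest turns verbatim, and replace anything over budget with a summary marker.

def tokens(text):
    return len(text.split())  # crude stand-in for a real tokenizer

def build_context(history, memories, query, budget=50):
    parts, spent = [], 0
    # 1. Selective retrieval: only memories sharing a word with the query.
    for m in memories:
        if set(m.lower().split()) & set(query.lower().split()):
            parts.append(f"[memory] {m}")
            spent += tokens(m)
    # 2. Recency window: keep newest turns until the budget is spent.
    kept = []
    for turn in reversed(history):
        if spent + tokens(turn) > budget:
            kept.append("[summary] earlier turns omitted")
            break
        kept.append(turn)
        spent += tokens(turn)
    parts.extend(reversed(kept))
    parts.append(f"[user] {query}")
    return "\n".join(parts)

ctx = build_context(
    history=["user: hi", "agent: hello", "user: " + "blah " * 60],
    memories=["User prefers metric units"],
    query="What is the forecast in metric units?",
)
print(ctx)
```

The point of the exercise: the oversized turn never reaches the prompt, while the relevant memory always does.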
### Codelabs: What You Build
- Codelab 3A — Make agents stateful by managing conversation history through context engineering in ADK. Implement working memory within a session so the agent holds context across multiple turns.
- Codelab 3B — Give your agent long-term memory that persists across sessions. When you come back a week later, the agent remembers your preferences and past decisions.
Day 3 livestream guests: Steven Johnson, Kimberly Milam, Julia Wiesinger (Google), and Jay Alammar from Cohere — author of the iconic “Illustrated Transformer” blog posts.
## 📊 Day 4 — Agent Quality: Observability & Evaluation
You can’t improve what you can’t see. Day 4 is arguably the most underappreciated day — it’s the one that separates toy agents from deployable ones. If you skip this, you’re flying blind.
📚 Day 4 Resources — 📄 Whitepaper · 🎙 Podcast (~25 min) · 💻 Codelabs · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 4 whitepaper introduces a holistic evaluation framework built on three technical pillars:
- Logs (The Diary) — a timestamped record of every event: tool calls, errors, LLM responses. Essential for post-mortem debugging.
- Traces (The Narrative) — end-to-end causal chains showing how the agent arrived at an answer. Which reasoning steps led where, which tools were called in what order.
- Metrics (The Health Report) — quantified performance: task completion rate, latency, tool call accuracy, error rates.
These three form a continuous feedback loop that drives iterative improvement.
### The Observability Stack

```mermaid
flowchart LR
    Agent[🤖 Agent Execution] --> Logs[📋 Logs<br/>Timestamped Events<br/>Errors & Debug Info]
    Agent --> Traces[🕸️ Traces<br/>Causal Chains<br/>Step-by-Step Narrative]
    Agent --> Metrics[📈 Metrics<br/>Latency / Task Rate<br/>Tool Accuracy]
    Logs --> Dashboard[🖥️ Monitoring<br/>Dashboard]
    Traces --> Dashboard
    Metrics --> Dashboard
    Dashboard --> Eval[⚖️ Evaluation<br/>LLM-as-Judge<br/>HITL Review]
    Eval --> Improve[🔄 Improvement<br/>Prompt / Tool / Architecture]
    Improve --> Agent
    style Agent fill:#4A90D9,color:#fff,stroke:none
    style Logs fill:#5BA85A,color:#fff,stroke:none
    style Traces fill:#E8A838,color:#fff,stroke:none
    style Metrics fill:#9B6EBD,color:#fff,stroke:none
    style Dashboard fill:#D9534F,color:#fff,stroke:none
    style Eval fill:#4A90D9,color:#fff,stroke:none
    style Improve fill:#4A90D9,color:#fff,stroke:none
```
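A minimal sketch of instrumenting a single tool call with all three pillars, using in-memory lists and dicts in place of a real telemetry backend (the codelabs use proper tracing infrastructure):

```python
import functools
import time

# Toy sinks for the three pillars: the diary, the narrative, the health report.
LOGS, TRACES, METRICS = [], [], {}

def observed(fn):
    """Decorator that emits a log entry, a trace span, and a call-count metric."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            LOGS.append({"event": "tool_ok", "tool": fn.__name__})
            return result
        except Exception as e:
            LOGS.append({"event": "tool_error", "tool": fn.__name__, "error": str(e)})
            raise
        finally:
            TRACES.append({"span": fn.__name__,
                           "duration_s": time.perf_counter() - start})
            METRICS[f"{fn.__name__}_calls"] = METRICS.get(f"{fn.__name__}_calls", 0) + 1
    return wrapper

@observed
def search(query: str) -> str:
    return f"results for {query!r}"

search("agent evals")
print(LOGS[0]["event"], METRICS["search_calls"], len(TRACES))
```

Even this toy version answers the questions that matter in a post-mortem: which tool ran, whether it succeeded, and how long it took.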
### Evaluation Strategies: LLM-as-Judge vs. HITL
One of the most practical innovations covered in Day 4 is using an LLM to evaluate another LLM’s outputs at scale — LLM-as-a-Judge. Combined with periodic Human-in-the-Loop checks, this creates a scalable quality pipeline.
| Evaluation Method | Scalability | Accuracy | When to Use |
|---|---|---|---|
| Manual human review | ❌ Low | ✅ High | Gold standard for high-stakes outputs |
| Rule-based assertions | ✅ High | Medium | Structured outputs, format checks |
| LLM-as-a-Judge | ✅ High | ✅ High | Open-ended responses, nuanced quality |
| HITL (Human-in-the-Loop) | Medium | ✅ High | Irreversible actions, ambiguous cases |
| Regression test suite | ✅ High | Medium | Catching regressions during development |
Prioritize evaluation early — on Day 3, not Day 5. Add even basic assertions (golden task checks, format validation) before you start iterating on architecture. Polishing an unverified agent is the #1 time sink in agent development.
LLM-as-Judge pitfall: The judge can share the same biases and blind spots as the agent being evaluated. Always validate your judge’s verdicts against human ratings on a calibration set before trusting it at scale.
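Here is a sketch of that calibration step. The `judge` function below is a trivial stub standing in for a real LLM call; the part worth copying is the agreement check against human labels, which you would run before trusting any judge at scale.

```python
# LLM-as-Judge with a calibration check (judge() is a stub, not an LLM call).

def judge(question, answer):
    """Stub judge: scores 1 if the answer shares any term with the question."""
    overlap = set(question.lower().split()) & set(answer.lower().split())
    return 1 if overlap else 0

def calibrate(judge_fn, labeled_examples):
    """Agreement rate between judge verdicts and human labels (0.0 to 1.0)."""
    hits = sum(judge_fn(q, a) == human for q, a, human in labeled_examples)
    return hits / len(labeled_examples)

calibration_set = [
    ("capital of France", "Paris is the capital of France", 1),  # human: good
    ("capital of France", "I like turtles", 0),                  # human: bad
]
agreement = calibrate(judge, calibration_set)
print(f"judge/human agreement: {agreement:.0%}")
```

If agreement on the calibration set is low, fix the judge (prompt, rubric, model) before scaling it, not after.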
### Codelabs: What You Build
- Codelab 4A — Instrument your agent with logs, traces, and metrics. Get full visibility into the agent’s decision-making process — see exactly why it chose a particular tool call, where it hesitated, and where it failed.
- Codelab 4B — Evaluate your agent systematically: score response quality, measure tool usage accuracy, and set up a feedback loop to guide improvement.
Day 4 livestream guests: Wafae Bakkali, Turan Bulmus, Sian Gooding (Google), and Jiwei Liu from NVIDIA.
## 🚀 Day 5 — Prototype to Production & Agent2Agent (A2A) Protocol
Day 5 closes the loop. You’ve built an agent, given it tools, given it memory, and made it observable. Now: how do you take it from a working notebook to something other people can actually use?
📚 Day 5 Resources — 📄 Whitepaper · 🎙 Podcast (~25 min) · 💻 Codelabs · 📺 Livestream Playlist
### What the Whitepaper Covers
The Day 5 whitepaper provides a technical guide to the full operational lifecycle of an AI agent:
- Containerization and deployment — packaging agents as services others can call
- Scaling strategies — handling concurrent requests, load balancing, stateful vs. stateless agent design
- Reliability patterns — retries, fallbacks, circuit breakers for when tool calls fail
- Agent2Agent (A2A) Protocol — an open standard for how multiple agents communicate, delegate tasks, and return results to each other
### The Agent2Agent (A2A) Protocol

A2A is to multi-agent systems what MCP is to tools: a standardized interface that lets agents communicate regardless of what framework they’re built in.

```mermaid
flowchart TD
    User([👤 User Request]) --> Orchestrator[🎯 Orchestrator Agent<br/>Planning & Routing]
    Orchestrator -->|A2A| ResearchAgent[🔍 Research Agent<br/>Web Search + RAG]
    Orchestrator -->|A2A| CodeAgent[💻 Code Agent<br/>Code Gen + Execution]
    Orchestrator -->|A2A| ReviewAgent[✅ Review Agent<br/>Quality Checking]
    ResearchAgent -->|A2A result| Orchestrator
    CodeAgent -->|A2A result| Orchestrator
    ReviewAgent -->|A2A result| Orchestrator
    Orchestrator --> Output([📤 Final Deliverable])
    style User fill:#5BA85A,color:#fff,stroke:none
    style Output fill:#5BA85A,color:#fff,stroke:none
    style Orchestrator fill:#4A90D9,color:#fff,stroke:none
    style ResearchAgent fill:#E8A838,color:#fff,stroke:none
    style CodeAgent fill:#9B6EBD,color:#fff,stroke:none
    style ReviewAgent fill:#D9534F,color:#fff,stroke:none
```
A2A was open-sourced by Google in early 2025. It defines a standard JSON-based message format and a discovery protocol so orchestrators can find and delegate to remote agents — even those running in separate services or clouds.
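For intuition, here is the rough shape of one delegation round trip. The field names are a simplified sketch rather than the normative A2A schema, and the remote research agent is a local stand-in:

```python
import json
import uuid

# Simplified sketch of orchestrator -> remote agent delegation.
# Field names are illustrative, not the normative A2A schema.

def make_task(skill, payload):
    return {
        "taskId": str(uuid.uuid4()),
        "skill": skill,  # a capability the remote agent advertises
        "message": {"role": "user", "parts": [{"type": "text", "text": payload}]},
    }

def research_agent(task):
    """Stand-in remote agent: returns a completed result for the task."""
    query = task["message"]["parts"][0]["text"]
    return {"taskId": task["taskId"], "status": "completed",
            "result": f"3 sources found for {query!r}"}

task = make_task("web_research", "history of the A2A protocol")
reply = research_agent(json.loads(json.dumps(task)))  # JSON round trip = "the wire"
print(reply["status"])
```

The JSON round trip is the whole point: because the envelope is plain JSON with agreed semantics, the orchestrator and the specialist can live in different frameworks, services, or clouds.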
### Production Deployment Checklist
Day 5 also covers the operational patterns that separate demos from real deployments:
| Concern | Pattern |
|---|---|
| Stateless design | Keep agent logic stateless; externalize session/memory to storage |
| Retry logic | Exponential backoff on tool call failures; never fail silently |
| Fallback paths | Define what the agent should do when tools are unavailable |
| Rate limiting | Respect API quotas; implement client-side rate limiting |
| Logging for audit | Every action logged with user ID, timestamp, and outcome |
| Security | Constrained tool permissions; never give agents broader access than needed |
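The retry row from the table, as a small wrapper around a deliberately flaky tool. Note the two rules it encodes: back off exponentially between attempts, and re-raise rather than fail silently when the budget is exhausted.

```python
import time

def with_retries(fn, attempts=4, base_delay=0.01):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error, never swallow it
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s...

calls = {"n": 0}
def flaky_tool():
    """Simulated tool that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky_tool))  # succeeds on the third attempt
```

In a real deployment you would retry only on errors you know are transient (timeouts, 429s, 5xx), which is also why idempotent tool design from Day 2 matters here.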
### Codelabs: What You Build
- Codelab 5A — Implement the A2A Protocol to have agents communicate with each other. Build a multi-agent pipeline where an orchestrator routes tasks to specialist agents and aggregates their results.
- Codelab 5B (Optional) — Deploy your agent to Agent Engine on Google Cloud — go from a local notebook to a scalable, hosted service.
If you only have time for one optional codelab in the entire course, do 5B. Deploying an agent to production — even once — will teach you more about production realities than any whitepaper.
Day 5 livestream guests: Will Grannis, Sokratis Kartakis, Elia Secchi, and Saurabh Tiwary — all from Google.
## 🧭 How to Work Through This Course
A few strategic notes for getting the most out of the self-paced version:
Recommended flow per day:
- Listen to the podcast summary (20–30 min) for the mental model
- Read the whitepaper (1–2 hrs) for depth
- Run the codelabs (2–3 hrs) to build real intuition
- Watch the livestream replay at 1.5× speed for Q&A insights
Before Day 1: Pick a real problem you want to automate — a research workflow, a code review pipeline, a reporting task. Shape the capstone around it. Abstract exercises are forgettable; concrete problems are not.
Don’t skip Day 4. It’s the least flashy day and the most commonly skipped. It’s also the reason most agents fail in production. Add evaluation before you start iterating on architecture.
## 📌 All Resources in One Place
### 🗂️ Course Home & Community
| Resource | Link |
|---|---|
| 🎓 Kaggle Learn Guide (free, self-paced) | kaggle.com/learn-guide/5-day-agents |
| 📺 YouTube Playlist (all 5 livestreams) | 5-Day AI Agents Intensive Playlist |
| 💬 Kaggle Discord | discord.gg/kaggle |
| 🏆 Capstone Projects | Competition discussion |
| 📰 Google’s Recap Blog | blog.google recap |
| ⚙️ Agent Development Kit (ADK) | google.github.io/adk-docs |
| 🔧 Codelab Troubleshooting Guide | Day 0 — FAQs |
### 📅 Per-Day Quick Access
| Day | Whitepaper | Podcast | Codelab A | Codelab B | Livestream |
|---|---|---|---|---|---|
| Day 1 — Intro to Agents | 📄 | 🎙 | First Agent | Multi-Agent | 📺 |
| Day 2 — Tools & MCP | 📄 | 🎙 | Agent Tools | MCP & Best Practices | 📺 |
| Day 3 — Context & Memory | 📄 | 🎙 | Sessions | Long-term Memory | 📺 |
| Day 4 — Agent Quality | 📄 | 🎙 | Observability | Evaluation | 📺 |
| Day 5 — Prototype to Prod | 📄 | 🎙 | A2A Protocol | Deploy to Agent Engine | 📺 |
## 💡 One-Sentence Intuition
An LLM answers questions; an agent completes goals — and understanding the full stack from tool calling to evaluation to deployment is what makes the difference between a demo and a system.
Saving this as a reference for the FYP toolkit — the context engineering and evaluation frameworks in Days 3–4 are directly applicable to building reliable reconstruction pipelines.
📺 Full playlist: 5-Day AI Agents Intensive — All 5 Livestreams