Post

Claude on AWS Bedrock, Google Vertex AI & Anthropic Cookbook Notes

Notes from Anthropic's cloud platform courses covering Claude on Amazon Bedrock, Google Vertex AI, and the Anthropic Cookbook — with practical code examples.

Claude on AWS Bedrock, Google Vertex AI & Anthropic Cookbook Notes

Notes from Anthropic’s cloud platform courses (Claude With Amazon Bedrock, Claude With Google Cloud Vertex AI) plus the free Jupyter notebook tutorials from the Anthropic Cookbook on GitHub. All free resources available at anthropic.skilljar.com and github.com/anthropics/anthropic-cookbook.


☁️ 1. Claude With Amazon Bedrock

Amazon Bedrock is AWS’s managed AI platform. Claude models are available in Bedrock, letting you use Claude inside your existing AWS infrastructure with IAM, VPC, CloudWatch, and all the enterprise controls you already have.

Why Bedrock?

  • No separate API keys — use AWS IAM credentials you already manage
  • Data residency — keep requests within your chosen AWS region
  • Cost consolidation — Claude usage on the same AWS bill
  • Compliance — AWS’s SOC 2, HIPAA, and other certifications apply

Setup

1
2
pip install boto3
aws configure  # or use IAM roles in production

Basic Invocation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "What is the capital of Singapore?"}
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-opus-4-5-20241022-v2:0",
    body=body,
    contentType="application/json",
    accept="application/json"
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])

Streaming With Bedrock

1
2
3
4
5
6
7
8
9
10
11
response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-sonnet-4-5-20241022-v2:0",
    body=body,
    contentType="application/json",
    accept="application/json"
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk["type"] == "content_block_delta":
        print(chunk["delta"].get("text", ""), end="", flush=True)

Available Models on Bedrock

ModelBedrock Model ID
Claude Opusanthropic.claude-opus-4-5-…
Claude Sonnetanthropic.claude-sonnet-4-5-…
Claude Haikuanthropic.claude-haiku-3-5-…

Always check the Bedrock console for the latest available model IDs in your region — they change with new releases.

Production Patterns

  • Use Bedrock Guardrails to add content filtering on top of Claude’s built-in safety
  • Use Bedrock Knowledge Bases to connect Claude to your private documents via RAG
  • Monitor with CloudWatch — Bedrock emits token usage metrics automatically

🔵 2. Claude With Google Cloud’s Vertex AI

Vertex AI is Google Cloud’s ML platform. Claude models are available as managed endpoints via Anthropic’s partnership with Google.

Why Vertex AI?

  • Tight integration with Google Cloud services (BigQuery, Cloud Storage, Pub/Sub)
  • Use existing GCP IAM and billing
  • Deploy alongside other Google AI models (Gemini, etc.) in the same platform
  • Ideal if your data is already in Google Cloud

Setup

1
2
pip install google-cloud-aiplatform anthropic[vertex]
gcloud auth application-default login

Basic Call

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import anthropic

client = anthropic.AnthropicVertex(
    project_id="your-gcp-project-id",
    region="us-east5"
)

message = client.messages.create(
    model="claude-opus-4-5@20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Describe the architecture of a transformer model."}
    ]
)

print(message.content[0].text)

The AnthropicVertex client uses Google Cloud credentials instead of an Anthropic API key. Everything else — messages format, tool use, streaming — works identically to the standard Anthropic SDK.

Integrating With Google Services

Read from Cloud Storage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
from google.cloud import storage

def read_gcs_file(bucket: str, blob: str) -> str:
    client = storage.Client()
    bucket = client.bucket(bucket)
    return bucket.blob(blob).download_as_text()

# Pass to Claude
content = read_gcs_file("my-bucket", "report.txt")
message = vertex_client.messages.create(
    model="claude-sonnet-4-5@20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Summarise this report:\n\n{content}"}]
)

📒 3. Anthropic Cookbook — GitHub Notebook Notes

The Anthropic Cookbook is a collection of Jupyter notebooks with practical, runnable examples.

API Fundamentals

Core concepts every Claude developer should know:

  • Message structure — always [{"role": "user"/"assistant", "content": "..."}]
  • max_tokens — always set this; Claude will stop when reached
  • stop_sequences — stop generation at specific strings (useful for structured output)
1
2
3
4
5
6
7
# Stop generation at a custom delimiter
response = client.messages.create(
    model="claude-haiku-3-5",
    max_tokens=256,
    stop_sequences=["</answer>"],
    messages=[{"role": "user", "content": "Answer in tags: What is ML? <answer>"}]
)

Prompt Engineering Techniques

1. Role Assignment

1
System: You are an expert radiologist with 20 years of experience interpreting CT scans.

2. Chain of Thought

1
2
Before answering, think through this step by step inside <thinking> tags.
Then give your final answer inside <answer> tags.

3. Few-Shot Examples — showing Claude examples of the format you want:

1
2
3
4
5
Input: "The cat sat on the mat."
Output: {"subject": "cat", "verb": "sat", "location": "mat"}

Input: "Birds fly south in winter."
Output:

4. XML Tags for Structure — Claude responds well to XML-structured prompts:

1
2
3
<task>Classify the sentiment of the following review</task>
<review>The product broke after two days. Very disappointed.</review>
<output_format>Return only: POSITIVE, NEGATIVE, or NEUTRAL</output_format>

5. Avoiding Hallucination:

1
2
Answer only from the provided context. If the answer is not in the context,
say "I don't have enough information to answer this."

Structured Extraction Pattern

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import json

response = client.messages.create(
    model="claude-haiku-3-5",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": """Extract the following from this job posting and return as JSON:
        - job_title
        - company
        - required_skills (list)
        - salary_range

        Job posting: {posting}

        Return only valid JSON, no explanation.""".format(posting="...")
    }]
)

data = json.loads(response.content[0].text)

Evaluations

Building evals is critical for production — you can’t improve what you don’t measure.

Simple eval loop:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
test_cases = [
    {"input": "What is 2+2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

correct = 0
for case in test_cases:
    response = client.messages.create(
        model="claude-haiku-3-5",
        max_tokens=64,
        messages=[{"role": "user", "content": case["input"]}]
    )
    output = response.content[0].text
    if case["expected"].lower() in output.lower():
        correct += 1

print(f"Accuracy: {correct}/{len(test_cases)} = {correct/len(test_cases)*100:.1f}%")

LLM-as-judge — use Claude to evaluate Claude:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def evaluate_response(question, answer, rubric):
    judge_prompt = f"""
    Question: {question}
    Answer: {answer}
    Rubric: {rubric}

    Rate this answer 1-5 and explain briefly. Return JSON: score
    """
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": judge_prompt}]
    )
    return json.loads(response.content[0].text)

LLM-as-judge is the most scalable evaluation approach for open-ended outputs. Use it for assessing quality when ground-truth answers don’t exist.

Tool Use (Advanced Patterns)

Parallel tool calls — Claude can call multiple tools in one response:

1
2
3
# Claude may return multiple tool_use blocks
tool_calls = [b for b in response.content if b.type == "tool_use"]
# Execute all in parallel, then return all results

Tool choice forcing — make Claude always use a specific tool:

1
2
3
4
5
6
7
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "get_weather"},  # Force this tool
    messages=[...]
)

💡 Summary: Which Platform to Use?

ScenarioRecommendation
Building a new appDirect Anthropic API — simplest, latest models
Already on AWSAmazon Bedrock — IAM, compliance, same bill
Already on GCPGoogle Vertex AI — GCP native, BigQuery integration
ExperimentationAnthropic Cookbook notebooks — great starting point

Sources:


Part of my Anthropic developer notes series. Next: building your first MCP server in Python.

This post is licensed under CC BY 4.0 by the author.