Anthropic just shipped major new features for their AI agents that we can use in KAIRO. Here's what each one does in plain English, what it would do for our prospect work, and which ones we adopt now versus later.
Here's the whole assessment in one table. Each feature is explained in full below.
| Feature | Verdict | Effort | Order |
|---|---|---|---|
| Outcomes-loop (rubric self-grading) | Adopt now | Low | 1st |
| Webhooks to phone notifications | Adopt with bridge | Med | 2nd |
| Multi-agent orchestration | Adopt for V2 build | Med-High | 3rd |
| /claude-api skill | Already active | None | Done |
| Dreaming (memory curation) | Skip for now | High | Revisit at scale |
You write a checklist of rules (called a "rubric") and save it as a plain text file. Then, every time KAIRO builds a prospect dossier, a separate helper AI reads the checklist and grades the work before you see it. If something fails the checklist, the agent rewrites that part and grades again, up to five rounds. Think of it like a quality-control inspector at the end of a production line, except the inspector runs in under a minute and costs less than 2 cents.
The Ice Cold Yeti dossier went through three manual rounds of fixes because of simple rule violations: wrong pricing shown, wrong brand name used, wrong phone number. Each round cost roughly $9 to $18 in API spend (API: Application Programming Interface, the way one piece of software talks to another) plus 30 to 60 minutes of review time. With an outcomes-loop, those violations get caught automatically before the dossier ever lands in front of Anthony. The "inspector" runs in its own memory bubble, separate from the main agent's work, so the two don't interfere with each other.
Save the rubric file today (the full version is below). Use it as a manual checklist on the next prospect dossier even before we wire up the automated version. Every failed rule caught early saves 30 to 60 minutes of review and up to $18 in re-run cost.
**Verdict: Adopt now.**

The wiring, once we automate it:

```python
import anthropic

client = anthropic.Anthropic()

# The checklist the grader enforces; path from the action items below.
RUBRIC = open("prospect-engine/rubrics/prospect-dossier-v1.md").read()

# `session` is the Managed Agent session already created for this dossier run.
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a prospect dossier for ICE COLD YETI",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # grade, rewrite, and re-grade up to five rounds
    }],
)
```
A "webhook" is like a doorbell for your phone. When the AI finishes building a prospect dossier, Anthropic (the company that makes Claude) sends a message to a web address you control. That address runs a small piece of code (a "Cloudflare Worker," basically a tiny server that costs nothing to run) that translates the message into a push notification on Anthony's phone with a button he can tap to open the finished dossier.
Right now, after a dossier builds, you have to go check manually. With webhooks, your phone tells you when it's done, and you tap one button to see it.
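The phone half is testable today, before any Anthropic wiring exists: ntfy (the push service the Worker below uses) is just an HTTP POST. A minimal sketch in Python, using the same topic name as the Worker:

```python
import requests

# Push a test notification to the ntfy topic the completion Worker will use.
# Anyone subscribed to this topic in the ntfy app gets the buzz.
requests.post(
    "https://ntfy.sh/kairo-ant-updates",
    data="Test: this is what a finished-dossier alert will look like.",
    headers={
        "Title": "Prospect dossier ready (test)",
        "Priority": "high",
    },
)
```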
The v2 Prospect Cloner pipeline needs this as its "done" signal. Drop a voice transcript in iCloud, walk away, and ten minutes later your phone buzzes with a link to the finished dossier. No checking, no polling, no waiting at the computer. The Worker (the tiny translator code) costs effectively nothing to run on Cloudflare's free tier.
Deploy two small Cloudflare Workers when we build v2. One starts a dossier session, one receives the "done" message and sends Anthony's phone a notification with an action button. Register one webhook endpoint in the Anthropic console pointing at the completion Worker. Estimated setup time: about 1 to 2 hours.
**Verdict: Adopt with bridge.**

The full flow from transcript drop to phone notification:
1. Voice transcript drops in the iCloud folder.
2. A file-watch trigger fires (Cloudflare R2 storage event, or polling).
3. The trigger sends the transcript to the thin starter Worker.
4. The Worker creates a Managed Agent session via the Anthropic API.
5. The agent runs prospect research with rubric grading.
6. On completion, Anthropic's webhook fires the completion Worker.
7. The Worker fetches the dossier URL from the session.
8. The Worker sends a push through NTFY (a push notification service).
9. Anthony's phone buzzes with a button to open the dossier.
The completion Worker (steps 6 through 9):

```javascript
// Completion Worker: receives the Anthropic webhook, looks up the finished
// session, and pushes a tappable notification to Anthony's phone via ntfy.
export default {
  async fetch(request, env) {
    const event = await request.json();

    // Only act on the "outcome evaluation ended" event; ignore everything else.
    if (event.data.type !== 'session.outcome_evaluation_ended') {
      return new Response('ignored', { status: 200 });
    }

    // fetchSession is a small helper (defined elsewhere) that retrieves the
    // session record from the Anthropic API using the stored key.
    const session = await fetchSession(event.data.id, env.ANTHROPIC_KEY);
    const dossierUrl = session.output_url;

    // ntfy push: the Title/Priority/Actions headers drive the phone alert
    // and the tappable "Open dossier" button.
    await fetch('https://ntfy.sh/kairo-ant-updates', {
      method: 'POST',
      headers: {
        'Title': 'Prospect dossier ready',
        'Priority': 'high',
        'Actions': `view, Open dossier, ${dossierUrl}, clear=true`,
      },
      body: `Dossier ready for review: ${dossierUrl}`,
    });

    return new Response('ok');
  },
};
```
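Until the Workers exist, the starter side can be faked with a local polling script (the "polling" option in step 2 of the flow above), which also matches the manual file-watch fallback in the action items. A minimal sketch only: the folder path is illustrative, and the sessions.create call and event shape are assumptions modeled on the events.send example earlier, so check the real method names against Anthropic's docs before relying on them.

```python
import time
from pathlib import Path

import anthropic

client = anthropic.Anthropic()

# Illustrative iCloud Drive folder; point it wherever Plod drops transcripts.
WATCH_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/transcripts"
seen: set[str] = set()

while True:
    for path in WATCH_DIR.glob("*.txt"):
        if path.name in seen:
            continue
        seen.add(path.name)

        # Assumption: a sessions.create sibling of the events.send call above.
        session = client.beta.sessions.create()

        # Hand the transcript to the session; the event shape is an assumption.
        client.beta.sessions.events.send(
            session_id=session.id,
            events=[{
                "type": "user.message",
                "content": path.read_text(),
            }],
        )
    time.sleep(60)  # poll once a minute; the Anthropic webhook signals "done"
```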
Instead of one AI trying to do everything at once, you set up a "coordinator" AI that manages a team of specialist AIs. Each specialist has its own job: one checks the business website, one scans social media, one looks for ads, one reads Google reviews, etc. Critically, each helper AI has its own memory, so they don't get confused by each other's work. The coordinator hands each one a task, collects the results, and assembles the full picture.
Right now in KAIRO, when multiple agents run inside a Claude Code session, they all share one memory pool, which creates confusion at scale. Managed Agents fixes that with hard separation.
The v2 Prospect Cloner will have six specialist agents: website auditor, social media auditor, ad scanner, reviews scanner, Google Business Profile scanner, and positioning strategist. With Managed Agents, these six run simultaneously and in clean isolation. Without it, they'd have to run one after another in a shared memory pool and could interfere with each other. Limits that apply: up to 25 agents running at the same time, max 20 unique specialists in the roster, one level of nesting (coordinator talks to specialists; specialists cannot hire more specialists).
Build this into the v2 Prospect Cloner architecture, not today. Pre-create six specialized agents once, store their IDs, and wire the coordinator to call them when a transcript drops. This replaces the current hand-rolled approach where KAIRO spawns agents inside a Claude Code session.

**Verdict: Adopt for V2.**
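Pre-creating a specialist is one `client.beta.agents.create` call per agent, the same call the coordinator example below uses. A minimal sketch; the name and system prompt here are illustrative placeholders, not the final prompts:

```python
import anthropic

client = anthropic.Anthropic()

# Run once per specialist, then store the returned ID for the coordinator.
# Name and system prompt are illustrative placeholders.
web_auditor = client.beta.agents.create(
    name="Website Auditor",
    model="claude-opus-4-7",
    system="Audit the prospect's website: offer, pricing, claims, calls to action.",
)
# Repeat for social_auditor, ad_scanner, reviews_scanner, gbp_scanner,
# and positioning_strategist.
```

With all six IDs stored, the coordinator is wired like this: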
```python
# Coordinator: receives the transcript and delegates to the six specialists.
coordinator = client.beta.agents.create(
    name="Prospect Cloner Coordinator",
    model="claude-opus-4-7",
    system="You receive a transcript and orchestrate prospect research...",
    tools=[{"type": "agent_toolset_20260401"}],
    multiagent={
        "type": "coordinator",
        "agents": [
            {"type": "agent", "id": web_auditor.id},
            {"type": "agent", "id": social_auditor.id},
            {"type": "agent", "id": ad_scanner.id},
            {"type": "agent", "id": reviews_scanner.id},
            {"type": "agent", "id": gbp_scanner.id},
            {"type": "agent", "id": positioning_strategist.id},
        ],
    },
)
```
Anthropic published an official "skill" (a set of instructions that Claude reads before writing any code that connects to the Claude API). It enforces the right defaults automatically: right model, right settings for speed and cost, built-in caching so you don't pay twice for the same content. It also knows every old model name and its replacement, so code doesn't break when Anthropic retires a model version.
This skill is already loaded in every KAIRO session. No installation needed. It kicks in automatically when any new KAIRO script imports the Anthropic SDK (Software Development Kit, the toolkit Anthropic provides for connecting to their AI). Most existing KAIRO scripts use OpenAI for images and voice, so the skill smartly skips those. It will matter most when we write the v2 Prospect Cloner scripts, which will connect heavily to Anthropic's Managed Agents API.
Prompt caching, specifically, is worth calling out: it stores frequently used instructions (like the rubric or the system prompt) so Claude doesn't have to re-read them from scratch on every run. Savings: roughly 1 to 2 cents per repeated check, which adds up across dozens of prospect runs.
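As a concrete picture of what the skill sets up behind the scenes, here is a minimal sketch of prompt caching in a raw SDK call. The `cache_control` marker on the system block is the caching mechanism; the model name just follows the earlier examples:

```python
import anthropic

client = anthropic.Anthropic()

RUBRIC = open("prospect-engine/rubrics/prospect-dossier-v1.md").read()

# Marking the rubric block as cacheable lets repeat grading runs reuse it
# instead of paying full price to re-read it on every call.
response = client.messages.create(
    model="claude-opus-4-7",  # model name follows the earlier examples
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": RUBRIC,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Grade this dossier section: ..."}],
)
```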
Nothing to install. It's already running. When we write new scripts for the v2 Prospect Cloner or any KAIRO script that touches the Anthropic API, the skill enforces smart defaults in the background.
Already Active"Dreaming" is Anthropic's version of spring cleaning for AI memory. You point it at a batch of session transcripts (up to 100), it reads them all, throws away duplicate or stale notes, and writes a fresh, organized memory file. The original memory is never touched; the output is always a new copy you can review before deciding to use it.
In theory, this is a powerful tool. In practice, there's a mismatch: Dreaming works with Anthropic-hosted memory stores (cloud databases managed by Anthropic), and KAIRO's memory lives in plain text files on Anthony's Mac at ~/.claude/projects/. The two systems don't connect. To use Dreaming you'd have to migrate all KAIRO memory into Anthropic's cloud format and give up the ability to search and version-control it with simple tools. That trade is not worth it today with only 29 active memory entries and full manual control working well.
Skip for now. Revisit if KAIRO scales to hundreds of sessions per week and manual memory curation becomes a bottleneck. The current plain-text system is searchable, versionable with git, and free. Dreaming's value kicks in at a much larger scale.
**Verdict: Skip for now.**

The plan below is sequenced by value delivered versus effort spent.
1. Save the Prospect Dossier Rubric to prospect-engine/rubrics/prospect-dossier-v1.md. Use it as a manual checklist on the next dossier, even without any automation, so rule violations get caught before Anthony's review.
2. Pre-create one coordinator and two specialist agents, and run a tiny prospect test end-to-end. Plant a deliberate rule violation (a fake price in the copy) and confirm the rubric grader catches and flags it before output.
3. Build the full v2: six specialist agents, a coordinator, two Cloudflare Workers, and one webhook registered in the Anthropic console. End-to-end test: drop a transcript, the dossier appears on Cloudflare Pages within 10 minutes, and the phone gets a notification with an action button.
4. When Plod (the voice transcription tool) adds webhook support, wire it so a transcript upload automatically kicks off the dossier build. Until then, a manual file-watch approach (like the polling sketch above) works fine.
5. Defer Dreaming until KAIRO runs hundreds of sessions per week. The current hand-curated plain-text memory system works well at today's volume. No action needed now.
Estimated total per complete dossier run on the v2 architecture. These are API costs, not your time.
Same ballpark as today's manual runs, with much less manual effort per dossier. The time savings per avoided rework round (30 to 60 minutes, $9 to $18 in re-runs) pay for several full dossier runs.
All Anthropic documentation links below. Verified current as of 2026-05-06.