Anthropic just shipped major new features for their AI agents that we can use in KAIRO. Here's what each one does in plain English, what it would do for our prospect work, and which ones we adopt now versus later.
Here's the whole assessment in one table. Each feature is explained in full below.
| Feature | Verdict | Effort | Order |
|---|---|---|---|
| Outcomes-loop (rubric self-grading) | Adopt now | Low | 1st |
| Webhooks to phone notifications | Adopt with bridge | Med | 2nd |
| Multi-agent orchestration | Adopt for V2 build | Med-High | 3rd |
| /claude-api skill | Already active | None | Done |
| Dreaming (memory curation) | Skip for now | High | Revisit at scale |
You write a checklist of rules (called a "rubric") and save it as a plain text file. Then, every time KAIRO builds a prospect dossier, a separate helper AI reads the checklist and grades the work before you see it. If something fails the checklist, the agent rewrites that part and grades again, up to five rounds. Think of it like a quality-control inspector at the end of a production line, except the inspector runs in under a minute and costs less than 2 cents.
The Ice Cold Yeti dossier went through three manual rounds of fixes because of simple rule violations: wrong pricing shown, wrong brand name used, wrong phone number. Each round cost roughly $9 to $18 in API spend (API: Application Programming Interface, the way one piece of software talks to another) plus 30 to 60 minutes of review time. With an outcomes-loop, those violations get caught automatically before the dossier ever lands in front of Anthony. The "inspector" runs in its own memory bubble, separate from the main agent's work, so the two don't interfere with each other.
Save the rubric file today (the full version is below). Use it as a manual checklist on the next prospect dossier even before we wire up the automated version. Every failed rule caught early saves 30 to 60 minutes of review and up to $18 in re-run cost.
**Verdict: Adopt now.**

The wiring, once we automate it:

```python
import anthropic

client = anthropic.Anthropic()

# The checklist the grader enforces; path from the action items below.
RUBRIC = open("prospect-engine/rubrics/prospect-dossier-v1.md").read()

# `session` is the Managed Agent session already created for this dossier run.
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a prospect dossier for ICE COLD YETI",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # grade, rewrite, and re-grade up to five rounds
    }],
)
```
A "webhook" is like a doorbell for your phone. When the AI finishes building a prospect dossier, Anthropic (the company that makes Claude) sends a message to a web address you control. That address runs a small piece of code (a "Cloudflare Worker," basically a tiny server that costs nothing to run) that translates the message into a push notification on Anthony's phone with a button he can tap to open the finished dossier.
Right now, after a dossier builds, you have to go check manually. With webhooks, your phone tells you when it's done, and you tap one button to see it.
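The phone half is testable today, before any Anthropic wiring exists: ntfy (the push service the Worker below uses) is just an HTTP POST. A minimal sketch in Python, using the same topic name as the Worker:

```python
import requests

# Push a test notification to the ntfy topic the completion Worker will use.
# Anyone subscribed to this topic in the ntfy app gets the buzz.
requests.post(
    "https://ntfy.sh/kairo-ant-updates",
    data="Test: this is what a finished-dossier alert will look like.",
    headers={
        "Title": "Prospect dossier ready (test)",
        "Priority": "high",
    },
)
```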
The v2 Prospect Cloner pipeline needs this as its "done" signal. Drop a voice transcript in iCloud, walk away, and ten minutes later your phone buzzes with a link to the finished dossier. No checking, no polling, no waiting at the computer. The Worker (the tiny translator code) costs effectively nothing to run on Cloudflare's free tier.
Deploy two small Cloudflare Workers when we build v2. One starts a dossier session, one receives the "done" message and sends Anthony's phone a notification with an action button. Register one webhook endpoint in the Anthropic console pointing at the completion Worker. Estimated setup time: about 1 to 2 hours.
**Verdict: Adopt with bridge.**

The full flow from transcript drop to phone notification:
1. Voice transcript drops in the iCloud folder.
2. A file-watch trigger fires (Cloudflare R2 storage event, or polling).
3. The trigger sends the transcript to the thin starter Worker.
4. The Worker creates a Managed Agent session via the Anthropic API.
5. The agent runs prospect research with rubric grading.
6. On completion, Anthropic's webhook fires the completion Worker.
7. The Worker fetches the dossier URL from the session.
8. The Worker sends a push through NTFY (a push notification service).
9. Anthony's phone buzzes with a button to open the dossier.
The completion Worker (steps 6 through 9):

```javascript
// Completion Worker: receives the Anthropic webhook, looks up the finished
// session, and pushes a tappable notification to Anthony's phone via ntfy.
export default {
  async fetch(request, env) {
    const event = await request.json();

    // Only act on the "outcome evaluation ended" event; ignore everything else.
    if (event.data.type !== 'session.outcome_evaluation_ended') {
      return new Response('ignored', { status: 200 });
    }

    // fetchSession is a small helper (defined elsewhere) that retrieves the
    // session record from the Anthropic API using the stored key.
    const session = await fetchSession(event.data.id, env.ANTHROPIC_KEY);
    const dossierUrl = session.output_url;

    // ntfy push: the Title/Priority/Actions headers drive the phone alert
    // and the tappable "Open dossier" button.
    await fetch('https://ntfy.sh/kairo-ant-updates', {
      method: 'POST',
      headers: {
        'Title': 'Prospect dossier ready',
        'Priority': 'high',
        'Actions': `view, Open dossier, ${dossierUrl}, clear=true`,
      },
      body: `Dossier ready for review: ${dossierUrl}`,
    });

    return new Response('ok');
  },
};
```
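Until the Workers exist, the starter side can be faked with a local polling script (the "polling" option in step 2 of the flow above), which also matches the manual file-watch fallback in the action items. A minimal sketch only: the folder path is illustrative, and the sessions.create call and event shape are assumptions modeled on the events.send example earlier, so check the real method names against Anthropic's docs before relying on them.

```python
import time
from pathlib import Path

import anthropic

client = anthropic.Anthropic()

# Illustrative iCloud Drive folder; point it wherever Plod drops transcripts.
WATCH_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/transcripts"
seen: set[str] = set()

while True:
    for path in WATCH_DIR.glob("*.txt"):
        if path.name in seen:
            continue
        seen.add(path.name)

        # Assumption: a sessions.create sibling of the events.send call above.
        session = client.beta.sessions.create()

        # Hand the transcript to the session; the event shape is an assumption.
        client.beta.sessions.events.send(
            session_id=session.id,
            events=[{
                "type": "user.message",
                "content": path.read_text(),
            }],
        )
    time.sleep(60)  # poll once a minute; the Anthropic webhook signals "done"
```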
Instead of one AI trying to do everything at once, you set up a "coordinator" AI that manages a team of specialist AIs. Each specialist has its own job: one checks the business website, one scans social media, one looks for ads, one reads Google reviews, etc. Critically, each helper AI has its own memory, so they don't get confused by each other's work. The coordinator hands each one a task, collects the results, and assembles the full picture.
Right now in KAIRO, when multiple agents run inside a Claude Code session, they all share one memory pool, which creates confusion at scale. Managed Agents fixes that with hard separation.
The v2 Prospect Cloner will have six specialist agents: website auditor, social media auditor, ad scanner, reviews scanner, Google Business Profile scanner, and positioning strategist. With Managed Agents, these six run simultaneously and in clean isolation. Without it, they'd have to run one after another in a shared memory pool and could interfere with each other. Limits that apply: up to 25 agents running at the same time, max 20 unique specialists in the roster, one level of nesting (coordinator talks to specialists; specialists cannot hire more specialists).
Build this into the v2 Prospect Cloner architecture, not today. Pre-create six specialized agents once, store their IDs, and wire the coordinator to call them when a transcript drops. This replaces the current hand-rolled approach where KAIRO spawns agents inside a Claude Code session.

**Verdict: Adopt for V2.**
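Pre-creating a specialist is one `client.beta.agents.create` call per agent, the same call the coordinator example below uses. A minimal sketch; the name and system prompt here are illustrative placeholders, not the final prompts:

```python
import anthropic

client = anthropic.Anthropic()

# Run once per specialist, then store the returned ID for the coordinator.
# Name and system prompt are illustrative placeholders.
web_auditor = client.beta.agents.create(
    name="Website Auditor",
    model="claude-opus-4-7",
    system="Audit the prospect's website: offer, pricing, claims, calls to action.",
)
# Repeat for social_auditor, ad_scanner, reviews_scanner, gbp_scanner,
# and positioning_strategist.
```

With all six IDs stored, the coordinator is wired like this: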
```python
# Coordinator: receives the transcript and delegates to the six specialists.
coordinator = client.beta.agents.create(
    name="Prospect Cloner Coordinator",
    model="claude-opus-4-7",
    system="You receive a transcript and orchestrate prospect research...",
    tools=[{"type": "agent_toolset_20260401"}],
    multiagent={
        "type": "coordinator",
        "agents": [
            {"type": "agent", "id": web_auditor.id},
            {"type": "agent", "id": social_auditor.id},
            {"type": "agent", "id": ad_scanner.id},
            {"type": "agent", "id": reviews_scanner.id},
            {"type": "agent", "id": gbp_scanner.id},
            {"type": "agent", "id": positioning_strategist.id},
        ],
    },
)
```
Anthropic published an official "skill" (a set of instructions that Claude reads before writing any code that connects to the Claude API). It enforces the right defaults automatically: right model, right settings for speed and cost, built-in caching so you don't pay twice for the same content. It also knows every old model name and its replacement, so code doesn't break when Anthropic retires a model version.
This skill is already loaded in every KAIRO session. No installation needed. It kicks in automatically when any new KAIRO script imports the Anthropic SDK (Software Development Kit, the toolkit Anthropic provides for connecting to their AI). Most existing KAIRO scripts use OpenAI for images and voice, so the skill smartly skips those. It will matter most when we write the v2 Prospect Cloner scripts, which will connect heavily to Anthropic's Managed Agents API.
Prompt caching, specifically, is worth calling out: it stores frequently used instructions (like the rubric or the system prompt) so Claude doesn't have to re-read them from scratch on every run. Savings: roughly 1 to 2 cents per repeated check, which adds up across dozens of prospect runs.
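As a concrete picture of what the skill sets up behind the scenes, here is a minimal sketch of prompt caching in a raw SDK call. The `cache_control` marker on the system block is the caching mechanism; the model name just follows the earlier examples:

```python
import anthropic

client = anthropic.Anthropic()

RUBRIC = open("prospect-engine/rubrics/prospect-dossier-v1.md").read()

# Marking the rubric block as cacheable lets repeat grading runs reuse it
# instead of paying full price to re-read it on every call.
response = client.messages.create(
    model="claude-opus-4-7",  # model name follows the earlier examples
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": RUBRIC,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Grade this dossier section: ..."}],
)
```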
Nothing to install. It's already running. When we write new scripts for the v2 Prospect Cloner or any KAIRO script that touches the Anthropic API, the skill enforces smart defaults in the background.
Already Active"Dreaming" is Anthropic's version of spring cleaning for AI memory. You point it at a batch of session transcripts (up to 100), it reads them all, throws away duplicate or stale notes, and writes a fresh, organized memory file. The original memory is never touched; the output is always a new copy you can review before deciding to use it.
In theory, this is a powerful tool. In practice, there's a mismatch: Dreaming works with Anthropic-hosted memory stores (cloud databases managed by Anthropic), and KAIRO's memory lives in plain text files on Anthony's Mac at ~/.claude/projects/. The two systems don't connect. To use Dreaming you'd have to migrate all KAIRO memory into Anthropic's cloud format and give up the ability to search and version-control it with simple tools. That trade is not worth it today with only 29 active memory entries and full manual control working well.
Skip for now. Revisit if KAIRO scales to hundreds of sessions per week and manual memory curation becomes a bottleneck. The current plain-text system is searchable, versionable with git, and free. Dreaming's value kicks in at a much larger scale.
**Verdict: Skip for now.**

The plan below is sequenced by value delivered versus effort spent.
1. Save the Prospect Dossier Rubric to prospect-engine/rubrics/prospect-dossier-v1.md. Use it as a manual checklist on the next dossier, even without any automation, so rule violations get caught before Anthony's review.
2. Pre-create one coordinator and two specialist agents, and run a tiny prospect test end-to-end. Plant a deliberate rule violation (a fake price in the copy) and confirm the rubric grader catches and flags it before output.
3. Build the full v2: six specialist agents, a coordinator, two Cloudflare Workers, and one webhook registered in the Anthropic console. End-to-end test: drop a transcript, the dossier appears on Cloudflare Pages within 10 minutes, and the phone gets a notification with an action button.
4. When Plod (the voice transcription tool) adds webhook support, wire it so a transcript upload automatically kicks off the dossier build. Until then, a manual file-watch approach (like the polling sketch above) works fine.
5. Defer Dreaming until KAIRO runs hundreds of sessions per week. The current hand-curated plain-text memory system works well at today's volume. No action needed now.
Estimated total per complete dossier run on the v2 architecture. These are API costs, not your time.
Same ballpark as today's manual runs, with much less manual effort per dossier. The time savings per avoided rework round (30 to 60 minutes, $9 to $18 in re-runs) pay for several full dossier runs.
All Anthropic documentation links below. Verified current as of 2026-05-06.