AI Agent Handoffs Need Receipts
Most agent handoffs are too polite.
One worker says "done". The next worker receives a short summary and a pile of implied trust. If the output is wrong, stale, half-written, untested, or parked in the wrong directory, the next worker has to rediscover the truth from scratch.
That is not a handoff. That is a rumor with a task ID attached.
A useful handoff leaves receipts. It names the artifact, points to the commit, records the validation result, links the task, shows unresolved blockers, and gives the next operator enough evidence to decide whether to continue, review, roll back, or stop.
This matters more as agent systems move from demos into delivery pipelines. A single agent in one chat thread gets away with fuzzy continuity. Humans plus agents across repos, task boards, approvals, git branches, databases, and release gates do not. If each crossing loses evidence, the system gets slower every time it tries to move faster.
I treat receipts as the minimum standard for cross-agent work. The NIST AI Risk Management Framework frames AI risk around govern, map, measure, and manage. That only becomes operational when an agent handoff records who did what, what evidence exists, and where the next check should happen. The OWASP AI Agent Security Cheat Sheet makes the same point from the security side: tool access, human approval, logging, memory, and output validation are control surfaces, not vibes.
A receipt is not a longer summary
A summary says what happened.
A receipt proves where to look.
That distinction sounds small until something breaks. "Updated the draft and ran checks" is a summary. It gives the next worker no handle. Which draft? Which checks? Did the checks pass? Were failures from the changed file or from old repo damage? Was the draft committed, staged, or only present in a local workspace?
A receipt answers those questions without a meeting.
A good handoff includes:
- the task ID that owns the work
- the artifact path or external URL
- the commit hash or branch name when code or content changed
- the validation command and exact result
- screenshots or logs when visual state matters
- the remaining blocker, if any
- the next owner or decision boundary
- the timestamp for the handoff
This is not bureaucracy. It is compression. The next worker gets a small packet of evidence instead of a vague narrative.
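As a rough illustration, the fields above fit in a small record type. This is a minimal sketch, not a standard: the class name `HandoffReceipt`, its field names, and the `render` layout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class HandoffReceipt:
    """Hypothetical receipt record; names are illustrative only."""
    task_id: str
    artifact: str                       # path or external URL
    validation: Dict[str, str]          # command -> exact result
    next_owner: str                     # next owner or decision boundary
    commit: Optional[str] = None        # hash or branch, when files changed
    evidence: List[str] = field(default_factory=list)  # screenshots, logs
    blocker: Optional[str] = None
    handoff_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def render(self) -> str:
        """Render the receipt as a short plain-text packet."""
        lines = [
            f"Task: {self.task_id}",
            f"Handoff time: {self.handoff_time:%Y-%m-%d %H:%M} UTC",
            f"Artifact: {self.artifact}",
            f"Commit: {self.commit or 'none'}",
            "Validation:",
        ]
        lines += [f"- {cmd}: {result}" for cmd, result in self.validation.items()]
        lines.append("Evidence:")
        lines += [f"- {item}" for item in self.evidence] or ["- none"]
        lines.append(f"Blocker: {self.blocker or 'none'}")
        lines.append(f"Next owner: {self.next_owner}")
        return "\n".join(lines)
```

Making `task_id`, `artifact`, `validation`, and `next_owner` required constructor arguments means a worker cannot build a receipt without them, which is the point of the list.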
Why AI agent handoffs fail
Agent handoffs fail for boring reasons.
The artifact exists, but not where the next worker expects it. The commit exists, but the branch was never pushed. The tool output passed, but only after ignoring a warning that matters. The screenshot shows the old UI because the server cache was stale. The task was marked complete, but the work still needs approval before release.
These are not model intelligence problems. They are operating-model problems: how work gets recorded, stored, and passed along.
I see five common failure modes.
1. The artifact is unnamed
The worker says the draft, patch, report, or export is complete, but does not name the path.
The next worker searches the repo, finds three possible files, and guesses. That guess then becomes the new state of the system. One bad path contaminates the next step.
The fix is simple: every handoff names the artifact path exactly.
Bad:
- "Drafted the post."
Good:
- "Draft saved at content/blog/ai-agent-handoffs-need-receipts.mdx."
The path is the receipt. Without it, the summary is not operational.
2. The state is not durable
A worker edits a file, runs a local check, and stops. The file exists only in a workspace. The next worker is spawned in a different environment and sees nothing.
This is common in multi-agent code and content systems because every worker has a different lifecycle. Some workspaces persist. Some are scratch. Some are worktrees. Some are shared directories with unrelated dirty files.
Durability has to be explicit.
For repo work, I want one of these states in the handoff:
- committed and pushed to a named branch
- committed locally with the exact commit hash and a stated reason it was not pushed
- uncommitted in a named workspace because review is required before commit
- blocked before write, with no artifact created
Anything else leaves the next worker guessing which copy of reality is real.
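One way to keep those four states honest is to make them an enumeration and refuse a handoff that omits the detail each state requires. The sketch below is an assumption about how that could look; `Durability` and `durability_problems` are hypothetical names, and the check only inspects what the worker reported, it does not query git.

```python
from enum import Enum
from typing import List, Optional


class Durability(Enum):
    PUSHED = "committed and pushed to a named branch"
    LOCAL_COMMIT = "committed locally, not pushed"
    UNCOMMITTED = "uncommitted in a named workspace"
    BLOCKED_BEFORE_WRITE = "blocked before write, no artifact created"


def durability_problems(
    state: Durability,
    branch: Optional[str] = None,
    commit: Optional[str] = None,
    workspace: Optional[str] = None,
    reason: Optional[str] = None,
) -> List[str]:
    """Return the details a handoff is missing for its claimed state."""
    problems: List[str] = []
    if state is Durability.PUSHED and not branch:
        problems.append("pushed work must name the branch")
    if state is Durability.LOCAL_COMMIT:
        if not commit:
            problems.append("local commit needs the exact commit hash")
        if not reason:
            problems.append("local commit needs a stated reason it was not pushed")
    if state is Durability.UNCOMMITTED:
        if not workspace:
            problems.append("uncommitted work must name the workspace")
        if not reason:
            problems.append("uncommitted work needs the reason it was not committed")
    # BLOCKED_BEFORE_WRITE carries no artifact, so nothing further is required.
    return problems
```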
3. Validation is described instead of recorded
"Tests pass" is better than nothing, but it is still weak.
The receipt needs the command and the result.
Good:
- git diff --check: passed
- npm run build: failed in existing Gatsby schema generation, no error references changed MDX
- pytest tests/test_router.py: 18 passed in 4.2s
This lets the next worker distinguish content risk from repo risk. It also prevents laundering old failures. If a build was already broken before the edit, say that. If the new file broke the build, say that too.
I care less about a perfect green check than about honest provenance. A red check with a clear cause is safer than a green claim with no command behind it.
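Recording instead of describing can be as simple as wrapping the command so the receipt line is generated from the real exit code. The following is a minimal sketch using Python's standard `subprocess` module; `ValidationRecord`, `run_validation`, and the `preexisting` flag are hypothetical names, and the flag is set by the worker after checking whether the failure predates the change.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ValidationRecord:
    command: str
    exit_code: int
    output_tail: str           # last few lines of combined stdout/stderr
    preexisting: bool = False  # worker-asserted: failure predates this change

    def line(self) -> str:
        """One receipt line: the command plus its actual result."""
        if self.exit_code == 0:
            return f"{self.command}: passed"
        note = ", pre-existing failure" if self.preexisting else ""
        return f"{self.command}: failed (exit {self.exit_code}){note}"


def run_validation(argv: List[str], tail_lines: int = 5) -> ValidationRecord:
    """Run a check and keep the command, exit code, and output tail."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    combined = (proc.stdout + proc.stderr).strip().splitlines()
    return ValidationRecord(
        command=shlex.join(argv),
        exit_code=proc.returncode,
        output_tail="\n".join(combined[-tail_lines:]),
    )


if __name__ == "__main__":
    # Stand-in check so the example runs anywhere Python does.
    record = run_validation([sys.executable, "-c", "print('ok')"])
    print(record.line())
```

In a real repo the argument list would be something like `["git", "diff", "--check"]`; the stand-in command is used here only so the sketch has no external dependencies.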
4. The blocker is hidden inside the prose
Blocked work should be visible at the task layer, not buried in the last paragraph of a summary.
If a draft needs legal approval, mark it blocked. If a deployment needs credentials, mark it blocked. If a reviewer found a release-stopping issue, mark it blocked. The blocker has to live in the task status or board field, not only in the handoff text. Do not complete the task and hope the next worker reads carefully.
A blocker receipt has three parts:
- what decision is needed
- who owns the decision
- what work is safe to do while waiting
Example:
- "Blocked on human approval to publish. Safe next work: copy edit, link audit, thumbnail generation. Unsafe next work: flipping draft: false or pushing a release commit."
That line prevents a well-meaning agent from crossing the approval boundary.
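If the blocker is stored as data instead of prose, an agent can check an action against it before acting. This is a sketch under assumptions: the `Blocker` type and the three-way `check` result are hypothetical, and treating unlisted actions as "ask" is my choice for the example, not something the handoff text dictates.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Blocker:
    decision_needed: str
    decision_owner: str
    safe_while_waiting: FrozenSet[str] = frozenset()
    unsafe_until_resolved: FrozenSet[str] = frozenset()

    def check(self, action: str) -> str:
        """Classify an action as 'safe', 'unsafe', or 'ask'."""
        if action in self.unsafe_until_resolved:
            return "unsafe"
        if action in self.safe_while_waiting:
            return "safe"
        # Anything not listed is escalated to the decision owner.
        return "ask"


publish_blocker = Blocker(
    decision_needed="approval to publish",
    decision_owner="human reviewer",
    safe_while_waiting=frozenset({"copy edit", "link audit", "thumbnail generation"}),
    unsafe_until_resolved=frozenset({"set draft: false", "push release commit"}),
)
```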
5. The handoff mixes completion with release
This is the most expensive mistake.
An agent completes a deliverable. A human has not approved it. Another system sees "done" and ships it.
The fix is separate states. In my own systems, completed, reviewed, approved, published, and released are different words. They mean different things. A draft can be complete and not publishable. A patch can be merged and not released. A report can be generated and not sent.
This is why task records matter. I wrote about this in "Monitoring AI Agents in Production: What to Watch". The short version: monitor the task outcome, not just the run outcome. The run can succeed while the work is still waiting at the approval gate.
Receipts make that gate visible.
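Separate states only help if the system refuses to skip between them. Here is a minimal sketch of that idea. It assumes a strictly linear order for the five states and a named approver on the approval step; both are simplifications for the example, and `WorkState` and `advance` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional, Set


class WorkState(Enum):
    COMPLETED = "completed"
    REVIEWED = "reviewed"
    APPROVED = "approved"
    PUBLISHED = "published"
    RELEASED = "released"


# Assumed linear flow: each state may only advance to the next one.
ALLOWED: Dict[WorkState, Set[WorkState]] = {
    WorkState.COMPLETED: {WorkState.REVIEWED},
    WorkState.REVIEWED: {WorkState.APPROVED},
    WorkState.APPROVED: {WorkState.PUBLISHED},
    WorkState.PUBLISHED: {WorkState.RELEASED},
    WorkState.RELEASED: set(),
}


def advance(
    current: WorkState, target: WorkState, approver: Optional[str] = None
) -> WorkState:
    """Move to the next state, or raise if the step is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    if target is WorkState.APPROVED and not approver:
        raise PermissionError("approval requires a named approver")
    return target
```

With this shape, a system that sees "completed" cannot ship: `advance(WorkState.COMPLETED, WorkState.PUBLISHED)` raises instead of publishing.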
The minimum receipt format
Here is the handoff shape I use when one worker passes work to another.
Task: t_12345678
Handoff time: 2026-05-14 10:57 UTC
Owner: Quill
Status: draft-ready, review-blocked
Kanban handle: task t_12345678, comment thread updated with receipt
Artifact: content/blog/example-post.mdx
Workspace: /workspace/site
Commit: 9f3a21c on main
Validation:
- git diff --check: passed
- npm run build: failed, existing Gatsby image schema issue, changed MDX not referenced
Evidence:
- screenshot: /tmp/example-post-preview.png
- live URL: not available because draft:true
Blocker:
- needs A2A consensus before publish
Approval boundary:
- no publish, send, merge, payment, customer message, or production write without explicit approval
Next safe action:
- review content, patch low-risk edits, keep draft:true
Next unsafe action:
- publishing or announcing as live
That is enough for another agent to continue without asking me to replay the run.
The format does not need to be fancy. It needs stable required fields. A markdown comment, kanban handoff, issue comment, pull request body, database row, or log entry works. In a kanban-driven system, the board is the canonical coordination record. Chat memory and private agent memory are supporting context, not source of truth.
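Stable required fields are easy to enforce mechanically. The sketch below parses the plain-text shape shown above ("Field: value" lines, with "- " items under a field) and reports which required fields are missing or empty. The parser and the choice of `REQUIRED` fields are assumptions for illustration; pick the fields your own system treats as mandatory.

```python
from typing import Dict, List

# Assumed minimum; adjust to whatever your board treats as mandatory.
REQUIRED = (
    "Task", "Handoff time", "Owner", "Status",
    "Artifact", "Validation", "Next safe action",
)


def parse_receipt(text: str) -> Dict[str, List[str]]:
    """Parse 'Field: value' lines and '- item' lines into field -> values."""
    fields: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError(f"list item before any field: {line!r}")
            fields[current].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Field: value', got {line!r}")
        current = key.strip()
        fields[current] = [value.strip()] if value.strip() else []
    return fields


def missing_fields(fields: Dict[str, List[str]]) -> List[str]:
    """Required fields that are absent or have no value."""
    return [name for name in REQUIRED if not fields.get(name)]
```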
Observability systems care about correlation for the same reason. The OpenTelemetry logs specification ties logs back to traces, metrics, source attribution, and distributed context. AI agent handoffs need that shape: correlated evidence, not one giant transcript.
A decision framework for receipts
Not every task needs the same receipt depth. A five-minute note edit does not need a full incident packet. A production deployment does.
I use this framework.
Low-risk handoff
Use for internal notes, small drafts, or reversible copy edits.
Required receipt:
- artifact path
- short summary
- validation if any check was run
- blocker if the task is not complete
Medium-risk handoff
Use for blog drafts, code patches, data exports, support replies, or anything another worker will review.
Required receipt:
- artifact path
- task ID
- commit or branch
- validation command and result
- external systems touched, if any
- internal links or external sources touched
- explicit next action
- explicit publish, send, merge, or release boundary
High-risk handoff
Use for money movement, customer communication, production deploys, database migrations, security changes, legal copy, or irreversible external writes.
High-risk work needs approval before execution, not only a receipt after the fact. A receipt for an unauthorized payment, billing change, customer commitment, secret rotation, or legal claim is just evidence that the control failed. The approval evidence should also be inspectable by the reviewing operator, not trapped in a private chat summary.
Required receipt:
- all medium-risk fields
- reviewer or approver identity and authority
- approval timestamp
- approved action
- approved destination, account, counterparty, customer, or system
- approved amount or scope, if applicable
- approval expiration or validity window
- evidence link to the approval
- rollback or mitigation plan
- screenshots or logs
- intended external side effect before action
- exact external side effect after approval
- confirmation ID, transaction ID, ticket ID, deploy SHA, or message URL
- what was explicitly not done
- secrets or credentials explicitly excluded from the handoff
- human approval state
- scoped credentials or least-privilege limit used for the action
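The three tiers can be expressed as nested sets of required keys, so the same check scales with risk. This is a sketch, not a complete policy: the key names are hypothetical, and the high-risk set lists only some of the fields above to keep the example short.

```python
from typing import Dict, List, Set

LOW: Set[str] = {"artifact", "summary"}
MEDIUM: Set[str] = {
    "artifact", "task_id", "commit_or_branch",
    "validation", "next_action", "release_boundary",
}
# Abbreviated: a real high-risk tier would carry every field listed above.
HIGH: Set[str] = MEDIUM | {
    "approver", "approval_timestamp", "approved_action", "approved_scope",
    "approval_evidence", "rollback_plan", "confirmation_id", "not_done",
}

TIERS: Dict[str, Set[str]] = {"low": LOW, "medium": MEDIUM, "high": HIGH}


def missing_for_tier(tier: str, receipt: Dict[str, object]) -> List[str]:
    """Required keys that are absent or empty for the given risk tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown risk tier: {tier!r}")
    return sorted(key for key in TIERS[tier] if not receipt.get(key))
```

A receipt that is fine for a copy edit fails loudly when someone tries to reuse it for a deploy, which is the behavior the framework is after.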
For high-risk work, the unsafe actions should be named plainly: sending a customer email, initiating a payment, changing billing, rotating production secrets, publishing legal or security claims, migrating production data, or writing to a third-party system that cannot be rolled back cleanly.
The receipt should show both sides of the boundary:
Before action:
- intended side effect: send renewal email to Customer A
- approval: Will, 2026-05-14 10:42 UTC, approved only the renewal reminder copy
- scope: one customer, no discount, no billing change
After action:
- performed side effect: email sent to customer-a@example.com
- confirmation: message URL in helpdesk ticket 4821
- not done: no invoice change, no discount promise, no account update
- mitigation: follow-up correction email owner named if customer reports mismatch
The higher the blast radius, the less I trust prose. I want handles, evidence, and state transitions.
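The before-action half of that boundary can be checked in code: compare the action about to run against what was actually approved. The sketch below is an assumption about how such a gate might look. `Approval` and `scope_violations` are hypothetical names, and the expiry time in the usage is invented for the example since the receipt above does not state one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Approval:
    approver: str
    action: str          # the one action that was approved
    target: str          # the one customer, account, or system approved
    expires_at: datetime


def scope_violations(
    approval: Approval, action: str, target: str, now: datetime
) -> List[str]:
    """Reasons the intended action falls outside the approval, if any."""
    violations: List[str] = []
    if action != approval.action:
        violations.append(f"action {action!r} was not approved")
    if target != approval.target:
        violations.append(f"target {target!r} was not approved")
    if now >= approval.expires_at:
        violations.append("approval expired")
    return violations
```

An empty list means the action matches the approval; anything else should stop the run before the side effect happens.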
What this looks like in kanban
Kanban is not just a list of tasks. Used correctly, it is a receipt ledger and the authoritative state surface for the work. The board or task record should be updated before the handoff counts as complete.
Each task has an ID, assignee, status, parent dependencies, comments, run history, and a completion handoff. That gives the system an audit trail without forcing every worker to keep all context in memory.
A bad kanban completion summary repeats the problem:
Done. Draft updated and checks run.
A good kanban completion summary is short enough for a dashboard and specific enough for routing:
{
  "summary": "Drafted agent handoff post at content/blog/ai-agent-handoffs-need-receipts.mdx; body word count recorded in validation metadata; internal links to monitoring, orchestration, agent-org, and services pages; build passed.",
  "metadata": {
    "changed_files": ["content/blog/ai-agent-handoffs-need-receipts.mdx"],
    "internal_links": [
      "/blog/monitoring-ai-agents-in-production/",
      "/blog/top-7-multi-agent-orchestration-patterns/",
      "/blog/how-my-agent-org-evolved/",
      "/services/"
    ],
    "validation": {
      "anti-slop grep": "passed, false positives only",
      "em dash check": "passed, 0 found",
      "bold section check": "passed, 0 found",
      "git diff --check": "passed",
      "npm run build": "passed"
    },
    "publish_blockers": ["draft:true pending approval"]
  }
}
The next worker does not need the whole transcript. They need the receipt.
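"Specific enough for routing" can be taken literally: a structured completion summary lets a dispatcher decide the next step without reading prose. The sketch below assumes the JSON keys from the example above and a convention that passing results start with the word "passed". The `route` function and its return labels are hypothetical.

```python
import json


def route(completion_json: str) -> str:
    """Pick a next step from a kanban completion summary."""
    record = json.loads(completion_json)
    meta = record.get("metadata", {})
    validation = meta.get("validation", {})

    # A bare "Done." with no files or checks is not a receipt.
    if not record.get("summary") or not meta.get("changed_files") or not validation:
        return "reject: receipt incomplete"

    # Assumed convention: a passing result string starts with "passed".
    failed = [cmd for cmd, result in validation.items()
              if not result.startswith("passed")]
    if failed:
        return "needs-triage: " + ", ".join(failed)

    blockers = meta.get("publish_blockers", [])
    if blockers:
        return "review-only: " + "; ".join(blockers)
    return "ready"
```

The bad summary from earlier is rejected outright, while the good one routes to review with its publish blocker attached.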
This is also why I like the blackboard pattern described in "Top 7 Multi-Agent Orchestration Patterns". Shared state works when workers write durable facts into a common surface. It fails when the shared surface turns into an unstructured transcript dump.
Receipts are how the blackboard stays useful.
Receipts beat memory
Agent memory helps, but it is the wrong place for task truth.
Memory is for stable facts: preferences, conventions, durable system knowledge. Task receipts belong near the work. The file path belongs with the task. The test result belongs with the commit. The blocker belongs in the board. The screenshot belongs in the review thread.
When task truth lives only in memory, the system drifts. A future run retrieves a stale fact and treats it as current. A different worker lacks the memory entry entirely. A human reviewer cannot inspect the evidence without asking the agent to explain itself.
Durable handoffs should survive the agent that wrote them.
This is the same lesson behind role clarity in "How My AI Agent Org Evolved as the Work Got Real". Ownership gets easier when responsibilities are explicit. Receipts make the ownership visible after the context window disappears.
The checklist I use before handing off
Before I hand work to another agent or human, I check this list.
- Did I name the artifact path?
- Did I record the task ID?
- Did I say whether the work is draft, review-ready, approved, published, or released?
- Did I include the commit hash or branch when files changed?
- Did I record every validation command I ran?
- Did I distinguish new failures from pre-existing repo failures?
- Did I include screenshots, logs, or URLs when visual or external state matters?
- Did I state unresolved blockers in one clear sentence?
- Did I name the next safe action?
- Did I name the action that must not happen yet?
- Did I say what was explicitly not done?
If a handoff lacks those answers, the next worker is not receiving the work. They are receiving a scavenger hunt.
Receipts slow down the right part
The objection is predictable: receipts add overhead.
They do. So does writing commit messages, naming tests, and keeping task IDs. The point is not to remove overhead. The point is to move it from failure recovery into normal operation.
A receipt costs thirty seconds when the context is fresh. Reconstructing missing state costs twenty minutes later, and it usually happens under pressure.
Agent systems fail quietly when every worker optimizes for finishing its own run. They get reliable when each worker leaves enough evidence for the next one.
Use the receipt format above in the next handoff you run. This is especially important before giving agents write access to repos, customer systems, billing, production data, or outbound comms. If your agent workflows need auditable handoffs, Mimir Works can help design the operating loop through AI workflow automation services.
That is the standard I want: no handoff without proof.
Read next: Monitoring AI Agents in Production: What to Watch, Top 7 Multi-Agent Orchestration Patterns, and How My AI Agent Org Evolved as the Work Got Real.