Your LLM Agent's Tool Calls Are Uncompensated Transactions
April 2025
Typed effect classification and automatic compensation for agent tool calls. Open-source TypeScript middleware that composes with LangGraph, Temporal, or raw SDK calls.
Every agent framework sells you on the same promise: checkpoints, retries, resilience. LangGraph will resume from state. Temporal will retry your activity. But when your expense-approval agent fails at step 6, neither one refunds the card charge from step 4.
That charge is a side effect. It escaped. And your framework has no opinion about it, because frameworks manage execution flow, not effect semantics.
You already know this. You've already solved it — or more accurately, you've already patched around it.
The code you're writing today
If you've built agent tooling at a company that moves money, somewhere in your codebase there's something like this:
const completedSteps: Array<{ tool: string; result: any }> = [];
try {
  const balance = await checkBalance(accountId);
  completedSteps.push({ tool: "check_balance", result: balance });
  const charge = await stripe.charges.create({ amount, source });
  completedSteps.push({ tool: "charge_card", result: charge });
  const email = await sendConfirmation(userEmail, charge.id);
  completedSteps.push({ tool: "send_email", result: email });
  const ledgerEntry = await updateLedger(charge.id, amount);
  completedSteps.push({ tool: "update_ledger", result: ledgerEntry });
} catch (err) {
  // TODO: handle partial failure
  for (const step of completedSteps.reverse()) {
    try {
      if (step.tool === "charge_card") {
        await stripe.refunds.create({ charge: step.result.id });
      }
      if (step.tool === "update_ledger") {
        await deleteLedgerEntry(step.result.id);
      }
      // send_email... uhh, can't unsend. Log it? Slack alert?
      // check_balance... doesn't matter, skip
    } catch (cleanupErr) {
      // if the cleanup fails... log and hope?
      console.error("Cleanup failed:", cleanupErr);
      await postToSlack(`⚠️ Manual intervention needed: ${step.tool}`);
    }
  }
}
This works for one workflow. Then you build a second agent with different tools and different cleanup logic, and you copy-paste the pattern. By the third, someone suggests extracting a helper. The helper grows. It handles some tools but not others. Someone adds a flag for “tools that can't be undone” and someone else adds a different flag for “tools that need approval before running.” There's no shared vocabulary for what these flags mean. Six months later, a new hire asks which tools are safe to retry and nobody can answer without reading the implementation of each one.
The problem isn't that your team can't write cleanup code. The problem is that ad hoc compensation doesn't compose. Every workflow re-derives the same questions from scratch — can this be undone? how? what if the undo fails? — and answers them inline, inconsistently, without an audit trail.
What changes with Unwind
Here's the same workflow:
const checkBalance = unwind.tool({
  name: "check_balance",
  effectClass: "idempotent",
  execute: async ({ account_id }) => getBalance(account_id),
});
const chargeCard = unwind.tool({
  name: "charge_card",
  effectClass: "reversible",
  execute: async (args) =>
    stripe.charges.create(
      { amount: args.amount, source: args.source },
      // stripe-node takes the idempotency key as a request option, not as a charge parameter
      { idempotencyKey: args.__idempotencyKey },
    ),
  compensate: async (_args, result) =>
    stripe.refunds.create({ charge: result.id }),
});
const sendEmail = unwind.tool({
  name: "send_email",
  effectClass: "append-only",
  execute: async (args) => email.send(args),
});
Each tool declares its blast radius exactly once, at definition time. idempotent means there is no side effect to undo: skip it during compensation, safe to retry forever. reversible means the effect can be mechanically undone, and the type system won't let you declare it without providing the compensate function. append-only means it happened and can't be recalled, so Unwind logs what escaped and an operator knows exactly what's out there. A fourth class, destructive, requires an approval gate before execution.
These aren't runtime annotations you hope someone remembers to set. They're compile-time contracts. A new engineer reads the tool definition and immediately knows: this tool is reversible, here's how it gets reversed, and the framework enforces that contract.
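To make the "compile-time contract" idea concrete, here is one way such a constraint could be expressed in TypeScript with a discriminated union. This is a sketch under assumptions: `ToolDef`, `ReversibleTool`, `OtherTool`, and `defineTool` are hypothetical names for illustration and are not taken from Unwind's source, whose real types may look different.

```typescript
// Hypothetical types: a sketch of the idea, not Unwind's actual definitions.
type EffectClass = "idempotent" | "reversible" | "append-only" | "destructive";

interface BaseTool<A, R> {
  name: string;
  execute: (args: A) => Promise<R>;
}

// Reversible tools must carry a compensate function.
interface ReversibleTool<A, R> extends BaseTool<A, R> {
  effectClass: "reversible";
  compensate: (args: A, result: R) => Promise<unknown>;
}

// Every other class must not declare one.
interface OtherTool<A, R> extends BaseTool<A, R> {
  effectClass: Exclude<EffectClass, "reversible">;
  compensate?: never;
}

type ToolDef<A, R> = ReversibleTool<A, R> | OtherTool<A, R>;

// Identity function whose only job is to check the definition against the union.
function defineTool<A, R>(def: ToolDef<A, R>): ToolDef<A, R> {
  return def;
}

const chargeCard = defineTool<{ amount: number }, { id: string }>({
  name: "charge_card",
  effectClass: "reversible",
  execute: async (args) => ({ id: `ch_${args.amount}` }),
  compensate: async (_args, result) => ({ refunded: result.id }),
});

const sendEmail = defineTool<{ to: string }, { messageId: string }>({
  name: "send_email",
  effectClass: "append-only",
  execute: async (args) => ({ messageId: `msg_${args.to}` }),
});

// Removing `compensate` from chargeCard, or adding one to sendEmail,
// is rejected by the compiler because neither union member matches.
```

The point of the union is that the missing-undo mistake surfaces in the editor rather than during an incident.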
When the workflow fails, compensation is one call:
const summary = await unwind.compensate(runId);
Unwind walks completed steps in reverse and applies the strategy each effect class dictates:
Compensation summary:
  Compensated: charge_card → refunded (re_abc)
  Uncompensatable: send_email to user@acme.com
    with subject "Your card was charged" — irreversible
  Skipped: check_balance (idempotent, no side effect)
  Failed: (none)
No try/catch chains. No copy-pasted cleanup blocks. No Slack alerts as error handling. The operator sees what was rolled back, what couldn't be, and what failed to compensate, all in a structured summary rather than scattered across log lines.
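For readers who want to see the shape of that reverse walk, here is a minimal in-memory sketch. It is not Unwind's implementation: `CompletedStep`, `CompensationSummary`, and `compensateRun` are hypothetical names, and treating an already-executed destructive step like an append-only one is an assumption made for the example.

```typescript
// Hypothetical shapes for illustration; not Unwind's real event schema.
type EffectClass = "idempotent" | "reversible" | "append-only" | "destructive";

interface CompletedStep {
  tool: string;
  effectClass: EffectClass;
  args: unknown;
  result: unknown;
  // Present only for reversible tools.
  compensate?: (args: unknown, result: unknown) => Promise<unknown>;
}

interface CompensationSummary {
  compensated: string[];
  uncompensatable: string[];
  skipped: string[];
  failed: Array<{ tool: string; error: string }>;
}

async function compensateRun(steps: CompletedStep[]): Promise<CompensationSummary> {
  const summary: CompensationSummary = {
    compensated: [],
    uncompensatable: [],
    skipped: [],
    failed: [],
  };

  // Copy before reversing so the caller's record of the run is not mutated.
  for (const step of [...steps].reverse()) {
    switch (step.effectClass) {
      case "idempotent":
        summary.skipped.push(step.tool);
        break;
      case "reversible":
        try {
          if (!step.compensate) throw new Error("no compensate function recorded");
          await step.compensate(step.args, step.result);
          summary.compensated.push(step.tool);
        } catch (err) {
          // A failed undo is reported, and the walk continues with earlier steps.
          summary.failed.push({
            tool: step.tool,
            error: err instanceof Error ? err.message : String(err),
          });
        }
        break;
      case "append-only":
      case "destructive":
        // Assumption: destructive effects that already ran are reported like append-only ones.
        summary.uncompensatable.push(step.tool);
        break;
    }
  }
  return summary;
}
```

Two choices here are worth noticing: a failed compensation does not abort the walk, and every step lands in exactly one bucket, so nothing silently disappears from the summary.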
“I can write this in 50 lines”
Maybe. For one workflow with three tools, a custom wrapper is fine. But the abstraction breaks down in practice along three axes.
Shared tools across workflows. Your charge_card tool appears in expense approval, subscription billing, and refund processing. Each workflow has different compensation semantics for the surrounding steps, but the charge tool's own compensation logic is always the same. Without a shared definition, you're duplicating that refund logic — or worse, implementing it slightly differently in each place.
Compensation ordering. When a workflow has six completed steps and step 7 fails, compensation needs to run in reverse order. If step 4's compensation depends on the result of step 5's compensation (releasing a hold only after reversing the transfer), the ordering matters and the hand-rolled version gets subtle. Unwind manages the dependency chain from the event store.
Auditability. An ad hoc try/catch block doesn't produce a queryable record of what was compensated, what wasn't, and why. When compliance asks “what happened to the charge on account X at 3:47pm on Tuesday,” you need an answer better than grep-ing CloudWatch logs. Unwind writes every tool call and compensation outcome to an append-only event store — SQLite locally, pluggable for production.
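As a rough picture of what "queryable record" means, here is an in-memory stand-in for an append-only event store. The event shape, the `InMemoryEventStore` class, and its `append` and `history` methods are all hypothetical, chosen for illustration; Unwind's actual schema and SQLite backend are not shown in this post.

```typescript
// Hypothetical event shape and store; Unwind's real schema may differ.
type EventKind =
  | "tool_called"
  | "tool_succeeded"
  | "tool_failed"
  | "compensated"
  | "compensation_failed";

interface RunEvent {
  seq: number; // position in the log, assigned by the store
  runId: string;
  step: number;
  tool: string;
  kind: EventKind;
  at: string; // ISO 8601 timestamp
  detail?: unknown;
}

class InMemoryEventStore {
  private readonly log: RunEvent[] = [];

  // Append is the only write operation: no update, no delete.
  append(e: Omit<RunEvent, "seq">): RunEvent {
    const stored: RunEvent = Object.freeze({ ...e, seq: this.log.length });
    this.log.push(stored);
    return stored;
  }

  // Everything that happened to one tool in one run, in log order.
  history(runId: string, tool: string): RunEvent[] {
    return this.log.filter((e) => e.runId === runId && e.tool === tool);
  }
}
```

With a log like this, the compliance question becomes a filter over events for one run and one tool instead of a search through free-text logs.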
How it fits your stack
Unwind wraps tool calls. It doesn't manage execution, scheduling, or state. You keep using whatever runs your agent today.
With the Anthropic SDK, you convert tools and dispatch:
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024, // required by the Messages API; pick a limit that suits your agent
  tools: unwind.anthropicTools([checkBalance, chargeCard, sendEmail]),
  messages,
});
for (const block of response.content.filter(b => b.type === "tool_use")) {
  await unwind.handleToolUse(runId, step++, block, tools);
}
With LangGraph, you plug into the tool layer and let LangGraph keep handling checkpoints. With Temporal, you wrap inside Activities and call unwind.compensate() in your failure handler. The integration surface is a few lines in each case because Unwind only touches the boundary where your agent's decisions become real-world side effects, not the execution graph above it or the business logic below it.
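The failure-handler pattern is the same whichever runner sits above it. Here is a framework-agnostic sketch: the `Compensator` interface and the `runWithCompensation` helper are hypothetical, and only the `compensate(runId)` call comes from the examples in this post. Nothing here uses LangGraph or Temporal APIs.

```typescript
// Hypothetical interface: only `compensate(runId)` is taken from the article.
interface Compensator {
  compensate(runId: string): Promise<unknown>;
}

// Runs the workflow body and, if it throws, compensates before rethrowing.
async function runWithCompensation<T>(
  compensator: Compensator,
  runId: string,
  body: () => Promise<T>,
): Promise<T> {
  try {
    return await body();
  } catch (err) {
    // Undo what can be undone, then surface the original failure to the caller.
    // Assumption: compensate reports its own failures in its summary rather than throwing.
    await compensator.compensate(runId);
    throw err;
  }
}
```

Rethrowing matters: the runner above still sees the failure and can apply its own retry or checkpoint policy, while the side effects below have already been dealt with.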
Okay, why is this urgent?
Six months ago, most agent deployments were conversational — a tool call here, a retrieval there, a human approving every action. The compensation problem was manageable because the blast radius was small and a person was usually in the loop.
That's changing fast. Agents are running longer tool chains with less oversight per step. They're composing actions across multiple external services in a single run. And the companies pushing hardest on agent autonomy — payments, fintech, commerce — are exactly the ones where an uncompensated side effect means real money in the wrong place.
If your team is building agents that make charges, update records, trigger transfers, or call external APIs with real consequences, the question isn't whether you'll need systematic compensation. It's whether you'll build it from scratch for each workflow or define it once per tool and let the middleware handle the rest.
Unwind is open source, MIT licensed, written in TypeScript.
npm install @unwind/core
→ github.com/amoorching/unwind
Classify your tools. Enforce compensation contracts at the type level. Stop hand-rolling cleanup logic that breaks on the fourth workflow.