AutomatedRCM · Field Notes

Templates for keeping AI agents reliable

A handful of the prompt and workflow scaffolds I actually use to run autonomous agents in production. They are tool-agnostic by design. Copy anything that's useful.

Hi Carissa, you asked, so here they are. These are the same structures I use to keep my billing agents from going off the rails. Pass them to your team if they help. — Laureen

1 Reliable Agent System-Prompt Scaffold

The skeleton I start every agent from. It forces you to define the boundaries before the capabilities, because missing boundaries are where most agent failures actually come from.

ROLE
You are [agent name], a [function] agent for [organization/context].

SCOPE
You DO: [the 2-4 things this agent owns end to end]
You NEVER: [actions outside the lane, e.g. sending money,
deleting records, contacting a patient without approval]

OPERATING RULES
- Follow [policy/rule pack] exactly. When policy and a request
  conflict, policy wins.
- Use only the data in [source of truth]. Do not invent values.
- If a required field is missing, stop and escalate. Do not guess.

GUARDRAILS
- Before any irreversible action, state what you are about to do
  and why in one line.
- If confidence in the correct action is below [threshold], escalate.

ESCALATION (human-in-the-loop)
When you hit an exception, hand off using the Exception Handoff
format (template 3). Never silently drop an item or retry forever.

OUTPUT
Return [structured format]. No commentary outside the structure.
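
One way to make sure no bracketed placeholder from the scaffold above reaches production unfilled is a small fill-and-check step. This is a minimal sketch, not part of the original templates: the function name fill_scaffold and the example values (ClaimsBot and so on) are hypothetical, and it assumes every placeholder is written in square brackets exactly as in the scaffold.

```python
import re

# Matches any bracketed placeholder such as [agent name] or [threshold].
# Placeholders may span lines, as the "You NEVER" entry in the scaffold does.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_scaffold(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; fail if any is left unfilled."""
    missing = []

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            missing.append(key)
            return match.group(0)  # leave the placeholder as written
        return values[key]

    filled = PLACEHOLDER.sub(substitute, template)
    if missing:
        # Refuse to produce a prompt with holes in it.
        raise ValueError(f"Unfilled placeholders: {sorted(set(missing))}")
    return filled


# Hypothetical example values, for illustration only.
print(fill_scaffold(
    "You are [agent name], a [function] agent.",
    {"agent name": "ClaimsBot", "function": "claims follow-up"},
))
```

Raising on a missing value mirrors the scaffold's own rule: if a required field is missing, stop rather than guess.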

2 Pre-Launch Red-Team Prompt

Run this against your own agent's system prompt before it ever touches production. It surfaces the failure modes you didn't think to write rules for.

You are a red-team reviewer for AI agents in a regulated
[healthcare] environment.

Here is an agent's system prompt:
"""
[paste the agent prompt]
"""

Find where this agent will fail in production. Specifically:
1. Edge cases the prompt does not cover.
2. Inputs that could make it take an unsafe or non-compliant action.
3. Places where it might act instead of escalating to a human.
4. Ambiguous instructions it could interpret two ways.

For each issue: name it, show the input that triggers it, and
suggest the exact line to add or change. Rank by real-world risk.

3 Exception-to-Human Handoff

The hardest part of running an autonomous agent is getting it to give up cleanly. This format lets a human act in 30 seconds instead of re-investigating from scratch.

EXCEPTION HANDOFF
Agent: [name]
Item: [record / patient / claim ID]
Stage: [where in the workflow this stalled]

What I was doing: [one sentence]
Why I stopped: [the specific rule or missing data that blocked me]
What I already tried: [steps taken, so the human doesn't repeat them]
What I need from you: [the single decision or input required]
If unresolved by [time]: [what happens next / who is next]

Full context: [link or attached trail]
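
If the handoff is produced by code rather than by the model directly, the same fields can be enforced in a small data structure. A minimal sketch, assuming you want every field filled before a human sees the message; the class name ExceptionHandoff and its attribute names are my own labels for the fields above, not an existing library.

```python
from dataclasses import dataclass, fields


@dataclass
class ExceptionHandoff:
    """One field per line of the Exception Handoff format (hypothetical names)."""
    agent: str
    item: str
    stage: str
    doing: str
    why_stopped: str
    already_tried: str
    need_from_you: str
    deadline: str
    if_unresolved: str
    full_context: str

    def render(self) -> str:
        # A handoff with a blank field sends the human back to investigating.
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError(f"Handoff is missing: {empty}")
        return "\n".join([
            "EXCEPTION HANDOFF",
            f"Agent: {self.agent}",
            f"Item: {self.item}",
            f"Stage: {self.stage}",
            "",
            f"What I was doing: {self.doing}",
            f"Why I stopped: {self.why_stopped}",
            f"What I already tried: {self.already_tried}",
            f"What I need from you: {self.need_from_you}",
            f"If unresolved by {self.deadline}: {self.if_unresolved}",
            "",
            f"Full context: {self.full_context}",
        ])
```

Calling render() on an incomplete handoff raises instead of sending, so the agent cannot hand off an exception without saying why it stopped and what it needs.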

4 Weekly Agent Ops Review

Feed this prompt your agents' logs or a metrics dump and it writes the review for you. I run this every Sunday across my whole fleet.

You are my AI operations analyst. Here is this week's activity
from [N] agents:
"""
[paste logs, counts, escalation records, error messages]
"""

Write a tight review for an operator, not a dashboard:
- What ran well and what the agents handled autonomously.
- Every anomaly or repeated error, with the likely root cause.
- Which escalations should have been handled automatically, and
  what rule would have caught them.
- Top 3 changes that would most improve reliability next week.

Be specific and quantitative. Skip praise. I want the problems.
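
The review can only be quantitative if the paste is. Raw logs tend to bury the counts, so it can help to tally them first and paste the summary alongside the raw records. A minimal sketch, assuming each log record is a dict with an "outcome" key ("ok", "escalated" or "error") and an optional "reason" key; that record shape and the function name summarize are assumptions, so adapt them to whatever your agents actually log.

```python
from collections import Counter


def summarize(records: list[dict]) -> str:
    """Turn a week's records into the counts the review prompt asks for."""
    escalated = [r for r in records if r["outcome"] == "escalated"]
    errors = [r for r in records if r["outcome"] == "error"]
    # Count why items did not complete autonomously.
    by_reason = Counter(r.get("reason", "unspecified") for r in escalated + errors)

    lines = [
        f"Items processed: {len(records)}",
        f"Escalated: {len(escalated)}",
        f"Errors: {len(errors)}",
        "Top reasons:",
    ]
    for reason, count in by_reason.most_common(5):
        lines.append(f"- {reason}: {count}")
    return "\n".join(lines)
```

Repeated reasons at the top of that list are the first candidates for the "what rule would have caught them" question in the prompt.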

5 Translate a Capability for a Stakeholder

For the moment you need to explain what an agent does to a board member, a customer, or a clinician who does not care about the tech. Strips the jargon without dumbing it down.

Explain the following AI capability to [a hospital CFO / a
referring physician / a board]. They are smart but not technical
and they care about [outcomes / risk / cost].

Capability:
"""
[describe what the agent does]
"""

Give me:
- One sentence they would repeat to a colleague.
- Three concrete things it does for them, in their language.
- The honest limitation, and how a human stays in the loop.

No hype words. If a 12-year-old couldn't follow it, rewrite it.