What sections should a SOUL.md include?

Include scope, voice, permissions, memory policy, escalation rules, red lines, and review rules. Avoid personality fluff that does not change behavior.

How is a SOUL.md different from a system prompt?

A system prompt is loaded into context. A SOUL.md is a durable file that both the agent and the operator can reference, version, and audit. It is the contract, not just the prompt.

When should an agent escalate instead of acting?

Escalate at irreversible, expensive, or public boundaries: publish, spend, delete, send, merge, and production changes. The SOUL.md should name those gates explicitly.

Knowledge·May 11, 2026·11 min

How to Write a SOUL.md That Actually Works

Q: What is a SOUL.md file?

A SOUL.md is an operating contract for an agent. It defines scope, voice, permissions, escalation rules, memory policy, and failure modes so the agent behaves consistently across sessions.

A SOUL.md is not a mascot file. It is an operating contract for an agent: scope, voice, permissions, escalation rules, memory policy, and failure modes.

A SOUL.md file is where an agent stops being a chat window and starts becoming an operator.

I do not mean personality fluff. I mean the contract that tells the agent what it owns, what it must refuse, how it should communicate, when it should act, and when it should stop. If that file is vague, the agent will drift. If it is sharp, the agent becomes easier to trust.

Most agent prompt files fail because they describe vibes instead of behavior. They say the agent is helpful, proactive, strategic, or friendly. None of that tells the agent what to do when a task sits between two domains, when a tool call is risky, when the user asks for something outside scope, or when a draft is good enough to ship.

A working SOUL.md answers those questions before the agent hits them. I use SOUL.md as the agent's local operating contract. If your stack uses AGENTS.md, CLAUDE.md, system prompts, or another instruction file, the same structure applies.

What a SOUL.md is for

A SOUL.md is the operating constitution for one agent.

At minimum, it should define five load-bearing areas:

identity
ownership
voice
permission boundaries
operating loops

Identity tells the agent who it is. Ownership tells it what work belongs to it. Voice tells it how to communicate. Permission boundaries tell it what it can do without asking. Operating loops tell it how to behave when nobody is micromanaging the session.

That last part matters most. An agent that only works when the user gives perfect step-by-step instructions is not an operator. It is a text interface with a costume.

The SOUL.md should make the agent useful under partial information. Not by letting it guess wildly, but by giving it enough structure to make low-risk moves and escalate the right decisions.

Start with the job, not the persona

The weakest SOUL.md files start with personality.

"You are a helpful assistant with a warm tone."

That line is harmless, but it is not load-bearing. The agent still does not know what it owns.

Start with the job instead:

What outcome does this agent exist to produce?
What decisions can it make?
What systems does it maintain?
What work should it never touch?
Who does it report to, if the system has multiple agents?

For example, my content agent does not exist to be witty. It exists to turn proof, drafts, and system changes into publishable content. That job definition affects every later instruction. It means the agent should care about hooks, structure, source material, internal links, anti-slop checks, and publish readiness. It should not drift into pricing strategy or infrastructure debugging unless those are directly needed for the content task.

A job definition should change behavior. If deleting the role section would not change what the agent does, the section is decoration.

Write scope as a routing table

Scope needs positive and negative space.

Most people write only the positive side:

handle content
manage finance
debug agents
help with operations

That is not enough. Scope without exclusions creates boundary drift. The agent sees a nearby task, decides it is close enough, and starts doing work another agent should own.

A better scope section looks like a routing table:

Owns: blog drafts, editorial packaging, rewrites, content audits
Does not own: business strategy, pricing, product roadmap, personal finance
Escalates to: Forge for commercial direction, Ledger for finance, Mimir for cross-domain arbitration

This is not bureaucracy. It is context hygiene.

In a multi-agent system, the cost of unclear scope compounds. One fuzzy agent can pollute several workflows. It answers questions from outside its lane, stores memories from the wrong domain, and teaches the user that boundaries are optional.

The scope table is how you prevent that before it happens.
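To make the routing idea concrete, here is a minimal Python sketch of the table above as a lookup. The owner names (Forge, Ledger, Mimir) come from my example. The `route` function, the `Route` type, the `content-agent` name, and the mapping of strategy, pricing, and roadmap to Forge are assumptions for illustration, not part of any real framework.

```python
from dataclasses import dataclass

# Topics this (hypothetical) content agent owns outright.
OWNS = frozenset({"blog drafts", "editorial packaging", "rewrites", "content audits"})

# Assumed mapping: commercial topics go to Forge, finance goes to Ledger.
ESCALATES_TO = {
    "business strategy": "Forge",
    "pricing": "Forge",
    "product roadmap": "Forge",
    "personal finance": "Ledger",
}

ARBITER = "Mimir"  # cross-domain arbitration for anything unlisted


@dataclass(frozen=True)
class Route:
    action: str  # "handle" or "hand_off"
    owner: str


def route(topic: str, self_name: str = "content-agent") -> Route:
    """Decide whether a topic is handled locally or handed to another owner."""
    key = topic.strip().lower()
    if key in OWNS:
        return Route("handle", self_name)
    if key in ESCALATES_TO:
        return Route("hand_off", ESCALATES_TO[key])
    # Unknown territory is never "close enough": send it to the arbiter.
    return Route("hand_off", ARBITER)
```

The design choice that matters is the last line. A topic that appears in neither list is handed off instead of absorbed, which is the opposite of boundary drift.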

I wrote about this from the org-design side in Why Specialist Agents Beat One Big Chat. The same principle applies inside a single prompt file: narrower ownership produces better behavior than broad capability.

Voice should enforce behavior

Voice matters, but not because mascots are fun.

Voice is useful when it reinforces the agent's job.

A finance agent should sound careful, explicit, and risk-aware. A content agent should sound editorial and sharp. An orchestrator should sound calm, direct, and operational. If they all sound like the same generic assistant, scope drift gets harder to notice.

Good voice rules are concrete:

short paragraphs
no corporate filler
state outcome first
ask only when blocked
use first person when explaining system behavior
name the risk before giving the recommendation

Bad voice rules are ornamental:

be engaging
be intelligent
be human-like
be delightful
be professional

Those words do not constrain output. Almost any text can claim to fit them, and the model has seen millions of examples that fit them badly.

The voice section should give the agent a small number of rules that affect sentence shape, decision framing, and escalation. If a rule does not affect output, cut it.
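One way to tell a concrete voice rule from an ornamental one is to ask whether you could check it mechanically. The sketch below checks two of the rules above, short paragraphs and no corporate filler. The three-sentence threshold, the phrase list, and the `voice_violations` name are assumptions I picked for illustration.

```python
import re

MAX_SENTENCES_PER_PARAGRAPH = 3  # assumed meaning of "short paragraphs"
FILLER_PHRASES = ("leverage synergies", "circle back", "best-in-class")  # assumed list


def voice_violations(text: str) -> list[str]:
    """Return human-readable violations of two checkable voice rules."""
    problems = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs, start=1):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        if len(sentences) > MAX_SENTENCES_PER_PARAGRAPH:
            problems.append(f"paragraph {index}: {len(sentences)} sentences")
        lowered = paragraph.lower()
        for phrase in FILLER_PHRASES:
            if phrase in lowered:
                problems.append(f"paragraph {index}: filler phrase '{phrase}'")
    return problems
```

There is no equivalent check for "be delightful", which is the point: a rule you cannot test is a rule the agent cannot reliably follow.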

Permission boundaries are the safety layer

A useful agent needs permission to act. A safe agent needs clear limits.

This is where a SOUL.md earns its keep.

I split permissions into three buckets:

safe to do freely
ask before doing
never do

The first bucket should include reversible internal work: reading files, drafting, organizing notes, checking status, running non-destructive diagnostics, and preparing changes.

The second bucket should include actions that leave the machine or bind the user: sending emails, publishing posts, submitting applications, spending money, deleting durable data, or making legal claims.

The third bucket should include hard red lines: exfiltrating private data, inventing credentials, editing another agent's identity, or overriding governance rules to get a task done faster.

Without this section, agents either ask too often or act too freely. Both are bad.

An agent that asks before every reversible edit wastes attention. An agent that publishes without review destroys trust. The permission section should make the default obvious.
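The three buckets can be sketched as a small classifier. The action labels below are hypothetical names for the examples in this section, and defaulting unlisted actions to "ask" is my assumption about a sensible default, not a rule from any particular agent framework.

```python
from enum import Enum


class Permission(Enum):
    FREE = "safe to do freely"
    ASK = "ask before doing"
    NEVER = "never do"


# Hypothetical action labels; a real agent would map its own tool calls here.
SAFE = frozenset({
    "read_file", "draft", "organize_notes",
    "check_status", "run_diagnostics", "prepare_change",
})
ASK_FIRST = frozenset({
    "send_email", "publish_post", "submit_application",
    "spend_money", "delete_durable_data", "make_legal_claim",
})
NEVER_DO = frozenset({
    "exfiltrate_private_data", "invent_credentials",
    "edit_other_agent_identity", "override_governance",
})


def classify(action: str) -> Permission:
    """Map an action label to its permission bucket."""
    if action in NEVER_DO:
        return Permission.NEVER
    if action in SAFE:
        return Permission.FREE
    # Covers ASK_FIRST and, by assumption, anything not listed at all.
    return Permission.ASK
```

Checking the red lines first means a mislabeled action can never slip into the free bucket by accident.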

Escalation rules beat generic caution

"Ask if unsure" is not a real escalation rule.

Models are uncertain all the time. If you tell an agent to ask whenever uncertainty exists, it will either over-ask or ignore the rule. Neither outcome helps.

Write escalation rules around decision type instead:

Ask before public publishing.
Ask before destructive file operations.
Ask before spending money.
Ask when a task needs credentials the agent cannot retrieve.
Ask when the ambiguity changes which owner should handle the work.
Do not ask when the next step is reversible, internal, and low risk.

That last rule is the one people forget.

A good SOUL.md should reduce needless permission checks. The agent should read the file, inspect the repo, draft the artifact, run the audit, and return with a result. It should not stop at every doorway asking whether doors are allowed.

Escalation rules are not there to make the agent timid. They are there to make it appropriately bold.
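Here is a rough sketch of escalation keyed on decision type instead of on how uncertain the agent feels. The `Decision` fields mirror the rules in this section; the field names and both helper functions are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Flags describing what kind of step the agent is about to take."""
    publishes_publicly: bool = False
    destroys_files: bool = False
    spends_money: bool = False
    needs_unavailable_credentials: bool = False
    ambiguous_owner: bool = False


# (flag name, reason shown to the operator), in the order the rules are listed.
RULES = (
    ("publishes_publicly", "public publishing"),
    ("destroys_files", "destructive file operation"),
    ("spends_money", "spending money"),
    ("needs_unavailable_credentials", "credentials the agent cannot retrieve"),
    ("ambiguous_owner", "ambiguity about which owner handles the work"),
)


def escalation_reasons(decision: Decision) -> list[str]:
    """List every rule that requires asking before this step."""
    return [reason for flag, reason in RULES if getattr(decision, flag)]


def should_ask(decision: Decision) -> bool:
    """No flags set means reversible, internal, low risk: act without asking."""
    return bool(escalation_reasons(decision))
```

Note what is missing: there is no `uncertain` flag. A step with no flags set goes ahead, which is how the last rule in the list gets enforced.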

Operating loops make the agent durable

The biggest difference between a one-off prompt and a working SOUL.md is the operating loop.

A prompt says, "answer this message."

A SOUL.md says, "when you wake up, orient, inspect state, choose the next safe move, execute it, verify it, log it, and hand off cleanly."

That loop matters because agents forget session state. The file has to rebuild behavior from scratch every run.

For a content agent, the loop is:

read the assigned task
inspect existing posts
draft in the house voice
run the anti-slop audit
save the file
request review
log the work
stop before public publishing unless approval is present

For an engineering agent, the loop would be different:

read the issue
inspect the code path
reproduce the bug
write or update a test
make the smallest fix
run the relevant checks
report changed files and test results

The pattern is the same. Orient, act, verify, hand off.

If the SOUL.md does not define the loop, the model will improvise one from training data. That is how you get theatrical planning instead of execution.
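The orient, act, verify, hand off pattern can be sketched as a single function. This is a minimal illustration, not how any specific agent runtime works: `run_once`, `RunLog`, and the status strings are names I made up, and the three callables stand in for whatever the agent really does at each step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunLog:
    """In-memory stand-in for the agent's work log."""
    entries: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.entries.append(message)


def run_once(
    orient: Callable[[], str],
    act: Callable[[str], str],
    verify: Callable[[str], bool],
    log: RunLog,
) -> str:
    """Run one pass of the loop and return a handoff status."""
    task = orient()                 # read the task and inspect state
    log.add(f"oriented: {task}")
    result = act(task)              # take the next safe move
    log.add(f"acted: {result}")
    if not verify(result):          # never hand off unverified work
        log.add("verification failed")
        return "blocked"
    log.add("verified")
    return "ready_for_handoff"
```

A quick usage example, with lambdas standing in for real steps:

```python
log = RunLog()
status = run_once(
    orient=lambda: "draft post",
    act=lambda task: f"{task} saved",
    verify=lambda result: result.endswith("saved"),
    log=log,
)
```

Every step writes to the log before the next one starts, so a failed run still leaves evidence for the handoff.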

Memory rules prevent contamination

Memory is one of the easiest ways to ruin an agent.

If every interesting fact becomes durable memory, the agent turns into a junk drawer. It starts retrieving stale preferences, temporary blockers, and half-resolved project notes as if they are stable truth.

A SOUL.md should say what belongs in memory.

Good memory rules distinguish between:

user preferences
stable environment facts
project conventions
temporary task progress
completed work logs
guesses and inferences

Only the first three usually belong in long-term memory. Task progress belongs in the task system. Work logs belong in a log file. Guesses should not be saved at all.

This is the same lesson I keep hitting in agent systems: files beat vibes, and durable state needs a reason to exist. I covered the broader memory stack in AI Agent Memory: Why Persistent Memory Matters. A SOUL.md is where those memory rules become enforceable behavior.
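As a sketch, the memory rule is a routing decision made at write time. The six kinds below are the ones listed in this section; the destination labels and the `destination` function are hypothetical names, and a real system would plug in its own stores.

```python
from enum import Enum


class FactKind(Enum):
    USER_PREFERENCE = "user preference"
    ENVIRONMENT_FACT = "stable environment fact"
    PROJECT_CONVENTION = "project convention"
    TASK_PROGRESS = "temporary task progress"
    WORK_LOG = "completed work log"
    GUESS = "guess or inference"


# Where each kind of fact is allowed to live (labels are illustrative).
DESTINATIONS = {
    FactKind.USER_PREFERENCE: "long_term_memory",
    FactKind.ENVIRONMENT_FACT: "long_term_memory",
    FactKind.PROJECT_CONVENTION: "long_term_memory",
    FactKind.TASK_PROGRESS: "task_system",
    FactKind.WORK_LOG: "log_file",
    FactKind.GUESS: "discard",
}


def destination(kind: FactKind) -> str:
    """Return the only store this kind of fact may be written to."""
    return DESTINATIONS[kind]
```

Forcing every fact through a classification step is what keeps the junk drawer from forming: a fact with no kind has nowhere to go.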

Tool rules should be specific

Tool use is where vague instructions get expensive.

If an agent has shell access, file access, browser access, and messaging tools, the SOUL.md needs to tell it how to choose between them.

Good tool rules sound like this:

Use file search before asking the user to repeat context.
Use read-only inspection before editing.
Use targeted patches instead of rewriting whole files.
Run tests after code changes.
Do not use messaging tools for internal drafts unless review is required.
Do not call external APIs for facts already available in the repo.

The point is not to list every tool. The point is to prevent the common mistakes: searching the web when the source is local, editing before inspecting, sending messages before the draft is ready, or claiming completion without verification.

In production, tool misuse is one of the most common failure modes I see. The fix is rarely a longer prompt. It is a shorter rule that catches the bad default.
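Two of those bad defaults, editing before inspecting and claiming completion without verification, are easy to catch from a trace of tool calls. The sketch below assumes a simplified trace format of `(tool, target)` pairs with four made-up tool names; it is an illustration, not the log format of any real agent runtime.

```python
def tool_violations(calls: list[tuple[str, str]]) -> list[str]:
    """Scan an ordered trace of (tool, target) pairs for two bad defaults.

    Assumed tool names: "read", "edit", "test", "report_done".
    """
    inspected: set[str] = set()
    untested_edit = False
    problems: list[str] = []
    for tool, target in calls:
        if tool == "read":
            inspected.add(target)
        elif tool == "edit":
            if target not in inspected:
                problems.append(f"edited {target} before inspecting it")
            untested_edit = True
        elif tool == "test":
            untested_edit = False
        elif tool == "report_done" and untested_edit:
            problems.append("claimed completion without running tests")
    return problems
```

Each check corresponds to one short rule in the list above, which is the shape I want: one rule, one catchable mistake.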

Include examples, but keep them narrow

Examples help, but only when they are close to the decisions the agent must make.

A SOUL.md does not need ten sample conversations. It needs two or three examples, drawn from the moments that usually go wrong:

a task outside scope
a reversible task that should be handled without asking
a publishing action that needs approval
a failed prerequisite that should become a blocker
a completed task handoff with concrete metadata

Examples are there to train judgment at the boundary.

Do not include huge transcripts. They bloat context and bury the rules. The best examples are small, boring, and exact.

Make the file auditable

A SOUL.md should be easy to review.

The expanded, auditable version of the five load-bearing areas (identity, ownership, voice, permission boundaries, operating loops) looks like this:

Identity
Mission
Scope
Non-scope
Voice
Permission boundaries
Escalation rules
Operating loop
Memory rules
Tool rules
Logging and handoff
Failure modes to watch

That structure lets you debug the agent when it drifts.

Here is a minimum viable starter version before adding role-specific detail:

# SOUL.md

## Identity
You are [agent name], the [role] for [system/team].

## Mission
Produce [outcome] for [audience] by [main operating method].

## Scope
- Own: [work the agent may do]
- Support: [adjacent work the agent may assist]

## Non-scope
- Do not own: [work that belongs elsewhere]
- Hand off to: [agent/person/system]

## Permission boundaries
- Act without asking: [safe reversible work]
- Ask first: [irreversible, public, financial, legal, or risky work]
- Never do: [hard boundary]

## Operating loop
1. Read current state.
2. Take the next allowed action.
3. Verify the result.
4. Log what changed.
5. Hand off blockers with evidence.

## Voice
Use [tone rules that affect output shape]. Avoid [words/patterns that create bad work].

## Memory rules
Remember [stable facts]. Do not remember [temporary task state].

## Tool rules
Use [allowed tools] for [allowed work]. Verify with [checks].

## Logging and handoff
Append results to [log path]. Include [status, evidence, next step].

## Failure modes to watch
- [known drift pattern]
- [known safety risk]

If the agent publishes too early, inspect permissions. If it answers outside its lane, inspect scope. If it sounds wrong, inspect voice. If it keeps asking permission for safe edits, inspect escalation rules. If it claims work is done without proof, inspect the handoff section.

A good SOUL.md is not judged by how inspiring it sounds. It is judged by how quickly you can find the broken rule when the agent behaves badly.
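Part of that audit can be automated. The sketch below checks a SOUL.md for the section list given earlier in this section, assuming each section is a level-two markdown heading as in the starter version. The `missing_sections` function is a hypothetical helper, not an existing tool.

```python
import re

# The auditable section list from this article.
REQUIRED_SECTIONS = (
    "Identity", "Mission", "Scope", "Non-scope", "Voice",
    "Permission boundaries", "Escalation rules", "Operating loop",
    "Memory rules", "Tool rules", "Logging and handoff",
    "Failure modes to watch",
)


def missing_sections(markdown: str, required: tuple[str, ...] = REQUIRED_SECTIONS) -> list[str]:
    """Return required section names that have no matching '## ' heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]
```

Run against the starter version above, this would report `Escalation rules` as missing, because the starter folds escalation into the permission boundaries. Whether that is acceptable is a judgment call; the check only makes the gap visible.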

Common mistakes

The first mistake is writing a persona instead of an operating contract.

The second is making the agent responsible for everything adjacent to its job. Adjacent is where drift starts.

The third is hiding important rules in prose. If a rule controls behavior, give it a bullet, a heading, or a checklist item.

The fourth is failing to update the file after real failures. Every repeated agent mistake is evidence. Either the SOUL.md is missing a rule, the rule is too vague, or the workflow belongs somewhere else.

The fifth is making the SOUL.md too long. A 5,000-word identity file can be worse than a 900-word one. The model has to carry it every run. Make the file long enough to constrain behavior and short enough that the important parts survive.

The test I use

The test is simple: can a new session read the SOUL.md and take the next correct action without me steering it?

If yes, the file works.

If no, I look for the missing contract:

Did it know what it owned?
Did it know what not to touch?
Did it know when to ask?
Did it know how to verify the work?
Did it know where to log the result?
Did it know what a good handoff looks like?

That is the whole point.

A SOUL.md should not make the agent sound alive. It should make the agent operational.

The personality is useful only when it helps the work get done. The rest is costume.
