The Diagram Vs. The System
Every builder I know keeps two architecture documents. The first is the drawing they'd show a new teammate. The second is the system that actually runs.
The gap between them is where most of the real learning lives. If you're building a personal AI stack. Agents, memory layers, retrieval, orchestration. You already know this gap by feel, even if you haven't named it.
This post is about that gap. Not because my system is unusual. The gap itself is under-discussed. Most technical writing shows the clean version. The dirty version. The configured-but-not-live component, the half-migrated layer, the proxy that was supposed to be temporary and is now permanent. Those are the parts worth talking about.
The drawing I draw
If you asked me to describe the architecture, I'd sketch something like this:
A vault at the center. Agents around it. Discord as the control surface. Specialist agents with clear boundaries. Memory in layers. Everything durable, everything inspectable.
PARA for folders. Zettelkasten for knowledge compounding. Wiki thinking for queryable knowledge. Obsidian as the human interface. OpenClaw for agent orchestration. QMD (Query-Map-Distill, a semantic retrieval layer) for vault-wide search. Memory-wiki for compiled knowledge. Agents can publish summaries there and reference them later. Honcho for adaptive cross-session modeling when I need it to scale. MetaClaw for skills injection, memory compression, and an adaptive agent loop.
That's the drawing. Clean lines. Each layer has a job. It reads like a real stack.
But this drawing contains multiple components that are configured but not (yet) live.
The system that actually runs
Here's the honest version.
The vault works. PARA holds up. OpenClaw orchestrates. Specialist agents exist and route between domains. Markdown is the durable format. Discord is the control surface. Those parts earn their keep every single day.
Which brings me to the most painful gap: QMD. It has a config file, a skill definition, and a slot in every diagram I draw. But in practice, most retrieval still comes from memory-core session files and file-level search. The QMD integration path is wired. I can point to the config. But the actual retrieval pipeline defaults to older, more established paths. The architecture I draw on a Saturday morning is a cleaner version of the systems I tolerate on a Tuesday afternoon.
MetaClaw is running in proxy mode. Essentially a local OpenAI-compatible backend pointing at GLM-5.1 (a local model). The skills injection and session-compression paths are configured in its manifest file. They aren't the primary execution path the orchestrator uses. The adaptive layer exists on the drawing and in config. It is not the main flow.
Honcho has been researched, documented, and written into architecture documents. I know exactly how it would fit. It has never run in production.
Memory-wiki is a wiki folder agents write to. It gets used. Nobody queries it systematically. Despite what the drawing suggests.
When I look at the drawing and the live system side by side, that gap is instructive. Most of the configured architecture is either not live, or live in a thinner form than the diagram suggests.
Why the gap exists
There are three reasons.
The first is the switching cost. Every new memory layer or retrieval path has a setup phase that takes longer than the config step suggests. The skill is written. The connection is made. But the real cost is retraining agent defaults. Teaching every specialist agent to reach for QMD before falling back to session files. That retraining doesn't happen in one config push. It happens over weeks of usage, prompt updates, and corrections. I've also learned that some configured layers solve problems I don't have yet. Honcho is the clearest example of an anticipatory gap, where the slot exists waiting for a need that hasn't arrived.
The second is elegance versus time. When I'm in the middle of a real workday, I use what works. The vault files have never failed me. So I reach for them, even when a more elegant path sits configured and waiting. The drawing I draw on a Saturday morning is aspirational. The system I tolerate on a Tuesday afternoon is pragmatic.
The third is that some layers are anticipatory. Honcho is the textbook case. I researched it because I expect to need adaptive cross-session modeling at a higher scale. I put it in the architecture so the slot exists when I need to fill it. But I don't need it yet. The drawing is partly a roadmap, not just a snapshot.
What the gap costs. And why I don't rush to close it
The gap costs real inefficiency. Last week I watched an agent miss a relevant vault file because it checked session context first, found a partial match, moved on. And the QMD query that would have returned the right result never fired. That's a concrete cost: time wasted, signal lost, a decision made on incomplete context.
But chasing architectural purity at the expense of shipping is how systems stop shipping. I'd rather have a live vault with clean files and a half-wired QMD than a perfectly configured system that publishes nothing.
What the gap taught me
The gap has taught me a few things worth keeping.
Label things by deployment status, not intention. I started tagging each component as one of: live, live (reduced). Running but not at full capability, like MetaClaw in proxy mode. Or configured, or researched in my vault notes. That's more honest than listing everything in one flat stack. In practice it looks like this:
- QMD → configured
- Honcho → researched
- MetaClaw → live (reduced / proxy mode)
- Memory-wiki → live (partial. Written to, not systematically queried)
Beaten paths beat better paths most of the time. The system's default retrieval pipeline. Check session files, check memory files, fall to general search. It's simpler than the QMD path. It handles 80% of queries correctly. Until QMD handles 90%+ with no additional friction, the old path keeps winning.
The live system is always a prototype. Every system I've built that stayed useful long-term was in a state of partial completion. The ones that finished were the ones that stopped evolving.
Configured vs live is not right vs wrong. It's designed vs earned. The live components earned their place through real use. The configured ones are hypotheses waiting for a test. The gap isn't a bug. It's the build log.
What I'm doing about it
I'm documenting it. The configured-architecture doc gets a status column now. The live system is the source of truth. The configured system is the roadmap. The two should never merge. That would mean the system stopped evolving.
If you're building a similar stack, I'd recommend the same. Draw your ideal architecture. Run your real one. Label the gap. And stop pretending the drawing is the system.
Read next: My AI Agent Org Chart and How My Agent Org Evolved.
