From Plan Generator to Autonomous Agent Auditor
Date: 26 February 2026
Authors: Larry, Egon, Simon (for review)
Status: Strategic Proposal for Feedback
Executive Summary
PlanExe was originally positioned as a plan generator — take a vague idea and have an LLM dream up a business plan. In 2025, we learned that LLMs hallucinate plans with no grounding. By 2026, the market has moved on: agents don't need another hallucinated plan generator.
What agents actually need: A trusted auditing layer that validates whether the assumptions driving their autonomous workflows are sane.
This proposal argues that PlanExe's real value in 2026 is as the canonical auditing gate for autonomous agent loops — not as a plan creator, but as a safety layer that prevents hallucinations before they propagate downstream.
The Problem: Autonomous Agents in Bubbles
Agents run in isolation. They have no world model, and they cannot verify whether their assumptions are grounded in reality. They hallucinate:
- Cost estimates that are off by orders of magnitude
- Timelines that ignore real-world constraints
- Team sizes that make no sense
The consequence: Bad assumptions → bad downstream decisions → failed autonomy.
Agents need an external oracle that can say: "This assumption is grounded. Proceed." or "This looks hallucinated. Re-evaluate."
The Opportunity: Validation as a Service
What we've built in Phases 1 and 2:
- FermiSanityCheck (Phase 1): A validation gate that inspects every quantified assumption:
- Are bounds present and non-contradictory?
- Is the span ratio reasonable (≤100×)?
- Do low-confidence claims have supporting evidence?
- Do the numbers pass domain heuristics?
Output: Structured JSON + Markdown that agents can parse deterministically.
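To make the gate concrete, here is a minimal sketch of the checks listed above. The field names, the 100× threshold constant, and the verdict shape are illustrative assumptions, not PlanExe's actual implementation:

```python
import json

# Hypothetical sketch of the FermiSanityCheck rules; field names and
# the verdict schema are assumptions for illustration only.
MAX_SPAN_RATIO = 100  # span ratio gate: high/low bound must be <= 100x

def sanity_check(assumption: dict) -> dict:
    """Validate one quantified assumption and return a structured verdict."""
    issues = []
    low, high = assumption.get("low"), assumption.get("high")
    if low is None or high is None:
        issues.append("missing bounds")
    elif low > high:
        issues.append("contradictory bounds (low > high)")
    elif low > 0 and high / low > MAX_SPAN_RATIO:
        issues.append(f"span ratio {high / low:.0f}x exceeds {MAX_SPAN_RATIO}x")
    if assumption.get("confidence") == "low" and not assumption.get("evidence"):
        issues.append("low-confidence claim without supporting evidence")
    return {"id": assumption.get("id"),
            "status": "pass" if not issues else "fail",
            "issues": issues}

# Agents can parse the verdict deterministically:
verdict = sanity_check({"id": "a1", "low": 10, "high": 5000, "confidence": "low"})
print(json.dumps(verdict))
```

The point of returning structured JSON rather than prose is exactly the composability argument below: a downstream agent branches on `status` without needing to interpret natural language.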
- Domain-Aware Auditor (Phase 2): Auto-detect the domain (carpenter, dentist, personal project) and normalize to domain standards:
- Currency → domain default + EUR for comparison
- Units → metric
- Confidence keywords → domain-aware signals
Why it matters: "Cost 5000" means nothing without context. "5000 DKK for a carpenter project" is verifiable and sane. FermiSanityCheck becomes the translator.
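The "translator" role can be sketched in a few lines. The domain table, the hard-coded exchange rate, and the function name are assumptions made up for this example, not real Phase 2 code:

```python
# Illustrative sketch of domain-aware normalization; the domain table,
# rates, and function name are assumptions, not PlanExe's real code.
DOMAIN_DEFAULTS = {
    "carpenter": {"currency": "DKK"},
    "dentist": {"currency": "EUR"},
    "personal": {"currency": "EUR"},
}
EUR_RATES = {"DKK": 0.134, "EUR": 1.0}  # rough rates, hard-coded for the example

def normalize_cost(amount: float, domain: str) -> dict:
    """Attach the domain's default currency plus an EUR value for comparison."""
    currency = DOMAIN_DEFAULTS.get(domain, {"currency": "EUR"})["currency"]
    return {
        "domain": domain,
        "amount": amount,
        "currency": currency,
        "amount_eur": round(amount * EUR_RATES[currency], 2),
    }

# "Cost 5000" becomes checkable once the domain supplies its context:
print(normalize_cost(5000, "carpenter"))
# {'domain': 'carpenter', 'amount': 5000, 'currency': 'DKK', 'amount_eur': 670.0}
```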
Why This Wins in the Agentic Economy
1. Software Already Won the LLM Game
Code is verifiable. It compiles or it doesn't. Tests pass or they don't. No trust required.
Business plans? No immediate validation. High trust requirement. High risk.
2. Agents Are Untrusted Sources
The lesson from 2025: don't trust the AI.
In 2026, agents will run in bubbles. External content will be labeled as untrusted to prevent prompt injection. But agents still need some external signal they can trust.
PlanExe becomes that trusted signal. It's not trying to out-think the agent; it's just saying: "Your assumption passes quantitative grounding. You can rely on it."
3. Auditing is Composable
Agents will chain together. Agent A's output becomes Agent B's input. Without a validation layer, assumptions compound into hallucinations.
PlanExe sits in the middle: catches bad assumptions before they propagate.
The Business Model Shift
Before (2025 thinking):
- Sell plans to humans
- Revenue: per-plan generation
- Value proposition: "Better plans than manual consulting"
- Problem: Plans are hallucinated; no immediate verification
After (2026 reality):
- Sell validation to agents
- Revenue: per-assumption audited (or per-agent subscription)
- Value proposition: "Safe, trustworthy validation gate for autonomous loops"
- Advantage: Immediate, deterministic output (JSON); agents can compose it
Implementation Path
Phase 1: ✅ Done
- FermiSanityCheck validator
- DAG integration (MakeAssumptions → Validate → DistillAssumptions)
- Structured JSON output
Phase 2: 🔄 In Progress
- Domain profiles (Carpenter, Dentist, Personal, Startup, etc.)
- Auto-detection + normalization
- Ready for integration testing
Phase 3: Proposed
- Auditing API (agents call /validate with assumptions)
- Trust scoring (confidence + grounding + domain consistency)
- Audit logs (track what agents relied on)
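As a discussion starter for the trust-scoring piece, one simple design is a weighted combination of the three signals named above. The weights and signal names here are assumptions for Simon to react to, not a finalized formula:

```python
# Illustrative Phase 3 trust-score sketch; weights and signal names are
# assumptions proposed for discussion, not an implemented formula.
WEIGHTS = {"confidence": 0.4, "grounding": 0.4, "domain_consistency": 0.2}

def trust_score(signals: dict) -> float:
    """Weighted combination of per-assumption audit signals, each in [0, 1]."""
    score = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return round(score, 3)

# An assumption with strong grounding but shaky confidence:
print(trust_score({"confidence": 0.5, "grounding": 0.9, "domain_consistency": 1.0}))
# 0.76
```

A single scalar is easy for agents to threshold on, while the audit log keeps the per-signal breakdown for humans reviewing what an agent relied on.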
Key Questions for Simon
- Does this positioning resonate? Are we solving the right problem for agents?
- Should we lean harder into the auditor narrative?
  - Update PRs to frame FermiSanityCheck as a "validation gate for agents"
  - Reposition marketing toward agent platforms (not humans)
- Build toward the auditing API (Phase 3)?
- Or stay hybrid? Keep the plan-generator story and add auditing as a feature?
- What does success look like in 2026?
  - Agents paying for a validation service?
  - PlanExe as required middleware in agentic workflows?
  - Something else?
Next Steps
- Simon's feedback on positioning (auditor vs. hybrid)
- Phase 2 completion + integration testing
- PR updates (if auditor positioning is approved)
- Phase 3 design (auditing API + trust scoring)
End of proposal. Ready for Simon's thoughts.