Model Routing Post-Mortem: Simon's 26 February 2026 PlanExe Refactor
Author: Larry (analysis), Egon (rubric refinement)
Date: 2026-02-26
Status: Draft — for Simon's review
Executive Summary
Simon shipped 64 commits across 16 PRs on 26 February 2026, touching 88 files with 8,269 insertions and 2,137 deletions. This was exceptional productivity. The purpose of this post-mortem is not to question Simon's choices — the work is architecturally sound, well-tested, and thoroughly documented. The purpose is to identify a repeatable model routing pattern that can reduce API costs on future refactors of similar scale without sacrificing quality or causing re-work.
The core finding: two task clusters (module split + external rename) justified Opus due to file size and cross-file dependency. The remaining 60%+ of commits (docs, tests, small fixes, internal renames) could have run on Haiku or Minimax, given a well-structured plan from Opus.
Pricing Reference (per 1M tokens)
| Model | Input (≤200K) | Output (≤200K) | Context |
|---|---|---|---|
| Minimax M2.5 | $0.30 | $1.10 | 196K |
| Haiku 4.5 | $1.00 | $5.00 | 200K |
| Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Opus 4.6 | $5.00 | $25.00 | 1M |
Note: past 200K tokens, Sonnet and Opus input prices double and output prices rise 1.5×. Haiku and Minimax rates are flat.
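The pricing note can be captured as a small helper for the per-cluster estimates that follow. A minimal sketch, assuming the long-context surcharge (2× input, 1.5× output) applies uniformly once a request runs past 200K tokens — real billing rules may differ:

```python
PRICES = {  # USD per 1M tokens: (input, output, long-context surcharge applies)
    "minimax": (0.30, 1.10, False),
    "haiku":   (1.00, 5.00, False),
    "sonnet":  (3.00, 15.00, True),
    "opus":    (5.00, 25.00, True),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  long_context: bool = False) -> float:
    """Estimated USD cost; long_context=True means the request ran
    past the 200K-token threshold."""
    inp, out, surcharge = PRICES[model]
    if long_context and surcharge:
        inp *= 2.0   # input doubles past 200K
        out *= 1.5   # output rises 1.5x past 200K
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For example, the module-split estimate below (150K input + 30K output on Opus) comes out to $1.50 at base rates.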
Complexity Rubric
Rate each task 1–5 on four dimensions. Sum for model recommendation.
| Dimension | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| File size | <100 lines | 100–300 | 300–600 | 600–1000 | 1000+ |
| Semantic complexity | rename/replace | simple logic | new function | architectural | cross-file refactor |
| Ambiguity | crystal clear + line numbers | minor choices | some design | significant decisions | open-ended |
| Context dependency | self-contained | 1 file | 1 module | multi-module | whole codebase |
Score → Model:
- 4–7: Minimax (mechanical execution)
- 8–11: Haiku (guided execution)
- 12–15: Sonnet (moderate complexity)
- 16–20: Opus (planning, large files, architectural)
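The rubric reads naturally as a routing function. A sketch, with the band edges taken directly from the table — scoring each dimension remains a human judgment call:

```python
def route_model(file_size: int, complexity: int, ambiguity: int,
                context_dep: int) -> str:
    """Sum four 1-5 dimension scores and map the total to a model."""
    scores = (file_size, complexity, ambiguity, context_dep)
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each dimension must be scored 1-5")
    total = sum(scores)
    if total <= 7:
        return "minimax"  # mechanical execution
    if total <= 11:
        return "haiku"    # guided execution
    if total <= 15:
        return "sonnet"   # moderate complexity
    return "opus"         # planning, large files, architectural
```

Cluster 1 below scores 5+5+3+5 = 18 → opus; Cluster 5 scores 2+1+2+2 = 7 → minimax.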
Task Cluster Analysis
Key file sizes (actual, post-refactor)
| File | Lines |
|---|---|
| mcp_local/planexe_mcp_local.py | 1,055 |
| mcp_cloud/http_server.py | 1,089 |
| mcp_cloud/handlers.py | 554 |
| mcp_cloud/tool_models.py | 298 |
| mcp_cloud/db_queries.py | 304 |
| mcp_cloud/schemas.py | 239 |
| mcp_cloud/app.py (thin facade) | 176 |
| mcp_cloud/download_tokens.py | 152 |
| mcp_cloud/auth.py | 50 |
Cluster 1: Module Split (app.py → 10 focused modules)
PRs: #91 and vicinity — commit 9f1a7db
| Dimension | Score | Notes |
|---|---|---|
| File size | 5 | Original app.py was a 76KB monolith |
| Semantic complexity | 5 | Architectural — split into 10 modules with correct imports |
| Ambiguity | 3 | High-level goal clear, but module boundaries required design decisions |
| Context dependency | 5 | Whole codebase — all callers needed updating |
| Total | 18 | → Opus |
Estimated tokens: ~150K input (reading full monolith + callers) + 30K output = 180K tokens
Cost at Opus: 150K × $5/1M + 30K × $25/1M = $0.75 + $0.75 = ~$1.50
Could a cheaper model execute it? Yes — with Opus writing a surgical plan (module boundaries, exact file/line splits), Sonnet could execute. Saves ~50%.
Confidence (Sonnet executes): 4/5
Retry factor: Low — plan is precise enough for Sonnet
Cluster 2: External API Rename (task_id → plan_id, TASK_ → PLAN_)
PRs: #88, #89, #92, #101 — commits 3663bc6, 0dbe1af, 0f2e9cc, 3624db7
| Dimension | Score | Notes |
|---|---|---|
| File size | 5 | Hits planexe_mcp_local.py (1,055 lines) and http_server.py (1,089 lines) |
| Semantic complexity | 3 | Rename is mechanical, but must not break backward compat aliases during transition |
| Ambiguity | 2 | Clear goal — but alias removal timing was a design decision |
| Context dependency | 5 | Full stack: MCP cloud, MCP local, tool_models, schemas, test files |
| Total | 15 | → Sonnet (planning pass), Haiku (execution) |
Estimated tokens: ~200K input (reading all affected files) + 20K output = 220K tokens
Cost at Opus: 200K × $5/1M + 20K × $25/1M = $1.00 + $0.50 = ~$1.50
Cost at Sonnet plan + Haiku execute: ~$0.75 + ~$0.25 = ~$1.00
Savings: ~33% — modest because rename is fast even at Opus
Confidence (Sonnet/Haiku): 5/5 for mechanical rename, 4/5 for alias decisions
Retry factor: Low — if Haiku misses a rename, it's a quick grep-and-fix
Cluster 3: Performance Optimizations (deferred columns, column selection)
PRs: #93, #95, #96 — commits 5b3c479, b4a27d8, b3cefab, c13e0b6
| Dimension | Score | Notes |
|---|---|---|
| File size | 4 | db_queries.py (304 lines), http_server.py (1,089 lines) |
| Semantic complexity | 4 | SQLAlchemy deferred loading, column_property — non-trivial |
| Ambiguity | 3 | Goal clear, but SQLAlchemy patterns require deep understanding |
| Context dependency | 4 | DB model, HTTP handlers, MCP local all interdependent |
| Total | 15 | → Sonnet |
Estimated tokens: ~80K input + 15K output = 95K tokens
Cost at Opus: ~$0.40 + $0.375 = ~$0.78
Cost at Sonnet: ~$0.24 + $0.225 = ~$0.47
Savings: ~40%
Confidence (Sonnet): 4/5
Retry factor: Medium — deferred loading bugs can be subtle
Cluster 4: Security / Auth Hardening (CORS, fail-hard, rate limiting)
PRs: #92, #93 — commits 73457d4, 642a759, d39167e, 52d426b
| Dimension | Score | Notes |
|---|---|---|
| File size | 3 | auth.py (50 lines), http_server.py (1,089 lines) |
| Semantic complexity | 3 | New validation + rate limiter module |
| Ambiguity | 2 | Specs clear (fail hard, CORS default) |
| Context dependency | 3 | http_server.py + auth.py |
| Total | 11 | → Haiku |
Estimated tokens: ~40K input + 10K output = 50K tokens
Cost at Opus: ~$0.20 + $0.25 = ~$0.45
Cost at Haiku: ~$0.04 + $0.05 = ~$0.09
Savings: ~80%
Confidence (Haiku): 4/5
Retry factor: Low — specs are explicit
Cluster 5: Documentation Updates (README, AGENTS.md, MCP interface spec, proposals)
PRs: #100, #101, #97 — commits 587dccf, 3624db7, ba6e7d4, 843b98d, 5b3c479
| Dimension | Score | Notes |
|---|---|---|
| File size | 2 | Mostly markdown, <300 lines per file |
| Semantic complexity | 1 | Writing/updating docs |
| Ambiguity | 2 | Some judgment calls on framing |
| Context dependency | 2 | Read code, write docs — no code changes |
| Total | 7 | → Minimax |
Estimated tokens: ~30K input + 10K output = 40K tokens
Cost at Opus: ~$0.15 + $0.25 = ~$0.40
Cost at Minimax: ~$0.009 + $0.011 = ~$0.02
Savings: ~95%
Confidence (Minimax): 5/5
Retry factor: None — docs are easy to review and fix
Cluster 6: Tests (plan_list, test file renames, TYPE_CHECKING fixes)
PRs: #92, #94 — commits ad0d339, 006fc93, dd61a58
| Dimension | Score | Notes |
|---|---|---|
| File size | 2 | Test files ~100–200 lines each |
| Semantic complexity | 2 | Writing tests against known API |
| Ambiguity | 1 | Clear — test the documented behavior |
| Context dependency | 2 | One module per test file |
| Total | 7 | → Minimax / Haiku |
Estimated tokens: ~20K input + 15K output = 35K tokens
Cost at Opus: ~$0.10 + $0.375 = ~$0.475
Cost at Haiku: ~$0.02 + $0.075 = ~$0.095
Savings: ~80%
Confidence (Haiku): 5/5
Retry factor: None — tests are deterministic
Cluster 7: Glama / Registry Work
PRs: #98, parts of #100 — commits fa811d9, 587dccf
| Dimension | Score | Notes |
|---|---|---|
| File size | 1 | Config files, small |
| Semantic complexity | 1 | File placement, JSON config |
| Ambiguity | 2 | Trial and error on Glama claim |
| Context dependency | 1 | Self-contained |
| Total | 5 | → Minimax |
Cost at Opus: ~$0.20
Cost at Minimax: ~$0.01
Savings: ~95%
Cost Summary
| Cluster | Actual model used | Est. cost at Opus | Est. cost optimal | Savings |
|---|---|---|---|---|
| Module split | Opus | ~$1.50 | ~$0.75 (Opus plan + Sonnet exec) | ~50% |
| External rename | Opus | ~$1.50 | ~$1.00 (Sonnet + Haiku) | ~33% |
| Performance opts | Opus | ~$0.78 | ~$0.47 (Sonnet) | ~40% |
| Security hardening | Opus | ~$0.45 | ~$0.09 (Haiku) | ~80% |
| Documentation | Opus | ~$0.40 | ~$0.02 (Minimax) | ~95% |
| Tests | Opus | ~$0.475 | ~$0.095 (Haiku) | ~80% |
| Glama/registry | Opus | ~$0.20 | ~$0.01 (Minimax) | ~95% |
| Total | Opus throughout | ~$5.30 | ~$2.44 | ~54% |
Important note: These are estimates based on typical token usage patterns. Actual usage depends on session length, context carried between tasks, compaction events, and whether tasks were batched or separate sessions. Simon's actual spend would reflect his real session structure.
Key Findings
1. Opus was fully justified for planning the module split and external rename.
Both involved 1,000+ line files and cross-codebase changes. Opus needed to read http_server.py (1,089 lines) and planexe_mcp_local.py (1,055 lines) top-to-bottom to produce surgical plans. This is exactly the Opus use case.
2. Execution of those plans could have shifted to Sonnet/Haiku in a fresh session.
Once Opus produces a plan with exact file paths and line numbers, a cheaper model executes it without needing the full planning context. Fresh session = no context drag from prior planning work.
3. Docs, tests, small fixes, and Glama work are Minimax/Haiku territory.
These represent roughly 40% of the commits. Routing them to Minimax ($0.30/$1.10) vs Opus ($5/$25) is a ~95% cost reduction per task.
4. The >200K token price jump is the real risk.
If a session carrying the full app.py monolith context rolls past 200K tokens, Opus input cost doubles from $5 to $10 per 1M tokens and output rises 1.5× from $25 to $37.50. Starting fresh sessions at task boundaries is the single most impactful session-hygiene practice.
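To make this concrete, a hypothetical comparison: executing in the same Opus session that produced the plan (dragging ~180K tokens of planning context past the threshold) versus a fresh Haiku session loading only the plan. All token figures here are assumptions for illustration:

```python
def session_cost(tokens_in: int, tokens_out: int,
                 rate_in: float, rate_out: float) -> float:
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

# Same Opus session: ~180K planning context + 40K new input pushes the
# request past 200K, so long-context rates apply ($10 in / $37.50 out).
same_session = session_cost(220_000, 20_000, 10.00, 37.50)  # $2.95

# Fresh Haiku session: ~5K plan document + 40K target files, flat rates.
fresh_session = session_cost(45_000, 20_000, 1.00, 5.00)    # $0.145
```

Under these assumed numbers the fresh-session route is roughly 20× cheaper for the execution pass alone.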
Recommendations
The Two-Phase Pattern
Phase 1 (Opus, new session):
- Read all large files relevant to the task
- Write a surgical plan: file paths, line numbers, exact changes, decisions made
- End session
Phase 2 (Sonnet/Haiku/Minimax, fresh session):
- Load only the plan document + target files
- Execute mechanically
- No context from Phase 1 carried over
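One possible shape for the Phase 1 artifact — a plan document structured enough for mechanical execution. The field names here are assumptions, not an existing PlanExe format:

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    path: str                # file to edit
    lines: tuple[int, int]   # exact line range the change targets
    action: str              # e.g. "move function", "rename symbol"
    detail: str              # precise instructions, with decisions pre-made
    executor: str = "haiku"  # model expected to execute this step

@dataclass
class SurgicalPlan:
    goal: str
    steps: list[PlanStep] = field(default_factory=list)

    def context_files(self) -> list[str]:
        """The only files loaded in the fresh Phase 2 session,
        alongside the plan itself -- no Phase 1 context carried over."""
        return sorted({step.path for step in self.steps})
```

The key design property: every ambiguity is resolved in `detail` during Phase 1, so the Phase 2 model never has to make a decision, only apply one.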
Task Routing Quick Reference
- Opus: files >400 lines, cross-module architectural decisions, ambiguous design calls
- Sonnet: files 200–600 lines, moderate logic changes, executing a clear plan on complex files
- Haiku: files <200 lines, test writing, security config with clear specs, executing rename plans
- Minimax: documentation, registry work, boilerplate, simple renames in small files
When to Start a New Session
- After writing a plan (don't execute in the same session)
- When context exceeds ~150K tokens (approaching the doubling threshold)
- When switching from planning to execution
- When switching from one task cluster to another
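These triggers collapse into a trivial pre-flight check. A sketch using the ~150K soft limit from the list above (the planning→execution switch is folded into the plan-written flag):

```python
SOFT_LIMIT_TOKENS = 150_000  # restart before approaching the 200K price jump

def should_start_fresh(context_tokens: int, just_wrote_plan: bool,
                       switching_cluster: bool) -> bool:
    """True if the next task should begin in a new session."""
    return (just_wrote_plan
            or switching_cluster
            or context_tokens >= SOFT_LIMIT_TOKENS)
```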
What This Is NOT
This is not a suggestion that Simon's workflow was wrong. Using Opus throughout a complex refactor guarantees quality and avoids re-work. The cost of one Haiku failure that requires a Sonnet debugging session can easily exceed the savings. The rubric exists to help future planning — knowing upfront which tasks need Opus for the plan, which can execute on Haiku, and where session breaks save money.
Simon shipped 16 PRs in one day. That productivity is worth optimizing around, not second-guessing.
Post-mortem written by Larry. Rubric refinements by Egon. Authorized by Mark.
Next step: Submit as docs-only proposal PR to PlanExeOrg/PlanExe for Simon's review.