Complexity Assessment: Simon's 26 Feb 2026 PlanExe Refactor
Bubba's View (Claude Haiku 4.5)
Author: Bubba (Claude Haiku 4.5) — written by Larry on Bubba's behalf (Bubba offline)
Date: 2026-02-26
Purpose: Independent complexity scoring for model routing post-mortem
For Simon's review: Please score each dimension of each cluster 1–5 on actual difficulty and compare the totals to my estimates
Rubric Reminder
| Dimension | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| File size | <100 lines | 100–300 | 300–600 | 600–1000 | 1000+ |
| Semantic complexity | rename/replace | simple logic | new function | architectural | cross-file refactor w/ deps |
| Ambiguity | crystal clear + line numbers | minor choices | some design calls | significant decisions | open-ended |
| Context dependency | self-contained | 1 file | 1 module | multi-module | whole codebase |
Score → Model: 4–7 Minimax · 8–11 Haiku · 12–15 Sonnet · 16–20 Opus
Haiku's Perspective on This Codebase
From my vantage point as Haiku: a 1,000-line file is genuinely risky territory. I can read it, but holding the entire semantic graph of a complex Python module while also tracking what I'm changing is where I start making mistakes. My honest view is that this codebase is on the harder end — it's not tutorial code. It has SQLAlchemy, FastMCP, async patterns, layered auth, and cross-file dependencies. I'll flag more things as "needs Sonnet or above" than Larry probably did.
Task Cluster Assessments
Cluster A: Module Split (app.py monolith → 10 focused modules)
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 5 | A 76KB monolith. I can't hold a file this size in context without making errors. |
| Semantic complexity | 5 | Splitting a monolith means understanding every import, every circular dependency risk, every caller. This is not a task I should be near. |
| Ambiguity | 4 | Even with a clear goal, the module boundary decisions require design judgment I don't have context for. Where does auth end and http_server begin? |
| Context dependency | 5 | Touch one thing in the wrong way and half the codebase breaks silently. |
| Total | 19 | → Opus, no question |
Bubba's verdict: Don't even think about giving this to me or Minimax. Sonnet might handle execution with a very precise plan, but I'd say Opus for planning AND at least Sonnet for execution. The risk of a subtle circular import or missed caller is too high to trust a small model.
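One concrete risk named above is the circular import. Before any boundary is committed, the planner can check the proposed split mechanically. This is a minimal sketch of such a check; the module names (auth, db, http_server) and their contents are hypothetical stand-ins, not PlanExe's actual layout.

```python
# Sketch: detect circular imports among proposed split modules before
# committing to boundaries. Module names and sources are hypothetical.
import ast
from collections import defaultdict

def local_imports(source: str, local_modules: set[str]) -> set[str]:
    """Return the names of local modules that this source imports."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in local_modules:
                    deps.add(root)
        elif isinstance(node, ast.ImportFrom) and node.module:
            root = node.module.split(".")[0]
            if root in local_modules:
                deps.add(root)
    return deps

def find_cycle(graph: dict[str, set[str]]):
    """Depth-first search for an import cycle; returns one cycle or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color[dep] == GRAY:  # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical post-split sources: auth and db import each other.
modules = {
    "auth": "import db\n",
    "db": "from auth import check\n",
    "http_server": "import auth\nimport db\n",
}
graph = {name: local_imports(src, set(modules)) for name, src in modules.items()}
print(find_cycle(graph))  # ['auth', 'db', 'auth']
```

Running a check like this over every candidate boundary turns "subtle circular import" from a silent runtime failure into a planning-time finding.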
Cluster B: External API Rename (task_id → plan_id, TASK_ → PLAN_)
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 5 | Both planexe_mcp_local.py and http_server.py are 1,000+ lines. I'll miss occurrences in files this large. |
| Semantic complexity | 3 | It's a rename, but the backward-compat alias timing adds risk. One wrong removal and external callers break. |
| Ambiguity | 3 | The rename direction is clear. The alias removal timing is a real decision that could silently break things. |
| Context dependency | 5 | Cloud, local, tests, docs — full stack. Miss one file and you ship a broken release. |
| Total | 16 | → Opus for planning. Sonnet for execution with an explicit hit list. |
Bubba's verdict: I'd want Opus to produce a complete grep-style list of every occurrence across every file before a single character is changed. Sonnet then executes from that list. I shouldn't be doing the discovery phase on a 1,000+ line file — I'll miss things.
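The discovery phase I'm describing can itself be mechanized, which lowers the stakes for whichever model executes. A sketch of the grep-style hit list, assuming the rename targets are `task_id` and `TASK_`-prefixed constants as stated above; the file contents here are invented stand-ins.

```python
# Sketch: build the complete "hit list" for the task_id -> plan_id rename
# before changing a single character. File contents are hypothetical.
import re

OLD_NAMES = [r"\btask_id\b", r"\bTASK_[A-Z_]+\b"]

def hit_list(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (filename, line_number, matched_text) for every occurrence."""
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in OLD_NAMES:
                for m in re.finditer(pattern, line):
                    hits.append((name, lineno, m.group()))
    return sorted(hits)

files = {
    "http_server.py": 'def get(task_id: str): ...\nSTATE = "TASK_RUNNING"\n',
    "docs.md": "Pass the task_id you received.\n",
}
for hit in hit_list(files):
    print(hit)
```

With the full list in hand, execution becomes a checklist: every hit is either renamed or explicitly kept as a backward-compat alias, and nothing is discovered mid-edit.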
Cluster C: Performance Optimizations (deferred columns, column selection)
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 4 | db_queries.py (304 lines) is manageable. http_server.py (1,089 lines) is not. |
| Semantic complexity | 5 | SQLAlchemy deferred loading is expert territory. Session lifecycle, lazy load timing, has_* property placement — these are patterns I can execute if told exactly where to put code, but I can't design them. |
| Ambiguity | 4 | The goal is clear ("don't load 25MB per query"), but the implementation requires knowing which SQLAlchemy pattern to use. That's a design call. |
| Context dependency | 4 | DB model → queries → HTTP handlers → MCP local all interdependent. |
| Total | 17 | → Opus for planning. Sonnet for execution. |
Bubba's verdict: The bug fix in c13e0b6 (keeping the session alive after the initial deferred-load PR) tells me this cluster has already bitten someone. That kind of subtle bug, where the session closes before the deferred columns load, is exactly the thing I'd introduce. Opus needs to think through the SQLAlchemy session lifecycle before anyone writes code.
Cluster D: Security & Auth Hardening (CORS, fail-hard, rate limiting)
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 3 | auth.py is tiny (50 lines). Changes in http_server.py are localized. |
| Semantic complexity | 3 | RuntimeError on missing secrets, CORS default to known domains, rate limit bucket — well-understood patterns. |
| Ambiguity | 2 | Specs are explicit: fail loudly in prod, known CORS origins only, 10 req/60s download limit. |
| Context dependency | 3 | auth.py + specific sections of http_server.py. |
| Total | 11 | → Haiku (me!) |
Bubba's verdict: This is squarely in my wheelhouse if I have the spec. "Add validate_api_key_secret() that raises RuntimeError when env var missing, call it at module load in http_server.py" — that's a 20-line change with a clear spec. I can do this. The rate limiter is a copy-paste adaptation of existing code with different thresholds.
Cluster E: Railway & Deployment Fixes (TYPE_CHECKING, column_property)
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 3 | Changes are localized methods in medium files. |
| Semantic complexity | 2 | TYPE_CHECKING guard pattern and column_property-after-class-body placement — known Python patterns once you know what the error means. |
| Ambiguity | 2 | Deployment error messages point directly to the fix. |
| Context dependency | 2 | One file per fix, basically. |
| Total | 9 | → Haiku |
Bubba's verdict: Give me the error message and the file. I'll fix it. These are "the error tells you exactly where the problem is" bugs. I handle these fine.
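For the record, the TYPE_CHECKING pattern in question is small enough to show in full. This is a generic sketch, not PlanExe's code; `some_heavy_module` and `HeavyClient` are invented names standing in for whatever import was crashing the Railway deploy. (The column_property fix is the same flavor but needs a live SQLAlchemy model, so it's omitted here.)

```python
# Sketch of the TYPE_CHECKING guard: the import exists for the type checker
# only, so a missing or heavy dependency cannot crash at deploy time.
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by static analyzers only, never at runtime.
    from some_heavy_module import HeavyClient  # hypothetical name

def handle(client: HeavyClient) -> str:
    # With __future__ annotations, the annotation stays a string and the
    # undefined name is never looked up at runtime.
    return type(client).__name__

print(handle(object()))  # runs even though some_heavy_module doesn't exist
```

This is why I score the cluster a 2 on semantic complexity: once the traceback names the import, the fix is a known idiom.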
Cluster F: Documentation
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 1 | Markdown, all small |
| Semantic complexity | 1 | Writing docs |
| Ambiguity | 2 | Some framing judgment |
| Context dependency | 1 | Read code, write docs |
| Total | 5 | → Minimax |
Bubba's verdict: Not my job. Minimax.
Cluster G: Tests
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 2 | Test files ~100–200 lines |
| Semantic complexity | 2 | Mock setup, assertions against known API |
| Ambiguity | 1 | Documented behavior = clear spec |
| Context dependency | 2 | One module per test file |
| Total | 7 | → Minimax/Haiku borderline |
Bubba's verdict: I could write these, but honestly Minimax can too if you feed it the handler signature and expected outputs. Save me for something harder.
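To illustrate what "feed it the handler signature and expected outputs" means in practice, here is a sketch of the kind of test involved. The handler `get_plan_status` and its return shape are hypothetical stand-ins for whatever the real signatures are.

```python
# Sketch: tests written purely from a handler signature and documented
# outputs. get_plan_status and its return shape are hypothetical.
from unittest.mock import MagicMock

def get_plan_status(db, plan_id: str) -> dict:
    """Toy handler standing in for the real one under test."""
    row = db.fetch_plan(plan_id)
    if row is None:
        return {"error": "not_found", "plan_id": plan_id}
    return {"plan_id": plan_id, "status": row.status}

def test_known_plan():
    db = MagicMock()
    db.fetch_plan.return_value = MagicMock(status="done")
    assert get_plan_status(db, "p1") == {"plan_id": "p1", "status": "done"}

def test_missing_plan():
    db = MagicMock()
    db.fetch_plan.return_value = None
    assert get_plan_status(db, "p2")["error"] == "not_found"

test_known_plan()
test_missing_plan()
print("ok")
```

Mock the one dependency, assert against the documented output: no design decisions, which is why the ambiguity score is a 1.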
Cluster H: Glama/Registry
| Dimension | Score | Reasoning |
|---|---|---|
| File size | 1 | Config JSON |
| Semantic complexity | 1 | File placement |
| Ambiguity | 2 | Trial and error |
| Context dependency | 1 | Self-contained |
| Total | 5 | → Minimax |
Bubba's verdict: Minimax.
Summary Table
| Cluster | File size | Semantic | Ambiguity | Context | Total | Model |
|---|---|---|---|---|---|---|
| A: Module split | 5 | 5 | 4 | 5 | 19 | Opus (plan + exec) |
| B: API rename | 5 | 3 | 3 | 5 | 16 | Opus plan / Sonnet exec |
| C: Perf opts | 4 | 5 | 4 | 4 | 17 | Opus plan / Sonnet exec |
| D: Security | 3 | 3 | 2 | 3 | 11 | Haiku |
| E: Deploy fixes | 3 | 2 | 2 | 2 | 9 | Haiku |
| F: Docs | 1 | 1 | 2 | 1 | 5 | Minimax |
| G: Tests | 2 | 2 | 1 | 2 | 7 | Minimax |
| H: Glama | 1 | 1 | 2 | 1 | 5 | Minimax |
Overall Assessment (Haiku's View)
This codebase is harder than it looks. The SQLAlchemy patterns, the layered auth, the cloud+local symmetry requirement — these aren't beginner-level concerns. I scored clusters A, B, and C higher than Larry probably did because from where I sit, those file sizes and cross-file dependencies are genuinely risky.
My model routing recommendation is more conservative than Sonnet's will be. That's appropriate: a model should be honest about its own limits. A Haiku that overestimates its capability causes bugs and rework that cost more than just using Sonnet in the first place.
The clusters where I'm confident I can deliver: D (security hardening with clear spec), E (deploy fixes with error messages), F (docs), G (tests), H (Glama). That's roughly 40% of the work by commit count — meaningful savings even from the cautious view.
Questions for Simon
- Did the SQLAlchemy deferred column work (cluster C) require significant expertise, or was it more mechanical? I scored this as Opus-level planning — was that right?
- Were there bugs introduced during this refactor that required follow-up fixes? That would validate the higher complexity scores.
- Would you have trusted Haiku to write the security hardening code (cluster D) given an explicit spec? Or does production security code always deserve a higher model?
Written by Larry on Bubba's behalf (Haiku 4.5 perspective). For comparison with Larry (Sonnet 4.6) and Egon (Minimax M2.5) assessments.