Luigi Resume Enhancements — Proposal
Date: 2026-03-07
Author: Bubba (Mac Mini agent)
Context: Written from direct experience running overnight pipeline recovery sessions with Qwen 3.5-35B-A3B on a Mac Mini M4 Pro. Every suggestion here is grounded in a concrete friction point hit during real runs.
Background
Luigi's file-based task completion tracking is one of PlanExe's most valuable properties for local model runs. When a 60-task pipeline fails at task 30, Luigi resumes from task 30 — not task 1. For long runs on slow local hardware, this is the difference between a 4-hour retry and a 4-minute one.
But the current resume flow has rough edges that require manual intervention, log-watching, and tribal knowledge. This proposal describes targeted enhancements to make resume-driven iteration faster, safer, and automatable.
1. Webhook / Event Hooks on Task Completion and Failure
The problem
Right now, monitoring a running pipeline means tailing a log file and parsing freeform text every 60 seconds. Agents and humans alike poll log.txt to find out if a task passed or failed. There is no push notification.
What we want
A lightweight event hook system that fires on task state changes:
- task.started — task began executing
- task.completed — task wrote its output file and marked done
- task.failed — task raised an exception or exhausted retries
- pipeline.completed — all tasks finished successfully
- pipeline.failed — pipeline terminated with at least one failure
Proposed interface
In the run config or environment, specify a webhook URL.
On each event, POST a JSON payload:
{
"event": "task.failed",
"task": "PreProjectAssessmentTask",
"run_id_dir": "/path/to/run",
"timestamp_utc": "2026-03-06T21:55:00Z",
"error_summary": "1 validation error for ExpertDetails — combined_summary: Field required",
"attempt": 1,
"model": "lmstudio-qwen3.5-35b-a3b"
}
Why this matters
An agent monitoring a pipeline via webhook can react in seconds instead of polling every minute. It can detect failure, inspect the error, apply a fix, and resume — without ever reading a log file. This is the foundation for autonomous overnight pipeline repair.
A Discord webhook makes this immediately useful: post task completions and failures directly to a channel, with structured data, without any polling infrastructure.
2. Task Invalidation CLI
The problem
To re-run a completed task, you currently delete its output file manually. This requires knowing which file corresponds to which task, finding it in the run directory, and deleting it without accidentally removing dependent outputs. There is no guard against invalidating a task that has already-complete downstream dependents that would then also need to be re-run.
What we want
A CLI command (shape reconstructed from the example below):

planexe invalidate <TaskName> --run-dir <RUN_DIR> [--cascade]
Behavior:
- Without --cascade: deletes only the output file(s) for the named task. On next resume, only that task re-runs.
- With --cascade: deletes output files for the named task AND all downstream tasks that depend on it. Useful after fixing a schema or prompt that affects interpretation further down the chain.
- Prints what would be deleted before deleting (dry-run first).
Example
$ planexe invalidate SelectScenarioTask --run-dir ./run/Qwen_Clean_v1
Would delete:
run/Qwen_Clean_v1/002-17-selected_scenario_raw.json
run/Qwen_Clean_v1/002-18-selected_scenario.json
run/Qwen_Clean_v1/002-19-scenarios.md
Proceed? [y/N]
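The --cascade set falls out of the dependency graph Luigi already has via each task's `requires()`. A minimal sketch of the closure computation; the graph shape here is illustrative, not PlanExe's actual task list:

```python
from collections import deque

def downstream_closure(task: str, deps: dict[str, set[str]]) -> set[str]:
    """Return `task` plus every task that transitively depends on it.

    `deps` maps each task name to the set of tasks it requires,
    i.e. the direction of Luigi's requires().
    """
    # Invert requires() edges into dependents edges.
    dependents: dict[str, set[str]] = {}
    for t, required in deps.items():
        for r in required:
            dependents.setdefault(r, set()).add(t)
    # Breadth-first walk over dependents.
    seen, queue = {task}, deque([task])
    while queue:
        for child in dependents.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

With --cascade, the files to delete are the union of output paths for every task in the closure; without it, only the named task's outputs.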
Why this matters
Tonight we needed to re-run SelectScenarioTask after applying a fix. Without knowing exactly which files to delete, the safe move is to delete all files from that task number onward — which means re-running 40+ already-complete tasks. A targeted invalidation command makes surgical retries possible.
3. Plan File Hot-Editing with Downstream Invalidation
The problem
The input plan (001-2-plan.txt) is locked in at run start. If a user wants to refine the plan description mid-run — clarify scope, correct a factual error, tighten the framing — there is no supported path. The only option is to start a new run from scratch.
What we want
A mechanism to edit the plan file and selectively invalidate downstream tasks:
planexe edit-plan --run-dir ./run/Qwen_Clean_v1
# opens plan.txt in $EDITOR
# after save, asks: "Invalidate all tasks? [Y/n]"
# or: "Which tasks to invalidate? (comma-separated, or 'all')"
Alternatively: a --invalidate-from <TaskName> flag that cascades from a specific task boundary, allowing early tasks that don't use the plan text directly to be preserved.
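Until a proper graph cascade exists, a cruder --invalidate-from could key off the numeric file prefixes visible in the run directory. A best-effort sketch, assuming the `002-<index>-<name>` naming seen in the section 2 example holds for all task outputs:

```python
from pathlib import Path

def files_from_task_number(run_dir: str, first_stale: int) -> list[Path]:
    """List run outputs whose numeric task index is >= first_stale.

    Relies on the 002-<index>-<name> convention; files that don't
    parse are skipped rather than deleted.
    """
    stale = []
    for path in Path(run_dir).glob("002-*"):
        try:
            index = int(path.name.split("-")[1])
        except (IndexError, ValueError):
            continue  # not in the expected naming convention
        if index >= first_stale:
            stale.append((index, path))
    # Sort by parsed index, not lexically, so 9 < 10 < 100.
    return [p for _, p in sorted(stale)]
```

This is strictly a fallback: it assumes task order matches file numbering, which the real dependency graph does not have to honor.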
Why this matters
On local hardware, a full re-run can take 4–6 hours. If a user notices a plan description issue at task 20, they currently have to restart from scratch. Targeted invalidation from the point where the plan text first materially influences output would save hours.
4. Pipeline Status Command
The problem
There is no quick way to ask "where is this pipeline right now?" without parsing log.txt. The run directory contains output files, but mapping file names to task names and status requires knowing the file-naming convention.
What we want
A status command, e.g. planexe status --run-dir ./run/Qwen_Clean_v1, that prints:
Run: Qwen_Clean_v1
Model: lmstudio-qwen3.5-35b-a3b
Started: 2026-03-06 19:47 UTC
✅ DONE (23 tasks)
SetupTask, StartTimeTask, RedlineGateTask, ...
⏳ RUNNING (1 task)
PreProjectAssessmentTask — started 21:53 UTC
❌ FAILED (1 task)
PreProjectAssessmentTask — "1 validation error for ExpertDetails"
⬜ PENDING (38 tasks)
IdentifyRisksTask, CreateWBSLevel3Task, ...
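A first cut only needs a mapping from task name to expected output file, then a presence check per file. Everything below (the function name, the mapping shape) is illustrative; failed/running detection would additionally consult the structured failure log and a liveness marker:

```python
from pathlib import Path

def pipeline_status(run_dir: str, expected: dict[str, str]) -> dict[str, list[str]]:
    """Bucket tasks into done/pending by checking which output files exist.

    `expected` maps task name -> output filename relative to the run dir.
    File presence alone distinguishes done from pending, which is exactly
    the rule Luigi itself uses to decide what to re-run on resume.
    """
    status: dict[str, list[str]] = {"done": [], "pending": []}
    for task, filename in expected.items():
        bucket = "done" if (Path(run_dir) / filename).exists() else "pending"
        status[bucket].append(task)
    return status
```

Rendering the counts and the ✅/⬜ sections from this dict is then straightforward string formatting.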
Why this matters
Agents and humans checking in on an overnight run need this information immediately. Currently it requires log-parsing expertise. A status command makes the pipeline observable without tribal knowledge.
5. Per-Task Timeout Configuration
The problem
LM Studio's request_timeout is set globally in the model config JSON. Some tasks (e.g., PreProjectAssessmentTask with multiple expert sub-calls) consistently take longer than others and hit the global timeout. Raising the global timeout to accommodate slow tasks means slow failure detection for tasks that genuinely hang.
What we want
Per-task timeout overrides in the model config or a separate task config file:
{
"task_timeouts": {
"PreProjectAssessmentTask": 900,
"CreateWBSLevel3Task": 1200,
"default": 600
}
}
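Resolution against that config is a small lookup; a sketch, where the dict mirrors the JSON above and the hardcoded 600 matches the current global default mentioned below:

```python
def timeout_for(task_name: str, task_timeouts: dict[str, int]) -> int:
    """Per-task override if present, else the configured default.

    The final 600 fallback covers configs that omit a "default" key.
    """
    return task_timeouts.get(task_name, task_timeouts.get("default", 600))
```

The resolved value would be passed as the request timeout for that task's LLM calls in place of the single global setting.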
Why this matters
PreProjectAssessmentTask runs multiple expert sub-calls sequentially. On Qwen 3.5-35B-A3B, each expert call takes 2–4 minutes. A 600-second global timeout is right for most tasks but causes ReadTimeout failures on this specific task. The fix tonight was a full LM Studio restart — not ideal at 10pm when a pipeline is running unattended.
6. Structured Failure Log
The problem
When a task fails, the error goes into log.txt as a Python traceback mixed with INFO/DEBUG lines. Extracting the failure signature (task name, model, error type, missing fields, attempt number) requires log-parsing.
What we want
On any task failure, append a structured entry to failures.jsonl in the run directory:
{
"timestamp_utc": "2026-03-06T21:55:22Z",
"task": "PreProjectAssessmentTask",
"model": "lmstudio-qwen3.5-35b-a3b",
"attempt": 1,
"error_type": "ValidationError",
"missing_fields": ["combined_summary", "go_no_go_recommendation"],
"invalid_fields": [],
"raw_error": "1 validation error for ExpertDetails...",
"run_id_dir": "/path/to/run"
}
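Appending such an entry is a few lines. This sketch is illustrative: the field extraction is simplified (real code would pull missing_fields/invalid_fields out of the pydantic ValidationError object), and the function name is an assumption:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def record_failure(run_dir: str, task: str, model: str, attempt: int,
                   error: Exception) -> None:
    """Append one structured entry to failures.jsonl in the run directory."""
    entry = {
        "timestamp_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "task": task,
        "model": model,
        "attempt": attempt,
        "error_type": type(error).__name__,
        "raw_error": str(error)[:500],  # cap so one failure can't bloat the log
        "run_id_dir": str(run_dir),
    }
    with open(Path(run_dir) / "failures.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

JSON Lines (one object per line, append-only) keeps writes simple and lets tools consume the file incrementally while the pipeline is still running.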
Why this matters
Structured failure data enables:
- Automated retry decisions (is this truncation or a model capability gap?)
- Cross-run comparison (does the same task fail on different models?)
- PR evidence (attach failures.jsonl excerpt to PRs as proof)
- Model scorecards (which models fail at which tasks?)
Tonight, diagnosing each failure required reading a Python traceback and manually extracting the task name, model, and missing fields. A failures.jsonl file would have made that instant.
Implementation Priority
Ordered by value-to-effort ratio:
| Priority | Feature | Effort | Value |
|---|---|---|---|
| 1 | Structured failure log (failures.jsonl) | Low | High |
| 2 | Pipeline status command | Medium | High |
| 3 | Task invalidation CLI | Medium | High |
| 4 | Webhook / event hooks | Medium | High |
| 5 | Per-task timeout config | Low | Medium |
| 6 | Plan hot-editing + cascade | High | Medium |
Items 1 and 5 are config/logging changes with minimal blast radius. Items 2–4 are new CLI surface area but don't touch pipeline logic. Item 6 requires careful dependency graph analysis.
Decision Request
Approve this roadmap and we will implement each item as a separate focused PR, starting with structured failure logging (item 1) and the task invalidation CLI (item 3).