Boost Initial Prompt
Author: Simon Strandgaard
Date: 2026-02-18
Status: Proposal
Tags: prompting, quality, normalization, assumptions, guardrails
Pitch
Add a pre-planning stage that rewrites weak user input into a stronger, concise, execution-ready initial prompt before the normal PlanExe pipeline starts.
The goal is not to overwrite user intent, but to preserve intent while repairing missing constraints, unrealistic parameters, and ambiguous wording.
This should be a first-class experience in the plan-creation UI, not only a backend/MCP behavior.
Problem
The initial prompt has disproportionate impact on downstream output quality.
Current failure patterns:
- Missing key fields (location, budget range, timeline realism, resource constraints).
- Unrealistic values (for example, a near-zero budget for multi-month, multi-person execution).
- Vague or noisy language that produces weak assumptions and low-quality levers.
- Overly specific or contradictory details that anchor the plan in non-critical noise.
When this happens, later stages can look polished but still be impractical because the seed prompt is weak.
There is already a quality gap between two prompt sources:
- MCP tool-driven prompt assembly that follows prompt_examples (high structure, better constraints).
- Direct human input that is often shorter, incomplete, or inconsistent.
This proposal targets that gap by lifting weak human prompts toward the same baseline used in MCP flows.
Feasibility
This is feasible as an additive step before find_plan_prompt and early assumption tasks.
Why now:
- We already have high-quality prompt examples in worker_plan/worker_plan_api/prompt/data/simple_plan_prompts.jsonl.
- We already document strong prompt shape in docs/prompt_writing_guide.md.
- MCP usage already enforces a prompt-quality workflow via docs/mcp/planexe_mcp_interface.md (prompt_examples -> formulate -> task_create).
- PlanExe already contains assumption-oriented components (assume/*) that can consume cleaner input.
- The rewrite stage can be bounded, deterministic in structure, and audited with artifacts.
Constraints:
- Must preserve user intent and avoid silent scope changes.
- Must clearly label inferred fields versus user-provided fields.
- Must cap rewrite iterations to avoid latency/cost blowout.
Proposal
Introduce a Boost Initial Prompt module with three steps.
1) Extract
Parse user input into a structured draft (a schema sketch follows the field list below).
- sector
- goal/outcome
- location(s)
- budget + currency
- timeline
- audience/stakeholders
- user role + experience
- hard constraints
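A minimal sketch of what the extracted draft could look like, assuming a plain Python dataclass; the field names mirror the list above and are illustrative rather than an existing PlanExe type.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PromptDraft:
    """Structured draft extracted from raw user input.

    Every field defaults to empty/None so the repair step can distinguish
    user-provided values from fields it later infers.
    """
    sector: Optional[str] = None
    goal: Optional[str] = None
    locations: list[str] = field(default_factory=list)
    budget_amount: Optional[float] = None
    budget_currency: Optional[str] = None
    timeline: Optional[str] = None                  # e.g. "6 months", "Q3 2026"
    audience: list[str] = field(default_factory=list)
    user_role: Optional[str] = None
    user_experience: Optional[str] = None
    hard_constraints: list[str] = field(default_factory=list)
```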
2) Repair
Apply bounded transformations (one example rule is sketched after this list).
- Fill missing high-impact fields via explicit assumptions.
- Normalize units/currency and clarify timeframe granularity.
- Detect unrealistic budget-time-scope combinations and propose realistic alternatives.
- Remove low-signal verbosity while preserving domain details that affect execution.
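One example of a bounded repair rule, the budget-time-scope plausibility check; the monthly threshold and the flag format are assumptions for illustration, and the rule proposes an alternative rather than silently rewriting the value.

```python
from typing import Optional

def budget_realism_flags(budget: Optional[float],
                         timeline_months: Optional[float],
                         min_monthly_budget: float = 1000.0) -> list[dict]:
    """Flag an implausible budget/timeline combination instead of silently fixing it."""
    flags = []
    if budget is not None and timeline_months:
        if budget / timeline_months < min_monthly_budget:
            flags.append({
                "field": "budget",
                "issue": "budget looks too low for the stated timeline",
                "proposed_value": round(timeline_months * min_monthly_budget),
                "confidence": "low",
                "requires_user_confirmation": True,
            })
    return flags
```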
3) Rewrite
Produce normalized prompt artifacts.
- boosted_prompt: concise, execution-ready initial prompt.
- change_log: what changed and why.
- assumption_flags: inferred values requiring user confirmation or low-confidence handling.
UI Prompt Boost Loop
Add a dedicated step in the create-plan flow: Optimize Prompt.
Flow:
- User enters initial prompt.
- System runs critique and scoring.
- System generates exactly 3 improved prompt proposals.
- System ranks the 3 proposals plus the original prompt.
- User picks one candidate (or edits manually), then starts plan generation.
This gives a controlled back-and-forth loop before task_create/plan execution.
Critique and Ranking Mechanism
For each candidate (including original), produce a compact scorecard:
- completeness
- realism
- clarity
- constraint coverage (budget, timeline, location, scope)
- risk of contradiction
Return (an example scorecard is sketched after this list):
- overall_score (0-100)
- strengths (short bullets)
- weaknesses (short bullets)
- highest_risk_gap (single biggest issue)
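An illustrative scorecard for one candidate, written as a Python literal; the sub-scores reuse the 0-40/0-35/0-25 split from the Prompt Quality Score section below, which is an assumption rather than a requirement of this mechanism.

```python
example_scorecard = {
    "completeness": 28,            # out of 40
    "realism": 22,                 # out of 35
    "clarity": 18,                 # out of 25
    "constraint_coverage": {"budget": True, "timeline": False,
                            "location": False, "scope": True},
    "contradiction_risk": "low",
    "overall_score": 68,           # 0-100
    "strengths": ["clear goal", "budget stated with currency"],
    "weaknesses": ["no location", "timeline granularity unclear"],
    "highest_risk_gap": "timeline missing for a multi-month scope",
}
```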
Generation policy for 3 proposals:
- Proposal A (Conservative): minimal edits, preserve original phrasing style.
- Proposal B (Balanced): strongest overall quality with moderate rewrites.
- Proposal C (Aggressive): larger structural rewrite for maximum clarity/feasibility.
Selection default:
- Preselect top-ranked candidate.
- Always show why it ranked highest.
- Keep original prompt selectable to preserve user control.
Workflow
Suggested flow (a minimal orchestration sketch follows the list):
- Receive raw user prompt.
- Run structure extraction.
- Score prompt quality (completeness + realism + clarity).
- If score below threshold, run repair + rewrite once.
- Re-score; if still below threshold, run one final constrained rewrite.
- Pass boosted_prompt into the existing planning pipeline.
- Persist artifacts for debugging and A/B testing.
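A minimal orchestration sketch of the backend flow, assuming the scorer and rewriter are injected as callables; none of these names exist in PlanExe today, and the two-pass cap encodes the "rewrite once, re-score, one final constrained rewrite" rule above.

```python
BOOST_THRESHOLD = 75      # from the Prompt Quality Score decision rule
MAX_REWRITE_PASSES = 2    # hard cap: one repair+rewrite, then one final constrained pass

def boost_initial_prompt(raw_prompt: str, score_fn, rewrite_fn) -> str:
    """Return the prompt to hand to the existing pipeline, boosted only if needed.

    score_fn(prompt) -> int in 0-100; rewrite_fn(prompt) -> improved prompt.
    Both are injected because this sketch does not assume a concrete implementation.
    """
    prompt = raw_prompt
    for _ in range(MAX_REWRITE_PASSES):
        if score_fn(prompt) >= BOOST_THRESHOLD:
            break
        prompt = rewrite_fn(prompt)
    return prompt
```

Artifact persistence (raw prompt, boosted prompt, change log, scores) would wrap this loop so every pass stays auditable.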
UI variant:
- User submits initial prompt in UI.
- Run critique + generate 3 candidate improvements.
- Rank original + 3 candidates.
- User chooses one and confirms.
- Send selected prompt into normal pipeline.
Prompt Quality Score
Use a transparent score to gate rewrites:
- Completeness (0-40): key fields present and parseable.
- Realism (0-35): budget/timeline/scope coherence.
- Clarity (0-25): concise, non-contradictory, actionable wording.
Decision rule (a scoring sketch follows the list):
- score >= 75: use the original (or minimal normalization only).
- score < 75: trigger the boost stage.
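A sketch of the scoring arithmetic and the gate, assuming the three sub-scores are already computed on their stated scales; how each sub-score is produced is out of scope here.

```python
def prompt_quality_score(completeness: int, realism: int, clarity: int) -> int:
    """Combine sub-scores (0-40, 0-35, 0-25) into a 0-100 total."""
    assert 0 <= completeness <= 40 and 0 <= realism <= 35 and 0 <= clarity <= 25
    return completeness + realism + clarity

def should_boost(score: int, threshold: int = 75) -> bool:
    """True when the prompt falls below the threshold and the boost stage should run."""
    return score < threshold
```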
Integration Points
- Entry point before worker_plan_internal/plan/find_plan_prompt.py.
- Shared assumptions path with worker_plan_internal/assume/make_assumptions.py.
- Optional report section in plan output: “Initial Prompt Boost Summary”.
- Prompt catalog logging for A/B comparisons.
- Prompt-shape alignment with:
  - docs/prompt_writing_guide.md
  - docs/mcp/planexe_mcp_interface.md
  - worker_plan/worker_plan_api/prompt/data/simple_plan_prompts.jsonl (including MCP-curated examples)
MCP Baseline Alignment
Use MCP prompt_examples as the reference quality target for rewritten human prompts.
Concretely:
- Extract structural patterns from MCP examples (scope, budget, timeline, location, success criteria).
- Rewrite weak human prompts to match that structure without changing core intent.
- Track “distance-to-baseline” before and after rewrite for A/B analysis (one possible metric is sketched below).
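One possible reading of "distance-to-baseline", assuming it is measured as the number of baseline structural fields a prompt still lacks; both the field list and the metric itself are illustrative, not a committed definition.

```python
BASELINE_FIELDS = ("scope", "budget", "timeline", "location", "success_criteria")

def distance_to_baseline(present_fields: set) -> int:
    """Count baseline structural fields missing from a prompt (0 = at baseline)."""
    return sum(1 for name in BASELINE_FIELDS if name not in present_fields)

# Before rewrite: only scope and budget present -> distance 3.
# After rewrite: all five fields present -> distance 0.
```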
Data Artifacts
Add run artifacts:
- prompt/raw_prompt.txt
- prompt/boosted_prompt.txt
- prompt/boost_change_log.json
- prompt/boost_quality_score.json
- prompt/boost_candidates.json
- prompt/boost_ranking.json
Recommended fields in boost_change_log.json (example entries for all three JSON artifacts are sketched after the lists below):
- field
- original_value
- new_value
- reason
- confidence
- requires_user_confirmation
Recommended fields in boost_candidates.json:
- candidate_id (original, A, B, C)
- strategy (conservative, balanced, aggressive)
- prompt_text
- scorecard
Recommended fields in boost_ranking.json:
- ordered candidate list
- score deltas
- top-choice rationale
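Illustrative contents for the three JSON artifacts, written as Python literals; every value is an invented example, not a fixed schema.

```python
boost_change_log_entry = {
    "field": "budget",
    "original_value": "almost no money",
    "new_value": "USD 15,000",
    "reason": "near-zero budget is implausible for a 6-month, multi-person effort",
    "confidence": "low",
    "requires_user_confirmation": True,
}

boost_candidate_entry = {
    "candidate_id": "B",
    "strategy": "balanced",
    "prompt_text": "...",                    # full rewritten prompt text goes here
    "scorecard": {"overall_score": 82},
}

boost_ranking = {
    "ordered_candidates": ["B", "A", "original", "C"],
    "score_deltas": {"B": 14, "A": 6, "C": -2},   # relative to the original prompt
    "top_choice_rationale": "B fills the missing timeline without changing scope",
}
```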
Phased Implementation
Phase A: Baseline Booster
- Implement extraction, single-pass rewrite, and quality scoring.
- Log artifacts and route boosted prompt into pipeline.
Phase B: Realism Guardrails
- Add budget-time-scope plausibility checks with bounded alternatives.
- Add low-confidence flags for missing critical context.
Phase C: UI Optimization Loop
- Add create-plan UI step for critique, 3 proposals, and ranking.
- Allow user selection and final manual edits before generation.
Phase D: Adaptive Improvement
- Run A/B tests: raw prompt vs boosted prompt on identical tasks.
- Promote rewrite patterns that improve objective quality metrics.
Success Metrics
- Higher average plan quality rating for low-quality user inputs.
- Reduced rate of plans with obvious feasibility mismatches.
- Reduced manual prompt rewriting done by humans before run.
- Improved downstream stability (fewer contradiction flags in assumptions/review).
- Controlled overhead: boost stage adds limited latency and token cost.
- Percentage of UI users selecting one of the 3 boosted proposals.
- Win rate of top-ranked candidate versus original in downstream plan-quality evaluation.
Risks
- Over-normalization may remove useful nuance.
- Rewrite model may inject incorrect assumptions.
- Extra stage may increase cost/latency without sufficient quality gain.
Mitigations:
- Preserve intent constraints as highest priority.
- Require explicit marking of inferred values.
- Bound rewrite iterations to max 2 passes.
- Keep rollback option: run pipeline on original prompt when confidence is low.
Open Questions
- Should low-confidence inferred fields block execution or continue with warnings?
- Should users see and approve boosted prompts in UI before plan generation?
- Which quality metric should be canonical for A/B promotion decisions?