Proposal: Location-Aware Resource & Cost Data Service

Problem Statement

PlanExe generates plans that involve real-world costs — labor, materials, services, equipment. Currently, cost assumptions are whatever the LLM hallucinates. For PlanExe to be a trusted auditing layer, it needs access to real cost data:

"What does a carpenter cost in Nairobi, Kenya?"
"What does mahogany wood cost per board-foot in that region?"
"What's the hourly rate for a dentist in Copenhagen?"
"What does commercial rent cost per sqm in downtown São Paulo?"

These questions span every country, every profession, every material. The data is massive, always changing, and often incomplete.

Requirements

Location-aware cost lookups (country, city, region)
Labor costs by profession/skill level
Material costs by type and specification
Service costs (rent, utilities, insurance, etc.)
Graceful handling of missing data (interpolation, estimates)
Currency-aware (integrates with Currency Service proposal)
Data freshness indicators (how old is this data point?)
Confidence levels (exact data vs interpolated vs LLM-estimated)

The Scale Challenge

This is potentially a massive database:
- ~200 countries × (~1,000 professions + ~10,000 materials) ≈ 2.2 million country-level combinations, and orders of magnitude more once cities, regions, and specifications are included
- Most of these combinations don't have direct data
- Data goes stale at different rates (labor costs: yearly, commodities: daily, rent: monthly)

Data Architecture

Tier 1: Authoritative Data Sources (high confidence)

ILO (International Labour Organization): Labor costs by country/sector
World Bank Open Data: Economic indicators, PPP adjustments
UN Comtrade: Commodity trade data
BLS (US Bureau of Labor Statistics): Detailed US labor/material costs
Eurostat: European cost data
Numbeo: Cost of living data (crowdsourced, so lower confidence than official statistics, but large coverage)
Trading Economics: Economic indicators by country

Tier 2: Interpolation & Estimation (medium confidence)

When exact data isn't available for a location:

Geographic interpolation:
- If we don't have carpenter costs for Nairobi specifically, use Kenya national average
- If Kenya data is missing, use East African regional average
- Weight by economic similarity (GDP per capita, urbanization rate)

PPP (Purchasing Power Parity) adjustment:
- If we know US carpenter costs and the PPP ratio between the US and Kenya, we can estimate Kenyan carpenter costs
- The World Bank publishes PPP conversion factors for most countries

Economic similarity clustering:
- Group countries by GDP per capita, economic structure, urbanization
- Use known costs from similar countries to estimate missing ones
- Example: If we know costs in Ghana and Tanzania, interpolate for Uganda
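The similarity idea above can be sketched as a weighted average in which countries closer to the target in GDP per capita get more weight. This is only a minimal sketch under stated assumptions: the function name, the choice of log-GDP distance as the similarity measure, and every number below are illustrative placeholders, not real statistics or a settled method.

```python
import math


def similarity_weighted_estimate(target_gdp_pc, neighbors):
    """Estimate a missing cost from countries with known costs.

    neighbors: list of (gdp_per_capita, known_cost) tuples.
    Weights are the inverse of the distance in log GDP per capita,
    so economically closer countries dominate the estimate.
    """
    if not neighbors:
        raise ValueError("need at least one neighbor with a known cost")
    weighted_sum = 0.0
    weight_total = 0.0
    for gdp_pc, cost in neighbors:
        distance = abs(math.log(gdp_pc) - math.log(target_gdp_pc))
        weight = 1.0 / (distance + 1e-6)  # epsilon avoids division by zero
        weighted_sum += weight * cost
        weight_total += weight
    return weighted_sum / weight_total


# Placeholder inputs (NOT real data): two neighbors with known hourly rates.
neighbors = [(2000.0, 3.0), (1000.0, 2.0)]
estimate = similarity_weighted_estimate(1200.0, neighbors)
```

The estimate lands between the two known costs and leans toward the neighbor whose GDP per capita is closer to the target. A real implementation would likely add urbanization rate and economic structure to the distance measure.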

Tier 3: LLM-Assisted Estimation (low confidence)

For truly obscure combinations, ask the LLM and record the result with a low confidence score
Use the LLM's training data as a rough guide, validated against Tier 1/2 where possible
Always flag these as "LLM estimate — verify before using in financial projections"

Data Model

CostDataPoint:
  category: str          # "labor", "material", "service", "equipment"
  item: str              # "carpenter", "mahogany_lumber", "commercial_rent"
  location:
    country: str         # ISO 3166-1 alpha-2 (e.g., "KE")
    region: str?         # State/province if available
    city: str?           # City if available
  value: float
  currency: str          # ISO 4217 (e.g., "KES")
  unit: str              # "per_hour", "per_board_foot", "per_sqm_month"
  source: str            # "ilo", "worldbank", "interpolated", "ppp_adjusted", "llm_estimate"
  confidence: float      # 0.0 - 1.0
  date: date             # When this data was collected/computed
  freshness: str         # "live", "monthly", "annual", "stale"
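
One way to express this schema in Python is a pair of frozen dataclasses with light validation. This is a sketch, not a committed design: the separate `Location` class, the validation rules, and the sample values are assumptions made for illustration, and the sample cost is a placeholder rather than real data.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    country: str                  # ISO 3166-1 alpha-2, e.g. "KE"
    region: Optional[str] = None  # state/province if available
    city: Optional[str] = None    # city if available


@dataclass(frozen=True)
class CostDataPoint:
    category: str         # "labor", "material", "service", "equipment"
    item: str             # e.g. "carpenter"
    location: Location
    value: float
    currency: str         # ISO 4217, e.g. "KES"
    unit: str             # e.g. "per_hour"
    source: str           # e.g. "ilo", "interpolated", "llm_estimate"
    confidence: float     # 0.0 - 1.0
    date: datetime.date   # when the value was collected/computed
    freshness: str        # "live", "monthly", "annual", "stale"

    def __post_init__(self):
        # Reject values that would silently corrupt downstream checks.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if self.value < 0:
            raise ValueError("value must be non-negative")


# Placeholder record (NOT real data).
point = CostDataPoint(
    category="labor",
    item="carpenter",
    location=Location("KE", city="Nairobi"),
    value=1.0,
    currency="KES",
    unit="per_hour",
    source="llm_estimate",
    confidence=0.2,
    date=datetime.date(2024, 1, 1),
    freshness="annual",
)
```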

Interpolation Strategy

Step 1: Exact Match

Look up exact (item, country, city) combination. If found, return with high confidence.

Step 2: Geographic Fallback

Try (item, country, region) → medium-high confidence
Try (item, country) national average → medium confidence
Try (item, economic_region) using similar countries → medium-low confidence
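
Steps 1 and 2 amount to trying progressively less specific keys until one matches. The sketch below assumes a hypothetical in-memory store keyed by `(item, country, region, city)`; the store layout, the level names, and the numeric confidence values are all assumptions, and the stored cost is a placeholder. The economic-region fallback is left to the caller.

```python
# Hypothetical store; None means "not specified at this level".
# The value is a placeholder, NOT real data.
STORE = {
    ("carpenter", "KE", None, None): 100.0,
}


def lookup(item, country, region=None, city=None, store=STORE):
    """Return (value, level, confidence) for the most specific match, or None."""
    candidates = []
    if city is not None:
        # Step 1: exact match on the city.
        candidates.append(((item, country, region, city), "exact", 0.9))
    if region is not None:
        # Step 2: regional fallback.
        candidates.append(((item, country, region, None), "region", 0.75))
    # Step 2: national average.
    candidates.append(((item, country, None, None), "country", 0.6))

    for key, level, confidence in candidates:
        if key in store:
            return store[key], level, confidence
    return None  # caller continues with similar countries, PPP, or the LLM


result = lookup("carpenter", "KE", city="Nairobi")
```

Here the Nairobi query finds no city-level entry and falls back to the national figure, so the result carries the lower "country" confidence.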

Step 3: PPP Adjustment

Find the item cost in a reference country (usually the US or the nearest country with data)
Apply PPP conversion factor
Return with low-medium confidence, flagged as "PPP-adjusted estimate"
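
The PPP step can be sketched as a two-stage conversion: reference currency to international dollars, then international dollars to the target currency. The sketch assumes PPP conversion factors expressed as local currency units per international dollar (1.0 for the US); the function name and the numbers are illustrative placeholders. PPP factors describe price levels for a broad basket of goods, so applying them to a single item such as a wage gives only a coarse estimate.

```python
def ppp_adjust(reference_cost, reference_ppp, target_ppp):
    """Convert a cost from the reference country's currency to the target's.

    reference_ppp and target_ppp are PPP conversion factors in local
    currency units per international dollar (1.0 for the US).
    The result is expressed in the target country's local currency.
    """
    if reference_ppp <= 0 or target_ppp <= 0:
        raise ValueError("PPP factors must be positive")
    international_dollars = reference_cost / reference_ppp
    return international_dollars * target_ppp


# Placeholder numbers (NOT real data): a 30.00 USD/hour reference cost and
# a target PPP factor of 40 local currency units per international dollar.
estimate_local = ppp_adjust(30.0, reference_ppp=1.0, target_ppp=40.0)
```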

Step 4: LLM Estimation

Ask the LLM with structured prompting: "Based on economic conditions in [country], estimate the cost of [item]"
Cross-reference against Tier 1/2 bounds if available
Return with low confidence, flagged as "LLM estimate"
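
One possible reading of "cross-reference against bounds" is to clamp an out-of-range LLM estimate to the nearest bound and say so in the flag. Clamping is an assumption here (rejecting the estimate or asking the LLM again would be equally valid), and the function name and flag strings are hypothetical.

```python
def reconcile_llm_estimate(llm_value, lower=None, upper=None):
    """Cross-check an LLM estimate against optional Tier 1/2 bounds.

    Returns (value, flag). A value outside the bounds is clamped to the
    nearest bound and flagged so a reviewer can see it was adjusted.
    """
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    if lower is not None and llm_value < lower:
        return lower, "LLM estimate (clamped to lower bound)"
    if upper is not None and llm_value > upper:
        return upper, "LLM estimate (clamped to upper bound)"
    return llm_value, "LLM estimate"


# Placeholder numbers: the estimate of 50.0 exceeds the known upper bound.
value, flag = reconcile_llm_estimate(50.0, lower=5.0, upper=20.0)
```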

Implementation Approach

Phase 1: Core Framework + Static Data

Data model and query API
Bundled static dataset covering major economies (US, EU, China, India, Brazil, Nigeria, Kenya, etc.)
PPP adjustment using World Bank data (updated annually)
Basic geographic interpolation
Tests with realistic queries

Phase 2: Dynamic Data Fetching

API integrations (ILO, World Bank, Numbeo)
Caching layer with TTL per data source
Data freshness tracking
Incremental updates
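
A per-source TTL check could look like the sketch below. The TTL values, the default, and the function name are hypothetical; real values would depend on each publisher's release schedule.

```python
from datetime import date, timedelta

# Hypothetical TTLs per source (assumptions, not publisher guarantees).
TTL_BY_SOURCE = {
    "worldbank": timedelta(days=365),
    "ilo": timedelta(days=365),
    "numbeo": timedelta(days=30),
}
DEFAULT_TTL = timedelta(days=90)


def is_stale(collected_on, source, today):
    """Return True when the data point is older than its source's TTL."""
    ttl = TTL_BY_SOURCE.get(source, DEFAULT_TTL)
    return today - collected_on > ttl


today = date(2024, 3, 1)
numbeo_stale = is_stale(date(2024, 1, 1), "numbeo", today)  # 60 days old
ilo_stale = is_stale(date(2024, 1, 1), "ilo", today)        # 60 days old
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check deterministic in tests.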

Phase 3: LLM-Assisted Gap Filling

Structured prompts for cost estimation
Confidence scoring
Validation against known data points
Human review workflow for flagged estimates

Phase 4: Community Data

Allow users to contribute local cost data
Verification/moderation pipeline
Regional experts can validate estimates

Integration with PlanExe Pipeline

Plan Generation → Extract Cost Assumptions → Query Cost Service → Compare/Validate → Flag Discrepancies

The cost service would be called by:
1. FermiSanityCheck: "Is this cost assumption reasonable for this location?"
2. DomainNormalizer: "Convert this cost to standard units and currency"
3. RiskAssessment: "How volatile are these cost inputs over the plan's time horizon?"
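
The "Compare/Validate → Flag Discrepancies" step could be as simple as a ratio test between the plan's assumed cost and the reference cost returned by the service. The sketch below is hypothetical: the function name, the default threshold of 3×, and the sample numbers are assumptions, and it does not reflect the actual FermiSanityCheck interface.

```python
def flag_discrepancy(assumed, reference, max_ratio=3.0):
    """Compare an assumed cost with a reference cost.

    Both values must be in the same currency and unit. Returns True when
    they differ by more than max_ratio in either direction.
    """
    if assumed <= 0 or reference <= 0:
        raise ValueError("costs must be positive")
    ratio = max(assumed / reference, reference / assumed)
    return ratio > max_ratio


# Placeholder numbers: the plan assumes 10.0 where the reference says 100.0.
needs_review = flag_discrepancy(10.0, 100.0)
within_range = flag_discrepancy(90.0, 100.0)
```

Using a symmetric ratio means an assumption that is ten times too low is flagged the same way as one that is ten times too high.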

Open Questions

How much static data should we bundle vs fetch dynamically?
Should the service be synchronous (query per assumption) or batch (validate all assumptions at once)?
What's the minimum viable dataset for Phase 1?
How do we handle informal economies where official statistics don't reflect reality?
Should material costs include supply chain factors (shipping to location)?
How do we handle time-varying costs (seasonal labor, commodity cycles)?

Risks

Data quality: Crowdsourced data (Numbeo) may be unreliable
Coverage gaps: Many developing countries lack detailed cost data
Staleness: Economic conditions change rapidly (inflation, crises)
Complexity creep: This could become an entire product on its own
Scope: Need to define "good enough" for Phase 1