Routing & Hokusai
Wavemill gets better over time by learning which models perform best on which kinds of work. That learning loop is what turns wavemill mill from simple automation into a self-improving software factory.
Routing
For each task, Wavemill can choose different models and execution depths for planning, coding, and review.
Routing considers:
- task type and risk signals
- historical eval performance on similar work
- expected success rate
- expected cost
The goal is not to pick one best model globally. The goal is to pick the best workflow for this task.
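As a rough illustration of trading expected success against expected cost, the sketch below scores a few candidate workflows and picks one per task. The `Candidate` type, the workflow names, the scoring formula, and the `cost_weight` and `min_success` parameters are all assumptions made for this example; they are not Wavemill's actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate workflow; field names are illustrative."""
    name: str
    expected_success: float  # estimated probability of success, 0..1
    expected_cost: float     # estimated cost in dollars


def pick_workflow(candidates, cost_weight=0.1, min_success=0.0):
    """Return the candidate with the best success-minus-weighted-cost score."""
    eligible = [c for c in candidates if c.expected_success >= min_success]
    if not eligible:
        raise ValueError("no candidate meets the minimum success rate")
    return max(
        eligible,
        key=lambda c: c.expected_success - cost_weight * c.expected_cost,
    )


candidates = [
    Candidate("deep-review", expected_success=0.92, expected_cost=4.00),
    Candidate("standard", expected_success=0.85, expected_cost=1.00),
    Candidate("fast", expected_success=0.60, expected_cost=0.20),
]

# A low-risk task tolerates a cheaper workflow.
low_risk = pick_workflow(candidates)
# A high-risk task ignores cost and demands a high success floor.
high_risk = pick_workflow(candidates, cost_weight=0.0, min_success=0.9)
```

With these made-up numbers the low-risk task lands on `standard` and the high-risk task on `deep-review`, which is the point of per-task routing: the same candidate set yields different choices.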
Stable Routing Metadata
Eval records now preserve router attribution as structured fields on routingDecision:
- decisionPolicyVersion: the policy surface that actually made the decision
- routeMode: the emitted route strategy, such as heuristic, stage-aware, hokusai, or policy
- routeArtifactSchemaVersion: the route artifact shape version
- policyResolverVersion: the policy-resolution helper version
- operatingModeDependency: quota operating mode, separate from policy source
Current stable decisionPolicyVersion identifiers are:
- baseline
- heuristic
- heuristic-fallback
- stage-aware
- hokusai
- policy
- expanded-route
operatingModeDependency is orthogonal to the policy source. For example, a route may be emitted by the stage-aware router while also recording operatingModeDependency: "survival".
These fields are additive. Older eval records may omit them and remain valid.
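Because the fields are additive, a reader of eval records should tolerate their absence. The sketch below assumes an eval record has already been loaded as a Python dict; the `read_attribution` helper is hypothetical, and only the field names come from the list above.

```python
ATTRIBUTION_FIELDS = (
    "decisionPolicyVersion",
    "routeMode",
    "routeArtifactSchemaVersion",
    "policyResolverVersion",
    "operatingModeDependency",
)


def read_attribution(eval_record):
    """Return the attribution fields, using None for any that are absent."""
    decision = eval_record.get("routingDecision") or {}
    return {field: decision.get(field) for field in ATTRIBUTION_FIELDS}


# A newer record: stage-aware policy emitted under the "survival" quota mode.
new_record = {
    "routingDecision": {
        "decisionPolicyVersion": "stage-aware",
        "routeMode": "stage-aware",
        "operatingModeDependency": "survival",
    }
}
# An older record that predates the attribution fields.
old_record = {"routingDecision": {}}

new_fields = read_attribution(new_record)
old_fields = read_attribution(old_record)
```

Returning `None` rather than raising keeps older records valid for analysis without inventing values for fields they never recorded.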
Route Prediction Contract
Eval records can also carry two small optional router-analysis blocks:
- routePrediction: the router's falsifiable expectation for success, cost, confidence, risk, and a compact rationale/features summary
- routeCalibration: the comparison between that prediction and actual workflow outcomes such as workflowCost, outcomes.success, duration, and intervention count
These fields are additive and intentionally small. They are meant for calibration and feedback loops, not for full autonomous change manifests.
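One way to picture the calibration step is as a small comparison between what was predicted and what happened. In the sketch below, `workflowCost` and `outcomes.success` come from the description above, while `expectedSuccess`, `expectedCost`, `successBrier`, and `costError` are hypothetical key names chosen for illustration.

```python
def calibrate(prediction, outcome):
    """Compare a route prediction with the actual workflow outcome.

    `expectedSuccess` and `expectedCost` are hypothetical key names.
    """
    actual_success = 1.0 if outcome["outcomes"]["success"] else 0.0
    return {
        # Squared error of the predicted success probability (a Brier term).
        "successBrier": (prediction["expectedSuccess"] - actual_success) ** 2,
        # Positive when the run cost more than predicted.
        "costError": outcome["workflowCost"] - prediction["expectedCost"],
    }


result = calibrate(
    {"expectedSuccess": 0.75, "expectedCost": 2.0},
    {"outcomes": {"success": True}, "workflowCost": 2.5},
)
```

Averaging such terms over many records would indicate whether the router is systematically over- or under-confident, which is the feedback loop these blocks are meant to support.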
CLI Transparency
When routing deviates from the normal path, wavemill mill prints a single concise line explaining why. Examples:
11:31:02 [router] constrained mode: claude-opus-4-8 quota is degrading; reserving it for high-complexity steps
11:31:02 [router] policy adjustment: coder claude-opus-4-8 -> claude-sonnet-4-6 (quota=degrading)
11:31:44 [coder] claude-opus-4-8 unavailable (quota); falling back to claude-sonnet-4-6
gpt-5.3-codex is intentionally excluded from active routing and fallback ladders because Codex with a ChatGPT account rejects that model with HTTP 400.
These lines appear only when quota state or fallback behavior changes the normal route. Healthy normal-mode runs stay silent.
Where The Data Comes From
By default, routing improves from your own repositories:
- Wavemill executes work.
- eval scores the outcome.
- The eval record is stored locally.
- Future tasks use that history for routing.
This means Wavemill can become more effective and more cost-efficient over time without requiring any shared dataset.
Challenge Mode
Challenge mode periodically runs the same task with two different model configurations. That produces direct comparison data instead of relying only on independent scores.
Challenge data helps answer questions like:
- which model handles refactors better
- which model is cheaper for low-risk tasks
- which model needs fewer review iterations on UI work
That comparison data makes routing more reliable over time.
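Head-to-head results can be summarized as per-task-type win rates. The sketch below assumes each challenge is reduced to a `(task_type, winner, loser)` tuple; that shape, the model labels, and the `win_rates` helper are hypothetical and only illustrate the idea of direct comparison data.

```python
from collections import defaultdict


def win_rates(challenges):
    """Compute per-task-type win rates from head-to-head challenge results.

    Each challenge is a (task_type, winner, loser) tuple (hypothetical shape).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for task_type, winner, loser in challenges:
        wins[(task_type, winner)] += 1
        totals[(task_type, winner)] += 1
        totals[(task_type, loser)] += 1
    return {key: wins[key] / total for key, total in totals.items()}


challenges = [
    ("refactor", "model-a", "model-b"),
    ("refactor", "model-a", "model-b"),
    ("refactor", "model-b", "model-a"),
    ("ui", "model-b", "model-a"),
]
rates = win_rates(challenges)
```

Here `model-a` wins two of three refactor challenges while `model-b` wins the only UI challenge, a distinction that independent scores on different tasks could easily blur.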
Hokusai
Hokusai is the collective-intelligence layer for routing.
If you opt in, Wavemill can supplement your local eval history with shared signals gathered across many teams and tasks. This helps with cold starts and can improve routing quality before you have a large local dataset.
Use:
wavemill hokusai status
wavemill hokusai enable
wavemill hokusai disable
Hokusai is optional. The default behavior is:
- local learning from your own data
- collective learning only when explicitly enabled
Live Prediction Contract
Live Hokusai routing uses the public Model 30 prediction endpoint:
POST https://api.hokus.ai/api/v1/models/30/predict
Wavemill sends a nested inputs payload with:
- inputs.task.description
- inputs.task.task_type
- optional inputs.routing, inputs.context, inputs.workflow, and inputs.metadata
Wavemill expects predictions.recommended_strategy in the response and converts it into the internal WorkflowRouteDecision. If the request times out, auth fails, the API returns 4xx/5xx, or the response shape is invalid, Wavemill classifies the failure and falls back to local routing.
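The request and fallback flow described above might look roughly like the following. The endpoint URL, the `inputs` layout, and `predictions.recommended_strategy` come from this page; the `post` transport callable, the failure labels, the status-code mapping, and the `"bugfix"` task type are assumptions for illustration, and the real client may classify failures differently.

```python
PREDICT_URL = "https://api.hokus.ai/api/v1/models/30/predict"


def build_payload(description, task_type, **optional):
    """Build the nested `inputs` payload; optional blocks pass through."""
    inputs = {"task": {"description": description, "task_type": task_type}}
    for key in ("routing", "context", "workflow", "metadata"):
        if key in optional:
            inputs[key] = optional[key]
    return {"inputs": inputs}


def route(payload, post, local_route):
    """Return (strategy, failure); failure is None when Hokusai answered.

    `post(url, payload)` is a hypothetical transport returning (status, body).
    """
    try:
        status, body = post(PREDICT_URL, payload)
    except TimeoutError:
        return local_route(payload), "timeout"
    if status in (401, 403):
        return local_route(payload), "auth"
    if status >= 400:
        return local_route(payload), "http_error"
    try:
        strategy = body["predictions"]["recommended_strategy"]
    except (KeyError, TypeError):
        return local_route(payload), "invalid_shape"
    return strategy, None


def fake_post(url, payload):
    """Stand-in transport that simulates a service outage."""
    return 503, None


payload = build_payload("Fix flaky test", "bugfix")
strategy, failure = route(payload, fake_post, lambda p: "local-default")
```

Injecting the transport keeps the sketch runnable without network access and makes each failure class easy to exercise in isolation.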
Contribution Contract
Contribution uploads are not the same thing as live Model 30 routing input.
The live /predict API gets task and routing constraints. Contribution upload gets observed workflow outcomes after the run finishes. Wavemill only queues privacy-safe contribution rows after redaction and validation.
Current supported row shapes are:
- public Submit Data rows with required success_under_budget
- stricter benchmark rows using technical_task_router_row/v1
Optional compact inputs, cost, timing, harness, and benchmark metadata may be included when they pass the redaction allow-list. Raw eval payloads, task bodies, prompts, repository names before redaction, and secrets do not leave the machine.
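An allow-list filter is one simple way to picture the redaction step. In the sketch below only `success_under_budget` is taken from this page; the other allow-listed keys, the sample row, and the `redact_row` helper are hypothetical, and the real allow-list is defined by Wavemill rather than by this example.

```python
# Hypothetical allow-list; the real one is defined by Wavemill.
ALLOWED_KEYS = {"success_under_budget", "cost", "timing", "harness"}


def redact_row(raw):
    """Keep only allow-listed keys and require `success_under_budget`."""
    if "success_under_budget" not in raw:
        raise ValueError("row is missing required success_under_budget")
    return {key: value for key, value in raw.items() if key in ALLOWED_KEYS}


row = redact_row(
    {
        "success_under_budget": True,
        "cost": 1.25,
        "prompt": "full prompt text",     # dropped: not on the allow-list
        "repository": "acme/internal",    # dropped: not on the allow-list
    }
)
```

Filtering by an explicit allow-list means a newly added field stays local by default until someone deliberately permits it.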
Contribution Queue
Outcome and benchmark contribution uploads are separate from live Model 30 routing. Live prediction calls stay synchronous and continue to fall back to local routing when Hokusai is unavailable; Wavemill does not enqueue stale route requests.
When hokusai.contributions.enabled is true and user consent is valid, Wavemill stores redacted contribution rows under .wavemill/hokusai/ and later drains them to an explicitly configured contribution endpoint. There is no public default upload endpoint in repo config. If Hokusai has not published a stable contribution API for your environment, leave the endpoint unset and export rows for manual handling instead.
If no explicit contribution endpoint is configured, drain can export pending rows for manual submission instead of pretending upload succeeded. Transient failures such as timeouts, 429, and 5xx responses are retried with persisted backoff; permanent failures such as auth, schema, or malformed-row errors move to dead-letter with redacted operator-facing details only.
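The transient versus permanent split can be sketched as a small classifier plus a capped exponential backoff. The function names, return labels, base delay, and cap below are assumptions; persisting the backoff state between runs, which the queue does, is left out for brevity.

```python
def classify(status=None, timed_out=False):
    """Classify a drain attempt as "retry", "accepted", or "dead-letter"."""
    if timed_out or status == 429 or (status is not None and 500 <= status <= 599):
        return "retry"        # transient: timeout, rate limit, server error
    if status is not None and 200 <= status <= 299:
        return "accepted"
    return "dead-letter"      # permanent: auth, schema, malformed row


def next_delay(attempt, base=30.0, cap=3600.0):
    """Exponential backoff in seconds for a 0-based attempt count, capped."""
    return min(cap, base * 2 ** attempt)


outcomes = [classify(status=s) for s in (429, 503, 401, 422)]
delays = [next_delay(n) for n in range(4)]
```

In this sketch a 429 or 503 is retried with delays of 30, 60, 120, and 240 seconds, while a 401 or 422 goes straight to dead-letter because retrying would not change the result.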
Contribution lifecycle history is stored in an append-only ledger at .wavemill/hokusai/ledger.jsonl. Accepted and rejected terminal events track idempotency key, Model 30, row count, timestamps, job/submission identifiers when present, and reward state (pending, none, awarded, unknown). Missing rewards are never inferred as zero.
wavemill hokusai status includes queue and ledger summary fields:
- pending queue count
- accepted submission count
- accepted row count
- rejected submission count
- last terminal submission
- known awarded tokens plus pending/none/unknown reward counts
Ledger summary deduplicates by idempotency key and returned job/submission IDs to avoid double-counting duplicate accepts or retries. Awarded totals are local known awards only, not a live Hokusai account balance.
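A deduplicating summary over the ledger could be sketched as follows. The event keys (`status`, `idempotencyKey`, `submissionId`, `rowCount`, `rewardState`, `awardedTokens`) and the summary layout are hypothetical; only the reward states and the deduplication rule come from the description above.

```python
import json


def summarize(ledger_lines):
    """Summarize terminal ledger events, counting each submission once."""
    seen = set()
    summary = {"accepted": 0, "acceptedRows": 0, "rejected": 0, "awardedTokens": 0}
    rewards = {"pending": 0, "none": 0, "awarded": 0, "unknown": 0}
    for line in ledger_lines:
        event = json.loads(line)
        ids = {("key", event["idempotencyKey"])}
        if event.get("submissionId") is not None:
            ids.add(("submission", event["submissionId"]))
        if ids & seen:
            continue  # duplicate accept or retry of a counted submission
        seen |= ids
        if event["status"] == "accepted":
            summary["accepted"] += 1
            summary["acceptedRows"] += event.get("rowCount", 0)
            state = event.get("rewardState", "unknown")
            # A missing or unrecognized reward state is unknown, never zero.
            rewards[state if state in rewards else "unknown"] += 1
            if state == "awarded":
                summary["awardedTokens"] += event["awardedTokens"]
        elif event["status"] == "rejected":
            summary["rejected"] += 1
    summary["rewards"] = rewards
    return summary


first = {"status": "accepted", "idempotencyKey": "k1", "submissionId": "s1",
         "rowCount": 3, "rewardState": "awarded", "awardedTokens": 5}
lines = [
    json.dumps(first),
    json.dumps(first),  # retried accept of the same submission
    json.dumps({"status": "accepted", "idempotencyKey": "k2", "rowCount": 2}),
    json.dumps({"status": "rejected", "idempotencyKey": "k3"}),
]
summary = summarize(lines)
```

The repeated accept is counted once, and the second submission's missing reward is tallied as unknown instead of contributing zero tokens.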
Related Commands
- wavemill mill runs routing as part of the main factory loop
- wavemill route shows the recommended planner, coder, and reviewer workflow for a task
- wavemill eval inspects the outcome data that routing learns from
See Also
- Mill Mode — the default workflow
- CLI Reference — all commands and command groups
- Eval Mode — how outcomes are scored
- Adding Models — maintainer checklist for adding model support