Your agents should know
what to do next.

Munin ingests every past Claude, Codex, and IDE session on your machine, learns what actually works for you, and produces grounded next-step suggestions tied to your own strategy.

~/your-project
$ munin resume --format prompt
<startup_memory_brief scope="user" generated_at="2026-04-14T09:02:11Z">

# ── what I know ────────────────────────────────────────────────
  role:              founder, non-technical, directs work
  primary_project:   sitesorted  (Next.js 16 · React 19 · Tailwind 4)
  active_projects:   Orchestratorv2 · Munin · Peakflow · textbee-outreach
  stack_locks:       EAG builder pipeline · ReCraft · Firecrawl · Resend

# ── how you work ───────────────────────────────────────────────
  voice:             direct, no hype, NZ English, no em-dashes as decoration
  autonomy:          high — ask only for destructive or cross-team actions
  verification:      always run tests + lint before claiming done
  file_rule:         focused edits, never git add -A, stage by path
  preferred_tools:   munin wrapper, Glob over find, typescript-lsp active

# ── what is active ─────────────────────────────────────────────
  - project: BMA eag-native build
    phase:        Phase 3 complete, Phase 4 in progress
    gates:        0–4f passed, 4g blocked on pager-manifest regeneration
    last_touch:   2026-04-13 21:44 NZDT
  - project: Munin website
    phase:        positioning locked, basic HTML shipping
    next_stage:   siterecord-eag rebuild

# ── open loops ─────────────────────────────────────────────────
  - BMA Phase 4–6 outstanding              severity: medium
  - Vercel alias drift on preview deploys  severity: low
  - replay-eval dev-public proof > 6 days  severity: high

# ── next steps (ranked, evidence-backed) ───────────────────────
  1. Re-run replay-eval dev-public before Phase 5 cutover
     why_now: proof gate stale · unblocks promotion
     confidence: high · interrupt: soft
  2. Regenerate BMA pager manifest, then re-run gate 4f
     why_now: 3 sessions deferred · blocks Phase 4g
     confidence: high · interrupt: soft
  3. Normalise --base-dir on validate-gate.js before next EAG
     why_now: repeated correction × 29 · time lost avg 48s
     confidence: medium · interrupt: deferred

# ── watchouts ──────────────────────────────────────────────────
  - Windows path normalisation on validate-gate.js
  - Firecrawl flag drift: --formats vs --format
  - Vercel token precedence: env vs --token vs cleared state
  - Sparse checkouts: .omx2 excluded from benchmark v3 worktrees

# ── learned from your corrections ──────────────────────────────
  git add -A          → stage by path          × 47  promoted
  vercel domains add  → vercel alias set       × 18  promoted
  qmd status          → qmd --help first       × 11  assertion
  node --test tests/  → narrow glob target     × 9   candidate

# ── friction signal (last 30 days) ─────────────────────────────
  redirects:               124  avg_recovery: 3.2 cmds / 48s
  repeated_corrections:     12  classes, 2 trending up
  compaction_recoveries:    7   of 7 successful
  cross_tool_reconciles:    Claude↔Codex: 2,418 events merged

# ── behaviour changes recommended ──────────────────────────────
  target: claude_code
    prefer Glob over find on MSYS2        evidence: find p95 > 8s ×31
    echo URLs before EAG builds           evidence: 3 misfired runs
  target: codex
    import strategy kernel on session start
                                          evidence: drift ×14 vs Claude

# ── proof gate ─────────────────────────────────────────────────
  strict_gate:      enabled
  required_split:   dev-public × proposed-kernel
  latest_result:    verified · 2026-04-08T11:42:03Z  (stale)
  cutover_state:    memory brief active · fallback armed

</startup_memory_brief>

One local Rust binary. Ingests Claude Code, Codex, OMX, and raw PowerShell sessions. Runs on Windows, macOS, and Linux.

The problem

Your agent keeps losing the plot.

01

It keeps making the same mistake you corrected three sessions ago. Your corrections evaporate with the transcript.

02

Every session starts with twenty minutes of re-explaining the project, the goal, and the constraints you already locked in.

03

When you actually need the next move, you're still the one driving. The agent waits. It never proposes the highest-leverage step.

How Munin works

Four layers. One operating loop.

Munin is not a notes app for your agent. It is a local loop that ingests activity, projects strategy, scores actions, and organises the workspace around what actually works.

01 — Ingestion

Full session corpus, no hand-written memory.

Automatic onboarding reads every prior Claude Code, Codex, OMX, and raw PowerShell session across all projects, all time. Every command, every correction, every redirect, every verification — journaled, deduped, reconciled into one cross-tool history.

You don't write the memory. Munin builds it from what already happened.

munin memory-os overview
# sessions ingested across 4 tools
imported_sessions:         9,214
imported_shell_execs:      184,602
imported_sources:
  - claude_code                5,811
  - codex                      2,476
  - omx                          712
  - powershell                   215

top_correction_patterns:
  git add -A           → specific files         × 47
  validate-gate.js     → normalised --base-dir  × 29
  vercel domains add   → vercel alias set       × 18

active_work:
  - BMA eag-native build (Phase 4 open)
  - Munin website (positioning locked)

02 — Action

Proactive next-step nudges tied to your strategy.

Import your strategy document — goals, KPIs, initiatives, constraints, assumptions. Munin compiles it into a deterministic kernel and joins it against your evidence corpus.

munin strategy recommend produces bounded nudges. Each nudge carries the task, why now, the evidence, evidence freshness, confidence, and expected effect. The agent stops asking and starts proposing.

strategy_nudge.json
{
  "task": "Run verify-memory-os-phase4.ps1 before touching Phase 5",
  "supports": ["initiative:memory-os-cutover"],
  "why_now": "Last verified run is 6 days old and proof gate is stale",
  "evidence": [
    "replay-eval dev-public last verified 2026-04-08",
    "3 sessions deferred verification since"
  ],
  "evidence_freshness": "stale",
  "confidence":         "high",
  "interrupt_level":    "soft",
  "expected_effect":    "unblocks Phase 5 promotion cutover"
}

03 — Learning

Promotes what works. Forgets what doesn't.

Action candidates earn their way to durable rules. Munin ranks them by precedent count, success count, and failure count across every session. Candidates promote to assertions, assertions promote to rules — observe-only by default, so you decide what gets enforced.

Every promoted claim carries freshness, stability, and confidence scores. Stale memory ages out. Re-confirmed memory strengthens. Memory stays relevant because it is always under pressure to prove itself.

  Candidate    git-add-specific-files       precedent × 47 · success 44 · fail 3
  Assertion    "Stage by path, never -A"    confidence: high · stability: rising
  Rule         Enforce on commit suggest    target: claude_code + codex

Observe-only by default. You promote what you want.

04 — Organisation

Messy workspace, messy agent. Munin keeps both tidy.

Per-project kernel captures live claims, open loops, checkpoints, current recommendation, active risks. Every session re-entry gets a current recommendation, a first question to answer, and a first verification to run.

The agent hits the right target because the target is explicit, current, and evidence-backed.

project_kernel / sitesorted
checkpoint: bma-eag-native.phase3
preset:     handoff
current_recommendation:
  Rebuild pager manifest, then run gate 4f regression.
first_question:
  Did Phase 3 verification touch template exports?
first_verification:
  munin powershell -File scripts/verify-gate-4f.ps1

open_loops:
  - BMA Phase 4–6 outstanding
  - Vercel alias drift on preview deploys
active_risks:
  - replay-eval proof older than 6 days

From memory to action

An observation is not the output. A move is.

Munin doesn't stop at "here's what I've seen". Every durable pattern in your corpus is compiled into a concrete artifact — a next step, a prevention hook, a reusable skill, a guidance rule, a per-agent directive — with the output ready to paste.

priority:      KPI · memory-os cutover (Q2 rock)
target_metric: replay-eval verified within 48h
current_state: verified 6 days ago (stale)
blocked_by:    nothing — work is ready
downstream:    Phase 5 cutover, 2 dependent rocks
{
  "kind": "next_step",
  "task": "Re-run replay-eval dev-public",
  "supports": [
    "kpi:replay-eval-freshness",
    "initiative:memory-os-cutover"
  ],
  "why_now": "highest-leverage unblock — target metric"
            " at yellow, Phase 5 cutover waiting",
  "expected_effect":
    "KPI back to green · unblocks 2 rocks",
  "confidence":    "high",
  "interrupt_level": "soft"
}

pattern:       git add -A → stage by path
frequency:     47 corrections in 6 months
promotion:     candidate → assertion → rule
stability:     high
# .claude/hooks/pre-tool-use.sh
# Prevents the correction you keep making.
if echo "$CMD" | grep -qE 'git add (-A|\.|-u)'; then
  echo "Munin: stage by path. 47 priors." >&2
  exit 1
fi

workflow:      eag-build pipeline
repetitions:   23 over 4 months
variance:      low · 6 steps always in order
stability:     rising
# .claude/skills/eag-build/SKILL.md
---
name: eag-build
description: Full EAG tradie-site pipeline
---
1. Confirm content + reference URLs
2. Credential smoke test
3. Capture → analyse → generate
4. Gate 0–4f verification
5. Pager manifest regeneration
6. Gate 5 QA

preference:    NZ English in user-facing text
confirming:    89 sessions
contradicting: 0 sessions
confidence:    high · stability: rising
# CLAUDE.md (proposed addition)

- NZ English in all user-facing text.
  Evidence: 89 confirming, 0 contradicting.
  Locked: 2026-02-18.
  Source: munin-os.db assertions

target_agent: claude_code
pattern:      uses find on MSYS2
p95_duration: 8.4s
sessions:     31 affected
redirect_rate: 62%
# Claude Code behaviour change

Change:
  Prefer Glob over find on MSYS2.

Rationale:
  find p95 > 8s across 31 sessions.
  62% of those sessions redirected.

Targets only: claude_code on Windows

Every action is inspectable. Every action is reversible. Nothing enforces silently. You promote what you want.

What makes Munin different

Most memory tools store notes. Munin produces moves.

Actions, not notes

Ranked next-step nudges tied to your own strategy and your own historical evidence, not stored text the agent may or may not read.

Self-learning under pressure

Every rule carries freshness, stability, and confidence. Stale memory ages out. Re-confirmed memory strengthens.

Cross-agent reconciliation

Claude Code, Codex, OMX, and raw PowerShell — one memory, one continuity layer, one reconciled history.

Friction as a first-class output

Munin tells you which corrections you keep making, which redirects waste the most time, and which behaviours to change. Per agent, with evidence.

Proof-gated continuity

Replay/eval harness with held-out splits including adversarial and post-cutoff. Cutover only stays live while proof passes. Silent regressions fall back automatically.

Local-only by design

Single Rust binary. No SaaS, no telemetry pipeline, no account. Salt is device-local with 0600 permissions.

The read surface

The agent and the human read the same memory.

Everything Munin knows is exposed through inspectable commands. Nothing is hidden in a private embedding store. You can audit, diff, or pipe any of it.

munin resume --format prompt
  Startup brief for a new session: what I know, how you work, what's active, next steps, watchouts.
munin pack --preset handoff
  Handoff brief for a fresh window or another agent, same memory shape.
munin memory-os overview
  Sessions ingested, top projects, correction patterns, action candidates, onboarding state.
munin memory-os profile
  Preferences, operating style, autonomy tendencies, recurring themes, friction triggers.
munin memory-os friction
  Repeated corrections, redirect stats, per-agent behaviour-change recommendations.
munin memory-os promotion
  Current proof-gate decision and the latest supporting replay/eval result.
munin strategy status
  Deterministic scorecard against your imported goals, KPIs, and initiatives.
munin strategy recommend
  Ranked next-step nudges with evidence, confidence, and expected effect.

One memory. Every agent.

Framed for the tool you're using.

For Claude Code
  • Continuity that survives compaction — the brief rebuilds from durable state, not the transcript.
  • Behaviour-change recommendations targeted at Claude Code specifically, with evidence from your sessions.
  • Same brief on resume and handoff — fresh window, same context.
For Codex
  • Cross-session memory that survives restarts and fresh Codex homes.
  • Proactive next-step nudges instead of waiting for instructions.
  • Reconciled history with your Claude Code sessions, so both agents share the plot.
For IDE agents
  • Import your strategy so Cursor, Windsurf, and Cline stop guessing the goal.
  • Friction reports surface patterns no single session can see.
  • Project kernel gives every agent the same open-loop list and current recommendation.

And yes, it saves you tokens too

The same instrumentation that powers the memory layer wraps noisy shell commands and compresses their output. Measured on real local sessions, not synthetic benchmarks.

Command       Typical reduction
vitest run    ~99.5%
cargo test    ~90%
next build    ~87%
tsc           ~83%
git diff      ~80%

Run munin gain to see your own numbers.

Install

Three commands. Local build only.

$ cargo install --path . --force
$ munin init
$ munin resume --format prompt

No registry. No signup. No account.

FAQ

The usual questions.

Is Munin a SaaS?

No. Munin is a single local Rust binary. Your memory never leaves your machine. There is no account, no signup, and no cloud component.

Does it send my code anywhere?

No. All ingestion, projection, proof-gating, and strategy compilation runs on-device. The telemetry salt is device-local with 0600 permissions and the session store stays under your home directory.

How is this different from Claude Code's built-in memory?

Claude's built-in memory is notes you or the agent write down. Munin is built automatically from your actual session corpus — every command, correction, redirect, and verification across every agent — and produces ranked actions rather than stored notes.

What stops the memory from going stale or wrong?

Every promoted claim carries freshness, stability, and confidence scores. Cutover runs a replay/eval harness against a held-out corpus on every release. If proof drops, Munin falls back automatically to the deterministic packet path — no silent regressions.

How does Munin actually suggest next steps?

You import your strategy document (goals, KPIs, initiatives, constraints, assumptions). Munin compiles it into a deterministic kernel and joins it against your session evidence. munin strategy recommend produces bounded nudges — task, why now, evidence, expected effect — with no hallucination surface.

Does it work with both Claude and Codex at the same time?

Yes. That is the point. One memory layer reconciles Claude Code, Codex, OMX, and raw PowerShell sessions into one continuity layer. Your Claude session knows what your Codex session did, and vice versa.

Give your agents a memory that acts.

Install Munin and open a session. It will be ready before your next prompt.

Install Munin