Essays and working notes on ingestion, proof-gated cutover, the replay/eval harness, and the benchmark that keeps it honest.
Why storing notes for your agent was always the wrong target. The case for ranked next-step nudges tied to strategy.
How workspace entropy shows up in agent output — and how per-project kernels cut the problem at the source.
Why memory cutover should be gated by replay/eval against a held-out corpus, not by vibes.
Every rule carries freshness, stability, and confidence. The self-learning loop that keeps memory relevant.
Same model, same window, same tasks. Probe recall after 3, 6, and 9 completed fixes. The metrics we actually record versus vanity metrics.
Separating durable-state failures from transcript-availability advantages. Why "won/lost" is the wrong verdict.
Baselines: oracle-upper-bound, retrieval-first-hybrid, tiny-authoritative-core, proposed-kernel. Splits: dev-public, test-private, adversarial-private, future-postcutoff.
How munin gain numbers are computed — and how we keep them honest against the raw history DB.
munin gain
Judge-first continuation, replay/eval hardening, observe-only trust, narrow durable kernel projections.
Default-on resume and handoff briefs, rollback via env flags, preserved checkpoint capture.
Latest-result-wins gating against replay evidence, automatic packet fallback, inspectable promotion status.
Allowed dependency graph: app → command_surfaces → memory_engine / reporting / hook_runtime / rewrite_engine / ingest.
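One way to read an allowed dependency graph like the one above is as a whitelist that can be checked mechanically. The sketch below is an illustration only, not Munin's actual tooling. It assumes the arrow means "may import from", that `command_surfaces` may import each of the five modules listed after it, and that those five modules import nothing from one another. The names `ALLOWED` and `find_violations` are hypothetical.

```python
# Hypothetical encoding of the allowed dependency graph (an assumption:
# "a -> b" is read as "a may import from b"; leaf modules import nothing).
ALLOWED = {
    "app": {"command_surfaces"},
    "command_surfaces": {
        "memory_engine",
        "reporting",
        "hook_runtime",
        "rewrite_engine",
        "ingest",
    },
    "memory_engine": set(),
    "reporting": set(),
    "hook_runtime": set(),
    "rewrite_engine": set(),
    "ingest": set(),
}


def find_violations(observed_edges):
    """Return sorted (importer, imported) pairs that ALLOWED does not permit.

    observed_edges is an iterable of (importer, imported) module-name pairs,
    for example as collected from a scan of import statements.
    """
    violations = set()
    for importer, imported in observed_edges:
        if importer == imported:
            continue  # a module referring to itself is not a cross-module edge
        if imported not in ALLOWED.get(importer, set()):
            violations.add((importer, imported))
    return sorted(violations)


if __name__ == "__main__":
    edges = [
        ("app", "command_surfaces"),        # allowed
        ("command_surfaces", "reporting"),  # allowed
        ("app", "ingest"),                  # skips a layer under this reading
        ("ingest", "app"),                  # points back up the graph
    ]
    for importer, imported in find_violations(edges):
        print(f"disallowed: {importer} -> {imported}")
```

Under this reading, `app` reaching directly into `ingest` is flagged just like an upward import. If the real rule permits transitive access, the check would need to follow paths through `ALLOWED` rather than test direct edges.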
The fastest way to understand Munin is to open a session and watch it work.