ollama/x/mlxrunner/cache
Jesse Gross d00622060f mlxrunner: drive MTP speculation through cache snapshots
Speculation used a parallel hierarchy of wrapper cache types that shadowed
the live caches and reconciled against them on commit. Replace it with
snapshot/restore on the live caches themselves: a cache snapshots itself as
a write crosses each offset, and the runner commits a batched draft by
restoring to the accepted count. The wrappers and the comparison plumbing
around them are gone.
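
The snapshot-then-restore flow described above can be sketched roughly as follows. This is a toy model, not the code in `cache.go`: `liveCache`, `snapshot`, `Write`, `Restore`, and `commitDraft` are hypothetical names, and the cache here is a plain append-only slice of token IDs.

```go
package main

// snapshot records how much of the live cache was valid at one offset.
// Hypothetical type: the real snapshot layout is not shown in the commit.
type snapshot struct {
	offset int
}

// liveCache is a toy append-only cache of token IDs.
type liveCache struct {
	tokens []int
	snaps  map[int]snapshot
}

func newLiveCache() *liveCache {
	return &liveCache{snaps: make(map[int]snapshot)}
}

// Write appends a batch and records a snapshot at every requested
// offset that falls within the range covered by this write.
func (c *liveCache) Write(batch []int, snapshotAt []int) {
	start := len(c.tokens)
	c.tokens = append(c.tokens, batch...)
	end := len(c.tokens)
	for _, off := range snapshotAt {
		if off >= start && off <= end {
			c.snaps[off] = snapshot{offset: off}
		}
	}
}

// Restore rolls the live cache back to a snapshotted offset and
// discards all outstanding snapshots. It reports false if no
// snapshot exists for that offset.
func (c *liveCache) Restore(offset int) bool {
	s, ok := c.snaps[offset]
	if !ok {
		return false
	}
	c.tokens = c.tokens[:s.offset]
	c.snaps = make(map[int]snapshot)
	return true
}

// commitDraft writes a whole draft in one batch, then keeps only the
// first `accepted` tokens by restoring to base+accepted.
func commitDraft(c *liveCache, draft []int, accepted int) bool {
	base := len(c.tokens)
	offsets := make([]int, 0, len(draft)+1)
	for i := 0; i <= len(draft); i++ {
		offsets = append(offsets, base+i)
	}
	c.Write(draft, offsets)
	return c.Restore(base + accepted)
}
```

In this sketch the speculative write goes straight into the live cache, so there is no second hierarchy to reconcile: accepting fewer tokens than were drafted is a single `Restore` call.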

Snapshots are lazy. A KV or rotating capture indexes into the live buffer and
owns no memory until a destructive write forces a copy-out, so rejecting a
draft is free.
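
A minimal sketch of the lazy behavior, assuming a flat `float32` buffer in place of the real KV or rotating storage. `ringCache`, `lazySnapshot`, `Overwrite`, and `Restore` are illustrative names, not the package's API. The snapshot starts as an index range into the live buffer and only copies data out when a write is about to clobber that range.

```go
package main

// lazySnapshot begins as a view (start index and length) into the live
// buffer. It owns memory only after a destructive write forces a copy.
type lazySnapshot struct {
	start, n int
	owned    []float32 // nil until copy-out
}

// ringCache is a toy stand-in for a cache whose slots can be overwritten.
type ringCache struct {
	buf   []float32
	snaps []*lazySnapshot
}

// Snapshot captures buf[start:start+n] without copying anything.
func (c *ringCache) Snapshot(start, n int) *lazySnapshot {
	s := &lazySnapshot{start: start, n: n}
	c.snaps = append(c.snaps, s)
	return s
}

// Overwrite is a destructive write at index i. Any snapshot that still
// aliases that slot is copied out before the slot changes.
func (c *ringCache) Overwrite(i int, v float32) {
	for _, s := range c.snaps {
		if s.owned == nil && i >= s.start && i < s.start+s.n {
			s.owned = append([]float32(nil), c.buf[s.start:s.start+s.n]...)
		}
	}
	c.buf[i] = v
}

// Restore puts the snapshotted contents back into the live buffer.
// A snapshot that never copied out still matches the live data, so
// there is nothing to write back.
func (c *ringCache) Restore(s *lazySnapshot) {
	if s.owned != nil {
		copy(c.buf[s.start:s.start+s.n], s.owned)
	}
}
```

A snapshot that is dropped before any overlapping overwrite never allocates storage for its contents, which is the cheap path this sketch is meant to illustrate.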

Recurrent layers now validate in the same batched pass rather than falling
back to serial. A gated-delta layer reports its interior split offsets and
hands back the recurrent state at each one, which the cache records as a
snapshot.
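
To illustrate the idea of a recurrent layer handing back its state at interior split offsets, here is a sketch that swaps the gated-delta kernel for a scalar recurrence `h = decay*h + x`. The recurrence, `scanWithSplits`, and `recurrentCache` are all assumptions made for illustration; only the shape of the interaction (one batched pass, one state per split, recorded as snapshots) follows the description above.

```go
package main

// scanWithSplits runs the toy recurrence h = decay*h + x over xs in a
// single pass. It returns the final state and the state after each
// requested split position (positions are relative to this batch, 1-based
// in the sense that split k is the state after k inputs).
func scanWithSplits(h0, decay float64, xs []float64, splits []int) (float64, map[int]float64) {
	want := make(map[int]bool, len(splits))
	for _, s := range splits {
		want[s] = true
	}
	states := make(map[int]float64, len(splits))
	h := h0
	for i, x := range xs {
		h = decay*h + x
		if want[i+1] {
			states[i+1] = h
		}
	}
	return h, states
}

// recurrentCache keeps the running state plus per-offset snapshots.
type recurrentCache struct {
	offset int
	state  float64
	snaps  map[int]float64 // absolute offset -> state at that offset
}

// Write processes a batch and records each returned split state as a
// snapshot keyed by its absolute offset.
func (c *recurrentCache) Write(xs []float64, decay float64, splits []int) {
	final, states := scanWithSplits(c.state, decay, xs, splits)
	if c.snaps == nil {
		c.snaps = make(map[int]float64)
	}
	for rel, h := range states {
		c.snaps[c.offset+rel] = h
	}
	c.offset += len(xs)
	c.state = final
}

// Restore rewinds to a snapshotted offset; it reports false if that
// offset was never recorded.
func (c *recurrentCache) Restore(offset int) bool {
	h, ok := c.snaps[offset]
	if !ok {
		return false
	}
	c.offset, c.state = offset, h
	return true
}
```

Because the intermediate states come out of the same pass that computes the final state, a partially accepted draft does not need the layer to be re-run token by token.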
2026-06-09 00:39:19 -07:00
cache.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
cache_test.go mlxrunner: decouple models from attention cache storage layout 2026-04-27 20:04:46 -07:00
kvcache.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
lazy_test.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
recurrent.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
recurrent_test.go nn/recurrent: return per-boundary states from the gated-delta kernels 2026-06-09 00:39:19 -07:00
rotating.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
rotating_attention_test.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
rotating_multiturn_test.go mlxrunner: decouple models from attention cache storage layout 2026-04-27 20:04:46 -07:00
snapshot_capture_test.go mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00