ollama

mirror of https://github.com/ollama/ollama.git synced 2026-07-05 07:11:10 +00:00


Jesse Gross d00622060f	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00

Speculation used a parallel hierarchy of wrapper cache types that shadowed the live caches and reconciled against them on commit. Replace it with snapshot/restore on the live caches themselves: a cache snapshots itself as a write crosses each offset, and the runner commits a batched draft by restoring to the accepted count. The wrappers and the comparison plumbing around them are gone.

Snapshots are lazy. A KV or rotating capture indexes into the live buffer and owns no memory until a destructive write forces a copy-out, so rejecting a draft is free.

Recurrent layers now validate in the same batched pass rather than falling back to serial. A gated-delta layer reports its interior split offsets and hands back the recurrent state at each one, which the cache records as a snapshot.
cache.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
cache_test.go	mlxrunner: decouple models from attention cache storage layout	2026-04-27 20:04:46 -07:00
kvcache.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
lazy_test.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
recurrent.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
recurrent_test.go	nn/recurrent: return per-boundary states from the gated-delta kernels	2026-06-09 00:39:19 -07:00
rotating.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
rotating_attention_test.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
rotating_multiturn_test.go	mlxrunner: decouple models from attention cache storage layout	2026-04-27 20:04:46 -07:00
snapshot_capture_test.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00