mirror of
https://github.com/ollama/ollama.git
synced 2026-07-05 07:11:10 +00:00
Speculation used a parallel hierarchy of wrapper cache types that shadowed the live caches and reconciled against them on commit. Replace it with snapshot/restore on the live caches themselves: a cache snapshots itself as a write crosses each offset, and the runner commits a batched draft by restoring to the accepted count. The wrappers and the comparison plumbing around them are gone. Snapshots are lazy. A KV or rotating capture indexes into the live buffer and owns no memory until a destructive write forces a copy-out, so rejecting a draft is free. Recurrent layers now validate in the same batched pass rather than falling back to serial. A gated-delta layer reports its interior split offsets and hands back the recurrent state at each one, which the cache records as a snapshot. |
||
|---|---|---|
| .. | ||
| cache.go | ||
| cache_test.go | ||
| kvcache.go | ||
| lazy_test.go | ||
| recurrent.go | ||
| recurrent_test.go | ||
| rotating.go | ||
| rotating_attention_test.go | ||
| rotating_multiturn_test.go | ||
| snapshot_capture_test.go | ||