ollama

mirror of https://github.com/ollama/ollama.git synced 2026-07-05 07:11:10 +00:00

batch.go

Jesse Gross 28fbbb06d5 mlxrunner: support draft heads that maintain draft caches

Generalize the draft path so that a head that maintains a KV cache (EAGLE-style)
and Gemma's read-only single-position assistant both fit one drafter
interface with no per-model branches. Make the committed stream the
drafter's maintenance mechanism: every committed run is reported, and the
drafter pairs each draft slot with its look-ahead token and flushes completed
pairs to the draft caches. The draft KV thus stays prefix-cached alongside
the target in every session, drafting or not.

2026-06-22 15:25:45 -07:00
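
The mechanism in the commit message above can be illustrated with a minimal sketch. Everything here is hypothetical and is not taken from batch.go or the mlxrunner package: the Drafter interface, the cachingDrafter and readOnlyDrafter types, and the integer token and slot representation are assumptions chosen only to show the idea. A cache-maintaining head pairs each committed position with the token committed after it and flushes the completed pair, while a read-only head satisfies the same interface without storing anything.

```go
package main

import "fmt"

// Drafter is a hypothetical interface, not ollama's actual API.
// Every committed run of target tokens is reported through Commit.
type Drafter interface {
	Commit(run []int) // report tokens the target model has committed
	CachedLen() int   // number of entries flushed to the draft cache
}

// pair is one draft cache entry: a slot and its look-ahead token.
type pair struct {
	slot      int // position of the committed token that opened the slot
	lookAhead int // token committed immediately after that position
}

// cachingDrafter models a head that maintains its own cache (EAGLE-style).
type cachingDrafter struct {
	pending    int  // slot still waiting for its look-ahead token
	hasPending bool // false until the first token is committed
	next       int  // position of the next committed token
	cache      []pair
}

func (d *cachingDrafter) Commit(run []int) {
	for _, tok := range run {
		if d.hasPending {
			// The pending slot now has its look-ahead token: flush the pair.
			d.cache = append(d.cache, pair{slot: d.pending, lookAhead: tok})
		}
		// The token just committed opens a new slot.
		d.pending, d.hasPending = d.next, true
		d.next++
	}
}

func (d *cachingDrafter) CachedLen() int { return len(d.cache) }

// readOnlyDrafter models a head that keeps no cache of its own.
type readOnlyDrafter struct{}

func (readOnlyDrafter) Commit([]int)   {} // nothing to maintain
func (readOnlyDrafter) CachedLen() int { return 0 }

func main() {
	// Both kinds of head sit behind one interface, with no per-model branch.
	for _, d := range []Drafter{&cachingDrafter{}, readOnlyDrafter{}} {
		d.Commit([]int{10, 11, 12}) // first committed run
		d.Commit([]int{13})         // a later run completes the pending slot
		fmt.Println(d.CachedLen())
	}
}
```

In this sketch the caching head ends with three flushed pairs and one slot still pending, because the last committed token has no look-ahead token yet. The caller reports committed runs the same way for both heads, which is the property the commit message describes.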