ollama/x/mlxrunner/batch
Jesse Gross 28fbbb06d5 mlxrunner: support draft heads that maintain draft caches
Generalize the draft path so that both a head that maintains its own KV cache
(EAGLE-style) and Gemma's read-only, single-position assistant fit one drafter
interface with no per-model branches. Also make the committed stream the
drafter's maintenance mechanism: every committed run is reported to the
drafter, which pairs each draft slot with its look-ahead token and flushes
completed pairs to the draft caches. As a result, the draft KV stays
prefix-cached alongside the target in every session, whether or not it drafts.
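The pairing step can be pictured with a small Go sketch. It is not the mlxrunner code. It assumes, EAGLE-style, that the draft-cache entry for position i needs two things: the target's state at i (the draft slot) and the token committed at i+1 (its look-ahead token). Under that assumption, the last position of each committed run must wait for the next run. The `pairer` type, its `Commit` method and the `flushed` slice are all hypothetical; `flushed` stands in for writes to the draft cache.

```go
package main

import "fmt"

// pair is a hypothetical (draft slot, look-ahead token) entry that is
// complete and ready to be written to the draft KV cache.
type pair struct {
	Pos       int   // position of the target state (the draft slot)
	LookAhead int32 // token committed at Pos+1
}

// pairer is a hypothetical helper, not the mlxrunner API. It holds the most
// recently committed position until the token after it has been committed.
type pairer struct {
	pendingPos int    // position still waiting for its look-ahead token
	hasPending bool   // whether pendingPos is valid
	next       int    // position the next committed token will occupy
	flushed    []pair // stand-in for writes into the draft cache
}

// Commit reports one run of committed tokens, starting at position p.next.
// Each token completes the pair for the position before it; the run's final
// position stays pending until the next run arrives.
func (p *pairer) Commit(tokens []int32) {
	for _, tok := range tokens {
		if p.hasPending {
			p.flushed = append(p.flushed, pair{Pos: p.pendingPos, LookAhead: tok})
		}
		p.pendingPos = p.next
		p.hasPending = true
		p.next++
	}
}

func main() {
	p := &pairer{}
	p.Commit([]int32{10, 11, 12}) // e.g. prompt prefill
	p.Commit([]int32{13})         // e.g. one accepted token
	for _, pr := range p.flushed {
		fmt.Printf("slot %d <- token %d\n", pr.Pos, pr.LookAhead)
	}
	// prints:
	// slot 0 <- token 11
	// slot 1 <- token 12
	// slot 2 <- token 13
}
```

Every committed run goes through `Commit`, so in this sketch the draft cache advances in lockstep with the target whether or not a draft was proposed. A drafter with no cache of its own, such as the read-only assistant, would presumably treat these reports as a no-op.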
2026-06-22 15:25:45 -07:00
batch.go