Mirror of https://github.com/ollama/ollama.git
Synced 2026-07-05 07:11:10 +00:00
Generalize the draft path so that a head that maintains a KV cache (EAGLE-style) and Gemma's read-only single-position assistant both fit one drafter interface with no per-model branches. Make the committed stream the drafter's maintenance mechanism: every committed run is reported, and the drafter pairs each draft slot with its look-ahead token and flushes completed pairs to the draft caches. The draft KV thus stays prefix-cached alongside the target in every session, whether or not that session drafts.

| Name |
|---|
| batch.go |
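
The commit message above describes the pairing and flushing behaviour only in prose, so a minimal sketch may help. Everything in it is hypothetical and is not taken from the repository: the `Drafter` interface, its `Commit` and `CachedLen` methods, the `cachingDrafter` and `readOnlyDrafter` types, the use of `int32` token IDs, and the slice that stands in for a draft KV cache are all assumptions made for illustration.

```go
package main

import "fmt"

// Drafter is a hypothetical interface, not the repository's actual API.
type Drafter interface {
	// Commit reports one committed run of target tokens, in order.
	Commit(run []int32)
	// CachedLen returns how many completed pairs the draft cache holds.
	CachedLen() int
}

// pair couples a draft slot's token with the token that follows it.
type pair struct{ slot, lookAhead int32 }

// cachingDrafter models a head that keeps its own cache (EAGLE-style).
type cachingDrafter struct {
	pending    int32 // newest slot, still waiting for its look-ahead token
	hasPending bool
	cache      []pair // stand-in for the draft KV cache (assumption)
}

func (d *cachingDrafter) Commit(run []int32) {
	for _, tok := range run {
		if d.hasPending {
			// The pending slot now has its look-ahead token: flush the pair.
			d.cache = append(d.cache, pair{d.pending, tok})
		}
		d.pending, d.hasPending = tok, true
	}
}

func (d *cachingDrafter) CachedLen() int { return len(d.cache) }

// readOnlyDrafter models a single-position assistant with no cache of its own.
type readOnlyDrafter struct{ last int32 }

func (d *readOnlyDrafter) Commit(run []int32) {
	if len(run) > 0 {
		d.last = run[len(run)-1] // only the latest position matters
	}
}

func (d *readOnlyDrafter) CachedLen() int { return 0 }

func main() {
	runs := [][]int32{{1, 2, 3}, {4}, {5, 6}}
	// The caller treats both drafters identically: no per-model branch.
	for _, d := range []Drafter{&cachingDrafter{}, &readOnlyDrafter{}} {
		for _, run := range runs {
			d.Commit(run)
		}
		fmt.Printf("%T cached %d pairs\n", d, d.CachedLen())
	}
}
```

In this sketch the newest committed token always stays pending until the next run arrives, so six committed tokens yield five flushed pairs. Because `Commit` is called for every run regardless of whether drafting happens, the stand-in cache tracks the committed prefix throughout.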