Mirror of https://github.com/ollama/ollama.git (synced 2026-05-13 06:21:28 +00:00)
This change adds support for MTP (multi-token prediction) speculative decoding for the gemma4 model family. It includes:

* support for importing safetensors-based gemma4 draft models with `ollama create`
* a new `DRAFT` command in the Modelfile for specifying draft models
* a `--quantize-draft` flag for the `ollama create` command to quantize the draft model
* cache support for speculation
* changes to the rotating cache so that it handles MTP correctly
* sampling support for draft-model token prediction

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

| Name |
|---|
| support |
| .this-is-the-create-dmg-repo |
| build_darwin.sh |
| build_docker.sh |
| build_linux.sh |
| build_windows.ps1 |
| create-dmg.sh |
| deduplicate_cuda_libs.sh |
| env.sh |
| install.ps1 |
| install.sh |
| push_docker.sh |
| tag_latest.sh |