ollama

mirror of https://github.com/ollama/ollama.git synced 2026-07-05 07:11:10 +00:00

Patrick Devine 82e0ddb6fe mlxrunner: harden linear/embedding layers against over-promotion (#16682). Adding or multiplying a tensor by a scalar with a different data type can promote the tensor and cause performance issues. This change adds several guards against over-promotion.	2026-06-11 13:56:25 -07:00
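The over-promotion described in the commit message above can be illustrated with a small toy model. This is only a sketch in plain Python, not the mlxrunner code or the MLX API: the `Tensor` class, the `RANK` table, and the `scale_naive` / `scale_guarded` helpers are all hypothetical names made up for illustration, and the promotion rule (the wider dtype wins) is an assumed simplification.

```python
from dataclasses import dataclass

# Hypothetical dtype ranking for illustration only: a higher rank means a wider type.
RANK = {"float16": 0, "float32": 1}


@dataclass(frozen=True)
class Tensor:
    """Toy stand-in for a tensor: a dtype label plus a tuple of values."""
    dtype: str
    values: tuple


def promote(a: str, b: str) -> str:
    """Assumed promotion rule: the result takes the wider of the two dtypes."""
    return a if RANK[a] >= RANK[b] else b


def scale_naive(t: Tensor, s: Tensor) -> Tensor:
    """Multiply by a one-element scalar tensor; the result dtype is promoted."""
    out_dtype = promote(t.dtype, s.dtype)
    return Tensor(out_dtype, tuple(v * s.values[0] for v in t.values))


def scale_guarded(t: Tensor, s: Tensor) -> Tensor:
    """Guard: cast the scalar to the tensor's dtype first, so t.dtype is kept."""
    s_cast = Tensor(t.dtype, s.values)
    return scale_naive(t, s_cast)


weights = Tensor("float16", (1.0, 2.0, 3.0))
scale = Tensor("float32", (0.5,))

naive = scale_naive(weights, scale)      # dtype silently widens to float32
guarded = scale_guarded(weights, scale)  # dtype stays float16

print(naive.dtype, guarded.dtype)
```

In this toy model the naive path widens the half-precision weights to `float32`, while the guarded path keeps them in `float16`. The actual guards added in #16682 may take a different form.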
gemma3	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
gemma4	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
glm4_moe_lite	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
laguna	mlxrunner: move YaRN RoPE helpers into x/models/nn	2026-05-22 09:32:09 -07:00
llama	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
nn	mlxrunner: harden linear/embedding layers against over-promotion (#16682)	2026-06-11 13:56:25 -07:00
qwen3	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
qwen3_5	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
qwen3_5_moe	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00