ollama/x/models
Patrick Devine 82e0ddb6fe
mlxrunner: harden linear/embedding layers against over-promotion (#16682)
Adding a scalar to a tensor, or multiplying a tensor by a scalar, when the
scalar has a different data type can promote the tensor's data type and hurt performance.

This change adds several guards against over-promotion.
2026-06-11 13:56:25 -07:00
gemma3 runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
gemma4 mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
glm4_moe_lite runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
laguna mlxrunner: move YaRN RoPE helpers into x/models/nn 2026-05-22 09:32:09 -07:00
llama runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
nn mlxrunner: harden linear/embedding layers against over-promotion (#16682) 2026-06-11 13:56:25 -07:00
qwen3 runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
qwen3_5 mlxrunner: drive MTP speculation through cache snapshots 2026-06-09 00:39:19 -07:00
qwen3_5_moe MLX: add header vendoring and remove go build tag (#14642) 2026-03-09 17:24:45 -07:00
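
The over-promotion described in the latest commit message can be illustrated with a toy model. The sketch below is not the mlxrunner code and does not use the MLX API: `DType`, `Tensor`, `Promote`, `MulNaive`, and `MulGuarded` are hypothetical names invented for illustration, and the promotion rule (the wider type wins) is a simplifying assumption. It only shows why a float32 scalar can silently widen a bfloat16 tensor, and how casting the scalar to the tensor's type first is one possible guard.

```go
package main

import "fmt"

// DType is a hypothetical element type, ordered from narrowest to widest.
type DType int

const (
	BFloat16 DType = iota
	Float32
)

// String makes DType print as a readable name.
func (d DType) String() string {
	if d == BFloat16 {
		return "bfloat16"
	}
	return "float32"
}

// Tensor is a toy stand-in for an array; only the dtype is tracked.
type Tensor struct {
	DType DType
}

// Promote returns the wider of two dtypes (assumed promotion rule).
func Promote(a, b DType) DType {
	if a > b {
		return a
	}
	return b
}

// MulNaive multiplies x by a scalar without checking dtypes,
// so the result takes the promoted dtype.
func MulNaive(x, scalar Tensor) Tensor {
	return Tensor{DType: Promote(x.DType, scalar.DType)}
}

// MulGuarded casts the scalar to x's dtype before multiplying,
// so the result keeps x's dtype.
func MulGuarded(x, scalar Tensor) Tensor {
	scalar.DType = x.DType
	return MulNaive(x, scalar)
}

func main() {
	weights := Tensor{DType: BFloat16}
	scale := Tensor{DType: Float32}

	fmt.Println("naive:  ", MulNaive(weights, scale).DType)
	fmt.Println("guarded:", MulGuarded(weights, scale).DType)
}
```

In this toy model the naive path yields a float32 result while the guarded path stays in bfloat16. Casting the scalar rather than the tensor keeps the conversion cost independent of the tensor's size.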