ollama/convert
Jeffrey Morgan 1044b0419a
model: add MLA absorption for glm4moelite (#13810)
* model: add MLA absorption for glm4moelite

Split the combined KV_B tensor into separate K_B and V_B tensors
during conversion. This enables MLA (Multi-head Latent Attention)
absorption, which compresses the KV cache for improved efficiency.

* ggml: enable MLA flash attention for GLM-4.7-flash

Add support for gqa_ratio 4 in MLA flash attention kernels. GLM-4.7-flash
uses head size 576 with gqa_ratio 4; head size 576 was previously supported
only with gqa_ratio 16 (DeepSeek).
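
The 576/512 head sizes named in this commit can be read as the sizes of the absorbed K and V heads. The sketch below assumes a DeepSeek-style layout in which K is the compressed latent plus the RoPE part and V is the latent alone. The function name and the values 512 and 64 are illustrative assumptions, not read from the GLM-4.7-flash config.

```go
package main

// mlaHeadSizes returns the K and V head sizes the attention kernel sees
// after MLA absorption, under the assumed layout:
//   K head = compressed latent (kvLoraRank) + RoPE dims (ropeDim)
//   V head = compressed latent (kvLoraRank)
func mlaHeadSizes(kvLoraRank, ropeDim int) (kHead, vHead int) {
	return kvLoraRank + ropeDim, kvLoraRank
}
```

Calling `mlaHeadSizes(512, 64)` under these assumptions gives a K head of 576 and a V head of 512, matching the (576, 512) pair in the tile configs listed below.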

Metal changes:
- Enable head size 576 for flash attention
- Increase simdgroups to 8 for large heads (>=512)
- Add case 8 kernel dispatch for 8 simdgroups

CUDA changes:
- Add gqa_ratio 4 support for head 576/512
- Add tile configs for (576, 512, 4) and (576, 512, 8)
- Add MMA config cases for ncols 4
- Add template instances for ncols2=4

* model: add compatibility validation for glm4moelite architecture
2026-01-23 14:47:42 -08:00
sentencepiece chore(all): replace instances of interface with any (#10067) 2025-04-02 09:44:27 -07:00
testdata convert: import support for command-r models from safetensors (#6063) 2025-01-15 16:31:22 -08:00
convert.go model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792) 2026-01-20 12:20:53 -08:00
convert_bert.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_commandr.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_deepseek2.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_deepseekocr.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_gemma.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_gemma2.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_gemma2_adapter.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_gemma3.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_gemma3n.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_glm4moelite.go model: add MLA absorption for glm4moelite (#13810) 2026-01-23 14:47:42 -08:00
convert_gptoss.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_lfm2.go model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792) 2026-01-20 12:20:53 -08:00
convert_llama.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_llama4.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_llama_adapter.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_mistral.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_mistral_causal.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_mixtral.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_mllama.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_nomicbert.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_olmo.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_phi3.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_qwen2.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_qwen3.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_qwen3vl.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_qwen25vl.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
convert_test.go Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
reader.go model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792) 2026-01-20 12:20:53 -08:00
reader_safetensors.go deepseekocr 2025-11-18 16:11:37 -08:00
reader_test.go convert: convert bf16 vision weights to fp16 (#12324) 2025-09-17 17:43:17 -07:00
reader_torch.go llama4 2025-04-25 16:59:20 -07:00
sentencepiece_model.proto all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00
tensor.go fix tensor merge (#13053) 2025-11-13 15:32:34 -08:00
tensor_test.go fix tensor merge (#13053) 2025-11-13 15:32:34 -08:00
tokenizer.go s#x/exp/maps#maps# (#11506) 2025-07-23 13:23:32 -07:00
tokenizer_spm.go parsers/renderers: functiongemma (#13521) 2025-12-18 07:55:37 -08:00
tokenizer_test.go model: handle multiple eos tokens (#10577) 2025-05-16 13:40:23 -07:00