ollama

mirror of https://github.com/ollama/ollama.git synced 2026-05-13 06:21:28 +00:00

Jeffrey Morgan 1044b0419a model: add MLA absorption for glm4moelite (#13810)	2026-01-23 14:47:42 -08:00

* model: add MLA absorption for glm4moelite

  Split the combined KV_B tensor into separate K_B and V_B tensors
  during conversion, enabling MLA (Multi-head Latent Attention)
  absorption which compresses the KV cache for improved efficiency.

* ggml: enable MLA flash attention for GLM-4.7-flash

  Add support for gqa_ratio 4 in MLA flash attention kernels.
  GLM-4.7-flash uses head size 576 with gqa_ratio 4, which was
  previously only supported for gqa_ratio 16 (DeepSeek).

  Metal changes:
  - Enable head size 576 for flash attention
  - Increase simdgroups to 8 for large heads (>=512)
  - Add case 8 kernel dispatch for 8 simdgroups

  CUDA changes:
  - Add gqa_ratio 4 support for head 576/512
  - Add tile configs for (576, 512, 4) and (576, 512, 8)
  - Add MMA config cases for ncols 4
  - Add template instances for ncols2=4

* model: add compatibility validation for glm4moelite architecture

sentencepiece	chore(all): replace instances of interface with any (#10067)	2025-04-02 09:44:27 -07:00
testdata	convert: import support for command-r models from safetensors (#6063)	2025-01-15 16:31:22 -08:00
convert.go	model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792)	2026-01-20 12:20:53 -08:00
convert_bert.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_commandr.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_deepseek2.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_deepseekocr.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_gemma.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_gemma2.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_gemma2_adapter.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_gemma3.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_gemma3n.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_glm4moelite.go	model: add MLA absorption for glm4moelite (#13810)	2026-01-23 14:47:42 -08:00
convert_gptoss.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_lfm2.go	model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792)	2026-01-20 12:20:53 -08:00
convert_llama.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_llama4.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_llama_adapter.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_mistral.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_mistral_causal.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_mixtral.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_mllama.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_nomicbert.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_olmo.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_phi3.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_qwen2.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_qwen3.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_qwen3vl.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_qwen25vl.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
convert_test.go	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00
reader.go	model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792)	2026-01-20 12:20:53 -08:00
reader_safetensors.go	deepseekocr	2025-11-18 16:11:37 -08:00
reader_test.go	convert: convert bf16 vision weights to fp16 (#12324)	2025-09-17 17:43:17 -07:00
reader_torch.go	llama4	2025-04-25 16:59:20 -07:00
sentencepiece_model.proto	all: fix typos in documentation, code, and comments (#7021)	2024-12-10 12:58:06 -08:00
tensor.go	fix tensor merge (#13053)	2025-11-13 15:32:34 -08:00
tensor_test.go	fix tensor merge (#13053)	2025-11-13 15:32:34 -08:00
tokenizer.go	s#x/exp/maps#maps# (#11506)	2025-07-23 13:23:32 -07:00
tokenizer_spm.go	parsers/renderers: functiongemma (#13521)	2025-12-18 07:55:37 -08:00
tokenizer_test.go	model: handle multiple eos tokens (#10577)	2025-05-16 13:40:23 -07:00