ollama

mirror of https://github.com/ollama/ollama.git synced 2026-07-04 23:02:07 +00:00

Patrick Devine 964ea42c09 mlx: x/create rewrite (#16919)	2026-07-03 18:30:45 -07:00

This is a rewrite of the create functionality for the MLX engine. The core idea behind the create functionality is to break the import/convert into a pipeline of distinct phases:

* Read (scan the safetensors directory for the various bits of metadata)
* Classify (determine the import type)
* Plan (determine any transforms that need to be done)
* Write (transform any data as necessary and write out the blobs)
* Create the manifest

Each architecture has a "policy" which determines how to convert the model correctly.

A number of different formats for safetensors are supported including:

* nvfp4 (two formats: model optimized, torch)
* fp8 datatypes (convert to mxfp8)
* standard bf16 based weights

A number of cleanups/simplifications have been done including:

* using the baked in names for the tensors instead of munging them into something else
* unified 3d expert tensors (instead of separate per expert tensors)
* fewer unnecessary transforms to the various tensors in a model (keep a model as close to the source as possible)
* unified capability checking
* draft model handling (for MTP) is done on the same path

Image generation has been intentionally removed.
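
The phased pipeline described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: every type and function name (`tensorInfo`, `step`, `policy`, `defaultPolicy`, `classify`, `plan`, `write`) is hypothetical, the dtype labels are illustrative, only the fp8 and bf16 cases are modelled, and the Read, Write, and manifest phases are reduced to in-memory placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Every identifier here is hypothetical and chosen for illustration only.

// tensorInfo is what a Read phase might record for each tensor it scans.
type tensorInfo struct {
	Name  string
	DType string // illustrative dtype labels such as "BF16" or "F8_E4M3"
}

// step is one entry of the plan: a tensor and the transform to apply to it.
// An empty Transform means the tensor is written unchanged.
type step struct {
	Tensor    string
	Transform string
}

// policy decides, per architecture, which transform a tensor needs.
type policy func(kind string, t tensorInfo) string

// defaultPolicy converts fp8 tensors to mxfp8 and leaves everything else alone.
func defaultPolicy(kind string, t tensorInfo) string {
	if kind == "fp8" && strings.HasPrefix(t.DType, "F8") {
		return "to_mxfp8"
	}
	return ""
}

// classify determines the import type from the scanned metadata.
func classify(tensors []tensorInfo) string {
	for _, t := range tensors {
		if strings.HasPrefix(t.DType, "F8") {
			return "fp8"
		}
	}
	return "bf16"
}

// plan applies the policy to every tensor, keeping the source names as-is.
func plan(p policy, kind string, tensors []tensorInfo) []step {
	steps := make([]step, 0, len(tensors))
	for _, t := range tensors {
		steps = append(steps, step{Tensor: t.Name, Transform: p(kind, t)})
	}
	return steps
}

// write stands in for the Write phase: it returns one placeholder blob label
// per step instead of transforming data and writing files.
func write(steps []step) []string {
	blobs := make([]string, 0, len(steps))
	for _, s := range steps {
		label := s.Tensor
		if s.Transform != "" {
			label += "+" + s.Transform
		}
		blobs = append(blobs, label)
	}
	return blobs
}

func main() {
	// Read: a hard-coded scan result replaces walking a safetensors directory.
	tensors := []tensorInfo{
		{Name: "model.layers.0.mlp.weight", DType: "F8_E4M3"},
		{Name: "model.norm.weight", DType: "BF16"},
	}
	kind := classify(tensors)
	steps := plan(defaultPolicy, kind, tensors)
	blobs := write(steps)
	// Create the manifest: here just a printed list of blob labels.
	fmt.Println(kind, blobs)
}
```

Keeping each phase as a separate function that takes the previous phase's output makes it possible to test classification and planning without touching any files, which is one plausible motivation for the split the commit describes.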
batch	mlxrunner: support draft heads that maintain draft caches	2026-06-22 15:25:45 -07:00
cache	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
mlx	mlx: bump dependency (#16935)	2026-06-29 09:39:11 -07:00
model	mlx: x/create rewrite (#16919)	2026-07-03 18:30:45 -07:00
sample	mlxrunner: apply in-flight drafts to proposal penalty history	2026-06-22 15:25:45 -07:00
cache.go	mlxrunner: capture prefill snapshots across the forward	2026-06-09 00:39:19 -07:00
cache_test.go	mlxrunner: capture prefill snapshots across the forward	2026-06-09 00:39:19 -07:00
cache_trie.go	mlxrunner: drive MTP speculation through cache snapshots	2026-06-09 00:39:19 -07:00
cache_trie_test.go	mlxrunner: share KV cache across conversations with common prefixes	2026-03-18 16:06:33 -07:00
client.go	MLX: wire up scheduler selected context size for ps (#16918)	2026-06-26 08:47:03 -07:00
imports.go	models: add cohere2_moe (Command A / North) to the MLX engine (#16670)	2026-06-16 23:15:21 -07:00
mtp.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
mtp_test.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
pipeline.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
runner.go	mlxrunner: support draft heads that maintain draft caches	2026-06-22 15:25:45 -07:00
server.go	mlx: rework the MLX sampler (#16122)	2026-05-13 17:18:27 -07:00
speculate.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
speculate_depth.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
speculate_depth_test.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
speculate_stats.go	mlxrunner: choose the speculative draft length to maximize throughput	2026-06-22 15:25:45 -07:00
status_memory.go	mlx: avoid status timeout during inference (#16086)	2026-05-11 16:03:38 -07:00
status_memory_test.go	mlx: avoid status timeout during inference (#16086)	2026-05-11 16:03:38 -07:00
utf8_buffer.go	consolidate the tokenizer (#14327)	2026-02-19 15:55:45 -08:00
utf8_buffer_test.go	consolidate the tokenizer (#14327)	2026-02-19 15:55:45 -08:00