Commit graph

7 commits

Author SHA1 Message Date
Daniel Hiltgen
03aee88186
mlx: Support NVIDIA TensorRT Model Optimizer import (#15566)
* mlx: Support NVIDIA TensorRT Model Optimizer import

* x/create: support FP8 safetensors import

Decode HF F8_E4M3 safetensors with block scale companions into MLX-importable tensor blobs, including compressed-tensors weight_scale metadata, packed NVFP4 layouts, and mixed-precision tensor headers.

Use that source-precision metadata during create quantization: default FP8-sourced imports to mxfp8, allow source FP8 to target MLX low-bit formats, preserve source-quantized NVFP4 layouts, selectively keep or promote tensors based on their source precision, and detect quantized dtype from mixed-precision safetensors manifests.

* review comments
2026-04-27 18:28:10 -07:00
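
For readers who want a concrete picture of what "F8_E4M3 safetensors with block scale companions" means in the commit above: each 8-bit value decodes to a small float (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities) and is then multiplied by the scale covering its block. The Go sketch below is illustrative only and assumes one scale per contiguous run of elements; the actual importer also deals with compressed-tensors weight_scale metadata, packed NVFP4 layouts, and mixed-precision headers, which this toy ignores.

```go
package main

import (
	"fmt"
	"math"
)

// e4m3ToFloat32 decodes one OCP FP8 E4M3 (float8_e4m3fn) byte:
// 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits, no infinities,
// with 0x7F / 0xFF reserved for NaN.
func e4m3ToFloat32(b byte) float32 {
	sign := float32(1)
	if b&0x80 != 0 {
		sign = -1
	}
	exp := int((b >> 3) & 0x0F)
	man := float32(b & 0x07)

	switch {
	case exp == 0x0F && man == 7: // all bits set: NaN
		return float32(math.NaN())
	case exp == 0: // subnormal: 2^-6 * (man / 8)
		return sign * (man / 8) * float32(math.Pow(2, -6))
	default: // normal: 2^(exp-7) * (1 + man/8)
		return sign * (1 + man/8) * float32(math.Pow(2, float64(exp-7)))
	}
}

// dequantizeBlocks applies one scale per contiguous block of blockSize
// elements, the simplest reading of a weight_scale companion tensor.
func dequantizeBlocks(quant []byte, scales []float32, blockSize int) []float32 {
	out := make([]float32, len(quant))
	for i, q := range quant {
		out[i] = e4m3ToFloat32(q) * scales[i/blockSize]
	}
	return out
}

func main() {
	// 0x40 is +2.0 and 0x38 is +1.0 in E4M3; with a block scale of 0.5
	// this prints [1 0.5].
	fmt.Println(dequantizeBlocks([]byte{0x40, 0x38}, []float32{0.5}, 2))
}
```
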
Patrick Devine
9e7cb9697e
mlx: fix vision capability + min version (#15106) 2026-03-27 17:09:28 -07:00
Patrick Devine
0d5da826d4
bugfix: display the parameter count correctly in mlx for ollama show (#14285) 2026-02-16 13:03:34 -08:00
Patrick Devine
a0407d07fa
safetensors quantization for mlx (#14184)
This change includes:
  - changes to the safetensors metadata format
  - changes to the create command to properly create the blobs with the new format
  - changes to load the new format
  - fixes `ollama show` to properly display each tensor
2026-02-10 11:29:17 -08:00
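
For context on the "safetensors metadata format" this commit touches: a standard safetensors file begins with an 8-byte little-endian header length, followed by a JSON object that maps each tensor name to its dtype, shape, and byte offsets, plus an optional __metadata__ string map. The Go sketch below only reads that standard header; the additional blob metadata this commit introduces is not spelled out in the message, so it is not reproduced here.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// tensorInfo mirrors one entry of the standard safetensors JSON header.
type tensorInfo struct {
	Dtype       string   `json:"dtype"`
	Shape       []int64  `json:"shape"`
	DataOffsets [2]int64 `json:"data_offsets"`
}

// readHeader parses the safetensors container: an 8-byte little-endian
// header length followed by a JSON object mapping tensor names to their
// dtype, shape, and byte offsets (plus an optional "__metadata__" block).
func readHeader(path string) (map[string]json.RawMessage, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var n uint64
	if err := binary.Read(f, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	raw := make([]byte, n)
	if _, err := io.ReadFull(f, raw); err != nil {
		return nil, err
	}
	header := map[string]json.RawMessage{}
	return header, json.Unmarshal(raw, &header)
}

func main() {
	header, err := readHeader("model.safetensors")
	if err != nil {
		panic(err)
	}
	for name, raw := range header {
		if name == "__metadata__" {
			continue // free-form string metadata, not a tensor
		}
		var info tensorInfo
		if err := json.Unmarshal(raw, &info); err != nil {
			panic(err)
		}
		fmt.Println(name, info.Dtype, info.Shape)
	}
}
```
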
Patrick Devine
d8cc798c2b
glm 4.7 flash support on experimental engine (#13838) 2026-02-02 15:22:11 -08:00
Patrick Devine
148a1be0a3
Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00
Patrick Devine
a077d996e3
Fix create and show commands for experimental models (#13741)
* x: make `ollama create --experimental` import from safetensors

This change allows importing safetensors models into the new experimental model format, and also
fixes the `ollama show` command so it correctly displays the model information.

* gofumpt the linter

* gofumpt the linter again

* validate the model name
2026-01-16 14:31:55 -08:00
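
The last bullet, "validate the model name", is a one-liner in the message, so what follows is only a hypothetical illustration of that kind of check, assuming names follow the usual [namespace/]name[:tag] shape; the pattern and function name below are made up for the example and are not the repository's actual rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// modelNameRE is an illustrative pattern for [namespace/]name[:tag],
// limited to the alphanumerics, dots, dashes, and underscores commonly
// seen in Ollama model names; the real rules live in the modelpath code.
var modelNameRE = regexp.MustCompile(
	`^([a-zA-Z0-9._-]+/)?[a-zA-Z0-9._-]+(:[a-zA-Z0-9._-]+)?$`)

func validateModelName(name string) error {
	if !modelNameRE.MatchString(name) {
		return fmt.Errorf("invalid model name %q", name)
	}
	return nil
}

func main() {
	for _, n := range []string{"gemma3:4b", "library/llama3.1", "bad name!"} {
		fmt.Println(n, "->", validateModelName(n))
	}
}
```
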