* mlx: Support NVIDIA TensorRT Model Optimizer import
* x/create: support FP8 safetensors import
Decode HF F8_E4M3 safetensors with block-scale companion tensors into MLX-importable tensor blobs, including compressed-tensors weight_scale metadata, packed NVFP4 layouts, and mixed-precision tensor headers.
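For illustration, a minimal Go sketch of the F8_E4M3 decode step with a per-block scale from a weight_scale companion; the helper names, the flat scale layout, and the block size are assumptions for the example, not the code in this change:

```go
package main

import (
	"fmt"
	"math"
)

// decodeF8E4M3 expands one FP8 E4M3 (FN) byte: 1 sign bit, 4 exponent bits
// (bias 7), 3 mantissa bits, no infinities, and S.1111.111 reserved as NaN.
func decodeF8E4M3(b byte) float32 {
	sign := float32(1)
	if b&0x80 != 0 {
		sign = -1
	}
	exp := int(b>>3) & 0x0F
	man := float32(b & 0x07)
	if exp == 0x0F && man == 7 {
		return float32(math.NaN())
	}
	if exp == 0 { // subnormal: ±(man/8) * 2^-6
		return sign * man / 8 * float32(math.Exp2(-6))
	}
	return sign * (1 + man/8) * float32(math.Exp2(float64(exp-7)))
}

// dequantizeRow applies per-block scales from a companion tensor to a flat
// run of FP8 values. blockSize is a stand-in here; the real grouping comes
// from the weight_scale tensor's shape.
func dequantizeRow(raw []byte, scales []float32, blockSize int) []float32 {
	out := make([]float32, len(raw))
	for i, b := range raw {
		out[i] = decodeF8E4M3(b) * scales[i/blockSize]
	}
	return out
}

func main() {
	// 0x38 encodes +1.0 and 0xB8 encodes -1.0; scaled by 0.5 -> [0.5 -0.5]
	fmt.Println(dequantizeRow([]byte{0x38, 0xB8}, []float32{0.5}, 2))
}
```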
Use that source-precision metadata during create quantization:
- default FP8-sourced imports to mxfp8
- allow source FP8 to target MLX low-bit formats
- preserve source-quantized NVFP4 layouts
- selectively keep or promote tensors based on their source precision
- detect the quantized dtype from mixed-precision safetensors manifests
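A rough sketch of the precision-aware defaulting described above; the function, its arguments, and the format strings are illustrative only, and only the behavior (FP8 defaults to mxfp8, NVFP4 layouts are preserved, other tensors keep their source precision unless overridden) mirrors the text:

```go
package create // illustrative package name

// chooseTargetFormat picks an import quantization from the source dtype seen
// in the safetensors manifest, unless the user requested a target explicitly.
func chooseTargetFormat(sourceDType, requested string) string {
	if requested != "" {
		// An explicit target (e.g. an MLX low-bit format) wins,
		// even when the source is already FP8.
		return requested
	}
	switch sourceDType {
	case "F8_E4M3":
		return "mxfp8" // FP8-sourced imports default to mxfp8
	case "NVFP4":
		return "nvfp4" // packed NVFP4 layouts are kept as-is
	default:
		return sourceDType // keep (or later promote) based on source precision
	}
}
```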
* address review comments
This change includes:
- changes to the safetensors metadata format
- changes to the create command to properly create the blobs with the new format
- changes to load the new format
- a fix so `ollama show` correctly displays each tensor (see the header-listing sketch after this list)
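For context, here is a minimal standalone sketch (not the loader in this change) that walks a safetensors header and prints each tensor's name, dtype, and shape, which is roughly the per-tensor view `ollama show` needs. It relies only on the public safetensors layout: an 8-byte little-endian header length followed by a JSON map from tensor name to dtype/shape/data_offsets, plus an optional `__metadata__` entry.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// tensorInfo mirrors one entry in a safetensors JSON header.
type tensorInfo struct {
	DType       string   `json:"dtype"`
	Shape       []int64  `json:"shape"`
	DataOffsets [2]int64 `json:"data_offsets"`
}

// listTensors prints name, dtype, and shape for every tensor in a safetensors file.
func listTensors(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// The file opens with an 8-byte little-endian header length,
	// followed by that many bytes of JSON.
	var hdrLen uint64
	if err := binary.Read(f, binary.LittleEndian, &hdrLen); err != nil {
		return err
	}
	hdr := make([]byte, hdrLen)
	if _, err := io.ReadFull(f, hdr); err != nil {
		return err
	}

	var entries map[string]json.RawMessage
	if err := json.Unmarshal(hdr, &entries); err != nil {
		return err
	}
	for name, raw := range entries {
		if name == "__metadata__" { // free-form metadata, not a tensor
			continue
		}
		var ti tensorInfo
		if err := json.Unmarshal(raw, &ti); err != nil {
			return err
		}
		fmt.Printf("%-48s %-10s %v\n", name, ti.DType, ti.Shape)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: listtensors <model.safetensors>")
		os.Exit(1)
	}
	if err := listTensors(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```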
* x: make `ollama create --experimental` import from safetensors
This change allows importing safetensors models into the new experimental model format, and also fixes the `ollama show` command so it correctly displays the model information.
* run gofumpt to satisfy the linter
* run gofumpt again
* validate the model name