ollama/server
Daniel Hiltgen 87288ced4f
New models (#15861)
* mlx: add laguna model support

* convert: support fp8 safetensors import

Decode HF F8_E4M3 safetensors with block scale companions into GGUF-supported tensor types, and record which output tensors came from FP8 source weights.

Use that source-precision metadata when quantizing during create: default FP8-sourced GGUFs to Q8_0; for Q8_0 requests, keep non-FP8 tensors at their original precision; for Q4_K requests, promote non-FP8 quantizable tensors to Q8_0.
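
The selection rule described above can be sketched roughly as follows. This is a minimal illustration and not the code in quantization.go: the `tensor` struct, `resolveRequest`, and `targetType` are hypothetical names, the type names are plain strings, and the FP8-sourced branch under Q4_K is an assumption, since the commit message does not say what happens to those tensors.

```go
package main

import "fmt"

// tensor is a hypothetical record of the per-tensor facts the rule needs.
type tensor struct {
	Name        string
	OrigType    string // type the converter emitted, e.g. "BF16" or "F32"
	FromFP8     bool   // true if the source weight was F8_E4M3
	Quantizable bool   // false for tensors that are never quantized
}

// resolveRequest applies the default: with no explicit request, a model
// that contains FP8-sourced tensors is quantized as Q8_0.
func resolveRequest(requested string, tensors []tensor) string {
	if requested != "" {
		return requested
	}
	for _, t := range tensors {
		if t.FromFP8 {
			return "Q8_0"
		}
	}
	return ""
}

// targetType picks the output type for one tensor under a resolved request.
func targetType(request string, t tensor) string {
	if !t.Quantizable {
		return t.OrigType
	}
	switch request {
	case "Q8_0":
		if t.FromFP8 {
			return "Q8_0"
		}
		return t.OrigType // non-FP8 tensors keep their original precision
	case "Q4_K":
		if t.FromFP8 {
			return "Q4_K" // assumption: FP8-sourced tensors follow the request
		}
		return "Q8_0" // non-FP8 quantizable tensors are promoted to Q8_0
	}
	return t.OrigType
}

func main() {
	// Illustrative tensor names only; they are not taken from the commit.
	tensors := []tensor{
		{"blk.0.ffn_up.weight", "BF16", true, true},
		{"blk.0.attn_q.weight", "BF16", false, true},
		{"blk.0.attn_norm.weight", "F32", false, false},
	}
	for _, req := range []string{"", "Q4_K"} {
		r := resolveRequest(req, tensors)
		for _, t := range tensors {
			fmt.Printf("request=%q resolved=%s %s -> %s\n", req, r, t.Name, targetType(r, t))
		}
	}
}
```

Keeping the default and the per-tensor choice in separate functions mirrors the two halves of the description: one decision per model, one decision per tensor.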

* ggml: add laguna model support

* server: preserve generate logprobs with builtin parsers

Generate requests were dropping logprob-only chunks whenever a builtin parser buffered visible content. Chat already handled this case, but generate only forwarded chunks with visible response, thinking, or tool-call output.

Keep generate chunks that carry logprobs even when the builtin parser has not flushed visible content yet, and add a regression test that exercises the behavior with a generic thinking parser.
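
The forwarding condition after this fix might look something like the sketch below. The `chunk` type and `shouldForward` are hypothetical stand-ins, not the types or functions used in routes.go; the point is only that logprobs join visible response, thinking, and tool-call output as a reason to forward a chunk.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for one streamed generate response
// after the builtin parser has processed it.
type chunk struct {
	Response  string    // visible content flushed by the parser
	Thinking  string    // thinking content flushed by the parser
	ToolCalls []string  // parsed tool calls, simplified to names
	Logprobs  []float64 // per-token log probabilities, simplified
}

// shouldForward reports whether a chunk should be sent to the client.
// Before the fix, the logprobs condition was missing for generate, so a
// chunk whose content was still buffered by the parser was dropped along
// with its logprobs.
func shouldForward(c chunk) bool {
	return c.Response != "" ||
		c.Thinking != "" ||
		len(c.ToolCalls) > 0 ||
		len(c.Logprobs) > 0
}

func main() {
	buffered := chunk{Logprobs: []float64{-0.12}} // parser has not flushed yet
	empty := chunk{}
	visible := chunk{Response: "hello"}

	fmt.Println(shouldForward(buffered))
	fmt.Println(shouldForward(empty))
	fmt.Println(shouldForward(visible))
}
```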

* review comments - perf improvements

* ggml: implement nemotron 3 nano omni

* add poolside integration

* update poolside doc

* adapt to new cache setup

* fix test

* fix test

---------

Co-authored-by: Eva Ho <hoyyeva@gmail.com>
2026-04-28 11:50:12 -07:00
..
internal docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00
auth.go server: reject unexpected auth hosts (#13738) 2026-01-16 14:10:36 -05:00
auth_test.go server: reject unexpected auth hosts (#13738) 2026-01-16 14:10:36 -05:00
cloud_proxy.go cloud_proxy: for the web_search legacy path, flush on newlines (#14897) 2026-03-17 13:30:17 -07:00
cloud_proxy_test.go cloud_proxy: for the web_search legacy path, flush on newlines (#14897) 2026-03-17 13:30:17 -07:00
create.go New models (#15861) 2026-04-28 11:50:12 -07:00
create_test.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00
download.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00
fixblobs.go
fixblobs_test.go
gemma4_test.go gemma4: render differently based on model size 2026-04-15 14:37:16 -07:00
images.go create: avoid gc race with create (#15628) 2026-04-16 13:29:16 -07:00
images_test.go create: avoid gc race with create (#15628) 2026-04-16 13:29:16 -07:00
inference_request_log.go add ability to turn on debug request logging (#14106) 2026-03-19 17:08:17 -07:00
laguna_quantization_test.go New models (#15861) 2026-04-28 11:50:12 -07:00
logprob.go logprob: add bytes to logprobs (#13068) 2025-11-13 13:49:25 -08:00
model.go create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00
model_resolver.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
model_resolver_test.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
prompt.go gemma4: render differently based on model size 2026-04-15 14:37:16 -07:00
prompt_test.go gemma4: render differently based on model size 2026-04-15 14:37:16 -07:00
quantization.go New models (#15861) 2026-04-28 11:50:12 -07:00
quantization_test.go New models (#15861) 2026-04-28 11:50:12 -07:00
renderer_resolution.go gemma4: render differently based on model size 2026-04-15 14:37:16 -07:00
routes.go New models (#15861) 2026-04-28 11:50:12 -07:00
routes_cloud_test.go revert context length warnings change (#15121) 2026-03-28 16:43:59 -07:00
routes_create_test.go New models (#15861) 2026-04-28 11:50:12 -07:00
routes_debug_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_delete_test.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
routes_generate_renderer_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_generate_test.go New models (#15861) 2026-04-28 11:50:12 -07:00
routes_harmony_streaming_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_list_test.go Update the /api/create endpoint to use JSON (#7935) 2024-12-31 18:02:30 -08:00
routes_options_test.go server: use tiered VRAM-based default context length 2026-02-02 10:47:09 -08:00
routes_request_log_test.go add ability to turn on debug request logging (#14106) 2026-03-19 17:08:17 -07:00
routes_test.go modelfiles: fix /save command and add shortname for safetensors based models (#15413) 2026-04-08 21:05:39 -07:00
routes_web_experimental_test.go cloud_proxy: send ollama client version (#14769) 2026-03-10 15:53:25 -07:00
sched.go New models (#15861) 2026-04-28 11:50:12 -07:00
sched_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
sparse_common.go
sparse_windows.go
test_home_test.go add ability to disable cloud (#14221) 2026-02-12 15:47:00 -08:00
upload.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00