ollama/llm
Parafee41 f8a48df24d
llm: decouple prompt caching from context shift (#16639)
This PR separates prompt caching from the public shift request option for native llama-server requests.

Previously, shift controlled two different mechanisms:

- context shifting / overflow behavior
- per-request llama-server cache_prompt

That meant callers could not request shift: false without also disabling prompt caching.
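
The coupling described above can be illustrated with a minimal sketch. It is not the code from llama_server.go: the `serverRequest` type and the `buildCoupled` and `buildDecoupled` helpers are hypothetical names, and the assumption that prompt caching becomes an independently chosen value after the change is for illustration only. Only `shift` and `cache_prompt` come from the commit message.

```go
package main

import "fmt"

// serverRequest is a hypothetical, simplified stand-in for the per-request
// settings sent to llama-server. Only cache_prompt is named in the commit
// message; the struct and its field names are illustrative.
type serverRequest struct {
	Prompt       string
	ContextShift bool // context shifting / overflow behavior
	CachePrompt  bool // per-request cache_prompt
}

// buildCoupled models the behavior before the change: the single shift
// option drives both mechanisms, so shift=false also disables caching.
func buildCoupled(prompt string, shift bool) serverRequest {
	return serverRequest{Prompt: prompt, ContextShift: shift, CachePrompt: shift}
}

// buildDecoupled models the behavior after the change, assuming the two
// settings are chosen independently (an assumption, not the actual API).
func buildDecoupled(prompt string, shift, cachePrompt bool) serverRequest {
	return serverRequest{Prompt: prompt, ContextShift: shift, CachePrompt: cachePrompt}
}

func main() {
	before := buildCoupled("hello", false)
	after := buildDecoupled("hello", false, true)
	fmt.Printf("before: shift=%t cache_prompt=%t\n", before.ContextShift, before.CachePrompt)
	fmt.Printf("after:  shift=%t cache_prompt=%t\n", after.ContextShift, after.CachePrompt)
}
```

In the coupled form, a caller who disables shifting loses prompt caching as a side effect. In the decoupled form, the two values no longer depend on each other.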

Fixes #16635
2026-06-11 16:05:24 -07:00
exit_status.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
exit_status_other.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
exit_status_windows.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
llama_binary.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
llama_binary_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
llama_server.go llm: decouple prompt caching from context shift (#16639) 2026-06-11 16:05:24 -07:00
llama_server_test.go llama-server: allow GPU offload for projectors (#16473) 2026-06-03 13:58:40 -07:00
llm_darwin.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_linux.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_windows.go win: lint fix (#10571) 2025-05-05 11:08:12 -07:00
media.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
metal_retry.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
rocm_default.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
rocm_windows.go llama-server followups (#16353) 2026-06-01 10:44:21 -07:00
server.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
server_wait_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
status.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
status_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
vulkan_windows.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00