ollama

mirror of https://github.com/ollama/ollama.git synced 2026-07-05 15:27:25 +00:00

Bruce MacDonald 6e65d95ef5 server: add cached eval metric to response	2026-06-25 16:21:08 -07:00

Added cached prompt token counts to Ollama responses and compatibility usage fields. This carries local `llama-server` `cache_n` and MLX cache hits through `/api/generate`, `/api/chat`, OpenAI-compatible endpoints, and Anthropic-compatible `/v1/messages`. Cloud responses are passed through as-is, so cache counts will show up there once Cloud starts returning them.

examples	ci: restore previous linter rules (#13322)	2025-12-03 18:55:02 -08:00
client.go	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
client_test.go	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
types.go	server: add cached eval metric to response	2026-06-25 16:21:08 -07:00
types_test.go	runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031)	2026-05-29 13:35:47 -07:00
types_typescript_test.go	tools: support anyOf types	2025-08-05 16:46:24 -07:00