ollama/api
Bruce MacDonald 6e65d95ef5
server: add cached eval metric to response
Added cached prompt token counts to Ollama responses and compatibility usage fields.

This carries local `llama-server` `cache_n` and MLX cache hits through `/api/generate`, `/api/chat`, OpenAI-compatible endpoints, and Anthropic-compatible `/v1/messages`. Cloud responses are passed through as-is, so cache counts will show up there once Cloud starts returning them.
2026-06-25 16:21:08 -07:00
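
The commit message describes one cached-token count surfacing in several response shapes. A minimal sketch of that mapping follows. It is not the code from `types.go`. `Metrics`, `CachedPromptEvalCount`, `toOpenAIUsage`, and `toAnthropicUsage` are hypothetical names for illustration. The sketch assumes the prompt count includes cached tokens. On the compatibility side it uses the publicly documented usage fields `prompt_tokens_details.cached_tokens` (OpenAI style) and `cache_read_input_tokens` (Anthropic style). Whether the server subtracts cached tokens from `input_tokens` is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metrics is a hypothetical stand-in for the native response counters.
// CachedPromptEvalCount is an assumed field name, not taken from types.go.
type Metrics struct {
	PromptEvalCount       int // total prompt tokens (assumed to include cached ones)
	CachedPromptEvalCount int // prompt tokens served from cache, e.g. llama-server cache_n
	EvalCount             int // generated tokens
}

// PromptTokensDetails and OpenAIUsage follow the OpenAI-style usage object.
type PromptTokensDetails struct {
	CachedTokens int `json:"cached_tokens"`
}

type OpenAIUsage struct {
	PromptTokens        int                  `json:"prompt_tokens"`
	CompletionTokens    int                  `json:"completion_tokens"`
	TotalTokens         int                  `json:"total_tokens"`
	PromptTokensDetails *PromptTokensDetails `json:"prompt_tokens_details,omitempty"`
}

// AnthropicUsage follows the Anthropic-style usage object.
type AnthropicUsage struct {
	InputTokens          int `json:"input_tokens"`
	OutputTokens         int `json:"output_tokens"`
	CacheReadInputTokens int `json:"cache_read_input_tokens"`
}

// clampCached keeps the cached count within [0, PromptEvalCount].
func clampCached(m Metrics) int {
	c := m.CachedPromptEvalCount
	if c < 0 {
		return 0
	}
	if c > m.PromptEvalCount {
		return m.PromptEvalCount
	}
	return c
}

// toOpenAIUsage reports cached tokens as a detail of the prompt total.
func toOpenAIUsage(m Metrics) OpenAIUsage {
	u := OpenAIUsage{
		PromptTokens:     m.PromptEvalCount,
		CompletionTokens: m.EvalCount,
		TotalTokens:      m.PromptEvalCount + m.EvalCount,
	}
	if c := clampCached(m); c > 0 {
		u.PromptTokensDetails = &PromptTokensDetails{CachedTokens: c}
	}
	return u
}

// toAnthropicUsage assumes input_tokens excludes tokens read from cache.
func toAnthropicUsage(m Metrics) AnthropicUsage {
	c := clampCached(m)
	return AnthropicUsage{
		InputTokens:          m.PromptEvalCount - c,
		OutputTokens:         m.EvalCount,
		CacheReadInputTokens: c,
	}
}

func main() {
	m := Metrics{PromptEvalCount: 100, CachedPromptEvalCount: 60, EvalCount: 20}
	for _, v := range []any{toOpenAIUsage(m), toAnthropicUsage(m)} {
		b, err := json.Marshal(v)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

Clamping the cached count means a backend that reports an out-of-range value cannot produce a negative `input_tokens`. When no tokens were cached, the OpenAI-style details object is omitted.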
examples ci: restore previous linter rules (#13322) 2025-12-03 18:55:02 -08:00
client.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
client_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
types.go server: add cached eval metric to response 2026-06-25 16:21:08 -07:00
types_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
types_typescript_test.go tools: support anyOf types 2025-08-05 16:46:24 -07:00