ollama/llm
Daniel Hiltgen dba1e27fa8
llama: enable FA on CUDA CC 6.x GPUs (#16994)
Recent upstream Pascal kernel fixes let us compile native SM60/SM61 kernels again instead of relying on PTX JIT, so allow Flash Attention auto at runtime for CC 6.x devices.

Fixes #16591

Fixes #16754
2026-07-02 17:11:39 -07:00
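The commit above changes a runtime policy: with native SM60/SM61 kernels compiled again, CUDA devices with compute capability 6.x become eligible for Flash Attention "auto". The following is a minimal, hypothetical Go sketch of that kind of capability gate. The `cudaDevice` type, its fields, the `flashAttentionAutoEligible` helper, and the exact 6.0 floor are all assumptions for illustration. They are not ollama's actual types or logic, and the commit does not say how devices below 6.0 are handled.

```go
package main

import "fmt"

// cudaDevice is a hypothetical stand-in for a discovered CUDA GPU.
// ollama's real device type and field names may differ.
type cudaDevice struct {
	Name         string
	ComputeMajor int
	ComputeMinor int
}

// flashAttentionAutoEligible reports whether "auto" Flash Attention would be
// allowed under the policy sketched from the commit message: compute
// capability 6.x (Pascal, e.g. SM60/SM61) and newer qualify.
// ASSUMPTION: the 6.0 floor is inferred from the commit text, not from ollama's code.
func flashAttentionAutoEligible(d cudaDevice) bool {
	return d.ComputeMajor >= 6
}

func main() {
	devices := []cudaDevice{
		{Name: "Tesla P100", ComputeMajor: 6, ComputeMinor: 0},
		{Name: "GTX 1080", ComputeMajor: 6, ComputeMinor: 1},
		{Name: "GTX 980", ComputeMajor: 5, ComputeMinor: 2},
		{Name: "RTX 3090", ComputeMajor: 8, ComputeMinor: 6},
	}
	for _, d := range devices {
		fmt.Printf("%s (CC %d.%d): FA auto eligible = %v\n",
			d.Name, d.ComputeMajor, d.ComputeMinor, flashAttentionAutoEligible(d))
	}
}
```

Only the major version is checked here, because the commit treats the whole 6.x family (SM60 and SM61) the same way. A real implementation would also need to consider the backend library and driver, which this sketch ignores.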
exit_status.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
exit_status_other.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
exit_status_windows.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
llama_binary.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
llama_binary_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
llama_server.go llm: fix ollama ps double-counting mmap'd weights on partial offload (#16709) 2026-06-24 11:43:20 -07:00
llama_server_test.go llama: enable FA on CUDA CC 6.x GPUs (#16994) 2026-07-02 17:11:39 -07:00
llm_darwin.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_linux.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_windows.go win: lint fix (#10571) 2025-05-05 11:08:12 -07:00
media.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
metal_retry.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
rocm_default.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
rocm_windows.go llama-server followups (#16353) 2026-06-01 10:44:21 -07:00
server.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
server_wait_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
status.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
status_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
vulkan_windows.go llm: use host Vulkan loader on Windows (#16869) 2026-06-24 10:35:48 -07:00