mirror of https://github.com/ollama/ollama.git
synced 2026-05-13 22:37:14 +00:00
llama/compat: document non-public API dependencies
Added a "Maintenance" section to llama/compat/README.md and an inline
block to llama-ollama-compat-util.h enumerating the three places where
the compat layer depends on something that isn't strictly part of the
guaranteed upstream public API:
1. Direct ggml_tensor::{type,ne,nb} field writes (public struct,
no sanctioned post-creation mutators).
2. const_cast on gguf_get_tensor_name's return to rename in place.
3. llama_model_loader forward-decl from src/ (used only as an
opaque pointer key).
Each entry ships with its escape hatch (small upstream patch or a
local replacement). All have been stable for years; they don't need
immediate action, but surfacing them makes the maintenance story
explicit for future readers.
This commit is contained in:
parent 2a388da77b
commit 9a69a17dc2

2 changed files with 59 additions and 0 deletions
@@ -78,3 +78,36 @@ llama.cpp's source in the Ollama tree (the old `llama/llama.cpp/` layout).

This shim keeps upstream unmodified on disk and the Ollama-specific logic
isolated in two files plus a small diff — upstream bumps are usually just
`LLAMA_CPP_VERSION` changes.
## Maintenance: non-public API dependencies

The compat code is mostly written against stable public APIs (`gguf.h`,
`ggml.h`, `ggml-backend.h`). There are three places where we lean on
something that isn't strictly public:
| Hack | Why | Escape hatch if upstream changes |
|---|---|---|
| Direct writes to `ggml_tensor::type` / `ne[]` / `nb[]` | No sanctioned mutator exists for post-creation tensor reshape/retype. Struct is public so this works today. | Ask upstream to expose `ggml_tensor_set_{type,shape}` helpers, or introduce them in our compat util and submit a PR. |
| `const_cast<char *>(gguf_get_tensor_name(...))` in `rename_tensor` | Pointer aims into a mutable `char[GGML_MAX_NAME]` buffer inside a `std::vector` element; the const is API hygiene. Lets us rename gguf tensors without a new public helper. | Add `gguf_rename_tensor` to `gguf.h` (10 lines) and drop the `const_cast`. |
| `llama_model_loader` forward-decl from `src/llama-model-loader.h` | Used only as an opaque pointer key for our skip-prefix registry. Never dereferenced. | Replace with `const void *` in our registry signatures. Zero behavioral change. |

None of these have changed in years. If an upstream bump breaks any of
them, each has a trivial workaround. See the top of
`llama-ollama-compat-util.h` for the inline notes.

## Documented hacks inside per-arch handlers

- **`reclaim_slot_as` (qwen35moe patch_embed split)** — repurposes an
  orphaned `v.blk.0.attn_k` slot (left over after the QKV merge) as a
  newly-synthesized `v.patch_embd.weight.1`. Needed because clip.cpp's
  `ctx_meta` is sized for exactly the original tensor count (the no_alloc
  branch of `gguf_init_from_file` uses `n_tensors * ggml_tensor_overhead()`
  with zero slack). Comments in the helper and at the call site explain the
  reasoning; the replacement would be a 1-line upstream patch that adds
  small slack to the ctx size.

- **Load-op registry overrides `file_offset`** — `maybe_load_tensor` gets
  passed the gguf offset by its caller but ignores it when a registered
  op exists. Intentional: the ops capture their own source offsets at
  translate time (before our renames invalidate them). Documented in the
  op-registration helpers.
26 llama/compat/llama-ollama-compat-util.h vendored
@@ -7,6 +7,32 @@
// definitions live in llama-ollama-compat-util.cpp, which also owns the
// registry globals (tensor skip list, load-op table) that need a single
// translation unit.
//
// ---- Non-public API dependencies (see also README.md "Maintenance") ----
//
// Mostly public: gguf_* and ggml_* accessors from ggml/include/ are all
// stable. `ggml_backend_*` and `ggml_fp16_to_fp32` are stable too.
//
// Three pieces we rely on that aren't strictly guaranteed public:
//
// 1. Direct writes to `ggml_tensor::type`, `ne[]`, `nb[]` — the struct is
//    public and fields are spec'd, but there's no sanctioned mutator for
//    them post-creation. Used in set_tensor_type / set_tensor_shape /
//    reclaim_slot_as. Risk: upstream could in principle introduce an
//    opaque-tensor mode; in practice it hasn't in years.
//
// 2. `const_cast<char *>(gguf_get_tensor_name(...))` in rename_tensor.
//    The returned pointer points into a mutable char[GGML_MAX_NAME]
//    buffer inside a std::vector element. Defined behavior as long as
//    upstream keeps name storage in-line (has done so forever).
//
// 3. `llama_model_loader` forward decl from src/llama-model-loader.h
//    (internal, not llama.h). Only used as an opaque pointer key for
//    the skip-prefix registry — we never dereference it. Could swap for
//    `const void *` if upstream ever moved that type around.
//
// All three are trivially replaceable if upstream changes out from under
// us. See llama/compat/README.md for the escape hatches.

#include <cstddef>
#include <cstdint>