llama/compat: document non-public API dependencies

Added a "Maintenance" section to llama/compat/README.md and an inline
block to llama-ollama-compat-util.h enumerating the three places the
compat layer depends on something that isn't strictly part of the
guaranteed public upstream API:

  1. Direct ggml_tensor::{type,ne,nb} field writes (public struct,
     no sanctioned post-creation mutators).
  2. const_cast on gguf_get_tensor_name's return to rename in place.
  3. llama_model_loader forward-decl from src/ (used only as an
     opaque pointer key).

Each entry ships with its escape hatch (small upstream patch or a
local replacement). All have been stable for years; they don't need
immediate action, but surfacing them makes the maintenance story
explicit for future readers.
jmorganca 2026-04-19 13:01:46 -07:00
parent 2a388da77b
commit 9a69a17dc2
2 changed files with 59 additions and 0 deletions

llama/compat/README.md

@@ -78,3 +78,36 @@ llama.cpp's source in the Ollama tree (the old `llama/llama.cpp/` layout).
This shim keeps upstream unmodified on disk and the Ollama-specific logic
isolated in two files plus a small diff — upstream bumps are usually just
`LLAMA_CPP_VERSION` changes.

## Maintenance: non-public API dependencies

The compat code is mostly written against stable public APIs (`gguf.h`,
`ggml.h`, `ggml-backend.h`). There are three places where we lean on
something that isn't strictly public:

| Hack | Why | Escape hatch if upstream changes |
|---|---|---|
| Direct writes to `ggml_tensor::type` / `ne[]` / `nb[]` | No sanctioned mutator exists for post-creation tensor reshape/retype. Struct is public so this works today. | Ask upstream to expose `ggml_tensor_set_{type,shape}` helpers, or introduce them in our compat util and submit a PR. |
| `const_cast<char *>(gguf_get_tensor_name(...))` in `rename_tensor` | The pointer points into a mutable `char[GGML_MAX_NAME]` buffer inside a `std::vector` element; the `const` is API hygiene. Lets us rename gguf tensors without a new public helper. | Add `gguf_rename_tensor` to `gguf.h` (10 lines) and drop the `const_cast`. |
| `llama_model_loader` forward-decl from `src/llama-model-loader.h` | Used only as an opaque pointer key for our skip-prefix registry. Never dereferenced. | Replace with `const void *` in our registry signatures. Zero behavioral change. |

None of these have changed in years. If an upstream bump breaks any of
them, each has a trivial workaround. See the top of
`llama-ollama-compat-util.h` for the inline notes.

## Documented hacks inside per-arch handlers

- **`reclaim_slot_as` (qwen35moe patch_embed split)** — repurposes an
orphaned `v.blk.0.attn_k` slot (left over after the QKV merge) as a
newly-synthesized `v.patch_embd.weight.1`. Needed because clip.cpp's
`ctx_meta` is sized for exactly the original tensor count (no_alloc
branch of `gguf_init_from_file` uses `n_tensors * ggml_tensor_overhead()`
with zero slack). Comments in the helper and at the call site explain
the reasoning; the replacement would be a one-line upstream patch that
adds a little slack to the ctx size.
- **Load-op registry overrides `file_offset`** — `maybe_load_tensor` gets
passed the gguf offset by its caller but ignores it when a registered
op exists. Intentional: the ops capture their own source offsets at
translate time (before our renames invalidate them). Documented in the
op-registration helpers.

llama-ollama-compat-util.h

@@ -7,6 +7,32 @@
// definitions live in llama-ollama-compat-util.cpp, which also owns the
// registry globals (tensor skip list, load-op table) that need a single
// translation unit.
//
// ---- Non-public API dependencies (see also README.md "Maintenance") ----
//
// Mostly public: gguf_* and ggml_* accessors from ggml/include/ are all
// stable. `ggml_backend_*` and `ggml_fp16_to_fp32` are stable too.
//
// Three pieces we rely on that aren't strictly guaranteed public:
//
// 1. Direct writes to `ggml_tensor::type`, `ne[]`, `nb[]` — the struct is
// public and fields are spec'd, but there's no sanctioned mutator for
// them post-creation. Used in set_tensor_type / set_tensor_shape /
// reclaim_slot_as. Risk: upstream could in principle introduce an
// opaque-tensor mode; in practice these fields haven't changed in years.
//
// 2. `const_cast<char *>(gguf_get_tensor_name(...))` in rename_tensor.
// The pointer returned points into a mutable char[GGML_MAX_NAME]
// buffer inside a std::vector element. Defined behavior as long as
// upstream keeps name storage in-line (has done so forever).
//
// 3. `llama_model_loader` forward decl from src/llama-model-loader.h
// (internal, not llama.h). Only used as an opaque pointer key for
// the skip-prefix registry — we never dereference it. Could swap for
// `const void *` if upstream ever moved that type around.
//
// All three are trivially replaceable if upstream changes out from under
// us. See llama/compat/README.md for the escape hatches.
#include <cstddef>
#include <cstdint>