llama/compat: handle null buft in maybe_load_tensor

Crash repro: glm-ocr (and any text-side handler that registers a
load_op via register_concat_load) hits

  GGML_ASSERT(buft) failed
  ggml_backend_buft_get_device + 0
  llama_ollama_compat::maybe_load_tensor + 168
  llama_ollama_compat::maybe_load_text_tensor + 192

during load_all_data when a tensor has cur->buffer == nullptr (e.g.
tied output reusing token_embd's storage, or any tensor that hasn't
been bound to a backend buffer yet at load time). maybe_load_text_tensor
correctly passes buft = nullptr in that case, but the subsequent call
to ggml_backend_buft_is_host(buft) inside maybe_load_tensor asserts
buft is non-null.

Fix: when buft is null, treat the tensor as host-resident (cur->data
is the write target and is already host-allocated). If cur->data is
also null there is no destination to write to; log an error and return
false so upstream's normal load path can try.
Author: jmorganca
Date: 2026-04-19 21:57:22 -07:00
Parent: fd98ffa1e6
Commit: cc7bdf0bcc
@@ -1972,8 +1972,19 @@ bool maybe_load_tensor(ggml_tensor * cur,
         return false;
     }
-    if (ggml_backend_buft_is_host(buft)) std::memcpy(cur->data, dst.data(), dst_size);
-    else ggml_backend_tensor_set(cur, dst.data(), 0, dst_size);
+    // buft can be null for tensors not yet bound to a backend buffer (e.g.
+    // tied output reusing token_embd's storage). In that case the tensor
+    // already has a host-side data pointer — write to it directly.
+    const bool is_host = !buft || ggml_backend_buft_is_host(buft);
+    if (is_host) {
+        if (!cur->data) {
+            LLAMA_LOG_ERROR("%s: no destination for %s (no buffer, no data)\n", __func__, ggml_get_name(cur));
+            return false;
+        }
+        std::memcpy(cur->data, dst.data(), dst_size);
+    } else {
+        ggml_backend_tensor_set(cur, dst.data(), 0, dst_size);
+    }
     LLAMA_LOG_INFO("%s: %s for %s (%zu bytes)\n", __func__, op.description, ggml_get_name(cur), dst_size);
     return true;