llama/compat: handle null buft in maybe_load_tensor

Crash repro: glm-ocr (and any text-side handler that registers a
load_op via register_concat_load) hits

  GGML_ASSERT(buft) failed
  ggml_backend_buft_get_device + 0
  llama_ollama_compat::maybe_load_tensor + 168
  llama_ollama_compat::maybe_load_text_tensor + 192

during load_all_data when a tensor has cur->buffer == nullptr (e.g.
tied output reusing token_embd's storage, or any tensor that hasn't
been bound to a backend buffer yet at load time). maybe_load_text_tensor
correctly passes buft = nullptr in that case, but the subsequent call
to ggml_backend_buft_is_host(buft) inside maybe_load_tensor asserts
buft is non-null.

Fix: when buft is null, treat the tensor as host-resident (cur->data
is the write target and is already host-allocated). If cur->data is
also null there is no destination to write to; log an error and return
false so upstream's normal load path can try.
Author: jmorganca
Date: 2026-04-19 21:57:22 -07:00
Parent: fd98ffa1e6
Commit: cc7bdf0bcc
@@ -1972,8 +1972,19 @@ bool maybe_load_tensor(ggml_tensor * cur,
         return false;
     }
-    if (ggml_backend_buft_is_host(buft)) std::memcpy(cur->data, dst.data(), dst_size);
-    else ggml_backend_tensor_set(cur, dst.data(), 0, dst_size);
+    // buft can be null for tensors not yet bound to a backend buffer (e.g.
+    // tied output reusing token_embd's storage). In that case the tensor
+    // already has a host-side data pointer — write to it directly.
+    const bool is_host = !buft || ggml_backend_buft_is_host(buft);
+    if (is_host) {
+        if (!cur->data) {
+            LLAMA_LOG_ERROR("%s: no destination for %s (no buffer, no data)\n", __func__, ggml_get_name(cur));
+            return false;
+        }
+        std::memcpy(cur->data, dst.data(), dst_size);
+    } else {
+        ggml_backend_tensor_set(cur, dst.data(), 0, dst_size);
+    }
     LLAMA_LOG_INFO("%s: %s for %s (%zu bytes)\n", __func__, op.description, ggml_get_name(cur), dst_size);
     return true;