LibreChat/packages/api
Danny Avila 2624c18633
🚫 fix: Reject Binary Files in read_file Sandbox Fallback (No More Mojibake) (#12851)
* 🚫 fix: Reject Binary Files in read_file Sandbox Fallback (No More Mojibake)

`read_file("/mnt/data/simple_graph.png")` was shelling `cat` through codeapi
`/exec` and shipping the result back to the model. codeapi's transport is
JSON, so `stdout` containing PNG bytes round-tripped through lossy UTF-8
replacement (every non-ASCII byte became U+FFFD), got line-numbered by
`addLineNumbers`, and arrived in the model's context as a multi-KB blob of
`�PNG\r\n  2 | ...`. The bytes were unrecoverable — and the same
codeapi sandbox logged the base64-style mojibake too — so the goal is
fail-fast, not retrieval.

Two guards in `handleSandboxFileFallback`, both bypassed by the existing
text path:

  1. **Extension precheck** (BEFORE any network call) for known-binary
     types: images, documents (pdf/docx/xlsx/etc), archives, audio, video,
     native libs, fonts, and a few other byte-soup formats. The message
     for image extensions points at the existing chat attachment ("the
     image is already shown to the user, use bash_tool for programmatic
     processing"); other binaries get the generic "use bash" hint.

  2. **NUL-byte sniff** (AFTER the read) on the first 8KB for unknown
     extensions or no-extension paths. The codeapi `/exec` JSON encoding
     mangles most non-UTF-8 bytes but a NUL terminator from a magic
     header survives the round-trip, so this catches novel binary
     formats without an extension precheck.

`lowercaseExtension` uses the basename to avoid false-triggering on
directory-name dots (e.g. `proj.v1/notes` has no extension, not `.v1/notes`).

+6 tests:
  - Image rejected by extension, never calls readSandboxFile, message
    points at the existing attachment.
  - Non-image binary (.zip) rejected with a different (bash-only) message.
  - Case-insensitive extension match (.PNG vs .png).
  - NUL-byte sniff catches unknown-extension binary post-fetch.
  - Text files with binary-adjacent extensions (.txt) still readable.
  - Dotted directory names don't false-trigger the extension match.

38/38 handlers.spec.ts pass.

The companion bash "command not found" issue from the same conversation
is a separate LLM mistake (writing raw Python as the bash command without
`python3 -c` / heredoc wrapper). Not coded here — flagged to the user.

* 🖼️ fix: Allow SVG read_file (XML text, no mojibake risk) — codex review P2

`.svg` was bucketed with raster-image extensions in
`BINARY_EXTENSIONS_NEVER_READABLE`, which made `handleSandboxFileFallback`
reject every SVG before calling readSandboxFile. SVG is an XML text
format — there's no mojibake risk for normal content, and the model has
legitimate reasons to inspect or edit a generated SVG (tweaking colors,
paths, viewBox, etc.). Block was a regression for valid read_file usage.

Remove `.svg` from both `BINARY_EXTENSIONS_NEVER_READABLE` (so it routes
through the normal sandbox read path) and `IMAGE_EXTENSIONS_FOR_HINT`
(now-dead entry — only used by the rejection-message picker). The
post-fetch NUL-byte sniff still catches anything that turns out to be
binary despite a `.svg` extension.

+1 regression test that an SVG with valid XML content reads through
successfully (`<svg>...<circle/>...</svg>` → status: 'success', content
contains `<svg`/`viewBox`).
2026-04-27 20:47:38 -04:00
..
src 🚫 fix: Reject Binary Files in read_file Sandbox Fallback (No More Mojibake) (#12851) 2026-04-27 20:47:38 -04:00
types 🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451) 2026-03-28 21:06:39 -04:00
.gitignore
babel.config.cjs
jest.config.mjs 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
jest.setup.cjs 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
package.json 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
rollup.config.js 🔄 refactor: Migrate Cache Logic to TypeScript (#9771) 2025-10-02 09:33:58 -04:00
tsconfig-paths-bootstrap.mjs
tsconfig.build.json 🧑‍💻 refactor: Secure Field Selection for 2FA & API Build Sourcemap (#9087) 2025-08-15 18:55:49 -04:00
tsconfig.json 📦 chore: Update TypeScript Config for TS v7 (#12794) 2026-04-23 12:51:03 -04:00
tsconfig.spec.json 📦 chore: Update TypeScript Config for TS v7 (#12794) 2026-04-23 12:51:03 -04:00