mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-06-09 17:31:19 +00:00
* 🚫 fix: Reject Binary Files in read_file Sandbox Fallback (No More Mojibake)

  `read_file("/mnt/data/simple_graph.png")` was shelling `cat` through codeapi `/exec` and shipping the result back to the model. codeapi's transport is JSON, so `stdout` containing PNG bytes round-tripped through lossy UTF-8 replacement (every non-ASCII byte became U+FFFD), got line-numbered by `addLineNumbers`, and arrived in the model's context as a multi-KB blob of `�PNG\r\n 2 | ...`. The bytes were unrecoverable (and the same codeapi sandbox logged the base64-style mojibake too), so the goal is fail-fast, not retrieval.

  Two guards in `handleSandboxFileFallback`, both bypassed by the existing text path:

  1. **Extension precheck** (BEFORE any network call) for known-binary types: images, documents (pdf/docx/xlsx/etc), archives, audio, video, native libs, fonts, and a few other byte-soup formats. The message for image extensions points at the existing chat attachment ("the image is already shown to the user, use bash_tool for programmatic processing"); other binaries get the generic "use bash" hint.
  2. **NUL-byte sniff** (AFTER the read) on the first 8KB for unknown extensions or no-extension paths. The codeapi `/exec` JSON encoding mangles most non-UTF-8 bytes, but a NUL terminator from a magic header survives the round-trip, so this catches novel binary formats without an extension precheck.

  `lowercaseExtension` uses the basename to avoid false-triggering on directory-name dots (e.g. `proj.v1/notes` has no extension, not `.v1/notes`).

  +6 tests:

  - Image rejected by extension, never calls readSandboxFile, message points at the existing attachment.
  - Non-image binary (.zip) rejected with a different (bash-only) message.
  - Case-insensitive extension match (.PNG vs .png).
  - NUL-byte sniff catches unknown-extension binary post-fetch.
  - Text files with binary-adjacent extensions (.txt) still readable.
  - Dotted directory names don't false-trigger the extension match.

  38/38 handlers.spec.ts pass.

  The companion bash "command not found" issue from the same conversation is a separate LLM mistake (writing raw Python as the bash command without `python3 -c` / heredoc wrapper). Not coded here; flagged to the user.

* 🖼️ fix: Allow SVG read_file (XML text, no mojibake risk), codex review P2

  `.svg` was bucketed with raster-image extensions in `BINARY_EXTENSIONS_NEVER_READABLE`, which made `handleSandboxFileFallback` reject every SVG before calling readSandboxFile. SVG is an XML text format: there's no mojibake risk for normal content, and the model has legitimate reasons to inspect or edit a generated SVG (tweaking colors, paths, viewBox, etc.). The block was a regression for valid read_file usage.

  Remove `.svg` from both `BINARY_EXTENSIONS_NEVER_READABLE` (so it routes through the normal sandbox read path) and `IMAGE_EXTENSIONS_FOR_HINT` (now-dead entry, only used by the rejection-message picker). The post-fetch NUL-byte sniff still catches anything that turns out to be binary despite a `.svg` extension.

  +1 regression test that an SVG with valid XML content reads through successfully (`<svg>...<circle/>...</svg>` → status: 'success', content contains `<svg`/`viewBox`).
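The two guards and the basename-based extension lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `lowercaseExtension`, `BINARY_EXTENSIONS_NEVER_READABLE`, and `IMAGE_EXTENSIONS_FOR_HINT` come from the commit message, but their contents here are assumptions, and `rejectionForExtension`, `looksBinary`, and `SNIFF_LIMIT` are hypothetical helpers introduced only for this example. The extension lists are deliberately short, and the rejection wording is paraphrased.

```typescript
// Hypothetical sketch: names mirror the commit message, bodies are assumptions.
// Note that ".svg" is absent from both lists, matching the second commit.
const IMAGE_EXTENSIONS_FOR_HINT: string[] = [".png", ".jpg", ".jpeg", ".gif", ".webp"];
const BINARY_EXTENSIONS_NEVER_READABLE: string[] = IMAGE_EXTENSIONS_FOR_HINT.concat([
  ".pdf", ".docx", ".xlsx", ".zip", ".mp3", ".mp4", ".so", ".ttf",
]);

// Assumed sniff window: the first 8K characters of the decoded output.
const SNIFF_LIMIT = 8 * 1024;

// Take the extension from the basename only, so a dot in a directory
// name (e.g. "proj.v1/notes") is never mistaken for an extension.
function lowercaseExtension(filePath: string): string {
  const basename = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = basename.lastIndexOf(".");
  // dot <= 0 covers both "no dot" and dotfiles such as ".gitignore"
  // (treating dotfiles as extensionless is a choice made for this sketch).
  return dot <= 0 ? "" : basename.slice(dot).toLowerCase();
}

// Guard 1: runs before any network call.
// Returns a rejection message, or null if the read may proceed.
function rejectionForExtension(filePath: string): string | null {
  const ext = lowercaseExtension(filePath);
  if (BINARY_EXTENSIONS_NEVER_READABLE.indexOf(ext) === -1) {
    return null;
  }
  if (IMAGE_EXTENSIONS_FOR_HINT.indexOf(ext) !== -1) {
    return "The image is already shown to the user; use bash_tool for programmatic processing.";
  }
  return "Binary file; use bash_tool to process it.";
}

// Guard 2: runs after the read, on whatever text came back.
// A NUL character in the sniff window marks the content as binary.
function looksBinary(content: string): boolean {
  return content.slice(0, SNIFF_LIMIT).indexOf("\0") !== -1;
}
```

In this sketch a caller would check `rejectionForExtension(path)` first and skip the sandbox read entirely on a non-null result, then run `looksBinary` on the fetched content for paths that passed the precheck.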
| Name |
|---|
| src |
| types |
| .gitignore |
| babel.config.cjs |
| jest.config.mjs |
| jest.setup.cjs |
| package.json |
| rollup.config.js |
| tsconfig-paths-bootstrap.mjs |
| tsconfig.build.json |
| tsconfig.json |
| tsconfig.spec.json |