Mirror of https://github.com/danny-avila/LibreChat.git, synced 2026-06-09 17:31:19 +00:00
* 🐛 fix: anchor getCodeGeneratedFiles on threadFileIds, not threadMessageIds

In a branched conversation (regenerations producing the same code-output filename), `getCodeGeneratedFiles` silently excluded files whose File record's `messageId` lived on a sibling branch. The user-visible symptom was "the previous file isn't persisted": the LLM calls `load_workbook("output.xlsx")` on turn 2 and gets `FileNotFoundError`, because LC sent `_injected_files: []` to codeapi instead of priming the prior turn's output.

`claimCodeFile` is keyed by `(filename, conversationId, context)`, not by messageId. When sibling A first creates `output.csv`, the File record is persisted with `messageId = A`. When sibling N (a regeneration from A's parent) recreates `output.csv`, the claim finds A's record, and `processCodeOutput` deliberately preserves `messageId = A` to keep file→original-creator provenance intact. That is the correct behavior for the linear case, where the original creator is in-thread.

Turn N+1 has `parentMessageId = N`. `getThreadData` walks back from N, so the thread is `[N, root]`; sibling A is NOT in it. The pre-fix query filtered by `messageId IN [N, root]`, so the file was excluded.

`getCodeGeneratedFiles` already lives next to `getUserCodeFiles`, which has always filtered by `file_id IN threadFileIds` (the file_ids referenced by the `messages.files[]` arrays collected during the thread walk). The asymmetry was the bug: user-uploaded files were anchored on the message's reference, while code-generated files were anchored on the File record's own creator. Anchoring both functions on `threadFileIds` reaches the right files regardless of which sibling first generated them.

`File.messageId` stays informational ("who first generated this") for provenance, and `processCodeOutput`'s "preserve original messageId on update" logic stays as-is. Only the lookup key for thread-scoped fetches changes.

- `packages/data-schemas/src/methods/file.ts`: signature and filter change. The JSDoc spells out the branched-conversation rationale.
- `packages/api/src/agents/initialize.ts`: pass `threadFileIds` instead of `threadMessageIds`. The local `threadMessageIds` declaration is removed, since its only consumer is gone.
- `packages/data-schemas/src/methods/file.spec.ts`: 5 new cases:
  - basic happy path (file referenced by the current thread)
  - **the regression**: the file's creator messageId is on a sibling branch, but its file_id is in threadFileIds → found
  - empty or missing threadFileIds returns []
  - cross-conversation isolation
  - the non-execute_code context filter still applies (a chat attachment is not returned even if its file_id is in threadFileIds; that is `getUserCodeFiles`'s job)

Applies cleanly on top of dev. When LC #12960 (the typed CodeEnvRef cutover) lands, the only conflict is the legacy `metadata.fileIdentifier` key flipping to `metadata.codeEnvRef`: same line, trivial to resolve.

- [x] `cd packages/data-schemas && npx jest src/methods/file.spec`: 42/42 pass (including the 5 new regression cases)
- [x] `cd packages/api && npx jest src/agents`: 722/722 pass (modulo 2 pre-existing, unrelated summarization e2e failures)
- [x] `cd api && npx jest server/services/Files server/controllers/agents`: 432/432 pass
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json`: clean
- [ ] Manual: branched-conversation reproducer. Generate a file in turn 1, regenerate from the parent to create a sibling, then in turn N+1 ask the agent to read the file. Pre-fix: `FileNotFoundError`. Post-fix: the file is primed and `load_workbook` succeeds.

* 🧪 test: lock initialize.ts → getCodeGeneratedFiles call shape

Integration-level regression test asserting that initializeAgent passes `threadFileIds` (not `threadMessageIds`) to getCodeGeneratedFiles in branched-conversation scenarios. It locks in the API shape from the previous commit one layer above the data-schemas unit test, so a future refactor of the priming chain can't silently revert to the messageId-based filter without a test failure surfacing here.
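To make the sibling-branch hole concrete, here is a minimal in-memory model of the scenario from the fix commit. It is a sketch under stated assumptions, not LibreChat code: `FileRecord`, `Message`, `walkThread`, `byMessageId`, and `byFileId` are hypothetical names, and the real lookups are database queries in `packages/data-schemas`. Sibling A creates the file, regenerated sibling N reuses the same record (with `messageId = A` preserved) and references it in its `files[]`, and the next turn walks back from N.

```typescript
// Hypothetical, simplified shapes; the real schemas carry many more fields.
interface FileRecord {
  file_id: string;
  filename: string;
  conversationId: string;
  messageId: string; // provenance only: "who first generated this"
  context: string; // e.g. "execute_code"
}

interface Message {
  messageId: string;
  parentMessageId: string | null;
  files?: { file_id: string }[];
}

// Walk from a parent message back to the root, collecting the message ids on
// the path and the file_ids referenced by each message's files[] array.
function walkThread(messages: Record<string, Message>, parentMessageId: string) {
  const threadMessageIds: string[] = [];
  const threadFileIds: string[] = [];
  let current: Message | undefined = messages[parentMessageId];
  while (current) {
    threadMessageIds.push(current.messageId);
    for (const f of current.files ?? []) {
      threadFileIds.push(f.file_id);
    }
    current = current.parentMessageId ? messages[current.parentMessageId] : undefined;
  }
  return { threadMessageIds, threadFileIds };
}

// Shared conversation + context constraint used by both filters.
function isCodeFileIn(f: FileRecord, conversationId: string): boolean {
  return f.conversationId === conversationId && f.context === "execute_code";
}

// Pre-fix shape: anchored on the File record's own creator messageId.
function byMessageId(files: FileRecord[], conversationId: string, threadMessageIds: string[]): FileRecord[] {
  return files.filter((f) => isCodeFileIn(f, conversationId) && threadMessageIds.includes(f.messageId));
}

// Post-fix shape: anchored on the file_ids the thread actually references.
function byFileId(files: FileRecord[], conversationId: string, threadFileIds: string[]): FileRecord[] {
  if (threadFileIds.length === 0) return [];
  return files.filter((f) => isCodeFileIn(f, conversationId) && threadFileIds.includes(f.file_id));
}

// Scenario: A and N are siblings under root; N is the regeneration.
const messages: Record<string, Message> = {
  root: { messageId: "root", parentMessageId: null },
  A: { messageId: "A", parentMessageId: "root", files: [{ file_id: "file-1" }] },
  N: { messageId: "N", parentMessageId: "root", files: [{ file_id: "file-1" }] },
};
// One record, claimed by filename: its messageId still points at A.
const files: FileRecord[] = [
  { file_id: "file-1", filename: "output.csv", conversationId: "convo-1", messageId: "A", context: "execute_code" },
];

// Turn N+1 has parentMessageId = N, so the walk yields ["N", "root"].
const { threadMessageIds, threadFileIds } = walkThread(messages, "N");
const before = byMessageId(files, "convo-1", threadMessageIds); // empty: A is off-path
const after = byFileId(files, "convo-1", threadFileIds); // contains file-1
```

The messageId-anchored filter returns nothing because A is not on the walked path, while the file_id-anchored filter reaches the same record through N's `files[]` reference.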
Two cases:
- The full call shape: `agent.tools=['execute_code']`, `resendFiles=true`, and a threadData mock that returns distinct messageIds and fileIds. Asserts that the call uses the fileIds, and that getUserCodeFiles receives the same array (the symmetric design that closes the sibling-branch hole).
- Empty threadFileIds: getCodeGeneratedFiles is still called with [] (its own internal early return handles the empty case); getUserCodeFiles is gated at the call site and is never invoked.
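The call-site behavior these two cases pin down can be sketched as follows. This is a simplified, synchronous model, not the real `initializeAgent`: `primeCodeFiles`, `PrimingDeps`, and `recorder` are hypothetical, the two lookups are reduced to `(fileIds) => string[]` (the real ones are async database queries), and the `execute_code`/`resendFiles` preconditions are omitted. The recording stubs play the role of jest mocks so the call arguments can be inspected.

```typescript
// Hypothetical stand-in signature for both lookups.
type Lookup = (fileIds: string[]) => string[];

interface PrimingDeps {
  getCodeGeneratedFiles: Lookup;
  getUserCodeFiles: Lookup;
}

// Code-generated files are always requested (the lookup handles [] itself);
// user code files are only requested when the thread references a file.
// Both lookups receive the same threadFileIds array.
function primeCodeFiles(deps: PrimingDeps, threadFileIds: string[]): string[] {
  const generated = deps.getCodeGeneratedFiles(threadFileIds);
  const uploaded = threadFileIds.length > 0 ? deps.getUserCodeFiles(threadFileIds) : [];
  return [...generated, ...uploaded];
}

// Recording stub, analogous to a jest.fn() mock; models the internal
// early return by answering [] for an empty id list.
function recorder(result: string[]) {
  const calls: string[][] = [];
  const fn: Lookup = (ids) => {
    calls.push(ids);
    return ids.length === 0 ? [] : result;
  };
  return { fn, calls };
}

// Case 1: distinct message ids and file ids; both lookups get the file ids.
const threadMessageIds = ["msg-N", "msg-root"];
const threadFileIds = ["file-1"];
const gen1 = recorder(["output.xlsx"]);
const user1 = recorder(["data.csv"]);
const primed = primeCodeFiles({ getCodeGeneratedFiles: gen1.fn, getUserCodeFiles: user1.fn }, threadFileIds);

// Case 2: empty thread; the generated-files lookup still runs with [].
const gen2 = recorder(["unused"]);
const user2 = recorder(["unused"]);
primeCodeFiles({ getCodeGeneratedFiles: gen2.fn, getUserCodeFiles: user2.fn }, []);
```

In this sketch, passing one `threadFileIds` array to both lookups is what makes the design symmetric, and leaving the empty case to `getCodeGeneratedFiles` keeps a single guard at the call site.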

| Name |
|---|
| src |
| types |
| .gitignore |
| babel.config.cjs |
| jest.config.mjs |
| jest.setup.cjs |
| package.json |
| rollup.config.js |
| tsconfig-paths-bootstrap.mjs |
| tsconfig.build.json |
| tsconfig.json |
| tsconfig.spec.json |