LibreChat/packages/api
Danny Avila eb20d8805d
🐛 refactor: anchor code-generated file lookup on threadFileIds for branched conversations (#13004)
* 🐛 fix: anchor getCodeGeneratedFiles on threadFileIds, not threadMessageIds

In a branched conversation (regenerations producing the same code-output
filename), `getCodeGeneratedFiles` would silently exclude files whose
File-record `messageId` lived on a sibling branch. The user-visible
symptom: "the previous file isn't persisted" — the LLM tries
`load_workbook("output.xlsx")` on turn 2 and gets `FileNotFoundError`
because LC sent `_injected_files: []` to codeapi instead of priming
the prior turn's output.

`claimCodeFile` is keyed by `(filename, conversationId, context)` —
not by messageId. When sibling A first creates `output.csv`, the File
record persists with `messageId = A`. When sibling N (a regeneration
of A's parent) recreates `output.csv`, the claim finds A's record and
`processCodeOutput` deliberately preserves `messageId = A` to keep
file→original-creator provenance intact (correct behavior for the
linear case where the original creator is in-thread).
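
The claim-and-preserve behavior described above can be sketched with an in-memory map. This is a minimal illustration, not LibreChat's implementation: `claimCodeFileSketch`, `claimKey`, `store`, and the `FileRecord` shape are hypothetical, and the sketch folds the `processCodeOutput` "preserve original messageId" step into the claim itself.

```typescript
// Hypothetical in-memory stand-in for the File collection (assumption, not the real schema).
interface FileRecord {
  file_id: string;
  filename: string;
  conversationId: string;
  context: string;
  messageId: string;
}

const store = new Map<string, FileRecord>();

// The claim key deliberately omits messageId.
function claimKey(filename: string, conversationId: string, context: string): string {
  return JSON.stringify([filename, conversationId, context]);
}

function claimCodeFileSketch(
  filename: string,
  conversationId: string,
  context: string,
  messageId: string,
): FileRecord {
  const key = claimKey(filename, conversationId, context);
  const existing = store.get(key);
  if (existing) {
    // Re-claim by a sibling: the record keeps the original creator's messageId.
    return existing;
  }
  const created: FileRecord = {
    file_id: `file-${store.size + 1}`,
    filename,
    conversationId,
    context,
    messageId,
  };
  store.set(key, created);
  return created;
}

// Sibling A creates output.csv, then sibling N recreates it.
const first = claimCodeFileSketch('output.csv', 'convo-1', 'execute_code', 'A');
const again = claimCodeFileSketch('output.csv', 'convo-1', 'execute_code', 'N');
```

In this sketch `again` is the same record as `first`, so its `messageId` is still `A` even though N produced the latest copy.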

Turn N+1 has `parentMessageId = N`. `getThreadData` walks back from N:
the thread is `[N, root]` — sibling A is NOT in it. The pre-fix query
filtered by `messageId IN [N, root]`, so the file was excluded.
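
A small model of that walk shows why the pre-fix filter misses the file. The `Message` and `FileRecord` shapes and the `walkThread` helper are assumptions made for illustration (the real `getThreadData` and its query are not reproduced here), and the sketch assumes N's `files[]` references the same `file_id` that A first created.

```typescript
// Hypothetical shapes for illustration only.
interface Message {
  messageId: string;
  parentMessageId: string | null;
  files: string[]; // file_ids referenced by this message
}

interface FileRecord {
  file_id: string;
  messageId: string; // original creator
}

// Walk from the leaf to the root, collecting message ids and referenced file ids.
function walkThread(messages: Message[], leafId: string): { messageIds: string[]; fileIds: string[] } {
  const byId = new Map<string, Message>();
  for (const m of messages) {
    byId.set(m.messageId, m);
  }
  const messageIds: string[] = [];
  const fileIds = new Set<string>();
  let current: Message | undefined = byId.get(leafId);
  while (current) {
    messageIds.push(current.messageId);
    current.files.forEach((id) => fileIds.add(id));
    current = current.parentMessageId ? byId.get(current.parentMessageId) : undefined;
  }
  return { messageIds, fileIds: Array.from(fileIds) };
}

const messages: Message[] = [
  { messageId: 'root', parentMessageId: null, files: [] },
  { messageId: 'A', parentMessageId: 'root', files: ['file-1'] },
  { messageId: 'N', parentMessageId: 'root', files: ['file-1'] },
];
const files: FileRecord[] = [{ file_id: 'file-1', messageId: 'A' }];

const thread = walkThread(messages, 'N');
const threadMessageIds = new Set(thread.messageIds);

// Pre-fix style filter: keyed on the File's own creator messageId.
const preFix = files.filter((f) => threadMessageIds.has(f.messageId));
```

Here `thread.messageIds` is `['N', 'root']`, so `preFix` is empty, while `thread.fileIds` still contains `file-1`.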

`getCodeGeneratedFiles` already lives next to `getUserCodeFiles`,
which has always filtered by `file_id IN threadFileIds` (the file_ids
referenced by `messages.files[]` arrays during the thread walk). The
asymmetry — user-uploaded files anchored on the message's reference,
code-generated files anchored on the File's own creator — was the
bug. Anchoring both functions on `threadFileIds` reaches the right
files regardless of which sibling first generated them.

`File.messageId` stays informational ("who first generated this") for
provenance and `processCodeOutput`'s "preserve original messageId on
update" logic stays as-is — only the lookup key for thread-scoped
fetches changes.
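
As a rough sketch of the new lookup key, the filter below works over an in-memory array instead of a database query. `getCodeGeneratedFilesSketch`, its parameter order, and the `FileRecord` shape are assumptions, and `'message_attachment'` is used only as an example of a non-`execute_code` context.

```typescript
// Hypothetical record shape; messageId is carried but never used for filtering.
interface FileRecord {
  file_id: string;
  conversationId: string;
  context: string;
  messageId: string; // informational: who first generated the file
}

function getCodeGeneratedFilesSketch(
  files: FileRecord[],
  conversationId: string,
  threadFileIds?: string[],
): FileRecord[] {
  // Empty or missing threadFileIds short-circuits to an empty result.
  if (!threadFileIds || threadFileIds.length === 0) {
    return [];
  }
  const wanted = new Set(threadFileIds);
  return files.filter(
    (f) =>
      f.conversationId === conversationId &&
      f.context === 'execute_code' &&
      wanted.has(f.file_id),
  );
}

const files: FileRecord[] = [
  // Created on sibling branch A, referenced by the current thread.
  { file_id: 'file-1', conversationId: 'convo-1', context: 'execute_code', messageId: 'A' },
  // Chat attachment: in the thread, but not code-generated.
  { file_id: 'file-2', conversationId: 'convo-1', context: 'message_attachment', messageId: 'N' },
  // Same context, different conversation.
  { file_id: 'file-3', conversationId: 'convo-2', context: 'execute_code', messageId: 'X' },
];

const found = getCodeGeneratedFilesSketch(files, 'convo-1', ['file-1', 'file-2', 'file-3']);
const none = getCodeGeneratedFilesSketch(files, 'convo-1', []);
```

Only `file-1` survives all three conditions, regardless of its creator living on a sibling branch.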

- `packages/data-schemas/src/methods/file.ts`: signature + filter
  change. JSDoc spells out the branched-conversation rationale.
- `packages/api/src/agents/initialize.ts`: pass `threadFileIds` instead
  of `threadMessageIds`. The local `threadMessageIds` declaration is
  removed since the only consumer is gone.
- `packages/data-schemas/src/methods/file.spec.ts`: 5 new cases:
  - basic happy-path (file referenced by current thread)
  - **the regression**: file's creator messageId is on a sibling
    branch but file_id is in threadFileIds → finds it
  - empty/missing threadFileIds returns []
  - cross-conversation isolation
  - non-execute_code context filter still applies (a chat attachment
    won't be returned even if its file_id is in threadFileIds —
    that's `getUserCodeFiles`'s job)

Applies cleanly on top of dev. When LC #12960 (the typed CodeEnvRef
cutover) lands, the only conflict is the legacy `metadata.fileIdentifier`
key flipping to `metadata.codeEnvRef`: same line, trivial to
resolve.

- [x] `cd packages/data-schemas && npx jest src/methods/file.spec` —
  42/42 pass (including the 5 new regression cases)
- [x] `cd packages/api && npx jest src/agents` — 722/722 pass
  (excluding 2 pre-existing, unrelated summarization e2e failures)
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  432/432 pass
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean
- [ ] Manual: branched conversation reproducer — generate a file in
  turn 1, regenerate the parent (sibling), then in turn N+1 ask the
  agent to read the file. Pre-fix: `FileNotFoundError`. Post-fix:
  the file is primed and `load_workbook` succeeds.

* 🧪 test: lock initialize.ts → getCodeGeneratedFiles call shape

Integration-level regression test asserting initializeAgent passes
`threadFileIds` (not `threadMessageIds`) to getCodeGeneratedFiles
in branched-conversation scenarios. Locks in the API shape from the
previous commit, sitting one layer above the data-schemas unit test —
so a future refactor to the priming chain can't silently revert to
the messageId-based filter without surfacing a test failure here.

Two cases:
- The full call shape: agent.tools=['execute_code'], resendFiles=true,
  threadData mock returns distinct messageIds and fileIds. Asserts the
  call uses fileIds, and that getUserCodeFiles uses the same array
  (the symmetric design that closes the sibling-branch hole).
- Empty threadFileIds: getCodeGeneratedFiles is still called with []
  (its own internal early-return handles the empty case); getUserCodeFiles
  is gated at the call site and stays unscheduled.
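
The call shape these two cases pin down might look roughly like the following. Everything here is a hypothetical stand-in: `primeThreadFiles`, the `Lookup` signature, and the recording mocks are not taken from `initialize.ts`, whose real signatures may differ.

```typescript
// Hypothetical lookup signature shared by both functions (assumption).
type Lookup = (conversationId: string, threadFileIds: string[]) => Promise<string[]>;

async function primeThreadFiles(
  conversationId: string,
  threadFileIds: string[],
  getCodeGeneratedFiles: Lookup,
  getUserCodeFiles: Lookup,
): Promise<string[]> {
  // Always scheduled; the lookup handles [] with its own early return.
  const pending: Promise<string[]>[] = [getCodeGeneratedFiles(conversationId, threadFileIds)];
  // Gated at the call site: skipped when the thread references no files.
  if (threadFileIds.length > 0) {
    pending.push(getUserCodeFiles(conversationId, threadFileIds));
  }
  const results = await Promise.all(pending);
  return results.reduce<string[]>((all, r) => all.concat(r), []);
}

// Recording mocks: each pushes its call synchronously before returning.
const calls: { name: string; ids: string[] }[] = [];
const mockGenerated: Lookup = async (_conversationId, ids) => {
  calls.push({ name: 'getCodeGeneratedFiles', ids });
  return [];
};
const mockUser: Lookup = async (_conversationId, ids) => {
  calls.push({ name: 'getUserCodeFiles', ids });
  return [];
};

// Case 1: empty threadFileIds, only the code-generated lookup is called.
void primeThreadFiles('convo-1', [], mockGenerated, mockUser);
// Case 2: both lookups receive the same threadFileIds array.
const threadFileIds = ['file-1'];
void primeThreadFiles('convo-1', threadFileIds, mockGenerated, mockUser);
```

Passing the lookups in as parameters is only a convenience for the sketch; it keeps the example self-contained without a mocking framework.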
2026-05-08 12:29:44 -04:00
src 🐛 refactor: anchor code-generated file lookup on threadFileIds for branched conversations (#13004) 2026-05-08 12:29:44 -04:00
types 🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451) 2026-03-28 21:06:39 -04:00
.gitignore
babel.config.cjs
jest.config.mjs 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
jest.setup.cjs 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
package.json 🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960) 2026-05-08 12:29:43 -04:00
rollup.config.js 🔄 refactor: Migrate Cache Logic to TypeScript (#9771) 2025-10-02 09:33:58 -04:00
tsconfig-paths-bootstrap.mjs
tsconfig.build.json
tsconfig.json 📦 chore: npm audit fixes and Mongoose 8.23 TypeScript follow-ups (#12996) 2026-05-07 09:47:40 -04:00
tsconfig.spec.json 📦 chore: Update TypeScript Config for TS v7 (#12794) 2026-04-23 12:51:03 -04:00