🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960)

* 🧱 refactor: typed CodeEnvRef + kind discriminator + tenant-aware sandbox cache

Final cutover for the LibreChat ↔ codeapi sandbox file identity. Replaces
the magic string `${session_id}/${file_id}?entity_id=...` with a typed,
discriminated `CodeEnvRef`. Pre-release lockstep deploy with codeapi
#1455 and agents #148; no legacy aliases retained.

## Final shape

```ts
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user';  id: string; storage_session_id: string; file_id: string };
```

`kind` drives codeapi's sessionKey: `<tenant>:<kind>:<id>[:v:<version>]`
for shared kinds, `<tenant>:user:<userId>` for user-private (auth context
provides `userId`). `version` is statically required for `kind: 'skill'`
and forbidden otherwise via discriminated union — constraint holds at
compile time on every consumer, not just codeapi's runtime validator.

`id` is sessionKey-meaningful for `'skill'` / `'agent'`; informational
only for `'user'` (codeapi resolves user identity from auth context).
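The derivation rule above can be sketched as follows (a hypothetical helper; codeapi's actual implementation lives server-side and only the key shapes are taken from this PR):

```typescript
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };

interface AuthContext {
  tenantId: string;
  userId: string;
}

/** Hypothetical sketch of codeapi's sessionKey derivation; names assumed. */
function deriveSessionKey(ref: CodeEnvRef, auth: AuthContext): string {
  switch (ref.kind) {
    case 'skill':
      // `version` is statically present for 'skill', so no runtime guard needed
      return `${auth.tenantId}:skill:${ref.id}:v:${ref.version}`;
    case 'agent':
      return `${auth.tenantId}:agent:${ref.id}`;
    case 'user':
      // `ref.id` is informational only; identity comes from the auth context
      return `${auth.tenantId}:user:${auth.userId}`;
  }
}
```

Note how the discriminated union makes the `version` handling branch-local: the `'skill'` arm can read `ref.version` without a null check, and the other arms cannot reference it at all.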

## What changed

- `packages/data-provider/src/codeEnvRef.ts` — discriminated union +
  `CODE_ENV_KINDS` const-tuple keeps the runtime list and TS union
  locked together.
- Schemas: `metadata.codeEnvRef` and `SkillFile.codeEnvRef` enums
  tightened to `['skill', 'agent', 'user']`.
- `primeSkillFiles` writes `kind: 'skill'`, `id: skill._id`,
  `version: skill.version`. Cache-hit path reads `codeEnvRef`
  directly. Bumping `skill.version` on edit naturally invalidates
  the prior cache entry under the new sessionKey.
- `processCodeOutput` writes `kind: 'user'`, `id: req.user.id`. Output
  bucket is always user-scoped, regardless of which skill the
  execution invoked. New regression test pins the asymmetry.
- `primeFiles` reupload preserves `kind`/`id`/`version?` from the
  existing ref so a skill-cache-miss reupload doesn't silently demote
  to user bucket.
- `crud.js` upload functions (`uploadCodeEnvFile` /
  `batchUploadCodeEnvFiles`) thread `kind`/`id`/`version?` to the
  multipart form (codeapi #1455 option α). Without these on the wire,
  codeapi falls back to user bucketing and skill-cache invalidation
  never fires. Client-side validation mirrors codeapi's validator.
- `Files/process.js` — chat attachments use `kind: 'user'`; agent
  setup files use `kind: 'agent'`.
- Drops `entity_id` everywhere (struct, schema sub-docs, write paths,
  upload form fields). Drops `'system'` from the kind enum (no emitter
  ever existed).
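The const-tuple pattern mentioned for `codeEnvRef.ts` can be sketched like this (the real module's exact exports are assumed; only the technique is taken from the text):

```typescript
// Sketch of the const-tuple pattern that keeps the runtime kind list and
// the TS union locked together (real module:
// packages/data-provider/src/codeEnvRef.ts; exact exports assumed).
const CODE_ENV_KINDS = ['skill', 'agent', 'user'] as const;
type CodeEnvKind = (typeof CODE_ENV_KINDS)[number]; // 'skill' | 'agent' | 'user'

function isCodeEnvKind(value: string): value is CodeEnvKind {
  return (CODE_ENV_KINDS as readonly string[]).includes(value);
}
```

Adding or removing a kind in the tuple updates both the runtime validator and the compile-time union in one place, which is exactly how the `'system'` kind could be dropped without leaving a dangling enum member.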

## Test plan

- [x] `cd packages/data-provider && npx jest src/codeEnvRef.spec` — 4 / 4
- [x] `cd packages/data-schemas && npx jest` — 1447 / 1447
- [x] `cd packages/api && npx jest src/agents` — 81 / 81 in skillFiles +
  handlers + resources
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  436 / 436
- [x] `cd api && npx jest server/services/Files/Code` — 98 / 98 (incl.
  new "outputs are user-scoped regardless of which skill the execution
  invoked" regression and "reupload forwards kind/id/version from
  existing ref")
- [x] `npx tsc --noEmit -p packages/data-{provider,schemas}/tsconfig.json
  && npx tsc --noEmit -p packages/api/tsconfig.json` — clean (only
  pre-existing unrelated dev errors in storage/balance, untouched here)

## Deploy notes

- **24h cache-miss burst** on first deploy, on both inputs (skill caches
  re-prime under the new sessionKey shape) and outputs (any pre-Phase C
  cached skill-output files become unreadable). Bounded by codeapi's 24h TTL.
- **Lockstep with codeapi #1455 and agents #148.** Either repo can land
  first since no aliases to drain, but the three deploys must overlap
  within the same maintenance window.
- **`@librechat/agents` bump to `3.1.79-dev.0`** required after agents
  #148 lands and is published.

## What this enables

Auth bridge work (JWT-based tenant/user identity between LC and codeapi)
— codeapi now derives the sessionKey purely from
`req.codeApiAuthContext.{tenantId, userId}`, so the next chapter is
replacing the header-asserted user identity with a verified-claim path.

* 🩹 fix: persist execute_code uploads under codeEnvRef metadata key

Codex review P1 (chatgpt-codex-connector). `Files/process.js` was
storing the upload result under `metadata.fileIdentifier` even though:
- `uploadCodeEnvFile` now returns `{ storage_session_id, file_id }`,
  not the legacy magic string.
- The post-cutover schema (`File.metadata.codeEnvRef`) only declares
  `codeEnvRef` — mongoose strict mode silently strips unknown keys.
- All readers (`primeFiles`, `getCodeFilesByIds`,
  `categorizeFileForToolResources`, controller filtering) check
  `metadata.codeEnvRef`.

Net effect of the bug: chat-attached and agent-setup execute_code files
would lose their sandbox reference on save, and primeFiles would skip
them on subsequent code-execution turns — the file blob would still be
available locally but never re-mounted in the sandbox.

Fix: construct the full `CodeEnvRef` (`{ kind, id, storage_session_id,
file_id }`) at the write site and persist under `metadata.codeEnvRef`.
`BaseClient`'s "is this a code-env file" presence check accepts the new
shape alongside the legacy `fileIdentifier` for back-compat with any
pre-cutover records still in the database. Mirrors the same change in
`processAttachments.spec.ts` (which re-implements the BaseClient logic
for testability).

New regression tests in `process.spec.js` cover three cases:
- chat attachments (`messageAttachment=true`) → `kind: 'user'`
- agent setup (`messageAttachment=false`) → `kind: 'agent'`
- legacy `fileIdentifier` key is NOT persisted (would be schema-stripped)
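The write-site fix can be sketched as follows (a hedged sketch; the surrounding function shape in `Files/process.js` is assumed, only the field names and the user/agent split come from the text):

```typescript
interface UploadResult {
  storage_session_id: string;
  file_id: string;
}

// Hedged sketch of the write site described above; helper name is hypothetical.
function buildCodeEnvMetadata(
  upload: UploadResult,
  opts: { messageAttachment: boolean; userId: string; agentId: string },
) {
  const kind = opts.messageAttachment ? ('user' as const) : ('agent' as const);
  const id = kind === 'user' ? opts.userId : opts.agentId;
  // Persist under `codeEnvRef`; mongoose strict mode would silently
  // strip the legacy `fileIdentifier` key from the schema.
  return { codeEnvRef: { kind, id, ...upload } };
}
```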

* 🩹 fix: read storage_session_id on primed file refs (Codex P1)

Codex review (chatgpt-codex-connector). After Phase B's per-file
`session_id` → `storage_session_id` rename, `primeFiles` emits the
new field — but `seedCodeFilesIntoSessions` was still reading
`files[0].session_id` for the representative session and `f.session_id`
for the dedupe key. In runs with only primed attachments (no skill
seed), `representativeSessionId` was `undefined`, the function
returned the unchanged map, and `seedCodeFilesIntoSessions` silently
dropped the entire batch. The first `execute_code` call then started
without `_injected_files` and the agent couldn't see prior-turn
artifacts.

Fix:
- `codeFilesSession.ts`: read `f.storage_session_id` for both the
  dedupe key and the representative session id. JSDoc updated to
  match the new field name.
- `callbacks.js`: the two output-file persistence paths read
  `file.session_id` to pass to `processCodeOutput` — switch to
  `file.storage_session_id`. The original comment explicitly says
  this should be the STORAGE session, which is exactly the field
  Phase B renamed.
- `codeFilesSession.spec.ts`: fixture builder uses `storage_session_id`
  and `kind: 'user'` to match the post-cutover `CodeEnvFile` shape.
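The corrected reads can be sketched as follows (the real `codeFilesSession.ts` logic does more than this; the function name here is hypothetical and only the two field reads come from the text):

```typescript
interface CodeEnvFile {
  storage_session_id: string;
  file_id: string;
}

// Sketch of the corrected reads after the Phase B rename.
function collectPrimedSessions(files: CodeEnvFile[]) {
  // post-rename: read storage_session_id, never the removed session_id
  const representativeSessionId = files[0]?.storage_session_id;
  const dedupeKeys = new Set(files.map((f) => `${f.storage_session_id}/${f.file_id}`));
  return { representativeSessionId, dedupeKeys };
}
```

Before the fix, both reads hit the now-absent `session_id`, so `representativeSessionId` came back `undefined` and the batch was silently dropped.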

Lockstep coordination: this matches the post-bump shape of
`@librechat/agents` 3.1.79+. CI tsc errors against the currently-pinned
3.1.78 are expected and resolve when the dep bumps in this PR before
merge.

* 📦 chore: Bump `@librechat/agents` to version 3.1.80-dev.0 in package-lock and package.json files

* 🪪 fix: thread kind/id/version through codeapi /download URLs (Phase C α)

Symmetric fix for the upload-side wire change in 537725a. Codeapi's
`sessionAuth` middleware now requires `kind`/`id`/`version?` on every
download/freshness URL — without them it 400s with "kind must be one
of: skill, agent, user" before serving the file.

Three sites construct codeapi-side URLs that go through `sessionAuth`:

- `processCodeOutput` (`Files/Code/process.js`): `/download/<sess>/<id>`
  for freshly-generated sandbox outputs. Always `kind: 'user'` +
  `id: req.user.id` — code-output files are always user-private,
  regardless of which skill the run invoked.
- `getSessionInfo` (`Files/Code/process.js`): `/sessions/<sess>/objects/<id>`
  for the 23h freshness check. Pulls kind/id/version straight off the
  `codeEnvRef` already in scope — skill files stay skill-bucketed,
  user files stay user-bucketed.
- `/code/download/:session_id/:fileId` LC route (`routes/files/files.js`):
  proxies to codeapi for manual downloads. Code-output files only on
  this route, so `kind: 'user'` + `id: req.user.id`.

The `getCodeOutputDownloadStream` helper in `crud.js` now takes an
`identity` param, validated by a `buildCodeEnvDownloadQuery` helper
that mirrors `appendCodeEnvFileIdentity`'s shape rules: kind required
from the closed `{skill, agent, user}` set, version required for
'skill' and forbidden otherwise. Bad callers fail fast on the client
instead of round-tripping a 400.

Also cleans up two log-noise sources reported alongside the 400:

- `logAxiosError` in `packages/api/src/utils/axios.ts` was dumping
  `error.response.data` raw. With `responseType: 'arraybuffer'` that's
  a `Buffer` (~4 chars per byte after JSON-serialization); with
  `responseType: 'stream'` it's a `Readable` whose internal state
  serializes the entire ring buffer + socket. New `renderResponseData`
  decodes small buffers as UTF-8 (truncated past 2KB) and stubs streams
  as `'[stream]'`. Diagnostics stay useful, log lines stop being
  megabytes.
- `/code/download` route's catch was bare `logger.error('...', error)`,
  bypassing the redactor. Switched to `logAxiosError` so it benefits
  from the same buffer/stream handling.
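The buffer/stream redaction described above can be sketched like this (hedged; the real helper in `packages/api/src/utils/axios.ts` may differ in signature and truncation details):

```typescript
import { Readable } from 'stream';

// Hedged sketch of the renderResponseData behavior described above.
function renderResponseData(data: unknown): string {
  if (data instanceof Readable) {
    return '[stream]'; // never serialize a Readable's internal state
  }
  if (Buffer.isBuffer(data)) {
    const text = data.subarray(0, 2048).toString('utf8');
    return data.length > 2048 ? `${text}…[truncated]` : text;
  }
  return typeof data === 'string' ? data : JSON.stringify(data);
}
```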

Tests updated to match the new contract:
- crud.spec: `getCodeOutputDownloadStream` fixtures pass `userIdentity`;
  new cases cover skill identity (with version), bad kind rejection,
  skill-without-version rejection.
- process.spec: `getSessionInfo` test passes a full `codeEnvRef` object.

* ♻️ refactor: extract codeEnv identity helpers into packages/api

Per the project convention that new backend code lives in TypeScript
under `packages/api`, moves `appendCodeEnvFileIdentity` and
`buildCodeEnvDownloadQuery` from `api/server/services/Files/Code/crud.js`
into a new `packages/api/src/files/code/identity.ts` module.

Both helpers are pure validators that mirror codeapi's
`parseUploadSessionKeyInput` server-side rules (closed kind set,
`version` required for `'skill'` and forbidden otherwise) — they
deserve TS support and a dedicated spec rather than living as
JSDoc-typed helpers in the legacy `/api` workspace. The new module:

- Exports a `CodeEnvIdentity` interface using the
  `librechat-data-provider` `CodeEnvKind` discriminated union.
- Adds 13 unit tests in `identity.spec.ts` covering the validation
  matrix (skill+version, agent, user, and every rejection path) plus
  URL encoding for the download query.
- Re-exported from `packages/api/src/files/code/index.ts` alongside
  `classify`, `extract`, and `form`.

Consumer updates:
- `api/server/services/Files/Code/crud.js`: drops the local helpers
  and imports them from `@librechat/api`. Net -64 lines.
- `api/server/services/Files/Code/process.js`: same.
- Test mocks for `@librechat/api` in three spec files now stub the
  helpers' validation behavior locally rather than pulling them
  through `requireActual` (which would drag in provider-config
  init-time side effects). The package's `exports` field only
  surfaces the root barrel, so leaf imports aren't reachable from
  legacy `/api` test setup.

No runtime behavior change. Identity validation rules and emitted
form/query shapes are byte-for-byte identical pre/post.

* 🪪 fix: emit resource_id alongside id on _injected_files (skill 403 fix)

Companion to codeapi #1455 fix and agents 3.1.80-dev.1 — the wire
shape for shared-kind files now requires `resource_id` distinct from
the storage `id`. Without this LC change, codeapi's sessionKey
re-derivation on every shared-kind /exec rejects with 403
session_key_mismatch:

    cached:  legacy:skill:69dcf561...:v:59  (signed at upload, skill _id)
    derived: legacy:skill:ysPwEURuPk-...:v:59  (storage nanoid)

Emit sites updated:

- `primeInvokedSkills` cache-hit path: `resource_id: ref.id` (the
  persisted skill `_id` from `codeEnvRef.id`); `id: ref.file_id`
  unchanged (storage uuid).
- `primeInvokedSkills` fresh-upload path: `resource_id: skill._id.toString()`
  on every primed file (the `allPrimedFiles` builder type now carries
  the field).
- `processCodeOutput`'s `pushFile` (Code/process.js): `resource_id: ref.id`
  — for `kind: 'user'` this is informational (codeapi derives
  sessionKey from auth context) but emitted for shape uniformity
  with shared kinds.
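The emitted entry can be sketched as follows (a hedged sketch; only the fields named in the text are shown, the full `_injected_files` wire shape is assumed):

```typescript
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };

// Hypothetical helper showing the id / resource_id split described above.
function toInjectedFile(ref: CodeEnvRef): {
  id: string;
  resource_id: string;
  storage_session_id: string;
  kind: string;
  version?: number;
} {
  return {
    id: ref.file_id, // storage uuid, unchanged
    resource_id: ref.id, // persisted resource _id; drives sessionKey re-derivation
    storage_session_id: ref.storage_session_id,
    kind: ref.kind,
    ...(ref.kind === 'skill' ? { version: ref.version } : {}),
  };
}
```

Conflating the two ids is exactly what produced the 403 above: the signed key used the skill `_id` while re-derivation used the storage nanoid.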

Bumps `@librechat/agents` to `^3.1.80-dev.1` (the version that
ships the matching `CodeEnvFile.resource_id` field).

## Test plan

- [x] `cd packages/api && npx jest src/agents` — 67 / 67 pass
  (skillFiles fixtures updated to assert `resource_id` on the
  emitted CodeSessionContext.files).
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  445 / 445 pass (process.spec fixtures updated for the reupload
  + cache-hit emission).
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean.

* fix(skill-tool-call): carry resource_id through primeSkillFiles → artifact

Codeapi was 400ing every /exec following a `handle_skill` tool call
with `resource_id is invalid` (`type: 'undefined'`). Both code paths
in `primeSkillFiles` (cache-hit + fresh-upload) returned files
without `resource_id`/`kind`/`version`, and the artifact in
`handlers.ts` forwarded the stripped shape into
`tc.codeSessionContext.files` → `_injected_files`.

`primeInvokedSkills` (the NL-detected loader) had already been fixed
end-to-end; this commit aligns the tool-invoked path with the same
contract: `resource_id` = `skill._id.toString()`, `kind: 'skill'`,
`version` = the skill's monotonic counter.

Tests added to `skillFiles.spec.ts` lock the contract on
`primeSkillFiles` directly so future refactors can't silently drop
the resource identity again.

* fix(handlers.spec): align session_id → storage_session_id rename + kind discriminator

Pre-existing TS errors against the post-rename `CodeEnvFile` shape:
the test file still used `session_id` on per-file objects (renamed to
`storage_session_id` in agents Phase B/C) and was missing the `kind`
discriminator the discriminated union requires. Both inputs and the
matching `expect.toEqual(...)` mirrors updated together so the
runtime equality check still holds.

Lines 723-732 stay as-is — they sit behind `as unknown as
ToolCallRequest` and TS already skipped them.

* chore: fix `@librechat/agents`, correct version to 3.1.80-dev.0 in package.json files

* chore: bump `@librechat/agents` to version 3.1.80-dev.1 in package.json and package-lock.json

* chore: bump `@librechat/agents` to version 3.1.80-dev.2

* feat(observability): trace file priming chain from primeCodeFiles to _injected_files

Diagnosing the user-upload "files=[] on first /exec" bug requires
seeing where in the LC chain a file ref disappears. Prior to this
patch the chain (primeCodeFiles → primedCodeFiles → initialSessions
→ CodeSessionContext → _injected_files) was opaque end-to-end:
  - primeCodeFiles silently dropped files without `metadata.codeEnvRef`
  - reuploadFile catches all errors and continues with no signal
  - the handlers.ts handoff to codeapi never logged what it was sending

After this patch, a single grep on `[primeCodeFiles]` plus
`[code-env:inject]` shows the full per-file path:

  [primeCodeFiles] in: file_ids=N resourceFiles=M
  [primeCodeFiles] file=<id> path=skip reason=no-codeenvref filename=...
  [primeCodeFiles] file=<id> path=cache-hit-by-session storage_session_id=...
  [primeCodeFiles] file=<id> path=reupload reason=no-uploadtime ...
  [primeCodeFiles] file=<id> path=reupload reason=stale ...
  [primeCodeFiles] file=<id> path=reupload-success oldSession=... newSession=... newFileId=...
  [primeCodeFiles] file=<id> path=reupload-failed session=...
  [primeCodeFiles] file=<id> path=fresh-active storage_session_id=...
  [primeCodeFiles] out: returned=N skippedNoRef=M reuploadFailures=K

  [code-env:inject] tool=<name> files=N missingResourceId=K     (debug)
  [code-env:inject] M/N files missing resource_id ...           (warn)
  [code-env:inject] tool=<name> _injected_files=0 ...           (warn)

The boundary log warns when LC sends zero injected files on a
code-execution tool call — that's the user's actual symptom showing
up at the LC side instead of having to correlate against codeapi's
`Request received { files: [] }`.

Tag chosen as `[code-env:inject]` rather than `[handoff:exec]` to
avoid collision with the app-level "handoff" semantic (subagent
handoff workflow).

Structural cleanup in primeFiles: replaced the `if (ref) { ... }`
nesting with an early `if (!ref) continue` so the per-path
instrumentation hooks land at top-level scope instead of indented
inside a conditional. Behavior unchanged; pushFile / reuploadFile
identical.
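The early-continue shape can be sketched as follows (log tags are taken from the text; the surrounding reupload/pushFile logic is assumed and the function name is hypothetical):

```typescript
interface DbFile {
  file_id: string;
  filename: string;
  metadata?: { codeEnvRef?: object };
}

// Sketch of the restructured loop: skip paths log and continue early,
// so per-path instrumentation sits at top-level loop scope.
function primeCodeFilesSketch(files: DbFile[], log: (line: string) => void): DbFile[] {
  const primed: DbFile[] = [];
  let skippedNoRef = 0;
  log(`[primeCodeFiles] in: file_ids=${files.length}`);
  for (const file of files) {
    const ref = file.metadata?.codeEnvRef;
    if (!ref) {
      skippedNoRef += 1;
      log(`[primeCodeFiles] file=${file.file_id} path=skip reason=no-codeenvref filename=${file.filename}`);
      continue;
    }
    primed.push(file);
  }
  log(`[primeCodeFiles] out: returned=${primed.length} skippedNoRef=${skippedNoRef}`);
  return primed;
}
```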

Spec fixtures (handlers.spec.ts, codeFilesSession.spec.ts) updated
to include `resource_id` on `CodeEnvFile` literals — required by
the post-3.1.80-dev.2 type now installed.

## Test plan

- [x] `cd packages/api && npx jest src/agents/handlers.spec.ts src/agents/codeFilesSession.spec.ts src/agents/skillFiles.spec.ts` — 69/69 pass
- [x] `cd api && npx jest server/services/Files/Code/process.spec.js` — 84/84 pass
- [x] `npx tsc --noEmit -p packages/api` — clean
- [x] `npx eslint` on all four touched files — clean

* chore: add CONSOLE_JSON_STRING_LENGTH to .env.example for JSON log string length configuration

* fix(files): align codeapi upload filename with LC's sanitized DB filename

User-attached files for code execution were uploading to codeapi
under `file.originalname` (raw upload filename, may contain spaces /
special chars) while LC's DB record stored the sanitized form
(`sanitizeFilename(file.originalname)`, underscores). Codeapi
preserves whatever filename the upload sent, so the sandbox saw
`/mnt/data/<originalname>` while LC's `primeFiles` toolContext text
+ `_injected_files.name` referenced `file.filename` (sanitized).

Visible failure: agent gets system prompt saying

    /mnt/data/librechat_code_api_-_active_customer_-_2025-11-05.xlsx

…tries that path, hits `FileNotFoundError`, then notices the
sandbox's actual `Available files` line says

    /mnt/data/librechat code api - active customer - 2025-11-05.xlsx

…retries with spaces, succeeds. Wastes a tool call per upload and
leaks raw filenames into model context.

Fix: sanitize once and use the sanitized form in both the codeapi
upload AND the LC DB record. Sandbox path = LC toolContext text =
in-memory ref name. No drift.

Reupload path (`Code/process.js` line 867 `filename: file.filename`)
already uses the sanitized DB name, so it stays consistent with the
fresh-upload path after this change.
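The "sanitize once, use everywhere" shape can be sketched like this (the exact `sanitizeFilename` rules are assumed here; the text only says spaces and special characters become underscores):

```typescript
// Assumed sanitization rule for illustration only; the real
// sanitizeFilename may differ in which characters it rewrites.
function sanitizeFilename(name: string): string {
  return name.replace(/[^\w.-]+/g, '_');
}

const originalname = 'librechat code api - active customer - 2025-11-05.xlsx';
const filename = sanitizeFilename(originalname);
// Use `filename` for BOTH the codeapi multipart upload and the LC DB
// record, so /mnt/data/<filename> matches the toolContext text exactly.
```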

## Test plan

- [x] `cd api && npx jest server/services/Files/process` — 32/32 pass
- [x] `npx eslint` on the touched file — clean

* chore: bump `@librechat/agents` to version 3.1.80-dev.3 in package.json and package-lock.json
Danny Avila 2026-05-08 10:17:52 -04:00
parent 9441563b95
commit 93c4ef4ba8
No known key found for this signature in database
GPG key ID: BF31EEB2C5CA0956
40 changed files with 1937 additions and 456 deletions


@@ -40,6 +40,15 @@ jest.mock('@librechat/api', () => {
* inline (non-finalize) path so existing assertions on a single
* createFile call hold. */
hasOfficeHtmlPath: jest.fn(() => false),
/* Identity-helper stub mirroring `packages/api/src/files/code/identity.ts`.
* `processCodeOutput` calls this for every output download URL;
* traversal cases don't care about the query shape, just that it
* returns something concatable. */
buildCodeEnvDownloadQuery: jest.fn(({ kind, id, version }) => {
const params = new URLSearchParams({ kind, id });
if (version != null) params.set('version', String(version));
return `?${params.toString()}`;
}),
codeServerHttpAgent: new http.Agent({ keepAlive: false }),
codeServerHttpsAgent: new https.Agent({ keepAlive: false }),
};


@@ -7,6 +7,8 @@ const {
createAxiosInstance,
codeServerHttpAgent,
codeServerHttpsAgent,
appendCodeEnvFileIdentity,
buildCodeEnvDownloadQuery,
} = require('@librechat/api');
const axios = createAxiosInstance();
@@ -16,16 +18,22 @@ const MAX_FILE_SIZE = 150 * 1024 * 1024;
/**
* Retrieves a download stream for a specified file.
* @param {string} fileIdentifier - The identifier for the file (e.g., "session_id/fileId").
* @param {{ kind: 'skill' | 'agent' | 'user'; id: string; version?: number }} identity
* Resource identity required by codeapi's `sessionAuth` to derive the
* matching sessionKey. For code-output downloads this is always
* `kind: 'user', id: <userId>`; for skill/agent re-downloads pass
* the kind+id (+version for skill) from the file's `metadata.codeEnvRef`.
* @returns {Promise<AxiosResponse>} A promise that resolves to a readable stream of the file content.
* @throws {Error} If there's an error during the download process.
*/
async function getCodeOutputDownloadStream(fileIdentifier) {
async function getCodeOutputDownloadStream(fileIdentifier, identity) {
try {
const baseURL = getCodeBaseURL();
const query = buildCodeEnvDownloadQuery(identity);
/** @type {import('axios').AxiosRequestConfig} */
const options = {
method: 'get',
url: `${baseURL}/download/${fileIdentifier}`,
url: `${baseURL}/download/${fileIdentifier}${query}`,
responseType: 'stream',
headers: {
'User-Agent': 'LibreChat/1.0',
@@ -49,20 +57,31 @@ async function getCodeOutputDownloadStream(fileIdentifier) {
/**
* Uploads a file to the Code Environment server.
*
* `kind`/`id`/`version?` are required so codeapi can route the upload to
* the correct sessionKey bucket `<tenant>:<kind>:<id>[:v:<version>]`
* for shared kinds, `<tenant>:user:<authContext.userId>` for `user`.
* Without these, codeapi falls back to user-scoped bucketing regardless
* of the resource the file belongs to, so skill-cache invalidation
* (driven by the version bump on edit) never fires. See codeapi #1455.
*
* @param {Object} params - The params object.
* @param {ServerRequest} params.req - The request object from Express. It should have a `user` property with an `id` representing the user
* @param {import('fs').ReadStream | import('stream').Readable} params.stream - The read stream for the file.
* @param {string} params.filename - The name of the file.
* @param {string} [params.entity_id] - Optional entity ID for the file.
* @returns {Promise<string>}
* @param {'skill' | 'agent' | 'user'} params.kind - Resource kind that owns this file's storage session.
* @param {string} params.id - Resource id (skillId / agentId / userId). Codeapi
* ignores this for `kind: 'user'` (auth context provides userId), but it's
* sent uniformly for shape symmetry with the discriminated union.
* @param {number} [params.version] - Required when `kind === 'skill'`; absent otherwise.
* @returns {Promise<{ storage_session_id: string; file_id: string }>}
* The codeapi storage location of the uploaded file.
* @throws {Error} If there's an error during the upload process.
*/
async function uploadCodeEnvFile({ req, stream, filename, entity_id = '' }) {
async function uploadCodeEnvFile({ req, stream, filename, kind, id, version }) {
try {
const form = new FormData();
if (entity_id.length > 0) {
form.append('entity_id', entity_id);
}
appendCodeEnvFileIdentity(form, { kind, id, version });
appendCodeEnvFile(form, stream, filename);
const baseURL = getCodeBaseURL();
@@ -83,18 +102,16 @@ async function uploadCodeEnvFile({ req, stream, filename, entity_id = '' }) {
const response = await axios.post(`${baseURL}/upload`, form, options);
/** @type {{ message: string; session_id: string; files: Array<{ fileId: string; filename: string }> }} */
/** @type {{ message: string; storage_session_id: string; files: Array<{ fileId: string; filename: string }> }} */
const result = response.data;
if (result.message !== 'success') {
throw new Error(`Error uploading file: ${result.message}`);
}
const fileIdentifier = `${result.session_id}/${result.files[0].fileId}`;
if (entity_id.length === 0) {
return fileIdentifier;
}
return `${fileIdentifier}?entity_id=${entity_id}`;
return {
storage_session_id: result.storage_session_id,
file_id: result.files[0].fileId,
};
} catch (error) {
throw new Error(
logAxiosError({
@@ -109,25 +126,28 @@ async function uploadCodeEnvFile({ req, stream, filename, entity_id = '' }) {
* Uploads multiple files to the code execution environment in a single request.
* Uses the /upload/batch endpoint which shares one session_id across all files.
*
* `kind`/`id`/`version?` carry the resource identity for codeapi's sessionKey
* derivation; see `uploadCodeEnvFile` for the full motivation.
*
* @param {object} params
* @param {import('express').Request & { user: { id: string } }} params.req - The request object.
* @param {Array<{ stream: NodeJS.ReadableStream; filename: string }>} params.files - Files to upload.
* @param {string} [params.entity_id] - Optional entity ID.
* @param {'skill' | 'agent' | 'user'} params.kind - Resource kind that owns the batch's storage session.
* @param {string} params.id - Resource id (skillId / agentId / userId).
* @param {number} [params.version] - Required when `kind === 'skill'`; absent otherwise.
* @param {boolean} [params.read_only] - When true, codeapi tags every file in
* the batch as infrastructure (e.g. skill files). The flag is persisted as
* MinIO object metadata (`X-Amz-Meta-Read-Only`) and travels with the file
* through subsequent download/walk passes; sandboxed-code modifications
* are dropped on the floor and the original ref is echoed back as
* `inherited: true`, never as a generated artifact.
* @returns {Promise<{ session_id: string; files: Array<{ fileId: string; filename: string }> }>}
* @returns {Promise<{ storage_session_id: string; files: Array<{ fileId: string; filename: string }> }>}
* @throws {Error} If the batch upload fails entirely.
*/
async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only = false }) {
async function batchUploadCodeEnvFiles({ req, files, kind, id, version, read_only = false }) {
try {
const form = new FormData();
if (entity_id.length > 0) {
form.append('entity_id', entity_id);
}
appendCodeEnvFileIdentity(form, { kind, id, version });
if (read_only) {
form.append('read_only', 'true');
}
@@ -153,12 +173,12 @@ async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only =
const response = await axios.post(`${baseURL}/upload/batch`, form, options);
/** @type {{ message: string; session_id: string; files: Array<{ status: string; fileId?: string; filename: string; error?: string }>; succeeded: number; failed: number }} */
/** @type {{ message: string; storage_session_id: string; files: Array<{ status: string; fileId?: string; filename: string; error?: string }>; succeeded: number; failed: number }} */
const result = response.data;
if (
!result ||
typeof result !== 'object' ||
!result.session_id ||
!result.storage_session_id ||
!Array.isArray(result.files)
) {
throw new Error(`Unexpected batch upload response: ${JSON.stringify(result).slice(0, 200)}`);
@@ -179,7 +199,7 @@ async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only =
.filter((f) => f.status === 'success' && f.fileId)
.map((f) => ({ fileId: f.fileId, filename: f.filename }));
return { session_id: result.session_id, files: successFiles };
return { storage_session_id: result.storage_session_id, files: successFiles };
} catch (error) {
throw new Error(
logAxiosError({
@@ -190,4 +210,8 @@ async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only =
}
}
module.exports = { getCodeOutputDownloadStream, uploadCodeEnvFile, batchUploadCodeEnvFiles };
module.exports = {
getCodeOutputDownloadStream,
uploadCodeEnvFile,
batchUploadCodeEnvFiles,
};


@@ -9,6 +9,26 @@ jest.mock('@librechat/agents', () => ({
getCodeBaseURL: jest.fn(() => 'https://code-api.example.com'),
}));
/* Inline the identity helpers' validation rules instead of pulling
* them through `@librechat/api`'s root barrel (which has init-time
* provider-config side effects that don't matter here) or its leaf
* module (the package's `exports` field only surfaces the root).
* The real implementation lives in `packages/api/src/files/code/identity.ts`
* and has its own dedicated `identity.spec.ts` covering the validation
* matrix; this stub just mirrors enough behavior for the surrounding
* crud tests to exercise the upload/download flow. */
const VALID_KINDS = new Set(['skill', 'agent', 'user']);
const validateIdentity = ({ kind, id, version }, label) => {
if (!kind || !VALID_KINDS.has(kind)) throw new Error(`${label}: invalid kind "${kind}"`);
if (!id) throw new Error(`${label}: missing id for kind "${kind}"`);
if (kind === 'skill' && version == null) {
throw new Error(`${label}: kind "skill" requires a numeric version`);
}
if (kind !== 'skill' && version != null) {
throw new Error(`${label}: version is only valid for kind "skill"`);
}
};
jest.mock('@librechat/api', () => {
const http = require('http');
const https = require('https');
@@ -16,6 +36,18 @@ jest.mock('@librechat/api', () => {
appendCodeEnvFile: jest.fn((form, stream, filename) => {
form.append('file', stream, { filename });
}),
appendCodeEnvFileIdentity: jest.fn((form, identity) => {
validateIdentity(identity, 'appendCodeEnvFileIdentity');
form.append('kind', identity.kind);
form.append('id', identity.id);
if (identity.version != null) form.append('version', String(identity.version));
}),
buildCodeEnvDownloadQuery: jest.fn((identity) => {
validateIdentity(identity, 'buildCodeEnvDownloadQuery');
const params = new URLSearchParams({ kind: identity.kind, id: identity.id });
if (identity.version != null) params.set('version', String(identity.version));
return `?${params.toString()}`;
}),
logAxiosError: jest.fn(({ message }) => message),
createAxiosInstance: jest.fn(() => mockAxios),
codeServerHttpAgent: new http.Agent({ keepAlive: false }),
@@ -32,11 +64,17 @@ describe('Code CRUD', () => {
});
describe('getCodeOutputDownloadStream', () => {
/* Code-output downloads always carry `kind: 'user'` + `id: <userId>`;
* codeapi's `sessionAuth` rejects without them post-Phase C. The
* fixture mirrors what `processCodeOutput` and the `/code/download`
* route pass in production. */
const userIdentity = { kind: 'user', id: 'user-123' };
it('should pass dedicated keepAlive:false agents to axios', async () => {
const mockResponse = { data: Readable.from(['chunk']) };
mockAxios.mockResolvedValue(mockResponse);
await getCodeOutputDownloadStream('session-1/file-1');
await getCodeOutputDownloadStream('session-1/file-1', userIdentity);
const callConfig = mockAxios.mock.calls[0][0];
expect(callConfig.httpAgent).toBe(codeServerHttpAgent);
@@ -50,18 +88,52 @@ describe('Code CRUD', () => {
it('should request stream response from the correct URL', async () => {
mockAxios.mockResolvedValue({ data: Readable.from(['chunk']) });
await getCodeOutputDownloadStream('session-1/file-1');
await getCodeOutputDownloadStream('session-1/file-1', userIdentity);
const callConfig = mockAxios.mock.calls[0][0];
expect(callConfig.url).toBe('https://code-api.example.com/download/session-1/file-1');
/* URL carries `?kind=user&id=<userId>` so codeapi's `sessionAuth`
* can reconstruct the matching `<tenant>:user:<userId>` sessionKey
* (Phase C / option α). */
expect(callConfig.url).toBe(
'https://code-api.example.com/download/session-1/file-1?kind=user&id=user-123',
);
expect(callConfig.responseType).toBe('stream');
expect(callConfig.timeout).toBe(15000);
});
it('forwards skill identity (kind/id/version) when re-downloading a primed skill file', async () => {
mockAxios.mockResolvedValue({ data: Readable.from(['chunk']) });
await getCodeOutputDownloadStream('session-2/file-x', {
kind: 'skill',
id: 'skill-abc',
version: 7,
});
const callConfig = mockAxios.mock.calls[0][0];
expect(callConfig.url).toBe(
'https://code-api.example.com/download/session-2/file-x?kind=skill&id=skill-abc&version=7',
);
});
it('rejects skill identity without a version (mirrors codeapi validator)', async () => {
await expect(
getCodeOutputDownloadStream('s/f', { kind: 'skill', id: 'skill-abc' }),
).rejects.toThrow(/skill.*version/);
expect(mockAxios).not.toHaveBeenCalled();
});
it('rejects unknown kind without dispatching to codeapi', async () => {
await expect(getCodeOutputDownloadStream('s/f', { kind: 'system', id: 'x' })).rejects.toThrow(
/invalid kind/,
);
expect(mockAxios).not.toHaveBeenCalled();
});
it('should throw on network error', async () => {
mockAxios.mockRejectedValue(new Error('ECONNREFUSED'));
await expect(getCodeOutputDownloadStream('s/f', userIdentity)).rejects.toThrow();
});
});
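The URL assertions above pin a small query-building contract. A minimal sketch of it follows; this is an approximation only, since the production helper is `buildCodeEnvDownloadQuery` from `@librechat/api` and its exact source may differ:

```javascript
// Approximation of the download-query contract pinned by the URL
// assertions above (the real builder is `buildCodeEnvDownloadQuery`
// in `@librechat/api`).
function buildCodeEnvDownloadQuery({ kind, id, version }) {
  // URLSearchParams preserves insertion order, so the query always
  // reads `?kind=...&id=...`, with `version` last when present.
  const params = new URLSearchParams({ kind, id });
  if (version != null) {
    params.set('version', String(version));
  }
  return `?${params.toString()}`;
}
```

With `{ kind: 'skill', id: 'skill-abc', version: 7 }` this yields `?kind=skill&id=skill-abc&version=7`, the exact skill-download URL asserted above.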
@@ -70,13 +142,15 @@ describe('Code CRUD', () => {
req: { user: { id: 'user-123' } },
stream: Readable.from(['file-content']),
filename: 'data.csv',
kind: 'user',
id: 'user-123',
};
it('should pass dedicated keepAlive:false agents to axios', async () => {
mockAxios.post.mockResolvedValue({
data: {
message: 'success',
storage_session_id: 'sess-1',
files: [{ fileId: 'fid-1', filename: 'data.csv' }],
},
});
@@ -96,7 +170,7 @@ describe('Code CRUD', () => {
mockAxios.post.mockResolvedValue({
data: {
message: 'success',
storage_session_id: 'sess-1',
files: [{ fileId: 'fid-1', filename: 'data.csv' }],
},
});
@@ -107,35 +181,106 @@ describe('Code CRUD', () => {
expect(callConfig.timeout).toBe(120000);
});
it('should return { storage_session_id, file_id } on success', async () => {
mockAxios.post.mockResolvedValue({
data: {
message: 'success',
storage_session_id: 'sess-1',
files: [{ fileId: 'fid-1', filename: 'data.csv' }],
},
});
const result = await uploadCodeEnvFile(baseUploadParams);
expect(result).toEqual({ storage_session_id: 'sess-1', file_id: 'fid-1' });
});
/* Phase C / option α (codeapi #1455): the upload wire carries the
* resource identity codeapi uses for sessionKey derivation. Without
* these on the form, codeapi falls back to user bucketing for every
* upload and skill-cache invalidation never fires. Validation runs
* client-side too, so a bad caller fails fast instead of round-tripping
* a 400. */
describe('codeapi resource identity (kind/id/version)', () => {
const FormData = require('form-data');
const successResponse = {
data: {
message: 'success',
storage_session_id: 'sess-1',
files: [{ fileId: 'fid-1', filename: 'data.csv' }],
},
};
let appendSpy;
beforeEach(() => {
/* Spying on the prototype lets us assert form fields without
* materializing the multipart body; `form.getBuffer()` would
* fail on the file-stream entry, but we don't care about the
* stream here, only the identity fields that ride beside it. */
appendSpy = jest.spyOn(FormData.prototype, 'append');
});
afterEach(() => {
appendSpy.mockRestore();
});
const fieldsAppended = () =>
appendSpy.mock.calls
.filter((call) => typeof call[1] === 'string' || typeof call[1] === 'number')
.reduce((acc, [name, value]) => ({ ...acc, [name]: value }), {});
it('forwards kind, id, and (when skill) version on the multipart form', async () => {
mockAxios.post.mockResolvedValue(successResponse);
await uploadCodeEnvFile({
...baseUploadParams,
kind: 'skill',
id: 'skill-42',
version: 7,
});
expect(fieldsAppended()).toEqual({ kind: 'skill', id: 'skill-42', version: '7' });
});
it('omits version on the form for non-skill kinds', async () => {
mockAxios.post.mockResolvedValue(successResponse);
await uploadCodeEnvFile({ ...baseUploadParams, kind: 'agent', id: 'agent-9' });
const fields = fieldsAppended();
expect(fields).toEqual({ kind: 'agent', id: 'agent-9' });
expect(fields).not.toHaveProperty('version');
});
it('rejects unknown kind without dispatching to codeapi', async () => {
await expect(
uploadCodeEnvFile({ ...baseUploadParams, kind: 'system', id: 'x' }),
).rejects.toThrow(/invalid kind/);
expect(mockAxios.post).not.toHaveBeenCalled();
});
it('rejects skill upload without a version (mirrors codeapi validator)', async () => {
await expect(
uploadCodeEnvFile({ ...baseUploadParams, kind: 'skill', id: 'skill-42' }),
).rejects.toThrow(/skill.*version/);
expect(mockAxios.post).not.toHaveBeenCalled();
});
it('rejects version on non-skill kinds (mirrors codeapi validator)', async () => {
await expect(
uploadCodeEnvFile({
...baseUploadParams,
kind: 'agent',
id: 'agent-9',
version: 3,
}),
).rejects.toThrow(/version.*skill/);
expect(mockAxios.post).not.toHaveBeenCalled();
});
});
it('should throw when server returns non-success message', async () => {
mockAxios.post.mockResolvedValue({
data: { message: 'quota_exceeded', storage_session_id: 's', files: [] },
});
await expect(uploadCodeEnvFile(baseUploadParams)).rejects.toThrow('quota_exceeded');

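The upload-side rejection tests above mirror a client-side validator. A sketch consistent with the pinned matchers (`/invalid kind/`, `/skill.*version/`, `/version.*skill/`) follows; the function name and exact message wording here are assumptions, not the production source:

```javascript
// Sketch of the client-side identity validation the rejection tests
// mirror. Function name and message wording are assumptions, chosen
// only to satisfy the regex matchers the tests pin.
const CODE_ENV_KINDS = ['skill', 'agent', 'user'];

function assertCodeEnvIdentity({ kind, id, version }) {
  if (!CODE_ENV_KINDS.includes(kind)) {
    throw new Error(`invalid kind: ${String(kind)}`);
  }
  if (kind === 'skill' && version == null) {
    throw new Error('skill identity requires a version');
  }
  if (kind !== 'skill' && version != null) {
    throw new Error('a version is only valid for skill identity');
  }
  return { kind, id, version };
}
```

Running this before dispatch is what lets a bad caller fail fast instead of round-tripping a 400 from codeapi.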

@@ -16,6 +16,7 @@ const {
extractCodeArtifactText,
getExtractedTextFormat,
getStorageMetadata,
buildCodeEnvDownloadQuery,
} = require('@librechat/api');
const {
Tools,
@@ -286,7 +287,7 @@ const runPreviewFinalize = ({ finalize, fileId, previewRevision, onResolved }) =
/**
* Process code execution output files downloads and saves both images
* and non-image files. All files are saved to local storage with
* `codeEnvRef` metadata for code env re-upload.
*
* Returns a two-part shape so callers can ship the attachment to the
* client immediately and run preview extraction in the background:
@@ -334,9 +335,15 @@ const processCodeOutput = async ({
try {
const formattedDate = currentDate.toISOString();
/* Code-output files are always user-private; no skill execution
* produces a skill-scoped output bucket. The download URL must
* carry `?kind=user&id=<userId>` so codeapi's `sessionAuth`
* resolves the matching `<tenant>:user:<userId>` sessionKey. See
* codeapi #1455 / Phase C. */
const downloadQuery = buildCodeEnvDownloadQuery({ kind: 'user', id: req.user.id });
const response = await axios({
method: 'get',
url: `${baseURL}/download/${session_id}/${id}${downloadQuery}`,
responseType: 'arraybuffer',
headers: {
'User-Agent': 'LibreChat/1.0',
@@ -366,7 +373,15 @@ const processCodeOutput = async ({
};
}
/* Code-output files belong to the user who ran the execution.
* SessionKey on codeapi will be `<tenant>:user:<userId>` for these,
* so cache and access stay user-private. */
const codeEnvRef = {
kind: 'user',
id: req.user.id,
storage_session_id: session_id,
file_id: id,
};
/* `safeName` keeps the directory structure (`a/b/file.txt` -> `a/b/file.txt`)
* so the next prime() can place the file at the same nested path in the
@@ -444,7 +459,7 @@ const processCodeOutput = async ({
updatedAt: formattedDate,
source: appConfig.fileStrategy,
context: FileContext.execute_code,
metadata: { codeEnvRef },
};
await createFile(file, true);
return { file: Object.assign(file, { messageId, toolCallId }) };
@@ -542,7 +557,7 @@ const processCodeOutput = async ({
tenantId: req.user.tenantId,
bytes: buffer.length,
updatedAt: formattedDate,
metadata: { codeEnvRef },
source: appConfig.fileStrategy,
context: FileContext.execute_code,
usage: isUpdate ? (claimed.usage ?? 0) + 1 : 1,
@@ -651,26 +666,31 @@ function checkIfActive(dateString) {
/**
* Retrieves the `lastModified` time string for a specified file from Code Execution Server.
*
* @param {import('librechat-data-provider').CodeEnvRef} ref - Typed pointer
* into codeapi storage. Carries kind/id/storage_session_id/file_id;
* codeapi resolves the sessionKey from the request's auth context.
*
* @returns {Promise<string|null>}
* A promise that resolves to the `lastModified` time string of the file if successful, or null if there is an
* error in initialization or fetching the info.
*/
async function getSessionInfo(ref) {
try {
const baseURL = getCodeBaseURL();
/* `/sessions/.../objects/...` is gated by codeapi's `sessionAuth`
* middleware (post-Phase C). The middleware reconstructs the
* sessionKey from the URL query (`kind`/`id`/`version?`) plus the
* requester's auth context, then matches it against the cached
* sessionKey on the storage bucket. We have the full `codeEnvRef`
* here, so pass kind+id (+version when skill) directly. */
const query = buildCodeEnvDownloadQuery({
kind: ref.kind,
id: ref.id,
...(ref.kind === 'skill' ? { version: ref.version } : {}),
});
const response = await axios({
method: 'get',
url: `${baseURL}/sessions/${ref.storage_session_id}/objects/${ref.file_id}${query}`,
headers: {
'User-Agent': 'LibreChat/1.0',
},
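`checkIfActive` (named in the hunk header above) decides whether the `lastModified` timestamp from `getSessionInfo` is recent enough to skip the reupload path. A sketch under an assumed fixed TTL; the real cutoff value lives in the production source and is not shown in this diff:

```javascript
// Sketch of the staleness gate; the TTL below is an assumption for
// illustration, not the production value.
const SANDBOX_TTL_MS = 10 * 60 * 1000; // assumed 10-minute window

function checkIfActive(dateString) {
  const uploadedAt = new Date(dateString).getTime();
  if (Number.isNaN(uploadedAt)) {
    // Unparseable timestamp: treat as expired so the caller reuploads.
    return false;
  }
  return Date.now() - uploadedAt < SANDBOX_TTL_MS;
}
```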
@@ -706,6 +726,15 @@ const primeFiles = async (options) => {
const agentResourceIds = new Set(file_ids);
const resourceFiles = tool_resources?.[EToolResources.execute_code]?.files ?? [];
/* Step 1 of the priming trace: input volume. Pair with the
* per-file `[primeCodeFiles] file=...` lines and the final
* `[primeCodeFiles] returned=...` line below to locate which
* layer drops a file the sandbox doesn't end up seeing. */
logger.debug(
`[primeCodeFiles] in: file_ids=${file_ids.length} resourceFiles=${resourceFiles.length}`,
{ agentId, file_ids, resourceFileIds: resourceFiles.map((f) => f?.file_id) },
);
// Get all files first
const allFiles = (await getFiles({ file_id: { $in: file_ids } }, null, { text: 0 })) ?? [];
@@ -728,146 +757,195 @@ const primeFiles = async (options) => {
const sessions = new Map();
let toolContext = '';
/* Per-file path counters emitted at the bottom so a single
* grep on `[primeCodeFiles]` shows the input volume, the per-file
* paths taken, and the final dispatch summary in one trace. */
let skippedNoRef = 0;
let reuploadFailures = 0;
for (let i = 0; i < dbFiles.length; i++) {
const file = dbFiles[i];
if (!file) {
continue;
}
const ref = file.metadata?.codeEnvRef;
if (!ref) {
skippedNoRef += 1;
logger.debug(
`[primeCodeFiles] file=${file.file_id} path=skip reason=no-codeenvref filename=${file.filename}`,
);
continue;
}
const session_id = ref.storage_session_id;
const id = ref.file_id;
/**
* `pushFile` accepts optional overrides so the reupload path can
* push the FRESH `(storage_session_id, file_id)` from the new
* `codeEnvRef`. Without these overrides, the closure would
* capture the stale pre-reupload refs from the outer loop and
* the in-memory `files` array (now consumed by
* `buildInitialToolSessions` to seed `Graph.sessions`) would
* point at a sandbox object that no longer exists. The DB record
* gets the new ref via `updateFile`, but the seed would still
* inject the old one; bash_tool / read_file would 404 trying to
* mount the file until the next turn re-reads metadata.
*
* `kind`, `id`, `version` are preserved on the in-memory ref so
* codeapi can resolve sessionKey per-file (kind switch +
* tenant prefix from auth context).
*/
const pushFile = (overrideSessionId, overrideId) => {
if (!toolContext) {
toolContext = `- Note: The following files are available in the "${Tools.execute_code}" tool environment:`;
}
let fileSuffix = '';
if (!agentResourceIds.has(file.file_id)) {
fileSuffix =
file.context === FileContext.execute_code
? ' (from previous code execution)'
: ' (attached by user)';
}
/* Surface the preview lifecycle so the LLM knows when a
* prior-turn artifact's rich preview didn't materialize. The
* file blob is always available (`processCodeOutput` persists
* it before returning), so the model can still tell the user
* "you can download it" even when the preview never resolved.
* Absent status means legacy or non-office render normally. */
let previewSuffix = '';
if (file.status === 'pending') {
previewSuffix = ' (preview not yet generated)';
} else if (file.status === 'failed') {
previewSuffix = file.previewError
? ` (preview unavailable: ${file.previewError})`
: ' (preview unavailable)';
}
toolContext += `\n\t- /mnt/data/${file.filename}${fileSuffix}${previewSuffix}`;
/* `id` is the storage file_id (drives codeapi's upload-key
* existence check), `resource_id` is the entity that owns
* the storage session (drives sessionKey re-derivation). For
* code-output files this is `kind: 'user'` and `resource_id`
* is informational (codeapi ignores it for user kind), but
* we still send it for shape uniformity with shared kinds. */
files.push({
id: overrideId ?? id,
resource_id: ref.id,
storage_session_id: overrideSessionId ?? session_id,
name: file.filename,
kind: ref.kind,
...(ref.kind === 'skill' ? { version: ref.version } : {}),
});
};
if (sessions.has(session_id)) {
logger.debug(
`[primeCodeFiles] file=${file.file_id} path=cache-hit-by-session storage_session_id=${session_id}`,
);
pushFile();
continue;
}
const reuploadFile = async () => {
try {
const { getDownloadStream } = getStrategyFunctions(file.source);
const { handleFileUpload: uploadCodeEnvFile } = getStrategyFunctions(
FileSources.execute_code,
);
const stream = await getDownloadStream(options.req, file.filepath);
/* Reupload preserves the resource identity from the existing
* ref so codeapi re-buckets under the same sessionKey shape
* (skill stays skill, user stays user). Without this, a
* skill-cache-miss reupload would land in the user bucket
* and would no longer be cross-user shareable. */
const uploaded = await uploadCodeEnvFile({
req: options.req,
stream,
filename: file.filename,
kind: ref.kind,
id: ref.id,
...(ref.kind === 'skill' ? { version: ref.version } : {}),
});
/**
* Use the FRESH `(storage_session_id, file_id)` from the
* reupload response and route it through the dedupe Map, the
* persisted record, and the in-memory `files` list. The
* original ref captured at the top of this iteration refers
* to the old, expired/missing sandbox object; using it here
* would silently re-introduce the bug `Graph.sessions`
* seeding is supposed to fix.
*
* `kind`, `id`, `version` survive the round-trip: the
* upload preserves the resource identity, only the storage
* pointer changes.
*/
const newRef = {
kind: ref.kind,
id: ref.id,
storage_session_id: uploaded.storage_session_id,
file_id: uploaded.file_id,
...(ref.kind === 'skill' ? { version: ref.version } : {}),
};
const updatedMetadata = {
...file.metadata,
codeEnvRef: newRef,
};
await updateFile({
file_id: file.file_id,
metadata: updatedMetadata,
});
sessions.set(newRef.storage_session_id, true);
pushFile(newRef.storage_session_id, newRef.file_id);
logger.debug(
`[primeCodeFiles] file=${file.file_id} path=reupload-success ` +
`oldSession=${session_id} newSession=${newRef.storage_session_id} newFileId=${newRef.file_id}`,
);
} catch (error) {
reuploadFailures += 1;
logger.error(
`[primeCodeFiles] file=${file.file_id} path=reupload-failed session=${session_id}: ${error.message}`,
error,
);
}
};
const uploadTime = await getSessionInfo(ref);
if (!uploadTime) {
logger.debug(
`[primeCodeFiles] file=${file.file_id} path=reupload reason=no-uploadtime ` +
`storage_session_id=${session_id}`,
);
await reuploadFile();
continue;
}
if (!checkIfActive(uploadTime)) {
logger.debug(
`[primeCodeFiles] file=${file.file_id} path=reupload reason=stale ` +
`uploadTime=${uploadTime} storage_session_id=${session_id}`,
);
await reuploadFile();
continue;
}
sessions.set(session_id, true);
logger.debug(
`[primeCodeFiles] file=${file.file_id} path=fresh-active storage_session_id=${session_id}`,
);
pushFile();
}
/* Dispatch summary emitted unconditionally so a single grep on
* `[primeCodeFiles] out` always shows the final state, not only
* the per-path trail leading up to it. */
logger.debug(
`[primeCodeFiles] out: returned=${files.length} ` +
`skippedNoRef=${skippedNoRef} reuploadFailures=${reuploadFailures}`,
);
return { files, toolContext };
};
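On the codeapi side (per #1455), `kind`/`id` (plus `version` for skills) and the requester's auth context re-derive the sessionKey that the priming and download paths above rely on. A sketch of that derivation; the function name and the `@` version separator are assumptions, with codeapi #1455 authoritative:

```javascript
// Sketch of codeapi's sessionKey derivation (name and the `@`
// version separator are assumptions; codeapi #1455 is authoritative).
function deriveSessionKey(auth, { kind, id, version }) {
  if (kind === 'user') {
    // User-private bucket: `id` on the ref is informational; identity
    // always comes from the authenticated user, never the payload.
    return `${auth.tenantId}:user:${auth.userId}`;
  }
  const base = `${auth.tenantId}:${kind}:${id}`;
  // Skill keys pin a version, so bumping `skill.version` lands the
  // next prime under a new key and the old cache entry simply ages out.
  return kind === 'skill' ? `${base}@${version}` : base;
}
```

Bumping a skill from version 6 to 7 moves its key from `acme:skill:skill-42@6` to `acme:skill:skill-42@7`, which is why the skill cache needs no explicit invalidation call on edit.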


@@ -85,6 +85,16 @@ jest.mock('@librechat/api', () => {
* if a case needs to assert the 'html' value. */
getExtractedTextFormat: (...args) => mockGetExtractedTextFormat(...args),
getStorageMetadata: jest.fn(() => ({})),
/* Identity helpers mirror codeapi's validator. The real impl
* lives in `packages/api/src/files/code/identity.ts` with its
* own dedicated `identity.spec.ts`; here we just stub the
* download-query builder since `processCodeOutput` calls it on
* every output download. */
buildCodeEnvDownloadQuery: jest.fn(({ kind, id, version }) => {
const params = new URLSearchParams({ kind, id });
if (version != null) params.set('version', String(version));
return `?${params.toString()}`;
}),
codeServerHttpAgent: new http.Agent({ keepAlive: false }),
codeServerHttpsAgent: new https.Agent({ keepAlive: false }),
};
@@ -708,17 +718,68 @@ describe('Code Process', () => {
});
describe('metadata and file properties', () => {
it('should include codeEnvRef in metadata with kind: user', async () => {
const smallBuffer = Buffer.alloc(100);
mockAxios.mockResolvedValue({ data: smallBuffer });
const { file: result } = await processCodeOutput(baseParams);
expect(result.metadata).toEqual({
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'session-123',
file_id: 'file-id-123',
},
});
});
/* Phase C lock-in: outputs are ALWAYS user-scoped, never skill-scoped.
* Even when an execution turn invoked a skill (so input files were
* `kind: 'skill'` shared cross-user), the resulting output bucket
* tags `kind: 'user'` with the requesting user's id. This prevents
* cross-user leakage of artifacts a skill may have generated for
* one user; each user gets their own output sessionKey on codeapi.
*
* Drift hazard: someone reading the simple user-derivation may
* later think "we should respect input kind for outputs too" and
* widen output scope to match input scope. This test pins the
* intentional asymmetry so that change requires updating the test
* (and re-reading the rationale). */
it('outputs are user-scoped regardless of which skill the execution invoked', async () => {
const smallBuffer = Buffer.alloc(100);
mockAxios.mockResolvedValue({ data: smallBuffer });
const userA = { ...mockReq, user: { id: 'user-A' } };
const userB = { ...mockReq, user: { id: 'user-B' } };
const { file: outputA } = await processCodeOutput({ ...baseParams, req: userA });
const { file: outputB } = await processCodeOutput({ ...baseParams, req: userB });
// Each user's output ref is keyed by their own user id. The
// `id` field tracks the requesting user, never the skill.
expect(outputA.metadata.codeEnvRef).toEqual({
kind: 'user',
id: 'user-A',
storage_session_id: 'session-123',
file_id: 'file-id-123',
});
expect(outputB.metadata.codeEnvRef).toEqual({
kind: 'user',
id: 'user-B',
storage_session_id: 'session-123',
file_id: 'file-id-123',
});
// No skill identity leaks into the output ref under any property.
const refA = outputA.metadata.codeEnvRef;
const refB = outputB.metadata.codeEnvRef;
expect(refA.kind).not.toBe('skill');
expect(refB.kind).not.toBe('skill');
expect(refA).not.toHaveProperty('version');
expect(refB).not.toHaveProperty('version');
});
it('should set correct context for code-generated files', async () => {
const smallBuffer = Buffer.alloc(100);
mockAxios.mockResolvedValue({ data: smallBuffer });
@@ -934,7 +995,12 @@ describe('Code Process', () => {
data: [{ name: 'sess/fid', lastModified: new Date().toISOString() }],
});
await getSessionInfo({
kind: 'user',
id: 'user-1',
storage_session_id: 'sess',
file_id: 'fid',
});
const callConfig = mockAxios.mock.calls[0][0];
expect(callConfig.httpAgent).toBe(codeServerHttpAgent);
@@ -1511,8 +1577,8 @@ describe('Code Process', () => {
* `getStrategyFunctions(FileSources.execute_code)` for the code-env
* upload both go through the same factory in production.
*/
function setupReuploadMocks(newRef) {
const handleFileUpload = jest.fn().mockResolvedValue(newRef);
const getDownloadStream = jest.fn().mockResolvedValue('mock-stream');
getStrategyFunctions.mockImplementation((source) => {
if (source === 'execute_code') return { handleFileUpload };
@@ -1526,7 +1592,7 @@ describe('Code Process', () => {
return { handleFileUpload, getDownloadStream };
}
it('seed receives FRESH (storage_session_id, file_id) from the reupload response', async () => {
const dbFile = {
file_id: 'librechat-file-id',
filename: 'sentinel.txt',
@@ -1535,12 +1601,17 @@ describe('Code Process', () => {
context: 'execute_code',
metadata: {
/* Stale sandbox ref — this is what `getSessionInfo` will 404 on. */
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'OLD_SESSION',
file_id: 'OLD_ID',
},
},
};
getFiles.mockResolvedValue([dbFile]);
setupReuploadMocks({ storage_session_id: 'NEW_SESSION', file_id: 'NEW_ID' });
const result = await primeFiles({
req: { user: { id: 'user-123', role: 'USER' } },
@@ -1553,22 +1624,82 @@ describe('Code Process', () => {
// The seed list (consumed by buildInitialToolSessions) MUST carry
// the post-reupload ids — not the stale pre-reupload ones.
expect(result.files).toEqual([
{
id: 'NEW_ID',
/* `resource_id` carries the codeEnvRef.id (= original
* userId for kind: 'user'), threaded onto the in-memory
* file ref for codeapi's sessionKey re-derivation. */
resource_id: 'user-123',
storage_session_id: 'NEW_SESSION',
name: 'sentinel.txt',
kind: 'user',
},
]);
});
/* Phase C / option α (codeapi #1455): reupload preserves the
* resource identity from the existing ref so codeapi re-buckets
* under the same sessionKey shape. Without this, a skill-cache-miss
* reupload lands in the user bucket and is no longer cross-user
* shareable. */
it('reupload forwards kind/id (and version when skill) from the existing ref', async () => {
const dbFile = {
file_id: 'librechat-file-id',
filename: 'sentinel.txt',
filepath: '/uploads/sentinel.txt',
source: 'local',
context: 'execute_code',
metadata: {
codeEnvRef: {
kind: 'skill',
id: 'skill-99',
storage_session_id: 'OLD_SESSION',
file_id: 'OLD_ID',
version: 4,
},
},
};
getFiles.mockResolvedValue([dbFile]);
const { handleFileUpload } = setupReuploadMocks({
storage_session_id: 'NEW_SESSION',
file_id: 'NEW_ID',
});
await primeFiles({
req: { user: { id: 'user-123', role: 'USER' } },
tool_resources: {
execute_code: { file_ids: ['librechat-file-id'], files: [] },
},
agentId: 'agent-id',
});
expect(handleFileUpload).toHaveBeenCalledTimes(1);
const uploadArgs = handleFileUpload.mock.calls[0][0];
expect(uploadArgs.kind).toBe('skill');
expect(uploadArgs.id).toBe('skill-99');
expect(uploadArgs.version).toBe(4);
});
it('persists fresh codeEnvRef (kind/id preserved) on the DB record after reupload', async () => {
const dbFile = {
file_id: 'librechat-file-id',
filename: 'sentinel.txt',
filepath: '/uploads/sentinel.txt',
source: 'local',
context: 'execute_code',
metadata: {
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'OLD_SESSION',
file_id: 'OLD_ID',
},
},
};
getFiles.mockResolvedValue([dbFile]);
setupReuploadMocks({ storage_session_id: 'NEW_SESSION', file_id: 'NEW_ID' });
await primeFiles({
req: { user: { id: 'user-123', role: 'USER' } },
@@ -1581,10 +1712,62 @@ describe('Code Process', () => {
expect(updateFile).toHaveBeenCalledWith(
expect.objectContaining({
file_id: 'librechat-file-id',
metadata: expect.objectContaining({
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'NEW_SESSION',
file_id: 'NEW_ID',
},
}),
}),
);
});
it('reads codeEnvRef directly when present (skipping reupload)', async () => {
const dbFile = {
file_id: 'librechat-file-id',
filename: 'sentinel.txt',
filepath: '/uploads/sentinel.txt',
source: 'local',
context: 'execute_code',
metadata: {
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'STRUCT_SESSION',
file_id: 'STRUCT_ID',
},
},
};
getFiles.mockResolvedValue([dbFile]);
filterFilesByAgentAccess.mockImplementation(({ files }) => Promise.resolve(files));
// getSessionInfo returns a fresh timestamp so reupload is skipped.
mockAxios.mockResolvedValue({ data: { lastModified: new Date().toISOString() } });
const result = await primeFiles({
req: { user: { id: 'user-123', role: 'USER' } },
tool_resources: {
execute_code: { file_ids: ['librechat-file-id'], files: [] },
},
agentId: 'agent-id',
});
expect(updateFile).not.toHaveBeenCalled();
expect(result.files).toEqual([
{
id: 'STRUCT_ID',
/* `resource_id` from the persisted codeEnvRef.id for
* `kind: 'user'`, this is informational (codeapi derives
* sessionKey from auth context) but threaded for shape
* uniformity with shared kinds. */
resource_id: 'user-123',
storage_session_id: 'STRUCT_SESSION',
name: 'sentinel.txt',
kind: 'user',
},
]);
});
});
describe('primeFiles toolContext surfaces preview status to the LLM', () => {
@@ -1606,7 +1789,14 @@ describe('Code Process', () => {
filepath: `/uploads/${overrides.status ?? 'ready'}.xlsx`,
source: 'local',
context: 'execute_code',
metadata: {
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'CURRENT_SESSION',
file_id: 'CURRENT_ID',
},
},
...overrides,
};
}


@@ -569,13 +569,44 @@ const processAgentFileUpload = async ({ req, res, metadata }) => {
}
const { handleFileUpload: uploadCodeEnvFile } = getStrategyFunctions(FileSources.execute_code);
const stream = fs.createReadStream(file.path);
/* Resource identity for codeapi's sessionKey:
* - chat attachments (messageAttachment=true): `kind: 'user'`, codeapi
* buckets under `<tenant>:user:<authContext.userId>` regardless of `id`.
* - agent setup files (messageAttachment=false): `kind: 'agent'`, shared
* per agent identity. `id` carries the agent id. */
const codeKind = messageAttachment === true ? 'user' : 'agent';
const codeId = messageAttachment === true ? req.user.id : agent_id;
/* Upload under the same sanitized filename LC stores in its DB
* (`fileInfo.filename` below uses `sanitizeFilename(originalname)`).
* Codeapi/file_server use this as the on-disk name in the sandbox
* (`/mnt/data/<filename>`), and `primeFiles`'s `toolContext` text
* + `_injected_files.name` both reference `file.filename`. Sending
* the unsanitized `file.originalname` here makes the sandbox path
* (with spaces / special chars) drift from what LC tells the model
* is available, causing FileNotFoundError on the first reference. */
const sandboxFilename = sanitizeFilename(file.originalname);
const uploaded = await uploadCodeEnvFile({
req,
stream,
filename: file.originalname,
entity_id,
filename: sandboxFilename,
kind: codeKind,
id: codeId,
});
fileInfoMetadata = { fileIdentifier };
/* Persist under the structured `codeEnvRef` shape, the only key the
* post-cutover schema (`metadata.codeEnvRef`) and downstream readers
* (`primeFiles`, `getCodeFilesByIds`, `categorizeFileForToolResources`,
* controller filtering) accept. Anything stored under the legacy
* `fileIdentifier` key would be silently dropped by mongoose strict
* mode, and the file would lose its sandbox reference on subsequent
* priming turns. */
fileInfoMetadata = {
codeEnvRef: {
kind: codeKind,
id: codeId,
storage_session_id: uploaded.storage_session_id,
file_id: uploaded.file_id,
},
};
} else if (tool_resource === EToolResources.file_search) {
const isFileSearchEnabled = await checkCapability(req, AgentCapabilities.file_search);
if (!isFileSearchEnabled) {


@@ -346,6 +346,128 @@ describe('processAgentFileUpload', () => {
).resolves.not.toThrow();
});
});
/* Phase C / option α regression: the upload must persist its sandbox
* pointer under `metadata.codeEnvRef` (the post-cutover schema). The
* legacy `metadata.fileIdentifier` key is silently stripped by mongoose
* strict mode, and downstream readers (`primeFiles`, `getCodeFilesByIds`,
* `categorizeFileForToolResources`, controller filtering) only check
* `codeEnvRef`. Storing under the legacy key would orphan the file:
* priming would skip it on subsequent code-execution turns and the
* sandbox copy would never re-mount. */
describe('execute_code uploads persist codeEnvRef metadata', () => {
const fs = require('fs');
const { Readable } = require('stream');
let createReadStreamSpy;
beforeEach(() => {
/* `processAgentFileUpload` opens the multer-staged temp file via
* `fs.createReadStream`. The test fixture path doesn't exist, so
* stub it to a tiny in-memory stream. */
createReadStreamSpy = jest
.spyOn(fs, 'createReadStream')
.mockImplementation(() => Readable.from(Buffer.from('')));
});
afterEach(() => {
createReadStreamSpy.mockRestore();
});
const setupCodeEnvUpload = (uploaded) => {
/* `processAgentFileUpload` calls `getStrategyFunctions` twice:
* once with `execute_code` for the codeapi upload, then again with
* the on-disk strategy (`local`) for the standard storage step that
* runs in the same flow. Both must return a working
* `handleFileUpload`. */
const codeEnvUpload = jest.fn().mockResolvedValue(uploaded);
const localUpload = jest.fn().mockResolvedValue({
bytes: 0,
filename: 'upload.bin',
filepath: '/uploads/upload.bin',
});
getStrategyFunctions.mockImplementation((src) =>
src === FileSources.execute_code
? { handleFileUpload: codeEnvUpload }
: { handleFileUpload: localUpload, saveBuffer: jest.fn() },
);
return codeEnvUpload;
};
it('persists kind:user codeEnvRef for chat attachments (messageAttachment=true)', async () => {
setupCodeEnvUpload({ storage_session_id: 'sess-1', file_id: 'fid-1' });
const req = makeReq();
await processAgentFileUpload({
req,
res: mockRes,
metadata: {
agent_id: 'agent-abc',
tool_resource: EToolResources.execute_code,
file_id: 'file-uuid',
message_file: true,
},
});
expect(db.createFile).toHaveBeenCalledWith(
expect.objectContaining({
metadata: {
codeEnvRef: {
kind: 'user',
id: 'user-123',
storage_session_id: 'sess-1',
file_id: 'fid-1',
},
},
}),
true,
);
});
it('persists kind:agent codeEnvRef for agent setup files (messageAttachment=false)', async () => {
setupCodeEnvUpload({ storage_session_id: 'sess-2', file_id: 'fid-2' });
const req = makeReq();
await processAgentFileUpload({
req,
res: mockRes,
metadata: {
agent_id: 'agent-abc',
tool_resource: EToolResources.execute_code,
file_id: 'file-uuid',
},
});
expect(db.createFile).toHaveBeenCalledWith(
expect.objectContaining({
metadata: {
codeEnvRef: {
kind: 'agent',
id: 'agent-abc',
storage_session_id: 'sess-2',
file_id: 'fid-2',
},
},
}),
true,
);
});
it('does not persist legacy fileIdentifier key (mongoose strict drops it)', async () => {
setupCodeEnvUpload({ storage_session_id: 'sess-3', file_id: 'fid-3' });
const req = makeReq();
await processAgentFileUpload({
req,
res: mockRes,
metadata: {
agent_id: 'agent-abc',
tool_resource: EToolResources.execute_code,
file_id: 'file-uuid',
message_file: true,
},
});
const persisted = db.createFile.mock.calls[0][0];
expect(persisted.metadata).not.toHaveProperty('fileIdentifier');
});
});
});
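The legacy-key test above hinges on mongoose strict mode (the default), which silently drops document paths not declared in the schema. A dependency-free illustration of that filtering behavior, assuming a schema whose only metadata path is `codeEnvRef`; this is not mongoose's implementation:

```typescript
// Strict-mode-style filtering sketch (illustrative, not mongoose):
// keys absent from the declared path list are silently discarded,
// which is exactly how a legacy `fileIdentifier` write gets lost.
const METADATA_PATHS = ['codeEnvRef'] as const;

function applyStrict<T extends Record<string, unknown>>(doc: T): Partial<T> {
  return Object.fromEntries(
    Object.entries(doc).filter(([key]) => (METADATA_PATHS as readonly string[]).includes(key)),
  ) as Partial<T>;
}
```

No error is raised for the dropped key, which is why only a regression test like the one above catches a writer that reverts to the legacy shape.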
describe('processFileURL', () => {