From 93c4ef4ba87edf05c3370360609b8f03e4a2fff2 Mon Sep 17 00:00:00 2001
From: Danny Avila
Date: Fri, 8 May 2026 10:17:52 -0400
Subject: [PATCH] =?UTF-8?q?=F0=9F=A7=B1=20refactor:=20typed=20CodeEnvRef?=
 =?UTF-8?q?=20+=20kind=20discriminator=20+=20principal-aware=20sandbox=20c?=
 =?UTF-8?q?ache=20(#12960)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* 🧱 refactor: typed CodeEnvRef + kind discriminator + tenant-aware sandbox cache

Final cutover for the LibreChat ↔ codeapi sandbox file identity. Replaces the
magic string `${session_id}/${file_id}?entity_id=...` with a typed,
discriminated `CodeEnvRef`. Pre-release lockstep deploy with codeapi #1455 and
agents #148; no legacy aliases retained.

## Final shape

```ts
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };
```

`kind` drives codeapi's sessionKey: `<tenant>:<kind>:<id>[:v:<version>]` for
shared kinds, `<tenant>:user:<userId>` for user-private (auth context provides
`userId`). `version` is statically required for `kind: 'skill'` and forbidden
otherwise via discriminated union — the constraint holds at compile time on
every consumer, not just codeapi's runtime validator. `id` is
sessionKey-meaningful for `'skill'` / `'agent'`; informational only for
`'user'` (codeapi resolves user identity from auth context).

## What changed

- `packages/data-provider/src/codeEnvRef.ts` — discriminated union +
  `CODE_ENV_KINDS` const-tuple keeps the runtime list and TS union locked
  together.
- Schemas: `metadata.codeEnvRef` and `SkillFile.codeEnvRef` enums tightened to
  `['skill', 'agent', 'user']`.
- `primeSkillFiles` writes `kind: 'skill'`, `id: skill._id`,
  `version: skill.version`. Cache-hit path reads `codeEnvRef` directly.
  Bumping `skill.version` on edit naturally invalidates the prior cache entry
  under the new sessionKey.
- `processCodeOutput` writes `kind: 'user'`, `id: req.user.id`. Output bucket
  is always user-scoped, regardless of which skill the execution invoked. New
  regression test pins the asymmetry.
- `primeFiles` reupload preserves `kind`/`id`/`version?` from the existing ref
  so a skill-cache-miss reupload doesn't silently demote to user bucket.
- `crud.js` upload functions (`uploadCodeEnvFile` / `batchUploadCodeEnvFiles`)
  thread `kind`/`id`/`version?` to the multipart form (codeapi #1455 option α).
  Without these on the wire, codeapi falls back to user bucketing and
  skill-cache invalidation never fires. Client-side validation mirrors
  codeapi's validator.
- `Files/process.js` — chat attachments use `kind: 'user'`; agent setup files
  use `kind: 'agent'`.
- Drops `entity_id` everywhere (struct, schema sub-docs, write paths, upload
  form fields). Drops `'system'` from the kind enum (no emitter ever existed).

## Test plan

- [x] `cd packages/data-provider && npx jest src/codeEnvRef.spec` — 4 / 4
- [x] `cd packages/data-schemas && npx jest` — 1447 / 1447
- [x] `cd packages/api && npx jest src/agents` — 81 / 81 in skillFiles + handlers + resources
- [x] `cd api && npx jest server/services/Files server/controllers/agents` — 436 / 436
- [x] `cd api && npx jest server/services/Files/Code` — 98 / 98 (incl. new "outputs are user-scoped regardless of which skill the execution invoked" regression and "reupload forwards kind/id/version from existing ref")
- [x] `npx tsc --noEmit -p packages/data-{provider,schemas}/tsconfig.json && npx tsc --noEmit -p packages/api/tsconfig.json` — clean (only pre-existing unrelated dev errors in storage/balance, untouched here)

## Deploy notes

- **24h cache-miss burst** on first deploy. Inputs (skill caches re-prime under
  new sessionKey shape) and outputs (any pre-Phase C skill-output cached files
  become unreadable).
  Bounded by codeapi's 24h TTL.
- **Lockstep with codeapi #1455 and agents #148.** Either repo can land first
  since no aliases to drain, but the three deploys must overlap within the same
  maintenance window.
- **`@librechat/agents` bump to `3.1.79-dev.0`** required after agents #148
  lands and is published.

## What this enables

Auth bridge work (JWT-based tenant/user identity between LC and codeapi) —
codeapi now derives sessionKey purely from
`req.codeApiAuthContext.{ tenantId, userId }`, so the next chapter is replacing
the header-asserted user identity with a verified-claim path.

* 🩹 fix: persist execute_code uploads under codeEnvRef metadata key

Codex review P1 (chatgpt-codex-connector). `Files/process.js` was storing the
upload result under `metadata.fileIdentifier` even though:

- `uploadCodeEnvFile` now returns `{ storage_session_id, file_id }`, not the
  legacy magic string.
- The post-cutover schema (`File.metadata.codeEnvRef`) only declares
  `codeEnvRef` — mongoose strict mode silently strips unknown keys.
- All readers (`primeFiles`, `getCodeFilesByIds`,
  `categorizeFileForToolResources`, controller filtering) check
  `metadata.codeEnvRef`.

Net effect of the bug: chat-attached and agent-setup execute_code files would
lose their sandbox reference on save, and primeFiles would skip them on
subsequent code-execution turns — the file blob would still be available
locally but never re-mounted in the sandbox.

Fix: construct the full `CodeEnvRef`
(`{ kind, id, storage_session_id, file_id }`) at the write site and persist
under `metadata.codeEnvRef`. `BaseClient`'s "is this a code-env file" presence
check accepts the new shape alongside the legacy `fileIdentifier` for
back-compat with any pre-cutover records still in the database. Mirrors the
same change in `processAttachments.spec.ts` (which re-implements the BaseClient
logic for testability).
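The write-site fix can be sketched as follows; `buildRef`, `isCodeEnvFile`, and the `opts` parameter are illustrative stand-ins for the actual `Files/process.js` / `BaseClient` code — only the `CodeEnvRef` union and the `metadata.codeEnvRef` key are taken from this PR:

```typescript
// Sketch only: shapes from the PR, helper names hypothetical.
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };

interface UploadResult {
  storage_session_id: string;
  file_id: string;
}

/** Chat attachments are user-scoped; agent setup files are agent-scoped. */
function buildRef(
  upload: UploadResult,
  opts: { messageAttachment: boolean; userId: string; agentId: string },
): CodeEnvRef {
  return opts.messageAttachment
    ? { kind: 'user', id: opts.userId, ...upload }
    : { kind: 'agent', id: opts.agentId, ...upload };
}

/** Presence check: new shape counts, legacy pre-cutover records still count. */
function isCodeEnvFile(metadata?: { codeEnvRef?: CodeEnvRef; fileIdentifier?: string }): boolean {
  return metadata?.codeEnvRef != null || metadata?.fileIdentifier != null;
}
```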
New regression tests in `process.spec.js` cover three cases:

- chat attachments (`messageAttachment=true`) → `kind: 'user'`
- agent setup (`messageAttachment=false`) → `kind: 'agent'`
- legacy `fileIdentifier` key is NOT persisted (would be schema-stripped)

* 🩹 fix: read storage_session_id on primed file refs (Codex P1)

Codex review (chatgpt-codex-connector). After Phase B's per-file `session_id` →
`storage_session_id` rename, `primeFiles` emits the new field — but
`seedCodeFilesIntoSessions` was still reading `files[0].session_id` for the
representative session and `f.session_id` for the dedupe key. In runs with only
primed attachments (no skill seed), `representativeSessionId` was `undefined`,
the function returned the unchanged map, and `seedCodeFilesIntoSessions`
silently dropped the entire batch. The first `execute_code` call then started
without `_injected_files` and the agent couldn't see prior-turn artifacts.

Fix:

- `codeFilesSession.ts`: read `f.storage_session_id` for both the dedupe key
  and the representative session id. JSDoc updated to match the new field name.
- `callbacks.js`: the two output-file persistence paths read `file.session_id`
  to pass to `processCodeOutput` — switch to `file.storage_session_id`. The
  original comment explicitly says this should be the STORAGE session, which is
  exactly the field Phase B renamed.
- `codeFilesSession.spec.ts`: fixture builder uses `storage_session_id` and
  `kind: 'user'` to match the post-cutover `CodeEnvFile` shape.

Lockstep coordination: this matches the post-bump shape of `@librechat/agents`
3.1.79+. CI tsc errors against the currently-pinned 3.1.78 are expected and
resolve when the dep bumps in this PR before merge.

* 📦 chore: Bump `@librechat/agents` to version 3.1.80-dev.0 in package-lock and package.json files

* 🪪 fix: thread kind/id/version through codeapi /download URLs (Phase C α)

Symmetric fix for the upload-side wire change in 537725a.
Codeapi's `sessionAuth` middleware now requires `kind`/`id`/`version?` on every
download/freshness URL — without them it 400s with "kind must be one of: skill,
agent, user" before serving the file. Three sites construct codeapi-side URLs
that go through `sessionAuth`:

- `processCodeOutput` (`Files/Code/process.js`):
  `/download/<session_id>/<file_id>` for freshly-generated sandbox outputs.
  Always `kind: 'user'` + `id: req.user.id` — code-output files are always
  user-private, regardless of which skill the run invoked.
- `getSessionInfo` (`Files/Code/process.js`):
  `/sessions/<session_id>/objects/<file_id>` for the 23h freshness check. Pulls
  kind/id/version straight off the `codeEnvRef` already in scope — skill files
  stay skill-bucketed, user files stay user-bucketed.
- `/code/download/:session_id/:fileId` LC route (`routes/files/files.js`):
  proxies to codeapi for manual downloads. Code-output files only on this
  route, so `kind: 'user'` + `id: req.user.id`.

The `getCodeOutputDownloadStream` helper in `crud.js` now takes an `identity`
param, validated by a `buildCodeEnvDownloadQuery` helper that mirrors
`appendCodeEnvFileIdentity`'s shape rules: kind required from the closed
`{skill, agent, user}` set, version required for 'skill' and forbidden
otherwise. Bad callers fail fast on the client instead of round-tripping a 400.

Also cleans up two log-noise sources reported alongside the 400:

- `logAxiosError` in `packages/api/src/utils/axios.ts` was dumping
  `error.response.data` raw. With `responseType: 'arraybuffer'` that's a
  `Buffer` (~4 chars per byte after JSON-serialization); with
  `responseType: 'stream'` it's a `Readable` whose internal state serializes
  the entire ring buffer + socket. New `renderResponseData` decodes small
  buffers as UTF-8 (truncated past 2KB) and stubs streams as `'[stream]'`.
  Diagnostics stay useful, log lines stop being megabytes.
- `/code/download` route's catch was bare `logger.error('...', error)`,
  bypassing the redactor.
  Switched to `logAxiosError` so it benefits from the same buffer/stream
  handling.

Tests updated to match the new contract:

- crud.spec: `getCodeOutputDownloadStream` fixtures pass `userIdentity`; new
  cases cover skill identity (with version), bad kind rejection,
  skill-without-version rejection.
- process.spec: `getSessionInfo` test passes a full `codeEnvRef` object.

* ♻️ refactor: extract codeEnv identity helpers into packages/api

Per the project convention that new backend code lives in TypeScript under
`packages/api`, moves `appendCodeEnvFileIdentity` and
`buildCodeEnvDownloadQuery` from `api/server/services/Files/Code/crud.js` into
a new `packages/api/src/files/code/identity.ts` module. Both helpers are pure
validators that mirror codeapi's `parseUploadSessionKeyInput` server-side rules
(closed kind set, `version` required for `'skill'` and forbidden otherwise) —
they deserve TS support and a dedicated spec rather than living as JSDoc-typed
helpers in the legacy `/api` workspace.

The new module:

- Exports a `CodeEnvIdentity` interface using the `librechat-data-provider`
  `CodeEnvKind` discriminated union.
- Adds 13 unit tests in `identity.spec.ts` covering the validation matrix
  (skill+version, agent, user, and every rejection path) plus URL encoding for
  the download query.
- Re-exported from `packages/api/src/files/code/index.ts` alongside `classify`,
  `extract`, and `form`.

Consumer updates:

- `api/server/services/Files/Code/crud.js`: drops the local helpers and imports
  them from `@librechat/api`. Net -64 lines.
- `api/server/services/Files/Code/process.js`: same.
- Test mocks for `@librechat/api` in three spec files now stub the helpers'
  validation behavior locally rather than pulling them through `requireActual`
  (which would drag in provider-config init-time side effects). The package's
  `exports` field only surfaces the root barrel, so leaf imports aren't
  reachable from legacy `/api` test setup.

No runtime behavior change.
Identity validation rules and emitted form/query shapes are byte-for-byte
identical pre/post.

* 🪪 fix: emit resource_id alongside id on _injected_files (skill 403 fix)

Companion to codeapi #1455 fix and agents 3.1.80-dev.1 — the wire shape for
shared-kind files now requires `resource_id` distinct from the storage `id`.
Without this LC change, codeapi's sessionKey re-derivation on every shared-kind
/exec rejects with 403 session_key_mismatch:

    cached:  legacy:skill:69dcf561...:v:59 (signed at upload, skill _id)
    derived: legacy:skill:ysPwEURuPk-...:v:59 (storage nanoid)

Emit sites updated:

- `primeInvokedSkills` cache-hit path: `resource_id: ref.id` (the persisted
  skill `_id` from `codeEnvRef.id`); `id: ref.file_id` unchanged (storage
  uuid).
- `primeInvokedSkills` fresh-upload path: `resource_id: skill._id.toString()`
  on every primed file (the `allPrimedFiles` builder type now carries the
  field).
- `processCodeOutput`'s `pushFile` (Code/process.js): `resource_id: ref.id` —
  for `kind: 'user'` this is informational (codeapi derives sessionKey from
  auth context) but emitted for shape uniformity with shared kinds.

Bumps `@librechat/agents` to `^3.1.80-dev.1` (the version that ships the
matching `CodeEnvFile.resource_id` field).

## Test plan

- [x] `cd packages/api && npx jest src/agents` — 67 / 67 pass (skillFiles fixtures updated to assert `resource_id` on the emitted CodeSessionContext.files).
- [x] `cd api && npx jest server/services/Files server/controllers/agents` — 445 / 445 pass (process.spec fixtures updated for the reupload + cache-hit emission).
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean.

* fix(skill-tool-call): carry resource_id through primeSkillFiles → artifact

Codeapi was 400ing every /exec following a `handle_skill` tool call with
`resource_id is invalid` (`type: 'undefined'`).
Both code paths in `primeSkillFiles` (cache-hit + fresh-upload) returned files
without `resource_id`/`kind`/`version`, and the artifact in `handlers.ts`
forwarded the stripped shape into `tc.codeSessionContext.files` →
`_injected_files`. `primeInvokedSkills` (the NL-detected loader) had already
been fixed end-to-end; this commit aligns the tool-invoked path with the same
contract: `resource_id` = `skill._id.toString()`, `kind: 'skill'`, `version` =
the skill's monotonic counter.

Tests added to `skillFiles.spec.ts` lock the contract on `primeSkillFiles`
directly so future refactors can't silently drop the resource identity again.

* fix(handlers.spec): align session_id → storage_session_id rename + kind discriminator

Pre-existing TS errors against the post-rename `CodeEnvFile` shape: the test
file still used `session_id` on per-file objects (renamed to
`storage_session_id` in agents Phase B/C) and was missing the `kind`
discriminator the discriminated union requires. Both the inputs and the
matching `expect.toEqual(...)` mirrors were updated together so the runtime
equality check still holds. Lines 723-732 stay as-is — they sit behind
`as unknown as ToolCallRequest` and TS already skipped them.

* chore: fix `@librechat/agents`, correct version to 3.1.80-dev.0 in package.json files

* chore: bump `@librechat/agents` to version 3.1.80-dev.1 in package.json and package-lock.json

* chore: bump `@librechat/agents` to version 3.1.80-dev.2

* feat(observability): trace file priming chain from primeCodeFiles to _injected_files

Diagnosing the user-upload "files=[] on first /exec" bug requires seeing where
in the LC chain a file ref disappears.
Prior to this patch the chain (primeCodeFiles → primedCodeFiles →
initialSessions → CodeSessionContext → _injected_files) was opaque end-to-end:

- primeCodeFiles silently dropped files without `metadata.codeEnvRef`
- reuploadFile catches all errors and continues with no signal
- the handlers.ts handoff to codeapi never logged what it was sending

After this patch, a single grep on `[primeCodeFiles]` plus `[code-env:inject]`
shows the full per-file path:

    [primeCodeFiles] in: file_ids=N resourceFiles=M
    [primeCodeFiles] file=<file_id> path=skip reason=no-codeenvref filename=...
    [primeCodeFiles] file=<file_id> path=cache-hit-by-session storage_session_id=...
    [primeCodeFiles] file=<file_id> path=reupload reason=no-uploadtime ...
    [primeCodeFiles] file=<file_id> path=reupload reason=stale ...
    [primeCodeFiles] file=<file_id> path=reupload-success oldSession=... newSession=... newFileId=...
    [primeCodeFiles] file=<file_id> path=reupload-failed session=...
    [primeCodeFiles] file=<file_id> path=fresh-active storage_session_id=...
    [primeCodeFiles] out: returned=N skippedNoRef=M reuploadFailures=K
    [code-env:inject] tool=<tool_name> files=N missingResourceId=K (debug)
    [code-env:inject] M/N files missing resource_id ... (warn)
    [code-env:inject] tool=<tool_name> _injected_files=0 ... (warn)

The boundary log warns when LC sends zero injected files on a code-execution
tool call — that's the user's actual symptom showing up at the LC side instead
of having to correlate against codeapi's `Request received { files: [] }`.

Tag chosen as `[code-env:inject]` rather than `[handoff:exec]` to avoid
collision with the app-level "handoff" semantic (subagent handoff workflow).

Structural cleanup in primeFiles: replaced the `if (ref) { ... }` nesting with
an early `if (!ref) continue` so the per-path instrumentation hooks land at
top-level scope instead of indented inside a conditional. Behavior unchanged;
pushFile / reuploadFile identical.
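The early-continue trace pattern can be sketched as below; the `DbFile` shape, logger plumbing, and the single `fresh-active` path are illustrative assumptions (the real `primeCodeFiles` also handles the cache-hit and reupload paths):

```typescript
// Sketch of the per-file trace pattern; not the actual primeCodeFiles code.
interface DbFile {
  file_id: string;
  filename: string;
  metadata?: { codeEnvRef?: { storage_session_id: string } };
}

function primeCodeFilesSketch(files: DbFile[], log: (msg: string) => void) {
  log(`[primeCodeFiles] in: file_ids=${files.length}`);
  const out: { file_id: string; storage_session_id: string }[] = [];
  let skippedNoRef = 0;
  for (const file of files) {
    const ref = file.metadata?.codeEnvRef;
    // Early continue keeps every per-path log at top-level loop scope.
    if (!ref) {
      skippedNoRef += 1;
      log(`[primeCodeFiles] file=${file.file_id} path=skip reason=no-codeenvref filename=${file.filename}`);
      continue;
    }
    log(`[primeCodeFiles] file=${file.file_id} path=fresh-active storage_session_id=${ref.storage_session_id}`);
    out.push({ file_id: file.file_id, storage_session_id: ref.storage_session_id });
  }
  log(`[primeCodeFiles] out: returned=${out.length} skippedNoRef=${skippedNoRef}`);
  return out;
}
```

A single grep on the shared `[primeCodeFiles]` prefix then reconstructs the full in/skip/out story for a request.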
Spec fixtures (handlers.spec.ts, codeFilesSession.spec.ts) updated to include
`resource_id` on `CodeEnvFile` literals — required by the post-3.1.80-dev.2
type now installed.

## Test plan

- [x] `cd packages/api && npx jest src/agents/handlers.spec.ts src/agents/codeFilesSession.spec.ts src/agents/skillFiles.spec.ts` — 69/69 pass
- [x] `cd api && npx jest server/services/Files/Code/process.spec.js` — 84/84 pass
- [x] `npx tsc --noEmit -p packages/api` — clean
- [x] `npx eslint` on all four touched files — clean

* chore: add CONSOLE_JSON_STRING_LENGTH to .env.example for JSON log string length configuration

* fix(files): align codeapi upload filename with LC's sanitized DB filename

User-attached files for code execution were uploading to codeapi under
`file.originalname` (raw upload filename, may contain spaces / special chars)
while LC's DB record stored the sanitized form
(`sanitizeFilename(file.originalname)`, underscores). Codeapi preserves
whatever filename the upload sent, so the sandbox saw
`/mnt/data/<raw originalname>` while LC's `primeFiles` toolContext text +
`_injected_files.name` referenced `file.filename` (sanitized).

Visible failure: agent gets system prompt saying

    /mnt/data/librechat_code_api_-_active_customer_-_2025-11-05.xlsx

…tries that path, hits `FileNotFoundError`, then notices the sandbox's actual
`Available files` line says

    /mnt/data/librechat code api - active customer - 2025-11-05.xlsx

…retries with spaces, succeeds. Wastes a tool call per upload and leaks raw
filenames into model context.

Fix: sanitize once and use the sanitized form in both the codeapi upload AND
the LC DB record. Sandbox path = LC toolContext text = in-memory ref name. No
drift.

Reupload path (`Code/process.js` line 867 `filename: file.filename`) already
uses the sanitized DB name, so it stays consistent with the fresh-upload path
after this change.
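The sanitize-once idea can be sketched as follows; `sanitizeFilenameSketch` is a stand-in regex, not LC's actual `sanitizeFilename` implementation, though it reproduces the underscored example above:

```typescript
// Stand-in sanitizer: collapse runs of anything outside [A-Za-z0-9_.-]
// into a single underscore. Assumption, not LC's real sanitizeFilename.
function sanitizeFilenameSketch(name: string): string {
  return name.replace(/[^\w.\-]+/g, '_');
}

// Sanitize once; feed the SAME string to the codeapi upload form and the
// LC DB record, so the sandbox path, the toolContext text, and the
// in-memory ref name can never drift apart.
function prepareUpload(originalname: string) {
  const filename = sanitizeFilenameSketch(originalname);
  return { uploadFilename: filename, dbFilename: filename };
}
```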
## Test plan - [x] `cd api && npx jest server/services/Files/process` β€” 32/32 pass - [x] `npx eslint` on the touched file β€” clean * chore: bump `@librechat/agents` to version 3.1.80-dev.3 in package.json and package-lock.json --- .env.example | 4 + api/app/clients/BaseClient.js | 6 +- api/package.json | 2 +- api/server/controllers/agents/callbacks.js | 30 +- api/server/controllers/agents/client.js | 2 +- api/server/routes/files/files.js | 19 +- .../Code/__tests__/process-traversal.spec.js | 9 + api/server/services/Files/Code/crud.js | 74 ++-- api/server/services/Files/Code/crud.spec.js | 175 +++++++- api/server/services/Files/Code/process.js | 372 +++++++++++------- .../services/Files/Code/process.spec.js | 218 +++++++++- api/server/services/Files/process.js | 39 +- api/server/services/Files/process.spec.js | 122 ++++++ package-lock.json | 10 +- packages/api/package.json | 2 +- .../api/src/agents/codeFilesSession.spec.ts | 30 +- packages/api/src/agents/codeFilesSession.ts | 40 +- packages/api/src/agents/handlers.spec.ts | 84 +++- packages/api/src/agents/handlers.ts | 111 +++++- packages/api/src/agents/resources.test.ts | 14 +- packages/api/src/agents/resources.ts | 4 +- packages/api/src/agents/skillFiles.spec.ts | 208 ++++++++-- packages/api/src/agents/skillFiles.ts | 292 ++++++++------ packages/api/src/files/code/identity.spec.ts | 116 ++++++ packages/api/src/files/code/identity.ts | 78 ++++ packages/api/src/files/code/index.ts | 1 + .../files/encode/processAttachments.spec.ts | 8 +- packages/api/src/utils/axios.ts | 32 +- packages/data-provider/src/codeEnvRef.spec.ts | 55 +++ packages/data-provider/src/codeEnvRef.ts | 59 +++ packages/data-provider/src/index.ts | 2 + packages/data-provider/src/types/files.ts | 11 +- .../data-schemas/src/methods/file.spec.ts | 9 +- packages/data-schemas/src/methods/file.ts | 4 +- .../data-schemas/src/methods/skill.spec.ts | 88 ++++- packages/data-schemas/src/methods/skill.ts | 9 +- packages/data-schemas/src/schema/file.ts | 18 
+- packages/data-schemas/src/schema/skillFile.ts | 18 +- packages/data-schemas/src/types/file.ts | 9 +- packages/data-schemas/src/types/skill.ts | 9 +- 40 files changed, 1937 insertions(+), 456 deletions(-) create mode 100644 packages/api/src/files/code/identity.spec.ts create mode 100644 packages/api/src/files/code/identity.ts create mode 100644 packages/data-provider/src/codeEnvRef.spec.ts create mode 100644 packages/data-provider/src/codeEnvRef.ts diff --git a/.env.example b/.env.example index 9a5514ee9f..5349027e95 100644 --- a/.env.example +++ b/.env.example @@ -58,6 +58,10 @@ TRUST_PROXY=1 # Use when process console logs in cloud deployment like GCP/AWS CONSOLE_JSON=false +# The maximum length of a string in a JSON log object. +# Default: 255 +# CONSOLE_JSON_STRING_LENGTH=255 + #===============# # Debug Logging # #===============# diff --git a/api/app/clients/BaseClient.js b/api/app/clients/BaseClient.js index 626b4c77f1..e641017c74 100644 --- a/api/app/clients/BaseClient.js +++ b/api/app/clients/BaseClient.js @@ -1234,7 +1234,11 @@ class BaseClient { allFiles.push(file); continue; } - if (file.embedded === true || file.metadata?.fileIdentifier != null) { + if ( + file.embedded === true || + file.metadata?.codeEnvRef != null || + file.metadata?.fileIdentifier != null + ) { allFiles.push(file); continue; } diff --git a/api/package.json b/api/package.json index 11c27cd622..05a2ab0834 100644 --- a/api/package.json +++ b/api/package.json @@ -45,7 +45,7 @@ "@azure/storage-blob": "^12.30.0", "@google/genai": "^1.19.0", "@keyv/redis": "^4.3.3", - "@librechat/agents": "^3.1.79", + "@librechat/agents": "^3.1.80-dev.3", "@librechat/api": "*", "@librechat/data-schemas": "*", "@microsoft/microsoft-graph-client": "^3.0.7", diff --git a/api/server/controllers/agents/callbacks.js b/api/server/controllers/agents/callbacks.js index c9612c6b62..6390e60666 100644 --- a/api/server/controllers/agents/callbacks.js +++ b/api/server/controllers/agents/callbacks.js @@ -616,22 +616,23 
@@ function createToolEndCallback({ req, res, artifactPromises, streamId = null }) toolCallId, conversationId: metadata.thread_id, /** - * Use the FILE's session_id (storage session), not the - * top-level artifact session_id (exec session). The codeapi - * worker reports two distinct ids on a tool result: + * Use the FILE's `storage_session_id` (storage session), + * not the top-level artifact `session_id` (exec session). + * The codeapi worker reports two distinct ids on a tool + * result: * - `artifact.session_id` is the EXEC session β€” the * sandbox VM that ran the bash command. Files don't * live there; it's torn down post-execution. - * - `file.session_id` is the STORAGE session β€” the - * file-server bucket prefix where artifacts actually - * live and are served from. + * - `file.storage_session_id` is the STORAGE session β€” + * the file-server bucket prefix where artifacts + * actually live and are served from. * `processCodeOutput` builds `/download/{session_id}/{id}`, * so passing the exec id resolves to a path the file-server * doesn't know about and 404s. Fall back to artifact-level * for older worker payloads that may not populate per-file * ids. */ - session_id: file.session_id ?? output.artifact.session_id, + session_id: file.storage_session_id ?? output.artifact.session_id, }); const fileMetadata = result?.file ?? null; const finalize = result?.finalize; @@ -882,22 +883,23 @@ function createResponsesToolEndCallback({ req, res, tracker, artifactPromises }) toolCallId, conversationId: metadata.thread_id, /** - * Use the FILE's session_id (storage session), not the - * top-level artifact session_id (exec session). The codeapi - * worker reports two distinct ids on a tool result: + * Use the FILE's `storage_session_id` (storage session), + * not the top-level artifact `session_id` (exec session). 
+ * The codeapi worker reports two distinct ids on a tool + * result: * - `artifact.session_id` is the EXEC session β€” the * sandbox VM that ran the bash command. Files don't * live there; it's torn down post-execution. - * - `file.session_id` is the STORAGE session β€” the - * file-server bucket prefix where artifacts actually - * live and are served from. + * - `file.storage_session_id` is the STORAGE session β€” + * the file-server bucket prefix where artifacts + * actually live and are served from. * `processCodeOutput` builds `/download/{session_id}/{id}`, * so passing the exec id resolves to a path the file-server * doesn't know about and 404s. Fall back to artifact-level * for older worker payloads that may not populate per-file * ids. */ - session_id: file.session_id ?? output.artifact.session_id, + session_id: file.storage_session_id ?? output.artifact.session_id, }); const fileMetadata = result?.file ?? null; const finalize = result?.finalize; diff --git a/api/server/controllers/agents/client.js b/api/server/controllers/agents/client.js index 8f4af5f252..40c4ab6d96 100644 --- a/api/server/controllers/agents/client.js +++ b/api/server/controllers/agents/client.js @@ -331,7 +331,7 @@ class AgentClient extends BaseClient { this.contextHandlers?.processFile(file); continue; } - if (file.metadata?.fileIdentifier) { + if (file.metadata?.codeEnvRef) { continue; } } diff --git a/api/server/routes/files/files.js b/api/server/routes/files/files.js index 4176642a71..a43c184e50 100644 --- a/api/server/routes/files/files.js +++ b/api/server/routes/files/files.js @@ -2,6 +2,7 @@ const fs = require('fs').promises; const express = require('express'); const { logger, SystemCapabilities } = require('@librechat/data-schemas'); const { + logAxiosError, refreshS3FileUrls, resolveUploadErrorMessage, verifyAgentUploadPermission, @@ -309,12 +310,26 @@ router.get('/code/download/:session_id/:fileId', async (req, res) => { return res.status(501).send('Not Implemented'); } + /* 
Code-output downloads are always user-private — `processCodeOutput` + * persists every code-execution artifact under + * `metadata.codeEnvRef.kind === 'user'` regardless of which skill + * the run invoked. Pass `kind: 'user'` + `id: <userId>` so codeapi's + * `sessionAuth` resolves the matching `<tenant>:user:<userId>` + * sessionKey; without these query params it 400s with + * "kind must be one of: skill, agent, user". */ /** @type {AxiosResponse | undefined} */ - const response = await getDownloadStream(`${session_id}/${fileId}`); + const response = await getDownloadStream(`${session_id}/${fileId}`, { + kind: 'user', + id: req.user.id, + }); res.set(response.headers); response.data.pipe(res); } catch (error) { - logger.error('Error downloading file:', error); + /* `logAxiosError` redacts buffer/stream response bodies — without + * it, a stream-typed axios failure dumps the entire `Readable`'s + * internal state (megabytes of socket + readableState) into the + * log line. Plain `logger.error(error)` would do that here. */ + logAxiosError({ message: 'Error downloading code-output file', error }); res.status(500).send('Error downloading file'); } }); diff --git a/api/server/services/Files/Code/__tests__/process-traversal.spec.js b/api/server/services/Files/Code/__tests__/process-traversal.spec.js index 7412c4a28a..751db31bc2 100644 --- a/api/server/services/Files/Code/__tests__/process-traversal.spec.js +++ b/api/server/services/Files/Code/__tests__/process-traversal.spec.js @@ -40,6 +40,15 @@ jest.mock('@librechat/api', () => { * inline (non-finalize) path so existing assertions on a single * createFile call hold. */ hasOfficeHtmlPath: jest.fn(() => false), + /* Identity-helper stub mirroring `packages/api/src/files/code/identity.ts`. + * `processCodeOutput` calls this for every output download URL; + * traversal cases don't care about the query shape, just that it + * returns something concatable.
*/ + buildCodeEnvDownloadQuery: jest.fn(({ kind, id, version }) => { + const params = new URLSearchParams({ kind, id }); + if (version != null) params.set('version', String(version)); + return `?${params.toString()}`; + }), codeServerHttpAgent: new http.Agent({ keepAlive: false }), codeServerHttpsAgent: new https.Agent({ keepAlive: false }), }; diff --git a/api/server/services/Files/Code/crud.js b/api/server/services/Files/Code/crud.js index 6b4e58d89b..cb25f351ec 100644 --- a/api/server/services/Files/Code/crud.js +++ b/api/server/services/Files/Code/crud.js @@ -7,6 +7,8 @@ const { createAxiosInstance, codeServerHttpAgent, codeServerHttpsAgent, + appendCodeEnvFileIdentity, + buildCodeEnvDownloadQuery, } = require('@librechat/api'); const axios = createAxiosInstance(); @@ -16,16 +18,22 @@ const MAX_FILE_SIZE = 150 * 1024 * 1024; /** * Retrieves a download stream for a specified file. * @param {string} fileIdentifier - The identifier for the file (e.g., "session_id/fileId"). + * @param {{ kind: 'skill' | 'agent' | 'user'; id: string; version?: number }} identity + * Resource identity required by codeapi's `sessionAuth` to derive the + * matching sessionKey. For code-output downloads this is always + * `kind: 'user', id: <userId>`; for skill/agent re-downloads pass + * the kind+id (+version for skill) from the file's `metadata.codeEnvRef`. * @returns {Promise} A promise that resolves to a readable stream of the file content. * @throws {Error} If there's an error during the download process.
*/ -async function getCodeOutputDownloadStream(fileIdentifier) { +async function getCodeOutputDownloadStream(fileIdentifier, identity) { try { const baseURL = getCodeBaseURL(); + const query = buildCodeEnvDownloadQuery(identity); /** @type {import('axios').AxiosRequestConfig} */ const options = { method: 'get', - url: `${baseURL}/download/${fileIdentifier}`, + url: `${baseURL}/download/${fileIdentifier}${query}`, responseType: 'stream', headers: { 'User-Agent': 'LibreChat/1.0', @@ -49,20 +57,31 @@ async function getCodeOutputDownloadStream(fileIdentifier) { /** * Uploads a file to the Code Environment server. + * + * `kind`/`id`/`version?` are required so codeapi can route the upload to + * the correct sessionKey bucket — `<tenant>:<kind>:<id>[:v:<version>]` + * for shared kinds, `<tenant>:user:<userId>` for `user`. + * Without these, codeapi falls back to user-scoped bucketing regardless + * of the resource the file belongs to, so skill-cache invalidation + * (driven by the version bump on edit) never fires. See codeapi #1455. + * * @param {Object} params - The params object. * @param {ServerRequest} params.req - The request object from Express. It should have a `user` property with an `id` representing the user * @param {import('fs').ReadStream | import('stream').Readable} params.stream - The read stream for the file. * @param {string} params.filename - The name of the file. - * @param {string} [params.entity_id] - Optional entity ID for the file. - * @returns {Promise} + * @param {'skill' | 'agent' | 'user'} params.kind - Resource kind that owns this file's storage session. + * @param {string} params.id - Resource id (skillId / agentId / userId). Codeapi + * ignores this for `kind: 'user'` (auth context provides userId), but it's + * sent uniformly for shape symmetry with the discriminated union. + * @param {number} [params.version] - Required when `kind === 'skill'`; absent otherwise.
+ * @returns {Promise<{ storage_session_id: string; file_id: string }>} + * The codeapi storage location of the uploaded file. * @throws {Error} If there's an error during the upload process. */ -async function uploadCodeEnvFile({ req, stream, filename, entity_id = '' }) { +async function uploadCodeEnvFile({ req, stream, filename, kind, id, version }) { try { const form = new FormData(); - if (entity_id.length > 0) { - form.append('entity_id', entity_id); - } + appendCodeEnvFileIdentity(form, { kind, id, version }); appendCodeEnvFile(form, stream, filename); const baseURL = getCodeBaseURL(); @@ -83,18 +102,16 @@ async function uploadCodeEnvFile({ req, stream, filename, entity_id = '' }) { const response = await axios.post(`${baseURL}/upload`, form, options); - /** @type {{ message: string; session_id: string; files: Array<{ fileId: string; filename: string }> }} */ + /** @type {{ message: string; storage_session_id: string; files: Array<{ fileId: string; filename: string }> }} */ const result = response.data; if (result.message !== 'success') { throw new Error(`Error uploading file: ${result.message}`); } - const fileIdentifier = `${result.session_id}/${result.files[0].fileId}`; - if (entity_id.length === 0) { - return fileIdentifier; - } - - return `${fileIdentifier}?entity_id=${entity_id}`; + return { + storage_session_id: result.storage_session_id, + file_id: result.files[0].fileId, + }; } catch (error) { throw new Error( logAxiosError({ @@ -109,25 +126,28 @@ async function uploadCodeEnvFile({ req, stream, filename, entity_id = '' }) { * Uploads multiple files to the code execution environment in a single request. * Uses the /upload/batch endpoint which shares one session_id across all files. * + * `kind`/`id`/`version?` carry the resource identity for codeapi's sessionKey + * derivation β€” see `uploadCodeEnvFile` for the full motivation. 
+ * * @param {object} params * @param {import('express').Request & { user: { id: string } }} params.req - The request object. * @param {Array<{ stream: NodeJS.ReadableStream; filename: string }>} params.files - Files to upload. - * @param {string} [params.entity_id] - Optional entity ID. + * @param {'skill' | 'agent' | 'user'} params.kind - Resource kind that owns the batch's storage session. + * @param {string} params.id - Resource id (skillId / agentId / userId). + * @param {number} [params.version] - Required when `kind === 'skill'`; absent otherwise. * @param {boolean} [params.read_only] - When true, codeapi tags every file in * the batch as infrastructure (e.g. skill files). The flag is persisted as * MinIO object metadata (`X-Amz-Meta-Read-Only`) and travels with the file * through subsequent download/walk passes — sandboxed-code modifications * are dropped on the floor and the original ref is echoed back as * `inherited: true`, never as a generated artifact. - * @returns {Promise<{ session_id: string; files: Array<{ fileId: string; filename: string }> }>} + * @returns {Promise<{ storage_session_id: string; files: Array<{ fileId: string; filename: string }> }>} * @throws {Error} If the batch upload fails entirely.
*/ -async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only = false }) { +async function batchUploadCodeEnvFiles({ req, files, kind, id, version, read_only = false }) { try { const form = new FormData(); - if (entity_id.length > 0) { - form.append('entity_id', entity_id); - } + appendCodeEnvFileIdentity(form, { kind, id, version }); if (read_only) { form.append('read_only', 'true'); } @@ -153,12 +173,12 @@ async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only = const response = await axios.post(`${baseURL}/upload/batch`, form, options); - /** @type {{ message: string; session_id: string; files: Array<{ status: string; fileId?: string; filename: string; error?: string }>; succeeded: number; failed: number }} */ + /** @type {{ message: string; storage_session_id: string; files: Array<{ status: string; fileId?: string; filename: string; error?: string }>; succeeded: number; failed: number }} */ const result = response.data; if ( !result || typeof result !== 'object' || - !result.session_id || + !result.storage_session_id || !Array.isArray(result.files) ) { throw new Error(`Unexpected batch upload response: ${JSON.stringify(result).slice(0, 200)}`); @@ -179,7 +199,7 @@ async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only = .filter((f) => f.status === 'success' && f.fileId) .map((f) => ({ fileId: f.fileId, filename: f.filename })); - return { session_id: result.session_id, files: successFiles }; + return { storage_session_id: result.storage_session_id, files: successFiles }; } catch (error) { throw new Error( logAxiosError({ @@ -190,4 +210,8 @@ async function batchUploadCodeEnvFiles({ req, files, entity_id = '', read_only = } } -module.exports = { getCodeOutputDownloadStream, uploadCodeEnvFile, batchUploadCodeEnvFiles }; +module.exports = { + getCodeOutputDownloadStream, + uploadCodeEnvFile, + batchUploadCodeEnvFiles, +}; diff --git a/api/server/services/Files/Code/crud.spec.js 
b/api/server/services/Files/Code/crud.spec.js index 487ae56786..a84689d8bf 100644 --- a/api/server/services/Files/Code/crud.spec.js +++ b/api/server/services/Files/Code/crud.spec.js @@ -9,6 +9,26 @@ jest.mock('@librechat/agents', () => ({ getCodeBaseURL: jest.fn(() => 'https://code-api.example.com'), })); +/* Inline the identity helpers' validation rules instead of pulling + * them through `@librechat/api`'s root barrel (which has init-time + * provider-config side effects that don't matter here) or its leaf + * module (the package's `exports` field only surfaces the root). + * The real implementation lives in `packages/api/src/files/code/identity.ts` + * and has its own dedicated `identity.spec.ts` covering the validation + * matrix; this stub just mirrors enough behavior for the surrounding + * crud tests to exercise the upload/download flow. */ +const VALID_KINDS = new Set(['skill', 'agent', 'user']); +const validateIdentity = ({ kind, id, version }, label) => { + if (!kind || !VALID_KINDS.has(kind)) throw new Error(`${label}: invalid kind "${kind}"`); + if (!id) throw new Error(`${label}: missing id for kind "${kind}"`); + if (kind === 'skill' && version == null) { + throw new Error(`${label}: kind "skill" requires a numeric version`); + } + if (kind !== 'skill' && version != null) { + throw new Error(`${label}: version is only valid for kind "skill"`); + } +}; + jest.mock('@librechat/api', () => { const http = require('http'); const https = require('https'); @@ -16,6 +36,18 @@ jest.mock('@librechat/api', () => { appendCodeEnvFile: jest.fn((form, stream, filename) => { form.append('file', stream, { filename }); }), + appendCodeEnvFileIdentity: jest.fn((form, identity) => { + validateIdentity(identity, 'appendCodeEnvFileIdentity'); + form.append('kind', identity.kind); + form.append('id', identity.id); + if (identity.version != null) form.append('version', String(identity.version)); + }), + buildCodeEnvDownloadQuery: jest.fn((identity) => { + 
validateIdentity(identity, 'buildCodeEnvDownloadQuery'); + const params = new URLSearchParams({ kind: identity.kind, id: identity.id }); + if (identity.version != null) params.set('version', String(identity.version)); + return `?${params.toString()}`; + }), logAxiosError: jest.fn(({ message }) => message), createAxiosInstance: jest.fn(() => mockAxios), codeServerHttpAgent: new http.Agent({ keepAlive: false }), @@ -32,11 +64,17 @@ describe('Code CRUD', () => { }); describe('getCodeOutputDownloadStream', () => { + /* Code-output downloads always carry `kind: 'user'` + `id: <userId>` + * — codeapi's `sessionAuth` rejects without them post-Phase C. The + * fixture mirrors what `processCodeOutput` and the `/code/download` + * route pass in production. */ + const userIdentity = { kind: 'user', id: 'user-123' }; + it('should pass dedicated keepAlive:false agents to axios', async () => { const mockResponse = { data: Readable.from(['chunk']) }; mockAxios.mockResolvedValue(mockResponse); - await getCodeOutputDownloadStream('session-1/file-1'); + await getCodeOutputDownloadStream('session-1/file-1', userIdentity); const callConfig = mockAxios.mock.calls[0][0]; expect(callConfig.httpAgent).toBe(codeServerHttpAgent); @@ -50,18 +88,52 @@ describe('Code CRUD', () => { it('should request stream response from the correct URL', async () => { mockAxios.mockResolvedValue({ data: Readable.from(['chunk']) }); - await getCodeOutputDownloadStream('session-1/file-1'); + await getCodeOutputDownloadStream('session-1/file-1', userIdentity); const callConfig = mockAxios.mock.calls[0][0]; - expect(callConfig.url).toBe('https://code-api.example.com/download/session-1/file-1'); + /* URL carries `?kind=user&id=<userId>` so codeapi's `sessionAuth` + * can reconstruct the matching `<tenant>:user:<userId>` sessionKey + * (Phase C / option α).
*/ + expect(callConfig.url).toBe( + 'https://code-api.example.com/download/session-1/file-1?kind=user&id=user-123', + ); expect(callConfig.responseType).toBe('stream'); expect(callConfig.timeout).toBe(15000); }); + it('forwards skill identity (kind/id/version) when re-downloading a primed skill file', async () => { + mockAxios.mockResolvedValue({ data: Readable.from(['chunk']) }); + + await getCodeOutputDownloadStream('session-2/file-x', { + kind: 'skill', + id: 'skill-abc', + version: 7, + }); + + const callConfig = mockAxios.mock.calls[0][0]; + expect(callConfig.url).toBe( + 'https://code-api.example.com/download/session-2/file-x?kind=skill&id=skill-abc&version=7', + ); + }); + + it('rejects skill identity without a version (mirrors codeapi validator)', async () => { + await expect( + getCodeOutputDownloadStream('s/f', { kind: 'skill', id: 'skill-abc' }), + ).rejects.toThrow(/skill.*version/); + expect(mockAxios).not.toHaveBeenCalled(); + }); + + it('rejects unknown kind without dispatching to codeapi', async () => { + await expect(getCodeOutputDownloadStream('s/f', { kind: 'system', id: 'x' })).rejects.toThrow( + /invalid kind/, + ); + expect(mockAxios).not.toHaveBeenCalled(); + }); + it('should throw on network error', async () => { mockAxios.mockRejectedValue(new Error('ECONNREFUSED')); - await expect(getCodeOutputDownloadStream('s/f')).rejects.toThrow(); + await expect(getCodeOutputDownloadStream('s/f', userIdentity)).rejects.toThrow(); }); }); @@ -70,13 +142,15 @@ describe('Code CRUD', () => { req: { user: { id: 'user-123' } }, stream: Readable.from(['file-content']), filename: 'data.csv', + kind: 'user', + id: 'user-123', }; it('should pass dedicated keepAlive:false agents to axios', async () => { mockAxios.post.mockResolvedValue({ data: { message: 'success', - session_id: 'sess-1', + storage_session_id: 'sess-1', files: [{ fileId: 'fid-1', filename: 'data.csv' }], }, }); @@ -96,7 +170,7 @@ describe('Code CRUD', () => { mockAxios.post.mockResolvedValue({ 
data: { message: 'success', - session_id: 'sess-1', + storage_session_id: 'sess-1', files: [{ fileId: 'fid-1', filename: 'data.csv' }], }, }); @@ -107,35 +181,106 @@ describe('Code CRUD', () => { expect(callConfig.timeout).toBe(120000); }); - it('should return fileIdentifier on success', async () => { + it('should return { storage_session_id, file_id } on success', async () => { mockAxios.post.mockResolvedValue({ data: { message: 'success', - session_id: 'sess-1', + storage_session_id: 'sess-1', files: [{ fileId: 'fid-1', filename: 'data.csv' }], }, }); const result = await uploadCodeEnvFile(baseUploadParams); - expect(result).toBe('sess-1/fid-1'); + expect(result).toEqual({ storage_session_id: 'sess-1', file_id: 'fid-1' }); }); - it('should append entity_id query param when provided', async () => { - mockAxios.post.mockResolvedValue({ + /* Phase C / option α (codeapi #1455): the upload wire carries the + * resource identity codeapi uses for sessionKey derivation. Without + * these on the form, codeapi falls back to user bucketing for every + * upload and skill-cache invalidation never fires. Validation runs + * client-side too so a bad caller fails fast instead of round-tripping + * a 400. */ + describe('codeapi resource identity (kind/id/version)', () => { + const FormData = require('form-data'); + const successResponse = { data: { message: 'success', - session_id: 'sess-1', + storage_session_id: 'sess-1', files: [{ fileId: 'fid-1', filename: 'data.csv' }], }, + }; + let appendSpy; + + beforeEach(() => { + /* Spying on the prototype lets us assert form fields without + * materializing the multipart body — `form.getBuffer()` would + * fail on the file-stream entry, but we don't care about the + * stream here, only the identity fields that ride beside it.
*/ + appendSpy = jest.spyOn(FormData.prototype, 'append'); }); - const result = await uploadCodeEnvFile({ ...baseUploadParams, entity_id: 'agent-42' }); - expect(result).toBe('sess-1/fid-1?entity_id=agent-42'); + afterEach(() => { + appendSpy.mockRestore(); + }); + + const fieldsAppended = () => + appendSpy.mock.calls + .filter((call) => typeof call[1] === 'string' || typeof call[1] === 'number') + .reduce((acc, [name, value]) => ({ ...acc, [name]: value }), {}); + + it('forwards kind, id, and (when skill) version on the multipart form', async () => { + mockAxios.post.mockResolvedValue(successResponse); + + await uploadCodeEnvFile({ + ...baseUploadParams, + kind: 'skill', + id: 'skill-42', + version: 7, + }); + + expect(fieldsAppended()).toEqual({ kind: 'skill', id: 'skill-42', version: '7' }); + }); + + it('omits version on the form for non-skill kinds', async () => { + mockAxios.post.mockResolvedValue(successResponse); + + await uploadCodeEnvFile({ ...baseUploadParams, kind: 'agent', id: 'agent-9' }); + + const fields = fieldsAppended(); + expect(fields).toEqual({ kind: 'agent', id: 'agent-9' }); + expect(fields).not.toHaveProperty('version'); + }); + + it('rejects unknown kind without dispatching to codeapi', async () => { + await expect( + uploadCodeEnvFile({ ...baseUploadParams, kind: 'system', id: 'x' }), + ).rejects.toThrow(/invalid kind/); + expect(mockAxios.post).not.toHaveBeenCalled(); + }); + + it('rejects skill upload without a version (mirrors codeapi validator)', async () => { + await expect( + uploadCodeEnvFile({ ...baseUploadParams, kind: 'skill', id: 'skill-42' }), + ).rejects.toThrow(/skill.*version/); + expect(mockAxios.post).not.toHaveBeenCalled(); + }); + + it('rejects version on non-skill kinds (mirrors codeapi validator)', async () => { + await expect( + uploadCodeEnvFile({ + ...baseUploadParams, + kind: 'agent', + id: 'agent-9', + version: 3, + }), + ).rejects.toThrow(/version.*skill/); + expect(mockAxios.post).not.toHaveBeenCalled(); + }); 
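The client-side identity rules exercised above are small enough to state in full. This standalone TypeScript sketch mirrors the inline `validateIdentity` stub defined at the top of this spec file; the real helper lives in `packages/api/src/files/code/identity.ts` (not shown in this diff), so treat the names and exact error messages here as illustrative, not the actual API:

```typescript
// Sketch of the kind/id/version validation matrix the tests above pin.
// Illustrative only — mirrors the spec's inline stub, not the real helper.
type CodeEnvKind = 'skill' | 'agent' | 'user';

interface CodeEnvIdentity {
  kind: CodeEnvKind;
  id: string;
  version?: number;
}

const VALID_KINDS: ReadonlySet<string> = new Set(['skill', 'agent', 'user']);

function validateIdentity({ kind, id, version }: CodeEnvIdentity, label: string): void {
  if (!kind || !VALID_KINDS.has(kind)) {
    throw new Error(`${label}: invalid kind "${kind}"`);
  }
  if (!id) {
    throw new Error(`${label}: missing id for kind "${kind}"`);
  }
  if (kind === 'skill' && version == null) {
    // `version` is what drives skill-cache invalidation on edit.
    throw new Error(`${label}: kind "skill" requires a numeric version`);
  }
  if (kind !== 'skill' && version != null) {
    throw new Error(`${label}: version is only valid for kind "skill"`);
  }
}
```

So `validateIdentity({ kind: 'skill', id: 'skill-42' }, 'upload')` throws (no version), while `{ kind: 'skill', id: 'skill-42', version: 7 }` and `{ kind: 'user', id: 'user-123' }` pass — exactly the four rejection/acceptance paths the tests cover.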
}); it('should throw when server returns non-success message', async () => { mockAxios.post.mockResolvedValue({ - data: { message: 'quota_exceeded', session_id: 's', files: [] }, + data: { message: 'quota_exceeded', storage_session_id: 's', files: [] }, }); await expect(uploadCodeEnvFile(baseUploadParams)).rejects.toThrow('quota_exceeded'); diff --git a/api/server/services/Files/Code/process.js b/api/server/services/Files/Code/process.js index a94981f721..2839d08c5a 100644 --- a/api/server/services/Files/Code/process.js +++ b/api/server/services/Files/Code/process.js @@ -16,6 +16,7 @@ const { extractCodeArtifactText, getExtractedTextFormat, getStorageMetadata, + buildCodeEnvDownloadQuery, } = require('@librechat/api'); const { Tools, @@ -286,7 +287,7 @@ const runPreviewFinalize = ({ finalize, fileId, previewRevision, onResolved }) = /** * Process code execution output files — downloads and saves both images * and non-image files. All files are saved to local storage with - * `fileIdentifier` metadata for code env re-upload. + * `codeEnvRef` metadata for code env re-upload. * * Returns a two-part shape so callers can ship the attachment to the * client immediately and run preview extraction in the background: @@ -334,9 +335,15 @@ const processCodeOutput = async ({ try { const formattedDate = currentDate.toISOString(); + /* Code-output files are always user-private — no skill execution + * produces a skill-scoped output bucket. The download URL must + * carry `?kind=user&id=<userId>` so codeapi's `sessionAuth` + * resolves the matching `<tenant>:user:<userId>` sessionKey. See + * codeapi #1455 / Phase C.
*/ + const downloadQuery = buildCodeEnvDownloadQuery({ kind: 'user', id: req.user.id }); const response = await axios({ method: 'get', - url: `${baseURL}/download/${session_id}/${id}`, + url: `${baseURL}/download/${session_id}/${id}${downloadQuery}`, responseType: 'arraybuffer', headers: { 'User-Agent': 'LibreChat/1.0', @@ -366,7 +373,15 @@ const processCodeOutput = async ({ }; } - const fileIdentifier = `${session_id}/${id}`; + /* Code-output files belong to the user who ran the execution. + * SessionKey on codeapi will be `<tenant>:user:<userId>` for these, + * so cache and access stay user-private. */ + const codeEnvRef = { + kind: 'user', + id: req.user.id, + storage_session_id: session_id, + file_id: id, + }; /* `safeName` keeps the directory structure (`a/b/file.txt` -> `a/b/file.txt`) * so the next prime() can place the file at the same nested path in the @@ -444,7 +459,7 @@ const processCodeOutput = async ({ updatedAt: formattedDate, source: appConfig.fileStrategy, context: FileContext.execute_code, - metadata: { fileIdentifier }, + metadata: { codeEnvRef }, }; await createFile(file, true); return { file: Object.assign(file, { messageId, toolCallId }) }; @@ -542,7 +557,7 @@ const processCodeOutput = async ({ tenantId: req.user.tenantId, bytes: buffer.length, updatedAt: formattedDate, - metadata: { fileIdentifier }, + metadata: { codeEnvRef }, source: appConfig.fileStrategy, context: FileContext.execute_code, usage: isUpdate ? (claimed.usage ?? 0) + 1 : 1, @@ -651,26 +666,31 @@ function checkIfActive(dateString) { /** * Retrieves the `lastModified` time string for a specified file from Code Execution Server. * - * @param {string} fileIdentifier - The identifier for the file (e.g., "session_id/fileId"). + * @param {import('librechat-data-provider').CodeEnvRef} ref - Typed pointer + * into codeapi storage. Carries kind/id/storage_session_id/file_id; + * codeapi resolves the sessionKey from the request's auth context.
* * @returns {Promise} * A promise that resolves to the `lastModified` time string of the file if successful, or null if there is an * error in initialization or fetching the info. */ -async function getSessionInfo(fileIdentifier) { +async function getSessionInfo(ref) { try { const baseURL = getCodeBaseURL(); - const [path, queryString] = fileIdentifier.split('?'); - const [session_id, fileId] = path.split('/'); - let queryParams = {}; - if (queryString) { - queryParams = Object.fromEntries(new URLSearchParams(queryString).entries()); - } - + /* `/sessions/.../objects/...` is gated by codeapi's `sessionAuth` + * middleware (post-Phase C). The middleware reconstructs the + * sessionKey from the URL query (`kind`/`id`/`version?`) plus the + * requester's auth context, then matches it against the cached + * sessionKey on the storage bucket. We have the full `codeEnvRef` + * here, so pass kind+id (+version when skill) directly. */ + const query = buildCodeEnvDownloadQuery({ + kind: ref.kind, + id: ref.id, + ...(ref.kind === 'skill' ? { version: ref.version } : {}), + }); const response = await axios({ method: 'get', - url: `${baseURL}/sessions/${session_id}/objects/${fileId}`, - params: queryParams, + url: `${baseURL}/sessions/${ref.storage_session_id}/objects/${ref.file_id}${query}`, headers: { 'User-Agent': 'LibreChat/1.0', }, @@ -706,6 +726,15 @@ const primeFiles = async (options) => { const agentResourceIds = new Set(file_ids); const resourceFiles = tool_resources?.[EToolResources.execute_code]?.files ?? []; + /* Step 1 of the priming trace: input volume. Pair with the + * per-file `[primeCodeFiles] file=...` lines and the final + * `[primeCodeFiles] returned=...` line below to locate which + * layer drops a file the sandbox doesn't end up seeing. 
*/ + logger.debug( + `[primeCodeFiles] in: file_ids=${file_ids.length} resourceFiles=${resourceFiles.length}`, + { agentId, file_ids, resourceFileIds: resourceFiles.map((f) => f?.file_id) }, + ); + // Get all files first const allFiles = (await getFiles({ file_id: { $in: file_ids } }, null, { text: 0 })) ?? []; @@ -728,146 +757,195 @@ const primeFiles = async (options) => { const sessions = new Map(); let toolContext = ''; + /* Per-file path counters — emitted at the bottom so a single + * grep on `[primeCodeFiles]` shows the input volume, the per-file + * paths taken, and the final dispatch summary in one trace. */ + let skippedNoRef = 0; + let reuploadFailures = 0; + for (let i = 0; i < dbFiles.length; i++) { const file = dbFiles[i]; if (!file) { continue; } - if (file.metadata.fileIdentifier) { - const [path, queryString] = file.metadata.fileIdentifier.split('?'); - const [session_id, id] = path.split('/'); - - let queryParams = {}; - if (queryString) { - queryParams = Object.fromEntries(new URLSearchParams(queryString).entries()); - } - - /** - * `pushFile` accepts optional overrides so the reupload path can - * push the FRESH `(session_id, id, entity_id)` parsed off the new - * `fileIdentifier`. Without these overrides, the closure would - * capture the stale pre-reupload refs from the outer loop and - * the in-memory `files` array (now consumed by - * `buildInitialToolSessions` to seed `Graph.sessions`) would - * point at a sandbox object that no longer exists. The DB record - * gets the new identifier via `updateFile`, but the seed would - * still inject the old one — bash_tool / read_file would 404 - * trying to mount the file until the next turn re-reads metadata. - * - * `entity_id` is forwarded so codeapi can resolve sessionKey - * per-file, allowing one execute to mix files uploaded under - * different entities (e.g. a skill bundle plus a user attachment).
- */ - const pushFile = (overrideSessionId, overrideId, overrideEntityId) => { - if (!toolContext) { - toolContext = `- Note: The following files are available in the "${Tools.execute_code}" tool environment:`; - } - - let fileSuffix = ''; - if (!agentResourceIds.has(file.file_id)) { - fileSuffix = - file.context === FileContext.execute_code - ? ' (from previous code execution)' - : ' (attached by user)'; - } - - const entity_id = overrideEntityId ?? queryParams.entity_id; - - /* Surface the preview lifecycle so the LLM knows when a - * prior-turn artifact's rich preview didn't materialize. The - * file blob is always available (`processCodeOutput` persists - * it before returning), so the model can still tell the user - * "you can download it" even when the preview never resolved. - * Absent status means legacy or non-office — render normally. */ - let previewSuffix = ''; - if (file.status === 'pending') { - previewSuffix = ' (preview not yet generated)'; - } else if (file.status === 'failed') { - previewSuffix = file.previewError - ? ` (preview unavailable: ${file.previewError})` - : ' (preview unavailable)'; - } - - toolContext += `\n\t- /mnt/data/${file.filename}${fileSuffix}${previewSuffix}`; - files.push({ - id: overrideId ?? id, - session_id: overrideSessionId ?? session_id, - name: file.filename, - ...(entity_id ?
{ entity_id } : {}), - }); - }; - - if (sessions.has(session_id)) { - pushFile(); - continue; - } - - const reuploadFile = async () => { - try { - const { getDownloadStream } = getStrategyFunctions(file.source); - const { handleFileUpload: uploadCodeEnvFile } = getStrategyFunctions( - FileSources.execute_code, - ); - const stream = await getDownloadStream(options.req, file.filepath); - const fileIdentifier = await uploadCodeEnvFile({ - req: options.req, - stream, - filename: file.filename, - entity_id: queryParams.entity_id, - }); - - // Preserve existing metadata when adding fileIdentifier - const updatedMetadata = { - ...file.metadata, // Preserve existing metadata (like S3 storage info) - fileIdentifier, // Add fileIdentifier - }; - - await updateFile({ - file_id: file.file_id, - metadata: updatedMetadata, - }); - /** - * Parse the FRESH fileIdentifier returned by the reupload and - * route it through both the dedupe Map and the in-memory - * `files` list. The original `(session_id, id)` parsed at the - * top of this iteration refer to the old, expired/missing - * sandbox object — using them here would silently re-introduce - * the bug `Graph.sessions` seeding is supposed to fix. - * - * `entity_id` survives the round-trip: the upload was tagged - * with `queryParams.entity_id` above, so the new identifier - * carries the same scope. - */ - const [newPath, newQuery] = fileIdentifier.split('?'); - const [newSessionId, newId] = newPath.split('/'); - const newQueryParams = newQuery - ?
Object.fromEntries(new URLSearchParams(newQuery).entries()) - : {}; - sessions.set(newSessionId, true); - pushFile(newSessionId, newId, newQueryParams.entity_id); - } catch (error) { - logger.error( - `Error re-uploading file ${id} in session ${session_id}: ${error.message}`, - error, - ); - } - }; - const uploadTime = await getSessionInfo(file.metadata.fileIdentifier); - if (!uploadTime) { - logger.warn(`Failed to get upload time for file ${id} in session ${session_id}`); - await reuploadFile(); - continue; - } - if (!checkIfActive(uploadTime)) { - await reuploadFile(); - continue; - } - sessions.set(session_id, true); - pushFile(); + const ref = file.metadata?.codeEnvRef; + if (!ref) { + skippedNoRef += 1; + logger.debug( + `[primeCodeFiles] file=${file.file_id} path=skip reason=no-codeenvref filename=${file.filename}`, + ); + continue; } + const session_id = ref.storage_session_id; + const id = ref.file_id; + + /** + * `pushFile` accepts optional overrides so the reupload path can + * push the FRESH `(storage_session_id, file_id)` from the new + * `codeEnvRef`. Without these overrides, the closure would + * capture the stale pre-reupload refs from the outer loop and + * the in-memory `files` array (now consumed by + * `buildInitialToolSessions` to seed `Graph.sessions`) would + * point at a sandbox object that no longer exists. The DB record + * gets the new ref via `updateFile`, but the seed would still + * inject the old one — bash_tool / read_file would 404 trying to + * mount the file until the next turn re-reads metadata. + * + * `kind`, `id`, `version` are preserved on the in-memory ref so + * codeapi can resolve sessionKey per-file (kind switch + + * tenant prefix from auth context).
+ */ + const pushFile = (overrideSessionId, overrideId) => { + if (!toolContext) { + toolContext = `- Note: The following files are available in the "${Tools.execute_code}" tool environment:`; + } + + let fileSuffix = ''; + if (!agentResourceIds.has(file.file_id)) { + fileSuffix = + file.context === FileContext.execute_code + ? ' (from previous code execution)' + : ' (attached by user)'; + } + + /* Surface the preview lifecycle so the LLM knows when a + * prior-turn artifact's rich preview didn't materialize. The + * file blob is always available (`processCodeOutput` persists + * it before returning), so the model can still tell the user + * "you can download it" even when the preview never resolved. + * Absent status means legacy or non-office — render normally. */ + let previewSuffix = ''; + if (file.status === 'pending') { + previewSuffix = ' (preview not yet generated)'; + } else if (file.status === 'failed') { + previewSuffix = file.previewError + ? ` (preview unavailable: ${file.previewError})` + : ' (preview unavailable)'; + } + + toolContext += `\n\t- /mnt/data/${file.filename}${fileSuffix}${previewSuffix}`; + /* `id` is the storage file_id (drives codeapi's upload-key + * existence check), `resource_id` is the entity that owns + * the storage session (drives sessionKey re-derivation). For + * code-output files this is `kind: 'user'` and `resource_id` + * is informational (codeapi ignores it for user kind), but + * we still send it for shape uniformity with shared kinds. */ + files.push({ + id: overrideId ?? id, + resource_id: ref.id, + storage_session_id: overrideSessionId ?? session_id, + name: file.filename, + kind: ref.kind, + ...(ref.kind === 'skill' ?
{ version: ref.version } : {}), + }); + }; + + if (sessions.has(session_id)) { + logger.debug( + `[primeCodeFiles] file=${file.file_id} path=cache-hit-by-session storage_session_id=${session_id}`, + ); + pushFile(); + continue; + } + + const reuploadFile = async () => { + try { + const { getDownloadStream } = getStrategyFunctions(file.source); + const { handleFileUpload: uploadCodeEnvFile } = getStrategyFunctions( + FileSources.execute_code, + ); + const stream = await getDownloadStream(options.req, file.filepath); + /* Reupload preserves the resource identity from the existing + * ref so codeapi re-buckets under the same sessionKey shape + * (skill stays skill, user stays user). Without this, a + * skill-cache-miss reupload would land in the user bucket + * and never be re-shareable cross-user. */ + const uploaded = await uploadCodeEnvFile({ + req: options.req, + stream, + filename: file.filename, + kind: ref.kind, + id: ref.id, + ...(ref.kind === 'skill' ? { version: ref.version } : {}), + }); + + /** + * Use the FRESH `(storage_session_id, file_id)` from the + * reupload response and route it through the dedupe Map, the + * persisted record, and the in-memory `files` list. The + * original ref captured at the top of this iteration refers + * to the old, expired/missing sandbox object — using it here + * would silently re-introduce the bug `Graph.sessions` + * seeding is supposed to fix. + * + * `kind`, `id`, `version` survive the round-trip: the + * upload preserves the resource identity, only the storage + * pointer changes. + */ + const newRef = { + kind: ref.kind, + id: ref.id, + storage_session_id: uploaded.storage_session_id, + file_id: uploaded.file_id, + ...(ref.kind === 'skill' ?
{ version: ref.version } : {}), + }; + + const updatedMetadata = { + ...file.metadata, + codeEnvRef: newRef, + }; + + await updateFile({ + file_id: file.file_id, + metadata: updatedMetadata, + }); + sessions.set(newRef.storage_session_id, true); + pushFile(newRef.storage_session_id, newRef.file_id); + logger.debug( + `[primeCodeFiles] file=${file.file_id} path=reupload-success ` + + `oldSession=${session_id} newSession=${newRef.storage_session_id} newFileId=${newRef.file_id}`, + ); + } catch (error) { + reuploadFailures += 1; + logger.error( + `[primeCodeFiles] file=${file.file_id} path=reupload-failed session=${session_id}: ${error.message}`, + error, + ); + } + }; + const uploadTime = await getSessionInfo(ref); + if (!uploadTime) { + logger.debug( + `[primeCodeFiles] file=${file.file_id} path=reupload reason=no-uploadtime ` + + `storage_session_id=${session_id}`, + ); + await reuploadFile(); + continue; + } + if (!checkIfActive(uploadTime)) { + logger.debug( + `[primeCodeFiles] file=${file.file_id} path=reupload reason=stale ` + + `uploadTime=${uploadTime} storage_session_id=${session_id}`, + ); + await reuploadFile(); + continue; + } + sessions.set(session_id, true); + logger.debug( + `[primeCodeFiles] file=${file.file_id} path=fresh-active storage_session_id=${session_id}`, + ); + pushFile(); } + /* Dispatch summary — emitted unconditionally so a single grep on + * `[primeCodeFiles] out` always shows the final state, not only + * the per-path trail leading up to it.
*/ + logger.debug( + `[primeCodeFiles] out: returned=${files.length} ` + + `skippedNoRef=${skippedNoRef} reuploadFailures=${reuploadFailures}`, + ); + return { files, toolContext }; }; diff --git a/api/server/services/Files/Code/process.spec.js b/api/server/services/Files/Code/process.spec.js index e816cde3fb..ea2cd29429 100644 --- a/api/server/services/Files/Code/process.spec.js +++ b/api/server/services/Files/Code/process.spec.js @@ -85,6 +85,16 @@ jest.mock('@librechat/api', () => { * if a case needs to assert the 'html' value. */ getExtractedTextFormat: (...args) => mockGetExtractedTextFormat(...args), getStorageMetadata: jest.fn(() => ({})), + /* Identity helpers mirror codeapi's validator. The real impl + * lives in `packages/api/src/files/code/identity.ts` with its + * own dedicated `identity.spec.ts`; here we just stub the + * download-query builder since `processCodeOutput` calls it on + * every output download. */ + buildCodeEnvDownloadQuery: jest.fn(({ kind, id, version }) => { + const params = new URLSearchParams({ kind, id }); + if (version != null) params.set('version', String(version)); + return `?${params.toString()}`; + }), codeServerHttpAgent: new http.Agent({ keepAlive: false }), codeServerHttpsAgent: new https.Agent({ keepAlive: false }), }; @@ -708,17 +718,68 @@ describe('Code Process', () => { }); describe('metadata and file properties', () => { - it('should include fileIdentifier in metadata', async () => { + it('should include codeEnvRef in metadata with kind: user', async () => { const smallBuffer = Buffer.alloc(100); mockAxios.mockResolvedValue({ data: smallBuffer }); const { file: result } = await processCodeOutput(baseParams); expect(result.metadata).toEqual({ - fileIdentifier: 'session-123/file-id-123', + codeEnvRef: { + kind: 'user', + id: 'user-123', + storage_session_id: 'session-123', + file_id: 'file-id-123', + }, }); }); + /* Phase C lock-in: outputs are ALWAYS user-scoped, never skill-scoped. 
+ * Even when an execution turn invoked a skill (so input files were + * `kind: 'skill'` shared cross-user), the resulting output bucket + * tags `kind: 'user'` with the requesting user's id. This prevents + * cross-user leakage of artifacts a skill may have generated for + * one user β€” each user gets their own output sessionKey on codeapi. + * + * Drift hazard: someone reading the simple user-derivation may + * later think "we should respect input kind for outputs too" and + * widen output scope to match input scope. This test pins the + * intentional asymmetry so that change requires updating the test + * (and re-reading the rationale). */ + it('outputs are user-scoped regardless of which skill the execution invoked', async () => { + const smallBuffer = Buffer.alloc(100); + mockAxios.mockResolvedValue({ data: smallBuffer }); + + const userA = { ...mockReq, user: { id: 'user-A' } }; + const userB = { ...mockReq, user: { id: 'user-B' } }; + + const { file: outputA } = await processCodeOutput({ ...baseParams, req: userA }); + const { file: outputB } = await processCodeOutput({ ...baseParams, req: userB }); + + // Each user's output ref is keyed by their own user id. The + // `id` field tracks the requesting user, never the skill. + expect(outputA.metadata.codeEnvRef).toEqual({ + kind: 'user', + id: 'user-A', + storage_session_id: 'session-123', + file_id: 'file-id-123', + }); + expect(outputB.metadata.codeEnvRef).toEqual({ + kind: 'user', + id: 'user-B', + storage_session_id: 'session-123', + file_id: 'file-id-123', + }); + + // No skill identity leaks into the output ref under any property. 
+ const refA = outputA.metadata.codeEnvRef; + const refB = outputB.metadata.codeEnvRef; + expect(refA.kind).not.toBe('skill'); + expect(refB.kind).not.toBe('skill'); + expect(refA).not.toHaveProperty('version'); + expect(refB).not.toHaveProperty('version'); + }); + it('should set correct context for code-generated files', async () => { const smallBuffer = Buffer.alloc(100); mockAxios.mockResolvedValue({ data: smallBuffer }); @@ -934,7 +995,12 @@ describe('Code Process', () => { data: [{ name: 'sess/fid', lastModified: new Date().toISOString() }], }); - await getSessionInfo('sess/fid', 'api-key'); + await getSessionInfo({ + kind: 'user', + id: 'user-1', + storage_session_id: 'sess', + file_id: 'fid', + }); const callConfig = mockAxios.mock.calls[0][0]; expect(callConfig.httpAgent).toBe(codeServerHttpAgent); @@ -1511,8 +1577,8 @@ describe('Code Process', () => { * `getStrategyFunctions(FileSources.execute_code)` for the code-env * upload β€” both go through the same factory in production. */ - function setupReuploadMocks(newFileIdentifier) { - const handleFileUpload = jest.fn().mockResolvedValue(newFileIdentifier); + function setupReuploadMocks(newRef) { + const handleFileUpload = jest.fn().mockResolvedValue(newRef); const getDownloadStream = jest.fn().mockResolvedValue('mock-stream'); getStrategyFunctions.mockImplementation((source) => { if (source === 'execute_code') return { handleFileUpload }; @@ -1526,7 +1592,7 @@ describe('Code Process', () => { return { handleFileUpload, getDownloadStream }; } - it('seed receives FRESH session_id + id parsed off the new fileIdentifier on reupload', async () => { + it('seed receives FRESH (storage_session_id, file_id) from the reupload response', async () => { const dbFile = { file_id: 'librechat-file-id', filename: 'sentinel.txt', @@ -1535,12 +1601,17 @@ describe('Code Process', () => { context: 'execute_code', metadata: { /* Stale sandbox ref β€” this is what `getSessionInfo` will 404 on. 
*/ - fileIdentifier: 'OLD_SESSION/OLD_ID', + codeEnvRef: { + kind: 'user', + id: 'user-123', + storage_session_id: 'OLD_SESSION', + file_id: 'OLD_ID', + }, }, }; getFiles.mockResolvedValue([dbFile]); - setupReuploadMocks('NEW_SESSION/NEW_ID'); + setupReuploadMocks({ storage_session_id: 'NEW_SESSION', file_id: 'NEW_ID' }); const result = await primeFiles({ req: { user: { id: 'user-123', role: 'USER' } }, @@ -1553,22 +1624,82 @@ describe('Code Process', () => { // The seed list (consumed by buildInitialToolSessions) MUST carry // the post-reupload ids β€” not the stale pre-reupload ones. expect(result.files).toEqual([ - { id: 'NEW_ID', session_id: 'NEW_SESSION', name: 'sentinel.txt' }, + { + id: 'NEW_ID', + /* `resource_id` carries the codeEnvRef.id (= original + * userId for kind: 'user'), threaded onto the in-memory + * file ref for codeapi's sessionKey re-derivation. */ + resource_id: 'user-123', + storage_session_id: 'NEW_SESSION', + name: 'sentinel.txt', + kind: 'user', + }, ]); }); - it('persists the new fileIdentifier on the DB record (existing behavior, regression-locked)', async () => { + /* Phase C / option Ξ± (codeapi #1455): reupload preserves the + * resource identity from the existing ref so codeapi re-buckets + * under the same sessionKey shape. Without this, a skill-cache-miss + * reupload lands in the user bucket and is no longer cross-user + * shareable. 
*/ + it('reupload forwards kind/id (and version when skill) from the existing ref', async () => { const dbFile = { file_id: 'librechat-file-id', filename: 'sentinel.txt', filepath: '/uploads/sentinel.txt', source: 'local', context: 'execute_code', - metadata: { fileIdentifier: 'OLD_SESSION/OLD_ID' }, + metadata: { + codeEnvRef: { + kind: 'skill', + id: 'skill-99', + storage_session_id: 'OLD_SESSION', + file_id: 'OLD_ID', + version: 4, + }, + }, }; getFiles.mockResolvedValue([dbFile]); - setupReuploadMocks('NEW_SESSION/NEW_ID'); + const { handleFileUpload } = setupReuploadMocks({ + storage_session_id: 'NEW_SESSION', + file_id: 'NEW_ID', + }); + + await primeFiles({ + req: { user: { id: 'user-123', role: 'USER' } }, + tool_resources: { + execute_code: { file_ids: ['librechat-file-id'], files: [] }, + }, + agentId: 'agent-id', + }); + + expect(handleFileUpload).toHaveBeenCalledTimes(1); + const uploadArgs = handleFileUpload.mock.calls[0][0]; + expect(uploadArgs.kind).toBe('skill'); + expect(uploadArgs.id).toBe('skill-99'); + expect(uploadArgs.version).toBe(4); + }); + + it('persists fresh codeEnvRef (kind/id preserved) on the DB record after reupload', async () => { + const dbFile = { + file_id: 'librechat-file-id', + filename: 'sentinel.txt', + filepath: '/uploads/sentinel.txt', + source: 'local', + context: 'execute_code', + metadata: { + codeEnvRef: { + kind: 'user', + id: 'user-123', + storage_session_id: 'OLD_SESSION', + file_id: 'OLD_ID', + }, + }, + }; + getFiles.mockResolvedValue([dbFile]); + + setupReuploadMocks({ storage_session_id: 'NEW_SESSION', file_id: 'NEW_ID' }); await primeFiles({ req: { user: { id: 'user-123', role: 'USER' } }, @@ -1581,10 +1712,62 @@ describe('Code Process', () => { expect(updateFile).toHaveBeenCalledWith( expect.objectContaining({ file_id: 'librechat-file-id', - metadata: expect.objectContaining({ fileIdentifier: 'NEW_SESSION/NEW_ID' }), + metadata: expect.objectContaining({ + codeEnvRef: { + kind: 'user', + id: 'user-123', + 
storage_session_id: 'NEW_SESSION', + file_id: 'NEW_ID', + }, + }), }), ); }); + + it('reads codeEnvRef directly when present (skipping reupload)', async () => { + const dbFile = { + file_id: 'librechat-file-id', + filename: 'sentinel.txt', + filepath: '/uploads/sentinel.txt', + source: 'local', + context: 'execute_code', + metadata: { + codeEnvRef: { + kind: 'user', + id: 'user-123', + storage_session_id: 'STRUCT_SESSION', + file_id: 'STRUCT_ID', + }, + }, + }; + getFiles.mockResolvedValue([dbFile]); + filterFilesByAgentAccess.mockImplementation(({ files }) => Promise.resolve(files)); + // getSessionInfo returns a fresh timestamp so reupload is skipped. + mockAxios.mockResolvedValue({ data: { lastModified: new Date().toISOString() } }); + + const result = await primeFiles({ + req: { user: { id: 'user-123', role: 'USER' } }, + tool_resources: { + execute_code: { file_ids: ['librechat-file-id'], files: [] }, + }, + agentId: 'agent-id', + }); + + expect(updateFile).not.toHaveBeenCalled(); + expect(result.files).toEqual([ + { + id: 'STRUCT_ID', + /* `resource_id` from the persisted codeEnvRef.id β€” for + * `kind: 'user'` this is informational (codeapi derives + * sessionKey from auth context) but threaded for shape + * uniformity with shared kinds. */ + resource_id: 'user-123', + storage_session_id: 'STRUCT_SESSION', + name: 'sentinel.txt', + kind: 'user', + }, + ]); + }); }); describe('primeFiles toolContext surfaces preview status to the LLM', () => { @@ -1606,7 +1789,14 @@ describe('Code Process', () => { filepath: `/uploads/${overrides.status ?? 
'ready'}.xlsx`, source: 'local', context: 'execute_code', - metadata: { fileIdentifier: 'CURRENT_SESSION/CURRENT_ID' }, + metadata: { + codeEnvRef: { + kind: 'user', + id: 'user-123', + storage_session_id: 'CURRENT_SESSION', + file_id: 'CURRENT_ID', + }, + }, ...overrides, }; } diff --git a/api/server/services/Files/process.js b/api/server/services/Files/process.js index 69b4a0b382..ea8ee14840 100644 --- a/api/server/services/Files/process.js +++ b/api/server/services/Files/process.js @@ -569,13 +569,44 @@ const processAgentFileUpload = async ({ req, res, metadata }) => { } const { handleFileUpload: uploadCodeEnvFile } = getStrategyFunctions(FileSources.execute_code); const stream = fs.createReadStream(file.path); - const fileIdentifier = await uploadCodeEnvFile({ + /* Resource identity for codeapi's sessionKey: + * - chat attachments (messageAttachment=true): `kind: 'user'`, codeapi + * buckets under `:user:` regardless of `id`. + * - agent setup files (messageAttachment=false): `kind: 'agent'`, shared + * per agent identity. `id` carries the agent id. */ + const codeKind = messageAttachment === true ? 'user' : 'agent'; + const codeId = messageAttachment === true ? req.user.id : agent_id; + /* Upload under the same sanitized filename LC stores in its DB + * (`fileInfo.filename` below uses `sanitizeFilename(originalname)`). + * Codeapi/file_server use this as the on-disk name in the sandbox + * β€” `/mnt/data/` β€” and `primeFiles`'s `toolContext` text + * + `_injected_files.name` both reference `file.filename`. Sending + * the unsanitized `file.originalname` here makes the sandbox path + * (with spaces / special chars) drift from what LC tells the model + * is available, causing FileNotFoundError on the first reference. 
*/ + const sandboxFilename = sanitizeFilename(file.originalname); + const uploaded = await uploadCodeEnvFile({ req, stream, - filename: file.originalname, - entity_id, + filename: sandboxFilename, + kind: codeKind, + id: codeId, }); - fileInfoMetadata = { fileIdentifier }; + /* Persist under the structured `codeEnvRef` shape β€” the only key the + * post-cutover schema (`metadata.codeEnvRef`) and downstream readers + * (`primeFiles`, `getCodeFilesByIds`, `categorizeFileForToolResources`, + * controller filtering) accept. Storing under the legacy + * `fileIdentifier` key would be silently dropped by mongoose strict + * mode and the file would lose its sandbox reference on subsequent + * priming turns. */ + fileInfoMetadata = { + codeEnvRef: { + kind: codeKind, + id: codeId, + storage_session_id: uploaded.storage_session_id, + file_id: uploaded.file_id, + }, + }; } else if (tool_resource === EToolResources.file_search) { const isFileSearchEnabled = await checkCapability(req, AgentCapabilities.file_search); if (!isFileSearchEnabled) { diff --git a/api/server/services/Files/process.spec.js b/api/server/services/Files/process.spec.js index 2bcea53058..99457522d4 100644 --- a/api/server/services/Files/process.spec.js +++ b/api/server/services/Files/process.spec.js @@ -346,6 +346,128 @@ describe('processAgentFileUpload', () => { ).resolves.not.toThrow(); }); }); + + /* Phase C / option Ξ± regression: the upload must persist its sandbox + * pointer under `metadata.codeEnvRef` (the post-cutover schema). The + * legacy `metadata.fileIdentifier` key is silently stripped by mongoose + * strict mode and downstream readers (`primeFiles`, `getCodeFilesByIds`, + * `categorizeFileForToolResources`, controller filtering) only check + * `codeEnvRef`. Storing under the legacy key would orphan the file β€” + * priming would skip it on subsequent code-execution turns and the + * sandbox copy would never re-mount. 
*/ + describe('execute_code uploads persist codeEnvRef metadata', () => { + const fs = require('fs'); + const { Readable } = require('stream'); + let createReadStreamSpy; + + beforeEach(() => { + /* `processAgentFileUpload` opens the multer-staged temp file via + * `fs.createReadStream`. The test fixture path doesn't exist, so + * stub it to a tiny in-memory stream. */ + createReadStreamSpy = jest + .spyOn(fs, 'createReadStream') + .mockImplementation(() => Readable.from(Buffer.from(''))); + }); + + afterEach(() => { + createReadStreamSpy.mockRestore(); + }); + + const setupCodeEnvUpload = (uploaded) => { + /* `processAgentFileUpload` calls `getStrategyFunctions` twice: + * once with `execute_code` for the codeapi upload, then again with + * the on-disk strategy (`local`) for the standard storage step that + * runs in the same flow. Both must return a working + * `handleFileUpload`. */ + const codeEnvUpload = jest.fn().mockResolvedValue(uploaded); + const localUpload = jest.fn().mockResolvedValue({ + bytes: 0, + filename: 'upload.bin', + filepath: '/uploads/upload.bin', + }); + getStrategyFunctions.mockImplementation((src) => + src === FileSources.execute_code + ? 
{ handleFileUpload: codeEnvUpload } + : { handleFileUpload: localUpload, saveBuffer: jest.fn() }, + ); + return codeEnvUpload; + }; + + it('persists kind:user codeEnvRef for chat attachments (messageAttachment=true)', async () => { + setupCodeEnvUpload({ storage_session_id: 'sess-1', file_id: 'fid-1' }); + const req = makeReq(); + await processAgentFileUpload({ + req, + res: mockRes, + metadata: { + agent_id: 'agent-abc', + tool_resource: EToolResources.execute_code, + file_id: 'file-uuid', + message_file: true, + }, + }); + + expect(db.createFile).toHaveBeenCalledWith( + expect.objectContaining({ + metadata: { + codeEnvRef: { + kind: 'user', + id: 'user-123', + storage_session_id: 'sess-1', + file_id: 'fid-1', + }, + }, + }), + true, + ); + }); + + it('persists kind:agent codeEnvRef for agent setup files (messageAttachment=false)', async () => { + setupCodeEnvUpload({ storage_session_id: 'sess-2', file_id: 'fid-2' }); + const req = makeReq(); + await processAgentFileUpload({ + req, + res: mockRes, + metadata: { + agent_id: 'agent-abc', + tool_resource: EToolResources.execute_code, + file_id: 'file-uuid', + }, + }); + + expect(db.createFile).toHaveBeenCalledWith( + expect.objectContaining({ + metadata: { + codeEnvRef: { + kind: 'agent', + id: 'agent-abc', + storage_session_id: 'sess-2', + file_id: 'fid-2', + }, + }, + }), + true, + ); + }); + + it('does not persist legacy fileIdentifier key (mongoose strict drops it)', async () => { + setupCodeEnvUpload({ storage_session_id: 'sess-3', file_id: 'fid-3' }); + const req = makeReq(); + await processAgentFileUpload({ + req, + res: mockRes, + metadata: { + agent_id: 'agent-abc', + tool_resource: EToolResources.execute_code, + file_id: 'file-uuid', + message_file: true, + }, + }); + + const persisted = db.createFile.mock.calls[0][0]; + expect(persisted.metadata).not.toHaveProperty('fileIdentifier'); + }); + }); }); describe('processFileURL', () => { diff --git a/package-lock.json b/package-lock.json index 
94817f82e3..5993e8cdb7 100644 --- a/package-lock.json +++ b/package-lock.json @@ -60,7 +60,7 @@ "@azure/storage-blob": "^12.30.0", "@google/genai": "^1.19.0", "@keyv/redis": "^4.3.3", - "@librechat/agents": "^3.1.79", + "@librechat/agents": "^3.1.80-dev.3", "@librechat/api": "*", "@librechat/data-schemas": "*", "@microsoft/microsoft-graph-client": "^3.0.7", @@ -12221,9 +12221,9 @@ } }, "node_modules/@librechat/agents": { - "version": "3.1.79", - "resolved": "https://registry.npmjs.org/@librechat/agents/-/agents-3.1.79.tgz", - "integrity": "sha512-sHQPoIRCG8j7H2U2bAQGI4hJwHZ9IAA00v9XhP029E1jhxKbY2wwMj9eHLrvVPJNi9ZnNR6QVvanjQgy3Wd7Iw==", + "version": "3.1.80-dev.3", + "resolved": "https://registry.npmjs.org/@librechat/agents/-/agents-3.1.80-dev.3.tgz", + "integrity": "sha512-yHf346QR6Uo8EkE6IMkJhQTMbugPRO79keulifeo2ILmVmvoCa6G0zOo6/lughTGPNITRXPIbPKH2wr5tuw2jQ==", "license": "MIT", "dependencies": { "@anthropic-ai/sdk": "^0.92.0", @@ -44679,7 +44679,7 @@ "@azure/storage-blob": "^12.30.0", "@google/genai": "^1.19.0", "@keyv/redis": "^4.3.3", - "@librechat/agents": "^3.1.79", + "@librechat/agents": "^3.1.80-dev.3", "@librechat/data-schemas": "*", "@modelcontextprotocol/sdk": "^1.29.0", "@smithy/node-http-handler": "^4.4.5", diff --git a/packages/api/package.json b/packages/api/package.json index 73915e0556..5d3489eb55 100644 --- a/packages/api/package.json +++ b/packages/api/package.json @@ -98,7 +98,7 @@ "@azure/storage-blob": "^12.30.0", "@google/genai": "^1.19.0", "@keyv/redis": "^4.3.3", - "@librechat/agents": "^3.1.79", + "@librechat/agents": "^3.1.80-dev.3", "@librechat/data-schemas": "*", "@modelcontextprotocol/sdk": "^1.29.0", "@smithy/node-http-handler": "^4.4.5", diff --git a/packages/api/src/agents/codeFilesSession.spec.ts b/packages/api/src/agents/codeFilesSession.spec.ts index a02d5089f9..a5ac577703 100644 --- a/packages/api/src/agents/codeFilesSession.spec.ts +++ b/packages/api/src/agents/codeFilesSession.spec.ts @@ -6,10 +6,19 @@ import { type 
CodeFilesAgent, } from './codeFilesSession'; -const file = (id: string, session_id: string, name: string): CodeEnvFile => ({ +const file = (id: string, storage_session_id: string, name: string): CodeEnvFile => ({ id, - session_id, + /* `resource_id` is informational for `kind: 'user'` (codeapi derives + * the sessionKey from the auth-context user, not this field) but + * still required by the type for shape uniformity. The id stand-in + * is fine here because no test asserts on its value. */ + resource_id: id, + storage_session_id, name, + /* `kind` drives codeapi's sessionKey shape; defaults to `'user'` for + * primed code-execution attachments at the test boundary (most are + * chat uploads). Tests that need shared kinds set it explicitly. */ + kind: 'user', }); describe('seedCodeFilesIntoSessions', () => { @@ -75,13 +84,18 @@ describe('seedCodeFilesIntoSessions', () => { expect(entry.session_id).toBe('skill-sess'); }); - it('skips seeding when incoming files have no session_id (defensive)', () => { - const fileWithoutSession = { id: 'x', session_id: '', name: 'orphan.csv' } as CodeEnvFile; + it('skips seeding when incoming files have no storage_session_id (defensive)', () => { + const fileWithoutSession = { + id: 'x', + storage_session_id: '', + name: 'orphan.csv', + kind: 'user', + } as CodeEnvFile; const result = seedCodeFilesIntoSessions([fileWithoutSession], undefined); expect(result).toBeUndefined(); }); - it('dedupes incoming files that share session_id + id with existing entries', () => { + it('dedupes incoming files that share storage_session_id + id with existing entries', () => { /** * Regression for Codex review #2: shared conversation files commonly * appear in multiple agents' `primedCodeFiles`. 
Without dedupe, @@ -128,7 +142,7 @@ describe('seedCodeFilesIntoSessions', () => { const result = seedCodeFilesIntoSessions([a, a, b], undefined); const entry = result!.get(Constants.EXECUTE_CODE) as CodeSessionContext; expect(entry.files).toHaveLength(2); - expect(entry.files!.map((f) => f.session_id).sort()).toEqual(['sess-A', 'sess-B']); + expect(entry.files!.map((f) => f.storage_session_id).sort()).toEqual(['sess-A', 'sess-B']); }); }); @@ -267,8 +281,8 @@ describe('buildInitialToolSessions', () => { expect(entry.session_id).toBe('sess-PRIMARY'); /* All three agents still contributed their files into the merged set. */ expect(entry.files!.map((f) => f.name).sort()).toEqual(['a.txt', 'b.txt', 'top.txt']); - /* And the per-file session_ids are preserved (ToolNode injects per-file). */ - const byName = new Map(entry.files!.map((f) => [f.name, f.session_id])); + /* And the per-file storage_session_ids are preserved (ToolNode injects per-file). */ + const byName = new Map(entry.files!.map((f) => [f.name, f.storage_session_id])); expect(byName.get('top.txt')).toBe('sess-PRIMARY'); expect(byName.get('a.txt')).toBe('sess-A'); expect(byName.get('b.txt')).toBe('sess-B'); diff --git a/packages/api/src/agents/codeFilesSession.ts b/packages/api/src/agents/codeFilesSession.ts index de75f2217f..2b37b27f89 100644 --- a/packages/api/src/agents/codeFilesSession.ts +++ b/packages/api/src/agents/codeFilesSession.ts @@ -22,21 +22,24 @@ export interface CodeFilesAgent { * after the first call returns one, so primed files were silently dropped. * * Files from `primeFiles` (api/server/services/Files/Code/process.js) carry - * per-file `session_id`s. The map's representative `session_id` is taken from - * the first incoming file (matching `primeInvokedSkills`); per-file ids on the - * `files` array are what `ToolNode` actually uses (`file.session_id ?? codeSession.session_id`). + * per-file `storage_session_id`s (the long-lived storage bucket id). 
The + * map's representative top-level `session_id` is seeded from the first + * incoming file's `storage_session_id` (matching `primeInvokedSkills`), + * since no execution session exists yet at seed time. Per-file ids on the + * `files` array are what `ToolNode` actually uses + * (`file.storage_session_id ?? execSessionId`). * * When an entry already exists (e.g. seeded by `primeInvokedSkills` for skill * files), incoming files are appended after the existing ones. The pre-existing * representative `session_id` is preserved so a partial-cache/fresh-prime * collision doesn't shift which session id `ToolNode` picks for the call. * - * Files are deduplicated by `session_id + id` as the stable identity key. - * Multiple agents in the same run commonly carry the same primed - * code-execution resources (shared conversation files), and without dedupe - * `_injected_files` would grow proportionally to agent count and inflate - * every `/exec` POST. First-seen wins so the original ordering / source - * is preserved. + * Files are deduplicated by `storage_session_id + id` as the stable + * identity key. Multiple agents in the same run commonly carry the same + * primed code-execution resources (shared conversation files), and without + * dedupe `_injected_files` would grow proportionally to agent count and + * inflate every `/exec` POST. First-seen wins so the original ordering / + * source is preserved. */ export function seedCodeFilesIntoSessions( files: CodeEnvFile[] | undefined, @@ -50,15 +53,15 @@ export function seedCodeFilesIntoSessions( const prior = sessions.get(Constants.EXECUTE_CODE) as CodeSessionContext | undefined; /** - * Compose `(session_id, id)` as a stable identity. `name` alone isn't - * sufficient β€” two distinct primed uploads can share a filename - * (different sessions, different file_ids). The composite stays + * Compose `(storage_session_id, id)` as a stable identity. 
`name` alone + * isn't sufficient β€” two distinct primed uploads can share a filename + * (different storage sessions, different file_ids). The composite stays * cheap to compute and the keys are short uuids. */ const seenKeys = new Set(); const mergedFiles: FileRefs = []; - const pushIfFresh = (f: { id?: string; session_id?: string; name?: string }): void => { - const key = `${f.session_id ?? ''}\0${f.id ?? ''}`; + const pushIfFresh = (f: { id?: string; storage_session_id?: string; name?: string }): void => { + const key = `${f.storage_session_id ?? ''}\0${f.id ?? ''}`; if (seenKeys.has(key)) return; seenKeys.add(key); mergedFiles.push(f as FileRefs[number]); @@ -68,7 +71,14 @@ export function seedCodeFilesIntoSessions( } for (const f of files) pushIfFresh(f); - const representativeSessionId = prior?.session_id ?? files[0].session_id; + /* Representative top-level `session_id` for the seed CodeSessionContext. + * No execution session exists yet at seed time, so the first incoming + * file's `storage_session_id` stands in until the first `/exec` call + * returns a real execution session id. ToolNode reads per-file + * `storage_session_id` for actual injection β€” the representative is + * informational rather than load-bearing. Mirrors the same convention + * used in `primeInvokedSkills`. */ + const representativeSessionId = prior?.session_id ?? 
files[0].storage_session_id; if (!representativeSessionId) { return existing; } diff --git a/packages/api/src/agents/handlers.spec.ts b/packages/api/src/agents/handlers.spec.ts index f6b0939fbb..91f6e531af 100644 --- a/packages/api/src/agents/handlers.spec.ts +++ b/packages/api/src/agents/handlers.spec.ts @@ -72,8 +72,20 @@ describe('createToolExecuteHandler', () => { codeSessionContext: { session_id: 'prev-session-abc', files: [ - { session_id: 'prev-session-abc', id: 'f1', name: 'data.parquet' }, - { session_id: 'prev-session-abc', id: 'f2', name: 'chart.png' }, + { + storage_session_id: 'prev-session-abc', + id: 'f1', + resource_id: 'user_alice', + name: 'data.parquet', + kind: 'user', + }, + { + storage_session_id: 'prev-session-abc', + id: 'f2', + resource_id: 'user_alice', + name: 'chart.png', + kind: 'user', + }, ], }, }, @@ -84,8 +96,20 @@ describe('createToolExecuteHandler', () => { expect(capturedConfigs).toHaveLength(1); expect(capturedConfigs[0].session_id).toBe('prev-session-abc'); expect(capturedConfigs[0]._injected_files).toEqual([ - { session_id: 'prev-session-abc', id: 'f1', name: 'data.parquet' }, - { session_id: 'prev-session-abc', id: 'f2', name: 'chart.png' }, + { + storage_session_id: 'prev-session-abc', + id: 'f1', + resource_id: 'user_alice', + name: 'data.parquet', + kind: 'user', + }, + { + storage_session_id: 'prev-session-abc', + id: 'f2', + resource_id: 'user_alice', + name: 'chart.png', + kind: 'user', + }, ]); }); @@ -141,7 +165,15 @@ describe('createToolExecuteHandler', () => { args: { lang: 'python', code: 'step_1()' }, codeSessionContext: { session_id: 'session-A', - files: [{ session_id: 'session-A', id: 'fa', name: 'a.csv' }], + files: [ + { + storage_session_id: 'session-A', + id: 'fa', + resource_id: 'user_alice', + name: 'a.csv', + kind: 'user', + }, + ], }, }, { @@ -150,7 +182,15 @@ describe('createToolExecuteHandler', () => { args: { lang: 'python', code: 'step_2()' }, codeSessionContext: { session_id: 'session-A', - files: 
[{ session_id: 'session-A', id: 'fa', name: 'a.csv' }], + files: [ + { + storage_session_id: 'session-A', + id: 'fa', + resource_id: 'user_alice', + name: 'a.csv', + kind: 'user', + }, + ], }, }, ]; @@ -161,7 +201,13 @@ describe('createToolExecuteHandler', () => { for (const config of capturedConfigs) { expect(config.session_id).toBe('session-A'); expect(config._injected_files).toEqual([ - { session_id: 'session-A', id: 'fa', name: 'a.csv' }, + { + storage_session_id: 'session-A', + id: 'fa', + resource_id: 'user_alice', + name: 'a.csv', + kind: 'user', + }, ]); } }); @@ -177,7 +223,15 @@ describe('createToolExecuteHandler', () => { args: { query: 'test' }, codeSessionContext: { session_id: 'should-be-ignored', - files: [{ session_id: 'x', id: 'y', name: 'z' }], + files: [ + { + storage_session_id: 'x', + id: 'y', + resource_id: 'user_alice', + name: 'z', + kind: 'user', + }, + ], }, }, ]; @@ -205,6 +259,7 @@ describe('createToolExecuteHandler', () => { name: 'pii-redactor', body: 'restricted body', fileCount: 0, + version: 1, disableModelInvocation: true, })); const handler = createSkillHandler(getSkillByName); @@ -247,6 +302,7 @@ describe('createToolExecuteHandler', () => { name: 'normal-skill', body: 'body', fileCount: 0, + version: 1, })); const handler = createSkillHandler(getSkillByName); @@ -276,6 +332,7 @@ describe('createToolExecuteHandler', () => { name: 'maybe-disabled', body: 'body', fileCount: 0, + version: 1, })); const handler = createSkillHandler(getSkillByName); @@ -302,6 +359,7 @@ describe('createToolExecuteHandler', () => { name: 'maybe-disabled-read', body: '# Body', fileCount: 0, + version: 1, })); const handler = createToolExecuteHandler({ loadTools: jest.fn(async () => ({ @@ -342,6 +400,7 @@ describe('createToolExecuteHandler', () => { name: 'manually-primed', body: '# Body', fileCount: 0, + version: 1, })); const handler = createToolExecuteHandler({ loadTools: jest.fn(async () => ({ @@ -391,6 +450,7 @@ describe('createToolExecuteHandler', () 
=> { name: 'pii-redactor', body: 'restricted body', fileCount: 0, + version: 1, disableModelInvocation: true, })); const handler = createToolExecuteHandler({ @@ -421,6 +481,7 @@ describe('createToolExecuteHandler', () => { name: 'normal-skill', body: '# Body', fileCount: 0, + version: 1, })); const handler = createToolExecuteHandler({ loadTools: jest.fn(async () => ({ @@ -454,6 +515,7 @@ describe('createToolExecuteHandler', () => { name: 'manual-only-skill', body: '# Use references/docs.md for details', fileCount: 0, + version: 1, disableModelInvocation: true, })); const handler = createToolExecuteHandler({ @@ -489,6 +551,7 @@ describe('createToolExecuteHandler', () => { name: 'other-disabled-skill', body: 'restricted', fileCount: 0, + version: 1, disableModelInvocation: true, })); const handler = createToolExecuteHandler({ @@ -526,6 +589,7 @@ describe('createToolExecuteHandler', () => { name: 'always-applied-legal', body: '# Cite references/policy.md when advising', fileCount: 0, + version: 1, disableModelInvocation: true, })); const handler = createToolExecuteHandler({ @@ -566,6 +630,7 @@ describe('createToolExecuteHandler', () => { name: 'collides', body: '# primed body', fileCount: 0, + version: 1, })); const handler = createToolExecuteHandler({ loadTools: jest.fn(async () => ({ @@ -615,6 +680,7 @@ describe('createToolExecuteHandler', () => { name: 'brand-guidelines', body: 'skill body', fileCount: 2, + version: 1, })); /* `loadTools` injects `codeEnvAvailable` into the returned `configurable`, which mirrors production flow through @@ -789,6 +855,7 @@ describe('createToolExecuteHandler', () => { name: 'primed-only-skill', body: '# Primed skill body', fileCount: 1, + version: 1, })); const readSandboxFile = jest.fn(); const getSkillFileByPath = jest.fn(async () => ({ @@ -888,6 +955,7 @@ describe('createToolExecuteHandler', () => { name: 'real-skill', body: '# Real Body', fileCount: 0, + version: 1, })); const readSandboxFile = jest.fn(); const handler = 
makeReadFileHandler({ diff --git a/packages/api/src/agents/handlers.ts b/packages/api/src/agents/handlers.ts index 4d1951a308..05f892eff8 100644 --- a/packages/api/src/agents/handlers.ts +++ b/packages/api/src/agents/handlers.ts @@ -10,6 +10,7 @@ import type { ToolExecuteBatchRequest, } from '@librechat/agents'; import { Types } from 'mongoose'; +import type { CodeEnvRef } from 'librechat-data-provider'; import type { StructuredToolInterface } from '@librechat/agents/langchain/tools'; import type { SkillFileRecord } from './skillFiles'; import type { ServerRequest } from '~/types'; @@ -67,6 +68,11 @@ export interface ToolExecuteOptions { body: string; name: string; _id: Types.ObjectId; + /** Monotonic counter on the skill record. Threaded into + * `codeEnvRef.version` so codeapi's sessionKey scopes the cache + * per-revision; bumping the version on edit invalidates the + * prior cache entry. */ + version: number; fileCount: number; /** * Set when the skill author opted out of model invocation. The handler @@ -83,22 +89,30 @@ export interface ToolExecuteOptions { getDownloadStream?: (req: ServerRequest, filepath: string) => Promise; [key: string]: unknown; }; - /** Batch uploads files to the code execution environment */ + /** Batch uploads files to the code execution environment. `kind`/`id`/ + * `version?` carry the resource identity codeapi uses to derive the + * sessionKey for the batch's storage bucket. */ batchUploadCodeEnvFiles?: (params: { req: ServerRequest; files: Array<{ stream: NodeJS.ReadableStream; filename: string }>; - entity_id?: string; - }) => Promise<{ session_id: string; files: Array<{ fileId: string; filename: string }> }>; + kind: 'skill' | 'agent' | 'user'; + id: string; + version?: number; + read_only?: boolean; + }) => Promise<{ + storage_session_id: string; + files: Array<{ fileId: string; filename: string }>; + }>; /** Checks if a code env file is still active. Returns lastModified or null. 
*/ - getSessionInfo?: (fileIdentifier: string) => Promise; + getSessionInfo?: (ref: CodeEnvRef) => Promise; /** 23-hour freshness check */ checkIfActive?: (dateString: string) => boolean; - /** Persists codeEnvIdentifiers on skill files after upload */ + /** Persists `codeEnvRef` on skill files after upload */ updateSkillFileCodeEnvIds?: ( updates: Array<{ skillId: Types.ObjectId | string; relativePath: string; - codeEnvIdentifier: string; + codeEnvRef: CodeEnvRef; }>, ) => Promise; /** Loads a skill file by path (for read_file tool) */ @@ -888,7 +902,19 @@ async function handleSkillToolCall( const contentText = `Skill "${args.skillName}" loaded. Follow the instructions below.`; let artifact: - | { session_id: string; files: Array<{ id: string; session_id: string; name: string }> } + | { + session_id: string; + files: Array<{ + id: string; + /** Resource id (skill `_id`). codeapi requires this distinct + * from the storage `id` to scope sessionKey by resource. */ + resource_id: string; + name: string; + storage_session_id: string; + kind?: 'skill' | 'agent' | 'user'; + version?: number; + }>; + } | undefined; // Prime skill files to code env β€” only when the `execute_code` capability @@ -916,7 +942,26 @@ async function handleSkillToolCall( updateSkillFileCodeEnvIds, }); if (primeResult) { - artifact = primeResult; + /* `session_id` at the top of the artifact is the (representative) + * execution session β€” ToolNode reads it for CodeSessionContext + * continuity. Per-file storage lives on each file's + * `storage_session_id`. Skill files carry `kind: 'skill'` and + * the skill's version so codeapi's sessionKey scopes the + * cache per-revision. */ + artifact = { + session_id: primeResult.storage_session_id, + files: primeResult.files.map((f) => ({ + id: f.id, + /* `resource_id` (skill `_id`) is what codeapi feeds into + * `:skill::v:` β€” without it the next + * /exec authorizer sees `resource_id: undefined` and 400s. 
*/ + resource_id: f.resource_id, + name: f.name, + storage_session_id: f.storage_session_id, + kind: 'skill', + version: skill.version, + })), + }; } } catch (error) { logger.error( @@ -1015,6 +1060,56 @@ export function createToolExecuteHandler(options: ToolExecuteOptions): EventHand toolCallConfig.session_id = tc.codeSessionContext.session_id; if (tc.codeSessionContext.files && tc.codeSessionContext.files.length > 0) { toolCallConfig._injected_files = tc.codeSessionContext.files; + /* Last LC-controlled point before the wire. Mirrors + * codeapi's validator context so the two log sides + * correlate on a single grep. */ + const refs = tc.codeSessionContext.files as Array<{ + id?: unknown; + resource_id?: unknown; + storage_session_id?: unknown; + kind?: unknown; + version?: unknown; + name?: unknown; + }>; + const summary = refs.map((f) => ({ + kind: f.kind, + hasResourceId: typeof f.resource_id === 'string' && !!f.resource_id, + hasStorageSessionId: + typeof f.storage_session_id === 'string' && !!f.storage_session_id, + hasVersion: typeof f.version === 'number', + })); + const missingResourceId = summary.filter((s) => !s.hasResourceId).length; + logger.debug( + `[code-env:inject] tool=${tc.name} files=${refs.length} ` + + `missingResourceId=${missingResourceId}`, + { summary }, + ); + if (missingResourceId > 0) { + logger.warn( + `[code-env:inject] ${missingResourceId}/${refs.length} files missing resource_id ` + + `for tool=${tc.name} β€” codeapi will reject with 400`, + { summary }, + ); + } + } else { + /* Empty `_injected_files` on a code-execution tool + * call. Almost always means the seeding chain + * (primeCodeFiles β†’ initialSessions β†’ + * CodeSessionContext) dropped the file upstream. + * `session_id` is still emitted; agents falls + * through to the `/files/` legacy fetch + * which is post-cutover broken (returns 400). + * Pair with `[primeCodeFiles]` traces below to + * locate the layer that lost the ref. 
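The inject-time preflight above can be reduced to a small pure function. This is a sketch only; `InjectedFileRef` and `summarizeInjectedRefs` are names invented here for illustration and are not part of the diff:

```typescript
// Sketch of the inject-time preflight: count refs missing `resource_id`
// before `_injected_files` goes on the wire, so the LibreChat-side log
// line correlates with codeapi's eventual 400.
interface InjectedFileRef {
  id?: unknown;
  resource_id?: unknown;
  storage_session_id?: unknown;
  kind?: unknown;
  version?: unknown;
}

function summarizeInjectedRefs(refs: InjectedFileRef[]) {
  const summary = refs.map((f) => ({
    kind: f.kind,
    hasResourceId: typeof f.resource_id === 'string' && f.resource_id.length > 0,
    hasStorageSessionId:
      typeof f.storage_session_id === 'string' && f.storage_session_id.length > 0,
    hasVersion: typeof f.version === 'number',
  }));
  return {
    summary,
    missingResourceId: summary.filter((s) => !s.hasResourceId).length,
  };
}
```

Keeping the check side-effect free (logging stays in the handler) is what lets a test pin the "missing resource_id" count without mocking the logger.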
*/ + logger.warn( + `[code-env:inject] tool=${tc.name} _injected_files=0 β€” sandbox will see no input files`, + { + tool: tc.name, + session_id: tc.codeSessionContext.session_id, + codeSessionContextHasFiles: tc.codeSessionContext.files !== undefined, + codeSessionContextFileCount: tc.codeSessionContext.files?.length ?? 0, + }, + ); } } diff --git a/packages/api/src/agents/resources.test.ts b/packages/api/src/agents/resources.test.ts index 641fb9284c..718e5cbdec 100644 --- a/packages/api/src/agents/resources.test.ts +++ b/packages/api/src/agents/resources.test.ts @@ -127,7 +127,7 @@ describe('primeResources', () => { embedded: false, usage: 0, metadata: { - fileIdentifier: 'python-script', + codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' }, }, }, ]; @@ -328,7 +328,7 @@ describe('primeResources', () => { embedded: false, usage: 0, metadata: { - fileIdentifier: 'python-script', + codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' }, }, }, ]; @@ -542,7 +542,7 @@ describe('primeResources', () => { embedded: false, usage: 0, metadata: { - fileIdentifier: 'python-script', + codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' }, }, }; @@ -713,7 +713,7 @@ describe('primeResources', () => { embedded: false, usage: 0, metadata: { - fileIdentifier: 'python-script', + codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' }, }, }; @@ -728,7 +728,7 @@ describe('primeResources', () => { embedded: false, usage: 0, metadata: { - fileIdentifier: 'python-script', + codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' }, }, }; @@ -878,7 +878,7 @@ describe('primeResources', () => { embedded: false, usage: 0, metadata: { - fileIdentifier: 'python-script', + codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' }, }, }; @@ -1128,7 +1128,7 @@ describe('primeResources', () => { embedded: false, 
      usage: 0,
      metadata: {
-        fileIdentifier: 'python-script',
+        codeEnvRef: { kind: 'user', id: 'user1', storage_session_id: 'sess', file_id: 'fid' },
      },
    },
    {
diff --git a/packages/api/src/agents/resources.ts b/packages/api/src/agents/resources.ts
index e147c743cf..56cd9b6c0d 100644
--- a/packages/api/src/agents/resources.ts
+++ b/packages/api/src/agents/resources.ts
@@ -80,7 +80,7 @@ const addFileToResource = ({
 /**
  * Categorizes a file into the appropriate tool resource based on its properties
  * Files are categorized as:
- * - execute_code: Files with fileIdentifier metadata
+ * - execute_code: Files with a code-environment ref (`codeEnvRef`)
  * - file_search: Files marked as embedded
  * - image_edit: Image files in the request file set with dimensions
  * @param params - Parameters object
@@ -100,7 +100,7 @@ const categorizeFileForToolResources = ({
  requestFileSet: Set<string>;
  processedResourceFiles: Set<string>;
}): void => {
-  if (file.metadata?.fileIdentifier) {
+  if (file.metadata?.codeEnvRef) {
    addFileToResource({
      file,
      resourceType: EToolResources.execute_code,
diff --git a/packages/api/src/agents/skillFiles.spec.ts b/packages/api/src/agents/skillFiles.spec.ts
index e4475c2048..4338b34e4e 100644
--- a/packages/api/src/agents/skillFiles.spec.ts
+++ b/packages/api/src/agents/skillFiles.spec.ts
@@ -14,15 +14,16 @@ jest.mock('./run', () => ({
 import { Readable } from 'stream';
 import { Types } from 'mongoose';
-import { primeInvokedSkills } from './skillFiles';
-import type { PrimeInvokedSkillsDeps } from './skillFiles';
+import { primeInvokedSkills, primeSkillFiles } from './skillFiles';
+import type { PrimeInvokedSkillsDeps, PrimeSkillFilesParams } from './skillFiles';

 const SKILL_ID = new Types.ObjectId();
+const SKILL_VERSION = 7;

 function makeDeps(overrides: Partial<PrimeInvokedSkillsDeps> = {}): PrimeInvokedSkillsDeps {
   const listSkillFiles = jest.fn().mockResolvedValue([]);
   const batchUploadCodeEnvFiles = jest.fn().mockResolvedValue({
-    session_id: 'session-x',
+    storage_session_id: 'session-x',
    files: [],
  });
  return {
@@ -34,6 +35,7 @@ function makeDeps(overrides: Partial<PrimeInvokedSkillsDeps> = {}): PrimeInvoked
       _id: SKILL_ID,
       name: 'brand-guidelines',
       body: 'skill body',
+      version: SKILL_VERSION,
       fileCount: 2,
     }),
     listSkillFiles,
@@ -79,14 +81,10 @@ describe('primeInvokedSkills — execute_code capability gate', () => {
     ];
     const listSkillFiles = jest.fn().mockResolvedValue(fileRecords);
     const getStrategyFunctions = jest.fn().mockReturnValue({
-      /* A real empty Readable — matches the production contract for
-         `getDownloadStream` and works even if `batchUploadCodeEnvFiles`
-         is later replaced with a partially-real implementation that
-         iterates the stream. */
       getDownloadStream: jest.fn().mockResolvedValue(Readable.from(Buffer.from(''))),
     });
     const batchUploadCodeEnvFiles = jest.fn().mockResolvedValue({
-      session_id: 'session-42',
+      storage_session_id: 'session-42',
       files: [{ fileId: 'file-1', filename: 'brand-guidelines/references/style.md' }],
     });
@@ -101,17 +99,17 @@ describe('primeInvokedSkills — execute_code capability gate', () => {
     expect(batchUploadCodeEnvFiles).toHaveBeenCalledTimes(1);
     const [uploadArgs] = batchUploadCodeEnvFiles.mock.calls[0];
-    /* Phase 8 deprecation: LibreChat no longer threads an apiKey through
-       the sandbox upload path. The agents library / sandbox service owns
-       auth internally. */
     expect(uploadArgs).not.toHaveProperty('apiKey');
-    expect(uploadArgs.entity_id).toBe(SKILL_ID.toString());
-    /* Skill files are infrastructure inputs; the read_only flag tells codeapi
-       to seal them so any sandboxed-code modifications are dropped instead
-       of surfaced as ghost generated artifacts. */
+    expect(uploadArgs).not.toHaveProperty('entity_id');
    expect(uploadArgs.read_only).toBe(true);
-    /* One uploaded file per `fileRecords` entry plus the synthetic
-       SKILL.md that `primeSkillFiles` always prepends. */
+    /* Resource identity for codeapi's sessionKey: skill files share
+     * cross-user-within-tenant under `<tenant>:skill:<skillId>:v:<version>`.
+ * Without these on the upload, codeapi falls back to user bucketing + * and skill-cache invalidation (driven by version bump on edit) + * never fires. See codeapi #1455 (option Ξ±). */ + expect(uploadArgs.kind).toBe('skill'); + expect(uploadArgs.id).toBe(SKILL_ID.toString()); + expect(uploadArgs.version).toBe(SKILL_VERSION); expect(uploadArgs.files).toHaveLength(fileRecords.length + 1); expect(uploadArgs.files.map((f: { filename: string }) => f.filename)).toEqual( expect.arrayContaining(['brand-guidelines/SKILL.md', 'brand-guidelines/references/style.md']), @@ -128,12 +126,12 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { expect(deps.getSkillByName).not.toHaveBeenCalled(); }); - it('forwards entity_id on every file in initialSessions after fresh upload', async () => { - /* Regression: codeapi now resolves sessionKey per-file using `entity_id`. - * Skill files must carry `entity_id=` through priming so that - * a subsequent execute mixing skill files and a user attachment can be - * authorized β€” both files in the same call resolve to their own scope - * instead of collapsing onto a single request-level entity. */ + it('writes kind/version/storage_session_id on every cached file after a fresh upload', async () => { + /* Per-file refs in `CodeSessionContext.files` carry the resource + * identity (kind=skill, id=skillId, version=skill.version) so + * codeapi can derive the right sessionKey. Cache scope is per + * (tenant, kind, id, version) β€” bumping the skill version + * naturally invalidates the prior cache. 
*/ const listSkillFiles = jest.fn().mockResolvedValue([ { relativePath: 'references/style.md', @@ -147,7 +145,7 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { getDownloadStream: jest.fn().mockResolvedValue(Readable.from(Buffer.from(''))), }); const batchUploadCodeEnvFiles = jest.fn().mockResolvedValue({ - session_id: 'session-42', + storage_session_id: 'session-42', files: [{ fileId: 'file-1', filename: 'brand-guidelines/references/style.md' }], }); @@ -164,9 +162,16 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { expect(codeSession?.files).toEqual([ { id: 'file-1', + /* `resource_id` is the skill `_id` β€” drives codeapi's + * sessionKey re-derivation. Distinct from `id` (the storage + * file_id). The split fixes the shared-kind 403 mismatch + * where codeapi computed sessionKey from the storage nanoid + * instead of the skill _id. */ + resource_id: SKILL_ID.toString(), name: 'brand-guidelines/references/style.md', - session_id: 'session-42', - entity_id: SKILL_ID.toString(), + storage_session_id: 'session-42', + kind: 'skill', + version: SKILL_VERSION, }, ]); }); @@ -175,10 +180,10 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { /* Concurrency regression: when many users hit the same skill at * once, fire-and-forget keeps the cache in miss-steady-state for * the burst β€” User N's prime reads SkillFile docs before User N-1's - * persist commits, sees no `codeEnvIdentifier`, re-uploads, and - * fires its own forget that User N+1 also races. Awaiting the - * persist before the prime resolves ensures the next concurrent - * caller observes the cache pointer instead of racing a write. */ + * persist commits, sees no codeEnvRef, re-uploads, and fires its + * own forget that User N+1 also races. Awaiting the persist before + * the prime resolves ensures the next concurrent caller observes + * the cache pointer instead of racing a write. 
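The await-vs-fire-and-forget distinction in the comment above can be modeled in isolation. All names here are hypothetical; the real flow lives in `primeSkillFiles` with Mongo writes and a batch upload:

```typescript
// Minimal model of the race: with fire-and-forget, a second prime can
// read before the first prime's persist commits and re-upload; awaiting
// the persist means the next caller observes the cache pointer.
async function primeOnce(
  readRef: () => Promise<string | null>,
  persistRef: (ref: string) => Promise<void>,
  upload: () => Promise<string>,
): Promise<string> {
  const cached = await readRef();
  if (cached !== null) {
    return cached; // cache hit: no upload, no write
  }
  const ref = await upload();
  await persistRef(ref); // awaited, so the pointer is visible before we resolve
  return ref;
}
```

Any caller that starts after the first prime resolves takes the cached branch; under true concurrency the await narrows rather than eliminates the window, which matches the "miss-steady-state for the burst" framing above.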
*/ const fileRecords = [ { relativePath: 'references/style.md', @@ -193,7 +198,7 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { getDownloadStream: jest.fn().mockResolvedValue(Readable.from(Buffer.from(''))), }); const batchUploadCodeEnvFiles = jest.fn().mockResolvedValue({ - session_id: 'session-42', + storage_session_id: 'session-42', files: [{ fileId: 'file-1', filename: 'brand-guidelines/references/style.md' }], }); @@ -235,17 +240,23 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { { skillId: SKILL_ID, relativePath: 'references/style.md', - codeEnvIdentifier: `session-42/file-1?entity_id=${SKILL_ID.toString()}`, + codeEnvRef: { + kind: 'skill', + id: SKILL_ID.toString(), + storage_session_id: 'session-42', + file_id: 'file-1', + version: SKILL_VERSION, + }, }, ]); }); - it('parses entity_id off codeEnvIdentifier in the cached-skills hot path', async () => { + it('reads codeEnvRef in the cached-skills hot path', async () => { /* When all skill files are still active in codeapi, primeInvokedSkills - * skips the batch upload entirely and reconstructs file refs from each - * skill file's persisted `codeEnvIdentifier`. The query string carries - * `?entity_id=`, which must survive into `_injected_files` so - * downstream authorization still uses the skill's scope. */ + * skips the batch upload entirely and reconstructs file refs from + * each skill file's persisted `codeEnvRef`. The kind/version travel + * through to `CodeSessionContext.files` so the next exec carries + * them on `_injected_files` for codeapi's sessionKey derivation. 
*/ const listSkillFiles = jest.fn().mockResolvedValue([ { relativePath: 'references/style.md', @@ -253,7 +264,13 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { filepath: '/storage/brand-guidelines/references/style.md', source: 's3', bytes: 256, - codeEnvIdentifier: `session-cached/file-cached?entity_id=${SKILL_ID.toString()}`, + codeEnvRef: { + kind: 'skill', + id: SKILL_ID.toString(), + storage_session_id: 'session-cached', + file_id: 'file-cached', + version: SKILL_VERSION, + }, }, ]); const batchUploadCodeEnvFiles = jest.fn(); @@ -261,7 +278,7 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { codeEnvAvailable: true, listSkillFiles, batchUploadCodeEnvFiles, - getSessionInfo: jest.fn().mockResolvedValue('2026-05-05T00:00:00Z'), + getSessionInfo: jest.fn().mockResolvedValue('2026-05-06T00:00:00Z'), checkIfActive: jest.fn().mockReturnValue(true), }); @@ -272,9 +289,118 @@ describe('primeInvokedSkills β€” execute_code capability gate', () => { expect(codeSession?.files).toEqual([ { id: 'file-cached', + /* From the cache-hit path: pulls `resource_id` directly off + * the persisted `codeEnvRef.id` (the skill `_id`). */ + resource_id: SKILL_ID.toString(), name: 'brand-guidelines/references/style.md', - session_id: 'session-cached', - entity_id: SKILL_ID.toString(), + storage_session_id: 'session-cached', + kind: 'skill', + version: SKILL_VERSION, + }, + ]); + }); +}); + +/* The tool-invoked skill loader (`handle_skill` -> `primeSkillFiles`) + * is a separate code path from `primeInvokedSkills` (the NL-detected + * loader). Both feed `_injected_files` on the next /exec; both must + * carry `resource_id` end-to-end, otherwise codeapi 400s with + * `resource_id is invalid` (`type: 'undefined'`). Tests below lock + * that contract on the lower-level helper directly. 
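As orientation for these tests, the sessionKey scheme can be sketched from the PR notes. This is an illustrative sketch, not codeapi's actual implementation: the key shape `<tenant>:<kind>:<id>[:v:<version>]` is assumed from the description, and tenant plus userId come from auth context server-side rather than as parameters:

```typescript
// Sketch of codeapi's sessionKey derivation as described in the PR notes:
// shared kinds key on the resource id, user-private keys on the caller's
// auth identity, and skills additionally fold in the version so a bump
// lands the next prime under a fresh key (implicit cache invalidation).
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };

function deriveSessionKey(tenant: string, ref: CodeEnvRef, authUserId: string): string {
  switch (ref.kind) {
    case 'skill':
      return `${tenant}:skill:${ref.id}:v:${ref.version}`;
    case 'agent':
      return `${tenant}:agent:${ref.id}`;
    case 'user':
      // `id` on a user ref is informational; identity comes from auth.
      return `${tenant}:user:${authUserId}`;
  }
}
```

The discriminated union is what makes `version` required on the `'skill'` branch at compile time, so the version segment can never be silently absent from a skill key.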
 */
+describe('primeSkillFiles — resource identity propagation', () => {
+  beforeEach(() => {
+    jest.clearAllMocks();
+  });
+
+  function makeSkillFilesDeps(
+    overrides: Partial<PrimeSkillFilesParams> = {},
+  ): PrimeSkillFilesParams {
+    return {
+      skill: {
+        _id: SKILL_ID,
+        name: 'brand-guidelines',
+        body: 'skill body',
+        version: SKILL_VERSION,
+      },
+      skillFiles: [],
+      req: { user: { id: 'user-1' } } as PrimeSkillFilesParams['req'],
+      getStrategyFunctions: jest.fn().mockReturnValue({
+        getDownloadStream: jest.fn().mockResolvedValue(Readable.from(Buffer.from(''))),
+      }),
+      batchUploadCodeEnvFiles: jest.fn().mockResolvedValue({
+        storage_session_id: 'session-fresh',
+        files: [
+          { fileId: 'file-fresh', filename: 'brand-guidelines/references/style.md' },
+          { fileId: 'file-skillmd', filename: 'brand-guidelines/SKILL.md' },
+        ],
+      }),
+      ...overrides,
+    };
+  }
+
+  it('fresh-upload path: emits resource_id=skill._id, kind=skill, version on each file', async () => {
+    const deps = makeSkillFilesDeps({
+      skillFiles: [
+        {
+          relativePath: 'references/style.md',
+          filename: 'style.md',
+          filepath: '/storage/brand-guidelines/references/style.md',
+          source: 's3',
+          bytes: 256,
+        },
+      ],
+    });
+
+    const result = await primeSkillFiles(deps);
+
+    expect(result?.files).toEqual([
+      {
+        id: 'file-fresh',
+        resource_id: SKILL_ID.toString(),
+        storage_session_id: 'session-fresh',
+        name: 'brand-guidelines/references/style.md',
+        kind: 'skill',
+        version: SKILL_VERSION,
+      },
+    ]);
+  });
+
+  it('cache-hit path: re-derives resource_id/kind/version from persisted codeEnvRef', async () => {
+    const cachedRef = {
+      kind: 'skill' as const,
+      id: SKILL_ID.toString(),
+      storage_session_id: 'session-cached',
+      file_id: 'file-cached',
+      version: SKILL_VERSION,
+    };
+    const batchUploadCodeEnvFiles = jest.fn();
+    const deps = makeSkillFilesDeps({
+      skillFiles: [
+        {
+          relativePath: 'references/style.md',
+          filename: 'style.md',
+          filepath: '/storage/brand-guidelines/references/style.md',
+          source: 's3',
+          bytes: 256,
+ codeEnvRef: cachedRef, + }, + ], + batchUploadCodeEnvFiles, + getSessionInfo: jest.fn().mockResolvedValue('2026-05-06T00:00:00Z'), + checkIfActive: jest.fn().mockReturnValue(true), + }); + + const result = await primeSkillFiles(deps); + + expect(batchUploadCodeEnvFiles).not.toHaveBeenCalled(); + expect(result?.files).toEqual([ + { + id: 'file-cached', + resource_id: SKILL_ID.toString(), + storage_session_id: 'session-cached', + name: 'brand-guidelines/references/style.md', + kind: 'skill', + version: SKILL_VERSION, }, ]); }); diff --git a/packages/api/src/agents/skillFiles.ts b/packages/api/src/agents/skillFiles.ts index 9f274f6a99..dc48914b83 100644 --- a/packages/api/src/agents/skillFiles.ts +++ b/packages/api/src/agents/skillFiles.ts @@ -2,6 +2,7 @@ import { Readable } from 'stream'; import { Constants } from '@librechat/agents'; import { logger } from '@librechat/data-schemas'; import type { ToolSessionMap, CodeSessionContext } from '@librechat/agents'; +import type { CodeEnvRef } from 'librechat-data-provider'; import type { Types } from 'mongoose'; import type { ServerRequest } from '~/types'; import { extractInvokedSkillsFromPayload } from './run'; @@ -12,11 +13,19 @@ export interface SkillFileRecord { filepath: string; source: string; bytes: number; - codeEnvIdentifier?: string; + codeEnvRef?: CodeEnvRef; } export interface PrimeSkillFilesParams { - skill: { body: string; name: string; _id: Types.ObjectId | string }; + skill: { + body: string; + name: string; + _id: Types.ObjectId | string; + /** Monotonic counter on the skill record. Bumped on every edit + * (frontmatter / body / file upsert). Threaded into `codeEnvRef.version` + * so codeapi's sessionKey scopes the cache per-revision. 
 */
+    version: number;
+  };
  skillFiles: SkillFileRecord[];
  req: ServerRequest;
  getStrategyFunctions: (source: string) => {
@@ -26,44 +35,53 @@
  batchUploadCodeEnvFiles: (params: {
    req: ServerRequest;
    files: Array<{ stream: NodeJS.ReadableStream; filename: string }>;
-    entity_id?: string;
+    /** Resource kind that owns the batch's storage session. Drives codeapi's
+     * sessionKey derivation (`<tenant>:<kind>:<id>[:v:<version>]`). */
+    kind: 'skill' | 'agent' | 'user';
+    /** Resource id (skillId / agentId / userId). */
+    id: string;
+    /** Required when `kind === 'skill'`; forbidden otherwise. */
+    version?: number;
    /** When true, codeapi tags every file in the batch as infrastructure
     * (read-only inputs that must never surface as generated artifacts,
     * even if sandboxed code mutates the bytes on disk). */
    read_only?: boolean;
  }) => Promise<{
-    session_id: string;
+    storage_session_id: string;
    files: Array<{ fileId: string; filename: string }>;
  }>;
  /** Checks if a code env file is still active. Returns lastModified timestamp or null. */
-  getSessionInfo?: (fileIdentifier: string) => Promise<string | null>;
+  getSessionInfo?: (ref: CodeEnvRef) => Promise<string | null>;
  /** 23-hour freshness check */
  checkIfActive?: (dateString: string) => boolean;
-  /** Persists codeEnvIdentifier on skill files after upload. Implementations
+  /** Persists `codeEnvRef` on skill files after upload. Implementations
   * warn-log on partial writes (matchedCount/modifiedCount mismatch)
   * internally — caller can fire-and-forget without losing visibility.
*/ updateSkillFileCodeEnvIds?: ( updates: Array<{ skillId: Types.ObjectId | string; relativePath: string; - codeEnvIdentifier: string; + codeEnvRef: CodeEnvRef; }>, ) => Promise<{ matchedCount: number; modifiedCount: number } | void>; } export interface PrimeSkillFilesResult { - session_id: string; - files: Array<{ id: string; session_id: string; name: string; entity_id?: string }>; -} - -/** Parses `entity_id` out of a persisted `codeEnvIdentifier`'s query string. - * Returns `undefined` when the identifier has no query string or no - * `entity_id` (legacy user-attachment uploads). */ -function parseEntityIdFromCodeEnvIdentifier(identifier: string): string | undefined { - const queryString = identifier.split('?')[1]; - if (!queryString) return undefined; - const value = new URLSearchParams(queryString).get('entity_id'); - return value && value.length > 0 ? value : undefined; + /** Representative storage session id (first file's). */ + storage_session_id: string; + files: Array<{ + /** Storage file id (the per-file uuid file_server returned at upload). */ + id: string; + /** Resource id β€” the entity that owns the storage session. For skill + * files this is `skill._id.toString()`. Distinct from `id`; codeapi + * derives the sessionKey from `resource_id` (shared cache scope) but + * validates upload presence under `id` (per-file storage key). */ + resource_id: string; + storage_session_id: string; + name: string; + kind: 'skill' | 'agent' | 'user'; + version?: number; + }>; } /** @@ -90,29 +108,27 @@ export async function primeSkillFiles( updateSkillFileCodeEnvIds, } = params; - // Check if ALL existing sessions are still active before reusing cached refs. - // Files normally share one session, but a partial bulkWrite failure during - // updateSkillFileCodeEnvIds can leave mixed session IDs. Checking every - // distinct session prevents serving stale identifiers that 404 in code env. 
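The distinct-session freshness check that both priming paths implement can be sketched as a standalone helper. A sketch under the types above (`allSessionsActive` is a name invented here; the real code inlines this logic):

```typescript
// Sketch of the distinct-session freshness check: dedupe refs by
// storage_session_id, probe each distinct session once, and reuse the
// cache only when every session is still active.
async function allSessionsActive<R extends { storage_session_id: string }>(
  refs: R[],
  getSessionInfo: (ref: R) => Promise<string | null>,
  checkIfActive: (lastModified: string) => boolean,
): Promise<boolean> {
  const bySession = new Map<string, R>();
  for (const ref of refs) {
    if (!bySession.has(ref.storage_session_id)) {
      bySession.set(ref.storage_session_id, ref);
    }
  }
  const results = await Promise.all(
    Array.from(bySession.values()).map(async (ref) => {
      const lastModified = await getSessionInfo(ref);
      return !!(lastModified && checkIfActive(lastModified));
    }),
  );
  return results.every(Boolean);
}
```

Probing one representative ref per session keeps the check O(distinct sessions) rather than O(files), while the all-or-nothing return preserves the "any expired, re-upload everything" contract.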
+  /* Cache-hit path: every skillFile carries a `codeEnvRef` from the
+   * previous prime. Check freshness against codeapi for every distinct
+   * storage session; if all are still active, reuse without
+   * re-uploading. The skill version is part of the ref — when the
+   * skill is edited, the upsert clears the ref and forces a fresh
+   * upload on the next prime. */
  if (getSessionInfo && checkIfActive && skillFiles.length > 0) {
-    // All files must have identifiers for the cache to be complete.
-    // Any missing identifier means partial persistence — fall through to re-upload.
-    const allHaveIds = skillFiles.every((sf) => sf.codeEnvIdentifier);
-    if (allHaveIds) {
-      const sessionIds = new Set(
-        skillFiles.map((sf) => (sf.codeEnvIdentifier as string).split('?')[0].split('/')[0]),
-      );
+    const allHaveRefs = skillFiles.every((sf) => sf.codeEnvRef !== undefined);
+    if (allHaveRefs) {
+      const refsBySession = new Map<string, CodeEnvRef>();
+      for (const sf of skillFiles) {
+        const ref = sf.codeEnvRef;
+        if (ref && !refsBySession.has(ref.storage_session_id)) {
+          refsBySession.set(ref.storage_session_id, ref);
+        }
+      }
      try {
        const checkResults = await Promise.all(
-          Array.from(sessionIds).map(async (sid) => {
-            const representative = skillFiles.find((sf) =>
-              sf.codeEnvIdentifier!.startsWith(`${sid}/`),
-            );
-            if (!representative) {
-              return false;
-            }
-            const lastModified = await getSessionInfo(representative.codeEnvIdentifier!);
+          Array.from(refsBySession.values()).map(async (ref) => {
+            const lastModified = await getSessionInfo(ref);
            return !!(lastModified && checkIfActive(lastModified));
          }),
        );
@@ -121,22 +137,30 @@
      if (allActive) {
        const files: PrimeSkillFilesResult['files'] = [];
        for (const sf of skillFiles) {
-          const identifier = sf.codeEnvIdentifier as string;
-          const [sid, fid] = identifier.split('?')[0].split('/');
-          const entity_id = parseEntityIdFromCodeEnvIdentifier(identifier);
+          const ref = sf.codeEnvRef;
+          if (!ref) continue;
+          /* Cache-hit
refs already carry resource identity (kind / id /
+           * version) — pull them through so the artifact emitted by
+           * `handle_skill` and forwarded to `_injected_files` includes
+           * `resource_id`. Without this the next /exec sends
+           * `resource_id: undefined` and codeapi 400s. The discriminated
+           * union pins `version` to the skill branch only — destructure
+           * before the spread so TS accepts the conditional pull. */
          files.push({
-            id: fid,
-            session_id: sid,
+            id: ref.file_id,
+            resource_id: ref.id,
+            storage_session_id: ref.storage_session_id,
            name: `${skill.name}/${sf.relativePath}`,
-            ...(entity_id != null ? { entity_id } : {}),
+            kind: ref.kind,
+            ...(ref.kind === 'skill' ? { version: ref.version } : {}),
          });
        }
        if (files.length > 0) {
          logger.debug(
-            `[primeSkillFiles] All ${sessionIds.size} session(s) active for skill "${skill.name}", reusing ${files.length} files`,
+            `[primeSkillFiles] All ${refsBySession.size} session(s) active for skill "${skill.name}", reusing ${files.length} files`,
          );
-          return { session_id: files[0].session_id, files };
+          return { storage_session_id: files[0].storage_session_id, files };
        }
      }
    } catch {
@@ -183,7 +207,13 @@
    const result = await batchUploadCodeEnvFiles({
      req,
      files: filesToUpload,
-      entity_id: entityId,
+      /* Resource identity for codeapi's sessionKey: skill files share
+       * cross-user-within-tenant under `<tenant>:skill:<skillId>:v:<version>`.
+       * Bumping `skill.version` on edit naturally invalidates the prior
+       * cache entry under the new sessionKey. */
+      kind: 'skill',
+      id: entityId,
+      version: skill.version,
      /* Skill files are infrastructure: SKILL.md + bundled scripts/schemas/
       * docs that the agent reads but should never edit.
Tag the upload as * read-only so codeapi seals the inputs (chmod 444 in-sandbox) and @@ -194,15 +224,20 @@ export async function primeSkillFiles( read_only: true, }); // Exclude SKILL.md from the returned files array β€” it is uploaded to disk - // for bash access but has no codeEnvIdentifier (cannot be cached). Omitting - // it here keeps the fresh-upload and cache-hit code paths consistent. - const files = result.files + // for bash access but has no codeEnvRef (cannot be cached). Omitting it + // here keeps the fresh-upload and cache-hit code paths consistent. + const files: PrimeSkillFilesResult['files'] = result.files .filter((f) => !f.filename.endsWith('/SKILL.md')) .map((f) => ({ id: f.fileId, - session_id: result.session_id, + /* `resource_id` is the skill `_id` (the entity codeapi scopes + * the sessionKey on). Distinct from `id` (the per-file storage + * uuid) β€” both are required on the request. */ + resource_id: entityId, + storage_session_id: result.storage_session_id, name: f.filename, - entity_id: entityId, + kind: 'skill', + version: skill.version, })); // Treat partial upload failures as a priming failure β€” missing bundled @@ -220,39 +255,48 @@ export async function primeSkillFiles( } /** - * Persist codeEnvIdentifiers on skill files. Awaited (not - * fire-and-forget) so the next prime β€” which can start within - * milliseconds when many users hit the same skill concurrently β€” - * sees the cache pointer instead of racing the read against an - * in-flight write. Without the await, a fire-and-forget under - * concurrency stays in cache-miss steady-state for the duration - * of the burst (each user's prime reads stale, re-uploads, then - * fires its own forget that the next user also misses). Latency - * cost is ~10–50ms on the prime that does the upload; subsequent - * primes save an entire batch upload. Failures don't fail the - * prime β€” the file refs returned to the caller are still valid. + * Persist codeEnvRefs on skill files. 
Awaited (not fire-and-forget) + * so the next prime β€” which can start within milliseconds when + * many users hit the same skill concurrently β€” sees the cache + * pointer instead of racing the read against an in-flight write. + * Without the await, a fire-and-forget under concurrency stays in + * cache-miss steady-state for the duration of the burst (each + * user's prime reads stale, re-uploads, then fires its own forget + * that the next user also misses). Latency cost is ~10–50ms on + * the prime that does the upload; subsequent primes save an entire + * batch upload. Failures don't fail the prime β€” the file refs + * returned to the caller are still valid. */ if (updateSkillFileCodeEnvIds) { const updates = result.files .filter((f) => !f.filename.endsWith('/SKILL.md')) - .map((f) => ({ - skillId: skill._id, - relativePath: f.filename.slice(f.filename.indexOf('/') + 1), - codeEnvIdentifier: `${result.session_id}/${f.fileId}?entity_id=${entityId}`, - })); + .map((f) => { + const ref: CodeEnvRef = { + kind: 'skill', + id: entityId, + storage_session_id: result.storage_session_id, + file_id: f.fileId, + version: skill.version, + }; + return { + skillId: skill._id, + relativePath: f.filename.slice(f.filename.indexOf('/') + 1), + codeEnvRef: ref, + }; + }); if (updates.length > 0) { try { await updateSkillFileCodeEnvIds(updates); } catch (err: unknown) { logger.warn( - '[primeSkillFiles] Failed to persist codeEnvIdentifiers:', + '[primeSkillFiles] Failed to persist codeEnvRefs:', err instanceof Error ? 
err.message : err, ); } } } - return { session_id: result.session_id, files }; + return { storage_session_id: result.storage_session_id, files }; } catch (error) { logger.error( `[primeSkillFiles] Batch upload failed for skill "${skill.name}":`, @@ -274,7 +318,13 @@ export interface PrimeInvokedSkillsDeps { getSkillByName: ( name: string, accessibleIds: Types.ObjectId[], - ) => Promise<{ body: string; name: string; _id: Types.ObjectId; fileCount: number } | null>; + ) => Promise<{ + body: string; + name: string; + _id: Types.ObjectId; + version: number; + fileCount: number; + } | null>; listSkillFiles: (skillId: Types.ObjectId | string) => Promise; getStrategyFunctions: PrimeSkillFilesParams['getStrategyFunctions']; batchUploadCodeEnvFiles: PrimeSkillFilesParams['batchUploadCodeEnvFiles']; @@ -324,6 +374,7 @@ export async function primeInvokedSkills( body: string; name: string; _id: Types.ObjectId; + version: number; fileCount: number; }> = []; for (const r of resolveResults) { @@ -353,23 +404,24 @@ export async function primeInvokedSkills( // ALL distinct sessions for freshness. If all are active, return cached // references with zero re-uploads. If any expired, re-upload everything. 
  if (deps.getSessionInfo && deps.checkIfActive) {
-    const allFiles = fileListResults.flatMap((r) => r.files);
-    const allFilesWithIds = allFiles.filter((f) => f.codeEnvIdentifier);
+    const allResolved = fileListResults.flatMap((r) =>
+      r.files.map((f) => ({ skillName: r.skill.name, file: f, ref: f.codeEnvRef })),
+    );
+    const resolvedWithRef = allResolved.filter((x) => x.ref !== undefined);

-    // Only use cache when ALL files have identifiers (no partial persistence)
-    if (allFilesWithIds.length > 0 && allFilesWithIds.length === allFiles.length) {
-      const sessionIds = new Set(
-        allFilesWithIds.map((f) => f.codeEnvIdentifier!.split('?')[0].split('/')[0]),
-      );
+    // Only use cache when ALL files have refs (no partial persistence)
+    if (resolvedWithRef.length > 0 && resolvedWithRef.length === allResolved.length) {
+      const refsBySession = new Map<string, CodeEnvRef>();
+      for (const { ref } of resolvedWithRef) {
+        if (ref && !refsBySession.has(ref.storage_session_id)) {
+          refsBySession.set(ref.storage_session_id, ref);
+        }
+      }
      const checkResults = await Promise.all(
-        Array.from(sessionIds).map(async (sid) => {
-          const representative = allFilesWithIds.find((f) =>
-            f.codeEnvIdentifier!.startsWith(`${sid}/`),
-          );
-          if (!representative) return true;
+        Array.from(refsBySession.values()).map(async (ref) => {
          try {
-            const lastModified = await deps.getSessionInfo?.(representative.codeEnvIdentifier!);
+            const lastModified = await deps.getSessionInfo?.(ref);
            return !!(lastModified && deps.checkIfActive?.(lastModified));
          } catch {
            return false;
@@ -379,30 +431,33 @@
      const allActive = checkResults.every(Boolean);
      if (allActive) {
-        const cachedFiles = fileListResults.flatMap((r) =>
-          r.files
-            .filter((f) => f.codeEnvIdentifier)
-            .map((f) => {
-              const identifier = f.codeEnvIdentifier as string;
-              const [sid, fid] = identifier.split('?')[0].split('/');
-              const entity_id = parseEntityIdFromCodeEnvIdentifier(identifier);
-              return {
-                id: fid,
-                name:
`${r.skill.name}/${f.relativePath}`,
-              session_id: sid,
-              ...(entity_id != null ? { entity_id } : {}),
-            };
-          }),
-      );
+      /* `id` is the STORAGE file_id (the per-file uuid the
+       * file_server registered the upload under); `resource_id`
+       * is the entity that owns the storage session — the
+       * skill's `_id` here. codeapi's auth layer needs both:
+       * `id` for the upload-existence check, `resource_id` for
+       * sessionKey re-derivation (`<tenant>:skill:<id>:v:<version>`).
+       * Conflating them sent the storage nanoid through the
+       * sessionKey switch and 403'd every shared-kind /exec. */
+      const cachedFiles = resolvedWithRef.map(({ skillName, file, ref }) => ({
+        id: ref!.file_id,
+        resource_id: ref!.id,
+        name: `${skillName}/${file.relativePath}`,
+        storage_session_id: ref!.storage_session_id,
+        kind: ref!.kind,
+        ...(ref!.kind === 'skill' ? { version: ref!.version } : {}),
+      }));
       if (cachedFiles.length > 0) {
         logger.debug(
-          `[primeInvokedSkills] All ${sessionIds.size} session(s) active, reusing ${cachedFiles.length} cached files`,
+          `[primeInvokedSkills] All ${refsBySession.size} session(s) active, reusing ${cachedFiles.length} cached files`,
         );
         sessions = new Map();
-        // session_id is a representative value. ToolNode uses per-file
-        // session_id from the files array (file.session_id ?? codeSession.session_id).
+        /* `session_id` at the top of CodeSessionContext is the
+         * (representative) execution session — ToolNode reads it
+         * for continuity. Per-file storage is on each
+         * `files[i].storage_session_id`. */
         sessions.set(Constants.EXECUTE_CODE, {
-          session_id: cachedFiles[0].session_id,
+          session_id: cachedFiles[0].storage_session_id,
           files: cachedFiles,
           lastUpdated: Date.now(),
         } satisfies CodeSessionContext);
@@ -412,14 +467,18 @@
     }
   }
-  // Per-skill upload: each skill gets its own session with entity_id=skillId.
-  // primeSkillFiles handles freshness caching per-skill, so only expired
-  // skills re-upload.
The code env handles mixed session_ids natively. + // Per-skill upload: each skill gets its own storage session keyed + // by `(kind: 'skill', id: skillId, version: skill.version)`. + // primeSkillFiles handles freshness caching per-skill, so only + // expired skills re-upload. CodeAPI handles mixed + // storage_session_ids natively. const allPrimedFiles: Array<{ id: string; + resource_id: string; name: string; - session_id: string; - entity_id?: string; + storage_session_id: string; + kind: 'skill'; + version: number; }> = []; const primeResults = await Promise.allSettled( fileListResults.map(async ({ skill, files }) => { @@ -441,9 +500,14 @@ export async function primeInvokedSkills( for (const f of r.value.result.files) { allPrimedFiles.push({ id: f.id, + /* `resource_id` is the skill's `_id` β€” drives codeapi's + * sessionKey re-derivation. See cachedFiles above for the + * full id-vs-resource_id rationale. */ + resource_id: r.value.skill._id.toString(), name: f.name, - session_id: f.session_id, - ...(f.entity_id != null ? { entity_id: f.entity_id } : {}), + storage_session_id: f.storage_session_id, + kind: 'skill', + version: r.value.skill.version, }); } } else if (r.status === 'rejected') { @@ -453,11 +517,11 @@ export async function primeInvokedSkills( if (allPrimedFiles.length > 0) { sessions = new Map(); - // session_id is a representative value (first skill's session). ToolNode - // uses per-file session_id from the files array (file.session_id ?? - // codeSession.session_id), so mixed sessions work correctly. + /* `session_id` at the top of CodeSessionContext is the + * (representative) execution session. Per-file storage is on + * each file's `storage_session_id`. 
*/ sessions.set(Constants.EXECUTE_CODE, { - session_id: allPrimedFiles[0].session_id, + session_id: allPrimedFiles[0].storage_session_id, files: allPrimedFiles, lastUpdated: Date.now(), } satisfies CodeSessionContext); diff --git a/packages/api/src/files/code/identity.spec.ts b/packages/api/src/files/code/identity.spec.ts new file mode 100644 index 0000000000..642e29e4cf --- /dev/null +++ b/packages/api/src/files/code/identity.spec.ts @@ -0,0 +1,116 @@ +import FormData from 'form-data'; +import { appendCodeEnvFileIdentity, buildCodeEnvDownloadQuery } from './identity'; + +describe('appendCodeEnvFileIdentity', () => { + /* Validation rules mirror codeapi's `parseUploadSessionKeyInput` + * server-side validator. Asserting client-side gives bad callers a + * synchronous failure with file path + line number, vs. round-tripping + * a 400 from a multipart POST. */ + + it('appends kind/id/version on the multipart form for skill uploads', () => { + const form = new FormData(); + appendCodeEnvFileIdentity(form, { kind: 'skill', id: 'skill-42', version: 7 }); + + /* Spy on the form internally β€” `getBuffer()` chokes on stream + * entries other tests append, but the form has only the three + * scalar fields here so dump-and-grep is safe. 
*/ + const body = form.getBuffer().toString(); + expect(body).toContain('name="kind"'); + expect(body).toContain('skill'); + expect(body).toContain('name="id"'); + expect(body).toContain('skill-42'); + expect(body).toContain('name="version"'); + expect(body).toContain('7'); + }); + + it('omits version on the multipart form for non-skill kinds', () => { + const form = new FormData(); + appendCodeEnvFileIdentity(form, { kind: 'agent', id: 'agent-9' }); + expect(form.getBuffer().toString()).not.toContain('name="version"'); + }); + + it('rejects unknown kinds before mutating the form', () => { + const form = new FormData(); + expect(() => + appendCodeEnvFileIdentity(form, { + kind: 'system' as 'user', + id: 'x', + }), + ).toThrow(/invalid kind/); + /* form-data emits multipart boundaries even on an empty form, so + * "did we append" is a substring check, not buffer equality. */ + expect(form.getBuffer().toString()).not.toContain('name="kind"'); + expect(form.getBuffer().toString()).not.toContain('name="id"'); + }); + + it('rejects skill identity without a version', () => { + expect(() => + appendCodeEnvFileIdentity(new FormData(), { kind: 'skill', id: 'skill-42' }), + ).toThrow(/skill.*version/); + }); + + it('rejects version on non-skill kinds', () => { + expect(() => + appendCodeEnvFileIdentity(new FormData(), { + kind: 'agent', + id: 'agent-9', + version: 3, + }), + ).toThrow(/version.*skill/); + }); + + it('rejects missing id', () => { + expect(() => appendCodeEnvFileIdentity(new FormData(), { kind: 'user', id: '' })).toThrow( + /missing id/, + ); + }); +}); + +describe('buildCodeEnvDownloadQuery', () => { + it('builds the canonical user-private query string', () => { + expect(buildCodeEnvDownloadQuery({ kind: 'user', id: 'user-123' })).toBe( + '?kind=user&id=user-123', + ); + }); + + it('appends version when kind is skill', () => { + expect(buildCodeEnvDownloadQuery({ kind: 'skill', id: 'skill-abc', version: 7 })).toBe( + '?kind=skill&id=skill-abc&version=7', + ); 
+ }); + + it('omits version on agent kind', () => { + expect(buildCodeEnvDownloadQuery({ kind: 'agent', id: 'agent-xyz' })).toBe( + '?kind=agent&id=agent-xyz', + ); + }); + + it('URL-encodes ids that contain special characters', () => { + /* `id` shouldn't normally contain unsafe chars (it's a uuid or + * mongo ObjectId), but the URLSearchParams plumbing handles it + * uniformly so a future call site that passes through a + * user-supplied identifier doesn't accidentally produce a + * malformed URL. */ + expect(buildCodeEnvDownloadQuery({ kind: 'user', id: 'a b/c?d' })).toMatch( + /\?kind=user&id=a\+b%2Fc%3Fd/, + ); + }); + + it('throws on unknown kind', () => { + expect(() => buildCodeEnvDownloadQuery({ kind: 'system' as 'user', id: 'x' })).toThrow( + /invalid kind/, + ); + }); + + it('throws on skill missing version', () => { + expect(() => buildCodeEnvDownloadQuery({ kind: 'skill', id: 'skill-42' })).toThrow( + /skill.*version/, + ); + }); + + it('throws on version with non-skill kind', () => { + expect(() => buildCodeEnvDownloadQuery({ kind: 'agent', id: 'agent-9', version: 3 })).toThrow( + /version.*skill/, + ); + }); +}); diff --git a/packages/api/src/files/code/identity.ts b/packages/api/src/files/code/identity.ts new file mode 100644 index 0000000000..731aba492e --- /dev/null +++ b/packages/api/src/files/code/identity.ts @@ -0,0 +1,78 @@ +import { CODE_ENV_KINDS } from 'librechat-data-provider'; +import type { CodeEnvKind } from 'librechat-data-provider'; +import type FormData from 'form-data'; + +/** + * Resource identity required by codeapi's sessionKey resolver. The + * shape is symmetric across the upload form fields and the URL query + * params on download/freshness routes β€” codeapi normalizes both into + * the same `parseUploadSessionKeyInput`. Validation rules (closed + * kind set, `version` required for `'skill'` and forbidden otherwise) + * mirror codeapi's validator so a bad caller fails fast on the client + * instead of round-tripping a 400. 
See codeapi #1455 / Phase C.
+ *
+ * `id` is sessionKey-meaningful for `'skill'` / `'agent'` and
+ * informational-only for `'user'` (codeapi resolves user identity
+ * from auth context). It's required uniformly here for shape
+ * consistency with the runtime validator.
+ */
+export interface CodeEnvIdentity {
+  kind: CodeEnvKind;
+  id: string;
+  /** Required when `kind === 'skill'`; forbidden otherwise. */
+  version?: number;
+}
+
+const VALID_KINDS = new Set(CODE_ENV_KINDS);
+
+const validateIdentity = (identity: CodeEnvIdentity, callerLabel: string): void => {
+  const { kind, id, version } = identity;
+  if (!kind || !VALID_KINDS.has(kind)) {
+    throw new Error(`${callerLabel}: invalid kind "${kind}"`);
+  }
+  if (!id) {
+    throw new Error(`${callerLabel}: missing id for kind "${kind}"`);
+  }
+  if (kind === 'skill' && version == null) {
+    throw new Error(`${callerLabel}: kind "skill" requires a numeric version`);
+  }
+  if (kind !== 'skill' && version != null) {
+    throw new Error(`${callerLabel}: version is only valid for kind "skill"`);
+  }
+};
+
+/**
+ * Append `kind`/`id`/`version?` form fields to a multipart upload.
+ * Used by `uploadCodeEnvFile` and `batchUploadCodeEnvFiles` so codeapi
+ * can route the upload to the correct sessionKey bucket
+ * (`<tenant>:<kind>:<id>[:v:<version>]` for shared kinds,
+ * `<tenant>:user:<userId>` for `'user'`).
+ *
+ * Mutates the form in place and throws synchronously on bad input.
+ */
+export function appendCodeEnvFileIdentity(form: FormData, identity: CodeEnvIdentity): void {
+  validateIdentity(identity, 'appendCodeEnvFileIdentity');
+  form.append('kind', identity.kind);
+  form.append('id', identity.id);
+  if (identity.version != null) {
+    form.append('version', String(identity.version));
+  }
+}
+
+/**
+ * Build the `?kind=...&id=...&version=...` query string codeapi's
+ * `sessionAuth` middleware requires on `/download/<session_id>/<file_id>`
+ * and `/sessions/<session_id>/objects/<file_id>` URLs. Without these,
+ * codeapi 400s with "kind must be one of: skill, agent, user" before
+ * serving the file.
+ * + * Returns a string with leading `?`; concatenate onto a path. + */ +export function buildCodeEnvDownloadQuery(identity: CodeEnvIdentity): string { + validateIdentity(identity, 'buildCodeEnvDownloadQuery'); + const params = new URLSearchParams({ kind: identity.kind, id: identity.id }); + if (identity.version != null) { + params.set('version', String(identity.version)); + } + return `?${params.toString()}`; +} diff --git a/packages/api/src/files/code/index.ts b/packages/api/src/files/code/index.ts index b48d6ce3cb..11d2fbb15b 100644 --- a/packages/api/src/files/code/index.ts +++ b/packages/api/src/files/code/index.ts @@ -1,3 +1,4 @@ export * from './classify'; export * from './extract'; export * from './form'; +export * from './identity'; diff --git a/packages/api/src/files/encode/processAttachments.spec.ts b/packages/api/src/files/encode/processAttachments.spec.ts index 91d6de4a0c..f9af6fd1e7 100644 --- a/packages/api/src/files/encode/processAttachments.spec.ts +++ b/packages/api/src/files/encode/processAttachments.spec.ts @@ -16,7 +16,7 @@ function categorizeFile( type?: string | null; source?: string; embedded?: boolean; - metadata?: { fileIdentifier?: string }; + metadata?: { fileIdentifier?: string; codeEnvRef?: unknown }; }, isBedrock: boolean, mergedFileConfig: FileConfig | undefined, @@ -26,7 +26,11 @@ function categorizeFile( if (source === FileSources.text) { return 'skipped'; } - if (file.embedded === true || file.metadata?.fileIdentifier != null) { + if ( + file.embedded === true || + file.metadata?.codeEnvRef != null || + file.metadata?.fileIdentifier != null + ) { return 'skipped'; } diff --git a/packages/api/src/utils/axios.ts b/packages/api/src/utils/axios.ts index 5d73955f0f..bab5242d8e 100644 --- a/packages/api/src/utils/axios.ts +++ b/packages/api/src/utils/axios.ts @@ -1,7 +1,37 @@ +import { Buffer } from 'buffer'; import axios from 'axios'; import { logger } from '@librechat/data-schemas'; import type { AxiosInstance, AxiosProxyConfig, 
AxiosError } from 'axios'; +/** + * Render `error.response.data` as a small, log-safe value. Axios surfaces + * the response body in whichever native shape the request asked for, so + * `responseType: 'arraybuffer'` yields a `Buffer` (raw bytes β€” JSON- + * serializes as `{type: 'Buffer', data: [123, 34, ...]}`, ~4 chars per + * byte) and `responseType: 'stream'` yields a `Readable` (whose internal + * state JSON-serializes the full readableState ring buffer + socket + * fields, easily megabytes per error). Both are useless for diagnostics + * and drown the log line. Decode small buffers as UTF-8 (truncated) and + * stub streams entirely. + */ +const renderResponseData = (data: unknown): unknown => { + if (data == null) return data; + if (Buffer.isBuffer(data)) { + const MAX = 2048; + const text = data.subarray(0, MAX).toString('utf8'); + return data.length > MAX ? `${text}…[+${data.length - MAX} bytes]` : text; + } + // Readable streams (responseType: 'stream') and other piped sources. + if ( + typeof data === 'object' && + data !== null && + typeof (data as { pipe?: unknown }).pipe === 'function' + ) { + return '[stream]'; + } + return data; +}; + /** * Logs Axios errors based on the error object and a custom message. * @param options - The options object. @@ -33,7 +63,7 @@ export const logAxiosError = ({ logger.error(logMessage, { status, headers, - data, + data: renderResponseData(data), stack, }); } else if (axios.isAxiosError(error) && error.request) { diff --git a/packages/data-provider/src/codeEnvRef.spec.ts b/packages/data-provider/src/codeEnvRef.spec.ts new file mode 100644 index 0000000000..561850c502 --- /dev/null +++ b/packages/data-provider/src/codeEnvRef.spec.ts @@ -0,0 +1,55 @@ +/* `CodeEnvRef` is a plain typed struct (no helpers, no resolvers). + * Behavioral coverage lives at consumer sites β€” `processCodeOutput` + * (write), `primeFiles` (read+reupload), `primeSkillFiles` (read+write), + * agents `ToolNode` (forward to codeapi). 
This file just pins the
 * shape so a future refactor can't silently widen or narrow the
 * fields without surfacing here. */
+import { CODE_ENV_KINDS } from './codeEnvRef';
+import type { CodeEnvKind, CodeEnvRef } from './codeEnvRef';
+
+describe('CodeEnvRef', () => {
+  it('accepts the canonical shape for kind: skill', () => {
+    const ref: CodeEnvRef = {
+      kind: 'skill',
+      id: 'skill_123',
+      storage_session_id: 'sess_abc',
+      file_id: 'file_xyz',
+      version: 7,
+    };
+    expect(ref.kind).toBe('skill');
+    expect(ref.version).toBe(7);
+  });
+
+  it('accepts the canonical shape for kind: user', () => {
+    const ref: CodeEnvRef = {
+      kind: 'user',
+      id: 'user_456',
+      storage_session_id: 'sess_def',
+      file_id: 'file_uvw',
+    };
+    expect(ref.kind).toBe('user');
+    /* `version` is statically absent on the user variant of the
+     * discriminated union — the property doesn't exist on the type, so
+     * accessing it would be a compile error. Probe at runtime via an
+     * `unknown` cast to assert the absence at runtime as well. */
+    expect((ref as unknown as Record<string, unknown>).version).toBeUndefined();
+  });
+
+  it('accepts the canonical shape for kind: agent', () => {
+    const ref: CodeEnvRef = {
+      kind: 'agent',
+      id: 'agent_789',
+      storage_session_id: 'sess_ghi',
+      file_id: 'file_rst',
+    };
+    expect(ref.kind).toBe('agent');
+  });
+
+  it('CODE_ENV_KINDS pins the closed set of kinds', () => {
+    /* Adding a new kind requires updating the runtime tuple AND
+     * surfacing it in `resolveSessionKey`'s exhaustive switch. The
+     * `as const` shape makes this catch-able by the type system. */
+    const kinds: CodeEnvKind[] = [...CODE_ENV_KINDS];
+    expect(kinds).toEqual(['skill', 'agent', 'user']);
+  });
+});
diff --git a/packages/data-provider/src/codeEnvRef.ts b/packages/data-provider/src/codeEnvRef.ts
new file mode 100644
index 0000000000..469592bfdc
--- /dev/null
+++ b/packages/data-provider/src/codeEnvRef.ts
@@ -0,0 +1,59 @@
+/**
+ * Closed set of resource kinds for sandbox file caching.
Defined as an
+ * `as const` tuple so the runtime list and the TypeScript union can't
+ * drift on future additions — adding a new kind to the tuple updates
+ * both at once.
+ *
+ * - `skill`: shared per skill identity. Cross-user-within-tenant
+ *   sharing. CodeAPI sessionKey omits the user dimension.
+ *   `version` is required (the skill's monotonic counter scopes the
+ *   cache per revision so any edit invalidates the prior cache
+ *   entry naturally).
+ * - `agent`: shared per agent identity. Same sharing semantic as
+ *   skills (agents are addressable resources accessible to a
+ *   permission-defined audience).
+ * - `user`: user-private. CodeAPI sessionKey is keyed by the
+ *   requesting user from auth context. Used for chat attachments
+ *   and code-output artifacts.
+ */
+export const CODE_ENV_KINDS = ['skill', 'agent', 'user'] as const;
+export type CodeEnvKind = (typeof CODE_ENV_KINDS)[number];
+
+/**
+ * Typed reference to a file in the code-execution sandbox.
+ *
+ * `storage_session_id` is intentionally distinct from the *execution*
+ * session id at the top level of an execute response — they are
+ * different concepts that historically shared the field name
+ * `session_id`. This is the long-lived storage session keyed by the
+ * resource's identity (skill/agent/user), not the transient
+ * sandbox-run session.
+ *
+ * `kind` and `id` together name the resource that owns this file's
+ * storage session. CodeAPI uses them (plus the auth-context tenant
+ * id) to derive the sessionKey, which determines who shares the
+ * cache. Cross-user sharing for shared resources (skills, agents) is
+ * a designed property of the kind switch, not an emergent side
+ * effect. See codeapi #1455 / agents #148 / LC #12960.
+ *
+ * `version` is statically required when `kind === 'skill'` and
+ * statically forbidden otherwise via the discriminated union below —
+ * the constraint is enforced at compile time, not just by codeapi's
+ * runtime validator.
+ */ +interface CodeEnvRefBase { + /** Resource identity. Semantics depend on `kind`: + * - `skill`: skill `_id` (sessionKey-meaningful, cross-user shared). + * - `agent`: agent id (sessionKey-meaningful, cross-user shared). + * - `user`: informational only β€” sessionKey derivation uses the + * requesting user from auth context. Kept on the type for shape + * uniformity across kinds; do not rely on it for routing. */ + id: string; + storage_session_id: string; + file_id: string; +} + +export type CodeEnvRef = + | (CodeEnvRefBase & { kind: 'skill'; version: number }) + | (CodeEnvRefBase & { kind: 'agent' }) + | (CodeEnvRefBase & { kind: 'user' }); diff --git a/packages/data-provider/src/index.ts b/packages/data-provider/src/index.ts index aab5752f7d..bb166d975a 100644 --- a/packages/data-provider/src/index.ts +++ b/packages/data-provider/src/index.ts @@ -49,3 +49,5 @@ export { default as createPayload } from './createPayload'; /* feedback */ export * from './feedback'; export * from './parameterSettings'; +/* code-execution sandbox */ +export * from './codeEnvRef'; diff --git a/packages/data-provider/src/types/files.ts b/packages/data-provider/src/types/files.ts index 6771033614..1d6f17c371 100644 --- a/packages/data-provider/src/types/files.ts +++ b/packages/data-provider/src/types/files.ts @@ -1,4 +1,5 @@ import { EToolResources } from './assistants'; +import type { CodeEnvRef } from '../codeEnvRef'; export enum FileSources { local = 'local', @@ -144,7 +145,15 @@ export type TFile = { * Suitable for tooltip text but not user-facing prose. */ previewError?: string; - metadata?: { fileIdentifier?: string }; + metadata?: { + fileIdentifier?: string; + /** + * Structured form of `fileIdentifier`. Persisted alongside the + * legacy string during the dual-write transition; readers should + * resolve via `resolveCodeEnvRef`. 
+ */ + codeEnvRef?: CodeEnvRef; + }; createdAt?: string | Date; updatedAt?: string | Date; }; diff --git a/packages/data-schemas/src/methods/file.spec.ts b/packages/data-schemas/src/methods/file.spec.ts index e3882f4958..cb573c9df3 100644 --- a/packages/data-schemas/src/methods/file.spec.ts +++ b/packages/data-schemas/src/methods/file.spec.ts @@ -299,7 +299,14 @@ describe('File Methods', () => { type: 'text/x-python', bytes: 100, context: FileContext.execute_code, - metadata: { fileIdentifier: 'some-identifier' }, + metadata: { + codeEnvRef: { + kind: 'user', + id: userId.toString(), + storage_session_id: 'session-x', + file_id: codeFileId, + }, + }, }); // execute_code files are explicitly excluded from getToolFilesByIds diff --git a/packages/data-schemas/src/methods/file.ts b/packages/data-schemas/src/methods/file.ts index 61beee5b17..4261588a33 100644 --- a/packages/data-schemas/src/methods/file.ts +++ b/packages/data-schemas/src/methods/file.ts @@ -128,7 +128,7 @@ export function createFileMethods(mongoose: typeof import('mongoose')) { conversationId, context: FileContext.execute_code, messageId: { $exists: true, $in: messageIds }, - 'metadata.fileIdentifier': { $exists: true }, + 'metadata.codeEnvRef': { $exists: true }, }; const selectFields: SelectProjection = { text: 0 }; @@ -158,7 +158,7 @@ export function createFileMethods(mongoose: typeof import('mongoose')) { const filter: FilterQuery = { file_id: { $in: fileIds }, context: { $ne: FileContext.execute_code }, - 'metadata.fileIdentifier': { $exists: true }, + 'metadata.codeEnvRef': { $exists: true }, }; const selectFields: SelectProjection = { text: 0 }; diff --git a/packages/data-schemas/src/methods/skill.spec.ts b/packages/data-schemas/src/methods/skill.spec.ts index 86cf101678..85ecf41bf8 100644 --- a/packages/data-schemas/src/methods/skill.spec.ts +++ b/packages/data-schemas/src/methods/skill.spec.ts @@ -1339,6 +1339,67 @@ describe('SkillFile methods', () => { 
expect(files[0].storageRegion).toBe('us-east-2'); }); + it('clears codeEnvRef when a skill file is upserted (replacement)', async () => { + /* A re-upload of a skill file replaces the row's contents β€” but the + * cached `codeEnvRef` refers to the OLD bytes living in codeapi. + * Leaving it populated would make the next prime resolve a stale + * pointer. The upsert MUST $unset codeEnvRef. */ + const { skill } = await methods.createSkill(makeSkillInput()); + await methods.upsertSkillFile({ + skillId: skill._id, + relativePath: 'scripts/parse.sh', + file_id: 'file-1', + filename: 'parse.sh', + filepath: '/tmp/v1', + source: 'local', + mimeType: 'text/plain', + bytes: 10, + author: owner._id, + }); + + const entityId = skill._id.toString(); + await methods.updateSkillFileCodeEnvIds([ + { + skillId: skill._id, + relativePath: 'scripts/parse.sh', + codeEnvRef: { + kind: 'skill', + id: entityId, + storage_session_id: 'sess-old', + file_id: 'file-old', + version: 1, + }, + }, + ]); + + /* Sanity: ref is persisted before the replacement. */ + const before = await methods.listSkillFiles(skill._id); + expect(before[0].codeEnvRef).toMatchObject({ + kind: 'skill', + storage_session_id: 'sess-old', + file_id: 'file-old', + version: 1, + }); + + /* Replace the row with new bytes. */ + await methods.upsertSkillFile({ + skillId: skill._id, + relativePath: 'scripts/parse.sh', + file_id: 'file-2', + filename: 'parse.sh', + filepath: '/tmp/v2', + source: 'local', + mimeType: 'text/plain', + bytes: 20, + author: owner._id, + }); + + const after = await methods.listSkillFiles(skill._id); + expect(after).toHaveLength(1); + expect(after[0].file_id).toBe('file-2'); + expect(after[0].codeEnvRef).toBeUndefined(); + }); + it('deleteSkillFile recounts and bumps version', async () => { const { skill } = await methods.createSkill(makeSkillInput()); await methods.upsertSkillFile({ @@ -1395,7 +1456,7 @@ describe('SkillFile methods', () => { * egress per chat load). 
Pinning the contract here so the caller * can warn-log on partial writes instead of failing closed. */ - it('persists codeEnvIdentifier and reports matched/modified counts', async () => { + it('persists codeEnvRef and reports matched/modified counts', async () => { const { skill } = await methods.createSkill(makeSkillInput()); await methods.upsertSkillFile({ skillId: skill._id, @@ -1409,11 +1470,18 @@ describe('SkillFile methods', () => { author: owner._id, }); + const entityId = skill._id.toString(); const result = await methods.updateSkillFileCodeEnvIds([ { skillId: skill._id, relativePath: 'scripts/a.sh', - codeEnvIdentifier: `session-1/file-1?entity_id=${skill._id.toString()}`, + codeEnvRef: { + kind: 'skill', + id: entityId, + storage_session_id: 'session-1', + file_id: 'file-1', + version: 1, + }, }, ]); @@ -1421,7 +1489,13 @@ describe('SkillFile methods', () => { expect(result.modifiedCount).toBe(1); const files = await methods.listSkillFiles(skill._id); - expect(files[0].codeEnvIdentifier).toBe(`session-1/file-1?entity_id=${skill._id.toString()}`); + expect(files[0].codeEnvRef).toMatchObject({ + kind: 'skill', + id: entityId, + storage_session_id: 'session-1', + file_id: 'file-1', + version: 1, + }); }); it('reports modifiedCount=0 when no SkillFile rows match the (skillId, relativePath) filter', async () => { @@ -1430,7 +1504,13 @@ describe('SkillFile methods', () => { { skillId: skill._id, relativePath: 'does/not/exist.sh', - codeEnvIdentifier: 'sid/fid?entity_id=x', + codeEnvRef: { + kind: 'skill', + id: 'x', + storage_session_id: 'sid', + file_id: 'fid', + version: 1, + }, }, ]); expect(result.modifiedCount).toBe(0); diff --git a/packages/data-schemas/src/methods/skill.ts b/packages/data-schemas/src/methods/skill.ts index 502ca34c47..bc1d2260c6 100644 --- a/packages/data-schemas/src/methods/skill.ts +++ b/packages/data-schemas/src/methods/skill.ts @@ -7,6 +7,7 @@ import { SKILL_BODY_MAX_LENGTH, SKILL_NAME_PATTERN as SKILL_NAME_PATTERN_SHARED, } from 
'librechat-data-provider'; +import type { CodeEnvRef } from 'librechat-data-provider'; import type { Model, Types, FilterQuery } from 'mongoose'; import type { ISkill, @@ -1436,7 +1437,7 @@ export function createSkillMethods(mongoose: typeof import('mongoose'), deps: Sk author: row.author, tenantId: row.tenantId, }, - $unset: { content: '', isBinary: '', codeEnvIdentifier: '' }, + $unset: { content: '', isBinary: '', codeEnvRef: '' }, }, { new: false, upsert: true }, ).lean(); @@ -1493,7 +1494,7 @@ export function createSkillMethods(mongoose: typeof import('mongoose'), deps: Sk updates: Array<{ skillId: Types.ObjectId | string; relativePath: string; - codeEnvIdentifier: string; + codeEnvRef: CodeEnvRef; }>, ): Promise<{ matchedCount: number; modifiedCount: number }> { if (updates.length === 0) return { matchedCount: 0, modifiedCount: 0 }; @@ -1501,7 +1502,7 @@ export function createSkillMethods(mongoose: typeof import('mongoose'), deps: Sk const ops = updates.map((u) => ({ updateOne: { filter: { skillId: u.skillId, relativePath: u.relativePath }, - update: { $set: { codeEnvIdentifier: u.codeEnvIdentifier } }, + update: { $set: { codeEnvRef: u.codeEnvRef } }, }, })); @@ -1515,7 +1516,7 @@ export function createSkillMethods(mongoose: typeof import('mongoose'), deps: Sk const result = await tenantSafeBulkWrite(SkillFile, ops); if (result.modifiedCount < updates.length) { logger.warn( - `[updateSkillFileCodeEnvIds] Persisted ${result.modifiedCount}/${updates.length} codeEnvIdentifiers (matched ${result.matchedCount}). Subsequent primes for unmatched files will re-upload.`, + `[updateSkillFileCodeEnvIds] Persisted ${result.modifiedCount}/${updates.length} codeEnvRefs (matched ${result.matchedCount}). 
Subsequent primes for unmatched files will re-upload.`, ); } return { matchedCount: result.matchedCount, modifiedCount: result.modifiedCount }; diff --git a/packages/data-schemas/src/schema/file.ts b/packages/data-schemas/src/schema/file.ts index 4b9d462b2d..c8e7c72f52 100644 --- a/packages/data-schemas/src/schema/file.ts +++ b/packages/data-schemas/src/schema/file.ts @@ -118,7 +118,23 @@ const file: Schema = new Schema( width: Number, height: Number, metadata: { - fileIdentifier: String, + codeEnvRef: { + type: new Schema( + { + kind: { + type: String, + enum: ['skill', 'agent', 'user'], + required: true, + }, + id: { type: String, required: true }, + storage_session_id: { type: String, required: true }, + file_id: { type: String, required: true }, + version: { type: Number }, + }, + { _id: false }, + ), + default: undefined, + }, }, expiresAt: { type: Date, diff --git a/packages/data-schemas/src/schema/skillFile.ts b/packages/data-schemas/src/schema/skillFile.ts index fbaa35155c..efb5a29c1d 100644 --- a/packages/data-schemas/src/schema/skillFile.ts +++ b/packages/data-schemas/src/schema/skillFile.ts @@ -102,8 +102,22 @@ const skillFileSchema: Schema = new Schema( isBinary: { type: Boolean, }, - codeEnvIdentifier: { - type: String, + codeEnvRef: { + type: new Schema( + { + kind: { + type: String, + enum: ['skill', 'agent', 'user'], + required: true, + }, + id: { type: String, required: true }, + storage_session_id: { type: String, required: true }, + file_id: { type: String, required: true }, + version: { type: Number }, + }, + { _id: false }, + ), + default: undefined, }, }, { diff --git a/packages/data-schemas/src/types/file.ts b/packages/data-schemas/src/types/file.ts index 28170ee57f..b47a18abb6 100644 --- a/packages/data-schemas/src/types/file.ts +++ b/packages/data-schemas/src/types/file.ts @@ -1,4 +1,5 @@ import { Document, Types } from 'mongoose'; +import type { CodeEnvRef } from 'librechat-data-provider'; export interface IMongoFile extends Omit { user: 
Types.ObjectId; @@ -62,7 +63,13 @@ export interface IMongoFile extends Omit { width?: number; height?: number; metadata?: { - fileIdentifier?: string; + /** + * Code-environment cache pointer for files re-uploadable to + * codeapi (chat attachments, agent tool resources, code-output + * files). Carries the resource kind + identity so codeapi can + * derive the sessionKey explicitly. + */ + codeEnvRef?: CodeEnvRef; }; expiresAt?: Date; createdAt?: Date; diff --git a/packages/data-schemas/src/types/skill.ts b/packages/data-schemas/src/types/skill.ts index 90555ffe65..a2cba928d1 100644 --- a/packages/data-schemas/src/types/skill.ts +++ b/packages/data-schemas/src/types/skill.ts @@ -1,3 +1,4 @@ +import type { CodeEnvRef } from 'librechat-data-provider'; import type { Document, Types } from 'mongoose'; /** @@ -130,11 +131,11 @@ export interface ISkillFile { /** Set on first read. `true` prevents repeated storage reads for non-text files. */ isBinary?: boolean; /** - * Code environment file identifier (`session_id/fileId`). - * Set after uploading to code env, used to check freshness on subsequent runs. - * Cleared when the skill file is re-uploaded to storage. + * Code-environment cache pointer. Set after uploading the file to + * codeapi, used to check freshness on subsequent primes. Cleared + * when the skill file is re-uploaded to storage. */ - codeEnvIdentifier?: string; + codeEnvRef?: CodeEnvRef; createdAt?: Date; updatedAt?: Date; }
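---

Reviewer note: the `kind` discriminator and the per-kind `version` constraint are easier to see in isolation than spread across the diff. Below is a minimal, self-contained sketch. The `CodeEnvRef` union mirrors `packages/data-provider/src/codeEnvRef.ts` above; `resolveSessionKey` and its key format are *hypothetical* stand-ins for codeapi's resolver (the real derivation lives in codeapi #1455, not in this diff), with `tenantId`/`userId` standing in for auth context.

```typescript
// Mirrors the CodeEnvRef union introduced in this PR.
const CODE_ENV_KINDS = ['skill', 'agent', 'user'] as const;
type CodeEnvKind = (typeof CODE_ENV_KINDS)[number];

interface CodeEnvRefBase {
  id: string;
  storage_session_id: string;
  file_id: string;
}

type CodeEnvRef =
  | (CodeEnvRefBase & { kind: 'skill'; version: number })
  | (CodeEnvRefBase & { kind: 'agent' })
  | (CodeEnvRefBase & { kind: 'user' });

/* HYPOTHETICAL resolver: codeapi's real sessionKey derivation (and its
 * exact string format) is defined in codeapi #1455, not here. The point
 * is the shape of the kind switch: shared kinds key on the resource id,
 * the user kind keys on auth context and ignores ref.id. */
function resolveSessionKey(ref: CodeEnvRef, auth: { tenantId: string; userId: string }): string {
  switch (ref.kind) {
    case 'skill':
      // `version` scopes the cache per revision: editing a skill bumps
      // skill.version, which changes the key and orphans the old entry.
      return `${auth.tenantId}:skill:${ref.id}:v:${ref.version}`;
    case 'agent':
      return `${auth.tenantId}:agent:${ref.id}`;
    case 'user':
      // `ref.id` is informational only; identity comes from auth context.
      return `${auth.tenantId}:user:${auth.userId}`;
  }
}

const skillRef: CodeEnvRef = {
  kind: 'skill',
  id: 'skill_123',
  storage_session_id: 'sess_abc',
  file_id: 'file_xyz',
  version: 7,
};

console.log(resolveSessionKey(skillRef, { tenantId: 't1', userId: 'u9' }));
// -> t1:skill:skill_123:v:7
```

Omitting `version` on the `skill` variant, or adding it to `agent`/`user`, fails type-checking at every consumer, which is the compile-time guarantee the PR relies on instead of codeapi's runtime validator alone.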
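One encoding subtlety the `buildCodeEnvDownloadQuery` spec pins deliberately: WHATWG `URLSearchParams` serializes with the `application/x-www-form-urlencoded` rules, so spaces become `+` (not `%20`). A standalone check:

```typescript
// URLSearchParams form-urlencoding: space -> '+', '/' -> '%2F', '?' -> '%3F'.
// This is why the spec above asserts `a+b%2Fc%3Fd` rather than `a%20b...`.
const params = new URLSearchParams({ kind: 'user', id: 'a b/c?d' });
console.log(params.toString()); // kind=user&id=a+b%2Fc%3Fd
```

Any server parsing these queries with the same form-urlencoded rules (as `URLSearchParams` itself does) round-trips the `+` back to a space.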