LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-06-25 17:06:24 +00:00

Author	SHA1	Message	Date
Danny Avila	d18d62e7c1	🪙 refactor: Reconcile Context Gauge to Actual Provider Tokens (#13780 ) * 🪙 fix: Reconcile Context Gauge to Actual Provider Tokens The context gauge could read several× too high (e.g. 213K when the real prompt was 56K) and stay there across reloads. Root cause: the SDK's calibrationRatio is `cumulativeProviderReported / cumulativeRawSent`, but a provider's server-side web search injects large fetched content into the prompt that the SDK never sent or counted — pinning the ratio at its cap (5) and multiplying every later message estimate, including post-summary ones. The gauge rendered (and persisted) that inflated estimate, never the provider's actual token count. Fix: reconcile the snapshot to the call's ACTUAL prompt tokens (input + cache), which already arrive in on_token_usage. Only messageTokens is calibration-scaled (instructions/summary are raw tiktoken), so keep those and set messageTokens to the remainder, recomputing free space. Shared `promptTokensFromUsage` + `reconcileContextUsage` in data-provider; applied server-side in buildPersistedContextUsage (reload-stable) and client-side in useUsageHandler on each primary usage (corrects at turn-end, no follow-up needed). Also drop the summary double-count from the Breakdown Messages row. Deferred (separate agents PR): the SDK over-calibration also fires summarization prematurely; fixing it needs decoupling real-content estimation from server-side injection headroom without weakening pruning-overflow safety. * 🪙 fix: Harden Token Reconciliation for Provider-less + Resume Paths Codex review on the reconciliation: - promptTokensFromUsage: when the provider is absent (custom/OpenAI-compatible payloads), fall back to the same magnitude heuristic normalizeUsageUnits uses (cache ≤ input ⇒ already included) so cached events aren't re-inflated. 
- Resume: backfillUsage restores a primary call's usage without replaying a live on_token_usage (Redis mode), so the live reconcile never ran and a reconnected session stayed on the inflated estimate. New reconcileBackfill reconciles the restored snapshot from the final primary call after contextHandler installs it. * 🪙 fix: Reconcile Resume Snapshot Server-Side, Not via Backfill Codex: the client reconcileBackfill scanned the resumed run's collectedUsage and applied the final primary to the latest snapshot — but on a mid-call resume that usage belongs to an EARLIER call, corrupting the restored gauge. Move the resume reconciliation server-side: GenerationJobManager.persistTokenUsage reconciles the stored contextUsage to a primary usage's actual prompt tokens as it arrives. That usage is the post-invoke truth for the call the latest stored snapshot precedes (no snapshot is captured between a call's pre-invoke dispatch and its usage), so it's correct by construction and run-matched. A mid-call resume (no usage yet) keeps the raw snapshot instead of mis-applying an earlier call's tokens; it reconciles once the call completes. Removed client reconcileBackfill; the live-path reconcile (non-resume) stays. * 🪙 fix: Guard Reconciliation Against Replays and Snapshot Races Two Codex concurrency findings on the reconciliation: - Client: reconcile only on a NEWLY folded primary usage. A replayed duplicate (folded=false on resume) can be an earlier tool-loop call sharing the run id, which would overwrite the latest snapshot with an earlier, smaller prompt. Moved the reconcile after the folded guard. - Server: serialize the context-usage write through the same per-stream queue as the token-usage write. 
persistTokenUsage reconciles the stored snapshot (read-modify-write); an unserialized trackContextUsage could store a newer snapshot between the read and write — or a stale reconciled write could land after a newer snapshot — clobbering the newer run's gauge when calls interleave. FIFO keeps each call's snapshot ahead of its own usage and behind the next. * chore: import order in GenerationJobManager.ts	2026-06-16 11:05:44 -04:00
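The reconciliation described in d18d62e7c1 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `SnapshotSketch` and `UsageSketch` shapes, their field names, and both function names are assumptions made for the example. It keeps the raw instruction and summary counts, sets the message count to the remainder of the provider's actual prompt tokens, and recomputes free space.

```typescript
// Hypothetical shapes for illustration; the real LibreChat types may differ.
interface SnapshotSketch {
  maxContextTokens: number;
  instructionTokens: number; // raw count, not calibration-scaled
  summaryTokens: number; // raw count, not calibration-scaled
  messageTokens: number; // calibration-scaled estimate (may be inflated)
  freeTokens: number;
}

interface UsageSketch {
  inputTokens: number;
  cacheTokens?: number;
  // Known per provider; undefined for provider-less payloads.
  inputIncludesCache?: boolean;
}

// Actual prompt tokens = input + cache, without counting cache twice.
// When the provider is unknown, fall back to the magnitude heuristic the
// commit message mentions: cache <= input implies cache is already included.
function promptTokensSketch(u: UsageSketch): number {
  const cache = u.cacheTokens ?? 0;
  const included = u.inputIncludesCache ?? (cache <= u.inputTokens);
  return included ? u.inputTokens : u.inputTokens + cache;
}

// Keep instructions/summary, set messages to the remainder, recompute free space.
function reconcileSketch(s: SnapshotSketch, actualPrompt: number): SnapshotSketch {
  const fixed = s.instructionTokens + s.summaryTokens;
  const messageTokens = Math.max(0, actualPrompt - fixed);
  const freeTokens = Math.max(0, s.maxContextTokens - (fixed + messageTokens));
  return { ...s, messageTokens, freeTokens };
}

// An inflated 213K estimate reconciled to a real 56K prompt.
const inflated: SnapshotSketch = {
  maxContextTokens: 200_000,
  instructionTokens: 4_000,
  summaryTokens: 2_000,
  messageTokens: 207_000,
  freeTokens: 0,
};
const actual = promptTokensSketch({
  inputTokens: 6_000,
  cacheTokens: 50_000,
  inputIncludesCache: false,
});
const reconciled = reconcileSketch(inflated, actual);
console.log(reconciled.messageTokens, reconciled.freeTokens); // 50000 144000
```

The numbers are invented to mirror the 213K versus 56K case quoted in the commit message; only the arithmetic is the point.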
Danny Avila	055585f9f1	🪢 fix: Tie MCP Cleanup To Resumable Runs (#13769) * fix: Clean up request-scoped MCP connections * test: Format MCP request context spec * refactor: Move MCP request context to API package	2026-06-15 15:26:03 -04:00
Danny Avila	44c253d48a	🪙 fix: Correct Context Usage Gauge After Summarization (#13744 ) * 🪙 fix: Persist Context Snapshot + Summary Marker After Summarization The post-summarization context is correctly compacted by the SDK, but the breakdown wasn't reliably reaching the client, leaving the gauge on the whole-history estimate (stuck at 100% forever once a conversation compacts). Two server changes in buildResponseMetadata: - Snapshot guard: persist the breakdown when a PRIMARY usage event follows the latest snapshot (tracked via contextUsageSink.latestUsageIndex, recorded in the on_context_usage handler) instead of a brittle snapshot-vs-primary count. A summarization detour adds an extra snapshot whose only following usage is tagged 'summarization', which the count guard could miscount and drop. - Summary marker: whenever a turn compacts (summaryTokens > 0), persist a lightweight metadata.summaryUsedTokens (the pre-invoke compacted context size) UNCONDITIONALLY — so even when the full snapshot can't be saved (interrupted final call) or never reaches the client, the per-message estimate has a signal to cap the discarded history. Tests: client.contextMetadata.spec (guard + marker, incl. marker-survives-drop) and a real-pipeline summarization integration test. * 🪙 fix: Cap the Context Estimate at the Summary Marker When the gauge falls back to the per-message estimate (no usable snapshot on the branch), sumBranch summed the ENTIRE branch history — after a summarization that discarded most of it, this over-counts and pins the gauge at 100% in perpetuity. sumBranch now stops at the deepest summarized response (metadata.summaryUsedTokens) and records it as summaryBaseline; the walk counts only post-summary messages, and useTokenUsage adds the baseline. So the estimate reflects the compacted context (summary + recent turns), not the discarded history. USD/default behavior unchanged when no marker is present. 
Test: sumBranch caps a huge pre-summary history at the compacted baseline. * 🪙 fix: Address Codex Review on the Summarization Marker - Branch cost/usage is no longer truncated at the summary marker — sumBranch caps only the CONTEXT-window count there and keeps accumulating provider usage/cost to the root (cumulative spend isn't discarded by compaction). - findBranchSnapshotAnchor stops at a summarized response with no snapshot of its own, so it can't recover a stale PRE-summary snapshot and show discarded history; the summary-baseline estimate is used instead. - Abort path: buildAbortedResponseMetadata now persists the summaryUsedTokens marker (pre-invoke, no completedOutputTokens ambiguity, so safe on abort) so a STOPPED summarized turn isn't re-summed on reload. - Marker baseline fallback now includes summaryTokens (a separate breakdown field) so it doesn't under-report the compacted size. DRY'd into a shared computeSummaryUsedTokens used by the completion and abort paths. - Estimate popover surfaces the summary baseline as a row so the displayed rows reconcile with the header total. Tests: sumBranch cost-not-truncated + anchor-stops-at-marker (client); computeSummaryUsedTokens fallback + abort marker (packages/api). * 🪙 fix: Attribute Persisted Context Usage to the Snapshot Run Match the post-snapshot primary usage to the latest snapshot's runId before persisting metadata.contextUsage. Parallel/direct runs interleave snapshots and usage (A snapshot → B snapshot → A usage → B no-usage); the prior index-only guard persisted B's snapshot with A's output. finalCallOutputTokens now filters completedOutputTokens to the snapshot's run. Untagged events (older lib/resume) match any run for back-compat. 
* 🪙 fix: Harden Summary Marker Against Tool-Loops, Stale Anchors, and Emit Races Codex round on the summarization marker: - Avoid double-counting earlier tool-loop outputs in the summary marker: those outputs sit in BOTH the latest snapshot's pre-invoke baseline AND the response message's tokenCount the client estimate adds on top. computeSummaryUsedTokens now subtracts the run's prior primary outputs (priorRunOutputTokens) — the live path bounds them by the snapshot's usage index, the abort path by all primaries (an interrupted final call emits none). Single-call turns subtract 0. - Stop treating pre-summary anchors as active: sumBranch no longer sets containsAnchor once the context is capped at a summary marker, so a stale pre-summary snapshot can't override the summary-baseline estimate. - Capture latestUsageIndex BEFORE awaiting emitEvent: a yield (resumable SSE / Redis) during parallel runs could let this call's own usage advance the index past the event that proves the snapshot completed, dropping a valid breakdown. * 🪙 fix: Subtract Summarization Output from the Summary Marker recordCollectedUsage folds the summarization call's completion into the response message's tokenCount, while the generated summary is also in the snapshot baseline as summaryTokens. The client estimate (summaryBaseline + responseTokenCount) thus counted the summary twice — inflating the gauge after compaction even on a single-call turn whenever the full snapshot is unavailable. priorRunOutputTokens now also counts summarization-tagged output (still excluding subagent/sequential, which recordCollectedUsage keeps out of the reported total), so the marker subtracts it. Updated unit + guard tests. 
* 🪙 fix: Refine Marker Subtraction for Summarization RunId and Abort Boundary Two Codex follow-ups on the marker-subtraction logic: - Subtract summarization output regardless of runId: the summarize detour is its own model-end call that may carry a distinct runId, but its output still lands in this response's tokenCount AND the snapshot baseline (summaryTokens). It is now counted unconditionally (still within the response's own usageEmitSink), while primaries keep the parallel-run runId filter. - Don't subtract primaries on the abort path: the job stores no snapshot/usage boundary, so a primary that completed AFTER the latest snapshot is NOT in the baseline; subtracting it would cancel real output and under-report. priorRunOutputTokens gains an includePrimary flag (false for abort) — abort subtracts only the always-pre-snapshot summarization output. * 🪙 fix: Run-Scope Summary Subtraction and Stop Subtracting on Abort Two Codex follow-ups, resolved by reverting the round-4 detour: - Run-scope the summarization subtraction: the summarize detour inherits the graph run id (traceConfig spreads config.metadata.run_id), so its usage shares the answer snapshot's runId — it is NOT a distinct run. priorRunOutputTokens now filters summarization by runId like primaries, so a parallel sibling run's summary (different runId, in the sibling's baseline) is no longer subtracted from this branch's marker. Drops the includePrimary flag added last round. - Stop subtracting on the abort path: abort tokenCount is countTokens(text) (abortMiddleware) or absent (agents route) — it does not fold in summarization or earlier-call output the way recordCollectedUsage does, so the marker must keep the full baseline. buildAbortedResponseMetadata now subtracts nothing.	2026-06-14 18:23:30 -04:00
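The summary-marker cap from 44c253d48a can be pictured with a small walk over one branch. This is a hedged sketch: `BranchMessage`, its fields, and `sumBranchSketch` are illustrative names, and the real `sumBranch` works on the client's token index rather than a plain array. The context count stops at the deepest summarized response and records its marker as the baseline, while cost keeps accumulating to the root.

```typescript
// Hypothetical per-message record, ordered leaf -> root for one branch.
interface BranchMessage {
  tokenCount: number;
  cost?: number;
  summaryUsedTokens?: number; // present only on a response whose turn compacted
}

function sumBranchSketch(leafToRoot: BranchMessage[]) {
  let contextTokens = 0;
  let summaryBaseline = 0;
  let cost = 0;
  let capped = false;

  for (const message of leafToRoot) {
    // Spend is cumulative: compaction does not discard it.
    cost += message.cost ?? 0;
    if (capped) {
      continue; // pre-summary history no longer occupies the context window
    }
    contextTokens += message.tokenCount;
    if (message.summaryUsedTokens !== undefined && message.summaryUsedTokens > 0) {
      summaryBaseline = message.summaryUsedTokens;
      capped = true;
    }
  }

  // The gauge estimate is the compacted baseline plus post-summary messages.
  return { contextTokens, summaryBaseline, cost, estimate: contextTokens + summaryBaseline };
}

const branch: BranchMessage[] = [
  { tokenCount: 100, cost: 0.25 },
  { tokenCount: 300, cost: 0.5, summaryUsedTokens: 5_000 },
  { tokenCount: 90_000, cost: 1 }, // discarded by summarization
  { tokenCount: 80_000, cost: 2 }, // discarded by summarization
];
const summed = sumBranchSketch(branch);
console.log(summed.estimate, summed.cost); // 5400 3.75
```

Without the cap, the same branch would sum to 170,400 context tokens and pin the gauge at the window limit.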
Danny Avila	2350ebb24a	📨 feat: Custom Headers on Built-in Provider Endpoints (#13742 ) * 📨 feat: Custom Headers on Built-in Provider Endpoints Add a `headers` config option to the built-in `openAI`, `anthropic`, and `google` endpoints (incl. Anthropic/Google Vertex), mirroring the custom endpoint header mechanism. Values support the same placeholder resolution (env vars, `{{LIBRECHAT_USER_}}`, `{{LIBRECHAT_BODY_CONVERSATIONID}}`) and are resolved at request time so dynamic values like conversationId resolve against the live request — without losing provider-native request shaping. Closes #13082. Covers #13713: forwarding conversationId to a reverse proxy is now `X-Conversation-Id: '{{LIBRECHAT_BODY_CONVERSATIONID}}'` — an unknown header is ignored by the native Anthropic API, so no 400 and no metadata gating needed. - Schema: `headers` on `baseEndpointSchema` (openAI/google/anthropic/all). - New `mergeHeaders`/`resolveConfigHeaders` utils centralize the per-provider header locations (`configuration.defaultHeaders`, Anthropic `clientOptions.defaultHeaders`, Google `customHeaders`); provider-managed headers (auth, `anthropic-beta`) always win on collision. - Each initializer threads configured headers (endpoint over `all`) into the right place; request-time resolution runs across all locations in the main and title flows. 🩹 fix: Cast endpoints.all to TEndpoint for headers DeepPartial widening Adding `headers` (a Record) to `baseEndpointSchema` makes `DeepPartial<TCustomConfig>` widen its value type to `string \| undefined`, which is not assignable to the concrete `TEndpoint['headers']: Record<string, string>` at the `loadedEndpoints.all` assignment. Cast at the assignment site, mirroring the existing `anthropicConfig as TAnthropicEndpoint` cast in the same function. 
* 🛡️ fix: Harden built-in endpoint custom headers (Codex review) Address Codex P2 findings on the custom-headers feature: - Anthropic title requests: `omitTitleOptions` strips the `clientOptions` carrier, which dropped its `defaultHeaders`. Preserve just the header carrier so gateway/reverse-proxy metadata still reaches title generation. - mergeHeaders: match header names case-insensitively so an override (e.g. a provider-managed `Authorization`/`anthropic-beta`) replaces/unions a case-variant from the base instead of emitting two names a client may collapse. - OpenAI: withhold admin-configured headers when the user supplies the base URL (`user_provided`), since values may carry `${SECRET}`/token placeholders that must not reach a user-controlled endpoint — mirrors the custom-endpoint guard. - Azure: honor global `endpoints.all` headers (same OpenAI carrier) while keeping Azure-managed `api-key`/version headers authoritative. Adds tests for each. * 🔐 fix: Resolve-once + provider-managed header safety (Codex review round 2) Address Codex P2 findings: - Azure: keep global `endpoints.all` headers unresolved at init and let request-time `resolveConfigHeaders` resolve them once, avoiding a second-order env expansion of already-substituted user values. - Google: `resolveConfigHeaders` no longer template-resolves the provider-managed `Authorization` header (built from a possibly user-provided key), so a user key like `${ENV}` can't leak server environment values. - Model fetches: thread configured headers (endpoint over `all`) + user object through `getOpenAIModels`/`getAnthropicModels` → `fetchModels`, so a gateway-fronted built-in provider receives the header on `/models` too. Fixed `fetchModels` to merge custom headers for Anthropic instead of overwriting them (managed `x-api-key`/version still win). Adds/updates tests for each. 
* 🧯 fix: Header provenance, memory/title coverage, idempotency (Codex round 3) Address Codex P2 findings, including two regressions from the prior round: - Google auth (findings 6 & 8): move native Google header resolution to init (`initializeGoogle`), resolving admin templates BEFORE the key-derived auth header is built. resolveConfigHeaders no longer touches Google `customHeaders`, so admin `Authorization` templates resolve again (fixes the round-2 regression) while the SDK auth header (possibly a user-provided key) is never env-expanded. - Memory runs: memory extraction now calls `resolveConfigHeaders`, so native Anthropic (and OpenAI) headers resolve for memory requests too. - Vertex titles: restore the ORIGINAL `clientOptions` object reference (not a copy) when preserving headers across `omitTitleOptions`, so the Vertex `createClient` closure and the resolved headers stay on the same object. - Reuse: `resolveConfigHeaders` is now idempotent (resolve-once per header map), preventing a second pass from env-expanding values already substituted with user/body data when an agent object flows through buildAgentInput twice. Adds/updates tests for each.	2026-06-14 17:02:04 -04:00
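The case-insensitive merge described for `mergeHeaders` in 2350ebb24a could look something like the sketch below. It is an assumption-laden illustration, not the utility from the commit: `mergeHeadersSketch` is a made-up name, and it only implements the "override replaces a case-variant" behavior, not any union of multi-value headers such as `anthropic-beta`.

```typescript
type HeaderMap = Record<string, string>;

// Later sources win. A name that differs only by case replaces the earlier
// entry instead of producing two headers a client might collapse.
function mergeHeadersSketch(base: HeaderMap, override: HeaderMap): HeaderMap {
  const result: HeaderMap = {};
  const emittedName = new Map<string, string>(); // lowercased name -> emitted name

  for (const source of [base, override]) {
    for (const [name, value] of Object.entries(source)) {
      const key = name.toLowerCase();
      const previous = emittedName.get(key);
      if (previous !== undefined) {
        delete result[previous]; // drop the case-variant from the earlier source
      }
      emittedName.set(key, name);
      result[name] = value;
    }
  }
  return result;
}

// Admin-configured headers first, provider-managed headers last so they win.
const merged = mergeHeadersSketch(
  { authorization: "admin-value", "X-Conversation-Id": "abc123" },
  { Authorization: "Bearer provider-managed" },
);
console.log(merged);
```

Passing the provider-managed map as the override is what keeps auth headers authoritative on collision in this sketch.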
Danny Avila	4ee68d5240	💸 feat: Per-Agent Endpoint Token Config in Multi-Endpoint Billing (#13738 ) * 💸 feat: Per-Agent Endpoint Token Config in Multi-Endpoint Billing Price each collected/emitted usage item with the producing agent's resolved endpoint token config, instead of the primary agent's for the whole graph. Previously AgentClient.recordCollectedUsage and the subagent usage emitter used a single this.options.endpointTokenConfig (the primary's) for every usage item. A connected agent or subagent on a different custom endpoint that shares a model id with an entry in the primary's tokenConfig was therefore mis-priced (a model absent from it already fell back to the built-in rate map — no regression). - Tag each usage with its producing agent: ModelEndHandler stamps usage.agentId = agentContext.agentId; createSubagentUsageSink stamps the child's subagentAgentId (UsageMetadata gains an optional agentId). - buildAgentToolContext retains endpointTokenConfig so initialize.js can build an agentId -> endpointTokenConfig map from agentToolContexts (the one map that holds every agent, including pure subagents pruned from agentConfigs). - AgentClient.resolveAgentEndpointTokenConfig(usage) looks up that map by agentId, falling back to the primary config; used by both the billing path (new optional resolveEndpointTokenConfig on recordCollectedUsage) and the subagent cost emitter. - recordCollectedUsage's resolver is optional and falls back to the batch endpointTokenConfig, so the shared responses.js/openai.js call sites are unchanged. - Tests: two-endpoint graph with a colliding model id prices per-agent; resolver nullish falls back to batch; subagent sink tags the child agent id. 
* fix: Align emit-path cost with per-agent billing; honor known-agent built-in pricing Addresses Codex review on the per-agent endpoint token config: - Emit path (callbacks.js) now prices each on_token_usage event with the producing agent's config (resolved via usageCost.resolveEndpointTokenConfig), so streamed/persisted metadata.usage.cost matches the per-agent balance transaction. The agentId tag is resolved server-side and stripped from the emitted/persisted payload. - Resolver (resolveAgentTokenConfig) now treats a known agent's config as authoritative, including undefined → built-in pricing, so a known non-custom agent in a custom-primary graph is no longer charged the primary's rates. Only untagged/unknown usage falls back to the primary config. - endpointTokenConfigByAgentId records every known agent (value may be undefined) so the resolver distinguishes known-no-rates from unknown.	2026-06-14 12:00:32 -04:00
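The resolver rule in 4ee68d5240 (a known agent's config is authoritative even when it is undefined, and only untagged or unknown usage falls back to the primary) hinges on telling "known with no rates" apart from "unknown". A minimal sketch, assuming a `Map` keyed by agent id; the `TokenConfigSketch` shape and the function name are hypothetical:

```typescript
// Hypothetical rate table: model id -> per-token rates.
type TokenConfigSketch = Record<string, { prompt: number; completion: number }>;

function resolveTokenConfigSketch(
  byAgentId: Map<string, TokenConfigSketch | undefined>,
  primary: TokenConfigSketch | undefined,
  agentId?: string,
): TokenConfigSketch | undefined {
  // `has` (not `get`) distinguishes a known agent whose value is undefined
  // from an agent that was never recorded.
  if (agentId !== undefined && byAgentId.has(agentId)) {
    return byAgentId.get(agentId); // undefined here means built-in pricing
  }
  return primary; // untagged or unknown usage
}

const primaryConfig: TokenConfigSketch = { "shared-model": { prompt: 2, completion: 6 } };
const childConfig: TokenConfigSketch = { "shared-model": { prompt: 1, completion: 3 } };
const configs = new Map<string, TokenConfigSketch | undefined>([
  ["agent_custom", childConfig],
  ["agent_builtin", undefined], // known agent on a non-custom endpoint
]);

console.log(resolveTokenConfigSketch(configs, primaryConfig, "agent_custom") === childConfig); // true
console.log(resolveTokenConfigSketch(configs, primaryConfig, "agent_builtin")); // undefined
console.log(resolveTokenConfigSketch(configs, primaryConfig, "agent_unknown") === primaryConfig); // true
```

Using `get` alone would collapse the second and third cases and charge the built-in agent the primary's rates, which is the mis-pricing the follow-up commit describes.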
Danny Avila	b03b2a0a29	💾 feat: Persist Context Breakdown & Branch/Total Usage Cost (#13734 ) * 💾 feat: Persist Context Breakdown & Branch/Total Usage Cost Persist the granular context breakdown and per-response usage/cost on the response message metadata, and re-derive branch + total usage/cost from a per-message index so the popover survives reloads and is branch-aware live. - Add aggregateEmittedUsage + buildPersistedContextUsage helpers in packages/api; capture the latest visible snapshot and every emitted on_token_usage payload via contextUsageSink/usageEmitSink. - Attach metadata.contextUsage (Part A) and metadata.usage (Part B) on the agents response message in sendCompletion. - Carry per-message usage on the token index; add sumTotalUsage/setEntryUsage and branch-scoped usage on sumBranch. - Repurpose the session accumulator into a single in-flight pending holder; flush it into the index at finalize; hydrate breakdowns on load. - Render branch cost with a conditional all-branches total in the breakdown. * 🧹 chore: Remove orphaned com_ui_session_cost i18n key * 🩹 fix: Address Codex review — normalize usage server-side, fix reload deltas - Persist per-event-normalized display units in metadata.usage (TResponseUsage) so reloaded mixed-provider turns match the live session; client reads them directly instead of re-normalizing with a single stamped provider (P2). - Persist completedOutputTokens (final call output) on metadata.contextUsage so a reloaded multi-call turn adds the post-snapshot delta, not the full tokenCount the snapshot already counts (P2). - buildIndex preserves a prior entry's immutable usage when a rebuilt cache message lacks metadata.usage, so a mid-session rebuild (regenerate) keeps a sibling branch's flushed cost (fixes the e2e regenerate failure). - Track costKnown so turns saved with contextCost off don't render $0.00 when cost display is later enabled (P3). 
- Use an epsilon for the all-branches cost comparison to avoid a spurious total row from float summation order (P3). - Update unit/integration/e2e tests for the new shapes; regenerate e2e asserts the all-branches total after reload (deterministic via persisted metadata). * 🩹 fix: Address Codex round 2 — pending leak, cost coverage, reload delta - Clear the in-flight pending usage on terminal abort/error (resetLive), so a stopped generation's tokens no longer merge into the next response (P2). - costKnown now means COMPLETE coverage (ANDed): a branch mixing cost-bearing and cost-less turns is flagged incomplete and the cost row is hidden rather than rendering an under-reported total (P2). - Drop the tokenCount fallback for completedOutputTokens on reload: only the persisted post-snapshot delta is used, so a multi-call turn whose provider emitted no usage_metadata no longer double-counts earlier output (P2). - Update tokens.spec for AND coverage semantics + incomplete-cost case. * 🩹 fix: Address Codex round 3 — no-usage snapshots, total coverage, provider-less cache - Skip persisting metadata.contextUsage when the response emitted no primary usage event: without a known post-snapshot output the granular gauge would undercount the reply on reload, so fall back to the coarse per-message estimate instead (P2). - Gate the all-branches cost row on totalUsage.costKnown so an incomplete total (a sibling saved without cost) never renders an under-reported figure (P2). - aggregateEmittedUsage/finalCallOutputTokens now normalize per-event with the client's magnitude fallback (normalizeEventUnits) instead of billing splitUsage, so provider-less cached events match live on reload (P2). - Add backend test for the provider-less cached case. 
* 🩹 fix: Address Codex round 4 — abort attribution, complete cost coverage - aggregateEmittedUsage persists cost only when EVERY call was priced; a partial pricing failure now omits cost so the client treats coverage as unknown rather than reading an under-reported sum as authoritative (P2). - finalizeUsage flushes pending into the response entry only when events were folded this session (eventCount > 0), so a late/second resumable subscriber carrying persisted metadata.usage keeps it instead of being overwritten with an empty pending record (P2). - On user stop, attribute the in-flight pending usage to the partial response (new attributePending handler) instead of discarding it in resetLive — the stopped reply's billed tokens are kept and still can't leak into the next response; resetLive's discard remains for the error path (P2). * 🐛 fix: Persist branch cost across branch switches via sticky usage history Branch cost vanished on switching to a sibling branch (until a new turn) — the cost analog of the granularity bug. buildIndex rebuilds the token index from the messages cache; a sibling generated this session whose cache message lacks metadata.usage (and is transiently dropped from the cache during regenerate) lost its live-flushed usage, so sumBranch found none and the cost row hid. Fix: a sticky per-response usage map (conversationId → messageId → usage), written by setEntryUsage and never rebuilt from the cache — the usage counterpart of snapshotsByAnchorFamily for the breakdown. buildIndex/upsertEntries restore an entry's usage from it when the message carries none; cleared on convo switch and migrated with the index. Add unit coverage for the drop-then-readd regression and an e2e assertion that branch cost survives a branch switch. 
* 🐛 fix: Re-index on branch switch so branch cost survives the switch The sticky usage history alone didn't fix the reported branch-switch cost drop: on a branch switch no cache `updated` event fires, so the index subscriber never re-ran, and the post-regenerate rebuild was skipped while `isSubmitting` was still true — leaving the index stale and missing the now-viewed branch's response entirely (sticky can only restore entries present in a rebuild). Re-index from the messages cache on every tail change (created/finalize AND branch switch), not just while submitting. The cache holds the full message set at switch time, so the viewed branch's response is re-added and its usage restored from metadata.usage or the sticky history → sumBranch finds it and the branch cost renders. Verified locally: the branch-switch e2e now passes (the cost section shows both the branch row and the all-branches total). Also fixed that e2e assertion to target a single cost value (strict-mode safe). * 🩹 fix: Handle stopped-stream usage — reset pending + persist abort metadata Codex round (stop/abort edges): - Resumable explicit-stop (intentional SSE close) reset UI state but never cleared pendingUsageFamily, so usage folded before the stop leaked into the next response in the conversation. Discard pending on intentional close (resetLive); a resume re-folds via backfillUsage, so nothing is lost. - The abort save path (abortMiddleware) persisted the stopped response without metadata.usage/contextUsage, so its cost + breakdown vanished on reload. Rebuild both from the job's persisted tokenUsage (emitted payloads incl. cost) and contextUsage snapshot — parity with the normal sendCompletion path; breakdown gated on a primary usage event like buildResponseMetadata. Deferred (per scope decision): mid-stream branch-switch transiently shows the streaming branch's pending on the viewed sibling (cosmetic, until finalize). 
* 🩹 fix: Persist abort metadata on the real agents route + tighten snapshot gate Codex round (corrects last round's wrong-path fixes): - Stopped AGENTS responses are saved by routes/agents/index.js (/chat/abort), not abortMiddleware — so last round's metadata fix never ran for them. Moved the rollup/snapshot builder into packages/api as buildAbortedResponseMetadata (shared, unit-tested) and applied it in BOTH abort save paths, so a stopped agent reply keeps its cost + breakdown on reload. - Persist the breakdown only when the FINAL visible call emitted usage: track a per-response snapshot count and require primaryUsageCount >= snapshotCount. Previously any earlier primary usage event passed the gate, so a multi-call turn whose final call emitted no usage_metadata used an earlier call's output as completedOutputTokens (already counted by the latest snapshot) → reload over-reported. Now it falls back to the coarse estimate. Resumable stop pending-reset (prior round, 3cde6fe035) already flows through clearAllSubmissions → SSE close → the intentional-close handler's resetLive. Deferred per scope: mid-stream branch-switch pending attribution (tracked). * 🩹 fix: Abort breakdown over-count + resume re-fold after pending discard Codex round (on the re-applied abort/snapshot work): - buildAbortedResponseMetadata now persists ONLY the usage/cost rollup, not the context breakdown. The abort path can't tell whether the final call emitted usage (the job stores only the latest snapshot, not a count), so persisting the breakdown risked reusing an earlier call's output as completedOutputTokens (already in the snapshot) → reload over-count. Stopped/incomplete responses now fall back to the coarse gauge estimate, which is safe and apt. - resetLive now also forgets the conversation's folded usage-event identities (clearUsageFolded). 
Discarding pending on a terminal/intentional close left the folded keys set, so a later resume's backfillUsage saw the persisted events as duplicates and never rebuilt pending — leaving the response's usage missing until a full reload. Clearing them lets the resume re-fold.	2026-06-14 10:48:07 -04:00
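Two of the cost-display rules in b03b2a0a29 (costKnown means complete coverage, and the all-branches row uses an epsilon comparison) can be illustrated together. This is a sketch under stated assumptions: the `TurnUsage` shape, the function names, and the epsilon value are invented for the example and are not taken from the commit.

```typescript
interface TurnUsage {
  cost?: number; // undefined when the turn was saved without a price
}

// costKnown is ANDed across turns: one unpriced turn makes the sum incomplete.
function sumCostSketch(turns: TurnUsage[]): { cost: number; costKnown: boolean } {
  let cost = 0;
  let costKnown = turns.length > 0;
  for (const turn of turns) {
    if (turn.cost === undefined) {
      costKnown = false;
    } else {
      cost += turn.cost;
    }
  }
  return { cost, costKnown };
}

// Show the all-branches row only when both sums are complete and the total
// differs from the branch by more than float summation noise.
function showTotalRowSketch(
  branch: { cost: number; costKnown: boolean },
  total: { cost: number; costKnown: boolean },
  epsilon = 1e-9, // assumed tolerance
): boolean {
  return branch.costKnown && total.costKnown && Math.abs(total.cost - branch.cost) > epsilon;
}

// Same turns summed in a different order can differ in the last bits.
const branchSum = sumCostSketch([{ cost: 0.1 }, { cost: 0.2 }, { cost: 0.3 }]);
const totalSum = sumCostSketch([{ cost: 0.3 }, { cost: 0.2 }, { cost: 0.1 }]);
console.log(showTotalRowSketch(branchSum, totalSum)); // false

const partial = sumCostSketch([{ cost: 0.5 }, {}]);
console.log(partial.costKnown); // false
```

A strict `!==` comparison of the two sums could render a spurious total row when only the summation order differs, which is the case the epsilon is there to absorb.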
Danny Avila	db7011d567	📊 feat: Real-Time Context Window & Token Usage Tracking (#13670) * 📊 feat: Real-Time Context Window & Token Usage Tracking * 🧪 fix: Align Pricing Spec Dep Signatures with TxDeps * 🩹 fix: Resolve Codex Findings for Context Usage Tracking * 📊 feat: Granular Tool Token Breakdown with Deferred Splits * 🧪 test: Cover Session Cost in Mock E2E and Scope Usage Selectors * 🧪 test: Live Host-Pipeline Usage Verification (Env-Gated) * 🧪 test: Local Real-Provider Multi-Turn E2E Harness * 🪙 fix: Keep Tagged Usage Buckets Out of the Live Context Estimate * 🩹 fix: Scoped Token-Config Fallback and Sequential Visibility for Usage Events * 🩹 fix: Address Usage Review Findings — Cost Timing, Scoped Caches, Finalized Output - carry the post-snapshot output estimate into the context snapshot at finalize so the gauge keeps the last response after live resets - accumulate per-rate billable units and price the session cost at render, so usage events arriving before the token-config load still count once it resolves - pass user-scoped token-config cache keys through loadConfigModels fetches and drop the controller's unscoped fallback to prevent serving another user's resolved config - tag emitted usage events with a per-run seq so resume dedupe never drops a distinct call with an identical payload - admit the static tokenConfig override in the custom endpoint schema so it survives zod parsing into req.config * 🩹 fix: Align Client Usage Accounting with Backend Cost Semantics - classify cache tokens by provider (shared inputTokensIncludesCache from data-provider, consumed by both the backend billing path and the 
client) instead of a magnitude heuristic, so Anthropic/Bedrock turns where cache is smaller than uncached input no longer under-bill input - mirror resolveCompletionTokens on the client so Vertex-style hidden thinking tokens are reflected in the Output row and session cost - prefer endpoint pricing over adapter-provider pricing so a custom endpoint can price a known model name without built-in rates shadowing it - carry static cacheRead/cacheWrite overrides through the tokenConfig schema and buildTokenConfigMap * 🩹 fix: Honor Static Token Config in Billing; Tighten Usage Freshness - initializeCustom now uses a static endpoint tokenConfig as the agent's endpointTokenConfig (billing + balance checks), not just the advertised UI config — previously the gauge showed admin rates while the agent billed against built-in tables - invalidate the token-config query alongside models on user-key add/ revoke so context windows and pricing refresh without a reload - include maxContextTokens in ChatForm's stabilized conversation memo so the gauge reflects a changed context-window setting immediately - feed the live output estimate from the legacy content path (direct and assistants streams), setting from cumulative part text rather than accumulating deltas * 🩹 fix: Resume Usage Dedup, Agent Pricing, and Partial Override Billing - fold usage events idempotently by (runId, seq) so resume backfill no longer resets the conversation totals — a mid-stream reconnect keeps the usage of prompts already completed earlier in the session - tap replayed pending message/reasoning/content events so output streamed past the resume snapshot reaches the live estimate, not just the message - resolve cost against the agent's backing endpoint (Agents conversations report endpoint `agents` / provider `openAI`, neither of which keys a custom endpoint's tokenConfig) - getMultiplier/getCacheMultiplier fall back to the standard tables for models absent from a partial endpointTokenConfig, so a partial 
static override no longer bills non-listed models at defaultRate while the UI shows the correct pattern rate * 🩹 fix: Repaired Output in Gauge, Cache-Rate Keys, Config Gate, Usage Cleanup - live/completed gauge counts the repaired completion (normalized output), so under-reporting providers don't drop the response from used context - translate static tokenConfig cacheWrite/cacheRead onto the write/read keys getCacheMultiplier reads, so cache tokens bill at the configured rate instead of the prompt-rate fallback - clear the token index and usage atoms when leaving a conversation, so visited histories don't accumulate in memory for the tab's lifetime - wait for startupConfig before mounting the gauge, so a deployment with contextUsage disabled never briefly mounts it or fires the token-config query on first load * 🩹 fix: Move Token-Config Resolution to TS; Key Live Usage by Created Convo - extract the token-config resolution (override gathering + cache lookup + buildTokenConfigMap) into resolveTokenConfigMap in packages/api, leaving the /api controller a thin request-scoped wrapper (CLAUDE.md TS rule) - getConvoKey prefers the user message's real conversationId once the `created` event stamps it, so a new chat's first-response live gauge and totals land under the id TokenUsage subscribes to instead of NEW_CONVO * 🩹 fix: Clear Stale Redis Job Usage; Live-Tap Legacy Streams; Share Fetched Config - DEL the Redis job hash before re-creating it so a reused streamId can't inherit a prior run's contextUsage/tokenUsage and backfill stale usage - tap the legacy {message,text} stream branch (non-agent OpenAI/Anthropic streams) into the live estimate, not just the content path - copy a deduped fetch's token config to every sibling endpoint sharing the baseURL/key/headers, so /token-config resolves each by its own name * ⏪ revert: Don't DEL Redis job hash in createJob (breaks cross-replica resume) createJob is an idempotent join — a second replica calls it for the same streamId 
to share an in-flight stream's state. DELeting the hash wiped the prior replica's persisted created/usage state, so a joining replica missed the created event (GenerationJobManager cross-replica integration test). Reverts the F1 change from `2bfce0c34b`; the stale-usage concern doesn't arise in practice (streamId is unique per generation). * 🩹 fix: Best-Effort Usage Emit; Tag Hidden Sequential-Agent Usage - wrap the ModelEndHandler usage emit in try/catch so a failed telemetry delivery (closed SSE / Redis publish error) can't abort the handler before thought-signature capture, which would break resumed tool calls - tag hidden sequential-agent usage as 'sequential' (non-primary) so the client folds it into session cost/totals but not the live context gauge, instead of letting an undefined usage_type inflate the visible gauge * 🩹 fix: Refetch Stale Token Config on Mount; Normalize Vertex for Lookup - useTokenConfigQuery refetches on mount when stale, so a user-key change that invalidates tokenConfig while the gauge is unmounted takes effect on return instead of serving the prior key's resolved config - normalize a Vertex-backed agent's provider (vertexai) to the google token-config key, so Gemini context windows and rates resolve instead of showing unknown context / $0 cost * ✨ feat: Server-Side Per-Event Cost (Authoritative Pricing for the Gauge) Move usage-cost pricing to the single source of truth. The backend prices each model call with the same billing functions (premium tiers via getMultiplier(inputTokenCount), cache rates) and emits the USD cost on on_token_usage when interface.contextCost is enabled; the client sums emitted costs instead of re-deriving from base token-config rates. - computeUsageCostUSD reuses prepareTokenSpend/prepareStructuredTokenSpend so the emitted cost matches what is billed (incl. 
premium thresholds) - getDefaultHandlers gains a usageCost pricing context; initialize.js wires db.getMultiplier/getCacheMultiplier gated on contextCost (agents path) - client UsageTotals carries a summed costUSD; retire the client-side rate lookups (costFromUnits/calcUsageCost) that drifted from backend pricing and produced the provider-keying / cache-key / Vertex / premium findings - keep normalizeUsageUnits for the displayed token counts; token-config is still used for the context-window meter Fixes the premium-tier session-cost under-report (gpt-5.x / gemini-3.1 above their input thresholds). * 🩹 fix: Branch-Accurate Usage Snapshot + Clearer Gauge Track Contrast - re-anchor the context snapshot from the user message to the response message at finalize. Regenerating a response branches off a shared user message, so anchoring on it made the snapshot read as "active" on both branches — switching to the sibling branch showed the wrong (other branch's) context. The response message is branch-unique, so sibling branches now correctly fall back to their own per-branch totals. 
- raise the gauge ring's track/fill contrast (muted track, prominent fill) so the used portion reads clearly as a fill-level indicator * 🩹 fix: Tag Sequential Usage in Billing; Emit Subagent Cost; Reset Live on Resume Errors - tag hidden sequential-agent usage `usage_type: 'sequential'` on the COLLECTED usage (not just the emit), and treat it as non-primary in recordCollectedUsage (billed, excluded from the reported output total) so hidden intermediate output stops inflating the parent's tokenCount/pruning - emit on_token_usage from the subagent usage sink (tagged `subagent`, with authoritative cost when contextCost is on) so the gauge's session cost/totals include billed subagent usage; it stays out of the live meter - call resetLive on the resumable 404 and max-retry terminal branches so the gauge doesn't keep counting stale in-flight tokens after the stream ends * 🎨 fix: Contrast the Popup Context Bar; Revert Ring Restyle - raise the popup breakdown's context progressbar contrast (muted surface-tertiary track, prominent text-primary fill) — that's the bar the contrast feedback was about - revert the gauge ring restyle (kept its original border-heavy track / text-secondary fill); the ring wasn't the element in question * 🩹 fix: Stop Snapshot Granularity Leaking Across Branches; Revert Tree Memo - a null-anchor context snapshot was treated as active on every branch, leaking one generation's granular breakdown onto sibling branches. Require a non-null (response-message) anchor on the viewed branch instead, so siblings without a matching snapshot fall back to their own totals. - revert the buildTree WeakMap memo in messages.ts. buildTree is pure (builds from shallow copies) so the memo was behaviorally identical, but it was the feature's only change to core branch-navigation selectors — removing it matches upstream and rules it out of branch-navigation debugging. 
* 🪙 fix: Thread Endpoint Token Config to Agent Billing, Cost, and Context Limits Custom-endpoint agents resolve an endpointTokenConfig during agent init but it never reached the AgentClient, so spending, emitted cost, and runtime max-token resolution all fell back to default rates for those agents. - Surface options.endpointTokenConfig on the returned InitializedAgent. - Pass it to the AgentClient (this.options.endpointTokenConfig) so the spending path bills at configured rates. - Thread it through usageCost to computeUsageCostUSD so emitted per-event cost matches billing. - getModelMaxTokens/getModelMaxOutputTokens fall back to the built-in map for models absent from a partial override (matches buildTokenConfigMap); consolidates the duplicated fallback in pricing.ts. * 🪙 fix: Preserve Granular Breakdown Across Branch Switches The granular context breakdown lives only in the live on_context_usage snapshot — a single per-conversation slot, anchored to the latest response and overwritten by each generation. Switching to a branch generated earlier this session lost its tool/skill/system rows and fell back to coarse totals. Retain each generation's finalized snapshot in a per-conversation map keyed by its branch-unique response id (snapshotsByAnchorFamily). When the live snapshot is off the viewed branch, walk the branch tail for its deepest stored anchor and render that breakdown. Bounded by generation count and cleared on conversation switch; the live/just-generated path is unchanged. * 🪙 fix: Harden Resume Seeding and Subagent Usage Emission - useResumableSSE: skip the trailing-output live seed when the resume carries a context snapshot; the snapshot's messageTokens already counts produced output, so seeding it again inflated usage until the next reset. 
- AgentClient subagent emitter: await GenerationJobManager.emitChunk like every other caller (it persists before publishing), so a floating promise can't race job cleanup and a Redis/publish failure is caught by the emitter's try/catch instead of surfacing as an unhandled rejection. * 🧪 test: Playwright Coverage for Context Breakdown Granularity Add a test-only data-testid distinguishing the granular snapshot breakdown (context-breakdown) from the coarse message-history estimate (context-estimate), then assert granularity in the mock e2e harness: - renders the granular breakdown from the live on_context_usage snapshot (guards that the snapshot event actually reaches the popover, not just the usage totals). - preserves the granular breakdown after switching branches — regenerate to overwrite the single live snapshot, switch back, and confirm the rows survive via the per-anchor snapshot history map. Branch regenerate/sibling selectors mirror the existing chat.spec branch test. All three usage specs pass against the mock pipeline. * 🪙 fix: Correct Resume Live-Seed, Fallback Re-index, and Subagent Emit Flush Codex round on the prior commit: - countTrailingOutputChars now counts only output at the very END of the aggregated content (0 when the model paused at a tool call), and the resume path always seeds it. The earlier skip-trailing-tool-parts behavior plus the skip-seed-when-snapshot gate together over- or under-counted in-flight output on resume; one rule fixes both — pre-invoke snapshot budget is never double-counted, and genuine in-flight output is no longer dropped. - useTokenUsage re-indexes from the messages cache on tail change while submitting. The cache subscriber is muted during streaming, so without a context snapshot (non-agent streams) sumBranch missed the created tail and dropped history + prompt until finalize. Bounded — tailId only shifts on created/finalize/branch-switch. 
- AgentClient tracks subagent usage emit promises and flushes them in chatCompletion's finally. The sink fires the emitter without awaiting, and resume reads the usage emitChunk persists (HSET), so cleanup must not race it or resumed clients miss billed subagent usage.	2026-06-13 19:38:28 -04:00
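The idempotent fold by (runId, seq) described in the resume-dedup fix of #13670 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not LibreChat's code: `UsageEvent`, `UsageTotals`, and `UsageFolder` are hypothetical names, and the token fields are assumed for the example.

```typescript
// Hypothetical shapes; the real LibreChat types are not shown in this log.
interface UsageEvent {
  runId: string;
  seq: number; // per-run sequence number stamped by the emitter
  inputTokens: number;
  outputTokens: number;
}

interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
}

class UsageFolder {
  // Keys of events already folded, so a resume replay cannot count twice.
  private seen = new Set<string>();
  private totals: UsageTotals = { inputTokens: 0, outputTokens: 0 };

  /** Returns true when the event was newly folded, false for a replayed duplicate. */
  fold(event: UsageEvent): boolean {
    const key = `${event.runId}:${event.seq}`;
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    this.totals.inputTokens += event.inputTokens;
    this.totals.outputTokens += event.outputTokens;
    return true;
  }

  /** Copy of the running totals, safe to hand to a renderer. */
  snapshot(): UsageTotals {
    return { ...this.totals };
  }
}
```

Keying on the sequence number rather than on the payload means two distinct calls with identical token counts are both counted, while a replay of the same call is ignored. The boolean return is one way a caller could act only on newly folded events.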
Danny Avila	3c3837bb7d	🧾 fix: Bill Subagent Child-Run Model Usage in Parent Transactions (#13683 ) * 🧾 fix: Bill Subagent Child-Run Model Usage in Parent Transactions * 🩹 fix: Type Subagent Usage Sink Structurally Until SDK Release * 🔧 chore: Update @librechat/agents dependency to version 3.2.35 in package-lock.json and related package.json files	2026-06-13 14:55:48 -04:00
Danny Avila	139d61c437	🚐 fix: Reuse Request-Scoped MCP Connections per Run (#13673 ) * fix(mcp): reuse request-scoped connections per run * test(mcp): update connection factory defaults	2026-06-11 01:17:14 -04:00
Danny Avila	65bca95023	🎒 fix: Carry Request-Scoped MCP Tools into PTC Execution (#13669 ) * fix(mcp): preserve request-scoped tools for PTC execution * fix(mcp): preserve run-scoped tools on initialized agents	2026-06-10 23:48:04 -04:00
Dustin Healy	5867f1a065	🛡️ feat: Configurable Message PII Filter (#13602 ) * 🛡️ feat: Reject chat messages matching configured credential patterns Adds an opt-in `messagePiiFilter` middleware mounted on the agent chat route ahead of `moderateText`. When the configured patterns match the user's input the request is refused with 400, so the credential never reaches OpenAI moderation, the model, or MongoDB. Three starter patterns ship by default and operators can subset them or add their own regex via `customPatterns` in librechat.yaml. * 🧪 test: Memoize compiled patterns + add middleware spec Memoize the compiled pattern array via a WeakMap keyed by the messagePiiFilter config object so repeat requests against the same config skip the per-request RegExp construction. Cache entries are released automatically when the config object itself rotates. Adds packages/api/src/middleware/messagePiiFilter.spec.ts covering the default-starter rejections, the starterPatterns subset and empty-array semantics, customPatterns matching layered on top of and in place of the starters, the no-config and empty-text pass-through paths, and a memoization regression check. * 🛡️ fix: Skip invalid customPattern regexes instead of crashing the request Admin DB overrides for `messagePiiFilter.customPatterns` reach `req.config` via `mergeConfigOverrides`, which deep-merges raw override values without re-running `configSchema`. A typo'd regex like `(` would slip past the YAML-load validation and throw inside `new RegExp(...)` during `compile()`, returning 500 for every chat request until the operator rolled the override back. Wrapped the per-pattern compile in a try/catch that logs the invalid pattern id + reason and skips it, so other valid patterns (starters and other custom entries) keep filtering. Added a regression test alongside the existing spec. 
* 🛡️ feat: Extend PII filter to OpenAI-compatible and Responses agent APIs The chat-route middleware operates on `req.body.text`, but the remote agent API endpoints (`/api/agents/v1/chat/completions`, `/api/agents/v1/responses`) accept the same prompt content as a `messages` array or an `input` field. A caller using their API key could send a credential-shaped value through either route and bypass the configured PII filter even though they share the same agent and model backbone the middleware is meant to guard. Factored out `findPiiMatchInMessages`, a tolerant walker that handles both `content: string` and `content: ContentPart[]` user-message shapes against the same compiled, cached pattern list. Wired it into the OpenAI-compat controller after agent lookup and into the Responses controller right after `convertToInternalMessages`. Each returns the endpoint's native 400 error shape (`sendErrorResponse` / `sendResponsesErrorResponse`) with the `message_pii_filter_block` code when a user message matches. * 🩹 test: Add findPiiMatchInMessages to OpenAI + Responses controller mocks The OpenAI-compat and Responses controller specs mock `@librechat/api` with a hand-listed object. The new `findPiiMatchInMessages` export wired into both controllers in `3ea35af9a` was missing from those mocks, so the production lookup returned undefined and the controllers threw at request time under jest. Added the missing entries (default mock: returns null so the handlers fall through to the existing happy paths). All 278 agents-controller tests pass locally. * 🧹 refactor: Namespace messagePiiFilter under messageFilter.pii + fix import order Renames the yaml field `messagePiiFilter` to `messageFilter.pii`, the module to `messageFilterPii`, the factory to `createMessageFilterPii`, the type to `MessageFilterPiiConfig`, and the error code to `message_filter_pii_block`. The wrapper `messageFilter` namespace gives future safety filters (e.g. 
`messageFilter.toxicity`) a place to plug in without restructuring the config later. The `findPiiMatchInMessages` helper kept its name because it already describes what it does at the value level. Also fixes import order Danny flagged on the OpenAI-compatible and Responses controllers: `findPiiMatchInMessages` was appended at the bottom of two `require('@librechat/api')` destructures rather than placed in the length-sorted slot the house style expects. * 🧹 chore: Length-sort the general require destructure in responses.js Reorders the general sub-group inside the `require('@librechat/api')` destructure shortest to longest so the whole block conforms to the length-sort rule the file's `// Responses API` sub-group already follows. Pure reorder, no other changes. * 🧹 chore: Length-sort the defaultConfig block in AppService Reorders the `defaultConfig` keys in `packages/data-schemas/src/app/service.ts` shortest-line to longest-line, with the explicit-value entries (`mcpConfig`, `fileStrategies`, `cloudfront`) trailing the shorthand ones. Pure reorder, no behavior change.	2026-06-10 09:03:05 -04:00
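The two PII-filter behaviors described in #13602, memoizing compiled patterns in a WeakMap keyed by the config object and skipping an invalid custom regex instead of throwing, can be sketched as follows. This is an illustration only: the `PiiPattern` and `PiiFilterConfig` shapes, the `onInvalid` callback, and the function names are assumptions, and the starter patterns are omitted because the log does not list them.

```typescript
// Hypothetical config shapes; only `customPatterns` is modeled here.
interface PiiPattern {
  id: string;
  regex: string;
}

interface PiiFilterConfig {
  customPatterns?: PiiPattern[];
}

interface CompiledPattern {
  id: string;
  re: RegExp;
}

// Entries are released automatically once a config object is no longer referenced.
const compiledCache = new WeakMap<PiiFilterConfig, CompiledPattern[]>();

function compilePatterns(
  config: PiiFilterConfig,
  onInvalid?: (id: string, reason: string) => void,
): CompiledPattern[] {
  const cached = compiledCache.get(config);
  if (cached) {
    return cached; // repeat requests against the same config skip RegExp construction
  }
  const compiled: CompiledPattern[] = [];
  for (const pattern of config.customPatterns ?? []) {
    try {
      compiled.push({ id: pattern.id, re: new RegExp(pattern.regex) });
    } catch (err) {
      // A typo'd regex is reported and skipped so the other patterns keep filtering.
      onInvalid?.(pattern.id, err instanceof Error ? err.message : String(err));
    }
  }
  compiledCache.set(config, compiled);
  return compiled;
}

/** Returns the id of the first matching pattern, or null when the text is clean. */
function findPiiMatch(config: PiiFilterConfig, text: string): string | null {
  for (const { id, re } of compilePatterns(config)) {
    if (re.test(text)) {
      return id;
    }
  }
  return null;
}
```

A caller in a middleware could treat a non-null result as grounds for a 400 response. The regexes are compiled without the `g` flag, so `test` carries no state between requests.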
Danny Avila	793cbd49f0	✂️ fix: Deduplicate Skill Bodies Across Fresh Primes and History (#13610 ) When a skill is primed fresh this turn (manual $-popover or always-apply) AND also appears in history as a `skill` tool_call, its SKILL.md body was injected twice — once by injectSkillPrimes and once reconstructed by formatAgentMessages. - add `collectFreshSkillPrimeNames` helper (packages/api) — union of manual + always-apply prime names - client.js: pass the set as `skipSkillBodyNames` to formatAgentMessages for both the initialMessages and memoryMessages paths so the body reconstructs once. Names not primed this turn still reconstruct (sticky manual re-prime). Requires `@librechat/agents` with `skipSkillBodyNames` support; the published dist silently ignores the unknown option until upgraded.	2026-06-09 17:16:24 -04:00
Danny Avila	8fc2314208	🧠 fix: Bound Memory Agent Input (#13606 )	2026-06-09 14:38:21 -04:00
Danny Avila	fd4728232c	🧵 fix: Reject Preliminary Parent Follow-Ups (#13619 ) * fix: Reject preliminary parent follow-ups * chore: Sort frontend imports * fix: Narrow preliminary parent detection * fix: Preserve refused submit state * fix: Propagate refused submit result	2026-06-09 12:06:51 -04:00
Danny Avila	2a956f143d	🪞 fix: Preserve Model Spec Icons Across Stream Resume and Abort (#13603 )	2026-06-08 17:14:21 -04:00
Danny Avila	cb1d536874	📻 fix: Replay MCP OAuth Prompts for Coalesced Connections (#13565 ) * fix: Replay MCP OAuth URL for Joined Connections * chore: Sort MCP OAuth Imports * test: Restore MCP OAuth Registry Spies * fix: Replay pending MCP OAuth prompts * fix: Replay MCP OAuth on Stream Resume * fix: Preserve MCP OAuth Replay Context * chore: Format MCP OAuth Replay Context * test: Expect MCP OAuth Replay Expiry * fix: Render pending MCP OAuth prompts * chore: Clean MCP OAuth Replay Type Narrowing * fix: Stabilize new MCP OAuth chats * fix: Re-emit cached MCP OAuth prompts * fix: Replay pending OAuth for selected MCP tools * fix: Avoid stalling pending MCP OAuth replay * test: Clean MCP OAuth review findings * test: Restore MCP OAuth registry spy * fix: Resolve OAuth Typecheck Regressions * fix: Harden MCP OAuth replay edge cases * test: Cover MCP OAuth joined prompt expiry * test: Mark joined OAuth replay fixture * test: Use OAuth fixture for joined replay expiry * fix: Anchor resumed MCP OAuth prompts * fix: Seed resumable turn metadata before MCP init * test: Format resume metadata regression * fix: Prioritize resumable stream routes * fix: Preserve MCP OAuth resume message tree * test: Fix MCP OAuth Resume Test Types * fix: Replay MCP OAuth Regenerate Prompts * fix: Skip OAuth-only Abort Persistence * fix: Stabilize OAuth Resume Replay * fix: Target Non-Tail Regenerate Responses * fix: Scope Regenerate Step Updates * fix: Clean Up OAuth Abort State * fix: Preserve Regenerate Branch Siblings * fix: Preserve OAuth Resume Branch State * fix: Preserve OAuth Branch Resume State * chore: Sort OAuth Resume Imports * fix: Address OAuth Resume Review Findings * test: Fix Abort Fixture Typing	2026-06-07 10:45:54 -04:00
Danny Avila	1612dba353	🏷️ fix: Preserve Generated Conversation Title on Stop (#13568 ) Immediate title generation discarded an already-generated title when the user stopped the turn, both in the backend (skipped saveConvo) and the frontend (rolled back the streamed title), leaving the chat as "Untitled" in the interim and "New Chat" after refresh. Split the title abort into two signals: `signal` still cancels an in-flight title model call on Stop, while a new `discardSignal` discards an already-generated title only when the stream is superseded by a newer run or the turn fails. A plain user Stop now persists and keeps the title. The frontend no longer rolls back a real, already-applied title on an aborted final event.	2026-06-07 08:59:05 -04:00
Danny Avila	21607ba3d7	📎 fix: Preserve Provider Document Uploads (#13550 ) * fix: Preserve provider document uploads * test: Add provider upload e2e coverage	2026-06-06 10:03:32 -04:00
Danny Avila	5118a566df	🧭 fix: Restore Empty Skill Allowlist Catalog (#13526 )	2026-06-05 12:30:48 -04:00
Danny Avila	2c8d54e18c	🗂️ feat: Add Deployment Skill Directory (#13523 ) * feat: Add deployment skill directory * chore: Address deployment skill review feedback * fix: Include deployment skill file metadata * test: Add deployment skills e2e smoke test	2026-06-05 10:24:28 -04:00
Danny Avila	dc42748813	🧷 fix: Bind Agent File Context to Current Turn (#13506 ) * fix: Bind agent file context to current turn * fix: Avoid duplicating agent file context * fix: Export agent file context prepender * test: Use exported file context prepender * fix: Keep file context transient for memory and counts	2026-06-04 09:03:43 -04:00
Danny Avila	1da789bac0	🗂️ feat: Add Agent File Authoring Tools (#13435 ) * feat: add agent file authoring tools * style: format file authoring changes * style: satisfy file authoring prettier * test: fix file authoring initialization expectations * fix: complete skill file authoring flow * fix: pass skill authoring state on edit * test: mock missing bundled skill file * fix: harden agent file authoring gates * fix: preserve file authoring runtime context * test: fix authoring context mock typing * fix: preserve subagent skill primes * test: avoid array at in handler spec * refactor: deepen skill authoring runtime wiring * fix: address codex authoring review findings * test: fix authoring collision fixture type * test: add skill file authoring mock e2e * fix: Improve skill file authoring recovery * fix: Show file authoring args while running * fix: Clarify skill rename authoring errors * fix: Keep code-only file authoring schemas sandbox scoped * fix: Address skill authoring review findings * fix: Gate skill authoring on write access	2026-06-03 23:58:12 -04:00
Danny Avila	baa23a8e24	🗂️ feat: Add Private Chat Projects (#13467 ) * feat: Add private chat projects * fix: Format project files * fix: Address project review findings * fix: Resolve project review follow-ups * fix: Handle project stats and cache edge cases * style: align projects UI with sidebar patterns * fix: resolve projects UI lint issues * style: Align project menus and composer * fix: Avoid project placeholder shadowing * fix: Handle project search and stale ids * fix: Polish project sidebar behavior * fix: Preserve new chat stream after creation * fix: Stabilize project sidebar sections * fix: Smooth project sidebar organization * fix: stabilize project chat entry * fix: keep project workspace outside chat context * fix: show default model on project workspace * fix: fallback project workspace model label * fix: preserve project scope during draft hydration * fix: include route project in new chat submission * fix: persist project id in agent chat saves * fix: refine project sidebar and creation UX * fix: export 
chat project method types * fix: polish project landing context * fix: refine project navigation affordances * feat: rework projects UX — coexisting sidebar sections + URL-driven scope Sidebar - Replace the chronological/by-project mode toggle with coexisting Projects + Chats sections (both always visible) - Remove ProjectConversations (927 lines), the org-mode Header, and types - Add ProjectsSection: collapsible project rows that unfurl chats inline (full-size rows), with per-project new chat and an open/rename/delete menu - Lift the marketplace/favorites shortcuts above the Projects section Chat scope - Derive a new chat's project strictly from the URL ?projectId, so the global New Chat no longer stays stuck in a project after a project chat Surfaces - Chat landing: subtle, clickable project chip instead of the floating badge - Project workspace: modest header, composer-style entry, chats list - All-projects grid: Claude-style cards with pluralized chat counts * chore: prune unused i18n keys; fix project chat-count pluralization * fix: project new-chat keeps model spec; sidebar header + row polish - newConversation: ignore a chatProjectId-only template when deciding to apply the default model spec, so starting a chat in a project no longer strips the conversation `spec` - useSelectMention: the Model Selector and @ command now retain the active project across endpoint/spec/preset switches; other new-chat paths still clear it - Chats header now matches the Projects header (inline chevron + a new-chat icon button) and starts a non-project chat - Project rows: use the new-chat icon for the per-project add button, render at text-sm to match the chat list, and align the row actions + hover color with conversation rows * fix: read project scope from router params; align sidebar header icons - useSelectMention now reads the active project from React Router's search params instead of window.location, which can drift out of sync because new-chat params are written to the 
URL via raw history.pushState; the Model Selector and @ command now reliably keep the project on switch - Move the Chats section header out of the virtualized list so it renders in the same context as the Projects header and isn't shifted by the list scrollbar - Inset header action icons (pr-2) so Projects/Chats header icons line up with the project-row and conversation-row trailing actions - Extract getRouteChatProjectId into utils for the submit path * fix: preserve chatProjectId through the new-chat template reduction The param-endpoint guard in newConversation reduced a new chat's template to { endpoint } only, dropping the chatProjectId injected by the Model Selector / @ switch — so switching models cleared the project scope. Keep chatProjectId in the reduced template. * style: align chat-history panel top padding; improve projects page contrast - Add pt-2 to the chat-history panel so its top spacing matches the other side panels (agent builder, skills, files, etc.) - Projects grid + workspace now use the darkest surface for the page (surface-primary) with cards, inputs, and the composer one step lighter (surface-secondary) and tertiary on hover, so cards read as elevated rather than darker than the background * feat: interactive project landing chip + gallery icon for all-projects - All-projects sidebar button uses the gallery-vertical-end icon - The project landing chip is now interactive: click it to switch projects via a searchable combobox (ControlCombobox), or the trailing × to drop the project scope. 
Both update the draft conversation and the ?projectId search param in place, so the typed message and selected model are preserved * test: fix Conversations unit test for refactored sidebar; add projects e2e - Update Conversations.test.tsx mocks for the inline Chats header (useNewConvo, useQueryClient, conversation atom, NewChatIcon, TooltipAnchor), drop the removed chatsHeaderControls prop, and remove the mock for the deleted ../Header module — fixes the failing frontend Jest job - Add e2e/specs/mock/projects.spec.ts covering project creation, the project-scoped new-chat landing + interactive chip (switch/remove), and listing projects on /projects - Give the landing chip combobox a stable selectId for reliable targeting * fix: refresh project stats after project-chat activity; stabilize e2e - useEventHandlers: when a project chat is created/updated, invalidate the live [projects] query (gated on chatProjectId) instead of the now-unused projectConversations key, so the sidebar + all-projects stats refresh after a streamed reply (addresses a Codex finding) - projects e2e: assert the reliable project-landing behavior (chip, scoped composer, accepted send) rather than the /c/:id transition, which the mock LLM harness doesn't complete * test: verify a project chat saves and is filed under its project (e2e) - Switch to a mock endpoint before sending so the message streams without a real API key (the default model failed with "No key found", so no chat was saved and the page never left /c/new); this also asserts the project chip survives the model switch - Restore the reply + /c/:id transition assertions and add a check that the chat is listed under the expanded project in the sidebar - Add data-testid="project-chats-<id>" to the inline project chat list * fix: address Codex review findings (project scope edge cases) - useSelectMention: fall back to the conversation's chatProjectId when the URL has no projectId, so switching model/spec inside an existing project chat 
(/c/:id) keeps the project assignment - Conversations: include chatProjectId in the MemoizedConvo comparator so a sidebar row's project menu doesn't stay stale after a reassignment - useDeleteProjectMutation: clear the active conversation's chatProjectId when its project is deleted (mirrors the assignment mutation); drop the now-dead projectConversations invalidation - useQueryParams: carry the project into the new conversation when applying URL settings, so /c/new?projectId=...&<settings> stays scoped * fix: project stats pagination + archived-chat edge cases (data-schemas) - listChatProjects: include the null lastConversationAt bucket in the desc cursor so empty projects paginate (a $lt:<date> predicate excluded nulls, hiding chat-less projects from "Load more") - saveConvo: recompute project stats instead of the incremental fast path when the saved conversation is itself archived/temporary/expired, so a project's lastConversationAt/Id no longer points at a hidden chat * test: cover chat-less project pagination across the dated→null boundary * fix: validate project ownership in bulkSaveConvos Bulk paths (import/duplicate/fork) persisted whatever chatProjectId the payload carried; an id that does not belong to the user created an orphan assignment hidden from both the project and the unassigned sidebar. Validate ownership like saveConvo and strip un-owned project ids before persisting, refreshing stats only for owned projects. 
* fix(projects): preserve chatProjectId on continuation, basename-safe delete redirect, project-detail invalidation * fix(projects): navigate project workspace chats via useNavigateToConvo to avoid stale conversation state * fix(projects): include projectConversations cache when resolving deleted chat's project for detail invalidation * fix(projects): refresh both projects when a save or bulk write moves a chat between them * style(projects): use Folders icon for the sidebar Projects header * fix(projects): require id on ProjectUser so ProjectRequest extends Express Request cleanly * style(projects): taller project chip with hover-revealed remove button, upward combobox; sort en translations * style(projects): show endpoint/agent icon for project workspace chat rows	2026-06-03 15:29:18 -04:00
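The ownership check described for `bulkSaveConvos` above (strip project ids the user does not own before persisting) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `ConvoLike` and `stripUnownedProjectIds` are hypothetical names, and the set of owned project ids is assumed to have been loaded already.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface ConvoLike {
  conversationId: string;
  chatProjectId?: string | null;
}

// Returns conversations with any chatProjectId the user does not own removed.
// Conversations without a project, or with an owned project, pass through unchanged.
function stripUnownedProjectIds(
  convos: ConvoLike[],
  ownedProjectIds: Set<string>,
): ConvoLike[] {
  return convos.map((convo) => {
    const projectId = convo.chatProjectId;
    if (projectId == null || ownedProjectIds.has(projectId)) {
      return convo;
    }
    // Copy before deleting so the caller's payload is not mutated.
    const copy: ConvoLike = { ...convo };
    delete copy.chatProjectId;
    return copy;
  });
}
```

Stripping instead of rejecting the whole batch keeps import/duplicate/fork working while avoiding the orphan assignment the commit message describes.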
Danny Avila	2ef7bdfbc2	⚡ feat: Immediate Conversation Title Generation (#13395) * ⚡ feat: Immediate Conversation Title Generation Generate conversation titles as soon as the request is made (in parallel with the response, from the user's first message) as the new default, fixing the #13318 race where a transient /gen_title 404 left new chats stuck on "New Chat". - Add per-endpoint `titleTiming` ('immediate' | 'final') to baseEndpointSchema; `endpoints.all` acts as the global default, unset = immediate. Resolve via a new `resolveTitleTiming` helper (`all` takes precedence). - Fire title generation in parallel with `sendMessage`; `titleConvo` waits (bounded, abortable) for the agent run and titles from the user input only. Persist after the conversation row exists; defer `disposeClient` until the title settles. - Expose `titleGenerationTiming` via startup config; `useTitleGeneration` fetches eagerly in immediate mode with a bounded 404 retry and never treats a transient 404 as final. 
Skip title queueing for temporary conversations. - Supersedes #13329 while incorporating its bounded 404-retry. * 🩹 fix: Address Copilot review findings on title timing - Guard against an undefined conversationId in addTitle (skip + warn) so the gen_title cache key can't collide as `userId-undefined` and saveConvo is never called without a conversationId. - Gate the title `useQueries` on `enabled` so no /gen_title request fires while unauthenticated (e.g. after logout) even if the module queue holds IDs. - Drop the stale `conversationId` param from the titleConvo JSDoc. - Add a regression test for the undefined-conversationId guard. * 🧵 fix: Harden immediate-title edge cases from codex review - Cancel in-flight immediate title generation when the request aborts: thread job.abortController.signal through addTitle so pressing Stop on a new chat neither consumes the title model nor surfaces a title for a cancelled turn. - Preserve a locally-applied title when the final SSE event's conversation carries no title yet (built before the title was saved), so long immediate-mode responses no longer revert the chat to "New Chat" until reload. - Guarantee one full post-completion gen_title fetch cycle before giving up, so a `final`-mode title (generated only after the stream ends) is still fetched under a global `immediate` default instead of being stranded. - Add regression tests for the abort propagation and the undefined-conversationId guard. * 🔁 fix: Correct title abort, post-completion refetch, and replacement ordering Follow-up to codex review of the immediate-title fixes: - Use a dedicated title AbortController instead of `job.abortController`. The latter is also aborted by `completeJob` on successful completion, which cancelled any title slower than a short response. The title is now cancelled only on a real user Stop or when the stream is replaced; a completed-then- aborted title is discarded (no save, cache cleared) rather than persisted. 
- Reset (not remove) the post-completion title query: `resetQueries` refetches the mounted observer with a fresh retry budget, whereas `removeQueries` left it stuck in its error state, so the promised post-completion cycle never ran. - Run the job-replacement check before resolving `convoReady`, and on a replaced stream cancel/discard the stale title so a discarded prompt can't persist a title. * 🧷 fix: Tighten title abort ordering and endpoint-level timing resolution Follow-up to codex review: - Abort the title controller before resolving `convoReady` on a stopped turn, so the title task can't resume and persist before the later abort. - Cancel the title and unblock its waits on ANY send failure (not just user aborts): a preflight/quota failure before the run exists otherwise hangs `_waitForRun`, deferring client disposal until the 45s title timeout. - Resolve `titleTiming` for custom endpoints via `getCustomEndpointConfig` (their config lives under `endpoints.custom[]`, not `endpoints[endpoint]`). - Derive the startup `titleGenerationTiming` via `resolveTitleTiming` for the agents endpoint so an endpoint-level `final` (without `endpoints.all`) is honored client-side instead of defaulting to immediate and burning eager gen_title polls. * 🪢 fix: Per-agent title timing and safer abort/replacement handling Follow-up to codex review: - Resolve `titleTiming` from the agent's actual endpoint after initialization, so a per-endpoint `final` override on a custom/provider endpoint backing an (ephemeral) agent is honored instead of always using the `agents` endpoint's value. - Don't preserve a locally-fetched title on a stopped (unfinished) turn: the server cancels and discards that title, so keeping it client-side would diverge from server state and leave the stopped chat titled until reload. 
- On abort/replacement, only delete the cached title if it still holds THIS task's value — a replacement stream shares the `userId-conversationId` key and may have already cached its own valid title that must not be removed. * 🪞 fix: Mirror AgentClient title-config resolution for titleTiming Per maintainer guidance, keep titleTiming resolution identical to how `AgentClient#titleConvo` already resolves the endpoint config — `endpoints.all` is the intended global override and the agent's actual provider endpoint is used: - Resolve via `endpoints.all ?? endpoints[endpoint] ?? getProviderConfig(endpoint) .customEndpointConfig` (was using `getCustomEndpointConfig` directly). Going through `getProviderConfig` picks up its case-insensitive fallback for normalized provider names (e.g. `openrouter` → `OpenRouter`), so a custom endpoint's `titleTiming` is honored like its other title settings. - Add `titleTiming` to the Azure endpoint schema `.pick()` so `endpoints.azureOpenAI.titleTiming` is no longer silently stripped by Zod. Note: per-endpoint title settings being skipped when `endpoints.all` is present is the existing, intended global-override behavior — not changed here. 
* 🧪 test: Cover useTitleGeneration effect logic (integration) Adds a deterministic white-box integration test that drives the real hook's React effects with a controllable react-query surface, locking down the stateful decisions that previously had no coverage: - immediate mode fetches a queued conversation while its stream is still active - final mode gates until the stream completes, then becomes eligible - success applies the fetched title to the conversation caches - a 404 while active defers (removeQueries) instead of giving up - a 404 after completion forces a fresh fetch via resetQueries (post-completion remount) * feat: Stream immediate title events * style: Format title SSE handler * test: Preserve data-provider exports in OAuth mock * test: Isolate OAuth route API mock * test: Keep OAuth callback factory capture * fix: Replay streamed title events on resume * fix: Honor agents title timing precedence * style: Format title timing fixes	2026-06-02 16:40:57 -04:00
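The `titleTiming` precedence described in this commit (`endpoints.all` wins when present, otherwise the endpoint's own config, with unset meaning immediate) can be sketched as below. This is an assumption-laden illustration: the real `resolveTitleTiming` signature is not shown in the log, the custom-endpoint lookup via `getProviderConfig` is omitted, and `EndpointConfigLike`, `EndpointsConfigLike`, and `resolveTitleTimingSketch` are hypothetical names.

```typescript
type TitleTiming = 'immediate' | 'final';

// Hypothetical, trimmed-down config shapes for illustration only.
interface EndpointConfigLike {
  titleTiming?: TitleTiming;
}
type EndpointsConfigLike = Record<string, EndpointConfigLike | undefined>;

// Mirrors the described order: `all` takes precedence over the endpoint entry,
// and a missing value falls back to 'immediate'.
function resolveTitleTimingSketch(
  endpoints: EndpointsConfigLike | undefined,
  endpoint: string,
): TitleTiming {
  const config = endpoints?.['all'] ?? endpoints?.[endpoint];
  return config?.titleTiming ?? 'immediate';
}
```

Note that, as the commit message states, an `endpoints.all` entry causes per-endpoint title settings to be skipped even when `all` itself leaves `titleTiming` unset; the sketch reproduces that by selecting the config object first and reading the field second.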
Danny Avila	479e9d59b7	🧠 refactor: Memoize MCP Permission Checks Per Request (#13419)	2026-05-30 18:32:06 -04:00
Danny Avila	100871c3ec	🛂 fix: Enforce MCP Permissions for Agent Tools (#13174) * fix: Enforce MCP Permissions for Agent Tools * fix: Measure MCP Image Limit by Decoded Size * fix: gate cached MCP tools and tighten remote image URL detection Addresses Codex review findings on the MCP permissions PR: - filterAuthorizedTools previously fast-accepted any tool present in the global tool cache before reaching the MCP-use permission gate. App-level MCP tools (keyed `name_mcp_server` by MCPServerInspector and merged into the cache via mergeAppTools) therefore bypassed the canUseMCP check, letting a user without MCP_SERVERS.USE persist/bind them. Route all MCP-delimited tools through the permission + server-access gate regardless of cache presence. - assertImageDataWithinLimit / image formatter used startsWith("http") to skip the size cap, which also matched base64 payloads that happen to begin with those chars. Require http:// or https:// via a shared isRemoteImageUrl helper so oversized inline base64 can no longer bypass MCP_IMAGE_DATA_MAX_BYTES. Adds regression tests for both paths. * fix: address Codex round-2 findings on MCP permissions PR - parsers.ts: parseAsString dropped the image payload for unrecognized providers, returning only `Image result: <mimeType>`. Pre-PR these items survived via JSON.stringify(item). Keep the size guard but fall through to JSON.stringify so the data/URL is preserved. - MCP.js: the runtime MCP-use check only read `configurable.user`, so paths that propagate `user_id` only (e.g. the OpenAI-compatible API in agents/openai/service.ts) rejected every MCP tool call for an authenticated user. 
Add resolveMCPPermissionUser: use the safe user directly when it already carries a role (no extra DB call), otherwise fall back to loading the role by user_id. Update fail-closed tests to the resolved behavior. - v1.js: the update path only re-filtered newly added MCP tools, so a user who lost MCP_SERVERS.USE kept existing MCP bindings on edit while create/duplicate/revert stripped them. Strip all MCP tools on update when the permission is revoked; keep the narrower new-tool gating (and disconnect/registry preservation) when it is intact. Updates and adds regression tests for all three paths. * fix: populate safe user at producer instead of resolving in runtime MCP check Corrects the Finding B approach from the previous commit. Rather than loading the user by id inside the runtime MCP permission check, populate `configurable.user` (and createRun's `user`) with the full safe user at the producer, matching the in-repo agent controllers (responses.js / openai.js) which already pass `createSafeUser(req.user)`. - service.ts: derive `safeUser` via createSafeUser(req.user) and pass it to both createRun and processStream's configurable, so the role-bearing identity reaches the runtime `userCanUseMCPServers(configurable.user)` check. Falls back to a bare id when the host app attached no user, which correctly leaves MCP gated (fail closed). - MCP.js: revert the resolveMCPPermissionUser DB-load fallback; the runtime check again reads configurable.user directly and fails closed when absent (defense in depth). - MCP.spec.js: revert to the matching runtime test expectations. * test: cover safe-user propagation in createAgentChatCompletion Adds a focused spec for the OpenAI-compatible chat completion service (the producer fixed for Codex Finding B). 
Injects mocked deps and asserts that createRun and processStream's configurable.user carry the role from req.user (with sensitive fields stripped by createSafeUser), and that an unauthenticated request falls back to a bare { id: 'api-user' } so the runtime MCP check fails closed. * fix: address Codex round-3 findings + TS6133 - MCP.js (P1): the assistants required-action path invokes tool._call( toolInput) with no LangChain config, so the runtime check saw no configurable.user and rejected authorized users. createToolInstance now captures the creation-time user (req.user via createMCPTool) and _call falls back to it for both the permission check and userId. Still fails closed when neither config nor captured user carries a role. - v1.js (P2): the update-path isMCPTool used a bare mcp_delimiter substring check, misclassifying action tools whose operationId contains "_mcp_" (e.g. sync_mcp_state_action_...) as MCP and dropping them on a permission-revoked edit. Delegate to the canonical isActionTool so only real MCP tools are gated. Regression test added. - service.ts: drop the now-unused IUser import (TS6133); derive reqUser's type from createSafeUser's own parameter instead. * fix: resolve TS7022 self-reference in service.spec mock res The mock response object referenced `res` inside its own `status`/`json` initializers without a type annotation, so tsc inferred `res` as `any` (TS7022). Annotate the object and assign the self-referencing chainable methods after declaration. * fix: correct round-4 findings (isActionTool import, captured user, partial-update) - v1.js: import isActionTool from librechat-data-provider (its real export; @librechat/api does not export it, so the prior import was undefined and threw TypeError). Exclude action tools from MCP classification in both the main filterAuthorizedTools loop and the update path, so action tools whose operationId contains _mcp_ (e.g. sync_mcp_state_action_...) are preserved regardless of MCP permission. 
- v1.js: evaluate the effective tool set (updateData.tools ?? existingAgent.tools) so a tools-less PATCH by a user who lost MCP_SERVERS.USE still strips stale MCP bindings, matching create/duplicate/revert. - MCP.js: createToolInstance now receives the construction-time user and _call falls back to it (permissionUser) when configurable.user is absent, fixing the assistants required-action path that invokes _call without a config and resolving the capturedUser no-undef/ReferenceError. - Tests: action-tool preservation (authorized + denied), tools-less revocation PATCH, updated revocation test to expect all MCP tools stripped. Affected specs pass locally: MCP 49/49, filterAuthorizedTools 49/49. * fix: guard isActionTool against non-string tools; correct actionDelimiter import Two test regressions from the prior commit: - The main filterAuthorizedTools loop called isActionTool(tool) directly, but isActionTool does toolName.indexOf(...) and throws on null/undefined. Compute isActionToolName = typeof tool === 'string' && isActionTool(tool) once and reuse it, restoring graceful null/undefined handling. - The action-tool test referenced Constants.actionDelimiter (undefined); actionDelimiter is a standalone librechat-data-provider export. Import and use it directly. filterAuthorizedTools 36/36 and MCP 40/40 pass locally. * fix: address MCP permission review follow-ups * fix: preserve shared agent MCP tools	2026-05-30 16:19:49 -04:00
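The image-limit fix in this commit (require a real `http://` or `https://` scheme before skipping the cap, and measure inline base64 by its decoded size) can be illustrated with the sketch below. It is not the repository's implementation: the function names carry a `Sketch` suffix because their real signatures are not shown in the log, and the byte limit is passed in as a parameter rather than read from `MCP_IMAGE_DATA_MAX_BYTES`.

```typescript
// True only for strings that start with an explicit http:// or https:// scheme.
// A bare startsWith('http') would also match base64 such as 'httpAAAA'.
function isRemoteImageUrlSketch(data: string): boolean {
  return /^https?:\/\//i.test(data);
}

// Decoded byte length of a base64 string, computed without decoding it.
function decodedBase64Size(data: string): number {
  const clean = data.replace(/\s/g, '');
  const padding = clean.endsWith('==') ? 2 : clean.endsWith('=') ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// Throws when inline image data exceeds maxBytes; remote URLs are not size-checked here.
function assertImageDataWithinLimitSketch(data: string, maxBytes: number): void {
  if (isRemoteImageUrlSketch(data)) {
    return;
  }
  const size = decodedBase64Size(data);
  if (size > maxBytes) {
    throw new Error(`Image data is ${size} bytes, over the ${maxBytes}-byte limit`);
  }
}
```

Measuring the decoded size matters because base64 inflates data by roughly a third, so a cap applied to the encoded string length would reject images that are actually within the limit.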
Danny Avila	6d9c01927d	🧠 refactor: Replay DeepSeek `reasoning_content` via OpenRouter (#13368) * 🧠 fix: Replay DeepSeek `reasoning_content` via OpenRouter DeepSeek's thinking-mode API rejects multi-turn tool-calling requests unless `reasoning_content` from each tool-bearing assistant message is replayed verbatim, returning HTTP 400 "The `reasoning_content` in the thinking mode must be passed back to the API." The agents SDK already handles this for direct `Providers.DEEPSEEK`, but DeepSeek models routed via OpenRouter use `Providers.OPENROUTER`: `formatAgentMessages` skipped the reasoning-preservation branch, and `ChatOpenRouter` left `includeReasoningContent` unset, so the field silently dropped on every subsequent turn. 
Add `isDeepSeekReasoningProvider(provider, model)` and use it in two places: (1) `getOpenAILLMConfig` flips `includeReasoningContent: true` when OpenRouter is dispatching a `deepseek/` model so the LangChain client emits the field on assistant turns that have non-empty `additional_kwargs.reasoning_content`, and (2) `AgentClient` spoofs the provider hint to `Providers.DEEPSEEK` when calling `formatAgentMessages`, triggering the SDK's existing `preserveReasoningContent` path that re-attaches the field to reconstructed tool-bearing AIMessages. The downstream `_convertMessagesToOpenAIParams` is already gated on non-empty `reasoning_content`, so the flag is a no-op outside thinking mode. Resolves #13366. fix: Harden DeepSeek detection against OpenRouter routing edges Address three Codex review findings on #13368: 1. Strip OpenRouter's `~` latest-routing prefix before applying the DeepSeek model regex. `~deepseek-chat` and `~deepseek/r1` were previously left unmatched because the regex's start/`/` boundary only saw the `~`. Mirror the SDK's `normalizeOpenRouterModel()` here and in `getOpenAILLMConfig`. 2. Add a custom-endpoint fallback: when the model id carries the unambiguous `deepseek/...` OpenRouter namespace, accept it regardless of the resolved provider. Covers the case where a user configures OpenRouter under a non-standard endpoint name and `initializeAgent` normalizes the unknown provider to `openai`, stranding the spoof. Bare `deepseek-` ids still require an explicit DeepSeek/OpenRouter provider so unrelated endpoints labelling a model `deepseek-r1` don't trigger. 3. Inspect every agent in `this.agentConfigs` when deciding whether to spoof the format provider. Multi-agent handoff runs feed all agents' messages through one `formatAgentMessages` call, so a DeepSeek handoff under a non-DeepSeek primary previously lost its persisted reasoning_content too. 
Also addresses Copilot's review note: only pass the options object to `formatAgentMessages` when the DeepSeek spoof is actually needed, preserving the pre-fix behavior for everyone else. fix: Extend DeepSeek reasoning_content fix to OpenAI-compat agent paths Address two more Codex P2 findings on #13368: 1. `getOpenAILLMConfig` no longer gates `includeReasoningContent` on `useOpenRouter`. Any DeepSeek-style model id (with `~` latest-routing prefix stripped) is sufficient. This re-aligns the LLM gate with `AgentClient`'s formatter spoof, which already treats a `deepseek/` id as authoritative, so a custom-named OpenRouter endpoint or a DeepSeek-compatible proxy gets the field both attached to history AND serialized to the wire. Direct `ChatDeepSeek` ignores the flag (its own conversion path hardcodes `includeReasoningContent: true`), so this is a harmless no-op there. 2. Thread the same `Providers.DEEPSEEK` formatter hint through `api/server/controllers/agents/openai.js` and `responses.js` (the OpenAI-/Responses-compatible serving paths). Without it those paths restored `additional_kwargs.reasoning_content` only in `AgentClient` while the LLM config flipped `includeReasoningContent` on for them too, so DeepSeek tool turns served from those endpoints would still ship requests with the flag set but no field present, hitting the same second-turn 400. The `needsDeepSeekFormatHint` helper in `openai.js` mirrors `AgentClient`'s per-agent check. fix: Tighten DeepSeek detection and cover handoff sub-agents Address four more Codex P2 findings on #13368: - Tighten the DeepSeek model regex to `/^deepseek(?:[-/]|$)/i` (anchored to start). Rejects cloned/distilled slugs like `mistral/deepseek-distilled-foo` and `community/deepseek-r1` that previously matched via the `(?:^|/)` alternation, which could attach the DeepSeek-only `reasoning_content` field on proxies that don't accept it. 
- Anchoring also collapses the namespace-only fallback into the same pattern, so bare `deepseek-chat` / `deepseek-reasoner` on a custom OpenAI-compatible DeepSeek proxy are now recognized — fixing the asymmetry where `getOpenAILLMConfig` would flip `includeReasoningContent` for those bare ids but `AgentClient` wouldn't pass the formatter hint. - Extend `needsDeepSeekFormatHint` in `openai.js` (and the inline check in `responses.js`) to walk `handoffAgentConfigs` too. In multi-agent runs where the primary isn't DeepSeek but a connected handoff agent is, the SDK's `formatAgentMessages` previously dropped the handoff's persisted reasoning_content before the next tool turn, preserving the 400 the PR was meant to prevent. - Mirror the regex change in `getOpenAILLMConfig`. Out of scope: the OpenAI-compatible serving paths still don't preserve incoming `reasoning_content`/`reasoning` fields in `convertMessages`, nor does the Responses API persist reasoning in `saveResponseOutput`. Those are deeper persistence/conversion fixes worth a separate PR. * test: Allow includeReasoningContent for Azure-serverless DeepSeek CI surfaced a backward-compat expectation that snapshotted the pre-fix behavior. Azure-serverless DeepSeek deployments (e.g. `DeepSeek-R1`) forward to the same DeepSeek thinking-mode tool-call contract, so the LLM gate now correctly flips `includeReasoningContent: true` for them too. The downstream gate on a non-empty `additional_kwargs.reasoning_content` keeps this a no-op outside thinking mode. * chore: Trim noisy comments Per CLAUDE.md ("self-documenting code; no inline comments narrating what code does"), strip the multi-paragraph rationale that crept into the DeepSeek reasoning_content fix. The commit history and PR description carry the why; the code says the what. Keeps one single-line JSDoc on `isDeepSeekReasoningProvider` (linking to the DeepSeek docs) and a `(#13366)` tag on each opt-in site so future readers can find the context. 
* revert: Drop non-functional DeepSeek hint from OpenAI-compat serving paths Codex's later review passes correctly flagged that threading the DeepSeek formatter hint through openai.js (`/v1/chat/completions`) and responses.js (`/v1/responses`) doesn't actually fix the second-turn 400 in those paths. Empirical check against the real SDK confirmed the gap is deeper and pre-existing: formatAgentMessages(payload, ..., { provider: DEEPSEEK }) where payload is the `convertMessages`/`convertInputToMessages` output shape (string content + TOP-LEVEL `tool_calls`) produces NO tool-bearing AIMessage at all — `formatAssistantMessage` only reconstructs tool calls from `tool_call`-typed content parts, never a top-level `tool_calls` field. So those serving paths don't reconstruct tool-call history (let alone reasoning) regardless of the hint. The Responses persistence layer likewise stores only output text, not tool calls or reasoning. Making those paths work requires reworking the wire->internal message conversion (and Responses persistence) to emit content-part arrays — a broad, pre-existing concern beyond this issue and risky to land here. Rather than ship a hint that looks like a fix but is inert, revert the serving-path changes and scope this PR to the validated AgentClient chat path (the actual surface in #13366). Reverts the openai.js/responses.js threading and their spec mocks to main. Keeps the AgentClient fix, `isDeepSeekReasoningProvider`, the `getOpenAILLMConfig` flag, and the type.	2026-05-28 22:10:49 -07:00
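The model-id matching described in this commit (strip OpenRouter's `~` latest-routing prefix, then apply a regex anchored to the start of the id) can be sketched as follows. This covers only the model-id half: the real `isDeepSeekReasoningProvider(provider, model)` also takes a provider, and its exact logic is not shown in the log, so `isDeepSeekModelIdSketch` is a hypothetical helper.

```typescript
// Anchored to the start so namespaced clones such as 'community/deepseek-r1' do not match.
const DEEPSEEK_MODEL_PATTERN = /^deepseek(?:[-/]|$)/i;

// Strips a leading '~' (OpenRouter latest-routing prefix) before testing the id.
function isDeepSeekModelIdSketch(model: string): boolean {
  const normalized = model.replace(/^~/, '');
  return DEEPSEEK_MODEL_PATTERN.test(normalized);
}

// Example ids taken from the commit message above.
const examples = [
  'deepseek/deepseek-r1',
  '~deepseek-chat',
  'DeepSeek-R1',
  'mistral/deepseek-distilled-foo',
  'community/deepseek-r1',
];
const matches = examples.filter(isDeepSeekModelIdSketch);
```

With the anchored pattern, `matches` keeps the first three ids and drops the two namespaced slugs, which is the behavior the tightening was meant to achieve.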
Danny Avila	94c73123ee	📋 fix: Cap Default Limit on Agent List Queries (#13382 ) * 🛡️ fix: Cap Default Limit on Agent List Queries (#13363) `GET /api/agents` accepted unbounded requests: when the client omitted `limit`, the value flowed straight into `getListAgentsByAccess`, which set `isPaginated = false` and issued an uncapped MongoDB query. Combined with the unindexed `findPubliclyAccessibleResources` AclEntry scan run on every request, this produced 10-19s response times and stalled the connection pool on instances with 100+ agents. - Default `limit` to 100 in the route handler so client requests without `?limit=` paginate by default. - Default `limit` to 100 in `getListAgentsByAccess` itself as defense-in-depth. The function already caps numeric limits at 100, so there is no client-facing change. - Pass `limit: null` explicitly in the actions route, which legitimately needs the full editable-agent set, to preserve its existing behavior. - Add regression tests covering the default cap and the explicit unbounded opt-out. * 🛡️ fix: Avoid agent-list regression for users with 100+ agents Codex review pointed out that capping `getListAgentsByAccess` at 100 silently truncated agents past the first page for the four consumers (`useAgentsMap`, `AgentSelect`, `ModelSelectorContext`, `useMentions`) that read `res.data` without following `has_more`/`after`. - Raise the function's hard cap from 100 to 1000 to match `MAX_AVATAR_REFRESH_AGENTS`, the realistic upper bound the avatar-refresh path already assumes. (Side effect: the avatar refresh call site was silently being capped at 100 by the old normalize step.) - In `useListAgentsQuery`, merge `limit: 1000` into params so the four consumers above get the user's full accessible set in a single round-trip instead of needing cursor pagination. - Route handler default stays at 100 as defense-in-depth for any other caller that omits `limit`. 
- Add a regression test asserting an explicit `limit` above 100 now returns the full set instead of being clipped. * 🪢 fix: Keep agent-list cache key stable for mutations Codex P2 review noted that folding `limit: 1000` into the cache key broke `allAgentViewAndEditQueryKeys` in `Agents/mutations.ts`, which references `[QueryKeys.agents, { requiredPermission }]` directly across eight mutation handlers. After my prior change the cached entry lived under `[QueryKeys.agents, { limit: 1000, requiredPermission }]`, so create/update/delete/avatar/action mutations stopped updating the list the four consumer hooks render — and with `refetchOnMount` and focus/ reconnect refetches disabled, the UI would stay stale until something else triggered a fetch. Split the merged limit out of the cache key: the request to `dataService.listAgents` still uses `requestParams` (with the default limit applied), but the React Query cache key uses the caller's `params` as-is. The mutation cache updates land again, and the request still returns the user's full accessible set in one round-trip. * 🛡️ fix: Index AclEntry and paginate agent list internally (#13363) Completes the perf fix for #13363 properly — resolves both the unbounded ACL scans Copilot flagged and Codex's tension between "show all agents" and "don't bypass the server cap". Backend: - Add a compound index on `{ principalType, resourceType, permBits, resourceId }` to the AclEntry schema. This is the index missing for `findPublicResourceIds` and the public branch of the `$or` in `findAccessibleResources`, both of which previously fell back to a collection scan on every `GET /api/agents`. Adds an `explain`-based regression test asserting the public query no longer COLLSCANs. Client: - Rewrite `useListAgentsQuery` to follow the server's cursor pagination internally and concatenate every page into a single flat `AgentListResponse`. 
Consumers (`useAgentsMap`, `AgentSelect`, `ModelSelectorContext`, `useMentions`) get the user's complete accessible-agent set without any of them needing to learn about cursors, and each individual request uses the server's default page size (so the route's 100-default defense-in-depth fires for real). Cache key shape is unchanged, so the eight mutation handlers in `Agents/mutations.ts` keep matching `allAgentViewAndEditQueryKeys` and update the cached list as before. - Drop the `FULL_AGENT_LIST_LIMIT = 1000` injection added in the previous commit — no longer needed once pagination handles the full set, and removing it stops bypassing the route default. * 🧹 fix: CI fallout from C-done-properly refactor - Collapse multi-line `fetchAllAgentPages` signature in queries.ts so prettier stops complaining. - In the new public-principal index test, grant one ACL entry before calling `.explain()` so the collection exists (otherwise mongo returns `nonExistentNamespace` and there is no winning plan to inspect). - Cast the `.explain('queryPlanner')` result to a typed shape — the mongoose return type doesn't expose `queryPlanner` directly and was failing the TypeScript check. * 🧪 fix: Test the AclEntry public-principal index via hint, not planner choice The previous test asserted the query planner did not pick COLLSCAN for the public-principal lookup. That assertion fails on small collections (under the planner's collection-size heuristic) — the index exists and is usable, but with a single document in the test the planner correctly chooses COLLSCAN as the cheaper plan. Reshape the assertion: 1. Confirm the new compound index is actually declared by inspecting `collection.indexes()` after `syncIndexes()`. 2. Force the planner to that index via `.hint()` and assert the winning plan is `IXSCAN` — proves the index is real and serves this query shape, without depending on collection-size heuristics. 
* 🧹 chore: Slim down verbose comments The JSDoc and inline comments added across the perf fix had drifted into multi-paragraph rationale better suited to the PR description than the source. Collapse to single-line JSDoc that just describes what each piece does; drop the inline comment in `actions.js` entirely — the call is self-evident.	2026-05-28 21:37:53 -07:00
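The internal pagination described in the commit above (follow the server's cursor, concatenate every page into one flat list) could look roughly like the sketch below. It is illustrative only, not the code from `queries.ts`: the `Page` shape, the `after` / `has_more` field names, the `collectAllPages` name, and the `maxPages` safety cap are all assumptions.

```typescript
// Hypothetical page shape; the real AgentListResponse fields may differ.
interface Page<T> {
  data: T[];
  has_more: boolean;
  after?: string; // cursor to pass when requesting the next page
}

// Injected fetcher so the sketch stays independent of any HTTP client.
type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

/**
 * Follows cursors until the server reports no further pages and returns
 * every item as a single flat array. `maxPages` is an assumed guard
 * against a server that never stops returning cursors.
 */
async function collectAllPages<T>(fetchPage: FetchPage<T>, maxPages = 50): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    // Each request uses the server's default page size; no limit is sent.
    const page = await fetchPage(cursor);
    all.push(...page.data);
    if (!page.has_more || page.after == null) {
      break;
    }
    cursor = page.after;
  }
  return all;
}
```

Because callers only ever see the flat array, the cache key can stay keyed on the caller's own params, which is the property the mutation handlers rely on.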
Danny Avila	749eb06e67	🧭 fix: Reduce MCP Registry ACL Lookups (#13195)	2026-05-19 17:16:37 -04:00
Danny Avila	68eac104ad	🗂️ fix: Scope Handoff Agent Context Docs (#13167) * fix: Scope agent context docs to handoff agents * fix: Deduplicate scoped request context * refactor: Extract agent attachment helpers	2026-05-18 15:36:22 -04:00
Danny Avila	62da4c28ed	🛡️ fix: Sanitize Agent List Skill Scope (#13122)	2026-05-14 09:27:41 -04:00
Danny Avila	030dc98a1d	☁️ fix: Enable Azure Agent Provider Uploads (#13045)	2026-05-10 17:47:05 -04:00
Danny Avila	d90567204e	🛟 fix: persist Vertex Gemini 3 `thoughtSignatures` across DB round-trips (#13026) When a tool round-trip is interrupted between the tool result and the model's text reply (user aborted, network drop, pod restart, ...) and LibreChat persists the partial assistant message, the next conversation turn reconstructs an `AIMessage` from `formatAgentMessages` that has `tool_calls` populated but no `additional_kwargs.signatures`. Vertex Gemini 3 rejects the resumed request with 400 because the most recent historical functionCall has no `thought_signature`.
## Storage shape
Capture as `Record<tool_call_id, signature>` rather than a flat array. This addresses the codex P1 review:
> When an assistant turn contains multiple sequential tool-call batches,
> this restoration path writes all persisted thoughtSignatures onto only
> the last tool-bearing AIMessage. Vertex/Gemini validates signatures
> for each step in the current tool-calling turn, so earlier
> functionCall steps reconstructed without their signature can still
> fail with 400.

A single agent run can fire multiple `chat_model_end` events when the loop cycles the LLM with intervening tool results — each cycle owns a distinct `tool_call_id`. Per-id storage maps each signature back onto the right reconstructed `AIMessage`, not just the last one.
## Mapping
`additional_kwargs.signatures` is a flat array indexed by response part (text + functionCall interleaved). `tool_calls` is just the function calls in their original order. Non-empty signatures correspond 1:1 with tool_calls in order — see `partsToSignatures` in `@langchain/google-common`. Single-pass walk maps `signatures[i]` (when non-empty) onto the i-th `tool_call.id`.
## Pipeline
| Stage | File | Change |
|---|---|---|
| Capture | callbacks.js | `ModelEndHandler` accepts `Record<string,string>` map; walks signatures + tool_calls in tandem to record per-id. Gated on the map being provided — non-Vertex flows are no-op (and also no-op even when provided, since they don't emit signatures). |
| Plumbing | initialize.js | Allocate `collectedThoughtSignatures = {}`, share with handler + client. Always allocated; the JSDoc explicitly documents that it stays empty for non-Vertex providers. |
| Surface | client.js | `sendCompletion` returns `metadata.thoughtSignatures` when the map has entries; falls through unchanged when empty. |
| Persist | (existing BaseClient.handleRespCompletion) | Writes `metadata` from `sendCompletion` onto `responseMessage.metadata`. Mongoose `Mixed` — no migration. |
| Restore | formatMessages.js | Track every tool-bearing AIMessage produced from a TMessage. For each, build a position-aligned `additional_kwargs.signatures` array (empty placeholders for tool_calls without a stored sig). Agents' `fixThoughtSignatures` dispatches non-empty entries to functionCall parts in order. |
## Live verification
- Single-step: real Vertex `gemini-3.1-flash-lite-preview` resume-after-tool case. With fix ✅ / without ❌ 400.
- Multi-step (codex case): real two-step agent loop (list /tmp → echo done). Each step's signature attaches to its own reconstructed AIMessage. With fix ✅ / without ❌ 400.
- Cross-provider: Anthropic Claude haiku-4.5 + OpenAI gpt-5-mini accept the persisted/restored shape unchanged.
## Tests `modelEndHandler.spec.js` (new) — 6 tests: - maps non-empty signatures onto tool_call_ids in order - accumulates per-id across multiple `model_end` events (multi-step) - no-op when `collectedThoughtSignatures` is null - no-op when `signatures` field missing (non-Vertex) - no-op when `tool_calls` missing - preserves existing `collectedUsage` array contract `formatAgentMessages.spec.js` — 6 new tests: - restores onto the AIMessage that owns the tool_call - per-step attachment for multi-step turns (codex review case) - preserves tool_call ordering when signatures are partial - no-op when metadata.thoughtSignatures absent - no-op when assistant has no tool_calls - no-op when stored ids don't match any current tool_call 37 passing across 3 suites; 15 existing formatAgentMessages tests unchanged. ## Compatibility - Backward-compatible — restore gated on `metadata.thoughtSignatures` being a populated object; capture gated on the map being provided. - No schema migration — uses `Message.metadata: Mixed` already in place. - Cross-provider safe — non-Vertex providers tolerate the field (verified live against Anthropic + OpenAI converters). - Pairs with [agents#159](https://github.com/danny-avila/agents/pull/159) for full coverage on histories that mix plain-text and toolcall AIMessages.	2026-05-08 18:51:34 -04:00
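As a rough illustration of the capture and restore steps in the commit above, the sketch below pairs each non-empty signature with the next tool call in order, then rebuilds a position-aligned array for a reconstructed message. The function names `collectSignatures` and `restoreSignatures` and the `ToolCall` shape are hypothetical stand-ins; the real logic lives in `ModelEndHandler` and `formatAgentMessages` and may differ in detail.

```typescript
// Hypothetical minimal tool-call shape; only `id` matters for the mapping.
interface ToolCall {
  id?: string;
  name: string;
}

/**
 * Capture: walk the flat `signatures` array (text and functionCall parts
 * interleaved) and assign each non-empty entry to the next tool call.
 * A null map means the provider flow is not collecting signatures.
 */
function collectSignatures(
  signatures: string[] | undefined,
  toolCalls: ToolCall[] | undefined,
  collected: Record<string, string> | null,
): void {
  if (collected == null || !signatures?.length || !toolCalls?.length) {
    return;
  }
  let toolIndex = 0;
  for (const signature of signatures) {
    if (!signature) {
      continue; // empty placeholder for a non-functionCall part
    }
    const toolCall = toolCalls[toolIndex++];
    if (toolCall == null) {
      break; // more signatures than tool calls; nothing left to map onto
    }
    if (toolCall.id) {
      collected[toolCall.id] = signature;
    }
  }
}

/**
 * Restore: build an array aligned with this message's tool calls, using an
 * empty string where no signature was stored for that id.
 */
function restoreSignatures(toolCalls: ToolCall[], stored: Record<string, string>): string[] {
  return toolCalls.map((toolCall) => (toolCall.id != null ? stored[toolCall.id] ?? '' : ''));
}
```

Keying by `tool_call_id` is what lets a multi-step turn put each signature back on the message that owns the call, rather than on the last tool-bearing message only.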
Danny Avila	93c4ef4ba8	🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960) * 🧱 refactor: typed CodeEnvRef + kind discriminator + tenant-aware sandbox cache Final cutover for the LibreChat ↔ codeapi sandbox file identity. Replaces the magic string `${session_id}/${file_id}?entity_id=...` with a typed, discriminated `CodeEnvRef`. Pre-release lockstep deploy with codeapi #1455 and agents #148; no legacy aliases retained.
## Final shape
```ts
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };
```
`kind` drives codeapi's sessionKey: `<tenant>:<kind>:<id>[:v:<version>]` for shared kinds, `<tenant>:user:<userId>` for user-private (auth context provides `userId`). `version` is statically required for `kind: 'skill'` and forbidden otherwise via discriminated union — constraint holds at compile time on every consumer, not just codeapi's runtime validator. `id` is sessionKey-meaningful for `'skill'` / `'agent'`; informational only for `'user'` (codeapi resolves user identity from auth context).
## What changed
- `packages/data-provider/src/codeEnvRef.ts` — discriminated union + `CODE_ENV_KINDS` const-tuple keeps the runtime list and TS union locked together.
- Schemas: `metadata.codeEnvRef` and `SkillFile.codeEnvRef` enums tightened to `['skill', 'agent', 'user']`.
- `primeSkillFiles` writes `kind: 'skill'`, `id: skill._id`, `version: skill.version`. Cache-hit path reads `codeEnvRef` directly. Bumping `skill.version` on edit naturally invalidates the prior cache entry under the new sessionKey.
- `processCodeOutput` writes `kind: 'user'`, `id: req.user.id`. Output bucket is always user-scoped, regardless of which skill the execution invoked. New regression test pins the asymmetry.
- `primeFiles` reupload preserves `kind`/`id`/`version?` from the existing ref so a skill-cache-miss reupload doesn't silently demote to user bucket. - `crud.js` upload functions (`uploadCodeEnvFile` / `batchUploadCodeEnvFiles`) thread `kind`/`id`/`version?` to the multipart form (codeapi #1455 option α). Without these on the wire, codeapi falls back to user bucketing and skill-cache invalidation never fires. Client-side validation mirrors codeapi's validator. - `Files/process.js` — chat attachments use `kind: 'user'`; agent setup files use `kind: 'agent'`. - Drops `entity_id` everywhere (struct, schema sub-docs, write paths, upload form fields). Drops `'system'` from the kind enum (no emitter ever existed). ## Test plan - [x] `cd packages/data-provider && npx jest src/codeEnvRef.spec` — 4 / 4 - [x] `cd packages/data-schemas && npx jest` — 1447 / 1447 - [x] `cd packages/api && npx jest src/agents` — 81 / 81 in skillFiles + handlers + resources - [x] `cd api && npx jest server/services/Files server/controllers/agents` — 436 / 436 - [x] `cd api && npx jest server/services/Files/Code` — 98 / 98 (incl. new "outputs are user-scoped regardless of which skill the execution invoked" regression and "reupload forwards kind/id/version from existing ref") - [x] `npx tsc --noEmit -p packages/data-{provider,schemas}/tsconfig.json && npx tsc --noEmit -p packages/api/tsconfig.json` — clean (only pre-existing unrelated dev errors in storage/balance, untouched here) ## Deploy notes - 24h cache-miss burst on first deploy. Inputs (skill caches re-prime under new sessionKey shape) and outputs (any pre-Phase C skill-output cached files become unreadable). Bounded by codeapi's 24h TTL. - Lockstep with codeapi #1455 and agents #148. Either repo can land first since no aliases to drain, but the three deploys must overlap within the same maintenance window. - `@librechat/agents` bump to `3.1.79-dev.0` required after agents #148 lands and is published. 
## What this enables Auth bridge work (JWT-based tenant/user identity between LC and codeapi) — codeapi now derives sessionKey purely from `req.codeApiAuthContext.{ tenantId, userId}`, so the next chapter is replacing the header-asserted user identity with a verified-claim path. * 🩹 fix: persist execute_code uploads under codeEnvRef metadata key Codex review P1 (chatgpt-codex-connector). `Files/process.js` was storing the upload result under `metadata.fileIdentifier` even though: - `uploadCodeEnvFile` now returns `{ storage_session_id, file_id }`, not the legacy magic string. - The post-cutover schema (`File.metadata.codeEnvRef`) only declares `codeEnvRef` — mongoose strict mode silently strips unknown keys. - All readers (`primeFiles`, `getCodeFilesByIds`, `categorizeFileForToolResources`, controller filtering) check `metadata.codeEnvRef`. Net effect of the bug: chat-attached and agent-setup execute_code files would lose their sandbox reference on save, and primeFiles would skip them on subsequent code-execution turns — the file blob would still be available locally but never re-mounted in the sandbox. Fix: construct the full `CodeEnvRef` (`{ kind, id, storage_session_id, file_id }`) at the write site and persist under `metadata.codeEnvRef`. `BaseClient`'s "is this a code-env file" presence check accepts the new shape alongside the legacy `fileIdentifier` for back-compat with any pre-cutover records still in the database. Mirrors the same change in `processAttachments.spec.ts` (which re-implements the BaseClient logic for testability). New regression tests in `process.spec.js` cover three cases: - chat attachments (`messageAttachment=true`) → `kind: 'user'` - agent setup (`messageAttachment=false`) → `kind: 'agent'` - legacy `fileIdentifier` key is NOT persisted (would be schema-stripped) * 🩹 fix: read storage_session_id on primed file refs (Codex P1) Codex review (chatgpt-codex-connector). 
After Phase B's per-file `session_id` → `storage_session_id` rename, `primeFiles` emits the new field — but `seedCodeFilesIntoSessions` was still reading `files[0].session_id` for the representative session and `f.session_id` for the dedupe key. In runs with only primed attachments (no skill seed), `representativeSessionId` was `undefined`, the function returned the unchanged map, and `seedCodeFilesIntoSessions` silently dropped the entire batch. The first `execute_code` call then started without `_injected_files` and the agent couldn't see prior-turn artifacts. Fix: - `codeFilesSession.ts`: read `f.storage_session_id` for both the dedupe key and the representative session id. JSDoc updated to match the new field name. - `callbacks.js`: the two output-file persistence paths read `file.session_id` to pass to `processCodeOutput` — switch to `file.storage_session_id`. The original comment explicitly says this should be the STORAGE session, which is exactly the field Phase B renamed. - `codeFilesSession.spec.ts`: fixture builder uses `storage_session_id` and `kind: 'user'` to match the post-cutover `CodeEnvFile` shape. Lockstep coordination: this matches the post-bump shape of `@librechat/agents` 3.1.79+. CI tsc errors against the currently-pinned 3.1.78 are expected and resolve when the dep bumps in this PR before merge. * 📦 chore: Bump `@librechat/agents` to version 3.1.80-dev.0 in package-lock and package.json files * 🪪 fix: thread kind/id/version through codeapi /download URLs (Phase C α) Symmetric fix for the upload-side wire change in 537725a. Codeapi's `sessionAuth` middleware now requires `kind`/`id`/`version?` on every download/freshness URL — without them it 400s with "kind must be one of: skill, agent, user" before serving the file. Three sites construct codeapi-side URLs that go through `sessionAuth`: - `processCodeOutput` (`Files/Code/process.js`): `/download/<sess>/<id>` for freshly-generated sandbox outputs. 
Always `kind: 'user'` + `id: req.user.id` — code-output files are always user-private, regardless of which skill the run invoked. - `getSessionInfo` (`Files/Code/process.js`): `/sessions/<sess>/objects/<id>` for the 23h freshness check. Pulls kind/id/version straight off the `codeEnvRef` already in scope — skill files stay skill-bucketed, user files stay user-bucketed. - `/code/download/:session_id/:fileId` LC route (`routes/files/files.js`): proxies to codeapi for manual downloads. Code-output files only on this route, so `kind: 'user'` + `id: req.user.id`. The `getCodeOutputDownloadStream` helper in `crud.js` now takes an `identity` param, validated by a `buildCodeEnvDownloadQuery` helper that mirrors `appendCodeEnvFileIdentity`'s shape rules: kind required from the closed `{skill, agent, user}` set, version required for 'skill' and forbidden otherwise. Bad callers fail fast on the client instead of round-tripping a 400. Also cleans up two log-noise sources reported alongside the 400: - `logAxiosError` in `packages/api/src/utils/axios.ts` was dumping `error.response.data` raw. With `responseType: 'arraybuffer'` that's a `Buffer` (~4 chars per byte after JSON-serialization); with `responseType: 'stream'` it's a `Readable` whose internal state serializes the entire ring buffer + socket. New `renderResponseData` decodes small buffers as UTF-8 (truncated past 2KB) and stubs streams as `'[stream]'`. Diagnostics stay useful, log lines stop being megabytes. - `/code/download` route's catch was bare `logger.error('...', error)`, bypassing the redactor. Switched to `logAxiosError` so it benefits from the same buffer/stream handling. Tests updated to match the new contract: - crud.spec: `getCodeOutputDownloadStream` fixtures pass `userIdentity`; new cases cover skill identity (with version), bad kind rejection, skill-without-version rejection. - process.spec: `getSessionInfo` test passes a full `codeEnvRef` object. 
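The shape rules quoted above for the download query (closed kind set, `version` required for `'skill'` and forbidden otherwise, fail fast on the client) can be sketched as below. This is a minimal sketch under assumptions, not the actual `buildCodeEnvDownloadQuery`: the function name `buildDownloadQuery`, the error messages, and the exact query-string layout are hypothetical, and `CODE_ENV_KINDS` is redeclared locally as a stand-in for the tuple exported by the data-provider package.

```typescript
// Local stand-in for the closed kind set named in the commit message.
const CODE_ENV_KINDS = ['skill', 'agent', 'user'] as const;
type CodeEnvKind = (typeof CODE_ENV_KINDS)[number];

// Hypothetical identity shape for illustration.
interface CodeEnvIdentity {
  kind: CodeEnvKind;
  id: string;
  version?: number;
}

/**
 * Validates the identity and returns a URL-encoded query string.
 * Throws before any request is made so bad callers do not round-trip a 400.
 */
function buildDownloadQuery(identity: CodeEnvIdentity): string {
  const { kind, id, version } = identity;
  if (!CODE_ENV_KINDS.includes(kind)) {
    throw new Error(`kind must be one of: ${CODE_ENV_KINDS.join(', ')}`);
  }
  if (!id) {
    throw new Error('id is required');
  }
  if (kind === 'skill' && version == null) {
    throw new Error("version is required for kind 'skill'");
  }
  if (kind !== 'skill' && version != null) {
    throw new Error(`version is not allowed for kind '${kind}'`);
  }
  let query = `?kind=${encodeURIComponent(kind)}&id=${encodeURIComponent(id)}`;
  if (version != null) {
    query += `&version=${encodeURIComponent(String(version))}`;
  }
  return query;
}
```

The runtime checks repeat what the discriminated union already enforces at compile time, which matters for callers in plain JavaScript or for values that arrive from persisted records.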
* ♻️ refactor: extract codeEnv identity helpers into packages/api Per the project convention that new backend code lives in TypeScript under `packages/api`, moves `appendCodeEnvFileIdentity` and `buildCodeEnvDownloadQuery` from `api/server/services/Files/Code/crud.js` into a new `packages/api/src/files/code/identity.ts` module. Both helpers are pure validators that mirror codeapi's `parseUploadSessionKeyInput` server-side rules (closed kind set, `version` required for `'skill'` and forbidden otherwise) — they deserve TS support and a dedicated spec rather than living as JSDoc-typed helpers in the legacy `/api` workspace. The new module: - Exports a `CodeEnvIdentity` interface using the `librechat-data-provider` `CodeEnvKind` discriminated union. - Adds 13 unit tests in `identity.spec.ts` covering the validation matrix (skill+version, agent, user, and every rejection path) plus URL encoding for the download query. - Re-exported from `packages/api/src/files/code/index.ts` alongside `classify`, `extract`, and `form`. Consumer updates: - `api/server/services/Files/Code/crud.js`: drops the local helpers and imports them from `@librechat/api`. Net -64 lines. - `api/server/services/Files/Code/process.js`: same. - Test mocks for `@librechat/api` in three spec files now stub the helpers' validation behavior locally rather than pulling them through `requireActual` (which would drag in provider-config init-time side effects). The package's `exports` field only surfaces the root barrel, so leaf imports aren't reachable from legacy `/api` test setup. No runtime behavior change. Identity validation rules and emitted form/query shapes are byte-for-byte identical pre/post. * 🪪 fix: emit resource_id alongside id on _injected_files (skill 403 fix) Companion to codeapi #1455 fix and agents 3.1.80-dev.1 — the wire shape for shared-kind files now requires `resource_id` distinct from the storage `id`. 
Without this LC change, codeapi's sessionKey re-derivation on every shared-kind /exec rejects with 403 session_key_mismatch:
cached: legacy:skill:69dcf561...:v:59 (signed at upload, skill _id)
derived: legacy:skill:ysPwEURuPk-...:v:59 (storage nanoid)
Emit sites updated: - `primeInvokedSkills` cache-hit path: `resource_id: ref.id` (the persisted skill `_id` from `codeEnvRef.id`); `id: ref.file_id` unchanged (storage uuid). - `primeInvokedSkills` fresh-upload path: `resource_id: skill._id.toString()` on every primed file (the `allPrimedFiles` builder type now carries the field). - `processCodeOutput`'s `pushFile` (Code/process.js): `resource_id: ref.id` — for `kind: 'user'` this is informational (codeapi derives sessionKey from auth context) but emitted for shape uniformity with shared kinds. Bumps `@librechat/agents` to `^3.1.80-dev.1` (the version that ships the matching `CodeEnvFile.resource_id` field). ## Test plan - [x] `cd packages/api && npx jest src/agents` — 67 / 67 pass (skillFiles fixtures updated to assert `resource_id` on the emitted CodeSessionContext.files). - [x] `cd api && npx jest server/services/Files server/controllers/agents` — 445 / 445 pass (process.spec fixtures updated for the reupload + cache-hit emission). - [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean. * fix(skill-tool-call): carry resource_id through primeSkillFiles → artifact Codeapi was 400ing every /exec following a `handle_skill` tool call with `resource_id is invalid` (`type: 'undefined'`). Both code paths in `primeSkillFiles` (cache-hit + fresh-upload) returned files without `resource_id`/`kind`/`version`, and the artifact in `handlers.ts` forwarded the stripped shape into `tc.codeSessionContext.files` → `_injected_files`. `primeInvokedSkills` (the NL-detected loader) had already been fixed end-to-end; this commit aligns the tool-invoked path with the same contract: `resource_id` = `skill._id.toString()`, `kind: 'skill'`, `version` = the skill's monotonic counter.
Tests added to `skillFiles.spec.ts` lock the contract on `primeSkillFiles` directly so future refactors can't silently drop the resource identity again. * fix(handlers.spec): align session_id → storage_session_id rename + kind discriminator Pre-existing TS errors against the post-rename `CodeEnvFile` shape: the test file still used `session_id` on per-file objects (renamed to `storage_session_id` in agents Phase B/C) and was missing the `kind` discriminator the discriminated union requires. Both inputs and the matching `expect.toEqual(...)` mirrors updated together so the runtime equality check still holds. Lines 723-732 stay as-is — they sit behind `as unknown as ToolCallRequest` and TS already skipped them. * chore: fix `@librechat/agents`, correct version to 3.1.80-dev.0 in package.json files * chore: bump `@librechat/agents` to version 3.1.80-dev.1 in package.json and package-lock.json * chore: bump `@librechat/agents` to version 3.1.80-dev.2 * feat(observability): trace file priming chain from primeCodeFiles to _injected_files Diagnosing the user-upload "files=[] on first /exec" bug requires seeing where in the LC chain a file ref disappears. Prior to this patch the chain (primeCodeFiles → primedCodeFiles → initialSessions → CodeSessionContext → _injected_files) was opaque end-to-end:
- primeCodeFiles silently dropped files without `metadata.codeEnvRef`
- reuploadFile catches all errors and continues with no signal
- the handlers.ts handoff to codeapi never logged what it was sending

After this patch, a single grep on `[primeCodeFiles]` plus `[code-env:inject]` shows the full per-file path:
[primeCodeFiles] in: file_ids=N resourceFiles=M
[primeCodeFiles] file=<id> path=skip reason=no-codeenvref filename=...
[primeCodeFiles] file=<id> path=cache-hit-by-session storage_session_id=...
[primeCodeFiles] file=<id> path=reupload reason=no-uploadtime ...
[primeCodeFiles] file=<id> path=reupload reason=stale ...
[primeCodeFiles] file=<id> path=reupload-success oldSession=... newSession=... newFileId=...
[primeCodeFiles] file=<id> path=reupload-failed session=...
[primeCodeFiles] file=<id> path=fresh-active storage_session_id=...
[primeCodeFiles] out: returned=N skippedNoRef=M reuploadFailures=K
[code-env:inject] tool=<name> files=N missingResourceId=K (debug)
[code-env:inject] M/N files missing resource_id ... (warn)
[code-env:inject] tool=<name> _injected_files=0 ... (warn)

The boundary log warns when LC sends zero injected files on a code-execution tool call — that's the user's actual symptom showing up at the LC side instead of having to correlate against codeapi's `Request received { files: [] }`. Tag chosen as `[code-env:inject]` rather than `[handoff:exec]` to avoid collision with the app-level "handoff" semantic (subagent handoff workflow). Structural cleanup in primeFiles: replaced the `if (ref) { ... }` nesting with an early `if (!ref) continue` so the per-path instrumentation hooks land at top-level scope instead of indented inside a conditional. Behavior unchanged; pushFile / reuploadFile identical. Spec fixtures (handlers.spec.ts, codeFilesSession.spec.ts) updated to include `resource_id` on `CodeEnvFile` literals — required by the post-3.1.80-dev.2 type now installed.
## Test plan - [x] `cd packages/api && npx jest src/agents/handlers.spec.ts src/agents/codeFilesSession.spec.ts src/agents/skillFiles.spec.ts` — 69/69 pass - [x] `cd api && npx jest server/services/Files/Code/process.spec.js` — 84/84 pass - [x] `npx tsc --noEmit -p packages/api` — clean - [x] `npx eslint` on all four touched files — clean * chore: add CONSOLE_JSON_STRING_LENGTH to .env.example for JSON log string length configuration * fix(files): align codeapi upload filename with LC's sanitized DB filename User-attached files for code execution were uploading to codeapi under `file.originalname` (raw upload filename, may contain spaces / special chars) while LC's DB record stored the sanitized form (`sanitizeFilename(file.originalname)`, underscores). Codeapi preserves whatever filename the upload sent, so the sandbox saw `/mnt/data/<originalname>` while LC's `primeFiles` toolContext text + `_injected_files.name` referenced `file.filename` (sanitized). Visible failure: agent gets system prompt saying /mnt/data/librechat_code_api_-_active_customer_-_2025-11-05.xlsx …tries that path, hits `FileNotFoundError`, then notices the sandbox's actual `Available files` line says /mnt/data/librechat code api - active customer - 2025-11-05.xlsx …retries with spaces, succeeds. Wastes a tool call per upload and leaks raw filenames into model context. Fix: sanitize once and use the sanitized form in both the codeapi upload AND the LC DB record. Sandbox path = LC toolContext text = in-memory ref name. No drift. Reupload path (`Code/process.js` line 867 `filename: file.filename`) already uses the sanitized DB name, so it stays consistent with the fresh-upload path after this change. ## Test plan - [x] `cd api && npx jest server/services/Files/process` — 32/32 pass - [x] `npx eslint` on the touched file — clean * chore: bump `@librechat/agents` to version 3.1.80-dev.3 in package.json and package-lock.json	2026-05-08 12:29:43 -04:00
Danny Avila	5c338a4642	🛂 fix: Harden Agent File Preview Access (#12981) * fix: harden agent file access * style: format agent file query * fix: prune agent file refs on alternate writes * test: fix agent pruning specs	2026-05-06 19:56:04 -04:00
Danny Avila	9c81792d25	🔐 feat: Add Signed CloudFront File Downloads (#12970) * feat: add signed CloudFront downloads * fix: preserve local IdP avatar paths * fix: address signed download review findings * fix: harden CloudFront cookie scope validation * fix: preserve URL save API compatibility * fix: store CDN SSO avatars under shared prefix * fix: Harden CloudFront tenant file access * fix: Preserve CloudFront download compatibility * fix: Address CloudFront review follow-ups * fix: Preserve file URL fallback user paths * fix: Address download review hardening * fix: Use file owner for S3 RAG cleanup * fix: Address final download review nits * fix: Clear stale avatar CloudFront cookies * fix: Align download filename helpers with dev * fix: Address final CloudFront review follow-ups * fix: Stream S3 URL uploads * fix: Set S3 stream upload length * fix: Preserve download metadata filepath * fix: Avoid remote content length for stream uploads * fix: Use bounded multipart URL uploads * fix: Harden S3 filename boundaries	2026-05-06 19:48:30 -04:00
Danny Avila	6c6c72def7	🚀 feat: Decouple File Attachment Persistence from Preview Rendering (#12957) * 🗂️ feat: add `status` lifecycle to file records for two-phase previews Schema and model foundation for decoupling the agent's final response from CPU-heavy office-format HTML extraction. - `MongoFile.status: 'pending' | 'ready' | 'failed'` (indexed) and `previewError?: string` mirror the lifecycle: phase-1 emits the file record at `pending` so the response is unblocked; phase-2 transitions to `ready` (with text/textFormat) or `failed` (with previewError) in the background. Absent for legacy records — clients treat that as `ready` for back-compat. - Mirror types added to `TFile` in data-provider so frontend cache consumers see the new fields. - New `sweepOrphanedPreviews(maxAgeMs)` method on the file model recovers stale `pending` records left behind by a process restart mid-extraction; transitions them to `failed` with `previewError: 'orphaned'`. Cheap because `status` is indexed. * ⚡ feat: two-phase code-execution preview flow (unblocks final response) The agent's final response no longer waits on CPU-heavy office HTML extraction. Phase-1 (download + storage save + DB record at `status: 'pending'`) is awaited as before; phase-2 (extract + `updateFile`) runs in the background with a hard 60s ceiling. Three flows, all funneling through `processCodeOutput` and updated to the new `{ file, finalize? }` return shape: - `callbacks.js` (chat-completions + Open Responses streaming): emit the phase-1 attachment immediately (carries `status: 'pending'` for office buckets so the UI shows "preparing preview…"), then fire-and-forget `finalize()`. If the SSE stream is still open when phase-2 lands, push an `attachment` update event with the same `file_id` so the client merges over the placeholder in place. - `tools.js` direct endpoint: same split — return the phase-1 metadata immediately, run extraction in the background. Client polls for the resolved record.
`finalize()` wraps the existing 12s per-render timeout in a 60s outer `withTimeout`. The HTML-or-null contract from #12934 is preserved: office types that fail extraction transition to `status: 'failed'` with `previewError: 'parser-error' | 'timeout'` rather than falling back to plain text (would be an XSS vector). Promises continue running after the HTTP response closes (Node doesn't kill them). The boot-time orphan sweep covers the only case that loses progress — actual process restart mid-extraction. `primeFiles` annotates the agent's `toolContext` line for prior-turn files: `(preview not yet generated)` for pending, `(preview unavailable: <reason>)` for failed. The model can volunteer "you can still download it" instead of pretending the preview is fine. `hasOfficeHtmlPath` exported from `@librechat/api` so `processCodeOutput` can decide whether a file expects a preview at all. * 🔍 feat: `GET /api/files/:file_id/preview` endpoint and boot orphan sweep - New `GET /api/files/:file_id/preview` route returns `{ status, text?, textFormat?, previewError? }`. The frontend's `useFilePreview` React Query hook polls this while phase-2 is in flight, then auto-stops on terminal status. ACL identical to the download route (reuses `fileAccess` middleware). Defaults `status` to `'ready'` for legacy records so back-compat is implicit. `text` only included when `status === 'ready'` and non-null — preserves the HTML-or-null security contract from #12934. - `sweepOrphanedPreviews()` invoked on boot in both `server/index.js` and `server/experimental.js`. Recovers any `pending` records left behind by a process restart mid-extraction (the only case the in-process two-phase flow can't handle on its own). Fire-and-forget so a transient sweep failure doesn't block startup.
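The response rules stated for the preview endpoint (default to `'ready'` for legacy records, include `text` only when ready and non-null) can be sketched as a small pure function. The names `FileRecord`, `PreviewResponse`, and `toPreviewResponse` are hypothetical, and returning `previewError` only for failed records is an assumption rather than something the commit message states.

```typescript
type PreviewStatus = 'pending' | 'ready' | 'failed';

// Hypothetical subset of a stored file record.
interface FileRecord {
  status?: PreviewStatus; // absent on legacy records
  text?: string | null;
  textFormat?: string | null;
  previewError?: string;
}

interface PreviewResponse {
  status: PreviewStatus;
  text?: string;
  textFormat?: string;
  previewError?: string;
}

/** Shapes a file record into the preview payload a polling client would read. */
function toPreviewResponse(file: FileRecord): PreviewResponse {
  // Legacy records have no status and are treated as already resolved.
  const status: PreviewStatus = file.status ?? 'ready';
  const response: PreviewResponse = { status };
  // Text is only exposed once extraction has finished and produced output.
  if (status === 'ready' && file.text != null) {
    response.text = file.text;
    if (file.textFormat != null) {
      response.textFormat = file.textFormat;
    }
  }
  // Assumption: the failure reason is only meaningful for failed records.
  if (status === 'failed' && file.previewError) {
    response.previewError = file.previewError;
  }
  return response;
}
```

A client polling this payload can stop as soon as `status` is anything other than `'pending'`, since both other values are terminal.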
* 🖥️ feat: frontend two-phase preview consumer (polling + UI states) Wires the React side to the new lifecycle so the user sees what's happening with their file while phase-2 extraction runs in the background and after the response stream closes. - `useAttachmentHandler` upserts by `file_id` (was append-only) so the phase-2 SSE update event merges over the pending placeholder in place. Lightweight attachments without a `file_id` (web_search / file_search citations) keep the legacy append path. - `useFilePreview(file_id)` React Query hook with `refetchInterval: (data) => data?.status === 'pending' ? 2500 : false` so polling auto-stops on the first terminal response without the caller having to flip `enabled`. - `useAttachmentPreviewSync(attachment)` bridges polled data into `messageAttachmentsMap`. Polling enabled iff `status === 'pending' && isAnySubmitting` — per the design ask: active polling while the LLM is still generating, then quiet. Process-restart and post-stream cases are covered by polling on the next interaction. - `Attachment.tsx` renders a small `PreviewStatusIndicator` (spinner + "Preparing preview…" for pending, alert icon + "Preview unavailable" for failed) inside `FileAttachment`. Download button stays fully functional in both states. Two new English locale keys. - Data-provider scaffolding: `TFilePreview` type, `endpoints.filePreview`, `dataService.getFilePreview`, `QueryKeys.filePreview`. * 🧪 fix: stub `useAttachmentPreviewSync` in pre-existing Attachment test mocks The new `useAttachmentPreviewSync` hook is called unconditionally inside `FileAttachment` (added in the prior commit). Two pre-existing test files mock `~/hooks` to provide `useLocalize` only — the un-mocked preview hook reference resolved to undefined and crashed render with `(0 , _hooks.useAttachmentPreviewSync) is not a function` on the Ubuntu/Windows CI runners. 
Fix is local to the test mocks: add a no-op stub that returns `{ status: 'ready' }` so the component renders the legacy chip path. The two-phase preview behavior itself has its own dedicated suites (`useAttachmentHandler.spec.tsx`, `useAttachmentPreviewSync.spec.tsx`). * 🐛 fix: route phase-2 attachment update to current-run messageId Codex P1 review on PR #12957. `processCodeOutput` intentionally preserves the original DB `messageId` across cross-turn filename reuse so `getCodeGeneratedFiles` can still trace a file back to the assistant message that originally produced it. The phase-1 SSE emit already routes by the current run's messageId — `processCodeOutput` runtime-overlays it via `Object.assign(file, { messageId, toolCallId })` and the callback writes `result.file` directly. Phase-2 was passing the raw `updateFile` return through `attachmentFromFileMetadata`, which read `messageId` straight off the DB record. On a turn-N run that re-emitted a filename from turn-1 (e.g. agent writes `output.csv` again), the phase-2 SSE update routed to `turn-1-msg` instead of `turn-N-msg`. Frontend's `useAttachmentHandler` upserts under the wrong messageAttachmentsMap slot — turn-N's pending chip stays stuck at "preparing preview…" while turn-1's already-resolved attachment gets re-merged. Fix: thread `runtimeMessageId` through `attachmentFromFileMetadata` and pass `metadata.run_id` from the phase-2 emit site. Mirrors how phase-1 sources its messageId. Tests cover the cross-turn reuse case plus the writableEnded / null-finalize / no-finalize paths to lock in the broader phase-2 emit contract. * 🛠️ refactor: address codex audit findings (wire-shape parity, DRY, defensive catch) Comprehensive audit on PR #12957. Resolves all valid findings: - MAJOR #1 — Wire-shape parity: phase-1 ships the full `fileMetadata` record over SSE; phase-2 was using a tight `attachmentFromFileMetadata` projection. 
Drop the projection and have phase-2 spread `{...updated, messageId, toolCallId}` so both events match the long-standing legacy phase-1 shape clients depend on. - MAJOR #2 — DRY: extract `runPhase2Finalize({ finalize, fileId, onResolved })` into `process.js` (alongside `processCodeOutput` whose contract it pairs with). Both `callbacks.js` paths and `tools.js` now flow through it. Single catch path eliminates divergence surface — the fix landed in 01704d4f0 (cross-turn messageId routing) was a symptom of this duplication risk. - MINOR #3 — JSDoc accuracy: `finalizePreview`'s buffer is bounded by `fileSizeLimit`, not the 1MB extractor cap. Updated and added a note about peak heap from queued buffers. - MINOR #4 — Defensive catch: `runPhase2Finalize`'s catch attempts a best-effort `updateFile({ status: 'failed', previewError: 'unexpected' })` for the file_id, so a programming bug in `finalizePreview` doesn't leave the record stuck `'pending'` until the next boot-time orphan sweep. - NIT #6 — Stale PR refs: 12952 → 12957 in 3 places. - NIT #7 — Schema bound: `previewError` capped at `maxlength: 200` to prevent a future codepath from accidentally persisting a stack trace. Skipped per audit verdict (non-blocking): - #5 (memory pressure): documented in JSDoc; impl change was reviewer's "consider", not actionable. - #8 (double DB query per poll): low cost, indexed by_id, polling is gated narrow. - #9 (TAttachment cast): the union type is intentional; the casts are safe widening, refactoring TAttachment is invasive and out of scope. Tests: 11 new (7 `runPhase2Finalize` unit tests covering happy path, null-finalize, throws, double-fail, no-fileId, no-onResolved; +4 wire-shape parity assertions in the existing cross-turn test). 328 backend tests pass; 528 frontend tests pass; lint and typecheck clean. 
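The wire-shape parity and cross-turn routing fixes above both come down to one rule: ship the full record, then let the current run's ids win over whatever the DB row holds. A minimal sketch of that overlay follows. `FileRecord`, `RuntimeIds`, and `toAttachmentEvent` are hypothetical names for illustration, not identifiers from the codebase, and the real record carries many more fields.

```typescript
// Hypothetical shapes for illustration only.
interface FileRecord {
  file_id: string;
  filename: string;
  messageId: string; // as persisted; may point at an earlier turn
  status?: 'pending' | 'ready' | 'failed';
  text?: string | null;
}

interface RuntimeIds {
  messageId: string; // the current run's message
  toolCallId: string;
}

/** Spread the full record first, then overlay the runtime ids so they win. */
function toAttachmentEvent(updated: FileRecord, runtime: RuntimeIds): FileRecord & RuntimeIds {
  return { ...updated, messageId: runtime.messageId, toolCallId: runtime.toolCallId };
}

// Turn N re-emits `output.csv`; the DB row still names the turn-1 message.
const dbRecord: FileRecord = {
  file_id: 'f1',
  filename: 'output.csv',
  messageId: 'turn-1-msg',
  status: 'ready',
  text: '<table></table>',
};

const event = toAttachmentEvent(dbRecord, { messageId: 'turn-N-msg', toolCallId: 'call_1' });
console.log(event.messageId, event.status);
```

Because the overlay builds a new object, the persisted `messageId` stays untouched, so tracing a file back to the message that first produced it keeps working.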
* 🛡️ refactor: address codex P1+P2 + rename to drop phase-1/2 jargon Codex round 2 review on PR #12957 caught two race conditions and one recovery gap, all triggered by cross-turn filename reuse (`claimCodeFile` intentionally returns the same `file_id` for the same `(filename, conversationId)` across turns). Plus naming cleanup the user requested — internal "phase 1 / phase 2" vocabulary leaks across sprints, replace it everywhere with terms describing what's actually happening. P1 — stale render overwrites newer revision (process.js) Two turns reusing `output.csv` share a `file_id`. If turn-1's background render resolves AFTER turn-2's persist step, the unconditional `updateFile` writes turn-1's stale text/status over turn-2's pending placeholder. Fix: stamp a fresh `previewRevision` UUID on every emit, thread it through `finalizePreview`, and make the commit conditional via a new optional `extraFilter` argument on `updateFile` (`{ previewRevision: <expected> }`). The defensive `updateFile` in `runPreviewFinalize`'s catch uses the same guard so a programming error from an older render also can't override a newer turn. P1 — stale React Query cache on pending remount (queries.ts) Same root cause from the frontend side. Cache key `[QueryKeys.filePreview, file_id]` may hold a prior turn's `'ready'` payload; with `refetchOnMount: false` and the polling gate on `pending`, polling never starts for the new placeholder. Fix: `useAttachmentHandler` invalidates that query whenever an attachment with a `file_id` arrives. Both initial-emit and update events trigger invalidation — uniform gate. P2 — quick-restart orphans skipped by boot sweep (files.js) Boot `sweepOrphanedPreviews` uses a 5-min cutoff for multi-instance safety. A crash + restart inside the cutoff leaves `pending` records that never get touched again. 
Fix: lazy sweep inside the preview endpoint — if a polled record is `pending` and `updatedAt` is older than 5 min, mark it `failed:orphaned` on the spot before responding. Conditional on the same `updatedAt` we observed so a concurrent legitimate update wins. Cheap, bounded by user activity. Naming cleanup - `runPhase2Finalize` → `runPreviewFinalize` - `PHASE_TWO_TIMEOUT_MS` → `PREVIEW_FINALIZE_TIMEOUT_MS` - All `phase-1` / `phase-2` / `two-phase` prose replaced with "the immediate emit", "the deferred render", "the persist step", "the deferred preview", etc. Skill-feature `phase 1/2` references (different feature) left alone. Tests: 10 new (4 lazy-sweep × preview endpoint, 3 cache-invalidation × useAttachmentHandler, 3 extraFilter × updateFile data-schemas). Backend 332/332, frontend 531/531, data-schemas 37/37, lint clean. * 🛠️ refactor: address comprehensive review (round 3) — stale-cache MAJOR + 3 minors Comprehensive review on PR #12957 caught a P1 follow-on bug from the prior `invalidateQueries` fix, plus 3 maintainability findings. MAJOR: stale React Query cache not actually fixed by `invalidateQueries` The previous fix called `invalidateQueries` to flush stale cached preview data on cross-turn filename reuse. But `useFilePreview` had `refetchOnMount: false`, which made the new observer read the stale-marked 'ready' data without refetching. The polling `refetchInterval` then evaluated against stale 'ready' → returned `false` → polling never started → user stuck on stale content. Fix (belt-and-suspenders): a) `useAttachmentHandler` switched to `removeQueries` — drops the cache entry entirely so the next mount has nothing to read and must fetch. b) `useFilePreview` no longer sets `refetchOnMount: false`, so the React Query default (`true`) kicks in — second line of defense if any future codepath observes stale data before the handler has a chance to evict. 
MINOR: `finalizePreview` JSDoc missing `previewRevision` param Added with explanation of the conditional update guard. MINOR: asymmetric stream-writable guard between SSE protocols Chat-completions delegated the gate to `writeAttachmentUpdate`; Open Responses inlined `!res.writableEnded && res.headersSent`. Extracted `isStreamWritable(res, streamId)` predicate; both paths + `writeAttachmentUpdate` now share the single source of truth. NIT: `(data as Partial<TFile>).file_id` cast repeated 4 times Extracted to a `fileId` local at the top of the handler. Tests: existing 9 invalidate-tests rewritten as remove-tests; +1 new lock-in test asserts removeQueries is called and invalidateQueries is NOT (regression guard against round-3 finding). 332 backend pass, 532 frontend pass, lint clean. Skipped findings (deferred / acceptable): - MINOR: post-submission pending state has no auto-recovery — the `isAnySubmitting` polling gate was the user's explicit design; LLM context surfaces failed/pending so the model can volunteer. Worth a follow-up if real users hit it. - NIT: double DB query per preview poll — reviewer marked acceptable; changing `fileAccess` middleware is out of scope. * 🛡️ test: address comprehensive review NITs (initial-emit guard + isStreamWritable coverage) NIT — chat-completions initial emit skips writableEnded check The Open Responses initial emit was switched to use the new `isStreamWritable` predicate in the round-3 commit, but the chat-completions initial emit kept the older narrower check (`streamId || res.headersSent`). On a client disconnect mid-stream (`writableEnded === true`) it would still hit `res.write` and raise `ERR_STREAM_WRITE_AFTER_END` — caught by the outer IIFE catch but logged as noise. Switch this site to `isStreamWritable` too so both initial-emit paths share the same gate as the deferred update emits.
NIT — `isStreamWritable` not directly unit-tested The predicate was only covered indirectly via the deferred-preview SSE tests (writableEnded skip, headersSent check). Export from `callbacks.js` and add 5 parametric tests pinning down each branch (streamId truthy, res null, !headersSent, writableEnded, happy path) so a future condition addition can't silently regress. * 🐛 fix: stuck "Preparing preview…" + inline the chip subtitle Two related fixes for a stuck-spinner bug a user reported in manual testing of PR #12957. Stuck spinner (the bug) The deferred preview render can complete a few seconds AFTER the SSE stream closes (typical case: PPTX render finishes ~3s after the LLM emits FINAL). When that happens, the SSE update is silently dropped (`isStreamWritable` returns false on a closed stream) and polling is the only recovery path. The earlier polling gate was `status === 'pending' && isAnySubmitting`, which mirrored the original design intent ("only query while the LLM is still generating"). But `isAnySubmitting` flips false the moment the model emits FINAL — milliseconds before the deferred render commits. Polling never runs, the chip stays "Preparing preview…" forever even though the DB has `status: 'ready'` with valid HTML. Drop the `isAnySubmitting` part of the gate. `useFilePreview`'s `refetchInterval` is already a function-form that returns `false` on the first terminal response, so polling auto-stops within one tick of resolution. The server-side render ceiling (60s) plus the lazy sweep in the preview endpoint cap the worst case to ~24 polls per pending attachment. Polling itself never blocks UX — the gate's purpose was "don't waste cycles", and capping by terminal status is the correct expression of that. Inline the chip subtitle (the visual) The previous design rendered "Preparing preview…" as a loose-feeling spinner+text BELOW the file chip. The chip itself looked done while a floating annotation said it wasn't. 
`FileContainer` gains an optional `subtitle?: ReactNode` prop that overrides the default file-type label. `Attachment.tsx` passes a `PreviewStatusSubtitle` (spinner + "Preparing preview…" / alert + "Preview unavailable") into that slot when the file's preview is pending or failed. The chip footprint stays identical to its `'ready'` form — just the second row swaps from "PowerPoint Presentation" to the status indicator. No floating element, no layout shift. Tests: regression test pinning down "polling stays enabled after the LLM finishes" so a future revert can't reintroduce the stuck-spinner bug. Existing FileContainer tests pass unchanged (subtitle override is opt-in). 522 frontend tests pass; lint clean. * 🐛 fix: deferred-preview survives reload + matches artifact card chrome Fixes the remaining stuck-pending case after the polling gate fix: on a reloaded conversation, message.attachments come from the DB frozen at the immediate-persist `status: 'pending'`, but `messageAttachmentsMap` is empty because no SSE handler ever fired for that messageId. Polling now INSERTS a new live entry when no record matches the file_id, and `useAttachments` merges live entries onto DB entries by file_id so the resolved text/textFormat reach `artifactTypeForAttachment` and the chip routes through the proper PanelArtifact card. Also replaces the small file chip used during the pending state with a PreviewPlaceholderCard that mirrors ToolArtifactCard chrome, so the transition to the resolved PanelArtifact no longer reshapes the UI. * ✨ feat: auto-open panel when deferred preview resolves pending→ready The legacy auto-open path is gated only on `isSubmitting`, so an office-file preview that resolves after the SSE stream closes would render in place but never auto-open the panel — even though that's exactly the moment the result becomes meaningful to the user. 
Adds a per-file_id one-shot signal that `useAttachmentPreviewSync` flips on the pending→ready edge; `ToolArtifactCard` consumes it on mount and auto-opens regardless of submission state. The signal is only set on the actual transition (history loads of pre-resolved files don't trigger it) and is consumed once (panel close + reopen on the same card stays user-controlled). * 🐛 fix: drop placeholder Terminal overlay + scope auto-open to fresh resolutions Two fixes for issues spotted in manual testing of the deferred-preview auto-open feature: 1. PreviewPlaceholderCard was passing `file={attachment}` to FilePreview, which triggered SourceIcon's Terminal overlay (`metadata.fileIdentifier` is set on every code-execution file). The artifact card itself doesn't show that overlay; the placeholder shouldn't either, so the pending→resolved transition is visually seamless. 2. The `previewJustResolved` flag flipped on every pending→ready transition observed by the polling hook — including stale-pending DB records that resolve via the first poll on a history load. Conversations whose immediate-persist snapshot left attachments at `status: 'pending'` would yank the panel open every revisit. Adds `mountedDuringStreamRef` to the hook (mirroring ToolArtifactCard) so the flag fires only when the hook itself was mounted during an active turn — preserving the pre-PR contract that the panel only auto-opens for results the user is actively waiting on, never for history. * 🐛 fix: don't downgrade preview to failed when only the SSE emit throws Codex P2 finding on PR #12957: the original chain placed `.catch` after `.then(onResolved)`, so a throw inside `onResolved` (transport-side errors — SSE write race after stream close, an emitter listener throwing) would propagate into the finalize catch and persist `status: 'failed'` / `previewError: 'unexpected'`. 
That surfaced "preview unavailable" in the UI for a perfectly valid file, and degraded next-turn LLM context to reflect a non-existent failure. Wraps `onResolved` in its own try/catch so emit errors are logged but do not affect the file's persisted status. Extraction success and emit success are now independent: if extraction succeeds and `finalizePreview` writes the terminal status, the polling layer / next page load surfaces the resolved preview even if this turn's SSE emit didn't land. * 🛡️ fix: run boot-time orphan sweep under system tenant context Codex P2 finding on PR #12957: `File` is tenant-isolated, so under `TENANT_ISOLATION_STRICT=true` the boot-time `sweepOrphanedPreviews` threw `[TenantIsolation] Query attempted without tenant context in strict mode` and the recovery path silently failed every restart. Stale `status: 'pending'` records would be stuck until a user happened to poll the preview endpoint and trigger the lazy sweep — which only covers the file the user is currently looking at, not the bulk candidate set the boot sweep is designed to recover. Wraps the sweep in `runAsSystem(...)` in both boot paths (`api/server/index.js` and `api/server/experimental.js`) and pins the contract with regression tests in `file.spec.ts` — one test asserts the bare call throws under strict mode, the other asserts the `runAsSystem`-wrapped call succeeds. 
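The strict-mode failure and its `runAsSystem` fix can be illustrated with a deliberately simplified, synchronous stand-in. The module-level context variable, the `SYSTEM_TENANT` marker, and the in-memory record array are all assumptions made for this sketch. The real sweep queries the database asynchronously, and its context propagation is not shown in this log.

```typescript
// Simplified stand-in for tenant context: a module-level variable.
// Assumption for illustration; the real mechanism is not described in the log.
let currentTenant: string | null = null;
const SYSTEM_TENANT = '__system__'; // hypothetical marker value

/** Run `fn` with a system tenant context, restoring the previous one afterwards. */
function runAsSystem<T>(fn: () => T): T {
  const previous = currentTenant;
  currentTenant = SYSTEM_TENANT;
  try {
    return fn();
  } finally {
    currentTenant = previous;
  }
}

interface PreviewRecord {
  file_id: string;
  status: 'pending' | 'ready' | 'failed';
  updatedAt: number; // epoch milliseconds
  previewError?: string;
}

/** Marks stale pending records as failed/orphaned; refuses to run without context in strict mode. */
function sweepOrphanedPreviews(
  records: PreviewRecord[],
  now: number,
  cutoffMs: number,
  strict: boolean,
): number {
  if (strict && currentTenant === null) {
    throw new Error('[TenantIsolation] Query attempted without tenant context in strict mode');
  }
  let swept = 0;
  for (const record of records) {
    if (record.status === 'pending' && now - record.updatedAt > cutoffMs) {
      record.status = 'failed';
      record.previewError = 'orphaned';
      swept += 1;
    }
  }
  return swept;
}

const FIVE_MINUTES_MS = 5 * 60_000;
const bootRecords: PreviewRecord[] = [
  { file_id: 'stale', status: 'pending', updatedAt: 0 },
  { file_id: 'fresh', status: 'pending', updatedAt: 9 * 60_000 },
];
const sweptCount = runAsSystem(() =>
  sweepOrphanedPreviews(bootRecords, 10 * 60_000, FIVE_MINUTES_MS, true),
);
console.log(sweptCount);
```

The two regression tests described above map onto this shape directly: the bare call throws under strict mode, and the wrapped call completes.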
* 🧹 chore: trim verbose comments from previous commit * 🧹 chore: address review findings (dead branch, lazy-sweep cutoff, stale JSDoc) - finalizePreview: drop unreachable !isOfficeBucket branch (caller already gates on hasOfficeHtmlPath, so this path is always office) - preview endpoint: drop lazy-sweep cutoff from 5min to 2min — anything past the 60s render ceiling is definitively orphaned, and per-request sweep can be tighter than the per-instance boot sweep - strip stale `isSubmitting` references from JSDoc in 3 spots (the client-side gate was removed in `9a65840`) Skipped: function-length (#3) and client-side polling cap (#4) — refactors without correctness/perf wins; remaining NITs. * 🧹 fix: trim 1 query off pending polls + clear stale lifecycle on cross-shape updates - Preview endpoint: reuse fileAccess middleware's record for the lifecycle check; only re-fetch with text on the terminal ready response. Cuts the typical poll lifecycle from 2(N+1) to N+1 queries, since the vast majority of polls hit while pending and don't need text at all. - processCodeOutput non-office branch: explicitly null out status, previewError, previewRevision (codex P2). Without this, an update at the same (filename, conversationId) where the prior emit was an office file leaves stale lifecycle fields and the client renders the wrong state for the now non-office artifact. - Tests: rewire preview.spec mocks for the new shape, add boundary test pinning the 2min cutoff, add regression test for the cross-shape update. * 🐛 fix: keep polling on transient errors but cap permanently-broken endpoint Codex P2: the previous `data?.status === 'pending' ? 2500 : false` gate killed polling on the first transient error. With `retry: false`, a 500 left `data` undefined, the callback returned false, and the chip was stuck "Preparing preview…" forever — exactly the bug the polling layer was supposed to recover from. 
Inverts the gate: stop on terminal success (`ready`/`failed`) or after 5 consecutive errors. Transient errors keep retrying; a permanently broken endpoint caps at ~12.5s instead of polling forever. Predicate extracted as `previewRefetchInterval` for direct unit testing without fighting React Query's timer machinery. * ✨ feat: render pending-preview files in their own row Pending deferred-preview chips now bucket into a separate row above the resolved attachments — reads as "this is still happening" rather than mixing with completed downloads. Once status flips to ready, the chip re-buckets into panelArtifacts; failed re-buckets into the file row alongside other downloads. * 🎨 fix: render pending-preview chips in the panel-artifact row, not the file row Previous bucketing put pending chips in the file row (since `artifactTypeForAttachment` returns null for empty-text records). The pending placeholder is a future panel artifact — sharing the row keeps the chip in place when it resolves instead of jumping rows. Plain files still get their own row. * 🐛 fix: phase-1 SSE replay must not regress a resolved attachment Codex P1: useEventHandlers.finalHandler iterates responseMessage.attachments at stream end and dispatches each through the attachment handler. Those records are the immediate-persist snapshot (status:pending, text:null) — if a deferred update has already moved the same file_id to ready/failed, the existing merge let the pending fields win and downgraded the resolved record. Result: chip flickers back to pending and polling restarts until the lazy sweep corrects. Pin the terminal lifecycle fields (status, text, textFormat, previewError) when existing is ready/failed and incoming is pending. Other field updates still go through. * 🐛 fix: track preview-poll error cap outside React Query state Codex P2: the previous cap relied on `query.state.fetchFailureCount`, but React Query v4's reducer resets that to 0 on every fetch dispatch (the `'fetch'` action). 
With `retry: false`, each failed poll left count at 1 and the next dispatch reset it back to 0, so the `>= 5` branch never fired and a permanently-broken endpoint polled forever. Track consecutive errors in a module-level Map keyed by file_id, incremented in a thin `fetchFilePreview` wrapper around the data service call. The Map is cleared on success and on cap-stop, so memory is bounded by in-flight pending file_ids per session.	2026-05-06 03:04:19 -04:00
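The polling gate described in the last two fixes can be sketched as a pure predicate plus a counter kept outside the query library's state. This is a rough illustration under assumptions: `recordPoll` is a hypothetical helper standing in for the `fetchFilePreview` wrapper, and the signature of `previewRefetchInterval` here is a guess, not the one in the codebase.

```typescript
type PreviewStatus = 'pending' | 'ready' | 'failed';

const POLL_INTERVAL_MS = 2500;
const MAX_CONSECUTIVE_ERRORS = 5;

// Module-level counter keyed by file_id, so it survives per-fetch state resets.
const consecutiveErrors = new Map<string, number>();

/** Hypothetical helper: record one poll outcome (success clears, failure increments). */
function recordPoll(fileId: string, ok: boolean): void {
  if (ok) {
    consecutiveErrors.delete(fileId);
    return;
  }
  consecutiveErrors.set(fileId, (consecutiveErrors.get(fileId) ?? 0) + 1);
}

/** Next poll delay in ms, or false to stop polling. */
function previewRefetchInterval(fileId: string, status?: PreviewStatus): number | false {
  if (status === 'ready' || status === 'failed') {
    return false; // terminal response: stop
  }
  if ((consecutiveErrors.get(fileId) ?? 0) >= MAX_CONSECUTIVE_ERRORS) {
    consecutiveErrors.delete(fileId); // clear on cap-stop to bound memory
    return false;
  }
  return POLL_INTERVAL_MS; // pending, or a transient error with no data yet
}

// A permanently broken endpoint: five failed polls, then polling stops.
const decisions: Array<number | false> = [];
for (let attempt = 0; attempt < MAX_CONSECUTIVE_ERRORS; attempt += 1) {
  recordPoll('broken-file', false);
  decisions.push(previewRefetchInterval('broken-file', undefined));
}
console.log(decisions);
```

With a 2500 ms interval and a cap of five, the worst case for a dead endpoint is the roughly 12.5 s quoted above.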
Danny Avila	f20419d0b7	📄 feat: Rich File Artifact Previews for DOCX, CSV, XLSX, PPTX (#12934 ) * 📄 feat: Rich File Artifact Previews for DOCX, CSV, XLSX, PPTX Render office files emitted by tools as interactive previews in the artifact panel instead of raw extracted text. The backend produces a sanitized HTML document via mammoth (DOCX), SheetJS (CSV/XLSX/XLS/ODS), or yauzl-based slide extraction (PPTX) and ships it through the existing SSE attachment payload; the client routes it through the Sandpack `static` template's `index.html` slot — no new browser deps, no client-side blob fetch, no React renderer components. * 🔐 fix: Restrict data: URLs to <img> in office HTML sanitizer Codex review on #12934 caught that `data:` lived in the global `allowedSchemes`, which meant a smuggled `<a href="data:text/html, <script>...</script>">` would survive sanitization. The Sandpack iframe sandbox does not gate `target="_blank"` navigations, so a click would open attacker-controlled HTML in a new tab. Scope `data:` to `<img src>` only via `allowedSchemesByTag` (mammoth inlines DOCX images as base64 `data:image/...` URIs — that path still works). Add a regression suite (`sanitizeOfficeHtml security`) with 8 cases covering: <script> stripping, event-handler removal, javascript:/data: rejection on anchors, data:image preservation in <img>, http/https/mailto allowance, target=_blank rel=noopener enforcement, and <iframe> stripping. * 🔧 fix: Route extensionless office files by MIME alone Codex review on #12934 caught that the office-render gate in `extractCodeArtifactText` only fired when the extension was in `OFFICE_HTML_EXTENSIONS` or the category was `document`/`pptx`. A tool emitting `data` with `text/csv` (no extension) classifies as `utf8-text`, so the gate was skipped and raw CSV text shipped to the client — but the client routes by MIME to the SPREADSHEET bucket expecting a full HTML document, so the panel rendered broken text. 
Extract a shared `officeHtmlBucket(name, mime)` predicate from `html.ts` (returns the bucket name or null). Both `bufferToOfficeHtml` (the dispatcher) and the upstream gate in `extract.ts` now go through this single source of truth, so they can never drift apart again. The predicate already mirrors the dispatcher's extension/MIME logic (extension wins; MIME is the fallback for extensionless inputs). Adds: - 14 cases for the new `officeHtmlBucket` predicate covering the positive paths (each bucket via extension OR MIME) and the negative paths (txt, py, json, jpg, pdf, zip, odt, plain noext). - A direct regression test in `extract.spec.ts` for the Codex catch: `data` with `text/csv` + utf8-text category routes through the office HTML producer. - Parameterized cases for extensionless DOCX/XLSX/XLS/ODS/PPTX files identified by MIME alone. * 🛡️ fix: Enforce extension-wins precedence in officeHtmlBucket Codex review on #12934 caught that the predicate's if-chain interleaved extension and MIME checks for each bucket — e.g. CSV's branch was `ext === 'csv' || CSV_MIME_PATTERN.test(mimeType)`. A `deck.pptx` shipped with `text/csv` (sandboxed tools sometimes ship generic MIMEs) matched the CSV branch BEFORE the PPTX extension branch was reached, so a binary PPTX would have been handed to `csvToHtml` to parse as text — yielding garbage or a parse exception. Restructure to a strict two-pass dispatch: an exhaustive extension table first (one lookup, all known extensions), then MIME-only fallback for extensionless / unknown-ext inputs. The doc comment's "extension wins" claim is now actually enforced by the implementation. Add 7 regression cases covering the conflicting-MIME footgun for each bucket: deck.pptx + text/csv → pptx; workbook.xlsx + text/csv → spreadsheet; legacy.xls + pptx-MIME → spreadsheet; report.docx + text/csv → docx; data.csv + docx-MIME → csv; etc.
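The strict two-pass dispatch can be sketched as two lookup tables consulted in a fixed order. This is an illustration under assumptions rather than the code in `html.ts`: the bucket names are taken from the regression cases above, while the table contents, the `extensionOf` helper, and the exact-match MIME lookup are guesses.

```typescript
type OfficeBucket = 'docx' | 'csv' | 'spreadsheet' | 'pptx';

// Pass 1: exhaustive extension table, one lookup for all known extensions.
const EXTENSION_BUCKETS = new Map<string, OfficeBucket>([
  ['docx', 'docx'],
  ['csv', 'csv'],
  ['xlsx', 'spreadsheet'],
  ['xls', 'spreadsheet'],
  ['ods', 'spreadsheet'],
  ['pptx', 'pptx'],
]);

// Pass 2: MIME-only fallback, consulted only when pass 1 finds nothing.
const MIME_BUCKETS = new Map<string, OfficeBucket>([
  ['text/csv', 'csv'],
  ['application/csv', 'csv'],
  ['text/comma-separated-values', 'csv'],
  ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'docx'],
  ['application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 'spreadsheet'],
  ['application/vnd.ms-excel', 'spreadsheet'],
  ['application/vnd.oasis.opendocument.spreadsheet', 'spreadsheet'],
  ['application/vnd.openxmlformats-officedocument.presentationml.presentation', 'pptx'],
]);

/** Hypothetical helper: lowercase extension without the dot, or '' when there is none. */
function extensionOf(name: string): string {
  const dot = name.lastIndexOf('.');
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : '';
}

function officeHtmlBucket(name: string, mime: string): OfficeBucket | null {
  const byExtension = EXTENSION_BUCKETS.get(extensionOf(name));
  if (byExtension !== undefined) {
    return byExtension; // extension wins, whatever the MIME claims
  }
  return MIME_BUCKETS.get(mime) ?? null;
}

console.log(officeHtmlBucket('deck.pptx', 'text/csv')); // extension pass decides
console.log(officeHtmlBucket('data', 'text/csv')); // MIME fallback decides
```

Keeping the extension pass complete before any MIME test runs is what prevents a generic MIME from hijacking a file whose extension is already known.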
* 🛡️ fix: Reject zip-bomb office files before in-process parsing (SEC) Addresses pre-existing availability vulnerability validated by SEC review (Codex finding 275344c5...) and made worse by this PR's HTML rendering path. A sub-1MiB compressed XLSX/DOCX/PPTX (highly compressed run-of-zeros) inflates to 200+ MiB of XML when handed to mammoth/xlsx — blocking the Node event loop for 10+ seconds and spiking RSS to ~1 GiB. The existing 8s `withTimeout` wrapper uses `Promise.race`, which can only return early; it cannot interrupt synchronous parser CPU/RAM consumption. PoC ran an authenticated execute_code call to OOM the API process. Add `assertSafeZipSize(buffer)` — a yauzl-based pre-flight that streams every entry with mid-inflate byte counting and bails on either a per-entry or total decompressed-size cap. Mid-inflate counting cannot be bypassed by falsifying the central directory's `uncompressedSize` field (the technique the PoC used). Defaults: 25 MiB per entry, 100 MiB total — generous headroom for legitimate image-heavy office files, well below the attack profile. Hook the check into every path that hands a buffer to mammoth/xlsx/yauzl: - New HTML producers (`wordDocToHtml`, `excelSheetToHtml`, `pptxToSlideListHtml`) — added by this PR - Legacy RAG text extractors (`wordDocToText`, `excelSheetToText` in `crud.ts`) — pre-existing path, also vulnerable Errors propagate as a tag-distinct `ZipBombError` so callers can distinguish a refused bomb from generic parse failures. The outer `extractCodeArtifactText` swallows the error and returns null, falling back to the regular download UI. `.xls` (BIFF/CFB binary, not ZIP) is detected by magic bytes and skipped — yauzl would reject it as malformed anyway.
Adds 15 tests: - `zipSafety.spec.ts` (9): benign passes, per-entry cap, total cap, ZipBombError type-tagging, malformed-zip distinction, directory-entry handling, named-error surfacing, and the SEC-PoC pattern (sub-1 MiB compressed → 50 MiB inflated rejected on default caps). - `html.spec.ts` zip-bomb suite (5): each producer rejects a bomb; dispatcher propagates correctly; legitimate fixtures still render. - `extract.spec.ts` (1): outer extractor swallows ZipBombError and returns null so the download UI fallback fires. * 🧹 fix: Normalize MIME parameters; add legacy CSV MIME variant Two related Codex catches on PR #12934 — both about MIME-routing inconsistencies between backend and client that would cause extensionless CSV files to render as broken (raw text under an HTML slot) or skip the artifact panel entirely. P2 — backend MIME normalization: `officeHtmlBucket` matched MIME strings exactly, so a real-world `text/csv; charset=utf-8` Content-Type slipped through and the backend returned raw CSV text. The client's `baseMime` helper strips parameters before its own MIME lookup, so it routed the same file to the SPREADSHEET bucket expecting an HTML body that never arrived. Mirror the client's normalization on the backend (strip everything from `;` onward, lowercase) before bucket matching. P3 — client legacy CSV MIME: Backend's `CSV_MIME_PATTERN` accepts three variants (`text/csv`, `application/csv`, `text/comma-separated-values`); the client's `MIME_TO_TOOL_ARTIFACT_TYPE` only had the first two. An extensionless file with `text/comma-separated-values` would have backend HTML produced but the client would skip the artifact panel entirely. Add the missing variant. Tests: - 9 new parameterized-MIME cases on backend covering charset/boundary/case variants for every bucket. - 1 new client routing case for `text/comma-separated-values`.
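The normalization step (strip everything from `;` onward, lowercase) is small enough to show in full. The sketch below assumes a `Set` of the three CSV variants named above in place of the backend's `CSV_MIME_PATTERN` regex. `isCsvMime` is a hypothetical helper, and `baseMime` only borrows the client helper's name, not its implementation.

```typescript
/** Strip MIME parameters and normalize case: 'Text/CSV; charset=utf-8' becomes 'text/csv'. */
function baseMime(mime: string | undefined): string {
  const [base = ''] = (mime ?? '').split(';');
  return base.trim().toLowerCase();
}

// The three CSV variants listed in the commit message.
const CSV_MIMES: ReadonlySet<string> = new Set([
  'text/csv',
  'application/csv',
  'text/comma-separated-values',
]);

/** Hypothetical helper: true when the normalized MIME is one of the CSV variants. */
function isCsvMime(mime: string | undefined): boolean {
  return CSV_MIMES.has(baseMime(mime));
}

console.log(baseMime('text/csv; charset=utf-8'));
console.log(isCsvMime('TEXT/Comma-Separated-Values'));
```

Running the same normalization on both sides is what keeps the backend's bucket decision and the client's routing decision from disagreeing on parameterized Content-Type values.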
* 🩹 fix: Try office HTML before short-circuiting on category=other Codex review on #12934 caught that the early `category === 'other'` return short-circuited before `hasOfficeHtmlPath` was checked. The classifier returns 'other' for inputs the new dispatcher can still route — extensionless `application/csv` (CSV MIMEs aren't in the classifier's text-MIME set and don't start with `text/`), and extensionless office MIMEs with parameters like `application/vnd... spreadsheetml.sheet; charset=binary` (the classifier's `isDocumentMime` exact-matches these MIMEs without parameter normalization). Both would route correctly through `officeHtmlBucket` but never reached it. Move the office-HTML attempt above the 'other' early return, and drop the `|| category === 'document' || category === 'pptx'` shortcut now that `hasOfficeHtmlPath` covers the same surface (with parameter normalization) and a wider one. ODT still routes through `extractDocument` unchanged — `hasOfficeHtmlPath` returns false for it and the `category === 'document'` branch below handles it. Adds 3 regression tests: - extensionless `application/csv` + category='other' → office HTML - extensionless parameterized office MIME + category='other' → office HTML - defense check: actual binary 'other' (image/jpeg) still returns null without invoking the office producer * 🛡️ fix: Office types are HTML-or-null (no text fallback → XSS) Codex P1 review on #12934 caught that when `renderOfficeHtml` failed (timeout, malformed file, zip-bomb rejection) for an office type, the extractor fell through to `extractDocument` and returned plain text. The client routes by extension/MIME to the office preview buckets and feeds `attachment.text` straight into the Sandpack iframe's `index.html`. A spreadsheet cell or document body containing the literal string `<script>alert(1)</script>` would have been injected as executable markup — direct XSS. The contract for office types is now HTML-or-null with no text fallback.
Failed render returns null, the client's empty-text gate keeps the artifact off the panel, and the file falls back to the regular download UI (matching what PPTX already did). PDF and ODT still go through `extractDocument` because the client routes them to PLAIN_TEXT (which the markdown viewer escapes) or no artifact at all, so plain text is safe there. Test reshuffle: - `document` describe block now uses ODT/PDF for the legacy parseDocument-path tests (DOCX/XLSX/XLS/ODS bypass that path). - New "does NOT call parseDocument for office HTML types" test locks in the SEC contract for all four office HTML buckets. - "falls back to ..." tests rewritten as "returns null when ..." with explicit `parseDocumentCalls.length === 0` assertions to prove no text leaks back to the client. - New XSS regression test for the XLSX failure path. - Mock parseDocument failure-name match relaxed to `includes()` so ODT-named tests can use the same trigger. * 🧽 chore: Address follow-up review findings on PR #12934 Wraps up the 10-finding follow-up review. Two MAJOR + four MINOR + two NIT addressed; one NIT skipped after verifying it was a misread of the package.json structure. MAJOR - #1: Rewrite `renderOfficeHtml` JSDoc to document the HTML-or-null contract explicitly. The pre-fix doc described a text-fallback path that was the original XSS vector (commit b06f08a). A future maintainer trusting the stale doc could reintroduce the fallback. - #2: Replace byte-truncation of office HTML with a small "preview too large" banner document. Cutting at a UTF-8 boundary lands mid-tag (`<table><tr><td>con\n…[truncated]`) and ships malformed markup to the iframe — unpredictable rendering, occasional broken layouts on DOCX with embedded images / wide spreadsheets. MINOR - #4: Wrap `readSlidesFromZip`'s `zipfile.close()` in try/catch so a close-time exception (mid-flight stream) doesn't replace the original error. Mirrors the defensive pattern in zipSafety.ts. 
- #5: Refactor PPTX extraction to use `yauzl.fromBuffer` directly, eliminating the temp-file write/unlink the safety pre-flight already proved unnecessary. Removes 4 unused imports (os, path, fs/promises, randomUUID). - #6: Extract `isPreviewOnlyArtifact(type)` to `client/src/utils/artifacts.ts` so the membership check is unit-testable without mounting the full Artifacts component (Recoil + Sandpack + media query). 15 new test cases covering positive types, negative types, null/undefined, and unknown strings. NIT - #3: Remove dead `stripColorStyles` / `COLOR_PROPERTY_PATTERN` — unused (sanitizer's `allowedStyles` config handles color implicitly). - #7: Remove dead `!_lc_csv_label` worksheet property write. - #9: Remove no-op `exclusiveFilter: () => false` sanitize-html config. - #10: Type-narrow `PREVIEW_ONLY_ARTIFACT_TYPES` to `ReadonlySet<ToolArtifactType>` so the membership table is compile-time checked against the enum. SKIPPED - #8: Reviewer flagged `sanitize-html` as duplicated in devDeps and dependencies. The package has no `dependencies` section — only `devDependencies` and `peerDependencies`. Existing convention (mammoth, xlsx, yauzl, pdfjs-dist) is to appear in BOTH. Removing the devDep entry would break local test runs. Tests: packages/api 4406/4406, client artifacts 128/128. * 🪞 chore: Fix isPreviewOnlyArtifact test description parameter order Follow-up review nit on PR #12934. Jest's `it.each` substitutes `%s` positionally, and the table rows were `[type, expected]` while the description template read `'returns %s for type %s'` — outputting "returns application/vnd.librechat.docx-preview for type true" instead of the intended "type ... returns true". Reorder the template to match the column order. Test runner output now reads naturally: "type application/vnd.librechat.docx-preview returns true". Pure cosmetic — runtime behavior unchanged.
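The extracted membership check and the type-narrowed table might look roughly like the sketch below. Only `application/vnd.librechat.docx-preview` appears in this log. The other two type strings are hypothetical placeholders, and a string-literal union stands in for the real `ToolArtifactType` enum.

```typescript
// Stand-in for the real enum; only the docx-preview value comes from the log.
type ToolArtifactType =
  | 'application/vnd.librechat.docx-preview'
  | 'application/vnd.librechat.spreadsheet-preview' // hypothetical
  | 'text/plain'; // hypothetical type that is not preview-only

// Typing the set against the union makes a misspelled member a compile error.
const PREVIEW_ONLY_ARTIFACT_TYPES: ReadonlySet<ToolArtifactType> = new Set<ToolArtifactType>([
  'application/vnd.librechat.docx-preview',
  'application/vnd.librechat.spreadsheet-preview',
]);

/** Pure membership check; accepts arbitrary input so callers need no pre-validation. */
function isPreviewOnlyArtifact(type: string | null | undefined): boolean {
  if (type == null) {
    return false;
  }
  return (PREVIEW_ONLY_ARTIFACT_TYPES as ReadonlySet<string>).has(type);
}

// Rows in [type, expected] order, matching the reordered description template.
const rows: Array<[string | null | undefined, boolean]> = [
  ['application/vnd.librechat.docx-preview', true],
  ['text/plain', false],
  ['not-a-real-type', false],
  [null, false],
  [undefined, false],
];
for (const [type, expected] of rows) {
  console.log(`type ${String(type)} returns ${String(expected)}:`, isPreviewOnlyArtifact(type) === expected);
}
```

Because the predicate is a plain function over a constant set, it can be tested without mounting any component, which is the point of finding #6.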
* ✨ feat: Improve DOCX rendering and surface filename in panel header Two UX improvements based on hands-on use of the office preview pipeline. DOCX rendering — mammoth strips the navy banners, cell shading, and column layouts that direct-formatted docs apply (python-docx-style output is a common case). The flat `<p><strong>X</strong></p>` and bare `<table><tr><td>` it emits look washed out next to the source. Three targeted compensations: - Style map promotes `Title`, `Subtitle`, `Heading 1` thru `Heading 6`, and `Quote` paragraphs to their semantic HTML equivalents (mammoth's default only handles Heading 1-6, missing Title/Subtitle/Quote). - Extra CSS scoped to `.lc-docx` gives the first table row sticky-looking header styling regardless of `<thead>` (mammoth never emits `<thead>`), adds zebra striping, and treats the python-docx `<p><strong>X</strong></p>` section-heading idiom as a pseudo-h2 with a thin accent left border so document structure survives the round trip. Headings get a left accent or underline so they read as headings instead of just bold paragraphs. - Sanitizer's `allowedAttributes` opens `class` on the heading and block tags the styleMap and CSS heuristics rely on. `<script>`, event handlers, javascript: URLs, etc. are still stripped — the existing security regression suite catches any drift. Panel header — `Artifacts.tsx` showed a generic "Preview" pill for preview-only artifacts. Single-tab Radio is a no-op; surfacing the document filename there gives the user something useful in the chrome without taking real estate. `displayFilename` handles the sandbox dotfile suffix the upload pipeline applies. Tests: html.spec.ts +1 (new CSS-emission lock), 71/71. Backend files suite 428/428. Client 308/308. * ✨ feat: High-fidelity DOCX preview via docx-preview in iframe Switch the default DOCX render path from server-side mammoth → flat HTML to client-side `docx-preview` loaded inside the Sandpack iframe.
Mammoth becomes the fallback for files above the cap. Why --- The Sandpack iframe is a real browser DOM. Server-side rendering ceiling for DOCX→HTML is well below the source's visual fidelity — mammoth strips cell shading, run colors, banners, and column layouts because Word's layout model doesn't fit HTML's flow model. Pushing the render into the iframe lifts that ceiling without paying the server-side cost of jsdom or LibreOffice. What ---- - New `wordDocToHtmlViaCdn(buffer)` builds a self-contained HTML doc that embeds the binary as base64 and lets `docx-preview@0.3.7` render it on load. CSS preserves dark/light mode handoff via `prefers-color-scheme`. Bootstrap script falls back to a "preview unavailable, please download" message if the CDN is unreachable or the parse throws. - `docx-preview` and its `jszip` peer dep are pinned to specific versions on jsdelivr with SRI sha384 integrity hashes and `crossorigin="anonymous"`. Refresh: re-fetch the file, run `openssl dgst -sha384 -binary FILE | openssl base64 -A`. - CSP locked down on the iframe: `default-src 'none'`, scripts only from jsdelivr (no eval), `connect-src 'none'` so a parser bug in docx-preview can't be turned into exfiltration of the embedded document, `base-uri 'none'`, `form-action 'none'`. Defense in depth on top of the Sandpack cross-origin sandbox. - `wordDocToHtml` dispatches by size: ≤ 350 KB binary → CDN path (high fidelity), larger → mammoth fallback (preserves the size cap on `attachment.text`). 350 KB chosen so worst-case base64-inflated output (~478 KB) plus wrapper overhead (~5 KB) fits under MAX_TEXT_CACHE_BYTES (512 KB) with 40 KB headroom. - Internal renderers exported as `_internal` for tests. Public API unchanged — callers still go through `wordDocToHtml`.
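The 350 KB threshold can be checked with a few lines of arithmetic. A minimal sketch using the figures quoted above; `MAX_DOCX_CDN_BINARY_BYTES` and `WRAPPER_OVERHEAD_BYTES` are assumed names for illustration (only `MAX_TEXT_CACHE_BYTES` is named in the message), and the ~5 KB wrapper size is the message's own estimate:

```typescript
// Values quoted in the commit message. The first and third names are assumptions.
const MAX_DOCX_CDN_BINARY_BYTES = 350 * 1024; // 358,400 bytes
const MAX_TEXT_CACHE_BYTES = 512 * 1024; // 524,288 bytes
const WRAPPER_OVERHEAD_BYTES = 5 * 1024; // "~5 KB" estimate from the message

// Padded base64 emits 4 output characters for every 3 input bytes, rounded up.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

const worstCaseBase64 = base64Length(MAX_DOCX_CDN_BINARY_BYTES);
const headroom = MAX_TEXT_CACHE_BYTES - worstCaseBase64 - WRAPPER_OVERHEAD_BYTES;

console.log(worstCaseBase64); // 477868
console.log(headroom); // 41300
```

The 477,868-byte result is the "~478 KB" figure (counted in thousands of bytes), and the remaining 41,300 bytes is roughly the 40 KB of headroom the message cites.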
PPTX intentionally NOT switched ------------------------------- Surveyed the available client-side PPTX libraries: - `pptx-preview@1.0.7` ships an ESM-only main entry plus a 1.36 MB UMD that references `require("stream"/"events"/"buffer"/"util")` — bundled for Node, not browser-clean. Could work but the runtime references to undefined Node globals are a fragility risk worth more validation than this PR can absorb. - `pptxjs` is jQuery-era, requires four separate UMD scripts in a specific order, less actively maintained. - The honest answer for PPTX is the LibreOffice sidecar (DOCX/XLSX/PPTX → PDF → PDF.js), which is the architecture every major product (Google Drive, Claude.ai, ChatGPT) effectively uses and the only path to ~5/5 fidelity for arbitrary user decks. PPTX stays on the existing slide-list extraction for now. Open a follow-up issue for the LibreOffice/Gotenberg sidecar. Tests ----- - 6 new in CDN-rendered describe block: wrapper structure, base64 round-trip, SRI integrity + crossorigin, CSP locks (connect-src/eval/base-uri/form-action), fallback message wiring, size-threshold lock. - Adjusted 2 existing tests that asserted on mammoth-path artifacts (literal document text in `<article class="lc-docx">`) — those assertions move to the mammoth-fallback test that calls `_internal.wordDocToHtmlViaMammoth` directly. Dispatcher tests now assert CDN-path signatures instead. packages/api files: 434/434 ✅, full unit suite 4473/4473 ✅. * 🧷 fix: Address Codex P1 (MIME aliases) + P2 (CDN dependency) Two follow-up review findings on PR #12934, both real. P1 — Spreadsheet MIME aliases on client ---------------------------------------- Backend's `officeHtmlBucket` uses the broad `excelMimeTypes` regex from `librechat-data-provider` (covers `application/x-ms-excel`, `application/x-msexcel`, `application/msexcel`, `application/x-excel`, `application/x-dos_ms_excel`, `application/xls`, `application/x-xls`, plus the canonical sheet MIMEs).
The client's exact-match `MIME_TO_TOOL_ARTIFACT_TYPE` only had three of those, so an extensionless XLS upload with a legacy MIME would have backend HTML produced but the client would fail to route the artifact at all — preview chip never registers. Fix: import the same regex on the client and add it as a fallback in `detectArtifactTypeFromFile` after the exact-match map miss. Stays in lock-step with the backend automatically. 7 new test cases — one per legacy alias. P2 — Hard CDN dependency on jsdelivr ------------------------------------- Air-gapped / corporate-filtered networks where jsdelivr is unreachable would see DOCX previews permanently degrade to "Preview unavailable" because the iframe could never load the renderer scripts. Mammoth was sitting right there on the server but the dispatcher always preferred the CDN path for files under 350 KB. Fix: `OFFICE_PREVIEW_DISABLE_CDN` env var. When truthy (`1`, `true`, `yes`, case-insensitive, whitespace-trimmed), `wordDocToHtml` short-circuits to the mammoth path regardless of file size. Operators on filtered networks set the env var; default behavior is unchanged. Read at function-call time (not module load) so jest can flip it in `beforeEach` without `jest.resetModules()`. The cost is one property access per render. 12 new test cases: env-unset uses CDN (default), all five truthy forms force mammoth, five non-truthy forms (`false`/`0`/`no`/empty/ arbitrary string) leave CDN active. Tests ----- packages/api/src/files: 446/446 ✅ (was 434, +12 from env-var matrix). client artifact suites: 235/235 ✅ (was 228, +7 from MIME aliases). * ✨ feat: High-fidelity PPTX preview via pptx-preview in iframe Mirrors the DOCX CDN architecture for PPTX: small files (≤350 KB binary) embed as base64 and render via `pptx-preview` loaded from jsdelivr inside the Sandpack iframe. Larger files and air-gapped deployments fall back to the existing slide-list extraction. 
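The `OFFICE_PREVIEW_DISABLE_CDN` parsing rules described for the P2 fix above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `isTruthyFlag` and `cdnDisabled` are hypothetical names, and the environment is passed in as a plain record so the value is read on every call rather than captured at module load:

```typescript
// Accepted truthy spellings from the commit message, compared after trim + lowercase.
const TRUTHY_VALUES = new Set(['1', 'true', 'yes']);

// Hypothetical helper: unset, empty, and any other string are all non-truthy.
function isTruthyFlag(raw: string | undefined): boolean {
  if (raw === undefined) {
    return false;
  }
  return TRUTHY_VALUES.has(raw.trim().toLowerCase());
}

// Hypothetical gate: looks the variable up at call time, so a test can change
// the record between calls without reloading the module.
function cdnDisabled(env: Record<string, string | undefined>): boolean {
  return isTruthyFlag(env['OFFICE_PREVIEW_DISABLE_CDN']);
}

console.log(cdnDisabled({ OFFICE_PREVIEW_DISABLE_CDN: ' TRUE ' })); // true
console.log(cdnDisabled({ OFFICE_PREVIEW_DISABLE_CDN: 'no' })); // false
console.log(cdnDisabled({})); // false
```

In the real code the record would presumably be `process.env`; passing it as a parameter here only keeps the sketch self-contained.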
Why --- PPTX is the format where the gap between LibreChat's preview and Claude.ai-style previews was most visible (slide-list of bullet points vs. rendered slide layouts). LibreOffice → PDF → PDF.js is still the eventual gold-standard answer for PPTX fidelity, but client-side rendering inside the Sandpack iframe gets us a meaningful intermediate step (~1.5/5 → ~3.5/5) without a sidecar. What ---- - `pptx-preview@1.0.7` (ISC license, ~1.36 MB UMD bundle that includes its echarts/lodash/uuid/jszip/tslib deps inline). Pinned to a specific version on jsdelivr with SHA-384 SRI and `crossorigin="anonymous"`. - `buildPptxCdnDocument` mirrors the DOCX wrapper: same CSP locks (`default-src 'none'`, `connect-src 'none'`, no eval, no base/form tampering), same `id="lc-doc-data"` base64 slot, same fallback message wiring (`typeof pptxPreview === 'undefined'` → "Preview unavailable"). - New public `pptxToHtml(buffer)` dispatcher; `bufferToOfficeHtml` switches its `'pptx'` case to call it. `pptxToSlideListHtml` stays exported as the slide-list-only path (still hit by tests directly and by the dispatcher fallback). - `OFFICE_PREVIEW_DISABLE_CDN=true` env-var hatch applies to PPTX too — air-gapped operators get the slide-list path. Same env-var read at call time, same matrix of truthy values (`1` / `true` / `yes` / case-insensitive / whitespace-trimmed). - `_internal` re-exports moved to after the PPTX section since the PPTX internals live further down in the file. Adds `pptxToHtmlViaCdn`, `MAX_PPTX_CDN_BINARY_BYTES`, `PPTX_PREVIEW_CDN`. Honest caveats -------------- - The 1.36 MB UMD bundle has `require("stream"/"events"/"buffer"/"util")` references in its outer wrapper. Those are bundled-dep artifacts (likely from `tslib` / Node-shim transforms) and don't appear to execute on the browser code paths, but I haven't done manual e2e on a wide range of decks.
If a class of files turns up that breaks rendering, the iframe-side fallback message catches it and operators have `OFFICE_PREVIEW_DISABLE_CDN=true` as the bail. - First-render CDN fetch is ~1.36 MB (browser-cached after). - PPTX with embedded media easily exceeds the 350 KB binary cap; those files take the slide-list path. Lifting the cap is a follow-up (tied to the broader self-hosting work). Tests ----- 11 new in two new describe blocks: - `pptxToHtml dispatcher`: routing predicate (small → CDN, env-set → slide-list). - `CDN-rendered path`: base64 round-trip, SRI integrity + crossorigin, CSP locks (connect/eval/base/form), fallback message, size-threshold lock at 350 KB. - `OFFICE_PREVIEW_DISABLE_CDN escape hatch`: env-var matrix for truthy values. packages/api/src/files: 457/457 ✅ (was 446, +11). * 🪟 fix: DOCX preview fills the artifact panel width docx-preview defaults to rendering at the document's native page width (8.5in for letter, 21cm for A4). In a wide artifact panel that left whitespace on either side; in a narrow one it forced horizontal scroll. Two changes: - Pass `ignoreWidth: true` to `docx.renderAsync` so the library skips the document's pageSize width and uses its container's width. - Defensive CSS overrides on `.docx-wrapper` and `.docx-wrapper > section.docx` in case a future library version regresses on the option, plus `padding: 0` on the wrapper to drop the page-edge whitespace docx-preview otherwise reserves. `renderHeaders`/`renderFooters`/etc. stay enabled — those still appear in the rendered output, just inside a container that fills the panel instead of a fixed-width "page." Tests unchanged (100/100); manual e2e ahead of merge. * 🩹 fix: PPTX black screen — allow blob: workers + harden bootstrap Manual e2e of the PPTX CDN renderer surfaced a black screen with "Could not establish connection. Receiving end does not exist." unhandled-rejection — characteristic of a Web Worker that couldn't start. 
Root cause: pptx-preview's bundled echarts dep spins up Web Workers via blob: URLs for chart rendering. Our CSP had `default-src 'none'` and no `worker-src`, so workers fell back to default → blocked. The async failure deep inside echarts didn't surface through the outer `previewer.preview()` promise, so my bootstrap's `.catch` never fired, the loading state was removed, and the iframe sat with the body background showing through (dark navy in dark mode = "black screen"). Three changes: - Add `worker-src blob:` to the PPTX CSP. Allows blob:-only worker creation without permitting arbitrary worker URLs. - Bootstrap: window-level `unhandledrejection` and `error` listeners so rejections from inside bundled-dep async pipelines surface as the user-facing "Preview unavailable" fallback instead of going silent. - Bootstrap: 8-second timeout that checks `container.children.length` — if the renderer hasn't appended anything visible by then, assume silent failure and show the fallback. Also wipe `container.innerHTML` when showing the fallback so a partial render doesn't compete with the message. DOCX wrapper unchanged: docx-preview doesn't use workers, so the worker-src directive doesn't apply, and the existing fallback path already covers its failure modes. Tests ----- - Existing PPTX CSP test now also asserts `worker-src blob:` is present. - Existing fallback-message test extended to cover the new unhandledrejection/error/timeout listeners. packages/api/src/files: 467/467 ✅. * 🔒 fix: gate office HTML routing on backend trust flag (textFormat) Codex P1 review on PR #12934: routing .docx/.csv/.xlsx/.xls/.ods/.pptx into the office preview buckets assumed `attachment.text` was already sanitized full-document HTML, but that guarantee only existed for the new code-output extractor path. Existing stored attachments and other non-code paths can still carry plain extracted text — `useArtifactProps` would then inject that as `index.html` inside the Sandpack iframe. 
Adds a `textFormat: 'html' | 'text' | null` trust flag persisted on the file record by the code-output extractor, surfaced over the SSE attachment payload and the TFile API type. The client's routing in `detectArtifactTypeFromFile` requires `textFormat === 'html'` before landing on an office HTML bucket; everything else (legacy attachments, RAG-extracted plain text from `parseDocument`, explicitly-marked 'text' entries) falls back to the PLAIN_TEXT bucket where the markdown viewer escapes content rather than executing it. Tests: new `getExtractedTextFormat` helper has 14 cases covering all office paths, legacy XLS MIME aliases, parseDocument fallthroughs, and null-input. Client `artifacts.test.ts` adds three security-gate tests proving downgrade behavior for missing/null/'text' textFormat, plus a `fileToArtifact` test that legacy office attachments without the flag end up in PLAIN_TEXT with their content escaped. * 🌐 fix: air-gapped DOCX preview — embed mammoth fallback in CDN doc Codex P2 review on PR #12934: the CDN-rendered DOCX path always pulled docx-preview + jszip from cdn.jsdelivr.net. Air-gapped or corporate-filtered networks where jsdelivr is blocked would degrade to a static "Preview unavailable" message even though the server already had a local mammoth renderer that could produce readable output. Now the dispatcher renders mammoth first and embeds the sanitized output inside the CDN document as a hidden `#lc-fallback` block. The iframe's existing `typeof docx === 'undefined'` check (which fires when the CDN scripts can't load) un-hides the fallback so the user sees a real preview. CDN-success path is unchanged: high-fidelity docx-preview output owns the viewport, mammoth fallback stays hidden. Two new safeguards in the dispatcher: - Size budget: if base64(binary) + mammoth body + wrapper > 512 KB (the `attachment.text` cache cap), drop to mammoth-only so a giant document still renders.
The `OFFICE_HTML_OUTPUT_CAP` constant mirrors `MAX_TEXT_CACHE_BYTES` from extract.ts (separate constant to avoid a circular import; pinned by a unit test). - `lc-render` is hidden when fallback shows so the empty padded slot doesn't sit above the mammoth content. Tests: existing CDN-path tests updated for the new `wordDocToHtmlViaCdn(buffer, mammothBody)` signature; new test for the embedded fallback structure (`#lc-fallback`, mammoth body content, "High-fidelity renderer unavailable" notice, render-slot hide); new constant pin and per-fixture cap-respect assertion. * 🧪 feat: LibreOffice → PDF preview path (POC, opt-in via env) Per the plan-mode discussion: prove out a LibreOffice subprocess pipeline as an alternative to the docx-preview / pptx-preview CDN renderers. LibreOffice handles every office format Microsoft and LibreOffice itself can open (DOCX, PPTX, XLSX, ODT, ODP, ODS, RTF, many more), produces a PDF, and the host browser's built-in PDF viewer renders it inside the Sandpack iframe via a `data:` URI. No client-side JS dependency, no CDN dependency, true high fidelity for any feature LibreOffice supports. Off by default. Operators opt in by setting both: - `OFFICE_PREVIEW_LIBREOFFICE=true` - LibreOffice (`soffice` or `libreoffice`) on the server's `$PATH` When either is missing, the dispatcher falls through to the existing CDN/mammoth/slide-list pipeline so a misconfiguration doesn't break previews. 
Hardening (`packages/api/src/files/documents/libreoffice.ts`): - Fresh subprocess per call with isolated temp dir, stripped env (PATH/HOME/TMPDIR only), and `-env:UserInstallation` so concurrent conversions can't collide on shared `~/.config/libreoffice` locks - 30-second wall-time cap; SIGKILL on timeout - 50 MB PDF output cap to bound disk pressure - 512 KB output cap on the wrapped HTML so the SSE/cache contract stays intact (base64 inflates ~33%, effective PDF cap ~380 KB) - Macros disabled by default flags (`--norestore --invisible --nodefault --nofirststartwizard --nolockcheck`) - Tag-distinct `LibreOfficeUnavailableError` / `LibreOfficeConversionError` so callers can swallow appropriately Iframe wrapper (`buildPdfEmbedDocument`): - Native browser PDF viewer via `<iframe src="data:application/pdf;base64,...">` — works in Chrome, Edge, Safari, Firefox - CSP locks the iframe to `default-src 'none'; frame-src data:; connect-src 'none'; script-src 'unsafe-inline'` — no outbound network, no eval, no external scripts - `#view=FitH` for first-paint sizing - 4-second heuristic timer that swaps to a "Preview unavailable" fallback when the browser's PDF viewer is disabled (kiosk mode, Brave Shields, etc.) Wired into `wordDocToHtml` and `pptxToHtml` as the first branch — returns null when disabled / unavailable / oversized so the existing pipeline takes over. XLSX intentionally NOT routed through this path: SheetJS's HTML output is already excellent for spreadsheets (sortable, sticky headers) and PDF rendering of sheets is awkward. Tests (`libreoffice.spec.ts`, 30 cases — 25 always run, 5 conditional on the binary): env-gating parser semantics matching `OFFICE_PREVIEW_DISABLE_CDN`, fallthrough contract (never throws, returns null on any failure), CSP lock-down, fallback structure, binary probe caching + missing-binary path, error tagging, and integration tests that engage when `soffice`/`libreoffice` is on PATH (DOCX→PDF, PPTX→PDF, output-cap fallthrough).
Integration tests skip cleanly on bare CI. * 🩹 fix: CI — preserve legacy download path for empty-text office attachments Two regressions surfaced after the textFormat security gate landed. 1. Client (`LogContent.test.tsx` "falls back to the legacy download branch for an office file with no extracted text"): When the security gate downgraded an office type without `textFormat: 'html'` to PLAIN_TEXT, the lenient empty-text gate on PLAIN_TEXT then accepted a missing `text` field and rendered a half-empty panel card. The historical contract is "office type + no text → legacy download UI"; the downgrade should only fire when there's actual plain text that needs safe-escaping. Fix in `detectArtifactTypeFromFile`: short-circuit to null when the office type lands in the security-gate branch with no text. The PLAIN_TEXT downgrade still fires for legacy attachments that DO carry plain text. 2. API (`process.spec.js` + `process-traversal.spec.js`): the `@librechat/api` mocks didn't expose `getExtractedTextFormat`, so `processCodeOutput` called `undefined(...)` → TypeError → tests got undefined results. Added the helper to both mocks with a faithful default (returns 'text' for non-null extractor output, null otherwise). Tests: new regression in `artifacts.test.ts` pinning the empty-text + no-textFormat → null contract for all four office types (.docx/.csv/.xlsx/.pptx), so a future refactor can't silently re-introduce the half-empty card. * 🩹 fix: PPTX slides scale to fit panel width (no horizontal scroll) Manual e2e on PR #12934: pptx-preview rendered slides at their native init dimensions (960×540 default). The artifact panel is much narrower than that, so the iframe got a horizontal scrollbar and only a corner of each slide showed at any time — the user had to drag-scroll across each slide to read it. 
Fix: keep pptx-preview's init at 960×540 so its internal layout math stays correct, then post-process each rendered slide: - Cache the slide's native width/height on its dataset BEFORE applying any transform (so subsequent re-fits don't measure the already-transformed box). - Wrap the slide in `.lc-slide-wrap` with explicit width/height set inline to the scaled dimensions; the wrap shrinks the layout space the slide occupies. - Apply `transform: scale(panel_width / 960)` to the slide itself with `transform-origin: top left` so the rendered output shrinks from the top-left corner into the wrap. - Cap the scale at 1.0 so small slides don't upscale and get blurry. Streaming + resize: - `MutationObserver` watches the container for slide insertions so streaming renders get scaled on arrival rather than waiting for the entire `previewer.preview` promise to settle. - `ResizeObserver` re-fits all wrapped slides when the iframe resizes (panel drag, window resize). Tests: new "bootstrap wraps + scales each slide" lock in the wrap class, scale computation, observer setup, and native-size caching so a future refactor can't silently re-introduce the overflow. * 🩹 fix: PPTX wrap+scale runs after preview, not during streaming Manual e2e on PR #12934: regenerated PPTX showed "Preview unavailable" in the iframe. Root cause: the MutationObserver I added in the previous commit fired during pptx-preview's render and moved slides out from under the library's references. pptx-preview's async pipeline raised an unhandled rejection, the iframe's window-level listener caught it, and the fallback message replaced the partial render. Fix: drop the MutationObserver. 
Apply the wrap+scale ONCE in a `finalize` step that runs: - On `previewer.preview().then` (the happy path) - On the 8-second timeout safety net IF the container has children (silent-failure path — pptx-preview emitted slides but never resolved its outer promise) To prevent the user from seeing an unscaled flash while pptx-preview renders into the 960px-wide canvas, the container is set to `visibility: hidden` at init and only revealed inside `finalize` after wrap+scale completes. Resize handling stays via `ResizeObserver` on `document.body`, installed AFTER the wrap pass so it doesn't fire during the wrap itself. Tests: regression assertion now also locks in: - `container.style.visibility = 'hidden' / 'visible'` (the flash-prevention contract) - Absence of MutationObserver (the bug we just removed — must NOT creep back in via a future "let's scale during streaming" idea) * 🩹 fix: PPTX slides fill panel width (drop upscale cap, per-slide scale) Manual e2e on PR #12934: slides rendered correctly but didn't fill the artifact panel — whitespace on either side. Two issues: 1. The scale was capped at `Math.min(1, available / SLIDE_W)`. On panels wider than 960px, the cap clamped the scale to 1.0 and slides rendered at native size with whitespace on the sides instead of stretching. 2. The scale was computed against the constant `SLIDE_W = 960`, but pptx-preview can emit slides whose `offsetWidth` differs from the init param if the source PPTX has a non-16:9 layout. Per-slide division of `available / nativeW` handles that case. Fix: replace `computeScale()` with two helpers — `availableWidth()` returns the panel content-box width and `scaleFor(nativeW)` returns the per-slide scale. No upscale cap. The slide content is rendered by pptx-preview against its 960×540 canvas using vector text / canvas — scaling up to e.g. 1500px doesn't visibly degrade quality.
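The difference between the capped and per-slide formulas can be sketched as two pure functions. This is an illustration, not the bootstrap's code: the real `scaleFor(nativeW)` reads the panel width through `availableWidth()`, whereas here the width is passed in as a parameter (an assumption made so the sketch runs without a DOM), and `cappedScale` is a hypothetical name for the old behavior:

```typescript
const SLIDE_W = 960; // pptx-preview init width quoted in the message

// Old behavior: one scale for every slide, clamped so it never exceeds 1.0.
function cappedScale(available: number): number {
  return Math.min(1, available / SLIDE_W);
}

// New behavior: per-slide scale with no upscale cap. Falls back to SLIDE_W
// when the measured native width is 0 (for example, a slide not yet laid out).
function scaleFor(available: number, nativeW: number): number {
  return available / (nativeW || SLIDE_W);
}

console.log(cappedScale(1500)); // 1 (whitespace on a 1500px panel)
console.log(scaleFor(1500, 960)); // 1.5625 (slide stretches to fill it)
console.log(scaleFor(1000, 1280)); // 0.78125 (wider slide gets its own factor)
```

On panels narrower than the slide both formulas agree; they only diverge once the panel is wider than the slide's native width or the slide's width differs from 960.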
Tests: regression now also asserts: - `availableWidth()` and `scaleFor()` exist by name - The exact scale formula `availableWidth() / (nativeW || SLIDE_W)` - Negative assertion that `Math.min(1, ...)` is NOT present, so a future "let's add an upscale cap" rewrite can't silently re-introduce the whitespace. * 🩹 fix: PPTX preview fills panel height (no white gap below slides) Manual e2e on PR #12934: PPTX preview filled the panel width but left empty space below the last slide. DOCX didn't have this issue because its content (mammoth-rendered HTML) flows naturally and either fits exactly or overflows; PPTX slides are fixed-aspect 16:9 and don't grow with the panel. Two changes: 1. Body fills the iframe viewport — `html, body { min-height: 100vh }` plus `body { display: flex; flex-direction: column }` and `#lc-render { flex: 1 0 auto }`. The dark theme bg now fills the iframe even when total slide content is shorter than the panel, so a single-slide deck never reveals a "white below" gap. 2. Per-slide scale honors viewport height — `scaleFor(nativeW, nativeH)` now returns `min(width-fit, height-fit)` (largest factor that fits without overflowing either dimension). On a tall artifact panel with a short deck, slides grow up to the full panel height instead of staying at the width-bound size. Existing height-fit was always considered correct conceptually but the previous implementation only used width-fit, leaving half the viewport unused per slide. Tests: regression now also asserts `availableHeight()`, the `Math.min(sw, sh)` formula, and `min-height: 100vh` are in the bootstrap. Negative assertion for the old `Math.min(1, ...)` upscale cap remains. * 🩹 fix: revert body flex on PPTX bootstrap (caused black-screen render) Manual e2e regression on PR #12934: the previous commit added `body { display: flex; flex-direction: column }` plus `#lc-render { flex: 1 0 auto }` to fill the panel height.
Side effect: pptx-preview's internal layout assumes block flow on its ancestor elements; making body a flex container caused slides to render as solid-black rectangles (sized correctly, but with no visible content inside). Fix: keep just `html, body { min-height: 100vh }` for the bg-fill effect — that alone gives empty space below short decks the dark theme bg without changing flow. Drop the body-flex and the `#lc-render { flex: 1 0 auto }` directives. The height-aware `scaleFor(nativeW, nativeH)` from the same commit stays — it doesn't interact with pptx-preview's layout, just chooses a per-slide scale. Each slide still grows to fit the viewport contain-style. Negative-assertion added to the regression test: `body { display: flex }` must NOT appear in the bootstrap, so a future "let's flex the body to make height work" rewrite can't silently re-introduce this. (Note: the user also flagged DOCX theming as faint body text; I'm leaving that for now per their note that it may be pre-existing. Not addressed in this commit.) * 🩹 fix: revert PPTX height-fill changes; lock DOCX CDN to light scheme Two fixes for separate manual e2e regressions on PR #12934. 1. PPTX black screen (single slide rendering as solid black). The previous fix removed `body { display: flex }` thinking that was the sole cause, but the regression persisted. Bisecting against the last known-good commit (`4e2d538b0`, width-fit only), the actual culprit is the COMBINATION of: - `min-height: 100vh` on html/body - `availableHeight()` reading viewport-derived dimensions - `Math.min(sw, sh)` height-aware scale pptx-preview's CSS injection step interacts unpredictably with these. Reverting to width-only `scaleFor(nativeW)` and dropping the viewport min-height restores reliable rendering. Vertical empty space below short decks now shows the body's bg color (`var(--bg)`) which still matches the panel theme — that's an acceptable trade-off vs. the black-screen regression. 
Negative assertions added: `Math.min(sw, sh)`, `availableHeight`, `min-height: 100vh`, `body { display: flex }` must NOT appear in the bootstrap. So a future "let's fill height" rewrite has to demonstrate it doesn't break pptx-preview before it can land. 2. DOCX body text rendering as faint / translucent grey. docx-preview emits page-style rendering with white pages and the doc's native text colors. The CDN doc declared `color-scheme: light dark`, so on OS dark mode the iframe's inheritable `--fg` resolved to `#e5e7eb` (light grey). docx-preview body text (no explicit color in the source DOCX) inherited that light-grey on the white page bg → barely-visible "translucent" rendering. Fix: declare `color-scheme: light` only in `buildDocxCdnDocument`, drop the dark-mode `@media` override. docx-preview is a light-mode-only renderer; matching that produces correct contrast regardless of OS theme. The mammoth-only `wrapAsDocument` path is unaffected — it owns its own bg + text colors and continues to respect the user's OS scheme. New regression test pins the lock: CDN doc must contain `color-scheme: light`, must NOT contain `color-scheme: light dark`, must NOT contain `prefers-color-scheme: dark`. * 🩹 fix: relax connect-src to allow sourcemap fetches (silence CSP noise) Manual e2e on PR #12934: every time DevTools is open while viewing a DOCX or PPTX preview, the console fills with CSP violations like: Connecting to 'https://cdn.jsdelivr.net/npm/docx-preview@0.3.7/dist/docx-preview.min.js.map' violates the following Content Security Policy directive: "connect-src 'none'". The request has been blocked. The actual rendering isn't affected (sourcemap fetches happen AFTER the script has already loaded and executed via `script-src`), but the noise is enough to make people suspect a real problem and distracts from useful console output. Fix: relax `connect-src` from `'none'` to `'self' https://cdn.jsdelivr.net` in both DOCX and PPTX CDN docs.
This allows: - Same-origin fetches (sandpack-static-server) — covers any bundler-embedded sourcemaps + same-origin runtime fetches the renderer might make - jsdelivr fetches — covers sourcemaps from the CDN where we loaded the script Exfiltration risk stays minimal: the iframe is cross-origin to LibreChat so an attacker can't read application data anyway, and neither 'self' (sandpack-static-server) nor jsdelivr is a useful target for exfiltrating slide content to a host the attacker controls. Tests updated: assertions for `connect-src 'none'` swapped to `connect-src 'self' https://cdn.jsdelivr.net` for both DOCX + PPTX CDN docs. Added negative assertion for wildcard `*` in connect-src so a future "let's allow everything" rewrite can't widen the exfiltration surface. * 🩹 fix: surface PPTX/DOCX fallback reason (inline + console) Manual e2e on PR #12934: "Preview unavailable" appears in the iframe with no way to know what actually failed. The reason was tucked into the fallback element's `title` attribute (hover-only tooltip) — easy to miss and impossible to copy/paste. Now surfaces three ways: 1. Visible inline via a `<details>` element with the reason in monospace, folded so the friendly message stays primary but the diagnostic is one click away in the iframe itself. 2. `title` attribute (preserved) for hover tooltip. 3. `console.error('[pptx-preview] fallback fired:', reason)` so DevTools shows it in red — also the only reliable way to see the reason if the iframe is detached / re-mounted. DOCX gets the same console mirror (as `console.warn` since the fallback there is "high-fidelity unavailable, showing simplified preview" — informational, not error). The DOCX fallback already displays the mammoth-rendered content visibly, so no `<details>` needed there. Tests: regression assertions pin the diagnostic surfacing — the `<details>` element, the `title` write, and the `console.error` call must all be present in the bootstrap.
* 🩹 fix: PPTX CDN embeds slide-list fallback + detects empty renders Manual e2e + DOM inspection on PR #12934: pptx-preview silently produces empty `.pptx-preview-wrapper` placeholders for pptxgenjs-generated decks. The library parses the file enough to create the 960×540 host element with a black bg, then fails to populate it. The outer Promise resolves "successfully" — no throw, no rejection, the bootstrap thinks rendering succeeded — and the user sees a black rectangle with no content and no fallback message. Fix mirrors the DOCX mammoth-fallback pattern from commit `0c0b0ce88`: 1. Server side: `pptxToHtml` now renders the slide-list body (`<ol class="lc-pptx-list">...`) via the new `renderPptxSlidesBody` helper, then embeds it inside the CDN doc via the new `buildPptxCdnDocument(base64, slideListFallbackBody)` signature. Combined-doc size budget mirrors the DOCX pattern: if the CDN doc would exceed `OFFICE_HTML_OUTPUT_CAP` (512 KB), drop to slide-list only. 2. Iframe bootstrap: new `hasRenderedContent()` check after `wrapSlides()` walks each `.lc-slide-wrap` looking for actual child content inside pptx-preview's emitted slide nodes. If every wrap is empty, fires `showFallback('renderer-produced-empty-wrappers ...')` which reveals the embedded slide-list view instead of the previous static "Preview unavailable" message. 3. CSS: slide-list rules extracted to `PPTX_SLIDE_LIST_CSS` constant so they can be inlined into both the standalone slide-list document AND the CDN doc's `<style>` block (CSP `style-src` is `'unsafe-inline'` only — no external sheets). `renderPptxSlidesHtml` now delegates to `renderPptxSlidesBody` wrapped in `wrapAsDocument` — single source of truth for the slide markup.
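The combined-document size budget from item 1 can be sketched as a small selection function. A minimal illustration, assuming the cap is compared against the UTF-8 byte length of the finished CDN document; `utf8Bytes` and `pickPptxDocument` are hypothetical names, and only `OFFICE_HTML_OUTPUT_CAP` and its 512 KB value come from the message:

```typescript
const OFFICE_HTML_OUTPUT_CAP = 512 * 1024; // 512 KB, as quoted in the message

// Byte length of the string once UTF-8 encoded (string length alone would
// undercount any non-ASCII slide text).
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Hypothetical selector: keep the high-fidelity CDN document while it fits
// the cap, otherwise drop to the slide-list document on its own.
function pickPptxDocument(
  cdnDoc: string,
  slideListDoc: string,
  cap: number = OFFICE_HTML_OUTPUT_CAP,
): string {
  return utf8Bytes(cdnDoc) > cap ? slideListDoc : cdnDoc;
}

const small = pickPptxDocument('<html>cdn</html>', '<html>list</html>');
const oversized = pickPptxDocument('x'.repeat(600 * 1024), '<html>list</html>');

console.log(small); // <html>cdn</html>
console.log(oversized); // <html>list</html>
```

Measuring the finished document, rather than estimating from the binary size, accounts for the embedded fallback body and wrapper in the same check.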
Tests (506 passing, +1 vs before): existing `pptxToHtmlViaCdn` call sites updated for the new fallback-body argument; new regression test pins `hasRenderedContent`, the `renderer-produced-empty-wrappers` reason string, the embedded fallback structure, and the inlined slide-list CSS. * fix: Detect Empty PPTX Preview Slides * 🩹 fix: LibreOffice PDF embed uses blob: URL (Chrome blocks data: PDFs) Manual e2e on PR #12934: enabling `OFFICE_PREVIEW_LIBREOFFICE=true` on a host with `soffice` installed surfaced "This page has been blocked by Chrome" inside the PDF preview iframe. Root cause: Chrome blocks `data:application/pdf;base64,...` navigations inside sandboxed iframes (anti-phishing measure since Chrome 76, see crbug.com/863001). The Sandpack iframe IS sandboxed (its `sandbox="..."` attribute lacks `allow-top-navigation` for data: URLs specifically), so when our inner `<iframe src="data:application/pdf;...">` tries to navigate, Chrome's interstitial fires and renders the "blocked" message. Fix: switch from `data:` URL to `blob:` URL. The bootstrap now: 1. Reads the base64 payload from a `<script type="application/octet-stream;base64">` data block (same pattern as the DOCX and PPTX wrappers). 2. Decodes via `atob` + `Uint8Array.from`. 3. Creates a `Blob` with `type: 'application/pdf'`. 4. `URL.createObjectURL(blob)` produces a same-origin blob: URL. 5. Sets `pdfFrame.src = url + '#view=FitH'` — Chrome treats blob: URLs as legitimate navigation and serves the built-in PDF viewer. CSP updated: `frame-src blob:` (was `frame-src data:`). `data:` is now explicitly NOT allowed in `frame-src` since Chrome would block it anyway in our context — keeping it would be misleading documentation. Bonus: failure paths now log to `console.error` with a `[libreoffice-pdf]` prefix so DevTools surfaces blob-creation failures and PDF-viewer load timeouts in red. 
Tests updated: - "emits a complete sandboxed HTML document" now asserts the data-block + blob URL construction (not the old data: URL). - New CSP test "allows blob: in frame-src (NOT data:)" with both positive and negative assertions to lock in the change. - Integration test for `tryLibreOfficePreview` updated to look for the data block + `URL.createObjectURL` instead of the data: URL. - Large-payload test now verifies the data block round-trip rather than data: URL escaping (base64 alphabet has no characters that break out of `<script>` anyway). * 🩹 fix: LibreOffice PDF embed renders via pdf.js (Chrome blocks blob: PDFs too) Manual e2e on PR #12934 round 2: switching from `data:` to `blob:` URLs (commit `d90f26c11`) didn't fix the "This page has been blocked by Chrome" interstitial. Chrome blocks BOTH data: AND blob: PDF navigations inside sandboxed iframes — the built-in PDF viewer requires a top-level browsing context. The Sandpack host iframe is sandboxed, so neither approach works. Fix: switch from native browser PDF viewer to pdf.js (Mozilla's pdfjs-dist) loaded from CDN. pdf.js renders to `<canvas>` which works in any context — no plugin, no privileged viewer, no top-level requirement. ~1 MB CDN load is acceptable for a path that's already opt-in via `OFFICE_PREVIEW_LIBREOFFICE=true`. 
Implementation: - Pin pdf.js v3.11.174 (single-file UMD; v4+ uses ES modules which complicate the load + SRI flow) - Worker URL pointed at the same jsdelivr origin; CSP `worker-src https://cdn.jsdelivr.net blob:` allows it - DPR-aware canvas rendering: scale based on `panelWidth / page.viewport.width * devicePixelRatio` so retina displays get crisp pixels - Sequential page rendering (Promise chain) so a many-slide PDF doesn't spawn N parallel render jobs - 15 s timeout safety net (was 4 s for the native viewer; pdf.js with DPR=2 on a many-page PDF can take longer) CSP changes: - Added `script-src https://cdn.jsdelivr.net 'unsafe-inline'` (was inline-only) - Added `worker-src https://cdn.jsdelivr.net blob:` - Removed `frame-src` entirely (no nested iframes) - Removed `object-src` (no `<object>`/`<embed>` either) Same diagnostic surfacing as the other CDN paths: failure reasons shown via `<details>` disclosure inline + `console.error` to DevTools. Tests updated: PDF.js script presence, GlobalWorkerOptions setup, canvas render path, all the new failure detection paths. Negative assertions for both `data:application/pdf` and `blob:...application/pdf` so a future "let's just try the native viewer again" rewrite can't silently re-introduce the Chrome block. SRI hashes intentionally omitted (unlike docx-preview / pptx-preview) — operator opted in by setting the env flag and trusts the LibreOffice render pipeline. Worth adding once the path is proven in production. * 🧹 cleanup: trim unused _internal exports + stale JSDoc references After the LibreOffice + pdf.js path proved out, swept the office HTML modules for dead code and stale documentation. Unused `_internal` exports removed (`html.ts`): - `renderMammothBody` — only called within the file (by `wordDocToHtmlViaMammoth` and `wordDocToHtml`), never imported by tests. - `DOCX_PREVIEW_CDN` — internal config constant, never referenced. - `PPTX_PREVIEW_CDN` — same, never referenced. 
The remaining `_internal` surface (`wordDocToHtmlViaCdn`, `wordDocToHtmlViaMammoth`, `pptxToHtmlViaCdn`, `MAX_DOCX_CDN_BINARY_BYTES`, `MAX_PPTX_CDN_BINARY_BYTES`, `OFFICE_HTML_OUTPUT_CAP`) is all actively used by the spec file. Stale JSDoc fixed (`libreoffice.ts`): Module-level header still claimed we "embed the PDF as a base64 data:application/pdf URI" and "rely on the host browser's built-in PDF viewer". Both untrue after the pdf.js switch in commit `b2cc81ad8`. Updated to: - Describe the actual pipeline: PPTX → soffice → PDF → pdf.js → canvas - Document the dead-end iterations (data: blocked, blob: also blocked, pdf.js works) so future readers don't re-discover the same Chrome PDF-viewer-in-sandboxed-iframe limitation - Drop "(POC)" tag — the path is production-quality, just opt-in - Adjust disk footprint estimate (250-350 MB with `--no-install-recommends` is more accurate than the 500 MB original) No production code changes; tests still 505 passing. * ✨ feat: per-format LibreOffice opt-in (env value accepts format list) Manual e2e on PR #12934: enabling `OFFICE_PREVIEW_LIBREOFFICE=true` forces both DOCX and PPTX through the LibreOffice path. DOCX renders ~instantly via docx-preview and rarely needs the LibreOffice treatment; paying the ~2-3 s cold-start there hurts UX without adding much. Solution: extend the env var to accept three forms: - Truthy (`true`/`1`/`yes`): all formats — backwards compatible with the previous behavior - Falsy (`false`/`0`/`no`/empty/unset): no formats — default - Comma-separated list (`pptx`, `pptx,docx`): just those formats Practical guidance documented in the module header: most operators will set `OFFICE_PREVIEW_LIBREOFFICE=pptx` — pptx-preview chokes on pptxgenjs decks and the slide-list fallback loses formatting, so LibreOffice is the only path that produces a faithful PPTX preview. DOCX is well-served by docx-preview's existing CDN renderer. 
API: - New `isLibreOfficeEnabledFor(format)` is the per-format gate, used by `tryLibreOfficePreview` to short-circuit before doing work. - Existing `isLibreOfficeEnabled()` retained for "any format enabled" diagnostic checks (returns true if at least one format is opted in). - Internal `parseLibreOfficeEnablement` returns `'all' | Set | null` — keeps the gate future-proof: adding a new format to the LibreOffice route doesn't require operators to re-enumerate their env value. Edge cases handled: - Whitespace-tolerant: ` pptx , docx ` works - Case-insensitive on both env value AND format name - Empty list entries dropped: `pptx, ,docx` enables pptx + docx - Empty string treated as unset (not as a valid empty list) Tests: 21 new cases pinning the parse semantics + per-format gate (`pptx` env vs `docx` lookup → false, etc.). Existing `isLibreOfficeEnabled` tests retained but renamed to clarify the "any format" semantic. Total file tests: 526 passing (+21 vs before). * 🔒 fix: officeHtmlBucket only does MIME fallback when extension is empty Codex P2 review on PR #12934: the server's `officeHtmlBucket` falls back to MIME whenever the extension isn't an OFFICE extension. The client's `detectArtifactTypeFromFile` is stricter — it routes by extension first for ANY known extension (`.txt` → PLAIN_TEXT, `.md` → MARKDOWN, `.py` → CODE, etc.), only falling back to MIME when the extension is unknown. Mismatch case: `notes.txt` shipped with `Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document`. Server runs `officeHtmlBucket` → extension `.txt` not office → MIME fallback → 'docx' → produces full HTML, sets `textFormat: 'html'`. Client routes by extension to PLAIN_TEXT (extension wins), markdown viewer escapes the HTML, user sees raw `<html>...` markup instead of the rendered preview. Fix: server only falls back to MIME when extension is genuinely empty (extensionless filename). 
Symmetric with the client's "extension wins for any known extension" semantic — neither will mis-route. Trade-off: a true DOCX renamed to `myfile.bin` with the canonical DOCX MIME no longer routes through office HTML on the server. The client would have routed to the office bucket via MIME, then the security gate (`textFormat !== 'html'`) would have downgraded to PLAIN_TEXT anyway. So the user-visible outcome is the same (raw bytes via PLAIN_TEXT) — the new behavior just avoids producing HTML that the client would never use. Long-term fix: share the extension routing table in data-provider so both server and client query the same source of truth. Out of scope for this PR. Tests: new 8-case `it.each` block in `officeHtmlBucket predicate` locks in the contract — `.txt`/`.md`/`.json`/`.py`/`.html`/`.css` + office MIME → null, and `.bin`/`.dat` + office MIME → null too. Existing extension-wins tests still pass unchanged. Total file tests: 534 (+8 vs before).	2026-05-05 12:06:10 +09:00
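The per-format `OFFICE_PREVIEW_LIBREOFFICE` opt-in described in the commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `libreoffice.ts`: the function bodies are guesses at one way to satisfy the listed edge cases, and passing the raw env value as a parameter (instead of reading `process.env`) is a simplification made here so the sketch is self-contained. Only the names `parseLibreOfficeEnablement` and `isLibreOfficeEnabledFor` and the accepted values come from the commit message.

```typescript
// Hypothetical sketch: 'all' = every format, Set = listed formats, null = disabled.
type Enablement = 'all' | Set<string> | null;

const TRUTHY = new Set(['true', '1', 'yes']);
const FALSY = new Set(['false', '0', 'no', '']);

// Assumption: the raw env value is passed in rather than read from process.env.
function parseLibreOfficeEnablement(raw: string | undefined): Enablement {
  const value = (raw ?? '').trim().toLowerCase();
  if (FALSY.has(value)) {
    return null; // unset, empty, or explicitly falsy
  }
  if (TRUTHY.has(value)) {
    return 'all'; // backwards compatible with the old boolean flag
  }
  // Comma-separated list: trim each entry and drop empty ones ("pptx, ,docx").
  const formats = value
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  return formats.length > 0 ? new Set(formats) : null;
}

// Per-format gate; the format name is matched case-insensitively.
function isLibreOfficeEnabledFor(format: string, raw: string | undefined): boolean {
  const enabled = parseLibreOfficeEnablement(raw);
  if (enabled === null) {
    return false;
  }
  if (enabled === 'all') {
    return true;
  }
  return enabled.has(format.trim().toLowerCase());
}
```

With this shape, a value of `pptx` enables the gate for `pptx` lookups only, while `true` keeps enabling every format, including any added later.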
Danny Avila	1b79e0b785	🧬 chore: Align LibreChat With Agents LangChain Upgrade (#12922) * 🔧 chore: Update dependencies in package-lock.json and package.json - Bump version of @librechat/agents to 3.1.75-dev.0 in multiple package.json files. - Upgrade various AWS SDK and Smithy dependencies to their latest versions in package-lock.json for improved stability and performance. * 🔧 chore: Update AWS SDK and Smithy dependencies in package-lock.json - Bump version of @aws-sdk/client-bedrock-runtime to 3.1041.0 and update related dependencies for improved performance and stability. - Upgrade various AWS SDK and Smithy packages to their latest versions, ensuring compatibility and enhanced functionality. * chore: Align LibreChat with agents LangChain upgrade - Route LangChain imports through @librechat/agents facade exports - Update @librechat/agents to 3.1.75-dev.1 and remove direct LangChain deps - Normalize nullable agent model params and API key override typing - Update Google thinking config typing for newer LangChain packages - Refresh targeted audit-related dependency overrides * chore: Add Jest types for API specs * test: Fix LangChain upgrade CI specs * test: Exercise agents env facade * fix: Clean up TS preview diagnostics * fix: Address Codex review feedback	2026-05-03 12:46:01 -04:00
Danny Avila	f3e1201ae7	📌 fix: Stabilize Agent Prompt Cache Prefix (#12907) * fix: stabilize agent prompt cache prefix * chore: refresh agents sdk lockfile integrity * test: format agent memory assertion * test: type agent context fixtures * fix: preserve MCP instruction precedence * fix: reuse resolved conversation anchor * fix: keep resumable startup immediate	2026-05-02 09:55:31 +09:00
Danny Avila	74307e6dcc	💭 feat: Require Explicit Auto-agent Enablement for Memories (#12886)	2026-05-01 23:56:08 +09:00
Danny Avila	89bf2ab7b4	💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12868) * 💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12855) Different providers report `usage_metadata.input_tokens` with different semantics: - Anthropic / Bedrock: `input_tokens` EXCLUDES cache; cache reads/writes arrive separately and must be added to get the total prompt size. - Gemini / OpenAI: `input_tokens` ALREADY INCLUDES cached tokens (Google's `promptTokenCount`, OpenAI's `prompt_tokens`). Their `input_token_details.cache_*` are subsets of `input_tokens`. `recordCollectedUsage` treated both schemes as additive, so for cache-hit requests on Gemini/OpenAI it added cache tokens on top of an `input_tokens` value that already contained them — overcharging users by the cache_hit_rate (e.g., ~67% cache hit ≈ 1.67x overcharge). This matches the issue reporter's GCP billing comparison. Adds a small `splitUsage` helper that classifies the provider by model name and computes `inputOnly` (the non-cached portion) plus the all-inclusive `totalInput` for both the spend math and the returned `input_tokens` summary. The helper defaults to additive semantics (the historical behavior) so unknown providers are unaffected. Updates existing OpenAI-shaped tests that previously asserted the buggy additive math, and adds Gemini regression tests using the exact numbers from the issue report (input=11125, cache_read=7441 → input=3684). Anthropic / Bedrock paths remain bit-identical to before. 🔧 refactor: Classify Cache-Token Semantics by Provider, Not Model Name Follows up the previous commit. Replaces a model-name regex (`gemini|gpt|o[1-9]|chatgpt`) with an explicit `Providers` enum lookup keyed off the `usage.provider` field — `UsageMetadata.provider` already exists in `IJobStore.ts` but was never being populated. - `callbacks.js#ModelEndHandler` now attaches `usage.provider` from `agentContext.provider` alongside `usage.model`. 
- `usage.ts` uses a `SUBSET_PROVIDERS` set (`openAI`, `azureOpenAI`, `google`, `vertexai`, `xai`, `deepseek`, `openrouter`, `moonshot`) backed by the canonical `Providers` enum from `librechat-data-provider`. - `xai`, `deepseek`, `openrouter`, `moonshot` extend `ChatOpenAI` so they inherit subset semantics (verified in node_modules). - Defaults to additive when `usage.provider` is missing, so the title flow (which doesn't propagate provider) and any pre-this-PR usage entries keep their existing behavior. Tests: switch fixtures from model-name signaling to explicit `provider` field, plus a Vertex AI case and a "missing provider" fallback case.	2026-04-29 08:36:00 +09:00
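The subset-versus-additive distinction in the commit above can be illustrated with a small sketch. It is not the `splitUsage` helper from `usage.ts`: the `UsageLike` shape, the `cache_read` / `cache_creation` field names, and the use of plain strings in place of the `Providers` enum are assumptions made for illustration. The provider list and the example numbers are taken from the commit message.

```typescript
// Hypothetical usage shape; field names are assumptions for this sketch.
interface UsageLike {
  provider?: string;
  input_tokens: number;
  input_token_details?: { cache_read?: number; cache_creation?: number };
}

// Providers whose input_tokens already include cached tokens (per the commit message).
const SUBSET_PROVIDERS = new Set<string>([
  'openAI',
  'azureOpenAI',
  'google',
  'vertexai',
  'xai',
  'deepseek',
  'openrouter',
  'moonshot',
]);

function splitUsage(usage: UsageLike): { inputOnly: number; totalInput: number } {
  const cacheRead = usage.input_token_details?.cache_read ?? 0;
  const cacheWrite = usage.input_token_details?.cache_creation ?? 0;
  const cache = cacheRead + cacheWrite;

  const isSubset = usage.provider !== undefined && SUBSET_PROVIDERS.has(usage.provider);
  if (isSubset) {
    // Cache tokens are already counted inside input_tokens: subtract, never add.
    return {
      inputOnly: Math.max(0, usage.input_tokens - cache),
      totalInput: usage.input_tokens,
    };
  }
  // Additive default (Anthropic / Bedrock, or provider missing): cache arrives separately.
  return { inputOnly: usage.input_tokens, totalInput: usage.input_tokens + cache };
}
```

For the numbers in the issue report (`input_tokens` of 11125 with 7441 cache-read tokens on a `google` usage), this sketch yields a non-cached input of 3684 and leaves the total at 11125, whereas the same numbers without a provider fall back to the additive total.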
Danny Avila	46a86d849f	🛂 fix: Skip Inherited / Mark Skill Files Read-Only in Code-Env Pipeline (#12866 ) * 🛂 fix: Skip Re-Download of Inherited Code-Env Files (No More 403 Storms) When a bash/code-interpreter call lists or operates on inputs the user already owns (skill files primed via primeInvokedSkills, files inherited from a prior session), codeapi echoes those files back in the tool result with `inherited: true`. We were treating every entry as a generated artifact and calling processCodeOutput on each, which: 1. Hit `/api/files/code/download/<session_id>/<file_id>` with the user's session key. Skill files are uploaded under the skill's entity_id, so every download 403'd — producing dozens of "Unauthorized download" log lines per turn. 2. Surfaced those inputs as ghost file chips in the UI even though they were never generated by the run. 3. Wasted a download round-trip even when no auth boundary was crossed — the file is already persisted at its origin. Fix: skip files where `file.inherited === true` in all three artifact-files loops (`tools.js`, `createToolEndCallback`, and `createResponsesToolEndCallback`). Skill files remain available to subsequent calls via primeInvokedSkills / session inheritance — we just don't redundantly re-download them. Pairs with codeapi-side change that adds the `inherited` flag. * 🔒 feat: Mark Skill Files as `read_only` During Code-Env Priming Pairs with codeapi `read_only` upload flag (ClickHouse/ai#1345). When LibreChat primes a skill into the code-env, every file in the batch (SKILL.md plus all bundled scripts/schemas/docs) is now uploaded with `read_only: true`. Codeapi seals these inputs at the filesystem layer (chmod 444) and the walker echoes the original refs as `inherited: true` regardless of whether sandboxed code modified the bytes on disk. Without this, the previous PR's `inherited` skip handled only the unchanged case. 
A modified skill file (pip writing pyc near a .py, a script accidentally truncating LICENSE.txt, etc.) still flowed through the modified-input branch on codeapi, got a fresh user-owned file_id, uploaded as a "generated" artifact, and surfaced in the UI as a chip the user couldn't actually authorize a download for. Changes: - `api/server/services/Files/Code/crud.js`: `batchUploadCodeEnvFiles({ ..., read_only })` forwards the flag as a multipart form field. Default `false` preserves existing behavior for user-attached files and prior-session inheritance. - `packages/api/src/agents/skillFiles.ts`: type signature gains `read_only?: boolean`; `primeSkillFiles` passes `true`. - `packages/api/src/agents/skillFiles.spec.ts`: assert the upload call carries `read_only: true`. The flag is intentionally not skill-specific. Any future infrastructure-input flow (system fixtures, cached datasets, etc.) can opt in the same way.	2026-04-29 08:26:25 +09:00
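The `inherited` skip from the first commit in this entry amounts to filtering the tool result's file list before any download is attempted. A minimal sketch, assuming a simplified `ArtifactFile` shape and a hypothetical `selectGeneratedFiles` helper (the actual loops in `tools.js` and the tool-end callbacks are not shown in the commit message):

```typescript
// Hypothetical, simplified shape of a file entry echoed back in a tool result.
interface ArtifactFile {
  id: string;
  name: string;
  inherited?: boolean; // true when the file was an input, not generated by the run
}

// Keep only files the run actually generated; inherited inputs are already
// persisted at their origin and should not be re-downloaded.
function selectGeneratedFiles(files: ArtifactFile[]): ArtifactFile[] {
  return files.filter((file) => file.inherited !== true);
}
```

Entries without the flag are treated as generated, which keeps the behavior unchanged for payloads that predate the `inherited` field.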
Danny Avila	24e29aa8cb	🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831 ) * 🌱 fix: Seed Code Tool Files Into Graph Sessions on First Call Files attached to an agent's `tool_resources.execute_code` (user uploads or generated artifacts from a prior turn) were silently dropped on the first `execute_code` invocation of a turn. The agents-side `ToolNode` populates `_injected_files` only when its `sessions` map already has an `EXECUTE_CODE` entry — but that entry is only written by a previous successful execution, so call #1 had nothing to inject. CodeExecutor then fell back to a `/files/{session_id}` fetch, but `session_id` was also empty on call #1, leaving the sandbox without the primed files. Mirror the existing skill-priming pattern (`primeInvokedSkills` → `initialSessions`) for code-resource files: eagerly call `primeFiles` before `createRun` and merge the result into `initialSessions` via a new `seedCodeFilesIntoSessions` helper. Skill files and code-resource files now share the same `EXECUTE_CODE` entry; the prior representative `session_id` is preserved on merge. * 🔬 chore: Add Diagnostic Logging for Code-Files Seeding Temporary debug logs to diagnose why first-call file injection is not firing in real agent runs. Logs `wantsCodeExec`, available tool-resource keys, primed file count, and the seeded EXECUTE_CODE entry. Will revert once the failure mode is identified. * 🪛 refactor: Capture primedCodeFiles per-agent at init, merge across run Replace the client.js eager `primeFiles` call with a per-agent capture at initialization time so every agent in a multi-agent run (primary + handoff + addedConvo) contributes its `tool_resources.execute_code` files to the shared `Graph.sessions` seed. - handleTools.js (eager loadTools): the `execute_code` factory closes over a `primedCodeFiles` slot and surfaces it in the return. 
- ToolService.js loadToolDefinitionsWrapper (event-driven): captures `files` from the existing `primeCodeFiles` call (was dropping them while only keeping `toolContext`) and surfaces them. - packages/api initialize.ts: the loadTools callback contract now includes `primedCodeFiles`, threaded onto `InitializedAgent`. - client.js: iterate `[primary, ...agentConfigs.values()]` and merge each agent's `primedCodeFiles` into `initialSessions`. Drop the primary-only `primeCodeFiles` call and diagnostic logs from the prior attempt — wrong layer (single-agent), wrong gate (`agent.tools` contained Tool instances after init, so the `.includes("execute_code")` string check always failed). * 🔬 chore: Add per-agent diagnostic logs for code-files seeding Logs `tool_resources` keys + file counts inside loadToolDefinitionsWrapper and per-agent `primedCodeFiles` + final initialSessions inside AgentClient. Will revert once the failure mode is confirmed. * 🔬 chore: Add file-lookup diagnostics inside initializeAgent Logs the inputs and intermediate counts of the conversation-file lookup chain (convo file ids, thread message ids, code-generated and user-code file counts) so we can pinpoint why `tool_resources.execute_code` is arriving empty at `loadToolDefinitionsWrapper` despite the agent having `execute_code` in its tools list. * 🔬 chore: Probe execute_code files without messageId filter Adds a relaxed `getFiles({conversationId, context: execute_code})` probe that runs only when `getCodeGeneratedFiles` returns empty. Lists what's actually in the DB for this conversation so we can confirm whether the file is missing entirely or whether the messageId filter is rejecting it. * 🔬 chore: Fix probe getFiles arg order (sort vs projection) Probe was passing a projection object as the sort arg, which mongoose rejected with `Invalid sort value`. Move it to the third arg (selectFields) so the probe actually runs. 
* 🪢 fix: Preserve Original messageId on Code-Output File Update Each `processCodeOutput` call was overwriting the persisted file's `messageId` with the current run's id. When a turn re-creates an existing file (filename + conversationId match → `claimCodeFile` returns the existing record, `isUpdate=true`), the file's link to the assistant message that originally produced it gets clobbered. `initializeAgent` later runs `getCodeGeneratedFiles({messageId: $in: <thread>})` to seed `tool_resources.execute_code` from prior-turn artifacts. With a stale `messageId` (e.g. from a failed read attempt that re-shelled the same filename), the file no longer matches the parent-walk thread, so `tool_resources` arrives empty at agent init, the new `primedCodeFiles` channel has nothing to seed, and the LLM can't see its own prior-turn artifacts on the next turn — defeating the just-added Graph-sessions seeding fix. Preserve the existing `claimed.messageId` on update; first-creation behavior is unchanged. The runtime return value still includes the current run's `messageId` (via `Object.assign(file, { messageId })`) so the artifact is correctly attributed to the live tool_call. * 🧹 chore: Remove diagnostic logs from code-files seeding path Drops the temporary debug logs added to trace the empty-tool_resources failure mode. Production code paths (loadToolDefinitionsWrapper, client.js seed loop, initializeAgent file lookup) are left as the permanent shape: capture primedCodeFiles, merge across agents, seed initialSessions before run start. * 🪛 feat: read_file Sandbox Fallback for /mnt/data + Non-Skill Paths When the model called `read_file` with a code-execution path (e.g. `/mnt/data/sentinel.txt`), the handler returned a misleading `Use format: {skillName}/{path}` error. 
Adds a sandbox-aware fallback: - Short-circuit `/mnt/data/...` (can never be a skill reference) → route to a sandbox `cat` via the new host-provided `readSandboxFile` callback, which POSTs to the codeapi `/exec` endpoint. - Skip the skill resolver entirely when `accessibleSkillIds` is empty — the resolved-output of `resolveAgentScopedSkillIds` already collapses the admin capability + ephemeral badge + persisted `skills_enabled` chain, so an empty value is the authoritative "skills aren't in scope for this agent" signal. - For `{firstSegment}/...` paths, consult the catalog-derived `activeSkillNames` Set (no DB read) to detect non-skill names and fall through to the sandbox before the model has to retry with `bash_tool`. `activeSkillNames` is captured from `injectSkillCatalog`, threaded onto `InitializedAgent`, into `agentToolContexts`, then through `enrichWithSkillConfigurable` into `mergedConfigurable` for the handler. The host implementation of `readSandboxFile` lives in `api/server/services/Files/Code/process.js` and shells `cat <path>` through the seeded sandbox session — `tc.codeSessionContext` (emitted by ToolNode for `read_file` calls in `@librechat/agents` v3.1.72+) provides the `session_id` + `_injected_files` so the read lands in the same sandbox that holds prior-turn artifacts. When the seeded context isn't available (older agents version, no codeapi configured), the handler returns a model-visible error pointing at `bash_tool` instead of silently failing. Tests: 8 new `handleReadFileCall` cases cover the new short-circuits, the skills-not-enabled gate, the activeSkillNames lookup, the sandbox-fallback success path, and the bash_tool retry hint on fallback failure. Existing `read_file` tests now opt into "skills are in scope" via a `skillsInScope()` fixture (production wouldn't reach the skill lookup with empty `accessibleSkillIds`). 
* 🔧 chore: Update @librechat/agents dependency to version 3.1.72 Bumps the version of the @librechat/agents package across package-lock.json and relevant package.json files to ensure compatibility with the latest features and fixes. * 🪛 refactor: Centralize Tool-Session Seed in buildInitialToolSessions Helper Addresses review feedback on the per-agent merge in client.js: - Run-wide semantics, named explicitly. The merge into a single `Graph.sessions[EXECUTE_CODE]` was a deliberate match to the agents-library design (`Graph.sessions` is shared across every `ToolNode` in the run), but the inline `for (const a of agents)` loop in `AgentClient.chatCompletion` made it look per-agent. Move the logic to a TS helper `buildInitialToolSessions` that documents the run-wide-by-design contract in one place. The CJS controller now contains a single call site, no business logic. - Subagent walk (P2). The previous loop only iterated `[primary, ...agentConfigs.values()]`. Pure subagents are pruned out of `agentConfigs` after init and retained on each parent's `subagentAgentConfigs`, so their primed code files were silently dropped from the seed. The helper now walks recursively, with a visited-Set keyed on object identity that terminates safely on a malformed agent graph (cycle). - `jest.setup.cjs` polyfill for undici `File`. Reviewer hit `ReferenceError: File is not defined` running the targeted spec on WSL — a known Node 18 issue where `globalThis.File` from `node:buffer` isn't auto-exposed. Polyfill it inside a Jest setup file so the suite boots regardless of Node patch version. Helper test coverage (8 new): skill-only / agent-only / both, recursive subagent walk, cycle-safe walk, primary+subagent deduplication, undefined/null entries in the agents iterable, and representative session_id preservation across the merge. 16 tests pass total in `codeFilesSession.spec.ts` (8 prior + 8 new). No behavior change vs. 
the previous commit for the existing primary+agentConfigs case — subagent inclusion is the only new behavior, and it matches what the existing seeding logic would have done if subagents had been in `agentConfigs`. * 🪛 fix: FIFO Walk Order in buildInitialToolSessions (P3 review) The traversal used `Array.pop()` (LIFO), which visited the LAST top-level agent first. The docstring says "primary first"; the code contradicted it. When no skill seed exists the first-visited agent's first file supplies the representative `session_id` written to `Graph.sessions[EXECUTE_CODE]` — so a LIFO walk silently flipped which agent that came from. `ToolNode` ultimately uses per-file `session_id`s for runtime injection (so behavior was indistinguishable for current callers), but the discrepancy was a footgun for any future consumer that read the representative. Switch to FIFO via `Array.shift()` to match both the docstring and the existing `loadSubagentsFor` walk pattern in `Endpoints/agents/initialize.js`. Add a regression test that asserts the primary's `session_id` is the representative (and that all three agents' files still contribute, with per-file `session_id`s preserved). * 🔬 test: Lock In Code-Files Bug Fixes Per Comprehensive Review Addresses MAJOR + MINOR + NIT findings from the multi-pass review: Finding #4 (MINOR) — empty relativePath misses sandbox fallback. A model calling `read_file("output/")` where "output" isn't a skill name dead-ended with `Missing file path after skill name` instead of being routed to the sandbox like every other malformed-path branch. Add the same `codeEnvAvailable → handleSandboxFileFallback` pattern, plus two regression tests. Finding #7 (NIT) — duplicate `skillsInScope()` helper. Hoist the identical helper out of two nested describe blocks to module scope. Single source of truth. Finding #1 (MAJOR) — `persistedMessageId` had zero test coverage. 
The fix preserves a file's original `messageId` on update so `getCodeGeneratedFiles` can still match it on subsequent turns. A regression in the `isUpdate ? (claimed.messageId ?? messageId) : messageId` ternary would silently re-introduce the original cross-turn priming bug. Five new tests cover: - UPDATE preserves `claimed.messageId` in the persisted record - UPDATE falls back to current run id when `claimed.messageId` is absent (legacy records predating the field) - CREATE uses current run id (no claimed record exists) - The runtime return value uses the LIVE id (artifact attribution) even when the persisted record kept the original - The image branch follows the same contract (would silently regress if the ternary diverged across the two file-build branches) The tests use a `snapshotCreateFileArgs()` helper because `processCodeOutput` mutates the file object after `createFile` returns (`Object.assign(file, { messageId, toolCallId })`) and a naive `createFile.mock.calls[0][0]` would reflect the post-mutation state instead of what was actually persisted. Finding #2 (MAJOR) — `readSandboxFile` had no direct tests. The model-controlled `file_path` flows through a POSIX single-quote escape into a shell `cat` command, making this a security boundary. A quoting regression would let a malicious filename break out of the quoted argument and inject arbitrary shell. 
20 new tests across: - Shell quoting (7): plain filenames, embedded `'`, `$()`, backticks, newlines, shell metachars, multiple consecutive single-quotes - Payload shape (6): /exec URL, bash language, conditional session_id / files inclusion, dedicated keepAlive:false agents - Response handling (6): `{content}` on success, null on missing base URL or absent stdout, throws on stderr-only, partial-success returns stdout, transport errors are logged then rethrown - Timeout (1): matches processCodeOutput's 15s SLA Audited findings #5 (acknowledged tech debt — readSandboxFile in JS workspace), #6 (pre-existing positional-args debt on enrichWithSkillConfigurable), and #8 (cosmetic JSDoc style) — no action taken per the reviewer's own assessment. Audited finding #3 (walk order vs docstring) — already addressed in commit `007f32341` which converted to FIFO via `queue.shift()` plus a regression test. The audit was performed against an earlier PR head. Tests: 152 packages/api + 195 api JS = 347 pass. Typecheck clean. * 🪛 fix: Pure-Subagent codeEnv + Primed-Skill Routing + ToolService Early Returns Three findings from the second-pass review: P2 — Pure subagents missed `codeEnvAvailable` (initialize.js). The pure-subagent init path didn't forward the endpoint-level `codeEnvAvailable` flag to `initializeAgent`, unlike the primary, handoff, and addedConvo paths. A code-enabled subagent loaded only through `subagentAgentConfigs` initialized with `codeEnvAvailable: false`, so even though the recursive seed walk found its primed code files, the subagent's own `bash_tool` / `read_file` sandbox fallback were silently gated off. Forward the flag and add `codeEnvAvailable: config.codeEnvAvailable` to the `agentToolContexts.set` for symmetry with the other paths. P2 — Primed skills outside the catalog cap were misrouted to sandbox (handlers.ts). 
Manual ($-popover) and always-apply primes are intentionally resolved off the wider `accessibleSkillIds` ACL set BEFORE catalog injection — see `resolveManualSkills` for why a skill outside the `SKILL_CATALOG_LIMIT` cap can still be authorized for direct manual invocation. The `activeSkillNames` shortcut ran before reading `skillPrimedIdsByName`, so a primed skill not in the catalog would fall through to the sandbox instead of resolving via the pinned `_id`. Read the primed map first and bypass the shortcut for primed names. New regression test asserts a primed-but-not-cataloged skill resolves through the existing skill path with `getSkillByName` invoked and `readSandboxFile` NOT called. P3 — `loadAgentTools` early returns dropped `primedCodeFiles` (ToolService.js). The non-`definitionsOnly` path captures the field correctly, but two early-return branches (no-action-tools fast path, no-action-sets fast path) omitted it. Any traditional `loadAgentTools(..., definitionsOnly: false)` caller using execute_code without action tools would have its first-call session seed silently empty. Add `primedCodeFiles` to both early returns for consistency with the final return shape. Tests: 153 packages/api + 195 api JS = 348 pass. * 🧹 chore: Document jest.mock arrow-indirection pattern in process.spec.js Per the second-pass review's Finding #2 (NIT, "would help future readers"): the mock setup mixes direct `jest.fn()` references with arrow-function indirection (`(...args) => mockX(...args)`). The indirection isn't stylistic — it's required because `jest.mock(...)` is hoisted above the outer `const` declarations at parse time, so a direct reference would capture `undefined`. Inline comment explains the pattern so the next reader doesn't have to reverse-engineer it or accidentally "simplify" the mocks and break per-test `mockReturnValueOnce` / `mockImplementationOnce` overrides. * 🪛 fix: Five Issues from Pass-N + Codex Review (incl. 
404 root cause) Five real bugs surfaced by another review pass + Codex PR comments + the codeapi-side logs we collected during manual testing: 1) `processCodeOutput` 404 root cause (`callbacks.js`). The codeapi worker emits TWO distinct `session_id`s on a tool result: - `artifact.session_id` is the EXEC session — the sandbox VM that ran the bash command. Files don't live there; it's torn down post-execution. - `file.session_id` is the STORAGE session — the file-server bucket prefix where artifacts actually live. `callbacks.js` was passing the EXEC id to `processCodeOutput`, which builds `/download/{session_id}/{id}` and 404s because the file-server doesn't know about that path. This explains every "Error downloading/processing code environment file" we saw during testing. Use `file.session_id ?? output.artifact.session_id` (per-file id with artifact-level fallback for older worker payloads). 2) `primeFiles` reupload pushed STALE sandbox ids (`process.js`). When `getSessionInfo` returns null (file expired/missing in sandbox), `reuploadFile` re-uploads via `handleFileUpload`, gets a NEW `fileIdentifier`, and persists it on the DB record. But `pushFile` was a closure capturing the OLD `(session_id, id)` parsed at the top of the loop, so the in-memory `files[]` array (now consumed by `buildInitialToolSessions` to seed `Graph.sessions`) silently referenced a sandbox object that no longer existed. The first tool call would 404 trying to mount it; only the next turn's metadata re-read would correct course. Parameterize `pushFile` with optional `(session_id, id)` overrides; in `reuploadFile` parse the new identifier and pass through. 2 regression tests. 3) Codex P2 — Cap sandbox fallback output before line-numbering (`handlers.ts`). The new `handleSandboxFileFallback` returned `addLineNumbers(result.content)` without a size guard, so reading a multi-MB `/mnt/data/` artifact materialized the file twice in memory (raw + line-numbered) before downstream truncation. 
Match the existing skill-file path's `MAX_READABLE_BYTES` (256KB): truncate raw first, then number, surface the truncation to the model so it can use `bash_tool` (`head` / `tail`) for the rest. 2 tests (oversized truncates with hint, in-cap doesn't). 4) Codex P2 — Dedupe seeded code files by `(session_id, id)` (`codeFilesSession.ts`). Multiple agents in a run commonly carry the same primed execute-code resources (shared conversation files); without dedupe, `_injected_files` grows proportionally to agent count and bloats every `/exec` POST. Use a `(session_id, id)` identity key so first-seen wins (preserves source ordering); name alone isn't sufficient because two distinct primed uploads can share a filename across different sessions. 4 tests covering dedup across iterations, against pre-existing entries, name-collision distinct-session preservation, and the multi-agent realistic case in `buildInitialToolSessions`. 5) Pass-N P2 — Polyfill `globalThis.File` in api Jest setup (`api/test/jestSetup.js`). `packages/api/jest.setup.cjs` had the polyfill; the legacy api workspace's Jest config has its own `setupFiles` that didn't, so on Node 18 / WSL the api focused tests still failed at import time with `ReferenceError: File is not defined` from undici. Mirror the polyfill. Tests: 159 packages/api + 206 api JS = 365 pass. Typecheck clean. * 🔧 chore: Update @librechat/agents dependency to version 3.1.73 Bumps the version of the @librechat/agents package across package-lock.json and relevant package.json files to ensure compatibility with the latest features and fixes.	2026-04-27 08:56:39 +09:00
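The `(session_id, id)` dedupe from item 4 of the five-issue fix above can be sketched as follows. This is a minimal illustration, not the code in `codeFilesSession.ts`: the `CodeFile` shape and the `dedupeCodeFiles` name are assumptions made for the example.

```typescript
// Hypothetical shape; the real primed-file type likely carries more fields.
interface CodeFile {
  session_id: string;
  id: string;
  name: string;
}

// First-seen wins, so the source ordering of the input is preserved.
function dedupeCodeFiles(files: CodeFile[]): CodeFile[] {
  const seen = new Set<string>();
  const result: CodeFile[] = [];
  for (const file of files) {
    // Serializing the tuple avoids key collisions that a plain
    // string join could produce if an id contained the separator.
    const key = JSON.stringify([file.session_id, file.id]);
    if (seen.has(key)) {
      continue;
    }
    seen.add(key);
    result.push(file);
  }
  return result;
}

const deduped = dedupeCodeFiles([
  { session_id: 's1', id: 'a', name: 'data.csv' },
  { session_id: 's1', id: 'a', name: 'data.csv' }, // same file carried by a second agent: dropped
  { session_id: 's2', id: 'b', name: 'data.csv' }, // same name, different session: kept
]);
console.log(deduped.map((f) => `${f.session_id}/${f.id}`));
```

Keying on the filename alone would collapse the third entry into the first, which is the name-collision case the commit message calls out.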
Danny Avila	596f806f60	🛡️ fix: Strict Opt-In Skills Activation per Agent (#12823 ) * 🛡️ fix: Strict opt-in skills activation per agent Skills were activating on every agent run that had the capability + RBAC enabled, regardless of whether the user (ephemeral) or author (persisted) had opted in. `scopeSkillIds(undefined)` fell through to "full accessible catalog" whenever `agent.skills` was unset, which is the default state for any agent created before skills existed and for every ephemeral agent. Activation now requires an explicit signal: - Ephemeral agent → per-conversation skills badge toggle. - Persisted agent → new `skills_enabled` master switch on the agent doc, surfaced as a toggle in the Agent Builder skills section. Enabled + empty/undefined allowlist = full accessible catalog; enabled + non-empty allowlist = narrow to those ids; disabled (or undefined) = no skills available, even if an allowlist is set. Centralised the predicate in `resolveAgentScopedSkillIds` so the primary-agent path, handoff/discovery, the subagent loop, and both OpenAI controllers all share one source of truth. Frontend `$` popover scope mirrors the same logic so the UI never offers skills the backend would refuse to activate. * test: mock resolveAgentScopedSkillIds in agent controller specs * refactor: address review findings on skills opt-in PR - AgentConfig: associate skills label with toggle via htmlFor for click/keyboard affordance; simplify Switch handler to Boolean(value). - skills: mark scopeSkillIds as @internal so runtime callers continue to route through resolveAgentScopedSkillIds and inherit the activation predicate (ephemeral toggle, persisted skills_enabled). * fix(agents): include skills_enabled in agent list projection Without this field, agents loaded via the list endpoint hydrate into the client agentsMap with skills_enabled === undefined, causing the `$` skill popover to hide every skill on a fresh page load even when the agent was saved with skills_enabled: true. 
* fix(skills): fail closed for persisted agents during agentsMap hydration Returning undefined while the agents map loads let the popover render the full catalog for a persisted agent before we could read its skills_enabled flag, so the user could pick a skill the backend would then refuse for the turn. Match the strict opt-in contract by returning [] until the map is authoritative. * refactor(skills): extract skillsHintKey for readability Replaces the nested ternary in the skills section JSX with a pre-computed constant so the activation -> hint key mapping reads top-down. * refactor(skills): unflatten skillsHintKey to remove nested ternary	2026-04-25 04:02:01 -04:00
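The strict opt-in contract described in this commit (disabled or undefined means no skills, enabled with an empty allowlist means the full accessible catalog, enabled with a non-empty allowlist narrows to those ids) can be sketched as a single predicate. The input shape and the `resolveScopedSkillIdsSketch` name are hypothetical; the real `resolveAgentScopedSkillIds` signature is not shown in the log, and intersecting the allowlist with the accessible set is an assumption.

```typescript
// Hypothetical input shape for illustration only.
interface SkillScopeInput {
  skillsEnabled?: boolean; // ephemeral badge toggle or persisted `skills_enabled`
  allowlist?: string[]; // the agent's configured skill ids, if any
  accessibleSkillIds: string[]; // ids the user is allowed to access
}

function resolveScopedSkillIdsSketch(input: SkillScopeInput): string[] {
  const { skillsEnabled, allowlist, accessibleSkillIds } = input;
  // Fail closed: anything other than an explicit `true` yields no skills,
  // even when an allowlist is set.
  if (skillsEnabled !== true) {
    return [];
  }
  // Enabled with an empty or missing allowlist: full accessible catalog.
  if (!allowlist || allowlist.length === 0) {
    return [...accessibleSkillIds];
  }
  // Enabled with a non-empty allowlist: narrow to those ids
  // (assumed here to be intersected with the accessible set).
  const allowed = new Set(allowlist);
  return accessibleSkillIds.filter((id) => allowed.has(id));
}

const accessible = ['s1', 's2', 's3'];
console.log(resolveScopedSkillIdsSketch({ accessibleSkillIds: accessible, allowlist: ['s1'] }));
console.log(resolveScopedSkillIdsSketch({ skillsEnabled: true, accessibleSkillIds: accessible }));
console.log(resolveScopedSkillIdsSketch({ skillsEnabled: true, allowlist: ['s2'], accessibleSkillIds: accessible }));
```

Returning `[]` rather than `undefined` for the unset case mirrors the fail-closed hydration fix above, where `undefined` had been read as "full catalog".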
Danny Avila	d83cb84f59	🪆 feat: Subagent configuration in Agent Builder (#12725 ) * 🪆 feat: Subagents configuration (isolated-context child agents) Surfaces the new @librechat/agents `SubagentConfig` primitive in the Agent Builder. Subagents let a supervisor delegate a focused subtask to a child graph running in an isolated context window: verbose tool output stays in the child, only a filtered summary returns to the parent. Data model: new `subagents: { enabled, allowSelf, agent_ids }` on Agent, wired through the Zod, Mongoose, and form schemas plus a new `AgentCapabilities.subagents` capability (enabled by default). Backend: `initialize.js` loads explicit subagent configs alongside handoff agents, and drops subagent-only references from the parallel/handoff maps so they don't leak into the supervisor's graph. `run.ts` emits `SubagentConfig[]` on the primary `AgentInputs` — a self-spawn entry when `allowSelf` is enabled plus one entry per configured agent. UI: an "Advanced" panel section with an enable toggle, a self-spawn toggle, and an agent picker (capped at 10). Enabling without adding agents still yields self-spawn; disabling self-spawn with no agents shows a warning. A capability flag gates the whole section. * 🪆 feat: Stream subagent progress to UI (dialog + inline ticker) Pairs with the @librechat/agents SDK change that forwards child-graph events through the parent's handler registry (danny-avila/agents#107): - Self-spawn and explicit subagents can now use event-driven tools, because child `ON_TOOL_EXECUTE` dispatches reach our ToolService via the parent's registered handler. - The same forwarding path wraps the child's run_step / run_step_delta / run_step_completed / message_delta / reasoning_delta dispatches in a new `ON_SUBAGENT_UPDATE` envelope, with start/stop/error bookends. Backend: `callbacks.js` registers an `ON_SUBAGENT_UPDATE` handler that forwards each envelope straight to the SSE stream. 
Frontend: - `useStepHandler` consumes `ON_SUBAGENT_UPDATE` events and merges them into a per-tool_call Recoil atom (`subagentProgressByToolCallId`). First-seen `subagentRunId` claims the most-recent unclaimed `subagent` tool call in the active response message — a temporal mapping, no SDK wire-format change needed to correlate child runs with parent tool calls. - New `SubagentCall` part component replaces the default `ToolCall` rendering when `toolCall.name === Constants.SUBAGENT`: compact status ticker showing the 3 most recent update labels, clickable to open a dialog with the full activity log + final markdown-rendered result. - Adds `Constants.SUBAGENT`, `StepEvents.ON_SUBAGENT_UPDATE`, and `SubagentUpdateEvent` type in data-provider. Tests: - `packages/api npx jest run-summarization` — 23 pass - `api npx jest initialize` — 16 pass - `npm run build` — clean Dependency note: bumps `@librechat/agents` to `^3.1.67-dev.1` — requires the SDK PR (danny-avila/agents#107) to be merged to dev and published before this PR merges. `ON_SUBAGENT_UPDATE` is absent from dev.0, so the handler registration would be a no-op with the older SDK but would not crash. * 🪆 fix: address Codex review and review audit on subagents Stacks on top of the SDK change in danny-avila/agents#107 (bumped to `^3.1.67-dev.2`). - P1 (`initialize.js`): subagent-only agents were being deleted from both `agentConfigs` AND `agentToolContexts`. The tool-execute handler resolves execution context (agent, tool_resources, skill ACLs) from `agentToolContexts`, so explicit subagents would run without their configured resources and skip action tools. Now only `agentConfigs` is pruned — tool context stays intact. - P2 (`AgentSubagents.tsx`): toggling subagents off set the form field to `undefined`; `removeNullishValues` stripped it from the PATCH, leaving the server copy enabled. Now it persists an explicit `{ enabled: false, ... }` so the update actually clears state. 
- Finding 1 (MAJOR) — `agent_ids` Zod schema gains `.max()` via a new `MAX_SUBAGENTS` export from `data-provider` (shared with the UI cap). Crafted payloads can't trigger hundreds of `processAgent` calls. - Finding 2 (MAJOR) — `subagentProgressByToolCallId` atomFamily atoms are now tracked in a ref and reset from `clearStepMaps` via a `useRecoilCallback({ reset })`. No monotonic growth across a session. - Finding 3 (MAJOR) — early-arriving `ON_SUBAGENT_UPDATE` events whose parent `tool_call_id` is not yet mapped are now buffered in `pendingSubagentBuffer` (keyed by `subagentRunId`) and replayed in arrival order once correlation completes. Mirrors the existing `pendingDeltaBuffer` pattern. - Finding 4 (MAJOR) — switched to deterministic correlation via the new `parentToolCallId` that SDK `3.1.67-dev.2` threads through from `ToolRunnableConfig.toolCall.id`. Temporal fallback now iterates oldest-unclaimed-first (forward), matching tool-call creation order, so concurrent spawns map correctly. - Finding 6 (MINOR) — `agent_ids` are deduped on the backend via `new Set(...)` before the load loop. Duplicates no longer produce duplicate `SubagentConfig` entries visible to the LLM. - Finding 7 (MINOR) — events array inside each Recoil atom is capped at 200 entries. Long-running subagents no longer replay O(n) spreads on every update; the dialog log still shows the cap window. - Finding 8 (MINOR) — documented: subagents are loaded only for the primary agent this release (handoff children get self-spawn but not explicit sub-subagents). In-code comment added so the next maintainer doesn't wonder. - Finding 9 (NIT) — removed `{!isSubmitting && null}` dead code and the misleading announce-polite comment in `SubagentCall`. - New `validation.spec.ts` — 9 tests covering the cap on `agent_ids.length` at the subagent schema, agent-create, and agent-update layers. - `run-summarization` — 23 pass, `initialize` — 16 pass, total backend package: 103 pass across touched areas. 
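The early-arrival buffering from Finding 3 above (hold updates keyed by `subagentRunId` until the parent tool call is known, then replay in arrival order) can be sketched without React or Recoil. The `createPendingBuffer` factory and the `SubagentUpdate` shape are assumptions for illustration; the real `pendingSubagentBuffer` lives inside `useStepHandler` and is not shown in the log.

```typescript
// Hypothetical, simplified event shape.
interface SubagentUpdate {
  subagentRunId: string;
  label: string;
}

type ApplyFn = (toolCallId: string, update: SubagentUpdate) => void;

function createPendingBuffer(apply: ApplyFn) {
  const pending = new Map<string, SubagentUpdate[]>();
  const toolCallByRun = new Map<string, string>();
  return {
    // Apply immediately when the run is already correlated, otherwise queue.
    receive(update: SubagentUpdate): void {
      const toolCallId = toolCallByRun.get(update.subagentRunId);
      if (toolCallId === undefined) {
        const queue = pending.get(update.subagentRunId) ?? [];
        queue.push(update);
        pending.set(update.subagentRunId, queue);
        return;
      }
      apply(toolCallId, update);
    },
    // Record the mapping, then replay queued updates in arrival order.
    correlate(subagentRunId: string, toolCallId: string): void {
      toolCallByRun.set(subagentRunId, toolCallId);
      const queue = pending.get(subagentRunId) ?? [];
      pending.delete(subagentRunId);
      for (const update of queue) {
        apply(toolCallId, update);
      }
    },
  };
}

const applied: string[] = [];
const buffer = createPendingBuffer((toolCallId, update) => {
  applied.push(`${toolCallId}:${update.label}`);
});
buffer.receive({ subagentRunId: 'run-1', label: 'start' }); // buffered: no mapping yet
buffer.receive({ subagentRunId: 'run-1', label: 'step' }); // buffered
buffer.correlate('run-1', 'call_a'); // replays start, then step
buffer.receive({ subagentRunId: 'run-1', label: 'stop' }); // applied directly
console.log(applied);
```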
Findings 5 (component tests) and 10 (micro-allocation) are tracked but deferred; the former needs a Recoil-RenderHook harness that isn't in this PR's scope, and the latter has negligible impact (one `Array.from` per subagent run). * 🧪 test: integration coverage for subagent correlation + backend loading Addresses the follow-up audit on #12725 with real-code tests (no mock handlers, only the existing setMessages/getMessages spies and the standard mongodb-memory-server harness). Six new tests under a dedicated `describe('subagent loading')`: - loads a configured subagent, populates `subagentAgentConfigs`, keeps it out of `agentConfigs` - P1 regression guard: drives the real `toolExecuteOptions.loadTools` closure with the subagent id and asserts `loadToolsForExecution` is called with `agent: <subagent>`, `tool_resources`, `actionsEnabled`. If anyone deletes `agentToolContexts` again, this fails. - dedup: three copies of the same id load the agent once - overlap: agent referenced both as handoff target and subagent stays in `agentConfigs` - capability gate: admin disabling `subagents` suppresses loading even when the agent has a config - per-agent disable: `subagents.enabled: false` skips loading entirely Five new tests under `describe('on_subagent_update event')` using a real `RecoilRoot` and a companion `useRecoilCallback` reader so writes from the hook are observable: - deterministic correlation via `parentToolCallId` (happy path with SDK dev.2+) - fallback: oldest-unclaimed tool call wins for concurrent spawns without `parentToolCallId` - early-arrival buffer: updates with no mapping get buffered and replayed once the tool call appears - event cap: 205 updates collapse to 200 retained, oldest dropped - `clearStepMaps` resets tracked atoms back to their null default - F2 — added explicit `// TODO` marker for handoff-subagent-loading extension (matches the comment that referenced it). 
- F3 — dropped the unnecessary `MAX_SUBAGENTS as MAX_SUBAGENTS_CAP` alias; just import the constant directly. - Bumped `@librechat/agents` to `^3.1.67-dev.3` to pick up the SDK's paired test additions. - `api/server/services/Endpoints/agents/initialize.spec.js` — 22 pass (6 new + 16 existing) - `packages/api/src/agents/validation.spec.ts` + `run-summarization.test.ts` — 103 pass - `client/src/hooks/SSE/__tests__/useStepHandler.spec.ts` — 48 pass (5 new + 43 existing) * 🪆 fix: strip parent run summary + discovered tools from subagent inputs Codex P1 on #12725: `buildSubagentConfigs` reused the shared `buildAgentInput` factory for each explicit child, and that factory always stamps the parent run's `initialSummary` (cross-run conversation summary) and `discoveredTools` (tool names the parent's LLM searched earlier) onto every `AgentInputs` it returns. When subagents were enabled on a conversation that had already been summarized, every child inherited that summary — silently defeating the isolated-context contract and burning extra tokens on unrelated prior chat. Fix in `run.ts`: after `buildAgentInput(child)`, explicitly clear `childInputs.initialSummary` and `childInputs.discoveredTools` before attaching to the `SubagentConfig`. The parent keeps both — that's how the supervisor receives cross-turn context — but the child starts fresh. Paired with danny-avila/agents#107 (bumped to `^3.1.67-dev.4`), which adds the equivalent strip inside `buildChildInputs` to cover the self-spawn path where the SDK clones parent `_sourceInputs` directly and LibreChat never sees the intermediate shape. Belt and suspenders. Regression test (new): - `does NOT leak the parent run initialSummary into an explicit child (Codex P1 regression)` — sets `initialSummary` on the run, enables subagents with an explicit child, asserts the parent still has the summary but `childConfig.agentInputs.initialSummary` is `undefined`. Same for `discoveredTools`. 24 pass. 
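The isolation fix just described (the parent keeps `initialSummary` and `discoveredTools`, an explicit child starts without them) can be sketched as below. The `AgentInputsSketch` type and the `toChildInputs` helper are hypothetical, and the sketch copies the inputs rather than clearing fields in place as the commit message describes for `run.ts`.

```typescript
// Hypothetical, trimmed-down stand-in for the SDK's AgentInputs.
interface AgentInputsSketch {
  agentId: string;
  initialSummary?: string;
  discoveredTools?: string[];
}

// Returns a shallow copy with the parent-run context removed,
// leaving the original (parent) inputs untouched.
function toChildInputs(inputs: AgentInputsSketch): AgentInputsSketch {
  const child: AgentInputsSketch = { ...inputs };
  delete child.initialSummary;
  delete child.discoveredTools;
  return child;
}

const parentInputs: AgentInputsSketch = {
  agentId: 'supervisor',
  initialSummary: 'Earlier turns discussed quarterly revenue.',
  discoveredTools: ['calculator'],
};
const childInputs = toChildInputs({ ...parentInputs, agentId: 'child' });
console.log(childInputs.initialSummary, parentInputs.initialSummary);
```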
* 🪆 fix: capability gate applies to handoff agents + parallel subagent test ### Codex P2 — handoff agents kept `subagents` after capability disabled The endpoint-level `AgentCapabilities.subagents` gate only cleared `subagents` on `primaryConfig`. Handoff agents loaded into `agentConfigs` retained their persisted `subagents.enabled: true`, and because `run.ts` calls `buildSubagentConfigs` for every agent input, self-spawn would still fire on a handoff target even when the admin had disabled the capability globally. Fix in `initialize.js`: after the subagent loading block, when the capability is off, iterate `agentConfigs.values()` and clear `subagents` + `subagentAgentConfigs` on every loaded config. Regression test: `clears subagents on handoff agents too when capability is disabled (Codex P2 regression)` — seeds a handoff target with its own `subagents.enabled: true`, disables the capability at the endpoint, asserts both primary AND handoff have `subagents` undefined in the client args. 23 init tests pass. ### Parallel subagent correlation — user-requested verification Added `keeps parallel subagent streams independent when events interleave` to `useStepHandler.spec.ts`. Two `subagent` tool calls seeded side by side, 6 interleaved `ON_SUBAGENT_UPDATE` envelopes dispatched (a-start, b-start, a-step, b-step, a-stop, b-step), each carrying its own `parentToolCallId`. Asserts each `tool_call_id`'s Recoil bucket accumulates only its own run's events, statuses reflect each run independently (`call_a` → stop, `call_b` → run_step), no cross-contamination. 49 step-handler tests pass. * 🪆 fix: SubagentCall detects cancelled / errored states (Codex P2) Codex P2 on #12725: the old `running` check only consulted `initialProgress` and the subagent's phase. A user stop, dropped stream, or backend crash before a terminal `stop`/`error` envelope arrived would leave the ticker permanently stuck on "working…". 
Other Call components (ToolCall.tsx) already model this via `!isSubmitting && !finished` → cancelled. Mirror that pattern. Re-introduce `isSubmitting` on `SubagentCallProps` (the prop was dropped earlier as 'unused' — that was a bug) and resolve status as a tri-state: - `finished` — initialProgress >= 1, or subagent `stop`/`error` - `cancelled` — `!isSubmitting && !finished` - `running` — neither New locale keys `com_ui_subagent_cancelled` + `com_ui_subagent_errored` swap in the right header text per state. Tests: new `SubagentCall.test.tsx` covers all four states with a real `RecoilRoot` and a `useRecoilCallback` seeder — no mocked store — 5/5 pass. Includes an explicit P2 regression test that simulates the `isSubmitting=false, progress.status='run_step', initialProgress<1` scenario and asserts the cancelled label renders. * 🪆 feat: semantic ticker + aggregated content-part dialog for subagents Two rounds of feedback on #12725: ### Ticker — user-readable lines, not raw event names The old ticker showed `on_run_step`, `on_message_delta`, etc. — not meaningful to users. Replaced with `buildSubagentTickerLines`, a pure helper that walks the `SubagentUpdateEvent` stream and emits: - message/reasoning deltas → a single live "Writing: <last 60 chars>" (or "Reasoning: …") line that updates in place as chunks arrive - run_step with tool_calls → "Using calculator(expression=4258)" for a single call, "Using tool: a, b" for parallel (args dropped when multiple so the line stays short) - run_step_completed → "calculator → 4258 = 2436" (output truncated to 48 chars; falls back to "Tool X complete" when output is empty) - error → "Error: <message>" - start / stop / run_step_delta → suppressed (too granular / lifecycle-only) Args and output pass through `summarizeArgs` / `summarizeOutput` which flatten JSON to `key=value` pairs and head-truncate long strings so a 200-line tool output never bloats the ticker.
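The flatten-and-truncate behavior attributed to `summarizeArgs` above can be approximated as follows. This is a sketch under assumptions, not the project's helper: the `summarizeArgsSketch` name, the 48-character default (borrowed from the output cap mentioned above), the `, ` separator, and the trailing ellipsis character are all choices made for the example.

```typescript
// Keep the head of the string and mark the cut (the marker is an assumption).
function truncateHead(text: string, max: number): string {
  return text.length <= max ? text : `${text.slice(0, max)}…`;
}

// Flattens a JSON object (or JSON string) into `key=value` pairs.
function summarizeArgsSketch(args: string | Record<string, unknown>, max = 48): string {
  let parsed: unknown = args;
  if (typeof args === 'string') {
    try {
      parsed = JSON.parse(args);
    } catch {
      // Not JSON: fall back to the raw string, truncated.
      return truncateHead(args, max);
    }
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return truncateHead(String(parsed), max);
  }
  const pairs = Object.entries(parsed as Record<string, unknown>).map(
    ([key, value]) => `${key}=${typeof value === 'string' ? value : JSON.stringify(value)}`,
  );
  return truncateHead(pairs.join(', '), max);
}

console.log(`Using calculator(${summarizeArgsSketch('{"expression":"4258"}')})`);
console.log(summarizeArgsSketch({ query: 'x'.repeat(200) }));
```

Truncating after flattening keeps the ticker line bounded regardless of how large the tool arguments are.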
### Dialog — aggregated content parts via leaf renderers `aggregateSubagentContent` folds the raw event stream into `TMessageContentParts[]` — text/reasoning delta streaks collapse into single `TEXT` / `THINK` parts, tool calls become `TOOL_CALL` parts, and `run_step` boundaries correctly break text runs around tool calls. The dialog iterates those parts through a `SubagentDialogPart` renderer that delegates to the existing `Text`, `Reasoning`, and `ToolCall` leaf components — the same sub-components `<Part />` uses — wrapped in a minimal `MessageContext` so reasoning expand state and cursor animation work. Leaf components are used directly rather than importing `<Part />` itself to avoid a module cycle (Part → Parts/index → SubagentCall → Part) and to sidestep a hypothetical nested-subagent rendering. ### Tests - `subagentContent.test.ts` — 19 pure-function tests covering the aggregator (text concat, reasoning concat, tool call lifecycle, interleaving, phase suppression, late-arriving completions) and the ticker builder (live preview truncation, args/output snippets, parallel-call handling, output truncation, i18n formatter override). - `SubagentCall.test.tsx` — 9 component tests: 5 status-resolution (existing) + 2 ticker (semantic text, delta collapse) + 2 dialog (aggregated parts routed to leaf renderers, raw-output fallback). ### Locale keys New: `com_ui_subagent_ticker_writing`, `…_reasoning`, `…_error`, `…_using`, `…_using_with_args`, `…_tool_complete`, `…_tool_output`. Preserves i18n at the display layer while the helper stays pure. * chore: drop unused com_ui_subagent_activity_log locale key The dialog no longer renders an "Activity log" section — the new content-parts renderer replaced it. Also tweaks the dialog description copy to match. * 🪆 fix: subagent dialog order, persistence, auto-scroll, width Follow-up pass addressing the four issues observed in real runs against a live subagent-using parent.
### Aggregator ordering (reasoning appearing after text it preceded) Reproducible pattern: LLM emits reasoning → text → tool call in that order, but the dialog rendered text BEFORE reasoning in the content array. Root cause: `aggregateSubagentContent` maintained `currentText` and `currentThink` buffers in parallel and only flushed them at a `run_step` boundary in a fixed (text, think) order, losing the actual arrival order. Fix: when a text chunk arrives, close any open think buffer first (pushes it into the content array right then); symmetric for think → text. Two new regression tests cover the exact reasoning → text → tool_call sequence from the screenshot and the repeated reasoning ↔ text flow across a turn. ### Content persists after completion (markdown not rendering when done) `clearStepMaps` was calling `resetSubagentAtoms()` at stream end, which wiped every `subagentProgressByToolCallId` entry. Once reset, `contentParts.length === 0` and the dialog fell back to rendering the raw `output` string with plain text — hence the literal `##`/`*` in the completed-state screenshot. Stopped resetting; the atoms are bounded per-call (200-event cap) and per-conversation (one per subagent spawn) so growth matches the rest of the conversation state. `resetSubagentAtoms` is kept for a future conversation-switch caller. Also: routed the raw-`output` fallback (older subagent runs recorded before the event forwarder existed) through the same `SubagentDialogPart` → `Text` leaf that content parts use, so its markdown renders the same way. ### Auto-scroll to bottom while running Added a `scrollRef` on the dialog body and a `useEffect` that pins `scrollTop = scrollHeight` while the dialog is open AND the subagent is running. Triggers on `contentParts.length` (new tool calls / part boundaries) and `events.length` (intra-part deltas) so the cursor tracks text streaming. Disabled post-completion so re-opening a finished run doesn't yank to the bottom. 
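The ordering fix from the "Aggregator ordering" section above (a text chunk closes any open think buffer first, and vice versa) can be shown with a reduced aggregator. This sketch is a simplification under assumptions: it handles only text and reasoning deltas, keeps one open buffer instead of the two parallel buffers the original used, and the `Delta` / `Part` types and `aggregateDeltas` name are hypothetical.

```typescript
type PartKind = 'text' | 'think';

interface Part {
  type: PartKind;
  text: string;
}

interface Delta {
  kind: PartKind;
  chunk: string;
}

function aggregateDeltas(deltas: Delta[]): Part[] {
  const parts: Part[] = [];
  let open: Part | null = null;
  for (const delta of deltas) {
    // A change of kind closes the open buffer: its part is already in the
    // array at the position where it started, so arrival order is kept.
    if (open === null || open.type !== delta.kind) {
      open = { type: delta.kind, text: '' };
      parts.push(open);
    }
    open.text += delta.chunk;
  }
  return parts;
}

const parts = aggregateDeltas([
  { kind: 'think', chunk: 'Plan: ' },
  { kind: 'think', chunk: 'add the numbers.' },
  { kind: 'text', chunk: 'The sum is 4.' },
  { kind: 'think', chunk: 'Double-check.' },
]);
console.log(parts.map((p) => `${p.type}:${p.text}`));
```

Flushing at the moment the kind changes, rather than at a later boundary in a fixed order, is what keeps reasoning ahead of the text it preceded.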
### Wider dialog Went from `max-w-2xl` (42rem / 672px — too cramped on maximized laptop windows) to `w-[min(95vw,64rem)] max-w-[min(95vw,64rem)]`. Narrow on phones, scales up to 64rem on desktop, always leaves a bit of margin from the viewport edge. Bumped `max-h-[65vh]` on the scroll area to give the extra width room to breathe vertically too. ### Tests - `subagentContent.test.ts` — 21 pass (2 new ordering regressions). - `useStepHandler.spec.ts` — 49 pass (1 updated to assert atoms are preserved on clearStepMaps). - `SubagentCall.test.tsx` — 9 pass (unchanged; aggregator-level tests cover the ordering). * 🪆 feat: persist subagent_content via SDK createContentAggregator Per-request map of createContentAggregator instances keyed by the parent's tool_call_id. ON_SUBAGENT_UPDATE handler feeds each event into the matching aggregator (phase → GraphEvent mapping); AgentClient harvests contentParts onto the subagent tool_call at message save so the child's reasoning / tool calls / final text survive a page refresh. Reusing the SDK's battle-tested aggregator instead of a bespoke one keeps the persisted shape identical to the parent graph's output and drops ~100 lines of custom aggregation code. * 🪆 fix: incremental subagent aggregation + dialog render parity Disappearing tool_calls: the Recoil atom trimmed events to a 200-long rolling window, so verbose subagents could shed the `run_step` that originally created a tool_call part — rebuilding content from the trimmed window then produced only the surviving text/reasoning. Fix: fold each envelope into `contentParts` incrementally in the atom as it arrives (new `foldSubagentEvent` + cursor state). Event trim window now affects only the ticker, never the dialog. Render parity: dialog now applies `groupSequentialToolCalls` and renders single parts through `Container` + grouped batches through `ToolCallGroup` — same spacing and "Used N tools" collapsing the main message view uses.
Width: `min(96vw, 80rem)` — wider on big screens, still responsive. Labels: "Subagent: X" is jargon. Named subagents render as `Running "{name}" agent` / `Ran "{name}" agent` (past tense on completion); self-spawns use `Running subtask` / `Ran subtask` since `Running "self" agent` reads badly. * 🪆 polish: subagent dialog parity + agent avatar in header Labels: drop "subtask" framing. Self-spawn shows `Running agent` / `Ran agent` (past tense on completion); named subagents stay `Running "X" agent` / `Ran "X" agent`. Dialog render parity: stop wrapping every part in `Container`. TEXT keeps its `Container` (gap-3 + `mt-5` sibling margin), THINK and TOOL_CALL render bare so their own wrappers set the full-column width the regular message view gives them — matches the main `<Part>` dispatch. Outer scroll region now uses `px-4 py-3` padding and a `max-w-full flex-grow flex-col gap-0` inner wrapper, mirroring the `MessageParts` container the main conversation uses. Avatar: header icon now renders the subagent's configured avatar via `MessageIcon` when `useAgentsMapContext()` has the child agent, falling back to the `Users` SVG (which keeps its running-state pulse). Same icon-left-of-label pattern the tool UI uses. * 🪆 polish: subagent group label, ticker throttle + tail-ellipsis, scroll button Grouped label: ToolCallGroup now detects all-subagent batches and labels them "Running N agents" / "Ran N agents" instead of "Used N tools". Mixed batches keep the existing label. The tool-name summary is suppressed for all-subagent groups (every entry dedupes to "subagent", which adds nothing). Ticker width + tail-ellipsis: raise the preview cap to 300 chars so wide containers aren't half-empty, and flip the ticker `<li>` to `dir="rtl"` so `text-overflow: ellipsis` clips the oldest characters (visually the left edge) — the newest tokens stay pinned to the right regardless of container width. 
Bidi lays out the Latin text LTR internally, the rtl only affects which side gets the ellipsis. Throttle: `useThrottledValue` hook (trailing-edge, 1.2s) smooths the live `Writing: …` preview so tokens no longer strobe past the eye faster than they can be read. Ref-based internals (not `useState`) avoid infinite-update loops when the upstream value is a new-reference each render; `NEGATIVE_INFINITY` sentinel ensures the very first value passes through synchronously so tests and first paint aren't delayed. Scroll-to-bottom: dialog tracks `isAtBottom` with a 120px threshold; auto-scroll only engages when the user is already following along, and a persistent jump-to-latest button appears whenever they scroll up — no more fighting the auto-scroll to read back. * 🪆 polish: snappier ticker, prefix-safe labels, agents icon, readable lines Ticker lines are now incrementally aggregated in the atom — same pattern as contentParts. The raw-events rolling window is gone; event volume no longer caps what the ticker can display. Verbose subagents that used to drop early tool_call lines out of the window now keep the full 3-line history (using_tool, tool_complete, writing). Discriminated-union ticker lines split a constant prefix (e.g. "Writing:") from a tail-truncatable body. The prefix lives in a `shrink-0` span so it never gets clipped when the body overflows; the body uses `dir="rtl"` only on itself — scoped so non-streaming lines (e.g. "Waiting for first update…") can't get their trailing ellipsis flipped by bidi. Content-aware throttle: 800ms interval (down from 1200ms), skipped entirely while the live buffer is below 120 chars. Early tokens now appear immediately — no more "Reasoning: I" sitting blank for a full second before the next heartbeat. Once the preview is long enough to fill the container, throttling kicks in at the tighter interval. Header label is now a constant verb + optional muted sub-label. 
Base reads "Running agent" / "Ran agent" / "Cancelled agent" / "Agent errored" for every subagent; named subagents get the configured agent name rendered to the right in secondary text (self-spawns and unresolved names omit it — "Running self agent" is nonsense). ToolCallGroup now detects `allSubagents` and swaps `StackedToolIcons` for a single `Users` glyph — otherwise the group header shows a wrench ("tool") icon next to "Ran 5 agents", which reads wrong. * 🪆 feat: delimiter-aware tool labels in ticker + full-width tool lines New shared `parseToolName` helper in `client/src/utils/toolLabels.ts` — single source of truth for splitting `<tool>_mcp_<server>` ids and mapping native tool names (web_search, execute_code, …) to their friendly translation keys. `ToolCallGroup` drops its inline copy and pulls from this helper. Ticker tool lines now use the shared parser + a new `ToolIdentifier` sub-renderer so the live log reads like the main tool UI: - MCP tool → `<server> · <code-badge:tool>` (e.g. "github · `search_code`") - Native → friendly name from `TOOL_FRIENDLY_NAME_KEYS` - Unknown → bare `<code>` badge of the raw id The `using_tool` / `tool_complete` rows now render with a `flex w-full items-baseline gap-1 overflow-hidden` layout matching the writing/reasoning rows — they take the full container width instead of collapsing to content size. Output snippets on `tool_complete` get the same tail-side `dir="rtl"` ellipsis so the newest characters stay flush-right when the container is narrow. Dropped the now-unused template i18n keys (`com_ui_subagent_ticker_using_with_args`, `com_ui_subagent_ticker_tool_complete`, `com_ui_subagent_ticker_tool_output`) in favor of tokens the JSX composes structurally. Only English is touched per the project rule; other locales follow externally. 
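The `<tool>_mcp_<server>` split that the shared parser performs can be sketched as below. This is not the `parseToolName` helper from `client/src/utils/toolLabels.ts`: the `splitToolId` name, the return shape, and the choice to split on the last occurrence of the delimiter are assumptions for illustration.

```typescript
// Delimiter taken from the `<tool>_mcp_<server>` id format described above.
const MCP_DELIMITER = '_mcp_';

interface SplitToolId {
  tool: string;
  server?: string; // present only for MCP-style ids
}

function splitToolId(id: string): SplitToolId {
  // Assumption: split on the LAST delimiter so tool names that themselves
  // contain the delimiter still resolve to the trailing server segment.
  const index = id.lastIndexOf(MCP_DELIMITER);
  if (index <= 0) {
    return { tool: id }; // native or unknown tool id
  }
  const server = id.slice(index + MCP_DELIMITER.length);
  if (server.length === 0) {
    return { tool: id };
  }
  return { tool: id.slice(0, index), server };
}

const mcp = splitToolId('search_code_mcp_github');
console.log(`${mcp.server} · ${mcp.tool}`);
console.log(splitToolId('web_search'));
```

A caller could then look up a friendly name for native ids and fall back to a bare code badge when neither an MCP server nor a known native name is found, as the ticker description above outlines.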
* 🪆 fix: dialog scroll button + auto-scroll during streaming deltas Two race/trigger bugs in the dialog's scroll behavior: Button never showed: `addEventListener('scroll', …)` in a `useEffect` ran before Radix's portal had actually committed the scroll container, so `scrollRef.current` was still null — the listener never attached, `isAtBottom` stayed stuck at its initial `true`, and the jump-to-latest button was never rendered. Swap to React's `onScroll` prop on the element itself so the handler wires up as part of DOM commit, not a post-commit effect. Auto-scroll stalled during text streaming: the pin-to-bottom effect only re-fired on `contentParts.length` changes. Message/reasoning deltas extend the last TEXT/THINK part's `.text` without changing the array length — so the view would drift up as tokens piled in and never catch back up. Replace the length-dep effect with a `ResizeObserver` on the inner content div; every height change (new part or in-place growth) triggers a scroll-pin when the user is still at the bottom. * 🪆 fix: drop leading ellipsis from ticker body truncatePreview was prepending ... to the tail when the buffer exceeded 300 chars. The component's CSS already produces a left-side ellipsis for overflow via dir=rtl + text-overflow: ellipsis — stacking a data-level ellipsis on top renders a stray dot character right after the Writing: / Reasoning: label (Writing: .Sure!), which looks like a typo to the reader. Data now returns just the last 300 chars when truncating; CSS handles the visual cue whenever the body actually overflows its container. * 🪆 fix: Codex review — subagent isolation + concurrent-safe throttle Three findings from the @codex review pass, all valid: P1 — buildAgentInput leaks parent discovered-tool state into subagent children. 
`buildAgentInput` mutates `agent.toolRegistry` (`overrideDeferLoadingForDiscoveredTools` flips `defer_loading:true→false` on tools the parent previously searched for) and appends those tools' definitions to the returned `toolDefinitions` before the function returns. `buildSubagentConfigs` was clearing the reported `initialSummary` / `discoveredTools` fields on the returned AgentInputs, but that happened post-return — the registry writes and extra tool definitions persisted on the child, silently defeating context isolation and inflating the child's prompt. Fix: `buildAgentInput` now takes an `isSubagent` flag that gates the registry-mutation block and omits `initialSummary` / `discoveredTools` at the source. `buildSubagentConfigs` passes `{ isSubagent: true }` for every explicit child; no post-hoc cleanup needed. P2 — ToolCallGroup labels a finished subagent group as still running when the child returned no output. `getToolMeta` computed `hasOutput` as `!!tc.output`, which is `false` for a completed subagent that returned empty text (the UI already has an "empty result" fallback for that case). `allCompleted` would stay `false` and the group header stuck on "Running N agents" forever. Fix: treat `tc.progress === 1` as completion too — progress is the authoritative lifecycle signal, output is just content. P2 — useThrottledValue schedules `setTimeout` during render. Discarded renders under Strict Mode / Concurrent rendering would leave orphan timers firing against stale trees. Fix: move `setTimeout` into a `useEffect` keyed on `[value, intervalMs, enabled]`. Render-time still mutates refs (idempotent), but timer scheduling lives post-commit. Cleanup on unmount and on passthrough transitions is preserved. 
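The ToolCallGroup completion fix in the review pass above (treat `progress === 1` as done even when the output is empty) amounts to a small predicate. A minimal sketch, assuming a simplified tool-call shape; `ToolCallLike` and both function names are illustrative, not the component's real types:

```typescript
// Hypothetical, simplified shape; the real tool-call type carries more fields.
interface ToolCallLike {
  output?: string | null;
  progress?: number;
}

// Progress is treated as the lifecycle signal; output is only content.
function isToolCallComplete(tc: ToolCallLike): boolean {
  const hasOutput = !!tc.output;
  return hasOutput || tc.progress === 1;
}

// A group counts as completed only when every call in it is complete.
function allCompleted(calls: ToolCallLike[]): boolean {
  return calls.every(isToolCallComplete);
}
```

With the older `!!tc.output` check alone, a subagent that finished with empty text would keep the group in the running state.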
* 🪆 fix: Codex P2 — wipe subagent atoms on conversation switch `clearStepMaps()` intentionally doesn't reset `subagentProgressByToolCallId` so a user can reopen a completed subagent's dialog mid-conversation, but `resetSubagentAtoms` was defined and never exposed / called — so each completed run's aggregated `contentParts` + `tickerState` stayed resident in the `atomFamily` for the whole app session. Unbounded growth across multi-conversation sessions. Expose `resetSubagentAtoms` from `useStepHandler` and fire it from `useEventHandlers` whenever the URL's `conversationId` changes. That's the correct cleanup boundary: historical subagent dialogs rehydrate from persisted `subagent_content` on each `tool_call` at message-save time, so wiping live atoms on switch doesn't lose any viewable history — it just releases per-tool-call state that the old conversation's components no longer subscribe to. * 🪆 fix: Codex round 3 — subagent registry isolation + post-run label Two more valid findings. P1 — parent-order registry mutation leaks into subagent inputs. `overrideDeferLoadingForDiscoveredTools` mutates `agent.toolRegistry` in place (the Map and the LCTool objects inside it). When an agent appears both as a handoff target (normal graph node) AND an explicit subagent child, a subagent build that ran before the parent's build captures a reference to the same registry — the parent's later mutation leaks through to the child. Fix: for subagent children (`isSubagent`), clone the `toolRegistry` Map and shallow-clone each LCTool inside before returning the inputs. `defer_loading` flips on parent-graph registry mutations can't propagate across the clone boundary. `toolDefinitions` also gets a shallow-copy pass so the same isolation holds for definitions the child carries directly. P2 — "Running N agents" label stuck after cancel/error. ToolCallGroup's all-subagent label was gated only on `allCompleted`, which requires every child to have `hasOutput \|\| progress === 1`. 
A subagent that gets cancelled (stream ends, no `stop` phase, no output) never satisfies that — so even after `isSubmitting` flips false, the header stays on "Running N agents" while each individual card correctly shows "Cancelled agent". Fix: derive a `subagentsDone` flag as `allCompleted \|\| !isSubmitting` and gate the past-tense label on that. Matches the tri-state each SubagentCall card already resolves (finished / cancelled / running). * 🪆 fix: Codex P2 — ACL-check subagents.agent_ids on create/update Codex flagged that `subagents.agent_ids` was accepted as arbitrary strings on the create/update routes while `edges` got a `validateEdgeAgentAccess` pass — so users could save subagent references to agents they can't VIEW. At runtime `initializeClient`'s `processAgent` ACL gate silently drops those, so the persisted configuration and the actual behavior diverged in a way that is difficult to diagnose. Refactor: extract the id-set → unauthorized-ids check into a shared `collectUnauthorizedAgentIds`, wrap it with a dedicated `validateSubagentAccess`, and plumb the same 403-on-failure response the edge path already returns. Applied on both POST /agents and PATCH /agents/:id. * 🪆 fix: Codex round 5 — ACL-disable escape hatch + ticker order Two valid findings. P1 — can't disable subagents after losing access to a child. The subagent ACL check ran on every create/update that echoed back the `agent_ids` list, even when the user was explicitly disabling the feature. The UI keeps the list intact when toggling `enabled: false`, so a user who subsequently lost VIEW on any child would be locked in a 403 loop — every edit (including the one that turns subagents off) bounces. Fix: gate the ACL check on `subagents.enabled !== false` at both the POST /agents and PATCH /agents/:id handlers. Empty list stays a no-op. Disabling the feature is always permitted. P2 — ticker fold merges out-of-order previews across delta switches. 
`foldSubagentEventIntoTicker` carried `textLineIdx` / `thinkLineIdx` across a reasoning → text → reasoning transition, so the second reasoning chunk appended to the original reasoning line instead of starting a new chronological one. Fix: close the opposite buffer + cursor when a delta-type switch is detected (same rule the content-parts reducer already applies). Added a regression test. * 🪆 fix: Codex round 6 — preserve mid-stream atoms + honor sequential suppression Two valid findings. P2 — atom reset fires on initial chat URL assignment. `useEventHandlers` initialized `lastConversationIdRef` from the URL's current `paramId`, then reset subagent atoms whenever the ref and `paramId` disagreed. For a brand-new conversation the URL stamp goes from `undefined → "abc123"` while the first response is still streaming, which used to drop subagent ticker/content state mid-run and leave dialogs missing earlier updates. Fix: only reset when both the old and new IDs are non-null and differ — i.e. a user-initiated switch between two established conversations. The initial assignment passes through without clearing. P2 — ON_SUBAGENT_UPDATE bypassed `hide_sequential_outputs`. Every other streaming handler in `callbacks.js` (`ON_RUN_STEP`, `ON_MESSAGE_DELTA`, etc.) gates emission on `checkIfLastAgent` + `metadata?.hide_sequential_outputs`, but the subagent forwarder did an unconditional `emitEvent` — so intermediate agents in a sequential chain were leaking their children's activity to the client even when the chain was configured to suppress intermediates. Fix: accept `metadata` and apply the same `isLastAgent \|\| !hide_sequential_outputs` gate. Aggregation still runs regardless of visibility (persistence + dialog depend on it); only the SSE forward is suppressed. * 🪆 fix: Codex P2 — gate subagent ACL check on endpoint capability `validateSubagentAccess` ran on every create/update where `subagents.enabled !== false`, regardless of the endpoint-level `subagents` capability. 
When the capability is off at the appConfig level, `initializeClient` already strips the `subagents` block at runtime — so persisted `agent_ids` are inert — but the validation could still 403 on a legacy record whose referenced child is no longer viewable, blocking unrelated edits. Fix: add `isSubagentsCapabilityEnabled(req)` that reads the agents endpoint's capabilities from `req.config` and gate both the create and update ACL checks on it. Capability-off environments can update agents with stale `subagents` data freely; capability-on keeps the full ACL protection. * 🪆 fix: Codex P2 — reset subagent atoms on id→null navigation too Previous guard (both-established) skipped the reset whenever `paramId` became null/undefined, so navigating from an existing chat to a "new chat" route left stale subagent progress resident in the `atomFamily` until the user picked a specific different chat. Swap the both-established check for a one-time flag: skip only the very first `undefined → id` transition (the brand-new-chat URL stamp that happens mid-stream), then reset on any subsequent change — id→id, id→null, null→id-after-reset. If the user started on an established chat the flag is true at mount, so the guard is a no-op and every navigation resets normally. * 🪆 fix: Codex round 9 — subagent persistence gate + handoff children Two valid findings. P1 — hide_sequential_outputs also gates persistence. The previous fix gated the SSE forward on `isLastAgent \|\| !hide_sequential_outputs` but still ran the per-tool-call `createContentAggregator` aggregation unconditionally. `finalizeSubagentContent` would then attach the hidden intermediate agent's child reasoning / tool output to the saved message, so a page refresh could reveal activity that was intentionally suppressed live. Move the visibility gate to the top of the handler — hidden agents now skip both aggregation and emission, so "hide_sequential_outputs" is a consistent "don't record" rule for subagent traces. 
P2 — handoff agents' explicit subagents were silently dropped. `initializeClient` only resolved `subagentAgentConfigs` for the primary config, so an agent used via handoff that had its own `subagents.agent_ids` saved in the builder would get self-spawn only; every explicit child was quietly ignored, creating a saved-config / runtime mismatch the user couldn't diagnose. Extract the resolution into a shared `loadSubagentsFor(config)` helper and invoke it for the primary and every handoff agent in `agentConfigs`. The `edgeAgentIds` precomputation stays outside the helper (it's loop-invariant). Capability-off shortcuts return empty early so the existing strip-on-capability-off path still holds. * 🪆 fix: Codex P2 — recursive subagent build for multi-level delegation Previously only the outer `agents[]` loop attached `subagentConfigs` to its inputs, so a child used as a subagent (invoked via the `subagent` tool) lost every explicit spawn target of its own. A user-valid configuration like A → B → C would only run the top layer; B could never actually delegate to C from inside A's run. Recursively build `subagentConfigs` for each child inside `buildSubagentConfigs`, passing the child's freshly-constructed `childInputs` down so its own `subagents.enabled` children get resolved too. Added cycle protection via an `ancestors` Set — a configuration like A → B → A is safely cut off at the second encounter of A rather than recursing forever (the existing `child.id === agent.id` guard already prevents the direct self-loop). * 🪆 fix: Codex P2 — reset subagent atoms on useEventHandlers unmount The effect that resets subagent atoms only fired on `paramId` change, so unmounting the chat container (route change away from /c) never flushed the atoms. `knownSubagentAtomKeys` lives in a ref inside `useStepHandler` — once the hook unmounts the ref is gone, so a subsequent remount can't clean atoms it never registered. 
Added a second `useEffect` that only runs cleanup on unmount (empty deps aside from the stable `resetSubagentAtoms` callback). Keeps `atomFamily` bounded across full route teardowns too. * 🪆 fix: Codex round 13 — cyclic subagent guard + prefer persisted Two valid findings. P1 — cyclic subagent ref reloads the primary. A configuration like `A ↔ B` (B lists A as its own subagent) would send `loadSubagentsFor` down a path that couldn't find A in `agentConfigs` (the primary isn't stored there), so it called `processAgent(A)` a second time. That inserts a fresh config for the primary id, which downstream duplicates in `[primary, ...agentConfigs.values()]` and can replace the primary's tool context with the reloaded copy. Fix: short-circuit when a subagent ref points back at `primaryConfig.id` — reuse the already-loaded primary config. Primary is always an edge id so no pruning bookkeeping needed. P2 — live atom preferred over canonical persisted trace. The dialog picked `progress.contentParts` ahead of `persistedContent`, but the Recoil bucket is best-effort — after a disconnect/reconnect it can be stale or partial. The server's `subagent_content` on the `tool_call` is the canonical record refreshed on sync. Preferring live could hide completed tool/reasoning history that was actually persisted. Fix: flip the preference order. Persisted wins when it's non-empty; live covers the mid-stream window (before the parent message saves, persisted is empty) and the older-runs fallback. Updated the test that enforced the old order to lock the new semantics in (separate mid-stream live-fallback assertion kept). * 🪆 fix: Codex P2 — subagent atom reset rule simplified to 'leaving established id' The `hasEstablishedConversationRef` + check for initial undefined→id covered the first navigation but missed the equivalent mid-stream URL stamp when a user goes from an existing chat to a new chat and sends a message there (`id → null → newId`). 
The null → newId transition was still hitting the reset branch and wiping the in-flight subagent ticker/content for that first turn. Simpler rule: only reset when the PREVIOUS paramId is an established id. Every transition AWAY from an established chat clears (id→id2, id→null, id→undefined); every transition FROM null/undefined passes through (initial mount, new-chat URL stamp mid-stream). Drop the `hasEstablishedConversationRef` machinery in favor of that single condition. * 🪆 fix: Codex P2 — match runtime's strict subagent enable check in ACL Runtime (`initializeClient` + `run.ts`) treats `subagents?.enabled` as a truthy predicate — `undefined`, `null`, missing, and `false` all short-circuit. The ACL gate was using `!== false` which accepted `undefined` / missing as "enabled" and could 403 a payload whose subagent tool would be inert at runtime. Swap both create and update to `enabled === true`. Only a strictly- enabled payload triggers the ACL check; the disable path (`false`) still passes through so a user who lost VIEW on a child can still save the disable edit. * 🪆 fix: Codex P2 — reject missing subagent references with 400 `validateSubagentAccess` collapsed through `collectUnauthorizedAgentIds`, which returns an empty list for ids with no DB record — so typos and references to deleted agents passed validation silently, and `initializeClient` later dropped them at runtime. Saved config would then list spawn targets that the backend never honored, a hard-to- diagnose drift. Refactor the helper into `classifyAgentReferences(ids, …)` which returns `{ missing, unauthorized }` separately. `validateEdgeAgentAccess` keeps its old semantics (missing is intentional — a self-referential `from` names the agent being created). `validateSubagentReferences` surfaces both buckets so the create/update handlers can 400 on missing and 403 on unauthorized with distinct error messages and `agent_ids` lists. 
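The `{ missing, unauthorized }` split and the 400/403 mapping described above could look roughly like this. The log elides the helper's real parameters, so the set-based lookups, both function names, and the rule that a missing reference wins when both buckets are non-empty are assumptions made for illustration:

```typescript
type Classification = { missing: string[]; unauthorized: string[] };

// Hypothetical stand-in: the real helper queries the DB and ACLs; here both
// lookups are precomputed sets so the sketch is self-contained.
function classifyAgentReferencesSketch(
  ids: string[],
  existingIds: ReadonlySet<string>,
  viewableIds: ReadonlySet<string>,
): Classification {
  const missing: string[] = [];
  const unauthorized: string[] = [];
  for (const id of Array.from(new Set(ids))) {
    if (!existingIds.has(id)) {
      missing.push(id); // typo or deleted agent
    } else if (!viewableIds.has(id)) {
      unauthorized.push(id); // exists, but the user lacks VIEW
    }
  }
  return { missing, unauthorized };
}

// Assumed ordering: report missing references (400) before unauthorized ones (403).
function toErrorResponse(
  c: Classification,
): { status: 400 | 403; agent_ids: string[] } | null {
  if (c.missing.length > 0) {
    return { status: 400, agent_ids: c.missing };
  }
  if (c.unauthorized.length > 0) {
    return { status: 403, agent_ids: c.unauthorized };
  }
  return null;
}
```

Keeping the two buckets separate is what lets the edge path ignore `missing` while the subagent path rejects it.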
* 🪆 polish: tighten subagent dialog grid gap to gap-2 OGDialogContent's grid default is `gap-4`, which renders the title, description, and scroll area as three visually separated panels. Drop to `gap-2` so they read as one block. * 🪆 polish: swap Subagents above Handoffs in Advanced panel Subagents is the more common knob users reach for, so show it first. Handoffs keep the same Controller wiring, just move below.	2026-04-25 04:02:01 -04:00
Danny Avila	35bf04b26c	🧰 refactor: Unify code-execution tools (#12767) * 🛠️ feat: Add registerCodeExecutionTools helper Idempotently registers `bash_tool` + `read_file` in the run's tool registry and tool-definition list via a registry `.has()` dedupe. Sets up the single code-execution tool path shared by: - `initializeAgent` (when an agent has `execute_code` in its tools and the capability is enabled for the run) - `injectSkillCatalog` (when skills are active; unconditional read_file, bash_tool follows `codeEnvAvailable`) Both callers reach the helper in the same initialization sequence, so the second call becomes a no-op and exactly one copy of each tool reaches the LLM — no more double registration for agents that combine `execute_code` capability with active skills. Unit-tested on a fresh run, idempotence (second call, overlap with prior tooldefs, partial overlap), and the no-registry variant. * 🔀 refactor: Route injectSkillCatalog bash_tool + read_file through registerCodeExecutionTools The `skill` tool is still registered inline (it's skill-path-specific), but `bash_tool` + `read_file` now flow through the shared idempotent helper so a prior registration from the execute_code path doesn't produce a duplicate copy later in the same run. Behavior preserved: - `read_file` always registers when any active skill is in scope — manually-primed `disable-model-invocation: true` skills still need it to load `references/` from storage. - `bash_tool` follows `codeEnvAvailable` exactly as before. Adds a test pinning the cross-call dedupe: when `injectSkillCatalog` runs AFTER `registerCodeExecutionTools` has already seeded the registry + tool definitions with bash_tool/read_file, the resulting `toolDefinitions` still contains exactly one copy of each.
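The idempotent registration described above (a registry `.has()` dedupe so a second call is a no-op) can be sketched as follows. The `ToolDef` shape and the function signature are assumptions, and the sketch leaves out the `codeEnvAvailable` gating of `bash_tool` and the no-registry variant; returning the input array unchanged on the no-op path follows the allocation note in the later polish commit of this PR.

```typescript
// Hypothetical minimal tool shape; the real LCTool carries a schema and more.
interface ToolDef {
  name: string;
  description?: string;
}

const CODE_EXECUTION_TOOLS: ToolDef[] = [{ name: 'bash_tool' }, { name: 'read_file' }];

// Sketch only: registers each tool once per run, keyed on the registry.
function registerCodeExecutionToolsSketch(
  registry: Map<string, ToolDef>,
  toolDefinitions: ToolDef[],
): ToolDef[] {
  const added: ToolDef[] = [];
  for (const def of CODE_EXECUTION_TOOLS) {
    if (registry.has(def.name)) {
      continue; // already registered by an earlier caller in the same run
    }
    registry.set(def.name, def);
    if (!toolDefinitions.some((d) => d.name === def.name)) {
      added.push(def);
    }
  }
  // No-op path: hand back the same array instead of allocating a copy.
  return added.length === 0 ? toolDefinitions : [...toolDefinitions, ...added];
}
```

Calling it once from the `execute_code` path and again from the skill path leaves exactly one copy of each tool in both the registry and the definitions list.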
* 🪄 feat: Expand `execute_code` tool name into bash_tool + read_file at initialize-time When an agent's `tools` include `execute_code` and the `execute_code` capability is enabled for the run, `initializeAgent` now registers `bash_tool` + `read_file` via `registerCodeExecutionTools` before `injectSkillCatalog`. The legacy `execute_code` tool definition is no longer handed to the LLM — `execute_code` remains on the agent document as a capability-trigger marker, but the runtime expands it into the skill-flavored tool pair. Call ordering matters: the `execute_code` registration runs BEFORE `injectSkillCatalog`, so the skill path's own `registerCodeExecutionTools` call inside `injectSkillCatalog` becomes a no-op via the registry's `.has()` check. Exactly one copy of each tool reaches the LLM whether the agent has: - only `execute_code` (legacy path) - only skills - both No data migration needed — `agent.tools: ['execute_code']` stays in the DB unchanged; the expansion is a runtime operation. Three tests cover the matrix: execute_code + capability on → bash_tool + read_file registered; execute_code + capability off → neither registered; no execute_code + capability on → neither registered. * 🗑️ refactor: Drop CodeExecutionToolDefinition from the builtin registry Removes the legacy `execute_code` entry from `agentToolDefinitions` and the corresponding import. With the initialize-time expansion in place, nothing consults `getToolDefinition('execute_code')` for a tool schema any more — the capability gate still filters on the string `execute_code`, but the actual tool definitions the LLM sees come from `registerCodeExecutionTools` (i.e. `bash_tool` + `read_file`). `loadToolDefinitions` in `packages/api/src/tools/definitions.ts` silently drops `execute_code` when it no longer resolves in the registry — that's the expected path and is now covered by an updated test. No caller of `getToolDefinition('execute_code')` expects a non-undefined result after this change.
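The three-case matrix above (`execute_code` plus capability on, `execute_code` plus capability off, no `execute_code`) can be expressed as one small function over tool names. This is a sketch under stated assumptions: `expandExecuteCode` is a hypothetical name, it works on plain name strings rather than real tool definitions, and it only models which names would reach the LLM.

```typescript
// Hypothetical helper: models the runtime expansion of the `execute_code`
// marker into the `bash_tool` + `read_file` pair.
function expandExecuteCode(
  agentTools: string[],
  codeEnvAvailable: boolean | undefined,
): string[] {
  const wantsCode = agentTools.includes('execute_code');
  // The legacy `execute_code` definition itself is never handed to the LLM.
  const others = agentTools.filter((t) => t !== 'execute_code');
  if (wantsCode && codeEnvAvailable === true) {
    return [...others, 'bash_tool', 'read_file'];
  }
  // Capability off, or the agent never listed `execute_code`: nothing is added.
  return others;
}
```

The stored `agent.tools` array is left untouched; only the per-run list is derived from it.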
* 🔌 refactor: Read CODE_API_KEY from env for primeCodeFiles + PTC Finishes the Phase 4 server-env-keyed rollout on the two remaining `loadAuthValues({ authFields: [EnvVar.CODE_API_KEY] })` sites in `ToolService.js`: - `primeCodeFiles` (user-attached file priming on execute_code agents) - Programmatic Tool Calling (`createProgrammaticToolCallingTool`) Both now read `process.env[EnvVar.CODE_API_KEY]` directly, matching `bash_tool`'s pattern. The per-user plugin-auth path is no longer consulted for code-env credentials anywhere in the hot path — the agents library owns the actual tool-call execution and also reads the env var internally. Priming still fires for existing user-file workflows so the legacy `toolContextMap[execute_code]` hint ("files available at /mnt/data/...") stays in the prompt; only the key lookup changed. * 🔧 fix: Type the pre-seeded dedupe-test tools as LCTool CI TypeScript type checks caught `{ parameters: {} }` in the new cross-call dedupe test: `LCTool.parameters` is a `JsonSchemaType`, not `{}`. Use `{ type: 'object', properties: {} }` and type the local registry Map through the parameter-derived shape so the pre-seeded values match what `toolRegistry.set` expects. * 🛡️ fix: Run execute_code expansion before GOOGLE_TOOL_CONFLICT gate Codex review caught a latent regression: the original Phase 8 placement ran `registerCodeExecutionTools` after `hasAgentTools` was computed, so an execute-code-only agent on Google/Vertex with provider-specific `options.tools` populated would no longer trip `GOOGLE_TOOL_CONFLICT` — the legacy `CodeExecutionToolDefinition` used to populate `toolDefinitions` before the guard, but after dropping it from the registry, `toolDefinitions` stayed empty until my expansion ran downstream of the guard. Mixed provider + agent tools would silently flow through to the LLM. Fix moves the `execute_code` expansion to BEFORE `hasAgentTools` computation. 
`bash_tool` + `read_file` now contribute to the check the same way the legacy `execute_code` def did. Covered by a new test that pins the Google+execute_code+provider-tools scenario — the `rejects.toThrow(/google_tool_conflict/)` path would have silently passed on the prior placement. * 🔗 fix: Thread codeEnvAvailable through handoff sub-agents Round-2 codex review caught the other half of the execute_code expansion gap: `discoverConnectedAgents` omitted `codeEnvAvailable` from its forwarded `initializeAgent` params, so handoff sub-agents with `agent.tools: ['execute_code']` lost the `bash_tool` + `read_file` registration (pre-Phase 8 the legacy `CodeExecutionToolDefinition` would have landed in their `toolDefinitions` via the registry). - Add `codeEnvAvailable?` to `DiscoverConnectedAgentsParams` and forward it verbatim on every sub-agent `initializeAgent` call. - Update the three JS call sites that construct the primary's `codeEnvAvailable` (`services/Endpoints/agents/initialize.js`, `controllers/agents/openai.js`, `controllers/agents/responses.js`) to pass the same flag into `discoverConnectedAgents` — one authoritative source per request. - Two regression tests in `discovery.spec.ts` pin the true/false passthrough so a future refactor that drops the param-forwarding surfaces immediately. Left intentionally unchanged: `packages/api/src/agents/openai/service.ts` (public API helper with no in-repo caller). External consumers of `createAgentChatCompletion` who want code execution should pass a `codeEnvAvailable`-aware `initializeAgent` via `deps` — documenting the full public-API surface is out of scope for this Phase 8 PR. * 🔗 fix: Thread codeEnvAvailable through addedConvo + memory-agent paths Round-3 codex review caught the last two production `initializeAgent` callers missing the Phase-8 capability flag: - `api/server/services/Endpoints/agents/addedConvo.js` (multi-convo parallel agent execution). 
Added `codeEnvAvailable` to `processAddedConvo`'s destructured params and forwarded it into the per-added-agent `initializeAgent` call. Caller in `api/server/services/Endpoints/agents/initialize.js` passes the same `codeEnvAvailable` it computed for the primary. - `api/server/controllers/agents/client.js` (`useMemory` — memory extraction agent). Computes its own `codeEnvAvailable` from `appConfig?.endpoints?.[EModelEndpoint.agents]?.capabilities` and forwards into `initializeAgent`. Memory agents rarely list `execute_code`, but if one does, pre-Phase 8 they got the legacy `execute_code` tool registered unconditionally — the passthrough restores parity. With this, every production caller of `initializeAgent` explicitly resolves the capability: main chat flow (primary + handoff), OpenAI chat completions (primary + handoff), Responses API (primary + handoff), added convo parallel agents, and memory agents. The one remaining caller, `packages/api/src/agents/openai/service.ts::createAgentChatCompletion`, is a public API helper with no in-repo consumer (external callers must pass a capability-aware `initializeAgent` via `deps`). * 🪤 fix: Remove duplicate appConfig declaration causing TDZ ReferenceError The Responses API controller had TWO `const appConfig = req.config;` bindings inside `createResponse`: one at the top of the function (added by the Phase 4 `bash_tool` decouple) and one inside the try block (added by the polish PR #12760). Because `const` is block-scoped with a temporal dead zone, the inner redeclaration put `appConfig` in TDZ for the entire try block, so any earlier reference inside the try — notably `appConfig?.endpoints?.[EModelEndpoint.agents]?.allowedProviders` at line 348 — threw `ReferenceError: Cannot access 'appConfig' before initialization`. The error was silently swallowed by the outer try/catch, leaving `recordCollectedUsage` unreached and the six `responses.unit.spec.js` token-usage tests failing. 
Removing the inner redeclaration fixes the six failing tests (verified: 11/11 pass locally post-fix, 0 regressions elsewhere). The outer function-scoped binding already provides `appConfig` to every downstream reference. * 🔗 fix: Thread codeEnvAvailable through the OpenAI chat-completion public API Round-4 codex review (legitimate on the type-safety angle, even though the runtime concern was already covered): the `createAgentChatCompletion` helper defines its own narrower `InitializeAgentParams` interface locally, and the type was missing `codeEnvAvailable`. External consumers who supply a capability-aware `deps.initializeAgent` couldn't route `codeEnvAvailable` through without a type-cast workaround. - Widen the local `InitializeAgentParams` interface to include `codeEnvAvailable?: boolean` (matches the real `packages/api/src/agents/initialize.ts` type). - Derive `codeEnvAvailable` inside `createAgentChatCompletion` from `deps.appConfig?.endpoints?.agents?.capabilities` (the same source the in-repo controllers use) and forward to `deps.initializeAgent`. Uses a string literal `'execute_code'` lookup so this file stays free of a `librechat-data-provider` import — keeping the dependency surface of the public helper minimal. With this, external consumers of `createAgentChatCompletion` who pass `appConfig` with the agents capabilities get `bash_tool` + `read_file` registration automatically; consumers who don't pass `appConfig` retain the existing "explicit opt-in" semantics (the flag stays `undefined`, expansion is skipped). * 🧹 chore: Review-driven polish — observability log, JSDoc DRY, test gaps, no-op allocation Addresses the comprehensive review of PR #12767: - Finding #1 (MINOR, observability): `initializeAgent` now emits a debug log when an agent lists `execute_code` in its tools but the runtime gate is off (`params.codeEnvAvailable` !== true). 
The event-driven `loadToolDefinitionsWrapper` path doesn't log capability-disabled warnings, so without this the tool silently vanishes from the LLM's definitions with zero trace. Operators debugging "why isn't code interpreter working?" now get a signal at the initialize layer. - Finding #5 (NIT, allocation): `registerCodeExecutionTools` now returns the input `toolDefinitions` array by reference on the no-op path (both tools already registered by a prior caller in the same run) instead of allocating a fresh spread array every time. The common dual-call scenario — `initializeAgent` then `injectSkillCatalog` — saves one O(n) copy per request. - Finding #4 (NIT, DRY): Collapsed the duplicated 6-line JSDoc comment in `openai.js`, `responses.js`, and `addedConvo.js` into either a one-line `@see DiscoverConnectedAgentsParams.codeEnvAvailable` pointer (the two JS call sites) or a compact 3-line block referring back to the canonical source (addedConvo's @param). - Finding #2 (MINOR, test gap): Added `api/server/services/Endpoints/agents/addedConvo.spec.js` with three cases covering `codeEnvAvailable=true`, `codeEnvAvailable=false`, and omitted (undefined) passthrough. A future refactor that drops the param from destructuring now surfaces here instead of silently regressing multi-convo parallel agents with `execute_code`. - Finding #3 (MINOR, test gap): Added `api/server/controllers/agents/__tests__/client.memory.spec.js` pinning the capability-flag derivation that `AgentClient::useMemory` uses — six cases covering present/absent/null/undefined config shapes plus an enum-literal pin (`'execute_code'` / `'agents'`). Catches enum renames or config-path shifts that would otherwise silently strip `bash_tool` + `read_file` from memory agents. 
Finding #7 (jest.mock scoping, confidence 40) left as-is: the reviewer's own risk assessment noted `buildToolSet` doesn't touch the mocked exports, and restructuring a file-level `jest.mock` to `jest.doMock` + dynamic `import()` introduces more complexity than the speculative risk justifies. The existing mock is scoped to the test file and contains the same stubs the adjacent `skills.test.ts` already uses. Finding #6 (PR description commit count) addressed out-of-band via PR description update. All existing tests pass, typecheck clean, lint clean across touched files. New tests: 9 cases across 2 new spec files. * 🧽 refactor: Replace hardcoded 'execute_code' string with AgentCapabilities enum in service.ts Follow-up review (conf 55) caught that `openai/service.ts`'s Phase 8 `codeEnvAvailable` derivation used the literal `'execute_code'` while every in-repo controller uses `AgentCapabilities.execute_code` from `librechat-data-provider`. The file deliberately uses local type interfaces to keep the public API helper's type surface small, but that pattern was never a ban on single-value imports from the data provider — `packages/api` already depends on it. Importing the enum value means a future rename of `AgentCapabilities.execute_code` propagates to this file automatically, matching the in-repo controllers' behavior. Other follow-up findings left as-is per the reviewer's own verdict: - #2 (memory spec mirrors the production expression rather than calling `AgentClient::useMemory` directly): reviewer flagged as "not blocking" / "design-philosophy observation." The test file's JSDoc already explicitly documents the tradeoff and pins the enum literals to catch the most likely drift vector. Standing up `AgentClient` + all its mocks for a one-line regression guard is disproportionate. - #3 (`addedConvo.spec.js` mock signature vs. underlying `loadAddedAgent` arity): reviewer's own confidence 25 noted the mock matches the wrapper's actual call pattern in the production file. 
Not a real gap. - #4 was self-retracted as a false alarm. * 🗑️ refactor: Fully deprecate CODE_API_KEY — remove all LibreChat-side references The code-execution sandbox no longer authenticates via a per-run `CODE_API_KEY` (frontend or backend). Auth moved server-side into the agents library / sandbox service, so LibreChat drops every reference: Backend plumbing: - `api/server/services/Files/Code/crud.js`: `getCodeOutputDownloadStream`, `uploadCodeEnvFile`, `batchUploadCodeEnvFiles` no longer accept `apiKey` or send the `X-API-Key` header. - `api/server/services/Files/Code/process.js`: `processCodeOutput`, `getSessionInfo`, `primeFiles` drop the `apiKey` param throughout. - `api/server/services/ToolService.js`: stop reading `process.env[EnvVar.CODE_API_KEY]` for `primeCodeFiles` and PTC; the agents library handles auth internally. Remove the now-dead `loadAuthValues` + `EnvVar` imports. Drop the misleading "LIBRECHAT_CODE_API_KEY" hint from the bash_tool error log. - `api/server/services/Files/process.js`: remove the `loadAuthValues` call around `uploadCodeEnvFile`. - `api/server/routes/files/files.js`: code-env file download no longer fetches a per-user key. - `api/server/controllers/tools.js`: `execute_code` is no longer a tool that needs verifyToolAuth with `[EnvVar.CODE_API_KEY]` — the endpoint always reports system-authenticated so the client skips the key-entry dialog. `processCodeOutput` called without `apiKey`. - `api/server/controllers/agents/callbacks.js`: `processCodeOutput` invoked without the loadAuthValues round trip, for both LegacyHandler and Responses-API handlers. - `api/app/clients/tools/util/handleTools.js`: `createCodeExecutionTool` called with just `user_id` + files. packages/api: - `packages/api/src/agents/skillFiles.ts`: `PrimeSkillFilesParams`, `PrimeInvokedSkillsDeps`, `primeSkillFiles`, `primeInvokedSkills` all drop the `apiKey` param; the gate is purely `codeEnvAvailable`. 
- `packages/api/src/agents/handlers.ts`: `handleSkillToolCall` drops the `process.env[EnvVar.CODE_API_KEY]` read; skill-file priming is now gated solely on `codeEnvAvailable`. `ToolExecuteOptions` signatures drop apiKey from `batchUploadCodeEnvFiles` and `getSessionInfo`. - `packages/api/src/agents/skillConfigurable.ts`: JSDoc no longer references the env var. - `packages/api/src/tools/classification.ts`: PTC creation no longer gated on `loadAuthValues`; `buildToolClassification` drops the `loadAuthValues` dep entirely (no LibreChat-side callers need it for this path anymore). - `packages/api/src/tools/definitions.ts`: `LoadToolDefinitionsDeps` drops the `loadAuthValues` field. Frontend: - Delete `client/src/hooks/Plugins/useAuthCodeTool.ts`, `useCodeApiKeyForm.ts`, and `client/src/components/SidePanel/Agents/Code/ApiKeyDialog.tsx` — the install/revoke dialogs for CODE_API_KEY are fully dead. - `BadgeRowContext.tsx`: drop `codeApiKeyForm` from the context type and provider. `codeInterpreter` toggle treated as always authenticated (sandbox auth is server-side). - `ToolsDropdown.tsx`, `ToolDialogs.tsx`, `CodeInterpreter.tsx`, `RunCode.tsx`, `SidePanel/Agents/Code/Action.tsx` + `Form.tsx`: all API-key dialog trigger refs, "Configure code interpreter" gear buttons, and auth-verification plumbing removed. The "Code Interpreter" toggle is now a plain `AgentCapabilities.execute_code` checkbox — no key-entry gate. - `client/src/locales/en/translation.json`: drop the three `com_ui_librechat_code_api` keys and `com_ui_add_code_interpreter_api_key`. Other locales are externally automated per CLAUDE.md. Config: - `.env.example`: remove the `# LIBRECHAT_CODE_API_KEY=your-key` section and its header. Tests: - `crud.spec.js`: assertions flipped to pin "no X-API-Key header" and "no apiKey param". - `skillFiles.spec.ts`: removed env-var save/restore; tests now pin that the batch-upload path is gated solely on `codeEnvAvailable` and that no apiKey is threaded through. 
- `handlers.spec.ts`: same — just the `codeEnvAvailable` gate pins remain. - `classification.spec.ts`: remove the two tests that asserted `loadAuthValues` was (not) called for PTC. - `definitions.spec.ts`: drop every `loadAuthValues: mockLoadAuthValues` entry from the deps shape. - `process.spec.js`: strip the mock of `EnvVar.CODE_API_KEY`. Comment hygiene: - `tools.ts`, `initialize.ts`, `registry/definitions.ts`: shortened stale comment references to "legacy `execute_code` tool" without naming the retired env var. Tests verified: 678 packages/api tests pass, 836 backend api tests pass. Typecheck clean, lint clean. Only remaining CODE_API_KEY mentions in the code are two regression-guard assertions: - `crud.spec.js`: pins "no X-API-Key header" stays absent. - `skillConfigurable.spec.ts`: pins `configurable` never grows a `codeApiKey` field. * 🧹 chore: Remove the last two CODE_API_KEY name mentions in LibreChat Follow-up to the prior full deprecation commit: two tests still named the retired identifier in their regression-guard assertions. - `packages/api/src/agents/skillConfigurable.spec.ts`: drop the "does not inject a codeApiKey key" test. The `codeApiKey` field is gone from the production configurable shape, so an absence-assertion naming it re-introduces the retired identifier in code. - `api/server/services/Files/Code/crud.spec.js`: rename the "without an X-API-Key header" case back to "should request stream response from the correct URL" and drop the `expect(headers).not.toHaveProperty('X-API-Key')` assertion. The surrounding request-shape checks (URL, timeout, responseType) still pin the behavior; the explicit header-absence line was named-after the deprecated contract. Result: `grep -rn "CODE_API_KEY\\|codeApiKey\\|LIBRECHAT_CODE_API_KEY"` against the LibreChat source tree returns zero hits. 
The only remaining `X-API-Key` strings in this repo are on unrelated OpenAPI Action + MCP server auth configurations, where the string is user-facing config, not a LibreChat-owned identifier. Tests: 677 packages/api pass (2 pre-existing summarization e2e failures unrelated); 126 api-workspace controller/service tests pass. Typecheck and lint clean. * 🎯 fix: Narrow codeEnvAvailable to per-agent (admin cap AND agent.tools) Before this commit, `codeEnvAvailable` was computed in the three JS controllers as the admin-level capability flag only (`enabledCapabilities.has(AgentCapabilities.execute_code)`) and passed through `initializeAgent` → `injectSkillCatalog` / `primeInvokedSkills` / `enrichWithSkillConfigurable` unchanged. A skills-only agent whose `tools` array didn't include `execute_code` still got `bash_tool` registered (via `injectSkillCatalog`) and skill files re-primed to the sandbox on every turn — wrong, because the agent never opted in to code execution. Fix: `initializeAgent` now computes the per-agent effective value once as `params.codeEnvAvailable === true && agent.tools.includes(Tools.execute_code)`, reuses the same boolean for: 1. The `execute_code` → `bash_tool + read_file` expansion gate (previously already consulted `agent.tools`; now shares the single `effectiveCodeEnvAvailable` binding). 2. The `injectSkillCatalog` call (previously got the raw admin flag). 3. The returned `InitializedAgent.codeEnvAvailable` field (new, typed as required boolean). Controllers (initialize.js, openai.js, responses.js): store `primaryConfig.codeEnvAvailable` in `agentToolContexts.set(primaryId, ...)`, capture `config.codeEnvAvailable` in every handoff `onAgentInitialized` callback, and read it from the per-agent ctx inside the `toolExecuteOptions.loadTools` runtime closure. The hoisted `const codeEnvAvailable = enabledCapabilities.has(...)` locals in the two OpenAI-compat controllers are gone — they were shadowing the narrowed per-agent value. 
primeInvokedSkills: `handlePrimeInvokedSkills` in `services/Endpoints/agents/initialize.js` now uses `primaryConfig.codeEnvAvailable` (per-agent, narrowed) instead of the raw admin flag. A skills-only primary agent won't re-prime historical skill files to the sandbox even when the admin enabled the capability globally. Efficiency: one extra `&&` in `initializeAgent`. No runtime hot-path cost — the `includes()` scan on `agent.tools` was already happening for the `execute_code` expansion gate; it's now just bound to a local. Tool execution closures read `ctx.codeEnvAvailable === true` (property access + strict equality, O(1)). Ephemeral-agent note: per-agent narrowing is authoritative for both persisted and ephemeral flows. The ephemeral toggle (`ephemeralAgent.execute_code`) is reconciled into `agent.tools` upstream in `packages/api/src/agents/added.ts`, so `agent.tools.includes('execute_code')` is the single source of truth by the time `initializeAgent` runs. Tests: two new regression tests pin the narrowing contract: - `initialize.test.ts` — four-quadrant matrix on `InitializedAgent.codeEnvAvailable` (cap on × agent asks, cap on × doesn't ask, cap off × asks, neither). Catches future refactors that drop either half of the AND. - `skills.test.ts` — `injectSkillCatalog` with `codeEnvAvailable: false` against an active skill catalog must NOT register `bash_tool` even though it still registers `read_file` + `skill`. This is the state a skills-only agent gets post-narrowing. All 191 affected packages/api tests pass + 836 backend api tests pass. Typecheck clean, lint clean. * 🧽 refactor: Comprehensive-review polish — hoist tool defs, pin verifyToolAuth contract, doc appConfig Addresses the comprehensive review of Phase 8. Findings mapped: #1 (MINOR): `verifyToolAuth` unconditional auth for execute_code - Added doc comment explicitly stating the deployment contract (admin capability → reachable sandbox; no per-check health probe to keep UI-gate queries O(1)). 
- New `api/server/controllers/__tests__/tools.verifyToolAuth.spec.js` with 4 regression tests pinning the contract: 1. `authenticated: true` + `SYSTEM_DEFINED` for execute_code. 2. 404 for unknown tool IDs. 3. `loadAuthValues` is never consulted (catches a future revert that would resurface the per-user key-entry dialog). 4. Response `message` is never `USER_PROVIDED`. #2 (MINOR): `openai/service.ts` undocumented `appConfig` dependency - Expanded the `ChatCompletionDependencies.appConfig` JSDoc to spell out that omitting it silently disables code execution for agents with `execute_code` in their tools. External consumers of `createAgentChatCompletion` now have the contract documented at the type boundary. #5 (NIT): `registerCodeExecutionTools` re-allocates tool defs - Hoisted `READ_FILE_DEF` and `BASH_TOOL_DEF` to module-level `Object.freeze`d constants. The shapes derive entirely from static `@librechat/agents` exports, so a single frozen object per tool is safe to share across every agent init. Eliminates the ~4-property allocations on every call (including the common second-call no-op path). #6 (NIT): Verbose history-priming comment in initialize.js - Trimmed the 16-line `handlePrimeInvokedSkills` block to a 5-line summary with `@see InitializedAgent.codeEnvAvailable` pointer. The canonical narrowing explanation lives on the type; the controller comment is just the ACL-vs-capability rationale. Skipped: - #3 (memory spec tests a mirror function): reviewer self-dismissed as a design tradeoff; the enum-literal pin already catches the highest-risk drift vector. - #4 (cross-repo contract for `createCodeExecutionTool`): user will explicitly install the latest `@librechat/agents` dev version once the companion PR publishes, so the version pin will be authoritative. - #7 (migration/deprecation note for self-hosters): out of scope per user direction — release notes handle this. Tests verified: 679 packages/api + 840 backend api tests pass. Typecheck + lint clean. 
* 🔧 chore: Update @librechat/agents version to 3.1.68-dev.1 across package-lock and package.json files This commit updates the version of the `@librechat/agents` package from `3.1.68-dev.0` to `3.1.68-dev.1` in the `package-lock.json` and relevant `package.json` files. This change ensures consistency across the project and incorporates any updates or fixes from the new version.	2026-04-25 04:02:01 -04:00
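The per-agent narrowing described in the "Narrow codeEnvAvailable to per-agent" commit above can be sketched as a single AND of the admin capability and the agent's own `tools` list. This is a minimal illustration, not the code in `initializeAgent`: the function name `effectiveCodeEnvAvailable`, the `EXECUTE_CODE` constant (standing in for `Tools.execute_code`), and the sample tool names are assumptions made for the example.

```javascript
// Hypothetical stand-in for the `Tools.execute_code` enum value named in the commit.
const EXECUTE_CODE = 'execute_code';

/**
 * Sketch of the narrowing rule: code execution is available for an agent only
 * when the admin capability is on AND the agent lists the tool itself.
 */
function effectiveCodeEnvAvailable(adminCapEnabled, agentTools) {
  return adminCapEnabled === true && agentTools.includes(EXECUTE_CODE);
}

// The four-quadrant matrix the commit says its regression test covers.
const quadrants = [
  { label: 'cap on, agent asks', cap: true, tools: [EXECUTE_CODE, 'web_search'] },
  { label: 'cap on, agent does not ask', cap: true, tools: ['web_search'] },
  { label: 'cap off, agent asks', cap: false, tools: [EXECUTE_CODE] },
  { label: 'neither', cap: false, tools: [] },
];

for (const { label, cap, tools } of quadrants) {
  console.log(`${label}: ${effectiveCodeEnvAvailable(cap, tools)}`);
}
```

Only the first quadrant evaluates to `true`, which is why a skills-only agent no longer gets `bash_tool` when the admin has enabled the capability globally.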
Danny Avila	ac913aa886	🔐 chore: Skills Permissions Housekeeping, Reachable Admin Dialog + Defaults Tests (#12766 ) * 🔐 chore: Skills permissions housekeeping — reachable admin dialog + defaults tests Phase 9 housekeeping pass. Skills was already gated on `PermissionTypes.SKILLS` (seeded from `interface.skills`) and `AgentCapabilities.skills` everywhere it matters, but two smaller parity gaps with Prompts/Memory/MCP remained: - The skills admin settings dialog had no UI entry point. The only mount was inside an unused `FilterSkills` component, so admins had no way to reach the role-permissions editor for skills. Mounted it in `SkillsAccordion` gated on `SystemRoles.ADMIN`, matching the `PromptsAccordion` pattern. - No regression lock on skill permission defaults. `roles.spec.ts` asserted structural completeness but not the specific shape — a future refactor could silently flip ADMIN's `USE/CREATE/SHARE/SHARE_PUBLIC` to false or drop SKILLS from USER defaults without failing. Added explicit Skills assertions for both roles. - No lock on `AgentCapabilities.skills` being in `defaultAgentCapabilities`. Added an assertion in `endpoints.spec.ts`. * 🩹 fix: Remove duplicate `const appConfig` in Responses createResponse The Skills polish commit (#12760) added `const appConfig = req.config;` at line 381 inside the try block of `createResponse`, without noticing that the earlier drive-by fix (`2463b6acd`) already declared it at the function top (line 283). The second `const` creates a new block-scoped binding inside the try, so earlier references within the same block (e.g. line 348's `appConfig?.endpoints?.[EModelEndpoint.agents]?.allowedProviders`) now hit the TDZ instead of the outer binding and throw `ReferenceError: Cannot access 'appConfig' before initialization` — which the outer try/catch then swallows into a generic 500. 
This surfaced as all six token-usage tests in `api/server/controllers/agents/__tests__/responses.unit.spec.js` failing with `mockRecordCollectedUsage` never being called (because the throw skips past the `recordCollectedUsage(...)` call). Dropping the inner re-declaration restores the full control flow. All 11 tests in the file pass again. * 🧹 refactor: Address review nits on Phase 9 housekeeping - Move the `defaultAgentCapabilities` regression test out of the `createEndpointsConfigService` describe block and into its own top-level describe. It tests a module constant and has no relationship to the service factory; nesting was misleading and made it easier to accidentally drop if the service tests are later restructured. - Re-order local imports in `SkillsAccordion.tsx` longest-to-shortest per AGENTS.md convention (`SkillsSidePanel` 48 chars before `useAuthContext` 41 chars).	2026-04-25 04:02:01 -04:00
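The temporal-dead-zone failure described in the "Remove duplicate `const appConfig`" fix above can be reproduced in isolation. The sketch below is not the `createResponse` controller; the function names and the shape of `req.config` are assumptions chosen to mirror the commit message. It shows that an inner `const` inside the `try` block shadows the outer binding for the whole block, so an earlier read throws, and that optional chaining does not prevent the throw.

```javascript
// Hypothetical reduction of the bug: a duplicate `const` inside the try block.
function createResponseBroken(req) {
  const appConfig = req.config; // outer binding, declared at function top
  try {
    // This read resolves to the INNER binding declared two lines below, which
    // is still uninitialized here, so it throws a ReferenceError.
    const allowed = appConfig?.endpoints?.agents?.allowedProviders;
    const appConfig = req.config; // duplicate declaration: shadows the outer one
    return { status: 200, allowed, hasConfig: appConfig !== undefined };
  } catch (err) {
    // The catch hides the real cause behind a generic failure.
    return { status: 500, errorName: err.name };
  }
}

// Same function with the inner re-declaration dropped.
function createResponseFixed(req) {
  const appConfig = req.config;
  try {
    const allowed = appConfig?.endpoints?.agents?.allowedProviders;
    return { status: 200, allowed };
  } catch (err) {
    return { status: 500, errorName: err.name };
  }
}

const req = { config: { endpoints: { agents: { allowedProviders: ['openai'] } } } };
console.log(createResponseBroken(req)); // status 500, errorName 'ReferenceError'
console.log(createResponseFixed(req)); // status 200, allowed ['openai']
```

Because the error is caught and turned into a generic response, code after the throw never runs, which matches the symptom of the usage-recording call being skipped.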
Danny Avila	91cd3f7b7c	🧽 refactor: Skills polish: precedence-aware body validation, controller drop logs, SkillPills rename (#12760 ) Post-merge sanity-review cleanup on top of #12746: - `createSkill` / `updateSkill` now parse SKILL.md body's always-apply status once and reuse it for both validation and derivation (was parsing the same YAML block twice per call). - Body-inline `always-apply:` validation becomes precedence-aware: a caller sending an explicit top-level `alwaysApply` or a structured `frontmatter['always-apply']` no longer gets rejected for a typo in the body — the body value is never consulted at derivation time when a higher-precedence source wins. New tests cover the three relevant interactions (explicit+body-typo, frontmatter+body-typo, body-only typo still rejects). - OpenAI and Responses controllers now emit a `logger.warn` when `injectSkillPrimes` drops always-apply primes to stay under `MAX_PRIMED_SKILLS_PER_TURN`. `injectSkillPrimes` already logs internally; the controller-level warn adds endpoint context so operators can identify which path hit the cap at a glance. Mirrors AgentClient's existing log. - Rename `ManualSkillPills` → `SkillPills` (component + type + file + test + all JSDoc references). The component handles both manual and always-apply pills now; the original name was carried over from the manual-only Phase 3 and misleads new readers. - Drive-by fix: declare `appConfig = req.config` at the top of `createResponse` in `responses.js` — it was used unqualified on lines 381/396, which silently evaluated to `undefined` (via optional chaining) and disabled the skills-capability check on the Responses endpoint. Pre-existing, surfaced by lint on the touched file.	2026-04-25 04:02:01 -04:00
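The precedence-aware validation described in the Skills polish commit above can be illustrated with a small sketch. It is not the `createSkill` / `updateSkill` implementation: the helper names, the regular expression, the `false` default, and the exact ordering between the explicit field and the structured frontmatter are assumptions for the example. What it demonstrates is the stated behavior: the body is parsed once, and a typo in the body is rejected only when no higher-precedence source decides the value.

```javascript
// Hypothetical parser for an inline `always-apply:` line in a SKILL.md body.
function parseBodyAlwaysApply(body) {
  const match = /^always-apply:[ \t]*(.*)$/m.exec(body);
  if (!match) return { present: false };
  const raw = match[1].trim();
  if (raw === 'true') return { present: true, valid: true, value: true };
  if (raw === 'false') return { present: true, valid: true, value: false };
  return { present: true, valid: false, raw };
}

// Assumed precedence: explicit field, then structured frontmatter, then body.
function deriveAlwaysApply({ alwaysApply, frontmatter = {}, body = '' }) {
  const parsed = parseBodyAlwaysApply(body); // parsed once, reused for both steps
  if (typeof alwaysApply === 'boolean') return alwaysApply;
  if (typeof frontmatter['always-apply'] === 'boolean') return frontmatter['always-apply'];
  // Only now does the body value matter, so only now is a typo an error.
  if (parsed.present && !parsed.valid) {
    throw new Error(`Invalid always-apply value in body: "${parsed.raw}"`);
  }
  return parsed.present ? parsed.value : false;
}

const typoBody = '---\nalways-apply: treu\n---\nSkill text';
console.log(deriveAlwaysApply({ alwaysApply: true, body: typoBody }));
console.log(deriveAlwaysApply({ frontmatter: { 'always-apply': false }, body: typoBody }));
try {
  deriveAlwaysApply({ body: typoBody });
} catch (err) {
  console.log(err.message);
}
```

The three calls correspond to the three interactions the commit says its new tests cover: explicit plus body typo, frontmatter plus body typo, and a body-only typo that still rejects.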
Danny Avila	7581540ab6	🔌 refactor: Decouple bash_tool from Per-User CODE_API_KEY (#12712 ) * 🔌 refactor: Decouple bash_tool from Per-User CODE_API_KEY Phase 4 of Agent Skills umbrella (#12625): gate bash_tool and skill file priming on the `execute_code` capability only. Thread a boolean `codeEnvAvailable` through `enrichWithSkillConfigurable` and `primeInvokedSkills` in place of the old per-user `codeApiKey` + `loadAuthValues` plumbing. The sandbox API key is the LibreChat-hosted service key — system-level, not a user secret — so the per-user lookup was legacy; when needed, it's read directly from `process.env[EnvVar.CODE_API_KEY]` inside the capability gate. `handleSkillToolCall` and `primeInvokedSkills` gate sandbox uploads on `codeEnvAvailable` first, preventing skill-file uploads to the sandbox when an agent has `execute_code` disabled even if the env var happens to be set. The agents library resolves the env key itself for `bash_tool`, so `ToolService.js` drops the `loadAuthValues` lookup and the "Code execution is not available" placeholder tool in favor of a plain `createBashExecutionTool({})` with a loud error log if the env var is missing. Also fixes a pre-existing `appConfig`-undefined lint error in `responses.js`/`createResponse` that surfaced when this file was touched (declares `const appConfig = req.config` at function top, matching the existing pattern in other controllers). Preserves the `skillPrimedIdsByName` threading added by Phase 3/5/6 and all Phase 3/5/6 call-site signatures. Adds `skillConfigurable.spec.ts` (5 cases pinning the new surface) and `skillFiles.spec.ts` (4-way matrix of capability × env key for `primeInvokedSkills`). * 🧪 refactor: Address Codex Review Feedback Resolves findings from the second codex review on #12712: - MAJOR: `handlers.spec.ts` now covers the `codeEnvAvailable` gate in `handleSkillToolCall` across three cases (gate off, gate on + env set, gate on + env unset). 
The gate is the critical regression prevention — a future edit that drops it would silently re-enable sandbox uploads for agents with `execute_code` disabled. - MINOR: Hoist `codeEnvAvailable` and `skillPrimedIdsByName` out of `loadTools` closures in `openai.js` and `responses.js`. Both values are fixed once `initializeAgent` resolves, so recomputing them on every tool execution was wasted work. `responses.js` shares a single pair between its streaming and non-streaming branches. - MINOR: `skillFiles.spec.ts` now has a test that exercises the full upload path end-to-end with real file records, asserting `batchUploadCodeEnvFiles` is called with the env-sourced apiKey and the correct file set (including the synthetic `SKILL.md`). - NIT: Finish the `appConfig` extraction in `responses.js/createResponse` — replaces the remaining `req.config` references with `appConfig` for consistency with the pattern in other controllers. No behavioral changes beyond what was already in place; this is coverage and readability polish. * 🧷 test: Tighten Spec Hygiene Per Codex Nit Feedback Round-3 codex review flagged two NITs on the test code added in the previous commit: - Replace `_id: 'skill-id' as unknown as never` in the new `makeSkillHandlerWithFiles` helper with a real `Types.ObjectId`, matching the pattern used by the primed-skill tests further up in the same file (and by `skillFiles.spec.ts`). The `never` cast hides the fact that `_id` really is a string / ObjectId at runtime. - Replace the ad-hoc `{ on, pipe, read }` stub with a real `Readable.from(Buffer.from(''))` in the upload-path test. The stub worked only because `batchUploadCodeEnvFiles` is mocked and never iterates the stream; `Readable.from` satisfies the same contract and is robust to any future partial-real replacement of the upload function. Pure test-hygiene improvements; no runtime code touched. 
* 🧹 chore: Remove Duplicate appConfig Declaration After Rebase The upstream `2463b6acd` fix declared `const appConfig = req.config` inside the try block (line 381) to patch the same `no-undef` error I fixed in this PR at the top of `createResponse` (line 283). The rebase kept both declarations side-by-side. Drops the inner one — the outer binding covers every downstream reference already.	2026-04-25 04:02:01 -04:00


249 commits