Commit graph

129 commits

Author SHA1 Message Date
Danny Avila
dc42748813
🧷 fix: Bind Agent File Context to Current Turn (#13506)
* fix: Bind agent file context to current turn

* fix: Avoid duplicating agent file context

* fix: Export agent file context prepender

* test: Use exported file context prepender

* fix: Keep file context transient for memory and counts
2026-06-04 09:03:43 -04:00
Danny Avila
baa23a8e24
🗂️ feat: Add Private Chat Projects (#13467)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
Sync Helm Chart Tags / Ignore non-main push (push) Waiting to run
Sync Helm Chart Tags / Sync chart tags (push) Waiting to run
* feat: Add private chat projects

* fix: Format project files

* fix: Address project review findings

* fix: Resolve project review follow-ups

* fix: Handle project stats and cache edge cases

* style: align projects UI with sidebar patterns

* fix: resolve projects UI lint issues

* style: Align project menus and composer

* fix: Avoid project placeholder shadowing

* fix: Handle project search and stale ids

* fix: Polish project sidebar behavior

* fix: Preserve new chat stream after creation

* fix: Stabilize project sidebar sections

* fix: Smooth project sidebar organization

* fix: stabilize project chat entry

* fix: keep project workspace outside chat context

* fix: show default model on project workspace

* fix: fallback project workspace model label

* fix: preserve project scope during draft hydration

* fix: include route project in new chat submission

* fix: persist project id in agent chat saves

* fix: refine project sidebar and creation UX

* fix: export chat project method types

* fix: polish project landing context

* fix: refine project navigation affordances

* feat: rework projects UX — coexisting sidebar sections + URL-driven scope

Sidebar
- Replace the chronological/by-project mode toggle with coexisting
  Projects + Chats sections (both always visible)
- Remove ProjectConversations (927 lines), the org-mode Header, and types
- Add ProjectsSection: collapsible project rows that unfurl chats inline
  (full-size rows), with per-project new chat and an open/rename/delete menu
- Lift the marketplace/favorites shortcuts above the Projects section

Chat scope
- Derive a new chat's project strictly from the URL ?projectId, so the
  global New Chat no longer stays stuck in a project after a project chat

Surfaces
- Chat landing: subtle, clickable project chip instead of the floating badge
- Project workspace: modest header, composer-style entry, chats list
- All-projects grid: Claude-style cards with pluralized chat counts

* chore: prune unused i18n keys; fix project chat-count pluralization

* fix: project new-chat keeps model spec; sidebar header + row polish

- newConversation: ignore a chatProjectId-only template when deciding to
  apply the default model spec, so starting a chat in a project no longer
  strips the conversation `spec`
- useSelectMention: the Model Selector and @ command now retain the active
  project across endpoint/spec/preset switches; other new-chat paths still
  clear it
- Chats header now matches the Projects header (inline chevron + a new-chat
  icon button) and starts a non-project chat
- Project rows: use the new-chat icon for the per-project add button, render
  at text-sm to match the chat list, and align the row actions + hover color
  with conversation rows

* fix: read project scope from router params; align sidebar header icons

- useSelectMention now reads the active project from React Router's search
  params instead of window.location, which can drift out of sync because
  new-chat params are written to the URL via raw history.pushState; the
  Model Selector and @ command now reliably keep the project on switch
- Move the Chats section header out of the virtualized list so it renders
  in the same context as the Projects header and isn't shifted by the
  list scrollbar
- Inset header action icons (pr-2) so Projects/Chats header icons line up
  with the project-row and conversation-row trailing actions
- Extract getRouteChatProjectId into utils for the submit path

* fix: preserve chatProjectId through the new-chat template reduction

The param-endpoint guard in newConversation reduced a new chat's template to
{ endpoint } only, dropping the chatProjectId injected by the Model Selector /
@ switch — so switching models cleared the project scope. Keep chatProjectId
in the reduced template.

* style: align chat-history panel top padding; improve projects page contrast

- Add pt-2 to the chat-history panel so its top spacing matches the other
  side panels (agent builder, skills, files, etc.)
- Projects grid + workspace now use the darkest surface for the page
  (surface-primary) with cards, inputs, and the composer one step lighter
  (surface-secondary) and tertiary on hover, so cards read as elevated
  rather than darker than the background

* feat: interactive project landing chip + gallery icon for all-projects

- All-projects sidebar button uses the gallery-vertical-end icon
- The project landing chip is now interactive: click it to switch projects
  via a searchable combobox (ControlCombobox), or the trailing × to drop the
  project scope. Both update the draft conversation and the ?projectId search
  param in place, so the typed message and selected model are preserved

* test: fix Conversations unit test for refactored sidebar; add projects e2e

- Update Conversations.test.tsx mocks for the inline Chats header
  (useNewConvo, useQueryClient, conversation atom, NewChatIcon, TooltipAnchor),
  drop the removed chatsHeaderControls prop, and remove the mock for the
  deleted ../Header module — fixes the failing frontend Jest job
- Add e2e/specs/mock/projects.spec.ts covering project creation, the
  project-scoped new-chat landing + interactive chip (switch/remove), and
  listing projects on /projects
- Give the landing chip combobox a stable selectId for reliable targeting

* fix: refresh project stats after project-chat activity; stabilize e2e

- useEventHandlers: when a project chat is created/updated, invalidate the
  live [projects] query (gated on chatProjectId) instead of the now-unused
  projectConversations key, so the sidebar + all-projects stats refresh
  after a streamed reply (addresses a Codex finding)
- projects e2e: assert the reliable project-landing behavior (chip, scoped
  composer, accepted send) rather than the /c/:id transition, which the
  mock LLM harness doesn't complete

* test: verify a project chat saves and is filed under its project (e2e)

- Switch to a mock endpoint before sending so the message streams without a
  real API key (the default model failed with "No key found", so no chat was
  saved and the page never left /c/new); this also asserts the project chip
  survives the model switch
- Restore the reply + /c/:id transition assertions and add a check that the
  chat is listed under the expanded project in the sidebar
- Add data-testid="project-chats-<id>" to the inline project chat list

* fix: address Codex review findings (project scope edge cases)

- useSelectMention: fall back to the conversation's chatProjectId when the
  URL has no projectId, so switching model/spec inside an existing project
  chat (/c/:id) keeps the project assignment
- Conversations: include chatProjectId in the MemoizedConvo comparator so a
  sidebar row's project menu doesn't stay stale after a reassignment
- useDeleteProjectMutation: clear the active conversation's chatProjectId
  when its project is deleted (mirrors the assignment mutation); drop the
  now-dead projectConversations invalidation
- useQueryParams: carry the project into the new conversation when applying
  URL settings, so /c/new?projectId=...&<settings> stays scoped

* fix: project stats pagination + archived-chat edge cases (data-schemas)

- listChatProjects: include the null lastConversationAt bucket in the desc
  cursor so empty projects paginate (a $lt:<date> predicate excluded nulls,
  hiding chat-less projects from "Load more")
- saveConvo: recompute project stats instead of the incremental fast path
  when the saved conversation is itself archived/temporary/expired, so a
  project's lastConversationAt/Id no longer points at a hidden chat

* test: cover chat-less project pagination across the dated→null boundary

* fix: validate project ownership in bulkSaveConvos

Bulk paths (import/duplicate/fork) persisted whatever chatProjectId the
payload carried; an id that does not belong to the user created an orphan
assignment hidden from both the project and the unassigned sidebar. Validate
ownership like saveConvo and strip un-owned project ids before persisting,
refreshing stats only for owned projects.

* fix(projects): preserve chatProjectId on continuation, basename-safe delete redirect, project-detail invalidation

* fix(projects): navigate project workspace chats via useNavigateToConvo to avoid stale conversation state

* fix(projects): include projectConversations cache when resolving deleted chat's project for detail invalidation

* fix(projects): refresh both projects when a save or bulk write moves a chat between them

* style(projects): use Folders icon for the sidebar Projects header

* fix(projects): require id on ProjectUser so ProjectRequest extends Express Request cleanly

* style(projects): taller project chip with hover-revealed remove button, upward combobox; sort en translations

* style(projects): show endpoint/agent icon for project workspace chat rows
2026-06-03 15:29:18 -04:00
Danny Avila
2ef7bdfbc2
feat: Immediate Conversation Title Generation (#13395)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
Sync Helm Chart Tags / Ignore non-main push (push) Waiting to run
Sync Helm Chart Tags / Sync chart tags (push) Waiting to run
*  feat: Immediate Conversation Title Generation

Generate conversation titles as soon as the request is made (in parallel
with the response, from the user's first message) as the new default,
fixing the #13318 race where a transient /gen_title 404 left new chats
stuck on "New Chat".

- Add per-endpoint `titleTiming` ('immediate' | 'final') to baseEndpointSchema;
  `endpoints.all` acts as the global default, unset = immediate. Resolve via
  a new `resolveTitleTiming` helper (`all` takes precedence).
- Fire title generation in parallel with `sendMessage`; `titleConvo` waits
  (bounded, abortable) for the agent run and titles from the user input only.
  Persist after the conversation row exists; defer `disposeClient` until the
  title settles.
- Expose `titleGenerationTiming` via startup config; `useTitleGeneration`
  fetches eagerly in immediate mode with a bounded 404 retry and never treats
  a transient 404 as final. Skip title queueing for temporary conversations.
- Supersedes #13329 while incorporating its bounded 404-retry.

* 🩹 fix: Address Copilot review findings on title timing

- Guard against an undefined conversationId in addTitle (skip + warn) so the
  gen_title cache key can't collide as `userId-undefined` and saveConvo is
  never called without a conversationId.
- Gate the title `useQueries` on `enabled` so no /gen_title request fires while
  unauthenticated (e.g. after logout) even if the module queue holds IDs.
- Drop the stale `conversationId` param from the titleConvo JSDoc.
- Add a regression test for the undefined-conversationId guard.

* 🧵 fix: Harden immediate-title edge cases from codex review

- Cancel in-flight immediate title generation when the request aborts: thread
  job.abortController.signal through addTitle so pressing Stop on a new chat
  neither consumes the title model nor surfaces a title for a cancelled turn.
- Preserve a locally-applied title when the final SSE event's conversation
  carries no title yet (built before the title was saved), so long immediate-mode
  responses no longer revert the chat to "New Chat" until reload.
- Guarantee one full post-completion gen_title fetch cycle before giving up, so a
  `final`-mode title (generated only after the stream ends) is still fetched under
  a global `immediate` default instead of being stranded.
- Add regression tests for the abort propagation and the undefined-conversationId guard.

* 🔁 fix: Correct title abort, post-completion refetch, and replacement ordering

Follow-up to codex review of the immediate-title fixes:

- Use a dedicated title AbortController instead of `job.abortController`. The
  latter is also aborted by `completeJob` on *successful* completion, which
  cancelled any title slower than a short response. The title is now cancelled
  only on a real user Stop or when the stream is replaced; a completed-then-
  aborted title is discarded (no save, cache cleared) rather than persisted.
- Reset (not remove) the post-completion title query: `resetQueries` refetches
  the mounted observer with a fresh retry budget, whereas `removeQueries` left it
  stuck in its error state, so the promised post-completion cycle never ran.
- Run the job-replacement check before resolving `convoReady`, and on a replaced
  stream cancel/discard the stale title so a discarded prompt can't persist a title.

* 🧷 fix: Tighten title abort ordering and endpoint-level timing resolution

Follow-up to codex review:

- Abort the title controller before resolving `convoReady` on a stopped turn, so
  the title task can't resume and persist before the later abort.
- Cancel the title and unblock its waits on ANY send failure (not just user
  aborts): a preflight/quota failure before the run exists otherwise hangs
  `_waitForRun`, deferring client disposal until the 45s title timeout.
- Resolve `titleTiming` for custom endpoints via `getCustomEndpointConfig`
  (their config lives under `endpoints.custom[]`, not `endpoints[endpoint]`).
- Derive the startup `titleGenerationTiming` via `resolveTitleTiming` for the
  agents endpoint so an endpoint-level `final` (without `endpoints.all`) is honored
  client-side instead of defaulting to immediate and burning eager gen_title polls.

* 🪢 fix: Per-agent title timing and safer abort/replacement handling

Follow-up to codex review:

- Resolve `titleTiming` from the agent's actual endpoint after initialization, so a
  per-endpoint `final` override on a custom/provider endpoint backing an (ephemeral)
  agent is honored instead of always using the `agents` endpoint's value.
- Don't preserve a locally-fetched title on a stopped (unfinished) turn: the server
  cancels and discards that title, so keeping it client-side would diverge from
  server state and leave the stopped chat titled until reload.
- On abort/replacement, only delete the cached title if it still holds THIS task's
  value — a replacement stream shares the `userId-conversationId` key and may have
  already cached its own valid title that must not be removed.

* 🪞 fix: Mirror AgentClient title-config resolution for titleTiming

Per maintainer guidance, keep titleTiming resolution identical to how
`AgentClient#titleConvo` already resolves the endpoint config — `endpoints.all`
is the intended global override and the agent's actual provider endpoint is used:

- Resolve via `endpoints.all ?? endpoints[endpoint] ?? getProviderConfig(endpoint)
  .customEndpointConfig` (was using `getCustomEndpointConfig` directly). Going
  through `getProviderConfig` picks up its case-insensitive fallback for normalized
  provider names (e.g. `openrouter` → `OpenRouter`), so a custom endpoint's
  `titleTiming` is honored like its other title settings.
- Add `titleTiming` to the Azure endpoint schema `.pick()` so
  `endpoints.azureOpenAI.titleTiming` is no longer silently stripped by Zod.

Note: per-endpoint title settings being skipped when `endpoints.all` is present is
the existing, intended global-override behavior — not changed here.

* 🧪 test: Cover useTitleGeneration effect logic (integration)

Adds a deterministic white-box integration test that drives the real hook's
React effects with a controllable react-query surface, locking down the
stateful decisions that previously had no coverage:

- immediate mode fetches a queued conversation while its stream is still active
- final mode gates until the stream completes, then becomes eligible
- success applies the fetched title to the conversation caches
- a 404 while active defers (removeQueries) instead of giving up
- a 404 after completion forces a fresh fetch via resetQueries (post-completion remount)

* feat: Stream immediate title events

* style: Format title SSE handler

* test: Preserve data-provider exports in OAuth mock

* test: Isolate OAuth route API mock

* test: Keep OAuth callback factory capture

* fix: Replay streamed title events on resume

* fix: Honor agents title timing precedence

* style: Format title timing fixes
2026-06-02 16:40:57 -04:00
Danny Avila
6d9c01927d
🧠 refactor: Replay DeepSeek reasoning_content via OpenRouter (#13368)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
* 🧠 fix: Replay DeepSeek `reasoning_content` via OpenRouter

DeepSeek's thinking-mode API rejects multi-turn tool-calling requests
unless `reasoning_content` from each tool-bearing assistant message is
replayed verbatim, returning HTTP 400 "The `reasoning_content` in the
thinking mode must be passed back to the API." The agents SDK already
handles this for direct `Providers.DEEPSEEK`, but DeepSeek models routed
via OpenRouter use `Providers.OPENROUTER` — `formatAgentMessages` skipped
the reasoning-preservation branch, and `ChatOpenRouter` left
`includeReasoningContent` unset, so the field silently dropped on every
subsequent turn.

Add `isDeepSeekReasoningProvider(provider, model)` and use it in two
places: (1) `getOpenAILLMConfig` flips `includeReasoningContent: true`
when OpenRouter is dispatching a `deepseek/*` model so the LangChain
client emits the field on assistant turns that have non-empty
`additional_kwargs.reasoning_content`, and (2) `AgentClient` spoofs the
provider hint to `Providers.DEEPSEEK` when calling
`formatAgentMessages`, triggering the SDK's existing
`preserveReasoningContent` path that re-attaches the field to
reconstructed tool-bearing AIMessages. The downstream
`_convertMessagesToOpenAIParams` is already gated on non-empty
`reasoning_content`, so the flag is a no-op outside thinking mode.

Resolves #13366.

* fix: Harden DeepSeek detection against OpenRouter routing edges

Address three Codex review findings on #13368:

1. Strip OpenRouter's `~` latest-routing prefix before applying the
   DeepSeek model regex. `~deepseek-chat` and `~deepseek/r1` were
   previously left unmatched because the regex's start/`/` boundary
   only saw the `~`. Mirror the SDK's `normalizeOpenRouterModel()`
   here and in `getOpenAILLMConfig`.

2. Add a custom-endpoint fallback: when the model id carries the
   unambiguous `deepseek/...` OpenRouter namespace, accept it
   regardless of the resolved provider. Covers the case where a user
   configures OpenRouter under a non-standard endpoint name and
   `initializeAgent` normalizes the unknown provider to `openai`,
   stranding the spoof. Bare `deepseek-*` ids still require an
   explicit DeepSeek/OpenRouter provider so unrelated endpoints
   labelling a model `deepseek-r1` don't trigger.

3. Inspect every agent in `this.agentConfigs` when deciding whether
   to spoof the format provider. Multi-agent handoff runs feed all
   agents' messages through one `formatAgentMessages` call, so a
   DeepSeek handoff under a non-DeepSeek primary previously lost its
   persisted reasoning_content too.

Also addresses Copilot's review note: only pass the options object
to `formatAgentMessages` when the DeepSeek spoof is actually needed,
preserving the pre-fix behavior for everyone else.

* fix: Extend DeepSeek reasoning_content fix to OpenAI-compat agent paths

Address two more Codex P2 findings on #13368:

1. `getOpenAILLMConfig` no longer gates `includeReasoningContent` on
   `useOpenRouter`. Any DeepSeek-style model id (with `~` latest-routing
   prefix stripped) is sufficient. This re-aligns the LLM gate with
   `AgentClient`'s formatter spoof, which already treats a `deepseek/*`
   id as authoritative — so a custom-named OpenRouter endpoint or a
   DeepSeek-compatible proxy gets the field both attached to history AND
   serialized to the wire. Direct `ChatDeepSeek` ignores the flag (its
   own conversion path hardcodes `includeReasoningContent: true`), so
   this is a harmless no-op there.

2. Thread the same `Providers.DEEPSEEK` formatter hint through
   `api/server/controllers/agents/openai.js` and `responses.js` (the
   OpenAI-/Responses-compatible serving paths). Without it those paths
   restored `additional_kwargs.reasoning_content` only in `AgentClient`
   while the LLM config flipped `includeReasoningContent` on for them
   too — so DeepSeek tool turns served from those endpoints would still
   ship requests with the flag set but no field present, hitting the
   same second-turn 400. The `needsDeepSeekFormatHint` helper in
   `openai.js` mirrors `AgentClient`'s per-agent check.

* fix: Tighten DeepSeek detection and cover handoff sub-agents

Address four more Codex P2 findings on #13368:

- Tighten the DeepSeek model regex to `^deepseek(?:[-/]|$)/i` (anchored
  to start). Rejects cloned/distilled slugs like
  `mistral/deepseek-distilled-foo` and `community/deepseek-r1` that
  previously matched via the `(?:^|/)` alternation, which could attach
  the DeepSeek-only `reasoning_content` field on proxies that don't
  accept it.

- Anchoring also collapses the namespace-only fallback into the same
  pattern, so bare `deepseek-chat` / `deepseek-reasoner` on a
  custom OpenAI-compatible DeepSeek proxy are now recognized — fixing
  the asymmetry where `getOpenAILLMConfig` would flip
  `includeReasoningContent` for those bare ids but `AgentClient`
  wouldn't pass the formatter hint.

- Extend `needsDeepSeekFormatHint` in `openai.js` (and the inline
  check in `responses.js`) to walk `handoffAgentConfigs` too. In
  multi-agent runs where the primary isn't DeepSeek but a connected
  handoff agent is, the SDK's `formatAgentMessages` previously dropped
  the handoff's persisted reasoning_content before the next tool turn,
  preserving the 400 the PR was meant to prevent.

- Mirror the regex change in `getOpenAILLMConfig`.

Out of scope: the OpenAI-compatible serving paths still don't
preserve incoming `reasoning_content`/`reasoning` fields in
`convertMessages`, nor does the Responses API persist reasoning in
`saveResponseOutput`. Those are deeper persistence/conversion fixes
worth a separate PR.

* test: Allow includeReasoningContent for Azure-serverless DeepSeek

CI surfaced a backward-compat expectation that snapshotted the
pre-fix behavior. Azure-serverless DeepSeek deployments (e.g.
`DeepSeek-R1`) forward to the same DeepSeek thinking-mode tool-call
contract, so the LLM gate now correctly flips
`includeReasoningContent: true` for them too. The downstream
gate on a non-empty `additional_kwargs.reasoning_content` keeps
this a no-op outside thinking mode.

* chore: Trim noisy comments

Per CLAUDE.md ("self-documenting code; no inline comments narrating
what code does"), strip the multi-paragraph rationale that crept into
the DeepSeek reasoning_content fix. The commit history and PR
description carry the why; the code says the what.

Keeps one single-line JSDoc on `isDeepSeekReasoningProvider` (linking
to the DeepSeek docs) and a `(#13366)` tag on each opt-in site so
future readers can find the context.

* revert: Drop non-functional DeepSeek hint from OpenAI-compat serving paths

Codex's later review passes correctly flagged that threading the
DeepSeek formatter hint through openai.js (`/v1/chat/completions`) and
responses.js (`/v1/responses`) doesn't actually fix the second-turn
400 in those paths. Empirical check against the real SDK confirmed the
gap is deeper and pre-existing:

  formatAgentMessages(payload, ..., { provider: DEEPSEEK })

where payload is the `convertMessages`/`convertInputToMessages` output
shape (string content + TOP-LEVEL `tool_calls`) produces NO tool-bearing
AIMessage at all — `formatAssistantMessage` only reconstructs tool calls
from `tool_call`-typed *content parts*, never a top-level `tool_calls`
field. So those serving paths don't reconstruct tool-call history (let
alone reasoning) regardless of the hint. The Responses persistence layer
likewise stores only output text, not tool calls or reasoning.

Making those paths work requires reworking the wire->internal message
conversion (and Responses persistence) to emit content-part arrays — a
broad, pre-existing concern beyond this issue and risky to land here.
Rather than ship a hint that looks like a fix but is inert, revert the
serving-path changes and scope this PR to the validated AgentClient
chat path (the actual surface in #13366).

Reverts the openai.js/responses.js threading and their spec mocks to
main. Keeps the AgentClient fix, `isDeepSeekReasoningProvider`, the
`getOpenAILLMConfig` flag, and the type.
2026-05-28 22:10:49 -07:00
Danny Avila
68eac104ad
🗂️ fix: Scope Handoff Agent Context Docs (#13167)
* fix: Scope agent context docs to handoff agents

* fix: Deduplicate scoped request context

* refactor: Extract agent attachment helpers
2026-05-18 15:36:22 -04:00
Danny Avila
d90567204e
🛟 fix: persist Vertex Gemini 3 thoughtSignatures across DB round-trips (#13026)
When a tool round-trip is interrupted between the tool result and the
model's text reply (user aborted, network drop, pod restart, ...) and
LibreChat persists the partial assistant message, the next conversation
turn reconstructs an `AIMessage` from `formatAgentMessages` that has
`tool_calls` populated but no `additional_kwargs.signatures`. Vertex
Gemini 3 rejects the resumed request with 400 because the most recent
historical functionCall has no `thought_signature`.

## Storage shape

Capture as `Record<tool_call_id, signature>` rather than a flat array.
This addresses the codex P1 review:

  > When an assistant turn contains multiple sequential tool-call batches,
  > this restoration path writes all persisted thoughtSignatures onto only
  > the last tool-bearing AIMessage. Vertex/Gemini validates signatures
  > for each step in the current tool-calling turn, so earlier
  > functionCall steps reconstructed without their signature can still
  > fail with 400.

A single agent run can fire multiple `chat_model_end` events when the
loop cycles the LLM with intervening tool results — each cycle owns a
distinct `tool_call_id`. Per-id storage maps each signature back onto
the right reconstructed `AIMessage`, not just the last one.

## Mapping

`additional_kwargs.signatures` is a flat array indexed by *response part*
(text + functionCall interleaved). `tool_calls` is just the function
calls in their original order. Non-empty signatures correspond 1:1 with
tool_calls in order — see `partsToSignatures` in
`@langchain/google-common`. Single-pass walk maps `signatures[i]` (when
non-empty) onto the i-th `tool_call.id`.

## Pipeline

| Stage | File | Change |
|---|---|---|
| Capture | callbacks.js | `ModelEndHandler` accepts `Record<string,string>` map; walks signatures + tool_calls in tandem to record per-id. Gated on the map being provided — non-Vertex flows are no-op (and also no-op even when provided, since they don't emit signatures). |
| Plumbing | initialize.js | Allocate `collectedThoughtSignatures = {}`, share with handler + client. Always allocated; the JSDoc explicitly documents that it stays empty for non-Vertex providers. |
| Surface | client.js | `sendCompletion` returns `metadata.thoughtSignatures` when the map has entries; falls through unchanged when empty. |
| Persist | (existing BaseClient.handleRespCompletion) | Writes `metadata` from `sendCompletion` onto `responseMessage.metadata`. Mongoose `Mixed` — no migration. |
| Restore | formatMessages.js | Track every tool-bearing AIMessage produced from a TMessage. For each, build a position-aligned `additional_kwargs.signatures` array (empty placeholders for tool_calls without a stored sig). Agents' `fixThoughtSignatures` dispatches non-empty entries to functionCall parts in order. |

## Live verification

- **Single-step:** real Vertex `gemini-3.1-flash-lite-preview` resume-after-tool case. With fix  / without  400.
- **Multi-step (codex case):** real two-step agent loop (list /tmp → echo done). Each step's signature attaches to its own reconstructed AIMessage. With fix  / without  400.
- **Cross-provider:** Anthropic Claude haiku-4.5 + OpenAI gpt-5-mini accept the persisted/restored shape unchanged.

## Tests

`modelEndHandler.spec.js` (new) — 6 tests:
- maps non-empty signatures onto tool_call_ids in order
- accumulates per-id across multiple `model_end` events (multi-step)
- no-op when `collectedThoughtSignatures` is null
- no-op when `signatures` field missing (non-Vertex)
- no-op when `tool_calls` missing
- preserves existing `collectedUsage` array contract

`formatAgentMessages.spec.js` — 6 new tests:
- restores onto the AIMessage that owns the tool_call
- per-step attachment for multi-step turns (codex review case)
- preserves tool_call ordering when signatures are partial
- no-op when metadata.thoughtSignatures absent
- no-op when assistant has no tool_calls
- no-op when stored ids don't match any current tool_call

37 passing across 3 suites; 15 existing formatAgentMessages tests unchanged.

## Compatibility

- Backward-compatible — restore gated on `metadata.thoughtSignatures` being a populated object; capture gated on the map being provided.
- No schema migration — uses `Message.metadata: Mixed` already in place.
- Cross-provider safe — non-Vertex providers tolerate the field (verified live against Anthropic + OpenAI converters).
- Pairs with [agents#159](https://github.com/danny-avila/agents/pull/159) for full coverage on histories that mix plain-text and toolcall AIMessages.
2026-05-08 18:51:34 -04:00
Danny Avila
93c4ef4ba8
🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960)
* 🧱 refactor: typed CodeEnvRef + kind discriminator + tenant-aware sandbox cache

Final cutover for the LibreChat ↔ codeapi sandbox file identity. Replaces
the magic string `${session_id}/${file_id}?entity_id=...` with a typed,
discriminated `CodeEnvRef`. Pre-release lockstep deploy with codeapi
#1455 and agents #148; no legacy aliases retained.

## Final shape

```ts
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user';  id: string; storage_session_id: string; file_id: string };
```

`kind` drives codeapi's sessionKey: `<tenant>:<kind>:<id>[✌️<version>]`
for shared kinds, `<tenant>:user:<userId>` for user-private (auth context
provides `userId`). `version` is statically required for `kind: 'skill'`
and forbidden otherwise via discriminated union — constraint holds at
compile time on every consumer, not just codeapi's runtime validator.

`id` is sessionKey-meaningful for `'skill'` / `'agent'`; informational
only for `'user'` (codeapi resolves user identity from auth context).

## What changed

- `packages/data-provider/src/codeEnvRef.ts` — discriminated union +
  `CODE_ENV_KINDS` const-tuple keeps the runtime list and TS union
  locked together.
- Schemas: `metadata.codeEnvRef` and `SkillFile.codeEnvRef` enums
  tightened to `['skill', 'agent', 'user']`.
- `primeSkillFiles` writes `kind: 'skill'`, `id: skill._id`,
  `version: skill.version`. Cache-hit path reads `codeEnvRef`
  directly. Bumping `skill.version` on edit naturally invalidates
  the prior cache entry under the new sessionKey.
- `processCodeOutput` writes `kind: 'user'`, `id: req.user.id`. Output
  bucket is always user-scoped, regardless of which skill the
  execution invoked. New regression test pins the asymmetry.
- `primeFiles` reupload preserves `kind`/`id`/`version?` from the
  existing ref so a skill-cache-miss reupload doesn't silently demote
  to user bucket.
- `crud.js` upload functions (`uploadCodeEnvFile` /
  `batchUploadCodeEnvFiles`) thread `kind`/`id`/`version?` to the
  multipart form (codeapi #1455 option α). Without these on the wire,
  codeapi falls back to user bucketing and skill-cache invalidation
  never fires. Client-side validation mirrors codeapi's validator.
- `Files/process.js` — chat attachments use `kind: 'user'`; agent
  setup files use `kind: 'agent'`.
- Drops `entity_id` everywhere (struct, schema sub-docs, write paths,
  upload form fields). Drops `'system'` from the kind enum (no emitter
  ever existed).

## Test plan

- [x] `cd packages/data-provider && npx jest src/codeEnvRef.spec` — 4 / 4
- [x] `cd packages/data-schemas && npx jest` — 1447 / 1447
- [x] `cd packages/api && npx jest src/agents` — 81 / 81 in skillFiles +
  handlers + resources
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  436 / 436
- [x] `cd api && npx jest server/services/Files/Code` — 98 / 98 (incl.
  new "outputs are user-scoped regardless of which skill the execution
  invoked" regression and "reupload forwards kind/id/version from
  existing ref")
- [x] `npx tsc --noEmit -p packages/data-{provider,schemas}/tsconfig.json
  && npx tsc --noEmit -p packages/api/tsconfig.json` — clean (only
  pre-existing unrelated dev errors in storage/balance, untouched here)

## Deploy notes

- **24h cache-miss burst** on first deploy. Inputs (skill caches re-prime
  under new sessionKey shape) and outputs (any pre-Phase C skill-output
  cached files become unreadable). Bounded by codeapi's 24h TTL.
- **Lockstep with codeapi #1455 and agents #148.** Either repo can land
  first since no aliases to drain, but the three deploys must overlap
  within the same maintenance window.
- **`@librechat/agents` bump to `3.1.79-dev.0`** required after agents
  #148 lands and is published.

## What this enables

Auth bridge work (JWT-based tenant/user identity between LC and codeapi)
— codeapi now derives sessionKey purely from `req.codeApiAuthContext.{
tenantId, userId}`, so the next chapter is replacing the header-asserted
user identity with a verified-claim path.

* 🩹 fix: persist execute_code uploads under codeEnvRef metadata key

Codex review P1 (chatgpt-codex-connector). `Files/process.js` was
storing the upload result under `metadata.fileIdentifier` even though:
- `uploadCodeEnvFile` now returns `{ storage_session_id, file_id }`,
  not the legacy magic string.
- The post-cutover schema (`File.metadata.codeEnvRef`) only declares
  `codeEnvRef` — mongoose strict mode silently strips unknown keys.
- All readers (`primeFiles`, `getCodeFilesByIds`,
  `categorizeFileForToolResources`, controller filtering) check
  `metadata.codeEnvRef`.

Net effect of the bug: chat-attached and agent-setup execute_code files
would lose their sandbox reference on save, and primeFiles would skip
them on subsequent code-execution turns — the file blob would still be
available locally but never re-mounted in the sandbox.

Fix: construct the full `CodeEnvRef` (`{ kind, id, storage_session_id,
file_id }`) at the write site and persist under `metadata.codeEnvRef`.
`BaseClient`'s "is this a code-env file" presence check accepts the new
shape alongside the legacy `fileIdentifier` for back-compat with any
pre-cutover records still in the database. Mirrors the same change in
`processAttachments.spec.ts` (which re-implements the BaseClient logic
for testability).

New regression tests in `process.spec.js` cover three cases:
- chat attachments (`messageAttachment=true`) → `kind: 'user'`
- agent setup (`messageAttachment=false`) → `kind: 'agent'`
- legacy `fileIdentifier` key is NOT persisted (would be schema-stripped)

* 🩹 fix: read storage_session_id on primed file refs (Codex P1)

Codex review (chatgpt-codex-connector). After Phase B's per-file
`session_id` → `storage_session_id` rename, `primeFiles` emits the
new field — but `seedCodeFilesIntoSessions` was still reading
`files[0].session_id` for the representative session and `f.session_id`
for the dedupe key. In runs with only primed attachments (no skill
seed), `representativeSessionId` was `undefined`, the function
returned the unchanged map, and `seedCodeFilesIntoSessions` silently
dropped the entire batch. The first `execute_code` call then started
without `_injected_files` and the agent couldn't see prior-turn
artifacts.

Fix:
- `codeFilesSession.ts`: read `f.storage_session_id` for both the
  dedupe key and the representative session id. JSDoc updated to
  match the new field name.
- `callbacks.js`: the two output-file persistence paths read
  `file.session_id` to pass to `processCodeOutput` — switch to
  `file.storage_session_id`. The original comment explicitly says
  this should be the STORAGE session, which is exactly the field
  Phase B renamed.
- `codeFilesSession.spec.ts`: fixture builder uses `storage_session_id`
  and `kind: 'user'` to match the post-cutover `CodeEnvFile` shape.

Lockstep coordination: this matches the post-bump shape of
`@librechat/agents` 3.1.79+. CI tsc errors against the currently-pinned
3.1.78 are expected and resolve when the dep bumps in this PR before
merge.

* 📦 chore: Bump `@librechat/agents` to version 3.1.80-dev.0 in package-lock and package.json files

* 🪪 fix: thread kind/id/version through codeapi /download URLs (Phase C α)

Symmetric fix for the upload-side wire change in 537725a. Codeapi's
`sessionAuth` middleware now requires `kind`/`id`/`version?` on every
download/freshness URL — without them it 400s with "kind must be one
of: skill, agent, user" before serving the file.

Three sites construct codeapi-side URLs that go through `sessionAuth`:

- `processCodeOutput` (`Files/Code/process.js`): `/download/<sess>/<id>`
  for freshly-generated sandbox outputs. Always `kind: 'user'` +
  `id: req.user.id` — code-output files are always user-private,
  regardless of which skill the run invoked.
- `getSessionInfo` (`Files/Code/process.js`): `/sessions/<sess>/objects/<id>`
  for the 23h freshness check. Pulls kind/id/version straight off the
  `codeEnvRef` already in scope — skill files stay skill-bucketed,
  user files stay user-bucketed.
- `/code/download/:session_id/:fileId` LC route (`routes/files/files.js`):
  proxies to codeapi for manual downloads. Code-output files only on
  this route, so `kind: 'user'` + `id: req.user.id`.

The `getCodeOutputDownloadStream` helper in `crud.js` now takes an
`identity` param, validated by a `buildCodeEnvDownloadQuery` helper
that mirrors `appendCodeEnvFileIdentity`'s shape rules: kind required
from the closed `{skill, agent, user}` set, version required for
'skill' and forbidden otherwise. Bad callers fail fast on the client
instead of round-tripping a 400.

Also cleans up two log-noise sources reported alongside the 400:

- `logAxiosError` in `packages/api/src/utils/axios.ts` was dumping
  `error.response.data` raw. With `responseType: 'arraybuffer'` that's
  a `Buffer` (~4 chars per byte after JSON-serialization); with
  `responseType: 'stream'` it's a `Readable` whose internal state
  serializes the entire ring buffer + socket. New `renderResponseData`
  decodes small buffers as UTF-8 (truncated past 2KB) and stubs streams
  as `'[stream]'`. Diagnostics stay useful, log lines stop being
  megabytes.
- `/code/download` route's catch was bare `logger.error('...', error)`,
  bypassing the redactor. Switched to `logAxiosError` so it benefits
  from the same buffer/stream handling.

Tests updated to match the new contract:
- crud.spec: `getCodeOutputDownloadStream` fixtures pass `userIdentity`;
  new cases cover skill identity (with version), bad kind rejection,
  skill-without-version rejection.
- process.spec: `getSessionInfo` test passes a full `codeEnvRef` object.

* ♻️ refactor: extract codeEnv identity helpers into packages/api

Per the project convention that new backend code lives in TypeScript
under `packages/api`, moves `appendCodeEnvFileIdentity` and
`buildCodeEnvDownloadQuery` from `api/server/services/Files/Code/crud.js`
into a new `packages/api/src/files/code/identity.ts` module.

Both helpers are pure validators that mirror codeapi's
`parseUploadSessionKeyInput` server-side rules (closed kind set,
`version` required for `'skill'` and forbidden otherwise) — they
deserve TS support and a dedicated spec rather than living as
JSDoc-typed helpers in the legacy `/api` workspace. The new module:

- Exports a `CodeEnvIdentity` interface using the
  `librechat-data-provider` `CodeEnvKind` discriminated union.
- Adds 13 unit tests in `identity.spec.ts` covering the validation
  matrix (skill+version, agent, user, and every rejection path) plus
  URL encoding for the download query.
- Re-exported from `packages/api/src/files/code/index.ts` alongside
  `classify`, `extract`, and `form`.

Consumer updates:
- `api/server/services/Files/Code/crud.js`: drops the local helpers
  and imports them from `@librechat/api`. Net -64 lines.
- `api/server/services/Files/Code/process.js`: same.
- Test mocks for `@librechat/api` in three spec files now stub the
  helpers' validation behavior locally rather than pulling them
  through `requireActual` (which would drag in provider-config
  init-time side effects). The package's `exports` field only
  surfaces the root barrel, so leaf imports aren't reachable from
  legacy `/api` test setup.

No runtime behavior change. Identity validation rules and emitted
form/query shapes are byte-for-byte identical pre/post.

* 🪪 fix: emit resource_id alongside id on _injected_files (skill 403 fix)

Companion to codeapi #1455 fix and agents 3.1.80-dev.1 — the wire
shape for shared-kind files now requires `resource_id` distinct from
the storage `id`. Without this LC change, codeapi's sessionKey
re-derivation on every shared-kind /exec rejects with 403
session_key_mismatch:

    cached:  legacy:skill:69dcf561...✌️59  (signed at upload, skill _id)
    derived: legacy:skill:ysPwEURuPk-...✌️59  (storage nanoid)

Emit sites updated:

- `primeInvokedSkills` cache-hit path: `resource_id: ref.id` (the
  persisted skill `_id` from `codeEnvRef.id`); `id: ref.file_id`
  unchanged (storage uuid).
- `primeInvokedSkills` fresh-upload path: `resource_id: skill._id.toString()`
  on every primed file (the `allPrimedFiles` builder type now carries
  the field).
- `processCodeOutput`'s `pushFile` (Code/process.js): `resource_id: ref.id`
  — for `kind: 'user'` this is informational (codeapi derives
  sessionKey from auth context) but emitted for shape uniformity
  with shared kinds.

Bumps `@librechat/agents` to `^3.1.80-dev.1` (the version that
ships the matching `CodeEnvFile.resource_id` field).

## Test plan

- [x] `cd packages/api && npx jest src/agents` — 67 / 67 pass
  (skillFiles fixtures updated to assert `resource_id` on the
  emitted CodeSessionContext.files).
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  445 / 445 pass (process.spec fixtures updated for the reupload
  + cache-hit emission).
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean.

* fix(skill-tool-call): carry resource_id through primeSkillFiles → artifact

Codeapi was 400ing every /exec following a `handle_skill` tool call
with `resource_id is invalid` (`type: 'undefined'`). Both code paths
in `primeSkillFiles` (cache-hit + fresh-upload) returned files
without `resource_id`/`kind`/`version`, and the artifact in
`handlers.ts` forwarded the stripped shape into
`tc.codeSessionContext.files` → `_injected_files`.

`primeInvokedSkills` (the NL-detected loader) had already been fixed
end-to-end; this commit aligns the tool-invoked path with the same
contract: `resource_id` = `skill._id.toString()`, `kind: 'skill'`,
`version` = the skill's monotonic counter.

Tests added to `skillFiles.spec.ts` lock the contract on
`primeSkillFiles` directly so future refactors can't silently drop
the resource identity again.

* fix(handlers.spec): align session_id → storage_session_id rename + kind discriminator

Pre-existing TS errors against the post-rename `CodeEnvFile` shape:
the test file still used `session_id` on per-file objects (renamed to
`storage_session_id` in agents Phase B/C) and was missing the `kind`
discriminator the discriminated union requires. Both inputs and the
matching `expect.toEqual(...)` mirrors updated together so the
runtime equality check still holds.

Lines 723-732 stay as-is — they sit behind `as unknown as
ToolCallRequest` and TS already skipped them.

* chore: fix `@librechat/agents`, correct version to 3.1.80-dev.0 in package.json files

* chore: bump `@librechat/agents` to version 3.1.80-dev.1 in package.json and package-lock.json

* chore: bump `@librechat/agents` to version 3.1.80-dev.2

* feat(observability): trace file priming chain from primeCodeFiles to _injected_files

Diagnosing the user-upload "files=[] on first /exec" bug requires
seeing where in the LC chain a file ref disappears. Prior to this
patch the chain (primeCodeFiles → primedCodeFiles → initialSessions
→ CodeSessionContext → _injected_files) was opaque end-to-end:
  - primeCodeFiles silently dropped files without `metadata.codeEnvRef`
  - reuploadFile catches all errors and continues with no signal
  - the handlers.ts handoff to codeapi never logged what it was sending

After this patch, a single grep on `[primeCodeFiles]` plus
`[code-env:inject]` shows the full per-file path:

  [primeCodeFiles] in: file_ids=N resourceFiles=M
  [primeCodeFiles] file=<id> path=skip reason=no-codeenvref filename=...
  [primeCodeFiles] file=<id> path=cache-hit-by-session storage_session_id=...
  [primeCodeFiles] file=<id> path=reupload reason=no-uploadtime ...
  [primeCodeFiles] file=<id> path=reupload reason=stale ...
  [primeCodeFiles] file=<id> path=reupload-success oldSession=... newSession=... newFileId=...
  [primeCodeFiles] file=<id> path=reupload-failed session=...
  [primeCodeFiles] file=<id> path=fresh-active storage_session_id=...
  [primeCodeFiles] out: returned=N skippedNoRef=M reuploadFailures=K

  [code-env:inject] tool=<name> files=N missingResourceId=K     (debug)
  [code-env:inject] M/N files missing resource_id ...           (warn)
  [code-env:inject] tool=<name> _injected_files=0 ...           (warn)

The boundary log warns when LC sends zero injected files on a
code-execution tool call — that's the user's actual symptom showing
up at the LC side instead of having to correlate against codeapi's
`Request received { files: [] }`.

Tag chosen as `[code-env:inject]` rather than `[handoff:exec]` to
avoid collision with the app-level "handoff" semantic (subagent
handoff workflow).

Structural cleanup in primeFiles: replaced the `if (ref) { ... }`
nesting with an early `if (!ref) continue` so the per-path
instrumentation hooks land at top-level scope instead of indented
inside a conditional. Behavior unchanged; pushFile / reuploadFile
identical.

Spec fixtures (handlers.spec.ts, codeFilesSession.spec.ts) updated
to include `resource_id` on `CodeEnvFile` literals — required by
the post-3.1.80-dev.2 type now installed.

## Test plan

- [x] `cd packages/api && npx jest src/agents/handlers.spec.ts src/agents/codeFilesSession.spec.ts src/agents/skillFiles.spec.ts` — 69/69 pass
- [x] `cd api && npx jest server/services/Files/Code/process.spec.js` — 84/84 pass
- [x] `npx tsc --noEmit -p packages/api` — clean
- [x] `npx eslint` on all four touched files — clean

* chore: add CONSOLE_JSON_STRING_LENGTH to .env.example for JSON log string length configuration

* fix(files): align codeapi upload filename with LC's sanitized DB filename

User-attached files for code execution were uploading to codeapi
under `file.originalname` (raw upload filename, may contain spaces /
special chars) while LC's DB record stored the sanitized form
(`sanitizeFilename(file.originalname)`, underscores). Codeapi
preserves whatever filename the upload sent, so the sandbox saw
`/mnt/data/<originalname>` while LC's `primeFiles` toolContext text
+ `_injected_files.name` referenced `file.filename` (sanitized).

Visible failure: agent gets system prompt saying

    /mnt/data/librechat_code_api_-_active_customer_-_2025-11-05.xlsx

…tries that path, hits `FileNotFoundError`, then notices the
sandbox's actual `Available files` line says

    /mnt/data/librechat code api - active customer - 2025-11-05.xlsx

…retries with spaces, succeeds. Wastes a tool call per upload and
leaks raw filenames into model context.

Fix: sanitize once and use the sanitized form in both the codeapi
upload AND the LC DB record. Sandbox path = LC toolContext text =
in-memory ref name. No drift.

Reupload path (`Code/process.js` line 867 `filename: file.filename`)
already uses the sanitized DB name, so it stays consistent with the
fresh-upload path after this change.

## Test plan

- [x] `cd api && npx jest server/services/Files/process` — 32/32 pass
- [x] `npx eslint` on the touched file — clean

* chore: bump `@librechat/agents` to version 3.1.80-dev.3 in package.json and package-lock.json
2026-05-08 12:29:43 -04:00
Danny Avila
1b79e0b785
🧬 chore: Align LibreChat With Agents LangChain Upgrade (#12922)
* 🔧 chore: Update dependencies in package-lock.json and package.json

- Bump version of @librechat/agents to 3.1.75-dev.0 in multiple package.json files.
- Upgrade various AWS SDK and Smithy dependencies to their latest versions in package-lock.json for improved stability and performance.

* 🔧 chore: Update AWS SDK and Smithy dependencies in package-lock.json

- Bump version of @aws-sdk/client-bedrock-runtime to 3.1041.0 and update related dependencies for improved performance and stability.
- Upgrade various AWS SDK and Smithy packages to their latest versions, ensuring compatibility and enhanced functionality.

* chore: Align LibreChat with agents LangChain upgrade

- Route LangChain imports through @librechat/agents facade exports
- Update @librechat/agents to 3.1.75-dev.1 and remove direct LangChain deps
- Normalize nullable agent model params and API key override typing
- Update Google thinking config typing for newer LangChain packages
- Refresh targeted audit-related dependency overrides

* chore: Add Jest types for API specs

* test: Fix LangChain upgrade CI specs

* test: Exercise agents env facade

* fix: Clean up TS preview diagnostics

* fix: Address Codex review feedback
2026-05-03 12:46:01 -04:00
Danny Avila
f3e1201ae7
📌 fix: Stabilize Agent Prompt Cache Prefix (#12907)
* fix: stabilize agent prompt cache prefix

* chore: refresh agents sdk lockfile integrity

* test: format agent memory assertion

* test: type agent context fixtures

* fix: preserve MCP instruction precedence

* fix: reuse resolved conversation anchor

* fix: keep resumable startup immediate
2026-05-02 09:55:31 +09:00
Danny Avila
74307e6dcc
💭 feat: Require Explicit Auto-agent Enablement for Memories (#12886) 2026-05-01 23:56:08 +09:00
Danny Avila
24e29aa8cb
🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)
* 🌱 fix: Seed Code Tool Files Into Graph Sessions on First Call

Files attached to an agent's `tool_resources.execute_code` (user uploads
or generated artifacts from a prior turn) were silently dropped on the
first `execute_code` invocation of a turn. The agents-side `ToolNode`
populates `_injected_files` only when its `sessions` map already has an
`EXECUTE_CODE` entry — but that entry is only written by a previous
successful execution, so call #1 had nothing to inject. CodeExecutor
then fell back to a `/files/{session_id}` fetch, but `session_id` was
also empty on call #1, leaving the sandbox without the primed files.

Mirror the existing skill-priming pattern (`primeInvokedSkills` →
`initialSessions`) for code-resource files: eagerly call `primeFiles`
before `createRun` and merge the result into `initialSessions` via a
new `seedCodeFilesIntoSessions` helper. Skill files and code-resource
files now share the same `EXECUTE_CODE` entry; the prior representative
`session_id` is preserved on merge.

* 🔬 chore: Add Diagnostic Logging for Code-Files Seeding

Temporary debug logs to diagnose why first-call file injection is not
firing in real agent runs. Logs `wantsCodeExec`, available tool-resource
keys, primed file count, and the seeded EXECUTE_CODE entry. Will revert
once the failure mode is identified.

* 🪛 refactor: Capture primedCodeFiles per-agent at init, merge across run

Replace the client.js eager `primeFiles` call with a per-agent capture at
initialization time so every agent in a multi-agent run (primary +
handoff + addedConvo) contributes its `tool_resources.execute_code`
files to the shared `Graph.sessions` seed.

- handleTools.js (eager loadTools): the `execute_code` factory closes
  over a `primedCodeFiles` slot and surfaces it in the return.
- ToolService.js loadToolDefinitionsWrapper (event-driven): captures
  `files` from the existing `primeCodeFiles` call (was dropping them
  while only keeping `toolContext`) and surfaces them.
- packages/api initialize.ts: the loadTools callback contract now
  includes `primedCodeFiles`, threaded onto `InitializedAgent`.
- client.js: iterate `[primary, ...agentConfigs.values()]` and merge
  each agent's `primedCodeFiles` into `initialSessions`. Drop the
  primary-only `primeCodeFiles` call and diagnostic logs from the prior
  attempt — wrong layer (single-agent), wrong gate (`agent.tools`
  contained Tool instances after init, so the `.includes("execute_code")`
  string check always failed).

* 🔬 chore: Add per-agent diagnostic logs for code-files seeding

Logs `tool_resources` keys + file counts inside loadToolDefinitionsWrapper
and per-agent `primedCodeFiles` + final initialSessions inside
AgentClient. Will revert once the failure mode is confirmed.

* 🔬 chore: Add file-lookup diagnostics inside initializeAgent

Logs the inputs and intermediate counts of the conversation-file lookup
chain (convo file ids, thread message ids, code-generated and
user-code file counts) so we can pinpoint why `tool_resources.execute_code`
is arriving empty at `loadToolDefinitionsWrapper` despite the agent
having `execute_code` in its tools list.

* 🔬 chore: Probe execute_code files without messageId filter

Adds a relaxed `getFiles({conversationId, context: execute_code})` probe
that runs only when `getCodeGeneratedFiles` returns empty. Lists what's
actually in the DB for this conversation so we can confirm whether the
file is missing entirely or whether the messageId filter is rejecting it.

* 🔬 chore: Fix probe getFiles arg order (sort vs projection)

Probe was passing a projection object as the sort arg, which mongoose
rejected with `Invalid sort value`. Move it to the third arg
(selectFields) so the probe actually runs.

* 🪢 fix: Preserve Original messageId on Code-Output File Update

Each `processCodeOutput` call was overwriting the persisted file's
`messageId` with the *current* run's id. When a turn re-creates an
existing file (filename + conversationId match → `claimCodeFile`
returns the existing record, `isUpdate=true`), the file's link to
the assistant message that originally produced it gets clobbered.

`initializeAgent` later runs `getCodeGeneratedFiles({messageId: $in: <thread>})`
to seed `tool_resources.execute_code` from prior-turn artifacts. With a
stale `messageId` (e.g. from a failed read attempt that re-shelled the
same filename), the file no longer matches the parent-walk thread, so
`tool_resources` arrives empty at agent init, the new
`primedCodeFiles` channel has nothing to seed, and the LLM can't see
its own prior-turn artifacts on the next turn — defeating the
just-added Graph-sessions seeding fix.

Preserve the existing `claimed.messageId` on update; first-creation
behavior is unchanged. The runtime return value still includes the
current run's `messageId` (via `Object.assign(file, { messageId })`)
so the artifact is correctly attributed to the live tool_call.

* 🧹 chore: Remove diagnostic logs from code-files seeding path

Drops the temporary debug logs added to trace the empty-tool_resources
failure mode. Production code paths (loadToolDefinitionsWrapper,
client.js seed loop, initializeAgent file lookup) are left as the
permanent shape: capture primedCodeFiles, merge across agents, seed
initialSessions before run start.

* 🪛 feat: read_file Sandbox Fallback for /mnt/data + Non-Skill Paths

When the model called `read_file` with a code-execution path (e.g.
`/mnt/data/sentinel.txt`), the handler returned a misleading
`Use format: {skillName}/{path}` error. Adds a sandbox-aware fallback:

- Short-circuit `/mnt/data/...` (can never be a skill reference) →
  route to a sandbox `cat` via the new host-provided `readSandboxFile`
  callback, which POSTs to the codeapi `/exec` endpoint.
- Skip the skill resolver entirely when `accessibleSkillIds` is empty
  — the resolved-output of `resolveAgentScopedSkillIds` already
  collapses the admin capability + ephemeral badge + persisted
  `skills_enabled` chain, so an empty value is the authoritative
  "skills aren't in scope for this agent" signal.
- For `{firstSegment}/...` paths, consult the catalog-derived
  `activeSkillNames` Set (no DB read) to detect non-skill names and
  fall through to the sandbox before the model has to retry with
  `bash_tool`.

`activeSkillNames` is captured from `injectSkillCatalog`, threaded onto
`InitializedAgent`, into `agentToolContexts`, then through
`enrichWithSkillConfigurable` into `mergedConfigurable` for the handler.

The host implementation of `readSandboxFile` lives in
`api/server/services/Files/Code/process.js` and shells `cat <path>`
through the seeded sandbox session — `tc.codeSessionContext`
(emitted by ToolNode for `read_file` calls in `@librechat/agents`
v3.1.72+) provides the `session_id` + `_injected_files` so the read
lands in the same sandbox that holds prior-turn artifacts. When the
seeded context isn't available (older agents version, no codeapi
configured), the handler returns a model-visible error pointing at
`bash_tool` instead of silently failing.

Tests: 8 new `handleReadFileCall` cases cover the new short-circuits,
the skills-not-enabled gate, the activeSkillNames lookup, the
sandbox-fallback success path, and the bash_tool retry hint on
fallback failure. Existing `read_file` tests now opt into "skills are
in scope" via a `skillsInScope()` fixture (production wouldn't reach
the skill lookup with empty `accessibleSkillIds`).

* 🔧 chore: Update @librechat/agents dependency to version 3.1.72

Bumps the version of the @librechat/agents package across package-lock.json and relevant package.json files to ensure compatibility with the latest features and fixes.

* 🪛 refactor: Centralize Tool-Session Seed in buildInitialToolSessions Helper

Addresses review feedback on the per-agent merge in client.js:

- **Run-wide semantics, named explicitly.** The merge into a single
  `Graph.sessions[EXECUTE_CODE]` was a deliberate match to the
  agents-library design (`Graph.sessions` is shared across every
  `ToolNode` in the run), but the inline `for (const a of agents)`
  loop in `AgentClient.chatCompletion` made it look per-agent. Move
  the logic to a TS helper `buildInitialToolSessions` that documents
  the run-wide-by-design contract in one place. The CJS controller
  now contains a single call site, no business logic.

- **Subagent walk (P2).** The previous loop only iterated
  `[primary, ...agentConfigs.values()]`. Pure subagents are pruned
  out of `agentConfigs` after init and retained on each parent's
  `subagentAgentConfigs`, so their primed code files were silently
  dropped from the seed. The helper now walks recursively, with a
  visited-Set keyed on object identity that terminates safely on a
  malformed agent graph (cycle).

- **`jest.setup.cjs` polyfill for undici `File`.** Reviewer hit
  `ReferenceError: File is not defined` running the targeted spec on
  WSL — a known Node 18 issue where `globalThis.File` from
  `node:buffer` isn't auto-exposed. Polyfill it inside a Jest setup
  file so the suite boots regardless of Node patch version.

Helper test coverage (8 new): skill-only / agent-only / both,
recursive subagent walk, cycle-safe walk, primary+subagent
deduplication, undefined/null entries in the agents iterable, and
representative session_id preservation across the merge.

16 tests pass total in `codeFilesSession.spec.ts` (8 prior + 8 new).
No behavior change vs. the previous commit for the existing
primary+agentConfigs case — subagent inclusion is the only new
behavior, and it matches what the existing seeding logic would have
done if subagents had been in `agentConfigs`.

* 🪛 fix: FIFO Walk Order in buildInitialToolSessions (P3 review)

The traversal used `Array.pop()` (LIFO), which visited the LAST
top-level agent first. The docstring says "primary first"; the code
contradicted it. When no skill seed exists the first-visited agent's
first file supplies the representative `session_id` written to
`Graph.sessions[EXECUTE_CODE]` — so a LIFO walk silently flipped which
agent that came from. `ToolNode` ultimately uses per-file `session_id`s
for runtime injection (so behavior was indistinguishable for current
callers), but the discrepancy was a footgun for any future consumer
that read the representative.

Switch to FIFO via `Array.shift()` to match both the docstring and the
existing `loadSubagentsFor` walk pattern in
`Endpoints/agents/initialize.js`. Add a regression test that asserts
the primary's `session_id` is the representative (and that all three
agents' files still contribute, with per-file `session_id`s preserved).

* 🔬 test: Lock In Code-Files Bug Fixes Per Comprehensive Review

Addresses MAJOR + MINOR + NIT findings from the multi-pass review:

**Finding #4 (MINOR) — empty relativePath misses sandbox fallback.**
A model calling `read_file("output/")` where "output" isn't a skill
name dead-ended with `Missing file path after skill name` instead of
being routed to the sandbox like every other malformed-path branch.
Add the same `codeEnvAvailable → handleSandboxFileFallback` pattern,
plus two regression tests.

**Finding #7 (NIT) — duplicate `skillsInScope()` helper.**
Hoist the identical helper out of two nested describe blocks to
module scope. Single source of truth.

**Finding #1 (MAJOR) — `persistedMessageId` had zero test coverage.**
The fix preserves a file's original `messageId` on update so
`getCodeGeneratedFiles` can still match it on subsequent turns. A
regression in the `isUpdate ? (claimed.messageId ?? messageId) : messageId`
ternary would silently re-introduce the original cross-turn priming
bug. Five new tests cover:
- UPDATE preserves `claimed.messageId` in the persisted record
- UPDATE falls back to current run id when `claimed.messageId` is
  absent (legacy records predating the field)
- CREATE uses current run id (no claimed record exists)
- The runtime return value uses the LIVE id (artifact attribution)
  even when the persisted record kept the original
- The image branch follows the same contract (would silently regress
  if the ternary diverged across the two file-build branches)

The tests use a `snapshotCreateFileArgs()` helper because
`processCodeOutput` mutates the file object after `createFile`
returns (`Object.assign(file, { messageId, toolCallId })`) and a
naive `createFile.mock.calls[0][0]` would reflect the post-mutation
state instead of what was actually persisted.

**Finding #2 (MAJOR) — `readSandboxFile` had no direct tests.**
The model-controlled `file_path` flows through a POSIX single-quote
escape into a shell `cat` command, making this a security boundary.
A quoting regression would let a malicious filename break out of the
quoted argument and inject arbitrary shell. 20 new tests across:
- Shell quoting (7): plain filenames, embedded `'`, `$()`, backticks,
  newlines, shell metachars, multiple consecutive single-quotes
- Payload shape (6): /exec URL, bash language, conditional
  session_id / files inclusion, dedicated keepAlive:false agents
- Response handling (6): `{content}` on success, null on missing
  base URL or absent stdout, throws on stderr-only, partial-success
  returns stdout, transport errors are logged then rethrown
- Timeout (1): matches processCodeOutput's 15s SLA

Audited findings #5 (acknowledged tech debt — readSandboxFile in JS
workspace), #6 (pre-existing positional-args debt on
enrichWithSkillConfigurable), and #8 (cosmetic JSDoc style) — no
action taken per the reviewer's own assessment.

Audited finding #3 (walk order vs docstring) — already addressed in
commit 007f32341 which converted to FIFO via `queue.shift()` plus a
regression test. The audit was performed against an earlier PR head.

Tests: 152 packages/api + 195 api JS = 347 pass. Typecheck clean.

* 🪛 fix: Pure-Subagent codeEnv + Primed-Skill Routing + ToolService Early Returns

Three findings from the second-pass review:

**P2 — Pure subagents missed `codeEnvAvailable`** (initialize.js).
The pure-subagent init path didn't forward the endpoint-level
`codeEnvAvailable` flag to `initializeAgent`, unlike the primary,
handoff, and addedConvo paths. A code-enabled subagent loaded only
through `subagentAgentConfigs` initialized with
`codeEnvAvailable: false`, so even though the recursive seed walk
found its primed code files, the subagent's own `bash_tool` /
`read_file` sandbox fallback were silently gated off. Forward the
flag and add `codeEnvAvailable: config.codeEnvAvailable` to the
`agentToolContexts.set` for symmetry with the other paths.

**P2 — Primed skills outside the catalog cap were misrouted to
sandbox** (handlers.ts). Manual ($-popover) and always-apply primes
are intentionally resolved off the wider `accessibleSkillIds` ACL
set BEFORE catalog injection — see `resolveManualSkills` for why a
skill outside the `SKILL_CATALOG_LIMIT` cap can still be authorized
for direct manual invocation. The `activeSkillNames` shortcut ran
before reading `skillPrimedIdsByName`, so a primed skill not in the
catalog would fall through to the sandbox instead of resolving via
the pinned `_id`. Read the primed map first and bypass the shortcut
for primed names. New regression test asserts a primed-but-not-
cataloged skill resolves through the existing skill path with
`getSkillByName` invoked and `readSandboxFile` NOT called.

**P3 — `loadAgentTools` early returns dropped `primedCodeFiles`**
(ToolService.js). The non-`definitionsOnly` path captures the field
correctly, but two early-return branches (no-action-tools fast path,
no-action-sets fast path) omitted it. Any traditional
`loadAgentTools(..., definitionsOnly: false)` caller using
execute_code without action tools would have its first-call session
seed silently empty. Add `primedCodeFiles` to both early returns
for consistency with the final return shape.

Tests: 153 packages/api + 195 api JS = 348 pass.

* 🧹 chore: Document jest.mock arrow-indirection pattern in process.spec.js

Per the second-pass review's Finding #2 (NIT, "would help future
readers"): the mock setup mixes direct `jest.fn()` references with
arrow-function indirection (`(...args) => mockX(...args)`). The
indirection isn't stylistic — it's required because `jest.mock(...)`
is hoisted above the outer `const` declarations at parse time, so a
direct reference would capture `undefined`. Inline comment explains
the pattern so the next reader doesn't have to reverse-engineer it
or accidentally "simplify" the mocks and break per-test
`mockReturnValueOnce` / `mockImplementationOnce` overrides.

* 🪛 fix: Five Issues from Pass-N + Codex Review (incl. 404 root cause)

Five real bugs surfaced by another review pass + Codex PR comments
+ the codeapi-side logs we collected during manual testing:

**1) `processCodeOutput` 404 root cause (`callbacks.js`).**
The codeapi worker emits TWO distinct `session_id`s on a tool result:
  - `artifact.session_id` is the EXEC session — the sandbox VM that
    ran the bash command. Files don't live there; it's torn down
    post-execution.
  - `file.session_id` is the STORAGE session — the file-server
    bucket prefix where artifacts actually live.
`callbacks.js` was passing the EXEC id to `processCodeOutput`, which
builds `/download/{session_id}/{id}` and 404s because the file-server
doesn't know about that path. This explains every "Error
downloading/processing code environment file" we saw during testing.
Use `file.session_id ?? output.artifact.session_id` (per-file id with
artifact-level fallback for older worker payloads).

**2) `primeFiles` reupload pushed STALE sandbox ids (`process.js`).**
When `getSessionInfo` returns null (file expired/missing in sandbox),
`reuploadFile` re-uploads via `handleFileUpload`, gets a NEW
`fileIdentifier`, and persists it on the DB record. But `pushFile`
was a closure capturing the OLD `(session_id, id)` parsed at the top
of the loop, so the in-memory `files[]` array (now consumed by
`buildInitialToolSessions` to seed `Graph.sessions`) silently
referenced a sandbox object that no longer existed. The first tool
call would 404 trying to mount it; only the next turn's metadata
re-read would correct course. Parameterize `pushFile` with optional
`(session_id, id)` overrides; in `reuploadFile` parse the new
identifier and pass through. 2 regression tests.

**3) Codex P2 — Cap sandbox fallback output before line-numbering
(`handlers.ts`).** The new `handleSandboxFileFallback` returned
`addLineNumbers(result.content)` without a size guard, so reading a
multi-MB `/mnt/data/*` artifact materialized the file twice in
memory (raw + line-numbered) before downstream truncation. Match the
existing skill-file path's `MAX_READABLE_BYTES` (256KB): truncate
raw first, then number, surface the truncation to the model so it
can use `bash_tool` (`head` / `tail`) for the rest. 2 tests
(oversized truncates with hint, in-cap doesn't).

**4) Codex P2 — Dedupe seeded code files by `(session_id, id)`
(`codeFilesSession.ts`).** Multiple agents in a run commonly carry
the same primed execute-code resources (shared conversation files);
without dedupe, `_injected_files` grows proportionally to agent
count and bloats every `/exec` POST. Use a `(session_id, id)`
identity key so first-seen wins (preserves source ordering); name
alone isn't sufficient because two distinct primed uploads can
share a filename across different sessions. 4 tests covering dedup
across iterations, against pre-existing entries, name-collision
distinct-session preservation, and the multi-agent realistic case
in `buildInitialToolSessions`.

**5) Pass-N P2 — Polyfill `globalThis.File` in api Jest setup
(`api/test/jestSetup.js`).** `packages/api/jest.setup.cjs` had the
polyfill; the legacy api workspace's Jest config has its own
`setupFiles` that didn't, so on Node 18 / WSL the api focused tests
still failed at import time with `ReferenceError: File is not
defined` from undici. Mirror the polyfill.

Tests: 159 packages/api + 206 api JS = 365 pass. Typecheck clean.

* 🔧 chore: Update @librechat/agents dependency to version 3.1.73

Bumps the version of the @librechat/agents package across package-lock.json and relevant package.json files to ensure compatibility with the latest features and fixes.
2026-04-27 08:56:39 +09:00
Danny Avila
d83cb84f59 🪆 feat: Subagent configuration in Agent Builder (#12725)
* 🪆 feat: Subagents configuration (isolated-context child agents)

Surfaces the new @librechat/agents `SubagentConfig` primitive in the Agent
Builder. Subagents let a supervisor delegate a focused subtask to a child
graph running in an isolated context window: verbose tool output stays in
the child, only a filtered summary returns to the parent.

Data model: new `subagents: { enabled, allowSelf, agent_ids }` on Agent,
wired through the Zod, Mongoose, and form schemas plus a new
`AgentCapabilities.subagents` capability (enabled by default).

Backend: `initialize.js` loads explicit subagent configs alongside handoff
agents, and drops subagent-only references from the parallel/handoff maps
so they don't leak into the supervisor's graph. `run.ts` emits
`SubagentConfig[]` on the primary `AgentInputs` — a self-spawn entry when
`allowSelf` is enabled plus one entry per configured agent.

UI: an "Advanced" panel section with an enable toggle, a self-spawn
toggle, and an agent picker (capped at 10). Enabling without adding
agents still yields self-spawn; disabling self-spawn with no agents shows
a warning. A capability flag gates the whole section.

* 🪆 feat: Stream subagent progress to UI (dialog + inline ticker)

Pairs with the @librechat/agents SDK change that forwards child-graph
events through the parent's handler registry (danny-avila/agents#107):

- Self-spawn and explicit subagents can now use event-driven tools,
  because child `ON_TOOL_EXECUTE` dispatches reach our ToolService via
  the parent's registered handler.
- The same forwarding path wraps the child's run_step / run_step_delta
  / run_step_completed / message_delta / reasoning_delta dispatches in
  a new `ON_SUBAGENT_UPDATE` envelope, with start/stop/error bookends.

Backend: `callbacks.js` registers an `ON_SUBAGENT_UPDATE` handler that
forwards each envelope straight to the SSE stream.

Frontend:
- `useStepHandler` consumes `ON_SUBAGENT_UPDATE` events and merges them
  into a per-tool_call Recoil atom (`subagentProgressByToolCallId`).
  First-seen `subagentRunId` claims the most-recent unclaimed `subagent`
  tool call in the active response message — a temporal mapping, no SDK
  wire-format change needed to correlate child runs with parent tool
  calls.
- New `SubagentCall` part component replaces the default `ToolCall`
  rendering when `toolCall.name === Constants.SUBAGENT`: compact status
  ticker showing the 3 most recent update labels, clickable to open a
  dialog with the full activity log + final markdown-rendered result.
- Adds `Constants.SUBAGENT`, `StepEvents.ON_SUBAGENT_UPDATE`, and
  `SubagentUpdateEvent` type in data-provider.

Tests:
- `packages/api npx jest run-summarization` — 23 pass
- `api npx jest initialize` — 16 pass
- `npm run build` — clean

Dependency note: bumps `@librechat/agents` to `^3.1.67-dev.1` — requires
the SDK PR (danny-avila/agents#107) to be merged to dev and published
before this PR merges. `ON_SUBAGENT_UPDATE` is absent from dev.0, so the
handler registration would be a no-op with the older SDK but would not
crash.

* 🪆 fix: address Codex review and review audit on subagents

Stacks on top of the SDK change in danny-avila/agents#107 (bumped to
`^3.1.67-dev.2`).

- **P1 (`initialize.js`)**: subagent-only agents were being deleted from
  both `agentConfigs` AND `agentToolContexts`. The tool-execute handler
  resolves execution context (agent, tool_resources, skill ACLs) from
  `agentToolContexts`, so explicit subagents would run without their
  configured resources and skip action tools. Now only `agentConfigs`
  is pruned — tool context stays intact.
- **P2 (`AgentSubagents.tsx`)**: toggling subagents off set the form
  field to `undefined`; `removeNullishValues` stripped it from the
  PATCH, leaving the server copy enabled. Now it persists an explicit
  `{ enabled: false, ... }` so the update actually clears state.

- **Finding 1 (MAJOR)** — `agent_ids` Zod schema gains `.max()` via a
  new `MAX_SUBAGENTS` export from `data-provider` (shared with the UI
  cap). Crafted payloads can't trigger hundreds of `processAgent`
  calls.
- **Finding 2 (MAJOR)** — `subagentProgressByToolCallId` atomFamily
  atoms are now tracked in a ref and reset from `clearStepMaps` via a
  `useRecoilCallback({ reset })`. No monotonic growth across a session.
- **Finding 3 (MAJOR)** — early-arriving `ON_SUBAGENT_UPDATE` events
  whose parent `tool_call_id` is not yet mapped are now buffered in
  `pendingSubagentBuffer` (keyed by `subagentRunId`) and replayed in
  arrival order once correlation completes. Mirrors the existing
  `pendingDeltaBuffer` pattern.
- **Finding 4 (MAJOR)** — switched to deterministic correlation via
  the new `parentToolCallId` that SDK `3.1.67-dev.2` threads through
  from `ToolRunnableConfig.toolCall.id`. Temporal fallback now iterates
  oldest-unclaimed-first (forward), matching tool-call creation order,
  so concurrent spawns map correctly.
- **Finding 6 (MINOR)** — `agent_ids` are deduped on the backend via
  `new Set(...)` before the load loop. Duplicates no longer produce
  duplicate `SubagentConfig` entries visible to the LLM.
- **Finding 7 (MINOR)** — events array inside each Recoil atom is
  capped at 200 entries. Long-running subagents no longer replay O(n)
  spreads on every update; the dialog log still shows the cap window.
- **Finding 8 (MINOR)** — documented: subagents are loaded only for
  the primary agent this release (handoff children get self-spawn but
  not explicit sub-subagents). In-code comment added so the next
  maintainer doesn't wonder.
- **Finding 9 (NIT)** — removed `{!isSubmitting && null}` dead code
  and the misleading announce-polite comment in `SubagentCall`.

- New `validation.spec.ts` — 9 tests covering the cap on
  `agent_ids.length` at the subagent schema, agent-create, and
  agent-update layers.
- `run-summarization` — 23 pass, `initialize` — 16 pass, total backend
  package: 103 pass across touched areas.

Findings 5 (component tests) and 10 (micro-allocation) are tracked
but deferred; the former needs a Recoil-RenderHook harness that isn't
in this PR's scope, and the latter has negligible impact (one `Array.from`
per subagent run).

* 🧪 test: integration coverage for subagent correlation + backend loading

Addresses the follow-up audit on #12725 with real-code tests (no mock
handlers, only the existing setMessages/getMessages spies and the
standard mongodb-memory-server harness).

Six new tests under a dedicated `describe('subagent loading')`:
- loads a configured subagent, populates `subagentAgentConfigs`, keeps
  it out of `agentConfigs`
- **P1 regression guard**: drives the real `toolExecuteOptions.loadTools`
  closure with the subagent id and asserts `loadToolsForExecution` is
  called with `agent: <subagent>`, `tool_resources`, `actionsEnabled`.
  If anyone deletes `agentToolContexts` again, this fails.
- dedup: three copies of the same id load the agent once
- overlap: agent referenced both as handoff target and subagent stays in
  `agentConfigs`
- capability gate: admin disabling `subagents` suppresses loading even
  when the agent has a config
- per-agent disable: `subagents.enabled: false` skips loading entirely

Five new tests under `describe('on_subagent_update event')` using a
real `RecoilRoot` and a companion `useRecoilCallback` reader so writes
from the hook are observable:
- deterministic correlation via `parentToolCallId` (happy path with
  SDK dev.2+)
- fallback: oldest-unclaimed tool call wins for concurrent spawns
  without `parentToolCallId`
- early-arrival buffer: updates with no mapping get buffered and
  replayed once the tool call appears
- event cap: 205 updates collapse to 200 retained, oldest dropped
- `clearStepMaps` resets tracked atoms back to their null default

- F2 — added explicit `// TODO` marker for handoff-subagent-loading
  extension (matches the comment that referenced it).
- F3 — dropped the unnecessary `MAX_SUBAGENTS as MAX_SUBAGENTS_CAP`
  alias; just import the constant directly.
- Bumped `@librechat/agents` to `^3.1.67-dev.3` to pick up the SDK's
  paired test additions.

- `api/server/services/Endpoints/agents/initialize.spec.js` — 22 pass
  (6 new + 16 existing)
- `packages/api/src/agents/validation.spec.ts` +
  `run-summarization.test.ts` — 103 pass
- `client/src/hooks/SSE/__tests__/useStepHandler.spec.ts` — 48 pass
  (5 new + 43 existing)

* 🪆 fix: strip parent run summary + discovered tools from subagent inputs

Codex P1 on #12725: `buildSubagentConfigs` reused the shared
`buildAgentInput` factory for each explicit child, and that factory
always stamps the parent run's `initialSummary` (cross-run conversation
summary) and `discoveredTools` (tool names the parent's LLM searched
earlier) onto every `AgentInputs` it returns. When subagents were
enabled on a conversation that had already been summarized, every
child inherited that summary — silently defeating the isolated-context
contract and burning extra tokens on unrelated prior chat.

Fix in `run.ts`: after `buildAgentInput(child)`, explicitly clear
`childInputs.initialSummary` and `childInputs.discoveredTools` before
attaching to the `SubagentConfig`. The parent keeps both — that's how
the supervisor receives cross-turn context — but the child starts
fresh.

Paired with danny-avila/agents#107 (bumped to `^3.1.67-dev.4`), which
adds the equivalent strip inside `buildChildInputs` to cover the
self-spawn path where the SDK clones parent `_sourceInputs` directly
and LibreChat never sees the intermediate shape. Belt and suspenders.

Regression test (new):
- `does NOT leak the parent run initialSummary into an explicit child
  (Codex P1 regression)` — sets `initialSummary` on the run, enables
  subagents with an explicit child, asserts the parent still has the
  summary but `childConfig.agentInputs.initialSummary` is `undefined`.
  Same for `discoveredTools`. 24 pass.

* 🪆 fix: capability gate applies to handoff agents + parallel subagent test

### Codex P2 — handoff agents kept `subagents` after capability disabled
The endpoint-level `AgentCapabilities.subagents` gate only cleared
`subagents` on `primaryConfig`. Handoff agents loaded into
`agentConfigs` retained their persisted `subagents.enabled: true`,
and because `run.ts` calls `buildSubagentConfigs` for every agent
input, self-spawn would still fire on a handoff target even when the
admin had disabled the capability globally.

Fix in `initialize.js`: after the subagent loading block, when the
capability is off, iterate `agentConfigs.values()` and clear
`subagents` + `subagentAgentConfigs` on every loaded config.

Regression test: `clears subagents on handoff agents too when
capability is disabled (Codex P2 regression)` — seeds a handoff target
with its own `subagents.enabled: true`, disables the capability at
the endpoint, asserts both primary AND handoff have `subagents`
undefined in the client args. 23 init tests pass.

### Parallel subagent correlation — user-requested verification
Added `keeps parallel subagent streams independent when events
interleave` to `useStepHandler.spec.ts`. Two `subagent` tool calls
seeded side by side, 6 interleaved `ON_SUBAGENT_UPDATE` envelopes
dispatched (a-start, b-start, a-step, b-step, a-stop, b-step), each
carrying its own `parentToolCallId`. Asserts each `tool_call_id`'s
Recoil bucket accumulates only its own run's events, statuses reflect
each run independently (`call_a` → stop, `call_b` → run_step), no
cross-contamination. 49 step-handler tests pass.

* 🪆 fix: SubagentCall detects cancelled / errored states (Codex P2)

Codex P2 on #12725: the old `running` check only consulted
`initialProgress` and the subagent's phase. A user stop, dropped
stream, or backend crash before a terminal `stop`/`error` envelope
arrived would leave the ticker permanently stuck on "working…". Other
*Call components (ToolCall.tsx) already model this via
`!isSubmitting && !finished` → cancelled.

Mirror that pattern. Re-introduce `isSubmitting` on `SubagentCallProps`
(the prop was dropped earlier as 'unused' — that was a bug) and resolve
status as a tri-state:

- `finished`  — initialProgress >= 1, or subagent `stop`/`error`
- `cancelled` — `!isSubmitting && !finished`
- `running`   — neither

New locale keys `com_ui_subagent_cancelled` + `com_ui_subagent_errored`
swap in the right header text per state.

Tests: new `SubagentCall.test.tsx` covers all four states with a real
`RecoilRoot` and a `useRecoilCallback` seeder — no mocked store — 5/5
pass. Includes an explicit P2 regression test that simulates the
`isSubmitting=false, progress.status='run_step', initialProgress<1`
scenario and asserts the cancelled label renders.

* 🪆 feat: semantic ticker + aggregated content-part dialog for subagents

Two rounds of feedback on #12725:

### Ticker — user-readable lines, not raw event names

The old ticker showed \`on_run_step\`, \`on_message_delta\`, etc. — not
meaningful to users. Replaced with \`buildSubagentTickerLines\`, a pure
helper that walks the \`SubagentUpdateEvent\` stream and emits:

- message/reasoning deltas → a single live "Writing: <last 60 chars>"
  (or "Reasoning: …") line that updates in place as chunks arrive
- run_step with tool_calls → "Using calculator(expression=42*58)" for
  a single call, "Using tool: a, b" for parallel (args dropped when
  multiple so the line stays short)
- run_step_completed → "calculator → 42*58 = 2436" (output truncated
  to 48 chars; falls back to "Tool X complete" when output is empty)
- error → "Error: <message>"
- start / stop / run_step_delta → suppressed (too granular / lifecycle-only)

Args and output pass through \`summarizeArgs\` / \`summarizeOutput\`
which flatten JSON to \`key=value\` pairs and head-truncate long
strings so a 200-line tool output never bloats the ticker.

### Dialog — aggregated content parts via leaf renderers

\`aggregateSubagentContent\` folds the raw event stream into
\`TMessageContentParts[]\` — text/reasoning delta streaks collapse into
single \`TEXT\` / \`THINK\` parts, tool calls become \`TOOL_CALL\` parts,
and \`run_step\` boundaries correctly break text runs around tool
calls. The dialog iterates those parts through a \`SubagentDialogPart\`
renderer that delegates to the existing \`Text\`, \`Reasoning\`, and
\`ToolCall\` leaf components — the same sub-components \`<Part />\` uses
— wrapped in a minimal \`MessageContext\` so reasoning expand state and
cursor animation work.

Leaf components are used directly rather than importing \`<Part />\`
itself to avoid a module cycle (Part → Parts/index → SubagentCall →
Part) and to sidestep a hypothetical nested-subagent rendering.

### Tests

- \`subagentContent.test.ts\` — 19 pure-function tests covering the
  aggregator (text concat, reasoning concat, tool call lifecycle,
  interleaving, phase suppression, late-arriving completions) and the
  ticker builder (live preview truncation, args/output snippets,
  parallel-call handling, output truncation, i18n formatter override).
- \`SubagentCall.test.tsx\` — 9 component tests: 5 status-resolution
  (existing) + 2 ticker (semantic text, delta collapse) + 2 dialog
  (aggregated parts routed to leaf renderers, raw-output fallback).

### Locale keys

New: \`com_ui_subagent_ticker_writing\`, \`…_reasoning\`, \`…_error\`,
\`…_using\`, \`…_using_with_args\`, \`…_tool_complete\`,
\`…_tool_output\`. Preserves i18n at the display layer while the
helper stays pure.

* chore: drop unused com_ui_subagent_activity_log locale key

The dialog no longer renders an "Activity log" section — the new
content-parts renderer replaced it. Also tweaks the dialog description
copy to match.

* 🪆 fix: subagent dialog order, persistence, auto-scroll, width

Follow-up pass addressing the four issues observed in real runs
against a live subagent-using parent.

### Aggregator ordering (reasoning appearing after text it preceded)

Reproducible pattern: LLM emits reasoning → text → tool call in that
order, but the dialog rendered text BEFORE reasoning in the content
array. Root cause: `aggregateSubagentContent` maintained `currentText`
and `currentThink` buffers in parallel and only flushed them at a
`run_step` boundary in a fixed (text, think) order, losing the actual
arrival order.

Fix: when a text chunk arrives, close any open think buffer first
(pushes it into the content array right then); symmetric for think →
text. Two new regression tests cover the exact reasoning → text →
tool_call sequence from the screenshot and the repeated
reasoning ↔ text flow across a turn.

### Content persists after completion (markdown not rendering when done)

`clearStepMaps` was calling `resetSubagentAtoms()` at stream end,
which wiped every `subagentProgressByToolCallId` entry. Once reset,
`contentParts.length === 0` and the dialog fell back to rendering the
raw `output` string with plain text — hence the literal `##`/`**` in
the completed-state screenshot. Stopped resetting; the atoms are
bounded per-call (200-event cap) and per-conversation (one per
subagent spawn) so growth matches the rest of the conversation state.
`resetSubagentAtoms` is kept for a future conversation-switch caller.

Also: routed the raw-`output` fallback (older subagent runs recorded
before the event forwarder existed) through the same
`SubagentDialogPart` → `Text` leaf that content parts use, so its
markdown renders the same way.

### Auto-scroll to bottom while running

Added a `scrollRef` on the dialog body and a `useEffect` that pins
`scrollTop = scrollHeight` while the dialog is open AND the subagent
is running. Triggers on `contentParts.length` (new tool calls / part
boundaries) and `events.length` (intra-part deltas) so the cursor
tracks text streaming. Disabled post-completion so re-opening a
finished run doesn't yank to the bottom.

### Wider dialog

Went from `max-w-2xl` (42rem / 672px — too cramped on maximized
laptop windows) to `w-[min(95vw,64rem)] max-w-[min(95vw,64rem)]`.
Narrow on phones, scales up to 64rem on desktop, always leaves a bit
of margin from the viewport edge. Bumped `max-h-[65vh]` on the scroll
area to give the extra width room to breathe vertically too.

### Tests

- `subagentContent.test.ts` — 21 pass (2 new ordering regressions).
- `useStepHandler.spec.ts` — 49 pass (1 updated to assert atoms are
  *preserved* on clearStepMaps).
- `SubagentCall.test.tsx` — 9 pass (unchanged; aggregator-level tests
  cover the ordering).

* 🪆 feat: persist subagent_content via SDK createContentAggregator

Per-request map of createContentAggregator instances keyed by the
parent's tool_call_id. ON_SUBAGENT_UPDATE handler feeds each event
into the matching aggregator (phase → GraphEvent mapping); AgentClient
harvests contentParts onto the subagent tool_call at message save so
the child's reasoning / tool calls / final text survive a page refresh.

Reusing the SDK's battle-tested aggregator instead of a bespoke one
keeps the persisted shape identical to the parent graph's output and
drops ~100 lines of custom aggregation code.

* 🪆 fix: incremental subagent aggregation + dialog render parity

**Disappearing tool_calls**: the Recoil atom trimmed events to a 200-long
rolling window, so verbose subagents could shed the `run_step` that
originally created a tool_call part — rebuilding content from the trimmed
window then produced only the surviving text/reasoning. Fix: fold each
envelope into `contentParts` incrementally in the atom as it arrives
(new `foldSubagentEvent` + cursor state). Event trim window now affects
only the ticker, never the dialog.

**Render parity**: dialog now applies `groupSequentialToolCalls` and
renders single parts through `Container` + grouped batches through
`ToolCallGroup` — same spacing and "Used N tools" collapsing the main
message view uses.

**Width**: `min(96vw, 80rem)` — wider on big screens, still responsive.

**Labels**: "Subagent: X" is jargon. Named subagents render as
`Running "{name}" agent` / `Ran "{name}" agent` (past tense on
completion); self-spawns use `Running subtask` / `Ran subtask` since
`Running "self" agent` reads badly.

* 🪆 polish: subagent dialog parity + agent avatar in header

**Labels**: drop "subtask" framing. Self-spawn shows `Running agent` /
`Ran agent` (past tense on completion); named subagents stay
`Running "X" agent` / `Ran "X" agent`.

**Dialog render parity**: stop wrapping every part in `Container`.
TEXT keeps its `Container` (gap-3 + `mt-5` sibling margin), THINK and
TOOL_CALL render bare so their own wrappers set the full-column width
the regular message view gives them — matches the main `<Part>` dispatch.
Outer scroll region now uses `px-4 py-3` padding and a
`max-w-full flex-grow flex-col gap-0` inner wrapper, mirroring the
`MessageParts` container the main conversation uses.

**Avatar**: header icon now renders the subagent's configured avatar
via `MessageIcon` when `useAgentsMapContext()` has the child agent,
falling back to the `Users` SVG (which keeps its running-state pulse).
Same icon-left-of-label pattern the tool UI uses.

* 🪆 polish: subagent group label, ticker throttle + tail-ellipsis, scroll button

**Grouped label**: ToolCallGroup now detects all-subagent batches and
labels them "Running N agents" / "Ran N agents" instead of "Used N
tools". Mixed batches keep the existing label. The tool-name summary
is suppressed for all-subagent groups (every entry dedupes to
"subagent", which adds nothing).

**Ticker width + tail-ellipsis**: raise the preview cap to 300 chars so
wide containers aren't half-empty, and flip the ticker `<li>` to
`dir="rtl"` so `text-overflow: ellipsis` clips the *oldest* characters
(visually the left edge) — the newest tokens stay pinned to the right
regardless of container width. Bidi lays out the Latin text LTR
internally, the rtl only affects which side gets the ellipsis.

**Throttle**: `useThrottledValue` hook (trailing-edge, 1.2s) smooths
the live `Writing: …` preview so tokens no longer strobe past the
eye faster than they can be read. Ref-based internals (not `useState`)
avoid infinite-update loops when the upstream value is a new-reference
each render; `NEGATIVE_INFINITY` sentinel ensures the very first value
passes through synchronously so tests and first paint aren't delayed.

**Scroll-to-bottom**: dialog tracks `isAtBottom` with a 120px
threshold; auto-scroll only engages when the user is already following
along, and a persistent jump-to-latest button appears whenever they
scroll up — no more fighting the auto-scroll to read back.

* 🪆 polish: snappier ticker, prefix-safe labels, agents icon, readable lines

**Ticker lines are now incrementally aggregated in the atom** — same
pattern as contentParts. The raw-events rolling window is gone; event
volume no longer caps what the ticker can display. Verbose subagents
that used to drop early tool_call lines out of the window now keep the
full 3-line history (using_tool, tool_complete, writing).

**Discriminated-union ticker lines** split a constant prefix (e.g.
"Writing:") from a tail-truncatable body. The prefix lives in a
`shrink-0` span so it never gets clipped when the body overflows; the
body uses `dir="rtl"` only on itself — scoped so non-streaming lines
(e.g. "Waiting for first update…") can't get their trailing ellipsis
flipped by bidi.

**Content-aware throttle**: 800ms interval (down from 1200ms), skipped
entirely while the live buffer is below 120 chars. Early tokens now
appear immediately — no more "Reasoning: I" sitting blank for a full
second before the next heartbeat. Once the preview is long enough to
fill the container, throttling kicks in at the tighter interval.

**Header label** is now a constant verb + optional muted sub-label.
Base reads "Running agent" / "Ran agent" / "Cancelled agent" / "Agent
errored" for every subagent; named subagents get the configured agent
name rendered to the right in secondary text (self-spawns and
unresolved names omit it — "Running self agent" is nonsense).

**ToolCallGroup** now detects `allSubagents` and swaps `StackedToolIcons`
for a single `Users` glyph — otherwise the group header shows a wrench
("tool") icon next to "Ran 5 agents", which reads wrong.

* 🪆 feat: delimiter-aware tool labels in ticker + full-width tool lines

New shared `parseToolName` helper in `client/src/utils/toolLabels.ts`
— single source of truth for splitting `<tool>_mcp_<server>` ids and
mapping native tool names (web_search, execute_code, …) to their
friendly translation keys. `ToolCallGroup` drops its inline copy and
pulls from this helper.

Ticker tool lines now use the shared parser + a new `ToolIdentifier`
sub-renderer so the live log reads like the main tool UI:

  - MCP tool  → `<server> · <code-badge:tool>` (e.g. "github · `search_code`")
  - Native   → friendly name from `TOOL_FRIENDLY_NAME_KEYS`
  - Unknown  → bare `<code>` badge of the raw id

The `using_tool` / `tool_complete` rows now render with a
`flex w-full items-baseline gap-1 overflow-hidden` layout matching
the writing/reasoning rows — they take the full container width
instead of collapsing to content size. Output snippets on
`tool_complete` get the same tail-side `dir="rtl"` ellipsis so the
newest characters stay flush-right when the container is narrow.

Dropped the now-unused template i18n keys
(`com_ui_subagent_ticker_using_with_args`,
`com_ui_subagent_ticker_tool_complete`,
`com_ui_subagent_ticker_tool_output`) in favor of tokens the JSX
composes structurally. Only English is touched per the project rule;
other locales follow externally.

* 🪆 fix: dialog scroll button + auto-scroll during streaming deltas

Two race/trigger bugs in the dialog's scroll behavior:

**Button never showed**: `addEventListener('scroll', …)` in a
`useEffect` ran before Radix's portal had actually committed the
scroll container, so `scrollRef.current` was still null — the listener
never attached, `isAtBottom` stayed stuck at its initial `true`, and
the jump-to-latest button was never rendered. Swap to React's
`onScroll` prop on the element itself so the handler wires up as part
of DOM commit, not a post-commit effect.

**Auto-scroll stalled during text streaming**: the pin-to-bottom
effect only re-fired on `contentParts.length` changes. Message/reasoning
deltas extend the last TEXT/THINK part's `.text` without changing the
array length — so the view would drift up as tokens piled in and
never catch back up. Replace the length-dep effect with a
`ResizeObserver` on the inner content div; every height change (new
part or in-place growth) triggers a scroll-pin when the user is still
at the bottom.

* 🪆 fix: drop leading ellipsis from ticker body

truncatePreview was prepending ... to the tail when the buffer
exceeded 300 chars. The component's CSS already produces a left-side
ellipsis for overflow via dir=rtl + text-overflow: ellipsis — stacking
a data-level ellipsis on top renders a stray dot character right after
the Writing: / Reasoning: label (Writing: .Sure!), which looks like a
typo to the reader.

Data now returns just the last 300 chars when truncating; CSS handles
the visual cue whenever the body actually overflows its container.

* 🪆 fix: Codex review — subagent isolation + concurrent-safe throttle

Three findings from the @codex review pass, all valid:

**P1 — buildAgentInput leaks parent discovered-tool state into subagent
children.** `buildAgentInput` mutates `agent.toolRegistry`
(`overrideDeferLoadingForDiscoveredTools` flips `defer_loading:true→false`
on tools the parent previously searched for) and appends those tools'
definitions to the returned `toolDefinitions` before the function
returns. `buildSubagentConfigs` was clearing the reported
`initialSummary` / `discoveredTools` fields on the returned
AgentInputs, but that happened post-return — the registry writes and
extra tool definitions persisted on the child, silently defeating
context isolation and inflating the child's prompt.

Fix: `buildAgentInput` now takes an `isSubagent` flag that gates the
registry-mutation block and omits `initialSummary` /
`discoveredTools` at the source. `buildSubagentConfigs` passes
`{ isSubagent: true }` for every explicit child; no post-hoc cleanup
needed.

**P2 — ToolCallGroup labels a finished subagent group as still
running when the child returned no output.** `getToolMeta` computed
`hasOutput` as `!!tc.output`, which is `false` for a completed
subagent that returned empty text (the UI already has an "empty
result" fallback for that case). `allCompleted` would stay `false`
and the group header stuck on "Running N agents" forever.

Fix: treat `tc.progress === 1` as completion too — progress is the
authoritative lifecycle signal, output is just content.

**P2 — useThrottledValue schedules `setTimeout` during render.**
Discarded renders under Strict Mode / Concurrent rendering would
leave orphan timers firing against stale trees.

Fix: move `setTimeout` into a `useEffect` keyed on `[value, intervalMs,
enabled]`. Render-time still mutates refs (idempotent), but timer
scheduling lives post-commit. Cleanup on unmount and on passthrough
transitions is preserved.

* 🪆 fix: Codex P2 — wipe subagent atoms on conversation switch

`clearStepMaps()` intentionally doesn't reset `subagentProgressByToolCallId`
so a user can reopen a completed subagent's dialog mid-conversation,
but `resetSubagentAtoms` was defined and never exposed / called —
so each completed run's aggregated `contentParts` + `tickerState`
stayed resident in the `atomFamily` for the whole app session.
Unbounded growth across multi-conversation sessions.

Expose `resetSubagentAtoms` from `useStepHandler` and fire it from
`useEventHandlers` whenever the URL's `conversationId` changes.
That's the correct cleanup boundary: historical subagent dialogs
rehydrate from persisted `subagent_content` on each `tool_call` at
message-save time, so wiping live atoms on switch doesn't lose any
viewable history — it just releases per-tool-call state that the
old conversation's components no longer subscribe to.

* 🪆 fix: Codex round 3 — subagent registry isolation + post-run label

Two more valid findings.

**P1 — parent-order registry mutation leaks into subagent inputs.**
`overrideDeferLoadingForDiscoveredTools` mutates `agent.toolRegistry`
in place (the Map *and* the LCTool objects inside it). When an agent
appears both as a handoff target (normal graph node) AND an explicit
subagent child, a subagent build that ran before the parent's build
captures a reference to the same registry — the parent's later mutation
leaks through to the child.

Fix: for subagent children (`isSubagent`), clone the `toolRegistry`
Map and shallow-clone each LCTool inside before returning the inputs.
`defer_loading` flips on parent-graph registry mutations can't
propagate across the clone boundary. `toolDefinitions` also gets a
shallow-copy pass so the same isolation holds for definitions the
child carries directly.

**P2 — "Running N agents" label stuck after cancel/error.**
ToolCallGroup's all-subagent label was gated only on `allCompleted`,
which requires every child to have `hasOutput || progress === 1`.
A subagent that gets cancelled (stream ends, no `stop` phase, no
output) never satisfies that — so even after `isSubmitting` flips
false, the header stays on "Running N agents" while each individual
card correctly shows "Cancelled agent".

Fix: derive a `subagentsDone` flag as `allCompleted || !isSubmitting`
and gate the past-tense label on that. Matches the tri-state each
SubagentCall card already resolves (finished / cancelled / running).

* 🪆 fix: Codex P2 — ACL-check subagents.agent_ids on create/update

Codex flagged that `subagents.agent_ids` was accepted as arbitrary
strings on the create/update routes while `edges` got a
`validateEdgeAgentAccess` pass — so users could save subagent
references to agents they can't VIEW. At runtime `initializeClient`'s
`processAgent` ACL gate silently drops those, so the persisted
configuration and the actual behavior diverged in a way that is
difficult to diagnose.

Refactor: extract the id-set → unauthorized-ids check into a shared
`collectUnauthorizedAgentIds`, wrap it with a dedicated
`validateSubagentAccess`, and plumb the same 403-on-failure response
the edge path already returns. Applied on both POST /agents and
PATCH /agents/:id.

* 🪆 fix: Codex round 5 — ACL-disable escape hatch + ticker order

Two valid findings.

**P1 — can't disable subagents after losing access to a child.**
The subagent ACL check ran on every create/update that echoed back
the `agent_ids` list, even when the user was explicitly disabling
the feature. The UI keeps the list intact when toggling `enabled:
false`, so a user who subsequently lost VIEW on any child would be
locked in a 403 loop — every edit (including the one that turns
subagents off) bounces.

Fix: gate the ACL check on `subagents.enabled !== false` at both
the POST /agents and PATCH /agents/:id handlers. Empty list stays
a no-op. Disabling the feature is always permitted.

**P2 — ticker fold merges out-of-order previews across delta
switches.** `foldSubagentEventIntoTicker` carried `textLineIdx` /
`thinkLineIdx` across a reasoning → text → reasoning transition,
so the second reasoning chunk appended to the original reasoning
line instead of starting a new chronological one.

Fix: close the opposite buffer + cursor when a delta-type switch
is detected (same rule the content-parts reducer already applies).
Added a regression test.

* 🪆 fix: Codex round 6 — preserve mid-stream atoms + honor sequential suppression

Two valid findings.

**P2 — atom reset fires on initial chat URL assignment.**
`useEventHandlers` initialized `lastConversationIdRef` from the URL's
current `paramId`, then reset subagent atoms whenever the ref and
`paramId` disagreed. For a brand-new conversation the URL stamp goes
from `undefined → "abc123"` while the first response is still
streaming, which used to drop subagent ticker/content state mid-run
and leave dialogs missing earlier updates.

Fix: only reset when *both* the old and new IDs are non-null and
differ — i.e. a user-initiated switch between two established
conversations. The initial assignment passes through without
clearing.

**P2 — ON_SUBAGENT_UPDATE bypassed `hide_sequential_outputs`.**
Every other streaming handler in `callbacks.js`
(`ON_RUN_STEP`, `ON_MESSAGE_DELTA`, etc.) gates emission on
`checkIfLastAgent` + `metadata?.hide_sequential_outputs`, but the
subagent forwarder did an unconditional `emitEvent` — so intermediate
agents in a sequential chain were leaking their children's activity
to the client even when the chain was configured to suppress
intermediates.

Fix: accept `metadata` and apply the same `isLastAgent ||
!hide_sequential_outputs` gate. Aggregation still runs regardless of
visibility (persistence + dialog depend on it); only the SSE forward
is suppressed.

* 🪆 fix: Codex P2 — gate subagent ACL check on endpoint capability

`validateSubagentAccess` ran on every create/update where
`subagents.enabled !== false`, regardless of the endpoint-level
`subagents` capability. When the capability is off at the appConfig
level, `initializeClient` already strips the `subagents` block at
runtime — so persisted `agent_ids` are inert — but the validation
could still 403 on a legacy record whose referenced child is no
longer viewable, blocking unrelated edits.

Fix: add `isSubagentsCapabilityEnabled(req)` that reads the agents
endpoint's capabilities from `req.config` and gate both the create
and update ACL checks on it. Capability-off environments can update
agents with stale `subagents` data freely; capability-on keeps the
full ACL protection.

* 🪆 fix: Codex P2 — reset subagent atoms on id→null navigation too

Previous guard (both-established) skipped the reset whenever
`paramId` became null/undefined, so navigating from an existing
chat to a "new chat" route left stale subagent progress resident
in the `atomFamily` until the user picked a specific different
chat.

Swap the both-established check for a one-time flag: skip only the
very first `undefined → id` transition (the brand-new-chat URL
stamp that happens mid-stream), then reset on any subsequent
change — id→id, id→null, null→id-after-reset. If the user started
on an established chat the flag is true at mount, so the guard is
a no-op and every navigation resets normally.

* 🪆 fix: Codex round 9 — subagent persistence gate + handoff children

Two valid findings.

**P1 — hide_sequential_outputs also gates persistence.**
The previous fix gated the SSE forward on
`isLastAgent || !hide_sequential_outputs` but still ran the
per-tool-call `createContentAggregator` aggregation unconditionally.
`finalizeSubagentContent` would then attach the hidden intermediate
agent's child reasoning / tool output to the saved message, so a
page refresh could reveal activity that was intentionally suppressed
live. Move the visibility gate to the top of the handler — hidden
agents now skip both aggregation and emission, so
"hide_sequential_outputs" is a consistent "don't record" rule for
subagent traces.

**P2 — handoff agents' explicit subagents were silently dropped.**
`initializeClient` only resolved `subagentAgentConfigs` for the
primary config, so an agent used via handoff that had its own
`subagents.agent_ids` saved in the builder would get self-spawn
only; every explicit child was quietly ignored, creating a
saved-config / runtime mismatch the user couldn't diagnose.

Extract the resolution into a shared `loadSubagentsFor(config)`
helper and invoke it for the primary and every handoff agent in
`agentConfigs`. The `edgeAgentIds` precomputation stays outside
the helper (it's loop-invariant). Capability-off shortcuts return
empty early so the existing strip-on-capability-off path still
holds.

* 🪆 fix: Codex P2 — recursive subagent build for multi-level delegation

Previously only the outer `agents[]` loop attached `subagentConfigs`
to its inputs, so a child used as a subagent (invoked via the
`subagent` tool) lost every explicit spawn target of its own. A
user-valid configuration like A → B → C would only run the top
layer; B could never actually delegate to C from inside A's run.

Recursively build `subagentConfigs` for each child inside
`buildSubagentConfigs`, passing the child's freshly-constructed
`childInputs` down so its own `subagents.enabled` children get
resolved too. Added cycle protection via an `ancestors` Set — a
configuration like A → B → A is safely cut off at the second
encounter of A rather than recursing forever (the existing
`child.id === agent.id` guard already prevents the direct self-loop).

* 🪆 fix: Codex P2 — reset subagent atoms on useEventHandlers unmount

The effect that resets subagent atoms only fired on `paramId` change,
so unmounting the chat container (route change away from /c) never
flushed the atoms. `knownSubagentAtomKeys` lives in a ref inside
`useStepHandler` — once the hook unmounts the ref is gone, so a
subsequent remount can't clean atoms it never registered.

Added a second `useEffect` that only runs cleanup on unmount (empty
deps aside from the stable `resetSubagentAtoms` callback). Keeps
`atomFamily` bounded across full route teardowns too.

* 🪆 fix: Codex round 13 — cyclic subagent guard + prefer persisted

Two valid findings.

**P1 — cyclic subagent ref reloads the primary.** A configuration
like `A ↔ B` (B lists A as its own subagent) would send
`loadSubagentsFor` down a path that couldn't find A in
`agentConfigs` (the primary isn't stored there), so it called
`processAgent(A)` a second time. That inserts a fresh config for
the primary id, which downstream duplicates in
`[primary, ...agentConfigs.values()]` and can replace the primary's
tool context with the reloaded copy.

Fix: short-circuit when a subagent ref points back at
`primaryConfig.id` — reuse the already-loaded primary config.
Primary is always an edge id so no pruning bookkeeping needed.

**P2 — live atom preferred over canonical persisted trace.** The
dialog picked `progress.contentParts` ahead of
`persistedContent`, but the Recoil bucket is best-effort — after
a disconnect/reconnect it can be stale or partial. The server's
`subagent_content` on the `tool_call` is the canonical record
refreshed on sync. Preferring live could hide completed
tool/reasoning history that was actually persisted.

Fix: flip the preference order. Persisted wins when it's
non-empty; live covers the mid-stream window (before the parent
message saves, persisted is empty) and the older-runs fallback.
Updated the test that enforced the old order to lock the new
semantics in (separate mid-stream live-fallback assertion kept).

* 🪆 fix: Codex P2 — subagent atom reset rule simplified to 'leaving established id'

The `hasEstablishedConversationRef` + check for initial undefined→id
covered the first navigation but missed the equivalent mid-stream
URL stamp when a user goes from an existing chat to a new chat and
sends a message there (`id → null → newId`). The null → newId
transition was still hitting the reset branch and wiping the
in-flight subagent ticker/content for that first turn.

Simpler rule: only reset when the PREVIOUS paramId is an established
id. Every transition AWAY from an established chat clears (id→id2,
id→null, id→undefined); every transition FROM null/undefined passes
through (initial mount, new-chat URL stamp mid-stream). Drop the
`hasEstablishedConversationRef` machinery in favor of that single
condition.

* 🪆 fix: Codex P2 — match runtime's strict subagent enable check in ACL

Runtime (`initializeClient` + `run.ts`) treats `subagents?.enabled`
as a truthy predicate — `undefined`, `null`, missing, and `false`
all short-circuit. The ACL gate was using `!== false` which
accepted `undefined` / missing as "enabled" and could 403 a payload
whose subagent tool would be inert at runtime.

Swap both create and update to `enabled === true`. Only a strictly-
enabled payload triggers the ACL check; the disable path (`false`)
still passes through so a user who lost VIEW on a child can still
save the disable edit.

* 🪆 fix: Codex P2 — reject missing subagent references with 400

`validateSubagentAccess` collapsed through `collectUnauthorizedAgentIds`,
which returns an empty list for ids with no DB record — so typos and
references to deleted agents passed validation silently, and
`initializeClient` later dropped them at runtime. Saved config would
then list spawn targets that the backend never honored, a hard-to-
diagnose drift.

Refactor the helper into `classifyAgentReferences(ids, …)` which
returns `{ missing, unauthorized }` separately. `validateEdgeAgentAccess`
keeps its old semantics (missing is intentional — a self-referential
`from` names the agent being created). `validateSubagentReferences`
surfaces both buckets so the create/update handlers can 400 on
missing and 403 on unauthorized with distinct error messages and
`agent_ids` lists.

* 🪆 polish: tighten subagent dialog grid gap to gap-2

OGDialogContent's grid default is `gap-4`, which renders the title,
description, and scroll area as three visually separated panels.
Drop to `gap-2` so they read as one block.

* 🪆 polish: swap Subagents above Handoffs in Advanced panel

Subagents is the more common knob users reach for, so show it first.
Handoffs keep the same Controller wiring, just move below.
2026-04-25 04:02:01 -04:00
Danny Avila
35bf04b26c 🧰 refactor: Unify code-execution tools (#12767)
* 🛠️ feat: Add registerCodeExecutionTools helper

Idempotently registers `bash_tool` + `read_file` in the run's tool
registry and tool-definition list via a registry `.has()` dedupe. Sets
up the single code-execution tool path shared by:

- `initializeAgent` (when an agent has `execute_code` in its tools and
  the capability is enabled for the run)
- `injectSkillCatalog` (when skills are active; unconditional read_file,
  bash_tool follows `codeEnvAvailable`)

Both callers reach the helper in the same initialization sequence, so
the second call becomes a no-op and exactly one copy of each tool
reaches the LLM — no more double registration for agents that combine
`execute_code` capability with active skills.

Unit-tested on a fresh run, idempotence (second call, overlap with
prior tooldefs, partial overlap), and the no-registry variant.

* 🔀 refactor: Route injectSkillCatalog bash_tool + read_file through registerCodeExecutionTools

The `skill` tool is still registered inline (it's skill-path-specific),
but `bash_tool` + `read_file` now flow through the shared idempotent
helper so a prior registration from the execute_code path doesn't
produce a duplicate copy later in the same run. Behavior preserved:

- `read_file` always registers when any active skill is in scope —
  manually-primed `disable-model-invocation: true` skills still need it
  to load `references/*` from storage.
- `bash_tool` follows `codeEnvAvailable` exactly as before.

Adds a test pinning the cross-call dedupe: when `injectSkillCatalog`
runs AFTER `registerCodeExecutionTools` has already seeded the registry
+ tool definitions with bash_tool/read_file, the resulting
`toolDefinitions` still contains exactly one copy of each.

* 🪄 feat: Expand `execute_code` tool name into bash_tool + read_file at initialize-time

When an agent's `tools` include `execute_code` and the `execute_code`
capability is enabled for the run, `initializeAgent` now registers
`bash_tool` + `read_file` via `registerCodeExecutionTools` before
`injectSkillCatalog`. The legacy `execute_code` tool definition is no
longer handed to the LLM — `execute_code` remains on the agent
document as a capability-trigger marker, but the runtime expands it
into the skill-flavored tool pair.

Call ordering matters: the `execute_code` registration runs BEFORE
`injectSkillCatalog`, so the skill path's own `registerCodeExecutionTools`
call inside `injectSkillCatalog` becomes a no-op via the registry's
`.has()` check. Exactly one copy of each tool reaches the LLM whether
the agent has:

- only `execute_code` (legacy path)
- only skills
- both

No data migration needed — `agent.tools: ['execute_code']` stays in
the DB unchanged; the expansion is a runtime operation.

Three tests cover the matrix: execute_code + capability on →
bash_tool + read_file registered; execute_code + capability off →
neither registered; no execute_code + capability on → neither
registered.

* 🗑️ refactor: Drop CodeExecutionToolDefinition from the builtin registry

Removes the legacy `execute_code` entry from `agentToolDefinitions` and
the corresponding import. With the initialize-time expansion in place,
nothing consults `getToolDefinition('execute_code')` for a tool schema
any more — the capability gate still filters on the string
`execute_code`, but the actual tool definitions the LLM sees come from
`registerCodeExecutionTools` (i.e. `bash_tool` + `read_file`).

`loadToolDefinitions` in `packages/api/src/tools/definitions.ts`
silently drops `execute_code` when it no longer resolves in the
registry — that's the expected path and is now covered by an updated
test. No caller of `getToolDefinition('execute_code')` expects a
non-undefined result after this change.

* 🔌 refactor: Read CODE_API_KEY from env for primeCodeFiles + PTC

Finishes the Phase 4 server-env-keyed rollout on the two remaining
`loadAuthValues({ authFields: [EnvVar.CODE_API_KEY] })` sites in
`ToolService.js`:

- `primeCodeFiles` (user-attached file priming on execute_code agents)
- Programmatic Tool Calling (`createProgrammaticToolCallingTool`)

Both now read `process.env[EnvVar.CODE_API_KEY]` directly, matching
`bash_tool`'s pattern. The per-user plugin-auth path is no longer
consulted for code-env credentials anywhere in the hot path — the
agents library owns the actual tool-call execution and also reads the
env var internally.

Priming still fires for existing user-file workflows so the legacy
`toolContextMap[execute_code]` hint ("files available at /mnt/data/...")
stays in the prompt; only the key lookup changed.

* 🔧 fix: Type the pre-seeded dedupe-test tools as LCTool

CI TypeScript type checks caught `{ parameters: {} }` in the new
cross-call dedupe test: `LCTool.parameters` is a `JsonSchemaType`,
not `{}`. Use `{ type: 'object', properties: {} }` and type the
local registry Map through the parameter-derived shape so the
pre-seeded values match what `toolRegistry.set` expects.

* 🛡️ fix: Run execute_code expansion before GOOGLE_TOOL_CONFLICT gate

Codex review caught a latent regression: the original Phase 8 placement
ran `registerCodeExecutionTools` after `hasAgentTools` was computed,
so an execute-code-only agent on Google/Vertex with provider-specific
`options.tools` populated would no longer trip `GOOGLE_TOOL_CONFLICT`
— the legacy `CodeExecutionToolDefinition` used to populate
`toolDefinitions` before the guard, but after dropping it from the
registry, `toolDefinitions` stayed empty until my expansion ran
downstream of the guard. Mixed provider + agent tools would silently
flow through to the LLM.

Fix moves the `execute_code` expansion to BEFORE `hasAgentTools`
computation. `bash_tool` + `read_file` now contribute to the check
the same way the legacy `execute_code` def did. Covered by a new
test that pins the Google+execute_code+provider-tools scenario —
the `rejects.toThrow(/google_tool_conflict/)` path would have
silently passed on the prior placement.

* 🔗 fix: Thread codeEnvAvailable through handoff sub-agents

Round-2 codex review caught the other half of the execute_code
expansion gap: `discoverConnectedAgents` omitted `codeEnvAvailable`
from its forwarded `initializeAgent` params, so handoff sub-agents
with `agent.tools: ['execute_code']` lost the `bash_tool` + `read_file`
registration (pre-Phase 8 the legacy `CodeExecutionToolDefinition`
would have landed in their `toolDefinitions` via the registry).

- Add `codeEnvAvailable?` to `DiscoverConnectedAgentsParams` and
  forward it verbatim on every sub-agent `initializeAgent` call.
- Update the three JS call sites that construct the primary's
  `codeEnvAvailable` (`services/Endpoints/agents/initialize.js`,
  `controllers/agents/openai.js`, `controllers/agents/responses.js`)
  to pass the same flag into `discoverConnectedAgents` — one
  authoritative source per request.
- Two regression tests in `discovery.spec.ts` pin the true/false
  passthrough so a future refactor that drops the param-forwarding
  surfaces immediately.

Left intentionally unchanged: `packages/api/src/agents/openai/service.ts`
(public API helper with no in-repo caller). External consumers of
`createAgentChatCompletion` who want code execution should pass a
`codeEnvAvailable`-aware `initializeAgent` via `deps` — documenting
the full public-API surface is out of scope for this Phase 8 PR.

* 🔗 fix: Thread codeEnvAvailable through addedConvo + memory-agent paths

Round-3 codex review caught the last two production `initializeAgent`
callers missing the Phase-8 capability flag:

- `api/server/services/Endpoints/agents/addedConvo.js` (multi-convo
  parallel agent execution). Added `codeEnvAvailable` to
  `processAddedConvo`'s destructured params and forwarded it into
  the per-added-agent `initializeAgent` call. Caller in
  `api/server/services/Endpoints/agents/initialize.js` passes the
  same `codeEnvAvailable` it computed for the primary.
- `api/server/controllers/agents/client.js` (`useMemory` — memory
  extraction agent). Computes its own `codeEnvAvailable` from
  `appConfig?.endpoints?.[EModelEndpoint.agents]?.capabilities` and
  forwards into `initializeAgent`. Memory agents rarely list
  `execute_code`, but if one does, pre-Phase 8 they got the legacy
  `execute_code` tool registered unconditionally — the passthrough
  restores parity.

With this, every production caller of `initializeAgent` explicitly
resolves the capability: main chat flow (primary + handoff), OpenAI
chat completions (primary + handoff), Responses API (primary + handoff),
added convo parallel agents, and memory agents. The one remaining
caller, `packages/api/src/agents/openai/service.ts::createAgentChatCompletion`,
is a public API helper with no in-repo consumer (external callers
must pass a capability-aware `initializeAgent` via `deps`).

* 🪤 fix: Remove duplicate appConfig declaration causing TDZ ReferenceError

The Responses API controller had TWO `const appConfig = req.config;`
bindings inside `createResponse`: one at the top of the function
(added by the Phase 4 `bash_tool` decouple) and one inside the try
block (added by the polish PR #12760). Because `const` is block-scoped
with a temporal dead zone, the inner redeclaration put `appConfig` in
TDZ for the entire try block, so any earlier reference inside the
try — notably `appConfig?.endpoints?.[EModelEndpoint.agents]?.allowedProviders`
at line 348 — threw `ReferenceError: Cannot access 'appConfig'
before initialization`. The error was silently swallowed by the
outer try/catch, leaving `recordCollectedUsage` unreached and the
six `responses.unit.spec.js` token-usage tests failing.

Removing the inner redeclaration fixes the six failing tests
(verified: 11/11 pass locally post-fix, 0 regressions elsewhere).
The outer function-scoped binding already provides `appConfig` to
every downstream reference.

* 🔗 fix: Thread codeEnvAvailable through the OpenAI chat-completion public API

Round-4 codex review (legitimate on the type-safety angle, even though the
runtime concern was already covered): the `createAgentChatCompletion`
helper defines its own narrower `InitializeAgentParams` interface locally,
and the type was missing `codeEnvAvailable`. External consumers who
supply a capability-aware `deps.initializeAgent` couldn't route
`codeEnvAvailable` through without a type-cast workaround.

- Widen the local `InitializeAgentParams` interface to include
  `codeEnvAvailable?: boolean` (matches the real
  `packages/api/src/agents/initialize.ts` type).
- Derive `codeEnvAvailable` inside `createAgentChatCompletion` from
  `deps.appConfig?.endpoints?.agents?.capabilities` (the same source
  the in-repo controllers use) and forward to `deps.initializeAgent`.
  Uses a string literal `'execute_code'` lookup so this file stays free
  of a `librechat-data-provider` import — keeping the dependency surface
  of the public helper minimal.

With this, external consumers of `createAgentChatCompletion` who pass
`appConfig` with the agents capabilities get `bash_tool` + `read_file`
registration automatically; consumers who don't pass `appConfig` retain
the existing "explicit opt-in" semantics (the flag stays `undefined`,
expansion is skipped).

* 🧹 chore: Review-driven polish — observability log, JSDoc DRY, test gaps, no-op allocation

Addresses the comprehensive review of PR #12767:

- **Finding #1** (MINOR, observability): `initializeAgent` now emits a
  debug log when an agent lists `execute_code` in its tools but the
  runtime gate is off (`params.codeEnvAvailable` !== true). The
  event-driven `loadToolDefinitionsWrapper` path doesn't log
  capability-disabled warnings, so without this the tool silently
  vanishes from the LLM's definitions with zero trace. Operators
  debugging "why isn't code interpreter working?" now get a signal at
  the initialize layer.

- **Finding #5** (NIT, allocation): `registerCodeExecutionTools` now
  returns the input `toolDefinitions` array by reference on the no-op
  path (both tools already registered by a prior caller in the same
  run) instead of allocating a fresh spread array every time. The
  common dual-call scenario — `initializeAgent` then
  `injectSkillCatalog` — saves one O(n) copy per request.

- **Finding #4** (NIT, DRY): Collapsed the duplicated 6-line JSDoc
  comment in `openai.js`, `responses.js`, and `addedConvo.js` into
  either a one-line `@see DiscoverConnectedAgentsParams.codeEnvAvailable`
  pointer (the two JS call sites) or a compact 3-line block referring
  back to the canonical source (addedConvo's @param).

- **Finding #2** (MINOR, test gap): Added
  `api/server/services/Endpoints/agents/addedConvo.spec.js` with three
  cases covering `codeEnvAvailable=true`, `codeEnvAvailable=false`,
  and omitted (undefined) passthrough. A future refactor that drops
  the param from destructuring now surfaces here instead of silently
  regressing multi-convo parallel agents with `execute_code`.

- **Finding #3** (MINOR, test gap): Added
  `api/server/controllers/agents/__tests__/client.memory.spec.js`
  pinning the capability-flag derivation that `AgentClient::useMemory`
  uses — six cases covering present/absent/null/undefined config shapes
  plus an enum-literal pin (`'execute_code'` / `'agents'`). Catches
  enum renames or config-path shifts that would otherwise silently
  strip `bash_tool` + `read_file` from memory agents.

Finding #7 (jest.mock scoping, confidence 40) left as-is: the
reviewer's own risk assessment noted `buildToolSet` doesn't touch
the mocked exports, and restructuring a file-level `jest.mock` to
`jest.doMock` + dynamic `import()` introduces more complexity than
the speculative risk justifies. The existing mock is scoped to the
test file and contains the same stubs the adjacent
`skills.test.ts` already uses.

Finding #6 (PR description commit count) addressed out-of-band via
PR description update.

All existing tests pass, typecheck clean, lint clean across touched
files. New tests: 9 cases across 2 new spec files.

* 🧽 refactor: Replace hardcoded 'execute_code' string with AgentCapabilities enum in service.ts

Follow-up review (conf 55) caught that `openai/service.ts`'s Phase 8
`codeEnvAvailable` derivation used the literal `'execute_code'` while
every in-repo controller uses `AgentCapabilities.execute_code` from
`librechat-data-provider`. The file deliberately uses local type
interfaces to keep the public API helper's type surface small, but
that pattern was never a ban on single-value imports from the data
provider — `packages/api` already depends on it. Importing the enum
value means a future rename of `AgentCapabilities.execute_code`
propagates to this file automatically, matching the in-repo
controllers' behavior.

Other follow-up findings left as-is per the reviewer's own verdict:

- #2 (memory spec mirrors the production expression rather than
  calling `AgentClient::useMemory` directly): reviewer flagged as
  "not blocking" / "design-philosophy observation." The test file's
  JSDoc already explicitly documents the tradeoff and pins the enum
  literals to catch the most likely drift vector. Standing up
  `AgentClient` + all its mocks for a one-line regression guard is
  disproportionate.
- #3 (`addedConvo.spec.js` mock signature vs. underlying
  `loadAddedAgent` arity): reviewer's own confidence 25 noted the
  mock matches the wrapper's actual call pattern in the production
  file. Not a real gap.
- #4 was self-retracted as a false alarm.

* 🗑️ refactor: Fully deprecate CODE_API_KEY — remove all LibreChat-side references

The code-execution sandbox no longer authenticates via a per-run
`CODE_API_KEY` (frontend or backend). Auth moved server-side into the
agents library / sandbox service, so LibreChat drops every reference:

**Backend plumbing:**
- `api/server/services/Files/Code/crud.js`: `getCodeOutputDownloadStream`,
  `uploadCodeEnvFile`, `batchUploadCodeEnvFiles` no longer accept
  `apiKey` or send the `X-API-Key` header.
- `api/server/services/Files/Code/process.js`: `processCodeOutput`,
  `getSessionInfo`, `primeFiles` drop the `apiKey` param throughout.
- `api/server/services/ToolService.js`: stop reading
  `process.env[EnvVar.CODE_API_KEY]` for `primeCodeFiles` and PTC; the
  agents library handles auth internally. Remove the now-dead
  `loadAuthValues` + `EnvVar` imports. Drop the misleading
  "LIBRECHAT_CODE_API_KEY" hint from the bash_tool error log.
- `api/server/services/Files/process.js`: remove the `loadAuthValues`
  call around `uploadCodeEnvFile`.
- `api/server/routes/files/files.js`: code-env file download no longer
  fetches a per-user key.
- `api/server/controllers/tools.js`: `execute_code` is no longer a
  tool that needs verifyToolAuth with `[EnvVar.CODE_API_KEY]` — the
  endpoint always reports system-authenticated so the client skips
  the key-entry dialog. `processCodeOutput` called without `apiKey`.
- `api/server/controllers/agents/callbacks.js`: `processCodeOutput`
  invoked without the loadAuthValues round trip, for both LegacyHandler
  and Responses-API handlers.
- `api/app/clients/tools/util/handleTools.js`: `createCodeExecutionTool`
  called with just `user_id` + files.

**packages/api:**
- `packages/api/src/agents/skillFiles.ts`: `PrimeSkillFilesParams`,
  `PrimeInvokedSkillsDeps`, `primeSkillFiles`, `primeInvokedSkills` all
  drop the `apiKey` param; the gate is purely `codeEnvAvailable`.
- `packages/api/src/agents/handlers.ts`: `handleSkillToolCall` drops
  the `process.env[EnvVar.CODE_API_KEY]` read; skill-file priming is
  now gated solely on `codeEnvAvailable`. `ToolExecuteOptions`
  signatures drop apiKey from `batchUploadCodeEnvFiles` and
  `getSessionInfo`.
- `packages/api/src/agents/skillConfigurable.ts`: JSDoc no longer
  references the env var.
- `packages/api/src/tools/classification.ts`: PTC creation no longer
  gated on `loadAuthValues`; `buildToolClassification` drops the
  `loadAuthValues` dep entirely (no LibreChat-side callers need it for
  this path anymore).
- `packages/api/src/tools/definitions.ts`: `LoadToolDefinitionsDeps`
  drops the `loadAuthValues` field.

**Frontend:**
- Delete `client/src/hooks/Plugins/useAuthCodeTool.ts`,
  `useCodeApiKeyForm.ts`, and
  `client/src/components/SidePanel/Agents/Code/ApiKeyDialog.tsx` —
  the install/revoke dialogs for CODE_API_KEY are fully dead.
- `BadgeRowContext.tsx`: drop `codeApiKeyForm` from the context type and
  provider. `codeInterpreter` toggle treated as always authenticated
  (sandbox auth is server-side).
- `ToolsDropdown.tsx`, `ToolDialogs.tsx`, `CodeInterpreter.tsx`,
  `RunCode.tsx`, `SidePanel/Agents/Code/Action.tsx` +`Form.tsx`: all
  API-key dialog trigger refs, "Configure code interpreter" gear
  buttons, and auth-verification plumbing removed. The
  "Code Interpreter" toggle is now a plain `AgentCapabilities.execute_code`
  checkbox — no key-entry gate.
- `client/src/locales/en/translation.json`: drop the three
  `com_ui_librechat_code_api*` keys and `com_ui_add_code_interpreter_api_key`.
  Other locales are externally automated per CLAUDE.md.

**Config:**
- `.env.example`: remove the `# LIBRECHAT_CODE_API_KEY=your-key` section
  and its header.

**Tests:**
- `crud.spec.js`: assertions flipped to pin "no X-API-Key header" and
  "no apiKey param".
- `skillFiles.spec.ts`: removed env-var save/restore; tests now pin
  that the batch-upload path is gated solely on `codeEnvAvailable` and
  that no apiKey is threaded through.
- `handlers.spec.ts`: same — just the `codeEnvAvailable` gate pins
  remain.
- `classification.spec.ts`: remove the two tests that asserted
  `loadAuthValues` was (not) called for PTC.
- `definitions.spec.ts`: drop every `loadAuthValues: mockLoadAuthValues`
  entry from the deps shape.
- `process.spec.js`: strip the mock of `EnvVar.CODE_API_KEY`.

**Comment hygiene:**
- `tools.ts`, `initialize.ts`, `registry/definitions.ts`: shortened
  stale comment references to "legacy `execute_code` tool" without
  naming the retired env var.

Tests verified: 678 packages/api tests pass, 836 backend api tests
pass. Typecheck clean, lint clean. Only remaining CODE_API_KEY
mentions in the code are two regression-guard assertions:
- `crud.spec.js`: pins "no X-API-Key header" stays absent.
- `skillConfigurable.spec.ts`: pins `configurable` never grows a
  `codeApiKey` field.

* 🧹 chore: Remove the last two CODE_API_KEY name mentions in LibreChat

Follow-up to the prior full deprecation commit: two tests still named
the retired identifier in their regression-guard assertions.

- `packages/api/src/agents/skillConfigurable.spec.ts`: drop the
  "does not inject a codeApiKey key" test. The `codeApiKey` field is
  gone from the production configurable shape, so an absence-assertion
  naming it re-introduces the retired identifier in code.
- `api/server/services/Files/Code/crud.spec.js`: rename the
  "without an X-API-Key header" case back to "should request stream
  response from the correct URL" and drop the
  `expect(headers).not.toHaveProperty('X-API-Key')` assertion. The
  surrounding request-shape checks (URL, timeout, responseType) still
  pin the behavior; the explicit header-absence line was named-after
  the deprecated contract.

Result: `grep -rn "CODE_API_KEY\|codeApiKey\|LIBRECHAT_CODE_API_KEY"`
against the LibreChat source tree returns zero hits. The only
remaining `X-API-Key` strings in this repo are on unrelated OpenAPI
Action + MCP server auth configurations, where the string is
user-facing config, not a LibreChat-owned identifier.

Tests: 677 packages/api pass (2 pre-existing summarization e2e failures
unrelated); 126 api-workspace controller/service tests pass.
Typecheck and lint clean.

* 🎯 fix: Narrow codeEnvAvailable to per-agent (admin cap AND agent.tools)

Before this commit, `codeEnvAvailable` was computed in the three JS
controllers as the admin-level capability flag only
(`enabledCapabilities.has(AgentCapabilities.execute_code)`) and passed
through `initializeAgent` → `injectSkillCatalog` / `primeInvokedSkills` /
`enrichWithSkillConfigurable` unchanged. A skills-only agent whose
`tools` array didn't include `execute_code` still got `bash_tool`
registered (via `injectSkillCatalog`) and skill files re-primed to the
sandbox on every turn — wrong, because the agent never opted in to
code execution.

**Fix:** `initializeAgent` now computes the per-agent effective value
once as `params.codeEnvAvailable === true && agent.tools.includes(Tools.execute_code)`,
reuses the same boolean for:

1. The `execute_code` → `bash_tool + read_file` expansion gate
   (previously already consulted `agent.tools`; now shares the single
   `effectiveCodeEnvAvailable` binding).
2. The `injectSkillCatalog` call (previously got the raw admin flag).
3. The returned `InitializedAgent.codeEnvAvailable` field (new, typed as
   required boolean).

**Controllers (initialize.js, openai.js, responses.js):** store
`primaryConfig.codeEnvAvailable` in `agentToolContexts.set(primaryId, ...)`,
capture `config.codeEnvAvailable` in every handoff `onAgentInitialized`
callback, and read it from the per-agent ctx inside the
`toolExecuteOptions.loadTools` runtime closure. The hoisted
`const codeEnvAvailable = enabledCapabilities.has(...)` locals in the
two OpenAI-compat controllers are gone — they were shadowing the
narrowed per-agent value.

**primeInvokedSkills:** `handlePrimeInvokedSkills` in
`services/Endpoints/agents/initialize.js` now uses
`primaryConfig.codeEnvAvailable` (per-agent, narrowed) instead of the
raw admin flag. A skills-only primary agent won't re-prime historical
skill files to the sandbox even when the admin enabled the capability
globally.

**Efficiency:** one extra `&&` in `initializeAgent`. No runtime hot-path
cost — the `includes()` scan on `agent.tools` was already happening for
the `execute_code` expansion gate; it's now just bound to a local. Tool
execution closures read `ctx.codeEnvAvailable === true` (property
access + strict equality, O(1)).

**Ephemeral-agent note:** per-agent narrowing is authoritative for both
persisted and ephemeral flows. The ephemeral toggle
(`ephemeralAgent.execute_code`) is reconciled into `agent.tools`
upstream in `packages/api/src/agents/added.ts`, so
`agent.tools.includes('execute_code')` is the single source of truth
by the time `initializeAgent` runs.

**Tests:** two new regression tests pin the narrowing contract:

- `initialize.test.ts` — four-quadrant matrix on
  `InitializedAgent.codeEnvAvailable` (cap on × agent asks, cap on ×
  doesn't ask, cap off × asks, neither). Catches future refactors that
  drop either half of the AND.
- `skills.test.ts` — `injectSkillCatalog` with `codeEnvAvailable: false`
  against an active skill catalog must NOT register `bash_tool` even
  though it still registers `read_file` + `skill`. This is the state
  a skills-only agent gets post-narrowing.

All 191 affected packages/api tests pass + 836 backend api tests pass.
Typecheck clean, lint clean.

* 🧽 refactor: Comprehensive-review polish — hoist tool defs, pin verifyToolAuth contract, doc appConfig

Addresses the comprehensive review of Phase 8. Findings mapped:

**#1 (MINOR): `verifyToolAuth` unconditional auth for execute_code**
- Added doc comment explicitly stating the deployment contract
  (admin capability → reachable sandbox; no per-check health probe
  to keep UI-gate queries O(1)).
- New `api/server/controllers/__tests__/tools.verifyToolAuth.spec.js`
  with 4 regression tests pinning the contract:
  1. `authenticated: true` + `SYSTEM_DEFINED` for execute_code.
  2. 404 for unknown tool IDs.
  3. `loadAuthValues` is never consulted (catches a future revert
     that would resurface the per-user key-entry dialog).
  4. Response `message` is never `USER_PROVIDED`.

**#2 (MINOR): `openai/service.ts` undocumented `appConfig` dependency**
- Expanded the `ChatCompletionDependencies.appConfig` JSDoc to spell
  out that omitting it silently disables code execution for agents
  with `execute_code` in their tools. External consumers of
  `createAgentChatCompletion` now have the contract documented at
  the type boundary.

**#5 (NIT): `registerCodeExecutionTools` re-allocates tool defs**
- Hoisted `READ_FILE_DEF` and `BASH_TOOL_DEF` to module-level
  `Object.freeze`d constants. The shapes derive entirely from
  static `@librechat/agents` exports, so a single frozen object per
  tool is safe to share across every agent init. Eliminates the
  ~4-property allocations on every call (including the common
  second-call no-op path).

**#6 (NIT): Verbose history-priming comment in initialize.js**
- Trimmed the 16-line `handlePrimeInvokedSkills` block to a 5-line
  summary with `@see InitializedAgent.codeEnvAvailable` pointer.
  The canonical narrowing explanation lives on the type; the
  controller comment is just the ACL-vs-capability rationale.

**Skipped:**

- #3 (memory spec tests a mirror function): reviewer self-dismissed
  as a design tradeoff; the enum-literal pin already catches the
  highest-risk drift vector.
- #4 (cross-repo contract for `createCodeExecutionTool`): user will
  explicitly install the latest `@librechat/agents` dev version
  once the companion PR publishes, so the version pin will be
  authoritative.
- #7 (migration/deprecation note for self-hosters): out of scope
  per user direction — release notes handle this.

Tests verified: 679 packages/api + 840 backend api tests pass.
Typecheck + lint clean.

* 🔧 chore: Update @librechat/agents version to 3.1.68-dev.1 across package-lock and package.json files

This commit updates the version of the `@librechat/agents` package from `3.1.68-dev.0` to `3.1.68-dev.1` in the `package-lock.json` and relevant `package.json` files. This change ensures consistency across the project and incorporates any updates or fixes from the new version.
2026-04-25 04:02:01 -04:00
Danny Avila
91cd3f7b7c 🧽 refactor: Skills polish: precedence-aware body validation, controller drop logs, SkillPills rename (#12760)
Post-merge sanity-review cleanup on top of #12746:

- `createSkill` / `updateSkill` now parse SKILL.md body's always-apply
  status once and reuse it for both validation and derivation (was
  parsing the same YAML block twice per call).
- Body-inline `always-apply:` validation becomes precedence-aware: a
  caller sending an explicit top-level `alwaysApply` or a structured
  `frontmatter['always-apply']` no longer gets rejected for a typo in
  the body — the body value is never consulted at derivation time when
  a higher-precedence source wins. New tests cover the three relevant
  interactions (explicit+body-typo, frontmatter+body-typo, body-only
  typo still rejects).
- OpenAI and Responses controllers now emit a `logger.warn` when
  `injectSkillPrimes` drops always-apply primes to stay under
  `MAX_PRIMED_SKILLS_PER_TURN`. `injectSkillPrimes` already logs
  internally; the controller-level warn adds endpoint context so
  operators can identify which path hit the cap at a glance. Mirrors
  AgentClient's existing log.
- Rename `ManualSkillPills` → `SkillPills` (component + type + file +
  test + all JSDoc references). The component handles both manual and
  always-apply pills now; the original name was carried over from the
  manual-only Phase 3 and misleads new readers.
- Drive-by fix: declare `appConfig = req.config` at the top of
  `createResponse` in `responses.js` — it was used unqualified on
  lines 381/396, which silently evaluated to `undefined` (via optional
  chaining) and disabled the skills-capability check on the Responses
  endpoint. Pre-existing, surfaced by lint on the touched file.
2026-04-25 04:02:01 -04:00
Danny Avila
dfc3dfa57f 📍 feat: always-apply frontmatter: auto-prime skills every turn (#12746)
* 🔁 refactor: Rebase always-apply work onto merged structured-frontmatter columns

Phase 6 (disable-model-invocation / user-invocable / allowed-tools)
landed first on feat/agent-skills. Reconcile this branch with the new
mainline:

- Thread alwaysApplySkillPrimes through unionPrimeAllowedTools alongside
  manualSkillPrimes, applying the combined MAX_PRIMED_SKILLS_PER_TURN
  ceiling before loading tools.
- Add `_id` to ResolvedAlwaysApplySkill to match Phase 6's
  ResolvedManualSkill shape (read_file name-collision protection).
- Register 'always-apply' in ALLOWED_FRONTMATTER_KEYS / FRONTMATTER_KIND
  so Phase 6's validator recognizes it.
- Drop frontmatter from the listSkillsByAccess projection; the backfill
  helper remains as defensive code but its read path is no longer
  exercised on summary rows (no legacy rows exist — the branch never
  shipped), saving ~200KB per page.
- Retire the corresponding "backfills legacy on summaries" test.
- Plumb listAlwaysApplySkills through the JS controllers + endpoint
  initializer so the always-apply resolver sees a real DB method.

* 🧹 fix: Dedupe manual/always-apply overlap, share YAML util, tidy comments

Addresses review findings:

- Cross-list dedup: when a user $-invokes a skill that is also marked
  always-apply, the always-apply copy is now dropped so the same
  SKILL.md body never primes twice in one turn. Manual wins (explicit
  intent, closer to the user message). Dedup runs in both
  initializeAgent (so persisted user-bubble pills stay in sync) and
  injectSkillPrimes (defense-in-depth at splice time). New test cases
  cover single-overlap, partial-overlap, and dedup-before-cap.
- DRY: extract stripYamlTrailingComment to
  packages/data-schemas/src/utils/yaml.ts; packages/api/src/skills/import.ts
  now imports the shared helper. Also drop the redundant inner
  stripYamlTrailingComment call inside parseBooleanScalar — the call
  site already strips.
- Mark injectManualSkillPrimes as @deprecated in favor of
  injectSkillPrimes (kept for external consumers of @librechat/api).
- Document SKILL_TRIGGER_MODEL as forward-looking plumbing for the
  model-invoked path rather than leaving it as a bare unused export.
- Replace the stale "frontmatter is included" comment on
  listSkillsByAccess with an accurate explanation of why it was
  intentionally excluded.

* 🔒 fix: Include always-apply primes in skillPrimedIdsByName + clear alwaysApply on body opt-out

Two bugs flagged by Codex review:

P1 (read_file): `manualSkillPrimedIdsByName` only carried manual-invocation
primes, so an always-apply skill with `disable-model-invocation: true`
was blocked from reading its own bundled files, and same-name collisions
could resolve to a different doc than the one whose body got primed.
- Rename `buildManualSkillPrimedIdsByName` → `buildSkillPrimedIdsByName`
  (accepts both manual + always-apply prime arrays).
- Rename the configurable field `manualSkillPrimedIdsByName` →
  `skillPrimedIdsByName` throughout the plumbing (skillConfigurable.ts,
  handlers.ts, CJS callers, tests).
- Overlap resolution: manual wins on the rare edge case where the same
  name appears in both arrays (upstream dedup should prevent this, but
  defensive merging treats manual as authoritative).
- New tests: (1) gate-relaxation fires for always-apply primes, (2) `_id`
  pinning works for always-apply same-name collisions.

P2 (updateSkill): when a body update had no `always-apply:` key,
`extractAlwaysApplyFromBody` returned `absent` and the column was left
untouched. A skill that was once `alwaysApply: true` would keep
auto-priming even after its SKILL.md no longer declared the flag.
- Treat `absent` as a positive "not always-apply" declaration when the
  body is explicitly submitted; flip the column to `false`.
- Explicit top-level `alwaysApply` still wins (three-source precedence
  unchanged).
- New tests: body removes key → false, body has no frontmatter at all →
  false, explicit + body-without-key → explicit wins.

* 🧵 refactor: Collapse duplicate prime types + tighten parse + test hygiene

Sanity-check review follow-ups:

- Collapse `ResolvedManualSkill` / `ResolvedAlwaysApplySkill` into a
  single `ResolvedSkillPrime` canonical interface with two backward-
  compatible type aliases. Both resolvers feed the same pipeline stages
  (injectSkillPrimes, unionPrimeAllowedTools, buildSkillPrimedIdsByName);
  the per-source distinction lives on `additional_kwargs.trigger`, not
  on the resolver output.
- Move the `always-apply` branch in `parseFrontmatter` to operate on the
  raw post-colon text. The outer `unquoteYaml` was fine today because
  it's idempotent on non-quoted strings, but running it twice (once per
  line, once after stripping the inline comment) would be fragile if the
  unquoter ever grows richer YAML-escape handling.
- Add the missing `alwaysApplyDedupedFromManual: 0` field to the
  `injectSkillPrimes` mocks in `openai.spec.js` and `responses.unit.spec.js`
  so they match the full `InjectSkillPrimesResult` contract.
- Insert the blank line between the `unionPrimeAllowedTools` and
  `resolveAlwaysApplySkills` describe blocks.

* 🔧 fix(tsc): Cast mock.calls via `unknown` for strict tuple destructure

`getSkillByName.mock.calls[0]` is typed as `[]` by jest's generic default;
a direct cast to `[string, ..., ...]` fails TS2352 under `--noEmit` even
though the runtime shape matches. Go through `as unknown as [...]` like
the earlier test in the same file so CI's type-check step stays green.

* 🪢 fix: Propagate skillPrimedIdsByName into handoff agent tool context

Handoff agents go through the same `initializeAgent` flow as the primary
(with `listAlwaysApplySkills` now plumbed), so they resolve their own
`manualSkillPrimes` and `alwaysApplySkillPrimes` — but the
`agentToolContexts.set(...)` for handoff agents didn't carry
`skillPrimedIdsByName` into the per-agent context.

That meant `handleReadFileCall` fell back to the full ACL set + a
`prefer*` flag for handoff agents: same-name collisions could resolve to
a different doc than the one whose body got primed, and a
`disable-model-invocation: true` skill primed via manual `$` or
always-apply inside the handoff flow would be blocked from reading its
own bundled files.

Build the map via `buildSkillPrimedIdsByName(config.manualSkillPrimes,
config.alwaysApplySkillPrimes)` for every handoff tool context so
`read_file` behaves identically across primary and handoff agents.
2026-04-25 04:02:00 -04:00
Danny Avila
539c4c7e4d 🎬 feat: Prime Manually-Invoked Skills via $ Popover (#12709)
* 🎬 feat: Prime Manually-Invoked Skills via $ Popover

Lands the backend for manual skill invocation, making the $ popover
deterministically prime SKILL.md before the LLM turn instead of leaving
the model to discover the skill via the catalog.

Flow: popover drains pendingManualSkillsByConvoId on submit, attaches
names to the ask payload, controllers forward to initializeAgent, and
initialize resolves each name to its body (ACL + active-state filtered,
reusing the same rules as catalog injection). AgentClient splices the
primes as meta HumanMessages before the user's current message.

- Extract primeManualSkill / resolveManualSkills in packages/api/src/agents/skills.ts
  and reuse primeManualSkill inside handleSkillToolCall for a single shape source.
- Thread manualSkills + getSkillByName through InitializeAgentParams / DbMethods
  and all three initializeAgent call sites (initialize.js, responses.js, openai.js).
- Splice HumanMessage primes in client.js chatCompletion after formatAgentMessages,
  shifting indexTokenCountMap so hydrate still fills fresh positions correctly.
- Carry isMeta / source / skillName in additional_kwargs for downstream filtering.

* 🛡️ fix: Scope manual skill primes to single-agent + cap resolver input

Two follow-ups to the Phase 3 priming path flagged in Codex review.

Multi-agent runs: skipping the splice when agentConfigs is non-empty.
`initialMessages` is shared across every agent in `createRun`, so splicing
a skill body there would bypass Phase 1's per-agent `scopeSkillIds`
contract — a handoff / added-convo agent with a different skill scope
would see content its configuration excludes. Warn + skip is the minimal
correct behavior; lifting this to per-agent initial state is a follow-up.

Input bounding: `resolveManualSkills` now truncates to `MAX_MANUAL_SKILLS`
(10) after dedup, with a warn listing the dropped tail. Controllers only
validate `Array.isArray(req.body.manualSkills)`, so a crafted payload
could otherwise fan out into an unbounded `Promise.all` of concurrent
`getSkillByName` DB lookups. Cap lives in the resolver so every caller
(including future `always-apply` in Phase 5) inherits it.

* 🧪 refactor: Testable Helpers + Payload Validation for Manual Skill Primes

Follow-ups from the comprehensive review. No behavior change for the
happy path — these are architectural and defensive improvements that
shrink the JS surface in /api, tighten the request-body contract, and
cover the delicate splice logic with proper unit tests.

- Extract `injectManualSkillPrimes` into packages/api/src/agents/skills.ts
  so the message-array splice and `indexTokenCountMap` shift are unit-
  testable in TS. client.js now calls the helper. Tests pin the `>=`
  vs `>` boundary condition — a regression here would silently corrupt
  token accounting for every message after the insertion point.
- Extract `extractManualSkills(body)` and use in all three controllers
  (initialize.js, responses.js, openai.js). Replaces copy-pasted
  `Array.isArray(...) ? ... : undefined` with a helper that also filters
  non-string / empty elements — closes a type-safety gap where a crafted
  payload like `{"manualSkills": [123, {"$gt":""}]}` would otherwise reach
  `getSkillByName` and waste DB round-trips.
- Rename `primeManualSkill` → `buildSkillPrimeMessage`. The helper serves
  three invocation modes (`$` popover, `always-apply`, model-invoked);
  the old name misled readers coming from `handleSkillToolCall`.
- Add `loadable.state === 'hasValue'` guard in `drainPendingManualSkills`
  — defensive, since the atom has a synchronous `[]` default, but the
  previous `.contents` cast would have been unsound under loading/error.
- Document why `resolveManualSkills` honors the active-state filter even
  for explicit `$` selections (Phase 2 popover filter + API-direct
  hardening).
- Remove stray `void Types;` in initialize.test.ts — `Types` is already
  consumed elsewhere in that test.

* 🔖 refactor: Single source for the skill-message source marker

Export `SKILL_MESSAGE_SOURCE = 'skill'` and use it in both construction
paths that stamp skill-primed messages — `buildSkillPrimeMessage` (for
the model-invoked tool path) and `injectManualSkillPrimes` (for the
user-invoked splice path). Downstream filtering and telemetry read this
marker, so the two paths must agree; keeping the literal in one place
removes the risk of them drifting when Phase 5's `always-apply` adds a
third caller.

* ♻️ refactor: Drop Multi-Agent Guard + Review Polish

- Remove the multi-agent skip in `AgentClient.chatCompletion`. Leaking
  primes to handoff / added-convo agents via shared `initialMessages` is
  the agents SDK's concern to scope; this layer should just inject and
  let the graph handle agent-scoped state. The guard was well-intended
  but produced a silent-drop UX where `$skill` in a multi-agent run did
  nothing.
- Bound the `[resolveManualSkills] Truncating ...` warn output to the
  first 5 dropped names plus a count suffix. A malicious payload of
  1000 names was previously spilling all ~990 names into the log line.
- Remove dead `?? []` from the `hasValue`-guarded loadable read in
  `drainPendingManualSkills` — the atom always yields a string[] when
  resolved, so the nullish fallback was unreachable.
- Reorder skills.ts imports to follow the style guide: value imports
  shortest-to-longest (`data-schemas` → `langchain/core/messages` →
  multi-line `@librechat/agents`), type imports longest-to-shortest.

* 🧠 fix: Strip Skill Primes from Memory Window + Unbreak CI Mocks

Two fixes after the last push.

CI unbreak: `responses.unit.spec.js` and `openai.spec.js` mock
`@librechat/api` and the mock didn't expose the new `extractManualSkills`
symbol, so every test in those files crashed before reaching the
`recordCollectedUsage` assertion. Added `extractManualSkills: jest.fn()`
returning `undefined` to both mocks; the controllers now no-op on
manualSkills as the tests expect.

Codex P2: `runMemory` passes `messages` straight through to the memory
processor, so after the splice in `injectManualSkillPrimes`, SKILL.md
bodies ride along as if they were real user chat. That pollutes memory
extraction with synthetic instruction content and crowds out real turns
from the window.

- Export `isSkillPrimeMessage(msg)` from `packages/api/src/agents/skills.ts`
  — a predicate keyed on the shared `SKILL_MESSAGE_SOURCE` marker.
- Filter `chatMessages = messages.filter(m => !isSkillPrimeMessage(m))`
  at the top of `runMemory` before the window-sizing logic. Keeps the
  primes visible to the LLM (they still ride in `initialMessages`) but
  invisible to the memory layer.
- 5 new tests for the predicate covering marker-present, plain messages,
  different source, non-object inputs, and array filter integration.

* 📜 feat: Show Skill-Loaded Cards for Manually-Invoked Skills

The $ popover was priming SKILL.md bodies into the turn but leaving no
visible trace on the assistant response — from the user's view it looked
like the `$name ` cosmetic text did nothing. Now each manually-invoked
skill renders the same "Skill X loaded" tool-call card that model-invoked
skills already produce via PR #12684's SkillCall renderer.

Approach: post-run prepend to `this.contentParts`. The aggregator owns
per-step indices during the run, so pre-seeding collides; waiting until
`await runAgents(...)` returns lets the graph settle before synthetic
parts slot in at the front.

- Export `buildSkillPrimeContentParts(primes, { runId })` from
  `packages/api/src/agents/skills.ts`. Returns completed tool_call parts
  (`progress: 1`, args JSON-encoded with `{skillName}`, output matching
  the model-invoked path's wording) that the existing `SkillCall.tsx`
  renderer draws identically.
- In `AgentClient.chatCompletion`, prepend the built parts to
  `this.contentParts` immediately after `await runAgents`. Persistence
  and the final-event reconcile come for free — `sendCompletion` already
  reads `this.contentParts` verbatim.
- Card ordering: skills appear first in the assistant message, reflecting
  that priming ran before the LLM's turn.

Live-during-streaming cards are a separate follow-up — the graph's
index-based aggregator makes that a bigger lift and this change delivers
the core UX win without fighting the stream ordering.

6 new unit tests covering part shape, args JSON contract, output text,
unique IDs, empty input, and startOffset ID differentiation.

*  feat: Emit Optimistic Skill Cards + Wire Primes in OpenAI/Responses

Two follow-ups from testing.

Optimistic card emit: the main chat path was only showing "Skill X
loaded" cards at final-reconcile time, so the user saw nothing happen
until the stream finished. Now emit synthetic ON_RUN_STEP +
ON_RUN_STEP_COMPLETED events right before `runAgents` starts — same
pattern the MCP OAuth flow uses in `ToolService` — so the cards appear
immediately. The graph's content at index 0 may overwrite them during
streaming, but the post-run `contentParts` prepend (unchanged) restores
them on final reconcile.

OpenAI + Responses parity: both controllers were resolving
`manualSkillPrimes` via `initializeAgent` but never injecting them into
`formattedMessages` before the run. Manual invocation silently did
nothing on `/v1/chat/completions` and the Responses API path. Now both
call `injectManualSkillPrimes` on the formatted messages so the model
sees SKILL.md bodies on every path. LibreChat-style card SSE events
don't apply to these OpenAI-shaped responses, so the live-emit is
chat-path-only.

- Export `buildSkillPrimeStepEvents(primes, { runId })` from
  `packages/api/src/agents/skills.ts`. Uses `Constants.USE_PRELIM_RESPONSE_MESSAGE_ID`
  by default so the frontend maps events to the in-flight preliminary
  response message, matching the OAuth emitter.
- In `AgentClient.chatCompletion`, emit via `sendEvent` (or
  `GenerationJobManager.emitChunk` in resumable mode) after
  `injectManualSkillPrimes` runs, before the LLM turn begins.
- Wire `injectManualSkillPrimes` into `openai.js` + `responses.js` after
  `formatAgentMessages`. Refactored the destructure to `let` on
  `indexTokenCountMap` so the injector's returned map is usable.
- 8 new unit tests covering the step-event builder: pair cardinality,
  default/custom runId, TOOL_CALLS shape + JSON args, progress:1 on
  completion, index ordering, stepId/toolCallId pairing, empty input.

* 🎯 fix: Route Skill Prime Events to the Real Response + Sparse-Array Offset

Two bugs in the optimistic-card emit from the last pass.

1. Wrong runId. The events used `USE_PRELIM_RESPONSE_MESSAGE_ID` (the
   MCP OAuth pattern), but OAuth emits DURING tool loading — before the
   real response messageId exists. By the time skill priming fires, the
   graph is about to emit with `this.responseMessageId`, so the PRELIM
   runId orphaned every card onto the client's placeholder response
   entry in `messageMap`, separate from the one the LLM's events were
   building. Net effect: cards never rendered mid-stream.

   Now passing `this.responseMessageId` — the same ID `createRun`
   receives — so synthetic and real steps land on the same `messageMap`
   entry.

2. Index 0 collision. With the runId fixed, card-at-0 would have hit
   `updateContent`'s type-mismatch guard when the LLM's text delta
   arrived at the same index, suppressing the whole text stream.

   New `SKILL_PRIME_INDEX_OFFSET` = 100 placed on both the live SSE
   emit and the server-side `contentParts` assignment. Sparse array
   during streaming renders as `[llm_text, ..., card]` (skip-holes via
   `Array#filter` / `Array#map`). `filterMalformedContentParts` from
   `sendCompletion` compacts to dense `[text, card]` before persistence,
   so streaming UI and saved message agree on order — no finalize
   reorder jank. Post-run switches from `contentParts.unshift` to
   `contentParts[OFFSET + i] = part` to mirror the live placement.

- Add `startIndex` option to `buildSkillPrimeStepEvents` with
  `SKILL_PRIME_INDEX_OFFSET` default. Export the constant from
  `@librechat/api` so `client.js` can reuse it for the post-run splice.
- Update the existing index-ordering test to the new default and add a
  new test for the explicit `startIndex` override.

* 🎗️ feat: Replace \$skill-name Text with Pills on the User Message

The `$skill-name ` cosmetic text the popover was inserting into the
textarea had two problems: it lingered in the user message forever (the
card is a more meaningful marker), and it implied that free-form text
invocation like \"\$foo help me\" should work — which it doesn't, and
supporting it would mean another parsing layer nobody asked for.

Dropped the textarea insertion. Visual confirmation after submit now
comes from a compact `ManualSkillPills` row on the user bubble that
self-extinguishes once the backend's live skill-card stream
(`buildSkillPrimeStepEvents` from the last commit) populates the sibling
assistant response. Multiple skills render as multiple pills — the atom
was already a string array, so multi-select works for free.

- `SkillsCommand.tsx`: select handler no longer writes to the textarea.
  Still drops the trigger `$` via `removeCharIfLast`, still pushes to
  `pendingManualSkillsByConvoId`, still flips `ephemeralAgent.skills`.
- `families.ts`: new `attachedSkillsByMessageId` atomFamily keyed by
  user messageId. `useChatFunctions.ask` writes the drained skill list
  here on every fresh submit (regenerate/continue/edit still skip).
- `ManualSkillPills.tsx` renders pills conditionally: hidden when the
  message isn't a user message, when no skills are attached, or when
  the sibling assistant response already carries a `skill` tool_call
  content part (the live card took over). Reads messages via React Query
  so we don't re-render on every message-state keystroke.
- `Container.tsx` mounts the pills above the user message text, parallel
  to the existing `Files` slot.
- Updated the SkillsCommand select-flow spec to assert the textarea is
  cleared of `$` instead of populated with `\$name `. 5 new tests for
  `ManualSkillPills` covering empty state, non-user message guard,
  multi-skill rendering, the skill-card hide condition, and the
  text-only-content-doesn't-hide case.

* 🎛️ feat: Manual Skills as Persisted Message Field + Compose-Time Chips

Three problems with the previous pass:
1. Cards rendered BELOW the LLM text on the assistant message (and
   stayed there on reload) because the sparse index-100 offset put them
   after the model's content. Now back to `unshift` — cards at the top,
   same as before the live-emit detour.
2. Pills on the user message disappeared the moment the live card
   arrived, so users barely saw them. The live-emit channel also added
   meaningful complexity and relied on a per-message Recoil atom that
   had no clean cleanup story.
3. No visual cue at all during new-chat compose — the `$name ` text was
   removed, the submitted-message pills weren't there yet, and the
   popover closes after selection. User had no way to see what they'd
   queued up before sending.

New architecture: `manualSkills` is a first-class field on `TMessage`,
persisted by the backend on the user message. `ManualSkillPills` reads
straight from `message.manualSkills` — no atom, no sibling-lookup — so
pills survive reload, show in history, and stay for the lifetime of the
message. Compose-time chips above the textarea read the existing
`pendingManualSkillsByConvoId` atom and let users × skills out before
submitting.

Backend reverts:
- `client.js`: dropped the `ON_RUN_STEP` live-emit loop, restored
  `this.contentParts.unshift(...primeParts)` so cards sit at the top of
  the persisted assistant response.
- `skills.ts`: removed `buildSkillPrimeStepEvents` and
  `SKILL_PRIME_INDEX_OFFSET` (both unused now). `GraphEvents`,
  `StepTypes`, and `Constants` imports went with them. Removed 8 tests.

Field persistence:
- `tMessageSchema` gains `manualSkills: z.array(z.string()).optional()`.
- Mongoose message schema gains `manualSkills: { type: [String] }` with
  matching `IMessage` TS field.
- `BaseClient.js` reads `req.body.manualSkills` on user-message save,
  filters to non-empty strings, pins onto `userMessage` before
  `saveMessageToDatabase`. Mirrors the existing `files` pattern right
  above it. Runtime resolution still reads top-level `req.body.manualSkills`
  — persistence and resolution are separate concerns.

Frontend:
- `useChatFunctions.ask` sets `currentMsg.manualSkills` directly; the
  drained atom value goes onto the message, not a separate atom.
  Removed the `attachSkillsToMessage` Recoil callback.
- `ManualSkillPills`: pure render of `message.manualSkills`. No more
  `useQueryClient`, no sibling scan, no atom read. Loses the
  auto-hide-when-card-arrives behavior — pills stay on the user
  bubble, cards live on the assistant bubble, both are informative.
- Dropped the `attachedSkillsByMessageId` atomFamily and its export.
- New `PendingManualSkillsChips` above the textarea reads the
  compose-time atom and renders chips with × to remove. Mounted in
  `ChatForm` right after `TextareaHeader`. Naturally hides on submit
  when the atom drains.

Tests: updated `ManualSkillPills` suite to the new field-based reads
(5 passing). New `PendingManualSkillsChips` suite covering empty state,
multi-chip render, single × removal, and full-clear (4 passing).
Backend suite trimmed to 89 (was 97) from the step-events test
removal — no regressions on the remaining helpers.

* 🧪 feat: Assistant-Side Skill-Loading Chips + Pill Padding

Two small UX fixes on top of the field-on-message architecture.

Pill padding: bumped the user-side `ManualSkillPills` from `py-0.5` to
`py-1` on each chip and added `py-0.5` to the wrapper so the row
breathes a little without feeling tall.

Mid-stream indicator: new `InvokingSkillsIndicator` mirrors the parent
user message's `manualSkills` onto the assistant bubble as transient
"Running X" chips while the real card is in flight. Renders above
`ContentParts` in `MessageParts`. Hides itself when the assistant's
own `content` grows a `skill` tool_call — the authoritative card from
`buildSkillPrimeContentParts.unshift` is showing, so the placeholder
steps aside. No SSE emit, no aggregator injection, no index
collision with the LLM's streaming content: just a render slot keyed
off the parent's field.

Why not stream the cards live: whichever content index we'd choose
either blocks the LLM's text stream (`updateContent` type-mismatch at
index 0) or lands below the response after sparse compaction (index
100+). Mirroring the parent field sidesteps the aggregator entirely
and gives the user an immediate "skill is loading" signal that
naturally gives way to the real card at finalize.

Covers the gap the user flagged: pills on the user message said "I
asked for these" but nothing on the assistant side said "we're
working on it" until the stream finished. 5 new tests for the
indicator: user-msg guard, missing parent-field guard, multi-chip
render, hides-on-card-landing, orphan-parent guard.

* 🔁 fix: Indicator Visibility + Carry Manual Skills Through Regenerate/Edit

Two bugs.

Indicator never rendered: `InvokingSkillsIndicator` looked up the parent
user message via `queryClient.getQueryData([QueryKeys.messages, convoId])`,
but on a new chat the React Query cache is keyed by `"new"` (the URL
`paramId`) until the server assigns a real conversation ID — while
`message.conversationId` on the assistant message is already the server
ID. Lookup missed, `skills.length === 0`, nothing rendered. Switched
to `useChatContext().getMessages()`, which reads from the same
`paramId` the rest of the UI uses, so new-chat and existing-chat cases
both resolve to the correct message list.

Regenerate / save-and-submit dropped manual skills: the compose-time
`pendingManualSkillsByConvoId` atom is drained on the first submit,
so replaying that turn later found an empty atom and sent `manualSkills: []`.
The pills were still on the user bubble, so from the user's point of
view the model was running primed — but the backend saw nothing and
produced an unprimed response.

- Added `overrideManualSkills?: string[]` to `TOptions`. Callers with a
  reference message pass its persisted `manualSkills`; `useChatFunctions.ask`
  uses the override verbatim when present, otherwise falls back to the
  existing drain-or-empty logic.
- `regenerate` in `useChatFunctions` passes `parentMessage.manualSkills`
  — the user message being regenerated has the field persisted by the
  backend, so the second turn primes the same skills as the first.
- `EditMessage.resubmitMessage` covers both edit branches:
  - User-message save-and-submit: forwards the edited message's own
    `manualSkills` so the new sibling turn primes identically.
  - Assistant-response edit: forwards the parent user message's
    `manualSkills` for the same reason.

Indicator test suite converted from `@tanstack/react-query` harness to
a jest-mocked `useChatContext().getMessages()`. 6 tests (was 5), added
a cache-miss case.

* 🧭 fix: Drive Mid-Stream Skill Chips from Submission Atom, Not Message Lookup

Message-ID-keyed lookups kept racing the stream: the user message flips
from its client-side intermediate UUID to the server-assigned ID mid-run,
conversation IDs flip from the URL `paramId="new"` to the real convo
ID on brand-new chats, and the React Query cache splits briefly between
the two. Previous attempts — direct `queryClient.getQueryData` and then
`useChatContext().getMessages()` — each missed a different window.

`TSubmission.manualSkills` is already populated at `ask()` time and the
submission atom (`store.submissionByIndex(index)`) is the single stable
anchor across the whole lifecycle: set once at submit, lives through
every SSE event, cleared when the stream ends. No ID lookups, no cache
timing.

- `InvokingSkillsIndicator` now reads `submissionByIndex(index)` via
  Recoil. Shows chips when:
    • the message is assistant-side,
    • a submission is in flight with non-empty `manualSkills`,
    • the assistant's `parentMessageId` matches
      `submission.userMessage.messageId` (so chips appear only on the
      bubble for the current turn, never on siblings),
    • the assistant's own content doesn't yet carry a `skill`
      tool_call (real card takes over from the server's post-run
      `contentParts.unshift`).
- Drops the `useChatContext().getMessages()` dependency and the
  `useQueryClient` dependency before that. No more lookups by
  conversationId or messageId.

Test suite now mocks `useChatContext` to supply `index: 0` and seeds
the `submissionByIndex(0)` atom via Recoil initializer. 6 cases cover
user-side, no-submission history, empty `manualSkills`, multi-chip
render, hides-on-card-landing, and wrong-turn guard.

* 🌱 fix: Seed Response manualSkills in createdHandler, Indicator Becomes Pure

The mid-stream indicator kept getting wired off state I don't own: first
`queryClient.getQueryData` (raced the new-chat paramId flip), then
`useChatContext().getMessages()` (same cache, same race), then
`useRecoilValue(submissionByIndex)` (pulled every message into the
submission subscription — re-renders all indicators on any submission
change, exactly the "limit hooks in rendering" concern).

Cleanest path is the one the user pointed at: the submission owns the
data, `useSSE` / `useEventHandlers` owns the save points, so seed the
field ONTO the response message at the save site and let the indicator
be a pure prop-read.

- `createdHandler` now writes `manualSkills` onto the initial response
  from `submission.manualSkills` at the moment the placeholder enters
  the messages array. The field rides through the normal mutation
  pipeline via spreads (`useStepHandler` response creation,
  `updateContent` result returns) — no special handling needed.
- `InvokingSkillsIndicator` drops the Recoil / context / queryClient
  reads. Pure function of `message`: if assistant, has `manualSkills`,
  and `content` hasn't grown a `skill` tool_call yet, render chips.
  Only `useLocalize` left, which was already unavoidable for the i18n
  string.
- Renders decouple: no single state change (`submissionByIndex` flip,
  React Query cache update) forces every indicator in the message list
  to re-render anymore. Only the message whose prop changed re-runs.

Finalize story unchanged: server's `responseMessage` doesn't carry the
frontend-only `manualSkills` field, so `finalHandler`'s replacement
drops it — but by then the real `skill` tool_call is in `content`
and the indicator's content-scan hides itself anyway.

Test suite back to pure prop mocks: 7 cases covering user-guard,
no-seed, multi-chip render, skill-card-hide, non-skill-tool-call-keeps,
text-only-keeps, and missing message.

* 🪞 fix: Render Skill Indicator Inside ContentParts, Adjacent to Parts

The indicator still wasn't showing because even though MessageParts
mounted it as a sibling of ContentParts, ContentParts is a `memo`'d
component that owns the only rendering path that refreshes in lockstep
with content deltas. Mounting above it put the indicator one layer
further out — reachable, but not exercised on the same render cycle
that processes the streaming `message` prop.

Moved the indicator into ContentParts itself, rendered at the top of
both the sequential and parallel branches. Reads the `message` prop
(newly threaded through as an optional prop alongside `content`), so:

- Same render cycle as Parts — updates from the SSE pipeline flow
  through the same pathway.
- Lives outside the `content.map`, so delta-driven content reshuffles
  never wipe it.
- Still a pure prop-read inside the indicator itself (no Recoil,
  queryClient, context hooks). The only dep is `useLocalize`.

Thread:
- `ContentPartsProps` gains `message?: TMessage`.
- `MessageParts` passes `message={message}` through, drops its own
  indicator mount + import.
- `ContentParts` renders `<InvokingSkillsIndicator message={message} />`
  in both the parallel-content and sequential-content branches, right
  under `MemoryArtifacts` and before the empty-cursor / parts map.

Companion data flow (unchanged): `createdHandler` seeds
`initialResponse.manualSkills` from `submission.manualSkills`; the
field rides through `useStepHandler` via spreads; indicator hides on
`skill` tool_call landing in `content`.

* 🔎 refactor: Narrow Skill Components to Scalar skills Prop, Kill Memo Churn

Passing the full `message` object into presentational components busts
`React.memo` shallow comparisons every time the message reference changes
for unrelated reasons. Swap to scalar `skills?: string[]` throughout:

- `InvokingSkillsIndicator`: props-only (`skills?: string[]`); visibility
  logic (user-vs-assistant, skill tool_call arrival) now lives in the
  caller so this stays pure presentational.
- `ManualSkillPills`: props-only (`skills?: string[]`).
- `ContentParts`: takes `manualSkills?: string[]` scalar, computes
  `showInvokingSkills` once per render from `manualSkills` + content scan
  for the `skill` tool_call, then mounts the indicator with `skills=`
  prop in both parallel and sequential branches.
- `MessageParts`: passes `manualSkills={message.manualSkills}` through
  to `ContentParts`.
- `Container`: passes `skills={message.manualSkills}` to `ManualSkillPills`.
- Tests updated to exercise the narrowed prop surface.

* 📜 feat: Mid-Stream Skill Cards via SkillCall, Drop Custom Indicator

Instead of a separate `InvokingSkillsIndicator` chip component, render
pending skill placeholders through the existing `SkillCall` renderer —
same component the backend's finalized prime part uses. The loading
visual (`progress < 1` + empty output → pulsing "Running X") and the
completed visual ("Ran X") now come from one source of truth.

`ContentParts` computes `pendingSkillNames` from `manualSkills` minus
any `skill` tool_call already in `content` (dedupe by `args.skillName`
since the synthetic's id differs from the real one). Those names
render through a separate slot ABOVE the Parts iteration — not
prepended to the content array, which would shift React keys on
every downstream streaming text / tool part and force unmount/remount
mid-stream.

When the real prime `tool_call` lands at finalize (backend unshifts to
content[0..]), `collectExistingSkillNames` picks it up, the pending
set empties, and the real part takes over rendering in the Parts
iteration. Layout is identical either way because primes are always
at the top of content.

- `InvokingSkillsIndicator.tsx` + test deleted (no longer referenced)
- `ContentParts.tsx` renders `<SkillCall .../>` directly for pending
  names, mirrors `Part.tsx`'s usage of the same component
- `createdHandler` doc comment updated to reflect the new flow

* ✂️ fix: Render Interim Skill Cards From manualSkills Only, Leave Content Untouched

Previous revision read `content` to de-dupe pending cards against real
`skill` tool_calls, so any optimistic skill part streamed from the
backend would race our placeholder off the screen mid-turn — exactly
the "getting overridden" symptom.

Now: interim `SkillCall` cards are driven purely by the response
message's `manualSkills` field. `content` is never inspected here,
so no backend delta can pull the cards down. The field is now seeded
directly onto the assistant placeholder in `useChatFunctions` (not
only in `createdHandler`) so the cards appear from the first render,
before the `created` SSE event round-trip.

Lifecycle:
- `useChatFunctions` puts `manualSkills` on the freshly-minted
  `initialResponse` — cards render the instant the placeholder lands.
- `createdHandler` keeps its own re-seed (idempotent; safe) so a
  regenerate / save-and-submit flow that hits that path still works.
- `useStepHandler` spread operations preserve the field through every
  content update.
- `finalHandler` replaces the message with the server-backed
  `responseMessage` (no `manualSkills`) — cards disappear, and the
  real `skill` tool_call part in `content` takes over.

ContentParts changes:
- Drop `collectExistingSkillNames` / `parseJsonField` dedupe path.
- `renderPendingSkills` reads only `manualSkills` + `isCreatedByUser`.
- Simpler control flow — one boolean (`hasPendingSkills`) gates the
  early return, one function renders.

* 🩹 fix: Codex Review Resolutions — Localization, Guards, Tests, Docs

Addresses seven findings from comprehensive code review:

Finding 1 (MAJOR) — Document sticky re-priming as intentional
- `buildSkillPrimeContentParts`: expanded doc comment explaining
  synthetic `skill` tool_calls persist and get re-primed on every
  subsequent turn via `extractInvokedSkillsFromPayload` (shape parity
  with model-invoked skills). This matches the UX: the assistant
  skill card is a visible, persistent signal that the skill is active
  for the conversation. Not a bug — called out explicitly so future
  maintainers don't mistake it for one.

Finding 2 (MAJOR) — Add ContentParts render tests
- New `ContentParts.test.tsx` with 7 cases covering the interim skill
  card logic: assistant-only rendering, user-message suppression,
  undefined-content safety, parallel+sequential branch integration,
  progress<1 (pending) state. Child components mocked so the test
  exercises only the branching and prop wiring ContentParts owns.

Finding 3 (MINOR) — Localize hardcoded aria-labels
- Added `com_ui_skills_manual_invoked` + `com_ui_skills_queued` keys.
- Reused existing `com_ui_remove_skill_var` for the remove-button
  aria-label.
- `PendingManualSkillsChips` and `ManualSkillPills` now call
  `useLocalize()`. Test mocks updated to the label-echo pattern.

Finding 4 (MINOR) — Max-length guard in `extractManualSkills`
- New `MAX_SKILL_NAME_LENGTH = 200` constant and filter. Blocks a
  crafted payload like `{ manualSkills: ['a'.repeat(100000)] }` from
  reaching `getSkillByName` / Mongo's query planner.

Finding 5 (NIT) — `BaseClient.js` comment contradicted itself
- Rewrote to call the filter what it is: defense-in-depth on top of
  Mongoose schema validation, not a redundant second layer.

Finding 6 (NIT) — `ManualSkillPills` now wrapped in `React.memo`
- Consistent with peer components (`PendingManualSkillsChips`,
  `ContentParts`). Rendered inside `Container`, which re-renders on
  every content update, so the memo is a real cycle savings.

Finding 7 (NIT) — Redundant guard in `ContentParts.renderPendingSkills`
- Collapsed the duplicate null-check by computing `pendingSkills` as
  a `useMemo`'d array (`[]` when not applicable), and mapping
  directly. `hasPendingSkills` now derives from the array length —
  one source of truth, no redundant gate inside the render function.

* 🔧 fix: Update ParallelContent to Handle Optional Content Prop

Modified the `ParallelContentRendererProps` to make the `content` prop optional, ensuring safer access within the component. Adjusted the calculation of `lastContentIdx` to handle cases where `content` may be undefined, preventing potential runtime errors. This change enhances the robustness of the component when dealing with varying message structures.

* 🎯 fix: Thread manualSkills Through ContentRender — The Real Renderer

This is why the interim skill cards never appeared across many rounds of
iteration: `ContentRender.tsx` (the memo'd renderer used by most paths,
including the agents endpoint) was calling `ContentParts` without the
`manualSkills` prop. Only `MessageParts.tsx` had it wired up — and
that's not the component that actually renders the assistant response
in production.

Two fixes:
1. Pass `manualSkills={msg.manualSkills}` to the `ContentParts` call.
2. Extend the `areContentRenderPropsEqual` memo comparator to include
   `manualSkills.length`, otherwise a message update that adds the
   field (seeded by `useChatFunctions` on the initialResponse) would
   be bailed out by the memo and never re-render.

Verified the two ContentParts call sites are now consistent; Container
usages for `ManualSkillPills` on the user side were already correct.

* 🧹 polish: Address Audit Follow-Up (F1/F3/F6)

F1 — Clarify sticky re-priming opt-out path.
  The previous comment said "regenerate without the pick" as one
  opt-out, but `useChatFunctions.regenerate` forwards the original
  picks via `overrideManualSkills`, so regeneration alone keeps the
  skill sticky. Updated to: edit the originating message to remove
  the pills and resubmit, or start a new conversation.

F3 — Add DOM-order assertions to the parallel + sequential tests.
  The two "alongside" tests verified both elements existed but
  didn't pin the ordering contract. Both now use
  `compareDocumentPosition` to assert the pending SkillCall
  precedes the real content, matching the backend semantic
  (`contentParts.unshift(...primeParts)` puts primes at the top).

F6 — Fix package import order in PendingManualSkillsChips.
  `recoil` (58 chars) was listed before `lucide-react` (45 chars)
  which violates the "shortest to longest after react" rule in
  AGENTS.md. Swapped order; no behavior change.

F2 / F4 / F5 from the audit were confirmed as non-issues
(React-safe empty map, cosmetic test-mock artifact, accepted
memo tradeoff) and require no change.

*  feat: Dedicated PendingSkillCall + Running→Ran Transition on Real Content

UX polish on the interim skill card now that it's actually rendering:

1. New `PendingSkillCall` component (mirrors `SkillCall` visually but
   drops the expand affordance). `SkillCall`'s underlying `ProgressText`
   always renders a chevron + clickable button when any input is
   present, which on a card with empty output points at nothing —
   misleading cursor:pointer and a no-op toggle. The pending variant
   has only the icon + label, no button wrapper, no chevron.

2. "Running X" → "Ran X" transition when real content lands.
   `ContentParts` computes `hasRealContent` (any non-text part, or a
   text part with non-empty content — placeholder empty-text parts
   don't count) and passes `loaded={hasRealContent}` to
   `PendingSkillCall`. Matches what users see for model-invoked skills
   as they finish priming: pulsing shimmer → static icon.

3. Cleanup:
   - Dropped direct `SkillCall` import from `ContentParts` (replaced
     by `PendingSkillCall`). `SkillCall` is still used by `Part` for
     real `skill` tool_call content parts — no behavior change there.
   - Removed the now-redundant explicit `manualSkills` assignment
     in `createdHandler`. `useChatFunctions` seeds the field on
     `initialResponse` at construction, so the `...submission.initialResponse`
     spread already carries it through — the re-assignment was
     defensive belt-and-suspenders doing the same work twice. Comment
     rewritten to describe the actual lifecycle.

Tests updated to the new component (12/12 pass): two new cases pin
the loaded-state transition (unloaded when content has no real parts,
flips to loaded once a non-empty text part lands).
2026-04-25 04:02:00 -04:00
Danny Avila
64ec5f18b8 ⚙️ feat: Skill runtime integration: catalog, tools, execution, file priming (#12649)
* feat: Skill runtime integration — catalog injection, tool registration, execute handler

Wires the @librechat/agents SkillTool primitive into LibreChat's agent runtime:

**Enums:**
- Add `skills` to AgentCapabilities + defaultAgentCapabilities

**Data layer:**
- Add `getSkillByName(name, accessibleIds)` — compound query that
  combines name lookup + ACL check in one findOne

**Agent initialization (packages/api/src/agents/initialize.ts):**
- Accept `accessibleSkillIds` param and `listSkillsByAccess` db method
- Query accessible skills, format catalog via `formatSkillCatalog()`,
  append to `additional_instructions` (appears in agent system prompt)
- Register `SkillToolDefinition` + `createSkillTool()` when catalog
  is non-empty (tool appears in model's tool list)
- Store `accessibleSkillIds` and `skillCount` on InitializedAgent

**Execute handler (packages/api/src/agents/handlers.ts):**
- Add `getSkillByName` to `ToolExecuteOptions`
- `handleSkillToolCall()` intercepts `Constants.SKILL_TOOL`:
  extracts skillName, loads body from DB with ACL check,
  substitutes $ARGUMENTS, returns ToolExecuteResult with
  injectedMessages (skill body as isMeta user message)

**Caller wiring:**
- initialize.js: query skill IDs via findAccessibleResources,
  pass to initializeAgent + store on agentToolContexts,
  add getSkillByName to toolExecuteOptions,
  pass accessibleSkillIds through loadTools configurable
- openai.js + responses.js: same pattern for their flows

Requires @librechat/agents >= 3.1.65 (PR #91 exports).

* feat: Skills toggle in tools menu + backend capability gating

Frontend:
- Add skills?: boolean to TEphemeralAgent type
- Add LAST_SKILLS_TOGGLE_ to LocalStorageKeys for persistence
- Add skillsEnabled to useAgentCapabilities hook
- Add skills useToolToggle to BadgeRowContext with localStorage init
- New Skills.tsx badge component (Scroll icon, cyan theme,
  permission-gated via PermissionTypes.SKILLS)
- Add skills entry to ToolsDropdown with toggle + pin
- Render Skills badge in BadgeRow ephemeral section

Backend:
- Extract injectSkillCatalog() into packages/api/src/agents/skills.ts
  (reduces initializeAgent module size, reusable helper)
- initializeAgent delegates to helper instead of inline block
- Capability-gate the findAccessibleResources query:
  - Agents endpoint: checks AgentCapabilities.skills in admin config
  - OpenAI/Responses controllers: checks ephemeralAgent.skills toggle
- ACL query runs once per run, result shared across all agents

* refactor: remove createSkillTool() instance from injectSkillCatalog

SkillTool is event-driven only. The tool definition in toolDefinitions
is sufficient for the LLM to see the tool schema. No tool instance is
needed since the host handler intercepts via ON_TOOL_EXECUTE before
tool.invoke() is ever called.

Removes tools from InjectSkillCatalogParams/Result, drops the
createSkillTool import.

* feat: skill file priming, bash tool, and invoked skills state

Multi-file skill support:
- New primeSkillFiles() helper (packages/api/src/agents/skillFiles.ts)
  uploads skill files + SKILL.md body to code execution environment
- handleSkillToolCall primes files on invocation when skill.fileCount > 0,
  returns session info as artifact so ToolNode stores the session
- Skill-primed files available to subsequent bash/code tool calls

Bash tool auto-registration:
- BashExecutionToolDefinition added alongside SkillToolDefinition when
  skills are enabled, giving the model a bash tool for running scripts

Conversation state:
- Add invokedSkillIds field to conversation schema (Mongoose + Zod)
- handleSkillToolCall updates conversation with $addToSet on success
- Enables re-priming skill files on subsequent runs (future)

Dependency wiring:
- Pass listSkillFiles, getStrategyFunctions, uploadCodeEnvFile,
  updateConversation through ToolExecuteOptions
- Pass req and codeApiKey through mergedConfigurable
- All three controller entry points wired (initialize.js, openai.js,
  responses.js)

* fix: load bash_tool instance in loadToolsForExecution, remove file listing

- Add createBashExecutionTool to loadToolsForExecution alongside PTC/ToolSearch
  pattern: loads CODE_API_KEY, creates bash tool instance on demand
- Add BASH_TOOL and SKILL_TOOL to specialToolNames set so they don't go
  through the generic loadTools path (bash is created here, skill is
  intercepted in handler before tool.invoke)
- Remove file name listing from skill content text — it's the skill
  author's responsibility to disclose files in SKILL.md, not the framework

* feat: batch upload for skill files, replace sequential uploads

- Add batchUploadCodeEnvFiles() to crud.js: single POST to /upload/batch
  with all files in one multipart request, returns shared session_id
- Rewrite primeSkillFiles to collect all streams (SKILL.md + bundled files)
  then do one batch upload instead of N sequential uploads
- Replace uploadCodeEnvFile with batchUploadCodeEnvFiles across all callers
  (handlers.ts, initialize.js, openai.js, responses.js)

* refactor: remove invokedSkillIds from conversation schema

Skills aren't re-loaded between runs, so conversation-level state for
invoked skills doesn't help. Skill state will live on messages instead
(like tool_search discoveredTools and summaries), enabling in-place
re-injection on follow-up runs.

Removes invokedSkillIds from: convo Mongoose schema, IConversation
interface, Zod schema, ToolExecuteOptions.updateConversation, and
all three caller wiring points.

* feat: smart skill file re-priming with session freshness checking

Schema:
- Add codeEnvIdentifier field to ISkillFile (type + Mongoose schema)
- Add updateSkillFileCodeEnvIds batch method (uses tenantSafeBulkWrite)
- Export checkIfActive from Code/process.js

Extraction:
- Add extractInvokedSkillsFromHistory() to run.ts — scans message
  history for AIMessage tool_calls where name === 'skill', extracts
  skillName args. Follows same pattern as extractDiscoveredToolsFromHistory.

Smart re-priming in primeSkillFiles:
- Before batch uploading, checks if existing codeEnvIdentifiers are
  still active via getSessionInfo + checkIfActive (23h threshold)
- If session is still active, returns cached references (zero uploads)
- If stale or missing, batch-uploads everything and persists new
  identifiers on SkillFile documents (fire-and-forget)
- Single session check covers all files (batch shares one session_id)

Wiring:
- Pass getSessionInfo, checkIfActive, updateSkillFileCodeEnvIds
  through ToolExecuteOptions and all three controller entry points

* feat: wire skill file re-priming at run start via initialSessions

Flow:
1. initialize.js creates primeInvokedSkills callback with all deps
2. client.js calls it with message history before createRun
3. extractInvokedSkillsFromHistory scans for skill tool calls
4. For each invoked skill with files, primeSkillFiles uploads/checks
5. Returns initialSessions map passed to createRun
6. createRun passes initialSessions to Run.create (via RunConfig)
7. Run constructor seeds Graph.sessions, making skill files available
   to subsequent bash/code tool calls via ToolNode session injection

Requires @librechat/agents with initialSessions on RunConfig (PR #94).

* refactor: use CODE_EXECUTION_TOOLS set for code tool checks

Import CODE_EXECUTION_TOOLS from @librechat/agents and replace inline
constant checks in handlers.ts and callbacks.js. Fixes missing bash
tool coverage in the session context injection (handlers.ts) and code
output processing (callbacks.js).

* refactor: move primeInvokedSkills to packages/api, add skill body re-injection

Moves primeInvokedSkills from an inline closure in initialize.js (with
dynamic requires) to a proper exported function in packages/api
skillFiles.ts with explicit typed dependencies.

Key changes:
- primeInvokedSkills now returns both initialSessions (for file priming)
  AND injectedMessages (skill bodies for context continuity)
- createRun accepts invokedSkillMessages and appends skill bodies to
  systemContent so the model retains skill instructions across runs
- initialize.js calls the packaged function with all deps passed explicitly
- client.js passes both initialSessions and injectedMessages to createRun

* fix: move dynamic requires to top-level module imports

Move primeInvokedSkills, getStrategyFunctions, batchUploadCodeEnvFiles,
getSessionInfo, and checkIfActive from inline requires to top-level
module requires where they belong.

* refactor: skill body reconstruction via formatAgentMessages, not systemContent

Replaces the lazy systemContent approach with proper message-level
reconstruction:

SDK (formatAgentMessages):
- New invokedSkillBodies param (Map<string, string>)
- Reconstructs HumanMessages after skill ToolMessages at the correct
  position in the message sequence, matching where ToolNode originally
  injected them

LibreChat:
- extractInvokedSkillsFromPayload replaces extractInvokedSkillsFromHistory
  (works with raw TPayload before formatAgentMessages, not BaseMessage[])
- primeInvokedSkills now takes payload instead of messages, returns
  skillBodies Map instead of injectedMessages
- client.js calls primeInvokedSkills BEFORE formatAgentMessages, passes
  skillBodies through as the 4th param
- Removed invokedSkillMessages from createRun (no more systemContent hack)
- Single-pass: skill detection happens inside formatAgentMessages' existing
  tool_call processing loop, zero extra message iterations

* refactor: rename skillBodies to skills for consistency with SDK param

* refactor: move auth loading into primeInvokedSkills, pass loadAuthValues as dep

The payload/accessibleSkillIds guard and CODE_API_KEY loading now live
inside primeInvokedSkills (packages/api) rather than in the CJS caller.
initialize.js passes loadAuthValues as a dependency and the callback
is only created when skillsCapabilityEnabled.

* feat: ReadFile tool + conditional bash registration + skill path namespacing

ReadFile tool (read_file):
- General-purpose file reader, event-driven (ON_TOOL_EXECUTE)
- Schema: { file_path: string } — "{skillName}/{path}" convention
- handleReadFileCall: resolves skill name from path, ACL check, reads
  from DB cache or storage, binary detection, size limits (256KB),
  lazy caching (512KB), line numbers in output
- SKILL.md special case: reads skill.body directly
- Dispatched alongside SKILL_TOOL in createToolExecuteHandler
- Added to specialToolNames in ToolService

Conditional tool registration:
- ReadFile + SkillTool: always registered when skills enabled
- BashTool: only registered when codeEnvAvailable === true
- codeEnvAvailable passed through InitializeAgentParams from caller

Skill file path namespacing:
- primeSkillFiles now uploads as "{skillName}/SKILL.md" and
  "{skillName}/{relativePath}" instead of flat names
- Prevents file collisions when multiple skills are invoked

Wiring:
- getSkillFileByPath + updateSkillFileContent passed through
  ToolExecuteOptions in all three callers

* feat: return images/PDFs as artifacts from read_file, tighten caching

Binary artifact support:
- Images (png, jpeg, gif, webp) returned as base64 in artifact.content
  with type: 'image_url', processed by existing callback attachment flow
- PDFs returned as base64 artifact similarly
- Binary size limit: 10MB (MAX_BINARY_BYTES)
- Other binary files still return metadata + bash fallback

Caching:
- Text cached only on first read (file.content == null check)
- Binary flag cached only on first detection (file.isBinary == null)
- Skill files are immutable; no redundant cache writes

Registration:
- ReadFileToolDefinition now includes responseFormat: 'content_and_artifact'

* chore: update @librechat/agents to version 3.1.66-dev.0 and add peer dependencies in package-lock.json and package.json files

* fix: resolve review findings #1,#2,#4,#5,#6,#10,#13

Critical:
- #1: primeInvokedSkills now accumulates files across all skills into
  one session entry instead of overwriting. Parallel processing via
  Promise.allSettled.
- #2: codeEnvAvailable now computed and passed in openai.js and
  responses.js (was missing, bash tool never registered in those flows)

Major:
- #4: relativePath in updateSkillFileCodeEnvIds now strips the
  {skillName}/ prefix to match SkillFile documents. SKILL.md filter
  uses endsWith instead of exact match.
- #5: File priming guarded on apiKey being non-empty (skip when not
  configured instead of failing with auth error)
- #6: Skills processed in parallel via Promise.allSettled instead of
  sequential for-of loop

Minor:
- #10: Use top-level imports in initialize.js instead of inline requires
- #13: Log warning when skill catalog reaches the 100-skill limit

* fix: resolve followup review findings N1,N2,N4

N1 (CRITICAL): Wire skill deps into responses.js non-streaming path.
Was completely missing getSkillByName, file strategy functions, etc.

N2 (MAJOR): Single batch upload for ALL skills' files. Resolves skills
in parallel (Phase 1), then collects all file streams across skills
and does ONE batchUploadCodeEnvFiles call (Phase 2). All files share
one session_id, eliminating cross-session isolation issues.

N4 (MINOR): Move inline require() to top-level in openai.js and
responses.js, consistent with initialize.js.

* fix: add mocks for new file strategy imports in controller tests

* fix: restore session freshness check, parallelize file lookups, add warnings

R1: Re-add session freshness check before batch upload. Checks any
existing codeEnvIdentifier via getSessionInfo + checkIfActive. If the
session is still active (23h window), returns cached file references
with zero re-uploads.

R2: listSkillFiles calls parallelized via Promise.all (were sequential
in the for-of loop).

R3: Log warning when skill record lookup fails during identifier
persistence (was a silent empty-string fallback).

* fix: guard freshness cache on single-session consistency

* fix: multi-session freshness check (code env handles mixed sessions natively)

The code execution environment fetches each file by its own
{session_id, fileId} pair independently — no single-session
requirement. Removed the sessionIds.size === 1 guard.

Now checks ALL distinct sessions for freshness. If every session
is still active (23h window), returns cached references with per-file
session_ids preserved. If any session expired, falls through to
re-upload everything in a single batch.

* perf: parallelize session freshness checks via Promise.all

* fix: add optional chaining for session info retrieval in primeInvokedSkills

Updated the primeInvokedSkills function to use optional chaining for getSessionInfo and checkIfActive methods, ensuring safer access and preventing potential runtime errors when these methods are undefined.

* fix: address review findings #1-#9 + Codex P1/P2 + session probe

Critical:
- #1/Codex P1: Add codeApiKey loading to openai.js and responses.js
  loadTools configurable (was missing, file priming broken in 2/3 paths)
- Codex P1: Fix cached file name prefix in primeSkillFiles cache path
  (was sf.relativePath, now ${skill.name}/${sf.relativePath})

Major:
- Codex P2: Honor ephemeral skills toggle in agents endpoint
  (check ephemeralAgent?.skills !== false alongside admin capability)
- #4: Early size check using file.bytes from DB before streaming
  (prevents full-file buffer for oversized files)

Minor:
- #5: Replace Record<string, any> with Record<string, boolean | string>
- #6: Localize Pin/Unpin aria-labels with com_ui_pin/com_ui_unpin
- #8: Parallelize stream acquisition in primeSkillFiles via
  Promise.allSettled
- #9: Log warning for partial batch upload failures with filenames

Performance:
- Session probe optimization: getSessionInfo now hits per-object
  endpoint (GET /sessions/{sid}/objects/{fid}) instead of listing
  entire session (GET /files/{sid}?detail=summary). O(1) stat vs
  O(N) list + linear scan.

* refactor: extract shared skill wiring helper + add unit tests

DRY (#3):
- New skillDeps.js exports getSkillToolDeps() with all 9 skill-related
  deps (getSkillByName, listSkillFiles, getStrategyFunctions, etc.)
- Replaces 5 identical copy-paste blocks across initialize.js, openai.js,
  responses.js (streaming + non-streaming paths)
- One place to maintain when skill deps change

Tests (#2):
- 8 unit tests for extractInvokedSkillsFromPayload covering:
  string args, object args, missing skill tool_calls, non-assistant
  messages, malformed JSON, empty skillName, empty payload, dedup

* fix: remove @jest/globals import, use global jest env

* fix: resolve round 2 review findings R2-1 through R2-7

R2-1 (toggle semantics): openai.js + responses.js now check admin
  capability (AgentCapabilities.skills) alongside ephemeral toggle.
  Aligns with initialize.js.

R2-2 (swallowed error): primeInvokedSkills now logs
  updateSkillFileCodeEnvIds failures (was .catch(() => {}))

R2-4 (test cast): Record<string, string> → Record<string, unknown>

R2-5 (DRY regression): Extract enrichWithSkillConfigurable() into
  skillDeps.js. Replaces 4 identical loadAuthValues blocks.
  Each loadTools callback is now a one-liner. JSDoc added (R2-6).

R2-7 (sequential streams): primeInvokedSkills now uses
  Promise.allSettled for parallel stream acquisition.

* fix: require explicit skills toggle + treat partial cache as miss

- initialize.js: change ephemeralSkillsToggle !== false to === true
  (unset toggle no longer enables skills)
- primeSkillFiles cache: require ALL files to have codeEnvIdentifier
  before using cache (partial persistence = cache miss = re-upload)
- primeInvokedSkills cache: same check (allFilesWithIds.length must
  equal total file count)

* fix: pass entity_id=skillId on batch upload, eliminates per-user cache thrashing

primeSkillFiles now passes entity_id: skill._id.toString() to
batchUploadCodeEnvFiles. This scopes the code env session to the
skill, not the user. All users sharing a skill share the same
uploaded files — no more cache thrashing from overwriting each
other's codeEnvIdentifier.

The stored codeEnvIdentifier now includes ?entity_id= suffix so
freshness checks pass the entity_id through to the per-object
stat endpoint. Both primeSkillFiles and primeInvokedSkills
store consistent identifier formats.

* fix: pass entity_id on multi-skill batch upload, consistent identifier format

* Revert "fix: pass entity_id on multi-skill batch upload, consistent identifier format"

This reverts commit c85ce2161e.

* refactor: per-skill upload in primeInvokedSkills, eliminate multi-skill batch

Replace the monolithic multi-skill batch upload with per-skill
primeSkillFiles calls. Each skill gets its own session with
entity_id=skillId, ensuring:

- Correct session auth (entity_id matches on freshness checks)
- Per-skill freshness caching (only expired skills re-upload)
- Shared skill sessions work across users (same entity_id=skillId)
- Code env handles mixed session_ids natively

The big batch block (stream collection, single upload, identifier
mapping) is replaced by a simple loop over primeSkillFiles, which
already handles freshness caching, batch upload, and identifier
persistence per-skill.

* fix: resolve review findings #1,#3-5,#7,#9-11

Critical:
- #1: Strip ?entity_id= query string before splitting codeEnvIdentifier
  into session_id/fileId (was corrupting cached file IDs in 4 locations)

Major:
- #4: Parallelize per-skill primeSkillFiles via Promise.allSettled
- #5: Add logger.warn to all empty .catch(() => {}) on cache writes

Minor:
- #7: Add logger.debug to enrichWithSkillConfigurable catch block
- #9: Use error instanceof Error guard in batchUploadCodeEnvFiles
- #10: Move enrichWithSkillConfigurable to TypeScript in packages/api
  (skillConfigurable.ts), skillDeps.js wraps with loadAuthValues dep
- #11: Reduce MAX_BINARY_BYTES from 10MB to 5MB (~11.5MB peak with b64)

* fix: forward entity_id in session probe + always register bash tool

Codex P2 (entity_id in probe): getSessionInfo now preserves and
forwards query params (including entity_id) to the per-object stat
endpoint. Without this, identifiers stored as ...?entity_id=... would
fail auth checks because the entity_id scope was dropped.

Codex P2 (bash tool availability): Remove codeEnvAvailable gate from
injectSkillCatalog. Bash tool definition is now always registered when
skills are enabled. Actual tool instance creation still happens at
execution time in loadToolsForExecution (which loads per-user
credentials). This ensures users with per-user CODE_API_KEY get
bash without requiring a global env var at init time.

Removes codeEnvAvailable from InjectSkillCatalogParams,
InitializeAgentParams, and all three controller entry points.

* fix: add debug logging to primeInvokedSkills catch, rename export alias

* fix: stub bash tool when no key + remove PDF artifact path

Codex P1 (bash tool): When CODE_API_KEY is unavailable, create a stub
tool that returns "Code execution is not available. Use read_file
instead." This prevents "tool not found" errors from the model
repeatedly calling bash_tool in no-code-env deployments while still
registering the definition for per-user credential users.

Codex P2 (PDF artifacts): Remove PDF image_url artifact path. The
host artifact pipeline processes image_url via saveBase64Image which
fails for PDFs. PDFs now fall through to the generic binary handler
("Use bash to process"). TODO comment for future document artifact
support.

Also: isImageOrPdf → isImage in early size checks (PDFs are no
longer treated as artifact candidates).

* fix: remove dead PDF_MIME constant, hoist skillToolDeps, document session_id

- #7: Remove unused PDF_MIME constant (dead code after PDF artifact removal)
- #11: Hoist skillToolDeps to module-level constant (avoid per-call allocation)
- #6: Document that CodeSessionContext.session_id is a representative value;
  ToolNode uses per-file session_id from the files array

* fix: call toolEndCallback for skill/read_file artifacts + clear codeEnvIdentifier on re-upload

Codex P1 (toolEndCallback bypass): skill and read_file handler branches
returned early, bypassing the toolEndCallback that processes artifacts
(image attachments). Now calls toolEndCallback when the result has an
artifact, using the same metadata pattern as the normal tool.invoke path.

Codex P1 (stale identifiers): upsertSkillFile now $unset's
codeEnvIdentifier alongside content and isBinary when a file is
re-uploaded. Prevents the freshness cache from returning references
to old file content after a skill file is replaced.

* fix: add session_id comment at cached path, rename skillResult to handlerResult

* fix: return content_and_artifact from bash stub so result.content is populated

* fix: deterministic skill lookup, dedup warning, and multi-session freshness check

- getSkillByName: add sort({updatedAt:-1}) so name collisions resolve
  deterministically to the most recently updated skill
- injectSkillCatalog: warn when multiple accessible skills share a name
- primeSkillFiles: check ALL distinct sessions for freshness, not just
  the first file's session, preventing stale refs after partial bulkWrite

* refactor: update icon import in Skills component

- Replaced the Scroll icon with ScrollText in the Skills component for improved clarity and consistency in the UI.

* fix: SKILL.md cache parity, gate bash_tool on code env, fix read_file too-large message

- primeSkillFiles: filter SKILL.md from returned files array on fresh
  upload so cached and non-cached paths return identical file sets
  (SKILL.md is still on disk in the session for bash access)
- injectSkillCatalog: only register bash_tool when codeEnvAvailable is
  true; thread the flag from all three CJS callers via execute_code
  capability check
- handleReadFileCall: tell the model to invoke the skill first before
  suggesting /mnt/data paths for oversized files

* fix: use EnvVar constant, deduplicate auth lookup, validate batch upload, stream byte limit

- Replace hardcoded 'LIBRECHAT_CODE_API_KEY' with EnvVar.CODE_API_KEY
  in skillConfigurable.ts and skillFiles.ts
- Resolve code API key once at run start in initialize.js and pass to
  both primeInvokedSkills and enrichWithSkillConfigurable via optional
  preResolvedCodeApiKey param, eliminating redundant loadAuthValues calls
- Add response structure validation in batchUploadCodeEnvFiles before
  accessing session_id/files to surface unexpected responses early
- Add streaming byte counter in handleReadFileCall that aborts and
  destroys the stream when accumulated bytes exceed MAX_BINARY_BYTES,
  preventing full file buffering when DB metadata is inaccurate

* refactor: update icon import in ToolsDropdown component

- Replaced the Scroll icon with ScrollText in the ToolsDropdown component for improved clarity and consistency in the UI.

* fix: partial upload failure detection, EnvVar in initialize.js, declaration ordering

- primeSkillFiles: return null (failure) when batch upload partially
  succeeds — missing bundled files would cause runtime bash/read
  failures with missing paths in code env
- initialize.js: replace hardcoded 'LIBRECHAT_CODE_API_KEY' with
  EnvVar.CODE_API_KEY imported from @librechat/agents
- initialize.js: move enabledCapabilities, accessibleSkillIds, and
  codeApiKey declarations before the toolExecuteOptions closure that
  references them (eliminates reliance on temporal dead zone hoisting)
2026-04-25 04:02:00 -04:00
Danny Avila
8e866e6010
🗺️ fix: Resolve Custom-Endpoint Providers for Summarization (#12739)
* 🔧 fix: resolve custom-endpoint providers for summarization

When `summarization.provider` in `librechat.yaml` is set to a custom-endpoint
name (e.g. `Ollama`), the string was passed verbatim to the agents SDK, which
only knows a fixed set of provider names and threw
`Unsupported LLM provider: Ollama`.

Before shaping the summarization config for the SDK, resolve the provider
through `getProviderConfig`: custom-endpoint labels are remapped to the
underlying SDK provider (e.g. `openAI`) and the endpoint's baseURL/apiKey are
injected into `parameters` so the summarization model reaches the right
backend, even when summarization targets a different custom endpoint than the
main agent.

Unknown names and names that appear with no matching endpoint flow through
unchanged so the SDK can surface a clear error. User-provided credentials and
unresolved env-var references are skipped rather than forwarded, letting the
SDK's self-summarize path reuse the agent's own clientOptions.

Ref: LibreChat Discussion #12614

* address: widen unresolved-env-var guard, fix test naming

- Reject summarization overrides when the extracted baseURL/apiKey still
  contains any `${...}` placeholder, including prefix/suffix patterns like
  `https://${UNSET}.example.com` that `envVarRegex` (exact-match) missed.
- Rename the "case-insensitive" test to reflect that only `Ollama` is
  normalized via `normalizeEndpointName`; add coverage proving other
  custom-endpoint names match case-sensitively.

* address: use req.config in responses.js; forward full endpoint options

- `responses.js` relied on a module-level `appConfig` set via `setAppConfig`,
  which is never called anywhere. Use `req.config` directly so the
  summarization provider resolver actually runs on the responses route.
- Route the custom endpoint config through `getOpenAIConfig` so summarization
  inherits the same `headers`, `defaultQuery`, `addParams`/`dropParams`, and
  `customParams` transforms (Anthropic/Google/etc.) that `initializeCustom`
  applies for the main agent flow. Strip the stale `model`/`modelName`
  defaults so `summarization.model` still wins.

* address: skip overrides when summarization matches agent endpoint

When `summarization.provider` resolves to the same custom endpoint as the
main agent, rely on the SDK's self-summarize path (which reuses
`agentContext.clientOptions` unchanged) rather than injecting overrides.
Otherwise the shallow spread of `clientOverrides.configuration` would
replace the agent's request-resolved state (dynamic headers, proxy/fetch
options) with yaml-only config.

Only applies when summarization targets a *different* endpoint from the
agent; the yaml config is all we have in that case, so overrides still
flow through.

* address: preserve raw provider when overrides cannot be built

When summarization points at a different custom endpoint than the agent
and we can't resolve the endpoint's credentials (user_provided, or a
still-unresolved `${VAR}` after env extraction), remapping to `openAI`
without overrides would silently route summaries to the default OpenAI
client. Preserve the raw provider name so the SDK raises a clear
"Unsupported LLM provider" error (now also logged, via the agents SDK
defense-in-depth fix) instead of sending traffic to the wrong backend.

* address: resolve endpoint headers and forward PROXY to summarization

- Custom-endpoint `headers` now flow through `resolveHeaders` before
  reaching `getOpenAIConfig`, matching the main agent path. This ensures
  templated values like `\${PORTKEY_API_KEY}` or `{{LIBRECHAT_BODY_...}}`
  are substituted for summarization requests instead of being forwarded
  literally.
- `PROXY` env var is now passed into `getOpenAIConfig` so cross-endpoint
  summarization honors outbound proxy dispatchers configured for the rest
  of the deployment.

* address: user summarization parameters win over endpoint defaults

Flip the merge order so `summarization.parameters` from yaml override
`clientOverrides` defaults (which come from `getOpenAIConfig` and always
include `streaming: true` etc.). A user who sets `parameters.streaming:
false` in their config should still see non-streaming summarization for
providers that require it.

* address: review feedback (logging, dead code, DRY, types, deep-merge)

- Log error in the resolveSummarizationProvider catch-all so programming
  bugs in getProviderConfig/getOpenAIConfig/resolveHeaders surface in
  operator logs instead of falling through silently.
- Drop dead `setAppConfig`/`appConfig` infrastructure in responses.js and
  fix adjacent `allowedProviders` reference that also relied on the
  never-initialized module-level appConfig. Uses `req.config` directly.
- Import canonical `normalizeEndpointName` from librechat-data-provider
  instead of duplicating it locally.
- Replace `SummarizationClientOverrides = Record<string, unknown>` with
  an explicit interface covering the known fields.
- Deep-merge `configuration` when user-supplied `summarization.parameters.
  configuration` overlaps the resolved endpoint configuration, so user
  additions (e.g. `defaultQuery`) don't wipe out `baseURL`/`defaultHeaders`.
- Wrap `process.env` mutations in test in `try/finally` so a failed
  assertion doesn't leak env state into subsequent tests.
- Drop `as unknown as AppConfig` in test helper; fixture now matches the
  `AppConfig` shape directly using a `Partial<TEndpoint>` union.
- Trim JSDoc that restated the name it was attached to.

* address: review nits — import order, local test type, conflict test

- Move `import { logger }` up into the package value-imports section so
  it no longer sits between `import type` blocks.
- Replace `as unknown as SummarizationConfig['parameters']` in the
  deep-merge test with a named `TestSummarizationParameters` type and a
  single narrowing cast at the call site, making intent explicit.
- Add a test proving that user-supplied `configuration.baseURL` wins
  over the resolved endpoint baseURL, locking in the deep-merge's
  user-wins-on-conflict semantics that the previous suite only exercised
  additively.
2026-04-20 12:00:46 -04:00
kojinseok-del
edd4c6d60c
💎 fix: Handle usage_metadata in Title Transaction for Gemini Models (#12386)
* fix: handle `usage_metadata` in title transaction for Gemini models

Gemini models return token usage via `usage_metadata` instead of `usage`
or `tokenUsage`. The `collectedUsage` mapping in `titleConvo` only
handled the latter two, causing title generation transactions to be
silently skipped for Gemini. Adds an `else if (item.usage_metadata)`
branch to extract `input_tokens`/`output_tokens`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove trailing whitespace in usage_metadata handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:09:37 -04:00
Danny Avila
cb41ba14b2
🔁 fix: Pass recursionLimit to OpenAI-Compatible Agents API Endpoint (#12510)
* fix: pass recursionLimit to processStream in OpenAI-compatible agents API

The OpenAI-compatible endpoint never passed recursionLimit to LangGraph's
processStream(), silently capping all API-based agent calls at the default
25 steps. Mirror the 3-step cascade already used by the UI path (client.js):
yaml config default → per-agent DB override → max cap.

* refactor: extract resolveRecursionLimit into shared utility

Extract the 3-step recursion limit cascade into a shared
resolveRecursionLimit() function in @librechat/api. Both openai.js and
client.js now call this single source of truth.

Also fixes falsy-guard edge cases where recursion_limit=0 or
maxRecursionLimit=0 would silently misbehave, by using explicit
typeof + positive checks.

Includes unit tests covering all cascade branches and edge cases.

* refactor: use resolveRecursionLimit in openai.js and client.js

Replace duplicated cascade logic in both controllers with the shared
resolveRecursionLimit() utility from @librechat/api.

In openai.js: hoist agentsEConfig to avoid double property walk,
remove displaced comment, add integration test assertions.

In client.js: remove inline cascade that was overriding config
after initial assignment.

* fix: hoist processStream mock for test accessibility

The processStream mock was created inline inside mockResolvedValue,
making it inaccessible via createRun.mock.results (which returns
the Promise, not the resolved value). Hoist it to a module-level
variable so tests can assert on it directly.

* test: improve test isolation and boundary coverage

Use mockReturnValueOnce instead of mockReturnValue to prevent mock
leaking across test boundaries. Add boundary tests for downward
agent override and exact-match maxRecursionLimit.
2026-04-01 21:13:07 -04:00
Danny Avila
935288f841
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring

- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
  `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
  field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
  `config.source === 'user'` across MCPManager, ConnectionsRepository,
  UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB

* feat: three-layer MCPServersRegistry with config cache and lazy init

- Add `configCacheRepo` as third repository layer between YAML cache and
  DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
  from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
  caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
  concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
  three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
  results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
  DB-stored servers in `addServer()` and `addServerStub()`

* feat: wire tenant context into MCP controllers, services, and cache invalidation

- Resolve config-source servers via `getAppConfig({ role, tenantId })`
  in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
  for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
  routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
  config mutations trigger re-inspection of config-source MCP servers

* feat: enforce tenantMcpPolicy on admin config mcpServers mutations

- Add `validateMcpServerPolicy()` helper that checks mcpServers against
  operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
  allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
  handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
  → websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured

* revert: remove tenantMcpPolicy schema and enforcement

The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.

- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
  patchConfigField handlers

* test: update test assertions for source field and config-server wiring

- Use objectContaining in MCPServersRegistry reset test to account for
  new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter

* fix: disconnect active connections when config-source servers are evicted

When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.

- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
  clearMcpConfigCache() via MCPManager.appConnections.disconnect()

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
  cross-tenant cache poisoning when two tenants define the same
  server name with different configurations
- Change dbSourced checks from `source === 'user'` to
  `source !== 'yaml' && source !== 'config'` so undefined source
  (pre-upgrade cached configs) fails closed to restricted mode

MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
  calling getOAuthServers() separately — config-source OAuth servers
  are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
  for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
  servers now always resolve regardless of tenant context

MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
  eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
  clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
  ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
  calls per request

Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
  MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
  legacy dbId check when absent (backward-compatible with pre-upgrade
  cached configs that lack source field)

MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
  covering lazy init, stub-on-failure, cross-tenant isolation via config
  hash keys, concurrent deduplication, merge order, and cache invalidation

MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change

* fix: restore getServerConfig lookup for config-source servers (NEW-1)

Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.

Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.

- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup

* fix: eliminate configNameToKey race via caller-provided configServers param

Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.

- Add optional configServers param to getServerConfig; when provided,
  returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
  invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
  redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig

* fix: populate read-through cache after config server lazy init

After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.

Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.

* fix: user-scoped getServerConfig fallback to server-only cache key

When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.

* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race

CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.

MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).

MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.

MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).

* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext

Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.

Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.

* fix: stale failure stubs retry after 5 min, upsert for cross-process races

- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
  instead of permanently disabling config-source servers after transient
  errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
  update, preventing cross-process Redis races where a second instance's
  successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS

* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup

- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
  CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
  were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
  preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
  jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests

* fix: server-only readThrough fallback only returns truthy values

Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.

* fix: remove findInConfigCache to eliminate cross-tenant config leakage

The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:

1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
   5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)

Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.

- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
  (prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry

* style: fix import order per AGENTS.md conventions

Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.

* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures

Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:

- Cross-tenant contamination: the readThrough cache key was unscoped
  (just serverName), so concurrent multi-tenant requests for same-named
  servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
  fail with "Configuration not found" because the readThrough entry
  had expired

Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
  provided config directly, falling back to getServerConfig lookup
  for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
  closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
  servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior

* fix: cache base configs for config-server users; narrow upsertConfigCache error handling

- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
  from config-server layering. Base configs are cached via readThroughCacheAll
  regardless of whether configServers is provided, eliminating uncached
  MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
  infrastructure errors (Redis timeouts, network failures) now propagate
  instead of being silently swallowed, preventing inspection storms
  during outages

* fix: restore correct merge order and document upsert error matching

- Restore YAML → Config → User DB precedence in getAllServerConfigs
  (user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
  linking to the two cache implementations that define the error message

* feat: complete config-source server support across all execution paths

Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.

- Thread configServers into handleTools.js agent tool pipeline: resolve
  config servers from tenant context before MCP tool iteration, pass to
  getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
  applyContextToAgent → getMCPInstructionsForServers →
  formatInstructionsForContext, resolved in client.js before agent
  context application
- Add configServers param to createMCPTool and createMCPTools for
  reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
  differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
  results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter

* fix: add missing mocks for config-source server dependencies in client.test.js

Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.

* fix: update formatInstructionsForContext assertions for configServers param

The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.

* fix: move configServers resolution before MCP tool loop to avoid TDZ

configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.

* fix: address review findings — cache bypass, discoverServerTools gap, DRY

- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
  readThroughCacheAll) instead of bypassing it when configServers is present.
  Extracts user-DB entries from cached base by diffing against YAML keys
  to maintain YAML → Config → User DB merge order without extra MongoDB calls.

- #3: Add configServers param to ToolDiscoveryOptions and thread it through
  discoverServerTools → getServerConfig so config-source servers are
  discoverable during OAuth reconnection flows.

- #6: Replace inline import() type annotations in context.ts with proper
  import type { ParsedServerConfig } per AGENTS.md conventions.

- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
  handleTools.js and client.js, eliminating the duplicated 6-line config
  resolution pattern.

- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
  choice in getMCPSetupData — documents non-obvious correctness constraint.

- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.

* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case

- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
  YAML entry with the same name was excluded from userDbConfigs; now
  uses reference equality check to detect DB-overwritten YAML keys

* fix: replace fragile string-match error detection with proper upsert method

Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.

Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)

* fix: wire configServers through remaining HTTP endpoints

- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name

* fix: thread serverConfig through getConnection for config-source servers

Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.

* fix: thread configServers through reinit, reconnect, and tool definition paths

Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:

- reinitMCPServer: accepts serverConfig and configServers, uses them for
  getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
  passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer

* fix: address review findings — simplify merge, harden error paths, fix log labels

- Simplify getAllServerConfigs merge: replace fragile reference-equality
  loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
  failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
  messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js

* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label

- Clear yamlServerNamesPromise on rejection so transient cache errors
  don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
  reinitMCPServer — they lack a CACHE/DB storage location; retry is
  handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message

* fix: persist oauthHeaders in flow state for config-source OAuth servers

The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.

Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.

* fix: update tests for getMCPSetupData null guard removal and ToolService mock

- MCP.spec.js: update test to expect graceful handling of null mcpConfig
  instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
  null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
  (resolveConfigServers)

* fix: address review findings — DRY, naming, logging, dead code, defensive guards

- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
  eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
  imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
  in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
  so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change

* fix: restore correct merge priority, use immutable spread, fix test mock

- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
  configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
  for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
  unnecessary || {} defensive guard in getMCPSetupData

* fix: error handling, stable hashing, flatten nesting, remove dead param

- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
  graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
  regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
  document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets

* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision

The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.

Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.

Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.

* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency

The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').

* chore: imports/fields ordering

* fix: address review findings — error handling, targeted lookup, test gaps

- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
  getAppConfig/getAllServerConfigs failures propagate instead of masking
  infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
  all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
  works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
  vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
  happy path and error propagation.

* fix: add getOAuthReconnectionManager mock to OAuth callback tests

* chore: imports ordering
2026-03-28 10:36:43 -04:00
Danny Avila
b5c097e5c7
⚗️ feat: Agent Context Compaction/Summarization (#12287)
* chore: imports/types

Add summarization config and package-level summarize handler contracts

Register summarize handlers across server controller paths

Port cursor dual-read/dual-write summary support and UI status handling

Selectively merge cursor branch files for BaseClient summary content
block detection (last-summary-wins), dual-write persistence, summary
block unit tests, and on_summarize_status SSE event handling with
started/completed/failed branches.

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor: type safety

feat: add localization for summarization status messages

refactor: optimize summary block detection in BaseClient

Updated the logic for identifying existing summary content blocks to use a reverse loop for improved efficiency. Added a new test case to ensure the last summary content block is updated correctly when multiple summary blocks exist.

chore: add runName to chainOptions in AgentClient

refactor: streamline summarization configuration and handler integration

Removed the deprecated summarizeNotConfigured function and replaced it with a more flexible createSummarizeFn. Updated the summarization handler setup across various controllers to utilize the new function, enhancing error handling and configuration resolution. Improved overall code clarity and maintainability by consolidating summarization logic.

feat(summarization): add staged chunk-and-merge fallback

feat(usage): track summarization usage separately from messages

feat(summarization): resolve prompt from config in runtime

fix(endpoints): use @librechat/api provider config loader

refactor(agents): import getProviderConfig from @librechat/api

chore: code order

feat(app-config): auto-enable summarization when configured

feat: summarization config

refactor(summarization): streamline persist summary handling and enhance configuration validation

Removed the deprecated createDeferredPersistSummary function and integrated a new createPersistSummary function for MongoDB persistence. Updated summarization handlers across various controllers to utilize the new persistence method. Enhanced validation for summarization configuration to ensure provider, model, and prompt are properly set, improving error handling and overall robustness.

refactor(summarization): update event handling and remove legacy summarize handlers

Replaced the deprecated summarization handlers with new event-driven handlers for summarization start and completion across multiple controllers. This change enhances the clarity of the summarization process and improves the integration of summarization events in the application. Additionally, removed unused summarization functions and streamlined the configuration loading process.

refactor(summarization): standardize event names in handlers

Updated event names in the summarization handlers to use constants from GraphEvents for consistency and clarity. This change improves maintainability and reduces the risk of errors related to string literals in event handling.

feat(summarization): enhance usage tracking for summarization events

Added logic to track summarization usage in multiple controllers by checking the current node type. If the node indicates a summarization task, the usage type is set accordingly. This change improves the granularity of usage data collected during summarization processes.

feat(summarization): integrate SummarizationConfig into AppSummarizationConfig type

Enhanced the AppSummarizationConfig type by extending it with the SummarizationConfig type from librechat-data-provider. This change improves type safety and consistency in the summarization configuration structure.

test: add end-to-end tests for summarization functionality

Introduced a comprehensive suite of end-to-end tests for the summarization feature, covering the full LibreChat pipeline from message creation to summarization. This includes a new setup file for environment configuration and a Jest configuration specifically for E2E tests. The tests utilize real API keys and ensure proper integration with the summarization process, enhancing overall test coverage and reliability.

refactor(summarization): include initial summary in formatAgentMessages output

Updated the formatAgentMessages function to return an initial summary alongside messages and index token count map. This change is reflected in multiple controllers and the corresponding tests, enhancing the summarization process by providing additional context for each agent's response.

refactor: move hydrateMissingIndexTokenCounts to tokenMap utility

Extracted the hydrateMissingIndexTokenCounts function from the AgentClient and related tests into a new tokenMap utility file. This change improves code organization and reusability, allowing for better management of token counting logic across the application.

refactor(summarization): standardize step event handling and improve summary rendering

Refactored the step event handling in the useStepHandler and related components to utilize constants for event names, enhancing consistency and maintainability. Additionally, improved the rendering logic in the Summary component to conditionally display the summary text based on its availability, providing a better user experience during the summarization process.

feat(summarization): introduce baseContextTokens and reserveTokensRatio for improved context management

Added baseContextTokens to the InitializedAgent type to calculate the context budget based on agentMaxContextNum and maxOutputTokensNum. Implemented reserveTokensRatio in the createRun function to allow configurable context token management. Updated related tests to validate these changes and ensure proper functionality.

feat(summarization): add minReserveTokens, context pruning, and overflow recovery configurations

Introduced new configuration options for summarization, including minReserveTokens, context pruning settings, and overflow recovery parameters. Updated the createRun function to accommodate these new options and added a comprehensive test suite to validate their functionality and integration within the summarization process.

feat(summarization): add updatePrompt and reserveTokensRatio to summarization configuration

Introduced an updatePrompt field for updating existing summaries with new messages, enhancing the flexibility of the summarization process. Additionally, added reserveTokensRatio to the configuration schema, allowing for improved management of token allocation during summarization. Updated related tests to validate these new features.

feat(logging): add on_agent_log event handler for structured logging

Implemented an on_agent_log event handler in both the agents' callbacks and responses to facilitate structured logging of agent activities. This enhancement allows for better tracking and debugging of agent interactions by logging messages with associated metadata. Updated the summarization process to ensure proper handling of log events.

fix: remove duplicate IBalanceUpdate interface declaration

perf(usage): single-pass partition of collectedUsage

Replace two Array.filter() passes with a single for-of loop that
partitions message vs. summarization usages in one iteration.

fix(BaseClient): shallow-copy message content before mutating and preserve string content

Avoid mutating the original message.content array in-place when
appending a summary block. Also convert string content to a text
content part instead of silently discarding it.

fix(ui): fix Part.tsx indentation and useStepHandler summarize-complete handling

- Fix SUMMARY else-if branch indentation in Part.tsx to match chain level
- Guard ON_SUMMARIZE_COMPLETE with didFinalize flag to avoid unnecessary
  re-renders when no summarizing parts exist
- Protect against undefined completeData.summary instead of unsafe spread

fix(agents): use strict enabled check for summarization handlers

Change summarizationConfig?.enabled !== false to === true so handlers
are not registered when summarizationConfig is undefined.

chore: fix initializeClient JSDoc and move DEFAULT_RESERVE_RATIO to module scope

refactor(Summary): align collapse/expand behavior with Reasoning component

- Single render path instead of separate streaming vs completed branches
- Use useMessageContext for isSubmitting/isLatestMessage awareness so
  the "Summarizing..." label only shows during active streaming
- Default to collapsed (matching Reasoning), user toggles to expand
- Add proper aria attributes (aria-hidden, role, aria-controls, contentId)
- Hide copy button while actively streaming

feat(summarization): default to self-summarize using agent's own provider/model

When no summarization config is provided (neither in librechat.yaml nor
on the agent), automatically enable summarization using the agent's own
provider and model. The agents package already provides default prompts,
so no prompt configuration is needed.

Also removes the dead resolveSummarizationLLMConfig in summarize.ts
(and its spec) — run.ts buildAgentContext is the single source of truth
for summarization config resolution. Removes the duplicate
RuntimeSummarizationConfig local type in favor of the canonical
SummarizationConfig from data-provider.

chore: schema and type cleanup for summarization

- Add trigger field to summarizationAgentOverrideSchema so per-agent
  trigger overrides in librechat.yaml are not silently stripped by Zod
- Remove unused SummarizationStatus type from runs.ts
- Make AppSummarizationConfig.enabled non-optional to reflect the
  invariant that loadSummarizationConfig always sets it

refactor(responses): extract duplicated on_agent_log handler

refactor(run): use agents package types for summarization config

Import SummarizationConfig, ContextPruningConfig, and
OverflowRecoveryConfig from @librechat/agents and use them to
type-check the translation layer in buildAgentContext. This ensures
the config object passed to the agent graph matches what it expects.

- Use `satisfies AgentSummarizationConfig` on the config object
- Cast contextPruningConfig and overflowRecoveryConfig to agents types
- Properly narrow trigger fields from DeepPartial to required shape

feat(config): add maxToolResultChars to base endpoint schema

Add maxToolResultChars to baseEndpointSchema so it can be configured
on any endpoint in librechat.yaml. Resolved during agent initialization
using getProviderConfig's endpoint resolution: custom endpoint config
takes precedence, then the provider-specific endpoint config, then the
shared `all` config.

Passed through to the agents package ToolNode, which uses it to cap
tool result length before it enters the context window. When not
configured, the agents package computes a sensible default from
maxContextTokens.

fix(summarization): forward agent model_parameters in self-summarize default

When no explicit summarization config exists, the self-summarize
default now forwards the agent's model_parameters as the
summarization parameters. This ensures provider-specific settings
(e.g. Bedrock region, credentials, endpoint host) are available
when the agents package constructs the summarization LLM.

fix(agents): register summarization handlers by default

Change the enabled gate from === true to !== false so handlers
register when no explicit summarization config exists. This aligns
with the self-summarize default where summarization is always on
unless explicitly disabled via enabled: false.

refactor(summarization): let agents package inherit clientOptions for self-summarize

Remove model_parameters forwarding from the self-summarize default.
The agents package now reuses the agent's own clientOptions when the
summarization provider matches the agent's provider, inheriting all
provider-specific settings (region, credentials, proxy, etc.)
automatically.

refactor(summarization): use MessageContentComplex[] for summary content

Unify summary content to always use MessageContentComplex[] arrays,
matching the pattern used by on_message_delta. No more string | array
unions — content is always an array of typed blocks ({ type: 'text',
text: '...' } for text, { type: 'reasoning_content', ... } for
reasoning).

Agents package:
- SummaryContentBlock.content: MessageContentComplex[] (was string)
- tokenCount now optional (not sent on deltas)
- Removed reasoning field — reasoning is now a content block type
- streamAndCollect normalizes all chunks to content block arrays
- Delta events pass content blocks directly

LibreChat:
- SummaryContentPart.content: Agents.MessageContentComplex[]
- Updated Part.tsx, Summary.tsx, useStepHandler.ts, BaseClient.js
- Summary.tsx derives display text from content blocks via useMemo
- Aggregator uses simple array spread

refactor(summarization): enhance summary handling and text extraction

- Updated BaseClient.js to improve summary text extraction, accommodating both legacy and new content formats.
- Modified summarization logic to ensure consistent handling of summary content across different message formats.
- Adjusted test cases in summarization.e2e.spec.js to utilize the new summary text extraction method.
- Refined SSE useStepHandler to initialize summary content as an array.
- Updated configuration schema by removing unused minReserveTokens field.
- Cleaned up SummaryContentPart type by removing rangeHash property.

These changes streamline the summarization process and ensure compatibility with various content structures.

refactor(summarization): streamline usage tracking and logging

- Removed direct checks for summarization nodes in ModelEndHandler and replaced them with a dedicated markSummarizationUsage function for better readability and maintainability.
- Updated OpenAIChatCompletionController and responses handlers to utilize the new markSummarizationUsage function for setting usage types.
- Enhanced logging functionality by ensuring the logger correctly handles different log levels.
- Introduced a new useCopyToClipboard hook in the Summary component to encapsulate clipboard copy logic, improving code reusability and clarity.

These changes improve the overall structure and efficiency of the summarization handling and logging processes.

refactor(summarization): update summary content block documentation

- Removed outdated comment regarding the last summary content block in BaseClient.js.
- Added a new comment to clarify the purpose of the findSummaryContentBlock method, ensuring consistency in documentation.

These changes enhance code clarity and maintainability by providing accurate descriptions of the summarization logic.

refactor(summarization): update summary content structure in tests

- Modified the summarization content structure in e2e tests to use an array format for text, aligning with recent changes in summary handling.
- Updated test descriptions to clarify the behavior of context token calculations, ensuring consistency and clarity in the tests.

These changes enhance the accuracy and maintainability of the summarization tests by reflecting the updated content structure.

refactor(summarization): remove legacy E2E test setup and configuration

- Deleted the e2e-setup.js and jest.e2e.config.js files, which contained legacy configurations for E2E tests using real API keys.
- Introduced a new summarization.e2e.ts file that implements comprehensive E2E backend integration tests for the summarization process, utilizing real AI providers and tracking summaries throughout the run.

These changes streamline the testing framework by consolidating E2E tests into a single, more robust file while removing outdated configurations.

refactor(summarization): enhance E2E tests and error handling

- Added a cleanup step to force exit after all tests to manage Redis connections.
- Updated the summarization model to 'claude-haiku-4-5-20251001' for consistency across tests.
- Improved error handling in the processStream function to capture and return processing errors.
- Enhanced logging for cross-run tests and tight context scenarios to provide better insights into test execution.

These changes improve the reliability and clarity of the E2E tests for the summarization process.

refactor(summarization): enhance test coverage for maxContextTokens behavior

- Updated run-summarization.test.ts to include a new test case ensuring that maxContextTokens does not exceed user-defined limits, even when calculated ratios suggest otherwise.
- Modified summarization.e2e.ts to replace legacy UsageMetadata type with a more appropriate type for collectedUsage, improving type safety and clarity in the test setup.

These changes improve the robustness of the summarization tests by validating context token constraints and refining type definitions.

feat(summarization): add comprehensive E2E tests for summarization process

- Introduced a new summarization.e2e.test.ts file that implements extensive end-to-end integration tests for the summarization pipeline, covering the full flow from LibreChat to agents.
- The tests utilize real AI providers and include functionality to track summaries during and between runs.
- Added necessary cleanup steps to manage Redis connections post-tests and ensure proper exit.

These changes enhance the testing framework by providing robust coverage for the summarization process, ensuring reliability and performance under real-world conditions.

fix(service): import logger from winston configuration

- Removed the import statement for logger from '@librechat/data-schemas' and replaced it with an import from '~/config/winston'.
- This change ensures that the logger is correctly sourced from the updated configuration, improving consistency in logging practices across the application.

refactor(summary): simplify Summary component and enhance token display

- Removed the unused `meta` prop from the `SummaryButton` component to streamline its interface.
- Updated the token display logic to use a localized string for better internationalization support.
- Adjusted the rendering of the `meta` information to improve its visibility within the `Summary` component.

These changes enhance the clarity and usability of the Summary component while ensuring better localization practices.

feat(summarization): add maxInputTokens configuration for summarization

- Introduced a new `maxInputTokens` property in the summarization configuration schema to control the amount of conversation context sent to the summarizer, with a default value of 10000.
- Updated the `createRun` function to utilize the new `maxInputTokens` setting, allowing for more flexible summarization based on agent context.

These changes enhance the summarization capabilities by providing better control over input token limits, improving the overall summarization process.

refactor(summarization): simplify maxInputTokens logic in createRun function

- Updated the logic for the `maxInputTokens` property in the `createRun` function to directly use the agent's base context tokens when the resolved summarization configuration does not specify a value.
- This change streamlines the configuration process and enhances clarity in how input token limits are determined for summarization.

These modifications improve the maintainability of the summarization configuration by reducing complexity in the token calculation logic.

feat(summary): enhance Summary component to display meta information

- Updated the SummaryContent component to accept an optional `meta` prop, allowing for additional contextual information to be displayed above the main content.
- Adjusted the rendering logic in the Summary component to utilize the new `meta` prop, improving the visibility of supplementary details.

These changes enhance the user experience by providing more context within the Summary component, making it clearer and more informative.

refactor(summarization): standardize reserveRatio configuration in summarization logic

- Replaced instances of `reserveTokensRatio` with `reserveRatio` in the `createRun` function and related tests to unify the terminology across the codebase.
- Updated the summarization configuration schema to reflect this change, ensuring consistency in how the reserve ratio is defined and utilized.
- Removed the per-agent override logic for summarization configuration, simplifying the overall structure and enhancing clarity.

These modifications improve the maintainability and readability of the summarization logic by standardizing the configuration parameters.

* fix: circular dependency of `~/models`

* chore: update logging scope in agent log handlers

Changed log scope from `[agentus:${data.scope}]` to `[agents:${data.scope}]` in both the callbacks and responses controllers to ensure consistent logging format across the application.

* feat: calibration ratio

* refactor(tests): update summarizationConfig tests to reflect changes in enabled property

Modified tests to check for the new `summarizationEnabled` property instead of the deprecated `enabled` field in the summarization configuration. This change ensures that the tests accurately validate the current configuration structure and behavior of the agents.

* feat(tests): add markSummarizationUsage mock for improved test coverage

Introduced a mock for the markSummarizationUsage function in the responses unit tests to enhance the testing of summarization usage tracking. This addition supports better validation of summarization-related functionalities and ensures comprehensive test coverage for the agents' response handling.

* refactor(tests): simplify event handler setup in createResponse tests

Removed redundant mock implementations for event handlers in the createResponse unit tests, streamlining the setup process. This change enhances test clarity and maintainability while ensuring that the tests continue to validate the correct behavior of usage tracking during on_chat_model_end events.

* refactor(agents): move calibration ratio capture to finally block

Reorganized the logic for capturing the calibration ratio in the AgentClient class to ensure it is executed in the finally block. This change guarantees that the ratio is captured even if the run is aborted, enhancing the reliability of the response message persistence. Removed redundant code and improved clarity in the handling of context metadata.

* refactor(agents): streamline bulk write logic in recordCollectedUsage function

Removed redundant bulk write operations and consolidated document handling in the recordCollectedUsage function. The logic now combines all documents into a single bulk write operation, improving efficiency and reducing error handling complexity. Updated logging to provide consistent error messages for bulk write failures.

* refactor(agents): enhance summarization configuration resolution in createRun function

Streamlined the summarization configuration logic by introducing a base configuration and allowing for overrides from agent-specific settings. This change improves clarity and maintainability, ensuring that the summarization configuration is consistently applied while retaining flexibility for customization. Updated the handling of summarization parameters to ensure proper integration with the agent's model and provider settings.

* refactor(agents): remove unused tokenCountMap and streamline calibration ratio handling

Eliminated the unused tokenCountMap variable from the AgentClient class to enhance code clarity. Additionally, streamlined the logic for capturing the calibration ratio by using optional chaining and a fallback value, ensuring that context metadata is consistently defined. This change improves maintainability and reduces potential confusion in the codebase.

* refactor(agents): extract agent log handler for improved clarity and reusability

Refactored the agent log handling logic by extracting it into a dedicated function, `agentLogHandler`, enhancing code clarity and reusability across different modules. Updated the event handlers in both the OpenAI and responses controllers to utilize the new handler, ensuring consistent logging behavior throughout the application.

* test: add summarization event tests for useStepHandler

Implemented a series of tests for the summarization events in the useStepHandler hook. The tests cover scenarios for ON_SUMMARIZE_START, ON_SUMMARIZE_DELTA, and ON_SUMMARIZE_COMPLETE events, ensuring proper handling of summarization logic, including message accumulation and finalization. This addition enhances test coverage and validates the correct behavior of the summarization process within the application.

* refactor(config): update summarizationTriggerSchema to use enum for type validation

Changed the type of the `type` field in the summarizationTriggerSchema from a string to an enum with a single value 'token_count'. This modification enhances type safety and ensures that only valid types are accepted in the configuration, improving overall clarity and maintainability of the schema.

* test(usage): add bulk write tests for message and summarization usage

Implemented tests for the bulk write functionality in the recordCollectedUsage function, covering scenarios for combined message and summarization usage, summarization-only usage, and message-only usage. These tests ensure correct document handling and token rollup calculations, enhancing test coverage and validating the behavior of the usage tracking logic.

* refactor(Chat): enhance clipboard copy functionality and type definitions in Summary component

Updated the Summary component to improve the clipboard copy functionality by handling clipboard permission errors. Refactored type definitions for SummaryProps to use a more specific type, enhancing type safety. Adjusted the SummaryButton and FloatingSummaryBar components to accept isCopied and onCopy props, promoting better separation of concerns and reusability.

* chore(translations): remove unused "Expand Summary" key from English translations

Deleted the "Expand Summary" key from the English translation file to streamline the localization resources and improve clarity in the user interface. This change helps maintain an organized and efficient translation structure.

* refactor: adjust token counting for Claude model to account for API discrepancies

Implemented a correction factor for token counting when using the Claude model, addressing discrepancies between Anthropic's API and local tokenizer results. This change ensures accurate token counts by applying a scaling factor, improving the reliability of token-related functionalities.

* refactor(agents): implement token count adjustment for Claude model messages

Added a method to adjust token counts for messages processed by the Claude model, applying a correction factor to align with API expectations. This enhancement improves the accuracy of token counting, ensuring reliable functionality when interacting with the Claude model.

* refactor(agents): token counting for media content in messages

Introduced a new method to estimate token costs for image and document blocks in messages, improving the accuracy of token counting. This enhancement ensures that media content is properly accounted for, particularly for the Claude model, by integrating additional token estimation logic for various content types. Updated the token counting function to utilize this new method, enhancing overall reliability and functionality.

* chore: fix missing import

* fix(agents): clamp baseContextTokens and document reserve ratio change

Prevent negative baseContextTokens when maxOutputTokens exceeds the
context window (misconfigured models). Document the 10%→5% default
reserve ratio reduction introduced alongside summarization.

* fix(agents): include media tokens in hydrated token counts

Add estimateMediaTokensForMessage to createTokenCounter so the hydration
path (used by hydrateMissingIndexTokenCounts) matches the precomputed
path in AgentClient.getTokenCountForMessage. Without this, messages
containing images or documents were systematically undercounted during
hydration, risking context window overflow.

Add 34 unit tests covering all block-type branches of
estimateMediaTokensForMessage.

* fix(agents): include summarization output tokens in usage return value

The returned output_tokens from recordCollectedUsage now reflects all
billed LLM calls (message + summarization). Previously, summarization
completions were billed but excluded from the returned metadata, causing
a discrepancy between what users were charged and what the response
message reported.

* fix(tests): replace process.exit with proper Redis cleanup in e2e test

The summarization E2E test used process.exit(0) to work around a Redis
connection opened at import time, which killed the Jest runner and
bypassed teardown. Use ioredisClient.quit() and keyvRedisClient.disconnect()
for graceful cleanup instead.

* fix(tests): update getConvo imports in OpenAI and response tests

Refactor test files to import getConvo from the main models module instead of the Conversation submodule. This change ensures consistency across tests and simplifies the import structure, enhancing maintainability.

* fix(clients): improve summary text validation in BaseClient

Refactor the summary extraction logic to ensure that only non-empty summary texts are considered valid. This change enhances the robustness of the message processing by utilizing a dedicated method for summary text retrieval, improving overall reliability.

* fix(config): replace z.any() with explicit union in summarization schema

Model parameters (temperature, top_p, etc.) are constrained to
primitive types rather than the policy-violating z.any().

* refactor(agents): deduplicate CLAUDE_TOKEN_CORRECTION constant

Export from the TS source in packages/api and import in the JS client,
eliminating the static class property that could drift out of sync.

* refactor(agents): eliminate duplicate selfProvider in buildAgentContext

selfProvider and provider were derived from the same expression with
different type casts. Consolidated to a single provider variable.

* refactor(agents): extract shared SSE handlers and restrict log levels

- buildSummarizationHandlers() factory replaces triplicated handler
  blocks across responses.js and openai.js
- agentLogHandlerObj exported from callbacks.js for consistent reuse
- agentLogHandler restricted to an allowlist of safe log levels
  (debug, info, warn, error) instead of accepting arbitrary strings

* fix(SSE): batch summarize deltas, add exhaustiveness check, conditional error announcement

- ON_SUMMARIZE_DELTA coalesces rapid-fire renders via requestAnimationFrame
  instead of calling setMessages per chunk
- Exhaustive never-check on TStepEvent catches unhandled variants at
  compile time when new StepEvents are added
- ON_SUMMARIZE_COMPLETE error announcement only fires when a summary
  part was actually present and removed

* feat(agents): persist instruction overhead in contextMeta and seed across runs

Extend contextMeta with instructionOverhead and toolCount so the
provider-observed instruction overhead is persisted on the response message
and seeded into the pruner on subsequent runs. This enables the pruner to
use a calibrated budget from the first call instead of waiting for a
provider observation, preventing the ratio collapse caused by local
tokenizer overestimating tool schema tokens.

The seeded overhead is only used when encoding and tool count match
between runs, ensuring stale values from different configurations
are discarded.

* test(agents): enhance OpenAI test mocks for summarization handlers

Updated the OpenAI test suite to include additional mock implementations for summarization handlers, including buildSummarizationHandlers, markSummarizationUsage, and agentLogHandlerObj. This improves test coverage and ensures consistent behavior during testing.

* fix(agents): address review findings for summarization v2

Cancel rAF on unmount to prevent stale Recoil writes from dead
component context. Clear orphaned summarizing:true parts when
ON_SUMMARIZE_COMPLETE arrives without a summary payload. Add null
guard and safe spread to agentLogHandler. Handle Anthropic-format
base64 image/* documents in estimateMediaTokensForMessage. Use
role="region" for expandable summary content. Add .describe() to
contextMeta Zod fields. Extract duplicate usage loop into helper.

* refactor: simplify contextMeta to calibrationRatio + encoding only

Remove instructionOverhead and toolCount from cross-run persistence —
instruction tokens change too frequently between runs (prompt edits,
tool changes) for a persisted seed to be reliable. The intra-run
calibration in the pruner still self-corrects via provider observations.
contextMeta now stores only the tokenizer-bias ratio and encoding,
which are stable across instruction changes.

* test(SSE): enhance useStepHandler tests for ON_SUMMARIZE_COMPLETE behavior

Updated the test for ON_SUMMARIZE_COMPLETE to clarify that it finalizes the existing part with summarizing set to false when the summary is undefined. Added assertions to verify the correct behavior of message updates and the state of summary parts.

* refactor(BaseClient): remove handleContextStrategy and truncateToolCallOutputs functions

Eliminated the handleContextStrategy method from BaseClient to streamline message handling. Also removed the truncateToolCallOutputs function from the prompts module, simplifying the codebase and improving maintainability.

* refactor: add AGENT_DEBUG_LOGGING option and refactor token count handling in BaseClient

Introduced AGENT_DEBUG_LOGGING to .env.example for enhanced debugging capabilities. Refactored token count handling in BaseClient by removing the handleTokenCountMap method and simplifying token count updates. Updated AgentClient to log detailed token count recalculations and adjustments, improving traceability during message processing.

* chore: update dependencies in package-lock.json and package.json files

Bumped versions of several dependencies, including @librechat/agents to ^3.1.62 and various AWS SDK packages to their latest versions. This ensures compatibility and incorporates the latest features and fixes.

* chore: imports order

* refactor: extract summarization config resolution from buildAgentContext

* refactor: rename and simplify summarization configuration shaping function

* refactor: replace AgentClient token counting methods with single-pass pure utility

Extract getTokenCount() and getTokenCountForMessage() from AgentClient
into countFormattedMessageTokens(), a pure function in packages/api that
handles text, tool_call, image, and document content types in one loop.

- Decompose estimateMediaTokensForMessage into block-level helpers
  (estimateImageDataTokens, estimateImageBlockTokens, estimateDocumentBlockTokens)
  shared by both estimateMediaTokensForMessage and the new single-pass function
- Remove redundant per-call getEncoding() resolution (closure captures once)
- Remove deprecated gpt-3.5-turbo-0301 model branching
- Drop this.getTokenCount guard from BaseClient.sendMessage

* refactor: streamline token counting in createTokenCounter function

Simplified the createTokenCounter function by removing the media token estimation and directly calculating the token count. This change enhances clarity and performance by consolidating the token counting logic into a single pass, while maintaining compatibility with Claude's token correction.

* refactor: simplify summarization configuration types

Removed the AppSummarizationConfig type and directly used SummarizationConfig in the AppConfig interface. This change streamlines the type definitions and enhances consistency across the codebase.

* chore: import order

* fix: summarization event handling in useStepHandler

- Cancel pending summarizeDeltaRaf in clearStepMaps to prevent stale
  frames firing after map reset or component unmount
- Move announcePolite('summarize_completed') inside the didFinalize
  guard so screen readers only announce when finalization actually occurs
- Remove dead cleanup closure returned from stepHandler useCallback body
  that was never invoked by any caller

* fix: estimate tokens for non-PDF/non-image base64 document blocks

Previously estimateDocumentBlockTokens returned 0 for unrecognized MIME
types (e.g. text/plain, application/json), silently underestimating
context budget. Fall back to character-based heuristic or countTokens.

* refactor: return cloned usage from markSummarizationUsage

Avoid mutating LangChain's internal usage_metadata object by returning
a shallow clone with the usage_type tag. Update all call sites in
callbacks, openai, and responses controllers to use the returned value.

* refactor: consolidate debug logging loops in buildMessages

Merge the two sequential O(n) debug-logging passes over orderedMessages
into a single pass inside the map callback where all data is available.

* refactor: narrow SummaryContentPart.content type

Replace broad Agents.MessageContentComplex[] with the specific
Array<{ type: ContentTypes.TEXT; text: string }> that all producers
and consumers already use, improving compile-time safety.

* refactor: use single output array in recordCollectedUsage

Have processUsageGroup append to a shared array instead of returning
separate arrays that are spread into a third, reducing allocations.

* refactor: use for...in in hydrateMissingIndexTokenCounts

Replace Object.entries with for...in to avoid allocating an
intermediate tuple array during token map hydration.
2026-03-21 14:28:56 -04:00
Danny Avila
0412f05daf
🪢 chore: Consolidate Pricing and Tx Imports After tx.js Module Removal (#12086)
* 🧹 chore: resolve imports due to rebase

* chore: Update model mocks in unit tests for consistency

- Consolidated model mock implementations across various test files to streamline setup and reduce redundancy.
- Removed duplicate mock definitions for `getMultiplier` and `getCacheMultiplier`, ensuring a unified approach in `recordCollectedUsage.spec.js`, `openai.spec.js`, `responses.unit.spec.js`, and `abortMiddleware.spec.js`.
- Enhanced clarity and maintainability of test files by aligning mock structures with the latest model updates.

* fix: Safeguard token credit checks in transaction tests

- Updated assertions in `transaction.spec.ts` to handle potential null values for `updatedBalance` by using optional chaining.
- Enhanced robustness of tests related to token credit calculations, ensuring they correctly account for scenarios where the balance may not be found.

* chore: transaction methods with bulk insert functionality

- Introduced `bulkInsertTransactions` method in `transaction.ts` to facilitate batch insertion of transaction documents.
- Updated test file `transactions.bulk-parity.spec.ts` to utilize new pricing function assignments and handle potential null values in calculations, improving test robustness.
- Refactored pricing function initialization for clarity and consistency.

* refactor: Enhance type definitions and introduce new utility functions for model matching

- Added `findMatchingPattern` and `matchModelName` utility functions to improve model name matching logic in transaction methods.
- Updated type definitions for `findMatchingPattern` to accept a more specific tokensMap structure, enhancing type safety.
- Refactored `dbMethods` initialization in `transactions.bulk-parity.spec.ts` to include the new utility functions, improving test clarity and functionality.

* refactor: Update database method imports and enhance transaction handling

- Refactored `abortMiddleware.js` to utilize centralized database methods for message handling and conversation retrieval, improving code consistency.
- Enhanced `bulkInsertTransactions` in `transaction.ts` to handle empty document arrays gracefully and added error logging for better debugging.
- Updated type definitions in `transactions.ts` to enforce stricter typing for token types, enhancing type safety across transaction methods.
- Improved test setup in `transactions.bulk-parity.spec.ts` by refining pricing function assignments and ensuring robust handling of potential null values.

* refactor: Update database method references and improve transaction multiplier handling

- Refactored `client.js` to update database method references for `bulkInsertTransactions` and `updateBalance`, ensuring consistency in method usage.
- Enhanced transaction multiplier calculations in `transaction.spec.ts` to provide fallback values for write and read multipliers, improving robustness in cost calculations across structured token spending tests.
2026-03-21 14:28:53 -04:00
Danny Avila
8ba2bde5c1
📦 refactor: Consolidate DB models, encapsulating Mongoose usage in data-schemas (#11830)
* chore: move database model methods to /packages/data-schemas

* chore: add TypeScript ESLint rule to warn on unused variables

* refactor: model imports to streamline access

- Consolidated model imports across various files to improve code organization and reduce redundancy.
- Updated imports for models such as Assistant, Message, Conversation, and others to a unified import path.
- Adjusted middleware and service files to reflect the new import structure, ensuring functionality remains intact.
- Enhanced test files to align with the new import paths, maintaining test coverage and integrity.

* chore: migrate database models to packages/data-schemas and refactor all direct Mongoose Model usage outside of data-schemas

* test: update agent model mocks in unit tests

- Added `getAgent` mock to `client.test.js` to enhance test coverage for agent-related functionality.
- Removed redundant `getAgent` and `getAgents` mocks from `openai.spec.js` and `responses.unit.spec.js` to streamline test setup and reduce duplication.
- Ensured consistency in agent mock implementations across test files.

* fix: update types in data-schemas

* refactor: enhance type definitions in transaction and spending methods

- Updated type definitions in `checkBalance.ts` to use specific request and response types.
- Refined `spendTokens.ts` to utilize a new `SpendTxData` interface for better clarity and type safety.
- Improved transaction handling in `transaction.ts` by introducing `TransactionResult` and `TxData` interfaces, ensuring consistent data structures across methods.
- Adjusted unit tests in `transaction.spec.ts` to accommodate new type definitions and enhance robustness.

* refactor: streamline model imports and enhance code organization

- Consolidated model imports across various controllers and services to a unified import path, improving code clarity and reducing redundancy.
- Updated multiple files to reflect the new import structure, ensuring all functionalities remain intact.
- Enhanced overall code organization by removing duplicate import statements and optimizing the usage of model methods.

* feat: implement loadAddedAgent and refactor agent loading logic

- Introduced `loadAddedAgent` function to handle loading agents from added conversations, supporting multi-convo parallel execution.
- Created a new `load.ts` file to encapsulate agent loading functionalities, including `loadEphemeralAgent` and `loadAgent`.
- Updated the `index.ts` file to export the new `load` module instead of the deprecated `loadAgent`.
- Enhanced type definitions and improved error handling in the agent loading process.
- Adjusted unit tests to reflect changes in the agent loading structure and ensure comprehensive coverage.

* refactor: enhance balance handling with new update interface

- Introduced `IBalanceUpdate` interface to streamline balance update operations across the codebase.
- Updated `upsertBalanceFields` method signatures in `balance.ts`, `transaction.ts`, and related tests to utilize the new interface for improved type safety.
- Adjusted type imports in `balance.spec.ts` to include `IBalanceUpdate`, ensuring consistency in balance management functionalities.
- Enhanced overall code clarity and maintainability by refining type definitions related to balance operations.

* feat: add unit tests for loadAgent functionality and enhance agent loading logic

- Introduced comprehensive unit tests for the `loadAgent` function, covering various scenarios including null and empty agent IDs, loading of ephemeral agents, and permission checks.
- Enhanced the `initializeClient` function by moving `getConvoFiles` to the correct position in the database method exports, ensuring proper functionality.
- Improved test coverage for agent loading, including handling of non-existent agents and user permissions.

* chore: reorder memory method exports for consistency

- Moved `deleteAllUserMemories` to the correct position in the exported memory methods, ensuring a consistent and logical order of method exports in `memory.ts`.
2026-03-21 14:28:53 -04:00
Danny Avila
8e8fb01d18
🧱 fix: Enforce Agent Access Control on Context and OCR File Loading (#12253)
* 🔏 fix: Apply agent access control filtering to context/OCR resource loading

The context/OCR file path in primeResources fetched files by file_id
without applying filterFilesByAgentAccess, unlike the file_search and
execute_code paths. Add filterFiles dependency injection to primeResources
and invoke it after getFiles to enforce consistent access control.

* fix: Wire filterFilesByAgentAccess into all agent initialization callers

Pass the filterFilesByAgentAccess function from the JS layer into the TS
initializeAgent → primeResources chain via dependency injection, covering
primary, handoff, added-convo, and memory agent init paths.

* test: Add access control filtering tests for primeResources

Cover filterFiles invocation with context/OCR files, verify filtering
rejects inaccessible files, and confirm graceful fallback when filterFiles,
userId, or agentId are absent.

* fix: Guard filterFilesByAgentAccess against ephemeral agent IDs

Ephemeral agents have no DB document, so getAgent returns null and the
access map defaults to all-false, silently blocking all non-owned files.
Short-circuit with isEphemeralAgentId to preserve the pass-through
behavior for inline-built agents (memory, tool agents).

* fix: Clean up resources.ts and JS caller import order

Remove redundant optional chain on req.user.role inside user-guarded
block, update primeResources JSDoc with filterFiles and agentId params,
and reorder JS imports to longest-to-shortest per project conventions.

* test: Strengthen OCR assertion and add filterFiles error-path test

Use toHaveBeenCalledWith for the OCR filtering test to verify exact
arguments after the OCR→context merge step. Add test for filterFiles
rejection to verify graceful degradation (logs error, returns original
tool_resources).

* fix: Correct import order in addedConvo.js and initialize.js

Sort by total line length descending: loadAddedAgent (91) before
filterFilesByAgentAccess (84), loadAgentTools (91) before
filterFilesByAgentAccess (84).

* test: Add unit tests for filterFilesByAgentAccess and hasAccessToFilesViaAgent

Cover every branch in permissions.js: ephemeral agent guard, missing
userId/agentId/files early returns, all-owned short-circuit, mixed
owned + non-owned with VIEW/no-VIEW, agent-not-found fail-closed,
author path scoped to attached files, EDIT gate on delete, DB error
fail-closed, and agent with no tool_resources.

* test: Cover file.user undefined/null in permissions spec

Files with no user field fall into the non-owned path and get run
through hasAccessToFilesViaAgent. Add two cases: attached file with
no user field is returned, unattached file with no user field is
excluded.
2026-03-15 23:02:36 -04:00
Danny Avila
9a5d7eaa4e
refactor: Replace tiktoken with ai-tokenizer (#12175)
* chore: Update dependencies by adding ai-tokenizer and removing tiktoken

- Added ai-tokenizer version 1.0.6 to package.json and package-lock.json across multiple packages.
- Removed tiktoken version 1.0.15 from package.json and package-lock.json in the same locations, streamlining dependency management.

* refactor: replace js-tiktoken with ai-tokenizer

- Added support for 'claude' encoding in the AgentClient class to improve model compatibility.
- Updated Tokenizer class to utilize 'ai-tokenizer' for both 'o200k_base' and 'claude' encodings, replacing the previous 'tiktoken' dependency.
- Refactored tests to reflect changes in tokenizer behavior and ensure accurate token counting for both encoding types.
- Removed deprecated references to 'tiktoken' and adjusted related tests for improved clarity and functionality.

* chore: remove tiktoken mocks from DALLE3 tests

- Eliminated mock implementations of 'tiktoken' from DALLE3-related test files to streamline test setup and align with recent dependency updates.
- Adjusted related test structures to ensure compatibility with the new tokenizer implementation.

* chore: Add distinct encoding support for Anthropic Claude models

- Introduced a new method `getEncoding` in the AgentClient class to handle the specific BPE tokenizer for Claude models, ensuring compatibility with the distinct encoding requirements.
- Updated documentation to clarify the encoding logic for Claude and other models.

* docs: Update return type documentation for getEncoding method in AgentClient

- Clarified the return type of the getEncoding method to specify that it can return an EncodingName or undefined, enhancing code readability and type safety.

* refactor: Tokenizer class and error handling

- Exported the EncodingName type for broader usage.
- Renamed encodingMap to encodingData for clarity.
- Improved error handling in getTokenCount method to ensure recovery attempts are logged and return 0 on failure.
- Updated countTokens function documentation to specify the use of 'o200k_base' encoding.

* refactor: Simplify encoding documentation and export type

- Updated the getEncoding method documentation to clarify the default behavior for non-Anthropic Claude models.
- Exported the EncodingName type separately from the Tokenizer module for improved clarity and usage.

* test: Update text processing tests for token limits

- Adjusted test cases to handle smaller text sizes, changing scenarios from ~120k tokens to ~20k tokens for both the real tokenizer and countTokens functions.
- Updated token limits in tests to reflect new constraints, ensuring tests accurately assess performance and call reduction.
- Enhanced console log messages for clarity regarding token counts and reductions in the updated scenarios.

* refactor: Update Tokenizer imports and exports

- Moved Tokenizer and countTokens exports to the tokenizer module for better organization.
- Adjusted imports in memory.ts to reflect the new structure, ensuring consistent usage across the codebase.
- Updated memory.test.ts to mock the Tokenizer from the correct module path, enhancing test accuracy.

* refactor: Tokenizer initialization and error handling

- Introduced an async `initEncoding` method to preload tokenizers, improving performance and accuracy in token counting.
- Updated `getTokenCount` to handle uninitialized tokenizers more gracefully, ensuring proper recovery and logging on errors.
- Removed deprecated synchronous tokenizer retrieval, streamlining the overall tokenizer management process.

* test: Enhance tokenizer tests with initialization and encoding checks

- Added `beforeAll` hooks to initialize tokenizers for 'o200k_base' and 'claude' encodings before running tests, ensuring proper setup.
- Updated tests to validate the loading of encodings and the correctness of token counts for both 'o200k_base' and 'claude'.
- Improved test structure to deduplicate concurrent initialization calls, enhancing performance and reliability.
2026-03-10 23:14:52 -04:00
Danny Avila
e1e204d6cf
🧮 refactor: Bulk Transactions & Balance Updates for Token Spending (#11996)
* refactor: transaction handling by integrating pricing and bulk write operations

- Updated `recordCollectedUsage` to accept pricing functions and bulk write operations, improving transaction management.
- Refactored `AgentClient` and related controllers to utilize the new transaction handling capabilities, ensuring better performance and accuracy in token spending.
- Added tests to validate the new functionality, ensuring correct behavior for both standard and bulk transaction paths.
- Introduced a new `transactions.ts` file to encapsulate transaction-related logic and types, enhancing code organization and maintainability.

* chore: reorganize imports in agents client controller

- Moved `getMultiplier` and `getCacheMultiplier` imports to maintain consistency and clarity in the import structure.
- Removed duplicate import of `updateBalance` and `bulkInsertTransactions`, streamlining the code for better readability.

* refactor: add TransactionData type and CANCEL_RATE constant to data-schemas

Establishes a single source of truth for the transaction document shape
and the incomplete-context billing rate constant, both consumed by
packages/api and api/.

* refactor: use proper types in data-schemas transaction methods

- Replace `as unknown as { tokenCredits }` with `lean<IBalance>()`
- Use `TransactionData[]` instead of `Record<string, unknown>[]`
  for bulkInsertTransactions parameter
- Add JSDoc noting insertMany bypasses document middleware
- Remove orphan section comment in methods/index.ts

* refactor: use shared types in transactions.ts, fix bulk write logic

- Import CANCEL_RATE from data-schemas instead of local duplicate
- Import TransactionData from data-schemas for PreparedEntry/BulkWriteDeps
- Use tilde alias for EndpointTokenConfig import
- Pass valueKey through to getMultiplier
- Only sum tokenValue for balance-enabled docs in bulkWriteTransactions
- Consolidate two loops into single-pass map

* refactor: remove duplicate updateBalance from Transaction.js

Import updateBalance from ~/models (sourced from data-schemas) instead
of maintaining a second copy. Also import CANCEL_RATE from data-schemas
and remove the Balance model import (no longer needed directly).

* fix: test real spendCollectedUsage instead of IIFE replica

Export spendCollectedUsage from abortMiddleware.js and rewrite the test
file to import and test the actual function. Previously the tests ran
against a hand-written replica that could silently diverge from the real
implementation.

* test: add transactions.spec.ts and restore regression comments

Add 22 direct unit tests for transactions.ts financial logic covering
prepareTokenSpend, prepareStructuredTokenSpend, bulkWriteTransactions,
CANCEL_RATE paths, NaN guards, disabled transactions, zero tokens,
cache multipliers, and balance-enabled filtering.

Restore critical regression documentation comments in
recordCollectedUsage.spec.js explaining which production bugs the
tests guard against.

* fix: widen setValues type to include lastRefill

The UpdateBalanceParams.setValues type was Partial<Pick<IBalance,
'tokenCredits'>> which excluded lastRefill — used by
createAutoRefillTransaction. Widen to also pick 'lastRefill'.

* test: use real MongoDB for bulkWriteTransactions tests

Replace mock-based bulkWriteTransactions tests with real DB tests using
MongoMemoryServer. Pure function tests (prepareTokenSpend,
prepareStructuredTokenSpend) remain mock-based since they don't touch
DB. Add end-to-end integration tests that verify the full prepare →
bulk write → DB state pipeline with real Transaction and Balance models.

* chore: update @librechat/agents dependency to version 3.1.54 in package-lock.json and related package.json files

* test: add bulk path parity tests proving identical DB outcomes

Three test suites proving the bulk path (prepareTokenSpend/
prepareStructuredTokenSpend + bulkWriteTransactions) produces
numerically identical results to the legacy path for all scenarios:

- usage.bulk-parity.spec.ts: mirrors all legacy recordCollectedUsage
  tests; asserts same return values and verifies metadata fields on
  the insertMany docs match what spendTokens args would carry

- transactions.bulk-parity.spec.ts: real-DB tests using actual
  getMultiplier/getCacheMultiplier pricing functions; asserts exact
  tokenValue, rate, rawAmount and balance deductions for standard
  tokens, structured/cache tokens, CANCEL_RATE, premium pricing,
  multi-entry batches, and edge cases (NaN, zero, disabled)

- Transaction.spec.js: adds describe('Bulk path parity') that mirrors
  7 key legacy tests via recordCollectedUsage + bulk deps against
  real MongoDB, asserting same balance deductions and doc counts

* refactor: update llmConfig structure to use modelKwargs for reasoning effort

Refactor the llmConfig in getOpenAILLMConfig to store reasoning effort within modelKwargs instead of directly on llmConfig. This change ensures consistency in the configuration structure and improves clarity in the handling of reasoning properties in the tests.

* test: update performance checks in processAssistantMessage tests

Revise the performance assertions in the processAssistantMessage tests to ensure that each message processing time remains under 100ms, addressing potential ReDoS vulnerabilities. This change enhances the reliability of the tests by focusing on maximum processing time rather than relative ratios.

* test: fill parity test gaps — model fallback, abort context, structured edge cases

- usage.bulk-parity: add undefined model fallback test
- transactions.bulk-parity: add abort context test (txns inserted,
  balance unchanged when balance not passed), fix readTokens type cast
- Transaction.spec: add 3 missing mirrors — balance disabled with
  transactions enabled, structured transactions disabled, structured
  balance disabled

* fix: deduct balance before inserting transactions to prevent orphaned docs

Swap the order in bulkWriteTransactions: updateBalance runs before
insertMany. If updateBalance fails (after exhausting retries), no
transaction documents are written — avoiding the inconsistent state
where transactions exist in MongoDB with no corresponding balance
deduction.

* chore: import order

* test: update config.spec.ts for OpenRouter reasoning in modelKwargs

Same fix as llm.spec.ts — OpenRouter reasoning is now passed via
modelKwargs instead of llmConfig.reasoning directly.
2026-03-01 12:26:36 -05:00
Danny Avila
8b159079f5
🪙 feat: Add messageId to Transactions (#11987)
* feat: Add messageId to transactions

* chore: field order

* feat: Enhance token usage tracking by adding messageId parameter

- Updated `recordTokenUsage` method in BaseClient to accept a new `messageId` parameter for improved tracking.
- Propagated `messageId` in the AgentClient when recording usage.
- Added tests to ensure `messageId` is correctly passed and handled in various scenarios, including propagation across multiple usage entries.

* chore: Correct field order in createGeminiImageTool function

- Moved the conversationId field to the correct position in the object being passed to the recordTokenUsage method, ensuring proper parameter alignment for improved functionality.

* refactor: Update OpenAIChatCompletionController and createResponse to use responseId instead of requestId

- Replaced instances of requestId with responseId in the OpenAIChatCompletionController for improved clarity in logging and tracking.
- Updated createResponse to include responseId in the requestBody, ensuring consistency across the handling of message identifiers.

* test: Add messageId to agent client tests

- Included messageId in the agent client tests to ensure proper handling and propagation of message identifiers during transaction recording.
- This update enhances the test coverage for scenarios involving messageId, aligning with recent changes in the tracking of message identifiers.

* fix: Update OpenAIChatCompletionController to use requestId for context

- Changed the context object in OpenAIChatCompletionController to use `requestId` instead of `responseId` for improved clarity and consistency in handling request identifiers.

* chore: field order
2026-02-27 23:50:13 -05:00
Danny Avila
a0f9782e60
🪣 fix: Prevent Memory Retention from AsyncLocalStorage Context Propagation (#11942)
* fix: store hide_sequential_outputs before processStream clears config

processStream now clears config.configurable after completion to break
memory retention chains. Save hide_sequential_outputs to a local
variable before calling runAgents so the post-stream filter still works.

* feat: memory diagnostics

* chore: expose garbage collection in backend inspect command

Updated the backend inspect command in package.json to include the --expose-gc flag, enabling garbage collection diagnostics for improved memory management during development.

* chore: update @librechat/agents dependency to version 3.1.52

Bumped the version of @librechat/agents in package.json and package-lock.json to ensure compatibility and access to the latest features and fixes.

* fix: clear heavy config state after processStream to prevent memory leaks

Break the reference chain from LangGraph's internal __pregel_scratchpad
through @langchain/core RunTree.extra[lc:child_config] into the
AsyncLocalStorage context captured by timers and I/O handles.

After stream completion, null out symbol-keyed scratchpad properties
(currentTaskInput), config.configurable, and callbacks. Also call
Graph.clearHeavyState() to release config, signal, content maps,
handler registry, and tool sessions.

* chore: fix imports for memory utils

* chore: add circular dependency check in API build step

Enhanced the backend review workflow to include a check for circular dependencies during the API build process. If a circular dependency is detected, an error message is displayed, and the process exits with a failure status.

* chore: update API build step to include circular dependency detection

Modified the backend review workflow to rename the API package installation step to reflect its new functionality, which now includes detection of circular dependencies during the build process.

* chore: add memory diagnostics option to .env.example

Included a commented-out configuration option for enabling memory diagnostics in the .env.example file, which logs heap and RSS snapshots every 60 seconds when activated.

* chore: remove redundant agentContexts cleanup in disposeClient function

Streamlined the disposeClient function by eliminating duplicate cleanup logic for agentContexts, ensuring efficient memory management during client disposal.

* refactor: move runOutsideTracing utility to utils and update its usage

Refactored the runOutsideTracing function by relocating it to the utils module for better organization. Updated the tool execution handler to utilize the new import, ensuring consistent tracing behavior during tool execution.

* refactor: enhance connection management and diagnostics

Added a method to ConnectionsRepository for retrieving the active connection count. Updated UserConnectionManager to utilize this new method for app connection count reporting. Refined the OAuthReconnectionTracker's getStats method to improve clarity in diagnostics. Introduced a new tracing utility in the utils module to streamline tracing context management. Additionally, added a safeguard in memory diagnostics to prevent unnecessary snapshot collection for very short intervals.

* refactor: enhance tracing utility and add memory diagnostics tests

Refactored the runOutsideTracing function to improve warning logic when the AsyncLocalStorage context is missing. Added tests for memory diagnostics and tracing utilities to ensure proper functionality and error handling. Introduced a new test suite for memory diagnostics, covering snapshot collection and garbage collection behavior.
2026-02-25 17:41:23 -05:00
Danny Avila
f72378d389
🧩 chore: Extract Agent Client Utilities to /packages/api (#11789)
Extract 7 standalone utilities from api/server/controllers/agents/client.js
into packages/api/src/agents/client.ts for TypeScript support and to
declutter the 1400-line controller module:

- omitTitleOptions: Set of keys to exclude from title generation options
- payloadParser: Extracts model_parameters from request body for non-agent endpoints
- createTokenCounter: Factory for langchain-compatible token counting functions
- logToolError: Callback handler for agent tool execution errors
- findPrimaryAgentId: Resolves primary agent from suffixed parallel agent IDs
- createMultiAgentMapper: Message content processor that filters parallel agent
  output to primary agents and applies agent labels for handoff/multi-agent flows

Supporting changes:
- Add endpointOption and endpointType to RequestBody type (packages/api/src/types/http.ts)
  so payloadParser can access middleware-attached fields without type casts
- Add @typescript-eslint/no-unused-vars with underscore ignore patterns to the
  packages/api eslint config block, matching the convention used by client/ and
  data-provider/ blocks
- Update agent controller imports to consume the moved functions from @librechat/api
  and remove now-unused direct imports (logAxiosError, labelContentByAgent,
  getTokenCountForMessage)
2026-02-13 23:17:53 -05:00
Danny Avila
467df0f07a
🎭 feat: Override Custom Endpoint Schema with Specified Params Endpoint (#11788)
* 🔧 refactor: Simplify payload parsing and enhance getSaveOptions logic

- Removed unused bedrockInputSchema from payloadParser, streamlining the function.
- Updated payloadParser to handle optional chaining for model parameters.
- Enhanced getSaveOptions to ensure runOptions defaults to an empty object if parsing fails, improving robustness.
- Adjusted the assignment of maxContextTokens to use the instance variable for consistency.

* 🔧 fix: Update maxContextTokens assignment logic in initializeAgent function

- Enhanced the maxContextTokens assignment to allow for user-defined values, ensuring it defaults to a calculated value only when not provided or invalid. This change improves flexibility in agent initialization.

* 🧪 test: Add unit tests for initializeAgent function

- Introduced comprehensive unit tests for the initializeAgent function, focusing on maxContextTokens behavior.
- Tests cover scenarios for user-defined values, fallback calculations, and edge cases such as zero and negative values, enhancing overall test coverage and reliability of agent initialization logic.

* refactor: default params Endpoint Configuration Handling

- Integrated `getEndpointsConfig` to fetch endpoint configurations, allowing for dynamic handling of `defaultParamsEndpoint`.
- Updated `buildEndpointOption` to pass `defaultParamsEndpoint` to `parseCompactConvo`, ensuring correct parameter handling based on endpoint type.
- Added comprehensive unit tests for `buildDefaultConvo` and `cleanupPreset` to validate behavior with `defaultParamsEndpoint`, covering various scenarios and edge cases.
- Refactored related hooks and utility functions to support the new configuration structure, improving overall flexibility and maintainability.

* refactor: Centralize defaultParamsEndpoint retrieval

- Introduced `getDefaultParamsEndpoint` function to streamline the retrieval of `defaultParamsEndpoint` across various hooks and middleware.
- Updated multiple files to utilize the new function, enhancing code consistency and maintainability.
- Removed redundant logic for fetching `defaultParamsEndpoint`, simplifying the codebase.
2026-02-13 23:04:51 -05:00
Danny Avila
7067c35787
🏁 fix: Resolve Content Aggregation Race Condition in Agent Event Handlers (#11757)
* 🔧 refactor: Consolidate aggregateContent calls in agent handlers

- Moved aggregateContent function calls to the beginning of the event handling functions in the agent callbacks to ensure consistent data aggregation before processing events. This change improves code clarity and maintains the intended functionality without redundancy.

* 🔧 chore: Update @librechat/agents to version 3.1.40 in package.json and package-lock.json across multiple packages

* 🔧 fix: Increase default recursion limit in AgentClient from 25 to 50 for improved processing capability
2026-02-12 15:42:22 -05:00
Danny Avila
5af1342dbb
🦥 refactor: Event-Driven Lazy Tool Loading (#11588)
* refactor: json schema tools with lazy loading

- Added LocalToolExecutor class for lazy loading and caching of tools during execution.
- Introduced ToolExecutionContext and ToolExecutor interfaces for better type management.
- Created utility functions to generate tool proxies with JSON schema support.
- Added ExtendedJsonSchema type for enhanced schema definitions.
- Updated existing toolkits to utilize the new schema and executor functionalities.
- Introduced a comprehensive tool definitions registry for managing various tool schemas.

chore: update @librechat/agents to version 3.1.2

refactor: enhance tool loading optimization and classification

- Improved the loadAgentToolsOptimized function to utilize a proxy pattern for all tools, enabling deferred execution and reducing overhead.
- Introduced caching for tool instances and refined tool classification logic to streamline tool management.
- Updated the handling of MCP tools to improve logging and error reporting for missing tools in the cache.
- Enhanced the structure of tool definitions to support better classification and integration with existing tools.

refactor: modularize tool loading and enhance optimization

- Moved the loadAgentToolsOptimized function to a new service file for better organization and maintainability.
- Updated the ToolService to utilize the new service for optimized tool loading, improving code clarity.
- Removed legacy tool loading methods and streamlined the tool loading process to enhance performance and reduce complexity.
- Introduced feature flag handling for optimized tool loading, allowing for easier toggling of this functionality.

refactor: replace loadAgentToolsWithFlag with loadAgentTools in tool loader

refactor: enhance MCP tool loading with proxy creation and classification

refactor: optimize MCP tool loading by grouping tools by server

- Introduced a Map to group cached tools by server name, improving the organization of tool data.
- Updated the createMCPProxyTool function to accept server name directly, enhancing clarity.
- Refactored the logic for handling MCP tools, streamlining the process of creating proxy tools for classification.

refactor: enhance MCP tool loading and proxy creation

- Added functionality to retrieve MCP server tools and reinitialize servers if necessary, improving tool availability.
- Updated the tool loading logic to utilize a Map for organizing tools by server, enhancing clarity and performance.
- Refactored the createToolProxy function to ensure a default response format, streamlining tool creation.

refactor: update createToolProxy to ensure consistent response format

- Modified the createToolProxy function to await the executor's execution and validate the result format.
- Ensured that the function returns a default response structure when the result is not an array of two elements, enhancing reliability in tool proxy creation.

refactor: ToolExecutionContext with toolCall property

- Added toolCall property to ToolExecutionContext interface for improved context handling during tool execution.
- Updated LocalToolExecutor to include toolCall in the runnable configuration, allowing for more flexible tool invocation.
- Modified createToolProxy to pass toolCall from the configuration, ensuring consistent context across tool executions.

refactor: enhance event-driven tool execution and logging

- Introduced ToolExecuteOptions for improved handling of event-driven tool execution, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to include support for ON_TOOL_EXECUTE events, enhancing the flexibility of tool invocation.
- Added detailed logging in LocalToolExecutor to track tool loading and execution metrics, improving observability and debugging capabilities.
- Refactored initializeClient to integrate event-driven tool loading, ensuring compatibility with the new execution model.

chore: update @librechat/agents to version 3.1.21

refactor: remove legacy tool loading and executor components

- Eliminated the loadAgentToolsWithFlag function, simplifying the tool loading process by directly using loadAgentTools.
- Removed the LocalToolExecutor and related executor components to streamline the tool execution architecture.
- Updated ToolService and related files to reflect the removal of deprecated features, enhancing code clarity and maintainability.

refactor: enhance tool classification and definitions handling

- Updated the loadAgentTools function to return toolDefinitions alongside toolRegistry, improving the structure of tool data returned to clients.
- Removed the convertRegistryToDefinitions function from the initialize.js file, simplifying the initialization process.
- Adjusted the buildToolClassification function to ensure toolDefinitions are built and returned simultaneously with the toolRegistry, enhancing efficiency in tool management.
- Updated type definitions in initialize.ts to include toolDefinitions, ensuring consistency across the codebase.

refactor: implement event-driven tool execution handler

- Introduced createToolExecuteHandler function to streamline the handling of ON_TOOL_EXECUTE events, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to utilize the new handler, simplifying the event-driven architecture.
- Added handlers.ts file to encapsulate tool execution logic, improving code organization and maintainability.
- Enhanced OpenAI handlers to integrate the new tool execution capabilities, ensuring consistent event handling across the application.

refactor: integrate event-driven tool execution options

- Added toolExecuteOptions to support event-driven tool execution in OpenAI and responses controllers, enhancing flexibility in tool handling.
- Updated handlers to utilize createToolExecuteHandler, allowing for streamlined execution of tools during agent interactions.
- Refactored service dependencies to include toolExecuteOptions, ensuring consistent integration across the application.

refactor: enhance tool loading with definitionsOnly parameter

- Updated createToolLoader and loadAgentTools functions to include a definitionsOnly parameter, allowing for the retrieval of only serializable tool definitions in event-driven mode.
- Adjusted related interfaces and documentation to reflect the new parameter, improving clarity and flexibility in tool management.
- Ensured compatibility across various components by integrating the definitionsOnly option in the initialization process.

refactor: improve agent tool presence check in initialization

- Added a check for tool presence using a new hasAgentTools variable, which evaluates both structuredTools and toolDefinitions.
- Updated the conditional logic in the agent initialization process to utilize the hasAgentTools variable, enhancing clarity and maintainability in tool management.

refactor: enhance agent tool extraction to support tool definitions

- Updated the extractMCPServers function to handle both tool instances and serializable tool definitions, improving flexibility in agent tool management.
- Added a new property toolDefinitions to the AgentWithTools type for better integration of event-driven mode.
- Enhanced documentation to clarify the function's capabilities in extracting unique MCP server names from both tools and tool definitions.

refactor: enhance tool classification and registry building

- Added serverName property to ToolDefinition for improved tool identification.
- Introduced buildToolRegistry function to streamline the creation of tool registries based on MCP tool definitions and agent options.
- Updated buildToolClassification to utilize the new registry building logic, ensuring basic definitions are returned even when advanced classification features are not allowed.
- Enhanced documentation and logging for clarity in tool classification processes.

refactor: update @librechat/agents dependency to version 3.1.22

fix: expose loadTools function in ToolService

- Added loadTools function to the exported module in ToolService.js, enhancing the accessibility of tool loading functionality.

chore: remove configurable options from tool execute options in OpenAI controller

refactor: enhance tool loading mechanism to utilize agent-specific context

chore: update @librechat/agents dependency to version 3.1.23

fix: simplify result handling in createToolExecuteHandler

* refactor: loadToolDefinitions for efficient tool loading in event-driven mode

* refactor: replace legacy tool loading with loadToolsForExecution in OpenAI and responses controllers

- Updated OpenAIChatCompletionController and createResponse functions to utilize loadToolsForExecution for improved tool loading.
- Removed deprecated loadToolsLegacy references, streamlining the tool execution process.
- Enhanced tool loading options to include agent-specific context and configurations.

* refactor: enhance tool loading and execution handling

- Introduced loadActionToolsForExecution function to streamline loading of action tools, improving organization and maintainability.
- Updated loadToolsForExecution to handle both regular and action tools, optimizing the tool loading process.
- Added detailed logging for missing tools in createToolExecuteHandler, enhancing error visibility.
- Refactored tool definitions to normalize action tool names, improving consistency in tool management.

* refactor: enhance built-in tool definitions loading

- Updated loadToolDefinitions to include descriptions and parameters from the tool registry for built-in tools, improving the clarity and usability of tool definitions.
- Integrated getToolDefinition to streamline the retrieval of tool metadata, enhancing the overall tool management process.

* feat: add action tool definitions loading to tool service

- Introduced getActionToolDefinitions function to load action tool definitions based on agent ID and tool names, enhancing the tool loading process.
- Updated loadToolDefinitions to integrate action tool definitions, allowing for better management and retrieval of action-specific tools.
- Added comprehensive tests for action tool definitions to ensure correct loading and parameter handling, improving overall reliability and functionality.

* chore: update @librechat/agents dependency to version 3.1.26

* refactor: add toolEndCallback to handle tool execution results

* fix: tool definitions and execution handling

- Introduced native tools (execute_code, file_search, web_search) to the tool service, allowing for better integration and management of these tools.
- Updated isBuiltInTool function to include native tools in the built-in check, improving tool recognition.
- Added comprehensive tests for loading parameters of native tools, ensuring correct functionality and parameter handling.
- Enhanced tool definitions registry to include new agent tool definitions, streamlining tool retrieval and management.

* refactor: enhance tool loading and execution context

- Added toolRegistry to the context for OpenAIChatCompletionController and createResponse functions, improving tool management.
- Updated loadToolsForExecution to utilize toolRegistry for better integration of programmatic tools and tool search functionalities.
- Enhanced the initialization process to include toolRegistry in agent context, streamlining tool access and configuration.
- Refactored tool classification logic to support event-driven execution, ensuring compatibility with new tool definitions.

* chore: add request duration logging to OpenAI and Responses controllers

- Introduced logging for request start and completion times in OpenAIChatCompletionController and createResponse functions.
- Calculated and logged the duration of each request, enhancing observability and performance tracking.
- Improved debugging capabilities by providing detailed logs for both streaming and non-streaming responses.

* chore: update @librechat/agents dependency to version 3.1.27

* refactor: implement buildToolSet function for tool management

- Introduced buildToolSet function to streamline the creation of tool sets from agent configurations, enhancing tool management across various controllers.
- Updated AgentClient, OpenAIChatCompletionController, and createResponse functions to utilize buildToolSet, improving consistency in tool handling.
- Added comprehensive tests for buildToolSet to ensure correct functionality and edge case handling, enhancing overall reliability.

* refactor: update import paths for ToolExecuteOptions and createToolExecuteHandler

* fix: update GoogleSearch.js description for maximum search results

- Changed the default maximum number of search results from 10 to 5 in the Google Search JSON schema description, ensuring accurate documentation of the expected behavior.

* chore: remove deprecated Browser tool and associated assets

- Deleted the Browser tool definition from manifest.json, which included its name, plugin key, description, and authentication configuration.
- Removed the web-browser.svg asset as it is no longer needed following the removal of the Browser tool.

* fix: ensure tool definitions are valid before processing

- Added a check to verify the existence of tool definitions in the registry before accessing their properties, preventing potential runtime errors.
- Updated the loading logic for built-in tool definitions to ensure that only valid definitions are pushed to the built-in tool definitions array.

* fix: extend ExtendedJsonSchema to support 'null' type and nullable enums

- Updated the ExtendedJsonSchema type to include 'null' as a valid type option.
- Modified the enum property to accept an array of values that can include strings, numbers, booleans, and null, enhancing schema flexibility.

* test: add comprehensive tests for tool definitions loading and registry behavior

- Implemented tests to verify the handling of built-in tools without registry definitions, ensuring they are skipped correctly.
- Added tests to confirm that built-in tools include descriptions and parameters in the registry.
- Enhanced tests for action tools, checking for proper inclusion of metadata and handling of tools without parameters in the registry.

* test: add tests for mixed-type and number enum schema handling

- Introduced tests to validate the parsing of mixed-type enum values, including strings, numbers, booleans, and null.
- Added tests for number enum schema values to ensure correct parsing of numeric inputs, enhancing schema validation coverage.

* fix: update mock implementation for @librechat/agents

- Changed the mock for @librechat/agents to spread the actual module's properties, ensuring that all necessary functionalities are preserved in tests.
- This adjustment enhances the accuracy of the tests by reflecting the real structure of the module.

* fix: change max_results type in GoogleSearch schema from number to integer

- Updated the type of max_results in the Google Search JSON schema to 'integer' for better type accuracy and validation consistency.

* fix: update max_results description and type in GoogleSearch schema

- Changed the type of max_results from 'number' to 'integer' for improved type accuracy.
- Updated the description to reflect the new default maximum number of search results, changing it from 10 to 5.

* refactor: remove unused code and improve tool registry handling

- Eliminated outdated comments and conditional logic related to event-driven mode in the ToolService.
- Enhanced the handling of the tool registry by ensuring it is configurable for better integration during tool execution.

* feat: add definitionsOnly option to buildToolClassification for event-driven mode

- Introduced a new parameter, definitionsOnly, to the BuildToolClassificationParams interface to enable a mode that skips tool instance creation.
- Updated the buildToolClassification function to conditionally add tool definitions without instantiating tools when definitionsOnly is true.
- Modified the loadToolDefinitions function to pass definitionsOnly as true, ensuring compatibility with the new feature.

* test: add unit tests for buildToolClassification with definitionsOnly option

- Implemented tests to verify the behavior of buildToolClassification when definitionsOnly is set to true or false.
- Ensured that tool instances are not created when definitionsOnly is true, while still adding necessary tool definitions.
- Confirmed that loadAuthValues is called appropriately based on the definitionsOnly parameter, enhancing test coverage for this new feature.
2026-02-01 08:50:57 -05:00
Danny Avila
75c02a1a18
🗂️ feat: Better Persistence for Code Execution Files Between Sessions (#11362)
* refactor: process code output files for re-use (WIP)

* feat: file attachment handling with additional metadata for downloads

* refactor: Update directory path logic for local file saving based on basePath

* refactor: file attachment handling to support TFile type and improve data merging logic

* feat: thread filtering of code-generated files

- Introduced parentMessageId parameter in addedConvo and initialize functions to enhance thread management.
- Updated related methods to utilize parentMessageId for retrieving messages and filtering code-generated files by conversation threads.
- Enhanced type definitions to include parentMessageId in relevant interfaces for better clarity and usage.

* chore: imports/params ordering

* feat: update file model to use messageId for filtering and processing

- Changed references from 'message' to 'messageId' in file-related methods for consistency.
- Added messageId field to the file schema and updated related types.
- Enhanced file processing logic to accommodate the new messageId structure.

* feat: enhance file retrieval methods to support user-uploaded execute_code files

- Added a new method `getUserCodeFiles` to retrieve user-uploaded execute_code files, excluding code-generated files.
- Updated existing file retrieval methods to improve filtering logic and handle edge cases.
- Enhanced thread data extraction to collect both message IDs and file IDs efficiently.
- Integrated `getUserCodeFiles` into relevant endpoints for better file management in conversations.

* chore: update @librechat/agents package version to 3.0.78 in package-lock.json and related package.json files

* refactor: file processing and retrieval logic

- Added a fallback mechanism for download URLs when files exceed size limits or cannot be processed locally.
- Implemented a deduplication strategy for code-generated files based on conversationId and filename to optimize storage.
- Updated file retrieval methods to ensure proper filtering by messageIds, preventing orphaned files from being included.
- Introduced comprehensive tests for new thread data extraction functionality, covering edge cases and performance considerations.

* fix: improve file retrieval tests and handling of optional properties

- Updated tests to safely access optional properties using non-null assertions.
- Modified test descriptions for clarity regarding the exclusion of execute_code files.
- Ensured that the retrieval logic correctly reflects the expected outcomes for file queries.

* test: add comprehensive unit tests for processCodeOutput functionality

- Introduced a new test suite for the processCodeOutput function, covering various scenarios including file retrieval, creation, and processing for both image and non-image files.
- Implemented mocks for dependencies such as axios, logger, and file models to isolate tests and ensure reliable outcomes.
- Validated behavior for existing files, new file creation, and error handling, including size limits and fallback mechanisms.
- Enhanced test coverage for metadata handling and usage increment logic, ensuring robust verification of file processing outcomes.

* test: enhance file size limit enforcement in processCodeOutput tests

- Introduced a configurable file size limit for tests to improve flexibility and coverage.
- Mocked the `librechat-data-provider` to allow dynamic adjustment of file size limits during tests.
- Updated the file size limit enforcement test to validate behavior when files exceed specified limits, ensuring proper fallback to download URLs.
- Reset file size limit after tests to maintain isolation for subsequent test cases.
2026-01-28 17:44:32 -05:00
Danny Avila
7c9c7e530b
⏲️ feat: Defer Loading MCP Tools (#11270)
* WIP: code ptc

* refactor: tool classification and calling logic

* 🔧 fix: Update @librechat/agents dependency to version 3.0.68

* chore: import order and correct renamed tool name for tool search

* refactor: streamline tool classification logic for local and programmatic tools

* feat: add per-tool configuration options for agents, including deferred loading and allowed callers

- Introduced `tool_options` in agent forms to manage tool behavior.
- Updated tool classification logic to prioritize agent-level configurations.
- Enhanced UI components to support tool deferral functionality.
- Added localization strings for new tool options and actions.

* feat: enhance agent schema with per-tool options for configuration

- Added `tool_options` schema to support per-tool configurations, including `defer_loading` and `allowed_callers`.
- Updated agent data model to incorporate new tool options, ensuring flexibility in tool behavior management.
- Modified type definitions to reflect the new `tool_options` structure for agents.

* feat: add tool_options parameter to loadTools and initializeAgent for enhanced agent configuration

* chore: update @librechat/agents dependency to version 3.0.71 and enhance agent tool loading logic

- Updated the @librechat/agents package to version 3.0.71 across multiple files.
- Added support for handling deferred loading of tools in agent initialization and execution processes.
- Improved the extraction of discovered tools from message history to optimize tool loading behavior.

* chore: update @librechat/agents dependency to version 3.0.72

* chore: update @librechat/agents dependency to version 3.0.75

* refactor: simplify tool defer loading logic in MCPTool component

- Removed local state management for deferred tools, relying on form state instead.
- Updated related functions to directly use form values for checking and toggling defer loading.
- Cleaned up code by eliminating unnecessary optimistic updates and local state dependencies.

* chore: remove deprecated localization strings for tool deferral in translation.json

- Eliminated unused strings related to deferred loading descriptions in the English translation file.
- Streamlined localization to reflect recent changes in tool loading logic.

* refactor: improve tool defer loading handling in MCPTool component

- Enhanced the logic for managing deferred loading of tools by simplifying the update process for tool options.
- Ensured that the state reflects the correct loading behavior based on the new deferred loading conditions.
- Cleaned up the code to remove unnecessary complexity in handling tool options.

* refactor: update agent mocks in callbacks test to use actual implementations

- Modified the agent mocks in the callbacks test to include actual implementations from the @librechat/agents module.
- This change enhances the accuracy of the tests by ensuring they reflect the real behavior of the agent functions.
2026-01-28 17:44:30 -05:00
Danny Avila
0b4deac953
🧩 fix: Missing Memory Agent Assignment for Matching IDs (#11514)
* fix: `useMemory` in AgentClient for PrelimAgent Assignment

* Updated the useMemory method in AgentClient to handle prelimAgent assignment based on memory configuration.
* Added logic to return early if prelimAgent is undefined, improving flow control.
* Introduced comprehensive unit tests to validate behavior for various memory configurations, including scenarios for matching and differing agent IDs, as well as handling of ephemeral agents.
* Mocked necessary dependencies in tests to ensure isolation and reliability of the new functionality.

* fix: Update temperature handling for Bedrock and Anthropic providers in memory management

* fix: Replace hardcoded provider strings with constants in memory agent tests

* fix: Replace hardcoded provider string with constant in allowedProviders for AgentClient

* fix: memory agent tests to use actual Providers and GraphEvents constants
2026-01-25 12:08:52 -05:00
Danny Avila
cfd5c793a9
🧑‍🏫 fix: Multi-Agent Instructions Handling (#11484)
* 🧑‍🏫 fix: Multi-Agent Instructions Handling

* Refactored AgentClient to streamline the process of building messages by applying shared run context and agent-specific instructions.
* Introduced new utility functions in context.ts for extracting MCP server names, fetching MCP instructions, and building combined agent instructions.
* Updated the Agent type to make instructions optional, allowing for more flexible agent configurations.
* Improved the handling of context application to agents, ensuring that all relevant information is correctly integrated before execution.

* chore: Update EphemeralAgent Type in Context

* Enhanced the context.ts file by importing the TEphemeralAgent type from librechat-data-provider.
* Updated the applyContextToAgent function to use TEphemeralAgent for the ephemeralAgent parameter, improving type safety and clarity in agent context handling.

* ci: Update Agent Instructions in Tests for Clarity

* Revised test assertions in AgentClient to clarify the source of agent instructions, ensuring they are explicitly referenced as coming from agent configuration rather than build options.
* Updated comments in tests to enhance understanding of the expected behavior regarding base agent instructions and their handling in various scenarios.

* ci: Unit Tests for Agent Context Utilities

* Introduced comprehensive unit tests for agent context utilities, including functions for extracting MCP servers, fetching MCP instructions, and building agent instructions.
* Enhanced test coverage to ensure correct behavior across various scenarios, including handling of empty tools, mixed tool types, and error cases.
* Improved type definitions for AgentWithTools to clarify the structure and requirements for agent context operations.
2026-01-22 19:36:06 -05:00
Danny Avila
36c5a88c4e
💰 fix: Multi-Agent Token Spending & Prevent Double-Spend (#11433)
* fix: Token Spending Logic for Multi-Agents on Abort Scenarios

* Implemented logic to skip token spending if a conversation is aborted, preventing double-spending.
* Introduced `spendCollectedUsage` function to handle token spending for multiple models during aborts, ensuring accurate accounting for parallel agents.
* Updated `GenerationJobManager` to store and retrieve collected usage data for improved abort handling.
* Added comprehensive tests for the new functionality, covering various scenarios including cache token handling and parallel agent usage.

* fix: Memory Context Handling for Multi-Agents

* Refactored `buildMessages` method to pass memory context to parallel agents, ensuring they share the same user context.
* Improved handling of memory context when no existing instructions are present for parallel agents.
* Added comprehensive tests to verify memory context propagation and behavior under various scenarios, including cases with no memory available and empty agent configurations.
* Enhanced logging for better traceability of memory context additions to agents.

* chore: Memory Context Documentation for Parallel Agents

* Updated documentation in the `AgentClient` class to clarify the in-place mutation of agentConfig objects when passing memory context to parallel agents.
* Added notes on the implications of mutating objects directly to ensure all parallel agents receive the correct memory context before execution.

* chore: UsageMetadata Interface docs for Token Spending

* Expanded the UsageMetadata interface to support both OpenAI and Anthropic cache token formats.
* Added detailed documentation for cache token properties, including mutually exclusive fields for different model types.
* Improved clarity on how to access cache token details for accurate token spending tracking.

* fix: Enhance Token Spending Logic in Abort Middleware

* Refactored `spendCollectedUsage` function to utilize Promise.all for concurrent token spending, improving performance and ensuring all operations complete before clearing the collectedUsage array.
* Added documentation to clarify the importance of clearing the collectedUsage array to prevent double-spending in abort scenarios.
* Updated tests to verify the correct behavior of the spending logic and the clearing of the array after spending operations.
2026-01-20 14:43:19 -05:00
Danny Avila
2a50c372ef
🪙 refactor: Collected Usage & Anthropic Prompt Caching (#11319)
* 🔧 refactor: Improve token calculation in AgentClient.recordCollectedUsage

- Updated the token calculation logic to sum output tokens directly from all entries, addressing issues with negative values in parallel execution scenarios.
- Added comments for clarity on the usage of input tokens and output tokens.
- Introduced a new test file for comprehensive testing of the recordCollectedUsage function, covering various execution scenarios including sequential and parallel processing, cache token handling, and model fallback logic.

* 🔧 refactor: Anthropic `promptCache` handling in LLM configuration

* 🔧 test: Add comprehensive test for cache token handling in recordCollectedUsage

- Introduced a new test case to validate the handling of cache tokens across multiple tool calls in the recordCollectedUsage function.
- Ensured correct calculations for input and output tokens, including scenarios with cache creation and reading.
- Verified the expected interactions with token spending methods to enhance the robustness of the token management logic.
2026-01-12 23:02:08 -05:00
Scott Finlay
083251508e
⏭️ fix: Skip Title Generation for Temporary Chats (#11282)
* Not generating titles for temporary chats

* Minor linter fix to prettify debug line

* Adding a test for skipping title generation for temporary chats
2026-01-09 14:34:30 -05:00
Dustin Healy
b5aa38ff33
💾 feat: Custom Endpoint Support for Memory LLM Config (#11214)
* feat: add support for designating custom endpoints to use with memory tool

* test: add tests for header resolution in processMemory

* chore: address comments
2026-01-06 11:25:07 -05:00
Danny Avila
e452c1a8d9
🔀 refactor: Conditional Mapping Support for Multi-Convo (Parallel) Messages (#11180)
* refactor: message handling with addedConvo support

- Introduced `addedConvo` property in message schema to track conversation additions.
- Updated `BaseClient` to conditionally include `addedConvo` in saved messages based on request body.
- Enhanced `AgentClient` to apply mapping logic for messages with the `addedConvo` flag, improving message processing.
- Updated documentation to reflect new optional `mapCondition` parameter for message mapping functions, enhancing flexibility in message handling.

* test: Add comprehensive tests for getMessagesForConversation method

- Introduced a suite of tests for the `getMessagesForConversation` method in the `AgentClient` to validate mapping logic based on `mapMethod` and `mapCondition`.
- Covered various scenarios including applying mapping to all messages, conditional mapping based on `addedConvo`, handling of empty messages, and preserving message order.
- Ensured robust handling of edge cases such as null `mapMethod` and undefined `mapCondition`, enhancing overall test coverage and reliability of message processing.
2026-01-02 19:42:54 -05:00
Danny Avila
791dab8f20
🫱🏼🫲🏽 refactor: Improve Agent Handoffs (#11172)
* fix: Tool Resources Dropped between Agent Handoffs

* fix: agent deletion process to remove handoff edges

- Added logic to the `deleteAgent` function to remove references to the deleted agent from other agents' handoff edges.
- Implemented error handling to log any issues encountered during the edge removal process.
- Introduced a new test case to verify that handoff edges are correctly removed when an agent is deleted, ensuring data integrity across agent relationships.

* fix: Improve agent loading process by handling orphaned references

- Added logic to track and log agents that fail to load during initialization, preventing errors from interrupting the process.
- Introduced a Set to store skipped agent IDs and updated edge filtering to exclude these orphaned references, enhancing data integrity in agent relationships.

* chore: Update @librechat/agents to version 3.0.62

* feat: Enhance agent initialization with edge collection and filtering

- Introduced new functions for edge collection and filtering orphaned edges, improving the agent loading process.
- Refactored the `initializeClient` function to utilize breadth-first search (BFS) for discovering connected agents, enabling transitive handoffs.
- Added a new module for edge-related utilities, including deduplication and participant extraction, to streamline edge management.
- Updated the agent configuration handling to ensure proper edge processing and integrity during initialization.

* refactor: primary agent ID selection for multi-agent conversations

- Added a new function `findPrimaryAgentId` to determine the primary agent ID from a set of agent IDs based on suffix rules.
- Updated `createMultiAgentMapper` to filter messages by primary agent for parallel agents and handle handoffs appropriately.
- Enhanced message processing logic to ensure correct inclusion of agent content based on group and agent ID presence.
- Improved documentation to clarify the distinctions between parallel execution and handoff scenarios.

* feat: Implement primary agent ID selection for multi-agent content filtering

* chore: Update @librechat/agents to version 3.0.63 in package.json and package-lock.json

* chore: Update @librechat/agents to version 3.0.64 in package.json and package-lock.json

* chore: Update @librechat/agents to version 3.0.65 in package.json and package-lock.json

* feat: Add optional agent name to run creation for improved identification

* chore: Update @librechat/agents to version 3.0.66 in package.json and package-lock.json

* test: Add unit tests for edge utilities including key generation, participant extraction, and orphaned edge filtering

- Implemented tests for `getEdgeKey`, `getEdgeParticipants`, `filterOrphanedEdges`, and `createEdgeCollector` functions.
- Ensured comprehensive coverage for various edge cases, including handling of arrays and default values.
- Verified correct behavior of edge filtering based on skipped agents and deduplication of edges.
2026-01-01 16:02:51 -05:00
Joseph Licata
90c63a56f3
🤖 feat: Anthropic Vertex AI Support (#10780)
* feat: Add Anthropic Vertex AI Support

* Remove changes from the unused AnthropicClient class

* Add @anthropic-ai/vertex-sdk as peerDependency to packages/api

* Clean up Vertex AI credentials handling

* feat: websearch header

* feat: add prompt caching support for Anthropic Vertex AI

- Support both OpenAI format (input_token_details) and Anthropic format (cache_*_input_tokens) for token usage tracking

- Filter out unsupported anthropic-beta header values for Vertex AI (prompt-caching, max-tokens, output-128k, token-efficient-tools, context-1m)

*  feat: Add Vertex AI support for Anthropic models

- Introduced configuration options for running Anthropic models via Google Cloud Vertex AI in the YAML file.
- Updated ModelService to prioritize Vertex AI models from the configuration.
- Enhanced endpoint configuration to enable Anthropic endpoint when Vertex AI is configured.
- Implemented validation and processing for Vertex AI credentials and options.
- Added new types and schemas for Vertex AI configuration in the data provider.
- Created utility functions for loading and validating Vertex AI credentials and configurations.
- Updated various services to integrate Vertex AI options into the Anthropic client setup.

* 🔒 fix: Improve error handling for missing credentials in LLM configuration

- Updated the `getLLMConfig` function to throw a specific error message when credentials are missing, enhancing clarity for users.
- Refactored the `parseCredentials` function to handle plain API key strings more gracefully, returning them wrapped in an object if JSON parsing fails.

* 🔧 refactor: Clean up code formatting and improve readability

- Updated the `setOptions` method in `AgentClient` to use a parameter name for clarity.
- Refactored error handling in `loadDefaultModels` for better readability.
- Removed unnecessary blank lines in `initialize.js`, `endpoints.ts`, and `vertex.ts` to streamline the code.
- Enhanced formatting in `validateVertexConfig` for improved consistency and clarity.

* 🔧 refactor: Enhance Vertex AI Model Configuration and Integration

- Updated the YAML configuration to support visible model names and deployment mappings for Vertex AI.
- Refactored the `loadDefaultModels` function to utilize the new model name structure.
- Improved the `initializeClient` function to pass full Vertex AI configuration, including model mappings.
- Added utility functions to map visible model names to deployment names, enhancing the integration of Vertex AI models.
- Updated various services and types to accommodate the new model configuration schema and improve overall clarity and functionality.

* 🔧 chore: Update @anthropic-ai/sdk dependency to version 0.71.0 in package.json and package-lock.json

* refactor: Change clientOptions declaration from let to const in initialize.ts for better code clarity

* chore: repository cleanup

* 🌊 feat: Resumable LLM Streams with Horizontal Scaling (#10926)

*  feat: Implement Resumable Generation Jobs with SSE Support

- Introduced GenerationJobManager to handle resumable LLM generation jobs independently of HTTP connections.
- Added support for subscribing to ongoing generation jobs via SSE, allowing clients to reconnect and receive updates without losing progress.
- Enhanced existing agent controllers and routes to integrate resumable functionality, including job creation, completion, and error handling.
- Updated client-side hooks to manage adaptive SSE streams, switching between standard and resumable modes based on user settings.
- Added UI components and settings for enabling/disabling resumable streams, improving user experience during unstable connections.

* WIP: resuming

* WIP: resumable stream

* feat: Enhance Stream Management with Abort Functionality

- Updated the abort endpoint to support aborting ongoing generation streams using either streamId or conversationId.
- Introduced a new mutation hook `useAbortStreamMutation` for client-side integration.
- Added `useStreamStatus` query to monitor stream status and facilitate resuming conversations.
- Enhanced `useChatHelpers` to incorporate abort functionality when stopping generation.
- Improved `useResumableSSE` to handle stream errors and token refresh seamlessly.
- Updated `useResumeOnLoad` to check for active streams and resume conversations appropriately.

* fix: Update query parameter handling in useChatHelpers

- Refactored the logic for determining the query parameter used in fetching messages to prioritize paramId from the URL, falling back to conversationId only if paramId is not available. This change ensures consistency with the ChatView component's expectations.

* fix: improve syncing when switching conversations

* fix: Prevent memory leaks in useResumableSSE by clearing handler maps on stream completion and cleanup

* fix: Improve content type mismatch handling in useStepHandler

- Enhanced the condition for detecting content type mismatches to include additional checks, ensuring more robust validation of content types before processing updates.

* fix: Allow dynamic content creation in useChatFunctions

- Updated the initial response handling to avoid pre-initializing content types, enabling dynamic creation of content parts based on incoming delta events. This change supports various content types such as think and text.

* fix: Refine response message handling in useStepHandler

- Updated logic to determine the appropriate response message based on the last message's origin, ensuring correct message replacement or appending based on user interaction. This change enhances the accuracy of message updates in the chat flow.

* refactor: Enhance GenerationJobManager with In-Memory Implementations

- Introduced InMemoryJobStore, InMemoryEventTransport, and InMemoryContentState for improved job management and event handling.
- Updated GenerationJobManager to utilize these new implementations, allowing for better separation of concerns and easier maintenance.
- Enhanced job metadata handling to support user messages and response IDs for resumable functionality.
- Improved cleanup and state management processes to prevent memory leaks and ensure efficient resource usage.

* refactor: Enhance GenerationJobManager with improved subscriber handling

- Updated RuntimeJobState to include allSubscribersLeftHandlers for managing client disconnections without affecting subscriber count.
- Refined createJob and subscribe methods to ensure generation starts only when the first real client connects.
- Added detailed documentation for methods and properties to clarify the synchronization of job generation with client readiness.
- Improved logging for subscriber checks and event handling to facilitate debugging and monitoring.

* chore: Adjust timeout for subscriber readiness in ResumableAgentController

- Reduced the timeout duration from 5000ms to 2500ms in the startGeneration function to improve responsiveness when waiting for subscriber readiness. This change aims to enhance the efficiency of the agent's background generation process.

* refactor: Update GenerationJobManager documentation and structure

- Enhanced the documentation for GenerationJobManager to clarify the architecture and pluggable service design.
- Updated comments to reflect the potential for Redis integration and the need for async refactoring.
- Improved the structure of the GenerationJob facade to emphasize the unified API while allowing for implementation swapping without affecting consumer code.

* refactor: Convert GenerationJobManager methods to async for improved performance

- Updated methods in GenerationJobManager and InMemoryJobStore to be asynchronous, enhancing the handling of job creation, retrieval, and management.
- Adjusted the ResumableAgentController and related routes to await job operations, ensuring proper flow and error handling.
- Increased timeout duration in ResumableAgentController's startGeneration function to 3500ms for better subscriber readiness management.

* refactor: Simplify initial response handling in useChatFunctions

- Removed unnecessary pre-initialization of content types in the initial response, allowing for dynamic content creation based on incoming delta events. This change enhances flexibility in handling various content types in the chat flow.

* refactor: Clarify content handling logic in useStepHandler

- Updated comments to better explain the handling of initialContent and existingContent in edit and resume scenarios.
- Simplified the logic for merging content, ensuring that initialContent is used directly when available, improving clarity and maintainability.

* refactor: Improve message handling logic in useStepHandler

- Enhanced the logic for managing messages in multi-tab scenarios, ensuring that the most up-to-date message history is utilized.
- Removed existing response placeholders and ensured user messages are included, improving the accuracy of message updates in the chat flow.

* fix: remove unnecessary content length logging in the chat stream response, simplifying the debug message while retaining essential information about run steps. This change enhances clarity in logging without losing critical context.

* refactor: Integrate streamId handling for improved resumable functionality for attachments

- Added streamId parameter to various functions to support resumable mode in tool loading and memory processing.
- Updated related methods to ensure proper handling of attachments and responses based on the presence of streamId, enhancing the overall streaming experience.
- Improved logging and attachment management to accommodate both standard and resumable modes.

* refactor: Streamline abort handling and integrate GenerationJobManager for improved job management

- Removed the abortControllers middleware and integrated abort handling directly into GenerationJobManager.
- Updated abortMessage function to utilize GenerationJobManager for aborting jobs by conversation ID, enhancing clarity and efficiency.
- Simplified cleanup processes and improved error handling during abort operations.
- Enhanced metadata management for jobs, including endpoint and model information, to facilitate better tracking and resource management.

* refactor: Unify streamId and conversationId handling for improved job management

- Updated ResumableAgentController and AgentController to generate conversationId upfront, ensuring it matches streamId for consistency.
- Simplified job creation and metadata management by removing redundant conversationId updates from callbacks.
- Refactored abortMiddleware and related methods to utilize the unified streamId/conversationId approach, enhancing clarity in job handling.
- Removed deprecated methods from GenerationJobManager and InMemoryJobStore, streamlining the codebase and improving maintainability.

* refactor: Enhance resumable SSE handling with improved UI state management and error recovery

- Added UI state restoration on successful SSE connection to indicate ongoing submission.
- Implemented detailed error handling for network failures, including retry logic with exponential backoff.
- Introduced abort event handling to reset UI state on intentional stream closure.
- Enhanced debugging capabilities for testing reconnection and clean close scenarios.
- Updated generation function to retry on network errors, improving resilience during submission processes.

* refactor: Consolidate content state management into IJobStore for improved job handling

- Removed InMemoryContentState and integrated its functionality into InMemoryJobStore, streamlining content state management.
- Updated GenerationJobManager to utilize jobStore for content state operations, enhancing clarity and reducing redundancy.
- Introduced RedisJobStore for horizontal scaling, allowing for efficient job management and content reconstruction from chunks.
- Updated IJobStore interface to reflect changes in content state handling, ensuring consistency across implementations.

* feat: Introduce Redis-backed stream services for enhanced job management

- Added createStreamServices function to configure job store and event transport, supporting both Redis and in-memory options.
- Updated GenerationJobManager to allow configuration with custom job stores and event transports, improving flexibility for different deployment scenarios.
- Refactored IJobStore interface to support asynchronous content retrieval, ensuring compatibility with Redis implementations.
- Implemented RedisEventTransport for real-time event delivery across instances, enhancing scalability and responsiveness.
- Updated InMemoryJobStore to align with new async patterns for content and run step retrieval, ensuring consistent behavior across storage options.

* refactor: Remove redundant debug logging in GenerationJobManager and RedisEventTransport

- Eliminated unnecessary debug statements in GenerationJobManager related to subscriber actions and job updates, enhancing log clarity.
- Removed debug logging in RedisEventTransport for subscription and subscriber disconnection events, streamlining the logging output.
- Cleaned up debug messages in RedisJobStore to focus on essential information, improving overall logging efficiency.

* refactor: Enhance job state management and TTL configuration in RedisJobStore

- Updated the RedisJobStore to allow customizable TTL values for job states, improving flexibility in job management.
- Refactored the handling of job expiration and cleanup processes to align with new TTL configurations.
- Simplified the response structure in the chat status endpoint by consolidating state retrieval, enhancing clarity and performance.
- Improved comments and documentation for better understanding of the changes made.

* refactor: cleanupOnComplete option to GenerationJobManager for flexible resource management

- Introduced a new configuration option, cleanupOnComplete, allowing immediate cleanup of event transport and job resources upon job completion.
- Updated completeJob and abortJob methods to respect the cleanupOnComplete setting, enhancing memory management.
- Improved cleanup logic in the cleanup method to handle orphaned resources effectively.
- Enhanced documentation and comments for better clarity on the new functionality.

* refactor: Update TTL configuration for completed jobs in InMemoryJobStore

- Changed the TTL for completed jobs from 5 minutes to 0, allowing for immediate cleanup.
- Enhanced cleanup logic to respect the new TTL setting, improving resource management.
- Updated comments for clarity on the behavior of the TTL configuration.

* refactor: Enhance RedisJobStore with local graph caching for improved performance

- Introduced a local cache for graph references using WeakRef to optimize reconnects for the same instance.
- Updated job deletion and cleanup methods to manage the local cache effectively, ensuring stale entries are removed.
- Enhanced content retrieval methods to prioritize local cache access, reducing Redis round-trips for same-instance reconnects.
- Improved documentation and comments for clarity on the caching mechanism and its benefits.

* feat: Add integration tests for GenerationJobManager, RedisEventTransport, and RedisJobStore, add Redis Cluster support

- Introduced comprehensive integration tests for GenerationJobManager, covering both in-memory and Redis modes to ensure consistent job management and event handling.
- Added tests for RedisEventTransport to validate pub/sub functionality, including cross-instance event delivery and error handling.
- Implemented integration tests for RedisJobStore, focusing on multi-instance job access, content reconstruction from chunks, and consumer group behavior.
- Enhanced test setup and teardown processes to ensure a clean environment for each test run, improving reliability and maintainability.

* fix: Improve error handling in GenerationJobManager for allSubscribersLeft handlers

- Enhanced the error handling logic when retrieving content parts for allSubscribersLeft handlers, ensuring that any failures are logged appropriately.
- Updated the promise chain to catch errors from getContentParts, improving robustness and clarity in error reporting.

* ci: Improve Redis client disconnection handling in integration tests

- Updated the afterAll cleanup logic in integration tests for GenerationJobManager, RedisEventTransport, and RedisJobStore to use `quit()` for graceful disconnection of the Redis client.
- Added fallback to `disconnect()` if `quit()` fails, enhancing robustness in resource management during test teardown.
- Improved comments for clarity on the disconnection process and error handling.

* refactor: Enhance GenerationJobManager and event transports for improved resource management

- Updated GenerationJobManager to prevent immediate cleanup of eventTransport upon job completion, allowing final events to transmit fully before cleanup.
- Added orphaned stream cleanup logic in GenerationJobManager to handle streams without corresponding jobs.
- Introduced getTrackedStreamIds method in both InMemoryEventTransport and RedisEventTransport for better management of orphaned streams.
- Improved comments for clarity on resource management and cleanup processes.

* refactor: Update GenerationJobManager and ResumableAgentController for improved event handling

- Modified GenerationJobManager to resolve readyPromise immediately, eliminating startup latency and allowing early event buffering for late subscribers.
- Enhanced event handling logic to replay buffered events when the first subscriber connects, ensuring no events are lost due to race conditions.
- Updated comments for clarity on the new event synchronization mechanism and its benefits in both Redis and in-memory modes.

* fix: Update cache integration test command for stream to ensure proper execution

- Modified the test command for cache integration related to streams by adding the --forceExit flag to prevent hanging tests.
- This change enhances the reliability of the test suite by ensuring all tests complete as expected.

* feat: Add active job management for user and show progress in conversation list

- Implemented a new endpoint to retrieve active generation job IDs for the current user, enhancing user experience by allowing visibility of ongoing tasks.
- Integrated active job tracking in the Conversations component, displaying generation indicators based on active jobs.
- Optimized job management in the GenerationJobManager and InMemoryJobStore to support user-specific job queries, ensuring efficient resource handling and cleanup.
- Updated relevant components and hooks to utilize the new active jobs feature, improving overall application responsiveness and user feedback.

* feat: Implement active job tracking by user in RedisJobStore

- Added functionality to retrieve active job IDs for a specific user, enhancing user experience by allowing visibility of ongoing tasks.
- Implemented self-healing cleanup for stale job entries, ensuring accurate tracking of active jobs.
- Updated job creation, update, and deletion methods to manage user-specific job sets effectively.
- Enhanced integration tests to validate the new user-specific job management features.

* refactor: Simplify job deletion logic by removing user job cleanup from InMemoryJobStore and RedisJobStore

* WIP: Add backend inspect script for easier debugging in production

* refactor: title generation logic

- Changed the title generation endpoint from POST to GET, allowing for more efficient retrieval of titles based on conversation ID.
- Implemented exponential backoff for title fetching retries, improving responsiveness and reducing server load.
- Introduced a queuing mechanism for title generation, ensuring titles are generated only after job completion.
- Updated relevant components and hooks to utilize the new title generation logic, enhancing user experience and application performance.

* feat: Enhance updateConvoInAllQueries to support moving conversations to the top

* chore: temp. remove added multi convo

* refactor: Update active jobs query integration for optimistic updates on abort

- Introduced a new interface for active jobs response to standardize data handling.
- Updated query keys for active jobs to ensure consistency across components.
- Enhanced job management logic in hooks to properly reflect active job states, improving overall application responsiveness.

* refactor: useResumableStreamToggle hook to manage resumable streams for legacy/assistants endpoints

- Introduced a new hook, useResumableStreamToggle, to automatically toggle resumable streams off for assistants endpoints and restore the previous value when switching away.
- Updated ChatView component to utilize the new hook, enhancing the handling of streaming behavior based on endpoint type.
- Refactored imports in ChatView for better organization.

* refactor: streamline conversation title generation handling

- Removed unused type definition for TGenTitleMutation in mutations.ts to clean up the codebase.
- Integrated queueTitleGeneration call in useEventHandlers to trigger title generation for new conversations, enhancing the responsiveness of the application.

* feat: Add USE_REDIS_STREAMS configuration for stream job storage

- Introduced USE_REDIS_STREAMS to control Redis usage for resumable stream job storage, defaulting to true if USE_REDIS is enabled but not explicitly set.
- Updated cacheConfig to include USE_REDIS_STREAMS and modified createStreamServices to utilize this new configuration.
- Enhanced unit tests to validate the behavior of USE_REDIS_STREAMS under various environment settings, ensuring correct defaults and overrides.

* fix: title generation queue management for assistants

- Introduced a queueListeners mechanism to notify changes in the title generation queue, improving responsiveness for non-resumable streams.
- Updated the useTitleGeneration hook to track queue changes with a queueVersion state, ensuring accurate updates when jobs complete.
- Refactored the queueTitleGeneration function to trigger listeners upon adding new conversation IDs, enhancing the overall title generation flow.

* refactor: streamline agent controller and remove legacy resumable handling

- Updated the AgentController to route all requests to ResumableAgentController, simplifying the logic.
- Deprecated the legacy non-resumable path, providing a clear migration path for future use.
- Adjusted setHeaders middleware to remove unnecessary checks for resumable mode.
- Cleaned up the useResumableSSE hook to eliminate redundant query parameters, enhancing clarity and performance.

* feat: Add USE_REDIS_STREAMS configuration to .env.example

- Updated .env.example to include USE_REDIS_STREAMS setting, allowing control over Redis usage for resumable LLM streams.
- Provided additional context on the behavior of USE_REDIS_STREAMS when not explicitly set, enhancing clarity for configuration management.

* refactor: remove unused setHeaders middleware from chat route

- Eliminated the setHeaders middleware from the chat route, streamlining the request handling process.
- This change contributes to cleaner code and improved performance by reducing unnecessary middleware checks.

* fix: Add streamId parameter for resumable stream handling across services (actions, mcp oauth)

* fix(flow): add immediate abort handling and fix intervalId initialization

- Add immediate abort handler that responds instantly to abort signal
- Declare intervalId before cleanup function to prevent 'Cannot access before initialization' error
- Consolidate cleanup logic into single function to avoid duplicate cleanup
- Properly remove abort event listener on cleanup

* fix(mcp): clean up OAuth flows on abort and simplify flow handling

- Add abort handler in reconnectServer to clean up mcp_oauth and mcp_get_tokens flows
- Update createAbortHandler to clean up both flow types on tool call abort
- Pass abort signal to createFlow in returnOnOAuth path
- Simplify handleOAuthRequired to always cancel existing flows and start fresh
- This ensures user always gets a new OAuth URL instead of waiting for stale flows

* fix(agents): handle 'new' conversationId and improve abort reliability

- Treat 'new' as placeholder that needs UUID in request controller
- Send JSON response immediately before tool loading for faster SSE connection
- Use job's abort controller instead of prelimAbortController
- Emit errors to stream if headers already sent
- Skip 'new' as valid ID in abort endpoint
- Add fallback to find active jobs by userId when conversationId is 'new'

* fix(stream): detect early abort and prevent navigation to non-existent conversation

- Abort controller on job completion to signal pending operations
- Detect early abort (no content, no responseMessageId) in abortJob
- Set conversation and responseMessage to null for early aborts
- Add earlyAbort flag to final event for frontend detection
- Remove unused text field from AbortResult interface
- Frontend handles earlyAbort by staying on/navigating to new chat

* test(mcp): update test to expect signal parameter in createFlow

* 🔧 refactor: Update Vertex AI Configuration Handling

- Simplified the logic for enabling Vertex AI in the Anthropic initialization process, ensuring it defaults to enabled unless explicitly set to false.
- Adjusted the Vertex AI schema to make the 'enabled' property optional, defaulting to true when the configuration is present.
- Updated related comments and documentation for clarity on the configuration behavior.

* 🔧 chore: Update Anthropic Configuration and Logging Enhancements

- Changed the default region for Anthropic Vertex AI from 'global' to 'us-east5' in the .env.example file for better regional alignment.
- Added debug logging to handle non-JSON credentials in the Anthropic client, improving error visibility during credential parsing.
- Updated the service key path resolution in the Vertex AI client to use the current working directory, enhancing flexibility in file location.

---------

Co-authored-by: Ziyan <5621658+Ziyann@users.noreply.github.com>
Co-authored-by: Aron Gates <aron@muonspace.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 18:16:52 -05:00
Danny Avila
439bc98682
⏸ refactor: Improve UX for Parallel Streams (Multi-Convo) (#11096)
* 🌊 feat: Implement multi-conversation feature with added conversation context and payload adjustments

* refactor: Replace isSubmittingFamily with isSubmitting across message components for consistency

* feat: Add loadAddedAgent and processAddedConvo for multi-conversation agent execution

* refactor: Update ContentRender usage to conditionally render PlaceholderRow based on isLast and isSubmitting

* WIP: first pass, sibling index

* feat: Enhance multi-conversation support with agent tracking and display improvements

* refactor: Introduce isEphemeralAgentId utility and update related logic for agent handling

* refactor: Implement createDualMessageContent utility for sibling message display and enhance useStepHandler for added conversations

* refactor: duplicate tools for added agent if ephemeral and primary agent is also ephemeral

* chore: remove deprecated multimessage rendering

* refactor: enhance dual message content creation and agent handling for parallel rendering

* refactor: streamline message rendering and submission handling by removing unused state and optimizing conditional logic

* refactor: adjust content handling in parallel mode to utilize existing content for improved agent display

* refactor: update @librechat/agents dependency to version 3.0.53

* refactor: update @langchain/core and @librechat/agents dependencies to latest versions

* refactor: remove deprecated @langchain/core dependency from package.json

* chore: remove unused SearchToolConfig and GetSourcesParams types from web.ts

* refactor: remove unused message properties from Message component

* refactor: enhance parallel content handling with groupId support in ContentParts and useStepHandler

* refactor: implement parallel content styling in Message, MessageRender, and ContentRender components. use explicit model name

* refactor: improve agent ID handling in createDualMessageContent for dual message display

* refactor: simplify title generation in AddedConvo by removing unused sender and preset logic

* refactor: replace string interpolation with cn utility for className in HoverButtons component

* refactor: enhance agent ID handling by adding suffix management for parallel agents and updating related components

* refactor: enhance column ordering in ContentParts by sorting agents with suffix management

* refactor: update @librechat/agents dependency to version 3.0.55

* feat: implement parallel content rendering with metadata support

- Added `ParallelContentRenderer` and `ParallelColumns` components for rendering messages in parallel based on groupId and agentId.
- Introduced `contentMetadataMap` to store metadata for each content part, allowing efficient parallel content detection.
- Updated `Message` and `ContentRender` components to utilize the new metadata structure for rendering.
- Modified `useStepHandler` to manage content indices and metadata during message processing.
- Enhanced `IJobStore` interface and its implementations to support storing and retrieving content metadata.
- Updated data schemas to include `contentMetadataMap` for messages, enabling multi-agent and parallel execution scenarios.

* refactor: update @librechat/agents dependency to version 3.0.56

* refactor: remove unused EPHEMERAL_AGENT_ID constant and simplify agent ID check

* refactor: enhance multi-agent message processing and primary agent determination

* refactor: implement branch message functionality for parallel responses

* refactor: integrate added conversation retrieval into message editing and regeneration processes

* refactor: remove unused isCard and isMultiMessage props from MessageRender and ContentRender components

* refactor: update @librechat/agents dependency to version 3.0.60

* refactor: replace usage of EPHEMERAL_AGENT_ID constant with isEphemeralAgentId function for improved clarity and consistency

* refactor: standardize agent ID format in tests for consistency

* chore: move addedConvo property to the correct position in payload construction

* refactor: rename agent_id values in loadAgent tests for clarity

* chore: reorder props in ContentParts component for improved readability

* refactor: rename variable 'content' to 'result' for clarity in RedisJobStore tests

* refactor: streamline useMessageActions by removing duplicate handleFeedback assignment

* chore: revert placeholder rendering logic MessageRender and ContentRender components to original

* refactor: implement useContentMetadata hook for optimized content metadata handling

* refactor: remove contentMetadataMap and related logic from the codebase and revert back to agentId/groupId in content parts

- Eliminated contentMetadataMap from various components and services, simplifying the handling of message content.
- Updated functions to directly access agentId and groupId from content parts instead of relying on a separate metadata map.
- Adjusted related hooks and components to reflect the removal of contentMetadataMap, ensuring consistent handling of message content.
- Updated tests and documentation to align with the new structure of message content handling.

* refactor: remove logging from groupParallelContent function to clean up output

* refactor: remove model parameter from TBranchMessageRequest type for simplification

* refactor: enhance branch message creation by stripping metadata for standalone content

* chore: streamline branch message creation by simplifying content filtering and removing unnecessary metadata checks

* refactor: include attachments in branch message creation for improved content handling

* refactor: streamline agent content processing by consolidating primary agent identification and filtering logic

* refactor: simplify multi-agent message processing by creating a dedicated mapping method and enhancing content filtering

* refactor: remove unused parameter from loadEphemeralAgent function for cleaner code

* refactor: update groupId handling in metadata to only set when provided by the server
2025-12-25 01:43:54 -05:00
Danny Avila
0ae3b87b65
🌊 feat: Resumable LLM Streams with Horizontal Scaling (#10926)
*  feat: Implement Resumable Generation Jobs with SSE Support

- Introduced GenerationJobManager to handle resumable LLM generation jobs independently of HTTP connections.
- Added support for subscribing to ongoing generation jobs via SSE, allowing clients to reconnect and receive updates without losing progress.
- Enhanced existing agent controllers and routes to integrate resumable functionality, including job creation, completion, and error handling.
- Updated client-side hooks to manage adaptive SSE streams, switching between standard and resumable modes based on user settings.
- Added UI components and settings for enabling/disabling resumable streams, improving user experience during unstable connections.

* WIP: resuming

* WIP: resumable stream

* feat: Enhance Stream Management with Abort Functionality

- Updated the abort endpoint to support aborting ongoing generation streams using either streamId or conversationId.
- Introduced a new mutation hook `useAbortStreamMutation` for client-side integration.
- Added `useStreamStatus` query to monitor stream status and facilitate resuming conversations.
- Enhanced `useChatHelpers` to incorporate abort functionality when stopping generation.
- Improved `useResumableSSE` to handle stream errors and token refresh seamlessly.
- Updated `useResumeOnLoad` to check for active streams and resume conversations appropriately.

* fix: Update query parameter handling in useChatHelpers

- Refactored the logic for determining the query parameter used in fetching messages to prioritize paramId from the URL, falling back to conversationId only if paramId is not available. This change ensures consistency with the ChatView component's expectations.

* fix: improve syncing when switching conversations

* fix: Prevent memory leaks in useResumableSSE by clearing handler maps on stream completion and cleanup

* fix: Improve content type mismatch handling in useStepHandler

- Enhanced the condition for detecting content type mismatches to include additional checks, ensuring more robust validation of content types before processing updates.

* fix: Allow dynamic content creation in useChatFunctions

- Updated the initial response handling to avoid pre-initializing content types, enabling dynamic creation of content parts based on incoming delta events. This change supports various content types such as think and text.

* fix: Refine response message handling in useStepHandler

- Updated logic to determine the appropriate response message based on the last message's origin, ensuring correct message replacement or appending based on user interaction. This change enhances the accuracy of message updates in the chat flow.

* refactor: Enhance GenerationJobManager with In-Memory Implementations

- Introduced InMemoryJobStore, InMemoryEventTransport, and InMemoryContentState for improved job management and event handling.
- Updated GenerationJobManager to utilize these new implementations, allowing for better separation of concerns and easier maintenance.
- Enhanced job metadata handling to support user messages and response IDs for resumable functionality.
- Improved cleanup and state management processes to prevent memory leaks and ensure efficient resource usage.

* refactor: Enhance GenerationJobManager with improved subscriber handling

- Updated RuntimeJobState to include allSubscribersLeftHandlers for managing client disconnections without affecting subscriber count.
- Refined createJob and subscribe methods to ensure generation starts only when the first real client connects.
- Added detailed documentation for methods and properties to clarify the synchronization of job generation with client readiness.
- Improved logging for subscriber checks and event handling to facilitate debugging and monitoring.

* chore: Adjust timeout for subscriber readiness in ResumableAgentController

- Reduced the timeout duration from 5000ms to 2500ms in the startGeneration function to improve responsiveness when waiting for subscriber readiness. This change aims to enhance the efficiency of the agent's background generation process.

* refactor: Update GenerationJobManager documentation and structure

- Enhanced the documentation for GenerationJobManager to clarify the architecture and pluggable service design.
- Updated comments to reflect the potential for Redis integration and the need for async refactoring.
- Improved the structure of the GenerationJob facade to emphasize the unified API while allowing for implementation swapping without affecting consumer code.

* refactor: Convert GenerationJobManager methods to async for improved performance

- Updated methods in GenerationJobManager and InMemoryJobStore to be asynchronous, enhancing the handling of job creation, retrieval, and management.
- Adjusted the ResumableAgentController and related routes to await job operations, ensuring proper flow and error handling.
- Increased timeout duration in ResumableAgentController's startGeneration function to 3500ms for better subscriber readiness management.

* refactor: Simplify initial response handling in useChatFunctions

- Removed unnecessary pre-initialization of content types in the initial response, allowing for dynamic content creation based on incoming delta events. This change enhances flexibility in handling various content types in the chat flow.

* refactor: Clarify content handling logic in useStepHandler

- Updated comments to better explain the handling of initialContent and existingContent in edit and resume scenarios.
- Simplified the logic for merging content, ensuring that initialContent is used directly when available, improving clarity and maintainability.

* refactor: Improve message handling logic in useStepHandler

- Enhanced the logic for managing messages in multi-tab scenarios, ensuring that the most up-to-date message history is utilized.
- Removed existing response placeholders and ensured user messages are included, improving the accuracy of message updates in the chat flow.

* fix: remove unnecessary content length logging in the chat stream response, simplifying the debug message while retaining essential information about run steps. This change enhances clarity in logging without losing critical context.

* refactor: Integrate streamId handling for improved resumable functionality for attachments

- Added streamId parameter to various functions to support resumable mode in tool loading and memory processing.
- Updated related methods to ensure proper handling of attachments and responses based on the presence of streamId, enhancing the overall streaming experience.
- Improved logging and attachment management to accommodate both standard and resumable modes.

* refactor: Streamline abort handling and integrate GenerationJobManager for improved job management

- Removed the abortControllers middleware and integrated abort handling directly into GenerationJobManager.
- Updated abortMessage function to utilize GenerationJobManager for aborting jobs by conversation ID, enhancing clarity and efficiency.
- Simplified cleanup processes and improved error handling during abort operations.
- Enhanced metadata management for jobs, including endpoint and model information, to facilitate better tracking and resource management.

* refactor: Unify streamId and conversationId handling for improved job management

- Updated ResumableAgentController and AgentController to generate conversationId upfront, ensuring it matches streamId for consistency.
- Simplified job creation and metadata management by removing redundant conversationId updates from callbacks.
- Refactored abortMiddleware and related methods to utilize the unified streamId/conversationId approach, enhancing clarity in job handling.
- Removed deprecated methods from GenerationJobManager and InMemoryJobStore, streamlining the codebase and improving maintainability.

* refactor: Enhance resumable SSE handling with improved UI state management and error recovery

- Added UI state restoration on successful SSE connection to indicate ongoing submission.
- Implemented detailed error handling for network failures, including retry logic with exponential backoff.
- Introduced abort event handling to reset UI state on intentional stream closure.
- Enhanced debugging capabilities for testing reconnection and clean close scenarios.
- Updated generation function to retry on network errors, improving resilience during submission processes.

* refactor: Consolidate content state management into IJobStore for improved job handling

- Removed InMemoryContentState and integrated its functionality into InMemoryJobStore, streamlining content state management.
- Updated GenerationJobManager to utilize jobStore for content state operations, enhancing clarity and reducing redundancy.
- Introduced RedisJobStore for horizontal scaling, allowing for efficient job management and content reconstruction from chunks.
- Updated IJobStore interface to reflect changes in content state handling, ensuring consistency across implementations.

* feat: Introduce Redis-backed stream services for enhanced job management

- Added createStreamServices function to configure job store and event transport, supporting both Redis and in-memory options.
- Updated GenerationJobManager to allow configuration with custom job stores and event transports, improving flexibility for different deployment scenarios.
- Refactored IJobStore interface to support asynchronous content retrieval, ensuring compatibility with Redis implementations.
- Implemented RedisEventTransport for real-time event delivery across instances, enhancing scalability and responsiveness.
- Updated InMemoryJobStore to align with new async patterns for content and run step retrieval, ensuring consistent behavior across storage options.

* refactor: Remove redundant debug logging in GenerationJobManager and RedisEventTransport

- Eliminated unnecessary debug statements in GenerationJobManager related to subscriber actions and job updates, enhancing log clarity.
- Removed debug logging in RedisEventTransport for subscription and subscriber disconnection events, streamlining the logging output.
- Cleaned up debug messages in RedisJobStore to focus on essential information, improving overall logging efficiency.

* refactor: Enhance job state management and TTL configuration in RedisJobStore

- Updated the RedisJobStore to allow customizable TTL values for job states, improving flexibility in job management.
- Refactored the handling of job expiration and cleanup processes to align with new TTL configurations.
- Simplified the response structure in the chat status endpoint by consolidating state retrieval, enhancing clarity and performance.
- Improved comments and documentation for better understanding of the changes made.

* refactor: cleanupOnComplete option to GenerationJobManager for flexible resource management

- Introduced a new configuration option, cleanupOnComplete, allowing immediate cleanup of event transport and job resources upon job completion.
- Updated completeJob and abortJob methods to respect the cleanupOnComplete setting, enhancing memory management.
- Improved cleanup logic in the cleanup method to handle orphaned resources effectively.
- Enhanced documentation and comments for better clarity on the new functionality.

* refactor: Update TTL configuration for completed jobs in InMemoryJobStore

- Changed the TTL for completed jobs from 5 minutes to 0, allowing for immediate cleanup.
- Enhanced cleanup logic to respect the new TTL setting, improving resource management.
- Updated comments for clarity on the behavior of the TTL configuration.

* refactor: Enhance RedisJobStore with local graph caching for improved performance

- Introduced a local cache for graph references using WeakRef to optimize reconnects for the same instance.
- Updated job deletion and cleanup methods to manage the local cache effectively, ensuring stale entries are removed.
- Enhanced content retrieval methods to prioritize local cache access, reducing Redis round-trips for same-instance reconnects.
- Improved documentation and comments for clarity on the caching mechanism and its benefits.

* feat: Add integration tests for GenerationJobManager, RedisEventTransport, and RedisJobStore, add Redis Cluster support

- Introduced comprehensive integration tests for GenerationJobManager, covering both in-memory and Redis modes to ensure consistent job management and event handling.
- Added tests for RedisEventTransport to validate pub/sub functionality, including cross-instance event delivery and error handling.
- Implemented integration tests for RedisJobStore, focusing on multi-instance job access, content reconstruction from chunks, and consumer group behavior.
- Enhanced test setup and teardown processes to ensure a clean environment for each test run, improving reliability and maintainability.

* fix: Improve error handling in GenerationJobManager for allSubscribersLeft handlers

- Enhanced the error handling logic when retrieving content parts for allSubscribersLeft handlers, ensuring that any failures are logged appropriately.
- Updated the promise chain to catch errors from getContentParts, improving robustness and clarity in error reporting.

* ci: Improve Redis client disconnection handling in integration tests

- Updated the afterAll cleanup logic in integration tests for GenerationJobManager, RedisEventTransport, and RedisJobStore to use `quit()` for graceful disconnection of the Redis client.
- Added fallback to `disconnect()` if `quit()` fails, enhancing robustness in resource management during test teardown.
- Improved comments for clarity on the disconnection process and error handling.

* refactor: Enhance GenerationJobManager and event transports for improved resource management

- Updated GenerationJobManager to prevent immediate cleanup of eventTransport upon job completion, allowing final events to transmit fully before cleanup.
- Added orphaned stream cleanup logic in GenerationJobManager to handle streams without corresponding jobs.
- Introduced getTrackedStreamIds method in both InMemoryEventTransport and RedisEventTransport for better management of orphaned streams.
- Improved comments for clarity on resource management and cleanup processes.

* refactor: Update GenerationJobManager and ResumableAgentController for improved event handling

- Modified GenerationJobManager to resolve readyPromise immediately, eliminating startup latency and allowing early event buffering for late subscribers.
- Enhanced event handling logic to replay buffered events when the first subscriber connects, ensuring no events are lost due to race conditions.
- Updated comments for clarity on the new event synchronization mechanism and its benefits in both Redis and in-memory modes.

* fix: Update cache integration test command for stream to ensure proper execution

- Modified the test command for cache integration related to streams by adding the --forceExit flag to prevent hanging tests.
- This change enhances the reliability of the test suite by ensuring all tests complete as expected.

* feat: Add active job management for user and show progress in conversation list

- Implemented a new endpoint to retrieve active generation job IDs for the current user, enhancing user experience by allowing visibility of ongoing tasks.
- Integrated active job tracking in the Conversations component, displaying generation indicators based on active jobs.
- Optimized job management in the GenerationJobManager and InMemoryJobStore to support user-specific job queries, ensuring efficient resource handling and cleanup.
- Updated relevant components and hooks to utilize the new active jobs feature, improving overall application responsiveness and user feedback.

* feat: Implement active job tracking by user in RedisJobStore

- Added functionality to retrieve active job IDs for a specific user, enhancing user experience by allowing visibility of ongoing tasks.
- Implemented self-healing cleanup for stale job entries, ensuring accurate tracking of active jobs.
- Updated job creation, update, and deletion methods to manage user-specific job sets effectively.
- Enhanced integration tests to validate the new user-specific job management features.

* refactor: Simplify job deletion logic by removing user job cleanup from InMemoryJobStore and RedisJobStore

* WIP: Add backend inspect script for easier debugging in production

* refactor: title generation logic

- Changed the title generation endpoint from POST to GET, allowing for more efficient retrieval of titles based on conversation ID.
- Implemented exponential backoff for title fetching retries, improving responsiveness and reducing server load.
- Introduced a queuing mechanism for title generation, ensuring titles are generated only after job completion.
- Updated relevant components and hooks to utilize the new title generation logic, enhancing user experience and application performance.

* feat: Enhance updateConvoInAllQueries to support moving conversations to the top

* chore: temp. remove added multi convo

* refactor: Update active jobs query integration for optimistic updates on abort

- Introduced a new interface for active jobs response to standardize data handling.
- Updated query keys for active jobs to ensure consistency across components.
- Enhanced job management logic in hooks to properly reflect active job states, improving overall application responsiveness.

* refactor: useResumableStreamToggle hook to manage resumable streams for legacy/assistants endpoints

- Introduced a new hook, useResumableStreamToggle, to automatically toggle resumable streams off for assistants endpoints and restore the previous value when switching away.
- Updated ChatView component to utilize the new hook, enhancing the handling of streaming behavior based on endpoint type.
- Refactored imports in ChatView for better organization.

* refactor: streamline conversation title generation handling

- Removed unused type definition for TGenTitleMutation in mutations.ts to clean up the codebase.
- Integrated queueTitleGeneration call in useEventHandlers to trigger title generation for new conversations, enhancing the responsiveness of the application.

* feat: Add USE_REDIS_STREAMS configuration for stream job storage

- Introduced USE_REDIS_STREAMS to control Redis usage for resumable stream job storage, defaulting to true if USE_REDIS is enabled but not explicitly set.
- Updated cacheConfig to include USE_REDIS_STREAMS and modified createStreamServices to utilize this new configuration.
- Enhanced unit tests to validate the behavior of USE_REDIS_STREAMS under various environment settings, ensuring correct defaults and overrides.

* fix: title generation queue management for assistants

- Introduced a queueListeners mechanism to notify changes in the title generation queue, improving responsiveness for non-resumable streams.
- Updated the useTitleGeneration hook to track queue changes with a queueVersion state, ensuring accurate updates when jobs complete.
- Refactored the queueTitleGeneration function to trigger listeners upon adding new conversation IDs, enhancing the overall title generation flow.

* refactor: streamline agent controller and remove legacy resumable handling

- Updated the AgentController to route all requests to ResumableAgentController, simplifying the logic.
- Deprecated the legacy non-resumable path, providing a clear migration path for future use.
- Adjusted setHeaders middleware to remove unnecessary checks for resumable mode.
- Cleaned up the useResumableSSE hook to eliminate redundant query parameters, enhancing clarity and performance.

* feat: Add USE_REDIS_STREAMS configuration to .env.example

- Updated .env.example to include USE_REDIS_STREAMS setting, allowing control over Redis usage for resumable LLM streams.
- Provided additional context on the behavior of USE_REDIS_STREAMS when not explicitly set, enhancing clarity for configuration management.

* refactor: remove unused setHeaders middleware from chat route

- Eliminated the setHeaders middleware from the chat route, streamlining the request handling process.
- This change contributes to cleaner code and improved performance by reducing unnecessary middleware checks.

* fix: Add streamId parameter for resumable stream handling across services (actions, mcp oauth)

* fix(flow): add immediate abort handling and fix intervalId initialization

- Add immediate abort handler that responds instantly to abort signal
- Declare intervalId before cleanup function to prevent 'Cannot access before initialization' error
- Consolidate cleanup logic into single function to avoid duplicate cleanup
- Properly remove abort event listener on cleanup

* fix(mcp): clean up OAuth flows on abort and simplify flow handling

- Add abort handler in reconnectServer to clean up mcp_oauth and mcp_get_tokens flows
- Update createAbortHandler to clean up both flow types on tool call abort
- Pass abort signal to createFlow in returnOnOAuth path
- Simplify handleOAuthRequired to always cancel existing flows and start fresh
- This ensures user always gets a new OAuth URL instead of waiting for stale flows

* fix(agents): handle 'new' conversationId and improve abort reliability

- Treat 'new' as placeholder that needs UUID in request controller
- Send JSON response immediately before tool loading for faster SSE connection
- Use job's abort controller instead of prelimAbortController
- Emit errors to stream if headers already sent
- Skip 'new' as valid ID in abort endpoint
- Add fallback to find active jobs by userId when conversationId is 'new'

* fix(stream): detect early abort and prevent navigation to non-existent conversation

- Abort controller on job completion to signal pending operations
- Detect early abort (no content, no responseMessageId) in abortJob
- Set conversation and responseMessage to null for early aborts
- Add earlyAbort flag to final event for frontend detection
- Remove unused text field from AbortResult interface
- Frontend handles earlyAbort by staying on/navigating to new chat

* test(mcp): update test to expect signal parameter in createFlow

fix(agents): include 'new' conversationId in newConvo check for title generation

When frontend sends 'new' as conversationId, it should still trigger
title generation since it's a new conversation. Rename boolean variable for clarity

fix(agents): check abort state before completeJob for title generation

completeJob now triggers abort signal for cleanup, so we need to
capture the abort state beforehand to correctly determine if title
generation should run.
2025-12-19 12:14:19 -05:00
Danny Avila
04a4a2aa44
🧵 refactor: Migrate Endpoint Initialization to TypeScript (#10794)
* refactor: move endpoint initialization methods to typescript

* refactor: move agent init to packages/api

- Introduced `initialize.ts` for agent initialization, including file processing and tool loading.
- Updated `resources.ts` to allow optional appConfig parameter.
- Enhanced endpoint configuration handling in various initialization files to support model parameters.
- Added new artifacts and prompts for React component generation.
- Refactored existing code to improve type safety and maintainability.

* refactor: streamline endpoint initialization and enhance type safety

- Updated initialization functions across various endpoints to use a consistent request structure, replacing `unknown` types with `ServerResponse`.
- Simplified request handling by directly extracting keys from the request body.
- Improved type safety by ensuring user IDs are safely accessed with optional chaining.
- Removed unnecessary parameters and streamlined model options handling for better clarity and maintainability.

* refactor: moved ModelService and extractBaseURL to packages/api

- Added comprehensive tests for the models fetching functionality, covering scenarios for OpenAI, Anthropic, Google, and Ollama models.
- Updated existing endpoint index to include the new models module.
- Enhanced utility functions for URL extraction and model data processing.
- Improved type safety and error handling across the models fetching logic.

* refactor: consolidate utility functions and remove unused files

- Merged `deriveBaseURL` and `extractBaseURL` into the `@librechat/api` module for better organization.
- Removed redundant utility files and their associated tests to streamline the codebase.
- Updated imports across various client files to utilize the new consolidated functions.
- Enhanced overall maintainability by reducing the number of utility modules.

* refactor: replace ModelService references with direct imports from @librechat/api and remove ModelService file

* refactor: move encrypt/decrypt methods and key db methods to data-schemas, use `getProviderConfig` from `@librechat/api`

* chore: remove unused 'res' from options in AgentClient

* refactor: file model imports and methods

- Updated imports in various controllers and services to use the unified file model from '~/models' instead of '~/models/File'.
- Consolidated file-related methods into a new file methods module in the data-schemas package.
- Added comprehensive tests for file methods including creation, retrieval, updating, and deletion.
- Enhanced the initializeAgent function to accept dependency injection for file-related methods.
- Improved error handling and logging in file methods.

* refactor: streamline database method references in agent initialization

* refactor: enhance file method tests and update type references to IMongoFile

* refactor: consolidate database method imports in agent client and initialization

* chore: remove redundant import of initializeAgent from @librechat/api

* refactor: move checkUserKeyExpiry utility to @librechat/api and update references across endpoints

* refactor: move updateUserPlugins logic to user.ts and simplify UserController

* refactor: update imports for user key management and remove UserService

* refactor: remove unused Anthropics and Bedrock endpoint files and clean up imports

* refactor: consolidate and update encryption imports across various files to use @librechat/data-schemas

* chore: update file model mock to use unified import from '~/models'

* chore: import order

* refactor: remove migrated to TS agent.js file and its associated logic from the endpoints

* chore: add reusable function to extract imports from source code in unused-packages workflow

* chore: enhance unused-packages workflow to include @librechat/api dependencies and improve dependency extraction

* chore: improve dependency extraction in unused-packages workflow with enhanced error handling and debugging output

* chore: add detailed debugging output to unused-packages workflow for better visibility into unused dependencies and exclusion lists

* chore: refine subpath handling in unused-packages workflow to correctly process scoped and non-scoped package imports

* chore: clean up unused debug output in unused-packages workflow and reorganize type imports in initialize.ts
2025-12-11 16:37:16 -05:00
Jón Levy
ef3bf0a932
🆔 feat: Add OpenID Connect Federated Provider Token Support (#9931)
* feat: Add OpenID Connect federated provider token support

Implements support for passing federated provider tokens (Cognito, Azure AD, Auth0)
as variables in LibreChat's librechat.yaml configuration for both custom endpoints
and MCP servers.

Features:
- New LIBRECHAT_OPENID_* template variables for federated provider tokens
- JWT claims parsing from ID tokens without verification (for claim extraction)
- Token validation with expiration checking
- Support for multiple token storage locations (federatedTokens, openidTokens)
- Integration with existing template variable system
- Comprehensive test suite with Cognito-specific scenarios
- Provider-agnostic design supporting Cognito, Azure AD, Auth0, etc.

Security:
- Server-side only token processing
- Automatic token expiration validation
- Graceful fallbacks for missing/invalid tokens
- No client-side token exposure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add federated token propagation to OIDC authentication strategies

Adds federatedTokens object to user during authentication to enable
federated provider token template variables in LibreChat configuration.

Changes:
- OpenID JWT Strategy: Extract raw JWT from Authorization header and
  attach as federatedTokens.access_token to enable {{LIBRECHAT_OPENID_TOKEN}}
  placeholder resolution
- OpenID Strategy: Attach tokenset tokens as federatedTokens object to
  standardize token access across both authentication strategies

This enables proper token propagation for custom endpoints and MCP
servers that require federated provider tokens for authorization.

Resolves missing token issue reported by @ramden in PR #9931

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Denis Ramic <denis.ramic@nfon.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* test: Add federatedTokens validation tests for OIDC strategies

Adds comprehensive test coverage for the federated token propagation
feature implemented in the authentication strategies.

Tests added:
- Verify federatedTokens object is attached to user with correct structure
  (access_token, refresh_token, expires_at)
- Verify both tokenset and federatedTokens are present in user object
- Ensure tokens from OIDC provider are correctly propagated

Also fixes existing test suite by adding missing mocks:
- isEmailDomainAllowed function mock
- findOpenIDUser function mock

These tests validate the fix from commit 5874ba29f that enables
{{LIBRECHAT_OPENID_TOKEN}} template variable functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: Remove implementation documentation file

The PR description already contains all necessary implementation details.
This documentation file is redundant and was requested to be removed.

* fix: skip s256 check

* fix(openid): handle missing refresh token in Cognito token refresh response

When OPENID_REUSE_TOKENS=true, the token refresh flow was failing because
Cognito (and most OAuth providers) don't return a new refresh token in the
refresh grant response - they only return new access and ID tokens.

Changes:
- Modified setOpenIDAuthTokens() to accept optional existingRefreshToken parameter
- Updated validation to only require access_token (refresh_token now optional)
- Added logic to reuse existing refresh token when not provided in tokenset
- Updated refreshController to pass original refresh token as fallback
- Added comments explaining standard OAuth 2.0 refresh token behavior

This fixes the "Token is not present. User is not authenticated." error that
occurred during silent token refresh with Cognito as the OpenID provider.

Fixes: Authentication loop with OPENID_REUSE_TOKENS=true and AWS Cognito

* fix(openid): extract refresh token from cookies for template variable replacement

When OPENID_REUSE_TOKENS=true, the openIdJwtStrategy populates user.federatedTokens
to enable template variable replacement (e.g., {{LIBRECHAT_OPENID_ACCESS_TOKEN}}).

However, the refresh_token field was incorrectly sourced from payload.refresh_token,
which is always undefined because:
1. JWTs don't contain refresh tokens in their payload
2. The JWT itself IS the access token
3. Refresh tokens are separate opaque tokens stored in HTTP-only cookies

This caused extractOpenIDTokenInfo() to receive incomplete federatedTokens,
resulting in template variables remaining unreplaced in headers.

**Root Cause:**
- Line 90: `refresh_token: payload.refresh_token` (always undefined)
- JWTs only contain access token data in their claims
- Refresh tokens are separate, stored securely in cookies

**Solution:**
- Import `cookie` module to parse cookies from request
- Extract refresh token from `refreshToken` cookie
- Populate federatedTokens with both access token (JWT) and refresh token (from cookie)

**Impact:**
- Template variables like {{LIBRECHAT_OPENID_ACCESS_TOKEN}} now work correctly
- Headers in librechat.yaml are properly replaced with actual tokens
- MCP server authentication with federated tokens now functional

**Technical Details:**
- passReqToCallback=true in JWT strategy provides req object access
- Refresh token extracted via cookies.parse(req.headers.cookie).refreshToken
- Falls back gracefully if cookie header or refreshToken is missing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: re-resolve headers on each request to pick up fresh federatedTokens

- OpenAIClient now re-resolves headers in chatCompletion() before each API call
- This ensures template variables like {{LIBRECHAT_OPENID_TOKEN}} are replaced
  with actual token values from req.user.federatedTokens
- initialize.js now stores original template headers instead of pre-resolved ones
- Fixes template variable replacement when OPENID_REUSE_TOKENS=true

The issue was that headers were only resolved once during client initialization,
before openIdJwtStrategy had populated user.federatedTokens. Now headers are
re-resolved on every request with the current user's fresh tokens.

* debug: add logging to track header resolution in OpenAIClient

* debug: log tokenset structure after refresh to diagnose missing access_token

* fix: set federatedTokens on user object after OAuth refresh

- After successful OAuth token refresh, the user object was not being
  updated with federatedTokens
- This caused template variable resolution to fail on subsequent requests
- Now sets user.federatedTokens with access_token, id_token, refresh_token
  and expires_at from the refreshed tokenset
- Fixes template variables like {{LIBRECHAT_OPENID_TOKEN}} not being
  replaced after token refresh
- Related to PR #9931 (OpenID federated token support)

* fix(openid): pass user object through agent chain for template variable resolution

Root cause: buildAgentContext in agents/run.ts called resolveHeaders without
the user parameter, preventing OpenID federated token template variables from
being resolved in agent runtime parameters.

Changes:
- packages/api/src/agents/run.ts: Add user parameter to createRun signature
- packages/api/src/agents/run.ts: Pass user to resolveHeaders in buildAgentContext
- api/server/controllers/agents/client.js: Pass user when calling createRun
- api/server/services/Endpoints/bedrock/options.js: Add resolveHeaders call with debug logging
- api/server/services/Endpoints/custom/initialize.js: Add debug logging
- packages/api/src/utils/env.ts: Add comprehensive debug logging and stack traces
- packages/api/src/utils/oidc.ts: Fix eslint errors (unused type, explicit any)

This ensures template variables like {{LIBRECHAT_OPENID_TOKEN}} and
{{LIBRECHAT_USER_OPENIDID}} are properly resolved in both custom endpoint
headers and Bedrock AgentCore runtime parameters.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: remove debug logging from OpenID token template feature

Removed excessive debug logging that was added during development to make
the PR more suitable for upstream review:

- Removed 7 debug statements from OpenAIClient.js
- Removed all console.log statements from packages/api/src/utils/env.ts
- Removed debug logging from bedrock/options.js
- Removed debug logging from custom/initialize.js
- Removed debug statement from AuthController.js

This reduces the changeset by ~50 lines while maintaining full functionality
of the OpenID federated token template variable feature.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test(openid): add comprehensive unit tests for template variable substitution

- Add 34 unit tests for OIDC token utilities (oidc.spec.ts)
- Test coverage for token extraction, validation, and placeholder processing
- Integration tests for full OpenID token flow
- All tests pass with comprehensive edge case coverage

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>

* test: fix OpenID federated tokens test failures

- Add serverMetadata() mock to openid-client mock configuration
  * Fixes TypeError in openIdJwtStrategy.js where serverMetadata() was being called
  * Mock now returns jwks_uri and end_session_endpoint as expected by the code

- Update outdated initialize.spec.js test
  * Remove test expecting resolveHeaders call during initialization
  * Header resolution was refactored to be deferred until LLM request time
  * Update test to verify options are returned correctly with useLegacyContent flag

Fixes #9931 CI failures for backend unit tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: fix package-lock.json conflict

* chore: sync package-log with upstream

* chore: cleanup

* fix: use createSafeUser

* fix: fix createSafeUser signature

* chore: remove comments

* chore: purge comments

* fix: update Jest testPathPattern to testPathPatterns for Jest 30+ compatibility

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Denis Ramic <denis.ramic@nfon.com>
Co-authored-by: kristjanaapro <kristjana@apro.is>

chore: import order and add back JSDoc for OpenID JWT callback
2025-11-21 09:51:11 -05:00
Danny Avila
c0cb48256e
🤖 refactor: Improve Agent Handoff Context Tracking (#10553)
* chore: update @librechat/agents dependency to version 3.0.18

* refactor: add optional metadata field to message schema and types

* chore: update @librechat/agents to v3.0.19

* refactor: update return type of sendCompletion method to include metadata

* chore: linting

* chore: update @librechat/agents dependency to v3.0.20

* refactor: implement agent labeling for conversation history in multi-agent scenarios

* refactor: improve error handling for capturing agent ID map in AgentClient

* refactor: clear agentIdMap and related properties during client disposal to prevent memory leaks

* chore: update sendCompletion method for FakeClient to return an object with completion and metadata fields
2025-11-17 16:57:51 -05:00
Danny Avila
cabc8afeac
🔧 fix: Await MCP Instructions and Filter Malformed Tool Calls (#10485)
* fix: Await MCP instructions formatting in AgentClient

* fix: don't render or aggregate malformed tool calls

* fix: implement filter for malformed tool call content parts and add tests
2025-11-13 14:17:47 -05:00