Commit graph

19 commits

Author SHA1 Message Date
Danny Avila
7eafe317cc
🗝️ fix: Resolve MCP Runtime User and Request Placeholders (#13626)
* fix: Resolve MCP Runtime User Placeholders

* fix: Harden MCP Runtime Placeholder Connections

* fix: Update MCP Source Tag Test Expectations

* fix: Complete MCP Runtime Placeholder Reinit

* fix: Harden MCP Request Scoped Runtime Configs

* fix: Align MCP OAuth Tests With Domain Policy

* fix: Harden MCP Runtime Resolution Edges

* fix: Avoid MCP Runtime Reprocessing Pitfalls

* fix: Reuse MCP Request Scoped Tool Discovery

* fix: Validate MCP Body Runtime Fields

* 🛡️ refactor: Harden runtime placeholder edges from review

- Warn at inspection when a trusted server URL contains runtime
  placeholders but no domain allowlist restricts the resolved target
- Document the three resolution sites that must stay in sync so the
  validated config always matches the connected one
- Note the per-call connect cost of ephemeral GRAPH/BODY connections
- Drop the no-op removeUserConnection in callTool's ephemeral cleanup;
  ephemeral connections are never stored, and removing the entry could
  orphan a still-connected cached connection after a config change

* 🪪 fix: Cover oauth_headers, Graph URL gating, and request-scoped reconnects

Address Codex review:

- Resolve runtime placeholders in oauth_headers (processMCPEnv + Graph
  pre-pass) and include the field in placeholder detection, so OAuth
  discovery/token requests no longer send literals; consolidate the
  detection field lists into one helper
- Defer the early domain gate when the URL still carries a Graph
  placeholder (resolved async later); the authoritative
  assertResolvedRuntimeConfigAllowed check still enforces policy
- Bypass the 10s reconnect throttle for request-scoped servers, which
  re-fetch tool definitions on every message by design
2026-06-09 18:52:57 -04:00
Danny Avila
cb1d536874
📻 fix: Replay MCP OAuth Prompts for Coalesced Connections (#13565)
* fix: Replay MCP OAuth URL for Joined Connections

* chore: Sort MCP OAuth Imports

* test: Restore MCP OAuth Registry Spies

* fix: Replay pending MCP OAuth prompts

* fix: Replay MCP OAuth on Stream Resume

* fix: Preserve MCP OAuth Replay Context

* chore: Format MCP OAuth Replay Context

* test: Expect MCP OAuth Replay Expiry

* fix: Render pending MCP OAuth prompts

* chore: Clean MCP OAuth Replay Type Narrowing

* fix: Stabilize new MCP OAuth chats

* fix: Re-emit cached MCP OAuth prompts

* fix: Replay pending OAuth for selected MCP tools

* fix: Avoid stalling pending MCP OAuth replay

* test: Clean MCP OAuth review findings

* test: Restore MCP OAuth registry spy

* fix: Resolve OAuth Typecheck Regressions

* fix: Harden MCP OAuth replay edge cases

* test: Cover MCP OAuth joined prompt expiry

* test: Mark joined OAuth replay fixture

* test: Use OAuth fixture for joined replay expiry

* fix: Anchor resumed MCP OAuth prompts

* fix: Seed resumable turn metadata before MCP init

* test: Format resume metadata regression

* fix: Prioritize resumable stream routes

* fix: Preserve MCP OAuth resume message tree

* test: Fix MCP OAuth Resume Test Types

* fix: Replay MCP OAuth Regenerate Prompts

* fix: Skip OAuth-only Abort Persistence

* fix: Stabilize OAuth Resume Replay

* fix: Target Non-Tail Regenerate Responses

* fix: Scope Regenerate Step Updates

* fix: Clean Up OAuth Abort State

* fix: Preserve Regenerate Branch Siblings

* fix: Preserve OAuth Resume Branch State

* fix: Preserve OAuth Branch Resume State

* chore: Sort OAuth Resume Imports

* fix: Address OAuth Resume Review Findings

* test: Fix Abort Fixture Typing
2026-06-07 10:45:54 -04:00
janluedemann-esome
2ed59ac98a
🔐 fix: Handle Multiple Concurrent MCP OAuth Login Prompts (#13200)
* fix: handle multiple MCP OAuth prompts

* fix: address MCP OAuth review feedback

* fix: address MCP OAuth prompt lifecycle review

* fix: narrow OAuth prompt slot cleanup

* fix: format OAuth prompt test

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-06-05 17:18:24 -04:00
jcbartle
268f095c1a
🔒 feat: Add On-Behalf-Of (OBO) token exchange support for MCP Servers (#13429)
Some checks failed
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
Sync Helm Chart Tags / Ignore non-main push (push) Waiting to run
Sync Helm Chart Tags / Sync chart tags (push) Waiting to run
Publish `librechat-data-provider` to NPM / pack (push) Has been cancelled
Publish `@librechat/data-schemas` to NPM / pack (push) Has been cancelled
Publish `librechat-data-provider` to NPM / publish-npm (push) Has been cancelled
Publish `@librechat/data-schemas` to NPM / publish-npm (push) Has been cancelled
* Add OBO (On-Behalf-Of) token exchange support for MCP server connections

Enables transparent authentication to Entra ID-backed MCP servers using the logged-in user's federated token via the OAuth 2.0 jwt-bearer grant. Configured via obo.scopes in librechat.yaml server config.

- Extract generic OboTokenService from GraphTokenService (jwt-bearer grant + cache)
- Refactor GraphTokenService to thin wrapper delegating to OboTokenService
- Add obo schema field to BaseOptionsSchema in data-provider
- Add resolveOboToken in packages/api/src/mcp/oauth/obo.ts (validates federated token, calls resolver, returns MCPOAuthTokens)
- Wire oboTokenResolver through MCPConnectionFactory, MCPManager, UserConnectionManager
- OBO tokens injected via request headers (not OAuth transport), refreshed on each tool call
- Explicit error on OBO failure (no fallthrough to standard OAuth redirect)
- Add unit tests for both resolveOboToken (9 tests) and exchangeOboToken (14 tests)

* Add OBO authentication option to MCP server UI configuration

  Enable users to configure On-Behalf-Of (OBO) token exchange for MCP servers created via the UI (MongoDB-stored), in addition to the existing YAML-based configuration.

  - Add "On-Behalf-Of (OBO)" radio option to MCP server auth section with scopes input field
  - Remove obo from omitServerManagedFields so the field passes UI schema validation
  - Add OBO to AuthTypeEnum, obo_scopes to AuthConfig, and OBO handling in form defaults and submission
  - Add .min(1) validation on obo.scopes to reject empty strings
  - Add English localization keys: com_ui_obo, com_ui_obo_scopes, com_ui_obo_scopes_description
  - Add 5 schema validation tests for OBO field acceptance, transport compatibility, and edge cases

* 🧊 fix: Add obo to safe properties in redactServerSecrets. Fixes the OBO configuration not showing up in the MCP UI after app restart

* Address linter errors

* 🧊 fix: fail closed on OBO refresh errors and retry transient token exchange failures

- stop tool calls from falling back to stale Authorization headers when per-call OBO refresh fails
- add one-time retry for transient Entra OBO exchange failures (network/429/5xx)
- preserve structured OBO failure reasons and retryability in resolveOboToken
- improve OBO auth error messaging for connection setup and tool execution
- add tests for transient vs permanent OBO failure paths

* Addressing linting errors / warnings

* 🧊 fix: isolate OBO MCP auth to user-scoped connections

- block OBO-enabled servers from app-level shared MCP connections
- bypass shared connection lookup for OBO servers in MCPManager.getConnection
- add regressions covering OBO connection scoping and preserve non-OBO app connection reuse

* 🛠️ refactor: centralize MCP user-scoped connection policy

- add shared requiresUserScopedConnection helper for OAuth, OBO, and customUserVars
- use the shared predicate in MCPManager and ConnectionsRepository
- add utils coverage for user-scoped connection policy

* 🧊 fix: restrict MCP OBO config to header-capable transports

- Move OBO configuration out of the shared MCP base options schema and allow it
only on SSE and streamable-http transports, where request headers are applied.
- Explicitly reject OBO on stdio and websocket configs to avoid accepted-but-
nonfunctional server definitions. Add schema coverage for admin/config parsing
and user-input websocket validation.

* 🧊 fix: single-flight concurrent OBO token exchanges

Concurrent tool calls that arrive on a cache miss were each issuing
their own jwt-bearer request to the IdP. Under that fan-out, Entra
intermittently returned errors that the retry classifier saw as
non-retryable, surfacing as:

  "The identity provider rejected the OBO token exchange.
   Cannot execute tool <name>. Re-authenticate the user or
   verify the configured OBO scopes and retry."

A user retry then hit the populated cache and succeeded, which matches
the observed flakiness — the cache was empty at the moment of fan-out
but populated by the time the user clicked retry.

- Coalesce concurrent exchanges in `OboTokenService.exchangeOboToken`
keyed by `${openidId}:${scopes}`. Callers that arrive while an exchange
is in flight share the same upstream request and receive the same
result. `fromCache=false` continues to force a fresh, independent
exchange (and is not joined by `fromCache=true` callers). The IdP
call, single-retry path, and cache write are unchanged — they were
moved into a `performOboExchange` helper so the coalescing wrapper
stays small.
- Tests cover: coalescing on the same key, isolation between different
keys, cleanup on success, cleanup on failure, and the
`fromCache=false` bypass.

* 🔒 feat: gate MCP OBO config behind MCP_SERVERS.CONFIGURE_OBO permission

OBO silently mints per-user delegated tokens from the caller's federated
access token and forwards them to whatever URL the server config points at.
Previously, anyone with MCP_SERVERS.CREATE could configure obo.scopes — so
if server creation is ever delegated beyond admins, a user could stand up
an attacker-controlled server, attach it to a shared agent, and exfiltrate
other users' downstream tokens on tool invocation.

Add a dedicated MCP_SERVERS.CONFIGURE_OBO permission (ADMIN: true, USER:
false by default) and enforce it at three layers so the safety property
no longer depends on CREATE staying admin-only:

- Create/update: POST/PATCH /api/mcp/servers returns 403 when the body
  carries `obo` and the caller's role lacks the permission.
- Runtime fail-closed: for DB-sourced configs, MCPConnectionFactory and
  MCPManager.callTool re-check the original author's role before each
  OBO exchange. If the author has been downgraded, the exchange is
  skipped (factory) or refused (callTool) — retained configs lose their
  privileges automatically.
- UI: the OBO option is hidden in the MCP server dialog for users
  without the permission; a CONFIGURE_OBO toggle is exposed in the MCP
  admin role editor.

Existing role docs receive the new sub-key via the permission backfill
in updateInterfacePermissions on next startup, preserving any
operator-set values. YAML/Config-sourced server configs are unaffected
since they're admin-controlled at the deployment level.

* 🧊 fix: wire OBO machinery for servers with requiresOAuth: false

The discovery and user-connection paths gated OAuth wiring (flow
manager, token methods, oboTokenResolver, oboTrustChecker) behind
isOAuthServer(), which only considers requiresOAuth/oauth fields.
A DB-stored OBO server with requiresOAuth: false therefore landed in
the non-OAuth branch, never received an oboTokenResolver, and the
factory's usesObo getter evaluated to false — sending a bare request
that the upstream rejected with invalid_token.

Add requiresOAuthMachinery() (OAuth OR OBO) and use it at those two
gates. isOAuthServer remains for the OAuth-handshake-only check
(shouldInitiateOAuthBeforeConnect), where OBO must not initiate a
handshake. Plumb the OBO resolver/trust-checker through
ToolDiscoveryOptions so reinitMCPServer can pass them on the
discovery path.

* 🧊 fix: lock all OBO-target fields (URL, proxy, headers, auth) without CONFIGURE_OBO

The CONFIGURE_OBO permission was meant to gate control of the endpoint
that receives OBO-minted per-user delegated tokens and the scopes that
are requested. The previous frontend lock + backend gate only covered
obo.scopes and the auth section, leaving url/proxy/headers/etc. editable
by anyone with UPDATE — meaning a non-permission user could still
redirect an existing OBO server's token flow to an attacker endpoint.

Switch to an allowlist policy: when editing an OBO server without
CONFIGURE_OBO, only title/description/iconPath are mutable. Backend
rejects any other field change with 403; frontend disables the
non-allowlist sections (URL, transport, auth, trust) via fieldset.
The comparison surface (MCP_USER_INPUT_FIELDS) is derived from
MCPServerUserInputSchema's union members so it stays in sync with the
schema. New schema fields land in the locked set by default — adding to
the allowlist is the only way to unlock them, which preserves the
security-review boundary.

* 🧊 fix: skip unauthenticated MCP inspection for OBO-only servers

MCPServerInspector.inspectServer() ran an unauthenticated temp connection
unless the config had requiresOAuth or customUserVars set. For OBO-only
servers without standard MCP OAuth advertisement, this caused
MCPConnectionFactory.create to attempt the connection without a user or
oboTokenResolver — failing on servers that reject the MCP initialize
handshake without a valid bearer token, which surfaced as
MCP_INSPECTION_FAILED on create/update.

Add `obo` to the skip list alongside requiresOAuth and customUserVars,
matching the existing pattern for user-scoped auth modes.

* Addressed linting error: watchedTitle is declared but never referenced (the auto-fill logic at line 156 uses getValues('title') instead). Deleted constant.
2026-06-01 22:36:18 -04:00
Danny Avila
479e9d59b7
🧠 refactor: Memoize MCP Permission Checks Per Request (#13419) 2026-05-30 18:32:06 -04:00
Danny Avila
100871c3ec
🛂 fix: Enforce MCP Permissions for Agent Tools (#13174)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
* fix: Enforce MCP Permissions for Agent Tools

* fix: Measure MCP Image Limit by Decoded Size

* fix: gate cached MCP tools and tighten remote image URL detection

Addresses Codex review findings on the MCP permissions PR:

- filterAuthorizedTools previously fast-accepted any tool present in the
  global tool cache before reaching the MCP-use permission gate. App-level
  MCP tools (keyed `name_mcp_server` by MCPServerInspector and merged into
  the cache via mergeAppTools) therefore bypassed the canUseMCP check,
  letting a user without MCP_SERVERS.USE persist/bind them. Route all
  MCP-delimited tools through the permission + server-access gate
  regardless of cache presence.

- assertImageDataWithinLimit / image formatter used startsWith("http")
  to skip the size cap, which also matched base64 payloads that happen to
  begin with those chars. Require http:// or https:// via a shared
  isRemoteImageUrl helper so oversized inline base64 can no longer bypass
  MCP_IMAGE_DATA_MAX_BYTES.

Adds regression tests for both paths.

* fix: address Codex round-2 findings on MCP permissions PR

- parsers.ts: parseAsString dropped the image payload for unrecognized
  providers, returning only `Image result: <mimeType>`. Pre-PR these
  items survived via JSON.stringify(item). Keep the size guard but fall
  through to JSON.stringify so the data/URL is preserved.

- MCP.js: the runtime MCP-use check only read `configurable.user`, so
  paths that propagate `user_id` only (e.g. the OpenAI-compatible API in
  agents/openai/service.ts) rejected every MCP tool call for an
  authenticated user. Add resolveMCPPermissionUser: use the safe user
  directly when it already carries a role (no extra DB call), otherwise
  fall back to loading the role by user_id. Update fail-closed tests to
  the resolved behavior.

- v1.js: the update path only re-filtered newly added MCP tools, so a
  user who lost MCP_SERVERS.USE kept existing MCP bindings on edit while
  create/duplicate/revert stripped them. Strip all MCP tools on update
  when the permission is revoked; keep the narrower new-tool gating (and
  disconnect/registry preservation) when it is intact.

Updates and adds regression tests for all three paths.

* fix: populate safe user at producer instead of resolving in runtime MCP check

Corrects the Finding B approach from the previous commit. Rather than
loading the user by id inside the runtime MCP permission check, populate
`configurable.user` (and createRun's `user`) with the full safe user at
the producer, matching the in-repo agent controllers
(responses.js / openai.js) which already pass `createSafeUser(req.user)`.

- service.ts: derive `safeUser` via createSafeUser(req.user) and pass it
  to both createRun and processStream's configurable, so the role-bearing
  identity reaches the runtime `userCanUseMCPServers(configurable.user)`
  check. Falls back to a bare id when the host app attached no user,
  which correctly leaves MCP gated (fail closed).
- MCP.js: revert the resolveMCPPermissionUser DB-load fallback; the
  runtime check again reads configurable.user directly and fails closed
  when absent (defense in depth).
- MCP.spec.js: revert to the matching runtime test expectations.

* test: cover safe-user propagation in createAgentChatCompletion

Adds a focused spec for the OpenAI-compatible chat completion service
(the producer fixed for Codex Finding B). Injects mocked deps and asserts
that createRun and processStream's configurable.user carry the role from
req.user (with sensitive fields stripped by createSafeUser), and that an
unauthenticated request falls back to a bare { id: 'api-user' } so the
runtime MCP check fails closed.

* fix: address Codex round-3 findings + TS6133

- MCP.js (P1): the assistants required-action path invokes tool._call(
  toolInput) with no LangChain config, so the runtime check saw no
  configurable.user and rejected authorized users. createToolInstance now
  captures the creation-time user (req.user via createMCPTool) and _call
  falls back to it for both the permission check and userId. Still fails
  closed when neither config nor captured user carries a role.

- v1.js (P2): the update-path isMCPTool used a bare mcp_delimiter substring
  check, misclassifying action tools whose operationId contains "_mcp_"
  (e.g. sync_mcp_state_action_...) as MCP and dropping them on a
  permission-revoked edit. Delegate to the canonical isActionTool so only
  real MCP tools are gated. Regression test added.

- service.ts: drop the now-unused IUser import (TS6133); derive reqUser's
  type from createSafeUser's own parameter instead.

* fix: resolve TS7022 self-reference in service.spec mock res

The mock response object referenced `res` inside its own `status`/`json`
initializers without a type annotation, so tsc inferred `res` as `any`
(TS7022). Annotate the object and assign the self-referencing chainable
methods after declaration.

* fix: correct round-4 findings (isActionTool import, captured user, partial-update)

- v1.js: import isActionTool from librechat-data-provider (its real export;
  @librechat/api does not export it, so the prior import was undefined and
  threw TypeError). Exclude action tools from MCP classification in both the
  main filterAuthorizedTools loop and the update path, so action tools whose
  operationId contains _mcp_ (e.g. sync_mcp_state_action_...) are preserved
  regardless of MCP permission.
- v1.js: evaluate the effective tool set (updateData.tools ?? existingAgent.tools)
  so a tools-less PATCH by a user who lost MCP_SERVERS.USE still strips stale
  MCP bindings, matching create/duplicate/revert.
- MCP.js: createToolInstance now receives the construction-time user and _call
  falls back to it (permissionUser) when configurable.user is absent, fixing the
  assistants required-action path that invokes _call without a config and
  resolving the capturedUser no-undef/ReferenceError.
- Tests: action-tool preservation (authorized + denied), tools-less revocation
  PATCH, updated revocation test to expect all MCP tools stripped.

Affected specs pass locally: MCP 49/49, filterAuthorizedTools 49/49.

* fix: guard isActionTool against non-string tools; correct actionDelimiter import

Two test regressions from the prior commit:
- The main filterAuthorizedTools loop called isActionTool(tool) directly,
  but isActionTool does toolName.indexOf(...) and throws on null/undefined.
  Compute isActionToolName = typeof tool === 'string' && isActionTool(tool)
  once and reuse it, restoring graceful null/undefined handling.
- The action-tool test referenced Constants.actionDelimiter (undefined);
  actionDelimiter is a standalone librechat-data-provider export. Import and
  use it directly.

filterAuthorizedTools 36/36 and MCP 40/40 pass locally.

* fix: address MCP permission review follow-ups

* fix: preserve shared agent MCP tools
2026-05-30 16:19:49 -04:00
Max
1746153c17
🪬 fix: Skip MCP Tools When Required Custom User Vars Are Unset (#13152)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
* fix: skip MCP tools when required customUserVars are unset (#10969)

* fix: whitespace-only values

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: guard MCP registry lookup and unknown server config in customUserVars gate

* fix: fail closed on MCP registry lookup errors

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
2026-05-23 09:24:29 -04:00
Danny Avila
bd64251eb9
🪪 fix: Prevent MCP Server Name Collisions (#13256)
* fix: prevent MCP server name collisions

* chore: address MCP registry review nits

* fix: reserve MCP config names from request context

* chore: format MCP registry changes

* chore: address MCP collision review findings
2026-05-22 20:46:14 -04:00
Danny Avila
749eb06e67
🧭 fix: Reduce MCP Registry ACL Lookups (#13195) 2026-05-19 17:16:37 -04:00
Danny Avila
5bab22d236
🛡️ fix: Gate Bash PTC Capabilities (#13053) 2026-05-10 21:23:02 -04:00
Danny Avila
715a4a5fc1
🧰 refactor: Use Bash PTC for Agent Tools (#13042)
* fix: Use Bash PTC for programmatic agent tools

* fix: Preserve legacy PTC event calls
2026-05-09 16:31:09 -04:00
Danny Avila
f839a447e1
🧬 fix: Subagent MCP requestBody Propagation (bump @librechat/agents to 3.1.78 + cleanup) (#12959)
* 📦 chore: bump `@librechat/agents` to v3.1.78

v3.1.78 ships [danny-avila/agents#147](https://github.com/danny-avila/agents/pull/147),
which makes `SubagentExecutor` inherit the parent invocation's
`configurable` (with `thread_id`/`run_id`/`parent_run_id` scrubbed)
into the child workflow. Subagent tool dispatches through the parent's
`ON_TOOL_EXECUTE` handler now arrive with parent's `requestBody`,
`user`, `userMCPAuthMap`, etc. — so `{{LIBRECHAT_BODY_*}}` placeholder
substitution and per-user MCP connection lookup work for subagent
tool calls the same way they do for the parent agent.

Note: `package-lock.json` will need an `npm install` refresh once
v3.1.78 lands on the registry. The user/user_id injection added in
PR #12950 stays as defense-in-depth.

* 🗑️ refactor: drop redundant user/user_id injection from `loadToolsForExecution`

`@librechat/agents@3.1.78` (via danny-avila/agents#147) makes
`SubagentExecutor` forward the parent's `configurable` verbatim into
the child workflow. Subagent `ON_TOOL_EXECUTE` dispatches now arrive
with parent's `user` / `user_id` already in `data.configurable` —
making the host-side injection added in #12950 a no-op.

Removes:
- The conditional `user: createSafeUser(req.user); user_id: req.user.id`
  block in `loadToolsForExecution` (req.user.id-guarded so the
  `'api-user'` fallback in Responses/OpenAI controllers is preserved).
- The unused `createSafeUser` import.
- The 4 unit tests covering the now-deleted behavior.

The merge in `handlers.ts` (`{ ...configurable, ...toolConfigurable }`)
still produces a `mergedConfigurable` with the right user identity for
both parent and subagent paths — the values just come from
`configurable` (forwarded by the SDK) rather than `toolConfigurable`.

Other fixes from #12950 stay (IUser.id narrowing, the env.ts /
google/initialize.ts / remoteAgentAuth.ts TS-warning fixes) — they
were independent of the subagent identity propagation issue.

* 📦 chore: update `@librechat/agents` to v3.1.78

This update reflects the transition from the development version `3.1.78-dev.0` to the stable release `3.1.78`. The package-lock.json has been refreshed to ensure consistency with the new version, including updated integrity checks and resolved URLs for the package. This change is part of ongoing improvements to enhance the functionality and stability of the agents module.
2026-05-05 22:07:26 -04:00
Danny Avila
bff9bfea87
🐛 fix: Propagate User Identity to Subagent MCP Tool Calls (#12950)
* 🐛 fix: Propagate User Identity to Subagent MCP Tool Calls

The `@librechat/agents` SDK's `SubagentExecutor` invokes the child
workflow with a fresh configurable of `{ thread_id }` only — the
parent's `user` / `user_id` are dropped on the way into the child
graph. The child's `ToolNode` then dispatches `ON_TOOL_EXECUTE` to the
parent's handler, which merges `{ ...configurable, ...toolConfigurable }`,
but neither side carries user identity for subagents.

Downstream MCP tools read `config.configurable.user?.id || user_id` and
got `undefined`, so `MCPManager.getConnection` fell through to the
"No connection found for server X" error path — it can't reach the
user-connection lookup without a userId.

Re-inject `user` (via `createSafeUser`) and `user_id` from `req.user`
into the configurable returned by `loadToolsForExecution`. This is the
single point all controllers (chat, Responses API, OpenAI-compat) flow
through. For the parent agent it's a no-op (outer config already
carries the same values); for subagents it fills the gap so MCP
connection lookup, user-placeholder substitution, and tools that read
configurable.user all work correctly.

* 🐛 fix: Preserve `api-user` Fallback When Injecting Subagent Identity

Codex review pointed out that the prior commit unconditionally wrote
`user_id: req.user?.id` (and `user`) into `toolConfigurable`. The handler
merges via `{ ...configurable, ...toolConfigurable }` — `toolConfigurable`
wins — so when `req.user` is absent, this overwrote the outer config's
`'api-user'` fallback (set by `responses.js` / `openai.js` for the
unauthenticated API-key path) with `undefined`, breaking MCP connection
lookup for that path.

Only inject the keys when `req.user.id` is truthy. Omitting them lets
the merge preserve whatever the outer configurable already had. Tests
updated to assert key omission for `req.user` undefined / null / present
without `id`.

* 🩹 fix: Narrow `IUser.id` to required string

`IUser` extends mongoose `Document`, which types `id?: any` (the optional
virtual). At runtime `id` is always `_id.toString()` for a hydrated doc,
so narrow the type to a required string.

Closes two `@rollup/plugin-typescript` TS2322 warnings introduced by
PR #12450 (OIDC Bearer Token Authentication for Remote Agent API)
where `req.user = userResolution.user` and the
`(req: Request, res: Response, next: NextFunction)` signature both
failed against the project's local `Express.User` augmentation
(`{ [key: string]: any; id: string; }`) because `IUser.id` was
`any`/optional. Narrowing here fixes both at the source rather than
casting at every assignment site.

* 🩹 fix: Resolve TS Build Warnings Surfaced by `IUser.id` Narrowing

Three rollup TS plugin warnings surfaced after narrowing
`IUser.id` from `any` to `string`:

- `utils/env.ts:95` — `safeUser[field] = user[field]` failed strict
  checking because indexed write through a union-typed key collapses
  the LHS to the intersection of all field write types (i.e.,
  `undefined` when fields have mixed types). The previous `id?: any`
  on IUser had been masking this. Switch to `Object.assign(safeUser,
  { [field]: user[field] })` which widens the assignment.

- `endpoints/google/initialize.ts:35` — `getUserKey({ userId:
  req.user?.id, ... })` failed because `req.user?.id` is now
  `string | undefined` (no longer `any`). Match the pattern already
  used in `endpoints/openAI/initialize.ts:49`: `req.user?.id ?? ''`.

- `middleware/remoteAgentAuth.ts:465` — pre-existing, unrelated to
  the IUser change. The local (gitignored) `express.d.ts` augments
  `express.Request` but not `express-serve-static-core.Request`,
  so the explicit `(req: Request, ...)` annotation imported from
  `'express'` resolves to a Request whose `req.user` differs from
  the one `RequestHandler` expects internally. Type the closure as
  `RequestHandler` directly so TS infers params from the augmented
  type.

* 🩹 fix: Cast `RemoteAgentAuth` Closure to `RequestHandler`

My previous attempt removed the explicit `req: Request` annotation on
the closure to side-step the outer `RequestHandler` mismatch. That
shifted the error to every helper call site inside the closure
(`getConfigOptions(req)`, `runApiKeyAuth(req, ...)` at 467/474/493/
512/531), because the helpers annotate their params with
`express.Request` (which has the local `Request.user` augmentation),
while the unannotated closure inferred `req` as
`express-serve-static-core.Request` (no augmentation). Reproduced
locally by stubbing the gitignored `src/types/express.d.ts`.

Right approach: keep the explicit `req: Request` annotation so the
closure body matches the helpers' types, then cast at the return —
`RequestHandler`'s internal `Request` resolves through
`express-serve-static-core` and lacks the augmentation, so the cast
is the boundary that bridges the two views of `req.user`.

Verified against a build with the local express.d.ts stub: zero
warnings on `remoteAgentAuth.ts`, `env.ts`, and `google/initialize.ts`.
2026-05-05 12:19:50 +09:00
Caesar Isidro Va-ay
9b9a86d17d
🔀 fix: Resolve Action Tools by Exact Name to Prevent Multi-Action Domain Collision (#12594)
* 🐛 fix: resolve Action tools by exact tool name to prevent multi-action collision

When two OpenAPI Actions on the same Agent share a hostname, the second
action's entry overwrote the first in the encoded-domain Map and one
action's tools silently disappeared from the LLM payload. The buggy
resolution loop also used substring matching, which caused similar
shadowing for any encoded-domain prefix overlap.

This change builds a Map keyed on the full tool name
(`<operationId>_action_<encoded-domain>`) directly, mirroring the exact
lookup pattern that getActionToolDefinitions already uses. Each function
in an action's spec gets its own slot, so two actions sharing a hostname
no longer collide. Both the new and the legacy domain encodings are
registered for each function so agents whose stored tool names predate
the current encoding still resolve.

Applied at all three call sites that had the buggy pattern:

  - processRequiredActions (assistants/threads path)
  - loadAgentTools (agent build path)
  - loadActionToolsForExecution (agent execution path)

Adds three regression tests covering both ordering directions and the
execution path. Tests fail without the fix and pass with it.

* 🐛 fix: Normalize action tool name at lookup + cover assistants path

Follow-up to the multi-action domain collision fix. Addresses PR #12594
review feedback:

**Must-fix #1 — short-hostname lookup mismatch.** The toolToAction map
is keyed on the `_`-collapsed domain, but `agent.tools` and
`currentAction.tool` persist the raw `domainParser(..., true)` output,
which for hostnames ≤ ENCODED_DOMAIN_LENGTH is a `---`-separated string
(e.g. `medium---com`). Exact-match `Map.get()` missed those keys and
silently dropped the tool. Fix: normalize every incoming tool name
through a new `normalizeActionToolName` helper before the lookup in
`loadAgentTools`, `processRequiredActions`, and
`loadActionToolsForExecution`.

**Must-fix #2 — assistants path coverage.** `processRequiredActions`
received the same structural rewrite but had zero tests. Added a
regression test under `multi-action domain collision regression` that
drives two shared-hostname actions through the assistants path and
asserts each tool reaches its own request builder.

**Must-fix #3 — legacy encoding branch coverage.** The
`if (legacyNormalized !== normalizedDomain)` registration was never
exercised by any test. Added a test where `agent.tools` stores the
legacy-format name and asserts it still resolves.

**Should-fix #4 — DRY the registration loop.** Extracted
`registerActionTools({ toolToAction, functionSignatures,
normalizedDomain, legacyNormalized, makeEntry })`. All three call sites
now share the same key-building logic; the key template lives in one
place.

**Should-fix #5 — remove stale optional chaining.** In
`loadActionToolsForExecution`, `functionSignature?.description ?? ''`
became `functionSignature.description` — `sig` is always defined by the
iterator, matching the style of `loadAgentTools`.

**Should-fix #6 — drop unreachable `!requestBuilder` guard.** Entries
in `processRequiredActions` are now pre-built with
`requestBuilder: requestBuilders[sig.name]`, which `openapiToFunction`
always produces alongside the signature, so the guard is dead.

**Should-fix #7 — unwrap `actionSetsData`.** It now holds a bare
`Map` instead of `{ toolToAction }`; the sentinel `!actionSetsData`
check still works because `new Map()` is truthy.

Also added a short-hostname regression test
(`loadAgentTools resolves raw ---separated tool names`) that reproduces
Must-fix #1: it fails against the previous commit (0 create calls) and
passes with the normalization in place.

41 tests, all passing. The 3 new regression tests are under
`multi-action domain collision regression` and cover the assistants
path, the legacy encoding branch, and the short-hostname lookup path.

* 🐛 fix: Tighten registerActionTools key handling and assistants test

Follow-up to d643444 addressing the second review pass on PR #12594.

**ESLint** — Two prettier errors in the spec file (multi-line arrow
function bodies that should fit on one line). Auto-fixed.

**[MINOR] operationId containing `---` → key mismatch.** The lookup
path collapsed every `actionDomainSeparator` sequence in the full tool
name, but the registration path passed `sig.name` through unchanged.
A `---` that survived into an operationId would shift the underscore
boundary at lookup and miss its own key. Fix in `registerActionTools`:
normalize `sig.name` with the same helper so registration and lookup
always agree on the canonical form. `sanitizeOperationId` strips the
characters that produce `---` in practice, so this is theoretical
hardening, not a fix for a known reproducer.

**[MINOR] Same-operationId + same-hostname silent overwrite.** Two
actions sharing both an operationId and a hostname still produced a
silent `Map.set()` overwrite (the new key is identical, so neither the
operationId nor the domain disambiguates). Added a `setKey` helper
inside `registerActionTools` that logs a `[Actions] operationId
collision: ...` warning whenever a key is already present, naming the
overwriting action_id. The silent-overwrite mode from the original bug
cannot reappear under a different disguise without surfacing in the
logs.

**[NIT] processRequiredActions test simulated a runtime crash.**
`mockCreateActionTool` returned a tool with `_call: jest.fn()`, which
resolves to `undefined`. `processRequiredActions` chains
`.then(handleToolOutput).catch(handleToolError)` directly onto that
return, so `undefined.then(...)` threw synchronously and the outer
try/catch funneled the error into `handleToolError`. Creation count
assertions still passed because `createActionTool` runs before the
crash, but the test was silently exercising the failure path. Updated
the global mock to `_call: jest.fn().mockResolvedValue('{"status":"ok"}')`
so the success path runs end-to-end. The assistants regression test
now executes in ~5ms instead of ~90ms, which corroborates that it's
no longer hitting the synchronous throw.

**[NIT] Duplicated rationale comments.** All three call sites carried
multi-line comment blocks restating why we key on the full tool name.
That rationale now lives canonically in `registerActionTools`'s JSDoc;
the inline blocks collapsed to `// See registerActionTools for the
key-shape rationale.` Net -22 lines of comments.

41/41 tests still pass; lint is clean.

* 🐛 fix: Scope tool-name normalization to the encoded-domain suffix

Follow-up to f22228e addressing the Codex P1 on PR #12594.

**Regression.** The previous commit normalized the entire tool name
(`normalizeActionToolName(sig.name)` at registration, full-name
`.replace()` at lookup) to handle operationIds that theoretically
contained `---`. But `openapiToFunction` uses user-supplied
operationIds verbatim and the fallback `sanitizeOperationId` only
strips characters outside `[a-zA-Z0-9_-]`, so specs can legitimately
produce operationIds like `get_foo---bar` and `get_foo_bar` side by
side. Collapsing `---` to `_` across the entire key merged those two
into a single map slot — one silently overwrote the other, and both
tool requests routed to the surviving entry's request builder.

**Fix.** Limit normalization to the encoded-domain portion of the
full tool name, i.e. the substring after the last `actionDelimiter`.
The operationId half is left verbatim, so hyphens-vs-underscores
remain disambiguating. The short-hostname bug (Must-fix #1 from the
original review) is still covered because the `---` → `_` collapse
still happens on the domain suffix where it matters:

```js
const normalizeActionToolName = (toolName) => {
  const delimiterIndex = toolName.lastIndexOf(actionDelimiter);
  if (delimiterIndex === -1) return toolName;
  const prefixEnd = delimiterIndex + actionDelimiter.length;
  const encodedDomain = toolName.slice(prefixEnd);
  return toolName.slice(0, prefixEnd) + encodedDomain.replace(domainSeparatorRegex, '_');
};
```

`registerActionTools` reverts to `sig.name` verbatim — no more
`normalizeActionToolName(sig.name)` ahead of the key build.

**Regression test.** Added
`loadAgentTools distinguishes operationIds that differ only by
---` vs `_`` under `multi-action domain collision regression`. It
loads two actions sharing a hostname whose operationIds are
`get_foo---bar` and `get_foo_bar` respectively, each pointing at a
different path (`/foo-bar`, `/foo_bar`), and asserts that each tool
resolves to its own request builder. Verified the test fails against
f22228e (`hyphenTool` resolves to `/foo_bar` — the sibling's builder)
and passes with this commit.

42/42 tests pass; lint clean.
2026-04-13 09:08:06 -04:00
Danny Avila
275af48592
🎯 fix: MCP Tool Misclassification from Action Delimiter Collision (#12512)
* fix: prevent MCP tools with `_action` in name from being misclassified as OpenAPI action tools

Add `isActionTool()` helper that checks for the `_action_` delimiter
while guarding against cross-delimiter collision with `_mcp_`. Replace
all `includes(actionDelimiter)` classification checks with the new
helper across backend and frontend.

* test: add coverage for MCP/action cross-delimiter collision

Verify that `isActionTool` correctly rejects MCP tool names containing
`_action` and that `loadAgentTools` does not filter them based on
`actionsEnabled`. Add ToolIcon and definitions test cases.

* fix: simplify isActionTool to handle all MCP name patterns

- Use `!toolName.includes('_mcp_')` instead of checking only after the
  first `_action_` occurrence, which missed MCP tools with `_action_` in
  the middle of their name (e.g. `get_action_data_mcp_myserver`).
- Reference `Constants.mcp_delimiter` value via a local const to avoid
  circular import from config.ts, with a comment explaining why.
- Remove dead `actionDelimiter` import from definitions.ts.
- Replace double-filter with single-pass partition in loadToolsForExecution.
- Add test for mid-name `_action_` collision case.

* fix: narrow MCP exclusion to delimiter position in isActionTool

Only reject when `_mcp_` appears after `_action_` (the MCP suffix
position). `_mcp_` before `_action_` is part of the operationId and
is valid — e.g. `sync_mcp_state_action_api---example---com` is a
legitimate action tool whose operationId happens to contain `_mcp_`.

* fix: document positional _mcp_ guard and known RFC-invalid domain limitation

Expand JSDoc on isActionTool to explain the action/MCP format
disambiguation and the theoretical false negative for non-RFC-compliant
domains containing `_mcp_`. Add test documenting this known edge case.
2026-04-01 22:36:21 -04:00
Danny Avila
935288f841
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring

- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
  `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
  field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
  `config.source === 'user'` across MCPManager, ConnectionsRepository,
  UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB

* feat: three-layer MCPServersRegistry with config cache and lazy init

- Add `configCacheRepo` as third repository layer between YAML cache and
  DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
  from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
  caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
  concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
  three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
  results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
  DB-stored servers in `addServer()` and `addServerStub()`

* feat: wire tenant context into MCP controllers, services, and cache invalidation

- Resolve config-source servers via `getAppConfig({ role, tenantId })`
  in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
  for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
  routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
  config mutations trigger re-inspection of config-source MCP servers

* feat: enforce tenantMcpPolicy on admin config mcpServers mutations

- Add `validateMcpServerPolicy()` helper that checks mcpServers against
  operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
  allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
  handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
  → websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured

* revert: remove tenantMcpPolicy schema and enforcement

The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.

- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
  patchConfigField handlers

* test: update test assertions for source field and config-server wiring

- Use objectContaining in MCPServersRegistry reset test to account for
  new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter

* fix: disconnect active connections when config-source servers are evicted

When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.

- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
  clearMcpConfigCache() via MCPManager.appConnections.disconnect()

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
  cross-tenant cache poisoning when two tenants define the same
  server name with different configurations
- Change dbSourced checks from `source === 'user'` to
  `source !== 'yaml' && source !== 'config'` so undefined source
  (pre-upgrade cached configs) fails closed to restricted mode

MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
  calling getOAuthServers() separately — config-source OAuth servers
  are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
  for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
  servers now always resolve regardless of tenant context

MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
  eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
  clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
  ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
  calls per request

Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
  MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
  legacy dbId check when absent (backward-compatible with pre-upgrade
  cached configs that lack source field)

MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
  covering lazy init, stub-on-failure, cross-tenant isolation via config
  hash keys, concurrent deduplication, merge order, and cache invalidation

MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change

* fix: restore getServerConfig lookup for config-source servers (NEW-1)

Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.

Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.

- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup

* fix: eliminate configNameToKey race via caller-provided configServers param

Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.

- Add optional configServers param to getServerConfig; when provided,
  returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
  invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
  redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig

* fix: populate read-through cache after config server lazy init

After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.

Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.

* fix: user-scoped getServerConfig fallback to server-only cache key

When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.

* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race

CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.

MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).

MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.

MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).

* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext

Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.

Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.

* fix: stale failure stubs retry after 5 min, upsert for cross-process races

- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
  instead of permanently disabling config-source servers after transient
  errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
  update, preventing cross-process Redis races where a second instance's
  successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS

* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup

- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
  CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
  were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
  preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
  jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests

* fix: server-only readThrough fallback only returns truthy values

Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.

* fix: remove findInConfigCache to eliminate cross-tenant config leakage

The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:

1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
   5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)

Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.

- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
  (prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry

* style: fix import order per AGENTS.md conventions

Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.

* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures

Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:

- Cross-tenant contamination: the readThrough cache key was unscoped
  (just serverName), so concurrent multi-tenant requests for same-named
  servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
  fail with "Configuration not found" because the readThrough entry
  had expired

Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
  provided config directly, falling back to getServerConfig lookup
  for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
  closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
  servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior

* fix: cache base configs for config-server users; narrow upsertConfigCache error handling

- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
  from config-server layering. Base configs are cached via readThroughCacheAll
  regardless of whether configServers is provided, eliminating uncached
  MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
  infrastructure errors (Redis timeouts, network failures) now propagate
  instead of being silently swallowed, preventing inspection storms
  during outages

* fix: restore correct merge order and document upsert error matching

- Restore YAML → Config → User DB precedence in getAllServerConfigs
  (user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
  linking to the two cache implementations that define the error message

* feat: complete config-source server support across all execution paths

Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.

- Thread configServers into handleTools.js agent tool pipeline: resolve
  config servers from tenant context before MCP tool iteration, pass to
  getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
  applyContextToAgent → getMCPInstructionsForServers →
  formatInstructionsForContext, resolved in client.js before agent
  context application
- Add configServers param to createMCPTool and createMCPTools for
  reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
  differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
  results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter

* fix: add missing mocks for config-source server dependencies in client.test.js

Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.

* fix: update formatInstructionsForContext assertions for configServers param

The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.

* fix: move configServers resolution before MCP tool loop to avoid TDZ

configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.

* fix: address review findings — cache bypass, discoverServerTools gap, DRY

- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
  readThroughCacheAll) instead of bypassing it when configServers is present.
  Extracts user-DB entries from cached base by diffing against YAML keys
  to maintain YAML → Config → User DB merge order without extra MongoDB calls.

- #3: Add configServers param to ToolDiscoveryOptions and thread it through
  discoverServerTools → getServerConfig so config-source servers are
  discoverable during OAuth reconnection flows.

- #6: Replace inline import() type annotations in context.ts with proper
  import type { ParsedServerConfig } per AGENTS.md conventions.

- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
  handleTools.js and client.js, eliminating the duplicated 6-line config
  resolution pattern.

- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
  choice in getMCPSetupData — documents non-obvious correctness constraint.

- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.

* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case

- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
  YAML entry with the same name was excluded from userDbConfigs; now
  uses reference equality check to detect DB-overwritten YAML keys

* fix: replace fragile string-match error detection with proper upsert method

Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.

Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)

* fix: wire configServers through remaining HTTP endpoints

- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name

* fix: thread serverConfig through getConnection for config-source servers

Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.

* fix: thread configServers through reinit, reconnect, and tool definition paths

Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:

- reinitMCPServer: accepts serverConfig and configServers, uses them for
  getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
  passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer

* fix: address review findings — simplify merge, harden error paths, fix log labels

- Simplify getAllServerConfigs merge: replace fragile reference-equality
  loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
  failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
  messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js

* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label

- Clear yamlServerNamesPromise on rejection so transient cache errors
  don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
  reinitMCPServer — they lack a CACHE/DB storage location; retry is
  handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message

* fix: persist oauthHeaders in flow state for config-source OAuth servers

The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.

Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.

* fix: update tests for getMCPSetupData null guard removal and ToolService mock

- MCP.spec.js: update test to expect graceful handling of null mcpConfig
  instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
  null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
  (resolveConfigServers)

* fix: address review findings — DRY, naming, logging, dead code, defensive guards

- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
  eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
  imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
  in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
  so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change

* fix: restore correct merge priority, use immutable spread, fix test mock

- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
  configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
  for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
  unnecessary || {} defensive guard in getMCPSetupData

* fix: error handling, stable hashing, flatten nesting, remove dead param

- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
  graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
  regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
  document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets

* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision

The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.

Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.

Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.

* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency

The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').

* chore: imports/fields ordering

* fix: address review findings — error handling, targeted lookup, test gaps

- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
  getAppConfig/getAllServerConfigs failures propagate instead of masking
  infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
  all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
  works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
  vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
  happy path and error propagation.

* fix: add getOAuthReconnectionManager mock to OAuth callback tests

* chore: imports ordering
2026-03-28 10:36:43 -04:00
Danny Avila
6f87b49df8
🛂 fix: Enforce Actions Capability Gate Across All Event-Driven Tool Loading Paths (#12252)
* fix: gate action tools by actions capability in all code paths

Extract resolveAgentCapabilities helper to eliminate 3x-duplicated
capability resolution. Apply early action-tool filtering in both
loadToolDefinitionsWrapper and loadAgentTools non-definitions path.
Gate loadActionToolsForExecution in loadToolsForExecution behind an
actionsEnabled parameter with a cache-based fallback. Replace the
late capability guard in loadAgentTools with a hasActionTools check
to avoid unnecessary loadActionSets DB calls and duplicate warnings.

* fix: thread actionsEnabled through InitializedAgent type

Add actionsEnabled to the loadTools callback return type,
InitializedAgent, and the initializeAgent destructuring/return
so callers can forward the resolved value to loadToolsForExecution
without redundant getEndpointsConfig cache lookups.

* fix: pass actionsEnabled from callers to loadToolsForExecution

Thread actionsEnabled through the agentToolContexts map in
initialize.js (primary and handoff agents) and through
primaryConfig in the openai.js and responses.js controllers,
avoiding per-tool-call capability re-resolution on the hot path.

* test: add regression tests for action capability gating

Test the real exported functions (resolveAgentCapabilities,
loadAgentTools, loadToolsForExecution) with mocked dependencies
instead of shadow re-implementations. Covers definition filtering,
execution gating, actionsEnabled param forwarding, and fallback
capability resolution.

* test: use Constants.EPHEMERAL_AGENT_ID in ephemeral fallback test

Replaces a string guess with the canonical constant to avoid
fragility if the ephemeral detection heuristic changes.

* fix: populate agentToolContexts for addedConvo parallel agents

After processAddedConvo returns, backfill agentToolContexts for
any agents in agentConfigs not already present, so ON_TOOL_EXECUTE
for added-convo agents receives actionsEnabled instead of falling
back to a per-call cache lookup.
2026-03-15 23:01:36 -04:00
Danny Avila
d3c06052d7
🗝️ feat: Credential Variables for DB-Sourced MCP Servers (#12044)
* feat: Allow Credential Variables in Headers for DB-sourced MCP Servers

- Removed the hasCustomUserVars check from ToolService.js, directly retrieving userMCPAuthMap.
- Updated MCPConnectionFactory and related classes to include a dbSourced flag for better handling of database-sourced configurations.
- Added integration tests to ensure proper behavior of dbSourced servers, verifying that sensitive placeholders are not resolved while allowing customUserVars.
- Adjusted various MCP-related files to accommodate the new dbSourced logic, ensuring consistent handling across the codebase.

* chore: MCPConnectionFactory Tests with Additional Flow Metadata for typing

- Updated MCPConnectionFactory tests to include new fields in flowMetadata: serverUrl and state.
- Enhanced mockFlowData in multiple test cases to reflect the updated structure, ensuring comprehensive coverage of the OAuth flow scenarios.
- Added authorization_endpoint to metadata in the test setup for improved validation of the OAuth process.

* refactor: Simplify MCPManager Configuration Handling

- Removed unnecessary type assertions and streamlined the retrieval of server configuration in MCPManager.
- Enhanced the handling of OAuth and database-sourced flags for improved clarity and efficiency.
- Updated tests to reflect changes in user object structure and ensure proper processing of MCP environment variables.

* refactor: Optimize User MCP Auth Map Retrieval in ToolService

- Introduced conditional loading of userMCPAuthMap based on the presence of MCP-delimited tools, improving efficiency by avoiding unnecessary calls.
- Updated the loadToolDefinitionsWrapper and loadAgentTools functions to reflect this change, enhancing overall performance and clarity.

* test: Add userMCPAuthMap gating tests in ToolService

- Introduced new tests to validate the logic for determining if MCP tools are present in the agent's tool list.
- Implemented various scenarios to ensure accurate detection of MCP tools, including edge cases for empty, undefined, and null tool lists.
- Enhanced clarity and coverage of the ToolService capability checking logic.

* refactor: Enhance MCP Environment Variable Processing

- Simplified the handling of the dbSourced parameter in the processMCPEnv function.
- Introduced a failsafe mechanism to derive dbSourced from options if not explicitly provided, improving robustness and clarity in MCP environment variable processing.

* refactor: Update Regex Patterns for Credential Placeholders in ServerConfigsDB

- Modified regex patterns to include additional credential/env placeholders that should not be allowed in user-provided configurations.
- Clarified comments to emphasize the security risks associated with credential exfiltration when MCP servers are shared between users.

* chore: field order

* refactor: Clean Up dbSourced Parameter Handling in processMCPEnv

- Reintroduced the failsafe mechanism for deriving the dbSourced parameter from options, ensuring clarity and robustness in MCP environment variable processing.
- Enhanced code readability by maintaining consistent comment structure.

* refactor: Update MCPOptions Type to Include Optional dbId

- Modified the processMCPEnv function to extend the MCPOptions type, allowing for an optional dbId property.
- Simplified the logic for deriving the dbSourced parameter by directly checking the dbId property, enhancing code clarity and maintainability.
2026-03-03 18:02:37 -05:00
Danny Avila
3d98194a99
🦥 feat: Add Deferred Tools as Agents Capability (#11295) 2026-01-28 17:44:30 -05:00