LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-06-26 17:31:27 +00:00

Author	SHA1	Message	Date
Danny Avila	7eafe317cc	🗝️ fix: Resolve MCP Runtime User and Request Placeholders (#13626 ) * fix: Resolve MCP Runtime User Placeholders * fix: Harden MCP Runtime Placeholder Connections * fix: Update MCP Source Tag Test Expectations * fix: Complete MCP Runtime Placeholder Reinit * fix: Harden MCP Request Scoped Runtime Configs * fix: Align MCP OAuth Tests With Domain Policy * fix: Harden MCP Runtime Resolution Edges * fix: Avoid MCP Runtime Reprocessing Pitfalls * fix: Reuse MCP Request Scoped Tool Discovery * fix: Validate MCP Body Runtime Fields * 🛡️ refactor: Harden runtime placeholder edges from review - Warn at inspection when a trusted server URL contains runtime placeholders but no domain allowlist restricts the resolved target - Document the three resolution sites that must stay in sync so the validated config always matches the connected one - Note the per-call connect cost of ephemeral GRAPH/BODY connections - Drop the no-op removeUserConnection in callTool's ephemeral cleanup; ephemeral connections are never stored, and removing the entry could orphan a still-connected cached connection after a config change * 🪪 fix: Cover oauth_headers, Graph URL gating, and request-scoped reconnects Address Codex review: - Resolve runtime placeholders in oauth_headers (processMCPEnv + Graph pre-pass) and include the field in placeholder detection, so OAuth discovery/token requests no longer send literals; consolidate the detection field lists into one helper - Defer the early domain gate when the URL still carries a Graph placeholder (resolved async later); the authoritative assertResolvedRuntimeConfigAllowed check still enforces policy - Bypass the 10s reconnect throttle for request-scoped servers, which re-fetch tool definitions on every message by design	2026-06-09 18:52:57 -04:00
Danny Avila	cb1d536874	📻 fix: Replay MCP OAuth Prompts for Coalesced Connections (#13565 ) * fix: Replay MCP OAuth URL for Joined Connections * chore: Sort MCP OAuth Imports * test: Restore MCP OAuth Registry Spies * fix: Replay pending MCP OAuth prompts * fix: Replay MCP OAuth on Stream Resume * fix: Preserve MCP OAuth Replay Context * chore: Format MCP OAuth Replay Context * test: Expect MCP OAuth Replay Expiry * fix: Render pending MCP OAuth prompts * chore: Clean MCP OAuth Replay Type Narrowing * fix: Stabilize new MCP OAuth chats * fix: Re-emit cached MCP OAuth prompts * fix: Replay pending OAuth for selected MCP tools * fix: Avoid stalling pending MCP OAuth replay * test: Clean MCP OAuth review findings * test: Restore MCP OAuth registry spy * fix: Resolve OAuth Typecheck Regressions * fix: Harden MCP OAuth replay edge cases * test: Cover MCP OAuth joined prompt expiry * test: Mark joined OAuth replay fixture * test: Use OAuth fixture for joined replay expiry * fix: Anchor resumed MCP OAuth prompts * fix: Seed resumable turn metadata before MCP init * test: Format resume metadata regression * fix: Prioritize resumable stream routes * fix: Preserve MCP OAuth resume message tree * test: Fix MCP OAuth Resume Test Types * fix: Replay MCP OAuth Regenerate Prompts * fix: Skip OAuth-only Abort Persistence * fix: Stabilize OAuth Resume Replay * fix: Target Non-Tail Regenerate Responses * fix: Scope Regenerate Step Updates * fix: Clean Up OAuth Abort State * fix: Preserve Regenerate Branch Siblings * fix: Preserve OAuth Resume Branch State * fix: Preserve OAuth Branch Resume State * chore: Sort OAuth Resume Imports * fix: Address OAuth Resume Review Findings * test: Fix Abort Fixture Typing	2026-06-07 10:45:54 -04:00
janluedemann-esome	2ed59ac98a	🔐 fix: Handle Multiple Concurrent MCP OAuth Login Prompts (#13200 ) * fix: handle multiple MCP OAuth prompts * fix: address MCP OAuth review feedback * fix: address MCP OAuth prompt lifecycle review * fix: narrow OAuth prompt slot cleanup * fix: format OAuth prompt test --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-06-05 17:18:24 -04:00
jcbartle	268f095c1a	🔒 feat: Add On-Behalf-Of (OBO) token exchange support for MCP Servers (#13429 ) Some checks failed Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run Details Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run Details GitNexus Index / index (push) Waiting to run Details GitNexus Index / post-index (push) Blocked by required conditions Details Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run Details Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions Details Sync Helm Chart Tags / Ignore non-main push (push) Waiting to run Details Sync Helm Chart Tags / Sync chart tags (push) Waiting to run Details Publish `librechat-data-provider` to NPM / pack (push) Has been cancelled Details Publish `@librechat/data-schemas` to NPM / pack (push) Has been cancelled Details Publish `librechat-data-provider` to NPM / publish-npm (push) Has been cancelled Details Publish `@librechat/data-schemas` to NPM / publish-npm (push) Has been cancelled Details * Add OBO (On-Behalf-Of) token exchange support for MCP server connections Enables transparent authentication to Entra ID-backed MCP servers using the logged-in user's federated token via the OAuth 2.0 jwt-bearer grant. Configured via obo.scopes in librechat.yaml server config. - Extract generic OboTokenService from GraphTokenService (jwt-bearer grant + cache) - Refactor GraphTokenService to thin wrapper delegating to OboTokenService - Add obo schema field to BaseOptionsSchema in data-provider - Add resolveOboToken in packages/api/src/mcp/oauth/obo.ts (validates federated token, calls resolver, returns MCPOAuthTokens) - Wire oboTokenResolver through MCPConnectionFactory, MCPManager, UserConnectionManager - OBO tokens injected via request headers (not OAuth transport), refreshed on each tool call - Explicit error on OBO failure (no fallthrough to standard OAuth redirect) - Add unit tests for both resolveOboToken (9 tests) and exchangeOboToken (14 tests) * Add OBO authentication option to MCP server UI configuration Enable users to configure On-Behalf-Of (OBO) token exchange for MCP servers created via the UI (MongoDB-stored), in addition to the existing YAML-based configuration. - Add "On-Behalf-Of (OBO)" radio option to MCP server auth section with scopes input field - Remove obo from omitServerManagedFields so the field passes UI schema validation - Add OBO to AuthTypeEnum, obo_scopes to AuthConfig, and OBO handling in form defaults and submission - Add .min(1) validation on obo.scopes to reject empty strings - Add English localization keys: com_ui_obo, com_ui_obo_scopes, com_ui_obo_scopes_description - Add 5 schema validation tests for OBO field acceptance, transport compatibility, and edge cases * 🧊 fix: Add obo to safe properties in redactServerSecrets. Fixes the OBO configuration not showing up in the MCP UI after app restart * Address linter errors * 🧊 fix: fail closed on OBO refresh errors and retry transient token exchange failures - stop tool calls from falling back to stale Authorization headers when per-call OBO refresh fails - add one-time retry for transient Entra OBO exchange failures (network/429/5xx) - preserve structured OBO failure reasons and retryability in resolveOboToken - improve OBO auth error messaging for connection setup and tool execution - add tests for transient vs permanent OBO failure paths * Addressing linting errors / warnings * 🧊 fix: isolate OBO MCP auth to user-scoped connections - block OBO-enabled servers from app-level shared MCP connections - bypass shared connection lookup for OBO servers in MCPManager.getConnection - add regressions covering OBO connection scoping and preserve non-OBO app connection reuse * 🛠️ refactor: centralize MCP user-scoped connection policy - add shared requiresUserScopedConnection helper for OAuth, OBO, and customUserVars - use the shared predicate in MCPManager and ConnectionsRepository - add utils coverage for user-scoped connection policy * 🧊 fix: restrict MCP OBO config to header-capable transports - Move OBO configuration out of the shared MCP base options schema and allow it only on SSE and streamable-http transports, where request headers are applied. - Explicitly reject OBO on stdio and websocket configs to avoid accepted-but- nonfunctional server definitions. Add schema coverage for admin/config parsing and user-input websocket validation. * 🧊 fix: single-flight concurrent OBO token exchanges Concurrent tool calls that arrive on a cache miss were each issuing their own jwt-bearer request to the IdP. Under that fan-out, Entra intermittently returned errors that the retry classifier saw as non-retryable, surfacing as: "The identity provider rejected the OBO token exchange. Cannot execute tool <name>. Re-authenticate the user or verify the configured OBO scopes and retry." A user retry then hit the populated cache and succeeded, which matches the observed flakiness — the cache was empty at the moment of fan-out but populated by the time the user clicked retry. - Coalesce concurrent exchanges in `OboTokenService.exchangeOboToken` keyed by `${openidId}:${scopes}`. Callers that arrive while an exchange is in flight share the same upstream request and receive the same result. `fromCache=false` continues to force a fresh, independent exchange (and is not joined by `fromCache=true` callers). The IdP call, single-retry path, and cache write are unchanged — they were moved into a `performOboExchange` helper so the coalescing wrapper stays small. - Tests cover: coalescing on the same key, isolation between different keys, cleanup on success, cleanup on failure, and the `fromCache=false` bypass. * 🔒 feat: gate MCP OBO config behind MCP_SERVERS.CONFIGURE_OBO permission OBO silently mints per-user delegated tokens from the caller's federated access token and forwards them to whatever URL the server config points at. Previously, anyone with MCP_SERVERS.CREATE could configure obo.scopes — so if server creation is ever delegated beyond admins, a user could stand up an attacker-controlled server, attach it to a shared agent, and exfiltrate other users' downstream tokens on tool invocation. Add a dedicated MCP_SERVERS.CONFIGURE_OBO permission (ADMIN: true, USER: false by default) and enforce it at three layers so the safety property no longer depends on CREATE staying admin-only: - Create/update: POST/PATCH /api/mcp/servers returns 403 when the body carries `obo` and the caller's role lacks the permission. - Runtime fail-closed: for DB-sourced configs, MCPConnectionFactory and MCPManager.callTool re-check the original author's role before each OBO exchange. If the author has been downgraded, the exchange is skipped (factory) or refused (callTool) — retained configs lose their privileges automatically. - UI: the OBO option is hidden in the MCP server dialog for users without the permission; a CONFIGURE_OBO toggle is exposed in the MCP admin role editor. Existing role docs receive the new sub-key via the permission backfill in updateInterfacePermissions on next startup, preserving any operator-set values. YAML/Config-sourced server configs are unaffected since they're admin-controlled at the deployment level. * 🧊 fix: wire OBO machinery for servers with requiresOAuth: false The discovery and user-connection paths gated OAuth wiring (flow manager, token methods, oboTokenResolver, oboTrustChecker) behind isOAuthServer(), which only considers requiresOAuth/oauth fields. A DB-stored OBO server with requiresOAuth: false therefore landed in the non-OAuth branch, never received an oboTokenResolver, and the factory's usesObo getter evaluated to false — sending a bare request that the upstream rejected with invalid_token. Add requiresOAuthMachinery() (OAuth OR OBO) and use it at those two gates. isOAuthServer remains for the OAuth-handshake-only check (shouldInitiateOAuthBeforeConnect), where OBO must not initiate a handshake. Plumb the OBO resolver/trust-checker through ToolDiscoveryOptions so reinitMCPServer can pass them on the discovery path. * 🧊 fix: lock all OBO-target fields (URL, proxy, headers, auth) without CONFIGURE_OBO The CONFIGURE_OBO permission was meant to gate control of the endpoint that receives OBO-minted per-user delegated tokens and the scopes that are requested. The previous frontend lock + backend gate only covered obo.scopes and the auth section, leaving url/proxy/headers/etc. editable by anyone with UPDATE — meaning a non-permission user could still redirect an existing OBO server's token flow to an attacker endpoint. Switch to an allowlist policy: when editing an OBO server without CONFIGURE_OBO, only title/description/iconPath are mutable. Backend rejects any other field change with 403; frontend disables the non-allowlist sections (URL, transport, auth, trust) via fieldset. The comparison surface (MCP_USER_INPUT_FIELDS) is derived from MCPServerUserInputSchema's union members so it stays in sync with the schema. New schema fields land in the locked set by default — adding to the allowlist is the only way to unlock them, which preserves the security-review boundary. * 🧊 fix: skip unauthenticated MCP inspection for OBO-only servers MCPServerInspector.inspectServer() ran an unauthenticated temp connection unless the config had requiresOAuth or customUserVars set. For OBO-only servers without standard MCP OAuth advertisement, this caused MCPConnectionFactory.create to attempt the connection without a user or oboTokenResolver — failing on servers that reject the MCP initialize handshake without a valid bearer token, which surfaced as MCP_INSPECTION_FAILED on create/update. Add `obo` to the skip list alongside requiresOAuth and customUserVars, matching the existing pattern for user-scoped auth modes. * Addressed linting error: watchedTitle is declared but never referenced (the auto-fill logic at line 156 uses getValues('title') instead). Deleted constant.	2026-06-01 22:36:18 -04:00
Danny Avila	479e9d59b7	🧠 refactor: Memoize MCP Permission Checks Per Request (#13419 )	2026-05-30 18:32:06 -04:00
Danny Avila	100871c3ec	🛂 fix: Enforce MCP Permissions for Agent Tools (#13174 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details GitNexus Index / index (push) Waiting to run Details GitNexus Index / post-index (push) Blocked by required conditions Details * fix: Enforce MCP Permissions for Agent Tools * fix: Measure MCP Image Limit by Decoded Size * fix: gate cached MCP tools and tighten remote image URL detection Addresses Codex review findings on the MCP permissions PR: - filterAuthorizedTools previously fast-accepted any tool present in the global tool cache before reaching the MCP-use permission gate. App-level MCP tools (keyed `name_mcp_server` by MCPServerInspector and merged into the cache via mergeAppTools) therefore bypassed the canUseMCP check, letting a user without MCP_SERVERS.USE persist/bind them. Route all MCP-delimited tools through the permission + server-access gate regardless of cache presence. - assertImageDataWithinLimit / image formatter used startsWith("http") to skip the size cap, which also matched base64 payloads that happen to begin with those chars. Require http:// or https:// via a shared isRemoteImageUrl helper so oversized inline base64 can no longer bypass MCP_IMAGE_DATA_MAX_BYTES. Adds regression tests for both paths. * fix: address Codex round-2 findings on MCP permissions PR - parsers.ts: parseAsString dropped the image payload for unrecognized providers, returning only `Image result: <mimeType>`. Pre-PR these items survived via JSON.stringify(item). Keep the size guard but fall through to JSON.stringify so the data/URL is preserved. - MCP.js: the runtime MCP-use check only read `configurable.user`, so paths that propagate `user_id` only (e.g. the OpenAI-compatible API in agents/openai/service.ts) rejected every MCP tool call for an authenticated user. Add resolveMCPPermissionUser: use the safe user directly when it already carries a role (no extra DB call), otherwise fall back to loading the role by user_id. Update fail-closed tests to the resolved behavior. - v1.js: the update path only re-filtered newly added MCP tools, so a user who lost MCP_SERVERS.USE kept existing MCP bindings on edit while create/duplicate/revert stripped them. Strip all MCP tools on update when the permission is revoked; keep the narrower new-tool gating (and disconnect/registry preservation) when it is intact. Updates and adds regression tests for all three paths. * fix: populate safe user at producer instead of resolving in runtime MCP check Corrects the Finding B approach from the previous commit. Rather than loading the user by id inside the runtime MCP permission check, populate `configurable.user` (and createRun's `user`) with the full safe user at the producer, matching the in-repo agent controllers (responses.js / openai.js) which already pass `createSafeUser(req.user)`. - service.ts: derive `safeUser` via createSafeUser(req.user) and pass it to both createRun and processStream's configurable, so the role-bearing identity reaches the runtime `userCanUseMCPServers(configurable.user)` check. Falls back to a bare id when the host app attached no user, which correctly leaves MCP gated (fail closed). - MCP.js: revert the resolveMCPPermissionUser DB-load fallback; the runtime check again reads configurable.user directly and fails closed when absent (defense in depth). - MCP.spec.js: revert to the matching runtime test expectations. * test: cover safe-user propagation in createAgentChatCompletion Adds a focused spec for the OpenAI-compatible chat completion service (the producer fixed for Codex Finding B). Injects mocked deps and asserts that createRun and processStream's configurable.user carry the role from req.user (with sensitive fields stripped by createSafeUser), and that an unauthenticated request falls back to a bare { id: 'api-user' } so the runtime MCP check fails closed. * fix: address Codex round-3 findings + TS6133 - MCP.js (P1): the assistants required-action path invokes tool._call( toolInput) with no LangChain config, so the runtime check saw no configurable.user and rejected authorized users. createToolInstance now captures the creation-time user (req.user via createMCPTool) and _call falls back to it for both the permission check and userId. Still fails closed when neither config nor captured user carries a role. - v1.js (P2): the update-path isMCPTool used a bare mcp_delimiter substring check, misclassifying action tools whose operationId contains "_mcp_" (e.g. sync_mcp_state_action_...) as MCP and dropping them on a permission-revoked edit. Delegate to the canonical isActionTool so only real MCP tools are gated. Regression test added. - service.ts: drop the now-unused IUser import (TS6133); derive reqUser's type from createSafeUser's own parameter instead. * fix: resolve TS7022 self-reference in service.spec mock res The mock response object referenced `res` inside its own `status`/`json` initializers without a type annotation, so tsc inferred `res` as `any` (TS7022). Annotate the object and assign the self-referencing chainable methods after declaration. * fix: correct round-4 findings (isActionTool import, captured user, partial-update) - v1.js: import isActionTool from librechat-data-provider (its real export; @librechat/api does not export it, so the prior import was undefined and threw TypeError). Exclude action tools from MCP classification in both the main filterAuthorizedTools loop and the update path, so action tools whose operationId contains _mcp_ (e.g. sync_mcp_state_action_...) are preserved regardless of MCP permission. - v1.js: evaluate the effective tool set (updateData.tools ?? existingAgent.tools) so a tools-less PATCH by a user who lost MCP_SERVERS.USE still strips stale MCP bindings, matching create/duplicate/revert. - MCP.js: createToolInstance now receives the construction-time user and _call falls back to it (permissionUser) when configurable.user is absent, fixing the assistants required-action path that invokes _call without a config and resolving the capturedUser no-undef/ReferenceError. - Tests: action-tool preservation (authorized + denied), tools-less revocation PATCH, updated revocation test to expect all MCP tools stripped. Affected specs pass locally: MCP 49/49, filterAuthorizedTools 49/49. * fix: guard isActionTool against non-string tools; correct actionDelimiter import Two test regressions from the prior commit: - The main filterAuthorizedTools loop called isActionTool(tool) directly, but isActionTool does toolName.indexOf(...) and throws on null/undefined. Compute isActionToolName = typeof tool === 'string' && isActionTool(tool) once and reuse it, restoring graceful null/undefined handling. - The action-tool test referenced Constants.actionDelimiter (undefined); actionDelimiter is a standalone librechat-data-provider export. Import and use it directly. filterAuthorizedTools 36/36 and MCP 40/40 pass locally. * fix: address MCP permission review follow-ups * fix: preserve shared agent MCP tools	2026-05-30 16:19:49 -04:00
Max	1746153c17	🪬 fix: Skip MCP Tools When Required Custom User Vars Are Unset (#13152 ) Some checks are pending Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run Details Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run Details GitNexus Index / index (push) Waiting to run Details GitNexus Index / post-index (push) Blocked by required conditions Details * fix: skip MCP tools when required customUserVars are unset (#10969) * fix: whitespace-only values Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: guard MCP registry lookup and unknown server config in customUserVars gate * fix: fail closed on MCP registry lookup errors --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Danny Avila <danny@librechat.ai>	2026-05-23 09:24:29 -04:00
Danny Avila	bd64251eb9	🪪 fix: Prevent MCP Server Name Collisions (#13256 ) * fix: prevent MCP server name collisions * chore: address MCP registry review nits * fix: reserve MCP config names from request context * chore: format MCP registry changes * chore: address MCP collision review findings	2026-05-22 20:46:14 -04:00
Danny Avila	749eb06e67	🧭 fix: Reduce MCP Registry ACL Lookups (#13195 )	2026-05-19 17:16:37 -04:00
Danny Avila	5bab22d236	🛡️ fix: Gate Bash PTC Capabilities (#13053 )	2026-05-10 21:23:02 -04:00
Danny Avila	715a4a5fc1	🧰 refactor: Use Bash PTC for Agent Tools (#13042 ) * fix: Use Bash PTC for programmatic agent tools * fix: Preserve legacy PTC event calls	2026-05-09 16:31:09 -04:00
Danny Avila	f839a447e1	🧬 fix: Subagent MCP requestBody Propagation (bump `@librechat/agents to 3.1.78` + cleanup) (#12959 ) * 📦 chore: bump `@librechat/agents` to v3.1.78 v3.1.78 ships [danny-avila/agents#147](https://github.com/danny-avila/agents/pull/147), which makes `SubagentExecutor` inherit the parent invocation's `configurable` (with `thread_id`/`run_id`/`parent_run_id` scrubbed) into the child workflow. Subagent tool dispatches through the parent's `ON_TOOL_EXECUTE` handler now arrive with parent's `requestBody`, `user`, `userMCPAuthMap`, etc. — so `{{LIBRECHAT_BODY_}}` placeholder substitution and per-user MCP connection lookup work for subagent tool calls the same way they do for the parent agent. Note: `package-lock.json` will need an `npm install` refresh once v3.1.78 lands on the registry. The user/user_id injection added in PR #12950 stays as defense-in-depth. 🗑️ refactor: drop redundant user/user_id injection from `loadToolsForExecution` `@librechat/agents@3.1.78` (via danny-avila/agents#147) makes `SubagentExecutor` forward the parent's `configurable` verbatim into the child workflow. Subagent `ON_TOOL_EXECUTE` dispatches now arrive with parent's `user` / `user_id` already in `data.configurable` — making the host-side injection added in #12950 a no-op. Removes: - The conditional `user: createSafeUser(req.user); user_id: req.user.id` block in `loadToolsForExecution` (req.user.id-guarded so the `'api-user'` fallback in Responses/OpenAI controllers is preserved). - The unused `createSafeUser` import. - The 4 unit tests covering the now-deleted behavior. The merge in `handlers.ts` (`{ ...configurable, ...toolConfigurable }`) still produces a `mergedConfigurable` with the right user identity for both parent and subagent paths — the values just come from `configurable` (forwarded by the SDK) rather than `toolConfigurable`. Other fixes from #12950 stay (IUser.id narrowing, the env.ts / google/initialize.ts / remoteAgentAuth.ts TS-warning fixes) — they were independent of the subagent identity propagation issue. * 📦 chore: update `@librechat/agents` to v3.1.78 This update reflects the transition from the development version `3.1.78-dev.0` to the stable release `3.1.78`. The package-lock.json has been refreshed to ensure consistency with the new version, including updated integrity checks and resolved URLs for the package. This change is part of ongoing improvements to enhance the functionality and stability of the agents module.	2026-05-05 22:07:26 -04:00
Danny Avila	bff9bfea87	🐛 fix: Propagate User Identity to Subagent MCP Tool Calls (#12950 ) * 🐛 fix: Propagate User Identity to Subagent MCP Tool Calls The `@librechat/agents` SDK's `SubagentExecutor` invokes the child workflow with a fresh configurable of `{ thread_id }` only — the parent's `user` / `user_id` are dropped on the way into the child graph. The child's `ToolNode` then dispatches `ON_TOOL_EXECUTE` to the parent's handler, which merges `{ ...configurable, ...toolConfigurable }`, but neither side carries user identity for subagents. Downstream MCP tools read `config.configurable.user?.id \|\| user_id` and got `undefined`, so `MCPManager.getConnection` fell through to the "No connection found for server X" error path — it can't reach the user-connection lookup without a userId. Re-inject `user` (via `createSafeUser`) and `user_id` from `req.user` into the configurable returned by `loadToolsForExecution`. This is the single point all controllers (chat, Responses API, OpenAI-compat) flow through. For the parent agent it's a no-op (outer config already carries the same values); for subagents it fills the gap so MCP connection lookup, user-placeholder substitution, and tools that read configurable.user all work correctly. * 🐛 fix: Preserve `api-user` Fallback When Injecting Subagent Identity Codex review pointed out that the prior commit unconditionally wrote `user_id: req.user?.id` (and `user`) into `toolConfigurable`. The handler merges via `{ ...configurable, ...toolConfigurable }` — `toolConfigurable` wins — so when `req.user` is absent, this overwrote the outer config's `'api-user'` fallback (set by `responses.js` / `openai.js` for the unauthenticated API-key path) with `undefined`, breaking MCP connection lookup for that path. Only inject the keys when `req.user.id` is truthy. Omitting them lets the merge preserve whatever the outer configurable already had. Tests updated to assert key omission for `req.user` undefined / null / present without `id`. * 🩹 fix: Narrow `IUser.id` to required string `IUser` extends mongoose `Document`, which types `id?: any` (the optional virtual). At runtime `id` is always `_id.toString()` for a hydrated doc, so narrow the type to a required string. Closes two `@rollup/plugin-typescript` TS2322 warnings introduced by PR #12450 (OIDC Bearer Token Authentication for Remote Agent API) where `req.user = userResolution.user` and the `(req: Request, res: Response, next: NextFunction)` signature both failed against the project's local `Express.User` augmentation (`{ [key: string]: any; id: string; }`) because `IUser.id` was `any`/optional. Narrowing here fixes both at the source rather than casting at every assignment site. * 🩹 fix: Resolve TS Build Warnings Surfaced by `IUser.id` Narrowing Three rollup TS plugin warnings surfaced after narrowing `IUser.id` from `any` to `string`: - `utils/env.ts:95` — `safeUser[field] = user[field]` failed strict checking because indexed write through a union-typed key collapses the LHS to the intersection of all field write types (i.e., `undefined` when fields have mixed types). The previous `id?: any` on IUser had been masking this. Switch to `Object.assign(safeUser, { [field]: user[field] })` which widens the assignment. - `endpoints/google/initialize.ts:35` — `getUserKey({ userId: req.user?.id, ... })` failed because `req.user?.id` is now `string \| undefined` (no longer `any`). Match the pattern already used in `endpoints/openAI/initialize.ts:49`: `req.user?.id ?? ''`. - `middleware/remoteAgentAuth.ts:465` — pre-existing, unrelated to the IUser change. The local (gitignored) `express.d.ts` augments `express.Request` but not `express-serve-static-core.Request`, so the explicit `(req: Request, ...)` annotation imported from `'express'` resolves to a Request whose `req.user` differs from the one `RequestHandler` expects internally. Type the closure as `RequestHandler` directly so TS infers params from the augmented type. * 🩹 fix: Cast `RemoteAgentAuth` Closure to `RequestHandler` My previous attempt removed the explicit `req: Request` annotation on the closure to side-step the outer `RequestHandler` mismatch. That shifted the error to every helper call site inside the closure (`getConfigOptions(req)`, `runApiKeyAuth(req, ...)` at 467/474/493/ 512/531), because the helpers annotate their params with `express.Request` (which has the local `Request.user` augmentation), while the unannotated closure inferred `req` as `express-serve-static-core.Request` (no augmentation). Reproduced locally by stubbing the gitignored `src/types/express.d.ts`. Right approach: keep the explicit `req: Request` annotation so the closure body matches the helpers' types, then cast at the return — `RequestHandler`'s internal `Request` resolves through `express-serve-static-core` and lacks the augmentation, so the cast is the boundary that bridges the two views of `req.user`. Verified against a build with the local express.d.ts stub: zero warnings on `remoteAgentAuth.ts`, `env.ts`, and `google/initialize.ts`.	2026-05-05 12:19:50 +09:00
Caesar Isidro Va-ay	9b9a86d17d	🔀 fix: Resolve Action Tools by Exact Name to Prevent Multi-Action Domain Collision (#12594 ) * 🐛 fix: resolve Action tools by exact tool name to prevent multi-action collision When two OpenAPI Actions on the same Agent share a hostname, the second action's entry overwrote the first in the encoded-domain Map and one action's tools silently disappeared from the LLM payload. The buggy resolution loop also used substring matching, which caused similar shadowing for any encoded-domain prefix overlap. This change builds a Map keyed on the full tool name (`<operationId>_action_<encoded-domain>`) directly, mirroring the exact lookup pattern that getActionToolDefinitions already uses. Each function in an action's spec gets its own slot, so two actions sharing a hostname no longer collide. Both the new and the legacy domain encodings are registered for each function so agents whose stored tool names predate the current encoding still resolve. Applied at all three call sites that had the buggy pattern: - processRequiredActions (assistants/threads path) - loadAgentTools (agent build path) - loadActionToolsForExecution (agent execution path) Adds three regression tests covering both ordering directions and the execution path. Tests fail without the fix and pass with it. * 🐛 fix: Normalize action tool name at lookup + cover assistants path Follow-up to the multi-action domain collision fix. Addresses PR #12594 review feedback: Must-fix #1 — short-hostname lookup mismatch. The toolToAction map is keyed on the `_`-collapsed domain, but `agent.tools` and `currentAction.tool` persist the raw `domainParser(..., true)` output, which for hostnames ≤ ENCODED_DOMAIN_LENGTH is a `---`-separated string (e.g. `medium---com`). Exact-match `Map.get()` missed those keys and silently dropped the tool. Fix: normalize every incoming tool name through a new `normalizeActionToolName` helper before the lookup in `loadAgentTools`, `processRequiredActions`, and `loadActionToolsForExecution`. Must-fix #2 — assistants path coverage. `processRequiredActions` received the same structural rewrite but had zero tests. Added a regression test under `multi-action domain collision regression` that drives two shared-hostname actions through the assistants path and asserts each tool reaches its own request builder. Must-fix #3 — legacy encoding branch coverage. The `if (legacyNormalized !== normalizedDomain)` registration was never exercised by any test. Added a test where `agent.tools` stores the legacy-format name and asserts it still resolves. Should-fix #4 — DRY the registration loop. Extracted `registerActionTools({ toolToAction, functionSignatures, normalizedDomain, legacyNormalized, makeEntry })`. All three call sites now share the same key-building logic; the key template lives in one place. Should-fix #5 — remove stale optional chaining. In `loadActionToolsForExecution`, `functionSignature?.description ?? ''` became `functionSignature.description` — `sig` is always defined by the iterator, matching the style of `loadAgentTools`. Should-fix #6 — drop unreachable `!requestBuilder` guard. Entries in `processRequiredActions` are now pre-built with `requestBuilder: requestBuilders[sig.name]`, which `openapiToFunction` always produces alongside the signature, so the guard is dead. Should-fix #7 — unwrap `actionSetsData`. It now holds a bare `Map` instead of `{ toolToAction }`; the sentinel `!actionSetsData` check still works because `new Map()` is truthy. Also added a short-hostname regression test (`loadAgentTools resolves raw ---separated tool names`) that reproduces Must-fix #1: it fails against the previous commit (0 create calls) and passes with the normalization in place. 41 tests, all passing. The 3 new regression tests are under `multi-action domain collision regression` and cover the assistants path, the legacy encoding branch, and the short-hostname lookup path. * 🐛 fix: Tighten registerActionTools key handling and assistants test Follow-up to `d643444` addressing the second review pass on PR #12594. ESLint — Two prettier errors in the spec file (multi-line arrow function bodies that should fit on one line). Auto-fixed. [MINOR] operationId containing `---` → key mismatch. The lookup path collapsed every `actionDomainSeparator` sequence in the full tool name, but the registration path passed `sig.name` through unchanged. A `---` that survived into an operationId would shift the underscore boundary at lookup and miss its own key. Fix in `registerActionTools`: normalize `sig.name` with the same helper so registration and lookup always agree on the canonical form. `sanitizeOperationId` strips the characters that produce `---` in practice, so this is theoretical hardening, not a fix for a known reproducer. [MINOR] Same-operationId + same-hostname silent overwrite. Two actions sharing both an operationId and a hostname still produced a silent `Map.set()` overwrite (the new key is identical, so neither the operationId nor the domain disambiguates). Added a `setKey` helper inside `registerActionTools` that logs a `[Actions] operationId collision: ...` warning whenever a key is already present, naming the overwriting action_id. The silent-overwrite mode from the original bug cannot reappear under a different disguise without surfacing in the logs. [NIT] processRequiredActions test simulated a runtime crash. `mockCreateActionTool` returned a tool with `_call: jest.fn()`, which resolves to `undefined`. `processRequiredActions` chains `.then(handleToolOutput).catch(handleToolError)` directly onto that return, so `undefined.then(...)` threw synchronously and the outer try/catch funneled the error into `handleToolError`. Creation count assertions still passed because `createActionTool` runs before the crash, but the test was silently exercising the failure path. Updated the global mock to `_call: jest.fn().mockResolvedValue('{"status":"ok"}')` so the success path runs end-to-end. The assistants regression test now executes in ~5ms instead of ~90ms, which corroborates that it's no longer hitting the synchronous throw. [NIT] Duplicated rationale comments. All three call sites carried multi-line comment blocks restating why we key on the full tool name. That rationale now lives canonically in `registerActionTools`'s JSDoc; the inline blocks collapsed to `// See registerActionTools for the key-shape rationale.` Net -22 lines of comments. 41/41 tests still pass; lint is clean. * 🐛 fix: Scope tool-name normalization to the encoded-domain suffix Follow-up to `f22228e` addressing the Codex P1 on PR #12594. Regression. The previous commit normalized the entire tool name (`normalizeActionToolName(sig.name)` at registration, full-name `.replace()` at lookup) to handle operationIds that theoretically contained `---`. But `openapiToFunction` uses user-supplied operationIds verbatim and the fallback `sanitizeOperationId` only strips characters outside `[a-zA-Z0-9_-]`, so specs can legitimately produce operationIds like `get_foo---bar` and `get_foo_bar` side by side. Collapsing `---` to `_` across the entire key merged those two into a single map slot — one silently overwrote the other, and both tool requests routed to the surviving entry's request builder. Fix. Limit normalization to the encoded-domain portion of the full tool name, i.e. the substring after the last `actionDelimiter`. The operationId half is left verbatim, so hyphens-vs-underscores remain disambiguating. The short-hostname bug (Must-fix #1 from the original review) is still covered because the `---` → `_` collapse still happens on the domain suffix where it matters: ```js const normalizeActionToolName = (toolName) => { const delimiterIndex = toolName.lastIndexOf(actionDelimiter); if (delimiterIndex === -1) return toolName; const prefixEnd = delimiterIndex + actionDelimiter.length; const encodedDomain = toolName.slice(prefixEnd); return toolName.slice(0, prefixEnd) + encodedDomain.replace(domainSeparatorRegex, '_'); }; ``` `registerActionTools` reverts to `sig.name` verbatim — no more `normalizeActionToolName(sig.name)` ahead of the key build. Regression test. Added `loadAgentTools distinguishes operationIds that differ only by ---` vs `_`` under `multi-action domain collision regression`. It loads two actions sharing a hostname whose operationIds are `get_foo---bar` and `get_foo_bar` respectively, each pointing at a different path (`/foo-bar`, `/foo_bar`), and asserts that each tool resolves to its own request builder. Verified the test fails against `f22228e` (`hyphenTool` resolves to `/foo_bar` — the sibling's builder) and passes with this commit. 42/42 tests pass; lint clean.	2026-04-13 09:08:06 -04:00
Danny Avila	275af48592	🎯 fix: MCP Tool Misclassification from Action Delimiter Collision (#12512 ) * fix: prevent MCP tools with `_action` in name from being misclassified as OpenAPI action tools Add `isActionTool()` helper that checks for the `_action_` delimiter while guarding against cross-delimiter collision with `_mcp_`. Replace all `includes(actionDelimiter)` classification checks with the new helper across backend and frontend. * test: add coverage for MCP/action cross-delimiter collision Verify that `isActionTool` correctly rejects MCP tool names containing `_action` and that `loadAgentTools` does not filter them based on `actionsEnabled`. Add ToolIcon and definitions test cases. * fix: simplify isActionTool to handle all MCP name patterns - Use `!toolName.includes('_mcp_')` instead of checking only after the first `_action_` occurrence, which missed MCP tools with `_action_` in the middle of their name (e.g. `get_action_data_mcp_myserver`). - Reference `Constants.mcp_delimiter` value via a local const to avoid circular import from config.ts, with a comment explaining why. - Remove dead `actionDelimiter` import from definitions.ts. - Replace double-filter with single-pass partition in loadToolsForExecution. - Add test for mid-name `_action_` collision case. * fix: narrow MCP exclusion to delimiter position in isActionTool Only reject when `_mcp_` appears after `_action_` (the MCP suffix position). `_mcp_` before `_action_` is part of the operationId and is valid — e.g. `sync_mcp_state_action_api---example---com` is a legitimate action tool whose operationId happens to contain `_mcp_`. * fix: document positional _mcp_ guard and known RFC-invalid domain limitation Expand JSDoc on isActionTool to explain the action/MCP format disambiguation and the theoretical false negative for non-RFC-compliant domains containing `_mcp_`. Add test documenting this known edge case.	2026-04-01 22:36:21 -04:00
Danny Avila	935288f841	🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435 ) * feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring - Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains` - Add `MCPServerSource` type ('yaml' \| 'config' \| 'user') and `source` field to `ParsedServerConfig` - Change `dbSourced` determination from `!!config.dbId` to `config.source === 'user'` across MCPManager, ConnectionsRepository, UserConnectionManager, and MCPServerInspector - Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB * feat: three-layer MCPServersRegistry with config cache and lazy init - Add `configCacheRepo` as third repository layer between YAML cache and DB for admin-defined config-source MCP servers - Implement `ensureConfigServers()` that identifies config-override servers from resolved `getAppConfig()` mcpConfig, lazily inspects them, and caches parsed configs with `source: 'config'` - Add `lazyInitConfigServer()` with timeout, stub-on-failure, and concurrent-init deduplication via `pendingConfigInits` map - Extend `getAllServerConfigs()` with optional `configServers` param for three-way merge: YAML → Config → User - Add `getServerConfig()` lookup through config cache layer - Add `invalidateConfigCache()` for clearing config-source inspection results on admin config mutations - Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on DB-stored servers in `addServer()` and `addServerStub()` * feat: wire tenant context into MCP controllers, services, and cache invalidation - Resolve config-source servers via `getAppConfig({ role, tenantId })` in `getMCPTools()` and `getMCPServersList()` controllers - Pass `ensureConfigServers()` results through `getAllServerConfigs()` for three-way merge of YAML + Config + User servers - Add tenant/role context to `getMCPSetupData()` and connection status routes via `getTenantId()` from ALS - Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin config mutations trigger re-inspection of config-source MCP servers * feat: enforce tenantMcpPolicy on admin config mcpServers mutations - Add `validateMcpServerPolicy()` helper that checks mcpServers against operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant, allowedTransports, allowedDomains) - Wire validation into `upsertConfigOverrides` and `patchConfigField` handlers — rejects with 403 when policy is violated - Infer transport type from config shape (command → stdio, url protocol → websocket/sse, type field → streamable-http) - Validate server domains against policy allowlist when configured * revert: remove tenantMcpPolicy schema and enforcement The existing admin config CRUD routes already provide the mechanism for granular MCP server prepopulation (groups, roles, users). The tenantMcpPolicy gating adds unnecessary complexity that can be revisited if needed in the future. - Remove tenantMcpPolicy from mcpSettings Zod schema - Remove validateMcpServerPolicy helper and TenantMcpPolicy interface - Remove policy enforcement from upsertConfigOverrides and patchConfigField handlers * test: update test assertions for source field and config-server wiring - Use objectContaining in MCPServersRegistry reset test to account for new source: 'yaml' field on CACHE-stored configs - Add getTenantId and ensureConfigServers mocks to MCP route tests - Add getAppConfig mock to route test Config service mock - Update getMCPSetupData assertion to expect second options argument - Update getAllServerConfigs assertions for new configServers parameter * fix: disconnect active connections when config-source servers are evicted When admin config overrides change and config-source MCP servers are removed, the invalidation now proactively disconnects active connections for evicted servers instead of leaving them lingering until timeout. - Return evicted server names from invalidateConfigCache() - Disconnect app-level connections for evicted servers in clearMcpConfigCache() via MCPManager.appConnections.disconnect() * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Scope configCacheRepo keys by config content hash to prevent cross-tenant cache poisoning when two tenants define the same server name with different configurations - Change dbSourced checks from `source === 'user'` to `source !== 'yaml' && source !== 'config'` so undefined source (pre-upgrade cached configs) fails closed to restricted mode MAJOR fixes: - Derive OAuth servers from already-computed mcpConfig instead of calling getOAuthServers() separately — config-source OAuth servers are now properly detected - Add parseInt radix (10) and NaN guard with fallback to 30_000 for CONFIG_SERVER_INIT_TIMEOUT_MS - Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Remove `if (role \|\| tenantId)` guard in getMCPSetupData — config servers now always resolve regardless of tenant context MINOR fixes: - Extract resolveAllMcpConfigs() helper in mcp controller to eliminate 3x copy-pasted config resolution boilerplate - Distinguish "not initialized" from real errors in clearMcpConfigCache — log actual failures instead of swallowing - Remove narrative inline comments per style guide - Remove dead try/catch inside Promise.allSettled in ensureConfigServers (inner method never throws) - Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll() calls per request Test updates: - Add ensureConfigServers mock to registry test fixtures - Update getMCPSetupData assertions for inline OAuth derivation * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Break circular dependency: move CONFIG_CACHE_NAMESPACE from MCPServersRegistry to ServerConfigsCacheFactory - Fix dbSourced fail-closed: use source field when present, fall back to legacy dbId check when absent (backward-compatible with pre-upgrade cached configs that lack source field) MAJOR fixes: - Add CONFIG_CACHE_NAMESPACE to aggregate-key set in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests) covering lazy init, stub-on-failure, cross-tenant isolation via config hash keys, concurrent deduplication, merge order, and cache invalidation MINOR fixes: - Update MCPServerInspector test assertion for dbSourced change * fix: restore getServerConfig lookup for config-source servers (NEW-1) Add configNameToKey map that indexes server name → hash-based cache key for O(1) lookup by name in getServerConfig. This restores the config cache layer that was dropped when hash-based keys were introduced. Without this fix, config-source servers appeared in tool listings (via getAllServerConfigs) but getServerConfig returned undefined, breaking all connection and tool call paths. - Populate configNameToKey in ensureSingleConfigServer - Clear configNameToKey in invalidateConfigCache and reset - Clear stale read-through cache entries after lazy init - Remove dead code in invalidateConfigCache (config.title, key parsing) - Add getServerConfig tests for config-source server lookup * fix: eliminate configNameToKey race via caller-provided configServers param Replace the process-global configNameToKey map (last-writer-wins under concurrent multi-tenant load) with a configServers parameter on getServerConfig. Callers pass the pre-resolved config servers map directly — no shared mutable state, no cross-tenant race. - Add optional configServers param to getServerConfig; when provided, returns matching config directly without any global lookup - Remove configNameToKey map entirely (was the source of the race) - Extract server names from cache keys via lastIndexOf in invalidateConfigCache (safe for names containing colons) - Use mcpConfig[serverName] directly in getMCPTools instead of a redundant getServerConfig call - Add cross-tenant isolation test for getServerConfig * fix: populate read-through cache after config server lazy init After lazyInitConfigServer succeeds, write the parsed config to readThroughCache keyed by serverName so that getServerConfig calls from ConnectionsRepository, UserConnectionManager, and MCPManager.callTool find the config without needing configServers. Without this, config-source servers appeared in tool listings but every connection attempt and tool call returned undefined. * fix: user-scoped getServerConfig fallback to server-only cache key When getServerConfig is called with a userId (e.g., from callTool or UserConnectionManager), the cache key is serverName::userId. Config-source servers are cached under the server-only key (no userId). Add a fallback so user-scoped lookups find config-source servers in the read-through cache. * fix: configCacheRepo fallback, isUserSourced DRY, cross-process race CRITICAL: Add findInConfigCache fallback in getServerConfig so config-source servers remain reachable after readThroughCache TTL expires (5s). Without this, every tool call after 5s returned undefined for config-source servers. MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace all 5 inline dbSourced ternary expressions (MCPManager x2, ConnectionsRepository, UserConnectionManager, MCPServerInspector). MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when configCacheRepo.add throws (key exists from another process), fall back to reading the existing entry instead of returning undefined. MINOR: Parallelize invalidateConfigCache awaits with Promise.all. Remove redundant .catch(() => {}) inside Promise.allSettled. Tighten dedup test assertion to toBe(1). Add TTL-expiry tests for getServerConfig (with and without userId). * feat: thread configServers through getAppToolFunctions and formatInstructionsForContext Add optional configServers parameter to getAppToolFunctions, getInstructions, and formatInstructionsForContext so config-source server tools and instructions are visible to agent initialization and context injection paths. Existing callers (boot-time init, tests) pass no argument and continue to work unchanged. Agent runtime paths can now thread resolved config servers from request context. * fix: stale failure stubs retry after 5 min, upsert for cross-process races - Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried instead of permanently disabling config-source servers after transient errors (DNS outage, cold-start race) - Extract upsertConfigCache() helper that tries add then falls back to update, preventing cross-process Redis races where a second instance's successful inspection result was discarded - Add test for stale-stub retry after CONFIG_STUB_RETRY_MS * fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup - Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs were always considered stale (updatedAt ?? 0 → epoch → always expired) - Add null guard for rawConfig in MCPManager.callTool before passing to preProcessGraphTokens — prevents unsafe `as` cast on undefined - Log double-failure in upsertConfigCache instead of silently swallowing - Replace module-scope Date.now monkey-patch with jest.useFakeTimers / jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests * fix: server-only readThrough fallback only returns truthy values Prevents a cached undefined from a prior no-userId lookup from short-circuiting the DB query on a subsequent userId-scoped lookup. * fix: remove findInConfigCache to eliminate cross-tenant config leakage The findInConfigCache prefix scan (serverName:) could return any tenant's config after readThrough TTL expires, violating tenant isolation. Config-source servers are now ONLY resolvable through: 1. The configServers param (callers with tenant context from ALS) 2. The readThrough cache (populated by ensureSingleConfigServer, 5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs) Connection/tool-call paths without tenant context rely exclusively on the readThrough cache. If it expires before the next HTTP request repopulates it, the server is not found — which is correct because there is no tenant context to determine which config to return. - Remove findInConfigCache method and its call in getServerConfig - Update server-only readThrough fallback to only return truthy values (prevents cached undefined from short-circuiting user-scoped DB lookup) - Update tests to document tenant isolation behavior after cache expiry style: fix import order per AGENTS.md conventions Sort package imports shortest-to-longest, local imports longest-to-shortest across MCPServersRegistry, ConnectionsRepository, MCPManager, UserConnectionManager, and MCPServerInspector. * fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures Thread pre-resolved serverConfig from tool creation context into callTool, removing dependency on the readThrough cache for config-source servers. This fixes two issues: - Cross-tenant contamination: the readThrough cache key was unscoped (just serverName), so concurrent multi-tenant requests for same-named servers would overwrite each other's entries - TTL expiry: tool calls happening >5s after config resolution would fail with "Configuration not found" because the readThrough entry had expired Changes: - Add optional serverConfig param to MCPManager.callTool — uses provided config directly, falling back to getServerConfig lookup for YAML/user servers - Thread serverConfig from createMCPTool through createToolInstance closure to callTool - Remove readThrough write from ensureSingleConfigServer — config-source servers are only accessible via configServers param (tenant-scoped) - Remove server-only readThrough fallback from getServerConfig - Increase config cache hash from 8 to 16 hex chars (64-bit) - Add isUserSourced boundary tests for all source/dbId combinations - Fix double Object.keys call in getMCPTools controller - Update test assertions for new getServerConfig behavior * fix: cache base configs for config-server users; narrow upsertConfigCache error handling - Refactor getAllServerConfigs to separate base config fetch (YAML + DB) from config-server layering. Base configs are cached via readThroughCacheAll regardless of whether configServers is provided, eliminating uncached MongoDB queries per request for config-server users - Narrow upsertConfigCache catch to duplicate-key errors only; infrastructure errors (Redis timeouts, network failures) now propagate instead of being silently swallowed, preventing inspection storms during outages * fix: restore correct merge order and document upsert error matching - Restore YAML → Config → User DB precedence in getAllServerConfigs (user DB servers have highest precedence, matching the JSDoc contract) - Add source comment on upsertConfigCache duplicate-key detection linking to the two cache implementations that define the error message * feat: complete config-source server support across all execution paths Wire configServers through the entire agent execution pipeline so config-source MCP servers are fully functional — not just visible in listings but executable in agent sessions. - Thread configServers into handleTools.js agent tool pipeline: resolve config servers from tenant context before MCP tool iteration, pass to getServerConfig, createMCPTools, and createMCPTool - Thread configServers into agent instructions pipeline: applyContextToAgent → getMCPInstructionsForServers → formatInstructionsForContext, resolved in client.js before agent context application - Add configServers param to createMCPTool and createMCPTools for reconnect path fallback - Add source field to redactServerSecrets allowlist for client UI differentiation of server tiers - Narrow invalidateConfigCache to only clear readThroughCacheAll (merged results), preserving YAML individual-server readThrough entries - Update context.spec.ts assertions for new configServers parameter * fix: add missing mocks for config-source server dependencies in client.test.js Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added to client.js but not reflected in the test file's jest.mock declarations. * fix: update formatInstructionsForContext assertions for configServers param The test assertions expected formatInstructionsForContext to be called with only the server names array, but it now receives configServers as a second argument after the config-source server feature wiring. * fix: move configServers resolution before MCP tool loop to avoid TDZ configServers was declared with `let` after the first tool loop but referenced inside it via getServerConfig(), causing a ReferenceError temporal dead zone. Move declaration and resolution before the loop, using tools.some(mcpToolPattern) to gate the async resolution. * fix: address review findings — cache bypass, discoverServerTools gap, DRY - #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via readThroughCacheAll) instead of bypassing it when configServers is present. Extracts user-DB entries from cached base by diffing against YAML keys to maintain YAML → Config → User DB merge order without extra MongoDB calls. - #3: Add configServers param to ToolDiscoveryOptions and thread it through discoverServerTools → getServerConfig so config-source servers are discoverable during OAuth reconnection flows. - #6: Replace inline import() type annotations in context.ts with proper import type { ParsedServerConfig } per AGENTS.md conventions. - #7: Extract resolveConfigServers(req) helper in MCP.js and use it from handleTools.js and client.js, eliminating the duplicated 6-line config resolution pattern. - #10: Restore removed "why" comment explaining getLoaded() vs getAll() choice in getMCPSetupData — documents non-obvious correctness constraint. - #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs. * fix: consolidate imports, reorder constants, fix YAML-DB merge edge case - Merge duplicate @librechat/data-schemas requires in MCP.js into one - Move resolveConfigServers after module-level constants - Fix getAllServerConfigs edge case where user-DB entry overriding a YAML entry with the same name was excluded from userDbConfigs; now uses reference equality check to detect DB-overwritten YAML keys * fix: replace fragile string-match error detection with proper upsert method Add upsert() to IServerConfigsRepositoryInterface and all implementations (InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle error message string match ('already exists in cache') in upsertConfigCache that was the only thing preventing cross-process init races from silently discarding inspection results. Each implementation handles add-or-update atomically: - InMemory: direct Map.set() - Redis: direct cache.set() - RedisAggregateKey: read-modify-write under write lock - DB: delegates to update() (DB servers use explicit add() with ACL setup) * fix: wire configServers through remaining HTTP endpoints - getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig - reinitialize route: resolve configServers before getServerConfig - auth-values route: resolve configServers before getServerConfig - getOAuthHeaders: accept configServers param, thread from callers - Update mcp.spec.js tests to mock getAllServerConfigs for GET by name * fix: thread serverConfig through getConnection for config-source servers Config-source servers exist only in configCacheRepo, not in YAML cache or DB. When callTool → getConnection → getUserConnection → getServerConfig runs without configServers, it returns undefined and throws. Fix by threading the pre-resolved serverConfig (providedConfig) from callTool through getConnection → getUserConnection → createUserConnectionInternal, using it as a fallback before the registry lookup. * fix: thread configServers through reinit, reconnect, and tool definition paths Wire configServers through every remaining call chain that creates or reconnects MCP server connections: - reinitMCPServer: accepts serverConfig and configServers, uses them for getServerConfig fallback, getConnection, and discoverServerTools - reconnectServer: accepts and passes configServers to reinitMCPServer - createMCPTools/createMCPTool: pass configServers to reconnectServer - ToolService.loadToolDefinitionsWrapper: resolves configServers from req, passes to both reinitMCPServer call sites - reinitialize route: passes serverConfig and configServers to reinitMCPServer * fix: address review findings — simplify merge, harden error paths, fix log labels - Simplify getAllServerConfigs merge: replace fragile reference-equality loop with direct spread { ...yamlConfigs, ...configServers, ...base } - Guard upsertConfigCache in lazyInitConfigServer catch block so cache failures don't mask the original inspection error - Deduplicate getYamlServerNames cold-start with promise dedup pattern - Remove dead `if (!mcpConfig)` guard in getMCPSetupData - Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error messages — now uses this.namespace for correct Config/App labeling - Remove misleading OAuth callback comment about readThrough cache - Move resolveConfigServers after module-level constants in MCP.js * fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label - Clear yamlServerNamesPromise on rejection so transient cache errors don't permanently prevent ensureConfigServers from working - Skip reinspectServer for config-source servers (source: 'config') in reinitMCPServer — they lack a CACHE/DB storage location; retry is handled by CONFIG_STUB_RETRY_MS in ensureConfigServers - Use source field instead of dbId for storageLocation derivation - Fix remaining hardcoded "App" in reset() leaderCheck message * fix: persist oauthHeaders in flow state for config-source OAuth servers The OAuth callback route has no JWT auth context and cannot resolve config-source server configs. Previously, getOAuthHeaders would silently return {} for config-source servers, dropping custom token exchange headers. Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow initiation (which has auth context), and the callback reads them from the stored flow state with a fallback to the registry lookup for YAML/user-DB servers. * fix: update tests for getMCPSetupData null guard removal and ToolService mock - MCP.spec.js: update test to expect graceful handling of null mcpConfig instead of a throw (getAllServerConfigs always returns an object) - MCP.js: add defensive \|\| {} for Object.entries(mcpConfig) in case of null from test mocks - ToolService.spec.js: add missing mock for ~/server/services/MCP (resolveConfigServers) * fix: address review findings — DRY, naming, logging, dead code, defensive guards - #1: Simplify getAllServerConfigs to single getBaseServerConfigs call, eliminating redundant double-fetch of cacheConfigsRepo.getAll() - #2: Add warning log when oauthHeaders absent from OAuth callback flow state - #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller imports shared helper instead of reimplementing - #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider in createToolInstance — these are actively used, not unused - #5: Log rejected results from ensureConfigServers Promise.allSettled so cache errors are visible instead of silently dropped - #6: Remove dead 'MCP config not found' error handlers from routes - #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache - #8: Remove logger.error from withTimeout to prevent double-logging timeouts - #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message - #12: Use spread instead of mutation in addServer for immutability consistency - Add upsert mock to ensureConfigServers.test.ts DB mock - Update route tests for resolveAllMcpConfigs import change * fix: restore correct merge priority, use immutable spread, fix test mock - getAllServerConfigs: { ...configServers, ...base } so userDB wins over configServers, matching documented "User DB (highest)" priority - lazyInitConfigServer: use immutable spread instead of direct mutation for parsedConfig.source, consistent with addServer fix - Fix test to mock getAllServerConfigs as {} instead of null, remove unnecessary \|\| {} defensive guard in getMCPSetupData * fix: error handling, stable hashing, flatten nesting, remove dead param - Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with graceful {} fallback so transient DB/cache errors don't crash tool pipeline - Sort keys in configCacheKey JSON.stringify for deterministic hashing regardless of object property insertion order - Flatten clearMcpConfigCache from 3 nested try-catch to early returns; document that user connections are cleaned up lazily (accepted tradeoff) - Remove dead configServers param from getAppToolFunctions (never passed) - Add security rationale comment for source field in redactServerSecrets * fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision The array replacer in JSON.stringify acts as a property allowlist at every nesting depth, silently dropping nested keys like headers['X-API-Key'], oauth.client_secret, etc. Two configs with different nested values but identical top-level structure produced the same hash, causing cross-tenant cache hits and potential credential contamination. Switch to a function replacer that recursively sorts keys at all depths without dropping any properties. Also document the known gap in getOAuthServers: config-source OAuth servers are not covered by auto-reconnection or uninstall cleanup because callers lack request context. * fix: move clearMcpConfigCache to packages/api to eliminate circular dependency The function only depends on MCPServersRegistry and MCPManager, both of which live in packages/api. Import it directly from @librechat/api in the CJS layer instead of using dynamic require('~/config'). * chore: imports/fields ordering * fix: address review findings — error handling, targeted lookup, test gaps - Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so getAppConfig/getAllServerConfigs failures propagate instead of masking infrastructure errors as empty server lists. - Use targeted getServerConfig in getMCPServerById instead of fetching all server configs for a single-server lookup. - Forward configServers to inner createMCPTool calls so reconnect path works for config-source servers. - Update getAllServerConfigs JSDoc to document disjoint-key design. - Add OAuth callback oauthHeaders fallback tests (flow state present vs registry fallback). - Add resolveConfigServers/resolveAllMcpConfigs unit tests covering happy path and error propagation. * fix: add getOAuthReconnectionManager mock to OAuth callback tests * chore: imports ordering	2026-03-28 10:36:43 -04:00
Danny Avila	6f87b49df8	🛂 fix: Enforce Actions Capability Gate Across All Event-Driven Tool Loading Paths (#12252 ) * fix: gate action tools by actions capability in all code paths Extract resolveAgentCapabilities helper to eliminate 3x-duplicated capability resolution. Apply early action-tool filtering in both loadToolDefinitionsWrapper and loadAgentTools non-definitions path. Gate loadActionToolsForExecution in loadToolsForExecution behind an actionsEnabled parameter with a cache-based fallback. Replace the late capability guard in loadAgentTools with a hasActionTools check to avoid unnecessary loadActionSets DB calls and duplicate warnings. * fix: thread actionsEnabled through InitializedAgent type Add actionsEnabled to the loadTools callback return type, InitializedAgent, and the initializeAgent destructuring/return so callers can forward the resolved value to loadToolsForExecution without redundant getEndpointsConfig cache lookups. * fix: pass actionsEnabled from callers to loadToolsForExecution Thread actionsEnabled through the agentToolContexts map in initialize.js (primary and handoff agents) and through primaryConfig in the openai.js and responses.js controllers, avoiding per-tool-call capability re-resolution on the hot path. * test: add regression tests for action capability gating Test the real exported functions (resolveAgentCapabilities, loadAgentTools, loadToolsForExecution) with mocked dependencies instead of shadow re-implementations. Covers definition filtering, execution gating, actionsEnabled param forwarding, and fallback capability resolution. * test: use Constants.EPHEMERAL_AGENT_ID in ephemeral fallback test Replaces a string guess with the canonical constant to avoid fragility if the ephemeral detection heuristic changes. * fix: populate agentToolContexts for addedConvo parallel agents After processAddedConvo returns, backfill agentToolContexts for any agents in agentConfigs not already present, so ON_TOOL_EXECUTE for added-convo agents receives actionsEnabled instead of falling back to a per-call cache lookup.	2026-03-15 23:01:36 -04:00
Danny Avila	d3c06052d7	🗝️ feat: Credential Variables for DB-Sourced MCP Servers (#12044 ) * feat: Allow Credential Variables in Headers for DB-sourced MCP Servers - Removed the hasCustomUserVars check from ToolService.js, directly retrieving userMCPAuthMap. - Updated MCPConnectionFactory and related classes to include a dbSourced flag for better handling of database-sourced configurations. - Added integration tests to ensure proper behavior of dbSourced servers, verifying that sensitive placeholders are not resolved while allowing customUserVars. - Adjusted various MCP-related files to accommodate the new dbSourced logic, ensuring consistent handling across the codebase. * chore: MCPConnectionFactory Tests with Additional Flow Metadata for typing - Updated MCPConnectionFactory tests to include new fields in flowMetadata: serverUrl and state. - Enhanced mockFlowData in multiple test cases to reflect the updated structure, ensuring comprehensive coverage of the OAuth flow scenarios. - Added authorization_endpoint to metadata in the test setup for improved validation of the OAuth process. * refactor: Simplify MCPManager Configuration Handling - Removed unnecessary type assertions and streamlined the retrieval of server configuration in MCPManager. - Enhanced the handling of OAuth and database-sourced flags for improved clarity and efficiency. - Updated tests to reflect changes in user object structure and ensure proper processing of MCP environment variables. * refactor: Optimize User MCP Auth Map Retrieval in ToolService - Introduced conditional loading of userMCPAuthMap based on the presence of MCP-delimited tools, improving efficiency by avoiding unnecessary calls. - Updated the loadToolDefinitionsWrapper and loadAgentTools functions to reflect this change, enhancing overall performance and clarity. * test: Add userMCPAuthMap gating tests in ToolService - Introduced new tests to validate the logic for determining if MCP tools are present in the agent's tool list. - Implemented various scenarios to ensure accurate detection of MCP tools, including edge cases for empty, undefined, and null tool lists. - Enhanced clarity and coverage of the ToolService capability checking logic. * refactor: Enhance MCP Environment Variable Processing - Simplified the handling of the dbSourced parameter in the processMCPEnv function. - Introduced a failsafe mechanism to derive dbSourced from options if not explicitly provided, improving robustness and clarity in MCP environment variable processing. * refactor: Update Regex Patterns for Credential Placeholders in ServerConfigsDB - Modified regex patterns to include additional credential/env placeholders that should not be allowed in user-provided configurations. - Clarified comments to emphasize the security risks associated with credential exfiltration when MCP servers are shared between users. * chore: field order * refactor: Clean Up dbSourced Parameter Handling in processMCPEnv - Reintroduced the failsafe mechanism for deriving the dbSourced parameter from options, ensuring clarity and robustness in MCP environment variable processing. - Enhanced code readability by maintaining consistent comment structure. * refactor: Update MCPOptions Type to Include Optional dbId - Modified the processMCPEnv function to extend the MCPOptions type, allowing for an optional dbId property. - Simplified the logic for deriving the dbSourced parameter by directly checking the dbId property, enhancing code clarity and maintainability.	2026-03-03 18:02:37 -05:00
Danny Avila	3d98194a99	🦥 feat: Add Deferred Tools as Agents Capability (#11295 )	2026-01-28 17:44:30 -05:00

19 commits