LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-07-02 20:32:58 +00:00

Author	SHA1	Message	Date
Danny Avila	e807c63d5d	🔐 fix: Gate Shared Startup Config By Link Access (#13897) * fix: gate shared startup config by link access * fix: satisfy shared config CI checks * fix: align shared config client types * fix: reject expired shared link access	2026-06-23 08:28:37 -04:00
Danny Avila	e515063ffe	🔗 feat: Snapshot Files for Shared-Link Attachments (#13740) * 🔗 feat: Snapshot Files for Shared-Link Attachments Shared-link viewers could read a shared conversation snapshot but not its attachments: file preview/download still went through the owner-scoped file ACL (the /api/files router sits behind requireJwtAuth + owner/agent checks), so anonymous viewers got 401s and authenticated non-owners got 403s (the repeated `[fileAccess] denied` warnings seen for the preview poller). Capture an immutable per-share file snapshot (embedded on the SharedLink document, referencing the original stored object, no byte copy) at share create/update, and serve those files through new share-scoped routes authorized by the existing shared-link view permission (public/ACL) plus snapshot membership, never the owner's live file ACL.
- data-schemas: fileSnapshots on the share doc; capture in create/update; read-time rewrite of filepath/preview to /api/share/:id/files/:fileId; getSharedLinkFile + lazy backfillSharedLinkFiles for legacy links - api: GET /api/share/:shareId/files/:file_id[/download|/preview]; route context added to fileAccess denial logs - packages/api: isFileSnapshotEnabled resolver (env + yaml) - data-provider: interface.sharedLinks.snapshotFiles (default on) + client endpoints/services - client: ShareContext.shareId wired to Image, preview hook, and downloads - config: SHARED_LINKS_SNAPSHOT_FILES env override (default on) * 🔒 fix: Address Codex review on shared-link file snapshots Triage of the Codex review on PR #13740 (2 P1, 7 P2, all valid): - P1 (cross-user access): scope the snapshot lookup to the sharing user's own files so a message referencing another user's file_id can't widen access. - P1 (stored XSS): the inline share-file route now serves only safe preview types inline (raster images/pdf); everything else is forced to attachment with X-Content-Type-Options: nosniff. - Stream shared downloads by default; redirect to a signed URL only on ?direct=true (blob/XHR callers work without bucket CORS). - Read preview status live from the file record (always current for deferred previews) and stop embedding extracted text in the share doc (16MB-limit risk). - Only lazily backfill when the fileSnapshots field is absent (legacy), not on every snapshot miss. - Backfill legacy shares before rewriting message URLs, and gate URL rewriting to public shares so non-public (ACL) shares keep prior behavior (img/anchor can't carry the bearer token). - Frontend: only route a download through the share path when the file was actually snapshotted (rewritten href / filepath), else fall back. * 🔑 feat: Authorize shared-link files for non-public shares via cookie Extends shared-link file access to non-public (ACL) shares (Codex finding 5).
`<img>`/anchor requests can't carry the bearer access token, so non-public shares previously 401'd on file loads. Add an optional cookie-auth fallback on the share file routes that resolves the viewer from the `refreshToken` cookie (or signed `openid_user_id` cookie) — the same mechanism secure image links use (validateImageRequest) — then let canAccessSharedLink run the viewer's ACL check. - new middleware optionalShareFileAuth (+ unit spec); applied to the three share file routes after optionalJwtAuth - URL rewriting in getSharedMessages is no longer gated to public shares (the route now authorizes header-less requests), so files work uniformly across public and non-public shares; revert the now-unused req.sharePublic plumbing * 🔒 fix: Second Codex pass on shared-link file snapshots Addresses the follow-up Codex findings on PR #13740: - Don't snapshot transient text-source files: FileSources.text filepaths are Multer temp paths the upload route deletes, so they can't be streamed — removed from the streamable allowlist. - Unset stale snapshots on a disabled-feature update: updateSharedLink now $unsets fileSnapshots when snapshotFiles is false, so an opted-out update can't keep serving file ids the update dropped. - Load tenant config after share resolution: configMiddleware now runs after canAccessSharedLink (which enters the share's tenant ALS context), so per-tenant interface.sharedLinks.snapshotFiles overrides apply to anonymous public views. - Return a clean 404 when the snapshotted object is gone: resolveShareFile now requires the live file record and 404s if it's been deleted/expired, instead of letting the stream error after headers are sent (ENOENT / 500). (The re-flagged P1 about private-viewer rewriting was already fixed in the prior commit's cookie-auth change.) * 🔒 fix: Third Codex pass on shared-link file snapshots Addresses the third Codex review pass on PR #13740: - P1: keep shared previews/files pinned to the snapshotted version. 
Snapshot the small previewRevision; resolveShareFile 404s when the live file's revision no longer matches (file_id reused/overwritten by a later turn), so old links can't surface post-share content — covers both preview text and streamed bytes. - Honor the toggle as a kill switch: resolveShareFile 404s when snapshotFiles is disabled, instead of only skipping backfill, so disabling stops serving already-snapshotted file URLs. - Lazy-sweep orphaned 'pending' previews to 'failed' in the share preview route (mirrors the owner route) so the client poller reaches a terminal state. - Resolve the cookie-fallback user in runAsSystem so strict tenant isolation doesn't throw before canAccessSharedLink establishes the share tenant context. * ✨ feat: Per-link "share files" checkbox for shared links Add a checkbox to the share-link dialog (checked by default) letting the user choose whether to include the conversation's files in the shared link, with copy explaining images/files won't be visible to viewers otherwise. Opting out skips snapshot creation/serving for that link. - client: ShareButton renders the checkbox gated on the new startupConfig.sharedLinksSnapshotFilesEnabled flag; state threads through SharedLinkButton into the create/update mutations as `snapshotFiles`. - data-provider: createSharedLink/updateSharedLink send `snapshotFiles` in the body; TStartupConfig gains `sharedLinksSnapshotFilesEnabled`. - api: POST/PATCH /api/share compute snapshotFiles as isFileSnapshotEnabled(req.config) && body.snapshotFiles !== false (admin gate AND per-link opt-out); config.js exposes the effective enabled flag to clients. - en locale: com_ui_share_files (+ _description). * 🐛 fix: Make the "share files" opt-out actually hide files Unchecking "share files" at creation didn't hide anything: the shared message JSON still carried each file's original (e.g. 
static-served) path, and because opting out only meant "no fileSnapshots field" — indistinguishable from a legacy link — getSharedMessages would backfill snapshots on first view whenever the admin feature was on, re-enabling files entirely. Fix by persisting and honoring the per-link choice: - Store `snapshotFiles` (boolean) on the SharedLink so opt-out is distinct from a legacy link; set it on create and update. - getSharedMessages computes includeFiles = adminEnabled && link not opted out; when excluded it strips files/attachments from the payload (no original-path leak) and never backfills the opted-out link. - Surface the stored choice via getSharedLink so the dialog checkbox reflects an existing link's actual setting instead of always defaulting to checked. Note: changing the checkbox on an already-created link still applies only when the link is refreshed (which regenerates the URL) — a UX follow-up. * 🔒 fix: Close remaining shared-link file opt-out leaks (Codex) Follow-up to the per-link opt-out, addressing the third Codex pass: - Honor the opt-out on the file route too: getSharedLinkFile now returns the link's `optedOut` choice; resolveShareFile 404s (and never backfills) an opted-out link, so a direct /files/:id request can't re-create snapshots. - Make read/serve viewer-independent: the gate no longer uses the viewer's resolved config (isFileSnapshotEnabled(req.config)) — it uses the link's stored choice plus a global env-only kill switch (isFileSnapshotKillSwitchActive). A viewer's own interface.sharedLinks.snapshotFiles can no longer hide a link's files. Create/update still use the creator's config to set the per-link choice. - Neutralize render URLs for non-snapshotted files: applyShareFileRoute now strips filepath/preview for any file/attachment not in the snapshot, so the owner's original (e.g. static) path can't be loaded through the share. 
* 🔒 fix: Harden shared-file version pinning and local path handling (Codex) - Refuse reused/overwritten file snapshots more broadly: resolveShareFile now refuses to serve when either previewRevision OR `bytes` changed vs the snapshot. `bytes` catches non-office reused outputs (e.g. code-exec same-filename images that lack previewRevision) and is stable across S3 URL refresh and the pending->ready transition. Same-size content swaps remain a best-effort gap inherent to the no-byte-copy design. - Strip cache-busting query strings before local streaming: code-output images add `?v=...` to filepath; the share route now splits it off so getLocalFileStream resolves the real filename instead of a literal `.png?v=...` path. 💬 fix: Clarify that file-sharing changes apply on link refresh For an already-created shared link, changing the "share files" checkbox only takes effect when the link is refreshed (which regenerates the snapshot). Add a note under the checkbox, shown only when a link already exists, so the behavior isn't surprising: "Refresh the link to apply this change — files are snapshotted when the link is refreshed."	2026-06-20 23:05:13 -04:00
Dan Lew	743f57f63e	🔖 feat: Add Pinned Conversations (#13492) * feat: add `convo.pinned` We want to be able to pin convos (so users can easily find them), thus we added a new field to the DB schema: `pinned`. We also had to add an API method for pinning a convo. It's got thorough tests. It's structured just like how /api/convos/archive works, only for pinning. * feat: add 'pinned' section to conversation list If there are any pinned conversations, they will appear above the normal "chats" list, with a pinned icon next to them. * feat: added pin/unpin to convo options ConvoOptions now has a pin/unpin button which lets you change the pin status of any given conversation. * fix: adjust ellipsizing gradient on ConvoLink Because it went across the whole ConvoLink, it would cover up any children (i.e. icons) that appear after the title. However, the point of the gradient is just to gradually make the title disappear, not the icons. This change places the gradient on the title only, so it achieves the same ellipsizing effect without interfering with the display of the child icons. * Fixed import sorting	2026-06-17 20:26:55 -04:00
Danny Avila	49f4b659f6	🔐 fix: Honor Admin-Panel MCP Allowlist Overrides Without Restart (#13814 ) * 🔐 fix: Honor Admin-Panel MCP Allowlist Overrides Without Restart MCPServersRegistry was built once at boot from getAppConfig({ baseOnly: true }), freezing allowedDomains/allowedAddresses to YAML. Admin-panel mcpSettings overrides were ignored by both inspection (addServer/ reinspectServer/updateServer/lazyInitConfigServer) and runtime connection enforcement (assertResolvedRuntimeConfigAllowed), so a domain allowed only via the panel failed inspection and never connected. Make the registry's effective allowlists mutable and refresh them from the merged admin-panel config: seed at boot, and re-apply on every config mutation via invalidateConfigCaches -> clearMcpConfigCache. Both inspection and connection paths read the same getters, so both honor overrides without a restart. Fail-safe: current allowlists are preserved when the merged read fails. * 🛡️ fix: Scope MCP allowlist refresh to global config, fail-safe on DB error Address Codex P1 review findings on the allowlist-refresh path: - Tenant-scoped config mutations no longer push one tenant's merged mcpSettings into the process-wide registry singleton (read by all MCP connection paths), which would leak allowlists across tenants. Only global (non-tenant) mutations refresh the registry; tenant mutations still evict the config-server cache. - The refresh read now uses strictOverrides:true so a transient DB error throws instead of silently returning YAML base config — preserving the last-known allowlists rather than overwriting them with fallback values. Adds the strictOverrides option to getAppConfig (default off, no behavior change for existing callers). * ♻️ refactor: Resolve MCP allowlists per-request (tenant-scoped) instead of a global singleton Supersedes the prior global-mutation approach. 
MCP allowlists live in mcpSettings, which is tenant/principal-scoped admin config, so a process-wide singleton value is the wrong model — it caused cross-tenant bleed and stale reads. Instead, inject a resolver (from the app layer, where the merged config lives) that the registry calls per inspection and per connection. It reads the ALS tenant context via getAppConfig and accepts the acting user so user/role-scoped overrides resolve; config-source inspection (no user) resolves at tenant scope. Falls back to the YAML base allowlists when no resolver is set or the lookup fails, so a transient error fails to the operator baseline rather than disabling the allowlist. Removes the now-unnecessary setAllowlists / boot-seed / invalidateConfigCaches refresh / getAppConfig.strictOverrides machinery. * 🔒 fix: Scope config-source cache by allowlist; resolve OAuth allowlists per-request Address Codex review of the per-request resolver: - Config-source cache key now folds in the resolved allowlists, not just the raw-config hash. Inspection results became allowlist-dependent, so without this a tenant whose allowlist rejects a URL could poison the shared key with an inspectionFailed stub for a tenant that allows it (and vice versa). The tenant-scoped allowlist is resolved once per ensureConfigServers pass and threaded through the cache key + inspection. - The two remaining request-time OAuth allowlist reads now use the merged config instead of the YAML base getters: the fallback OAuth-initiate path (routes/mcp.js) via resolveAllowlists, and OAuth revocation (UserController.maybeUninstallOAuthMCP) via the request's already-merged appConfig.mcpSettings. Without this, an OAuth endpoint allowed only by an admin-panel override was rejected while inspection/connection allowed it. * ✅ test: Update MCP OAuth registry/config mocks for per-request allowlists CI fix for the Finding-12 change. 
The OAuth-initiate route now calls registry.resolveAllowlists() and the revocation path reads the merged appConfig.mcpSettings, so the affected specs' mocks were asserting the old base-getter values: - routes/__tests__/mcp.spec.js: add resolveAllowlists to the registry mock. - UserController.mcpOAuth.spec.js: provide mcpSettings on the getAppConfig mock so revokeOAuthToken still receives the expected allowlists. * 🧪 test: e2e proof that admin-panel MCP allowlist override takes effect Adds a Playwright mock-harness spec for #13809. A URL-based MCP fixture (e2e-http, streamable-http SDK server) boots inspectionFailed because its origin is omitted from the YAML mcpSettings.allowedDomains; the spec adds that origin via an admin config override (PUT /api/admin/config/user/:id) and asserts the server reinitializes — exercising the real resolver path through the backend + DB. Before the fix, reinspection used the frozen YAML allowlist and the server stayed unreachable. - e2e/setup/fake-mcp-http-server.js: streamable-HTTP MCP fixture (health GET /). - e2e/playwright.config.mock.ts: boot the fixture as a second webServer. - e2e/config/librechat.e2e.yaml: mcpSettings.allowedDomains (excludes 127.0.0.1) + the e2e-http server. - e2e/specs/mock/mcp-allowlist-override.spec.ts: login → baseline reinit fails → apply override → reinit succeeds.	2026-06-17 20:14:53 -04:00
Dustin Healy	054fa4bfa7	🥽 fix: Restrict MCP Server URL Disclosure to Admins, Owners, and Editors (#13784 ) * 🥽 fix: Redact Non-User-Sourced MCP Server URLs by ACL Edit Permission GET /api/mcp/servers and GET /api/mcp/servers/:serverName return MCP server configs to any caller with MCP-use permission. For user-sourced configs (DB-stored, UI-submitted), the URL is the caller's own and is intentionally disclosed. For non-user-sourced configs (YAML or config-tier, operator-defined), the URL and OAuth flow endpoints (authorization_url, token_url) are operator-sensitive: they can encode internal infrastructure hostnames and are not editable through the API. This change redacts those fields on non-user-sourced configs unless the caller has edit authority on the resource, using the same ACL check (PermissionBits.EDIT) that the PATCH and DELETE routes already enforce via canAccessMCPServerResource. Callers with broad MANAGE_MCP_SERVERS capability bypass the per-resource check, matching the existing capability bypass in canAccessResource. customUserVars is intentionally not redacted: its values are UI hint metadata (title, description, sensitive), not user-supplied secrets; blanking it would give non-editor callers a Configure form with no field labels. * 🥽 fix: Correct getResourcePermissionsMap import path + tighten redact comments The MCP server redaction commit imported getResourcePermissionsMap from ~/server/controllers/PermissionsController, but that controller is a consumer of the helper, not its exporter. The canonical export lives in ~/server/services/PermissionService (which controllers/agents/v1.js already imports from). Fixes the runtime getResourcePermissionsMap is not a function failure on GET /api/mcp/servers and the four downstream route-spec failures whose config mocks lacked a source field and were therefore wrongly treated as non-user-sourced; mocks now reflect the real registry behavior (addServer/updateServer tag DB-stored configs with source: 'user'). 
Trims narrating JSDoc on the redact helpers and resorts the librechat-data-provider destructure by length. * chore: import order * 🥽 fix: Redact OAuth Revocation Endpoint Alongside Authorization And Token URLs The OAuth-URL strip path only dropped authorization_url and token_url. The UserOAuthOptionsSchema in packages/data-provider/src/mcp.ts (line 146) accepts revocation_endpoint as another operator-configurable URL, and the OAuth handler uses it to revoke tokens; it can hold the same internal IdP hostnames the existing strip is trying to hide. Adds revocation_endpoint to the destructure so a non-user-sourced YAML/config MCP server config no longer leaks the revocation URL to non-editor callers. The existing strip url and oauth flow URLs spec is extended with a revocation_endpoint value to lock in the new field. * 🥽 fix: Gate Shared DB Server URL Disclosure On ACL Edit Permission source-driven URL disclosure was incorrect for shared DB-backed MCP servers. ServerConfigsDB.mapDBServerToParsedConfig (packages/api/src/mcp/registry/db/ServerConfigsDB.ts:465) sets source: 'user' on every DB-stored config it returns, regardless of who is accessing it. A user with only VIEW share on a DB server, or with agent-mediated access, was therefore treated by the redaction layer as if they owned the URL, and GET /api/mcp/servers disclosed the owner's URL and OAuth flow URLs to viewers who could not edit the resource. The redaction is now driven purely by ACL edit authority: computeCanEditByServer routes every dbId-bearing config through PermissionBits.EDIT regardless of source; redactServerSecrets strips on !canEdit regardless of source. POST and PATCH controllers explicitly pass canEdit: true since both endpoints establish edit authority (POST creates the resource, PATCH is gated on the EDIT middleware). Legacy/ephemeral configs without a dbId still fall back to the source heuristic. 
* 📝 docs: correct redactServerSecrets URL-disclosure comment --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-06-16 11:20:52 -04:00
Ravi Kumar L	fbc990f684	📈 fix: Isolate RUM Telemetry Proxy Auth from App Auth (#13765) * fix(rum): isolate telemetry proxy auth * feat(rum): track proxy error metrics * refactor(rum): simplify proxy auth strategy flow * test(rum): clarify proxy success metric assertion * test(metrics): use typed supertest import * test(metrics): add local supertest types * test(metrics): keep supertest types local * test(metrics): use official supertest types * fix(rum): log proxy auth strategy errors * fix(rum): classify proxy auth errors in metrics * style(rum): sort telemetry metric imports * ci: mention import sort check command * ci: show targeted import sort example	2026-06-15 12:49:44 -04:00
Danny Avila	49859c04a2	🗄️ fix: Gate Request-Scoped MCP Servers Out of Persistent Tool Cache (#13672 ) * 🗄️ fix: Gate Request-Scoped MCP Servers Out of Persistent Tool Cache PR #13626 established that request-scoped MCP servers (runtime OPENID/GRAPH/BODY placeholders) must not use the persistent 12h tool cache, but only gated three of five touchpoints. The panel endpoint still back-filled the cache and the OAuth callback still wrote to it, while agent loading read those entries ungated — pinning ephemeral model-spec/agent toolsets to stale definitions for up to 12h. Centralize the invariant in createMCPToolCacheService: a getServerConfig resolver dep gates both writers and a new service-owned getMCPServerTools read, so every current and future caller is covered. Callers that already hold the parsed config pass it to skip resolution; the per-call skipCache flag and duplicated call-site gates are removed in favor of the single config-based mechanism. Resolution failures fail open to preserve prior behavior. * 🩹 fix: Address Codex Review on Cache Gating - Repair getCachedTools.spec.js, which destructured the relocated getMCPServerTools directly from the module; its coverage now lives in the service-level tools.spec.ts. - Resolve the merged (Config-tier-aware) server config in the OAuth callback before writing tool definitions, so the cache gate detects request-scoped servers supplied via admin Config overlays that the base registry lookup cannot see. - Discover tools actively for request-scoped servers in the panel endpoint via ephemeral reinitialization: such servers have no stored app/user connections, so the previous getServerToolFunctions fallback returned an empty toolset once the cache read was gated. * 🧵 fix: Address Second Codex Review on Cache Gating - Resolve the merged server config before the OAuth callback reconnects, so the connection itself uses Config-tier overlays rather than only the subsequent cache write. 
- Pass Config-tier candidates into the panel's request-scoped discovery, matching the reinitialize route: reinitMCPServer forwards configServers (not the provided serverConfig) to its OAuth discovery fallback. - Document the accepted read-path trade-off: the gate resolver sees base configs only, all writers pass merged configs, so a pre-gating or overlay-divergent entry survives at most one cache TTL. * 🚏 chore: Rework Cache Gating for BODY-Only Request Scoping After #13673 narrowed requiresEphemeralUserConnection to BODY placeholders, the central gate follows the predicate unchanged, but the panel's active discovery no longer serves a purpose: the only remaining request-scoped class cannot connect outside a chat turn, so the reinitialization attempt would always fail at the missing-body check. Remove that path; OpenID/Graph servers are persistent user-scoped again and flow through the stored-connection and cache lookups as before. Flip test fixtures that used OPENID placeholders to denote request-scoped configs over to BODY placeholders. * 🪟 fix: Check Config Overlays in Agent-Loading Cache Reads The cache service's registry resolver sees only base YAML/DB configs, so a BODY placeholder introduced by a request-tier Config overlay was invisible to the gate on the agent-loading read path: model-spec and ephemeral-agent expansion could read a leftover persistent entry and pin stale concrete tool names instead of the mcp_all fresh-discovery path. Check the raw overlay candidate inline in loadEphemeralAgent and loadAddedAgent — a pure placeholder scan with no extra IO — and skip the cache read when the overlay makes the server request-scoped. Widen UserScopedConnectionConfig so raw (pre-inspection) configs qualify for the scoping predicates, which only check key presence. 
* 🧪 test: Guard Run-Scoped MCP Definition Handoff Boundaries The original ClickHouse breaker storm regressed precisely at field pass-through boundaries that unit tests of each end could not see: initializeAgent dropping mcpAvailableTools from its destructure, and the agent tool context losing it on the way into ON_TOOL_EXECUTE. Add direct guards on both hops: the loadTools result must surface on the initialized agent, and the captured toolExecuteOptions closure must forward it to loadToolsForExecution.	2026-06-13 11:26:49 -04:00
Danny Avila	c27d6b85a4	🤫 refactor: Silent MCP OAuth Refresh on Mid-Session 401 (#13369 ) * 🤫 fix: Silent MCP OAuth Refresh on Mid-Session 401 Avoids the hourly interactive re-auth prompt when an MCP server (e.g. Azure Entra ID) returns 401 mid-session by attempting a refresh token exchange first, and only falling back to the interactive OAuth flow when no refresh token is stored or the refresh server rejects it. Resolves #13364. * fix: Use distinct flow type for silent token refresh to avoid cache hit Addresses the Codex review on PR #13369: `attemptSilentTokenRefresh` was reusing the `'mcp_get_tokens'` flow type, so `FlowStateManager.createFlowWithHandler` would short-circuit and return the same tokens cached by an earlier `getOAuthTokens` call — the very tokens the server just rejected — without executing the forced-refresh handler. Switch silent refresh to the distinct `'mcp_force_refresh_tokens'` flow type so coalescing still works but stale `mcp_get_tokens` cache entries are not reused. After a successful refresh, invalidate the `mcp_get_tokens` flow cache so the next `getOAuthTokens` call reads the freshly persisted tokens from storage rather than the stale cached value. Add a regression test that simulates the real `FlowStateManager.createFlowWithHandler` cache-hit behavior for `mcp_get_tokens` and verifies the silent refresh handler still runs and returns the freshly refreshed tokens. * fix: Address Codex round-2 review on silent MCP OAuth refresh Three follow-up findings from Codex on PR #13369: 1. The new `mcp_force_refresh_tokens` flow type was itself cached by `FlowStateManager.createFlowWithHandler`, so a subsequent 401 within the refreshed token's `expires_at` could re-serve the just-rejected token without ever re-running the refresh handler. 2. 
The factory's `oauthRequired` listener was removed immediately after the initial `attemptToConnect` succeeded, so a real mid-session 401 emitted by `MCPConnection.connectClient` during transport recovery had no listener — the OAuth handled-promise would simply time out instead of triggering the silent refresh. 3. Routing the silent refresh through a distinct flow type broke coalescing with the `mcp_get_tokens` lock used by `getOAuthTokens`, letting two paths concurrently redeem the same stored refresh token. For providers that rotate refresh tokens (e.g. Azure Entra) the second redemption is rejected, kicking the user back into interactive OAuth despite a successful refresh elsewhere. Resolution: - Drop `FlowStateManager` from the silent-refresh path entirely. Replace with a process-local `inflightSilentRefreshes` Map keyed by `userId:serverName` that holds only the in-flight Promise (no cached result), so every fresh 401 after settlement triggers a fresh redemption while concurrent 401s for the same user/server still share one redemption. - Stop calling `cleanupOAuthHandlers()` on successful initial connect, keeping the OAuth handler attached for the connection's lifetime so mid-session 401s actually reach `attemptSilentTokenRefresh`. - Add a regression test reproducing the stale-cache scenario by faking the `mcp_get_tokens` cache hit and asserting silent refresh still runs against storage and returns the fresh tokens. - Add a coalescing test asserting two concurrent oauthRequired events for the same user/server result in a single `forceRefreshTokens` call. - Clear `inflightSilentRefreshes` in `beforeEach` to prevent cross-test leakage; switch the silent-refresh test mocks to `mockResolvedValueOnce` / `mockImplementationOnce` so leftover mock state cannot leak into later test cases. 
Acknowledged remaining gap: the silent refresh still races `getOAuthTokens`'s `mcp_get_tokens` flow when both run concurrently (narrow window when an existing connection's local `expires_at` is still valid but the server invalidated the token, and a new connection is being created in parallel). The race is self-healing on the next 401 and documented inline. * fix: Address Codex round-3 review on silent MCP OAuth refresh Three more findings from Codex on PR #13369: 1. The in-flight silent-refresh promise was unbounded. If `forceRefreshTokens()` ever hung (slow provider, dropped TCP), the `inflightSilentRefreshes` lock stayed occupied forever and every later 401 for the same user/server joined the stuck promise instead of starting a fresh attempt or falling back to interactive OAuth. 2. The interactive-OAuth fallback didn't invalidate the `mcp_get_tokens` flow cache after persisting fresh tokens. For providers that don't issue refresh tokens (so silent refresh returns null), the old cache could still feed stale access tokens to the next `getOAuthTokens` call until its TTL expired — causing an immediate reconnect with the same just-rejected token. 3. When silent refresh failed, the handler fell through to `handleOAuthRequired()` whose recent-completion fast path can reuse a COMPLETED `mcp_oauth` flow within `PENDING_STALE_MS`. Those cached tokens are exactly the ones the server just rejected, so the connection would keep adopting them and looping on 401s until the cache aged out. Resolution: - Wrap `runSilentRefresh()` with a 60-second `withTimeout` (well under `connectClient`'s 120s OAuth timeout). On timeout the `.catch` resolves to null and the `finally` clears the in-flight entry, so the next 401 starts fresh and falls through to interactive OAuth. 
- Extract two helpers — `invalidateGetTokensFlow` and `invalidateCompletedOAuthFlow` — and call them from the right branches: clear `mcp_get_tokens` after silent-refresh success AND after interactive-OAuth `storeTokens`; clear the COMPLETED `mcp_oauth` state (plus its CSRF mapping) before falling through to interactive OAuth so the fast-reuse path can't re-serve the rejected tokens. - Add three regression tests: hung refresh release-the-lock under fake timers, completed-OAuth cache invalidation pre-fallback, and `mcp_get_tokens` invalidation after interactive token store. * fix: Address Codex round-4 review on silent MCP OAuth refresh Three more findings from Codex on PR #13369: 1. (P1) The silent-refresh in-flight lock keyed only by `userId:serverName`. In multi-tenant setups where two tenants share a userId (e.g. username-based IDs) and the same MCP server name, a concurrent mid-session 401 from tenant B would join tenant A's in-flight refresh and adopt tenant A's freshly minted tokens onto a tenant-B connection — a cross-tenant credential leak. 2. (P2) `invalidateGetTokensFlow` deleted the `mcp_get_tokens` flow state regardless of its status. When another connection was currently in `getOAuthTokens()` (PENDING flow) and joiners were monitoring it, the unconditional delete made those waiters see "Flow state not found" and unnecessarily fall back to interactive OAuth — even though fresh tokens were already being written. 3. (P2) The 60s `withTimeout` wrapping `runSilentRefresh()` only races the promise; it does not cancel the underlying `forceRefreshTokens` / refresh-token HTTP request. If the request returned after a subsequent interactive OAuth had stored newer tokens, the late completion would `storeTokens` over the newer state. This requires a provider that doesn't rotate refresh tokens AND a refresh slower than 60s AND a successful interactive OAuth in that window — narrow but real. 
Resolution: - Capture `getTenantId()` into a new `factory.tenantId` field at factory construction time (before the OAuth handler closes over it outside the original request's async context) and include it in the silent-refresh lock key as `tenantId:userId:serverName`. - `invalidateGetTokensFlow` now calls `getFlowState` first and only deletes when `status === 'COMPLETED'`. PENDING lookups are left alone so concurrent `getOAuthTokens` waiters via `monitorFlow` can still settle. - For (3), document the race as a known limitation inline. Fully closing it requires threading an `AbortSignal` through `MCPTokenStorage.forceRefreshTokens` and the OAuth refresh handler to skip the late `storeTokens` after timeout — out of scope for this PR's surgical change. - Add `getTenantId` to the `MCPOAuthConnectionEvents` test's `@librechat/data-schemas` mock so the factory constructor doesn't blow up under that suite. - Add three regression tests: per-tenant lock isolation, PENDING-state preservation under `invalidateGetTokensFlow`, and (reused) the existing interactive-store invalidation test now driven through `getFlowState` returning the COMPLETED state. * fix: Address silent MCP OAuth refresh review Restore captured tenant context around token storage and OAuth fallback paths so mid-session callbacks do not lose tenant scope. Thread AbortSignal through forced refresh and OAuth token requests, cap silent refresh by the connection OAuth timeout, and prevent timed-out refreshes from writing stale credentials after fallback. Complete pending mcp_get_tokens flows with fresh tokens, add missing FlowState createdAt test fixtures, and cover the new tenant/abort/cache behaviors. * fix: Tighten tenant-scoped MCP token refresh Cap silent refresh by both the factory connect timeout and the connection OAuth wait timeout so fallback OAuth wins before the outer connect attempt expires. 
Tenant-scope mcp_get_tokens flow ids for both token lookup and refresh invalidation, preventing cross-tenant flow completion or cache deletion when tenants share user ids and server names. Add regression tests for the omitted initTimeout budget and tenant-prefixed token flow locks. * fix: Reserve MCP OAuth fallback budget * fix: Harden MCP OAuth refresh races * fix: Keep MCP OAuth fallback route-compatible * test: Add SDK MCP OAuth refresh repro * fix: Address MCP OAuth refresh review findings * fix: Address MCP OAuth tenant review findings * fix: Close MCP OAuth route tenant gaps * fix: Preserve MCP OAuth refresh flow guards * fix: Avoid reprocessing MCP OAuth reauth config * fix: Release timed-out MCP refresh locks * fix: Release MCP OAuth request callbacks * fix: Tenant-scope remaining MCP OAuth flow lookups * ci: Sort imports in MCP OAuth test suites	2026-06-10 13:12:42 -04:00
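The commit above describes two guards: a silent-refresh lock keyed as `tenantId:userId:serverName`, and an `invalidateGetTokensFlow` that deletes cached flow state only when its status is COMPLETED. A minimal sketch of both ideas follows. The `FlowState` shape, the in-memory `Map`, the `no-tenant` placeholder, and the function names `silentRefreshKey` and `invalidateCompletedFlow` are assumptions for illustration, not LibreChat's actual code.

```typescript
// Hypothetical shapes for illustration only; not LibreChat's real types.
type FlowStatus = 'PENDING' | 'COMPLETED' | 'FAILED';

interface FlowState {
  status: FlowStatus;
  createdAt: number;
}

/**
 * Builds the in-flight lock key. Including the tenant means two tenants that
 * share a userId and server name never join each other's refresh.
 */
function silentRefreshKey(
  tenantId: string | undefined,
  userId: string,
  serverName: string,
): string {
  // Assumption: a fixed placeholder stands in when no tenant is in scope.
  const tenant = tenantId === undefined ? 'no-tenant' : tenantId;
  return `${tenant}:${userId}:${serverName}`;
}

/**
 * Deletes a cached token flow only when it is COMPLETED. PENDING flows are
 * left alone so concurrent waiters can still settle. Returns true on delete.
 */
function invalidateCompletedFlow(
  flows: Map<string, FlowState>,
  flowId: string,
): boolean {
  const state = flows.get(flowId);
  if (state === undefined || state.status !== 'COMPLETED') {
    return false;
  }
  flows.delete(flowId);
  return true;
}
```

The status check is the point of the second helper: an unconditional delete would make waiters on a PENDING flow see a missing flow state, which is the failure the commit message describes.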
Danny Avila	a7f16911b2	⏳ fix: Extend and Decouple MCP OAuth Flow Timeouts (#13622 ) * ⏳ fix: Extend and decouple MCP OAuth flow timeouts The OAuth auth button disappeared after 2 minutes (the internal OAuth handling timeout) while the flow state lived for 3 minutes, leaving users who didn't click immediately stuck in an unrecoverable re-auth loop. The handling timeouts also reused the connection/init timeout, so a short initTimeout would shrink the OAuth window further. - Add MCP_OAUTH_HANDLING_TIMEOUT (10m) and MCP_OAUTH_FLOW_TTL (15m) to mcpConfig - Decouple the reactive/proactive OAuth waits from initTimeout/connectionTimeout - Use OAUTH_FLOW_TTL for the FlowStateManager TTL and the UI status window - Ensure the flow TTL outlives the handling timeout, fixing the "Flow state not found" race - Remove dead FLOW_TTL constant and document new env vars Fixes #13615 * ⏳ fix: Coordinate OAuth pending window with handling timeout Address Codex review: the extended OAuth wait was still capped by other timeouts that were not updated. - Align PENDING_STALE_MS (button validity + pending-flow reuse window) with MCP_OAUTH_HANDLING_TIMEOUT so a flow stays reusable for the full wait instead of 2 minutes (Finding 3) - Clamp MCP_OAUTH_FLOW_TTL to never fall below the handling timeout so a callback near the deadline still finds its flow state (Finding 2) - Floor attemptToConnect's timeout to the handling window for OAuth servers so the reactive in-connect OAuth wait is not killed by the 30s connection timeout (Finding 1) - Update flow staleness tests to reference the threshold symbolically * ⏳ fix: Align OAuth window across status, action flows, and client polling Address Codex round 2: extending the server wait exposed three more windows that were still capped or now over-extended. 
- checkOAuthFlowStatus reports a PENDING flow as active only within the usable PENDING_STALE_MS window, not the longer Keyv retention TTL, so the connect button reappears instead of a stuck 'connecting' state - Give Action (custom tool) OAuth its own FlowStateManager on the prior 3-minute TTL so the longer MCP OAuth TTL can't leave an action tool call waiting up to 15 minutes - Extend the MCP server-card client polling to the 10-minute handling window so a user who completes OAuth after 3 minutes is still picked up * 🧪 test: Make stale-flow CSRF test track PENDING_STALE_MS The CSRF-fallback stale-flow test hardcoded a 3-minute age, which is now within the 10-minute PENDING_STALE_MS window and was wrongly treated as active. Derive the age from PENDING_STALE_MS so it tracks the constant. * ⏳ fix: Add grace buffers and surface OAuth timeout to the client Address Codex round 3 (near-deadline edges): - Clamp MCP_OAUTH_FLOW_TTL to handling timeout + 60s grace (not equality), so flow state outlives the wait instead of expiring at the same instant - Extend attemptToConnect's OAuth floor by a 60s grace so a user who authorizes near the deadline still gets the post-OAuth reconnect - Surface OAUTH_HANDLING_TIMEOUT on the connection-status response and have the client poll for the configured window instead of a hardcoded 10 minutes, so a tuned server deadline isn't capped on the client * ⏳ fix: Refresh client OAuth timeout from the first status refetch If the connection-status cache is empty when polling starts, the client captured the 10-minute fallback and never picked up a tuned oauthTimeout. Re-read it after each refetch so a longer configured deadline is honored even on a cold cache. * 📝 refactor: Type oauthTimeout on MCPConnectionStatusResponse Declare the oauthTimeout field on the shared response type in data-provider instead of an ad-hoc inline cast in the client hook, and replace the pre-existing 'as any' on the status query read with the typed getQueryData. 
Type-level only; no runtime change.	2026-06-09 17:50:02 -04:00
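The timeout relationships in the commit above (a 10-minute handling timeout, a 15-minute flow TTL clamped to at least the handling timeout plus a 60-second grace, and a connect-timeout floor for OAuth servers) can be sketched as plain arithmetic. The function and field names here are hypothetical; only the default durations and the 60-second grace come from the commit message.

```typescript
const MINUTE_MS = 60 * 1000;
const GRACE_MS = 60 * 1000; // 60s grace, as described in the commit message

interface OAuthWindows {
  handlingTimeoutMs: number;
  flowTtlMs: number;
}

/**
 * Clamps the flow TTL so the flow state always outlives the handling wait.
 * Defaults mirror the documented 10m handling timeout and 15m flow TTL.
 */
function resolveOAuthWindows(
  handlingTimeoutMs: number = 10 * MINUTE_MS,
  requestedFlowTtlMs: number = 15 * MINUTE_MS,
): OAuthWindows {
  const minimumTtl = handlingTimeoutMs + GRACE_MS;
  return {
    handlingTimeoutMs,
    flowTtlMs: Math.max(requestedFlowTtlMs, minimumTtl),
  };
}

/**
 * For OAuth servers, floors the connect timeout to the handling window plus
 * grace, so a short connection timeout cannot cut the OAuth wait short.
 */
function connectTimeoutFor(
  connectionTimeoutMs: number,
  handlingTimeoutMs: number,
  isOAuthServer: boolean,
): number {
  if (!isOAuthServer) {
    return connectionTimeoutMs;
  }
  return Math.max(connectionTimeoutMs, handlingTimeoutMs + GRACE_MS);
}
```

For example, under these assumptions a misconfigured 5-minute TTL with the default handling timeout is raised to 11 minutes, while the default 15-minute TTL is left unchanged.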
Danny Avila	4b871a11ad	🧼 fix: Prevent Shared Link Caching and Strengthen Log Redaction (#13561 ) * fix: tighten share caching and log redaction * fix: sort changed imports * fix: redact splat log arguments * fix: avoid mutating log metadata during redaction * fix: redact error and api_key log values * fix: preserve error log context during redaction * fix: cover remaining log redaction paths * fix: bound log redaction work * fix: align redaction scan cap with log config	2026-06-06 18:40:57 -04:00
Danny Avila	da5876331e	🔐 fix: Reuse MCP OAuth Authorization URL (#13532 ) * fix: reuse MCP OAuth authorization URL * fix: validate MCP OAuth initiate flow ID	2026-06-05 17:18:59 -04:00
Danny Avila	6357ea10c1	🧭 feat: Scope Model Spec Skills (#13522 ) * feat: scope model spec skills * style: format skill catalog limit * fix: serialize model spec skill resolution * test: satisfy model spec load config typing * fix: apply model spec skills to added conversations * fix: support alwaysApply frontmatter alias * fix: address model spec skills review	2026-06-05 10:22:02 -04:00
Atef Bellaaj	86fe79c37d	🔗 feat: Add Granular Access Control to Shared Links via ACL System (#13051 ) * feat: Add granular access control to shared links via ACL system * fix(shared-links): preserve isPublic on failed migration grants Transient ACL failures during auto-migration permanently stranded links — $unset ran unconditionally, removing the legacy flag that triggers retry. Now only $unset isPublic after all grants succeed. * fix(config): skip isPublic unset for failed ACL grants Bulk migration unconditionally removed isPublic from all links, even those whose ACL writes failed. Failed links then lost the legacy marker needed for auto-migration retry. Now tracks failed link IDs per-batch and excludes them from the $unset step. Also adds sharedLink to AccessRole resourceType schema enum — was missing, only worked because seedDefaultRoles uses findOneAndUpdate which bypasses validation. * ci(config): add jest config and PR workflow for migration tests config/__tests__/ specs depend on api/jest.config.js module mappings but had no dedicated runner. Adds config/jest.config.js extending api config with absolutized paths, npm test:config script, and a GitHub Actions workflow triggered by changes to config/, api/models/, api/db/, or packages/ ACL code. * fix(permissions): honor boolean sharedLinks config SHARED_LINKS has no USE permission, so boolean config produced an empty update payload — gate conditions only matched object form, making `sharedLinks: false` a no-op on existing perms. * fix(share): resolve role before creating shared link Role lookup between create and grant left an orphaned link without ACL entries if getRoleByName threw — retry then hit "Share already exists" with no recovery path. 
* fix: Restore Public ACL Access Checks * fix: Type Public ACL Lookup * fix: Preserve Private Legacy Shared Links * chore: Promote Shared Link Permission Migration * fix: Address Shared Link Review Findings * fix: Repair Shared Link CI Follow-Up * fix: Narrow Shared Link Mongoose Test Mock * fix: Address Shared Link Review Follow-Ups * fix: Close Shared Link Review Gaps * fix: Guard Missing Shared Link Permission Backfill * test: Add Shared Link Mock E2E * test: Stabilize Shared Link Mock E2E --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-06-03 14:17:17 -04:00
Ravi Kumar L	f27e7d7cad	🛂 fix: Gate RUM Proxy Route on the RUM_ENABLED Flag (#13475 )	2026-06-02 14:13:10 -04:00
jcbartle	268f095c1a	🔒 feat: Add On-Behalf-Of (OBO) token exchange support for MCP Servers (#13429 ) * Add OBO (On-Behalf-Of) token exchange support for MCP server connections Enables transparent authentication to Entra ID-backed MCP servers using the logged-in user's federated token via the OAuth 2.0 jwt-bearer grant. Configured via obo.scopes in librechat.yaml server config. 
- Extract generic OboTokenService from GraphTokenService (jwt-bearer grant + cache) - Refactor GraphTokenService to thin wrapper delegating to OboTokenService - Add obo schema field to BaseOptionsSchema in data-provider - Add resolveOboToken in packages/api/src/mcp/oauth/obo.ts (validates federated token, calls resolver, returns MCPOAuthTokens) - Wire oboTokenResolver through MCPConnectionFactory, MCPManager, UserConnectionManager - OBO tokens injected via request headers (not OAuth transport), refreshed on each tool call - Explicit error on OBO failure (no fallthrough to standard OAuth redirect) - Add unit tests for both resolveOboToken (9 tests) and exchangeOboToken (14 tests) * Add OBO authentication option to MCP server UI configuration Enable users to configure On-Behalf-Of (OBO) token exchange for MCP servers created via the UI (MongoDB-stored), in addition to the existing YAML-based configuration. - Add "On-Behalf-Of (OBO)" radio option to MCP server auth section with scopes input field - Remove obo from omitServerManagedFields so the field passes UI schema validation - Add OBO to AuthTypeEnum, obo_scopes to AuthConfig, and OBO handling in form defaults and submission - Add .min(1) validation on obo.scopes to reject empty strings - Add English localization keys: com_ui_obo, com_ui_obo_scopes, com_ui_obo_scopes_description - Add 5 schema validation tests for OBO field acceptance, transport compatibility, and edge cases * 🧊 fix: Add obo to safe properties in redactServerSecrets. 
Fixes the OBO configuration not showing up in the MCP UI after app restart * Address linter errors * 🧊 fix: fail closed on OBO refresh errors and retry transient token exchange failures - stop tool calls from falling back to stale Authorization headers when per-call OBO refresh fails - add one-time retry for transient Entra OBO exchange failures (network/429/5xx) - preserve structured OBO failure reasons and retryability in resolveOboToken - improve OBO auth error messaging for connection setup and tool execution - add tests for transient vs permanent OBO failure paths * Addressing linting errors / warnings * 🧊 fix: isolate OBO MCP auth to user-scoped connections - block OBO-enabled servers from app-level shared MCP connections - bypass shared connection lookup for OBO servers in MCPManager.getConnection - add regressions covering OBO connection scoping and preserve non-OBO app connection reuse * 🛠️ refactor: centralize MCP user-scoped connection policy - add shared requiresUserScopedConnection helper for OAuth, OBO, and customUserVars - use the shared predicate in MCPManager and ConnectionsRepository - add utils coverage for user-scoped connection policy * 🧊 fix: restrict MCP OBO config to header-capable transports - Move OBO configuration out of the shared MCP base options schema and allow it only on SSE and streamable-http transports, where request headers are applied. - Explicitly reject OBO on stdio and websocket configs to avoid accepted-but- nonfunctional server definitions. Add schema coverage for admin/config parsing and user-input websocket validation. * 🧊 fix: single-flight concurrent OBO token exchanges Concurrent tool calls that arrive on a cache miss were each issuing their own jwt-bearer request to the IdP. Under that fan-out, Entra intermittently returned errors that the retry classifier saw as non-retryable, surfacing as: "The identity provider rejected the OBO token exchange. Cannot execute tool <name>. 
Re-authenticate the user or verify the configured OBO scopes and retry." A user retry then hit the populated cache and succeeded, which matches the observed flakiness — the cache was empty at the moment of fan-out but populated by the time the user clicked retry. - Coalesce concurrent exchanges in `OboTokenService.exchangeOboToken` keyed by `${openidId}:${scopes}`. Callers that arrive while an exchange is in flight share the same upstream request and receive the same result. `fromCache=false` continues to force a fresh, independent exchange (and is not joined by `fromCache=true` callers). The IdP call, single-retry path, and cache write are unchanged — they were moved into a `performOboExchange` helper so the coalescing wrapper stays small. - Tests cover: coalescing on the same key, isolation between different keys, cleanup on success, cleanup on failure, and the `fromCache=false` bypass. * 🔒 feat: gate MCP OBO config behind MCP_SERVERS.CONFIGURE_OBO permission OBO silently mints per-user delegated tokens from the caller's federated access token and forwards them to whatever URL the server config points at. Previously, anyone with MCP_SERVERS.CREATE could configure obo.scopes — so if server creation is ever delegated beyond admins, a user could stand up an attacker-controlled server, attach it to a shared agent, and exfiltrate other users' downstream tokens on tool invocation. Add a dedicated MCP_SERVERS.CONFIGURE_OBO permission (ADMIN: true, USER: false by default) and enforce it at three layers so the safety property no longer depends on CREATE staying admin-only: - Create/update: POST/PATCH /api/mcp/servers returns 403 when the body carries `obo` and the caller's role lacks the permission. - Runtime fail-closed: for DB-sourced configs, MCPConnectionFactory and MCPManager.callTool re-check the original author's role before each OBO exchange. 
If the author has been downgraded, the exchange is skipped (factory) or refused (callTool) — retained configs lose their privileges automatically. - UI: the OBO option is hidden in the MCP server dialog for users without the permission; a CONFIGURE_OBO toggle is exposed in the MCP admin role editor. Existing role docs receive the new sub-key via the permission backfill in updateInterfacePermissions on next startup, preserving any operator-set values. YAML/Config-sourced server configs are unaffected since they're admin-controlled at the deployment level. * 🧊 fix: wire OBO machinery for servers with requiresOAuth: false The discovery and user-connection paths gated OAuth wiring (flow manager, token methods, oboTokenResolver, oboTrustChecker) behind isOAuthServer(), which only considers requiresOAuth/oauth fields. A DB-stored OBO server with requiresOAuth: false therefore landed in the non-OAuth branch, never received an oboTokenResolver, and the factory's usesObo getter evaluated to false — sending a bare request that the upstream rejected with invalid_token. Add requiresOAuthMachinery() (OAuth OR OBO) and use it at those two gates. isOAuthServer remains for the OAuth-handshake-only check (shouldInitiateOAuthBeforeConnect), where OBO must not initiate a handshake. Plumb the OBO resolver/trust-checker through ToolDiscoveryOptions so reinitMCPServer can pass them on the discovery path. * 🧊 fix: lock all OBO-target fields (URL, proxy, headers, auth) without CONFIGURE_OBO The CONFIGURE_OBO permission was meant to gate control of the endpoint that receives OBO-minted per-user delegated tokens and the scopes that are requested. The previous frontend lock + backend gate only covered obo.scopes and the auth section, leaving url/proxy/headers/etc. editable by anyone with UPDATE — meaning a non-permission user could still redirect an existing OBO server's token flow to an attacker endpoint. 
Switch to an allowlist policy: when editing an OBO server without CONFIGURE_OBO, only title/description/iconPath are mutable. Backend rejects any other field change with 403; frontend disables the non-allowlist sections (URL, transport, auth, trust) via fieldset. The comparison surface (MCP_USER_INPUT_FIELDS) is derived from MCPServerUserInputSchema's union members so it stays in sync with the schema. New schema fields land in the locked set by default — adding to the allowlist is the only way to unlock them, which preserves the security-review boundary. * 🧊 fix: skip unauthenticated MCP inspection for OBO-only servers MCPServerInspector.inspectServer() ran an unauthenticated temp connection unless the config had requiresOAuth or customUserVars set. For OBO-only servers without standard MCP OAuth advertisement, this caused MCPConnectionFactory.create to attempt the connection without a user or oboTokenResolver — failing on servers that reject the MCP initialize handshake without a valid bearer token, which surfaced as MCP_INSPECTION_FAILED on create/update. Add `obo` to the skip list alongside requiresOAuth and customUserVars, matching the existing pattern for user-scoped auth modes. * Addressed linting error: watchedTitle is declared but never referenced (the auto-fill logic at line 156 uses getValues('title') instead). Deleted constant.	2026-06-01 22:36:18 -04:00
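The single-flight behavior described in the commit above, where concurrent OBO exchanges keyed by `${openidId}:${scopes}` share one upstream request, can be sketched roughly as follows. This is an illustration under assumptions, not the actual `OboTokenService`: the `OboToken` shape, the module-level `inflight` map, and the `exchangeOnce` name are hypothetical, and the `fromCache=false` bypass, retry path, and cache write are omitted.

```typescript
// Hypothetical token shape for illustration.
interface OboToken {
  accessToken: string;
  expiresAt: number;
}

// In-flight exchanges, keyed by `${openidId}:${scopes}`.
const inflight = new Map<string, Promise<OboToken>>();

/**
 * Coalesces concurrent exchanges for the same user and scopes. Callers that
 * arrive while an exchange is in flight receive the same promise.
 */
function exchangeOnce(
  openidId: string,
  scopes: string,
  performExchange: () => Promise<OboToken>,
): Promise<OboToken> {
  const key = `${openidId}:${scopes}`;
  const existing = inflight.get(key);
  if (existing !== undefined) {
    return existing;
  }

  const pending = performExchange();
  inflight.set(key, pending);

  // Clear the entry on success and on failure, but only if it is still ours.
  const clear = (): void => {
    if (inflight.get(key) === pending) {
      inflight.delete(key);
    }
  };
  pending.then(clear, clear);

  return pending;
}
```

Because the entry is removed once the exchange settles, a failed exchange does not poison later calls; the next caller starts a fresh request.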
Ravi Kumar L	a86e504a57	📡 feat: Add Authenticated Proxy Mode for Browser RUM Telemetry (#13464 )	2026-06-01 21:11:35 -04:00
Danny Avila	100871c3ec	🛂 fix: Enforce MCP Permissions for Agent Tools (#13174 ) * fix: Enforce MCP Permissions for Agent Tools * fix: Measure MCP Image Limit by Decoded Size * fix: gate cached MCP tools and tighten remote image URL detection Addresses Codex review findings on the MCP permissions PR: - filterAuthorizedTools previously fast-accepted any tool present in the global tool cache before reaching the MCP-use permission gate. App-level MCP tools (keyed `name_mcp_server` by MCPServerInspector and merged into the cache via mergeAppTools) therefore bypassed the canUseMCP check, letting a user without MCP_SERVERS.USE persist/bind them. Route all MCP-delimited tools through the permission + server-access gate regardless of cache presence. - assertImageDataWithinLimit / image formatter used startsWith("http") to skip the size cap, which also matched base64 payloads that happen to begin with those chars. Require http:// or https:// via a shared isRemoteImageUrl helper so oversized inline base64 can no longer bypass MCP_IMAGE_DATA_MAX_BYTES. Adds regression tests for both paths. * fix: address Codex round-2 findings on MCP permissions PR - parsers.ts: parseAsString dropped the image payload for unrecognized providers, returning only `Image result: <mimeType>`. Pre-PR these items survived via JSON.stringify(item). Keep the size guard but fall through to JSON.stringify so the data/URL is preserved. - MCP.js: the runtime MCP-use check only read `configurable.user`, so paths that propagate `user_id` only (e.g. the OpenAI-compatible API in agents/openai/service.ts) rejected every MCP tool call for an authenticated user. 
Add resolveMCPPermissionUser: use the safe user directly when it already carries a role (no extra DB call), otherwise fall back to loading the role by user_id. Update fail-closed tests to the resolved behavior. - v1.js: the update path only re-filtered newly added MCP tools, so a user who lost MCP_SERVERS.USE kept existing MCP bindings on edit while create/duplicate/revert stripped them. Strip all MCP tools on update when the permission is revoked; keep the narrower new-tool gating (and disconnect/registry preservation) when it is intact. Updates and adds regression tests for all three paths. * fix: populate safe user at producer instead of resolving in runtime MCP check Corrects the Finding B approach from the previous commit. Rather than loading the user by id inside the runtime MCP permission check, populate `configurable.user` (and createRun's `user`) with the full safe user at the producer, matching the in-repo agent controllers (responses.js / openai.js) which already pass `createSafeUser(req.user)`. - service.ts: derive `safeUser` via createSafeUser(req.user) and pass it to both createRun and processStream's configurable, so the role-bearing identity reaches the runtime `userCanUseMCPServers(configurable.user)` check. Falls back to a bare id when the host app attached no user, which correctly leaves MCP gated (fail closed). - MCP.js: revert the resolveMCPPermissionUser DB-load fallback; the runtime check again reads configurable.user directly and fails closed when absent (defense in depth). - MCP.spec.js: revert to the matching runtime test expectations. * test: cover safe-user propagation in createAgentChatCompletion Adds a focused spec for the OpenAI-compatible chat completion service (the producer fixed for Codex Finding B). 
Injects mocked deps and asserts that createRun and processStream's configurable.user carry the role from req.user (with sensitive fields stripped by createSafeUser), and that an unauthenticated request falls back to a bare { id: 'api-user' } so the runtime MCP check fails closed. * fix: address Codex round-3 findings + TS6133 - MCP.js (P1): the assistants required-action path invokes tool._call( toolInput) with no LangChain config, so the runtime check saw no configurable.user and rejected authorized users. createToolInstance now captures the creation-time user (req.user via createMCPTool) and _call falls back to it for both the permission check and userId. Still fails closed when neither config nor captured user carries a role. - v1.js (P2): the update-path isMCPTool used a bare mcp_delimiter substring check, misclassifying action tools whose operationId contains "_mcp_" (e.g. sync_mcp_state_action_...) as MCP and dropping them on a permission-revoked edit. Delegate to the canonical isActionTool so only real MCP tools are gated. Regression test added. - service.ts: drop the now-unused IUser import (TS6133); derive reqUser's type from createSafeUser's own parameter instead. * fix: resolve TS7022 self-reference in service.spec mock res The mock response object referenced `res` inside its own `status`/`json` initializers without a type annotation, so tsc inferred `res` as `any` (TS7022). Annotate the object and assign the self-referencing chainable methods after declaration. * fix: correct round-4 findings (isActionTool import, captured user, partial-update) - v1.js: import isActionTool from librechat-data-provider (its real export; @librechat/api does not export it, so the prior import was undefined and threw TypeError). Exclude action tools from MCP classification in both the main filterAuthorizedTools loop and the update path, so action tools whose operationId contains _mcp_ (e.g. sync_mcp_state_action_...) are preserved regardless of MCP permission. 
- v1.js: evaluate the effective tool set (updateData.tools ?? existingAgent.tools) so a tools-less PATCH by a user who lost MCP_SERVERS.USE still strips stale MCP bindings, matching create/duplicate/revert. - MCP.js: createToolInstance now receives the construction-time user and _call falls back to it (permissionUser) when configurable.user is absent, fixing the assistants required-action path that invokes _call without a config and resolving the capturedUser no-undef/ReferenceError. - Tests: action-tool preservation (authorized + denied), tools-less revocation PATCH, updated revocation test to expect all MCP tools stripped. Affected specs pass locally: MCP 49/49, filterAuthorizedTools 49/49. * fix: guard isActionTool against non-string tools; correct actionDelimiter import Two test regressions from the prior commit: - The main filterAuthorizedTools loop called isActionTool(tool) directly, but isActionTool does toolName.indexOf(...) and throws on null/undefined. Compute isActionToolName = typeof tool === 'string' && isActionTool(tool) once and reuse it, restoring graceful null/undefined handling. - The action-tool test referenced Constants.actionDelimiter (undefined); actionDelimiter is a standalone librechat-data-provider export. Import and use it directly. filterAuthorizedTools 36/36 and MCP 40/40 pass locally. * fix: address MCP permission review follow-ups * fix: preserve shared agent MCP tools	2026-05-30 16:19:49 -04:00
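Two of the fixes in the commit above are easy to show in isolation: requiring an explicit `http://` or `https://` scheme before treating image data as a remote URL, and measuring inline base64 by its decoded size. The sketch below illustrates the idea and is not the project's implementation. It reuses the `isRemoteImageUrl` name from the commit message, but the body, the `decodedBase64Bytes` and `isWithinImageLimit` helpers, and the assumption that inline data is raw base64 without a `data:` prefix are all hypothetical.

```typescript
/**
 * True only for values that start with an explicit http:// or https://
 * scheme. A bare startsWith("http") would also match base64 payloads that
 * happen to begin with those characters.
 */
function isRemoteImageUrl(value: string): boolean {
  return /^https?:\/\//i.test(value);
}

/**
 * Computes the decoded byte length of a base64 string without decoding it.
 * Assumption: the input is raw base64 (no "data:" URI prefix).
 */
function decodedBase64Bytes(data: string): number {
  const body = data.replace(/\s+/g, '');
  let padding = 0;
  if (body.endsWith('==')) {
    padding = 2;
  } else if (body.endsWith('=')) {
    padding = 1;
  }
  return Math.floor((body.length * 3) / 4) - padding;
}

/** Remote URLs skip the cap; inline base64 is checked by decoded size. */
function isWithinImageLimit(data: string, maxBytes: number): boolean {
  if (isRemoteImageUrl(data)) {
    return true;
  }
  return decodedBase64Bytes(data) <= maxBytes;
}
```

With this check, a base64 payload beginning with the characters `http` is sized like any other inline image instead of being waved through as a URL.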
ChrisJr404	6db059b8a9	🔒 fix: Strip post-login fields from unauthenticated /api/config response (#13102 ) * 🔒 fix: Strip post-login fields from unauthenticated /api/config response Follow-up to #12490 reported in #12688. The unauthenticated /api/config response still included fields that are only consumed after login (helpAndFaqURL, sharedLinksEnabled, publicSharedLinksEnabled, showBirthdayIcon, analyticsGtmId, openidReuseTokens, allowAccountDeletion, customFooter, cloudFront). None of these are read by the auth pages (Login, Registration, RequestPasswordReset, ResetPassword, VerifyEmail, TwoFactorScreen, AuthLayout, Footer, SocialLoginRender). Split buildSharedPayload into two helpers: - buildPreLoginPayload returns only the fields the unauthenticated auth pages need (appTitle, server domain, social-login flags, OpenID/SAML labels and image URLs, registration/email/password-reset flags, minPasswordLength, ldap). - buildPostLoginPayload returns the post-login informational fields and is merged into the response only when req.user is present. Also move buildCloudFrontStartupConfig into the authenticated branch: useAppStartup is the only consumer and it runs after login. Tests updated: existing CloudFront and allowAccountDeletion assertions move to the authenticated context, and two new assertions cover the stripped fields (one for the post-login informational fields, one for cloudFront) in the unauthenticated context. Signed-off-by: ChrisJr404 <chris@hacknow.com> * fix: Request share-context startup config * fix: Pass share startup config into footer --------- Signed-off-by: ChrisJr404 <chris@hacknow.com> Co-authored-by: Danny Avila <danny@librechat.ai>	2026-05-30 09:51:21 -07:00
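The pre-login/post-login split described in the commit above can be sketched as two small builders, with the post-login fields merged only when a user is present on the request. The `buildPreLoginPayload` and `buildPostLoginPayload` names come from the commit message; the `Settings` and `RequestLike` shapes, the `buildConfigResponse` wrapper, and the reduced field set are assumptions for illustration.

```typescript
// Hypothetical, reduced settings shape; the real config carries more fields.
interface Settings {
  appTitle: string;
  minPasswordLength: number;
  helpAndFaqURL: string;
  sharedLinksEnabled: boolean;
  customFooter?: string;
}

interface RequestLike {
  user?: { id: string };
}

/** Fields the unauthenticated auth pages need. */
function buildPreLoginPayload(settings: Settings) {
  return {
    appTitle: settings.appTitle,
    minPasswordLength: settings.minPasswordLength,
  };
}

/** Informational fields that are only consumed after login. */
function buildPostLoginPayload(settings: Settings) {
  return {
    helpAndFaqURL: settings.helpAndFaqURL,
    sharedLinksEnabled: settings.sharedLinksEnabled,
    customFooter: settings.customFooter,
  };
}

/** Merges the post-login fields only when req.user is present. */
function buildConfigResponse(req: RequestLike, settings: Settings): Record<string, unknown> {
  const preLogin = buildPreLoginPayload(settings);
  if (req.user === undefined) {
    return preLogin;
  }
  return { ...preLogin, ...buildPostLoginPayload(settings) };
}
```

Keeping the two builders separate makes the unauthenticated surface an explicit list, so a newly added post-login field is not exposed to anonymous callers by default.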
Danny Avila	444d923e29	✂️ chore: Strip Session JWT Forwarding from Browser RUM (#13414 ) * fix: disable RUM user JWT auth * fix: remove stale RUM bootstrap import	2026-05-30 10:34:44 -04:00
Ravi Kumar L	71a7c9ce7b	📡 feat: Add Configurable HyperDX Browser Real User Monitoring (#13287 )	2026-05-29 11:04:26 -07:00
Danny Avila	6d6ea08da4	🆔 feat: Built-in Build Metadata for Support Triage (#12756 ) * 🏗️ refactor: Derive App Version from Root package.json + Add buildInfo Schema The hardcoded `Constants.VERSION` in `data-provider` is now replaced at rollup build time via `@rollup/plugin-replace`, sourcing from the root `package.json` so version bumps are a single-file change. Adds the shape needed by the rest of the series: - `interface.buildInfo` boolean flag (default `true`) — lets self-hosters opt out of exposing commit/branch/date. - `buildInfo` on `TStartupConfig` — commit/commitShort/branch/buildDate. - `SettingsTabValues.ABOUT` — new settings tab enum value. Ref: https://github.com/danny-avila/LibreChat/issues/12406 * 🛠️ feat: Add Build Metadata Resolver and Expose via /api/config Adds `resolveBuildInfo()` in `@librechat/api` that surfaces commit SHA, branch, and build date from (in order) `BUILD_` env vars, then local git metadata. Result is cached per-process. `/api/config` includes a `buildInfo` field on both authenticated and anonymous responses when `interface.buildInfo !== false` and at least one resolver field is populated. Omitted entirely otherwise. Designed so pre-built Docker images carry metadata via build-arg while source installs pick it up from `.git` — no manual version tracking. Ref: https://github.com/danny-avila/LibreChat/issues/12406 ℹ️ feat: Add Settings → About Panel with Diagnostics Copy New Settings tab that renders the running build's version, commit (short SHA), branch, and build date in a monospaced block alongside a "Copy diagnostics" button that emits a preformatted text blob for pasting into support issues. Tab is hidden when `interface.buildInfo` is set to `false`. Reads from `startupConfig.buildInfo` provided by `/api/config`. 
Ref: https://github.com/danny-avila/LibreChat/issues/12406 * 🐳 ci: Inject BUILD_COMMIT/BRANCH/DATE into Docker Images Adds optional `BUILD_COMMIT`, `BUILD_BRANCH`, `BUILD_DATE` ARGs to both `Dockerfile` and `Dockerfile.multi`, wired as `ENV` vars in the runtime stage so the backend's `resolveBuildInfo` picks them up. All image-publishing workflows (`tag`, `main`, `dev`, `dev-branch`, `dev-staging`) now compute `${github.sha}`, `${github.ref_name}`, and a UTC timestamp, then pass them to `docker/build-push-action` as `build-args`. Defaults are empty — non-CI builds (local `docker build`) still work, and the backend falls back to local `.git` metadata if ARGs aren't set. Ref: https://github.com/danny-avila/LibreChat/issues/12406 * 📝 docs: Direct Bug Reporters to Settings → About for Version Info The previous instructions (`docker images \| grep librechat`, `git rev-parse HEAD`) only worked for a subset of deployments and rarely produced a commit SHA for users pulling pre-built images. Point users to the new in-app Settings → About panel's "Copy diagnostics" button, which captures version, commit, branch, build date, and user agent in a single preformatted block. Fallback instructions preserved for older installs. Ref: https://github.com/danny-avila/LibreChat/issues/12406 * 🐳 fix: Move BUILD_* ENV to End of Docker Stages to Preserve Layer Cache Per-commit BUILD_COMMIT/BUILD_DATE changes were being promoted to ENV before `npm ci` / `npm run frontend` (single-stage) and before `npm ci --omit=dev` (multi-stage api-build), which invalidated the cache for every subsequent layer on every CI run. Move the ARG/ENV block below the heavy install and build steps in both Dockerfiles. Metadata is still available in the runtime image but no longer busts layer reuse. Addresses codex review on #12756. 
* 🔧 fix: Propagate interface.buildInfo=false to Unauthenticated /api/config The unauthenticated branch of `/api/config` was emitting an `interface` object only when `privacyPolicy` or `termsOfService` was set, which meant an admin's explicit `interface.buildInfo: false` opt-out was never visible to anonymous/guest clients. `Settings.tsx` gates the About tab on `startupConfig?.interface?.buildInfo !== false`, so a missing field fell through as "enabled" for those clients. Include `interface.buildInfo: false` in the unauth payload whenever it's explicitly disabled. Keep the implicit default (true) absent to preserve the minimal-unauth-payload convention. Addresses codex review on #12756. * 🔀 ci: Trigger Dev Image Workflows on Root package.json + Dockerfile Changes The baked `Constants.VERSION` now reads from the root `package.json` via rollup-plugin-replace, but the `dev-images.yml` and `dev-branch-images.yml` path filters only matched `api/`, `client/`, `packages/*`. A release commit that only bumps root `package.json` would not trigger a rebuild, leaving `latest` dev images with stale Footer/About version metadata. Include `package.json`, `package-lock.json`, and both Dockerfiles in the path filters so dependency changes (lockfile rebuilds) and image build tweaks also rebuild dev images. Addresses codex review on #12756. 🧽 fix: Harden About Panel Lifecycle, A11y, and Loading Gate Review follow-ups on #12756: - #1 timer leak: stash the copy-state `setTimeout` in a ref and clear it from a `useEffect` cleanup so unmounting the Settings dialog mid-toast doesn't fire `setCopied(false)` on an unmounted component. - #3 flash of About tab: gate `aboutEnabled` on `startupConfig != null` so the tab stays hidden until `/api/config` returns. For admins who disabled `interface.buildInfo`, the tab no longer briefly appears and vanishes on page load. 
- #6 aria-live placement: move the live region off the interactive button onto a dedicated `<span role="status" aria-live="polite">` so screen readers announce the copied state, not the full button content on every re-render. - #2 missing coverage: add `About.spec.tsx` exercising populated/empty buildInfo rendering, invalid-date handling, diagnostics clipboard payload, copy-state toggling, unmount cleanup, and the live region. * ⚡ perf: Eagerly Resolve Build Info at Module Load Review follow-up #4 on #12756: `resolveBuildInfo()` calls `execFileSync` with a 2s timeout on source installs without `BUILD_` env vars. Paying this cost on the first HTTP request blocks the event loop mid-flight. Call `resolveBuildInfo()` once at config route module load so the resolver's cache is warm before any request arrives. Docker images with the BUILD_ env vars set sidestep the git path entirely, so this only affects the edge case of source installs. * 📝 docs: Document rollup Version Placeholder Contract Review follow-ups #5 / #8 on #12756. The `__LIBRECHAT_VERSION__` placeholder relies on a substring replacement rule that only works because the token appears inside a string literal, and the substitution only runs during `npm run build:data-provider`. - Expand the `Constants.VERSION` JSDoc to spell out that consumers read the placeholder through the built dist bundle; source-level test imports would see the raw placeholder. - Add a NOTE above the rollup `replace` config warning future contributors not to repurpose the token as a bare identifier without switching to a quoted replacement value. Non-functional; prevents future contributors from stepping on a subtle constraint. * 🪪 fix: Only Toast "Copied" When Clipboard Copy Actually Succeeds Codex R5 on #12756. `copy-to-clipboard` returns a boolean indicating whether the underlying `execCommand('copy')` / fallback prompt actually wrote to the clipboard. 
The previous handler flipped to the "Copied" state unconditionally, which in hardened browsers or when the permission prompt is dismissed would mislead users into filing bug reports without the diagnostics blob attached. Gate the state/timer/live-region on the boolean return; silently no-op on failure rather than showing a false positive. Adds a test asserting the button label stays at "Copy diagnostics" when the clipboard call fails. * 🐳 fix: Derive main image metadata from checkout * 🪪 fix: Keep About enabled until disabled * ✅ test: Avoid literal Settings mock text * 🧱 refactor: Rename Build Info Module	2026-05-23 09:41:13 -04:00
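The resolver order described in this commit (the `BUILD_` env vars first, then local git metadata, cached per process, omitted when nothing is populated) could be sketched roughly as below. This is an illustrative sketch only, not the code from `@librechat/api`: `createBuildInfoResolver`, `GitReader`, the per-field fallback, and the 7-character short SHA are all assumptions made for the example. Only the `BUILD_COMMIT` / `BUILD_BRANCH` / `BUILD_DATE` names and the `commit` / `commitShort` / `branch` / `buildDate` fields come from the commit message.

```typescript
// Hypothetical sketch of an env-first, git-fallback build-info resolver.
interface BuildInfo {
  commit?: string;
  commitShort?: string;
  branch?: string;
  buildDate?: string;
}

type Env = Record<string, string | undefined>;
// Stand-in for a local git lookup (assumed; the real resolver shells out to git).
type GitReader = () => Partial<BuildInfo>;

const nonEmpty = (value: string | undefined): string | undefined => {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
};

function createBuildInfoResolver(env: Env, readGit: GitReader): () => BuildInfo | undefined {
  let resolved = false;
  let cached: BuildInfo | undefined;

  return () => {
    // Cache the result so the (potentially slow) git path runs at most once.
    if (resolved) {
      return cached;
    }
    resolved = true;

    let commit = nonEmpty(env['BUILD_COMMIT']);
    let branch = nonEmpty(env['BUILD_BRANCH']);
    let buildDate = nonEmpty(env['BUILD_DATE']);

    // Only consult git when the env vars leave a field unset (assumed per-field fallback).
    if (!commit || !branch || !buildDate) {
      const git = readGit();
      commit = commit ?? nonEmpty(git.commit);
      branch = branch ?? nonEmpty(git.branch);
      buildDate = buildDate ?? nonEmpty(git.buildDate);
    }

    // Leave the result undefined when no field is populated, so callers can omit it.
    if (commit || branch || buildDate) {
      cached = { commit, commitShort: commit?.slice(0, 7), branch, buildDate };
    }
    return cached;
  };
}
```

Calling the returned function once at module load, as the "Eagerly Resolve Build Info" follow-up describes, would warm the cache before the first request arrives.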
Danny Avila	bd64251eb9	🪪 fix: Prevent MCP Server Name Collisions (#13256) * fix: prevent MCP server name collisions * chore: address MCP registry review nits * fix: reserve MCP config names from request context * chore: format MCP registry changes * chore: address MCP collision review findings	2026-05-22 20:46:14 -04:00
Danny Avila	9dd062e42e	🧯 fix: Harden Data Retention Semantics (#13049) * feat: support data retention for normal chats Add retentionMode config variable supporting "all" and "temporary" values. When "all" is set, data retention applies to all chats, not just temporary ones. Adds isTemporary field to conversations for proper filtering. Adapted to new TS method files in packages/data-schemas since upstream moved models out of api/models/. Based on danny-avila/LibreChat#10532 Co-Authored-By: WhammyLeaf <233105313+WhammyLeaf@users.noreply.github.com> (cherry picked from commit `30109e90b0`) * feat: extend data retention to files, tool calls, and shared links Add expiredAt field and TTL indexes to file, toolCall, and share schemas. Set expiredAt on tool calls, shared links, and file uploads when retentionMode is "all" or chat is temporary. 
(cherry picked from commit `48973752d3`) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: lint/test (cherry picked from commit `310c514e6a`) * fix: address code review feedback for data retention PR Critical: - Fix BookmarkMenu crash: restore optional chaining on conversation - Fix migration hazard: backward-compatible sidebar filter that also checks expiredAt for documents without isTemporary field Major: - Add logging to getRetentionExpiry error path, align with tools.js - Add tests for retentionMode: ALL in saveConvo and saveMessage - Fix share route: apply expiredAt for temporary chats too by querying the conversation's isTemporary flag server-side - Add assertions for getRetentionExpiry mocks in process tests Minor: - Fix ChatRoute isTemporaryChat to be strictly boolean via Boolean() - Fix stale test description (expired -> temporary) - Comment out retentionMode default in example yaml - Simplify verbose if/else to isTemporary === true - Add compound index on { user: 1, isTemporary: 1 } - Remove narrating comment from process.spec.js Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit `6bad535f90`) * chore: fix typescript (cherry picked from commit `826527a46b`) * fix: lint (cherry picked from commit `77817e80ea`) * fix: use mockSanitizeArtifactPath in retention test The 'getRetentionExpiry is called with the request object' test referenced an undefined `mockSanitizeFilename` identifier, breaking both lint (no-undef) and the test suite. Use the existing `mockSanitizeArtifactPath` mock that the surrounding tests already use, since `processCodeOutput` calls `sanitizeArtifactPath` (not `sanitizeFilename`) before invoking `getRetentionExpiry`. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit `52ea2da66d`) * fix: forward isTemporary from client for retention on file uploads and tool calls Server-side `getRetentionExpiry` (file uploads) and the tool-call controller both read `req.body.isTemporary`, but the file upload multipart form and the tool-call payload did not include that field. In `retentionMode: temporary` (default), files uploaded and tool calls created from temporary chats were therefore retained indefinitely. Forward the Recoil `isTemporary` flag in both client paths so the existing server checks can fire correctly. `ToolParams` gains an optional `isTemporary` field. Addresses Codex P1 review feedback on PR #29. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit `7e937df05a`) * test: stub store.isTemporary in useFileHandling test mocks Previous commit added `useRecoilValue(store.isTemporary)` to the hook. The test file mocks `~/store` with only `ephemeralAgentByConvoId` and does not stub `useRecoilValue`, so all 7 cases threw "Invalid argument to useRecoilValue: expected an atom or selector but got undefined". Add a stub default export with `isTemporary` and a `useRecoilValue` mock returning `false`. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit `eb1609537d`) * fix: harden data retention semantics * fix: provide sweep request context for expired files * fix: preserve temporary flags in all-retention updates * fix: honor assistant versions in retention sweeps * fix: retain non-temporary flags in all mode * fix: hide expired retained records * fix: propagate retained conversation expiry * fix: refresh meili retention cutoff * fix: prevent overlapping file sweeps * fix: show legacy retained conversations * fix: index legacy retained records * fix: harden retention cleanup edge cases * fix: count failed file storage sweeps * fix: preserve legacy temporary retention * fix: assign retention sweep worker deterministically * fix: hide expired shared links on reads * fix: prevent retention refresh after parent expiry * fix: break code output retention import cycle * fix: harden retention review findings * fix: ignore expired share duplicates * fix: reject expired retained share creation * fix: harden retention review edge cases * fix: address retention audit findings * fix: enforce expired conversation shares in all retention * fix: scope temporary upload flag to chat files * fix: address retention review findings * fix: address codex retention review findings * fix: tighten missing storage detection * test: remove unused file process spec bindings --------- Co-authored-by: WhammyLeaf <233105313+WhammyLeaf@users.noreply.github.com> Co-authored-by: Aron Gates <aron@muonspace.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-19 21:58:42 -04:00
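The retention rule stated in this commit (set `expiredAt` when `retentionMode` is "all" or the chat is temporary) can be illustrated with a small sketch. It is not the project's `getRetentionExpiry`: the function name `computeExpiredAt`, the `retentionHours` parameter, and the injectable `now` are hypothetical and exist only for the example.

```typescript
// Hypothetical sketch of the "all" vs "temporary" retention decision.
type RetentionMode = 'all' | 'temporary';

interface RetentionInput {
  retentionMode: RetentionMode;
  // Flag forwarded from the client for temporary chats.
  isTemporary?: boolean;
  // Assumed retention window; the real configuration key is not shown in the log.
  retentionHours: number;
  // Injectable clock so the example is deterministic.
  now?: Date;
}

function computeExpiredAt(input: RetentionInput): Date | null {
  const { retentionMode, isTemporary, retentionHours, now = new Date() } = input;

  // "all" applies retention to every record; "temporary" only to temporary chats.
  const applies = retentionMode === 'all' || isTemporary === true;
  if (!applies) {
    return null;
  }
  return new Date(now.getTime() + retentionHours * 60 * 60 * 1000);
}
```

A `null` result here stands for "retain indefinitely"; a non-null date is the kind of value a TTL index on `expiredAt` would act on.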
Danny Avila	5b66196f58	🪪 fix: Scope Message Conversation Access (#13183) * fix: Scope message conversation access * style: Format message route query	2026-05-18 17:34:30 -04:00
Danny Avila	ca8c212c0d	🗝️ fix: Protect Model Spec Instructions (#13125) * fix: prevent instruction exposure * fix: tighten model spec preset restoration * refactor: type model spec preset handling	2026-05-14 10:07:23 -04:00
Danny Avila	6b5596ec36	🍪 refactor: Refresh CloudFront Media Cookies (#13091) * fix: refresh CloudFront media cookies * fix: satisfy changed-file lint * fix: centralize CloudFront image retry * fix: honor base path for CloudFront refresh * fix: bypass auth refresh for CloudFront cookie retry * fix: pass app auth header to CloudFront retry * test: cover CloudFront refresh with OpenID reuse * fix: avoid duplicate CloudFront refresh retries * fix: clear CloudFront scope cookie with matching flags	2026-05-12 13:26:05 -04:00
Danny Avila	4cce88be42	🪟 feat: Add allowedAddresses Exemption List For SSRF-Guarded Targets (#12933 ) * 🪟 feat: Add allowedAddresses Exemption List For SSRF-Guarded Targets LibreChat already blocks SSRF-prone targets (private IPs, loopback, link-local, .internal/.local TLDs) at every server-side fetch site that consumes user-controllable URLs — custom-endpoint baseURLs, MCP servers, OpenAPI Actions, and OAuth endpoints. The only existing escape hatch is `allowedDomains`, but that flips the field into a strict whitelist: adding `127.0.0.1` to permit a self-hosted Ollama also blocks every public destination that isn't in the list. Introduce `allowedAddresses` as the orthogonal primitive: a private- IP-space exemption list. When a hostname or its resolved IP appears in the list, the SSRF block is bypassed for that target. Public destinations remain reachable. Operators can now run self-hosted LLMs / MCP servers / Action endpoints on private addresses without weakening the default-deny posture for everything else. Schema additions in `packages/data-provider/src/config.ts`: - `endpoints.allowedAddresses` (new — gates `validateEndpointURL`) - `mcpSettings.allowedAddresses` (parallel to `allowedDomains`) - `actions.allowedAddresses` (parallel to `allowedDomains`) Core changes in `packages/api/src/auth/`: - New `isAddressAllowed(hostnameOrIP, allowedAddresses)` — pure, case-insensitive, bracket-stripped literal match. - Threaded the list through `isSSRFTarget`, `resolveHostnameSSRF`, `isDomainAllowedCore`, `isActionDomainAllowed`, `isMCPDomainAllowed`, `isOAuthUrlAllowed`, and `validateEndpointURL`. - Extended `createSSRFSafeAgents` and `createSSRFSafeUndiciConnect` to accept the list, building an SSRF-safe DNS lookup that exempts matching hostnames/IPs at TCP connect time (TOCTOU-safe). Wiring: - Custom and OpenAI endpoint initialize sites pass `endpoints.allowedAddresses` to `validateEndpointURL`. 
- `MCPServersRegistry` stores `allowedAddresses` and exposes it via `getAllowedAddresses()`. The factory, connection class, manager, `UserConnectionManager`, and `ConnectionsRepository` all thread it through to the SSRF utilities. - `MCPOAuthHandler.initiateOAuthFlow`, `refreshOAuthTokens`, and `validateOAuthUrl` accept the list and consult it on every URL validation along the OAuth chain. - `ToolService`, `ActionService`, and the assistants/agents action routes pass `actions.allowedAddresses` to `isActionDomainAllowed` and to `createSSRFSafeAgents` for runtime action calls. - `initializeMCPs.js` reads `mcpSettings.allowedAddresses` from the app config and forwards it to the registry constructor. Documentation: - `librechat.example.yaml` shows the new field next to each existing `allowedDomains` block, with a note clarifying that `allowedAddresses` is an exemption list (not a whitelist). Tests: - Unit tests for `isAddressAllowed` covering literal IPs, hostnames, IPv6 brackets, case insensitivity, and partial-match rejection. - Exemption tests for every entry point: `isSSRFTarget`, `resolveHostnameSSRF`, `validateEndpointURL`, `isActionDomainAllowed`, `isMCPDomainAllowed`, `isOAuthUrlAllowed`. - Existing tests updated to reflect the new optional parameter. Default behavior is unchanged: omitted = empty list = no exemptions. * 🩹 fix: Plumb allowedAddresses Through AppConfig endpoints Type The initial PR added `endpoints.allowedAddresses` to the data-provider config schema and consumed it in the endpoint initialize sites, but the runtime `AppConfig.endpoints` shape in `@librechat/data-schemas` was a hand-maintained subset that didn't include the new field — so `tsc` rejected `appConfig.endpoints.allowedAddresses`. Add the field to `AppConfig['endpoints']` in `packages/data-schemas/src/types/app.ts` and forward it from the loaded config in `packages/data-schemas/src/app/endpoints.ts` so the runtime config carries the value. 
Update `initializeMCPs.spec.js` to expect the third positional argument (`allowedAddresses`) on the `createMCPServersRegistry` call. * 🩹 fix: Enforce allowedDomains Before allowedAddresses In isOAuthUrlAllowed The initial implementation checked the address exemption first, so a URL whose hostname appeared in `allowedAddresses` would return true even when the admin had configured `allowedDomains` as a strict bound on OAuth endpoints. A malicious MCP server could advertise OAuth metadata, token, or revocation URLs at any address the admin had permitted for an unrelated reason (a self-hosted LLM at `127.0.0.1`, for example) and pass validation, expanding SSRF reach beyond the configured domain whitelist. Reorder: when `allowedDomains` is set, treat it as authoritative — return true only if the URL matches a domain entry, otherwise fall through to false. The address exemption only applies when no `allowedDomains` is configured (mirrors how the downstream SSRF check in `validateOAuthUrl` consults `allowedAddresses`). Add a regression test asserting that an `allowedAddresses` entry does not broaden a configured `allowedDomains` list. Reported by chatgpt-codex-connector on PR #12933. * 🩹 fix: Forward allowedAddresses To Remaining OAuth Callers Two `MCPOAuthHandler` callers still used the pre-feature signatures and were silently dropping the new `allowedAddresses` argument: - `api/server/routes/mcp.js` invoked `initiateOAuthFlow` with the old 5-argument shape, so OAuth flows initiated through the route handler ignored the registry's `getAllowedAddresses()` and would reject any metadata/authorization/token URL on a permitted private host. - `api/server/controllers/UserController.js#maybeUninstallOAuthMCP` invoked `revokeOAuthToken` without the address exemption, so uninstalling an OAuth-backed MCP server on a permitted private host would fail at the revocation step even though the rest of the MCP connection path now permits it. 
Both sites now read `allowedAddresses` from the registry alongside `allowedDomains` and forward it. Reported by Copilot on PR #12933. * 🩹 fix: Update Test Mocks And Assertions For OAuth allowedAddresses The previous commit started passing `allowedAddresses` to `MCPOAuthHandler.initiateOAuthFlow` from `api/server/routes/mcp.js` and to `MCPOAuthHandler.revokeOAuthToken` from `api/server/controllers/UserController.js`, but the corresponding test files mocked the registry without `getAllowedAddresses` (causing `TypeError`s) and asserted the old positional shape on `toHaveBeenCalledWith`. Update the mocks and assertions to match the new arity: - `api/server/routes/__tests__/mcp.spec.js`: add `getAllowedDomains`/`getAllowedAddresses` to the registry mock and expect the additional positional args on `initiateOAuthFlow`. - `api/server/controllers/__tests__/maybeUninstallOAuthMCP.spec.js`: add a `getAllowedAddresses` mock alongside the existing `getAllowedDomains` and seed it in `setupOAuthServerFound`. - `api/server/controllers/__tests__/UserController.mcpOAuth.spec.js`: add `getAllowedAddresses` to the registry mock and expect the trailing `null` arg on the three `revokeOAuthToken` assertions. * 🛡️ fix: Address Comprehensive Review — Scope allowedAddresses To Private IP Space Major findings from the comprehensive PR review (severity → fix): CRITICAL — `validateOAuthUrl` SSRF fallback bypass. When `allowedDomains` is configured and a URL fails the whitelist, the SSRF fallback in `validateOAuthUrl` was still passing `allowedAddresses` to `isSSRFTarget` / `resolveHostnameSSRF`, letting a malicious MCP server advertise OAuth endpoints at any address the admin had permitted for an unrelated reason. Suppress `allowedAddresses` in the fallback when `allowedDomains` is active — the address exemption is opt-in for the no-whitelist mode only. MAJOR — WebSocket transport SSRF check ignored exemptions. 
The `constructTransport` WebSocket branch called `resolveHostnameSSRF(wsHostname)` without `this.allowedAddresses`, so a permitted private MCP server would pass `isMCPDomainAllowed` but be blocked at transport creation. Forward the exemption. Scope `allowedAddresses` to private IP space only (operator directive). The exemption list is for permitting private/internal targets; it must not be a back-door to broaden trust to public destinations. - Schema (`packages/data-provider/src/config.ts`): new `allowedAddressesSchema` rejects URLs (`://`), paths/CIDR (`/`), whitespace, and public IPv4/IPv6 literals at config-load time. Wired into `endpoints`, `mcpSettings`, and `actions`. - Runtime (`packages/api/src/auth/domain.ts`): `isAddressAllowed` now drops public-IP candidates and public-IP entries on the match path — defense in depth so a misconfigured runtime list never grants exemption. - Hot path (`packages/api/src/auth/agent.ts`): `buildSSRFSafeLookup` pre-normalizes the list into a `Set<string>` once at construction and applies the same scoping filter, so the connect-time DNS lookup is an O(1) Set membership check instead of a full re-iterate-and-normalize on every outbound request. Test coverage for the connect-time and OAuth-fallback paths. - `agent.spec.ts`: new describe block exercising `buildSSRFSafeLookup` and `createSSRFSafe` with `allowedAddresses` — hostname-literal exemption, resolved-IP exemption, public-IP scoping, URL/CIDR/whitespace rejection, and the default no-list block. - `handler.allowedAddresses.test.ts` (new): integration tests for `validateOAuthUrl` — covers both the no-domains-set "permit private" path and the strict-bound regression where `allowedAddresses` must NOT bypass `allowedDomains`. Documentation & cleanup.* - `connection.ts` redirect SSRF check: explicit comment that `allowedAddresses` is intentionally NOT consulted for redirect targets (server-controlled, must not inherit the admin's exemption). 
- `MCPConnectionFactory.test.ts`: replaced an `eslint-disable` with a proper `import { getTenantId } from '@librechat/data-schemas'`. The disable was added to make a pre-existing `require()` quiet — the cleaner fix is to use the existing top-level import. Updated `MCPConnectionSSRF.test.ts` WebSocket SSRF assertions to match the new two-argument call shape (`hostname, allowedAddresses`). * 🩹 fix: Require Absolute URL Before allowedAddresses Trust Bypass In isOAuthUrlAllowed `parseDomainSpec` is lenient — it silently prepends `https://` to schemeless inputs so it can match patterns like bare `example.com`. That leniency leaked into `isOAuthUrlAllowed`'s new `allowedAddresses` short-circuit: a value like `10.0.0.5/oauth` (no scheme) would parse successfully via the prepended default, hit the address-exemption path, return `true`, and skip `validateOAuthUrl`'s strict `new URL(url)` parse-or-throw — only to fail later in OAuth discovery with a less clear runtime error. Add a strict `new URL(url)` gate at the top of `isOAuthUrlAllowed`. Schemeless inputs now fall through to `validateOAuthUrl`'s explicit "Invalid OAuth <field>" rejection. Tests added in both `auth/domain.spec.ts` (unit) and the OAuth handler integration spec (end-to-end). Reported by chatgpt-codex-connector (P2) on PR #12933. * 🛡️ fix: Address Follow-Up Comprehensive Review — Schema Tests, Shared Normalization, host:port Auditing the second comprehensive review: F1 MAJOR — schema validation untested. `allowedAddressesSchema` had zero coverage, so a regression in the three refinement stages or the three wiring locations (`endpoints` / `mcpSettings` / `actions`) would silently let invalid entries reach the runtime. 
Added a dedicated `describe('allowedAddressesSchema')` block in `config.spec.ts` covering: valid private IPs (v4 + v6, including the previously-missed 192.0.0.0/24 range), accepted hostnames, all rejection categories (URLs, CIDR, paths, whitespace tabs/newlines, host:port, public IP literals), and full `configSchema.parse()` integration at each of the three nesting points. F2 MINOR — `isPrivateIPv4Literal` divergence. The schema reimpl in `packages/data-provider` was discarding the `c` octet, so the `192.0.0.0/24` (RFC 5736 IETF protocol assignments) range that the authoritative `isPrivateIPv4` accepts was being rejected with a misleading "public IP" error. Destructure `c` and add the missing range check; covered by the new schema tests. F3 MINOR — DRY violation across `domain.ts` and `agent.ts`. Both files had independent normalization implementations with a subtle whitespace-check divergence (`/\s/` vs `.includes(' ')`). Extracted the shared logic into a new `packages/api/src/auth/allowedAddresses.ts` module that both consumers import: - `normalizeAddressEntry(entry)` — single-entry shape check - `looksLikeHostPort(entry)` — host:port detector (used by F4) - `normalizeAllowedAddressesSet(list)` — pre-normalized Set for the connect-time hot path - `isAddressInAllowedSet(candidate, set)` — membership check that enforces private-IP scoping on the candidate Both `isAddressAllowed` (preflight) and `buildSSRFSafeLookup` (connect) now go through the same primitives; the whitespace divergence is gone. To break the import cycle (`allowedAddresses` needs `isPrivateIP`, `domain` previously owned it), extracted IP private-range detection into a leaf `auth/ip.ts` module. `domain.ts` re-exports `isPrivateIP` for backward compatibility with existing call sites. F4 MINOR — `host:port` silently misclassified. 
Entries like `localhost:8080` previously slipped through the URL/path guard, were mis-detected as IPv6, failed `isPrivateIP`, and were silently dropped with a misleading "public IP" schema error. Added an explicit `looksLikeHostPort` check with a clear error: "allowedAddresses entries must not include a port — list the bare hostname or IP only." Bare `::1`, `[::1]`, and other valid IPv6 literals are intentionally not matched (regex distinguishes by colon count and the bracketed `[ipv6]:port` form). F5 MINOR — hostname-trust documentation gap. Hostname entries short-circuit `resolveHostnameSSRF` before any DNS lookup — that's a deliberate design (admin trusts the name) but it means the exemption follows whatever the name resolves to at runtime. Added an explicit note in `librechat.example.yaml` for both `mcpSettings.allowedAddresses` and `endpoints.allowedAddresses`: "a hostname entry trusts whatever IP that name resolves to. Only list hostnames whose DNS you control. Prefer literal IPs when you can." F6 (8 positional params) is flagged for follow-up; refactor to an options object is a breaking-API change deferred to a separate PR. F7 (redirect/WebSocket asymmetry, NIT, conf 40) — skipping; the existing inline comment is sufficient. * 🧹 chore: Address Follow-Up NITs — Import Order And Mirror-Function Naming Three NITs from the latest comprehensive review: NIT #1 (conf 85) — local import order. AGENTS.md requires local imports sorted longest-to-shortest. Both `domain.ts` and `agent.ts` had `./ip` (shorter) before `./allowedAddresses` (longer). Swapped. NIT #2 (conf 60) — missing cross-reference. The schema-side `isHostPortShape` in `packages/data-provider/src/config.ts` had no note pointing at the canonical runtime mirror. Added a JSDoc paragraph explaining the mirror relationship and why a local copy exists (the data-provider package can't import from `@librechat/api` without creating a circular dependency). NIT #3 (conf 50) — naming inconsistency. 
Renamed `isHostPortShape` → `looksLikeHostPort` so the schema mirror matches the runtime helper exactly. Kept as a separate function (not a shared import) for the same circular-dependency reason; the matching name makes it obvious they should stay in lockstep.	2026-05-03 21:43:59 -04:00
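The entry rules this commit series describes for `allowedAddresses` (no URLs, no paths or CIDR, no whitespace, no `host:port`, no public IP literals, case-insensitive and bracket-stripped literal matching) might look something like the sketch below. It is a simplified illustration, not the contents of `packages/api/src/auth/allowedAddresses.ts`: the function bodies, the regexes, and `isAddressExempt` are assumptions, the private-range check covers only a few common IPv4 ranges, and IPv6 range scoping is left out entirely.

```typescript
// Hypothetical, simplified sketch of allowedAddresses normalization and matching.
const HOST_PORT = /^[^:\[\]]+:\d+$/; // exactly one colon, followed by a port
const BRACKETED_PORT = /^\[[^\]]+\]:\d+$/; // [ipv6]:port

function looksLikeHostPort(entry: string): boolean {
  return HOST_PORT.test(entry) || BRACKETED_PORT.test(entry);
}

function parseIPv4(value: string): number[] | null {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(value);
  if (!match) {
    return null;
  }
  const octets = match.slice(1).map(Number);
  return octets.every((octet) => octet <= 255) ? octets : null;
}

// Assumed subset of private/loopback/link-local IPv4 ranges, for illustration only.
function isPrivateIPv4(octets: number[]): boolean {
  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Returns the normalized entry, or null when the entry must be ignored.
function normalizeAddressEntry(entry: string): string | null {
  if (entry.length === 0 || /\s/.test(entry)) return null;
  if (entry.includes('://') || entry.includes('/')) return null; // URLs, paths, CIDR
  if (looksLikeHostPort(entry)) return null;
  const bare = entry.toLowerCase().replace(/^\[(.*)\]$/, '$1');
  const octets = parseIPv4(bare);
  if (octets && !isPrivateIPv4(octets)) return null; // public IPs never grant exemption
  return bare;
}

function isAddressExempt(candidate: string, allowedAddresses: string[]): boolean {
  const allowed = new Set<string>();
  for (const entry of allowedAddresses) {
    const normalized = normalizeAddressEntry(entry);
    if (normalized) allowed.add(normalized);
  }
  const normalizedCandidate = normalizeAddressEntry(candidate);
  return normalizedCandidate !== null && allowed.has(normalizedCandidate);
}
```

In this sketch the candidate goes through the same normalization as the list entries, so a public IP is rejected on both sides of the match, which mirrors the "defense in depth" scoping described above. A hot-path variant would build the `Set` once at construction instead of on every call.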
Danny Avila	5b5e2b0286	🛡️ fix: Handle MCP Tool Cache Lookup Failures (#12910) * Handle MCP tool cache lookup failures * Harden MCP cached tool lookup * Cover full MCP tool cache outage * Guard MCP tool cache store lookup	2026-05-02 09:21:28 +09:00
Dustin Healy	fc3189b718	🔐 fix: Restore Tenant Context in MCP OAuth Callback (#12782 ) * fix: restore tenant context in MCP OAuth callback for multi-tenant deployments The MCP OAuth callback is a cross-origin redirect from the OAuth provider. SameSite=Strict cookies (including the JWT) are not sent, leaving the callback with no tenant context. With TENANT_ISOLATION_STRICT=true, all DB writes fail. Stores tenantId in flow metadata at OAuth initiation time (when the user is authenticated), then restores it via tenantStorage.run in the callback, wrapping the entire post-validation body. * test: address review findings for tenant context tests - Assert tenantId flows through to initFlow in MCPConnectionFactory test - Add beforeEach to tenant context tests to reset mocks independently	2026-04-22 14:05:51 -07:00
Danny Avila	d350c58633	🚫 fix: Hide Delete Account Button When ALLOW_ACCOUNT_DELETION Is Disabled (#12568 ) * fix: hide Delete Account button when ALLOW_ACCOUNT_DELETION is false * fix: add admin bypass, inline env read, and tests for allowAccountDeletion - Show delete button for admin users even when ALLOW_ACCOUNT_DELETION=false, matching the canDeleteAccount middleware's ACCESS_ADMIN bypass - Move env var read inline in buildSharedPayload() for per-request evaluation - Add 4 frontend tests for Account conditional rendering - Add 3 backend tests for allowAccountDeletion config field * fix: use server-side ACCESS_ADMIN capability check instead of frontend role check - Replace frontend SystemRoles.ADMIN check with server-side hasCapability() in the authenticated config route, matching canDeleteAccount middleware exactly - Admin bypass now evaluates ACCESS_ADMIN capability per-user in GET /api/config, so users with the grant (regardless of role) see the button, and admins without the grant do not - Add 3 authenticated backend tests: without capability, with capability, and skip-when-already-enabled - Simplify frontend to pure config check (no role logic) - Remove redundant jest-dom import; add inline env var comment * test: add missing toHaveBeenCalled assertion in ACCESS_ADMIN test	2026-04-07 23:51:23 -04:00
Denis Palnitsky	8ed0bcf5ca	♻️ fix: Reuse Existing MCP OAuth Client Registrations to Prevent `client_id` Mismatch (#11925 ) * fix: reuse existing OAuth client registrations to prevent client_id mismatch When using auto-discovered OAuth (DCR), LibreChat calls /register on every flow initiation, getting a new client_id each time. When concurrent connections or reconnections happen, the client_id used during /authorize differs from the one used during /token, causing the server to reject the exchange. Before registering a new client, check if a valid client registration already exists in the database and reuse it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Handle re-registration of OAuth clients when redirect_uri changes * Add undefined fields for logo_uri and tos_uri in OAuth metadata tests * test: add client registration reuse tests for horizontal scaling race condition Reproduces the client_id mismatch bug that occurs in multi-replica deployments where concurrent initiateOAuthFlow calls each register a new OAuth client. Tests verify that the findToken-based client reuse prevents re-registration. * fix: address review findings for client registration reuse - Fix empty redirect_uris bug: invert condition so missing/empty redirect_uris triggers re-registration instead of silent reuse - Revert undocumented config?.redirect_uri in auto-discovery path - Change DB error logging from debug to warn for operator visibility - Fix import order: move package type import to correct section - Remove redundant type cast and misleading JSDoc comment - Test file: remove dead imports, restore process.env.DOMAIN_SERVER, rename describe blocks, add empty redirect_uris edge case test, add concurrent reconnection test with pre-seeded token, scope documentation to reconnection stabilization * fix: resolve type check errors for OAuthClientInformation redirect_uris The SDK's OAuthClientInformation type lacks redirect_uris (only on OAuthClientInformationFull). 
Cast to the local OAuthClientInformation type in handler.ts when accessing deserialized client info from DB, and use intersection types in tests for clientInfo with redirect_uris. * fix: address follow-up review findings R1, R2, R3 - R1: Move `import type { TokenMethods }` to the type-imports section, before local types, per CLAUDE.md import order rules - R2: Add unit test for empty redirect_uris in handler.test.ts to verify the inverted condition triggers re-registration - R3: Use delete for process.env.DOMAIN_SERVER restoration when the original value was undefined to avoid coercion to string "undefined" * fix: clear stale client registration on OAuth flow failure When a stored client_id is no longer recognized by the OAuth server, the flow fails but the stale client stays in MongoDB, causing every retry to reuse the same invalid registration in an infinite loop. On OAuth failure, clear the stored client registration so the next attempt falls through to fresh Dynamic Client Registration. - Add MCPTokenStorage.deleteClientRegistration() for targeted cleanup - Call it from MCPConnectionFactory's OAuth failure path - Add integration test proving recovery from stale client reuse * fix: validate auth server identity and target cleanup to reused clients - Gate client reuse on authorization server identity: compare stored issuer against freshly discovered metadata before reusing, preventing wrong-client reuse when the MCP server switches auth providers - Add reusedStoredClient flag to MCPOAuthFlowMetadata so cleanup only runs when the failed flow actually reused a stored registration, not on unrelated failures (timeouts, user-denied consent, etc.) 
- Add cleanup in returnOnOAuth path: when a prior flow that reused a stored client is detected as failed, clear the stale registration before re-initiating - Add tests for issuer mismatch and reusedStoredClient flag assertions * fix: address minor review findings N3, N5, N6 - N3: Type deleteClientRegistration param as TokenMethods['deleteTokens'] instead of Promise<unknown> - N5: Elevate deletion failure logging from debug to warn for operator visibility when stale client cleanup fails - N6: Use getLogPrefix() instead of hardcoded log prefix to respect system-user privacy convention * fix: correct stale-client cleanup in both OAuth paths - Blocking path: remove result?.clientInfo guard that made cleanup unreachable (handleOAuthRequired returns null on failure, so result?.clientInfo was always false in the failure branch) - returnOnOAuth path: only clear stored client when the prior flow status is FAILED, not on COMPLETED or PENDING flows, to avoid deleting valid registrations during normal flow replacement * fix: remove redundant cast on clientMetadata clientMetadata is already typed as Record<string, unknown>; the as Record<string, unknown> cast was a no-op. * fix: thread reusedStoredClient through return type instead of re-reading flow state FlowStateManager.createFlow() deletes FAILED flow state before rejecting, so getFlowState() after handleOAuthRequired() returns null would find nothing — making the stale-client cleanup dead code. Fix: hoist reusedStoredClient flag from flowMetadata into a local variable, include it in handleOAuthRequired()'s return type (both success and catch paths), and use result.reusedStoredClient directly in the caller instead of a second getFlowState() round-trip. * fix: selective stale-client cleanup in returnOnOAuth path The returnOnOAuth cleanup was unreliable: it depended on reading FAILED flow state, but FlowStateManager.monitorFlow() deletes FAILED state before rejecting. 
Move cleanup into createFlow's catch handler where flowMetadata.reusedStoredClient is still in scope. Make cleanup selective in both paths: add isClientRejection() helper that only matches errors indicating the OAuth server rejected the client_id (invalid_client, unauthorized_client, client not found). Timeouts, user-cancelled flows, and other transient failures no longer wipe valid stored registrations. Thread the error from handleOAuthRequired() through the return type so the blocking path can also check isClientRejection(). * fix: tighten isClientRejection heuristic Narrow 'client_id' match to 'client_id mismatch' to avoid false-positive cleanup on unrelated errors that happen to mention client_id. * test: add isClientRejection tests and enforced client_id on test server - Add isClientRejection unit tests: invalid_client, unauthorized_client, client_id mismatch, client not found, unknown client, and negative cases (timeout, flow state not found, user denied, null, undefined) - Enhance OAuth test server with enforceClientId option: binds auth codes to the client_id that initiated /authorize, rejects token exchange with mismatched or unregistered client_id (401 invalid_client) - Add integration tests proving the test server correctly rejects stale client_ids and accepts matching ones at /token * fix: issuer validation, callback error propagation, and cleanup DRY - Issuer check: re-register when storedIssuer is absent or non-string instead of silently reusing. Narrows unknown type with typeof guard and inverts condition so missing issuer → fresh DCR (safer default). - OAuth callback route: call failFlow with the OAuth error when the authorization server redirects back with error= parameter, so the waiting flow receives the actual rejection instead of timing out. This lets isClientRejection match stale-client errors correctly. - Extract duplicated cleanup block to clearStaleClientIfRejected() private method, called from both returnOnOAuth and blocking paths. 
- Test fixes: add issuer to stored metadata in reuse tests, reset server to undefined in afterEach to prevent double-close. * fix: gate failFlow behind callback validation, propagate reusedStoredClient on join - OAuth callback: move failFlow call to after CSRF/session/active-flow validation so an attacker with only a leaked state parameter cannot force-fail a flow without passing the same integrity checks required for legitimate callbacks - PENDING join path: propagate reusedStoredClient from flow metadata into the return object so joiners can trigger stale-client cleanup if the joined flow later fails with a client rejection * fix: restore early oauthError/code redirects, gate only failFlow behind CSRF The previous restructuring moved oauthError and missing-code checks behind CSRF validation, breaking tests that expect those redirects without cookies. The redirect itself is harmless (just shows an error page). Only the failFlow call needs CSRF gating to prevent DoS. Restructure: oauthError check stays early (redirects immediately), but failFlow inside it runs the full CSRF/session/active-flow validation before marking the flow as FAILED. * fix: require deleteTokens for client reuse, add missing import in MCP.js Client registration reuse without cleanup capability creates a permanent failure loop: if the reused client is stale, the code detects the rejection but cannot clear the stored registration because deleteTokens is missing, so every retry reuses the same broken client_id. - MCPConnectionFactory: only pass findToken to initiateOAuthFlow when deleteTokens is also available, ensuring reuse is only enabled when recovery is possible - api/server/services/MCP.js: add deleteTokens to the tokenMethods object (was the only MCP call site missing it) * fix: set reusedStoredClient before createFlow in joined-flow path When joining a PENDING flow, reusedStoredClient was only set on the success return but not before the await. If createFlow throws (e.g. 
invalid_client during token exchange), the outer catch returns the local variable which was still false, skipping stale-client cleanup. * fix: require browser binding (CSRF/session) for failFlow on OAuth error hasActiveFlow only proves a PENDING flow exists, not that the caller is the same browser that initiated it. An attacker with a leaked state could force-fail the flow without any user binding. Require hasCsrf or hasSession before calling failFlow on the oauthError path. * fix: guard findToken with deleteTokens check in blocking OAuth path Match the returnOnOAuth path's defense-in-depth: only enable client registration reuse when deleteTokens is also available, ensuring cleanup is possible if the reused client turns out to be stale. * fix: address review findings — tests, types, normalization, docs - Add deleteTokens method to InMemoryTokenStore matching TokenMethods contract; update test call site from deleteToken to deleteTokens - Add MCPConnectionFactory test: returnOnOAuth flow fails with invalid_client → clearStaleClientIfRejected invoked automatically - Add mcp.spec.js tests: OAuth error with CSRF → failFlow called; OAuth error without cookies → failFlow NOT called (DoS prevention) - Add JSDoc to isClientRejection with RFC 6749 and vendor attribution - Add inline comment explaining findToken/deleteTokens coupling guard - Normalize issuer comparison: strip trailing slashes to prevent spurious re-registrations from URL formatting differences - Fix dead-code: use local reusedStoredClient variable in PENDING join return instead of re-reading flowMeta * fix: address final review nits N1-N4 - N1: Add session cookie failFlow test — validates the hasSession branch triggers failFlow on OAuth error callback - N2: Replace setTimeout(50) with setImmediate for microtask drain - N3: Add 'unknown client' attribution to isClientRejection JSDoc - N4: Remove dead getFlowState mock from failFlow tests --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> 
Co-authored-by: Danny Avila <danny@librechat.ai>	2026-04-03 22:15:08 -04:00
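The selective cleanup and issuer check described in the commit above (8ed0bcf5ca) can be sketched as follows. This is a minimal illustration, not the code from the PR: the marker list is taken from the test cases named in the message, while `CLIENT_REJECTION_MARKERS`, `normalizeIssuer`, and `canReuseStoredClient` are hypothetical names, and the real helper may match errors differently.

```typescript
// Hypothetical sketch of the heuristics described in the commit message.
// Marker strings come from the listed test cases; everything else is assumed.
const CLIENT_REJECTION_MARKERS = [
  'invalid_client',
  'unauthorized_client',
  'client_id mismatch', // narrowed from a bare 'client_id' match
  'client not found',
  'unknown client',
] as const;

/** True only when the error text suggests the OAuth server rejected the client itself. */
function isClientRejection(error: unknown): boolean {
  if (error == null) {
    return false; // null/undefined are never treated as client rejections
  }
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return CLIENT_REJECTION_MARKERS.some((marker) => message.includes(marker));
}

/** Strip trailing slashes so URL formatting differences do not force re-registration. */
function normalizeIssuer(issuer: string): string {
  return issuer.replace(/\/+$/, '');
}

/** Reuse a stored registration only when a string issuer is stored and it matches. */
function canReuseStoredClient(storedIssuer: unknown, discoveredIssuer: string): boolean {
  if (typeof storedIssuer !== 'string' || storedIssuer.length === 0) {
    return false; // missing or non-string issuer falls through to fresh DCR
  }
  return normalizeIssuer(storedIssuer) === normalizeIssuer(discoveredIssuer);
}
```

Under these assumptions, a timeout or a user-denied consent error leaves the stored registration alone, and only a matching rejection would trigger cleanup before the next Dynamic Client Registration attempt.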
Dustin Healy	261941c05f	🔨 fix: Custom Role Permissions (#12528 ) * fix: Resolve custom role permissions not loading in frontend Users assigned to custom roles (non-USER/ADMIN) had all permission checks fail because AuthContext only fetched system role permissions. The roles map keyed by USER/ADMIN never contained the custom role name, so useHasAccess returned false for every feature gate. - Fetch the user's custom role in AuthContext and include it in the roles map so useHasAccess can resolve permissions correctly - Use encodeURIComponent instead of toLowerCase for role name URLs to preserve custom role casing through the API roundtrip - Only uppercase system role names on the backend GET route; pass custom role names through as-is for exact DB lookup - Allow users to fetch their own assigned role without READ_ROLES capability * refactor: Normalize all role names to uppercase Custom role names were stored in original casing, causing case-sensitivity bugs across the stack — URL lowercasing, route uppercasing, and case-sensitive DB lookups all conflicted for mixed-case custom roles. Enforce uppercase normalization at every boundary: - createRoleByName trims and uppercases the name before storage - createRoleHandler uppercases before passing to createRoleByName - All admin route handlers (get, update, delete, members, permissions) uppercase the :name URL param before DB lookups - addRoleMemberHandler uppercases before setting user.role - Startup migration (normalizeRoleNames) finds non-uppercase custom roles, renames them, and updates affected user.role values with collision detection Legacy GET /api/roles/:roleName retains always-uppercase behavior. Tests updated to expect uppercase role names throughout. * fix: Use case-preserved role names with strict equality Remove uppercase normalization — custom role names are stored and compared exactly as the user sets them, with only trimming applied. USER and ADMIN remain reserved case-insensitively via isSystemRoleName. 
- Remove toUpperCase from createRoleByName, createRoleHandler, and all admin route handlers (get, update, delete, members, permissions) - Remove toUpperCase from legacy GET and PUT routes in roles.js; the frontend now sends exact casing via encodeURIComponent - Remove normalizeRoleNames startup migration - Revert test expectations to original casing * fix: Format useMemo dependency array for Prettier * feat: Add custom role support to admin settings + review fixes - Add backend tests for isOwnRole authorization gate on GET /api/roles/:roleName - Add frontend tests for custom role detection and fetching in AuthContext - Fix transient null permission flash by only spreading custom role once loaded - Add isSystemRoleName helper to data-provider for case-insensitive system role detection - Use sentinel value in useGetRole to avoid ghost cache entry from empty string - Add useListRoles hook and listRoles data service for fetching all roles - Update AdminSettingsDialog and PeoplePickerAdminSettings to dynamically list custom roles in the role dropdown, with proper fallback defaults * fix: Address review findings for custom role permissions - Add assertions to AuthContext test verifying custom role in roles map - Fix empty array bypassing nullish coalescing fallback in role dropdowns - Add null/undefined guard to isSystemRoleName helper - Memoize role dropdown items to avoid unnecessary re-renders - Apply sentinel pattern to useGetRole in admin settings for consistency - Mark ListRolesResponse description as required to match schema * fix: Prevent prototype pollution in role authorization gate - Replace roleDefaults[roleName] with Object.hasOwn to prevent prototype chain bypass for names like constructor or __proto__ - Add dedicated rolesList query key to avoid cache collision when a custom role is named 'list' - Add regression test for prototype property name authorization * fix: Resolve Prettier formatting and unused variable lint errors * fix: Address review findings 
for custom role permissions - Add ADMIN self-read test documenting isOwnRole bypass behavior - Guard save button while custom role data loads to prevent data loss - Extract useRoleSelector hook eliminating ~55 lines of duplication - Unify defaultValues/useEffect permission resolution (fixes inconsistency) - Make ListRolesResponse.description and _id optional to match schema - Fix vacuous test assertions to verify sentinel calls exist - Only fetch userRole when user.role === USER (avoid unnecessary requests) - Remove redundant empty string guard in custom role detection * fix: Revert USER role fetch restriction to preserve admin settings Admins need the USER role loaded in AuthContext.roles so the admin settings dialog shows persisted USER permissions instead of defaults. * fix: Remove unused useEffect import from useRoleSelector * fix: Clean up useRoleSelector hook - Use existing isCustom variable instead of re-calling isSystemRoleName - Remove unused roles and availableRoleNames from return object * fix: Address review findings for custom role permissions - Use Set-based isSystemRoleName to auto-expand with future SystemRoles - Add isCustomRoleError handling: guard useEffect reset and disable Save - Remove resolvePermissions from hook return; use defaultValues in useEffect to eliminate redundant computation and stale-closure reset race - Rename customRoleName to userRoleName in AuthContext for clarity * fix: Request server-max roles for admin dropdown listRoles now passes limit=200 (the server's MAX_PAGE_LIMIT) so the admin role selector shows all roles instead of silently truncating at the default page size of 50. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-04-03 13:24:11 -04:00
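Two of the role checks from the commit above (261941c05f) are easy to misread, so here is a rough sketch of each. It assumes a two-entry `SystemRoles` map and a hypothetical `roleDefaults` lookup table; the real helper lives in data-provider and may differ. The commit message names `Object.hasOwn`; the sketch uses the equivalent `Object.prototype.hasOwnProperty.call` form only so that it compiles on older TypeScript targets.

```typescript
// Hypothetical stand-ins for the real SystemRoles and role defaults.
const SystemRoles = { ADMIN: 'ADMIN', USER: 'USER' } as const;

// A Set built from the role map expands automatically if more system roles are added.
const SYSTEM_ROLE_NAMES = new Set<string>(Object.keys(SystemRoles));

/** Case-insensitive check for reserved system role names, with a null/undefined guard. */
function isSystemRoleName(name: string | null | undefined): boolean {
  if (!name) {
    return false;
  }
  return SYSTEM_ROLE_NAMES.has(name.toUpperCase());
}

// Assumed shape: a plain object keyed by role name.
const roleDefaults: Record<string, object> = { ADMIN: {}, USER: {} };

/**
 * Own-property lookup. A bare roleDefaults[roleName] would return inherited
 * members for names such as 'constructor' and could bypass the gate.
 */
function hasRoleDefaults(roleName: string): boolean {
  return Object.prototype.hasOwnProperty.call(roleDefaults, roleName);
}
```

With these definitions, `isSystemRoleName('admin')` is reserved while a custom role such as `'Editor'` keeps its exact casing and is treated as custom.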
Danny Avila	2e706ebcb3	⚖️ refactor: Split Config Route into Unauthenticated and Authenticated Paths (#12490 ) * refactor: split /api/config into unauthenticated and authenticated response paths - Replace preAuthTenantMiddleware with optionalJwtAuth on the /api/config route so the handler can detect whether the request is authenticated - When unauthenticated: call getAppConfig({ baseOnly: true }) for zero DB queries, return only login-relevant fields (social logins, turnstile, privacy policy / terms of service from interface config) - When authenticated: call getAppConfig({ role, userId, tenantId }) to resolve per-user DB overrides (USER + ROLE + GROUP + PUBLIC principals), return full payload including modelSpecs, balance, webSearch, etc. - Extract buildSharedPayload() and addWebSearchConfig() helpers to avoid duplication between the two code paths - Fixes per-user balance overrides not appearing in the frontend because userId was never passed to getAppConfig (follow-up to #12474) * test: rewrite config route tests for unauthenticated vs authenticated paths - Replace the previously-skipped supertest tests with proper mocked tests - Cover unauthenticated path: baseOnly config call, minimal payload, interface subset (privacyPolicy/termsOfService only), exclusion of authenticated-only fields - Cover authenticated path: getAppConfig called with userId, full payload including modelSpecs/balance/webSearch, per-user balance override merging * fix: address review findings — restore multi-tenant support, improve tests - Chain preAuthTenantMiddleware back before optionalJwtAuth on /api/config so unauthenticated requests in multi-tenant deployments still get tenant-scoped config via X-Tenant-Id header (Finding #1) - Use getAppConfig({ tenantId }) instead of getAppConfig({ baseOnly: true }) when a tenant context is present; fall back to baseOnly for single-tenant - Fix @type annotation: unauthenticated payload is Partial<TStartupConfig> - Refactor addWebSearchConfig into pure 
buildWebSearchConfig that returns a value instead of mutating the payload argument - Hoist isBirthday() to module level - Remove inline narration comments - Assert tenantId propagation in tests, including getTenantId fallback and user.tenantId preference - Add error-path tests for both unauthenticated and authenticated branches - Expand afterEach env var cleanup for proper test isolation * test: fix mock isolation and add tenant-scoped response test - Replace jest.clearAllMocks() with jest.resetAllMocks() so mockReturnValue implementations don't leak between tests - Add test verifying tenant-scoped socialLogins and turnstile are correctly mapped in the unauthenticated response * fix: add optionalJwtAuth to /api/config in experimental.js Without this middleware, req.user is never populated in the experimental cluster entrypoint, so authenticated users always receive the minimal unauthenticated config payload.	2026-03-31 19:22:51 -04:00
Dustin Healy	a4a17ac771	⛩️ feat: Admin Grants API Endpoints (#12438 ) * feat: add System Grants handler factory with tests Handler factory with 4 endpoints: getEffectiveCapabilities (expanded capability set for authenticated user), getPrincipalGrants (list grants for a specific principal), assignGrant, and revokeGrant. Write ops dynamically check MANAGE_ROLES/GROUPS/USERS based on target principal type. 31 unit tests covering happy paths, validation, 403, and errors. * feat: wire System Grants REST routes Mount /api/admin/grants with requireJwtAuth + ACCESS_ADMIN gate. Add barrel export for createAdminGrantsHandlers and AdminGrantsDeps. * fix: cascade grant cleanup on role deletion Add deleteGrantsForPrincipal to AdminRolesDeps and call it in deleteRoleHandler via Promise.allSettled after successful deletion, matching the groups cleanup pattern. 3 tests added for cleanup call, skip on 404, and resilience to cleanup failure. * fix: simplify cascade grant cleanup on role deletion Replace Promise.allSettled wrapper with a direct try/catch for the single deleteGrantsForPrincipal call. 
* fix: harden grant handlers with auth, validation, types, and RESTful revoke - Add per-handler auth checks (401) and granular capability gates (READ_* for getPrincipalGrants, possession check for assignGrant) - Extract validatePrincipal helper; rewrite validateGrantBody to use direct type checks instead of unsafe `as string` casts - Align DI types with data layer (ResolvedPrincipal.principalType widened to string, getUserPrincipals role made optional) - Switch revoke route from DELETE body to RESTful URL params - Return 201 for assignGrant to match roles/groups create convention - Handle null grantCapability return with 500 - Add comprehensive test coverage for new auth/validation paths * fix: deduplicate ResolvedPrincipal, typed body, defensive auth checks - Remove duplicate ResolvedPrincipal from capabilities.ts; import the canonical export from grants.ts - Replace Record<string, unknown> with explicit GrantRequestBody interface - Add defensive 403 when READ_CAPABILITY_BY_TYPE lookup misses - Document revoke asymmetry (no possession check) with JSDoc - Use _id only in resolveUser (avoid Mongoose virtual reliance) - Improve null-grant error message - Complete logger mock in tests * refactor: move ResolvedPrincipal to shared types to fix circular dep Extract ResolvedPrincipal from admin/grants.ts to types/principal.ts so middleware/capabilities.ts imports from shared types rather than depending upward on the admin handler layer. * chore: remove dead re-export, align logger mocks across admin tests - Remove unused ResolvedPrincipal re-export from grants.ts (canonical source is types/principal.ts) - Align logger mocks in roles.spec.ts and groups.spec.ts to include all log levels (error, warn, info, debug) matching grants.spec.ts * fix: cascade Config and AclEntry cleanup on role deletion Add deleteConfig and deleteAclEntries to role deletion cascade, matching the group deletion pattern. 
Previously only grants were cleaned up, leaving orphaned config overrides and ACL entries. * perf: single-query batch for getEffectiveCapabilities Add getCapabilitiesForPrincipals (plural) to the data layer — a single $or query across all principals instead of N+1 parallel queries. Wire it into the grants handler so getEffectiveCapabilities hits the DB once regardless of how many principals the user has. * fix: defer SystemCapabilities access to factory call time Move all SystemCapabilities usage (VALID_CAPABILITIES, MANAGE_CAPABILITY_BY_TYPE, READ_CAPABILITY_BY_TYPE) inside the createAdminGrantsHandlers factory. External test suites that mock @librechat/data-schemas without providing SystemCapabilities crashed at import time when grants.ts was loaded transitively. * test: add data-layer and handler test coverage for review findings - Add 6 mongodb-memory-server tests for getCapabilitiesForPrincipals: multi-principal batch, empty array, filtering, tenant scoping - Add handler test: all principals filtered (only PUBLIC) - Add handler test: granting an implied capability succeeds - Add handler test: all cascade cleanup operations fail simultaneously - Document platform-scope-only tenantId behavior in JSDoc * fix: resolveUser fallback to user.id, early-return empty principals - Match capabilities middleware pattern: _id?.toString() ?? user.id to handle JWT-deserialized users without Mongoose _id - Move empty-array guard before principals.map() to skip unnecessary normalizePrincipalId calls - Add comment explaining VALID_PRINCIPAL_TYPES module-scope asymmetry * refactor: derive VALID_PRINCIPAL_TYPES from capability maps Make MANAGE_CAPABILITY_BY_TYPE and READ_CAPABILITY_BY_TYPE non-Partial Records over a shared GrantPrincipalType union, then derive VALID_PRINCIPAL_TYPES from the map keys. This makes divergence between the three data structures structurally impossible. 
* feat: add GET /api/admin/grants list-all-grants endpoint Add listAllGrants data-layer method and handler so the admin panel can fetch all grants in a single request instead of fanning out N+M calls per role and group. Response is filtered to only include grants for principal types the caller has read access to. * fix: update principalType to use GrantPrincipalType for consistency in grants handling - Refactor principalType in createAdminGrantsHandlers to use GrantPrincipalType instead of PrincipalType for better type accuracy. - Ensure type consistency across the grants handling logic in the API. * fix: address admin grants review findings — tenantId propagation, capability validation, pagination, and test coverage Propagate tenantId through all grant operations for multi-tenancy support. Extract isValidCapability to accept full SystemCapability union (base, section, assign) and reuse it in both Mongoose schema validation and handler input checks. Replace listAllGrants with paginated listGrants + countGrants. Filter PUBLIC principals from getCapabilitiesForPrincipals queries. Export getCachedPrincipals from ALS store for fast-path principal resolution. Move DELETE capability param to query string to avoid colon-in-URL issues. Remove dead code and add comprehensive handler and data-layer test coverage. * refactor: harden admin grants — FilterQuery types, auth-first ordering, DELETE path param, isValidCapability tests Replace Record<string, unknown> with FilterQuery<ISystemGrant> across all data-layer query filters. Refactor buildTenantFilter to a pure tenantCondition function that returns a composable FilterQuery fragment, eliminating the $or collision between tenant and principal queries. Move auth check before input validation in getPrincipalGrantsHandler, assignGrantHandler, and revokeGrantHandler to avoid leaking valid type names to unauthenticated callers. 
Switch DELETE route from query param back to path param (/:capability) with encodeURIComponent per project conventions. Add compound index for listGrants sort. Type VALID_PRINCIPAL_TYPES as Set<GrantPrincipalType>. Remove unused GetCachedPrincipalsFn type export. Add dedicated isValidCapability unit tests and revokeGrant idempotency test. * refactor: batch capability checks in listGrantsHandler via getHeldCapabilities Replace 3 parallel hasCapabilityForPrincipals DB calls with a single getHeldCapabilities query that returns the subset of capabilities any principal holds. Also: defensive limit(0) clamp, parallelized assignGrant auth checks, principalId type-vs-required error split, tenantCondition hoisted to factory top, JSDoc on cascade deps, DELETE route encoding note. * fix: normalize principalId and filter undefined in getHeldCapabilities Add normalizePrincipalId + null guard to getHeldCapabilities, matching the contract of getCapabilitiesForPrincipals. Simplify allCaps build with flatMap, add no-tenantId cross-check and undefined-principalId test cases. * refactor: use concrete types in GrantRequestBody, rename encoding test Replace unknown fields with explicit string types in GrantRequestBody, matching the established pattern in roles/groups/config handlers. Rename misleading 'encoded' test to 'with colons' since Express auto-decodes req.params. * fix: support hierarchical parent capabilities in possession checks hasCapabilityForPrincipals and getHeldCapabilities now resolve parent base capabilities for section/assignment grants. An admin holding manage:configs can now grant manage:configs:<section> and transitively read:configs:<section>. Fixes anti-escalation 403 blocking config capability delegation. * perf: use getHeldCapabilities in assignGrant to halve DB round-trips assignGrantHandler was making two parallel hasCapabilityForPrincipals calls to check manage + capability possession. 
getHeldCapabilities was introduced in this PR specifically for this pattern. Replace with a single batched call. Update corresponding spec assertions. * fix: validate role existence before granting capabilities Grants for non-existent role names were silently persisted, creating orphaned grants that could surprise-activate if a role with that name was later created. Add optional checkRoleExists dep to assignGrant and wire it to getRoleByName in the route file. * refactor: tighten principalType typing and use grantCapability in tests Narrow getCapabilitiesForPrincipals parameter from string to PrincipalType, removing the redundant cast. Replace direct SystemGrant.create() calls in getCapabilitiesForPrincipals tests with methods.grantCapability() to honor the schema's normalization invariant. Add getHeldCapabilities extended capability tests. * test: rename misleading cascade cleanup test name The test only injects failure into deleteGrantsForPrincipal, not all cascade operations. Rename from 'cascade cleanup fails' to 'grant cleanup fails' to match the actual scope. * fix: reorder role check after permission guard, add tenantId to index Move checkRoleExists after the getHeldCapabilities permission check so that a sub-MANAGE_ROLES admin cannot probe role name existence via 400 vs 403 response codes. Add tenantId to the { principalType, capability } index so listGrants queries in multi-tenant deployments can use a covering index instead of post-scanning for tenant condition. Add missing test for checkRoleExists throwing. * fix: scope deleteGrantsForPrincipal to tenant on role deletion deleteGrantsForPrincipal previously filtered only on principalType + principalId, deleting grants across all tenants. Since the role schema supports multi-tenancy (compound unique index on name + tenantId), two tenants can share a role name like 'editor'. Deleting that role in one tenant would wipe grants for identically-named roles in other tenants. 
Add optional tenantId parameter to deleteGrantsForPrincipal. When provided, scopes the delete to that tenant plus platform-level grants. Propagate req.user.tenantId through the role deletion cascade. * fix: scope grant cleanup to tenant on group deletion Same cross-tenant gap as the role deletion path: deleteGroupHandler called deleteGrantsForPrincipal without tenantId, so deleting a group would wipe its grants across all tenants. Extract req.user.tenantId and pass it through. * test: add HTTP integration test for admin grants routes Supertest-based test with real MongoMemoryServer exercising the full Express wiring: route registration, injected auth middleware, handler DI deps, and real DB round-trips. Covers GET /, GET /effective, POST / + DELETE / lifecycle, role existence validation, and 401 for unauthenticated callers. Also documents the expandImplications scope: the /effective endpoint returns base-level capabilities only; section-level resolution is handled at authorization check time by getParentCapabilities. * fix: use exact tenant match in deleteGrantsForPrincipal, normalize principalId, harden API CRITICAL: deleteGrantsForPrincipal was using tenantCondition (a read-query helper) for deleteMany, which includes the { tenantId: { $exists: false } } arm. This silently destroyed platform-level grants when a tenant-scoped role/group deletion occurred. Replace with exact { tenantId } match for deletes so platform-level grants survive tenant-scoped cascade cleanup. Refactor deleteGrantsForPrincipal signature from fragile positional overload (sessionOrTenantId union + maybeSession) to a clean options object: { tenantId?, session? }. Update all callers and test assertions. Add normalizePrincipalId to hasCapabilityForPrincipals to match the pattern already used by getHeldCapabilities — prevents string/ObjectId type mismatch on USER/GROUP principal queries. 
Also: export GrantPrincipalType from barrel, add upper-bound cap to listGrants, document GROUP/USER existence check trade-off, add integration tests for tenant-isolation property of deleteGrantsForPrincipal. * fix: forward tenantId to getUserPrincipals in resolvePrincipals resolvePrincipals had tenantId available from the caller but only forwarded it to getCachedPrincipals (cache lookup). The DB fallback via getUserPrincipals omitted it. While the Group schema's applyTenantIsolation Mongoose plugin handles scoping via AsyncLocalStorage in HTTP request context, explicitly passing tenantId makes the contract visible and prevents silent cross-tenant group resolution if called outside request context. * fix: remove unused import and add assertion to 401 integration test Remove unused SystemCapabilities import flagged by ESLint. Add explicit body assertion to the 401 test so it has a jest expect() call. * chore: hoist grant limit constants to scope, remove dead isolateModules Move GRANTS_DEFAULT_LIMIT / GRANTS_MAX_LIMIT from inside listGrants function body to createSystemGrantMethods scope so they are evaluated once at module load. Remove dead jest.isolateModules + jest.doMock block in integration test — the ~/models mock was never exercised since handlers are built with explicit DI deps. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-30 16:49:23 -04:00
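The tenant-isolation fix in the commit above (a4a17ac771) comes down to using different filters for reads and deletes. The sketch below illustrates that distinction with plain objects in MongoDB query syntax. `Filter`, `buildDeleteFilter`, and the no-tenant branch of `tenantCondition` are assumptions for illustration, not the data-schemas implementation.

```typescript
// Plain-object stand-in for a Mongoose FilterQuery; assumed for illustration.
type Filter = Record<string, unknown>;

/**
 * Read-side fragment: a tenant sees its own grants plus platform-level grants
 * (documents with no tenantId). The behavior without a tenantId is an assumption.
 */
function tenantCondition(tenantId?: string): Filter {
  const platformLevel = { tenantId: { $exists: false } };
  if (tenantId == null) {
    return platformLevel;
  }
  return { $or: [{ tenantId }, platformLevel] };
}

/**
 * Delete-side filter: exact tenant match only, so a tenant-scoped cascade
 * cannot remove platform-level grants or another tenant's grants.
 */
function buildDeleteFilter(
  principalType: string,
  principalId: string,
  options: { tenantId?: string } = {},
): Filter {
  const filter: Filter = { principalType, principalId };
  if (options.tenantId != null) {
    filter.tenantId = options.tenantId;
  }
  return filter;
}
```

Reusing the read fragment for `deleteMany` would bring the `$exists: false` arm along with it, which is the failure mode the commit describes.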
Danny Avila	935288f841	🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435) * feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring - Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains` - Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source` field to `ParsedServerConfig` - Change `dbSourced` determination from `!!config.dbId` to `config.source === 'user'` across MCPManager, ConnectionsRepository, UserConnectionManager, and MCPServerInspector - Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB * feat: three-layer MCPServersRegistry with config cache and lazy init - Add `configCacheRepo` as third repository layer between YAML cache and DB for admin-defined config-source MCP servers - Implement `ensureConfigServers()` that identifies config-override servers from resolved `getAppConfig()` mcpConfig, lazily inspects them, and caches parsed configs with `source: 'config'` - Add `lazyInitConfigServer()` with timeout, stub-on-failure, and concurrent-init deduplication via `pendingConfigInits` map - Extend `getAllServerConfigs()` with optional `configServers` param for three-way merge: YAML → Config → User - Add `getServerConfig()` lookup through config cache layer - Add `invalidateConfigCache()` for clearing config-source inspection results on admin config mutations - Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on DB-stored servers in `addServer()` and `addServerStub()` * feat: wire tenant context into MCP controllers, services, and cache invalidation - Resolve config-source servers via `getAppConfig({ role, tenantId })` in `getMCPTools()` and `getMCPServersList()` controllers - Pass `ensureConfigServers()` results through `getAllServerConfigs()` for three-way merge of YAML + Config + User servers - Add tenant/role context to `getMCPSetupData()` and connection status 
routes via `getTenantId()` from ALS - Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin config mutations trigger re-inspection of config-source MCP servers * feat: enforce tenantMcpPolicy on admin config mcpServers mutations - Add `validateMcpServerPolicy()` helper that checks mcpServers against operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant, allowedTransports, allowedDomains) - Wire validation into `upsertConfigOverrides` and `patchConfigField` handlers — rejects with 403 when policy is violated - Infer transport type from config shape (command → stdio, url protocol → websocket/sse, type field → streamable-http) - Validate server domains against policy allowlist when configured * revert: remove tenantMcpPolicy schema and enforcement The existing admin config CRUD routes already provide the mechanism for granular MCP server prepopulation (groups, roles, users). The tenantMcpPolicy gating adds unnecessary complexity that can be revisited if needed in the future. - Remove tenantMcpPolicy from mcpSettings Zod schema - Remove validateMcpServerPolicy helper and TenantMcpPolicy interface - Remove policy enforcement from upsertConfigOverrides and patchConfigField handlers * test: update test assertions for source field and config-server wiring - Use objectContaining in MCPServersRegistry reset test to account for new source: 'yaml' field on CACHE-stored configs - Add getTenantId and ensureConfigServers mocks to MCP route tests - Add getAppConfig mock to route test Config service mock - Update getMCPSetupData assertion to expect second options argument - Update getAllServerConfigs assertions for new configServers parameter * fix: disconnect active connections when config-source servers are evicted When admin config overrides change and config-source MCP servers are removed, the invalidation now proactively disconnects active connections for evicted servers instead of leaving them lingering until timeout. 
- Return evicted server names from invalidateConfigCache() - Disconnect app-level connections for evicted servers in clearMcpConfigCache() via MCPManager.appConnections.disconnect() * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Scope configCacheRepo keys by config content hash to prevent cross-tenant cache poisoning when two tenants define the same server name with different configurations - Change dbSourced checks from `source === 'user'` to `source !== 'yaml' && source !== 'config'` so undefined source (pre-upgrade cached configs) fails closed to restricted mode MAJOR fixes: - Derive OAuth servers from already-computed mcpConfig instead of calling getOAuthServers() separately — config-source OAuth servers are now properly detected - Add parseInt radix (10) and NaN guard with fallback to 30_000 for CONFIG_SERVER_INIT_TIMEOUT_MS - Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Remove `if (role || tenantId)` guard in getMCPSetupData — config servers now always resolve regardless of tenant context MINOR fixes: - Extract resolveAllMcpConfigs() helper in mcp controller to eliminate 3x copy-pasted config resolution boilerplate - Distinguish "not initialized" from real errors in clearMcpConfigCache — log actual failures instead of swallowing - Remove narrative inline comments per style guide - Remove dead try/catch inside Promise.allSettled in ensureConfigServers (inner method never throws) - Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll() calls per request Test updates: - Add ensureConfigServers mock to registry test fixtures - Update getMCPSetupData assertions for inline OAuth derivation * fix: address code review findings (CRITICAL, MAJOR, MINOR) CRITICAL fixes: - Break circular dependency: move CONFIG_CACHE_NAMESPACE from MCPServersRegistry to ServerConfigsCacheFactory - Fix dbSourced fail-closed: use source field when present, fall back to 
legacy dbId check when absent (backward-compatible with pre-upgrade cached configs that lack source field) MAJOR fixes: - Add CONFIG_CACHE_NAMESPACE to aggregate-key set in ServerConfigsCacheFactory to avoid SCAN-based Redis stalls - Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests) covering lazy init, stub-on-failure, cross-tenant isolation via config hash keys, concurrent deduplication, merge order, and cache invalidation MINOR fixes: - Update MCPServerInspector test assertion for dbSourced change * fix: restore getServerConfig lookup for config-source servers (NEW-1) Add configNameToKey map that indexes server name → hash-based cache key for O(1) lookup by name in getServerConfig. This restores the config cache layer that was dropped when hash-based keys were introduced. Without this fix, config-source servers appeared in tool listings (via getAllServerConfigs) but getServerConfig returned undefined, breaking all connection and tool call paths. - Populate configNameToKey in ensureSingleConfigServer - Clear configNameToKey in invalidateConfigCache and reset - Clear stale read-through cache entries after lazy init - Remove dead code in invalidateConfigCache (config.title, key parsing) - Add getServerConfig tests for config-source server lookup * fix: eliminate configNameToKey race via caller-provided configServers param Replace the process-global configNameToKey map (last-writer-wins under concurrent multi-tenant load) with a configServers parameter on getServerConfig. Callers pass the pre-resolved config servers map directly — no shared mutable state, no cross-tenant race. 
- Add optional configServers param to getServerConfig; when provided, returns matching config directly without any global lookup - Remove configNameToKey map entirely (was the source of the race) - Extract server names from cache keys via lastIndexOf in invalidateConfigCache (safe for names containing colons) - Use mcpConfig[serverName] directly in getMCPTools instead of a redundant getServerConfig call - Add cross-tenant isolation test for getServerConfig * fix: populate read-through cache after config server lazy init After lazyInitConfigServer succeeds, write the parsed config to readThroughCache keyed by serverName so that getServerConfig calls from ConnectionsRepository, UserConnectionManager, and MCPManager.callTool find the config without needing configServers. Without this, config-source servers appeared in tool listings but every connection attempt and tool call returned undefined. * fix: user-scoped getServerConfig fallback to server-only cache key When getServerConfig is called with a userId (e.g., from callTool or UserConnectionManager), the cache key is serverName::userId. Config-source servers are cached under the server-only key (no userId). Add a fallback so user-scoped lookups find config-source servers in the read-through cache. * fix: configCacheRepo fallback, isUserSourced DRY, cross-process race CRITICAL: Add findInConfigCache fallback in getServerConfig so config-source servers remain reachable after readThroughCache TTL expires (5s). Without this, every tool call after 5s returned undefined for config-source servers. MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace all 5 inline dbSourced ternary expressions (MCPManager x2, ConnectionsRepository, UserConnectionManager, MCPServerInspector). MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when configCacheRepo.add throws (key exists from another process), fall back to reading the existing entry instead of returning undefined. 
MINOR: Parallelize invalidateConfigCache awaits with Promise.all. Remove redundant .catch(() => {}) inside Promise.allSettled. Tighten dedup test assertion to toBe(1). Add TTL-expiry tests for getServerConfig (with and without userId). * feat: thread configServers through getAppToolFunctions and formatInstructionsForContext Add optional configServers parameter to getAppToolFunctions, getInstructions, and formatInstructionsForContext so config-source server tools and instructions are visible to agent initialization and context injection paths. Existing callers (boot-time init, tests) pass no argument and continue to work unchanged. Agent runtime paths can now thread resolved config servers from request context. * fix: stale failure stubs retry after 5 min, upsert for cross-process races - Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried instead of permanently disabling config-source servers after transient errors (DNS outage, cold-start race) - Extract upsertConfigCache() helper that tries add then falls back to update, preventing cross-process Redis races where a second instance's successful inspection result was discarded - Add test for stale-stub retry after CONFIG_STUB_RETRY_MS * fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup - Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs were always considered stale (updatedAt ?? 
0 → epoch → always expired) - Add null guard for rawConfig in MCPManager.callTool before passing to preProcessGraphTokens — prevents unsafe `as` cast on undefined - Log double-failure in upsertConfigCache instead of silently swallowing - Replace module-scope Date.now monkey-patch with jest.useFakeTimers / jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests * fix: server-only readThrough fallback only returns truthy values Prevents a cached undefined from a prior no-userId lookup from short-circuiting the DB query on a subsequent userId-scoped lookup. * fix: remove findInConfigCache to eliminate cross-tenant config leakage The findInConfigCache prefix scan (serverName:) could return any tenant's config after readThrough TTL expires, violating tenant isolation. Config-source servers are now ONLY resolvable through: 1. The configServers param (callers with tenant context from ALS) 2. The readThrough cache (populated by ensureSingleConfigServer, 5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs) Connection/tool-call paths without tenant context rely exclusively on the readThrough cache. If it expires before the next HTTP request repopulates it, the server is not found — which is correct because there is no tenant context to determine which config to return. - Remove findInConfigCache method and its call in getServerConfig - Update server-only readThrough fallback to only return truthy values (prevents cached undefined from short-circuiting user-scoped DB lookup) - Update tests to document tenant isolation behavior after cache expiry style: fix import order per AGENTS.md conventions Sort package imports shortest-to-longest, local imports longest-to-shortest across MCPServersRegistry, ConnectionsRepository, MCPManager, UserConnectionManager, and MCPServerInspector. 
* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures Thread pre-resolved serverConfig from tool creation context into callTool, removing dependency on the readThrough cache for config-source servers. This fixes two issues: - Cross-tenant contamination: the readThrough cache key was unscoped (just serverName), so concurrent multi-tenant requests for same-named servers would overwrite each other's entries - TTL expiry: tool calls happening >5s after config resolution would fail with "Configuration not found" because the readThrough entry had expired Changes: - Add optional serverConfig param to MCPManager.callTool — uses provided config directly, falling back to getServerConfig lookup for YAML/user servers - Thread serverConfig from createMCPTool through createToolInstance closure to callTool - Remove readThrough write from ensureSingleConfigServer — config-source servers are only accessible via configServers param (tenant-scoped) - Remove server-only readThrough fallback from getServerConfig - Increase config cache hash from 8 to 16 hex chars (64-bit) - Add isUserSourced boundary tests for all source/dbId combinations - Fix double Object.keys call in getMCPTools controller - Update test assertions for new getServerConfig behavior * fix: cache base configs for config-server users; narrow upsertConfigCache error handling - Refactor getAllServerConfigs to separate base config fetch (YAML + DB) from config-server layering. 
Base configs are cached via readThroughCacheAll regardless of whether configServers is provided, eliminating uncached MongoDB queries per request for config-server users - Narrow upsertConfigCache catch to duplicate-key errors only; infrastructure errors (Redis timeouts, network failures) now propagate instead of being silently swallowed, preventing inspection storms during outages * fix: restore correct merge order and document upsert error matching - Restore YAML → Config → User DB precedence in getAllServerConfigs (user DB servers have highest precedence, matching the JSDoc contract) - Add source comment on upsertConfigCache duplicate-key detection linking to the two cache implementations that define the error message * feat: complete config-source server support across all execution paths Wire configServers through the entire agent execution pipeline so config-source MCP servers are fully functional — not just visible in listings but executable in agent sessions. - Thread configServers into handleTools.js agent tool pipeline: resolve config servers from tenant context before MCP tool iteration, pass to getServerConfig, createMCPTools, and createMCPTool - Thread configServers into agent instructions pipeline: applyContextToAgent → getMCPInstructionsForServers → formatInstructionsForContext, resolved in client.js before agent context application - Add configServers param to createMCPTool and createMCPTools for reconnect path fallback - Add source field to redactServerSecrets allowlist for client UI differentiation of server tiers - Narrow invalidateConfigCache to only clear readThroughCacheAll (merged results), preserving YAML individual-server readThrough entries - Update context.spec.ts assertions for new configServers parameter * fix: add missing mocks for config-source server dependencies in client.test.js Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added to client.js but not reflected in the test file's jest.mock declarations. 
* fix: update formatInstructionsForContext assertions for configServers param The test assertions expected formatInstructionsForContext to be called with only the server names array, but it now receives configServers as a second argument after the config-source server feature wiring. * fix: move configServers resolution before MCP tool loop to avoid TDZ configServers was declared with `let` after the first tool loop but referenced inside it via getServerConfig(), causing a ReferenceError temporal dead zone. Move declaration and resolution before the loop, using tools.some(mcpToolPattern) to gate the async resolution. * fix: address review findings — cache bypass, discoverServerTools gap, DRY - #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via readThroughCacheAll) instead of bypassing it when configServers is present. Extracts user-DB entries from cached base by diffing against YAML keys to maintain YAML → Config → User DB merge order without extra MongoDB calls. - #3: Add configServers param to ToolDiscoveryOptions and thread it through discoverServerTools → getServerConfig so config-source servers are discoverable during OAuth reconnection flows. - #6: Replace inline import() type annotations in context.ts with proper import type { ParsedServerConfig } per AGENTS.md conventions. - #7: Extract resolveConfigServers(req) helper in MCP.js and use it from handleTools.js and client.js, eliminating the duplicated 6-line config resolution pattern. - #10: Restore removed "why" comment explaining getLoaded() vs getAll() choice in getMCPSetupData — documents non-obvious correctness constraint. - #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs. 
* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case - Merge duplicate @librechat/data-schemas requires in MCP.js into one - Move resolveConfigServers after module-level constants - Fix getAllServerConfigs edge case where user-DB entry overriding a YAML entry with the same name was excluded from userDbConfigs; now uses reference equality check to detect DB-overwritten YAML keys * fix: replace fragile string-match error detection with proper upsert method Add upsert() to IServerConfigsRepositoryInterface and all implementations (InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle error message string match ('already exists in cache') in upsertConfigCache that was the only thing preventing cross-process init races from silently discarding inspection results. Each implementation handles add-or-update atomically: - InMemory: direct Map.set() - Redis: direct cache.set() - RedisAggregateKey: read-modify-write under write lock - DB: delegates to update() (DB servers use explicit add() with ACL setup) * fix: wire configServers through remaining HTTP endpoints - getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig - reinitialize route: resolve configServers before getServerConfig - auth-values route: resolve configServers before getServerConfig - getOAuthHeaders: accept configServers param, thread from callers - Update mcp.spec.js tests to mock getAllServerConfigs for GET by name * fix: thread serverConfig through getConnection for config-source servers Config-source servers exist only in configCacheRepo, not in YAML cache or DB. When callTool → getConnection → getUserConnection → getServerConfig runs without configServers, it returns undefined and throws. Fix by threading the pre-resolved serverConfig (providedConfig) from callTool through getConnection → getUserConnection → createUserConnectionInternal, using it as a fallback before the registry lookup. 
* fix: thread configServers through reinit, reconnect, and tool definition paths Wire configServers through every remaining call chain that creates or reconnects MCP server connections: - reinitMCPServer: accepts serverConfig and configServers, uses them for getServerConfig fallback, getConnection, and discoverServerTools - reconnectServer: accepts and passes configServers to reinitMCPServer - createMCPTools/createMCPTool: pass configServers to reconnectServer - ToolService.loadToolDefinitionsWrapper: resolves configServers from req, passes to both reinitMCPServer call sites - reinitialize route: passes serverConfig and configServers to reinitMCPServer * fix: address review findings — simplify merge, harden error paths, fix log labels - Simplify getAllServerConfigs merge: replace fragile reference-equality loop with direct spread { ...yamlConfigs, ...configServers, ...base } - Guard upsertConfigCache in lazyInitConfigServer catch block so cache failures don't mask the original inspection error - Deduplicate getYamlServerNames cold-start with promise dedup pattern - Remove dead `if (!mcpConfig)` guard in getMCPSetupData - Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error messages — now uses this.namespace for correct Config/App labeling - Remove misleading OAuth callback comment about readThrough cache - Move resolveConfigServers after module-level constants in MCP.js * fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label - Clear yamlServerNamesPromise on rejection so transient cache errors don't permanently prevent ensureConfigServers from working - Skip reinspectServer for config-source servers (source: 'config') in reinitMCPServer — they lack a CACHE/DB storage location; retry is handled by CONFIG_STUB_RETRY_MS in ensureConfigServers - Use source field instead of dbId for storageLocation derivation - Fix remaining hardcoded "App" in reset() leaderCheck message * fix: persist oauthHeaders in flow state 
for config-source OAuth servers The OAuth callback route has no JWT auth context and cannot resolve config-source server configs. Previously, getOAuthHeaders would silently return {} for config-source servers, dropping custom token exchange headers. Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow initiation (which has auth context), and the callback reads them from the stored flow state with a fallback to the registry lookup for YAML/user-DB servers. * fix: update tests for getMCPSetupData null guard removal and ToolService mock - MCP.spec.js: update test to expect graceful handling of null mcpConfig instead of a throw (getAllServerConfigs always returns an object) - MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of null from test mocks - ToolService.spec.js: add missing mock for ~/server/services/MCP (resolveConfigServers) * fix: address review findings — DRY, naming, logging, dead code, defensive guards - #1: Simplify getAllServerConfigs to single getBaseServerConfigs call, eliminating redundant double-fetch of cacheConfigsRepo.getAll() - #2: Add warning log when oauthHeaders absent from OAuth callback flow state - #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller imports shared helper instead of reimplementing - #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider in createToolInstance — these are actively used, not unused - #5: Log rejected results from ensureConfigServers Promise.allSettled so cache errors are visible instead of silently dropped - #6: Remove dead 'MCP config not found' error handlers from routes - #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache - #8: Remove logger.error from withTimeout to prevent double-logging timeouts - #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message - #12: Use spread instead of mutation in addServer for immutability consistency - Add upsert mock to 
ensureConfigServers.test.ts DB mock - Update route tests for resolveAllMcpConfigs import change * fix: restore correct merge priority, use immutable spread, fix test mock - getAllServerConfigs: { ...configServers, ...base } so userDB wins over configServers, matching documented "User DB (highest)" priority - lazyInitConfigServer: use immutable spread instead of direct mutation for parsedConfig.source, consistent with addServer fix - Fix test to mock getAllServerConfigs as {} instead of null, remove unnecessary || {} defensive guard in getMCPSetupData * fix: error handling, stable hashing, flatten nesting, remove dead param - Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with graceful {} fallback so transient DB/cache errors don't crash tool pipeline - Sort keys in configCacheKey JSON.stringify for deterministic hashing regardless of object property insertion order - Flatten clearMcpConfigCache from 3 nested try-catch to early returns; document that user connections are cleaned up lazily (accepted tradeoff) - Remove dead configServers param from getAppToolFunctions (never passed) - Add security rationale comment for source field in redactServerSecrets * fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision The array replacer in JSON.stringify acts as a property allowlist at every nesting depth, silently dropping nested keys like headers['X-API-Key'], oauth.client_secret, etc. Two configs with different nested values but identical top-level structure produced the same hash, causing cross-tenant cache hits and potential credential contamination. Switch to a function replacer that recursively sorts keys at all depths without dropping any properties. Also document the known gap in getOAuthServers: config-source OAuth servers are not covered by auto-reconnection or uninstall cleanup because callers lack request context. 
* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency The function only depends on MCPServersRegistry and MCPManager, both of which live in packages/api. Import it directly from @librechat/api in the CJS layer instead of using dynamic require('~/config'). * chore: imports/fields ordering * fix: address review findings — error handling, targeted lookup, test gaps - Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so getAppConfig/getAllServerConfigs failures propagate instead of masking infrastructure errors as empty server lists. - Use targeted getServerConfig in getMCPServerById instead of fetching all server configs for a single-server lookup. - Forward configServers to inner createMCPTool calls so reconnect path works for config-source servers. - Update getAllServerConfigs JSDoc to document disjoint-key design. - Add OAuth callback oauthHeaders fallback tests (flow state present vs registry fallback). - Add resolveConfigServers/resolveAllMcpConfigs unit tests covering happy path and error propagation. * fix: add getOAuthReconnectionManager mock to OAuth callback tests * chore: imports ordering	2026-03-28 10:36:43 -04:00
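The `configCacheKey` fix in the commit above (array replacer versus recursive key-sorting function replacer) can be reproduced in isolation. The sketch below assumes nothing about LibreChat's real implementation: `sortKeysReplacer`, `stableStringify`, and the two sample configs are hypothetical names and data, and hashing is left out so that the serialized strings can be compared directly.

```typescript
// Function replacer: returns a copy of each plain object with its keys sorted.
// JSON.stringify then recurses into the copy, so nested objects are sorted too
// and no property is dropped.
function sortKeysReplacer(_key: string, value: unknown): unknown {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = source[key];
    }
    return sorted;
  }
  return value;
}

function stableStringify(config: unknown): string {
  return JSON.stringify(config, sortKeysReplacer);
}

// Two hypothetical tenant configs: same server URL, different nested credentials,
// different property insertion order.
const tenantA = { url: 'https://mcp.example.com', headers: { 'X-API-Key': 'key-a' } };
const tenantB = { headers: { 'X-API-Key': 'key-b' }, url: 'https://mcp.example.com' };

// Array replacer: the top-level key list acts as an allowlist at every depth,
// so 'X-API-Key' is dropped and both configs serialize identically.
const lossyA = JSON.stringify(tenantA, Object.keys(tenantA).sort());
const lossyB = JSON.stringify(tenantB, Object.keys(tenantB).sort());

// Function replacer: nested values are kept, so the two configs stay distinct,
// while insertion order alone no longer changes the output.
const stableA = stableStringify(tenantA);
const stableB = stableStringify(tenantB);
const reordered = stableStringify({ b: 1, a: { d: 2, c: 3 } });
```

Here `lossyA === lossyB` even though the API keys differ, which is the collision the commit describes; `stableA` and `stableB` differ, and `reordered` matches the serialization of the same object written with its keys in sorted order.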
Danny Avila	67db0c1cb3	🗑️ chore: Remove Action Test Suite and Update Mock Implementations (#12268) - Deleted the Action test suite located in `api/models/Action.spec.js` to streamline the codebase. - Updated various test files to reflect changes in model mocks, consolidating mock implementations for user-related actions and enhancing clarity. - Improved consistency in test setups by aligning with the latest model updates and removing redundant mock definitions.	2026-03-21 14:28:55 -04:00
Danny Avila	8ba2bde5c1	📦 refactor: Consolidate DB models, encapsulating Mongoose usage in `data-schemas` (#11830 ) * chore: move database model methods to /packages/data-schemas * chore: add TypeScript ESLint rule to warn on unused variables * refactor: model imports to streamline access - Consolidated model imports across various files to improve code organization and reduce redundancy. - Updated imports for models such as Assistant, Message, Conversation, and others to a unified import path. - Adjusted middleware and service files to reflect the new import structure, ensuring functionality remains intact. - Enhanced test files to align with the new import paths, maintaining test coverage and integrity. * chore: migrate database models to packages/data-schemas and refactor all direct Mongoose Model usage outside of data-schemas * test: update agent model mocks in unit tests - Added `getAgent` mock to `client.test.js` to enhance test coverage for agent-related functionality. - Removed redundant `getAgent` and `getAgents` mocks from `openai.spec.js` and `responses.unit.spec.js` to streamline test setup and reduce duplication. - Ensured consistency in agent mock implementations across test files. * fix: update types in data-schemas * refactor: enhance type definitions in transaction and spending methods - Updated type definitions in `checkBalance.ts` to use specific request and response types. - Refined `spendTokens.ts` to utilize a new `SpendTxData` interface for better clarity and type safety. - Improved transaction handling in `transaction.ts` by introducing `TransactionResult` and `TxData` interfaces, ensuring consistent data structures across methods. - Adjusted unit tests in `transaction.spec.ts` to accommodate new type definitions and enhance robustness. * refactor: streamline model imports and enhance code organization - Consolidated model imports across various controllers and services to a unified import path, improving code clarity and reducing redundancy. 
- Updated multiple files to reflect the new import structure, ensuring all functionalities remain intact. - Enhanced overall code organization by removing duplicate import statements and optimizing the usage of model methods. * feat: implement loadAddedAgent and refactor agent loading logic - Introduced `loadAddedAgent` function to handle loading agents from added conversations, supporting multi-convo parallel execution. - Created a new `load.ts` file to encapsulate agent loading functionalities, including `loadEphemeralAgent` and `loadAgent`. - Updated the `index.ts` file to export the new `load` module instead of the deprecated `loadAgent`. - Enhanced type definitions and improved error handling in the agent loading process. - Adjusted unit tests to reflect changes in the agent loading structure and ensure comprehensive coverage. * refactor: enhance balance handling with new update interface - Introduced `IBalanceUpdate` interface to streamline balance update operations across the codebase. - Updated `upsertBalanceFields` method signatures in `balance.ts`, `transaction.ts`, and related tests to utilize the new interface for improved type safety. - Adjusted type imports in `balance.spec.ts` to include `IBalanceUpdate`, ensuring consistency in balance management functionalities. - Enhanced overall code clarity and maintainability by refining type definitions related to balance operations. * feat: add unit tests for loadAgent functionality and enhance agent loading logic - Introduced comprehensive unit tests for the `loadAgent` function, covering various scenarios including null and empty agent IDs, loading of ephemeral agents, and permission checks. - Enhanced the `initializeClient` function by moving `getConvoFiles` to the correct position in the database method exports, ensuring proper functionality. - Improved test coverage for agent loading, including handling of non-existent agents and user permissions. 
* chore: reorder memory method exports for consistency - Moved `deleteAllUserMemories` to the correct position in the exported memory methods, ensuring a consistent and logical order of method exports in `memory.ts`.	2026-03-21 14:28:53 -04:00
Danny Avila	35a35dc2e9	📏 refactor: Add File Size Limits to Conversation Imports (#12221) * fix: add file size limits to conversation import multer instance * fix: address review findings for conversation import file size limits * fix: use local jest.mock for data-schemas instead of global moduleNameMapper The global @librechat/data-schemas mock in jest.config.js only provided logger, breaking all tests that depend on createModels from the same package. Replace with a virtual jest.mock scoped to the import spec file. * fix: move import to top of file, pre-compute upload middleware, assert logger.warn in tests * refactor: move resolveImportMaxFileSize to packages/api New backend logic belongs in packages/api as TypeScript. Delete the api/server/utils/import/limits.js wrapper and import directly from @librechat/api in convos.js and importConversations.js. Resolver unit tests move to packages/api; the api/ spec retains only multer behavior tests. * chore: rename importLimits to import * fix: stale type reference and mock isolation in import tests Update typeof import path from '../importLimits' to '../import' after the rename. Clear mockLogger.warn in beforeEach to prevent cross-test accumulation. * fix: add resolveImportMaxFileSize to @librechat/api mock in convos.spec.js * fix: resolve jest.mock hoisting issue in import tests jest.mock factories are hoisted above const declarations, so the mockLogger reference was undefined at factory evaluation time. Use a direct import of the mocked logger module instead. * fix: remove virtual flag from data-schemas mock for CI compatibility virtual: true prevents the mock from intercepting the real module in CI where @librechat/data-schemas is built, causing import.ts to use the real logger while the test asserts against the mock.	2026-03-14 03:06:29 -04:00
Danny Avila	189cdf581d	🔐 fix: Add User Filter to Message Deletion (#12220 ) * fix: add user filter to message deletion to prevent IDOR * refactor: streamline DELETE request syntax in messages-delete test - Simplified the DELETE request syntax in the messages-delete.spec.js test file by combining multiple lines into a single line for improved readability. This change enhances the clarity of the test code without altering its functionality. * fix: address review findings for message deletion IDOR fix * fix: add user filter to message deletion in conversation tests - Included a user filter in the message deletion test to ensure proper handling of user-specific deletions, enhancing the accuracy of the test case and preventing potential IDOR vulnerabilities. * chore: lint	2026-03-13 23:42:37 -04:00
Danny Avila	ca79a03135	🚦 fix: Add Rate Limiting to Conversation Duplicate Endpoint (#12218 ) * fix: add rate limiting to conversation duplicate endpoint * chore: linter * fix: address review findings for conversation duplicate rate limiting * refactor: streamline test mocks for conversation routes - Consolidated mock implementations into a dedicated `convos-route-mocks.js` file to enhance maintainability and readability of test files. - Updated tests in `convos-duplicate-ratelimit.spec.js` and `convos.spec.js` to utilize the new mock structure, improving clarity and reducing redundancy. - Enhanced the `duplicateConversation` function to accept an optional title parameter for better flexibility in conversation duplication. * chore: rename files	2026-03-13 23:40:44 -04:00
Danny Avila	fa9e1b228a	🪪 fix: MCP API Responses and OAuth Validation (#12217 ) * 🔒 fix: Validate MCP Configs in Server Responses * 🔒 fix: Enhance OAuth URL Validation in MCPOAuthHandler - Introduced validation for OAuth URLs to ensure they do not target private or internal addresses, enhancing security against SSRF attacks. - Updated the OAuth flow to validate both authorization and token URLs before use, ensuring compliance with security standards. - Refactored redirect URI handling to streamline the OAuth client registration process. - Added comprehensive error handling for invalid URLs, improving robustness in OAuth interactions. * 🔒 feat: Implement Permission Checks for MCP Server Management - Added permission checkers for MCP server usage and creation, enhancing access control. - Updated routes for reinitializing MCP servers and retrieving authentication values to include these permission checks, ensuring only authorized users can access these functionalities. - Refactored existing permission logic to improve clarity and maintainability. * 🔒 fix: Enhance MCP Server Response Validation and Redaction - Updated MCP route tests to use `toMatchObject` for better validation of server response structures, ensuring consistency in expected properties. - Refactored the `redactServerSecrets` function to streamline the removal of sensitive information, ensuring that user-sourced API keys are properly redacted while retaining their source. - Improved OAuth security tests to validate rejection of private URLs across multiple endpoints, enhancing protection against SSRF vulnerabilities. - Added comprehensive tests for the `redactServerSecrets` function to ensure proper handling of various server configurations, reinforcing security measures. * chore: eslint * 🔒 fix: Enhance OAuth Server URL Validation in MCPOAuthHandler - Added validation for discovered authorization server URLs to ensure they meet security standards. 
- Improved logging to provide clearer insights when an authorization server is found from resource metadata. - Refactored the handling of authorization server URLs to enhance robustness against potential security vulnerabilities. * 🔒 test: Bypass SSRF validation for MCP OAuth Flow tests - Mocked SSRF validation functions to allow tests to use real local HTTP servers, facilitating more accurate testing of the MCP OAuth flow. - Updated test setup to ensure compatibility with the new mocking strategy, enhancing the reliability of the tests. * 🔒 fix: Add Validation for OAuth Metadata Endpoints in MCPOAuthHandler - Implemented checks for the presence and validity of registration and token endpoints in the OAuth metadata, enhancing security by ensuring that these URLs are properly validated before use. - Improved error handling and logging to provide better insights during the OAuth metadata processing, reinforcing the robustness of the OAuth flow. * 🔒 refactor: Simplify MCP Auth Values Endpoint Logic - Removed redundant permission checks for accessing the MCP server resource in the auth-values endpoint, streamlining the request handling process. - Consolidated error handling and response structure for improved clarity and maintainability. - Enhanced logging for better insights during the authentication value checks, reinforcing the robustness of the endpoint. * 🔒 test: Refactor LeaderElection Integration Tests for Improved Cleanup - Moved Redis key cleanup to the beforeEach hook to ensure a clean state before each test. - Enhanced afterEach logic to handle instance resignations and Redis key deletion more robustly, improving test reliability and maintainability.	2026-03-13 23:18:56 -04:00
Danny Avila	f32907cd36	🔏 fix: MCP Server URL Schema Validation (#12204 ) * fix: MCP server configuration validation and schema - Added tests to reject URLs containing environment variable references for SSE, streamable-http, and websocket types in the MCP routes. - Introduced a new schema in the data provider to ensure user input URLs do not resolve environment variables, enhancing security against potential leaks. - Updated existing MCP server user input schema to utilize the new validation logic, ensuring consistent handling of user-supplied URLs across the application. * fix: MCP URL validation to reject env variable references - Updated tests to ensure that URLs for SSE, streamable-http, and websocket types containing environment variable patterns are rejected, improving security against potential leaks. - Refactored the MCP server user input schema to enforce stricter validation rules, preventing the resolution of environment variables in user-supplied URLs. - Introduced new test cases for various URL types to validate the rejection logic, ensuring consistent handling across the application. * test: Enhance MCPServerUserInputSchema tests for environment variable handling - Introduced new test cases to validate the prevention of environment variable exfiltration through user input URLs in the MCPServerUserInputSchema. - Updated existing tests to confirm that URLs containing environment variable patterns are correctly resolved or rejected, improving security against potential leaks. - Refactored test structure to better organize environment variable handling scenarios, ensuring comprehensive coverage of edge cases.	2026-03-12 23:19:31 -04:00
Danny Avila	fcb344da47	🛂 fix: MCP OAuth Race Conditions, CSRF Fallback, and Token Expiry Handling (#12171 ) * fix: Implement race conditions in MCP OAuth flow - Added connection mutex to coalesce concurrent `getUserConnection` calls, preventing multiple simultaneous attempts. - Enhanced flow state management to retry once when a flow state is missing, improving resilience against race conditions. - Introduced `ReauthenticationRequiredError` for better error handling when access tokens are expired or missing. - Updated tests to cover new race condition scenarios and ensure proper handling of OAuth flows. * fix: Stale PENDING flow detection and OAuth URL re-issuance PENDING flows in handleOAuthRequired now check createdAt age — flows older than 2 minutes are treated as stale and replaced instead of joined. Fixes the case where a leftover PENDING flow from a previous session blocks new OAuth initiation. authorizationUrl is now stored in MCPOAuthFlowMetadata so that when a second caller joins an active PENDING flow (e.g., the SSE-emitting path in ToolService), it can re-issue the URL to the user via oauthStart. * fix: CSRF fallback via active PENDING flow in OAuth callback When the OAuth callback arrives without CSRF or session cookies (common in the chat/SSE flow where cookies can't be set on streaming responses), fall back to validating that a PENDING flow exists for the flowId. This is safe because the flow was created server-side after JWT authentication and the authorization code is PKCE-protected. * test: Extract shared OAuth test server helpers Move MockKeyv, getFreePort, trackSockets, and createOAuthMCPServer into a shared helpers/oauthTestServer module. Enhance the test server with refresh token support, token rotation, metadata discovery, and dynamic client registration endpoints. Add InMemoryTokenStore for token storage tests. Refactor MCPOAuthRaceCondition.test.ts to import from shared helpers. 
* test: Add comprehensive MCP OAuth test modules MCPOAuthTokenStorage — 21 tests for storeTokens/getTokens with InMemoryTokenStore: encrypt/decrypt round-trips, expiry calculation, refresh callback wiring, ReauthenticationRequiredError paths. MCPOAuthFlow — 10 tests against real HTTP server: token refresh with stored client info, refresh token rotation, metadata discovery, dynamic client registration, full store/retrieve/expire/refresh lifecycle. MCPOAuthConnectionEvents — 5 tests for MCPConnection OAuth event cycle with real OAuth-gated MCP server: oauthRequired emission on 401, oauthHandled reconnection, oauthFailed rejection, token expiry detection. MCPOAuthTokenExpiry — 12 tests for the token expiry edge case: refresh success/failure paths, ReauthenticationRequiredError, PENDING flow CSRF fallback, authorizationUrl metadata storage, full re-auth cycle after refresh failure, concurrent expired token coalescing, stale PENDING flow detection. * test: Enhance MCP OAuth connection tests with cooldown reset Added a `beforeEach` hook to clear the cooldown for `MCPConnection` before each test, ensuring a clean state. Updated the race condition handling in the tests to properly clear the timeout, improving reliability in the event data retrieval process. * refactor: PENDING flow management and state recovery in MCP OAuth - Introduced a constant `PENDING_STALE_MS` to define the age threshold for PENDING flows, improving the handling of stale flows. - Updated the logic in `MCPConnectionFactory` and `FlowStateManager` to check the age of PENDING flows before joining or reusing them. - Modified the `completeFlow` method to return false when the flow state is deleted, ensuring graceful handling of race conditions. - Enhanced tests to validate the new behavior and ensure robustness against state recovery issues. 
* refactor: MCP OAuth flow management and testing - Updated the `completeFlow` method to log warnings when a tool flow state is not found during completion, improving error handling. - Introduced a new `normalizeExpiresAt` function to standardize expiration timestamp handling across the application. - Refactored token expiration checks in `MCPConnectionFactory` to utilize the new normalization function, ensuring consistent behavior. - Added a comprehensive test suite for OAuth callback CSRF fallback logic, validating the handling of PENDING flows and their staleness. - Enhanced existing tests to cover new expiration normalization logic and ensure robust flow state management. * test: Add CSRF fallback tests for active PENDING flows in MCP OAuth - Introduced new tests to validate CSRF fallback behavior when a fresh PENDING flow exists without cookies, ensuring successful OAuth callback handling. - Added scenarios to reject requests when no PENDING flow exists, when only a COMPLETED flow is present, and when a PENDING flow is stale, enhancing the robustness of flow state management. - Improved overall test coverage for OAuth callback logic, reinforcing the handling of CSRF validation failures. * chore: imports order * refactor: Update UserConnectionManager to conditionally manage pending connections - Modified the logic in `UserConnectionManager` to only set pending connections if `forceNew` is false, preventing unnecessary overwrites. - Adjusted the cleanup process to ensure pending connections are only deleted when not forced, enhancing connection management efficiency. * refactor: MCP OAuth flow state management - Introduced a new method `storeStateMapping` in `MCPOAuthHandler` to securely map the OAuth state parameter to the flow ID, improving callback resolution and security against forgery. - Updated the OAuth initiation and callback handling in `mcp.js` to utilize the new state mapping functionality, ensuring robust flow management. 
- Refactored `MCPConnectionFactory` to store state mappings during flow initialization, enhancing the integrity of the OAuth process. - Adjusted comments to clarify the purpose of state parameters in authorization URLs, reinforcing code readability. * refactor: MCPConnection with OAuth recovery handling - Added `oauthRecovery` flag to manage OAuth recovery state during connection attempts. - Introduced `decrementCycleCount` method to reduce the circuit breaker's cycle count upon successful reconnection after OAuth recovery. - Updated connection logic to reset the `oauthRecovery` flag after handling OAuth, improving state management and connection reliability. * chore: Add debug logging for OAuth recovery cycle count decrement - Introduced a debug log statement in the `MCPConnection` class to track the decrement of the cycle count after a successful reconnection during OAuth recovery. - This enhancement improves observability and aids in troubleshooting connection issues related to OAuth recovery. * test: Add OAuth recovery cycle management tests - Introduced new tests for the OAuth recovery cycle in `MCPConnection`, validating the decrement of cycle counts after successful reconnections. - Added scenarios to ensure that the cycle count is not decremented on OAuth failures, enhancing the robustness of connection management. - Improved test coverage for OAuth reconnect scenarios, ensuring reliable behavior under various conditions. * feat: Implement circuit breaker configuration in MCP - Added circuit breaker settings to `.env.example` for max cycles, cycle window, and cooldown duration. - Refactored `MCPConnection` to utilize the new configuration values from `mcpConfig`, enhancing circuit breaker management. - Improved code maintainability by centralizing circuit breaker parameters in the configuration file. 
* refactor: Update decrementCycleCount method for circuit breaker management - Changed the visibility of the `decrementCycleCount` method in `MCPConnection` from private to public static, allowing it to be called with a server name parameter. - Updated calls to `decrementCycleCount` in `MCPConnectionFactory` to use the new static method, improving clarity and consistency in circuit breaker management during connection failures and OAuth recovery. - Enhanced the handling of circuit breaker state by ensuring the method checks for the existence of the circuit breaker before decrementing the cycle count. * refactor: cycle count decrement on tool listing failure - Added a call to `MCPConnection.decrementCycleCount` in the `MCPConnectionFactory` to handle cases where unauthenticated tool listing fails, improving circuit breaker management. - This change ensures that the cycle count is decremented appropriately, maintaining the integrity of the connection recovery process. * refactor: Update circuit breaker configuration and logic - Enhanced circuit breaker settings in `.env.example` to include new parameters for failed rounds and backoff strategies. - Refactored `MCPConnection` to utilize the updated configuration values from `mcpConfig`, improving circuit breaker management. - Updated tests to reflect changes in circuit breaker logic, ensuring accurate validation of connection behavior under rapid reconnect scenarios. * feat: Implement state mapping deletion in MCP flow management - Added a new method `deleteStateMapping` in `MCPOAuthHandler` to remove orphaned state mappings when a flow is replaced, preventing old authorization URLs from resolving after a flow restart. - Updated `MCPConnectionFactory` to call `deleteStateMapping` during flow cleanup, ensuring proper management of OAuth states. - Enhanced test coverage for state mapping functionality to validate the new deletion logic.	2026-03-10 21:15:01 -04:00
Danny Avila	b8c31e7314	🔱 chore: Harden API Routes Against IDOR and DoS Attacks (#11760 ) * 🔧 feat: Update user key handling in keys route and add comprehensive tests - Enhanced the PUT /api/keys route to destructure request body for better clarity and maintainability. - Introduced a new test suite for keys route, covering key update, deletion, and retrieval functionalities, ensuring robust validation and IDOR prevention. - Added tests to verify handling of extraneous fields and missing optional parameters in requests. * 🔧 fix: Enhance conversation deletion route with parameter validation - Updated the DELETE /api/convos route to handle cases where the request body is empty or the 'arg' parameter is null/undefined, returning a 400 status with an appropriate error message for DoS prevention. - Added corresponding tests to ensure proper validation and error handling for these scenarios, enhancing the robustness of the API. * 🔧 fix: Improve request body validation in keys and convos routes - Updated the DELETE /api/convos and PUT /api/keys routes to validate the request body, returning a 400 status for null or invalid bodies to enhance security and prevent potential DoS attacks. - Added corresponding tests to ensure proper error handling for these scenarios, improving the robustness of the API.	2026-02-12 18:08:24 -05:00
Danny Avila	599f4a11f1	🛡️ fix: Secure MCP/Actions OAuth Flows, Resolve Race Condition & Tool Cache Cleanup (#11756 ) * 🔧 fix: Update OAuth error message for clarity - Changed the default error message in the OAuth error route from 'Unknown error' to 'Unknown OAuth error' to provide clearer context during authentication failures. * 🔒 feat: Enhance OAuth flow with CSRF protection and session management - Implemented CSRF protection for OAuth flows by introducing `generateOAuthCsrfToken`, `setOAuthCsrfCookie`, and `validateOAuthCsrf` functions. - Added session management for OAuth with `setOAuthSession` and `validateOAuthSession` middleware. - Updated routes to bind CSRF tokens for MCP and action OAuth flows, ensuring secure authentication. - Enhanced tests to validate CSRF handling and session management in OAuth processes. * 🔧 refactor: Invalidate cached tools after user plugin disconnection - Added a call to `invalidateCachedTools` in the `updateUserPluginsController` to ensure that cached tools are refreshed when a user disconnects from an MCP server after a plugin authentication update. This change improves the accuracy of tool data for users. * chore: imports order * fix: domain separator regex usage in ToolService - Moved the declaration of `domainSeparatorRegex` to avoid redundancy in the `loadActionToolsForExecution` function, improving code clarity and performance. * chore: OAuth flow error handling and CSRF token generation - Enhanced the OAuth callback route to validate the flow ID format, ensuring proper error handling for invalid states. - Updated the CSRF token generation function to require a JWT secret, throwing an error if not provided, which improves security and clarity in token generation. - Adjusted tests to reflect changes in flow ID handling and ensure robust validation across various scenarios.	2026-02-12 14:22:05 -05:00
Danny Avila	211b39f311	🔒 fix: Restrict MCP Stdio Transport via API (#11184 ) - Updated MCP server configuration tests to reject stdio transport configurations, ensuring that only remote transports (SSE, HTTP, WebSocket) are allowed via the API. - Enhanced documentation to clarify that stdio transport is excluded from user input for security, as it allows arbitrary command execution and should only be configured by administrators through YAML files.	2026-01-03 12:47:11 -05:00
Danny Avila	b94388ce9d	🏺 fix: Restore Archive Functionality with Dedicated Endpoint (#11183 ) The archive conversation feature was broken after the `/api/convos/update` route was modified to only handle title updates. The frontend was sending `{ conversationId, isArchived }` to the update endpoint, but the backend was only extracting `title` and ignoring the `isArchived` field entirely. This fix implements a dedicated `/api/convos/archive` endpoint to restore the archive/unarchive functionality. Changes: packages/data-provider/src/api-endpoints.ts: - Add `archiveConversation()` endpoint returning `/api/convos/archive` packages/data-provider/src/data-service.ts: - Update `archiveConversation()` to use dedicated archive endpoint api/server/routes/convos.js: - Add `POST /archive` route with validation for `conversationId` (required) and `isArchived` (must be boolean) api/server/routes/__tests__/convos.spec.js: - Add test coverage for archive endpoint (success, validation, error cases)	2026-01-02 19:41:53 -05:00
Danny Avila	bfc981d736	✍️ fix: Validation for Conversation Title Updates (#11099 ) * ✍️ fix: Validation for Conversation Title Updates * fix: Add validateConvoAccess middleware mock in tests	2025-12-25 12:59:48 -05:00
Artyom Bogachenko	7844a93f8b	♻️ fix: use DOMAIN_CLIENT for MCP OAuth Redirects (#11057 ) Co-authored-by: Artyom Bogachenco <a.bogachenko@easyreport.ai>	2025-12-25 12:24:01 -05:00
Atef Bellaaj	95a69df70e	🔒 feat: Add MCP server domain restrictions for remote transports (#11013 ) * 🔒 feat: Add MCP server domain restrictions for remote transports * 🔒 feat: Implement comprehensive MCP error handling and domain validation - Added `handleMCPError` function to centralize error responses for domain restrictions and inspection failures. - Introduced custom error classes: `MCPDomainNotAllowedError` and `MCPInspectionFailedError` for better error management. - Updated MCP server controllers to utilize the new error handling mechanism. - Enhanced domain validation logic in `createMCPTools` and `createMCPTool` functions to prevent operations on disallowed domains. - Added tests for runtime domain validation scenarios to ensure correct behavior. * chore: import order * 🔒 feat: Enhance domain validation in MCP tools with user role-based restrictions - Integrated `getAppConfig` to fetch allowed domains based on user roles in `createMCPTools` and `createMCPTool` functions. - Removed the deprecated `getAllowedDomains` method from `MCPServersRegistry`. - Updated tests to verify domain restrictions are applied correctly based on user roles. - Ensured that domain validation logic is consistent and efficient across tool creation processes. * 🔒 test: Refactor MCP tests to utilize configurable app settings - Introduced a mock for `getAppConfig` to enhance test flexibility. - Removed redundant mock definition to streamline test setup. - Ensured tests are aligned with the latest domain validation logic. --------- Co-authored-by: Atef Bellaaj <slalom.bellaaj@external.daimlertruck.com> Co-authored-by: Danny Avila <danny@librechat.ai>	2025-12-18 13:57:49 -05:00

85 commits