Commit graph

10 commits

Author SHA1 Message Date
Danny Avila
5bab22d236
🛡️ fix: Gate Bash PTC Capabilities (#13053) 2026-05-10 21:23:02 -04:00
Danny Avila
715a4a5fc1
🧰 refactor: Use Bash PTC for Agent Tools (#13042)
* fix: Use Bash PTC for programmatic agent tools

* fix: Preserve legacy PTC event calls
2026-05-09 16:31:09 -04:00
Danny Avila
f839a447e1
🧬 fix: Subagent MCP requestBody Propagation (bump @librechat/agents to 3.1.78 + cleanup) (#12959)
* 📦 chore: bump `@librechat/agents` to v3.1.78

v3.1.78 ships [danny-avila/agents#147](https://github.com/danny-avila/agents/pull/147),
which makes `SubagentExecutor` inherit the parent invocation's
`configurable` (with `thread_id`/`run_id`/`parent_run_id` scrubbed)
into the child workflow. Subagent tool dispatches through the parent's
`ON_TOOL_EXECUTE` handler now arrive with parent's `requestBody`,
`user`, `userMCPAuthMap`, etc. — so `{{LIBRECHAT_BODY_*}}` placeholder
substitution and per-user MCP connection lookup work for subagent
tool calls the same way they do for the parent agent.

Note: `package-lock.json` will need an `npm install` refresh once
v3.1.78 lands on the registry. The user/user_id injection added in
PR #12950 stays as defense-in-depth.

* 🗑️ refactor: drop redundant user/user_id injection from `loadToolsForExecution`

`@librechat/agents@3.1.78` (via danny-avila/agents#147) makes
`SubagentExecutor` forward the parent's `configurable` verbatim into
the child workflow. Subagent `ON_TOOL_EXECUTE` dispatches now arrive
with parent's `user` / `user_id` already in `data.configurable` —
making the host-side injection added in #12950 a no-op.

Removes:
- The conditional `user: createSafeUser(req.user); user_id: req.user.id`
  block in `loadToolsForExecution` (req.user.id-guarded so the
  `'api-user'` fallback in Responses/OpenAI controllers is preserved).
- The unused `createSafeUser` import.
- The 4 unit tests covering the now-deleted behavior.

The merge in `handlers.ts` (`{ ...configurable, ...toolConfigurable }`)
still produces a `mergedConfigurable` with the right user identity for
both parent and subagent paths — the values just come from
`configurable` (forwarded by the SDK) rather than `toolConfigurable`.

Other fixes from #12950 stay (IUser.id narrowing, the env.ts /
google/initialize.ts / remoteAgentAuth.ts TS-warning fixes) — they
were independent of the subagent identity propagation issue.

* 📦 chore: update `@librechat/agents` to v3.1.78

This update reflects the transition from the development version `3.1.78-dev.0` to the stable release `3.1.78`. The package-lock.json has been refreshed to ensure consistency with the new version, including updated integrity checks and resolved URLs for the package. This change is part of ongoing improvements to enhance the functionality and stability of the agents module.
2026-05-05 22:07:26 -04:00
Danny Avila
bff9bfea87
🐛 fix: Propagate User Identity to Subagent MCP Tool Calls (#12950)
* 🐛 fix: Propagate User Identity to Subagent MCP Tool Calls

The `@librechat/agents` SDK's `SubagentExecutor` invokes the child
workflow with a fresh configurable of `{ thread_id }` only — the
parent's `user` / `user_id` are dropped on the way into the child
graph. The child's `ToolNode` then dispatches `ON_TOOL_EXECUTE` to the
parent's handler, which merges `{ ...configurable, ...toolConfigurable }`,
but neither side carries user identity for subagents.

Downstream MCP tools read `config.configurable.user?.id || user_id` and
got `undefined`, so `MCPManager.getConnection` fell through to the
"No connection found for server X" error path — it can't reach the
user-connection lookup without a userId.

Re-inject `user` (via `createSafeUser`) and `user_id` from `req.user`
into the configurable returned by `loadToolsForExecution`. This is the
single point all controllers (chat, Responses API, OpenAI-compat) flow
through. For the parent agent it's a no-op (outer config already
carries the same values); for subagents it fills the gap so MCP
connection lookup, user-placeholder substitution, and tools that read
configurable.user all work correctly.

* 🐛 fix: Preserve `api-user` Fallback When Injecting Subagent Identity

Codex review pointed out that the prior commit unconditionally wrote
`user_id: req.user?.id` (and `user`) into `toolConfigurable`. The handler
merges via `{ ...configurable, ...toolConfigurable }` — `toolConfigurable`
wins — so when `req.user` is absent, this overwrote the outer config's
`'api-user'` fallback (set by `responses.js` / `openai.js` for the
unauthenticated API-key path) with `undefined`, breaking MCP connection
lookup for that path.

Only inject the keys when `req.user.id` is truthy. Omitting them lets
the merge preserve whatever the outer configurable already had. Tests
updated to assert key omission for `req.user` undefined / null / present
without `id`.

* 🩹 fix: Narrow `IUser.id` to required string

`IUser` extends mongoose `Document`, which types `id?: any` (the optional
virtual). At runtime `id` is always `_id.toString()` for a hydrated doc,
so narrow the type to a required string.

Closes two `@rollup/plugin-typescript` TS2322 warnings introduced by
PR #12450 (OIDC Bearer Token Authentication for Remote Agent API)
where `req.user = userResolution.user` and the
`(req: Request, res: Response, next: NextFunction)` signature both
failed against the project's local `Express.User` augmentation
(`{ [key: string]: any; id: string; }`) because `IUser.id` was
`any`/optional. Narrowing here fixes both at the source rather than
casting at every assignment site.

* 🩹 fix: Resolve TS Build Warnings Surfaced by `IUser.id` Narrowing

Three rollup TS plugin warnings surfaced after narrowing
`IUser.id` from `any` to `string`:

- `utils/env.ts:95` — `safeUser[field] = user[field]` failed strict
  checking because indexed write through a union-typed key collapses
  the LHS to the intersection of all field write types (i.e.,
  `undefined` when fields have mixed types). The previous `id?: any`
  on IUser had been masking this. Switch to `Object.assign(safeUser,
  { [field]: user[field] })` which widens the assignment.

- `endpoints/google/initialize.ts:35` — `getUserKey({ userId:
  req.user?.id, ... })` failed because `req.user?.id` is now
  `string | undefined` (no longer `any`). Match the pattern already
  used in `endpoints/openAI/initialize.ts:49`: `req.user?.id ?? ''`.

- `middleware/remoteAgentAuth.ts:465` — pre-existing, unrelated to
  the IUser change. The local (gitignored) `express.d.ts` augments
  `express.Request` but not `express-serve-static-core.Request`,
  so the explicit `(req: Request, ...)` annotation imported from
  `'express'` resolves to a Request whose `req.user` differs from
  the one `RequestHandler` expects internally. Type the closure as
  `RequestHandler` directly so TS infers params from the augmented
  type.

* 🩹 fix: Cast `RemoteAgentAuth` Closure to `RequestHandler`

My previous attempt removed the explicit `req: Request` annotation on
the closure to side-step the outer `RequestHandler` mismatch. That
shifted the error to every helper call site inside the closure
(`getConfigOptions(req)`, `runApiKeyAuth(req, ...)` at 467/474/493/
512/531), because the helpers annotate their params with
`express.Request` (which has the local `Request.user` augmentation),
while the unannotated closure inferred `req` as
`express-serve-static-core.Request` (no augmentation). Reproduced
locally by stubbing the gitignored `src/types/express.d.ts`.

Right approach: keep the explicit `req: Request` annotation so the
closure body matches the helpers' types, then cast at the return —
`RequestHandler`'s internal `Request` resolves through
`express-serve-static-core` and lacks the augmentation, so the cast
is the boundary that bridges the two views of `req.user`.

Verified against a build with the local express.d.ts stub: zero
warnings on `remoteAgentAuth.ts`, `env.ts`, and `google/initialize.ts`.
2026-05-05 12:19:50 +09:00
Caesar Isidro Va-ay
9b9a86d17d
🔀 fix: Resolve Action Tools by Exact Name to Prevent Multi-Action Domain Collision (#12594)
* 🐛 fix: resolve Action tools by exact tool name to prevent multi-action collision

When two OpenAPI Actions on the same Agent share a hostname, the second
action's entry overwrote the first in the encoded-domain Map and one
action's tools silently disappeared from the LLM payload. The buggy
resolution loop also used substring matching, which caused similar
shadowing for any encoded-domain prefix overlap.

This change builds a Map keyed on the full tool name
(`<operationId>_action_<encoded-domain>`) directly, mirroring the exact
lookup pattern that getActionToolDefinitions already uses. Each function
in an action's spec gets its own slot, so two actions sharing a hostname
no longer collide. Both the new and the legacy domain encodings are
registered for each function so agents whose stored tool names predate
the current encoding still resolve.

Applied at all three call sites that had the buggy pattern:

  - processRequiredActions (assistants/threads path)
  - loadAgentTools (agent build path)
  - loadActionToolsForExecution (agent execution path)

Adds three regression tests covering both ordering directions and the
execution path. Tests fail without the fix and pass with it.

* 🐛 fix: Normalize action tool name at lookup + cover assistants path

Follow-up to the multi-action domain collision fix. Addresses PR #12594
review feedback:

**Must-fix #1 — short-hostname lookup mismatch.** The toolToAction map
is keyed on the `_`-collapsed domain, but `agent.tools` and
`currentAction.tool` persist the raw `domainParser(..., true)` output,
which for hostnames ≤ ENCODED_DOMAIN_LENGTH is a `---`-separated string
(e.g. `medium---com`). Exact-match `Map.get()` missed those keys and
silently dropped the tool. Fix: normalize every incoming tool name
through a new `normalizeActionToolName` helper before the lookup in
`loadAgentTools`, `processRequiredActions`, and
`loadActionToolsForExecution`.

**Must-fix #2 — assistants path coverage.** `processRequiredActions`
received the same structural rewrite but had zero tests. Added a
regression test under `multi-action domain collision regression` that
drives two shared-hostname actions through the assistants path and
asserts each tool reaches its own request builder.

**Must-fix #3 — legacy encoding branch coverage.** The
`if (legacyNormalized !== normalizedDomain)` registration was never
exercised by any test. Added a test where `agent.tools` stores the
legacy-format name and asserts it still resolves.

**Should-fix #4 — DRY the registration loop.** Extracted
`registerActionTools({ toolToAction, functionSignatures,
normalizedDomain, legacyNormalized, makeEntry })`. All three call sites
now share the same key-building logic; the key template lives in one
place.

**Should-fix #5 — remove stale optional chaining.** In
`loadActionToolsForExecution`, `functionSignature?.description ?? ''`
became `functionSignature.description` — `sig` is always defined by the
iterator, matching the style of `loadAgentTools`.

**Should-fix #6 — drop unreachable `!requestBuilder` guard.** Entries
in `processRequiredActions` are now pre-built with
`requestBuilder: requestBuilders[sig.name]`, which `openapiToFunction`
always produces alongside the signature, so the guard is dead.

**Should-fix #7 — unwrap `actionSetsData`.** It now holds a bare
`Map` instead of `{ toolToAction }`; the sentinel `!actionSetsData`
check still works because `new Map()` is truthy.

Also added a short-hostname regression test
(`loadAgentTools resolves raw ---separated tool names`) that reproduces
Must-fix #1: it fails against the previous commit (0 create calls) and
passes with the normalization in place.

41 tests, all passing. The 3 new regression tests are under
`multi-action domain collision regression` and cover the assistants
path, the legacy encoding branch, and the short-hostname lookup path.

* 🐛 fix: Tighten registerActionTools key handling and assistants test

Follow-up to d643444 addressing the second review pass on PR #12594.

**ESLint** — Two prettier errors in the spec file (multi-line arrow
function bodies that should fit on one line). Auto-fixed.

**[MINOR] operationId containing `---` → key mismatch.** The lookup
path collapsed every `actionDomainSeparator` sequence in the full tool
name, but the registration path passed `sig.name` through unchanged.
A `---` that survived into an operationId would shift the underscore
boundary at lookup and miss its own key. Fix in `registerActionTools`:
normalize `sig.name` with the same helper so registration and lookup
always agree on the canonical form. `sanitizeOperationId` strips the
characters that produce `---` in practice, so this is theoretical
hardening, not a fix for a known reproducer.

**[MINOR] Same-operationId + same-hostname silent overwrite.** Two
actions sharing both an operationId and a hostname still produced a
silent `Map.set()` overwrite (the new key is identical, so neither the
operationId nor the domain disambiguates). Added a `setKey` helper
inside `registerActionTools` that logs a `[Actions] operationId
collision: ...` warning whenever a key is already present, naming the
overwriting action_id. The silent-overwrite mode from the original bug
cannot reappear under a different disguise without surfacing in the
logs.

**[NIT] processRequiredActions test simulated a runtime crash.**
`mockCreateActionTool` returned a tool with `_call: jest.fn()`, which
resolves to `undefined`. `processRequiredActions` chains
`.then(handleToolOutput).catch(handleToolError)` directly onto that
return, so `undefined.then(...)` threw synchronously and the outer
try/catch funneled the error into `handleToolError`. Creation count
assertions still passed because `createActionTool` runs before the
crash, but the test was silently exercising the failure path. Updated
the global mock to `_call: jest.fn().mockResolvedValue('{"status":"ok"}')`
so the success path runs end-to-end. The assistants regression test
now executes in ~5ms instead of ~90ms, which corroborates that it's
no longer hitting the synchronous throw.

**[NIT] Duplicated rationale comments.** All three call sites carried
multi-line comment blocks restating why we key on the full tool name.
That rationale now lives canonically in `registerActionTools`'s JSDoc;
the inline blocks collapsed to `// See registerActionTools for the
key-shape rationale.` Net -22 lines of comments.

41/41 tests still pass; lint is clean.

* 🐛 fix: Scope tool-name normalization to the encoded-domain suffix

Follow-up to f22228e addressing the Codex P1 on PR #12594.

**Regression.** The previous commit normalized the entire tool name
(`normalizeActionToolName(sig.name)` at registration, full-name
`.replace()` at lookup) to handle operationIds that theoretically
contained `---`. But `openapiToFunction` uses user-supplied
operationIds verbatim and the fallback `sanitizeOperationId` only
strips characters outside `[a-zA-Z0-9_-]`, so specs can legitimately
produce operationIds like `get_foo---bar` and `get_foo_bar` side by
side. Collapsing `---` to `_` across the entire key merged those two
into a single map slot — one silently overwrote the other, and both
tool requests routed to the surviving entry's request builder.

**Fix.** Limit normalization to the encoded-domain portion of the
full tool name, i.e. the substring after the last `actionDelimiter`.
The operationId half is left verbatim, so hyphens-vs-underscores
remain disambiguating. The short-hostname bug (Must-fix #1 from the
original review) is still covered because the `---` → `_` collapse
still happens on the domain suffix where it matters:

```js
const normalizeActionToolName = (toolName) => {
  const delimiterIndex = toolName.lastIndexOf(actionDelimiter);
  if (delimiterIndex === -1) return toolName;
  const prefixEnd = delimiterIndex + actionDelimiter.length;
  const encodedDomain = toolName.slice(prefixEnd);
  return toolName.slice(0, prefixEnd) + encodedDomain.replace(domainSeparatorRegex, '_');
};
```

`registerActionTools` reverts to `sig.name` verbatim — no more
`normalizeActionToolName(sig.name)` ahead of the key build.

**Regression test.** Added
`loadAgentTools distinguishes operationIds that differ only by
---` vs `_`` under `multi-action domain collision regression`. It
loads two actions sharing a hostname whose operationIds are
`get_foo---bar` and `get_foo_bar` respectively, each pointing at a
different path (`/foo-bar`, `/foo_bar`), and asserts that each tool
resolves to its own request builder. Verified the test fails against
f22228e (`hyphenTool` resolves to `/foo_bar` — the sibling's builder)
and passes with this commit.

42/42 tests pass; lint clean.
2026-04-13 09:08:06 -04:00
Danny Avila
275af48592
🎯 fix: MCP Tool Misclassification from Action Delimiter Collision (#12512)
* fix: prevent MCP tools with `_action` in name from being misclassified as OpenAPI action tools

Add `isActionTool()` helper that checks for the `_action_` delimiter
while guarding against cross-delimiter collision with `_mcp_`. Replace
all `includes(actionDelimiter)` classification checks with the new
helper across backend and frontend.

* test: add coverage for MCP/action cross-delimiter collision

Verify that `isActionTool` correctly rejects MCP tool names containing
`_action` and that `loadAgentTools` does not filter them based on
`actionsEnabled`. Add ToolIcon and definitions test cases.

* fix: simplify isActionTool to handle all MCP name patterns

- Use `!toolName.includes('_mcp_')` instead of checking only after the
  first `_action_` occurrence, which missed MCP tools with `_action_` in
  the middle of their name (e.g. `get_action_data_mcp_myserver`).
- Reference `Constants.mcp_delimiter` value via a local const to avoid
  circular import from config.ts, with a comment explaining why.
- Remove dead `actionDelimiter` import from definitions.ts.
- Replace double-filter with single-pass partition in loadToolsForExecution.
- Add test for mid-name `_action_` collision case.

* fix: narrow MCP exclusion to delimiter position in isActionTool

Only reject when `_mcp_` appears after `_action_` (the MCP suffix
position). `_mcp_` before `_action_` is part of the operationId and
is valid — e.g. `sync_mcp_state_action_api---example---com` is a
legitimate action tool whose operationId happens to contain `_mcp_`.

* fix: document positional _mcp_ guard and known RFC-invalid domain limitation

Expand JSDoc on isActionTool to explain the action/MCP format
disambiguation and the theoretical false negative for non-RFC-compliant
domains containing `_mcp_`. Add test documenting this known edge case.
2026-04-01 22:36:21 -04:00
Danny Avila
935288f841
🏗️ feat: 3-Tier MCP Server Architecture with Config-Source Lazy Init (#12435)
* feat: add MCPServerSource type, tenantMcpPolicy schema, and source-based dbSourced wiring

- Add `tenantMcpPolicy` to `mcpSettings` in YAML config schema with
  `enabled`, `maxServersPerTenant`, `allowedTransports`, and `allowedDomains`
- Add `MCPServerSource` type ('yaml' | 'config' | 'user') and `source`
  field to `ParsedServerConfig`
- Change `dbSourced` determination from `!!config.dbId` to
  `config.source === 'user'` across MCPManager, ConnectionsRepository,
  UserConnectionManager, and MCPServerInspector
- Set `source: 'user'` on all DB-sourced servers in ServerConfigsDB

* feat: three-layer MCPServersRegistry with config cache and lazy init

- Add `configCacheRepo` as third repository layer between YAML cache and
  DB for admin-defined config-source MCP servers
- Implement `ensureConfigServers()` that identifies config-override servers
  from resolved `getAppConfig()` mcpConfig, lazily inspects them, and
  caches parsed configs with `source: 'config'`
- Add `lazyInitConfigServer()` with timeout, stub-on-failure, and
  concurrent-init deduplication via `pendingConfigInits` map
- Extend `getAllServerConfigs()` with optional `configServers` param for
  three-way merge: YAML → Config → User
- Add `getServerConfig()` lookup through config cache layer
- Add `invalidateConfigCache()` for clearing config-source inspection
  results on admin config mutations
- Tag `source: 'yaml'` on CACHE-stored servers and `source: 'user'` on
  DB-stored servers in `addServer()` and `addServerStub()`

* feat: wire tenant context into MCP controllers, services, and cache invalidation

- Resolve config-source servers via `getAppConfig({ role, tenantId })`
  in `getMCPTools()` and `getMCPServersList()` controllers
- Pass `ensureConfigServers()` results through `getAllServerConfigs()`
  for three-way merge of YAML + Config + User servers
- Add tenant/role context to `getMCPSetupData()` and connection status
  routes via `getTenantId()` from ALS
- Add `clearMcpConfigCache()` to `invalidateConfigCaches()` so admin
  config mutations trigger re-inspection of config-source MCP servers

* feat: enforce tenantMcpPolicy on admin config mcpServers mutations

- Add `validateMcpServerPolicy()` helper that checks mcpServers against
  operator-defined `tenantMcpPolicy` (enabled, maxServersPerTenant,
  allowedTransports, allowedDomains)
- Wire validation into `upsertConfigOverrides` and `patchConfigField`
  handlers — rejects with 403 when policy is violated
- Infer transport type from config shape (command → stdio, url protocol
  → websocket/sse, type field → streamable-http)
- Validate server domains against policy allowlist when configured

* revert: remove tenantMcpPolicy schema and enforcement

The existing admin config CRUD routes already provide the mechanism
for granular MCP server prepopulation (groups, roles, users). The
tenantMcpPolicy gating adds unnecessary complexity that can be
revisited if needed in the future.

- Remove tenantMcpPolicy from mcpSettings Zod schema
- Remove validateMcpServerPolicy helper and TenantMcpPolicy interface
- Remove policy enforcement from upsertConfigOverrides and
  patchConfigField handlers

* test: update test assertions for source field and config-server wiring

- Use objectContaining in MCPServersRegistry reset test to account for
  new source: 'yaml' field on CACHE-stored configs
- Add getTenantId and ensureConfigServers mocks to MCP route tests
- Add getAppConfig mock to route test Config service mock
- Update getMCPSetupData assertion to expect second options argument
- Update getAllServerConfigs assertions for new configServers parameter

* fix: disconnect active connections when config-source servers are evicted

When admin config overrides change and config-source MCP servers are
removed, the invalidation now proactively disconnects active connections
for evicted servers instead of leaving them lingering until timeout.

- Return evicted server names from invalidateConfigCache()
- Disconnect app-level connections for evicted servers in
  clearMcpConfigCache() via MCPManager.appConnections.disconnect()

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Scope configCacheRepo keys by config content hash to prevent
  cross-tenant cache poisoning when two tenants define the same
  server name with different configurations
- Change dbSourced checks from `source === 'user'` to
  `source !== 'yaml' && source !== 'config'` so undefined source
  (pre-upgrade cached configs) fails closed to restricted mode

MAJOR fixes:
- Derive OAuth servers from already-computed mcpConfig instead of
  calling getOAuthServers() separately — config-source OAuth servers
  are now properly detected
- Add parseInt radix (10) and NaN guard with fallback to 30_000
  for CONFIG_SERVER_INIT_TIMEOUT_MS
- Add CONFIG_CACHE_NAMESPACE to aggregate-key branch in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Remove `if (role || tenantId)` guard in getMCPSetupData — config
  servers now always resolve regardless of tenant context

MINOR fixes:
- Extract resolveAllMcpConfigs() helper in mcp controller to
  eliminate 3x copy-pasted config resolution boilerplate
- Distinguish "not initialized" from real errors in
  clearMcpConfigCache — log actual failures instead of swallowing
- Remove narrative inline comments per style guide
- Remove dead try/catch inside Promise.allSettled in
  ensureConfigServers (inner method never throws)
- Memoize YAML server names to avoid repeated cacheConfigsRepo.getAll()
  calls per request

Test updates:
- Add ensureConfigServers mock to registry test fixtures
- Update getMCPSetupData assertions for inline OAuth derivation

* fix: address code review findings (CRITICAL, MAJOR, MINOR)

CRITICAL fixes:
- Break circular dependency: move CONFIG_CACHE_NAMESPACE from
  MCPServersRegistry to ServerConfigsCacheFactory
- Fix dbSourced fail-closed: use source field when present, fall back to
  legacy dbId check when absent (backward-compatible with pre-upgrade
  cached configs that lack source field)

MAJOR fixes:
- Add CONFIG_CACHE_NAMESPACE to aggregate-key set in
  ServerConfigsCacheFactory to avoid SCAN-based Redis stalls
- Add comprehensive test suite (ensureConfigServers.test.ts, 18 tests)
  covering lazy init, stub-on-failure, cross-tenant isolation via config
  hash keys, concurrent deduplication, merge order, and cache invalidation

MINOR fixes:
- Update MCPServerInspector test assertion for dbSourced change

* fix: restore getServerConfig lookup for config-source servers (NEW-1)

Add configNameToKey map that indexes server name → hash-based cache key
for O(1) lookup by name in getServerConfig. This restores the config
cache layer that was dropped when hash-based keys were introduced.

Without this fix, config-source servers appeared in tool listings
(via getAllServerConfigs) but getServerConfig returned undefined,
breaking all connection and tool call paths.

- Populate configNameToKey in ensureSingleConfigServer
- Clear configNameToKey in invalidateConfigCache and reset
- Clear stale read-through cache entries after lazy init
- Remove dead code in invalidateConfigCache (config.title, key parsing)
- Add getServerConfig tests for config-source server lookup

* fix: eliminate configNameToKey race via caller-provided configServers param

Replace the process-global configNameToKey map (last-writer-wins under
concurrent multi-tenant load) with a configServers parameter on
getServerConfig. Callers pass the pre-resolved config servers map
directly — no shared mutable state, no cross-tenant race.

- Add optional configServers param to getServerConfig; when provided,
  returns matching config directly without any global lookup
- Remove configNameToKey map entirely (was the source of the race)
- Extract server names from cache keys via lastIndexOf in
  invalidateConfigCache (safe for names containing colons)
- Use mcpConfig[serverName] directly in getMCPTools instead of a
  redundant getServerConfig call
- Add cross-tenant isolation test for getServerConfig

* fix: populate read-through cache after config server lazy init

After lazyInitConfigServer succeeds, write the parsed config to
readThroughCache keyed by serverName so that getServerConfig calls
from ConnectionsRepository, UserConnectionManager, and
MCPManager.callTool find the config without needing configServers.

Without this, config-source servers appeared in tool listings but
every connection attempt and tool call returned undefined.

* fix: user-scoped getServerConfig fallback to server-only cache key

When getServerConfig is called with a userId (e.g., from callTool or
UserConnectionManager), the cache key is serverName::userId. Config-source
servers are cached under the server-only key (no userId). Add a fallback
so user-scoped lookups find config-source servers in the read-through cache.

* fix: configCacheRepo fallback, isUserSourced DRY, cross-process race

CRITICAL: Add findInConfigCache fallback in getServerConfig so
config-source servers remain reachable after readThroughCache TTL
expires (5s). Without this, every tool call after 5s returned
undefined for config-source servers.

MAJOR: Extract isUserSourced() helper to mcp/utils.ts and replace
all 5 inline dbSourced ternary expressions (MCPManager x2,
ConnectionsRepository, UserConnectionManager, MCPServerInspector).

MAJOR: Fix cross-process Redis race in lazyInitConfigServer — when
configCacheRepo.add throws (key exists from another process), fall
back to reading the existing entry instead of returning undefined.

MINOR: Parallelize invalidateConfigCache awaits with Promise.all.
Remove redundant .catch(() => {}) inside Promise.allSettled.
Tighten dedup test assertion to toBe(1).
Add TTL-expiry tests for getServerConfig (with and without userId).

* feat: thread configServers through getAppToolFunctions and formatInstructionsForContext

Add optional configServers parameter to getAppToolFunctions,
getInstructions, and formatInstructionsForContext so config-source
server tools and instructions are visible to agent initialization
and context injection paths.

Existing callers (boot-time init, tests) pass no argument and
continue to work unchanged. Agent runtime paths can now thread
resolved config servers from request context.

* fix: stale failure stubs retry after 5 min, upsert for cross-process races

- Add CONFIG_STUB_RETRY_MS (5 min) — stale failure stubs are retried
  instead of permanently disabling config-source servers after transient
  errors (DNS outage, cold-start race)
- Extract upsertConfigCache() helper that tries add then falls back to
  update, preventing cross-process Redis races where a second instance's
  successful inspection result was discarded
- Add test for stale-stub retry after CONFIG_STUB_RETRY_MS

* fix: stamp updatedAt on failure stubs, null-guard callTool config, test cleanup

- Add updatedAt: Date.now() to failure stubs in lazyInitConfigServer so
  CONFIG_STUB_RETRY_MS (5 min) window works correctly — without it, stubs
  were always considered stale (updatedAt ?? 0 → epoch → always expired)
- Add null guard for rawConfig in MCPManager.callTool before passing to
  preProcessGraphTokens — prevents unsafe `as` cast on undefined
- Log double-failure in upsertConfigCache instead of silently swallowing
- Replace module-scope Date.now monkey-patch with jest.useFakeTimers /
  jest.setSystemTime / jest.useRealTimers in ensureConfigServers tests

* fix: server-only readThrough fallback only returns truthy values

Prevents a cached undefined from a prior no-userId lookup from
short-circuiting the DB query on a subsequent userId-scoped lookup.

* fix: remove findInConfigCache to eliminate cross-tenant config leakage

The findInConfigCache prefix scan (serverName:*) could return any
tenant's config after readThrough TTL expires, violating tenant
isolation. Config-source servers are now ONLY resolvable through:

1. The configServers param (callers with tenant context from ALS)
2. The readThrough cache (populated by ensureSingleConfigServer,
   5s TTL, repopulated on every HTTP request via resolveAllMcpConfigs)

Connection/tool-call paths without tenant context rely exclusively on
the readThrough cache. If it expires before the next HTTP request
repopulates it, the server is not found — which is correct because
there is no tenant context to determine which config to return.

- Remove findInConfigCache method and its call in getServerConfig
- Update server-only readThrough fallback to only return truthy values
  (prevents cached undefined from short-circuiting user-scoped DB lookup)
- Update tests to document tenant isolation behavior after cache expiry

* style: fix import order per AGENTS.md conventions

Sort package imports shortest-to-longest, local imports longest-to-shortest
across MCPServersRegistry, ConnectionsRepository, MCPManager,
UserConnectionManager, and MCPServerInspector.

* fix: eliminate cross-tenant readThrough contamination and TTL-expiry tool failures

Thread pre-resolved serverConfig from tool creation context into
callTool, removing dependency on the readThrough cache for config-source
servers. This fixes two issues:

- Cross-tenant contamination: the readThrough cache key was unscoped
  (just serverName), so concurrent multi-tenant requests for same-named
  servers would overwrite each other's entries
- TTL expiry: tool calls happening >5s after config resolution would
  fail with "Configuration not found" because the readThrough entry
  had expired

Changes:
- Add optional serverConfig param to MCPManager.callTool — uses
  provided config directly, falling back to getServerConfig lookup
  for YAML/user servers
- Thread serverConfig from createMCPTool through createToolInstance
  closure to callTool
- Remove readThrough write from ensureSingleConfigServer — config-source
  servers are only accessible via configServers param (tenant-scoped)
- Remove server-only readThrough fallback from getServerConfig
- Increase config cache hash from 8 to 16 hex chars (64-bit)
- Add isUserSourced boundary tests for all source/dbId combinations
- Fix double Object.keys call in getMCPTools controller
- Update test assertions for new getServerConfig behavior

* fix: cache base configs for config-server users; narrow upsertConfigCache error handling

- Refactor getAllServerConfigs to separate base config fetch (YAML + DB)
  from config-server layering. Base configs are cached via readThroughCacheAll
  regardless of whether configServers is provided, eliminating uncached
  MongoDB queries per request for config-server users
- Narrow upsertConfigCache catch to duplicate-key errors only;
  infrastructure errors (Redis timeouts, network failures) now propagate
  instead of being silently swallowed, preventing inspection storms
  during outages

* fix: restore correct merge order and document upsert error matching

- Restore YAML → Config → User DB precedence in getAllServerConfigs
  (user DB servers have highest precedence, matching the JSDoc contract)
- Add source comment on upsertConfigCache duplicate-key detection
  linking to the two cache implementations that define the error message

* feat: complete config-source server support across all execution paths

Wire configServers through the entire agent execution pipeline so
config-source MCP servers are fully functional — not just visible in
listings but executable in agent sessions.

- Thread configServers into handleTools.js agent tool pipeline: resolve
  config servers from tenant context before MCP tool iteration, pass to
  getServerConfig, createMCPTools, and createMCPTool
- Thread configServers into agent instructions pipeline:
  applyContextToAgent → getMCPInstructionsForServers →
  formatInstructionsForContext, resolved in client.js before agent
  context application
- Add configServers param to createMCPTool and createMCPTools for
  reconnect path fallback
- Add source field to redactServerSecrets allowlist for client UI
  differentiation of server tiers
- Narrow invalidateConfigCache to only clear readThroughCacheAll (merged
  results), preserving YAML individual-server readThrough entries
- Update context.spec.ts assertions for new configServers parameter

* fix: add missing mocks for config-source server dependencies in client.test.js

Mock getMCPServersRegistry, getAppConfig, and getTenantId that were added
to client.js but not reflected in the test file's jest.mock declarations.

* fix: update formatInstructionsForContext assertions for configServers param

The test assertions expected formatInstructionsForContext to be called with
only the server names array, but it now receives configServers as a second
argument after the config-source server feature wiring.

* fix: move configServers resolution before MCP tool loop to avoid TDZ

configServers was declared with `let` after the first tool loop but
referenced inside it via getServerConfig(), causing a ReferenceError
temporal dead zone. Move declaration and resolution before the loop,
using tools.some(mcpToolPattern) to gate the async resolution.

* fix: address review findings — cache bypass, discoverServerTools gap, DRY

- #2: getAllServerConfigs now always uses getBaseServerConfigs (cached via
  readThroughCacheAll) instead of bypassing it when configServers is present.
  Extracts user-DB entries from cached base by diffing against YAML keys
  to maintain YAML → Config → User DB merge order without extra MongoDB calls.

- #3: Add configServers param to ToolDiscoveryOptions and thread it through
  discoverServerTools → getServerConfig so config-source servers are
  discoverable during OAuth reconnection flows.

- #6: Replace inline import() type annotations in context.ts with proper
  import type { ParsedServerConfig } per AGENTS.md conventions.

- #7: Extract resolveConfigServers(req) helper in MCP.js and use it from
  handleTools.js and client.js, eliminating the duplicated 6-line config
  resolution pattern.

- #10: Restore removed "why" comment explaining getLoaded() vs getAll()
  choice in getMCPSetupData — documents non-obvious correctness constraint.

- #11: Fix incomplete JSDoc param type on resolveAllMcpConfigs.

* fix: consolidate imports, reorder constants, fix YAML-DB merge edge case

- Merge duplicate @librechat/data-schemas requires in MCP.js into one
- Move resolveConfigServers after module-level constants
- Fix getAllServerConfigs edge case where user-DB entry overriding a
  YAML entry with the same name was excluded from userDbConfigs; now
  uses reference equality check to detect DB-overwritten YAML keys

* fix: replace fragile string-match error detection with proper upsert method

Add upsert() to IServerConfigsRepositoryInterface and all implementations
(InMemory, Redis, RedisAggregateKey, DB). This eliminates the brittle
error message string match ('already exists in cache') in upsertConfigCache
that was the only thing preventing cross-process init races from silently
discarding inspection results.

Each implementation handles add-or-update atomically:
- InMemory: direct Map.set()
- Redis: direct cache.set()
- RedisAggregateKey: read-modify-write under write lock
- DB: delegates to update() (DB servers use explicit add() with ACL setup)

* fix: wire configServers through remaining HTTP endpoints

- getMCPServerById: use resolveAllMcpConfigs instead of bare getServerConfig
- reinitialize route: resolve configServers before getServerConfig
- auth-values route: resolve configServers before getServerConfig
- getOAuthHeaders: accept configServers param, thread from callers
- Update mcp.spec.js tests to mock getAllServerConfigs for GET by name

* fix: thread serverConfig through getConnection for config-source servers

Config-source servers exist only in configCacheRepo, not in YAML cache or
DB. When callTool → getConnection → getUserConnection → getServerConfig
runs without configServers, it returns undefined and throws. Fix by
threading the pre-resolved serverConfig (providedConfig) from callTool
through getConnection → getUserConnection → createUserConnectionInternal,
using it as a fallback before the registry lookup.

* fix: thread configServers through reinit, reconnect, and tool definition paths

Wire configServers through every remaining call chain that creates or
reconnects MCP server connections:

- reinitMCPServer: accepts serverConfig and configServers, uses them for
  getServerConfig fallback, getConnection, and discoverServerTools
- reconnectServer: accepts and passes configServers to reinitMCPServer
- createMCPTools/createMCPTool: pass configServers to reconnectServer
- ToolService.loadToolDefinitionsWrapper: resolves configServers from req,
  passes to both reinitMCPServer call sites
- reinitialize route: passes serverConfig and configServers to reinitMCPServer

* fix: address review findings — simplify merge, harden error paths, fix log labels

- Simplify getAllServerConfigs merge: replace fragile reference-equality
  loop with direct spread { ...yamlConfigs, ...configServers, ...base }
- Guard upsertConfigCache in lazyInitConfigServer catch block so cache
  failures don't mask the original inspection error
- Deduplicate getYamlServerNames cold-start with promise dedup pattern
- Remove dead `if (!mcpConfig)` guard in getMCPSetupData
- Fix hardcoded "App server" in ServerConfigsCacheRedisAggregateKey error
  messages — now uses this.namespace for correct Config/App labeling
- Remove misleading OAuth callback comment about readThrough cache
- Move resolveConfigServers after module-level constants in MCP.js

* fix: clear rejected yamlServerNames promise, fix config-source reinspect, fix reset log label

- Clear yamlServerNamesPromise on rejection so transient cache errors
  don't permanently prevent ensureConfigServers from working
- Skip reinspectServer for config-source servers (source: 'config') in
  reinitMCPServer — they lack a CACHE/DB storage location; retry is
  handled by CONFIG_STUB_RETRY_MS in ensureConfigServers
- Use source field instead of dbId for storageLocation derivation
- Fix remaining hardcoded "App" in reset() leaderCheck message

* fix: persist oauthHeaders in flow state for config-source OAuth servers

The OAuth callback route has no JWT auth context and cannot resolve
config-source server configs. Previously, getOAuthHeaders would silently
return {} for config-source servers, dropping custom token exchange headers.

Now oauthHeaders are persisted in MCPOAuthFlowMetadata during flow
initiation (which has auth context), and the callback reads them from
the stored flow state with a fallback to the registry lookup for
YAML/user-DB servers.

* fix: update tests for getMCPSetupData null guard removal and ToolService mock

- MCP.spec.js: update test to expect graceful handling of null mcpConfig
  instead of a throw (getAllServerConfigs always returns an object)
- MCP.js: add defensive || {} for Object.entries(mcpConfig) in case of
  null from test mocks
- ToolService.spec.js: add missing mock for ~/server/services/MCP
  (resolveConfigServers)

* fix: address review findings — DRY, naming, logging, dead code, defensive guards

- #1: Simplify getAllServerConfigs to single getBaseServerConfigs call,
  eliminating redundant double-fetch of cacheConfigsRepo.getAll()
- #2: Add warning log when oauthHeaders absent from OAuth callback flow state
- #3: Extract resolveAllMcpConfigs to MCP.js service layer; controller
  imports shared helper instead of reimplementing
- #4: Rename _serverConfig/_provider to capturedServerConfig/capturedProvider
  in createToolInstance — these are actively used, not unused
- #5: Log rejected results from ensureConfigServers Promise.allSettled
  so cache errors are visible instead of silently dropped
- #6: Remove dead 'MCP config not found' error handlers from routes
- #7: Document circular-dependency reason for dynamic require in clearMcpConfigCache
- #8: Remove logger.error from withTimeout to prevent double-logging timeouts
- #10: Add explicit userId guard in ServerConfigsDB.upsert with clear error message
- #12: Use spread instead of mutation in addServer for immutability consistency
- Add upsert mock to ensureConfigServers.test.ts DB mock
- Update route tests for resolveAllMcpConfigs import change

* fix: restore correct merge priority, use immutable spread, fix test mock

- getAllServerConfigs: { ...configServers, ...base } so userDB wins over
  configServers, matching documented "User DB (highest)" priority
- lazyInitConfigServer: use immutable spread instead of direct mutation
  for parsedConfig.source, consistent with addServer fix
- Fix test to mock getAllServerConfigs as {} instead of null, remove
  unnecessary || {} defensive guard in getMCPSetupData

* fix: error handling, stable hashing, flatten nesting, remove dead param

- Wrap resolveConfigServers/resolveAllMcpConfigs in try/catch with
  graceful {} fallback so transient DB/cache errors don't crash tool pipeline
- Sort keys in configCacheKey JSON.stringify for deterministic hashing
  regardless of object property insertion order
- Flatten clearMcpConfigCache from 3 nested try-catch to early returns;
  document that user connections are cleaned up lazily (accepted tradeoff)
- Remove dead configServers param from getAppToolFunctions (never passed)
- Add security rationale comment for source field in redactServerSecrets

* fix: use recursive key-sorting replacer in configCacheKey to prevent cross-tenant cache collision

The array replacer in JSON.stringify acts as a property allowlist at
every nesting depth, silently dropping nested keys like headers['X-API-Key'],
oauth.client_secret, etc. Two configs with different nested values but
identical top-level structure produced the same hash, causing cross-tenant
cache hits and potential credential contamination.

Switch to a function replacer that recursively sorts keys at all depths
without dropping any properties.

Also document the known gap in getOAuthServers: config-source OAuth
servers are not covered by auto-reconnection or uninstall cleanup
because callers lack request context.

* fix: move clearMcpConfigCache to packages/api to eliminate circular dependency

The function only depends on MCPServersRegistry and MCPManager, both of
which live in packages/api. Import it directly from @librechat/api in
the CJS layer instead of using dynamic require('~/config').

* chore: imports/fields ordering

* fix: address review findings — error handling, targeted lookup, test gaps

- Narrow resolveAllMcpConfigs catch to only wrap ensureConfigServers so
  getAppConfig/getAllServerConfigs failures propagate instead of masking
  infrastructure errors as empty server lists.
- Use targeted getServerConfig in getMCPServerById instead of fetching
  all server configs for a single-server lookup.
- Forward configServers to inner createMCPTool calls so reconnect path
  works for config-source servers.
- Update getAllServerConfigs JSDoc to document disjoint-key design.
- Add OAuth callback oauthHeaders fallback tests (flow state present
  vs registry fallback).
- Add resolveConfigServers/resolveAllMcpConfigs unit tests covering
  happy path and error propagation.

* fix: add getOAuthReconnectionManager mock to OAuth callback tests

* chore: imports ordering
2026-03-28 10:36:43 -04:00
Danny Avila
6f87b49df8
🛂 fix: Enforce Actions Capability Gate Across All Event-Driven Tool Loading Paths (#12252)
* fix: gate action tools by actions capability in all code paths

Extract resolveAgentCapabilities helper to eliminate 3x-duplicated
capability resolution. Apply early action-tool filtering in both
loadToolDefinitionsWrapper and loadAgentTools non-definitions path.
Gate loadActionToolsForExecution in loadToolsForExecution behind an
actionsEnabled parameter with a cache-based fallback. Replace the
late capability guard in loadAgentTools with a hasActionTools check
to avoid unnecessary loadActionSets DB calls and duplicate warnings.

* fix: thread actionsEnabled through InitializedAgent type

Add actionsEnabled to the loadTools callback return type,
InitializedAgent, and the initializeAgent destructuring/return
so callers can forward the resolved value to loadToolsForExecution
without redundant getEndpointsConfig cache lookups.

* fix: pass actionsEnabled from callers to loadToolsForExecution

Thread actionsEnabled through the agentToolContexts map in
initialize.js (primary and handoff agents) and through
primaryConfig in the openai.js and responses.js controllers,
avoiding per-tool-call capability re-resolution on the hot path.

* test: add regression tests for action capability gating

Test the real exported functions (resolveAgentCapabilities,
loadAgentTools, loadToolsForExecution) with mocked dependencies
instead of shadow re-implementations. Covers definition filtering,
execution gating, actionsEnabled param forwarding, and fallback
capability resolution.

* test: use Constants.EPHEMERAL_AGENT_ID in ephemeral fallback test

Replaces a string guess with the canonical constant to avoid
fragility if the ephemeral detection heuristic changes.

* fix: populate agentToolContexts for addedConvo parallel agents

After processAddedConvo returns, backfill agentToolContexts for
any agents in agentConfigs not already present, so ON_TOOL_EXECUTE
for added-convo agents receives actionsEnabled instead of falling
back to a per-call cache lookup.
2026-03-15 23:01:36 -04:00
Danny Avila
d3c06052d7
🗝️ feat: Credential Variables for DB-Sourced MCP Servers (#12044)
* feat: Allow Credential Variables in Headers for DB-sourced MCP Servers

- Removed the hasCustomUserVars check from ToolService.js, directly retrieving userMCPAuthMap.
- Updated MCPConnectionFactory and related classes to include a dbSourced flag for better handling of database-sourced configurations.
- Added integration tests to ensure proper behavior of dbSourced servers, verifying that sensitive placeholders are not resolved while allowing customUserVars.
- Adjusted various MCP-related files to accommodate the new dbSourced logic, ensuring consistent handling across the codebase.

* chore: MCPConnectionFactory Tests with Additional Flow Metadata for typing

- Updated MCPConnectionFactory tests to include new fields in flowMetadata: serverUrl and state.
- Enhanced mockFlowData in multiple test cases to reflect the updated structure, ensuring comprehensive coverage of the OAuth flow scenarios.
- Added authorization_endpoint to metadata in the test setup for improved validation of the OAuth process.

* refactor: Simplify MCPManager Configuration Handling

- Removed unnecessary type assertions and streamlined the retrieval of server configuration in MCPManager.
- Enhanced the handling of OAuth and database-sourced flags for improved clarity and efficiency.
- Updated tests to reflect changes in user object structure and ensure proper processing of MCP environment variables.

* refactor: Optimize User MCP Auth Map Retrieval in ToolService

- Introduced conditional loading of userMCPAuthMap based on the presence of MCP-delimited tools, improving efficiency by avoiding unnecessary calls.
- Updated the loadToolDefinitionsWrapper and loadAgentTools functions to reflect this change, enhancing overall performance and clarity.

* test: Add userMCPAuthMap gating tests in ToolService

- Introduced new tests to validate the logic for determining if MCP tools are present in the agent's tool list.
- Implemented various scenarios to ensure accurate detection of MCP tools, including edge cases for empty, undefined, and null tool lists.
- Enhanced clarity and coverage of the ToolService capability checking logic.

* refactor: Enhance MCP Environment Variable Processing

- Simplified the handling of the dbSourced parameter in the processMCPEnv function.
- Introduced a failsafe mechanism to derive dbSourced from options if not explicitly provided, improving robustness and clarity in MCP environment variable processing.

* refactor: Update Regex Patterns for Credential Placeholders in ServerConfigsDB

- Modified regex patterns to include additional credential/env placeholders that should not be allowed in user-provided configurations.
- Clarified comments to emphasize the security risks associated with credential exfiltration when MCP servers are shared between users.

* chore: field order

* refactor: Clean Up dbSourced Parameter Handling in processMCPEnv

- Reintroduced the failsafe mechanism for deriving the dbSourced parameter from options, ensuring clarity and robustness in MCP environment variable processing.
- Enhanced code readability by maintaining consistent comment structure.

* refactor: Update MCPOptions Type to Include Optional dbId

- Modified the processMCPEnv function to extend the MCPOptions type, allowing for an optional dbId property.
- Simplified the logic for deriving the dbSourced parameter by directly checking the dbId property, enhancing code clarity and maintainability.
2026-03-03 18:02:37 -05:00
Danny Avila
3d98194a99
🦥 feat: Add Deferred Tools as Agents Capability (#11295) 2026-01-28 17:44:30 -05:00