Commit graph

32 commits

Author SHA1 Message Date
Ravi Kumar L
27b0782201
📛 feat: Tag Langfuse Traces With Tenant ID (#13808)
* feat: tag Langfuse traces with tenant id

* fix: propagate tenant id to agent Langfuse config
2026-06-17 20:27:55 -04:00
Danny Avila
3c3837bb7d
🧾 fix: Bill Subagent Child-Run Model Usage in Parent Transactions (#13683)
* 🧾 fix: Bill Subagent Child-Run Model Usage in Parent Transactions

* 🩹 fix: Type Subagent Usage Sink Structurally Until SDK Release

* 🔧 chore: Update @librechat/agents dependency to version 3.2.35 in package-lock.json and related package.json files
2026-06-13 14:55:48 -04:00
Danny Avila
139d61c437
🚐 fix: Reuse Request-Scoped MCP Connections per Run (#13673)
* fix(mcp): reuse request-scoped connections per run

* test(mcp): update connection factory defaults
2026-06-11 01:17:14 -04:00
Danny Avila
65bca95023
🎒 fix: Carry Request-Scoped MCP Tools into PTC Execution (#13669)
* fix(mcp): preserve request-scoped tools for PTC execution

* fix(mcp): preserve run-scoped tools on initialized agents
2026-06-10 23:48:04 -04:00
Dustin Healy
5867f1a065
🛡️ feat: Configurable Message PII Filter (#13602)
* 🛡️ feat: Reject chat messages matching configured credential patterns

Adds an opt-in `messagePiiFilter` middleware mounted on the agent
chat route ahead of `moderateText`. When the configured patterns
match the user's input the request is refused with 400, so the
credential never reaches OpenAI moderation, the model, or MongoDB.
Three starter patterns ship by default and operators can subset
them or add their own regex via `customPatterns` in librechat.yaml.

* 🧪 test: Memoize compiled patterns + add middleware spec

Memoize the compiled pattern array via a WeakMap keyed by the
messagePiiFilter config object so repeat requests against the same
config skip the per-request RegExp construction. Cache entries are
released automatically when the config object itself rotates.
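A minimal sketch of the memoization described above, assuming a simplified config shape. The names `PiiFilterConfig`, `PatternSpec`, and `getCompiledPatterns` are hypothetical and are not taken from the actual module:

```typescript
// Hypothetical shapes for illustration; the real config type may differ.
interface PatternSpec {
  id: string;
  regex: string;
}

interface PiiFilterConfig {
  customPatterns?: PatternSpec[];
}

interface CompiledPattern {
  id: string;
  re: RegExp;
}

// Keyed by the config object itself. Once a config object is replaced and
// no longer referenced, its cache entry becomes eligible for garbage collection.
const compiledCache = new WeakMap<PiiFilterConfig, CompiledPattern[]>();

function getCompiledPatterns(config: PiiFilterConfig): CompiledPattern[] {
  const cached = compiledCache.get(config);
  if (cached) {
    return cached; // repeat request against the same config: no RegExp construction
  }
  const compiled = (config.customPatterns ?? []).map((spec) => ({
    id: spec.id,
    re: new RegExp(spec.regex),
  }));
  compiledCache.set(config, compiled);
  return compiled;
}
```

Because the cache key is object identity, a structurally identical but newly created config object compiles its patterns again, which is the rotation behavior the commit message describes.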

Adds packages/api/src/middleware/messagePiiFilter.spec.ts covering
the default-starter rejections, the starterPatterns subset and
empty-array semantics, customPatterns matching layered on top of and
in place of the starters, the no-config and empty-text pass-through
paths, and a memoization regression check.

* 🛡️ fix: Skip invalid customPattern regexes instead of crashing the request

Admin DB overrides for `messagePiiFilter.customPatterns` reach
`req.config` via `mergeConfigOverrides`, which deep-merges raw
override values without re-running `configSchema`. A typo'd regex
like `(` would slip past the YAML-load validation and throw inside
`new RegExp(...)` during `compile()`, returning 500 for every chat
request until the operator rolled the override back.

Wrapped the per-pattern compile in a try/catch that logs the
invalid pattern id + reason and skips it, so other valid patterns
(starters and other custom entries) keep filtering. Added a
regression test alongside the existing spec.
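The skip-on-failure compile can be sketched as follows. This is an illustration under assumed names (`compilePatterns`, `PatternSpec`, and the `warn` callback are hypothetical), not the code from the PR:

```typescript
// Hypothetical shapes for illustration only.
interface PatternSpec {
  id: string;
  regex: string;
}

interface CompiledPattern {
  id: string;
  re: RegExp;
}

function compilePatterns(
  specs: PatternSpec[],
  warn: (message: string) => void = console.warn,
): CompiledPattern[] {
  const compiled: CompiledPattern[] = [];
  for (const spec of specs) {
    try {
      // new RegExp throws a SyntaxError for malformed sources such as "("
      compiled.push({ id: spec.id, re: new RegExp(spec.regex) });
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      // Log the pattern id and reason, then keep going so valid patterns still apply.
      warn(`Skipping invalid pattern "${spec.id}": ${reason}`);
    }
  }
  return compiled;
}
```

Wrapping each pattern individually, rather than the whole loop, is what keeps one bad override from disabling every other pattern.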

* 🛡️ feat: Extend PII filter to OpenAI-compatible and Responses agent APIs

The chat-route middleware operates on `req.body.text`, but the remote
agent API endpoints (`/api/agents/v1/chat/completions`,
`/api/agents/v1/responses`) accept the same prompt content as a
`messages` array or an `input` field. A caller using their API key
could send a credential-shaped value through either route and bypass
the configured PII filter even though they share the same agent and
model backbone the middleware is meant to guard.

Factored out `findPiiMatchInMessages`, a tolerant walker that handles
both `content: string` and `content: ContentPart[]` user-message
shapes against the same compiled, cached pattern list. Wired it into
the OpenAI-compat controller after agent lookup and into the
Responses controller right after `convertToInternalMessages`. Each
returns the endpoint's native 400 error shape
(`sendErrorResponse` / `sendResponsesErrorResponse`) with the
`message_pii_filter_block` code when a user message matches.
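A rough sketch of a walker that tolerates both content shapes. The function name `findPatternMatch` and the `ChatMessage` / `ContentPart` types are assumptions for illustration, and the real `findPiiMatchInMessages` signature may differ:

```typescript
// Hypothetical message shapes; real request bodies carry more fields.
interface ContentPart {
  type: string;
  text?: string;
}

interface ChatMessage {
  role: string;
  content: string | ContentPart[];
}

interface CompiledPattern {
  id: string;
  re: RegExp; // assumed to be compiled without the "g" flag, so test() is stateless
}

// Returns the id of the first pattern that matches a user message, or null.
function findPatternMatch(
  messages: ChatMessage[],
  patterns: CompiledPattern[],
): string | null {
  for (const message of messages) {
    if (message.role !== 'user') {
      continue; // only user-authored content is checked in this sketch
    }
    const text =
      typeof message.content === 'string'
        ? message.content
        : message.content.map((part) => part.text ?? '').join('\n');
    for (const pattern of patterns) {
      if (pattern.re.test(text)) {
        return pattern.id;
      }
    }
  }
  return null;
}
```

A controller would call something like this after normalizing the request into messages and return its native 400 error shape when the result is non-null.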

* 🩹 test: Add findPiiMatchInMessages to OpenAI + Responses controller mocks

The OpenAI-compat and Responses controller specs mock `@librechat/api`
with a hand-listed object. The new `findPiiMatchInMessages` export
wired into both controllers in 3ea35af9a was missing from those
mocks, so the production lookup returned undefined and the controllers
threw at request time under jest. Added the missing entries (default
mock: returns null so the handlers fall through to the existing happy
paths). All 278 agents-controller tests pass locally.

* 🧹 refactor: Namespace messagePiiFilter under messageFilter.pii + fix import order

Renames the yaml field `messagePiiFilter` to `messageFilter.pii`, the
module to `messageFilterPii`, the factory to `createMessageFilterPii`,
the type to `MessageFilterPiiConfig`, and the error code to
`message_filter_pii_block`. The wrapper `messageFilter` namespace
gives future safety filters (e.g. `messageFilter.toxicity`) a place
to plug in without restructuring the config later. The
`findPiiMatchInMessages` helper kept its name because it already
describes what it does at the value level.

Also fixes import order Danny flagged on the OpenAI-compatible and
Responses controllers: `findPiiMatchInMessages` was appended at the
bottom of two `require('@librechat/api')` destructures rather than
placed in the length-sorted slot the house style expects.

* 🧹 chore: Length-sort the general require destructure in responses.js

Reorders the general sub-group inside the `require('@librechat/api')`
destructure shortest to longest so the whole block conforms to the
length-sort rule the file's `// Responses API` sub-group already
follows. Pure reorder, no other changes.

* 🧹 chore: Length-sort the defaultConfig block in AppService

Reorders the `defaultConfig` keys in `packages/data-schemas/src/app/service.ts`
shortest-line to longest-line, with the explicit-value entries
(`mcpConfig`, `fileStrategies`, `cloudfront`) trailing the shorthand
ones. Pure reorder, no behavior change.
2026-06-10 09:03:05 -04:00
Danny Avila
2c8d54e18c
🗂️ feat: Add Deployment Skill Directory (#13523)
* feat: Add deployment skill directory

* chore: Address deployment skill review feedback

* fix: Include deployment skill file metadata

* test: Add deployment skills e2e smoke test
2026-06-05 10:24:28 -04:00
Danny Avila
1da789bac0
🗂️ feat: Add Agent File Authoring Tools (#13435)
* feat: add agent file authoring tools

* style: format file authoring changes

* style: satisfy file authoring prettier

* test: fix file authoring initialization expectations

* fix: complete skill file authoring flow

* fix: pass skill authoring state on edit

* test: mock missing bundled skill file

* fix: harden agent file authoring gates

* fix: preserve file authoring runtime context

* test: fix authoring context mock typing

* fix: preserve subagent skill primes

* test: avoid array at in handler spec

* refactor: deepen skill authoring runtime wiring

* fix: address codex authoring review findings

* test: fix authoring collision fixture type

* test: add skill file authoring mock e2e

* fix: Improve skill file authoring recovery

* fix: Show file authoring args while running

* fix: Clarify skill rename authoring errors

* fix: Keep code-only file authoring schemas sandbox scoped

* fix: Address skill authoring review findings

* fix: Gate skill authoring on write access
2026-06-03 23:58:12 -04:00
Danny Avila
596f806f60
🛡️ fix: Strict Opt-In Skills Activation per Agent (#12823)
* 🛡️ fix: Strict opt-in skills activation per agent

Skills were activating on every agent run that had the capability +
RBAC enabled, regardless of whether the user (ephemeral) or author
(persisted) had opted in. `scopeSkillIds(undefined)` fell through to
"full accessible catalog" whenever `agent.skills` was unset, which is
the default state for any agent created before skills existed and for
every ephemeral agent.

Activation now requires an explicit signal:
- Ephemeral agent → per-conversation skills badge toggle.
- Persisted agent → new `skills_enabled` master switch on the agent
  doc, surfaced as a toggle in the Agent Builder skills section.
  Enabled + empty/undefined allowlist = full accessible catalog;
  enabled + non-empty allowlist = narrow to those ids; disabled (or
  undefined) = no skills available, even if an allowlist is set.

Centralised the predicate in `resolveAgentScopedSkillIds` so the
primary-agent path, handoff/discovery, the subagent loop, and both
OpenAI controllers all share one source of truth. Frontend `$`
popover scope mirrors the same logic so the UI never offers skills
the backend would refuse to activate.
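The activation rules above can be sketched as a single predicate. The input shape and the name `resolveScopedSkillIds` are hypothetical. Two behaviors here are assumptions not stated in the message: an ephemeral agent with the toggle on gets the full accessible catalog, and an allowlist is intersected with the accessible catalog:

```typescript
// Hypothetical input shape; the real resolveAgentScopedSkillIds takes different arguments.
interface SkillScopeInput {
  isEphemeral: boolean;
  ephemeralToggle?: boolean; // per-conversation skills badge toggle
  skillsEnabled?: boolean; // persisted agent master switch
  allowlist?: string[]; // persisted agent skill ids
  accessible: string[]; // skills the user can access at all
}

function resolveScopedSkillIds(input: SkillScopeInput): string[] {
  const active = input.isEphemeral
    ? input.ephemeralToggle === true
    : input.skillsEnabled === true;
  if (!active) {
    return []; // disabled or undefined: no skills, even if an allowlist is set
  }
  // Assumption: ephemeral agents with the toggle on see the full accessible catalog.
  if (input.isEphemeral || !input.allowlist || input.allowlist.length === 0) {
    return [...input.accessible];
  }
  // Assumption: a non-empty allowlist narrows the accessible catalog.
  const allowed = new Set(input.allowlist);
  return input.accessible.filter((id) => allowed.has(id));
}
```

Keeping the "is it active" check ahead of any allowlist handling is what makes an unset `skills_enabled` fail closed.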

* test: mock resolveAgentScopedSkillIds in agent controller specs

* refactor: address review findings on skills opt-in PR

- AgentConfig: associate skills label with toggle via htmlFor for
  click/keyboard affordance; simplify Switch handler to Boolean(value).
- skills: mark scopeSkillIds as @internal so runtime callers continue
  to route through resolveAgentScopedSkillIds and inherit the activation
  predicate (ephemeral toggle, persisted skills_enabled).

* fix(agents): include skills_enabled in agent list projection

Without this field, agents loaded via the list endpoint hydrate into the
client agentsMap with skills_enabled === undefined, causing the `$`
skill popover to hide every skill on a fresh page load even when the
agent was saved with skills_enabled: true.

* fix(skills): fail closed for persisted agents during agentsMap hydration

Returning undefined while the agents map loads let the popover render the
full catalog for a persisted agent before we could read its
skills_enabled flag, so the user could pick a skill the backend would
then refuse for the turn. Match the strict opt-in contract by returning
[] until the map is authoritative.

* refactor(skills): extract skillsHintKey for readability

Replaces the nested ternary in the skills section JSX with a
pre-computed constant so the activation -> hint key mapping reads
top-down.

* refactor(skills): unflatten skillsHintKey to remove nested ternary
2026-04-25 04:02:01 -04:00
Danny Avila
35bf04b26c
🧰 refactor: Unify code-execution tools (#12767)
* 🛠️ feat: Add registerCodeExecutionTools helper

Idempotently registers `bash_tool` + `read_file` in the run's tool
registry and tool-definition list via a registry `.has()` dedupe. Sets
up the single code-execution tool path shared by:

- `initializeAgent` (when an agent has `execute_code` in its tools and
  the capability is enabled for the run)
- `injectSkillCatalog` (when skills are active; unconditional read_file,
  bash_tool follows `codeEnvAvailable`)

Both callers reach the helper in the same initialization sequence, so
the second call becomes a no-op and exactly one copy of each tool
reaches the LLM — no more double registration for agents that combine
`execute_code` capability with active skills.

Unit-tested on a fresh run, idempotence (second call, overlap with
prior tooldefs, partial overlap), and the no-registry variant.
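A simplified sketch of idempotent registration via a registry `.has()` check. `ToolDef`, `CODE_TOOLS`, and `registerCodeTools` are hypothetical stand-ins, and returning the input array by reference on the no-op path mirrors a later polish commit in this PR rather than this one:

```typescript
// Hypothetical tool shape; real tool definitions also carry a parameter schema.
interface ToolDef {
  name: string;
  description: string;
}

const CODE_TOOLS: ToolDef[] = [
  { name: 'bash_tool', description: 'Run a shell command in the sandbox' },
  { name: 'read_file', description: 'Read a file from storage' },
];

function registerCodeTools(
  registry: Map<string, ToolDef>,
  toolDefinitions: ToolDef[],
): ToolDef[] {
  const present = new Set(toolDefinitions.map((def) => def.name));
  const additions: ToolDef[] = [];
  for (const tool of CODE_TOOLS) {
    if (!registry.has(tool.name)) {
      registry.set(tool.name, tool);
    }
    if (!present.has(tool.name)) {
      additions.push(tool);
    }
  }
  // No-op path: hand back the same array instead of allocating a copy.
  return additions.length === 0 ? toolDefinitions : [...toolDefinitions, ...additions];
}
```

Calling the helper twice in the same run leaves exactly one copy of each tool in both the registry and the definition list.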

* 🔀 refactor: Route injectSkillCatalog bash_tool + read_file through registerCodeExecutionTools

The `skill` tool is still registered inline (it's skill-path-specific),
but `bash_tool` + `read_file` now flow through the shared idempotent
helper so a prior registration from the execute_code path doesn't
produce a duplicate copy later in the same run. Behavior preserved:

- `read_file` always registers when any active skill is in scope —
  manually-primed `disable-model-invocation: true` skills still need it
  to load `references/*` from storage.
- `bash_tool` follows `codeEnvAvailable` exactly as before.

Adds a test pinning the cross-call dedupe: when `injectSkillCatalog`
runs AFTER `registerCodeExecutionTools` has already seeded the registry
+ tool definitions with bash_tool/read_file, the resulting
`toolDefinitions` still contains exactly one copy of each.

* 🪄 feat: Expand `execute_code` tool name into bash_tool + read_file at initialize-time

When an agent's `tools` include `execute_code` and the `execute_code`
capability is enabled for the run, `initializeAgent` now registers
`bash_tool` + `read_file` via `registerCodeExecutionTools` before
`injectSkillCatalog`. The legacy `execute_code` tool definition is no
longer handed to the LLM — `execute_code` remains on the agent
document as a capability-trigger marker, but the runtime expands it
into the skill-flavored tool pair.

Call ordering matters: the `execute_code` registration runs BEFORE
`injectSkillCatalog`, so the skill path's own `registerCodeExecutionTools`
call inside `injectSkillCatalog` becomes a no-op via the registry's
`.has()` check. Exactly one copy of each tool reaches the LLM whether
the agent has:

- only `execute_code` (legacy path)
- only skills
- both

No data migration needed — `agent.tools: ['execute_code']` stays in
the DB unchanged; the expansion is a runtime operation.

Three tests cover the matrix: execute_code + capability on →
bash_tool + read_file registered; execute_code + capability off →
neither registered; no execute_code + capability on → neither
registered.

* 🗑️ refactor: Drop CodeExecutionToolDefinition from the builtin registry

Removes the legacy `execute_code` entry from `agentToolDefinitions` and
the corresponding import. With the initialize-time expansion in place,
nothing consults `getToolDefinition('execute_code')` for a tool schema
any more — the capability gate still filters on the string
`execute_code`, but the actual tool definitions the LLM sees come from
`registerCodeExecutionTools` (i.e. `bash_tool` + `read_file`).

`loadToolDefinitions` in `packages/api/src/tools/definitions.ts`
silently drops `execute_code` when it no longer resolves in the
registry — that's the expected path and is now covered by an updated
test. No caller of `getToolDefinition('execute_code')` expects a
non-undefined result after this change.

* 🔌 refactor: Read CODE_API_KEY from env for primeCodeFiles + PTC

Finishes the Phase 4 server-env-keyed rollout on the two remaining
`loadAuthValues({ authFields: [EnvVar.CODE_API_KEY] })` sites in
`ToolService.js`:

- `primeCodeFiles` (user-attached file priming on execute_code agents)
- Programmatic Tool Calling (`createProgrammaticToolCallingTool`)

Both now read `process.env[EnvVar.CODE_API_KEY]` directly, matching
`bash_tool`'s pattern. The per-user plugin-auth path is no longer
consulted for code-env credentials anywhere in the hot path — the
agents library owns the actual tool-call execution and also reads the
env var internally.

Priming still fires for existing user-file workflows so the legacy
`toolContextMap[execute_code]` hint ("files available at /mnt/data/...")
stays in the prompt; only the key lookup changed.

* 🔧 fix: Type the pre-seeded dedupe-test tools as LCTool

CI TypeScript type checks caught `{ parameters: {} }` in the new
cross-call dedupe test: `LCTool.parameters` is a `JsonSchemaType`,
not `{}`. Use `{ type: 'object', properties: {} }` and type the
local registry Map through the parameter-derived shape so the
pre-seeded values match what `toolRegistry.set` expects.

* 🛡️ fix: Run execute_code expansion before GOOGLE_TOOL_CONFLICT gate

Codex review caught a latent regression: the original Phase 8 placement
ran `registerCodeExecutionTools` after `hasAgentTools` was computed,
so an execute-code-only agent on Google/Vertex with provider-specific
`options.tools` populated would no longer trip `GOOGLE_TOOL_CONFLICT`
— the legacy `CodeExecutionToolDefinition` used to populate
`toolDefinitions` before the guard, but after dropping it from the
registry, `toolDefinitions` stayed empty until my expansion ran
downstream of the guard. Mixed provider + agent tools would silently
flow through to the LLM.

Fix moves the `execute_code` expansion to BEFORE `hasAgentTools`
computation. `bash_tool` + `read_file` now contribute to the check
the same way the legacy `execute_code` def did. Covered by a new
test that pins the Google+execute_code+provider-tools scenario —
the `rejects.toThrow(/google_tool_conflict/)` path would have
silently passed on the prior placement.

* 🔗 fix: Thread codeEnvAvailable through handoff sub-agents

Round-2 codex review caught the other half of the execute_code
expansion gap: `discoverConnectedAgents` omitted `codeEnvAvailable`
from its forwarded `initializeAgent` params, so handoff sub-agents
with `agent.tools: ['execute_code']` lost the `bash_tool` + `read_file`
registration (pre-Phase 8 the legacy `CodeExecutionToolDefinition`
would have landed in their `toolDefinitions` via the registry).

- Add `codeEnvAvailable?` to `DiscoverConnectedAgentsParams` and
  forward it verbatim on every sub-agent `initializeAgent` call.
- Update the three JS call sites that construct the primary's
  `codeEnvAvailable` (`services/Endpoints/agents/initialize.js`,
  `controllers/agents/openai.js`, `controllers/agents/responses.js`)
  to pass the same flag into `discoverConnectedAgents` — one
  authoritative source per request.
- Two regression tests in `discovery.spec.ts` pin the true/false
  passthrough so a future refactor that drops the param-forwarding
  surfaces immediately.

Left intentionally unchanged: `packages/api/src/agents/openai/service.ts`
(public API helper with no in-repo caller). External consumers of
`createAgentChatCompletion` who want code execution should pass a
`codeEnvAvailable`-aware `initializeAgent` via `deps` — documenting
the full public-API surface is out of scope for this Phase 8 PR.

* 🔗 fix: Thread codeEnvAvailable through addedConvo + memory-agent paths

Round-3 codex review caught the last two production `initializeAgent`
callers missing the Phase-8 capability flag:

- `api/server/services/Endpoints/agents/addedConvo.js` (multi-convo
  parallel agent execution). Added `codeEnvAvailable` to
  `processAddedConvo`'s destructured params and forwarded it into
  the per-added-agent `initializeAgent` call. Caller in
  `api/server/services/Endpoints/agents/initialize.js` passes the
  same `codeEnvAvailable` it computed for the primary.
- `api/server/controllers/agents/client.js` (`useMemory` — memory
  extraction agent). Computes its own `codeEnvAvailable` from
  `appConfig?.endpoints?.[EModelEndpoint.agents]?.capabilities` and
  forwards into `initializeAgent`. Memory agents rarely list
  `execute_code`, but if one does, pre-Phase 8 they got the legacy
  `execute_code` tool registered unconditionally — the passthrough
  restores parity.

With this, every production caller of `initializeAgent` explicitly
resolves the capability: main chat flow (primary + handoff), OpenAI
chat completions (primary + handoff), Responses API (primary + handoff),
added convo parallel agents, and memory agents. The one remaining
caller, `packages/api/src/agents/openai/service.ts::createAgentChatCompletion`,
is a public API helper with no in-repo consumer (external callers
must pass a capability-aware `initializeAgent` via `deps`).

* 🪤 fix: Remove duplicate appConfig declaration causing TDZ ReferenceError

The Responses API controller had TWO `const appConfig = req.config;`
bindings inside `createResponse`: one at the top of the function
(added by the Phase 4 `bash_tool` decouple) and one inside the try
block (added by the polish PR #12760). Because `const` is block-scoped
with a temporal dead zone, the inner redeclaration put `appConfig` in
TDZ for the entire try block, so any earlier reference inside the
try — notably `appConfig?.endpoints?.[EModelEndpoint.agents]?.allowedProviders`
at line 348 — threw `ReferenceError: Cannot access 'appConfig'
before initialization`. The error was silently swallowed by the
outer try/catch, leaving `recordCollectedUsage` unreached and the
six `responses.unit.spec.js` token-usage tests failing.

Removing the inner redeclaration fixes the six failing tests
(verified: 11/11 pass locally post-fix, 0 regressions elsewhere).
The outer function-scoped binding already provides `appConfig` to
every downstream reference.

* 🔗 fix: Thread codeEnvAvailable through the OpenAI chat-completion public API

Round-4 codex review (legitimate on the type-safety angle, even though the
runtime concern was already covered): the `createAgentChatCompletion`
helper defines its own narrower `InitializeAgentParams` interface locally,
and the type was missing `codeEnvAvailable`. External consumers who
supply a capability-aware `deps.initializeAgent` couldn't route
`codeEnvAvailable` through without a type-cast workaround.

- Widen the local `InitializeAgentParams` interface to include
  `codeEnvAvailable?: boolean` (matches the real
  `packages/api/src/agents/initialize.ts` type).
- Derive `codeEnvAvailable` inside `createAgentChatCompletion` from
  `deps.appConfig?.endpoints?.agents?.capabilities` (the same source
  the in-repo controllers use) and forward to `deps.initializeAgent`.
  Uses a string literal `'execute_code'` lookup so this file stays free
  of a `librechat-data-provider` import — keeping the dependency surface
  of the public helper minimal.

With this, external consumers of `createAgentChatCompletion` who pass
`appConfig` with the agents capabilities get `bash_tool` + `read_file`
registration automatically; consumers who don't pass `appConfig` retain
the existing "explicit opt-in" semantics (the flag stays `undefined`,
expansion is skipped).
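A minimal sketch of the derivation described above, using the string-literal lookup from this commit. `AppConfigLike` and `deriveCodeEnvAvailable` are hypothetical names, and treating a missing `capabilities` list the same as a missing `appConfig` is an assumption:

```typescript
// Hypothetical, deliberately narrow view of the app config.
interface AppConfigLike {
  endpoints?: {
    agents?: {
      capabilities?: string[];
    };
  };
}

function deriveCodeEnvAvailable(appConfig?: AppConfigLike): boolean | undefined {
  const capabilities = appConfig?.endpoints?.agents?.capabilities;
  if (!capabilities) {
    // No config supplied: the flag stays undefined and the expansion is skipped.
    return undefined;
  }
  return capabilities.includes('execute_code');
}
```

Returning `undefined` rather than `false` keeps the "explicit opt-in" case distinguishable from an admin who configured capabilities without code execution.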

* 🧹 chore: Review-driven polish — observability log, JSDoc DRY, test gaps, no-op allocation

Addresses the comprehensive review of PR #12767:

- **Finding #1** (MINOR, observability): `initializeAgent` now emits a
  debug log when an agent lists `execute_code` in its tools but the
  runtime gate is off (`params.codeEnvAvailable` !== true). The
  event-driven `loadToolDefinitionsWrapper` path doesn't log
  capability-disabled warnings, so without this the tool silently
  vanishes from the LLM's definitions with zero trace. Operators
  debugging "why isn't code interpreter working?" now get a signal at
  the initialize layer.

- **Finding #5** (NIT, allocation): `registerCodeExecutionTools` now
  returns the input `toolDefinitions` array by reference on the no-op
  path (both tools already registered by a prior caller in the same
  run) instead of allocating a fresh spread array every time. The
  common dual-call scenario — `initializeAgent` then
  `injectSkillCatalog` — saves one O(n) copy per request.

- **Finding #4** (NIT, DRY): Collapsed the duplicated 6-line JSDoc
  comment in `openai.js`, `responses.js`, and `addedConvo.js` into
  either a one-line `@see DiscoverConnectedAgentsParams.codeEnvAvailable`
  pointer (the two JS call sites) or a compact 3-line block referring
  back to the canonical source (addedConvo's @param).

- **Finding #2** (MINOR, test gap): Added
  `api/server/services/Endpoints/agents/addedConvo.spec.js` with three
  cases covering `codeEnvAvailable=true`, `codeEnvAvailable=false`,
  and omitted (undefined) passthrough. A future refactor that drops
  the param from destructuring now surfaces here instead of silently
  regressing multi-convo parallel agents with `execute_code`.

- **Finding #3** (MINOR, test gap): Added
  `api/server/controllers/agents/__tests__/client.memory.spec.js`
  pinning the capability-flag derivation that `AgentClient::useMemory`
  uses — six cases covering present/absent/null/undefined config shapes
  plus an enum-literal pin (`'execute_code'` / `'agents'`). Catches
  enum renames or config-path shifts that would otherwise silently
  strip `bash_tool` + `read_file` from memory agents.

Finding #7 (jest.mock scoping, confidence 40) left as-is: the
reviewer's own risk assessment noted `buildToolSet` doesn't touch
the mocked exports, and restructuring a file-level `jest.mock` to
`jest.doMock` + dynamic `import()` introduces more complexity than
the speculative risk justifies. The existing mock is scoped to the
test file and contains the same stubs the adjacent
`skills.test.ts` already uses.

Finding #6 (PR description commit count) addressed out-of-band via
PR description update.

All existing tests pass, typecheck clean, lint clean across touched
files. New tests: 9 cases across 2 new spec files.

* 🧽 refactor: Replace hardcoded 'execute_code' string with AgentCapabilities enum in service.ts

Follow-up review (conf 55) caught that `openai/service.ts`'s Phase 8
`codeEnvAvailable` derivation used the literal `'execute_code'` while
every in-repo controller uses `AgentCapabilities.execute_code` from
`librechat-data-provider`. The file deliberately uses local type
interfaces to keep the public API helper's type surface small, but
that pattern was never a ban on single-value imports from the data
provider — `packages/api` already depends on it. Importing the enum
value means a future rename of `AgentCapabilities.execute_code`
propagates to this file automatically, matching the in-repo
controllers' behavior.

Other follow-up findings left as-is per the reviewer's own verdict:

- #2 (memory spec mirrors the production expression rather than
  calling `AgentClient::useMemory` directly): reviewer flagged as
  "not blocking" / "design-philosophy observation." The test file's
  JSDoc already explicitly documents the tradeoff and pins the enum
  literals to catch the most likely drift vector. Standing up
  `AgentClient` + all its mocks for a one-line regression guard is
  disproportionate.
- #3 (`addedConvo.spec.js` mock signature vs. underlying
  `loadAddedAgent` arity): reviewer's own confidence 25 noted the
  mock matches the wrapper's actual call pattern in the production
  file. Not a real gap.
- #4 was self-retracted as a false alarm.

* 🗑️ refactor: Fully deprecate CODE_API_KEY — remove all LibreChat-side references

The code-execution sandbox no longer authenticates via a per-run
`CODE_API_KEY` (frontend or backend). Auth moved server-side into the
agents library / sandbox service, so LibreChat drops every reference:

**Backend plumbing:**
- `api/server/services/Files/Code/crud.js`: `getCodeOutputDownloadStream`,
  `uploadCodeEnvFile`, `batchUploadCodeEnvFiles` no longer accept
  `apiKey` or send the `X-API-Key` header.
- `api/server/services/Files/Code/process.js`: `processCodeOutput`,
  `getSessionInfo`, `primeFiles` drop the `apiKey` param throughout.
- `api/server/services/ToolService.js`: stop reading
  `process.env[EnvVar.CODE_API_KEY]` for `primeCodeFiles` and PTC; the
  agents library handles auth internally. Remove the now-dead
  `loadAuthValues` + `EnvVar` imports. Drop the misleading
  "LIBRECHAT_CODE_API_KEY" hint from the bash_tool error log.
- `api/server/services/Files/process.js`: remove the `loadAuthValues`
  call around `uploadCodeEnvFile`.
- `api/server/routes/files/files.js`: code-env file download no longer
  fetches a per-user key.
- `api/server/controllers/tools.js`: `execute_code` is no longer a
  tool that needs verifyToolAuth with `[EnvVar.CODE_API_KEY]` — the
  endpoint always reports system-authenticated so the client skips
  the key-entry dialog. `processCodeOutput` called without `apiKey`.
- `api/server/controllers/agents/callbacks.js`: `processCodeOutput`
  invoked without the loadAuthValues round trip, for both LegacyHandler
  and Responses-API handlers.
- `api/app/clients/tools/util/handleTools.js`: `createCodeExecutionTool`
  called with just `user_id` + files.

**packages/api:**
- `packages/api/src/agents/skillFiles.ts`: `PrimeSkillFilesParams`,
  `PrimeInvokedSkillsDeps`, `primeSkillFiles`, `primeInvokedSkills` all
  drop the `apiKey` param; the gate is purely `codeEnvAvailable`.
- `packages/api/src/agents/handlers.ts`: `handleSkillToolCall` drops
  the `process.env[EnvVar.CODE_API_KEY]` read; skill-file priming is
  now gated solely on `codeEnvAvailable`. `ToolExecuteOptions`
  signatures drop apiKey from `batchUploadCodeEnvFiles` and
  `getSessionInfo`.
- `packages/api/src/agents/skillConfigurable.ts`: JSDoc no longer
  references the env var.
- `packages/api/src/tools/classification.ts`: PTC creation no longer
  gated on `loadAuthValues`; `buildToolClassification` drops the
  `loadAuthValues` dep entirely (no LibreChat-side callers need it for
  this path anymore).
- `packages/api/src/tools/definitions.ts`: `LoadToolDefinitionsDeps`
  drops the `loadAuthValues` field.

**Frontend:**
- Delete `client/src/hooks/Plugins/useAuthCodeTool.ts`,
  `useCodeApiKeyForm.ts`, and
  `client/src/components/SidePanel/Agents/Code/ApiKeyDialog.tsx` —
  the install/revoke dialogs for CODE_API_KEY are fully dead.
- `BadgeRowContext.tsx`: drop `codeApiKeyForm` from the context type and
  provider. `codeInterpreter` toggle treated as always authenticated
  (sandbox auth is server-side).
- `ToolsDropdown.tsx`, `ToolDialogs.tsx`, `CodeInterpreter.tsx`,
  `RunCode.tsx`, `SidePanel/Agents/Code/Action.tsx` + `Form.tsx`: all
  API-key dialog trigger refs, "Configure code interpreter" gear
  buttons, and auth-verification plumbing removed. The
  "Code Interpreter" toggle is now a plain `AgentCapabilities.execute_code`
  checkbox — no key-entry gate.
- `client/src/locales/en/translation.json`: drop the three
  `com_ui_librechat_code_api*` keys and `com_ui_add_code_interpreter_api_key`.
  Other locales are externally automated per CLAUDE.md.

**Config:**
- `.env.example`: remove the `# LIBRECHAT_CODE_API_KEY=your-key` section
  and its header.

**Tests:**
- `crud.spec.js`: assertions flipped to pin "no X-API-Key header" and
  "no apiKey param".
- `skillFiles.spec.ts`: removed env-var save/restore; tests now pin
  that the batch-upload path is gated solely on `codeEnvAvailable` and
  that no apiKey is threaded through.
- `handlers.spec.ts`: same — just the `codeEnvAvailable` gate pins
  remain.
- `classification.spec.ts`: remove the two tests that asserted
  `loadAuthValues` was (not) called for PTC.
- `definitions.spec.ts`: drop every `loadAuthValues: mockLoadAuthValues`
  entry from the deps shape.
- `process.spec.js`: strip the mock of `EnvVar.CODE_API_KEY`.

**Comment hygiene:**
- `tools.ts`, `initialize.ts`, `registry/definitions.ts`: shortened
  stale comment references to "legacy `execute_code` tool" without
  naming the retired env var.

Tests verified: 678 packages/api tests pass, 836 backend api tests
pass. Typecheck clean, lint clean. Only remaining CODE_API_KEY
mentions in the code are two regression-guard assertions:
- `crud.spec.js`: pins "no X-API-Key header" stays absent.
- `skillConfigurable.spec.ts`: pins `configurable` never grows a
  `codeApiKey` field.

* 🧹 chore: Remove the last two CODE_API_KEY name mentions in LibreChat

Follow-up to the prior full deprecation commit: two tests still named
the retired identifier in their regression-guard assertions.

- `packages/api/src/agents/skillConfigurable.spec.ts`: drop the
  "does not inject a codeApiKey key" test. The `codeApiKey` field is
  gone from the production configurable shape, so an absence-assertion
  naming it re-introduces the retired identifier in code.
- `api/server/services/Files/Code/crud.spec.js`: rename the
  "without an X-API-Key header" case back to "should request stream
  response from the correct URL" and drop the
  `expect(headers).not.toHaveProperty('X-API-Key')` assertion. The
  surrounding request-shape checks (URL, timeout, responseType) still
  pin the behavior; the explicit header-absence line was named after
  the deprecated contract.

Result: `grep -rn "CODE_API_KEY\|codeApiKey\|LIBRECHAT_CODE_API_KEY"`
against the LibreChat source tree returns zero hits. The only
remaining `X-API-Key` strings in this repo are on unrelated OpenAPI
Action + MCP server auth configurations, where the string is
user-facing config, not a LibreChat-owned identifier.

Tests: 677 packages/api pass (2 pre-existing summarization e2e failures
unrelated); 126 api-workspace controller/service tests pass.
Typecheck and lint clean.

* 🎯 fix: Narrow codeEnvAvailable to per-agent (admin cap AND agent.tools)

Before this commit, `codeEnvAvailable` was computed in the three JS
controllers as the admin-level capability flag only
(`enabledCapabilities.has(AgentCapabilities.execute_code)`) and passed
through `initializeAgent` → `injectSkillCatalog` / `primeInvokedSkills` /
`enrichWithSkillConfigurable` unchanged. A skills-only agent whose
`tools` array didn't include `execute_code` still got `bash_tool`
registered (via `injectSkillCatalog`) and skill files re-primed to the
sandbox on every turn — wrong, because the agent never opted in to
code execution.

**Fix:** `initializeAgent` now computes the per-agent effective value
once as `params.codeEnvAvailable === true && agent.tools.includes(Tools.execute_code)`
and reuses the same boolean for:

1. The `execute_code` → `bash_tool + read_file` expansion gate
   (previously already consulted `agent.tools`; now shares the single
   `effectiveCodeEnvAvailable` binding).
2. The `injectSkillCatalog` call (previously got the raw admin flag).
3. The returned `InitializedAgent.codeEnvAvailable` field (new, typed as
   required boolean).
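
The narrowing rule above can be sketched as follows. This is a minimal illustration, not the actual `initializeAgent` code: `computeEffectiveCodeEnv`, the `AgentLike` / `InitParams` shapes, and the `EXECUTE_CODE` constant are hypothetical stand-ins. Only the AND of the admin flag with `agent.tools` membership comes from the commit message.

```typescript
// Hypothetical stand-in for `Tools.execute_code`.
const EXECUTE_CODE = 'execute_code';

interface AgentLike {
  tools?: string[];
}

interface InitParams {
  // Admin-level capability flag passed in by the controller.
  codeEnvAvailable?: boolean;
}

/** Per-agent value: the admin capability AND the agent's own opt-in. */
function computeEffectiveCodeEnv(params: InitParams, agent: AgentLike): boolean {
  return params.codeEnvAvailable === true && (agent.tools ?? []).includes(EXECUTE_CODE);
}

// Four-quadrant matrix: only "cap on x agent asks" yields true.
const matrix: boolean[] = [
  computeEffectiveCodeEnv({ codeEnvAvailable: true }, { tools: [EXECUTE_CODE] }),
  computeEffectiveCodeEnv({ codeEnvAvailable: true }, { tools: ['web_search'] }),
  computeEffectiveCodeEnv({ codeEnvAvailable: false }, { tools: [EXECUTE_CODE] }),
  computeEffectiveCodeEnv({}, { tools: [] }),
];
console.log(matrix);
```

Computing the value once and passing the same boolean to every consumer keeps the expansion gate, the catalog injection, and the returned field from drifting apart.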

**Controllers (initialize.js, openai.js, responses.js):** store
`primaryConfig.codeEnvAvailable` in `agentToolContexts.set(primaryId, ...)`,
capture `config.codeEnvAvailable` in every handoff `onAgentInitialized`
callback, and read it from the per-agent ctx inside the
`toolExecuteOptions.loadTools` runtime closure. The hoisted
`const codeEnvAvailable = enabledCapabilities.has(...)` locals in the
two OpenAI-compat controllers are gone — they were shadowing the
narrowed per-agent value.

**primeInvokedSkills:** `handlePrimeInvokedSkills` in
`services/Endpoints/agents/initialize.js` now uses
`primaryConfig.codeEnvAvailable` (per-agent, narrowed) instead of the
raw admin flag. A skills-only primary agent won't re-prime historical
skill files to the sandbox even when the admin enabled the capability
globally.

**Efficiency:** one extra `&&` in `initializeAgent`. No runtime hot-path
cost — the `includes()` scan on `agent.tools` was already happening for
the `execute_code` expansion gate; it's now just bound to a local. Tool
execution closures read `ctx.codeEnvAvailable === true` (property
access + strict equality, O(1)).

**Ephemeral-agent note:** per-agent narrowing is authoritative for both
persisted and ephemeral flows. The ephemeral toggle
(`ephemeralAgent.execute_code`) is reconciled into `agent.tools`
upstream in `packages/api/src/agents/added.ts`, so
`agent.tools.includes('execute_code')` is the single source of truth
by the time `initializeAgent` runs.

**Tests:** two new regression tests pin the narrowing contract:

- `initialize.test.ts` — four-quadrant matrix on
  `InitializedAgent.codeEnvAvailable` (cap on × agent asks, cap on ×
  doesn't ask, cap off × asks, neither). Catches future refactors that
  drop either half of the AND.
- `skills.test.ts` — `injectSkillCatalog` with `codeEnvAvailable: false`
  against an active skill catalog must NOT register `bash_tool` even
  though it still registers `read_file` + `skill`. This is the state
  a skills-only agent gets post-narrowing.

All 191 affected packages/api tests pass + 836 backend api tests pass.
Typecheck clean, lint clean.

* 🧽 refactor: Comprehensive-review polish — hoist tool defs, pin verifyToolAuth contract, doc appConfig

Addresses the comprehensive review of Phase 8. Findings mapped:

**#1 (MINOR): `verifyToolAuth` unconditional auth for execute_code**
- Added doc comment explicitly stating the deployment contract
  (admin capability → reachable sandbox; no per-check health probe
  to keep UI-gate queries O(1)).
- New `api/server/controllers/__tests__/tools.verifyToolAuth.spec.js`
  with 4 regression tests pinning the contract:
  1. `authenticated: true` + `SYSTEM_DEFINED` for execute_code.
  2. 404 for unknown tool IDs.
  3. `loadAuthValues` is never consulted (catches a future revert
     that would resurface the per-user key-entry dialog).
  4. Response `message` is never `USER_PROVIDED`.

**#2 (MINOR): `openai/service.ts` undocumented `appConfig` dependency**
- Expanded the `ChatCompletionDependencies.appConfig` JSDoc to spell
  out that omitting it silently disables code execution for agents
  with `execute_code` in their tools. External consumers of
  `createAgentChatCompletion` now have the contract documented at
  the type boundary.

**#5 (NIT): `registerCodeExecutionTools` re-allocates tool defs**
- Hoisted `READ_FILE_DEF` and `BASH_TOOL_DEF` to module-level
  `Object.freeze`d constants. The shapes derive entirely from
  static `@librechat/agents` exports, so a single frozen object per
  tool is safe to share across every agent init. Eliminates the
  ~4-property allocations on every call (including the common
  second-call no-op path).
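
A rough sketch of the hoisting idea, under stated assumptions: the `ToolDef` shape and the descriptions are hypothetical (the real definitions derive from `@librechat/agents` exports), and this body of `registerCodeExecutionTools` is illustrative only.

```typescript
interface ToolDef {
  readonly name: string;
  readonly description: string;
}

// Hypothetical definitions, allocated once at module load and frozen.
const READ_FILE_DEF: Readonly<ToolDef> = Object.freeze({
  name: 'read_file',
  description: 'Read a file from the sandbox',
});
const BASH_TOOL_DEF: Readonly<ToolDef> = Object.freeze({
  name: 'bash_tool',
  description: 'Run a shell command in the sandbox',
});

/** Registers the shared frozen defs; returns how many were newly added. */
function registerCodeExecutionTools(registry: Map<string, Readonly<ToolDef>>): number {
  let added = 0;
  for (const def of [READ_FILE_DEF, BASH_TOOL_DEF]) {
    if (!registry.has(def.name)) {
      registry.set(def.name, def); // same object for every agent, no per-call allocation
      added += 1;
    }
  }
  return added;
}

const registryA = new Map<string, Readonly<ToolDef>>();
const registryB = new Map<string, Readonly<ToolDef>>();
const firstCall = registerCodeExecutionTools(registryA);
const secondCall = registerCodeExecutionTools(registryA); // no-op path
registerCodeExecutionTools(registryB);
const sharedInstance = registryA.get('bash_tool') === registryB.get('bash_tool');
console.log({ firstCall, secondCall, sharedInstance });
```

Freezing is what makes sharing safe here: no agent init can mutate a definition that another agent also holds.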

**#6 (NIT): Verbose history-priming comment in initialize.js**
- Trimmed the 16-line `handlePrimeInvokedSkills` block to a 5-line
  summary with `@see InitializedAgent.codeEnvAvailable` pointer.
  The canonical narrowing explanation lives on the type; the
  controller comment is just the ACL-vs-capability rationale.

**Skipped:**

- #3 (memory spec tests a mirror function): reviewer self-dismissed
  as a design tradeoff; the enum-literal pin already catches the
  highest-risk drift vector.
- #4 (cross-repo contract for `createCodeExecutionTool`): user will
  explicitly install the latest `@librechat/agents` dev version
  once the companion PR publishes, so the version pin will be
  authoritative.
- #7 (migration/deprecation note for self-hosters): out of scope
  per user direction — release notes handle this.

Tests verified: 679 packages/api + 840 backend api tests pass.
Typecheck + lint clean.

* 🔧 chore: Update @librechat/agents version to 3.1.68-dev.1 across package-lock and package.json files

This commit updates the version of the `@librechat/agents` package from `3.1.68-dev.0` to `3.1.68-dev.1` in the `package-lock.json` and relevant `package.json` files. This change ensures consistency across the project and incorporates any updates or fixes from the new version.
2026-04-25 04:02:01 -04:00
Danny Avila
ac913aa886 🔐 chore: Skills Permissions Housekeeping, Reachable Admin Dialog + Defaults Tests (#12766)
* 🔐 chore: Skills permissions housekeeping — reachable admin dialog + defaults tests

Phase 9 housekeeping pass. Skills was already gated on `PermissionTypes.SKILLS`
(seeded from `interface.skills`) and `AgentCapabilities.skills` everywhere it
matters, but three smaller parity gaps with Prompts/Memory/MCP remained:

- The skills admin settings dialog had no UI entry point. The only mount was
  inside an unused `FilterSkills` component, so admins had no way to reach the
  role-permissions editor for skills. Mounted it in `SkillsAccordion` gated on
  `SystemRoles.ADMIN`, matching the `PromptsAccordion` pattern.
- No regression lock on skill permission defaults. `roles.spec.ts` asserted
  structural completeness but not the specific shape — a future refactor
  could silently flip ADMIN's `USE/CREATE/SHARE/SHARE_PUBLIC` to false or
  drop SKILLS from USER defaults without failing. Added explicit Skills
  assertions for both roles.
- No lock on `AgentCapabilities.skills` being in `defaultAgentCapabilities`.
  Added an assertion in `endpoints.spec.ts`.

* 🩹 fix: Remove duplicate `const appConfig` in Responses createResponse

The Skills polish commit (#12760) added `const appConfig = req.config;` at
line 381 inside the try block of `createResponse`, without noticing that
the earlier drive-by fix (2463b6acd) already declared it at the function
top (line 283). The second `const` creates a new block-scoped binding
inside the try, so earlier references within the same block (e.g.
line 348's `appConfig?.endpoints?.[EModelEndpoint.agents]?.allowedProviders`)
now hit the TDZ instead of the outer binding and throw
`ReferenceError: Cannot access 'appConfig' before initialization` —
which the outer try/catch then swallows into a generic 500.

This surfaced as all six token-usage tests in
`api/server/controllers/agents/__tests__/responses.unit.spec.js` failing
with `mockRecordCollectedUsage` never being called (because the throw
skips past the `recordCollectedUsage(...)` call).

Dropping the inner re-declaration restores the full control flow. All 11
tests in the file pass again.
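
The shadowing failure described above can be reproduced in isolation. This is a simplified sketch, not the real `createResponse`: the `Req` and `Result` shapes and both function names are hypothetical. The early read goes through a closure only so that the TypeScript compiler accepts the file; the original JavaScript read the identifier directly.

```typescript
interface Req {
  config: { allowedProviders: string[] };
}
interface Result {
  status: number;
  providers?: string[];
  error?: string;
}

// Buggy shape: the inner `const appConfig` shadows the outer one for the whole try block.
function createResponseBuggy(req: Req): Result {
  const appConfig = req.config; // outer binding, shadowed inside the try block
  try {
    // Resolves to the INNER binding declared two lines below.
    const readProviders = (): string[] => appConfig.allowedProviders;
    // Throws here: on ES2015+ targets the inner binding is still in its temporal dead zone.
    const providers = readProviders();
    const appConfig = req.config; // duplicate declaration
    return { status: 200, providers };
  } catch (err) {
    // The real error is swallowed into a generic 500.
    return { status: 500, error: (err as Error).name };
  }
}

// Fixed shape: a single declaration at the top of the function.
function createResponseFixed(req: Req): Result {
  const appConfig = req.config;
  try {
    return { status: 200, providers: appConfig.allowedProviders };
  } catch (err) {
    return { status: 500, error: (err as Error).name };
  }
}

const sampleReq: Req = { config: { allowedProviders: ['openAI'] } };
const buggyResult = createResponseBuggy(sampleReq);
const fixedResult = createResponseFixed(sampleReq);
console.log(buggyResult.status, fixedResult.status);
```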

* 🧹 refactor: Address review nits on Phase 9 housekeeping

- Move the `defaultAgentCapabilities` regression test out of the
  `createEndpointsConfigService` describe block and into its own
  top-level describe. It tests a module constant and has no relationship
  to the service factory; nesting was misleading and made it easier to
  accidentally drop if the service tests are later restructured.
- Re-order local imports in `SkillsAccordion.tsx` longest-to-shortest
  per AGENTS.md convention (`SkillsSidePanel` 48 chars before
  `useAuthContext` 41 chars).
2026-04-25 04:02:01 -04:00
Danny Avila
91cd3f7b7c 🧽 refactor: Skills polish: precedence-aware body validation, controller drop logs, SkillPills rename (#12760)
Post-merge sanity-review cleanup on top of #12746:

- `createSkill` / `updateSkill` now parse SKILL.md body's always-apply
  status once and reuse it for both validation and derivation (was
  parsing the same YAML block twice per call).
- Body-inline `always-apply:` validation becomes precedence-aware: a
  caller sending an explicit top-level `alwaysApply` or a structured
  `frontmatter['always-apply']` no longer gets rejected for a typo in
  the body — the body value is never consulted at derivation time when
  a higher-precedence source wins. New tests cover the three relevant
  interactions (explicit+body-typo, frontmatter+body-typo, body-only
  typo still rejects).
- OpenAI and Responses controllers now emit a `logger.warn` when
  `injectSkillPrimes` drops always-apply primes to stay under
  `MAX_PRIMED_SKILLS_PER_TURN`. `injectSkillPrimes` already logs
  internally; the controller-level warn adds endpoint context so
  operators can identify which path hit the cap at a glance. Mirrors
  AgentClient's existing log.
- Rename `ManualSkillPills` → `SkillPills` (component + type + file +
  test + all JSDoc references). The component handles both manual and
  always-apply pills now; the original name was carried over from the
  manual-only Phase 3 and misleads new readers.
- Drive-by fix: declare `appConfig = req.config` at the top of
  `createResponse` in `responses.js`. It was referenced on lines 381/396
  without ever being declared, which throws a `ReferenceError` at runtime
  (optional chaining does not guard an undeclared identifier), so the
  skills-capability check on the Responses endpoint never ran.
  Pre-existing, surfaced by lint on the touched file.
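
The precedence-aware validation from the first two bullets can be sketched roughly as below. Everything here is an assumption for illustration: `parseBodyAlwaysApply`, `resolveAlwaysApply`, the `SkillInput` shape, the single-line regex scan, and the choice to default an absent value to `false` are not taken from the codebase.

```typescript
type BodyParse =
  | { kind: 'absent' }
  | { kind: 'valid'; value: boolean }
  | { kind: 'invalid'; raw: string };

interface SkillInput {
  alwaysApply?: boolean;
  frontmatter?: Record<string, unknown>;
  body?: string;
}

/** Simplified scan for a body-inline `always-apply:` line (hypothetical parser). */
function parseBodyAlwaysApply(body: string | undefined): BodyParse {
  const match = /^always-apply:[ \t]*(\S+)[ \t]*$/m.exec(body ?? '');
  if (!match) return { kind: 'absent' };
  const raw = match[1];
  if (raw === 'true' || raw === 'false') return { kind: 'valid', value: raw === 'true' };
  return { kind: 'invalid', raw };
}

/** Precedence: explicit top-level > structured frontmatter > body. */
function resolveAlwaysApply(input: SkillInput): boolean {
  const parsed = parseBodyAlwaysApply(input.body); // parsed once, reused for validation and derivation
  if (typeof input.alwaysApply === 'boolean') return input.alwaysApply;
  const structured = input.frontmatter?.['always-apply'];
  if (typeof structured === 'boolean') return structured;
  // Only now is the body the deciding source, so only now can a typo reject.
  if (parsed.kind === 'invalid') {
    throw new Error(`Invalid always-apply value in body: ${parsed.raw}`);
  }
  return parsed.kind === 'valid' ? parsed.value : false;
}

const typoBody = '---\nalways-apply: ture\n---\nDo things.';
const explicitWins = resolveAlwaysApply({ alwaysApply: true, body: typoBody });
const frontmatterWins = resolveAlwaysApply({
  frontmatter: { 'always-apply': false },
  body: typoBody,
});
let bodyOnlyRejected = false;
try {
  resolveAlwaysApply({ body: typoBody });
} catch {
  bodyOnlyRejected = true;
}
console.log({ explicitWins, frontmatterWins, bodyOnlyRejected });
```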
2026-04-25 04:02:01 -04:00
Danny Avila
7581540ab6 🔌 refactor: Decouple bash_tool from Per-User CODE_API_KEY (#12712)
* 🔌 refactor: Decouple bash_tool from Per-User CODE_API_KEY

Phase 4 of Agent Skills umbrella (#12625): gate bash_tool and skill
file priming on the `execute_code` capability only. Thread a boolean
`codeEnvAvailable` through `enrichWithSkillConfigurable` and
`primeInvokedSkills` in place of the old per-user `codeApiKey` +
`loadAuthValues` plumbing. The sandbox API key is the LibreChat-
hosted service key — system-level, not a user secret — so the
per-user lookup was legacy; when needed, it's read directly from
`process.env[EnvVar.CODE_API_KEY]` inside the capability gate.

`handleSkillToolCall` and `primeInvokedSkills` gate sandbox uploads
on `codeEnvAvailable` first, preventing skill-file uploads to the
sandbox when an agent has `execute_code` disabled even if the env
var happens to be set. The agents library resolves the env key
itself for `bash_tool`, so `ToolService.js` drops the
`loadAuthValues` lookup and the "Code execution is not available"
placeholder tool in favor of a plain `createBashExecutionTool({})`
with a loud error log if the env var is missing.

Also fixes a pre-existing `appConfig`-undefined lint error in
`responses.js`/`createResponse` that surfaced when this file was
touched (declares `const appConfig = req.config` at function top,
matching the existing pattern in other controllers).

Preserves the `skillPrimedIdsByName` threading added by Phase 3/5/6
and all Phase 3/5/6 call-site signatures. Adds
`skillConfigurable.spec.ts` (5 cases pinning the new surface) and
`skillFiles.spec.ts` (4-way matrix of capability × env key for
`primeInvokedSkills`).

* 🧪 refactor: Address Codex Review Feedback

Resolves findings from the second codex review on #12712:

- MAJOR: `handlers.spec.ts` now covers the `codeEnvAvailable` gate in
  `handleSkillToolCall` across three cases (gate off, gate on + env
  set, gate on + env unset). The gate is the critical regression
  prevention — a future edit that drops it would silently re-enable
  sandbox uploads for agents with `execute_code` disabled.

- MINOR: Hoist `codeEnvAvailable` and `skillPrimedIdsByName` out of
  `loadTools` closures in `openai.js` and `responses.js`. Both values
  are fixed once `initializeAgent` resolves, so recomputing them on
  every tool execution was wasted work. `responses.js` shares a single
  pair between its streaming and non-streaming branches.

- MINOR: `skillFiles.spec.ts` now has a test that exercises the full
  upload path end-to-end with real file records, asserting
  `batchUploadCodeEnvFiles` is called with the env-sourced apiKey and
  the correct file set (including the synthetic `SKILL.md`).

- NIT: Finish the `appConfig` extraction in `responses.js/createResponse`
  — replaces the remaining `req.config` references with `appConfig` for
  consistency with the pattern in other controllers.

No behavioral changes beyond what was already in place; this is
coverage and readability polish.

* 🧷 test: Tighten Spec Hygiene Per Codex Nit Feedback

Round-3 codex review flagged two NITs on the test code added in the
previous commit:

- Replace `_id: 'skill-id' as unknown as never` in the new
  `makeSkillHandlerWithFiles` helper with a real `Types.ObjectId`,
  matching the pattern used by the primed-skill tests further up in
  the same file (and by `skillFiles.spec.ts`). The `never` cast
  hides the fact that `_id` really is a string / ObjectId at runtime.

- Replace the ad-hoc `{ on, pipe, read }` stub with a real
  `Readable.from(Buffer.from(''))` in the upload-path test. The stub
  worked only because `batchUploadCodeEnvFiles` is mocked and never
  iterates the stream; `Readable.from` satisfies the same contract
  and is robust to any future partial-real replacement of the upload
  function.

Pure test-hygiene improvements; no runtime code touched.

* 🧹 chore: Remove Duplicate appConfig Declaration After Rebase

The upstream `2463b6acd` fix declared `const appConfig = req.config`
inside the try block (line 381) to patch the same `no-undef` error
I fixed in this PR at the top of `createResponse` (line 283). The
rebase kept both declarations side-by-side. Drops the inner one —
the outer binding covers every downstream reference already.
2026-04-25 04:02:01 -04:00
Danny Avila
5c69d1f7fa 🩹 fix: define appConfig in Responses API createResponse
responses.js referenced `appConfig` on lines 381 / 396 without ever
declaring it, so `createResponse` threw `ReferenceError: appConfig is
not defined` the moment it entered the skills-capability block. The
existing `recordCollectedUsage` unit tests silently stopped running
(try/catch swallowed the error into `logger.error`), so CI showed
6 assertions failing with "Expected calls: 1, Received: 0" — the
function never reached the recorder.

Mirror initialize.js: seed `appConfig = req.config` at the top of the
try block, before the `enabledCapabilities` Set it feeds into. The two
later `appConfig: req.config` call-sites keep the direct reference —
only the lexical reads needed a binding.

This failure already exists on origin/feat/agent-skills (the same 6
tests fail there with the same stack) but blocks our branch too since
we're rebased on top, so fix it here and cherry-pick back if needed.
2026-04-25 04:02:01 -04:00
Danny Avila
dfc3dfa57f 📍 feat: always-apply frontmatter: auto-prime skills every turn (#12746)
* 🔁 refactor: Rebase always-apply work onto merged structured-frontmatter columns

Phase 6 (disable-model-invocation / user-invocable / allowed-tools)
landed first on feat/agent-skills. Reconcile this branch with the new
mainline:

- Thread alwaysApplySkillPrimes through unionPrimeAllowedTools alongside
  manualSkillPrimes, applying the combined MAX_PRIMED_SKILLS_PER_TURN
  ceiling before loading tools.
- Add `_id` to ResolvedAlwaysApplySkill to match Phase 6's
  ResolvedManualSkill shape (read_file name-collision protection).
- Register 'always-apply' in ALLOWED_FRONTMATTER_KEYS / FRONTMATTER_KIND
  so Phase 6's validator recognizes it.
- Drop frontmatter from the listSkillsByAccess projection; the backfill
  helper remains as defensive code but its read path is no longer
  exercised on summary rows (no legacy rows exist — the branch never
  shipped), saving ~200KB per page.
- Retire the corresponding "backfills legacy on summaries" test.
- Plumb listAlwaysApplySkills through the JS controllers + endpoint
  initializer so the always-apply resolver sees a real DB method.

* 🧹 fix: Dedupe manual/always-apply overlap, share YAML util, tidy comments

Addresses review findings:

- Cross-list dedup: when a user $-invokes a skill that is also marked
  always-apply, the always-apply copy is now dropped so the same
  SKILL.md body never primes twice in one turn. Manual wins (explicit
  intent, closer to the user message). Dedup runs in both
  initializeAgent (so persisted user-bubble pills stay in sync) and
  injectSkillPrimes (defense-in-depth at splice time). New test cases
  cover single-overlap, partial-overlap, and dedup-before-cap.
- DRY: extract stripYamlTrailingComment to
  packages/data-schemas/src/utils/yaml.ts; packages/api/src/skills/import.ts
  now imports the shared helper. Also drop the redundant inner
  stripYamlTrailingComment call inside parseBooleanScalar — the call
  site already strips.
- Mark injectManualSkillPrimes as @deprecated in favor of
  injectSkillPrimes (kept for external consumers of @librechat/api).
- Document SKILL_TRIGGER_MODEL as forward-looking plumbing for the
  model-invoked path rather than leaving it as a bare unused export.
- Replace the stale "frontmatter is included" comment on
  listSkillsByAccess with an accurate explanation of why it was
  intentionally excluded.
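
A minimal sketch of the cross-list dedup and the dedup-before-cap ordering from the first bullet. The `SkillPrime` shape, the function name, the cap value of 3, and the choice to fill the cap with manual primes first are assumptions made for illustration; only the "manual wins" rule and the `alwaysApplyDedupedFromManual` counter name come from the commit messages.

```typescript
interface SkillPrime {
  name: string;
  body: string;
}

// Hypothetical value; the real MAX_PRIMED_SKILLS_PER_TURN lives in the codebase.
const MAX_PRIMED_SKILLS_PER_TURN = 3;

function dedupeAndCapPrimes(manual: SkillPrime[], alwaysApply: SkillPrime[]) {
  // 1. Dedup first: drop always-apply copies of skills the user invoked manually.
  const manualNames = new Set(manual.map((p) => p.name));
  const deduped = alwaysApply.filter((p) => !manualNames.has(p.name));

  // 2. Then cap: manual primes fill the budget first (assumed ordering).
  const manualKept = manual.slice(0, MAX_PRIMED_SKILLS_PER_TURN);
  const remaining = MAX_PRIMED_SKILLS_PER_TURN - manualKept.length;
  const alwaysKept = deduped.slice(0, remaining);

  return {
    manual: manualKept,
    alwaysApply: alwaysKept,
    alwaysApplyDedupedFromManual: alwaysApply.length - deduped.length,
    alwaysApplyDropped: deduped.length - alwaysKept.length,
  };
}

const prime = (name: string): SkillPrime => ({ name, body: `# ${name}` });
const primeResult = dedupeAndCapPrimes(
  [prime('a'), prime('b')],
  [prime('b'), prime('c'), prime('d')],
);
console.log(primeResult.alwaysApply.map((p) => p.name), primeResult.alwaysApplyDropped);
```

Running the cap before the dedup would let the duplicate `b` consume the last slot and then be removed, leaving `c` unprimed even though there was room for it.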

* 🔒 fix: Include always-apply primes in skillPrimedIdsByName + clear alwaysApply on body opt-out

Two bugs flagged by Codex review:

P1 (read_file): `manualSkillPrimedIdsByName` only carried manual-invocation
primes, so an always-apply skill with `disable-model-invocation: true`
was blocked from reading its own bundled files, and same-name collisions
could resolve to a different doc than the one whose body got primed.
- Rename `buildManualSkillPrimedIdsByName` → `buildSkillPrimedIdsByName`
  (accepts both manual + always-apply prime arrays).
- Rename the configurable field `manualSkillPrimedIdsByName` →
  `skillPrimedIdsByName` throughout the plumbing (skillConfigurable.ts,
  handlers.ts, CJS callers, tests).
- Overlap resolution: manual wins on the rare edge case where the same
  name appears in both arrays (upstream dedup should prevent this, but
  defensive merging treats manual as authoritative).
- New tests: (1) gate-relaxation fires for always-apply primes, (2) `_id`
  pinning works for always-apply same-name collisions.
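
The merge described in P1 might look roughly like this. The `PrimeRef` shape and the plain-object return type are assumptions; the actual `buildSkillPrimedIdsByName` signature is not shown in this log.

```typescript
interface PrimeRef {
  name: string;
  _id: string;
}

/** Maps each primed skill name to the `_id` of the doc whose body was primed. */
function buildSkillPrimedIdsByName(
  manual: PrimeRef[] = [],
  alwaysApply: PrimeRef[] = [],
): Record<string, string> {
  const byName: Record<string, string> = {};
  for (const p of alwaysApply) {
    byName[p.name] = p._id;
  }
  // Written last so a manual prime overrides an always-apply prime of the same name.
  for (const p of manual) {
    byName[p.name] = p._id;
  }
  return byName;
}

const primedIds = buildSkillPrimedIdsByName(
  [{ name: 'deploy', _id: 'manual-1' }],
  [
    { name: 'deploy', _id: 'always-1' },
    { name: 'style-guide', _id: 'always-2' },
  ],
);
console.log(primedIds);
```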

P2 (updateSkill): when a body update had no `always-apply:` key,
`extractAlwaysApplyFromBody` returned `absent` and the column was left
untouched. A skill that was once `alwaysApply: true` would keep
auto-priming even after its SKILL.md no longer declared the flag.
- Treat `absent` as a positive "not always-apply" declaration when the
  body is explicitly submitted; flip the column to `false`.
- Explicit top-level `alwaysApply` still wins (three-source precedence
  unchanged).
- New tests: body removes key → false, body has no frontmatter at all →
  false, explicit + body-without-key → explicit wins.

* 🧵 refactor: Collapse duplicate prime types + tighten parse + test hygiene

Sanity-check review follow-ups:

- Collapse `ResolvedManualSkill` / `ResolvedAlwaysApplySkill` into a
  single `ResolvedSkillPrime` canonical interface with two backward-
  compatible type aliases. Both resolvers feed the same pipeline stages
  (injectSkillPrimes, unionPrimeAllowedTools, buildSkillPrimedIdsByName);
  the per-source distinction lives on `additional_kwargs.trigger`, not
  on the resolver output.
- Move the `always-apply` branch in `parseFrontmatter` to operate on the
  raw post-colon text. The outer `unquoteYaml` was fine today because
  it's idempotent on non-quoted strings, but running it twice (once per
  line, once after stripping the inline comment) would be fragile if the
  unquoter ever grows richer YAML-escape handling.
- Add the missing `alwaysApplyDedupedFromManual: 0` field to the
  `injectSkillPrimes` mocks in `openai.spec.js` and `responses.unit.spec.js`
  so they match the full `InjectSkillPrimesResult` contract.
- Insert the blank line between the `unionPrimeAllowedTools` and
  `resolveAlwaysApplySkills` describe blocks.

* 🔧 fix(tsc): Cast mock.calls via `unknown` for strict tuple destructure

`getSkillByName.mock.calls[0]` is typed as `[]` by jest's generic default;
a direct cast to `[string, ..., ...]` fails TS2352 under `--noEmit` even
though the runtime shape matches. Go through `as unknown as [...]` like
the earlier test in the same file so CI's type-check step stays green.
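
For illustration, the double cast looks like this outside of jest. `mockCalls` is a hypothetical stand-in for `getSkillByName.mock.calls`, and the two-element tuple is an assumed argument shape, not the real signature.

```typescript
// Stand-in for `jest.fn().mock.calls`, whose elements are typed as the empty tuple `[]`.
const mockCalls = [['my-skill', ['id-1', 'id-2']]] as unknown as [][];

// A direct `mockCalls[0] as [string, string[]]` is the cast the commit reports as TS2352;
// going through `unknown` first is accepted by the type checker.
const [skillName, accessibleIds] = mockCalls[0] as unknown as [string, string[]];
console.log(skillName, accessibleIds.length);
```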

* 🪢 fix: Propagate skillPrimedIdsByName into handoff agent tool context

Handoff agents go through the same `initializeAgent` flow as the primary
(with `listAlwaysApplySkills` now plumbed), so they resolve their own
`manualSkillPrimes` and `alwaysApplySkillPrimes` — but the
`agentToolContexts.set(...)` for handoff agents didn't carry
`skillPrimedIdsByName` into the per-agent context.

That meant `handleReadFileCall` fell back to the full ACL set + a
`prefer*` flag for handoff agents: same-name collisions could resolve to
a different doc than the one whose body got primed, and a
`disable-model-invocation: true` skill primed via manual `$` or
always-apply inside the handoff flow would be blocked from reading its
own bundled files.

Build the map via `buildSkillPrimedIdsByName(config.manualSkillPrimes,
config.alwaysApplySkillPrimes)` for every handoff tool context so
`read_file` behaves identically across primary and handoff agents.
2026-04-25 04:02:00 -04:00
Danny Avila
82173f7b91 🛡️ feat: Persist & enforce disable-model-invocation / user-invocable / allowed-tools (#12745)
* 🧬 feat: Persist `disable-model-invocation` / `user-invocable` / `allowed-tools`

Adds first-class columns mirroring the three runtime-enforced frontmatter
fields, with a `deriveStructuredFrontmatterFields` helper that maps from
frontmatter at create/update time and re-syncs (via `$unset`) when fields
are removed. `listSkillsByAccess` projection includes them so the Phase 6
catalog filter and popover filter can both read off the summary row.

Marks `invocationMode` as @deprecated on `TSkill` and the
`InvocationMode` enum — the runtime now reads the persisted pair instead.

* 🛡️ feat: Enforce frontmatter at runtime (catalog, skill tool, manual resolver, tool union)

Wires the persisted columns into actual runtime behavior across all four
invocation paths:

- `injectSkillCatalog` excludes `disableModelInvocation: true` skills
  before catalog formatting — they cost zero context tokens and stay
  invisible to the model.
- `handleSkillToolCall` rejects with a clear error when the model names
  a skill marked `disable-model-invocation: true` (defends against a
  stale-cache or hallucinated invocation getting past the catalog
  filter).
- `resolveManualSkills` skips `userInvocable: false` skills with a warn
  log so an API-direct caller can't bypass the popover-side filter.
- `unionPrimeAllowedTools` collects skill-declared `allowed-tools` minus
  what's already on the agent; `initialize.ts` re-runs `loadTools` for
  the extras and merges resulting `toolDefinitions` into the agent's
  effective set for the turn. Tool-name resolution is tolerant —
  unknown names silently drop with a debug log so cross-ecosystem
  skills referencing yet-to-be-implemented tools (Claude Code's
  `edit_file`, etc.) import without breaking. The agent document is
  never modified; the union is turn-scoped.
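
The last bullet's union can be sketched as below, assuming a simplified `PrimeWithTools` shape. This body of `unionPrimeAllowedTools` is illustrative and may differ from the real helper.

```typescript
interface PrimeWithTools {
  name: string;
  allowedTools?: string[];
}

/** Collects skill-declared tools that the agent does not already have. */
function unionPrimeAllowedTools(primes: PrimeWithTools[], agentTools: string[]): string[] {
  const onAgent = new Set(agentTools);
  const extras = new Set<string>();
  for (const p of primes) {
    for (const tool of p.allowedTools ?? []) {
      if (!onAgent.has(tool)) extras.add(tool);
    }
  }
  return Array.from(extras);
}

const agentTools = ['execute_code'];
const extraTools = unionPrimeAllowedTools(
  [
    { name: 'reviewer', allowedTools: ['execute_code', 'web_search'] },
    { name: 'editor', allowedTools: ['web_search', 'edit_file'] },
  ],
  agentTools,
);

// Turn-scoped union: a new array is built, the agent's own list is left untouched.
const toolsForTurn = [...agentTools, ...extraTools];
console.log(extraTools, toolsForTurn);
```

Names that cannot be resolved (such as `edit_file` here) would be dropped later at load time rather than rejected at this stage.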

Helper exports (`unionPrimeAllowedTools`) are structured so Phase 5's
always-apply primes flow through the same union (combined
`[...manualPrimes, ...alwaysApplyPrimes]`) once the resolver lands.

Skill handler wire format gains the three fields so clients can render
them on detail / list views.

* 🎛️ feat: `$` popover reads `userInvocable` instead of UI-only `invocationMode`

Replaces the phase-1 UI-only `invocationMode` check with the persisted
`userInvocable` field (mirrors the `user-invocable` frontmatter). Skills
authored with `user-invocable: false` no longer surface in the popover;
the backend resolver enforces the same rule for defense-in-depth.

Default-visible behavior is preserved: skills without an explicit
`userInvocable` value (older rows, freshly imported skills that don't
declare the field) stay visible — only an explicit `false` hides them.

Test fixture updated to reflect the new field.

* 🔧 fix: Address Phase 6 review findings

Codex P2 + reviewer #1: Single `loadTools` call with the union of
`agent.tools + allowed-tools`. The earlier two-call approach dropped
`userMCPAuthMap` / `toolContextMap` / `actionsEnabled` from the
skill-added pass — an MCP tool gained via `allowed-tools` would be
visible to the model but fail at execution without per-user auth
context. Resolution of `manualSkillPrimes` is hoisted before
`loadTools` so the union can be computed up-front; the dropped-tools
debug log now compares loaded vs. requested across the single call.

Codex P3 + reviewer #2: `injectSkillCatalog.activeSkillIds` now
includes `disable-model-invocation: true` skills. The runtime ACL
check in `handleSkillToolCall` previously couldn't reach the explicit
"cannot be invoked by the model" rejection because the broader access
set excluded those skills. Catalog text and tool registration still
gate on the visible subset (zero-context-token guarantee preserved);
only the per-user `isActive` filter is a hard exclusion now.

Reviewer #1 (try/catch around loadTools, MAJOR): A single bad
`allowed-tools` entry from a shared skill could crash the entire turn.
Now wrapped — on failure with extras, retry with just `agent.tools`
and continue (the dropped-tools debug log surfaces what vanished). If
the retry-without-extras still throws, propagate; the agent's own
tools are the load-bearing surface.
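
A hedged sketch of the retry behavior: `loadToolsWithFallback` and the `LoadTools` signature are hypothetical simplifications (the real `loadTools` takes and returns richer structures).

```typescript
type LoadTools = (toolNames: string[]) => Promise<string[]>;

async function loadToolsWithFallback(
  loadTools: LoadTools,
  agentTools: string[],
  extras: string[],
): Promise<string[]> {
  if (extras.length === 0) {
    // No skill-added tools: nothing to fall back to, so errors propagate.
    return loadTools(agentTools);
  }
  try {
    return await loadTools([...agentTools, ...extras]);
  } catch (err) {
    console.warn('loadTools failed with skill-added tools; retrying with agent tools only', err);
    // A second failure is not caught: the agent's own tools must load.
    return loadTools(agentTools);
  }
}

// Usage with a loader that rejects one bad `allowed-tools` entry.
const flakyLoader: LoadTools = async (names) => {
  if (names.includes('bad_tool')) throw new Error('unknown tool: bad_tool');
  return names;
};
loadToolsWithFallback(flakyLoader, ['execute_code'], ['bad_tool']).then((tools) =>
  console.log(tools),
);
```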

Reviewer #3 (integration tests, MAJOR): Added six tests in
`initialize.test.ts` covering the full `allowed-tools` loading path:
union pass-through, no-extras short-circuit, agent-baseline dedup,
loadTools throw + retry, propagated throw without extras, and the
empty-tools edge case.

Smaller cleanups bundled in:
- Reviewer #4: Moved `logger` import to the package-imports section
  (was wedged among local imports).
- Reviewer #5: Removed unused index on `disableModelInvocation`
  (filtering happens application-side in `injectSkillCatalog`; index
  cost write overhead for zero query benefit).
- Reviewer #6: Swapped order of `userInvocable` and body checks in
  `resolveManualSkills` so the more authoritative author-decision
  reason surfaces first when both apply.
- Reviewer #8: Documented the `allowedTools` enforcement gap on the
  schema + type — model-invoked skills (mid-turn `skill` tool calls)
  do NOT trigger tool union, since adding tools after the graph
  starts would require a rebuild. Manual / always-apply (Phase 5)
  primes are the supported paths.
- Reviewer #9: Renamed `dmi` / `ui` / `at` locals to
  `disableModelInvocationRaw` / `userInvocableRaw` / `allowedToolsRaw`
  in `deriveStructuredFrontmatterFields`.

Reviewer #7 (DRY shared `getSkillByName` return type) deferred —
field sets diverge meaningfully across the three call sites (handler
needs `body + fileCount`; resolver needs `author + allowedTools +
userInvocable`; the InitializeAgentDbMethods contract needs the
superset). A `Pick<>`-based consolidation is a follow-up cleanup.

* 🔧 fix: Address codex iter 2 — catalog quota + duplicate-name dedup

P1: `injectSkillCatalog` cap now counts only model-visible skills, not
the merged active set. The previous behavior let a tenant with many
`disable-model-invocation: true` rows near the top of the cursor
exhaust the 100-slot quota before any invocable skill got scanned —
the catalog could end up empty even though invocable skills existed
further down the paginated results. `MAX_CATALOG_PAGES` stays the
ceiling on scan budget; only `visibleCount` drives the early-exit on
quota fill.
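
The quota change in P1 can be illustrated with a small paging loop. The `SkillRow` / `Page` shapes, `collectVisibleSkills`, the synchronous `fetchPage`, and the value of `MAX_CATALOG_PAGES` are all assumptions for the sketch.

```typescript
interface SkillRow {
  name: string;
  disableModelInvocation?: boolean;
}
interface Page {
  rows: SkillRow[];
  nextCursor: number | null;
}

const MAX_CATALOG_PAGES = 10; // hypothetical scan budget

function collectVisibleSkills(
  fetchPage: (cursor: number) => Page,
  limit: number,
  maxPages: number = MAX_CATALOG_PAGES,
): SkillRow[] {
  const visible: SkillRow[] = [];
  let cursor: number | null = 0;
  for (let page = 0; page < maxPages && cursor !== null; page += 1) {
    const result: Page = fetchPage(cursor);
    for (const row of result.rows) {
      if (row.disableModelInvocation === true) continue; // hidden rows never consume quota
      visible.push(row);
      if (visible.length >= limit) return visible; // early exit driven by visible count only
    }
    cursor = result.nextCursor;
  }
  return visible;
}

// Hidden rows cluster at the top of the cursor, as in the scenario described above.
const pages: SkillRow[][] = [
  [{ name: 'h1', disableModelInvocation: true }, { name: 'h2', disableModelInvocation: true }],
  [{ name: 'h3', disableModelInvocation: true }, { name: 'a' }],
  [{ name: 'b' }, { name: 'c' }],
];
const fetchPage = (cursor: number): Page => ({
  rows: pages[cursor] ?? [],
  nextCursor: cursor + 1 < pages.length ? cursor + 1 : null,
});
const catalog = collectVisibleSkills(fetchPage, 2);
console.log(catalog.map((s) => s.name));
```

With a quota of 2 counted over all rows, the first page alone would have exhausted it and the catalog would have been empty.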

P2: When an invocable and a `disable-model-invocation: true` skill
share a name, drop the disabled doc(s) from `activeSkillIds`. Without
this dedup, `getSkillByName` (which sorts by `updatedAt` desc) could
pick the disabled doc and every model call to the cataloged name
would fail with "cannot be invoked by the model" instead of executing
the visible skill. When ONLY a disabled doc exists for a name, it
stays in `activeSkillIds` so the explicit-rejection error path still
fires for hallucinated invocations.
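
A sketch of the same-name dedup from P2, with a hypothetical `ActiveSkill` shape and function name:

```typescript
interface ActiveSkill {
  _id: string;
  name: string;
  disableModelInvocation?: boolean;
}

/** Keeps a disabled doc only when no invocable doc shares its name. */
function buildActiveSkillIds(skills: ActiveSkill[]): string[] {
  const invocableNames = new Set(
    skills.filter((s) => s.disableModelInvocation !== true).map((s) => s.name),
  );
  return skills
    .filter((s) => s.disableModelInvocation !== true || !invocableNames.has(s.name))
    .map((s) => s._id);
}

const activeIds = buildActiveSkillIds([
  { _id: '1', name: 'deploy' },
  { _id: '2', name: 'deploy', disableModelInvocation: true }, // dropped: name collides with '1'
  { _id: '3', name: 'secret', disableModelInvocation: true }, // kept: sole doc for its name
]);
console.log(activeIds);
```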

Tests: 3 new cases in `injectSkillCatalog` covering (a) cap counted
on visible skills only, (b) same-name collision drops disabled doc,
(c) sole-disabled-name case keeps the disabled doc.

* 🔒 fix: Apply `disable-model-invocation` gate to read_file too (codex iter 3 P1)

`activeSkillIds` is shared between the `skill` and `read_file` handlers.
The skill-tool gate was applied last iteration, but `handleReadFileCall`
authorized purely on `getSkillByName(..., accessibleIds)` — so a model
that learned a hidden skill's name (stale catalog or hallucination)
could still read its `SKILL.md` body or bundled files via `read_file`,
defeating the contract. Same explicit rejection now fires from both
handlers; no change needed to the ACL set itself (disabled docs stay
in `activeSkillIds` so the explicit error path keeps firing).

Two new tests in `handlers.spec.ts` cover the read_file gate and
regression-protect the happy path.

* 🔧 fix: Address codex iter 4 — manual-prime exception + legacy frontmatter backfill

P1: Scope the `read_file` `disableModelInvocation` gate to AUTONOMOUS
model probes only. A user-invoked `$` skill that is also marked
`disable-model-invocation: true` had its bundled `references/*` /
`scripts/*` files unreadable, leaving the manually-primed skill body
referencing files the model couldn't load. Now the handler bypasses
the gate when the skill name appears in `manualSkillNames` (the
per-turn allowlist threaded from `manualSkillPrimes` →
`agentToolContexts` → `enrichWithSkillConfigurable` →
`mergedConfigurable`). Defense-in-depth: the bypass is scoped to the
specific names in the allowlist; a different disabled skill name is
still rejected.
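
A minimal sketch of the name-scoped bypass, assuming the gate can be reduced to a pure check. `checkReadFileGate` and `SkillGateInput` are hypothetical names, and the error string only echoes the wording quoted earlier in this log.

```typescript
// Hypothetical input shape for the gate; not the handler's real signature.
interface SkillGateInput {
  name: string;
  disableModelInvocation?: boolean;
}

/** Returns an error message when the read must be rejected, otherwise null. */
function checkReadFileGate(
  skill: SkillGateInput,
  manualSkillNames: string[] = [],
): string | null {
  if (skill.disableModelInvocation !== true) {
    return null; // not a disabled skill, nothing to gate
  }
  if (manualSkillNames.includes(skill.name)) {
    return null; // user primed this exact skill this turn, so its files stay readable
  }
  return `Skill "${skill.name}" cannot be invoked by the model`;
}
```

The bypass matches on the exact name, so priming one disabled skill does not open up any other disabled skill.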

P2: Read-time fallback for legacy skills authored before Phase 6
landed the structured columns. `user-invocable: false` /
`disable-model-invocation: true` set in `frontmatter` (the validator
already accepted those keys) but with no derived column would
incorrectly evaluate as "user-invocable / model-allowed" until a save
backfilled the columns. New `backfillDerivedFromFrontmatter` helper
fills undefined columns from frontmatter at read time in both
`getSkillByName` and `listSkillsByAccess` — column wins when both are
set, frontmatter fills the gap when only it's set. No DB writes; the
next `updateSkill` naturally persists. `listSkillsByAccess` projection
expanded to include `frontmatter` (bounded by validator, payload
impact small) so summaries can also be backfilled.
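
The precedence rule (column wins, frontmatter fills the gap) might look like the sketch below. The `SkillRecord` shape is an assumption; only the two frontmatter keys and the helper name come from the message above, and the real helper may differ.

```typescript
// Assumed record shape: two derived columns plus the raw frontmatter object.
interface SkillRecord {
  userInvocable?: boolean;
  disableModelInvocation?: boolean;
  frontmatter?: Record<string, unknown>;
}

/** Fills undefined columns from frontmatter. Mutates and returns its argument; no DB writes. */
function backfillDerivedFromFrontmatter<T extends SkillRecord>(skill: T): T {
  const frontmatter = skill.frontmatter;
  if (!frontmatter) {
    return skill;
  }
  const userInvocable = frontmatter['user-invocable'];
  if (skill.userInvocable === undefined && typeof userInvocable === 'boolean') {
    skill.userInvocable = userInvocable;
  }
  const disabled = frontmatter['disable-model-invocation'];
  if (skill.disableModelInvocation === undefined && typeof disabled === 'boolean') {
    skill.disableModelInvocation = disabled;
  }
  return skill;
}
```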

Sticky-primed disabled skills (ones invoked in prior turns of the
same conversation) are not yet in the manual-prime allowlist — same-
turn manual invocation is the load-bearing path codex flagged; the
sticky-turn case is a known limitation tracked for a follow-up.

Tests: 2 new in handlers.spec.ts (manual-prime allows + name-scoped
block holds), 3 new in skill.spec.ts (legacy backfill via
getSkillByName + listSkillsByAccess + column-wins precedence).

* 🔧 fix: Address codex iter 5 — propagate manualSkillNames + keep read_file

P1: `enrichWithSkillConfigurable` is also called from `openai.js` and
`responses.js` (the OpenAI Responses + completions endpoints). Both
were ignoring the new `manualSkillNames` parameter, which meant the
manual-prime exception in the `read_file` gate (iter 4) only worked
on the agents endpoint. Now all three call sites pass
`primaryConfig.manualSkillPrimes?.map(p => p.name)` so manual `$`
invocations of disabled skills work consistently across endpoints.

P2: When every accessible skill is `disable-model-invocation: true`,
the catalog text and `skill` tool are correctly omitted (no model-
reachable targets) — but `read_file` and `bash_tool` MUST still be
registered. A user manually invoking such a skill gets its SKILL.md
body primed into context; if the body references `references/foo.md`
or `scripts/run.sh`, those reads need a registered tool. Restructured
`injectSkillCatalog` so `skill` registration is gated on
`catalogVisibleSkills.length > 0` while `read_file` (always) and
`bash_tool` (when codeEnvAvailable) register whenever any active
skill is in scope.
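
The registration rule can be summarized as a small decision function. This is a sketch only: `selectSkillTools` and its input shape are hypothetical, and the real `injectSkillCatalog` registers tool definitions rather than returning names.

```typescript
// Hypothetical inputs; counts stand in for the real skill lists.
interface RegistrationInput {
  activeSkillCount: number;
  catalogVisibleCount: number;
  codeEnvAvailable: boolean;
}

/** Returns the tool names to register for the current turn. */
function selectSkillTools(input: RegistrationInput): string[] {
  if (input.activeSkillCount === 0) {
    return []; // no active skill in scope, nothing to register
  }
  const tools: string[] = [];
  if (input.catalogVisibleCount > 0) {
    tools.push('skill'); // only when the model has something it may invoke
  }
  tools.push('read_file'); // always, so manually primed bodies can load their files
  if (input.codeEnvAvailable) {
    tools.push('bash_tool');
  }
  return tools;
}
```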

Tests: existing all-disabled test rewritten to assert read_file IS
registered + skill is NOT; new test confirms bash_tool joins it
when codeEnvAvailable.

* 🔧 fix: Address codex iter 6 — name-collision consistency via preferInvocable

P2a (resolveManualSkills): a name collision between an older
user-invocable doc and a newer non-user-invocable doc made manual `$`
invocation silently no-op. The popover surfaced the older invocable
doc; resolver looked it up by name; `getSkillByName` returned the
newer non-invocable doc; resolver skipped on `userInvocable: false`.

P2b (handler / runtime ACL): with same-name duplicates (e.g. older
invocable + newer disabled), the manual prime resolved to one doc
while later `read_file` / `skill` execution resolved a different doc
through `activeSkillIds`. Model could follow one SKILL.md body while
reading files from a different skill.

Both root-cause: `getSkillByName` always returned the newest match
and let the caller filter, but with collisions the newest can be
something the caller didn't want.

Fix: extend `getSkillByName` with `options.preferInvocable`. When
true, prefer the newest doc satisfying BOTH `userInvocable !== false`
AND `disableModelInvocation !== true` (with frontmatter backfill);
fall back to the newest match otherwise. Fast path preserved when
caller doesn't opt in.

Callers passing `preferInvocable: true`:
- `resolveManualSkills` — picks the popover-visible invocable doc
  even when a newer disabled / non-user-invocable duplicate exists.
- `handleSkillToolCall` — keeps execution aligned with the catalog;
  falls back to the disabled doc only when no invocable variant
  exists (so the explicit "cannot be invoked by the model" gate
  still fires for the hallucinated-disabled-name case).
- `handleReadFileCall` — same alignment, plus the manual-prime
  exception added in iter 4 still applies.

Tests:
- 2 new in skill.spec.ts (preferInvocable picks invocable when
  collision exists; falls back to newest when no clean-invocable
  exists).
- 1 new in skills.test.ts (resolver passes preferInvocable through).
- 2 new in handlers.spec.ts (skill tool + read_file pass it).
- Existing initialize.test.ts assertion updated for the new option.

* 🔧 fix: Address codex iter 7 — split preferInvocable into per-axis flags

The previous unified `preferInvocable` filter required BOTH
`userInvocable !== false` AND `disableModelInvocation !== true`. That
was wrong for the model paths: `userInvocable: false` skills are
model-only and remain valid `skill` / `read_file` invocation targets.
A duplicate-name scenario where the newer cataloged doc was model-
only would let the older user-invocable variant shadow it on every
model call.

Split the option into two independent axes:
- `preferUserInvocable` — for manual paths (`$` popover). Skips docs
  with `userInvocable: false`. Disable-model-invocation status is
  irrelevant; iter 4 explicitly supports manual prime of disabled
  skills.
- `preferModelInvocable` — for model paths (`skill` / `read_file`
  handlers). Skips docs with `disableModelInvocation: true`. User-
  invocable status is irrelevant; model-only skills are valid here.

Both flags fall back to the newest match when no preferred doc
exists, so the explicit-rejection error paths still fire correctly
in the sole-disabled-name case.

Callers updated:
- `resolveManualSkills` → `preferUserInvocable: true`
- `handleSkillToolCall` / `handleReadFileCall` → `preferModelInvocable: true`
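
The two-axis selection with its newest-match fallback can be sketched over an in-memory candidate list. `pickSkill` and `SkillCandidate` are hypothetical; the real `getSkillByName` queries the database, and numeric `updatedAt` values stand in for dates.

```typescript
// Hypothetical candidate shape; all candidates are assumed to share one name.
interface SkillCandidate {
  id: string;
  updatedAt: number;
  userInvocable?: boolean;
  disableModelInvocation?: boolean;
}

interface LookupOptions {
  preferUserInvocable?: boolean;
  preferModelInvocable?: boolean;
}

/** Picks the newest doc that satisfies the requested axis, else the newest doc overall. */
function pickSkill(
  candidates: SkillCandidate[],
  options: LookupOptions = {},
): SkillCandidate | undefined {
  const newestFirst = [...candidates].sort((a, b) => b.updatedAt - a.updatedAt);
  const preferred = newestFirst.find((doc) => {
    if (options.preferUserInvocable && doc.userInvocable === false) {
      return false;
    }
    if (options.preferModelInvocable && doc.disableModelInvocation === true) {
      return false;
    }
    return true;
  });
  // Falling back keeps the explicit-rejection path alive when only a
  // non-preferred doc exists for the name.
  return preferred ?? newestFirst[0];
}
```

With no flag set, the first element of the sorted list is returned, which matches the "newest match" behavior described for callers that do not opt in.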

Tests:
- New spec test for preferModelInvocable not filtering on userInvocable.
- Existing preferInvocable test renamed/split to cover the new axes.
- New test asserts preferUserInvocable still returns disabled docs
  (preserves iter 4 manual-disabled support).
- Caller tests assert each path passes the right single flag and
  does NOT pass the wrong one.

* 🔧 fix: TypeScript type-check failure in handlers.spec.ts (CI green)

`jest.fn(async () => ...)` without explicit args infers an empty tuple
for the call signature, so `mock.calls[0][2]` flagged as "Tuple type
'[]' has no element at index '2'." Cast to `unknown[]` then narrow to
the expected option shape. Behavior unchanged.

Caught by the `Type check @librechat/api` CI step
(.github/workflows/backend-review.yml).

* 🔧 fix: Address codex iter 8 — undefined-result fallback + read_file alignment

P1 (loadTools returning undefined): Production loaders
(`createToolLoader` in `initialize.js` / `openai.js` /
`responses.js`) wrap `loadAgentTools` in try/catch and return
`undefined` on failure rather than throwing. Without explicit
handling, my iter-1 try/catch only fired for thrown errors — a
silent-failure on a skill-added tool would fall through to the
empty fallback and silently DROP the agent's baseline tools for
the turn (much worse than just losing the extras). Added an
`undefined`-result branch that retries with just `agent.tools`,
mirroring the throw branch. Test pins both behaviors.
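
A sketch of the two fallback branches, assuming a loader that may either throw or resolve to `undefined`. The `LoadTools` type and `loadWithBaselineFallback` name are hypothetical and tool entries are reduced to names. Leaving a throw from the base-only retry uncaught is a simplification here, not a statement about the real code.

```typescript
// Hypothetical loader type: resolves to undefined on a swallowed failure.
type LoadTools = (toolNames: string[]) => Promise<string[] | undefined>;

async function loadWithBaselineFallback(
  loadTools: LoadTools,
  baseTools: string[],
  skillTools: string[],
): Promise<string[]> {
  const union = Array.from(new Set([...baseTools, ...skillTools]));
  try {
    const loaded = await loadTools(union);
    if (loaded !== undefined) {
      return loaded;
    }
    // undefined result: fall through to the same retry as the throw branch
  } catch {
    // thrown error: fall through to the base-only retry
  }
  // Retry with only the agent's baseline tools so the extras cannot sink them.
  const baseOnly = await loadTools(baseTools);
  return baseOnly ?? [];
}
```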

P2 (read_file alignment with manual prime): When a skill is in
this turn's `manualSkillNames`, the `read_file` handler now uses
`preferUserInvocable` instead of `preferModelInvocable`. Same
name-collision rule as `resolveManualSkills`, so the doc whose
files get read is the same doc whose body got primed. For
autonomous probes (skill not in `manualSkillNames`), the handler
keeps `preferModelInvocable` to align with the catalog the model
saw. Two new tests cover both branches and regression-protect that
the wrong flag isn't passed.

* 🔧 fix: Address codex iter 9 — pin read_file lookup to primed skill _id

P1 (manually-primed disabled IDs were dropped from activeSkillIds):
The `executableSkills` dedup in `injectSkillCatalog` correctly drops
`disable-model-invocation: true` duplicates when an invocable doc
shares the name — but `resolveManualSkills` legitimately primes
disabled docs (iter 4 supports manual `$` invocation of disabled
skills). When the resolver primed a disabled doc, the read_file
handler couldn't find it in the (deduped) `activeSkillIds` and
either resolved a different same-name skill or returned not-found.

Fix: `ResolvedManualSkill` now carries `_id`; the legacy `initialize.js`
/ `openai.js` / `responses.js` controllers build a
`manualSkillPrimedIdsByName` map and `enrichWithSkillConfigurable`
passes it into `mergedConfigurable`. `handleReadFileCall` now pins
its lookup's `accessibleIds` to `[primedId]` whenever the requested
skill is in the map. The constrained set guarantees the lookup
returns the EXACT doc the resolver primed — body/files come from the
same source even when same-name duplicates exist or the dedup
removed the prime's id from `activeSkillIds`.

Autonomous read_file probes (skill not in the manual-primed map)
keep the full ACL set + `preferModelInvocable` so they align with
the catalog the model saw and the disabled-only case still fires
the explicit-rejection gate.
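
The pinning rule can be sketched as follows. This is illustrative only: ids are plain strings instead of ObjectIds, the name-to-id lookup is modeled as a `Map`, and `resolveReadFileScope` is a hypothetical name for logic that lives inside `handleReadFileCall`.

```typescript
// Simplified stand-in for the resolver's output; the real type has more fields.
interface ResolvedManualSkill {
  _id: string;
  name: string;
  body: string;
}

/** Builds the name -> primed id lookup threaded through the run config. */
function buildManualSkillPrimedIdsByName(
  primes: ResolvedManualSkill[] | undefined,
): Map<string, string> {
  const byName = new Map<string, string>();
  for (const prime of primes ?? []) {
    byName.set(prime.name, prime._id);
  }
  return byName;
}

interface ReadFileScope {
  accessibleIds: string[];
  preferModelInvocable: boolean;
}

/** Pins the lookup to the primed doc when one exists, else uses the full ACL set. */
function resolveReadFileScope(
  skillName: string,
  activeSkillIds: string[],
  primedIdsByName: Map<string, string>,
): ReadFileScope {
  const primedId = primedIdsByName.get(skillName);
  if (primedId !== undefined) {
    return { accessibleIds: [primedId], preferModelInvocable: false };
  }
  return { accessibleIds: activeSkillIds, preferModelInvocable: true };
}
```

Constraining the id set to a single element means a same-name duplicate cannot be returned, whatever the sort order is.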

Test fixture changes flow from `_id` becoming required on
`ResolvedManualSkill`. `buildSkillPrimeContentParts` /
`injectManualSkillPrimes` widen their param types to `Pick<...>`
because they only read `name` / `body` and shouldn't force test
literals to invent placeholder ids.

* 🧹 fix: Address independent reviewer findings (DRY + types + tests + docs)

Sanity-pass review surfaced 7 findings; addressed 6. The remaining one
(#3, DRY on inline `getSkillByName` return types) is acknowledged tech
debt deferred to a follow-up.

#1 [MAJOR, DRY]: The 4-line `manualSkillPrimedIdsByName` map
construction was duplicated across 4 CJS call sites (openai.js,
responses.js x2, initialize.js). Extracted `buildManualSkillPrimedIdsByName`
helper in `skillDeps.js`; all four sites now call the helper. If
`ResolvedManualSkill` ever renames `_id` or gains identifying fields,
only the helper changes.

#2 [MINOR, type safety]: `handleReadFileCall` was casting a hex string
to `Types.ObjectId[]` via `as unknown as`, relying on mongoose's
auto-cast in `$in` queries. Replaced with `new Types.ObjectId(...)`
so any future consumer comparing with `.equals()` / `===` gets the
correct value type. Imported `Types` as a value (was type-only).

#5 [MINOR, test gap]: Added a test for the worst-case silent-failure
path — both the union and base-only `loadTools` calls return undefined.
The agent gets no tools but the turn doesn't crash hard; pinning
that contract.

#4 [MINOR, performance]: Added a TODO on the `listSkillsByAccess`
projection noting the `frontmatter` field can be dropped once a
write migration backfills all pre-Phase-6 skills' columns. ~2KB/skill
× 100/page is wasted bandwidth post-backfill.

#6 [NIT, docs]: `backfillDerivedFromFrontmatter` JSDoc said "Pure"
right before "mutates its undefined fields in place". Replaced with
"Side-effect-free w.r.t. the DB (no writes), but mutates its argument
in place" which describes both halves accurately.

#7 [NIT, test determinism]: Replaced `await new Promise(r => setTimeout(r, 5))`
in two same-name collision tests with explicit `updateOne` setting
`updatedAt: new Date(Date.now() - 1000)` on the older doc. Removes
the wall-clock race on fast CI runners. The pagination test (line
480) still uses setTimeout — that test is pre-existing and order
is incidental, not load-bearing.

Existing test fixtures updated to use valid 24-char hex ObjectIds
(required by the iter-9 test that constructs a real `ObjectId`).

#3 [MINOR, deferred]: Inline `getSkillByName` return type duplicated
across `handlers.ts`, `initialize.ts`, `skills.ts`. Reviewer
acknowledged this as deferred; field sets diverge across call sites
(handler needs `fileCount`, resolver needs `author`/`allowedTools`).
A `Pick<>`-based consolidation is a clean follow-up.
2026-04-25 04:02:00 -04:00
Danny Avila
539c4c7e4d 🎬 feat: Prime Manually-Invoked Skills via $ Popover (#12709)
* 🎬 feat: Prime Manually-Invoked Skills via $ Popover

Lands the backend for manual skill invocation, making the $ popover
deterministically prime SKILL.md before the LLM turn instead of leaving
the model to discover the skill via the catalog.

Flow: popover drains pendingManualSkillsByConvoId on submit, attaches
names to the ask payload, controllers forward to initializeAgent, and
initialize resolves each name to its body (ACL + active-state filtered,
reusing the same rules as catalog injection). AgentClient splices the
primes as meta HumanMessages before the user's current message.

- Extract primeManualSkill / resolveManualSkills in packages/api/src/agents/skills.ts
  and reuse primeManualSkill inside handleSkillToolCall for a single shape source.
- Thread manualSkills + getSkillByName through InitializeAgentParams / DbMethods
  and all three initializeAgent call sites (initialize.js, responses.js, openai.js).
- Splice HumanMessage primes in client.js chatCompletion after formatAgentMessages,
  shifting indexTokenCountMap so hydrate still fills fresh positions correctly.
- Carry isMeta / source / skillName in additional_kwargs for downstream filtering.

* 🛡️ fix: Scope manual skill primes to single-agent + cap resolver input

Two follow-ups to the Phase 3 priming path flagged in Codex review.

Multi-agent runs: skip the splice when `agentConfigs` is non-empty.
`initialMessages` is shared across every agent in `createRun`, so splicing
a skill body there would bypass Phase 1's per-agent `scopeSkillIds`
contract — a handoff / added-convo agent with a different skill scope
would see content its configuration excludes. Warn + skip is the minimal
correct behavior; lifting this to per-agent initial state is a follow-up.

Input bounding: `resolveManualSkills` now truncates to `MAX_MANUAL_SKILLS`
(10) after dedup, with a warn listing the dropped tail. Controllers only
validate `Array.isArray(req.body.manualSkills)`, so a crafted payload
could otherwise fan out into an unbounded `Promise.all` of concurrent
`getSkillByName` DB lookups. Cap lives in the resolver so every caller
(including future `always-apply` in Phase 5) inherits it.
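
A minimal sketch of the dedup-then-cap step, assuming it runs before any lookups are issued. `boundManualSkillNames` is a hypothetical name and the warning text is made up for illustration; only the `MAX_MANUAL_SKILLS` value of 10 comes from the message above.

```typescript
const MAX_MANUAL_SKILLS = 10;

/** Dedups the requested names, then truncates to the cap and warns about the dropped tail. */
function boundManualSkillNames(
  names: string[],
  warn: (message: string) => void = console.warn,
): string[] {
  const unique = Array.from(new Set(names)); // first occurrence wins, order preserved
  if (unique.length <= MAX_MANUAL_SKILLS) {
    return unique;
  }
  const dropped = unique.slice(MAX_MANUAL_SKILLS);
  // Illustrative wording only; the real log line may differ.
  warn(`manual skills capped at ${MAX_MANUAL_SKILLS}; dropped: ${dropped.join(', ')}`);
  return unique.slice(0, MAX_MANUAL_SKILLS);
}
```

Deduplicating before truncating means repeated names in a crafted payload cannot push legitimate names past the cap.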

* 🧪 refactor: Testable Helpers + Payload Validation for Manual Skill Primes

Follow-ups from the comprehensive review. No behavior change for the
happy path — these are architectural and defensive improvements that
shrink the JS surface in /api, tighten the request-body contract, and
cover the delicate splice logic with proper unit tests.

- Extract `injectManualSkillPrimes` into packages/api/src/agents/skills.ts
  so the message-array splice and `indexTokenCountMap` shift are unit-
  testable in TS. client.js now calls the helper. Tests pin the `>=`
  vs `>` boundary condition — a regression here would silently corrupt
  token accounting for every message after the insertion point.
- Extract `extractManualSkills(body)` and use in all three controllers
  (initialize.js, responses.js, openai.js). Replaces copy-pasted
  `Array.isArray(...) ? ... : undefined` with a helper that also filters
  non-string / empty elements — closes a type-safety gap where a crafted
  payload like `{"manualSkills": [123, {"$gt":""}]}` would otherwise reach
  `getSkillByName` and waste DB round-trips.
- Rename `primeManualSkill` → `buildSkillPrimeMessage`. The helper serves
  three invocation modes (`$` popover, `always-apply`, model-invoked);
  the old name misled readers coming from `handleSkillToolCall`.
- Add `loadable.state === 'hasValue'` guard in `drainPendingManualSkills`
  — defensive, since the atom has a synchronous `[]` default, but the
  previous `.contents` cast would have been unsound under loading/error.
- Document why `resolveManualSkills` honors the active-state filter even
  for explicit `$` selections (Phase 2 popover filter + API-direct
  hardening).
- Remove stray `void Types;` in initialize.test.ts — `Types` is already
  consumed elsewhere in that test.
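
To show why the `>=` boundary matters, here is a sketch of the splice and index shift. It assumes the primes are inserted directly before the last message and that `indexTokenCountMap` is keyed by message index. `injectPrimesBeforeLastMessage` and the message shape are hypothetical simplifications of the real `injectManualSkillPrimes`.

```typescript
// Simplified message shape; the real code works with LangChain message classes.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
  additional_kwargs?: Record<string, unknown>;
}

interface SkillPrime {
  name: string;
  body: string;
}

interface InjectResult {
  messages: ChatMessage[];
  indexTokenCountMap: Record<string, number>;
}

function injectPrimesBeforeLastMessage(
  messages: ChatMessage[],
  indexTokenCountMap: Record<string, number>,
  primes: SkillPrime[],
): InjectResult {
  if (primes.length === 0 || messages.length === 0) {
    return { messages, indexTokenCountMap };
  }
  const insertIdx = messages.length - 1;
  const primeMessages = primes.map(
    (prime): ChatMessage => ({
      role: 'user',
      content: prime.body,
      additional_kwargs: { isMeta: true, source: 'skill', skillName: prime.name },
    }),
  );
  const nextMessages = [
    ...messages.slice(0, insertIdx),
    ...primeMessages,
    ...messages.slice(insertIdx),
  ];
  const shifted: Record<string, number> = {};
  for (const [key, count] of Object.entries(indexTokenCountMap)) {
    const index = Number(key);
    // `>=`, not `>`: the message that sat at insertIdx moves too. With `>` its
    // count would stay behind and be attributed to the first prime.
    const nextIndex = index >= insertIdx ? index + primes.length : index;
    shifted[String(nextIndex)] = count;
  }
  return { messages: nextMessages, indexTokenCountMap: shifted };
}
```

The prime positions are left without an entry in the returned map, so whatever fills in missing counts later can treat them as fresh.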

* 🔖 refactor: Single source for the skill-message source marker

Export `SKILL_MESSAGE_SOURCE = 'skill'` and use it in both construction
paths that stamp skill-primed messages — `buildSkillPrimeMessage` (for
the model-invoked tool path) and `injectManualSkillPrimes` (for the
user-invoked splice path). Downstream filtering and telemetry read this
marker, so the two paths must agree; keeping the literal in one place
removes the risk of them drifting when Phase 5's `always-apply` adds a
third caller.

* ♻️ refactor: Drop Multi-Agent Guard + Review Polish

- Remove the multi-agent skip in `AgentClient.chatCompletion`. Leaking
  primes to handoff / added-convo agents via shared `initialMessages` is
  the agents SDK's concern to scope; this layer should just inject and
  let the graph handle agent-scoped state. The guard was well-intended
  but produced a silent-drop UX where `$skill` in a multi-agent run did
  nothing.
- Bound the `[resolveManualSkills] Truncating ...` warn output to the
  first 5 dropped names plus a count suffix. A malicious payload of
  1000 names was previously spilling all ~990 names into the log line.
- Remove dead `?? []` from the `hasValue`-guarded loadable read in
  `drainPendingManualSkills` — the atom always yields a string[] when
  resolved, so the nullish fallback was unreachable.
- Reorder skills.ts imports to follow the style guide: value imports
  shortest-to-longest (`data-schemas` → `langchain/core/messages` →
  multi-line `@librechat/agents`), type imports longest-to-shortest.
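
The bounded warn output from the second bullet could be produced by a helper like this sketch. `summarizeDropped` and the `(+N more)` suffix format are assumptions; only the limit of five names comes from the message above.

```typescript
/** Lists at most `maxNames` dropped names and appends a count for the rest. */
function summarizeDropped(dropped: string[], maxNames = 5): string {
  const shown = dropped.slice(0, maxNames).join(', ');
  const hidden = dropped.length - maxNames;
  return hidden > 0 ? `${shown} (+${hidden} more)` : shown;
}
```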

* 🧠 fix: Strip Skill Primes from Memory Window + Unbreak CI Mocks

Two fixes after the last push.

CI unbreak: `responses.unit.spec.js` and `openai.spec.js` mock
`@librechat/api` and the mock didn't expose the new `extractManualSkills`
symbol, so every test in those files crashed before reaching the
`recordCollectedUsage` assertion. Added `extractManualSkills: jest.fn()`
returning `undefined` to both mocks; the controllers now no-op on
manualSkills as the tests expect.

Codex P2: `runMemory` passes `messages` straight through to the memory
processor, so after the splice in `injectManualSkillPrimes`, SKILL.md
bodies ride along as if they were real user chat. That pollutes memory
extraction with synthetic instruction content and crowds out real turns
from the window.

- Export `isSkillPrimeMessage(msg)` from `packages/api/src/agents/skills.ts`
  — a predicate keyed on the shared `SKILL_MESSAGE_SOURCE` marker.
- Filter `chatMessages = messages.filter(m => !isSkillPrimeMessage(m))`
  at the top of `runMemory` before the window-sizing logic. Keeps the
  primes visible to the LLM (they still ride in `initialMessages`) but
  invisible to the memory layer.
- 5 new tests for the predicate covering marker-present, plain messages,
  different source, non-object inputs, and array filter integration.
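
A sketch of the predicate and the filter, assuming the marker lives at `additional_kwargs.source` as described for the prime messages earlier in this PR. The real implementation may check more than this.

```typescript
const SKILL_MESSAGE_SOURCE = 'skill';

/** True when the value looks like a message stamped with the skill source marker. */
function isSkillPrimeMessage(msg: unknown): boolean {
  if (typeof msg !== 'object' || msg === null) {
    return false;
  }
  const kwargs = (msg as { additional_kwargs?: unknown }).additional_kwargs;
  if (typeof kwargs !== 'object' || kwargs === null) {
    return false;
  }
  return (kwargs as { source?: unknown }).source === SKILL_MESSAGE_SOURCE;
}

const windowInput: unknown[] = [
  { content: 'real user turn' },
  { content: '# SKILL.md body', additional_kwargs: { isMeta: true, source: 'skill' } },
  { content: 'assistant reply', additional_kwargs: { source: 'other' } },
];
// Primes are removed before memory processing; everything else passes through.
const chatMessages = windowInput.filter((m) => !isSkillPrimeMessage(m));
```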

* 📜 feat: Show Skill-Loaded Cards for Manually-Invoked Skills

The $ popover was priming SKILL.md bodies into the turn but leaving no
visible trace on the assistant response — from the user's view it looked
like the `$name ` cosmetic text did nothing. Now each manually-invoked
skill renders the same "Skill X loaded" tool-call card that model-invoked
skills already produce via PR #12684's SkillCall renderer.

Approach: post-run prepend to `this.contentParts`. The aggregator owns
per-step indices during the run, so pre-seeding collides; waiting until
`await runAgents(...)` returns lets the graph settle before synthetic
parts slot in at the front.

- Export `buildSkillPrimeContentParts(primes, { runId })` from
  `packages/api/src/agents/skills.ts`. Returns completed tool_call parts
  (`progress: 1`, args JSON-encoded with `{skillName}`, output matching
  the model-invoked path's wording) that the existing `SkillCall.tsx`
  renderer draws identically.
- In `AgentClient.chatCompletion`, prepend the built parts to
  `this.contentParts` immediately after `await runAgents`. Persistence
  and the final-event reconcile come for free — `sendCompletion` already
  reads `this.contentParts` verbatim.
- Card ordering: skills appear first in the assistant message, reflecting
  that priming ran before the LLM's turn.

Live-during-streaming cards are a separate follow-up — the graph's
index-based aggregator makes that a bigger lift and this change delivers
the core UX win without fighting the stream ordering.

6 new unit tests covering part shape, args JSON contract, output text,
unique IDs, empty input, and startOffset ID differentiation.

* feat: Emit Optimistic Skill Cards + Wire Primes in OpenAI/Responses

Two follow-ups from testing.

Optimistic card emit: the main chat path was only showing "Skill X
loaded" cards at final-reconcile time, so the user saw nothing happen
until the stream finished. Now emit synthetic ON_RUN_STEP +
ON_RUN_STEP_COMPLETED events right before `runAgents` starts — same
pattern the MCP OAuth flow uses in `ToolService` — so the cards appear
immediately. The graph's content at index 0 may overwrite them during
streaming, but the post-run `contentParts` prepend (unchanged) restores
them on final reconcile.

OpenAI + Responses parity: both controllers were resolving
`manualSkillPrimes` via `initializeAgent` but never injecting them into
`formattedMessages` before the run. Manual invocation silently did
nothing on `/v1/chat/completions` and the Responses API path. Now both
call `injectManualSkillPrimes` on the formatted messages so the model
sees SKILL.md bodies on every path. LibreChat-style card SSE events
don't apply to these OpenAI-shaped responses, so the live-emit is
chat-path-only.

- Export `buildSkillPrimeStepEvents(primes, { runId })` from
  `packages/api/src/agents/skills.ts`. Uses `Constants.USE_PRELIM_RESPONSE_MESSAGE_ID`
  by default so the frontend maps events to the in-flight preliminary
  response message, matching the OAuth emitter.
- In `AgentClient.chatCompletion`, emit via `sendEvent` (or
  `GenerationJobManager.emitChunk` in resumable mode) after
  `injectManualSkillPrimes` runs, before the LLM turn begins.
- Wire `injectManualSkillPrimes` into `openai.js` + `responses.js` after
  `formatAgentMessages`. Refactored the destructure to `let` on
  `indexTokenCountMap` so the injector's returned map is usable.
- 8 new unit tests covering the step-event builder: pair cardinality,
  default/custom runId, TOOL_CALLS shape + JSON args, progress:1 on
  completion, index ordering, stepId/toolCallId pairing, empty input.

* 🎯 fix: Route Skill Prime Events to the Real Response + Sparse-Array Offset

Two bugs in the optimistic-card emit from the last pass.

1. Wrong runId. The events used `USE_PRELIM_RESPONSE_MESSAGE_ID` (the
   MCP OAuth pattern), but OAuth emits DURING tool loading — before the
   real response messageId exists. By the time skill priming fires, the
   graph is about to emit with `this.responseMessageId`, so the PRELIM
   runId orphaned every card onto the client's placeholder response
   entry in `messageMap`, separate from the one the LLM's events were
   building. Net effect: cards never rendered mid-stream.

   Now passing `this.responseMessageId` — the same ID `createRun`
   receives — so synthetic and real steps land on the same `messageMap`
   entry.

2. Index 0 collision. With the runId fixed, card-at-0 would have hit
   `updateContent`'s type-mismatch guard when the LLM's text delta
   arrived at the same index, suppressing the whole text stream.

   New `SKILL_PRIME_INDEX_OFFSET = 100` is applied to both the live SSE
   emit and the server-side `contentParts` assignment. The sparse array
   during streaming renders as `[llm_text, ..., card]` (skip-holes via
   `Array#filter` / `Array#map`). `filterMalformedContentParts` from
   `sendCompletion` compacts to dense `[text, card]` before persistence,
   so streaming UI and saved message agree on order — no finalize
   reorder jank. Post-run switches from `contentParts.unshift` to
   `contentParts[OFFSET + i] = part` to mirror the live placement.

- Add `startIndex` option to `buildSkillPrimeStepEvents` with
  `SKILL_PRIME_INDEX_OFFSET` default. Export the constant from
  `@librechat/api` so `client.js` can reuse it for the post-run splice.
- Update the existing index-ordering test to the new default and add a
  new test for the explicit `startIndex` override.
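
The sparse-array behavior this commit relies on is plain JavaScript and can be checked in isolation. The `Part` type below is a hypothetical stand-in for the real content part types.

```typescript
const SKILL_PRIME_INDEX_OFFSET = 100;

// Hypothetical, minimal content part shapes.
type Part = { type: 'text'; text: string } | { type: 'tool_call'; name: string };

const contentParts: Part[] = [];
contentParts[0] = { type: 'text', text: 'Hello' };
contentParts[SKILL_PRIME_INDEX_OFFSET] = { type: 'tool_call', name: 'skill' };

// length is 101, but indices 1 through 99 are holes rather than stored values.
console.log(contentParts.length);

// Array#filter never visits holes, so the result is dense and keeps relative order.
const dense = contentParts.filter((part) => part !== undefined);
console.log(dense.map((part) => part.type));
```

Writing at a high index leaves the low indices free for the LLM's streamed parts, and compaction then places the card after them.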

* 🎗️ feat: Replace $skill-name Text with Pills on the User Message

The `$skill-name ` cosmetic text the popover was inserting into the
textarea had two problems: it lingered in the user message forever (the
card is a more meaningful marker), and it implied that free-form text
invocation like "$foo help me" should work. It doesn't, and supporting
it would mean another parsing layer nobody asked for.

Dropped the textarea insertion. Visual confirmation after submit now
comes from a compact `ManualSkillPills` row on the user bubble that
self-extinguishes once the backend's live skill-card stream
(`buildSkillPrimeStepEvents` from the last commit) populates the sibling
assistant response. Multiple skills render as multiple pills — the atom
was already a string array, so multi-select works for free.

- `SkillsCommand.tsx`: select handler no longer writes to the textarea.
  Still drops the trigger `$` via `removeCharIfLast`, still pushes to
  `pendingManualSkillsByConvoId`, still flips `ephemeralAgent.skills`.
- `families.ts`: new `attachedSkillsByMessageId` atomFamily keyed by
  user messageId. `useChatFunctions.ask` writes the drained skill list
  here on every fresh submit (regenerate/continue/edit still skip).
- `ManualSkillPills.tsx` renders pills conditionally: hidden when the
  message isn't a user message, when no skills are attached, or when
  the sibling assistant response already carries a `skill` tool_call
  content part (the live card took over). Reads messages via React Query
  so we don't re-render on every message-state keystroke.
- `Container.tsx` mounts the pills above the user message text, parallel
  to the existing `Files` slot.
- Updated the SkillsCommand select-flow spec to assert the textarea is
  cleared of `$` instead of populated with `$name `. 5 new tests for
  `ManualSkillPills` covering empty state, non-user message guard,
  multi-skill rendering, the skill-card hide condition, and the
  text-only-content-doesn't-hide case.

* 🎛️ feat: Manual Skills as Persisted Message Field + Compose-Time Chips

Three problems with the previous pass:
1. Cards rendered BELOW the LLM text on the assistant message (and
   stayed there on reload) because the sparse index-100 offset put them
   after the model's content. Now back to `unshift` — cards at the top,
   same as before the live-emit detour.
2. Pills on the user message disappeared the moment the live card
   arrived, so users barely saw them. The live-emit channel also added
   meaningful complexity and relied on a per-message Recoil atom that
   had no clean cleanup story.
3. No visual cue at all during new-chat compose — the `$name ` text was
   removed, the submitted-message pills weren't there yet, and the
   popover closes after selection. User had no way to see what they'd
   queued up before sending.

New architecture: `manualSkills` is a first-class field on `TMessage`,
persisted by the backend on the user message. `ManualSkillPills` reads
straight from `message.manualSkills` — no atom, no sibling-lookup — so
pills survive reload, show in history, and stay for the lifetime of the
message. Compose-time chips above the textarea read the existing
`pendingManualSkillsByConvoId` atom and let users × skills out before
submitting.

Backend reverts:
- `client.js`: dropped the `ON_RUN_STEP` live-emit loop, restored
  `this.contentParts.unshift(...primeParts)` so cards sit at the top of
  the persisted assistant response.
- `skills.ts`: removed `buildSkillPrimeStepEvents` and
  `SKILL_PRIME_INDEX_OFFSET` (both unused now). `GraphEvents`,
  `StepTypes`, and `Constants` imports went with them. Removed 8 tests.

Field persistence:
- `tMessageSchema` gains `manualSkills: z.array(z.string()).optional()`.
- Mongoose message schema gains `manualSkills: { type: [String] }` with
  matching `IMessage` TS field.
- `BaseClient.js` reads `req.body.manualSkills` on user-message save,
  filters to non-empty strings, pins onto `userMessage` before
  `saveMessageToDatabase`. Mirrors the existing `files` pattern right
  above it. Runtime resolution still reads top-level `req.body.manualSkills`
  — persistence and resolution are separate concerns.

Frontend:
- `useChatFunctions.ask` sets `currentMsg.manualSkills` directly; the
  drained atom value goes onto the message, not a separate atom.
  Removed the `attachSkillsToMessage` Recoil callback.
- `ManualSkillPills`: pure render of `message.manualSkills`. No more
  `useQueryClient`, no sibling scan, no atom read. Loses the
  auto-hide-when-card-arrives behavior — pills stay on the user
  bubble, cards live on the assistant bubble, both are informative.
- Dropped the `attachedSkillsByMessageId` atomFamily and its export.
- New `PendingManualSkillsChips` above the textarea reads the
  compose-time atom and renders chips with × to remove. Mounted in
  `ChatForm` right after `TextareaHeader`. Naturally hides on submit
  when the atom drains.

Tests: updated `ManualSkillPills` suite to the new field-based reads
(5 passing). New `PendingManualSkillsChips` suite covering empty state,
multi-chip render, single × removal, and full-clear (4 passing).
Backend suite trimmed to 89 (was 97) from the step-events test
removal — no regressions on the remaining helpers.

* 🧪 feat: Assistant-Side Skill-Loading Chips + Pill Padding

Two small UX fixes on top of the field-on-message architecture.

Pill padding: bumped the user-side `ManualSkillPills` from `py-0.5` to
`py-1` on each chip and added `py-0.5` to the wrapper so the row
breathes a little without feeling tall.

Mid-stream indicator: new `InvokingSkillsIndicator` mirrors the parent
user message's `manualSkills` onto the assistant bubble as transient
"Running X" chips while the real card is in flight. Renders above
`ContentParts` in `MessageParts`. Hides itself when the assistant's
own `content` grows a `skill` tool_call: the authoritative card, built by
`buildSkillPrimeContentParts` and prepended with `contentParts.unshift`,
is showing, so the placeholder steps aside. No SSE emit, no aggregator
injection, no index
collision with the LLM's streaming content: just a render slot keyed
off the parent's field.

Why not stream the cards live: whichever content index we'd choose
either blocks the LLM's text stream (`updateContent` type-mismatch at
index 0) or lands below the response after sparse compaction (index
100+). Mirroring the parent field sidesteps the aggregator entirely
and gives the user an immediate "skill is loading" signal that
naturally gives way to the real card at finalize.

Covers the gap the user flagged: pills on the user message said "I
asked for these" but nothing on the assistant side said "we're
working on it" until the stream finished. 5 new tests for the
indicator: user-msg guard, missing parent-field guard, multi-chip
render, hides-on-card-landing, orphan-parent guard.

* 🔁 fix: Indicator Visibility + Carry Manual Skills Through Regenerate/Edit

Two bugs.

Indicator never rendered: `InvokingSkillsIndicator` looked up the parent
user message via `queryClient.getQueryData([QueryKeys.messages, convoId])`,
but on a new chat the React Query cache is keyed by `"new"` (the URL
`paramId`) until the server assigns a real conversation ID — while
`message.conversationId` on the assistant message is already the server
ID. Lookup missed, `skills.length === 0`, nothing rendered. Switched
to `useChatContext().getMessages()`, which reads from the same
`paramId` the rest of the UI uses, so new-chat and existing-chat cases
both resolve to the correct message list.

Regenerate / save-and-submit dropped manual skills: the compose-time
`pendingManualSkillsByConvoId` atom is drained on the first submit,
so replaying that turn later found an empty atom and sent `manualSkills: []`.
The pills were still on the user bubble, so from the user's point of
view the model was running primed — but the backend saw nothing and
produced an unprimed response.

- Added `overrideManualSkills?: string[]` to `TOptions`. Callers with a
  reference message pass its persisted `manualSkills`; `useChatFunctions.ask`
  uses the override verbatim when present, otherwise falls back to the
  existing drain-or-empty logic.
- `regenerate` in `useChatFunctions` passes `parentMessage.manualSkills`
  — the user message being regenerated has the field persisted by the
  backend, so the second turn primes the same skills as the first.
- `EditMessage.resubmitMessage` covers both edit branches:
  - User-message save-and-submit: forwards the edited message's own
    `manualSkills` so the new sibling turn primes identically.
  - Assistant-response edit: forwards the parent user message's
    `manualSkills` for the same reason.
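
The override-or-drain decision above can be sketched roughly as follows. This is a minimal illustration, not the actual `useChatFunctions.ask` code: `resolveManualSkills` and the `drain` callback are hypothetical names, and leaving the pending list untouched when an override is present is an assumption.

```typescript
// Hypothetical helper; in the source this decision is made inside
// `useChatFunctions.ask`.
function resolveManualSkills(
  overrideManualSkills: string[] | undefined,
  drainPending: () => string[],
): string[] {
  // A caller with a reference message (regenerate / edit) passes the
  // persisted skills, which are used verbatim.
  if (overrideManualSkills !== undefined) {
    return overrideManualSkills;
  }
  // Otherwise fall back to the compose-time list (drain-or-empty).
  return drainPending();
}

// Stand-in for the compose-time atom: drained on first submit.
let pending: string[] = ['pdf-reader'];
const drain = (): string[] => {
  const drained = pending;
  pending = [];
  return drained;
};

const firstSubmit = resolveManualSkills(undefined, drain);
// Regenerate replays the persisted skills instead of the empty atom.
const regenerate = resolveManualSkills(firstSubmit, drain);
```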

Indicator test suite converted from `@tanstack/react-query` harness to
a jest-mocked `useChatContext().getMessages()`. 6 tests (was 5), added
a cache-miss case.

* 🧭 fix: Drive Mid-Stream Skill Chips from Submission Atom, Not Message Lookup

Message-ID-keyed lookups kept racing the stream: the user message flips
from its client-side intermediate UUID to the server-assigned ID mid-run,
conversation IDs flip from the URL `paramId="new"` to the real convo
ID on brand-new chats, and the React Query cache splits briefly between
the two. Previous attempts — direct `queryClient.getQueryData` and then
`useChatContext().getMessages()` — each missed a different window.

`TSubmission.manualSkills` is already populated at `ask()` time and the
submission atom (`store.submissionByIndex(index)`) is the single stable
anchor across the whole lifecycle: set once at submit, lives through
every SSE event, cleared when the stream ends. No ID lookups, no cache
timing.

- `InvokingSkillsIndicator` now reads `submissionByIndex(index)` via
  Recoil. Shows chips when:
    • the message is assistant-side,
    • a submission is in flight with non-empty `manualSkills`,
    • the assistant's `parentMessageId` matches
      `submission.userMessage.messageId` (so chips appear only on the
      bubble for the current turn, never on siblings),
    • the assistant's own content doesn't yet carry a `skill`
      tool_call (real card takes over from the server's post-run
      `contentParts.unshift`).
- Drops the `useChatContext().getMessages()` dependency and the
  `useQueryClient` dependency before that. No more lookups by
  conversationId or messageId.

Test suite now mocks `useChatContext` to supply `index: 0` and seeds
the `submissionByIndex(0)` atom via Recoil initializer. 6 cases cover
user-side, no-submission history, empty `manualSkills`, multi-chip
render, hides-on-card-landing, and wrong-turn guard.

* 🌱 fix: Seed Response manualSkills in createdHandler, Indicator Becomes Pure

The mid-stream indicator kept getting wired off state I don't own: first
`queryClient.getQueryData` (raced the new-chat paramId flip), then
`useChatContext().getMessages()` (same cache, same race), then
`useRecoilValue(submissionByIndex)` (pulled every message into the
submission subscription — re-renders all indicators on any submission
change, exactly the "limit hooks in rendering" concern).

Cleanest path is the one the user pointed at: the submission owns the
data, `useSSE` / `useEventHandlers` owns the save points, so seed the
field ONTO the response message at the save site and let the indicator
be a pure prop-read.

- `createdHandler` now writes `manualSkills` onto the initial response
  from `submission.manualSkills` at the moment the placeholder enters
  the messages array. The field rides through the normal mutation
  pipeline via spreads (`useStepHandler` response creation,
  `updateContent` result returns) — no special handling needed.
- `InvokingSkillsIndicator` drops the Recoil / context / queryClient
  reads. Pure function of `message`: if assistant, has `manualSkills`,
  and `content` hasn't grown a `skill` tool_call yet, render chips.
  Only `useLocalize` left, which was already unavoidable for the i18n
  string.
- Renders decouple: no single state change (`submissionByIndex` flip,
  React Query cache update) forces every indicator in the message list
  to re-render anymore. Only the message whose prop changed re-runs.
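
A rough sketch of the "pure function of `message`" visibility check described above. The `MessageSketch` and `ContentPartSketch` shapes and the `shouldShowSkillChips` name are assumptions for illustration; the real `TMessage` type carries many more fields.

```typescript
// Simplified, assumed shapes; not the real TMessage / content part types.
interface ContentPartSketch {
  type: string;
  tool_call?: { name?: string };
}

interface MessageSketch {
  isCreatedByUser: boolean;
  manualSkills?: string[];
  content?: ContentPartSketch[];
}

// Pure prop-read: no Recoil, context, or query-cache lookups.
function shouldShowSkillChips(message?: MessageSketch): boolean {
  if (!message || message.isCreatedByUser) {
    return false;
  }
  if (!message.manualSkills || message.manualSkills.length === 0) {
    return false;
  }
  // Once a real `skill` tool_call is in content, the placeholder hides.
  const hasSkillCard = (message.content ?? []).some(
    (part) => part.type === 'tool_call' && part.tool_call?.name === 'skill',
  );
  return !hasSkillCard;
}
```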

Finalize story unchanged: server's `responseMessage` doesn't carry the
frontend-only `manualSkills` field, so `finalHandler`'s replacement
drops it — but by then the real `skill` tool_call is in `content`
and the indicator's content-scan hides itself anyway.

Test suite back to pure prop mocks: 7 cases covering user-guard,
no-seed, multi-chip render, skill-card-hide, non-skill-tool-call-keeps,
text-only-keeps, and missing message.

* 🪞 fix: Render Skill Indicator Inside ContentParts, Adjacent to Parts

The indicator still wasn't showing because even though MessageParts
mounted it as a sibling of ContentParts, ContentParts is a `memo`'d
component that owns the only rendering path that refreshes in lockstep
with content deltas. Mounting above it put the indicator one layer
further out — reachable, but not exercised on the same render cycle
that processes the streaming `message` prop.

Moved the indicator into ContentParts itself, rendered at the top of
both the sequential and parallel branches. Reads the `message` prop
(newly threaded through as an optional prop alongside `content`), so:

- Same render cycle as Parts — updates from the SSE pipeline flow
  through the same pathway.
- Lives outside the `content.map`, so delta-driven content reshuffles
  never wipe it.
- Still a pure prop-read inside the indicator itself (no Recoil,
  queryClient, context hooks). The only dep is `useLocalize`.

Thread:
- `ContentPartsProps` gains `message?: TMessage`.
- `MessageParts` passes `message={message}` through, drops its own
  indicator mount + import.
- `ContentParts` renders `<InvokingSkillsIndicator message={message} />`
  in both the parallel-content and sequential-content branches, right
  under `MemoryArtifacts` and before the empty-cursor / parts map.

Companion data flow (unchanged): `createdHandler` seeds
`initialResponse.manualSkills` from `submission.manualSkills`; the
field rides through `useStepHandler` via spreads; indicator hides on
`skill` tool_call landing in `content`.

* 🔎 refactor: Narrow Skill Components to Scalar skills Prop, Kill Memo Churn

Passing the full `message` object into presentational components busts
`React.memo` shallow comparisons every time the message reference changes
for unrelated reasons. Swap to scalar `skills?: string[]` throughout:

- `InvokingSkillsIndicator`: props-only (`skills?: string[]`); visibility
  logic (user-vs-assistant, skill tool_call arrival) now lives in the
  caller so this stays pure presentational.
- `ManualSkillPills`: props-only (`skills?: string[]`).
- `ContentParts`: takes `manualSkills?: string[]` scalar, computes
  `showInvokingSkills` once per render from `manualSkills` + content scan
  for the `skill` tool_call, then mounts the indicator with `skills=`
  prop in both parallel and sequential branches.
- `MessageParts`: passes `manualSkills={message.manualSkills}` through
  to `ContentParts`.
- `Container`: passes `skills={message.manualSkills}` to `ManualSkillPills`.
- Tests updated to exercise the narrowed prop surface.

* 📜 feat: Mid-Stream Skill Cards via SkillCall, Drop Custom Indicator

Instead of a separate `InvokingSkillsIndicator` chip component, render
pending skill placeholders through the existing `SkillCall` renderer —
same component the backend's finalized prime part uses. The loading
visual (`progress < 1` + empty output → pulsing "Running X") and the
completed visual ("Ran X") now come from one source of truth.

`ContentParts` computes `pendingSkillNames` from `manualSkills` minus
any `skill` tool_call already in `content` (dedupe by `args.skillName`
since the synthetic's id differs from the real one). Those names
render through a separate slot ABOVE the Parts iteration — not
prepended to the content array, which would shift React keys on
every downstream streaming text / tool part and force unmount/remount
mid-stream.
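
The "manual skills minus already-landed skill tool_calls" computation might look roughly like this. The part shape is simplified and assumed, `readSkillName` is a hypothetical stand-in for the JSON-field parsing, and treating `args` as either a JSON string or an object is an assumption.

```typescript
// Assumed, simplified shape of a content part.
interface ToolCallPartSketch {
  type: string;
  tool_call?: { name?: string; args?: string | { skillName?: string } };
}

// Hypothetical helper: args may arrive as a JSON string or an object.
function readSkillName(
  args: string | { skillName?: string } | undefined,
): string | undefined {
  if (args === undefined) {
    return undefined;
  }
  if (typeof args !== 'string') {
    return args.skillName;
  }
  try {
    const parsed: unknown = JSON.parse(args);
    if (typeof parsed === 'object' && parsed !== null) {
      const value = (parsed as { skillName?: unknown }).skillName;
      return typeof value === 'string' ? value : undefined;
    }
  } catch {
    // Incomplete JSON (e.g. mid-stream) is treated as "no name yet".
  }
  return undefined;
}

// Dedupe by skill name, since the synthetic id differs from the real one.
function pendingSkillNames(
  manualSkills: string[] | undefined,
  content: ToolCallPartSketch[] | undefined,
): string[] {
  const landed = new Set<string>();
  for (const part of content ?? []) {
    const call = part.tool_call;
    if (part.type !== 'tool_call' || !call || call.name !== 'skill') {
      continue;
    }
    const skillName = readSkillName(call.args);
    if (skillName !== undefined) {
      landed.add(skillName);
    }
  }
  return (manualSkills ?? []).filter((skill) => !landed.has(skill));
}
```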

When the real prime `tool_call` lands at finalize (backend unshifts to
content[0..]), `collectExistingSkillNames` picks it up, the pending
set empties, and the real part takes over rendering in the Parts
iteration. Layout is identical either way because primes are always
at the top of content.

- `InvokingSkillsIndicator.tsx` + test deleted (no longer referenced)
- `ContentParts.tsx` renders `<SkillCall .../>` directly for pending
  names, mirrors `Part.tsx`'s usage of the same component
- `createdHandler` doc comment updated to reflect the new flow

* ✂️ fix: Render Interim Skill Cards From manualSkills Only, Leave Content Untouched

Previous revision read `content` to de-dupe pending cards against real
`skill` tool_calls, so any optimistic skill part streamed from the
backend would race our placeholder off the screen mid-turn — exactly
the "getting overridden" symptom.

Now: interim `SkillCall` cards are driven purely by the response
message's `manualSkills` field. `content` is never inspected here,
so no backend delta can pull the cards down. The field is now seeded
directly onto the assistant placeholder in `useChatFunctions` (not
only in `createdHandler`) so the cards appear from the first render,
before the `created` SSE event round-trip.

Lifecycle:
- `useChatFunctions` puts `manualSkills` on the freshly-minted
  `initialResponse` — cards render the instant the placeholder lands.
- `createdHandler` keeps its own re-seed (idempotent; safe) so a
  regenerate / save-and-submit flow that hits that path still works.
- `useStepHandler` spread operations preserve the field through every
  content update.
- `finalHandler` replaces the message with the server-backed
  `responseMessage` (no `manualSkills`) — cards disappear, and the
  real `skill` tool_call part in `content` takes over.

ContentParts changes:
- Drop `collectExistingSkillNames` / `parseJsonField` dedupe path.
- `renderPendingSkills` reads only `manualSkills` + `isCreatedByUser`.
- Simpler control flow — one boolean (`hasPendingSkills`) gates the
  early return, one function renders.

* 🩹 fix: Codex Review Resolutions — Localization, Guards, Tests, Docs

Addresses seven findings from a comprehensive code review:

Finding 1 (MAJOR) — Document sticky re-priming as intentional
- `buildSkillPrimeContentParts`: expanded doc comment explaining
  synthetic `skill` tool_calls persist and get re-primed on every
  subsequent turn via `extractInvokedSkillsFromPayload` (shape parity
  with model-invoked skills). This matches the UX: the assistant
  skill card is a visible, persistent signal that the skill is active
  for the conversation. Not a bug — called out explicitly so future
  maintainers don't mistake it for one.

Finding 2 (MAJOR) — Add ContentParts render tests
- New `ContentParts.test.tsx` with 7 cases covering the interim skill
  card logic: assistant-only rendering, user-message suppression,
  undefined-content safety, parallel+sequential branch integration,
  progress<1 (pending) state. Child components mocked so the test
  exercises only the branching and prop wiring ContentParts owns.

Finding 3 (MINOR) — Localize hardcoded aria-labels
- Added `com_ui_skills_manual_invoked` + `com_ui_skills_queued` keys.
- Reused existing `com_ui_remove_skill_var` for the remove-button
  aria-label.
- `PendingManualSkillsChips` and `ManualSkillPills` now call
  `useLocalize()`. Test mocks updated to the label-echo pattern.

Finding 4 (MINOR) — Max-length guard in `extractManualSkills`
- New `MAX_SKILL_NAME_LENGTH = 200` constant and filter. Blocks a
  crafted payload like `{ manualSkills: ['a'.repeat(100000)] }` from
  reaching `getSkillByName` / Mongo's query planner.
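
A minimal sketch of such a length guard. Only `MAX_SKILL_NAME_LENGTH = 200` comes from the commit; the function name, the request-body shape, and the rejection of empty strings are assumptions.

```typescript
const MAX_SKILL_NAME_LENGTH = 200;

// Hypothetical stand-in for `extractManualSkills`; the real signature
// and any additional filters may differ.
function extractManualSkillsSketch(body: { manualSkills?: unknown }): string[] {
  if (!Array.isArray(body.manualSkills)) {
    return [];
  }
  // Keep only non-empty strings within the length cap, so an oversized
  // name never reaches the database lookup.
  return body.manualSkills.filter(
    (item): item is string =>
      typeof item === 'string' &&
      item.length > 0 &&
      item.length <= MAX_SKILL_NAME_LENGTH,
  );
}
```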

Finding 5 (NIT) — `BaseClient.js` comment contradicted itself
- Rewrote to call the filter what it is: defense-in-depth on top of
  Mongoose schema validation, not a redundant second layer.

Finding 6 (NIT) — `ManualSkillPills` now wrapped in `React.memo`
- Consistent with peer components (`PendingManualSkillsChips`,
  `ContentParts`). Rendered inside `Container`, which re-renders on
  every content update, so the memo is a real cycle savings.

Finding 7 (NIT) — Redundant guard in `ContentParts.renderPendingSkills`
- Collapsed the duplicate null-check by computing `pendingSkills` as
  a `useMemo`'d array (`[]` when not applicable), and mapping
  directly. `hasPendingSkills` now derives from the array length —
  one source of truth, no redundant gate inside the render function.

* 🔧 fix: Update ParallelContent to Handle Optional Content Prop

Made the `content` prop of `ParallelContentRendererProps` optional and
guarded the `lastContentIdx` calculation against an undefined
`content`, preventing a runtime error when a message arrives without
content parts.

* 🎯 fix: Thread manualSkills Through ContentRender — The Real Renderer

This is why the interim skill cards never appeared across many rounds of
iteration: `ContentRender.tsx` (the memo'd renderer used by most paths,
including the agents endpoint) was calling `ContentParts` without the
`manualSkills` prop. Only `MessageParts.tsx` had it wired up — and
that's not the component that actually renders the assistant response
in production.

Two fixes:
1. Pass `manualSkills={msg.manualSkills}` to the `ContentParts` call.
2. Extend the `areContentRenderPropsEqual` memo comparator to include
   `manualSkills.length`, otherwise a message update that adds the
   field (seeded by `useChatFunctions` on the initialResponse) would
   be bailed out by the memo and never re-render.
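
To illustrate the second fix: a custom memo comparator that ignores a field will report "equal" when only that field changes, so the component skips the render. The sketch below uses an assumed, much smaller prop shape than the real `areContentRenderPropsEqual` inspects.

```typescript
// Assumed, reduced prop shape for illustration only.
interface MsgSketch {
  messageId: string;
  text?: string;
  manualSkills?: string[];
}

interface RenderPropsSketch {
  msg: MsgSketch;
}

// Returns true when the memo may skip re-rendering.
function arePropsEqualSketch(
  prev: RenderPropsSketch,
  next: RenderPropsSketch,
): boolean {
  return (
    prev.msg.messageId === next.msg.messageId &&
    prev.msg.text === next.msg.text &&
    // Without this line, seeding `manualSkills` onto an existing message
    // would compare as "equal" and the interim cards would never appear.
    (prev.msg.manualSkills?.length ?? 0) === (next.msg.manualSkills?.length ?? 0)
  );
}
```

Comparing only the length is cheap but would miss a same-length swap of skill names; this sketch does not try to cover that case.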

Verified the two ContentParts call sites are now consistent; Container
usages for `ManualSkillPills` on the user side were already correct.

* 🧹 polish: Address Audit Follow-Up (F1/F3/F6)

F1 — Clarify sticky re-priming opt-out path.
  The previous comment said "regenerate without the pick" as one
  opt-out, but `useChatFunctions.regenerate` forwards the original
  picks via `overrideManualSkills`, so regeneration alone keeps the
  skill sticky. Updated to: edit the originating message to remove
  the pills and resubmit, or start a new conversation.

F3 — Add DOM-order assertions to the parallel + sequential tests.
  The two "alongside" tests verified both elements existed but
  didn't pin the ordering contract. Both now use
  `compareDocumentPosition` to assert the pending SkillCall
  precedes the real content, matching the backend semantic
  (`contentParts.unshift(...primeParts)` puts primes at the top).

F6 — Fix package import order in PendingManualSkillsChips.
  The `recoil` import (58 chars) was listed before the `lucide-react`
  import (45 chars), which violates the "shortest to longest after
  react" rule in
  AGENTS.md. Swapped order; no behavior change.

F2 / F4 / F5 from the audit were confirmed as non-issues
(React-safe empty map, cosmetic test-mock artifact, accepted
memo tradeoff) and require no change.

* feat: Dedicated PendingSkillCall + Running→Ran Transition on Real Content

UX polish on the interim skill card now that it's actually rendering:

1. New `PendingSkillCall` component (mirrors `SkillCall` visually but
   drops the expand affordance). `SkillCall`'s underlying `ProgressText`
   always renders a chevron + clickable button when any input is
   present, which on a card with empty output points at nothing —
   misleading cursor:pointer and a no-op toggle. The pending variant
   has only the icon + label, no button wrapper, no chevron.

2. "Running X" → "Ran X" transition when real content lands.
   `ContentParts` computes `hasRealContent` (any non-text part, or a
   text part with non-empty content — placeholder empty-text parts
   don't count) and passes `loaded={hasRealContent}` to
   `PendingSkillCall`. Matches what users see for model-invoked skills
   as they finish priming: pulsing shimmer → static icon.

3. Cleanup:
   - Dropped direct `SkillCall` import from `ContentParts` (replaced
     by `PendingSkillCall`). `SkillCall` is still used by `Part` for
     real `skill` tool_call content parts — no behavior change there.
   - Removed the now-redundant explicit `manualSkills` assignment
     in `createdHandler`. `useChatFunctions` seeds the field on
     `initialResponse` at construction, so the `...submission.initialResponse`
     spread already carries it through — the re-assignment was
     defensive belt-and-suspenders doing the same work twice. Comment
     rewritten to describe the actual lifecycle.
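
The `hasRealContent` rule from item 2 can be sketched as below. The part shape (a plain `text` string on text parts) is an assumption, and `pendingLabel` is a hypothetical helper that only mirrors the "Running X" / "Ran X" wording.

```typescript
// Assumed, simplified content part: text parts carry a `text` string.
interface PartSketch {
  type: string;
  text?: string;
}

// True once anything other than an empty placeholder text part exists.
function hasRealContent(
  content: Array<PartSketch | undefined> | undefined,
): boolean {
  return (content ?? []).some((part) => {
    if (!part) {
      return false;
    }
    if (part.type !== 'text') {
      return true; // any non-text part counts as real content
    }
    return (part.text ?? '').length > 0;
  });
}

// Hypothetical label helper mirroring the pending card's two states.
function pendingLabel(skill: string, loaded: boolean): string {
  return loaded ? `Ran ${skill}` : `Running ${skill}`;
}
```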

Tests updated to the new component (12/12 pass): two new cases pin
the loaded-state transition (unloaded when content has no real parts,
flips to loaded once a non-empty text part lands).
2026-04-25 04:02:00 -04:00
Danny Avila
9225a279eb 🎚️ feat: Per-User Skill Active/Inactive Toggle with Ownership-Aware Defaults (#12692)
* feat: per-user skill active/inactive toggle with ownership-aware defaults

- Add `skillStates` map (Record<string, boolean>) to user schema for
  per-user active/inactive overrides on skills
- Add `defaultActiveOnShare` to interface.skills config (default: false)
  so admins can control whether shared skills auto-activate
- Add GET/POST /api/user/settings/skills/active endpoints with validation
- Add React Query hooks with optimistic mutations for skill states
- Add useSkillActiveState hook with ownership-aware resolution:
  owned skills default active, shared skills default inactive
- Add toggle switch UI to SkillListItem and SkillDetail components
- Filter inactive skills in injectSkillCatalog before agent injection
- Add localization keys for active/inactive labels

* fix: use Record instead of Map for IUser.skillStates

Mongoose .lean() flattens Map to a plain object, causing type
incompatibility with IUser in methods that return lean documents.

* fix: address review findings for skill active states

- Fail-closed when userId is absent: filter rejects all shared skills
  instead of passing them through unfiltered (Codex P1)
- Validate Mongoose Map key characters (reject . and $) in controller
  to return 400 instead of a 500 from schema validation (Codex P2)
- Block toggle while initial skill states query is loading to prevent
  overwriting server-side overrides with an empty snapshot (Codex P2)
- Extract shared SkillToggle component, eliminating duplicate toggle
  markup in SkillListItem and SkillDetail (Finding #3)
- Move skill state query/mutation hooks from Favorites.ts to
  Skills/queries.ts per feature-directory convention (Finding #4)
- Fix hardcoded English aria-label in SkillListItem by passing the
  localized string from the parent SkillList (Finding #5)
- Fix inline arrow in SkillList render loop: pass stable callback
  reference so SkillListItem memo() is not invalidated (Finding #1)
- Extract toRecord() helper in controller to DRY the Map-to-Object
  conversion (Finding #6)
- Remove Promise.resolve wrapping synchronous config read (Finding #8)
- Remove unused TUpdateSkillStatesRequest type (Finding #12)

* fix: forward tabIndex on SkillToggle to preserve list keyboard nav

The original inline toggle had tabIndex={-1} so the row itself
remained the sole tab target. The extraction into SkillToggle
dropped this prop, making every list toggle a tab stop. Add an
optional tabIndex prop and pass -1 from SkillListItem.

* fix: plumb skillStates to all agent entry points, isolate toggle keydown

- Add skillStates/defaultActiveOnShare loading to openai.js and
  responses.js controllers so shared-skill activation is respected
  across all agent entry points, not just initialize.js (Codex P1)
- Stop keydown propagation on SkillToggle so Enter/Space does not
  bubble to the parent row's navigation handler (Codex P2)

* fix: paginate catalog fetch and serialize toggle writes

- Paginate listSkillsByAccess (up to 10 pages of 100) until the active
  catalog quota is filled, so inactive shared skills in recent positions
  do not starve active owned skills past the first page (Codex P1)
- Extend listSkillsByAccess interface with cursor/has_more/after for
  catalog pagination
- Serialize skill-state writes via a ref queue: one in-flight request
  at a time, with the latest desired state sent when the previous one
  settles. Prevents last-response-wins races where an older request
  overwrites newer toggles (Codex P2)
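
The write serialization in the last bullet could look something like this: one request in flight, and only the latest desired state sent once it settles. `createSerializedWriter` and the `send` callback are hypothetical names; the real hook keeps its queue in a ref (later a module-scoped object) and sends through a React Query mutation.

```typescript
type SkillStates = Record<string, boolean>;
type SendSnapshot = (snapshot: SkillStates) => Promise<void>;

interface WriteQueueSketch {
  pending: SkillStates | null;
  inFlight: boolean;
}

function createSerializedWriter(send: SendSnapshot) {
  const queue: WriteQueueSketch = { pending: null, inFlight: false };

  async function flush(): Promise<void> {
    if (queue.inFlight) {
      return; // the running loop will pick up the newest snapshot
    }
    queue.inFlight = true;
    try {
      while (queue.pending !== null) {
        const snapshot = queue.pending;
        queue.pending = null;
        await send(snapshot);
      }
    } finally {
      queue.inFlight = false;
    }
  }

  // Each call overwrites the pending snapshot, so intermediate states
  // queued while a request is in flight are skipped.
  return function enqueue(next: SkillStates): Promise<void> {
    queue.pending = next;
    return flush();
  };
}
```

With three rapid toggles, the first snapshot is sent immediately, the second is overwritten while waiting, and the third is sent after the first settles.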

* fix: share write queue across hook instances, block toggle on fetch error

- Move the write queue from a per-instance useRef to a module-scoped
  object so every mount of useSkillActiveState (SkillList, SkillDetail,
  etc.) serializes against the same in-flight slot. Prior per-instance
  queues allowed two components to race full-map POSTs (Codex P1)
- Extend the toggle guard beyond isLoading: also block when isError is
  true or data is undefined. Prevents a failed GET from seeding a
  toggle with an empty baseline that would wipe server-side overrides
  on the next successful POST (Codex P1)

* fix: stale closure, orphan cleanup, and cap-error UX

- Read toggle baseline from React Query cache via queryClient.getQueryData
  instead of the captured skillStates closure. The closure can be stale
  between onMutate's setQueryData and the next render, so rapid successive
  toggles would build on old state and drop earlier changes (Codex P1)
- Surface the MAX_SKILL_STATES_EXCEEDED error code with a specific toast
  key (com_ui_skill_states_limit) so users understand the 200-cap rather
  than seeing a generic error
- Prune orphaned entries (skillIds whose Skill doc no longer exists) on
  both GET and POST in SkillStatesController. Self-heals over time
  without needing cascade-delete hooks or a migration job. Uses one
  indexed Skill._id query per request

* test: pin skill active-state precedence with unit tests

Extract the active-state resolution logic from a closure inside
injectSkillCatalog into an exported resolveSkillActive helper, then
cover every branch of the precedence matrix:

- Fails closed when userId is absent (even with defaultActiveOnShare=true)
- Explicit override wins over ownership and config (both true and false)
- Owned skills default to active when no override is set
- Shared skills default to defaultActiveOnShare value
- Undefined skillStates behaves identically to an empty object
- defaultActiveOnShare defaults to false when omitted
- Owned skills ignore defaultActiveOnShare entirely

Closes Finding #2 from the pre-rebase comprehensive review. Mirrors
the existing scopeSkillIds test style; injectSkillCatalog now calls
resolveSkillActive instead of inlining the closure.
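
The precedence matrix above, written out as a sketch. The parameter shape (`skill.id`, `skill.authorId`) and the `resolveSkillActiveSketch` name are assumptions; only the ordering of the rules is taken from the list.

```typescript
// Assumed parameter shape; the exported helper's real signature may differ.
interface ResolveSkillActiveParams {
  skill: { id: string; authorId: string };
  userId?: string;
  skillStates?: Record<string, boolean>;
  defaultActiveOnShare?: boolean;
}

function resolveSkillActiveSketch({
  skill,
  userId,
  skillStates,
  defaultActiveOnShare = false,
}: ResolveSkillActiveParams): boolean {
  // 1. Fail closed: without a user there is no ownership to resolve.
  if (!userId) {
    return false;
  }
  // 2. An explicit per-user override wins over everything else.
  const override = skillStates?.[skill.id];
  if (typeof override === 'boolean') {
    return override;
  }
  // 3. Owned skills default to active and ignore the share config.
  if (skill.authorId === userId) {
    return true;
  }
  // 4. Shared skills follow the admin-configured default.
  return defaultActiveOnShare;
}
```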

* refactor: limit skill active toggle to detail header, drop label

- Remove the per-row toggle from SkillListItem and the active-state
  plumbing (hook call, isSkillEnabled/onToggleEnabled/toggleAriaLabel
  props) from SkillList. The detail view is now the single place to
  change a skill's active state
- Drop dim/muted styling for inactive skills in the sidebar: without
  a control there, the visual indication has nowhere to land
- Resize SkillToggle to match neighbor buttons: outer h-9 container,
  h-6 w-11 track with size-5 knob, no label span. The 'Active' /
  'Inactive' text that accompanied the detail-view toggle is removed
- Remove the now-unused label prop and tabIndex prop (the tabIndex
  existed only for the list-row context) from SkillToggle. Drop the
  onKeyDown stopPropagation for the same reason
- Remove now-orphaned com_ui_skill_active / com_ui_skill_inactive
  translation keys

* style: shrink SkillToggle track to h-5 w-9 with size-4 knob

Container stays at h-9 to match neighbor button heights. The toggle
track itself drops from h-6 w-11 to h-5 w-9, with a size-4 knob
travelling 1.125rem on activation. Visually lighter inside the row.

* fix: remove redundant skillStates entries that match the resolved default

When a toggle lands on the ownership/config default, delete the key
from the map instead of persisting `{id: defaultValue}`. Without this,
a user toggling a skill off and back on would leave `{id: true}` for
an owned skill (whose default is already true), silently consuming a
slot against the 200-entry cap. Repeated round-trip toggles could
exhaust the quota with zero meaningful overrides (Codex P2).

Preserves the exceptions-list invariant that the runtime-resolution
design depends on.
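
A small sketch of the exceptions-list rule: store an entry only when it differs from the resolved default. `applyToggle` is a hypothetical name, and the caller is assumed to have already resolved the ownership/config default for the skill.

```typescript
type SkillStateMap = Record<string, boolean>;

// Returns a new map; the input is not mutated.
function applyToggle(
  states: SkillStateMap,
  skillId: string,
  next: boolean,
  resolvedDefault: boolean,
): SkillStateMap {
  const updated: SkillStateMap = { ...states };
  if (next === resolvedDefault) {
    // Landing on the default means no override is needed, so the key is
    // removed and does not count against the entry cap.
    delete updated[skillId];
  } else {
    updated[skillId] = next;
  }
  return updated;
}

// Owned skill (default: active) toggled off, then back on.
const afterOff = applyToggle({}, 'skill-1', false, true);
const afterOn = applyToggle(afterOff, 'skill-1', true, true);
```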

* fix: prune before enforcing skill-state cap; reject non-ObjectId keys

Reorder the update controller so pruneOrphans runs before the 200-cap
check. Without this, a user near the cap with some orphaned entries
(skills deleted since their last GET) could send a payload that would
pass after pruning but gets rejected by the raw-size check first.

Add a sanity cap on raw payload size (2 * MAX_SKILL_STATES) so abusive
inputs do not reach the DB query, and enforce the real cap on the
pruned result instead.

Harden pruneOrphans: the earlier early-return path could pass
non-ObjectId keys through unchanged. Now only valid ObjectIds are
returned, and the Skill-model-unavailable fallback filters by format.

Also add isValidObjectIdString validation at the input boundary so
malformed (but otherwise Mongo-safe) keys never reach persistence
(Codex P2 x2).
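
The ordering described above (sanity cap on the raw payload, key-format check, prune, then the real cap) can be sketched synchronously as follows. `liveSkillIds` stands in for the result of the Skill lookup, which is a database query in the real controller; `validateAndPrune` and the `PAYLOAD_TOO_LARGE` / `INVALID_KEY` codes are hypothetical, while `MAX_SKILL_STATES_EXCEEDED` and the 200 limit come from the commits.

```typescript
const MAX_SKILL_STATES = 200;
// 24 hex characters, the string form of a Mongo ObjectId.
const OBJECT_ID_PATTERN = /^[0-9a-fA-F]{24}$/;

type PruneResult =
  | { ok: true; states: Record<string, boolean> }
  | { ok: false; code: string; limit?: number };

function validateAndPrune(
  payload: Record<string, boolean>,
  liveSkillIds: Set<string>,
): PruneResult {
  const keys = Object.keys(payload);
  // Sanity cap on the raw size so abusive inputs never reach the lookup.
  if (keys.length > 2 * MAX_SKILL_STATES) {
    return { ok: false, code: 'PAYLOAD_TOO_LARGE' };
  }
  if (keys.some((key) => !OBJECT_ID_PATTERN.test(key))) {
    return { ok: false, code: 'INVALID_KEY' };
  }
  // Prune orphans first...
  const states: Record<string, boolean> = {};
  for (const key of keys) {
    if (liveSkillIds.has(key)) {
      states[key] = payload[key] === true;
    }
  }
  // ...then enforce the real cap on what survives.
  if (Object.keys(states).length > MAX_SKILL_STATES) {
    return { ok: false, code: 'MAX_SKILL_STATES_EXCEEDED', limit: MAX_SKILL_STATES };
  }
  return { ok: true, states };
}
```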

* fix: enforce active filter at execute time, prune revoked shares, scope queue per user

P1: injectSkillCatalog now returns activeSkillIds (the filtered set
that appears in the catalog). initializeAgent uses that set as the
stored accessibleSkillIds on the initialized agent, so getSkillByName
at runtime cannot resolve a deactivated skill — even if the LLM
hallucinates a name or the user invokes by direct-invocation shorthand.
Previously the executor authorized against the full ACL set, bypassing
the active-state guarantee (Codex P1).

P2: pruneOrphans now checks user access via findAccessibleResources
in addition to skill existence. When a share is revoked, the user's
skillStates entry for that skill had no cleanup path and silently
consumed the 200-cap. Self-heals on both GET and POST. One extra ACL
query per settings read/write; scoped to a single user so no N-user
amplification (Codex P2).

P2: the write queue moves from a single module-scoped object to a Map
keyed by userId. Logout/login in the same tab can no longer flush the
previous user's pending snapshot under the new session's auth. Each
userId gets its own pending/inFlight slot; the in-flight request
retains its original auth via the cookie already attached when sent,
so the race window closes (Codex P2).

* refactor: extract skillStates helpers to packages/api; add tests; polish

Address the remaining valid findings from the comprehensive review:

- Extract toRecord, loadSkillStates, validateSkillStatesPayload, and
  pruneOrphanSkillStates into packages/api/src/skills/skillStates.ts
  as TypeScript. The controller in /api shrinks to a ~90-line thin
  wrapper that builds live dependency adapters for Mongoose + the
  permission service (Review #2 DRY, #3 workspace boundary)

- Replace the triplicated 12-line skillStates loading block in
  initialize.js, openai.js, and responses.js with a single call to
  loadSkillStates from @librechat/api. One helper, three sites

- Swap console.error for the project logger in the controller
  (Review #7)

- Remove the redundant INVALID_KEY_PATTERN regex: a valid ObjectId
  cannot contain . or $, so isValidObjectIdString already covers it
  (Review #11)

- Parameterize the 200-cap error toast with {{0}} interpolation
  driven by the error response's `limit` field, so future changes to
  MAX_SKILL_STATES update the UI message automatically (Review #12)

- Add 24 unit tests for the new skillStates helpers (toRecord,
  resolveDefaultActiveOnShare, loadSkillStates,
  validateSkillStatesPayload, pruneOrphanSkillStates) covering success
  paths, malformed input, cap boundaries, and parallel-query behavior
  (Review #4)

- Add 10 tests for injectSkillCatalog pagination covering empty
  accessible set, missing listSkillsByAccess, single-page filter,
  owned-vs-shared defaults, explicit-override precedence, multi-page
  collection, MAX_CATALOG_PAGES safety cap, early termination on
  has_more=false, additional_instructions injection, and fail-closed
  without userId (Review #5)

Total test count: 60 (was 26 on this surface).

* fix: rename skillStates ValidationError to avoid barrel-export collision

packages/api/src/types/error.ts already exports a ValidationError
(MongooseError extension). Re-exporting a different shape from
skills/skillStates.ts through the skills barrel caused TS2308 in CI
because the root index re-exports both. Rename to
SkillStatesValidationError to keep the exports disjoint.

* refactor: tighten tests and absorb caller guard into loadSkillStates

Address the followup review findings:

- Add optional `accessibleSkillIds` param to loadSkillStates so the
  helper short-circuits to defaults when no skills are accessible.
  All three controllers drop the residual 7-line conditional wrapper
  in favor of a single destructured call (Review #2)

- Remove the unreachable `typeof key !== 'string'` check from
  validateSkillStatesPayload: Object.entries always yields string
  keys per the JS spec (Review #3)

- Replace the two `as unknown as` agent casts in the injectSkillCatalog
  tests with a `makeAgent()` factory typed directly as the function's
  parameter shape (Review #4)

- Tighten the MAX_CATALOG_PAGES assertion from `toBeLessThanOrEqual(11)`
  to `toHaveBeenCalledTimes(10)` — the loop deterministically makes
  exactly 10 page fetches before hitting the cap (Review #1)

- Rewrite the parallel-execution test for pruneOrphanSkillStates using
  deferred promises instead of microtask-order assertions. The test
  now inspects `toHaveBeenCalledTimes(1)` on both mocks after a single
  Promise.resolve() yield, pinning Promise.all usage without relying
  on push-order into a shared array (Review #5)

- Evict stale writeQueue entries on user change via a module-scoped
  `lastSeenUserId` sentinel. When a different user's toggle is the
  first one after a logout/login, the previous user's queue entry is
  deleted. Keeps the Map bounded without adding hook-instance effect
  cleanup (Review #6)

* fix(test): mock loadSkillStates in openai and responses controller specs

The prior refactor replaced the inline 12-line skillStates loading
block with a call to loadSkillStates from @librechat/api. Both
controller spec files mock @librechat/api as a flat object, so any
new named import from that package is undefined in the test env.
Calling `await loadSkillStates(...)` threw before recordCollectedUsage
ran, surfacing as "undefined is not iterable" on the test's array
destructure of `mockRecordCollectedUsage.mock.calls[0]`.

Add the missing mock to both spec files alongside the existing
scopeSkillIds stub.

* fix: abandon stale skillStates write queues on user switch

Close the cross-session leak window where an in-flight flush loop
still holds a reference to a previous user's queue: it could fire its
next mutateAsync under the new session's auth cookies and persist
the stale snapshot to the new user's document (Codex P1).

Add an `abandoned` flag on `WriteQueue`. Three mechanisms cooperate:

- `getWriteQueue` marks every non-active queue abandoned when the
  user differs from the last-seen identity (pre-existing eviction
  site, now more aggressive).
- A `useEffect` on `userId` calls the same abandonment pass on every
  render with a new active identity, covering the window between
  logout/login and the new user's first toggle (when `getWriteQueue`
  would otherwise not fire).
- The flush loop checks `!queue.abandoned` in its while condition so
  the second and later iterations exit without firing another
  `mutateAsync` after the session changes.

The first iteration's in-flight request (already dispatched under the
original user's cookies) still runs to completion or failure on its
own — only the subsequent iterations, which are the dangerous ones,
are blocked.
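
A minimal sketch of the flush-loop guard, assuming a simplified queue shape. The `pending` and `flushing` fields and the `flush` signature are assumptions for illustration; only the `abandoned` flag and the while-condition check come from the description above.

```typescript
type Snapshot = Record<string, boolean>;

// Assumed shape; the real WriteQueue is not shown in this log.
interface WriteQueue {
  pending: Snapshot | null; // latest snapshot waiting to be written
  flushing: boolean;
  abandoned: boolean; // set when the active user changes
}

async function flush(
  queue: WriteQueue,
  mutateAsync: (snapshot: Snapshot) => Promise<void>,
): Promise<number> {
  if (queue.flushing) {
    return 0;
  }
  queue.flushing = true;
  let writes = 0;
  try {
    // `abandoned` is re-checked before every iteration, so no further
    // request is dispatched once the session has changed.
    while (!queue.abandoned && queue.pending !== null) {
      const snapshot = queue.pending;
      queue.pending = null;
      await mutateAsync(snapshot);
      writes += 1;
    }
  } finally {
    queue.flushing = false;
  }
  return writes;
}
```

In this sketch a request that is already in flight still completes; the flag only prevents the next iteration from starting.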
2026-04-25 04:02:00 -04:00
Danny Avila
3e064c2f2b 🎯 feat: Per-Agent Skill Selection in Builder and Runtime Scoping (#12689)
* feat: per-agent skill selection in builder and runtime scoping

Wire skills persistence on the Agent model and enable the skills
section in the agents builder panel. At runtime, scope the skill
catalog to only the skills configured on each agent (intersected
with user ACL). When no skills are configured, the full user catalog
is used as the default. The ephemeral chat toggle overrides per-agent
scoping to provide the full catalog.

* fix: add scopeSkillIds to @librechat/api mock in responses unit test

The test mocks @librechat/api but was missing the newly imported
scopeSkillIds, causing createResponse to throw before reaching the
assertions. Added a passthrough mock that returns the input array.

* fix: scope primeInvokedSkills by agent's configured skills

primeInvokedSkills was receiving the full unscoped accessibleSkillIds,
bypassing the per-agent skill scoping applied to initializeAgent. This
allowed previously invoked skills from message history to be resolved
and primed even when excluded from the agent's configured skill set.

Apply the same scopeSkillIds filtering to match the initializeAgent
calls, so skill resolution is consistent across catalog injection
and history priming.

* fix: preserve agent skills through form reset and union prime scope

Two related bugs in the per-agent skill selection flow:

1. resetAgentForm dropped the persisted skills array because the generic
   fall-through at the end of the loop excludes object/array values.
   Combined with composeAgentUpdatePayload always emitting skills, this
   caused any save of a previously-configured agent to silently overwrite
   skills with an empty array. Add an explicit case for skills mirroring
   the agent_ids handling.

2. primeInvokedSkills processes the full conversation payload, including
   prior handoff-agent invocations. Scoping it to only primaryAgent.skills
   meant a skill invoked by a handoff agent in a prior turn could not be
   resolved when the current primary agent had a different scope, leaving
   message history reconstruction incomplete. Union the per-agent scoped
   accessibleSkillIds across primary plus all loaded handoff agents so
   any skill any active agent could invoke is resolvable from history.

* fix: mark inline skill removals as dirty

The inline X button on the skills list called setValue without
shouldDirty: true, so removing a skill via this control did not
mark the skills field as dirty in react-hook-form state. When a
user removed a skill with the X button and also staged an avatar
upload in the same save, isAvatarUploadOnlyDirty returned true and
onSubmit short-circuited to avatar-only upload, silently dropping
the PATCH that would persist the skill removal.

The dialog path (SkillSelectDialog) already passes shouldDirty: true
on add/remove; this aligns the inline control with that behavior.

* fix: restore full ACL scope for primeInvokedSkills history reconstruction

Reverting the earlier scoping of primeInvokedSkills to the active-agent
union. That change conflated runtime invocation scoping (which correctly
gates what the model can call now) with history reconstruction (which
restores bodies the model already saw in prior turns).

Per-agent scoping still applies at:
- Catalog injection (injectSkillCatalog via initializeAgent)
- Runtime invocation (handleSkillToolCall via enrichWithSkillConfigurable,
  using each agent's scoped accessibleSkillIds in agentToolContexts)

History priming is a read of past context, not a grant of new capability.
Scoping it causes historical skill bodies to vanish from formatAgentMessages
when an agent's skills list is edited mid-conversation or when the ephemeral
toggle flips, which breaks message reconstruction and drops code-env file
continuity for /mnt/data/{skillName}/ references. The user's ACL-accessible
set is the correct and sufficient gate for history reconstruction.

* fix: close openai.js skill gap and pin undefined vs [] semantics

Three related gaps surfaced in review:

1. api/server/controllers/agents/openai.js was a third skill resolution
   site alongside responses.js and initialize.js, but still used the old
   activation gate (required ephemeralAgent.skills === true) and never
   passed accessibleSkillIds through scopeSkillIds. Per-agent scoping
   silently did not apply on this route. Mirror the same pattern used
   in responses.js so all three routes behave identically.

2. scopeSkillIds previously collapsed undefined and [] into the same
   "full catalog" fallback, making it impossible for a user to express
   "this agent has no skills." Tighten the semantics before any data
   is written under the old behavior:
     - undefined / null = not configured, full catalog
     - []              = explicitly none, returns []
     - non-empty       = intersection with ACL-accessible set
   Update defaultAgentFormValues.skills from [] to undefined so a brand
   new agent whose skills UI was never touched does not accidentally
   persist "explicit none" on first save (removeNullishValues strips
   undefined from the payload server side).

3. Add direct unit tests for scopeSkillIds covering seven cases
   (undefined, null, empty, disjoint, overlap, exact match, empty
   accessible set). 16 tests total in skills.test.ts pass.
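
The undefined vs [] semantics in item 2 can be sketched as below. This is an illustration only: the parameter order and the use of plain string IDs are assumptions, not the actual signature in @librechat/api.

```typescript
// Sketch only: IDs are plain strings here; the real function may use ObjectIds.
function scopeSkillIds(
  accessibleSkillIds: string[],
  agentSkills: string[] | null | undefined,
): string[] {
  // undefined / null: not configured, fall back to the full ACL-accessible set
  if (agentSkills == null) {
    return accessibleSkillIds;
  }
  // []: explicitly none
  if (agentSkills.length === 0) {
    return [];
  }
  // non-empty: intersection with the ACL-accessible set
  const allowed = new Set(agentSkills);
  return accessibleSkillIds.filter((id) => allowed.has(id));
}
```

Keeping `undefined` and `[]` distinct is what lets a saved agent express "no skills" without a separate boolean field.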

* fix: add scopeSkillIds to @librechat/api mock in openai unit test

Same pattern as the earlier responses.unit.spec.js fix: the test mocks
@librechat/api with an explicit object, so each newly imported symbol
must be added to the mock. Without scopeSkillIds, OpenAIChatCompletion
controller throws on destructuring before reaching recordCollectedUsage,
causing the token usage assertions to fail.
2026-04-25 04:02:00 -04:00
Danny Avila
64ec5f18b8 ⚙️ feat: Skill runtime integration: catalog, tools, execution, file priming (#12649)
* feat: Skill runtime integration — catalog injection, tool registration, execute handler

Wires the @librechat/agents SkillTool primitive into LibreChat's agent runtime:

**Enums:**
- Add `skills` to AgentCapabilities + defaultAgentCapabilities

**Data layer:**
- Add `getSkillByName(name, accessibleIds)` — compound query that
  combines name lookup + ACL check in one findOne

**Agent initialization (packages/api/src/agents/initialize.ts):**
- Accept `accessibleSkillIds` param and `listSkillsByAccess` db method
- Query accessible skills, format catalog via `formatSkillCatalog()`,
  append to `additional_instructions` (appears in agent system prompt)
- Register `SkillToolDefinition` + `createSkillTool()` when catalog
  is non-empty (tool appears in model's tool list)
- Store `accessibleSkillIds` and `skillCount` on InitializedAgent

**Execute handler (packages/api/src/agents/handlers.ts):**
- Add `getSkillByName` to `ToolExecuteOptions`
- `handleSkillToolCall()` intercepts `Constants.SKILL_TOOL`:
  extracts skillName, loads body from DB with ACL check,
  substitutes $ARGUMENTS, returns ToolExecuteResult with
  injectedMessages (skill body as isMeta user message)
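
A rough sketch of the $ARGUMENTS substitution step, assuming every literal occurrence of the placeholder is replaced with the raw argument string. `substituteArguments` is a hypothetical name and the exact substitution rules of the real handler are not given in this log.

```typescript
// Hypothetical helper: replaces every literal `$ARGUMENTS` in a skill body.
function substituteArguments(body: string, args: string): string {
  // split/join treats both strings literally, so characters such as `$&`
  // inside `args` are inserted as-is rather than interpreted as
  // replacement patterns (which String.prototype.replace would do for a
  // string replacement value).
  return body.split('$ARGUMENTS').join(args);
}
```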

**Caller wiring:**
- initialize.js: query skill IDs via findAccessibleResources,
  pass to initializeAgent + store on agentToolContexts,
  add getSkillByName to toolExecuteOptions,
  pass accessibleSkillIds through loadTools configurable
- openai.js + responses.js: same pattern for their flows

Requires @librechat/agents >= 3.1.65 (PR #91 exports).

* feat: Skills toggle in tools menu + backend capability gating

Frontend:
- Add skills?: boolean to TEphemeralAgent type
- Add LAST_SKILLS_TOGGLE_ to LocalStorageKeys for persistence
- Add skillsEnabled to useAgentCapabilities hook
- Add skills useToolToggle to BadgeRowContext with localStorage init
- New Skills.tsx badge component (Scroll icon, cyan theme,
  permission-gated via PermissionTypes.SKILLS)
- Add skills entry to ToolsDropdown with toggle + pin
- Render Skills badge in BadgeRow ephemeral section

Backend:
- Extract injectSkillCatalog() into packages/api/src/agents/skills.ts
  (reduces initializeAgent module size, reusable helper)
- initializeAgent delegates to helper instead of inline block
- Capability-gate the findAccessibleResources query:
  - Agents endpoint: checks AgentCapabilities.skills in admin config
  - OpenAI/Responses controllers: checks ephemeralAgent.skills toggle
- ACL query runs once per run, result shared across all agents

* refactor: remove createSkillTool() instance from injectSkillCatalog

SkillTool is event-driven only. The tool definition in toolDefinitions
is sufficient for the LLM to see the tool schema. No tool instance is
needed since the host handler intercepts via ON_TOOL_EXECUTE before
tool.invoke() is ever called.

Removes tools from InjectSkillCatalogParams/Result, drops the
createSkillTool import.

* feat: skill file priming, bash tool, and invoked skills state

Multi-file skill support:
- New primeSkillFiles() helper (packages/api/src/agents/skillFiles.ts)
  uploads skill files + SKILL.md body to code execution environment
- handleSkillToolCall primes files on invocation when skill.fileCount > 0,
  returns session info as artifact so ToolNode stores the session
- Skill-primed files available to subsequent bash/code tool calls

Bash tool auto-registration:
- BashExecutionToolDefinition added alongside SkillToolDefinition when
  skills are enabled, giving the model a bash tool for running scripts

Conversation state:
- Add invokedSkillIds field to conversation schema (Mongoose + Zod)
- handleSkillToolCall updates conversation with $addToSet on success
- Enables re-priming skill files on subsequent runs (future)

Dependency wiring:
- Pass listSkillFiles, getStrategyFunctions, uploadCodeEnvFile,
  updateConversation through ToolExecuteOptions
- Pass req and codeApiKey through mergedConfigurable
- All three controller entry points wired (initialize.js, openai.js,
  responses.js)

* fix: load bash_tool instance in loadToolsForExecution, remove file listing

- Add createBashExecutionTool to loadToolsForExecution alongside PTC/ToolSearch
  pattern: loads CODE_API_KEY, creates bash tool instance on demand
- Add BASH_TOOL and SKILL_TOOL to specialToolNames set so they don't go
  through the generic loadTools path (bash is created here, skill is
  intercepted in handler before tool.invoke)
- Remove file name listing from skill content text — it's the skill
  author's responsibility to disclose files in SKILL.md, not the framework

* feat: batch upload for skill files, replace sequential uploads

- Add batchUploadCodeEnvFiles() to crud.js: single POST to /upload/batch
  with all files in one multipart request, returns shared session_id
- Rewrite primeSkillFiles to collect all streams (SKILL.md + bundled files)
  then do one batch upload instead of N sequential uploads
- Replace uploadCodeEnvFile with batchUploadCodeEnvFiles across all callers
  (handlers.ts, initialize.js, openai.js, responses.js)

* refactor: remove invokedSkillIds from conversation schema

Skills aren't re-loaded between runs, so conversation-level state for
invoked skills doesn't help. Skill state will live on messages instead
(like tool_search discoveredTools and summaries), enabling in-place
re-injection on follow-up runs.

Removes invokedSkillIds from: convo Mongoose schema, IConversation
interface, Zod schema, ToolExecuteOptions.updateConversation, and
all three caller wiring points.

* feat: smart skill file re-priming with session freshness checking

Schema:
- Add codeEnvIdentifier field to ISkillFile (type + Mongoose schema)
- Add updateSkillFileCodeEnvIds batch method (uses tenantSafeBulkWrite)
- Export checkIfActive from Code/process.js

Extraction:
- Add extractInvokedSkillsFromHistory() to run.ts — scans message
  history for AIMessage tool_calls where name === 'skill', extracts
  skillName args. Follows same pattern as extractDiscoveredToolsFromHistory.

Smart re-priming in primeSkillFiles:
- Before batch uploading, checks if existing codeEnvIdentifiers are
  still active via getSessionInfo + checkIfActive (23h threshold)
- If session is still active, returns cached references (zero uploads)
- If stale or missing, batch-uploads everything and persists new
  identifiers on SkillFile documents (fire-and-forget)
- Single session check covers all files (batch shares one session_id)

Wiring:
- Pass getSessionInfo, checkIfActive, updateSkillFileCodeEnvIds
  through ToolExecuteOptions and all three controller entry points

* feat: wire skill file re-priming at run start via initialSessions

Flow:
1. initialize.js creates primeInvokedSkills callback with all deps
2. client.js calls it with message history before createRun
3. extractInvokedSkillsFromHistory scans for skill tool calls
4. For each invoked skill with files, primeSkillFiles uploads/checks
5. Returns initialSessions map passed to createRun
6. createRun passes initialSessions to Run.create (via RunConfig)
7. Run constructor seeds Graph.sessions, making skill files available
   to subsequent bash/code tool calls via ToolNode session injection

Requires @librechat/agents with initialSessions on RunConfig (PR #94).

* refactor: use CODE_EXECUTION_TOOLS set for code tool checks

Import CODE_EXECUTION_TOOLS from @librechat/agents and replace inline
constant checks in handlers.ts and callbacks.js. Fixes missing bash
tool coverage in the session context injection (handlers.ts) and code
output processing (callbacks.js).

* refactor: move primeInvokedSkills to packages/api, add skill body re-injection

Moves primeInvokedSkills from an inline closure in initialize.js (with
dynamic requires) to a proper exported function in packages/api
skillFiles.ts with explicit typed dependencies.

Key changes:
- primeInvokedSkills now returns both initialSessions (for file priming)
  AND injectedMessages (skill bodies for context continuity)
- createRun accepts invokedSkillMessages and appends skill bodies to
  systemContent so the model retains skill instructions across runs
- initialize.js calls the packaged function with all deps passed explicitly
- client.js passes both initialSessions and injectedMessages to createRun

* fix: move dynamic requires to top-level module imports

Move primeInvokedSkills, getStrategyFunctions, batchUploadCodeEnvFiles,
getSessionInfo, and checkIfActive from inline requires to top-level
module requires where they belong.

* refactor: skill body reconstruction via formatAgentMessages, not systemContent

Replaces the lazy systemContent approach with proper message-level
reconstruction:

SDK (formatAgentMessages):
- New invokedSkillBodies param (Map<string, string>)
- Reconstructs HumanMessages after skill ToolMessages at the correct
  position in the message sequence, matching where ToolNode originally
  injected them

LibreChat:
- extractInvokedSkillsFromPayload replaces extractInvokedSkillsFromHistory
  (works with raw TPayload before formatAgentMessages, not BaseMessage[])
- primeInvokedSkills now takes payload instead of messages, returns
  skillBodies Map instead of injectedMessages
- client.js calls primeInvokedSkills BEFORE formatAgentMessages, passes
  skillBodies through as the 4th param
- Removed invokedSkillMessages from createRun (no more systemContent hack)
- Single-pass: skill detection happens inside formatAgentMessages' existing
  tool_call processing loop, zero extra message iterations

* refactor: rename skillBodies to skills for consistency with SDK param

* refactor: move auth loading into primeInvokedSkills, pass loadAuthValues as dep

The payload/accessibleSkillIds guard and CODE_API_KEY loading now live
inside primeInvokedSkills (packages/api) rather than in the CJS caller.
initialize.js passes loadAuthValues as a dependency and the callback
is only created when skillsCapabilityEnabled.

* feat: ReadFile tool + conditional bash registration + skill path namespacing

ReadFile tool (read_file):
- General-purpose file reader, event-driven (ON_TOOL_EXECUTE)
- Schema: { file_path: string } — "{skillName}/{path}" convention
- handleReadFileCall: resolves skill name from path, ACL check, reads
  from DB cache or storage, binary detection, size limits (256KB),
  lazy caching (512KB), line numbers in output
- SKILL.md special case: reads skill.body directly
- Dispatched alongside SKILL_TOOL in createToolExecuteHandler
- Added to specialToolNames in ToolService

Conditional tool registration:
- ReadFile + SkillTool: always registered when skills enabled
- BashTool: only registered when codeEnvAvailable === true
- codeEnvAvailable passed through InitializeAgentParams from caller

Skill file path namespacing:
- primeSkillFiles now uploads as "{skillName}/SKILL.md" and
  "{skillName}/{relativePath}" instead of flat names
- Prevents file collisions when multiple skills are invoked

Wiring:
- getSkillFileByPath + updateSkillFileContent passed through
  ToolExecuteOptions in all three callers

* feat: return images/PDFs as artifacts from read_file, tighten caching

Binary artifact support:
- Images (png, jpeg, gif, webp) returned as base64 in artifact.content
  with type: 'image_url', processed by existing callback attachment flow
- PDFs returned as base64 artifact similarly
- Binary size limit: 10MB (MAX_BINARY_BYTES)
- Other binary files still return metadata + bash fallback

Caching:
- Text cached only on first read (file.content == null check)
- Binary flag cached only on first detection (file.isBinary == null)
- Skill files are immutable; no redundant cache writes

Registration:
- ReadFileToolDefinition now includes responseFormat: 'content_and_artifact'

* chore: update @librechat/agents to version 3.1.66-dev.0 and add peer dependencies in package-lock.json and package.json files

* fix: resolve review findings #1,#2,#4,#5,#6,#10,#13

Critical:
- #1: primeInvokedSkills now accumulates files across all skills into
  one session entry instead of overwriting. Parallel processing via
  Promise.allSettled.
- #2: codeEnvAvailable now computed and passed in openai.js and
  responses.js (was missing, bash tool never registered in those flows)

Major:
- #4: relativePath in updateSkillFileCodeEnvIds now strips the
  {skillName}/ prefix to match SkillFile documents. SKILL.md filter
  uses endsWith instead of exact match.
- #5: File priming guarded on apiKey being non-empty (skip when not
  configured instead of failing with auth error)
- #6: Skills processed in parallel via Promise.allSettled instead of
  sequential for-of loop

Minor:
- #10: Use top-level imports in initialize.js instead of inline requires
- #13: Log warning when skill catalog reaches the 100-skill limit

* fix: resolve follow-up review findings N1,N2,N4

N1 (CRITICAL): Wire skill deps into responses.js non-streaming path.
Was completely missing getSkillByName, file strategy functions, etc.

N2 (MAJOR): Single batch upload for ALL skills' files. Resolves skills
in parallel (Phase 1), then collects all file streams across skills
and does ONE batchUploadCodeEnvFiles call (Phase 2). All files share
one session_id, eliminating cross-session isolation issues.

N4 (MINOR): Move inline require() to top-level in openai.js and
responses.js, consistent with initialize.js.

* fix: add mocks for new file strategy imports in controller tests

* fix: restore session freshness check, parallelize file lookups, add warnings

R1: Re-add session freshness check before batch upload. Checks any
existing codeEnvIdentifier via getSessionInfo + checkIfActive. If the
session is still active (23h window), returns cached file references
with zero re-uploads.

R2: listSkillFiles calls parallelized via Promise.all (were sequential
in the for-of loop).

R3: Log warning when skill record lookup fails during identifier
persistence (was a silent empty-string fallback).

* fix: guard freshness cache on single-session consistency

* fix: multi-session freshness check (code env handles mixed sessions natively)

The code execution environment fetches each file by its own
{session_id, fileId} pair independently — no single-session
requirement. Removed the sessionIds.size === 1 guard.

Now checks ALL distinct sessions for freshness. If every session
is still active (23h window), returns cached references with per-file
session_ids preserved. If any session expired, falls through to
re-upload everything in a single batch.

* perf: parallelize session freshness checks via Promise.all
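
The multi-session freshness check, with the probes issued in parallel, could look roughly like this. `CachedFile`, `allSessionsFresh`, and the `isSessionActive` probe are hypothetical stand-ins for the getSessionInfo + checkIfActive pair; treating an empty file list as a cache miss is also an assumption.

```typescript
interface CachedFile {
  id: string;
  session_id: string;
  name: string;
}

async function allSessionsFresh(
  files: CachedFile[],
  isSessionActive: (sessionId: string) => Promise<boolean>, // hypothetical probe
): Promise<boolean> {
  // One probe per distinct session, not one per file.
  const sessionIds = Array.from(new Set(files.map((file) => file.session_id)));
  if (sessionIds.length === 0) {
    return false; // nothing cached: treat as a miss
  }
  const results = await Promise.all(sessionIds.map((sid) => isSessionActive(sid)));
  // A single expired session forces a full re-upload.
  return results.every((active) => active);
}
```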

* fix: add optional chaining for session info retrieval in primeInvokedSkills

Updated the primeInvokedSkills function to use optional chaining for getSessionInfo and checkIfActive methods, ensuring safer access and preventing potential runtime errors when these methods are undefined.

* fix: address review findings #1-#9 + Codex P1/P2 + session probe

Critical:
- #1/Codex P1: Add codeApiKey loading to openai.js and responses.js
  loadTools configurable (was missing, file priming broken in 2/3 paths)
- Codex P1: Fix cached file name prefix in primeSkillFiles cache path
  (was sf.relativePath, now ${skill.name}/${sf.relativePath})

Major:
- Codex P2: Honor ephemeral skills toggle in agents endpoint
  (check ephemeralAgent?.skills !== false alongside admin capability)
- #4: Early size check using file.bytes from DB before streaming
  (prevents full-file buffer for oversized files)

Minor:
- #5: Replace Record<string, any> with Record<string, boolean | string>
- #6: Localize Pin/Unpin aria-labels with com_ui_pin/com_ui_unpin
- #8: Parallelize stream acquisition in primeSkillFiles via
  Promise.allSettled
- #9: Log warning for partial batch upload failures with filenames

Performance:
- Session probe optimization: getSessionInfo now hits per-object
  endpoint (GET /sessions/{sid}/objects/{fid}) instead of listing
  entire session (GET /files/{sid}?detail=summary). O(1) stat vs
  O(N) list + linear scan.

* refactor: extract shared skill wiring helper + add unit tests

DRY (#3):
- New skillDeps.js exports getSkillToolDeps() with all 9 skill-related
  deps (getSkillByName, listSkillFiles, getStrategyFunctions, etc.)
- Replaces 5 identical copy-paste blocks across initialize.js, openai.js,
  responses.js (streaming + non-streaming paths)
- One place to maintain when skill deps change

Tests (#2):
- 8 unit tests for extractInvokedSkillsFromPayload covering:
  string args, object args, missing skill tool_calls, non-assistant
  messages, malformed JSON, empty skillName, empty payload, dedup
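
The cases listed above suggest an extraction loop along these lines. The payload and tool-call shapes here are simplified assumptions (the real TPayload type is richer), and `extractInvokedSkills` is a hypothetical name for the sketch.

```typescript
// Simplified, assumed shapes for illustration only.
interface ToolCallLike {
  name: string;
  args: string | Record<string, unknown>;
}

interface PayloadMessageLike {
  role: string;
  tool_calls?: ToolCallLike[];
}

function extractInvokedSkills(payload: PayloadMessageLike[]): Set<string> {
  const names = new Set<string>(); // a Set deduplicates repeated invocations
  for (const message of payload) {
    if (message.role !== 'assistant') {
      continue;
    }
    for (const call of message.tool_calls ?? []) {
      if (call.name !== 'skill') {
        continue;
      }
      let args: unknown = call.args;
      if (typeof args === 'string') {
        try {
          args = JSON.parse(args);
        } catch {
          continue; // malformed JSON: skip this call
        }
      }
      if (typeof args !== 'object' || args === null) {
        continue;
      }
      const skillName = (args as Record<string, unknown>).skillName;
      if (typeof skillName === 'string' && skillName.length > 0) {
        names.add(skillName);
      }
    }
  }
  return names;
}
```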

* fix: remove @jest/globals import, use global jest env

* fix: resolve round 2 review findings R2-1 through R2-7

R2-1 (toggle semantics): openai.js + responses.js now check admin
  capability (AgentCapabilities.skills) alongside ephemeral toggle.
  Aligns with initialize.js.

R2-2 (swallowed error): primeInvokedSkills now logs
  updateSkillFileCodeEnvIds failures (was .catch(() => {}))

R2-4 (test cast): Record<string, string> → Record<string, unknown>

R2-5 (DRY regression): Extract enrichWithSkillConfigurable() into
  skillDeps.js. Replaces 4 identical loadAuthValues blocks.
  Each loadTools callback is now a one-liner. JSDoc added (R2-6).

R2-7 (sequential streams): primeInvokedSkills now uses
  Promise.allSettled for parallel stream acquisition.

* fix: require explicit skills toggle + treat partial cache as miss

- initialize.js: change ephemeralSkillsToggle !== false to === true
  (unset toggle no longer enables skills)
- primeSkillFiles cache: require ALL files to have codeEnvIdentifier
  before using cache (partial persistence = cache miss = re-upload)
- primeInvokedSkills cache: same check (allFilesWithIds.length must
  equal total file count)

* fix: pass entity_id=skillId on batch upload, eliminates per-user cache thrashing

primeSkillFiles now passes entity_id: skill._id.toString() to
batchUploadCodeEnvFiles. This scopes the code env session to the
skill, not the user. All users sharing a skill share the same
uploaded files — no more cache thrashing from overwriting each
other's codeEnvIdentifier.

The stored codeEnvIdentifier now includes ?entity_id= suffix so
freshness checks pass the entity_id through to the per-object
stat endpoint. Both primeSkillFiles and primeInvokedSkills
store consistent identifier formats.

* fix: pass entity_id on multi-skill batch upload, consistent identifier format

* Revert "fix: pass entity_id on multi-skill batch upload, consistent identifier format"

This reverts commit c85ce2161e.

* refactor: per-skill upload in primeInvokedSkills, eliminate multi-skill batch

Replace the monolithic multi-skill batch upload with per-skill
primeSkillFiles calls. Each skill gets its own session with
entity_id=skillId, ensuring:

- Correct session auth (entity_id matches on freshness checks)
- Per-skill freshness caching (only expired skills re-upload)
- Shared skill sessions work across users (same entity_id=skillId)
- Code env handles mixed session_ids natively

The big batch block (stream collection, single upload, identifier
mapping) is replaced by a simple loop over primeSkillFiles, which
already handles freshness caching, batch upload, and identifier
persistence per-skill.

* fix: resolve review findings #1,#3-5,#7,#9-11

Critical:
- #1: Strip ?entity_id= query string before splitting codeEnvIdentifier
  into session_id/fileId (was corrupting cached file IDs in 4 locations)

Major:
- #4: Parallelize per-skill primeSkillFiles via Promise.allSettled
- #5: Add logger.warn to all empty .catch(() => {}) on cache writes

Minor:
- #7: Add logger.debug to enrichWithSkillConfigurable catch block
- #9: Use error instanceof Error guard in batchUploadCodeEnvFiles
- #10: Move enrichWithSkillConfigurable to TypeScript in packages/api
  (skillConfigurable.ts), skillDeps.js wraps with loadAuthValues dep
- #11: Reduce MAX_BINARY_BYTES from 10MB to 5MB (~11.5MB peak with b64)
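
For finding #1, a minimal sketch of stripping the query string before splitting. It assumes the stored identifier has the form `session_id/fileId` with an optional `?entity_id=` suffix; `parseCodeEnvIdentifier` and `ParsedIdentifier` are hypothetical names.

```typescript
interface ParsedIdentifier {
  sessionId: string;
  fileId: string;
}

// Assumed format: "<session_id>/<fileId>" with an optional "?entity_id=..." suffix.
function parseCodeEnvIdentifier(identifier: string): ParsedIdentifier | null {
  // Drop the query string first; otherwise it ends up glued to the file ID.
  const queryStart = identifier.indexOf('?');
  const path = queryStart === -1 ? identifier : identifier.slice(0, queryStart);

  const parts = path.split('/');
  const sessionId = parts[0];
  const fileId = parts[1];
  if (parts.length !== 2 || !sessionId || !fileId) {
    return null; // unexpected shape: let the caller treat it as a cache miss
  }
  return { sessionId, fileId };
}
```

Splitting the raw identifier on `/` alone would yield a file ID of the form `fileId?entity_id=...`, which is the corruption described above.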

* fix: forward entity_id in session probe + always register bash tool

Codex P2 (entity_id in probe): getSessionInfo now preserves and
forwards query params (including entity_id) to the per-object stat
endpoint. Without this, identifiers stored as ...?entity_id=... would
fail auth checks because the entity_id scope was dropped.

Codex P2 (bash tool availability): Remove codeEnvAvailable gate from
injectSkillCatalog. Bash tool definition is now always registered when
skills are enabled. Actual tool instance creation still happens at
execution time in loadToolsForExecution (which loads per-user
credentials). This ensures users with per-user CODE_API_KEY get
bash without requiring a global env var at init time.

Removes codeEnvAvailable from InjectSkillCatalogParams,
InitializeAgentParams, and all three controller entry points.

* fix: add debug logging to primeInvokedSkills catch, rename export alias

* fix: stub bash tool when no key + remove PDF artifact path

Codex P1 (bash tool): When CODE_API_KEY is unavailable, create a stub
tool that returns "Code execution is not available. Use read_file
instead." This prevents "tool not found" errors from the model
repeatedly calling bash_tool in no-code-env deployments while still
registering the definition for per-user credential users.

Codex P2 (PDF artifacts): Remove PDF image_url artifact path. The
host artifact pipeline processes image_url via saveBase64Image which
fails for PDFs. PDFs now fall through to the generic binary handler
("Use bash to process"). TODO comment for future document artifact
support.

Also: isImageOrPdf → isImage in early size checks (PDFs are no
longer treated as artifact candidates).

* fix: remove dead PDF_MIME constant, hoist skillToolDeps, document session_id

- #7: Remove unused PDF_MIME constant (dead code after PDF artifact removal)
- #11: Hoist skillToolDeps to module-level constant (avoid per-call allocation)
- #6: Document that CodeSessionContext.session_id is a representative value;
  ToolNode uses per-file session_id from the files array

* fix: call toolEndCallback for skill/read_file artifacts + clear codeEnvIdentifier on re-upload

Codex P1 (toolEndCallback bypass): skill and read_file handler branches
returned early, bypassing the toolEndCallback that processes artifacts
(image attachments). Now calls toolEndCallback when the result has an
artifact, using the same metadata pattern as the normal tool.invoke path.

Codex P1 (stale identifiers): upsertSkillFile now $unset's
codeEnvIdentifier alongside content and isBinary when a file is
re-uploaded. Prevents the freshness cache from returning references
to old file content after a skill file is replaced.

* fix: add session_id comment at cached path, rename skillResult to handlerResult

* fix: return content_and_artifact from bash stub so result.content is populated

* fix: deterministic skill lookup, dedup warning, and multi-session freshness check

- getSkillByName: add sort({updatedAt:-1}) so name collisions resolve
  deterministically to the most recently updated skill
- injectSkillCatalog: warn when multiple accessible skills share a name
- primeSkillFiles: check ALL distinct sessions for freshness, not just
  the first file's session, preventing stale refs after partial bulkWrite

* refactor: update icon import in Skills component

- Replaced the Scroll icon with ScrollText in the Skills component for improved clarity and consistency in the UI.

* fix: SKILL.md cache parity, gate bash_tool on code env, fix read_file too-large message

- primeSkillFiles: filter SKILL.md from returned files array on fresh
  upload so cached and non-cached paths return identical file sets
  (SKILL.md is still on disk in the session for bash access)
- injectSkillCatalog: only register bash_tool when codeEnvAvailable is
  true; thread the flag from all three CJS callers via execute_code
  capability check
- handleReadFileCall: tell the model to invoke the skill first before
  suggesting /mnt/data paths for oversized files

* fix: use EnvVar constant, deduplicate auth lookup, validate batch upload, stream byte limit

- Replace hardcoded 'LIBRECHAT_CODE_API_KEY' with EnvVar.CODE_API_KEY
  in skillConfigurable.ts and skillFiles.ts
- Resolve code API key once at run start in initialize.js and pass to
  both primeInvokedSkills and enrichWithSkillConfigurable via optional
  preResolvedCodeApiKey param, eliminating redundant loadAuthValues calls
- Add response structure validation in batchUploadCodeEnvFiles before
  accessing session_id/files to surface unexpected responses early
- Add streaming byte counter in handleReadFileCall that aborts and
  destroys the stream when accumulated bytes exceed MAX_BINARY_BYTES,
  preventing full file buffering when DB metadata is inaccurate
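
The streaming byte counter can be sketched as below. `ByteStream` is a structural stand-in for a Node readable stream and `readWithByteLimit` is a hypothetical name; the real handleReadFileCall is not shown in this log.

```typescript
// Structural stand-in for a readable stream: async-iterable and destroyable.
interface ByteStream extends AsyncIterable<Uint8Array> {
  destroy(): void;
}

async function readWithByteLimit(
  stream: ByteStream,
  maxBytes: number,
): Promise<Uint8Array | null> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of stream) {
    total += chunk.byteLength;
    if (total > maxBytes) {
      // Abort as soon as the limit is crossed instead of buffering the rest.
      stream.destroy();
      return null;
    }
    chunks.push(chunk);
  }
  // Concatenate the collected chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

Counting bytes as they arrive means the limit holds even when the size recorded in the database is wrong.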

* refactor: update icon import in ToolsDropdown component

- Replaced the Scroll icon with ScrollText in the ToolsDropdown component for improved clarity and consistency in the UI.

* fix: partial upload failure detection, EnvVar in initialize.js, declaration ordering

- primeSkillFiles: return null (failure) when batch upload partially
  succeeds — missing bundled files would cause runtime bash/read
  failures with missing paths in code env
- initialize.js: replace hardcoded 'LIBRECHAT_CODE_API_KEY' with
  EnvVar.CODE_API_KEY imported from @librechat/agents
- initialize.js: move enabledCapabilities, accessibleSkillIds, and
  codeApiKey declarations before the toolExecuteOptions closure that
  references them (the closure no longer depends on running only after
  the later declarations have left the temporal dead zone)
2026-04-25 04:02:00 -04:00
Danny Avila
8e866e6010
🗺️ fix: Resolve Custom-Endpoint Providers for Summarization (#12739)
* 🔧 fix: resolve custom-endpoint providers for summarization

When `summarization.provider` in `librechat.yaml` is set to a custom-endpoint
name (e.g. `Ollama`), the string was passed verbatim to the agents SDK, which
only knows a fixed set of provider names and threw
`Unsupported LLM provider: Ollama`.

Before shaping the summarization config for the SDK, resolve the provider
through `getProviderConfig`: custom-endpoint labels are remapped to the
underlying SDK provider (e.g. `openAI`) and the endpoint's baseURL/apiKey are
injected into `parameters` so the summarization model reaches the right
backend, even when summarization targets a different custom endpoint than the
main agent.

Unknown names and names that appear with no matching endpoint flow through
unchanged so the SDK can surface a clear error. User-provided credentials and
unresolved env-var references are skipped rather than forwarded, letting the
SDK's self-summarize path reuse the agent's own clientOptions.

Ref: LibreChat Discussion #12614

* address: widen unresolved-env-var guard, fix test naming

- Reject summarization overrides when the extracted baseURL/apiKey still
  contains any `${...}` placeholder, including prefix/suffix patterns like
  `https://${UNSET}.example.com` that `envVarRegex` (exact-match) missed.
- Rename the "case-insensitive" test to reflect that only `Ollama` is
  normalized via `normalizeEndpointName`; add coverage proving other
  custom-endpoint names match case-sensitively.
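
To illustrate the widened guard: the actual definition of `envVarRegex` is not shown here, so `exactEnvVar` below is an assumed stand-in for an exact-match pattern, and `hasUnresolvedPlaceholder` is a hypothetical name.

```ts
// Assumed shape of an exact-match pattern: the whole string must be `${NAME}`.
const exactEnvVar = /^\$\{(.+)\}$/;

// Hypothetical wider guard: any `${...}` placeholder anywhere in the value.
function hasUnresolvedPlaceholder(value: string): boolean {
  return /\$\{[^}]+\}/.test(value);
}

const url = 'https://${UNSET}.example.com';
// The prefix and suffix defeat the anchors, so the exact match misses it.
const exactMatch = exactEnvVar.test(url);
// The wider guard still catches the embedded placeholder.
const widened = hasUnresolvedPlaceholder(url);
```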

* address: use req.config in responses.js; forward full endpoint options

- `responses.js` relied on a module-level `appConfig` set via `setAppConfig`,
  which is never called anywhere. Use `req.config` directly so the
  summarization provider resolver actually runs on the responses route.
- Route the custom endpoint config through `getOpenAIConfig` so summarization
  inherits the same `headers`, `defaultQuery`, `addParams`/`dropParams`, and
  `customParams` transforms (Anthropic/Google/etc.) that `initializeCustom`
  applies for the main agent flow. Strip the stale `model`/`modelName`
  defaults so `summarization.model` still wins.

* address: skip overrides when summarization matches agent endpoint

When `summarization.provider` resolves to the same custom endpoint as the
main agent, rely on the SDK's self-summarize path (which reuses
`agentContext.clientOptions` unchanged) rather than injecting overrides.
Otherwise the shallow spread of `clientOverrides.configuration` would
replace the agent's request-resolved state (dynamic headers, proxy/fetch
options) with yaml-only config.

Only applies when summarization targets a *different* endpoint from the
agent; the yaml config is all we have in that case, so overrides still
flow through.

* address: preserve raw provider when overrides cannot be built

When summarization points at a different custom endpoint than the agent
and we can't resolve the endpoint's credentials (user_provided, or a
still-unresolved `${VAR}` after env extraction), remapping to `openAI`
without overrides would silently route summaries to the default OpenAI
client. Preserve the raw provider name so the SDK raises a clear
"Unsupported LLM provider" error (now also logged, via the agents SDK
defense-in-depth fix) instead of sending traffic to the wrong backend.

* address: resolve endpoint headers and forward PROXY to summarization

- Custom-endpoint `headers` now flow through `resolveHeaders` before
  reaching `getOpenAIConfig`, matching the main agent path. This ensures
  templated values like `${PORTKEY_API_KEY}` or `{{LIBRECHAT_BODY_...}}`
  are substituted for summarization requests instead of being forwarded
  literally.
- `PROXY` env var is now passed into `getOpenAIConfig` so cross-endpoint
  summarization honors outbound proxy dispatchers configured for the rest
  of the deployment.

* address: user summarization parameters win over endpoint defaults

Flip the merge order so `summarization.parameters` from yaml override
`clientOverrides` defaults (which come from `getOpenAIConfig` and always
include `streaming: true` etc.). A user who sets `parameters.streaming:
false` in their config should still see non-streaming summarization for
providers that require it.

* address: review feedback (logging, dead code, DRY, types, deep-merge)

- Log error in the resolveSummarizationProvider catch-all so programming
  bugs in getProviderConfig/getOpenAIConfig/resolveHeaders surface in
  operator logs instead of falling through silently.
- Drop dead `setAppConfig`/`appConfig` infrastructure in responses.js and
  fix adjacent `allowedProviders` reference that also relied on the
  never-initialized module-level appConfig. Uses `req.config` directly.
- Import canonical `normalizeEndpointName` from librechat-data-provider
  instead of duplicating it locally.
- Replace `SummarizationClientOverrides = Record<string, unknown>` with
  an explicit interface covering the known fields.
- Deep-merge `configuration` when user-supplied `summarization.parameters.
  configuration` overlaps the resolved endpoint configuration, so user
  additions (e.g. `defaultQuery`) don't wipe out `baseURL`/`defaultHeaders`.
- Wrap `process.env` mutations in test in `try/finally` so a failed
  assertion doesn't leak env state into subsequent tests.
- Drop `as unknown as AppConfig` in test helper; fixture now matches the
  `AppConfig` shape directly using a `Partial<TEndpoint>` union.
- Trim JSDoc that restated the name it was attached to.
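
A rough sketch of the merge semantics described above (user parameters win, `configuration` merged one level deep). The `Overrides` shape, the function name, and the sample values are assumptions for illustration, not the actual implementation:

```ts
interface Overrides {
  configuration?: Record<string, unknown>;
  [key: string]: unknown;
}

// Hypothetical merge: user parameters win at the top level, and
// `configuration` is merged one level deep instead of being replaced.
function mergeSummarizationParameters(resolved: Overrides, user: Overrides): Overrides {
  const merged: Overrides = { ...resolved, ...user };
  if (resolved.configuration || user.configuration) {
    merged.configuration = { ...resolved.configuration, ...user.configuration };
  }
  return merged;
}

const merged = mergeSummarizationParameters(
  { streaming: true, configuration: { baseURL: 'http://localhost:11434/v1' } },
  { streaming: false, configuration: { defaultQuery: { v: '2' } } },
);
// `merged.streaming` is the user's `false`; `merged.configuration` keeps
// `baseURL` and gains `defaultQuery`.
```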

* address: review nits — import order, local test type, conflict test

- Move `import { logger }` up into the package value-imports section so
  it no longer sits between `import type` blocks.
- Replace `as unknown as SummarizationConfig['parameters']` in the
  deep-merge test with a named `TestSummarizationParameters` type and a
  single narrowing cast at the call site, making intent explicit.
- Add a test proving that user-supplied `configuration.baseURL` wins
  over the resolved endpoint baseURL, locking in the deep-merge's
  user-wins-on-conflict semantics that the previous suite only exercised
  additively.
2026-04-20 12:00:46 -04:00
Danny Avila
d2cbd551b7
🤝 fix: Load Handoff Agents for Agents API (#12740)
* 🤝 fix: load handoff sub-agents on OpenAI-compat endpoints (#12726)

Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents
into a shared `discoverConnectedAgents` helper in `@librechat/api` and
wires it into the OpenAI-compatible `/v1/chat/completions` and Open
Responses `/v1/responses` controllers. These endpoints previously only
passed the primary agent config to `createRun` while keeping
`primaryConfig.edges` intact, which forced `MultiAgentGraph` into
multi-agent mode without loading the referenced sub-agents and caused
StateGraph to throw "Found edge ending at unknown node <id>".

The discovery helper also filters orphaned edges (deleted sub-agents or
those the caller lacks VIEW permission on), so API users see the same
graceful fallback the chat UI already had.

* 🧪 fix: use ServerRequest in discovery spec helpers

CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test
helpers typed `req` as `express.Request`, which is not assignable to
`DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose
`user` is `IUser`). Local jest passed because ts-jest is transpile-only,
but the CI typecheck uses the full compiler.

* 🪲 fix: drop orphan edges on both endpoints, not just `to`

Addresses the P1 codex finding on #12740: `filterOrphanedEdges`
previously only removed edges whose `to` referenced a skipped agent.
Edges whose `from` was a skipped agent — the symmetric case in a
bidirectional graph like `A <-> B` where `B` is deleted or the user
lacks VIEW on it — leaked through to `createRun` and re-triggered
`Found edge ending at unknown node <id>` at StateGraph compile time.

The filter now drops an edge if either endpoint references a skipped
id, and the existing `to`-only test cases were updated to reflect the
stricter behavior. Adds a bidirectional-graph regression test in
`discovery.spec.ts`.
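
A sketch of the stricter filter at this point in the PR, assuming edges carry `from`/`to` as a single id or an array of ids. The `Edge` type and function name are illustrative only:

```ts
type Edge = { from: string | string[]; to: string | string[] };

const toArray = (value: string | string[]): string[] =>
  Array.isArray(value) ? value : [value];

// Hypothetical version of the filter: drop an edge when either its `from`
// or its `to` side references a skipped agent id.
function dropEdgesTouchingSkipped(edges: Edge[], skipped: Set<string>): Edge[] {
  return edges.filter(
    (edge) =>
      !toArray(edge.from).some((id) => skipped.has(id)) &&
      !toArray(edge.to).some((id) => skipped.has(id)),
  );
}

// Bidirectional A <-> B with B skipped: both edges go, not just A -> B.
const remaining = dropEdgesTouchingSkipped(
  [
    { from: 'A', to: 'B' },
    { from: 'B', to: 'A' },
    { from: 'A', to: 'C' },
  ],
  new Set(['B']),
);
```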

* 🔒 fix: enforce REMOTE_AGENT ACL on handoff sub-agents for API routes

Addresses the second P1 codex finding on #12740: the OpenAI-compat
`/v1/chat/completions` and Open Responses `/v1/responses` routes gate
the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`),
but `discoverConnectedAgents` was checking handoff sub-agents against
the looser in-app `AGENT` resource type. That allowed a remote caller
who could reach the orchestrator but had only in-app visibility on a
sub-agent to invoke it via the API — bypassing the remote-sharing
boundary.

Adds an optional `resourceType` param to `discoverConnectedAgents`
(defaulting to `AGENT` for the chat UI path) and passes
`ResourceType.REMOTE_AGENT` from both API controllers so every
discovered sub-agent clears the same sharing boundary enforced at
route entry.

* 🧯 fix: enforce allowedProviders for discovered sub-agents

Addresses the third P1 codex finding on #12740: `discoverConnectedAgents`
forwarded the caller's `endpointOption` verbatim into `initializeAgent`,
but on the OpenAI-compat routes that option's `endpoint` is the primary
agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only
enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)`
is true, so handoff sub-agents silently bypassed the provider allowlist
configured under `endpoints.agents.allowedProviders`.

Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every
per-sub-agent init call. The primary agent still uses the caller's
endpointOption as before — this only affects the BFS-loaded handoff
targets. Regression test asserts the override.

* ✂️ fix: prune unreachable sub-agents after orphan-edge filtering

Addresses the fourth P1 codex finding on #12740: BFS eagerly initializes
every sub-agent referenced in the primary's edge scan, but once
`filterOrphanedEdges` drops edges whose endpoints were skipped, some of
those sub-agents end up disconnected from the primary. In an `A -> B ->
C` graph (edges stored directly on A) where B is skipped (missing or
no VIEW), both edges are filtered, but C was already loaded and would
still be passed to `createRun` — which flips into multi-agent mode on
`agents.length > 1` and turns C into an unintended parallel start node.

After filtering edges, compute the set of agent ids reachable from the
primary through the surviving edge set and prune `agentConfigs` to that
set. Two regression tests added: one for the pruning case, one that
confirms agents connected via surviving edges are still kept.
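
The prune step can be sketched as a breadth-first walk seeded from the primary only. The `Edge` shape and helper names are assumptions, and later commits in this PR revise the reachability rule:

```ts
type Edge = { from: string | string[]; to: string | string[] };

const toArray = (value: string | string[]): string[] =>
  Array.isArray(value) ? value : [value];

// Breadth-first walk from the primary over the surviving edges.
function reachableFrom(primaryId: string, edges: Edge[]): Set<string> {
  const reachable = new Set<string>([primaryId]);
  const queue: string[] = [primaryId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of edges) {
      if (!toArray(edge.from).includes(current)) continue;
      for (const next of toArray(edge.to)) {
        if (!reachable.has(next)) {
          reachable.add(next);
          queue.push(next);
        }
      }
    }
  }
  return reachable;
}

// Keep only the agent configs whose id is in the reachable set.
function pruneUnreachable<T>(agentConfigs: Map<string, T>, reachable: Set<string>): Map<string, T> {
  return new Map(Array.from(agentConfigs).filter(([id]) => reachable.has(id)));
}
```

For `A -> B -> C` with B skipped, no edges survive, so the reachable set is just `A` and C is pruned.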

* 🔁 fix: don't seed initialize.js agentConfigs from the pre-pruning callback

Addresses the fifth P1 codex finding on #12740: `onAgentInitialized`
fires during BFS, BEFORE the helper prunes agents that become
disconnected once `filterOrphanedEdges` runs. Writing the sub-agent
straight into the outer `agentConfigs` there and then only additively
merging the pruned `discoveredConfigs` left stranded entries in the
outer map, and `AgentClient` would still hand them to `createRun` as
extra parallel start nodes (the exact failure mode the pass-4 prune
was meant to eliminate for the API controllers).

Drop the `agentConfigs.set` from the callback and replace the additive
merge with a direct copy from `discoveredConfigs`, which is now the
single authoritative source of what the run should see. The
per-agent tool context map is still populated during BFS — stale
entries there are harmless because they're only read by closure inside
`ON_TOOL_EXECUTE` and are unreachable once the agent is not in
`agentConfigs`.

* 🔬 fix: address audit findings on discovery helper

Resolves findings from a comprehensive external audit of #12740.

**Finding 1 (CRITICAL) — stale edges survive the reachability prune.**
The pass-4 prune removed unreachable agents from `agentConfigs` but left
matching edges in the return value. In an `A -> B -> C -> D` graph (all
edges stored on A) where B is skipped, `filterOrphanedEdges` drops A->B
and B->C but keeps C->D (neither endpoint is skipped). The caller then
sees `agentConfigs` without C/D but `edges` still references them,
flipping `createRun` into multi-agent mode with mismatched agents/edges
— the exact crash this PR is supposed to fix. Now filter the edge list
to the reachable set in the same pass, so the returned shape is
self-consistent: every edge endpoint is either the primary id or a key
of `agentConfigs`. New regression test covers A->B->C->D with B skipped.

**Finding 2 (MAJOR) — unconditional `getModelsConfig` on every API
request.** The OpenAI-compat and Responses controllers called
`getModelsConfig(req)` and `discoverConnectedAgents` even when the
primary agent had no edges (the common single-agent API case). Gate
both behind `primaryConfig.edges?.length > 0` so single-agent runs
don't pay that cost.

**Finding 5 (MINOR) — silent mutation of caller's
`primaryConfig.userMCPAuthMap`.** The helper aliased that object and
then `Object.assign`'d sub-agent entries into it, changing the caller's
config in-place. Shallow-clone up front so the returned merged map is
the only destination.

**Finding 7 (NIT) — dead `?? []` coalescing.**
`filterOrphanedEdges` always returns a concrete array, so the
`discoveredEdges ?? []` fallback was never reached. Simplified the
`primaryConfig.edges = …` assignment.

Also adds a test that verifies `primaryConfig.userMCPAuthMap` is not
mutated in-place.

* 🧹 chore: address audit NITs on discovery helper

Addresses two NIT findings from the post-fix audit:

**F1** — the shallow clone on `primaryConfig.userMCPAuthMap` was only
applied on the primary side; the `else` branch (hit when the primary
had no MCP auth and the first sub-agent seeds the map) assigned the
sub-agent's `config.userMCPAuthMap` directly, so a later sub-agent's
`Object.assign` mutated the first one's map in place. Harmless in
practice (per-request ephemeral objects) but asymmetric. Clone in the
else branch too. Test added.

**F2** — `initialize.js` had a defensive `if (agentConfigs.size > 0 &&
!edges) edges = []` normalizer. Pre-existing dead code: the helper now
always returns a concrete array from `filteredEdges.filter(...)`.
Removed for clarity.

* 🕸 fix: require all sources reachable when traversing fan-in edges

Addresses the seventh P1 codex finding on #12740: the reachability BFS
advanced through an edge as soon as any of its `from` endpoints matched
the current frontier node (`sources.includes(current)`), but the
subsequent edge filter required ALL sources to be reachable (`every`).
The two-semantics mismatch let a fan-in edge like `{from: ['A','B'],
to: 'C'}` mark C reachable purely via A even when B had no path from
the primary, then drop the edge itself at filter time. Result: C
survived in `agentConfigs` with no surviving edge connecting it to A,
so `createRun` flipped into multi-agent mode on `agents.length > 1`
and C ran as an unintended parallel root.

Replace the BFS with a fixed-point iteration keyed on the same
all-sources-reachable predicate used by the filter, so traversal and
filtering stay aligned and multi-source edges only fire once every
source is in the reachable set.

Two regression tests added:
- `{from: ['A','B'], to: 'C'}` with B having no incoming path — asserts
  neither B nor C leak into the result.
- `A -> B`, `A -> C`, `['B','C'] -> D` — asserts the fan-in edge fires
  and D becomes reachable once both B and C are.

* 🔀 fix: match SDK OR semantics for multi-source edge reachability

Reverts the all-sources-required reachability gate from 4982f1c3b and
replaces it with an any-source-reachable model, which matches how
`@librechat/agents`'s `MultiAgentGraph.createWorkflow` actually wires
multi-source edges at runtime (per-source `builder.addEdge(source,
destination)`). With the previous `every` gate, a legitimate handoff
edge `{ from: ['A', 'B'], to: 'C' }` where B had no incoming path was
pruned along with C, regressing OR-semantics routing that the SDK
would otherwise handle correctly.

New behavior:

1. Reachability: an edge advances when ANY of its `from` endpoints is
   already reachable. Fixed-point iteration over `filteredEdges`.
2. Edge filter: keep an edge when it has at least one reachable source
   AND all destinations are reachable (a missing destination would
   still crash `StateGraph.compile` with `Found edge ending at unknown
   node`).
3. Agent prune: keep agents that are reachable OR referenced on any
   endpoint of a surviving edge. The second clause preserves co-sources
   in multi-source edges (B in `{ from: ['A','B'], to: 'C' }` when
   nothing else reaches B) so the SDK's per-source `addEdge` — and the
   `validateEdgeAgents` safety-net I added to the SDK in #111 — still
   finds B as a node.

The pass-audit A->B->C->D regression test continues to pass: with B
skipped, `filterOrphanedEdges` drops both B-adjacent edges, reachability
never expands past A, C->D has no reachable source so it gets filtered,
and C/D are pruned because they're neither reachable nor referenced.
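
The three rules above can be sketched as follows. All names and the `Edge` shape are hypothetical; this only mirrors the prose and is not the helper's actual code:

```ts
type Edge = { from: string | string[]; to: string | string[] };

const toArray = (value: string | string[]): string[] =>
  Array.isArray(value) ? value : [value];

// 1. Fixed-point reachability: an edge fires when ANY source is reachable.
function reachableAnySource(seedIds: string[], edges: Edge[]): Set<string> {
  const reachable = new Set<string>(seedIds);
  let changed = true;
  while (changed) {
    changed = false;
    for (const edge of edges) {
      if (!toArray(edge.from).some((id) => reachable.has(id))) continue;
      for (const dest of toArray(edge.to)) {
        if (!reachable.has(dest)) {
          reachable.add(dest);
          changed = true;
        }
      }
    }
  }
  return reachable;
}

// 2. Edge filter: at least one reachable source and all destinations reachable.
function keepEdge(edge: Edge, reachable: Set<string>): boolean {
  return (
    toArray(edge.from).some((id) => reachable.has(id)) &&
    toArray(edge.to).every((id) => reachable.has(id))
  );
}

// 3. Agent prune: keep ids that are reachable or referenced by a surviving edge.
function keptAgentIds(agentIds: string[], reachable: Set<string>, surviving: Edge[]): string[] {
  const referenced = new Set<string>();
  for (const edge of surviving) {
    for (const id of toArray(edge.from).concat(toArray(edge.to))) {
      referenced.add(id);
    }
  }
  return agentIds.filter((id) => reachable.has(id) || referenced.has(id));
}
```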

* ✂️ fix: strip skipped co-members from multi-source/multi-dest edges

Addresses codex pass-9 P2 on #12740. `filterOrphanedEdges` previously
dropped an edge whenever any `from` id was skipped, which was correct
for scalar edges but over-aggressive for multi-source ones: the agents
SDK adds one `builder.addEdge(source, destination)` per source, so
`{ from: ['A','B'], to: 'C' }` with B skipped still has a valid
`A -> C` route that was being thrown away.

Now sanitize each endpoint:
- Scalar skipped → drop the whole edge (no route survives).
- Array with some skipped → strip the skipped ids, keep the edge with
  the surviving members. If the array empties out, drop the edge.

Symmetric handling for `to` covers multi-destination fan-out when one
co-destination is skipped. Tests updated/added:
- `strips skipped co-sources from multi-source edges…`
- `strips skipped co-destinations from multi-destination edges`
- `drops multi-member edges only when every member on a side is skipped`
- Discovery-side: `preserves valid routes when one co-source of a
  multi-source edge is skipped` asserts the end-to-end behavior —
  skipped co-source B gets stripped from the edge, A->C routing
  survives, and C remains in `agentConfigs`.
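
A sketch of the per-endpoint sanitizing described above, under the assumption that each side of an edge is either a scalar id or an array of ids. `sanitizeSide` and `filterOrphanedEdgesSketch` are illustrative names, not the real helper:

```ts
type Edge = { from: string | string[]; to: string | string[] };

// Scalar skipped -> null (no route survives). Array -> strip skipped ids,
// and return null only if nothing is left.
function sanitizeSide(side: string | string[], skipped: Set<string>): string | string[] | null {
  if (!Array.isArray(side)) {
    return skipped.has(side) ? null : side;
  }
  const kept = side.filter((id) => !skipped.has(id));
  return kept.length > 0 ? kept : null;
}

function filterOrphanedEdgesSketch(edges: Edge[], skipped: Set<string>): Edge[] {
  const result: Edge[] = [];
  for (const edge of edges) {
    const from = sanitizeSide(edge.from, skipped);
    const to = sanitizeSide(edge.to, skipped);
    // Drop the edge only when a whole side has been emptied out.
    if (from !== null && to !== null) {
      result.push({ ...edge, from, to });
    }
  }
  return result;
}
```

With B skipped, `{ from: ['A','B'], to: 'C' }` becomes `{ from: ['A'], to: 'C' }`, while a scalar `{ from: 'B', to: 'C' }` is dropped.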

* 🔓 fix: respect SHARE-on-AGENT fallback for handoff ACL on API routes

Addresses codex pass-10 P1 on #12740. The API controllers were handing
`discoverConnectedAgents` a raw `PermissionService.checkPermission` call
against `ResourceType.REMOTE_AGENT`, but the route-level middleware
(`createCheckRemoteAgentAccess`) authorizes the primary agent via
`getRemoteAgentPermissions`, which first consults the AGENT ACL and
treats owners with the SHARE bit as remotely authorized even without
an explicit REMOTE_AGENT grant. The mismatch meant a user could open
the primary via `/v1/chat/completions` or `/v1/responses`, but their
own owned handoff sub-agents were silently skipped — breaking
multi-agent handoffs for the common "owner runs their own multi-agent
orchestrator" case.

Both controllers now pass `discoverConnectedAgents` a `checkPermission`
wrapper that delegates to `getRemoteAgentPermissions` (with
`getEffectivePermissions` injected from `PermissionService`) and
compares the returned bitmask against the required permission via
`hasPermissions`. Sub-agents are now authorized by the exact same
rules the route middleware applies to the primary.
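
The bitmask comparison can be illustrated with a small synchronous sketch. The bit values, `hasAllBits`, and `makeCheckPermission` are assumptions made for illustration; the real wrapper delegates to `getRemoteAgentPermissions` and is presumably asynchronous:

```ts
// Assumed bit layout, for illustration only; not taken from the codebase.
const PermissionBits = { VIEW: 1, EDIT: 2, DELETE: 4, SHARE: 8 } as const;

// True when every required bit is set in the granted mask.
function hasAllBits(granted: number, required: number): boolean {
  return (granted & required) === required;
}

type CheckPermission = (agentId: string, required: number) => boolean;

// Hypothetical wrapper: resolve the effective mask through one injected
// resolver, so sub-agents and the primary are judged by the same rules.
function makeCheckPermission(resolveBits: (agentId: string) => number): CheckPermission {
  return (agentId, required) => hasAllBits(resolveBits(agentId), required);
}

const grants = new Map<string, number>([
  ['owned-sub-agent', PermissionBits.VIEW | PermissionBits.SHARE],
  ['unshared-sub-agent', 0],
]);
const checkPermission = makeCheckPermission((id) => grants.get(id) ?? 0);
```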

* 🌱 fix: preserve user-defined parallel-start branches

Addresses codex pass-11 P2 on #12740. The post-filter reachability
prune seeded only from `primaryConfig.id`, which killed
`MultiAgentGraph`'s legitimate multi-start pattern — a user-defined
edge like `X -> Y` where X has no incoming path (X is an intentional
parallel starting node, run alongside the primary) was being dropped
because neither X nor Y was reachable from the primary.

Reconcile the tension with pass-4 ("prune accidental orphans when an
intermediate is skipped") by using pre-filter reachability as the
signal:

- An agent that WAS reachable from the primary via the original
  (pre-filter) edges but loses that path when `filterOrphanedEdges`
  runs is an accidental orphan (a skipped hop broke the chain) — prune.
- An agent that was NEVER reachable from the primary, even pre-filter,
  is an intentional parallel start — seed it into post-filter
  reachability so its component survives.

Surviving-edge endpoint references still keep an agent (co-sources in
multi-source edges). New test `preserves user-defined parallel-start
branches disconnected from the primary` covers the pass-11 scenario;
the existing `A->B->C->D, B skipped` regression test continues to
pass because C/D were pre-filter reachable through B and lose that
reachability after filtering.

* 🎯 fix: tighten parallel-start seed criterion to 'no pre-filter incoming edge'

Addresses codex pass-12 P1 on #12740. The pass-11 seed heuristic — 'agent
is in `agentConfigs` but was not pre-filter reachable from the primary' —
was too permissive. A downstream agent like Y in `X -> Y` where X gets
skipped (missing / no VIEW) was never pre-filter reachable from the
primary either, so the old rule promoted Y to a parallel start node and
discovery returned `agents: [primary, Y]` with no connecting edge. The
SDK then ran Y as an unintended parallel root — exactly the orphan
behavior pass-4 wanted to prevent.

Tighter criterion: seed a post-filter reachability root only when the
agent had NO incoming edge in the pre-filter graph. That matches
`MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a
start node applied to the user's original declared topology, so:

- `A -> B` plus a user-defined `X -> Y` parallel branch: X has no
  incoming pre-filter → seeded → X and Y both survive.
- `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter
  (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is.
- `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`)
  → NOT seeded → C is pruned.

New test `does not promote a downstream orphan to a parallel start when
its only upstream is skipped` locks in the pass-12 scenario. The pass-11
`preserves user-defined parallel-start branches` test continues to hold.
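
The tightened seed rule can be sketched like this. `parallelStartSeeds` and the `Edge` shape are hypothetical; the sketch only encodes the "no incoming edge in the pre-filter graph" test:

```ts
type Edge = { from: string | string[]; to: string | string[] };

const toArray = (value: string | string[]): string[] =>
  Array.isArray(value) ? value : [value];

// An initialized agent is seeded as a parallel start only if no edge in the
// user's original (pre-filter) graph points at it.
function parallelStartSeeds(primaryId: string, agentIds: string[], preFilterEdges: Edge[]): string[] {
  const hasIncoming = new Set<string>();
  for (const edge of preFilterEdges) {
    for (const id of toArray(edge.to)) {
      hasIncoming.add(id);
    }
  }
  return agentIds.filter((id) => id !== primaryId && !hasIncoming.has(id));
}

const declared: Edge[] = [
  { from: 'A', to: 'B' },
  { from: 'X', to: 'Y' },
];
// X loaded: X has no incoming edge, so it is seeded.
const seedsWithX = parallelStartSeeds('A', ['B', 'X', 'Y'], declared);
// X skipped: Y still has the declared incoming edge X -> Y, so nothing is seeded.
const seedsWithoutX = parallelStartSeeds('A', ['B', 'Y'], declared);
```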

* 📁 fix: don't enforce AGENT-only file ACL on REMOTE_AGENT API callers

Addresses codex pass-13 P1 on #12740. When I refactored the API
controllers' DB-method bundle, I inadvertently started forwarding
`filterFilesByAgentAccess` into `initializeAgent`. That helper calls
`checkPermission` with `resourceType: ResourceType.AGENT`, but these
routes authorize callers through `REMOTE_AGENT` (via
`getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on
a shared agent but lacking direct `AGENT_VIEW` could invoke the agent
yet all its owner-attached context files would get silently filtered
out — breaking `file_search`/context retrieval for remote consumers.

Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses
controllers' `dbMethods` (and remove the now-unused import). The chat
UI's `initialize.js` keeps it since that path legitimately authorizes
at the AGENT level. No functional change inside the helper — passing
`undefined` simply tells `primeResources` to skip the per-file ACL
filter, restoring the pre-refactor API behavior.

* 🪓 fix: strip unreachable co-sources from surviving multi-source edges

Addresses codex pass-14 P1 on #12740. The earlier pass-8 fix kept any
agent referenced as an endpoint of a surviving edge (via a
`referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents`
failing on missing nodes. But that fallback propped up unreachable
co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped,
`X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is
reachable, and B stays in `agentConfigs` solely because the fan-in
still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no
incoming edge and runs it as an unintended parallel root.

Sanitize surviving edges instead: for a kept edge whose `from` is an
array, filter out any co-source that isn't reachable. The SDK's
per-source `addEdge` fires independently, so dropping an unreachable
co-source doesn't invalidate the remaining routes — in the scenario
above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every
surviving edge is now reachable, and the agent prune collapses to a
strict `reachable.has(agentId)` check. No more referenced-by-edge
fallback.
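
As a sketch of that sanitizing step, assuming the reachable set has already been computed (`stripUnreachableSources` and the `Edge` shape are illustrative names):

```ts
type Edge = { from: string | string[]; to: string | string[] };

// For each surviving edge, keep only the sources that are reachable; an edge
// left with no reachable source is dropped.
function stripUnreachableSources(edges: Edge[], reachable: Set<string>): Edge[] {
  const result: Edge[] = [];
  for (const edge of edges) {
    if (!Array.isArray(edge.from)) {
      if (reachable.has(edge.from)) result.push(edge);
      continue;
    }
    const from = edge.from.filter((id) => reachable.has(id));
    if (from.length > 0) result.push({ ...edge, from });
  }
  return result;
}

// Scenario from above after `X -> B` was filtered: B is no longer reachable.
const sanitized = stripUnreachableSources(
  [
    { from: 'A', to: 'C' },
    { from: ['B', 'C'], to: 'D' },
  ],
  new Set(['A', 'C', 'D']),
);
// The fan-in's `from` is now just ['C'].
```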

Regression test added: `strips unreachable co-sources from surviving
multi-source edges (no stray parallel root)` — asserts B is absent
from every surviving edge endpoint and the fan-in's `from` is just
`['C']`. All 22 prior discovery tests still pass unchanged.
2026-04-20 02:20:43 -04:00
Danny Avila
b5c097e5c7
⚗️ feat: Agent Context Compaction/Summarization (#12287)
* chore: imports/types

Add summarization config and package-level summarize handler contracts

Register summarize handlers across server controller paths

Port cursor dual-read/dual-write summary support and UI status handling

Selectively merge cursor branch files for BaseClient summary content
block detection (last-summary-wins), dual-write persistence, summary
block unit tests, and on_summarize_status SSE event handling with
started/completed/failed branches.

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor: type safety

feat: add localization for summarization status messages

refactor: optimize summary block detection in BaseClient

Updated the logic for identifying existing summary content blocks to use a reverse loop for improved efficiency. Added a new test case to ensure the last summary content block is updated correctly when multiple summary blocks exist.
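
A small sketch of the reverse scan, assuming content parts carry a `type` field and that summary blocks use the value `'summary'` (both are assumptions here):

```ts
type ContentPart = { type: string; text?: string };

// Scan from the end so the first hit is the last summary block.
function findLastSummaryIndex(content: ContentPart[]): number {
  for (let i = content.length - 1; i >= 0; i--) {
    if (content[i]?.type === 'summary') return i;
  }
  return -1;
}
```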

chore: add runName to chainOptions in AgentClient

refactor: streamline summarization configuration and handler integration

Removed the deprecated summarizeNotConfigured function and replaced it with a more flexible createSummarizeFn. Updated the summarization handler setup across various controllers to utilize the new function, enhancing error handling and configuration resolution. Improved overall code clarity and maintainability by consolidating summarization logic.

feat(summarization): add staged chunk-and-merge fallback

feat(usage): track summarization usage separately from messages

feat(summarization): resolve prompt from config in runtime

fix(endpoints): use @librechat/api provider config loader

refactor(agents): import getProviderConfig from @librechat/api

chore: code order

feat(app-config): auto-enable summarization when configured

feat: summarization config

refactor(summarization): streamline persist summary handling and enhance configuration validation

Removed the deprecated createDeferredPersistSummary function and integrated a new createPersistSummary function for MongoDB persistence. Updated summarization handlers across various controllers to utilize the new persistence method. Enhanced validation for summarization configuration to ensure provider, model, and prompt are properly set, improving error handling and overall robustness.

refactor(summarization): update event handling and remove legacy summarize handlers

Replaced the deprecated summarization handlers with new event-driven handlers for summarization start and completion across multiple controllers. This change enhances the clarity of the summarization process and improves the integration of summarization events in the application. Additionally, removed unused summarization functions and streamlined the configuration loading process.

refactor(summarization): standardize event names in handlers

Updated event names in the summarization handlers to use constants from GraphEvents for consistency and clarity. This change improves maintainability and reduces the risk of errors related to string literals in event handling.

feat(summarization): enhance usage tracking for summarization events

Added logic to track summarization usage in multiple controllers by checking the current node type. If the node indicates a summarization task, the usage type is set accordingly. This change improves the granularity of usage data collected during summarization processes.

feat(summarization): integrate SummarizationConfig into AppSummarizationConfig type

Enhanced the AppSummarizationConfig type by extending it with the SummarizationConfig type from librechat-data-provider. This change improves type safety and consistency in the summarization configuration structure.

test: add end-to-end tests for summarization functionality

Introduced a comprehensive suite of end-to-end tests for the summarization feature, covering the full LibreChat pipeline from message creation to summarization. This includes a new setup file for environment configuration and a Jest configuration specifically for E2E tests. The tests utilize real API keys and ensure proper integration with the summarization process, enhancing overall test coverage and reliability.

refactor(summarization): include initial summary in formatAgentMessages output

Updated the formatAgentMessages function to return an initial summary alongside the messages and the index token count map. This change is reflected in multiple controllers and the corresponding tests, enhancing the summarization process by providing additional context for each agent's response.

refactor: move hydrateMissingIndexTokenCounts to tokenMap utility

Extracted the hydrateMissingIndexTokenCounts function from the AgentClient and related tests into a new tokenMap utility file. This change improves code organization and reusability, allowing for better management of token counting logic across the application.

refactor(summarization): standardize step event handling and improve summary rendering

Refactored the step event handling in the useStepHandler and related components to utilize constants for event names, enhancing consistency and maintainability. Additionally, improved the rendering logic in the Summary component to conditionally display the summary text based on its availability, providing a better user experience during the summarization process.

feat(summarization): introduce baseContextTokens and reserveTokensRatio for improved context management

Added baseContextTokens to the InitializedAgent type to calculate the context budget based on agentMaxContextNum and maxOutputTokensNum. Implemented reserveTokensRatio in the createRun function to allow configurable context token management. Updated related tests to validate these changes and ensure proper functionality.

feat(summarization): add minReserveTokens, context pruning, and overflow recovery configurations

Introduced new configuration options for summarization, including minReserveTokens, context pruning settings, and overflow recovery parameters. Updated the createRun function to accommodate these new options and added a comprehensive test suite to validate their functionality and integration within the summarization process.

feat(summarization): add updatePrompt and reserveTokensRatio to summarization configuration

Introduced an updatePrompt field for updating existing summaries with new messages, enhancing the flexibility of the summarization process. Additionally, added reserveTokensRatio to the configuration schema, allowing for improved management of token allocation during summarization. Updated related tests to validate these new features.

feat(logging): add on_agent_log event handler for structured logging

Implemented an on_agent_log event handler in both the agents' callbacks and responses to facilitate structured logging of agent activities. This enhancement allows for better tracking and debugging of agent interactions by logging messages with associated metadata. Updated the summarization process to ensure proper handling of log events.

fix: remove duplicate IBalanceUpdate interface declaration

perf(usage): single-pass partition of collectedUsage

Replace two Array.filter() passes with a single for-of loop that
partitions message vs. summarization usages in one iteration.
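
A minimal sketch of the single-pass partition, for illustration only. The `UsageEntry` shape and the `partitionUsage` name are assumptions, not the repository's actual types:

```typescript
// Hypothetical usage record; the real collectedUsage entries carry more fields.
type UsageEntry = {
  usage_type?: string;
  input_tokens: number;
  output_tokens: number;
};

// Split usages into message vs. summarization buckets in one iteration,
// instead of running Array.filter() twice over the same array.
function partitionUsage(collected: Array<UsageEntry | undefined>): {
  message: UsageEntry[];
  summarization: UsageEntry[];
} {
  const message: UsageEntry[] = [];
  const summarization: UsageEntry[] = [];
  for (const usage of collected) {
    if (!usage) {
      continue; // skip empty slots
    }
    if (usage.usage_type === 'summarization') {
      summarization.push(usage);
    } else {
      message.push(usage);
    }
  }
  return { message, summarization };
}
```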

fix(BaseClient): shallow-copy message content before mutating and preserve string content

Avoid mutating the original message.content array in-place when
appending a summary block. Also convert string content to a text
content part instead of silently discarding it.
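
The copy-then-append pattern could look roughly like this. `ContentPart`, `Message`, and `appendSummaryPart` are hypothetical names used only for this sketch:

```typescript
// Hypothetical, simplified content types for illustration.
type ContentPart = { type: string; text?: string };
type Message = { content?: string | ContentPart[] };

// Return a new content array with the summary appended.
// The caller's original array is never mutated.
function appendSummaryPart(message: Message, summary: ContentPart): ContentPart[] {
  const existing = message.content;
  let parts: ContentPart[];
  if (typeof existing === 'string') {
    // Keep string content by converting it to a text part.
    parts = [{ type: 'text', text: existing }];
  } else {
    // Shallow copy (or start empty when there is no content).
    parts = existing ? [...existing] : [];
  }
  parts.push(summary);
  return parts;
}
```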

fix(ui): fix Part.tsx indentation and useStepHandler summarize-complete handling

- Fix SUMMARY else-if branch indentation in Part.tsx to match chain level
- Guard ON_SUMMARIZE_COMPLETE with didFinalize flag to avoid unnecessary
  re-renders when no summarizing parts exist
- Protect against undefined completeData.summary instead of unsafe spread

fix(agents): use strict enabled check for summarization handlers

Change summarizationConfig?.enabled !== false to === true so handlers
are not registered when summarizationConfig is undefined.

chore: fix initializeClient JSDoc and move DEFAULT_RESERVE_RATIO to module scope

refactor(Summary): align collapse/expand behavior with Reasoning component

- Single render path instead of separate streaming vs completed branches
- Use useMessageContext for isSubmitting/isLatestMessage awareness so
  the "Summarizing..." label only shows during active streaming
- Default to collapsed (matching Reasoning), user toggles to expand
- Add proper aria attributes (aria-hidden, role, aria-controls, contentId)
- Hide copy button while actively streaming

feat(summarization): default to self-summarize using agent's own provider/model

When no summarization config is provided (neither in librechat.yaml nor
on the agent), automatically enable summarization using the agent's own
provider and model. The agents package already provides default prompts,
so no prompt configuration is needed.

Also removes the dead resolveSummarizationLLMConfig in summarize.ts
(and its spec) — run.ts buildAgentContext is the single source of truth
for summarization config resolution. Removes the duplicate
RuntimeSummarizationConfig local type in favor of the canonical
SummarizationConfig from data-provider.

chore: schema and type cleanup for summarization

- Add trigger field to summarizationAgentOverrideSchema so per-agent
  trigger overrides in librechat.yaml are not silently stripped by Zod
- Remove unused SummarizationStatus type from runs.ts
- Make AppSummarizationConfig.enabled non-optional to reflect the
  invariant that loadSummarizationConfig always sets it

refactor(responses): extract duplicated on_agent_log handler

refactor(run): use agents package types for summarization config

Import SummarizationConfig, ContextPruningConfig, and
OverflowRecoveryConfig from @librechat/agents and use them to
type-check the translation layer in buildAgentContext. This ensures
the config object passed to the agent graph matches what it expects.

- Use `satisfies AgentSummarizationConfig` on the config object
- Cast contextPruningConfig and overflowRecoveryConfig to agents types
- Properly narrow trigger fields from DeepPartial to required shape

feat(config): add maxToolResultChars to base endpoint schema

Add maxToolResultChars to baseEndpointSchema so it can be configured
on any endpoint in librechat.yaml. Resolved during agent initialization
using getProviderConfig's endpoint resolution: custom endpoint config
takes precedence, then the provider-specific endpoint config, then the
shared `all` config.

Passed through to the agents package ToolNode, which uses it to cap
tool result length before it enters the context window. When not
configured, the agents package computes a sensible default from
maxContextTokens.
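
The precedence described above can be sketched as a nullish-coalescing chain. The `EndpointConfig` type and `resolveMaxToolResultChars` function are assumptions for illustration; the real resolution goes through getProviderConfig:

```typescript
// Hypothetical slice of an endpoint config.
type EndpointConfig = { maxToolResultChars?: number };

// Custom endpoint config wins, then the provider-specific config,
// then the shared `all` config. Returns undefined when none is set,
// leaving the default to the agents package.
function resolveMaxToolResultChars(
  custom?: EndpointConfig,
  provider?: EndpointConfig,
  all?: EndpointConfig,
): number | undefined {
  return custom?.maxToolResultChars ?? provider?.maxToolResultChars ?? all?.maxToolResultChars;
}
```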

fix(summarization): forward agent model_parameters in self-summarize default

When no explicit summarization config exists, the self-summarize
default now forwards the agent's model_parameters as the
summarization parameters. This ensures provider-specific settings
(e.g. Bedrock region, credentials, endpoint host) are available
when the agents package constructs the summarization LLM.

fix(agents): register summarization handlers by default

Change the enabled gate from === true to !== false so handlers
register when no explicit summarization config exists. This aligns
with the self-summarize default where summarization is always on
unless explicitly disabled via enabled: false.
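
The difference between the two gates shows up when the config is undefined. A small sketch, with a simplified config type assumed for illustration:

```typescript
// Simplified config shape; only the `enabled` flag matters here.
type SummarizationGateConfig = { enabled?: boolean };

// Strict gate: true only when enabled is explicitly true.
const strictGate = (config?: SummarizationGateConfig): boolean => config?.enabled === true;

// Default-on gate: true unless enabled is explicitly false.
const defaultOnGate = (config?: SummarizationGateConfig): boolean => config?.enabled !== false;
```

With no config at all, `strictGate` skips handler registration while `defaultOnGate` registers the handlers, which matches the self-summarize default.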

refactor(summarization): let agents package inherit clientOptions for self-summarize

Remove model_parameters forwarding from the self-summarize default.
The agents package now reuses the agent's own clientOptions when the
summarization provider matches the agent's provider, inheriting all
provider-specific settings (region, credentials, proxy, etc.)
automatically.

refactor(summarization): use MessageContentComplex[] for summary content

Unify summary content to always use MessageContentComplex[] arrays,
matching the pattern used by on_message_delta. No more string | array
unions — content is always an array of typed blocks ({ type: 'text',
text: '...' } for text, { type: 'reasoning_content', ... } for
reasoning).

Agents package:
- SummaryContentBlock.content: MessageContentComplex[] (was string)
- tokenCount now optional (not sent on deltas)
- Removed reasoning field — reasoning is now a content block type
- streamAndCollect normalizes all chunks to content block arrays
- Delta events pass content blocks directly

LibreChat:
- SummaryContentPart.content: Agents.MessageContentComplex[]
- Updated Part.tsx, Summary.tsx, useStepHandler.ts, BaseClient.js
- Summary.tsx derives display text from content blocks via useMemo
- Aggregator uses simple array spread

refactor(summarization): enhance summary handling and text extraction

- Updated BaseClient.js to improve summary text extraction, accommodating both legacy and new content formats.
- Modified summarization logic to ensure consistent handling of summary content across different message formats.
- Adjusted test cases in summarization.e2e.spec.js to utilize the new summary text extraction method.
- Refined SSE useStepHandler to initialize summary content as an array.
- Updated configuration schema by removing unused minReserveTokens field.
- Cleaned up SummaryContentPart type by removing rangeHash property.

These changes streamline the summarization process and ensure compatibility with various content structures.

refactor(summarization): streamline usage tracking and logging

- Removed direct checks for summarization nodes in ModelEndHandler and replaced them with a dedicated markSummarizationUsage function for better readability and maintainability.
- Updated OpenAIChatCompletionController and responses handlers to utilize the new markSummarizationUsage function for setting usage types.
- Enhanced logging functionality by ensuring the logger correctly handles different log levels.
- Introduced a new useCopyToClipboard hook in the Summary component to encapsulate clipboard copy logic, improving code reusability and clarity.

These changes improve the overall structure and efficiency of the summarization handling and logging processes.

refactor(summarization): update summary content block documentation

- Removed outdated comment regarding the last summary content block in BaseClient.js.
- Added a new comment to clarify the purpose of the findSummaryContentBlock method, ensuring consistency in documentation.

These changes enhance code clarity and maintainability by providing accurate descriptions of the summarization logic.

refactor(summarization): update summary content structure in tests

- Modified the summarization content structure in e2e tests to use an array format for text, aligning with recent changes in summary handling.
- Updated test descriptions to clarify the behavior of context token calculations, ensuring consistency and clarity in the tests.

These changes enhance the accuracy and maintainability of the summarization tests by reflecting the updated content structure.

refactor(summarization): remove legacy E2E test setup and configuration

- Deleted the e2e-setup.js and jest.e2e.config.js files, which contained legacy configurations for E2E tests using real API keys.
- Introduced a new summarization.e2e.ts file that implements comprehensive E2E backend integration tests for the summarization process, utilizing real AI providers and tracking summaries throughout the run.

These changes streamline the testing framework by consolidating E2E tests into a single, more robust file while removing outdated configurations.

refactor(summarization): enhance E2E tests and error handling

- Added a cleanup step to force exit after all tests to manage Redis connections.
- Updated the summarization model to 'claude-haiku-4-5-20251001' for consistency across tests.
- Improved error handling in the processStream function to capture and return processing errors.
- Enhanced logging for cross-run tests and tight context scenarios to provide better insights into test execution.

These changes improve the reliability and clarity of the E2E tests for the summarization process.

refactor(summarization): enhance test coverage for maxContextTokens behavior

- Updated run-summarization.test.ts to include a new test case ensuring that maxContextTokens does not exceed user-defined limits, even when calculated ratios suggest otherwise.
- Modified summarization.e2e.ts to replace legacy UsageMetadata type with a more appropriate type for collectedUsage, improving type safety and clarity in the test setup.

These changes improve the robustness of the summarization tests by validating context token constraints and refining type definitions.

feat(summarization): add comprehensive E2E tests for summarization process

- Introduced a new summarization.e2e.test.ts file that implements extensive end-to-end integration tests for the summarization pipeline, covering the full flow from LibreChat to agents.
- The tests utilize real AI providers and include functionality to track summaries during and between runs.
- Added necessary cleanup steps to manage Redis connections post-tests and ensure proper exit.

These changes enhance the testing framework by providing robust coverage for the summarization process, ensuring reliability and performance under real-world conditions.

fix(service): import logger from winston configuration

- Removed the import statement for logger from '@librechat/data-schemas' and replaced it with an import from '~/config/winston'.
- This change ensures that the logger is correctly sourced from the updated configuration, improving consistency in logging practices across the application.

refactor(summary): simplify Summary component and enhance token display

- Removed the unused `meta` prop from the `SummaryButton` component to streamline its interface.
- Updated the token display logic to use a localized string for better internationalization support.
- Adjusted the rendering of the `meta` information to improve its visibility within the `Summary` component.

These changes enhance the clarity and usability of the Summary component while ensuring better localization practices.

feat(summarization): add maxInputTokens configuration for summarization

- Introduced a new `maxInputTokens` property in the summarization configuration schema to control the amount of conversation context sent to the summarizer, with a default value of 10000.
- Updated the `createRun` function to utilize the new `maxInputTokens` setting, allowing for more flexible summarization based on agent context.

These changes enhance the summarization capabilities by providing better control over input token limits, improving the overall summarization process.

refactor(summarization): simplify maxInputTokens logic in createRun function

- Updated the logic for the `maxInputTokens` property in the `createRun` function to directly use the agent's base context tokens when the resolved summarization configuration does not specify a value.
- This change streamlines the configuration process and enhances clarity in how input token limits are determined for summarization.

These modifications improve the maintainability of the summarization configuration by reducing complexity in the token calculation logic.
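
As a rough sketch of the fallback described above (the function name and parameter shapes are assumptions, not the actual createRun code):

```typescript
// Use the configured maxInputTokens when present; otherwise fall back
// to the agent's base context tokens.
function resolveMaxInputTokens(
  config: { maxInputTokens?: number },
  baseContextTokens: number,
): number {
  return config.maxInputTokens ?? baseContextTokens;
}
```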

feat(summary): enhance Summary component to display meta information

- Updated the SummaryContent component to accept an optional `meta` prop, allowing for additional contextual information to be displayed above the main content.
- Adjusted the rendering logic in the Summary component to utilize the new `meta` prop, improving the visibility of supplementary details.

These changes enhance the user experience by providing more context within the Summary component, making it clearer and more informative.

refactor(summarization): standardize reserveRatio configuration in summarization logic

- Replaced instances of `reserveTokensRatio` with `reserveRatio` in the `createRun` function and related tests to unify the terminology across the codebase.
- Updated the summarization configuration schema to reflect this change, ensuring consistency in how the reserve ratio is defined and utilized.
- Removed the per-agent override logic for summarization configuration, simplifying the overall structure and enhancing clarity.

These modifications improve the maintainability and readability of the summarization logic by standardizing the configuration parameters.

* fix: circular dependency of `~/models`

* chore: update logging scope in agent log handlers

Changed log scope from `[agentus:${data.scope}]` to `[agents:${data.scope}]` in both the callbacks and responses controllers to ensure consistent logging format across the application.

* feat: calibration ratio

* refactor(tests): update summarizationConfig tests to reflect changes in enabled property

Modified tests to check for the new `summarizationEnabled` property instead of the deprecated `enabled` field in the summarization configuration. This change ensures that the tests accurately validate the current configuration structure and behavior of the agents.

* feat(tests): add markSummarizationUsage mock for improved test coverage

Introduced a mock for the markSummarizationUsage function in the responses unit tests to enhance the testing of summarization usage tracking. This addition supports better validation of summarization-related functionalities and ensures comprehensive test coverage for the agents' response handling.

* refactor(tests): simplify event handler setup in createResponse tests

Removed redundant mock implementations for event handlers in the createResponse unit tests, streamlining the setup process. This change enhances test clarity and maintainability while ensuring that the tests continue to validate the correct behavior of usage tracking during on_chat_model_end events.

* refactor(agents): move calibration ratio capture to finally block

Reorganized the logic for capturing the calibration ratio in the AgentClient class to ensure it is executed in the finally block. This change guarantees that the ratio is captured even if the run is aborted, enhancing the reliability of the response message persistence. Removed redundant code and improved clarity in the handling of context metadata.

* refactor(agents): streamline bulk write logic in recordCollectedUsage function

Removed redundant bulk write operations and consolidated document handling in the recordCollectedUsage function. The logic now combines all documents into a single bulk write operation, improving efficiency and reducing error handling complexity. Updated logging to provide consistent error messages for bulk write failures.

* refactor(agents): enhance summarization configuration resolution in createRun function

Streamlined the summarization configuration logic by introducing a base configuration and allowing for overrides from agent-specific settings. This change improves clarity and maintainability, ensuring that the summarization configuration is consistently applied while retaining flexibility for customization. Updated the handling of summarization parameters to ensure proper integration with the agent's model and provider settings.

* refactor(agents): remove unused tokenCountMap and streamline calibration ratio handling

Eliminated the unused tokenCountMap variable from the AgentClient class to enhance code clarity. Additionally, streamlined the logic for capturing the calibration ratio by using optional chaining and a fallback value, ensuring that context metadata is consistently defined. This change improves maintainability and reduces potential confusion in the codebase.

* refactor(agents): extract agent log handler for improved clarity and reusability

Refactored the agent log handling logic by extracting it into a dedicated function, `agentLogHandler`, enhancing code clarity and reusability across different modules. Updated the event handlers in both the OpenAI and responses controllers to utilize the new handler, ensuring consistent logging behavior throughout the application.

* test: add summarization event tests for useStepHandler

Implemented a series of tests for the summarization events in the useStepHandler hook. The tests cover scenarios for ON_SUMMARIZE_START, ON_SUMMARIZE_DELTA, and ON_SUMMARIZE_COMPLETE events, ensuring proper handling of summarization logic, including message accumulation and finalization. This addition enhances test coverage and validates the correct behavior of the summarization process within the application.

* refactor(config): update summarizationTriggerSchema to use enum for type validation

Changed the type of the `type` field in the summarizationTriggerSchema from a string to an enum with a single value 'token_count'. This modification enhances type safety and ensures that only valid types are accepted in the configuration, improving overall clarity and maintainability of the schema.

* test(usage): add bulk write tests for message and summarization usage

Implemented tests for the bulk write functionality in the recordCollectedUsage function, covering scenarios for combined message and summarization usage, summarization-only usage, and message-only usage. These tests ensure correct document handling and token rollup calculations, enhancing test coverage and validating the behavior of the usage tracking logic.

* refactor(Chat): enhance clipboard copy functionality and type definitions in Summary component

Updated the Summary component to improve the clipboard copy functionality by handling clipboard permission errors. Refactored type definitions for SummaryProps to use a more specific type, enhancing type safety. Adjusted the SummaryButton and FloatingSummaryBar components to accept isCopied and onCopy props, promoting better separation of concerns and reusability.

* chore(translations): remove unused "Expand Summary" key from English translations

Deleted the "Expand Summary" key from the English translation file to streamline the localization resources and improve clarity in the user interface. This change helps maintain an organized and efficient translation structure.

* refactor: adjust token counting for Claude model to account for API discrepancies

Implemented a correction factor for token counting when using the Claude model, addressing discrepancies between Anthropic's API and local tokenizer results. This change ensures accurate token counts by applying a scaling factor, improving the reliability of token-related functionalities.

* refactor(agents): implement token count adjustment for Claude model messages

Added a method to adjust token counts for messages processed by the Claude model, applying a correction factor to align with API expectations. This enhancement improves the accuracy of token counting, ensuring reliable functionality when interacting with the Claude model.

* refactor(agents): add token counting for media content in messages

Introduced a new method to estimate token costs for image and document blocks in messages, improving the accuracy of token counting. This enhancement ensures that media content is properly accounted for, particularly for the Claude model, by integrating additional token estimation logic for various content types. Updated the token counting function to utilize this new method, enhancing overall reliability and functionality.

* chore: fix missing import

* fix(agents): clamp baseContextTokens and document reserve ratio change

Prevent negative baseContextTokens when maxOutputTokens exceeds the
context window (misconfigured models). Document the 10%→5% default
reserve ratio reduction introduced alongside summarization.
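
A minimal sketch of the clamp, assuming the budget is the context window minus the output allowance (the function name is hypothetical):

```typescript
// Never return a negative budget, even when maxOutputTokens is
// larger than the context window (a misconfigured model).
function computeBaseContextTokens(maxContextTokens: number, maxOutputTokens: number): number {
  return Math.max(0, maxContextTokens - maxOutputTokens);
}
```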

* fix(agents): include media tokens in hydrated token counts

Add estimateMediaTokensForMessage to createTokenCounter so the hydration
path (used by hydrateMissingIndexTokenCounts) matches the precomputed
path in AgentClient.getTokenCountForMessage. Without this, messages
containing images or documents were systematically undercounted during
hydration, risking context window overflow.

Add 34 unit tests covering all block-type branches of
estimateMediaTokensForMessage.

* fix(agents): include summarization output tokens in usage return value

The returned output_tokens from recordCollectedUsage now reflects all
billed LLM calls (message + summarization). Previously, summarization
completions were billed but excluded from the returned metadata, causing
a discrepancy between what users were charged and what the response
message reported.
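
In sketch form, the returned total now covers both groups of calls. The `BilledUsage` type and `totalOutputTokens` helper are hypothetical and only illustrate the rollup:

```typescript
// Hypothetical minimal usage record.
type BilledUsage = { output_tokens?: number };

// Sum output tokens across message and summarization calls so the
// reported total matches everything that was billed.
function totalOutputTokens(message: BilledUsage[], summarization: BilledUsage[]): number {
  let total = 0;
  for (const usage of message) {
    total += usage.output_tokens ?? 0;
  }
  for (const usage of summarization) {
    total += usage.output_tokens ?? 0;
  }
  return total;
}
```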

* fix(tests): replace process.exit with proper Redis cleanup in e2e test

The summarization E2E test used process.exit(0) to work around a Redis
connection opened at import time, which killed the Jest runner and
bypassed teardown. Use ioredisClient.quit() and keyvRedisClient.disconnect()
for graceful cleanup instead.

* fix(tests): update getConvo imports in OpenAI and response tests

Refactor test files to import getConvo from the main models module instead of the Conversation submodule. This change ensures consistency across tests and simplifies the import structure, enhancing maintainability.

* fix(clients): improve summary text validation in BaseClient

Refactor the summary extraction logic to ensure that only non-empty summary texts are considered valid. This change enhances the robustness of the message processing by utilizing a dedicated method for summary text retrieval, improving overall reliability.

* fix(config): replace z.any() with explicit union in summarization schema

Model parameters (temperature, top_p, etc.) are constrained to
primitive types rather than the policy-violating z.any().

* refactor(agents): deduplicate CLAUDE_TOKEN_CORRECTION constant

Export from the TS source in packages/api and import in the JS client,
eliminating the static class property that could drift out of sync.

* refactor(agents): eliminate duplicate selfProvider in buildAgentContext

selfProvider and provider were derived from the same expression with
different type casts. Consolidated to a single provider variable.

* refactor(agents): extract shared SSE handlers and restrict log levels

- buildSummarizationHandlers() factory replaces triplicated handler
  blocks across responses.js and openai.js
- agentLogHandlerObj exported from callbacks.js for consistent reuse
- agentLogHandler restricted to an allowlist of safe log levels
  (debug, info, warn, error) instead of accepting arbitrary strings
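
One way to express such an allowlist is sketched below. The names are hypothetical, and falling back to `debug` for unknown levels is an assumption of this sketch, not confirmed handler behavior:

```typescript
// Only these levels may be used to pick a logger method.
const ALLOWED_LEVELS = ['debug', 'info', 'warn', 'error'] as const;
type LogLevel = (typeof ALLOWED_LEVELS)[number];

// Type guard: true only for strings present in the allowlist.
function isAllowedLevel(level: unknown): level is LogLevel {
  return typeof level === 'string' && (ALLOWED_LEVELS as readonly string[]).indexOf(level) !== -1;
}

// Assumption: unknown levels are downgraded to 'debug' rather than rejected.
function resolveLevel(level: unknown): LogLevel {
  return isAllowedLevel(level) ? level : 'debug';
}
```

Checking against a fixed list avoids calling an arbitrary property such as `constructor` on the logger when the level arrives from event data.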

* fix(SSE): batch summarize deltas, add exhaustiveness check, conditional error announcement

- ON_SUMMARIZE_DELTA coalesces rapid-fire renders via requestAnimationFrame
  instead of calling setMessages per chunk
- Exhaustive never-check on TStepEvent catches unhandled variants at
  compile time when new StepEvents are added
- ON_SUMMARIZE_COMPLETE error announcement only fires when a summary
  part was actually present and removed
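
The exhaustiveness check relies on TypeScript's `never` type. A sketch with a hypothetical three-variant event union (the real TStepEvent has more variants and different payloads):

```typescript
// Hypothetical event union for illustration.
type SketchStepEvent =
  | { event: 'on_summarize_start' }
  | { event: 'on_summarize_delta'; text: string }
  | { event: 'on_summarize_complete'; summary?: string };

function describeEvent(step: SketchStepEvent): string {
  switch (step.event) {
    case 'on_summarize_start':
      return 'start';
    case 'on_summarize_delta':
      return `delta:${step.text}`;
    case 'on_summarize_complete':
      return step.summary ? 'complete' : 'complete (no summary)';
    default: {
      // If a new variant is added to the union without a case above,
      // this assignment stops compiling.
      const unhandled: never = step;
      throw new Error(`Unhandled step event: ${JSON.stringify(unhandled)}`);
    }
  }
}
```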

* feat(agents): persist instruction overhead in contextMeta and seed across runs

Extend contextMeta with instructionOverhead and toolCount so the
provider-observed instruction overhead is persisted on the response message
and seeded into the pruner on subsequent runs. This enables the pruner to
use a calibrated budget from the first call instead of waiting for a
provider observation, preventing the ratio collapse caused by the local
tokenizer overestimating tool schema tokens.

The seeded overhead is only used when encoding and tool count match
between runs, ensuring stale values from different configurations
are discarded.

* test(agents): enhance OpenAI test mocks for summarization handlers

Updated the OpenAI test suite to include additional mock implementations for summarization handlers, including buildSummarizationHandlers, markSummarizationUsage, and agentLogHandlerObj. This improves test coverage and ensures consistent behavior during testing.

* fix(agents): address review findings for summarization v2

Cancel rAF on unmount to prevent stale Recoil writes from a dead
component context. Clear orphaned summarizing:true parts when
ON_SUMMARIZE_COMPLETE arrives without a summary payload. Add null
guard and safe spread to agentLogHandler. Handle Anthropic-format
base64 image/* documents in estimateMediaTokensForMessage. Use
role="region" for expandable summary content. Add .describe() to
contextMeta Zod fields. Extract duplicate usage loop into helper.

* refactor: simplify contextMeta to calibrationRatio + encoding only

Remove instructionOverhead and toolCount from cross-run persistence —
instruction tokens change too frequently between runs (prompt edits,
tool changes) for a persisted seed to be reliable. The intra-run
calibration in the pruner still self-corrects via provider observations.
contextMeta now stores only the tokenizer-bias ratio and encoding,
which are stable across instruction changes.

* test(SSE): enhance useStepHandler tests for ON_SUMMARIZE_COMPLETE behavior

Updated the test for ON_SUMMARIZE_COMPLETE to clarify that it finalizes the existing part with summarizing set to false when the summary is undefined. Added assertions to verify the correct behavior of message updates and the state of summary parts.

* refactor(BaseClient): remove handleContextStrategy and truncateToolCallOutputs functions

Eliminated the handleContextStrategy method from BaseClient to streamline message handling. Also removed the truncateToolCallOutputs function from the prompts module, simplifying the codebase and improving maintainability.

* refactor: add AGENT_DEBUG_LOGGING option and refactor token count handling in BaseClient

Introduced AGENT_DEBUG_LOGGING to .env.example for enhanced debugging capabilities. Refactored token count handling in BaseClient by removing the handleTokenCountMap method and simplifying token count updates. Updated AgentClient to log detailed token count recalculations and adjustments, improving traceability during message processing.

* chore: update dependencies in package-lock.json and package.json files

Bumped versions of several dependencies, including @librechat/agents to ^3.1.62 and various AWS SDK packages to their latest versions. This ensures compatibility and incorporates the latest features and fixes.

* chore: imports order

* refactor: extract summarization config resolution from buildAgentContext

* refactor: rename and simplify summarization configuration shaping function

* refactor: replace AgentClient token counting methods with single-pass pure utility

Extract getTokenCount() and getTokenCountForMessage() from AgentClient
into countFormattedMessageTokens(), a pure function in packages/api that
handles text, tool_call, image, and document content types in one loop.

- Decompose estimateMediaTokensForMessage into block-level helpers
  (estimateImageDataTokens, estimateImageBlockTokens, estimateDocumentBlockTokens)
  shared by both estimateMediaTokensForMessage and the new single-pass function
- Remove redundant per-call getEncoding() resolution (closure captures once)
- Remove deprecated gpt-3.5-turbo-0301 model branching
- Drop this.getTokenCount guard from BaseClient.sendMessage

* refactor: streamline token counting in createTokenCounter function

Simplified the createTokenCounter function by removing the media token estimation and directly calculating the token count. This change enhances clarity and performance by consolidating the token counting logic into a single pass, while maintaining compatibility with Claude's token correction.

* refactor: simplify summarization configuration types

Removed the AppSummarizationConfig type and directly used SummarizationConfig in the AppConfig interface. This change streamlines the type definitions and enhances consistency across the codebase.

* chore: import order

* fix: summarization event handling in useStepHandler

- Cancel pending summarizeDeltaRaf in clearStepMaps to prevent stale
  frames firing after map reset or component unmount
- Move announcePolite('summarize_completed') inside the didFinalize
  guard so screen readers only announce when finalization actually occurs
- Remove dead cleanup closure returned from stepHandler useCallback body
  that was never invoked by any caller

* fix: estimate tokens for non-PDF/non-image base64 document blocks

Previously estimateDocumentBlockTokens returned 0 for unrecognized MIME
types (e.g. text/plain, application/json), silently underestimating
context budget. Fall back to character-based heuristic or countTokens.

* refactor: return cloned usage from markSummarizationUsage

Avoid mutating LangChain's internal usage_metadata object by returning
a shallow clone with the usage_type tag. Update all call sites in
callbacks, openai, and responses controllers to use the returned value.
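
The clone-and-tag idea, sketched with a hypothetical helper. The real markSummarizationUsage signature may differ, and the `'summarization'` tag value is an assumption:

```typescript
// Hypothetical, simplified usage metadata shape.
type UsageMetadataSketch = {
  input_tokens?: number;
  output_tokens?: number;
  usage_type?: string;
};

// Return a shallow clone carrying the tag; the input object is left untouched.
function tagSummarizationUsage(usage: UsageMetadataSketch): UsageMetadataSketch {
  return { ...usage, usage_type: 'summarization' };
}
```

Call sites then use the returned object instead of relying on a side effect on the original.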

* refactor: consolidate debug logging loops in buildMessages

Merge the two sequential O(n) debug-logging passes over orderedMessages
into a single pass inside the map callback where all data is available.

* refactor: narrow SummaryContentPart.content type

Replace broad Agents.MessageContentComplex[] with the specific
Array<{ type: ContentTypes.TEXT; text: string }> that all producers
and consumers already use, improving compile-time safety.

* refactor: use single output array in recordCollectedUsage

Have processUsageGroup append to a shared array instead of returning
separate arrays that are spread into a third, reducing allocations.

* refactor: use for...in in hydrateMissingIndexTokenCounts

Replace Object.entries with for...in to avoid allocating an
intermediate tuple array during token map hydration.
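
A sketch of the loop shape, assuming a map keyed by message index. `hydrateMissingCounts` and `countAt` are hypothetical names, not the actual hydrateMissingIndexTokenCounts signature:

```typescript
// Fill in missing counts in place, iterating keys directly rather than
// building an array of [key, value] tuples first.
function hydrateMissingCounts(
  counts: Record<string, number | undefined>,
  countAt: (index: string) => number,
): void {
  for (const index in counts) {
    // Guard against inherited enumerable properties.
    if (!Object.prototype.hasOwnProperty.call(counts, index)) {
      continue;
    }
    if (counts[index] === undefined) {
      counts[index] = countAt(index);
    }
  }
}
```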
2026-03-21 14:28:56 -04:00
Danny Avila
0412f05daf
🪢 chore: Consolidate Pricing and Tx Imports After tx.js Module Removal (#12086)
* 🧹 chore: resolve imports due to rebase

* chore: Update model mocks in unit tests for consistency

- Consolidated model mock implementations across various test files to streamline setup and reduce redundancy.
- Removed duplicate mock definitions for `getMultiplier` and `getCacheMultiplier`, ensuring a unified approach in `recordCollectedUsage.spec.js`, `openai.spec.js`, `responses.unit.spec.js`, and `abortMiddleware.spec.js`.
- Enhanced clarity and maintainability of test files by aligning mock structures with the latest model updates.

* fix: Safeguard token credit checks in transaction tests

- Updated assertions in `transaction.spec.ts` to handle potential null values for `updatedBalance` by using optional chaining.
- Enhanced robustness of tests related to token credit calculations, ensuring they correctly account for scenarios where the balance may not be found.

* chore: extend transaction methods with bulk insert functionality

- Introduced `bulkInsertTransactions` method in `transaction.ts` to facilitate batch insertion of transaction documents.
- Updated test file `transactions.bulk-parity.spec.ts` to utilize new pricing function assignments and handle potential null values in calculations, improving test robustness.
- Refactored pricing function initialization for clarity and consistency.

* refactor: Enhance type definitions and introduce new utility functions for model matching

- Added `findMatchingPattern` and `matchModelName` utility functions to improve model name matching logic in transaction methods.
- Updated type definitions for `findMatchingPattern` to accept a more specific tokensMap structure, enhancing type safety.
- Refactored `dbMethods` initialization in `transactions.bulk-parity.spec.ts` to include the new utility functions, improving test clarity and functionality.

* refactor: Update database method imports and enhance transaction handling

- Refactored `abortMiddleware.js` to utilize centralized database methods for message handling and conversation retrieval, improving code consistency.
- Enhanced `bulkInsertTransactions` in `transaction.ts` to handle empty document arrays gracefully and added error logging for better debugging.
- Updated type definitions in `transactions.ts` to enforce stricter typing for token types, enhancing type safety across transaction methods.
- Improved test setup in `transactions.bulk-parity.spec.ts` by refining pricing function assignments and ensuring robust handling of potential null values.

* refactor: Update database method references and improve transaction multiplier handling

- Refactored `client.js` to update database method references for `bulkInsertTransactions` and `updateBalance`, ensuring consistency in method usage.
- Enhanced transaction multiplier calculations in `transaction.spec.ts` to provide fallback values for write and read multipliers, improving robustness in cost calculations across structured token spending tests.
2026-03-21 14:28:53 -04:00
Danny Avila
8ba2bde5c1
📦 refactor: Consolidate DB models, encapsulating Mongoose usage in data-schemas (#11830)
* chore: move database model methods to /packages/data-schemas

* chore: add TypeScript ESLint rule to warn on unused variables

* refactor: model imports to streamline access

- Consolidated model imports across various files to improve code organization and reduce redundancy.
- Updated imports for models such as Assistant, Message, Conversation, and others to a unified import path.
- Adjusted middleware and service files to reflect the new import structure, ensuring functionality remains intact.
- Enhanced test files to align with the new import paths, maintaining test coverage and integrity.

* chore: migrate database models to packages/data-schemas and refactor all direct Mongoose Model usage outside of data-schemas

* test: update agent model mocks in unit tests

- Added `getAgent` mock to `client.test.js` to enhance test coverage for agent-related functionality.
- Removed redundant `getAgent` and `getAgents` mocks from `openai.spec.js` and `responses.unit.spec.js` to streamline test setup and reduce duplication.
- Ensured consistency in agent mock implementations across test files.

* fix: update types in data-schemas

* refactor: enhance type definitions in transaction and spending methods

- Updated type definitions in `checkBalance.ts` to use specific request and response types.
- Refined `spendTokens.ts` to utilize a new `SpendTxData` interface for better clarity and type safety.
- Improved transaction handling in `transaction.ts` by introducing `TransactionResult` and `TxData` interfaces, ensuring consistent data structures across methods.
- Adjusted unit tests in `transaction.spec.ts` to accommodate new type definitions and enhance robustness.

* refactor: streamline model imports and enhance code organization

- Consolidated model imports across various controllers and services to a unified import path, improving code clarity and reducing redundancy.
- Updated multiple files to reflect the new import structure, ensuring all functionalities remain intact.
- Enhanced overall code organization by removing duplicate import statements and optimizing the usage of model methods.

* feat: implement loadAddedAgent and refactor agent loading logic

- Introduced `loadAddedAgent` function to handle loading agents from added conversations, supporting multi-convo parallel execution.
- Created a new `load.ts` file to encapsulate agent loading functionalities, including `loadEphemeralAgent` and `loadAgent`.
- Updated the `index.ts` file to export the new `load` module instead of the deprecated `loadAgent`.
- Enhanced type definitions and improved error handling in the agent loading process.
- Adjusted unit tests to reflect changes in the agent loading structure and ensure comprehensive coverage.

* refactor: enhance balance handling with new update interface

- Introduced `IBalanceUpdate` interface to streamline balance update operations across the codebase.
- Updated `upsertBalanceFields` method signatures in `balance.ts`, `transaction.ts`, and related tests to utilize the new interface for improved type safety.
- Adjusted type imports in `balance.spec.ts` to include `IBalanceUpdate`, ensuring consistency in balance management functionalities.
- Enhanced overall code clarity and maintainability by refining type definitions related to balance operations.

* feat: add unit tests for loadAgent functionality and enhance agent loading logic

- Introduced comprehensive unit tests for the `loadAgent` function, covering various scenarios including null and empty agent IDs, loading of ephemeral agents, and permission checks.
- Enhanced the `initializeClient` function by moving `getConvoFiles` to the correct position in the database method exports, ensuring proper functionality.
- Improved test coverage for agent loading, including handling of non-existent agents and user permissions.

* chore: reorder memory method exports for consistency

- Moved `deleteAllUserMemories` to the correct position in the exported memory methods, ensuring a consistent and logical order of method exports in `memory.ts`.
2026-03-21 14:28:53 -04:00
Danny Avila
381ed8539b
🪪 fix: Enforce Conversation Ownership Checks in Remote Agent Controllers (#12263)
* 🔒 fix: Validate conversation ownership in remote agent API endpoints

Add user-scoped ownership checks for client-supplied conversation IDs
in OpenAI-compatible and Open Responses controllers to prevent
cross-tenant file/message loading via IDOR.

* 🔒 fix: Harden ownership checks against type confusion and unhandled errors

- Add typeof string validation before getConvo to block NoSQL operator
  injection (e.g. { "$gt": "" }) bypassing the ownership check
- Move ownership checks inside try/catch so DB errors produce structured
  JSON error responses instead of unhandled promise rejections
- Add string type validation for conversation_id and previous_response_id
  in the upstream TS request validators (defense-in-depth)
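
A sketch of the checks listed above, under stated assumptions: the `getConvo(userId, conversationId)` signature, the result shapes, and the error messages are all hypothetical. Only the status codes (400 for a non-string ID, 404 for an unowned conversation, 500 for a DB error) come from the tests described in this commit.

```typescript
type IdCheck =
  | { kind: "absent" }
  | { kind: "valid"; id: string }
  | { kind: "invalid"; status: 400; message: string };

// Rejects anything that is not a string, so an object such as
// { "$gt": "" } never reaches the database query.
function checkConversationId(value: unknown): IdCheck {
  if (value === undefined) {
    return { kind: "absent" };
  }
  if (typeof value !== "string") {
    return { kind: "invalid", status: 400, message: "conversation_id must be a string" };
  }
  return { kind: "valid", id: value };
}

// Assumed lookup signature: resolves to null when the user does not own the conversation.
type GetConvo = (userId: string, conversationId: string) => Promise<object | null>;

async function resolveOwnership(
  userId: string,
  value: unknown,
  getConvo: GetConvo,
): Promise<{ status: number; message?: string }> {
  // The lookup sits inside try/catch so a DB failure becomes a structured 500.
  try {
    const check = checkConversationId(value);
    if (check.kind === "absent") {
      return { status: 200 };
    }
    if (check.kind === "invalid") {
      return { status: check.status, message: check.message };
    }
    const convo = await getConvo(userId, check.id);
    return convo ? { status: 200 } : { status: 404, message: "Conversation not found" };
  } catch {
    return { status: 500, message: "Failed to verify conversation ownership" };
  }
}
```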

* 🧪 test: Add coverage for conversation ownership validation in remote agent APIs

- Fix broken getConvo mock in openai.spec.js (was missing entirely)
- Add tests for: owned conversation, unowned (404), non-string type (400),
  absent conversation_id (skipped), and DB error (500) — both controllers
2026-03-16 09:19:48 -04:00
Danny Avila
6f87b49df8
🛂 fix: Enforce Actions Capability Gate Across All Event-Driven Tool Loading Paths (#12252)
* fix: gate action tools by actions capability in all code paths

Extract resolveAgentCapabilities helper to eliminate 3x-duplicated
capability resolution. Apply early action-tool filtering in both
loadToolDefinitionsWrapper and loadAgentTools non-definitions path.
Gate loadActionToolsForExecution in loadToolsForExecution behind an
actionsEnabled parameter with a cache-based fallback. Replace the
late capability guard in loadAgentTools with a hasActionTools check
to avoid unnecessary loadActionSets DB calls and duplicate warnings.

* fix: thread actionsEnabled through InitializedAgent type

Add actionsEnabled to the loadTools callback return type,
InitializedAgent, and the initializeAgent destructuring/return
so callers can forward the resolved value to loadToolsForExecution
without redundant getEndpointsConfig cache lookups.

* fix: pass actionsEnabled from callers to loadToolsForExecution

Thread actionsEnabled through the agentToolContexts map in
initialize.js (primary and handoff agents) and through
primaryConfig in the openai.js and responses.js controllers,
avoiding per-tool-call capability re-resolution on the hot path.

* test: add regression tests for action capability gating

Test the real exported functions (resolveAgentCapabilities,
loadAgentTools, loadToolsForExecution) with mocked dependencies
instead of shadow re-implementations. Covers definition filtering,
execution gating, actionsEnabled param forwarding, and fallback
capability resolution.

* test: use Constants.EPHEMERAL_AGENT_ID in ephemeral fallback test

Replaces a string guess with the canonical constant to avoid
fragility if the ephemeral detection heuristic changes.

* fix: populate agentToolContexts for addedConvo parallel agents

After processAddedConvo returns, backfill agentToolContexts for
any agents in agentConfigs not already present, so ON_TOOL_EXECUTE
for added-convo agents receives actionsEnabled instead of falling
back to a per-call cache lookup.
2026-03-15 23:01:36 -04:00
Danny Avila
e1e204d6cf
🧮 refactor: Bulk Transactions & Balance Updates for Token Spending (#11996)
* refactor: integrate pricing and bulk write operations into transaction handling

- Updated `recordCollectedUsage` to accept pricing functions and bulk write operations, improving transaction management.
- Refactored `AgentClient` and related controllers to utilize the new transaction handling capabilities, ensuring better performance and accuracy in token spending.
- Added tests to validate the new functionality, ensuring correct behavior for both standard and bulk transaction paths.
- Introduced a new `transactions.ts` file to encapsulate transaction-related logic and types, enhancing code organization and maintainability.

* chore: reorganize imports in agents client controller

- Moved `getMultiplier` and `getCacheMultiplier` imports to maintain consistency and clarity in the import structure.
- Removed duplicate import of `updateBalance` and `bulkInsertTransactions`, streamlining the code for better readability.

* refactor: add TransactionData type and CANCEL_RATE constant to data-schemas

Establishes a single source of truth for the transaction document shape
and the incomplete-context billing rate constant, both consumed by
packages/api and api/.

* refactor: use proper types in data-schemas transaction methods

- Replace `as unknown as { tokenCredits }` with `lean<IBalance>()`
- Use `TransactionData[]` instead of `Record<string, unknown>[]`
  for bulkInsertTransactions parameter
- Add JSDoc noting insertMany bypasses document middleware
- Remove orphan section comment in methods/index.ts

* refactor: use shared types in transactions.ts, fix bulk write logic

- Import CANCEL_RATE from data-schemas instead of local duplicate
- Import TransactionData from data-schemas for PreparedEntry/BulkWriteDeps
- Use tilde alias for EndpointTokenConfig import
- Pass valueKey through to getMultiplier
- Only sum tokenValue for balance-enabled docs in bulkWriteTransactions
- Consolidate two loops into single-pass map

* refactor: remove duplicate updateBalance from Transaction.js

Import updateBalance from ~/models (sourced from data-schemas) instead
of maintaining a second copy. Also import CANCEL_RATE from data-schemas
and remove the Balance model import (no longer needed directly).

* fix: test real spendCollectedUsage instead of IIFE replica

Export spendCollectedUsage from abortMiddleware.js and rewrite the test
file to import and test the actual function. Previously the tests ran
against a hand-written replica that could silently diverge from the real
implementation.

* test: add transactions.spec.ts and restore regression comments

Add 22 direct unit tests for transactions.ts financial logic covering
prepareTokenSpend, prepareStructuredTokenSpend, bulkWriteTransactions,
CANCEL_RATE paths, NaN guards, disabled transactions, zero tokens,
cache multipliers, and balance-enabled filtering.

Restore critical regression documentation comments in
recordCollectedUsage.spec.js explaining which production bugs the
tests guard against.

* fix: widen setValues type to include lastRefill

The UpdateBalanceParams.setValues type was Partial<Pick<IBalance,
'tokenCredits'>> which excluded lastRefill — used by
createAutoRefillTransaction. Widen to also pick 'lastRefill'.

* test: use real MongoDB for bulkWriteTransactions tests

Replace mock-based bulkWriteTransactions tests with real DB tests using
MongoMemoryServer. Pure function tests (prepareTokenSpend,
prepareStructuredTokenSpend) remain mock-based since they don't touch
DB. Add end-to-end integration tests that verify the full prepare →
bulk write → DB state pipeline with real Transaction and Balance models.

* chore: update @librechat/agents dependency to version 3.1.54 in package-lock.json and related package.json files

* test: add bulk path parity tests proving identical DB outcomes

Three test suites proving the bulk path (prepareTokenSpend/
prepareStructuredTokenSpend + bulkWriteTransactions) produces
numerically identical results to the legacy path for all scenarios:

- usage.bulk-parity.spec.ts: mirrors all legacy recordCollectedUsage
  tests; asserts same return values and verifies metadata fields on
  the insertMany docs match what spendTokens args would carry

- transactions.bulk-parity.spec.ts: real-DB tests using actual
  getMultiplier/getCacheMultiplier pricing functions; asserts exact
  tokenValue, rate, rawAmount and balance deductions for standard
  tokens, structured/cache tokens, CANCEL_RATE, premium pricing,
  multi-entry batches, and edge cases (NaN, zero, disabled)

- Transaction.spec.js: adds describe('Bulk path parity') that mirrors
  7 key legacy tests via recordCollectedUsage + bulk deps against
  real MongoDB, asserting same balance deductions and doc counts

* refactor: update llmConfig structure to use modelKwargs for reasoning effort

Refactor the llmConfig in getOpenAILLMConfig to store reasoning effort within modelKwargs instead of directly on llmConfig. This change ensures consistency in the configuration structure and improves clarity in the handling of reasoning properties in the tests.

* test: update performance checks in processAssistantMessage tests

Revise the performance assertions in the processAssistantMessage tests to ensure that each message processing time remains under 100ms, addressing potential ReDoS vulnerabilities. This change enhances the reliability of the tests by focusing on maximum processing time rather than relative ratios.

* test: fill parity test gaps — model fallback, abort context, structured edge cases

- usage.bulk-parity: add undefined model fallback test
- transactions.bulk-parity: add abort context test (txns inserted,
  balance unchanged when balance not passed), fix readTokens type cast
- Transaction.spec: add 3 missing mirrors — balance disabled with
  transactions enabled, structured transactions disabled, structured
  balance disabled

* fix: deduct balance before inserting transactions to prevent orphaned docs

Swap the order in bulkWriteTransactions: updateBalance runs before
insertMany. If updateBalance fails (after exhausting retries), no
transaction documents are written — avoiding the inconsistent state
where transactions exist in MongoDB with no corresponding balance
deduction.
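
The ordering can be sketched as follows. `bulkWriteSketch`, `BulkWriteDeps`, and the `TxDoc` shape are hypothetical stand-ins; the real `bulkWriteTransactions` signature and retry logic are not shown in this log.

```typescript
// Hypothetical document and dependency shapes.
interface TxDoc {
  user: string;
  tokenValue: number;
}

interface BulkWriteDeps {
  updateBalance: (user: string, incrementValue: number) => Promise<void>;
  insertMany: (docs: TxDoc[]) => Promise<void>;
}

async function bulkWriteSketch(user: string, docs: TxDoc[], deps: BulkWriteDeps): Promise<void> {
  if (docs.length === 0) {
    return;
  }
  let total = 0;
  for (const doc of docs) {
    total += doc.tokenValue;
  }
  // Deduct first: if this rejects, the insert below never runs,
  // so no transaction docs exist without a matching deduction.
  await deps.updateBalance(user, total);
  await deps.insertMany(docs);
}
```

With this order, the remaining failure mode is a deduction without transaction documents if the insert fails.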

* chore: import order

* test: update config.spec.ts for OpenRouter reasoning in modelKwargs

Same fix as llm.spec.ts — OpenRouter reasoning is now passed via
modelKwargs instead of llmConfig.reasoning directly.
2026-03-01 12:26:36 -05:00
Danny Avila
8b159079f5
🪙 feat: Add messageId to Transactions (#11987)
* feat: Add messageId to transactions

* chore: field order

* feat: Enhance token usage tracking by adding messageId parameter

- Updated `recordTokenUsage` method in BaseClient to accept a new `messageId` parameter for improved tracking.
- Propagated `messageId` in the AgentClient when recording usage.
- Added tests to ensure `messageId` is correctly passed and handled in various scenarios, including propagation across multiple usage entries.

* chore: Correct field order in createGeminiImageTool function

- Moved the conversationId field to its correct position in the object passed to the recordTokenUsage method.

* refactor: Update OpenAIChatCompletionController and createResponse to use responseId instead of requestId

- Replaced instances of requestId with responseId in the OpenAIChatCompletionController for improved clarity in logging and tracking.
- Updated createResponse to include responseId in the requestBody, ensuring consistency across the handling of message identifiers.

* test: Add messageId to agent client tests

- Included messageId in the agent client tests to ensure proper handling and propagation of message identifiers during transaction recording.
- This update enhances the test coverage for scenarios involving messageId, aligning with recent changes in the tracking of message identifiers.

* fix: Update OpenAIChatCompletionController to use requestId for context

- Changed the context object in OpenAIChatCompletionController to use `requestId` instead of `responseId` for improved clarity and consistency in handling request identifiers.

* chore: field order
2026-02-27 23:50:13 -05:00
Danny Avila
5ea59ecb2b
🐛 fix: Normalize output_text blocks in Responses API input conversion (#11835)
* 🐛 fix: Normalize `output_text` blocks in Responses API input conversion

Treat `output_text` content blocks the same as `input_text` when
converting Responses API input to internal message format. Previously,
assistant messages containing `output_text` blocks fell through to the
default handler, producing `{ type: 'output_text' }` without a `text`
field, which caused downstream provider adapters (e.g. Bedrock) to fail
with "Unsupported content block type: output_text".
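
A minimal sketch of the normalization, assuming the internal format uses `{ type: 'text', text }` blocks. `InputBlock`, `TextBlock`, and `normalizeTextBlock` are hypothetical names; the real `convertInputToMessages` handles more cases.

```typescript
// Hypothetical block shapes.
type InputBlock = { type: string; text?: string };
type TextBlock = { type: "text"; text: string };

function normalizeTextBlock(block: InputBlock): TextBlock | InputBlock {
  // output_text is treated exactly like input_text, so the text field
  // survives instead of falling through to a default handler.
  if (block.type === "input_text" || block.type === "output_text") {
    return { type: "text", text: block.text ?? "" };
  }
  return block;
}
```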

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: Remove ChatModelStreamHandler from OpenAI and Responses controllers

Eliminated the ChatModelStreamHandler from both OpenAIChatCompletionController and createResponse functions to streamline event handling. This change simplifies the code by relying on existing handlers for message deltas and reasoning deltas, enhancing maintainability and reducing complexity in the agent's event processing logic.

* feat: Enhance input conversion in Responses API

Updated the `convertInputToMessages` function to handle additional content types, including `input_file` and `refusal` blocks, ensuring they are converted to appropriate message formats. Implemented null filtering for content arrays and default values for missing fields, improving robustness. Added comprehensive unit tests to validate these changes and ensure correct behavior across various input scenarios.

* fix: Forward upstream provider status codes in error responses

Updated error handling in OpenAIChatCompletionController and createResponse functions to forward upstream provider status codes (e.g., Anthropic 400s) instead of masking them as 500. This change improves error reporting by providing more accurate status codes and error types, enhancing the clarity of error responses for clients.
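
One way this forwarding could look, assuming upstream errors expose a numeric `status` property. That property name and the `resolveErrorStatus` helper are assumptions for illustration, not the controllers' actual code.

```typescript
// Returns the upstream HTTP status when one is present and is an error code,
// otherwise falls back to 500.
function resolveErrorStatus(error: unknown): number {
  if (typeof error === "object" && error !== null) {
    const status = (error as { status?: unknown }).status;
    if (typeof status === "number" && status >= 400 && status < 600) {
      return status;
    }
  }
  return 500;
}
```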

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 22:34:19 -05:00
Danny Avila
9a38af5875
📉 feat: Add Token Usage Tracking for Agents API Routes (#11600)
* feat: Implement token usage tracking for OpenAI and Responses controllers

- Added functionality to record token usage against user balances in OpenAIChatCompletionController and createResponse functions.
- Introduced new utility functions for managing token spending and structured token usage.
- Enhanced error handling for token recording to improve logging and debugging capabilities.
- Updated imports to include new usage tracking methods and configurations.

* test: Add unit tests for recordCollectedUsage function in usage.spec.ts

- Introduced comprehensive tests for the recordCollectedUsage function, covering various scenarios including handling empty and null collectedUsage, single and multiple usage entries, and sequential and parallel execution cases.
- Enhanced token handling tests to ensure correct calculations for both OpenAI and Anthropic formats, including cache token management.
- Improved overall test coverage for usage tracking functionality, ensuring robust validation of expected behaviors and outcomes.

* test: Add unit tests for OpenAI and Responses API controllers

- Introduced comprehensive unit tests for the OpenAIChatCompletionController and createResponse functions, focusing on the correct invocation of recordCollectedUsage for token spending.
- Enhanced tests to validate the passing of balance and transactions configuration to the recordCollectedUsage function.
- Ensured proper dependency injection of spendTokens and spendStructuredTokens in the usage recording process.
- Improved overall test coverage for token usage tracking, ensuring robust validation of expected behaviors and outcomes.
2026-02-01 21:36:51 -05:00
Danny Avila
5af1342dbb
🦥 refactor: Event-Driven Lazy Tool Loading (#11588)
* refactor: json schema tools with lazy loading

- Added LocalToolExecutor class for lazy loading and caching of tools during execution.
- Introduced ToolExecutionContext and ToolExecutor interfaces for better type management.
- Created utility functions to generate tool proxies with JSON schema support.
- Added ExtendedJsonSchema type for enhanced schema definitions.
- Updated existing toolkits to utilize the new schema and executor functionalities.
- Introduced a comprehensive tool definitions registry for managing various tool schemas.

chore: update @librechat/agents to version 3.1.2

refactor: enhance tool loading optimization and classification

- Improved the loadAgentToolsOptimized function to utilize a proxy pattern for all tools, enabling deferred execution and reducing overhead.
- Introduced caching for tool instances and refined tool classification logic to streamline tool management.
- Updated the handling of MCP tools to improve logging and error reporting for missing tools in the cache.
- Enhanced the structure of tool definitions to support better classification and integration with existing tools.

refactor: modularize tool loading and enhance optimization

- Moved the loadAgentToolsOptimized function to a new service file for better organization and maintainability.
- Updated the ToolService to utilize the new service for optimized tool loading, improving code clarity.
- Removed legacy tool loading methods and streamlined the tool loading process to enhance performance and reduce complexity.
- Introduced feature flag handling for optimized tool loading, allowing for easier toggling of this functionality.

refactor: replace loadAgentToolsWithFlag with loadAgentTools in tool loader

refactor: enhance MCP tool loading with proxy creation and classification

refactor: optimize MCP tool loading by grouping tools by server

- Introduced a Map to group cached tools by server name, improving the organization of tool data.
- Updated the createMCPProxyTool function to accept server name directly, enhancing clarity.
- Refactored the logic for handling MCP tools, streamlining the process of creating proxy tools for classification.
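
A small sketch of grouping cached tools by server name with a `Map`. The `CachedTool` shape and `groupToolsByServer` are hypothetical; the actual cache entries are not shown in this log.

```typescript
// Hypothetical cached-tool shape.
interface CachedTool {
  name: string;
  serverName: string;
}

function groupToolsByServer(tools: CachedTool[]): Map<string, CachedTool[]> {
  const byServer = new Map<string, CachedTool[]>();
  for (const tool of tools) {
    const group = byServer.get(tool.serverName);
    if (group) {
      group.push(tool);
    } else {
      byServer.set(tool.serverName, [tool]);
    }
  }
  return byServer;
}
```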

refactor: enhance MCP tool loading and proxy creation

- Added functionality to retrieve MCP server tools and reinitialize servers if necessary, improving tool availability.
- Updated the tool loading logic to utilize a Map for organizing tools by server, enhancing clarity and performance.
- Refactored the createToolProxy function to ensure a default response format, streamlining tool creation.

refactor: update createToolProxy to ensure consistent response format

- Modified the createToolProxy function to await the executor's execution and validate the result format.
- Ensured that the function returns a default response structure when the result is not an array of two elements, enhancing reliability in tool proxy creation.
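
A sketch of the result check, assuming the expected format is a two-element `[content, artifact]` tuple and that the default wraps the raw result with an undefined artifact. Both the tuple meaning and the default are assumptions; the commit only states that results that are not two-element arrays get a default structure.

```typescript
type ToolProxyResult = [unknown, unknown];

// Passes two-element arrays through and wraps anything else in a default tuple.
function normalizeToolResult(result: unknown): ToolProxyResult {
  if (Array.isArray(result) && result.length === 2) {
    return [result[0], result[1]];
  }
  return [result, undefined];
}

// Awaits the executor before validating, as the commit describes.
async function runThroughProxy(execute: () => Promise<unknown>): Promise<ToolProxyResult> {
  const result = await execute();
  return normalizeToolResult(result);
}
```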

refactor: add toolCall property to ToolExecutionContext

- Added toolCall property to ToolExecutionContext interface for improved context handling during tool execution.
- Updated LocalToolExecutor to include toolCall in the runnable configuration, allowing for more flexible tool invocation.
- Modified createToolProxy to pass toolCall from the configuration, ensuring consistent context across tool executions.

refactor: enhance event-driven tool execution and logging

- Introduced ToolExecuteOptions for improved handling of event-driven tool execution, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to include support for ON_TOOL_EXECUTE events, enhancing the flexibility of tool invocation.
- Added detailed logging in LocalToolExecutor to track tool loading and execution metrics, improving observability and debugging capabilities.
- Refactored initializeClient to integrate event-driven tool loading, ensuring compatibility with the new execution model.

chore: update @librechat/agents to version 3.1.21

refactor: remove legacy tool loading and executor components

- Eliminated the loadAgentToolsWithFlag function, simplifying the tool loading process by directly using loadAgentTools.
- Removed the LocalToolExecutor and related executor components to streamline the tool execution architecture.
- Updated ToolService and related files to reflect the removal of deprecated features, enhancing code clarity and maintainability.

refactor: enhance tool classification and definitions handling

- Updated the loadAgentTools function to return toolDefinitions alongside toolRegistry, improving the structure of tool data returned to clients.
- Removed the convertRegistryToDefinitions function from the initialize.js file, simplifying the initialization process.
- Adjusted the buildToolClassification function to ensure toolDefinitions are built and returned simultaneously with the toolRegistry, enhancing efficiency in tool management.
- Updated type definitions in initialize.ts to include toolDefinitions, ensuring consistency across the codebase.

refactor: implement event-driven tool execution handler

- Introduced createToolExecuteHandler function to streamline the handling of ON_TOOL_EXECUTE events, allowing for parallel execution of tool calls.
- Updated getDefaultHandlers to utilize the new handler, simplifying the event-driven architecture.
- Added handlers.ts file to encapsulate tool execution logic, improving code organization and maintainability.
- Enhanced OpenAI handlers to integrate the new tool execution capabilities, ensuring consistent event handling across the application.

refactor: integrate event-driven tool execution options

- Added toolExecuteOptions to support event-driven tool execution in OpenAI and responses controllers, enhancing flexibility in tool handling.
- Updated handlers to utilize createToolExecuteHandler, allowing for streamlined execution of tools during agent interactions.
- Refactored service dependencies to include toolExecuteOptions, ensuring consistent integration across the application.

refactor: enhance tool loading with definitionsOnly parameter

- Updated createToolLoader and loadAgentTools functions to include a definitionsOnly parameter, allowing for the retrieval of only serializable tool definitions in event-driven mode.
- Adjusted related interfaces and documentation to reflect the new parameter, improving clarity and flexibility in tool management.
- Ensured compatibility across various components by integrating the definitionsOnly option in the initialization process.

refactor: improve agent tool presence check in initialization

- Added a check for tool presence using a new hasAgentTools variable, which evaluates both structuredTools and toolDefinitions.
- Updated the conditional logic in the agent initialization process to utilize the hasAgentTools variable, enhancing clarity and maintainability in tool management.

refactor: enhance agent tool extraction to support tool definitions

- Updated the extractMCPServers function to handle both tool instances and serializable tool definitions, improving flexibility in agent tool management.
- Added a new property toolDefinitions to the AgentWithTools type for better integration of event-driven mode.
- Enhanced documentation to clarify the function's capabilities in extracting unique MCP server names from both tools and tool definitions.

refactor: enhance tool classification and registry building

- Added serverName property to ToolDefinition for improved tool identification.
- Introduced buildToolRegistry function to streamline the creation of tool registries based on MCP tool definitions and agent options.
- Updated buildToolClassification to utilize the new registry building logic, ensuring basic definitions are returned even when advanced classification features are not allowed.
- Enhanced documentation and logging for clarity in tool classification processes.

refactor: update @librechat/agents dependency to version 3.1.22

fix: expose loadTools function in ToolService

- Added loadTools function to the exported module in ToolService.js, enhancing the accessibility of tool loading functionality.

chore: remove configurable options from tool execute options in OpenAI controller

refactor: enhance tool loading mechanism to utilize agent-specific context

chore: update @librechat/agents dependency to version 3.1.23

fix: simplify result handling in createToolExecuteHandler

* refactor: loadToolDefinitions for efficient tool loading in event-driven mode

* refactor: replace legacy tool loading with loadToolsForExecution in OpenAI and responses controllers

- Updated OpenAIChatCompletionController and createResponse functions to utilize loadToolsForExecution for improved tool loading.
- Removed deprecated loadToolsLegacy references, streamlining the tool execution process.
- Enhanced tool loading options to include agent-specific context and configurations.

* refactor: enhance tool loading and execution handling

- Introduced loadActionToolsForExecution function to streamline loading of action tools, improving organization and maintainability.
- Updated loadToolsForExecution to handle both regular and action tools, optimizing the tool loading process.
- Added detailed logging for missing tools in createToolExecuteHandler, enhancing error visibility.
- Refactored tool definitions to normalize action tool names, improving consistency in tool management.

* refactor: enhance built-in tool definitions loading

- Updated loadToolDefinitions to include descriptions and parameters from the tool registry for built-in tools, improving the clarity and usability of tool definitions.
- Integrated getToolDefinition to streamline the retrieval of tool metadata, enhancing the overall tool management process.

* feat: add action tool definitions loading to tool service

- Introduced getActionToolDefinitions function to load action tool definitions based on agent ID and tool names, enhancing the tool loading process.
- Updated loadToolDefinitions to integrate action tool definitions, allowing for better management and retrieval of action-specific tools.
- Added comprehensive tests for action tool definitions to ensure correct loading and parameter handling, improving overall reliability and functionality.

* chore: update @librechat/agents dependency to version 3.1.26

* refactor: add toolEndCallback to handle tool execution results

* fix: tool definitions and execution handling

- Introduced native tools (execute_code, file_search, web_search) to the tool service, allowing for better integration and management of these tools.
- Updated isBuiltInTool function to include native tools in the built-in check, improving tool recognition.
- Added comprehensive tests for loading parameters of native tools, ensuring correct functionality and parameter handling.
- Enhanced tool definitions registry to include new agent tool definitions, streamlining tool retrieval and management.

* refactor: enhance tool loading and execution context

- Added toolRegistry to the context for OpenAIChatCompletionController and createResponse functions, improving tool management.
- Updated loadToolsForExecution to utilize toolRegistry for better integration of programmatic tools and tool search functionalities.
- Enhanced the initialization process to include toolRegistry in agent context, streamlining tool access and configuration.
- Refactored tool classification logic to support event-driven execution, ensuring compatibility with new tool definitions.

* chore: add request duration logging to OpenAI and Responses controllers

- Introduced logging for request start and completion times in OpenAIChatCompletionController and createResponse functions.
- Calculated and logged the duration of each request, enhancing observability and performance tracking.
- Improved debugging capabilities by providing detailed logs for both streaming and non-streaming responses.
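
The duration logging described above might look roughly like the following sketch. `startRequestTimer`, the log format, and the injectable clock are hypothetical names chosen for illustration, not the controllers' actual code.

```typescript
type Clock = () => number;
type Log = (message: string) => void;

// Hypothetical helper: logs the request start, then returns a function
// that logs completion and returns the elapsed time in milliseconds.
function startRequestTimer(label: string, log: Log, now: Clock = Date.now): () => number {
  const startedAt = now();
  log(`[${label}] request started`);
  return function finish(): number {
    const durationMs = now() - startedAt;
    log(`[${label}] request completed in ${durationMs}ms`);
    return durationMs;
  };
}
```

Because `finish` is called explicitly, the same helper can cover a streaming response (call it when the stream ends) and a non-streaming one (call it just before sending the body).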

* chore: update @librechat/agents dependency to version 3.1.27

* refactor: implement buildToolSet function for tool management

- Introduced buildToolSet function to streamline the creation of tool sets from agent configurations, enhancing tool management across various controllers.
- Updated AgentClient, OpenAIChatCompletionController, and createResponse functions to utilize buildToolSet, improving consistency in tool handling.
- Added comprehensive tests for buildToolSet to ensure correct functionality and edge case handling, enhancing overall reliability.

* refactor: update import paths for ToolExecuteOptions and createToolExecuteHandler

* fix: update GoogleSearch.js description for maximum search results

- Changed the default maximum number of search results from 10 to 5 in the Google Search JSON schema description, ensuring accurate documentation of the expected behavior.

* chore: remove deprecated Browser tool and associated assets

- Deleted the Browser tool definition from manifest.json, which included its name, plugin key, description, and authentication configuration.
- Removed the web-browser.svg asset as it is no longer needed following the removal of the Browser tool.

* fix: ensure tool definitions are valid before processing

- Added a check to verify the existence of tool definitions in the registry before accessing their properties, preventing potential runtime errors.
- Updated the loading logic for built-in tool definitions to ensure that only valid definitions are pushed to the built-in tool definitions array.

* fix: extend ExtendedJsonSchema to support 'null' type and nullable enums

- Updated the ExtendedJsonSchema type to include 'null' as a valid type option.
- Modified the enum property to accept an array of values that can include strings, numbers, booleans, and null, enhancing schema flexibility.
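
As a rough illustration of the schema change above, a simplified stand-in type could look like this. `SketchJsonSchema` and `matchesEnum` are hypothetical names, and the real `ExtendedJsonSchema` type likely carries more fields.

```typescript
// Values an enum may hold after the change: strings, numbers, booleans, and null.
type JsonPrimitive = string | number | boolean | null;

// Hypothetical, simplified stand-in for ExtendedJsonSchema.
interface SketchJsonSchema {
  type?: 'string' | 'number' | 'integer' | 'boolean' | 'object' | 'array' | 'null';
  enum?: JsonPrimitive[];
  description?: string;
}

// Returns true when `value` is one of the schema's enum members.
// Strict equality keeps 1, "1", true, and null distinct from each other.
function matchesEnum(schema: SketchJsonSchema, value: unknown): boolean {
  if (schema.enum === undefined) {
    return true; // no enum constraint to check
  }
  return schema.enum.some((member) => member === value);
}

// A nullable, mixed-type enum of the kind the updated type allows.
const mixed: SketchJsonSchema = { enum: ['auto', 1, false, null] };
```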

* test: add comprehensive tests for tool definitions loading and registry behavior

- Implemented tests to verify the handling of built-in tools without registry definitions, ensuring they are skipped correctly.
- Added tests to confirm that built-in tools include descriptions and parameters in the registry.
- Enhanced tests for action tools, checking for proper inclusion of metadata and handling of tools without parameters in the registry.

* test: add tests for mixed-type and number enum schema handling

- Introduced tests to validate the parsing of mixed-type enum values, including strings, numbers, booleans, and null.
- Added tests for number enum schema values to ensure correct parsing of numeric inputs, enhancing schema validation coverage.

* fix: update mock implementation for @librechat/agents

- Changed the mock for @librechat/agents to spread the actual module's properties, ensuring that all necessary functionalities are preserved in tests.
- This adjustment enhances the accuracy of the tests by reflecting the real structure of the module.

* fix: change max_results type in GoogleSearch schema from number to integer

- Updated the type of max_results in the Google Search JSON schema to 'integer' for better type accuracy and validation consistency.

* fix: update max_results description and type in GoogleSearch schema

- Changed the type of max_results from 'number' to 'integer' for improved type accuracy.
- Updated the description to reflect the new default maximum number of search results, changing it from 10 to 5.

* refactor: remove unused code and improve tool registry handling

- Eliminated outdated comments and conditional logic related to event-driven mode in the ToolService.
- Enhanced the handling of the tool registry by ensuring it is configurable for better integration during tool execution.

* feat: add definitionsOnly option to buildToolClassification for event-driven mode

- Introduced a new parameter, definitionsOnly, to the BuildToolClassificationParams interface to enable a mode that skips tool instance creation.
- Updated the buildToolClassification function to conditionally add tool definitions without instantiating tools when definitionsOnly is true.
- Modified the loadToolDefinitions function to pass definitionsOnly as true, ensuring compatibility with the new feature.
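
The idea behind `definitionsOnly` can be sketched as follows. Only the flag name comes from the commit; `classifyTools` and the surrounding types are hypothetical and much simpler than the real `buildToolClassification`.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
}

interface ToolInstance {
  name: string;
  invoke(input: string): string;
}

interface ClassificationParams {
  definitions: ToolDefinition[];
  // Hypothetical factory standing in for whatever instantiates a tool.
  createInstance: (def: ToolDefinition) => ToolInstance;
  // When true, collect definitions but skip instance creation.
  definitionsOnly?: boolean;
}

interface ClassificationResult {
  definitions: ToolDefinition[];
  instances: ToolInstance[];
}

function classifyTools(params: ClassificationParams): ClassificationResult {
  const result: ClassificationResult = { definitions: [], instances: [] };
  for (const def of params.definitions) {
    result.definitions.push(def);
    if (params.definitionsOnly !== true) {
      // Instantiation (and anything it needs, such as auth lookups) only runs here.
      result.instances.push(params.createInstance(def));
    }
  }
  return result;
}
```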

* test: add unit tests for buildToolClassification with definitionsOnly option

- Implemented tests to verify the behavior of buildToolClassification when definitionsOnly is set to true or false.
- Ensured that tool instances are not created when definitionsOnly is true, while still adding necessary tool definitions.
- Confirmed that loadAuthValues is called appropriately based on the definitionsOnly parameter, enhancing test coverage for this new feature.
2026-02-01 08:50:57 -05:00
Danny Avila
6279ea8dd7
🛸 feat: Remote Agent Access with External API Support (#11503)
* 🪪 feat: Microsoft Graph Access Token Placeholder for MCP Servers (#10867)

* feat: MCP Graph Token env var

* Addressing copilot remarks

* Addressed Copilot review remarks

* Fixed graphtokenservice mock in MCP test suite

* fix: remove unnecessary type check and cast in resolveGraphTokensInRecord

* ci: add Graph Token integration tests in MCPManager

* refactor: update user type definitions to use Partial<IUser> in multiple functions

* test: enhance MCP tests for graph token processing and user placeholder resolution

- Added comprehensive tests to validate the interaction between preProcessGraphTokens and processMCPEnv.
- Ensured correct resolution of graph tokens and user placeholders in various configurations.
- Mocked OIDC utilities to facilitate testing of token extraction and validation.
- Verified that original options remain unchanged after processing.

* chore: import order

* chore: imports

---------

Co-authored-by: Danny Avila <danny@librechat.ai>

* WIP: OpenAI-compatible API for LibreChat agents

- Added OpenAIChatCompletionController for handling chat completions.
- Introduced ListModelsController and GetModelController for listing and retrieving agent details.
- Created routes for OpenAI API endpoints, including /v1/chat/completions and /v1/models.
- Developed event handlers for streaming responses in OpenAI format.
- Implemented request validation and error handling for API interactions.
- Integrated content aggregation and response formatting to align with OpenAI specifications.

This commit establishes a foundational API for interacting with LibreChat agents in a manner compatible with OpenAI's chat completion interface.

* refactor: OpenAI-spec content aggregation for improved performance and clarity

* fix: OpenAI chat completion controller with safe user handling for correct tool loading

* refactor: Remove conversation ID from OpenAI response context and related handlers

* refactor: OpenAI chat completion handling with streaming support

- Introduced a lightweight tracker for streaming responses, allowing for efficient tracking of emitted content and usage metadata.
- Updated the OpenAIChatCompletionController to utilize the new tracker, improving the handling of streaming and non-streaming responses.
- Refactored event handlers to accommodate the new streaming logic, ensuring proper management of tool calls and content aggregation.
- Adjusted response handling to streamline error reporting during streaming sessions.

* WIP: Open Responses API with core service, types, and handlers

- Added Open Responses API module with comprehensive types and enums.
- Implemented core service for processing requests, including validation and input conversion.
- Developed event handlers for streaming responses and non-streaming aggregation.
- Established response building logic and error handling mechanisms.
- Created detailed types for input and output content, ensuring compliance with Open Responses specification.

* feat: Implement response storage and retrieval in Open Responses API

- Added functionality to save user input messages and assistant responses to the database when the `store` flag is set to true.
- Introduced a new endpoint to retrieve stored responses by ID, allowing users to access previous interactions.
- Enhanced the response creation process to include database operations for conversation and message storage.
- Implemented tests to validate the storage and retrieval of responses, ensuring correct behavior for both existing and non-existent response IDs.

* refactor: Open Responses API with additional token tracking and validation

- Added support for tracking cached tokens in response usage, improving token management.
- Updated response structure to include new properties for top log probabilities and detailed usage metrics.
- Enhanced tests to validate the presence and types of new properties in API responses, ensuring compliance with updated specifications.
- Refactored response handling to accommodate new fields and improve overall clarity and performance.

* refactor: Update reasoning event handlers and types for consistency

- Renamed reasoning text events to simplify naming conventions, changing `emitReasoningTextDelta` to `emitReasoningDelta` and `emitReasoningTextDone` to `emitReasoningDone`.
- Updated event types in the API to reflect the new naming, ensuring consistency across the codebase.
- Added `logprobs` property to output events for enhanced tracking of log probabilities.

* feat: Add validation for streaming events in Open Responses API tests

* feat: Implement response.created event in Open Responses API

- Added emitResponseCreated function to emit the response.created event as the first event in the streaming sequence, adhering to the Open Responses specification.
- Updated createResponse function to emit response.created followed by response.in_progress.
- Enhanced tests to validate the order of emitted events, ensuring response.created is triggered before response.in_progress.
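
The event ordering described above can be sketched as below, assuming the stream uses server-sent event framing. `formatEvent`, `emitStreamStart`, and the minimal `{ type }` payload are hypothetical and only illustrate the order; the real events carry a full response object.

```typescript
type Write = (chunk: string) => void;

// Formats one server-sent event frame: an `event:` line, a `data:` line, then a blank line.
// The payload here is reduced to the event type for illustration.
function formatEvent(type: string): string {
  return `event: ${type}\ndata: ${JSON.stringify({ type })}\n\n`;
}

// Emits the opening events of a stream: response.created first,
// followed by response.in_progress.
function emitStreamStart(write: Write): void {
  write(formatEvent('response.created'));
  write(formatEvent('response.in_progress'));
}
```

A test for the ordering can pass a `write` function that pushes each chunk into an array and then assert on the sequence of event types.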

* feat: Responses API with attachment event handling

- Introduced `createResponsesToolEndCallback` to handle attachment events in the Responses API, emitting `librechat:attachment` events as per the Open Responses extension specification.
- Updated the `createResponse` function to utilize the new callback for processing tool outputs and emitting attachments during streaming.
- Added helper functions for writing attachment events and defined types for attachment data, ensuring compatibility with the Open Responses protocol.
- Enhanced tests to validate the integration of attachment events within the Responses API workflow.

* WIP: remote agent auth

* fix: Improve loading state handling in AgentApiKeys component

- Updated the rendering logic to conditionally display loading spinner and API keys based on the loading state.
- Removed unnecessary imports and streamlined the component for better readability.

* refactor: Update API key access handling in routes

- Replaced `checkAccess` with `generateCheckAccess` for improved access control.
- Consolidated access checks into a single `checkApiKeyAccess` function, enhancing code readability and maintainability.
- Streamlined route definitions for creating, listing, retrieving, and deleting API keys.

* fix: Add permission handling for REMOTE_AGENT resource type

* feat: Enhance permission handling for REMOTE_AGENT resources

- Updated the deleteAgent and deleteUserAgents functions to handle permissions for both AGENT and REMOTE_AGENT resource types.
- Introduced new functions to enrich REMOTE_AGENT principals and backfill permissions for AGENT owners.
- Modified createAgentHandler and duplicateAgentHandler to grant permissions for REMOTE_AGENT alongside AGENT.
- Added utility functions for retrieving effective permissions for REMOTE_AGENT resources, ensuring consistent access control across the application.

* refactor: Rename and update roles for remote agent access

- Changed role name from API User to Editor in translation files for clarity.
- Updated default editor role ID from REMOTE_AGENT_USER to REMOTE_AGENT_EDITOR in resource configurations.
- Adjusted role localization to reflect the new Editor role.
- Modified access permissions to align with the updated role definitions across the application.

* feat: Introduce remote agent permissions and update access handling

- Added support for REMOTE_AGENTS in permission schemas, including use, create, share, and share_public permissions.
- Updated the interface configuration to include remote agent settings.
- Modified middleware and API key access checks to align with the new remote agent permission structure.
- Enhanced role defaults to incorporate remote agent permissions, ensuring consistent access control across the application.

* refactor: Update AgentApiKeys component and permissions handling

- Refactored the AgentApiKeys component to improve structure and readability, including the introduction of ApiKeysContent for better separation of concerns.
- Updated CreateKeyDialog to accept an onKeyCreated callback, enhancing its functionality.
- Adjusted permission checks in Data component to use REMOTE_AGENTS and USE permissions, aligning with recent permission schema changes.
- Enhanced loading state handling and dialog management for a smoother user experience.

* refactor: Update remote agent access checks in API routes

- Replaced existing access checks with `generateCheckAccess` for remote agents in the API keys and agents routes.
- Introduced specific permission checks for creating, listing, retrieving, and deleting API keys, enhancing access control.
- Improved code structure by consolidating permission handling for remote agents across multiple routes.

* fix: Correct query parameters in ApiKeysContent component

- Updated the useGetAgentApiKeysQuery call to include an object for the enabled parameter, ensuring proper functionality when the component is open.
- This change improves the handling of API key retrieval based on the component's open state.

* feat: Implement remote agents permissions and update API routes

- Added new API route for updating remote agents permissions, enhancing role management capabilities.
- Introduced remote agents permissions handling in the AgentApiKeys component, including a dedicated settings dialog.
- Updated localization files to include new remote agents permission labels for better user experience.
- Refactored data provider to support remote agents permissions updates, ensuring consistent access control across the application.

* feat: Add remote agents permissions to role schema and interface

- Introduced new permissions for REMOTE_AGENTS in the role schema, including USE, CREATE, SHARE, and SHARE_PUBLIC.
- Updated the IRole interface to reflect the new remote agents permissions structure, enhancing role management capabilities.

* feat: Add remote agents settings button to API keys dialog

* feat: Update AgentFooter to include remote agent sharing permissions

- Refactored access checks to incorporate permissions for sharing remote agents.
- Enhanced conditional rendering logic to allow sharing by users with remote agent permissions.
- Improved loading state handling for remote agent permissions, ensuring a smoother user experience.

* refactor: Update API key creation access check and localization strings

- Replaced the access check for creating API keys to use the existing remote agents access check.
- Updated localization strings to correct the descriptions for remote agent permissions, ensuring clarity in user interface.

* fix: resource permission mapping to include remote agents

- Changed the resourceToPermissionMap to use a Partial<Record> for better flexibility.
- Added mapping for REMOTE_AGENT permissions, enhancing the sharing capabilities for remote agents.

* feat: Implement remote access checks for agent models

- Enhanced ListModelsController and GetModelController to include checks for user permissions on remote agents.
- Integrated findAccessibleResources to filter agents based on VIEW permission for REMOTE_AGENT.
- Updated response handling to ensure users can only access agents they have permissions for, improving security and access control.

* fix: Update user parameter type in processUserPlaceholders function

- Changed the user parameter type in the processUserPlaceholders function from Partial<Partial<IUser>> to Partial<IUser> for improved type clarity and consistency.

* refactor: Simplify integration test structure by removing conditional describe

- Replaced conditional describeWithApiKey with a standard describe for all integration tests in responses.spec.js.
- This change enhances test clarity and ensures all tests are executed consistently, regardless of the SKIP_INTEGRATION_TESTS flag.

* test: Update AgentFooter tests to reflect new grant access dialog ID

- Changed test IDs for the grant access dialog in AgentFooter tests to include the resource type, ensuring accurate identification in the test cases.
- This update improves test clarity and aligns with recent changes in the component's implementation.

* test: Enhance integration tests for Open Responses API

- Updated integration tests in responses.spec.js to utilize an authRequest helper for consistent authorization handling across all test cases.
- Introduced a test user and API key creation to improve test setup and ensure proper permission checks for remote agents.
- Added checks for existing access roles and created necessary roles if they do not exist, enhancing test reliability and coverage.

* feat: Extend accessRole schema to include remoteAgent resource type

- Updated the accessRole schema to add 'remoteAgent' to the resourceType enum, enhancing the flexibility of role assignments and permissions management.

* test: Refactor test setup to use a minimal Express app for responses routes, improving test structure and maintainability

* test: Enhance abort.spec.js by mocking additional modules for improved test isolation

- Updated the test setup in abort.spec.js to include actual implementations of '@librechat/data-schemas' and '@librechat/api' while maintaining mock functionality.
- This change improves test reliability and ensures that the tests are more representative of the actual module behavior.

* refactor: Update conversation ID generation to use UUID

- Replaced nanoid with uuidv4 for generating conversation IDs in the createResponse function, enhancing uniqueness and consistency in ID generation.

* test: Add remote agent access roles to AccessRole model tests

- Included additional access roles for remote agents (REMOTE_AGENT_EDITOR, REMOTE_AGENT_OWNER, REMOTE_AGENT_VIEWER) in the AccessRole model tests to ensure comprehensive coverage of role assignments and permissions management.

* chore: Add deletion of user agent API keys in user deletion process

- Updated the user deletion process in UserController and delete-user.js to include the removal of user agent API keys, ensuring comprehensive cleanup of user data upon account deletion.

* test: Add remote agents permissions to permissions.spec.ts

- Enhanced the permissions tests by including comprehensive permission settings for remote agents across various scenarios, ensuring accurate validation of access controls for remote agent roles.

* chore: Update remote agents translations for clarity and consistency

- Removed outdated remote agents translation entries and added revised entries to improve clarity on API key creation and sharing permissions for remote agents. This enhances user understanding of the available functionalities.

* feat: Add indexing and TTL for agent API keys

- Introduced an index on the `key` field for improved query performance.
- Added a TTL index on the `expiresAt` field to enable automatic cleanup of expired API keys, ensuring efficient management of stored keys.

* chore: Update API route documentation for clarity

- Revised comments in the agents route file to clarify the handling of API key authentication.
- Removed outdated endpoint listings to streamline the documentation and focus on current functionality.

---------

Co-authored-by: Max Sanna <max@maxsanna.com>
2026-01-28 17:44:33 -05:00