Commit graph

4134 commits

Author SHA1 Message Date
Marco Beretta
9a3419e6c7
build: optimize docker runtime image 2026-05-08 13:17:41 +02:00
Marco Beretta
b1b9f74004
chore: ignore codex and agent artifacts 2026-05-08 00:43:22 +02:00
Artyom Bogachenko
5683706af5
🔐 feat: OIDC Bearer Token Authentication for Remote Agent API (#12450)
* Remote Agent Auth middleware

* consider migration and update user

* fix eslint errors

* add scope validation

* fix codex review errors

* add filter for use: sig

* add jwks-rsa deps

* Fix remote agent OIDC auth review findings

* Polish remote agent OIDC timeout coverage

* Reject remote OIDC tokens without subject

* Use tenant context for remote agent auth config

* Harden remote agent OIDC scope handling

* Polish remote agent OIDC cache and scope tests

* Resolve remote agent auth review comments

* Reuse OpenID email claim resolver for remote auth

* Skip empty OpenID email fallback claims

* Use pre-auth tenant context for remote auth config

* Downgrade expected OIDC fallback logging

* Require secure remote OIDC endpoints

* Polish remote agent auth edge cases

* Enforce unique balance records

* Bind remote OpenID users to issuer

* Fix issuer-scoped OpenID indexes

* Avoid unique balance index requirement

* Fix remote OpenID issuer normalization boundaries

* Require issuer-bound OpenID lookups

* Enforce tenant API key policy after auth

* Fix remote auth tenant policy types

* Normalize remote OIDC discovery issuer

* Allow normalized remote OIDC issuer validation

* Enforce resolved tenant OIDC policy

* Polish OpenID issuer and scope validation

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-05-04 17:06:35 -04:00
Danny Avila
f5dd053128
🛡️ refactor: Restrict User Tavily Endpoint URLs (#12946) 2026-05-05 05:09:11 +09:00
Danny Avila
3170bd8b22
📦 chore: bump @librechat/agents to v3.1.77 2026-05-03 23:54:46 -04:00
Danny Avila
2c4a78094a
🛂 refactor: Avoid Default Tavily Safe Search (#12939) 2026-05-04 12:01:23 +09:00
Yashwanth Alapati
3da1d8c961
🔍 feat: add Tavily as Search and Scraper Provider (#12581)
* feat: add Tavily integration as search provider and scraper provider

* chore: update tavily web search parameters

* chore: tavily parameter update

* chore: update data-schemas test for tavily

* fix: allow Tavily string option modes

* fix: align Tavily config options

* fix: scope Tavily scraper timeout

* fix: use resolved scraper provider timeout

* fix: widen Tavily search provider types

* fix: harden Tavily web search config

* fix: cap Tavily option timeouts

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-05-04 11:29:13 +09:00
Danny Avila
d6d70eeb26
📦 chore: bump @librechat/agents to v3.1.76 2026-05-03 22:25:12 -04:00
github-actions[bot]
9b376178a6
🌍 i18n: Update translation.json with latest translations (#12916)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-04 11:18:20 +09:00
Danny Avila
619f28d76d
🛡️ fix: Sanitize HTML In Admin Banner And MCP Config Dialog (#12927)
Two `dangerouslySetInnerHTML` sites rendered admin-supplied HTML
without sanitization:

- `Banner.tsx` rendered `banner.message` directly.
- `MCPConfigDialog.tsx` rendered each `customUserVars` description.

Wrap both with DOMPurify, allowing only the inline tags needed for
formatting (links, emphasis, line breaks). Hardens against compromised
admin or yaml supply-chain scenarios. Pattern matches the existing
`CustomUserVarsSection.tsx` and `Tooltip.tsx` sanitizer setup.
2026-05-04 11:17:44 +09:00
Danny Avila
37429e8a3e
🚦 feat: Make URL Auto-Submit Configurable (#12929)
`/c/new?prompt=…&submit=true` previously auto-submitted the prompt
unconditionally. For deployments where users may receive crafted
links from external sources, an authenticated victim's click can
trigger an immediate, attacker-controlled prompt against a memory- or
tool-enabled model — providing a 1-click vector for prompt-injection
exfiltration via markdown image rendering.

Add `interface.autoSubmitFromUrl` (default `true` to preserve current
behavior). Operators handling sensitive memory/tool data can set it
to `false` so URL-supplied prompts only pre-fill the composer; the
user must press Send explicitly.
2026-05-04 11:17:19 +09:00
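A minimal `librechat.yaml` sketch of the setting introduced in the commit above (the `interface` nesting is assumed from the field name `interface.autoSubmitFromUrl`):

```yaml
# URL-supplied prompts only pre-fill the composer; the user must press Send.
interface:
  autoSubmitFromUrl: false
```

Omitting the field keeps the default (`true`), preserving the pre-existing auto-submit behavior.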
Danny Avila
c7f38d9621
🛡️ fix: Validate Avatar URL Before Fetch (#12928)
`resizeAvatar` previously called `node-fetch` on any string input with
no validation. When OIDC providers surface a user-controllable
`picture` claim, this could be used to make blind SSRF requests to
internal services on every social login.

Wrap the URL fetch with:
- An allowlist on the URL protocol (http/https only).
- The shared `createSSRFSafeAgents` utility, which blocks resolution to
  private, loopback, and link-local IPs at TCP connect time
  (TOCTOU-safe; works equally for hostname targets that DNS-resolve
  privately and for IP-literal targets, since Node's `net.Socket`
  always dispatches through the agent's `lookup` hook).
- `redirect: 'error'` so a public-IP redirect target cannot be used to
  bypass the agent check on a subsequent hop.
- A 5-second total request budget (node-fetch v2's `timeout` covers
  request initiation through full body receipt, bounding slow-loris
  exposure rather than just the TCP connect).
- A 10 MB response cap (`size` option + `Content-Length` pre-check +
  post-read length assertion) so a hostile payload cannot exhaust
  memory before `sharp()` rejects it.

Fetch the canonicalized `parsed.href` rather than the raw input string
to eliminate any future parser-differential between `new URL()` and
the underlying fetch implementation.

Per-call agent construction is intentional: the avatar path runs once
per social login per user, so pooling adds complexity without a
measurable benefit. Documented inline.

Comprehensive test coverage in `avatar.spec.js`:
- Rejects malformed URLs, non-http(s) schemes (file://, data:,
  javascript:).
- Asserts the happy-path canonicalization (`fetch` is called with
  `parsed.href`) and the SSRF-safe agent factory routing
  (https→httpsAgent, http→httpAgent).
- Rejects non-2xx HTTP status.
- Rejects an oversized Content-Length before reading the body, and
  asserts `.buffer()` is never invoked in that case.
- Rejects an oversized body even when the server lies about / omits
  Content-Length.
- Surfaces ESSRF, redirect, and `size` overflow errors thrown by the
  fetch layer.
- Confirms Buffer inputs bypass the fetcher entirely.
2026-05-04 11:16:40 +09:00
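The pre-fetch validation described in the commit above can be sketched as follows; the identifiers (`validateAvatarUrl`, `MAX_AVATAR_BYTES`) are illustrative, not LibreChat's exact code:

```javascript
const MAX_AVATAR_BYTES = 10 * 1024 * 1024; // 10 MB response cap

function validateAvatarUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    throw new Error('Invalid avatar URL');
  }
  // Allowlist the protocol: rejects file://, data:, javascript:, etc.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Disallowed avatar URL protocol: ${parsed.protocol}`);
  }
  // Return the canonicalized href so the fetch layer never re-parses the
  // raw input (no parser differential between new URL() and the fetcher).
  return parsed.href;
}
```

The actual fetch call would then layer on the SSRF-safe agents, `redirect: 'error'`, the 5-second timeout, and the size cap described in the commit message.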
Danny Avila
4cce88be42
🪟 feat: Add allowedAddresses Exemption List For SSRF-Guarded Targets (#12933)
* 🪟 feat: Add allowedAddresses Exemption List For SSRF-Guarded Targets

LibreChat already blocks SSRF-prone targets (private IPs, loopback,
link-local, .internal/.local TLDs) at every server-side fetch site
that consumes user-controllable URLs — custom-endpoint baseURLs, MCP
servers, OpenAPI Actions, and OAuth endpoints. The only existing
escape hatch is `allowedDomains`, but that flips the field into a
strict whitelist: adding `127.0.0.1` to permit a self-hosted Ollama
also blocks every public destination that isn't in the list.

Introduce `allowedAddresses` as the orthogonal primitive: a private-
IP-space exemption list. When a hostname or its resolved IP appears
in the list, the SSRF block is bypassed for that target. Public
destinations remain reachable. Operators can now run self-hosted
LLMs / MCP servers / Action endpoints on private addresses without
weakening the default-deny posture for everything else.

Schema additions in `packages/data-provider/src/config.ts`:
- `endpoints.allowedAddresses` (new — gates `validateEndpointURL`)
- `mcpSettings.allowedAddresses` (parallel to `allowedDomains`)
- `actions.allowedAddresses` (parallel to `allowedDomains`)

Core changes in `packages/api/src/auth/`:
- New `isAddressAllowed(hostnameOrIP, allowedAddresses)` — pure,
  case-insensitive, bracket-stripped literal match.
- Threaded the list through `isSSRFTarget`, `resolveHostnameSSRF`,
  `isDomainAllowedCore`, `isActionDomainAllowed`, `isMCPDomainAllowed`,
  `isOAuthUrlAllowed`, and `validateEndpointURL`.
- Extended `createSSRFSafeAgents` and `createSSRFSafeUndiciConnect`
  to accept the list, building an SSRF-safe DNS lookup that exempts
  matching hostnames/IPs at TCP connect time (TOCTOU-safe).
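The `isAddressAllowed` contract named above (pure, case-insensitive, bracket-stripped literal match) can be sketched as follows; the body is an assumption from the commit message, not the actual implementation:

```javascript
// Strip surrounding IPv6 brackets, trim, and lowercase for a literal match.
function isAddressAllowed(hostnameOrIP, allowedAddresses = []) {
  const normalize = (value) =>
    String(value).trim().toLowerCase().replace(/^\[|\]$/g, '');
  const candidate = normalize(hostnameOrIP);
  // Literal equality only: no substring or suffix matching.
  return allowedAddresses.some((entry) => normalize(entry) === candidate);
}
```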

Wiring:
- Custom and OpenAI endpoint initialize sites pass
  `endpoints.allowedAddresses` to `validateEndpointURL`.
- `MCPServersRegistry` stores `allowedAddresses` and exposes it via
  `getAllowedAddresses()`. The factory, connection class, manager,
  `UserConnectionManager`, and `ConnectionsRepository` all thread
  it through to the SSRF utilities.
- `MCPOAuthHandler.initiateOAuthFlow`, `refreshOAuthTokens`, and
  `validateOAuthUrl` accept the list and consult it on every URL
  validation along the OAuth chain.
- `ToolService`, `ActionService`, and the assistants/agents action
  routes pass `actions.allowedAddresses` to `isActionDomainAllowed`
  and to `createSSRFSafeAgents` for runtime action calls.
- `initializeMCPs.js` reads `mcpSettings.allowedAddresses` from the
  app config and forwards it to the registry constructor.

Documentation:
- `librechat.example.yaml` shows the new field next to each existing
  `allowedDomains` block, with a note clarifying that
  `allowedAddresses` is an exemption list (not a whitelist).

Tests:
- Unit tests for `isAddressAllowed` covering literal IPs, hostnames,
  IPv6 brackets, case insensitivity, and partial-match rejection.
- Exemption tests for every entry point: `isSSRFTarget`,
  `resolveHostnameSSRF`, `validateEndpointURL`, `isActionDomainAllowed`,
  `isMCPDomainAllowed`, `isOAuthUrlAllowed`.
- Existing tests updated to reflect the new optional parameter.

Default behavior is unchanged: omitted = empty list = no exemptions.

* 🩹 fix: Plumb allowedAddresses Through AppConfig endpoints Type

The initial PR added `endpoints.allowedAddresses` to the
data-provider config schema and consumed it in the endpoint
initialize sites, but the runtime `AppConfig.endpoints` shape in
`@librechat/data-schemas` was a hand-maintained subset that didn't
include the new field — so `tsc` rejected `appConfig.endpoints.allowedAddresses`.

Add the field to `AppConfig['endpoints']` in
`packages/data-schemas/src/types/app.ts` and forward it from the
loaded config in `packages/data-schemas/src/app/endpoints.ts` so the
runtime config carries the value.

Update `initializeMCPs.spec.js` to expect the third positional
argument (`allowedAddresses`) on the `createMCPServersRegistry` call.

* 🩹 fix: Enforce allowedDomains Before allowedAddresses In isOAuthUrlAllowed

The initial implementation checked the address exemption first, so a
URL whose hostname appeared in `allowedAddresses` would return true
even when the admin had configured `allowedDomains` as a strict bound
on OAuth endpoints. A malicious MCP server could advertise OAuth
metadata, token, or revocation URLs at any address the admin had
permitted for an unrelated reason (a self-hosted LLM at `127.0.0.1`,
for example) and pass validation, expanding SSRF reach beyond the
configured domain whitelist.

Reorder: when `allowedDomains` is set, treat it as authoritative —
return true only if the URL matches a domain entry, otherwise fall
through to false. The address exemption only applies when no
`allowedDomains` is configured (mirrors how the downstream SSRF check
in `validateOAuthUrl` consults `allowedAddresses`).

Add a regression test asserting that an `allowedAddresses` entry does
not broaden a configured `allowedDomains` list.

Reported by chatgpt-codex-connector on PR #12933.
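The reordering described above can be sketched as a simplified `isOAuthUrlAllowed`; exact-hostname comparison is an assumption (the real matcher supports domain patterns):

```javascript
function isOAuthUrlAllowed(url, allowedDomains = [], allowedAddresses = []) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false;
  }
  if (allowedDomains.length > 0) {
    // A configured allowedDomains list is authoritative; the address
    // exemption must not broaden it.
    return allowedDomains.some((d) => hostname === d.toLowerCase());
  }
  // Address exemption applies only when no domain whitelist is configured.
  return allowedAddresses.some((a) => hostname === a.toLowerCase());
}
```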

* 🩹 fix: Forward allowedAddresses To Remaining OAuth Callers

Two `MCPOAuthHandler` callers still used the pre-feature signatures and
were silently dropping the new `allowedAddresses` argument:

- `api/server/routes/mcp.js` invoked `initiateOAuthFlow` with the old
  5-argument shape, so OAuth flows initiated through the route handler
  ignored the registry's `getAllowedAddresses()` and would reject any
  metadata/authorization/token URL on a permitted private host.
- `api/server/controllers/UserController.js#maybeUninstallOAuthMCP`
  invoked `revokeOAuthToken` without the address exemption, so
  uninstalling an OAuth-backed MCP server on a permitted private host
  would fail at the revocation step even though the rest of the MCP
  connection path now permits it.

Both sites now read `allowedAddresses` from the registry alongside
`allowedDomains` and forward it. Reported by Copilot on PR #12933.

* 🩹 fix: Update Test Mocks And Assertions For OAuth allowedAddresses

The previous commit started passing `allowedAddresses` to
`MCPOAuthHandler.initiateOAuthFlow` from `api/server/routes/mcp.js`
and to `MCPOAuthHandler.revokeOAuthToken` from
`api/server/controllers/UserController.js`, but the corresponding
test files mocked the registry without `getAllowedAddresses` (causing
`TypeError`s) and asserted the old positional shape on
`toHaveBeenCalledWith`.

Update the mocks and assertions to match the new arity:

- `api/server/routes/__tests__/mcp.spec.js`: add
  `getAllowedDomains`/`getAllowedAddresses` to the registry mock and
  expect the additional positional args on `initiateOAuthFlow`.
- `api/server/controllers/__tests__/maybeUninstallOAuthMCP.spec.js`:
  add a `getAllowedAddresses` mock alongside the existing
  `getAllowedDomains` and seed it in `setupOAuthServerFound`.
- `api/server/controllers/__tests__/UserController.mcpOAuth.spec.js`:
  add `getAllowedAddresses` to the registry mock and expect the
  trailing `null` arg on the three `revokeOAuthToken` assertions.

* 🛡️ fix: Address Comprehensive Review — Scope allowedAddresses To Private IP Space

Major findings from the comprehensive PR review (severity → fix):

**CRITICAL — `validateOAuthUrl` SSRF fallback bypass.** When `allowedDomains`
is configured and a URL fails the whitelist, the SSRF fallback in
`validateOAuthUrl` was still passing `allowedAddresses` to `isSSRFTarget` /
`resolveHostnameSSRF`, letting a malicious MCP server advertise OAuth
endpoints at any address the admin had permitted for an unrelated reason.
Suppress `allowedAddresses` in the fallback when `allowedDomains` is active —
the address exemption is opt-in for the no-whitelist mode only.

**MAJOR — WebSocket transport SSRF check ignored exemptions.** The
`constructTransport` WebSocket branch called `resolveHostnameSSRF(wsHostname)`
without `this.allowedAddresses`, so a permitted private MCP server would
pass `isMCPDomainAllowed` but be blocked at transport creation. Forward
the exemption.

**Scope `allowedAddresses` to private IP space only (operator directive).**
The exemption list is for permitting private/internal targets; it must not
be a back-door to broaden trust to public destinations.
- Schema (`packages/data-provider/src/config.ts`): new
  `allowedAddressesSchema` rejects URLs (`://`), paths/CIDR (`/`),
  whitespace, and public IPv4/IPv6 literals at config-load time. Wired
  into `endpoints`, `mcpSettings`, and `actions`.
- Runtime (`packages/api/src/auth/domain.ts`): `isAddressAllowed` now
  drops public-IP candidates and public-IP entries on the match path —
  defense in depth so a misconfigured runtime list never grants exemption.
- Hot path (`packages/api/src/auth/agent.ts`): `buildSSRFSafeLookup`
  pre-normalizes the list into a `Set<string>` once at construction and
  applies the same scoping filter, so the connect-time DNS lookup is an
  O(1) Set membership check instead of a full re-iterate-and-normalize on
  every outbound request.
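The hot-path pre-normalization described above can be sketched as follows (names taken from the commit message; the private-IP scoping filter is omitted for brevity, so this is illustrative only):

```javascript
// Build the exemption set once at agent construction time.
function normalizeAllowedAddressesSet(list = []) {
  const set = new Set();
  for (const entry of list) {
    const normalized = String(entry).trim().toLowerCase().replace(/^\[|\]$/g, '');
    if (normalized) set.add(normalized);
  }
  return set;
}

// Connect-time check: O(1) Set membership instead of re-normalizing the
// whole list on every outbound request.
function isAddressInAllowedSet(candidate, set) {
  return set.has(String(candidate).trim().toLowerCase().replace(/^\[|\]$/g, ''));
}
```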

**Test coverage for the connect-time and OAuth-fallback paths.**
- `agent.spec.ts`: new describe block exercising `buildSSRFSafeLookup` and
  `createSSRFSafe*` with `allowedAddresses` — hostname-literal exemption,
  resolved-IP exemption, public-IP scoping, URL/CIDR/whitespace rejection,
  and the default no-list block.
- `handler.allowedAddresses.test.ts` (new): integration tests for
  `validateOAuthUrl` — covers both the no-domains-set "permit private"
  path and the strict-bound regression where `allowedAddresses` must NOT
  bypass `allowedDomains`.

**Documentation & cleanup.**
- `connection.ts` redirect SSRF check: explicit comment that
  `allowedAddresses` is intentionally NOT consulted for redirect targets
  (server-controlled, must not inherit the admin's exemption).
- `MCPConnectionFactory.test.ts`: replaced an `eslint-disable` with a
  proper `import { getTenantId } from '@librechat/data-schemas'`. The
  disable was added to make a pre-existing `require()` quiet — the cleaner
  fix is to use the existing top-level import.

Updated `MCPConnectionSSRF.test.ts` WebSocket SSRF assertions to match the
new two-argument call shape (`hostname, allowedAddresses`).

* 🩹 fix: Require Absolute URL Before allowedAddresses Trust Bypass In isOAuthUrlAllowed

`parseDomainSpec` is lenient — it silently prepends `https://` to
schemeless inputs so it can match patterns like bare `example.com`.
That leniency leaked into `isOAuthUrlAllowed`'s new
`allowedAddresses` short-circuit: a value like `10.0.0.5/oauth` (no
scheme) would parse successfully via the prepended default, hit the
address-exemption path, return `true`, and skip `validateOAuthUrl`'s
strict `new URL(url)` parse-or-throw — only to fail later in OAuth
discovery with a less clear runtime error.

Add a strict `new URL(url)` gate at the top of `isOAuthUrlAllowed`.
Schemeless inputs now fall through to `validateOAuthUrl`'s explicit
"Invalid OAuth <field>" rejection. Tests added in both
`auth/domain.spec.ts` (unit) and the OAuth handler integration spec
(end-to-end).

Reported by chatgpt-codex-connector (P2) on PR #12933.

* 🛡️ fix: Address Follow-Up Comprehensive Review — Schema Tests, Shared Normalization, host:port

Auditing the second comprehensive review:

**F1 MAJOR — schema validation untested.** `allowedAddressesSchema` had
zero coverage, so a regression in the three refinement stages or the
three wiring locations (`endpoints` / `mcpSettings` / `actions`) would
silently let invalid entries reach the runtime. Added a dedicated
`describe('allowedAddressesSchema')` block in `config.spec.ts` covering:
valid private IPs (v4 + v6, including the previously-missed 192.0.0.0/24
range), accepted hostnames, all rejection categories (URLs, CIDR, paths,
whitespace tabs/newlines, host:port, public IP literals), and full
`configSchema.parse()` integration at each of the three nesting points.

**F2 MINOR — `isPrivateIPv4Literal` divergence.** The schema reimpl in
`packages/data-provider` was discarding the `c` octet, so the
`192.0.0.0/24` (RFC 5736 IETF protocol assignments) range that the
authoritative `isPrivateIPv4` accepts was being rejected with a
misleading "public IP" error. Destructure `c` and add the missing range
check; covered by the new schema tests.
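The corrected octet handling can be sketched as a private-IPv4 detector with the `192.0.0.0/24` range restored; this is a minimal illustration, not the authoritative `isPrivateIPv4`:

```javascript
function isPrivateIPv4Literal(ip) {
  const m = String(ip).match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const [a, b, c] = m.slice(1).map(Number); // keep the third octet (the F2 bug)
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 192 && b === 0 && c === 0) || // 192.0.0.0/24 (RFC 5736)
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local
  );
}
```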

**F3 MINOR — DRY violation across `domain.ts` and `agent.ts`.** Both
files had independent normalization implementations with a subtle
whitespace-check divergence (`/\s/` vs `.includes(' ')`). Extracted the
shared logic into a new `packages/api/src/auth/allowedAddresses.ts`
module that both consumers import:
  - `normalizeAddressEntry(entry)` — single-entry shape check
  - `looksLikeHostPort(entry)` — host:port detector (used by F4)
  - `normalizeAllowedAddressesSet(list)` — pre-normalized Set for the
    connect-time hot path
  - `isAddressInAllowedSet(candidate, set)` — membership check that
    enforces private-IP scoping on the candidate

Both `isAddressAllowed` (preflight) and `buildSSRFSafeLookup` (connect)
now go through the same primitives; the whitespace divergence is gone.

To break the import cycle (`allowedAddresses` needs `isPrivateIP`,
`domain` previously owned it), extracted IP private-range detection
into a leaf `auth/ip.ts` module. `domain.ts` re-exports `isPrivateIP`
for backward compatibility with existing call sites.

**F4 MINOR — `host:port` silently misclassified.** Entries like
`localhost:8080` previously slipped through the URL/path guard, were
mis-detected as IPv6, failed `isPrivateIP`, and were silently dropped
with a misleading "public IP" schema error. Added an explicit
`looksLikeHostPort` check with a clear error: "allowedAddresses
entries must not include a port — list the bare hostname or IP only."
Bare `::1`, `[::1]`, and other valid IPv6 literals are intentionally
not matched (regex distinguishes by colon count and the bracketed
`[ipv6]:port` form).
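The `looksLikeHostPort` detector described above can be sketched as follows (regex details are an assumption based on the colon-count behavior the commit describes):

```javascript
function looksLikeHostPort(entry) {
  const value = String(entry);
  // Bracketed IPv6 with a port: [::1]:8080
  if (/^\[[^\]]+\]:\d+$/.test(value)) return true;
  // Exactly one colon followed by digits is host:port; bare IPv6
  // literals like ::1 contain two or more colons and fall through.
  const colons = (value.match(/:/g) || []).length;
  return colons === 1 && /^[^:]+:\d+$/.test(value);
}
```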

**F5 MINOR — hostname-trust documentation gap.** Hostname entries
short-circuit `resolveHostnameSSRF` before any DNS lookup — that's a
deliberate design (admin trusts the name) but it means the exemption
follows whatever the name resolves to at runtime. Added an explicit
note in `librechat.example.yaml` for both `mcpSettings.allowedAddresses`
and `endpoints.allowedAddresses`: "a hostname entry trusts whatever IP
that name resolves to. Only list hostnames whose DNS you control.
Prefer literal IPs when you can."

**F6** (8 positional params) is flagged for follow-up; refactor to an
options object is a breaking-API change deferred to a separate PR.
**F7** (redirect/WebSocket asymmetry, NIT, conf 40) — skipping; the
existing inline comment is sufficient.

* 🧹 chore: Address Follow-Up NITs — Import Order And Mirror-Function Naming

Three NITs from the latest comprehensive review:

**NIT #1 (conf 85) — local import order.** AGENTS.md requires local
imports sorted longest-to-shortest. Both `domain.ts` and `agent.ts`
had `./ip` (shorter) before `./allowedAddresses` (longer). Swapped.

**NIT #2 (conf 60) — missing cross-reference.** The schema-side
`isHostPortShape` in `packages/data-provider/src/config.ts` had no
note pointing at the canonical runtime mirror. Added a JSDoc paragraph
explaining the mirror relationship and why a local copy exists (the
data-provider package can't import from `@librechat/api` without
creating a circular dependency).

**NIT #3 (conf 50) — naming inconsistency.** Renamed
`isHostPortShape` → `looksLikeHostPort` so the schema mirror matches
the runtime helper exactly. Kept as a separate function (not a shared
import) for the same circular-dependency reason; the matching name
makes it obvious they should stay in lockstep.
2026-05-03 21:43:59 -04:00
Danny Avila
85fa881e3c
🔐 fix: Avoid Logging Password On Login Validation Error (#12926)
The Passport local strategy validation error logged the entire request
body (including the password) into error logs. Replace it with the
email only, matching the metadata shape used by sibling log calls in
the same function.
2026-05-03 21:27:41 -04:00
Danny Avila
5013d6d35c
🧯 fix: Harden Code Env Filepath Uploads (#12936)
* fix: Harden code env filepath uploads

* test: Cover code env filepath edge cases

* fix: Scrub code env fallback filenames
2026-05-04 10:25:59 +09:00
Danny Avila
41a6d6d11c
🛡️ fix: Harden MCP Redirect SSRF Checks (#12931)
* fix: Harden MCP redirect SSRF checks

* fix: Address MCP redirect review feedback

* test: Tighten MCP SSRF redirect assertions
2026-05-04 09:58:47 +09:00
Danny Avila
5b567f5ff4
🧷 fix: Pin GitNexus Native Dependency (#12937) 2026-05-04 09:58:21 +09:00
Danny Avila
59e90a1878
🛡️ fix: Harden GitNexus Index Workflow (#12935)
* fix: Harden GitNexus index workflow

* fix: Resolve GitNexus flags before checkout
2026-05-03 18:05:45 -04:00
Danny Avila
1b79e0b785
🧬 chore: Align LibreChat With Agents LangChain Upgrade (#12922)
* 🔧 chore: Update dependencies in package-lock.json and package.json

- Bump version of @librechat/agents to 3.1.75-dev.0 in multiple package.json files.
- Upgrade various AWS SDK and Smithy dependencies to their latest versions in package-lock.json for improved stability and performance.

* 🔧 chore: Update AWS SDK and Smithy dependencies in package-lock.json

- Bump version of @aws-sdk/client-bedrock-runtime to 3.1041.0 and update related dependencies for improved performance and stability.
- Upgrade various AWS SDK and Smithy packages to their latest versions, ensuring compatibility and enhanced functionality.

* chore: Align LibreChat with agents LangChain upgrade

- Route LangChain imports through @librechat/agents facade exports
- Update @librechat/agents to 3.1.75-dev.1 and remove direct LangChain deps
- Normalize nullable agent model params and API key override typing
- Update Google thinking config typing for newer LangChain packages
- Refresh targeted audit-related dependency overrides

* chore: Add Jest types for API specs

* test: Fix LangChain upgrade CI specs

* test: Exercise agents env facade

* fix: Clean up TS preview diagnostics

* fix: Address Codex review feedback
2026-05-03 12:46:01 -04:00
Danny Avila
4e45e8e17c
🧹 fix: Clear MCP OAuth Tokens On Revoke
Fixes #12912.

- Clear stored MCP OAuth tokens and flow state on revoke cleanup-only paths.
- Keep provider revocation best-effort when token and client metadata are available.
- Add controller and function coverage for stale metadata, missing config, and cleanup failure paths.
2026-05-03 02:52:47 +09:00
Danny Avila
eb22bb6969
🧭 fix: Migrate Anthropic Long Context (#12911) 2026-05-02 22:14:19 +09:00
Danny Avila
f3e1201ae7
📌 fix: Stabilize Agent Prompt Cache Prefix (#12907)
* fix: stabilize agent prompt cache prefix

* chore: refresh agents sdk lockfile integrity

* test: format agent memory assertion

* test: type agent context fixtures

* fix: preserve MCP instruction precedence

* fix: reuse resolved conversation anchor

* fix: keep resumable startup immediate
2026-05-02 09:55:31 +09:00
Danny Avila
5b5e2b0286
🛡️ fix: Handle MCP Tool Cache Lookup Failures (#12910)
* Handle MCP tool cache lookup failures

* Harden MCP cached tool lookup

* Cover full MCP tool cache outage

* Guard MCP tool cache store lookup
2026-05-02 09:21:28 +09:00
Danny Avila
74307e6dcc
💭 feat: Require Explicit Auto-agent Enablement for Memories (#12886) 2026-05-01 23:56:08 +09:00
ethanlaj
781bfb857d
🩹 fix: Sync ControlCombobox popover width with trigger after layout changes (#12887)
* 🩹 fix: Sync ControlCombobox popover width with trigger after layout changes

The popover width was measured once on mount via offsetWidth. When the agent builder side panel opens after a page reload with the sidebar collapsed, the trigger button is initially measured during the layout transition (~26px) and never re-measured, leaving the agent select dropdown rendered at the far left with no options fully visible.

Use a ResizeObserver to keep buttonWidth in sync with the trigger's actual width whenever it resizes, then disconnect on unmount.

* test: cover ControlCombobox isCollapsed, no-ResizeObserver, and zero-width branches

Address review feedback:
- Use button.offsetWidth as the ResizeObserver fallback instead of
  entry.contentRect.width to avoid a content-box vs border-box mismatch in
  pre-2022 browsers that ship ResizeObserver without borderBoxSize.
- Add tests for the three previously-untested branches: isCollapsed=true
  (no observation of the trigger), ResizeObserver unavailable (sync-only
  measurement), and zero-width entries (state unchanged).

* test: lock the button.offsetWidth fallback against revert

Add a test that drives the ResizeObserver callback with borderBoxSize
absent and divergent contentRect.width vs offsetWidth (251 vs 275).
The fix would silently revert to entry.contentRect.width without this
test failing, so this pins the chosen fallback semantics.
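The fallback semantics pinned by that test can be sketched as a pure helper (a hypothetical extraction of the ResizeObserver callback's width choice, not the component's actual code):

```javascript
// Prefer the border-box size when the browser reports it; otherwise fall
// back to the trigger's offsetWidth (border-box) rather than
// entry.contentRect.width (content-box), avoiding the box-model mismatch
// in pre-2022 browsers that ship ResizeObserver without borderBoxSize.
function measuredTriggerWidth(entry, button) {
  const borderBox = entry.borderBoxSize?.[0]?.inlineSize;
  return typeof borderBox === 'number' ? borderBox : button.offsetWidth;
}
```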

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-04-30 15:02:16 +09:00
Danny Avila
65990a33e9
📥 fix: Resolve Imported-Conversation Default Model From Runtime modelsConfig (#12885)
* 📥 fix: Use Endpoint-Aware Default Model on Imported Conversations

Claude conversations imported from claude.ai's data export display
"gpt-4o-mini" in the chat UI until the page is refreshed, and any
attempt to send a message before refreshing fails with "The model
'gpt-4o-mini' is not available for Anthropic."

Root cause: ImportBatchBuilder.finishConversation() unconditionally
defaulted the saved conversation's `model` field to
openAISettings.model.default, regardless of `this.endpoint`. Claude
exports don't carry a model name, so every imported Claude conversation
landed with endpoint=anthropic but model=gpt-4o-mini.

Fix: pick the default based on `this.endpoint` via a small lookup
(openAI -> gpt-4o-mini, anthropic -> claude-3-5-sonnet-latest), keeping
the existing OpenAI default as the fallback for unknown endpoints.

Fixes #12844

* 🪄 refactor: Resolve Import Default Model From `modelsConfig`

Replace the hardcoded per-endpoint default lookup added in the previous
commit with a runtime resolver that consults the same models config the
chat UI uses (`getModelsConfig` in ModelController -> `loadDefaultModels`
+ `loadConfigModels`). This way an imported conversation defaults to a
model the LibreChat instance has actually configured / discovered for
the endpoint, instead of a hardcoded constant that may not exist on this
deployment.

Resolution order:
1. First non-empty model in `modelsConfig[endpoint]`.
2. Per-endpoint hardcoded fallback (anthropic/openAI settings) if the
   runtime config is empty for the endpoint or `getModelsConfig` throws.
3. `openAISettings.model.default` if even the per-endpoint fallback is
   missing (unknown endpoint).

`importBatchBuilder.finishConversation` now accepts an optional
`defaultModel` argument; each importer resolves it once at the top via
`resolveImportDefaultModel({ endpoint, requestUserId, userRole })` and
threads it through. ChatGPT message-level model selection also falls
back to the resolved default before the hardcoded gpt-4o-mini.
2026-04-30 00:43:04 -04:00
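The three-step resolution order above can be sketched as a pure function; the parameter shape (`fallbacks`, `openAIDefault`) is illustrative, since the real resolver consults `getModelsConfig` and the settings modules:

```javascript
function resolveImportDefaultModel({ endpoint, modelsConfig, fallbacks, openAIDefault }) {
  // 1. First non-empty model the runtime config lists for this endpoint.
  const models = modelsConfig?.[endpoint] ?? [];
  const first = models.find((m) => typeof m === 'string' && m.length > 0);
  if (first) return first;
  // 2. Per-endpoint hardcoded fallback when the runtime config is empty.
  if (fallbacks?.[endpoint]) return fallbacks[endpoint];
  // 3. Last-resort default for unknown endpoints.
  return openAIDefault;
}
```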
Nick Haffie-Emslie
3758380c61
🔌 fix: Follow 307/308 redirects in MCP streamable HTTP transport (#12850)
* 🔌 fix: Follow 307/308 redirects in MCP streamable HTTP transport

Some MCP servers (e.g. Coda) return 308 Permanent Redirect to route
doc-scoped tool calls to a different endpoint path. The fetch wrapper
used `redirect: 'manual'` for SSRF protection, which silently dropped
these redirects and caused tool calls to fail with empty error bodies.

Follow 307/308 redirects (method-preserving per RFC 7538) up to a
depth of 5. SSRF safety is preserved because the same undici Agent
with its SSRF-safe connect function validates redirect targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* 🛡️ fix: Harden MCP 307/308 redirect handling against SSRF and credential leaks

- Validate every redirect target against `resolveHostnameSSRF` so allowlist
  deployments (which disable connect-time SSRF protection) still block hops to
  private/reserved IPs.
- Strip `Authorization`, `Cookie`, `mcp-session-id`, and any user-injected
  headers when a 307/308 crosses an origin boundary, mirroring browser/Fetch
  behavior so a redirecting MCP server can't exfiltrate credentials.
- Cancel the intermediate response body before each next hop so undici can
  reuse pooled sockets rather than holding them until GC.
- Restructure redirect test helpers to be same-origin (matching real-world
  Coda-style routing), drop dead setup code, fix the misleading "5 hops
  successfully" test, and add coverage for SSRF-blocked redirects, cross-
  origin credential stripping, and same-origin credential preservation.

* 🛡️ fix: Also strip `serverConfig.headers` on cross-origin MCP redirects

Previously only runtime `setRequestHeaders` keys were treated as secret on a
307/308 cross-origin hop. API keys baked into `serverConfig.headers` (passed
through `requestInit.headers` at transport construction time) survived
stripping, so a malicious MCP endpoint could exfiltrate them by returning a
cross-origin `Location`. Pass the configured header keys through to
`createFetchFunction` so both runtime and config secrets are stripped.

The cross-origin credential test now also configures `serverConfig.headers`
to lock in this behavior.

* 🧹 chore: Tighten MCP redirect-stripping coverage and helper duplication

- Add `proxy-authorization` to the cross-origin forbidden header set so a
  forward-proxy credential header would also be stripped on a cross-origin
  hop, matching the Fetch-spec list.
- Strengthen the cross-origin credential test with positive assertions that
  benign protocol headers (`accept`, `content-type`) survive the hop, so a
  regression that strips everything indiscriminately would now fail.
- Extract the duplicated MCP request handler / session-teardown logic from
  three test helpers into shared `createMCPRequestHandler` and
  `closeMCPSessions` utilities.

* 🛠️ fix: Handle `Request` inputs in MCP `customFetch` URL derivation

`customFetch` is typed to accept `UndiciRequestInfo` (`string | URL | Request`),
but `Request.prototype.toString()` returns `"[object Request]"`. The previous
implementation derived `originalOrigin` and the redirect base via
`url.toString()`, so a `Request` input would throw inside `new URL(...)` before
any network call — a regression even when no redirect was involved.

Add a `getRequestUrlString` helper that extracts the URL string for all three
shapes, track the URL string alongside the fetch input through the redirect
loop, and add parameterized tests that exercise `customFetch` with each shape.
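A minimal sketch of the `getRequestUrlString` shape described above (helper name from the commit; the real implementation may differ):

```typescript
type FetchInput = string | URL | Request;

// Derive a URL string for each of the three UndiciRequestInfo shapes.
// Request cannot go through toString(): Request.prototype.toString()
// yields "[object Request]", which would throw inside new URL(...).
function getRequestUrlString(input: FetchInput): string {
  if (typeof input === 'string') {
    return input;
  }
  if (input instanceof URL) {
    return input.toString();
  }
  return input.url; // Request carries its URL on the .url property
}
```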

* 🛠️ fix: Don't override `Request` input headers in MCP `buildFetchInit`

Previously `buildFetchInit` always set `headers` on the returned init —
even when neither `init.headers` nor runtime headers contributed anything.
Passing `headers: {}` to `undiciFetch` overrides the headers carried on a
`Request` input (auth tokens, MCP session, protocol negotiation), so
Request-based wrappers could fail authentication even without a redirect
in play. Skip the `headers` override entirely when there is nothing to
merge.

Adds a regression test that supplies `Authorization` and a custom header
on the `Request` itself and asserts both reach the target server.
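The fix can be sketched like this, with simplified parameter shapes (the real `buildFetchInit` signature is not reproduced here):

```typescript
// Only set `headers` on the outgoing init when there is actually something
// to merge; passing `headers: {}` to undici fetch would override the headers
// carried on a Request input (auth tokens, MCP session, protocol headers).
function buildFetchInit(
  init: { headers?: Record<string, string> } | undefined,
  runtimeHeaders: Record<string, string> | undefined,
): { headers?: Record<string, string> } {
  const merged = { ...init?.headers, ...runtimeHeaders };
  if (Object.keys(merged).length === 0) {
    return { ...init }; // nothing to merge: leave `headers` unset entirely
  }
  return { ...init, headers: merged };
}
```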

* 🛠️ fix: Preserve `Request` method/body across MCP redirects + guard cross-origin strip

Two regressions surfaced by extending `customFetch` to accept `Request` inputs:

1. **307/308 method/body loss.** The redirect loop switches `url` to the
   new `Location` string, but the original `Request`'s method and body
   stayed bound to the (now-discarded) `Request` object. A redirected
   POST silently became a GET with no payload — the exact behavior the
   method-preserving codes are designed to prevent. Added a
   `resolveFetchInput` helper that runs once at the top of `customFetch`,
   extracts a `Request`'s method/body/headers into the shared init, and
   buffers the body via `arrayBuffer()` so 307/308 retries can replay it.

2. **Cross-origin strip crashed on absent headers.** After the previous
   fix that stopped `buildFetchInit` from setting `headers: {}`,
   `currentInit.headers` could legitimately be `undefined`. The
   cross-origin branch read it as a `Record` and called `Object.entries`
   on `undefined`, throwing `TypeError`. Guard the branch on
   `currentInit.headers != null` — when there are no headers there is
   nothing to strip.

Adds two regression tests: a POST-with-body `Request` that 308-redirects
cross-origin (asserts both method and body survive) and a no-headers
cross-origin redirect (asserts the strip path no longer crashes).
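A hypothetical sketch of the `resolveFetchInput` normalization described in point 1 (names and shapes assumed; the real helper also feeds the shared init through the redirect loop):

```typescript
// Normalize a Request input once at the top of customFetch: extract its
// method/headers/signal into the shared init and buffer the body via
// arrayBuffer() so method-preserving 307/308 retries can replay it.
async function resolveFetchInput(
  input: string | URL | Request,
  init: RequestInit = {},
): Promise<{ url: string; init: RequestInit }> {
  if (typeof input === 'string') {
    return { url: input, init };
  }
  if (input instanceof URL) {
    return { url: input.toString(), init };
  }
  // A Request's method/body would otherwise stay bound to the discarded
  // Request object once the loop switches to the plain Location string.
  const body =
    input.method === 'GET' || input.method === 'HEAD' ? undefined : await input.arrayBuffer();
  return {
    url: input.url,
    init: {
      method: input.method,
      headers: input.headers,
      body,
      signal: input.signal,
      ...init, // explicit init still wins, per Fetch-spec semantics
    },
  };
}
```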

* 🛠️ fix: Forward `Request.signal` through MCP `customFetch` normalization

`resolveFetchInput` was copying method/body/headers off a `Request` input
but dropping `Request.signal` on the floor, so a caller that wired an
`AbortController` onto the `Request` for cancellation/timeouts lost that
wiring as soon as we re-shaped the input into the `(string, init)` pair
used by the redirect loop. Subsequent aborts no longer reached the
in-flight fetch — a regression from the pre-PR code, which forwarded
the original `Request` directly to undici.

Forward the signal alongside method/body/headers, with explicit
`init.signal` still winning per Fetch-spec semantics. Regression test
aborts a controller before calling `customFetch` with the wired
`Request` and asserts the call rejects.

* 🧪 test: Pin URL.origin contract for protocol-downgrade redirect handling

Audit follow-up. The cross-origin strip path keys off
`targetUrl.origin !== originalOrigin`, and `URL.origin` is defined as
`scheme + "://" + host + ":" + port`, so a same-host `https → http`
redirect produces a different origin and trips the strip path through
the existing logic — no separate code path needed.

Pin that contract with a small unit test so a future change to URL
semantics (or a refactor that swaps in a different comparison) doesn't
silently regress protocol-downgrade stripping. Standing up a TLS
fixture (self-signed cert, undici skip-verify, etc.) just to re-prove
the URL spec is wasted complexity.
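The contract being pinned is small enough to show inline:

```typescript
// URL.origin serializes as scheme + "://" + host (+ non-default port), so a
// same-host https -> http redirect yields a different origin: the existing
// `targetUrl.origin !== originalOrigin` comparison trips the strip path
// with no separate protocol-downgrade code path.
const original = new URL('https://example.com/mcp');
const downgraded = new URL('http://example.com/mcp');
const crossOrigin = downgraded.origin !== original.origin;
```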

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
2026-04-29 22:12:29 +09:00
Danny Avila
756530c2b8
🩹 fix: Polish code-execution attachment UX (#12870)
* 🧹 chore: Strip code-execution boilerplate from tool output

The bash executor in `@librechat/agents` appends two kinds of noise to
every successful run:

1. Trailing `Note:` paragraphs — long behavioral hints repeating
   rules already in the system prompt ("Files from previous executions
   are automatically available...", "Files in 'Available files' are
   inputs..."). Re-stating these on every tool call adds ~50 tokens of
   waste per call, which compounds across long agent traces.

2. Per-file `| <annotation>` suffixes on every line of `Generated
   files:` / `Available files (...):`. The two section headers already
   convey the new-vs-known distinction; the per-file annotations are
   redundant *and* phrased inconsistently ("downloaded by the user"
   vs. "displayed to the user" vs. "known to the user").

Strip both in a small `cleanCodeToolOutput` helper invoked from
`packages/api/src/agents/handlers.ts` for every tool listed in
`CODE_EXECUTION_TOOLS`. Non-code-execution tools pass through
unchanged. The cleaning happens *after* tool resolution but *before*
downstream consumers (model context, SSE forwarding, persistence) see
the content, so subsequent model turns get the lean output.
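A minimal sketch of the trailing-`Note:` stripping, shown in its final anchored form (the pattern keys on the two known boilerplate openings, per the tightening later in this PR); the helper name comes from the commit but the exact regex is an assumption:

```typescript
// Match only the known boilerplate Note: paragraphs appended by the bash
// executor; a user-authored "Note:" line in stdout does not match.
const TRAILING_NOTES_PATTERN =
  /\n\nNote: (?:Files from previous executions|Files in "Available files")[\s\S]*$/;

function cleanCodeToolOutput(output: string): string {
  return output.replace(TRAILING_NOTES_PATTERN, '');
}
```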

* 🩹 fix: Polish code-execution attachment rendering

Three rough edges visible in code-interpreter conversations:

1. **Sandbox-internal `.dirkeep` placeholders leak as file chips.** The
   bash executor creates `.dirkeep` inside any new directory so the
   stateless container preserves the folder across executions. After
   `sanitizeArtifactPath`'s `_` prefix and 6-hex collision suffix it
   surfaces as `_.dirkeep-<hash>` — a 0-byte chip with no value to the
   user, sometimes hiding the real artifact behind it. New
   `isInternalSandboxArtifact` helper filters them out of every
   routing path (`Attachment`, `AttachmentGroup`, `LogContent`).

2. **The `-<hash>` collision suffix is visible in chip labels.** The
   suffix is collision-avoidance machinery; users only need to see the
   canonical name. New `displayFilename` strips it for display while
   leaving the on-disk `attachment.filename` untouched so downloads
   resolve. Applied across `FileContainer`, `ToolArtifactCard`,
   `ToolMermaidArtifact`, and `LogContent`'s text-attachment label
   path.

3. **0-byte / placeholder files outrank real artifacts in render
   order.** Bucket sort by salience (non-empty before empty) sinks
   stragglers to the bottom. Stable sort preserves arrival order for
   peers.

Added regression tests cover the new helpers, the dirkeep filter
across buckets, and the within-bucket salience ordering.

* 🩹 fix: Don't auto-open artifact panel on history navigation

Navigating to a previous conversation full of code-execution artifacts
would auto-open the side panel and focus the most-recent artifact —
the same code path that fires for fresh streaming artifacts. Users
expect that "auto-open" behavior only when an artifact arrives via
SSE, not when they revisit an old chat.

Two-part gate:

1. `ToolArtifactCard`'s focus effect captures `isSubmitting` at first
   render via a ref. A card mounted *during* a stream means a new
   artifact arrived → steal panel focus (legacy behavior). A card
   mounted while `isSubmitting === false` is part of conversation
   history → leave focus alone.

2. `Presentation`'s panel-render condition gains `currentArtifactId
   != null`. With (1) keeping `currentArtifactId` null on history
   load, the panel stops rendering at all on navigation — even if
   `artifactsVisibility` was left `true` by a prior conversation.
   User clicks on a chip to re-open (the click handler is unchanged
   and unconditional).

Test seeds `isSubmittingFamily(0)` per case: existing tests opt into
streaming (default `true`) so legacy auto-focus assertions still hold;
new tests for history-load opt into `streaming: false` and verify
no auto-focus + click-to-open still works.

* 🩹 fix: Force panel visible on streaming artifact arrival

The previous commit gated `setCurrentArtifactId` on `isSubmitting` but
left `artifactsVisibility` untouched. When a user had explicitly
closed the panel earlier in the session, a fresh SSE artifact would
set `currentArtifactId` (so the chip read "click to close") but
`Presentation`'s render condition still required `visibility === true`
— net effect: the card claimed to be open, the panel stayed hidden.

Streaming arrivals now also call `setVisible(true)`, which is the
explicit "auto-open when first created" behavior the user asked for.
History mounts (`isSubmitting === false`) still leave both focus and
visibility alone, so navigating to an old conversation does not
re-open the panel.

Two regression tests added: one asserts streaming flips visibility on
even when seeded false, the other asserts history mounts leave a
seeded-false visibility alone.

* 🧹 chore: Tighten code-execution attachment polish per audit feedback

Resolves the eight actionable findings from the comprehensive audit:

- Scope `displayFilename` out of `FileContainer`: opt-in via a new
  `displayName` prop. User-uploaded chips (input area, persisted
  message files) keep their raw filename, eliminating the false-positive
  class where `report-abc123.pdf` was silently rewritten to `report.pdf`.
  Code-execution artifact paths in `Attachment.tsx` explicitly compute
  the de-suffixed name and pass it through.
- Tighten `TRAILING_NOTES_PATTERN` to anchor on the two known boilerplate
  openings (`Files from previous executions`, `Files in "Available files"`),
  so a user-authored `Note:` line preceded by a blank line in stdout no
  longer gets eaten along with everything after it.
- `ToolMermaidArtifact`: compute `visibleFilename` once and reuse for
  title, content, and the download `aria-label` (was using the raw
  `attachment.filename` for the aria-label, creating a screen-reader
  inconsistency).
- `ToolArtifactCard`: read `isSubmittingFamily(0)` once via a
  non-subscribing `useRecoilCallback`, instead of subscribing for the
  full lifetime to a value the ref only ever needs at first render.
- Extract `bySalience` and `byEntrySalience` comparators from
  `attachmentTypes.ts`, replacing the ten duplicated sort lambdas in
  `Attachment.tsx` and `LogContent.tsx`.
- Treat `attachmentSalience({ bytes: undefined })` as neutral (`0`)
  rather than empty (`1`); only an explicit `bytes === 0` sinks. Stops
  non-code-exec sources (web-search inline results, files where the
  schema omits the byte count) from silently sinking past real content.
- Pin the click-history test to the panel-open button by name instead
  of relying on `getByRole('button', { pressed: false })`, which
  matched by DOM order.
- Add the missing blank line between adjacent `it(...)` blocks.
- Drop the verbose narrating comments in `FileContainer` along with the
  removed `displayFilename` import.

Adds three regression tests for the new behavior (FileContainer raw
filename, artifact-context displayName flow, user-authored `Note:` line
preserved through cleanup) and updates the salience test for the new
neutral-undefined semantics.

* 🧹 chore: Drop redundant `@testing-library/jest-dom` import in FileContainer spec

`client/test/setupTests.js` already imports the matchers globally for every
Jest test in the client workspace, so the explicit import here was dead code.
Removing it brings the spec in line with the broader convention used by
`ArtifactRouting.test.tsx`, `LogContent.test.tsx`, and `attachmentTypes.test.ts`.

* 🛡️ fix: Narrow `.dirkeep`/`.gitkeep` filter to the sandbox-specific form

`isInternalSandboxArtifact` was filtering bare `.dirkeep` / `.gitkeep`
along with the post-sanitization form. Bare versions never originate
from the bash executor (the dotfile rewrite + disambiguator step in
`sanitizeArtifactPath` always produces `_.dirkeep-<6 hex>`), so the only
real-world source of a bare `.gitkeep` is project scaffolding the user
uploaded — silently hiding it from every attachment bucket meant the
file disappeared with no way to surface or download it.

Tightening to `^_\.(?:dirkeep|gitkeep)-[0-9a-f]{6}$` keeps the
sandbox-placeholder filter intact while letting user-uploaded markers
render normally. Tests inverted accordingly: bare forms now expected to
render; only the post-sanitization form is filtered.
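The narrowed regex from the commit, exercised on both shapes:

```typescript
// Only the post-sanitization placeholder shape is filtered; bare
// .dirkeep/.gitkeep (user-uploaded scaffolding) renders normally.
const SANDBOX_PLACEHOLDER = /^_\.(?:dirkeep|gitkeep)-[0-9a-f]{6}$/;

const sandboxForm = SANDBOX_PLACEHOLDER.test('_.dirkeep-a1b2c3'); // filtered
const bareForm = SANDBOX_PLACEHOLDER.test('.gitkeep'); // rendered normally
```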

* 🩹 fix: Address comprehensive-review findings on attachment helpers

Five findings from the latest pass:

- **MAJOR — `displayFilename` false-positive on extensionless 6-hex.**
  The previous regex `/-[0-9a-f]{6}(?=\.[^.]+$|$)/` stripped any leaf
  ending in `-XXXXXX` regardless of context, so a user-named
  `build-a1b2c3` (script-emitted hash artifact, no extension) lost its
  tail and rendered as `build`. Split into two narrower patterns:
  `COLLISION_SUFFIX_BEFORE_EXT` only matches when followed by an
  extension; `SANITIZED_DOTFILE_TRAILING_SUFFIX` only fires when the
  leaf starts with `_.` AND ends with `-XXXXXX` — the unambiguous
  fingerprint of `sanitizeArtifactPath`'s dotfile rewrite.

- **MINOR — `isInternalSandboxArtifact` filter too aggressive.**
  `(file.bytes ?? 0) > 0` treated undefined bytes as zero, falling
  through to the regex check. Tightened to `file.bytes !== 0`: only
  an *explicit* zero counts as the empty-placeholder shape worth
  hiding. Non-code-exec sources without `bytes` populated render
  normally now.

- **MINOR — `getValue()` could throw on a degenerate atom state.**
  Switched the snapshot read in `ToolArtifactCard` to
  `valueMaybe() ?? false` so a transient error / loading state on the
  upstream selector doesn't crash card mount. The `false` default is
  the right history-fallback (don't auto-open if we can't classify).

- **NIT — `attachmentSalience` / `bySalience` over-broad signature.**
  Removed the test-only `{ bytes?: number }` arm; functions now accept
  `TAttachment` directly. The internal `bytes` read still goes through
  a cast since not every TAttachment branch declares it. Tests updated
  to use the existing `baseAttachment(...)` helper.

- **MINOR — Missing regression test for extensionless 6-hex.**
  Added `'build-a1b2c3'` and `'out/blob-deadbe'` cases that pin the
  preservation behavior, plus an `isInternalSandboxArtifact` test that
  asserts undefined-bytes attachments are not filtered.

* 🩹 fix: Make code-file artifacts click-to-open only

Removes mount-time auto-open from `ToolArtifactCard`. Streaming
arrivals no longer hijack the panel — even a freshly-emitted SSE
artifact registers silently in `artifactsState` and waits for the
user to click. Combined with `Presentation`'s
`currentArtifactId != null` render gate, the panel stays closed
across history navigation, page reload, and SSE arrival.

Click is the only path that opens the panel. `handleOpen` is
unchanged: first click focuses + reveals, second click on the same
chip closes.

Dropped:
- `useRecoilCallback` snapshot read of `isSubmittingFamily(0)`
- `mountedDuringStreamRef` ref + lazy-init block
- The whole focus + visibility effect (was effect 3)
- `useRef` import (now unused)

Tests:
- `ArtifactRouting.test.tsx` rewritten to exercise the click path:
  registers-on-mount-without-focus, click-to-open-then-close, multi-
  card-no-auto-focus, click-when-visibility-was-false. The streaming
  state is no longer seeded; both `renderWith` and `renderWithProbe`
  collapsed back to plain `RecoilRoot`.
- `LogContent.test.tsx` flips its panel-routing assertions from
  `pressed: true` (which asserted auto-focus) to `pressed: false`
  with a chip-title check (which asserts the panel card rendered
  but stayed unfocused).

* Revert "🩹 fix: Make code-file artifacts click-to-open only"

This reverts commit 6761531287.

* 🩹 fix: Exclude CODE bucket from streaming auto-open

Narrows the previous-commit revert: rich-preview artifacts (HTML,
React, Markdown, plain text) keep the legacy SSE auto-open UX, but
the CODE bucket (`.py`, `.js`, `.cpp`, `Dockerfile`, `Makefile`, …)
stays click-to-open even on streaming.

Source-code artifacts are typically supporting helpers the agent
emits alongside a richer deliverable (a Python script that builds
the actual `.html` output, for example). Auto-opening every
helper's panel each time it gets written would shove the panel
in front of the user every tool call. The user explicitly opens
a code chip when they want to inspect it.

Implementation:
- Focus+open effect skips early when `artifact.type === CODE`.
- `artifact.type` added to the dep array so the gate re-evaluates
  if the type ever changes (it shouldn't, but the dep is honest).
- JSDoc updated to call out the carve-out.

Tests:
- New `does NOT auto-open a streaming CODE artifact (test.py is
  click-to-open)` — seeds isSubmitting=true, mounts a `.py`,
  asserts the artifact registers but currentArtifactId stays null.
- New `clicking a CODE artifact focuses it even though it skipped
  auto-open` — confirms the click path still surfaces a `.py`.
- All 25 prior auto-open tests for HTML/React/Markdown/plain-text
  buckets still pass unchanged: those types continue to auto-open
  on streaming.

* 🧹 chore: Address two NITs from the audit-fix follow-up review

- **NIT #1 (conf 60)**: Add a test for the dotfile-with-extension
  intersection (`_.config-abcdef.txt` → `.config.txt`). Both halves
  of the path were tested separately — extension-anchored suffix
  stripping and `_.` underscore restoration — but the combination
  wasn't pinned. Adds `expect(displayFilename('_.config-abcdef.txt'))
  .toBe('.config.txt')`.

- **NIT #2 (conf 25)**: Tighten the cast in `attachmentSalience` from
  the anonymous `{ bytes?: number }` shape to the concrete
  `TFile & TAttachmentMetadata` (the actual TAttachment branch that
  declares `bytes`). Same runtime behavior; a future retype of
  `TFile.bytes` will now surface here at compile time instead of
  being silently papered over.

* 🩹 fix: Stop stripping `-<6 hex>` suffixes from non-dotfile filenames

Codex's repeated P2 was correct: the `COLLISION_SUFFIX_BEFORE_EXT`
regex stripped any `-<6 hex>` immediately before an extension
regardless of context. That collapsed legitimate user-named files
like `report-deadbe.csv` and `report-beef01.csv` onto the same chip
label `report.csv`, silently merging distinct files in the UI.

The structural truth: only the dotfile shape (`_.foo-XXXXXX`) carries
an unambiguous discriminator (the leading `_.` that
`sanitizeArtifactPath` adds when rewriting a leading dot). The
extension-only case (`name-<hash>.ext`) has no such discriminator —
we can't distinguish a sanitized `report 1.csv` (which became
`report_1-<hash>.csv`) from a user-named `report-deadbe.csv` from
the filename alone.

Recovering the non-dotfile case cleanly would require a backend
`wasSanitized` metadata flag we don't have. Without it, the safer
choice is to leave non-dotfile names alone — uglier when the file
*was* sanitized, but never collapses distinct files onto a shared
label.

Changes:
- Drop `COLLISION_SUFFIX_BEFORE_EXT`. Replace
  `SANITIZED_DOTFILE_TRAILING_SUFFIX` with a unified
  `SANITIZED_DOTFILE_PATTERN` that handles both extensionless and
  with-extension dotfile shapes in one regex.
- Simplify `displayFilename` to a single match + reconstruct path.
- Update tests: drop the broad-stripping assertion
  (`output-deadbe.csv` → `output.csv`), add explicit codex-regression
  cases (`report-deadbe.csv` and `report-beef01.csv` preserve
  unchanged), document the deliberate non-recovery for sanitized
  non-dotfiles, update the AttachmentGroup→FileContainer integration
  test to reflect the narrower stripping (non-dotfile `archive-deadbe.zip`
  passes through; new dotfile `_.config-abcdef.zip` → `.config.zip`
  exercises the recoverable path).
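The resulting behavior can be sketched as follows; only the unambiguous `_.name-XXXXXX(.ext)` dotfile shape is rewritten, and the exact regex here is illustrative rather than copied from the diff:

```typescript
// Unified dotfile pattern: leading "_." discriminator, 6-hex collision
// suffix, optional extension. Non-dotfile names pass through untouched.
const SANITIZED_DOTFILE_PATTERN = /^_\.([^/]*?)-[0-9a-f]{6}(\.[^.]*)?$/;

function displayFilename(filename: string): string {
  const slash = filename.lastIndexOf('/');
  const dir = slash === -1 ? '' : filename.slice(0, slash + 1);
  const leaf = slash === -1 ? filename : filename.slice(slash + 1);
  const m = leaf.match(SANITIZED_DOTFILE_PATTERN);
  if (m == null) {
    return filename; // no unambiguous discriminator: leave the name alone
  }
  // Restore the leading dot, drop the collision suffix, keep the extension.
  return `${dir}.${m[1]}${m[2] ?? ''}`;
}
```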

* 🩹 fix: Scope code-tool annotation stripping to file-list sections

Codex was right: the previous global `.replace` would mutate any line
ending in one of the three annotation phrases — even legitimate
stdout. A user script doing
`echo "foo | File is already downloaded by the user"` had its output
silently scrubbed before being fed back into model context.

New `FILE_SECTION_PATTERN` captures `Generated files:` /
`Available files (...)` blocks (header + lines starting with `- /`).
Annotation stripping now only runs *within* the captured file-list
section via a nested `.replace`, so:

- Inside the section: per-file `| <ann>` suffixes still get stripped
  (line-per-file ≥ 4 files form, inline `, ` comma-separated ≤ 3
  files form — both already covered by existing patterns).
- Outside the section: stdout, stderr, blank lines, the trailing
  `Note:` paragraphs (handled by their own pattern), and any user
  text that coincidentally contains an annotation phrase pass
  through unchanged.

Tests:
- New `does NOT mutate stdout that legitimately contains an
  annotation phrase outside a file-list section` pins the codex
  regression: three coincidental phrases in stdout, no
  `Generated files:` header, all three preserved verbatim.
- New `strips annotations inside a file-list section but preserves
  identical phrases in stdout above it` covers the mixed case where
  the same phrase appears in both stdout and a file listing —
  stdout survives, listing gets cleaned, exactly one occurrence
  remains.
- All 9 prior tests still pass (file-section stripping behavior
  unchanged for both line-per-file and inline-comma layouts).
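The scoped-stripping shape can be sketched like this; both regexes are assumptions reconstructed from the description above, not the actual patterns:

```typescript
// Capture Generated files: / Available files (...) blocks: a header line
// followed by one or more "- /path" lines.
const FILE_SECTION_PATTERN =
  /(?:^|\n)(?:Generated files:|Available files \([^)]*\):)\n(?:- \/[^\n]*(?:\n|$))+/g;

// Annotation stripping only runs inside a captured file-list section, so a
// coincidental "| ..." phrase in ordinary stdout passes through unchanged.
function stripFileAnnotations(output: string): string {
  return output.replace(FILE_SECTION_PATTERN, (section) =>
    section.replace(/ \| [^\n]*(?=\n|$)/g, ''),
  );
}
```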
2026-04-29 08:53:10 -04:00
Danny Avila
4a5fc701d2
📂 fix: Preserve Nested Skill Paths in Code-Env Uploads (#12877)
* fix(code): preserve code env upload filepaths

* chore: Reorder import statements in crud.js
2026-04-29 08:07:46 -04:00
Danny Avila
915b30c60d
📦 chore: update @librechat/agents to v3.1.74 (#12869) 2026-04-29 10:20:52 +09:00
Helge Wiethoff
61b9b1daa7
🩹 fix(SSE): Treat responseCode === 0 as Transport Failure, Not Server Error (#12834)
* fix(sse): treat responseCode===0 as transport failure, not server error

When a long-running model response (e.g. gpt-5.4 with web_search:true)
takes longer than the browser's idle connection timeout, the SSE transport
drops and sse.js fires an error event with responseCode=0 and e.data set
to the raw response buffer (non-JSON SSE text).

The previous guard `!responseCode` is truthy for both 0 (transport drop)
and undefined (genuine server-sent error event), so the client incorrectly
entered the server-error branch, tried to JSON.parse raw SSE text, logged
"Failed to parse server error", and showed the user a red error banner --
even though the backend continued processing and delivered the final answer
seconds later.

Fix 1 (client): change guard from `!responseCode` to `responseCode == null`
so that only undefined/null (no HTTP status at all) triggers the server-error
parse path. responseCode===0 now correctly falls through to the reconnect path.

Fix 2 (backend): after res.flushHeaders() the response is already committed
as SSE. The fallback branch that wrote res.status(404).json() was an HTTP/SSE
protocol violation. Replace with an SSE-conformant event:error frame + res.end().

* fix(sse): use onError helper on subscribe failure + add regression tests

Replace silent res.end() with onError('Failed to subscribe to stream')
so the client receives a parseable SSE error event instead of a stream
that closes with no signal. The previous res.end() left the UI stuck
in "submitting" state because no error/abort/final event ever fired.

Also adds two missing test cases for the responseCode guard change:
- responseCode === 0 with raw SSE buffer data must NOT call errorHandler
  (transport failure should reconnect, not display garbage)
- responseCode == null with JSON error data MUST call errorHandler
  (server-sent error events should still surface to the user)
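The guard change boils down to one predicate (names here are illustrative):

```typescript
// Before: `!responseCode`, which is truthy for both 0 (transport drop) and
// undefined (genuine server-sent error event). After: only a genuinely
// absent HTTP status enters the server-error parse path; 0 falls through
// to the reconnect path.
function isServerSentError(responseCode: number | null | undefined): boolean {
  return responseCode == null;
}
```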

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-04-29 10:05:51 +09:00
ethanlaj
85894c11c7
🧜‍♂️ fix: Preserve Mermaid foreignObject HTML in Sanitized SVG (#12819) 2026-04-29 09:37:38 +09:00
David Newman
1f37ec842a
🔌 fix: Prevent Repeated Idle Check Triggers for Users With Failed MCP Connections (#12853) 2026-04-29 09:19:35 +09:00
Daniel Lew
2503365c44
🚫 feat: Add Support for none Reranker Type in Web Search Config (#12765)
Most of the codebase already supports *not* using a reranker with web
search, but there was no way to initially set up an absent reranker
component.

Now there's a special path for skipping the reranker auth when loading web
search config, which allows for skipping the reranker when using web search.
2026-04-29 09:17:04 +09:00
Gaurav Dubey
84043432a5
🧹 fix: Graceful MCP OAuth Revoke Cleanup When Tokens Are Missing (#12825)
* fix: graceful MCP OAuth revoke cleanup when tokens are missing (#12754)

`maybeUninstallOAuthMCP` in `api/server/controllers/UserController.js`
aborts before the DB-delete and flow-state cleanup steps whenever
`MCPTokenStorage.getTokens` throws `ReauthenticationRequiredError` —
which is exactly what happens when a user clicks "Revoke" on an MCP
server whose backend is already dead and whose refresh token is gone.
The resulting error is both surfaced to the log as a red line and, more
importantly, leaks the DB token row and OAuth flow state.

Wrap the token retrieval in try/catch following the same best-effort
pattern already used for the two `revokeOAuthToken` calls. On
`ReauthenticationRequiredError`, skip revocation silently (info log)
and continue to the cleanup steps. On any other unexpected error, log
a warning and continue — cleanup must always run.

Exported `maybeUninstallOAuthMCP` for direct unit testing and added
`api/server/controllers/__tests__/maybeUninstallOAuthMCP.spec.js` with
8 cases: early-return guards (non-MCP key, non-OAuth server, missing
client info), happy path (both tokens revoked + cleanup), both
failure-to-retrieve paths (ReauthenticationRequiredError and arbitrary
error — cleanup still runs in both), single-token path, and
revocation-call failures (cleanup still runs).

Fixes #12754.

* test: use instanceof check against real ReauthenticationRequiredError

Follow-up to the previous commit on this branch. Two changes:

1. `UserController.maybeUninstallOAuthMCP` now checks
   `error instanceof ReauthenticationRequiredError` using the real
   class imported from `@librechat/api`, instead of comparing
   `error?.name === 'ReauthenticationRequiredError'`. The name-string
   check matched any unrelated error that happened to have the same
   `.name`; the `instanceof` check is a proper identity test.

2. The accompanying spec's jest mock for `@librechat/api` now
   exposes a `ReauthenticationRequiredError` class, and the test
   imports it from that mock so the `instanceof` comparison in the
   production code holds during the test. Without this, the two
   "skips revocation ... still runs cleanup" tests threw
   `TypeError: Right-hand side of 'instanceof' is not an object`
   because the mock left the class undefined.

All 8 tests in the spec pass.
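The distinction being fixed is easy to demonstrate; the real class is imported from `@librechat/api`, and a local stand-in is declared here for illustration:

```typescript
class ReauthenticationRequiredError extends Error {}

// `instanceof` is an identity test; matching on `error.name` would accept
// any unrelated error that happens to share the name string.
function shouldSkipRevocation(error: unknown): boolean {
  return error instanceof ReauthenticationRequiredError;
}
```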
2026-04-29 09:11:12 +09:00
Yorgos K
f2df0ea62b
🛡️ fix: Filter user_provided Sentinel in Tool Credential Loading (#12840)
When GOOGLE_KEY=user_provided is set as an endpoint config, the
loadAuthValues() function in credentials.js would pass the literal
string 'user_provided' to tools via the || fallback chain. This caused
Gemini Image Tools to fail at runtime with an invalid API key error,
as initializeGeminiClient() received the sentinel value instead of a
real key.

The fix aligns loadAuthValues() with checkPluginAuth() in format.ts,
which already correctly excludes user_provided and empty/whitespace
values. Now loadAuthValues() skips these values and continues to the
next field in the fallback chain or falls through to user DB values.

Added regression tests covering:
- user_provided sentinel is skipped, DB value used instead
- Fallback chain continues past user_provided to next field
- Empty and whitespace env values are skipped
- Real env values are returned correctly
- Optional fields with sentinel values handled gracefully
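The fallback logic can be sketched as follows; `loadAuthValues`'s real signature lives in credentials.js, so the names and two-level fallback here are simplified for illustration:

```typescript
// Skip the user_provided sentinel and empty/whitespace values, falling
// through to the user's DB-stored value, mirroring checkPluginAuth().
function resolveAuthValue(envValue: string | undefined, dbValue?: string): string | undefined {
  const isUsable = (v?: string) => v != null && v.trim() !== '' && v !== 'user_provided';
  if (isUsable(envValue)) {
    return envValue;
  }
  return isUsable(dbValue) ? dbValue : undefined;
}
```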
2026-04-29 09:09:54 +09:00
Danny Avila
89bf2ab7b4
💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12868)
* 💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12855)

Different providers report `usage_metadata.input_tokens` with different
semantics:

  - Anthropic / Bedrock: `input_tokens` EXCLUDES cache; cache reads/writes
    arrive separately and must be added to get the total prompt size.
  - Gemini / OpenAI: `input_tokens` ALREADY INCLUDES cached tokens
    (Google's `promptTokenCount`, OpenAI's `prompt_tokens`). Their
    `input_token_details.cache_*` are subsets of `input_tokens`.

`recordCollectedUsage` treated both schemes as additive, so for cache-hit
requests on Gemini/OpenAI it added cache tokens on top of an
`input_tokens` value that already contained them — overcharging users by
the cache_hit_rate (e.g., ~67% cache hit ≈ 1.67x overcharge). This
matches the issue reporter's GCP billing comparison.

Adds a small `splitUsage` helper that classifies the provider by model
name and computes `inputOnly` (the non-cached portion) plus the
all-inclusive `totalInput` for both the spend math and the returned
`input_tokens` summary. The helper defaults to additive semantics (the
historical behavior) so unknown providers are unaffected.

Updates existing OpenAI-shaped tests that previously asserted the buggy
additive math, and adds Gemini regression tests using the exact numbers
from the issue report (input=11125, cache_read=7441 → input=3684).

Anthropic / Bedrock paths remain bit-identical to before.
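The two schemes can be sketched with the issue's own numbers; the helper name comes from the commit, but the field names and shape here are simplified:

```typescript
type CacheUsage = { input_tokens: number; cache_read?: number; cache_write?: number };

// Subset semantics (Gemini/OpenAI): cache tokens are already inside
// input_tokens. Additive semantics (Anthropic/Bedrock): cache tokens
// arrive separately and must be added to get the total prompt size.
function splitUsage(usage: CacheUsage, subset: boolean) {
  const cache = (usage.cache_read ?? 0) + (usage.cache_write ?? 0);
  return subset
    ? { inputOnly: usage.input_tokens - cache, totalInput: usage.input_tokens }
    : { inputOnly: usage.input_tokens, totalInput: usage.input_tokens + cache };
}
```

With the issue's Gemini figures (input=11125, cache_read=7441), the non-cached portion is 3684; additive math would have billed 18566 input tokens.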

* 🔧 refactor: Classify Cache-Token Semantics by Provider, Not Model Name

Follows up the previous commit. Replaces a model-name regex
(`gemini|gpt|o[1-9]|chatgpt`) with an explicit `Providers` enum lookup
keyed off the `usage.provider` field — `UsageMetadata.provider` already
exists in `IJobStore.ts` but was never being populated.

  - `callbacks.js#ModelEndHandler` now attaches `usage.provider` from
    `agentContext.provider` alongside `usage.model`.
  - `usage.ts` uses a `SUBSET_PROVIDERS` set (`openAI`, `azureOpenAI`,
    `google`, `vertexai`, `xai`, `deepseek`, `openrouter`, `moonshot`)
    backed by the canonical `Providers` enum from
    `librechat-data-provider`.
  - `xai`, `deepseek`, `openrouter`, `moonshot` extend `ChatOpenAI` so
    they inherit subset semantics (verified in node_modules).
  - Defaults to additive when `usage.provider` is missing, so the title
    flow (which doesn't propagate provider) and any pre-this-PR usage
    entries keep their existing behavior.

Tests: switch fixtures from model-name signaling to explicit `provider`
field, plus a Vertex AI case and a "missing provider" fallback case.
2026-04-29 08:36:00 +09:00
Danny Avila
46a86d849f
🛂 fix: Skip Inherited / Mark Skill Files Read-Only in Code-Env Pipeline (#12866)
* 🛂 fix: Skip Re-Download of Inherited Code-Env Files (No More 403 Storms)

When a bash/code-interpreter call lists or operates on inputs the user
already owns (skill files primed via primeInvokedSkills, files inherited
from a prior session), codeapi echoes those files back in the tool
result with `inherited: true`. We were treating every entry as a
generated artifact and calling processCodeOutput on each, which:

1. Hit `/api/files/code/download/<session_id>/<file_id>` with the
   user's session key. Skill files are uploaded under the skill's
   entity_id, so every download 403'd — producing dozens of
   "Unauthorized download" log lines per turn.

2. Surfaced those inputs as ghost file chips in the UI even though
   they were never generated by the run.

3. Wasted a download round-trip even when no auth boundary was
   crossed — the file is already persisted at its origin.

Fix: skip files where `file.inherited === true` in all three
artifact-files loops (`tools.js`, `createToolEndCallback`, and
`createResponsesToolEndCallback`). Skill files remain available to
subsequent calls via primeInvokedSkills / session inheritance — we
just don't redundantly re-download them.
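The guard in each loop amounts to a filter like this (type shape hypothetical; the real loops live in `tools.js` and the two tool-end callbacks):

```typescript
interface CodeOutputFile { id: string; name: string; inherited?: boolean }

// Only genuinely generated outputs proceed to processCodeOutput; inputs
// echoed back with `inherited: true` are already persisted at their origin.
function generatedOnly(files: CodeOutputFile[]): CodeOutputFile[] {
  return files.filter((file) => file.inherited !== true);
}
```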

Pairs with codeapi-side change that adds the `inherited` flag.

* 🔒 feat: Mark Skill Files as `read_only` During Code-Env Priming

Pairs with codeapi `read_only` upload flag (ClickHouse/ai#1345). When
LibreChat primes a skill into the code-env, every file in the batch
(SKILL.md plus all bundled scripts/schemas/docs) is now uploaded with
`read_only: true`. Codeapi seals these inputs at the filesystem layer
(chmod 444) and the walker echoes the original refs as `inherited:
true` regardless of whether sandboxed code modified the bytes on disk.

Without this, the previous PR's `inherited` skip handled only the
unchanged case. A modified skill file (pip writing pyc near a .py, a
script accidentally truncating LICENSE.txt, etc.) still flowed through
the modified-input branch on codeapi, got a fresh user-owned file_id,
uploaded as a "generated" artifact, and surfaced in the UI as a chip
the user couldn't actually authorize a download for.

Changes:

- `api/server/services/Files/Code/crud.js`:
  `batchUploadCodeEnvFiles({ ..., read_only })` forwards the flag as
  a multipart form field. Default `false` preserves existing behavior
  for user-attached files and prior-session inheritance.

- `packages/api/src/agents/skillFiles.ts`: type signature gains
  `read_only?: boolean`; `primeSkillFiles` passes `true`.

- `packages/api/src/agents/skillFiles.spec.ts`: assert the upload call
  carries `read_only: true`.

The flag is intentionally not skill-specific. Any future
infrastructure-input flow (system fixtures, cached datasets, etc.) can
opt in the same way.
2026-04-29 08:26:25 +09:00
Danny Avila
f69e8e26f8
🪟 feat: Render Source-Code Artifacts in the Side Panel (#12854)
* 🪟 feat: Render Source-Code Artifacts in the Side Panel (CODE bucket)

PR #12832 wired markdown / mermaid / html / .jsx-tsx tool outputs through
the side-panel artifact pipeline but explicitly punted on code files:

> Everything else (csv, py, json, xls/docx/pptx, …) keeps PR #12829's
> inline behaviour — dedicated viewers will land in follow-ups.

This adds the code-file viewer. A `simple_graph.py` (and every other
common source file) now opens in the side panel alongside markdown,
mermaid, html, and react artifacts instead of falling back to the
inline `<pre>` rendering.

**Design.** New `CODE: 'application/vnd.code'` bucket reuses the static-
markdown sandpack template — `useArtifactProps` pre-wraps the source as
a fenced code block (` ```python\n...\n``` `) before handing it to
`getMarkdownFiles`. The fence carries a `language-<x>` class through
`marked`, so a future highlighter swap-in (e.g. drop `highlight.js`
into the markdown template) picks up syntax colors automatically. The
`react-ts` (sandpack) template's React boot cost is avoided since
source files don't need it.

**Single source of truth for languages.** New `CODE_EXTENSION_TO_LANGUAGE`
map drives BOTH:
  - `EXTENSION_TO_TOOL_ARTIFACT_TYPE` routing (presence in this map =
    code file). Adding a new language is one entry.
  - The fenced-block language hint (exported as `languageForFilename`).

Identifiers follow the GitHub / `highlight.js` convention so the future
highlighter pickup is automatic.

**Scope.** Programming languages + stylesheets + shell + sql/graphql +
build files (Dockerfile/Makefile/HCL). Pure data formats
(CSV/TSV/JSON/JSONL/NDJSON/XML/YAML/TOML) and config dotfiles
(`.env`/`.ini`/`.conf`/`.cfg`) are intentionally NOT routed in this
pass — they're better served by dedicated viewers (CSV table view,
etc.) or remain inline. Adding them later is a one-entry change in
the map.

**JSX/TSX kept on the React (sandpack) bucket.** They're React component
sources; the existing live-preview should win over the static CODE
bucket. Plain `.js`/`.ts` source goes through CODE.

**MIME-type fallback.** The codeapi backend serves `text/x-python`,
`text/x-typescript`, etc. as `Content-Type` for source files, so a
file whose extension was stripped/renamed upstream still routes to
CODE via the MIME map.

**Empty-text gate.** CODE joins MARKDOWN/PLAIN_TEXT in the empty-text
exception (an empty `.py` is still a Python file). HTML/REACT/MERMAID
still require text — their viewers (sandpack/mermaid.js) error on
empty input.

**Files changed:**
- `client/src/utils/artifacts.ts` — `CODE` bucket constant,
  `CODE_EXTENSION_TO_LANGUAGE` map, exported `isCodeExtension` and
  `languageForFilename` helpers, extension/MIME routing additions,
  template + dependencies entries, empty-text gate exception, helper
  hoisting (extensionOf / baseMime moved up so the language map can
  reference them).
- `client/src/hooks/Artifacts/useArtifactProps.ts` — exported
  `wrapAsFencedCodeBlock`, CODE branch that wraps the source then
  routes through `getMarkdownFiles`.

**Tests (+22):**
- 8 parameterized routing cases (.py, .js, .go, .rs, .css, .sh, .sql,
  .kt) verify the CODE bucket fires.
- Extension wins when MIME is generic octet-stream (Python has no
  magic bytes; common case).
- Regression: jsx/tsx STAY on REACT bucket (no live-preview regression).
- Regression: data formats (CSV/JSON/YAML/TOML) and config dotfiles
  (.env/.ini) do NOT route to CODE.
- Empty-text exception for CODE (empty Python file is still a Python file).
- `useArtifactProps`: CODE → content.md / static template, fenced-block
  shape, language hint, unknown-extension fallback to raw extension,
  no-extension empty hint, index.html via markdown template.
- `wrapAsFencedCodeBlock`: language hint, empty hint, single-trailing-
  newline trim, multi-newline preservation, empty-source emit.

87/87 in artifact-impacted tests; 155/155 across the broader artifact
suite. No regressions in pre-existing markdown/mermaid/HTML/REACT/text
behavior.

* 🛡️ fix: Bare-filename routing + adaptive fence delimiter (codex P2 ×2)

Two follow-ups from Codex review on the CODE bucket:

1. **Bare-filename routing for extensionless build files (Codex P2).**
   `Dockerfile`, `Makefile`, `Gemfile`, `Rakefile`, `Vagrantfile`,
   `Brewfile` have no `.` in their basename — `extensionOf` returns
   `''` and the extension map can't match, so they fell through to
   inline rendering despite being in `CODE_EXTENSION_TO_LANGUAGE`.

   New `bareNameOf` helper returns the lowercased basename for
   extensionless filenames (returns `''` for files with a `.` so the
   extension and bare-name paths don't double-match). Both
   `detectArtifactTypeFromFile` and `languageForFilename` consult it as
   a second lookup against the same `CODE_EXTENSION_TO_LANGUAGE` map,
   so adding a new build file is one entry. Path-aware: takes the
   basename so `proj/Dockerfile` (path-preserving sanitizer output)
   still routes correctly.

   Added the four extra Ruby build-script names while I was here.
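   A reduced sketch of the bare-name fallback (map abbreviated; the real one lives in `client/src/utils/artifacts.ts`):

```typescript
// Abbreviated map; both the extension path and this bare-name fallback
// consult the same table.
const CODE_EXTENSION_TO_LANGUAGE: Record<string, string> = {
  py: 'python', go: 'go', dockerfile: 'dockerfile', makefile: 'makefile',
};

function bareNameOf(filename: string): string {
  const base = filename.slice(filename.lastIndexOf('/') + 1).toLowerCase();
  // Only extensionless basenames qualify, so the extension path and the
  // bare-name path can never double-match.
  return base.includes('.') ? '' : base;
}

// Bare-name routing only; the real detector tries the extension map first.
function routesToCodeByBareName(filename: string): boolean {
  return bareNameOf(filename) in CODE_EXTENSION_TO_LANGUAGE;
}
```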

2. **Adaptive fence delimiter (Codex P2).** A hardcoded ` ``` ` fence
   breaks when the source contains a line starting with ` ``` ` —
   for example, a JS file containing a markdown-shaped template
   literal:
       const md = `
       ```
       hello
       ```
       `;
   CommonMark closes a fence on a line whose backtick run matches-or-
   exceeds the opener, so `marked` would close the outer fence at
   the inner `\`\`\`` and the rest of the file would render as
   markdown — corrupting the artifact and potentially altering
   formatting / links outside `<code>`.

   New `longestLeadingBacktickRun(source)` scans for the longest
   start-of-line backtick run in the payload. Fence length =
   `max(3, longest + 1)` — strictly more than any internal run, so
   `marked` can never close the outer fence early. Only escalates
   when needed; the common case still uses a triple-backtick fence.

   Inline backticks (mid-line) don't count — they're not fence
   delimiters. Only column-zero runs trigger escalation, so e.g.
   a Python file with an inline ``` mid-line keeps the 3-fence.
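   The escalation rule as introduced in this commit (column-zero scan; a follow-up commit in this PR widens the regex for indented closers):

```typescript
// Column-zero backtick runs only, per this commit.
function longestLeadingBacktickRun(source: string): number {
  let longest = 0;
  for (const match of source.matchAll(/^(`+)/gm)) {
    longest = Math.max(longest, match[1].length);
  }
  return longest;
}

// Fence strictly longer than any internal run, never shorter than 3.
function fenceFor(source: string): string {
  return '`'.repeat(Math.max(3, longestLeadingBacktickRun(source) + 1));
}
```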

+11 regression tests:
  - 8 parameterized cases: `Dockerfile`/`Makefile`/`Gemfile`/etc. route
    to CODE via bare-name fallback (case-insensitive on basename).
  - Path-aware: `proj/Dockerfile` recognized.
  - No double-match: `dockerfile.dev` (with extension) returns null.
  - Unknown extensionless files (`README`, `LICENSE`) stay null.
  - 4-backtick fence when source has ` ``` ` at start-of-line.
  - 5-backtick fence when source has ` ```` ` at start-of-line.
  - 3-backtick fence (default) for ordinary code.
  - Inline backticks don't escalate.
  - Source starting with backtick run at offset 0.

Plus 6 new `languageForFilename` tests covering bare-name fallback
and path-awareness.

108/108 in artifact-impacted tests (was 87, +21 tests). No regressions.

* 🛡️ fix: Indented fence detection + basename-scoped extensionOf (codex P2/P3)

Two follow-ups from the latest Codex review on the CODE bucket:

1. **Indented backtick runs (Codex P2).** `longestLeadingBacktickRun`
   was scanning `^(`+)` — column 0 only. CommonMark allows fence
   closers to be indented up to 3 spaces, so a JS source containing
   an indented `\`\`\`` (e.g. inside a template literal embedded in a
   class method) would still terminate our outer fence and the
   remainder would render as markdown.

   Updated regex to `^ {0,3}(`+)`. Tabs are not allowed in fence
   indentation (CommonMark expands them to 4 spaces, which is over
   the 3-space limit), so spaces alone suffice. Backticks indented
   4+ spaces are CommonMark "indented code blocks" — they can't
   terminate a fence, so we correctly don't escalate for them.
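   The updated scan, in isolation (same shape as before; only the indent allowance changes):

```typescript
// Fence closers indented up to 3 spaces now count; 4+ spaces is
// CommonMark indented-code-block territory and is correctly ignored.
function longestLeadingBacktickRun(source: string): number {
  let longest = 0;
  for (const match of source.matchAll(/^ {0,3}(`+)/gm)) {
    longest = Math.max(longest, match[1].length);
  }
  return longest;
}
```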

2. **`extensionOf` path-laden output (Codex P3).** `extensionOf` took
   `lastIndexOf('.')` across the FULL path string, so
   `pkg.v1/Dockerfile` yielded the nonsensical "extension"
   `v1/dockerfile`. `languageForFilename` returned that as the language
   hint (broken `language-v1/dockerfile` class on the fenced block),
   AND the routing's bare-name fallback couldn't fire because the
   extension lookup returned non-empty.

   New `basenameOf` helper strips path separators; `extensionOf` and
   `bareNameOf` both go through it. After the fix:
     - `pkg.v1/Dockerfile` → `extensionOf` returns `''` → `bareNameOf`
       returns `dockerfile` → routes to CODE with correct language.
     - `pkg.v1/main.go` → `extensionOf` returns `go` → routes correctly.
     - `pkg.v1/script.py` → `extensionOf` returns `py` → routes correctly.
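   The basename-scoped parse, sketched (helper names from the commit; the real code also feeds the bare-name fallback through `basenameOf`):

```typescript
function basenameOf(filename: string): string {
  const slash = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
  return filename.slice(slash + 1);
}

// Extension is parsed from the basename only, so dotted directory names
// like `pkg.v1/` can no longer leak into the result.
function extensionOf(filename: string): string {
  const base = basenameOf(filename).toLowerCase();
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot + 1) : '';
}
```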

+10 regression tests:
  - 5 parameterized cases covering 1-3 space indent at fence lengths
    3, 4, 5 (escalation kicks in correctly).
  - 4-space indent does NOT escalate (CommonMark indented-code-block
    territory; can't close a fence).
  - `pkg.v1/Dockerfile` and `a.b.c/Makefile` route to CODE +
    `languageForFilename` returns `dockerfile`/`makefile`.
  - Dotted-directory files (`pkg.v1/main.go`, `a.b.c/script.py`) still
    route correctly via the basename-scoped extension parse.

118/118 in artifact-impacted tests (was 108, +10 tests). No regressions.

* 🛡️ fix: Comprehensive review polish + MIME-derived language hint (codex P3)

Resolves all 8 valid findings from the comprehensive review and the
follow-up Codex P3 on the same PR. None are user-visible bugs; the set
spans correctness guards, dead-code removal, organization, and test
coverage.

**Comprehensive review #1 — Remove dead `isCodeExtension` export.**
Function was exported with zero callers anywhere in the codebase.

**Comprehensive review #2 — Guard the for-loop against silent overwrites.**
The `for (ext of CODE_EXTENSION_TO_LANGUAGE)` loop blindly assigned
each language extension to the CODE bucket. If a future contributor
added `jsx` or `tsx` to the language map (a natural mistake — they
ARE source code), the loop would silently overwrite the REACT bucket
entries and break the sandpack live-preview with no compile-time or
runtime error. Added `if (ext in EXTENSION_TO_TOOL_ARTIFACT_TYPE) continue`
so explicit map entries always win.
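The guard in sketch form (map contents illustrative; the point is that explicit bucket entries always take precedence over the CODE-language sweep):

```typescript
const EXTENSION_TO_TOOL_ARTIFACT_TYPE: Record<string, string> = {
  jsx: 'REACT', tsx: 'REACT',
};
// Suppose a contributor adds jsx here by mistake — the guard protects it.
const CODE_EXTENSION_TO_LANGUAGE: Record<string, string> = {
  py: 'python', jsx: 'javascript',
};

for (const ext of Object.keys(CODE_EXTENSION_TO_LANGUAGE)) {
  if (ext in EXTENSION_TO_TOOL_ARTIFACT_TYPE) continue; // never overwrite
  EXTENSION_TO_TOOL_ARTIFACT_TYPE[ext] = 'CODE';
}
```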

**Comprehensive review #3 — Add `fileToArtifact` end-to-end test for CODE.**
Routing was tested via `detectArtifactTypeFromFile`; full Artifact
construction (id / type / title / content / messageId / language) for
CODE was not. Added 5 new `fileToArtifact` cases.

**Comprehensive review #4 — Move pure utilities out of the hook file.**
`wrapAsFencedCodeBlock` and `longestLeadingBacktickRun` are pure
string transformations with no React dependencies. Moved both to
`utils/artifacts.ts`. Test files updated to import from the new
location.

**Comprehensive review #5 — Correct the MIME-map "mirrors" comment.**
Comment claimed the MIME map mirrored `CODE_EXTENSION_TO_LANGUAGE`, but
covered ~21 of ~60 entries. Reworded to "best-effort COMMON-CASE list,
not an exhaustive mirror" with the rationale (extension routing is
primary; MIME is a stripped-filename fallback).

**Comprehensive review #6 — Drop `lang ? lang : ''` ternary.**
`lang` is typed `string`; the only falsy value is `''`. Removed.
(Replaced via the MIME-fallback rewrite of `wrapAsFencedCodeBlock`,
where `lang` is now used directly without the ternary.)

**Comprehensive review #7 — Avoid double `basenameOf` computation.**
`extensionOf(filename)` and `bareNameOf(filename)` both internally
called `basenameOf` — when the extension lookup missed,
`detectArtifactTypeFromFile` paid for two parses of the same path.
Split into private `extensionFromBasename` / `bareNameFromBasename`
helpers; the caller computes `basenameOf` once and threads it through.

**Comprehensive review #8 — Trim verbose Dockerfile/Makefile comment.**
Inline comment block in the language map duplicated `bareNameOf`'s
JSDoc. Replaced with a one-line pointer.

**Codex P3 — MIME fallback for the CODE language hint.**
`detectArtifactTypeFromFile` routes `{ filename: 'noext', type:
'text/x-python' }` to CODE via the MIME bucket map, but then
`useArtifactProps` derived the language hint from `artifact.title`
ONLY — and `noext` has no extension, so `languageForFilename` returned
empty and the fenced block emitted with no `language-` class. The
future highlighter swap-in would lose syntax-color metadata for these
files.

  - New `MIME_TO_LANGUAGE` map covering the language MIMEs codeapi
    actually emits.
  - `languageForFilename(filename, mime?)` now takes an optional MIME
    second arg and falls back to it after the extension and bare-name
    paths.
  - `fileToArtifact` resolves the language at construction time
    (using both filename AND `attachment.type`) and stores it on
    `artifact.language`. The hook reads `artifact.language` directly
    rather than re-deriving from `title` alone, so the MIME signal
    survives end-to-end.
  - Title-derived fallback in the hook covers older callers that
    don't populate `language`.
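  The lookup order in reduced form (maps abbreviated; the bare-name step from earlier in this PR is elided here for brevity):

```typescript
const EXTENSION_TO_LANGUAGE: Record<string, string> = { py: 'python' };
const MIME_TO_LANGUAGE: Record<string, string> = { 'text/x-python': 'python' };

// Extension first, then MIME as the stripped-filename fallback.
function languageForFilename(filename: string, mime?: string): string {
  const base = filename.slice(filename.lastIndexOf('/') + 1).toLowerCase();
  const dot = base.lastIndexOf('.');
  const ext = dot > 0 ? base.slice(dot + 1) : '';
  if (ext && EXTENSION_TO_LANGUAGE[ext]) return EXTENSION_TO_LANGUAGE[ext];
  // Strip MIME parameters ("; charset=utf-8") before the lookup.
  const baseMime = (mime ?? '').split(';')[0].trim().toLowerCase();
  return MIME_TO_LANGUAGE[baseMime] ?? '';
}
```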

Tests:
  +10 cases for the comprehensive review findings (CODE end-to-end
  via `fileToArtifact`, language storage, non-CODE language
  un-set).
  +6 cases for the MIME fallback (`languageForFilename(name, mime)`
  ordering, MIME parameter stripping, extension/bare-name vs MIME
  precedence, empty signal).
  +2 hook tests for `artifact.language` pre-resolved vs title-fallback.

131/131 in directly-impacted files (was 118, +13).
199/199 across the broader artifact suite. No regressions.

Pre-existing TypeScript errors in `a11y/`, `Agents/`, `Auth/`,
`Mermaid.tsx`, etc. are unrelated to this PR (verified by checking
`tsc --noEmit` on origin/dev — same errors).
2026-04-28 19:07:19 -04:00
Danny Avila
c9dee962e7
📂 fix: Preserve Nested Folder Paths for Code-Execution Artifacts (#12848)
* 📂 fix: Preserve Nested Folder Paths for Code-Execution Artifacts

When codeapi reports a generated file at a nested path (`a/b/file.txt`),
`processCodeOutput` was running it through `sanitizeFilename` — which
calls `path.basename()` and then collapses `/` to `_`. The DB row ended
up with `filename: "file.txt"`, `primeFiles` shipped that flat name back
to the next sandbox session, and `cat /mnt/data/a/b/file.txt` 404'd.

Fix: split the sanitizer into two helpers in `packages/api/src/utils/files.ts`:

  - `sanitizeArtifactPath` — segment-wise sanitize while preserving
    `/`. Falls back to basename on `..` traversal, absolute paths, and
    other malformed inputs. The DB record uses this so the next prime()
    can recreate the nested path in the sandbox.

  - `flattenArtifactPath` — encode `/` as `__` for the local
    `saveBuffer` strategies, which key by single-component filename and
    would otherwise create unintended subdirectories under uploads/.

`process.js` is updated to use both: DB filename keeps the path, storage
key flattens. `claimCodeFile` is also keyed on `safeName` so the
(filename, conversationId) compound key stays consistent with the
record `createFile` writes.
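The DB-name vs storage-key split, heavily simplified (the real helpers also handle traversal fallbacks, per-segment caps, and disambiguation):

```typescript
// DB record: segment-wise sanitize while preserving '/', so the next
// prime() can recreate the nested path in the sandbox.
function sanitizeArtifactPath(p: string): string {
  return p
    .split('/')
    .filter(Boolean)
    .map((seg) => seg.replace(/[^a-zA-Z0-9._-]/g, '_'))
    .join('/');
}

// Local storage key: flatten '/' to '__' so saveBuffer strategies keyed
// by a single path component never create subdirectories under uploads/.
function flattenArtifactPath(p: string): string {
  return p.replace(/\//g, '__');
}
```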

Tests:
  +13 unit tests in `files.spec.ts` (sanitizeArtifactPath table,
  flattenArtifactPath round-trip).
  +1 integration test in `process.spec.js` asserting the DB-row vs
  storage-key split for a nested path.
  Updated `process-traversal.spec.js` to mock the new helpers.

64 pass / 0 fail across `Files/Code/`; 36 pass / 0 fail in
`packages/api/src/utils/files.spec.ts`.

Companion: ClickHouse/ai#1327 — the codeapi-side counterpart that stops
phantom file IDs from reaching this code path in the first place. These
two are independent but the matplotlib bug is most cleanly resolved when
both ship.

* 🛡️ fix: Re-add 255-char per-segment cap in sanitizeArtifactPath (codex review P2)

`sanitizeArtifactPath` dropped the 255-char basename cap that
`sanitizeFilename` enforces. Long artifact names then flowed unbounded
into `processCodeOutput`'s storage key (`${file_id}__${flatName}`) and
tripped `ENAMETOOLONG` on filesystems that enforce `NAME_MAX` —
saveBuffer fails, and the file falls back to a download URL instead of
persisting / priming. This was a regression specifically for flat
filenames that the original `sanitizeFilename` would have truncated
safely.

Re-add the cap as a per-path-component limit so it applies cleanly to
both flat and nested paths:

  - Leaf segment: extension-preserving truncation, matching
    `sanitizeFilename`'s shape (`<truncated-stem>-<6 hex>.<ext>`).
  - Non-leaf (directory) segments: plain truncate-and-disambiguate
    (`<truncated-name>-<6 hex>`); directory names don't carry semantic
    extensions worth preserving.
  - Defensive fallback when `path.extname` returns a pathologically long
    "extension" (e.g. `_.aaaa…aaa` after the dotfile underscore prefix
    rewrite turns a long hidden file into a non-dotfile with a 300-char
    "extension"): collapse to whole-segment truncation rather than
    leaving the cap unmet.

+6 unit tests covering: long leaf (regression case), long leaf under a
preserved directory, long non-leaf segment, deeply nested mixed-length,
exact-255 boundary (no truncation), and the dotfile + truncation
interaction.

* 🛡️ fix: Cap flattened storage key against NAME_MAX in processCodeOutput (codex review P1)

Per-segment caps on the path-preserving form aren't enough. Once segments
are joined with `__` for the storage key, deeply-nested or moderately
long paths can still produce a flat form that overflows once
`${file_id}__` is prepended — `${file_id}__a__b__c.csv` for a 3-level
100-char-each path is ~344 chars, well past filesystem NAME_MAX (255).
saveBuffer then trips ENAMETOOLONG and falls back to a download URL,
and the artifact never persists / primes.

`flattenArtifactPath` gets an optional `maxLength` parameter. When set,
the function truncates the flat form to fit, preserving the leaf
extension with the same disambiguating-hex-suffix shape sanitizeFilename
uses. Default (`undefined`) keeps existing call sites uncapped — the cap
is opt-in for callers that are actually building a filesystem key.
Pathologically long "extensions" from `path.extname` (e.g. `.aaaa…aaa`)
fall back to whole-key truncation rather than leaving the cap unmet.

processCodeOutput composes the storage key after `file_id` is known and
passes `255 - file_id.length - 2` as the budget so the full
`${file_id}__${flatName}` string fits in one filesystem path component.
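The budget arithmetic is small enough to show directly (function name hypothetical):

```typescript
// The storage key is `${file_id}__${flatName}` and must fit NAME_MAX
// (255) as a single path component; 2 accounts for the '__' separator.
function flatNameBudget(fileId: string, nameMax = 255): number {
  return nameMax - fileId.length - 2;
}
```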

+7 unit tests in files.spec.ts:
  - Pass-through when no maxLength supplied (cap is opt-in).
  - Pass-through when flat form fits within maxLength.
  - Truncation with leaf extension preserved (the regression case).
  - Leaf-only overflow with extension preservation.
  - Pathological long-extension fallback (whole-key truncation).
  - No-extension stem truncation.
  - Boundary equality (off-by-one guard).

+1 integration test in process.spec.js: processCodeOutput passes the
file_id-aware budget (`255 - file_id.length - 2`) to flattenArtifactPath.

114/114 across files.spec.ts + Files/Code (49 + 65).

* 🛡️ fix: Determinize + clamp artifact-path truncation (codex review P2 ×2)

Two follow-ups to Codex review on the path/flat-key cap:

1. **Deterministic truncation suffixes**. The previous helpers used
   `crypto.randomBytes(3)` for the disambiguator, mirroring
   `sanitizeFilename`'s shape. That made the truncated form non-
   deterministic: a re-upload of the same long filename would compute a
   *different* storage key, orphaning the previous on-disk file under
   the reused `file_id` returned by `claimCodeFile`.

   New `deterministicHexSuffix(input)` helper hashes the input with
   SHA-256 and takes the first 6 hex chars. Same input → same suffix
   (storage key stable across re-uploads); different inputs sharing a
   truncation prefix still get different suffixes (collision avoidance).
   24 bits ≈ 16M values is collision-safe for our scale (single-digit
   artifacts per turn per (filename, conversationId) bucket).

   Applied to `truncateLeafSegment`, `truncateDirSegment`, and
   `flattenArtifactPath` — every truncation site in the new helpers.
   `sanitizeFilename` (pre-existing) is intentionally left alone; its
   tests rely on the random-bytes mock and it's outside this PR's scope.

2. **Final clamp on flattenArtifactPath result**. The old `Math.max(1,
   maxLength - ext.length - 7)` floor could let the result slip past
   `maxLength` when the extension was nearly as large as the budget
   (e.g. `maxLength=5`, `ext=".txt"`: budget computed as 0, but result
   was `-<6 hex>.txt` = 11 chars). Drop the `Math.max(1, …)` floor and
   add a final `truncated.slice(0, maxLength)` so the contract holds
   for any input. Also short-circuit `maxLength <= 0` to `''` for
   pathological budgets.

Tests updated to compute the expected hash inline (the existing
`randomBytes` mock doesn't apply to the new code path), plus 4 new
regression tests:
  - sanitizeArtifactPath: same input → same output, different inputs →
    different outputs (determinism + collision avoidance).
  - flattenArtifactPath: same input → same output, different inputs
    sharing a truncation prefix → different outputs.
  - flattenArtifactPath: clamp holds when ext.length > maxLength - 7.
  - flattenArtifactPath: returns '' for maxLength <= 0.

53 unit tests pass. 65 integration tests pass.
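The deterministic suffix itself is a one-liner (shown here standalone; the commit applies it at every truncation site in the new helpers):

```typescript
import { createHash } from 'node:crypto';

// Same input → same 6-hex suffix, so storage keys are stable across
// re-uploads; different inputs sharing a truncation prefix still diverge.
function deterministicHexSuffix(input: string): string {
  return createHash('sha256').update(input).digest('hex').slice(0, 6);
}
```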

* 🛡️ fix: Total-path cap + basename for classifier (codex P2 + comprehensive review)

Four follow-ups from the latest reviews on this PR:

1. **Codex P2: total-path cap in sanitizeArtifactPath**. Per-segment
   caps weren't enough — a deeply nested path (3+ at-cap segments) can
   still produce a joined form past Mongo's 1024-byte indexed-key limit
   (4.0 and earlier reject; later versions configurable). Added
   `ARTIFACT_PATH_TOTAL_MAX = 512` and a leaf-only fallback when the
   joined form exceeds it. Same shape as the absolute-path /
   `..`-traversal fallbacks above; the leaf is already segment-capped to
   ≤255, so the final result stays within bounds.

2. **Codex P2: pass basename to classifier/extractor in process.js**.
   With the path-preserving sanitizer, `safeName` can now be a nested
   string like `reports.v1/Makefile`. The classifier's `extensionOf`
   reads that as `v1/Makefile` (the slice after the dot in the directory
   name) and the bare-name branch rejects because it sees a `.`
   anywhere. Result: extensionless artifacts under dotted folders
   (Makefile, Dockerfile, etc.) get misclassified as `other` and skip
   text extraction. Pass `path.basename(safeName)` to both
   `classifyCodeArtifact` and `extractCodeArtifactText` so
   classification matches what the old flat-name flow produced.

3. **Review nit: drop dead `sanitizeFilename` mock in process.spec.js**.
   process.js no longer imports `sanitizeFilename`; the mock was
   misleading dead code.

4. **Review nit: rename misleading `'embedded parent traversal'` test**.
   `path.posix.normalize('a/../escape.txt')` resolves to `escape.txt`
   which goes through the normal segment-split path, not the
   `sanitizeFilename` fallback. Test name now says "resolves embedded
   parent traversal via path normalization" to match the actual code
   path.

+3 regression tests:
  - sanitizeArtifactPath falls back to leaf-only when joined > 512.
  - sanitizeArtifactPath keeps nested path within the 512 budget.
  - process.spec: passes basename (`Makefile` from `reports.v1/Makefile`)
    to classifyCodeArtifact + extractCodeArtifactText.

Existing "caps every segment in a deeply-nested path" test now uses 2
segments (not 3) so the joined form stays under the new total cap; the
3-segment scenario is covered by the new fallback test instead.

55 unit + 66 integration = 121/121 pass.

* 📝 docs: Correct sanitizeArtifactPath JSDoc to match actual schema index

Two doc-only fixes from the latest comprehensive review (both NIT):

1. **Index field list was wrong**. JSDoc claimed the compound unique
   index was `{ file_id, filename, conversationId, context }`. The
   actual index in `packages/data-schemas/src/schema/file.ts:92-95` is
   `{ filename, conversationId, context, tenantId }` with a partial
   filter for `context: FileContext.execute_code`. The cap rationale
   (Mongo 4.0 indexed-key limit) is correct and unchanged; just the
   field list was wrong. Added the schema file path so future readers
   can find the source of truth.

2. **Trade-off acknowledgement**. The reviewer noted that the
   leaf-only fallback loses directory structure, which means the
   model's `cat /mnt/data/<deep>/<path>/file.txt` would 404 on the
   pathological-depth case — partially re-introducing the original
   flat-name bug for >512-char paths. This is intentional (DB write
   failure is strictly worse than losing structure), but the trade-off
   wasn't called out explicitly in the JSDoc. Added a paragraph
   acknowledging it and noting that the cap is monotonically better
   than the pre-PR behavior, where ALL artifacts were treated this way
   regardless of depth.

No code or test changes — pure JSDoc correction. Tests still 55/0.

* 🛡️ fix: Disambiguate sanitized artifact names to keep claimCodeFile keys unique (codex P2)

`sanitizeArtifactPath` is not injective — multiple raw inputs can collapse
onto the same regex-and-normalize output. Codex's example:
`reports 2026/out.csv` and `reports_2026/out.csv` both sanitize to
`reports_2026/out.csv`. `claimCodeFile` is keyed on the schema's compound
unique `(filename, conversationId, context, tenantId)` index, so the
later upload silently matches the earlier record and overwrites the first
artifact's bytes via the reused `file_id` — a single conversation can
drop files when both names are valid in the sandbox.

This collision space isn't strictly new — pre-PR `sanitizeFilename`
(basename-only) had the same property — but the path-preserving form
gives us enough information to fix it for the first time.

**Fix.** When character-level sanitization changed something (regex
replacement, path normalization, dotfile prefix, empty-segment collapse),
embed a deterministic SHA-256 prefix of the **raw input** in the leaf
segment via the new `embedDisambiguatorInLeaf` helper. Same raw input →
same safe form (idempotent for re-uploads); different raw inputs that
would have collided → different safe forms.

**Why "character-level"** specifically:
- The disambiguator fires when `preCapJoined !== inputName` (post-regex
  + dotfile + empty-segment, BUT pre-truncation).
- Truncation alone is already disambiguated by `truncateLeafSegment`'s
  own seg-hash; firing the input-hash branch on truncation would just
  stack a second hash for no collision-avoidance benefit and clutter
  human-readable filenames.

**Three known collision shapes covered:**
1. `out 1.csv` vs `out_1.csv` (and `out@1.csv` vs `out#1.csv`, etc.)
2. `dir//file.txt` vs `dir/file.txt` (empty-segment collapse)
3. `.x` vs `_.x` (dotfile-prefix step)
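A hypothetical reduction of the firing rule (the real `embedDisambiguatorInLeaf` also interacts with truncation and the `dot <= 1` no-real-extension check described below):

```typescript
import { createHash } from 'node:crypto';

// Hash the RAW input and embed it in the leaf only when character-level
// sanitization actually changed something; clean inputs stay clean.
function disambiguatedLeaf(raw: string, sanitized: string): string {
  if (sanitized === raw) return sanitized;
  const hash = createHash('sha256').update(raw).digest('hex').slice(0, 6);
  const dot = sanitized.lastIndexOf('.');
  return dot > 1
    ? `${sanitized.slice(0, dot)}-${hash}${sanitized.slice(dot)}`
    : `${sanitized}-${hash}`;
}
```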

**Disambiguator + truncation interaction:** for very long mutated leaves,
`truncateLeafSegment` caps at 255 first, then `embedDisambiguatorInLeaf`
re-trims to insert the input hash. The seg-hash from the first pass is
replaced by the input-hash from the second pass — that's intentional
(input-hash is the load-bearing collision-avoidance suffix; seg-hash was
only ever decorative once the input-hash exists). Final clamp ensures
the result never exceeds `ARTIFACT_PATH_SEGMENT_MAX` regardless of input.

**Disambiguator + total-cap fallback:** when joined > 512, we fall back
to the leaf-only form. The leaf has already had the disambiguator
embedded, so collision avoidance survives the pathological-depth case.

**`embedDisambiguatorInLeaf`** uses `dot <= 1` to detect "no real
extension" (covers extensionless names AND dotfile-prefixed leaves like
`_.hidden` — without this, `_.hidden` would split as stem `_` + ext
`.hidden` and produce the awkward `_-<hash>.hidden`).

**Updated 5 existing tests** that asserted the old collision-prone
outputs — they now verify the disambiguator-included form. The
character-level-only firing rule was load-bearing here: tests for
"clean inputs (no mutation)" and "long inputs (truncation only)" still
pass without any disambiguator clutter.

**+7 regression tests** in a new `collision avoidance (Codex review P2)`
describe block:
1. Different raw inputs sanitizing to the same form get distinct safes
2. Whitespace-vs-underscore in directory segment
3. Dotfile-prefix collision
4. Idempotency: same raw → same safe across calls
5. Clean inputs skip the disambiguator (cosmetic guarantee)
6. Disambiguator survives leaf truncation (long mutated leaf)
7. Disambiguator survives total-cap fallback (pathological depth)

62 unit + 66 integration = 128/128 pass.
2026-04-28 12:52:04 +09:00
Danny Avila
7070eb76aa
🔧 fix: Replace Literal NUL Bytes in handlers.spec Test Fixture + Normalize CRLF (#12852)
Two test-file hygiene fixes:

1. **Literal NUL bytes**. The `'rejects binary content (NUL bytes)
   post-fetch'` test embedded raw `\x00` bytes directly in the source
   string (`const binaryWithNul = '<3 NULs>\rIHDR<2 NULs>\x04'`).
   Embedding NULs in source files breaks editors, linters, ts-loader,
   and most git tooling — `grep` even classifies the file as binary
   ("Binary file matches"). Replace with `\x00` escape sequences in the
   string literal so the source is plain ASCII while the runtime string
   value is unchanged.

2. **CRLF line endings**. My earlier commits to this file picked up
   Windows-style `\r\n` from git's `core.autocrlf=true` checkout
   conversion, then staged them as `\r\n`. The diff against `dev`
   showed the entire file as changed even though only a few lines were
   touched semantically. Normalize the whole file back to LF so future
   diffs read clean.

The diff for this commit is large (~1248 lines marked changed) but
every change is one of: CRLF → LF, or the single `binaryWithNul`
escape-sequence rewrite. No semantic test changes.

Tests: 39/39 pass (unchanged behavior).
2026-04-28 11:20:26 +09:00
Danny Avila
2624c18633
🚫 fix: Reject Binary Files in read_file Sandbox Fallback (No More Mojibake) (#12851)
* 🚫 fix: Reject Binary Files in read_file Sandbox Fallback (No More Mojibake)

`read_file("/mnt/data/simple_graph.png")` was shelling `cat` through codeapi
`/exec` and shipping the result back to the model. codeapi's transport is
JSON, so `stdout` containing PNG bytes round-tripped through lossy UTF-8
replacement (every non-ASCII byte became U+FFFD), got line-numbered by
`addLineNumbers`, and arrived in the model's context as a multi-KB blob of
`�PNG\r\n  2 | ...`. The bytes were unrecoverable — and the same
codeapi sandbox logged the base64-style mojibake too — so the goal is
fail-fast, not retrieval.

Two guards in `handleSandboxFileFallback`, both bypassed by the existing
text path:

  1. **Extension precheck** (BEFORE any network call) for known-binary
     types: images, documents (pdf/docx/xlsx/etc), archives, audio, video,
     native libs, fonts, and a few other byte-soup formats. The message
     for image extensions points at the existing chat attachment ("the
     image is already shown to the user, use bash_tool for programmatic
     processing"); other binaries get the generic "use bash" hint.

  2. **NUL-byte sniff** (AFTER the read) on the first 8KB for unknown
     extensions or no-extension paths. The codeapi `/exec` JSON encoding
     mangles most non-UTF-8 bytes but a NUL terminator from a magic
     header survives the round-trip, so this catches novel binary
     formats without an extension precheck.
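
The NUL sniff in guard 2 can be sketched as (helper name is an
assumption):

```typescript
// Post-fetch NUL-byte sniff: scan only the first 8KB of the fetched
// content for a NUL character.
function looksBinary(content: string, sniffBytes = 8192): boolean {
  return content.slice(0, sniffBytes).includes('\u0000');
}
```

A NUL past the sniff window is deliberately missed — the guard targets
magic headers, which sit at the start of the file.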

`lowercaseExtension` uses the basename to avoid false-triggering on
directory-name dots (e.g. `proj.v1/notes` has no extension, not `.v1/notes`).
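A sketch of that basename-first extension check (exact body assumed):

```typescript
// Take the last path segment before searching for a dot, so
// directory-name dots never match; dotfiles count as extensionless.
function lowercaseExtension(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}
```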

+6 tests:
  - Image rejected by extension, never calls readSandboxFile, message
    points at the existing attachment.
  - Non-image binary (.zip) rejected with a different (bash-only) message.
  - Case-insensitive extension match (.PNG vs .png).
  - NUL-byte sniff catches unknown-extension binary post-fetch.
  - Text files with binary-adjacent extensions (.txt) still readable.
  - Dotted directory names don't false-trigger the extension match.

38/38 handlers.spec.ts pass.

The companion bash "command not found" issue from the same conversation
is a separate LLM mistake (writing raw Python as the bash command without
`python3 -c` / heredoc wrapper). Not addressed here — flagged to the user.

* 🖼️ fix: Allow SVG read_file (XML text, no mojibake risk) — codex review P2

`.svg` was bucketed with raster-image extensions in
`BINARY_EXTENSIONS_NEVER_READABLE`, which made `handleSandboxFileFallback`
reject every SVG before calling readSandboxFile. SVG is an XML text
format — there's no mojibake risk for normal content, and the model has
legitimate reasons to inspect or edit a generated SVG (tweaking colors,
paths, viewBox, etc.). The block was a regression for valid read_file usage.

Remove `.svg` from both `BINARY_EXTENSIONS_NEVER_READABLE` (so it routes
through the normal sandbox read path) and `IMAGE_EXTENSIONS_FOR_HINT`
(now-dead entry — only used by the rejection-message picker). The
post-fetch NUL-byte sniff still catches anything that turns out to be
binary despite a `.svg` extension.

+1 regression test that an SVG with valid XML content reads through
successfully (`<svg>...<circle/>...</svg>` → status: 'success', content
contains `<svg`/`viewBox`).
2026-04-27 20:47:38 -04:00
Danny Avila
47f65fe39a
🪟 feat: Render Code-Execution Text Artifacts as Side-Panel Artifacts (#12832)
* 🪟 feat: Render Code-Execution Text Artifacts as Side-Panel Artifacts

Builds on PR #12829 (which populates `text` on code-execution file
attachments). When a tool-output file's extension/MIME maps to a
viewer we already have, route it through the artifact UI instead of
the inline `<pre>`:

- text/html, text/htm        → existing artifacts side panel (sandpack)
- App.jsx / App.tsx          → existing artifacts side panel (sandpack)
- *.md / *.markdown / *.mdx  → existing artifacts side panel (sandpack)
- *.mmd / *.mermaid          → standalone Mermaid component, inline
                                (no sandpack/react template)

The card and the mermaid header both expose a download button so the
underlying file is still reachable. Everything else (csv, py, json,
xls/docx/pptx, …) keeps PR #12829's inline behaviour — dedicated
viewers for csv/docx/xlsx/pptx will land in follow-ups.

Backend: `.mmd` and `.mermaid` added to UTF8_TEXT_EXTENSIONS so
mermaid sources reach the client with `text` populated.

Frontend changes:
- `client/src/utils/artifacts.ts` — `TOOL_ARTIFACT_TYPES` constant,
  `detectArtifactTypeFromFile`, `fileToArtifact` (id is derived from
  `file_id` so the same artifact across renders dedupes cleanly).
- `client/src/components/Chat/Messages/Content/Parts/ToolArtifactCard.tsx`
  — registers the artifact in `artifactsState`, renders an
  `ArtifactButton`-style trigger paired with a download button.
- `client/src/components/Chat/Messages/Content/Parts/ToolMermaidArtifact.tsx`
  — wraps the standalone Mermaid component with a filename + download
  header so the file stays reachable.
- `Attachment.tsx` and `LogContent.tsx` — gain panel-artifact and
  mermaid branches in the routing decision tree, ahead of the existing
  inline-text fallback. Existing branches untouched.

Test coverage: backend extension matrix (mmd/mermaid), frontend
predicates (`isPanelArtifact`, `isMermaidArtifact`,
`artifactTypeForAttachment`), `fileToArtifact`, and an RTL suite that
verifies each type routes to the right component (panel card / mermaid
render / inline pre / file chip).

* 🩹 fix: Address review on code-artifacts-panel routing

- ToolArtifactCard: defer artifact registration to the click handler so
  rendering a card never side-effects into `artifactsState`. With
  `artifactsVisibility` defaulting to `true`, eager mount-time
  registration would surface tool artifacts in the side panel without
  user intent — now matches ArtifactButton's pattern. Drop the
  redundant `artifacts` subscription (write-only via useSetRecoilState).
- LogContent.tsx: precompute `Artifact`s inside the existing useMemo
  bucket-sort so each render isn't producing fresh objects. Without
  this, missing updatedAt/createdAt fields would make `toLastUpdate`
  return `Date.now()` and churn Recoil state on every parent render.
- Attachment.tsx + LogContent.tsx: classify each attachment once via
  `artifactTypeForAttachment` and branch on the result, instead of
  calling `isMermaidArtifact` and `isPanelArtifact` back-to-back
  (each of which internally re-classified). AGENTS.md single-pass rule.
- artifacts.ts `detectArtifactTypeFromFile`: strip `;` parameters
  before the MIME comparison (so `text/html; charset=utf-8` is
  recognized) and add fallbacks for `application/vnd.react`,
  `application/vnd.ant.react`, and `application/vnd.mermaid`.
- ToolMermaidArtifact: drop the `id` prop entirely when `file_id` is
  undefined so we never pass an undefined DOM id through to mermaid.
- AttachmentGroup: keys derived from `file_id` (not bare index) so
  add/remove churn doesn't remount stable cards.
- Wrappers (PanelArtifact / MermaidArtifact / ToolMermaidArtifact)
  tightened from `Partial<TAttachment>` to `TAttachment` since the
  caller always passes a full attachment.
- fileToArtifact: drop dead `?? ''` on content (guarded by the
  preceding type check).
- Tests: new click-interaction suite verifying the deferred-registration
  invariant, click registers + opens panel, and second click toggles
  closed without losing the registered artifact.
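
The parameter-stripping MIME comparison above can be sketched as (helper
body assumed):

```typescript
// Strip `;` parameters and lowercase before comparing MIME types, so
// `text/html; charset=utf-8` matches the bare `text/html` entry.
function baseMime(mime: string): string {
  return mime.split(';')[0].trim().toLowerCase();
}
```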

* 🧹 chore: Address follow-up review NITs

- artifacts.test.ts: regression-pin baseMime() with charset/case
  variants for text/html, text/markdown, application/vnd.react.
- attachmentTypes.ts: drop the now-unused isMermaidArtifact and
  isPanelArtifact wrappers (the routing collapsed onto a single
  artifactTypeForAttachment call in the previous commit, so they
  were only kept alive by their own test). attachmentTypes.test.ts
  rewritten to exercise artifactTypeForAttachment branches directly.
- Attachment.tsx + LogContent.tsx: re-sort the local imports
  longest-to-shortest per AGENTS.md (~/utils/artifacts is 72 chars
  and was sitting after a 51-char import).

*  feat: Auto-open panel + route txt/docx/odt/pptx through artifacts

- artifacts.ts: add `text/plain` to TOOL_ARTIFACT_TYPES so plain-text
  documents (and the markdown-like ones we don't have rich viewers for
  yet) can route through the side panel. `useArtifactProps` already
  dispatches `text/plain` to the markdown-style template, so they
  render cleanly with no panel-side change.
- Extension map gains txt/docx/odt/pptx → text/plain. pptx is wired
  up speculatively — backend extraction is still deferred, so the
  routing fires the moment that lands. The MIME map gets the matching
  office MIME types for symmetry (extension wins, but it's nice to
  have the fallback when sniffing returns the canonical office MIME).
- ToolArtifactCard: register the artifact in `artifactsState` on
  mount again. With visibility defaulting to `true` and the panel's
  `useArtifacts` hook auto-selecting the latest artifact, this gives
  the auto-open behaviour that the legacy streaming artifacts have.
  Click handler is now just "focus + reveal" (registration already
  happened); a user who has explicitly closed the panel keeps it
  closed and uses the click to re-open.
- Tests: parameterised row for each new extension; ArtifactRouting
  invariant flipped from "no register on mount" to "registers on
  mount so panel can auto-open". Existing TextAttachment test that
  used `a.txt` switched to `a.csv` since `.txt` now panel-routes.

* 🐛 fix: Auto-focus latest tool artifact + self-heal after panel close

Two bugs in the previous commit's auto-open behaviour:

1. After closing the side panel, no artifact card could be reopened.
   `useArtifacts.ts` resets `artifactsState` in its unmount cleanup
   (line 50), which fires when visibility goes to `false`. The card's
   mount-only `useEffect` doesn't refire after that wipe, so the
   subsequent click set `currentArtifactId` to an id that was no
   longer in `artifactsState`, and `Presentation.tsx` then refused to
   render the panel because `Object.keys(artifacts).length === 0`.

   Fix: the registration `useEffect` now has no dependency array, so
   it self-heals after the wipe (the dedup check keeps it cheap when
   nothing actually needs writing).

2. Newly-arrived artifacts didn't steal focus from an already-selected
   one. `useArtifacts`'s fallback auto-select (line 64) only fires
   when `currentId` is null or no longer in the list — it deliberately
   protects an existing selection, while the streaming-specific effect
   that handles legacy focus-stealing is gated on `isSubmitting`.
   That gate doesn't apply to tool-output artifacts.

   Fix: a second `useEffect` keyed on `artifact.id` calls
   `setCurrentArtifactId(artifact.id)` whenever a new card mounts.
   Cards mount in attachment-array order, so the LAST-mounted card
   (the newest tool output) wins — matching the legacy "latest auto-
   opens" UX.

Tests: replace the now-stale "no register on mount" assertion with
"registers and auto-focuses on mount", flip the toggle test to start
from the auto-focused state, and add two regression tests covering
the close-then-reopen path and the latest-of-many auto-focus.

*  feat: Route pptx through artifact panel with placeholder content

Before this commit, pptx files fell through to a plain FileContainer
chip even though the extension was wired into the artifact map: backend
text extraction is still deferred for pptx, so `attachment.text` came
back null/empty and `detectArtifactTypeFromFile`'s strict text check
returned null. That meant docx/odt rendered as proper artifact cards
while pptx in the same message rendered as a tiny download chip.

`detectArtifactTypeFromFile` now allows empty text for the plain-text
and markdown buckets, since their viewers (the markdown template) handle
empty content gracefully. HTML / React / Mermaid still require real
content because sandpack and mermaid.js error on empty input.

`fileToArtifact` substitutes a markdown placeholder
("Preview not available yet — click Download to view the file.") when
the file routes through the panel without text. The panel renders the
placeholder via the markdown template; pptx (and any docx that fails
extraction) gets visual parity with its siblings, and the moment
backend extraction lands the placeholder is replaced by real content
without any frontend change.

Tests: split the "no text returns null" assertion into the strict
viewers (HTML/React/Mermaid) and the lenient ones (plain-text/markdown);
add a fileToArtifact case proving pptx without text gets the
placeholder, and another proving real text wins when present.

*  feat: Dedup duplicate tool-artifact cards across tool calls + messages

Two `ToolArtifactCard` instances for the same file_id (e.g. agent reads
back what it just wrote, or the same file is referenced in turns 1 and
5) now collapse to a single chip — the most recent mount wins, the
older sibling re-renders to `null`.

Implementation:

- New `toolArtifactClaim` atomFamily keyed by artifact id. Each card
  generates a unique component-instance key via `useId()`, claims the
  slot in a `useLayoutEffect` (synchronous before paint, no flicker),
  and releases it on unmount only if the claim is still ours. A later
  card with the same id overwrites the claim → earlier card subscribes
  via `useRecoilState` and renders `null`.

- Family-keyed (per artifact id) so adding/removing a claim for one
  file never re-renders cards for unrelated files. Addresses the
  "messages view re-renders frequently" concern: each card subscribes
  only to its own slice.

- `ToolMermaidArtifact` shares the same atom via the new exported
  `toolArtifactKey()` helper, so the same `.mmd` file can't double-
  render either.

- Latest content always wins for the panel because the eager
  `setArtifacts` registration is last-write-wins on `artifactsState`
  by id — independent of which card holds the claim. Updating a file
  refreshes the panel content even if the chip's visual location
  doesn't move.

Tests: two new cases asserting that duplicate panel and mermaid
attachments collapse to a single rendered card.
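
Outside React/Recoil, the claim protocol above reduces to a
last-writer-wins slot per artifact id — a sketch under that
simplification (atomFamily, `useId()`, and `useLayoutEffect` plumbing
elided; a plain Map stands in):

```typescript
// Last-writer-wins claim slot per artifact id.
const claims = new Map<string, string | null>();

function claim(artifactId: string, instanceKey: string): void {
  claims.set(artifactId, instanceKey);
}

function release(artifactId: string, instanceKey: string): void {
  // Release only if the claim is still ours; a later card's claim
  // must survive our unmount.
  if (claims.get(artifactId) === instanceKey) {
    claims.set(artifactId, null);
  }
}

function holdsClaim(artifactId: string, instanceKey: string): boolean {
  return claims.get(artifactId) === instanceKey;
}
```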

* 🧹 chore: Address comprehensive review on code-artifacts-panel

- ToolArtifactCard self-heal now subscribes to a per-id selector
  (`artifactByIdSelector`) instead of a no-deps `useEffect`. Effect
  deps are `(artifact, existingEntry, setArtifacts)` so it runs
  deterministically when the slice transitions to undefined (panel-
  unmount cleanup) or when artifact content drifts — not on every
  parent render. Each card subscribes only to its own slice via the
  selectorFamily, so unrelated state changes don't re-render.
- artifacts.ts: localize the empty-content placeholder via a new
  `fileToArtifact(attachment, options?)` signature. Callers in
  `Attachment.tsx` (PanelArtifact) and `LogContent.tsx` resolve
  `com_ui_artifact_preview_pending` from `useLocalize` and thread
  it in. Default is empty string when no placeholder is supplied.
- artifacts.ts: thread `preClassifiedType` through `fileToArtifact`
  so the routing decision tree's `artifactTypeForAttachment` call
  is the only classification — previously `fileToArtifact` re-ran
  `detectArtifactTypeFromFile` after the routing already had the
  answer. Bucket type updated to `Array<{ attachment, type }>`.
- artifacts.ts: drop bare `text/plain` from `MIME_TO_TOOL_ARTIFACT_TYPE`.
  The extension map handles `.txt` explicitly; routing every
  unrecognized-extension `text/plain` file (extensionless scripts,
  `.env`, etc.) through the panel was a wider catch than the PR
  scope intended.
- artifacts.ts: stable `toLastUpdate` fallback of `0` (was
  `Date.now()`). `useArtifacts` sorts by `lastUpdateTime`, so a fresh
  timestamp on every call would re-sort entries non-deterministically
  across renders.
- artifacts.ts: drop dead `toolArtifactId = toolArtifactKey` alias.
  Add `filepath` to the key-derivation fallback chain so two
  unnamed-and-unidentified files don't collide on the literal
  `tool-artifact-unknown` key.
- ToolArtifactCard import order: package types before local types.
- store/artifacts.ts: JSDoc on `toolArtifactClaim` documenting the
  atomFamily-entries-persist-after-unmount trade-off (entries reset
  to null on card unmount; total cost is one key + a null per
  artifact — fine at typical session scale).
- Tests:
  - Updated existing `fileToArtifact` placeholder assertion to use
    the caller-provided string.
  - New: panel routing skips re-classification when
    `preClassifiedType` is provided.
  - New: bare `text/plain` MIME with unrecognized extension does
    NOT route through the panel.
  - New `LogContent.test.tsx` (6 cases) — HTML→panel, mermaid→
    inline, CSV→inline `<pre>`, archive→download chip, pptx→
    placeholder card, mixed split.
  - Dedup tests rewritten to use two AttachmentGroups (matching
    the real per-tool-call render) instead of a same-array
    duplicate that triggered React's duplicate-key warning.
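
The stable-fallback `toLastUpdate` above can be sketched as (field names
from TFile; body assumed):

```typescript
// Stable fallback of 0 (not Date.now()) keeps the lastUpdateTime sort
// deterministic across renders when a file has no timestamps.
function toLastUpdate(file: { updatedAt?: string; createdAt?: string }): number {
  const stamp = file.updatedAt ?? file.createdAt;
  return stamp != null ? new Date(stamp).getTime() : 0;
}
```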

* 🩹 fix: Address codex review + comprehensive review NITs

codex (P2):
- artifacts.ts: switch placeholder fallback to nullish coalescing.
  Empty string is now preserved as legitimate content (a 0-byte `.md`
  or `.txt` is a valid artifact, not "extraction unavailable") —
  only `null`/`undefined` triggers the deferred-extraction placeholder.
- Attachment.tsx: derive React keys via a new `renderKey` helper that
  combines `file_id` with the array index. Prevents duplicate keys
  when the same file_id appears twice in one bucket (rare but
  possible — a tool call writing the same path twice). Without
  unique keys, React's reconciler could reuse the wrong card
  instance, undermining the latest-mention dedup.
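
The nullish-coalescing rule from the first bullet, as a sketch (names
assumed):

```typescript
// `??` preserves '' as legitimate content; only null/undefined mean
// "extraction unavailable" and trigger the placeholder.
function resolveArtifactContent(
  text: string | null | undefined,
  placeholder: string,
): string {
  return text ?? placeholder;
}
```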

comprehensive review NITs:
- Attachment.tsx: hoist `import type { ToolArtifactType }` up into
  the type-import section per AGENTS.md.
- artifacts.ts `fileToArtifact`: defense-in-depth empty-text guard
  for the `preClassifiedType` path. Mirrors the gate in
  `detectArtifactTypeFromFile` so a future caller that bypasses
  classification can't hand sandpack/mermaid an empty buffer.
  Plain-text and markdown remain tolerated empty.

Tests:
- New: empty `.md` content passes through unchanged when a
  placeholder is also supplied.
- New: sibling cards with the same file_id in one group render
  without React key-collision warnings.
- Updated existing placeholder test to use `text: null` (the case
  where the placeholder is actually meant to fire).
- Three parameterized cases pinning the new
  preClassifiedType-with-empty-text safety guard.

* 🩹 fix: Address codex P1/P2 review on code-artifacts-panel

- P1 (stale artifacts leak across conversations): Add a top-level
  `useResetArtifactsOnConversationChange` hook in `Presentation.tsx`
  that wipes `artifactsState` / `currentArtifactId` on every
  conversation switch, regardless of panel visibility. Without this
  guard, ToolArtifactCard's self-heal effect would re-register the
  previous conversation's artifacts after panel close, leaking them
  into the next conversation's panel on open.

- P2 (expiresAt skipped on panel-routed entries): Restore the legacy
  expiry gate in `LogContent` ahead of panel/mermaid bucket-sort, so
  expired pptx/html/etc. attachments fall back to the
  "download expired" message instead of rendering as a clickable
  artifact card backed by a dead link.

Includes regression coverage for both paths.

* 🧹 chore: Share renderAttachmentKey across Attachment + LogContent

Hoist the per-occurrence React-key helper from `Attachment.tsx` into
`attachmentTypes.ts` so `LogContent` can use the same pattern. Apply
it to LogContent's panel/mermaid/text/image/nonInline buckets — the
prior keys (e.g. `mermaid-${file_id ?? index}`, `file.filepath ?? ...`)
would have collided if the same file_id appeared twice in one render,
even though that's astronomically rare for a single tool call.

Also drops the unused `file_id` field on `MermaidEntry` since the key
no longer needs it.

* 🩹 fix: Loosen artifacts util input types to match runtime fallbacks

`fileToArtifact`, `detectArtifactTypeFromFile`, `toolArtifactKey`, and
`toLastUpdate` all read every picked field with a nullish fallback —
their inputs were nonetheless typed as required `Pick<TFile, ...>`.
That mismatch made every realistic fixture (and several call sites
that lack a stable `filepath`) fail typecheck for fields the
implementations never strictly need.

Wrap the picks in `Partial<>` so the type matches the contract.
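
A sketch of the loosened contract, combined with the key-derivation
fallback chain from the earlier commit (exact `Pick` fields and fallback
order are assumptions):

```typescript
// Inputs are Partial<Pick<...>>: every field is read through a nullish
// fallback, so none is strictly required.
type ArtifactFileInput = Partial<{
  file_id: string;
  filename: string;
  filepath: string;
}>;

function toolArtifactKey(file: ArtifactFileInput): string {
  // Fallback chain order shown here is illustrative.
  return `tool-artifact-${file.file_id ?? file.filename ?? file.filepath ?? 'unknown'}`;
}
```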

* 🩹 fix: Gate tool-artifact registration on claim winner

When two `ToolArtifactCard` instances mount for the same `artifact.id`
with divergent content (a code-execution file overwritten across
turns reuses its file_id), both effects subscribe to `existingEntry`
through `artifactByIdSelector`. Each card detects the other's write as
drift and overwrites it back, ping-ponging `artifactsState` between
old and new content and causing render churn / panel flicker.

Gate the self-heal registration on `isMyClaim` so only the latest
(claim-holding) card writes. The non-winner still subscribes to the
slice but short-circuits before calling `setArtifacts`, breaking the
loop. Adds a regression test that fails (loop / wrong final content)
without the gate.
2026-04-28 06:08:32 +09:00
Danny Avila
24e29aa8cb
🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)
* 🌱 fix: Seed Code Tool Files Into Graph Sessions on First Call

Files attached to an agent's `tool_resources.execute_code` (user uploads
or generated artifacts from a prior turn) were silently dropped on the
first `execute_code` invocation of a turn. The agents-side `ToolNode`
populates `_injected_files` only when its `sessions` map already has an
`EXECUTE_CODE` entry — but that entry is only written by a previous
successful execution, so call #1 had nothing to inject. CodeExecutor
then fell back to a `/files/{session_id}` fetch, but `session_id` was
also empty on call #1, leaving the sandbox without the primed files.

Mirror the existing skill-priming pattern (`primeInvokedSkills` →
`initialSessions`) for code-resource files: eagerly call `primeFiles`
before `createRun` and merge the result into `initialSessions` via a
new `seedCodeFilesIntoSessions` helper. Skill files and code-resource
files now share the same `EXECUTE_CODE` entry; the prior representative
`session_id` is preserved on merge.
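
A sketch of that merge (session shape and key are assumptions; the real
seed flows through `initialSessions`):

```typescript
type SeededFile = { id: string; session_id: string; name: string };
type ToolSession = { session_id?: string; files: SeededFile[] };

// Merge primed code-resource files into the shared EXECUTE_CODE entry,
// preserving a pre-existing representative session_id (e.g. from the
// skill-priming seed).
function seedCodeFilesIntoSessions(
  sessions: Record<string, ToolSession>,
  primed: SeededFile[],
  key = 'execute_code', // stand-in for the EXECUTE_CODE constant
): void {
  if (primed.length === 0) {
    return;
  }
  const existing = sessions[key];
  sessions[key] = {
    session_id: existing?.session_id ?? primed[0].session_id,
    files: [...(existing?.files ?? []), ...primed],
  };
}
```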

* 🔬 chore: Add Diagnostic Logging for Code-Files Seeding

Temporary debug logs to diagnose why first-call file injection is not
firing in real agent runs. Logs `wantsCodeExec`, available tool-resource
keys, primed file count, and the seeded EXECUTE_CODE entry. Will revert
once the failure mode is identified.

* 🪛 refactor: Capture primedCodeFiles per-agent at init, merge across run

Replace the client.js eager `primeFiles` call with a per-agent capture at
initialization time so every agent in a multi-agent run (primary +
handoff + addedConvo) contributes its `tool_resources.execute_code`
files to the shared `Graph.sessions` seed.

- handleTools.js (eager loadTools): the `execute_code` factory closes
  over a `primedCodeFiles` slot and surfaces it in the return.
- ToolService.js loadToolDefinitionsWrapper (event-driven): captures
  `files` from the existing `primeCodeFiles` call (was dropping them
  while only keeping `toolContext`) and surfaces them.
- packages/api initialize.ts: the loadTools callback contract now
  includes `primedCodeFiles`, threaded onto `InitializedAgent`.
- client.js: iterate `[primary, ...agentConfigs.values()]` and merge
  each agent's `primedCodeFiles` into `initialSessions`. Drop the
  primary-only `primeCodeFiles` call and diagnostic logs from the prior
  attempt — wrong layer (single-agent), wrong gate (`agent.tools`
  contained Tool instances after init, so the `.includes("execute_code")`
  string check always failed).

* 🔬 chore: Add per-agent diagnostic logs for code-files seeding

Logs `tool_resources` keys + file counts inside loadToolDefinitionsWrapper
and per-agent `primedCodeFiles` + final initialSessions inside
AgentClient. Will revert once the failure mode is confirmed.

* 🔬 chore: Add file-lookup diagnostics inside initializeAgent

Logs the inputs and intermediate counts of the conversation-file lookup
chain (convo file ids, thread message ids, code-generated and
user-code file counts) so we can pinpoint why `tool_resources.execute_code`
is arriving empty at `loadToolDefinitionsWrapper` despite the agent
having `execute_code` in its tools list.

* 🔬 chore: Probe execute_code files without messageId filter

Adds a relaxed `getFiles({conversationId, context: execute_code})` probe
that runs only when `getCodeGeneratedFiles` returns empty. Lists what's
actually in the DB for this conversation so we can confirm whether the
file is missing entirely or whether the messageId filter is rejecting it.

* 🔬 chore: Fix probe getFiles arg order (sort vs projection)

Probe was passing a projection object as the sort arg, which mongoose
rejected with `Invalid sort value`. Move it to the third arg
(selectFields) so the probe actually runs.

* 🪢 fix: Preserve Original messageId on Code-Output File Update

Each `processCodeOutput` call was overwriting the persisted file's
`messageId` with the *current* run's id. When a turn re-creates an
existing file (filename + conversationId match → `claimCodeFile`
returns the existing record, `isUpdate=true`), the file's link to
the assistant message that originally produced it gets clobbered.

`initializeAgent` later runs `getCodeGeneratedFiles({messageId: $in: <thread>})`
to seed `tool_resources.execute_code` from prior-turn artifacts. With a
stale `messageId` (e.g. from a failed read attempt that re-shelled the
same filename), the file no longer matches the parent-walk thread, so
`tool_resources` arrives empty at agent init, the new
`primedCodeFiles` channel has nothing to seed, and the LLM can't see
its own prior-turn artifacts on the next turn — defeating the
just-added Graph-sessions seeding fix.

Preserve the existing `claimed.messageId` on update; first-creation
behavior is unchanged. The runtime return value still includes the
current run's `messageId` (via `Object.assign(file, { messageId })`)
so the artifact is correctly attributed to the live tool_call.
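
The update rule reduces to the quoted ternary — a sketch (types
assumed):

```typescript
// Persisted record keeps the original messageId on update; a legacy
// claimed record without one falls back to the live run's id.
function resolvePersistedMessageId(
  claimed: { messageId?: string } | null,
  liveMessageId: string,
): string {
  const isUpdate = claimed != null;
  return isUpdate ? (claimed.messageId ?? liveMessageId) : liveMessageId;
}
```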

* 🧹 chore: Remove diagnostic logs from code-files seeding path

Drops the temporary debug logs added to trace the empty-tool_resources
failure mode. Production code paths (loadToolDefinitionsWrapper,
client.js seed loop, initializeAgent file lookup) are left as the
permanent shape: capture primedCodeFiles, merge across agents, seed
initialSessions before run start.

* 🪛 feat: read_file Sandbox Fallback for /mnt/data + Non-Skill Paths

When the model called `read_file` with a code-execution path (e.g.
`/mnt/data/sentinel.txt`), the handler returned a misleading
`Use format: {skillName}/{path}` error. Adds a sandbox-aware fallback:

- Short-circuit `/mnt/data/...` (can never be a skill reference) →
  route to a sandbox `cat` via the new host-provided `readSandboxFile`
  callback, which POSTs to the codeapi `/exec` endpoint.
- Skip the skill resolver entirely when `accessibleSkillIds` is empty
  — the resolved-output of `resolveAgentScopedSkillIds` already
  collapses the admin capability + ephemeral badge + persisted
  `skills_enabled` chain, so an empty value is the authoritative
  "skills aren't in scope for this agent" signal.
- For `{firstSegment}/...` paths, consult the catalog-derived
  `activeSkillNames` Set (no DB read) to detect non-skill names and
  fall through to the sandbox before the model has to retry with
  `bash_tool`.
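
The routing decision above, as a sketch (signature assumed; the real
handler also threads the sandbox callback and model-visible error
hints):

```typescript
// Decide whether a read_file path resolves through the skill resolver
// or falls through to the sandbox `cat`.
function routeReadFile(
  path: string,
  accessibleSkillIds: string[],
  activeSkillNames: Set<string>,
): 'sandbox' | 'skill' {
  if (path.startsWith('/mnt/data/')) {
    return 'sandbox'; // can never be a skill reference
  }
  if (accessibleSkillIds.length === 0) {
    return 'sandbox'; // skills aren't in scope for this agent
  }
  const firstSegment = path.split('/')[0];
  return activeSkillNames.has(firstSegment) ? 'skill' : 'sandbox';
}
```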

`activeSkillNames` is captured from `injectSkillCatalog`, threaded onto
`InitializedAgent`, into `agentToolContexts`, then through
`enrichWithSkillConfigurable` into `mergedConfigurable` for the handler.

The host implementation of `readSandboxFile` lives in
`api/server/services/Files/Code/process.js` and shells `cat <path>`
through the seeded sandbox session — `tc.codeSessionContext`
(emitted by ToolNode for `read_file` calls in `@librechat/agents`
v3.1.72+) provides the `session_id` + `_injected_files` so the read
lands in the same sandbox that holds prior-turn artifacts. When the
seeded context isn't available (older agents version, no codeapi
configured), the handler returns a model-visible error pointing at
`bash_tool` instead of silently failing.

Tests: 8 new `handleReadFileCall` cases cover the new short-circuits,
the skills-not-enabled gate, the activeSkillNames lookup, the
sandbox-fallback success path, and the bash_tool retry hint on
fallback failure. Existing `read_file` tests now opt into "skills are
in scope" via a `skillsInScope()` fixture (production wouldn't reach
the skill lookup with empty `accessibleSkillIds`).

* 🔧 chore: Update @librechat/agents dependency to version 3.1.72

Bumps the version of the @librechat/agents package across package-lock.json and relevant package.json files to ensure compatibility with the latest features and fixes.

* 🪛 refactor: Centralize Tool-Session Seed in buildInitialToolSessions Helper

Addresses review feedback on the per-agent merge in client.js:

- **Run-wide semantics, named explicitly.** The merge into a single
  `Graph.sessions[EXECUTE_CODE]` was a deliberate match to the
  agents-library design (`Graph.sessions` is shared across every
  `ToolNode` in the run), but the inline `for (const a of agents)`
  loop in `AgentClient.chatCompletion` made it look per-agent. Move
  the logic to a TS helper `buildInitialToolSessions` that documents
  the run-wide-by-design contract in one place. The CJS controller
  now contains a single call site, no business logic.

- **Subagent walk (P2).** The previous loop only iterated
  `[primary, ...agentConfigs.values()]`. Pure subagents are pruned
  out of `agentConfigs` after init and retained on each parent's
  `subagentAgentConfigs`, so their primed code files were silently
  dropped from the seed. The helper now walks recursively, with a
  visited-Set keyed on object identity that terminates safely on a
  malformed agent graph (cycle).

- **`jest.setup.cjs` polyfill for undici `File`.** Reviewer hit
  `ReferenceError: File is not defined` running the targeted spec on
  WSL — a known Node 18 issue where `globalThis.File` from
  `node:buffer` isn't auto-exposed. Polyfill it inside a Jest setup
  file so the suite boots regardless of Node patch version.

Helper test coverage (8 new): skill-only / agent-only / both,
recursive subagent walk, cycle-safe walk, primary+subagent
deduplication, undefined/null entries in the agents iterable, and
representative session_id preservation across the merge.

16 tests pass total in `codeFilesSession.spec.ts` (8 prior + 8 new).
No behavior change vs. the previous commit for the existing
primary+agentConfigs case — subagent inclusion is the only new
behavior, and it matches what the existing seeding logic would have
done if subagents had been in `agentConfigs`.

* 🪛 fix: FIFO Walk Order in buildInitialToolSessions (P3 review)

The traversal used `Array.pop()` (LIFO), which visited the LAST
top-level agent first. The docstring says "primary first"; the code
contradicted it. When no skill seed exists the first-visited agent's
first file supplies the representative `session_id` written to
`Graph.sessions[EXECUTE_CODE]` — so a LIFO walk silently flipped which
agent it came from. `ToolNode` ultimately uses per-file `session_id`s
for runtime injection (so behavior was indistinguishable for current
callers), but the discrepancy was a footgun for any future consumer
that read the representative.

Switch to FIFO via `Array.shift()` to match both the docstring and the
existing `loadSubagentsFor` walk pattern in
`Endpoints/agents/initialize.js`. Add a regression test that asserts
the primary's `session_id` is the representative (and that all three
agents' files still contribute, with per-file `session_id`s preserved).
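The FIFO contract can be sketched as a plain queue walk (a hypothetical simplification of `buildInitialToolSessions`; the agent shape is assumed):

```javascript
// Hypothetical sketch: FIFO walk that visits the primary agent first and
// terminates on a malformed (cyclic) agent graph via an identity-keyed
// visited Set. shift() (FIFO) is the fix; pop() (LIFO) was the bug.
function walkAgentsFifo(primary) {
  const queue = [primary];
  const visited = new Set();
  const order = [];
  while (queue.length > 0) {
    const agent = queue.shift(); // FIFO: first queued, first visited
    if (agent == null || visited.has(agent)) continue;
    visited.add(agent);
    order.push(agent.id);
    for (const sub of agent.subagentAgentConfigs?.values?.() ?? []) {
      queue.push(sub);
    }
  }
  return order;
}
```

With `shift()`, the first-visited agent (and thus the representative `session_id` source) is deterministically the primary, even when the graph contains a cycle.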

* 🔬 test: Lock In Code-Files Bug Fixes Per Comprehensive Review

Addresses MAJOR + MINOR + NIT findings from the multi-pass review:

**Finding #4 (MINOR) — empty relativePath misses sandbox fallback.**
A model calling `read_file("output/")` where "output" isn't a skill
name dead-ended with `Missing file path after skill name` instead of
being routed to the sandbox like every other malformed-path branch.
Add the same `codeEnvAvailable → handleSandboxFileFallback` pattern,
plus two regression tests.

**Finding #7 (NIT) — duplicate `skillsInScope()` helper.**
Hoist the identical helper out of two nested describe blocks to
module scope. Single source of truth.

**Finding #1 (MAJOR) — `persistedMessageId` had zero test coverage.**
The fix preserves a file's original `messageId` on update so
`getCodeGeneratedFiles` can still match it on subsequent turns. A
regression in the `isUpdate ? (claimed.messageId ?? messageId) : messageId`
ternary would silently re-introduce the original cross-turn priming
bug. Five new tests cover:
- UPDATE preserves `claimed.messageId` in the persisted record
- UPDATE falls back to current run id when `claimed.messageId` is
  absent (legacy records predating the field)
- CREATE uses current run id (no claimed record exists)
- The runtime return value uses the LIVE id (artifact attribution)
  even when the persisted record kept the original
- The image branch follows the same contract (would silently regress
  if the ternary diverged across the two file-build branches)

The tests use a `snapshotCreateFileArgs()` helper because
`processCodeOutput` mutates the file object after `createFile`
returns (`Object.assign(file, { messageId, toolCallId })`) and a
naive `createFile.mock.calls[0][0]` would reflect the post-mutation
state instead of what was actually persisted.
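The snapshot idea, reduced to a minimal sketch (helper name from the commit; the record shape is assumed):

```javascript
// Hypothetical sketch: copy the argument at call time. jest's
// mock.calls holds a LIVE reference to the same object, so the later
// Object.assign(file, { messageId, toolCallId }) mutation would
// retroactively change what the "recorded" call appears to contain.
function makeSnapshottingCreateFile(snapshots) {
  return (fileDoc) => {
    snapshots.push({ ...fileDoc }); // shallow copy captured at call time
    return fileDoc;
  };
}
```

Asserting against the snapshot, rather than `createFile.mock.calls[0][0]`, pins what was actually persisted.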

**Finding #2 (MAJOR) — `readSandboxFile` had no direct tests.**
The model-controlled `file_path` flows through a POSIX single-quote
escape into a shell `cat` command, making this a security boundary.
A quoting regression would let a malicious filename break out of the
quoted argument and inject arbitrary shell. 20 new tests across:
- Shell quoting (7): plain filenames, embedded `'`, `$()`, backticks,
  newlines, shell metachars, multiple consecutive single-quotes
- Payload shape (6): /exec URL, bash language, conditional
  session_id / files inclusion, dedicated keepAlive:false agents
- Response handling (6): `{content}` on success, null on missing
  base URL or absent stdout, throws on stderr-only, partial-success
  returns stdout, transport errors are logged then rethrown
- Timeout (1): matches processCodeOutput's 15s SLA
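The quoting invariant under test can be sketched as follows (a minimal stand-in for the escape inside `readSandboxFile`, not the production code):

```javascript
// Hypothetical sketch of POSIX single-quote escaping: inside '…' the only
// character the shell treats specially is ' itself, so close the quote,
// emit an escaped quote, and reopen: ' → '\''
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`;
}

// e.g. a model-controlled path interpolated into a cat command
function buildCatCommand(filePath) {
  return `cat ${shellQuote(filePath)}`;
}
```

With this escape, `$()`, backticks, newlines, and metacharacters all stay literal inside the quoted argument; only an unhandled `'` could break out.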

Audited findings #5 (acknowledged tech debt — readSandboxFile in JS
workspace), #6 (pre-existing positional-args debt on
enrichWithSkillConfigurable), and #8 (cosmetic JSDoc style) — no
action taken per the reviewer's own assessment.

Audited finding #3 (walk order vs docstring) — already addressed in
commit 007f32341 which converted to FIFO via `queue.shift()` plus a
regression test. The audit was performed against an earlier PR head.

Tests: 152 packages/api + 195 api JS = 347 pass. Typecheck clean.

* 🪛 fix: Pure-Subagent codeEnv + Primed-Skill Routing + ToolService Early Returns

Three findings from the second-pass review:

**P2 — Pure subagents missed `codeEnvAvailable`** (initialize.js).
The pure-subagent init path didn't forward the endpoint-level
`codeEnvAvailable` flag to `initializeAgent`, unlike the primary,
handoff, and addedConvo paths. A code-enabled subagent loaded only
through `subagentAgentConfigs` initialized with
`codeEnvAvailable: false`, so even though the recursive seed walk
found its primed code files, the subagent's own `bash_tool` /
`read_file` sandbox fallback were silently gated off. Forward the
flag and add `codeEnvAvailable: config.codeEnvAvailable` to the
`agentToolContexts.set` for symmetry with the other paths.

**P2 — Primed skills outside the catalog cap were misrouted to
sandbox** (handlers.ts). Manual ($-popover) and always-apply primes
are intentionally resolved off the wider `accessibleSkillIds` ACL
set BEFORE catalog injection — see `resolveManualSkills` for why a
skill outside the `SKILL_CATALOG_LIMIT` cap can still be authorized
for direct manual invocation. The `activeSkillNames` shortcut ran
before reading `skillPrimedIdsByName`, so a primed skill not in the
catalog would fall through to the sandbox instead of resolving via
the pinned `_id`. Read the primed map first and bypass the shortcut
for primed names. New regression test asserts a primed-but-not-
cataloged skill resolves through the existing skill path with
`getSkillByName` invoked and `readSandboxFile` NOT called.

**P3 — `loadAgentTools` early returns dropped `primedCodeFiles`**
(ToolService.js). The non-`definitionsOnly` path captures the field
correctly, but two early-return branches (no-action-tools fast path,
no-action-sets fast path) omitted it. Any traditional
`loadAgentTools(..., definitionsOnly: false)` caller using
execute_code without action tools would have its first-call session
seed silently empty. Add `primedCodeFiles` to both early returns
for consistency with the final return shape.

Tests: 153 packages/api + 195 api JS = 348 pass.

* 🧹 chore: Document jest.mock arrow-indirection pattern in process.spec.js

Per the second-pass review's Finding #2 (NIT, "would help future
readers"): the mock setup mixes direct `jest.fn()` references with
arrow-function indirection (`(...args) => mockX(...args)`). The
indirection isn't stylistic — it's required because `jest.mock(...)`
is hoisted above the outer `const` declarations at parse time, so a
direct reference would capture `undefined`. Inline comment explains
the pattern so the next reader doesn't have to reverse-engineer it
or accidentally "simplify" the mocks and break per-test
`mockReturnValueOnce` / `mockImplementationOnce` overrides.
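A plain-JS analogue of the hazard (hypothetical names; `hoistedFactoryCall` stands in for the hoisted `jest.mock` factory):

```javascript
// jest hoists mock factories above the test file's const declarations,
// so an eager reference would read the binding in its temporal dead
// zone; the arrow indirection defers the read until call time.
function hoistedFactoryCall(factory) {
  return factory(); // stands in for the hoisted jest.mock factory call
}

// Runs BEFORE mockGetFiles is initialized — safe only because the arrow
// reads mockGetFiles lazily, when the mock is actually invoked.
const mockedModule = hoistedFactoryCall(() => ({
  getFiles: (...args) => mockGetFiles(...args),
}));

const mockGetFiles = (...args) => ['stubbed', ...args];
```

Replacing the arrow with a direct `getFiles: mockGetFiles` in the real test file would capture the binding before initialization and break per-test overrides.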

* 🪛 fix: Five Issues from Pass-N + Codex Review (incl. 404 root cause)

Five real bugs surfaced by another review pass + Codex PR comments
+ the codeapi-side logs we collected during manual testing:

**1) `processCodeOutput` 404 root cause (`callbacks.js`).**
The codeapi worker emits TWO distinct `session_id`s on a tool result:
  - `artifact.session_id` is the EXEC session — the sandbox VM that
    ran the bash command. Files don't live there; it's torn down
    post-execution.
  - `file.session_id` is the STORAGE session — the file-server
    bucket prefix where artifacts actually live.
`callbacks.js` was passing the EXEC id to `processCodeOutput`, which
builds `/download/{session_id}/{id}` and 404s because the file-server
doesn't know about that path. This explains every "Error
downloading/processing code environment file" we saw during testing.
Use `file.session_id ?? output.artifact.session_id` (per-file id with
artifact-level fallback for older worker payloads).
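The fallback, as a minimal sketch (payload shapes assumed from the description above):

```javascript
// Hypothetical sketch: build the download path from the STORAGE session
// (where the artifact actually lives), falling back to the EXEC session
// only for older worker payloads without a per-file session_id.
function buildDownloadPath(file, artifact) {
  const sessionId = file.session_id ?? artifact.session_id;
  return `/download/${sessionId}/${file.id}`;
}
```

Passing the exec id here is exactly the 404: the file-server has no such prefix once the sandbox VM is torn down.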

**2) `primeFiles` reupload pushed STALE sandbox ids (`process.js`).**
When `getSessionInfo` returns null (file expired/missing in sandbox),
`reuploadFile` re-uploads via `handleFileUpload`, gets a NEW
`fileIdentifier`, and persists it on the DB record. But `pushFile`
was a closure capturing the OLD `(session_id, id)` parsed at the top
of the loop, so the in-memory `files[]` array (now consumed by
`buildInitialToolSessions` to seed `Graph.sessions`) silently
referenced a sandbox object that no longer existed. The first tool
call would 404 trying to mount it; only the next turn's metadata
re-read would correct course. Parameterize `pushFile` with optional
`(session_id, id)` overrides; in `reuploadFile` parse the new
identifier and pass through. 2 regression tests.
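The closure fix, reduced to a sketch (a hypothetical factory shape; the real `pushFile` lives inline in `process.js`):

```javascript
// Hypothetical sketch of the stale-closure fix: pushFile accepts optional
// (session_id, id) overrides instead of permanently capturing the values
// parsed before a possible re-upload invalidated them.
function makePushFile(files, defaultSessionId, defaultId, name) {
  return function pushFile(sessionId = defaultSessionId, id = defaultId) {
    files.push({ session_id: sessionId, id, name });
  };
}
```

The normal path calls `pushFile()` with no arguments; the reupload path parses the fresh `fileIdentifier` and passes the new pair through.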

**3) Codex P2 — Cap sandbox fallback output before line-numbering
(`handlers.ts`).** The new `handleSandboxFileFallback` returned
`addLineNumbers(result.content)` without a size guard, so reading a
multi-MB `/mnt/data/*` artifact materialized the file twice in
memory (raw + line-numbered) before downstream truncation. Match the
existing skill-file path's `MAX_READABLE_BYTES` (256KB): truncate
raw first, then number, surface the truncation to the model so it
can use `bash_tool` (`head` / `tail`) for the rest. 2 tests
(oversized truncates with hint, in-cap doesn't).
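The truncate-before-number order, sketched (a simplified stand-in for the handler; `addLineNumbers` here is a trivial assumption, not the production helper):

```javascript
// Hypothetical sketch: cap the RAW bytes first, then line-number, so a
// multi-MB artifact is never materialized twice; surface the truncation
// so the model can reach for bash_tool head/tail for the rest.
const MAX_READABLE_BYTES = 256 * 1024; // matches the skill-file path's cap

function addLineNumbers(text) {
  return text.split('\n').map((line, i) => `${i + 1} | ${line}`).join('\n');
}

function renderSandboxFile(content) {
  const raw = Buffer.from(content, 'utf8');
  if (raw.byteLength <= MAX_READABLE_BYTES) {
    return addLineNumbers(content);
  }
  const truncated = raw.subarray(0, MAX_READABLE_BYTES).toString('utf8');
  return `${addLineNumbers(truncated)}\n[truncated: use bash_tool head/tail for the rest]`;
}
```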

**4) Codex P2 — Dedupe seeded code files by `(session_id, id)`
(`codeFilesSession.ts`).** Multiple agents in a run commonly carry
the same primed execute-code resources (shared conversation files);
without dedupe, `_injected_files` grows proportionally to agent
count and bloats every `/exec` POST. Use a `(session_id, id)`
identity key so first-seen wins (preserves source ordering); name
alone isn't sufficient because two distinct primed uploads can
share a filename across different sessions. 4 tests covering dedup
across iterations, against pre-existing entries, name-collision
distinct-session preservation, and the multi-agent realistic case
in `buildInitialToolSessions`.
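The identity-key dedupe, as a minimal sketch (file shape assumed):

```javascript
// Hypothetical sketch: first-seen-wins dedupe keyed on (session_id, id).
// Name alone is insufficient — two distinct primed uploads can share a
// filename across different storage sessions.
function dedupeSeededFiles(fileLists) {
  const seen = new Set();
  const result = [];
  for (const files of fileLists) {
    for (const file of files ?? []) {
      const key = `${file.session_id}\u0000${file.id}`;
      if (seen.has(key)) continue;
      seen.add(key);
      result.push(file);
    }
  }
  return result;
}
```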

**5) Pass-N P2 — Polyfill `globalThis.File` in api Jest setup
(`api/test/jestSetup.js`).** `packages/api/jest.setup.cjs` had the
polyfill; the legacy api workspace's Jest config has its own
`setupFiles` that didn't, so on Node 18 / WSL the api focused tests
still failed at import time with `ReferenceError: File is not
defined` from undici. Mirror the polyfill.

Tests: 159 packages/api + 206 api JS = 365 pass. Typecheck clean.

* 🔧 chore: Update @librechat/agents dependency to version 3.1.73

Bumps the version of the @librechat/agents package across package-lock.json and relevant package.json files to ensure compatibility with the latest features and fixes.
2026-04-27 08:56:39 +09:00
Danny Avila
f7e47f6012
🪢 feat: Enable Tool-Output References for Bash Tool (#12830)
* chore: Update `@librechat/agents` to v3.1.71-dev.0 across package-lock and package.json files

This commit updates the version of the `@librechat/agents` package from `3.1.70` to `3.1.71-dev.0` in the `package-lock.json` and relevant `package.json` files. Additionally, it marks several dependencies as peer dependencies, ensuring better compatibility and integration across the project.

* 🔗 feat: Enable Tool-Output References for bash_tool when codeenv is on

Wires `@librechat/agents`' `RunConfig.toolOutputReferences` into
`createRun()` and the bash tool's LLM-facing description, gated by the
per-agent `effectiveCodeEnvAvailable` flag. The feature auto-activates
for any run where the bash tool is actually registered; SDK defaults
(~400 KB per output, 5 MB total) match the shell-safe budget. No new
env var or yaml capability — piggybacks on the existing `execute_code`
gate.

- `tools.ts`: replace the module-level `BASH_TOOL_DEF` constant with a
  per-call `buildBashToolDef` that wraps `buildBashExecutionToolDescription`.
  Description now includes the `{{tool<idx>turn<turn>}}` reference syntax
  guide iff the new `enableToolOutputReferences` param is true.
- `initialize.ts`: pass `enableToolOutputReferences: effectiveCodeEnvAvailable`
  into `registerCodeExecutionTools`.
- `run.ts`: add `codeEnvAvailable?: boolean` to `RunAgent`, compute the
  flag from `agents[*].codeEnvAvailable`, and conditionally spread
  `toolOutputReferences: { enabled: true }` into `Run.create`.

* 🧪 test: Cover tool-output references gating end-to-end

- `tools.spec.ts`: 3 new cases asserting `bash_tool.description`
  contains `{{tool<idx>turn<turn>}}` iff `enableToolOutputReferences` is
  true (and unset → false).
- `run-summarization.test.ts`: 4 new cases asserting `Run.create` is
  invoked with `toolOutputReferences: { enabled: true }` iff at least
  one `RunAgent.codeEnvAvailable === true`. Covers the present /
  absent / unset / multi-agent-OR cases.
- `initialize.test.ts` + `skills.test.ts`: extend the existing
  `@librechat/agents` jest mocks with a `buildBashExecutionToolDescription`
  stub so suites stay green when the on-disk SDK lags the published
  3.1.71-dev.0 export.

* chore: Update `@librechat/agents` version to `3.1.71-dev.1` in package-lock and package.json files

This commit updates the version of the `@librechat/agents` package from `3.1.71-dev.0` to `3.1.71-dev.1` across the relevant package files. This change ensures consistency and incorporates any updates or fixes from the new version.

* 🪢 fix: Walk Subagents in toolOutputReferences run-level gate

Codex P2 review on PR #12830: the run-level
`enableToolOutputReferences` flag only inspected the top-level
`agents` array. A parent agent without `execute_code` that spawns a
subagent that *does* have it left the SDK's tool-output reference
registry inactive for the run, so the subagent's `bash_tool` calls
saw `{{tool<idx>turn<turn>}}` placeholders pass through to the
shell unsubstituted.

Replace `agents.some(a => a.codeEnvAvailable === true)` with a
recursive `anyAgentHasCodeEnv` helper that walks
`subagentAgentConfigs` transitively. Cycle-safe via a `visited` set,
mirroring the existing `buildSubagentConfigs.ancestors` pattern in
the same module. The bash tool *description* stays per-agent in
`initializeAgent` (only agents with bash actually registered learn
the `{{…}}` syntax), so broadening the run-level gate doesn't
broaden the model-facing surface — it just lets the SDK's shared
registry serve every `ToolNode` the run compiles, which is exactly
the contract the SDK already implements.
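The gate can be sketched like so (helper name from the commit; agent/config shapes assumed):

```javascript
// Hypothetical sketch: true if any agent in the run — top-level or
// transitive subagent — has codeEnvAvailable. Cycle-safe via an
// identity-keyed visited Set, so a malformed A↔B graph terminates.
function anyAgentHasCodeEnv(agents) {
  const visited = new Set();
  const stack = [...agents];
  while (stack.length > 0) {
    const agent = stack.pop();
    if (agent == null || visited.has(agent)) continue;
    visited.add(agent);
    if (agent.codeEnvAvailable === true) return true;
    for (const sub of agent.subagentAgentConfigs?.values?.() ?? []) {
      stack.push(sub);
    }
  }
  return false;
}
```

It returns `true` whenever the old `agents.some(a => a.codeEnvAvailable === true)` did, plus the transitive-subagent cases the flat check missed.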

Tests cover three new cases: parent-off / subagent-on, parent-off /
child-off / grandchild-on (transitive descent past one level), and
a cyclic A↔B graph with neither codeenv-enabled (asserts both
termination and absence of `toolOutputReferences`). Existing
single-agent and multi-agent tests stay valid since the new helper
returns `true` whenever the previous `.some(...)` did.

* chore: Update `@librechat/agents` version to `3.1.71` in package-lock and package.json files

This commit updates the version of the `@librechat/agents` package from `3.1.71-dev.1` to `3.1.71` across the relevant package files. This change ensures consistency and incorporates any updates or fixes from the stable release.

* review: address audit findings on tool-output references PR

Two findings from comprehensive PR review on #12830:

#1 (MINOR) — `injectSkillCatalog` omitted `enableToolOutputReferences`
when calling `registerCodeExecutionTools`, so its resulting
`bash_tool` description always lacked the `{{tool<idx>turn<turn>}}`
guide. Today this is a no-op because `initializeAgent` registers
first and the registry `.has()` check makes the skills-path call a
dedupe-only operation. But if call order ever flips (skills-first),
the missing flag would silently ship a `bash_tool` without the
syntax guide, and the `initializeAgent` pass would itself become
the no-op — the feature would silently break with no visible error.
Forward `enableToolOutputReferences: codeEnvAvailable === true` so
both call sites produce identical tool definitions regardless of
firing order. Defense-in-depth, not a current bug. Added a test in
`skills.test.ts` that asserts the bash description contains the
`{{tool<idx>turn<turn>}}` marker when `codeEnvAvailable` is on,
exercising the skills caller end-to-end.

#2 (NIT) — `buildBashToolDef` allocated + froze a fresh object on
every agent init. Replaced with two module-level frozen singletons
(`BASH_TOOL_DEF_WITH_OUTPUT_REFS`, `BASH_TOOL_DEF_WITHOUT_OUTPUT_REFS`)
built once at module load via a `createBashToolDef` helper. The
factory now picks the right cached reference instead of building.
Restores the no-allocation intent of the original `BASH_TOOL_DEF`
constant while keeping the per-agent gate behavior. Two new tests
in `tools.spec.ts` pin the contract: identical-flag calls return
reference-equal `bash_tool` defs across registries; opposite-flag
calls return distinct frozen objects with the expected description
content.
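The singleton pattern, sketched (names from the commit; descriptions and object shape are assumptions):

```javascript
// Hypothetical sketch: build each frozen bash_tool definition once at
// module load; the factory returns the cached reference for the flag
// instead of allocating + freezing a fresh object per agent init.
function buildBashExecutionToolDescription(withRefs) {
  return withRefs
    ? 'Run bash. Reference prior tool output via {{tool<idx>turn<turn>}}.'
    : 'Run bash.';
}

const createBashToolDef = (enableToolOutputReferences) =>
  Object.freeze({
    name: 'bash_tool',
    description: buildBashExecutionToolDescription(enableToolOutputReferences),
  });

const BASH_TOOL_DEF_WITH_OUTPUT_REFS = createBashToolDef(true);
const BASH_TOOL_DEF_WITHOUT_OUTPUT_REFS = createBashToolDef(false);

function buildBashToolDef({ enableToolOutputReferences } = {}) {
  return enableToolOutputReferences
    ? BASH_TOOL_DEF_WITH_OUTPUT_REFS
    : BASH_TOOL_DEF_WITHOUT_OUTPUT_REFS;
}
```

Identical-flag calls are reference-equal; opposite-flag calls return distinct frozen objects, which is exactly the contract the new `tools.spec.ts` tests pin.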
2026-04-26 02:06:23 -07:00
Danny Avila
8c073b4400
📄 feat: Auto-render Text-Based Code Execution Artifacts Inline (#12829)
* 📄 feat: Auto-render Text-Based Code Execution Artifacts Inline

Eagerly extract text content from non-image artifacts produced by code
execution tools and render it inline in the message instead of behind a
click-to-download file card. Reuses the SkillFiles binary-detection
helper and the existing parseDocument dispatcher so docx, xlsx, csv,
html, code, and other text-renderable formats land directly under the
tool call.

PPTX is intentionally classified but not yet extracted — follow-up.

* 🌐 chore: Remove unused com_download_expires locale key

Removed in en/translation.json so the detect-unused-i18n-keys CI check
passes. The only reference was a commented-out localize() call in
LogContent.tsx that was deleted in the previous commit.

* 🩹 fix: Address PR review on code artifact text extraction

- extract.ts: build the temp document path from a randomUUID and pass
  path.basename(name) as originalname so a malicious artifact name
  cannot escape os.tmpdir() (P1 traversal flagged by codex/Copilot).
- process.js: classify and extract using safeName, not the raw name —
  defense in depth alongside the temp-path fix.
- classify.ts: add a bare-name lookup so extensionless text artifacts
  (Makefile, Dockerfile, …) classify as utf8-text instead of falling
  through to other.
- Attachment.tsx: wire aria-expanded / aria-controls on the show-all
  toggle for screen reader support.
- LogContent.tsx: restore a download chip (LogLink) on inline-text
  attachments so users can still pull down the underlying file.
- Tests: cover extensionless filenames and the temp-path traversal
  invariant.

* 🩹 fix: Address comprehensive PR review on code artifact extraction

- extract.ts: walk back to a UTF-8 code-point boundary before truncating
  so cuts cannot land mid-multibyte and emit U+FFFD (CJK/emoji concern).
  truncate() now accepts the original buffer to skip a redundant encode.
- extract.ts: add an 8s timeout around parseDocument via Promise.race so
  a pathological docx/xlsx cannot stall the response path.
- process.js: always set `text` (string or null) on the file payload —
  createFile uses findOneAndUpdate with $set semantics, so omitting the
  field leaves a stale value behind when an artifact's content changes.
- Attachment.tsx: switch the show-all toggle from char-count threshold
  to a useLayoutEffect ref measurement on scrollHeight, and use
  overflow-hidden when collapsed (overflow-auto when expanded) so the
  collapsed box has a single clear interaction model.
- Attachment.tsx + LogContent.tsx: lift `isImageAttachment` /
  `isTextAttachment` into a shared attachmentTypes module. LogContent
  keeps its looser image check (no width/height required) because the
  legacy log surface receives attachments without dimensions.
- Tests: cover multi-byte boundary, the always-set-text contract on
  updates, and the new shared predicates.
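The boundary walk-back can be sketched as (a minimal stand-in for the truncate() change; the real helper also threads the original buffer through):

```javascript
// Hypothetical sketch: step back from the byte budget until the first
// dropped byte is not a UTF-8 continuation byte (10xxxxxx), so the cut
// never lands mid-code-point and decodes to U+FFFD.
function truncateUtf8(buffer, maxBytes) {
  if (buffer.byteLength <= maxBytes) return buffer.toString('utf8');
  let end = maxBytes;
  while (end > 0 && (buffer[end] & 0b11000000) === 0b10000000) {
    end -= 1; // buffer[end] is a continuation byte; back up to a lead byte
  }
  return buffer.subarray(0, end).toString('utf8');
}
```

The loop drops at most one partial code point (up to 3 extra bytes), which is negligible against the budget.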

* 🧪 test: Component test for TextAttachment + direct withTimeout coverage

- Attachment.tsx: re-order local imports longest-to-shortest per
  AGENTS.md (attachmentTypes ahead of FileContainer/Image).
- extract.ts: export withTimeout so it can be unit-tested directly
  (it's also used internally — exporting carries no runtime cost).
- extract.spec.ts: three small unit tests on withTimeout that cover
  resolve, propagated rejection, and timeout rejection paths with
  real timers.
- TextAttachment.test.tsx: ten cases for the new React component —
  text rendering in <pre>, download chip presence/absence, ref-based
  collapse measurement (with scrollHeight stubbed via prototype),
  aria-expanded toggle, fall-through to FileAttachment for missing
  and empty text, and AttachmentGroup routing.

* 🩹 fix: Canonicalize document MIME by extension before parseDocument

When the classifier puts a file on the document path via its extension
(.docx, .xlsx, …) but the buffer sniffer returned a generic value like
application/zip or application/octet-stream, we previously forwarded
that generic MIME to parseDocument, which dispatches strictly by MIME
and silently rejected it — exactly defeating the extension-first
classification this PR added.

extractDocument now remaps the MIME from the extension (falling back
to the original sniffed MIME if the extension is unrecognized, so files
that reached the document branch via MIME detection still work). Adds
a parameterized test across docx/xlsx/xls/ods/odt against zip/octet
sniffs to guard the regression.

* 🩹 fix: Reuse existing withTimeout from utils/promise

The previous commit's local withTimeout export collided with the
already-exported `withTimeout` from `~/utils/promise`, breaking the
@librechat/api tsc job (TS2308 ambiguous re-export).

Drops the duplicate, imports from `~/utils/promise`, and removes the
now-redundant unit tests (the helper has its own coverage in
utils/promise.spec.ts). The third argument shifts from a label to the
fully-formed timeout error message that the existing helper expects.

* 🧹 chore: TextAttachment test polish (NITs)

- Use the conventional `import Attachment, { AttachmentGroup }` form
  rather than `default as Attachment`.
- Save the original `scrollHeight` property descriptor and restore it
  in afterAll, so the prototype patch never leaks past this suite.
2026-04-26 02:04:00 -07:00
Danny Avila
596f806f60 🛡️ fix: Strict Opt-In Skills Activation per Agent (#12823)
* 🛡️ fix: Strict opt-in skills activation per agent

Skills were activating on every agent run that had the capability +
RBAC enabled, regardless of whether the user (ephemeral) or author
(persisted) had opted in. `scopeSkillIds(undefined)` fell through to
"full accessible catalog" whenever `agent.skills` was unset, which is
the default state for any agent created before skills existed and for
every ephemeral agent.

Activation now requires an explicit signal:
- Ephemeral agent → per-conversation skills badge toggle.
- Persisted agent → new `skills_enabled` master switch on the agent
  doc, surfaced as a toggle in the Agent Builder skills section.
  Enabled + empty/undefined allowlist = full accessible catalog;
  enabled + non-empty allowlist = narrow to those ids; disabled (or
  undefined) = no skills available, even if an allowlist is set.

Centralised the predicate in `resolveAgentScopedSkillIds` so the
primary-agent path, handoff/discovery, the subagent loop, and both
OpenAI controllers all share one source of truth. Frontend `$`
popover scope mirrors the same logic so the UI never offers skills
the backend would refuse to activate.

* test: mock resolveAgentScopedSkillIds in agent controller specs

* refactor: address review findings on skills opt-in PR

- AgentConfig: associate skills label with toggle via htmlFor for
  click/keyboard affordance; simplify Switch handler to Boolean(value).
- skills: mark scopeSkillIds as @internal so runtime callers continue
  to route through resolveAgentScopedSkillIds and inherit the activation
  predicate (ephemeral toggle, persisted skills_enabled).

* fix(agents): include skills_enabled in agent list projection

Without this field, agents loaded via the list endpoint hydrate into the
client agentsMap with skills_enabled === undefined, causing the `$`
skill popover to hide every skill on a fresh page load even when the
agent was saved with skills_enabled: true.

* fix(skills): fail closed for persisted agents during agentsMap hydration

Returning undefined while the agents map loads let the popover render the
full catalog for a persisted agent before we could read its
skills_enabled flag, so the user could pick a skill the backend would
then refuse for the turn. Match the strict opt-in contract by returning
[] until the map is authoritative.

* refactor(skills): extract skillsHintKey for readability

Replaces the nested ternary in the skills section JSX with a
pre-computed constant so the activation -> hint key mapping reads
top-down.

* refactor(skills): unflatten skillsHintKey to remove nested ternary
2026-04-25 04:02:01 -04:00
Danny Avila
7f3d41024a 📦 chore: Update @librechat/agents to v3.1.70
This commit updates the version of the `@librechat/agents` package from `3.1.68-dev.1` to `3.1.70` in the `package-lock.json` and relevant `package.json` files. This change ensures consistency across the project and incorporates any updates or fixes from the new version.
2026-04-25 04:02:01 -04:00
Danny Avila
523994220d 🌉 chore: Gate Skills UI by Agent Capability Checks (#12793)
- Updated the Skills component to include a check for `skillsEnabled` from agent capabilities, ensuring skills are only displayed when both permissions and capabilities allow.
- Modified the `useHandleKeyUp` hook to prevent triggering skill commands when skills are disabled, enhancing user experience and preventing errors.
- Added tests to verify that skills commands do not trigger when skills are disabled, ensuring robust functionality.
- Refactored the `useSideNavLinks` hook to incorporate skills capability checks, ensuring consistent access control across the application.
2026-04-25 04:02:01 -04:00
Danny Avila
d83cb84f59 🪆 feat: Subagent configuration in Agent Builder (#12725)
* 🪆 feat: Subagents configuration (isolated-context child agents)

Surfaces the new @librechat/agents `SubagentConfig` primitive in the Agent
Builder. Subagents let a supervisor delegate a focused subtask to a child
graph running in an isolated context window: verbose tool output stays in
the child, only a filtered summary returns to the parent.

Data model: new `subagents: { enabled, allowSelf, agent_ids }` on Agent,
wired through the Zod, Mongoose, and form schemas plus a new
`AgentCapabilities.subagents` capability (enabled by default).

Backend: `initialize.js` loads explicit subagent configs alongside handoff
agents, and drops subagent-only references from the parallel/handoff maps
so they don't leak into the supervisor's graph. `run.ts` emits
`SubagentConfig[]` on the primary `AgentInputs` — a self-spawn entry when
`allowSelf` is enabled plus one entry per configured agent.

UI: an "Advanced" panel section with an enable toggle, a self-spawn
toggle, and an agent picker (capped at 10). Enabling without adding
agents still yields self-spawn; disabling self-spawn with no agents shows
a warning. A capability flag gates the whole section.

* 🪆 feat: Stream subagent progress to UI (dialog + inline ticker)

Pairs with the @librechat/agents SDK change that forwards child-graph
events through the parent's handler registry (danny-avila/agents#107):

- Self-spawn and explicit subagents can now use event-driven tools,
  because child `ON_TOOL_EXECUTE` dispatches reach our ToolService via
  the parent's registered handler.
- The same forwarding path wraps the child's run_step / run_step_delta
  / run_step_completed / message_delta / reasoning_delta dispatches in
  a new `ON_SUBAGENT_UPDATE` envelope, with start/stop/error bookends.

Backend: `callbacks.js` registers an `ON_SUBAGENT_UPDATE` handler that
forwards each envelope straight to the SSE stream.

Frontend:
- `useStepHandler` consumes `ON_SUBAGENT_UPDATE` events and merges them
  into a per-tool_call Recoil atom (`subagentProgressByToolCallId`).
  First-seen `subagentRunId` claims the most-recent unclaimed `subagent`
  tool call in the active response message — a temporal mapping, no SDK
  wire-format change needed to correlate child runs with parent tool
  calls.
- New `SubagentCall` part component replaces the default `ToolCall`
  rendering when `toolCall.name === Constants.SUBAGENT`: compact status
  ticker showing the 3 most recent update labels, clickable to open a
  dialog with the full activity log + final markdown-rendered result.
- Adds `Constants.SUBAGENT`, `StepEvents.ON_SUBAGENT_UPDATE`, and
  `SubagentUpdateEvent` type in data-provider.

Tests:
- `packages/api npx jest run-summarization` — 23 pass
- `api npx jest initialize` — 16 pass
- `npm run build` — clean

Dependency note: bumps `@librechat/agents` to `^3.1.67-dev.1` — requires
the SDK PR (danny-avila/agents#107) to be merged to dev and published
before this PR merges. `ON_SUBAGENT_UPDATE` is absent from dev.0, so the
handler registration would be a no-op with the older SDK but would not
crash.

* 🪆 fix: address Codex review and review audit on subagents

Stacks on top of the SDK change in danny-avila/agents#107 (bumped to
`^3.1.67-dev.2`).

- **P1 (`initialize.js`)**: subagent-only agents were being deleted from
  both `agentConfigs` AND `agentToolContexts`. The tool-execute handler
  resolves execution context (agent, tool_resources, skill ACLs) from
  `agentToolContexts`, so explicit subagents would run without their
  configured resources and skip action tools. Now only `agentConfigs`
  is pruned — tool context stays intact.
- **P2 (`AgentSubagents.tsx`)**: toggling subagents off set the form
  field to `undefined`; `removeNullishValues` stripped it from the
  PATCH, leaving the server copy enabled. Now it persists an explicit
  `{ enabled: false, ... }` so the update actually clears state.

- **Finding 1 (MAJOR)** — `agent_ids` Zod schema gains `.max()` via a
  new `MAX_SUBAGENTS` export from `data-provider` (shared with the UI
  cap). Crafted payloads can't trigger hundreds of `processAgent`
  calls.
- **Finding 2 (MAJOR)** — `subagentProgressByToolCallId` atomFamily
  atoms are now tracked in a ref and reset from `clearStepMaps` via a
  `useRecoilCallback({ reset })`. No monotonic growth across a session.
- **Finding 3 (MAJOR)** — early-arriving `ON_SUBAGENT_UPDATE` events
  whose parent `tool_call_id` is not yet mapped are now buffered in
  `pendingSubagentBuffer` (keyed by `subagentRunId`) and replayed in
  arrival order once correlation completes. Mirrors the existing
  `pendingDeltaBuffer` pattern.
- **Finding 4 (MAJOR)** — switched to deterministic correlation via
  the new `parentToolCallId` that SDK `3.1.67-dev.2` threads through
  from `ToolRunnableConfig.toolCall.id`. Temporal fallback now iterates
  oldest-unclaimed-first (forward), matching tool-call creation order,
  so concurrent spawns map correctly.
- **Finding 6 (MINOR)** — `agent_ids` are deduped on the backend via
  `new Set(...)` before the load loop. Duplicates no longer produce
  duplicate `SubagentConfig` entries visible to the LLM.
- **Finding 7 (MINOR)** — events array inside each Recoil atom is
  capped at 200 entries. Long-running subagents no longer replay O(n)
  spreads on every update; the dialog log still shows the cap window.
- **Finding 8 (MINOR)** — documented: subagents are loaded only for
  the primary agent this release (handoff children get self-spawn but
  not explicit sub-subagents). In-code comment added so the next
  maintainer doesn't wonder.
- **Finding 9 (NIT)** — removed `{!isSubmitting && null}` dead code
  and the misleading announce-polite comment in `SubagentCall`.

- New `validation.spec.ts` — 9 tests covering the cap on
  `agent_ids.length` at the subagent schema, agent-create, and
  agent-update layers.
- `run-summarization` — 23 pass, `initialize` — 16 pass, total backend
  package: 103 pass across touched areas.

Findings 5 (component tests) and 10 (micro-allocation) are tracked
but deferred; the former needs a Recoil-RenderHook harness that isn't
in this PR's scope, and the latter has negligible impact (one `Array.from`
per subagent run).

* 🧪 test: integration coverage for subagent correlation + backend loading

Addresses the follow-up audit on #12725 with real-code tests (no mock
handlers, only the existing setMessages/getMessages spies and the
standard mongodb-memory-server harness).

Six new tests under a dedicated `describe('subagent loading')`:
- loads a configured subagent, populates `subagentAgentConfigs`, keeps
  it out of `agentConfigs`
- **P1 regression guard**: drives the real `toolExecuteOptions.loadTools`
  closure with the subagent id and asserts `loadToolsForExecution` is
  called with `agent: <subagent>`, `tool_resources`, `actionsEnabled`.
  If anyone deletes `agentToolContexts` again, this fails.
- dedup: three copies of the same id load the agent once
- overlap: agent referenced both as handoff target and subagent stays in
  `agentConfigs`
- capability gate: admin disabling `subagents` suppresses loading even
  when the agent has a config
- per-agent disable: `subagents.enabled: false` skips loading entirely

Five new tests under `describe('on_subagent_update event')` using a
real `RecoilRoot` and a companion `useRecoilCallback` reader so writes
from the hook are observable:
- deterministic correlation via `parentToolCallId` (happy path with
  SDK dev.2+)
- fallback: oldest-unclaimed tool call wins for concurrent spawns
  without `parentToolCallId`
- early-arrival buffer: updates with no mapping get buffered and
  replayed once the tool call appears
- event cap: 205 updates collapse to 200 retained, oldest dropped
- `clearStepMaps` resets tracked atoms back to their null default

- F2 — added explicit `// TODO` marker for handoff-subagent-loading
  extension (matches the comment that referenced it).
- F3 — dropped the unnecessary `MAX_SUBAGENTS as MAX_SUBAGENTS_CAP`
  alias; just import the constant directly.
- Bumped `@librechat/agents` to `^3.1.67-dev.3` to pick up the SDK's
  paired test additions.

- `api/server/services/Endpoints/agents/initialize.spec.js` — 22 pass
  (6 new + 16 existing)
- `packages/api/src/agents/validation.spec.ts` +
  `run-summarization.test.ts` — 103 pass
- `client/src/hooks/SSE/__tests__/useStepHandler.spec.ts` — 48 pass
  (5 new + 43 existing)

* 🪆 fix: strip parent run summary + discovered tools from subagent inputs

Codex P1 on #12725: `buildSubagentConfigs` reused the shared
`buildAgentInput` factory for each explicit child, and that factory
always stamps the parent run's `initialSummary` (cross-run conversation
summary) and `discoveredTools` (tool names the parent's LLM searched
earlier) onto every `AgentInputs` it returns. When subagents were
enabled on a conversation that had already been summarized, every
child inherited that summary — silently defeating the isolated-context
contract and burning extra tokens on unrelated prior chat.

Fix in `run.ts`: after `buildAgentInput(child)`, explicitly clear
`childInputs.initialSummary` and `childInputs.discoveredTools` before
attaching to the `SubagentConfig`. The parent keeps both — that's how
the supervisor receives cross-turn context — but the child starts
fresh.
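The strip amounts to a tiny pure step; a minimal sketch (field names follow this message, the surrounding type is hypothetical):

```typescript
// Sketch: clear cross-run context before attaching inputs to a child
// SubagentConfig, so the child starts with an isolated context while
// the parent keeps both fields.
interface AgentInputs {
  instructions: string;
  initialSummary?: string; // parent's cross-run conversation summary
  discoveredTools?: string[]; // tool names the parent's LLM searched
}

function stripParentRunState(inputs: AgentInputs): AgentInputs {
  return { ...inputs, initialSummary: undefined, discoveredTools: undefined };
}
```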

Paired with danny-avila/agents#107 (bumped to `^3.1.67-dev.4`), which
adds the equivalent strip inside `buildChildInputs` to cover the
self-spawn path where the SDK clones parent `_sourceInputs` directly
and LibreChat never sees the intermediate shape. Belt and suspenders.

Regression test (new):
- `does NOT leak the parent run initialSummary into an explicit child
  (Codex P1 regression)` — sets `initialSummary` on the run, enables
  subagents with an explicit child, asserts the parent still has the
  summary but `childConfig.agentInputs.initialSummary` is `undefined`.
  Same for `discoveredTools`. 24 pass.

* 🪆 fix: capability gate applies to handoff agents + parallel subagent test

### Codex P2 — handoff agents kept `subagents` after capability disabled
The endpoint-level `AgentCapabilities.subagents` gate only cleared
`subagents` on `primaryConfig`. Handoff agents loaded into
`agentConfigs` retained their persisted `subagents.enabled: true`,
and because `run.ts` calls `buildSubagentConfigs` for every agent
input, self-spawn would still fire on a handoff target even when the
admin had disabled the capability globally.

Fix in `initialize.js`: after the subagent loading block, when the
capability is off, iterate `agentConfigs.values()` and clear
`subagents` + `subagentAgentConfigs` on every loaded config.
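The capability-off sweep can be sketched as follows (names are illustrative, not the actual `initialize.js` shapes):

```typescript
// Sketch: when the endpoint capability is off, clear subagent state on
// the primary AND every loaded handoff config, not just the primary.
interface AgentConfig {
  id: string;
  subagents?: { enabled: boolean; agent_ids?: string[] };
  subagentAgentConfigs?: Map<string, AgentConfig>;
}

function stripSubagentsWhenDisabled(
  capabilityEnabled: boolean,
  primary: AgentConfig,
  agentConfigs: Map<string, AgentConfig>,
): void {
  if (capabilityEnabled) return;
  primary.subagents = undefined;
  primary.subagentAgentConfigs = undefined;
  for (const config of agentConfigs.values()) {
    config.subagents = undefined;
    config.subagentAgentConfigs = undefined;
  }
}
```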

Regression test: `clears subagents on handoff agents too when
capability is disabled (Codex P2 regression)` — seeds a handoff target
with its own `subagents.enabled: true`, disables the capability at
the endpoint, asserts both primary AND handoff have `subagents`
undefined in the client args. 23 init tests pass.

### Parallel subagent correlation — user-requested verification
Added `keeps parallel subagent streams independent when events
interleave` to `useStepHandler.spec.ts`. Two `subagent` tool calls
seeded side by side, 6 interleaved `ON_SUBAGENT_UPDATE` envelopes
dispatched (a-start, b-start, a-step, b-step, a-stop, b-step), each
carrying its own `parentToolCallId`. Asserts each `tool_call_id`'s
Recoil bucket accumulates only its own run's events, statuses reflect
each run independently (`call_a` → stop, `call_b` → run_step), no
cross-contamination. 49 step-handler tests pass.

* 🪆 fix: SubagentCall detects cancelled / errored states (Codex P2)

Codex P2 on #12725: the old `running` check only consulted
`initialProgress` and the subagent's phase. A user stop, dropped
stream, or backend crash before a terminal `stop`/`error` envelope
arrived would leave the ticker permanently stuck on "working…". Other
*Call components (ToolCall.tsx) already model this via
`!isSubmitting && !finished` → cancelled.

Mirror that pattern. Re-introduce `isSubmitting` on `SubagentCallProps`
(the prop was dropped earlier as 'unused' — that was a bug) and resolve
status as a tri-state:

- `finished`  — initialProgress >= 1, or subagent `stop`/`error`
- `cancelled` — `!isSubmitting && !finished`
- `running`   — neither
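The tri-state resolution above can be sketched as a pure function (a sketch assuming these inputs; the real component reads them from props and Recoil):

```typescript
// Sketch of the tri-state status mirrored from the ToolCall.tsx pattern:
// a missing terminal envelope plus !isSubmitting means cancelled.
type SubagentStatus = 'finished' | 'cancelled' | 'running';

function resolveSubagentStatus(
  initialProgress: number,
  phase: string | undefined, // latest subagent envelope phase
  isSubmitting: boolean,
): SubagentStatus {
  const finished =
    initialProgress >= 1 || phase === 'stop' || phase === 'error';
  if (finished) return 'finished';
  if (!isSubmitting) return 'cancelled';
  return 'running';
}
```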

New locale keys `com_ui_subagent_cancelled` + `com_ui_subagent_errored`
swap in the right header text per state.

Tests: new `SubagentCall.test.tsx` covers all four states with a real
`RecoilRoot` and a `useRecoilCallback` seeder — no mocked store — 5/5
pass. Includes an explicit P2 regression test that simulates the
`isSubmitting=false, progress.status='run_step', initialProgress<1`
scenario and asserts the cancelled label renders.

* 🪆 feat: semantic ticker + aggregated content-part dialog for subagents

Two rounds of feedback on #12725:

### Ticker — user-readable lines, not raw event names

The old ticker showed `on_run_step`, `on_message_delta`, etc. — not
meaningful to users. Replaced with `buildSubagentTickerLines`, a pure
helper that walks the `SubagentUpdateEvent` stream and emits:

- message/reasoning deltas → a single live "Writing: <last 60 chars>"
  (or "Reasoning: …") line that updates in place as chunks arrive
- run_step with tool_calls → "Using calculator(expression=42*58)" for
  a single call, "Using tool: a, b" for parallel (args dropped when
  multiple so the line stays short)
- run_step_completed → "calculator → 42*58 = 2436" (output truncated
  to 48 chars; falls back to "Tool X complete" when output is empty)
- error → "Error: <message>"
- start / stop / run_step_delta → suppressed (too granular / lifecycle-only)

Args and output pass through `summarizeArgs` / `summarizeOutput`
which flatten JSON to `key=value` pairs and head-truncate long
strings so a 200-line tool output never bloats the ticker.
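The args flattener can be sketched like this (a hypothetical version; the real helper's clipping rules and limits may differ):

```typescript
// Sketch: flatten a JSON args object into a short `key=value` line,
// clipping each value to a fixed length so long inputs can't bloat
// the ticker. The 24-char cap here is an assumption, not the real one.
function summarizeArgs(args: Record<string, unknown>, maxValueLen = 24): string {
  return Object.entries(args)
    .map(([key, value]) => {
      const text = typeof value === 'string' ? value : JSON.stringify(value);
      const clipped = text.length > maxValueLen ? text.slice(0, maxValueLen) : text;
      return `${key}=${clipped}`;
    })
    .join(', ');
}
```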

### Dialog — aggregated content parts via leaf renderers

`aggregateSubagentContent` folds the raw event stream into
`TMessageContentParts[]` — text/reasoning delta streaks collapse into
single `TEXT` / `THINK` parts, tool calls become `TOOL_CALL` parts,
and `run_step` boundaries correctly break text runs around tool
calls. The dialog iterates those parts through a `SubagentDialogPart`
renderer that delegates to the existing `Text`, `Reasoning`, and
`ToolCall` leaf components — the same sub-components `<Part />` uses
— wrapped in a minimal `MessageContext` so reasoning expand state and
cursor animation work.

Leaf components are used directly rather than importing `<Part />`
itself to avoid a module cycle (Part → Parts/index → SubagentCall →
Part) and to sidestep hypothetical nested-subagent rendering issues.

### Tests

- `subagentContent.test.ts` — 19 pure-function tests covering the
  aggregator (text concat, reasoning concat, tool call lifecycle,
  interleaving, phase suppression, late-arriving completions) and the
  ticker builder (live preview truncation, args/output snippets,
  parallel-call handling, output truncation, i18n formatter override).
- `SubagentCall.test.tsx` — 9 component tests: 5 status-resolution
  (existing) + 2 ticker (semantic text, delta collapse) + 2 dialog
  (aggregated parts routed to leaf renderers, raw-output fallback).

### Locale keys

New: `com_ui_subagent_ticker_writing`, `…_reasoning`, `…_error`,
`…_using`, `…_using_with_args`, `…_tool_complete`,
`…_tool_output`. Preserves i18n at the display layer while the
helper stays pure.

* chore: drop unused com_ui_subagent_activity_log locale key

The dialog no longer renders an "Activity log" section — the new
content-parts renderer replaced it. Also tweaks the dialog description
copy to match.

* 🪆 fix: subagent dialog order, persistence, auto-scroll, width

Follow-up pass addressing the four issues observed in real runs
against a live subagent-using parent.

### Aggregator ordering (reasoning appearing after text it preceded)

Reproducible pattern: LLM emits reasoning → text → tool call in that
order, but the dialog rendered text BEFORE reasoning in the content
array. Root cause: `aggregateSubagentContent` maintained `currentText`
and `currentThink` buffers in parallel and only flushed them at a
`run_step` boundary in a fixed (text, think) order, losing the actual
arrival order.

Fix: when a text chunk arrives, close any open think buffer first
(pushes it into the content array right then); symmetric for think →
text. Two new regression tests cover the exact reasoning → text →
tool_call sequence from the screenshot and the repeated
reasoning ↔ text flow across a turn.
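The order-preserving fold can be sketched as a pure reducer over delta chunks (a minimal sketch with hypothetical names, not the real aggregator):

```typescript
// Sketch: preserve arrival order by closing the opposite open buffer
// whenever the delta type switches (reasoning -> text or text -> reasoning),
// instead of flushing both in a fixed order at run_step boundaries.
type Part = { type: 'text' | 'think'; text: string };
type Delta = { kind: 'text' | 'think'; chunk: string };

function foldDeltas(deltas: Delta[]): Part[] {
  const parts: Part[] = [];
  let current: Part | null = null;
  for (const { kind, chunk } of deltas) {
    if (current && current.type !== kind) {
      parts.push(current); // close the opposite buffer right then
      current = null;
    }
    if (!current) current = { type: kind, text: '' };
    current.text += chunk;
  }
  if (current) parts.push(current);
  return parts;
}
```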

### Content persists after completion (markdown not rendering when done)

`clearStepMaps` was calling `resetSubagentAtoms()` at stream end,
which wiped every `subagentProgressByToolCallId` entry. Once reset,
`contentParts.length === 0` and the dialog fell back to rendering the
raw `output` string with plain text — hence the literal `##`/`**` in
the completed-state screenshot. Stopped resetting; the atoms are
bounded per-call (200-event cap) and per-conversation (one per
subagent spawn) so growth matches the rest of the conversation state.
`resetSubagentAtoms` is kept for a future conversation-switch caller.

Also: routed the raw-`output` fallback (older subagent runs recorded
before the event forwarder existed) through the same
`SubagentDialogPart` → `Text` leaf that content parts use, so its
markdown renders the same way.

### Auto-scroll to bottom while running

Added a `scrollRef` on the dialog body and a `useEffect` that pins
`scrollTop = scrollHeight` while the dialog is open AND the subagent
is running. Triggers on `contentParts.length` (new tool calls / part
boundaries) and `events.length` (intra-part deltas) so the cursor
tracks text streaming. Disabled post-completion so re-opening a
finished run doesn't yank to the bottom.

### Wider dialog

Went from `max-w-2xl` (42rem / 672px — too cramped on maximized
laptop windows) to `w-[min(95vw,64rem)] max-w-[min(95vw,64rem)]`.
Narrow on phones, scales up to 64rem on desktop, always leaves a bit
of margin from the viewport edge. Bumped `max-h-[65vh]` on the scroll
area to give the extra width room to breathe vertically too.

### Tests

- `subagentContent.test.ts` — 21 pass (2 new ordering regressions).
- `useStepHandler.spec.ts` — 49 pass (1 updated to assert atoms are
  *preserved* on clearStepMaps).
- `SubagentCall.test.tsx` — 9 pass (unchanged; aggregator-level tests
  cover the ordering).

* 🪆 feat: persist subagent_content via SDK createContentAggregator

Per-request map of createContentAggregator instances keyed by the
parent's tool_call_id. ON_SUBAGENT_UPDATE handler feeds each event
into the matching aggregator (phase → GraphEvent mapping); AgentClient
harvests contentParts onto the subagent tool_call at message save so
the child's reasoning / tool calls / final text survive a page refresh.

Reusing the SDK's battle-tested aggregator instead of a bespoke one
keeps the persisted shape identical to the parent graph's output and
drops ~100 lines of custom aggregation code.

* 🪆 fix: incremental subagent aggregation + dialog render parity

**Disappearing tool_calls**: the Recoil atom trimmed events to a 200-long
rolling window, so verbose subagents could shed the `run_step` that
originally created a tool_call part — rebuilding content from the trimmed
window then produced only the surviving text/reasoning. Fix: fold each
envelope into `contentParts` incrementally in the atom as it arrives
(new `foldSubagentEvent` + cursor state). Event trim window now affects
only the ticker, never the dialog.

**Render parity**: dialog now applies `groupSequentialToolCalls` and
renders single parts through `Container` + grouped batches through
`ToolCallGroup` — same spacing and "Used N tools" collapsing the main
message view uses.

**Width**: `min(96vw, 80rem)` — wider on big screens, still responsive.

**Labels**: "Subagent: X" is jargon. Named subagents render as
`Running "{name}" agent` / `Ran "{name}" agent` (past tense on
completion); self-spawns use `Running subtask` / `Ran subtask` since
`Running "self" agent` reads badly.

* 🪆 polish: subagent dialog parity + agent avatar in header

**Labels**: drop "subtask" framing. Self-spawn shows `Running agent` /
`Ran agent` (past tense on completion); named subagents stay
`Running "X" agent` / `Ran "X" agent`.

**Dialog render parity**: stop wrapping every part in `Container`.
TEXT keeps its `Container` (gap-3 + `mt-5` sibling margin), THINK and
TOOL_CALL render bare so their own wrappers set the full-column width
the regular message view gives them — matches the main `<Part>` dispatch.
Outer scroll region now uses `px-4 py-3` padding and a
`max-w-full flex-grow flex-col gap-0` inner wrapper, mirroring the
`MessageParts` container the main conversation uses.

**Avatar**: header icon now renders the subagent's configured avatar
via `MessageIcon` when `useAgentsMapContext()` has the child agent,
falling back to the `Users` SVG (which keeps its running-state pulse).
Same icon-left-of-label pattern the tool UI uses.

* 🪆 polish: subagent group label, ticker throttle + tail-ellipsis, scroll button

**Grouped label**: ToolCallGroup now detects all-subagent batches and
labels them "Running N agents" / "Ran N agents" instead of "Used N
tools". Mixed batches keep the existing label. The tool-name summary
is suppressed for all-subagent groups (every entry dedupes to
"subagent", which adds nothing).

**Ticker width + tail-ellipsis**: raise the preview cap to 300 chars so
wide containers aren't half-empty, and flip the ticker `<li>` to
`dir="rtl"` so `text-overflow: ellipsis` clips the *oldest* characters
(visually the left edge) — the newest tokens stay pinned to the right
regardless of container width. Bidi lays out the Latin text LTR
internally; the `rtl` direction only affects which side gets the ellipsis.

**Throttle**: `useThrottledValue` hook (trailing-edge, 1.2s) smooths
the live `Writing: …` preview so tokens no longer strobe past the
eye faster than they can be read. Ref-based internals (not `useState`)
avoid infinite-update loops when the upstream value is a new-reference
each render; `NEGATIVE_INFINITY` sentinel ensures the very first value
passes through synchronously so tests and first paint aren't delayed.

**Scroll-to-bottom**: dialog tracks `isAtBottom` with a 120px
threshold; auto-scroll only engages when the user is already following
along, and a persistent jump-to-latest button appears whenever they
scroll up — no more fighting the auto-scroll to read back.
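The "is the user following along?" check behind that behavior can be sketched as a pure predicate (a sketch; the real hook reads these off the scroll container):

```typescript
// Sketch: auto-scroll only engages while the viewport is within
// `threshold` pixels of the bottom; otherwise the jump-to-latest
// button is shown instead of fighting the user's scroll position.
function isNearBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  threshold = 120,
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= threshold;
}
```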

* 🪆 polish: snappier ticker, prefix-safe labels, agents icon, readable lines

**Ticker lines are now incrementally aggregated in the atom** — same
pattern as contentParts. The raw-events rolling window is gone; event
volume no longer caps what the ticker can display. Verbose subagents
that used to drop early tool_call lines out of the window now keep the
full 3-line history (using_tool, tool_complete, writing).

**Discriminated-union ticker lines** split a constant prefix (e.g.
"Writing:") from a tail-truncatable body. The prefix lives in a
`shrink-0` span so it never gets clipped when the body overflows; the
body uses `dir="rtl"` only on itself — scoped so non-streaming lines
(e.g. "Waiting for first update…") can't get their trailing ellipsis
flipped by bidi.

**Content-aware throttle**: 800ms interval (down from 1200ms), skipped
entirely while the live buffer is below 120 chars. Early tokens now
appear immediately — no more "Reasoning: I" sitting blank for a full
second before the next heartbeat. Once the preview is long enough to
fill the container, throttling kicks in at the tighter interval.
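The content-aware throttle can be sketched as a small value holder (illustrative names; the real hook is ref-based inside React, and the thresholds here follow this message):

```typescript
// Sketch: short buffers pass through immediately; once the preview is
// long enough to fill the container, updates are limited to one per
// interval. NEGATIVE_INFINITY makes the very first value synchronous.
class ContentAwareThrottle {
  private lastEmit = Number.NEGATIVE_INFINITY;
  private held = '';

  constructor(
    private intervalMs = 800,
    private minLength = 120,
  ) {}

  push(value: string, now: number): string {
    const shortBuffer = value.length < this.minLength;
    if (shortBuffer || now - this.lastEmit >= this.intervalMs) {
      this.lastEmit = now;
      this.held = value;
    }
    return this.held;
  }
}
```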

**Header label** is now a constant verb + optional muted sub-label.
Base reads "Running agent" / "Ran agent" / "Cancelled agent" / "Agent
errored" for every subagent; named subagents get the configured agent
name rendered to the right in secondary text (self-spawns and
unresolved names omit it — "Running self agent" is nonsense).

**ToolCallGroup** now detects `allSubagents` and swaps `StackedToolIcons`
for a single `Users` glyph — otherwise the group header shows a wrench
("tool") icon next to "Ran 5 agents", which reads wrong.

* 🪆 feat: delimiter-aware tool labels in ticker + full-width tool lines

New shared `parseToolName` helper in `client/src/utils/toolLabels.ts`
— single source of truth for splitting `<tool>_mcp_<server>` ids and
mapping native tool names (web_search, execute_code, …) to their
friendly translation keys. `ToolCallGroup` drops its inline copy and
pulls from this helper.

Ticker tool lines now use the shared parser + a new `ToolIdentifier`
sub-renderer so the live log reads like the main tool UI:

  - MCP tool  → `<server> · <code-badge:tool>` (e.g. "github · `search_code`")
  - Native   → friendly name from `TOOL_FRIENDLY_NAME_KEYS`
  - Unknown  → bare `<code>` badge of the raw id

The `using_tool` / `tool_complete` rows now render with a
`flex w-full items-baseline gap-1 overflow-hidden` layout matching
the writing/reasoning rows — they take the full container width
instead of collapsing to content size. Output snippets on
`tool_complete` get the same tail-side `dir="rtl"` ellipsis so the
newest characters stay flush-right when the container is narrow.

Dropped the now-unused template i18n keys
(`com_ui_subagent_ticker_using_with_args`,
`com_ui_subagent_ticker_tool_complete`,
`com_ui_subagent_ticker_tool_output`) in favor of tokens the JSX
composes structurally. Only English is touched per the project rule;
other locales follow externally.

* 🪆 fix: dialog scroll button + auto-scroll during streaming deltas

Two race/trigger bugs in the dialog's scroll behavior:

**Button never showed**: `addEventListener('scroll', …)` in a
`useEffect` ran before Radix's portal had actually committed the
scroll container, so `scrollRef.current` was still null — the listener
never attached, `isAtBottom` stayed stuck at its initial `true`, and
the jump-to-latest button was never rendered. Swap to React's
`onScroll` prop on the element itself so the handler wires up as part
of DOM commit, not a post-commit effect.

**Auto-scroll stalled during text streaming**: the pin-to-bottom
effect only re-fired on `contentParts.length` changes. Message/reasoning
deltas extend the last TEXT/THINK part's `.text` without changing the
array length — so the view would drift up as tokens piled in and
never catch back up. Replace the length-dep effect with a
`ResizeObserver` on the inner content div; every height change (new
part or in-place growth) triggers a scroll-pin when the user is still
at the bottom.

* 🪆 fix: drop leading ellipsis from ticker body

`truncatePreview` was prepending `...` to the tail when the buffer
exceeded 300 chars. The component's CSS already produces a left-side
ellipsis for overflow via `dir="rtl"` + `text-overflow: ellipsis` —
stacking a data-level ellipsis on top renders a stray dot character
right after the `Writing:` / `Reasoning:` label (`Writing: .Sure!`),
which looks like a typo to the reader.

Data now returns just the last 300 chars when truncating; CSS handles
the visual cue whenever the body actually overflows its container.

* 🪆 fix: Codex review — subagent isolation + concurrent-safe throttle

Three findings from the @codex review pass, all valid:

**P1 — buildAgentInput leaks parent discovered-tool state into subagent
children.** `buildAgentInput` mutates `agent.toolRegistry`
(`overrideDeferLoadingForDiscoveredTools` flips `defer_loading:true→false`
on tools the parent previously searched for) and appends those tools'
definitions to the returned `toolDefinitions` before the function
returns. `buildSubagentConfigs` was clearing the reported
`initialSummary` / `discoveredTools` fields on the returned
AgentInputs, but that happened post-return — the registry writes and
extra tool definitions persisted on the child, silently defeating
context isolation and inflating the child's prompt.

Fix: `buildAgentInput` now takes an `isSubagent` flag that gates the
registry-mutation block and omits `initialSummary` /
`discoveredTools` at the source. `buildSubagentConfigs` passes
`{ isSubagent: true }` for every explicit child; no post-hoc cleanup
needed.

**P2 — ToolCallGroup labels a finished subagent group as still
running when the child returned no output.** `getToolMeta` computed
`hasOutput` as `!!tc.output`, which is `false` for a completed
subagent that returned empty text (the UI already has an "empty
result" fallback for that case). `allCompleted` would stay `false`
and the group header stuck on "Running N agents" forever.

Fix: treat `tc.progress === 1` as completion too — progress is the
authoritative lifecycle signal, output is just content.
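A minimal sketch of the corrected predicate (hypothetical shape; the real `getToolMeta` reads more fields):

```typescript
// Sketch: a completed subagent with empty output still counts as done
// because progress is the authoritative lifecycle signal.
interface ToolCallLike {
  output?: string;
  progress?: number;
}

function isToolCallDone(tc: ToolCallLike): boolean {
  return Boolean(tc.output) || tc.progress === 1;
}
```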

**P2 — useThrottledValue schedules `setTimeout` during render.**
Discarded renders under Strict Mode / Concurrent rendering would
leave orphan timers firing against stale trees.

Fix: move `setTimeout` into a `useEffect` keyed on `[value, intervalMs,
enabled]`. Render-time still mutates refs (idempotent), but timer
scheduling lives post-commit. Cleanup on unmount and on passthrough
transitions is preserved.

* 🪆 fix: Codex P2 — wipe subagent atoms on conversation switch

`clearStepMaps()` intentionally doesn't reset `subagentProgressByToolCallId`
so a user can reopen a completed subagent's dialog mid-conversation,
but `resetSubagentAtoms` was defined and never exposed / called —
so each completed run's aggregated `contentParts` + `tickerState`
stayed resident in the `atomFamily` for the whole app session.
Unbounded growth across multi-conversation sessions.

Expose `resetSubagentAtoms` from `useStepHandler` and fire it from
`useEventHandlers` whenever the URL's `conversationId` changes.
That's the correct cleanup boundary: historical subagent dialogs
rehydrate from persisted `subagent_content` on each `tool_call` at
message-save time, so wiping live atoms on switch doesn't lose any
viewable history — it just releases per-tool-call state that the
old conversation's components no longer subscribe to.

* 🪆 fix: Codex round 3 — subagent registry isolation + post-run label

Two more valid findings.

**P1 — parent-order registry mutation leaks into subagent inputs.**
`overrideDeferLoadingForDiscoveredTools` mutates `agent.toolRegistry`
in place (the Map *and* the LCTool objects inside it). When an agent
appears both as a handoff target (normal graph node) AND an explicit
subagent child, a subagent build that ran before the parent's build
captures a reference to the same registry — the parent's later mutation
leaks through to the child.

Fix: for subagent children (`isSubagent`), clone the `toolRegistry`
Map and shallow-clone each LCTool inside before returning the inputs.
`defer_loading` flips on parent-graph registry mutations can't
propagate across the clone boundary. `toolDefinitions` also gets a
shallow-copy pass so the same isolation holds for definitions the
child carries directly.
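The clone boundary can be sketched as follows (a sketch with an illustrative `LCToolLike` stand-in for the SDK's tool type):

```typescript
// Sketch: clone the registry Map AND shallow-clone each entry, so
// later in-place mutations on the parent's registry (e.g. flipping
// defer_loading) cannot propagate to a subagent child.
interface LCToolLike {
  name: string;
  defer_loading: boolean;
}

function cloneToolRegistry(
  registry: Map<string, LCToolLike>,
): Map<string, LCToolLike> {
  const clone = new Map<string, LCToolLike>();
  for (const [key, tool] of registry) {
    clone.set(key, { ...tool }); // new object, so field flips don't leak
  }
  return clone;
}
```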

**P2 — "Running N agents" label stuck after cancel/error.**
ToolCallGroup's all-subagent label was gated only on `allCompleted`,
which requires every child to have `hasOutput || progress === 1`.
A subagent that gets cancelled (stream ends, no `stop` phase, no
output) never satisfies that — so even after `isSubmitting` flips
false, the header stays on "Running N agents" while each individual
card correctly shows "Cancelled agent".

Fix: derive a `subagentsDone` flag as `allCompleted || !isSubmitting`
and gate the past-tense label on that. Matches the tri-state each
SubagentCall card already resolves (finished / cancelled / running).

* 🪆 fix: Codex P2 — ACL-check subagents.agent_ids on create/update

Codex flagged that `subagents.agent_ids` was accepted as arbitrary
strings on the create/update routes while `edges` got a
`validateEdgeAgentAccess` pass — so users could save subagent
references to agents they can't VIEW. At runtime `initializeClient`'s
`processAgent` ACL gate silently drops those, so the persisted
configuration and the actual behavior diverged in a way that is
difficult to diagnose.

Refactor: extract the id-set → unauthorized-ids check into a shared
`collectUnauthorizedAgentIds`, wrap it with a dedicated
`validateSubagentAccess`, and plumb the same 403-on-failure response
the edge path already returns. Applied on both POST /agents and
PATCH /agents/:id.

* 🪆 fix: Codex round 5 — ACL-disable escape hatch + ticker order

Two valid findings.

**P1 — can't disable subagents after losing access to a child.**
The subagent ACL check ran on every create/update that echoed back
the `agent_ids` list, even when the user was explicitly disabling
the feature. The UI keeps the list intact when toggling `enabled:
false`, so a user who subsequently lost VIEW on any child would be
locked in a 403 loop — every edit (including the one that turns
subagents off) bounces.

Fix: gate the ACL check on `subagents.enabled !== false` at both
the POST /agents and PATCH /agents/:id handlers. Empty list stays
a no-op. Disabling the feature is always permitted.

**P2 — ticker fold merges out-of-order previews across delta
switches.** `foldSubagentEventIntoTicker` carried `textLineIdx` /
`thinkLineIdx` across a reasoning → text → reasoning transition,
so the second reasoning chunk appended to the original reasoning
line instead of starting a new chronological one.

Fix: close the opposite buffer + cursor when a delta-type switch
is detected (same rule the content-parts reducer already applies).
Added a regression test.

* 🪆 fix: Codex round 6 — preserve mid-stream atoms + honor sequential suppression

Two valid findings.

**P2 — atom reset fires on initial chat URL assignment.**
`useEventHandlers` initialized `lastConversationIdRef` from the URL's
current `paramId`, then reset subagent atoms whenever the ref and
`paramId` disagreed. For a brand-new conversation the URL stamp goes
from `undefined → "abc123"` while the first response is still
streaming, which used to drop subagent ticker/content state mid-run
and leave dialogs missing earlier updates.

Fix: only reset when *both* the old and new IDs are non-null and
differ — i.e. a user-initiated switch between two established
conversations. The initial assignment passes through without
clearing.

**P2 — ON_SUBAGENT_UPDATE bypassed `hide_sequential_outputs`.**
Every other streaming handler in `callbacks.js`
(`ON_RUN_STEP`, `ON_MESSAGE_DELTA`, etc.) gates emission on
`checkIfLastAgent` + `metadata?.hide_sequential_outputs`, but the
subagent forwarder did an unconditional `emitEvent` — so intermediate
agents in a sequential chain were leaking their children's activity
to the client even when the chain was configured to suppress
intermediates.

Fix: accept `metadata` and apply the same `isLastAgent ||
!hide_sequential_outputs` gate. Aggregation still runs regardless of
visibility (persistence + dialog depend on it); only the SSE forward
is suppressed.
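The gate itself is a one-liner shared with the other handlers; a sketch (hypothetical function name):

```typescript
// Sketch of the sequential-output visibility gate: intermediate agents
// in a chain configured to hide sequential outputs stay silent.
function shouldEmitSubagentUpdate(
  isLastAgent: boolean,
  metadata?: { hide_sequential_outputs?: boolean },
): boolean {
  return isLastAgent || !metadata?.hide_sequential_outputs;
}
```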

* 🪆 fix: Codex P2 — gate subagent ACL check on endpoint capability

`validateSubagentAccess` ran on every create/update where
`subagents.enabled !== false`, regardless of the endpoint-level
`subagents` capability. When the capability is off at the appConfig
level, `initializeClient` already strips the `subagents` block at
runtime — so persisted `agent_ids` are inert — but the validation
could still 403 on a legacy record whose referenced child is no
longer viewable, blocking unrelated edits.

Fix: add `isSubagentsCapabilityEnabled(req)` that reads the agents
endpoint's capabilities from `req.config` and gate both the create
and update ACL checks on it. Capability-off environments can update
agents with stale `subagents` data freely; capability-on keeps the
full ACL protection.
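Combining this gate with the earlier enabled-check, the decision can be sketched as one predicate (illustrative names; the real checks live on the POST/PATCH handlers):

```typescript
// Sketch: validate subagent ACLs only when the endpoint capability is
// on AND the user isn't disabling the feature AND there's something
// to check. Capability-off environments edit freely; disabling is
// always permitted; an empty list is a no-op.
interface SubagentsBlock {
  enabled?: boolean;
  agent_ids?: string[];
}

function shouldRunSubagentAclCheck(
  capabilityEnabled: boolean,
  subagents?: SubagentsBlock,
): boolean {
  if (!capabilityEnabled) return false; // persisted agent_ids are inert
  if (!subagents || subagents.enabled === false) return false;
  return (subagents.agent_ids?.length ?? 0) > 0;
}
```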

* 🪆 fix: Codex P2 — reset subagent atoms on id→null navigation too

Previous guard (both-established) skipped the reset whenever
`paramId` became null/undefined, so navigating from an existing
chat to a "new chat" route left stale subagent progress resident
in the `atomFamily` until the user picked a specific different
chat.

Swap the both-established check for a one-time flag: skip only the
very first `undefined → id` transition (the brand-new-chat URL
stamp that happens mid-stream), then reset on any subsequent
change — id→id, id→null, null→id-after-reset. If the user started
on an established chat the flag is true at mount, so the guard is
a no-op and every navigation resets normally.
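The guard can be sketched as a small closure (a sketch with hypothetical names; the real code keeps the flag in refs inside `useEventHandlers`):

```typescript
// Sketch: skip only the very first undefined -> id transition (the
// brand-new-chat URL stamp that happens mid-stream), then reset on
// any subsequent change — id->id, id->null, null->id-after-reset.
function makeConversationResetGuard(initialId: string | undefined) {
  let lastId = initialId;
  let sawFirstAssignment = initialId != null; // established chat at mount
  return (newId: string | undefined): boolean => {
    if (newId === lastId) return false;
    if (!sawFirstAssignment && lastId == null && newId != null) {
      // First URL stamp for a brand-new chat: don't wipe mid-run state.
      sawFirstAssignment = true;
      lastId = newId;
      return false;
    }
    sawFirstAssignment = true;
    lastId = newId;
    return true;
  };
}
```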

* 🪆 fix: Codex round 9 — subagent persistence gate + handoff children

Two valid findings.

**P1 — hide_sequential_outputs also gates persistence.**
The previous fix gated the SSE forward on
`isLastAgent || !hide_sequential_outputs` but still ran the
per-tool-call `createContentAggregator` aggregation unconditionally.
`finalizeSubagentContent` would then attach the hidden intermediate
agent's child reasoning / tool output to the saved message, so a
page refresh could reveal activity that was intentionally suppressed
live. Move the visibility gate to the top of the handler — hidden
agents now skip both aggregation and emission, so
"hide_sequential_outputs" is a consistent "don't record" rule for
subagent traces.

**P2 — handoff agents' explicit subagents were silently dropped.**
`initializeClient` only resolved `subagentAgentConfigs` for the
primary config, so an agent used via handoff that had its own
`subagents.agent_ids` saved in the builder would get self-spawn
only; every explicit child was quietly ignored, creating a
saved-config / runtime mismatch the user couldn't diagnose.

Extract the resolution into a shared `loadSubagentsFor(config)`
helper and invoke it for the primary and every handoff agent in
`agentConfigs`. The `edgeAgentIds` precomputation stays outside
the helper (it's loop-invariant). Capability-off shortcuts return
empty early so the existing strip-on-capability-off path still
holds.

* 🪆 fix: Codex P2 — recursive subagent build for multi-level delegation

Previously only the outer `agents[]` loop attached `subagentConfigs`
to its inputs, so a child used as a subagent (invoked via the
`subagent` tool) lost every explicit spawn target of its own. A
user-valid configuration like A → B → C would only run the top
layer; B could never actually delegate to C from inside A's run.

Recursively build `subagentConfigs` for each child inside
`buildSubagentConfigs`, passing the child's freshly-constructed
`childInputs` down so its own `subagents.enabled` children get
resolved too. Added cycle protection via an `ancestors` Set — a
configuration like A → B → A is safely cut off at the second
encounter of A rather than recursing forever (the existing
`child.id === agent.id` guard already prevents the direct self-loop).
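The recursion with cycle protection can be sketched on a simplified shape (illustrative types; the real builder threads full `AgentInputs` through each level):

```typescript
// Sketch: resolve each child's own subagent list recursively; an
// `ancestors` set cuts off cyclic configurations like A -> B -> A at
// the second encounter instead of recursing forever.
interface AgentNode {
  id: string;
  subagentIds: string[];
}

interface BuiltConfig {
  id: string;
  children: BuiltConfig[];
}

function buildSubagentTree(
  agent: AgentNode,
  registry: Map<string, AgentNode>,
  ancestors: Set<string> = new Set(),
): BuiltConfig {
  const nextAncestors = new Set(ancestors).add(agent.id);
  const children: BuiltConfig[] = [];
  for (const childId of agent.subagentIds) {
    if (nextAncestors.has(childId)) continue; // cycle: cut off here
    const child = registry.get(childId);
    if (!child) continue;
    children.push(buildSubagentTree(child, registry, nextAncestors));
  }
  return { id: agent.id, children };
}
```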

* 🪆 fix: Codex P2 — reset subagent atoms on useEventHandlers unmount

The effect that resets subagent atoms only fired on `paramId` change,
so unmounting the chat container (route change away from /c) never
flushed the atoms. `knownSubagentAtomKeys` lives in a ref inside
`useStepHandler` — once the hook unmounts the ref is gone, so a
subsequent remount can't clean atoms it never registered.

Added a second `useEffect` that only runs cleanup on unmount (empty
deps aside from the stable `resetSubagentAtoms` callback). Keeps
`atomFamily` bounded across full route teardowns too.

* 🪆 fix: Codex round 13 — cyclic subagent guard + prefer persisted

Two valid findings.

**P1 — cyclic subagent ref reloads the primary.** A configuration
like `A ↔ B` (B lists A as its own subagent) would send
`loadSubagentsFor` down a path that couldn't find A in
`agentConfigs` (the primary isn't stored there), so it called
`processAgent(A)` a second time. That inserts a fresh config for
the primary id, which is then duplicated downstream in
`[primary, ...agentConfigs.values()]` and can replace the primary's
tool context with the reloaded copy.

Fix: short-circuit when a subagent ref points back at
`primaryConfig.id` — reuse the already-loaded primary config.
Primary is always an edge id so no pruning bookkeeping needed.

**P2 — live atom preferred over canonical persisted trace.** The
dialog picked `progress.contentParts` ahead of
`persistedContent`, but the Recoil bucket is best-effort — after
a disconnect/reconnect it can be stale or partial. The server's
`subagent_content` on the `tool_call` is the canonical record
refreshed on sync. Preferring live could hide completed
tool/reasoning history that was actually persisted.

Fix: flip the preference order. Persisted wins when it's
non-empty; live covers the mid-stream window (before the parent
message saves, persisted is empty) and the older-runs fallback.
Updated the test that enforced the old order to lock the new
semantics in (separate mid-stream live-fallback assertion kept).
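
The flipped preference reduces to a small selector; the parameter names are illustrative stand-ins for the dialog's two sources:

```typescript
// Persisted `subagent_content` is the canonical record refreshed on sync;
// the live Recoil parts are best-effort and only cover the mid-stream window.
function pickContentParts<T>(
  persistedContent: T[] | undefined,
  liveContentParts: T[] | undefined,
): T[] {
  if (persistedContent && persistedContent.length > 0) {
    return persistedContent; // persisted wins when non-empty
  }
  // Before the parent message saves, persisted is empty — fall back to live.
  return liveContentParts ?? [];
}
```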

* 🪆 fix: Codex P2 — subagent atom reset rule simplified to 'leaving established id'

The `hasEstablishedConversationRef` flag plus the check for the
initial undefined→id transition covered the first navigation but
missed the equivalent mid-stream
URL stamp when a user goes from an existing chat to a new chat and
sends a message there (`id → null → newId`). The null → newId
transition was still hitting the reset branch and wiping the
in-flight subagent ticker/content for that first turn.

Simpler rule: only reset when the PREVIOUS paramId is an established
id. Every transition AWAY from an established chat clears (id→id2,
id→null, id→undefined); every transition FROM null/undefined passes
through (initial mount, new-chat URL stamp mid-stream). Drop the
`hasEstablishedConversationRef` machinery in favor of that single
condition.
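
The simplified rule is a single predicate on the previous `paramId`; treating "established" as any non-empty string id is an assumption here:

```typescript
// Only transitions AWAY from an established chat clear the atoms:
// id→id2, id→null, id→undefined all reset. Transitions FROM null/undefined
// pass through, covering initial mount and the mid-stream new-chat URL stamp.
function shouldResetSubagentAtoms(
  prevParamId: string | null | undefined,
): boolean {
  return typeof prevParamId === 'string' && prevParamId.length > 0;
}
```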

* 🪆 fix: Codex P2 — match runtime's strict subagent enable check in ACL

Runtime (`initializeClient` + `run.ts`) treats `subagents?.enabled`
as a truthy predicate — `undefined`, `null`, missing, and `false`
all short-circuit. The ACL gate was using `!== false` which
accepted `undefined` / missing as "enabled" and could 403 a payload
whose subagent tool would be inert at runtime.

Swap both create and update to `enabled === true`. Only a
strictly-enabled payload triggers the ACL check; the disable path
(`false`) still passes through so a user who lost VIEW on a child can
still save the disable edit.
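
The strict predicate, matching runtime's truthy check (the helper name is illustrative):

```typescript
// `undefined`, `null`, missing, and `false` all short-circuit; only an
// explicit `true` makes the subagent tool live, so only `true` should
// trigger the ACL validation.
function isSubagentEnabled(subagents?: { enabled?: boolean | null }): boolean {
  return subagents?.enabled === true;
}
```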

* 🪆 fix: Codex P2 — reject missing subagent references with 400

`validateSubagentAccess` collapsed through `collectUnauthorizedAgentIds`,
which returns an empty list for ids with no DB record — so typos and
references to deleted agents passed validation silently, and
`initializeClient` later dropped them at runtime. Saved config would
then list spawn targets that the backend never honored, a
hard-to-diagnose drift.

Refactor the helper into `classifyAgentReferences(ids, …)` which
returns `{ missing, unauthorized }` separately. `validateEdgeAgentAccess`
keeps its old semantics (missing is intentional — a self-referential
`from` names the agent being created). `validateSubagentReferences`
surfaces both buckets so the create/update handlers can 400 on
missing and 403 on unauthorized with distinct error messages and
`agent_ids` lists.
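
The two-bucket classification can be sketched as below; the set-based inputs are a simplification of the DB and ACL lookups:

```typescript
// Split referenced ids into missing (no DB record → 400) and unauthorized
// (exists but caller lacks VIEW → 403), instead of collapsing both into one
// "unauthorized" list that silently drops missing ids.
function classifyAgentReferences(
  ids: string[],
  existingIds: Set<string>,
  authorizedIds: Set<string>,
): { missing: string[]; unauthorized: string[] } {
  const missing: string[] = [];
  const unauthorized: string[] = [];
  for (const id of ids) {
    if (!existingIds.has(id)) {
      missing.push(id); // typo or deleted agent
    } else if (!authorizedIds.has(id)) {
      unauthorized.push(id); // real agent, no access
    }
  }
  return { missing, unauthorized };
}
```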

* 🪆 polish: tighten subagent dialog grid gap to gap-2

OGDialogContent's grid default is `gap-4`, which renders the title,
description, and scroll area as three visually separated panels.
Drop to `gap-2` so they read as one block.

* 🪆 polish: swap Subagents above Handoffs in Advanced panel

Subagents is the more common knob users reach for, so show it first.
Handoffs keep the same Controller wiring, just move below.
2026-04-25 04:02:01 -04:00