Commit graph

9 commits

Author SHA1 Message Date
Danny Avila
6357ea10c1
🧭 feat: Scope Model Spec Skills (#13522)
* feat: scope model spec skills

* style: format skill catalog limit

* fix: serialize model spec skill resolution

* test: satisfy model spec load config typing

* fix: apply model spec skills to added conversations

* fix: support alwaysApply frontmatter alias

* fix: address model spec skills review
2026-06-05 10:22:02 -04:00
Danny Avila
1da789bac0
🗂️ feat: Add Agent File Authoring Tools (#13435)
* feat: add agent file authoring tools

* style: format file authoring changes

* style: satisfy file authoring prettier

* test: fix file authoring initialization expectations

* fix: complete skill file authoring flow

* fix: pass skill authoring state on edit

* test: mock missing bundled skill file

* fix: harden agent file authoring gates

* fix: preserve file authoring runtime context

* test: fix authoring context mock typing

* fix: preserve subagent skill primes

* test: avoid array at in handler spec

* refactor: deepen skill authoring runtime wiring

* fix: address codex authoring review findings

* test: fix authoring collision fixture type

* test: add skill file authoring mock e2e

* fix: Improve skill file authoring recovery

* fix: Show file authoring args while running

* fix: Clarify skill rename authoring errors

* fix: Keep code-only file authoring schemas sandbox scoped

* fix: Address skill authoring review findings

* fix: Gate skill authoring on write access
2026-06-03 23:58:12 -04:00
Danny Avila
68eac104ad
🗂️ fix: Scope Handoff Agent Context Docs (#13167)
* fix: Scope agent context docs to handoff agents

* fix: Deduplicate scoped request context

* refactor: Extract agent attachment helpers
2026-05-18 15:36:22 -04:00
Danny Avila
7631366f52
🪵 chore: Log Subagent Limit Hits (#13068) 2026-05-11 09:25:08 -04:00
Danny Avila
70b6bb69d3
🧬 fix: Bound Subagent Expansion (#13064)
* fix: Bound subagent expansion

* fix: Preserve subagent path depth
2026-05-11 08:53:53 -04:00
Danny Avila
22890771cf
🧭 fix: Preserve Resend Files for Subagents (#13030) 2026-05-08 17:21:52 -04:00
Danny Avila
d83cb84f59 🪆 feat: Subagent configuration in Agent Builder (#12725)
* 🪆 feat: Subagents configuration (isolated-context child agents)

Surfaces the new @librechat/agents `SubagentConfig` primitive in the Agent
Builder. Subagents let a supervisor delegate a focused subtask to a child
graph running in an isolated context window: verbose tool output stays in
the child, only a filtered summary returns to the parent.

Data model: new `subagents: { enabled, allowSelf, agent_ids }` on Agent,
wired through the Zod, Mongoose, and form schemas plus a new
`AgentCapabilities.subagents` capability (enabled by default).

Backend: `initialize.js` loads explicit subagent configs alongside handoff
agents, and drops subagent-only references from the parallel/handoff maps
so they don't leak into the supervisor's graph. `run.ts` emits
`SubagentConfig[]` on the primary `AgentInputs` — a self-spawn entry when
`allowSelf` is enabled plus one entry per configured agent.

UI: an "Advanced" panel section with an enable toggle, a self-spawn
toggle, and an agent picker (capped at 10). Enabling without adding
agents still yields self-spawn; disabling self-spawn with no agents shows
a warning. A capability flag gates the whole section.

* 🪆 feat: Stream subagent progress to UI (dialog + inline ticker)

Pairs with the @librechat/agents SDK change that forwards child-graph
events through the parent's handler registry (danny-avila/agents#107):

- Self-spawn and explicit subagents can now use event-driven tools,
  because child `ON_TOOL_EXECUTE` dispatches reach our ToolService via
  the parent's registered handler.
- The same forwarding path wraps the child's run_step / run_step_delta
  / run_step_completed / message_delta / reasoning_delta dispatches in
  a new `ON_SUBAGENT_UPDATE` envelope, with start/stop/error bookends.

Backend: `callbacks.js` registers an `ON_SUBAGENT_UPDATE` handler that
forwards each envelope straight to the SSE stream.

Frontend:
- `useStepHandler` consumes `ON_SUBAGENT_UPDATE` events and merges them
  into a per-tool_call Recoil atom (`subagentProgressByToolCallId`).
  First-seen `subagentRunId` claims the most-recent unclaimed `subagent`
  tool call in the active response message — a temporal mapping, no SDK
  wire-format change needed to correlate child runs with parent tool
  calls.
- New `SubagentCall` part component replaces the default `ToolCall`
  rendering when `toolCall.name === Constants.SUBAGENT`: compact status
  ticker showing the 3 most recent update labels, clickable to open a
  dialog with the full activity log + final markdown-rendered result.
- Adds `Constants.SUBAGENT`, `StepEvents.ON_SUBAGENT_UPDATE`, and
  `SubagentUpdateEvent` type in data-provider.

Tests:
- `packages/api npx jest run-summarization` — 23 pass
- `api npx jest initialize` — 16 pass
- `npm run build` — clean

Dependency note: bumps `@librechat/agents` to `^3.1.67-dev.1` — requires
the SDK PR (danny-avila/agents#107) to be merged to dev and published
before this PR merges. `ON_SUBAGENT_UPDATE` is absent from dev.0, so the
handler registration would be a no-op with the older SDK but would not
crash.

* 🪆 fix: address Codex review and review audit on subagents

Stacks on top of the SDK change in danny-avila/agents#107 (bumped to
`^3.1.67-dev.2`).

- **P1 (`initialize.js`)**: subagent-only agents were being deleted from
  both `agentConfigs` AND `agentToolContexts`. The tool-execute handler
  resolves execution context (agent, tool_resources, skill ACLs) from
  `agentToolContexts`, so explicit subagents would run without their
  configured resources and skip action tools. Now only `agentConfigs`
  is pruned — tool context stays intact.
- **P2 (`AgentSubagents.tsx`)**: toggling subagents off set the form
  field to `undefined`; `removeNullishValues` stripped it from the
  PATCH, leaving the server copy enabled. Now it persists an explicit
  `{ enabled: false, ... }` so the update actually clears state.

- **Finding 1 (MAJOR)** — `agent_ids` Zod schema gains `.max()` via a
  new `MAX_SUBAGENTS` export from `data-provider` (shared with the UI
  cap). Crafted payloads can't trigger hundreds of `processAgent`
  calls.
- **Finding 2 (MAJOR)** — `subagentProgressByToolCallId` atomFamily
  atoms are now tracked in a ref and reset from `clearStepMaps` via a
  `useRecoilCallback({ reset })`. No monotonic growth across a session.
- **Finding 3 (MAJOR)** — early-arriving `ON_SUBAGENT_UPDATE` events
  whose parent `tool_call_id` is not yet mapped are now buffered in
  `pendingSubagentBuffer` (keyed by `subagentRunId`) and replayed in
  arrival order once correlation completes. Mirrors the existing
  `pendingDeltaBuffer` pattern.
- **Finding 4 (MAJOR)** — switched to deterministic correlation via
  the new `parentToolCallId` that SDK `3.1.67-dev.2` threads through
  from `ToolRunnableConfig.toolCall.id`. Temporal fallback now iterates
  oldest-unclaimed-first (forward), matching tool-call creation order,
  so concurrent spawns map correctly.
- **Finding 6 (MINOR)** — `agent_ids` are deduped on the backend via
  `new Set(...)` before the load loop. Duplicates no longer produce
  duplicate `SubagentConfig` entries visible to the LLM.
- **Finding 7 (MINOR)** — events array inside each Recoil atom is
  capped at 200 entries. Long-running subagents no longer replay O(n)
  spreads on every update; the dialog log still shows the cap window.
- **Finding 8 (MINOR)** — documented: subagents are loaded only for
  the primary agent this release (handoff children get self-spawn but
  not explicit sub-subagents). In-code comment added so the next
  maintainer doesn't wonder.
- **Finding 9 (NIT)** — removed `{!isSubmitting && null}` dead code
  and the misleading announce-polite comment in `SubagentCall`.

- New `validation.spec.ts` — 9 tests covering the cap on
  `agent_ids.length` at the subagent schema, agent-create, and
  agent-update layers.
- `run-summarization` — 23 pass, `initialize` — 16 pass, total backend
  package: 103 pass across touched areas.

Findings 5 (component tests) and 10 (micro-allocation) are tracked
but deferred; the former needs a Recoil-RenderHook harness that isn't
in this PR's scope, and the latter has negligible impact (one `Array.from`
per subagent run).

* 🧪 test: integration coverage for subagent correlation + backend loading

Addresses the follow-up audit on #12725 with real-code tests (no mock
handlers, only the existing setMessages/getMessages spies and the
standard mongodb-memory-server harness).

Six new tests under a dedicated `describe('subagent loading')`:
- loads a configured subagent, populates `subagentAgentConfigs`, keeps
  it out of `agentConfigs`
- **P1 regression guard**: drives the real `toolExecuteOptions.loadTools`
  closure with the subagent id and asserts `loadToolsForExecution` is
  called with `agent: <subagent>`, `tool_resources`, `actionsEnabled`.
  If anyone deletes `agentToolContexts` again, this fails.
- dedup: three copies of the same id load the agent once
- overlap: agent referenced both as handoff target and subagent stays in
  `agentConfigs`
- capability gate: admin disabling `subagents` suppresses loading even
  when the agent has a config
- per-agent disable: `subagents.enabled: false` skips loading entirely

Five new tests under `describe('on_subagent_update event')` using a
real `RecoilRoot` and a companion `useRecoilCallback` reader so writes
from the hook are observable:
- deterministic correlation via `parentToolCallId` (happy path with
  SDK dev.2+)
- fallback: oldest-unclaimed tool call wins for concurrent spawns
  without `parentToolCallId`
- early-arrival buffer: updates with no mapping get buffered and
  replayed once the tool call appears
- event cap: 205 updates collapse to 200 retained, oldest dropped
- `clearStepMaps` resets tracked atoms back to their null default

- F2 — added explicit `// TODO` marker for handoff-subagent-loading
  extension (matches the comment that referenced it).
- F3 — dropped the unnecessary `MAX_SUBAGENTS as MAX_SUBAGENTS_CAP`
  alias; just import the constant directly.
- Bumped `@librechat/agents` to `^3.1.67-dev.3` to pick up the SDK's
  paired test additions.

- `api/server/services/Endpoints/agents/initialize.spec.js` — 22 pass
  (6 new + 16 existing)
- `packages/api/src/agents/validation.spec.ts` +
  `run-summarization.test.ts` — 103 pass
- `client/src/hooks/SSE/__tests__/useStepHandler.spec.ts` — 48 pass
  (5 new + 43 existing)

* 🪆 fix: strip parent run summary + discovered tools from subagent inputs

Codex P1 on #12725: `buildSubagentConfigs` reused the shared
`buildAgentInput` factory for each explicit child, and that factory
always stamps the parent run's `initialSummary` (cross-run conversation
summary) and `discoveredTools` (tool names the parent's LLM searched
earlier) onto every `AgentInputs` it returns. When subagents were
enabled on a conversation that had already been summarized, every
child inherited that summary — silently defeating the isolated-context
contract and burning extra tokens on unrelated prior chat.

Fix in `run.ts`: after `buildAgentInput(child)`, explicitly clear
`childInputs.initialSummary` and `childInputs.discoveredTools` before
attaching to the `SubagentConfig`. The parent keeps both — that's how
the supervisor receives cross-turn context — but the child starts
fresh.

Paired with danny-avila/agents#107 (bumped to `^3.1.67-dev.4`), which
adds the equivalent strip inside `buildChildInputs` to cover the
self-spawn path where the SDK clones parent `_sourceInputs` directly
and LibreChat never sees the intermediate shape. Belt and suspenders.

Regression test (new):
- `does NOT leak the parent run initialSummary into an explicit child
  (Codex P1 regression)` — sets `initialSummary` on the run, enables
  subagents with an explicit child, asserts the parent still has the
  summary but `childConfig.agentInputs.initialSummary` is `undefined`.
  Same for `discoveredTools`. 24 pass.

* 🪆 fix: capability gate applies to handoff agents + parallel subagent test

### Codex P2 — handoff agents kept `subagents` after capability disabled
The endpoint-level `AgentCapabilities.subagents` gate only cleared
`subagents` on `primaryConfig`. Handoff agents loaded into
`agentConfigs` retained their persisted `subagents.enabled: true`,
and because `run.ts` calls `buildSubagentConfigs` for every agent
input, self-spawn would still fire on a handoff target even when the
admin had disabled the capability globally.

Fix in `initialize.js`: after the subagent loading block, when the
capability is off, iterate `agentConfigs.values()` and clear
`subagents` + `subagentAgentConfigs` on every loaded config.

Regression test: `clears subagents on handoff agents too when
capability is disabled (Codex P2 regression)` — seeds a handoff target
with its own `subagents.enabled: true`, disables the capability at
the endpoint, asserts both primary AND handoff have `subagents`
undefined in the client args. 23 init tests pass.

### Parallel subagent correlation — user-requested verification
Added `keeps parallel subagent streams independent when events
interleave` to `useStepHandler.spec.ts`. Two `subagent` tool calls
seeded side by side, 6 interleaved `ON_SUBAGENT_UPDATE` envelopes
dispatched (a-start, b-start, a-step, b-step, a-stop, b-step), each
carrying its own `parentToolCallId`. Asserts each `tool_call_id`'s
Recoil bucket accumulates only its own run's events, statuses reflect
each run independently (`call_a` → stop, `call_b` → run_step), no
cross-contamination. 49 step-handler tests pass.

* 🪆 fix: SubagentCall detects cancelled / errored states (Codex P2)

Codex P2 on #12725: the old `running` check only consulted
`initialProgress` and the subagent's phase. A user stop, dropped
stream, or backend crash before a terminal `stop`/`error` envelope
arrived would leave the ticker permanently stuck on "working…". Other
*Call components (ToolCall.tsx) already model this via
`!isSubmitting && !finished` → cancelled.

Mirror that pattern. Re-introduce `isSubmitting` on `SubagentCallProps`
(the prop was dropped earlier as 'unused' — that was a bug) and resolve
status as a tri-state:

- `finished`  — initialProgress >= 1, or subagent `stop`/`error`
- `cancelled` — `!isSubmitting && !finished`
- `running`   — neither

New locale keys `com_ui_subagent_cancelled` + `com_ui_subagent_errored`
swap in the right header text per state.

Tests: new `SubagentCall.test.tsx` covers all four states with a real
`RecoilRoot` and a `useRecoilCallback` seeder — no mocked store — 5/5
pass. Includes an explicit P2 regression test that simulates the
`isSubmitting=false, progress.status='run_step', initialProgress<1`
scenario and asserts the cancelled label renders.

* 🪆 feat: semantic ticker + aggregated content-part dialog for subagents

Two rounds of feedback on #12725:

### Ticker — user-readable lines, not raw event names

The old ticker showed \`on_run_step\`, \`on_message_delta\`, etc. — not
meaningful to users. Replaced with \`buildSubagentTickerLines\`, a pure
helper that walks the \`SubagentUpdateEvent\` stream and emits:

- message/reasoning deltas → a single live "Writing: <last 60 chars>"
  (or "Reasoning: …") line that updates in place as chunks arrive
- run_step with tool_calls → "Using calculator(expression=42*58)" for
  a single call, "Using tool: a, b" for parallel (args dropped when
  multiple so the line stays short)
- run_step_completed → "calculator → 42*58 = 2436" (output truncated
  to 48 chars; falls back to "Tool X complete" when output is empty)
- error → "Error: <message>"
- start / stop / run_step_delta → suppressed (too granular / lifecycle-only)

Args and output pass through \`summarizeArgs\` / \`summarizeOutput\`
which flatten JSON to \`key=value\` pairs and head-truncate long
strings so a 200-line tool output never bloats the ticker.

### Dialog — aggregated content parts via leaf renderers

\`aggregateSubagentContent\` folds the raw event stream into
\`TMessageContentParts[]\` — text/reasoning delta streaks collapse into
single \`TEXT\` / \`THINK\` parts, tool calls become \`TOOL_CALL\` parts,
and \`run_step\` boundaries correctly break text runs around tool
calls. The dialog iterates those parts through a \`SubagentDialogPart\`
renderer that delegates to the existing \`Text\`, \`Reasoning\`, and
\`ToolCall\` leaf components — the same sub-components \`<Part />\` uses
— wrapped in a minimal \`MessageContext\` so reasoning expand state and
cursor animation work.

Leaf components are used directly rather than importing \`<Part />\`
itself to avoid a module cycle (Part → Parts/index → SubagentCall →
Part) and to sidestep a hypothetical nested-subagent rendering.

### Tests

- \`subagentContent.test.ts\` — 19 pure-function tests covering the
  aggregator (text concat, reasoning concat, tool call lifecycle,
  interleaving, phase suppression, late-arriving completions) and the
  ticker builder (live preview truncation, args/output snippets,
  parallel-call handling, output truncation, i18n formatter override).
- \`SubagentCall.test.tsx\` — 9 component tests: 5 status-resolution
  (existing) + 2 ticker (semantic text, delta collapse) + 2 dialog
  (aggregated parts routed to leaf renderers, raw-output fallback).

### Locale keys

New: \`com_ui_subagent_ticker_writing\`, \`…_reasoning\`, \`…_error\`,
\`…_using\`, \`…_using_with_args\`, \`…_tool_complete\`,
\`…_tool_output\`. Preserves i18n at the display layer while the
helper stays pure.

* chore: drop unused com_ui_subagent_activity_log locale key

The dialog no longer renders an "Activity log" section — the new
content-parts renderer replaced it. Also tweaks the dialog description
copy to match.

* 🪆 fix: subagent dialog order, persistence, auto-scroll, width

Follow-up pass addressing the four issues observed in real runs
against a live subagent-using parent.

### Aggregator ordering (reasoning appearing after text it preceded)

Reproducible pattern: LLM emits reasoning → text → tool call in that
order, but the dialog rendered text BEFORE reasoning in the content
array. Root cause: `aggregateSubagentContent` maintained `currentText`
and `currentThink` buffers in parallel and only flushed them at a
`run_step` boundary in a fixed (text, think) order, losing the actual
arrival order.

Fix: when a text chunk arrives, close any open think buffer first
(pushes it into the content array right then); symmetric for think →
text. Two new regression tests cover the exact reasoning → text →
tool_call sequence from the screenshot and the repeated
reasoning ↔ text flow across a turn.

### Content persists after completion (markdown not rendering when done)

`clearStepMaps` was calling `resetSubagentAtoms()` at stream end,
which wiped every `subagentProgressByToolCallId` entry. Once reset,
`contentParts.length === 0` and the dialog fell back to rendering the
raw `output` string with plain text — hence the literal `##`/`**` in
the completed-state screenshot. Stopped resetting; the atoms are
bounded per-call (200-event cap) and per-conversation (one per
subagent spawn) so growth matches the rest of the conversation state.
`resetSubagentAtoms` is kept for a future conversation-switch caller.

Also: routed the raw-`output` fallback (older subagent runs recorded
before the event forwarder existed) through the same
`SubagentDialogPart` → `Text` leaf that content parts use, so its
markdown renders the same way.

### Auto-scroll to bottom while running

Added a `scrollRef` on the dialog body and a `useEffect` that pins
`scrollTop = scrollHeight` while the dialog is open AND the subagent
is running. Triggers on `contentParts.length` (new tool calls / part
boundaries) and `events.length` (intra-part deltas) so the cursor
tracks text streaming. Disabled post-completion so re-opening a
finished run doesn't yank to the bottom.

### Wider dialog

Went from `max-w-2xl` (42rem / 672px — too cramped on maximized
laptop windows) to `w-[min(95vw,64rem)] max-w-[min(95vw,64rem)]`.
Narrow on phones, scales up to 64rem on desktop, always leaves a bit
of margin from the viewport edge. Bumped `max-h-[65vh]` on the scroll
area to give the extra width room to breathe vertically too.

### Tests

- `subagentContent.test.ts` — 21 pass (2 new ordering regressions).
- `useStepHandler.spec.ts` — 49 pass (1 updated to assert atoms are
  *preserved* on clearStepMaps).
- `SubagentCall.test.tsx` — 9 pass (unchanged; aggregator-level tests
  cover the ordering).

* 🪆 feat: persist subagent_content via SDK createContentAggregator

Per-request map of createContentAggregator instances keyed by the
parent's tool_call_id. ON_SUBAGENT_UPDATE handler feeds each event
into the matching aggregator (phase → GraphEvent mapping); AgentClient
harvests contentParts onto the subagent tool_call at message save so
the child's reasoning / tool calls / final text survive a page refresh.

Reusing the SDK's battle-tested aggregator instead of a bespoke one
keeps the persisted shape identical to the parent graph's output and
drops ~100 lines of custom aggregation code.

* 🪆 fix: incremental subagent aggregation + dialog render parity

**Disappearing tool_calls**: the Recoil atom trimmed events to a 200-long
rolling window, so verbose subagents could shed the `run_step` that
originally created a tool_call part — rebuilding content from the trimmed
window then produced only the surviving text/reasoning. Fix: fold each
envelope into `contentParts` incrementally in the atom as it arrives
(new `foldSubagentEvent` + cursor state). Event trim window now affects
only the ticker, never the dialog.

**Render parity**: dialog now applies `groupSequentialToolCalls` and
renders single parts through `Container` + grouped batches through
`ToolCallGroup` — same spacing and "Used N tools" collapsing the main
message view uses.

**Width**: `min(96vw, 80rem)` — wider on big screens, still responsive.

**Labels**: "Subagent: X" is jargon. Named subagents render as
`Running "{name}" agent` / `Ran "{name}" agent` (past tense on
completion); self-spawns use `Running subtask` / `Ran subtask` since
`Running "self" agent` reads badly.

* 🪆 polish: subagent dialog parity + agent avatar in header

**Labels**: drop "subtask" framing. Self-spawn shows `Running agent` /
`Ran agent` (past tense on completion); named subagents stay
`Running "X" agent` / `Ran "X" agent`.

**Dialog render parity**: stop wrapping every part in `Container`.
TEXT keeps its `Container` (gap-3 + `mt-5` sibling margin), THINK and
TOOL_CALL render bare so their own wrappers set the full-column width
the regular message view gives them — matches the main `<Part>` dispatch.
Outer scroll region now uses `px-4 py-3` padding and a
`max-w-full flex-grow flex-col gap-0` inner wrapper, mirroring the
`MessageParts` container the main conversation uses.

**Avatar**: header icon now renders the subagent's configured avatar
via `MessageIcon` when `useAgentsMapContext()` has the child agent,
falling back to the `Users` SVG (which keeps its running-state pulse).
Same icon-left-of-label pattern the tool UI uses.

* 🪆 polish: subagent group label, ticker throttle + tail-ellipsis, scroll button

**Grouped label**: ToolCallGroup now detects all-subagent batches and
labels them "Running N agents" / "Ran N agents" instead of "Used N
tools". Mixed batches keep the existing label. The tool-name summary
is suppressed for all-subagent groups (every entry dedupes to
"subagent", which adds nothing).

**Ticker width + tail-ellipsis**: raise the preview cap to 300 chars so
wide containers aren't half-empty, and flip the ticker `<li>` to
`dir="rtl"` so `text-overflow: ellipsis` clips the *oldest* characters
(visually the left edge) — the newest tokens stay pinned to the right
regardless of container width. Bidi lays out the Latin text LTR
internally, the rtl only affects which side gets the ellipsis.

**Throttle**: `useThrottledValue` hook (trailing-edge, 1.2s) smooths
the live `Writing: …` preview so tokens no longer strobe past the
eye faster than they can be read. Ref-based internals (not `useState`)
avoid infinite-update loops when the upstream value is a new-reference
each render; `NEGATIVE_INFINITY` sentinel ensures the very first value
passes through synchronously so tests and first paint aren't delayed.

**Scroll-to-bottom**: dialog tracks `isAtBottom` with a 120px
threshold; auto-scroll only engages when the user is already following
along, and a persistent jump-to-latest button appears whenever they
scroll up — no more fighting the auto-scroll to read back.

* 🪆 polish: snappier ticker, prefix-safe labels, agents icon, readable lines

**Ticker lines are now incrementally aggregated in the atom** — same
pattern as contentParts. The raw-events rolling window is gone; event
volume no longer caps what the ticker can display. Verbose subagents
that used to drop early tool_call lines out of the window now keep the
full 3-line history (using_tool, tool_complete, writing).

**Discriminated-union ticker lines** split a constant prefix (e.g.
"Writing:") from a tail-truncatable body. The prefix lives in a
`shrink-0` span so it never gets clipped when the body overflows; the
body uses `dir="rtl"` only on itself — scoped so non-streaming lines
(e.g. "Waiting for first update…") can't get their trailing ellipsis
flipped by bidi.

**Content-aware throttle**: 800ms interval (down from 1200ms), skipped
entirely while the live buffer is below 120 chars. Early tokens now
appear immediately — no more "Reasoning: I" sitting blank for a full
second before the next heartbeat. Once the preview is long enough to
fill the container, throttling kicks in at the tighter interval.

**Header label** is now a constant verb + optional muted sub-label.
Base reads "Running agent" / "Ran agent" / "Cancelled agent" / "Agent
errored" for every subagent; named subagents get the configured agent
name rendered to the right in secondary text (self-spawns and
unresolved names omit it — "Running self agent" is nonsense).

**ToolCallGroup** now detects `allSubagents` and swaps `StackedToolIcons`
for a single `Users` glyph — otherwise the group header shows a wrench
("tool") icon next to "Ran 5 agents", which reads wrong.

* 🪆 feat: delimiter-aware tool labels in ticker + full-width tool lines

New shared `parseToolName` helper in `client/src/utils/toolLabels.ts`
— single source of truth for splitting `<tool>_mcp_<server>` ids and
mapping native tool names (web_search, execute_code, …) to their
friendly translation keys. `ToolCallGroup` drops its inline copy and
pulls from this helper.

Ticker tool lines now use the shared parser + a new `ToolIdentifier`
sub-renderer so the live log reads like the main tool UI:

  - MCP tool  → `<server> · <code-badge:tool>` (e.g. "github · `search_code`")
  - Native   → friendly name from `TOOL_FRIENDLY_NAME_KEYS`
  - Unknown  → bare `<code>` badge of the raw id

The `using_tool` / `tool_complete` rows now render with a
`flex w-full items-baseline gap-1 overflow-hidden` layout matching
the writing/reasoning rows — they take the full container width
instead of collapsing to content size. Output snippets on
`tool_complete` get the same tail-side `dir="rtl"` ellipsis so the
newest characters stay flush-right when the container is narrow.

Dropped the now-unused template i18n keys
(`com_ui_subagent_ticker_using_with_args`,
`com_ui_subagent_ticker_tool_complete`,
`com_ui_subagent_ticker_tool_output`) in favor of tokens the JSX
composes structurally. Only English is touched per the project rule;
other locales follow externally.

* 🪆 fix: dialog scroll button + auto-scroll during streaming deltas

Two race/trigger bugs in the dialog's scroll behavior:

**Button never showed**: `addEventListener('scroll', …)` in a
`useEffect` ran before Radix's portal had actually committed the
scroll container, so `scrollRef.current` was still null — the listener
never attached, `isAtBottom` stayed stuck at its initial `true`, and
the jump-to-latest button was never rendered. Swap to React's
`onScroll` prop on the element itself so the handler wires up as part
of DOM commit, not a post-commit effect.

**Auto-scroll stalled during text streaming**: the pin-to-bottom
effect only re-fired on `contentParts.length` changes. Message/reasoning
deltas extend the last TEXT/THINK part's `.text` without changing the
array length — so the view would drift up as tokens piled in and
never catch back up. Replace the length-dep effect with a
`ResizeObserver` on the inner content div; every height change (new
part or in-place growth) triggers a scroll-pin when the user is still
at the bottom.

* 🪆 fix: drop leading ellipsis from ticker body

truncatePreview was prepending ... to the tail when the buffer
exceeded 300 chars. The component's CSS already produces a left-side
ellipsis for overflow via dir=rtl + text-overflow: ellipsis — stacking
a data-level ellipsis on top renders a stray dot character right after
the Writing: / Reasoning: label (Writing: .Sure!), which looks like a
typo to the reader.

Data now returns just the last 300 chars when truncating; CSS handles
the visual cue whenever the body actually overflows its container.

* 🪆 fix: Codex review — subagent isolation + concurrent-safe throttle

Three findings from the @codex review pass, all valid:

**P1 — buildAgentInput leaks parent discovered-tool state into subagent
children.** `buildAgentInput` mutates `agent.toolRegistry`
(`overrideDeferLoadingForDiscoveredTools` flips `defer_loading:true→false`
on tools the parent previously searched for) and appends those tools'
definitions to the returned `toolDefinitions` before the function
returns. `buildSubagentConfigs` was clearing the reported
`initialSummary` / `discoveredTools` fields on the returned
AgentInputs, but that happened post-return — the registry writes and
extra tool definitions persisted on the child, silently defeating
context isolation and inflating the child's prompt.

Fix: `buildAgentInput` now takes an `isSubagent` flag that gates the
registry-mutation block and omits `initialSummary` /
`discoveredTools` at the source. `buildSubagentConfigs` passes
`{ isSubagent: true }` for every explicit child; no post-hoc cleanup
needed.

**P2 — ToolCallGroup labels a finished subagent group as still
running when the child returned no output.** `getToolMeta` computed
`hasOutput` as `!!tc.output`, which is `false` for a completed
subagent that returned empty text (the UI already has an "empty
result" fallback for that case). `allCompleted` would stay `false`
and the group header stuck on "Running N agents" forever.

Fix: treat `tc.progress === 1` as completion too — progress is the
authoritative lifecycle signal, output is just content.

**P2 — useThrottledValue schedules `setTimeout` during render.**
Discarded renders under Strict Mode / Concurrent rendering would
leave orphan timers firing against stale trees.

Fix: move `setTimeout` into a `useEffect` keyed on `[value, intervalMs,
enabled]`. Render-time still mutates refs (idempotent), but timer
scheduling lives post-commit. Cleanup on unmount and on passthrough
transitions is preserved.

* 🪆 fix: Codex P2 — wipe subagent atoms on conversation switch

`clearStepMaps()` intentionally doesn't reset `subagentProgressByToolCallId`
so a user can reopen a completed subagent's dialog mid-conversation,
but `resetSubagentAtoms` was defined and never exposed / called —
so each completed run's aggregated `contentParts` + `tickerState`
stayed resident in the `atomFamily` for the whole app session.
Unbounded growth across multi-conversation sessions.

Expose `resetSubagentAtoms` from `useStepHandler` and fire it from
`useEventHandlers` whenever the URL's `conversationId` changes.
That's the correct cleanup boundary: historical subagent dialogs
rehydrate from persisted `subagent_content` on each `tool_call` at
message-save time, so wiping live atoms on switch doesn't lose any
viewable history — it just releases per-tool-call state that the
old conversation's components no longer subscribe to.

* 🪆 fix: Codex round 3 — subagent registry isolation + post-run label

Two more valid findings.

**P1 — parent-order registry mutation leaks into subagent inputs.**
`overrideDeferLoadingForDiscoveredTools` mutates `agent.toolRegistry`
in place (the Map *and* the LCTool objects inside it). When an agent
appears both as a handoff target (normal graph node) AND an explicit
subagent child, a subagent build that ran before the parent's build
captures a reference to the same registry — the parent's later mutation
leaks through to the child.

Fix: for subagent children (`isSubagent`), clone the `toolRegistry`
Map and shallow-clone each LCTool inside before returning the inputs.
`defer_loading` flips on parent-graph registry mutations can't
propagate across the clone boundary. `toolDefinitions` also gets a
shallow-copy pass so the same isolation holds for definitions the
child carries directly.

**P2 — "Running N agents" label stuck after cancel/error.**
ToolCallGroup's all-subagent label was gated only on `allCompleted`,
which requires every child to have `hasOutput || progress === 1`.
A subagent that gets cancelled (stream ends, no `stop` phase, no
output) never satisfies that — so even after `isSubmitting` flips
false, the header stays on "Running N agents" while each individual
card correctly shows "Cancelled agent".

Fix: derive a `subagentsDone` flag as `allCompleted || !isSubmitting`
and gate the past-tense label on that. Matches the tri-state each
SubagentCall card already resolves (finished / cancelled / running).

* 🪆 fix: Codex P2 — ACL-check subagents.agent_ids on create/update

Codex flagged that `subagents.agent_ids` was accepted as arbitrary
strings on the create/update routes while `edges` got a
`validateEdgeAgentAccess` pass — so users could save subagent
references to agents they can't VIEW. At runtime `initializeClient`'s
`processAgent` ACL gate silently drops those, so the persisted
configuration and the actual behavior diverged in a way that is
difficult to diagnose.

Refactor: extract the id-set → unauthorized-ids check into a shared
`collectUnauthorizedAgentIds`, wrap it with a dedicated
`validateSubagentAccess`, and plumb the same 403-on-failure response
the edge path already returns. Applied on both POST /agents and
PATCH /agents/:id.

* 🪆 fix: Codex round 5 — ACL-disable escape hatch + ticker order

Two valid findings.

**P1 — can't disable subagents after losing access to a child.**
The subagent ACL check ran on every create/update that echoed back
the `agent_ids` list, even when the user was explicitly disabling
the feature. The UI keeps the list intact when toggling `enabled:
false`, so a user who subsequently lost VIEW on any child would be
locked in a 403 loop — every edit (including the one that turns
subagents off) bounces.

Fix: gate the ACL check on `subagents.enabled !== false` at both
the POST /agents and PATCH /agents/:id handlers. Empty list stays
a no-op. Disabling the feature is always permitted.

**P2 — ticker fold merges out-of-order previews across delta
switches.** `foldSubagentEventIntoTicker` carried `textLineIdx` /
`thinkLineIdx` across a reasoning → text → reasoning transition,
so the second reasoning chunk appended to the original reasoning
line instead of starting a new chronological one.

Fix: close the opposite buffer + cursor when a delta-type switch
is detected (same rule the content-parts reducer already applies).
Added a regression test.

* 🪆 fix: Codex round 6 — preserve mid-stream atoms + honor sequential suppression

Two valid findings.

**P2 — atom reset fires on initial chat URL assignment.**
`useEventHandlers` initialized `lastConversationIdRef` from the URL's
current `paramId`, then reset subagent atoms whenever the ref and
`paramId` disagreed. For a brand-new conversation the URL stamp goes
from `undefined → "abc123"` while the first response is still
streaming, which used to drop subagent ticker/content state mid-run
and leave dialogs missing earlier updates.

Fix: only reset when *both* the old and new IDs are non-null and
differ — i.e. a user-initiated switch between two established
conversations. The initial assignment passes through without
clearing.

**P2 — ON_SUBAGENT_UPDATE bypassed `hide_sequential_outputs`.**
Every other streaming handler in `callbacks.js`
(`ON_RUN_STEP`, `ON_MESSAGE_DELTA`, etc.) gates emission on
`checkIfLastAgent` + `metadata?.hide_sequential_outputs`, but the
subagent forwarder did an unconditional `emitEvent` — so intermediate
agents in a sequential chain were leaking their children's activity
to the client even when the chain was configured to suppress
intermediates.

Fix: accept `metadata` and apply the same `isLastAgent ||
!hide_sequential_outputs` gate. Aggregation still runs regardless of
visibility (persistence + dialog depend on it); only the SSE forward
is suppressed.

* 🪆 fix: Codex P2 — gate subagent ACL check on endpoint capability

`validateSubagentAccess` ran on every create/update where
`subagents.enabled !== false`, regardless of the endpoint-level
`subagents` capability. When the capability is off at the appConfig
level, `initializeClient` already strips the `subagents` block at
runtime — so persisted `agent_ids` are inert — but the validation
could still 403 on a legacy record whose referenced child is no
longer viewable, blocking unrelated edits.

Fix: add `isSubagentsCapabilityEnabled(req)` that reads the agents
endpoint's capabilities from `req.config` and gate both the create
and update ACL checks on it. Capability-off environments can update
agents with stale `subagents` data freely; capability-on keeps the
full ACL protection.

* 🪆 fix: Codex P2 — reset subagent atoms on id→null navigation too

Previous guard (both-established) skipped the reset whenever
`paramId` became null/undefined, so navigating from an existing
chat to a "new chat" route left stale subagent progress resident
in the `atomFamily` until the user picked a specific different
chat.

Swap the both-established check for a one-time flag: skip only the
very first `undefined → id` transition (the brand-new-chat URL
stamp that happens mid-stream), then reset on any subsequent
change — id→id, id→null, null→id-after-reset. If the user started
on an established chat the flag is true at mount, so the guard is
a no-op and every navigation resets normally.

* 🪆 fix: Codex round 9 — subagent persistence gate + handoff children

Two valid findings.

**P1 — hide_sequential_outputs also gates persistence.**
The previous fix gated the SSE forward on
`isLastAgent || !hide_sequential_outputs` but still ran the
per-tool-call `createContentAggregator` aggregation unconditionally.
`finalizeSubagentContent` would then attach the hidden intermediate
agent's child reasoning / tool output to the saved message, so a
page refresh could reveal activity that was intentionally suppressed
live. Move the visibility gate to the top of the handler — hidden
agents now skip both aggregation and emission, so
"hide_sequential_outputs" is a consistent "don't record" rule for
subagent traces.

**P2 — handoff agents' explicit subagents were silently dropped.**
`initializeClient` only resolved `subagentAgentConfigs` for the
primary config, so an agent used via handoff that had its own
`subagents.agent_ids` saved in the builder would get self-spawn
only; every explicit child was quietly ignored, creating a
saved-config / runtime mismatch the user couldn't diagnose.

Extract the resolution into a shared `loadSubagentsFor(config)`
helper and invoke it for the primary and every handoff agent in
`agentConfigs`. The `edgeAgentIds` precomputation stays outside
the helper (it's loop-invariant). Capability-off shortcuts return
empty early so the existing strip-on-capability-off path still
holds.

* 🪆 fix: Codex P2 — recursive subagent build for multi-level delegation

Previously only the outer `agents[]` loop attached `subagentConfigs`
to its inputs, so a child used as a subagent (invoked via the
`subagent` tool) lost every explicit spawn target of its own. A
user-valid configuration like A → B → C would only run the top
layer; B could never actually delegate to C from inside A's run.

Recursively build `subagentConfigs` for each child inside
`buildSubagentConfigs`, passing the child's freshly-constructed
`childInputs` down so its own `subagents.enabled` children get
resolved too. Added cycle protection via an `ancestors` Set — a
configuration like A → B → A is safely cut off at the second
encounter of A rather than recursing forever (the existing
`child.id === agent.id` guard already prevents the direct self-loop).

* 🪆 fix: Codex P2 — reset subagent atoms on useEventHandlers unmount

The effect that resets subagent atoms only fired on `paramId` change,
so unmounting the chat container (route change away from /c) never
flushed the atoms. `knownSubagentAtomKeys` lives in a ref inside
`useStepHandler` — once the hook unmounts the ref is gone, so a
subsequent remount can't clean atoms it never registered.

Added a second `useEffect` that only runs cleanup on unmount (empty
deps aside from the stable `resetSubagentAtoms` callback). Keeps
`atomFamily` bounded across full route teardowns too.

* 🪆 fix: Codex round 13 — cyclic subagent guard + prefer persisted

Two valid findings.

**P1 — cyclic subagent ref reloads the primary.** A configuration
like `A ↔ B` (B lists A as its own subagent) would send
`loadSubagentsFor` down a path that couldn't find A in
`agentConfigs` (the primary isn't stored there), so it called
`processAgent(A)` a second time. That inserts a fresh config for
the primary id, which downstream duplicates in
`[primary, ...agentConfigs.values()]` and can replace the primary's
tool context with the reloaded copy.

Fix: short-circuit when a subagent ref points back at
`primaryConfig.id` — reuse the already-loaded primary config.
Primary is always an edge id so no pruning bookkeeping needed.

**P2 — live atom preferred over canonical persisted trace.** The
dialog picked `progress.contentParts` ahead of
`persistedContent`, but the Recoil bucket is best-effort — after
a disconnect/reconnect it can be stale or partial. The server's
`subagent_content` on the `tool_call` is the canonical record
refreshed on sync. Preferring live could hide completed
tool/reasoning history that was actually persisted.

Fix: flip the preference order. Persisted wins when it's
non-empty; live covers the mid-stream window (before the parent
message saves, persisted is empty) and the older-runs fallback.
Updated the test that enforced the old order to lock the new
semantics in (separate mid-stream live-fallback assertion kept).

* 🪆 fix: Codex P2 — subagent atom reset rule simplified to 'leaving established id'

The `hasEstablishedConversationRef` + check for initial undefined→id
covered the first navigation but missed the equivalent mid-stream
URL stamp when a user goes from an existing chat to a new chat and
sends a message there (`id → null → newId`). The null → newId
transition was still hitting the reset branch and wiping the
in-flight subagent ticker/content for that first turn.

Simpler rule: only reset when the PREVIOUS paramId is an established
id. Every transition AWAY from an established chat clears (id→id2,
id→null, id→undefined); every transition FROM null/undefined passes
through (initial mount, new-chat URL stamp mid-stream). Drop the
`hasEstablishedConversationRef` machinery in favor of that single
condition.

* 🪆 fix: Codex P2 — match runtime's strict subagent enable check in ACL

Runtime (`initializeClient` + `run.ts`) treats `subagents?.enabled`
as a truthy predicate — `undefined`, `null`, missing, and `false`
all short-circuit. The ACL gate was using `!== false` which
accepted `undefined` / missing as "enabled" and could 403 a payload
whose subagent tool would be inert at runtime.

Swap both create and update to `enabled === true`. Only a strictly-
enabled payload triggers the ACL check; the disable path (`false`)
still passes through so a user who lost VIEW on a child can still
save the disable edit.

* 🪆 fix: Codex P2 — reject missing subagent references with 400

`validateSubagentAccess` collapsed through `collectUnauthorizedAgentIds`,
which returns an empty list for ids with no DB record — so typos and
references to deleted agents passed validation silently, and
`initializeClient` later dropped them at runtime. Saved config would
then list spawn targets that the backend never honored, a hard-to-
diagnose drift.

Refactor the helper into `classifyAgentReferences(ids, …)` which
returns `{ missing, unauthorized }` separately. `validateEdgeAgentAccess`
keeps its old semantics (missing is intentional — a self-referential
`from` names the agent being created). `validateSubagentReferences`
surfaces both buckets so the create/update handlers can 400 on
missing and 403 on unauthorized with distinct error messages and
`agent_ids` lists.

* 🪆 polish: tighten subagent dialog grid gap to gap-2

OGDialogContent's grid default is `gap-4`, which renders the title,
description, and scroll area as three visually separated panels.
Drop to `gap-2` so they read as one block.

* 🪆 polish: swap Subagents above Handoffs in Advanced panel

Subagents is the more common knob users reach for, so show it first.
Handoffs keep the same Controller wiring, just move below.
2026-04-25 04:02:01 -04:00
Danny Avila
dd72b7b17e
🔄 chore: Consolidate agent model imports across middleware and tests from rebase
- Updated imports for `createAgent` and `getAgent` to streamline access from a unified `~/models` path.
- Enhanced test files to reflect the new import structure, ensuring consistency and maintainability across the codebase.
- Improved clarity by removing redundant imports and aligning with the latest model updates.
2026-03-21 14:28:55 -04:00
Danny Avila
bcf45519bd
🪪 fix: Enforce VIEW ACL on Agent Edge References at Write and Runtime (#12246)
* 🛡️ fix: Enforce ACL checks on handoff edge and added-convo agent loading

Edge-linked agents and added-convo agents were fetched by ID via
getAgent without verifying the requesting user's access permissions.
This allowed an authenticated user to reference another user's private
agent in edges or addedConvo and have it initialized at runtime.

Add checkPermission(VIEW) gate in processAgent before initializing
any handoff agent, and in processAddedConvo for non-ephemeral added
agents. Unauthorized agents are logged and added to skippedAgentIds
so orphaned-edge filtering removes them cleanly.

* 🛡️ fix: Validate edge agent access at agent create/update time

Reject agent create/update requests that reference agents in edges
the requesting user cannot VIEW. This provides early feedback and
prevents storing unauthorized agent references as defense-in-depth
alongside the runtime ACL gate in processAgent.

Add collectEdgeAgentIds utility to extract all unique agent IDs from
an edge array, and validateEdgeAgentAccess helper in the v1 handler.

* 🧪 test: Improve ACL gate test coverage and correctness

- Add processAgent ACL gate tests for initializeClient (skip/allow handoff agents)
- Fix addedConvo.spec.js to mock loadAddedAgent directly instead of getAgent
- Seed permMap with ownedAgent VIEW bits in v1.spec.js update-403 test

* 🧹 chore: Remove redundant addedConvo ACL gate (now in middleware)

PR #12243 moved the addedConvo agent ACL check upstream into
canAccessAgentFromBody middleware, making the runtime check in
processAddedConvo and its spec redundant.

* 🧪 test: Rewrite processAgent ACL test with real DB and minimal mocking

Replace heavy mock-based test (12 mocks, Providers.XAI crash) with
MongoMemoryServer-backed integration test that exercises real getAgent,
checkPermission, and AclEntry — only external I/O (initializeAgent,
ToolService, AgentClient) remains mocked. Load edge utilities directly
from packages/api/src/agents/edges to sidestep the config.ts barrel.

* 🧪 fix: Use requireActual spread for @librechat/agents and @librechat/api mocks

The Providers.XAI crash was caused by mocking @librechat/agents with
a minimal replacement object, breaking the @librechat/api initialization
chain. Match the established pattern from client.test.js and
recordCollectedUsage.spec.js: spread jest.requireActual for both
packages, overriding only the functions under test.
2026-03-15 18:08:57 -04:00