* 🪝 feat: HITL Tool Approval Scaffolding
Adds the foundational types, job-state, config schema, and policy module
for human-in-the-loop tool approval. Purely additive — no behavior change
on existing runs. Lands ahead of the agents-SDK interrupt/checkpointer
integration so both tracks can land independently.
- LangChain HumanInterrupt-shaped types in `Agents.*` namespace
(`HumanInterruptPayload`, `ToolApprovalRequest`, `ToolReviewConfig`,
`PendingAction`, `ToolApprovalResolution`); `ToolCall`/`ToolCallDelta`
gain an optional `approval` field.
- New `requires_action` job status (non-terminal) plus `pendingAction`
field on `SerializableJobData` and `GenerationJobMetadata`. Both stores
treat the status as paused-but-alive; Redis `updateJob` has explicit
`requires_action`/`running` transition branches that refresh the hash
TTL, manage the `runningJobs` set, and `HDEL pendingAction` on resume.
Both stores include `requires_action` in `getActiveJobIdsByUser`.
- `GenerationJobManager` gains `markRequiresAction`, `getPendingAction`,
`clearPendingAction`; `getJobCountByStatus` aggregates the new status.
- `endpoints.agents.toolApproval` config (`default`/`required`/`excluded`)
and a policy module exporting `decideToolApproval`, `requiresApproval`,
and `buildPendingAction` (the LangChain-shaped payload builder).
- 20 unit tests covering policy resolution and the manager lifecycle.
* 🧭 refactor: Align HITL Surface with Agents SDK Permissions Model
Reshapes Slice A on top of the agents SDK's now-landed HITL surface
(`createToolPolicyHook`, discriminated `HumanInterruptPayload`, `'bypass'`
mode naming). Host stops reimplementing evaluation logic and becomes a
config mapper + payload wrapper.
Schema (data-provider):
- `toolApproval` shape now mirrors SDK `ToolPolicyConfig` 1:1:
`mode: 'default' | 'dontAsk' | 'bypass'`, plus `allow` / `deny` / `ask`
glob lists and an optional `reason` template. `enabled` is the
LibreChat-only admin kill switch.
- `'bypass'` (not `'bypassPermissions'`) — matches the SDK's surface.
Types (`Agents.*` namespace):
- `HumanInterruptType` extended to `'tool_approval' | 'ask_user_question'`.
- `HumanInterruptPayload` is now a discriminated union — `tool_approval`
carries `action_requests` + `review_configs`; `ask_user_question`
carries a free-form question with optional curated options.
- New: `AskUserQuestionRequest`, `AskUserQuestionOption`,
`AskUserQuestionResolution`.
- `ToolApprovalDecision` (string union) renamed to
`ToolApprovalDecisionType` to free the `Decision` name for the SDK's
discriminated object union later.
- `ToolApprovalResolution` gains `reason?` and `scope?: 'once' | 'session'
| 'always'` so route signatures stabilize before persistence lands.
Policy module (`packages/api/src/agents/hitl/policy.ts`):
- Drop `decideToolApproval` / `requiresApproval` / `ToolRef` — the SDK's
`createToolPolicyHook` handles full evaluation
(`deny → bypass → allow → ask → dontAsk → fallthrough(ask)`).
- Add `isHITLEnabled(policy)` — the kill-switch predicate that gates the
SDK's `humanInTheLoop: { enabled: false }` opt-out in Slice B.
- Add `mapToolApprovalPolicy(policy)` — strips `enabled`, returns a
`ToolPolicyConfig` to feed `createToolPolicyHook`. Structural mirror of
the SDK type so this compiles before the SDK upgrade ships.
- Reshape `buildPendingAction(payload, ctx)` to wrap any
`HumanInterruptPayload` with job context — accepts SDK output directly.
- Add `buildToolApprovalPayload(...)` and `buildAskUserQuestionPayload(...)`
helpers for synthesizing payloads in tests / pre-SDK flows.
Tests:
- 22 new unit tests covering the mapper, predicate, and payload builders;
20 → 27 total pass across policy + manager-lifecycle suites.
* 🪢 chore: Import ToolPolicyConfig From `@librechat/agents`
The SDK type now ships in 3.1.77 (already pinned on `dev`), so the
structural mirror in `policy.ts` is redundant. Drop the local interface
and import directly so future SDK changes to `ToolPolicyConfig` propagate
without our `mapToolApprovalPolicy` going stale.
* 🔑 fix: Carry tool_call_id On ToolReviewConfig (HITL)
`ToolReviewConfig` was joining with `ToolApprovalRequest` by position only.
That breaks the moment a single batch contains the same tool called twice
(e.g. a model fanning out parallel `mcp:server:search` calls): the UI can't
tell which review config applies to which action request once it filters
or reorders.
Mirrors the SDK's `ToolApprovalReviewConfig` shape — `tool_call_id` is the
join key, `action_name` is retained for display only.
Also: drop a JSDoc warning on `isHITLEnabled` so a future contributor doesn't
wire `humanInTheLoop: { enabled: true }` without supplying a host
checkpointer — the SDK's `MemorySaver` fallback is process-local and
silently breaks resume across worker hops.
- `Agents.ToolReviewConfig` adds `tool_call_id: string`
- `buildToolApprovalPayload` populates `tool_call_id` per review config
- New test covers the duplicate-tool batch case (two parallel calls to
the same tool); 27 → 28 tests
* fix: Address HITL review findings
* fix: Refresh paused HITL Redis state
* test: Stabilize HITL abort fallback specs
* 🎨 style: Sort imports to satisfy dev lint gate (HITL)
* 🏛️ refactor: Deepen HITL approval lifecycle into one race-safe seam
Architecture-review candidate #1 (+ #4). The requires_action lifecycle was
three shallow pass-throughs over updateJob with the legal transitions
smeared across JSDoc, the JobStatus union, and each store adapter — and the
resume transition was NOT race-safe: the Redis lua checked existence, not
status, so two concurrent approval submits both drove the run (re-executing
tools / double-billing).
- IJobStore.transitionStatus: atomic compare-and-set status transition that
only fires if the job is currently `from`. InMemory: sync compare. Redis:
single-node lua with a status guard (cluster best-effort, matching the
existing posture); reconciles membership sets + TTLs to `to`.
- New ApprovalLifecycle module: pause / peek / resolve / expire — guarded,
race-safe transitions behind one interface. resolve() returns true to
exactly one concurrent caller; the previously-undefined
requires_action → aborted expiry edge is now explicit; peek treats
past-expiresAt as gone (lazy expiry).
- GenerationJobManager exposes `approvals` and delegates; the three shallow
methods (mark/get/clearPendingAction) are removed — callers cross the deep
interface.
- #4: typeContract.spec asserts the SDK <-> data-provider HITL types stay
compatible (fails the build on drift); RedisJobStore validates the
pendingAction shape on deserialize instead of a bare JSON.parse (defends
the cold-resume path against malformed/stale records).
- Tests rewritten at the deep interface: double-resolve wins once,
pause-on-terminal rejected, explicit expiry, lazy-expiry peek.
No Slice B wiring — this deepens the existing scaffolding so the future
resume route and run seam are born crossing one race-safe interface.
* 🛡️ fix: Address Codex review on the HITL approval lifecycle
Seven findings on the lifecycle deepening (089ba09f9), all valid:
- F3 actionId guard: resolve/expire take an expectedActionId; pause records a
flat `pendingActionId` the atomic CAS guards on, so a stale decision can't
resume a job that has since paused for a different action.
- F4 cluster single-winner: transitionStatus now decides the winner with an
atomic CAS on the single-slot job hash (one Lua, cluster-safe), then
reconciles cross-slot membership sets — two concurrent resolves can no
longer both win on Redis Cluster.
- F1 resume reaping: resolve refreshes `lastActiveAt`; both stores' stale-
running failsafes key off it, so a long-paused approval isn't reaped right
after resuming.
- F2 expire completedAt: expire writes completedAt so terminal cleanup
reclaims the job (InMemory only cleans terminal jobs with completedAt set).
- F5 facade: buildJobFacade copies pendingAction into metadata so status/
resume routes can render the prompt.
- F6 resume metadata: PendingAction + buildPendingAction carry the SDK
interruptId/threadId needed to rebuild Command({ resume }) cross-process.
- F7 mirror: data-provider AskUserQuestionRequest gains optional description.
Tests added at the interface: stale-actionId resolve rejected, expire sets
completedAt. tsc + lint clean; policy + type-contract specs pass.
* 🛡️ fix: Address Codex round 2 on the HITL Redis adapter
Five P2 findings on abf4b86291, all valid Redis-adapter consequences of
round 1:
- G1 terminal cleanup on expiry: transitionStatus's terminal path now runs
the same chunk/run-step/userJobs cleanup as updateJob (extracted into a
shared applyTerminalContentCleanup). Expired approvals no longer leave
Redis stream contents around for the full running TTL.
- G2 pause via updateJob mirrors pendingActionId, so a pause through the
generic path carries the flat field the stale-decision guard compares.
- G3 resume via updateJob refreshes lastActiveAt (and clears pendingActionId),
matching transitionStatus so a long-paused job isn't reaped post-resume.
- G4 getActiveJobIdsByUser excludes a requires_action job whose pendingAction
is past expiry (both stores), via shared isPendingActionExpired — the client
stops polling an expired prompt.
- G5 createJob clears stale pendingAction/pendingActionId/lastActiveAt on a
reused streamId, so a fresh run never exposes a prior run's approval metadata
and cleanup keys off the new createdAt.
Tests added: expired pending-approval excluded from the active set. tsc +
lint clean; policy + type-contract specs pass.
* 🛡️ fix: Address Codex round 3 — approval expiry lifecycle completeness
Three P2 findings on 780833d908, all valid:
- H1 status consistency: /chat/status now treats a non-expired
requires_action job as active (matching /chat/active), so a client
refreshing while an approval is pending resumes/subscribes instead of
treating the run as finished and stranding it.
- H2 active expiry: cleanup now finalizes past-expiry requires_action jobs
(→ aborted) in both stores instead of only filtering them from the active
list — an expired prompt no longer lingers resident until key TTL. Redis
routes through transitionStatus (terminal content cleanup); in-memory marks
terminal + reclaims.
- H3 resumed liveness: in-memory stale-running check uses
max(lastActivity, lastActiveAt, createdAt), so a just-resumed job isn't
reaped on a stale per-chunk lastActivity entry before the next chunk.
Test added: in-memory cleanup finalizes + reclaims a past-expiry approval.
tsc + lint clean; policy + type-contract specs pass.
* 🛡️ fix: Address Codex round 4 — paused-job edge cases across the stack
Five P2 findings on 4324a4e776, all valid:
- I1 message validation: validateMessageReq's active-job read bypass now
accepts a live requires_action job, so a new-conversation run that pauses
before its final save can recover the prompt instead of 404ing.
- I2 expire targets the observed record: resolve()'s expired path passes
`expectedActionId ?? job.pendingAction.actionId`, so a concurrent
resume+re-pause can't let expire abort a different action.
- I3 stale/malformed prompts: new isPendingActionStale (missing OR expired)
drives active-listing exclusion + cleanup expiry in both stores, and the
status route + middleware require a live pendingAction — a requires_action
job whose pendingAction was dropped on deserialize no longer reads active.
- I4 in-memory parity: InMemory updateJob mirrors pendingActionId on pause and
clears it + refreshes lastActiveAt on resume (matching RedisJobStore), so a
pause via the generic path is still resolvable by actionId.
- I5 long approval windows: paused-job live TTL (job/chunks/run-steps) now
covers pendingAction.expiresAt + grace (pauseTtlSeconds), on both the
transitionStatus and updateJob pause paths, so Redis can't evict a paused
job before its decision window closes.
tsc + lint clean; policy + type-contract specs pass.
* 🛡️ fix: Codex round 5 — refuse unresolvable resolves; expose pending action
Two of three findings on c8abd826e1 (the third deferred to Slice B):
- J3 resolve() refuses a requires_action job that has lost its pendingAction
(e.g. a malformed record dropped on deserialize): it expires/finalizes the
job instead of driving a resumed run with no reviewed interrupt payload —
consistent with how active-listing + cleanup already treat a stale prompt.
- J2 /chat/status returns the live pendingAction for a paused stream, so a
client rebuilding from status (reload / cross-replica) has the action id +
payload to render and submit the prompt, not just "paused".
Deferred (Slice B): J1 — emitting a terminal SSE event on approval expiry so
already-subscribed clients close. The store-level lifecycle can't emit
transport events, and there are no live SSE subscribers to a paused stream
until the Slice B runtime wiring exists; tracked for that work.
tsc + lint clean; policy + type-contract specs pass.
* 🛡️ fix: Codex final round — paused-job TTL + pendingAction in resume contract
Two of three findings on e7d9cf21b6 (third deferred to Slice B):
- K2 paused-job TTL: a paused (requires_action) job no longer inherits the
20-minute running TTL — it uses a dedicated requires_action backstop
(default 24h, configurable) so a no-expiry approval (the buildPendingAction
default), which the API treats as live, isn't evicted by Redis mid-window.
A longer pendingAction.expiresAt still extends beyond the backstop.
- K3 resume contract: pendingAction is now carried on the typed ResumeState
(data-provider) and populated by getResumeState for a live paused job, so a
reloading / cross-replica client can rebuild the prompt from resumeState
(the contract useResumeOnLoad actually reads), not just a loose status field.
Deferred (Slice B): K1 — emit a terminal SSE event on expiry so already-
subscribed clients close. Requires the manager/eventTransport layer (the
store-level lifecycle and cleanup loops have no transport access) and has no
live subscriber until the Slice B subscribe/resume path exists; tracked there.
tsc + lint clean; policy + type-contract specs pass.
* ♻️ refactor: dedup HITL transition path + liveness predicate (arch review)
Two follow-ups from the post-hardening architecture re-review — both pure
dedup, no behavior change:
A — collapse the dual status-transition path. transitionStatus is now the
sole membership-aware transition (running ⇄ requires_action). Removed the
updateJob requires_action/running branches and the now-orphaned
transitionToRequiresAction / transitionToRunning / refreshLiveJobTtls, plus
the per-store pause/resume mirror logic that had to be re-synced into parity
across review rounds (G2/G3/I4/I5). updateJob is back to a plain field
writer + terminal cleanup. The Redis integration tests that drove
updateJob({status}) now drive transitionStatus (the real path).
B — one canonical "is this approval live?" predicate. isPendingActionStale /
isPendingActionExpired are exported from @librechat/api and used by the
stores, ApprovalLifecycle (dropped its private isExpired), the /chat/status
route, and validateMessageReq — replacing 3 inlined re-derivations that were
the drift source behind several review findings.
tsc + lint clean; policy + type-contract specs pass. Redis integration specs
(migrated) are CI-verified.