LibreChat/api/server/middleware
Danny Avila 771b93bf10
🪝 feat: HITL Tool Approval Scaffolding (Slice A) (#12938)
* 🪝 feat: HITL Tool Approval Scaffolding

Adds the foundational types, job-state, config schema, and policy module
for human-in-the-loop tool approval. Purely additive — no behavior change
on existing runs. Lands ahead of the agents-SDK interrupt/checkpointer
integration so both tracks can land independently.

- LangChain HumanInterrupt-shaped types in `Agents.*` namespace
  (`HumanInterruptPayload`, `ToolApprovalRequest`, `ToolReviewConfig`,
  `PendingAction`, `ToolApprovalResolution`); `ToolCall`/`ToolCallDelta`
  gain an optional `approval` field.
- New `requires_action` job status (non-terminal) plus `pendingAction`
  field on `SerializableJobData` and `GenerationJobMetadata`. Both stores
  treat the status as paused-but-alive; Redis `updateJob` has explicit
  `requires_action`/`running` transition branches that refresh the hash
  TTL, manage the `runningJobs` set, and `HDEL pendingAction` on resume.
  Both stores include `requires_action` in `getActiveJobIdsByUser`.
- `GenerationJobManager` gains `markRequiresAction`, `getPendingAction`,
  `clearPendingAction`; `getJobCountByStatus` aggregates the new status.
- `endpoints.agents.toolApproval` config (`default`/`required`/`excluded`)
  and a policy module exporting `decideToolApproval`, `requiresApproval`,
  and `buildPendingAction` (the LangChain-shaped payload builder).
- 20 unit tests covering policy resolution and the manager lifecycle.

* 🧭 refactor: Align HITL Surface with Agents SDK Permissions Model

Reshapes Slice A on top of the agents SDK's now-landed HITL surface
(`createToolPolicyHook`, discriminated `HumanInterruptPayload`, `'bypass'`
mode naming). Host stops reimplementing evaluation logic and becomes a
config mapper + payload wrapper.

Schema (data-provider):
- `toolApproval` shape now mirrors SDK `ToolPolicyConfig` 1:1:
  `mode: 'default' | 'dontAsk' | 'bypass'`, plus `allow` / `deny` / `ask`
  glob lists and an optional `reason` template. `enabled` is the
  LibreChat-only admin kill switch.
- `'bypass'` (not `'bypassPermissions'`) — matches the SDK's surface.

Types (`Agents.*` namespace):
- `HumanInterruptType` extended to `'tool_approval' | 'ask_user_question'`.
- `HumanInterruptPayload` is now a discriminated union — `tool_approval`
  carries `action_requests` + `review_configs`; `ask_user_question`
  carries a free-form question with optional curated options.
- New: `AskUserQuestionRequest`, `AskUserQuestionOption`,
  `AskUserQuestionResolution`.
- `ToolApprovalDecision` (string union) renamed to
  `ToolApprovalDecisionType` to free the `Decision` name for the SDK's
  discriminated object union later.
- `ToolApprovalResolution` gains `reason?` and `scope?: 'once' | 'session'
  | 'always'` so route signatures stabilize before persistence lands.

Policy module (`packages/api/src/agents/hitl/policy.ts`):
- Drop `decideToolApproval` / `requiresApproval` / `ToolRef` — the SDK's
  `createToolPolicyHook` handles full evaluation
  (`deny → bypass → allow → ask → dontAsk → fallthrough(ask)`).
- Add `isHITLEnabled(policy)` — the kill-switch predicate that gates the
  SDK's `humanInTheLoop: { enabled: false }` opt-out in Slice B.
- Add `mapToolApprovalPolicy(policy)` — strips `enabled`, returns a
  `ToolPolicyConfig` to feed `createToolPolicyHook`. Structural mirror of
  the SDK type so this compiles before the SDK upgrade ships.
- Reshape `buildPendingAction(payload, ctx)` to wrap any
  `HumanInterruptPayload` with job context — accepts SDK output directly.
- Add `buildToolApprovalPayload(...)` and `buildAskUserQuestionPayload(...)`
  helpers for synthesizing payloads in tests / pre-SDK flows.

Tests:
- 22 new unit tests covering the mapper, predicate, and payload builders;
  20 → 27 total pass across policy + manager-lifecycle suites.

* 🪢 chore: Import ToolPolicyConfig From `@librechat/agents`

The SDK type now ships in 3.1.77 (already pinned on `dev`), so the
structural mirror in `policy.ts` is redundant. Drop the local interface
and import directly so future SDK changes to `ToolPolicyConfig` propagate
without our `mapToolApprovalPolicy` going stale.

* 🔑 fix: Carry tool_call_id On ToolReviewConfig (HITL)

`ToolReviewConfig` was joining with `ToolApprovalRequest` by position only.
That breaks the moment a single batch contains the same tool called twice
(e.g. a model fanning out parallel `mcp:server:search` calls): the UI can't
tell which review config applies to which action request once it filters
or reorders.

Mirrors the SDK's `ToolApprovalReviewConfig` shape — `tool_call_id` is the
join key, `action_name` is retained for display only.

Also: drop a JSDoc warning on `isHITLEnabled` so a future contributor doesn't
wire `humanInTheLoop: { enabled: true }` without supplying a host
checkpointer — the SDK's `MemorySaver` fallback is process-local and
silently breaks resume across worker hops.

- `Agents.ToolReviewConfig` adds `tool_call_id: string`
- `buildToolApprovalPayload` populates `tool_call_id` per review config
- New test covers the duplicate-tool batch case (two parallel calls to
  the same tool); 27 → 28 tests

* fix: Address HITL review findings

* fix: Refresh paused HITL Redis state

* test: Stabilize HITL abort fallback specs

* 🎨 style: Sort imports to satisfy dev lint gate (HITL)

* 🏛️ refactor: Deepen HITL approval lifecycle into one race-safe seam

Architecture-review candidate #1 (+ #4). The requires_action lifecycle was
three shallow pass-throughs over updateJob with the legal transitions
smeared across JSDoc, the JobStatus union, and each store adapter — and the
resume transition was NOT race-safe: the Redis lua checked existence, not
status, so two concurrent approval submits both drove the run (re-executing
tools / double-billing).

- IJobStore.transitionStatus: atomic compare-and-set status transition that
  only fires if the job is currently `from`. InMemory: sync compare. Redis:
  single-node lua with a status guard (cluster best-effort, matching the
  existing posture); reconciles membership sets + TTLs to `to`.
- New ApprovalLifecycle module: pause / peek / resolve / expire — guarded,
  race-safe transitions behind one interface. resolve() returns true to
  exactly one concurrent caller; the previously-undefined
  requires_action → aborted expiry edge is now explicit; peek treats
  past-expiresAt as gone (lazy expiry).
- GenerationJobManager exposes `approvals` and delegates; the three shallow
  methods (mark/get/clearPendingAction) are removed — callers cross the deep
  interface.
- #4: typeContract.spec asserts the SDK <-> data-provider HITL types stay
  compatible (fails the build on drift); RedisJobStore validates the
  pendingAction shape on deserialize instead of a bare JSON.parse (defends
  the cold-resume path against malformed/stale records).
- Tests rewritten at the deep interface: double-resolve wins once,
  pause-on-terminal rejected, explicit expiry, lazy-expiry peek.

No Slice B wiring — this deepens the existing scaffolding so the future
resume route and run seam are born crossing one race-safe interface.

* 🛡️ fix: Address Codex review on the HITL approval lifecycle

Seven findings on the lifecycle deepening (089ba09f9), all valid:

- F3 actionId guard: resolve/expire take an expectedActionId; pause records a
  flat `pendingActionId` the atomic CAS guards on, so a stale decision can't
  resume a job that has since paused for a different action.
- F4 cluster single-winner: transitionStatus now decides the winner with an
  atomic CAS on the single-slot job hash (one Lua, cluster-safe), then
  reconciles cross-slot membership sets — two concurrent resolves can no
  longer both win on Redis Cluster.
- F1 resume reaping: resolve refreshes `lastActiveAt`; both stores' stale-
  running failsafes key off it, so a long-paused approval isn't reaped right
  after resuming.
- F2 expire completedAt: expire writes completedAt so terminal cleanup
  reclaims the job (InMemory only cleans terminal jobs with completedAt set).
- F5 facade: buildJobFacade copies pendingAction into metadata so status/
  resume routes can render the prompt.
- F6 resume metadata: PendingAction + buildPendingAction carry the SDK
  interruptId/threadId needed to rebuild Command({ resume }) cross-process.
- F7 mirror: data-provider AskUserQuestionRequest gains optional description.

Tests added at the interface: stale-actionId resolve rejected, expire sets
completedAt. tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Address Codex round 2 on the HITL Redis adapter

Five P2 findings on abf4b86291, all valid Redis-adapter consequences of
round 1:

- G1 terminal cleanup on expiry: transitionStatus's terminal path now runs
  the same chunk/run-step/userJobs cleanup as updateJob (extracted into a
  shared applyTerminalContentCleanup). Expired approvals no longer leave
  Redis stream contents around for the full running TTL.
- G2 pause via updateJob mirrors pendingActionId, so a pause through the
  generic path carries the flat field the stale-decision guard compares.
- G3 resume via updateJob refreshes lastActiveAt (and clears pendingActionId),
  matching transitionStatus so a long-paused job isn't reaped post-resume.
- G4 getActiveJobIdsByUser excludes a requires_action job whose pendingAction
  is past expiry (both stores), via shared isPendingActionExpired — the client
  stops polling an expired prompt.
- G5 createJob clears stale pendingAction/pendingActionId/lastActiveAt on a
  reused streamId, so a fresh run never exposes a prior run's approval metadata
  and cleanup keys off the new createdAt.

Tests added: expired pending-approval excluded from the active set. tsc +
lint clean; policy + type-contract specs pass.

* 🛡️ fix: Address Codex round 3 — approval expiry lifecycle completeness

Three P2 findings on 780833d908, all valid:

- H1 status consistency: /chat/status now treats a non-expired
  requires_action job as active (matching /chat/active), so a client
  refreshing while an approval is pending resumes/subscribes instead of
  treating the run as finished and stranding it.
- H2 active expiry: cleanup now finalizes past-expiry requires_action jobs
  (→ aborted) in both stores instead of only filtering them from the active
  list — an expired prompt no longer lingers resident until key TTL. Redis
  routes through transitionStatus (terminal content cleanup); in-memory marks
  terminal + reclaims.
- H3 resumed liveness: in-memory stale-running check uses
  max(lastActivity, lastActiveAt, createdAt), so a just-resumed job isn't
  reaped on a stale per-chunk lastActivity entry before the next chunk.

Test added: in-memory cleanup finalizes + reclaims a past-expiry approval.
tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Address Codex round 4 — paused-job edge cases across the stack

Five P2 findings on 4324a4e776, all valid:

- I1 message validation: validateMessageReq's active-job read bypass now
  accepts a live requires_action job, so a new-conversation run that pauses
  before its final save can recover the prompt instead of 404ing.
- I2 expire targets the observed record: resolve()'s expired path passes
  `expectedActionId ?? job.pendingAction.actionId`, so a concurrent
  resume+re-pause can't let expire abort a different action.
- I3 stale/malformed prompts: new isPendingActionStale (missing OR expired)
  drives active-listing exclusion + cleanup expiry in both stores, and the
  status route + middleware require a live pendingAction — a requires_action
  job whose pendingAction was dropped on deserialize no longer reads active.
- I4 in-memory parity: InMemory updateJob mirrors pendingActionId on pause and
  clears it + refreshes lastActiveAt on resume (matching RedisJobStore), so a
  pause via the generic path is still resolvable by actionId.
- I5 long approval windows: paused-job live TTL (job/chunks/run-steps) now
  covers pendingAction.expiresAt + grace (pauseTtlSeconds), on both the
  transitionStatus and updateJob pause paths, so Redis can't evict a paused
  job before its decision window closes.

tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Codex round 5 — refuse unresolvable resolves; expose pending action

Two of three findings on c8abd826e1 (the third deferred to Slice B):

- J3 resolve() refuses a requires_action job that has lost its pendingAction
  (e.g. a malformed record dropped on deserialize): it expires/finalizes the
  job instead of driving a resumed run with no reviewed interrupt payload —
  consistent with how active-listing + cleanup already treat a stale prompt.
- J2 /chat/status returns the live pendingAction for a paused stream, so a
  client rebuilding from status (reload / cross-replica) has the action id +
  payload to render and submit the prompt, not just "paused".

Deferred (Slice B): J1 — emitting a terminal SSE event on approval expiry so
already-subscribed clients close. The store-level lifecycle can't emit
transport events, and there are no live SSE subscribers to a paused stream
until the Slice B runtime wiring exists; tracked for that work.

tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Codex final round — paused-job TTL + pendingAction in resume contract

Two of three findings on e7d9cf21b6 (third deferred to Slice B):

- K2 paused-job TTL: a paused (requires_action) job no longer inherits the
  20-minute running TTL — it uses a dedicated requires_action backstop
  (default 24h, configurable) so a no-expiry approval (the buildPendingAction
  default), which the API treats as live, isn't evicted by Redis mid-window.
  A longer pendingAction.expiresAt still extends beyond the backstop.
- K3 resume contract: pendingAction is now carried on the typed ResumeState
  (data-provider) and populated by getResumeState for a live paused job, so a
  reloading / cross-replica client can rebuild the prompt from resumeState
  (the contract useResumeOnLoad actually reads), not just a loose status field.

Deferred (Slice B): K1 — emit a terminal SSE event on expiry so already-
subscribed clients close. Requires the manager/eventTransport layer (the
store-level lifecycle and cleanup loops have no transport access) and has no
live subscriber until the Slice B subscribe/resume path exists; tracked there.

tsc + lint clean; policy + type-contract specs pass.

* ♻️ refactor: dedup HITL transition path + liveness predicate (arch review)

Two follow-ups from the post-hardening architecture re-review — both pure
dedup, no behavior change:

A — collapse the dual status-transition path. transitionStatus is now the
   sole membership-aware transition (running ⇄ requires_action). Removed the
   updateJob requires_action/running branches and the now-orphaned
   transitionToRequiresAction / transitionToRunning / refreshLiveJobTtls, plus
   the per-store pause/resume mirror logic that had to be re-synced into parity
   across review rounds (G2/G3/I4/I5). updateJob is back to a plain field
   writer + terminal cleanup. The Redis integration tests that drove
   updateJob({status}) now drive transitionStatus (the real path).

B — one canonical "is this approval live?" predicate. isPendingActionStale /
   isPendingActionExpired are exported from @librechat/api and used by the
   stores, ApprovalLifecycle (dropped its private isExpired), the /chat/status
   route, and validateMessageReq — replacing 3 inlined re-derivations that were
   the drift source behind several review findings.

tsc + lint clean; policy + type-contract specs pass. Redis integration specs
(migrated) are CI-verified.
2026-06-24 16:47:16 -04:00
..
__tests__
accessResources
assistants
config
limiters
roles
spec
validate
abortMiddleware.js
abortMiddleware.spec.js
abortRun.js
buildEndpointOption.js
buildEndpointOption.spec.js
canAccessSharedLink.js
canDeleteAccount.js
canDeleteAccount.spec.js
checkBan.js
checkDomainAllowed.js
checkInviteUser.js
checkPeoplePickerAccess.js
checkPeoplePickerAccess.spec.js
checkSharePublicAccess.js
checkSharePublicAccess.spec.js
denyRequest.js
error.js
index.js
logHeaders.js
moderateText.js
noIndex.js
optionalJwtAuth.js
optionalShareFileAuth.js
optionalShareFileAuth.spec.js
requireJwtAuth.js
requireLdapAuth.js
requireLocalAuth.js
setHeaders.js
setTwoFactorTempUser.js
uaParser.js
validateImageRequest.js
validateMessageReq.js 🪝 feat: HITL Tool Approval Scaffolding (Slice A) (#12938) 2026-06-24 16:47:16 -04:00
validateModel.js
validatePasswordReset.js
validateRegistration.js