LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-07-03 04:42:11 +00:00

History

Danny Avila 771b93bf10 🪝 feat: HITL Tool Approval Scaffolding (Slice A) (#12938)

* 🪝 feat: HITL Tool Approval Scaffolding

Adds the foundational types, job-state, config schema, and policy module for human-in-the-loop tool approval. Purely additive: no behavior change on existing runs. Lands ahead of the agents-SDK interrupt/checkpointer integration so both tracks can land independently.

- LangChain HumanInterrupt-shaped types in `Agents.` namespace (`HumanInterruptPayload`, `ToolApprovalRequest`, `ToolReviewConfig`, `PendingAction`, `ToolApprovalResolution`); `ToolCall`/`ToolCallDelta` gain an optional `approval` field.
- New `requires_action` job status (non-terminal) plus `pendingAction` field on `SerializableJobData` and `GenerationJobMetadata`. Both stores treat the status as paused-but-alive; Redis `updateJob` has explicit `requires_action`/`running` transition branches that refresh the hash TTL, manage the `runningJobs` set, and `HDEL pendingAction` on resume. Both stores include `requires_action` in `getActiveJobIdsByUser`.
- `GenerationJobManager` gains `markRequiresAction`, `getPendingAction`, `clearPendingAction`; `getJobCountByStatus` aggregates the new status.
- `endpoints.agents.toolApproval` config (`default`/`required`/`excluded`) and a policy module exporting `decideToolApproval`, `requiresApproval`, and `buildPendingAction` (the LangChain-shaped payload builder).
- 20 unit tests covering policy resolution and the manager lifecycle.

🧭 refactor: Align HITL Surface with Agents SDK Permissions Model

Reshapes Slice A on top of the agents SDK's now-landed HITL surface (`createToolPolicyHook`, discriminated `HumanInterruptPayload`, `'bypass'` mode naming). Host stops reimplementing evaluation logic and becomes a config mapper + payload wrapper.

Schema (data-provider):
- `toolApproval` shape now mirrors SDK `ToolPolicyConfig` 1:1: `mode: 'default' | 'dontAsk' | 'bypass'`, plus `allow` / `deny` / `ask` glob lists and an optional `reason` template. `enabled` is the LibreChat-only admin kill switch.
- `'bypass'` (not `'bypassPermissions'`): matches the SDK's surface.

Types (`Agents.` namespace):
- `HumanInterruptType` extended to `'tool_approval' | 'ask_user_question'`.
- `HumanInterruptPayload` is now a discriminated union: `tool_approval` carries `action_requests` + `review_configs`; `ask_user_question` carries a free-form question with optional curated options.
- New: `AskUserQuestionRequest`, `AskUserQuestionOption`, `AskUserQuestionResolution`.
- `ToolApprovalDecision` (string union) renamed to `ToolApprovalDecisionType` to free the `Decision` name for the SDK's discriminated object union later.
- `ToolApprovalResolution` gains `reason?` and `scope?: 'once' | 'session' | 'always'` so route signatures stabilize before persistence lands.

Policy module (`packages/api/src/agents/hitl/policy.ts`):
- Drop `decideToolApproval` / `requiresApproval` / `ToolRef`; the SDK's `createToolPolicyHook` handles full evaluation (`deny → bypass → allow → ask → dontAsk → fallthrough(ask)`).
- Add `isHITLEnabled(policy)`: the kill-switch predicate that gates the SDK's `humanInTheLoop: { enabled: false }` opt-out in Slice B.
- Add `mapToolApprovalPolicy(policy)`: strips `enabled`, returns a `ToolPolicyConfig` to feed `createToolPolicyHook`. Structural mirror of the SDK type so this compiles before the SDK upgrade ships.
- Reshape `buildPendingAction(payload, ctx)` to wrap any `HumanInterruptPayload` with job context; accepts SDK output directly.
- Add `buildToolApprovalPayload(...)` and `buildAskUserQuestionPayload(...)` helpers for synthesizing payloads in tests / pre-SDK flows.

Tests:
- 22 new unit tests covering the mapper, predicate, and payload builders; 20 → 27 total pass across policy + manager-lifecycle suites.

🪢 chore: Import ToolPolicyConfig From `@librechat/agents`

The SDK type now ships in 3.1.77 (already pinned on `dev`), so the structural mirror in `policy.ts` is redundant. Drop the local interface and import directly so future SDK changes to `ToolPolicyConfig` propagate without our `mapToolApprovalPolicy` going stale.

* 🔑 fix: Carry tool_call_id On ToolReviewConfig (HITL)

`ToolReviewConfig` was joining with `ToolApprovalRequest` by position only. That breaks the moment a single batch contains the same tool called twice (e.g. a model fanning out parallel `mcp:server:search` calls): the UI can't tell which review config applies to which action request once it filters or reorders. Mirrors the SDK's `ToolApprovalReviewConfig` shape: `tool_call_id` is the join key, `action_name` is retained for display only.

Also: drop a JSDoc warning on `isHITLEnabled` so a future contributor doesn't wire `humanInTheLoop: { enabled: true }` without supplying a host checkpointer; the SDK's `MemorySaver` fallback is process-local and silently breaks resume across worker hops.

- `Agents.ToolReviewConfig` adds `tool_call_id: string`
- `buildToolApprovalPayload` populates `tool_call_id` per review config
- New test covers the duplicate-tool batch case (two parallel calls to the same tool); 27 → 28 tests

* fix: Address HITL review findings

* fix: Refresh paused HITL Redis state

* test: Stabilize HITL abort fallback specs

* 🎨 style: Sort imports to satisfy dev lint gate (HITL)

* 🏛️ refactor: Deepen HITL approval lifecycle into one race-safe seam

Architecture-review candidate #1 (+ #4). The requires_action lifecycle was three shallow pass-throughs over updateJob with the legal transitions smeared across JSDoc, the JobStatus union, and each store adapter, and the resume transition was NOT race-safe: the Redis lua checked existence, not status, so two concurrent approval submits both drove the run (re-executing tools / double-billing).

- IJobStore.transitionStatus: atomic compare-and-set status transition that only fires if the job is currently `from`. InMemory: sync compare. Redis: single-node lua with a status guard (cluster best-effort, matching the existing posture); reconciles membership sets + TTLs to `to`.
- New ApprovalLifecycle module (pause / peek / resolve / expire): guarded, race-safe transitions behind one interface. resolve() returns true to exactly one concurrent caller; the previously-undefined requires_action → aborted expiry edge is now explicit; peek treats past-expiresAt as gone (lazy expiry).
- GenerationJobManager exposes `approvals` and delegates; the three shallow methods (mark/get/clearPendingAction) are removed; callers cross the deep interface.
- #4: typeContract.spec asserts the SDK <-> data-provider HITL types stay compatible (fails the build on drift); RedisJobStore validates the pendingAction shape on deserialize instead of a bare JSON.parse (defends the cold-resume path against malformed/stale records).
- Tests rewritten at the deep interface: double-resolve wins once, pause-on-terminal rejected, explicit expiry, lazy-expiry peek.

No Slice B wiring; this deepens the existing scaffolding so the future resume route and run seam are born crossing one race-safe interface.

* 🛡️ fix: Address Codex review on the HITL approval lifecycle

Seven findings on the lifecycle deepening (`089ba09f9`), all valid:

- F3 actionId guard: resolve/expire take an expectedActionId; pause records a flat `pendingActionId` the atomic CAS guards on, so a stale decision can't resume a job that has since paused for a different action.
- F4 cluster single-winner: transitionStatus now decides the winner with an atomic CAS on the single-slot job hash (one Lua, cluster-safe), then reconciles cross-slot membership sets; two concurrent resolves can no longer both win on Redis Cluster.
- F1 resume reaping: resolve refreshes `lastActiveAt`; both stores' stale-running failsafes key off it, so a long-paused approval isn't reaped right after resuming.
- F2 expire completedAt: expire writes completedAt so terminal cleanup reclaims the job (InMemory only cleans terminal jobs with completedAt set).
- F5 facade: buildJobFacade copies pendingAction into metadata so status/resume routes can render the prompt.
- F6 resume metadata: PendingAction + buildPendingAction carry the SDK interruptId/threadId needed to rebuild Command({ resume }) cross-process.
- F7 mirror: data-provider AskUserQuestionRequest gains optional description.

Tests added at the interface: stale-actionId resolve rejected, expire sets completedAt. tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Address Codex round 2 on the HITL Redis adapter

Five P2 findings on `abf4b86291`, all valid Redis-adapter consequences of round 1:

- G1 terminal cleanup on expiry: transitionStatus's terminal path now runs the same chunk/run-step/userJobs cleanup as updateJob (extracted into a shared applyTerminalContentCleanup). Expired approvals no longer leave Redis stream contents around for the full running TTL.
- G2 pause via updateJob mirrors pendingActionId, so a pause through the generic path carries the flat field the stale-decision guard compares.
- G3 resume via updateJob refreshes lastActiveAt (and clears pendingActionId), matching transitionStatus so a long-paused job isn't reaped post-resume.
- G4 getActiveJobIdsByUser excludes a requires_action job whose pendingAction is past expiry (both stores), via shared isPendingActionExpired; the client stops polling an expired prompt.
- G5 createJob clears stale pendingAction/pendingActionId/lastActiveAt on a reused streamId, so a fresh run never exposes a prior run's approval metadata and cleanup keys off the new createdAt.

Tests added: expired pending-approval excluded from the active set. tsc + lint clean; policy + type-contract specs pass.

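The guarded transition described in the lifecycle refactor and in finding F3 can be illustrated with a minimal in-memory sketch. Everything below is an assumption made for illustration: the class name `InMemoryJobStoreSketch`, the `Job` shape, and the options object are hypothetical and are not the signatures used in the repository. The Redis path, which the message says uses a Lua script, is not shown.

```typescript
// Hypothetical types for illustration only; the real JobStatus union may differ.
type JobStatus = 'running' | 'requires_action' | 'complete' | 'aborted';

interface Job {
  status: JobStatus;
  pendingActionId?: string;
  lastActiveAt?: number;
  completedAt?: number;
}

interface TransitionOptions {
  expectedActionId?: string; // guard: only fire if paused for this action
  pendingActionId?: string; // recorded when pausing
  now?: number; // injectable clock for tests
}

class InMemoryJobStoreSketch {
  private jobs = new Map<string, Job>();

  createJob(id: string): void {
    this.jobs.set(id, { status: 'running' });
  }

  getJob(id: string): Job | undefined {
    return this.jobs.get(id);
  }

  /** Compare-and-set: fires only if the job is currently `from`. */
  transitionStatus(
    id: string,
    from: JobStatus,
    to: JobStatus,
    opts: TransitionOptions = {},
  ): boolean {
    const job = this.jobs.get(id);
    if (!job || job.status !== from) {
      return false; // lost the race, or the job is gone
    }
    if (opts.expectedActionId !== undefined && job.pendingActionId !== opts.expectedActionId) {
      return false; // stale decision for a different action
    }
    const now = opts.now ?? Date.now();
    job.status = to;
    if (to === 'requires_action' && opts.pendingActionId !== undefined) {
      job.pendingActionId = opts.pendingActionId;
    } else {
      delete job.pendingActionId;
    }
    if (to === 'running') {
      job.lastActiveAt = now; // keeps a just-resumed job from looking stale
    }
    if (to === 'aborted' || to === 'complete') {
      job.completedAt = now; // lets terminal cleanup reclaim the job
    }
    return true;
  }
}

const store = new InMemoryJobStoreSketch();
store.createJob('job-1');
const paused = store.transitionStatus('job-1', 'running', 'requires_action', {
  pendingActionId: 'act-1',
});
const stale = store.transitionStatus('job-1', 'requires_action', 'running', {
  expectedActionId: 'act-0',
});
const first = store.transitionStatus('job-1', 'requires_action', 'running', {
  expectedActionId: 'act-1',
});
const second = store.transitionStatus('job-1', 'requires_action', 'running', {
  expectedActionId: 'act-1',
});
// paused = true, stale = false, first = true, second = false
console.log({ paused, stale, first, second });
```

Because the comparison and the write happen in one synchronous step, exactly one of two identical resolve attempts succeeds. In a shared store the same check-then-write would have to run atomically on the server side, which is the role the message assigns to the Lua script.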

* 🛡️ fix: Address Codex round 3 - approval expiry lifecycle completeness

Three P2 findings on `780833d908`, all valid:

- H1 status consistency: /chat/status now treats a non-expired requires_action job as active (matching /chat/active), so a client refreshing while an approval is pending resumes/subscribes instead of treating the run as finished and stranding it.
- H2 active expiry: cleanup now finalizes past-expiry requires_action jobs (→ aborted) in both stores instead of only filtering them from the active list; an expired prompt no longer lingers resident until key TTL. Redis routes through transitionStatus (terminal content cleanup); in-memory marks terminal + reclaims.
- H3 resumed liveness: in-memory stale-running check uses max(lastActivity, lastActiveAt, createdAt), so a just-resumed job isn't reaped on a stale per-chunk lastActivity entry before the next chunk.

Test added: in-memory cleanup finalizes + reclaims a past-expiry approval. tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Address Codex round 4 - paused-job edge cases across the stack

Five P2 findings on `4324a4e776`, all valid:

- I1 message validation: validateMessageReq's active-job read bypass now accepts a live requires_action job, so a new-conversation run that pauses before its final save can recover the prompt instead of 404ing.
- I2 expire targets the observed record: resolve()'s expired path passes `expectedActionId ?? job.pendingAction.actionId`, so a concurrent resume+re-pause can't let expire abort a different action.
- I3 stale/malformed prompts: new isPendingActionStale (missing OR expired) drives active-listing exclusion + cleanup expiry in both stores, and the status route + middleware require a live pendingAction; a requires_action job whose pendingAction was dropped on deserialize no longer reads active.
- I4 in-memory parity: InMemory updateJob mirrors pendingActionId on pause and clears it + refreshes lastActiveAt on resume (matching RedisJobStore), so a pause via the generic path is still resolvable by actionId.
- I5 long approval windows: paused-job live TTL (job/chunks/run-steps) now covers pendingAction.expiresAt + grace (pauseTtlSeconds), on both the transitionStatus and updateJob pause paths, so Redis can't evict a paused job before its decision window closes.

tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Codex round 5 - refuse unresolvable resolves; expose pending action

Two of three findings on `c8abd826e1` (the third deferred to Slice B):

- J3 resolve() refuses a requires_action job that has lost its pendingAction (e.g. a malformed record dropped on deserialize): it expires/finalizes the job instead of driving a resumed run with no reviewed interrupt payload, consistent with how active-listing + cleanup already treat a stale prompt.
- J2 /chat/status returns the live pendingAction for a paused stream, so a client rebuilding from status (reload / cross-replica) has the action id + payload to render and submit the prompt, not just "paused".

Deferred (Slice B): J1, emitting a terminal SSE event on approval expiry so already-subscribed clients close. The store-level lifecycle can't emit transport events, and there are no live SSE subscribers to a paused stream until the Slice B runtime wiring exists; tracked for that work.

tsc + lint clean; policy + type-contract specs pass.

* 🛡️ fix: Codex final round - paused-job TTL + pendingAction in resume contract

Two of three findings on `e7d9cf21b6` (third deferred to Slice B):

- K2 paused-job TTL: a paused (requires_action) job no longer inherits the 20-minute running TTL; it uses a dedicated requires_action backstop (default 24h, configurable) so a no-expiry approval (the buildPendingAction default), which the API treats as live, isn't evicted by Redis mid-window. A longer pendingAction.expiresAt still extends beyond the backstop.
- K3 resume contract: pendingAction is now carried on the typed ResumeState (data-provider) and populated by getResumeState for a live paused job, so a reloading / cross-replica client can rebuild the prompt from resumeState (the contract useResumeOnLoad actually reads), not just a loose status field.

Deferred (Slice B): K1, emit a terminal SSE event on expiry so already-subscribed clients close. Requires the manager/eventTransport layer (the store-level lifecycle and cleanup loops have no transport access) and has no live subscriber until the Slice B subscribe/resume path exists; tracked there.

tsc + lint clean; policy + type-contract specs pass.

* ♻️ refactor: dedup HITL transition path + liveness predicate (arch review)

Two follow-ups from the post-hardening architecture re-review; both pure dedup, no behavior change:

A: collapse the dual status-transition path. transitionStatus is now the sole membership-aware transition (running ⇄ requires_action). Removed the updateJob requires_action/running branches and the now-orphaned transitionToRequiresAction / transitionToRunning / refreshLiveJobTtls, plus the per-store pause/resume mirror logic that had to be re-synced into parity across review rounds (G2/G3/I4/I5). updateJob is back to a plain field writer + terminal cleanup. The Redis integration tests that drove updateJob({status}) now drive transitionStatus (the real path).

B: one canonical "is this approval live?" predicate. isPendingActionStale / isPendingActionExpired are exported from @librechat/api and used by the stores, ApprovalLifecycle (dropped its private isExpired), the /chat/status route, and validateMessageReq, replacing 3 inlined re-derivations that were the drift source behind several review findings.

tsc + lint clean; policy + type-contract specs pass. Redis integration specs (migrated) are CI-verified.

2026-06-24 16:47:16 -04:00
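The liveness predicate (I3, B) and the paused-job TTL rule (I5, K2) can be sketched as follows. The names `isPendingActionExpired`, `isPendingActionStale`, and `pauseTtlSeconds` appear in the message, but the signatures, the `PendingActionLike` shape, the epoch-millisecond unit for `expiresAt`, the strict "past expiry" comparison, and the 60-second grace value are all assumptions for illustration, not the repository's implementation.

```typescript
// Hypothetical minimal shape; the real PendingAction carries more fields.
interface PendingActionLike {
  actionId: string;
  expiresAt?: number; // assumed to be epoch milliseconds; absent means no expiry
}

/** True once the decision window has closed. A no-expiry action never expires. */
function isPendingActionExpired(action: PendingActionLike, now: number = Date.now()): boolean {
  return action.expiresAt !== undefined && now > action.expiresAt;
}

/** "Missing OR expired": a paused job without a live prompt should not read as active. */
function isPendingActionStale(
  action: PendingActionLike | null | undefined,
  now: number = Date.now(),
): boolean {
  return action == null || isPendingActionExpired(action, now);
}

const BACKSTOP_SECONDS = 24 * 60 * 60; // the 24h default named in K2
const GRACE_SECONDS = 60; // hypothetical grace value

/** TTL for a paused job: the backstop, extended when expiresAt + grace is later. */
function pauseTtlSeconds(
  action: PendingActionLike,
  now: number = Date.now(),
  backstop: number = BACKSTOP_SECONDS,
  grace: number = GRACE_SECONDS,
): number {
  if (action.expiresAt === undefined) {
    return backstop;
  }
  const untilExpiry = Math.ceil((action.expiresAt - now) / 1000) + grace;
  return Math.max(backstop, untilExpiry);
}

const now = 1_000_000;
const hour = 60 * 60 * 1000;
const noExpiryTtl = pauseTtlSeconds({ actionId: 'a' }, now); // 86400
const shortWindowTtl = pauseTtlSeconds({ actionId: 'b', expiresAt: now + 5 * 60 * 1000 }, now); // 86400
const longWindowTtl = pauseTtlSeconds({ actionId: 'c', expiresAt: now + 48 * hour }, now); // 172860
console.log({ noExpiryTtl, shortWindowTtl, longWindowTtl });
```

Taking the maximum of the backstop and the expiry-plus-grace value is one reading of "a longer pendingAction.expiresAt still extends beyond the backstop"; the actual combination rule may differ.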
..
__tests__
accessResources
assistants
config
limiters
roles
spec
validate
abortMiddleware.js
abortMiddleware.spec.js
abortRun.js
buildEndpointOption.js
buildEndpointOption.spec.js
canAccessSharedLink.js
canDeleteAccount.js
canDeleteAccount.spec.js
checkBan.js
checkDomainAllowed.js
checkInviteUser.js
checkPeoplePickerAccess.js
checkPeoplePickerAccess.spec.js
checkSharePublicAccess.js
checkSharePublicAccess.spec.js
denyRequest.js
error.js
index.js
logHeaders.js
moderateText.js
noIndex.js
optionalJwtAuth.js
optionalShareFileAuth.js
optionalShareFileAuth.spec.js
requireJwtAuth.js
requireLdapAuth.js
requireLocalAuth.js
setHeaders.js
setTwoFactorTempUser.js
uaParser.js
validateImageRequest.js
validateMessageReq.js	🪝 feat: HITL Tool Approval Scaffolding (Slice A) (#12938)	2026-06-24 16:47:16 -04:00
validateModel.js
validatePasswordReset.js
validateRegistration.js