Commit graph

1 commit

Author SHA1 Message Date
Danny Avila
9225a279eb 🎚️ feat: Per-User Skill Active/Inactive Toggle with Ownership-Aware Defaults (#12692)
* feat: per-user skill active/inactive toggle with ownership-aware defaults

- Add `skillStates` map (Record<string, boolean>) to user schema for
  per-user active/inactive overrides on skills
- Add `defaultActiveOnShare` to interface.skills config (default: false)
  so admins can control whether shared skills auto-activate
- Add GET/POST /api/user/settings/skills/active endpoints with validation
- Add React Query hooks with optimistic mutations for skill states
- Add useSkillActiveState hook with ownership-aware resolution:
  owned skills default active, shared skills default inactive
- Add toggle switch UI to SkillListItem and SkillDetail components
- Filter inactive skills in injectSkillCatalog before agent injection
- Add localization keys for active/inactive labels

* fix: use Record instead of Map for IUser.skillStates

Mongoose .lean() flattens Map to a plain object, causing type
incompatibility with IUser in methods that return lean documents.

* fix: address review findings for skill active states

- Fail-closed when userId is absent: filter rejects all shared skills
  instead of passing them through unfiltered (Codex P1)
- Validate Mongoose Map key characters (reject . and $) in controller
  to return 400 instead of a 500 from schema validation (Codex P2)
- Block toggle while initial skill states query is loading to prevent
  overwriting server-side overrides with an empty snapshot (Codex P2)
- Extract shared SkillToggle component, eliminating duplicate toggle
  markup in SkillListItem and SkillDetail (Finding #3)
- Move skill state query/mutation hooks from Favorites.ts to
  Skills/queries.ts per feature-directory convention (Finding #4)
- Fix hardcoded English aria-label in SkillListItem by passing the
  localized string from the parent SkillList (Finding #5)
- Fix inline arrow in SkillList render loop: pass stable callback
  reference so SkillListItem memo() is not invalidated (Finding #1)
- Extract toRecord() helper in controller to DRY the Map-to-Object
  conversion (Finding #6)
- Remove Promise.resolve wrapping synchronous config read (Finding #8)
- Remove unused TUpdateSkillStatesRequest type (Finding #12)

* fix: forward tabIndex on SkillToggle to preserve list keyboard nav

The original inline toggle had tabIndex={-1} so the row itself
remained the sole tab target. The extraction into SkillToggle
dropped this prop, making every list toggle a tab stop. Add an
optional tabIndex prop and pass -1 from SkillListItem.

* fix: plumb skillStates to all agent entry points, isolate toggle keydown

- Add skillStates/defaultActiveOnShare loading to openai.js and
  responses.js controllers so shared-skill activation is respected
  across all agent entry points, not just initialize.js (Codex P1)
- Stop keydown propagation on SkillToggle so Enter/Space does not
  bubble to the parent row's navigation handler (Codex P2)

* fix: paginate catalog fetch and serialize toggle writes

- Paginate listSkillsByAccess (up to 10 pages of 100) until the active
  catalog quota is filled, so inactive shared skills in recent positions
  do not starve active owned skills past the first page (Codex P1)
- Extend listSkillsByAccess interface with cursor/has_more/after for
  catalog pagination
- Serialize skill-state writes via a ref queue: one in-flight request
  at a time, with the latest desired state sent when the previous one
  settles. Prevents last-response-wins races where an older request
  overwrites newer toggles (Codex P2)

* fix: share write queue across hook instances, block toggle on fetch error

- Move the write queue from a per-instance useRef to a module-scoped
  object so every mount of useSkillActiveState (SkillList, SkillDetail,
  etc.) serializes against the same in-flight slot. Prior per-instance
  queues allowed two components to race full-map POSTs (Codex P1)
- Extend the toggle guard beyond isLoading: also block when isError is
  true or data is undefined. Prevents a failed GET from seeding a
  toggle with an empty baseline that would wipe server-side overrides
  on the next successful POST (Codex P1)

* fix: stale closure, orphan cleanup, and cap-error UX

- Read toggle baseline from React Query cache via queryClient.getQueryData
  instead of the captured skillStates closure. The closure can be stale
  between onMutate's setQueryData and the next render, so rapid successive
  toggles would build on old state and drop earlier changes (Codex P1)
- Surface the MAX_SKILL_STATES_EXCEEDED error code with a specific toast
  key (com_ui_skill_states_limit) so users understand the 200-cap rather
  than seeing a generic error
- Prune orphaned entries (skillIds whose Skill doc no longer exists) on
  both GET and POST in SkillStatesController. Self-heals over time
  without needing cascade-delete hooks or a migration job. Uses one
  indexed Skill._id query per request

* test: pin skill active-state precedence with unit tests

Extract the active-state resolution logic from a closure inside
injectSkillCatalog into an exported resolveSkillActive helper, then
cover every branch of the precedence matrix:

- Fails closed when userId is absent (even with defaultActiveOnShare=true)
- Explicit override wins over ownership and config (both true and false)
- Owned skills default to active when no override is set
- Shared skills default to defaultActiveOnShare value
- Undefined skillStates behaves identically to an empty object
- defaultActiveOnShare defaults to false when omitted
- Owned skills ignore defaultActiveOnShare entirely

Closes Finding #2 from the pre-rebase comprehensive review. Mirrors
the existing scopeSkillIds test style; injectSkillCatalog now calls
resolveSkillActive instead of inlining the closure.

* refactor: limit skill active toggle to detail header, drop label

- Remove the per-row toggle from SkillListItem and the active-state
  plumbing (hook call, isSkillEnabled/onToggleEnabled/toggleAriaLabel
  props) from SkillList. The detail view is now the single place to
  change a skill's active state
- Drop dim/muted styling for inactive skills in the sidebar: without
  a control there, the visual indication has nowhere to land
- Resize SkillToggle to match neighbor buttons: outer h-9 container,
  h-6 w-11 track with size-5 knob, no label span. The 'Active' /
  'Inactive' text that accompanied the detail-view toggle is removed
- Remove the now-unused label prop and tabIndex prop (the tabIndex
  existed only for the list-row context) from SkillToggle. Drop the
  onKeyDown stopPropagation for the same reason
- Remove now-orphaned com_ui_skill_active / com_ui_skill_inactive
  translation keys

* style: shrink SkillToggle track to h-5 w-9 with size-4 knob

Container stays at h-9 to match neighbor button heights. The toggle
track itself drops from h-6 w-11 to h-5 w-9, with a size-4 knob
travelling 1.125rem on activation. Visually lighter inside the row.

* fix: remove redundant skillStates entries that match the resolved default

When a toggle lands on the ownership/config default, delete the key
from the map instead of persisting `{id: defaultValue}`. Without this,
a user toggling a skill off and back on would leave `{id: true}` for
an owned skill (whose default is already true), silently consuming a
slot against the 200-entry cap. Repeated round-trip toggles could
exhaust the quota with zero meaningful overrides (Codex P2).

Preserves the exceptions-list invariant that the runtime-resolution
design depends on.

* fix: prune before enforcing skill-state cap; reject non-ObjectId keys

Reorder the update controller so pruneOrphans runs before the 200-cap
check. Without this, a user near the cap with some orphaned entries
(skills deleted since their last GET) could send a payload that would
pass after pruning but gets rejected by the raw-size check first.

Add a sanity cap on raw payload size (2 * MAX_SKILL_STATES) so abusive
inputs do not reach the DB query, and enforce the real cap on the
pruned result instead.

Harden pruneOrphans: the earlier early-return path could pass
non-ObjectId keys through unchanged. Now only valid ObjectIds are
returned, and the Skill-model-unavailable fallback filters by format.

Also add isValidObjectIdString validation at the input boundary so
malformed (but otherwise non-Mongo-unsafe) keys never reach persistence
(Codex P2 x2).

* fix: enforce active filter at execute time, prune revoked shares, scope queue per user

P1: injectSkillCatalog now returns activeSkillIds (the filtered set
that appears in the catalog). initializeAgent uses that set as the
stored accessibleSkillIds on the initialized agent, so getSkillByName
at runtime cannot resolve a deactivated skill — even if the LLM
hallucinates a name or the user invokes by direct-invocation shorthand.
Previously the executor authorized against the full ACL set, bypassing
the active-state guarantee (Codex P1).

P2: pruneOrphans now checks user access via findAccessibleResources
in addition to skill existence. When a share is revoked, the user's
skillStates entry for that skill had no cleanup path and silently
consumed the 200-cap. Self-heals on both GET and POST. One extra ACL
query per settings read/write; scoped to a single user so no N-user
amplification (Codex P2).

P2: the write queue moves from a single module-scoped object to a Map
keyed by userId. Logout/login in the same tab can no longer flush the
previous user's pending snapshot under the new session's auth. Each
userId gets its own pending/inFlight slot; the in-flight request
retains its original auth via the cookie already attached when sent,
so the race window closes (Codex P2).

* refactor: extract skillStates helpers to packages/api; add tests; polish

Address the remaining valid findings from the comprehensive review:

- Extract toRecord, loadSkillStates, validateSkillStatesPayload, and
  pruneOrphanSkillStates into packages/api/src/skills/skillStates.ts
  as TypeScript. The controller in /api shrinks to a ~90-line thin
  wrapper that builds live dependency adapters for Mongoose + the
  permission service (Review #2 DRY, #3 workspace boundary)

- Replace the triplicated 12-line skillStates loading block in
  initialize.js, openai.js, and responses.js with a single call to
  loadSkillStates from @librechat/api. One helper, three sites

- Swap console.error for the project logger in the controller
  (Review #7)

- Remove the redundant INVALID_KEY_PATTERN regex: a valid ObjectId
  cannot contain . or $, so isValidObjectIdString already covers it
  (Review #11)

- Parameterize the 200-cap error toast with {{0}} interpolation
  driven by the error response's `limit` field, so future changes to
  MAX_SKILL_STATES update the UI message automatically (Review #12)

- Add 24 unit tests for the new skillStates helpers (toRecord,
  resolveDefaultActiveOnShare, loadSkillStates, validateSkillStates-
  Payload, pruneOrphanSkillStates) covering success paths, malformed
  input, cap boundaries, and parallel-query behavior (Review #4)

- Add 10 tests for injectSkillCatalog pagination covering empty
  accessible set, missing listSkillsByAccess, single-page filter,
  owned-vs-shared defaults, explicit-override precedence, multi-page
  collection, MAX_CATALOG_PAGES safety cap, early termination on
  has_more=false, additional_instructions injection, and fail-closed
  without userId (Review #5)

Total test count: 60 (was 26 on this surface).

* fix: rename skillStates ValidationError to avoid barrel-export collision

packages/api/src/types/error.ts already exports a ValidationError
(MongooseError extension). Re-exporting a different shape from
skills/skillStates.ts through the skills barrel caused TS2308 in CI
because the root index re-exports both. Rename to
SkillStatesValidationError to keep the exports disjoint.

* refactor: tighten tests and absorb caller guard into loadSkillStates

Address the followup review findings:

- Add optional `accessibleSkillIds` param to loadSkillStates so the
  helper short-circuits to defaults when no skills are accessible.
  All three controllers drop the residual 7-line conditional wrapper
  in favor of a single destructured call (Review #2)

- Remove the unreachable `typeof key !== 'string'` check from
  validateSkillStatesPayload: Object.entries always yields string
  keys per the JS spec (Review #3)

- Replace the two `as unknown as` agent casts in the injectSkillCatalog
  tests with a `makeAgent()` factory typed directly as the function's
  parameter shape (Review #4)

- Tighten the MAX_CATALOG_PAGES assertion from `toBeLessThanOrEqual(11)`
  to `toHaveBeenCalledTimes(10)` — the loop deterministically makes
  exactly 10 page fetches before hitting the cap (Review #1)

- Rewrite the parallel-execution test for pruneOrphanSkillStates using
  deferred promises instead of microtask-order assertions. The test
  now inspects `toHaveBeenCalledTimes(1)` on both mocks after a single
  Promise.resolve() yield, pinning Promise.all usage without relying
  on push-order into a shared array (Review #5)

- Evict stale writeQueue entries on user change via a module-scoped
  `lastSeenUserId` sentinel. When a different user's toggle is the
  first one after a logout/login, the previous user's queue entry is
  deleted. Keeps the Map bounded without adding hook-instance effect
  cleanup (Review #6)

* fix(test): mock loadSkillStates in openai and responses controller specs

The prior refactor replaced the inline 12-line skillStates loading
block with a call to loadSkillStates from @librechat/api. Both
controller spec files mock @librechat/api as a flat object, so any
new named import from that package is undefined in the test env.
Calling `await loadSkillStates(...)` threw before recordCollectedUsage
ran, surfacing as "undefined is not iterable" on the test's array
destructure of `mockRecordCollectedUsage.mock.calls[0]`.

Add the missing mock to both spec files alongside the existing
scopeSkillIds stub.

* fix: abandon stale skillStates write queues on user switch

Close the cross-session leak window where an in-flight flush loop
still holds a reference to a previous user's queue: it could fire its
next mutateAsync under the new session's auth cookies and persist
the stale snapshot to the new user's document (Codex P1).

Add an `abandoned` flag on `WriteQueue`. Three mechanisms cooperate:

- `getWriteQueue` marks every non-active queue abandoned when the
  user differs from the last-seen identity (pre-existing eviction
  site, now more aggressive).
- A `useEffect` on `userId` calls the same abandonment pass on every
  render with a new active identity, covering the window between
  logout/login and the new user's first toggle (when `getWriteQueue`
  would otherwise not fire).
- The flush loop checks `!queue.abandoned` in its while condition so
  the second and later iterations exit without firing another
  `mutateAsync` after the session changes.

The first iteration's in-flight request (already dispatched under the
original user's cookies) still runs to completion or failure on its
own — only the subsequent iterations, which are the dangerous ones,
are blocked.
2026-04-25 04:02:00 -04:00