Commit graph

15 commits

Author SHA1 Message Date
Danny Avila
197a1dc4e2
🧬 feat: Add GitHub Skill Sync (#13293)
Some checks failed
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
Docker Dev Images Build / build (Dockerfile, librechat-dev, node) (push) Waiting to run
Docker Dev Images Build / build (Dockerfile.multi, librechat-dev-api, api-build) (push) Waiting to run
Sync Locize Translations & Create Translation PR / Sync Translation Keys with Locize (push) Waiting to run
Sync Locize Translations & Create Translation PR / Create Translation PR on Version Published (push) Blocked by required conditions
Sync Helm Chart Tags / Ignore non-main push (push) Waiting to run
Sync Helm Chart Tags / Sync chart tags (push) Waiting to run
Publish `librechat-data-provider` to NPM / pack (push) Has been cancelled
Publish `librechat-data-provider` to NPM / publish-npm (push) Has been cancelled
* feat: Add GitHub skill sync

* fix: Address GitHub skill sync CI

* fix: Harden GitHub skill sync review paths

* fix: Prevent overlapping skill sync runs

* fix: Address GitHub skill sync review findings

* fix: Satisfy Git ref lint rule

* fix: Address GitHub sync review follow-ups

* fix: Match skill frontmatter closing fence

* fix: Address GitHub sync review cycle

* fix: Address GitHub sync review follow-ups

* fix: Harden GitHub skill sync worker

* fix: Format GitHub sync rollback log

* fix: Address GitHub sync review feedback

* fix: Format skill import parse handling

* fix: Coerce scalar skill frontmatter and correct scheduler timer clear

- parse: coerce numeric/boolean name and description scalars to strings instead of dropping them to empty (restores pre-refactor behavior; preserves absent-vs-empty distinction for the when-to-use fallback)
- scheduler: clear the setTimeout handle with clearTimeout rather than clearInterval
- test: cover non-string scalar frontmatter coercion

* fix: Tolerate trailing whitespace after SKILL.md opening frontmatter fence

extractFrontmatterBlock required the opening fence to be exactly '---\n', so an opener with trailing spaces/tabs (e.g. '---   \n') silently dropped all frontmatter even though the closing-fence regex already tolerates it. Match the opener with /^---[ \t]*\n/ for symmetry. Addresses Codex P3 (parse.ts:24).

* feat: Run GitHub skill sync under a per-source tenant context

Under TENANT_ISOLATION_STRICT, the sync ran with no async tenant context, so the tenant-isolation mongoose hooks threw on every Skill/SkillFile/AclEntry operation; in non-strict mode synced skills were written tenant-less and never matched tenant-scoped reads. Add an optional per-source tenantId to the skillSync config; when set, each source sync runs inside tenantStorage.run({ tenantId }) so skills, files, and public ACL grants are created and listed within that tenant, and the skill row is stamped with the tenantId for correct dedup. Sources without tenantId keep the prior single-tenant behavior. Avoids runAsSystem. Addresses Codex P2 (sync.js:70).

Lock/status/credential bookkeeping stays outside the tenant context (those collections are intentionally global).

* test: Restore dropped tenant-context coverage for GitHub skill sync

The prior commit shipped the getTenantId import in github.spec.ts without the tenant tests that use it (lost in an interrupted edit), which failed the eslint --max-warnings=0 CI job on an unused import. Restore both github.spec.ts tenant tests (tenant-scoped run stamps tenantId and executes inside the tenant ALS context; no-tenant run stays ambient) and the two config-schemas tenant tests (accepts tenantId, rejects __SYSTEM__).

* test: Restore dropped github.spec tenant-context tests

The previous commit's github.spec.ts edit did not apply (anchor mismatch), so the getTenantId import remained unused and failed eslint --max-warnings=0. Add the two tenant tests that use it: a tenant-scoped run stamps tenantId and executes inside the tenant ALS context, and a no-tenant run stays ambient.

* feat: Scope synced skill author to tenant and harden tenant-context sync

Addresses the latest Codex review on the per-source tenant change:
- makeSourceAuthorId now folds tenantId into the synthetic author hash so the
  same source mirrored into different tenants gets distinct author ids (clearer
  audits, no cross-tenant author collisions). Single-tenant author ids stay
  stable (suffix omitted when tenantId is absent).
- syncSourceInTenantContext uses an async callback per the tenant-context
  contract so the ALS store propagates across awaited Mongoose calls.
- Tests: same-source/different-tenant yields distinct authors; mirror cleanup
  is scoped to the source and deletes only its absent-upstream skills.

* fix: Repair tsc error and guard external edits in github skill sync

- Fix TS2352 in github.spec mirror-cleanup test: build the existing-skill mock via makeSkill with authorName instead of an under-typed 'as CreateSkillInput' cast (this was the failing TypeScript CI check on f00ce3c5a).
- 808: commitExistingRemoteSkillAfterFileSync re-reads to clear our own file-sync version bumps, but now compares refreshed content against the pre-sync snapshot (body/name/description/always-apply) and throws SKILL_CONFLICT on a concurrent external edit instead of overwriting it.

* docs: Note skillSync source tenantId is effectively immutable

Changing/adding/removing a source's tenantId orphans previously mirrored skills in the old tenant (a tenant-scoped sync cannot clean another tenant's data without runAsSystem, which is intentionally avoided).

* fix: Key GitHub skill upstream identity on source id and path only

Addresses Codex finding (github.ts:217): makeUpstreamId previously included owner/repo, so repointing a source to a renamed or replacement repository (same source id) changed the upstreamId, made findSkillBySourceIdentity miss the existing mirror, and then collided on the (name, author, tenantId) uniqueness constraint — leaving the source stuck failing. Identity now keys on the stable source id + root path only. The feature is unreleased, so there is no stored-id migration. Updated spec upstreamId fixtures to the new format; the existing ref-independent identity test now also covers repo moves.

* fix: Scope GitHub skill mirror deletion to the source tenant

Addresses Codex P1 (github.ts:1047/1057): an ambient source (no tenantId) runs listSkillsBySource without tenant context, which under non-strict isolation returns github-synced skills across all tenants. The mirror-deletion pass then treated other tenants' skills as absent-upstream and could delete them. Filter existingSyncedSkills to rows whose tenantId matches the source's configured tenantId (absent = its own ambient bucket) before deleting, so a sync never removes another tenant's mirrored skills. Covered by a test where an ambient run leaves a tenant-b-owned skill untouched.

* fix: Apply tenant-scoped mirror deletion implementation

The prior commit (75ccfa3fc) added the test but the source change to github.ts was lost in an interrupted edit, leaving a failing test with no implementation. This adds the actual guard: the mirror-deletion pass skips skills whose tenantId does not match the source's configured tenantId (absent = ambient bucket), so an ambient source whose listSkillsBySource returns cross-tenant rows under non-strict isolation cannot delete another tenant's mirrored skills.

* fix: Resolve global access role outside tenant context for synced skill grants

Addresses Codex P2 (github.ts:1166): default access roles (incl. skill_viewer) are seeded globally with no tenantId under runAsSystem, but a tenant-scoped sync wraps ensurePublicViewer in the source's tenant context. The PermissionService grantPermission resolved the role via a tenant-isolated AccessRole query, so the global role did not match and tenant-scoped syncs failed with 'Role skill_viewer not found'. The sync adapter now resolves the role inside runAsSystem (matching the global seed) and writes the ACL entry in the active tenant context, so the AclEntry is tenant-scoped (visible to tenant users) while the role lookup still succeeds. Covered by service tests for the resolve-vs-write split and the missing-role failure.

* fix: Strip placeholder frontmatter booleans and check skill conflict before file sync

- 1083 (github.ts:759): toCleanFrontmatter now drops a non-boolean always-apply (e.g. the 'always-apply:' / 'always-apply: # TODO' placeholder, which js-yaml yields as null). The boolean is already captured in the dedicated alwaysApply field; persisting null left ambiguous frontmatter on the synced skill.
- 1080 (github.ts:1057): for an existing mirrored skill, check for an external content edit (via getSkillById + hasExternalSkillEdit) BEFORE syncSkillFiles mutates the bundled files, so a concurrently edited skill fails fast with SKILL_CONFLICT without partial file rewrites. The post-file-sync check still guards edits that land during the file sync window.
Tests: placeholder always-apply is dropped from synced frontmatter; concurrent-edit conflict leaves files unmutated (no upsert/delete).

* fix: Harden GitHub skill sync review paths

* fix: Reuse moved GitHub skill mirrors

* fix: Scope GitHub sync identity conflicts

* test: Fix GitHub sync conflict mock typing

* fix: Support nested env-backed skill sync

* fix: Keep skill sync config base-only

* fix: Scope GitHub skill identity lookup by tenant

* fix: Harden GitHub skill sync admin gates

* fix: Guard existing skill sync permission grants

* feat: Trigger skill sync from resolved config

* fix: Scope resolved skill sync by tenant

* test: Allow manual skill sync status tenant scoping

* refactor: Extract skill sync trigger orchestrator

* test: Complete orchestrator status fixture

* chore: Bump data provider version

* fix: Restrict skill sync server credentials

* test: Complete admin skill sync status fixtures

* fix: tighten skill sync trigger safeguards

* fix: preserve alwaysApply skill sync alias

* chore: sort skill sync imports

* fix: preserve skill sync request scope

* fix: harden skill sync review edges

* refactor: move skill sync admin access to api package

* fix: add skill sync declaration return types

* fix: satisfy skill sync type checks

* fix: resolve codex skill sync review findings

* fix: harden skill sync review edges

* fix: resolve codex skill sync edge findings

* fix: satisfy API declaration build after rebase
2026-06-10 21:05:54 -04:00
Danny Avila
6bc75d24c8
️ refactor: Migrate @librechat/api build to tsdown (#13595)
* ️ refactor: Migrate @librechat/api build to tsdown

Replace Rollup with tsdown (rolldown + oxc isolated-declarations) for the
@librechat/api package build, mirroring the merged data-schemas migration.

- Add tsdown.config.mjs (cjs output, oxc dts, externalize all bare deps,
  bundle first-party `~/` + relative imports)
- Annotate exports for isolatedDeclarations (codefix-driven). Collapse the
  tokens.ts model->token maps to Record<string, Record<string, number>> and
  switch validation.ts's runtime `files` field from z.any() to z.unknown()
  so no explicit `any` is introduced
- Repoint package.json main/types/exports to tsdown's .cjs/.d.cts output
- Add src/telemetry.ts entry shim so the two index.ts entries don't collide
  in oxc's flat dts output (stable dist/telemetry.{cjs,d.cts})
- Delete rollup.config.js

Build time ~36s -> ~0.5s. No runtime behavior change: 5712 unit tests pass,
both entries load via require(), legacy /api consumes them unchanged.

* 👷 ci: Hash packages/api/tsdown.config.mjs in build-api cache keys

The build-api cache keys hashed `packages/api/server-rollup.config.js`,
which never existed (api used `rollup.config.js`, now removed) — a copy-paste
artifact from the data-provider key that matched no file. Replace it with the
new `packages/api/tsdown.config.mjs` so edits to the build config (entry,
format, externals) bust the api build cache, matching the data-schemas key.
2026-06-08 10:54:48 -04:00
Danny Avila
aeb5adff34
🪦 fix: Add Durable MCP Config Tombstones (#13534)
* fix: add durable MCP config tombstones

* fix: preserve scoped config tombstones

* fix: clean up config tombstone lint

* fix: handle empty model spec skill allowlist

* fix: preserve inactive config tombstones
2026-06-05 15:05:40 -04:00
Dustin Healy
2bcf3e8582
🪟 fix: Apply Admin-Panel Config Overrides To YAML-Defined MCP Servers (#13173)
Some checks are pending
Docker Dev Branch Images Build / build (Dockerfile, lc-dev, node) (push) Waiting to run
Docker Dev Branch Images Build / build (Dockerfile.multi, lc-dev-api, api-build) (push) Waiting to run
GitNexus Index / index (push) Waiting to run
GitNexus Index / post-index (push) Blocked by required conditions
* 🪟 fix: Apply Admin-Panel Config Overrides To YAML-Defined MCP Servers

Admin-panel saves of MCP server fields for YAML-defined servers were
silently dropped by the registry. ensureConfigServers filtered out any
merged config entry whose name appeared in YAML, so overrides such as
iconPath, title, and description never reached getAllServerConfigs even
though the override row had been written to the configs collection and
the AppConfig merge layer had produced the correct merged result.

The filter is removed and replaced with a content-equivalence
short-circuit in ensureSingleConfigServer. YAML-defined servers whose
merged config matches the YAML cache entry skip lazy-init, so unmodified
YAML servers still avoid a redundant inspection round trip. The new
private helper matchesYamlConfig reuses the existing content-hash
function on configurable fields only.

getAllServerConfigs now overlays config-tier entries onto the YAML base
while preserving user-DB entries (source: 'user'), giving precedence of
YAML, then Config tier, then User DB. The docstring is updated to
describe the new order.

Multi-tenancy is already enforced upstream of the registry by the
AppConfig layer, so the registry stays tenant-agnostic and overrides
remain isolated per tenant.

Tests cover the new behavior: config-tier override on YAML-defined
server flows through to getAllServerConfigs, YAML servers without
effective overrides skip lazy-init, user-DB entries win over
config-tier overlays, pure config-tier servers still lazy-init, and the
merged config passed to lazy-init preserves all YAML fields when the
override only adds new ones.

* 🛠️ fix: Address Review Feedback For YAML Override Precedence

Both Copilot and Codex flagged matchesYamlConfig as broken in
production: the cached YAML config carries inspector-derived defaults
(requiresOAuth defaulted to false when YAML omits it, serverInstructions
rewritten from the YAML toggle to the fetched server-instructions
string, and so on) that are absent from appConfig.mcpConfig. The
content-hash comparison reports a mismatch for every YAML server with
no admin-panel override, so ensureSingleConfigServer re-inspects all of
them anyway. The optimization never fires in practice and reintroduces
the source tagging it was supposed to avoid.

Remove the short-circuit and the matchesYamlConfig helper. YAML-defined
servers that appear in the merged config go through lazy-init like any
other entry. A smarter optimization that overlays cosmetic-only
override fields onto the YAML cache without re-inspection belongs in a
follow-up once the boundary between configurable fields and
inspector-derived fields is well defined.

Update the existing ensureConfigServers tests that asserted the old
filter behavior (should exclude YAML servers from config-source
detection, should return empty when all servers are YAML) to assert
the new behavior: YAML servers pass through ensureConfigServers and
are lazy-initialized when they appear in the merged config. Make the
inspector mock more realistic by spreading the raw input first and
overlaying only runtime fields, so the test fixtures match production
where the inspector preserves configurable fields. Drop the companion
test in MCPServersRegistry.test.ts that asserted the short-circuit
fires for unchanged YAML servers; the hand-crafted fixture skipped
inspector defaults and was not representative.

Copilot also flagged that getServerConfig short-circuits to
configServers before checking the user DB, so the precedence enforced
in getAllServerConfigs (user-DB beats config-tier) was bypassed in the
single-server lookup. When configServers carries an entry and a userId
is available, check the user DB first and prefer a source: 'user'
entry so per-user servers are never shadowed by an admin-panel
override.

* 🛡️ fix: Harden Admin Override Overlay Against Failure Stubs

Hardens the admin-panel override path against transient inspect failures
and removes a defensive branch that guarded an impossible state.

The getAllServerConfigs overlay now skips failed-inspection stubs so a
healthy YAML or DB entry stays visible during the 5-minute retry window
instead of being clobbered by a stub. When the overlay does land, the
base entry's source tier is preserved, which keeps Tools/mcp.js routing
its failed-inspection recovery to the correct storage location.

The user-DB precedence block in getServerConfig is removed: configServers
is built from appConfig.mcpConfig which only ever carries admin-tier
entries, so the DB lookup defended a state that cannot occur via the
current call graph. The dead yamlServerNames memoization is also gone.

Adds two regression tests covering inspection-failure preservation and
source-tier preservation on successful overlay, and adds a debug log
when an admin override is suppressed by a user-tier entry. The
makeParsedConfig test factory now honors overrides correctly.

* 🧪 test: Strengthen Admin Override Coverage And Docs

Adds an end-to-end regression test that chains MCPServerInspector.inspect
failure through ensureConfigServers and getAllServerConfigs, asserting
the healthy YAML base entry survives a transient inspect failure
intact. The previous regression test hand-built a failure stub and
skipped ensureSingleConfigServer, leaving the production chain itself
untested.

The getAllServerConfigs docstring now spells out both overlay guards
(failed-stub skip and user-tier preservation) and the source-field
preservation contract that downstream recovery logic depends on.
The yamlLangfuseConfig test fixture is frozen so a future test cannot
mutate it and contaminate sibling tests in the describe block.

* 🔧 fix: Skip Lazy-Init For Unchanged YAML MCP Servers

Adds an admin-configurable-field equivalence check so YAML-defined MCP
servers that carry no admin override skip lazy-init in
ensureConfigServers. This avoids the per-request inspect storm and
keeps unmodified YAML servers out of the config-tier cache, so admin
saves that touch unrelated overrides no longer evict and tear down
those YAML connections.

A second guard in getServerConfig prevents failed-inspection stubs in
configServers from shadowing the healthy YAML base entry for the
duration of the retry window. The aggregate path already had this
guard via getAllServerConfigs; this brings the single-server path to
parity, so Tools/mcp.js recovery routes to YAML reinspection rather
than bailing on the config retry timer.

Adds three regression tests covering the unmodified-YAML skip, the
admin-override lazy-init trigger, and the failure-stub fallthrough.
Updates two existing ensureConfigServers tests that previously
documented the now-incorrect "always lazy-init YAML" behavior.

* 🔓 feat: Expose baseOnly Flag On Admin Config Base Endpoint

The admin getBaseConfig handler now reads req.query.baseOnly and
forwards it to getAppConfig so an admin panel client can request the
un-merged YAML and AppService base configuration without DB overrides
applied. The flag is opt-in; existing callers see no behaviour change
because the default remains the merged response.

The query value is coerced through String() so Express array forms
like baseOnly=true&baseOnly=true are treated as false rather than
truthy by accident. A handler test pins the forwarding behaviour and
the default-merged behaviour against future regressions.

* 🧹 fix: Address Codex Review Findings On MCP Registry Precedence Path

Four follow-ups from the Codex review of PR #13173:

P1. getServerConfig now preserves the configServers candidate as a
last-resort fallback when both YAML cache and user DB return nothing,
so admin-defined config-only servers carrying inspectionFailed=true
still surface the failure stub to callers in api/server/services/Tools/mcp.js
that rely on it to return the still-unreachable message. The
not-found memoization is preserved.

P2a. proxy is added to ADMIN_CONFIGURABLE_FIELDS so an admin override
on SSE/streamable-http proxy is no longer treated as an unchanged YAML
server and correctly triggers lazy-init.

P2b. isUnmodifiedYamlServer now treats absent-on-rawConfig fields as
equal, so inspector-derived values on the cached YAML entry
(notably requiresOAuth filled in by detectOAuth at startup) do not
force unmodified YAML servers to re-init on every request.

P3. getBaseConfig parses ?baseOnly strictly against the literal string
true instead of String-coercing, so array shapes like baseOnly[]=true
no longer pass through.

Regression tests cover all four paths.

* 🧹 fix: Drop Misleading Shadow Warning On Config Vs User-DB Collisions

The Config-tier branch of warnOnOperatorManagedNameCollisions logged
that Config MCP servers shadow DB-backed servers, but
getAllServerConfigs actually preserves the user-tier entry on a
Config-vs-user collision and skips the override. The warning was
describing the opposite of what the code does and would mislead
operational debugging.

The YAML-tier call is unchanged because YAML still legitimately
shadows DB-backed servers. The per-entry debug log inside the
collision branch already captures the actual outcome.

Test renamed and rewritten to assert the user-tier entry is
preserved and no shadow warning is emitted.

* 🔒 fix: Keep Tenant-Scoped configServers Candidate Out Of The Global Read-Through Cache

The prior fix for surfacing inspectionFailed stubs from admin-defined
config-only servers wrote the per-call configServers candidate into
readThroughCache when YAML and DB both missed. The cache key is keyed
by serverName plus userId, so a failed stub from one tenant could
satisfy a later no-userId lookup made by another tenant before any
configServers resolution ran.

getServerConfig now caches only the global YAML/DB resolution (still
caching undefined to memoize not-found lookups) and uses the candidate
strictly as an unmemoized function-level fallback that surfaces the
failure stub to the caller without leaking it across tenants.

Regression test exercises a no-userId call after a tenant-scoped
failure and asserts the cache returns undefined rather than the
stub, and that a second tenant sees their own healthy candidate.

* 🔄 fix: Mirror getAllServerConfigs Precedence Exactly In getServerConfig

getServerConfig was short-circuiting with the configServers candidate
on every healthy lookup, which made single-server callers diverge from
the aggregate path for name collisions between config-tier overrides
and user-DB entries. The aggregate path preserves the user-tier entry
on such collisions, so single-server callers saw the admin override
while list views saw the user server for the same name.

getServerConfig now resolves the YAML/DB base first and applies the
same four-step precedence used in getAllServerConfigs:

  1. user-tier base wins absolutely over a config-tier candidate
  2. healthy YAML/DB base wins over a failed (inspectionFailed)
     candidate
  3. healthy candidate overlays its fields onto the base, preserving
     the base entry's source tag so downstream recovery routes to the
     correct storage location
  4. with no base, the candidate is returned as-is for config-only
     servers

readThroughCache still memoizes only the global YAML/DB lookup, so
the per-call configServers candidate never enters the cache and the
tenant-isolation guarantee from the previous fix is preserved.

Regression tests cover the user-wins-over-config case and the
YAML-overlay-with-yaml-source-preserved case.

*  perf: Batch YAML Cache Read In ensureConfigServers

isUnmodifiedYamlServer was calling cacheConfigsRepo.get(serverName)
per entry. In the Redis aggregate-key backend, get() is implemented
as getAll() then map lookup, so N concurrent per-server lookups
inflate into N full-map reads and deserializations on every
ensureConfigServers pass.

The loop now takes a single getAll() snapshot at the top and hands
it into a synchronous isUnmodifiedYamlServer helper, turning O(n)
remote reads into O(1) regardless of how many MCP entries are
resolved. The snapshot also gives the unchanged-YAML comparison
one consistent view of YAML across all entries.

Regression test spies on cacheConfigsRepo.get and asserts it is
never called from ensureConfigServers, with getAll called exactly
once.
2026-05-23 17:11:13 -04:00
Danny Avila
ea28dbfa89
🧹 chore: Clean Up Config Fields (#12537)
* chore: remove unused `interface.endpointsMenu` config field

* chore: address review — restore JSDoc UI-only example, add Zod strip test

* chore: remove unused `interface.sidePanel` config field

* chore: restrict fileStrategy/fileStrategies schema to valid storage backends

* fix: use valid FileStorage value in AppService test

* chore: address review — version bump, exhaustiveness guard, JSDoc, configSchema test

* chore: remove debug logger.log from MessageIcon render path

* fix: rewrite MessageIcon render tests to use render counting instead of logger spying

* chore: bump librechat-data-provider to 0.8.407

* chore: sync example YAML version to 1.3.7
2026-04-03 12:22:58 -04:00
Danny Avila
f8405e731b
🗂️ fix: Allow Empty-Overrides Scope Creation in Admin Config (#12492)
* fix: Allow empty-overrides scope creation when priority is provided

The upsertConfigOverrides handler short-circuited when overrides was
empty, returning a plain message instead of creating the config document.
This broke the admin panel's "create blank scope" flow which sends
`{ overrides: {}, priority: N }` — the missing `config` property in the
response caused an `_id` error on the client.

The early return now only triggers when both overrides are empty and no
priority is provided. Per-section permission checks are scoped to cases
where override sections are actually present.

* test: Add tests for empty-overrides scope creation with priority

* test: Address review nits for empty-overrides scope tests

- Add res.statusCode/res.body assertions to capability-check test
- Add 403/401 tests for empty overrides + priority path
- Use mockResolvedValue(null) for consistency on bare jest.fn()
- Remove narrating comment; fold intent into test name
2026-03-31 21:46:48 -04:00
Dustin Healy
2451bf54cf
🛡️ fix: Restrict System Grants to Role Principals (#12491)
* 🛡️ fix: restrict system grants to role principals only

Narrows GrantPrincipalType to PrincipalType.ROLE, rejecting GROUP and
USER with 400. Removes grant cascade cleanup from group/user deletion
handlers and their route wiring since only roles can hold grants.

* 🛡️ fix: address review findings for grants roles-only restriction

Add missing GROUP rejection test for revokeGrant (symmetric with
getPrincipalGrants and assignGrant coverage), add extensibility comment
to GrantPrincipalType, and document the checkRoleExists guard.
2026-03-31 19:25:14 -04:00
Danny Avila
c0ce7fee91
🚫 refactor: Remove Interface Config from Override Processing (#12473)
Add INTERFACE_PERMISSION_FIELDS set defining the interface fields that
seed role permissions at startup (prompts, agents, marketplace, etc.).
These fields are now stripped from DB config overrides in the merge
layer because updateInterfacePermissions() only runs at boot — DB
overrides for these fields create a client/server permission mismatch.

Pure UI fields (endpointsMenu, modelSelect, parameters, presets,
sidePanel, customWelcome, etc.) continue to work in overrides as
before.

YAML startup path is completely unaffected.
2026-03-31 11:07:31 -04:00
Dustin Healy
3d1b883e9d
👨‍👨‍👦‍👦 feat: Admin Users API Endpoints (#12446)
* feat: add admin user management endpoints

Add /api/admin/users with list, search, and delete handlers gated by
ACCESS_ADMIN + READ_USERS/MANAGE_USERS system grants. Handler factory
in packages/api uses findUsers, countUsers, and deleteUserById from
data-schemas.

* fix: address convention violations in admin users handlers

* fix: add pagination, self-deletion guard, and DB-level search limit

- listUsers now uses parsePagination + countUsers for proper pagination
  matching the roles/groups pattern
- findUsers extended with optional limit/offset options
- deleteUser returns 403 when caller tries to delete own account
- searchUsers passes limit to DB query instead of fetching all and
  slicing in JS
- Fix import ordering per CLAUDE.md, complete logger mock
- Replace fabricated date fallback with undefined

* fix: deterministic sort, null-safe pagination, consistent search filter

- Add sort option to findUsers; listUsers sorts by createdAt desc for
  deterministic pagination
- Use != null guards for offset/limit to handle zero values correctly
- Remove username from search filter since it is not in the projection
  or AdminUserSearchResult response type

* fix: last-admin deletion guard and search query max-length

- Prevent deleting the last admin user (look up target role, count
  admins, reject with 400 if count <= 1)
- Cap search query at 200 characters to prevent regex DoS
- Add tests for both guards

* fix: include missing capability name in 403 Forbidden response

* fix: cascade user deletion cleanup, search username, parallel capability checks

- Cascade Config, AclEntry, and SystemGrant cleanup on user deletion
  (matching the pattern in roles/groups handlers)
- Add username to admin search $or filter for parity with searchUsers
- Parallelize READ_* capability checks in listAllGrants with Promise.all

* fix: TOCTOU safety net, capability info leak, DRY/style cleanup, data-layer tests

- Add post-delete admin recount with CRITICAL log if race leaves 0 admins
- Revert capability name from 403 response to server-side log only
- Document thin deleteUserById limitation (full cascade is a future task)
- DRY: extract query.trim() to local variable in searchUsersHandler
- Add username to search projection, response type, and AdminUserSearchResult
- Functional filter/map in grants.ts parallel capability check
- Consistent null guards and limit>0 guard in findUsers options
- Fallback for empty result.message on delete response
- Fix mockUser() to generate unique _id per call
- Break long destructuring across multiple lines
- Assert countUsers filter and non-admin skip in delete tests
- Add data-layer tests for findUsers limit, offset, sort, and pagination

* chore: comment out admin delete user endpoint (out of scope)

* fix: cast USER principalId to ObjectId for ACL entry cleanup

ACL entries store USER principalId as ObjectId (via grantPermission casting),
but deleteAclEntries is a raw deleteMany that passes the filter through.
Passing a string won't match stored ObjectIds, leaving orphaned entries.

* chore: comment out unused requireManageUsers alongside disabled delete route

* fix: add missing logger.warn mock in capabilities test

* fix: harden admin users handlers — type safety, response consistency, test coverage

- Unify response shape: AdminUserSearchResult.userId → id, add AdminUserListItem type
- Fix unsafe req.query type assertion in searchUsersHandler (typeof guards)
- Anchor search regex with ^ for prefix matching (enables index usage)
- Add total/capped to search response for truncation signaling
- Add parseInt radix, remove redundant new Date() wrap
- Add tests: countUsers throw, countUsers call args, array query param, capped flag

* fix: scope deleteGrantsForPrincipal to tenant, deterministic search sort, align test mocks

- Add tenantId option to AdminUsersDeps.deleteGrantsForPrincipal and
  pass req.user.tenantId at the call site, matching the pattern already
  used by the roles and groups handlers
- Add sort: { name: 1 } to searchUsersHandler for deterministic results
- Align test mock deleteUserById messages with production output
  ('User was deleted successfully.')
- Make capped-results test explicitly set limit: '20' instead of
  relying on the implicit default

* test: add tenantId propagation test for deleteGrantsForPrincipal

Add tenantId to createReqRes user type and test that a non-undefined
tenantId is threaded through to deleteGrantsForPrincipal.

* test: remove redundant deleteUserById override in tenantId test

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-30 23:06:50 -04:00
Dustin Healy
a4a17ac771
⛩️ feat: Admin Grants API Endpoints (#12438)
* feat: add System Grants handler factory with tests

Handler factory with 4 endpoints: getEffectiveCapabilities (expanded
capability set for authenticated user), getPrincipalGrants (list grants
for a specific principal), assignGrant, and revokeGrant. Write ops
dynamically check MANAGE_ROLES/GROUPS/USERS based on target principal
type. 31 unit tests covering happy paths, validation, 403, and errors.

* feat: wire System Grants REST routes

Mount /api/admin/grants with requireJwtAuth + ACCESS_ADMIN gate.
Add barrel export for createAdminGrantsHandlers and AdminGrantsDeps.

* fix: cascade grant cleanup on role deletion

Add deleteGrantsForPrincipal to AdminRolesDeps and call it in
deleteRoleHandler via Promise.allSettled after successful deletion,
matching the groups cleanup pattern. 3 tests added for cleanup call,
skip on 404, and resilience to cleanup failure.

* fix: simplify cascade grant cleanup on role deletion

Replace Promise.allSettled wrapper with a direct try/catch for the
single deleteGrantsForPrincipal call.

* fix: harden grant handlers with auth, validation, types, and RESTful revoke

- Add per-handler auth checks (401) and granular capability gates
  (READ_* for getPrincipalGrants, possession check for assignGrant)
- Extract validatePrincipal helper; rewrite validateGrantBody to use
  direct type checks instead of unsafe `as string` casts
- Align DI types with data layer (ResolvedPrincipal.principalType
  widened to string, getUserPrincipals role made optional)
- Switch revoke route from DELETE body to RESTful URL params
- Return 201 for assignGrant to match roles/groups create convention
- Handle null grantCapability return with 500
- Add comprehensive test coverage for new auth/validation paths

* fix: deduplicate ResolvedPrincipal, typed body, defensive auth checks

- Remove duplicate ResolvedPrincipal from capabilities.ts; import the
  canonical export from grants.ts
- Replace Record<string, unknown> with explicit GrantRequestBody interface
- Add defensive 403 when READ_CAPABILITY_BY_TYPE lookup misses
- Document revoke asymmetry (no possession check) with JSDoc
- Use _id only in resolveUser (avoid Mongoose virtual reliance)
- Improve null-grant error message
- Complete logger mock in tests

* refactor: move ResolvedPrincipal to shared types to fix circular dep

Extract ResolvedPrincipal from admin/grants.ts to types/principal.ts
so middleware/capabilities.ts imports from shared types rather than
depending upward on the admin handler layer.

* chore: remove dead re-export, align logger mocks across admin tests

- Remove unused ResolvedPrincipal re-export from grants.ts (canonical
  source is types/principal.ts)
- Align logger mocks in roles.spec.ts and groups.spec.ts to include
  all log levels (error, warn, info, debug) matching grants.spec.ts

* fix: cascade Config and AclEntry cleanup on role deletion

Add deleteConfig and deleteAclEntries to role deletion cascade,
matching the group deletion pattern. Previously only grants were
cleaned up, leaving orphaned config overrides and ACL entries.

* perf: single-query batch for getEffectiveCapabilities

Add getCapabilitiesForPrincipals (plural) to the data layer — a single
$or query across all principals instead of N+1 parallel queries. Wire
it into the grants handler so getEffectiveCapabilities hits the DB once
regardless of how many principals the user has.

* fix: defer SystemCapabilities access to factory call time

Move all SystemCapabilities usage (VALID_CAPABILITIES,
MANAGE_CAPABILITY_BY_TYPE, READ_CAPABILITY_BY_TYPE) inside the
createAdminGrantsHandlers factory. External test suites that mock
@librechat/data-schemas without providing SystemCapabilities crashed
at import time when grants.ts was loaded transitively.

* test: add data-layer and handler test coverage for review findings

- Add 6 mongodb-memory-server tests for getCapabilitiesForPrincipals:
  multi-principal batch, empty array, filtering, tenant scoping
- Add handler test: all principals filtered (only PUBLIC)
- Add handler test: granting an implied capability succeeds
- Add handler test: all cascade cleanup operations fail simultaneously
- Document platform-scope-only tenantId behavior in JSDoc

* fix: resolveUser fallback to user.id, early-return empty principals

- Match capabilities middleware pattern: _id?.toString() ?? user.id
  to handle JWT-deserialized users without Mongoose _id
- Move empty-array guard before principals.map() to skip unnecessary
  normalizePrincipalId calls
- Add comment explaining VALID_PRINCIPAL_TYPES module-scope asymmetry

* refactor: derive VALID_PRINCIPAL_TYPES from capability maps

Make MANAGE_CAPABILITY_BY_TYPE and READ_CAPABILITY_BY_TYPE
non-Partial Records over a shared GrantPrincipalType union, then
derive VALID_PRINCIPAL_TYPES from the map keys. This makes divergence
between the three data structures structurally impossible.

* feat: add GET /api/admin/grants list-all-grants endpoint

Add listAllGrants data-layer method and handler so the admin panel
can fetch all grants in a single request instead of fanning out
N+M calls per role and group. Response is filtered to only include
grants for principal types the caller has read access to.

* fix: update principalType to use GrantPrincipalType for consistency in grants handling

- Refactor principalType in createAdminGrantsHandlers to use GrantPrincipalType instead of PrincipalType for better type accuracy.
- Ensure type consistency across the grants handling logic in the API.

* fix: address admin grants review findings — tenantId propagation, capability validation, pagination, and test coverage

Propagate tenantId through all grant operations for multi-tenancy support.
Extract isValidCapability to accept full SystemCapability union (base, section,
assign) and reuse it in both Mongoose schema validation and handler input checks.
Replace listAllGrants with paginated listGrants + countGrants. Filter PUBLIC
principals from getCapabilitiesForPrincipals queries. Export getCachedPrincipals
from ALS store for fast-path principal resolution. Move DELETE capability param
to query string to avoid colon-in-URL issues. Remove dead code and add
comprehensive handler and data-layer test coverage.

* refactor: harden admin grants — FilterQuery types, auth-first ordering, DELETE path param, isValidCapability tests

Replace Record<string, unknown> with FilterQuery<ISystemGrant> across all
data-layer query filters. Refactor buildTenantFilter to a pure tenantCondition
function that returns a composable FilterQuery fragment, eliminating the $or
collision between tenant and principal queries. Move auth check before input
validation in getPrincipalGrantsHandler, assignGrantHandler, and
revokeGrantHandler to avoid leaking valid type names to unauthenticated callers.
Switch DELETE route from query param back to path param (/:capability) with
encodeURIComponent per project conventions. Add compound index for listGrants
sort. Type VALID_PRINCIPAL_TYPES as Set<GrantPrincipalType>. Remove unused
GetCachedPrincipalsFn type export. Add dedicated isValidCapability unit tests
and revokeGrant idempotency test.

* refactor: batch capability checks in listGrantsHandler via getHeldCapabilities

Replace 3 parallel hasCapabilityForPrincipals DB calls with a single
getHeldCapabilities query that returns the subset of capabilities any
principal holds. Also: defensive limit(0) clamp, parallelized assignGrant
auth checks, principalId type-vs-required error split, tenantCondition
hoisted to factory top, JSDoc on cascade deps, DELETE route encoding note.

* fix: normalize principalId and filter undefined in getHeldCapabilities

Add normalizePrincipalId + null guard to getHeldCapabilities, matching
the contract of getCapabilitiesForPrincipals. Simplify allCaps build
with flatMap, add no-tenantId cross-check and undefined-principalId
test cases.

* refactor: use concrete types in GrantRequestBody, rename encoding test

Replace unknown fields with explicit string types in GrantRequestBody,
matching the established pattern in roles/groups/config handlers. Rename
misleading 'encoded' test to 'with colons' since Express auto-decodes
req.params.

* fix: support hierarchical parent capabilities in possession checks

hasCapabilityForPrincipals and getHeldCapabilities now resolve parent
base capabilities for section/assignment grants. An admin holding
manage:configs can now grant manage:configs:<section> and transitively
read:configs:<section>. Fixes anti-escalation 403 blocking config
capability delegation.

* perf: use getHeldCapabilities in assignGrant to halve DB round-trips

assignGrantHandler was making two parallel hasCapabilityForPrincipals
calls to check manage + capability possession. getHeldCapabilities was
introduced in this PR specifically for this pattern. Replace with a
single batched call. Update corresponding spec assertions.

* fix: validate role existence before granting capabilities

Grants for non-existent role names were silently persisted, creating
orphaned grants that could surprise-activate if a role with that name
was later created. Add optional checkRoleExists dep to assignGrant and
wire it to getRoleByName in the route file.

* refactor: tighten principalType typing and use grantCapability in tests

Narrow getCapabilitiesForPrincipals parameter from string to
PrincipalType, removing the redundant cast. Replace direct
SystemGrant.create() calls in getCapabilitiesForPrincipals tests with
methods.grantCapability() to honor the schema's normalization invariant.
Add getHeldCapabilities extended capability tests.

* test: rename misleading cascade cleanup test name

The test only injects failure into deleteGrantsForPrincipal, not all
cascade operations. Rename from 'cascade cleanup fails' to 'grant
cleanup fails' to match the actual scope.

* fix: reorder role check after permission guard, add tenantId to index

Move checkRoleExists after the getHeldCapabilities permission check so
that a sub-MANAGE_ROLES admin cannot probe role name existence via
400 vs 403 response codes.

Add tenantId to the { principalType, capability } index so listGrants
queries in multi-tenant deployments can use a covering index instead
of post-scanning for tenant condition.

Add missing test for checkRoleExists throwing.

* fix: scope deleteGrantsForPrincipal to tenant on role deletion

deleteGrantsForPrincipal previously filtered only on principalType +
principalId, deleting grants across all tenants. Since the role schema
supports multi-tenancy (compound unique index on name + tenantId), two
tenants can share a role name like 'editor'. Deleting that role in one
tenant would wipe grants for identically-named roles in other tenants.

Add optional tenantId parameter to deleteGrantsForPrincipal. When
provided, scopes the delete to that tenant plus platform-level grants.
Propagate req.user.tenantId through the role deletion cascade.

* fix: scope grant cleanup to tenant on group deletion

Same cross-tenant gap as the role deletion path: deleteGroupHandler
called deleteGrantsForPrincipal without tenantId, so deleting a group
would wipe its grants across all tenants. Extract req.user.tenantId
and pass it through.

* test: add HTTP integration test for admin grants routes

Supertest-based test with real MongoMemoryServer exercising the full
Express wiring: route registration, injected auth middleware, handler
DI deps, and real DB round-trips.

Covers GET /, GET /effective, POST / + DELETE / lifecycle, role
existence validation, and 401 for unauthenticated callers.

Also documents the expandImplications scope: the /effective endpoint
returns base-level capabilities only; section-level resolution is
handled at authorization check time by getParentCapabilities.

* fix: use exact tenant match in deleteGrantsForPrincipal, normalize principalId, harden API

CRITICAL: deleteGrantsForPrincipal was using tenantCondition (a
read-query helper) for deleteMany, which includes the
{ tenantId: { $exists: false } } arm. This silently destroyed
platform-level grants when a tenant-scoped role/group deletion
occurred. Replace with exact { tenantId } match for deletes so
platform-level grants survive tenant-scoped cascade cleanup.

Refactor deleteGrantsForPrincipal signature from fragile positional
overload (sessionOrTenantId union + maybeSession) to a clean options
object: { tenantId?, session? }. Update all callers and test assertions.

Add normalizePrincipalId to hasCapabilityForPrincipals to match the
pattern already used by getHeldCapabilities — prevents string/ObjectId
type mismatch on USER/GROUP principal queries.

Also: export GrantPrincipalType from barrel, add upper-bound cap to
listGrants, document GROUP/USER existence check trade-off, add
integration tests for tenant-isolation property of deleteGrantsForPrincipal.

* fix: forward tenantId to getUserPrincipals in resolvePrincipals

resolvePrincipals had tenantId available from the caller but only
forwarded it to getCachedPrincipals (cache lookup). The DB fallback
via getUserPrincipals omitted it. While the Group schema's
applyTenantIsolation Mongoose plugin handles scoping via
AsyncLocalStorage in HTTP request context, explicitly passing tenantId
makes the contract visible and prevents silent cross-tenant group
resolution if called outside request context.

* fix: remove unused import and add assertion to 401 integration test

Remove unused SystemCapabilities import flagged by ESLint. Add explicit
body assertion to the 401 test so it has a jest expect() call.

* chore: hoist grant limit constants to scope, remove dead isolateModules

Move GRANTS_DEFAULT_LIMIT / GRANTS_MAX_LIMIT from inside listGrants
function body to createSystemGrantMethods scope so they are evaluated
once at module load. Remove dead jest.isolateModules + jest.doMock
block in integration test — the ~/models mock was never exercised
since handlers are built with explicit DI deps.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-30 16:49:23 -04:00
Danny Avila
fda1bfc3cc
🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451)
* fix(data-schemas): resolve TypeScript strict type check errors in source files

- Constrain ConfigSection to string keys via `string & keyof TCustomConfig`
- Replace broken `z` import from data-provider with TCustomConfig derivation
- Add `_id: Types.ObjectId` to IUser matching other Document interfaces
- Add `federatedTokens` and `openidTokens` optional fields to IUser
- Type mongoose model accessors as `Model<IRole>` and `Model<IUser>`
- Widen `getPremiumRate` param to accept `number | null`
- Widen `bulkWriteAclEntries` ops to untyped `AnyBulkWriteOperation[]`
- Fix `getUserPrincipals` return type to use `PrincipalType` enum
- Add non-null assertions for `connection.db` in migration files
- Import DailyRotateFile constructor directly instead of relying on
  broken module augmentation across mismatched node_modules trees
- Add winston-daily-rotate-file as devDependency for type resolution

* fix(data-schemas): resolve TypeScript type errors in test files

- Replace arbitrary test keys with valid TCustomConfig properties in config.spec
- Use non-null assertions for permission objects in role.methods.spec
- Replace `.SHARED_GLOBAL` access with `.not.toHaveProperty()` for legacy field
- Add non-null assertions for balance, writeRate, readRate in spendTokens.spec
- Update mock user _id to use ObjectId in user.test
- Remove unused Schema import in tenantIndexes.spec

* fix(api): resolve TypeScript strict type check errors across source and test files

- Widen getUserPrincipals dep type in capabilities middleware
- Fix federatedTokens type in createSafeUser return
- Use proper mock req type for read-only properties in preAuthTenant.spec
- Replace `as IUser` casts with ObjectId-typed mocks in openid/oidc specs
- Use TokenExchangeMethodEnum values instead of string literals in MCP specs
- Fix SessionStore type compatibility in sessionCache specs
- Replace `catch (error: any)` with `(error as Error)` in redis specs
- Remove invalid properties from test data in initialize and MCP specs
- Add String.prototype.isWellFormed declaration for sanitizeTitle spec

* fix(client): resolve TypeScript type errors in shared client components

- Add default values for destructured bindings in OGDialogTemplate
- Replace broken ExtendedFile import with inline type in FileIcon

* ci: add TypeScript type-check job to backend review workflow

Add a `typecheck` job that runs `tsc --noEmit` on all four TypeScript
workspaces (data-provider, data-schemas, @librechat/api, @librechat/client)
after the build step. Catches type errors that rollup builds may miss.

* fix(data-schemas): add local type declaration for DailyRotateFile transport

The `winston-daily-rotate-file` package ships a module augmentation for
`winston/lib/winston/transports`, but it fails when winston and
winston-daily-rotate-file resolve from different node_modules trees
(which happens in this monorepo due to npm hoisting).

Add a local `.d.ts` declaration that augments the same module path from
within data-schemas' compilation unit, so `tsc --noEmit` passes while
keeping the original runtime pattern (`new winston.transports.DailyRotateFile`).

* fix: address code review findings from PR #12451

- Restore typed `AnyBulkWriteOperation<AclEntry>[]` on bulkWriteAclEntries,
  cast to untyped only at the tenantSafeBulkWrite call site (Finding 1)
- Type `findUser` model accessor consistently with `findUsers` (Finding 2)
- Replace inline `import('mongoose').ClientSession` with top-level import type
- Use `toHaveLength` for spy assertions in playwright-expect spec file
- Replace numbered Record casts with `.not.toHaveProperty()` in
  role.methods.spec for SHARED_GLOBAL assertions
- Use per-test ObjectIds instead of shared testUserId in openid.spec
- Replace inline `import()` type annotations with top-level SessionData
  import in sessionCache spec
- Remove extraneous blank line in user.ts searchUsers

* refactor: address remaining review findings (4–7)

- Extract OIDCTokens interface in user.ts; deduplicate across IUser fields
  and oidc.ts FederatedTokens (Finding 4)
- Move String.isWellFormed declaration from spec file to project-level
  src/types/es2024-string.d.ts (Finding 5)
- Replace verbose `= undefined` defaults in OGDialogTemplate with null
  coalescing pattern (Finding 6)
- Replace `Record<string, unknown>` TestConfig with named interface
  containing explicit test fields (Finding 7)
2026-03-28 21:06:39 -04:00
Dustin Healy
5972a21479
🪪 feat: Admin Roles API Endpoints (#12400)
* feat: add createRole and deleteRole methods to role

* feat: add admin roles handler factory and Express routes

* fix: address convention violations in admin roles handlers

* fix: rename createRole/deleteRole to avoid AccessRole name collision

The existing accessRole.ts already exports createRole/deleteRole for the
AccessRole model. In createMethods index.ts, these are spread after
roleMethods, overwriting them. Renamed our Role methods to
createRoleByName/deleteRoleByName to match the existing pattern
(getRoleByName, updateRoleByName) and avoid the collision.

* feat: add description field to Role model

- Add description to IRole, CreateRoleRequest, UpdateRoleRequest types
- Add description field to Mongoose roleSchema (default: '')
- Wire description through createRoleHandler and updateRoleHandler
- Include description in listRoles select clause so it appears in list

* fix: address Copilot review findings in admin roles handlers

* test: add unit tests for admin roles and groups handlers

* test: add data-layer tests for createRoleByName, deleteRoleByName, listUsersByRole

* fix: allow system role updates when name is unchanged

The updateRoleHandler guard rejected any request where body.name matched
a system role, even when the name was not being changed. This blocked
editing a system role's description. Compare against the URL param to
only reject actual renames to reserved names.

* fix: address external review findings for admin roles

- Block renaming system roles (ADMIN/USER) and add user migration on rename
- Add input validation: name max-length, trim on update, duplicate name check
- Replace fragile String.includes error matching with prefix-based classification
- Catch MongoDB 11000 duplicate key in createRoleByName
- Add pagination (limit/offset/total) to getRoleMembersHandler
- Reverse delete order in deleteRoleByName — reassign users before deletion
- Add role existence check in removeRoleMember; drop unused createdAt select
- Add Array.isArray guard for permissions input; use consistent ?? coalescing
- Fix import ordering per AGENTS.md conventions
- Type-cast mongoose.models.User as Model<IUser> for proper TS inference
- Add comprehensive tests: rename guards, pagination, validation, 500 paths

* fix: address re-review findings for admin roles

- Gate deleteRoleByName on existence check — skip user reassignment and
  cache invalidation when role doesn't exist (fixes test mismatch)
- Reverse rename order: migrate users before renaming role so a migration
  failure leaves the system in a consistent state
- Add .sort({ _id: 1 }) to listUsersByRole for deterministic pagination
- Import shared AdminMember type from data-schemas instead of local copy;
  make joinedAt optional since neither groups nor roles populate it
- Change IRole.description from optional to required to match schema default
- Add data-layer tests for updateUsersByRole and countUsersByRole
- Add handler test verifying users-first rename ordering and migration
  failure safety

* fix: add rollback on rename failure and update PR description

- Roll back user migration if updateRoleByName returns null during a
  rename (race: role deleted between existence check and update)
- Add test verifying rollback calls updateUsersByRole in reverse
- Update PR #12400 description to reflect current test counts (56
  handler tests, 40 data-layer tests) and safety features

* fix: rollback on rename throw, description validation, delete/DRY cleanup

- Hoist isRename/trimmedName above try block so catch can roll back user
  migration when updateRoleByName throws (not just returns null)
- Add description type + max-length (2000) validation in create and update,
  consistent with groups handler
- Remove redundant getRoleByName existence check in deleteRoleHandler —
  use deleteRoleByName return value directly
- Skip no-op name write when body.name equals current name (use isRename)
- Extract getUserModel() accessor to DRY repeated Model<IUser> casts
- Use name.trim() consistently in createRoleByName error messages
- Add tests: rename-throw rollback, description validation (create+update),
  update delete test mocks to match simplified handler

* fix: guard spurious rollback, harden createRole error path, validate before DB calls

- Add migrationRan flag to prevent rollback of user migration that never ran
- Return generic message on 500 in createRoleHandler, specific only for 409
- Move description validation before DB queries in updateRoleHandler
- Return existing role early when update body has no changes
- Wrap cache.set in createRoleByName with try/catch to prevent masking DB success
- Add JSDoc on 11000 catch explaining compound unique index
- Add tests: spurious rollback guard, empty update body, description validation
  ordering, listUsersByRole pagination

* fix: validate permissions in create, RoleConflictError, rollback safety, cache consistency

- Add permissions type/array validation in createRoleHandler
- Introduce RoleConflictError class replacing fragile string-prefix matching
- Wrap rollback in !role null path with try/catch for correct 404 response
- Wrap deleteRoleByName cache.set in try/catch matching createRoleByName
- Narrow updateRoleHandler body type to { name?, description? }
- Add tests: non-string description in create, rollback failure logging,
  permissions array rejection, description max-length assertion fix

* feat: prevent removing the last admin user

Add guard in removeRoleMember that checks countUsersByRole before
demoting an ADMIN user, returning 400 if they are the last one.

* fix: move interleaved export below imports, add await to countUsersByRole

* fix: paginate listRoles, null-guard permissions handler, fix export ordering

- Add limit/offset/total pagination to listRoles matching the groups pattern
- Add countRoles data-layer method
- Omit permissions from listRoles select (getRole returns full document)
- Null-guard re-fetched role in updateRolePermissionsHandler
- Move interleaved export below all imports in methods/index.ts

* fix: address review findings — race safety, validation DRY, type accuracy, test coverage

- Add post-write admin count verification in removeRoleMember to prevent
  zero-admin race condition (TOCTOU → rollback if count hits 0)
- Make IRole.description optional; backfill in initializeRoles for
  pre-existing roles that lack the field (.lean() bypasses defaults)
- Extract parsePagination, validateNameParam, validateRoleName, and
  validateDescription helpers to eliminate duplicated validation
- Add validateNameParam guard to all 7 handlers reading req.params.name
- Catch 11000 in updateRoleByName and surface as 409 via RoleConflictError
- Add idempotent skip in addRoleMember when user already has target role
- Verify updateRolePermissions test asserts response body
- Add data-layer tests: listRoles sort/pagination/projection, countRoles,
  and createRoleByName 11000 duplicate key race

* fix: defensive rollback in removeRoleMember, type/style cleanup, test coverage

- Wrap removeRoleMember post-write admin rollback in try/catch so a
  transient DB failure cannot leave the system with zero administrators
- Replace double `as unknown[] as IRole[]` cast with `.lean<IRole[]>()`
- Type parsePagination param explicitly; extract DEFAULT/MAX page constants
- Preserve original error cause in updateRoleByName re-throw
- Add test for rollback failure path in removeRoleMember (returns 400)
- Add test for pre-existing roles missing description field (.lean())

* chore: bump @librechat/data-schemas to 0.0.47

* fix: stale cache on rename, extract renameRole helper, shared pagination, cleanup

- Fix updateRoleByName cache bug: invalidate old key and populate new key
  when updates.name differs from roleName (prevents stale cache after rename)
- Extract renameRole helper to eliminate mutable outer-scope state flags
  (isRename, trimmedName, migrationRan) in updateRoleHandler
- Unify system-role protection to 403 for both rename-from and rename-to
- Extract parsePagination to shared admin/pagination.ts; use in both
  roles.ts and groups.ts
- Extract name.trim() to local const in createRoleByName (was called 5×)
- Remove redundant findOne pre-check in deleteRoleByName
- Replace getUserModel closure with local const declarations
- Remove redundant description ?? '' in createRoleHandler (schema default)
- Add doc comment on updateRolePermissionsHandler noting cache dependency
- Add data-layer tests for cache rename behavior (old key null, new key set)

* fix: harden role guards, add User.role index, validate names, improve tests

- Add index on User.role field for efficient member queries at scale
- Replace fragile SystemRoles key lookup with value-based Set check (6 sites)
- Elevate rename rollback failure logging to CRITICAL (matches removeRoleMember)
- Guard removeRoleMember against non-ADMIN system roles (403 for USER)
- Fix parsePagination limit=0 gotcha: use parseInt + NaN check instead of ||
- Add control character and reserved path segment validation to role names
- Simplify validateRoleName: remove redundant casts and dead conditions
- Add JSDoc to deleteRoleByName documenting non-atomic window
- Split mixed value+type import in methods/index.ts per AGENTS.md
- Add 9 new tests: permissions assertion, combined rename+desc, createRole
  with permissions, pagination edge cases, control char/reserved name
  rejection, system role removeRoleMember guard

* fix: exact-case reserved name check, consistent validation, cleaner createRole

- Remove .toLowerCase() from reserved name check so only exact matches
  (members, permissions) are rejected, not legitimate names like "Members"
- Extract trimmed const in validateRoleName for consistent validation
- Add control char check to validateNameParam for parity with body validation
- Build createRole roleData conditionally to avoid passing description: undefined
- Expand deleteRoleByName JSDoc documenting self-healing design and no-op trade-off

* fix: scope rename rollback to only migrated users, prevent cross-role corruption

Capture user IDs before forward migration so the rollback path only
reverts users this request actually moved. Previously the rollback called
updateUsersByRole(newName, currentName) which would sweep all users with
the new role — including any independently assigned by a concurrent admin
request — causing silent cross-role data corruption.

Adds findUserIdsByRole and updateUsersRoleByIds to the data layer.
Extracts rollbackMigratedUsers helper to deduplicate rollback sites.

* fix: guard last admin in addRoleMember to prevent zero-admin lockout

Since each user has exactly one role, addRoleMember implicitly removes
the user from their current role. Without a guard, reassigning the sole
admin to a non-admin role leaves zero admins and locks out admin
management. Adds the same countUsersByRole check used in removeRoleMember.

* fix: wire findUserIdsByRole and updateUsersRoleByIds into roles route

The scoped rollback deps added in c89b5db were missing from the route
DI wiring, causing renameRole to call undefined and return a 500.

* fix: post-write admin guard in addRoleMember, compound role index, review cleanup

- Add post-write admin count check + rollback to addRoleMember to match
  removeRoleMember's two-phase TOCTOU protection (prevents zero-admin via
  concurrent requests)
- Replace single-field User.role index with compound { role: 1, tenantId: 1 }
  to align with existing multi-tenant index pattern (email, OAuth IDs)
- Narrow listRoles dep return type to RoleListItem (projected fields only)
- Refactor validateDescription to early-return style per AGENTS.md
- Remove redundant double .lean() in updateRoleByName
- Document rename snapshot race window in renameRole JSDoc
- Document cache null-set behavior in deleteRoleByName
- Add routing-coupling comment on RESERVED_ROLE_NAMES
- Add test for addRoleMember post-write rollback

* fix: review cleanup — system-role guard, type safety, JSDoc accuracy, tests

- Add system-role guard to addRoleMember: block direct assignment to
  non-ADMIN system roles (403), symmetric with removeRoleMember
- Fix RESERVED_ROLE_NAMES comment: explain semantic URL ambiguity, not
  a routing conflict (Express resolves single vs multi-segment correctly)
- Replace _id: unknown with Types.ObjectId | string per AGENTS.md
- Narrow listRoles data-layer return type to Pick<IRole, 'name' | 'description'>
  to match the actual .select() projection
- Move updateRoleHandler param check inside try/catch for consistency
- Include user IDs in all CRITICAL rollback failure logs for operator recovery
- Clarify deleteRoleByName JSDoc: replace "self-healing" with "idempotent",
  document that recovery requires caller retry
- Add tests: system-role guard, promote non-admin to ADMIN,
  findUserIdsByRole throw prevents migration

* fix: include _id in listRoles return type to match RoleListItem

Pick<IRole, 'name' | 'description'> omits _id, making it incompatible
with the handler dep's RoleListItem which requires _id.

* fix: case-insensitive system role guard, reject null permissions, check updateUser result

- System role name checks now use case-insensitive comparison via
  toUpperCase() — prevents creating 'admin' or 'user' which would
  collide with the legacy roles route that uppercases params
- Reject permissions: null in createRole (typeof null === 'object'
  was bypassing the validation)
- Check updateUser return in addRoleMember — return 404 if the user
  was deleted between the findUser and updateUser calls

* fix: check updateUser return in removeRoleMember for concurrent delete safety

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
2026-03-27 15:44:47 -04:00
Dustin Healy
2e3d66cfe2
👥 feat: Admin Groups API Endpoints (#12387)
* feat: add listGroups and deleteGroup methods to userGroup

* feat: add admin groups handler factory and Express routes

* fix: address convention violations in admin groups handlers

* fix: address Copilot review findings in admin groups handlers

- Escape regex in listGroups to prevent injection/ReDoS
- Validate ObjectId format in all handlers accepting id/userId params
- Replace N+1 findUser loop with batched findUsers query
- Remove unused findGroupsByMemberId from dep interface
- Map Mongoose ValidationError to 400 in create/update handlers
- Validate name in updateGroupHandler (reject empty/whitespace)
- Handle null updateGroupById result (race condition)
- Tighten error message matching in add/remove member handlers

* test: add unit tests for admin groups handlers

* fix: address code review findings for admin groups

Atomic delete/update handlers (single DB trip), pass through
idOnTheSource, add removeMemberById for non-ObjectId members,
deduplicate member results, fix error message exposure, add hard
cap/sort to listGroups, replace GroupListFilter with Pick of
GroupFilterOptions, validate memberIds as array, trim name in
update, fix import order, and improve test hygiene with fresh
IDs per test.

* fix: cascade cleanup, pagination, and test coverage for admin groups

Add deleteGrantsForPrincipal to systemGrant data layer and wire cascade
cleanup (Config, AclEntry, SystemGrant) into deleteGroupHandler. Add
limit/offset pagination to getGroupMembers. Guard empty PATCH bodies with
400. Remove dead type guard and unnecessary type cast. Add 11 new tests
covering cascade delete, idempotent member removal, empty update, search
filter, 500 error paths, and pagination.

* fix: harden admin groups with cascade resilience, type safety, and fallback removal

Wrap cascade cleanup in inner try/catch so partial failure logs but still
returns 200 (group is already deleted). Replace Record<string, unknown> on
deleteAclEntries with proper typed filter. Log warning for unmapped user
ObjectIds in createGroup memberIds. Add removeMemberById fallback when
removeUserFromGroup throws User not found for ObjectId-format userId.
Extract VALID_GROUP_SOURCES constant. Add 3 new tests (60 total).

* refactor: add countGroups, pagination, and projection type to data layer

Extract buildGroupQuery helper, add countGroups method, support
limit/offset/skip in listGroups, standardize session handling to
.session(session ?? null), and tighten projection parameter from
Record<string, unknown> to Record<string, 0 | 1>.

* fix: cascade resilience, pagination, validation, and error clarity for admin groups

- Use Promise.allSettled for cascade cleanup so all steps run even if
  one fails; log individual rejections
- Echo deleted group id in delete response
- Add countGroups dep and wire limit/offset pagination for listGroups
- Deduplicate memberIds before computing total in getGroupMembers
- Use { memberIds: 1 } projection in getGroupMembers
- Cap memberIds at 500 entries in createGroup
- Reject search queries exceeding 200 characters
- Clarify addGroupMember error for non-ObjectId userId
- Document deleted-user fallback limitation in removeGroupMember

* test: extend handler and DB-layer test coverage for admin groups

Handler tests: projection assertion, dedup total, memberIds cap,
search max length, non-ObjectId memberIds passthrough, cascade partial
failure resilience, dedup scenarios, echo id in delete response.

DB-layer tests: listGroups sort/filter/pagination, countGroups,
deleteGroup, removeMemberById, deleteGrantsForPrincipal.

* fix: cast group principalId to ObjectId for ACL entry cleanup

deleteAclEntries is a thin deleteMany wrapper with no type casting,
but grantPermission stores group principalId as ObjectId. Passing the
raw string from req.params would leave orphaned ACL entries on group
deletion.

* refactor: remove redundant pagination clamping from DB listGroups

Handler already clamps limit/offset at the API boundary. The DB
method is a general-purpose building block and should not re-validate.

* fix: add source and name validation, import order, and test coverage for admin groups

- Validate source against VALID_GROUP_SOURCES in createGroupHandler
- Cap name at 500 characters in both create and update handlers
- Document total as upper bound in getGroupMembers response
- Document ObjectId requirement for deleteAclEntries in cascade
- Fix import ordering in test file (local value after type imports)
- Add tests for updateGroup with description, email, avatar fields
- Add tests for invalid source and name max-length in both handlers

* fix: add field length caps, flatten nested try/catch, and fix logger level in admin groups

Add max-length validation for description, email, avatar, and
idOnTheSource in create/update handlers. Extract removeObjectIdMember
helper to flatten nested try/catch per never-nesting convention. Downgrade
unmapped-memberIds log from error to warn. Fix type import ordering and
add missing await in removeMemberById for consistency.
2026-03-26 17:36:18 -04:00
Danny Avila
9f6d8c6e93
🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407)
* feat: add tenant context middleware for ALS-based isolation

Introduces tenantContextMiddleware that propagates req.user.tenantId
into AsyncLocalStorage, activating the Mongoose applyTenantIsolation
plugin for all downstream DB queries within a request.

- Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId
- Non-strict mode passes through for backward compatibility
- No-op for unauthenticated requests
- Includes 6 unit tests covering all paths

* feat: register tenant middleware and wrap startup/auth in runAsSystem()

- Register tenantContextMiddleware in Express app after capability middleware
- Wrap server startup initialization in runAsSystem() for strict mode compat
- Wrap auth strategy getAppConfig() calls in runAsSystem() since they run
  before user context is established (LDAP, SAML, OpenID, social login, AuthService)

* feat: thread tenantId through all getAppConfig callers

Pass tenantId from req.user to getAppConfig() across all callers that
have request context, ensuring correct per-tenant cache key resolution.

Also fixes getBaseConfig admin endpoint to scope to requesting admin's
tenant instead of returning the unscoped base config.

Files updated:
- Controllers: UserController, PluginController
- Middleware: checkDomainAllowed, balance
- Routes: config
- Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP
- Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech
- Admin: getBaseConfig endpoint

* feat: add config cache invalidation on admin mutations

- Add clearOverrideCache(tenantId?) to flush per-principal override caches
  by enumerating Keyv store keys matching _OVERRIDE_: prefix
- Add invalidateConfigCaches() helper that clears base config, override
  caches, tool caches, and endpoint config cache in one call
- Wire invalidation into all 5 admin config mutation handlers
  (upsert, patch, delete field, delete overrides, toggle active)
- Add strict mode warning when __default__ tenant fallback is used
- Add 3 new tests for clearOverrideCache (all/scoped/base-preserving)

* chore: update getUserPrincipals comment to reflect ALS-based tenant filtering

The TODO(#12091) about missing tenantId filtering is resolved by the
tenant context middleware + applyTenantIsolation Mongoose plugin.
Group queries are now automatically scoped by tenantId via ALS.

* fix: replace runAsSystem with baseOnly for pre-tenant code paths

App configs are tenant-owned — runAsSystem() would bypass tenant
isolation and return cross-tenant DB overrides. Instead, add
baseOnly option to getAppConfig() that returns YAML-derived config
only, with zero DB queries.

All startup code, auth strategies, and MCP initialization now use
getAppConfig({ baseOnly: true }) to get the YAML config without
touching the Config collection.

* fix: address PR review findings — middleware ordering, types, cache safety

- Chain tenantContextMiddleware inside requireJwtAuth after passport auth
  instead of global app.use() where req.user is always undefined (Finding 1)
- Remove global tenantContextMiddleware registration from index.js
- Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4)
- Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3)
- Use startsWith instead of includes for cache key filtering (Finding 12)
- Use generator loop instead of Array.from for key enumeration (Finding 3)
- Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5)
- Move isMainThread check to module level, remove per-request check (Finding 9)
- Move mid-file require to top of app.js (Finding 8)
- Parallelize invalidateConfigCaches with Promise.all (Finding 10)
- Remove clearOverrideCache from public app.js exports (internal only)
- Strengthen getUserPrincipals comment re: ALS dependency (Finding 2)

* fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly

- Restore runAsSystem() around performStartupChecks, updateInterfacePermissions,
  initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose
  queries that need system context in strict tenant mode (NEW-3)
- Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1)
- Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2)

* test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests

- requireJwtAuth: 5 tests verifying ALS tenant context is set after
  passport auth, isolated between concurrent requests, and not set
  when user has no tenantId (Finding 6)
- invalidateConfigCaches: 4 tests verifying all four caches are cleared,
  tenantId is threaded through, partial failure is handled gracefully,
  and operations run in parallel via Promise.all (Finding 11)

* fix: address Copilot review — passport errors, namespaced cache keys, /base scoping

- Forward passport errors in requireJwtAuth before entering tenant
  middleware — prevents silent auth failures from reaching handlers (P1)
- Account for Keyv namespace prefix in clearOverrideCache — stored keys
  are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...",
  so override caches were never actually matched/cleared (P2)
- Remove role from getBaseConfig — /base should return tenant-scoped
  base config, not role-merged config that drifts per admin role (P2)
- Return tenantStorage.run() for cleaner async semantics
- Update mock cache in service.spec.ts to simulate Keyv namespacing

* fix: address second review — cache safety, code quality, test reliability

- Decouple cache invalidation from mutation response: fire-and-forget
  with logging so DB mutation success is not masked by cache failures
- Extract clearEndpointConfigCache helper from inline IIFE
- Move isMainThread check to lazy once-per-process guard (no import
  side effect)
- Memoize process.env read in overrideCacheKey to avoid per-request
  env lookups and log flooding in strict mode
- Remove flaky timer-based parallelism assertion, use structural check
- Merge orphaned double JSDoc block on getUserPrincipals
- Fix stale [getAppConfig] log prefix → [ensureBaseConfig]
- Fix import order in tenant.spec.ts (package types before local values)
- Replace "Finding 1" reference with self-contained description
- Use real tenantStorage primitives in requireJwtAuth spec mock

* fix: move JSDoc to correct function after clearEndpointConfigCache extraction

* refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry

Redis SCAN causes 60s+ stalls under concurrent load (see #12410).
APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the
in-memory store.keys() path handles the standard case. When APP_CONFIG
is Redis-backed, overrides expire naturally via overrideCacheTtl (60s
default) — an acceptable window for admin config mutations.

* fix: remove return from tenantStorage.run to satisfy void middleware signature

* fix: address second review — cache safety, code quality, test reliability

- Switch invalidateConfigCaches from Promise.all to Promise.allSettled
  so partial failures are logged individually instead of producing one
  undifferentiated error (Finding 3)
- Gate overrideCacheKey strict-mode warning behind a once-per-process
  flag to prevent log flooding under load (Finding 4)
- Add test for passport error forwarding in requireJwtAuth — the
  if (err) { return next(err) } branch now has coverage (Finding 5)
- Add test for real partial failure in invalidateConfigCaches where
  clearAppConfigCache rejects (not just the swallowed endpoint error)

* chore: reorder imports in index.js and app.js for consistency

- Moved logger and runAsSystem imports to maintain a consistent import order across files.
- Improved code readability by ensuring related imports are grouped together.
2026-03-26 17:35:00 -04:00
Danny Avila
4b6d68b3b5
🎛️ feat: DB-Backed Per-Principal Config System (#12354)
*  feat: Add Config schema, model, and methods for role-based DB config overrides

Add the database foundation for principal-based configuration overrides
(user, group, role) in data-schemas. Includes schema with tenantId and
tenant isolation, CRUD methods, and barrel exports.

* 🔧 fix: Add shebang and enforce LF line endings for git hooks

The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was
converting it to CRLF, both causing "Exec format error" on Windows.
Add .gitattributes to force LF for .husky/* and *.sh files.

*  feat: Add admin config API routes with section-level capability checks

Add /api/admin/config endpoints for managing per-principal config
overrides (user, group, role). Handlers in @librechat/api use DI pattern
with section-level hasConfigCapability checks for granular access control.

Supports full overrides replacement, per-field PATCH via dot-paths, field
deletion, toggle active, and listing.

* 🐛 fix: Move deleteConfigField fieldPath from URL param to request body

The path-to-regexp wildcard syntax (:fieldPath(*)) is not supported by
the version used in Express. Send fieldPath in the DELETE request body
instead, which also avoids URL-encoding issues with dotted paths.

*  feat: Wire config resolution into getAppConfig with override caching

Add mergeConfigOverrides utility in data-schemas for deep-merging DB
config overrides into base AppConfig by priority order.

Update getAppConfig to query DB for applicable configs when role/userId
is provided, with short-TTL caching and a hasAnyConfigs feature flag
for zero-cost when no DB configs exist.

Also: add unique compound index on Config schema, pass userId from
config middleware, and signal config changes from admin API handlers.

* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service

Move override resolution, caching strategy, and signalConfigChange from
api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts
using the DI factory pattern (createAppConfigService). The JS file becomes
a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies.

* 🧹 chore: Rename configResolution.ts to resolution.ts

*  feat: Move admin types & capabilities to librechat-data-provider

Move SystemCapabilities, CapabilityImplications, and utility functions
(hasImpliedCapability, expandImplications) from data-schemas to
data-provider so they are available to external consumers like the
admin panel without a data-schemas dependency.

Add API-friendly admin types: TAdminConfig, TAdminSystemGrant,
TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult,
TCapabilityCategory, and CAPABILITY_CATEGORIES.

data-schemas re-exports these from data-provider and extends with
config-schema-derived types (ConfigSection, SystemCapability union).

Bump version to 0.8.500.

* feat: Add JSON-serializable admin config API response types to data-schemas

Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and
AdminConfigDeleteResponse types so both LibreChat API handlers and the
admin panel can share the same response contract. Bump version to 0.0.41.

* refactor: Move admin capabilities & types from data-provider to data-schemas

SystemCapabilities, CapabilityImplications, utility functions,
CAPABILITY_CATEGORIES, and admin API response types should not be in
data-provider as it gets compiled into the frontend bundle, exposing
the capability surface. Moved everything to data-schemas (server-only).

All consumers already import from @librechat/data-schemas, so no
import changes needed elsewhere. Consolidated duplicate AdminConfig
type (was in both config.ts and admin.ts).

* chore: Bump @librechat/data-schemas to 0.0.42

* refactor: Reorganize admin capabilities into admin/ and types/admin.ts

Split systemCapabilities.ts following data-schemas conventions:
- Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.)
  → src/types/admin.ts
- Runtime code (SystemCapabilities, CapabilityImplications, utilities)
  → src/admin/capabilities.ts

Revert data-provider version to 0.8.401 (no longer modified).

* chore: Fix import ordering, rename appConfigService to service

- Rename app/appConfigService.ts → app/service.ts (directory provides context)
- Fix import order in admin/config.ts, types/admin.ts, types/config.ts
- Add naming convention to AGENTS.md

* feat: Add DB base config support (role/__base__)

- Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc
- getApplicableConfigs always includes __base__ in queries
- getAppConfig queries DB even without role/userId when DB configs exist
- Bump @librechat/data-schemas to 0.0.43

* fix: Address PR review issues for admin config

- Add listAllConfigs method; listConfigs endpoint returns all active
  configs instead of only __base__
- Normalize principalId to string in all config methods to prevent
  ObjectId vs string mismatch on user/group lookups
- Block __proto__ and all dunder-prefixed segments in field path
  validation to prevent prototype pollution
- Fix configVersion off-by-one: default to 0, guard pre('save') with
  !isNew, use $inc on findOneAndUpdate
- Remove unused getApplicableConfigs from admin handler deps

* fix: Enable tree-shaking for data-schemas, bump packages

- Switch data-schemas Rollup output to preserveModules so each source
  file becomes its own chunk; consumers (admin panel) can now import
  just the modules they need without pulling in winston/mongoose/etc.
- Add sideEffects: false to data-schemas package.json
- Bump data-schemas to 0.0.44, data-provider to 0.8.402

* feat: add capabilities subpath export to data-schemas

Adds `@librechat/data-schemas/capabilities` subpath export so browser
consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants
without pulling in Node.js-only modules (winston, async_hooks, etc.).

Bump version to 0.0.45.

* fix: include dist/ in data-provider npm package

Add explicit files field so npm includes dist/types/ in the published
package. Without this, the root .gitignore exclusion of dist/ causes
npm to omit type declarations, breaking TypeScript consumers.

* chore: bump librechat-data-provider to 0.8.403

* feat: add GET /api/admin/config/base for raw AppConfig

Returns the full AppConfig (YAML + DB base merged) so the admin panel
can display actual config field values and structure. The startup config
endpoint (/api/config) returns TStartupConfig which is a different shape
meant for the frontend app.

* chore: imports order

* fix: address code review findings for admin config

Critical:
- Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE
  instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys
- Eliminate race condition: patchConfigField and deleteConfigField now use
  atomic MongoDB $set/$unset with dot-path notation instead of
  read-modify-write cycles, removing the lost-update bug entirely
- Add patchConfigFields and unsetConfigField atomic DB methods

Major:
- Reorder cache check before principal resolution in getAppConfig so
  getUserPrincipals DB query only fires on cache miss
- Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant
- Parallelize capability checks with Promise.all instead of sequential
  awaits in for loops
- Use loose equality (== null) for cache miss check to handle both null
  and undefined returns from cache implementations
- Set HAS_DB_CONFIGS_KEY to true on successful config fetch

Minor:
- Remove dead pre('save') hook from config schema (all writes use
  findOneAndUpdate which bypasses document hooks)
- Consolidate duplicate type imports in resolution.ts
- Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops)
- Add .sort({ priority: 1 }) to getApplicableConfigs query
- Rename _impliedBy to impliedByMap

* fix: self-referencing BROAD_CONFIG_ACCESS constant

* fix: replace type-cast sentinel with proper null parameter

Update hasConfigCapability to accept ConfigSection | null where null
means broad access check (MANAGE_CONFIGS or READ_CONFIGS only).
Removes the '' as ConfigSection type lie from admin config handlers.

* fix: remaining review findings + add tests

- listAllConfigs accepts optional { isActive } filter so admin listing
  can show inactive configs (#9)
- Standardize session application to .session(session ?? null) across
  all config DB methods (#15)
- Export isValidFieldPath and getTopLevelSection for testability
- Add 38 tests across 3 spec files:
  - config.spec.ts (api): path validation, prototype pollution rejection
  - resolution.spec.ts: deep merge, priority ordering, array replacement
  - config.spec.ts (data-schemas): full CRUD, ObjectId normalization,
    atomic $set/$unset, configVersion increment, toggle, __base__ query

* fix: address second code review findings

- Fix cross-user cache contamination: overrideCacheKey now handles
  userId-without-role case with its own cache key (#1)
- Add broad capability check before DB lookup in getConfig to prevent
  config existence enumeration (#2/#3)
- Move deleteConfigField fieldPath from request body to query parameter
  for proxy/load balancer compatibility (#5)
- Derive BaseSystemCapability from SystemCapabilities const instead of
  manual string union (#6)
- Return 201 on upsert creation, 200 on update (#11)
- Remove inline narration comments per AGENTS.md (#12)
- Type overrides as Partial<TCustomConfig> in DB methods and handler
  deps (#13)
- Replace double as-unknown-as casts in resolution.ts with generic
  deepMerge<T> (#14)
- Make override cache TTL injectable via AppConfigServiceDeps (#16)
- Add exhaustive never check in principalModel switch (#17)

* fix: remaining review findings — tests, rename, semantics

- Rename signalConfigChange → markConfigsDirty with JSDoc documenting
  the stale-window tradeoff and overrideCacheTtl knob
- Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention
- Add createAppConfigService tests (14 cases): cache behavior, feature
  flag, cross-user key isolation, fallback on error, markConfigsDirty
- Add admin handler integration tests (13 cases): auth ordering,
  201/200 on create/update, fieldPath from query param, markConfigsDirty
  calls, capability checks

* fix: global flag corruption + empty overrides auth bypass

- Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning
  no configs does not mean no configs exist globally. Setting the flag
  false from a per-principal query short-circuited all subsequent users.
- Add broad manage capability check before section checks in
  upsertConfigOverrides: empty overrides {} no longer bypasses auth.

* test: add regression and invariant tests for config system

Regression tests:
- Bug 1: User A's empty result does not short-circuit User B's overrides
- Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS

Invariant tests (applied across ALL handlers):
- All 5 mutation handlers call markConfigsDirty on success
- All 5 mutation handlers return 401 without auth
- All 5 mutation handlers return 403 without capability
- All 3 read handlers return 403 without capability

* fix: third review pass — all findings addressed

Service (service.ts):
- Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId)
  so deployments with zero DB configs skip DB queries (#1)
- Resolve cache once at factory init instead of per-invocation (#8)
- Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10)
- Add JSDoc to clearAppConfigCache documenting stale-window (#4)
- Fix log message to not say "from YAML" (#14)

Admin handlers (config.ts):
- Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2)
- Add Array.isArray guard on overrides body (#5)
- Import CapabilityUser from capabilities.ts, remove duplicate (#6)
- Replace as-unknown-as cast with targeted type assertion (#7)
- Add MAX_PATCH_ENTRIES=100 cap on entries array (#15)
- Reorder deleteConfigField to validate principalType first (#12)
- Export CapabilityUser from middleware/capabilities.ts

DB methods (config.ts):
- Remove isActive:true from patchConfigFields to prevent silent
  reactivation of disabled configs (#3)

Schema (config.ts):
- Change principalId from Schema.Types.Mixed to String (#11)

Tests:
- Add patchConfigField unsafe fieldPath rejection test (#9)
- Add base-only HAS_DB_CONFIGS=false test (#1)
- Update 201/200 tests to use configVersion instead of findConfig (#2)

* fix: add read handler 401 invariant tests + document flag behavior

- Add invariant: all 3 read handlers return 401 without auth
- Document on markConfigsDirty that HAS_DB_CONFIGS stays true after
  all configs are deleted until clearAppConfigCache or restart

* fix: remove HAS_DB_CONFIGS false optimization entirely

getApplicableConfigs([]) only queries for __base__, not all configs.
A deployment with role/group configs but no __base__ doc gets the
flag poisoned to false by a base-only query, silently ignoring all
scoped overrides. The optimization is not safe without a comprehensive
Config.exists() check, which adds its own DB cost. Removed entirely.

The flag is now write-once-true (set when configs are found or by
markConfigsDirty) and only cleared by clearAppConfigCache/restart.

* chore: reorder import statements in app.js for clarity

* refactor: remove HAS_DB_CONFIGS_KEY machinery entirely

The three-state flag (false/null/true) was the source of multiple bugs
across review rounds. Every attempt to safely set it to false was
defeated by getApplicableConfigs querying only a subset of principals.

Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag,
markConfigsDirty (now a no-op concept), notifyChange wrapper, and all
tests that seeded false manually.

The per-user/role TTL cache (overrideCacheTtl, default 60s) is the
sole caching mechanism. On cache miss, getApplicableConfigs queries
the DB. This is one indexed query per user per TTL window — acceptable
for the config override use case.

* docs: rewrite admin panel remaining work with current state

* perf: cache empty override results to avoid repeated DB queries

When getApplicableConfigs returns no configs for a principal, cache
baseConfig under their override key with TTL. Without this, every
user with no per-principal overrides hits MongoDB on every request
after the 60s cache window expires.

* fix: add tenantId to cache keys + reject PUBLIC principal type

- Include tenantId in override cache keys to prevent cross-tenant
  config contamination. Single-tenant deployments (tenantId undefined)
  use '_' as placeholder — no behavior change for them.
- Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has
  no PrincipalModel and is never resolved by getApplicableConfigs,
  so config docs for it would be dead data.
- Config middleware passes req.user.tenantId to getAppConfig.

* fix: fourth review pass findings

DB methods (config.ts):
- findConfigByPrincipal accepts { includeInactive } option so admin
  GET can retrieve inactive configs (#5)
- upsertConfig catches E11000 duplicate key on concurrent upserts and
  retries without upsert flag (#2)
- unsetConfigField no longer filters isActive:true, consistent with
  patchConfigFields (#11)
- Typed filter objects replace Record<string, unknown> (#12)

Admin handlers (config.ts):
- patchConfigField: serial broad capability check before Promise.all
  to pre-warm ALS principal cache, preventing N parallel DB calls (#3)
- isValidFieldPath rejects leading/trailing dots and consecutive
  dots (#7)
- Duplicate fieldPaths in patch entries return 400 (#8)
- DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14)
- Admin getConfig and patchConfigField pass includeInactive to
  findConfigByPrincipal (#5)
- Route import uses barrel instead of direct file path (#13)

Resolution (resolution.ts):
- deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow
  from crafted deeply nested configs (#4)

* fix: final review cleanup

- Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths)
- Add empty-result caching regression test
- Add tenantId to AdminConfigDeps.getAppConfig type
- Restore exhaustive never check in principalModel switch
- Standardize toggleConfigActive session handling to options pattern

* fix: validate priority in patchConfigField handler

Add the same non-negative number validation for priority that
upsertConfigOverrides already has. Without this, invalid priority
values could be stored via PATCH and corrupt merge ordering.

* chore: remove planning doc from PR

* fix: correct stale cache key strings in service tests

* fix: clean up service tests and harden tenant sentinel

- Remove no-op cache delete lines from regression tests
- Change no-tenant sentinel from '_' to '__default__' to avoid
  collision with a real tenant ID when multi-tenancy is enabled
- Remove unused CONFIG_STORE from AppConfigServiceDeps

* chore: bump @librechat/data-schemas to 0.0.46

* fix: block prototype-poisoning keys in deepMerge

Skip __proto__, constructor, and prototype keys during config merge
to prevent prototype pollution via PUT /api/admin/config overrides.
2026-03-25 19:39:29 -04:00