LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-07-02 12:22:22 +00:00

Author	SHA1	Message	Date
Danny Avila	197a1dc4e2	🧬 feat: Add GitHub Skill Sync (#13293) * feat: Add GitHub skill sync * fix: Address GitHub skill sync CI * fix: Harden GitHub skill sync review paths * fix: Prevent overlapping skill sync runs * fix: Address GitHub skill sync review findings * fix: Satisfy Git ref lint rule * fix: Address GitHub sync review follow-ups * fix: Match skill frontmatter closing fence * fix: Address GitHub sync review cycle * fix: Address GitHub sync review follow-ups * fix: Harden GitHub skill sync worker * fix: Format GitHub sync rollback log * fix: Address GitHub sync review feedback * fix: Format skill import parse handling * fix: Coerce scalar skill frontmatter and correct scheduler timer clear - parse: coerce numeric/boolean name and description scalars to strings instead of dropping them to empty (restores pre-refactor behavior; preserves
absent-vs-empty distinction for the when-to-use fallback) - scheduler: clear the setTimeout handle with clearTimeout rather than clearInterval - test: cover non-string scalar frontmatter coercion * fix: Tolerate trailing whitespace after SKILL.md opening frontmatter fence extractFrontmatterBlock required the opening fence to be exactly '---\n', so an opener with trailing spaces/tabs (e.g. '--- \n') silently dropped all frontmatter even though the closing-fence regex already tolerates it. Match the opener with /^---[ \t]\n/ for symmetry. Addresses Codex P3 (parse.ts:24). feat: Run GitHub skill sync under a per-source tenant context Under TENANT_ISOLATION_STRICT, the sync ran with no async tenant context, so the tenant-isolation mongoose hooks threw on every Skill/SkillFile/AclEntry operation; in non-strict mode synced skills were written tenant-less and never matched tenant-scoped reads. Add an optional per-source tenantId to the skillSync config; when set, each source sync runs inside tenantStorage.run({ tenantId }) so skills, files, and public ACL grants are created and listed within that tenant, and the skill row is stamped with the tenantId for correct dedup. Sources without tenantId keep the prior single-tenant behavior. Avoids runAsSystem. Addresses Codex P2 (sync.js:70). Lock/status/credential bookkeeping stays outside the tenant context (those collections are intentionally global). * test: Restore dropped tenant-context coverage for GitHub skill sync The prior commit shipped the getTenantId import in github.spec.ts without the tenant tests that use it (lost in an interrupted edit), which failed the eslint --max-warnings=0 CI job on an unused import. Restore both github.spec.ts tenant tests (tenant-scoped run stamps tenantId and executes inside the tenant ALS context; no-tenant run stays ambient) and the two config-schemas tenant tests (accepts tenantId, rejects __SYSTEM__). 
* test: Restore dropped github.spec tenant-context tests The previous commit's github.spec.ts edit did not apply (anchor mismatch), so the getTenantId import remained unused and failed eslint --max-warnings=0. Add the two tenant tests that use it: a tenant-scoped run stamps tenantId and executes inside the tenant ALS context, and a no-tenant run stays ambient. * feat: Scope synced skill author to tenant and harden tenant-context sync Addresses the latest Codex review on the per-source tenant change: - makeSourceAuthorId now folds tenantId into the synthetic author hash so the same source mirrored into different tenants gets distinct author ids (clearer audits, no cross-tenant author collisions). Single-tenant author ids stay stable (suffix omitted when tenantId is absent). - syncSourceInTenantContext uses an async callback per the tenant-context contract so the ALS store propagates across awaited Mongoose calls. - Tests: same-source/different-tenant yields distinct authors; mirror cleanup is scoped to the source and deletes only its absent-upstream skills. * fix: Repair tsc error and guard external edits in github skill sync - Fix TS2352 in github.spec mirror-cleanup test: build the existing-skill mock via makeSkill with authorName instead of an under-typed 'as CreateSkillInput' cast (this was the failing TypeScript CI check on f00ce3c5a). - 808: commitExistingRemoteSkillAfterFileSync re-reads to clear our own file-sync version bumps, but now compares refreshed content against the pre-sync snapshot (body/name/description/always-apply) and throws SKILL_CONFLICT on a concurrent external edit instead of overwriting it. * docs: Note skillSync source tenantId is effectively immutable Changing/adding/removing a source's tenantId orphans previously mirrored skills in the old tenant (a tenant-scoped sync cannot clean another tenant's data without runAsSystem, which is intentionally avoided). 
* fix: Key GitHub skill upstream identity on source id and path only Addresses Codex finding (github.ts:217): makeUpstreamId previously included owner/repo, so repointing a source to a renamed or replacement repository (same source id) changed the upstreamId, made findSkillBySourceIdentity miss the existing mirror, and then collided on the (name, author, tenantId) uniqueness constraint — leaving the source stuck failing. Identity now keys on the stable source id + root path only. The feature is unreleased, so there is no stored-id migration. Updated spec upstreamId fixtures to the new format; the existing ref-independent identity test now also covers repo moves. * fix: Scope GitHub skill mirror deletion to the source tenant Addresses Codex P1 (github.ts:1047/1057): an ambient source (no tenantId) runs listSkillsBySource without tenant context, which under non-strict isolation returns github-synced skills across all tenants. The mirror-deletion pass then treated other tenants' skills as absent-upstream and could delete them. Filter existingSyncedSkills to rows whose tenantId matches the source's configured tenantId (absent = its own ambient bucket) before deleting, so a sync never removes another tenant's mirrored skills. Covered by a test where an ambient run leaves a tenant-b-owned skill untouched. * fix: Apply tenant-scoped mirror deletion implementation The prior commit (75ccfa3fc) added the test but the source change to github.ts was lost in an interrupted edit, leaving a failing test with no implementation. This adds the actual guard: the mirror-deletion pass skips skills whose tenantId does not match the source's configured tenantId (absent = ambient bucket), so an ambient source whose listSkillsBySource returns cross-tenant rows under non-strict isolation cannot delete another tenant's mirrored skills. * fix: Resolve global access role outside tenant context for synced skill grants Addresses Codex P2 (github.ts:1166): default access roles (incl. 
skill_viewer) are seeded globally with no tenantId under runAsSystem, but a tenant-scoped sync wraps ensurePublicViewer in the source's tenant context. The PermissionService grantPermission resolved the role via a tenant-isolated AccessRole query, so the global role did not match and tenant-scoped syncs failed with 'Role skill_viewer not found'. The sync adapter now resolves the role inside runAsSystem (matching the global seed) and writes the ACL entry in the active tenant context, so the AclEntry is tenant-scoped (visible to tenant users) while the role lookup still succeeds. Covered by service tests for the resolve-vs-write split and the missing-role failure. * fix: Strip placeholder frontmatter booleans and check skill conflict before file sync - 1083 (github.ts:759): toCleanFrontmatter now drops a non-boolean always-apply (e.g. the 'always-apply:' / 'always-apply: # TODO' placeholder, which js-yaml yields as null). The boolean is already captured in the dedicated alwaysApply field; persisting null left ambiguous frontmatter on the synced skill. - 1080 (github.ts:1057): for an existing mirrored skill, check for an external content edit (via getSkillById + hasExternalSkillEdit) BEFORE syncSkillFiles mutates the bundled files, so a concurrently edited skill fails fast with SKILL_CONFLICT without partial file rewrites. The post-file-sync check still guards edits that land during the file sync window. Tests: placeholder always-apply is dropped from synced frontmatter; concurrent-edit conflict leaves files unmutated (no upsert/delete). 
* fix: Harden GitHub skill sync review paths * fix: Reuse moved GitHub skill mirrors * fix: Scope GitHub sync identity conflicts * test: Fix GitHub sync conflict mock typing * fix: Support nested env-backed skill sync * fix: Keep skill sync config base-only * fix: Scope GitHub skill identity lookup by tenant * fix: Harden GitHub skill sync admin gates * fix: Guard existing skill sync permission grants * feat: Trigger skill sync from resolved config * fix: Scope resolved skill sync by tenant * test: Allow manual skill sync status tenant scoping * refactor: Extract skill sync trigger orchestrator * test: Complete orchestrator status fixture * chore: Bump data provider version * fix: Restrict skill sync server credentials * test: Complete admin skill sync status fixtures * fix: tighten skill sync trigger safeguards * fix: preserve alwaysApply skill sync alias * chore: sort skill sync imports * fix: preserve skill sync request scope * fix: harden skill sync review edges * refactor: move skill sync admin access to api package * fix: add skill sync declaration return types * fix: satisfy skill sync type checks * fix: resolve codex skill sync review findings * fix: harden skill sync review edges * fix: resolve codex skill sync edge findings * fix: satisfy API declaration build after rebase	2026-06-10 21:05:54 -04:00
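The opening-fence fix in the commit above can be illustrated with a minimal sketch. This is not the actual `parse.ts` code, which this log does not show: the function body, the constant names, and the use of `[ \t]*` to accept any run of trailing spaces or tabs are all assumptions made for illustration.

```typescript
// Hypothetical sketch of a SKILL.md frontmatter extractor; only the function
// name comes from the commit message, the body is an assumption.

// Opening fence: '---' followed by optional spaces/tabs, then a newline.
const OPENING_FENCE = /^---[ \t]*\n/;
// Closing fence: a line consisting of '---' plus optional trailing spaces/tabs.
const CLOSING_FENCE = /^---[ \t]*$/m;

function extractFrontmatterBlock(content: string): string | null {
  const open = OPENING_FENCE.exec(content);
  if (!open) {
    return null; // no frontmatter opener at the start of the file
  }
  const rest = content.slice(open[0].length);
  const close = CLOSING_FENCE.exec(rest);
  if (!close) {
    return null; // opener without a matching closing fence
  }
  // Everything between the two fences is the raw YAML block.
  return rest.slice(0, close.index);
}
```

With this shape, `'--- \nname: demo\n---\nbody'` and `'---\nname: demo\n---\nbody'` yield the same block, so the opener and the closing fence are treated symmetrically.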
Danny Avila	6bc75d24c8	⚡️ refactor: Migrate `@librechat/api` build to `tsdown` (#13595 ) * ⚡️ refactor: Migrate @librechat/api build to tsdown Replace Rollup with tsdown (rolldown + oxc isolated-declarations) for the @librechat/api package build, mirroring the merged data-schemas migration. - Add tsdown.config.mjs (cjs output, oxc dts, externalize all bare deps, bundle first-party `~/` + relative imports) - Annotate exports for isolatedDeclarations (codefix-driven). Collapse the tokens.ts model->token maps to Record<string, Record<string, number>> and switch validation.ts's runtime `files` field from z.any() to z.unknown() so no explicit `any` is introduced - Repoint package.json main/types/exports to tsdown's .cjs/.d.cts output - Add src/telemetry.ts entry shim so the two index.ts entries don't collide in oxc's flat dts output (stable dist/telemetry.{cjs,d.cts}) - Delete rollup.config.js Build time ~36s -> ~0.5s. No runtime behavior change: 5712 unit tests pass, both entries load via require(), legacy /api consumes them unchanged. * 👷 ci: Hash packages/api/tsdown.config.mjs in build-api cache keys The build-api cache keys hashed `packages/api/server-rollup.config.js`, which never existed (api used `rollup.config.js`, now removed) — a copy-paste artifact from the data-provider key that matched no file. Replace it with the new `packages/api/tsdown.config.mjs` so edits to the build config (entry, format, externals) bust the api build cache, matching the data-schemas key.	2026-06-08 10:54:48 -04:00
Danny Avila	aeb5adff34	🪦 fix: Add Durable MCP Config Tombstones (#13534) * fix: add durable MCP config tombstones * fix: preserve scoped config tombstones * fix: clean up config tombstone lint * fix: handle empty model spec skill allowlist * fix: preserve inactive config tombstones	2026-06-05 15:05:40 -04:00
Dustin Healy	2bcf3e8582	🪟 fix: Apply Admin-Panel Config Overrides To YAML-Defined MCP Servers (#13173) * 🪟 fix: Apply Admin-Panel Config Overrides To YAML-Defined MCP Servers Admin-panel saves of MCP server fields for YAML-defined servers were silently dropped by the registry. ensureConfigServers filtered out any merged config entry whose name appeared in YAML, so overrides such as iconPath, title, and description never reached getAllServerConfigs even though the override row had been written to the configs collection and the AppConfig merge layer had produced the correct merged result. The filter is removed and replaced with a content-equivalence short-circuit in ensureSingleConfigServer. YAML-defined servers whose merged config matches the YAML cache entry skip lazy-init, so unmodified YAML servers still avoid a redundant inspection round trip. The new private helper matchesYamlConfig reuses the existing content-hash function on configurable fields only. getAllServerConfigs now overlays config-tier entries onto the YAML base while preserving user-DB entries (source: 'user'), giving precedence of YAML, then Config tier, then User DB. The docstring is updated to describe the new order. Multi-tenancy is already enforced upstream of the registry by the AppConfig layer, so the registry stays tenant-agnostic and overrides remain isolated per tenant.
Tests cover the new behavior: config-tier override on YAML-defined server flows through to getAllServerConfigs, YAML servers without effective overrides skip lazy-init, user-DB entries win over config-tier overlays, pure config-tier servers still lazy-init, and the merged config passed to lazy-init preserves all YAML fields when the override only adds new ones. * 🛠️ fix: Address Review Feedback For YAML Override Precedence Both Copilot and Codex flagged matchesYamlConfig as broken in production: the cached YAML config carries inspector-derived defaults (requiresOAuth defaulted to false when YAML omits it, serverInstructions rewritten from the YAML toggle to the fetched server-instructions string, and so on) that are absent from appConfig.mcpConfig. The content-hash comparison reports a mismatch for every YAML server with no admin-panel override, so ensureSingleConfigServer re-inspects all of them anyway. The optimization never fires in practice and reintroduces the source tagging it was supposed to avoid. Remove the short-circuit and the matchesYamlConfig helper. YAML-defined servers that appear in the merged config go through lazy-init like any other entry. A smarter optimization that overlays cosmetic-only override fields onto the YAML cache without re-inspection belongs in a follow-up once the boundary between configurable fields and inspector-derived fields is well defined. Update the existing ensureConfigServers tests that asserted the old filter behavior (should exclude YAML servers from config-source detection, should return empty when all servers are YAML) to assert the new behavior: YAML servers pass through ensureConfigServers and are lazy-initialized when they appear in the merged config. Make the inspector mock more realistic by spreading the raw input first and overlaying only runtime fields, so the test fixtures match production where the inspector preserves configurable fields. 
Drop the companion test in MCPServersRegistry.test.ts that asserted the short-circuit fires for unchanged YAML servers; the hand-crafted fixture skipped inspector defaults and was not representative. Copilot also flagged that getServerConfig short-circuits to configServers before checking the user DB, so the precedence enforced in getAllServerConfigs (user-DB beats config-tier) was bypassed in the single-server lookup. When configServers carries an entry and a userId is available, check the user DB first and prefer a source: 'user' entry so per-user servers are never shadowed by an admin-panel override. * 🛡️ fix: Harden Admin Override Overlay Against Failure Stubs Hardens the admin-panel override path against transient inspect failures and removes a defensive branch that guarded an impossible state. The getAllServerConfigs overlay now skips failed-inspection stubs so a healthy YAML or DB entry stays visible during the 5-minute retry window instead of being clobbered by a stub. When the overlay does land, the base entry's source tier is preserved, which keeps Tools/mcp.js routing its failed-inspection recovery to the correct storage location. The user-DB precedence block in getServerConfig is removed: configServers is built from appConfig.mcpConfig which only ever carries admin-tier entries, so the DB lookup defended a state that cannot occur via the current call graph. The dead yamlServerNames memoization is also gone. Adds two regression tests covering inspection-failure preservation and source-tier preservation on successful overlay, and adds a debug log when an admin override is suppressed by a user-tier entry. The makeParsedConfig test factory now honors overrides correctly. * 🧪 test: Strengthen Admin Override Coverage And Docs Adds an end-to-end regression test that chains MCPServerInspector.inspect failure through ensureConfigServers and getAllServerConfigs, asserting the healthy YAML base entry survives a transient inspect failure intact. 
The previous regression test hand-built a failure stub and skipped ensureSingleConfigServer, leaving the production chain itself untested. The getAllServerConfigs docstring now spells out both overlay guards (failed-stub skip and user-tier preservation) and the source-field preservation contract that downstream recovery logic depends on. The yamlLangfuseConfig test fixture is frozen so a future test cannot mutate it and contaminate sibling tests in the describe block. * 🔧 fix: Skip Lazy-Init For Unchanged YAML MCP Servers Adds an admin-configurable-field equivalence check so YAML-defined MCP servers that carry no admin override skip lazy-init in ensureConfigServers. This avoids the per-request inspect storm and keeps unmodified YAML servers out of the config-tier cache, so admin saves that touch unrelated overrides no longer evict and tear down those YAML connections. A second guard in getServerConfig prevents failed-inspection stubs in configServers from shadowing the healthy YAML base entry for the duration of the retry window. The aggregate path already had this guard via getAllServerConfigs; this brings the single-server path to parity, so Tools/mcp.js recovery routes to YAML reinspection rather than bailing on the config retry timer. Adds three regression tests covering the unmodified-YAML skip, the admin-override lazy-init trigger, and the failure-stub fallthrough. Updates two existing ensureConfigServers tests that previously documented the now-incorrect "always lazy-init YAML" behavior. * 🔓 feat: Expose baseOnly Flag On Admin Config Base Endpoint The admin getBaseConfig handler now reads req.query.baseOnly and forwards it to getAppConfig so an admin panel client can request the un-merged YAML and AppService base configuration without DB overrides applied. The flag is opt-in; existing callers see no behaviour change because the default remains the merged response. 
The query value is coerced through String() so Express array forms like baseOnly=true&baseOnly=true are treated as false rather than truthy by accident. A handler test pins the forwarding behaviour and the default-merged behaviour against future regressions. * 🧹 fix: Address Codex Review Findings On MCP Registry Precedence Path Four follow-ups from the Codex review of PR #13173: P1. getServerConfig now preserves the configServers candidate as a last-resort fallback when both YAML cache and user DB return nothing, so admin-defined config-only servers carrying inspectionFailed=true still surface the failure stub to callers in api/server/services/Tools/mcp.js that rely on it to return the still-unreachable message. The not-found memoization is preserved. P2a. proxy is added to ADMIN_CONFIGURABLE_FIELDS so an admin override on SSE/streamable-http proxy is no longer treated as an unchanged YAML server and correctly triggers lazy-init. P2b. isUnmodifiedYamlServer now treats absent-on-rawConfig fields as equal, so inspector-derived values on the cached YAML entry (notably requiresOAuth filled in by detectOAuth at startup) do not force unmodified YAML servers to re-init on every request. P3. getBaseConfig parses ?baseOnly strictly against the literal string true instead of String-coercing, so array shapes like baseOnly[]=true no longer pass through. Regression tests cover all four paths. * 🧹 fix: Drop Misleading Shadow Warning On Config Vs User-DB Collisions The Config-tier branch of warnOnOperatorManagedNameCollisions logged that Config MCP servers shadow DB-backed servers, but getAllServerConfigs actually preserves the user-tier entry on a Config-vs-user collision and skips the override. The warning was describing the opposite of what the code does and would mislead operational debugging. The YAML-tier call is unchanged because YAML still legitimately shadows DB-backed servers. The per-entry debug log inside the collision branch already captures the actual outcome. 
Test renamed and rewritten to assert the user-tier entry is preserved and no shadow warning is emitted. * 🔒 fix: Keep Tenant-Scoped configServers Candidate Out Of The Global Read-Through Cache The prior fix for surfacing inspectionFailed stubs from admin-defined config-only servers wrote the per-call configServers candidate into readThroughCache when YAML and DB both missed. The cache key is keyed by serverName plus userId, so a failed stub from one tenant could satisfy a later no-userId lookup made by another tenant before any configServers resolution ran. getServerConfig now caches only the global YAML/DB resolution (still caching undefined to memoize not-found lookups) and uses the candidate strictly as an unmemoized function-level fallback that surfaces the failure stub to the caller without leaking it across tenants. Regression test exercises a no-userId call after a tenant-scoped failure and asserts the cache returns undefined rather than the stub, and that a second tenant sees their own healthy candidate. * 🔄 fix: Mirror getAllServerConfigs Precedence Exactly In getServerConfig getServerConfig was short-circuiting with the configServers candidate on every healthy lookup, which made single-server callers diverge from the aggregate path for name collisions between config-tier overrides and user-DB entries. The aggregate path preserves the user-tier entry on such collisions, so single-server callers saw the admin override while list views saw the user server for the same name. getServerConfig now resolves the YAML/DB base first and applies the same four-step precedence used in getAllServerConfigs: 1. user-tier base wins absolutely over a config-tier candidate 2. healthy YAML/DB base wins over a failed (inspectionFailed) candidate 3. healthy candidate overlays its fields onto the base, preserving the base entry's source tag so downstream recovery routes to the correct storage location 4. 
with no base, the candidate is returned as-is for config-only servers readThroughCache still memoizes only the global YAML/DB lookup, so the per-call configServers candidate never enters the cache and the tenant-isolation guarantee from the previous fix is preserved. Regression tests cover the user-wins-over-config case and the YAML-overlay-with-yaml-source-preserved case. * ⚡ perf: Batch YAML Cache Read In ensureConfigServers isUnmodifiedYamlServer was calling cacheConfigsRepo.get(serverName) per entry. In the Redis aggregate-key backend, get() is implemented as getAll() then map lookup, so N concurrent per-server lookups inflate into N full-map reads and deserializations on every ensureConfigServers pass. The loop now takes a single getAll() snapshot at the top and hands it into a synchronous isUnmodifiedYamlServer helper, turning O(n) remote reads into O(1) regardless of how many MCP entries are resolved. The snapshot also gives the unchanged-YAML comparison one consistent view of YAML across all entries. Regression test spies on cacheConfigsRepo.get and asserts it is never called from ensureConfigServers, with getAll called exactly once.	2026-05-23 17:11:13 -04:00
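The four-step precedence that the commit above describes for getServerConfig can be sketched as a pure function. This is only an illustration under stated assumptions: the `ServerConfig` shape, the `resolveServerConfig` name, and the handling of a failed candidate over an already-failed base are hypothetical and not taken from the registry code.

```typescript
// Hypothetical types; the real registry config carries many more fields.
type Source = 'yaml' | 'user' | 'config';

interface ServerConfig {
  source: Source;
  inspectionFailed?: boolean;
  title?: string;
  iconPath?: string;
}

// base: the global YAML/DB resolution; candidate: the per-call config-tier entry.
function resolveServerConfig(
  base: ServerConfig | undefined,
  candidate: ServerConfig | undefined,
): ServerConfig | undefined {
  if (!candidate) {
    return base; // nothing to overlay
  }
  if (!base) {
    return candidate; // step 4: config-only server, returned as-is
  }
  if (base.source === 'user') {
    return base; // step 1: user-tier base wins absolutely
  }
  if (candidate.inspectionFailed) {
    // step 2: a failed stub must not shadow the base. The log only specifies
    // the healthy-base case; keeping a failed base as well is an assumption.
    return base;
  }
  // step 3: healthy candidate overlays its fields, base keeps its source tag.
  return { ...base, ...candidate, source: base.source };
}
```

Keeping the base entry's `source` in step 3 is what lets downstream recovery route to the correct storage location, as the commit message notes.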
Danny Avila	ea28dbfa89	🧹 chore: Clean Up Config Fields (#12537 ) * chore: remove unused `interface.endpointsMenu` config field * chore: address review — restore JSDoc UI-only example, add Zod strip test * chore: remove unused `interface.sidePanel` config field * chore: restrict fileStrategy/fileStrategies schema to valid storage backends * fix: use valid FileStorage value in AppService test * chore: address review — version bump, exhaustiveness guard, JSDoc, configSchema test * chore: remove debug logger.log from MessageIcon render path * fix: rewrite MessageIcon render tests to use render counting instead of logger spying * chore: bump librechat-data-provider to 0.8.407 * chore: sync example YAML version to 1.3.7	2026-04-03 12:22:58 -04:00
Danny Avila	f8405e731b	🗂️ fix: Allow Empty-Overrides Scope Creation in Admin Config (#12492 ) * fix: Allow empty-overrides scope creation when priority is provided The upsertConfigOverrides handler short-circuited when overrides was empty, returning a plain message instead of creating the config document. This broke the admin panel's "create blank scope" flow which sends `{ overrides: {}, priority: N }` — the missing `config` property in the response caused an `_id` error on the client. The early return now only triggers when both overrides are empty and no priority is provided. Per-section permission checks are scoped to cases where override sections are actually present. * test: Add tests for empty-overrides scope creation with priority * test: Address review nits for empty-overrides scope tests - Add res.statusCode/res.body assertions to capability-check test - Add 403/401 tests for empty overrides + priority path - Use mockResolvedValue(null) for consistency on bare jest.fn() - Remove narrating comment; fold intent into test name	2026-03-31 21:46:48 -04:00
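The revised early-return condition from the commit above can be sketched as follows. The `UpsertBody` shape and the `shouldSkipUpsert` helper are hypothetical names used only for illustration; the actual upsertConfigOverrides handler is not shown in this log.

```typescript
// Hypothetical request body shape for the admin config upsert.
interface UpsertBody {
  overrides?: Record<string, unknown>;
  priority?: number;
}

// Returns true only when there is nothing to persist: no override sections
// and no priority. A body like { overrides: {}, priority: N } must fall
// through so the blank scope document is created.
function shouldSkipUpsert(body: UpsertBody): boolean {
  const hasOverrides =
    body.overrides != null && Object.keys(body.overrides).length > 0;
  const hasPriority = body.priority != null; // 0 counts as provided
  return !hasOverrides && !hasPriority;
}
```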
Dustin Healy	2451bf54cf	🛡️ fix: Restrict System Grants to Role Principals (#12491) * 🛡️ fix: restrict system grants to role principals only Narrows GrantPrincipalType to PrincipalType.ROLE, rejecting GROUP and USER with 400. Removes grant cascade cleanup from group/user deletion handlers and their route wiring since only roles can hold grants. * 🛡️ fix: address review findings for grants roles-only restriction Add missing GROUP rejection test for revokeGrant (symmetric with getPrincipalGrants and assignGrant coverage), add extensibility comment to GrantPrincipalType, and document the checkRoleExists guard.	2026-03-31 19:25:14 -04:00
Danny Avila	c0ce7fee91	🚫 refactor: Remove Interface Config from Override Processing (#12473) Add INTERFACE_PERMISSION_FIELDS set defining the interface fields that seed role permissions at startup (prompts, agents, marketplace, etc.). These fields are now stripped from DB config overrides in the merge layer because updateInterfacePermissions() only runs at boot — DB overrides for these fields create a client/server permission mismatch. Pure UI fields (endpointsMenu, modelSelect, parameters, presets, sidePanel, customWelcome, etc.) continue to work in overrides as before. YAML startup path is completely unaffected.	2026-03-31 11:07:31 -04:00
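A minimal sketch of the stripping step described in the commit above. The set below lists only the three fields the message names (the real set contains more), and the `OverrideSections` type and `stripInterfacePermissionFields` helper are hypothetical.

```typescript
// Only the fields named in the commit message; the actual set is longer.
const INTERFACE_PERMISSION_FIELDS = new Set<string>([
  'prompts',
  'agents',
  'marketplace',
]);

// Hypothetical shape of a DB config override document's sections.
interface OverrideSections {
  interface?: Record<string, unknown>;
  [section: string]: unknown;
}

// Drops permission-seeding interface fields from a DB override while leaving
// pure UI fields (modelSelect, presets, ...) and other sections untouched.
function stripInterfacePermissionFields(
  overrides: OverrideSections,
): OverrideSections {
  if (!overrides.interface) {
    return overrides;
  }
  const kept: Record<string, unknown> = {};
  for (const key of Object.keys(overrides.interface)) {
    if (!INTERFACE_PERMISSION_FIELDS.has(key)) {
      kept[key] = overrides.interface[key];
    }
  }
  return { ...overrides, interface: kept };
}
```

The function returns a new object instead of mutating its input, so the stored override row is left as written and only the merged view changes.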
Dustin Healy	3d1b883e9d	👨‍👨‍👦‍👦 feat: Admin Users API Endpoints (#12446 ) * feat: add admin user management endpoints Add /api/admin/users with list, search, and delete handlers gated by ACCESS_ADMIN + READ_USERS/MANAGE_USERS system grants. Handler factory in packages/api uses findUsers, countUsers, and deleteUserById from data-schemas. * fix: address convention violations in admin users handlers * fix: add pagination, self-deletion guard, and DB-level search limit - listUsers now uses parsePagination + countUsers for proper pagination matching the roles/groups pattern - findUsers extended with optional limit/offset options - deleteUser returns 403 when caller tries to delete own account - searchUsers passes limit to DB query instead of fetching all and slicing in JS - Fix import ordering per CLAUDE.md, complete logger mock - Replace fabricated date fallback with undefined * fix: deterministic sort, null-safe pagination, consistent search filter - Add sort option to findUsers; listUsers sorts by createdAt desc for deterministic pagination - Use != null guards for offset/limit to handle zero values correctly - Remove username from search filter since it is not in the projection or AdminUserSearchResult response type * fix: last-admin deletion guard and search query max-length - Prevent deleting the last admin user (look up target role, count admins, reject with 400 if count <= 1) - Cap search query at 200 characters to prevent regex DoS - Add tests for both guards * fix: include missing capability name in 403 Forbidden response * fix: cascade user deletion cleanup, search username, parallel capability checks - Cascade Config, AclEntry, and SystemGrant cleanup on user deletion (matching the pattern in roles/groups handlers) - Add username to admin search $or filter for parity with searchUsers - Parallelize READ_* capability checks in listAllGrants with Promise.all * fix: TOCTOU safety net, capability info leak, DRY/style cleanup, data-layer tests - Add post-delete 
admin recount with CRITICAL log if race leaves 0 admins - Revert capability name from 403 response to server-side log only - Document thin deleteUserById limitation (full cascade is a future task) - DRY: extract query.trim() to local variable in searchUsersHandler - Add username to search projection, response type, and AdminUserSearchResult - Functional filter/map in grants.ts parallel capability check - Consistent null guards and limit>0 guard in findUsers options - Fallback for empty result.message on delete response - Fix mockUser() to generate unique _id per call - Break long destructuring across multiple lines - Assert countUsers filter and non-admin skip in delete tests - Add data-layer tests for findUsers limit, offset, sort, and pagination * chore: comment out admin delete user endpoint (out of scope) * fix: cast USER principalId to ObjectId for ACL entry cleanup ACL entries store USER principalId as ObjectId (via grantPermission casting), but deleteAclEntries is a raw deleteMany that passes the filter through. Passing a string won't match stored ObjectIds, leaving orphaned entries. 
* chore: comment out unused requireManageUsers alongside disabled delete route * fix: add missing logger.warn mock in capabilities test * fix: harden admin users handlers — type safety, response consistency, test coverage - Unify response shape: AdminUserSearchResult.userId → id, add AdminUserListItem type - Fix unsafe req.query type assertion in searchUsersHandler (typeof guards) - Anchor search regex with ^ for prefix matching (enables index usage) - Add total/capped to search response for truncation signaling - Add parseInt radix, remove redundant new Date() wrap - Add tests: countUsers throw, countUsers call args, array query param, capped flag * fix: scope deleteGrantsForPrincipal to tenant, deterministic search sort, align test mocks - Add tenantId option to AdminUsersDeps.deleteGrantsForPrincipal and pass req.user.tenantId at the call site, matching the pattern already used by the roles and groups handlers - Add sort: { name: 1 } to searchUsersHandler for deterministic results - Align test mock deleteUserById messages with production output ('User was deleted successfully.') - Make capped-results test explicitly set limit: '20' instead of relying on the implicit default * test: add tenantId propagation test for deleteGrantsForPrincipal Add tenantId to createReqRes user type and test that a non-undefined tenantId is threaded through to deleteGrantsForPrincipal. * test: remove redundant deleteUserById override in tenantId test --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-30 23:06:50 -04:00
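The search hardening listed in the commit above (typeof guard on the query, 200-character cap, `^`-anchored prefix regex) can be sketched like this. Rejecting over-long queries rather than truncating them, escaping regex metacharacters, and the helper names are all assumptions; the actual searchUsersHandler is not shown in this log.

```typescript
const MAX_QUERY_LENGTH = 200; // cap taken from the commit message

// Escape regex metacharacters so user input is matched literally (assumption:
// the log does not say whether the handler escapes its input).
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds a prefix-anchored regex from an untrusted query parameter, or
// returns null when the parameter is unusable.
function buildPrefixSearchRegex(rawQuery: unknown): RegExp | null {
  if (typeof rawQuery !== 'string') {
    return null; // Express may hand over arrays or objects for repeated params
  }
  const query = rawQuery.trim();
  if (query.length === 0 || query.length > MAX_QUERY_LENGTH) {
    return null; // assumption: over-long input is rejected, not truncated
  }
  // Anchoring with ^ turns the search into a prefix match.
  return new RegExp(`^${escapeRegExp(query)}`);
}
```

A case-sensitive prefix pattern is used here because the commit message ties the `^` anchor to index usage; whether the real handler adds flags is not stated.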
Dustin Healy	a4a17ac771	⛩️ feat: Admin Grants API Endpoints (#12438 ) * feat: add System Grants handler factory with tests Handler factory with 4 endpoints: getEffectiveCapabilities (expanded capability set for authenticated user), getPrincipalGrants (list grants for a specific principal), assignGrant, and revokeGrant. Write ops dynamically check MANAGE_ROLES/GROUPS/USERS based on target principal type. 31 unit tests covering happy paths, validation, 403, and errors. * feat: wire System Grants REST routes Mount /api/admin/grants with requireJwtAuth + ACCESS_ADMIN gate. Add barrel export for createAdminGrantsHandlers and AdminGrantsDeps. * fix: cascade grant cleanup on role deletion Add deleteGrantsForPrincipal to AdminRolesDeps and call it in deleteRoleHandler via Promise.allSettled after successful deletion, matching the groups cleanup pattern. 3 tests added for cleanup call, skip on 404, and resilience to cleanup failure. * fix: simplify cascade grant cleanup on role deletion Replace Promise.allSettled wrapper with a direct try/catch for the single deleteGrantsForPrincipal call. 
* fix: harden grant handlers with auth, validation, types, and RESTful revoke - Add per-handler auth checks (401) and granular capability gates (READ_* for getPrincipalGrants, possession check for assignGrant) - Extract validatePrincipal helper; rewrite validateGrantBody to use direct type checks instead of unsafe `as string` casts - Align DI types with data layer (ResolvedPrincipal.principalType widened to string, getUserPrincipals role made optional) - Switch revoke route from DELETE body to RESTful URL params - Return 201 for assignGrant to match roles/groups create convention - Handle null grantCapability return with 500 - Add comprehensive test coverage for new auth/validation paths * fix: deduplicate ResolvedPrincipal, typed body, defensive auth checks - Remove duplicate ResolvedPrincipal from capabilities.ts; import the canonical export from grants.ts - Replace Record<string, unknown> with explicit GrantRequestBody interface - Add defensive 403 when READ_CAPABILITY_BY_TYPE lookup misses - Document revoke asymmetry (no possession check) with JSDoc - Use _id only in resolveUser (avoid Mongoose virtual reliance) - Improve null-grant error message - Complete logger mock in tests * refactor: move ResolvedPrincipal to shared types to fix circular dep Extract ResolvedPrincipal from admin/grants.ts to types/principal.ts so middleware/capabilities.ts imports from shared types rather than depending upward on the admin handler layer. * chore: remove dead re-export, align logger mocks across admin tests - Remove unused ResolvedPrincipal re-export from grants.ts (canonical source is types/principal.ts) - Align logger mocks in roles.spec.ts and groups.spec.ts to include all log levels (error, warn, info, debug) matching grants.spec.ts * fix: cascade Config and AclEntry cleanup on role deletion Add deleteConfig and deleteAclEntries to role deletion cascade, matching the group deletion pattern. 
Previously only grants were cleaned up, leaving orphaned config overrides and ACL entries. * perf: single-query batch for getEffectiveCapabilities Add getCapabilitiesForPrincipals (plural) to the data layer — a single $or query across all principals instead of N+1 parallel queries. Wire it into the grants handler so getEffectiveCapabilities hits the DB once regardless of how many principals the user has. * fix: defer SystemCapabilities access to factory call time Move all SystemCapabilities usage (VALID_CAPABILITIES, MANAGE_CAPABILITY_BY_TYPE, READ_CAPABILITY_BY_TYPE) inside the createAdminGrantsHandlers factory. External test suites that mock @librechat/data-schemas without providing SystemCapabilities crashed at import time when grants.ts was loaded transitively. * test: add data-layer and handler test coverage for review findings - Add 6 mongodb-memory-server tests for getCapabilitiesForPrincipals: multi-principal batch, empty array, filtering, tenant scoping - Add handler test: all principals filtered (only PUBLIC) - Add handler test: granting an implied capability succeeds - Add handler test: all cascade cleanup operations fail simultaneously - Document platform-scope-only tenantId behavior in JSDoc * fix: resolveUser fallback to user.id, early-return empty principals - Match capabilities middleware pattern: _id?.toString() ?? user.id to handle JWT-deserialized users without Mongoose _id - Move empty-array guard before principals.map() to skip unnecessary normalizePrincipalId calls - Add comment explaining VALID_PRINCIPAL_TYPES module-scope asymmetry * refactor: derive VALID_PRINCIPAL_TYPES from capability maps Make MANAGE_CAPABILITY_BY_TYPE and READ_CAPABILITY_BY_TYPE non-Partial Records over a shared GrantPrincipalType union, then derive VALID_PRINCIPAL_TYPES from the map keys. This makes divergence between the three data structures structurally impossible. 
* feat: add GET /api/admin/grants list-all-grants endpoint Add listAllGrants data-layer method and handler so the admin panel can fetch all grants in a single request instead of fanning out N+M calls per role and group. Response is filtered to only include grants for principal types the caller has read access to. * fix: update principalType to use GrantPrincipalType for consistency in grants handling - Refactor principalType in createAdminGrantsHandlers to use GrantPrincipalType instead of PrincipalType for better type accuracy. - Ensure type consistency across the grants handling logic in the API. * fix: address admin grants review findings — tenantId propagation, capability validation, pagination, and test coverage Propagate tenantId through all grant operations for multi-tenancy support. Extract isValidCapability to accept full SystemCapability union (base, section, assign) and reuse it in both Mongoose schema validation and handler input checks. Replace listAllGrants with paginated listGrants + countGrants. Filter PUBLIC principals from getCapabilitiesForPrincipals queries. Export getCachedPrincipals from ALS store for fast-path principal resolution. Move DELETE capability param to query string to avoid colon-in-URL issues. Remove dead code and add comprehensive handler and data-layer test coverage. * refactor: harden admin grants — FilterQuery types, auth-first ordering, DELETE path param, isValidCapability tests Replace Record<string, unknown> with FilterQuery<ISystemGrant> across all data-layer query filters. Refactor buildTenantFilter to a pure tenantCondition function that returns a composable FilterQuery fragment, eliminating the $or collision between tenant and principal queries. Move auth check before input validation in getPrincipalGrantsHandler, assignGrantHandler, and revokeGrantHandler to avoid leaking valid type names to unauthenticated callers. 
Switch DELETE route from query param back to path param (/:capability) with encodeURIComponent per project conventions. Add compound index for listGrants sort. Type VALID_PRINCIPAL_TYPES as Set<GrantPrincipalType>. Remove unused GetCachedPrincipalsFn type export. Add dedicated isValidCapability unit tests and revokeGrant idempotency test. * refactor: batch capability checks in listGrantsHandler via getHeldCapabilities Replace 3 parallel hasCapabilityForPrincipals DB calls with a single getHeldCapabilities query that returns the subset of capabilities any principal holds. Also: defensive limit(0) clamp, parallelized assignGrant auth checks, principalId type-vs-required error split, tenantCondition hoisted to factory top, JSDoc on cascade deps, DELETE route encoding note. * fix: normalize principalId and filter undefined in getHeldCapabilities Add normalizePrincipalId + null guard to getHeldCapabilities, matching the contract of getCapabilitiesForPrincipals. Simplify allCaps build with flatMap, add no-tenantId cross-check and undefined-principalId test cases. * refactor: use concrete types in GrantRequestBody, rename encoding test Replace unknown fields with explicit string types in GrantRequestBody, matching the established pattern in roles/groups/config handlers. Rename misleading 'encoded' test to 'with colons' since Express auto-decodes req.params. * fix: support hierarchical parent capabilities in possession checks hasCapabilityForPrincipals and getHeldCapabilities now resolve parent base capabilities for section/assignment grants. An admin holding manage:configs can now grant manage:configs:<section> and transitively read:configs:<section>. Fixes anti-escalation 403 blocking config capability delegation. * perf: use getHeldCapabilities in assignGrant to halve DB round-trips assignGrantHandler was making two parallel hasCapabilityForPrincipals calls to check manage + capability possession. 
getHeldCapabilities was introduced in this PR specifically for this pattern. Replace with a single batched call. Update corresponding spec assertions. * fix: validate role existence before granting capabilities Grants for non-existent role names were silently persisted, creating orphaned grants that could surprise-activate if a role with that name was later created. Add optional checkRoleExists dep to assignGrant and wire it to getRoleByName in the route file. * refactor: tighten principalType typing and use grantCapability in tests Narrow getCapabilitiesForPrincipals parameter from string to PrincipalType, removing the redundant cast. Replace direct SystemGrant.create() calls in getCapabilitiesForPrincipals tests with methods.grantCapability() to honor the schema's normalization invariant. Add getHeldCapabilities extended capability tests. * test: rename misleading cascade cleanup test name The test only injects failure into deleteGrantsForPrincipal, not all cascade operations. Rename from 'cascade cleanup fails' to 'grant cleanup fails' to match the actual scope. * fix: reorder role check after permission guard, add tenantId to index Move checkRoleExists after the getHeldCapabilities permission check so that a sub-MANAGE_ROLES admin cannot probe role name existence via 400 vs 403 response codes. Add tenantId to the { principalType, capability } index so listGrants queries in multi-tenant deployments can use a covering index instead of post-scanning for tenant condition. Add missing test for checkRoleExists throwing. * fix: scope deleteGrantsForPrincipal to tenant on role deletion deleteGrantsForPrincipal previously filtered only on principalType + principalId, deleting grants across all tenants. Since the role schema supports multi-tenancy (compound unique index on name + tenantId), two tenants can share a role name like 'editor'. Deleting that role in one tenant would wipe grants for identically-named roles in other tenants. 
Add optional tenantId parameter to deleteGrantsForPrincipal. When provided, scopes the delete to that tenant plus platform-level grants. Propagate req.user.tenantId through the role deletion cascade. * fix: scope grant cleanup to tenant on group deletion Same cross-tenant gap as the role deletion path: deleteGroupHandler called deleteGrantsForPrincipal without tenantId, so deleting a group would wipe its grants across all tenants. Extract req.user.tenantId and pass it through. * test: add HTTP integration test for admin grants routes Supertest-based test with real MongoMemoryServer exercising the full Express wiring: route registration, injected auth middleware, handler DI deps, and real DB round-trips. Covers GET /, GET /effective, POST / + DELETE / lifecycle, role existence validation, and 401 for unauthenticated callers. Also documents the expandImplications scope: the /effective endpoint returns base-level capabilities only; section-level resolution is handled at authorization check time by getParentCapabilities. * fix: use exact tenant match in deleteGrantsForPrincipal, normalize principalId, harden API CRITICAL: deleteGrantsForPrincipal was using tenantCondition (a read-query helper) for deleteMany, which includes the { tenantId: { $exists: false } } arm. This silently destroyed platform-level grants when a tenant-scoped role/group deletion occurred. Replace with exact { tenantId } match for deletes so platform-level grants survive tenant-scoped cascade cleanup. Refactor deleteGrantsForPrincipal signature from fragile positional overload (sessionOrTenantId union + maybeSession) to a clean options object: { tenantId?, session? }. Update all callers and test assertions. Add normalizePrincipalId to hasCapabilityForPrincipals to match the pattern already used by getHeldCapabilities — prevents string/ObjectId type mismatch on USER/GROUP principal queries. 
Also: export GrantPrincipalType from barrel, add upper-bound cap to listGrants, document GROUP/USER existence check trade-off, add integration tests for tenant-isolation property of deleteGrantsForPrincipal. * fix: forward tenantId to getUserPrincipals in resolvePrincipals resolvePrincipals had tenantId available from the caller but only forwarded it to getCachedPrincipals (cache lookup). The DB fallback via getUserPrincipals omitted it. While the Group schema's applyTenantIsolation Mongoose plugin handles scoping via AsyncLocalStorage in HTTP request context, explicitly passing tenantId makes the contract visible and prevents silent cross-tenant group resolution if called outside request context. * fix: remove unused import and add assertion to 401 integration test Remove unused SystemCapabilities import flagged by ESLint. Add explicit body assertion to the 401 test so it has a jest expect() call. * chore: hoist grant limit constants to scope, remove dead isolateModules Move GRANTS_DEFAULT_LIMIT / GRANTS_MAX_LIMIT from inside listGrants function body to createSystemGrantMethods scope so they are evaluated once at module load. Remove dead jest.isolateModules + jest.doMock block in integration test — the ~/models mock was never exercised since handlers are built with explicit DI deps. --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-30 16:49:23 -04:00
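The CRITICAL fix above (exact tenant match in `deleteGrantsForPrincipal` instead of the read-side `tenantCondition`) can be sketched with an in-memory model. This is an illustration under stated assumptions, not the data-layer code: the `Grant` shape, the predicate names, and the array-based store are hypothetical, and the Mongo filters appear only as comments.

```typescript
// Hypothetical in-memory grant record; the real model is a Mongoose document.
interface Grant {
  principalType: string;
  principalId: string;
  capability: string;
  tenantId?: string; // absent means a platform-level grant
}

// Read-side condition: a tenant sees its own grants plus platform-level grants.
// Roughly: { $or: [{ tenantId }, { tenantId: { $exists: false } }] }
const matchesForRead = (g: Grant, tenantId: string): boolean =>
  g.tenantId === tenantId || g.tenantId === undefined;

// Delete-side condition: exact match only, so platform-level grants survive.
// Roughly: { tenantId }
const matchesForDelete = (g: Grant, tenantId: string): boolean => g.tenantId === tenantId;

// Returns the grants that remain after the delete (assumed options-object signature).
function deleteGrantsForPrincipal(
  store: Grant[],
  principalType: string,
  principalId: string,
  options: { tenantId?: string } = {},
): Grant[] {
  const { tenantId } = options;
  return store.filter((g) => {
    const samePrincipal = g.principalType === principalType && g.principalId === principalId;
    if (!samePrincipal) return true;
    // Without a tenantId the delete is unscoped, as in the original behavior.
    if (tenantId === undefined) return false;
    return !matchesForDelete(g, tenantId);
  });
}

const grants: Grant[] = [
  { principalType: 'role', principalId: 'editor', capability: 'read:configs', tenantId: 'tenant-a' },
  { principalType: 'role', principalId: 'editor', capability: 'read:configs', tenantId: 'tenant-b' },
  { principalType: 'role', principalId: 'editor', capability: 'read:configs' },
];

const remaining = deleteGrantsForPrincipal(grants, 'role', 'editor', { tenantId: 'tenant-a' });
const visibleToTenantB = remaining.filter((g) => matchesForRead(g, 'tenant-b'));
console.log(remaining.length, visibleToTenantB.length);
```

Reusing the read-side predicate for the delete would also remove the third record, which is the platform-level grant the fix is meant to preserve.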
Danny Avila	fda1bfc3cc	🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451) * fix(data-schemas): resolve TypeScript strict type check errors in source files - Constrain ConfigSection to string keys via `string & keyof TCustomConfig` - Replace broken `z` import from data-provider with TCustomConfig derivation - Add `_id: Types.ObjectId` to IUser matching other Document interfaces - Add `federatedTokens` and `openidTokens` optional fields to IUser - Type mongoose model accessors as `Model<IRole>` and `Model<IUser>` - Widen `getPremiumRate` param to accept `number | null` - Widen `bulkWriteAclEntries` ops to untyped `AnyBulkWriteOperation[]` - Fix `getUserPrincipals` return type to use `PrincipalType` enum - Add non-null assertions for `connection.db` in migration files - Import DailyRotateFile constructor directly instead of relying on broken module augmentation across mismatched node_modules trees - Add winston-daily-rotate-file as devDependency for type resolution * fix(data-schemas): resolve TypeScript type errors in test files - Replace arbitrary test keys with valid TCustomConfig properties in config.spec - Use non-null assertions for permission objects in role.methods.spec - Replace `.SHARED_GLOBAL` access with `.not.toHaveProperty()` for legacy field - Add non-null assertions for balance, writeRate, readRate in spendTokens.spec - Update mock user _id to use ObjectId in user.test - Remove unused Schema import in tenantIndexes.spec * fix(api): resolve TypeScript strict type check errors across source and test files - Widen getUserPrincipals dep type in capabilities middleware - Fix federatedTokens type in createSafeUser return - Use proper mock req type for read-only properties in preAuthTenant.spec - Replace `as IUser` casts with ObjectId-typed mocks in openid/oidc specs - Use TokenExchangeMethodEnum values instead of string literals in MCP specs - Fix SessionStore type compatibility in sessionCache specs - Replace `catch 
(error: any)` with `(error as Error)` in redis specs - Remove invalid properties from test data in initialize and MCP specs - Add String.prototype.isWellFormed declaration for sanitizeTitle spec * fix(client): resolve TypeScript type errors in shared client components - Add default values for destructured bindings in OGDialogTemplate - Replace broken ExtendedFile import with inline type in FileIcon * ci: add TypeScript type-check job to backend review workflow Add a `typecheck` job that runs `tsc --noEmit` on all four TypeScript workspaces (data-provider, data-schemas, @librechat/api, @librechat/client) after the build step. Catches type errors that rollup builds may miss. * fix(data-schemas): add local type declaration for DailyRotateFile transport The `winston-daily-rotate-file` package ships a module augmentation for `winston/lib/winston/transports`, but it fails when winston and winston-daily-rotate-file resolve from different node_modules trees (which happens in this monorepo due to npm hoisting). Add a local `.d.ts` declaration that augments the same module path from within data-schemas' compilation unit, so `tsc --noEmit` passes while keeping the original runtime pattern (`new winston.transports.DailyRotateFile`). 
* fix: address code review findings from PR #12451 - Restore typed `AnyBulkWriteOperation<AclEntry>[]` on bulkWriteAclEntries, cast to untyped only at the tenantSafeBulkWrite call site (Finding 1) - Type `findUser` model accessor consistently with `findUsers` (Finding 2) - Replace inline `import('mongoose').ClientSession` with top-level import type - Use `toHaveLength` for spy assertions in playwright-expect spec file - Replace numbered Record casts with `.not.toHaveProperty()` in role.methods.spec for SHARED_GLOBAL assertions - Use per-test ObjectIds instead of shared testUserId in openid.spec - Replace inline `import()` type annotations with top-level SessionData import in sessionCache spec - Remove extraneous blank line in user.ts searchUsers * refactor: address remaining review findings (4–7) - Extract OIDCTokens interface in user.ts; deduplicate across IUser fields and oidc.ts FederatedTokens (Finding 4) - Move String.isWellFormed declaration from spec file to project-level src/types/es2024-string.d.ts (Finding 5) - Replace verbose `= undefined` defaults in OGDialogTemplate with null coalescing pattern (Finding 6) - Replace `Record<string, unknown>` TestConfig with named interface containing explicit test fields (Finding 7)	2026-03-28 21:06:39 -04:00
Dustin Healy	5972a21479	🪪 feat: Admin Roles API Endpoints (#12400 ) * feat: add createRole and deleteRole methods to role * feat: add admin roles handler factory and Express routes * fix: address convention violations in admin roles handlers * fix: rename createRole/deleteRole to avoid AccessRole name collision The existing accessRole.ts already exports createRole/deleteRole for the AccessRole model. In createMethods index.ts, these are spread after roleMethods, overwriting them. Renamed our Role methods to createRoleByName/deleteRoleByName to match the existing pattern (getRoleByName, updateRoleByName) and avoid the collision. * feat: add description field to Role model - Add description to IRole, CreateRoleRequest, UpdateRoleRequest types - Add description field to Mongoose roleSchema (default: '') - Wire description through createRoleHandler and updateRoleHandler - Include description in listRoles select clause so it appears in list * fix: address Copilot review findings in admin roles handlers * test: add unit tests for admin roles and groups handlers * test: add data-layer tests for createRoleByName, deleteRoleByName, listUsersByRole * fix: allow system role updates when name is unchanged The updateRoleHandler guard rejected any request where body.name matched a system role, even when the name was not being changed. This blocked editing a system role's description. Compare against the URL param to only reject actual renames to reserved names. 
* fix: address external review findings for admin roles - Block renaming system roles (ADMIN/USER) and add user migration on rename - Add input validation: name max-length, trim on update, duplicate name check - Replace fragile String.includes error matching with prefix-based classification - Catch MongoDB 11000 duplicate key in createRoleByName - Add pagination (limit/offset/total) to getRoleMembersHandler - Reverse delete order in deleteRoleByName — reassign users before deletion - Add role existence check in removeRoleMember; drop unused createdAt select - Add Array.isArray guard for permissions input; use consistent ?? coalescing - Fix import ordering per AGENTS.md conventions - Type-cast mongoose.models.User as Model<IUser> for proper TS inference - Add comprehensive tests: rename guards, pagination, validation, 500 paths * fix: address re-review findings for admin roles - Gate deleteRoleByName on existence check — skip user reassignment and cache invalidation when role doesn't exist (fixes test mismatch) - Reverse rename order: migrate users before renaming role so a migration failure leaves the system in a consistent state - Add .sort({ _id: 1 }) to listUsersByRole for deterministic pagination - Import shared AdminMember type from data-schemas instead of local copy; make joinedAt optional since neither groups nor roles populate it - Change IRole.description from optional to required to match schema default - Add data-layer tests for updateUsersByRole and countUsersByRole - Add handler test verifying users-first rename ordering and migration failure safety * fix: add rollback on rename failure and update PR description - Roll back user migration if updateRoleByName returns null during a rename (race: role deleted between existence check and update) - Add test verifying rollback calls updateUsersByRole in reverse - Update PR #12400 description to reflect current test counts (56 handler tests, 40 data-layer tests) and safety features * fix: rollback on rename 
throw, description validation, delete/DRY cleanup - Hoist isRename/trimmedName above try block so catch can roll back user migration when updateRoleByName throws (not just returns null) - Add description type + max-length (2000) validation in create and update, consistent with groups handler - Remove redundant getRoleByName existence check in deleteRoleHandler — use deleteRoleByName return value directly - Skip no-op name write when body.name equals current name (use isRename) - Extract getUserModel() accessor to DRY repeated Model<IUser> casts - Use name.trim() consistently in createRoleByName error messages - Add tests: rename-throw rollback, description validation (create+update), update delete test mocks to match simplified handler * fix: guard spurious rollback, harden createRole error path, validate before DB calls - Add migrationRan flag to prevent rollback of user migration that never ran - Return generic message on 500 in createRoleHandler, specific only for 409 - Move description validation before DB queries in updateRoleHandler - Return existing role early when update body has no changes - Wrap cache.set in createRoleByName with try/catch to prevent masking DB success - Add JSDoc on 11000 catch explaining compound unique index - Add tests: spurious rollback guard, empty update body, description validation ordering, listUsersByRole pagination * fix: validate permissions in create, RoleConflictError, rollback safety, cache consistency - Add permissions type/array validation in createRoleHandler - Introduce RoleConflictError class replacing fragile string-prefix matching - Wrap rollback in !role null path with try/catch for correct 404 response - Wrap deleteRoleByName cache.set in try/catch matching createRoleByName - Narrow updateRoleHandler body type to { name?, description? 
} - Add tests: non-string description in create, rollback failure logging, permissions array rejection, description max-length assertion fix * feat: prevent removing the last admin user Add guard in removeRoleMember that checks countUsersByRole before demoting an ADMIN user, returning 400 if they are the last one. * fix: move interleaved export below imports, add await to countUsersByRole * fix: paginate listRoles, null-guard permissions handler, fix export ordering - Add limit/offset/total pagination to listRoles matching the groups pattern - Add countRoles data-layer method - Omit permissions from listRoles select (getRole returns full document) - Null-guard re-fetched role in updateRolePermissionsHandler - Move interleaved export below all imports in methods/index.ts * fix: address review findings — race safety, validation DRY, type accuracy, test coverage - Add post-write admin count verification in removeRoleMember to prevent zero-admin race condition (TOCTOU → rollback if count hits 0) - Make IRole.description optional; backfill in initializeRoles for pre-existing roles that lack the field (.lean() bypasses defaults) - Extract parsePagination, validateNameParam, validateRoleName, and validateDescription helpers to eliminate duplicated validation - Add validateNameParam guard to all 7 handlers reading req.params.name - Catch 11000 in updateRoleByName and surface as 409 via RoleConflictError - Add idempotent skip in addRoleMember when user already has target role - Verify updateRolePermissions test asserts response body - Add data-layer tests: listRoles sort/pagination/projection, countRoles, and createRoleByName 11000 duplicate key race * fix: defensive rollback in removeRoleMember, type/style cleanup, test coverage - Wrap removeRoleMember post-write admin rollback in try/catch so a transient DB failure cannot leave the system with zero administrators - Replace double `as unknown[] as IRole[]` cast with `.lean<IRole[]>()` - Type parsePagination param 
explicitly; extract DEFAULT/MAX page constants - Preserve original error cause in updateRoleByName re-throw - Add test for rollback failure path in removeRoleMember (returns 400) - Add test for pre-existing roles missing description field (.lean()) * chore: bump @librechat/data-schemas to 0.0.47 * fix: stale cache on rename, extract renameRole helper, shared pagination, cleanup - Fix updateRoleByName cache bug: invalidate old key and populate new key when updates.name differs from roleName (prevents stale cache after rename) - Extract renameRole helper to eliminate mutable outer-scope state flags (isRename, trimmedName, migrationRan) in updateRoleHandler - Unify system-role protection to 403 for both rename-from and rename-to - Extract parsePagination to shared admin/pagination.ts; use in both roles.ts and groups.ts - Extract name.trim() to local const in createRoleByName (was called 5×) - Remove redundant findOne pre-check in deleteRoleByName - Replace getUserModel closure with local const declarations - Remove redundant description ?? 
'' in createRoleHandler (schema default) - Add doc comment on updateRolePermissionsHandler noting cache dependency - Add data-layer tests for cache rename behavior (old key null, new key set) * fix: harden role guards, add User.role index, validate names, improve tests - Add index on User.role field for efficient member queries at scale - Replace fragile SystemRoles key lookup with value-based Set check (6 sites) - Elevate rename rollback failure logging to CRITICAL (matches removeRoleMember) - Guard removeRoleMember against non-ADMIN system roles (403 for USER) - Fix parsePagination limit=0 gotcha: use parseInt + NaN check instead of || - Add control character and reserved path segment validation to role names - Simplify validateRoleName: remove redundant casts and dead conditions - Add JSDoc to deleteRoleByName documenting non-atomic window - Split mixed value+type import in methods/index.ts per AGENTS.md - Add 9 new tests: permissions assertion, combined rename+desc, createRole with permissions, pagination edge cases, control char/reserved name rejection, system role removeRoleMember guard * fix: exact-case reserved name check, consistent validation, cleaner createRole - Remove .toLowerCase() from reserved name check so only exact matches (members, permissions) are rejected, not legitimate names like "Members" - Extract trimmed const in validateRoleName for consistent validation - Add control char check to validateNameParam for parity with body validation - Build createRole roleData conditionally to avoid passing description: undefined - Expand deleteRoleByName JSDoc documenting self-healing design and no-op trade-off * fix: scope rename rollback to only migrated users, prevent cross-role corruption Capture user IDs before forward migration so the rollback path only reverts users this request actually moved. 
Previously the rollback called updateUsersByRole(newName, currentName) which would sweep all users with the new role — including any independently assigned by a concurrent admin request — causing silent cross-role data corruption. Adds findUserIdsByRole and updateUsersRoleByIds to the data layer. Extracts rollbackMigratedUsers helper to deduplicate rollback sites. * fix: guard last admin in addRoleMember to prevent zero-admin lockout Since each user has exactly one role, addRoleMember implicitly removes the user from their current role. Without a guard, reassigning the sole admin to a non-admin role leaves zero admins and locks out admin management. Adds the same countUsersByRole check used in removeRoleMember. * fix: wire findUserIdsByRole and updateUsersRoleByIds into roles route The scoped rollback deps added in `c89b5db` were missing from the route DI wiring, causing renameRole to call undefined and return a 500. * fix: post-write admin guard in addRoleMember, compound role index, review cleanup - Add post-write admin count check + rollback to addRoleMember to match removeRoleMember's two-phase TOCTOU protection (prevents zero-admin via concurrent requests) - Replace single-field User.role index with compound { role: 1, tenantId: 1 } to align with existing multi-tenant index pattern (email, OAuth IDs) - Narrow listRoles dep return type to RoleListItem (projected fields only) - Refactor validateDescription to early-return style per AGENTS.md - Remove redundant double .lean() in updateRoleByName - Document rename snapshot race window in renameRole JSDoc - Document cache null-set behavior in deleteRoleByName - Add routing-coupling comment on RESERVED_ROLE_NAMES - Add test for addRoleMember post-write rollback * fix: review cleanup — system-role guard, type safety, JSDoc accuracy, tests - Add system-role guard to addRoleMember: block direct assignment to non-ADMIN system roles (403), symmetric with removeRoleMember - Fix RESERVED_ROLE_NAMES comment: explain 
semantic URL ambiguity, not a routing conflict (Express resolves single vs multi-segment correctly) - Replace _id: unknown with Types.ObjectId | string per AGENTS.md - Narrow listRoles data-layer return type to Pick<IRole, 'name' | 'description'> to match the actual .select() projection - Move updateRoleHandler param check inside try/catch for consistency - Include user IDs in all CRITICAL rollback failure logs for operator recovery - Clarify deleteRoleByName JSDoc: replace "self-healing" with "idempotent", document that recovery requires caller retry - Add tests: system-role guard, promote non-admin to ADMIN, findUserIdsByRole throw prevents migration * fix: include _id in listRoles return type to match RoleListItem Pick<IRole, 'name' | 'description'> omits _id, making it incompatible with the handler dep's RoleListItem which requires _id. * fix: case-insensitive system role guard, reject null permissions, check updateUser result - System role name checks now use case-insensitive comparison via toUpperCase(), which prevents creating 'admin' or 'user' which would collide with the legacy roles route that uppercases params - Reject permissions: null in createRole (typeof null === 'object' was bypassing the validation) - Check updateUser return in addRoleMember: return 404 if the user was deleted between the findUser and updateUser calls * fix: check updateUser return in removeRoleMember for concurrent delete safety --------- Co-authored-by: Danny Avila <danny@librechat.ai>	2026-03-27 15:44:47 -04:00
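The scoped rename rollback described in this commit (capture user IDs before the forward migration, then revert only those users) can be sketched against an in-memory store. The method names `findUserIdsByRole`, `updateUsersByRole`, and `updateUsersRoleByIds` come from the commit message, but their signatures here, the `createUserStore` helper, and the use of a role-wide update for the forward step are assumptions made for illustration.

```typescript
// Hypothetical in-memory stand-in for the User collection.
type User = { id: string; role: string };

function createUserStore(initial: User[]) {
  const users = initial.map((u) => ({ ...u }));
  return {
    // Snapshot the IDs of users currently holding `role`.
    findUserIdsByRole: (role: string): string[] =>
      users.filter((u) => u.role === role).map((u) => u.id),
    // Role-wide update: moves every user holding `from` to `to`.
    updateUsersByRole: (from: string, to: string): void => {
      for (const u of users) if (u.role === from) u.role = to;
    },
    // ID-scoped update: only touches the listed users.
    updateUsersRoleByIds: (ids: string[], role: string): void => {
      const idSet = new Set(ids);
      for (const u of users) if (idSet.has(u.id)) u.role = role;
    },
    roleOf: (id: string): string | undefined => users.find((u) => u.id === id)?.role,
  };
}

const store = createUserStore([
  { id: 'u1', role: 'editor' },
  { id: 'u2', role: 'editor' },
  { id: 'u3', role: 'viewer' },
]);

// 1. Snapshot which users this request is about to move.
const migratedIds = store.findUserIdsByRole('editor');
// 2. Forward migration for the rename editor -> writer.
store.updateUsersByRole('editor', 'writer');
// 3. A concurrent admin request independently assigns u3 to the new role.
store.updateUsersRoleByIds(['u3'], 'writer');
// 4. The rename fails, so roll back only the users captured in step 1.
store.updateUsersRoleByIds(migratedIds, 'editor');

console.log(store.roleOf('u1'), store.roleOf('u2'), store.roleOf('u3'));
```

Rolling back with `updateUsersByRole('writer', 'editor')` in step 4 would also move u3, which is the cross-role corruption the commit describes.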
Dustin Healy	2e3d66cfe2	👥 feat: Admin Groups API Endpoints (#12387 ) * feat: add listGroups and deleteGroup methods to userGroup * feat: add admin groups handler factory and Express routes * fix: address convention violations in admin groups handlers * fix: address Copilot review findings in admin groups handlers - Escape regex in listGroups to prevent injection/ReDoS - Validate ObjectId format in all handlers accepting id/userId params - Replace N+1 findUser loop with batched findUsers query - Remove unused findGroupsByMemberId from dep interface - Map Mongoose ValidationError to 400 in create/update handlers - Validate name in updateGroupHandler (reject empty/whitespace) - Handle null updateGroupById result (race condition) - Tighten error message matching in add/remove member handlers * test: add unit tests for admin groups handlers * fix: address code review findings for admin groups Atomic delete/update handlers (single DB trip), pass through idOnTheSource, add removeMemberById for non-ObjectId members, deduplicate member results, fix error message exposure, add hard cap/sort to listGroups, replace GroupListFilter with Pick of GroupFilterOptions, validate memberIds as array, trim name in update, fix import order, and improve test hygiene with fresh IDs per test. * fix: cascade cleanup, pagination, and test coverage for admin groups Add deleteGrantsForPrincipal to systemGrant data layer and wire cascade cleanup (Config, AclEntry, SystemGrant) into deleteGroupHandler. Add limit/offset pagination to getGroupMembers. Guard empty PATCH bodies with 400. Remove dead type guard and unnecessary type cast. Add 11 new tests covering cascade delete, idempotent member removal, empty update, search filter, 500 error paths, and pagination. * fix: harden admin groups with cascade resilience, type safety, and fallback removal Wrap cascade cleanup in inner try/catch so partial failure logs but still returns 200 (group is already deleted). 
Replace Record<string, unknown> on deleteAclEntries with proper typed filter. Log warning for unmapped user ObjectIds in createGroup memberIds. Add removeMemberById fallback when removeUserFromGroup throws User not found for ObjectId-format userId. Extract VALID_GROUP_SOURCES constant. Add 3 new tests (60 total). * refactor: add countGroups, pagination, and projection type to data layer Extract buildGroupQuery helper, add countGroups method, support limit/offset/skip in listGroups, standardize session handling to .session(session ?? null), and tighten projection parameter from Record<string, unknown> to Record<string, 0 | 1>. * fix: cascade resilience, pagination, validation, and error clarity for admin groups - Use Promise.allSettled for cascade cleanup so all steps run even if one fails; log individual rejections - Echo deleted group id in delete response - Add countGroups dep and wire limit/offset pagination for listGroups - Deduplicate memberIds before computing total in getGroupMembers - Use { memberIds: 1 } projection in getGroupMembers - Cap memberIds at 500 entries in createGroup - Reject search queries exceeding 200 characters - Clarify addGroupMember error for non-ObjectId userId - Document deleted-user fallback limitation in removeGroupMember * test: extend handler and DB-layer test coverage for admin groups Handler tests: projection assertion, dedup total, memberIds cap, search max length, non-ObjectId memberIds passthrough, cascade partial failure resilience, dedup scenarios, echo id in delete response. DB-layer tests: listGroups sort/filter/pagination, countGroups, deleteGroup, removeMemberById, deleteGrantsForPrincipal. * fix: cast group principalId to ObjectId for ACL entry cleanup deleteAclEntries is a thin deleteMany wrapper with no type casting, but grantPermission stores group principalId as ObjectId. Passing the raw string from req.params would leave orphaned ACL entries on group deletion. 
* refactor: remove redundant pagination clamping from DB listGroups Handler already clamps limit/offset at the API boundary. The DB method is a general-purpose building block and should not re-validate. * fix: add source and name validation, import order, and test coverage for admin groups - Validate source against VALID_GROUP_SOURCES in createGroupHandler - Cap name at 500 characters in both create and update handlers - Document total as upper bound in getGroupMembers response - Document ObjectId requirement for deleteAclEntries in cascade - Fix import ordering in test file (local value after type imports) - Add tests for updateGroup with description, email, avatar fields - Add tests for invalid source and name max-length in both handlers * fix: add field length caps, flatten nested try/catch, and fix logger level in admin groups Add max-length validation for description, email, avatar, and idOnTheSource in create/update handlers. Extract removeObjectIdMember helper to flatten nested try/catch per never-nesting convention. Downgrade unmapped-memberIds log from error to warn. Fix type import ordering and add missing await in removeMemberById for consistency.	2026-03-26 17:36:18 -04:00
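The admin groups commit above (#12387) mentions escaping the regex in listGroups to prevent injection/ReDoS and rejecting search queries longer than 200 characters. A minimal sketch of how those two rules could combine is shown here. `escapeRegex` and `buildSearchPattern` are hypothetical names, and the case-insensitive flag is an assumption rather than documented behavior.

```typescript
const MAX_SEARCH_LENGTH = 200; // cap stated in the commit message

/** Escapes every regex metacharacter so the input is matched literally. */
function escapeRegex(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Returns a literal matcher for the search text, or null when the query
 * exceeds the cap (a handler would typically answer 400 in that case).
 * The "i" flag is an assumption made for this sketch.
 */
function buildSearchPattern(search: string): RegExp | null {
  if (search.length > MAX_SEARCH_LENGTH) {
    return null;
  }
  return new RegExp(escapeRegex(search), "i");
}
```

Because the user text is escaped before it reaches the RegExp constructor, an input such as `(a+)+$` is treated as nine literal characters instead of a nested quantifier.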
Danny Avila	9f6d8c6e93	🧵 feat: ALS Context Middleware, Tenant Threading, and Config Cache Invalidation (#12407 ) * feat: add tenant context middleware for ALS-based isolation Introduces tenantContextMiddleware that propagates req.user.tenantId into AsyncLocalStorage, activating the Mongoose applyTenantIsolation plugin for all downstream DB queries within a request. - Strict mode (TENANT_ISOLATION_STRICT=true) returns 403 if no tenantId - Non-strict mode passes through for backward compatibility - No-op for unauthenticated requests - Includes 6 unit tests covering all paths * feat: register tenant middleware and wrap startup/auth in runAsSystem() - Register tenantContextMiddleware in Express app after capability middleware - Wrap server startup initialization in runAsSystem() for strict mode compat - Wrap auth strategy getAppConfig() calls in runAsSystem() since they run before user context is established (LDAP, SAML, OpenID, social login, AuthService) * feat: thread tenantId through all getAppConfig callers Pass tenantId from req.user to getAppConfig() across all callers that have request context, ensuring correct per-tenant cache key resolution. Also fixes getBaseConfig admin endpoint to scope to requesting admin's tenant instead of returning the unscoped base config. Files updated: - Controllers: UserController, PluginController - Middleware: checkDomainAllowed, balance - Routes: config - Services: loadConfigModels, loadDefaultModels, getEndpointsConfig, MCP - Audio services: TTSService, STTService, getVoices, getCustomConfigSpeech - Admin: getBaseConfig endpoint * feat: add config cache invalidation on admin mutations - Add clearOverrideCache(tenantId?) 
to flush per-principal override caches by enumerating Keyv store keys matching _OVERRIDE_: prefix - Add invalidateConfigCaches() helper that clears base config, override caches, tool caches, and endpoint config cache in one call - Wire invalidation into all 5 admin config mutation handlers (upsert, patch, delete field, delete overrides, toggle active) - Add strict mode warning when __default__ tenant fallback is used - Add 3 new tests for clearOverrideCache (all/scoped/base-preserving) * chore: update getUserPrincipals comment to reflect ALS-based tenant filtering The TODO(#12091) about missing tenantId filtering is resolved by the tenant context middleware + applyTenantIsolation Mongoose plugin. Group queries are now automatically scoped by tenantId via ALS. * fix: replace runAsSystem with baseOnly for pre-tenant code paths App configs are tenant-owned — runAsSystem() would bypass tenant isolation and return cross-tenant DB overrides. Instead, add baseOnly option to getAppConfig() that returns YAML-derived config only, with zero DB queries. All startup code, auth strategies, and MCP initialization now use getAppConfig({ baseOnly: true }) to get the YAML config without touching the Config collection. 
* fix: address PR review findings — middleware ordering, types, cache safety - Chain tenantContextMiddleware inside requireJwtAuth after passport auth instead of global app.use() where req.user is always undefined (Finding 1) - Remove global tenantContextMiddleware registration from index.js - Update BalanceMiddlewareOptions to include tenantId, remove redundant cast (Finding 4) - Add warning log when clearOverrideCache cannot enumerate keys on Redis (Finding 3) - Use startsWith instead of includes for cache key filtering (Finding 12) - Use generator loop instead of Array.from for key enumeration (Finding 3) - Selective barrel export — exclude _resetTenantMiddlewareStrictCache (Finding 5) - Move isMainThread check to module level, remove per-request check (Finding 9) - Move mid-file require to top of app.js (Finding 8) - Parallelize invalidateConfigCaches with Promise.all (Finding 10) - Remove clearOverrideCache from public app.js exports (internal only) - Strengthen getUserPrincipals comment re: ALS dependency (Finding 2) * fix: restore runAsSystem for startup DB ops, consolidate require, clarify baseOnly - Restore runAsSystem() around performStartupChecks, updateInterfacePermissions, initializeMCPs, and initializeOAuthReconnectManager — these make Mongoose queries that need system context in strict tenant mode (NEW-3) - Consolidate duplicate require('@librechat/api') in requireJwtAuth.js (NEW-1) - Document that baseOnly ignores role/userId/tenantId in JSDoc (NEW-2) * test: add requireJwtAuth tenant chaining + invalidateConfigCaches tests - requireJwtAuth: 5 tests verifying ALS tenant context is set after passport auth, isolated between concurrent requests, and not set when user has no tenantId (Finding 6) - invalidateConfigCaches: 4 tests verifying all four caches are cleared, tenantId is threaded through, partial failure is handled gracefully, and operations run in parallel via Promise.all (Finding 11) * fix: address Copilot review — passport errors, namespaced 
cache keys, /base scoping - Forward passport errors in requireJwtAuth before entering tenant middleware — prevents silent auth failures from reaching handlers (P1) - Account for Keyv namespace prefix in clearOverrideCache — stored keys are namespaced as "APP_CONFIG:_OVERRIDE_:..." not "_OVERRIDE_:...", so override caches were never actually matched/cleared (P2) - Remove role from getBaseConfig — /base should return tenant-scoped base config, not role-merged config that drifts per admin role (P2) - Return tenantStorage.run() for cleaner async semantics - Update mock cache in service.spec.ts to simulate Keyv namespacing * fix: address second review — cache safety, code quality, test reliability - Decouple cache invalidation from mutation response: fire-and-forget with logging so DB mutation success is not masked by cache failures - Extract clearEndpointConfigCache helper from inline IIFE - Move isMainThread check to lazy once-per-process guard (no import side effect) - Memoize process.env read in overrideCacheKey to avoid per-request env lookups and log flooding in strict mode - Remove flaky timer-based parallelism assertion, use structural check - Merge orphaned double JSDoc block on getUserPrincipals - Fix stale [getAppConfig] log prefix → [ensureBaseConfig] - Fix import order in tenant.spec.ts (package types before local values) - Replace "Finding 1" reference with self-contained description - Use real tenantStorage primitives in requireJwtAuth spec mock * fix: move JSDoc to correct function after clearEndpointConfigCache extraction * refactor: remove Redis SCAN from clearOverrideCache, rely on TTL expiry Redis SCAN causes 60s+ stalls under concurrent load (see #12410). APP_CONFIG defaults to FORCED_IN_MEMORY_CACHE_NAMESPACES, so the in-memory store.keys() path handles the standard case. When APP_CONFIG is Redis-backed, overrides expire naturally via overrideCacheTtl (60s default) — an acceptable window for admin config mutations. 
* fix: remove return from tenantStorage.run to satisfy void middleware signature * fix: address second review — cache safety, code quality, test reliability - Switch invalidateConfigCaches from Promise.all to Promise.allSettled so partial failures are logged individually instead of producing one undifferentiated error (Finding 3) - Gate overrideCacheKey strict-mode warning behind a once-per-process flag to prevent log flooding under load (Finding 4) - Add test for passport error forwarding in requireJwtAuth — the if (err) { return next(err) } branch now has coverage (Finding 5) - Add test for real partial failure in invalidateConfigCaches where clearAppConfigCache rejects (not just the swallowed endpoint error) * chore: reorder imports in index.js and app.js for consistency - Moved logger and runAsSystem imports to maintain a consistent import order across files. - Improved code readability by ensuring related imports are grouped together.	2026-03-26 17:35:00 -04:00
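Two review fixes in the commit above (#12407) concern how clearOverrideCache matches keys: stored keys carry the Keyv namespace ("APP_CONFIG:_OVERRIDE_:..."), and filtering uses startsWith rather than includes. The sketch below illustrates that matching against a plain Map standing in for the in-memory store. The key layout after the prefix (tenant ID followed by a colon) is an assumption for illustration, and this function is not the LibreChat implementation.

```typescript
// A Map plays the role of the in-memory cache store (hypothetical stand-in).
const NAMESPACE = "APP_CONFIG";
const OVERRIDE_PREFIX = "_OVERRIDE_:";

/**
 * Deletes override entries and returns how many were removed.
 * With a tenantId, only that tenant's entries are removed; the
 * "<prefix><tenantId>:" layout is an assumption made for this sketch.
 */
function clearOverrideCache(store: Map<string, unknown>, tenantId?: string): number {
  const base = `${NAMESPACE}:${OVERRIDE_PREFIX}`;
  const prefix = tenantId === undefined ? base : `${base}${tenantId}:`;
  let removed = 0;
  // Deleting the current entry while iterating a Map is well defined.
  store.forEach((_value, key) => {
    // startsWith avoids false positives from keys that merely contain the marker.
    if (key.startsWith(prefix)) {
      store.delete(key);
      removed += 1;
    }
  });
  return removed;
}
```

Matching on the bare "_OVERRIDE_:" prefix would never hit a namespaced key, which is the bug the commit describes; building the prefix from the namespace keeps the base config entry untouched while override entries are dropped.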
Danny Avila	4b6d68b3b5	🎛️ feat: DB-Backed Per-Principal Config System (#12354 ) * ✨ feat: Add Config schema, model, and methods for role-based DB config overrides Add the database foundation for principal-based configuration overrides (user, group, role) in data-schemas. Includes schema with tenantId and tenant isolation, CRUD methods, and barrel exports. * 🔧 fix: Add shebang and enforce LF line endings for git hooks The pre-commit hook was missing #!/bin/sh, and core.autocrlf=true was converting it to CRLF, both causing "Exec format error" on Windows. Add .gitattributes to force LF for .husky/* and .sh files. ✨ feat: Add admin config API routes with section-level capability checks Add /api/admin/config endpoints for managing per-principal config overrides (user, group, role). Handlers in @librechat/api use DI pattern with section-level hasConfigCapability checks for granular access control. Supports full overrides replacement, per-field PATCH via dot-paths, field deletion, toggle active, and listing. * 🐛 fix: Move deleteConfigField fieldPath from URL param to request body The path-to-regexp wildcard syntax (:fieldPath()) is not supported by the version used in Express. Send fieldPath in the DELETE request body instead, which also avoids URL-encoding issues with dotted paths. ✨ feat: Wire config resolution into getAppConfig with override caching Add mergeConfigOverrides utility in data-schemas for deep-merging DB config overrides into base AppConfig by priority order. Update getAppConfig to query DB for applicable configs when role/userId is provided, with short-TTL caching and a hasAnyConfigs feature flag for zero-cost when no DB configs exist. Also: add unique compound index on Config schema, pass userId from config middleware, and signal config changes from admin API handlers. 
* 🔄 refactor: Extract getAppConfig logic into packages/api as TS service Move override resolution, caching strategy, and signalConfigChange from api/server/services/Config/app.js into packages/api/src/app/appConfigService.ts using the DI factory pattern (createAppConfigService). The JS file becomes a thin wiring layer injecting loadBaseConfig, cache, and DB dependencies. * 🧹 chore: Rename configResolution.ts to resolution.ts * ✨ feat: Move admin types & capabilities to librechat-data-provider Move SystemCapabilities, CapabilityImplications, and utility functions (hasImpliedCapability, expandImplications) from data-schemas to data-provider so they are available to external consumers like the admin panel without a data-schemas dependency. Add API-friendly admin types: TAdminConfig, TAdminSystemGrant, TAdminAuditLogEntry, TAdminGroup, TAdminMember, TAdminUserSearchResult, TCapabilityCategory, and CAPABILITY_CATEGORIES. data-schemas re-exports these from data-provider and extends with config-schema-derived types (ConfigSection, SystemCapability union). Bump version to 0.8.500. * feat: Add JSON-serializable admin config API response types to data-schemas Add AdminConfig, AdminConfigListResponse, AdminConfigResponse, and AdminConfigDeleteResponse types so both LibreChat API handlers and the admin panel can share the same response contract. Bump version to 0.0.41. * refactor: Move admin capabilities & types from data-provider to data-schemas SystemCapabilities, CapabilityImplications, utility functions, CAPABILITY_CATEGORIES, and admin API response types should not be in data-provider as it gets compiled into the frontend bundle, exposing the capability surface. Moved everything to data-schemas (server-only). All consumers already import from @librechat/data-schemas, so no import changes needed elsewhere. Consolidated duplicate AdminConfig type (was in both config.ts and admin.ts). 
* chore: Bump @librechat/data-schemas to 0.0.42 * refactor: Reorganize admin capabilities into admin/ and types/admin.ts Split systemCapabilities.ts following data-schemas conventions: - Types (BaseSystemCapability, SystemCapability, AdminConfig, etc.) → src/types/admin.ts - Runtime code (SystemCapabilities, CapabilityImplications, utilities) → src/admin/capabilities.ts Revert data-provider version to 0.8.401 (no longer modified). * chore: Fix import ordering, rename appConfigService to service - Rename app/appConfigService.ts → app/service.ts (directory provides context) - Fix import order in admin/config.ts, types/admin.ts, types/config.ts - Add naming convention to AGENTS.md * feat: Add DB base config support (role/__base__) - Add BASE_CONFIG_PRINCIPAL_ID constant for reserved base config doc - getApplicableConfigs always includes __base__ in queries - getAppConfig queries DB even without role/userId when DB configs exist - Bump @librechat/data-schemas to 0.0.43 * fix: Address PR review issues for admin config - Add listAllConfigs method; listConfigs endpoint returns all active configs instead of only __base__ - Normalize principalId to string in all config methods to prevent ObjectId vs string mismatch on user/group lookups - Block __proto__ and all dunder-prefixed segments in field path validation to prevent prototype pollution - Fix configVersion off-by-one: default to 0, guard pre('save') with !isNew, use $inc on findOneAndUpdate - Remove unused getApplicableConfigs from admin handler deps * fix: Enable tree-shaking for data-schemas, bump packages - Switch data-schemas Rollup output to preserveModules so each source file becomes its own chunk; consumers (admin panel) can now import just the modules they need without pulling in winston/mongoose/etc. 
- Add sideEffects: false to data-schemas package.json - Bump data-schemas to 0.0.44, data-provider to 0.8.402 * feat: add capabilities subpath export to data-schemas Adds `@librechat/data-schemas/capabilities` subpath export so browser consumers can import BASE_CONFIG_PRINCIPAL_ID and capability constants without pulling in Node.js-only modules (winston, async_hooks, etc.). Bump version to 0.0.45. * fix: include dist/ in data-provider npm package Add explicit files field so npm includes dist/types/ in the published package. Without this, the root .gitignore exclusion of dist/ causes npm to omit type declarations, breaking TypeScript consumers. * chore: bump librechat-data-provider to 0.8.403 * feat: add GET /api/admin/config/base for raw AppConfig Returns the full AppConfig (YAML + DB base merged) so the admin panel can display actual config field values and structure. The startup config endpoint (/api/config) returns TStartupConfig which is a different shape meant for the frontend app. 
* chore: imports order * fix: address code review findings for admin config Critical: - Fix clearAppConfigCache: was deleting from wrong cache store (CONFIG_STORE instead of APP_CONFIG), now clears BASE and HAS_DB_CONFIGS keys - Eliminate race condition: patchConfigField and deleteConfigField now use atomic MongoDB $set/$unset with dot-path notation instead of read-modify-write cycles, removing the lost-update bug entirely - Add patchConfigFields and unsetConfigField atomic DB methods Major: - Reorder cache check before principal resolution in getAppConfig so getUserPrincipals DB query only fires on cache miss - Replace '' as ConfigSection with typed BROAD_CONFIG_ACCESS constant - Parallelize capability checks with Promise.all instead of sequential awaits in for loops - Use loose equality (== null) for cache miss check to handle both null and undefined returns from cache implementations - Set HAS_DB_CONFIGS_KEY to true on successful config fetch Minor: - Remove dead pre('save') hook from config schema (all writes use findOneAndUpdate which bypasses document hooks) - Consolidate duplicate type imports in resolution.ts - Remove dead deepGet/deepSet/deepUnset functions (replaced by atomic ops) - Add .sort({ priority: 1 }) to getApplicableConfigs query - Rename _impliedBy to impliedByMap * fix: self-referencing BROAD_CONFIG_ACCESS constant * fix: replace type-cast sentinel with proper null parameter Update hasConfigCapability to accept ConfigSection | null where null means broad access check (MANAGE_CONFIGS or READ_CONFIGS only). Removes the '' as ConfigSection type lie from admin config handlers. * fix: remaining review findings + add tests - listAllConfigs accepts optional { isActive } filter so admin listing can show inactive configs (#9) - Standardize session application to .session(session ?? 
null) across all config DB methods (#15) - Export isValidFieldPath and getTopLevelSection for testability - Add 38 tests across 3 spec files: - config.spec.ts (api): path validation, prototype pollution rejection - resolution.spec.ts: deep merge, priority ordering, array replacement - config.spec.ts (data-schemas): full CRUD, ObjectId normalization, atomic $set/$unset, configVersion increment, toggle, __base__ query * fix: address second code review findings - Fix cross-user cache contamination: overrideCacheKey now handles userId-without-role case with its own cache key (#1) - Add broad capability check before DB lookup in getConfig to prevent config existence enumeration (#2/#3) - Move deleteConfigField fieldPath from request body to query parameter for proxy/load balancer compatibility (#5) - Derive BaseSystemCapability from SystemCapabilities const instead of manual string union (#6) - Return 201 on upsert creation, 200 on update (#11) - Remove inline narration comments per AGENTS.md (#12) - Type overrides as Partial<TCustomConfig> in DB methods and handler deps (#13) - Replace double as-unknown-as casts in resolution.ts with generic deepMerge<T> (#14) - Make override cache TTL injectable via AppConfigServiceDeps (#16) - Add exhaustive never check in principalModel switch (#17) * fix: remaining review findings — tests, rename, semantics - Rename signalConfigChange → markConfigsDirty with JSDoc documenting the stale-window tradeoff and overrideCacheTtl knob - Fix DEFAULT_OVERRIDE_CACHE_TTL naming convention - Add createAppConfigService tests (14 cases): cache behavior, feature flag, cross-user key isolation, fallback on error, markConfigsDirty - Add admin handler integration tests (13 cases): auth ordering, 201/200 on create/update, fieldPath from query param, markConfigsDirty calls, capability checks * fix: global flag corruption + empty overrides auth bypass - Remove HAS_DB_CONFIGS_KEY=false optimization: a scoped query returning no configs does not mean no 
configs exist globally. Setting the flag false from a per-principal query short-circuited all subsequent users. - Add broad manage capability check before section checks in upsertConfigOverrides: empty overrides {} no longer bypasses auth. * test: add regression and invariant tests for config system Regression tests: - Bug 1: User A's empty result does not short-circuit User B's overrides - Bug 2: Empty overrides {} returns 403 without MANAGE_CONFIGS Invariant tests (applied across ALL handlers): - All 5 mutation handlers call markConfigsDirty on success - All 5 mutation handlers return 401 without auth - All 5 mutation handlers return 403 without capability - All 3 read handlers return 403 without capability * fix: third review pass — all findings addressed Service (service.ts): - Restore HAS_DB_CONFIGS=false for base-only queries (no role/userId) so deployments with zero DB configs skip DB queries (#1) - Resolve cache once at factory init instead of per-invocation (#8) - Use BASE_CONFIG_PRINCIPAL_ID constant in overrideCacheKey (#10) - Add JSDoc to clearAppConfigCache documenting stale-window (#4) - Fix log message to not say "from YAML" (#14) Admin handlers (config.ts): - Use configVersion===1 for 201 vs 200, eliminating TOCTOU race (#2) - Add Array.isArray guard on overrides body (#5) - Import CapabilityUser from capabilities.ts, remove duplicate (#6) - Replace as-unknown-as cast with targeted type assertion (#7) - Add MAX_PATCH_ENTRIES=100 cap on entries array (#15) - Reorder deleteConfigField to validate principalType first (#12) - Export CapabilityUser from middleware/capabilities.ts DB methods (config.ts): - Remove isActive:true from patchConfigFields to prevent silent reactivation of disabled configs (#3) Schema (config.ts): - Change principalId from Schema.Types.Mixed to String (#11) Tests: - Add patchConfigField unsafe fieldPath rejection test (#9) - Add base-only HAS_DB_CONFIGS=false test (#1) - Update 201/200 tests to use configVersion instead of 
findConfig (#2) * fix: add read handler 401 invariant tests + document flag behavior - Add invariant: all 3 read handlers return 401 without auth - Document on markConfigsDirty that HAS_DB_CONFIGS stays true after all configs are deleted until clearAppConfigCache or restart * fix: remove HAS_DB_CONFIGS false optimization entirely getApplicableConfigs([]) only queries for __base__, not all configs. A deployment with role/group configs but no __base__ doc gets the flag poisoned to false by a base-only query, silently ignoring all scoped overrides. The optimization is not safe without a comprehensive Config.exists() check, which adds its own DB cost. Removed entirely. The flag is now write-once-true (set when configs are found or by markConfigsDirty) and only cleared by clearAppConfigCache/restart. * chore: reorder import statements in app.js for clarity * refactor: remove HAS_DB_CONFIGS_KEY machinery entirely The three-state flag (false/null/true) was the source of multiple bugs across review rounds. Every attempt to safely set it to false was defeated by getApplicableConfigs querying only a subset of principals. Removed: HAS_DB_CONFIGS_KEY constant, all reads/writes of the flag, markConfigsDirty (now a no-op concept), notifyChange wrapper, and all tests that seeded false manually. The per-user/role TTL cache (overrideCacheTtl, default 60s) is the sole caching mechanism. On cache miss, getApplicableConfigs queries the DB. This is one indexed query per user per TTL window — acceptable for the config override use case. * docs: rewrite admin panel remaining work with current state * perf: cache empty override results to avoid repeated DB queries When getApplicableConfigs returns no configs for a principal, cache baseConfig under their override key with TTL. Without this, every user with no per-principal overrides hits MongoDB on every request after the 60s cache window expires. 
* fix: add tenantId to cache keys + reject PUBLIC principal type - Include tenantId in override cache keys to prevent cross-tenant config contamination. Single-tenant deployments (tenantId undefined) use '_' as placeholder — no behavior change for them. - Reject PrincipalType.PUBLIC in admin config validation — PUBLIC has no PrincipalModel and is never resolved by getApplicableConfigs, so config docs for it would be dead data. - Config middleware passes req.user.tenantId to getAppConfig. * fix: fourth review pass findings DB methods (config.ts): - findConfigByPrincipal accepts { includeInactive } option so admin GET can retrieve inactive configs (#5) - upsertConfig catches E11000 duplicate key on concurrent upserts and retries without upsert flag (#2) - unsetConfigField no longer filters isActive:true, consistent with patchConfigFields (#11) - Typed filter objects replace Record<string, unknown> (#12) Admin handlers (config.ts): - patchConfigField: serial broad capability check before Promise.all to pre-warm ALS principal cache, preventing N parallel DB calls (#3) - isValidFieldPath rejects leading/trailing dots and consecutive dots (#7) - Duplicate fieldPaths in patch entries return 400 (#8) - DEFAULT_PRIORITY named constant replaces hardcoded 10 (#14) - Admin getConfig and patchConfigField pass includeInactive to findConfigByPrincipal (#5) - Route import uses barrel instead of direct file path (#13) Resolution (resolution.ts): - deepMerge has MAX_MERGE_DEPTH=10 guard to prevent stack overflow from crafted deeply nested configs (#4) * fix: final review cleanup - Remove ADMIN_PANEL_REMAINING.md (local dev notes with Windows paths) - Add empty-result caching regression test - Add tenantId to AdminConfigDeps.getAppConfig type - Restore exhaustive never check in principalModel switch - Standardize toggleConfigActive session handling to options pattern * fix: validate priority in patchConfigField handler Add the same non-negative number validation for priority that 
upsertConfigOverrides already has. Without this, invalid priority values could be stored via PATCH and corrupt merge ordering. * chore: remove planning doc from PR * fix: correct stale cache key strings in service tests * fix: clean up service tests and harden tenant sentinel - Remove no-op cache delete lines from regression tests - Change no-tenant sentinel from '_' to '__default__' to avoid collision with a real tenant ID when multi-tenancy is enabled - Remove unused CONFIG_STORE from AppConfigServiceDeps * chore: bump @librechat/data-schemas to 0.0.46 * fix: block prototype-poisoning keys in deepMerge Skip __proto__, constructor, and prototype keys during config merge to prevent prototype pollution via PUT /api/admin/config overrides.	2026-03-25 19:39:29 -04:00
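Several fixes in the config commit above (#12354) harden the override merge and field-path handling: deepMerge skips `__proto__`, `constructor`, and `prototype`, stops at MAX_MERGE_DEPTH=10, and replaces arrays rather than merging them, while isValidFieldPath rejects dunder-prefixed segments plus leading, trailing, and consecutive dots. The following is a minimal sketch of those rules under stated assumptions, not the code in resolution.ts or config.ts. In particular, what happens once the depth limit is exceeded (here: keep the base subtree) is a guess.

```typescript
const MAX_MERGE_DEPTH = 10; // depth limit named in the commit message
const BLOCKED_KEYS = new Set(["__proto__", "constructor", "prototype"]);

type PlainObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Merges `override` into a copy of `base`: objects recurse, arrays and scalars replace. */
function deepMerge(base: PlainObject, override: PlainObject, depth = 0): PlainObject {
  if (depth > MAX_MERGE_DEPTH) {
    return base; // assumption: stop descending and keep the base subtree
  }
  const result: PlainObject = { ...base };
  for (const key of Object.keys(override)) {
    if (BLOCKED_KEYS.has(key)) {
      continue; // never copy prototype-poisoning keys
    }
    const next = override[key];
    const current = result[key];
    result[key] =
      isPlainObject(next) && isPlainObject(current)
        ? deepMerge(current, next, depth + 1)
        : next;
  }
  return result;
}

/** Rejects empty segments (leading, trailing, or doubled dots) and dunder-prefixed segments. */
function isValidFieldPath(path: string): boolean {
  if (path.length === 0) {
    return false;
  }
  return path
    .split(".")
    .every((segment) => segment.length > 0 && !segment.startsWith("__"));
}
```

Overrides would be applied in priority order by folding them through `deepMerge`, so a later (higher-priority) document wins on conflicting scalar or array fields while untouched sibling keys from the base survive.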

15 commits