mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-07-02 12:22:22 +00:00
* 🧹 fix: Prune Orphaned File References on File Deletion Deleting a file via the Manage Files tab left its file_id in every agent's tool_resources.*.file_ids. Stubs accumulate until the frontend dedupe keys them as duplicates and blocks all new uploads (issue #12776). - Add removeAgentResourceFilesFromAllAgents in packages/data-schemas: a single updateMany/$pullAll across every EToolResources category. - Invoke it from processDeleteRequest after db.deleteFiles so every referencing agent is cleaned up, not just the one passed in req.body. - Wrap the cleanup in try/catch so a stale agent update cannot mask a successful file deletion. * 🧼 fix: Prune Orphaned File References on Agent Update Already-affected agents would stay broken even after the delete-time fix: the stubs sit on the agent document until something strips them. Heal them on the next save (issue #12776). - Add collectToolResourceFileIds + stripFileIdsFromToolResources helpers in @librechat/api — centralizing the tool_resources traversal used by the controller and the follow-up migration script. - In updateAgentHandler, check the effective tool_resources against the files collection. When orphans are found, either strip them from the incoming tool_resources (if the update sets them) or run the bulk cleanup (if the update leaves tool_resources untouched). * 🧰 chore: Add Migration to Clean Up Orphaned Agent File References Complements the delete-time and save-time fixes by healing agents that already accumulated orphan stubs before the upgrade (issue #12776). The script is idempotent — re-running it on a clean database is a no-op. - Add config/migrate-orphaned-agent-files.js following the existing migrate-*.js convention: --dry-run by default omitted (writes by default) and --batch-size= tuning knob. Streams agents via cursor. - Register migrate:orphaned-agent-files and :dry-run npm scripts. - Reuse collectToolResourceFileIds from @librechat/api so migration and runtime share the same traversal logic. * 🩹 fix: Address Codex/Copilot Review on Orphaned Agent File Cleanup Refines the #12776 fix series based on automated review feedback. - Scope save-time pruning to the current agent only. When a PATCH carries tool_resources, strip orphans from the incoming payload and pay the DB round-trip only then. Removes the collection-wide updateMany previously triggered when tool_resources was absent (Codex P2 / Copilot). - Wrap the orphan check in try/catch so a transient db.getFiles failure can't turn a good save into a 500 (comprehensive review #1). - Replace Object.values(EToolResources) casts with an explicit list of agent-side categories in both orphans.ts and agent.ts. code_interpreter belongs to the Assistants API and isn't a key of AgentToolResources — including it was a type lie and generated dead MongoDB clauses (comprehensive review #3, #8). - Export TOOL_RESOURCE_KEYS from @librechat/api and consume it in the migration script, dropping one duplicated definition (#4). - Cap migration results.details at 50 sample entries so the memory footprint stays bounded on deployments with thousands of corrupted agents (Codex P3). - Add migrate:orphaned-agent-files:batch npm script to match the convention set by migrate-agent-permissions / migrate-prompt-permissions (#7). - Add controller-level tests covering the three orphan-pruning paths: strip from incoming tool_resources, leave alone when tool_resources is absent, swallow db.getFiles errors and still save (#6). - Back pre-existing "should validate tool_resources in updates" test's file_ids with real File docs — the new pruning would otherwise strip them, and that test is about OCR conversion / schema filtering, not file existence. Register the File model in beforeAll so the fixture works. * 🩹 fix: Tighten TOOL_RESOURCE_KEYS Type and Align Migration Sample Output Two follow-ups from the second review pass. - Type data-schemas' TOOL_RESOURCE_KEYS as ReadonlyArray<keyof AgentToolResources> instead of readonly string[]. Data-schemas depends on data-provider, so the import is clean. Catches typos and aligns with the matching export in @librechat/api — doesn't guarantee exhaustiveness, but that's a TypeScript limitation, not a workspace one. - Align the migration's console output with DETAIL_SAMPLE_LIMIT: print every collected detail (up to 50) and, when more agents were affected than the sample size allowed, show a truncation notice. The old hard cap of 25 meant affected agents in the 26-50 range were collected but never shown. * ✅ test: Add Integration Coverage for Orphan Cleanup Paths (#12776) Exercise the delete-time and migration paths end-to-end against a real in-memory Mongo. Catches integration bugs the isolated unit tests on each layer couldn't. - api/server/services/Files/process.integration.spec.js — the primary repro: seed an Agent + File, call processDeleteRequest, assert the file_id disappears from every referencing agent's tool_resources while unrelated agents stay untouched. Also covers the no-op case and confirms a failure in the new cleanup step cannot roll back the file deletion itself. - api/test/migrate-orphaned-agent-files.spec.js — drives the migration module: --dry-run reports without writing, apply mode prunes across every tool_resource category, re-running is idempotent, and DETAIL_SAMPLE_LIMIT caps the in-memory sample on wide corruption. Mocks only the connect helper (the spec owns the mongoose instance) so the real migration code path — cursor, $pullAll, reduce — runs. * 🔒 fix: Run Orphan Cleanup Migration in System Tenant Context Codex P2 catch: under TENANT_ISOLATION_STRICT=true, the migration throws on the very first Agent.countDocuments() because the tenant isolation plugin fail-closes on queries without tenant context — which makes migrate:orphaned-agent-files unusable on the exact deployments most likely to have accumulated corruption. - Wrap the scan/prune body in runAsSystem so queries bypass the tenant filter (SYSTEM_TENANT_ID sentinel). The migration legitimately needs cross-tenant visibility — this is the same pattern seedDatabase and the S3 refresh job already use. - Add a regression test that spies on Agent.countDocuments() and asserts the active tenantStorage context is SYSTEM_TENANT_ID during the call. Pins the wrap against future regressions without the brittleness of toggling the strict-mode env var (which caches on first read). Note: the delete-time and save-time paths already run inside an authenticated HTTP request where tenantStorage.run is set by auth middleware, so the cleanup naturally scopes to the current tenant — which is the correct behavior there since file ownership is tenant-scoped. * 🧹 chore: Drop Unused path Import From Process Integration Spec Leftover from an earlier iteration that resolved the migration path via path.resolve before I switched to a relative require. The import does nothing now — removing it.
184 lines
6.5 KiB
JavaScript
184 lines
6.5 KiB
JavaScript
/**
|
|
* Integration test for the orphan-cleanup migration script used to heal
|
|
* agents corrupted before the delete-time and save-time fixes for issue
|
|
* #12776 shipped. Exercises the full module end-to-end:
|
|
* - dry-run reports orphans without writing
|
|
* - apply mode removes them
|
|
* - re-running on a cleaned database is a no-op (idempotent)
|
|
* - the DETAIL_SAMPLE_LIMIT truncation kicks in on wide corruption
|
|
*/
|
|
|
|
// Replace the migration's `./connect` helper — it opens its own connection
|
|
// via the mongo URI env var, but the test already owns the mongoose instance.
|
|
jest.mock('../../config/connect', () => jest.fn(async () => undefined));
|
|
|
|
jest.mock('@librechat/data-schemas', () => {
|
|
const actual = jest.requireActual('@librechat/data-schemas');
|
|
return {
|
|
...actual,
|
|
logger: { warn: jest.fn(), debug: jest.fn(), error: jest.fn(), info: jest.fn() },
|
|
};
|
|
});
|
|
|
|
const mongoose = require('mongoose');
|
|
const { MongoMemoryServer } = require('mongodb-memory-server');
|
|
const { agentSchema, fileSchema } = require('@librechat/data-schemas');
|
|
const { FileSources } = require('librechat-data-provider');
|
|
|
|
const { migrateOrphanedAgentFiles } = require('../../config/migrate-orphaned-agent-files');
|
|
|
|
describe('migrate-orphaned-agent-files (issue #12776)', () => {
|
|
let mongoServer;
|
|
let Agent;
|
|
let File;
|
|
const userId = () => new mongoose.Types.ObjectId();
|
|
|
|
beforeAll(async () => {
|
|
mongoServer = await MongoMemoryServer.create();
|
|
await mongoose.connect(mongoServer.getUri());
|
|
Agent = mongoose.models.Agent || mongoose.model('Agent', agentSchema);
|
|
File = mongoose.models.File || mongoose.model('File', fileSchema);
|
|
}, 30000);
|
|
|
|
afterAll(async () => {
|
|
await mongoose.disconnect();
|
|
await mongoServer.stop();
|
|
});
|
|
|
|
beforeEach(async () => {
|
|
await Agent.deleteMany({});
|
|
await File.deleteMany({});
|
|
});
|
|
|
|
const seedFile = (file_id) =>
|
|
File.create({
|
|
file_id,
|
|
user: userId(),
|
|
filename: `${file_id}.txt`,
|
|
filepath: `/tmp/${file_id}`,
|
|
object: 'file',
|
|
type: 'text/plain',
|
|
bytes: 1,
|
|
source: FileSources.text,
|
|
});
|
|
|
|
const seedAgent = (tool_resources) =>
|
|
Agent.create({
|
|
id: `agent_${Math.random().toString(36).slice(2, 10)}`,
|
|
name: `Test Agent ${Math.random().toString(36).slice(2, 6)}`,
|
|
provider: 'test',
|
|
model: 'test-model',
|
|
author: userId(),
|
|
tool_resources,
|
|
});
|
|
|
|
test('dry-run reports orphans without mutating any agent', async () => {
|
|
const keeperId = 'keeper';
|
|
await seedFile(keeperId);
|
|
const agent = await seedAgent({
|
|
file_search: { file_ids: [keeperId, 'orphan_1', 'orphan_2'] },
|
|
});
|
|
|
|
const result = await migrateOrphanedAgentFiles({ dryRun: true });
|
|
|
|
expect(result.dryRun).toBe(true);
|
|
expect(result.scannedAgents).toBe(1);
|
|
expect(result.agentsWithOrphans).toBe(1);
|
|
expect(result.totalOrphansRemoved).toBe(2);
|
|
// Dry-run reports what would change without writing — no updates counted.
|
|
expect(result.agentsUpdated).toBe(0);
|
|
|
|
const after = await Agent.findOne({ id: agent.id }).lean();
|
|
expect(after.tool_resources.file_search.file_ids).toEqual([keeperId, 'orphan_1', 'orphan_2']);
|
|
});
|
|
|
|
test('apply mode removes orphans across every tool_resource category', async () => {
|
|
const keeperA = 'k_a';
|
|
const keeperB = 'k_b';
|
|
await seedFile(keeperA);
|
|
await seedFile(keeperB);
|
|
|
|
const agent = await seedAgent({
|
|
file_search: { file_ids: [keeperA, 'o1'] },
|
|
execute_code: { file_ids: ['o2', keeperB] },
|
|
context: { file_ids: ['o3'] },
|
|
});
|
|
|
|
const result = await migrateOrphanedAgentFiles({ dryRun: false });
|
|
|
|
expect(result.dryRun).toBe(false);
|
|
expect(result.agentsWithOrphans).toBe(1);
|
|
expect(result.agentsUpdated).toBe(1);
|
|
expect(result.totalOrphansRemoved).toBe(3);
|
|
|
|
const after = await Agent.findOne({ id: agent.id }).lean();
|
|
expect(after.tool_resources.file_search.file_ids).toEqual([keeperA]);
|
|
expect(after.tool_resources.execute_code.file_ids).toEqual([keeperB]);
|
|
expect(after.tool_resources.context.file_ids).toEqual([]);
|
|
});
|
|
|
|
test('is idempotent — re-running on a clean database is a no-op', async () => {
|
|
await seedFile('keeper');
|
|
await seedAgent({ file_search: { file_ids: ['keeper', 'orphan'] } });
|
|
|
|
await migrateOrphanedAgentFiles({ dryRun: false });
|
|
const second = await migrateOrphanedAgentFiles({ dryRun: false });
|
|
|
|
expect(second.agentsWithOrphans).toBe(0);
|
|
expect(second.agentsUpdated).toBe(0);
|
|
expect(second.totalOrphansRemoved).toBe(0);
|
|
});
|
|
|
|
test('leaves agents without orphans completely alone', async () => {
|
|
await seedFile('only');
|
|
const agent = await seedAgent({ file_search: { file_ids: ['only'] } });
|
|
|
|
const result = await migrateOrphanedAgentFiles({ dryRun: false });
|
|
|
|
expect(result.scannedAgents).toBe(1);
|
|
expect(result.agentsWithOrphans).toBe(0);
|
|
const after = await Agent.findOne({ id: agent.id }).lean();
|
|
expect(after.tool_resources.file_search.file_ids).toEqual(['only']);
|
|
});
|
|
|
|
test('sample array is bounded on wide corruption (DETAIL_SAMPLE_LIMIT)', async () => {
|
|
// Seed more than the cap (50) so the truncation branch is exercised.
|
|
const agents = [];
|
|
for (let i = 0; i < 55; i++) {
|
|
agents.push(
|
|
await seedAgent({
|
|
file_search: { file_ids: [`orphan_${i}`] },
|
|
}),
|
|
);
|
|
}
|
|
|
|
const result = await migrateOrphanedAgentFiles({ dryRun: true });
|
|
|
|
expect(result.agentsWithOrphans).toBe(55);
|
|
expect(result.details.length).toBeLessThanOrEqual(50);
|
|
expect(result.details.length).toBeGreaterThan(0);
|
|
});
|
|
|
|
test('runs the body inside a system tenant context (strict-mode safe)', async () => {
|
|
// Pins the runAsSystem wrap: without it the migration throws under
|
|
// TENANT_ISOLATION_STRICT=true on the very first Agent.countDocuments(),
|
|
// blocking the intended remediation path for corrupted agents.
|
|
const { SYSTEM_TENANT_ID, tenantStorage } = require('@librechat/data-schemas');
|
|
await seedFile('keeper');
|
|
await seedAgent({ file_search: { file_ids: ['keeper', 'orphan'] } });
|
|
|
|
const contextsObserved = [];
|
|
const originalCountDocuments = Agent.countDocuments.bind(Agent);
|
|
Agent.countDocuments = jest.fn((...args) => {
|
|
contextsObserved.push(tenantStorage.getStore()?.tenantId);
|
|
return originalCountDocuments(...args);
|
|
});
|
|
|
|
try {
|
|
await migrateOrphanedAgentFiles({ dryRun: false });
|
|
expect(contextsObserved).toContain(SYSTEM_TENANT_ID);
|
|
} finally {
|
|
Agent.countDocuments = originalCountDocuments;
|
|
}
|
|
});
|
|
});
|