mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-05-13 16:07:30 +00:00
* 🧹 fix: Prune Orphaned File References on File Deletion

  Deleting a file via the Manage Files tab left its file_id in every agent's
  tool_resources.*.file_ids. Stubs accumulate until the frontend dedupe keys
  them as duplicates and blocks all new uploads (issue #12776).

  - Add removeAgentResourceFilesFromAllAgents in packages/data-schemas: a
    single updateMany/$pullAll across every EToolResources category.
  - Invoke it from processDeleteRequest after db.deleteFiles so every
    referencing agent is cleaned up, not just the one passed in req.body.
  - Wrap the cleanup in try/catch so a stale agent update cannot mask a
    successful file deletion.

* 🧼 fix: Prune Orphaned File References on Agent Update

  Already-affected agents would stay broken even after the delete-time fix:
  the stubs sit on the agent document until something strips them. Heal them
  on the next save (issue #12776).

  - Add collectToolResourceFileIds + stripFileIdsFromToolResources helpers in
    @librechat/api, centralizing the tool_resources traversal used by the
    controller and the follow-up migration script.
  - In updateAgentHandler, check the effective tool_resources against the
    files collection. When orphans are found, either strip them from the
    incoming tool_resources (if the update sets them) or run the bulk cleanup
    (if the update leaves tool_resources untouched).

* 🧰 chore: Add Migration to Clean Up Orphaned Agent File References

  Complements the delete-time and save-time fixes by healing agents that
  already accumulated orphan stubs before the upgrade (issue #12776). The
  script is idempotent: re-running it on a clean database is a no-op.

  - Add config/migrate-orphaned-agent-files.js following the existing
    migrate-*.js convention: a --dry-run flag (writes by default when it is
    omitted) and a --batch-size= tuning knob. Streams agents via cursor.
  - Register migrate:orphaned-agent-files and :dry-run npm scripts.
  - Reuse collectToolResourceFileIds from @librechat/api so the migration and
    the runtime share the same traversal logic.
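The `updateMany`/`$pullAll` cleanup described above can be sketched roughly as follows. This is a minimal illustration, not LibreChat's actual code: the category list and the helper names (`buildPullAllUpdate`, `removeFileIdsFromAllAgents`) are assumptions made for the example.

```javascript
// Illustrative category list; the real TOOL_RESOURCE_KEYS export may differ.
const TOOL_RESOURCE_KEYS = ['file_search', 'execute_code', 'context'];

/**
 * Build a single $pullAll update document that removes the given file_ids
 * from every tool_resources category on an agent document.
 */
function buildPullAllUpdate(fileIds) {
  const pullAll = {};
  for (const key of TOOL_RESOURCE_KEYS) {
    pullAll[`tool_resources.${key}.file_ids`] = fileIds;
  }
  return { $pullAll: pullAll };
}

/**
 * One updateMany covers every referencing agent, not just the one named in
 * the request body. `Agent` is assumed to be a Mongoose model.
 */
async function removeFileIdsFromAllAgents(Agent, fileIds) {
  return Agent.updateMany({}, buildPullAllUpdate(fileIds));
}
```

Because `$pullAll` on a missing path is a no-op in MongoDB, a single update document can safely target all categories at once, which is what makes the one-query cleanup possible.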
* 🩹 fix: Address Codex/Copilot Review on Orphaned Agent File Cleanup

  Refines the #12776 fix series based on automated review feedback.

  - Scope save-time pruning to the current agent only. When a PATCH carries
    tool_resources, strip orphans from the incoming payload and pay the DB
    round-trip only then. Removes the collection-wide updateMany previously
    triggered when tool_resources was absent (Codex P2 / Copilot).
  - Wrap the orphan check in try/catch so a transient db.getFiles failure
    can't turn a good save into a 500 (comprehensive review #1).
  - Replace Object.values(EToolResources) casts with an explicit list of
    agent-side categories in both orphans.ts and agent.ts. code_interpreter
    belongs to the Assistants API and isn't a key of AgentToolResources;
    including it was a type lie and generated dead MongoDB clauses
    (comprehensive review #3, #8).
  - Export TOOL_RESOURCE_KEYS from @librechat/api and consume it in the
    migration script, dropping one duplicated definition (#4).
  - Cap migration results.details at 50 sample entries so the memory
    footprint stays bounded on deployments with thousands of corrupted
    agents (Codex P3).
  - Add migrate:orphaned-agent-files:batch npm script to match the
    convention set by migrate-agent-permissions / migrate-prompt-permissions
    (#7).
  - Add controller-level tests covering the three orphan-pruning paths:
    strip from incoming tool_resources, leave alone when tool_resources is
    absent, swallow db.getFiles errors and still save (#6).
  - Back the pre-existing "should validate tool_resources in updates" test's
    file_ids with real File docs; the new pruning would otherwise strip
    them, and that test is about OCR conversion / schema filtering, not file
    existence. Register the File model in beforeAll so the fixture works.

* 🩹 fix: Tighten TOOL_RESOURCE_KEYS Type and Align Migration Sample Output

  Two follow-ups from the second review pass.

  - Type data-schemas' TOOL_RESOURCE_KEYS as
    ReadonlyArray<keyof AgentToolResources> instead of readonly string[].
    Data-schemas depends on data-provider, so the import is clean. This
    catches typos and aligns with the matching export in @librechat/api; it
    doesn't guarantee exhaustiveness, but that's a TypeScript limitation,
    not a workspace one.
  - Align the migration's console output with DETAIL_SAMPLE_LIMIT: print
    every collected detail (up to 50) and, when more agents were affected
    than the sample size allowed, show a truncation notice. The old hard cap
    of 25 meant affected agents in the 26-50 range were collected but never
    shown.

* ✅ test: Add Integration Coverage for Orphan Cleanup Paths (#12776)

  Exercise the delete-time and migration paths end-to-end against a real
  in-memory Mongo. This catches integration bugs the isolated unit tests on
  each layer couldn't.

  - api/server/services/Files/process.integration.spec.js is the primary
    repro: seed an Agent + File, call processDeleteRequest, assert the
    file_id disappears from every referencing agent's tool_resources while
    unrelated agents stay untouched. Also covers the no-op case and confirms
    a failure in the new cleanup step cannot roll back the file deletion
    itself.
  - api/test/migrate-orphaned-agent-files.spec.js drives the migration
    module: --dry-run reports without writing, apply mode prunes across
    every tool_resource category, re-running is idempotent, and
    DETAIL_SAMPLE_LIMIT caps the in-memory sample on wide corruption. Mocks
    only the connect helper (the spec owns the mongoose instance) so the
    real migration code path (cursor, $pullAll, reduce) runs.

* 🔒 fix: Run Orphan Cleanup Migration in System Tenant Context

  Codex P2 catch: under TENANT_ISOLATION_STRICT=true, the migration throws
  on the very first Agent.countDocuments() because the tenant isolation
  plugin fails closed on queries without tenant context, which made
  migrate:orphaned-agent-files unusable on the exact deployments most likely
  to have accumulated corruption.

  - Wrap the scan/prune body in runAsSystem so queries bypass the tenant
    filter (SYSTEM_TENANT_ID sentinel). The migration legitimately needs
    cross-tenant visibility; this is the same pattern seedDatabase and the
    S3 refresh job already use.
  - Add a regression test that spies on Agent.countDocuments() and asserts
    the active tenantStorage context is SYSTEM_TENANT_ID during the call.
    This pins the wrap against future regressions without the brittleness of
    toggling the strict-mode env var (which caches on first read).

  Note: the delete-time and save-time paths already run inside an
  authenticated HTTP request where tenantStorage.run is set by auth
  middleware, so the cleanup naturally scopes to the current tenant. That is
  the correct behavior there, since file ownership is tenant-scoped.

* 🧹 chore: Drop Unused path Import From Process Integration Spec

  Leftover from an earlier iteration that resolved the migration path via
  path.resolve before switching to a relative require. The import does
  nothing now; remove it.
203 lines
7.2 KiB
JavaScript
/**
 * Integration test for the delete-time path of issue #12776.
 *
 * Covers the full flow through `processDeleteRequest`:
 * 1. Real Agent + File docs in an in-memory Mongo.
 * 2. Invoke the delete service.
 * 3. Assert both the File record is gone and every agent's
 *    tool_resources.*.file_ids no longer references the deleted id.
 *
 * Uses FileSources.text so the strategy layer (disk / S3 / OpenAI) is a
 * no-op — we don't need real filesystem access to exercise the agent
 * reference cleanup, which is what issue #12776 is about.
 */

const mongoose = require('mongoose');
const { MongoMemoryServer } = require('mongodb-memory-server');
const { agentSchema, fileSchema, createMethods } = require('@librechat/data-schemas');
const { FileSources } = require('librechat-data-provider');

jest.mock('@librechat/data-schemas', () => {
  const actual = jest.requireActual('@librechat/data-schemas');
  return {
    ...actual,
    logger: { warn: jest.fn(), debug: jest.fn(), error: jest.fn(), info: jest.fn() },
  };
});

jest.mock('@librechat/agents', () => ({
  EnvVar: { CODE_API_KEY: 'CODE_API_KEY' },
}));

jest.mock('@librechat/api', () => ({
  sanitizeFilename: jest.fn((n) => n),
  parseText: jest.fn().mockResolvedValue({ text: '', bytes: 0 }),
  processAudioFile: jest.fn(),
}));

jest.mock('~/server/controllers/assistants/v2', () => ({
  addResourceFileId: jest.fn(),
  deleteResourceFileId: jest.fn(),
}));

jest.mock('~/server/controllers/assistants/helpers', () => ({
  getOpenAIClient: jest.fn(),
}));

jest.mock('~/server/services/Tools/credentials', () => ({
  loadAuthValues: jest.fn(),
}));

jest.mock('~/server/services/Files/strategies', () => ({
  getStrategyFunctions: jest.fn(() => ({ deleteFile: jest.fn().mockResolvedValue(undefined) })),
}));

jest.mock('~/server/services/Files/Audio/STTService', () => ({
  STTService: { getInstance: jest.fn() },
}));

jest.mock('~/server/services/Config', () => ({
  checkCapability: jest.fn().mockResolvedValue(true),
}));

jest.mock('~/cache', () => ({
  getLogStores: jest.fn(() => ({ get: jest.fn(), set: jest.fn(), delete: jest.fn() })),
}));

// Replace the mocked `~/models` from the sibling process.spec.js with real,
// mongoose-backed methods. All our in-memory models share this module.
jest.mock('~/models', () => {
  const mongoose = require('mongoose');
  const { createMethods } = require('@librechat/data-schemas');
  return createMethods(mongoose, {
    removeAllPermissions: jest.fn().mockResolvedValue(undefined),
  });
});

require('module-alias/register');
const { processDeleteRequest } = require('./process');

describe('processDeleteRequest — agent reference cleanup (issue #12776)', () => {
  let mongoServer;
  let Agent;
  let File;

  beforeAll(async () => {
    mongoServer = await MongoMemoryServer.create();
    await mongoose.connect(mongoServer.getUri());

    // createMethods (via ~/models) registers the File model as a side-effect,
    // but we also need the Agent model registered before any queries run.
    Agent = mongoose.models.Agent || mongoose.model('Agent', agentSchema);
    File = mongoose.models.File || mongoose.model('File', fileSchema);
    // Touch createMethods once so the migration/setup side-effects run.
    createMethods(mongoose, { removeAllPermissions: jest.fn() });
  }, 30000);

  afterAll(async () => {
    await mongoose.disconnect();
    await mongoServer.stop();
  });

  beforeEach(async () => {
    await Agent.deleteMany({});
    await File.deleteMany({});
  });

  const seedFile = async (file_id, userId) =>
    File.create({
      file_id,
      user: userId,
      filename: `${file_id}.txt`,
      filepath: `/tmp/${file_id}`,
      object: 'file',
      type: 'text/plain',
      bytes: 1,
      source: FileSources.text,
    });

  const seedAgent = async (authorId, tool_resources) =>
    Agent.create({
      id: `agent_${Math.random().toString(36).slice(2, 10)}`,
      name: 'Integration Test Agent',
      provider: 'test',
      model: 'test-model',
      author: authorId,
      tool_resources,
    });

  const buildReq = (fileDocs, extraBody = {}) => ({
    user: { id: fileDocs[0].user.toString() },
    body: { files: fileDocs, ...extraBody },
    config: { fileStrategy: 'local', fileConfig: {}, endpoints: {} },
  });

  test('strips deleted file_ids from every agent that referenced them', async () => {
    const userId = new mongoose.Types.ObjectId();
    const keeperId = `file_keeper_${Math.random().toString(36).slice(2, 10)}`;
    const deletedId = `file_deleted_${Math.random().toString(36).slice(2, 10)}`;

    const deletedFile = await seedFile(deletedId, userId);
    await seedFile(keeperId, userId);

    // Two agents both reference the file that's about to be deleted, plus the
    // keeper. A third, unrelated agent has a different file_id and must not be
    // touched by the cleanup.
    const agentA = await seedAgent(userId, {
      file_search: { file_ids: [deletedId, keeperId] },
    });
    const agentB = await seedAgent(userId, {
      execute_code: { file_ids: [deletedId] },
    });
    const untouchedAgent = await seedAgent(userId, {
      context: { file_ids: [keeperId] },
    });

    await processDeleteRequest({ req: buildReq([deletedFile.toObject()]), files: [deletedFile] });

    expect(await File.findOne({ file_id: deletedId })).toBeNull();
    expect(await File.findOne({ file_id: keeperId })).not.toBeNull();

    const updatedA = await Agent.findOne({ id: agentA.id }).lean();
    const updatedB = await Agent.findOne({ id: agentB.id }).lean();
    const updatedUntouched = await Agent.findOne({ id: untouchedAgent.id }).lean();

    expect(updatedA.tool_resources.file_search.file_ids).toEqual([keeperId]);
    expect(updatedB.tool_resources.execute_code.file_ids).toEqual([]);
    expect(updatedUntouched.tool_resources.context.file_ids).toEqual([keeperId]);
  });

  test('is a no-op when no agent references the deleted file', async () => {
    const userId = new mongoose.Types.ObjectId();
    const loneId = `file_lone_${Math.random().toString(36).slice(2, 10)}`;
    const loneFile = await seedFile(loneId, userId);
    const unrelatedAgent = await seedAgent(userId, {
      file_search: { file_ids: ['other_id'] },
    });

    await processDeleteRequest({ req: buildReq([loneFile.toObject()]), files: [loneFile] });

    expect(await File.findOne({ file_id: loneId })).toBeNull();
    const after = await Agent.findOne({ id: unrelatedAgent.id }).lean();
    expect(after.tool_resources.file_search.file_ids).toEqual(['other_id']);
  });

  test('still deletes the file when the agent cleanup step throws', async () => {
    const userId = new mongoose.Types.ObjectId();
    const targetId = `file_target_${Math.random().toString(36).slice(2, 10)}`;
    const targetFile = await seedFile(targetId, userId);

    const db = require('~/models');
    const original = db.removeAgentResourceFilesFromAllAgents;
    db.removeAgentResourceFilesFromAllAgents = jest
      .fn()
      .mockRejectedValue(new Error('simulated cleanup failure'));

    try {
      await processDeleteRequest({ req: buildReq([targetFile.toObject()]), files: [targetFile] });
      expect(await File.findOne({ file_id: targetId })).toBeNull();
    } finally {
      db.removeAgentResourceFilesFromAllAgents = original;
    }
  });
});