LibreChat/api/server/services/Files/process.integration.spec.js
Danny Avila 181d705579
🧹 fix: Clean Up Orphaned Agent File Stubs After Deletion (#12781)
* 🧹 fix: Prune Orphaned File References on File Deletion

Deleting a file via the Manage Files tab left its file_id in every agent's
tool_resources.*.file_ids. These stubs accumulate until the frontend's
dedupe logic keys them as duplicates and blocks all new uploads (issue #12776).

- Add removeAgentResourceFilesFromAllAgents in packages/data-schemas: a
  single updateMany/$pullAll across every EToolResources category.
- Invoke it from processDeleteRequest after db.deleteFiles so every
  referencing agent is cleaned up, not just the one passed in req.body.
- Wrap the cleanup in try/catch so a stale agent update cannot mask a
  successful file deletion.
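The single-write approach above can be sketched as a pure function that builds one $pullAll update document spanning the agent-side categories. The helper name, and the exact shape of the real packages/data-schemas code, are assumptions; the category list mirrors the ones exercised by the test file below.

```javascript
// Agent-side tool_resources categories (assumed; mirrors the explicit list a
// later commit in this series uses instead of Object.values(EToolResources)).
const TOOL_RESOURCE_KEYS = ['execute_code', 'file_search', 'context'];

// Build one $pullAll update that removes the given file_ids from every
// category's file_ids array in a single Agent.updateMany call.
function buildRemoveFileIdsUpdate(fileIds) {
  const pullAll = {};
  for (const key of TOOL_RESOURCE_KEYS) {
    pullAll[`tool_resources.${key}.file_ids`] = fileIds;
  }
  return { $pullAll: pullAll };
}

// Hypothetical usage inside the cleanup helper:
//   await Agent.updateMany({}, buildRemoveFileIdsUpdate([deletedFileId]));
```

$pullAll removes all occurrences of each listed value, so a single collection-wide write cleans every referencing agent at once.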

* 🧼 fix: Prune Orphaned File References on Agent Update

Already-affected agents would stay broken even after the delete-time fix:
the stubs sit on the agent document until something strips them. Heal them
on the next save (issue #12776).

- Add collectToolResourceFileIds + stripFileIdsFromToolResources helpers
  in @librechat/api — centralizing the tool_resources traversal used by
  the controller and the follow-up migration script.
- In updateAgentHandler, check the effective tool_resources against the
  files collection. When orphans are found, either strip them from the
  incoming tool_resources (if the update sets them) or run the bulk
  cleanup (if the update leaves tool_resources untouched).
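A sketch of what the two helpers might look like as pure functions. The names come from the commit message; the bodies and the category list are assumptions, not the actual @librechat/api code.

```javascript
// Agent-side tool_resources categories (assumed list).
const TOOL_RESOURCE_KEYS = ['execute_code', 'file_search', 'context'];

// Collect every file_id referenced anywhere in a tool_resources object.
function collectToolResourceFileIds(toolResources = {}) {
  const ids = new Set();
  for (const key of TOOL_RESOURCE_KEYS) {
    for (const id of toolResources[key]?.file_ids ?? []) {
      ids.add(id);
    }
  }
  return ids;
}

// Return a copy of tool_resources with the given ids filtered out of every
// category's file_ids; the input object is left untouched.
function stripFileIdsFromToolResources(toolResources, idsToRemove) {
  const result = {};
  for (const [key, value] of Object.entries(toolResources)) {
    const file_ids = (value?.file_ids ?? []).filter((id) => !idsToRemove.has(id));
    result[key] = { ...value, file_ids };
  }
  return result;
}
```

Centralizing the traversal this way lets the controller and the migration script agree on which categories exist without duplicating the key list.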

* 🧰 chore: Add Migration to Clean Up Orphaned Agent File References

Complements the delete-time and save-time fixes by healing agents that
already accumulated orphan stubs before the upgrade (issue #12776). The
script is idempotent — re-running it on a clean database is a no-op.

- Add config/migrate-orphaned-agent-files.js following the existing
  migrate-*.js convention: it writes by default (pass --dry-run to
  preview changes) and takes a --batch-size= tuning knob. Streams
  agents via cursor.
- Register migrate:orphaned-agent-files and :dry-run npm scripts.
- Reuse collectToolResourceFileIds from @librechat/api so migration and
  runtime share the same traversal logic.
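The CLI surface those bullets describe could be parsed like the following sketch; the default batch size and exact option handling in the real script may differ.

```javascript
// Hypothetical flag parsing for the migration script: --dry-run previews
// without writing, --batch-size=N tunes cursor throughput. The default of
// 100 is an assumption.
function parseMigrationArgs(argv) {
  const dryRun = argv.includes('--dry-run');
  const batchArg = argv.find((arg) => arg.startsWith('--batch-size='));
  const batchSize = batchArg ? Number(batchArg.slice('--batch-size='.length)) : 100;
  return { dryRun, batchSize };
}

// Streaming via cursor rather than loading all agents (sketch):
//   for await (const agent of Agent.find(query).cursor({ batchSize })) { ... }
```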

* 🩹 fix: Address Codex/Copilot Review on Orphaned Agent File Cleanup

Refines the #12776 fix series based on automated review feedback.

- Scope save-time pruning to the current agent only. When a PATCH
  carries tool_resources, strip orphans from the incoming payload and
  pay the DB round-trip only then. Removes the collection-wide
  updateMany previously triggered when tool_resources was absent
  (Codex P2 / Copilot).
- Wrap the orphan check in try/catch so a transient db.getFiles
  failure can't turn a good save into a 500 (comprehensive review #1).
- Replace Object.values(EToolResources) casts with an explicit list of
  agent-side categories in both orphans.ts and agent.ts. code_interpreter
  belongs to the Assistants API and isn't a key of AgentToolResources —
  including it was a type lie and generated dead MongoDB clauses
  (comprehensive review #3, #8).
- Export TOOL_RESOURCE_KEYS from @librechat/api and consume it in the
  migration script, dropping one duplicated definition (#4).
- Cap migration results.details at 50 sample entries so the memory
  footprint stays bounded on deployments with thousands of corrupted
  agents (Codex P3).
- Add migrate:orphaned-agent-files:batch npm script to match the
  convention set by migrate-agent-permissions / migrate-prompt-permissions
  (#7).
- Add controller-level tests covering the three orphan-pruning paths:
  strip from incoming tool_resources, leave alone when tool_resources
  is absent, swallow db.getFiles errors and still save (#6).
- Back the pre-existing "should validate tool_resources in updates" test's
  file_ids with real File docs — the new pruning would otherwise strip
  them, and that test is about OCR conversion / schema filtering, not
  file existence. Register the File model in beforeAll so the fixture
  works.
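The bounded-sample fix from the Codex P3 item amounts to logic like this sketch (function and field names are assumed; the 50-entry cap matches the commit's DETAIL_SAMPLE_LIMIT):

```javascript
const DETAIL_SAMPLE_LIMIT = 50;

// Count every affected agent, but keep at most DETAIL_SAMPLE_LIMIT sample
// entries in memory so wide corruption can't balloon the results object.
function recordAffectedAgent(results, detail) {
  results.affected += 1;
  if (results.details.length < DETAIL_SAMPLE_LIMIT) {
    results.details.push(detail);
  }
}
```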

* 🩹 fix: Tighten TOOL_RESOURCE_KEYS Type and Align Migration Sample Output

Two follow-ups from the second review pass.

- Type data-schemas' TOOL_RESOURCE_KEYS as ReadonlyArray<keyof
  AgentToolResources> instead of readonly string[]. Data-schemas depends
  on data-provider, so the import is clean. Catches typos and aligns
  with the matching export in @librechat/api — doesn't guarantee
  exhaustiveness, but that's a TypeScript limitation, not a workspace
  one.
- Align the migration's console output with DETAIL_SAMPLE_LIMIT: print
  every collected detail (up to 50) and, when more agents were affected
  than the sample size allowed, show a truncation notice. The old hard
  cap of 25 meant affected agents in the 26-50 range were collected
  but never shown.
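The aligned output described above comes down to logic like the following sketch. Field names and message wording are assumptions, and it returns lines rather than printing so the behavior stays easy to test; the real script writes to the console.

```javascript
// Emit one line per collected sample entry (already capped at
// DETAIL_SAMPLE_LIMIT upstream), plus a truncation notice when more agents
// were affected than the sample could hold.
function formatMigrationDetails(results) {
  const lines = results.details.map(
    (d) => `  - agent ${d.agentId}: pruned ${d.pruned.length} file_ids`,
  );
  const hidden = results.affected - results.details.length;
  if (hidden > 0) {
    lines.push(`  ...and ${hidden} more affected agents (sample truncated)`);
  }
  return lines;
}
```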

* test: Add Integration Coverage for Orphan Cleanup Paths (#12776)

Exercise the delete-time and migration paths end-to-end against a real
in-memory Mongo. This catches integration bugs that the isolated unit
tests on each layer miss.

- api/server/services/Files/process.integration.spec.js — the primary
  repro: seed an Agent + File, call processDeleteRequest, assert the
  file_id disappears from every referencing agent's tool_resources
  while unrelated agents stay untouched. Also covers the no-op case
  and confirms a failure in the new cleanup step cannot roll back the
  file deletion itself.
- api/test/migrate-orphaned-agent-files.spec.js — drives the migration
  module: --dry-run reports without writing, apply mode prunes across
  every tool_resource category, re-running is idempotent, and
  DETAIL_SAMPLE_LIMIT caps the in-memory sample on wide corruption.
  Mocks only the connect helper (the spec owns the mongoose instance)
  so the real migration code path — cursor, $pullAll, reduce — runs.

* 🔒 fix: Run Orphan Cleanup Migration in System Tenant Context

Codex P2 catch: under TENANT_ISOLATION_STRICT=true, the migration
throws on the very first Agent.countDocuments() because the tenant
isolation plugin fail-closes on queries without tenant context — which
makes migrate:orphaned-agent-files unusable on the exact deployments
most likely to have accumulated corruption.

- Wrap the scan/prune body in runAsSystem so queries bypass the tenant
  filter (SYSTEM_TENANT_ID sentinel). The migration legitimately needs
  cross-tenant visibility — this is the same pattern seedDatabase and
  the S3 refresh job already use.
- Add a regression test that spies on Agent.countDocuments() and
  asserts the active tenantStorage context is SYSTEM_TENANT_ID during
  the call. Pins the wrap against future regressions without the
  brittleness of toggling the strict-mode env var (which caches on
  first read).

Note: the delete-time and save-time paths already run inside an
authenticated HTTP request where tenantStorage.run is set by auth
middleware, so the cleanup naturally scopes to the current tenant —
which is the correct behavior there since file ownership is
tenant-scoped.

* 🧹 chore: Drop Unused path Import From Process Integration Spec

Leftover from an earlier iteration that resolved the migration path
via path.resolve before I switched to a relative require. The import
does nothing now — removing it.
2026-04-22 11:35:48 -07:00


/**
 * Integration test for the delete-time path of issue #12776.
 *
 * Covers the full flow through `processDeleteRequest`:
 * 1. Real Agent + File docs in an in-memory Mongo.
 * 2. Invoke the delete service.
 * 3. Assert both the File record is gone and every agent's
 *    tool_resources.*.file_ids no longer references the deleted id.
 *
 * Uses FileSources.text so the strategy layer (disk / S3 / OpenAI) is a
 * no-op — we don't need real filesystem access to exercise the agent
 * reference cleanup, which is what issue #12776 is about.
 */
const mongoose = require('mongoose');
const { MongoMemoryServer } = require('mongodb-memory-server');
const { agentSchema, fileSchema, createMethods } = require('@librechat/data-schemas');
const { FileSources } = require('librechat-data-provider');

jest.mock('@librechat/data-schemas', () => {
  const actual = jest.requireActual('@librechat/data-schemas');
  return {
    ...actual,
    logger: { warn: jest.fn(), debug: jest.fn(), error: jest.fn(), info: jest.fn() },
  };
});

jest.mock('@librechat/agents', () => ({
  EnvVar: { CODE_API_KEY: 'CODE_API_KEY' },
}));

jest.mock('@librechat/api', () => ({
  sanitizeFilename: jest.fn((n) => n),
  parseText: jest.fn().mockResolvedValue({ text: '', bytes: 0 }),
  processAudioFile: jest.fn(),
}));

jest.mock('~/server/controllers/assistants/v2', () => ({
  addResourceFileId: jest.fn(),
  deleteResourceFileId: jest.fn(),
}));

jest.mock('~/server/controllers/assistants/helpers', () => ({
  getOpenAIClient: jest.fn(),
}));

jest.mock('~/server/services/Tools/credentials', () => ({
  loadAuthValues: jest.fn(),
}));

jest.mock('~/server/services/Files/strategies', () => ({
  getStrategyFunctions: jest.fn(() => ({ deleteFile: jest.fn().mockResolvedValue(undefined) })),
}));

jest.mock('~/server/services/Files/Audio/STTService', () => ({
  STTService: { getInstance: jest.fn() },
}));

jest.mock('~/server/services/Config', () => ({
  checkCapability: jest.fn().mockResolvedValue(true),
}));

jest.mock('~/cache', () => ({
  getLogStores: jest.fn(() => ({ get: jest.fn(), set: jest.fn(), delete: jest.fn() })),
}));

// Replace the mocked `~/models` from the sibling process.spec.js with real,
// mongoose-backed methods. All our in-memory models share this module.
jest.mock('~/models', () => {
  const mongoose = require('mongoose');
  const { createMethods } = require('@librechat/data-schemas');
  return createMethods(mongoose, {
    removeAllPermissions: jest.fn().mockResolvedValue(undefined),
  });
});

require('module-alias/register');

const { processDeleteRequest } = require('./process');
describe('processDeleteRequest — agent reference cleanup (issue #12776)', () => {
  let mongoServer;
  let Agent;
  let File;

  beforeAll(async () => {
    mongoServer = await MongoMemoryServer.create();
    await mongoose.connect(mongoServer.getUri());
    // createMethods (via ~/models) registers the File model as a side-effect,
    // but we also need the Agent model registered before any queries run.
    Agent = mongoose.models.Agent || mongoose.model('Agent', agentSchema);
    File = mongoose.models.File || mongoose.model('File', fileSchema);
    // Touch createMethods once so the migration/setup side-effects run.
    createMethods(mongoose, { removeAllPermissions: jest.fn() });
  }, 30000);

  afterAll(async () => {
    await mongoose.disconnect();
    await mongoServer.stop();
  });

  beforeEach(async () => {
    await Agent.deleteMany({});
    await File.deleteMany({});
  });

  const seedFile = async (file_id, userId) =>
    File.create({
      file_id,
      user: userId,
      filename: `${file_id}.txt`,
      filepath: `/tmp/${file_id}`,
      object: 'file',
      type: 'text/plain',
      bytes: 1,
      source: FileSources.text,
    });

  const seedAgent = async (authorId, tool_resources) =>
    Agent.create({
      id: `agent_${Math.random().toString(36).slice(2, 10)}`,
      name: 'Integration Test Agent',
      provider: 'test',
      model: 'test-model',
      author: authorId,
      tool_resources,
    });

  const buildReq = (fileDocs, extraBody = {}) => ({
    user: { id: fileDocs[0].user.toString() },
    body: { files: fileDocs, ...extraBody },
    config: { fileStrategy: 'local', fileConfig: {}, endpoints: {} },
  });
  test('strips deleted file_ids from every agent that referenced them', async () => {
    const userId = new mongoose.Types.ObjectId();
    const keeperId = `file_keeper_${Math.random().toString(36).slice(2, 10)}`;
    const deletedId = `file_deleted_${Math.random().toString(36).slice(2, 10)}`;
    const deletedFile = await seedFile(deletedId, userId);
    await seedFile(keeperId, userId);

    // Two agents reference the file that's about to be deleted (one of them
    // also holds the keeper). A third, unrelated agent has a different
    // file_id and must not be touched by the cleanup.
    const agentA = await seedAgent(userId, {
      file_search: { file_ids: [deletedId, keeperId] },
    });
    const agentB = await seedAgent(userId, {
      execute_code: { file_ids: [deletedId] },
    });
    const untouchedAgent = await seedAgent(userId, {
      context: { file_ids: [keeperId] },
    });

    await processDeleteRequest({ req: buildReq([deletedFile.toObject()]), files: [deletedFile] });

    expect(await File.findOne({ file_id: deletedId })).toBeNull();
    expect(await File.findOne({ file_id: keeperId })).not.toBeNull();

    const updatedA = await Agent.findOne({ id: agentA.id }).lean();
    const updatedB = await Agent.findOne({ id: agentB.id }).lean();
    const updatedUntouched = await Agent.findOne({ id: untouchedAgent.id }).lean();
    expect(updatedA.tool_resources.file_search.file_ids).toEqual([keeperId]);
    expect(updatedB.tool_resources.execute_code.file_ids).toEqual([]);
    expect(updatedUntouched.tool_resources.context.file_ids).toEqual([keeperId]);
  });

  test('is a no-op when no agent references the deleted file', async () => {
    const userId = new mongoose.Types.ObjectId();
    const loneId = `file_lone_${Math.random().toString(36).slice(2, 10)}`;
    const loneFile = await seedFile(loneId, userId);
    const unrelatedAgent = await seedAgent(userId, {
      file_search: { file_ids: ['other_id'] },
    });

    await processDeleteRequest({ req: buildReq([loneFile.toObject()]), files: [loneFile] });

    expect(await File.findOne({ file_id: loneId })).toBeNull();
    const after = await Agent.findOne({ id: unrelatedAgent.id }).lean();
    expect(after.tool_resources.file_search.file_ids).toEqual(['other_id']);
  });

  test('still deletes the file when the agent cleanup step throws', async () => {
    const userId = new mongoose.Types.ObjectId();
    const targetId = `file_target_${Math.random().toString(36).slice(2, 10)}`;
    const targetFile = await seedFile(targetId, userId);

    // Force the cleanup step to fail, then restore the real implementation
    // so later tests (and the shared module) are unaffected.
    const db = require('~/models');
    const original = db.removeAgentResourceFilesFromAllAgents;
    db.removeAgentResourceFilesFromAllAgents = jest
      .fn()
      .mockRejectedValue(new Error('simulated cleanup failure'));

    try {
      await processDeleteRequest({ req: buildReq([targetFile.toObject()]), files: [targetFile] });
      expect(await File.findOne({ file_id: targetId })).toBeNull();
    } finally {
      db.removeAgentResourceFilesFromAllAgents = original;
    }
  });
});