LibreChat/api/server/controllers/agents/__tests__/client.contextMetadata.spec.js
Danny Avila 44c253d48a
🪙 fix: Correct Context Usage Gauge After Summarization (#13744)
* 🪙 fix: Persist Context Snapshot + Summary Marker After Summarization

The post-summarization context is correctly compacted by the SDK, but the
breakdown wasn't reliably reaching the client, leaving the gauge on the
whole-history estimate (stuck at 100% forever once a conversation compacts).

Two server changes in buildResponseMetadata:
- Snapshot guard: persist the breakdown when a PRIMARY usage event follows the
  latest snapshot (tracked via contextUsageSink.latestUsageIndex, recorded in
  the on_context_usage handler) instead of a brittle snapshot-vs-primary count.
  A summarization detour adds an extra snapshot whose only following usage is
  tagged 'summarization', which the count guard could miscount and drop.
- Summary marker: whenever a turn compacts (summaryTokens > 0), persist a
  lightweight metadata.summaryUsedTokens (the pre-invoke compacted context size)
  UNCONDITIONALLY — so even when the full snapshot can't be saved (interrupted
  final call) or never reaches the client, the per-message estimate has a signal
  to cap the discarded history.

Tests: client.contextMetadata.spec (guard + marker, incl. marker-survives-drop)
and a real-pipeline summarization integration test.

* 🪙 fix: Cap the Context Estimate at the Summary Marker

When the gauge falls back to the per-message estimate (no usable snapshot on the
branch), sumBranch summed the ENTIRE branch history — after a summarization that
discarded most of it, this over-counts and pins the gauge at 100% in perpetuity.

sumBranch now stops at the deepest summarized response (metadata.summaryUsedTokens)
and records it as summaryBaseline; the walk counts only post-summary messages,
and useTokenUsage adds the baseline. So the estimate reflects the compacted
context (summary + recent turns), not the discarded history. USD/default
behavior unchanged when no marker is present.

Test: sumBranch caps a huge pre-summary history at the compacted baseline.

* 🪙 fix: Address Codex Review on the Summarization Marker

- Branch cost/usage is no longer truncated at the summary marker — sumBranch
  caps only the CONTEXT-window count there and keeps accumulating provider
  usage/cost to the root (cumulative spend isn't discarded by compaction).
- findBranchSnapshotAnchor stops at a summarized response with no snapshot of its
  own, so it can't recover a stale PRE-summary snapshot and show discarded
  history; the summary-baseline estimate is used instead.
- Abort path: buildAbortedResponseMetadata now persists the summaryUsedTokens
  marker (pre-invoke, no completedOutputTokens ambiguity, so safe on abort) so a
  STOPPED summarized turn isn't re-summed on reload.
- Marker baseline fallback now includes summaryTokens (a separate breakdown
  field) so it doesn't under-report the compacted size. DRY'd into a shared
  computeSummaryUsedTokens used by the completion and abort paths.
- Estimate popover surfaces the summary baseline as a row so the displayed rows
  reconcile with the header total.

Tests: sumBranch cost-not-truncated + anchor-stops-at-marker (client);
computeSummaryUsedTokens fallback + abort marker (packages/api).

* 🪙 fix: Attribute Persisted Context Usage to the Snapshot Run

Match the post-snapshot primary usage to the latest snapshot's runId before
persisting metadata.contextUsage. Parallel/direct runs interleave snapshots and
usage (A snapshot → B snapshot → A usage → B no-usage); the prior index-only
guard persisted B's snapshot with A's output. finalCallOutputTokens now filters
completedOutputTokens to the snapshot's run. Untagged events (older lib/resume)
match any run for back-compat.

* 🪙 fix: Harden Summary Marker Against Tool-Loops, Stale Anchors, and Emit Races

Codex round on the summarization marker:

- Avoid double-counting earlier tool-loop outputs in the summary marker: those
  outputs sit in BOTH the latest snapshot's pre-invoke baseline AND the response
  message's tokenCount the client estimate adds on top. computeSummaryUsedTokens
  now subtracts the run's prior primary outputs (priorRunOutputTokens) — the live
  path bounds them by the snapshot's usage index, the abort path by all primaries
  (an interrupted final call emits none). Single-call turns subtract 0.
- Stop treating pre-summary anchors as active: sumBranch no longer sets
  containsAnchor once the context is capped at a summary marker, so a stale
  pre-summary snapshot can't override the summary-baseline estimate.
- Capture latestUsageIndex BEFORE awaiting emitEvent: a yield (resumable SSE /
  Redis) during parallel runs could let this call's own usage advance the index
  past the event that proves the snapshot completed, dropping a valid breakdown.

* 🪙 fix: Subtract Summarization Output from the Summary Marker

recordCollectedUsage folds the summarization call's completion into the response
message's tokenCount, while the generated summary is also in the snapshot baseline
as summaryTokens. The client estimate (summaryBaseline + responseTokenCount) thus
counted the summary twice — inflating the gauge after compaction even on a
single-call turn whenever the full snapshot is unavailable. priorRunOutputTokens
now also counts summarization-tagged output (still excluding subagent/sequential,
which recordCollectedUsage keeps out of the reported total), so the marker
subtracts it. Updated unit + guard tests.

* 🪙 fix: Refine Marker Subtraction for Summarization RunId and Abort Boundary

Two Codex follow-ups on the marker-subtraction logic:

- Subtract summarization output regardless of runId: the summarize detour is its
  own model-end call that may carry a distinct runId, but its output still lands
  in this response's tokenCount AND the snapshot baseline (summaryTokens). It is
  now counted unconditionally (still within the response's own usageEmitSink),
  while primaries keep the parallel-run runId filter.
- Don't subtract primaries on the abort path: the job stores no snapshot/usage
  boundary, so a primary that completed AFTER the latest snapshot is NOT in the
  baseline; subtracting it would cancel real output and under-report.
  priorRunOutputTokens gains an includePrimary flag (false for abort), so abort
  subtracts only the always-pre-snapshot summarization output.

* 🪙 fix: Run-Scope Summary Subtraction and Stop Subtracting on Abort

Two Codex follow-ups, resolved by reverting the round-4 detour:

- Run-scope the summarization subtraction: the summarize detour inherits the
  graph run id (traceConfig spreads config.metadata.run_id), so its usage shares
  the answer snapshot's runId — it is NOT a distinct run. priorRunOutputTokens now
  filters summarization by runId like primaries, so a parallel sibling run's
  summary (different runId, in the sibling's baseline) is no longer subtracted from
  this branch's marker. Drops the includePrimary flag added last round.
- Stop subtracting on the abort path: abort tokenCount is countTokens(text)
  (abortMiddleware) or absent (agents route) — it does not fold in summarization or
  earlier-call output the way recordCollectedUsage does, so the marker must keep
  the full baseline. buildAbortedResponseMetadata now subtracts nothing.
2026-06-14 18:23:30 -04:00


const AgentClient = require('../client');

/** Minimal post-(maybe-)summary snapshot. baseUsed = maxContextTokens(1000) -
 * remainingContextTokens(700) = 300, so the marker (summaryUsedTokens) is 300. */
const snapshot = (summaryTokens) => ({
  runId: 'run-1',
  agentId: 'agent-1',
  breakdown: {
    maxContextTokens: 1000,
    instructionTokens: 50,
    systemMessageTokens: 50,
    dynamicInstructionTokens: 0,
    toolSchemaTokens: 0,
    summaryTokens,
    toolCount: 0,
    messageCount: 1,
    messageTokens: 20,
    availableForMessages: 900,
  },
  contextBudget: 1000,
  remainingContextTokens: 700,
  prePruneContextTokens: 300,
  effectiveInstructionTokens: 50,
  calibrationRatio: 1,
});

/** Untagged primary usage (no runId), which the guard treats as matching any run. */
const primary = { input_tokens: 10, output_tokens: 5, total_tokens: 15 };
const summarizationUsage = { ...primary, usage_type: 'summarization' };

/** Primary usage tagged with an explicit runId, for the parallel-run cases. */
const primaryFor = (runId, output_tokens) => ({
  input_tokens: 10,
  output_tokens,
  total_tokens: 10 + output_tokens,
  provider: 'openAI',
  runId,
});

/** Calls buildResponseMetadata against a minimal stand-in for the client instance. */
function buildMeta({ snap, latestUsageIndex, usageEvents }) {
  const self = {
    collectedThoughtSignatures: null,
    usageEmitSink: usageEvents,
    contextUsageSink: snap
      ? { latest: snap, count: 1, latestUsageIndex }
      : { latest: null, count: 0 },
  };
  return AgentClient.prototype.buildResponseMetadata.call(self);
}

describe('AgentClient.buildResponseMetadata — snapshot persistence + summary marker', () => {
  it('persists the snapshot when a primary usage follows it (normal turn)', () => {
    const meta = buildMeta({ snap: snapshot(0), latestUsageIndex: 0, usageEvents: [primary] });
    expect(meta.contextUsage).toBeDefined();
    expect(meta.summaryUsedTokens).toBeUndefined();
  });
  it('persists the post-summary snapshot when the only pre-primary usage is the summarization', () => {
    /** A summarized turn: the summarization usage precedes the post-summary
     * snapshot (index 1), then the model's primary usage follows it. The old
     * count guard miscounted and dropped this; the new guard keeps it. The
     * marker subtracts the summarization output (5): the generated summary is in
     * the snapshot baseline (summaryTokens) AND the response tokenCount, so
     * 300 - 5 = 295 keeps the client estimate from counting it twice. */
    const meta = buildMeta({
      snap: snapshot(80),
      latestUsageIndex: 1,
      usageEvents: [summarizationUsage, primary],
    });
    expect(meta.contextUsage).toBeDefined();
    expect(meta.summaryUsedTokens).toBe(295);
  });
  it('still emits the summary marker when the final call emitted no usage', () => {
    /** Interrupted summarized turn: no primary usage follows the latest snapshot,
     * so the snapshot is (correctly) not persisted, but the coarse marker
     * survives so the client estimate still caps the discarded history. The
     * summarization output (5) is subtracted (300 - 5 = 295). */
    const meta = buildMeta({
      snap: snapshot(80),
      latestUsageIndex: 1,
      usageEvents: [summarizationUsage],
    });
    expect(meta.contextUsage).toBeUndefined();
    expect(meta.summaryUsedTokens).toBe(295);
  });
  it('drops the snapshot and emits no marker when the final call had no usage and no summary', () => {
    const meta = buildMeta({ snap: snapshot(0), latestUsageIndex: 1, usageEvents: [primary] });
    expect(meta.contextUsage).toBeUndefined();
    expect(meta.summaryUsedTokens).toBeUndefined();
  });
  it('does not persist the snapshot when only a parallel run produced post-snapshot usage', () => {
    /** The latest snapshot belongs to run-1, but the only usage that follows it
     * belongs to a sibling run (run-2). The guard must NOT persist run-1's
     * snapshot with run-2's output; it falls back to the per-message estimate. */
    const meta = buildMeta({
      snap: snapshot(0),
      latestUsageIndex: 0,
      usageEvents: [primaryFor('run-2', 99)],
    });
    expect(meta.contextUsage).toBeUndefined();
  });
  it('persists with the snapshot run output when its own primary usage follows', () => {
    const meta = buildMeta({
      snap: snapshot(0),
      latestUsageIndex: 0,
      usageEvents: [primaryFor('run-2', 99), primaryFor('run-1', 7)],
    });
    expect(meta.contextUsage).toBeDefined();
    expect(meta.contextUsage.completedOutputTokens).toBe(7);
  });
  it('subtracts earlier tool-loop output from the summary marker (interrupted turn)', () => {
    /** Multi-call summarized turn stopped before the final usage: the earlier
     * call (output 40) is baked into baseUsed (300), so the marker is 300 - 40 =
     * 260. No primary follows the snapshot, so the full snapshot is not persisted
     * and the client uses this marker, which must not double-count the 40 that
     * the response tokenCount also carries. */
    const meta = buildMeta({
      snap: snapshot(80),
      latestUsageIndex: 1,
      usageEvents: [primaryFor('run-1', 40)],
    });
    expect(meta.contextUsage).toBeUndefined();
    expect(meta.summaryUsedTokens).toBe(260);
  });
  it("subtracts only this run's earlier output, not a parallel run's", () => {
    const meta = buildMeta({
      snap: snapshot(80),
      latestUsageIndex: 2,
      usageEvents: [primaryFor('run-2', 999), primaryFor('run-1', 40), primaryFor('run-1', 5)],
    });
    /** baseUsed 300 - run-1's earlier 40 = 260; run-2's 999 is ignored. */
    expect(meta.summaryUsedTokens).toBe(260);
    /** run-1's own primary follows the snapshot → snapshot persisted with output 5. */
    expect(meta.contextUsage.completedOutputTokens).toBe(5);
  });
});