LibreChat/api/server/controllers
Danny Avila d18d62e7c1
🪙 refactor: Reconcile Context Gauge to Actual Provider Tokens (#13780)
* 🪙 fix: Reconcile Context Gauge to Actual Provider Tokens

The context gauge could read several× too high (e.g. 213K when the real prompt
was 56K) and stay there across reloads. Root cause: the SDK's calibrationRatio is
`cumulativeProviderReported / cumulativeRawSent`, but a provider's server-side
web search injects large fetched content into the prompt that the SDK never sent
or counted — pinning the ratio at its cap (5) and multiplying every later message
estimate, including post-summary ones. The gauge rendered (and persisted) that
inflated estimate, never the provider's actual token count.

Fix: reconcile the snapshot to the call's ACTUAL prompt tokens (input + cache),
which already arrive in on_token_usage. Only messageTokens is calibration-scaled
(instructions/summary are raw tiktoken), so keep those and set messageTokens to
the remainder, recomputing free space. Shared `promptTokensFromUsage` +
`reconcileContextUsage` in data-provider; applied server-side in
buildPersistedContextUsage (reload-stable) and client-side in useUsageHandler on
each primary usage (corrects at turn-end, no follow-up needed). Also drop the
summary double-count from the Breakdown Messages row.

Deferred (separate agents PR): the SDK over-calibration also fires summarization
prematurely; fixing it needs decoupling real-content estimation from server-side
injection headroom without weakening pruning-overflow safety.

* 🪙 fix: Harden Token Reconciliation for Provider-less + Resume Paths

Codex review on the reconciliation:
- promptTokensFromUsage: when the provider is absent (custom/OpenAI-compatible
  payloads), fall back to the same magnitude heuristic normalizeUsageUnits uses
  (cache ≤ input ⇒ already included) so cached events aren't re-inflated.
- Resume: backfillUsage restores a primary call's usage without replaying a live
  on_token_usage (Redis mode), so the live reconcile never ran and a reconnected
  session stayed on the inflated estimate. New reconcileBackfill reconciles the
  restored snapshot from the final primary call after contextHandler installs it.

* 🪙 fix: Reconcile Resume Snapshot Server-Side, Not via Backfill

Codex: the client reconcileBackfill scanned the resumed run's collectedUsage and
applied the final primary to the latest snapshot — but on a mid-call resume that
usage belongs to an EARLIER call, corrupting the restored gauge.

Move the resume reconciliation server-side: GenerationJobManager.persistTokenUsage
reconciles the stored contextUsage to a primary usage's actual prompt tokens as it
arrives. That usage is the post-invoke truth for the call the latest stored
snapshot precedes (no snapshot is captured between a call's pre-invoke dispatch
and its usage), so it's correct by construction and run-matched. A mid-call resume
(no usage yet) keeps the raw snapshot instead of mis-applying an earlier call's
tokens; it reconciles once the call completes. Removed client reconcileBackfill;
the live-path reconcile (non-resume) stays.

* 🪙 fix: Guard Reconciliation Against Replays and Snapshot Races

Two Codex concurrency findings on the reconciliation:
- Client: reconcile only on a NEWLY folded primary usage. A replayed duplicate
  (folded=false on resume) can be an earlier tool-loop call sharing the run id,
  which would overwrite the latest snapshot with an earlier, smaller prompt. Moved
  the reconcile after the folded guard.
- Server: serialize the context-usage write through the same per-stream queue as
  the token-usage write. persistTokenUsage reconciles the stored snapshot
  (read-modify-write); an unserialized trackContextUsage could store a newer
  snapshot between the read and write — or a stale reconciled write could land
  after a newer snapshot — clobbering the newer run's gauge when calls interleave.
  FIFO keeps each call's snapshot ahead of its own usage and behind the next.

* chore: import order in GenerationJobManager.ts
2026-06-16 11:05:44 -04:00
..
__tests__ 🤫 refactor: Silent MCP OAuth Refresh on Mid-Session 401 (#13369) 2026-06-10 13:12:42 -04:00
agents 🪙 refactor: Reconcile Context Gauge to Actual Provider Tokens (#13780) 2026-06-16 11:05:44 -04:00
assistants 🔐 feat: Add Signed CloudFront File Downloads (#12970) 2026-05-06 19:48:30 -04:00
auth 🤝 fix: Honor OPENID_REUSE_TOKENS in Admin OAuth Exchange (#13154) 2026-05-18 09:34:58 -04:00
AuthController.js feat: Make OpenID Token Reuse Window Configurable (#13546) 2026-06-06 15:15:58 -04:00
AuthController.spec.js feat: Make OpenID Token Reuse Window Configurable (#13546) 2026-06-06 15:15:58 -04:00
Balance.js 🤫 chore: Quiet Repetitive Log Noise from Balance, CloudFront, and Capability Paths (#13461) 2026-06-01 20:40:16 -04:00
Balance.spec.js 🤫 chore: Quiet Repetitive Log Noise from Balance, CloudFront, and Capability Paths (#13461) 2026-06-01 20:40:16 -04:00
EndpointController.js refactor: Integrate Capabilities into Agent File Uploads and Tool Handling (#5048) 2024-12-19 13:04:48 -05:00
FavoritesController.js 📌 feat: Add Pin Support for Model Specs (#11219) 2026-04-09 18:37:25 -04:00
FavoritesController.spec.js 📌 feat: Add Pin Support for Model Specs (#11219) 2026-04-09 18:37:25 -04:00
mcp.js 🗄️ fix: Gate Request-Scoped MCP Servers Out of Persistent Tool Cache (#13672) 2026-06-13 11:26:49 -04:00
ModelController.js 🏗️ refactor: Remove Redundant Caching, Migrate Config Services to TypeScript (#12466) 2026-03-30 16:49:48 -04:00
PermissionsController.js 🪪 fix: Filter ACL Principal Details (#13524) 2026-06-05 19:06:41 -04:00
PluginController.js 🪪 fix: Resolve Group-Scoped Config Overrides (#13176) 2026-05-18 10:16:20 -04:00
PluginController.spec.js 🪪 fix: Resolve Group-Scoped Config Overrides (#13176) 2026-05-18 10:16:20 -04:00
SkillStatesController.js 🗂️ feat: Add Deployment Skill Directory (#13523) 2026-06-05 10:24:28 -04:00
TokenConfigController.js 🗂️ fix: Scope Token Config Cache (#13770) 2026-06-15 15:25:19 -04:00
tools.js 🧯 fix: Harden Data Retention Semantics (#13049) 2026-05-19 21:58:42 -04:00
TwoFactorController.js 🔑 fix: Require OTP Verification for 2FA Re-Enrollment and Backup Code Regeneration (#12223) 2026-03-14 01:51:31 -04:00
UserController.js 🤫 refactor: Silent MCP OAuth Refresh on Mid-Session 401 (#13369) 2026-06-10 13:12:42 -04:00
UserController.spec.js 🛂 fix: Normalize Verification Flow Error Responses (#13558) 2026-06-06 15:08:43 -04:00