LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-06-25 08:56:10 +00:00

Danny Avila d18d62e7c1 🪙 refactor: Reconcile Context Gauge to Actual Provider Tokens (#13780)	2026-06-16 11:05:44 -04:00

* 🪙 fix: Reconcile Context Gauge to Actual Provider Tokens

The context gauge could read several× too high (e.g. 213K when the real prompt was 56K) and stay there across reloads.

Root cause: the SDK's calibrationRatio is `cumulativeProviderReported / cumulativeRawSent`, but a provider's server-side web search injects large fetched content into the prompt that the SDK never sent or counted, pinning the ratio at its cap (5) and multiplying every later message estimate, including post-summary ones. The gauge rendered (and persisted) that inflated estimate, never the provider's actual token count.

Fix: reconcile the snapshot to the call's ACTUAL prompt tokens (input + cache), which already arrive in on_token_usage. Only messageTokens is calibration-scaled (instructions/summary are raw tiktoken), so keep those and set messageTokens to the remainder, recomputing free space.

Shared `promptTokensFromUsage` + `reconcileContextUsage` in data-provider; applied server-side in buildPersistedContextUsage (reload-stable) and client-side in useUsageHandler on each primary usage (corrects at turn-end, no follow-up needed). Also drop the summary double-count from the Breakdown Messages row.

Deferred (separate agents PR): the SDK over-calibration also fires summarization prematurely; fixing it needs decoupling real-content estimation from server-side injection headroom without weakening pruning-overflow safety.

* 🪙 fix: Harden Token Reconciliation for Provider-less + Resume Paths

Codex review on the reconciliation:
- promptTokensFromUsage: when the provider is absent (custom/OpenAI-compatible payloads), fall back to the same magnitude heuristic normalizeUsageUnits uses (cache ≤ input ⇒ already included) so cached events aren't re-inflated.
- Resume: backfillUsage restores a primary call's usage without replaying a live on_token_usage (Redis mode), so the live reconcile never ran and a reconnected session stayed on the inflated estimate. New reconcileBackfill reconciles the restored snapshot from the final primary call after contextHandler installs it.

* 🪙 fix: Reconcile Resume Snapshot Server-Side, Not via Backfill

Codex: the client reconcileBackfill scanned the resumed run's collectedUsage and applied the final primary to the latest snapshot, but on a mid-call resume that usage belongs to an EARLIER call, corrupting the restored gauge.

Move the resume reconciliation server-side: GenerationJobManager.persistTokenUsage reconciles the stored contextUsage to a primary usage's actual prompt tokens as it arrives. That usage is the post-invoke truth for the call the latest stored snapshot precedes (no snapshot is captured between a call's pre-invoke dispatch and its usage), so it's correct by construction and run-matched. A mid-call resume (no usage yet) keeps the raw snapshot instead of mis-applying an earlier call's tokens; it reconciles once the call completes. Removed client reconcileBackfill; the live-path reconcile (non-resume) stays.

* 🪙 fix: Guard Reconciliation Against Replays and Snapshot Races

Two Codex concurrency findings on the reconciliation:
- Client: reconcile only on a NEWLY folded primary usage. A replayed duplicate (folded=false on resume) can be an earlier tool-loop call sharing the run id, which would overwrite the latest snapshot with an earlier, smaller prompt. Moved the reconcile after the folded guard.
- Server: serialize the context-usage write through the same per-stream queue as the token-usage write. persistTokenUsage reconciles the stored snapshot (read-modify-write); an unserialized trackContextUsage could store a newer snapshot between the read and write (or a stale reconciled write could land after a newer snapshot), clobbering the newer run's gauge when calls interleave. FIFO keeps each call's snapshot ahead of its own usage and behind the next.

* chore: import order in GenerationJobManager.ts
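
The reconciliation described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the names `promptTokensFromUsage`, `reconcileContextUsage`, and `messageTokens` come from the commit message, while the interface shapes, the remaining field names, the `cacheIncludedInInput` flag, and all sample numbers other than 213K and 56K are assumptions made for the example.

```typescript
// Hypothetical usage shape; the real on_token_usage payload may differ.
interface UsageSketch {
  inputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
  // Assumption: true/false when the provider's convention is known,
  // undefined for provider-less (custom/OpenAI-compatible) payloads.
  cacheIncludedInInput?: boolean;
}

// Hypothetical context-gauge snapshot shape.
interface SnapshotSketch {
  maxContextTokens: number;
  instructionTokens: number; // raw count, not calibration-scaled
  summaryTokens: number; // raw count, not calibration-scaled
  messageTokens: number; // calibration-scaled estimate
  freeTokens: number;
}

// Actual prompt tokens = input + cache, unless cache is already in input.
function promptTokensFromUsage(u: UsageSketch): number {
  const cache = (u.cacheReadTokens ?? 0) + (u.cacheWriteTokens ?? 0);
  // Magnitude heuristic for unknown providers: cache <= input is taken
  // to mean the cache tokens are already counted in the input figure.
  const included = u.cacheIncludedInInput ?? (cache <= u.inputTokens);
  return included ? u.inputTokens : u.inputTokens + cache;
}

// Keep instructions and summary, set messageTokens to the remainder,
// then recompute free space from the reconciled total.
function reconcileContextUsage(
  s: SnapshotSketch,
  actualPromptTokens: number,
): SnapshotSketch {
  const fixed = s.instructionTokens + s.summaryTokens;
  const messageTokens = Math.max(0, actualPromptTokens - fixed);
  const freeTokens = Math.max(0, s.maxContextTokens - (fixed + messageTokens));
  return { ...s, messageTokens, freeTokens };
}

// Inflated snapshot totalling 213K against a real 56K prompt.
const inflated: SnapshotSketch = {
  maxContextTokens: 400_000,
  instructionTokens: 4_000,
  summaryTokens: 2_000,
  messageTokens: 207_000,
  freeTokens: 187_000,
};
const actual = promptTokensFromUsage({
  inputTokens: 6_000,
  cacheReadTokens: 50_000,
  cacheIncludedInInput: false,
});
const reconciled = reconcileContextUsage(inflated, actual);
console.log(actual, reconciled.messageTokens, reconciled.freeTokens);
```

Clamping at zero is a choice made for this sketch so that a prompt smaller than the instruction and summary counts cannot produce a negative gauge value; the commit message does not say how the real code handles that case.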
callbacks.spec.js	🗂️ feat: Add Agent File Authoring Tools (#13435)	2026-06-03 23:58:12 -04:00
client.contextMetadata.spec.js	🪙 fix: Correct Context Usage Gauge After Summarization (#13744)	2026-06-14 18:23:30 -04:00
client.memory.spec.js	🧰 refactor: Unify code-execution tools (#12767)	2026-04-25 04:02:01 -04:00
jobReplacement.spec.js	🏁 fix: Message Race Condition if Cancelled Early (#11462)	2026-01-21 13:57:12 -05:00
modelEndHandler.spec.js	💸 feat: Per-Agent Endpoint Token Config in Multi-Endpoint Billing (#13738)	2026-06-14 12:00:32 -04:00
openai.spec.js	🧾 fix: Bill Subagent Child-Run Model Usage in Parent Transactions (#13683)	2026-06-13 14:55:48 -04:00
request.resumeMetadata.spec.js	🪢 fix: Tie MCP Cleanup To Resumable Runs (#13769)	2026-06-15 15:26:03 -04:00
responses.unit.spec.js	🧾 fix: Bill Subagent Child-Run Model Usage in Parent Transactions (#13683)	2026-06-13 14:55:48 -04:00
usageEvents.integration.spec.js	🪙 refactor: Reconcile Context Gauge to Actual Provider Tokens (#13780)	2026-06-16 11:05:44 -04:00
usageEvents.live.spec.js	📊 feat: Real-Time Context Window & Token Usage Tracking (#13670)	2026-06-13 19:38:28 -04:00
v1.duplicate-actions.spec.js	🛡️ refactor: Scope Action Mutations by Parent Resource Ownership (#12237)	2026-03-15 10:19:29 -04:00
v1.spec.js	📦 refactor: Consolidate DB models, encapsulating Mongoose usage in `data-schemas` (#11830)	2026-03-21 14:28:53 -04:00