LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-06-09 17:31:19 +00:00

Danny Avila 0fe203aaca 🧠 fix: charge Gemini reasoning tokens in agent usage accounting (#13014)		2026-05-08 12:29:43 -04:00

* 🧠 fix: charge Gemini reasoning tokens in agent usage accounting

  Resolves #13006. `usage.ts` previously billed `usage.output_tokens` directly. For Vertex AI Gemini thinking models, `@langchain/google-common`'s streaming path emits `output_tokens = candidatesTokenCount` only, dropping `thoughtsTokenCount`. Reasoning was billed at zero and the `total_tokens === input_tokens + output_tokens` invariant was broken.

  The fix lives in agents (danny-avila/agents#157), but this is also a defense-in-depth backstop in case agents misses a path or another provider exhibits the same shape.

  `resolveCompletionTokens(usage)` adds `output_token_details.reasoning` back when (and only when) the gap is present (`total - input > output`), so providers that already include reasoning in `output_tokens` (OpenAI o-series, Anthropic, the Google-API wrapper) are no-ops (no double-counting).

  - `SplitUsage` gains a `completion` field; all four billing call sites in `processUsageGroup` use it instead of `usage.output_tokens`.
  - `total_output_tokens` in the result also reflects the corrected count.
  - `UsageMetadata` interface in `IJobStore.ts` adds the `output_token_details` field for type safety.
  - 4 new tests in `usage.spec.ts` cover: Vertex undercount fix, OpenAI no-double-count, structured spend path with cache + reasoning, no-op when no details present.

* 🩹 fix: simplify reasoning correction to invariant-based gap check

  Initial fix gated the correction on `output_token_details.reasoning > 0`, which doesn't help in the live failure case: when google-common's stream emits the buggy fallback usage_metadata, output_token_details is empty ({}) and the gate exits early.

  Live debugging showed the reliable signal is the documented invariant itself: `total_tokens === input_tokens + output_tokens`. When buggy streams undercount output, total exceeds input + output by exactly the unbilled reasoning.

  Use `total - input` as the corrected output. This is provider-agnostic and stays a no-op for compliant providers (OpenAI/Anthropic/Google-via-CustomChatGoogleGenerativeAI), where the gap is zero.

  Live verified end-to-end against gemini-3-flash-preview:
  - With agents fix in place: output_tokens=437 → billed 437 (no-op)
  - Backstop only (no agents fix, buggy input): raw 135, billed 297 (= total 309 - input 12, matches actual API charge)

  Updated tests to cover both scenarios.

..
src	🧠 fix: charge Gemini reasoning tokens in agent usage accounting (#13014)	2026-05-08 12:29:43 -04:00
types	🔬 ci: Add TypeScript Type Checks to Backend Workflow and Fix All Type Errors (#12451)	2026-03-28 21:06:39 -04:00
.gitignore
babel.config.cjs
jest.config.mjs	🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)	2026-04-27 08:56:39 +09:00
jest.setup.cjs	🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)	2026-04-27 08:56:39 +09:00
package.json	🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960)	2026-05-08 12:29:43 -04:00
rollup.config.js
tsconfig-paths-bootstrap.mjs
tsconfig.build.json
tsconfig.json	📦 chore: npm audit fixes and Mongoose 8.23 TypeScript follow-ups (#12996)	2026-05-07 09:47:40 -04:00
tsconfig.spec.json	📦 chore: Update TypeScript Config for TS v7 (#12794)	2026-04-23 12:51:03 -04:00
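
The invariant-based gap check described in the latest commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `usage.ts`: the function name `resolveCompletionTokens` and the `input_tokens` / `output_tokens` / `total_tokens` fields come from the commit message, while the trimmed `UsageMetadata` shape, the optional-field handling, and the fallback when `total_tokens` is absent are assumptions.

```typescript
// Hypothetical, trimmed-down usage shape; the real UsageMetadata interface
// in IJobStore.ts has more fields (e.g. output_token_details).
interface UsageMetadata {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
}

// Sketch of the backstop: when total exceeds input + output, treat
// `total - input` as the billable completion count; otherwise leave
// output_tokens untouched so compliant providers are a no-op.
function resolveCompletionTokens(usage: UsageMetadata): number {
  const input = usage.input_tokens ?? 0;
  const output = usage.output_tokens ?? 0;
  const total = usage.total_tokens;

  // Assumption: with no total reported there is nothing to reconcile against.
  if (total == null) {
    return output;
  }

  const gap = total - input;
  return gap > output ? gap : output;
}

// Buggy stream from the commit message: raw output 135, input 12, total 309.
const backstop = resolveCompletionTokens({
  input_tokens: 12,
  output_tokens: 135,
  total_tokens: 309,
});
console.log(backstop); // 297

// Compliant case: output 437 is from the commit message; the input of 12 and
// total of 449 are made-up values chosen so the invariant holds.
const noop = resolveCompletionTokens({
  input_tokens: 12,
  output_tokens: 437,
  total_tokens: 449,
});
console.log(noop); // 437
```

Because the check only ever raises the count when the documented invariant is violated, a provider that already folds reasoning into `output_tokens` passes through unchanged.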