mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-06-09 17:31:19 +00:00
* 🧠 fix: charge Gemini reasoning tokens in agent usage accounting Resolves #13006. `usage.ts` previously billed `usage.output_tokens` directly. For Vertex AI Gemini thinking models, `@langchain/google-common`'s streaming path emits `output_tokens = candidatesTokenCount` only, dropping `thoughtsTokenCount`. Reasoning was billed at zero and the `total_tokens === input_tokens + output_tokens` invariant was broken. The fix lives in agents (danny-avila/agents#157) — but this is also a defense-in-depth backstop in case agents misses a path or another provider exhibits the same shape. `resolveCompletionTokens(usage)` adds `output_token_details.reasoning` back when (and only when) the gap is present (`total - input > output`), so providers that already include reasoning in `output_tokens` (OpenAI o-series, Anthropic, the Google-API wrapper) are no-ops — no double-counting. - `SplitUsage` gains a `completion` field; all four billing call sites in `processUsageGroup` use it instead of `usage.output_tokens`. - `total_output_tokens` in the result also reflects the corrected count. - `UsageMetadata` interface in `IJobStore.ts` adds the `output_token_details` field for type safety. - 4 new tests in `usage.spec.ts` cover: Vertex undercount fix, OpenAI no-double-count, structured spend path with cache + reasoning, no-op when no details present. * 🩹 fix: simplify reasoning correction to invariant-based gap check Initial fix gated the correction on `output_token_details.reasoning > 0`, which doesn't help in the live failure case: when google-common's stream emits the buggy fallback usage_metadata, output_token_details is empty ({}) and the gate exits early. Live debugging showed the reliable signal is the documented invariant itself: `total_tokens === input_tokens + output_tokens`. When buggy streams undercount output, total exceeds input + output by exactly the unbilled reasoning. Use `total - input` as the corrected output. This is provider-agnostic and stays a no-op for compliant providers (OpenAI/Anthropic/Google-via-CustomChatGoogleGenerativeAI), where the gap is zero. Live verified end-to-end against gemini-3-flash-preview: - With agents fix in place: output_tokens=437 → billed 437 (no-op) - Backstop only (no agents fix, buggy input): raw 135, billed 297 (= total 309 - input 12, matches actual API charge) Updated tests to cover both scenarios. |
||
|---|---|---|
| .. | ||
| src | ||
| types | ||
| .gitignore | ||
| babel.config.cjs | ||
| jest.config.mjs | ||
| jest.setup.cjs | ||
| package.json | ||
| rollup.config.js | ||
| tsconfig-paths-bootstrap.mjs | ||
| tsconfig.build.json | ||
| tsconfig.json | ||
| tsconfig.spec.json | ||