LibreChat/api
Danny Avila 89bf2ab7b4
💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12868)
* 💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12855)

Different providers report `usage_metadata.input_tokens` with different
semantics:

  - Anthropic / Bedrock: `input_tokens` EXCLUDES cache; cache reads/writes
    arrive separately and must be added to get the total prompt size.
  - Gemini / OpenAI: `input_tokens` ALREADY INCLUDES cached tokens
    (Google's `promptTokenCount`, OpenAI's `prompt_tokens`). Their
    `input_token_details.cache_*` are subsets of `input_tokens`.

`recordCollectedUsage` treated both schemes as additive, so for cache-hit
requests on Gemini/OpenAI it added cache tokens on top of an
`input_tokens` value that already contained them — overcharging users by
the cache_hit_rate (e.g., ~67% cache hit ≈ 1.67x overcharge). This
matches the issue reporter's GCP billing comparison.

Adds a small `splitUsage` helper that classifies the provider by model
name and computes `inputOnly` (the non-cached portion) plus the
all-inclusive `totalInput` for both the spend math and the returned
`input_tokens` summary. The helper defaults to additive semantics (the
historical behavior) so unknown providers are unaffected.

Updates existing OpenAI-shaped tests that previously asserted the buggy
additive math, and adds Gemini regression tests using the exact numbers
from the issue report (input=11125, cache_read=7441 → input=3684).

Anthropic / Bedrock paths remain bit-identical to before.

* 🔧 refactor: Classify Cache-Token Semantics by Provider, Not Model Name

Follows up the previous commit. Replaces a model-name regex
(`gemini|gpt|o[1-9]|chatgpt`) with an explicit `Providers` enum lookup
keyed off the `usage.provider` field — `UsageMetadata.provider` already
exists in `IJobStore.ts` but was never being populated.

  - `callbacks.js#ModelEndHandler` now attaches `usage.provider` from
    `agentContext.provider` alongside `usage.model`.
  - `usage.ts` uses a `SUBSET_PROVIDERS` set (`openAI`, `azureOpenAI`,
    `google`, `vertexai`, `xai`, `deepseek`, `openrouter`, `moonshot`)
    backed by the canonical `Providers` enum from
    `librechat-data-provider`.
  - `xai`, `deepseek`, `openrouter`, `moonshot` extend `ChatOpenAI` so
    they inherit subset semantics (verified in node_modules).
  - Defaults to additive when `usage.provider` is missing, so the title
    flow (which doesn't propagate provider) and any pre-this-PR usage
    entries keep their existing behavior.

Tests: switch fixtures from model-name signaling to explicit `provider`
field, plus a Vertex AI case and a "missing provider" fallback case.
2026-04-29 08:36:00 +09:00
..
app 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
cache 🚦 fix: ERR_ERL_INVALID_IP_ADDRESS and IPv6 Key Collisions in IP Rate Limiters (#12319) 2026-03-19 21:48:03 -04:00
config 🔊 fix: Preserve Log Metadata on Console for Warn/Error Levels (#12737) 2026-04-19 21:49:41 -07:00
db 🐛 fix: Resolve MeiliSearch Startup Sync Failure from Model Loading Order (#12397) 2026-03-25 14:02:44 -04:00
models 🗑️ chore: Remove Action Test Suite and Update Mock Implementations (#12268) 2026-03-21 14:28:55 -04:00
server 💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12868) 2026-04-29 08:36:00 +09:00
strategies 🔐 feat: Admin Auth Support for SAML and Social OAuth Providers (#12472) 2026-03-30 22:49:44 -04:00
test 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
utils 🦉 feat: Claude Opus 4.7 Model Support (#12698) 2026-04-16 14:51:00 -04:00
jest.config.js 📏 refactor: Add File Size Limits to Conversation Imports (#12221) 2026-03-14 03:06:29 -04:00
jsconfig.json
package.json 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
typedefs.js