LibreChat

mirror of https://github.com/danny-avila/LibreChat.git synced 2026-05-14 00:19:40 +00:00

Danny Avila 89bf2ab7b4 💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12868)

* 💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12855)

Different providers report `usage_metadata.input_tokens` with different semantics:

- Anthropic / Bedrock: `input_tokens` EXCLUDES cache; cache reads/writes arrive separately and must be added to get the total prompt size.
- Gemini / OpenAI: `input_tokens` ALREADY INCLUDES cached tokens (Google's `promptTokenCount`, OpenAI's `prompt_tokens`). Their `input_token_details.cache_` are subsets of `input_tokens`.

`recordCollectedUsage` treated both schemes as additive, so for cache-hit requests on Gemini/OpenAI it added cache tokens on top of an `input_tokens` value that already contained them, overcharging users by the cache_hit_rate (e.g., ~67% cache hit ≈ 1.67x overcharge). This matches the issue reporter's GCP billing comparison.

Adds a small `splitUsage` helper that classifies the provider by model name and computes `inputOnly` (the non-cached portion) plus the all-inclusive `totalInput` for both the spend math and the returned `input_tokens` summary. The helper defaults to additive semantics (the historical behavior) so unknown providers are unaffected.

Updates existing OpenAI-shaped tests that previously asserted the buggy additive math, and adds Gemini regression tests using the exact numbers from the issue report (input=11125, cache_read=7441 → input=3684). Anthropic / Bedrock paths remain bit-identical to before.

* 🔧 refactor: Classify Cache-Token Semantics by Provider, Not Model Name

Follows up the previous commit. Replaces a model-name regex (`gemini|gpt|o[1-9]|chatgpt`) with an explicit `Providers` enum lookup keyed off the `usage.provider` field; `UsageMetadata.provider` already exists in `IJobStore.ts` but was never being populated.

- `callbacks.js#ModelEndHandler` now attaches `usage.provider` from `agentContext.provider` alongside `usage.model`.
- `usage.ts` uses a `SUBSET_PROVIDERS` set (`openAI`, `azureOpenAI`, `google`, `vertexai`, `xai`, `deepseek`, `openrouter`, `moonshot`) backed by the canonical `Providers` enum from `librechat-data-provider`.
- `xai`, `deepseek`, `openrouter`, `moonshot` extend `ChatOpenAI` so they inherit subset semantics (verified in node_modules).
- Defaults to additive when `usage.provider` is missing, so the title flow (which doesn't propagate provider) and any pre-this-PR usage entries keep their existing behavior.

Tests: switch fixtures from model-name signaling to explicit `provider` field, plus a Vertex AI case and a "missing provider" fallback case.

2026-04-29 08:36:00 +09:00
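The two accounting schemes described in the commit message can be illustrated with a minimal TypeScript sketch. This is not the code from `usage.ts`: the `UsageLike` type, the `cache_read` / `cache_creation` field names, and the clamping to zero are assumptions made for illustration, and the provider ids are copied as plain strings from the commit message rather than taken from the `Providers` enum. Only the names `splitUsage`, `SUBSET_PROVIDERS`, `inputOnly`, and `totalInput` come from the message itself.

```typescript
// Hypothetical usage shape; the real `UsageMetadata` type may differ.
type UsageLike = {
  input_tokens: number;
  provider?: string;
  input_token_details?: { cache_read?: number; cache_creation?: number };
};

// Provider ids as listed in the commit message (assumed here to be plain strings).
const SUBSET_PROVIDERS = new Set<string>([
  'openAI', 'azureOpenAI', 'google', 'vertexai',
  'xai', 'deepseek', 'openrouter', 'moonshot',
]);

function splitUsage(usage: UsageLike): { inputOnly: number; totalInput: number } {
  const cacheRead = usage.input_token_details?.cache_read ?? 0;
  const cacheCreation = usage.input_token_details?.cache_creation ?? 0;
  const cached = cacheRead + cacheCreation;

  const isSubset =
    usage.provider !== undefined && SUBSET_PROVIDERS.has(usage.provider);

  if (isSubset) {
    // Subset semantics: input_tokens already contains the cached tokens,
    // so the non-cached portion is obtained by subtraction.
    return {
      inputOnly: Math.max(0, usage.input_tokens - cached), // clamp is an assumption
      totalInput: usage.input_tokens,
    };
  }

  // Additive semantics (Anthropic / Bedrock, or provider missing):
  // cached tokens are reported separately and added to get the total.
  return { inputOnly: usage.input_tokens, totalInput: usage.input_tokens + cached };
}

// Numbers from the issue report quoted in the commit message.
const gemini = splitUsage({
  input_tokens: 11125,
  provider: 'google',
  input_token_details: { cache_read: 7441 },
});
console.log(gemini.inputOnly, gemini.totalInput);
```

With the issue report's figures, the subset branch yields a non-cached input of 11125 − 7441 = 3684 tokens, whereas the additive branch would have produced a total of 18566. Falling back to additive when `provider` is missing mirrors the default the commit message describes for the title flow and older usage entries.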
app	🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)	2026-04-27 08:56:39 +09:00
cache	🚦 fix: ERR_ERL_INVALID_IP_ADDRESS and IPv6 Key Collisions in IP Rate Limiters (#12319)	2026-03-19 21:48:03 -04:00
config	🔊 fix: Preserve Log Metadata on Console for Warn/Error Levels (#12737)	2026-04-19 21:49:41 -07:00
db	🐛 fix: Resolve MeiliSearch Startup Sync Failure from Model Loading Order (#12397)	2026-03-25 14:02:44 -04:00
models	🗑️ chore: Remove Action Test Suite and Update Mock Implementations (#12268)	2026-03-21 14:28:55 -04:00
server	💎 fix: Stop Double-Counting Cache Tokens for Gemini/OpenAI in Usage Spend (#12868)	2026-04-29 08:36:00 +09:00
strategies	🔐 feat: Admin Auth Support for SAML and Social OAuth Providers (#12472)	2026-03-30 22:49:44 -04:00
test	🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)	2026-04-27 08:56:39 +09:00
utils	🦉 feat: Claude Opus 4.7 Model Support (#12698)	2026-04-16 14:51:00 -04:00
jest.config.js	📏 refactor: Add File Size Limits to Conversation Imports (#12221)	2026-03-14 03:06:29 -04:00
jsconfig.json
package.json	🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831)	2026-04-27 08:56:39 +09:00
typedefs.js