mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-07-03 04:42:11 +00:00
2 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4ee68d5240
|
💸 feat: Per-Agent Endpoint Token Config in Multi-Endpoint Billing (#13738)
* 💸 feat: Per-Agent Endpoint Token Config in Multi-Endpoint Billing
Price each collected/emitted usage item with the producing agent's resolved
endpoint token config, instead of the primary agent's for the whole graph.
Previously AgentClient.recordCollectedUsage and the subagent usage emitter used
a single this.options.endpointTokenConfig (the primary's) for every usage item.
A connected agent or subagent on a different custom endpoint that shares a model
id with an entry in the primary's tokenConfig was therefore mis-priced (a model
absent from it already fell back to the built-in rate map — no regression).
- Tag each usage with its producing agent: ModelEndHandler stamps
usage.agentId = agentContext.agentId; createSubagentUsageSink stamps the
child's subagentAgentId (UsageMetadata gains an optional agentId).
- buildAgentToolContext retains endpointTokenConfig so initialize.js can build
an agentId -> endpointTokenConfig map from agentToolContexts (the one map that
holds every agent, including pure subagents pruned from agentConfigs).
- AgentClient.resolveAgentEndpointTokenConfig(usage) looks up that map by
agentId, falling back to the primary config; used by both the billing path
(new optional resolveEndpointTokenConfig on recordCollectedUsage) and the
subagent cost emitter.
- recordCollectedUsage's resolver is optional and falls back to the batch
endpointTokenConfig, so the shared responses.js/openai.js call sites are
unchanged.
- Tests: two-endpoint graph with a colliding model id prices per-agent; resolver
nullish falls back to batch; subagent sink tags the child agent id.
* fix: Align emit-path cost with per-agent billing; honor known-agent built-in pricing
Addresses Codex review on the per-agent endpoint token config:
- Emit path (callbacks.js) now prices each on_token_usage event with the
producing agent's config (resolved via usageCost.resolveEndpointTokenConfig),
so streamed/persisted metadata.usage.cost matches the per-agent balance
transaction. The agentId tag is resolved server-side and stripped from the
emitted/persisted payload.
- Resolver (resolveAgentTokenConfig) now treats a known agent's config as
authoritative, including undefined → built-in pricing, so a known non-custom
agent in a custom-primary graph is no longer charged the primary's rates.
Only untagged/unknown usage falls back to the primary config.
- endpointTokenConfigByAgentId records every known agent (value may be
undefined) so the resolver distinguishes known-no-rates from unknown.
|
||
|
|
d90567204e
|
🛟 fix: persist Vertex Gemini 3 thoughtSignatures across DB round-trips (#13026)
When a tool round-trip is interrupted between the tool result and the
model's text reply (user aborted, network drop, pod restart, ...) and
LibreChat persists the partial assistant message, the next conversation
turn reconstructs an `AIMessage` from `formatAgentMessages` that has
`tool_calls` populated but no `additional_kwargs.signatures`. Vertex
Gemini 3 rejects the resumed request with 400 because the most recent
historical functionCall has no `thought_signature`.
## Storage shape
Capture as `Record<tool_call_id, signature>` rather than a flat array.
This addresses the codex P1 review:
> When an assistant turn contains multiple sequential tool-call batches,
> this restoration path writes all persisted thoughtSignatures onto only
> the last tool-bearing AIMessage. Vertex/Gemini validates signatures
> for each step in the current tool-calling turn, so earlier
> functionCall steps reconstructed without their signature can still
> fail with 400.
A single agent run can fire multiple `chat_model_end` events when the
loop cycles the LLM with intervening tool results — each cycle owns a
distinct `tool_call_id`. Per-id storage maps each signature back onto
the right reconstructed `AIMessage`, not just the last one.
## Mapping
`additional_kwargs.signatures` is a flat array indexed by *response part*
(text + functionCall interleaved). `tool_calls` is just the function
calls in their original order. Non-empty signatures correspond 1:1 with
tool_calls in order — see `partsToSignatures` in
`@langchain/google-common`. Single-pass walk maps `signatures[i]` (when
non-empty) onto the i-th `tool_call.id`.
## Pipeline
| Stage | File | Change |
|---|---|---|
| Capture | callbacks.js | `ModelEndHandler` accepts `Record<string,string>` map; walks signatures + tool_calls in tandem to record per-id. Gated on the map being provided — non-Vertex flows are no-op (and also no-op even when provided, since they don't emit signatures). |
| Plumbing | initialize.js | Allocate `collectedThoughtSignatures = {}`, share with handler + client. Always allocated; the JSDoc explicitly documents that it stays empty for non-Vertex providers. |
| Surface | client.js | `sendCompletion` returns `metadata.thoughtSignatures` when the map has entries; falls through unchanged when empty. |
| Persist | (existing BaseClient.handleRespCompletion) | Writes `metadata` from `sendCompletion` onto `responseMessage.metadata`. Mongoose `Mixed` — no migration. |
| Restore | formatMessages.js | Track every tool-bearing AIMessage produced from a TMessage. For each, build a position-aligned `additional_kwargs.signatures` array (empty placeholders for tool_calls without a stored sig). Agents' `fixThoughtSignatures` dispatches non-empty entries to functionCall parts in order. |
## Live verification
- **Single-step:** real Vertex `gemini-3.1-flash-lite-preview` resume-after-tool case. With fix ✅ / without ❌ 400.
- **Multi-step (codex case):** real two-step agent loop (list /tmp → echo done). Each step's signature attaches to its own reconstructed AIMessage. With fix ✅ / without ❌ 400.
- **Cross-provider:** Anthropic Claude haiku-4.5 + OpenAI gpt-5-mini accept the persisted/restored shape unchanged.
## Tests
`modelEndHandler.spec.js` (new) — 6 tests:
- maps non-empty signatures onto tool_call_ids in order
- accumulates per-id across multiple `model_end` events (multi-step)
- no-op when `collectedThoughtSignatures` is null
- no-op when `signatures` field missing (non-Vertex)
- no-op when `tool_calls` missing
- preserves existing `collectedUsage` array contract
`formatAgentMessages.spec.js` — 6 new tests:
- restores onto the AIMessage that owns the tool_call
- per-step attachment for multi-step turns (codex review case)
- preserves tool_call ordering when signatures are partial
- no-op when metadata.thoughtSignatures absent
- no-op when assistant has no tool_calls
- no-op when stored ids don't match any current tool_call
37 passing across 3 suites; 15 existing formatAgentMessages tests unchanged.
## Compatibility
- Backward-compatible — restore gated on `metadata.thoughtSignatures` being a populated object; capture gated on the map being provided.
- No schema migration — uses `Message.metadata: Mixed` already in place.
- Cross-provider safe — non-Vertex providers tolerate the field (verified live against Anthropic + OpenAI converters).
- Pairs with [agents#159](https://github.com/danny-avila/agents/pull/159) for full coverage on histories that mix plain-text and toolcall AIMessages.
|