Gh 5910 stream tool call argument deltas #5937
Open
i-zoufeng wants to merge 9 commits into spring-projects:main from
… streaming

When deepseek-reasoner is used with streaming and tool calls, the API requires the assistant message replayed in subsequent rounds to include reasoning_content; otherwise it returns "thinking is enabled but reasoning_content is missing in assistant tool call message".

Spring AI dropped the field on three independent paths:

1. DeepSeekStreamFunctionCallingHelper#merge did not accumulate the reasoning_content delta nor forward the prefix flag across chunks.
2. DeepSeekChatModel#createRequest passed null for reasoning_content when re-serializing a DeepSeekAssistantMessage. Even after reading the typed field, Prompt#mutate() (called from buildRequestPrompt) downgrades AssistantMessage subclasses to the plain superclass via instructionsCopy, so subclass-only fields cannot survive a round trip; only metadata does.
3. DeepSeekApi#chatCompletionStream emits each pre-tool-call chunk (reasoning delta, content delta) as its own one-element window for streaming UX and only merges chunks once the tool-call window opens. The chunk that triggers ToolCallingManager#executeToolCalls therefore carries only tool_calls, and the assistant message stored in the next round's conversation history has empty content and no reasoning_content.

Fixes:

- Merge reasoning_content and forward prefix in the chunk-window merger.
- Mirror reasoningContent and prefix from DeepSeekAssistantMessage into the message metadata, and read them back from metadata in createRequest, so the round trip works whether the message is a DeepSeekAssistantMessage or a plain AssistantMessage created by Prompt#instructionsCopy.
- In DeepSeekChatModel#internalStream, maintain per-stream-call accumulators for content and reasoning_content. Before invoking executeToolCalls, enrich the assistant message with the accumulated values so the next round carries the full pre-tool-call context.

Adds regression tests covering chunk-merge accumulation, the AssistantMessage to ChatCompletionMessage round trip, and the streaming-path enrichment.

Fixes spring-projectsgh-5898

Signed-off-by: i-zoufeng <15536835114@163.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
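The third fix (per-stream-call accumulators that enrich the assistant message before tool execution) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the PR's code: `Chunk`, `AssistantTurn`, `StreamAccumulator`, and `ReasoningReplayDemo` are hypothetical names, and the real change works on Spring AI and Reactor types that are not reproduced here.

```java
import java.util.List;

// Hypothetical stand-in for one streamed delta; content and reasoningContent may be null.
record Chunk(String content, String reasoningContent, boolean hasToolCalls) {}

// Hypothetical stand-in for the assistant message replayed in the next round.
record AssistantTurn(String content, String reasoningContent) {}

// One instance per stream call, so concurrent streams never share state.
final class StreamAccumulator {
    private final StringBuilder content = new StringBuilder();
    private final StringBuilder reasoning = new StringBuilder();

    // Record whatever this chunk carried; null fields are skipped.
    void accept(Chunk chunk) {
        if (chunk.content() != null) {
            content.append(chunk.content());
        }
        if (chunk.reasoningContent() != null) {
            reasoning.append(chunk.reasoningContent());
        }
    }

    // Build the enriched message from everything seen so far.
    AssistantTurn enrich() {
        return new AssistantTurn(content.toString(), reasoning.toString());
    }
}

final class ReasoningReplayDemo {
    // Returns the enriched message at the chunk that carries tool calls,
    // or null if the stream never requested a tool.
    static AssistantTurn replay(List<Chunk> chunks) {
        StreamAccumulator accumulator = new StreamAccumulator();
        AssistantTurn turn = null;
        for (Chunk chunk : chunks) {
            accumulator.accept(chunk);
            if (chunk.hasToolCalls()) {
                turn = accumulator.enrich();
            }
        }
        return turn;
    }
}
```

The point of the sketch is that the chunk triggering tool execution carries neither content nor reasoning of its own, so the accumulated values have to be attached at that moment.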
ericbottard reviewed May 7, 2026
private static final class ToolCallBuilder {
    @Nullable private String id;
Member
@Nullable should appear after the visibility modifier.
This may apply to other places in this PR.
Author
Fixed in the latest force-push: @Nullable now follows the visibility modifier on the three ToolCallBuilder fields. Verified there are no other instances in this PR (only StreamingToolCallBuilder#toPartial, which uses type-use placement intentionally).
Streaming providers currently merge tool call argument fragments into a single complete ToolCall before exposing it to the client, which loses the per-token UX for use cases where the arguments themselves are the user-visible output (e.g. code generation tools).

This commit lays the foundation for delta streaming:

- AssistantMessage.ToolCall gains a `partial` boolean. When `true`, `arguments` carries only the delta fragment from the latest chunk and downstream tool execution must not run on that frame. The existing 4-arg constructor delegates with `partial=false` so all existing call sites remain source-compatible.
- MessageAggregator now stitches incoming ToolCall frames keyed by id. Non-partial frames are treated as authoritative (the provider has the complete arguments string); partial frames append fragments as a fallback for providers that emit only deltas.

Provider stream wiring follows in a subsequent commit; this change is intentionally isolated to the common types and aggregator so the contract can be reviewed independently.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
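The contract described in this commit can be illustrated with a small self-contained sketch. It is an assumption-laden approximation rather than the actual Spring AI types: `ToolCall` here is a local record that only mirrors the described shape, `ToolCallStitcher` is a hypothetical name standing in for the aggregator logic, and ignoring a partial frame that arrives after an authoritative one is a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local mirror of the described shape; not the real AssistantMessage.ToolCall.
record ToolCall(String id, String type, String name, String arguments, boolean partial) {
    // The 4-arg form delegates with partial=false, so older call sites keep compiling.
    ToolCall(String id, String type, String name, String arguments) {
        this(id, type, name, arguments, false);
    }
}

// Hypothetical stitcher: frames are keyed by id (assumed non-null in this sketch).
final class ToolCallStitcher {
    private final Map<String, ToolCall> byId = new LinkedHashMap<>();

    void accept(ToolCall frame) {
        ToolCall seen = byId.get(frame.id());
        if (!frame.partial()) {
            // Authoritative frame: replaces anything stitched so far.
            byId.put(frame.id(), frame);
        } else if (seen == null) {
            // First fragment for this id.
            byId.put(frame.id(), frame);
        } else if (seen.partial()) {
            // Fallback path for delta-only providers: append the fragment.
            byId.put(frame.id(), new ToolCall(seen.id(), seen.type(), seen.name(),
                    seen.arguments() + frame.arguments(), true));
        }
        // Assumption: a partial frame after an authoritative one is ignored.
    }

    // Aggregated view: every stitched call is reported as complete.
    List<ToolCall> result() {
        List<ToolCall> out = new ArrayList<>();
        for (ToolCall tc : byId.values()) {
            out.add(new ToolCall(tc.id(), tc.type(), tc.name(), tc.arguments()));
        }
        return out;
    }
}
```

Keying by id keeps two interleaved tool calls from contaminating each other's argument strings, and the LinkedHashMap preserves the order in which calls first appeared.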
When DeepSeek emits a tool call mid-stream, DeepSeekApi previously
collapsed the entire SSE window into a single merged chunk before
exposing it. Clients only saw the tool call after every argument
fragment had arrived, losing the per-token UX for tools whose
arguments are themselves the user-visible output (e.g. code
generation tools that the model writes via a write_code tool).

This commit replaces the windowUntil + reduce in
DeepSeekApi#chatCompletionStream with a window expansion that emits:

- one frame per pre-merge SSE chunk, carrying that chunk's own
  argument fragment with the cumulative tool-call identity
  (id/type/name) stamped from earlier chunks and finish_reason left
  unset; and
- a single terminal merged frame with the concatenated arguments and
  finish_reason = TOOL_CALLS, which is the authoritative frame
  downstream tool execution and message aggregation act on.

DeepSeekChatModel#buildGeneration marks the partial frames with
ToolCall.partial = true so AssistantMessage.hasToolCalls() returns
false on intermediate frames, keeping tool-execution gating, advisor
routing, and persistence on the final merged frame only.

The empty SSE marker chunk that closes the window
(`delta: {}, finish_reason: tool_calls`) is dropped; the merged
frame already carries the finish_reason.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
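A rough sketch of the window expansion described above, using plain lists in place of the Reactor pipeline. The names `ToolDelta`, `Frame`, and `WindowExpander` are hypothetical, and the sketch assumes a single tool call per window and omits the `type` field for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified SSE chunk: id and name are usually present only on
// the first chunk of a tool call; later chunks carry argument fragments.
record ToolDelta(String id, String name, String arguments) {}

// Hypothetical emitted frame; finishReason stays null on per-chunk frames.
record Frame(String id, String name, String arguments, String finishReason) {
    boolean partial() {
        return finishReason == null;
    }
}

final class WindowExpander {
    // Expands one tool-call window into per-chunk frames plus one merged frame.
    static List<Frame> expand(List<ToolDelta> window) {
        List<Frame> out = new ArrayList<>();
        String id = null;
        String name = null;
        StringBuilder merged = new StringBuilder();
        for (ToolDelta delta : window) {
            boolean emptyMarker = delta.id() == null && delta.name() == null
                    && delta.arguments() == null;
            if (emptyMarker) {
                continue; // closing marker chunk: dropped, the merged frame finishes the window
            }
            if (delta.id() != null) {
                id = delta.id();
            }
            if (delta.name() != null) {
                name = delta.name();
            }
            String fragment = delta.arguments() == null ? "" : delta.arguments();
            merged.append(fragment);
            // Per-chunk frame: own fragment, identity stamped from earlier chunks.
            out.add(new Frame(id, name, fragment, null));
        }
        // Terminal authoritative frame with the concatenated arguments.
        out.add(new Frame(id, name, merged.toString(), "TOOL_CALLS"));
        return out;
    }
}
```

Stamping the identity on every frame means a subscriber can route a fragment to the right tool without tracking earlier chunks itself.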
Mirror the DeepSeek change for Mistral: replace MistralAiApi's windowUntil + reduce in chatCompletionStream with a window expansion that emits one frame per pre-merge SSE chunk (carrying that chunk's argument fragment with the cumulative tool-call identity stamped on) followed by the merged authoritative frame.

MistralAiChatModel#buildGeneration uses the absence of finish_reason on a tool-call chunk as the partial signal, so partial frames roll up into the same MessageAggregator path that DeepSeek already uses.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
Replace MiniMaxApi#chatCompletionStream's windowUntil + reduce with a window expansion that emits one frame per pre-merge SSE chunk (carrying that chunk's argument fragment with the cumulative tool-call identity stamped on) followed by the merged authoritative frame.

MiniMaxChatModel marks intermediate frames with ToolCall.partial = true based on the absence of finish_reason, so the existing reduce-based merge that handles MiniMax's empty-id-means-continuation quirk now produces partial frames during streaming and a single non-partial complete frame at the end.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
Two related changes to OpenAiChatModel#internalStream:

- Mark each streaming chunk's tool-call frames as partial so AssistantMessage.hasToolCalls() and the tool-execution gate stay silent on intermediate frames. Per-chunk frames carry only the argument fragment that arrived in their SSE event.
- Replace the .collectList().flatMapMany(...) terminal aggregation with a doOnNext-based side-channel accumulator and a concatWith that emits a single non-partial aggregated frame after the SSE stream completes. Per-chunk frames now flow through to subscribers as they arrive, restoring streaming UX for both plain text and tool-call use cases, while the aggregated frame remains the authoritative target for tool execution and recursive next rounds.

Tool execution gating moves to the post-aggregation flatMap and relies on the partial filter inside hasToolCalls(): only the non-partial aggregated frame triggers ToolExecutionEligibilityPredicate and the recursive internalStream call.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
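The pass-through-plus-accumulator pattern from this commit can be approximated without Reactor. The sketch below is a plain-Java analogy under stated assumptions: a `Consumer` stands in for the subscriber, a loop stands in for the SSE stream, and `Piece` and `PassThroughStream` are hypothetical names. It is not the OpenAiChatModel code, which according to the description uses doOnNext and concatWith.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical frame carrying tool-call arguments.
record Piece(String arguments, boolean partial) {
    // Mirrors the described filter: partial frames never count as tool calls.
    boolean hasToolCalls() {
        return !partial && !arguments.isEmpty();
    }
}

final class PassThroughStream {
    // Forwards each fragment as soon as it "arrives", then emits one
    // aggregated non-partial frame once the input is exhausted.
    static void stream(List<String> fragments, Consumer<Piece> subscriber) {
        StringBuilder accumulator = new StringBuilder(); // side channel, one per call
        for (String fragment : fragments) {
            accumulator.append(fragment);                 // analogous to doOnNext
            subscriber.accept(new Piece(fragment, true)); // forwarded immediately
        }
        // Analogous to concatWith: a single authoritative frame at the end.
        subscriber.accept(new Piece(accumulator.toString(), false));
    }
}
```

With this shape the tool-execution gate fires exactly once per stream, on the final frame, while subscribers still see every fragment in arrival order.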
Anthropic's content_block_delta with input_json events were being silently buffered into StreamingState until message_delta closed the turn, so subscribers only saw the complete tool call, never the per-token argument deltas.

Emit a partial ToolCall frame for each input_json delta, carrying just the fragment from that SSE event with the tool id/name stamped from the in-flight content_block_start. The partial flag keeps the tool-execution gate, advisor routing, and persistence quiet on these intermediate frames; the message_delta event still emits the authoritative non-partial ToolCall list with the final concatenated arguments.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
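For the event-driven case, a minimal sketch of the same idea is shown here. The event records are hypothetical simplifications named after the SSE events mentioned above, `DeltaEmitter` and `Emitted` are invented for illustration, and the sketch assumes one tool call per turn; the real StreamingState handling is not reproduced.

```java
import java.util.List;

// Hypothetical simplified events, named after the SSE events mentioned above.
interface Event {}
record BlockStart(String toolId, String toolName) implements Event {}
record InputJsonDelta(String partialJson) implements Event {}
record MessageDelta() implements Event {}

// Hypothetical emitted tool-call frame.
record Emitted(String id, String name, String arguments, boolean partial) {}

final class DeltaEmitter {
    private String id;
    private String name;
    private final StringBuilder buffered = new StringBuilder();

    List<Emitted> onEvent(Event event) {
        if (event instanceof BlockStart start) {
            // Remember the in-flight tool identity; nothing is emitted yet.
            id = start.toolId();
            name = start.toolName();
            buffered.setLength(0);
            return List.of();
        }
        if (event instanceof InputJsonDelta delta) {
            // Keep buffering for the final frame, but also surface the fragment now.
            buffered.append(delta.partialJson());
            return List.of(new Emitted(id, name, delta.partialJson(), true));
        }
        if (event instanceof MessageDelta && id != null) {
            // Turn closed: emit the authoritative frame with the full arguments.
            return List.of(new Emitted(id, name, buffered.toString(), false));
        }
        return List.of();
    }
}
```

The buffer is kept even though fragments are now emitted, because the closing event still has to produce the complete arguments string.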
ConverseChatResponseStream previously buffered tool_use input fragments into StreamingToolCallBuilder and only emitted the complete tool call from visitMetadata once the stream closed.

Emit a partial ToolCall frame from visitContentBlockDelta for each tool_use input chunk, carrying just that event's input fragment with the in-flight builder's id and name stamped on. The partial flag keeps the tool-execution gate, advisor routing, and persistence quiet on these intermediate frames; visitMetadata still emits the authoritative non-partial tool calls when the stream completes.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
568f070 to fa9a7f3
…stream-tool-call-argument-deltas

# Conflicts:
#	models/spring-ai-minimax/src/main/java/org/springframework/ai/minimax/MiniMaxChatModel.java