Gh 5910 stream tool call argument deltas by i-zoufeng · Pull Request #5937 · spring-projects/spring-ai

i-zoufeng · 2026-05-05T00:56:05Z

Thank you for taking time to contribute this pull request!
You might have already read the contributor guide, but as a reminder, please make sure to:

Add a Signed-off-by line to each commit (git commit -s) per the DCO
Rebase your changes on the latest main branch and squash your commits
Add/Update unit tests as needed
Run a build and make sure all tests pass prior to submission

For more details, please check the contributor guide.
Thank you upfront!

… streaming

When deepseek-reasoner is used with streaming and tool calls, the API requires the assistant message replayed in subsequent rounds to include reasoning_content; otherwise it returns "thinking is enabled but reasoning_content is missing in assistant tool call message".

Spring AI dropped the field on three independent paths:

1. DeepSeekStreamFunctionCallingHelper#merge neither accumulated the reasoning_content delta nor forwarded the prefix flag across chunks.
2. DeepSeekChatModel#createRequest passed null for reasoning_content when re-serializing a DeepSeekAssistantMessage. Even after reading the typed field, Prompt#mutate() (called from buildRequestPrompt) downgrades AssistantMessage subclasses to the plain superclass via instructionsCopy, so subclass-only fields cannot survive a round trip; only metadata does.
3. DeepSeekApi#chatCompletionStream emits each pre-tool-call chunk (reasoning delta, content delta) as its own one-element window for streaming UX and only merges chunks once the tool-call window opens. The chunk that triggers ToolCallingManager#executeToolCalls therefore carries only tool_calls, and the assistant message stored in the next round's conversation history has empty content and no reasoning_content.

Fixes:

- Merge reasoning_content and forward prefix in the chunk-window merger.
- Mirror reasoningContent and prefix from DeepSeekAssistantMessage into the message metadata, and read them back from metadata in createRequest, so the round trip works whether the message is a DeepSeekAssistantMessage or a plain AssistantMessage created by Prompt#instructionsCopy.
- In DeepSeekChatModel#internalStream, maintain per-stream-call accumulators for content and reasoning_content. Before invoking executeToolCalls, enrich the assistant message with the accumulated values so the next round carries the full pre-tool-call context.

Adds regression tests covering chunk-merge accumulation, the AssistantMessage to ChatCompletionMessage round trip, and the streaming-path enrichment.

Fixes spring-projectsgh-5898

Signed-off-by: i-zoufeng <15536835114@163.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
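
The metadata-mirroring fix can be illustrated with a small, self-contained sketch. It does not use the Spring AI classes: `PlainAssistantMessage`, `ReasoningAssistantMessage`, `downgradeCopy`, and the `"reasoningContent"` metadata key are hypothetical stand-ins chosen for illustration, and the sketch assumes Java 17. It only shows why a value mirrored into metadata survives a copy that downgrades a subclass to its superclass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the Spring AI types.
class ReasoningRoundTripSketch {

	static class PlainAssistantMessage {
		final String text;
		final Map<String, Object> metadata;

		PlainAssistantMessage(String text, Map<String, Object> metadata) {
			this.text = text;
			this.metadata = new HashMap<>(metadata);
		}
	}

	static final class ReasoningAssistantMessage extends PlainAssistantMessage {
		final String reasoningContent;

		ReasoningAssistantMessage(String text, String reasoningContent) {
			// Mirror the subclass-only field into metadata at construction time.
			super(text, mirror(reasoningContent));
			this.reasoningContent = reasoningContent;
		}

		private static Map<String, Object> mirror(String reasoningContent) {
			Map<String, Object> metadata = new HashMap<>();
			if (reasoningContent != null) {
				metadata.put("reasoningContent", reasoningContent);
			}
			return metadata;
		}
	}

	// Simulates a copy that keeps text and metadata but loses the subclass type.
	static PlainAssistantMessage downgradeCopy(PlainAssistantMessage message) {
		return new PlainAssistantMessage(message.text, message.metadata);
	}

	// Prefer the typed field when present, otherwise fall back to metadata.
	static String reasoningFor(PlainAssistantMessage message) {
		if (message instanceof ReasoningAssistantMessage typed && typed.reasoningContent != null) {
			return typed.reasoningContent;
		}
		Object mirrored = message.metadata.get("reasoningContent");
		return mirrored instanceof String value ? value : null;
	}
}
```

Reading from the typed field first and metadata second means the lookup behaves the same before and after the downgrade.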

ericbottard · 2026-05-07T13:34:19Z

+
+	private static final class ToolCallBuilder {
+
+		@Nullable private String id;


@Nullable should appear after the visibility modifier.

This may apply to other places in this PR.

Fixed in the latest force-push: @Nullable now follows the visibility modifier on the three ToolCallBuilder fields. Verified there are no other instances in this PR (only StreamingToolCallBuilder#toPartial, which uses type-use placement intentionally).
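
For reference, the requested placement looks roughly like the sketch below. To stay self-contained it declares its own type-use `@Nullable` annotation as a stand-in for the project's real one; the `name` field and the accessor are assumptions added for illustration and are not taken from the PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class NullablePlacementSketch {

	// Local stand-in for a type-use nullness annotation (assumption, not the project's annotation).
	@Target(ElementType.TYPE_USE)
	@Retention(RetentionPolicy.RUNTIME)
	@interface Nullable {
	}

	static final class ToolCallBuilder {

		// The annotation sits after the visibility modifier, directly before the type it qualifies.
		private @Nullable String id;

		private @Nullable String name;

		void id(@Nullable String id) {
			this.id = id;
		}

		public @Nullable String id() {
			return this.id;
		}
	}
}
```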

Streaming providers currently merge tool call argument fragments into a single complete ToolCall before exposing it to the client, which loses the per-token UX for use cases where the arguments themselves are the user-visible output (e.g. code generation tools).

This commit lays the foundation for delta streaming:

- AssistantMessage.ToolCall gains a `partial` boolean. When `true`, `arguments` carries only the delta fragment from the latest chunk and downstream tool execution must not run on that frame. The existing 4-arg constructor delegates with `partial=false` so all existing call sites remain source-compatible.
- MessageAggregator now stitches incoming ToolCall frames keyed by id. Non-partial frames are treated as authoritative (the provider has the complete arguments string); partial frames append fragments as a fallback for providers that emit only deltas.

Provider stream wiring follows in a subsequent commit; this change is intentionally isolated to the common types and aggregator so the contract can be reviewed independently.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
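
A minimal sketch of the contract described above, assuming Java 17. The `ToolCall` record and `stitch` method are simplified stand-ins, not the actual AssistantMessage.ToolCall or MessageAggregator code. Two behaviours here are assumptions rather than something the commit message states: a partial frame arriving after an authoritative one is ignored, and the stitched result is always reported as non-partial.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToolCallStitchingSketch {

	record ToolCall(String id, String type, String name, String arguments, boolean partial) {

		// Source-compatible 4-arg form: existing callers get partial = false.
		ToolCall(String id, String type, String name, String arguments) {
			this(id, type, name, arguments, false);
		}
	}

	// Stitches frames keyed by tool call id, preserving first-seen order.
	static List<ToolCall> stitch(List<ToolCall> frames) {
		Map<String, ToolCall> byId = new LinkedHashMap<>();
		for (ToolCall frame : frames) {
			ToolCall seen = byId.get(frame.id());
			if (!frame.partial() || seen == null) {
				// A non-partial frame is authoritative and replaces whatever was collected;
				// the first frame for an id is stored as-is.
				byId.put(frame.id(), frame);
			}
			else if (seen.partial()) {
				// Fallback for delta-only providers: append the fragment.
				byId.put(frame.id(), new ToolCall(frame.id(), seen.type(), seen.name(),
						seen.arguments() + frame.arguments(), true));
			}
			// Assumption: a partial frame after an authoritative one is ignored.
		}
		List<ToolCall> stitched = new ArrayList<>();
		for (ToolCall call : byId.values()) {
			stitched.add(new ToolCall(call.id(), call.type(), call.name(), call.arguments()));
		}
		return stitched;
	}
}
```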

When DeepSeek emits a tool call mid-stream, DeepSeekApi previously collapsed the entire SSE window into a single merged chunk before exposing it. Clients only saw the tool call after every argument fragment had arrived, losing the per-token UX for tools whose arguments are themselves the user-visible output (e.g. code generation tools that the model writes via a write_code tool).

This commit replaces the windowUntil + reduce in DeepSeekApi#chatCompletionStream with a window expansion that emits:

- one frame per pre-merge SSE chunk, carrying that chunk's own argument fragment with the cumulative tool-call identity (id/type/name) stamped from earlier chunks and finish_reason left unset; and
- a single terminal merged frame with the concatenated arguments and finish_reason = TOOL_CALLS, which is the authoritative frame downstream tool execution and message aggregation act on.

DeepSeekChatModel#buildGeneration marks the partial frames with ToolCall.partial = true so AssistantMessage.hasToolCalls() returns false on intermediate frames, keeping tool-execution gating, advisor routing, and persistence on the final merged frame only.

The empty SSE marker chunk that closes the window (`delta: {}, finish_reason: tool_calls`) is dropped, since the merged frame already carries the finish_reason.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
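
The window expansion can be sketched without Reactor as a plain list transformation. The `Chunk` record and `expand` method below are hypothetical simplifications (one tool call per window, no `type` field, Java 17); the real implementation operates on the provider's chunk types inside a Flux pipeline.

```java
import java.util.ArrayList;
import java.util.List;

class WindowExpansionSketch {

	// Simplified chunk: identity fields may be absent on continuation chunks.
	record Chunk(String id, String name, String argumentsFragment, String finishReason) {
	}

	// Expands one tool-call window into per-chunk frames plus a terminal merged frame.
	static List<Chunk> expand(List<Chunk> window) {
		List<Chunk> frames = new ArrayList<>();
		String id = null;
		String name = null;
		StringBuilder merged = new StringBuilder();
		for (Chunk chunk : window) {
			// Remember the identity from whichever chunk carried it.
			if (chunk.id() != null && !chunk.id().isEmpty()) {
				id = chunk.id();
			}
			if (chunk.name() != null && !chunk.name().isEmpty()) {
				name = chunk.name();
			}
			// Drop the empty marker chunk that only closes the window.
			if (chunk.argumentsFragment() == null && chunk.finishReason() != null) {
				continue;
			}
			String fragment = chunk.argumentsFragment() == null ? "" : chunk.argumentsFragment();
			merged.append(fragment);
			// Per-chunk frame: own fragment, stamped identity, finish reason left unset.
			frames.add(new Chunk(id, name, fragment, null));
		}
		// Terminal authoritative frame with the concatenated arguments.
		frames.add(new Chunk(id, name, merged.toString(), "tool_calls"));
		return frames;
	}
}
```

A consumer that only cares about tool execution can ignore every frame whose finish reason is unset and act on the last one.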

Mirror the DeepSeek change for Mistral: replace MistralAiApi's windowUntil + reduce in chatCompletionStream with a window expansion that emits one frame per pre-merge SSE chunk (carrying that chunk's argument fragment with the cumulative tool-call identity stamped on) followed by the merged authoritative frame.

MistralAiChatModel#buildGeneration uses the absence of finish_reason on a tool-call chunk as the partial signal, so partial frames roll up into the same MessageAggregator path that DeepSeek already uses.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

Replace MiniMaxApi#chatCompletionStream's windowUntil + reduce with a window expansion that emits one frame per pre-merge SSE chunk (carrying that chunk's argument fragment with the cumulative tool-call identity stamped on) followed by the merged authoritative frame.

MiniMaxChatModel marks intermediate frames with ToolCall.partial = true based on the absence of finish_reason, so the existing reduce-based merge that handles MiniMax's empty-id-means-continuation quirk now produces partial frames during streaming and a single non-partial complete frame at the end.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
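
The empty-id-means-continuation rule can be pictured with the sketch below. It is an illustrative guess at the shape of such a merge, not the MiniMax module's code: the `Fragment` record and `merge` method are hypothetical, and the sketch assumes a fragment with a null or empty id always extends the most recent tool call.

```java
import java.util.ArrayList;
import java.util.List;

class ContinuationMergeSketch {

	record Fragment(String id, String name, String arguments) {
	}

	// Folds fragments into complete tool calls; an empty id continues the previous call.
	static List<Fragment> merge(List<Fragment> fragments) {
		List<Fragment> merged = new ArrayList<>();
		for (Fragment fragment : fragments) {
			String arguments = fragment.arguments() == null ? "" : fragment.arguments();
			boolean continuation = (fragment.id() == null || fragment.id().isEmpty()) && !merged.isEmpty();
			if (continuation) {
				// Append to the last tool call, keeping its id and name.
				Fragment last = merged.remove(merged.size() - 1);
				merged.add(new Fragment(last.id(), last.name(), last.arguments() + arguments));
			}
			else {
				merged.add(new Fragment(fragment.id(), fragment.name(), arguments));
			}
		}
		return merged;
	}
}
```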

Two related changes to OpenAiChatModel#internalStream:

- Mark each streaming chunk's tool-call frames as partial so AssistantMessage.hasToolCalls() and the tool-execution gate stay silent on intermediate frames. Per-chunk frames carry only the argument fragment that arrived in their SSE event.
- Replace the .collectList().flatMapMany(...) terminal aggregation with a doOnNext-based side-channel accumulator and a concatWith that emits a single non-partial aggregated frame after the SSE stream completes. Per-chunk frames now flow through to subscribers as they arrive, restoring streaming UX for both plain text and tool-call use cases, while the aggregated frame remains the authoritative target for tool execution and recursive next rounds.

Tool execution gating moves to the post-aggregation flatMap and relies on the partial filter inside hasToolCalls(): only the non-partial aggregated frame triggers ToolExecutionEligibilityPredicate and the recursive internalStream call.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
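
The ordering this change aims for (per-chunk frames first, one aggregated frame last) can be sketched with a plain callback loop. This deliberately swaps Reactor's doOnNext and concatWith for a `Consumer` so the example runs on the JDK alone; `Frame`, `stream`, and `triggersToolExecution` are hypothetical names, and a single tool call per stream is assumed.

```java
import java.util.List;
import java.util.function.Consumer;

class SideChannelAggregationSketch {

	record Frame(String toolCallId, String arguments, boolean partial) {

		// Stand-in for the gate: only non-partial frames count as tool calls.
		boolean triggersToolExecution() {
			return !this.partial;
		}
	}

	// Pushes each fragment to the subscriber immediately, then one aggregated frame.
	static void stream(String toolCallId, List<String> fragments, Consumer<Frame> subscriber) {
		StringBuilder accumulator = new StringBuilder();
		for (String fragment : fragments) {
			// Side-channel accumulation (the role doOnNext plays in the commit).
			accumulator.append(fragment);
			// The per-chunk frame is delivered right away instead of being collected.
			subscriber.accept(new Frame(toolCallId, fragment, true));
		}
		// Emitted once the source is exhausted (the role concatWith plays in the commit).
		subscriber.accept(new Frame(toolCallId, accumulator.toString(), false));
	}
}
```

Compared with collecting the whole list before emitting anything, the subscriber sees every fragment as it arrives and still receives exactly one frame that passes the gate.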

Anthropic's content_block_delta with input_json events were being silently buffered into StreamingState until message_delta closed the turn, so subscribers only saw the complete tool call, never the per-token argument deltas.

Emit a partial ToolCall frame for each input_json delta, carrying just the fragment from that SSE event with the tool id/name stamped from the in-flight content_block_start. The partial flag keeps the tool-execution gate, advisor routing, and persistence quiet on these intermediate frames; the message_delta event still emits the authoritative non-partial ToolCall list with the final concatenated arguments.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>
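
A rough sketch of the event handling described above, assuming Java 17. The event records are hypothetical reductions of the SSE events named in the commit message (`BlockStart` for content_block_start, `JsonDelta` for an input_json delta, `MessageDelta` for message_delta), and the `frames` method is not the module's StreamingState logic. As a simplification, a new block start is treated as closing the previous tool block.

```java
import java.util.ArrayList;
import java.util.List;

class InputJsonDeltaSketch {

	sealed interface Event permits BlockStart, JsonDelta, MessageDelta {
	}

	record BlockStart(String toolId, String toolName) implements Event {
	}

	record JsonDelta(String partialJson) implements Event {
	}

	record MessageDelta() implements Event {
	}

	record ToolCall(String id, String name, String arguments, boolean partial) {
	}

	// Emits a partial frame per JSON delta and authoritative frames on message_delta.
	static List<ToolCall> frames(List<Event> events) {
		List<ToolCall> emitted = new ArrayList<>();
		List<ToolCall> completed = new ArrayList<>();
		String id = null;
		String name = null;
		StringBuilder arguments = new StringBuilder();
		for (Event event : events) {
			if (event instanceof BlockStart start) {
				if (id != null) {
					completed.add(new ToolCall(id, name, arguments.toString(), false));
				}
				id = start.toolId();
				name = start.toolName();
				arguments.setLength(0);
			}
			else if (event instanceof JsonDelta delta && id != null) {
				arguments.append(delta.partialJson());
				// Partial frame: only this event's fragment, stamped with the in-flight identity.
				emitted.add(new ToolCall(id, name, delta.partialJson(), true));
			}
			else if (event instanceof MessageDelta) {
				if (id != null) {
					completed.add(new ToolCall(id, name, arguments.toString(), false));
					id = null;
				}
				// Authoritative frames with the final concatenated arguments.
				emitted.addAll(completed);
				completed.clear();
			}
		}
		return emitted;
	}
}
```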

ConverseChatResponseStream previously buffered tool_use input fragments into StreamingToolCallBuilder and only emitted the complete tool call from visitMetadata once the stream closed.

Emit a partial ToolCall frame from visitContentBlockDelta for each tool_use input chunk, carrying just that event's input fragment with the in-flight builder's id and name stamped on. The partial flag keeps the tool-execution gate, advisor routing, and persistence quiet on these intermediate frames; visitMetadata still emits the authoritative non-partial tool calls when the stream completes.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

ericbottard reviewed May 7, 2026

i-zoufeng and others added 7 commits May 7, 2026 22:52

i-zoufeng force-pushed the gh-5910-stream-tool-call-argument-deltas branch from 568f070 to fa9a7f3 Compare May 7, 2026 14:55

Merge remote-tracking branch 'origin/main' into spring-projectsgh-5910-…

7564918

…stream-tool-call-argument-deltas # Conflicts: # models/spring-ai-minimax/src/main/java/org/springframework/ai/minimax/MiniMaxChatModel.java

i-zoufeng wants to merge 9 commits into spring-projects:main from i-zoufeng:gh-5910-stream-tool-call-argument-deltas


2 participants

