
Gh 5910 stream tool call argument deltas #5937

Open
i-zoufeng wants to merge 9 commits into spring-projects:main from
i-zoufeng:gh-5910-stream-tool-call-argument-deltas

Conversation

@i-zoufeng

Thank you for taking time to contribute this pull request!
You might have already read the contributor guide, but as a reminder, please make sure to:

  • Add a Signed-off-by line to each commit (git commit -s) per the DCO
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

For more details, please check the contributor guide.
Thank you upfront!

… streaming

When deepseek-reasoner is used with streaming and tool calls, the API
requires the assistant message replayed in subsequent rounds to include
reasoning_content; otherwise it returns "thinking is enabled but
reasoning_content is missing in assistant tool call message".

Spring AI dropped the field on three independent paths:

1. DeepSeekStreamFunctionCallingHelper#merge did not accumulate the
   reasoning_content delta nor forward the prefix flag across chunks.

2. DeepSeekChatModel#createRequest passed null for reasoning_content
   when re-serializing a DeepSeekAssistantMessage. Even after reading
   the typed field, Prompt#mutate() (called from buildRequestPrompt)
   downgrades AssistantMessage subclasses to the plain superclass via
   instructionsCopy, so subclass-only fields cannot survive a round
   trip; only metadata does.

3. DeepSeekApi#chatCompletionStream emits each pre-tool-call chunk
   (reasoning delta, content delta) as its own one-element window for
   streaming UX and only merges chunks once the tool-call window
   opens. The chunk that triggers ToolCallingManager#executeToolCalls
   therefore carries only tool_calls, and the assistant message stored
   in the next round's conversation history has empty content and no
   reasoning_content.

Fixes:

- Merge reasoning_content and forward prefix in the chunk-window merger.
- Mirror reasoningContent and prefix from DeepSeekAssistantMessage into
  the message metadata, and read them back from metadata in
  createRequest, so the round trip works whether the message is a
  DeepSeekAssistantMessage or a plain AssistantMessage created by
  Prompt#instructionsCopy.
- In DeepSeekChatModel#internalStream, maintain per-stream-call
  accumulators for content and reasoning_content. Before invoking
  executeToolCalls, enrich the assistant message with the accumulated
  values so the next round carries the full pre-tool-call context.
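
The metadata round trip described in the fixes above can be sketched as follows. This is a minimal illustration only: PlainAssistantMessage, ReasoningAssistantMessage, ReasoningRoundTrip, and the metadata key "reasoningContent" are hypothetical stand-ins, not the Spring AI classes or keys touched by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a plain assistant message: content plus metadata.
class PlainAssistantMessage {
    final String content;
    final Map<String, Object> metadata;

    PlainAssistantMessage(String content, Map<String, Object> metadata) {
        this.content = content;
        this.metadata = new HashMap<>(metadata);
    }

    // Mimics a copy that downgrades to the superclass: only content and metadata survive.
    PlainAssistantMessage copy() {
        return new PlainAssistantMessage(this.content, this.metadata);
    }
}

// Hypothetical subclass with a typed field that would be lost by copy().
class ReasoningAssistantMessage extends PlainAssistantMessage {
    static final String KEY = "reasoningContent"; // assumed key name

    final String reasoningContent;

    ReasoningAssistantMessage(String content, String reasoningContent) {
        // Mirror the typed field into metadata so it survives the downgrade.
        super(content, metadataFor(reasoningContent));
        this.reasoningContent = reasoningContent;
    }

    private static Map<String, Object> metadataFor(String reasoningContent) {
        Map<String, Object> metadata = new HashMap<>();
        if (reasoningContent != null) {
            metadata.put(KEY, reasoningContent);
        }
        return metadata;
    }
}

class ReasoningRoundTrip {
    // Reads from metadata, so it works for the subclass and for a plain copy alike.
    static String reasoningFor(PlainAssistantMessage message) {
        Object value = message.metadata.get(ReasoningAssistantMessage.KEY);
        return value instanceof String ? (String) value : null;
    }
}
```

Reading from metadata rather than from the typed field is the design choice that matters here: the request builder then does not need to know which concrete message class it received.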

Adds regression tests covering chunk-merge accumulation, the
AssistantMessage to ChatCompletionMessage round trip, and the
streaming-path enrichment.

Fixes spring-projectsgh-5898

Signed-off-by: i-zoufeng <15536835114@163.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

private static final class ToolCallBuilder {

@Nullable private String id;
Member

@Nullable should appear after the visibility modifier.

This may apply to other places in this PR.

Author

Fixed in the latest force-push: @Nullable now follows the visibility modifier on the three ToolCallBuilder fields. I verified there are no other instances in this PR (the only remaining case is StreamingToolCallBuilder#toPartial, which uses type-use placement intentionally).

i-zoufeng and others added 7 commits May 7, 2026 22:52

Streaming providers currently merge tool call argument fragments into
a single complete ToolCall before exposing it to the client, which
loses the per-token UX for use cases where the arguments themselves
are the user-visible output (e.g. code generation tools).

This commit lays the foundation for delta streaming:

- AssistantMessage.ToolCall gains a `partial` boolean. When `true`,
  `arguments` carries only the delta fragment from the latest chunk
  and downstream tool execution must not run on that frame. The
  existing 4-arg constructor delegates with `partial=false` so all
  existing call sites remain source-compatible.

- MessageAggregator now stitches incoming ToolCall frames keyed by
  id. Non-partial frames are treated as authoritative (the provider
  has the complete arguments string); partial frames append fragments
  as a fallback for providers that emit only deltas.

Provider stream wiring follows in a subsequent commit; this change
is intentionally isolated to the common types and aggregator so the
contract can be reviewed independently.
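
The stitching contract described above can be sketched as follows. This is a simplified illustration under stated assumptions: ToolCallFrame and ToolCallStitcher are hypothetical names, not the real AssistantMessage.ToolCall or MessageAggregator, and the actual aggregator may differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified tool-call frame with a partial flag.
final class ToolCallFrame {
    final String id;
    final String name;
    final String arguments;
    final boolean partial;

    ToolCallFrame(String id, String name, String arguments, boolean partial) {
        this.id = id;
        this.name = name;
        this.arguments = arguments;
        this.partial = partial;
    }

    // Compatibility constructor: existing callers get partial = false.
    ToolCallFrame(String id, String name, String arguments) {
        this(id, name, arguments, false);
    }
}

final class ToolCallStitcher {
    // Groups frames by id. A non-partial frame is authoritative for its id;
    // otherwise the partial fragments are concatenated in arrival order.
    static List<ToolCallFrame> stitch(List<ToolCallFrame> frames) {
        Map<String, String> names = new LinkedHashMap<>();
        Map<String, StringBuilder> fragments = new LinkedHashMap<>();
        Map<String, ToolCallFrame> complete = new LinkedHashMap<>();
        for (ToolCallFrame frame : frames) {
            names.putIfAbsent(frame.id, frame.name);
            if (frame.partial) {
                fragments.computeIfAbsent(frame.id, k -> new StringBuilder()).append(frame.arguments);
            } else {
                complete.put(frame.id, frame);
            }
        }
        List<ToolCallFrame> result = new ArrayList<>();
        for (String id : names.keySet()) {
            ToolCallFrame authoritative = complete.get(id);
            if (authoritative != null) {
                result.add(authoritative);
            } else {
                // Fallback for providers that only ever emit deltas.
                result.add(new ToolCallFrame(id, names.get(id), fragments.get(id).toString()));
            }
        }
        return result;
    }
}
```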

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

When DeepSeek emits a tool call mid-stream, DeepSeekApi previously
collapsed the entire SSE window into a single merged chunk before
exposing it. Clients only saw the tool call after every argument
fragment had arrived — losing the per-token UX for tools whose
arguments are themselves the user-visible output (e.g. code
generation tools that the model writes via a write_code tool).

This commit replaces the windowUntil + reduce in
DeepSeekApi#chatCompletionStream with a window expansion that emits:
- one frame per pre-merge SSE chunk, carrying that chunk's own
  argument fragment with the cumulative tool-call identity
  (id/type/name) stamped from earlier chunks and finish_reason left
  unset; and
- a single terminal merged frame with the concatenated arguments and
  finish_reason = TOOL_CALLS, which is the authoritative frame
  downstream tool execution and message aggregation act on.

DeepSeekChatModel#buildGeneration marks the partial frames with
ToolCall.partial = true so AssistantMessage.hasToolCalls() returns
false on intermediate frames, keeping tool-execution gating, advisor
routing, and persistence on the final merged frame only.

The empty SSE marker chunk that closes the window
(`delta: {}, finish_reason: tool_calls`) is dropped — the merged
frame already carries the finish_reason.
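
The window expansion described above can be sketched with plain lists in place of the reactive pipeline. This is an illustration only: ChunkDelta, ExpandedFrame, and WindowExpander are hypothetical names, the window is assumed to hold a single tool call, and the closing marker chunk is modelled as a chunk with a null fragment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of one SSE chunk inside a tool-call window.
final class ChunkDelta {
    final String id;                // usually set only on the first chunk
    final String name;              // usually set only on the first chunk
    final String argumentsFragment; // null models the empty closing marker

    ChunkDelta(String id, String name, String argumentsFragment) {
        this.id = id;
        this.name = name;
        this.argumentsFragment = argumentsFragment;
    }
}

// Hypothetical emitted frame; finishReason == null marks a partial frame.
final class ExpandedFrame {
    final String id;
    final String name;
    final String arguments;
    final String finishReason;

    ExpandedFrame(String id, String name, String arguments, String finishReason) {
        this.id = id;
        this.name = name;
        this.arguments = arguments;
        this.finishReason = finishReason;
    }
}

final class WindowExpander {
    // Emits one partial frame per chunk, then a single merged terminal frame.
    static List<ExpandedFrame> expand(List<ChunkDelta> window) {
        List<ExpandedFrame> frames = new ArrayList<>();
        StringBuilder merged = new StringBuilder();
        String id = null;
        String name = null;
        for (ChunkDelta chunk : window) {
            // Carry the identity forward from earlier chunks.
            if (chunk.id != null) id = chunk.id;
            if (chunk.name != null) name = chunk.name;
            if (chunk.argumentsFragment == null) {
                continue; // the empty closing marker is dropped
            }
            merged.append(chunk.argumentsFragment);
            frames.add(new ExpandedFrame(id, name, chunk.argumentsFragment, null));
        }
        frames.add(new ExpandedFrame(id, name, merged.toString(), "TOOL_CALLS"));
        return frames;
    }
}
```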

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

Mirror the DeepSeek change for Mistral: replace MistralAiApi's
windowUntil + reduce in chatCompletionStream with a window
expansion that emits one frame per pre-merge SSE chunk (carrying
that chunk's argument fragment with the cumulative tool-call
identity stamped on) followed by the merged authoritative frame.

MistralAiChatModel#buildGeneration uses the absence of finish_reason
on a tool-call chunk as the partial signal, so partial frames roll
up into the same MessageAggregator path that DeepSeek already uses.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

Replace MiniMaxApi#chatCompletionStream's windowUntil + reduce with a
window expansion that emits one frame per pre-merge SSE chunk
(carrying that chunk's argument fragment with the cumulative
tool-call identity stamped on) followed by the merged authoritative
frame. MiniMaxChatModel marks intermediate frames with
ToolCall.partial = true based on the absence of finish_reason, so
the existing reduce-based merge that handles MiniMax's
empty-id-means-continuation quirk now produces partial frames during
streaming and a single non-partial complete frame at the end.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

Two related changes to OpenAiChatModel#internalStream:

- Mark each streaming chunk's tool-call frames as partial so
  AssistantMessage.hasToolCalls() and the tool-execution gate stay
  silent on intermediate frames. Per-chunk frames carry only the
  argument fragment that arrived in their SSE event.

- Replace the .collectList().flatMapMany(...) terminal aggregation
  with a doOnNext-based side-channel accumulator and a concatWith
  that emits a single non-partial aggregated frame after the SSE
  stream completes. Per-chunk frames now flow through to subscribers
  as they arrive — restoring streaming UX for both plain text and
  tool-call use cases — while the aggregated frame remains the
  authoritative target for tool execution and recursive next rounds.

Tool execution gating moves to the post-aggregation flatMap and
relies on the partial filter inside hasToolCalls(): only the
non-partial aggregated frame triggers ToolExecutionEligibilityPredicate
and the recursive internalStream call.
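
The gating described above can be sketched without Reactor, using a list in place of the Flux. This is an illustration under stated assumptions: GatedToolCall, GatedMessage, and PartialGate are hypothetical names, and the loop stands in for the doOnNext accumulator followed by the concatWith frame.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tool call carrying the partial flag.
final class GatedToolCall {
    final String id;
    final String arguments;
    final boolean partial;

    GatedToolCall(String id, String arguments, boolean partial) {
        this.id = id;
        this.arguments = arguments;
        this.partial = partial;
    }
}

// Hypothetical message whose hasToolCalls() ignores partial frames.
final class GatedMessage {
    final List<GatedToolCall> toolCalls;

    GatedMessage(List<GatedToolCall> toolCalls) {
        this.toolCalls = toolCalls;
    }

    // Only non-partial tool calls count, so intermediate frames stay silent.
    boolean hasToolCalls() {
        return this.toolCalls.stream().anyMatch(toolCall -> !toolCall.partial);
    }
}

final class PartialGate {
    // Passes each fragment through as a partial frame while accumulating on the side,
    // then appends one non-partial aggregated frame after the last fragment.
    static List<GatedMessage> withAggregate(String id, List<String> fragments) {
        List<GatedMessage> out = new ArrayList<>();
        StringBuilder accumulated = new StringBuilder();
        for (String fragment : fragments) {
            accumulated.append(fragment);
            out.add(new GatedMessage(List.of(new GatedToolCall(id, fragment, true))));
        }
        out.add(new GatedMessage(List.of(new GatedToolCall(id, accumulated.toString(), false))));
        return out;
    }
}
```

A consumer that runs tools only when hasToolCalls() is true would act on the last frame alone, while a UI could still render every fragment as it arrives.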

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

Anthropic's content_block_delta with input_json events were being
silently buffered into StreamingState until message_delta closed the
turn — so subscribers only saw the complete tool call, never the
per-token argument deltas.

Emit a partial ToolCall frame for each input_json delta, carrying
just the fragment from that SSE event with the tool id/name stamped
from the in-flight content_block_start. The partial flag keeps the
tool-execution gate, advisor routing, and persistence quiet on these
intermediate frames; the message_delta event still emits the
authoritative non-partial ToolCall list with the final concatenated
arguments.
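
The event handling described above can be sketched as a small state holder. This is an illustration only: ToolUseStreamState and EmittedToolCall are hypothetical names, the handler methods are assumed to be called once per matching SSE event, and a single in-flight tool call is assumed for brevity.

```java
// Hypothetical emitted tool call; partial = true marks an intermediate frame.
final class EmittedToolCall {
    final String id;
    final String name;
    final String arguments;
    final boolean partial;

    EmittedToolCall(String id, String name, String arguments, boolean partial) {
        this.id = id;
        this.name = name;
        this.arguments = arguments;
        this.partial = partial;
    }
}

// Hypothetical per-stream state for one in-flight tool_use content block.
final class ToolUseStreamState {
    private String id;
    private String name;
    private final StringBuilder arguments = new StringBuilder();

    // content_block_start: remember the tool identity and reset the buffer.
    void onContentBlockStart(String id, String name) {
        this.id = id;
        this.name = name;
        this.arguments.setLength(0);
    }

    // input_json delta: buffer the fragment and also emit it as a partial frame.
    EmittedToolCall onInputJsonDelta(String partialJson) {
        this.arguments.append(partialJson);
        return new EmittedToolCall(this.id, this.name, partialJson, true);
    }

    // message_delta: emit the authoritative call with the concatenated arguments.
    EmittedToolCall onMessageDelta() {
        return new EmittedToolCall(this.id, this.name, this.arguments.toString(), false);
    }
}
```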

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

ConverseChatResponseStream previously buffered tool_use input
fragments into StreamingToolCallBuilder and only emitted the
complete tool call from visitMetadata once the stream closed.

Emit a partial ToolCall frame from visitContentBlockDelta for each
tool_use input chunk, carrying just that event's input fragment with
the in-flight builder's id and name stamped on. The partial flag
keeps the tool-execution gate, advisor routing, and persistence
quiet on these intermediate frames; visitMetadata still emits the
authoritative non-partial tool calls when the stream completes.

spring-projectsgh-5910

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: i-zoufeng <15536835114@163.com>

i-zoufeng force-pushed the gh-5910-stream-tool-call-argument-deltas branch from 568f070 to fa9a7f3 on May 7, 2026 14:55
…stream-tool-call-argument-deltas

# Conflicts:
#	models/spring-ai-minimax/src/main/java/org/springframework/ai/minimax/MiniMaxChatModel.java

2 participants