Multi-turn tool call chat completions conversations #712

Open
jaredoconnell wants to merge 14 commits into vllm-project:main from jaredoconnell:feat/multi-turn-tools-chat

Conversation

@jaredoconnell
Collaborator

Summary

This PR adds client-side multi-turn tool call chat completions conversations to the HTTP backend.

Details

  • The design is data-driven. The only settings passed to the backend are an API field that is not included in datasets (required vs. auto) and the behavior for handling missing tool calls.
  • Allows external datasets or synthetic data. With synthetic data, the result is simple JSON with a field populated by the same generation logic as any other synthetic data in GuideLLM.
  • Allows specifying tool calls as auto or required to the model. This is useful for testing various scenarios, since models behave differently depending on the value set; required is best for predictability.
  • Allows specifying how to handle missing tool calls: whether a missing call is acceptable or an error condition and, if it is acceptable, whether to end the conversation early or continue it.
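
A minimal sketch of the two settings described in the list above. The names `MissingToolCallBehavior`, `next_action`, and `build_payload` are hypothetical and are not taken from this PR's code; the sketch also assumes that "required vs. auto" refers to the `tool_choice` field of an OpenAI-compatible chat completions request. The actual option names and values in GuideLLM may differ.

```python
from enum import Enum


class MissingToolCallBehavior(Enum):
    """Hypothetical modes for a turn that expected a tool call but got none."""

    ERROR = "error"        # treat the missing tool call as a failed request
    END_EARLY = "end"      # accept the response, but stop the conversation
    CONTINUE = "continue"  # accept the response and send the next turn


def next_action(got_tool_call: bool, behavior: MissingToolCallBehavior) -> str:
    """Return "continue" or "stop" after a turn that expected a tool call."""
    if got_tool_call:
        return "continue"
    if behavior is MissingToolCallBehavior.ERROR:
        raise RuntimeError("response did not include the expected tool call")
    if behavior is MissingToolCallBehavior.END_EARLY:
        return "stop"
    return "continue"


def build_payload(messages: list[dict], tools: list[dict], tool_choice: str) -> dict:
    """Build a chat completions request body (assumed OpenAI-compatible shape)."""
    if tool_choice not in ("auto", "required"):
        raise ValueError("tool_choice must be 'auto' or 'required'")
    return {"messages": messages, "tools": tools, "tool_choice": tool_choice}
```

With `required`, the model is asked to always emit a tool call, so the missing-call branch should rarely trigger; with `auto`, the chosen behavior decides how the benchmark treats turns where the model answers in plain text instead.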

Test Plan

  • Run the tests
  • Follow the documentation to run vLLM with tool calls enabled

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Collaborator

@sjmonson sjmonson left a comment

Not finished reviewing; will add more comments in a bit. GitHub is acting up and not letting me add to this review for some reason.

Comment thread src/guidellm/backends/openai/http.py Outdated
Comment thread src/guidellm/benchmark/outputs/html.py Outdated
Comment thread src/guidellm/data/preprocessors/mappers.py Outdated
Comment thread src/guidellm/schemas/info.py Outdated
Comment thread src/guidellm/scheduler/worker.py
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/http.py Outdated
Comment thread src/guidellm/backends/openai/http.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/schemas/response.py Outdated
Comment thread src/guidellm/schemas/response.py Outdated
Comment thread src/guidellm/__main__.py Outdated
Comment thread src/guidellm/__main__.py Outdated
Comment thread docs/guides/multiturn.md
@mergify
Contributor

mergify Bot commented May 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaredoconnell.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 5, 2026
Comment thread docs/guides/multiturn.md
Comment thread src/guidellm/data/deserializers/synthetic.py Outdated
Comment thread src/guidellm/scheduler/worker.py
Collaborator

@dbutenhof dbutenhof left a comment

Finally got through this. Looks good. Needs a rebase.

@mergify
Contributor

mergify Bot commented May 6, 2026

@jaredoconnell, this project requires a linear history on feature branches.
Your PR contains merge commits. Please rebase your branch against main
and remove them.

You can do this by running:
git pull --rebase upstream main

@sjmonson
Collaborator

sjmonson commented May 7, 2026

augment review

@augmentcode

augmentcode Bot commented May 7, 2026

🤖 Augment PR Summary

Summary: This PR adds support for pre-planned multi-turn tool-calling conversations when benchmarking OpenAI-compatible /v1/chat/completions backends.

Changes:

  • Introduces a per-backend tool_call_missing_behavior mode to control whether missing tool calls error, cancel, or continue.
  • Adds tool-call payload propagation via new StreamingToolCall schemas and wires them through GenerationResponse and request stats.
  • Extends the chat completions request handler to inject per-turn tool definitions, serialize tool calls, and rebuild streamed tool_calls across SSE deltas.
  • Adds synthetic dataset support for tool-call turns, including optional variable-length synthetic tool responses.
  • Updates the column mapper and finalizer to support sparse per-turn tool/tool-response columns and to flag tool-call turns.
  • Adds a dataset preprocessor to extract prompts/system/tool messages from OpenAI-style messages arrays.
  • Updates docs with a dedicated Tool Calling guide and adds extensive unit tests covering validation and worker behavior.
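
As an illustration of rebuilding streamed tool_calls across SSE deltas, here is a minimal sketch; it is not the PR's implementation. The function name `merge_tool_call_deltas` is hypothetical. The sketch assumes the OpenAI-style streaming shape, where each chunk's `delta.tool_calls` is a list of fragments keyed by `index`, the function name arrives whole, and the `arguments` string arrives in pieces that must be concatenated.

```python
import json


def merge_tool_call_deltas(deltas: list[list[dict]]) -> list[dict]:
    """Rebuild complete tool calls from per-chunk `delta.tool_calls` lists."""
    calls: dict[int, dict] = {}
    for chunk in deltas:
        for part in chunk:
            # Fragments of the same tool call share an index.
            call = calls.setdefault(
                part["index"],
                {"id": None, "type": "function",
                 "function": {"name": "", "arguments": ""}},
            )
            if part.get("id"):
                call["id"] = part["id"]
            fn = part.get("function") or {}
            if fn.get("name"):
                # Assumption: the name is sent whole, not in fragments.
                call["function"]["name"] = fn["name"]
            if fn.get("arguments"):
                # Argument JSON is streamed as string fragments.
                call["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]


# Illustrative chunks; the tool name and arguments are made up.
chunks = [
    [{"index": 0, "id": "call_1",
      "function": {"name": "get_weather", "arguments": ""}}],
    [{"index": 0, "function": {"arguments": '{"city": '}}],
    [{"index": 0, "function": {"arguments": '"Paris"}'}}],
]
merged = merge_tool_call_deltas(chunks)
args = json.loads(merged[0]["function"]["arguments"])
```

The arguments are only parsed as JSON after the stream ends, because individual fragments are generally not valid JSON on their own.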


@augmentcode augmentcode Bot left a comment

Review completed. 2 suggestions posted.

Comment augment review to trigger a new review at any time.

Comment thread src/guidellm/backends/openai/http.py
Comment thread docs/guides/tool_calling.md Outdated
Comment thread src/guidellm/scheduler/worker.py
Comment thread src/guidellm/scheduler/worker.py
Comment thread src/guidellm/data/preprocessors/mappers.py Outdated
Comment thread src/guidellm/backends/openai/http.py
@mergify
Contributor

mergify Bot commented May 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaredoconnell.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment thread src/guidellm/backends/openai/http.py
Comment thread docs/guides/tool_calling.md Outdated
Comment thread src/guidellm/scheduler/worker.py
@sjmonson
Collaborator

@Mergifyio rebase

@mergify
Contributor

mergify Bot commented May 14, 2026

rebase

❌ Base branch update has failed

Details

Git reported the following error:

Rebasing (1/14)
Auto-merging docs/guides/multiturn.md
Auto-merging src/guidellm/__main__.py
CONFLICT (content): Merge conflict in src/guidellm/__main__.py
Auto-merging src/guidellm/backends/openai/http.py
CONFLICT (content): Merge conflict in src/guidellm/backends/openai/http.py
Auto-merging src/guidellm/backends/openai/request_handlers.py
Auto-merging src/guidellm/data/preprocessors/mappers.py
Auto-merging src/guidellm/data/schemas.py
Auto-merging src/guidellm/utils/cli.py
CONFLICT (content): Merge conflict in src/guidellm/utils/cli.py
error: could not apply ea40711... AI Generated Continuation-based client-side tool call support
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply ea40711... AI Generated Continuation-based client-side tool call support

@jaredoconnell
Collaborator Author

I will rebase locally. I enabled rerere in anticipation of this.

dbutenhof previously approved these changes May 14, 2026
Collaborator

@dbutenhof dbutenhof left a comment

The code looks good; I ran it against my server with Qwen/Qwen3-0.6B and the GuideLLM sample command from tool_calling.md, and it seems to run OK:

ℹ Tool Call Metrics Statistics (Completed Requests)
|===========|======|=======|=======|=======|======|=======|======|======|
| Benchmark | Output Tokens             |||| Output Count            ||||
| Strategy  | Per Request || Per Second   || Per Request || Per Second ||
|           | Mdn  | p95   | Mdn   | Mean  | Mdn  | p95   | Mdn  | Mean |
|-----------|------|-------|-------|-------|------|-------|------|------|
| constant  | 77.0 | 692.0 | 152.7 | 309.4 | 1.0  | 19.0  | 1.5  | 4.8  |
|===========|======|=======|=======|=======|======|=======|======|======|

jaredoconnell and others added 13 commits May 14, 2026 17:48
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Adds variable size responses for synthetic data, and better handles edge cases for external datasets.
Also improves documentation.

Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Extracts functionality to new static methods.

Assisted-by: Claude Code Sonnet 4.5
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
These are the diffs recommended in the comments. They are untested, and require some follow-up changes.

Co-authored-by: Samuel Monson <smonson@irbash.net>
Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>
Moves documentation. Switches fully to exceptions to stop conversations early.
Also includes info gained from vLLM contributor.

Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI Claude 4.6 Opus High Thinking
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI Claude 4.6 Opus High Thinking
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Removed all worker logic except changed conversations to end if any exception occurs.

Assisted-by: Cursor AI Claude 4.6 Opus High
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Development

Successfully merging this pull request may close these issues.

Support benchmarking of reasoning and tool calling Chat Completions requests

3 participants