Multi-turn tool call chat completions conversations by jaredoconnell · Pull Request #712 · vllm-project/guidellm

jaredoconnell · 2026-04-29T18:06:09Z

Summary

This PR adds client-side, multi-turn tool-call conversations for chat completions to the HTTP backend.

Details

The design is data-driven. The only settings passed to the backend are an API field that datasets do not carry (required vs. auto) and the behavior for handling missing tool calls.
Supports external datasets or synthetic data. For synthetic data, the tool response is a simple JSON result with a field populated by the same generation logic as any other synthetic data in GuideLLM.
Allows setting tool calls to auto or required for the model, which is useful for testing different scenarios because models behave differently depending on the value. required gives the most predictable behavior.
Allows specifying how to handle missing tool calls: whether a missing call is acceptable or an error and, if it is acceptable, whether to end the conversation early or continue it.
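
As a rough illustration of the two backend-level settings described above, the sketch below shows one way a client could validate the auto/required value and decide what to do when a tool-call turn returns no tool call. The names `MissingToolCallBehavior`, `MissingToolCallError`, `build_payload`, and `next_step` are hypothetical and are not taken from this PR. Mapping the auto/required switch onto the `tool_choice` request field of the OpenAI Chat Completions API is also an assumption about how the backend forwards it.

```python
from enum import Enum


class MissingToolCallBehavior(Enum):
    """Hypothetical names for the three outcomes described in the PR text."""

    ERROR = "error"        # a missing tool call is a failure
    STOP = "stop"          # acceptable, but end the conversation early
    CONTINUE = "continue"  # acceptable, keep going with the next turn


class MissingToolCallError(RuntimeError):
    """Raised when a tool call was expected but the model made none."""


def build_payload(messages: list[dict], tools: list[dict], tool_choice: str) -> dict:
    """Build a chat completions request body for a tool-call turn.

    Assumes the auto/required setting is sent as the `tool_choice` field.
    """
    if tool_choice not in ("auto", "required"):
        raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
    return {"messages": messages, "tools": tools, "tool_choice": tool_choice}


def next_step(tool_calls: list[dict], behavior: MissingToolCallBehavior) -> str:
    """Return "continue" or "stop" for the conversation, or raise on error."""
    if tool_calls:
        return "continue"  # the model made the expected call
    if behavior is MissingToolCallBehavior.ERROR:
        raise MissingToolCallError("expected a tool call, got none")
    if behavior is MissingToolCallBehavior.STOP:
        return "stop"
    return "continue"
```

With `tool_choice="required"` the missing-call branch should rarely trigger, which matches the note above that required is the more predictable setting.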

Test Plan

Run the tests
Follow the documentation to run vLLM with tool calls enabled

Related Issues

Resolves #416 (Support benchmarking of reasoning and tool calling Chat Completions requests)

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

sjmonson

Not finished reviewing; will add more comments in a bit. GitHub is acting up and not letting me add to this review for some reason.

mergify · 2026-05-05T01:42:30Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaredoconnell.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

dbutenhof

Finally got through this. Looks good. Needs a rebase.

mergify · 2026-05-06T21:43:45Z

@jaredoconnell, this project requires a linear history on feature branches.
Your PR contains merge commits. Please rebase your branch against main
and remove them.

You can do this by running:
git pull --rebase upstream main

sjmonson · 2026-05-07T23:31:17Z

augment review

augmentcode · 2026-05-07T23:38:36Z

🤖 Augment PR Summary

Summary: This PR adds support for pre-planned multi-turn tool-calling conversations when benchmarking OpenAI-compatible /v1/chat/completions backends.

Changes:

Introduces a per-backend tool_call_missing_behavior mode to control whether missing tool calls error, cancel, or continue.
Adds tool-call payload propagation via new StreamingToolCall schemas and wires them through GenerationResponse and request stats.
Extends the chat completions request handler to inject per-turn tool definitions, serialize tool calls, and rebuild streamed tool_calls across SSE deltas.
Adds synthetic dataset support for tool-call turns, including optional variable-length synthetic tool responses.
Updates the column mapper and finalizer to support sparse per-turn tool/tool-response columns and to flag tool-call turns.
Adds a dataset preprocessor to extract prompts/system/tool messages from OpenAI-style messages arrays.
Updates docs with a dedicated Tool Calling guide and adds extensive unit tests covering validation and worker behavior.
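
To make the "rebuild streamed tool_calls across SSE deltas" item more concrete, here is a minimal sketch of how tool-call fragments from a streamed chat completions response can be merged back into complete calls. It assumes the OpenAI-style chunk layout, where each `delta.tool_calls` entry carries an `index` and the `function.arguments` string arrives in pieces. The function name `merge_tool_call_deltas` and the sample chunks are hypothetical, and this is not the PR's actual request handler code.

```python
import json


def merge_tool_call_deltas(chunks: list[dict]) -> list[dict]:
    """Rebuild complete tool calls from already-parsed streaming chunks."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                # Fragments belonging to the same call share an index.
                call = calls.setdefault(
                    part["index"],
                    {"id": None, "type": "function",
                     "function": {"name": "", "arguments": ""}},
                )
                if part.get("id"):
                    call["id"] = part["id"]
                fn = part.get("function") or {}
                if fn.get("name"):
                    # Assumption: the name arrives whole, so assign it.
                    call["function"]["name"] = fn["name"]
                if fn.get("arguments"):
                    # Arguments are a JSON string streamed in fragments.
                    call["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]


# Made-up chunks for illustration only.
EXAMPLE_CHUNKS = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": "}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Paris\"}"}}]}}]},
]

if __name__ == "__main__":
    merged = merge_tool_call_deltas(EXAMPLE_CHUNKS)
    print(merged[0]["function"]["name"])
    print(json.loads(merged[0]["function"]["arguments"]))
```

Keying on `index` rather than `id` matters because, in this chunk layout, only the first fragment of a call normally includes the `id`.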

augmentcode

Review completed. 2 suggestions posted.

Comment augment review to trigger a new review at any time.

mergify · 2026-05-12T02:41:49Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaredoconnell.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

sjmonson · 2026-05-14T18:18:59Z

@Mergifyio rebase

mergify · 2026-05-14T18:19:12Z

rebase

❌ Base branch update has failed

Details

Git reported the following error:

Rebasing (1/14)
Auto-merging docs/guides/multiturn.md
Auto-merging src/guidellm/__main__.py
CONFLICT (content): Merge conflict in src/guidellm/__main__.py
Auto-merging src/guidellm/backends/openai/http.py
CONFLICT (content): Merge conflict in src/guidellm/backends/openai/http.py
Auto-merging src/guidellm/backends/openai/request_handlers.py
Auto-merging src/guidellm/data/preprocessors/mappers.py
Auto-merging src/guidellm/data/schemas.py
Auto-merging src/guidellm/utils/cli.py
CONFLICT (content): Merge conflict in src/guidellm/utils/cli.py
error: could not apply ea40711... AI Generated Continuation-based client-side tool call support
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply ea40711... AI Generated Continuation-based client-side tool call support

jaredoconnell · 2026-05-14T18:39:10Z

I will rebase locally. I enabled rerere in anticipation of this.

dbutenhof

The code looks good; I ran it against my server with Qwen/Qwen3-0.6B and the GuideLLM sample command from tool_calling.md, and it seems to run OK:

ℹ Tool Call Metrics Statistics (Completed Requests)
|===========|======|=======|=======|=======|======|=======|======|======|
| Benchmark | Output Tokens             |||| Output Count            ||||
| Strategy  | Per Request || Per Second   || Per Request || Per Second ||
|           | Mdn  | p95   | Mdn   | Mean  | Mdn  | p95   | Mdn  | Mean |
|-----------|------|-------|-------|-------|------|-------|------|------|
| constant  | 77.0 | 692.0 | 152.7 | 309.4 | 1.0  | 19.0  | 1.5  | 4.8  |
|===========|======|=======|=======|=======|======|=======|======|======|

sjmonson requested changes May 1, 2026

mergify Bot added the needs-rebase label May 5, 2026

jaredoconnell commented May 5, 2026

Comment thread docs/guides/multiturn.md

Comment thread src/guidellm/data/deserializers/synthetic.py Outdated

Comment thread src/guidellm/scheduler/worker.py

dbutenhof reviewed May 6, 2026

augmentcode Bot reviewed May 7, 2026

Comment thread src/guidellm/backends/openai/http.py

Comment thread docs/guides/tool_calling.md Outdated

sjmonson requested changes May 8, 2026

Comment thread src/guidellm/scheduler/worker.py

Comment thread src/guidellm/scheduler/worker.py

Comment thread src/guidellm/data/preprocessors/mappers.py Outdated

Comment thread src/guidellm/backends/openai/http.py

jaredoconnell commented May 12, 2026

Comment thread src/guidellm/backends/openai/http.py

Comment thread docs/guides/tool_calling.md Outdated

Comment thread src/guidellm/scheduler/worker.py

dbutenhof previously approved these changes May 14, 2026

jaredoconnell and others added 13 commits May 14, 2026 17:48

AI Generated Continuation-based client-side tool call support

c958d81

Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Redesign tool call design to make it entirely data driven

00aff78

Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Improve handling of multi-turn data

3ff6878

Adds variable size responses for synthetic data, and better handles edge cases for external datasets. Also improves documentation. Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Clarify the type of tool call support

e361862

Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Use a more generic design and improve docs

fcedaab

Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Fix linter errors and test

bc5be0a

Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Refactor code to improve clarity

003104d

Extracts functionality to new static methods. Assisted-by: Claude Code Sonnet 4.5 Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Apply suggestions from code review

f35b787

These are the diffs recommended in the comments. They are untested, and require some follow-up changes. Co-authored-by: Samuel Monson <smonson@irbash.net> Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>

Addressed review feedback and integrated prior suggestions

5121cc7

Moves documentation. Switches fully to exceptions to stop conversations early. Also includes info gained from vLLM contributor. Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Clarify and correct removal of output token limits on tool call turns

ff36b57

Assisted-by: Cursor AI Claude 4.6 Opus High Thinking Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Add way of specifying non-contiguous tool call turns in synthetic data

3a30731

Assisted-by: Cursor AI Claude 4.6 Opus High Thinking Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Fix linting errors

8e69fd1

Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Address review comments

2631688

Removed all worker logic except changed conversations to end if any exception occurs. Assisted-by: Cursor AI Claude 4.6 Opus High Signed-off-by: Jared O'Connell <joconnel@redhat.com>

Update unit tests for changes made in main

fbc667c

Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>

jaredoconnell dismissed dbutenhof’s stale review via fbc667c May 14, 2026 22:04

jaredoconnell force-pushed the feat/multi-turn-tools-chat branch from 5fb71c1 to fbc667c Compare May 14, 2026 22:04

mergify Bot removed the needs-rebase label May 14, 2026

dbutenhof approved these changes May 15, 2026

3 participants