Skip to content

fix: Qwen3.5 tool-call chat-template tokenization#634

Merged
vivekkalyan merged 2 commits intomainfrom
fix/qwen3.5-tool-call-chat-template
Apr 1, 2026
Merged

fix: Qwen3.5 tool-call chat-template tokenization#634
vivekkalyan merged 2 commits intomainfrom
fix/qwen3.5-tool-call-chat-template

Conversation

@vivekkalyan
Copy link
Copy Markdown
Collaborator

@vivekkalyan vivekkalyan commented Mar 27, 2026

This PR fixes Qwen3.5 training-time tokenization for assistant tool calls.

What changed

  • adds a regression test for a trajectory that contains an assistant Choice with tool calls whose function.arguments are still OpenAI-style JSON strings
  • normalizes tool-call arguments to mappings before tokenizer.apply_chat_template() when the loaded chat template uses tool_call.arguments|items
  • documents the Qwen3.5-specific motivation inline at the call site

Why

Qwen3.5's Hugging Face chat template iterates over tool-call arguments with tool_call.arguments|items, which requires a mapping.

ART was passing the OpenAI response shape through training preprocessing unchanged, where Choice.message.tool_calls[*].function.arguments is a JSON string. Once a sampled trajectory with assistant tool calls reached tokenization, Jinja raised TypeError: Can only get item pairs from a mapping.

ART already handled this shape on the rendering side; the training preprocessing path was the missing piece.

Impact

Qwen3.5 training runs that include assistant tool calls no longer fail during apply_chat_template().

@vivekkalyan vivekkalyan changed the title [codex] Fix Qwen3.5 tool-call chat-template tokenization fix: Qwen3.5 tool-call chat-template tokenization Mar 28, 2026
@vivekkalyan vivekkalyan force-pushed the fix/qwen3.5-tool-call-chat-template branch from fcfa43c to e3c3d73 Compare March 30, 2026 20:29
PeterWofford added a commit to PeterWofford/ART that referenced this pull request Apr 1, 2026
…nizer-level fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vivekkalyan vivekkalyan marked this pull request as ready for review April 1, 2026 23:43
Copy link
Copy Markdown

@PeterWofford PeterWofford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix — clean scope, good test, and the template detection ("tool_call.arguments|items" check) is a smart way to avoid unnecessary work for non-Qwen models.

A few things I noticed while testing this against our IVR nav training pipeline (Qwen3.5-35B-A3B):

1. assert isinstance(arguments_json, str) is too strict (line 58)

If arguments are already a dict (e.g. from an upstream normalization path), this crashes instead of being a no-op. We hit this when combining your PR with another fix that normalized earlier in the pipeline.

Suggest replacing the assert with a type check:

arguments = function["arguments"]
if isinstance(arguments, str):
    arguments = json.loads(arguments)
elif not isinstance(arguments, dict):
    raise TypeError(f"Expected str or dict for tool_call arguments, got {type(arguments)}")

2. No json.loads error handling

If a tool_call has malformed JSON in arguments, this will raise JSONDecodeError with no context. Low probability but worth a defensive try/except for a library:

try:
    arguments = json.loads(arguments_json)
except (json.JSONDecodeError, ValueError):
    arguments = {}

3. chat_template can be a dict

Some multi-template tokenizers return dict[str, str] from tokenizer.chat_template. The assert isinstance(chat_template, str) on line 35 would crash for those. Could guard with:

if not isinstance(chat_template, str):
    return messages

4. Test coverage gap — context messages

The test covers a Choice object (generated turn) but not a plain dict message with string tool_call arguments. Training data context messages often have this shape too and flow through the same path. Worth adding a test case with a context dict that has string arguments alongside the Choice.


Tested this PR end-to-end on our Qwen3.5-35B-A3B GRPO+RULER training run — with the assert relaxed, it works correctly. Training is running clean.

@vivekkalyan vivekkalyan merged commit 7ffa8b4 into main Apr 1, 2026
9 checks passed
@vivekkalyan
Copy link
Copy Markdown
Collaborator Author

choosing to not to be too defensive unless these scenarios actually would happen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants