feat: add MXFP4/MXFP8 quantization support (llmc_compressor format) and related tests by xin3he · Pull Request #1865 · intel/auto-round

xin3he · 2026-05-28T05:51:28Z

Description

This pull request adds model-free quantization support for MXFP4 and MXFP8 (Microscaling floating-point formats) to the auto_round library, along with configuration handling for these schemes. The changes include new quantization logic, configuration generation, updated validation, and tests that check behavior and compatibility with the existing codebase.

MXFP4/MXFP8 Quantization Support:

Added support for MXFP4 and MXFP8 quantization schemes in model-free mode, including new logic for quantizing weights, generating output in the correct packed formats, and integrating with existing APIs.
Implemented the _quantize_weight_mxfp function for quantizing weights into MXFP formats, reusing the existing utilities for shared-exponent computation and packing.
Updated scheme validation to allow MXFP4 and MXFP8 (with group_size=32) and improved error messages for unsupported configurations.
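
For readers unfamiliar with MX formats, the following is a minimal NumPy sketch of round-to-nearest MXFP4 fake-quantization with one shared power-of-two scale per block of 32 elements. It is not the PR's _quantize_weight_mxfp; the function name, the tie-breaking rule, and the zero-block handling are assumptions made for illustration.

```python
import numpy as np

# Magnitudes representable by an FP4 E2M1 element (sign is handled separately).
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
FP4_EMAX = 2  # largest E2M1 exponent: 6.0 == 1.5 * 2**2


def mxfp4_fake_quantize(weight: np.ndarray, group_size: int = 32):
    """Hypothetical helper: quantize then dequantize a 2-D weight, MXFP4 style.

    Returns the dequantized weight and the per-block shared exponents.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0, "in_features must be a multiple of group_size"
    blocks = weight.astype(np.float32).reshape(out_features, -1, group_size)

    # One shared power-of-two scale per block, chosen so the block maximum
    # lands in the top binade of the element format.
    max_abs = np.abs(blocks).max(axis=-1, keepdims=True)
    safe_max = np.where(max_abs > 0, max_abs, 1.0)  # all-zero blocks: any scale works
    shared_exp = np.clip(np.floor(np.log2(safe_max)) - FP4_EMAX, -127, 127)
    scale = np.exp2(shared_exp)

    # Snap each scaled magnitude to the nearest grid point. Values above 6.0
    # saturate to 6.0; exact ties resolve to the smaller magnitude here.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_E2M1_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * FP4_E2M1_GRID[idx]

    dequantized = (quantized * scale).reshape(out_features, in_features)
    return dequantized, shared_exp.squeeze(-1).astype(np.int32)
```

A real implementation would also emit the 4-bit codes and the E8M0 scale bytes rather than only the dequantized values; this sketch stops at the numerics.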

Configuration and Format Handling:

Added logic to generate quantization configuration files in the compressed-tensors/llm-compressor style for MXFP schemes, ensuring compatibility with downstream tools.
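
As a rough illustration of what a compressed-tensors style quantization config can look like for an MX scheme, here is a small sketch. The key names follow the general shape of compressed-tensors configs, but the exact fields and values written by this PR's _build_mxfp_quantization_config are not shown on this page, so the helper name and every value below are assumptions. The checkpoint format string and scale dtype are deliberately left out for the same reason.

```python
import json


def build_mx_config_sketch(scheme: str, ignore: list[str]) -> dict:
    """Hypothetical builder, NOT the PR's _build_mxfp_quantization_config."""
    bits = {"MXFP4": 4, "MXFP8": 8}[scheme]
    return {
        "quant_method": "compressed-tensors",
        "config_groups": {
            "group_0": {
                "targets": ["Linear"],
                "weights": {
                    "num_bits": bits,
                    "type": "float",
                    "strategy": "group",
                    "group_size": 32,  # MX block size stated in the PR description
                    "symmetric": True,
                },
            }
        },
        # Layers left unquantized, e.g. those passed via --ignore_layers.
        "ignore": list(ignore),
    }


if __name__ == "__main__":
    print(json.dumps(build_mx_config_sketch("MXFP4", ["lm_head"]), indent=2))
```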

Robustness Improvements:

Modified shard processing to preserve original quantized tensors (such as FP8, FP4-packed) for ignored or skipped layers, preventing unwanted dequantization.
Added and updated tests to verify correct preservation of original quantized tensors and to ensure MXFP quantization logic produces expected output shapes and formats.
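
The preservation behavior described above can be pictured with a small sketch that partitions a shard before any dequantization happens. This is not the PR's _process_shard; the function name and the substring matching on layer names are assumptions, chosen to mirror how `--ignore_layers ffn.experts` reads.

```python
def split_shard(tensors: dict, ignore_patterns: list[str]):
    """Hypothetical helper: decide which tensors pass through untouched.

    Returns (passthrough, to_quantize). Ignored layers keep their original
    tensors, including companion tensors such as scales, so an already
    quantized FP8 or packed-FP4 weight is never dequantized.
    """
    passthrough, to_quantize = {}, {}
    for name, tensor in tensors.items():
        layer_name = name.rsplit(".", 1)[0]  # drop the trailing ".weight" etc.
        if any(pattern in layer_name for pattern in ignore_patterns):
            passthrough[name] = tensor  # same object, bit-for-bit
        elif name.endswith(".weight"):
            to_quantize[name] = tensor
        else:
            passthrough[name] = tensor  # biases and other non-weight tensors
    return passthrough, to_quantize
```

Doing the split first means the ignored tensors never enter the dequantize-then-requantize path at all.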

Testing Enhancements:

Added comprehensive end-to-end and unit tests for MXFP4/MXFP8 quantization, configuration output, and scheme validation, ensuring robust coverage of new features.
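
To make the shape checks concrete, here is a sketch of the arithmetic such tests would rely on: 4-bit codes pack two per byte, so an MXFP4 weight halves its last dimension, and each block of 32 inputs contributes one scale. The helper names and the low-nibble-first order are assumptions, not taken from the PR or from QuantLinear.pack().

```python
import numpy as np


def pack_fp4_codes(codes: np.ndarray) -> np.ndarray:
    """Pack 4-bit codes (0..15) two per byte along the last axis.

    Assumed layout: even-indexed element in the low nibble, odd in the high.
    """
    assert codes.shape[-1] % 2 == 0, "need an even number of codes to pack"
    codes = codes.astype(np.uint8)
    low, high = codes[..., 0::2], codes[..., 1::2]
    return (low | (high << 4)).astype(np.uint8)


def expected_shapes(out_features: int, in_features: int, scheme: str, group_size: int = 32):
    """Return (packed_weight_shape, scale_shape) for an MX scheme."""
    packed_in = in_features // 2 if scheme == "MXFP4" else in_features
    return (out_features, packed_in), (out_features, in_features // group_size)
```

For a 64 x 128 linear layer this gives a (64, 64) packed weight for MXFP4, a (64, 128) weight for MXFP8, and a (64, 4) scale tensor in both cases.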

Miscellaneous:

Updated supported and unsupported scheme lists in tests to reflect new MXFP support.

These changes collectively enable efficient and correct model-free quantization using MXFP4 and MXFP8, while maintaining compatibility and robustness across the quantization pipeline.

Type of Change

New feature

Related Issues

Fixes or relates to #1741

MXFP8 and MXFP4 accuracy was verified with Qwen/Qwen3-0.6B using the lm_eval vLLM backend.
To quantize DeepSeek-V4-Flash:

auto-round /workspace/models/deepseek-ai/DeepSeek-V4-Flash --model_free --scheme MXFP4 --ignore_layers ffn.experts --output_dir /workspace/models/deepseek-ai/DeepSeek-V4-Flash-MXFP4

Loading DeepSeek-V4 MXFP4/MXFP8 checkpoints in CT (compressed-tensors) format with vLLM is still a work in progress.

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Copilot

Pull request overview

Adds MXFP4 / MXFP8 model-free (RTN) quantization to the auto_round library, routing through the existing quant_mx shared-exponent logic and QuantLinear.pack() to emit compressed-tensors / llm-compressor style outputs. Also tweaks shard processing to keep already-quantized tensors for ignored/skipped layers, and broadens model-free supported formats.

Changes:

New _quantize_weight_mxfp() plus an MXFP branch in _quantize_single_tensor(), scheme validation, and compressed-tensors style config generation via _build_mxfp_quantization_config().
_process_shard() now preserves original tensors for ignored/skipped .weight layers before FP8/FP4 dequantization; SUPPORTED_FORMATS and quantize_and_save() accept llm_compressor variants.
New unit / end-to-end tests for MXFP4 / MXFP8 model-free flows and an ignored-FP8-preservation test.
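
The validation rule mentioned above (MXFP4 and MXFP8 are accepted only with group_size=32) could be sketched as follows. The function and constant names are hypothetical and the error wording is illustrative; the PR's actual validation code is not reproduced on this page.

```python
MX_SCHEME_BITS = {"MXFP4": 4, "MXFP8": 8}
MX_GROUP_SIZE = 32  # the only group size the PR description allows for MX schemes


def validate_mx_scheme(scheme: str, group_size: int = MX_GROUP_SIZE) -> int:
    """Hypothetical check: return the element bit width or raise ValueError."""
    name = scheme.upper()
    if name not in MX_SCHEME_BITS:
        raise ValueError(
            f"Unsupported scheme {scheme!r}; expected one of {sorted(MX_SCHEME_BITS)}"
        )
    if group_size != MX_GROUP_SIZE:
        raise ValueError(
            f"{name} requires group_size={MX_GROUP_SIZE}, got {group_size}"
        )
    return MX_SCHEME_BITS[name]
```

Failing early with a message that names both the scheme and the offending group size keeps misconfigurations from surfacing later as shape errors during packing.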

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
auto_round/compressors/model_free.py	Adds MXFP weight quantization, compressed-tensors config builder, MXFP validation branch, ignored-tensor preservation in shard processing, and expanded supported formats.
test/test_cpu/quantization/test_model_free.py	Adds shape/dtype, end-to-end, AutoRound API, shard, ignored-FP8 preservation, and scheme-validation tests for MXFP.

Signed-off-by: Xin He <xin3.he@intel.com>

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Signed-off-by: Xin He <xin3.he@intel.com>

Copilot AI review requested due to automatic review settings May 28, 2026 05:51

Copilot started reviewing on behalf of xin3he May 28, 2026 05:51

xin3he changed the title ~~feat: add MXFP4/MXFP8 quantization support and related tests~~ feat: add MXFP4/MXFP8 quantization support (llmc_compressor format) and related tests May 28, 2026

Copilot AI reviewed May 28, 2026


Comment thread test/test_cpu/quantization/test_model_free.py

Comment thread auto_round/compressors/model_free.py Outdated

Comment thread auto_round/compressors/model_free.py

Comment thread auto_round/compressors/model_free.py

Comment thread auto_round/compressors/model_free.py

xin3he added 2 commits May 28, 2026 06:27

feat: add MXFP4/MXFP8 quantization support and related tests

6618d37

Signed-off-by: Xin He <xin3.he@intel.com>

update document

aedf04f

Signed-off-by: Xin He <xin3.he@intel.com>

xin3he force-pushed the xinhe/5-28 branch from 9f61613 to aedf04f May 28, 2026 06:27

xin3he marked this pull request as draft May 29, 2026 01:16

xin3he and others added 3 commits May 29, 2026 13:10

Potential fix for pull request finding

6c3019d

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

update per comments

82e0962

Signed-off-by: Xin He <xin3.he@intel.com>

fix segmentation failure

87a12cb

Signed-off-by: Xin He <xin3.he@intel.com>

xin3he wants to merge 5 commits into main from xinhe/5-28

2 participants
