
refactor and support for multi algs fusion #1852

Open
n1ck-guo wants to merge 26 commits into main from hengguo/refactor_algs

n1ck-guo commented May 26, 2026

Description

What this PR does
Introduces a composable QuantizationPipeline that separates preprocessing algorithms (e.g., AWQ) from the terminal block quantizer (e.g., AutoRound/RTN) and lets users compose them declaratively via config lists.
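The "[preprocessors…] + block_quantizer" split can be pictured with the stdlib-only sketch below. It is a toy model under stated assumptions, not the code in pipeline.py: `PipelineMember`, `WeightTransformer`, `Quantizer`, `transform_block`, `ToyClipTransformer`, `ToyRTNQuantizer` and `Pipeline` are hypothetical names, weights are plain float lists, and only `prepare_run`, `quantize_block` and `finalize_run` echo hook names from this PR.

```python
from typing import Dict, List

Weights = List[float]


class PipelineMember:
    """Hypothetical base class: every member receives the run-level hooks."""

    def prepare_run(self, model: Dict[str, Weights]) -> None:
        pass

    def finalize_run(self, model: Dict[str, Weights]) -> None:
        pass


class WeightTransformer(PipelineMember):
    """Preprocessor: rewrites a block's weights but never quantizes them."""

    def transform_block(self, name: str, weights: Weights) -> Weights:
        raise NotImplementedError


class Quantizer(PipelineMember):
    """Terminal member: maps a block's weights onto the quantization grid."""

    def quantize_block(self, name: str, weights: Weights) -> Weights:
        raise NotImplementedError


class ToyClipTransformer(WeightTransformer):
    """Stand-in preprocessor that clips outliers (AWQ would rescale channels instead)."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def transform_block(self, name: str, weights: Weights) -> Weights:
        return [max(-self.limit, min(self.limit, w)) for w in weights]


class ToyRTNQuantizer(Quantizer):
    """Round-to-nearest onto a uniform grid with step `scale`."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def quantize_block(self, name: str, weights: Weights) -> Weights:
        return [round(w / self.scale) * self.scale for w in weights]


class Pipeline:
    """[preprocessors...] + block_quantizer, applied block by block."""

    def __init__(self, preprocessors: List[WeightTransformer], quantizer: Quantizer) -> None:
        self.preprocessors = preprocessors
        self.quantizer = quantizer

    def run(self, model: Dict[str, Weights]) -> Dict[str, Weights]:
        members: List[PipelineMember] = [*self.preprocessors, self.quantizer]
        for member in members:
            member.prepare_run(model)
        quantized = {}
        for name, weights in model.items():
            # Preprocessors run first, in list order; the quantizer always runs last.
            for pre in self.preprocessors:
                weights = pre.transform_block(name, weights)
            quantized[name] = self.quantizer.quantize_block(name, weights)
        for member in members:
            member.finalize_run(model)
        return quantized
```

For example, `Pipeline([ToyClipTransformer(1.0)], ToyRTNQuantizer(0.25)).run({"block0": [0.26, -0.9, 3.0]})` clips 3.0 to 1.0 before rounding and returns `{"block0": [0.25, -1.0, 1.0]}`. Because a transformer has no quantization hook of its own, any number of them can be stacked in front of a single terminal quantizer.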

Key changes:

  • QuantizationPipeline (algorithms/quantization/pipeline.py): a new orchestration layer composed as [preprocessors…] + block_quantizer. It replaces the implicit algorithm coupling in DataDrivenCompressor.
  • BasePipelineMember / BaseWeightTransformer / BaseQuantizer (base.py): a clean class hierarchy with unified lifecycle hooks (prepare_run, quantize_block, finalize_run).
  • AWQConfig + AWQQuantizer refactored as a BaseWeightTransformer: a pure weight-smoothing preprocessor with no quantization loop of its own.
  • DiffusionMixin is injected dynamically at pipeline construction time (is_diffusion=True), so algorithm code contains no if is_diffusion branches.
  • CLI (auto_round/cli/) rewritten to expose --alg_configs for composing pipelines from the command line.
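Injecting a mixin "at pipeline construction time" usually means building a subclass on the fly with `type()`. The sketch below assumes that mechanism; `DiffusionMixin` here is a hypothetical stand-in, and `ToyQuantizer`, `block_inputs` and `build_member` are illustrative names, not the PR's actual classes or signatures.

```python
class DiffusionMixin:
    """Hypothetical mixin holding only the diffusion-specific overrides."""

    def block_inputs(self, sample: float) -> dict:
        # Diffusion blocks are also fed a timestep embedding.
        return {"hidden": sample, "timestep": 0}


class ToyQuantizer:
    """Algorithm code written once, with no `if is_diffusion` branch."""

    def block_inputs(self, sample: float) -> dict:
        return {"hidden": sample}

    def describe(self) -> list:
        return sorted(self.block_inputs(1.0))


def build_member(cls: type, is_diffusion: bool = False):
    """Instantiate `cls`, mixing in DiffusionMixin at construction time if requested."""
    if is_diffusion:
        # Listing the mixin first makes its methods win in the MRO.
        cls = type(f"Diffusion{cls.__name__}", (DiffusionMixin, cls), {})
    return cls()
```

With this shape, `build_member(ToyQuantizer).describe()` gives `["hidden"]` while `build_member(ToyQuantizer, is_diffusion=True).describe()` gives `["hidden", "timestep"]`. The single `is_diffusion` check lives in the factory, and the generated class is still an instance of the original algorithm class, so existing `isinstance` checks keep working.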

Usage: AWQ + AutoRound fusion

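from auto_round import AutoRound
from auto_round.algorithms.quantization.awq.config import AWQConfig
from auto_round.algorithms.quantization.sign_round.config import SignRoundConfig

model_name = "facebook/opt-125m"  # placeholder: any Hugging Face model id or local path

# Ordered configs: AWQ smoothing (preprocessor) first, then SignRound (AutoRound) as the block quantizer.
ar = AutoRound(
    [AWQConfig(), SignRoundConfig(iters=200)],
    model_name,
    scheme="W4A16",
)
model, layer_config = ar.quantize()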

Passing a list of configs activates the pipeline: AWQ smoothing runs first on each block, then AutoRound's SignSGD optimization runs on the smoothed weights. Passing a single config (old API) continues to work unchanged.
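Why smoothing first does not change what SignRound optimizes: AWQ-style smoothing multiplies input channel j of a linear weight by a scale s_j and divides the matching activation by s_j, so W·x is unchanged while the scaled-up channels get finer effective quantization resolution. The stdlib-only check below verifies that identity on a 2×2 example. `matvec` and `smooth` are hypothetical helpers, not AWQQuantizer's code, and the scales are fixed by hand rather than searched from activation statistics as AWQ does.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def smooth(w: Matrix, x: List[float], s: List[float]) -> Tuple[Matrix, List[float]]:
    """Scale weight column j by s[j] and divide activation j by s[j]."""
    w_s = [[wij * sj for wij, sj in zip(row, s)] for row in w]
    x_s = [xj / sj for xj, sj in zip(x, s)]
    return w_s, x_s


w = [[1.0, 2.0], [3.0, 4.0]]
x = [2.0, 4.0]
w_s, x_s = smooth(w, x, s=[2.0, 4.0])
# The smoothed pair computes exactly the same output as the original pair.
print(matvec(w, x), matvec(w_s, x_s))
```

Both products are `[10.0, 22.0]` (the values are powers of two, so the floats are exact). In a real model the 1/s_j factor is typically folded into the preceding operator, so the block quantizer only ever sees `w_s`.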
Compatibility

  • Single-config API (AutoRound(model, ...)) is fully backward compatible.
  • All existing CPU tests pass; pre-existing environment failures (missing auto-round-lib, device fixtures) are unrelated to this PR.
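One way to keep the single-config call working while adding list support is to normalize the argument up front and fail fast on a bad ordering. The sketch assumes the last config is the terminal block quantizer and every earlier one must be a preprocessor; `ToyConfig`, `is_preprocessor` and `split_configs` are hypothetical and not the routing in entry.py.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class ToyConfig:
    name: str
    is_preprocessor: bool = False  # hypothetical flag, e.g. True for an AWQ-like config


def split_configs(
    cfg: Union[ToyConfig, Sequence[ToyConfig]],
) -> Tuple[List[ToyConfig], ToyConfig]:
    """Accept one config (old API) or an ordered list (pipeline API)."""
    configs = [cfg] if isinstance(cfg, ToyConfig) else list(cfg)
    if not configs:
        raise ValueError("at least one config is required")
    *preprocessors, terminal = configs
    if terminal.is_preprocessor:
        raise ValueError(f"{terminal.name} cannot be the terminal block quantizer")
    misplaced = [c.name for c in preprocessors if not c.is_preprocessor]
    if misplaced:
        raise ValueError(f"only preprocessors may precede the quantizer: {misplaced}")
    return preprocessors, terminal
```

A single config simply becomes a pipeline with no preprocessors, which is why the old call path needs no special case.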

Type of Change

New feature

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.
  • The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Copilot AI review requested due to automatic review settings May 26, 2026 00:29

Copilot AI left a comment


Pull request overview

This PR introduces a composable quantization “pipeline” abstraction that separates preprocessors (e.g., AWQ smoothing) from the terminal block quantizer (e.g., RTN / SignRound), and refactors compressor + CLI wiring to support multi-algorithm composition via ordered config lists.

Changes:

  • Added QuantizationPipeline / BlockContext orchestration and a unified quantizer/preprocessor lifecycle (prepare_run, block_forward_hooks, pre_quantize_block, quantize_block, finalize_run).
  • Refactored quantizers (RTN, SignRound, SignRoundV2, AWQ) to use the new pipeline context and hook model.
  • Rewrote the CLI into auto_round/cli/ with algorithm handlers + config building, and added docs + tests for pipeline composition.
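The hook names above suggest a per-block schedule like the one sketched here. The call order is an assumption for illustration, not read from data_driven.py: forward hooks from every member are entered together around the calibration forward pass (via `contextlib.ExitStack`, which exits them in reverse order), then every member gets `pre_quantize_block`, and only the terminal member runs `quantize_block`. `Member` and `run` are hypothetical.

```python
from contextlib import ExitStack, contextmanager
from typing import Iterator, List


class Member:
    """Hypothetical pipeline member that logs each lifecycle call."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    def prepare_run(self) -> None:
        self.log.append(f"{self.name}.prepare_run")

    @contextmanager
    def block_forward_hooks(self, block: str) -> Iterator[None]:
        # A real member might collect activation statistics here.
        self.log.append(f"{self.name}.hooks_on[{block}]")
        try:
            yield
        finally:
            self.log.append(f"{self.name}.hooks_off[{block}]")

    def pre_quantize_block(self, block: str) -> None:
        self.log.append(f"{self.name}.pre_quantize_block[{block}]")

    def quantize_block(self, block: str) -> None:
        self.log.append(f"{self.name}.quantize_block[{block}]")

    def finalize_run(self) -> None:
        self.log.append(f"{self.name}.finalize_run")


def run(preprocessors: List[Member], quantizer: Member, blocks: List[str], log: List[str]) -> None:
    members = [*preprocessors, quantizer]
    for m in members:
        m.prepare_run()
    for block in blocks:
        with ExitStack() as stack:
            for m in members:
                stack.enter_context(m.block_forward_hooks(block))
            log.append(f"forward[{block}]")  # calibration forward pass
        for m in members:
            m.pre_quantize_block(block)  # e.g. AWQ-style smoothing
        quantizer.quantize_block(block)  # only the terminal member quantizes
    for m in members:
        m.finalize_run()
```

Keeping hook installation in a context manager guarantees the hooks are removed even if the forward pass raises.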

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 3 comments.

File | Description
--- | ---
test/test_cuda/quantization/test_torch_compile.py | Update test imports/usages for new BaseQuantizer naming.
test/test_cuda/algorithms/test_alg_ext.py | Update alg-ext regression expectations for SignRoundV2 wrapper behavior.
test/test_cpu/utils/test_alg_ext.py | Update import smoke for new SignRoundV2 location.
test/test_cpu/core/test_pipeline_fail_fast.py | New unit tests for registry + pipeline construction failure modes.
test/test_cpu/core/test_awq_autoround_smoke.py | New smoke test for AWQ + AutoRound fusion via config list.
docs/step_by_step.md | Document AWQ as a preprocessing algorithm + CLI/API composition examples.
docs/step_by_step_CN.md | Chinese translation updates for the AWQ section.
auto_round/context/model.py | Minor refactor/formatting + rename doc reference to BaseQuantizer.
auto_round/compressors/zero_shot.py | Adapt zero-shot RTN path to use BlockContext.
auto_round/compressors/entry.py | Centralized alias/config resolution + routing for pipeline/preprocessors and model-free path.
auto_round/compressors/data_driven.py | Refactor block loop to pipeline lifecycle + hook scheduling via BlockContext.
auto_round/compressors/config.py | Removed legacy ExtraConfig container module.
auto_round/compressors/base.py | Build and expose QuantizationPipeline; quantizer becomes a forwarding property.
auto_round/compressors/__init__.py | Remove legacy config exports; keep lazy imports for compressors.
auto_round/cli/parser.py | New argparse construction module for quantize/list/eval commands.
auto_round/cli/main.py | New CLI router + recipe defaults + algorithm-help printing.
auto_round/cli/algorithms.py | New CLI algorithm registry/handlers that build ordered config lists.
auto_round/cli/__init__.py | Export CLI entrypoints from the new CLI package.
auto_round/calibration/state.py | Update doc reference from BaseQuantizers to BaseQuantizer.
auto_round/autoround.py | Add alg_configs fast-path into new entry point; adjust skip args.
auto_round/algorithms/transforms/rotation/config.py | Add/expand RotationConfig docstring and minor formatting.
auto_round/algorithms/quantization/sign_roundv2/quantizer.py | Move imatrix hooks into quantizer hook context + minor cleanups.
auto_round/algorithms/quantization/sign_round/quantizer.py | Convert block quantization API to ctx (BlockContext) and use BlockIO.
auto_round/algorithms/quantization/sign_round/config.py | Replace sparse docstring with a clearer/structured one.
auto_round/algorithms/quantization/rtn/quantizer.py | Convert RTN to ctx API; move imatrix hook logic into context manager.
auto_round/algorithms/quantization/rtn/config.py | Add docstring and minor logging formatting.
auto_round/algorithms/quantization/registry.py | New alias→config registry used by entry routing.
auto_round/algorithms/quantization/pipeline.py | New pipeline abstraction (BlockContext, RunContext, policy merge, IO helpers).
auto_round/algorithms/quantization/config.py | Add top-level QuantizationConfig docstring.
auto_round/algorithms/quantization/base.py | Introduce BasePipelineMember, BaseWeightTransformer, BaseQuantizer, mixins; new hook lifecycle.
auto_round/algorithms/quantization/awq/mappings.py | Downgrade a log line from warning to info for hybrid-attention mapping build.
auto_round/algorithms/quantization/awq/config.py | Reframe AWQConfig as a preprocessor config; improve validation and repr.
auto_round/algorithms/quantization/awq/__init__.py | Export AWQConfig/AWQQuantizer from the awq subpackage.
auto_round/algorithms/quantization/__init__.py | Re-export pipeline types/quantizers; currently contains duplicate AWQ imports.
auto_round/alg_ext.py | Removed legacy alg-ext implementation module.
auto_round/main.py | Replace huge legacy CLI with a shim that forwards to auto_round.cli.main.
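The registry.py and cli/algorithms.py rows describe an alias→config registry that the CLI uses to build ordered config lists. A minimal sketch of that pattern, assuming case-insensitive aliases and a fail-fast error for unknown names: `register`, `resolve`, `AWQLike`, `SignRoundLike` and the `--algs` flag are all hypothetical, and the real `--alg_configs` syntax is not shown in this PR.

```python
import argparse
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AWQLike:  # hypothetical stand-in for a preprocessor config
    is_preprocessor: bool = True


@dataclass
class SignRoundLike:  # hypothetical stand-in for a block-quantizer config
    iters: int = 200
    is_preprocessor: bool = False


_REGISTRY: Dict[str, Callable[[], object]] = {}


def register(*aliases: str):
    """Map one or more case-insensitive aliases to a config factory."""
    def deco(factory: Callable[[], object]) -> Callable[[], object]:
        for alias in aliases:
            key = alias.lower()
            if key in _REGISTRY:
                raise KeyError(f"alias {alias!r} is already registered")
            _REGISTRY[key] = factory
        return factory
    return deco


register("awq")(AWQLike)
register("sign_round", "autoround")(SignRoundLike)


def resolve(names: List[str]) -> List[object]:
    """Turn CLI names into config objects, preserving the user's order."""
    unknown = [n for n in names if n.lower() not in _REGISTRY]
    if unknown:
        raise SystemExit(f"unknown algorithm(s): {unknown}; known: {sorted(_REGISTRY)}")
    return [_REGISTRY[n.lower()]() for n in names]


parser = argparse.ArgumentParser(prog="toy-quantize")
parser.add_argument("--algs", nargs="+", default=["sign_round"])  # hypothetical flag
```

For example, `resolve(parser.parse_args(["--algs", "awq", "AutoRound"]).algs)` yields `[AWQLike(...), SignRoundLike(...)]` in that order. Rejecting duplicate aliases at registration time surfaces naming clashes at import rather than at quantization time.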
Comments suppressed due to low confidence (1)

auto_round/algorithms/quantization/__init__.py:41

  • AWQConfig/AWQQuantizer are imported twice in this module, which is redundant and can trigger linting/formatting issues. Remove the second pair of imports and keep a single import location.
from auto_round.algorithms.quantization.awq.config import AWQConfig
from auto_round.algorithms.quantization.awq.quantizer import AWQQuantizer
from auto_round.algorithms.quantization.rtn.config import RTNConfig
from auto_round.algorithms.quantization.rtn.quantizer import RTNQuantizer, OptimizedRTNQuantizer
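A quick way to catch this kind of repeat before review is to scan a module's top-level `from … import …` statements with the standard `ast` module. This is a hypothetical helper, not part of the PR or its CI; it only inspects module-level statements so that guarded fallback imports inside functions are ignored.

```python
import ast
from collections import Counter
from typing import List, Tuple


def duplicate_from_imports(source: str) -> List[Tuple[str, str]]:
    """Return (module, name) pairs bound more than once by top-level `from x import y`."""
    seen: Counter = Counter()
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.ImportFrom):
            # Keep relative-import dots so "from . import x" is reported as ".".
            module = "." * node.level + (node.module or "")
            for alias in node.names:
                seen[(module, alias.asname or alias.name)] += 1
    return sorted(key for key, count in seen.items() if count > 1)
```

Running it over the module's source would report each `(module, name)` pair imported twice, e.g. `("auto_round.algorithms.quantization.awq.config", "AWQConfig")`.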

Comment thread auto_round/compressors/entry.py Outdated
Comment thread auto_round/compressors/data_driven.py
Comment thread auto_round/compressors/__init__.py
wenhuach21

Common tuning parameters should be transferable across different algorithms; e.g., for AWQ and SignRound, the shared parameter is the weight clip ratio.
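One way to make a shared knob such as the weight clip ratio transferable is to set it once and copy it into every config that declares a field of that name, letting an explicit per-algorithm value win. This is a sketch under that assumed precedence rule; `AWQLikeConfig`, `SignRoundLikeConfig`, `ToyRotationConfig`, their fields, and `apply_shared` are hypothetical and are not the PR's policy-merge code.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass(frozen=True)
class AWQLikeConfig:
    weight_clip_ratio: Optional[float] = None  # None means "not set by the user"
    n_grid: int = 20  # hypothetical AWQ-specific knob


@dataclass(frozen=True)
class SignRoundLikeConfig:
    weight_clip_ratio: Optional[float] = None
    iters: int = 200


@dataclass(frozen=True)
class ToyRotationConfig:
    block_size: int = 64  # has no clip-ratio field, so it is left alone


def apply_shared(configs: List[object], **shared: object) -> List[object]:
    """Copy each shared value into every config that has that field and left it unset."""
    out = []
    for cfg in configs:
        names = {f.name for f in fields(cfg)}
        updates = {k: v for k, v in shared.items() if k in names and getattr(cfg, k) is None}
        out.append(replace(cfg, **updates))  # frozen dataclasses: build new instances
    return out
```

With `apply_shared([AWQLikeConfig(), SignRoundLikeConfig(weight_clip_ratio=0.9), ToyRotationConfig()], weight_clip_ratio=0.95)`, the AWQ-like config receives 0.95, the SignRound-like config keeps its explicit 0.9, and the rotation config is unchanged.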

n1ck-guo and others added 7 commits May 26, 2026 10:06
Comment thread AGENTS.md Outdated
Comment thread auto_round/algorithms/base.py Outdated
Comment thread auto_round/algorithms/quantization/registry.py Outdated
Comment thread auto_round/algorithms/quantization/config.py
n1ck-guo added 5 commits May 28, 2026 15:14
Signed-off-by: n1ck-guo <heng.guo@intel.com>
n1ck-guo requested a review from WeiweiZhang1 May 29, 2026 01:25
n1ck-guo added 3 commits May 29, 2026 10:22
Signed-off-by: n1ck-guo <heng.guo@intel.com>
chensuyue

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

n1ck-guo added 5 commits June 3, 2026 15:54
Signed-off-by: n1ck-guo <heng.guo@intel.com>
chensuyue

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


Labels

None yet

Projects

None yet


5 participants