feat: add overlap function for multi-blocks compression #1850

Open
ZaneMark wants to merge 3 commits into main from feature/overlap_for_nblocks

Conversation

ZaneMark commented May 25, 2026

Description

Introduces overlap support for multi-block compression, so that adjacent block windows can overlap during tuning.

Motivation:

  • Overlapping windows may improve statistical continuity.

Key changes:

  • Add CLI argument --nblocks_overlap to control overlap between adjacent block windows
  • Implement overlap-aware window iteration

Type of Change

New feature

Checklist Before Submitting

  • [√] My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.
  • The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

To Do

  • Dynamic windows and dynamic overlap

Outcome

Qwen3-8B

| Model | Scheme | Blocks/Overlap | Alg | arc_challenge | arc_easy | boolq | hellaswag | lambada_openai | mmlu | openbookqa | piqa | truthfulqa_mc1 | winogrande | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3-8b | W4A16 | n1_o0 | alg | 0.5247 | 0.8249 | 0.8691 | 0.5623 | 0.6365 | 0.7209 | 0.2920 | 0.7677 | 0.3611 | 0.6922 | 0.6251 |
| qwen3-8b | W4A16 | n1_o0 | noalg | 0.5350 | 0.8232 | 0.8700 | 0.5589 | 0.6429 | 0.7220 | 0.3120 | 0.7688 | 0.3562 | 0.6843 | 0.6273 |
| qwen3-8b | W4A16 | n4_o3 | alg | 0.5392 | 0.8262 | 0.8709 | 0.5597 | 0.6425 | 0.7178 | 0.3040 | 0.7650 | 0.3525 | 0.6969 | 0.6275 |
| qwen3-8b | W2A16 | n4_o3 | alg | 0.4053 | 0.7412 | 0.7734 | 0.4420 | 0.4984 | 0.5682 | 0.2840 | 0.7127 | 0.3244 | 0.6440 | 0.5394 |
| qwen3-8b | W2A16 | n8_o7 | alg | 0.4096 | 0.7277 | 0.7498 | 0.4412 | 0.4995 | 0.5605 | 0.2800 | 0.7057 | 0.3023 | 0.6409 | 0.5317 |
| qwen3-8b | W2A16 | n2_o1 | alg | 0.4215 | 0.7563 | 0.7954 | 0.4434 | 0.5115 | 0.5664 | 0.2840 | 0.7176 | 0.3158 | 0.6464 | 0.5458 |

Llama-3.1-8B-Instruct

| Model | Scheme | Blocks/Overlap | Alg | arc_challenge | arc_easy | boolq | hellaswag | lambada_openai | mmlu | openbookqa | piqa | truthfulqa_mc1 | winogrande | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama-3.1-8b-instruct | W4A16 | n1_o0 | alg | 0.5196 | 0.8178 | 0.8486 | 0.5905 | 0.7153 | 0.6707 | 0.3560 | 0.7982 | 0.3684 | 0.7403 | 0.6425 |
| llama-3.1-8b-instruct | W4A16 | n1_o0 | noalg | 0.5247 | 0.8249 | 0.8465 | 0.5905 | 0.7178 | 0.6644 | 0.3480 | 0.7971 | 0.3672 | 0.7214 | 0.6403 |
| llama-3.1-8b-instruct | W4A16 | n2_o1 | alg | 0.5333 | 0.8194 | 0.8502 | 0.5884 | 0.7194 | 0.6700 | 0.3520 | 0.7992 | 0.3611 | 0.7348 | 0.6428 |
| llama-3.1-8b-instruct | W2A16 | n1_o0 | noalg | 0.3643 | 0.6818 | 0.7174 | 0.4424 | 0.4333 | 0.3937 | 0.2400 | 0.7155 | 0.3011 | 0.6156 | 0.4905 |
| llama-3.1-8b-instruct | W2A16 | n1_o0 | alg | 0.3737 | 0.7058 | 0.7798 | 0.4632 | 0.5682 | 0.4530 | 0.2700 | 0.7214 | 0.2913 | 0.6622 | 0.5289 |
| llama-3.1-8b-instruct | INT4 | n1_o0 | noalg | 0.4309 | 0.7635 | 0.8272 | 0.5509 | 0.6350 | 0.5852 | 0.3060 | 0.7726 | 0.3415 | 0.6843 | 0.5897 |
| llama-3.1-8b-instruct | INT4 | n8_o7 | noalg | 0.4309 | 0.7630 | 0.8174 | 0.5577 | 0.6422 | 0.5857 | 0.3160 | 0.7650 | 0.3293 | 0.6827 | 0.5890 |

Copilot AI review requested due to automatic review settings May 25, 2026 06:18

Copilot AI left a comment

Pull request overview

Adds overlap support for multi-block compression/tuning so adjacent block windows can share a configurable number of blocks, aiming to improve continuity during calibration.

Changes:

  • Introduce --nblocks_overlap / nblocks_overlap configuration and validate it (0 <= nblocks_overlap < nblocks).
  • Implement overlap-aware block-window iteration and progress accounting.
  • Add logic to compute “stride” outputs for the next window and snapshot/restore overlapped tail blocks.
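The validity constraint listed above (0 <= nblocks_overlap < nblocks) can be illustrated with a minimal sketch. The function name `validate_nblocks_overlap` is hypothetical and is not taken from the PR; the actual check in auto_round/compressors/base.py may be structured differently.

```python
def validate_nblocks_overlap(nblocks: int, nblocks_overlap: int) -> int:
    """Check 0 <= nblocks_overlap < nblocks and return the window stride.

    Hypothetical helper for illustration only; not the PR's code.
    """
    if nblocks < 1:
        raise ValueError(f"nblocks must be >= 1, got {nblocks}")
    if not 0 <= nblocks_overlap < nblocks:
        raise ValueError(
            f"nblocks_overlap must satisfy 0 <= nblocks_overlap < nblocks, "
            f"got nblocks_overlap={nblocks_overlap}, nblocks={nblocks}"
        )
    # The stride is how far the window start advances between windows.
    return nblocks - nblocks_overlap


if __name__ == "__main__":
    print(validate_nblocks_overlap(4, 3))  # stride for the n4_o3 setting
    print(validate_nblocks_overlap(2, 1))  # stride for the n2_o1 setting
```

Requiring the overlap to be strictly smaller than the window keeps the stride at 1 or more, so window iteration always advances.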

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
auto_round/compressors/data_driven.py Implements overlap window iteration, stride-based next-input selection, and tail snapshot/restore helpers.
auto_round/compressors/config.py Plumbs nblocks_overlap through extra config/dataclass.
auto_round/compressors/base.py Parses nblocks_overlap and enforces basic validity constraints.
auto_round/main.py Adds the --nblocks_overlap CLI flag and forwards it into tuning config.

Comment thread auto_round/compressors/data_driven.py Outdated
Comment on lines +445 to +447
    for module in tail_modules:
        state = {key: value.detach().cpu().clone() for key, value in module.state_dict().items()}
        attrs = {}
Comment on lines 41 to 47
    lr: float = None,
    lr_scheduler: Callable = None,
    minmax_lr: float = None,
    nblocks: int = 1,
    nblocks_overlap: int = 0,
    to_quant_block_names: Union[str, list, None] = None,
    scale_dtype: str = "fp16",
Comment on lines +711 to +722
    def _get_block_window_starts(self, block_names: list, nblocks: int) -> list[int]:
        overlap = self.nblocks_overlap if nblocks > 1 else 0
        stride = nblocks - overlap
        block_starts = []
        block_idx = 0
        while block_idx < len(block_names):
            remaining = len(block_names) - block_idx
            if block_idx > 0 and overlap > 0 and remaining <= overlap:
                break
            block_starts.append(block_idx)
            block_idx += stride
        return block_starts
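To make the window iteration quoted above easier to follow, here is a standalone sketch that mirrors its logic as a free function and lists the resulting windows. The names `block_window_starts` and `block_windows` are hypothetical, and truncating the last window at the end of the block list is an assumption; the PR may handle the tail differently.

```python
def block_window_starts(num_blocks: int, nblocks: int, nblocks_overlap: int) -> list[int]:
    """Standalone mirror of the quoted window-start logic (illustrative only)."""
    overlap = nblocks_overlap if nblocks > 1 else 0
    stride = nblocks - overlap
    if stride <= 0:
        # Guard added for this sketch; the PR validates the overlap elsewhere.
        raise ValueError("nblocks_overlap must be smaller than nblocks")
    starts = []
    idx = 0
    while idx < num_blocks:
        remaining = num_blocks - idx
        # Stop once the remaining blocks were all covered by the previous window.
        if idx > 0 and overlap > 0 and remaining <= overlap:
            break
        starts.append(idx)
        idx += stride
    return starts


def block_windows(num_blocks: int, nblocks: int, nblocks_overlap: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) windows; the last one is truncated (assumption)."""
    return [
        (start, min(start + nblocks, num_blocks))
        for start in block_window_starts(num_blocks, nblocks, nblocks_overlap)
    ]


if __name__ == "__main__":
    print(block_windows(8, 4, 3))  # [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]
    print(block_windows(8, 4, 0))  # [(0, 4), (4, 8)]
    print(block_windows(7, 4, 2))  # [(0, 4), (2, 6), (4, 7)]
```

With 8 blocks, the n4_o3 setting yields five windows instead of the two produced without overlap, which is why overlap increases tuning cost.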
@wenhuach21

As discussed offline, the implementation probably has bugs from an algorithmic perspective.

@ZaneMark

ZaneMark commented Jun 1, 2026

10 tasks, acc_norm

W2A16

| model | scheme | nblocks&overlap | AVG | arc_challenge | arc_easy | boolq | hellaswag | lambada_openai | mmlu | openbookqa | piqa | truthfulqa_mc1 | winogrande |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3-0.6b | W2A16+noalg | n1o0 | 0.3482 | 0.2423 | 0.3657 | 0.6183 | 0.3163 | 0.0943 | 0.2292 | 0.2660 | 0.5745 | 0.2742 | 0.5012 |
| qwen3-0.6b | W2A16+noalg | n2o1 | 0.3533 | 0.2398 | 0.3662 | 0.6162 | 0.3195 | 0.1069 | 0.2297 | 0.2920 | 0.5745 | 0.2644 | 0.5233 |
| qwen3-0.6b | W2A16+noalg | n4o3 | 0.3603 | 0.2406 | 0.3725 | 0.6187 | 0.3267 | 0.1487 | 0.2295 | 0.2820 | 0.5865 | 0.2742 | 0.5233 |
| qwen3-0.6b | W2A16+noalg | nblocks2 | 0.3496 | 0.2372 | 0.3750 | 0.6031 | 0.3152 | 0.0957 | 0.2297 | 0.2700 | 0.5653 | 0.2815 | 0.5233 |
| qwen3-0.6b | W2A16+noalg | nblocks4 | 0.3430 | 0.2312 | 0.3704 | 0.6028 | 0.3094 | 0.0716 | 0.2290 | 0.2820 | 0.5631 | 0.2729 | 0.4972 |
| qwen3-4b | W2A16+noalg | n1o0 | 0.4667 | 0.3370 | 0.5762 | 0.7187 | 0.4747 | 0.3002 | 0.3873 | 0.3460 | 0.6556 | 0.3023 | 0.5691 |
| qwen3-4b | W2A16+noalg | n2o1 | 0.4822 | 0.3626 | 0.5985 | 0.6673 | 0.4879 | 0.3611 | 0.4266 | 0.3580 | 0.6621 | 0.3060 | 0.5919 |
| qwen3-4b | W2A16+noalg | n4o3 | 0.4831 | 0.3532 | 0.5947 | 0.7425 | 0.4736 | 0.3600 | 0.4233 | 0.3300 | 0.6600 | 0.3023 | 0.5912 |
| qwen3-8b | W2A16+noalg | n1o0 | 0.5391 | 0.4087 | 0.6873 | 0.8028 | 0.5629 | 0.4231 | 0.5111 | 0.3700 | 0.7160 | 0.2962 | 0.6133 |
| qwen3-8b | W2A16+noalg | n2o1 | 0.5428 | 0.4326 | 0.7020 | 0.7835 | 0.5616 | 0.4526 | 0.5102 | 0.3480 | 0.7122 | 0.2987 | 0.6267 |
| qwen3-8b | W2A16+noalg | n4o3 | 0.5402 | 0.4147 | 0.6919 | 0.8040 | 0.5485 | 0.4689 | 0.4994 | 0.3480 | 0.6942 | 0.2974 | 0.6346 |
| llama-3.1-8b-instruct | W2A16+noalg | n1o0 | 0.5148 | 0.3814 | 0.6389 | 0.7275 | 0.5793 | 0.4366 | 0.4016 | 0.3660 | 0.7182 | 0.2840 | 0.6140 |
| llama-3.1-8b-instruct | W2A16+noalg | n2o1 | 0.5184 | 0.3575 | 0.6145 | 0.7771 | 0.5883 | 0.4694 | 0.3900 | 0.3620 | 0.7024 | 0.3072 | 0.6156 |
| llama-3.1-8b-instruct | W2A16+noalg | n4o3 | 0.5023 | 0.3618 | 0.6023 | 0.7266 | 0.5544 | 0.4527 | 0.3481 | 0.3500 | 0.6888 | 0.3048 | 0.6338 |

W4A16

| model | scheme | nblocks&overlap | AVG | arc_challenge | arc_easy | boolq | hellaswag | lambada_openai | mmlu | openbookqa | piqa | truthfulqa_mc1 | winogrande |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3-0.6b | W4A16+noalg | n1o0 | 0.4450 | 0.3191 | 0.5400 | 0.5853 | 0.4541 | 0.3544 | 0.4034 | 0.3320 | 0.6627 | 0.2668 | 0.5320 |
| qwen3-0.6b | W4A16+noalg | n2o1 | 0.4433 | 0.3285 | 0.5210 | 0.5575 | 0.4530 | 0.3658 | 0.4145 | 0.3180 | 0.6589 | 0.2668 | 0.5493 |
| qwen3-0.6b | W4A16+noalg | n4o3 | 0.4559 | 0.3166 | 0.5442 | 0.6266 | 0.4524 | 0.3681 | 0.4239 | 0.3260 | 0.6681 | 0.2754 | 0.5580 |
| qwen3-4b | W4A16+noalg | n1o0 | 0.6237 | 0.5247 | 0.7702 | 0.8566 | 0.6711 | 0.5764 | 0.6756 | 0.4180 | 0.7481 | 0.3476 | 0.6488 |
| qwen3-4b | W4A16+noalg | n2o1 | 0.6269 | 0.5324 | 0.7719 | 0.8492 | 0.6750 | 0.5884 | 0.6809 | 0.4020 | 0.7568 | 0.3537 | 0.6582 |
| qwen3-4b | W4A16+noalg | n4o3 | 0.6269 | 0.5401 | 0.7921 | 0.8456 | 0.6715 | 0.5880 | 0.6762 | 0.3900 | 0.7573 | 0.3574 | 0.6511 |
| qwen3-8b | W4A16+noalg | n1o0 | 0.6508 | 0.5478 | 0.7900 | 0.8676 | 0.7363 | 0.6336 | 0.7207 | 0.4160 | 0.7699 | 0.3427 | 0.6835 |
| qwen3-8b | W4A16+noalg | n2o1 | 0.6541 | 0.5580 | 0.7942 | 0.8703 | 0.7386 | 0.6363 | 0.7190 | 0.4120 | 0.7715 | 0.3599 | 0.6811 |
| qwen3-8b | W4A16+noalg | n4o3 | 0.6591 | 0.5708 | 0.8136 | 0.8694 | 0.7402 | 0.6394 | 0.7186 | 0.4120 | 0.7737 | 0.3537 | 0.6993 |
| qwen3-8b | W4A16+noalg | n8o7 | 0.6612 | 0.5751 | 0.8274 | 0.8716 | 0.7375 | 0.6447 | 0.7163 | 0.4060 | 0.7748 | 0.3647 | 0.6938 |

@chensuyue

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wenhuach21

Since overlapping causes about 2x cost, comparing it directly with the baseline is not very fair. Please compare your results against a baseline with iters x N, e.g., compare n2o1 against a baseline with iters 400 and n4o3 against a baseline with iters 800.
Besides, we need a best recipe, so please run auto-round-best with iters 1000, nsamples 512, and alg_ext, and try to combine it with overlap.
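The iteration-matched comparison requested above can be worked out with a small sketch. It assumes that tuning cost scales with how many windows cover each block, i.e. nblocks / (nblocks - nblocks_overlap), and that the baseline uses 200 iterations; both are assumptions chosen to be consistent with the 400 and 800 figures in the comment, and the function names are hypothetical.

```python
def overlap_cost_factor(nblocks: int, nblocks_overlap: int) -> float:
    """Approximate number of windows that cover each block (assumed cost model)."""
    stride = nblocks - nblocks_overlap
    if stride <= 0:
        raise ValueError("nblocks_overlap must be smaller than nblocks")
    return nblocks / stride


def matched_baseline_iters(base_iters: int, nblocks: int, nblocks_overlap: int) -> int:
    """Iterations a non-overlapping baseline needs for a roughly equal budget."""
    return round(base_iters * overlap_cost_factor(nblocks, nblocks_overlap))


if __name__ == "__main__":
    BASE_ITERS = 200  # assumed baseline iteration count
    for nblocks, overlap in [(2, 1), (4, 3), (8, 7)]:
        iters = matched_baseline_iters(BASE_ITERS, nblocks, overlap)
        print(f"n{nblocks}o{overlap}: compare against baseline with iters {iters}")
```

This ignores edge effects at the first and last blocks, which are covered by fewer windows, so the factor is an upper estimate for short models.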
