feat: add overlap function for multi-blocks compression by ZaneMark · Pull Request #1850 · intel/auto-round

ZaneMark · 2026-05-25T06:18:01Z

Description

Introduces overlap support for multi-block compression, so that adjacent block windows can share blocks during tuning.

Motivation:

Overlapping windows may improve statistical continuity across window boundaries, because the shared blocks are tuned in more than one window.

Key changes:

Add CLI argument --nblocks_overlap to control overlap between adjacent block windows
Implement overlap-aware window iteration

Type of Change

New feature

Checklist Before Submitting

[√] My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

To Do

Dynamic windows and dynamic overlap

Outcome

Qwen3-8B

Model	Scheme	Blocks/Overlap	Alg	arc_challenge	arc_easy	boolq	hellaswag	lambada_openai	mmlu	openbookqa	piqa	truthfulqa_mc1	winogrande	AVG
qwen3-8b	W4A16	n1_o0	alg	0.5247	0.8249	0.8691	0.5623	0.6365	0.7209	0.2920	0.7677	0.3611	0.6922	0.6251
qwen3-8b	W4A16	n1_o0	noalg	0.5350	0.8232	0.8700	0.5589	0.6429	0.7220	0.3120	0.7688	0.3562	0.6843	0.6273
qwen3-8b	W4A16	n4_o3	alg	0.5392	0.8262	0.8709	0.5597	0.6425	0.7178	0.3040	0.7650	0.3525	0.6969	0.6275
qwen3-8b	W2A16	n4_o3	alg	0.4053	0.7412	0.7734	0.4420	0.4984	0.5682	0.2840	0.7127	0.3244	0.6440	0.5394
qwen3-8b	W2A16	n8_o7	alg	0.4096	0.7277	0.7498	0.4412	0.4995	0.5605	0.2800	0.7057	0.3023	0.6409	0.5317
qwen3-8b	W2A16	n2_o1	alg	0.4215	0.7563	0.7954	0.4434	0.5115	0.5664	0.2840	0.7176	0.3158	0.6464	0.5458

Llama-3.1-8B-Instruct

Model	Scheme	Blocks/Overlap	Alg	arc_challenge	arc_easy	boolq	hellaswag	lambada_openai	mmlu	openbookqa	piqa	truthfulqa_mc1	winogrande	AVG
llama-3.1-8b-instruct	W4A16	n1_o0	alg	0.5196	0.8178	0.8486	0.5905	0.7153	0.6707	0.3560	0.7982	0.3684	0.7403	0.6425
llama-3.1-8b-instruct	W4A16	n1_o0	noalg	0.5247	0.8249	0.8465	0.5905	0.7178	0.6644	0.3480	0.7971	0.3672	0.7214	0.6403
llama-3.1-8b-instruct	W4A16	n2_o1	alg	0.5333	0.8194	0.8502	0.5884	0.7194	0.6700	0.3520	0.7992	0.3611	0.7348	0.6428
llama-3.1-8b-instruct	W2A16	n1_o0	noalg	0.3643	0.6818	0.7174	0.4424	0.4333	0.3937	0.2400	0.7155	0.3011	0.6156	0.4905
llama-3.1-8b-instruct	W2A16	n1_o0	alg	0.3737	0.7058	0.7798	0.4632	0.5682	0.4530	0.2700	0.7214	0.2913	0.6622	0.5289
llama-3.1-8b-instruct	INT4	n1_o0	noalg	0.4309	0.7635	0.8272	0.5509	0.6350	0.5852	0.3060	0.7726	0.3415	0.6843	0.5897
llama-3.1-8b-instruct	INT4	n8_o7	noalg	0.4309	0.7630	0.8174	0.5577	0.6422	0.5857	0.3160	0.7650	0.3293	0.6827	0.5890

Copilot

Pull request overview

Adds overlap support for multi-block compression/tuning so adjacent block windows can share a configurable number of blocks, aiming to improve continuity during calibration.

Changes:

Introduce --nblocks_overlap / nblocks_overlap configuration and validate it (0 <= nblocks_overlap < nblocks).
Implement overlap-aware block-window iteration and progress accounting.
Add logic to compute “stride” outputs for the next window and snapshot/restore overlapped tail blocks.
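
The validity constraint listed above, `0 <= nblocks_overlap < nblocks`, could be enforced with a check along these lines. This is a minimal sketch, not the code in this PR: the helper name `validate_nblocks_overlap` and the error messages are hypothetical.

```python
def validate_nblocks_overlap(nblocks: int, nblocks_overlap: int) -> int:
    """Return nblocks_overlap if 0 <= nblocks_overlap < nblocks, else raise.

    Hypothetical helper for illustration only; the PR's own check is
    reported to live in auto_round/compressors/base.py and may differ.
    """
    if nblocks < 1:
        raise ValueError(f"nblocks must be >= 1, got {nblocks}")
    if not 0 <= nblocks_overlap < nblocks:
        raise ValueError(
            "nblocks_overlap must satisfy 0 <= nblocks_overlap < nblocks, "
            f"got nblocks_overlap={nblocks_overlap}, nblocks={nblocks}"
        )
    return nblocks_overlap


print(validate_nblocks_overlap(4, 3))  # accepted: stride would be 4 - 3 = 1
```

The upper bound is strict because the window stride in this PR is `nblocks - overlap`: an overlap equal to `nblocks` would give a stride of zero, and the window would never advance.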

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
auto_round/compressors/data_driven.py	Implements overlap window iteration, stride-based next-input selection, and tail snapshot/restore helpers.
auto_round/compressors/config.py	Plumbs `nblocks_overlap` through extra config/dataclass.
auto_round/compressors/base.py	Parses `nblocks_overlap` and enforces basic validity constraints.
auto_round/main.py	Adds the `--nblocks_overlap` CLI flag and forwards it into tuning config.
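
For orientation, the CLI plumbing described in the table might look roughly like the sketch below. It uses the standard `argparse` module and is not the contents of `auto_round/main.py`: the `build_parser` helper, the help strings, and the `--nblocks` definition are assumptions for illustration. Only the `--nblocks_overlap` flag name comes from the PR, and the default of 0 mirrors the `nblocks_overlap: int = 0` parameter shown in the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with the two window-related flags (hypothetical)."""
    parser = argparse.ArgumentParser(description="nblocks overlap flag sketch")
    parser.add_argument(
        "--nblocks", type=int, default=1,
        help="number of blocks tuned together in one window (assumed flag)",
    )
    parser.add_argument(
        "--nblocks_overlap", type=int, default=0,
        help="number of blocks shared by adjacent windows",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--nblocks", "4", "--nblocks_overlap", "3"])
print(args.nblocks, args.nblocks_overlap)
```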

+        for module in tail_modules:
+            state = {key: value.detach().cpu().clone() for key, value in module.state_dict().items()}
+            attrs = {}


        lr: float = None,
        lr_scheduler: Callable = None,
        minmax_lr: float = None,
        nblocks: int = 1,
+        nblocks_overlap: int = 0,
        to_quant_block_names: Union[str, list, None] = None,
        scale_dtype: str = "fp16",


+    def _get_block_window_starts(self, block_names: list, nblocks: int) -> list[int]:
+        overlap = self.nblocks_overlap if nblocks > 1 else 0
+        stride = nblocks - overlap
+        block_starts = []
+        block_idx = 0
+        while block_idx < len(block_names):
+            remaining = len(block_names) - block_idx
+            if block_idx > 0 and overlap > 0 and remaining <= overlap:
+                break
+            block_starts.append(block_idx)
+            block_idx += stride
+        return block_starts
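
To see which windows the loop above produces, the same logic can be restated as a standalone function and run on small inputs. This is a sketch for illustration: `block_window_starts` and `block_windows` are hypothetical free functions that take the overlap as an argument instead of reading `self.nblocks_overlap`, and the assumption that each window spans up to `nblocks` blocks from its start index is not shown in the diff fragment.

```python
def block_window_starts(num_blocks: int, nblocks: int, overlap: int) -> list[int]:
    """Standalone restatement of the window-start loop from the diff."""
    overlap = overlap if nblocks > 1 else 0  # overlap is ignored for single blocks
    stride = nblocks - overlap
    starts = []
    idx = 0
    while idx < num_blocks:
        remaining = num_blocks - idx
        # After the first window, stop when the blocks that are left all fall
        # inside the overlap region already covered by the previous window.
        if idx > 0 and overlap > 0 and remaining <= overlap:
            break
        starts.append(idx)
        idx += stride
    return starts


def block_windows(num_blocks: int, nblocks: int, overlap: int) -> list[list[int]]:
    """Expand each start into block indices (assumes windows of up to nblocks)."""
    return [
        list(range(start, min(start + nblocks, num_blocks)))
        for start in block_window_starts(num_blocks, nblocks, overlap)
    ]


print(block_windows(6, 4, 3))  # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
print(block_windows(6, 4, 1))  # [[0, 1, 2, 3], [3, 4, 5]]
print(block_windows(5, 2, 0))  # [[0, 1], [2, 3], [4]]
```

With `nblocks=4` and `overlap=3` the stride is 1, so a 6-block model is covered by three windows and the loop stops at index 3, where the three remaining blocks were already part of the previous window.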


wenhuach21 · 2026-05-25T07:23:06Z

As discussed offline, the implementation probably has bugs from an algorithmic perspective.

ZaneMark · 2026-06-01T05:51:43Z

Results on 10 tasks (acc_norm)

W2A16

model	scheme	nblocks&overlap	AVG	arc_challenge	arc_easy	boolq	hellaswag	lambada_openai	mmlu	openbookqa	piqa	truthfulqa_mc1	winogrande
qwen3-0.6b	W2A16+noalg	n1o0	0.3482	0.2423	0.3657	0.6183	0.3163	0.0943	0.2292	0.2660	0.5745	0.2742	0.5012
qwen3-0.6b	W2A16+noalg	n2o1	0.3533	0.2398	0.3662	0.6162	0.3195	0.1069	0.2297	0.2920	0.5745	0.2644	0.5233
qwen3-0.6b	W2A16+noalg	n4o3	0.3603	0.2406	0.3725	0.6187	0.3267	0.1487	0.2295	0.2820	0.5865	0.2742	0.5233
qwen3-0.6b	W2A16+noalg	nblocks2	0.3496	0.2372	0.3750	0.6031	0.3152	0.0957	0.2297	0.2700	0.5653	0.2815	0.5233
qwen3-0.6b	W2A16+noalg	nblocks4	0.3430	0.2312	0.3704	0.6028	0.3094	0.0716	0.2290	0.2820	0.5631	0.2729	0.4972
---	---	---	---	---	---	---	---	---	---	---	---	---	---
qwen3-4b	W2A16+noalg	n1o0	0.4667	0.3370	0.5762	0.7187	0.4747	0.3002	0.3873	0.3460	0.6556	0.3023	0.5691
qwen3-4b	W2A16+noalg	n2o1	0.4822	0.3626	0.5985	0.6673	0.4879	0.3611	0.4266	0.3580	0.6621	0.3060	0.5919
qwen3-4b	W2A16+noalg	n4o3	0.4831	0.3532	0.5947	0.7425	0.4736	0.3600	0.4233	0.3300	0.6600	0.3023	0.5912
---	---	---	---	---	---	---	---	---	---	---	---	---	---
qwen3-8b	W2A16+noalg	n1o0	0.5391	0.4087	0.6873	0.8028	0.5629	0.4231	0.5111	0.3700	0.7160	0.2962	0.6133
qwen3-8b	W2A16+noalg	n2o1	0.5428	0.4326	0.7020	0.7835	0.5616	0.4526	0.5102	0.3480	0.7122	0.2987	0.6267
qwen3-8b	W2A16+noalg	n4o3	0.5402	0.4147	0.6919	0.8040	0.5485	0.4689	0.4994	0.3480	0.6942	0.2974	0.6346
---	---	---	---	---	---	---	---	---	---	---	---	---	---
llama-3.1-8b-instruct	W2A16+noalg	n1o0	0.5148	0.3814	0.6389	0.7275	0.5793	0.4366	0.4016	0.3660	0.7182	0.2840	0.6140
llama-3.1-8b-instruct	W2A16+noalg	n2o1	0.5184	0.3575	0.6145	0.7771	0.5883	0.4694	0.3900	0.3620	0.7024	0.3072	0.6156
llama-3.1-8b-instruct	W2A16+noalg	n4o3	0.5023	0.3618	0.6023	0.7266	0.5544	0.4527	0.3481	0.3500	0.6888	0.3048	0.6338

W4A16

model	scheme	nblocks&overlap	AVG	arc_challenge	arc_easy	boolq	hellaswag	lambada_openai	mmlu	openbookqa	piqa	truthfulqa_mc1	winogrande
qwen3-0.6b	W4A16+noalg	n1o0	0.4450	0.3191	0.5400	0.5853	0.4541	0.3544	0.4034	0.3320	0.6627	0.2668	0.5320
qwen3-0.6b	W4A16+noalg	n2o1	0.4433	0.3285	0.5210	0.5575	0.4530	0.3658	0.4145	0.3180	0.6589	0.2668	0.5493
qwen3-0.6b	W4A16+noalg	n4o3	0.4559	0.3166	0.5442	0.6266	0.4524	0.3681	0.4239	0.3260	0.6681	0.2754	0.5580
---	---	---	---	---	---	---	---	---	---	---	---	---	---
qwen3-4b	W4A16+noalg	n1o0	0.6237	0.5247	0.7702	0.8566	0.6711	0.5764	0.6756	0.4180	0.7481	0.3476	0.6488
qwen3-4b	W4A16+noalg	n2o1	0.6269	0.5324	0.7719	0.8492	0.6750	0.5884	0.6809	0.4020	0.7568	0.3537	0.6582
qwen3-4b	W4A16+noalg	n4o3	0.6269	0.5401	0.7921	0.8456	0.6715	0.5880	0.6762	0.3900	0.7573	0.3574	0.6511
---	---	---	---	---	---	---	---	---	---	---	---	---	---
qwen3-8b	W4A16+noalg	n1o0	0.6508	0.5478	0.7900	0.8676	0.7363	0.6336	0.7207	0.4160	0.7699	0.3427	0.6835
qwen3-8b	W4A16+noalg	n2o1	0.6541	0.5580	0.7942	0.8703	0.7386	0.6363	0.7190	0.4120	0.7715	0.3599	0.6811
qwen3-8b	W4A16+noalg	n4o3	0.6591	0.5708	0.8136	0.8694	0.7402	0.6394	0.7186	0.4120	0.7737	0.3537	0.6993
qwen3-8b	W4A16+noalg	n8o7	0.6612	0.5751	0.8274	0.8716	0.7375	0.6447	0.7163	0.4060	0.7748	0.3647	0.6938

chensuyue · 2026-06-01T06:26:49Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-01T06:26:58Z

Azure Pipelines successfully started running 1 pipeline(s).

wenhuach21 · 2026-06-01T06:48:41Z

As overlapping will cause 2X cost, comparing it with the baseline directly is not very fair. Please compare your results against a baseline run with iters x N, e.g., compare n2o1 against iters 400 and n4o3 against iters 800.
Besides, we need a best recipe, so please run auto-round-best with iters 1000, nsamples 512, and alg_ext, and try to combine it with overlap.
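
The cost concern can be checked with a quick count of how many times each block is tuned. The sketch below reuses the window-start rule from this PR's `_get_block_window_starts` and assumes each window spans up to `nblocks` blocks. The 32-block model and the function names are illustrative assumptions, and counting block-tunings is only a rough proxy for wall-clock cost.

```python
def window_starts(num_blocks: int, nblocks: int, overlap: int) -> list[int]:
    """Window-start rule restated from the PR diff as a free function."""
    overlap = overlap if nblocks > 1 else 0
    stride = nblocks - overlap
    starts = []
    idx = 0
    while idx < num_blocks:
        if idx > 0 and overlap > 0 and num_blocks - idx <= overlap:
            break
        starts.append(idx)
        idx += stride
    return starts


def tuning_cost_factor(num_blocks: int, nblocks: int, overlap: int) -> float:
    """Average number of windows each block is tuned in (1.0 = no overlap)."""
    total_block_tunings = sum(
        min(nblocks, num_blocks - start)  # assumed window size at this start
        for start in window_starts(num_blocks, nblocks, overlap)
    )
    return total_block_tunings / num_blocks


NUM_BLOCKS = 32  # illustrative model depth, not taken from this thread
for nblocks, overlap in [(1, 0), (2, 1), (4, 3)]:
    factor = tuning_cost_factor(NUM_BLOCKS, nblocks, overlap)
    print(f"n{nblocks}o{overlap}: {factor}")
# n1o0: 1.0
# n2o1: 1.9375
# n4o3: 3.625
```

For stride-1 overlap the factor approaches `nblocks`, which matches the suggestion to scale the baseline's iterations by N. The figures of 400 and 800 iterations would correspond to a 200-iteration baseline, although the baseline iteration count is not stated in this thread.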

feat: add overlap function for multi-blocks compression

4628fa3

Copilot AI review requested due to automatic review settings May 25, 2026 06:18

Copilot started reviewing on behalf of ZaneMark May 25, 2026 06:18

Copilot AI reviewed May 25, 2026

ZaneMark added 2 commits June 1, 2026 13:41

fix: wire nblocks overlap through sign-round config

3516880

fix: retain overlap state across block windows

b62b167

ZaneMark wants to merge 3 commits into main from feature/overlap_for_nblocks
