feat: add overlap function for multi-blocks compression #1850
Pull request overview
Adds overlap support for multi-block compression/tuning so adjacent block windows can share a configurable number of blocks, aiming to improve continuity during calibration.
Changes:
- Introduce `--nblocks_overlap` / `nblocks_overlap` configuration and validate it (0 <= nblocks_overlap < nblocks).
- Implement overlap-aware block-window iteration and progress accounting.
- Add logic to compute “stride” outputs for the next window and snapshot/restore overlapped tail blocks.
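The validity constraint in the first bullet can be sketched as a small standalone check. This is only an illustration under assumptions: the function name `validate_nblocks_overlap` and the error messages are hypothetical and are not taken from the PR.

```python
def validate_nblocks_overlap(nblocks: int, nblocks_overlap: int) -> int:
    """Hypothetical helper: enforce 0 <= nblocks_overlap < nblocks."""
    if nblocks < 1:
        raise ValueError(f"nblocks must be >= 1, got {nblocks}")
    if not 0 <= nblocks_overlap < nblocks:
        # An overlap equal to nblocks would give a stride of zero,
        # so the window would never advance.
        raise ValueError(
            f"nblocks_overlap must satisfy 0 <= nblocks_overlap < nblocks, "
            f"got nblocks_overlap={nblocks_overlap}, nblocks={nblocks}"
        )
    return nblocks_overlap
```

With `nblocks=1` the only accepted value is `nblocks_overlap=0`, which matches the default single-block behaviour.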
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| auto_round/compressors/data_driven.py | Implements overlap window iteration, stride-based next-input selection, and tail snapshot/restore helpers. |
| auto_round/compressors/config.py | Plumbs nblocks_overlap through extra config/dataclass. |
| auto_round/compressors/base.py | Parses nblocks_overlap and enforces basic validity constraints. |
| auto_round/main.py | Adds the --nblocks_overlap CLI flag and forwards it into tuning config. |
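As a rough sketch of how a CLI flag like the one described for auto_round/main.py could be parsed and forwarded, the snippet below uses only the standard `argparse` module. The function name `parse_overlap_args`, the help strings, and the returned dict are assumptions for illustration; only the flag name `--nblocks_overlap` and the defaults `nblocks=1`, `nblocks_overlap=0` come from this PR.

```python
import argparse


def parse_overlap_args(argv=None) -> dict:
    """Hypothetical parser: collect the two block-window settings."""
    parser = argparse.ArgumentParser(description="overlap flag sketch")
    parser.add_argument("--nblocks", type=int, default=1,
                        help="number of blocks tuned together in one window")
    parser.add_argument("--nblocks_overlap", type=int, default=0,
                        help="number of blocks shared by adjacent windows")
    args = parser.parse_args(argv)
    # Forward both values as a plain dict, standing in for the tuning config.
    return {"nblocks": args.nblocks, "nblocks_overlap": args.nblocks_overlap}
```

Calling `parse_overlap_args(["--nblocks", "4", "--nblocks_overlap", "3"])` would describe windows of four blocks that share three blocks with their neighbour.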
for module in tail_modules:
    state = {key: value.detach().cpu().clone() for key, value in module.state_dict().items()}
    attrs = {}
lr: float = None,
lr_scheduler: Callable = None,
minmax_lr: float = None,
nblocks: int = 1,
nblocks_overlap: int = 0,
to_quant_block_names: Union[str, list, None] = None,
scale_dtype: str = "fp16",
def _get_block_window_starts(self, block_names: list, nblocks: int) -> list[int]:
    overlap = self.nblocks_overlap if nblocks > 1 else 0
    stride = nblocks - overlap
    block_starts = []
    block_idx = 0
    while block_idx < len(block_names):
        remaining = len(block_names) - block_idx
        if block_idx > 0 and overlap > 0 and remaining <= overlap:
            break
        block_starts.append(block_idx)
        block_idx += stride
    return block_starts
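To make the window iteration quoted above easier to follow, here is a standalone re-statement of the same loop that takes the block count and overlap as plain arguments. The function names `block_window_starts` and `block_windows` are hypothetical, and the `[start, end)` window representation is an assumption added for illustration.

```python
from __future__ import annotations


def block_window_starts(num_blocks: int, nblocks: int, nblocks_overlap: int) -> list[int]:
    """Start index of every window; same loop as the quoted method."""
    overlap = nblocks_overlap if nblocks > 1 else 0
    stride = nblocks - overlap  # assumes 0 <= nblocks_overlap < nblocks
    starts = []
    idx = 0
    while idx < num_blocks:
        remaining = num_blocks - idx
        # Stop once the leftover blocks were already covered by the previous window.
        if idx > 0 and overlap > 0 and remaining <= overlap:
            break
        starts.append(idx)
        idx += stride
    return starts


def block_windows(num_blocks: int, nblocks: int, nblocks_overlap: int) -> list[tuple[int, int]]:
    """Half-open [start, end) ranges, with the last window clipped to num_blocks."""
    return [(s, min(s + nblocks, num_blocks))
            for s in block_window_starts(num_blocks, nblocks, nblocks_overlap)]


if __name__ == "__main__":
    print(block_windows(6, 2, 1))  # [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
    print(block_windows(7, 4, 2))  # [(0, 4), (2, 6), (4, 7)]
```

In the second call the final window is shorter than `nblocks`, because the early exit only triggers when the remaining blocks fit entirely inside the previous window's overlap.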
As discussed offline, the implementation probably has bugs from an algorithmic perspective.
10 tasks, acc_norm (W2A16, W4A16)
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Since overlapping causes about 2X cost, comparing it directly with the baseline is not very fair. Please compare your results against a baseline run with iters x N, e.g., compare n2o1 with iters 400 and n4o3 with iters 800.
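One way to read the "iters x N" suggestion is that, with overlap, an interior block is tuned in roughly nblocks / (nblocks - nblocks_overlap) windows, so a fair baseline gets that many times more iterations. The sketch below counts tuning passes per block and derives a matched iteration budget. Both function names are hypothetical, and the base value of 200 iterations is an assumption inferred from the 400 and 800 figures in the comment above, not something the PR states.

```python
from __future__ import annotations


def tuning_passes_per_block(num_blocks: int, nblocks: int, nblocks_overlap: int) -> list[int]:
    """How many windows each block falls into (assumes 0 <= overlap < nblocks)."""
    stride = nblocks - nblocks_overlap
    passes = [0] * num_blocks
    start = 0
    while start < num_blocks:
        remaining = num_blocks - start
        if start > 0 and nblocks_overlap > 0 and remaining <= nblocks_overlap:
            break
        for block in range(start, min(start + nblocks, num_blocks)):
            passes[block] += 1
        start += stride
    return passes


def matched_baseline_iters(base_iters: int, nblocks: int, nblocks_overlap: int) -> int:
    """Scale the baseline budget by ceil(nblocks / stride)."""
    stride = nblocks - nblocks_overlap
    return base_iters * -(-nblocks // stride)  # integer ceiling division


if __name__ == "__main__":
    print(tuning_passes_per_block(6, 2, 1))   # [1, 2, 2, 2, 2, 1]
    print(matched_baseline_iters(200, 2, 1))  # 400, the n2o1 case
    print(matched_baseline_iters(200, 4, 3))  # 800, the n4o3 case
```

The first and last few blocks are covered by fewer windows than interior blocks, so the multiplier is an upper bound on the per-block cost rather than an exact figure.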
Description
Introducing overlap support for multi-block compression, enabling overlapping block windows during tuning.
Motivation:
Key changes:
- `--nblocks_overlap` to control overlap between adjacent block windows
Type of Change
New feature
Checklist Before Submitting
/azp run Unit-Test-CUDA-AutoRound.
To Do
Outcome
Qwen3-8B
Llama-3.1-8B-Instruct