fix random rotation and update rotation doc. #1884
Signed-off-by: lkk12014402 <kaokao.lv@intel.com>
Pull request overview
This PR addresses correctness and usability around rotation-based preprocessing for quantization. It fixes an inconsistency in SpinQuant’s random Hadamard R1 block rotation (the weight fusion assumed the rotation matrix was symmetric), and it refreshes the step-by-step documentation to describe the rotation options (QuaRot/SpinQuant and per-linear block rotation).
Changes:
- Fix random Hadamard online R1 block weight rotation to preserve equivalence with the activation-side hook (`x @ R`).
- Replace the legacy “Hadamard Transform” doc section with a new “Rotation” section (English + Chinese) describing rotation modes and usage.
- Add rotation usage examples and parameter tables in the step-by-step docs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| docs/step_by_step.md | Replaces Hadamard section with Rotation section and adds usage guidance/examples. |
| docs/step_by_step_CN.md | Chinese translation of the Rotation documentation updates. |
| auto_round/algorithms/transforms/spinquant/preprocessor.py | Fixes random Hadamard online R1 block rotation by aligning weight fusion with hook convention. |

    # R1 only (fast, good baseline improvement)
    ar = AutoRound(model, scheme="MXFP4", rotation_config=SpinQuantConfig(r1=True))

    # R1 + R2 (better, no runtime overhead once fused)
    ar = AutoRound(model, scheme="MXFP4", rotation_config=SpinQuantConfig(r1=True, r2=True))
    # R1 + R2 + R3 + R4 (best accuracy, slight runtime overhead from hooks)
    ar = AutoRound(model, scheme="MXFP4", rotation_config=SpinQuantConfig(r1=True, r2=True, r3=True, r4=True))

| Preset | Equivalent configuration |
|---|---|
| `"quarot"` | `SpinQuantConfig(r1=True, r2=True, r3=True, r4=True)`: deterministic Hadamard, no training |
| `"spinquant"` | `SpinQuantConfig(r1=True, r2=True, r3=True, r4=True, trainable_rotation=True)`: **experimental**, see note below |

| Parameter | Default | Description |
|---|---|---|
| `r1` / `r2` / `r3` / `r4` | `False` | Enable rotation at each position |
| `online_r1_rotation` | `True` | R1 via hook (`True`) or fused into weights (`False`) |
| `random_r1` / `r2` / `r3` / `r4` | `False` | Use random Hadamard (H×diag(±1)) instead of deterministic |
| `rotation_size` | `None` (auto) | Block rotation dimension; auto-detected from model dimensions |
| `trainable_rotation` | `False` | Enable SpinQuant learnable rotation (**experimental**) |
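The `random_r1` … `r4` and `rotation_size` options can be illustrated with a small NumPy sketch. The helpers `hadamard`, `random_hadamard`, and `block_rotation` are hypothetical and are not AutoRound's implementation; the sketch assumes a power-of-two block size that divides the hidden size. It also makes the asymmetry fixed by this PR visible: the plain Sylvester Hadamard matrix is symmetric, but H×diag(±1) generally is not.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix of size n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def random_hadamard(n: int, seed: int = 0) -> np.ndarray:
    """H @ diag(s) with a random sign vector s in {-1, +1}^n."""
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=n)
    # Broadcasting multiplies column j by signs[j], which equals H @ diag(signs).
    return hadamard(n) * signs


def block_rotation(hidden_size: int, rotation_size: int, seed: int = 0) -> np.ndarray:
    """Block-diagonal rotation: the same rotation_size x rotation_size block on each chunk."""
    if hidden_size % rotation_size:
        raise ValueError("rotation_size must divide hidden_size")
    block = random_hadamard(rotation_size, seed)
    return np.kron(np.eye(hidden_size // rotation_size), block)
```

For example, `block_rotation(4096, 128)` builds a 4096×4096 matrix made of 32 identical 128×128 blocks, so each block only mixes channels within its own chunk.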

- **Deterministic rotations** (R1–R4): Only metadata (type + seed) is stored; matrices are regenerated on load
- **Random rotations**: The random sign vector is stored as a compact int8 buffer (~hidden_size bytes)
- **Online hooks** (R3/R4): Automatically re-registered during model loading
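A minimal sketch of the storage scheme described in the bullets above, assuming a hypothetical state layout with `type`, `size`, and `signs` keys; AutoRound's actual serialization format may differ. In this sketch a deterministic rotation keeps only metadata, while a random rotation also keeps its ±1 signs as int8, so the extra buffer costs one byte per channel and the full matrix is rebuilt on load.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix (n must be a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def save_rotation(size: int, use_random: bool = False, seed: int = 0) -> dict:
    """Hypothetical serializer: metadata only, plus int8 signs for random rotations."""
    state = {"type": "random_hadamard" if use_random else "hadamard", "size": size}
    if use_random:
        signs = np.random.default_rng(seed).choice([-1, 1], size=size)
        state["signs"] = signs.astype(np.int8)  # one byte per channel
    return state


def load_rotation(state: dict) -> np.ndarray:
    """Regenerate the full rotation matrix from the stored state."""
    matrix = hadamard(state["size"])
    if state["type"] == "random_hadamard":
        matrix = matrix * state["signs"].astype(np.float64)  # H @ diag(signs)
    return matrix
```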

    # R1 only (fast, good baseline improvement)
    ar = AutoRound(model, scheme="MXFP4", rotation_config=SpinQuantConfig(r1=True))

    # R1 + R2 (better, no runtime overhead after fusion)
    ar = AutoRound(model, scheme="MXFP4", rotation_config=SpinQuantConfig(r1=True, r2=True))
    # R1 + R2 + R3 + R4 (best accuracy, hooks add a little runtime overhead)
    ar = AutoRound(model, scheme="MXFP4", rotation_config=SpinQuantConfig(r1=True, r2=True, r3=True, r4=True))

| Preset | Equivalent configuration |
|---|---|
| `"quarot"` | `SpinQuantConfig(r1=True, r2=True, r3=True, r4=True)`: deterministic Hadamard, no training required |
| `"spinquant"` | `SpinQuantConfig(r1=True, r2=True, r3=True, r4=True, trainable_rotation=True)`: **experimental**, see note below |

| Parameter | Default | Description |
|---|---|---|
| `r1` / `r2` / `r3` / `r4` | `False` | Enable rotation at each position |
| `online_r1_rotation` | `True` | R1 applied via hook (`True`) or fused into weights (`False`) |
| `random_r1` / `r2` / `r3` / `r4` | `False` | Use random Hadamard (H×diag(±1)) instead of deterministic |
| `rotation_size` | `None` (auto) | Block rotation dimension; auto-detected from model dimensions |
| `trainable_rotation` | `False` | Enable SpinQuant learnable rotation (**experimental**) |
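
- **Deterministic rotations** (R1–R4): only metadata (type + seed) is stored; matrices are regenerated at load time
- **Random rotations**: the random sign vector is stored as a compact int8 buffer (about hidden_size bytes)
- **Online hooks** (R3/R4): automatically re-registered when the model is loaded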

    # The (random) Hadamard rotation R is orthonormal but not necessarily
    # symmetric, so rotate_in_channels_ (which multiplies by R_in.T) would break
    # equivalence. Pass R.T so it computes W @ (R.T).T = W @ R.
    rotate_in_channels_(module, R_in=R.T)
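Assuming the usual `nn.Linear` convention $y = xW^\top$ and that `rotate_in_channels_` multiplies the weight by the transpose of `R_in` (as the comment says), the fix can be checked term by term:

```latex
\begin{aligned}
&\text{Linear layer: } y = x W^\top, \qquad \text{online R1 hook: } x \mapsto xR, \qquad R R^\top = I \\
&\text{Required fused weight: } y' = (xR)(W')^\top = x W^\top \iff W' = W R \\
&\texttt{rotate\_in\_channels\_}(W, R_{\mathrm{in}}):\; W \mapsto W R_{\mathrm{in}}^\top \\
&R_{\mathrm{in}} = R:\; y' = x R R W^\top, \text{ equal to } y \text{ only if } R R = I \text{, i.e. } R = R^\top \\
&R_{\mathrm{in}} = R^\top:\; W' = W (R^\top)^\top = W R \;\Rightarrow\; y' = x R R^\top W^\top = y
\end{aligned}
```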
Add unit tests to verify equivalence.
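A NumPy sketch of the kind of equivalence test being requested. It does not call AutoRound: `rotate_in_channels` is a hypothetical stand-in that assumes `rotate_in_channels_` computes `W @ R_in.T`, as the code comment states, and the layer shapes are arbitrary. It compares the fused output against the unrotated reference for both `R_in=R.T` and `R_in=R`.

```python
import numpy as np


def random_hadamard(n: int, seed: int = 0) -> np.ndarray:
    """Random Hadamard rotation H @ diag(+/-1); n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=n)
    return (h / np.sqrt(n)) * signs


def rotate_in_channels(weight: np.ndarray, r_in: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for rotate_in_channels_: returns W @ R_in.T."""
    return weight @ r_in.T


def check_online_r1_equivalence(hidden=64, out=32, tokens=5, seed=0):
    """Compare fused-weight outputs against the unrotated reference x @ W.T."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((tokens, hidden))
    w = rng.standard_normal((out, hidden))
    r = random_hadamard(hidden, seed)
    reference = x @ w.T
    hooked = x @ r  # activation-side hook: x @ R
    fixed = hooked @ rotate_in_channels(w, r.T).T  # R_in=R.T -> fused W @ R
    naive = hooked @ rotate_in_channels(w, r).T    # R_in=R   -> fused W @ R.T
    return np.allclose(fixed, reference), np.allclose(naive, reference)
```

A test in the repository itself would presumably build the same comparison with the project's torch modules and hooks instead of plain arrays.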
- [Enable multiple gpus calibration in lm_head quantization](#enable-multiple-gpus-calibration-in-lm_head-quantization)
+ [Adjust Hyperparameters](#adjust-hyperparameters)
+ [Hadamard Transform-Research Feature](#hadamard-transform)
+ [Rotation](#rotation)
Please restore the "Research Feature" label. Any feature that lacks effective kernel support or has low adoption among serving frameworks should fall into this category.
### Rotation

**Research feature with no effective kernels currently available and typically low community adoption.**
AutoRound supports rotation-based transforms to improve quantization accuracy. Rotation redistributes outliers in weights and activations before quantization, making the distribution more uniform and quantization-friendly.
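A toy NumPy illustration of the outlier-spreading claim: one activation channel carries a large outlier, and multiplying by an orthonormal Hadamard matrix spreads its energy across all channels, shrinking the max-abs value that sets the quantization scale. The `absmax_int4_error` helper is a hypothetical symmetric 4-bit round-to-nearest quantizer used only for illustration; it is not MXFP4 or AutoRound's quantizer.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix (n must be a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def absmax_int4_error(v: np.ndarray) -> float:
    """L2 round-trip error of a toy symmetric per-tensor 4-bit quantizer."""
    scale = np.abs(v).max() / 7  # the largest magnitude maps to level 7
    q = np.clip(np.round(v / scale), -8, 7)
    return float(np.linalg.norm(v - q * scale))


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
x[3] = 50.0  # a single outlier channel dominates the scale
x_rot = x @ hadamard(64)  # rotated activation

print("max |x|:", np.abs(x).max(), "-> max |x @ H|:", np.abs(x_rot).max())
print("4-bit error:", absmax_int4_error(x), "->", absmax_int4_error(x_rot))
```

Because the rotation is orthonormal, the vector norm is unchanged and the quantization error measured in the rotated space equals the error after rotating back, so the two error values are directly comparable.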
As mentioned before, this is a user guide rather than a technical notebook. Focus on presenting the data and helping users understand when to use this feature and how to use it easily.
Detailed implementation should be hidden behind expandable sections (e.g., "
Description
Fix the asymmetry of the random matrix, and update the rotation usage documentation.