Add experiment matrix for CUT3R vs TTT3R by ntujiachen002-star · Pull Request #25 · Inception3D/TTT3R

ntujiachen002-star · 2026-03-25T07:09:48Z

Add experiment matrix for CUT3R vs TTT3R

This PR adds:

- experiment scripts
- experiment results
- hardware/software configuration

The experiment compares CUT3R and TTT3R under different frame_interval settings.

- Add forward_recurrent_analysis() to ARCroco3DStereo: captures per-frame state token trajectories and state→image cross-attention maps without modifying any existing inference path
- Add analysis/state_freq_analysis.py: computes temporal variance per state token and projects it back to image space via cross-attention weights, producing per-frame frequency heatmaps and summary plots for the NeurIPS paper (Exp 1 – State Token Frequency Visualization)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- New script analysis/freq_error_analysis.py: correlates state-token temporal frequency with depth reconstruction error using ScanNet GT
- Fix cross-attention bug in forward_recurrent_analysis: apply softmax before averaging (blocks.py returns raw logits, not probabilities)
- Update experiment.md with ExpA run instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
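The softmax fix matters because a mean over raw logits is not a mean over attention probabilities. A minimal sketch in plain Python, assuming one row of raw scores per head for a single state token; `softmax`, `mean_attention`, and the sample values are illustrative and not taken from the repository:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_attention(head_logits):
    """Normalise each head first, then average across heads.

    head_logits: one row per head, each row holding one raw score per
    image patch (assumed layout, for illustration only).
    """
    probs = [softmax(row) for row in head_logits]
    n_heads = len(probs)
    n_patches = len(probs[0])
    return [sum(p[j] for p in probs) / n_heads for j in range(n_patches)]

# Hypothetical logits: 2 heads, 3 patches.
logits = [[2.0, 0.0, -1.0], [0.5, 0.5, 3.0]]
attn = mean_attention(logits)
```

Because every head is normalised before the mean, the averaged map is itself a probability distribution over patches.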

Frame-level analysis correlating state change magnitude, cumulative drift, attention entropy, and model confidence with depth error. Addresses the failure of patch-level spatial correlation (ExpA). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Randomly samples N scenes, runs inference on each, collects per-scene confidence-error and state_change-error correlations for statistical validation of the confidence calibration finding. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- New update type ttt3r_conf in forward_recurrent_analysis: multiplies TTT3R's soft mask by confidence gate (clamp(conf/scale, 0, 1))
- Ablation script compares cut3r, ttt3r, and ttt3r_conf with multiple conf_gate_scale values across 10 ScanNet scenes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
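The gate `clamp(conf/scale, 0, 1)` can be sketched as follows. This assumes a single scalar confidence per frame and a per-token soft mask stored as a list of floats; the real tensors in the model may be shaped differently, and the function names are hypothetical:

```python
def conf_gate(conf, scale):
    """clamp(conf / scale, 0, 1) for a scalar confidence."""
    return min(max(conf / scale, 0.0), 1.0)

def gated_mask(soft_mask, conf, scale):
    """Scale every entry of the soft mask by the confidence gate."""
    gate = conf_gate(conf, scale)
    return [m * gate for m in soft_mask]

# Illustrative values only: low confidence halves the update strength.
mask = gated_mask([0.2, 0.8], conf=1.0, scale=2.0)
```

A larger `scale` (the commit's `conf_gate_scale`) demands more confidence before the gate saturates at 1 and the update falls back to the unmodified soft mask.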

Computes FFT power spectra, frequency band decomposition, windowed STFT, and online EMA-based spectral features to correlate with depth error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add timestamp-based nearest-neighbor depth matching and TUM associations file support. Handles rgb/depth directory naming convention (vs ScanNet's color/depth). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
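Timestamp-based nearest-neighbour matching can be sketched with a binary search over the sorted depth timestamps. The function name, the `max_diff` tolerance, and the sample timestamps are assumptions for illustration; the script in this PR may use a different tolerance or tie-breaking rule:

```python
import bisect

def match_nearest(rgb_stamps, depth_stamps, max_diff=0.02):
    """For each RGB timestamp return the index of the closest depth
    timestamp, or None when nothing lies within max_diff seconds."""
    order = sorted(range(len(depth_stamps)), key=lambda i: depth_stamps[i])
    keys = [depth_stamps[i] for i in order]
    matches = []
    for t in rgb_stamps:
        pos = bisect.bisect_left(keys, t)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(keys)]
        best = min(candidates, key=lambda p: abs(keys[p] - t), default=None)
        if best is None or abs(keys[best] - t) > max_diff:
            matches.append(None)
        else:
            matches.append(order[best])
    return matches

# Hypothetical timestamps in seconds.
pairs = match_nearest([1.00, 2.00, 5.00], [0.99, 2.015, 3.0])
```

This sketch lets one depth frame serve several RGB frames; a stricter one-to-one association would need an extra pass.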

Runs spectral analysis on multiple scenes from both datasets, computes per-scene signal-error correlations, and produces aggregate comparison grouped by dataset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ttt3r_spectral and cut3r_spectral update types that use per-token EMA-based frequency analysis to modulate state update strength:
- Tokens with high oscillation energy get suppressed (lower learning rate)
- Tokens with stable trajectories update normally
- Train-free, ~zero computational overhead (only EMA + norm)

Hyperparameters: spectral_ema_momentum (0.95), spectral_running_momentum (0.95), spectral_temperature (2.0).

Also add spectral_ablation.py for evaluating across ScanNet + TUM with temperature sweep.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

running_energy initialized to zero caused ratio >> 1 on first call, making alpha ≈ 0 and freezing state for the entire sequence. Fix: warm-start running_energy to the actual energy on the first call so ratio = 1 initially and alpha = 0.5 (no suppression at start). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
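The warm-start behaviour can be illustrated with a small gate. The commit only states that ratio = 1 gives alpha = 0.5 and that ratio >> 1 gives alpha ≈ 0; the sigmoid form `alpha = 1 / (1 + exp(temperature * (ratio - 1)))` used here is an assumption consistent with those two facts, not the confirmed formula, and `EnergyGate` is a hypothetical name:

```python
import math

class EnergyGate:
    """Sketch of an energy-ratio gate with a warm-started running mean."""

    def __init__(self, momentum=0.95, temperature=2.0, eps=1e-8):
        self.momentum = momentum
        self.temperature = temperature
        self.eps = eps
        self.running_energy = None  # None marks "not yet warm-started"

    def alpha(self, energy):
        if self.running_energy is None:
            # Warm start: the first ratio is exactly 1, so alpha is 0.5.
            self.running_energy = energy
        ratio = (energy + self.eps) / (self.running_energy + self.eps)
        # Cap the exponent so a very large ratio cannot overflow math.exp.
        z = min(self.temperature * (ratio - 1.0), 60.0)
        a = 1.0 / (1.0 + math.exp(z))
        self.running_energy = (self.momentum * self.running_energy
                               + (1.0 - self.momentum) * energy)
        return a

gate = EnergyGate()
first = gate.alpha(3.0)   # warm-started call
spike = gate.alpha(30.0)  # energy roughly 10x the running mean
```

Under this assumed form, a running mean initialised to zero would make the first ratio enormous and push alpha toward 0, which matches the freeze the commit describes.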

Implement compute_frame_novelty() and filter_views_by_novelty() in ARCroco3DStereo. Frames whose inter-frame difference has low low-frequency energy (dominated by noise, not structural change) are skipped before encoding, reducing redundant state updates.

Also add frame_novelty_analysis.py to:
- Visualize per-frame novelty scores
- Correlate novelty with state token oscillation (motivation evidence)
- Compare depth error with and without frame filtering

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previous metric (low_freq_energy / total_energy) is ~constant for natural images regardless of motion magnitude, causing 0% frames to be filtered. New design:
- compute_frame_novelty: returns raw low-freq energy (un-normalised)
- filter_views_by_novelty: adaptive skip based on running EMA mean, skip frames whose energy < skip_ratio * running_mean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
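The adaptive skip rule (skip when energy < skip_ratio * running_mean) can be sketched on a list of precomputed per-frame energies. The default values, the choice to always keep the first frame, and the choice to update the running mean on skipped frames are assumptions; `filter_by_energy` is a hypothetical stand-in, not the repository's `filter_views_by_novelty`:

```python
def filter_by_energy(energies, skip_ratio=0.5, momentum=0.9):
    """Return indices of frames kept under an EMA-relative threshold."""
    kept = []
    running_mean = None
    for i, energy in enumerate(energies):
        if running_mean is None:
            running_mean = energy  # seed the EMA with the first frame
            kept.append(i)
            continue
        if energy >= skip_ratio * running_mean:
            kept.append(i)
        # Assumed: the EMA tracks every frame, kept or skipped.
        running_mean = momentum * running_mean + (1.0 - momentum) * energy
    return kept

# Illustrative energies: frames 2 and 4 fall far below the running mean.
kept_indices = filter_by_energy([10.0, 9.0, 1.0, 11.0, 0.5])
```

Because the threshold is relative to the running mean, the rule adapts to the overall motion level of a sequence rather than relying on a fixed cutoff.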

- Fair depth comparison: both full and filtered sequences evaluated on the same kept_indices frames, removing selection bias
- TUM depth matching: use rgb.txt/depth.txt timestamp associations instead of stem-based filename matching

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…selines

- Add _mem_spectral_gate() and cut3r_memgate/ttt3r_memgate update types to gate pose_retriever memory writes by spectral_change
- Add memgate_ablation.py for B2 ablation experiments
- Add research_progress.md and run_experiments.sh for experiment tracking
- Ignore analysis_results/ in .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- _geo_consistency_gate(): scale-invariant log-depth change vs EMA baseline
- cut3r_geogate / ttt3r_geogate update types
- geogate_ablation.py: ablation script with tau sweep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Unifies all three layers under frequency-domain analysis:
- Layer 1: LFE(RGB diff) for frame filtering
- Layer 2: token trajectory HF energy for state modulation
- Layer 3: LFE(depth diff) for state update gating

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add geo_gate_freq_cutoff config (default 4 = top 25% freq, was 8)
- Ablation sweeps tau x cutoff: c2/c4/c8 x t2/t3/t5
- Wider cutoff captures mid-freq geometric detail

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- CLAUDE.md: full technical reference for Claude Code sessions
- research_progress.md: updated with B2/B3 results and freq-domain unification
- B3 geogate freq-domain: ttt3r_geo_t2_c4 = -7.16%, cutoff-insensitive

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Update CLAUDE.md and research_progress.md to record the SIASU experiment results
- Fix the model path in run_experiments.sh (src/ → model/)
- Add the model/ directory to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- model.py: add cut3r_joint / ttt3r_joint update types (SIASU × GeoGate)
- joint_ablation.py: 12-configuration joint ablation (L1/L2/L3/L23/L123 × cut3r/ttt3r)
- Results: L23+ttt3r is best at -7.5%; L123+ttt3r reaches -6.2% (L1 frame skipping conflicts with L2/L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Move the dataset path from the root partition to /mnt/sda/szy/research/dataset/
- Add frequency-domain hyperparameter support to the eval scripts (spectral_temperature, geo_gate_tau, geo_gate_freq_cutoff)
- Add local data preprocessing scripts (prepare_scannet_local.py, prepare_tum_local.py)
- Add dual-GPU parallel evaluation scripts (run_parallel_eval.sh, run_formal_eval.sh)
- Add a Formal Evaluation section to CLAUDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- eval/relpose/metadata.py: add kitti_odo_s1_{200,500,1000} dataset configs
- eval/relpose/prepare_kitti_odometry.py: convert KITTI odometry poses to TUM format and symlink frames into eval pipeline directory structure
- eval/relpose/run_kitti_odo.sh: run cut3r/ttt3r/aconst/brake/ortho on kitti_odo_s1_200 and kitti_odo_s1_1000 for OOD generalization validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
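The pose conversion can be sketched as follows. A KITTI odometry pose line holds the 12 entries of a row-major 3x4 [R | t] matrix, and a TUM trajectory line is `timestamp tx ty tz qx qy qz qw`. The function names are hypothetical, and where the timestamp comes from (frame index, times.txt, or something else) is an assumption left to the caller:

```python
import math

def rotmat_to_quat(R):
    """Convert a 3x3 rotation matrix (nested lists) to (qx, qy, qz, qw)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    if trace > 0:
        s = 2.0 * math.sqrt(trace + 1.0)
        qw, qx = 0.25 * s, (R[2][1] - R[1][2]) / s
        qy, qz = (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s
    elif R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])
        qw, qx = (R[2][1] - R[1][2]) / s, 0.25 * s
        qy, qz = (R[0][1] + R[1][0]) / s, (R[0][2] + R[2][0]) / s
    elif R[1][1] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2])
        qw, qx = (R[0][2] - R[2][0]) / s, (R[0][1] + R[1][0]) / s
        qy, qz = 0.25 * s, (R[1][2] + R[2][1]) / s
    else:
        s = 2.0 * math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1])
        qw, qx = (R[1][0] - R[0][1]) / s, (R[0][2] + R[2][0]) / s
        qy, qz = (R[1][2] + R[2][1]) / s, 0.25 * s
    return qx, qy, qz, qw

def kitti_line_to_tum(line, timestamp):
    """Turn one KITTI pose line into one TUM trajectory line."""
    v = [float(x) for x in line.split()]
    if len(v) != 12:
        raise ValueError("expected 12 values per KITTI pose line")
    R = [v[0:3], v[4:7], v[8:11]]
    t = (v[3], v[7], v[11])
    fields = (timestamp, *t, *rotmat_to_quat(R))
    return " ".join(f"{x:.6f}" for x in fields)

# Identity rotation with a translation of (1.5, 0, 2.5).
tum_line = kitti_line_to_tum("1 0 0 1.5 0 1 0 0 0 0 1 2.5", 0.0)
```

The branch on the trace keeps the square root argument well away from zero, which avoids precision loss for rotations near 180 degrees.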

…script

ortho is not implemented in zjc branch; use momentum_inv_t1 (best brake variant)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All other eval scripts use src/cut3r_512_dpt_4_64.pth because add_path_to_dust3r adds model parent dir to sys.path for dust3r imports. Model dir path would fail to import dust3r. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Includes ATE/RPE results for 5 methods x 2 sequence lengths (200f, 1000f). Key finding: adaptive dampening (ortho -36.3%, momentum -32.4%) dominates on long sequences while constant dampening wins on short sequences. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…quences

Systematic analysis covering: cold-start degeneracy, adaptation-vs-drift phase confusion, variance analysis (CV 66.5% vs 22.3%), bias-variance tradeoff interpretation, and RPE-ATE dissociation. Proposes warmup scheduling as a unified strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add per-sequence and aggregate ATE RMSE tables for both 200f and 1000f. RMSE trends are consistent with ATE Mean: random best on 200f (-41.2%), ortho best on 1000f (-35.2%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- prepare_kitti_odometry.py: add --full flag for full-length sequences
- metadata.py: add kitti_odo_full dataset config
- run_kitti_odo_full.sh: run 5 methods on all 11 seqs (00-10)
- setup_kitti_full.sh: one-click data download & preparation
- deploy_server.sh: full server deployment (clone to results)
- model_all_methods.py.bak: model.py with all 5 update types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- download_kitti.sh: download 65GB zip, extract only 00-10, verify integrity, delete zip to save space. Supports wget -c resume on interruption.
- setup_kitti_full.sh: refactored to call download_kitti.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ethods)

Comprehensive report covering all 11 KITTI sequences with full-length evaluation. Includes ATE, KITTI official metrics (t_err/r_err), gamma sweep, auto-gamma variants, and per-sequence breakdown with analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Appendix A: hardware/software environment with exact versions
- Appendix B: per-configuration hyperparameters and formulas
- Appendix C: exact commands, output structure, runtime, determinism

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…_log

Major corrections:
- _error_log.txt had duplicate entries from incremental runs, causing wrong averages (e.g., ddd3r_g2 showed 390m instead of correct 172m)
- All data now extracted from per-seq eval_metric.txt (authoritative)
- Added ATE Mean alongside RMSE for all tables (Section 1 + Section 3)
- Full RPE table with Mean/RMSE for all 14 methods
- Updated analysis with corrected rankings and dual-metric comparison

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
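The failure mode is easy to reproduce: when an incremental run appends a second row for a sequence it already logged, a plain mean over all rows weights that sequence twice. The sketch below uses synthetic values and a hypothetical `(method, sequence, value)` row layout; it shows deduplication by key, which is not the fix the commit applied (the commit switched to reading per-sequence eval_metric.txt files):

```python
from collections import defaultdict

def mean_by_method(entries, dedupe=True):
    """entries: (method, sequence, value) tuples in log order."""
    if dedupe:
        latest = {}
        for method, seq, value in entries:
            latest[(method, seq)] = value  # a later run overwrites an earlier one
        rows = [(method, value) for (method, _), value in latest.items()]
    else:
        rows = [(method, value) for method, _, value in entries]
    grouped = defaultdict(list)
    for method, value in rows:
        grouped[method].append(value)
    return {m: sum(vs) / len(vs) for m, vs in grouped.items()}

# Synthetic log: sequence "01" was written twice by an incremental run.
log = [("demo", "00", 100.0), ("demo", "01", 500.0), ("demo", "01", 500.0)]
naive = mean_by_method(log, dedupe=False)["demo"]
clean = mean_by_method(log)["demo"]
```

Here the duplicated row inflates the naive mean to about 366.7 while the deduplicated mean is 300.0.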

- Add 6 missing auto_gamma CLI args to eval/mv_recon/launch.py (auto_gamma, auto_gamma_warmup, auto_gamma_max, auto_gamma_k, auto_gamma_lo, auto_gamma_hi); fixes AttributeError when apply_ddd3r_config reads these from args
- Add 4 auto-gamma configs to run_dtu_allconfigs.sh (14 total): auto_steep_clamp, auto_steep_sigmoid, auto_warmup_linear, auto_warmup_threshold

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…report

- Section 5.3: fix 'Mean excl. 01' for ddd3r_g1 (119.01→115.62), ddd3r_g2 (106.47→106.20), auto_warmup_linear (120.87→117.74); old values were computed from RMSE instead of Mean
- Fix duplicate section 5.5 → renumber to 5.5 Auto-Gamma, 5.6 Short vs Long, 5.7 Brake Paradox

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sum(4541+1101+4661+801+271+2761+1101+1101+4071+1591+1201) = 23201 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
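The corrected total can be checked directly. The mapping of each count to a sequence ID assumes the eleven values are listed in order for sequences 00 through 10:

```python
# Per-sequence frame counts as listed in the commit message,
# assumed to be in order for KITTI odometry sequences 00-10.
kitti_frames = {
    "00": 4541, "01": 1101, "02": 4661, "03": 801, "04": 271, "05": 2761,
    "06": 1101, "07": 1101, "08": 4071, "09": 1591, "10": 1201,
}
total_frames = sum(kitti_frames.values())
print(total_frames)  # 23201
```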

DTU MVSNet 22-scene evaluation with all 14 DDD3R configurations. Results confirm over-update is length-dependent: at 49 frames, dampening provides no benefit (consistent with Sintel). Includes report, scripts, and 7scenes experiment preparation.

- Add Section 2.2: median Acc and per-scene win-rate table (many DDD3R variants improve on 14/22 scenes despite higher mean)
- Soften analysis: mean is outlier-sensitive, median tells a different story
- Add reproducibility note: 22-scene auto-discover patch vs hardcoded 15-scene
- Remove overly definitive "50-90 frame threshold" claim
- Address Codex review feedback
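A variant can win on most scenes and still have a worse mean when a single scene is an outlier. The sketch below uses synthetic per-scene errors (lower is better), not values from the report, and `compare` is a hypothetical helper:

```python
from statistics import mean, median

def compare(baseline, variant):
    """Summarise per-scene errors with mean, median, and win count."""
    wins = sum(1 for b, v in zip(baseline, variant) if v < b)
    return {
        "mean_delta": mean(variant) - mean(baseline),
        "median_delta": median(variant) - median(baseline),
        "wins": wins,
        "scenes": len(baseline),
    }

# Synthetic errors: the variant is better on 4 of 5 scenes,
# but one outlier scene dominates its mean.
baseline = [1.0, 1.0, 1.0, 1.0, 1.0]
variant = [0.8, 0.9, 0.7, 0.9, 5.0]
summary = compare(baseline, variant)
```

In this toy case the mean gets worse while the median and the win count both favour the variant, which is why reporting all three gives a fuller picture than the mean alone.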

- Add complete 7scenes evaluation results for DDD3R spectrum (cut3r, ttt3r, constant, brake, ortho, ddd3r_g1-g5, 4 auto-gamma variants)
- All configs use identical evaluation protocol matching TTT3R source: kf_every=2, max_frames=200, resolution 512x384, ICP threshold 0.1m
- Depth preprocessing follows SimpleRecon z-buffer projection
- Baseline validated against TTT3R paper Figure 9
- Add eval_from_npy.py for split inference/evaluation pipeline
- Add apply_ddd3r_config() to eval/relpose/launch.py
- Key results: brake best Chamfer (0.0271, -71.7% vs CUT3R), all DDD3R variants achieve 64-72% improvement over CUT3R

Nica and others added 30 commits March 20, 2026 16:03

add experiment.md with run instructions for Exp1

f75a457

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add spectral analysis script for state token frequency-domain study

4c2e715

Support TUM dataset format in spectral analysis

10a2ba6

Add batch spectral analysis across ScanNet + TUM scenes

aa55d5a

Fix build_views: use bool tensors instead of None for update/reset

82dd23f

Rename novelty → spectral_change throughout

e9cc10b

Add batch_frame_novelty.py for Layer 1 batch validation

e0f3dd1

Add metric_comparison.py: spectral_change vs L2/high_freq/mid_freq baselines

5164275

Add dynamic_token_analysis.py: validate dynamic token hypothesis

a70e1b8

Fix UnicodeEncodeError in memgate_ablation.py

9b731da

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add B3 geometric consistency gate for state updates

c553f59

Tune B3 geo gate: configurable freq cutoff, expand sweep

8706886

Add Claude Code context prompt for new sessions

dc6dceb

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Layer 2 SIASU ablation complete: ttt3r_spectral -8.3%, insensitive to τ

099f724

Update docs: Joint Ablation results, L23+ttt3r (-7.5%) is the final scheme

5819be3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Jachen Zhu and others added 30 commits March 28, 2026 15:45

fix: robust peak VRAM logging with nvidia-smi fallback

e56c1ee

exp: add local SAFE224 reset sensitivity study with reproducible report

3765068

feat: add server-ready waymo/nuscenes relpose pipelines

ed1f661

fix: harden waymo relpose conversion and linux script execution

d431347

perf: enable h200-friendly bf16 tf32 and multi-gpu defaults

c7ba452

results: add H200 nuScenes relpose run outputs and runlog

6b7343f

docs: add full nuscenes trainval h200 relpose results

1c31182

fix: enable alpha_drift in brake and clean duplicated nuscenes summary

6be34c2

docs: complete nuscenes/depth metrics and update runlog

1720b70

docs: rewrite CLAUDE_zjc with metric-focused brake summary

bbe13d7

fix: replace ttt3r_ortho with ttt3r_momentum_inv_t1 in kitti_odo run script

1fbbaf5

ortho is not implemented in zjc branch; use momentum_inv_t1 (best brake variant)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add ATE RMSE metrics to KITTI odometry report

632e8eb

Add DD3R training notes markdown

383e240

fix: correct total frame count 22410→23201 in KITTI report

753d83f

docs: rewrite CLAUDE for DDD3R on zjc

0f0ef35

docs: fix 7scenes report claims

633536b

ntujiachen002-star wants to merge 97 commits into Inception3D:main from szy233:zjc

ntujiachen002-star commented Mar 25, 2026
