Add experiment matrix for CUT3R vs TTT3R #25
Open
ntujiachen002-star wants to merge 97 commits into Inception3D:main from
- Add forward_recurrent_analysis() to ARCroco3DStereo: captures per-frame state token trajectories and state→image cross-attention maps without modifying any existing inference path
- Add analysis/state_freq_analysis.py: computes temporal variance per state token and projects it back to image space via cross-attention weights, producing per-frame frequency heatmaps and summary plots for the NeurIPS paper (Exp 1 – State Token Frequency Visualization)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New script analysis/freq_error_analysis.py: correlates state-token temporal frequency with depth reconstruction error using ScanNet GT
- Fix cross-attention bug in forward_recurrent_analysis: apply softmax before averaging (blocks.py returns raw logits, not probabilities)
- Update experiment.md with ExpA run instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Frame-level analysis correlating state change magnitude, cumulative drift, attention entropy, and model confidence with depth error. Addresses the failure of patch-level spatial correlation (ExpA). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Randomly samples N scenes, runs inference on each, collects per-scene confidence-error and state_change-error correlations for statistical validation of the confidence calibration finding. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New update type ttt3r_conf in forward_recurrent_analysis: multiplies TTT3R's soft mask by confidence gate (clamp(conf/scale, 0, 1))
- Ablation script compares cut3r, ttt3r, and ttt3r_conf with multiple conf_gate_scale values across 10 ScanNet scenes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
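The confidence gate in this commit can be illustrated with a minimal sketch. It assumes one scalar confidence and one soft-mask value per token; the helper names `conf_gate` and `gated_mask` and the sample numbers are hypothetical, and the actual code in forward_recurrent_analysis may operate on tensors instead.

```python
def conf_gate(conf, scale):
    """Confidence gate from the commit message: clamp(conf / scale, 0, 1)."""
    return max(0.0, min(1.0, conf / scale))


def gated_mask(soft_mask, conf, scale):
    """Multiply each soft-mask value by its confidence gate (hypothetical helper)."""
    return [m * conf_gate(c, scale) for m, c in zip(soft_mask, conf)]


# Illustrative values only, not taken from the PR.
soft_mask = [0.8, 0.5, 0.2]
conf = [6.0, 1.5, -1.0]
result = gated_mask(soft_mask, conf, scale=3.0)
print(result)
```

With a larger scale, more tokens fall below full confidence and their updates are damped more strongly, which is presumably why the ablation sweeps conf_gate_scale.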
Computes FFT power spectra, frequency band decomposition, windowed STFT, and online EMA-based spectral features to correlate with depth error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add timestamp-based nearest-neighbor depth matching and TUM associations file support. Handles rgb/depth directory naming convention (vs ScanNet's color/depth). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
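Timestamp-based nearest-neighbor matching can be sketched as below. This is an illustration under assumptions, not the PR's implementation: the function name `match_nearest` is hypothetical, timestamps are assumed to be floats in seconds with the depth list sorted, and the `max_diff` tolerance of 0.02 s is an assumed default.

```python
from bisect import bisect_left


def match_nearest(rgb_ts, depth_ts, max_diff=0.02):
    """For each RGB timestamp, return the index of the nearest depth timestamp.

    depth_ts must be sorted ascending. Returns None for an RGB frame when the
    nearest depth frame is further away than max_diff (assumed tolerance).
    """
    if not depth_ts:
        return [None] * len(rgb_ts)
    matches = []
    for t in rgb_ts:
        i = bisect_left(depth_ts, t)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(depth_ts)]
        best = min(candidates, key=lambda j: abs(depth_ts[j] - t))
        matches.append(best if abs(depth_ts[best] - t) <= max_diff else None)
    return matches


# Illustrative timestamps only.
pairs = match_nearest([1.00, 1.05, 2.00], [0.99, 1.06, 1.50])
print(pairs)
```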
Runs spectral analysis on multiple scenes from both datasets, computes per-scene signal-error correlations, and produces aggregate comparison grouped by dataset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ttt3r_spectral and cut3r_spectral update types that use per-token EMA-based frequency analysis to modulate state update strength:
- Tokens with high oscillation energy get suppressed (lower learning rate)
- Tokens with stable trajectories update normally
- Train-free, ~zero computational overhead (only EMA + norm)
Hyperparameters: spectral_ema_momentum (0.95), spectral_running_momentum (0.95), spectral_temperature (2.0).
Also add spectral_ablation.py for evaluating across ScanNet + TUM with temperature sweep.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
running_energy initialized to zero caused ratio >> 1 on first call, making alpha ≈ 0 and freezing state for the entire sequence. Fix: warm-start running_energy to the actual energy on the first call so ratio = 1 initially and alpha = 0.5 (no suppression at start). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
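The cold-start problem and the warm-start fix can be reproduced in a small sketch. The commit does not give the gating formula, so the sigmoid form below is an assumption chosen only because it yields alpha = 0.5 at ratio = 1; the function name `spectral_alpha`, the `eps` term, and the exponent cap are likewise hypothetical.

```python
import math


def spectral_alpha(energy, running_energy, temperature=2.0, momentum=0.95, eps=1e-8):
    """Return (alpha, updated running_energy) for one token at one step.

    Assumed gate: alpha = sigmoid(-temperature * (ratio - 1)), where
    ratio = energy / running_energy. Passing running_energy=None triggers
    the warm start described in the commit.
    """
    if running_energy is None:
        running_energy = energy  # warm start: ratio is ~1 on the first call
    ratio = energy / (running_energy + eps)
    x = min(temperature * (ratio - 1.0), 50.0)  # cap to avoid overflow in exp
    alpha = 1.0 / (1.0 + math.exp(x))
    running_energy = momentum * running_energy + (1.0 - momentum) * energy
    return alpha, running_energy


cold_alpha, _ = spectral_alpha(1.0, running_energy=0.0)   # zero init: ratio >> 1
warm_alpha, _ = spectral_alpha(1.0, running_energy=None)  # warm start: ratio ~ 1
print(cold_alpha, warm_alpha)
```

Under this assumed gate, the zero-initialized call drives alpha to effectively zero, while the warm-started call gives alpha close to 0.5, matching the behavior the commit describes.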
Implement compute_frame_novelty() and filter_views_by_novelty() in ARCroco3DStereo. Frames whose inter-frame difference has low low-frequency energy (dominated by noise, not structural change) are skipped before encoding, reducing redundant state updates.
Also add frame_novelty_analysis.py to:
- Visualize per-frame novelty scores
- Correlate novelty with state token oscillation (motivation evidence)
- Compare depth error with and without frame filtering
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous metric (low_freq_energy / total_energy) is ~constant for natural images regardless of motion magnitude, causing 0% frames to be filtered.
New design:
- compute_frame_novelty: returns raw low-freq energy (un-normalised)
- filter_views_by_novelty: adaptive skip based on running EMA mean, skip frames whose energy < skip_ratio * running_mean
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
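The adaptive skip rule can be sketched on a list of precomputed per-frame energies. This is a minimal illustration, not the PR's filter_views_by_novelty: the name `filter_by_novelty`, the default `skip_ratio` and `momentum` values, always keeping the first frame, and updating the running mean on skipped frames are all assumptions.

```python
def filter_by_novelty(energies, skip_ratio=0.5, momentum=0.9):
    """Return indices of frames to keep.

    A frame is skipped when its energy < skip_ratio * running_mean, where
    running_mean is an exponential moving average of past energies.
    """
    kept = []
    running_mean = None
    for idx, energy in enumerate(energies):
        if running_mean is None:
            running_mean = energy  # first frame seeds the EMA and is kept
            kept.append(idx)
            continue
        if energy >= skip_ratio * running_mean:
            kept.append(idx)
        # Assumed: the EMA is updated whether or not the frame was kept.
        running_mean = momentum * running_mean + (1.0 - momentum) * energy
    return kept


# Illustrative energies: frames 2 and 4 are near-static.
kept_indices = filter_by_novelty([10.0, 9.0, 1.0, 8.0, 2.0, 10.0])
print(kept_indices)
```

Because the threshold tracks the sequence's own energy level, the rule does not depend on an absolute energy scale, which is the failure mode of the normalised metric described above.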
- Fair depth comparison: both full and filtered sequences evaluated on the same kept_indices frames, removing selection bias
- TUM depth matching: use rgb.txt/depth.txt timestamp associations instead of stem-based filename matching
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _mem_spectral_gate() and cut3r_memgate/ttt3r_memgate update types to gate pose_retriever memory writes by spectral_change
- Add memgate_ablation.py for B2 ablation experiments
- Add research_progress.md and run_experiments.sh for experiment tracking
- Ignore analysis_results/ in .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _geo_consistency_gate(): scale-invariant log-depth change vs EMA baseline
- cut3r_geogate / ttt3r_geogate update types
- geogate_ablation.py: ablation script with tau sweep
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unifies all three layers under frequency-domain analysis:
- Layer 1: LFE(RGB diff) for frame filtering
- Layer 2: token trajectory HF energy for state modulation
- Layer 3: LFE(depth diff) for state update gating
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add geo_gate_freq_cutoff config (default 4 = top 25% freq, was 8)
- Ablation sweeps tau x cutoff: c2/c4/c8 x t2/t3/t5
- Wider cutoff captures mid-freq geometric detail
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CLAUDE.md: full technical reference for Claude Code sessions
- research_progress.md: updated with B2/B3 results and freq-domain unification
- B3 geogate freq-domain: ttt3r_geo_t2_c4 = -7.16%, cutoff-insensitive
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update CLAUDE.md and research_progress.md to record the SIASU experiment results
- Fix the model path in run_experiments.sh (src/ → model/)
- Add the model/ directory to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- model.py: add cut3r_joint / ttt3r_joint update types (SIASU × GeoGate)
- joint_ablation.py: 12-configuration joint ablation (L1/L2/L3/L23/L123 × cut3r/ttt3r)
- Results: L23+ttt3r is best at -7.5%; L123+ttt3r reaches -6.2% (L1 frame skipping conflicts with L2/L3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Migrate dataset paths from the root partition to /mnt/sda/szy/research/dataset/
- Add frequency-domain hyperparameter support to the eval scripts (spectral_temperature, geo_gate_tau, geo_gate_freq_cutoff)
- Add local data preprocessing scripts (prepare_scannet_local.py, prepare_tum_local.py)
- Add dual-GPU parallel evaluation scripts (run_parallel_eval.sh, run_formal_eval.sh)
- Add a Formal Evaluation section to CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- eval/relpose/metadata.py: add kitti_odo_s1_{200,500,1000} dataset configs
- eval/relpose/prepare_kitti_odometry.py: convert KITTI odometry poses to
TUM format and symlink frames into eval pipeline directory structure
- eval/relpose/run_kitti_odo.sh: run cut3r/ttt3r/aconst/brake/ortho on
kitti_odo_s1_200 and kitti_odo_s1_1000 for OOD generalization validation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
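Converting a KITTI odometry pose line to a TUM trajectory line can be sketched as follows. This is not the code from prepare_kitti_odometry.py: the function names are hypothetical, and the timestamp is passed in by the caller because the commit does not say how timestamps are derived. KITTI stores each pose as 12 floats (a row-major 3x4 matrix [R|t]); TUM expects `timestamp tx ty tz qx qy qz qw`.

```python
import math


def rotmat_to_quat(R):
    """Convert a 3x3 rotation matrix (nested lists) to a quaternion (qx, qy, qz, qw)."""
    (m00, m01, m02), (m10, m11, m12), (m20, m21, m22) = R
    trace = m00 + m11 + m22
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)
        return ((m21 - m12) / s, (m02 - m20) / s, (m10 - m01) / s, 0.25 * s)
    if m00 > m11 and m00 > m22:
        s = 2.0 * math.sqrt(1.0 + m00 - m11 - m22)
        return (0.25 * s, (m01 + m10) / s, (m02 + m20) / s, (m21 - m12) / s)
    if m11 > m22:
        s = 2.0 * math.sqrt(1.0 + m11 - m00 - m22)
        return ((m01 + m10) / s, 0.25 * s, (m12 + m21) / s, (m02 - m20) / s)
    s = 2.0 * math.sqrt(1.0 + m22 - m00 - m11)
    return ((m02 + m20) / s, (m12 + m21) / s, 0.25 * s, (m10 - m01) / s)


def kitti_pose_to_tum(line, timestamp):
    """Turn one KITTI pose line (12 floats) into one TUM trajectory line."""
    v = [float(x) for x in line.split()]
    if len(v) != 12:
        raise ValueError("expected 12 values per KITTI pose line")
    R = [v[0:3], v[4:7], v[8:11]]
    tx, ty, tz = v[3], v[7], v[11]
    qx, qy, qz, qw = rotmat_to_quat(R)
    fields = (timestamp, tx, ty, tz, qx, qy, qz, qw)
    return " ".join(f"{x:.6f}" for x in fields)


# Identity rotation with a translation; the timestamp value is illustrative.
tum_line = kitti_pose_to_tum("1 0 0 1.5 0 1 0 2.5 0 0 1 3.5", timestamp=0.0)
print(tum_line)
```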
…script ortho is not implemented in zjc branch; use momentum_inv_t1 (best brake variant) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All other eval scripts use src/cut3r_512_dpt_4_64.pth because add_path_to_dust3r adds model parent dir to sys.path for dust3r imports. Model dir path would fail to import dust3r. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Includes ATE/RPE results for 5 methods x 2 sequence lengths (200f, 1000f). Key finding: adaptive dampening (ortho -36.3%, momentum -32.4%) dominates on long sequences while constant dampening wins on short sequences. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…quences Systematic analysis covering: cold-start degeneracy, adaptation-vs-drift phase confusion, variance analysis (CV 66.5% vs 22.3%), bias-variance tradeoff interpretation, and RPE-ATE dissociation. Proposes warmup scheduling as a unified strategy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add per-sequence and aggregate ATE RMSE tables for both 200f and 1000f. RMSE trends are consistent with ATE Mean: random best on 200f (-41.2%), ortho best on 1000f (-35.2%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- prepare_kitti_odometry.py: add --full flag for full-length sequences
- metadata.py: add kitti_odo_full dataset config
- run_kitti_odo_full.sh: run 5 methods on all 11 seqs (00-10)
- setup_kitti_full.sh: one-click data download & preparation
- deploy_server.sh: full server deployment (clone to results)
- model_all_methods.py.bak: model.py with all 5 update types
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- download_kitti.sh: download 65GB zip, extract only 00-10, verify integrity, delete zip to save space. Supports wget -c resume on interruption.
- setup_kitti_full.sh: refactored to call download_kitti.sh
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ethods) Comprehensive report covering all 11 KITTI sequences with full-length evaluation. Includes ATE, KITTI official metrics (t_err/r_err), gamma sweep, auto-gamma variants, and per-sequence breakdown with analysis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Appendix A: hardware/software environment with exact versions
- Appendix B: per-configuration hyperparameters and formulas
- Appendix C: exact commands, output structure, runtime, determinism
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_log
Major corrections:
- _error_log.txt had duplicate entries from incremental runs, causing wrong averages (e.g., ddd3r_g2 showed 390m instead of correct 172m)
- All data now extracted from per-seq eval_metric.txt (authoritative)
- Added ATE Mean alongside RMSE for all tables (Section 1 + Section 3)
- Full RPE table with Mean/RMSE for all 14 methods
- Updated analysis with corrected rankings and dual-metric comparison
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
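How duplicate log entries skew an average can be shown with a small sketch. The data and the helper name `average_latest_per_sequence` are made up for illustration, and keeping the last entry per sequence is an assumption; the commit itself resolves the problem by reading each sequence's eval_metric.txt instead of the log.

```python
def average_latest_per_sequence(entries):
    """Average one value per sequence, keeping only the last entry seen for each.

    entries: list of (sequence_id, value) pairs in log order.
    """
    latest = {}
    for seq, value in entries:
        latest[seq] = value  # later entries overwrite earlier duplicates
    return sum(latest.values()) / len(latest)


# Made-up log: sequence "01" was logged twice by incremental runs.
log = [("00", 100.0), ("01", 600.0), ("01", 200.0)]
naive = sum(v for _, v in log) / len(log)
deduped = average_latest_per_sequence(log)
print(naive, deduped)
```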
- Add 6 missing auto_gamma CLI args to eval/mv_recon/launch.py (auto_gamma, auto_gamma_warmup, auto_gamma_max, auto_gamma_k, auto_gamma_lo, auto_gamma_hi); fixes AttributeError when apply_ddd3r_config reads these from args
- Add 4 auto-gamma configs to run_dtu_allconfigs.sh (14 total): auto_steep_clamp, auto_steep_sigmoid, auto_warmup_linear, auto_warmup_threshold
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…report
- Section 5.3: fix 'Mean excl. 01' for ddd3r_g1 (119.01→115.62), ddd3r_g2 (106.47→106.20), auto_warmup_linear (120.87→117.74); old values were computed from RMSE instead of Mean
- Fix duplicate section 5.5 → renumber to 5.5 Auto-Gamma, 5.6 Short vs Long, 5.7 Brake Paradox
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sum(4541+1101+4661+801+271+2761+1101+1101+4071+1591+1201) = 23201 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
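The total in this commit can be checked directly. The eleven terms are copied from the commit message; reading them as per-sequence frame counts for the 11 KITTI sequences (00-10) is an assumption, since the message does not label them.

```python
# Terms copied verbatim from the commit message.
counts = [4541, 1101, 4661, 801, 271, 2761, 1101, 1101, 4071, 1591, 1201]
total = sum(counts)
print(len(counts), total)
```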
DTU MVSNet 22-scene evaluation with all 14 DDD3R configurations. Results confirm over-update is length-dependent: at 49 frames, dampening provides no benefit (consistent with Sintel). Includes report, scripts, and 7scenes experiment preparation.
- Add Section 2.2: median Acc and per-scene win-rate table (many DDD3R variants improve on 14/22 scenes despite higher mean)
- Soften analysis: mean is outlier-sensitive, median tells a different story
- Add reproducibility note: 22-scene auto-discover patch vs hardcoded 15-scene
- Remove overly definitive "50-90 frame threshold" claim
- Address Codex review feedback
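The point that a variant can win on most scenes while still having a worse mean can be illustrated with a short sketch. The per-scene errors below are made up and are not results from this PR; the helper name `summarize` is hypothetical, and lower values are assumed to be better.

```python
from statistics import mean, median


def summarize(baseline, variant):
    """Compare per-scene errors (lower is better) between two methods."""
    wins = sum(v < b for b, v in zip(baseline, variant))
    return {
        "mean": mean(variant),
        "median": median(variant),
        "wins": wins,
        "scenes": len(variant),
    }


# Made-up errors: the variant is better on 4 of 5 scenes but has one outlier.
baseline = [1.0, 1.0, 1.0, 1.0, 1.0]
variant = [0.8, 0.9, 0.7, 0.9, 5.0]
stats = summarize(baseline, variant)
print(stats)
```

Here the single outlier pushes the variant's mean above the baseline even though its median is lower and it wins on most scenes, which is the pattern the median and win-rate table is meant to expose.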
- Add complete 7scenes evaluation results for DDD3R spectrum (cut3r, ttt3r, constant, brake, ortho, ddd3r_g1-g5, 4 auto-gamma variants)
- All configs use identical evaluation protocol matching TTT3R source: kf_every=2, max_frames=200, resolution 512x384, ICP threshold 0.1m
- Depth preprocessing follows SimpleRecon z-buffer projection
- Baseline validated against TTT3R paper Figure 9
- Add eval_from_npy.py for split inference/evaluation pipeline
- Add apply_ddd3r_config() to eval/relpose/launch.py
- Key results: brake best Chamfer (0.0271, -71.7% vs CUT3R), all DDD3R variants achieve 64-72% improvement over CUT3R
Add experiment matrix for CUT3R vs TTT3R
This PR adds:
The experiment compares CUT3R and TTT3R under different frame_interval settings.
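An experiment matrix of this kind can be enumerated as a Cartesian product of methods and settings. This is a sketch only: the `frame_interval` values below are hypothetical placeholders, since the description does not list the settings actually used, and the dictionary layout is an assumed representation.

```python
from itertools import product

methods = ["cut3r", "ttt3r"]
frame_intervals = [1, 2, 4]  # hypothetical values, not taken from the PR

# One run configuration per (method, frame_interval) pair.
matrix = [
    {"method": m, "frame_interval": k}
    for m, k in product(methods, frame_intervals)
]

for run in matrix:
    print(run)
```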