
LinJ0866/M4-SAM


M⁴-SAM

"M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection" by Jiyuan Liu, Jia Lin, Xiaofei Zhou*, Runmin Cong, Deyang Liu, Zhi Liu 🎉 CVPR 2026 Accepted!

📑 Paper (arXiv, to be added) | 💻 Code (GitHub)

🧠 Overview

We propose M⁴-SAM, a prompt-free framework that adapts SAM2 to RGB-D video salient object detection (VSOD) by introducing modality-aware parameter-efficient fine-tuning (PEFT), hierarchical feature fusion, and prompt-free memory initialization.

Key Highlights:

  • 💡 Modality-Aware MoE-LoRA: extends vanilla LoRA with convolutional experts and modality-specific routing, enabling adaptive RGB-D feature fusion and efficient fine-tuning.
  • 🧩 Gated Multi-Level Feature Fusion: hierarchically aggregates multi-scale encoder features with an adaptive gating mechanism to balance spatial details and semantic context.
  • 🚀 Pseudo-Guided Initialization: bootstraps the memory bank using a coarse mask as a pseudo prior, enabling zero-shot VSOD without manual prompts.
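Since the code has not been released yet, the following is only a minimal NumPy sketch of how a modality-aware MoE-LoRA adapter of this kind could work, not the authors' implementation. All details are assumptions: each expert is a LoRA down-projection followed by a 3×3 depthwise convolution in the rank space and an up-projection; each modality (RGB or depth) has its own router that turns the globally pooled tokens into softmax weights over the experts; and the up-projections are zero-initialized as in standard LoRA, so the adapter initially leaves the frozen SAM2 layer unchanged. The class name `ModalityAwareMoELoRA`, `depthwise_conv3x3`, and all parameter values are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def depthwise_conv3x3(x, k):
    """x: (H, W, C) map, k: (3, 3, C) per-channel kernel; zero padding keeps H and W."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + w, :] * k[i, j, :]
    return out


class ModalityAwareMoELoRA:
    """Hypothetical MoE-LoRA adapter wrapped around a frozen linear weight (d_out x d_in)."""

    def __init__(self, weight, rank=4, num_experts=3, modalities=("rgb", "depth"), seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                            # frozen pretrained weight
        self.A = rng.normal(0.0, 0.02, (num_experts, rank, d_in))       # expert down-projections
        self.B = np.zeros((num_experts, d_out, rank))                   # zero-init up-projections
        self.K = rng.normal(0.0, 0.1, (num_experts, 3, 3, rank))        # expert 3x3 depthwise kernels
        # Modality-specific routing: a separate router matrix per input modality.
        self.routers = {m: rng.normal(0.0, 0.02, (d_in, num_experts)) for m in modalities}

    def __call__(self, x, modality):
        """x: (H, W, d_in) token grid of one modality; returns (output, expert gates)."""
        base = x @ self.weight.T                                        # frozen path
        gates = softmax(x.mean(axis=(0, 1)) @ self.routers[modality])   # (num_experts,)
        delta = np.zeros_like(base)
        for e, g in enumerate(gates):
            h = x @ self.A[e].T                                         # (H, W, rank)
            h = depthwise_conv3x3(h, self.K[e])                         # local spatial mixing
            delta += g * (h @ self.B[e].T)                              # (H, W, d_out)
        return base + delta, gates


rng = np.random.default_rng(1)
W0 = rng.normal(size=(8, 6))
adapter = ModalityAwareMoELoRA(W0)
x = rng.normal(size=(5, 5, 6))
y_rgb, g_rgb = adapter(x, "rgb")
y_dep, g_dep = adapter(x, "depth")
print(y_rgb.shape, np.allclose(y_rgb, x @ W0.T))
```

Because every up-projection starts at zero, `y_rgb` equals the frozen layer's output at initialization; in this sketch, training would update only `A`, `B`, `K`, and the routers, which is what keeps the fine-tuning parameter-efficient.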
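The Gated Multi-Level Feature Fusion idea can be sketched in the same hedged way. This version assumes, hypothetically, that the encoder features have already been projected to a common channel count, that each level has twice the spatial resolution of the previous one, and that a per-pixel gate from a 1×1 convolution over the concatenated fine and upsampled coarse maps decides how much semantic context versus spatial detail to keep. It is a generic gated top-down fusion, not necessarily the paper's exact design; `gated_fuse`, `hierarchical_fusion`, and the gate weights are illustrative names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def upsample_nearest(x, factor):
    """(C, H, W) -> (C, H * factor, W * factor) by nearest-neighbour repetition."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def gated_fuse(fine, coarse, w_gate, b_gate=0.0):
    """Blend a fine (detail) map with an upsampled coarse (semantic) map.

    w_gate: (2C,) weights of a hypothetical 1x1 gating conv that outputs one gate per pixel.
    """
    factor = fine.shape[1] // coarse.shape[1]
    up = upsample_nearest(coarse, factor)
    stacked = np.concatenate([fine, up], axis=0)                      # (2C, H, W)
    gate = sigmoid(np.tensordot(w_gate, stacked, axes=1) + b_gate)   # (H, W), between 0 and 1
    return gate * up + (1.0 - gate) * fine                            # gate broadcasts over channels


def hierarchical_fusion(features, gate_weights):
    """features: coarse-to-fine list of (C, H_i, W_i) maps, each level 2x the previous one."""
    fused = features[0]
    for feat, w in zip(features[1:], gate_weights):
        fused = gated_fuse(feat, fused, w)                            # refine with the next finer level
    return fused


rng = np.random.default_rng(0)
C = 4
levels = [rng.normal(size=(C, s, s)) for s in (4, 8, 16)]             # coarse -> fine
weights = [rng.normal(size=2 * C) for _ in range(len(levels) - 1)]
out = hierarchical_fusion(levels, weights)
print(out.shape)
```

Since the output is a per-pixel convex combination, a gate near 1 passes the upsampled semantic map through and a gate near 0 keeps the fine detail map, which is one simple way to realize the "balance" described above.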
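Pseudo-Guided Initialization can likewise be illustrated with a toy memory bank. In this sketch (an assumption, not the paper's method), the coarse saliency map of the first frame is thresholded into a pseudo mask that stands in for the user prompt, and the memory stores one mask-averaged foreground prototype per frame; later frames are read out by cosine similarity against the stored prototypes. SAM2's actual memory stores spatial memory features and object pointers and reads them with memory attention, so prototype matching here is a deliberate simplification. `PseudoGuidedMemory`, `capacity=7`, and `threshold=0.5` are hypothetical choices.

```python
from collections import deque

import numpy as np


class PseudoGuidedMemory:
    """Hypothetical FIFO memory bank seeded from a coarse prediction instead of a user prompt."""

    def __init__(self, capacity=7, threshold=0.5):
        self.bank = deque(maxlen=capacity)   # oldest entries are dropped first
        self.threshold = threshold

    def initialize(self, coarse_prob, features):
        """coarse_prob: (H, W) saliency in [0, 1]; features: (C, H, W) first-frame embedding."""
        pseudo_mask = (coarse_prob > self.threshold).astype(features.dtype)
        self.write(pseudo_mask, features)    # the pseudo mask plays the role of the prompt
        return pseudo_mask

    def write(self, mask, features):
        """Store the mask-weighted average feature (foreground prototype) of one frame."""
        area = max(float(mask.sum()), 1.0)   # guard against an empty mask
        self.bank.append((features * mask).sum(axis=(1, 2)) / area)

    def read(self, features):
        """Return an (H, W) prior: best cosine similarity of each pixel to any stored prototype."""
        if not self.bank:
            raise RuntimeError("memory bank is empty; call initialize() first")
        protos = np.stack(list(self.bank))                                    # (N, C)
        protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
        feats = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
        sims = np.tensordot(protos, feats, axes=([1], [0]))                   # (N, H, W)
        return sims.max(axis=0)


# Toy first frame: a 2x2 foreground patch with a distinct feature vector.
fg = np.zeros((6, 6))
fg[2:4, 2:4] = 1.0
fg_vec = np.array([1.0, 0.0, 0.0])[:, None, None]
bg_vec = np.array([0.0, 1.0, 0.0])[:, None, None]
features = np.where(fg[None] > 0, fg_vec, bg_vec)                            # (3, 6, 6)
coarse = 0.8 * fg + 0.1                                                       # 0.9 inside, 0.1 outside

memory = PseudoGuidedMemory()
mask = memory.initialize(coarse, features)
prior = memory.read(features)
print(prior[2, 2].round(3), prior[0, 0].round(3))
```

The point of the sketch is that no click, box, or ground-truth mask is needed: the first memory entry comes entirely from the model's own coarse prediction, and subsequent frames can be written into the same bank as tracking proceeds.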

⚡ Getting Started

Code is coming soon! Stay tuned.

Prepare Dataset

Download the RDVS, ViDSOD-100, and DViSal datasets.

Pretrained Checkpoint

Dependent model: SAM2. Download the sam2.1_hiera_large.pt checkpoint.

Acknowledgement

Our work would not have been possible without the following open-source projects:

Thanks for their great contributions!

Citation

If you find our work useful, please cite our paper. Thank you!

@inproceedings{liu2026m4sam,
  title={M$^4$-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection},
  author={Liu, Jiyuan and Lin, Jia and Zhou, Xiaofei and Cong, Runmin and Liu, Deyang and Liu, Zhi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

About

[CVPR 2026] Forked code repository for "M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection"
