SFM-Net

"Scale-invariant Feature Matching Network for V-D-T Few-Shot Semantic Segmentation"
by Xiaofei Zhou, Jia Lin, Dongmei Chen, Deyang Liu, Jiyong Zhang and Runmin Cong (Corresponding author: Dongmei Chen, Runmin Cong)
Accepted at the IEEE Transactions on Image Processing (T-IP)

📑 Paper | 🌐 Project Page

🧠 Overview

We propose SFM-Net, a novel framework for V-D-T (visible-depth-thermal) few-shot semantic segmentation.

✨ Key Highlights:

  • 🔄 Asymmetric Multi-modal Fusion: Thermal images are fused with RGB in the encoder stage to extract rich semantic features. In contrast, we treat Depth as prior geometric information. It is integrated via a Prior-related Fusion (PF) module in the later stages to refine coarse predictions, avoiding noise interference from sparse depth maps during early feature extraction.
  • 📏 Scale-invariant Feature Matching: To address significant object scale variations between support and query images, we propose Pixel-to-Patch Pooling (PTP-pool) units that utilize multi-scale pooling kernels to generate feature patches, enabling robust correlation modeling between pixels and patches across different sizes.
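To make the Pixel-to-Patch matching idea concrete, below is a minimal NumPy sketch of multi-scale patch pooling followed by pixel-to-patch cosine correlation. This is an illustration of the general technique only, not the paper's PTP-pool implementation; the function names, kernel sizes, and the choice of average pooling are all assumptions.

```python
import numpy as np

def ptp_pool(feat, kernel_sizes=(2, 4)):
    """Pool a support feature map (C, H, W) into patch tokens at
    multiple scales (illustrative; average pooling is an assumption).
    Returns an array of shape (N_patches, C)."""
    C, H, W = feat.shape
    patches = []
    for k in kernel_sizes:
        # Non-overlapping k x k windows at this scale.
        for i in range(0, H - k + 1, k):
            for j in range(0, W - k + 1, k):
                patches.append(feat[:, i:i + k, j:j + k].mean(axis=(1, 2)))
    return np.stack(patches)

def pixel_to_patch_correlation(query, support_patches):
    """Cosine similarity between every query pixel and every
    support patch token. Returns shape (H*W, N_patches)."""
    C, H, W = query.shape
    q = query.reshape(C, -1).T                                   # (H*W, C)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    p = support_patches / (np.linalg.norm(support_patches, axis=1,
                                          keepdims=True) + 1e-8)
    return q @ p.T
```

Because patches at several kernel sizes enter the same correlation map, a small object in the query can still match a large support object through its coarser patch tokens, which is the scale-invariance intuition described above.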

Changelogs

  • 260221: adapted the code to PyTorch 2.7 / CUDA 12.6

⚡ Getting Started

Requirements

  • Python 3.10
  • PyTorch 2.7.0
  • CUDA 12.6

Conda environment settings:

conda create -n sfmnet python=3.10
conda activate sfmnet

pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126 # adjust the index URL to match your CUDA version

pip install -r requirements.txt
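After installation, a quick import check can confirm that the core dependencies resolved before launching training. This snippet is a generic sketch, not part of the repository; the package list is illustrative and can be extended with entries from requirements.txt.

```python
import importlib.util

def check_packages(packages=("torch", "torchvision")):
    """Return the subset of packages that cannot be imported."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

if __name__ == "__main__":
    missing = check_packages()
    print("all packages found" if not missing else f"missing: {missing}")
```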

Dataset

Pretrained checkpoint download

Train and Test

Use scripts/train.sh and scripts/test.sh for training and inference, respectively.

Benchmark Results (ResNet-50)

|                      | fold0 | fold1 | fold2 | fold3 | mIoU | FB-IoU |
| -------------------- | ----- | ----- | ----- | ----- | ---- | ------ |
| VDT-2048-5i (1-shot) | 82.7  | 71.6  | 66.7  | 57.5  | 69.6 | 83.6   |
| VDT-2048-5i (5-shot) | 82.8  | 71.7  | 67.2  | 57.5  | 69.8 | 83.7   |
| Tokyo (1-shot)       | 32.1  | 16.9  | 34.8  | 47.5  | 32.8 | 62.2   |
| Tokyo (5-shot)       | 34.4  | 21.3  | 37.2  | 48.2  | 35.3 | 63.5   |

The benchmark results of our work can be accessed at:

Acknowledgement

This work is built upon DCAMA and PMNet. Thanks to the authors for their open-source contributions!

Citation

If you find our work useful, please cite our paper. Thank you!

@ARTICLE{zhou2026scale,
  author={Zhou, Xiaofei and Lin, Jia and Chen, Dongmei and Liu, Deyang and Zhang, Jiyong and Cong, Runmin},
  journal={IEEE Transactions on Image Processing}, 
  title={Scale-Invariant Feature Matching Network for V-D-T Few-Shot Semantic Segmentation}, 
  year={2026},
  volume={35},
  number={},
  pages={2198-2209},
  doi={10.1109/TIP.2026.3663882}
}
