An experimental playground for crafting adversarial attacks against deep Optical Character Recognition (OCR) models. Inspired by the original CLOVA AI benchmark. The project currently implements the BIM (Basic Iterative Method) attack to demonstrate how small perturbations can flip model predictions.
- Adversarial attack implementation: BIM (extensible design for more)
- Visualization of original vs adversarial images and perturbation maps
- Evaluation metrics: attack success rate, L2 and L∞ norms
- Supports both CTC-based and Attention-based recognition models
- Cross-platform: dedicated Windows script (`attack_win.py`)
Place the following weights inside saved_models/:
- ✅ None-ResNet-None-CTC.pth
- ✅ None-VGG-BiLSTM-CTC.pth
- ✅ TPS-ResNet-BiLSTM-Attn.pth
- ✅ TPS-ResNet-BiLSTM-CTC.pth
```
TextVenom/
├── attack.py            # Main attack script (Linux/macOS)
├── attack_win.py        # Windows-optimized script
├── src/
│   ├── model.py         # Model definitions
│   ├── dataset.py       # Dataset utilities
│   ├── utils.py         # Helper functions
│   ├── visualization.py # Visualization helpers
│   └── modules/         # Submodules
├── saved_models/        # Pretrained weights
├── CUTE80/              # Test dataset
└── README.md            # English documentation
```
Follow the upstream repository for dependency setup: https://github.com/clovaai/deep-text-recognition-benchmark
Link: https://drive.google.com/drive/folders/15WPsuPJDCzhp2SvYZLRj8mAlT3zmoAMW
Place the four model files into saved_models/.
Windows:

```
python attack_win.py
python attack_win.py --model_path "saved_models/TPS-ResNet-BiLSTM-Attn.pth"
```

Linux / macOS:

```
python attack.py
python attack.py --model_path "saved_models/TPS-ResNet-BiLSTM-Attn.pth"
```

Automatically generated outputs:
- Original vs adversarial image comparison
- Original vs adversarial predictions
- Perturbation heat/difference map
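A difference map can be produced by rescaling the absolute per-pixel perturbation for display. This is a minimal NumPy sketch; the repository's `src/visualization.py` may implement it differently:

```python
import numpy as np

def perturbation_map(clean, adv):
    """Absolute per-pixel difference, rescaled to [0, 1] for display."""
    diff = np.abs(adv - clean)
    peak = diff.max()
    return diff / peak if peak > 0 else diff

# Toy 2x2 example: the largest perturbation maps to 1.0.
clean = np.zeros((2, 2))
adv = np.array([[0.1, 0.0],
                [0.0, 0.2]])
hmap = perturbation_map(clean, adv)
```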
Iterative procedure:
- Initialize with the clean input
- Compute gradients and apply a step update
- Project back into the L∞ ball of radius ε around the clean input
- Stop after fixed iterations or early criteria
Parameters:
- epsilon: maximum perturbation (default 0.3)
- alpha: step size (default 0.01)
- num_iterations: number of iterations (default 20)
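The iterative procedure above can be sketched in NumPy against a toy differentiable loss. A real attack would backpropagate through the OCR model to get the input gradient; here `grad_fn` and the toy linear loss are illustrative stand-ins, while `epsilon`, `alpha`, and `num_iterations` mirror the defaults listed above:

```python
import numpy as np

def bim_attack(x, grad_fn, epsilon=0.3, alpha=0.01, num_iterations=20):
    """Basic Iterative Method: repeated signed-gradient steps,
    each projected back into the L-infinity ball around x."""
    x_adv = x.copy()                           # initialize with the clean input
    for _ in range(num_iterations):
        g = grad_fn(x_adv)                     # gradient of the loss w.r.t. the input
        x_adv = x_adv + alpha * np.sign(g)     # signed step (ascend the loss)
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)       # keep a valid image range
    return x_adv

# Toy loss: sum(w * x), whose input gradient is simply w.
x = np.full(4, 0.5)
w = np.array([1.0, -1.0, 1.0, -1.0])
x_adv = bim_attack(x, lambda z: w)
```

With 20 steps of size 0.01, the total perturbation is 0.2 per pixel, comfortably inside the default ε = 0.3 ball.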
- Attack success rate
- Mean L2 norm
- Mean L∞ norm
Example:
Model: TPS-ResNet-BiLSTM-Attn
Attack Success Rate: 85.3%
Mean L2: 0.123
Mean L∞: 0.301
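These metrics can be computed from the clean/adversarial image pairs roughly as follows (a sketch; the names `clean`, `adv`, and `flipped` are illustrative, not taken from the repository):

```python
import numpy as np

def attack_metrics(clean, adv, flipped):
    """Aggregate attack success rate and mean perturbation norms.

    clean, adv: arrays of shape (N, H, W) with pixel values in [0, 1]
    flipped:    boolean array of shape (N,), True if the prediction changed
    """
    delta = (adv - clean).reshape(len(clean), -1)   # one flat vector per image
    return {
        "success_rate": float(np.mean(flipped)),
        "mean_l2": float(np.mean(np.linalg.norm(delta, axis=1))),
        "mean_linf": float(np.mean(np.abs(delta).max(axis=1))),
    }

# Two 4x4 images, each perturbed uniformly by 0.25; one prediction flips.
clean = np.zeros((2, 4, 4))
adv = clean + 0.25
m = attack_metrics(clean, adv, np.array([True, False]))
```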
Highlights the vulnerability of OCR systems under carefully crafted perturbations and provides a baseline for robustness and defense research.
- Adversarial training
- Input preprocessing (denoise / smoothing)
- Ensemble strategies
- Adversarial sample detection
For research and security evaluation only. Do not use for malicious or illegal purposes. You assume all responsibility.
- Deep Text Recognition Benchmark (CLOVA AI)
- Kurakin et al., "Adversarial Examples in the Physical World", 2016 (BIM)
- Goodfellow et al., "Explaining and Harnessing Adversarial Examples", 2014 (FGSM)
Issues and PRs are welcome—especially new attack/defense methods.
"The best way to attack is to make the enemy think they're winning while you control the game."