I'm an engineer at xAI focusing on multimodal models, VLMs, and world models.
🤗 Open Source Projects:
- Megatron-LM MoE: Scaling up mixture of experts
- ActGPT: browser-use agent
- Cosmos: state-of-the-art generative world models
- NeMo DFM: large-scale training and inference framework for diffusion models
- NeMo: scalable training framework for LLMs and transformers
- LongVILA: Long-Context VLM for long videos (ICLR'25)
- Channel Pruning: Accelerating Very Deep Neural Networks (ICCV'17)
- Epipolar Transformers: Accurate multi-camera pose understanding (CVPR'20)
- AMC: AutoML for model compression (ECCV'18)
- KL Loss: Accurate Object Detection (CVPR'19)
- FSAF: single-shot object detection (CVPR'19)
🎙️ Invited Talks





