Native Multimodal Models are World Learners
Next-Token Prediction is All You Need
Emu Series: Generative Multimodal Models from BAAI
EVA Series: Visual Representation Fantasies from BAAI
Painter & SegGPT Series: Vision Foundation Models from BAAI
[CVPR'25 Highlight] You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
[ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
[ICLR 2026] Unified Vision-Language-Action Model
MTVCraft: An Open Veo3-style Audio-Video Generation Demo
[NeurIPS 2025] Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
EVE Series: Encoder-Free Vision-Language Models from BAAI