Tags

Pages tagged world-model

3D-VLA: A 3D Vision-Language-Action Generative World Model

📄 **[Read on arXiv](https://arxiv.org/abs/2403.09631)** 3D-VLA addresses a fundamental limitation of existing vision-language-action models: their reliance on 2D visual representations, which lack the spatial depth unde…

Cosmos World Foundation Model Platform for Physical AI

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2501.03575)** The Cosmos World Foundation Model Platform addresses Physical AI's critical challenge: the scarcity of safe, high-quality training data. By providing high-fidelity…

Drive-OccWorld: Driving in the Occupancy World

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2408.14197)** Drive-OccWorld introduces a vision-centric 4D occupancy forecasting world model that directly integrates with end-to-end planning. The core premise is that current…

DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2309.09777)** DriveDreamer (ECCV 2024) is the first world model built entirely from real-world driving data, addressing fundamental limitations of prior approaches that relied on simu…

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2412.10373)** GaussianWorld introduces a world model paradigm for 3D occupancy prediction that explicitly models scene evolution over time, rather than treating frames as indepe…

GenAD: Generalized Predictive Model for Autonomous Driving

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2403.09630)** **Note:** This is the CVPR 2024 Highlight paper on large-scale video prediction for driving, not the ECCV 2024 paper wiki/sources/papers/genad-generative-end-to-…

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2501.14729)** HERMES tackles a fundamental limitation in autonomous driving: existing systems treat 3D scene understanding and future scene generation as separate problems. Driv…

LAW: Enhancing End-to-End Autonomous Driving with Latent World Model

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2406.08481)** LAW (CASIA, ICLR 2025) introduces a self-supervised latent world model that enhances end-to-end autonomous driving by learning to predict future latent states of the dri…

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2311.16038)** OccWorld introduces a generative world model that operates in 3D semantic occupancy space, jointly forecasting future scene evolution and ego vehicle trajectories.…

UniSim: Learning Interactive Real-World Simulators

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2310.06114)** UniSim addresses a fundamental bottleneck in embodied AI: the lack of high-fidelity, interactive simulators that generalize across domains. Rather than building se…

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2405.17398)** Vista (NeurIPS 2024) is a generalizable driving world model that achieves high-fidelity video prediction at 10 Hz and 576x1024 resolution with versatile multi-moda…

WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2504.01941)** End-to-end driving models typically output a single trajectory and trust it entirely, with no mechanism to evaluate whether the predicted path is safe before execu…