Papers

92 paper summaries tagged autonomous-driving

SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving
2026 arXiv

📄 **[Read on arXiv](https://arxiv.org/abs/2603.29163)** SparseDriveV2 by Sun et al. (2026) pushes the performance boundary of scoring-based trajectory planning by demonstrating that "scoring is all you ne…

paper autonomous-driving end-to-end sparse-representation +1
DrivoR: Driving on Registers
2026 arXiv 3

📄 **[Read on arXiv](https://arxiv.org/abs/2601.05083)** DrivoR is a full-transformer autonomous driving architecture that uses camera-aware register tokens to compress multi-camera Vision Transformer features into a com…

paper autonomous-driving e2e perception +3
WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model
2025 arXiv 81

📄 **[Read on arXiv](https://arxiv.org/abs/2504.01941)** End-to-end driving models typically output a single trajectory and trust it entirely, with no mechanism to evaluate whether the predicted path is safe before execu…

paper autonomous-driving vla world-model +3
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
2025 CVPR 2025

📄 **[Read on arXiv](https://arxiv.org/abs/2505.16805)** SOLVE proposes a synergistic framework that combines a Vision-Language Model (VLM) reasoning branch (SOLVE-VLM) with an end-to-end (E2E) driving network (SOLVE-E2E), con…

paper autonomous-driving vla chain-of-thought +1
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
2025 CVPR 2025 89

📄 **[Read on arXiv](https://arxiv.org/abs/2503.09594)** Many driving VLM efforts improve language understanding (VQA, scene descriptions) but sacrifice actual driving performance. A model can correctly answer questions…

paper autonomous-driving vla vlm +3
S4-Driver: Scalable Self-Supervised Driving MLLM with Spatio-Temporal Visual Representation
2025 CVPR 16

📄 **[Read on arXiv](https://arxiv.org/abs/2505.24139)** S4-Driver is a self-supervised framework that adapts Multimodal Large Language Models (MLLMs) for autonomous vehicle motion planning. The system processes multi-vi…

paper autonomous-driving self-supervised multimodal +3
Pseudo-Simulation for Autonomous Driving (NAVSIM v2)
2025 CoRL 2025 62

📄 **[Read on arXiv](https://arxiv.org/abs/2506.04218)** Pseudo-Simulation by Cao, Hallgarten et al. (Tübingen / Shanghai AI Lab / NVIDIA / Stanford, CoRL 2025) introduces a novel evaluation paradigm for a…

paper autonomous-driving benchmark simulation +1
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
2025 arXiv 100

📄 **[Read on arXiv](https://arxiv.org/abs/2503.19755)** ORION bridges the reasoning-action gap in driving VLAs through a three-component architecture consisting of QT-Former (visual encoding), an LLM reasoning core, and…

paper autonomous-driving vla vlm +3
OpenDriveVLA: Towards End-to-End Autonomous Driving with Large Vision Language Action Model
2025 arXiv

📄 **[Read on arXiv](https://arxiv.org/abs/2503.23463)** OpenDriveVLA introduces a Vision-Language Action model specifically designed for end-to-end autonomous driving. Unlike previous approaches that use VLMs as supplem…

autonomous-driving vla end-to-end language-model
OccMamba: Semantic Occupancy Prediction with State Space Models
2025 CVPR 32

📄 **[Read on arXiv](https://arxiv.org/abs/2408.09859)** OccMamba is the first Mamba-based network for semantic occupancy prediction, replacing transformer architectures' quadratic complexity with Mamba's linear complexity…

paper autonomous-driving perception 3d-occupancy +2
MomAD: Momentum-Aware Planning in End-to-End Autonomous Driving
2025 CVPR 60

📄 **[Read on arXiv](https://arxiv.org/abs/2503.03125)** End-to-end autonomous driving systems suffer from a critical limitation: temporal inconsistency. Current systems operate in a "one-shot" manner, making trajectory…

autonomous-driving planning end-to-end trajectory-prediction
LAW: Enhancing End-to-End Autonomous Driving with Latent World Model
2025 ICLR

📄 **[Read on arXiv](https://arxiv.org/abs/2406.08481)** LAW (CASIA, ICLR 2025) introduces a self-supervised latent world model that enhances end-to-end autonomous driving by learning to predict future latent states of the dri…

paper autonomous-driving world-model self-supervised +1
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
2025 arXiv 38

📄 **[Read on arXiv](https://arxiv.org/abs/2501.14729)** HERMES tackles a fundamental limitation in autonomous driving: existing systems treat 3D scene understanding and future scene generation as separate problems. Driv…

autonomous-driving world-model 3d-scene perception +1
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectory Generation
2025 CVPR

📄 **[Read on arXiv](https://arxiv.org/abs/2503.05689)** GoalFlow (Horizon Robotics / HKU, CVPR 2025) introduces a goal-driven flow matching framework for multimodal trajectory generation in autonomous driving. The method achi…
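
The flow-matching half of the method can be sketched generically: train a network to regress the velocity field that transports Gaussian noise onto ground-truth trajectories. A minimal training step, assuming a flattened waypoint representation and a toy MLP rather than the paper's goal-conditioned architecture:

```python
import torch
import torch.nn as nn

T, D = 8, 2                                    # waypoints x (x, y)
net = nn.Sequential(nn.Linear(T * D + 1, 256), nn.ReLU(),
                    nn.Linear(256, T * D))     # velocity field v(x_t, t)

traj = torch.randn(32, T * D)                  # ground-truth trajectories, flattened
noise = torch.randn_like(traj)                 # samples from the source distribution
t = torch.rand(32, 1)                          # random interpolation times
x_t = (1 - t) * noise + t * traj               # straight-line probability path
target_v = traj - noise                        # its constant velocity
pred_v = net(torch.cat([x_t, t], dim=1))
loss = ((pred_v - target_v) ** 2).mean()       # flow-matching regression loss
loss.backward()
```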

paper autonomous-driving flow-matching planning +1
GaussRender: Learning 3D Occupancy with Gaussian Rendering
2025 ICCV 2025 13

📄 **[Read on arXiv](https://arxiv.org/abs/2502.05040)** GaussRender by Chambon et al. (Valeo AI / Sorbonne, ICCV 2025) introduces a plug-and-play training-time module that improves 3D occupancy prediction…

paper autonomous-driving perception 3d-occupancy +1
GaussianLSS: Toward Real-world BEV Perception with Depth Uncertainty via Gaussian Splatting
2025 CVPR 18

📄 **[Read on arXiv](https://arxiv.org/abs/2504.01957)** Bird's-Eye View (BEV) perception faces a fundamental trade-off between accuracy and computational efficiency. High-performing 3D projection methods like BEVFormer…

paper autonomous-driving bev perception +2
GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
2025 ICCV 2025 19

📄 **[Read on arXiv](https://arxiv.org/abs/2502.17288)** GaussianFlowOcc (ICCV 2025) introduces a transformative approach to 3D semantic occupancy estimation for autonomous driving by replacing traditional…

paper autonomous-driving perception 3d-occupancy +2
EMMA: End-to-End Multimodal Model for Autonomous Driving
2025 TMLR 150

📄 **[Read on arXiv](https://arxiv.org/abs/2410.23262)** EMMA is Waymo's industry-scale demonstration of the "everything as language tokens" paradigm for autonomous driving. A single large multimodal foundation model uni…

paper autonomous-driving vla vlm +3
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
2025 ICLR 2025 91

📄 **[Read on arXiv](https://arxiv.org/abs/2503.07656)** DriveTransformer represents a fundamental departure from existing end-to-end autonomous driving approaches. Rather than following sequential perception-prediction-…

autonomous-driving transformer end-to-end planning
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
2025 arXiv 55

📄 **[Read on arXiv](https://arxiv.org/abs/2505.16278)** DriveMoE introduces a dual-level Mixture-of-Experts (MoE) architecture to driving Vision-Language-Action models. The key innovation is applying expert specializati…

paper autonomous-driving vla mixture-of-experts +3
DriveGPT: Scaling Autoregressive Behavior Models for Driving
2025 ICML

📄 **[Read on arXiv](https://arxiv.org/abs/2412.14415)** DriveGPT (Cruise, ICML 2025) is the first work to systematically study scaling laws for autoregressive behavior models in autonomous driving. Drawing inspiration from th…

paper autonomous-driving scaling-laws autoregressive +1
DiMA: Distilling Multi-Modal Large Language Models for Autonomous Driving
2025 CVPR 34

📄 **[Read on arXiv](https://arxiv.org/abs/2501.09757)** DiMA addresses the core tension in autonomous driving between vision-based planners (efficient but fragile on rare scenarios) and LLM-based approaches (strong reas…

autonomous-driving knowledge-distillation multimodal language-model
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
2025 CVPR

📄 **[Read on arXiv](https://arxiv.org/abs/2411.15139)** DiffusionDrive (HUST/Horizon Robotics, CVPR 2025 Highlight) proposes a truncated diffusion model for end-to-end autonomous driving that achieves real-time inference whil…

paper autonomous-driving diffusion end-to-end +1
CarPlanner: Consistent Auto-regressive RL Planner for Autonomous Driving
2025 CVPR

📄 **[Read on arXiv](https://arxiv.org/abs/2502.19908)** CarPlanner (Zhejiang University + Cainiao Network, CVPR 2025) introduces a consistent autoregressive reinforcement learning planner that is the first RL-based planner to…

paper autonomous-driving reinforcement-learning planning +1
BridgeAD: Bridging Past and Future End-to-End Autonomous Driving with Historical Prediction
2025 CVPR 22

📄 **[Read on arXiv](https://arxiv.org/abs/2503.14182)** BridgeAD tackles a critical limitation in end-to-end autonomous driving: the ineffective utilization of historical temporal information. Current systems either agg…

paper autonomous-driving end-to-end prediction +2
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
2025 CVPR 14

📄 **[Read on arXiv](https://arxiv.org/abs/2502.19694)** BEVDiffuser addresses a fundamental but under-explored problem in BEV-based perception: the inherent noise in BEV feature maps caused by sensor limitations and the l…

paper autonomous-driving perception bev +2
AutoVLA: Vision-Language-Action Model for End-to-End Autonomous Driving
2025 arXiv 110

📄 **[Read on arXiv](https://arxiv.org/abs/2506.13757)** AutoVLA presents a unified approach to autonomous driving that integrates vision, language understanding, and action generation within a single autoregressive mode…

autonomous-driving vla reinforcement-learning end-to-end +1
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
2025 arXiv 75

Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, Xinggang Wang, arXiv, 2025. 📄 **[Read on arXiv](https://arxiv.org/abs/2503.07608)** AlphaDrive is the first application of GRPO (Group Relative Policy Optimization) reinforc…
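
GRPO's core trick is replacing a learned value baseline with group statistics: sample several responses per prompt and normalize rewards within the group. A minimal sketch of that advantage computation, not AlphaDrive's full training pipeline:

```python
import torch

# Rewards for a group of 4 sampled responses to each of 2 prompts.
rewards = torch.tensor([[0.2, 1.0, 0.0, 0.7],
                        [0.5, 0.5, 0.9, 0.1]])
# Group-relative advantage: z-score within each prompt's group,
# replacing PPO's learned value baseline.
adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (
    rewards.std(dim=1, keepdim=True) + 1e-6)
# Tokens of response j to prompt i are then reinforced with weight
# adv[i, j] under a PPO-style clipped ratio objective.
print(adv)
```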

paper autonomous-driving vla vlm +3
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
2025 arXiv 42

Yan Wang, Wenjie Luo, Junjie Bai, Yulong Cao, Marco Pavone + 37 co-authors (NVIDIA), arXiv, 2025. 📄 **[Read on arXiv](https://arxiv.org/abs/2511.00088)** Alpamayo-R1 is NVIDIA's production-grade Vision-Language-Action (…

paper autonomous-driving vla vlm +3
VLP: Vision Language Planning for Autonomous Driving
2024 CVPR 155

📄 **[Read on arXiv](https://arxiv.org/abs/2401.05577)** VLP (Vision Language Planning) by Pan et al. (CVPR 2024) represents a fundamentally different approach to using language in autonomous driving compared to instruct…

paper autonomous-driving vla vlm +2
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
2024 NeurIPS 2024

📄 **[Read on arXiv](https://arxiv.org/abs/2405.17398)** Vista (NeurIPS 2024) is a generalizable driving world model that achieves high-fidelity video prediction at 10 Hz and 576x1024 resolution with versatile multi-moda…

autonomous-driving world-model diffusion video-prediction +3
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
2024 arXiv 140

📄 **[Read on arXiv](https://arxiv.org/abs/2402.13243)** VADv2 by Chen et al. (2024) is the successor to VAD, addressing a fundamental limitation of deterministic planners in autonomous driving: they output a single traj…
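
The probabilistic-planning idea can be sketched as scoring a fixed vocabulary of candidate trajectories and taking a softmax over them. Vocabulary size, feature dimensions, and the scoring head below are illustrative assumptions, not the paper's exact model:

```python
import torch
import torch.nn as nn

n_vocab, feat_dim = 4096, 256
traj_vocab = torch.randn(n_vocab, 6, 2)         # candidate trajectories (6 waypoints each)
traj_embed = nn.Linear(12, feat_dim)            # embeds a flattened candidate
score_head = nn.Bilinear(feat_dim, feat_dim, 1) # scene-conditioned compatibility score

scene_feat = torch.randn(1, feat_dim)           # from the driving scene encoder
emb = traj_embed(traj_vocab.flatten(1))         # (n_vocab, feat_dim)
logits = score_head(scene_feat.expand(n_vocab, -1), emb).squeeze(-1)
probs = logits.softmax(dim=0)                   # distribution over candidates
plan = traj_vocab[probs.argmax()]               # selected trajectory
```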

autonomous-driving end-to-end planning vectorized-representation +2
Talk2Drive Towards Personalized Autonomous Driving With Large Language Models
2024 IEEE ITSC 2024 80

📄 **[Read on arXiv](https://arxiv.org/abs/2312.09397)** Talk2Drive introduces an LLM-based framework for personalized autonomous driving through natural language interaction, demonstrated in real-world field experiments…

autonomous-driving llm planning nlp +2
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
2024 CVPR 50

📄 **[Read on arXiv](https://arxiv.org/abs/2404.09502)** Dense 3D occupancy prediction from multi-view cameras has become a key perception task for autonomous driving, but most methods process the full voxel volume -- in…

autonomous-driving perception 3d-occupancy computer-vision +2
SparseOcc: Fully Sparse 3D Occupancy Prediction
2024 ECCV 80

📄 **[Read on arXiv](https://arxiv.org/abs/2312.17118)** 3D occupancy prediction has become a critical perception paradigm for autonomous driving, but existing methods process dense 3D volumes even though over 90% of vox…

autonomous-driving perception 3d-occupancy sparse-representation +3
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
2024 ICRA 2025 181

📄 **[Read on arXiv](https://arxiv.org/abs/2405.19620)** SparseDrive by Sun et al. (ICRA 2025) proposes a paradigm shift from dense BEV-based end-to-end driving to fully sparse scene representations. The c…

paper autonomous-driving end-to-end sparse-representation +1
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
2024 arXiv 102

📄 **[Read on arXiv](https://arxiv.org/abs/2410.22313)** Two dominant paradigms exist in autonomous driving: large vision-language models (LVLMs) with strong reasoning but poor trajectory precision, and end-to-end (E2E)…

paper autonomous-driving vla vlm +3
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
2024 CVPR 60

📄 **[Read on arXiv](https://arxiv.org/abs/2311.12754)** SelfOcc (Huang et al., Tsinghua University, CVPR 2024) introduces the first self-supervised framework for vision-based 3D occupancy prediction that works with mult…

autonomous-driving perception 3d-occupancy self-supervised +3
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
2024 CVPR 2025 15

📄 **[Read on arXiv](https://arxiv.org/abs/2412.12725)** RaCFormer by Chu et al. (USTC, CVPR 2025) addresses a fundamental problem in radar-camera fusion for 3D object detection: the image-to-BEV transform…

paper autonomous-driving perception radar +2
PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
2024 CVPR 2024

📄 **[Read on CVF Open Access](https://openaccess.thecvf.com/content/CVPR2024/html/Weng_PARA-Drive_Parallelized_Architecture_for_Real-time_Autonomous_Driving_CVPR_2024_paper.html)** PARA-Drive (NVIDIA Research / USC / Stanford…

paper autonomous-driving end-to-end real-time +1
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
2024 ECCV 198

📄 **[Read on arXiv](https://arxiv.org/abs/2311.16038)** OccWorld introduces a generative world model that operates in 3D semantic occupancy space, jointly forecasting future scene evolution and ego vehicle trajectories.…

autonomous-driving world-model 3d-occupancy planning
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
2024 ECCV 50

📄 **[Read on arXiv](https://arxiv.org/abs/2404.15014)** OccGen reframes 3D semantic occupancy prediction as a conditional generative problem rather than a purely discriminative one. Prior occupancy methods (SurroundOcc,…

autonomous-driving perception 3d-occupancy diffusion +3
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
2024 NeurIPS 2024 100

📄 **[Read on arXiv](https://arxiv.org/abs/2406.15349)** Autonomous vehicle evaluation has long been split between two unsatisfying extremes: open-loop metrics that replay logged trajectories and compare p…

autonomous-driving benchmark simulation evaluation +2
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
2024 CVPR 294

📄 **[Read on arXiv](https://arxiv.org/abs/2312.07488)** LMDrive is the first system to demonstrate and benchmark LLM-based driving in closed-loop simulation, introducing the LangAuto benchmark with ~64K instruction-foll…

paper autonomous-driving llm e2e +2
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
2024 CVPR 2024

📄 **[Read on arXiv](https://arxiv.org/abs/2312.03031)** This paper (CVPR 2024, NVIDIA / Nanjing University) delivers a "wake-up call" to the autonomous driving research community by demonstrating that simple baselines using o…

paper autonomous-driving evaluation benchmark +1
Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra-Distillation
2024 CVPR 2024 Autonomous Grand Challenge (1st place) 50

📄 **[Read on arXiv](https://arxiv.org/abs/2406.06978)** Hydra-MDP addresses a fundamental limitation of imitation learning for autonomous driving: standard behavior cloning learns only to mimic human demo…
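
The multi-target distillation can be sketched as one scoring head per rule-based teacher, each trained to predict that teacher's verdict for every trajectory candidate. Teacher names and dimensions below are illustrative assumptions:

```python
import torch
import torch.nn as nn

n_cand, feat_dim = 8192, 256
teachers = ["no_collision", "drivable_area", "time_to_collision"]
heads = nn.ModuleDict({t: nn.Linear(feat_dim, 1) for t in teachers})

cand_feats = torch.randn(n_cand, feat_dim)                  # per-candidate features
teacher_scores = {t: torch.rand(n_cand) for t in teachers}  # offline simulator labels

# Each head is distilled against its own teacher metric.
loss = sum(
    nn.functional.binary_cross_entropy_with_logits(
        heads[t](cand_feats).squeeze(-1), teacher_scores[t])
    for t in teachers)
loss.backward()
```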

autonomous-driving end-to-end planning knowledge-distillation +2
GenAD: Generative End-to-End Autonomous Driving
2024 ECCV

📄 **[Read on arXiv](https://arxiv.org/abs/2402.11502)** GenAD (ECCV 2024) reframes end-to-end autonomous driving as a generative modeling problem, simultaneously generating future trajectories for all traffic participants rat…

paper autonomous-driving end-to-end generative +1
GenAD: Generalized Predictive Model for Autonomous Driving
2024 CVPR 2024 Highlight

📄 **[Read on arXiv](https://arxiv.org/abs/2403.09630)** > **Note:** This is the CVPR 2024 Highlight paper on large-scale video prediction for driving, NOT the ECCV 2024 paper wiki/sources/papers/genad-generative-end-to-…

autonomous-driving video-prediction diffusion foundation-model +2
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
2024 arXiv 41

📄 **[Read on arXiv](https://arxiv.org/abs/2412.13193)** GaussTR is a Gaussian-based Transformer framework that achieves zero-shot semantic occupancy prediction without any 3D annotations. The key idea is to combine sparse…

paper autonomous-driving perception 3d-occupancy +2
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
2024 arXiv 2024 59

📄 **[Read on arXiv](https://arxiv.org/abs/2412.10373)** GaussianWorld introduces a world model paradigm for 3D occupancy prediction that explicitly models scene evolution over time, rather than treating frames as indepe…

autonomous-driving world-model 3d-occupancy gaussian-splatting +1
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
2024 arXiv 2024 47

📄 **[Read on arXiv](https://arxiv.org/abs/2408.11447)** GaussianOcc by Gan et al. (University of Tokyo / RIKEN / South China University of Technology / SIAT-CAS) is a systematic method that applies Gaussi…

paper autonomous-driving perception 3d-occupancy +2
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
2024 arXiv 57

📄 **[Read on arXiv](https://arxiv.org/abs/2412.04384)** GaussianFormer-2 addresses 3D semantic occupancy prediction for vision-centric autonomous driving by rethinking how 3D Gaussians represent occupied space. The origin…

paper autonomous-driving perception 3d-occupancy +1
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
2024 ECCV 128

📄 **[Read on arXiv](https://arxiv.org/abs/2405.17429)** GaussianFormer introduces a fundamentally different scene representation for 3D semantic occupancy prediction: instead of dense voxel grids, scenes are modeled as…

autonomous-driving perception 3d-occupancy gaussian-representation
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation
2024 arXiv 20

📄 **[Read on arXiv](https://arxiv.org/abs/2407.14108)** Bird's-eye view (BEV) semantic segmentation from multi-camera images is a core perception task in autonomous driving, but existing image-to-BEV transformation meth…

autonomous-driving perception bev gaussian-splatting +2
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
2024 ICRA

📄 **[Read on arXiv](https://arxiv.org/abs/2310.01957)** Driving with LLMs (Wayve, ICRA 2024) is one of the first concrete demonstrations of using a large language model as the decision-making "brain" for autonomous driving. T…

paper autonomous-driving language-model explainability +1
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Driving Scenes
2024 CVPR 398

📄 **[Read on arXiv](https://arxiv.org/abs/2312.07920)** DrivingGaussian addresses photorealistic 3D scene reconstruction for dynamic autonomous driving environments using Gaussian splatting. The core challenge is that d…

autonomous-driving 3d-reconstruction gaussian-splatting simulation
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
2024 arXiv 416

📄 **[Read on arXiv](https://arxiv.org/abs/2402.12289)** DriveVLM proposes a hierarchical approach to integrating Vision-Language Models into autonomous driving, emphasizing scene understanding and multi-level planning r…

paper autonomous-driving vlm planning
DriveLM: Driving with Graph Visual Question Answering
2024 ECCV 448

📄 **[Read on arXiv](https://arxiv.org/abs/2312.14150)** DriveLM formalizes driving reasoning as Graph Visual Question Answering (GVQA), where QA pairs are connected via logical dependencies forming a reasoning graph tha…

paper autonomous-driving vlm reasoning +2
DriveGPT4: Interpretable End-to-End Autonomous Driving via Large Language Model
2024 IEEE RA-L 576

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao, IEEE Robotics and Automation Letters, 2024. 📄 **[Read on arXiv](https://arxiv.org/abs/2310.01412)** DriveGPT4 applie…

paper autonomous-driving vla vlm +2
DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
2024 ECCV 2024

📄 **[Read on arXiv](https://arxiv.org/abs/2309.09777)** DriveDreamer (ECCV 2024) is the first world model built entirely from real-world driving data, addressing fundamental limitations of prior approaches that relied on simu…

paper autonomous-driving world-model generation +1
Drive-OccWorld: Driving in the Occupancy World
2024 AAAI 2025 49

📄 **[Read on arXiv](https://arxiv.org/abs/2408.14197)** Drive-OccWorld introduces a vision-centric 4D occupancy forecasting world model that directly integrates with end-to-end planning. The core premise is that current…

paper autonomous-driving world-model 3d-occupancy +3
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
2024 WACV 2025 Oral 30

📄 **[Read on arXiv](https://arxiv.org/abs/2408.10845)** Autonomous driving systems face the "long tail" problem -- handling countless rare and complex driving scenarios beyond common situations. While traditional rule-b…

autonomous-driving vla multimodal dataset +3
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
2024 CVPR 2024 80

📄 **[Read on arXiv](https://arxiv.org/abs/2312.01696)** BEVNeXt revives dense BEV (bird's-eye-view) frameworks for camera-based 3D object detection, demonstrating that with the right design choices, dense approaches can…

autonomous-driving perception bev transformer +2
AsyncDriver: Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
2024 ECCV 41

📄 **[Read on arXiv](https://arxiv.org/abs/2406.14556)** AsyncDriver addresses the practical deployment problem of LLM-enhanced driving planners: LLMs are too slow for frame-by-frame planning. The key insight is that hig…

autonomous-driving language-model planning asynchronous
Agent-Driver: A Language Agent for Autonomous Driving
2024 COLM 2024 140

📄 **[Read on arXiv](https://arxiv.org/abs/2311.10813)** Agent-Driver reframes autonomous driving as a cognitive agent problem, positioning a large language model as the central reasoning and planning engine rather than…

paper autonomous-driving llm planning +3
VAD: Vectorized Scene Representation for Efficient Autonomous Driving
2023 ICCV 567

📄 **[Read on arXiv](https://arxiv.org/abs/2303.12077)** VAD (Vectorized Scene Representation for Efficient Autonomous Driving) by Jiang et al. (ICCV 2023) is a pivotal paper in the shift from dense rasterized scene repr…

paper autonomous-driving planning vectorized-representation
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
2023 CVPR 2023 180

📄 **[Read on arXiv](https://arxiv.org/abs/2305.06242)** Think Twice (Jia et al., 2023) addresses a fundamental imbalance in end-to-end autonomous driving: while the community has invested heavily in sophisticated encode…

autonomous-driving end-to-end planning imitation-learning +2
SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving
2023 ICCV

📄 **[Read on arXiv](https://arxiv.org/abs/2303.09551)** SurroundOcc addresses the problem of dense 3D semantic occupancy prediction from multi-camera images for autonomous driving. Unlike 3D object detection, which repr…

autonomous-driving perception occupancy 3d-reconstruction +3
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
2023 ECCV 2024 107

📄 **[Read on arXiv](https://arxiv.org/abs/2312.03661)** Reason2Drive provides the largest reasoning chain dataset for driving (>600K video-text pairs from nuScenes, Waymo, and ONCE) and introduces an aggregated evaluati…

paper autonomous-driving vla reasoning +2
Planning-oriented Autonomous Driving
2023 CVPR 1201

📄 **[Read on arXiv](https://arxiv.org/abs/2212.10156)** UniAD (Unified Autonomous Driving) is a planning-oriented end-to-end framework that unifies perception, prediction, and planning into a single differentiable netwo…

paper autonomous-driving uniad planning +1
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
2023 ICCV 280

📄 **[Read on arXiv](https://arxiv.org/abs/2304.05316)** Vision-based 3D semantic occupancy prediction aims to predict the semantic class and occupancy status of every voxel in a 3D volume surrounding the ego vehicle, us…

autonomous-driving perception transformer computer-vision +3
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
2023 arXiv 100

📄 **[Read on arXiv](https://arxiv.org/abs/2310.03026)** LanguageMPC addresses a fundamental limitation in autonomous driving: traditional planners (MPC, RL) struggle with complex scenarios that require high-level reason…

autonomous-driving llm planning nlp +3
GPT-Driver: Learning to Drive with GPT
2023 NeurIPS FMDM Workshop 396

📄 **[Read on arXiv](https://arxiv.org/abs/2310.01415)** GPT-Driver reformulates autonomous driving motion planning as a language modeling problem. Scene context (object positions, velocities, lane geometry) and ego vehi…
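
A toy sketch of the planning-as-language-modeling recipe: serialize scene context into a textual prompt and parse waypoints back out of the generated text. The prompt format and the `llm_complete` call are hypothetical stand-ins, not the paper's exact interface:

```python
import re

def build_prompt(objects, ego_speed):
    # Serialize object-level scene context into plain text.
    lines = [f"- {o['cls']} at x={o['x']:.1f}m, y={o['y']:.1f}m" for o in objects]
    return ("Observed objects:\n" + "\n".join(lines) +
            f"\nEgo speed: {ego_speed:.1f} m/s\n"
            "Plan the next 3s as waypoints (x, y) at 0.5s intervals:")

def parse_trajectory(text):
    # Expect lines like "(3.2, 0.1)"; return a list of (x, y) floats.
    return [(float(x), float(y))
            for x, y in re.findall(r"\(([-\d.]+),\s*([-\d.]+)\)", text)]

prompt = build_prompt([{"cls": "car", "x": 12.0, "y": -1.5}], ego_speed=6.0)
# waypoints = parse_trajectory(llm_complete(prompt))  # hypothetical LLM call
```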

paper autonomous-driving vla llm +2
FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin
2023 arXiv

📄 **[Read on arXiv](https://arxiv.org/abs/2311.12058)** Occupancy prediction has emerged as a powerful perception paradigm for autonomous driving, predicting per-voxel semantic labels in 3D space to handle arbitrary obj…
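
The channel-to-height trick is essentially a single reshape: a 2D BEV feature map's channel dimension is reinterpreted as (class, height), so occupancy falls out of cheap 2D convolutions. A minimal sketch with assumed shapes:

```python
import torch

B, C, H, W = 1, 18 * 16, 200, 200    # 18 classes x 16 height bins packed into channels
bev = torch.randn(B, C, H, W)        # output of an ordinary 2D BEV encoder
occ = bev.view(B, 18, 16, H, W)      # (B, n_classes, Z, H, W) voxel logits, no 3D convs
```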

autonomous-driving perception 3d-occupancy bev +3
FB-BEV: BEV Representation from Forward-Backward View Transformations
2023 ICCV 150

📄 **[Read on arXiv](https://arxiv.org/abs/2308.02236)** FB-BEV addresses a fundamental tension in camera-based BEV perception for autonomous driving: **forward projection** methods (like Lift-Splat-Shoot) generate BEV f…

autonomous-driving perception bev transformer +1
DriveMLM: Aligning Multi-Modal LLMs with Behavioral Planning States
2023 arXiv 241

📄 **[Read on arXiv](https://arxiv.org/abs/2312.09245)** DriveMLM proposes using a multimodal LLM as a plug-and-play behavioral planning module within existing autonomous driving stacks (Apollo, Autoware), rather than re…

paper autonomous-driving vla llm +2
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
2023 ICCV 2023

📄 **[Read on arXiv](https://arxiv.org/abs/2308.00398)** DriveAdapter (Jia et al., ICCV 2023) identifies and addresses a fundamental structural problem in end-to-end autonomous driving: the tight coupling between percept…

autonomous-driving end-to-end planning imitation-learning +2
Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
2023 arXiv

📄 **[Read on arXiv](https://arxiv.org/abs/2309.10228)** Drive as You Speak (DAYS) proposes a framework for enabling natural language interaction between human passengers and autonomous vehicles using large language mode…

paper autonomous-driving llm planning +3
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
2023 CVPR 2023

📄 **[Read on arXiv](https://arxiv.org/abs/2211.10439)** BEVFormer v2 addresses a critical bottleneck in camera-based 3D perception for autonomous driving: the inability to leverage powerful modern 2D image backbones (e.…

autonomous-driving perception bev transformer +2
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
2022 IEEE TPAMI 2023 600

📄 **[Read on arXiv](https://arxiv.org/abs/2205.15997)** TransFuser (Chitta et al., 2022) is a foundational paper for transformer-based sensor fusion in end-to-end autonomous driving. The key problem it addresses is how…

paper autonomous-driving e2e transformer +1
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
2022 ECCV 1826

📄 **[Read on arXiv](https://arxiv.org/abs/2203.17270)** Li, Wang, Li, Xie, Sima, Lu, Yu, Dai (Shanghai AI Lab / Nanjing University / HKU), ECCV, 2022. BEVFormer generates a un…
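
The core mechanism can be caricatured as a grid of learnable BEV queries cross-attending to flattened multi-camera features. The sketch below substitutes standard attention for the paper's spatiotemporal deformable attention, with assumed shapes:

```python
import torch
import torch.nn as nn

bev_h = bev_w = 50
d = 256
bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, d))  # learnable grid queries
img_feats = torch.randn(6 * 375, d)                        # flattened 6-camera features

attn = nn.MultiheadAttention(d, num_heads=8)               # inputs are (L, N, E)
bev, _ = attn(bev_queries.unsqueeze(1),
              img_feats.unsqueeze(1), img_feats.unsqueeze(1))
bev = bev.squeeze(1).view(bev_h, bev_w, d)                 # dense BEV feature map
```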

paper autonomous-driving perception bev +1
VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation
2020 CVPR 1035

📄 **[Read on arXiv](https://arxiv.org/abs/2005.04259)** VectorNet (Gao et al., Waymo/Google, CVPR 2020) is a foundational paper that moved motion prediction and map encoding away from rasterized image-based representati…
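
The polyline subgraph can be sketched as a shared MLP over a polyline's vectors followed by max-pooling into one node feature, with a global module attending across polylines. Dimensions below are illustrative assumptions:

```python
import torch
import torch.nn as nn

vec_dim, hid = 8, 64                      # vector = (start, end, attributes)
point_mlp = nn.Sequential(nn.Linear(vec_dim, hid), nn.ReLU(),
                          nn.Linear(hid, hid))

polylines = torch.randn(10, 19, vec_dim)  # 10 polylines x 19 vectors each
node_feats = point_mlp(polylines)         # (10, 19, hid) per-vector features
poly_feats = node_feats.max(dim=1).values # (10, hid): one feature per polyline

# Global interaction graph: polylines attend to each other.
global_attn = nn.MultiheadAttention(hid, num_heads=4, batch_first=True)
out, _ = global_attn(poly_feats[None], poly_feats[None], poly_feats[None])
```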

paper autonomous-driving prediction vectorized-representation
nuScenes: A Multimodal Dataset for Autonomous Driving
2020 CVPR 7791

📄 **[Read on arXiv](https://arxiv.org/abs/1903.11027)** nuScenes is a large-scale multimodal dataset for autonomous driving that provides synchronized data from 6 cameras (360-degree coverage), 1 LiDAR, 5 radars, GPS, a…
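
Working with the dataset typically goes through the official nuscenes-devkit; a minimal example, assuming the v1.0-mini split is downloaded to ./data/nuscenes:

```python
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-mini", dataroot="./data/nuscenes")
sample = nusc.sample[0]                        # one annotated keyframe (2 Hz)
cam = nusc.get("sample_data", sample["data"]["CAM_FRONT"])
lidar = nusc.get("sample_data", sample["data"]["LIDAR_TOP"])
print(cam["filename"], len(sample["anns"]))    # image path, number of annotations
```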

paper autonomous-driving benchmark dataset
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
2020 ECCV 1510

📄 **[Read on arXiv](https://arxiv.org/abs/2008.05711)** Lift, Splat, Shoot (LSS) introduced a differentiable pipeline for transforming multi-camera images into a unified bird's-eye view (BEV) representation without requ…
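
The "lift" step can be sketched as predicting a categorical depth distribution per pixel and taking its outer product with that pixel's feature, producing a frustum of features that is then splatted into BEV bins. Shapes and channel sizes below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LiftSplat(nn.Module):
    def __init__(self, in_ch=512, feat_ch=64, n_depth=41):
        super().__init__()
        self.n_depth = n_depth
        # One conv head predicts depth logits and context features jointly.
        self.head = nn.Conv2d(in_ch, n_depth + feat_ch, kernel_size=1)

    def forward(self, x):              # x: (B, in_ch, H, W) image features
        out = self.head(x)
        depth = out[:, :self.n_depth].softmax(dim=1)      # (B, D, H, W)
        feat = out[:, self.n_depth:]                      # (B, C, H, W)
        # Outer product: every pixel contributes its feature at every
        # candidate depth, weighted by the depth probability.
        frustum = depth.unsqueeze(1) * feat.unsqueeze(2)  # (B, C, D, H, W)
        return frustum  # "splat" then scatters these points into BEV bins
```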

paper autonomous-driving perception bev
Learning Lane Graph Representations for Motion Forecasting
2020 ECCV 750

📄 **[Read on arXiv](https://arxiv.org/abs/2007.13732)** LaneGCN introduces a graph neural network architecture for motion forecasting in autonomous driving that operates directly on the lane graph structure of HD maps.…

paper autonomous-driving prediction lanegcn
Talk2Car: Taking Control of Your Self-Driving Car
2019 EMNLP-IJCNLP 182

📄 **[Read on arXiv](https://arxiv.org/abs/1909.10838)** For autonomous vehicles to be truly useful as personal transportation, passengers should be able to issue natural-language commands like "park behind that blue car…

paper autonomous-driving vla grounding +2
Learning by Cheating
2019 CoRL 632

📄 **[Read on arXiv](https://arxiv.org/abs/1912.12294)** Learning by Cheating introduces a two-stage training paradigm for end-to-end autonomous driving that has become one of the most influential design patterns in the…
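
The second stage can be sketched as straightforward distillation: a vision student regresses the outputs of the frozen privileged teacher, which can label states on the fly. Networks below are stand-ins, not the paper's models:

```python
import torch
import torch.nn as nn

teacher = nn.Linear(64, 10).eval()           # privileged agent: gt state -> waypoints
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

state = torch.randn(16, 64)                  # ground-truth world state (stage-1 input)
image = torch.randn(16, 3, 32, 32)           # corresponding camera observation
with torch.no_grad():
    target = teacher(state)                  # teacher labels generated on the fly
loss = nn.functional.mse_loss(student(image), target)
loss.backward()
opt.step()
```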

paper autonomous-driving imitation-learning privileged-supervision
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
2019 RSS 2019 844

📄 **[Read on arXiv](https://arxiv.org/abs/1812.03079)** Bansal, Krizhevsky, Ogale (Waymo Research), RSS, 2019. ChauffeurNet is Waymo's mid-level imitation learning system that…

paper autonomous-driving imitation-learning planning
Textual Explanations for Self-Driving Vehicles
2018 ECCV 427

📄 **[Read on arXiv](https://arxiv.org/abs/1807.11546)** End-to-end driving models produce control signals without any rationale, making them opaque and untrustworthy for safety-critical deployment. This paper by Kim et…

paper autonomous-driving vla explainability +2
End-to-end Driving via Conditional Imitation Learning
2018 ICRA 1227

📄 **[Read on arXiv](https://arxiv.org/abs/1710.02410)** This paper introduces conditional imitation learning for end-to-end autonomous driving, where a neural network policy is conditioned on a discrete high-level comma…
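
The architecture's signature element is a command-switched output head: one branch per high-level command (follow lane / left / right / straight), selected at runtime by the navigation command. A minimal sketch with assumed sizes:

```python
import torch
import torch.nn as nn

class BranchedPolicy(nn.Module):
    def __init__(self, feat_dim=512, n_commands=4, n_actions=2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                          nn.Linear(256, n_actions))
            for _ in range(n_commands))

    def forward(self, features, command):    # command: (B,) ints in [0, n_commands)
        out = torch.stack([b(features) for b in self.branches], dim=1)
        return out[torch.arange(features.shape[0]), command]  # pick branch per sample

feats = torch.randn(8, 512)                  # perception trunk output
cmd = torch.randint(0, 4, (8,))              # high-level navigation commands
actions = BranchedPolicy()(feats, cmd)       # (8, 2): e.g. steering, throttle
```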

paper autonomous-driving imitation-learning e2e +1
CARLA: An Open Urban Driving Simulator
2017 CoRL 6490

Dosovitskiy, Ros, Codevilla, Lopez, Koltun (Intel Labs / Toyota Research Institute / CVC Barcelona), CoRL, 2017. 📄 **[Read on arXiv](https://arxiv.org/abs/1711.03938)** CARLA (Car Learning to Act) is an open-source simu…
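
A minimal client script against the CARLA Python API, assuming a simulator listening on localhost:2000 and the `carla` package installed:

```python
import random
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a random vehicle at a random spawn point and hand it to the autopilot.
blueprint = random.choice(world.get_blueprint_library().filter("vehicle.*"))
spawn_point = random.choice(world.get_map().get_spawn_points())
vehicle = world.spawn_actor(blueprint, spawn_point)
vehicle.set_autopilot(True)
```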

paper autonomous-driving benchmark simulator
End to End Learning for Self-Driving Cars
2016 arXiv 4537

📄 **[Read on arXiv](https://arxiv.org/abs/1604.07316)** This paper from NVIDIA, commonly known as "DAVE-2" or the "NVIDIA end-to-end driving paper," demonstrates that a single convolutional neural network can learn to m…
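
The network itself is small by modern standards; a sketch of the architecture as described in the paper (five conv layers, three hidden fully connected layers, one steering output), with the first FC width following the flattened feature size, as in common reimplementations:

```python
import torch.nn as nn

dave2 = nn.Sequential(                   # input: (B, 3, 66, 200) YUV image
    nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
    nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
    nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
    nn.Conv2d(48, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.Flatten(),                        # 64 * 1 * 18 = 1152 features
    nn.Linear(1152, 100), nn.ReLU(),
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 10), nn.ReLU(),
    nn.Linear(10, 1),                    # steering command
)
```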

paper autonomous-driving e2e