Autonomous Driving Seminal Papers
Goal: build a durable corpus of high-impact autonomous driving papers, prioritizing papers with strong citation footprints, lasting conceptual importance, or clear influence on later systems.
Collection rule
Use citation count as a filter, not a definition. The corpus should include:
- papers that exceed roughly 1000 citations,
- papers that introduced a durable concept even if newer or less cited,
- benchmark or system papers that reshaped evaluation or architecture choices.
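The collection rule above can be sketched as a simple predicate. A minimal sketch, assuming a hypothetical `Paper` record; the field names `durable_concept` and `reshaped_evaluation` are illustrative, not part of any existing tooling:

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    citations: int
    durable_concept: bool = False       # introduced a lasting concept
    reshaped_evaluation: bool = False   # benchmark/system that reshaped practice

def include_in_corpus(p: Paper, citation_threshold: int = 1000) -> bool:
    """Citation count is a filter, not a definition: any one criterion suffices."""
    return (
        p.citations >= citation_threshold
        or p.durable_concept
        or p.reshaped_evaluation
    )
```

Under this rule a newer, less-cited paper that introduced a durable concept still qualifies, while a moderately cited incremental paper does not.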
Seed list by area
Perception
- PointNet
- PointNet++
- VoxelNet
- PointPillars
- SECOND
- PV-RCNN
- CenterPoint
- Lift, Splat, Shoot
- BEVFormer
- FB-BEV
- SurroundOcc
- DETR3D
- OccFormer
- BEVNeXt
Prediction
- DESIRE
- Social LSTM
- Trajectron / Trajectron++
- CoverNet
- MultiPath
- LaneGCN
- VectorNet
- TNT
- MTR
Planning / system
- ChauffeurNet
- Conditional Imitation Learning
- Learning by Cheating
- TransFuser
- TCP
- VAD
- VADv2
- UniAD
Evaluation / benchmarks / data
- KITTI
- nuScenes
- Waymo Open Dataset
- Argoverse / Argoverse 2
- CARLA
- NAVSIM
Ingest priorities
- Build dataset and benchmark pages first because they anchor later method comparisons.
- Ingest one canonical paper per cluster before adding near-duplicates.
- Maintain explicit notes on whether each paper supports a modular, hybrid, or end-to-end (e2e) interpretation.
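The three ingest priorities above can be sketched as a small record plus an ordering function. This is a sketch under stated assumptions: `CorpusEntry`, its field names, and the example cluster labels are hypothetical, not an existing schema:

```python
from dataclasses import dataclass, field

@dataclass
class CorpusEntry:
    title: str
    cluster: str                 # e.g. "point clouds", "BEV perception" (illustrative labels)
    is_benchmark: bool = False   # dataset/benchmark pages are built first
    paradigms: set = field(default_factory=set)  # subset of {"modular", "hybrid", "e2e"}
    notes: str = ""

def ingest_order(entries):
    """Benchmarks first (they anchor later method comparisons), then one
    canonical paper per cluster, then the remaining near-duplicates."""
    benchmarks = [e for e in entries if e.is_benchmark]
    seen_clusters, canonical, rest = set(), [], []
    for e in entries:
        if e.is_benchmark:
            continue
        if e.cluster not in seen_clusters:
            seen_clusters.add(e.cluster)
            canonical.append(e)
        else:
            rest.append(e)
    return benchmarks + canonical + rest
```

For example, given nuScenes (benchmark), PointNet, and PointNet++ (same cluster), the order is nuScenes, then PointNet, then PointNet++.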
Already seeded in batch 01
- End-to-End Learning for Self-Driving Cars
- End-to-End Driving via Conditional Imitation Learning
- CARLA: An Open Urban Driving Simulator
- ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
- Learning by Cheating
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
- Learning Lane Graph Representations for Motion Forecasting
- Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
- nuScenes: A Multimodal Dataset for Autonomous Driving
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
- TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
- Planning-Oriented Autonomous Driving
- VAD: Vectorized Scene Representation for Efficient Autonomous Driving
Added in batch 02 (AutoVLA corpus — planning/VLA overlap)
- Textual Explanations for Self-Driving Vehicles — BDD-X dataset, explainability
- Talk2Car — language command grounding
- SimLingo — CARLA challenge winner, vision-only VLA
- ORION — Bench2Drive SOTA
- EMMA — Waymo industry-scale model
- Alpamayo-R1 — NVIDIA production VLA
- WoTE — BEV world model for trajectory verification
- DriveMoE — mixture-of-experts (MoE) for driving
- Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving — cascaded decoder refinement, CARLA SOTA
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision — perspective supervision for backbone-agnostic BEV perception
- DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving — decoupled perception-planning via adapter module (ICCV 2023)
- FB-BEV: BEV Representation from Forward-Backward View Transformations — unified forward-backward view transformation for BEV (ICCV 2023)
- VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning — probabilistic planning via action vocabulary, successor to VAD
- NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation — non-reactive simulation benchmark bridging open-loop and closed-loop evaluation (NeurIPS 2024)