Research Timeline

Publications grouped by research direction across time.

VLA / Driving

23
2018
End-to-end Driving via Conditional Imitation Learning
2018 1227 cit.
Textual Explanations for Self-Driving Vehicles
2018 427 cit.
2019
Talk2Car: Taking Control of Your Self-Driving Car
2019 182 cit.
2023
DriveMLM: Aligning Multi-Modal LLMs with Behavioral Planning States
2023 241 cit.
GPT-Driver: Learning to Drive with GPT
2023 396 cit.
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
2023 107 cit.
2024
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
2024 30 cit.
DriveGPT4: Interpretable End-to-End Autonomous Driving via Large Language Model
2024 576 cit.
DriveLM: Driving with Graph Visual Question Answering
2024 448 cit.
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
2024 416 cit.
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
2024 294 cit.
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
2024 102 cit.
VLP: Vision Language Planning for Autonomous Driving
2024 155 cit.
2025
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
2025 42 cit.
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
2025 75 cit.
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving
2025 110 cit.
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
2025 55 cit.
EMMA: End-to-End Multimodal Model for Autonomous Driving
2025 150 cit.
OpenDriveVLA: Towards End-to-End Autonomous Driving with Large Vision Language Action Model
2025
ORION: Holistic End-to-End Autonomous Driving by Vision-Language Instructed Action Generation
2025 100 cit.
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
2025
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
2025 89 cit.
WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model
2025 81 cit.

End-to-End

25
2016
End to End Learning for Self-Driving Cars
2016 4537 cit.
2018
End-to-end Driving via Conditional Imitation Learning
2018 1227 cit.
2019
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
2019 844 cit.
Learning by Cheating
2019 632 cit.
2022
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
2022 600 cit.
2023
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
2023
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
2023
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
2023 180 cit.
2024
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
2024 30 cit.
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
2024 50 cit.
Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra-Distillation
2024 50 cit.
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
2024
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
2024 294 cit.
Octo: An Open-Source Generalist Robot Policy
2024 400 cit.
RT-H: Action Hierarchies Using Language
2024
RoboFlamingo: Vision-Language Foundation Models as Effective Robot Imitators
2024 100 cit.
RoboVLMs: What Matters in Building Vision-Language-Action Models
2024 50 cit.
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
2024 100 cit.
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
2024 102 cit.
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation (GR-1)
2024 150 cit.
2025
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
2025 55 cit.
ORION: Holistic End-to-End Autonomous Driving by Vision-Language Instructed Action Generation
2025 100 cit.
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
2025 89 cit.
WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model
2025 81 cit.
2026
DrivoR: Driving on Registers
2026 3 cit.

Perception

33
2020
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
2020 1510 cit.
2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
2022 1826 cit.
2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
2023
FB-BEV: BEV Representation from Forward-Backward View Transformations
2023 150 cit.
FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin
2023
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
2023 280 cit.
SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving
2023
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
2023 180 cit.
2024
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
2024 80 cit.
Drive-OccWorld: Driving in the Occupancy World
2024 49 cit.
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
2024 41 cit.
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation
2024 20 cit.
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
2024 57 cit.
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
2024 47 cit.
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
2024 128 cit.
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
2024 59 cit.
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
2024 50 cit.
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
2024 15 cit.
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
2024 60 cit.
SparseOcc: Fully Sparse 3D Occupancy Prediction
2024 80 cit.
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
2024 50 cit.
VLP: Vision Language Planning for Autonomous Driving
2024 155 cit.
YOLOv10: Real-Time End-to-End Object Detection
2024 5988 cit.
2025
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
2025 14 cit.
EMMA: End-to-End Multimodal Model for Autonomous Driving
2025 150 cit.
GaussRender: Learning 3D Occupancy with Gaussian Rendering
2025 13 cit.
GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
2025 19 cit.
GaussianLSS: Toward Real-world BEV Perception with Depth Uncertainty via Gaussian Splatting
2025 18 cit.
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
2025 38 cit.
OccMamba: Semantic Occupancy Prediction with State Space Models
2025 32 cit.
S4-Driver: Scalable Self-Supervised Driving MLLM with Spatio-Temporal Visual Representation
2025 16 cit.
WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model
2025 81 cit.
2026
DrivoR: Driving on Registers
2026 3 cit.

Planning

39
2019
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
2019 844 cit.
2023
Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
2023
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
2023
DriveMLM: Aligning Multi-Modal LLMs with Behavioral Planning States
2023 241 cit.
GPT-Driver: Learning to Drive with GPT
2023 396 cit.
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
2023 100 cit.
Planning-oriented Autonomous Driving
2023 1201 cit.
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
2023 180 cit.
VAD: Vectorized Scene Representation for Efficient Autonomous Driving
2023 567 cit.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
2023 450 cit.
2024
Agent-Driver: A Language Agent for Autonomous Driving
2024 140 cit.
AsyncDriver: Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
2024 41 cit.
Drive-OccWorld: Driving in the Occupancy World
2024 49 cit.
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
2024 416 cit.
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
2024
Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra-Distillation
2024 50 cit.
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
2024 200 cit.
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
2024 100 cit.
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
2024 198 cit.
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
2024 102 cit.
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
2024 181 cit.
Talk2Drive: Towards Personalized Autonomous Driving with Large Language Models
2024 80 cit.
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
2024 140 cit.
VLP: Vision Language Planning for Autonomous Driving
2024 155 cit.
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
2024
2025
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
2025 42 cit.
BridgeAD: Bridging Past and Future End-to-End Autonomous Driving with Historical Prediction
2025 22 cit.
CarPlanner: Consistent Auto-regressive RL Planner for Autonomous Driving
2025
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
2025
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
2025 55 cit.
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
2025 91 cit.
EMMA: End-to-End Multimodal Model for Autonomous Driving
2025 150 cit.
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectory Generation
2025
MomAD: Momentum-Aware Planning in End-to-End Autonomous Driving
2025 60 cit.
ORION: Holistic End-to-End Autonomous Driving by Vision-Language Instructed Action Generation
2025 100 cit.
S4-Driver: Scalable Self-Supervised Driving MLLM with Spatio-Temporal Visual Representation
2025 16 cit.
WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model
2025 81 cit.
2026
DrivoR: Driving on Registers
2026 3 cit.
SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving
2026

Foundation Models

45
2017
Attention Is All You Need
2017 171783 cit.
2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019 112487 cit.
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
2019 2100 cit.
2020
Language Models are Few-Shot Learners
2020 56138 cit.
Scaling Laws for Neural Language Models
2020 7436 cit.
2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2021 91128 cit.
Emerging Properties in Self-Supervised Vision Transformers (DINO)
2021 10798 cit.
Learning Transferable Visual Models From Natural Language Supervision
2021 57987 cit.
On the Opportunities and Risks of Foundation Models
2021 6057 cit.
Prefix-Tuning: Optimizing Continuous Prompts for Generation
2021 6753 cit.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
2021 44596 cit.
2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
2022 8650 cit.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022 16871 cit.
Flamingo: a Visual Language Model for Few-Shot Learning
2022 7824 cit.
High-Resolution Image Synthesis with Latent Diffusion Models
2022 31987 cit.
LoRA: Low-Rank Adaptation of Large Language Models
2022 29175 cit.
PaLM: Scaling Language Modeling with Pathways
2022 9058 cit.
RT-1: Robotics Transformer for Real-World Control at Scale
2022 2019 cit.
Scaling Instruction-Finetuned Language Models (Flan-PaLM / Flan-T5)
2022 3987 cit.
Training Compute-Optimal Large Language Models
2022 4116 cit.
Training Language Models to Follow Instructions with Human Feedback
2022 24355 cit.
2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023 8520 cit.
GPT-4 Technical Report
2023 26297 cit.
Llama 2: Open Foundation and Fine-Tuned Chat Models
2023 22411 cit.
Mistral 7B
2023 4052 cit.
QLoRA: Efficient Finetuning of Quantized Language Models
2023 5975 cit.
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
2023
Segment Anything
2023 19692 cit.
Toolformer: Language Models Can Teach Themselves to Use Tools
2023 3994 cit.
Visual Instruction Tuning (LLaVA)
2023 13533 cit.
2024
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
2024 50 cit.
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
2024
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
2024 200 cit.
Mixtral of Experts
2024 3089 cit.
Octo: An Open-Source Generalist Robot Policy
2024 400 cit.
RT-H: Action Hierarchies Using Language
2024
RoboFlamingo: Vision-Language Foundation Models as Effective Robot Imitators
2024 100 cit.
RoboVLMs: What Matters in Building Vision-Language-Action Models
2024 50 cit.
SAM 2: Segment Anything in Images and Videos
2024 3925 cit.
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
2024 100 cit.
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation (GR-1)
2024 150 cit.
2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2025 1920 cit.
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
2025 1943 cit.
Gemma 3 Technical Report
2025 1120 cit.
Qwen3 Technical Report
2025 3706 cit.

Robotics

39
2021
On the Opportunities and Risks of Foundation Models
2021 6057 cit.
2022
A Generalist Agent
2022 1018 cit.
RT-1: Robotics Transformer for Real-World Control at Scale
2022 2019 cit.
2023
PaLM-E: An Embodied Multimodal Language Model
2023 2491 cit.
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
2023 2686 cit.
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
2023 450 cit.
2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
2024 140 cit.
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
2024 110 cit.
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
2024 50 cit.
HPT: Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
2024 134 cit.
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
2024
Octo: An Open-Source Generalist Robot Policy
2024 400 cit.
OpenVLA: An Open-Source Vision-Language-Action Model
2024 1883 cit.
RT-H: Action Hierarchies Using Language
2024
RoboFlamingo: Vision-Language Foundation Models as Effective Robot Imitators
2024 100 cit.
RoboVLMs: What Matters in Building Vision-Language-Action Models
2024 50 cit.
Robotic Control via Embodied Chain-of-Thought Reasoning
2024
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
2024 100 cit.
UniSim: Learning Interactive Real-World Simulators
2024 200 cit.
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation (GR-1)
2024 150 cit.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
2024 139 cit.
pi0: A Vision-Language-Action Flow Model for General Robot Control
2024 1381 cit.
2025
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
2025 140 cit.
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
2025 54 cit.
FAST: Efficient Action Tokenization for Vision-Language-Action Models
2025 353 cit.
Gemini Robotics: Bringing AI into the Physical World
2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
2025 602 cit.
Helix: A Vision-Language-Action Model for Generalist Humanoid Control
2025
Knowledge Insulating Vision-Language-Action Models
2025
OpenVLA-OFT: Optimizing Speed and Success for VLA Fine-Tuning
2025 364 cit.
RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation
2025
Self-Improving Embodied Foundation Models
2025 18 cit.
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
2025 224 cit.
SpatialVLA: Exploring Spatial Representations for VLA Models
2025 292 cit.
Towards Embodiment Scaling Laws in Robot Locomotion
2025 10 cit.
UniAct: Universal Actions for Enhanced Embodied Foundation Models
2025 60 cit.
pi*0.6: A VLA That Learns From Experience
2025 93 cit.
pi0.5: A Vision-Language-Action Model with Open-World Generalization
2025 681 cit.

Ilya Top 30

29
1993
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
1993 1279 cit.
2004
A Tutorial Introduction to the Minimum Description Length Principle
2004 381 cit.
2008
Machine Super Intelligence
2008 63 cit.
2011
The First Law of Complexodynamics
2011
2012
ImageNet Classification with Deep Convolutional Neural Networks
2012 127906 cit.
2014
Neural Machine Translation by Jointly Learning to Align and Translate
2014 29150 cit.
Neural Turing Machines
2014 2505 cit.
Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
2014 26 cit.
Recurrent Neural Network Regularization
2014 2986 cit.
2015
CS231n: Deep Learning for Computer Vision
2015
Deep Residual Learning for Image Recognition
2015 224592 cit.
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
2015 3131 cit.
Multi-Scale Context Aggregation by Dilated Convolutions
2015 9295 cit.
Pointer Networks
2015 3380 cit.
The Unreasonable Effectiveness of Recurrent Neural Networks
2015
Understanding LSTM Networks
2015
2016
Identity Mappings in Deep Residual Networks
2016 11060 cit.
Order Matters: Sequence to Sequence for Sets
2016 1018 cit.
Variational Lossy Autoencoder
2016 700 cit.
2017
A Simple Neural Network Module for Relational Reasoning
2017 1679 cit.
Attention Is All You Need
2017 171783 cit.
Kolmogorov Complexity and Algorithmic Randomness
2017 106 cit.
Neural Message Passing for Quantum Chemistry
2017 8754 cit.
2018
Relational Recurrent Neural Networks
2018 220 cit.
2019
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
2019 2100 cit.
2020
Denoising Diffusion Probabilistic Models
2020 28939 cit.
Scaling Laws for Neural Language Models
2020 7436 cit.
2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2021 91128 cit.
2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022 16871 cit.