ESC

Tags

239 tags across the wiki

paper 114 autonomous-driving 92 foundation-model 55 transformer 53 vla 49 planning 42 robotics 41 computer-vision 36 perception 32 ilya-30 29 multimodal 29 nlp 26 end-to-end 24 language-modeling 24 llm 17 reasoning 17 imitation-learning 16 3d-occupancy 15 vlm 15 bev 14 diffusion 13 e2e 12 reinforcement-learning 12 world-model 12 chain-of-thought 10 benchmark 9 scaling 9 cross-embodiment 7 driving 6 gaussian-splatting 6 generative-models 6 image-classification 6 information-theory 6 questions 6 self-supervised 6 sources 6 alignment 5 attention 5 cnn 5 foundation 5 knowledge-distillation 5 language-model 5 prediction 5 simulation 5 evaluation 4 image-generation 4 instruction-tuning 4 mixture-of-experts 4 rnn 4 sequence-to-sequence 4 sparse-representation 4 video-prediction 4 explainability 3 flow-matching 3 lstm 3 map 3 occupancy 3 open-source 3 semantic-segmentation 3 sequence-modeling 3 trajectory-prediction 3 vectorized-representation 3 3d-detection 2 3d-perception 2 3d-reconstruction 2 action-representation 2 autonomy 2 autoregressive 2 bimanual 2 closed-loop 2 complexity-theory 2 dataset 2 deployment 2 distributed-training 2 efficient-inference 2 embodied 2 fine-tuning 2 foundation-models 2 foundational 2 gaussian-representation 2 generation 2 generative 2 human-interaction 2 humanoid 2 manipulation 2 memory-augmented-networks 2 ml 2 multi-camera 2 multilingual 2 object-detection 2 parameter-efficient-fine-tuning 2 prompting 2 real-time 2 regularization 2 relational-reasoning 2 residual-networks 2 rlhf 2 scaling-laws 2 segmentation 2 self-improvement 2 self-supervised-learning 2 state-space 2 systems 2 thermodynamics 2 vision-language-model 2 vision-transformer 2 visual-question-answering 2 zero-shot 2 3d 1 3d-scene 1 3d-semantic-occupancy 1 agenda 1 agentic 1 agi 1 algorithmic-information-theory 1 algorithmic-randomness 1 asynchronous 1 attention-mechanism 1 batch 1 bayesian-inference 1 behavior-forecasting 1 camera-fusion 1 classifier-guidance 1 combinatorial-optimization 1 comparison 1 compression 1 computability 1 concept 1 contrastive-learning 1 control 1 convolutional-neural-networks 1 corpus 1 course 1 data-collection 1 decoupled 1 deep-learning 1 denoising 1 depth-estimation 1 dexterous-manipulation 1 differentiable-programming 1 diffusion-policy 1 diffusion-transformer 1 dilated-convolutions 1 dropout 1 efficient 1 embodied-ai 1 embodiment 1 emergent-abilities 1 end-to-end-learning 1 evaluation-metric 1 few-shot 1 few-shot-learning 1 foundations 1 frontend 1 gaussian 1 gaussian-rendering 1 generalist-agent 1 generalization 1 gpu-training 1 graph-neural-networks 1 grounding 1 grpo 1 hierarchical 1 high-frequency-control 1 hosting 1 ilya 1 image-captioning 1 image-text-retrieval 1 in-context-learning 1 inductive-bias 1 intelligence-measurement 1 interactive-annotation 1 interactive-segmentation 1 knowledge-preservation 1 kolmogorov-complexity 1 lanegcn 1 locomotion 1 machine-translation 1 mamba 1 mdl 1 message-passing 1 minimum-description-length 1 model-parallelism 1 model-predictive-control 1 model-selection 1 modular 1 molecular-property-prediction 1 multi-embodiment 1 multi-task 1 natural-language 1 neural-radiance-fields 1 neuro-symbolic 1 obsidian 1 open-world 1 optimization 1 orchestration 1 parallel-architecture 1 parameter-efficient 1 permutation-invariance 1 personalization 1 physical-ai 1 pipeline-parallelism 1 pointer-mechanism 1 privileged-supervision 1 probabilistic-planning 1 proprioception 1 quantization 1 queue 1 radar 1 recurrent-neural-networks 1 representation-learning 1 scene-understanding 1 search 1 seminal 1 sensor-fusion 1 set-modeling 1 siamese-networks 1 simulator 1 source 1 sparse-models 1 spatial-reasoning 1 speech-recognition 1 survey 1 synthesis 1 taxonomy 1 temporal 1 temporal-modeling 1 thesis 1 tokenization 1 tool-use 1 training 1 uniad 1 unified-stack 1 vanishing-gradients 1 variational-autoencoders 1 video-generation 1 video-understanding 1 visual-traces 1 vit 1

Pages tagged end-to-end

Agent-Driver: A Language Agent for Autonomous Driving
source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2311.10813)** Agent-Driver reframes autonomous driving as a cognitive agent problem, positioning a large language model as the central reasoning and planning engine rather than…

Autovala Vision Language Action Model For End To End Autonomous Driving
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2506.13757)** AutoVLA presents a unified approach to autonomous driving that integrates vision, language understanding, and action generation within a single autoregressive mode…

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2211.10439)** BEVFormer v2 addresses a critical bottleneck in camera-based 3D perception for autonomous driving: the inability to leverage powerful modern 2D image backbones (e.…

BridgeAD: Bridging Past and Future End-to-End Autonomous Driving with Historical Prediction
source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2503.14182)** BridgeAD tackles a critical limitation in end-to-end autonomous driving: the ineffective utilization of historical temporal information. Current systems either agg…

Covla Comprehensive Vision Language Action Dataset For Autonomous Driving
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2408.10845)** Autonomous driving systems face the "long tail" problem -- handling countless rare and complex driving scenarios beyond common situations. While traditional rule-b…

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
source-summary

[Read on arXiv](https://arxiv.org/abs/2411.15139) DiffusionDrive (HUST/Horizon Robotics, CVPR 2025 Highlight) proposes a truncated diffusion model for end-to-end autonomous driving that achieves real-time inference whil…

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2308.00398)** DriveAdapter (Jia et al., ICCV 2023) identifies and addresses a fundamental structural problem in end-to-end autonomous driving: the tight coupling between percept…

Drivetransformer Unified Transformer For Scalable End To End Autonomous Driving
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2503.07656)** DriveTransformer represents a fundamental departure from existing end-to-end autonomous driving approaches. Rather than following sequential perception-prediction-…

GenAD: Generative End-to-End Autonomous Driving
source-summary

[Read on arXiv](https://arxiv.org/abs/2402.11502) GenAD (ECCV 2024) reframes end-to-end autonomous driving as a generative modeling problem, simultaneously generating future trajectories for all traffic participants rat…

Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra-Distillation
paper

:page_facing_up: **[Read on arXiv](https://arxiv.org/abs/2406.06978)** Hydra-MDP addresses a fundamental limitation of imitation learning for autonomous driving: standard behavior cloning learns only to mimic human demo…

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
source-summary

[Read on arXiv](https://arxiv.org/abs/2312.03031) This paper (CVPR 2024, NVIDIA / Nanjing University) delivers a "wake-up call" to the autonomous driving research community by demonstrating that simple baselines using o…

LAW: Enhancing End-to-End Autonomous Driving with Latent World Model
source-summary

[Read on arXiv](https://arxiv.org/abs/2406.08481) LAW (CASIA, ICLR 2025) introduces a self-supervised latent world model that enhances end-to-end autonomous driving by learning to predict future latent states of the dri…

Momad Momentum Aware Planning In End To End Autonomous Driving
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2503.03125)** End-to-end autonomous driving systems suffer from a critical limitation: temporal inconsistency. Current systems operate in a "one-shot" manner, making trajectory…

NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
paper

:page_facing_up: **[Read on arXiv](https://arxiv.org/abs/2406.15349)** Autonomous vehicle evaluation has long been split between two unsatisfying extremes: open-loop metrics that replay logged trajectories and compare p…

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2304.05316)** Vision-based 3D semantic occupancy prediction aims to predict the semantic class and occupancy status of every voxel in a 3D volume surrounding the ego vehicle, us…

Opendrivevla Towards End To End Autonomous Driving With Large Vision Language Action Model
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2503.23463)** OpenDriveVLA introduces a Vision-Language Action model specifically designed for end-to-end autonomous driving. Unlike previous approaches that use VLMs as supplem…

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
source-summary

[Read on CVF Open Access](https://openaccess.thecvf.com/content/CVPR2024/html/Weng_PARA-Drive_Parallelized_Architecture_for_Real-time_Autonomous_Driving_CVPR_2024_paper.html) PARA-Drive (NVIDIA Research / USC / Stanford…

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
source-summary

[Read on arXiv](https://arxiv.org/abs/2505.16805) SOLVE proposes a synergistic framework that combines a Vision-Language Model (VLM) reasoning branch (SOLVE-VLM) with an end-to-end (E2E) driving network (SOLVE-E2E), con…

SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
source-summary

:page_facing_up: **[Read on arXiv](https://arxiv.org/abs/2405.19620)** SparseDrive by Sun et al. (ICRA 2025) proposes a paradigm shift from dense BEV-based end-to-end driving to fully sparse scene representations. The c…

SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving
source-summary

:page_facing_up: **[Read on arXiv](https://arxiv.org/abs/2603.29163)** SparseDriveV2 by Sun et al. (2026) pushes the performance boundary of scoring-based trajectory planning by demonstrating that "scoring is all you ne…

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2305.06242)** Think Twice (Jia et al., 2023) addresses a fundamental imbalance in end-to-end autonomous driving: while the community has invested heavily in sophisticated encode…

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation (GR-1)
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2312.13139)** GR-1 addresses a fundamental bottleneck in robot learning: the scarcity of diverse, high-quality robot demonstration data. The key insight is that robot trajectori…

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2402.13243)** VADv2 by Chen et al. (2024) is the successor to VAD, addressing a fundamental limitation of deterministic planners in autonomous driving: they output a single traj…

YOLOv10: Real-Time End-to-End Object Detection
paper

📄 **[Read on arXiv](https://arxiv.org/abs/2405.14458)** Real-time object detection is critical infrastructure for autonomous driving, robotics, and augmented reality, yet the dominant YOLO family has long relied on non-…