Pages tagged e2e
📄 **[Read on arXiv](https://arxiv.org/abs/2505.16278)** DriveMoE introduces a dual-level Mixture-of-Experts (MoE) architecture into driving Vision-Language-Action models. The key innovation is applying expert specializati…
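The general mechanism behind MoE layers like DriveMoE's can be sketched in a few lines — this is a hypothetical top-k routing illustration with made-up dimensions and weights, not the paper's architecture or code:

```python
import numpy as np

# Hypothetical top-k mixture-of-experts sketch: a router scores every
# expert for a given input, only the top-k experts run, and their outputs
# are combined with gate weights renormalized over the selected set.
rng = np.random.default_rng(1)
N_EXPERTS, TOP_K, DIM = 4, 2, 8  # illustrative sizes, not from the paper

router_w = rng.standard_normal((N_EXPERTS, DIM)) * 0.1
experts = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(N_EXPERTS)]

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = router_w @ x                # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]    # indices of the top-k experts
    gates = softmax(scores[top])         # gates renormalized over top-k
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

y = moe_forward(rng.standard_normal(DIM))
print(y.shape)  # (8,)
```

Sparse routing is what lets expert specialization scale: each input pays the compute cost of only k experts, while the full set can partition the input space.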
📄 **[Read on arXiv](https://arxiv.org/abs/2601.05083)** DrivoR is a full-transformer autonomous driving architecture that uses camera-aware register tokens to compress multi-camera Vision Transformer features into a com…
📄 **[Read on arXiv](https://arxiv.org/abs/1604.07316)** This paper from NVIDIA, commonly known as "DAVE-2" or the "NVIDIA end-to-end driving paper," demonstrates that a single convolutional neural network can learn to m…
"End-to-end" is one of the most overloaded terms in autonomous driving. This page defines a clear taxonomy, traces the evolution of E2E systems, and maps the current landscape. The literature uses "end-to-end" to mean a…
📄 **[Read on arXiv](https://arxiv.org/abs/1710.02410)** This paper introduces conditional imitation learning for end-to-end autonomous driving, where a neural network policy is conditioned on a discrete high-level comma…
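Command-conditioned control of this kind is often implemented with per-command branches over a shared feature vector. A minimal sketch under assumed names and sizes (not the paper's implementation):

```python
import numpy as np

# Hypothetical command-conditioned branching: one shared perception
# feature vector, one small linear head per high-level command; the
# discrete command selects which head produces the control output.
COMMANDS = ["follow_lane", "turn_left", "turn_right", "go_straight"]

rng = np.random.default_rng(0)
FEAT_DIM, ACT_DIM = 8, 2  # illustrative: 2 controls, e.g. steer + throttle

# One (weight, bias) pair per command branch.
branches = {c: (rng.standard_normal((ACT_DIM, FEAT_DIM)) * 0.1,
                np.zeros(ACT_DIM)) for c in COMMANDS}

def act(features: np.ndarray, command: str) -> np.ndarray:
    """Route the shared features through the branch picked by the command."""
    W, b = branches[command]
    return np.tanh(W @ features + b)  # tanh keeps controls bounded

features = rng.standard_normal(FEAT_DIM)  # stand-in for CNN features
action = act(features, "turn_left")
print(action.shape)  # (2,)
```

The branching resolves the ambiguity at intersections: the same camera input maps to different controls depending on the commanded route.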
📄 **[Read on arXiv](https://arxiv.org/abs/2312.07488)** LMDrive is the first system to demonstrate and benchmark LLM-based driving in closed-loop simulation, introducing the LangAuto benchmark with ~64K instruction-foll…
Stream-specific open questions for the end-to-end autonomous driving pillar. See wiki/queries/open-questions for the full tree across all streams. 1. **Unified vs. decoupled VLA:** Will EMMA's "everything as language to…
📄 **[Read on arXiv](https://arxiv.org/abs/2503.19755)** ORION bridges the reasoning-action gap in driving VLAs through a three-component architecture consisting of QT-Former (visual encoding), an LLM reasoning core, and…
📄 **[Read on arXiv](https://arxiv.org/abs/2410.22313)** Two dominant paradigms exist in autonomous driving: large vision-language models (LVLMs) with strong reasoning but poor trajectory precision, and end-to-end (E2E)…
📄 **[Read on arXiv](https://arxiv.org/abs/2503.09594)** Many driving VLM efforts improve language understanding (VQA, scene descriptions) but sacrifice actual driving performance. A model can correctly answer questions…
📄 **[Read on arXiv](https://arxiv.org/abs/2205.15997)** TransFuser (Chitta et al., 2022) is a foundational paper for transformer-based sensor fusion in end-to-end autonomous driving. The key problem it addresses is how…
📄 **[Read on arXiv](https://arxiv.org/abs/2504.01941)** End-to-end driving models typically output a single trajectory and trust it entirely, with no mechanism to evaluate whether the predicted path is safe before execu…