Tags

239 tags across the wiki

Pages tagged imitation-learning

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

📄 **[Read on arXiv](https://arxiv.org/abs/1812.03079)** Bansal, Krizhevsky, Ogale (Waymo Research), RSS, 2019. - [Paper](https://arxiv.org/abs/1812.03079) ChauffeurNet is Waymo's mid-level imitation learning system that…

Covla Comprehensive Vision Language Action Dataset For Autonomous Driving

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2408.10845)** Autonomous driving systems face the "long tail" problem -- handling countless rare and complex driving scenarios beyond common situations. While traditional rule-b…

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2308.00398)** DriveAdapter (Jia et al., ICCV 2023) identifies and addresses a fundamental structural problem in end-to-end autonomous driving: the tight coupling between percept…

End-to-end Driving via Conditional Imitation Learning

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/1710.02410)** This paper introduces conditional imitation learning for end-to-end autonomous driving, where a neural network policy is conditioned on a discrete high-level comma…

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2410.06158)** GR-2 is a generalist robot manipulation agent from ByteDance Research that leverages large-scale video-language pretraining to build a world model for robotic cont…

Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra-Distillation

paper

:page_facing_up: **[Read on arXiv](https://arxiv.org/abs/2406.06978)** Hydra-MDP addresses a fundamental limitation of imitation learning for autonomous driving: standard behavior cloning learns only to mimic human demo…

Learning by Cheating

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/1912.12294)** Learning by Cheating introduces a two-stage training paradigm for end-to-end autonomous driving that has become one of the most influential design patterns in the…

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2406.11815)** LLARVA addresses the "embodiment gap" between large multimodal models (LMMs) and robotic control. While VLMs trained on internet-scale data excel at visual underst…

Octo An Open Source Generalist Robot Policy

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2405.12213)** Octo is a transformer-based generalist robot policy trained on 800,000 robot trajectories from the Open X-Embodiment dataset, spanning 25 diverse datasets and mult…

Robocat A Self Improving Generalist Agent For Robotic Manipulation

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2306.11706)** RoboCat, developed by Google DeepMind, is a multi-embodiment, multi-task generalist agent for robotic manipulation built on a transformer-based architecture. The p…

RoboFlamingo: Vision-Language Foundation Models as Effective Robot Imitators

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2311.01378)** RoboFlamingo addresses the question of whether publicly available vision-language models (VLMs) can serve as effective backbones for robot imitation learning, with…

RoboVLMs: What Matters in Building Vision-Language-Action Models

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2412.14058)** RoboVLMs is a large-scale empirical study from Tsinghua University, ByteDance Research, and collaborators that systematically investigates the design principles fo…

RT-H: Action Hierarchies Using Language

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2403.01823)** RT-H (Robot Transformer with Action Hierarchies) introduces a hierarchical approach to multi-task robot control that uses natural language as an intermediate repre…

Scaling Cross Embodied Learning One Policy For Manipulation Navigation Locomotion And Aviation

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2408.11812)** CrossFormer addresses a fundamental limitation in robot learning: the requirement for specialized policies for each robotic platform. Traditional approaches train…

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2305.06242)** Think Twice (Jia et al., 2023) addresses a fundamental imbalance in end-to-end autonomous driving: while the community has invested heavily in sophisticated encode…

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation (GR-1)

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2312.13139)** GR-1 addresses a fundamental bottleneck in robot learning: the scarcity of diverse, high-quality robot demonstration data. The key insight is that robot trajectori…