Papers | ML Systems Wiki

Qwen3 Technical Report

2025 arXiv 3706

📄 **[Read on arXiv](https://arxiv.org/abs/2505.09388)** Qwen3, developed by the Qwen team at Alibaba, represents a major step forward in open-weight language models by offering a comprehensive family spanning both dense…

nlp language-modeling transformer mixture-of-experts +4

ORION: Holistic End-to-End Autonomous Driving by Vision-Language Instructed Action Generation

2025 arXiv 100

📄 **[Read on arXiv](https://arxiv.org/abs/2503.19755)** ORION bridges the reasoning-action gap in driving VLAs through a three-component architecture consisting of QT-Former (visual encoding), an LLM reasoning core, and…

paper autonomous-driving vla vlm +3

Gemini Robotics: Bringing AI into the Physical World

2025 arXiv

📄 **[Read on arXiv](https://arxiv.org/abs/2503.20020)** Gemini Robotics introduces a family of AI models built on Gemini 2.0 designed to extend advanced multimodal capabilities into physical robotics. The work addresses…

robotics foundation-model multimodal reasoning

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

2025 arXiv 1943

📄 **[Read on arXiv](https://arxiv.org/abs/2507.06261)** Gemini 2.5 is Google's frontier multimodal model family, built on a sparse Mixture-of-Experts (MoE) Transformer architecture. It represents a major advance in reas…

nlp multimodal foundation-model transformer +5

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

2025 arXiv 1920

📄 **[Read on arXiv](https://arxiv.org/abs/2501.12948)** DeepSeek-R1 demonstrates that sophisticated reasoning capabilities -- including self-verification, reflection, and extended chain-of-thought -- can emerge in large…

nlp reinforcement-learning language-modeling reasoning +4

AutoVLA: Vision-Language-Action Model for End-to-End Autonomous Driving

2025 arXiv 110

📄 **[Read on arXiv](https://arxiv.org/abs/2506.13757)** AutoVLA presents a unified approach to autonomous driving that integrates vision, language understanding, and action generation within a single autoregressive mode…

autonomous-driving vla reinforcement-learning end-to-end +1

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

2025 arXiv 75

Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, Xinggang Wang, arXiv, 2025. 📄 **[Read on arXiv](https://arxiv.org/abs/2503.07608)** AlphaDrive is the first application of GRPO (Group Relative Policy Optimization) reinforc…

paper autonomous-driving vla vlm +3

Robotic Control via Embodied Chain-of-Thought Reasoning

2024 arXiv

📄 **[Read on arXiv](https://arxiv.org/abs/2407.08693)** ECoT (UC Berkeley / Stanford / University of Warsaw, 2024) introduces Embodied Chain-of-Thought reasoning for Vision-Language-Action (VLA) models, demonstrating that gen…

paper robotics vla chain-of-thought +1

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

2024 ICML 2024 Spotlight 200

📄 **[Read on arXiv](https://arxiv.org/abs/2402.01817)** This paper by Subbarao Kambhampati and colleagues at Arizona State University addresses one of the most important questions in modern AI: can large language models…

nlp planning reasoning llm +2

DriveLM: Driving with Graph Visual Question Answering

2024 ECCV 448

📄 **[Read on arXiv](https://arxiv.org/abs/2312.14150)** DriveLM formalizes driving reasoning as Graph Visual Question Answering (GVQA), where QA pairs are connected via logical dependencies forming a reasoning graph tha…

paper autonomous-driving vlm reasoning +2

Agent-Driver: A Language Agent for Autonomous Driving

2024 COLM 2024 140

📄 **[Read on arXiv](https://arxiv.org/abs/2311.10813)** Agent-Driver reframes autonomous driving as a cognitive agent problem, positioning a large language model as the central reasoning and planning engine rather than…

paper autonomous-driving llm planning +3

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

2023 NeurIPS 2023 3561

📄 **[Read on arXiv](https://arxiv.org/abs/2305.10601)** Language models are typically used in a left-to-right token-generation mode, which limits their ability to explore alternative reasoning paths or backtrack from mi…

paper nlp reasoning language-modeling +4

Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving

2023 ECCV 107

📄 **[Read on arXiv](https://arxiv.org/abs/2312.03661)** Reason2Drive provides the largest reasoning chain dataset for driving (>600K video-text pairs from nuScenes, Waymo, and ONCE) and introduces an aggregated evaluati…

paper autonomous-driving vla reasoning +2

ReAct: Synergizing Reasoning and Acting in Language Models

2023 ICLR 2023 8533

📄 **[Read on arXiv](https://arxiv.org/abs/2210.03629)** Large language models had demonstrated two powerful capabilities in isolation: chain-of-thought reasoning for multi-step problem solving, and action generation for…

paper nlp reasoning language-modeling +3

GPT-Driver: Learning to Drive with GPT

2023 NeurIPS FMDM Workshop 396

📄 **[Read on arXiv](https://arxiv.org/abs/2310.01415)** GPT-Driver reformulates autonomous driving motion planning as a language modeling problem. Scene context (object positions, velocities, lane geometry) and ego vehi…

paper autonomous-driving vla llm +2

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

2022 NeurIPS 2022 16871

📄 **[Read on arXiv](https://arxiv.org/abs/2201.11903)** Wei et al., arXiv 2201.11903, 2022 (NeurIPS 2022). Chain-of-thought (CoT) prompting demonstrates that including interme…

paper ilya-30 llm prompting +2