# Open Questions: LLM Reasoning for Autonomy
Stream-specific open questions for LLM reasoning applied to driving and robotics. See Open Questions for the full tree across all streams.
## Language role in autonomy
- Language at maturity: As driving VLAs improve, does language remain as intermediate reasoning (Senna, ORION), get absorbed into dense embeddings, or evolve into something else? DiMA's strategy of distilling then discarding the LLM suggests language may be a training-time tool, not a runtime necessity.
- Reasoning vs. planning: "LLMs Can't Plan" (ICML 2024, 200+ citations) argues LLMs should reason, not plan — they need external model-based verifiers. Is this distinction fundamental, or will larger models with better training overcome planning limitations?
- Chain-of-thought overhead: ECoT gets +28% success from embodied CoT, but at inference cost. Tree of Thoughts showed that deliberate multi-path exploration dramatically improves complex problem solving. Is CoT essential for safety-critical decisions, or can its benefits be distilled into faster models (as DeepSeek-R1 distills to 1.5B)?
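The distillation route above can be sketched minimally. This is a toy illustration, not DeepSeek-R1's actual recipe: there, the small model is fine-tuned on reasoning traces sampled from the large model, which amounts to lowering the student's cross-entropy on those traces. All numbers here are hypothetical.

```python
import math

# Toy sequence-level distillation signal: the student's negative
# log-likelihood of a teacher-generated chain-of-thought trace.
# Probabilities below are made up for illustration.

def cross_entropy(student_probs: list[float]) -> float:
    """NLL of the teacher's trace under the student's per-token probs."""
    return -sum(math.log(p) for p in student_probs)

# Student's per-token probabilities on one teacher CoT trace,
# before and after fine-tuning on such traces (hypothetical).
before = [0.2, 0.1, 0.3, 0.25]
after = [0.6, 0.5, 0.7, 0.65]
print(cross_entropy(after) < cross_entropy(before))  # training lowers NLL
```

The open question is whether the distilled student keeps the safety-relevant behavior of the slow CoT teacher, not just its likelihoods.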
- Structured vs. free-form reasoning: DriveLM's Graph VQA imposes structure (perception→prediction→planning). Reason2Drive uses chain-based decomposition. Free-form CoT is more flexible. Which level of structure optimizes the reasoning-accuracy trade-off for driving?
## RL and reasoning emergence
- RL-emergent reasoning for driving: DeepSeek-R1 showed CoT emerges from RL with rule-based rewards in math/code. Alpamayo-R1 applies RL to driving reasoning. Can driving produce sufficiently clean reward signals for RL-emergent reasoning, or is the reward specification problem fundamentally harder than math verification?
- Reward function design: Math and code have verifiable correctness signals. Driving has safety (no collision), comfort (smooth ride), progress (reach goal), and efficiency (minimize time). Are these sufficient for GRPO-style RL, or does driving need learned reward models?
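A minimal sketch of what "these signals for GRPO-style RL" would look like: combine the four driving signals into a scalar reward, then normalize rewards within a rollout group, as GRPO does. The weights and signal encodings are assumptions for illustration, not from any cited paper.

```python
import statistics

def driving_reward(collided: bool, jerk: float, progress_m: float,
                   time_s: float) -> float:
    """Scalar reward from safety, comfort, progress, efficiency terms.

    Weights are illustrative; tuning them is part of the open problem.
    """
    safety = -100.0 if collided else 0.0   # hard safety penalty
    comfort = -0.5 * jerk                  # smoother ride, higher reward
    progress = 1.0 * progress_m            # distance toward goal
    efficiency = -0.1 * time_s             # time cost
    return safety + comfort + progress + efficiency

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: reward minus group mean, over group std."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Three rollouts of the same scene: two safe, one with a collision.
rollouts = [driving_reward(False, 0.2, 50.0, 10.0),
            driving_reward(False, 1.5, 48.0, 12.0),
            driving_reward(True, 0.1, 20.0, 6.0)]
advs = group_advantages(rollouts)
print(advs)  # collision rollout gets a strongly negative advantage
```

Unlike a math verifier, every term here embeds a design choice (what counts as comfort, how to trade progress against time), which is exactly the reward specification problem the question raises.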
## Cognitive architecture
- Dual-process reasoning: AutoVLA dynamically switches between fast (direct action) and slow (CoT) reasoning. How should the system decide when to think deeply? Is complexity estimation itself a learnable skill?
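The switching decision can be sketched as a threshold on a cheap complexity estimate. This is a toy router with hypothetical features and thresholds, not AutoVLA's actual gating mechanism; the question of whether such an estimator should be hand-built or learned is exactly the one posed above.

```python
# Toy dual-process router: a cheap scene-complexity score decides
# between a fast direct-action head and a slow chain-of-thought call.
# Feature weights and the threshold are illustrative assumptions.

def scene_complexity(n_agents: int, occlusion: float, novelty: float) -> float:
    """Toy complexity score in [0, 1]; a real system might learn this."""
    return min(1.0, 0.05 * n_agents + 0.5 * occlusion + 0.5 * novelty)

def act(n_agents: int, occlusion: float, novelty: float,
        threshold: float = 0.5) -> str:
    if scene_complexity(n_agents, occlusion, novelty) < threshold:
        return "fast: direct action head"      # System 1
    return "slow: chain-of-thought rollout"    # System 2

print(act(n_agents=2, occlusion=0.1, novelty=0.0))  # simple scene -> fast
print(act(n_agents=8, occlusion=0.6, novelty=0.4))  # cluttered scene -> slow
```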
- LLM as interface vs. core: Drive as You Speak uses LLMs as passenger interfaces. Should production AV stacks separate the "interaction LLM" from the "planning LLM/VLA," or can a single model serve both roles safely?
- Agent frameworks at scale: Agent-Driver established the LLM-as-agent framework with tool use and memory. AsyncDriver decouples LLM reasoning from real-time planning via async updates. Is the agent framework (tools + memory + reasoning) the right paradigm for autonomous driving, or is it too slow for safety-critical real-time control?
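The async decoupling can be sketched with two loops: a slow "reasoner" thread refreshing high-level guidance while a fixed-rate control loop reads whatever guidance is latest, never blocking on the LLM. This is a schematic of the pattern, not AsyncDriver's implementation; timings and the guidance format are assumptions.

```python
import threading
import time

# Shared guidance, written by the slow reasoner, read by the control loop.
guidance = {"hint": "keep lane"}
lock = threading.Lock()

def slow_reasoner(stop: threading.Event) -> None:
    """Stands in for an LLM that takes ~50 ms per reasoning update."""
    i = 0
    while not stop.is_set():
        time.sleep(0.05)                 # simulated LLM latency
        with lock:
            guidance["hint"] = f"plan-{i}"
        i += 1

def control_loop(steps: int, period: float = 0.01) -> list[str]:
    """Real-time planner tick: reads latest guidance, never waits for it."""
    used = []
    for _ in range(steps):
        with lock:
            used.append(guidance["hint"])
        time.sleep(period)
    return used

stop = threading.Event()
threading.Thread(target=slow_reasoner, args=(stop,), daemon=True).start()
hints = control_loop(steps=20)           # ~200 ms of 100 Hz-style control
stop.set()
print(len(hints), sorted(set(hints)))    # guidance changed mid-run
```

The pattern buys real-time safety at the cost of acting on stale reasoning, which is the crux of the "too slow for safety-critical control" question.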
## Partially answered
- Q2 (Reasoning vs. planning): Evidence from "LLMs Can't Plan" and the success of VLA + world-model-verifier architectures (WoTE) supports the distinction. But DriveGPT's scaling laws suggest that with enough data, autoregressive prediction may subsume explicit planning.
- Q3 (CoT overhead): DeepSeek-R1's distillation to 1.5B models shows reasoning can be compressed. DiMA's distill-and-discard approach is the driving analog.
- Q7 (Dual-process): AutoVLA's dual-process approach is promising. Qwen3's unified thinking mode (same model dynamically chooses deep vs. quick thinking) may be the foundation model analog.
## Key papers for this stream
| Paper | Relevance |
|---|---|
| LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks | LLMs should reason, not plan |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Original CoT for reasoning |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | RL-emergent reasoning |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Multi-path deliberate reasoning |
| ECoT: Embodied Chain-of-Thought Reasoning for Vision-Language-Action Models | Embodied CoT for VLAs |
| DriveLM: Driving with Graph Visual Question Answering | Structured graph reasoning |
| A Language Agent for Autonomous Driving | LLM-as-agent for driving |
| Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Language as intermediate reasoning |
| DiMA: Distilling Multi-Modal Large Language Models for Autonomous Driving | Distill and discard LLM |
| AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving | Dual-process adaptive reasoning |