Open Questions: LLM Reasoning for Autonomy

Stream-specific open questions for LLM reasoning applied to driving and robotics. See Open Questions for the full tree across all streams.

Language role in autonomy

  1. Language at maturity: As driving VLAs improve, does language remain as intermediate reasoning (Senna, ORION), get absorbed into dense embeddings, or evolve into something else? DiMA's strategy of distilling then discarding the LLM suggests language may be a training-time tool, not a runtime necessity.

  2. Reasoning vs. planning: "LLMs Can't Plan" (ICML 2024, 200+ citations) argues LLMs should reason, not plan — they need external model-based verifiers. Is this distinction fundamental, or will larger models with better training overcome planning limitations?

  3. Chain-of-thought overhead: ECoT gains +28% task success from embodied CoT, but at added inference-time cost. Tree of Thoughts showed that deliberate multi-path exploration dramatically improves complex problem solving. Is CoT essential for safety-critical decisions, or can its benefits be distilled into faster models (as DeepSeek-R1 distills to 1.5B)?

  4. Structured vs. free-form reasoning: DriveLM's Graph VQA imposes structure (perception→prediction→planning). Reason2Drive uses chain-based decomposition. Free-form CoT is more flexible. Which level of structure optimizes the reasoning-accuracy trade-off for driving?
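To make the structured end of this spectrum concrete, here is a minimal sketch of a perception→prediction→planning chain in the spirit of DriveLM's Graph VQA and Reason2Drive's decomposition. The `Stage` dataclass and its fields are hypothetical illustrations, not any paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    """One step in a structured reasoning chain (hypothetical schema)."""
    name: str                      # e.g. "perception"
    question: str                  # the sub-question this stage answers
    answer: str = ""               # filled in by the model at inference time
    depends_on: list[str] = field(default_factory=list)

def make_drive_chain() -> list[Stage]:
    """A minimal perception -> prediction -> planning decomposition."""
    return [
        Stage("perception", "What objects are near the ego vehicle?"),
        Stage("prediction", "Where will each object move next?",
              depends_on=["perception"]),
        Stage("planning", "What should the ego vehicle do?",
              depends_on=["prediction"]),
    ]

def ordered(stages: list[Stage]) -> bool:
    """Check that every stage appears after all stages it depends on."""
    seen: set[str] = set()
    for s in stages:
        if any(d not in seen for d in s.depends_on):
            return False
        seen.add(s.name)
    return True
```

Free-form CoT corresponds to collapsing all stages into a single unconstrained text generation; the structure trade-off in the question is exactly how many of these dependency edges to impose.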

RL and reasoning emergence

  5. RL-emergent reasoning for driving: DeepSeek-R1 showed CoT emerges from RL with rule-based rewards in math/code. Alpamayo-R1 applies RL to driving reasoning. Can driving produce sufficiently clean reward signals for RL-emergent reasoning, or is the reward specification problem fundamentally harder than math verification?

  6. Reward function design: Math and code have verifiable correctness signals. Driving has safety (no collision), comfort (smooth ride), progress (reach goal), and efficiency (minimize time). Are these sufficient for GRPO-style RL, or does driving need learned reward models?
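To make the reward-specification question concrete, here is a minimal sketch of how the four driving signals named above might be combined into a scalar reward and normalized into GRPO-style group-relative advantages. The `RolloutMetrics` container, the weights, and the collision penalty are all illustrative assumptions, not a proposal from any of the cited papers:

```python
from dataclasses import dataclass

@dataclass
class RolloutMetrics:
    """Summary of one sampled driving rollout (hypothetical container)."""
    collided: bool       # safety: any collision during the rollout
    max_jerk: float      # comfort: peak jerk in m/s^3
    progress_m: float    # progress: distance toward goal, meters
    duration_s: float    # efficiency: time taken, seconds

def driving_reward(m: RolloutMetrics,
                   w_comfort: float = 0.1,
                   w_progress: float = 1.0,
                   w_time: float = 0.05) -> float:
    """Combine the four signals into one scalar (illustrative weights)."""
    if m.collided:
        return -10.0     # safety dominates every other term
    return (w_progress * m.progress_m
            - w_comfort * m.max_jerk
            - w_time * m.duration_s)

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    std = std or 1.0     # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

The open question is whether such hand-specified terms are clean enough: unlike a math checker's binary correctness, each weight here encodes a value judgment that the policy can exploit.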

Cognitive architecture

  7. Dual-process reasoning: AutoVLA dynamically switches between fast (direct action) and slow (CoT) reasoning. How should the system decide when to think deeply? Is complexity estimation itself a learnable skill?

  8. LLM as interface vs. core: Drive as You Speak uses LLMs as passenger interfaces. Should production AV stacks separate the "interaction LLM" from the "planning LLM/VLA," or can a single model serve both roles safely?

  9. Agent frameworks at scale: Agent-Driver established the LLM-as-agent framework with tool use and memory. AsyncDriver decouples LLM reasoning from real-time planning via async updates. Is the agent framework (tools + memory + reasoning) the right paradigm for autonomous driving, or is it too slow for safety-critical real-time control?
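The when-to-think-deeply question reduces to a gating decision. Below is a minimal sketch of such a gate; the `estimate_complexity` heuristic and its scene features are purely illustrative (in practice the estimator would itself be learned, which is exactly what the question asks):

```python
def estimate_complexity(scene: dict) -> float:
    """Toy hand-crafted complexity score from scene features.
    A real system would learn this from data."""
    score = 0.1 * scene.get("num_agents", 0)
    if scene.get("unprotected_turn", False):
        score += 0.5
    if scene.get("occlusion", False):
        score += 0.3
    return score

def plan(scene: dict, threshold: float = 0.6) -> str:
    """Route to fast direct-action inference or slow CoT inference."""
    if estimate_complexity(scene) < threshold:
        return "fast"   # direct action head, no intermediate text
    return "slow"       # generate chain-of-thought before acting
```

The design question hiding in `threshold` is the safety-critical one: a gate that misclassifies a hard scene as easy skips exactly the deliberation that scene needed.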

Partially answered

  • Q2 (Reasoning vs. planning): Evidence from "LLMs Can't Plan" and the success of VLA + world-model-verifier architectures (WoTE) supports the distinction. But DriveGPT's scaling laws suggest that with enough data, autoregressive prediction may subsume explicit planning.
  • Q3 (CoT overhead): DeepSeek-R1's distillation to 1.5B models shows reasoning can be compressed. DiMA's distill-and-discard approach is the driving analog.
  • Q7 (Dual-process): AutoVLA's dual-process approach is promising. Qwen3's unified thinking mode (same model dynamically chooses deep vs. quick thinking) may be the foundation model analog.

Key papers for this stream

Paper | Relevance
--- | ---
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks | LLMs should reason, not plan
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Original CoT for reasoning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | RL-emergent reasoning
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Multi-path deliberate reasoning
ECoT: Embodied Chain-of-Thought Reasoning for Vision-Language-Action Models | Embodied CoT for VLAs
DriveLM: Driving with Graph Visual Question Answering | Structured graph reasoning
A Language Agent for Autonomous Driving | LLM-as-agent for driving
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Language as intermediate reasoning
DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving | Distill-and-discard LLM
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving | Dual-process adaptive reasoning