Open Questions

This page is the root of the open-questions tree. Each research pillar has its own dedicated page with stream-specific questions grounded in the papers we've ingested.

Question tree

Overview
├── 1. End-to-End Driving (9 questions)
│   ├── Unified vs. decoupled VLA architecture
│   ├── Generative vs. discriminative planning
│   ├── RL vs. imitation ceiling
│   ├── Scaling laws for driving
│   └── ... → open-questions-e2e
│
├── 2. Vision-Language-Action Models (10 questions)
│   ├── Dual-system generality
│   ├── Cross-embodiment scaling limits
│   ├── RL for VLAs
│   ├── Robotics → driving transfer gap
│   └── ... → open-questions-vla
│
├── 3. LLM Reasoning for Autonomy (9 questions)
│   ├── Language role at maturity
│   ├── Reasoning vs. planning distinction
│   ├── RL-emergent reasoning for driving
│   ├── Dual-process cognitive architecture
│   └── ... → open-questions-llm-reasoning
│
├── 4. Foundation Models & Cross-Embodiment (10 questions)
│   ├── Compute-optimal scaling for embodied AI
│   ├── Open vs. closed model trajectory
│   ├── Cross-embodiment action universality
│   ├── Alignment for physical systems
│   └── ... → open-questions-foundation-models
│
└── 5. BEV Perception & 3D Occupancy (10 questions)
    ├── Dense vs. sparse vs. Gaussian
    ├── Occupancy world models
    ├── Self-supervised methods
    ├── Occupancy role in E2E planning
    └── ... → open-questions-bev-perception

Stream pages

Stream             | Questions | Key tension                                           | Top papers
-------------------|-----------|-------------------------------------------------------|-------------------------------------------------
End-to-End Driving | 9         | Unified vs. decoupled, generative vs. discriminative  | UniAD, DriveTransformer, EMMA, DiffusionDrive
VLA Models         | 10        | Dual-system convergence, cross-embodiment limits      | pi0, CrossFormer, OpenVLA, GR00T N1
LLM Reasoning      | 9         | Language as scaffold vs. core, reasoning vs. planning | LLMs Can't Plan, DeepSeek-R1, ECoT, DriveLM
Foundation Models  | 10        | Open vs. closed, scaling laws for embodied AI         | Scaling Laws, HPT, CLIP, LoRA, Cosmos
BEV & 3D Occupancy | 10        | Dense vs. Gaussian, occupancy in E2E                  | GaussianFormer, OccWorld, BEVNeXt, OccMamba

Total: 48 open questions across 5 research pillars, grounded in 198 papers spanning 1993-2026.
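
As a sanity check on the total, the per-stream counts from the table can be tallied with a short script. This is a minimal sketch, not part of the wiki tooling: the names `QUESTIONS_PER_STREAM` and `tally` are hypothetical, and the counts are copied by hand from the table above, so they would need updating whenever a stream page changes.

```python
# Question counts per stream, copied by hand from the "Stream pages" table.
# The dictionary name and the helper below are illustrative assumptions.
QUESTIONS_PER_STREAM = {
    "End-to-End Driving": 9,
    "VLA Models": 10,
    "LLM Reasoning": 9,
    "Foundation Models": 10,
    "BEV & 3D Occupancy": 10,
}


def tally(counts):
    """Return (number of streams, total number of questions)."""
    return len(counts), sum(counts.values())


streams, total = tally(QUESTIONS_PER_STREAM)
print(f"{total} open questions across {streams} research pillars")
```

If a stream page gains or loses a question, rerunning the tally shows whether the stated total has drifted.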

Cross-cutting themes

These questions recur across multiple streams and may represent the deepest open problems:

1. The RL frontier

Every stream is hitting an imitation-learning ceiling. CarPlanner (E2E), pi0.6 (VLA), DeepSeek-R1 (reasoning), and AlphaDrive (driving VLM) all show that RL pushes beyond SFT, but reward design for physical systems remains the bottleneck.

- E2E: Open Questions E2E Q5
- VLA: Open Questions Vla Q5
- Reasoning: Open Questions Llm Reasoning Q5-Q6

2. Scaling laws for embodied AI

Language scaling laws are well-established. Do they transfer to multimodal embodied data?

- E2E: Open Questions E2E Q6 (DriveGPT scaling)
- Foundation: Open Questions Foundation Models Q1 (compute-optimal embodied)
- VLA: Open Questions Vla Q2 (cross-embodiment scaling)

3. Distillation as deployment pattern

The "train large, distill small" pattern appears universal: Gemma 3, DeepSeek-R1, and DiMA all use it. Is this the deployment path for safety-critical systems?

- Foundation: Open Questions Foundation Models Q4
- Reasoning: Open Questions Llm Reasoning Q1, Q3
- E2E: Open Questions E2E Q9

4. Evaluation adequacy

Every stream questions whether current benchmarks measure what matters.

- E2E: Open Questions E2E Q7 (NAVSIM/Bench2Drive)
- BEV: Open Questions Bev Perception Q10 (mIoU vs. planning quality)
- VLA: Open Questions Vla Q7 (open-world failure modes)

5. Explicit structure vs. learned representations

The central tension of the entire wiki: when does hand-designed structure help?

- E2E: Open Questions E2E Q1-Q3
- BEV: Open Questions Bev Perception Q9
- Reasoning: Open Questions Llm Reasoning Q4