Open Questions
This page is the root of the open-questions tree. Each research pillar has its own dedicated page with stream-specific questions grounded in the papers we've ingested.
Question tree
Overview
├── 1. End-to-End Driving (9 questions)
│   ├── Unified vs. decoupled VLA architecture
│   ├── Generative vs. discriminative planning
│   ├── RL vs. imitation ceiling
│   ├── Scaling laws for driving
│   └── ... → open-questions-e2e
│
├── 2. Vision-Language-Action Models (10 questions)
│   ├── Dual-system generality
│   ├── Cross-embodiment scaling limits
│   ├── RL for VLAs
│   ├── Robotics → driving transfer gap
│   └── ... → open-questions-vla
│
├── 3. LLM Reasoning for Autonomy (9 questions)
│   ├── Language role at maturity
│   ├── Reasoning vs. planning distinction
│   ├── RL-emergent reasoning for driving
│   ├── Dual-process cognitive architecture
│   └── ... → open-questions-llm-reasoning
│
├── 4. Foundation Models & Cross-Embodiment (10 questions)
│   ├── Compute-optimal scaling for embodied AI
│   ├── Open vs. closed model trajectory
│   ├── Cross-embodiment action universality
│   ├── Alignment for physical systems
│   └── ... → open-questions-foundation-models
│
└── 5. BEV Perception & 3D Occupancy (10 questions)
    ├── Dense vs. sparse vs. Gaussian
    ├── Occupancy world models
    ├── Self-supervised methods
    ├── Occupancy role in E2E planning
    └── ... → open-questions-bev-perception
Stream pages
| Stream | Questions | Key tension | Top papers |
|---|---|---|---|
| End-to-End Driving | 9 | Unified vs. decoupled, generative vs. discriminative | UniAD, DriveTransformer, EMMA, DiffusionDrive |
| VLA Models | 10 | Dual-system convergence, cross-embodiment limits | pi0, CrossFormer, OpenVLA, GR00T N1 |
| LLM Reasoning | 9 | Language as scaffold vs. core, reasoning vs. planning | LLMs Can't Plan, DeepSeek-R1, ECoT, DriveLM |
| Foundation Models | 10 | Open vs. closed, scaling laws for embodied AI | Scaling Laws, HPT, CLIP, LoRA, Cosmos |
| BEV & 3D Occupancy | 10 | Dense vs. Gaussian, occupancy in E2E | GaussianFormer, OccWorld, BEVNeXt, OccMamba |
Total: 48 open questions across 5 research pillars, grounded in 198 papers spanning 1993-2026.
Cross-cutting themes
These questions recur across multiple streams and may represent the deepest open problems:
1. The RL frontier
Every stream is hitting an imitation-learning ceiling. CarPlanner (E2E), pi0.6 (VLA), DeepSeek-R1 (reasoning), and AlphaDrive (driving VLM) all show that RL pushes beyond SFT, but reward design for physical systems remains the bottleneck.
- E2E: Open Questions E2E Q5
- VLA: Open Questions VLA Q5
- Reasoning: Open Questions LLM Reasoning Q5-Q6
2. Scaling laws for embodied AI
Language scaling laws are well established. Do they transfer to multimodal embodied data?
- E2E: Open Questions E2E Q6 (DriveGPT scaling)
- Foundation: Open Questions Foundation Models Q1 (compute-optimal embodied)
- VLA: Open Questions VLA Q2 (cross-embodiment scaling)
3. Distillation as deployment pattern
The "train large, distill small" pattern appears universal: Gemma 3, DeepSeek-R1, and DiMA all use it. Is this the deployment path for safety-critical systems?
- Foundation: Open Questions Foundation Models Q4
- Reasoning: Open Questions LLM Reasoning Q1, Q3
- E2E: Open Questions E2E Q9
4. Evaluation adequacy
Every stream questions whether current benchmarks measure what matters.
- E2E: Open Questions E2E Q7 (NAVSIM/Bench2Drive)
- BEV: Open Questions BEV Perception Q10 (mIoU vs. planning quality)
- VLA: Open Questions VLA Q7 (open-world failure modes)
5. Explicit structure vs. learned representations
The central tension of the entire wiki: when does hand-designed structure help?
- E2E: Open Questions E2E Q1-Q3
- BEV: Open Questions BEV Perception Q9
- Reasoning: Open Questions LLM Reasoning Q4
Navigation
- Overview — Wiki overview and five research pillars
- Research Thesis — Current thesis synthesizing these questions
- Vision Language Action — VLA concept page
- End To End Architectures — E2E concept page
- Perception — BEV/perception concept page
- Foundation Models — Foundation models concept page