ESC

SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving

:page_facing_up: Read on arXiv

Overview

SparseDriveV2 by Sun et al. (2026) pushes the performance boundary of scoring-based trajectory planning by demonstrating that "scoring is all you need" -- given sufficient coverage of the action space, a simple score-and-select approach outperforms complex generative planners. The core innovation is a scalable trajectory vocabulary that factorizes spatiotemporal trajectories into geometric paths and velocity profiles, enabling combinatorial explosion of candidates (1024 paths x 256 velocities = 262,144 total) while remaining computationally tractable.

SparseDriveV2 achieves 92.0 PDMS and 90.1 EPDMS on NAVSIM, and 89.15 Driving Score with 70.00% Success Rate on Bench2Drive using a lightweight ResNet-34 backbone. The paper presents a systematic scaling study (using Hydra-MDP as a baseline) showing that planning performance improves consistently as trajectory anchors increase in density, without saturation before computational limits -- suggesting that larger vocabularies will yield further gains.

Key Contributions

  • Factorized trajectory vocabulary: Decomposes spatiotemporal trajectories into independent geometric paths (spatial) and velocity profiles (temporal), enabling 262,144 candidates from only 1024+256=1280 base elements
  • Scaling law for trajectory anchors: Demonstrates that planning performance improves consistently as vocabulary density increases, with no saturation observed before memory limits
  • Two-stage scalable scoring: Coarse factorized scoring with progressive filtering (128/64 → 20/20) prunes the 262K candidates to 400 composed trajectories, followed by fine-grained trajectory re-conditioning scoring
  • Metric supervision training: Multi-task objectives including metric-based losses for safety, progress, comfort, and rule compliance aligned with the PDMS evaluation metric
  • 92.0 PDMS on NAVSIM v1, 90.1 EPDMS on NAVSIM v2: New state-of-the-art on NAVSIM; also achieves 89.15 Driving Score / 70.00% Success Rate on Bench2Drive with ResNet-34 backbone

Architecture / Method

SparseDriveV2 framework overview

  ┌───────────────────────────────────────────────────────────┐
                Factorized Trajectory Vocabulary              
                                                             
    ┌─────────────────────┐    ┌──────────────────────────┐  
     Geometric Paths          Velocity Profiles          
     (1024 anchors)           (256 anchors)              
     [straight, turn,         [accel, decel, const,      
      lane change, ...]        stop, ...]                
    └─────────┬───────────┘    └──────────┬───────────────┘  
  └────────────┼───────────────────────────┼──────────────────┘
                                          
                                          
  ┌───────────────────────────────────────────────────────────┐
    Stage 1: Coarse Factorized Scoring                       
    Score paths independently ──► 128 paths (layer 1)  20  
    Score velocities independently ──► 64 vels (layer 1)  20
    262,144 candidates ──► 400 composed trajectories         
  └──────────────────────────┬────────────────────────────────┘
                             
                             
  ┌───────────────────────────────────────────────────────────┐
    Stage 2: Fine-Grained Scoring                            
    Compose path + velocity ──► full trajectory              
    Trajectory re-conditioning via scene interaction modules 
    400 ──► Best trajectory                                  
  └──────────────────────────┬────────────────────────────────┘
                             
                             
  ┌───────────────────────────────────────────────────────────┐
    Metric Supervision: Safety + Progress + Comfort + Rules  
  └───────────────────────────────────────────────────────────┘

The architecture extends Sparsedrive End To End Autonomous Driving Via Sparse Scene Representation with a fundamentally new planning module:

Backbone: SparseDriveV2 uses a lightweight ResNet-34 (21.8M parameters) for NAVSIM experiments and ResNet-50 for Bench2Drive experiments.

Factorized Vocabulary Construction: Instead of directly discretizing the full trajectory space, SparseDriveV2 separates trajectories into two independent components: - Geometric paths (1024 anchors): Describe the spatial shape of the trajectory (straight, left turn, lane change, etc.) as sequences of 2D waypoints - Velocity profiles (256 anchors): Describe the temporal progression along a path (accelerate, decelerate, constant speed, stop) - Composition: Any path can be combined with any velocity profile, yielding 1024 x 256 = 262,144 spatiotemporal trajectory candidates

Trajectory factorization

Two-Stage Scoring Pipeline: - Stage 1 (Coarse): Score paths and velocities independently using scene feature interactions via deformable aggregation modules. Progressive filtering retains top 128 paths and 64 velocities in the first decoder layer, then refines to top 20 paths and 20 velocities in the second layer, yielding 400 final composed trajectory hypotheses. - Stage 2 (Fine): Compose the selected paths and velocities into full trajectories. Each composed trajectory undergoes trajectory re-conditioning through additional scene interaction modules to account for spatiotemporal dependencies. Select the highest-scoring trajectory as the final plan.

Scaling study

Metric Supervision: Training losses are directly aligned with the PDMS evaluation metric, including sub-losses for: - Safety: Collision probability with predicted agent trajectories - Progress: Distance traveled toward the navigation goal - Comfort: Smoothness of acceleration and jerk profiles - Rule compliance: Lane keeping, speed limit adherence, traffic signal obedience

Results

Method PDMS
UniAD ~75
VAD ~78
SparseDrive (V1) ~85
SparseDriveV2 92.0
Metric Score
EPDMS (corrected protocol) 90.1
EPDMS* (original protocol) 86.7
Ego Progress 91.1

Bench2Drive Benchmark

Metric Score
Driving Score 89.15
Success Rate 70.00%
Multi-Ability (mean) 67.67

Scaling Behavior

The motivating scaling study of Hydra-MDP shows performance gains from 1,024 to 16,384 anchors without saturation, suggesting insufficient action space coverage -- not inherent approach limitations -- constrained prior static vocabulary methods. In SparseDriveV2, scaling vocabulary from 512×128 to 1024×256 anchors improves EPDMS from 88.7 to 90.1.

Qualitative navigation results

Limitations

  • The 262K vocabulary, while efficient through factorization, still requires significant memory and compute for scoring
  • Factorized scoring in Stage 1 may prune good trajectories if path quality depends on velocity (e.g., a tight turn requires specific speed)
  • The vocabulary is pre-defined and fixed; novel maneuvers not represented in the vocabulary cannot be executed
  • Evaluated on NAVSIM v1, NAVSIM v2, and Bench2Drive; performance on fully closed-loop simulators (e.g., CARLA) is not reported
  • The "scoring is all you need" claim assumes sufficient vocabulary coverage, which may not hold in out-of-distribution scenarios

Connections