Tags

239 tags across the wiki

Pages tagged bev

BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance

📄 **[Read on arXiv](https://arxiv.org/abs/2502.19694)** BEVDiffuser addresses a fundamental but under-explored problem in BEV-based perception: the inherent noise in BEV feature maps caused by sensor limitations and the l…

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2211.10439)** BEVFormer v2 addresses a critical bottleneck in camera-based 3D perception for autonomous driving: the inability to leverage powerful modern 2D image backbones (e.…

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2203.17270)** Li, Wang, Li, Xie, Sima, Lu, Qiao, Dai (Shanghai AI Lab / Nanjing University / HKU), ECCV 2022. BEVFormer generates a un…

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

paper

📄 [arXiv:2312.01696](https://arxiv.org/abs/2312.01696) BEVNeXt revives dense BEV (bird's-eye-view) frameworks for camera-based 3D object detection, demonstrating that with the right design choices, dense approaches can…

Drive-OccWorld: Driving in the Occupancy World

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2408.14197)** Drive-OccWorld introduces a vision-centric 4D occupancy forecasting world model that directly integrates with end-to-end planning. The core premise is that current…

FB-BEV: BEV Representation from Forward-Backward View Transformations

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2308.02236)** FB-BEV addresses a fundamental tension in camera-based BEV perception for autonomous driving: **forward projection** methods (like Lift-Splat-Shoot) generate BEV f…

FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2311.12058)** Occupancy prediction has emerged as a powerful perception paradigm for autonomous driving, predicting per-voxel semantic labels in 3D space to handle arbitrary obj…

GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2407.14108)** Bird's-eye view (BEV) semantic segmentation from multi-camera images is a core perception task in autonomous driving, but existing image-to-BEV transformation meth…

GaussianLSS: Toward Real-world BEV Perception with Depth Uncertainty via Gaussian Splatting

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2504.01957)** Bird's-Eye View (BEV) perception faces a fundamental trade-off between accuracy and computational efficiency. High-performing 3D projection methods like BEVFormer…

Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2008.05711)** Lift, Splat, Shoot (LSS) introduced a differentiable pipeline for transforming multi-camera images into a unified bird's-eye view (BEV) representation without requ…

Open Questions: BEV Perception & 3D Occupancy

query

Stream-specific open questions for the BEV perception and 3D occupancy pillar. See wiki/queries/open-questions for the full tree across all streams. 1. **Dense vs. sparse vs. Gaussian:** BEVNeXt revived dense BEV to 64.…

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2311.12754)** SelfOcc (Huang et al., Tsinghua University, CVPR 2024) introduces the first self-supervised framework for vision-based 3D occupancy prediction that works with mult…

VLP: Vision Language Planning for Autonomous Driving

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2401.05577)** VLP (Vision Language Planning) by Pan et al. (CVPR 2024) represents a fundamentally different approach to using language in autonomous driving compared to instruct…

WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2504.01941)** End-to-end driving models typically output a single trajectory and trust it entirely, with no mechanism to evaluate whether the predicted path is safe before execu…