Tags
239 tags across the wiki
paper 114
autonomous-driving 92
foundation-model 55
transformer 53
vla 49
planning 42
robotics 41
computer-vision 36
perception 32
ilya-30 29
multimodal 29
nlp 26
end-to-end 24
language-modeling 24
llm 17
reasoning 17
imitation-learning 16
3d-occupancy 15
vlm 15
bev 14
diffusion 13
e2e 12
reinforcement-learning 12
world-model 12
chain-of-thought 10
benchmark 9
scaling 9
cross-embodiment 7
driving 6
gaussian-splatting 6
generative-models 6
image-classification 6
information-theory 6
questions 6
self-supervised 6
sources 6
alignment 5
attention 5
cnn 5
foundation 5
knowledge-distillation 5
language-model 5
prediction 5
simulation 5
evaluation 4
image-generation 4
instruction-tuning 4
mixture-of-experts 4
rnn 4
sequence-to-sequence 4
sparse-representation 4
video-prediction 4
explainability 3
flow-matching 3
lstm 3
map 3
occupancy 3
open-source 3
semantic-segmentation 3
sequence-modeling 3
trajectory-prediction 3
vectorized-representation 3
3d-detection 2
3d-perception 2
3d-reconstruction 2
action-representation 2
autonomy 2
autoregressive 2
bimanual 2
closed-loop 2
complexity-theory 2
dataset 2
deployment 2
distributed-training 2
efficient-inference 2
embodied 2
fine-tuning 2
foundation-models 2
foundational 2
gaussian-representation 2
generation 2
generative 2
human-interaction 2
humanoid 2
manipulation 2
memory-augmented-networks 2
ml 2
multi-camera 2
multilingual 2
object-detection 2
parameter-efficient-fine-tuning 2
prompting 2
real-time 2
regularization 2
relational-reasoning 2
residual-networks 2
rlhf 2
scaling-laws 2
segmentation 2
self-improvement 2
self-supervised-learning 2
state-space 2
systems 2
thermodynamics 2
vision-language-model 2
vision-transformer 2
visual-question-answering 2
zero-shot 2
3d 1
3d-scene 1
3d-semantic-occupancy 1
agenda 1
agentic 1
agi 1
algorithmic-information-theory 1
algorithmic-randomness 1
asynchronous 1
attention-mechanism 1
batch 1
bayesian-inference 1
behavior-forecasting 1
camera-fusion 1
classifier-guidance 1
combinatorial-optimization 1
comparison 1
compression 1
computability 1
concept 1
contrastive-learning 1
control 1
convolutional-neural-networks 1
corpus 1
course 1
data-collection 1
decoupled 1
deep-learning 1
denoising 1
depth-estimation 1
dexterous-manipulation 1
differentiable-programming 1
diffusion-policy 1
diffusion-transformer 1
dilated-convolutions 1
dropout 1
efficient 1
embodied-ai 1
embodiment 1
emergent-abilities 1
end-to-end-learning 1
evaluation-metric 1
few-shot 1
few-shot-learning 1
foundations 1
frontend 1
gaussian 1
gaussian-rendering 1
generalist-agent 1
generalization 1
gpu-training 1
graph-neural-networks 1
grounding 1
grpo 1
hierarchical 1
high-frequency-control 1
hosting 1
ilya 1
image-captioning 1
image-text-retrieval 1
in-context-learning 1
inductive-bias 1
intelligence-measurement 1
interactive-annotation 1
interactive-segmentation 1
knowledge-preservation 1
kolmogorov-complexity 1
lanegcn 1
locomotion 1
machine-translation 1
mamba 1
mdl 1
message-passing 1
minimum-description-length 1
model-parallelism 1
model-predictive-control 1
model-selection 1
modular 1
molecular-property-prediction 1
multi-embodiment 1
multi-task 1
natural-language 1
neural-radiance-fields 1
neuro-symbolic 1
obsidian 1
open-world 1
optimization 1
orchestration 1
parallel-architecture 1
parameter-efficient 1
permutation-invariance 1
personalization 1
physical-ai 1
pipeline-parallelism 1
pointer-mechanism 1
privileged-supervision 1
probabilistic-planning 1
proprioception 1
quantization 1
queue 1
radar 1
recurrent-neural-networks 1
representation-learning 1
scene-understanding 1
search 1
seminal 1
sensor-fusion 1
set-modeling 1
siamese-networks 1
simulator 1
source 1
sparse-models 1
spatial-reasoning 1
speech-recognition 1
survey 1
synthesis 1
taxonomy 1
temporal 1
temporal-modeling 1
thesis 1
tokenization 1
tool-use 1
training 1
uniad 1
unified-stack 1
vanishing-gradients 1
variational-autoencoders 1
video-generation 1
video-understanding 1
visual-traces 1
vit 1
Pages tagged multi-camera
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
paper
📄 **[Read on arXiv](https://arxiv.org/abs/2404.15014)**
OccGen reframes 3D semantic occupancy prediction as a conditional generative problem rather than a purely discriminative one. Prior occupancy methods (SurroundOcc,…
SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving
paper
📄 **[Read on arXiv](https://arxiv.org/abs/2303.09551)**
SurroundOcc addresses the problem of dense 3D semantic occupancy prediction from multi-camera images for autonomous driving. Unlike 3D object detection, which repr…