Paper Fact-Check Tracker
Updated: 2026-04-11
Purpose: coordinate a full factuality audit of every page in `wiki/sources/papers/` against its original paper.
Per-paper status lives in YAML frontmatter under `paper-faithfullness`.
Status Legend
- `unchecked` — no direct paper-vs-summary audit completed yet
- `audited-solid` — sampled against the original paper and materially faithful
- `audited-needs-tightening` — mostly faithful but contains notable imprecision or unsupported framing
- `audited-needs-correction` — contains factual errors or materially misleading claims
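As a concrete illustration of where this status lives, a minimal sketch — the sample page content and the `read_status` helper are hypothetical; only the `paper-faithfullness` field name and the legend values come from this tracker:

```python
import re

def read_status(page_text):
    """Pull the paper-faithfullness value out of a page's YAML frontmatter.
    Assumes frontmatter is fenced by '---' lines and the field is a plain
    scalar, which is all this corpus needs (illustrative, not the real tooling)."""
    match = re.match(r"---\n(.*?)\n---", page_text, re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "paper-faithfullness":
            return value.strip() or None
    return None

page = """---
title: Attention Is All You Need
paper-faithfullness: audited-solid
---
# Attention Is All You Need
"""
print(read_status(page))  # -> audited-solid
```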
[2026-04-11] random-sample audit | 10 additional paper summaries (seed 20260417)
- Deterministic sample seed: `20260417`
- Sampled pages: `uniact-universal-actions-for-enhanced-embodied-foundation-models`, `swin-transformer-hierarchical-vision-transformer-using-shifted-windows`, `attention-is-all-you-need`, `driveadapter-breaking-the-coupling-barrier-of-perception-and-planning-in-end-to-end-autonomous-driving`, `vectornet-encoding-hd-maps-and-agent-dynamics-from-vectorized-representation`, `scaling-instruction-finetuned-language-models`, `smolvla-a-vision-language-action-model-for-affordable-robotics`, `fb-bev-bev-representation-from-forward-backward-view-transformations`, `a-language-agent-for-autonomous-driving`, and `cosmos-world-foundation-model-platform-for-physical-ai`.
- Found 2 clear corrections: `vectornet-encoding-hd-maps-and-agent-dynamics-from-vectorized-representation` understated the FLOP reduction as 70%, even though the paper reports 10.56G vs 0.041G FLOPs for the main comparison (roughly 250x fewer, i.e. about 99.6% lower), and `smolvla-a-vision-language-action-model-for-affordable-robotics` overstated the memory advantage over pi0 as 7x, even though the paper states 6x less memory.
- The other 8 sampled pages did not show material paper-vs-summary failures in this pass and were left unchanged.
- Actual frontmatter counts after this sample remain mixed because legacy labels are still present elsewhere in the corpus: `audited-solid` 160, `audited-clean` 15, `audited-fixed` 20, `audited-needs-tightening` 1, `audited-needs-correction` 1.
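The deterministic sampling these passes record can be reproduced with a seeded RNG. The seed below is taken from the entry above, but the page list, sample size, and `sample_pages` helper are illustrative; the tracker logs its seeds, not its exact sampling code:

```python
import random

def sample_pages(pages, seed, k):
    """Reproducible audit sample: the same seed over the same sorted page
    list always selects the same pages (illustrative sketch only)."""
    rng = random.Random(seed)  # private RNG so global random state can't interfere
    return rng.sample(sorted(pages), k)

pages = [
    "attention-is-all-you-need",
    "vectornet-encoding-hd-maps-and-agent-dynamics-from-vectorized-representation",
    "smolvla-a-vision-language-action-model-for-affordable-robotics",
    "fb-bev-bev-representation-from-forward-backward-view-transformations",
    "a-language-agent-for-autonomous-driving",
]
assert sample_pages(pages, seed=20260417, k=3) == sample_pages(pages, seed=20260417, k=3)
```

Sorting the page list before sampling makes the draw independent of filesystem ordering, which is what makes a logged seed sufficient to rerun an audit.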
[2026-04-11] random-sample audit | 20-paper serious-error check (seed 20260415)
- Deterministic sample seed: `20260415`
- Sampled pages: `law-enhancing-end-to-end-autonomous-driving-with-latent-world-model`, `gaussianformer-scene-as-gaussians-for-vision-based-3d-semantic-occupancy-prediction`, `qlora-efficient-finetuning-of-quantized-language-models`, `drivegpt4-interpretable-end-to-end-autonomous-driving-via-large-language-model`, `bevformer-learning-birds-eye-view-representation-from-multi-camera-images-via-spatiotemporal-transformers`, `mamba-linear-time-sequence-modeling-with-selective-state-spaces`, `roboflamingo-vision-language-foundation-models-as-effective-robot-imitators`, `toolformer-language-models-can-teach-themselves-to-use-tools`, `hpt-scaling-proprioceptive-visual-learning-with-heterogeneous-pre-trained-transformers`, `flashocc-fast-and-memory-efficient-occupancy-prediction-via-channel-to-height-plugin`, `vad-vectorized-scene-representation-for-efficient-autonomous-driving`, `s4-driver-scalable-self-supervised-driving-mllm-with-spatio-temporal-visual-representation`, `yolov10-real-time-end-to-end-object-detection`, `bevdiffuser-plug-and-play-diffusion-model-for-bev-denoising`, `an-image-is-worth-16x16-words-transformers-for-image-recognition-at-scale`, `helix-a-vla-for-generalist-humanoid-control`, `drivemlm-aligning-multi-modal-llms-with-behavioral-planning-states`, `training-language-models-to-follow-instructions-with-human-feedback`, `driving-gaussian-composite-gaussian-splatting-for-surrounding-dynamic-driving-scenes`, and `rt-2-vision-language-action-models-transfer-web-knowledge-to-robotic-control`.
- Found 1 serious error: `flashocc-fast-and-memory-efficient-occupancy-prediction-via-channel-to-height-plugin` was downgraded to `audited-needs-correction` because the summary reported materially wrong mIoU/speed tradeoffs and mischaracterized the plug-in results. The corrected page now reflects the paper's actual BEVDetOcc / UniOcc / FBOcc comparisons and resource numbers.
- Found 1 smaller overstatement: `rt-2-vision-language-action-models-transfer-web-knowledge-to-robotic-control` was downgraded to `audited-needs-tightening` because the page claimed a quantified chain-of-thought improvement even though the paper presents that variant qualitatively rather than as a main measured gain.
- The other 18 sampled pages did not show serious paper-vs-summary failures and were left unchanged.
- Actual frontmatter counts after this sample remain mixed because legacy labels are still present elsewhere in the corpus: `audited-solid` 164, `audited-clean` 15, `audited-fixed` 16, `audited-needs-tightening` 1, `audited-needs-correction` 1.
[2026-04-11] random-sample audit | 20 additional paper summaries (seed 20260416)
- Deterministic sample seed: `20260416`
- Sampled pages: `knowledge-insulating-vision-language-action-models`, `openvla-an-open-source-vision-language-action-model`, `occmamba-semantic-occupancy-prediction-with-state-space-models`, `gaussianformer-2-probabilistic-gaussian-superposition-for-efficient-3d-occupancy-prediction`, `machine-super-intelligence`, `unleashing-large-scale-video-generative-pre-training-for-visual-robot-manipulation`, `wote-end-to-end-driving-with-online-trajectory-evaluation-via-bev-world-model`, `orion-holistic-end-to-end-autonomous-driving-by-vision-language-instructed-action-generation`, `gaussianlss-toward-real-world-bev-perception-with-depth-uncertainty-via-gaussian-splatting`, `drivor-driving-on-registers`, `keeping-neural-networks-simple-by-minimizing-the-description-length-of-the-weights`, `vista-a-generalizable-driving-world-model-with-high-fidelity-and-versatile-controllability`, `autort-embodied-foundation-models-for-large-scale-orchestration-of-robotic-agents`, `mixtral-of-experts`, `video-prediction-policy-a-generalist-robot-policy-with-predictive-visual-representations`, `gaussianflowocc-sparse-occupancy-with-gaussian-splatting-and-temporal-flow`, `chauffeurnet-learning-to-drive-by-imitating-the-best-and-synthesizing-the-worst`, `palm-scaling-language-modeling-with-pathways`, `bevnext-reviving-dense-bev-frameworks-for-3d-object-detection`, and `imagenet-classification-with-deep-convolutional-neural-networks`.
- Found 1 clear correction: `gaussianformer-2-probabilistic-gaussian-superposition-for-efficient-3d-occupancy-prediction` had conflated the 25,600-Gaussian ablation setting with the 12,800-Gaussian nuScenes main result, so the summary overstated the number of Gaussians and cited the wrong main-result memory figure. The page now reflects the paper's actual nuScenes setup.
- Found 1 smaller but real overstatement: `chauffeurnet-learning-to-drive-by-imitating-the-best-and-synthesizing-the-worst` summarized closed-loop gains as broad aggregate percentage reductions that the paper does not report. The results section now mirrors the paper's scenario-based evaluations and real-world demo claims instead of introducing unsupported aggregate numbers.
- The other 18 sampled pages did not show material paper-vs-summary failures in this pass and were left unchanged.
- Actual frontmatter counts after this sample remain mixed because legacy labels are still present elsewhere in the corpus: `audited-solid` 162, `audited-clean` 15, `audited-fixed` 18, `audited-needs-tightening` 1, `audited-needs-correction` 1.
[2026-04-11] targeted factuality audit | simlingo + talk2car
- Audited 2 papers against arXiv and AlphaXiv ground truth.
- `simlingo-vision-only-closed-loop-autonomous-driving-with-language-action-alignment`: fixed (`audited-fixed`). Action Dreaming success rates were wrong: the wiki stated "28.22 to 72.96", but the paper (Table 5) reports baseline 24.52% and SimLingo with dreaming data 81.13%. Corrected in the Results section. All other claims (CVPR 2025 venue, Bench2Drive DS 85.07% / SR 67.27%, CARLA Leaderboard 2.0 state of the art, camera-only input, instruction refusal) are faithful.
- `talk2car-taking-control-of-your-self-driving-car`: clean (`audited-clean`). All factual claims verified: 11,959 commands over 850 nuScenes training videos, EMNLP-IJCNLP 2019 venue, authors (Deruyttere, Vandenhende, Grujicic, Van Gool, Moens), AP50 evaluation metric, two-tower baseline description, and the Talk2Car-Trajectory extension characterization are all faithful to the source.
- Current corpus counts: `audited-solid` 188, `audited-needs-tightening` 5, `audited-needs-correction` 2, `audited-fixed` 2, `audited-clean` 2, `unchecked` 0.
[2026-04-11] non-arxiv source audit | cs231n and understanding-lstm-networks
- Audited 2 non-arXiv blog/course sources against their live original URLs.
- `cs231n-convolutional-neural-networks-for-visual-recognition`: fixed (`audited-fixed`). The title in the frontmatter and the document heading was wrong: the course is officially titled "CS231n: Deep Learning for Computer Vision" (confirmed from both http://cs231n.github.io/ and https://cs231n.stanford.edu/), not "Convolutional Neural Networks for Visual Recognition" (the older pre-2017 name). Fixed in the frontmatter `title` field and the `#` h1 heading. All other factual claims (instructors Fei-Fei Li, Karpathy, Johnson; course structure; architectural descriptions) are consistent with the source.
- `understanding-lstm-networks`: clean (`audited-clean`). All factual claims verified against https://colah.github.io/posts/2015-08-Understanding-LSTMs/: three-gate architecture (forget, input, output), LSTM attribution to Hochreiter & Schmidhuber (1997), GRU attribution to Cho et al. (2014), cell state equations, and the additive gradient-highway framing are all faithful to the source.
- Current corpus counts: `audited-solid` 188, `audited-needs-tightening` 5, `audited-needs-correction` 2, `audited-fixed` 1, `audited-clean` 1, `unchecked` 0.
[2026-04-11] random-sample audit | 10 additional paper summaries (seed 20260414)
- Deterministic sample seed: `20260414`
- Sampled pages: `on-the-opportunities-and-risks-of-foundation-models`, `drivegpt-scaling-autoregressive-behavior-models-for-driving`, `prefix-tuning-optimizing-continuous-prompts-for-generation`, `planning-oriented-autonomous-driving`, `vlp-vision-language-planning-for-autonomous-driving`, `pi0-a-vision-language-action-flow-model-for-general-robot-control`, `blip-bootstrapping-language-image-pre-training-for-unified-vision-language-understanding-and-generation`, `rdt-1b-a-diffusion-foundation-model-for-bimanual-manipulation`, `qwen3-technical-report`, and `gr-2-a-generative-video-language-action-model-with-web-scale-knowledge-for-robot-manipulation`.
- Found 2 summaries needing changes: `pi0-a-vision-language-action-flow-model-for-general-robot-control` was downgraded to `audited-needs-tightening` because the page asserted positive cross-embodiment transfer that the paper does not isolate in a dedicated ablation; `rdt-1b-a-diffusion-foundation-model-for-bimanual-manipulation` was downgraded to `audited-needs-tightening` because its limitations section incorrectly implied the evaluation was primarily in simulation, even though the paper's main experiments are on real ALOHA dual-arm robots.
- The other 8 sampled pages matched the source papers closely enough to keep their current `audited-solid` status: `on-the-opportunities-and-risks-of-foundation-models`, `drivegpt-scaling-autoregressive-behavior-models-for-driving`, `prefix-tuning-optimizing-continuous-prompts-for-generation`, `planning-oriented-autonomous-driving`, `vlp-vision-language-planning-for-autonomous-driving`, `blip-bootstrapping-language-image-pre-training-for-unified-vision-language-understanding-and-generation`, `qwen3-technical-report`, and `gr-2-a-generative-video-language-action-model-with-web-scale-knowledge-for-robot-manipulation`.
- Current corpus counts after this sample: `audited-solid` 183, `audited-needs-tightening` 11, `audited-needs-correction` 3, `unchecked` 0.
[2026-04-11] targeted correction audit | lift-splat-shoot and occgen
- Audited 2 pages flagged `audited-needs-correction` against arXiv and AlphaXiv ground truth.
- `lift-splat-shoot-encoding-images-from-arbitrary-camera-rigs-by-implicitly-unprojecting-to-3d`: clean (`audited-clean`). All factual claims verified: authors (Philion, Fidler), venue (ECCV 2020), backbone (EfficientNet-B0), BEV encoder (ResNet-18), depth binning, cumulative-sum splat trick, nuScenes/Lyft evaluation, and planning formulation all match the arXiv abstract and AlphaXiv overview. No material corrections needed.
- `occgen-generative-multi-modal-3d-occupancy-prediction-for-autonomous-driving`: fixed (`audited-fixed`). Three locations had camera-only and LiDAR-only mIoU scores swapped. Paper Table 9 shows C-OccGen (camera-only) = 14.5 mIoU and L-OccGen (LiDAR-only) = 16.8 mIoU; the wiki had these reversed. The results-table baseline rows were also mislabelled: C-CONet = 12.8 is camera-only (not LiDAR-only) and L-CONet = 15.8 is LiDAR-only (not camera-only). The TPVFormer row (15.1) was relabelled to OpenOccupancy (multi-modal baseline); TPVFormer's actual nuScenes-Occupancy score is 7.8 (camera-only). Fixed in the overview paragraph, the key-contributions bullet, and the results table.
- Current corpus counts: `audited-solid` 186, `audited-needs-tightening` 8, `audited-needs-correction` 1, `audited-clean` 1, `audited-fixed` 1, `unchecked` 0.
[2026-04-11] random-sample audit | 10 additional paper summaries (seed 20260413)
- Deterministic sample seed: `20260413`
- Sampled pages: `alpamayo-r1-bridging-reasoning-and-action-prediction-for-autonomous-driving`, `gaussianocc-fully-self-supervised-3d-occupancy-estimation-with-gaussian-splatting`, `simlingo-vision-only-closed-loop-autonomous-driving-with-language-action-alignment`, `drivetransformer-unified-transformer-for-scalable-end-to-end-autonomous-driving`, `voxposer-composable-3d-value-maps-for-robotic-manipulation-with-language-models`, `senna-bridging-large-vision-language-models-and-end-to-end-autonomous-driving`, `unisim-learning-interactive-real-world-simulators`, `variational-lossy-autoencoder`, `emerging-properties-in-self-supervised-vision-transformers`, and `momad-momentum-aware-planning-in-end-to-end-autonomous-driving`.
- Found 1 summary needing changes: `simlingo-vision-only-closed-loop-autonomous-driving-with-language-action-alignment` was downgraded to `audited-needs-tightening` because the page overstated the effect of Action Dreaming on closed-loop driving and treated the benchmark evidence more broadly than the paper supports.
- The other 9 sampled pages matched the source papers closely enough to keep their current `audited-solid` status: `alpamayo-r1-bridging-reasoning-and-action-prediction-for-autonomous-driving`, `gaussianocc-fully-self-supervised-3d-occupancy-estimation-with-gaussian-splatting`, `drivetransformer-unified-transformer-for-scalable-end-to-end-autonomous-driving`, `voxposer-composable-3d-value-maps-for-robotic-manipulation-with-language-models`, `senna-bridging-large-vision-language-models-and-end-to-end-autonomous-driving`, `unisim-learning-interactive-real-world-simulators`, `variational-lossy-autoencoder`, `emerging-properties-in-self-supervised-vision-transformers`, and `momad-momentum-aware-planning-in-end-to-end-autonomous-driving`.
- Current corpus counts after this sample: `audited-solid` 185, `audited-needs-tightening` 9, `audited-needs-correction` 3, `unchecked` 0.
[2026-04-11] random-sample audit | 10 additional paper summaries (seed 20260412)
- Deterministic sample seed: `20260412`
- Sampled pages: `multi-scale-context-aggregation-by-dilated-convolutions`, `rt-h-action-hierarchies-using-language`, `embodiment-scaling-laws-in-robot-locomotion`, `occgen-generative-multi-modal-3d-occupancy-prediction-for-autonomous-driving`, `selfocc-self-supervised-vision-based-3d-occupancy-prediction`, `rt-1-robotics-transformer-for-real-world-control-at-scale`, `lift-splat-shoot-encoding-images-from-arbitrary-camera-rigs-by-implicitly-unprojecting-to-3d`, `llarva-vision-action-instruction-tuning-enhances-robot-learning`, `3d-vla-a-3d-vision-language-action-generative-world-model`, and `occworld-learning-a-3d-occupancy-world-model-for-autonomous-driving`.
- Found 3 summaries needing changes: `multi-scale-context-aggregation-by-dilated-convolutions` was downgraded to `audited-needs-tightening` after fixing the basic context-module description (seven layers, no batch-normalization claim); `occgen-generative-multi-modal-3d-occupancy-prediction-for-autonomous-driving` was downgraded to `audited-needs-correction` after correcting the swapped camera-only vs. LiDAR-only nuScenes numbers; `lift-splat-shoot-encoding-images-from-arbitrary-camera-rigs-by-implicitly-unprojecting-to-3d` remained `audited-needs-correction` with unsupported transfer/runtime overclaims removed.
- The other 7 sampled pages matched the source papers closely enough to keep their current `audited-solid` status: `rt-h-action-hierarchies-using-language`, `embodiment-scaling-laws-in-robot-locomotion`, `selfocc-self-supervised-vision-based-3d-occupancy-prediction`, `rt-1-robotics-transformer-for-real-world-control-at-scale`, `llarva-vision-action-instruction-tuning-enhances-robot-learning`, `3d-vla-a-3d-vision-language-action-generative-world-model`, and `occworld-learning-a-3d-occupancy-world-model-for-autonomous-driving`.
- Current corpus counts after this sample: `audited-solid` 186, `audited-needs-tightening` 8, `audited-needs-correction` 3, `unchecked` 0.
[2026-04-11] corpus metadata validation | 197 source pages
- Validated all `wiki/sources/papers/` entries at the primary-record level: 197 total pages, 187 arXiv-backed entries, 10 non-arXiv entries.
- arXiv-backed coverage is structurally clean after link fixes: 158 exact title matches to arXiv records, 27 acceptable title variants or acronym-prefix variants, 2 shortened-title variants (`gpipe`, `drive-occworld`) that still point to the correct paper, and 0 unresolved arXiv-record misses.
- Fixed three broken primary-source references: `solve` now points to arXiv 2505.16805, `simlingo` now points to arXiv 2503.09594, and `para-drive` now points to the CVPR 2024 Open Access page with the incorrect `arxiv_id` removed.
- Frontmatter year vs. arXiv upload year differs by at most 1 across all arXiv-backed entries, which is consistent with conference-year vs. preprint-year drift rather than broken metadata.
- This pass established source identity and metadata correctness; the follow-up source-faithfulness audit below resolves the old legacy status vocabulary and clears the remaining unchecked backlog.
- Non-arXiv entries now explicitly accounted for: `cs231n`, `helix`, `imagenet-classification-with-deep-convolutional-neural-networks`, `keeping-neural-networks-simple-by-minimizing-the-description-length-of-the-weights`, `kolmogorov-complexity-and-algorithmic-randomness`, `machine-super-intelligence`, `para-drive-parallelized-architecture-for-real-time-autonomous-driving`, `the-first-law-of-complexodynamics`, `the-unreasonable-effectiveness-of-recurrent-neural-networks`, `understanding-lstm-networks`.
[2026-04-11] remaining source-faithfulness audit | 14 pages
- Audited the final 14 pages that were still literally marked `unchecked` in frontmatter.
- Marked 10 of those pages `audited-solid`: `a-tutorial-introduction-to-the-minimum-description-length-principle`, `imagenet-classification-with-deep-convolutional-neural-networks`, `keeping-neural-networks-simple-by-minimizing-the-description-length-of-the-weights`, `kolmogorov-complexity-and-algorithmic-randomness`, `machine-super-intelligence`, `multi-scale-context-aggregation-by-dilated-convolutions`, `neural-machine-translation-by-jointly-learning-to-align-and-translate`, `recurrent-neural-network-regularization`, `simlingo-vision-only-closed-loop-autonomous-driving-with-language-action-alignment`, and `talk2car-taking-control-of-your-self-driving-car`.
- Marked 4 pages `audited-needs-tightening` and softened their wording, because they are blog/course-style sources whose summaries mixed source content with broader field-impact interpretation: `cs231n-convolutional-neural-networks-for-visual-recognition`, `the-first-law-of-complexodynamics`, `the-unreasonable-effectiveness-of-recurrent-neural-networks`, and `understanding-lstm-networks`.
- Normalized 110 legacy `audited-clean`/`audited-fixed` labels to `audited-solid`, so the corpus now uses only the tracker legend.
- Current corpus counts: `audited-solid` 188, `audited-needs-tightening` 7, `audited-needs-correction` 2, `unchecked` 0.
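The corpus counts quoted throughout this tracker can be recomputed mechanically. A sketch under the assumption that every page stores a single `paper-faithfullness` scalar in its frontmatter; the `count_statuses` helper and its regex are illustrative, not the tracker's actual tooling:

```python
from collections import Counter
from pathlib import Path
import re

STATUS_RE = re.compile(r"^paper-faithfullness:\s*(\S+)", re.MULTILINE)

def count_statuses(root):
    """Tally paper-faithfullness values across every .md page under root,
    bucketing pages without the field under 'missing'."""
    counts = Counter()
    for page in Path(root).glob("*.md"):
        match = STATUS_RE.search(page.read_text(encoding="utf-8"))
        counts[match.group(1) if match else "missing"] += 1
    return counts

# count_statuses("wiki/sources/papers") would reproduce the totals logged above.
```

Keeping the counts derivable from frontmatter alone is what lets each audit entry close with a one-line corpus snapshot.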
[2026-04-11] random-sample audit | 10 paper summaries
- Deterministic sample seed: `20260411`
- Hard factual issues fixed across 5 summaries: `carla`, `surroundocc`, `self-improving-embodied-foundation-models`, `bert`, `drivedreamer`
- Audited with no material paper-faithfulness fixes needed: `sparsedrive`, `language-models-are-few-shot-learners`, `scaling-laws-for-neural-language-models`, `s4-driver`, `scaling-cross-embodied-learning`
- Main error types: wrong benchmark values, incorrect loss descriptions, inaccurate venue/training metadata, and one limitation claim contradicted by the paper
Batch Tracker
Batch 01 — active
- wiki/sources/papers/3d-vla-a-3d-vision-language-action-generative-world-model.md
- wiki/sources/papers/a-generalist-agent.md
- wiki/sources/papers/a-language-agent-for-autonomous-driving.md
- wiki/sources/papers/a-simple-neural-network-module-for-relational-reasoning.md
- wiki/sources/papers/a-tutorial-introduction-to-the-minimum-description-length-principle.md
- wiki/sources/papers/alpamayo-r1-bridging-reasoning-and-action-prediction-for-autonomous-driving.md
- wiki/sources/papers/alphadrive-unleashing-the-power-of-vlms-in-autonomous-driving.md
- wiki/sources/papers/an-image-is-worth-16x16-words-transformers-for-image-recognition-at-scale.md
- wiki/sources/papers/asyncdriver-asynchronous-large-language-model-enhanced-planner-for-autonomous-driving.md
- wiki/sources/papers/attention-is-all-you-need.md
Batch 02 — active
- wiki/sources/papers/autort-embodied-foundation-models-for-large-scale-orchestration-of-robotic-agents.md
- wiki/sources/papers/autovala-vision-language-action-model-for-end-to-end-autonomous-driving.md
- wiki/sources/papers/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding.md
- wiki/sources/papers/bevdiffuser-plug-and-play-diffusion-model-for-bev-denoising.md
- wiki/sources/papers/bevformer-learning-birds-eye-view-representation-from-multi-camera-images-via-spatiotemporal-transformers.md
- wiki/sources/papers/bevformer-v2-adapting-modern-image-backbones-to-birds-eye-view-recognition-via-perspective-supervision.md
- wiki/sources/papers/bevnext-reviving-dense-bev-frameworks-for-3d-object-detection.md
- wiki/sources/papers/blip-bootstrapping-language-image-pre-training-for-unified-vision-language-understanding-and-generation.md
- wiki/sources/papers/bridgead-bridging-past-and-future-end-to-end-autonomous-driving-with-historical-prediction.md
- wiki/sources/papers/carla-an-open-urban-driving-simulator.md
Batch 03 — active
- wiki/sources/papers/carplanner-consistent-autoregressive-rl-planner-for-autonomous-driving.md
- wiki/sources/papers/chain-of-thought-prompting-elicits-reasoning-in-large-language-models.md
- wiki/sources/papers/chauffeurnet-learning-to-drive-by-imitating-the-best-and-synthesizing-the-worst.md
- wiki/sources/papers/cosmos-world-foundation-model-platform-for-physical-ai.md
- wiki/sources/papers/covla-comprehensive-vision-language-action-dataset-for-autonomous-driving.md
- wiki/sources/papers/cs231n-convolutional-neural-networks-for-visual-recognition.md
- wiki/sources/papers/deep-residual-learning-for-image-recognition.md
- wiki/sources/papers/deep-speech-2.md
- wiki/sources/papers/deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning.md
- wiki/sources/papers/denoising-diffusion-probabilistic-models.md
Batch 04 — active
- wiki/sources/papers/dexvla-vision-language-model-with-plug-in-diffusion-expert.md
- wiki/sources/papers/diffusion-models-beat-gans-on-image-synthesis.md
- wiki/sources/papers/diffusiondrive-truncated-diffusion-model-for-end-to-end-autonomous-driving.md
- wiki/sources/papers/dima-distilling-multi-modal-large-language-models-for-autonomous-driving.md
- wiki/sources/papers/direct-preference-optimization-your-language-model-is-secretly-a-reward-model.md
- wiki/sources/papers/dita-scaling-diffusion-transformer-for-generalist-vla-policy.md
- wiki/sources/papers/drive-as-you-speak-enabling-human-like-interaction-with-large-language-models-in-autonomous-vehicles.md
- wiki/sources/papers/drive-occworld-driving-in-the-occupancy-world.md
- wiki/sources/papers/driveadapter-breaking-the-coupling-barrier-of-perception-and-planning-in-end-to-end-autonomous-driving.md
- wiki/sources/papers/drivedreamer-towards-real-world-driven-world-models.md
Batch 05 — active
- wiki/sources/papers/drivegpt-scaling-autoregressive-behavior-models-for-driving.md
- wiki/sources/papers/drivegpt4-interpretable-end-to-end-autonomous-driving-via-large-language-model.md
- wiki/sources/papers/drivelm-driving-with-graph-visual-question-answering.md
- wiki/sources/papers/drivemlm-aligning-multi-modal-llms-with-behavioral-planning-states.md
- wiki/sources/papers/drivemoe-mixture-of-experts-for-vision-language-action-in-autonomous-driving.md
- wiki/sources/papers/drivetransformer-unified-transformer-for-scalable-end-to-end-autonomous-driving.md
- wiki/sources/papers/drivevlm-the-convergence-of-autonomous-driving-and-large-vision-language-models.md
- wiki/sources/papers/driving-gaussian-composite-gaussian-splatting-for-surrounding-dynamic-driving-scenes.md
- wiki/sources/papers/driving-with-llms-fusing-object-level-vector-modality-for-explainable-autonomous-driving.md
- wiki/sources/papers/drivor-driving-on-registers.md
Batch 06 — active
- wiki/sources/papers/ecot-embodied-chain-of-thought-reasoning-for-vision-language-action-models.md
- wiki/sources/papers/embodiment-scaling-laws-in-robot-locomotion.md
- wiki/sources/papers/emerging-properties-in-self-supervised-vision-transformers.md
- wiki/sources/papers/emma-end-to-end-multimodal-model-for-autonomous-driving.md
- wiki/sources/papers/end-to-end-driving-via-conditional-imitation-learning.md
- wiki/sources/papers/end-to-end-learning-for-self-driving-cars.md
- wiki/sources/papers/exploring-simple-siamese-representation-learning.md
- wiki/sources/papers/fast-efficient-action-tokenization-for-vision-language-action-models.md
- wiki/sources/papers/fb-bev-bev-representation-from-forward-backward-view-transformations.md
- wiki/sources/papers/flamingo-a-visual-language-model-for-few-shot-learning.md
Batch 07 — active
- wiki/sources/papers/flashocc-fast-and-memory-efficient-occupancy-prediction-via-channel-to-height-plugin.md
- wiki/sources/papers/gaussianbev-3d-gaussian-representation-meets-perception-models-for-bev-segmentation.md
- wiki/sources/papers/gaussianflowocc-sparse-occupancy-with-gaussian-splatting-and-temporal-flow.md
- wiki/sources/papers/gaussianformer-2-probabilistic-gaussian-superposition-for-efficient-3d-occupancy-prediction.md
- wiki/sources/papers/gaussianformer-scene-as-gaussians-for-vision-based-3d-semantic-occupancy-prediction.md
- wiki/sources/papers/gaussianlss-toward-real-world-bev-perception-with-depth-uncertainty-via-gaussian-splatting.md
- wiki/sources/papers/gaussianocc-fully-self-supervised-3d-occupancy-estimation-with-gaussian-splatting.md
- wiki/sources/papers/gaussianworld-gaussian-world-model-for-streaming-3d-occupancy-prediction.md
- wiki/sources/papers/gaussrender-learning-3d-occupancy-with-gaussian-rendering.md
- wiki/sources/papers/gausstr-foundation-model-aligned-gaussian-transformer-for-self-supervised-3d.md
Batch 08 — active
- wiki/sources/papers/gemini-25-pushing-the-frontier-with-advanced-reasoning-multimodality-long-context-and-next-generation-agentic-capabilities.md
- wiki/sources/papers/gemini-robotics-bringing-ai-into-the-physical-world.md
- wiki/sources/papers/gemma-3-technical-report.md
- wiki/sources/papers/genad-generalized-predictive-model-for-autonomous-driving.md
- wiki/sources/papers/genad-generative-end-to-end-autonomous-driving.md
- wiki/sources/papers/goalflow-goal-driven-flow-matching-for-multimodal-trajectory-generation.md
- wiki/sources/papers/gpipe-easy-scaling-with-micro-batch-pipeline-parallelism.md
- wiki/sources/papers/gpt-4-technical-report.md
- wiki/sources/papers/gpt-driver-learning-to-drive-with-gpt.md
- wiki/sources/papers/gr-2-a-generative-video-language-action-model-with-web-scale-knowledge-for-robot-manipulation.md
Batch 09 — active
- wiki/sources/papers/groot-n1-an-open-foundation-model-for-generalist-humanoid-robots.md
- wiki/sources/papers/helix-a-vla-for-generalist-humanoid-control.md
- wiki/sources/papers/hermes-a-unified-self-driving-world-model-for-simultaneous-3d-scene-understanding-and-generation.md
- wiki/sources/papers/hierarchical-text-conditional-image-generation-with-clip-latents.md
- wiki/sources/papers/high-resolution-image-synthesis-with-latent-diffusion-models.md
- wiki/sources/papers/hpt-scaling-proprioceptive-visual-learning-with-heterogeneous-pre-trained-transformers.md
- wiki/sources/papers/hydra-mdp-end-to-end-multimodal-planning-with-multi-target-hydra-distillation.md
- wiki/sources/papers/identity-mappings-in-deep-residual-networks.md
- wiki/sources/papers/imagenet-classification-with-deep-convolutional-neural-networks.md
- wiki/sources/papers/is-ego-status-all-you-need-for-open-loop-end-to-end-autonomous-driving.md
Batch 10 — active
- wiki/sources/papers/keeping-neural-networks-simple-by-minimizing-the-description-length-of-the-weights.md
- wiki/sources/papers/knowledge-insulating-vision-language-action-models.md
- wiki/sources/papers/kolmogorov-complexity-and-algorithmic-randomness.md
- wiki/sources/papers/language-models-are-few-shot-learners.md
- wiki/sources/papers/languagempc-large-language-models-as-decision-makers-for-autonomous-driving.md
- wiki/sources/papers/law-enhancing-end-to-end-autonomous-driving-with-latent-world-model.md
- wiki/sources/papers/learning-by-cheating.md
- wiki/sources/papers/learning-lane-graph-representations-for-motion-forecasting.md
- wiki/sources/papers/learning-transferable-visual-models-from-natural-language-supervision.md
- wiki/sources/papers/lift-splat-shoot-encoding-images-from-arbitrary-camera-rigs-by-implicitly-unprojecting-to-3d.md
Batch 11 — active
- wiki/sources/papers/llama-2-open-foundation-and-fine-tuned-chat-models.md
- wiki/sources/papers/llarva-vision-action-instruction-tuning-enhances-robot-learning.md
- wiki/sources/papers/llms-cant-plan-but-can-help-planning-in-llm-modulo-frameworks.md
- wiki/sources/papers/lmdrive-closed-loop-end-to-end-driving-with-large-language-models.md
- wiki/sources/papers/lora-low-rank-adaptation-of-large-language-models.md
- wiki/sources/papers/machine-super-intelligence.md
- wiki/sources/papers/mamba-linear-time-sequence-modeling-with-selective-state-spaces.md
- wiki/sources/papers/mistral-7b.md
- wiki/sources/papers/mixtral-of-experts.md
- wiki/sources/papers/momad-momentum-aware-planning-in-end-to-end-autonomous-driving.md
Batch 12 — active
- wiki/sources/papers/multi-scale-context-aggregation-by-dilated-convolutions.md
- wiki/sources/papers/navsim-data-driven-non-reactive-autonomous-vehicle-simulation.md
- wiki/sources/papers/navsim-v2-pseudo-simulation-for-autonomous-driving.md
- wiki/sources/papers/neural-machine-translation-by-jointly-learning-to-align-and-translate.md
- wiki/sources/papers/neural-message-passing-for-quantum-chemistry.md
- wiki/sources/papers/neural-turing-machines.md
- wiki/sources/papers/nuscenes-a-multimodal-dataset-for-autonomous-driving.md
- wiki/sources/papers/occformer-dual-path-transformer-for-vision-based-3d-semantic-occupancy-prediction.md
- wiki/sources/papers/occgen-generative-multi-modal-3d-occupancy-prediction-for-autonomous-driving.md
- wiki/sources/papers/occmamba-semantic-occupancy-prediction-with-state-space-models.md
Batch 13 — active
- wiki/sources/papers/occworld-learning-a-3d-occupancy-world-model-for-autonomous-driving.md
- wiki/sources/papers/octo-an-open-source-generalist-robot-policy.md
- wiki/sources/papers/on-the-opportunities-and-risks-of-foundation-models.md
- wiki/sources/papers/opendrivevla-towards-end-to-end-autonomous-driving-with-large-vision-language-action-model.md
- wiki/sources/papers/openvla-an-open-source-vision-language-action-model.md
- wiki/sources/papers/openvla-oft-optimizing-speed-and-success-for-vla-fine-tuning.md
- wiki/sources/papers/order-matters-sequence-to-sequence-for-sets.md
- wiki/sources/papers/orion-holistic-end-to-end-autonomous-driving-by-vision-language-instructed-action-generation.md
- wiki/sources/papers/palm-e-an-embodied-multimodal-language-model.md
- wiki/sources/papers/palm-scaling-language-modeling-with-pathways.md
Batch 14 — active
- wiki/sources/papers/para-drive-parallelized-architecture-for-real-time-autonomous-driving.md
- wiki/sources/papers/pi0-a-vision-language-action-flow-model-for-general-robot-control.md
- wiki/sources/papers/pi05-a-vision-language-action-model-with-open-world-generalization.md
- wiki/sources/papers/pi06-a-vla-that-learns-from-experience.md
- wiki/sources/papers/planning-oriented-autonomous-driving.md
- wiki/sources/papers/pointer-networks.md
- wiki/sources/papers/prefix-tuning-optimizing-continuous-prompts-for-generation.md
- wiki/sources/papers/qlora-efficient-finetuning-of-quantized-language-models.md
- wiki/sources/papers/quantifying-the-rise-and-fall-of-complexity-in-closed-systems-the-coffee-automaton.md
- wiki/sources/papers/qwen3-technical-report.md
Batch 15 — active
- wiki/sources/papers/racformer-query-based-radar-camera-fusion-for-3d-object-detection.md
- wiki/sources/papers/rdt-1b-a-diffusion-foundation-model-for-bimanual-manipulation.md
- wiki/sources/papers/react-synergizing-reasoning-and-acting-in-language-models.md
- wiki/sources/papers/reason2drive-towards-interpretable-and-chain-based-reasoning-for-autonomous-driving.md
- wiki/sources/papers/recurrent-neural-network-regularization.md
- wiki/sources/papers/relational-recurrent-neural-networks.md
- wiki/sources/papers/robocat-a-self-improving-generalist-agent-for-robotic-manipulation.md
- wiki/sources/papers/roboflamingo-vision-language-foundation-models-as-effective-robot-imitators.md
- wiki/sources/papers/robovlms-what-matters-in-building-vision-language-action-models.md
- wiki/sources/papers/rt-1-robotics-transformer-for-real-world-control-at-scale.md
Batch 16 — active
- wiki/sources/papers/rt-2-vision-language-action-models-transfer-web-knowledge-to-robotic-control.md
- wiki/sources/papers/rt-h-action-hierarchies-using-language.md
- wiki/sources/papers/s4-driver-scalable-self-supervised-driving-mllm-with-spatio-temporal-visual-representation.md
- wiki/sources/papers/sam-2-segment-anything-in-images-and-videos.md
- wiki/sources/papers/scaling-cross-embodied-learning-one-policy-for-manipulation-navigation-locomotion-and-aviation.md
- wiki/sources/papers/scaling-instruction-finetuned-language-models.md
- wiki/sources/papers/scaling-laws-for-neural-language-models.md
- wiki/sources/papers/segment-anything.md
- wiki/sources/papers/self-improving-embodied-foundation-models.md
- wiki/sources/papers/selfocc-self-supervised-vision-based-3d-occupancy-prediction.md
Batch 17 — active
- wiki/sources/papers/senna-bridging-large-vision-language-models-and-end-to-end-autonomous-driving.md
- wiki/sources/papers/simlingo-vision-only-closed-loop-autonomous-driving-with-language-action-alignment.md
- wiki/sources/papers/smolvla-a-vision-language-action-model-for-affordable-robotics.md
- wiki/sources/papers/solve-synergy-of-language-vision-and-end-to-end-networks-for-autonomous-driving.md
- wiki/sources/papers/sparsedrive-end-to-end-autonomous-driving-via-sparse-scene-representation.md
- wiki/sources/papers/sparsedriveV2-end-to-end-autonomous-driving-via-sparse-scene-representation.md
- wiki/sources/papers/sparseocc-fully-sparse-3d-occupancy-prediction.md
- wiki/sources/papers/sparseocc-rethinking-sparse-latent-representation.md
- wiki/sources/papers/spatialvla-exploring-spatial-representations-for-vla-models.md
- wiki/sources/papers/surroundocc-multi-camera-3d-occupancy-prediction-for-autonomous-driving.md
Batch 18 — active
- wiki/sources/papers/swin-transformer-hierarchical-vision-transformer-using-shifted-windows.md
- wiki/sources/papers/talk2car-taking-control-of-your-self-driving-car.md
- wiki/sources/papers/talk2drive-towards-personalized-autonomous-driving-with-large-language-models.md
- wiki/sources/papers/textual-explanations-for-self-driving-vehicles.md
- wiki/sources/papers/the-first-law-of-complexodynamics.md
- wiki/sources/papers/the-unreasonable-effectiveness-of-recurrent-neural-networks.md
- wiki/sources/papers/think-twice-before-driving-towards-scalable-decoders-for-end-to-end-autonomous-driving.md
- wiki/sources/papers/toolformer-language-models-can-teach-themselves-to-use-tools.md
- wiki/sources/papers/training-compute-optimal-large-language-models.md
- wiki/sources/papers/training-language-models-to-follow-instructions-with-human-feedback.md
Batch 19 — active
- wiki/sources/papers/transfuser-imitation-with-transformer-based-sensor-fusion-for-autonomous-driving.md
- wiki/sources/papers/tree-of-thoughts-deliberate-problem-solving-with-large-language-models.md
- wiki/sources/papers/understanding-lstm-networks.md
- wiki/sources/papers/uniact-universal-actions-for-enhanced-embodied-foundation-models.md
- wiki/sources/papers/unisim-learning-interactive-real-world-simulators.md
- wiki/sources/papers/unleashing-large-scale-video-generative-pre-training-for-visual-robot-manipulation.md
- wiki/sources/papers/vad-vectorized-scene-representation-for-efficient-autonomous-driving.md
- wiki/sources/papers/vadv2-end-to-end-vectorized-autonomous-driving-via-probabilistic-planning.md
- wiki/sources/papers/variational-lossy-autoencoder.md
- wiki/sources/papers/vectornet-encoding-hd-maps-and-agent-dynamics-from-vectorized-representation.md
Batch 20 — active
- wiki/sources/papers/video-prediction-policy-a-generalist-robot-policy-with-predictive-visual-representations.md
- wiki/sources/papers/vista-a-generalizable-driving-world-model-with-high-fidelity-and-versatile-controllability.md
- wiki/sources/papers/visual-instruction-tuning.md
- wiki/sources/papers/vlp-vision-language-planning-for-autonomous-driving.md
- wiki/sources/papers/voxposer-composable-3d-value-maps-for-robotic-manipulation-with-language-models.md
- wiki/sources/papers/wote-end-to-end-driving-with-online-trajectory-evaluation-via-bev-world-model.md
- wiki/sources/papers/yolov10-real-time-end-to-end-object-detection.md
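Since per-paper status lives in each page's YAML frontmatter under `paper-faithfullness`, the batch lists above can be tallied mechanically. Below is a minimal sketch, assuming each page is Markdown with a `---`-delimited frontmatter block and treating a missing or empty key as `unchecked` per the status legend; the helper names (`frontmatter_status`, `tally`) are illustrative, not part of any existing tooling.

```python
# Sketch: count `paper-faithfullness` statuses across a batch of pages.
# Assumes simple `key: value` YAML frontmatter delimited by `---` lines.
from collections import Counter
from pathlib import Path

def frontmatter_status(path: Path) -> str:
    """Return the page's `paper-faithfullness` value, or `unchecked`."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return "unchecked"  # no frontmatter block at all
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter: key not found
            break
        key, _, value = line.partition(":")
        if key.strip() == "paper-faithfullness":
            return value.strip() or "unchecked"
    return "unchecked"

def tally(paths) -> Counter:
    """Aggregate status counts over an iterable of page paths."""
    return Counter(frontmatter_status(Path(p)) for p in paths)
```

Running `tally` over a whole batch gives the same kind of frontmatter counts reported in the audit log above, which makes it easy to spot pages still carrying legacy labels.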