Pages tagged "scaling"

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

📄 **[Read on arXiv](https://arxiv.org/abs/2505.16278)** DriveMoE introduces a dual-level Mixture-of-Experts (MoE) architecture to driving Vision-Language-Action models. The key innovation is applying expert specializati…

Gemma 3 Technical Report

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2503.19786)** Gemma 3 is a family of open-weight language models from Google DeepMind spanning 1B, 4B, 12B, and 27B parameters. It represents a significant leap over Gemma 2 by…

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/1811.06965)** GPipe introduces micro-batch pipeline parallelism as a practical method for training neural networks too large to fit on a single accelerator. The core idea is to…

GPT-4 Technical Report

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2303.08774)** GPT-4 is a large-scale multimodal Transformer model developed by OpenAI that accepts both image and text inputs and produces text outputs. It represents a major st…

Open Questions: Foundation Models & Cross-Embodiment

query

Stream-specific open questions for foundation models, scaling, and cross-embodiment transfer. See wiki/queries/open-questions for the full tree across all streams. 1. **Compute-optimal scaling for embodied AI:** Kaplan…

PaLM: Scaling Language Modeling with Pathways

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2204.02311)** PaLM (Pathways Language Model) is a 540-billion parameter dense decoder-only Transformer language model trained by Google using the Pathways distributed training s…

Scaling Instruction-Finetuned Language Models (Flan-PaLM / Flan-T5)

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2210.11416)** Large language models exhibit strong few-shot capabilities, but their ability to follow instructions and generalize to unseen tasks remains limited without targete…

Scaling Laws for Neural Language Models

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2001.08361)** This is the canonical early scaling-law paper for language models, authored by Kaplan et al. at OpenAI. It demonstrated that neural language model cross-entropy lo…

Training Compute-Optimal Large Language Models

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2203.15556)** The Chinchilla paper (Hoffmann et al., DeepMind, 2022) is one of the most consequential papers in the LLM era because it corrected the field's scaling intuition. K…