Pages tagged mixture-of-experts
📄 **[Read on arXiv](https://arxiv.org/abs/2505.16278)** DriveMoE introduces a dual-level Mixture-of-Experts (MoE) architecture for driving Vision-Language-Action models. The key innovation is applying expert specializati…
📄 **[Read on arXiv](https://arxiv.org/abs/2507.06261)** Gemini 2.5 is Google's frontier multimodal model family, built on a sparse Mixture-of-Experts (MoE) Transformer architecture. It represents a major advance in reas…
📄 **[Read on arXiv](https://arxiv.org/abs/2401.04088)** Mixtral 8x7B, developed by Mistral AI, introduces a Sparse Mixture-of-Experts (SMoE) language model that achieves the quality of much larger dense models at a frac…
📄 **[Read on arXiv](https://arxiv.org/abs/2505.09388)** Qwen3, developed by the Qwen team at Alibaba, represents a major step forward in open-weight language models by offering a comprehensive family spanning both dense…