Pages tagged language-modeling
📄 **[Read on arXiv](https://arxiv.org/abs/2501.12948)** DeepSeek-R1 demonstrates that sophisticated reasoning capabilities -- including self-verification, reflection, and extended chain-of-thought -- can emerge in large…
📄 **[Read on arXiv](https://arxiv.org/abs/2305.18290)** Aligning large language models (LLMs) with human preferences has traditionally required reinforcement learning from human feedback (RLHF), a complex multi-stage pi…
📄 **[Read on arXiv](https://arxiv.org/abs/2507.06261)** Gemini 2.5 is Google's frontier multimodal model family, built on a sparse Mixture-of-Experts (MoE) Transformer architecture. It represents a major advance in reas…
📄 **[Read on arXiv](https://arxiv.org/abs/2503.19786)** Gemma 3 is a family of open-weight language models from Google DeepMind spanning 1B, 4B, 12B, and 27B parameters. It represents a significant leap over Gemma 2 by…
📄 **[Read on arXiv](https://arxiv.org/abs/2303.08774)** GPT-4 is a large-scale multimodal Transformer model developed by OpenAI that accepts both image and text inputs and produces text outputs. It represents a major st…
📄 **[Read on arXiv](https://arxiv.org/abs/2307.09288)** Llama 2 (Touvron et al., Meta AI, 2023) addresses the gap between open-source pretrained language models and polished, closed-source "product" LLMs like ChatGPT. W…
📄 **[Read on arXiv](https://arxiv.org/abs/2106.09685)** As pretrained language models grow to hundreds of billions of parameters, full fine-tuning -- updating every weight for each downstream task -- becomes prohibitive…
📄 **[Read on arXiv](https://arxiv.org/abs/2312.00752)** Transformers have dominated sequence modeling since 2017, but their quadratic-complexity self-attention mechanism creates a fundamental bottleneck for long sequenc…
📄 **[Read on arXiv](https://arxiv.org/abs/2310.06825)** Mistral 7B (Jiang et al., Mistral AI, 2023) challenged the prevailing assumption that larger language models are always better by demonstrating that a carefully de…
📄 **[Read on arXiv](https://arxiv.org/abs/2401.04088)** Mixtral 8x7B, developed by Mistral AI, introduces a Sparse Mixture-of-Experts (SMoE) language model that achieves the quality of much larger dense models at a frac…
📄 **[Read on arXiv](https://arxiv.org/abs/2204.02311)** PaLM (Pathways Language Model) is a 540-billion parameter dense decoder-only Transformer language model trained by Google using the Pathways distributed training s…
📄 **[Read on arXiv](https://arxiv.org/abs/2101.00190)** Large pretrained language models like GPT-2 and BART achieve strong performance on generation tasks, but full fine-tuning requires storing a separate copy of all m…
📄 **[Read on arXiv](https://arxiv.org/abs/2305.14314)** Full fine-tuning of large language models requires enormous GPU memory -- a 65B-parameter model in 16-bit precision needs over 780 GB of GPU memory for parameters…
📄 **[Read on arXiv](https://arxiv.org/abs/2505.09388)** Qwen3, developed by the Qwen team at Alibaba, represents a major step forward in open-weight language models by offering a comprehensive family spanning both dense…
📄 **[Read on arXiv](https://arxiv.org/abs/2210.03629)** Large language models had demonstrated two powerful capabilities in isolation: chain-of-thought reasoning for multi-step problem solving, and action generation for…
📄 **[Read on arXiv](https://arxiv.org/abs/1409.2329)** This paper discovered that dropout can be successfully applied to LSTMs if it is restricted to non-recurrent (feedforward) connections only, preserving the LSTM's a…
📄 **[Read on arXiv](https://arxiv.org/abs/2403.01823)** RT-H (Robot Transformer with Action Hierarchies) introduces a hierarchical approach to multi-task robot control that uses natural language as an intermediate repre…
📄 **[Read on arXiv](https://arxiv.org/abs/2210.11416)** Large language models exhibit strong few-shot capabilities, but their ability to follow instructions and generalize to unseen tasks remains limited without targete…
📄 **[Read Blog Post](https://karpathy.github.io/2015/05/21/rnn-effectiveness/)** Andrej Karpathy's 2015 blog post offers a vivid qualitative demonstration that character-level recurrent neural networks with LSTM cells c…
📄 **[Read on arXiv](https://arxiv.org/abs/2302.04761)** Large language models exhibit remarkable in-context learning abilities but paradoxically struggle with tasks that are trivial for simple external tools -- arithmet…
📄 **[Read on arXiv](https://arxiv.org/abs/2203.15556)** The Chinchilla paper (Hoffmann et al., DeepMind, 2022) is one of the most consequential papers in the LLM era because it corrected the field's scaling intuition. K…
📄 **[Read on arXiv](https://arxiv.org/abs/2203.02155)** Large language models like GPT-3 are trained on vast internet corpora to predict the next token, but this objective is fundamentally misaligned with the goal of fo…
📄 **[Read on arXiv](https://arxiv.org/abs/2305.10601)** Language models are typically used in a left-to-right token-generation mode, which limits their ability to explore alternative reasoning paths or backtrack from mi…
📄 **[Read on arXiv](https://arxiv.org/abs/2307.05973)** VoxPoser addresses a fundamental bottleneck in robot manipulation: translating open-ended natural language instructions into precise physical actions without requi…