Tags

Pages tagged image-classification

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

📄 **[Read on arXiv](https://arxiv.org/abs/2010.11929)** Dosovitskiy et al., ICLR, 2021. The Vision Transformer (ViT) demonstrates that a pure Transformer applied to sequences…

Deep Residual Learning for Image Recognition

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/1512.03385)** He, Zhang, Ren, Sun (Microsoft Research), CVPR, 2016. Deep Residual Learning introduces skip connections that add the i…

Emerging Properties in Self-Supervised Vision Transformers (DINO)

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2104.14294)** DINO (self-DIstillation with NO labels) demonstrates that self-supervised learning with Vision Transformers produces features with remarkable emergent properties t…

ImageNet Classification with Deep Convolutional Neural Networks

source-summary

📄 **[Read Paper](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)** AlexNet, as this paper's architecture came to be known, is a deep convolutional neural network trained on GPUs th…

Learning Transferable Visual Models From Natural Language Supervision

source-summary

📄 **[Read on arXiv](https://arxiv.org/abs/2103.00020)** CLIP (Contrastive Language-Image Pre-training) learns visual representations from natural language supervision by training an image encoder and a text encoder join…

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

paper

📄 **[Read on arXiv](https://arxiv.org/abs/2103.14030)** Vision Transformers (ViT) demonstrated that pure transformer architectures could match or exceed CNNs on image classification, but ViT's design introduced two fund…