
Transformer

Definition

A neural network architecture, introduced in 2017, that uses self-attention mechanisms to process sequential data in parallel. It has become the dominant architecture for language modeling and, increasingly, for vision and other modalities.

The Transformer was introduced in the landmark paper "Attention Is All You Need" by Vaswani et al. at Google in 2017. Unlike RNNs, which process sequences one element at a time, transformers use self-attention to relate all positions in a sequence simultaneously, enabling massive parallelization during training. The original architecture consists of encoder and decoder stacks, each block containing multi-head attention layers and position-wise feed-forward networks. Transformers power virtually all modern LLMs (GPT, Claude, Gemini, Llama) and have been adapted for vision (ViT), audio, and multimodal tasks. The architecture scales remarkably well with data and compute, following empirical scaling laws that predict performance gains from larger models and datasets.
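The core of the architecture is scaled dot-product self-attention: each position's query is compared against every position's key, and the resulting weights mix the values. A minimal single-head sketch in NumPy (the weight matrices and dimensions here are illustrative, not from the paper's full configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) input sequence.
    Wq, Wk, Wv: (d_model, d_k) hypothetical projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # One matrix product relates every position to every other position,
    # which is what allows the whole sequence to be processed in parallel.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores)          # each row sums to 1
    return weights @ V                 # (seq_len, d_k)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention simply runs several such projections in parallel and concatenates the results, letting different heads attend to different relationships in the sequence.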
