
Positional Encoding

Definition

A technique that injects information about the position of each token in a sequence into the model, since transformers have no inherent sense of order.

Unlike RNNs, which process tokens sequentially and therefore encode order implicitly, transformers process all tokens in parallel and are permutation-invariant without positional information. Positional encodings solve this by adding position-dependent signals to the token embeddings. The original transformer used fixed sinusoidal functions of different frequencies, while modern models often use learned positional embeddings instead. Rotary Position Embeddings (RoPE) have become popular in recent models such as Llama, encoding relative positions by rotating query and key vectors. ALiBi (Attention with Linear Biases) offers another approach, directly biasing attention scores based on the distance between tokens. The choice of positional encoding affects a model's ability to generalize to sequence lengths beyond those seen during training.
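The original sinusoidal scheme can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation; the function name and dimensions are illustrative. Each even dimension uses a sine and each odd dimension a cosine, with wavelengths forming a geometric progression from 2π to 10000·2π.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encodings from the original transformer:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]              # shape (seq_len, 1)
    div_terms = 10000 ** (np.arange(0, d_model, 2) / d_model)  # shape (d_model // 2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)  # even dims: sine
    pe[:, 1::2] = np.cos(positions / div_terms)  # odd dims: cosine
    return pe

# The result is simply added to the token embeddings before the first layer.
pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```

Because each frequency is fixed, the encoding for any position can be computed without training, and the model can in principle attend by relative offsets, since the encoding at position pos + k is a fixed linear function of the encoding at pos.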
