Positional Encoding
Definition
A technique that injects information about the position of each token in a sequence into the model, since transformers have no inherent sense of order.
Unlike RNNs, which process tokens sequentially and therefore track position naturally, transformers process all tokens in parallel and are permutation-invariant without positional information. Positional encodings solve this by adding position-dependent signals to the token embeddings.

The original transformer used fixed sinusoidal functions of different frequencies, while many later models use learned positional embeddings. Rotary Position Embeddings (RoPE) have become popular in recent models like Llama, encoding relative positions through rotation matrices. ALiBi (Attention with Linear Biases) offers another approach, directly biasing attention scores based on the distance between tokens. The choice of positional encoding affects a model's ability to generalize to sequence lengths beyond what it saw during training.
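As a concrete illustration, the two approaches mentioned above can be sketched in a few lines of NumPy. This is a minimal sketch, not a production implementation: the sinusoidal formula follows the original transformer paper, and the ALiBi head slopes assume the commonly cited geometric sequence 2^(-8/n), 2^(-16/n), ...; function names here are illustrative, not from any particular library.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encodings from the original transformer:
    PE[pos, 2i]   = sin(pos / 10000**(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)                    # even dimensions 0, 2, ...
    freqs = 1.0 / (10000.0 ** (dims / d_model))        # one frequency per dim pair
    angles = positions * freqs                         # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def alibi_bias(seq_len, num_heads):
    """ALiBi-style additive attention bias (causal case): for query i and
    key j <= i, add -slope * (i - j); slopes are assumed geometric per head."""
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    dist = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    return -slopes[:, None, None] * np.maximum(dist, 0)  # (heads, seq, seq)

# Sinusoidal encodings are simply added to the token embeddings:
#   x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
# ALiBi biases are instead added to the raw attention scores before softmax.
pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
bias = alibi_bias(seq_len=8, num_heads=4)
```

Note the contrast in where each signal enters the model: sinusoidal (and learned) encodings modify the inputs once, whereas RoPE and ALiBi act inside the attention computation itself, which is one reason the latter tend to extrapolate better to longer sequences.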
Related Terms
Multi-Head Attention
An extension of the attention mechanism that runs multiple attention operations in parallel, allowin...
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequ...
Attention Mechanism
A technique that allows neural networks to focus on the most relevant parts of the input when produc...
Embedding
A learned dense vector representation that maps discrete data like words, tokens, or items into cont...