Attention Mechanism
Definition
A technique that allows neural networks to focus on the most relevant parts of the input when producing each element of the output, assigning different weights to different input positions.
Attention mechanisms revolutionized sequence modeling by allowing models to dynamically focus on relevant information regardless of distance in the input. Originally introduced for machine translation (Bahdanau et al., 2014), attention computes a weighted combination of all input representations. Self-attention, used in transformers, allows each position in a sequence to attend to every other position. The computation involves queries, keys, and values: the query from one position is compared with keys from all positions to produce attention weights, which are used to create a weighted sum of values. This mechanism enables transformers to capture long-range dependencies efficiently and is the core innovation behind modern language models.
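The query/key/value computation described above can be sketched in NumPy as scaled dot-product self-attention. This is an illustrative minimal implementation, not taken from the text: the weight matrices `Wq`, `Wk`, `Wv` and the random inputs are assumptions for the example, and real transformer layers add multiple heads, masking, and learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (seq_len, d_model)."""
    Q = X @ Wq  # queries: one per position
    K = X @ Wk  # keys: compared against every query
    V = X @ Wv  # values: what gets mixed into the output
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len): each query vs. all keys
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ V, weights         # weighted sum of values, plus the weights

# Hypothetical toy dimensions for demonstration.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)            # (4, 8): one output vector per position
print(weights.sum(axis=-1)) # each row sums to 1
```

Note how every position's output is a mixture of value vectors from all positions; this is what lets distant tokens influence each other in a single step.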
Related Terms
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understanding and generating natural language.
Multi-Head Attention
An extension of the attention mechanism that runs multiple attention operations in parallel, allowing the model to attend to information from different representation subspaces simultaneously.
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequences in parallel rather than step by step.