Infrastructure

Flash Attention

Definition

An IO-aware attention algorithm that speeds up transformer training and inference while reducing attention's memory footprint from quadratic to linear in sequence length. Flash Attention achieves this by tiling the attention computation into blocks that fit in fast on-chip SRAM, using an online softmax so the full N×N attention matrix is never materialized in GPU memory, and recomputing attention in the backward pass instead of storing it. It is now standard in most LLM training pipelines.
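The tiling idea can be illustrated outside the GPU kernel. Below is a minimal NumPy sketch (not the real CUDA implementation, and ignoring batching, heads, and masking): it processes K/V in blocks and maintains only a running row-max, a running softmax denominator, and a running output, so the full N×N score matrix never exists at once. The function names and block size are illustrative choices.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, block=4):
    # Tiled attention with an online softmax: only per-row running
    # statistics (max m, denominator l) and the output O are kept.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row max of the scores
    l = np.zeros(N)           # running softmax denominator
    scale = 1.0 / np.sqrt(d)
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                   # N x block tile of scores
        m_new = np.maximum(m, S.max(axis=-1))  # updated row max
        P = np.exp(S - m_new[:, None])         # unnormalized tile probs
        alpha = np.exp(m - m_new)              # rescales previous stats
        l = alpha * l + P.sum(axis=-1)
        O = alpha[:, None] * O + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(flash_attention(Q, K, V), naive_attention(Q, K, V))
```

The rescaling factor `alpha` is the key trick: whenever a new tile raises the running max, previously accumulated sums are multiplied by `exp(m_old - m_new)`, which keeps the result exactly equal to the standard softmax attention.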

