
LSTM

Definition

Long Short-Term Memory — a specialized recurrent neural network architecture that uses gating mechanisms to effectively learn long-term dependencies in sequential data.

LSTM networks were introduced by Hochreiter and Schmidhuber in 1997 to address the vanishing gradient problem in standard RNNs. The key innovation is a cell state that runs through the entire sequence, regulated by three gates: the forget gate (what to discard), the input gate (what to store), and the output gate (what to pass to the next step). These gates learn to selectively remember or forget information over long sequences. LSTMs dominated sequence modeling for nearly two decades, powering Google Translate, Apple's Siri, and many speech recognition systems. While transformers have largely superseded LSTMs for language tasks, LSTMs remain useful for time series forecasting, anomaly detection, and resource-constrained environments where transformer compute costs are prohibitive.
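The gating mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes a single fused weight matrix holding all four gate projections, random toy weights, and the hypothetical helper names `lstm_step` and `sigmoid`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input (d,); h_prev, c_prev: previous hidden and cell state (n,);
    W: fused weights ((d+n) x 4n) for all four gate projections; b: bias (4n,).
    """
    n = h_prev.shape[0]
    z = np.concatenate([x, h_prev]) @ W + b
    f = sigmoid(z[:n])        # forget gate: what to discard from the cell state
    i = sigmoid(z[n:2*n])     # input gate: what new information to store
    g = np.tanh(z[2*n:3*n])   # candidate values for the cell update
    o = sigmoid(z[3*n:])      # output gate: what to pass to the next step
    c = f * c_prev + i * g    # cell state runs through the sequence, gated
    h = o * np.tanh(c)        # hidden state exposed to the next layer/step
    return h, c

# Toy demo: run a short sequence through one cell with random weights.
rng = np.random.default_rng(0)
d, n = 3, 4
W = rng.normal(size=(d + n, 4 * n)) * 0.1
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

Because the forget gate multiplies the cell state rather than repeatedly passing it through a squashing nonlinearity, gradients along `c` can flow across many steps, which is how the architecture sidesteps the vanishing gradient problem of plain RNNs.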
