
Explainability

Definition

The degree to which a human can understand how an AI model arrives at its predictions or decisions, also referred to as interpretability or transparency.

Explainability is critical for trust, accountability, and debugging in AI systems. Simple models like decision trees are inherently interpretable, but deep neural networks with billions of parameters are often "black boxes." Explainability methods include feature importance (which inputs mattered most), attention visualization (where the model focused), SHAP values (game-theoretic explanations), LIME (local interpretable model-agnostic explanations), and mechanistic interpretability (reverse-engineering how neural networks represent concepts internally).

Regulations such as GDPR's (much-debated) "right to explanation" and the EU AI Act increasingly require explainability for high-risk AI applications in areas like credit scoring, healthcare, and criminal justice. The tension between model performance (larger, more complex models tend to be more accurate) and explainability remains an active research challenge.
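To make the "game-theoretic explanations" idea behind SHAP concrete, here is a minimal sketch that computes exact Shapley values for a hypothetical toy model by averaging each feature's marginal contribution over all orderings. The model, inputs, and baseline are invented for illustration; real SHAP implementations approximate this computation, since exact enumeration is exponential in the number of features.

```python
from itertools import permutations

def model(x):
    # Hypothetical toy model (not a trained network): linear terms plus
    # one interaction, so attributions are easy to check by hand.
    return 3 * x[0] + 2 * x[1] - x[2] + x[0] * x[1]

def shapley_values(f, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features are switched from the baseline
    value to the actual input value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)       # start from the baseline input
        prev = f(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = f(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
# Efficiency property: the attributions sum to f(x) - f(baseline).
```

For this toy model the interaction term is split equally between the two interacting features (phi = [3.5, 2.5, -1.0]), and the attributions always sum to the difference between the model's output and its baseline output, which is the "efficiency" guarantee that makes Shapley values attractive for explanations.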
