Explainability
Definition
The degree to which a human can understand how an AI model arrives at its predictions or decisions, also referred to as interpretability or transparency.
Explainability is critical for trust, accountability, and debugging in AI systems. Simple models such as decision trees are inherently interpretable, but deep neural networks with billions of parameters are often "black boxes." Common explainability methods include feature importance (which inputs mattered most), attention visualization (where the model focused), SHAP values (game-theoretic attributions of a prediction to input features), LIME (local interpretable model-agnostic explanations), and mechanistic interpretability (reverse-engineering how neural networks represent concepts internally). Regulations such as the GDPR's "right to explanation" and the EU AI Act increasingly require explainability for high-risk AI applications in areas like credit scoring, healthcare, and criminal justice. The tension between model performance (larger, more complex models tend to be more accurate) and explainability remains an active research challenge.
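One of the simplest model-agnostic feature-importance techniques mentioned above can be sketched in a few lines: permutation importance shuffles one input feature at a time and measures how much the model's error grows, without looking inside the model at all. The sketch below uses only NumPy and a toy hand-built model; the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def permutation_importance(model_fn, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: shuffle one feature at a time and
    measure how much the model's mean squared error increases."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((model_fn(X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the link between feature j and y
            errors.append(np.mean((model_fn(Xp) - y) ** 2))
        importances.append(np.mean(errors) - base_error)
    return np.array(importances)

# Toy setup: y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2. The "model" here is the true function,
# purely for illustration -- any black-box predictor would work.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]
model_fn = lambda X: 3.0 * X[:, 0] + 0.5 * X[:, 1]

imp = permutation_importance(model_fn, X, y)
print(imp)  # feature 0 dominates, feature 2 contributes nothing
```

The same perturb-and-observe idea underlies LIME, which additionally fits a small interpretable surrogate model to the black box's behavior around a single input.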
Related Terms
AI Alignment
The research field focused on ensuring AI systems behave in accordance with human values, intentions...
Bias
Systematic errors in AI systems that lead to unfair or discriminatory outcomes, often reflecting bia...
AI Ethics
The branch of applied ethics examining the moral implications and societal impacts of artificial int...