Safety
Alignment Tax
Definition
“
The performance cost incurred when making AI models safer and more aligned with human values. Safety training through RLHF or constitutional methods can reduce a model raw capabilities on certain tasks. Minimizing alignment tax while maintaining safety is a key research challenge for AI labs building commercial products.
”
Related Terms
No related terms linked yet.
Explore all terms →