Safety

Alignment Tax

Definition

The performance cost incurred when making AI models safer and more aligned with human values. Safety training through reinforcement learning from human feedback (RLHF) or constitutional methods can reduce a model's raw capabilities on certain tasks. Minimizing the alignment tax while maintaining safety is a key research challenge for AI labs building commercial products.
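One way to make the definition concrete is to treat the tax as the drop in a benchmark score between a base model and its safety-trained counterpart. The sketch below assumes that framing. The function names, task names, and scores are hypothetical and chosen only for illustration; they are not measurements of any real model, and labs may define the quantity differently.

```python
def alignment_tax(base_score: float, aligned_score: float) -> float:
    """Absolute score drop after safety training (positive means a cost)."""
    return base_score - aligned_score


def relative_alignment_tax(base_score: float, aligned_score: float) -> float:
    """Score drop as a fraction of the base model's score."""
    if base_score == 0:
        raise ValueError("base_score must be non-zero")
    return (base_score - aligned_score) / base_score


# Hypothetical (base, aligned) accuracies per task; illustrative values only.
scores = {
    "coding": (0.62, 0.59),
    "math": (0.48, 0.48),
    "summarization": (0.71, 0.73),
}

for task, (base, aligned) in scores.items():
    tax = alignment_tax(base, aligned)
    rel = relative_alignment_tax(base, aligned)
    print(f"{task}: absolute tax {tax:+.2f}, relative tax {rel:+.1%}")
```

Under this framing, a positive value is a capability cost, zero means no measurable cost on that task, and a negative value means the safety-trained model scored higher than the base model.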

