What is AI Alignment?


AI Alignment

AI alignment is the work of making an AI system pursue what its designers and users actually intend — including the goals they didn't think to spell out — rather than optimizing a literal objective in harmful or unintended ways. It spans training techniques, evaluation, and oversight.

Also known as: alignment, aligned AI

Jun 27, 2026 · Chain of Thought

A model optimizes whatever objective it’s given, and the gap between what you said and what you meant is where things go wrong. AI alignment is the effort to close that gap: getting a system to act on intended goals and values, including the unstated ones, instead of gaming a literal metric or producing fluent, confident, wrong output. Techniques such as reinforcement learning from human feedback (RLHF) are alignment in practice: they shape a model’s behavior toward what people actually want.
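The gap between a literal objective and the intended one can be shown with a toy sketch. Everything here is hypothetical and for illustration only: the `Candidate` type, the hand-assigned `is_correct` label, and both scoring functions are assumptions, not a real training objective or evaluation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    is_correct: bool  # hypothetical ground-truth label, assigned by hand


def proxy_score(c: Candidate) -> int:
    """Literal objective: reward confident-sounding words."""
    confident_words = {"definitely", "certainly", "clearly"}
    return sum(w.strip(".,").lower() in confident_words for w in c.text.split())


def intended_score(c: Candidate) -> int:
    """Intended objective: correctness first, confidence only as a tiebreak."""
    return (100 if c.is_correct else 0) + proxy_score(c)


candidates = [
    Candidate("The answer is definitely, certainly, clearly 5.", is_correct=False),
    Candidate("The answer is 4.", is_correct=True),
]

# Optimizing the literal metric selects the fluent, confident, wrong output.
picked_by_proxy = max(candidates, key=proxy_score)

# Optimizing what was actually meant selects the correct one.
picked_by_intent = max(candidates, key=intended_score)
```

In this sketch the proxy selects the incorrect answer because it never measures correctness at all. That is the "what you said versus what you meant" gap in miniature.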

For builders, alignment is not just a frontier-lab concern. Every shipped system has a smaller version of the problem: an agent that follows instructions too literally, or a model that’s helpful at the expense of being honest. That’s why alignment shades into the day-to-day disciplines the show covers (guardrails, evaluation, and keeping a human in the loop), which are how intent gets enforced in production rather than assumed.
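As a rough illustration of enforcing intent rather than assuming it, here is a minimal sketch of a guardrail with a human-in-the-loop check. The action names, the `RISKY_ACTIONS` set, and the `run_with_oversight` function are all hypothetical; a real system would define risk and approval in its own terms.

```python
from typing import Callable

# Hypothetical set of actions that should never run without human sign-off.
RISKY_ACTIONS = {"delete_records", "send_payment"}


def run_with_oversight(
    action: str,
    execute: Callable[[str], str],
    approve: Callable[[str], bool],
) -> str:
    """Run an agent action, routing risky ones through a human approval callback."""
    if action in RISKY_ACTIONS and not approve(action):
        # Guardrail: the action is blocked instead of trusting the agent's judgment.
        return f"blocked: {action}"
    return execute(action)


def fake_execute(action: str) -> str:
    """Stand-in for the real side effect."""
    return f"done: {action}"


# A reviewer who declines everything, to show the blocking path.
result_risky = run_with_oversight("send_payment", fake_execute, lambda a: False)
result_safe = run_with_oversight("summarize_report", fake_execute, lambda a: False)
```

The design choice worth noting is that the check lives outside the model: the agent can propose anything, but risky actions only run once a person approves them.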
