AI Glossary

AI Safety

AI safety is the work of keeping AI systems from causing harm: making them behave as intended, refuse dangerous requests, and fail gracefully. In practice, for builders, it means alignment, guardrails, evaluation for harmful behavior, and human oversight on consequential actions.

“AI safety” spans a wide range, from near-term, concrete concerns (a model giving harmful instructions, an agent taking a destructive action, biased or unsafe outputs in production) to long-horizon research questions about advanced systems. For most teams shipping today, the near-term, practical end is what matters.

In that practical sense, safety is the overlap of several things this glossary already covers: alignment (training the model toward intended behavior, e.g. via RLHF), guardrails (enforcing limits outside the model), evaluation and red teaming (testing for harmful behavior before and after launch), and human-in-the-loop oversight on high-stakes actions. It’s distinct from security: security is about defending against attackers, while safety is about the system not causing harm even under well-meaning use. In practice, though, the two reinforce each other and share much of the same tooling.
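The guardrail and human-in-the-loop ideas can be made concrete with a small policy layer that sits between a model and the tools it is allowed to call. The following is a minimal sketch, assuming an agent that proposes actions as structured tool calls. The names `ToolCall`, `Decision`, `check_policy`, `execute_with_oversight`, and every tool name are hypothetical, chosen for illustration rather than taken from any particular agent framework. The model's output is treated only as a proposal; the decision is made by ordinary deterministic code outside the model, so the limit holds even when the model gets it wrong.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class ToolCall:
    """An action proposed by the model (hypothetical structure)."""
    tool: str
    args: dict = field(default_factory=dict)


# Hypothetical tool names; a real system would load these from its own config.
LOW_RISK_TOOLS = {"search_docs", "read_file"}
HIGH_STAKES_TOOLS = {"send_payment", "send_email", "deploy_to_production"}


def check_policy(call: ToolCall) -> Decision:
    """Guardrail enforced outside the model: a plain rule check."""
    if call.tool in LOW_RISK_TOOLS:
        return Decision.ALLOW
    if call.tool in HIGH_STAKES_TOOLS:
        return Decision.NEEDS_APPROVAL
    # Default-deny: any tool not explicitly listed is refused.
    return Decision.BLOCK


def execute_with_oversight(
    call: ToolCall,
    run_tool: Callable[[ToolCall], str],
    ask_human: Callable[[ToolCall], bool],
) -> str:
    """Run a proposed action only if policy allows it or a human approves it.

    Refusals are returned as messages instead of raised, so the agent
    fails gracefully and can report back to the user.
    """
    decision = check_policy(call)
    if decision is Decision.BLOCK:
        return f"refused: {call.tool} is not permitted"
    if decision is Decision.NEEDS_APPROVAL and not ask_human(call):
        return f"refused: human reviewer declined {call.tool}"
    return run_tool(call)


if __name__ == "__main__":
    def fake_run(call: ToolCall) -> str:
        # Stand-in for actually executing the tool.
        return f"ran {call.tool}"

    def reviewer_declines(call: ToolCall) -> bool:
        # Stand-in for a human review step; here the reviewer always says no.
        return False

    print(execute_with_oversight(ToolCall("search_docs", {"q": "refunds"}), fake_run, reviewer_declines))
    # → ran search_docs
    print(execute_with_oversight(ToolCall("send_payment", {"amount": 500}), fake_run, reviewer_declines))
    # → refused: human reviewer declined send_payment
    print(execute_with_oversight(ToolCall("delete_database"), fake_run, reviewer_declines))
    # → refused: delete_database is not permitted
```

The sketch fails closed: a tool name that appears on neither list, including one the model invents, is refused rather than run. In practice `ask_human` might route to a review queue or approval UI instead of a callback, and the same `check_policy` function gives red-team evaluations a single, testable place to probe for actions that slip through.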