AI, decoded

What are AI agent guardrails, and how do you set them?

Guardrails are the limits that keep an autonomous agent inside safe, intended behavior — checks on what it's allowed to do, what it can access, and what it's about to output. They run at three points: on the input (block malicious or out-of-scope requests), on the actions (require approval for high-stakes tool calls, scope permissions), and on the output (catch unsafe, off-policy, or ungrounded responses before they reach the user). You set them by deciding in advance what the agent must never do, then enforcing those rules in code, not in the prompt alone.

· Chain of Thought

AI AgentsMulti-Agent Systems

Why an autonomous agent needs limits

An agent decides its own steps and can call real tools — send email, move money, change records. That autonomy is the point and the danger. Guardrails are the boundary that lets you give an agent useful power without giving it unlimited power, so a wrong decision is caught instead of executed.

The three places guardrails run

On the input, you filter what the agent is asked to do — block prompt injection, out-of-scope requests, and unsafe instructions before they enter the loop. On the actions, you constrain what it can actually do: scope its permissions to the minimum it needs, and require human approval for high-stakes or irreversible calls. On the output, you check the response before it ships — for safety, policy, and whether it’s actually grounded in real data rather than hallucinated.

Enforce in code, not just the prompt

Telling an agent “don’t do X” in its instructions is a request, not a control — a clever input can talk it out of its own rules. Real guardrails sit outside the model: permission systems, validation checks, and approval gates that hold even when the model is wrong or manipulated. Start by writing down what the agent must never do, then enforce each of those as a hard limit the model can’t override.

From the conversation

This explainer is drawn from these episodes — each carries its full transcript.