AI, decoded

Are small language models better than large ones for production?

Often, yes, for a specific, well-defined task. A small model tuned for your job can match a frontier model's quality on that job while costing far less, responding faster, and being small enough to host yourself. Frontier models earn their keep on broad, open-ended reasoning. The mistake is defaulting to the biggest model for everything; the production-smart move is to use, for each task, the smallest model that still passes your evals.

Chain of Thought

AI Engineering · AI Infrastructure

Big models are generalists; production tasks are specific

A frontier model is built to do almost anything reasonably well. Most production features don’t need that — they do one task, over and over. For a narrow, repeated job, a smaller model tuned on that task can hit the same quality as a giant general one, because it only has to be good at the thing you actually run.

What small buys you

Three things the big general model can’t match. Cost: a smaller model can be an order of magnitude cheaper per call, which is why Intercom’s switch from a frontier model to a tuned smaller one cut spend so sharply. Latency: fewer parameters mean less computation per token and so faster responses, which matters for anything interactive. Control: a small enough model can run on your own infrastructure, keeping data in-house and removing the per-token dependency on an outside provider.
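To make the cost point concrete, here is a rough back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (the model names, the per-token prices, the call volume, and the token counts); none of it reflects any real provider's pricing or Intercom's figures. Swap in your own values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-token pricing for one model."""
    name: str
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def monthly_cost(pricing: ModelPricing, calls_per_month: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend in USD for a fixed-size, repeated task."""
    cost_per_call_times_million = (
        input_tokens * pricing.usd_per_million_input_tokens
        + output_tokens * pricing.usd_per_million_output_tokens
    )
    return calls_per_month * cost_per_call_times_million / 1_000_000


# Illustrative prices only: the small model is assumed to be 10x cheaper.
large = ModelPricing("large-general", 10.0, 30.0)
small = ModelPricing("small-tuned", 1.0, 3.0)

# Assumed workload: 1M calls a month, 500 input and 100 output tokens each.
large_cost = monthly_cost(large, 1_000_000, 500, 100)
small_cost = monthly_cost(small, 1_000_000, 500, 100)

print(f"{large.name}: ${large_cost:,.2f} per month")
print(f"{small.name}: ${small_cost:,.2f} per month")
print(f"ratio: {large_cost / small_cost:.1f}x")
```

The calculation only covers per-token charges. If you host the small model yourself, the fair comparison also includes hardware and operations costs, which this sketch leaves out.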

When to stay with a large model

Small isn’t always the answer. Open-ended reasoning, tasks with huge or unpredictable input variety, and work where you can’t gather enough examples to tune still favor a frontier model. New architectures, like the efficiency-focused designs Liquid AI is exploring, are widening what small models can do — but the rule holds: match the model to the task.

The decision rule

Don’t default to the biggest model. For each task, find the smallest model that still passes your evals. That’s usually where cost, speed, and quality balance out in production.
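The rule can be sketched as a small selection loop: order the candidates from smallest to largest, run each against the same eval set, and stop at the first one that clears your bar. This is a minimal sketch under assumptions, not a definitive harness. The `Candidate` type, the `size_rank` field, the exact-match scoring, the 0.75 threshold, and the stub "models" below are all hypothetical stand-ins; a real setup would call actual models and use a scoring method suited to the task.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

EvalSet = Sequence[Tuple[str, str]]  # (prompt, expected answer) pairs


@dataclass(frozen=True)
class Candidate:
    """A model under consideration. Lower size_rank means smaller/cheaper."""
    name: str
    size_rank: int
    predict: Callable[[str], str]


def pass_rate(predict: Callable[[str], str], eval_set: EvalSet) -> float:
    """Fraction of eval examples answered exactly right."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    correct = sum(1 for prompt, expected in eval_set
                  if predict(prompt) == expected)
    return correct / len(eval_set)


def smallest_passing(candidates: Sequence[Candidate], eval_set: EvalSet,
                     threshold: float) -> Optional[Candidate]:
    """Return the smallest candidate whose pass rate meets the threshold."""
    for candidate in sorted(candidates, key=lambda c: c.size_rank):
        if pass_rate(candidate.predict, eval_set) >= threshold:
            return candidate
    return None  # nothing passed: tune further or revisit the task


# Stub models for a toy ticket-routing task (stand-ins for real model calls).
def tiny_stub(prompt: str) -> str:
    return "billing"  # always guesses the same label


def keyword_stub(prompt: str) -> str:
    return "bug" if ("crash" in prompt or "load" in prompt) else "billing"


eval_set = [
    ("refund my order", "billing"),
    ("app crashes on login", "bug"),
    ("charge appeared twice", "billing"),
    ("page will not load", "bug"),
]

candidates = [
    Candidate("large-general", 3, keyword_stub),
    Candidate("tiny-base", 1, tiny_stub),
    Candidate("small-tuned", 2, keyword_stub),
]

chosen = smallest_passing(candidates, eval_set, threshold=0.75)
print(chosen.name if chosen else "no candidate passed")
```

Returning `None` when nothing passes is deliberate: it keeps "no small model is good enough yet" visible as an outcome, rather than silently falling back to the largest model.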

From the conversation

This explainer is drawn from these episodes — each carries its full transcript.