AI Glossary

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient way to fine-tune a model: instead of updating all its weights, you train small add-on matrices and leave the original model frozen. You get most of the benefit of fine-tuning at a fraction of the compute and storage.

Also known as: low-rank adaptation, LoRA

· Chain of Thought

AI Engineering

Full fine-tuning updates every weight in a model, which is expensive and produces a whole new copy per task. LoRA takes a shortcut: it freezes the original model and trains small low-rank “adapter” matrices that adjust its behavior. Because the adapters are tiny, training is far cheaper and faster, and you can store many task-specific adapters instead of many full models.

That efficiency is what makes fine-tuning practical for smaller teams and for serving many specialized variants. Adapters can be swapped or combined, and quantized variants (QLoRA) push the cost lower still. It’s the default starting point when you want to specialize a model and don’t need — or can’t afford — a full fine-tune.