The utility of instruction tuning, like that of most fine-tuning techniques, lies in the fact that pre-trained LLMs are not optimized for conversations or instruction following. In a literal sense, LLMs do not answer a prompt: they only append text to it. Instruction tuning helps make that appended text more useful.
The pre-training process for autoregressive language models—LLMs used for generating text, like Meta’s Llama 2, OpenAI’s GPT, Google’s Gemini or IBM’s Granite—optimizes these LLMs to simply predict the next word(s) in a given sequence until it’s complete.
LLMs are pre-trained using self-supervised learning on a massive corpus of written content. In pre-training, autoregressive models are provided the beginning of a text sample and repeatedly tasked with predicting the next word in the sequence until the end of the excerpt. For each prediction, the actual next word of the original sample serves as “ground truth.” Through optimization algorithms like gradient descent, which iteratively adjust model parameters—the varying weights and biases applied to the mathematical operations occurring at each node in a neural network—to bring the model’s predictions closer to the original text, the model “learns” the linguistic patterns in its training data (and, by extension, the “knowledge” those patterns convey).
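To make that objective concrete, here is a minimal sketch of next-word prediction with gradient descent in PyTorch. The tiny model, vocabulary size and randomly generated "token IDs" are illustrative assumptions standing in for a real LLM and corpus, not how any particular model is actually implemented.

```python
# A minimal sketch of the self-supervised next-word-prediction objective.
# All names, sizes and the stand-in "corpus" are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32          # toy vocabulary and embedding size

class TinyAutoregressiveLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)    # scores for the next token

    def forward(self, token_ids):
        hidden, _ = self.rnn(self.embed(token_ids))
        return self.head(hidden)                        # (batch, seq_len, vocab_size)

model = TinyAutoregressiveLM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # plain gradient descent
loss_fn = nn.CrossEntropyLoss()

# One pre-training step: the text itself supplies the "ground truth".
tokens = torch.randint(0, vocab_size, (4, 16))           # stand-in for real token IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]          # predict each next token

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()            # gradients of the prediction error w.r.t. every parameter
optimizer.step()           # nudge weights toward reproducing the original text
optimizer.zero_grad()
```

Repeated over an enormous corpus, this same loop is all that pre-training optimizes for: making the appended text statistically likely, not making it helpful.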
Though this pre-training process imparts an impressive ability to generate linguistically coherent text, it doesn’t necessarily align model performance with the practical needs of human users. Without fine-tuning, a base model might respond to a prompt of “teach me how to bake bread” with “in a home oven.” That’s a grammatically sound way to complete the sentence, but not what the user wanted.
At the same time, pre-training an LLM for any specific purpose (like following instructions) is impractical. The “large” in “large language models” refers to the fact that these models often have billions of parameters: training these huge models from scratch entails a tremendous amount of energy, time, computational resources and training data. Conversely, fine-tuning an already-trained LLM requires far less data and, especially when using parameter-efficient fine-tuning (PEFT) methods like partial fine-tuning or low-rank adaptation (LoRA), only a fraction of the computational demands.
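As an illustration of why methods like LoRA are so much cheaper, here is a minimal sketch of the low-rank adaptation idea in PyTorch: the pre-trained weight matrix is frozen, and only a small pair of low-rank matrices is trained. The layer size, rank and scaling choice are illustrative assumptions, not a reference implementation of any particular PEFT library.

```python
# A minimal sketch of low-rank adaptation (LoRA): freeze the pre-trained
# weights and train only a small low-rank update. Details are assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        in_f, out_f = base.in_features, base.out_features
        self.lora_A = nn.Parameter(torch.randn(in_f, rank) * 0.01)  # trainable
        self.lora_B = nn.Parameter(torch.zeros(rank, out_f))        # trainable
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen pre-trained projection plus the small low-rank correction.
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scale

# Wrapping a (stand-in) pre-trained projection: only the two rank-8 matrices
# are updated during fine-tuning, a small fraction of the full layer.
pretrained_proj = nn.Linear(512, 512)
adapted_proj = LoRALinear(pretrained_proj, rank=8)
trainable = sum(p.numel() for p in adapted_proj.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")   # 8,192 vs. 262,656 in the full layer
```

The full layer still does its original work; only the small correction on top of it is learned, which is what keeps memory and compute requirements low.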
Though fine-tuning can be achieved through nearly any machine learning paradigm, including reinforcement learning, semi-supervised learning or additional self-supervised learning, instruction tuning entails supervised learning on labeled (input, output) pairs. What distinguishes instruction tuning from other forms of supervised fine-tuning (SFT) is that the input samples in an instruction dataset consist entirely of tasks that resemble requests users might make in their prompts; the outputs demonstrate desirable responses to those requests. In adjusting model weights to make the LLM’s outputs resemble the examples in the instruction dataset, the LLM “learns” to respond to a prompt like “teach me how to bake bread” by appending text that contains actual advice for baking bread.
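As a sketch of what those (input, output) pairs might look like in practice, the example below assembles instruction-tuning training sequences in Python. The prompt template, the stand-in tokenizer and the convention of masking instruction tokens with -100 so the loss is computed only on the response are all illustrative assumptions rather than any specific library’s format.

```python
# A minimal sketch of preparing instruction-tuning examples for supervised
# fine-tuning. Template, tokenizer and masking convention are assumptions.
instruction_data = [
    {
        "instruction": "Teach me how to bake bread.",
        "response": "Start by mixing flour, water, yeast and salt into a dough...",
    },
    {
        "instruction": "Summarize the following paragraph: ...",
        "response": "The paragraph argues that ...",
    },
]

def build_example(sample, tokenize):
    """Concatenate instruction and response into one training sequence."""
    prompt = f"### Instruction:\n{sample['instruction']}\n\n### Response:\n"
    prompt_ids = tokenize(prompt)
    response_ids = tokenize(sample["response"])
    input_ids = prompt_ids + response_ids
    # Only the response tokens contribute to the loss: the -100 labels on the
    # prompt are ignored, so the model learns to answer the request rather
    # than to repeat it.
    labels = [-100] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

toy_tokenize = lambda text: [ord(c) % 1000 for c in text]   # stand-in tokenizer
example = build_example(instruction_data[0], toy_tokenize)
print(len(example["input_ids"]), len(example["labels"]))
```

With a real tokenizer in place of the stand-in, each resulting dict can be fed to the same next-token cross-entropy objective used in pre-training; what changes is the data, not the loss.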
Instruction tuning thus helps to bridge the gap between the model’s fundamental objective—next-word prediction—and the user’s goal of having the model follow instructions and perform specific tasks. This makes model behavior more useful and predictable.