Instruct Formats

Refine your training data to ensure your language model produces desired outputs

When fine-tuning base models, you can use any format you'd like for your data to train the model to respond to your prompts in a specific way. However, when you're fine-tuning instruct models such as Llama-2, Llama-3, or Mixtral that have already been fine-tuned for chat capabilities, it's advised to use a specific format as we'll explore below.

The raw text data format for Together AI has the following form (see here for more details):

{"text": "..."}
{"text": "..."}
{"text": "..."}

The format we discuss in the following sections applies to the "..." part, that is, the string stored under the "text" key.
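
As a minimal sketch of producing such a file, the snippet below serializes a list of already-formatted strings as one JSON object per line. The helper name `to_jsonl` and the placeholder samples are illustrative assumptions, not part of any Together AI tooling; `json.dumps` takes care of escaping newlines and quotes.

```python
import json


def to_jsonl(samples):
    """Serialize formatted training strings as one JSON object per line."""
    # ensure_ascii=False keeps emoji and other non-ASCII characters readable.
    return "\n".join(json.dumps({"text": s}, ensure_ascii=False) for s in samples)


# Hypothetical placeholder samples; real ones would follow an instruct format.
samples = ["first formatted sample", "second sample\nwith a newline"]
jsonl = to_jsonl(samples)
print(jsonl)
```

Because the newline inside the second sample is escaped, every record stays on a single line of the file.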

Llama-2 Instruct and Mixtral

Let's take Llama-2-Instruct as an example. For this model we want a few things:

  • We want the model to behave conversationally: when a user sends our system a user_msg like "Hi what are you?", the model should reply with a model_answer like "Hello! Good question, I am a large language model or LLM for short, how about you?"
  • We want to provide the model with some context, a system_prompt, such as how the model should behave or background knowledge to use in its response to the user. An example system_prompt could be:
    • You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
      If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
      If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation.
  • We want to be able to extract the model_answer without any extra text, which means the model should signal when it has finished responding by outputting a stop token, for example </s>. Similarly, a special token like [/INST] tells the model that we have finished giving it an instruction and that it should start generating text. We can either generate tokens until a </s> sequence appears, or set a generation limit and then keep only the text before the stop token; if no stop token was generated, we just output what we have.
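
The stop-token handling described in the last bullet can be sketched as follows. This is only an illustration: the function name `extract_answer` is hypothetical, and it assumes the generated text arrives as a plain string in which the stop token appears literally as `</s>`.

```python
def extract_answer(generated, stop_token="</s>"):
    """Return the text before the first stop token, or all of it if none appears."""
    # partition() splits at the first occurrence; if the stop token is absent,
    # head is the whole string, which matches "just output what we have".
    head, _sep, _tail = generated.partition(stop_token)
    return head.strip()


print(extract_answer(" I am a large language model. </s><s>[INST] ignored"))
print(extract_answer("The generation limit was reached before a stop token"))
```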

Llama-2 Instruct addresses these 3 goals by fine-tuning on samples with this general structure:

instruct_template = \
"""<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s>"""

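print(instruct_template)

Written on one line with the newlines shown as \n, and with the start of a second turn appended, the same structure reads:

<s>[INST] <<SYS>>\n{{ system_prompt }}\n<</SYS>>\n\n{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]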

Notice that the special <<SYS>> and <</SYS>> sequences enclose the context or system message, the start token <s> pairs with the stop token </s>, and each instruction is enclosed by the [INST] and [/INST] tags. You can also define your own special tokens with their own meanings and fine-tune a model to use them.

Putting this together into a concrete example, if these are the components of your prompt:

system_prompt = """You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation."""

user_msg_1 = """There's a double rainbow 🌈 ! 😱 Is this real?"""

model_answer_1 = """Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.\nThis means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow."""

Then your one-turn sample is:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation. 
<</SYS>>

There's a double rainbow 🌈 ! 😱 Is this real? [/INST] Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.
This means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow.</s>

In the training file, this sample becomes a single JSON line, with each newline escaped as \n:

{"text":"<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\nIf a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation. \n<</SYS>>\n\nThere's a double rainbow 🌈 ! 😱 Is this real? [/INST] Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.\nThis means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow.</s>"}
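
A one-turn sample like this can also be assembled programmatically. The sketch below is one possible way to do it, not an official helper: the function name `format_llama2_single_turn` is hypothetical, the system prompt and answer are shortened stand-ins, and the output follows the general template given earlier, with the literal <<SYS>> and <</SYS>> tags and a space before </s>.

```python
import json


def format_llama2_single_turn(system_prompt, user_msg, model_answer):
    """Fill in the single-turn Llama-2 instruct template."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_msg} [/INST] {model_answer} </s>"
    )


sample = format_llama2_single_turn(
    "You are a helpful assistant.",  # shortened, hypothetical system prompt
    "There's a double rainbow 🌈 ! 😱 Is this real?",
    "Yes, double rainbows are real.",  # shortened, hypothetical answer
)

# json.dumps escapes the newlines, so the record fits on one line of the file.
line = json.dumps({"text": sample}, ensure_ascii=False)
print(line)
```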

You can fine-tune a model for multi-turn chat by concatenating repeated exchanges of user_msg and model_answer to the right of the system prompt. Each exchange must be wrapped in the correct special sequences or tokens, following this general pattern:

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }} </s> . . .
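
The multi-turn pattern can be generated from a list of exchanges, as in the sketch below. The function name `format_llama2_chat` and the (user_msg, model_answer) tuple layout are assumptions made for illustration; the system prompt is attached to the first user message only, as in the single-turn template.

```python
def format_llama2_chat(system_prompt, turns):
    """Render (user_msg, model_answer) pairs in the Llama-2 multi-turn pattern."""
    parts = []
    for i, (user_msg, model_answer) in enumerate(turns):
        if i == 0:
            # Only the first instruction carries the system prompt.
            user_msg = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_msg}"
        parts.append(f"<s>[INST] {user_msg} [/INST] {model_answer} </s>")
    # Turns are concatenated directly, giving "... </s><s>[INST] ..." between them.
    return "".join(parts)


# Hypothetical two-turn conversation.
chat = format_llama2_chat(
    "You are a helpful assistant.",
    [("Hi what are you?", "I am a large language model."),
     ("Can you help me?", "Of course.")],
)
print(chat)
```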

Llama-3 Instruct

Llama-3 has the following format.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|>

With our example above, here's how it would look:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation.<|eot_id|><|start_header_id|>user<|end_header_id|>

There's a double rainbow 🌈 ! 😱 Is this real?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.
This means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow.<|eot_id|>
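
The Llama-3 layout can be filled in the same way. The sketch below is an illustration under stated assumptions: the names `format_llama3_single_turn` and `block` are hypothetical, the inputs are shortened stand-ins, and each header is followed by a blank line and each message ends with <|eot_id|>, as in the format given at the start of this section.

```python
def format_llama3_single_turn(system_prompt, user_msg, model_answer):
    """Fill in the single-turn Llama-3 instruct format."""

    def block(role, content):
        # Each message is a role header, a blank line, the content, then <|eot_id|>.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    return (
        "<|begin_of_text|>"
        + block("system", system_prompt)
        + block("user", user_msg)
        + block("assistant", model_answer)
    )


llama3_sample = format_llama3_single_turn(
    "You are a helpful assistant.",  # shortened, hypothetical system prompt
    "There's a double rainbow 🌈 ! 😱 Is this real?",
    "Yes, double rainbows are real.",  # shortened, hypothetical answer
)
print(llama3_sample)
```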

Now that we have our data in this specific format, we can follow the fine-tuning guide to actually fine-tune our model.