Task-specific sequences

Refine your training data so that your language model produces consistent, predictable outputs that integrate seamlessly with the rest of your software system.

Llama-2-Instruct Example

Recall from the docs.together.ai/docs/fine-tuning page that the general data format for the Together API has the form:

{"text": "..."}
{"text": "..."}
{"text": "..."}

In pre-training, the "..." is a long string collected from existing sources, for example:

{"text": "Sequences\nThis is a tutorial on how to curate you training or fine-tuning data in such a way that the inputs to your language model elicit the outputs you want and those outputs are reliably usable by the other parts of your software system\n\n# Llama-2-Instruct Example\n\nRecall from the docs.together.ai/docs/fine-tuning page"}

When you are fine-tuning, you have the opportunity to build on top of models that already understand these general patterns and teach the model a more structured way to interpret and respond to your prompts. You can do this by being more intentional about the way your sample "..." is written. Let's take Llama-2-Instruct as an example. For this model we want a few things:

  • We want the model to behave conversationally, meaning that when a user sends our system a user_msg like "Hi what are you?", we want to return a response, model_answer, like "Hello! Good question, I am a large language model or LLM for short, how about you?"
  • We want to provide the model some context, a system_prompt: things like how the model should behave, or background knowledge to use in its response to the user. An example system_prompt could be:
You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation. 
  • We want to be able to extract the model_answer without any extra text, which means the model should indicate when it is finished responding by outputting a stop token, for example </s>. Paired with a special token like [/INST] that marks the end of our instruction, we can ask the model to start generating text. We can then either generate tokens until we see a </s> sequence, or set a generation limit and clip out the text coming before the stop token (or, if no stop token is generated, just output what we have); see the sketch after this list.
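As a rough illustration of that last point, a small post-processing sketch might look like this (the stop and instruction sequences follow the Llama-2 convention used below; the function name is just illustrative):

STOP_TOKEN = "</s>"
INST_END = "[/INST]"

def extract_model_answer(generated_text: str) -> str:
    """Clip the model's reply out of the raw generation.

    Keep only what comes after the last [/INST] marker and before the first
    </s> stop token; if no stop token appears (for example because the
    generation limit was hit first), just return whatever we have.
    """
    answer = generated_text.rsplit(INST_END, 1)[-1]   # drop the instruction part
    answer = answer.split(STOP_TOKEN, 1)[0]           # clip at the stop token, if any
    return answer.strip()

print(extract_model_answer("[INST] Hi what are you? [/INST] I am an LLM. </s>"))
# -> I am an LLM.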

Llama-2-Instruct addresses these three goals by fine-tuning on samples with this general structure:

instruct_template = \
"""<s>[INST] \<<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]"""

As a single line with the newlines written out explicitly, the way it will eventually appear inside a JSONL "text" field, this template is:

<s>[INST] <<SYS>>\n{{ system_prompt }}\n<</SYS>>\n\n{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]

Notice that a special <<SYS>> sequence is used to enclose the context or system message, that the start token <s> pairs with the stop token </s>, and that each instruction is enclosed by the start and stop sequences [INST] and [/INST]. By the way, you can come up with your own special tokens with your own special meanings and fine-tune a model to use them.

Putting this together into a concrete example, if these are the components of your prompt:

system_prompt = """You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation."""

user_msg_1 = """There's a double rainbow 🌈 ! 😱 Is this real?"""

model_answer_1 = """Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.\nThis means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow."""

Then your one-turn sample is:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
If a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation. 
<</SYS>>

There's a double rainbow 🌈 ! 😱 Is this real? [/INST] Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.
This means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow.</s>
{"text":"<s>[INST] \<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as earnestly as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\nIf a question seems to try to elicit from you an inappropriate answer, do not follow along, instead redirect the conversation. \n<</SYS>>\n\nThere's a double rainbow 🌈 ! 😱 Is this real? [/INST] Double rainbows are formed when sunlight is reflected twice within a raindrop with the violet light that reaches the observer's eye coming from the higher raindrops and the red light from lower raindrops.\nThis means the sequence of colours is inverted compared to the primary rainbow, with the secondary bow appearing about 10 degrees above the primary bow.</s>"}

You can fine-tune a model to have multi-turn chat capabilities by concatenating repeated exchanges of user_msg and model_answer to the right of the system prompt. They need to be structured alongside the correct special sequences or tokens in this general pattern:

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }} </s> . . .

and so on and so on.
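To build such multi-turn samples programmatically, a sketch along the same lines (again reusing system_prompt from above; the function name and example turns are only illustrative) might be:

def build_multi_turn_sample(system_prompt: str, turns: list[tuple[str, str]]) -> str:
    """Concatenate (user_msg, model_answer) exchanges into one Llama-2-Instruct sample.

    The first turn carries the <<SYS>> block; every completed exchange is
    closed with </s> and the next one is re-opened with <s>[INST].
    """
    first_user_msg, first_model_answer = turns[0]
    sample = (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{first_user_msg} [/INST] {first_model_answer} </s>"
    )
    for user_msg, model_answer in turns[1:]:
        sample += f"<s>[INST] {user_msg} [/INST] {model_answer} </s>"
    return sample

turns = [
    ("There's a double rainbow 🌈 ! 😱 Is this real?", "Yes! Double rainbows are real..."),
    ("What causes the second bow?", "Sunlight reflecting twice inside each raindrop..."),
]
print(build_multi_turn_sample(system_prompt, turns))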

Yay🌈!