Llama 3.1

Learn how to do function calling for Llama 3.1 models!

Llama 3.1 ships with native function calling support, but instead of specifying a tool_choice parameter as in traditional function calling, it works with a special prompt syntax. Let's take a look at how to do function calling with Llama 3.1 models using only a custom prompt.

According to Meta, if you want to hold a full conversation with tool calling, Llama 3.1 70B and Llama 3.1 405B are the two recommended options. Llama 3.1 8B is good for zero-shot tool calling, but can't maintain a full conversation at the same time.

Function calling with Llama 3.1 70B

Say we have a function definition called weatherTool and want to pass it to Llama 3.1 so the model can call it when it sees fit. We'll define the function attributes, use the special prompt from llama-agentic-system (from Meta) to pass the function to the model in the system prompt, and send in a prompt asking how the weather is in Tokyo.

import json
from together import Together

together = Together()

weatherTool = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
        "required": ["location"],
    },
}

toolPrompt = f"""
You have access to the following functions:

Use the function '{weatherTool["name"]}' to '{weatherTool["description"]}':
{json.dumps(weatherTool)}

If you choose to call a function ONLY reply in the following format with no prefix or suffix:

<function=example_function_name>{{\"example_name\": \"example_value\"}}</function>

Reminder:
- Function calls MUST follow the specified format, start with <function= and end with </function>
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls

"""

messages = [
    {
        "role": "system",
        "content": toolPrompt,
    },
    {
        "role": "user",
        "content": "What is the weather in Tokyo?",
    },
]

response = together.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=messages,
    max_tokens=1024,
    temperature=0,
)

messages.append(response.choices[0].message)
print(response.choices[0].message.content)

import Together from 'together-ai';

const together = new Together();

const weatherTool = {
  name: 'get_current_weather',
  description: 'Get the current weather in a given location',
  parameters: {
    type: 'object',
    properties: {
      location: {
        type: 'string',
        description: 'The city and state, e.g. San Francisco, CA',
      },
    },
    required: ['location'],
  },
};

const toolPrompt = `You have access to the following functions:

Use the function '${weatherTool.name}' to '${weatherTool.description}':
${JSON.stringify(weatherTool)}

If you choose to call a function ONLY reply in the following format with no prefix or suffix:

<function=example_function_name>{"example_name": "example_value"}</function>

Reminder:
- Function calls MUST follow the specified format, start with <function= and end with </function>
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls

`;

let messages = [
  {
    role: 'system',
    content: toolPrompt,
  },
  {
    role: 'user',
    content: 'What is the weather in Tokyo?',
  },
];

async function main() {
  const response = await together.chat.completions.create({
    model: 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
    messages: messages,
    max_tokens: 1024,
    temperature: 0,
  });
  if (response.choices?.[0]?.message) {
    messages.push({
      role: response.choices[0].message.role,
      content: response.choices[0].message.content!,
    });
    console.log(response.choices?.[0]?.message?.content);
  }
}

main();

<function=get_current_weather>{"location": "Tokyo, Japan"}</function>

Running the code above returns the function-call string shown. We then need to parse this string before we can call our function.
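The system prompt above describes a single function, although its opening line speaks of functions in the plural. A minimal sketch of how the same prompt text might be assembled for several tools follows. `build_tool_prompt` and the `get_current_time` tool are hypothetical names introduced here for illustration only, and the sketch does not establish how well the model handles a larger set of tools.

```python
import json


def build_tool_prompt(tools: list) -> str:
    """Assemble the system prompt text for any number of tool definitions."""
    # One "Use the function ..." section per tool, mirroring the single-tool prompt.
    tool_sections = "\n\n".join(
        f"Use the function '{tool['name']}' to '{tool['description']}':\n{json.dumps(tool)}"
        for tool in tools
    )
    return (
        "You have access to the following functions:\n\n"
        f"{tool_sections}\n\n"
        "If you choose to call a function ONLY reply in the following format "
        "with no prefix or suffix:\n\n"
        '<function=example_function_name>{"example_name": "example_value"}</function>\n\n'
        "Reminder:\n"
        "- Function calls MUST follow the specified format, start with <function= "
        "and end with </function>\n"
        "- Required parameters MUST be specified\n"
        "- Only call one function at a time\n"
        "- Put the entire function call reply on one line\n"
        "- If there is no function call available, answer the question like normal "
        "with your current knowledge and do not tell the user about function calls\n"
    )


weather_tool = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

# Hypothetical second tool, not part of the original example.
time_tool = {
    "name": "get_current_time",
    "description": "Get the current time in a given location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

prompt = build_tool_prompt([weather_tool, time_tool])
print(prompt)
```

The returned string can be used as the system message content in place of the hand-written prompt.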

Parsing

Note: Function calling with Llama 3.1 does not call the function for us; it only tells us which function(s) to call and with what parameters.

To parse the function call above, we can write some code that extracts the function name and arguments into a JSON object.

import re
import json

def parse_tool_response(response: str):
    function_regex = r"<function=(\w+)>(.*?)</function>"
    match = re.search(function_regex, response)

    if match:
        function_name, args_string = match.groups()
        try:
            args = json.loads(args_string)
            return {
                "function": function_name,
                "arguments": args,
            }
        except json.JSONDecodeError as error:
            print(f"Error parsing function arguments: {error}")
            return None
    return None

parsed_response = parse_tool_response(response.choices[0].message.content)
print(parsed_response)

function parseToolResponse(response: string) {
  const functionRegex = /<function=(\w+)>(.*?)<\/function>/;
  const match = response.match(functionRegex);

  if (match) {
    const [, functionName, argsString] = match;
    try {
      return {
        function: functionName,
        arguments: JSON.parse(argsString),
      };
    } catch (error) {
      console.error('Error parsing function arguments:', error);
      return null;
    }
  }

  return null;
}

const parsedResponse = parseToolResponse(response.choices[0].message.content);
console.log(parsedResponse);

{
  "function": "get_current_weather",
  "arguments": { "location": "Tokyo, Japan" },
}

Now that we have the parsed function name and arguments as JSON, we can pass them to our actual weather function, send the result back to the LLM, and have it respond to the user.
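The prompt asks the model to supply every required parameter, but nothing enforces that on our side. A minimal sketch of a check that could run before the function is called is shown here. `missing_required_arguments` is a hypothetical helper name, and it only looks at the `required` list of the tool definition, not at argument types.

```python
def missing_required_arguments(tool: dict, arguments: dict) -> list:
    """Return the required parameter names that are absent from `arguments`."""
    required = tool.get("parameters", {}).get("required", [])
    return [name for name in required if name not in arguments]


# Same shape as the weatherTool definition used earlier, trimmed for brevity.
weather_tool = {
    "name": "get_current_weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

print(missing_required_arguments(weather_tool, {"location": "Tokyo, Japan"}))  # []
print(missing_required_arguments(weather_tool, {}))  # ['location']
```

An empty list means the call can go ahead. Otherwise, one option is to report the missing names back to the model rather than calling the function.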

Calling the function

Now that Llama 3.1 has told us which function(s) to call and with what parameters, we can execute the function ourselves and pass its result back to the LLM so it can respond to the user.

def get_current_weather(location: str) -> str:
    # This would be replaced by a weather API
    if location == "San Francisco, CA":
        return "62 degrees and cloudy"
    elif location == "Philadelphia, PA":
        return "87 degrees and sunny"
    return "Weather is unknown"

if parsed_response:
    available_functions = {"get_current_weather": get_current_weather}
    function_to_call = available_functions[parsed_response["function"]]
    weather = function_to_call(parsed_response["arguments"]["location"])
    messages.append(
        {
            "role": "tool",
            "content": weather,
        }
    )
    print("Weather answer is: ", weather)

    res = together.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
        messages=messages,
        max_tokens=1000,
        temperature=0,
    )
    print("Answer from the LLM: ", res.choices[0].message)
else:
    print("No function call found in the response")

function get_current_weather(location: string) {
  // This would be replaced by a weather API
  if (location === 'San Francisco, CA') {
    return '62 degrees and cloudy';
  } else if (location === 'Philadelphia, PA') {
    return '87 degrees and sunny';
  }
  return 'Weather is unknown';
}

if (parsedResponse) {
  const availableFunctions = {
    get_current_weather: get_current_weather,
  };

  if (parsedResponse.function in availableFunctions) {
    const functionToCall = availableFunctions[parsedResponse.function];
    let weather = functionToCall(parsedResponse.arguments.location);

    messages.push({
      role: 'tool',
      content: weather,
    });
    console.log('Weather answer is: ', weather);
    let res = await together.chat.completions.create({
      model: 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
      messages: messages as any,
      max_tokens: 1000,
      temperature: 0,
    });
    console.log('Answer from the LLM: ', res.choices[0].message);
  } else {
    console.log(`Function ${parsedResponse.function} not found`);
  }
} else {
  console.log('No function call found in the response');
}

Weather answer is:  87 degrees and sunny
Answer from the LLM:  {
  role: 'assistant',
  content: 'The current weather in Tokyo is 87 degrees and sunny.'
}
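The Python snippet above looks up the function name directly in `available_functions`, which would raise a `KeyError` if the model names a function that is not registered. A minimal sketch of a more defensive dispatcher is shown here. `run_tool_call` and the JSON error payload are assumptions made for illustration, not part of the Together API, and the weather function is the same kind of stand-in used above.

```python
import json


def get_current_weather(location: str) -> str:
    # Stand-in for a real weather API, as in the example above.
    return "Weather is unknown"


AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def run_tool_call(parsed_response: dict, registry: dict) -> dict:
    """Execute a parsed function call and wrap the outcome as a tool message."""
    name = parsed_response["function"]
    func = registry.get(name)
    if func is None:
        # Unknown function: report it instead of raising KeyError.
        content = json.dumps({"error": f"Function {name} not found"})
    else:
        try:
            content = str(func(**parsed_response["arguments"]))
        except TypeError as error:
            # Typically raised when arguments are missing or unexpected.
            content = json.dumps({"error": str(error)})
    return {"role": "tool", "content": content}


tool_message = run_tool_call(
    {"function": "get_current_weather", "arguments": {"location": "Tokyo, Japan"}},
    AVAILABLE_FUNCTIONS,
)
print(tool_message)
```

The returned dictionary has the same shape as the tool message appended to `messages` in the example above, so it could be appended in the same way before the follow-up request.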

How good is Llama 3.1 at tool calling?

Llama 3.1 70B and Llama 3.1 405B are both strong models for tool calling since they were trained with this functionality in mind. According to some evals, these models even perform better than GPT-4o at tool calling.