Send a single query
Usechat.completions.create to send a single query to a chat model:
create method takes a model name and a messages array. Each message is an object with content and a role naming the author.
In the example above, the role is user. The user role tells the model that the message comes from the end user of your system, for example, a customer using your chatbot app.
The other two roles are assistant and system, covered below.
Multi-turn conversations
Every query to a chat model is self-contained, so models don’t automatically remember prior queries. Theassistant role solves this by carrying historical context for how a model has responded to prior queries, which makes it useful for chatbots and long-running conversations.
To provide a chat history for a new query, pass the previous messages to the messages array. Tag the user-provided messages with the user role and the model’s responses with the assistant role:
Add a system prompt
You can query a model with just a user message, but you’ll typically want to give the model a system prompt with context for how to respond. For example, if you’re building a travel chatbot, you might tell the model to act like a helpful travel guide. To add a system prompt, provide an initial message with thesystem role:
Stream responses
Models take time to generate a full response. Streaming returns chunks as they’re produced, so your app can display partial results while the model is still running instead of waiting for the entire request to finish. To return a stream, set thestream option to True.
Run async requests in parallel from Python
By default, Python’s Together client runs requests synchronously, so multiple queries execute in sequence even when they’re independent. To run independent calls in parallel, use theAsyncTogether module from the Python library:
Python