Playground
The Playground is a web application offered by Together AI to allow our customers to run inference without having to use our API. The playground can be used with standard models, or a selection of fine-tuned models. You can access the Playground at api.together.xyz/playground.API Usage
You can use Together’s APIs to send individual queries or have long-running conversations with chat models. You can also configure a system prompt to customize how a model should respond. Queries run against a model of your choice. For most use cases, we recommend using Meta Llama 3.Running a single query
Usechat.completions.create to send a single query to a chat model:
create method takes in a model name and a messages array. Each message is an object that has the content of the query, as well as a role for the message’s author.
In the example above, you can see that we’re using “user” for the role. The “user” role tells the model that this message comes from the end user of our system – for example, a customer using your chatbot app.
The other two roles are “assistant” and “system”, which we’ll talk about next.
Having a long-running conversation
Every query to a chat model is self-contained. This means that new queries won’t automatically have access to any queries that may have come before them. This is exactly why the “assistant” role exists. The “assistant” role is used to provide historical context for how a model has responded to prior queries. This makes it perfect for building apps that have long-running conversations, like chatbots. To provide a chat history for a new query, pass the previous messages to themessages array, denoting the user-provided queries with the “user” role, and the model’s responses with the “assistant” role:
Customizing how the model responds
While you can query a model just by providing a user message, typically you’ll want to give your model some context for how you’d like it to respond. For example, if you’re building a chatbot to help your customers with travel plans, you might want to tell your model that it should act like a helpful travel guide. To do this, provide an initial message that uses the “system” role:Streaming responses
Since models can take some time to respond to a query, Together’s APIs support streaming back responses in chunks. This lets you display results from each chunk while the model is still running, instead of having to wait for the entire response to finish. To return a stream, set thestream option to true.
A note on async support in Python
Since I/O in Python is synchronous, multiple queries will execute one after another in sequence, even if they are independent. If you have multiple independent calls that you want to run in parallel, you can use our Python library’sAsyncTogether module:
Python