The Responses API is in beta. Basic generation, function tool calls, and remote MCP work today. Stored responses and conversation continuation (
previous_response_id) are not yet supported, and the list of supported models is still growing.POST /v1/responses. You can call it with the responses.create method in the OpenAI Python and TypeScript clients, or with cURL. It’s an alternative to chat completions that returns a structured list of output items and has first-class support for remote MCP servers.
Setup
Point the OpenAI client at Together by settingbase_url to https://api.together.ai/v1 and api_key to your Together API key:
Basic usage
Pass amodel and an input string. The OpenAI SDK exposes the generated text on response.output_text. With cURL, read it from the last output item at output[-1].content[0].text:
status of completed and an output array. Each element is an item such as a message (the assistant’s reply), a function_call, or an MCP item. The SDK’s output_text helper concatenates the text from the message items for you.
All the models that support the Responses API are reasoning models, and they return their reasoning inline in the
output_text message. The API doesn’t split reasoning into a separate output item, so parse or trim the text if you only want the final answer.Streaming
Setstream to receive the response as server-sent events as the model generates it. Text arrives in response.output_text.delta events, and the stream ends with a response.completed event:
Tool calls
Define a function in thetools array. When the model decides to call it, the response includes a function_call output item with the function name, a JSON-encoded arguments string, and a call_id. You execute the function and decide what to do with the result:
Remote MCP
The Responses API can connect directly to a remote MCP server. Add a tool withtype: "mcp", the server’s server_url, and a server_label. Together discovers the server’s tools and calls them on the model’s behalf, so you don’t run a client loop yourself.
The example below connects to the public DeepWiki MCP server and asks a question about a GitHub repository:
output includes an mcp_list_tools item showing the tools the server exposed, one or more mcp_call items for each tool the model invoked, and a final message with the answer.
require_approval: "never" lets the model call the server’s tools without pausing for confirmation. Only point at MCP servers you trust, since the model can send them data from your prompt.Supported models
The Responses API is enabled on a curated set of models, and the list grows over time:| Model | API string |
|---|---|
| MiniMax M2.7 | MiniMaxAI/MiniMax-M2.7 |
| Kimi K2.7 Code | moonshotai/Kimi-K2.7-Code |
| Kimi K2.6 | moonshotai/Kimi-K2.6 |
| GLM-5.1 | zai-org/GLM-5.1 |
| DeepSeek-V4-Pro | deepseek-ai/DeepSeek-V4-Pro |
Calling the Responses API with a model that isn’t enabled for it returns a
400 error: The requested model does not support the Responses api. Use chat completions for unsupported models. See Serverless models for our full catalog.Limitations
The Responses API has partial support. The following OpenAI features don’t work on Together yet:- Stored responses. The
storeparameter is accepted but has no effect, so responses aren’t persisted. - Conversation continuation. Passing
previous_response_idreturns a400error, because prior responses aren’t stored. - Retrieving or deleting a response.
GET /v1/responses/{id}andDELETE /v1/responses/{id}return404. - Native tool types beyond function calling and remote MCP (for example
web_search,file_search,image_generation, andcode_interpreter) aren’t executed. The request still succeeds, but the tool is silently ignored and the model answers without it.
Next steps
OpenAI compatibility
See how every OpenAI SDK method maps to Together endpoints.
Function calling
Build multi-turn tool-calling loops with Together models.
Available models
Browse the full catalog of serverless models.