Skip to main content
The Responses API is in beta. Basic generation, function tool calls, and remote MCP work today. Stored responses and conversation continuation (previous_response_id) are not yet supported, and the list of supported models is still growing.
Together’s API implements the OpenAI Responses API at POST /v1/responses. You can call it with the responses.create method in the OpenAI Python and TypeScript clients, or with cURL. It’s an alternative to chat completions that returns a structured list of output items and has first-class support for remote MCP servers.

Setup

Point the OpenAI client at Together by setting base_url to https://api.together.ai/v1 and api_key to your Together API key:
import os
import openai

client = openai.OpenAI(
    api_key=os.environ.get("TOGETHER_API_KEY"),
    base_url="https://api.together.ai/v1",
)

Basic usage

Pass a model and an input string. The OpenAI SDK exposes the generated text on response.output_text. With cURL, read it from the last output item at output[-1].content[0].text:
response = client.responses.create(
    model="MiniMaxAI/MiniMax-M2.7",
    input="Does Together AI support the Responses API?",
)

print(response.output_text)
The response object has a status of completed and an output array. Each element is an item such as a message (the assistant’s reply), a function_call, or an MCP item. The SDK’s output_text helper concatenates the text from the message items for you.
All the models that support the Responses API are reasoning models, and they return their reasoning inline in the output_text message. The API doesn’t split reasoning into a separate output item, so parse or trim the text if you only want the final answer.

Streaming

Set stream to receive the response as server-sent events as the model generates it. Text arrives in response.output_text.delta events, and the stream ends with a response.completed event:
with client.responses.stream(
    model="MiniMaxAI/MiniMax-M2.7",
    input="Write a haiku about the ocean.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

Tool calls

Define a function in the tools array. When the model decides to call it, the response includes a function_call output item with the function name, a JSON-encoded arguments string, and a call_id. You execute the function and decide what to do with the result:
response = client.responses.create(
    model="MiniMaxAI/MiniMax-M2.7",
    input="What is the weather in San Francisco?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, for example San Francisco, CA.",
                    }
                },
                "required": ["location"],
            },
        }
    ],
    tool_choice="auto",
)

for item in response.output:
    if item.type == "function_call":
        print(
            item.name, item.arguments
        )  # get_weather {"location":"San Francisco, CA"}
For multi-turn function-calling patterns, see Function calling.

Remote MCP

The Responses API can connect directly to a remote MCP server. Add a tool with type: "mcp", the server’s server_url, and a server_label. Together discovers the server’s tools and calls them on the model’s behalf, so you don’t run a client loop yourself. The example below connects to the public DeepWiki MCP server and asks a question about a GitHub repository:
response = client.responses.create(
    model="MiniMaxAI/MiniMax-M2.7",
    input="What transport protocols does the modelcontextprotocol/python-sdk repo support?",
    tools=[
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        }
    ],
)

print(response.output_text)
The response output includes an mcp_list_tools item showing the tools the server exposed, one or more mcp_call items for each tool the model invoked, and a final message with the answer.
require_approval: "never" lets the model call the server’s tools without pausing for confirmation. Only point at MCP servers you trust, since the model can send them data from your prompt.

Supported models

The Responses API is enabled on a curated set of models, and the list grows over time:
ModelAPI string
MiniMax M2.7MiniMaxAI/MiniMax-M2.7
Kimi K2.7 Codemoonshotai/Kimi-K2.7-Code
Kimi K2.6moonshotai/Kimi-K2.6
GLM-5.1zai-org/GLM-5.1
DeepSeek-V4-Prodeepseek-ai/DeepSeek-V4-Pro
Calling the Responses API with a model that isn’t enabled for it returns a 400 error: The requested model does not support the Responses api. Use chat completions for unsupported models. See Serverless models for our full catalog.

Limitations

The Responses API has partial support. The following OpenAI features don’t work on Together yet:
  • Stored responses. The store parameter is accepted but has no effect, so responses aren’t persisted.
  • Conversation continuation. Passing previous_response_id returns a 400 error, because prior responses aren’t stored.
  • Retrieving or deleting a response. GET /v1/responses/{id} and DELETE /v1/responses/{id} return 404.
  • Native tool types beyond function calling and remote MCP (for example web_search, file_search, image_generation, and code_interpreter) aren’t executed. The request still succeeds, but the tool is silently ignored and the model answers without it.
For the full picture of which OpenAI SDK methods map to Together endpoints, see OpenAI compatibility.

Next steps

OpenAI compatibility

See how every OpenAI SDK method maps to Together endpoints.

Function calling

Build multi-turn tool-calling loops with Together models.

Available models

Browse the full catalog of serverless models.