Step 1: Create an API key
- Register for an account if you don’t have one.
- Go to your project’s API keys page.
- Select Create key, give it a name, and copy the value. New keys are only shown once, so make sure to save it somewhere safe.
- Export the key as an environment variable in your terminal:
TOGETHER_API_KEY automatically when you call Together(). Pass api_key= to the constructor to override it.
Step 2: Install the SDK
Together AI publishes official SDKs for Python and TypeScript. You can also use the OpenAI SDK pointed at our base URL, or call the REST API directly from any language.Step 3: Run your first query
The example below sends a chat completion request to MiniMax M3 and prints the response:Going further
Try some of these variations to see what else the model can do:Stream the response
Streaming returns the response token by token as it’s generated, instead of making you wait for the full reply. This is especially helpful with a reasoning model like MiniMax M3, which works through a problem before answering and can produce a lot of output. A reasoning model’s response has two parts: the step-by-step thinking, in areasoning field, and the final answer, in content.
Set stream=True (Python) or stream: true (TypeScript/cURL) and read both fields off each chunk’s delta:
reasoning stays empty and only content is returned, so the same loop works unchanged.
Add a system prompt
Prepend asystem message to set the model’s tone, role, or constraints:
Get structured JSON output
Pass a JSON schema viaresponse_format to get parseable JSON back:
Analyze an image
MiniMax M3 also accepts images. Add animage_url block to the user message to ask questions about a picture:
Use the OpenAI SDK
If you’re already using the OpenAI SDK, you can point it at Together’s base URL (https://api.together.ai/v1) and keep the rest of your code the same:
Next steps
Choose a model
Browse the catalog of models for chat, coding, vision, and reasoning.
Dedicated endpoints
Reserve GPUs for steady traffic or fine-tuned models.
Fine-tune a model
Train a model on your own data with LoRA, DPO, or full fine-tuning.
GPU clusters
Run large-scale training and custom workloads on dedicated GPU clusters.