November, 2025
Enhanced Audio Capabilities: Real-time Text-to-Speech and Speech-to-Text
Together AI expands audio capabilities with real-time streaming for both TTS and STT, new models, and speaker diarization.
- Real-time Text-to-Speech: WebSocket API for lowest-latency interactive applications
- New TTS Models: Orpheus 3B (`canopylabs/orpheus-3b-0.1-ft`) and Kokoro 82M (`hexgrad/Kokoro-82M`) supporting REST, streaming, and WebSocket endpoints
- Real-time Speech-to-Text: WebSocket streaming transcription with Whisper for live audio applications
- Voxtral Model: New Mistral AI speech recognition model (`mistralai/Voxtral-Mini-3B-2507`) for audio transcriptions
- Speaker Diarization: Identify and label different speakers in audio transcriptions with a free `diarize` flag
- TTS WebSocket endpoint: `/v1/audio/speech/websocket`
- STT WebSocket endpoint: `/v1/realtime`
- Check out the Text-to-Speech guide and Speech-to-Text guide
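For the non-streaming path, a TTS request can be sketched as below. This is a minimal sketch assuming the OpenAI-style `/v1/audio/speech` request shape (`model`, `input`, `voice` fields); the voice name is a placeholder — check the model card for valid voices.

```python
import os

def build_tts_request(text: str, model: str = "hexgrad/Kokoro-82M",
                      voice: str = "af_alloy") -> dict:
    """Build a TTS request body. Field names follow the OpenAI-style
    /v1/audio/speech convention (an assumption); the voice name is a
    placeholder, not a confirmed Kokoro voice."""
    return {"model": model, "input": text, "voice": voice}

# Only hit the network when an API key is configured.
if os.environ.get("TOGETHER_API_KEY"):
    import requests  # pip install requests
    resp = requests.post(
        "https://api.together.xyz/v1/audio/speech",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json=build_tts_request("Hello from the changelog!"),
    )
    with open("speech.mp3", "wb") as f:
        f.write(resp.content)  # raw audio bytes
```

The WebSocket endpoint above serves the same models with lower latency for interactive use; the message framing differs and is documented in the Text-to-Speech guide.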
October, 2025
Model Deprecations
The following image models have been deprecated and are no longer available:
- `black-forest-labs/FLUX.1-pro` (calls to FLUX.1-pro now redirect to FLUX.1.1-pro)
- `black-forest-labs/FLUX.1-Canny-pro`
Video Generation API & 40+ New Image and Video Models
Together AI expands into multimedia generation with comprehensive video and image capabilities. Read more
- New Video Generation API: Create high-quality videos with models like OpenAI Sora 2, Google Veo 3.0, and Minimax Hailuo
- 40+ Image & Video Models: Including Google Imagen 4.0 Ultra, Gemini Flash Image 2.5 (Nano Banana), ByteDance SeeDream, and specialized editing tools
- Unified Platform: Combine text, image, and video generation through the same APIs, authentication, and billing
- Production-Ready: Serverless endpoints with transparent per-model pricing and enterprise-grade infrastructure
- Video endpoints: `/videos/create` and `/videos/retrieve`
- Image endpoint: `/images/generations`
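Video generation is asynchronous: create a job, then poll for the result. The sketch below assumes the base URL and field names mirror Together's other OpenAI-compatible endpoints; the model ID, job-id field, and status values are placeholders, not confirmed schema.

```python
import os
import time

BASE = "https://api.together.xyz/v1"  # assumed base URL

def build_video_request(prompt: str, model: str = "openai/sora-2") -> dict:
    """Request body for /videos/create; the model ID is a placeholder."""
    return {"model": model, "prompt": prompt}

# Only hit the network when an API key is configured.
if os.environ.get("TOGETHER_API_KEY"):
    import requests  # pip install requests
    headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}
    job = requests.post(f"{BASE}/videos/create", headers=headers,
                        json=build_video_request("A drone shot of a coastline")).json()
    # Poll until the render finishes; 'id' and 'status' field names are assumed.
    while True:
        status = requests.get(f"{BASE}/videos/retrieve", headers=headers,
                              params={"id": job["id"]}).json()
        if status.get("status") in ("completed", "failed"):
            break
        time.sleep(5)
```

Consult the video generation docs for the exact job schema and per-model parameters (duration, resolution, etc.).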
September, 2025
Improved Batch Inference API: Enhanced UI, Expanded Model Support, and Rate Limit Increase
What's New
- Streamlined UI: Create and track batch jobs in an intuitive interface — no complex API calls required.
- Universal Model Access: The Batch Inference API now supports all serverless models and private deployments, so you can run batch workloads on exactly the models you need.
- Massive Scale Jump: Rate limits are up from 10M to 30B enqueued tokens per model per user, a 3000× increase. Need more? We’ll work with you to customize.
- Lower Cost: For most serverless models, the Batch Inference API runs at 50% the cost of our real-time API, making it the most economical way to process high-throughput workloads.
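Batch jobs are typically submitted as a JSONL file, one request per line. The sketch below assumes the common OpenAI-style batch record shape (`custom_id` / `method` / `url` / `body`) and uses a placeholder model ID; check the Batch Inference docs for the exact schema.

```python
import json

def to_batch_line(custom_id: str, model: str, messages: list) -> str:
    """Serialize one batch request as a JSONL record. The record shape
    is an assumption modeled on the OpenAI-style batch format."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })

# Write a three-request batch file ready for upload.
with open("batch.jsonl", "w") as f:
    for i in range(3):
        f.write(to_batch_line(
            f"req-{i}",
            "meta-llama/Llama-3.2-3B-Instruct-Turbo",  # placeholder model
            [{"role": "user", "content": f"Summarize document {i}"}]) + "\n")
```

Each `custom_id` lets you match results back to inputs when the batch completes out of order.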
Qwen3-Next-80B Models Release
New Qwen3-Next-80B models now available for both thinking and instruction tasks.
- Model ID: `Qwen/Qwen3-Next-80B-A3B-Thinking`
- Model ID: `Qwen/Qwen3-Next-80B-A3B-Instruct`
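Both models are served through the standard chat completions endpoint. A minimal request sketch, assuming Together's OpenAI-compatible `/v1/chat/completions` API:

```python
import os

def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen3-Next-80B-A3B-Instruct") -> dict:
    """Standard chat-completions request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

# Only hit the network when an API key is configured.
if os.environ.get("TOGETHER_API_KEY"):
    import requests  # pip install requests
    r = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json=build_chat_request("Explain MoE routing in two sentences."),
    )
    print(r.json()["choices"][0]["message"]["content"])
```

Swap in the `-Thinking` model ID for reasoning-heavy tasks; its responses include the model's chain of thought.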
Fine-Tuning Platform Upgrades
Enhanced fine-tuning capabilities with expanded model support and increased context lengths. Read more
Enabled fine-tuning for new large models:
- `openai/gpt-oss-120b`
- `deepseek-ai/DeepSeek-V3.1`
- `deepseek-ai/DeepSeek-V3.1-Base`
- `deepseek-ai/DeepSeek-R1-0528`
- `deepseek-ai/DeepSeek-R1`
- `deepseek-ai/DeepSeek-V3-0324`
- `deepseek-ai/DeepSeek-V3`
- `deepseek-ai/DeepSeek-V3-Base`
- `Qwen/Qwen3-Coder-480B-A35B-Instruct`
- `Qwen/Qwen3-235B-A22B` (context length 32,768 for SFT and 16,384 for DPO)
- `Qwen/Qwen3-235B-A22B-Instruct-2507` (context length 32,768 for SFT and 16,384 for DPO)
- `meta-llama/Llama-4-Maverick-17B-128E`
- `meta-llama/Llama-4-Maverick-17B-128E-Instruct`
- `meta-llama/Llama-4-Scout-17B-16E`
- `meta-llama/Llama-4-Scout-17B-16E-Instruct`
Increased maximum supported context length (per model and variant):
DeepSeek Models
- DeepSeek-R1-Distill-Llama-70B: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
- DeepSeek-R1-Distill-Qwen-14B: SFT: 8,192 → 65,536, DPO: 8,192 → 12,288
- DeepSeek-R1-Distill-Qwen-1.5B: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
Gemma Models
- gemma-3-1b-it: SFT: 16,384 → 32,768, DPO: 16,384 → 12,288
- gemma-3-1b-pt: SFT: 16,384 → 32,768, DPO: 16,384 → 12,288
- gemma-3-4b-it: SFT: 16,384 → 131,072, DPO: 16,384 → 12,288
- gemma-3-4b-pt: SFT: 16,384 → 131,072, DPO: 16,384 → 12,288
- gemma-3-12b-pt: SFT: 16,384 → 65,536, DPO: 16,384 → 8,192
- gemma-3-27b-it: SFT: 12,288 → 49,152, DPO: 12,288 → 8,192
- gemma-3-27b-pt: SFT: 12,288 → 49,152, DPO: 12,288 → 8,192
Qwen Models
- Qwen3-0.6B / Qwen3-0.6B-Base: SFT: 8,192 → 32,768, DPO: 8,192 → 24,576
- Qwen3-1.7B / Qwen3-1.7B-Base: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen3-4B / Qwen3-4B-Base: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen3-8B / Qwen3-8B-Base: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen3-14B / Qwen3-14B-Base: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen3-32B: SFT: 8,192 → 24,576, DPO: 8,192 → 4,096
- Qwen2.5-72B-Instruct: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
- Qwen2.5-32B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 12,288
- Qwen2.5-32B: SFT: 8,192 → 49,152, DPO: 8,192 → 12,288
- Qwen2.5-14B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2.5-14B: SFT: 8,192 → 65,536, DPO: 8,192 → 16,384
- Qwen2.5-7B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2.5-7B: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
- Qwen2.5-3B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2.5-3B: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2.5-1.5B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2.5-1.5B: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2-72B-Instruct / Qwen2-72B: SFT: 8,192 → 32,768, DPO: 8,192 → 8,192
- Qwen2-7B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2-7B: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
- Qwen2-1.5B-Instruct: SFT: 8,192 → 32,768, DPO: 8,192 → 16,384
- Qwen2-1.5B: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
Llama Models
- Llama-3.3-70B-Instruct-Reference: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
- Llama-3.2-3B-Instruct: SFT: 8,192 → 131,072, DPO: 8,192 → 24,576
- Llama-3.2-1B-Instruct: SFT: 8,192 → 131,072, DPO: 8,192 → 24,576
- Meta-Llama-3.1-8B-Instruct-Reference: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
- Meta-Llama-3.1-8B-Reference: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
- Meta-Llama-3.1-70B-Instruct-Reference: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
- Meta-Llama-3.1-70B-Reference: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
Mistral Models
- mistralai/Mistral-7B-v0.1: SFT: 8,192 → 32,768, DPO: 8,192 → 32,768
- teknium/OpenHermes-2p5-Mistral-7B: SFT: 8,192 → 32,768, DPO: 8,192 → 32,768
Enhanced Hugging Face integrations:
- Fine-tune any < 100B parameter CausalLM from Hugging Face Hub
- Support for DPO variants such as LN-DPO, DPO+NLL, and SimPO
- Support for fine-tuning at the maximum batch size
- Public `fine-tunes/models/limits` and `fine-tunes/models/supported` endpoints
- Automatic filtering of sequences with no trainable tokens (e.g., if a sequence's prompt is longer than the model's context length, the completion is pushed outside the window)
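The filtering rule in the last bullet can be sketched as a pure function: a training example only contributes loss if at least one completion token lands inside the context window. This is an illustration of the described behavior, not Together's internal implementation.

```python
def has_trainable_tokens(prompt_len: int, completion_len: int,
                         context_len: int) -> bool:
    """A sequence contributes trainable tokens only if at least one
    completion token fits inside the model's context window. If the
    prompt alone fills (or overflows) the window, the completion is
    truncated away entirely and the example is filtered out."""
    return prompt_len < context_len and completion_len > 0

# The prompt fills the whole 8,192-token window: nothing left to train on.
assert has_trainable_tokens(8192, 50, 8192) is False
# Plenty of room for the completion: the example is kept.
assert has_trainable_tokens(4000, 50, 8192) is True
```

Filtering these examples automatically avoids wasted steps where the loss would be computed over zero tokens.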
Together Instant Clusters General Availability
Self-service NVIDIA GPU clusters with API-first provisioning. Read more
- New API endpoints for cluster management:
  - `/v1/gpu_cluster` - Create and manage GPU clusters
  - `/v1/shared_volume` - High-performance shared storage
  - `/v1/regions` - Available data center locations
- Support for NVIDIA Blackwell (HGX B200) and Hopper (H100, H200) GPUs
- Scale from single-node (8 GPUs) to hundreds of interconnected GPUs
- Pre-configured with Kubernetes, Slurm, and networking components
Serverless LoRA and Dedicated Endpoints Support for Evaluations
You can now run evaluations:
- Using Serverless LoRA models, including supported LoRA fine-tuned models
- Using Dedicated Endpoints, including fine-tuned models deployed via dedicated endpoints
Kimi-K2-Instruct-0905 Model Release
Upgraded version of Moonshot AI's 1 trillion parameter MoE model with enhanced performance. Read more
- Model ID: `moonshotai/Kimi-K2-Instruct-0905`
August, 2025
DeepSeek-V3.1 Model Release
Upgraded version of DeepSeek-R1-0528 and DeepSeek-V3-0324. Read more
- Dual Modes: Fast mode for quick responses, thinking mode for complex reasoning
- 671B total parameters with 37B active parameters
- Model ID: `deepseek-ai/DeepSeek-V3.1`
Model Deprecations
The following models have been deprecated and are no longer available:
- `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo`
- `black-forest-labs/FLUX.1-canny`
- `meta-llama/Llama-3-8b-chat-hf`
- `black-forest-labs/FLUX.1-redux`
- `black-forest-labs/FLUX.1-depth`
- `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO`
- `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo`
- `meta-llama-llama-3-3-70b-instruct-lora`
- `Qwen/Qwen2.5-14B`
- `meta-llama/Llama-Vision-Free`
- `Qwen/Qwen2-72B-Instruct`
- `google/gemma-2-27b-it`
- `meta-llama/Meta-Llama-3-8B-Instruct`
- `perplexity-ai/r1-1776`
- `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF`
- `Qwen/Qwen2-VL-72B-Instruct`
GPT-OSS Models Fine-Tuning Support
Fine-tune OpenAI's open-source models to create domain-specific variants. Read more
- Supported models: `gpt-oss-20B` and `gpt-oss-120B`
- Supports 16K-context SFT and 8K-context DPO
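A fine-tune job submission could be sketched as below. The body fields (`model`, `training_file`, `training_method`) are assumptions modeled on common Together fine-tuning parameters; the training-file ID is a placeholder you would get from a prior file upload.

```python
import os

def build_finetune_request(training_file_id: str,
                           model: str = "openai/gpt-oss-20b",
                           method: str = "sft") -> dict:
    """Request body for creating a fine-tune job. Field names are
    assumptions; check the Fine-Tuning API reference for the schema."""
    return {"model": model,
            "training_file": training_file_id,  # placeholder file ID
            "training_method": method}          # "sft" or "dpo"

# Only hit the network when an API key is configured.
if os.environ.get("TOGETHER_API_KEY"):
    import requests  # pip install requests
    r = requests.post(
        "https://api.together.xyz/v1/fine-tunes",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json=build_finetune_request("file-abc123"),
    )
    print(r.json())
```

Remember the context limits above: SFT examples beyond 16K tokens (or DPO pairs beyond 8K) will be truncated or filtered.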
OpenAI GPT-OSS Models Now Available
OpenAI's first open-weight models now accessible through Together AI. Read more
- Model IDs: `openai/gpt-oss-20b`, `openai/gpt-oss-120b`
July, 2025
VirtueGuard Model Release
Enterprise-grade guard model for safety monitoring with 8 ms response time. Read more
- Real-time content filtering and bias detection
- Prompt injection protection
- Model ID: `VirtueAI/VirtueGuard-Text-Lite`
Together Evaluations Framework
Benchmarking platform using LLM-as-a-judge methodology for model performance assessment. Read more
- Create custom LLM-as-a-Judge evaluation suites for your domain
- Supports `compare`, `classify`, and `score` functionality
- Enables comparing models, prompts, and LLM configs, and scoring and classifying LLM outputs
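The `compare` mode pits two outputs against each other before a judge model. A minimal sketch of such a judge prompt is below; the rubric wording is illustrative, not the template Together Evaluations actually uses.

```python
def build_judge_prompt(task: str, answer_a: str, answer_b: str) -> str:
    """Minimal LLM-as-a-judge 'compare' prompt. The rubric text is an
    illustrative assumption, not the platform's internal template."""
    return (
        "You are an impartial judge. Given the task and two candidate "
        "answers, reply with exactly 'A' or 'B' for the better answer.\n\n"
        f"Task: {task}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
    )

prompt = build_judge_prompt(
    "Summarize the changelog entry in one sentence.",
    "Together released an evaluations framework.",
    "Stuff happened.",
)
```

In practice the judge's single-token verdict is parsed and aggregated across a dataset; `classify` and `score` follow the same pattern with a label set or numeric rubric instead of an A/B choice.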
Qwen3-Coder-480B Model Release
Agentic coding model with top SWE-Bench Verified performance. Read more
- 480B total parameters with 35B active (MoE architecture)
- 256K context length for handling entire codebases
- Leading scores on the SWE-Bench software engineering benchmark
- Model ID: `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8`
NVIDIA HGX B200 Hardware Support
Record-breaking serverless inference speed for DeepSeek-R1-0528 using NVIDIA's Blackwell architecture. Read more
- Dramatically improved throughput and lower latency
- Same API endpoints and pricing
- Model ID: `deepseek-ai/DeepSeek-R1`
Kimi-K2-Instruct Model Launch
Moonshot AI's 1 trillion parameter MoE model with frontier-level performance. Read more
- Excels at tool use and multi-step tasks, with strong multilingual support
- Great agentic and function calling capabilities
- Model ID: `moonshotai/Kimi-K2-Instruct`
Whisper Speech-to-Text APIs
High-performance audio transcription that's 15× faster than OpenAI with support for files over 1 GB. Read more
- Multiple audio formats with timestamp generation
- Speaker diarization and language detection
- Use the `/audio/transcriptions` and `/audio/translations` endpoints
- Model ID: `openai/whisper-large-v3`