Skip to main content

November, 2025

Nov 3
/audio/speech, /audio/transcriptions
Enhanced Audio Capabilities: Real-time Text-to-Speech and Speech-to-TextTogether AI expands audio capabilities with real-time streaming for both TTS and STT, new models, and speaker diarization.
  • Real-time Text-to-Speech: WebSocket API for lowest-latency interactive applications
  • New TTS Models: Orpheus 3B (canopylabs/orpheus-3b-0.1-ft) and Kokoro 82M (hexgrad/Kokoro-82M) supporting REST, streaming, and WebSocket endpoints
  • Real-time Speech-to-Text: WebSocket streaming transcription with Whisper for live audio applications
  • Voxtral Model: New Mistral AI speech recognition model (mistralai/Voxtral-Mini-3B-2507) for audio transcriptions
  • Speaker Diarization: Identify and label different speakers in audio transcriptions with a free diarize flag
  • TTS WebSocket endpoint: /v1/audio/speech/websocket
  • STT WebSocket endpoint: /v1/realtime
  • Check out the Text-to-Speech guide and Speech-to-Text guide

October, 2025

Oct 31
/images
Model DeprecationsThe following image models have been deprecated and are no longer available:
  • black-forest-labs/FLUX.1-pro (Calls to FLUX.1-pro will now redirect to FLUX.1.1-pro)
  • black-forest-labs/FLUX.1-Canny-pro
Oct 21
/videos, /images
Video Generation API & 40+ New Image and Video ModelsTogether AI expands into multimedia generation with comprehensive video and image capabilities. Read more
  • New Video Generation API: Create high-quality videos with models like OpenAI Sora 2, Google Veo 3.0, and Minimax Hailuo
  • 40+ Image & Video Models: Including Google Imagen 4.0 Ultra, Gemini Flash Image 2.5 (Nano Banana), ByteDance SeeDream, and specialized editing tools
  • Unified Platform: Combine text, image, and video generation through the same APIs, authentication, and billing
  • Production-Ready: Serverless endpoints with transparent per-model pricing and enterprise-grade infrastructure
  • Video endpoints: /videos/create and /videos/retrieve
  • Image endpoint: /images/generations

September, 2025

Sep 15
/batch_api
Improved Batch Inference API: Enhanced UI, Expanded Model Support, and Rate Limit IncreaseWhat’s New
  • Streamlined UI: Create and track batch jobs in an intuitive interface — no complex API calls required.
  • Universal Model Access: The Batch Inference API now supports all serverless models and private deployments, so you can run batch workloads on exactly the models you need.
  • Massive Scale Jump: Rate limits are up from 10M to 30B enqueued tokens per model per user, a 3000× increase. Need more? We’ll work with you to customize.
  • Lower Cost: For most serverless models, the Batch Inference API runs at 50% the cost of our real-time API, making it the most economical way to process high-throughput workloads.
Sep 13
/chat/completions
Qwen3-Next-80B Models ReleaseNew Qwen3-Next-80B models now available for both thinking and instruction tasks.
  • Model ID: Qwen/Qwen3-Next-80B-A3B-Thinking
  • Model ID: Qwen/Qwen3-Next-80B-A3B-Instruct
Sep 10
/fine-tunes
Fine-Tuning Platform UpgradesEnhanced fine-tuning capabilities with expanded model support and increased context lengths. Read moreEnable fine-tuning for new large models:
  • openai/gpt-oss-120b
  • deepseek-ai/DeepSeek-V3.1
  • deepseek-ai/DeepSeek-V3.1-Base
  • deepseek-ai/DeepSeek-R1-0528
  • deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-V3-0324
  • deepseek-ai/DeepSeek-V3
  • deepseek-ai/DeepSeek-V3-Base
  • Qwen/Qwen3-Coder-480B-A35B-Instruct
  • Qwen/Qwen3-235B-A22B (context length 32,768 for SFT and 16,384 for DPO)
  • Qwen/Qwen3-235B-A22B-Instruct-2507 (context length 32,768 for SFT and 16,384 for DPO)
  • meta-llama/Llama-4-Maverick-17B-128E
  • meta-llama/Llama-4-Maverick-17B-128E-Instruct
  • meta-llama/Llama-4-Scout-17B-16E
  • meta-llama/Llama-4-Scout-17B-16E-Instruct

Increased maximum supported context length (per model and variant):DeepSeek Models
  • DeepSeek-R1-Distill-Llama-70B: SFT: 8192 → 24,576, DPO: 8192 → 8192
  • DeepSeek-R1-Distill-Qwen-14B: SFT: 8192 → 65,536, DPO: 8192 → 12,288
  • DeepSeek-R1-Distill-Qwen-1.5B: SFT: 8192 → 131,072, DPO: 8192 → 16,384
Google Gemma Models
  • gemma-3-1b-it: SFT: 16,384 → 32,768, DPO: 16,384 → 12,288
  • gemma-3-1b-pt: SFT: 16,384 → 32,768, DPO: 16,384 → 12,288
  • gemma-3-4b-it: SFT: 16,384 → 131,072, DPO: 16,384 → 12,288
  • gemma-3-4b-pt: SFT: 16,384 → 131,072, DPO: 16,384 → 12,288
  • gemma-3-12b-pt: SFT: 16,384 → 65,536, DPO: 16,384 → 8,192
  • gemma-3-27b-it: SFT: 12,288 → 49,152, DPO: 12,288 → 8,192
  • gemma-3-27b-pt: SFT: 12,288 → 49,152, DPO: 12,288 → 8,192
Qwen Models
  • Qwen3-0.6B / Qwen3-0.6B-Base: SFT: 8192 → 32,768, DPO: 8192 → 24,576
  • Qwen3-1.7B / Qwen3-1.7B-Base: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen3-4B / Qwen3-4B-Base: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen3-8B / Qwen3-8B-Base: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen3-14B / Qwen3-14B-Base: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen3-32B: SFT: 8192 → 24,576, DPO: 8192 → 4096
  • Qwen2.5-72B-Instruct: SFT: 8192 → 24,576, DPO: 8192 → 8192
  • Qwen2.5-32B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 12,288
  • Qwen2.5-32B: SFT: 8192 → 49,152, DPO: 8192 → 12,288
  • Qwen2.5-14B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2.5-14B: SFT: 8192 → 65,536, DPO: 8192 → 16,384
  • Qwen2.5-7B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2.5-7B: SFT: 8192 → 131,072, DPO: 8192 → 16,384
  • Qwen2.5-3B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2.5-3B: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2.5-1.5B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2.5-1.5B: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2-72B-Instruct / Qwen2-72B: SFT: 8192 → 32,768, DPO: 8192 → 8192
  • Qwen2-7B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2-7B: SFT: 8192 → 131,072, DPO: 8192 → 16,384
  • Qwen2-1.5B-Instruct: SFT: 8192 → 32,768, DPO: 8192 → 16,384
  • Qwen2-1.5B: SFT: 8192 → 131,072, DPO: 8192 → 16,384
Meta Llama Models
  • Llama-3.3-70B-Instruct-Reference: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
  • Llama-3.2-3B-Instruct: SFT: 8,192 → 131,072, DPO: 8,192 → 24,576
  • Llama-3.2-1B-Instruct: SFT: 8,192 → 131,072, DPO: 8,192 → 24,576
  • Meta-Llama-3.1-8B-Instruct-Reference: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
  • Meta-Llama-3.1-8B-Reference: SFT: 8,192 → 131,072, DPO: 8,192 → 16,384
  • Meta-Llama-3.1-70B-Instruct-Reference: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
  • Meta-Llama-3.1-70B-Reference: SFT: 8,192 → 24,576, DPO: 8,192 → 8,192
Mistral Models
  • mistralai/Mistral-7B-v0.1: SFT: 8,192 → 32,768, DPO: 8,192 → 32,768
  • teknium/OpenHermes-2p5-Mistral-7B: SFT: 8,192 → 32,768, DPO: 8,192 → 32,768

Enhanced Hugging Face integrations:
  • Fine-tune any < 100B parameter CausalLM from Hugging Face Hub
  • Support for DPO variants such as LN-DPO, DPO+NLL, and SimPO
  • Support fine-tuning with maximum batch size
  • Public fine-tunes/models/limits and fine-tunes/models/supported endpoints
  • Automatic filtering of sequences with no trainable tokens (e.g., if a sequence prompt is longer than the model’s context length, the completion is pushed outside the window)
Sep 9
/gpu_cluster
Together Instant Clusters General AvailabilitySelf-service NVIDIA GPU clusters with API-first provisioning. Read more
  • New API endpoints for cluster management:
    • /v1/gpu_cluster - Create and manage GPU clusters
    • /v1/shared_volume - High-performance shared storage
    • /v1/regions - Available data center locations
  • Support for NVIDIA Blackwell (HGX B200) and Hopper (H100, H200) GPUs
  • Scale from single-node (8 GPUs) to hundreds of interconnected GPUs
  • Pre-configured with Kubernetes, Slurm, and networking components
Sep 8
/evaluation
Serverless LoRA and Dedicated Endpoints support for EvaluationsYou can now run evaluations:
Sep 5
/chat/completions
Kimi-K2-Instruct-0905 Model ReleaseUpgraded version of Moonshot’s 1 trillion parameter MoE model with enhanced performance. Read more
  • Model ID: moonshot-ai/Kimi-K2-Instruct-0905

August, 2025

Aug 27
/chat/completions
DeepSeek-V3.1 Model ReleaseUpgraded version of DeepSeek-R1-0528 and DeepSeek-V3-0324. Read more
  • Dual Modes: Fast mode for quick responses, thinking mode for complex reasoning
  • 671B total parameters with 37B active parameters
  • Model ID: deepseek-ai/DeepSeek-V3.1

Model DeprecationsThe following models have been deprecated and are no longer available:
  • meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
  • black-forest-labs/FLUX.1-canny
  • meta-llama/Llama-3-8b-chat-hf
  • black-forest-labs/FLUX.1-redux
  • black-forest-labs/FLUX.1-depth
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
  • meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
  • meta-llama-llama-3-3-70b-instruct-lora
  • Qwen/Qwen2.5-14B
  • meta-llama/Llama-Vision-Free
  • Qwen/Qwen2-72B-Instruct
  • google/gemma-2-27b-it
  • meta-llama/Meta-Llama-3-8B-Instruct
  • perplexity-ai/r1-1776
  • nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
  • Qwen/Qwen2-VL-72B-Instruct
Aug 19
/fine-tunes
GPT-OSS Models Fine-Tuning SupportFine-tune OpenAI’s open-source models to create domain-specific variants. Read more
  • Supported models: gpt-oss-20B and gpt-oss-120B
  • Supports 16K context SFT, 8k context DPO
Aug 5
/chat/completions
OpenAI GPT-OSS Models Now AvailableOpenAI’s first open-weight models now accessible through Together AI. Read more
  • Model IDs: openai/gpt-oss-20b, openai/gpt-oss-120b

July, 2025

Jul 29
/chat/completions
VirtueGuard Model ReleaseEnterprise-grade gaurd model for safety monitoring with 8ms response time. Read more
  • Real-time content filtering and bias detection
  • Prompt injection protection
  • Model ID: VirtueAI/VirtueGuard-Text-Lite
Jul 28
/evaluation
Together Evaluations FrameworkBenchmarking platform using LLM-as-a-judge methodology for model performance assessment. Read more
  • Create custom LLM-as-a-Judge evaluation suites for your domain
  • Support compare, classify and score functionality
  • Enables comparing models, prompts and LLM configs, scoring and classifying LLM outputs
Jul 25
/chat/completions
Qwen3-Coder-480B Model ReleaseAgentic coding model with top SWE-Bench Verified performance. Read more
  • 480B total parameters with 35B active (MoE architecture)
  • 256K context length for entire codebase handling
  • Leading SWE-Bench scores on software engineering benchmarks
  • Model ID: Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Jul 17
/chat/completions
NVIDIA HGX B200 Hardware SupportRecord-breaking serverless inference speed for DeepSeek-R1-0528 using NVIDIA’s Blackwell architecture. Read more
  • Dramatically improved throughput and lower latency
  • Same API endpoints and pricing
  • Model ID: deepseek-ai/DeepSeek-R1
Jul 14
/chat/completions
Kimi-K2-Instruct Model LaunchMoonshot AI’s 1 trillion parameter MoE model with frontier-level performance. Read more
  • Excels at tool use, and multi-step tasks and strong multilingual support
  • Great agentic and function calling capabilities
  • Model ID: moonshotai/Kimi-K2-Instruct
Jul 10
/audio/transcriptions
Whisper Speech-to-Text APIsHigh-performance audio transcription that’s 15× faster than OpenAI with support for files over 1 GB. Read more
  • Multiple audio formats with timestamp generation
  • Speaker diarization and language detection
  • Use /audio/transcriptions and /audio/translations endpoint
  • Model ID: openai/whisper-large-v3
Jul 8
Compliance
SOC 2 Type II Compliance CertificationAchieved enterprise-grade security compliance through independent audit of security controls. Read more
  • Simplified vendor approval and procurement
  • Reduced due diligence requirements
  • Support for regulated industries