September, 2025
Improved Batch Inference API: Enhanced UI, Expanded Model Support, and Rate Limit Increase
What’s New
- Streamlined UI: Create and track batch jobs in an intuitive interface — no complex API calls required.
- Universal Model Access: The Batch Inference API now supports all serverless models and private deployments, so you can run batch workloads on exactly the models you need.
- Massive Scale Jump: Rate limits are up from 10M to 30B enqueued tokens per model per user, a 3000× increase. Need more? We’ll work with you to customize.
- Lower Cost: For most serverless models, the Batch Inference API runs at 50% the cost of our real-time API, making it the most economical way to process high-throughput workloads.
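For teams moving from the UI to automation, here is a minimal sketch of enqueuing a batch job over the REST API with Python’s requests library. It assumes an upload-then-enqueue flow (a files endpoint followed by a batches endpoint); the exact paths, the purpose value, and the field names are assumptions to verify against the current Batch Inference docs.

```python
import os
import requests

BASE = "https://api.together.xyz/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# 1. Upload a JSONL file with one request object per line.
with open("batch_input.jsonl", "rb") as f:
    upload = requests.post(
        f"{BASE}/files",
        headers=HEADERS,
        files={"file": f},
        data={"purpose": "batch-api"},  # purpose value is an assumption
    ).json()

# 2. Enqueue the batch against the uploaded file.
batch = requests.post(
    f"{BASE}/batches",
    headers=HEADERS,
    json={
        "input_file_id": upload["id"],
        "endpoint": "/v1/chat/completions",
    },
).json()

# 3. Poll the job until it completes, then fetch the output file.
status = requests.get(f"{BASE}/batches/{batch['id']}", headers=HEADERS).json()
print(status["status"])
```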
Qwen3-Next-80B Models Release
New Qwen3-Next-80B models now available for both thinking and instruction tasks.
- Model ID: `Qwen/Qwen3-Next-80B-A3B-Thinking`
- Model ID: `Qwen/Qwen3-Next-80B-A3B-Instruct`
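A minimal chat completion against the Instruct variant using the official together Python SDK; the Thinking variant is called the same way with its own model ID, and the prompt here is purely illustrative:

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(response.choices[0].message.content)
```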
Together Instant Clusters General Availability
Self-service NVIDIA GPU clusters with API-first provisioning. Read more
- New API endpoints for cluster management:
  - `/v1/gpu_cluster` - Create and manage GPU clusters
  - `/v1/shared_volume` - High-performance shared storage
  - `/v1/regions` - Available data center locations
- Support for NVIDIA Blackwell (HGX B200) and Hopper (H100, H200) GPUs
- Scale from single-node (8 GPUs) to hundreds of interconnected GPUs
- Pre-configured with Kubernetes, Slurm, and networking components
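A sketch of API-first provisioning against the endpoints listed above. The endpoint paths come from this announcement; the request payload fields are illustrative assumptions, not the documented schema:

```python
import os
import requests

BASE = "https://api.together.xyz/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# List available data center locations.
regions = requests.get(f"{BASE}/regions", headers=HEADERS).json()
print(regions)

# Provision a cluster; these field names are hypothetical, check the docs.
cluster = requests.post(
    f"{BASE}/gpu_cluster",
    headers=HEADERS,
    json={
        "cluster_name": "my-training-cluster",  # hypothetical field
        "gpu_type": "h100",                     # hypothetical field
        "num_gpus": 8,                          # single node to start
    },
).json()
print(cluster)
```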
Serverless LoRA and Dedicated Endpoints support for Evaluations
You can now run evaluations:
- Using Serverless LoRA models, including supported LoRA fine-tuned models
- Using Dedicated Endpoints, including fine-tuned models deployed via dedicated endpoints
Kimi-K2-Instruct-0905 Model Release
Upgraded version of Moonshot AI’s 1 trillion parameter MoE model with enhanced performance. Read more
- Model ID: `moonshotai/Kimi-K2-Instruct-0905`
August, 2025
DeepSeek-V3.1 Model Release
Upgraded version of DeepSeek-R1-0528 and DeepSeek-V3-0324. Read more
- Dual Modes: Fast mode for quick responses, thinking mode for complex reasoning
- 671B total parameters with 37B active parameters
- Model ID: `deepseek-ai/DeepSeek-V3.1`
Model Deprecations
The following models have been deprecated and are no longer available:
- `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo`
- `black-forest-labs/FLUX.1-canny`
- `meta-llama/Llama-3-8b-chat-hf`
- `black-forest-labs/FLUX.1-redux`
- `black-forest-labs/FLUX.1-depth`
- `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO`
- `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo`
- `meta-llama-llama-3-3-70b-instruct-lora`
- `Qwen/Qwen2.5-14B`
- `meta-llama/Llama-Vision-Free`
- `Qwen/Qwen2-72B-Instruct`
- `google/gemma-2-27b-it`
- `meta-llama/Meta-Llama-3-8B-Instruct`
- `perplexity-ai/r1-1776`
- `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF`
- `Qwen/Qwen2-VL-72B-Instruct`
GPT-OSS Models Fine-Tuning Support
Fine-tune OpenAI’s open-source models to create domain-specific variants. Read more
- Supported models: `gpt-oss-20B` and `gpt-oss-120B`
- Supports 16K context SFT and 8K context DPO
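A sketch of launching a supervised fine-tuning job with the together Python SDK. The method and parameter names follow the SDK as I understand it and should be treated as assumptions to check against the fine-tuning docs; the dataset path and hyperparameters are illustrative.

```python
from together import Together

client = Together()

# Upload SFT data (JSONL of chat-formatted examples); method name per the
# Together Python SDK, verify against current docs.
train_file = client.files.upload(file="sft_data.jsonl")

# Launch a fine-tuning job on gpt-oss-20b (16K context SFT per the notes above).
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="openai/gpt-oss-20b",
    n_epochs=3,          # hypothetical hyperparameter choice
    suffix="my-domain",  # name tag for the resulting model
)
print(job.id)
```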
OpenAI GPT-OSS Models Now Available
OpenAI’s first open-weight models now accessible through Together AI. Read more
- Model IDs: `openai/gpt-oss-20b`, `openai/gpt-oss-120b`
July, 2025
VirtueGuard Model Release
Enterprise-grade guard model for safety monitoring with 8ms response time. Read more
- Real-time content filtering and bias detection
- Prompt injection protection
- Model ID: `VirtueAI/VirtueGuard-Text-Lite`
Together Evaluations Framework
Benchmarking platform using LLM-as-a-judge methodology for model performance assessment. Read more
- Create custom LLM-as-a-Judge evaluation suites for your domain
- Supports `compare`, `classify`, and `score` functionality
- Enables comparing models, prompts, and LLM configs, and scoring and classifying LLM outputs
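A hypothetical request sketch for a classify-style evaluation over REST. The endpoint path, field names, and judge-model choice below are all assumptions for illustration, not the documented contract; see the Evaluations docs for the real schema.

```python
import os
import requests

BASE = "https://api.together.xyz/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# Hypothetical request shape: path and fields are assumptions.
evaluation = requests.post(
    f"{BASE}/evaluations",
    headers=HEADERS,
    json={
        "type": "classify",                           # one of compare / classify / score
        "judge_model": "deepseek-ai/DeepSeek-V3.1",   # hypothetical judge choice
        "input_data_file_path": "eval_data.jsonl",    # hypothetical field
    },
).json()
print(evaluation)
```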
Qwen3-Coder-480B Model Release
Agentic coding model with top SWE-Bench Verified performance. Read more
- 480B total parameters with 35B active (MoE architecture)
- 256K context length for handling entire codebases
- Leading scores on SWE-Bench software engineering benchmarks
- Model ID: `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8`
NVIDIA HGX B200 Hardware Support
Record-breaking serverless inference speed for DeepSeek-R1-0528 using NVIDIA’s Blackwell architecture. Read more
- Dramatically improved throughput and lower latency
- Same API endpoints and pricing
- Model ID: `deepseek-ai/DeepSeek-R1`
Kimi-K2-Instruct Model Launch
Moonshot AI’s 1 trillion parameter MoE model with frontier-level performance. Read more
- Excels at tool use and multi-step tasks, with strong multilingual support
- Great agentic and function calling capabilities
- Model ID: `moonshotai/Kimi-K2-Instruct`
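To exercise the function calling capabilities, a sketch using the OpenAI-compatible tools format through the together SDK; the weather tool here is hypothetical and exists only for illustration:

```python
from together import Together

client = Together()

# A single tool declared in the OpenAI-compatible function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
# The model should answer with a structured tool call rather than free text.
print(response.choices[0].message.tool_calls)
```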
Whisper Speech-to-Text APIs
High-performance audio transcription that’s 15× faster than OpenAI with support for files over 1 GB. Read more
- Multiple audio formats with timestamp generation
- Speaker diarization and language detection
- Use the `/audio/transcriptions` and `/audio/translations` endpoints
- Model ID: `openai/whisper-large-v3`
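A minimal transcription request against the `/audio/transcriptions` endpoint named above, sent as a multipart upload with requests. The request shape assumes an OpenAI-style transcription API, and the audio filename is illustrative; check the docs for supported formats and optional parameters such as timestamps or diarization.

```python
import os
import requests

BASE = "https://api.together.xyz/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# Multipart upload: the audio file plus the model name as a form field.
with open("meeting.mp3", "rb") as f:
    result = requests.post(
        f"{BASE}/audio/transcriptions",
        headers=HEADERS,
        files={"file": f},
        data={"model": "openai/whisper-large-v3"},
    ).json()

print(result["text"])
```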