❓ Frequently Asked Questions
Deploy Dedicated Endpoints in the Web
Deploy your own GPUs
With Together AI, you can create on-demand dedicated endpoints with the following advantages:
- Consistent, predictable performance, unaffected by other users' load in our serverless environment
- No rate limits, with a high maximum load capacity
- More cost-effective under high utilization
- Access to a broader selection of models
Creating an on-demand dedicated endpoint
1. Navigate to the Models page in our playground. Under "All models," click "Dedicated" to search across 179 available models.
2. Select your hardware. Multiple hardware options are available at varying prices (e.g. RTX-6000, L40, A100 SXM, A100 PCIe, and H100).
3. Click the Play button, and wait up to 10 minutes for the endpoint to be deployed. We will provide the string you can use to call the model, as well as additional information about your deployment.
4. You can navigate away while your model is being deployed. Click "Open" when it's ready.
5. Start using your endpoint! You can now find it on the My Models page; click the model and look under "Endpoints."
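Once deployed, the endpoint is reachable through Together's OpenAI-compatible chat completions API using the model string from your deployment page. The sketch below uses only the Python standard library; the model string and environment-variable name in the commented example are placeholders, not values from this page — substitute the string shown for your own deployment:

```python
import json
import os  # used in the commented example below
import urllib.request

# OpenAI-compatible base URL for the Together API
TOGETHER_BASE_URL = "https://api.together.xyz/v1"


def build_chat_request(model: str, api_key: str, prompt: str):
    """Assemble the URL, headers, and JSON body for a chat completions call."""
    url = f"{TOGETHER_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # the model string shown on your deployment page
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body


def send_chat_request(model: str, api_key: str, prompt: str) -> dict:
    """Send the request and return the parsed JSON response (requires network)."""
    url, headers, body = build_chat_request(model, api_key, prompt)
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example usage (hypothetical model string; replace with your own):
# reply = send_chat_request(
#     "your-org/your-dedicated-model",
#     os.environ["TOGETHER_API_KEY"],
#     "Hello!",
# )
# print(reply["choices"][0]["message"]["content"])
```

Because the dedicated endpoint speaks the same protocol as the serverless API, existing OpenAI-compatible clients can also be pointed at it by swapping in the base URL and your model string.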
Looking for custom configurations?
Contact us.