Vision Models

See which open-source vision models we currently host, or learn how to configure and host your own.

Our Vision API has built-in support for popular vision models we host via our serverless endpoints, as well as any model that you configure and host yourself using our dedicated GPU infrastructure.

When using one of our serverless models, you'll be charged based on the number of tokens you use in your queries. For dedicated models that you configure and run yourself, you'll be charged per minute for as long as your endpoint is running. You can start or stop your endpoint at any time using our online playground.

To learn more about the pricing for our serverless endpoints, check out our pricing page.

Hosted models

If you're not sure which vision model to use, we currently recommend Llama 3.2 11B Vision Instruct Turbo (meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo) to get started. For model-specific rate limits, navigate here.

| Organization | Model Name | API Model String | Context Length (tokens) |
| --- | --- | --- | --- |
| Meta | (Free) Llama 3.2 11B Vision Instruct Turbo* | meta-llama/Llama-Vision-Free | 131072 |
| Meta | Llama 3.2 11B Vision Instruct Turbo | meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo | 131072 |
| Meta | Llama 3.2 90B Vision Instruct Turbo | meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo | 131072 |

*The free model has reduced rate limits compared to the paid version, meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo.
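
Below is a minimal sketch of querying one of the hosted models above, assuming an OpenAI-compatible chat completions endpoint. The base URL, API-key environment variable, and image URL are placeholders, not values from this page; only the model string comes from the table.

```python
# Minimal sketch: querying a hosted vision model through an
# OpenAI-compatible chat completions API. The base_url, the API-key
# environment variable, and the image URL are placeholders.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder: your provider's base URL
    api_key=os.environ["VISION_API_KEY"],   # placeholder: your API key env var
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},  # placeholder image
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same call should work with any model string from the table; for example, swapping in meta-llama/Llama-Vision-Free targets the free tier with its reduced rate limits.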

Request a model

Don't see a model you want to use?

Send us a Model Request here →