Beyond a single hosted image URL, vision models accept local files (base64-encoded), video URLs, and multiple images in one prompt. For the basic URL example and supported models, see the Vision overview.
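For local files, one common approach is to base64-encode the image bytes and embed them in a data URL placed in the image_url block. The sketch below assumes the API accepts data URLs of the form data:<mime>;base64,<payload> in image_url.url. The helper names image_to_data_url and build_local_image_message are hypothetical and are not part of the Together SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    # Guess the MIME type from the file extension (e.g. .png -> image/png).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_local_image_message(path: str, prompt: str) -> dict:
    """Build a user message pairing a text prompt with a local image.

    Assumption: the data URL goes where a hosted image URL would normally go.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned dict can be passed as an element of messages to client.chat.completions.create, in the same position as a message that uses a hosted image URL.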
Video understanding, meaning passing a video_url content block to a chat completion, is supported on select VLMs that are available only as dedicated endpoints, for example Qwen/Qwen3-VL-8B-Instruct. Spin up a dedicated endpoint, then pass the endpoint name as model and include a video_url block alongside the text block:
Python
from together import Together

client = Together()

response = client.chat.completions.create(
    model="<ACCOUNT>/Qwen/Qwen3-VL-8B-Instruct-<ENDPOINT_HASH>",  # your dedicated endpoint name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this video?"},
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
For text-to-video and image-to-video generation (separate from video understanding), see Video generation.
To send multiple images in one prompt, add one image_url block per image to the content list:
Python
from together import Together

client = Together()

# Multi-modal message with multiple images
response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
Sample model output
The first image is a collage of multiple identical landscape photos showing a natural scene with rocks, trees, and a stream under a blue sky. The second image is a screenshot of a mobile app interface, specifically the navigation menu of the Canva app, which includes icons for Home, DMs (Direct Messages), Activity, Later, Canvases, and More.

#### Comparison:

1. **Content**:
   - The first image focuses on a natural landscape.
   - The second image shows a digital interface from an app.
2. **Purpose**:
   - The first image could be used for showcasing nature, design elements in graphic work, or as a background.
   - The second image represents the functionality and layout of the Canva app's navigation system.
3. **Visual Style**:
   - The first image has vibrant colors and realistic textures typical of outdoor photography.
   - The second image uses flat design icons with a simple color palette suited for user interface design.
4. **Context**:
   - The first image is likely intended for artistic or environmental contexts.
   - The second image is relevant to digital design and app usability discussions.
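When the number of images varies, the content list can be assembled programmatically instead of written out by hand. This is a minimal sketch, assuming every image is addressed by a URL. build_multi_image_content is a hypothetical helper, not an SDK function.

```python
def build_multi_image_content(prompt: str, image_urls: list[str]) -> list[dict]:
    """Return a content list: one text block followed by one image_url block per URL."""
    content: list[dict] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content


# Example: the same two images used in the request above.
content = build_multi_image_content(
    "Compare these two images.",
    [
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png",
    ],
)
```

The resulting list can be used as the content value of a user message, which keeps the request body identical to the hand-written version.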