Local images
To query a vision model with a local image:Output
Video input
Video understanding (passing avideo_url content block to a chat completion) is supported on select VLMs that run only as a dedicated endpoint, for example Qwen/Qwen3-VL-8B-Instruct. Spin up a dedicated endpoint, then pass the endpoint name as model and a video_url block alongside text:
Python
Multiple images
Sample model output
Sample model output