JSON
Vision
Vision-language function calling
Combine image understanding with tool use on Together AI vision-language models.
Vision language models (VLMs) can also use function calling, letting you combine image understanding with tool use. This enables use cases like extracting structured data from images, identifying objects and taking actions, or analyzing visual content to trigger specific functions.
The model analyzes the image to identify the company, then returns a function call with the appropriate stock symbol: