Introduction
Standard large language models respond to user queries by generating plain text. This works well for many applications, like chatbots, but plain text is hard to work with if you want to programmatically access details in the response. Some models can instead respond with structured JSON, making it easy to use data from the LLM’s output directly in your application code. If you’re using a supported model, you can enable structured responses by providing your desired schema details to the response_format key of the Chat Completions API.
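For example, a response_format payload for JSON mode looks roughly like the sketch below. The exact field names can vary between providers and API versions, so treat this shape as an assumption and check the API reference for the precise format:

```python
# Sketch of a response_format payload for JSON mode. The field names
# ("type", "schema") are assumptions; check the API reference for the
# exact shape your endpoint expects.
response_format = {
    "type": "json_schema",
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
        },
        "required": ["title"],
    },
}
```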
Supported models
The following models support JSON mode:
openai/gpt-oss-120b
openai/gpt-oss-20b
moonshotai/Kimi-K2-Instruct
zai-org/GLM-4.5-Air-FP8
Qwen/Qwen3-235B-A22B-Thinking-2507
Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Qwen/Qwen3-235B-A22B-Instruct-2507-tput
deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-R1-0528-tput
deepseek-ai/DeepSeek-V3
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
Qwen/Qwen2.5-72B-Instruct-Turbo
Qwen/Qwen2.5-VL-72B-Instruct
meta-llama/Llama-4-Scout-17B-16E-Instruct
meta-llama/Llama-3.3-70B-Instruct-Turbo
deepcogito/cogito-v2-preview-llama-70B
deepcogito/cogito-v2-preview-llama-109B-MoE
deepcogito/cogito-v2-preview-llama-405B
deepcogito/cogito-v2-preview-deepseek-671b
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
marin-community/marin-8b-instruct
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
meta-llama/Llama-3.3-70B-Instruct-Turbo-Free
Qwen/Qwen2.5-7B-Instruct-Turbo
Qwen/Qwen2.5-Coder-32B-Instruct
Qwen/QwQ-32B
Qwen/Qwen3-235B-A22B-fp8-tput
arcee-ai/coder-large
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
meta-llama/Llama-3.2-3B-Instruct-Turbo
meta-llama/Meta-Llama-3-8B-Instruct-Lite
meta-llama/Llama-3-70b-chat-hf
google/gemma-3n-E4B-it
mistralai/Mistral-7B-Instruct-v0.1
mistralai/Mistral-7B-Instruct-v0.2
mistralai/Mistral-7B-Instruct-v0.3
arcee_ai/arcee-spotlight
Basic example
Let’s look at a simple example, where we pass a transcript of a voice note to a model and ask it to summarize it. We want the summary to have the following structure:
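The original schema isn’t reproduced here, so the fields below (title, summary, action_items) are illustrative. One convenient way to define the schema is with a Pydantic model:

```python
from pydantic import BaseModel, Field

# Illustrative schema for a voice note summary; adjust the fields to your needs.
class VoiceNoteSummary(BaseModel):
    title: str = Field(description="A short title for the voice note")
    summary: str = Field(description="A one-sentence summary of the voice note")
    action_items: list[str] = Field(description="Action items mentioned in the note")

# Produce a plain JSON Schema dict from the model.
schema = VoiceNoteSummary.model_json_schema()
```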
We then pass this schema to the response_format key of our request.
Finally, and this is important: we need to instruct the model to respond only in JSON format, and include details of the schema we want it to use. This ensures the model actually uses the schema we provide when generating its response; any instructions placed in the schema itself will not be followed by the LLM.
Important: You must always instruct your model to respond only in JSON format, either in the system prompt or in a user message, in addition to passing your schema to the response_format key.
Let’s see what this looks like:
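Here’s a minimal sketch using the Together Python SDK, reusing the VoiceNoteSummary model defined above. The transcript string is a placeholder, and the exact response_format field names are an assumption; check the API reference for the shape your endpoint expects.

```python
import json

from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

transcript = "Good morning! Quick note before I head out..."  # placeholder transcript

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {
            "role": "system",
            "content": "You summarize voice notes. Only respond in JSON that matches the provided schema.",
        },
        {"role": "user", "content": transcript},
    ],
    # Pass the JSON Schema generated from the Pydantic model above.
    response_format={
        "type": "json_schema",
        "schema": VoiceNoteSummary.model_json_schema(),
    },
)

# The message content is a JSON string matching the schema.
summary = json.loads(response.choices[0].message.content)
print(summary["title"])
```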
Regex example
All the models that support JSON mode also support regex mode. Here’s an example using it to constrain a classification:
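The sketch below constrains a sentiment classification to one of three labels. It assumes the API accepts a response_format of type "regex" with the pattern under a "regex" key; treat those names as assumptions and confirm them against the API reference.

```python
from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {
            "role": "system",
            "content": "Classify the sentiment of the user's message as positive, negative, or neutral.",
        },
        {"role": "user", "content": "The checkout flow kept timing out and I gave up."},
    ],
    # Assumed payload shape: constrain the output to exactly one of three labels.
    response_format={
        "type": "regex",
        "regex": "(positive|negative|neutral)",
    },
)

print(response.choices[0].message.content)  # e.g. "negative"
```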
Reasoning model example
You can also extract structured outputs from some reasoning models, such as DeepSeek-R1-0528.
Below, we ask the model to solve a math problem step by step, showing its work:
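Here’s a sketch of what this could look like; the MathSolution fields and the prompt are illustrative, not part of the original example.

```python
import json

from pydantic import BaseModel, Field
from together import Together

# Illustrative schema: ordered reasoning steps plus a final answer.
class MathSolution(BaseModel):
    steps: list[str] = Field(description="The reasoning steps, in order")
    final_answer: str = Field(description="The final answer to the problem")

client = Together()

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528-tput",
    messages=[
        {
            "role": "system",
            "content": "Solve the math problem step by step. Only respond in JSON that matches the provided schema.",
        },
        {"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"},
    ],
    response_format={
        "type": "json_schema",
        "schema": MathSolution.model_json_schema(),
    },
)

solution = json.loads(response.choices[0].message.content)
print(solution["final_answer"])  # e.g. "80 km/h"
```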
Vision model example
Let’s look at another example, this time using a vision model. We want our LLM to extract text from the following screenshot of a Trello board:
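Here’s a sketch using an OpenAI-style multimodal message with one of the vision models from the list above. The screenshot URL and the schema fields are placeholders.

```python
import json

from pydantic import BaseModel, Field
from together import Together

# Illustrative schema for the extracted board contents.
class TrelloList(BaseModel):
    name: str = Field(description="The name of the list")
    cards: list[str] = Field(description="The card titles in the list")

class TrelloBoard(BaseModel):
    lists: list[TrelloList] = Field(description="All lists on the board")

client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[
        {
            "role": "system",
            "content": "Extract the text from the screenshot. Only respond in JSON that matches the provided schema.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the lists and cards from this Trello board."},
                # Placeholder URL; point this at your screenshot.
                {"type": "image_url", "image_url": {"url": "https://example.com/trello-board.png"}},
            ],
        },
    ],
    response_format={
        "type": "json_schema",
        "schema": TrelloBoard.model_json_schema(),
    },
)

board = json.loads(response.choices[0].message.content)
print(board["lists"])
```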
Try out your code in the Together Playground
You can try out JSON mode in the Together Playground to test variations on your schema and prompt.