<think>
tags and the answer.
Because these models use more computation/tokens to perform better reasoning they produce longer outputs and can be slower and more expensive then their non-reasoning counterparts.

How to use DeepSeek-R1 API
Since these models produce longer responses we’ll stream in tokens instead of waiting for the whole response to complete.Working with DeepSeek-R1
Reasoning models like DeepSeek-R1 should be used differently than standard non-reasoning models to get optimal results. Here are some usage guides:- Temperature: Use 0.5–0.7 (recommended 0.6) to balance creativity and coherence, avoiding repetitive or nonsensical outputs.
- System Prompts: Omit system prompts entirely. Provide all instructions directly in the user query.
- Strengths: Excels at open-ended reasoning, multi-step logic, and inferring unstated requirements.
- Over-prompting (e.g., micromanaging steps) can limit its ability to leverage advanced reasoning.
Under-prompting (e.g., vague goals like “Help with math”) may reduce specificity – balance clarity with flexibility.
DeepSeek-R1 Use-cases
- Benchmarking other LLMs: Evaluates LLM responses with contextual understanding, particularly useful in fields requiring critical validation like law, finance and healthcare.
- Code Review: Performs comprehensive code analysis and suggests improvements across large codebases
- Strategic Planning: Creates detailed plans and selects appropriate AI models based on specific task requirements
- Document Analysis: Processes unstructured documents and identifies patterns and connections across multiple sources
- Information Extraction: Efficiently extracts relevant data from large volumes of unstructured information, ideal for RAG systems
- Ambiguity Resolution: Interprets unclear instructions effectively and seeks clarification when needed rather than making assumptions
Managing Context and Costs
When working with reasoning models, it’s crucial to maintain adequate space in the context window to accommodate the model’s reasoning process. The number of reasoning tokens generated can vary based on the complexity of the task - simpler problems may only require a few hundred tokens, while more complex challenges could generate tens of thousands of reasoning tokens. Cost/Latency management is an important consideration when using these models. To maintain control over resource usage, you can implement limits on the total token generation using themax_tokens
parameter.
While limiting tokens can reduce costs/latency, it may also impact the model’s ability to fully reason through complex problems. Therefore, it’s recommended to adjust these parameters based on your specific use case and requirements, finding the optimal balance between thorough reasoning and resource utilization.
General Limitations
Currently, the capabilities of DeepSeek-R1 fall short of DeepSeek-V3 in general purpose tasks such as:- Function calling
- Multi-turn conversation
- Complex role-playing
- JSON output.