How to use Kimi K2.5
Get started with this model in just a few lines of code. The model ID is moonshotai/Kimi-K2.5 and it supports a 256K context window.
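As a rough illustration, here is a minimal call sketch assuming an OpenAI-compatible chat completions endpoint; the `base_url` and `API_KEY` environment variable are placeholders for your provider's values.

```python
# Minimal quickstart sketch, assuming an OpenAI-compatible endpoint.
# The base_url and API_KEY env var are placeholders for your provider's values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://your-provider.example/v1",  # placeholder endpoint
    api_key=os.environ["API_KEY"],                # placeholder key variable
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Summarize the benefits of a 256K context window."},
    ],
)
print(response.choices[0].message.content)
```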
Thinking Mode
K2.5 supports both instant mode (fast responses) and thinking mode (step-by-step reasoning). When thinking mode is enabled, you'll receive both a reasoning field and a content field. The model uses thinking mode by default.
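A minimal sketch of reading both fields, reusing the `client` from the quickstart above. It assumes the extra field is surfaced as `reasoning`; some OpenAI-compatible providers name it differently, so check your provider's response format.

```python
# Sketch: reading the reasoning and content fields in thinking mode.
# Assumes the field is exposed as "reasoning" on the message object.
response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    temperature=1.0,  # recommended for thinking mode (see Prompting Tips)
    messages=[{"role": "user", "content": "What is 17 * 24? Show your reasoning."}],
)

message = response.choices[0].message
reasoning = getattr(message, "reasoning", None)  # step-by-step trace, if present
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)
```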
Vision Capabilities
K2.5 is natively multimodal, pre-trained on vision-language tokens from the ground up. This means it excels at visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs.
Use Cases
K2.5 excels in scenarios requiring combined visual understanding and agentic execution:
- Coding from Visual Specs: Generate code from UI designs, wireframes, or video workflows, then autonomously orchestrate tools for implementation
- Visual Data Processing Pipelines: Analyze charts, diagrams, or screenshots and chain tool calls to extract, transform, and act on visual data
- Multi-Modal Agent Workflows: Build agents that maintain coherent behavior across extended sequences of tool calls interleaved with image analysis
- Document Intelligence: Process complex documents with mixed text and visuals, extracting information and taking actions based on what’s seen
- UI Testing & Automation: Analyze screenshots, identify elements, and generate test scripts or automation workflows
- Cross-Modal Reasoning: Solve problems that require understanding relationships between visual and textual information
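As a hedged sketch of the image-input pattern behind these use cases, the example below passes a local image alongside a text prompt using the OpenAI-compatible `image_url` content-part format, reusing the `client` from the quickstart. The file name `chart.png` is a hypothetical placeholder.

```python
# Sketch: sending an image plus text in one user message.
# Assumes OpenAI-compatible content parts; "chart.png" is a placeholder file.
import base64

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the key figures from this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```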
Agent Swarm Capability
K2.5 introduces an agent swarm capability where the model can decompose complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents. We have seen this show up in coding agent tools like OpenCode, where the model calls more tools in parallel to solve a problem. This training approach focused on rewarding steps-to-task-completion, encouraging the model to delegate work effectively.

The agent swarm capability is a new paradigm for open-source models. Technical documentation from Moonshot on the exact tool schema for sub-agent spawning is still emerging. Check the Kimi GitHub repo for the latest implementation guidance.
Prompting Tips
| Tip | Rationale |
|---|---|
| Temperature = 1.0 for Thinking, 0.6 for Instant | Critical for output quality. Thinking mode needs higher temperature; instant mode benefits from more focused sampling. |
| top_p = 0.95 | Recommended default for both modes. |
| Keep system prompts simple - "You are Kimi, an AI assistant created by Moonshot AI." | Matches the prompt used during instruction tuning. |
| Leverage native tool calling with vision | Pass images in user messages alongside tool definitions. K2.5 can ground tool calls in visual context. |
| Think in goals, not steps | Give high-level objectives and let the model orchestrate sub-tasks, especially for agentic workflows. |
| Chunk very long contexts | 256K context is large, but response speed drops on >100K inputs. Provide an executive summary to focus the model. |
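A short sketch applying the recommended sampling settings, reusing the `client` from the quickstart. How thinking mode is toggled (a request flag, a model variant, or the default) depends on your provider, so only the sampling parameters and the simple system prompt are shown here.

```python
# Sketch: recommended sampling parameters for each mode.
THINKING_PARAMS = {"temperature": 1.0, "top_p": 0.95}
INSTANT_PARAMS = {"temperature": 0.6, "top_p": 0.95}

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Draft a plan for migrating a service to Kubernetes."},
    ],
    **THINKING_PARAMS,  # swap in INSTANT_PARAMS for instant mode
)
print(response.choices[0].message.content)
```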
Multi-Turn Tool Calling with Images
What truly sets K2.5 apart is its ability to sustain massive multi-turn tool calling with images interleaved between the calls. While multi-turn function calling is table stakes for agentic models, K2.5 can maintain coherent tool use across 100+ sequential calls while processing visual inputs at each step. This makes K2.5 ideal for visual workflows where the model needs to analyze images, call tools based on what it sees, receive results, analyze new images, and continue iterating. The example below demonstrates a 4-turn conversation where the model:
- Makes parallel calls to the weather tool for multiple cities
- Follows up with restaurant recommendations based on weather context
- Identifies a company from an image and fetches its stock price
- Processes a new city image to get weather and restaurant info
Python