How to use Kimi K2.6
Get started with this model in a few lines of code. The model ID ismoonshotai/Kimi-K2.6 and it supports a 256K context window.
Thinking mode
K2.6 supports both instant mode (fast responses) and thinking mode (step-by-step reasoning). When enabling thinking mode, you’ll receive both areasoning field and a content field. By default, the model uses thinking mode.
Use the right temperature: Set
temperature=1.0 for thinking mode and temperature=0.6 for instant mode. The wrong temperature can significantly degrade output quality.Vision capabilities
K2.6 accepts image inputs alongside text, so it can answer questions about visual content, reason across text and images, and ground tool calls in what it sees.Use cases
K2.6 excels in scenarios requiring combined visual understanding and agentic execution:- Coding from visual specs: Generate code from UI designs, wireframes, or video workflows, then autonomously orchestrate tools for implementation.
- Visual data processing pipelines: Analyze charts, diagrams, or screenshots and chain tool calls to extract, transform, and act on visual data.
- Multi-modal agent workflows: Build agents that maintain coherent behavior across extended sequences of tool calls interleaved with image analysis.
- Document intelligence: Process complex documents with mixed text and visuals, extracting information and taking actions based on what’s seen.
- UI testing and automation: Analyze screenshots, identify elements, and generate test scripts or automation workflows.
- Cross-modal reasoning: Solve problems that require understanding relationships between visual and textual information.
Agent swarm capability
K2.6 can decompose a complex task into parallel sub-tasks and coordinate them as a swarm of domain-specific sub-agents. You enable this by exposing two tools and prompting the model to delegate: one tool to spawn a sub-agent with a focused task, and one for sub-agents to report results back to the orchestrator. Given those tools and a high-level goal, K2.6 plans the decomposition, fans out the work in parallel, and aggregates the results. This pattern shows up in coding agents like OpenCode, where the model issues several tool calls in parallel to solve a problem faster.The exact tool schema for sub-agent spawning is up to your harness. Check the Kimi GitHub repo for the latest implementation guidance.
Prompting tips
| Tip | Rationale |
|---|---|
| Temperature = 1.0 for thinking, 0.6 for instant | Critical for output quality. Thinking mode needs higher temperature; instant mode benefits from more focused sampling. |
| top_p = 0.95 | Recommended default for both modes. |
Keep system prompts simple - "You are Kimi, an AI assistant created by Moonshot AI." | Matches the prompt used during instruction tuning. |
| Leverage native tool calling with vision | Pass images in user messages alongside tool definitions. K2.6 can ground tool calls in visual context. |
| Think in goals, not steps | Give high-level objectives and let the model orchestrate sub-tasks, especially for agentic workflows. |
| Chunk very long contexts | 256K context is large, but response speed drops on >100K inputs. Provide an executive summary to focus the model. |
Multi-turn tool calling with images
K2.6 can perform multi-turn tool calls with images interleaved between the calls, maintaining coherent tool use across long sequences while processing visual inputs at each step. This makes K2.6 ideal for visual workflows where the model needs to analyze images, call tools based on what it sees, receive results, analyze new images, and continue iterating. The example below demonstrates a four-turn conversation where the model:- Calls the weather tool for multiple cities in parallel.
- Follows up with restaurant recommendations based on weather context.
- Identifies a company from an image and fetches its stock price.
- Processes a new city image to get weather and restaurant info.
Python