Kimi K2 QuickStart

How to get the most out of models like Kimi K2.

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model developed by Moonshot AI. It's a 1 trillion total parameter model (32B activated per token) and is currently the best-performing non-reasoning open-source model available.

It was trained on 15.5 trillion tokens, supports a 128k context window, and excels at agentic tasks, coding, reasoning, and tool use. Although it's a 1T-parameter model, only 32B parameters are active at inference time, which gives it near-frontier quality at a fraction of the compute cost of dense peers.

In this quick guide, we'll go over the main use cases for Kimi K2, how to get started with it, when to use it, and prompting tips for getting the most out of this incredible model.

How to use Kimi K2

Get started with this model in 10 lines of code! The model ID is moonshotai/Kimi-K2-Instruct, and pricing is $1 per million input tokens and $3 per million output tokens. Below are streaming examples in Python and TypeScript.

Python:

from together import Together

# Assumes TOGETHER_API_KEY is set in your environment
client = Together()

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "Code a Hacker News clone"}],
    stream=True,
)
for chunk in resp:
    # delta.content can be None on some chunks, so fall back to ""
    print(chunk.choices[0].delta.content or "", end="", flush=True)

TypeScript:

import Together from 'together-ai';

// Assumes TOGETHER_API_KEY is set in your environment
const together = new Together();

const stream = await together.chat.completions.create({
  model: 'moonshotai/Kimi-K2-Instruct',
  messages: [{ role: 'user', content: 'Code a Hacker News clone' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Use cases

Kimi K2 shines in scenarios requiring autonomous problem-solving – specifically with coding & tool use:

  • Agentic Workflows: Automate multi-step tasks like booking flights, research, or data analysis using tools/APIs.
  • Coding & Debugging: Solve software engineering tasks (e.g., SWE-bench), generate patches, or debug code.
  • Research & Report Generation: Summarize technical documents, analyze trends, or draft reports using long-context capabilities.
  • STEM Problem-Solving: Tackle advanced math (AIME, MATH), logic puzzles (ZebraLogic), or scientific reasoning.
  • Tool Integration: Build AI agents that interact with APIs (e.g., weather data, databases); see the sketch after this list.
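
To make the tool-integration case concrete, here's a minimal sketch of native tool calling through Together's OpenAI-compatible chat completions API. The get_weather tool and its schema are hypothetical placeholders; the tools/tool_choice parameters follow the standard OpenAI-compatible format that the API accepts.

import json
from together import Together

client = Together()

# Hypothetical tool schema -- swap in your real function and parameters
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",  # let Kimi decide when/what to call
)

# If the model decided to call a tool, the calls appear on the message
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))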

Prompting tips

  • Keep the system prompt simple: "You are Kimi, an AI assistant created by Moonshot AI." is the recommended default; it matches the prompt used during instruction tuning (see the example after this list).
  • Set temperature ≈ 0.6: this is calibrated to Kimi-K2-Instruct's RLHF alignment; higher values tend to yield verbose output.
  • Leverage native tool calling: pass a JSON schema in tools=[…] and set tool_choice="auto"; Kimi decides when and what to call.
  • Think in goals, not steps: because the model is agentic, give it a high-level objective ("Analyze this CSV and write a report") and let it orchestrate the sub-tasks.
  • Chunk very long contexts: 128k is a huge window, but response speed drops on inputs over 100k tokens; supply a short executive brief in the final user message to focus the model.
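
As a quick reference, here's what the first two tips look like in practice (the user question is just a placeholder):

from together import Together

client = Together()

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[
        # Recommended default system prompt from the Kimi repo
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Summarize the trade-offs of MoE models in three bullets."},
    ],
    temperature=0.6,  # recommended setting; higher values tend toward verbosity
)
print(resp.choices[0].message.content)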

Much of this information comes from the Kimi K2 GitHub repo.

General Limitations of Kimi K2

We've outlined the use cases where Kimi K2 excels, but there are a few situations where it currently isn't the best fit. The main one is latency-sensitive applications like real-time voice agents, where the model's speed makes it a poor choice today.

Similarly, if you want a quick summary of a long PDF, the model can handle a good amount of context (128k tokens), but its speed is prohibitive if you need to show text to your user quickly, and it gets even slower when given a lot of context. However, if you're summarizing PDFs asynchronously, or in another scenario where latency isn't a concern, this is a good model to try; a sketch of that pattern follows.
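
Here's a minimal sketch of that async pattern, assuming the Python SDK's AsyncTogether client and a list of already-extracted PDF texts (the extraction step itself is out of scope):

import asyncio
from together import AsyncTogether

client = AsyncTogether()  # assumes TOGETHER_API_KEY is set

async def summarize(text: str) -> str:
    resp = await client.chat.completions.create(
        model="moonshotai/Kimi-K2-Instruct",
        messages=[{"role": "user", "content": f"Summarize this document:\n\n{text}"}],
        temperature=0.6,
    )
    return resp.choices[0].message.content

async def main(docs: list[str]) -> None:
    # Fan the summaries out concurrently; per-document latency doesn't block the batch
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for s in summaries:
        print(s, "\n---")

asyncio.run(main(["<pdf text 1>", "<pdf text 2>"]))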