Kimi-K2-Instruct

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.

📰 Tech Blog | 📄 Technical Report

0. Changelog

August 11, 2025

Messages containing a name field are now supported. The chat template has also been moved to a standalone file for easier inspection.

July 18, 2025

Further improved the robustness of the chat template and updated the default system prompt.

July 15, 2025

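Updated the tokenizer implementation: special tokens such as [EOS] can now be encoded to their corresponding token IDs.
Fixed a bug in the chat template that broke multi-turn tool calls.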

1. Model Introduction

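Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, it performs exceptionally well on frontier knowledge, reasoning, and coding tasks, and it has been carefully optimized for agentic capabilities.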

Key Features

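Large-scale training: a 1T-parameter MoE model pre-trained on 15.5T tokens with zero training instability.
MuonClip optimizer: the Muon optimizer applied at an unprecedented scale, with novel techniques that resolve the instabilities encountered while scaling up.
Agentic intelligence: designed specifically for tool use, reasoning, and autonomous problem solving.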

Model Variants

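Kimi-K2-Base: the foundation model, a strong starting point for researchers and developers who want full control over fine-tuning and custom solutions.
Kimi-K2-Instruct: the post-trained model, suited to drop-in general-purpose chat and agentic use; it is a reflex-grade model that responds directly without long thinking.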

2. Model Summary

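Attribute	Value
Architecture	Mixture-of-Experts (MoE)
Total Parameters	1T
Activated Parameters	32B
Number of Layers (Dense layer included)	61
Number of Dense Layers	1
Attention Hidden Dimension	7168
MoE Hidden Dimension (per Expert)	2048
Number of Attention Heads	64
Number of Experts	384
Selected Experts per Token	8
Number of Shared Experts	1
Vocabulary Size	160K
Context Length	128K
Attention Mechanism	MLA
Activation Function	SwiGLU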
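To make the sparsity in this table concrete, the sketch below routes a single token with plain softmax top-k gating over random router scores: 8 of the 384 experts are selected and their weights renormalized, and the shared expert is assumed to run for every token on top of those 8. It is a toy under stated assumptions. The gating function, the random logits, and the helper name route_token are illustrative choices and may differ from Kimi K2's actual router.

```python
import math
import random

NUM_EXPERTS = 384   # "Number of Experts" from the table (assumed to be routed experts)
TOP_K = 8           # "Selected Experts per Token"
NUM_SHARED = 1      # "Number of Shared Experts" (assumed to run for every token)


def route_token(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Hypothetical toy router: keep the k highest-scoring experts and
    renormalize their softmax scores so the selected weights sum to 1."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exp_scores = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exp_scores)
    return [(i, s / total) for i, s in zip(top, exp_scores)]


rng = random.Random(0)  # fixed seed so the example is reproducible
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = route_token(logits)
active_experts = NUM_SHARED + len(routes)

print(f"experts run per token: {active_experts} ({len(routes)} routed + {NUM_SHARED} shared)")
print(f"activated / total params: {32e9 / 1e12:.1%}")
```

Only the selected experts' feed-forward blocks run for a given token, which is why roughly 32B of the 1T parameters are active per token.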

3. Evaluation Results

Instruction Model Evaluation Results

4. Deployment

[!NOTE] You can access the Kimi K2 API at https://platform.moonshot.ai. We provide OpenAI- and Anthropic-compatible APIs.

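For better compatibility with existing applications, the Anthropic-compatible API maps temperature as follows: actual temperature = request temperature × 0.6.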
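The mapping above is a single multiplication. The sketch below spells it out with a hypothetical helper, effective_temperature, so you can check which value is actually applied: a request sent with temperature 1.0 runs at 0.6, which matches the temperature recommended for Kimi-K2-Instruct.

```python
ANTHROPIC_TEMPERATURE_SCALE = 0.6  # factor stated for the Anthropic-compatible API


def effective_temperature(request_temperature: float) -> float:
    """Hypothetical helper: temperature actually applied to a request
    received through the Anthropic-compatible API (request * 0.6)."""
    return request_temperature * ANTHROPIC_TEMPERATURE_SCALE


for t in (0.0, 0.5, 1.0):
    print(f"request {t} -> effective {effective_temperature(t):.2f}")
```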

Our model checkpoints are stored in block-fp8 format and are available on Hugging Face.

We currently recommend running Kimi-K2 on the following inference engines:

vLLM
SGLang
KTransformers
TensorRT-LLM

For deployment examples with vLLM and SGLang, see the Model Deployment Guide.

5. Model Usage

Chat Completion

Once the local inference service is running, you can interact with it through the chat endpoint:

```python
from openai import OpenAI


def simple_chat(client: OpenAI, model_name: str):
    # System prompt plus a single user turn (content given as a list of text parts)
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": [{"type": "text", "text": "Please give a brief self-introduction."}]},
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        temperature=0.6,  # recommended temperature for Kimi-K2-Instruct
        max_tokens=256
    )
    print(response.choices[0].message.content)
```

[!NOTE] The recommended temperature for Kimi-K2-Instruct is temperature = 0.6. If no special instructions are required, the system prompt above is a good default.
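To see what the SDK call above sends over the wire, the following sketch builds the equivalent HTTP request using only the Python standard library. It assumes the server exposes the OpenAI-compatible route at http://localhost:8000/v1/chat/completions and serves the model under the name moonshotai/Kimi-K2-Instruct; both are placeholders for your own deployment, and build_chat_request and send are hypothetical helpers.

```python
import json
import urllib.request

# Assumptions: adjust the address and served model name to your own deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "moonshotai/Kimi-K2-Instruct"  # hypothetical served model name


def build_chat_request(user_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions route."""
    payload = {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.6,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Perform the network call; requires a running server."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_chat_request("Please give a brief self-introduction.")
print(req.get_method(), req.full_url)
# print(send(req))  # uncomment once the inference server is running
```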

Tool Calling

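Kimi-K2-Instruct has strong tool-calling capabilities. To enable them, pass the list of available tools in each request; the model then decides on its own when and how to call them.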

The following example demonstrates the end-to-end flow of calling a weather tool:

```python
import json

from openai import OpenAI

# Your tool implementation
def get_weather(city: str) -> dict:
    return {"weather": "Sunny"}

# Tool schema definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve current weather information. Call this when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city"
                }
            }
        }
    }
}]

# Map tool names to their implementations
tool_map = {
    "get_weather": get_weather
}

def tool_call_with_client(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "What's the weather like in Beijing today? Use the tool to check."}
    ]
    finish_reason = None
    # Keep querying the model until it stops requesting tool calls
    while finish_reason is None or finish_reason == "tool_calls":
        completion = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=0.6,
            tools=tools,          # tool list defined above
            tool_choice="auto"
        )
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":
            # Keep the assistant message that carries the tool calls in the history
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)
                tool_function = tool_map[tool_call_name]
                tool_result = tool_function(**tool_call_arguments)
                print("tool_result:", tool_result)

                # Return each result to the model, linked by tool_call_id
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result)
                })
    print("-" * 100)
    print(choice.message.content)
```

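The tool_call_with_client function implements the complete pipeline from user query to tool execution. This pipeline requires the inference engine to support Kimi-K2's native tool-parsing logic. For streaming output and manual tool parsing, see the Tool Calling Guide.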
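To inspect the message sequence this loop produces without a running server, the sketch below swaps the client for a scripted stand-in, fake_model, that returns plain dicts shaped like a chat completion choice. fake_model, run_tool_loop, and the canned call ID are hypothetical and exist only to illustrate the protocol: one assistant message carrying tool_calls, one tool message per call (matched by tool_call_id), and then the final answer.

```python
import json


def get_weather(city: str) -> dict:
    return {"weather": "Sunny"}


tool_map = {"get_weather": get_weather}


def fake_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in for the server: requests a tool call first,
    then answers once a tool result is the latest message."""
    if messages[-1]["role"] != "tool":
        return {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": "call_0",
                    "type": "function",
                    "function": {"name": "get_weather",
                                 "arguments": json.dumps({"city": "Beijing"})},
                }],
            },
        }
    weather = json.loads(messages[-1]["content"])["weather"]
    return {"finish_reason": "stop",
            "message": {"role": "assistant", "content": f"It is {weather} in Beijing."}}


def run_tool_loop(messages: list[dict]) -> str:
    while True:
        choice = fake_model(messages)
        messages.append(choice["message"])
        if choice["finish_reason"] != "tool_calls":
            return choice["message"]["content"]
        # Execute every requested tool and feed the result back to the model
        for call in choice["message"]["tool_calls"]:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            result = tool_map[name](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "name": name, "content": json.dumps(result)})


history = [{"role": "user", "content": "What's the weather like in Beijing today?"}]
print(run_tool_loop(history))
print([m["role"] for m in history])
```

With a real client, the only change is that each turn comes from client.chat.completions.create(...) instead of fake_model.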

6. License

Both the code repository and the model weights are released under the Modified MIT License.

7. Third-Party Notices

See the third-party notices file for details.

8. Contact Us

If you have any questions, please contact us at support@moonshot.cn.

Benchmark	Metric	Kimi K2 Instruct	DeepSeek-V3-0324	Qwen3-235B-A22B (non-thinking)	Claude Sonnet 4 (w/o extended thinking)	Claude Opus 4 (w/o extended thinking)	GPT-4.1	Gemini 2.5 Flash Preview (05-20)
Coding Tasks
LiveCodeBench v6 (Aug 2024 - May 2025)	Pass@1	53.7	46.9	37.0	48.5	47.4	44.7	44.7
OJBench	Pass@1	27.1	24.0	11.3	15.3	19.6	19.5	19.5
MultiPL-E	Pass@1	85.7	83.1	78.2	88.6	89.6	86.7	85.6
SWE-bench Verified (Agentless Coding)	Single Patch w/o Test (Acc)	51.8	36.6	39.4	50.2	53.0	40.8	32.6
SWE-bench Verified (Agentic Coding)	Single Attempt (Acc)	65.8	38.8	34.4	72.7*	72.5*	54.6	—
SWE-bench Verified (Agentic Coding)	Multiple Attempts (Acc)	71.6	—	—	80.2	79.4*	—	—
SWE-bench Multilingual (Agentic Coding)	Single Attempt (Acc)	47.3	25.8	20.9	51.0	—	31.5	—
TerminalBench	In-house Framework (Acc)	30.0	—	—	35.5	43.2	8.3	—
TerminalBench	Terminus (Acc)	25.0	16.3	6.6	—	—	30.3	16.8
Aider-Polyglot	Acc	60.0	55.1	61.8	56.4	70.7	52.4	44.0
Tool Use Tasks
Tau2 retail	Avg@4	70.6	69.1	57.0	75.0	81.8	74.8	64.3
Tau2 airline	Avg@4	56.5	39.0	26.5	55.5	60.0	54.5	42.5
Tau2 telecom	Avg@4	65.8	32.5	22.1	45.2	57.0	38.6	16.9
AceBench	Acc	76.5	72.7	70.5	76.2	75.6	80.1	74.5
Math & STEM Tasks
AIME 2024	Avg@64	69.6	59.4*	40.1*	43.4	48.2	46.5	61.3
AIME 2025	Avg@64	49.5	46.7	24.7*	33.1*	33.9*	37.0	46.6
MATH-500	Acc	97.4	94.0*	91.2*	94.0	94.4	92.4	95.4
HMMT 2025	Avg@32	38.8	27.5	11.9	15.9	15.9	19.4	34.7
CNMO 2024	Avg@16	74.3	74.7	48.6	60.4	57.6	56.6	75.0
PolyMath-en	Avg@4	65.1	59.5	51.9	52.8	49.8	54.0	49.9
ZebraLogic	Acc	89.0	84.0	37.7*	73.7	59.3	58.5	57.9
AutoLogi	Acc	89.5	88.9	83.3	89.8	86.1	88.2	84.1
GPQA-Diamond	Avg@8	75.1	68.4*	62.9*	70.0*	74.9*	66.3	68.2
SuperGPQA	Acc	57.2	53.7	50.2	55.7	56.5	50.8	49.6
Humanity's Last Exam (Text Only)	-	4.7	5.2	5.7	5.8	7.1	3.7	5.6
General Tasks
MMLU	EM	89.5	89.4	87.0	91.5	92.9	90.4	90.1
MMLU-Redux	EM	92.7	90.5	89.2	93.6	94.2	92.4	90.6
MMLU-Pro	EM	81.1	81.2*	77.3	83.7	86.6	81.8	79.4
IFEval	Prompt Strict	89.8	81.1	83.2*	87.6	87.4	88.0	84.3
Multi-Challenge	Acc	54.1	31.4	34.0	46.8	49.0	36.4	39.5
SimpleQA	Correct	31.0	27.7	13.2	15.9	22.8	42.3	23.3
Livebench	Pass@1	76.4	72.4	67.6	74.8	74.6	69.8	67.8