Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, it achieves exceptional performance in frontier knowledge, reasoning, and code generation tasks, and is meticulously optimized for agentic capabilities.
| Feature | Specification |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 128K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
[!NOTE] You can access Kimi K2's API at https://platform.moonshot.ai ; we provide OpenAI/Anthropic-compatible APIs.
For better compatibility with existing applications, the Anthropic-compatible API maps the temperature parameter as follows:
actual temperature = requested temperature × 0.6.
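The mapping above is a simple linear scale. A minimal sketch (the helper name is illustrative, not part of any SDK):

```python
def map_anthropic_temperature(requested: float) -> float:
    """Temperature actually applied by the Anthropic-compatible endpoint:
    actual = requested * 0.6 (hypothetical helper, for illustration only)."""
    return requested * 0.6

print(map_anthropic_temperature(1.0))  # 0.6
print(map_anthropic_temperature(0.0))  # 0.0
```

So an Anthropic-style request sent with temperature 1.0 behaves like the recommended 0.6 on the underlying model.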
Our model checkpoints are stored in block-fp8 format; you can find them on Hugging Face.
Currently, Kimi-K2 is recommended to run on inference engines such as vLLM and SGLang; for deployment examples, please refer to the Model Deployment Guide.
Once the local inference service is up, you can interact with it through the chat endpoint:
```python
from openai import OpenAI

def simple_chat(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": [{"type": "text", "text": "Please give a brief self-introduction."}]},
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        temperature=0.6,
        max_tokens=256,
    )
    print(response.choices[0].message.content)
```

[!NOTE] The recommended temperature for Kimi-K2-Instruct is temperature = 0.6. If no special instructions are required, the system prompt above is a good default.
Kimi-K2-Instruct has strong tool-calling capabilities. To enable them, pass the list of available tools in each request; the model will then autonomously decide when and how to invoke them.
The following example demonstrates the end-to-end flow of calling a weather tool:
```python
import json

from openai import OpenAI

# Your tool implementation
def get_weather(city: str) -> dict:
    return {"weather": "Sunny"}

# Tool schema definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve current weather information. Call this when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city"
                }
            }
        }
    }
}]

# Map tool names to their implementations
tool_map = {
    "get_weather": get_weather
}

def tool_call_with_client(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "What's the weather like in Beijing today? Use the tool to check."}
    ]
    finish_reason = None
    while finish_reason is None or finish_reason == "tool_calls":
        completion = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=0.6,
            tools=tools,  # tool list defined above
            tool_choice="auto"
        )
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)
                tool_function = tool_map[tool_call_name]
                tool_result = tool_function(**tool_call_arguments)
                print("tool_result:", tool_result)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result)
                })
    print("-" * 100)
    print(choice.message.content)
```

The tool_call_with_client function implements the complete flow from user query to tool execution.
This flow requires the inference engine to support Kimi-K2's native tool-parsing logic.
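As one illustration, recent vLLM builds expose flags for automatic tool-choice and a model-specific tool-call parser; the exact values below (notably the `kimi_k2` parser name) are assumptions that should be verified against your engine version's documentation:

```shell
# Hypothetical vLLM launch sketch -- verify flag names and parser value
# for your vLLM version before use
vllm serve moonshotai/Kimi-K2-Instruct \
  --trust-remote-code \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2
```

With the parser enabled, the engine returns structured `tool_calls` in the response, which is what the loop above consumes.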
For streaming output and manual tool parsing, please refer to the Tool Calling Guide.
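When streaming, tool-call arguments arrive as incremental string fragments that must be concatenated per tool-call index before JSON parsing. A minimal sketch of that accumulation, where the delta dicts are simplified stand-ins for the OpenAI streaming delta objects:

```python
import json

def accumulate_tool_calls(deltas):
    """Merge streamed tool-call deltas (index, id, name, argument
    fragments) into complete tool-call dicts, ordered by index."""
    calls = {}
    for d in deltas:
        slot = calls.setdefault(d["index"], {"id": None, "name": None, "arguments": ""})
        if d.get("id"):
            slot["id"] = d["id"]
        if d.get("name"):
            slot["name"] = d["name"]
        # Argument JSON arrives in pieces; concatenate until complete
        slot["arguments"] += d.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Simulated stream: id/name arrive first, arguments arrive in fragments
chunks = [
    {"index": 0, "id": "call_1", "name": "get_weather", "arguments": ""},
    {"index": 0, "arguments": '{"ci'},
    {"index": 0, "arguments": 'ty": "Beijing"}'},
]
calls = accumulate_tool_calls(chunks)
print(json.loads(calls[0]["arguments"]))  # {'city': 'Beijing'}
```

Only once the stream finishes (finish_reason "tool_calls") is the accumulated argument string guaranteed to be valid JSON.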
Both the code repository and the model weights are released under the Modified MIT License.
See the third-party notices file for details.
If you have any questions, please contact us at support@moonshot.cn.
| Benchmark | Metric | Kimi K2 Instruct | DeepSeek-V3-0324 | Qwen3-235B-A22B (non-thinking) | Claude Sonnet 4 (w/o extended thinking) | Claude Opus 4 (w/o extended thinking) | GPT-4.1 | Gemini 2.5 Flash Preview (05-20) |
|---|---|---|---|---|---|---|---|---|
| Coding Tasks | ||||||||
| LiveCodeBench v6 (Aug 2024 – May 2025) | Pass@1 | 53.7 | 46.9 | 37.0 | 48.5 | 47.4 | 44.7 | 44.7 |
| OJBench | Pass@1 | 27.1 | 24.0 | 11.3 | 15.3 | 19.6 | 19.5 | 19.5 |
| MultiPL-E | Pass@1 | 85.7 | 83.1 | 78.2 | 88.6 | 89.6 | 86.7 | 85.6 |
| SWE-bench Verified (Agentless Coding) | Single Patch w/o Test (Acc) | 51.8 | 36.6 | 39.4 | 50.2 | 53.0 | 40.8 | 32.6 |
| SWE-bench Verified (Agentic Coding) | Single Attempt (Acc) | 65.8 | 38.8 | 34.4 | 72.7* | 72.5* | 54.6 | — |
| | Multiple Attempts (Acc) | 71.6 | — | — | 80.2 | 79.4* | — | — |
| SWE-bench Multilingual (Agentic Coding) | Single Attempt (Acc) | 47.3 | 25.8 | 20.9 | 51.0 | — | 31.5 | — |
| TerminalBench | In-house Framework (Acc) | 30.0 | — | — | 35.5 | 43.2 | 8.3 | — |
| | Terminus (Acc) | 25.0 | 16.3 | 6.6 | — | — | 30.3 | 16.8 |
| Aider-Polyglot | Acc | 60.0 | 55.1 | 61.8 | 56.4 | 70.7 | 52.4 | 44.0 |
| Tool Use Tasks | ||||||||
| Tau2 retail | Avg@4 | 70.6 | 69.1 | 57.0 | 75.0 | 81.8 | 74.8 | 64.3 |
| Tau2 airline | Avg@4 | 56.5 | 39.0 | 26.5 | 55.5 | 60.0 | 54.5 | 42.5 |
| Tau2 telecom | Avg@4 | 65.8 | 32.5 | 22.1 | 45.2 | 57.0 | 38.6 | 16.9 |
| AceBench | Acc | 76.5 | 72.7 | 70.5 | 76.2 | 75.6 | 80.1 | 74.5 |
| Math & STEM Tasks | ||||||||
| AIME 2024 | Avg@64 | 69.6 | 59.4* | 40.1* | 43.4 | 48.2 | 46.5 | 61.3 |
| AIME 2025 | Avg@64 | 49.5 | 46.7 | 24.7* | 33.1* | 33.9* | 37.0 | 46.6 |
| MATH-500 | Acc | 97.4 | 94.0* | 91.2* | 94.0 | 94.4 | 92.4 | 95.4 |
| HMMT 2025 | Avg@32 | 38.8 | 27.5 | 11.9 | 15.9 | 15.9 | 19.4 | 34.7 |
| CNMO 2024 | Avg@16 | 74.3 | 74.7 | 48.6 | 60.4 | 57.6 | 56.6 | 75.0 |
| PolyMath-en | Avg@4 | 65.1 | 59.5 | 51.9 | 52.8 | 49.8 | 54.0 | 49.9 |
| ZebraLogic | Acc | 89.0 | 84.0 | 37.7* | 73.7 | 59.3 | 58.5 | 57.9 |
| AutoLogi | Acc | 89.5 | 88.9 | 83.3 | 89.8 | 86.1 | 88.2 | 84.1 |
| GPQA-Diamond | Avg@8 | 75.1 | 68.4* | 62.9* | 70.0* | 74.9* | 66.3 | 68.2 |
| SuperGPQA | Acc | 57.2 | 53.7 | 50.2 | 55.7 | 56.5 | 50.8 | 49.6 |
| Humanity's Last Exam (Text Only) | - | 4.7 | 5.2 | 5.7 | 5.8 | 7.1 | 3.7 | 5.6 |
| General Tasks | ||||||||
| MMLU | EM | 89.5 | 89.4 | 87.0 | 91.5 | 92.9 | 90.4 | 90.1 |
| MMLU-Redux | EM | 92.7 | 90.5 | 89.2 | 93.6 | 94.2 | 92.4 | 90.6 |
| MMLU-Pro | EM | 81.1 | 81.2* | 77.3 | 83.7 | 86.6 | 81.8 | 79.4 |
| IFEval | Prompt Strict | 89.8 | 81.1 | 83.2* | 87.6 | 87.4 | 88.0 | 84.3 |
| Multi-Challenge | Acc | 54.1 | 31.4 | 34.0 | 46.8 | 49.0 | 36.4 | 39.5 |
| SimpleQA | Correct | 31.0 | 27.7 | 13.2 | 15.9 | 22.8 | 42.3 | 23.3 |
| Livebench | Pass@1 | 76.4 | 72.4 | 67.6 | 74.8 | 74.6 | 69.8 | 67.8 |