Mistral Medium 3.5 is our first flagship fused model. It is a 128B-parameter dense model with a 256k context window that handles instruction following, reasoning, and coding with a single set of weights. Mistral Medium 3.5 replaces its predecessors, Mistral Medium 3.1 and Magistral, in Le Chat, and replaces Devstral 2 in our coding agent Vibe. Concretely, this new unified model is expected to improve on our previously released models across instruction-following, reasoning, and coding tasks.
Reasoning effort is configurable per request, so the same model can answer a quick chat message or work through a complex agentic run. We trained the vision encoder from scratch to handle variable image sizes and aspect ratios.
For more information, see our blog.
[!Note] To speed up local inference with vLLM or SGLang, check out our released EAGLE models.
Mistral Medium 3.5 uses the following architecture:

Mistral Medium 3.5 offers the following capabilities:
We release this model under the Modified MIT License: an open-source license that allows commercial and non-commercial use, with an exception for high-revenue companies.
- 'none' → no reasoning
- 'high' → reasoning enabled (recommended for complex prompts and agentic use cases)
For complex tasks and agentic coding, use reasoning_effort="high". With reasoning_effort="high", we recommend a temperature of 0.7. With reasoning_effort="none", set the temperature between 0.0 and 0.7 depending on the task.

In general, a lower temperature keeps answers more to the point, while a higher temperature lets the model be more creative. It is good practice to experiment with different values to tune the model to your needs.

Mistral Medium 3.5 beats all of our previous coding models, namely Devstral, across all benchmarks. It scores 91.4% on τ³-Telecom and 77.6% on SWE-Bench Verified. Thanks to its stronger agentic capabilities, Mistral Medium 3.5 has replaced Devstral 2 in our coding agent, Vibe CLI.
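As a quick illustration of the guidance above, here is a minimal sketch that toggles reasoning effort and temperature per request. It assumes an OpenAI-compatible endpoint, such as the local vLLM server described later in this card; the base URL and model name below are placeholders to adjust for your deployment:

```python
from openai import OpenAI

# Placeholder endpoint: point this at any OpenAI-compatible server
# hosting Mistral Medium 3.5 (e.g., a local vLLM deployment).
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
MODEL = "mistralai/Mistral-Medium-3.5-128B"

# Quick chat reply: no reasoning, and a low temperature keeps it to the point.
quick = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Give me a one-line description of HTTP caching."}],
    temperature=0.2,
    reasoning_effort="none",
)
print(quick.choices[0].message.content)

# Complex or agentic prompt: enable reasoning, with the recommended 0.7 temperature.
deliberate = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Outline a step-by-step plan to refactor a legacy module."}],
    temperature=0.7,
    reasoning_effort="high",
)
print(deliberate.choices[0].message.content)
```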

We compared Mistral Medium 3.5 with competing models on instruction-following, reasoning (math), and coding benchmarks. Thanks to its unified capabilities, it achieves strong results across all of these tasks, and Mistral Medium 3.5 now powers Le Chat.

You can find support for Mistral Medium 3.5 in several libraries, both for inference and for fine-tuning.

Here we would like to thank all the contributors and maintainers who helped make this possible.

Use Mistral Medium 3.5 via Mistral Vibe.

Install the latest version:
uv pip install mistral-vibe --upgrade

Launch vibe and you can select Mistral Medium 3.5. If this is your first time launching vibe, it will:
Now select mistral-medium-3.5 and start building!

If you would rather use a local vLLM server instead of calling the Mistral API, proceed as follows:
Add the model configuration in ~/.vibe/config.toml:

display_name = "Mistral Medium 3.5 (local vLLM)"
description = "Mistral Medium 3.5 mode using local vLLM"
safety = "neutral"
active_model = "mistral-medium-3.5" # Make sure this is the only active_model entry
[[providers]]
name = "vllm"
api_base = "http://<your-host-url>:8000/v1"
api_key_env_var = ""
backend = "generic"
api_style = "reasoning"
[[models]]
name = "mistralai/Mistral-Medium-3.5-128B"
provider = "vllm"
alias = "mistral-medium-3.5"
thinking = "high"
temperature = 0.7
auto_compact_threshold = 168000
[tools.bash]
default_timeout = 1200

Note:
Replace <your-host-url> with your server's URL. Then restart vibe and switch to the "mistral-medium-3.5" mode via "tab-shift".

Try some agentic coding tasks and start building something cool!
The model can be deployed with the following libraries:

If local serving does not perform well for you, we recommend using the Mistral AI API for the best performance.

The model can be fine-tuned with the following libraries:

We recommend using Mistral Medium 3.5 with the vLLM library for production-ready inference.
[!Note] To speed up local inference with vLLM, check out our released EAGLE models.

Make sure to install the vllm nightly build:
uv pip install -U vllm \
--torch-backend=auto \
--extra-index-url https://wheels.vllm.ai/nightly

Doing so should automatically install mistral_common >= 1.11.1 and transformers >= 5.4.0.

Check with:
python -c "import mistral_common; print(mistral_common.__version__)"
python -c "import transformers; print(transformers.__version__)"您还可以使用现成的 docker image 或 docker hub 上的镜像。
我们建议采用服务器/客户端架构:
vllm serve mistralai/Mistral-Medium-3.5-128B --tensor-parallel-size 8 \
--tool-call-parser mistral --enable-auto-tool-choice --reasoning-parser mistral --max_num_batched_tokens 16384 --max_num_seqs 128 \
--gpu_memory_utilization 0.8

Mistral Medium 3.5 follows your instructions precisely.
from datetime import datetime, timedelta
from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.1
# use TEMP = 0.7 for reasoning="high"
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> str:
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(file_path, "r") as file:
system_prompt = file.read()
today = datetime.today().strftime("%Y-%m-%d")
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
model_name = repo_id.split("/")[-1]
return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{
"role": "user",
"content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
},
]
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=TEMP,
reasoning_effort="none",
)
assistant_message = response.choices[0].message.content
print(assistant_message)

Let's solve some equations with our simple Python calculator tool.
import json
from datetime import datetime, timedelta
from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.1
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> str:
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(file_path, "r") as file:
system_prompt = file.read()
today = datetime.today().strftime("%Y-%m-%d")
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
model_name = repo_id.split("/")[-1]
return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"
def my_calculator(expression: str) -> str:
return str(eval(expression))
tools = [
{
"type": "function",
"function": {
"name": "my_calculator",
"description": "A calculator that can evaluate a mathematical expression.",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "The mathematical expression to evaluate.",
},
},
"required": ["expression"],
},
},
},
{
"type": "function",
"function": {
"name": "rewrite",
"description": "Rewrite a given text for improved clarity",
"parameters": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "The input text to rewrite",
}
},
},
},
},
]
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
},
{
"type": "image_url",
"image_url": {
"url": image_url,
},
},
],
},
]
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=TEMP,
tools=tools,
tool_choice="auto",
reasoning_effort="none",
)
tool_calls = response.choices[0].message.tool_calls
results = []
for tool_call in tool_calls:
function_name = tool_call.function.name
function_args = tool_call.function.arguments
if function_name == "my_calculator":
result = my_calculator(**json.loads(function_args))
results.append(result)
messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
messages.append(
{
"role": "tool",
"tool_call_id": tool_call.id,
"name": tool_call.function.name,
"content": result,
}
)
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=TEMP,
reasoning_effort="none",
)
print(response.choices[0].message.content)

Let's see if Mistral Medium 3.5 knows when to make a move!
from datetime import datetime, timedelta
from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.7
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> str:
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(file_path, "r") as file:
system_prompt = file.read()
today = datetime.today().strftime("%Y-%m-%d")
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
model_name = repo_id.split("/")[-1]
return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
},
{"type": "image_url", "image_url": {"url": image_url}},
],
},
]
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=TEMP,
reasoning_effort="high",
)
print(response.choices[0].message.content)

Deploy Mistral Medium 3.5 with the SGLang library for production-ready inference.

[!Note] To speed up local inference with SGLang, check out our released EAGLE models.

Initial support is available via dedicated Docker tags:
docker pull lmsysorg/sglang:dev-mistral-medium-3.5 # H100 / H200 (Hopper, CUDA 12.9)
docker pull lmsysorg/sglang:dev-cu13-mistral-medium-3.5 # B200 / B300 (Blackwell, CUDA 13.0)

Alternatively, follow the SGLang installation guide. transformers >= 5.4.0 is required.
python -m sglang.launch_server --model-path mistralai/Mistral-Medium-3.5-128B \
--tp 8 --tool-call-parser mistral --reasoning-parser mistral

For a complete deployment guide, benchmarks, and per-request examples (reasoning effort, tool calling, vision, streaming), see the SGLang cookbook entry for Mistral Medium 3.5.
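As a quick smoke test of the server launched above, here is a minimal client sketch. It assumes SGLang's default OpenAI-compatible endpoint on port 30000; adjust the host, port, and sampling settings to your deployment. The reasoning_effort field follows the same request pattern as the vLLM examples above:

```python
from openai import OpenAI

# SGLang exposes an OpenAI-compatible API; port 30000 is its default.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:30000/v1")

response = client.chat.completions.create(
    model="mistralai/Mistral-Medium-3.5-128B",
    messages=[{"role": "user", "content": "Explain tensor parallelism in two sentences."}],
    temperature=0.7,
    reasoning_effort="high",
)
print(response.choices[0].message.content)
```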
First install the Transformers framework to use Mistral Medium 3.5:

uv pip install transformers

import torch
from transformers import AutoProcessor, Mistral3ForConditionalGeneration
model_id = "mistralai/Mistral-Medium-3.5-128B"
processor = AutoProcessor.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
},
{"type": "image_url", "image_url": {"url": image_url}},
],
},
]
inputs = processor.apply_chat_template(messages, return_tensors="pt", tokenize=True, return_dict=True, reasoning_effort="high")
inputs = inputs.to(model.device)
output = model.generate(
**inputs,
max_new_tokens=1024,
do_sample=True,
temperature=0.7,
)[0]
# Setting `skip_special_tokens=False` to visualize reasoning trace between [THINK] [/THINK] tags.
decoded_output = processor.decode(output[len(inputs["input_ids"][0]):], skip_special_tokens=False)
print(decoded_output)

This model is licensed under the Modified MIT License.

You may not use this model in a manner that infringes, misappropriates, or otherwise violates any third party's rights, including intellectual property rights.