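PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking

🚀 Introducing ERNIE-4.5-VL-28B-A3B-Thinking: A Breakthrough in Multimodal AI

Model Highlights

Built on the powerful ERNIE-4.5-VL-28B-A3B architecture, the newly upgraded ERNIE-4.5-VL-28B-A3B-Thinking achieves a significant leap in multimodal reasoning. 🧠✨ Through an extensive mid-training phase, the model absorbed a vast and highly diverse corpus of high-quality visual-language reasoning data. This large-scale training substantially increased the model's representation power and deepened the semantic alignment between the visual and language modalities, unlocking unprecedented capabilities in nuanced visual-textual reasoning. 📊

The model applies cutting-edge multimodal reinforcement learning on verifiable tasks, integrating the GSPO and IcePop strategies to stabilize MoE training, combined with dynamic difficulty sampling for exceptional learning efficiency. ⚡ In response to strong community demand, we have significantly improved the model's visual grounding performance and instruction-following ability, making grounding functions easier to use than ever. 🎯 In addition, our innovative "Thinking with Images" feature, when paired with tools such as image zooming and image search, greatly improves the model's ability to process fine-grained details and handle long-tail visual knowledge. 🔍🖼️

Together, these enhancements form a critical foundation for developing sophisticated multimodal agents, empowering developers and researchers to build next-generation AI applications that push the boundaries of visual-language understanding. 🤖🌟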
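To make the GSPO idea mentioned above more concrete, here is a minimal sketch of a sequence-level clipped objective in the spirit of the published GSPO formulation. It is not the training code used for this model: the function names, the clipping range `eps`, and the toy inputs are illustrative assumptions, and IcePop and dynamic difficulty sampling are not shown.

```python
import math


def sequence_ratio(new_logprobs, old_logprobs):
    """Length-normalized sequence-level importance ratio.

    Equals exp(mean over tokens of (log pi_new - log pi_old)).
    """
    if not new_logprobs or len(new_logprobs) != len(old_logprobs):
        raise ValueError("log-prob lists must be non-empty and equally long")
    diffs = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    return math.exp(sum(diffs) / len(diffs))


def group_advantages(rewards, tiny=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + tiny) for r in rewards]


def clipped_term(ratio, advantage, eps=0.2):
    """Clipped surrogate applied to a whole sequence (eps is a hypothetical value)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Because the ratio is computed once per response rather than once per token, a single outlier token has less influence on the update, which is one commonly cited reason sequence-level ratios are considered for MoE training.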

benchmark

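Key Capabilities

As a lightweight model with only 3B activated parameters ⚡, ERNIE-4.5-VL-28B-A3B-Thinking comes close to the performance of the industry's top flagship models across a range of benchmarks. 🚀

Visual Reasoning 🧠👁️: Backed by large-scale reinforcement learning, the model demonstrates exceptional multi-step reasoning, chart analysis, and causal reasoning in complex visual tasks! 📊✨
STEM Reasoning 🔬📐: Drawing on its strong visual abilities, the model achieves a leap in performance on STEM tasks such as solving problems from photos, handling even complex questions with ease! 🎯💡
Visual Grounding 📍🎨: Offers more precise grounding and more flexible instruction execution, so grounding functions are easy to trigger in complex industrial scenarios, significantly improving efficiency! ⚙️💪
Thinking with Images 🤔🔍: The model thinks like a human, freely zooming in and out of images to capture every detail and uncover all the information. 🖼️✨
Tool Utilization 🛠️⚡: With robust tool-calling capabilities, the model can instantly use functions such as image search to identify long-tail knowledge and achieve comprehensive information retrieval! 🔎📚
Video Understanding 🎬🎥: The model has outstanding temporal awareness and event localization abilities, accurately identifying content changes across different time segments of a video and making video analysis smarter and more efficient! ⏱️🌟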
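As a rough illustration of what "zooming" into an image means geometrically, the sketch below computes the crop rectangle that corresponds to a given zoom factor around a point of interest. It is only an assumption about how a zoom tool could be implemented on the client side; the helper name `zoom_crop_box` and its arguments are hypothetical and do not describe the model's internal tool interface.

```python
def zoom_crop_box(width, height, center_x, center_y, zoom):
    """Return (left, top, right, bottom) of the region visible at `zoom`x."""
    if zoom < 1:
        raise ValueError("zoom must be >= 1")
    crop_w = width / zoom
    crop_h = height / zoom
    # Clamp the window so that it stays inside the image borders
    left = min(max(center_x - crop_w / 2, 0), width - crop_w)
    top = min(max(center_y - crop_h / 2, 0), height - crop_h)
    return (round(left), round(top), round(left + crop_w), round(top + crop_h))


# Zoom 2x into the center of a 1000x800 image
print(zoom_crop_box(1000, 800, 500, 400, 2))
```

The returned tuple follows the (left, upper, right, lower) convention, so it could be passed to an image library's crop function and the result upscaled before being sent back to the model.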

Quickstart

Using the `transformers` library

Here is an example of running inference with the transformers library:

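import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_path = 'baidu/ERNIE-4.5-VL-28B-A3B-Thinking'
# Load the weights in bfloat16 and let device_map place them on the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    dtype=torch.bfloat16,
    trust_remote_code=True
)
# The processor bundles the tokenizer with image/video preprocessing
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model.add_image_preprocess(processor)
# A single user turn: one text question plus one image URL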
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What color clothes is the girl in the picture wearing?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg"
                }
            },
        ]
    },
]
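# Render the conversation into a prompt string using the chat template
text = processor.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Extract the image and video inputs referenced in the messages
image_inputs, video_inputs = processor.process_vision_info(messages)
# Tokenize the prompt and preprocess the visual inputs into tensors
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device that holds the model's first parameters
device = next(model.parameters()).device
inputs = inputs.to(device)
generated_ids = model.generate(
    inputs=inputs['input_ids'].to(device),
    **inputs,
    max_new_tokens=1024,
    use_cache=False
)
# Decode only the newly generated tokens, skipping the prompt
output_text = processor.decode(generated_ids[0][len(inputs['input_ids'][0]):])
print(output_text)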
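The `messages` structure in the example above can be tedious to write by hand. The following sketch wraps it in a small helper; `build_user_message` is a hypothetical name, not part of transformers or the model's processor, and accepting several image URLs in one turn is an assumption that this card does not confirm.

```python
def build_user_message(question, image_urls):
    """Build one user turn with a text part followed by image parts."""
    content = [{"type": "text", "text": question}]
    for url in image_urls:
        # Same nested layout as the example above
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


messages = [
    build_user_message(
        "What color clothes is the girl in the picture wearing?",
        ["https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg"],
    )
]
```

The resulting `messages` list has the same shape as the hand-written one, so it can be passed to `apply_chat_template` and `process_vision_info` unchanged.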

vLLM Inference

Install the vLLM main branch

pip install uv
uv pip install -U vllm --pre \
  --extra-index-url https://wheels.vllm.ai/nightly \
  --extra-index-url https://download.pytorch.org/whl/cu129 \
  --index-strategy unsafe-best-match

Run vLLM

# Requires one 80 GB GPU. If an error occurs, add --gpu-memory-utilization 0.95 and try again
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust-remote-code
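Once the server is running, it can be queried through vLLM's OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library and assumes the server listens on vLLM's default address `http://localhost:8000`; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the request is only sent when `send_chat_request` is called explicitly.

```python
import json
import urllib.request

MODEL = "baidu/ERNIE-4.5-VL-28B-A3B-Thinking"


def build_chat_request(question, image_url, model=MODEL, max_tokens=1024):
    """Build the JSON body for a single-image chat completion request."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to the server and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_request(
    "What color clothes is the girl in the picture wearing?",
    "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg",
)
# result = send_chat_request(payload)  # uncomment when the server is up
```

If the server was started with a different `--port` or host, adjust `base_url` accordingly.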

Run vLLM with reasoning-parser and tool-call-parser

# Requires one 80 GB GPU. If an error occurs, add --gpu-memory-utilization 0.95 and try again
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust-remote-code \
 --reasoning-parser ernie45  \
 --tool-call-parser ernie45  \
 --enable-auto-tool-choice
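With the parsers enabled, a chat completion response is expected to carry the model's reasoning, final answer, and any tool calls in separate fields. The sketch below shows one way a client might declare a tool and split such a response. The `image_search` tool and its schema are purely hypothetical, and the reasoning field name is an assumption (it is looked up under both `reasoning_content` and `reasoning`, since the name may differ between server versions).

```python
# Hypothetical tool declaration in the OpenAI function-calling schema
IMAGE_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "image_search",
        "description": "Search for information related to the input image.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def split_response(response):
    """Separate reasoning, final answer, and tool calls of the first choice."""
    message = response["choices"][0]["message"]
    # Assumed field names; check the server version you are running
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    return {
        "reasoning": reasoning,
        "answer": message.get("content"),
        "tool_calls": message.get("tool_calls") or [],
    }


# Mock response used only to illustrate the expected shape
mock = {
    "choices": [
        {"message": {"reasoning_content": "Look at the dress.", "content": "Red."}}
    ]
}
print(split_response(mock))
```

A request would pass the declaration as `"tools": [IMAGE_SEARCH_TOOL]` in the JSON body; when the model decides to call a tool, `tool_calls` is non-empty and `answer` may be empty.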

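FastDeploy Inference

You can quickly deploy a service with FastDeploy as shown below. For more detailed usage, refer to the FastDeploy GitHub repository.

Note: Single-card deployment requires at least 80 GB of GPU memory.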

fastdeploy serve --model baidu/ERNIE-4.5-VL-28B-A3B-Thinking \
  --max-model-len 131072 \
  --max-num-seqs 32 \
  --port 8180 \
  --quantization wint8 \
  --reasoning-parser ernie-45-vl-thinking \
  --tool-call-parser ernie-45-vl-thinking \
  --mm-processor-kwargs '{"image_max_pixels": 12845056 }'
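The `image_max_pixels` value in the command above, 12845056, equals 3584 × 3584. The sketch below shows how a client could check an image against such a budget and compute a proportionally reduced size. It assumes that the setting caps width × height and that oversized images are scaled down uniformly; the actual FastDeploy preprocessing may differ (for example, by rounding to patch multiples), and `fit_within_budget` is a hypothetical helper.

```python
import math

IMAGE_MAX_PIXELS = 12845056  # value from the command above; equals 3584 * 3584


def fit_within_budget(width, height, max_pixels=IMAGE_MAX_PIXELS):
    """Return (width, height), scaled down uniformly if the area exceeds max_pixels."""
    if width * height <= max_pixels:
        return width, height
    # Scaling both sides by sqrt(budget / area) brings the area down to the budget
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


print(fit_within_budget(1920, 1080))   # already within the budget
print(fit_within_budget(8000, 6000))   # scaled down to fit
```

Resizing large images before upload in this way can also reduce request size and latency.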

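Fine-tuning with ERNIEKit

ERNIEKit is a training toolkit built on PaddlePaddle and designed specifically for the ERNIE series of open-source large models. It provides comprehensive support for scenarios such as instruction fine-tuning (SFT, LoRA) and alignment training (DPO), ensuring optimal performance.

Usage examples: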

# Download model
huggingface-cli download baidu/ERNIE-4.5-VL-28B-A3B-Thinking --local-dir baidu/ERNIE-4.5-VL-28B-A3B-Thinking
# SFT
erniekit train examples/configs/ERNIE-4.5-VL-28B-A3B-Thinking/sft/run_sft_lora_8k.yaml
# SFT (Function Call)
erniekit train examples/configs/ERNIE-4.5-VL-28B-A3B-Thinking/sft_function_call/run_sft_8k.yaml

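For more detailed examples, including SFT with LoRA, multi-GPU configurations, and advanced scripts, refer to the examples folder in the ERNIEKit repository.

License

Citation

If you find ERNIE 4.5 useful or wish to use it in your projects, please cite our technical report: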

@misc{ernie2025technicalreport,
      title={ERNIE 4.5 Technical Report},
      author={Baidu-ERNIE-Team},
      year={2025},
      primaryClass={cs.CL},
      howpublished={\url{https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf}}
}
