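We are thrilled to release Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show that the model has strong general capabilities in both image generation and image editing, and that it performs especially well at rendering Chinese text.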

Install the latest version of diffusers:

```
pip install git+https://github.com/huggingface/diffusers
```

The following code snippet shows how to use the model to generate an image from a text prompt:

```python
from diffusers import DiffusionPipeline
import torch

model_name = "Qwen/Qwen-Image"

# Load the pipeline: bfloat16 on GPU, float32 on CPU
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

# Quality-boosting suffix appended to the prompt
positive_magic = {
    "en": ", Ultra HD, 4K, cinematic composition.",  # for English prompts
    "zh": ", 超清,4K,电影级构图."  # for Chinese prompts
}

# Prompt describing the image to generate
prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition'''

negative_prompt = " "  # use a blank string if there is no specific concept to remove

# Supported aspect ratios, as (width, height) in pixels
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}

width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt + positive_magic["en"],
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    # Create the generator on the same device as the pipeline
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

image.save("example.png")
```

One of Qwen-Image's standout capabilities is high-fidelity text rendering across diverse images. Whether the script is alphabetic, like English, or logographic, like Chinese, the model preserves typographic detail, layout coherence, and contextual harmony with remarkable precision. Text is not simply overlaid on the image; it is integrated seamlessly into the visual scene.
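The example above picks the prompt suffix and the output size by hand. As a minimal sketch, the two lookups could be wrapped in a small helper. The names `detect_prompt_language` and `build_generation_kwargs` are hypothetical and are not part of diffusers or the Qwen-Image release. The language check is a simple character-count heuristic, assumed here for illustration only, and only a subset of the aspect-ratio table is reproduced.

```python
# Hypothetical helpers for illustration; not part of diffusers or Qwen-Image.

# Suffixes copied from the example above.
POSITIVE_MAGIC = {
    "en": ", Ultra HD, 4K, cinematic composition.",
    "zh": ", 超清,4K,电影级构图.",
}

# A subset of the (width, height) table from the example above.
ASPECT_RATIOS = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
}


def detect_prompt_language(prompt: str) -> str:
    """Return "zh" if CJK ideographs outnumber ASCII letters, else "en".

    This is an assumed heuristic: an English prompt that merely quotes a few
    Chinese characters to be rendered is still classified as "en".
    """
    cjk = sum(1 for ch in prompt if "\u4e00" <= ch <= "\u9fff")
    latin = sum(1 for ch in prompt if ch.isascii() and ch.isalpha())
    return "zh" if cjk > latin else "en"


def build_generation_kwargs(prompt: str, ratio: str = "16:9") -> dict:
    """Append the matching suffix and look up width and height for a ratio."""
    if ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {ratio!r}")
    width, height = ASPECT_RATIOS[ratio]
    suffix = POSITIVE_MAGIC[detect_prompt_language(prompt)]
    return {"prompt": prompt + suffix, "width": width, "height": height}
```

The returned dictionary can be unpacked into the pipeline call, for example `pipe(**build_generation_kwargs(prompt), negative_prompt=" ", num_inference_steps=50, true_cfg_scale=4.0)`, with the remaining arguments passed as in the example above.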

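Beyond text, Qwen-Image also excels at general image generation and supports a wide range of artistic styles. From photorealistic scenes to impressionist paintings, and from anime aesthetics to minimalist design, the model responds fluidly to creative prompts, making it a versatile tool for artists, designers, and content creators.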

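In image editing, Qwen-Image goes well beyond simple adjustments. It supports advanced operations such as style transfer, object insertion and removal, detail enhancement, editing of text within images, and even human pose manipulation. Intuitive inputs produce coherent outputs, putting professional-grade editing within reach of everyday users.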

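Qwen-Image does more than create and edit images: it also understands them. It supports a suite of image understanding tasks, including object detection, semantic segmentation, depth and edge (Canny) estimation, novel view synthesis, and super-resolution. Although technically distinct, these tasks can all be viewed as forms of intelligent image editing grounded in deep visual comprehension.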

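Together, these features make Qwen-Image not merely a tool for generating attractive pictures but a foundation model for intelligent visual creation, where language, layout, and imagery converge.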
Qwen-Image is licensed under Apache 2.0.
If you find our work helpful, please consider citing it.
```bibtex
@misc{wu2025qwenimagetechnicalreport,
      title={Qwen-Image Technical Report},
      author={Chenfei Wu and Jiahao Li and Jingren Zhou and Junyang Lin and Kaiyuan Gao and Kun Yan and Sheng-ming Yin and Shuai Bai and Xiao Xu and Yilei Chen and Yuxiang Chen and Zecheng Tang and Zekai Zhang and Zhengyi Wang and An Yang and Bowen Yu and Chen Cheng and Dayiheng Liu and Deqing Li and Hang Zhang and Hao Meng and Hu Wei and Jingyuan Ni and Kai Chen and Kuan Cao and Liang Peng and Lin Qu and Minggang Wu and Peng Wang and Shuting Yu and Tingkun Wen and Wensen Feng and Xiaoxiao Xu and Yi Wang and Yichang Zhang and Yongqiang Zhu and Yujia Wu and Yuxuan Cai and Zenan Liu},
      year={2025},
      eprint={2508.02324},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.02324},
}
```