
Qwen3.6-27B

Introduction

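Qwen3.6-27B is a new open-source dense multimodal model from the Qwen (Tongyi Qianwen) team, with 27 billion parameters. It is a size the community has long asked for: it keeps the advantages of a dense architecture while substantially improving agentic coding and multimodal reasoning, reaching flagship-level performance. On the major coding benchmarks it surpasses the previous open-source flagship, Qwen3.5-397B-A17B (an MoE model with 397B total and 17B active parameters), which gives developers top-tier coding capability at a practical, widely deployable scale. Qwen3.6-27B is natively multimodal: it accepts mixed image, video, and text input and supports both thinking and non-thinking modes for multimodal tasks. Its key features include: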

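  • Flagship-level agentic coding: on several authoritative benchmarks, including SWE-bench Verified and Terminal-Bench 2.0, it outperforms larger models.
  • Strong native multimodal capability: it supports visual reasoning, document understanding, visual question answering, and similar tasks, on par with Qwen3.6-35B-A3B.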

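This document walks through the main validation steps for the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, accuracy evaluation, and performance evaluation.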

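Environment Preparation

Installation

The dependencies required by the NPU runtime environment are bundled into a Docker image that has been uploaded to the Huawei Cloud platform. You can pull the image directly.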

#Atlas 800 A3
docker pull quay.io/ascend/sglang:v0.5.10-npu.rc1-a3
#Atlas 800 A2
docker pull quay.io/ascend/sglang:v0.5.10-npu.rc1-910b

# Start the container. Set NAME (container name) and tag (one of the image tags pulled above) first.
docker run -itd --shm-size=16g --privileged=true --name ${NAME} \
--net=host \
-v /var/queue_schedule:/var/queue_schedule \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /usr/local/sbin:/usr/local/sbin \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
--device=/dev/davinci0:/dev/davinci0  \
--device=/dev/davinci1:/dev/davinci1  \
--device=/dev/davinci2:/dev/davinci2  \
--device=/dev/davinci3:/dev/davinci3  \
--device=/dev/davinci4:/dev/davinci4  \
--device=/dev/davinci5:/dev/davinci5  \
--device=/dev/davinci6:/dev/davinci6  \
--device=/dev/davinci7:/dev/davinci7  \
--device=/dev/davinci8:/dev/davinci8  \
--device=/dev/davinci9:/dev/davinci9  \
--device=/dev/davinci10:/dev/davinci10  \
--device=/dev/davinci11:/dev/davinci11  \
--device=/dev/davinci12:/dev/davinci12  \
--device=/dev/davinci13:/dev/davinci13  \
--device=/dev/davinci14:/dev/davinci14  \
--device=/dev/davinci15:/dev/davinci15  \
--device=/dev/davinci_manager:/dev/davinci_manager \
--device=/dev/hisi_hdc:/dev/hisi_hdc \
--entrypoint=bash \
quay.io/ascend/sglang:${tag}
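
The command above lists all 16 `/dev/davinci*` devices by hand. On a host with a different number of NPUs, the list can be generated instead. The following is a minimal sketch: `build_device_args` is a hypothetical helper, not part of the image or of SGLang, and it assumes the devices are numbered contiguously from 0.

```shell
# Hypothetical helper: print one --device flag per NPU for indices 0..count-1.
# Assumes device nodes are named /dev/davinci0, /dev/davinci1, ... without gaps.
build_device_args() {
    local count="$1"
    local i=0
    local args=""
    while [ "$i" -lt "$count" ]; do
        args="$args --device=/dev/davinci$i:/dev/davinci$i"
        i=$((i + 1))
    done
    # Strip the single leading space before printing
    printf '%s\n' "${args# }"
}

# Example (commented out so that sourcing this file has no side effects):
# NAME=sglang-qwen            # hypothetical container name
# tag=v0.5.10-npu.rc1-a3      # image tag from the pull command above
# docker run -itd ... $(build_device_args 16) ... quay.io/ascend/sglang:${tag}
```

The unquoted `$(build_device_args 16)` relies on word splitting to turn the output into separate flags, which is safe here because the device paths contain no spaces.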

Weight Download

Model Name     Address
Qwen3.6-27B    modelers
Qwen3.6-27B    modelscope
Qwen3.6-27B    gitcode
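
The launch script later in this document reads the weight location from `MODEL_PATH`. Before starting the server it can help to confirm that the variable points at a usable directory. This is a sketch under assumptions: `check_model_dir` is a hypothetical helper, and the check for `config.json` assumes the weights are stored in the usual Hugging Face directory layout.

```shell
# Hypothetical helper: verify that a model directory looks usable.
# Assumption: the download contains a config.json at the top level.
check_model_dir() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "MODEL_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/config.json" ]; then
        echo "config.json not found in $dir" >&2
        return 1
    fi
    return 0
}

# Example (commented out; the path is a placeholder):
# export MODEL_PATH=/path/to/Qwen3.6-27B
# check_model_dir "$MODEL_PATH" || exit 1
```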

Deployment

Single-Node Deployment

Run the following script to start the server for online inference.

# Put the CPUs in performance mode and tune kernel scheduling
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sysctl -w vm.swappiness=0
sysctl -w kernel.numa_balancing=0
sysctl -w kernel.sched_migration_cost_ns=50000
export SGLANG_SET_CPU_AFFINITY=1

unset https_proxy
unset http_proxy
unset HTTPS_PROXY
unset HTTP_PROXY
unset ASCEND_LAUNCH_BLOCKING

# Load the CANN toolkit and ATB environments
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

export STREAMS_PER_DEVICE=32
export HCCL_OP_EXPANSION_MODE=AIV
export HCCL_SOCKET_IFNAME=lo
export GLOO_SOCKET_IFNAME=lo

export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_ENABLE_OVERLAP_PLAN_STREAM=0
export SGLANG_SCHEDULER_DECREASE_PREFILL_IDLE=1
export SGLANG_PREFILL_DELAYER_MAX_DELAY_PASSES=100

python3 -m sglang.launch_server \
    --model-path $MODEL_PATH \
    --attention-backend ascend \
    --device npu \
    --tp-size 4 --nnodes 1 --node-rank 0 \
    --chunked-prefill-size -1 --max-prefill-tokens 60000 \
    --disable-radix-cache \
    --trust-remote-code \
    --host 127.0.0.1 --max-running-requests 48 --max-mamba-cache-size 60 \
    --mem-fraction-static 0.7 \
    --port 8000 \
    --cuda-graph-bs 2 8 16 32 48 \
    --enable-multimodal \
    --mm-attention-backend ascend_attn \
    --dtype bfloat16 --mamba-ssm-dtype bfloat16 \
    --speculative-algorithm NEXTN \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4
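
Loading the weights can take a while, so a client may want to wait until the server answers before sending requests. The following is a minimal sketch: `wait_for_server` and its retry defaults are hypothetical, and it assumes the server exposes the OpenAI-compatible `/v1/models` route on the host and port configured above.

```shell
# Hypothetical helper: poll a URL until it returns a successful HTTP status.
# Usage: wait_for_server URL [RETRIES] [DELAY_SECONDS]
wait_for_server() {
    local url="$1"
    local retries="${2:-60}"
    local delay="${3:-5}"
    local attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # -f: treat HTTP errors as failure; -s: silent; --max-time: bound each probe
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            echo "server is ready after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "server did not become ready at $url" >&2
    return 1
}

# Example (commented out; assumes the route exists on this server):
# wait_for_server http://127.0.0.1:8000/v1/models 60 5
```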

Send a Test Request

curl --location http://127.0.0.1:8000/v1/chat/completions --header 'Content-Type: application/json' --data '{
  "model": "qwen3.6",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {"url": "/image_path/qwen.png"} 
        },
        {"type": "text", "text": "What is the text in the illustrate?"}
      ]
    }
  ]
}'

The server returns a response like the following:

{"id":"cdcd6d14645846e69cc486554f198154","object":"chat.completion","created":1772098465,"model":"qwen3.6","choices":[{"index":0,"message":{"role":"assistant","content":"The user is asking about the text present in the image. I will analyze the image to identify the text.\n</think>\n\nThe text in the image is \"TONGyi Qwen\".","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"stop","matched_stop":248044}],"usage":{"prompt_tokens":98,"total_tokens":138,"completion_tokens":40,"prompt_tokens_details":null,"reasoning_tokens":0},"metadata":{"weight_version":"default"}}
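
In the sample response, `reasoning_content` is null and the `content` field carries the model's reasoning text followed by a `</think>` marker and then the answer. A client that only wants the answer can trim the reasoning itself. This is a minimal sketch: `strip_think` is a hypothetical helper, and it assumes the `content` string has already been extracted from the JSON and decoded, so that `\n` escapes are real newlines.

```shell
# Hypothetical helper: keep only the text after the last "</think>" marker.
# If the marker is absent, the input is returned unchanged (minus leading whitespace).
strip_think() {
    local text="$1"
    # Remove the longest prefix ending in "</think>"
    text="${text##*</think>}"
    # Remove leading whitespace, including the blank lines after the marker
    text="${text#"${text%%[![:space:]]*}"}"
    printf '%s\n' "$text"
}

# Example (commented out):
# strip_think "$content"
```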

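Disclaimer

1) This is currently an early-access preview; performance optimization is still in progress.

2) The datasets and models mentioned in this repository are examples only and are provided for non-commercial use. If you use them to work through the examples, take particular care to comply with the License of each dataset and model. Huawei assumes no liability for any infringement dispute arising from your use of these datasets or models.

3) If you find any problem while using this repository (including but not limited to functional or compliance issues), please submit an issue in this repository. We will review it and respond promptly.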