WinCLIP ViT-L/14-336px NPU

A zero-shot anomaly detection model based on CLIP ViT-L/14-336px, adapted for the Huawei Ascend NPU (Ascend 910B).

Model information

Item	Description
Model	WinCLIP (ViT-L/14-336px)
Task	Zero-shot anomaly detection
Dataset	VisA (Visual Anomaly detection)
Target hardware	Ascend 910B NPU
CANN version	8.5.1
Framework	PyTorch 2.9.0 + torch_npu

Quick deployment

1. Environment setup

# Install dependencies
pip install -r requirements.txt

# Download the pretrained weights (through the ghfast.top mirror)
~/.local/bin/aria2c --console-log-level=warn -x 16 -s 16 \
  -o ViT-L-14-336px.pt \
  "https://ghfast.top/https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt"

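A mirrored download is worth checking before use. The sketch below assumes that the long hex segment in the download URL is the SHA-256 digest of the checkpoint, which is how the openai/CLIP loader treats it; the helper names `sha256_of` and `verify_weights` are illustrative and not part of this repository.

```python
import hashlib
from pathlib import Path

# Hex segment copied from the download URL; assumed to be the file's SHA-256.
EXPECTED_SHA256 = "3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large checkpoint is never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected


if __name__ == "__main__":
    weights = Path("ViT-L-14-336px.pt")
    if weights.exists():
        print("checksum ok" if verify_weights(weights) else "checksum mismatch")
```

If the check reports a mismatch, the file was most likely truncated or altered in transit and should be downloaded again.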
2. Dataset preparation

# Download the VisA dataset
~/.local/bin/atomgit download weixin_72661020/VisA_20220922 -d ./models

# Extract the archive
cd models && tar xf VisA_20220922.tar && cd ..

# Convert to the MVTec layout
python3 convert_visa2mvtec.py --visa_root ./models --output_root ./data

3. Run inference

# One-step launch
bash run_npu.sh

# Or run manually
python3 inference.py

Performance tests

End-to-end time (12 categories)

Model	Time (s)	Change
ViT-B/16-plus-240 (baseline)	195.00	-
ViT-L/14-336px (optimized)	57.97	-70.3%
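The change column can be reproduced from the two timings. This is a small arithmetic sketch; the function name `relative_change` is chosen here for illustration only.

```python
def relative_change(baseline_s, optimized_s):
    """Percentage change in wall-clock time relative to the baseline."""
    return (optimized_s - baseline_s) / baseline_s * 100.0


change = relative_change(195.00, 57.97)
speedup = 195.00 / 57.97
print(f"{change:.1f}% time, {speedup:.2f}x speedup")
# prints: -70.3% time, 3.36x speedup
```

Expressed as a ratio, the same two numbers correspond to a speedup of roughly 3.36x.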

Accuracy

Category	AUROC	AUPR	F1-Max
candle	93.11%	93.69%	87.80%
capsules	75.22%	84.39%	78.07%
cashew	91.64%	96.11%	89.11%
chewinggum	97.52%	98.97%	95.92%
fryum	90.96%	95.85%	87.68%
macaroni1	75.43%	76.69%	72.20%
macaroni2	68.19%	63.93%	70.68%
pcb1	56.50%	59.60%	66.67%
pcb2	64.27%	68.22%	66.67%
pcb3	55.37%	58.60%	67.12%
pcb4	83.18%	85.90%	75.89%
pipe_fryum	94.65%	97.40%	90.91%
Mean	79.31%	81.61%	79.06%

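Performance benchmarks

Metric	Result
Single-image latency	7.56 ms
Single-image throughput (bs=1)	133 img/s
Throughput (bs=4)	395 img/s
Throughput (bs=8)	523 img/s
Throughput (bs=16)	631 img/s
Peak throughput (bs=128)	691 img/s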
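Latency and throughput figures of this kind are usually obtained with a warm-up phase followed by a timed loop. The following is a minimal, framework-independent sketch and not the repository's benchmark script; `benchmark`, `step`, and `sync` are hypothetical names. On an accelerator, `sync` would be a blocking device synchronization (with torch_npu, presumably `torch.npu.synchronize`), because kernels are launched asynchronously.

```python
import time


def benchmark(step, batch_size, warmup=5, iters=20, sync=None):
    """Time `step()` and return (latency in ms per call, throughput in img/s).

    `sync` is an optional zero-argument callable that blocks until the device
    has finished its queued work; without it the timer could stop too early.
    """
    for _ in range(warmup):      # warm-up calls are excluded from the timing
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000.0
    throughput = batch_size * iters / elapsed
    return latency_ms, throughput


if __name__ == "__main__":
    # Stand-in workload; replace with a real forward pass on one batch.
    latency_ms, throughput = benchmark(lambda: sum(range(10_000)), batch_size=8)
    print(f"{latency_ms:.3f} ms/batch, {throughput:.0f} img/s")
```

Throughput grows with batch size because fixed per-call overhead is spread over more images, which matches the trend in the table.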

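Optimization techniques

Mixed-precision inference: all parameters in FP16, except LayerNorm, which stays in FP32
Text feature caching: text features are computed only once per category
Batched text encoding: all text templates are encoded in a single pass
Reduced text templates: cut from 22 to 11 (the best balance found between speed and accuracy)
NPU environment tuning: TASK_QUEUE_ENABLE, CPU_AFFINITY_CONF, memory allocation tuning
Asynchronous data transfer: non_blocking=True, pin_memory=True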
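Text feature caching and batched text encoding work together: every prompt for a category goes through the encoder in one call, and the result is reused for each image of that category. The sketch below shows the idea with a stand-in encoder. `TextFeatureCache`, `TEMPLATES`, and `fake_encoder` are hypothetical, and the two templates are placeholders rather than the 11 templates used by this project.

```python
from typing import Callable, Dict, List, Sequence

# Placeholder templates; the project's actual templates are not listed here.
TEMPLATES = ["a photo of a {}.", "a photo of a damaged {}."]


class TextFeatureCache:
    """Encode each category's prompts once, in one batch, and reuse the result."""

    def __init__(self, encode_batch: Callable[[List[str]], Sequence]):
        self._encode_batch = encode_batch        # e.g. tokenizer + text encoder
        self._cache: Dict[str, Sequence] = {}

    def get(self, category: str) -> Sequence:
        if category not in self._cache:
            prompts = [t.format(category) for t in TEMPLATES]
            # One encoder call for all templates instead of one call per template.
            self._cache[category] = self._encode_batch(prompts)
        return self._cache[category]


if __name__ == "__main__":
    calls = []

    def fake_encoder(prompts):
        calls.append(len(prompts))
        return [float(len(p)) for p in prompts]  # stand-in for embeddings

    cache = TextFeatureCache(fake_encoder)
    for _ in range(100):                         # e.g. 100 images of one category
        cache.get("candle")
    print(calls)  # prints [2]: a single encoder call covering both prompts
```

Since the prompts depend only on the category name, the cache key can be the category itself, and the image loop never touches the text encoder after the first lookup.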

File structure

.
├── inference.py              # Main inference script
├── README.md                 # This file
├── requirements.txt          # Dependency list
├── run_npu.sh                # One-step launch script
├── benchmark_accuracy.py     # Accuracy evaluation
├── benchmark_throughput.py   # Throughput evaluation
├── benchmark_latency.py      # Latency evaluation
├── ViT-L-14-336.json         # Model configuration
├── logs/                     # Run logs
│   ├── inference_log.txt
│   └── benchmark_log.txt
└── data/                     # Dataset (prepare it yourself)
