PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks

PP-OCRv6 Overview

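PP-OCRv6 is a lightweight OCR system that combines architectural innovation with data-driven optimization. It redesigns the backbone, the detection neck, and the recognition neck around a unified MetaFormer-style building block with structural re-parameterization. Three model tiers (medium, small, tiny) share the same block primitives and cover deployment scenarios from servers to edge devices.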

Key Features

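Unified and scalable model family: a three-tier OCR model family ranging from 1.5M to 34.5M parameters. PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy, improving on PP-OCRv5_server by 4.6% and 5.1%, respectively.
Lightweight architectural innovations: (i) LCNetV4, a lightweight MetaFormer-style backbone that uses structural re-parameterization; (ii) RepLKFPN, a detection neck with dilated re-parameterizable depthwise convolutions; (iii) EncoderWithLightSVTR, a recognition neck with local-global attention and additive skip connections.
Multilingual and scenario support: supports 48 languages and diverse industrial scenarios (digital displays, dot-matrix characters, tire markings, and more), outperforming Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro with several orders of magnitude fewer parameters.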
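Structural re-parameterization, mentioned above for LCNetV4 and RepLKFPN, generally means training with several parallel branches and folding them into a single convolution for inference. The sketch below illustrates only that generic idea in one dimension, in plain Python. The branch layout (a 3-tap branch, a 1-tap branch, and an identity shortcut) and the helper names are assumptions made for illustration; the source does not describe the actual branch structure of PP-OCRv6, and batch-normalization folding is omitted.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding, so the output has the input's length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def merge_branches(k3, k1):
    """Fold a 3-tap branch, a 1-tap branch, and an identity shortcut into one 3-tap kernel."""
    merged = list(k3)
    merged[1] += k1[0]  # a 1-tap kernel acts only on the center position
    merged[1] += 1.0    # the identity shortcut is a 1-tap kernel with weight 1
    return merged


# Illustrative input and weights (not taken from any trained model)
x = [1.0, 2.0, -1.0, 0.5, 3.0]
k3 = [0.2, -0.5, 0.1]
k1 = [0.7]

# Training-time form: three parallel branches summed
multi = [a + b + c for a, b, c in zip(conv1d_same(x, k3), conv1d_same(x, k1), x)]
# Inference-time form: one convolution with the merged kernel
single = conv1d_same(x, merge_branches(k3, k1))

# The two forms agree up to floating-point rounding
print(max(abs(m - s) for m, s in zip(multi, single)))
```

Because convolution is linear, the merged kernel reproduces the multi-branch output exactly, which is why such a block can be trained with extra branches and deployed as a single operator.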

PP-OCRv6_tiny_det

Introduction

Figure: PP-OCRv6 text detection architecture overview

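PP-OCRv6_tiny_det is the lightweight model in the PP-OCRv6 detection series developed by the PaddleOCR team. It uses LCNetV4 as the backbone and RepLKFPN as the feature pyramid neck, and it localizes text accurately across a range of scenarios, including handwritten, printed, rotated, and curved text as well as multilingual artistic fonts. The model has 0.43M parameters. Key accuracy metrics are as follows: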

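Model	Average	Handwritten Chinese	Handwritten English	Printed Chinese	Printed English	Traditional Chinese	Ancient Text	Japanese	Blurred	Emoji	Distorted	Pinyin	Artistic Text	Table	Rotated	Industrial Scenes	General Scenes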
Gemini-3.1-Pro	46.8	53.4	56.5	47.3	47.6	39.0	45.8	38.2	50.0	68.1	44.6	40.6	65.2	26.9	22.1	52.5	50.2
GPT-5.5	45.6	42.4	58.5	50.2	51.9	35.0	26.7	42.0	49.1	97.5	37.7	36.3	52.0	71.0	10.0	36.2	32.6
Qwen3-VL-235B	38.3	56.5	66.0	41.7	37.0	19.3	13.1	27.0	38.5	81.2	28.5	33.0	68.3	19.6	2.1	48.4	32.3
Kimi-K2.6	12.8	12.5	25.5	10.1	18.5	8.2	7.5	11.2	16.9	28.9	13.9	6.8	16.1	10.9	0.8	6.3	10.9
MiniMax-M3	12.0	13.7	19.3	9.8	14.1	7.7	11.1	10.6	16.1	32.8	12.8	8.5	16.6	5.5	0.1	6.4	6.4
PP-OCRv5_server	81.6	80.3	84.1	94.5	91.7	81.5	67.6	77.2	90.1	96.2	87.6	67.1	67.3	97.1	80.0	64.3	79.7
PP-OCRv5_mobile	75.2	74.4	77.7	90.5	91.0	82.3	58.1	72.7	87.4	93.6	82.7	57.5	52.5	92.8	64.7	52.8	72.1
PP-OCRv6_medium	86.2	83.7	84.0	95.1	93.7	86.3	80.2	84.3	94.1	99.6	88.6	74.0	69.0	96.8	93.8	73.3	82.8
PP-OCRv6_small	84.1	80.5	87.1	94.2	93.6	85.7	72.6	82.3	92.6	99.7	87.6	69.6	65.3	95.6	93.7	67.6	78.2
PP-OCRv6_tiny	80.6	79.4	85.9	93.1	92.3	83.7	63.0	76.6	89.3	99.8	86.1	59.0	60.1	94.7	91.0	62.0	73.8
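To make the comparison concrete, the Average column of the table can be turned into a small lookup and the gaps computed directly. The numbers below are copied from the table; the dictionary name and the `gain` helper are illustrative only. The 86.2 average for PP-OCRv6_medium matches the detection Hmean quoted in the key features, so these values are presumably Hmean percentages.

```python
# Average column copied from the table above (values in percent)
average_score = {
    "Gemini-3.1-Pro": 46.8,
    "GPT-5.5": 45.6,
    "Qwen3-VL-235B": 38.3,
    "PP-OCRv5_server": 81.6,
    "PP-OCRv5_mobile": 75.2,
    "PP-OCRv6_medium": 86.2,
    "PP-OCRv6_small": 84.1,
    "PP-OCRv6_tiny": 80.6,
}


def gain(model, baseline, scores=average_score):
    """Absolute difference in percentage points, rounded to one decimal place."""
    return round(scores[model] - scores[baseline], 1)


print(gain("PP-OCRv6_medium", "PP-OCRv5_server"))  # 4.6
print(gain("PP-OCRv6_tiny", "PP-OCRv5_mobile"))    # 5.4

# Models ordered from highest to lowest average score
ranking = sorted(average_score, key=average_score.get, reverse=True)
print(ranking[:3])
```

The first figure reproduces the 4.6 improvement of PP-OCRv6_medium over PP-OCRv5_server stated earlier, which indicates that the detection gain is an absolute difference in points.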

Quick Start

Installation

# Install PaddleOCR
pip install paddleocr

# Install ONNX Runtime
pip install onnxruntime-gpu  # or onnxruntime for CPU-only
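After installing, it can help to confirm that both packages are visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library for the lookup; the `installed` helper is a hypothetical name, and the provider listing runs only if `onnxruntime` is actually present.

```python
import importlib.util


def installed(packages=("paddleocr", "onnxruntime")):
    """Map each top-level package name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = installed()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")

if status["onnxruntime"]:
    import onnxruntime as ort

    # Both onnxruntime and onnxruntime-gpu are imported under the name "onnxruntime".
    # The GPU build is expected to list "CUDAExecutionProvider" here.
    print(ort.get_available_providers())
```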

Model Usage

You can try the model with a single command:

paddleocr text_detection \
    --model_name PP-OCRv6_tiny_det \
    --engine onnxruntime \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png

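You can also integrate inference with the text detection module into your own project. Before running the code below, download the sample image to your local machine.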

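from paddleocr import TextDetection

# Load the detection model and run it through ONNX Runtime.
model = TextDetection(model_name="PP-OCRv6_tiny_det", engine="onnxruntime")
# The input is the sample image downloaded beforehand.
output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
for res in output:
    res.print()  # print the detection result
    res.save_to_img(save_path="./output/")  # save the visualization
    res.save_to_json(save_path="./output/res.json")  # save the result as JSON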
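Downstream code often needs axis-aligned boxes rather than the polygons a text detector produces. The sketch below assumes each detection is a polygon given as a list of `[x, y]` points with an accompanying confidence score. In PaddleOCR results these fields are, to the best of our understanding, named `dt_polys` and `dt_scores`, but that is an assumption to verify against the saved `res.json`. The helper names, the sample data, and the 0.5 threshold are all hypothetical.

```python
def polygon_to_bbox(polygon):
    """Return (x_min, y_min, x_max, y_max) for one polygon given as [[x, y], ...]."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def filter_boxes(polygons, scores, min_score=0.5):
    """Convert polygons to boxes, keeping those whose score reaches min_score."""
    return [polygon_to_bbox(poly) for poly, score in zip(polygons, scores) if score >= min_score]


# Hypothetical detections, not real model output
polys = [
    [[10, 20], [110, 22], [108, 60], [9, 58]],
    [[200, 5], [260, 5], [260, 30], [200, 30]],
]
scores = [0.93, 0.41]

print(filter_boxes(polys, scores))  # [(9, 20, 110, 60)]
```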

For detailed usage instructions and parameter descriptions, refer to the documentation.

Pipeline Usage

The general OCR pipeline extracts text from images. It consists of the following modules:

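Document image orientation classification module (optional)
Text image unwarping module (optional)
Text line orientation classification module (optional)
Text detection module
Text recognition module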
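The three optional modules correspond to the `use_doc_orientation_classify`, `use_doc_unwarping`, and `use_textline_orientation` switches used in the commands in this section. As a rough illustration of how those switches select modules, the hypothetical helper below returns the enabled modules in the order of the list above, which is not necessarily the order in which the pipeline executes them. The function and its default values are assumptions made for this sketch, not part of PaddleOCR.

```python
def active_modules(use_doc_orientation_classify=False,
                   use_doc_unwarping=False,
                   use_textline_orientation=False):
    """Return the enabled pipeline modules, following the order of the module list."""
    optional = [
        ("document image orientation classification", use_doc_orientation_classify),
        ("text image unwarping", use_doc_unwarping),
        ("text line orientation classification", use_textline_orientation),
    ]
    modules = [name for name, enabled in optional if enabled]
    # Detection and recognition are the two modules not marked optional.
    modules += ["text detection", "text recognition"]
    return modules


# Same switch values as the command-line example in this section
print(active_modules(use_textline_orientation=True))
```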

Run a single command to try the OCR pipeline:

paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
    --text_detection_model_name PP-OCRv6_tiny_det \
    --text_recognition_model_name PP-OCRv6_tiny_rec \
    --engine onnxruntime \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation True \
    --save_path ./output \
    --device gpu:0

To integrate the pipeline into your project:

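from paddleocr import PaddleOCR

# Build the pipeline with the tiny detection and recognition models.
ocr = PaddleOCR(
    text_detection_model_name="PP-OCRv6_tiny_det",
    text_recognition_model_name="PP-OCRv6_tiny_rec",
    engine="onnxruntime",
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,  # skip text image unwarping
    use_textline_orientation=False,  # skip text line orientation classification
)
# Run OCR on the locally downloaded sample image.
result = ocr.predict("./3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()  # print the OCR result
    res.save_to_img("output")  # save the visualization
    res.save_to_json("output")  # save the result as JSON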

For detailed usage instructions and parameter descriptions, refer to the documentation.

Links

PaddleOCR Repository

PaddleOCR Documentation

Citation

@misc{zhang2026ppocrv6,
  title={PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks},
  author={Yubo Zhang and Xueqing Wang and Manhui Lin and Yue Zhang and Penglongyi Deng and Ting Sun and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Changda Zhou and Hongen Liu and Suyin Liang and Cheng Cui and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2026},
  eprint={2606.13108},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.13108},
}