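PaddlePaddle/PP-OCRv6_medium_rec_safetensors

PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks

PP-OCRv6 Overview

PP-OCRv6 is a lightweight OCR system that combines architectural innovation with data-driven optimization. It redesigns the backbone, the detection neck, and the recognition neck around a unified MetaFormer-style building block with structural re-parameterization. Three model tiers (medium, small, tiny) share the same block primitives and cover deployment scenarios from servers to edge devices.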
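
For readers new to structural re-parameterization, the general idea is that several parallel linear branches used at training time can be folded into a single convolution for inference. The sketch below is a generic, RepVGG-style illustration in one dimension using only the standard library. It is not the LCNetV4 block: the branch layout, the helper names `conv1d_same` and `fuse_branches`, and the omission of normalization layers are all assumptions made for illustration.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1-D cross-correlation with zero padding so the output length equals the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def fuse_branches(k3: List[float], k1: float) -> List[float]:
    """Merge a 3-tap branch, a 1-tap branch, and an identity branch into one 3-tap kernel."""
    # The 1-tap kernel and the identity both act on the centre position only,
    # so their weights can be added to the centre weight of the 3-tap kernel.
    return [k3[0], k3[1] + k1 + 1.0, k3[2]]


x = [1.0, 2.0, -1.0, 0.5]
k3, k1 = [0.2, -0.5, 0.3], 0.7

# Training-time form: three parallel branches whose outputs are summed.
multi_branch = [a + b + c for a, b, c in zip(conv1d_same(x, k3), conv1d_same(x, [k1]), x)]

# Inference-time form: a single convolution with the fused kernel.
fused = conv1d_same(x, fuse_branches(k3, k1))

# Both forms compute the same function (up to floating-point rounding).
assert all(abs(a - b) < 1e-9 for a, b in zip(multi_branch, fused))
```

Because every branch is linear, the fused kernel reproduces the multi-branch output exactly, which is why this kind of block costs nothing extra at inference time.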

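Key Features

Unified and scalable model family: a three-tier OCR model family ranging from 1.5M to 34.5M parameters. PP-OCRv6_medium achieves 86.2% detection Hmean and 83.2% recognition accuracy, improving on PP-OCRv5_server by +4.6% and +5.1%, respectively.
Lightweight architectural innovations: (i) LCNetV4, a MetaFormer-style lightweight backbone with structural re-parameterization; (ii) RepLKFPN, a detection neck with dilated re-parameterizable depthwise convolutions; (iii) EncoderWithLightSVTR, a recognition neck with local-global attention and additive skip connections.
Multilingual and scenario support: supports 50 languages and diverse industrial scenarios (digital displays, dot-matrix characters, tire markings, and more), outperforming Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro with several orders of magnitude fewer parameters.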
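
To make the "orders of magnitude" comparison concrete, the following back-of-the-envelope calculation compares the two ends of the PP-OCRv6 parameter range with Qwen3-VL-235B. It assumes that the "235B" in that model's name is its parameter count; the other VLMs are left out because this document does not state their sizes.

```python
import math

VLM_PARAMS = 235e9  # assumption: taken from the "235B" in the name Qwen3-VL-235B

# The two ends of the PP-OCRv6 family range stated above.
FAMILY = [("largest (34.5M)", 34.5e6), ("smallest (1.5M)", 1.5e6)]


def orders_of_magnitude(large: float, small: float) -> float:
    """Return log10 of the size ratio between two parameter counts."""
    return math.log10(large / small)


for label, params in FAMILY:
    ratio = VLM_PARAMS / params
    print(f"PP-OCRv6 {label}: {ratio:,.0f}x fewer parameters, "
          f"about {orders_of_magnitude(VLM_PARAMS, params):.1f} orders of magnitude")
```

Under that assumption the gap is a little under four orders of magnitude at the top of the family and a little over five at the bottom.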

PP-OCRv6_medium_rec

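Introduction

PP-OCRv6 text recognition architecture overview

PP-OCRv6_medium_rec is the largest recognition model in the PP-OCRv6 series. It uses LCNetV4 as the backbone and EncoderWithLightSVTR as the recognition neck, paired with a CTC+NRTR multi-head decoder. The model supports 50 languages and has 19M parameters. Key accuracy metrics are as follows: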

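Model	W-Avg	Handwritten Chinese	Handwritten English	Printed Chinese	Printed English	TC	Ancient Text	Japanese	Confusable Characters	Special Characters	General Scenes	Pinyin	Artistic Text	Industrial	Screen Text	ID Document Text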
GPT-5.5	64.2	19.2	56.9	75.7	82.2	57.5	63.7	58.6	49.1	48.3	67.7	50.4	53.0	62.4	67.7	71.1
Qwen3-VL-235B	74.9	49.7	73.2	82.3	86.2	76.4	33.6	66.2	56.1	49.0	82.5	76.5	69.6	74.7	73.8	78.7
Kimi-K2.6	62.9	31.0	58.4	76.8	80.9	62.7	16.5	54.1	43.5	38.0	68.0	45.2	59.9	57.1	58.4	68.4
MiniMax-M3	54.1	15.5	60.3	63.5	81.5	53.2	2.2	43.7	42.2	42.8	53.8	50.3	44.3	44.1	56.6	67.0
Gemini-3.1-Pro	71.4	46.4	73.0	80.0	90.5	69.5	18.0	67.2	54.4	50.3	74.6	75.9	63.1	69.1	73.2	75.9
PP-OCRv5_server	78.1	58.0	59.6	90.1	85.1	74.7	60.4	73.7	59.4	56.8	86.5	74.4	64.0	70.2	68.1	87.6
PP-OCRv5_mobile	73.7	41.7	50.9	86.0	86.0	72.0	57.8	75.8	55.7	54.8	80.7	72.5	54.0	59.3	57.6	81.7
PP-OCRv6_medium	83.2	62.1	67.8	91.5	94.1	78.6	72.4	90.5	64.9	61.7	87.5	78.1	71.2	77.4	82.5	88.1
PP-OCRv6_small	81.3	57.6	61.1	90.5	93.3	77.0	71.1	88.2	64.1	60.2	85.7	75.9	68.4	76.4	79.7	86.9
PP-OCRv6_tiny	73.5	40.1	39.3	86.7	88.4	65.0	68.4	89.8	52.3	57.1	78.0	65.4	54.7	62.1	71.2	80.5
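
As a quick way to read the table, the snippet below computes the per-category gain of PP-OCRv6_medium over PP-OCRv5_server. The numbers are copied from the two corresponding rows; the English category labels are this sketch's own naming of the columns.

```python
CATEGORIES = [
    "W-Avg", "Handwritten Chinese", "Handwritten English", "Printed Chinese",
    "Printed English", "TC", "Ancient Text", "Japanese", "Confusable Characters",
    "Special Characters", "General Scenes", "Pinyin", "Artistic Text",
    "Industrial", "Screen Text", "ID Document Text",
]
# Rows copied from the table above.
V5_SERVER = [78.1, 58.0, 59.6, 90.1, 85.1, 74.7, 60.4, 73.7,
             59.4, 56.8, 86.5, 74.4, 64.0, 70.2, 68.1, 87.6]
V6_MEDIUM = [83.2, 62.1, 67.8, 91.5, 94.1, 78.6, 72.4, 90.5,
             64.9, 61.7, 87.5, 78.1, 71.2, 77.4, 82.5, 88.1]

# Absolute difference in points, rounded to the table's one-decimal precision.
gains = {c: round(new - old, 1) for c, old, new in zip(CATEGORIES, V5_SERVER, V6_MEDIUM)}

largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)

print(f"W-Avg gain: {gains['W-Avg']}")                     # W-Avg gain: 5.1
print(f"Largest gain: {largest} ({gains[largest]})")       # Largest gain: Japanese (16.8)
print(f"Smallest gain: {smallest} ({gains[smallest]})")    # Smallest gain: ID Document Text (0.5)
```

The W-Avg difference of 5.1 points matches the +5.1% recognition improvement quoted under the key features.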

Quick Start

Installation

PaddleOCR

# Install the basic version
pip install paddleocr

# Install the full version (includes all features)
pip install "paddleocr[all]"

Transformers environment (required for the safetensors model)

pip install transformers torch
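
After installing, it can be useful to confirm that the packages are visible to the interpreter you plan to use. The helper below is a small sketch based on the standard library's `importlib.util.find_spec`; the function name `check_packages` is hypothetical, and it only checks that each top-level package can be located, not that its version is compatible.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be located for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The three packages installed by the commands above.
status = check_packages(["paddleocr", "transformers", "torch"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```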

Model Usage

You can quickly try the model with a single command:

paddleocr text_recognition \
    --model_name PP-OCRv6_medium_rec \
    --engine transformers \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/2PZfbirjfxA88695lRmgk.jpeg
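
The sample image used in the command above can also be saved to disk for local experiments. The sketch below uses only the standard library; the helper names `local_name` and `download_sample` are hypothetical, and the download itself is wrapped in a function so that nothing is fetched until you call it.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse

SAMPLE_URL = (
    "https://cdn-uploads.huggingface.co/production/uploads/"
    "681c1ecd9539bdde5ae1733c/2PZfbirjfxA88695lRmgk.jpeg"
)


def local_name(url: str) -> str:
    """Return the file-name component of a URL path."""
    return posixpath.basename(urlparse(url).path)


def download_sample(url: str = SAMPLE_URL, dest_dir: str = ".") -> str:
    """Download url into dest_dir (skipping files that already exist) and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download_sample()` with no arguments would place `2PZfbirjfxA88695lRmgk.jpeg` in the current directory.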

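You can also integrate inference with the text recognition module into your own project. Before running the following code, download the sample image to your local machine.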

from paddleocr import TextRecognition
model = TextRecognition(model_name="PP-OCRv6_medium_rec", engine="transformers")
output = model.predict(input="2PZfbirjfxA88695lRmgk.jpeg", batch_size=1)
for res in output:
    res.print()
    res.save_to_json(save_path="./output/res.json")

After running, the result is:

{'res': {'input_path': '2PZfbirjfxA88695lRmgk.jpeg', 'page_index': None, 'rec_text': 'day as a reminder of the', 'rec_score': 0.9857}}
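
A common next step is to keep a recognized string only when its confidence is high enough. The sketch below works on a plain dictionary shaped like the printed result above; the helper `accepted_text` and the 0.9 default threshold are illustrative assumptions, not part of PaddleOCR.

```python
from typing import Any, Dict, Optional


def accepted_text(result: Dict[str, Any], min_score: float = 0.9) -> Optional[str]:
    """Return rec_text when rec_score reaches min_score, otherwise None."""
    res = result["res"]
    return res["rec_text"] if res["rec_score"] >= min_score else None


# The dictionary printed in the example above.
sample = {
    "res": {
        "input_path": "2PZfbirjfxA88695lRmgk.jpeg",
        "page_index": None,
        "rec_text": "day as a reminder of the",
        "rec_score": 0.9857,
    }
}

print(accepted_text(sample))        # prints: day as a reminder of the
print(accepted_text(sample, 0.99))  # prints: None
```

The right threshold depends on the application, so treat the default here as a placeholder to tune on your own data.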

[Figure: visualization of the recognition result]

For details on command usage and parameters, please refer to the documentation.
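
The introduction mentions a CTC+NRTR multi-head decoder. For readers unfamiliar with CTC, the sketch below shows generic greedy CTC decoding: take the most likely class at each timestep, merge consecutive repeats, and drop blanks. It is not PaddleOCR's implementation. The blank index, the toy character set, and averaging the kept probabilities into a single score are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      charset: Sequence[str]) -> Tuple[str, float]:
    """Collapse per-timestep argmax labels into text.

    probs[t][k] is the probability of class k at timestep t;
    class k > 0 maps to charset[k - 1].
    """
    chars: List[str] = []
    scores: List[float] = []
    prev = BLANK
    for step in probs:
        best = max(range(len(step)), key=lambda k: step[k])
        # Keep a label only if it is not blank and not a repeat of the previous step.
        if best != BLANK and best != prev:
            chars.append(charset[best - 1])
            scores.append(step[best])
        prev = best
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return "".join(chars), mean_score


toy_charset = ["a", "b"]
toy_probs = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, merged)
]
text, score = ctc_greedy_decode(toy_probs, toy_charset)
print(text)  # prints: aab
```

The blank between the second and fourth timesteps is what allows the doubled "a" to survive the merge step.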

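Pipeline Usage

The general OCR pipeline solves text recognition tasks by extracting text information from images. It consists of the following modules:

Document image orientation classification module (optional)
Text image unwarping module (optional)
Text line orientation classification module (optional)
Text detection module
Text recognition module

Run a single command to quickly try the OCR pipeline: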

paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
    --text_detection_model_name PP-OCRv6_medium_det \
    --text_recognition_model_name PP-OCRv6_medium_rec \
    --engine transformers \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation True \
    --save_path ./output \
    --device gpu:0

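If save_path is specified, the visualization results are saved to that directory.

[Figure: visualization of the OCR pipeline output]

To integrate the pipeline into your project: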

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    text_detection_model_name="PP-OCRv6_medium_det",
    text_recognition_model_name="PP-OCRv6_medium_rec",
    engine="transformers",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=True,
)
result = ocr.predict("./3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
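
The three `use_*` arguments above switch the optional modules on or off, while detection and recognition always run. The helper below is a hypothetical sketch that lists which modules a given configuration would involve. It follows the order of the module list given earlier, which may differ from the actual execution order, and its default values are arbitrary rather than PaddleOCR's defaults.

```python
from typing import List


def active_stages(use_doc_orientation_classify: bool = False,
                  use_doc_unwarping: bool = False,
                  use_textline_orientation: bool = False) -> List[str]:
    """List the pipeline modules involved for a given combination of flags."""
    stages: List[str] = []
    if use_doc_orientation_classify:
        stages.append("document image orientation classification")
    if use_doc_unwarping:
        stages.append("text image unwarping")
    if use_textline_orientation:
        stages.append("text line orientation classification")
    # Detection and recognition are the two non-optional modules.
    stages.append("text detection")
    stages.append("text recognition")
    return stages


# Same flag values as in the example above.
for stage in active_stages(use_doc_orientation_classify=False,
                           use_doc_unwarping=False,
                           use_textline_orientation=True):
    print(stage)
```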

For detailed usage and parameter explanations, please refer to the documentation.

Links

PaddleOCR repository

PaddleOCR documentation

Citation

@misc{zhang2026ppocrv6,
  title={PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks},
  author={Yubo Zhang and Xueqing Wang and Manhui Lin and Yue Zhang and Penglongyi Deng and Ting Sun and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Changda Zhou and Hongen Liu and Suyin Liang and Cheng Cui and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2026},
  eprint={2606.13108},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.13108},
}
