一種多模型融合的中文古籍OCR後處理方法=A Post-OCR Method of Multi-Model Ensemble for Chinese Ancient Scriptures

釋賢度 (著)=Shih, Hsien-du (au.)

Source

數位典藏與數位人文=Journal of Digital Archives and Digital Humanities

Volume

n.11

Date

2023.04

Pages

83 - 104

Publisher

臺灣數位人文學會=Taiwanese Association for Digital Humanities

Publisher Url

https://tadh.org.tw/

Location

臺北市, 臺灣 [Taipei shih, Taiwan]

Content type

期刊論文=Journal Article

Language

中文=Chinese

Keyword

post-OCR; 古籍=Ancient Scriptures; 模型融合=model ensemble; 版面分析=layout analysis; 深度學習=deep learning

Abstract

本文提出一種多模型融合的OCR後處理方法，採用獨特的版面分析和對齊算法，整合了整頁檢測模型、字識別模型、列識別模型與語言預訓練模型等深度學習模型，實現了超越單一模型的效果。全文錯誤率達到1.64%，僅為單一模型平均錯誤率的23%。在各類常規古籍版式場景中，該方法具有較好的泛用性。

This paper proposes a multi-model ensemble post-OCR method. Using a distinctive layout-analysis and alignment algorithm, it integrates several types of deep learning models (a full-page character detection model, a character recognition model, a line recognition model, and a pre-trained language model) and outperforms any single model. The full-text error rate is 1.64%, only 23% of the average error rate of the individual models. The method generalizes well across the common layouts of ancient books.
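Taken at face value, the abstract's two figures imply an average single-model error rate of roughly 1.64% ÷ 0.23 ≈ 7.1% (the 23% is presumably rounded). This record does not describe the paper's alignment or fusion algorithm, so the following is only a minimal Python sketch of why combining models can lower the error rate. It assumes the candidate transcriptions are already aligned character by character (one character per detected glyph position) and fuses them by per-position majority vote, a generic technique swapped in for illustration. It also assumes the "full-text error rate" is a character error rate, i.e. edit distance divided by reference length. The function names `levenshtein`, `char_error_rate` and `vote`, the tie-break rule, and the sample strings are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Assumed definition: edit distance divided by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(hypothesis, reference) / len(reference)


def vote(candidates: list[str], priority: list[int] | None = None) -> str:
    """Per-position majority vote over candidates that are ALREADY aligned
    (same length, one character per glyph position).

    Ties are broken in favour of the candidate listed first in `priority`
    (a hypothetical rule; defaults to the input order)."""
    if len({len(c) for c in candidates}) != 1:
        raise ValueError("candidates must be non-empty and aligned to the same length")
    order = priority if priority is not None else range(len(candidates))
    fused = []
    for column in zip(*candidates):
        counts = Counter(column)
        best = max(counts.values())
        # Take the highest-priority model's character among the most frequent ones;
        # fall back to Counter's own choice if `priority` omits some models.
        winner = next(
            (column[i] for i in order if counts[column[i]] == best),
            counts.most_common(1)[0][0],
        )
        fused.append(winner)
    return "".join(fused)


if __name__ == "__main__":
    reference = "如是我聞一時佛在舍衛國"
    # Three hypothetical model outputs, each with one different substitution error.
    outputs = [
        "如是我間一時佛在舍衛國",
        "如是我聞一時佛在含衛國",
        "加是我聞一時佛在舍衛國",
    ]
    fused = vote(outputs)
    print([round(char_error_rate(o, reference), 4) for o in outputs])  # [0.0909, 0.0909, 0.0909]
    print(fused == reference, char_error_rate(fused, reference))       # True 0.0
```

Because each hypothetical model errs at a different position, every position still has a two-to-one majority for the correct character, so the fused line matches the reference although each model alone has a character error rate of 1/11. Real OCR outputs are rarely aligned this cleanly (characters can be merged, split or missed), which is why an explicit alignment step is needed before any voting.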

Table of contents

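壹、背景=I. Background 84

貳、原理=II. Principles 87
一、圖片檢測=1. Image detection 88
二、字識別=2. Character recognition 89
三、版面分析=3. Layout analysis 89
四、列識別=4. Line recognition 91
五、字列融合=5. Character-line fusion 91
六、語義預測=6. Semantic prediction 92
七、語義校正=7. Semantic correction 92

參、實驗=III. Experiments 93

肆、分析=IV. Analysis 95

伍、結論=V. Conclusion 96

參考文獻=References 97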

ISSN

26165732 (E)

DOI

https://www.airitilibrary.com/Common/Click_DOI?DOI=10.6853/DADH.202304_(11).0002

Created date

2023.10.18

Modified date

2023.10.23

Serial No.
684394