一種多模型融合的中文古籍OCR後處理方法=A Post-OCR Method of Multi-Model Ensemble for Chinese Ancient Scriptures |
|
|
|
Author |
釋賢度 (著)=Shih, Hsien-du (au.)
|
Source |
數位典藏與數位人文=Journal of Digital Archives and Digital Humanities
|
Volume | n.11 |
Date | 2023.04 |
Pages | 83 - 104 |
Publisher | 臺灣數位人文學會=Taiwanese Association for Digital Humanities |
Publisher Url |
https://tadh.org.tw/
|
Location | 臺北市, 臺灣 [Taipei shih, Taiwan] |
Content type | 期刊論文=Journal Article |
Language | 中文=Chinese |
Keyword | post-OCR; 古籍=Ancient Scriptures; 模型融合=model ensemble; 版面分析=layout analysis; 深度學習=deep learning |
Abstract | 本文提出一種多模型融合的OCR後處理方法,採用獨特的版面分析和對齊算法,整合了整頁檢測模型、字識別模型、列識別模型與語言預訓練模型等深度學習模型,實現了超越單一模型的效果。全文錯誤率達到1.64%,僅為單一模型平均錯誤率的23%。在各類常規古籍版式場景中,該方法具有較好的泛用性。
This paper proposes a multi-model ensemble method for OCR post-processing. It uses a dedicated layout analysis and alignment algorithm to integrate several deep learning models, including a full-page character detection model, a character recognition model, a line recognition model, and a pre-trained language model, and it achieves results beyond those of any single model. The full-text error rate reaches 1.64%, which is only 23% of the average error rate of the individual models. The method generalizes well across the conventional layouts found in ancient books. |
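Table of contents | I. Background 84
II. Principles 87: 1. Image Detection 88; 2. Character Recognition 89; 3. Layout Analysis 89; 4. Line Recognition 91; 5. Character-Line Fusion 91; 6. Semantic Prediction 92; 7. Semantic Correction 92
III. Experiments 93
IV. Analysis 95
V. Conclusion 96
References 97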
|
ISSN | 2616-5732 (E) |
DOI | https://www.airitilibrary.com/Common/Click_DOI?DOI=10.6853/DADH.202304_(11).0002 |
Created date | 2023.10.18 |
Modified date | 2023.10.23 |