多版本佛經文獻之檢索及瀏覽系統設計=Design of a Searching and Browsing System for Multiple-Version Buddhist Scriptures

徐惠芬=Hsu, Hui-fen (author)

Publication Date

2000

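Publisher

National Taiwan University

Publisher URL

https://www.ntu.edu.tw/

Place of Publication

Taipei City, Taiwan

Document Type

Thesis and Dissertation

Language

Chinese

Degree

Master's

Institution

National Taiwan University

Department

Department of Computer Science and Information Engineering

Advisor

Oyang, Yen-Jen (歐陽彥正)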

Keywords

Multiple Versions; Inverted Files; Approximate String Matching; Mutual Information; Single Link Clustering; XML; Multi-tier Client/Server; Daemon Program; Multiple-Version Comparison

Abstract

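Ancient texts and religious scriptures often exist in multiple versions. These versions arose because translators working in different periods and contexts interpreted and worded material of common origin differently. For textual and religious scholarship, verifying sources, tracing citations, and checking the accuracy of translations are all important concerns. Since many of these scriptures have now been digitized, this thesis aims to provide an online browsing and retrieval system for such multiple-version texts.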

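The multiple-version retrieval system uses a multi-tier client/server architecture. The thesis proposes a design for each of its three main parts: data preprocessing, the application server, and the cache. Data preprocessing parses the multiple-version XML files, extracts terms, analyzes term associations, and builds index files with the inverted-file method for use in retrieval. Besides basic search, the application server provides quotation search (approximate string search), multiple-version comparison, and single-version browsing. The cache speeds up system response time.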
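A minimal sketch of an inverted file for this kind of material. It assumes that terms are overlapping character bigrams, a common choice for unsegmented Chinese text; the abstract does not describe how the thesis actually extracts terms from the XML files. Each posting records the positions of a term within a document, which a later stage could use for term-order checks, although the lookup below only intersects document sets. `bigrams`, `build_inverted_index`, and `search_all_terms` are hypothetical names.

```python
from collections import defaultdict


def bigrams(text):
    """Split text into overlapping character bigrams (assumed term unit)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_inverted_index(docs):
    """Map each bigram to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(bigrams(text)):
            index[term][doc_id].append(pos)
    return index


def search_all_terms(index, query):
    """Return sorted doc_ids containing every bigram of the query (Boolean AND)."""
    terms = bigrams(query)
    if not terms:
        return []
    result = None
    for term in terms:
        # .get avoids creating empty entries in the defaultdict
        postings = set(index.get(term, {}))
        result = postings if result is None else result & postings
        if not result:
            return []
    return sorted(result)
```

Intersecting the shortest posting list first would be the usual optimization once the index grows; it is left out here to keep the sketch short.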

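Besides automatically generating multiple-version data from the original XML files for browsing, the system has two main features: term analysis and quotation search. Term analysis uses mutual information to compute the similarity between features, takes out the low-frequency terms, and applies single-link clustering so that terms are automatically grouped into sets of related terms. Quotation search combines retrieval (computing the similarity between the query and each document) with text search (string matching): it first considers the similarity between a document and the query string, and then the order in which the query's keywords occur (term sequence). Its purpose is to find, in an error-tolerant way, the source of an incomplete passage quoted in ancient literature (for example, one with missing, wrong, or extra characters).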
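The term-analysis step can be sketched as follows, assuming pointwise mutual information estimated from document co-occurrence and a fixed similarity threshold; the abstract gives neither the exact mutual-information formula nor the cutoff, and the low-frequency term selection step is omitted. Cutting single-link clustering at a threshold is equivalent to taking the connected components of the graph whose edges are term pairs scoring at or above that threshold, which is what the union-find below computes. `pmi_scores`, `single_link_clusters`, and `threshold` are hypothetical names.

```python
import math
from itertools import combinations


def pmi_scores(doc_terms):
    """PMI for every term pair that co-occurs in at least one document.

    doc_terms: list of sets of terms, one set per document (assumed format).
    PMI(a, b) = log(P(a, b) / (P(a) P(b))) with probabilities taken from
    document frequencies, which simplifies to log(n_ab * N / (df_a * df_b)).
    """
    n_docs = len(doc_terms)
    df, co_df = {}, {}
    for terms in doc_terms:
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for pair in combinations(sorted(terms), 2):
            co_df[pair] = co_df.get(pair, 0) + 1
    return {
        (a, b): math.log(n_ab * n_docs / (df[a] * df[b]))
        for (a, b), n_ab in co_df.items()
    }


def single_link_clusters(scores, threshold, terms):
    """Single-link clustering cut at `threshold`: any chain of pairs scoring
    >= threshold puts its terms in one cluster. `terms` must list every term."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), s in scores.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), set()).add(t)
    return sorted(sorted(c) for c in clusters.values())
```

Single-link chaining means two terms that never co-occur can still share a cluster through an intermediate term, as in the `a`, `b`, `c` case of the checks.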

For ancient books and articles, it is common for multiple versions of the same text to exist, for a variety of reasons. For content experts studying these materials, comparing different versions is an important research task and may yield valuable insights. Since a large volume of these ancient books and articles has been digitized, modern information processing technologies should be employed to support the work of content experts. This thesis discusses the design of a browsing and search system for multiple-version ancient materials.
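Comparing two versions of a passage can be sketched with Python's standard `difflib` module, which aligns the versions character by character and reports the regions that differ. Using `difflib` is a substitution for illustration only; the abstract does not say how the thesis aligns versions. `compare_versions` is a hypothetical name.

```python
import difflib


def compare_versions(base, variant):
    """Character-level differences between two versions of one passage.

    Returns (operation, base_segment, variant_segment) tuples for every
    non-identical region, where operation is 'replace', 'delete', or 'insert'.
    """
    # autojunk=False keeps frequent characters from being treated as junk
    matcher = difflib.SequenceMatcher(None, base, variant, autojunk=False)
    return [
        (tag, base[i1:i2], variant[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For example, comparing `"如是我聞一時佛在"` with `"如是我聞一時薄伽梵在"` reports a single replacement of `佛` by `薄伽梵`.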

The system presented in this thesis supports not only browsing of multiple-version materials but also searching for imprecise quotations. Imprecise quotation is an interesting problem because quotations in ancient books and articles are often not explicitly marked and may differ from the original source by a few terms or sentences. This thesis employs mutual information and approximate string matching to tackle this problem.
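Error-tolerant quotation lookup can be sketched in the two stages described in the abstract: an order-insensitive similarity filter, then an order-sensitive check. The similarity measure (share of query characters found in a passage), the approximate-substring edit distance used for the second stage, and the `min_similarity` and `max_errors` thresholds are all assumptions for illustration, not the thesis's scoring; the function names are hypothetical.

```python
from collections import Counter


def char_overlap(query, text):
    """Fraction of the query's characters (with multiplicity) also found in
    `text`. Ignores order, so it serves as a cheap first-stage filter."""
    if not query:
        return 0.0
    cq, ct = Counter(query), Counter(text)
    shared = sum(min(n, ct[ch]) for ch, n in cq.items())
    return shared / len(query)


def best_substring_distance(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Row 0 of the DP table is all zeros, so a match may start anywhere in
    `text`; a missing, wrong, or extra character each costs 1.
    """
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match (0) or wrong character (1)
                prev[j] + 1,             # query character missing from text
                curr[j - 1] + 1,         # extra character in text
            )
        prev = curr
    return min(prev)


def find_quotation(query, passages, min_similarity=0.5, max_errors=2):
    """Return (distance, passage_id) pairs, best first, for passages that pass
    the similarity filter and match the query within `max_errors` edits."""
    results = []
    for pid, text in passages.items():
        if char_overlap(query, text) < min_similarity:
            continue  # stage 1: order-insensitive similarity filter
        d = best_substring_distance(query, text)  # stage 2: order-sensitive
        if d <= max_errors:
            results.append((d, pid))
    return sorted(results)
```

The cheap filter runs first so that the quadratic edit-distance computation is only paid for passages that already share most of the query's characters.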

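Table of Contents

English Abstract
Chinese Abstract
Chapter 1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Inverted Files
2.2 Feature Selection
2.3 Query Expansion
2.4 Clustering
2.5 Approximate String Matching
Chapter 3 System Architecture
3.1 System Analysis
3.2 Requirements
3.3 Design Issues
3.4 System Architecture
3.5 Difficulty Analysis
Chapter 4 The Multiple-Version Scripture Retrieval System
4.1 Overall System Architecture Diagram
4.2 Back-End Data Preprocessing
4.2.1 System Architecture
4.2.2 Database Analysis
4.2.3 Implementation
4.3 Application Server
4.3.1 System Architecture
4.3.2 Implementation
4.4 Dynamic Cache
4.4.1 System Architecture
4.4.2 Implementation
4.5 Conclusion
Chapter 5 Experiments
5.1 Analysis of the Source Data
5.2 Experimental Data
5.2.1 Query Set
5.2.2 Parameter Settings
5.2.3 Results
5.3 Discussion of Results
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix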

Record Created

2000.11.14

Last Updated

2023.01.16
