多版本佛經文獻之檢索及瀏覽系統設計=Design of a Searching and Browsing System for Multiple-Version Buddhist Scriptures

徐惠芬 (撰)=Hsu, Hui-fen (compose)

Author

徐惠芬 (撰)=Hsu, Hui-fen (compose)

Publication date

2000

Pages

Publisher

國立臺灣大學=National Taiwan University

Publisher site

https://www.ntu.edu.tw/

Place of publication

臺北市, 臺灣 [Taipei shih, Taiwan]

Content type

博碩士論文=Thesis and Dissertation

Language

中文=Chinese

Degree

Master's

School

國立臺灣大學=National Taiwan University

Department

資訊工程學系=Department of Computer Science and Information Engineering

Advisor

歐陽彥正

Graduation year

Keywords

徐惠芬; Hsu, Hui-fen; 多版本=Multiple-Version; Inverted Files; 模糊字串比對=Approximate String Matching; Mutual Information; Single Link Clustering; XML; Multi-tier Client/Server; Daemon Program; Multiple-Version Comparison

Abstract

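Ancient texts and religious scriptures often exist in multiple versions. These versions arise because differences in historical period and in translators led to different interpretations and wording for material that originally came from the same source. From the standpoint of textual and religious studies, textual criticism, citation, and the accuracy of translations are all important topics. Many of these texts have now been digitized, so the aim of this thesis is to provide an online browsing and retrieval system for texts that have this multiple-version property.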

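The multiple-version retrieval system uses a multi-tier client/server architecture. This thesis proposes designs for its three main parts: data preprocessing, the application server, and the cache. Data preprocessing parses the multiple-version XML files, extracts terms, performs term-association analysis, and uses the inverted-file method to build the index files used for retrieval. Besides basic retrieval, the application server provides quotation query (approximate string retrieval), multiple-version comparison, and single-version browsing. The cache is used to shorten system response time.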
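The indexing step described above can be illustrated with a minimal inverted-file sketch. The abstract does not say how the thesis tokenizes text or lays out its index files, so the character-bigram tokenizer, the in-memory dictionary, and the names `bigrams`, `build_inverted_index`, and `lookup` are all assumptions made for illustration.

```python
from collections import defaultdict


def bigrams(text):
    """Split text into overlapping character bigrams (an assumed tokenizer)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions where the term starts]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(bigrams(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def lookup(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term, {}))


# Two hypothetical versions of the same opening line.
docs = {
    "v1": "如是我聞一時佛在舍衛國",
    "v2": "聞如是一時佛遊舍衛國",
}
index = build_inverted_index(docs)
```

Here `lookup(index, "一時")` finds both versions, while `lookup(index, "我聞")` finds only `v1`. Storing positions alongside document ids is one common design choice: it lets a later stage check term order without rescanning the text.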

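In addition to automatically generating multiple-version data for browsing from the original XML files, the system has two main features: term analysis and quotation query. Term analysis uses mutual information to compute the similarity between features, extracts low-frequency terms, and applies single-link clustering so that terms automatically group into related sets. Quotation query combines two approaches, retrieval (computing the similarity between the query and a document) and text search (string matching): it first considers the similarity between the document and the query string, and then the order in which the query's keywords appear (term sequence). Its purpose is to find, in a fault-tolerant way, an incomplete passage quoted in an ancient document (for example, one with missing, wrong, or extra characters).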
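As a rough illustration of the term-analysis idea, the sketch below scores term pairs with pointwise mutual information estimated from co-occurrence in text segments, then groups terms by single-link clustering cut at a threshold. The abstract does not give the thesis's exact feature definition, its low-frequency filter, or its cut-off, so the segment-level estimate, the threshold value, the sample terms, and the function names are assumptions.

```python
import math
from itertools import combinations


def pmi_table(segments):
    """Pointwise mutual information for every co-occurring term pair.

    Probabilities are estimated as segment frequencies divided by the
    number of segments (an assumed estimator).
    """
    n = len(segments)
    df = {}  # number of segments containing each term
    co = {}  # number of segments containing both terms of a pair
    for seg in segments:
        terms = sorted(set(seg))
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for a, b in combinations(terms, 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    return {
        pair: math.log2(c * n / (df[pair[0]] * df[pair[1]]))
        for pair, c in co.items()
    }


def single_link_clusters(terms, similarity, threshold):
    """Single-link clustering cut at a threshold: two terms share a cluster
    if a chain of pairs with similarity >= threshold connects them."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), set()).add(t)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical segments, each already reduced to a list of terms.
segments = [
    ["般若", "智慧"],
    ["般若", "智慧", "菩薩"],
    ["涅槃", "泥洹"],
    ["涅槃", "泥洹", "菩薩"],
    ["菩薩"],
    ["菩薩"],
]
pmi = pmi_table(segments)
all_terms = {t for seg in segments for t in seg}
clusters = single_link_clusters(all_terms, pmi, threshold=1.0)
```

With this toy data, terms that always appear together (such as the two renderings 涅槃 and 泥洹) score well above the threshold and end up in one cluster, while a term spread across many segments stays on its own.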

For ancient books and articles, it is a common phenomenon that multiple versions exist due to a variety of reasons. To content experts of these ancient books and articles, comparison between different versions is an important research task and may provide important insights. Since a large volume of the ancient books and articles has been digitized, modern information processing technologies should be employed to facilitate the tasks of content experts. This thesis discusses the design of a browsing and search system aimed at handling multiple-version ancient materials.
The browsing and search system presented in this thesis supports not only browsing of multiple-version materials but also searching for imprecise quotations. Imprecise quotation is an interesting problem because quotations in ancient books and articles are often not explicitly marked and may differ from the original by a few terms or sentences. This thesis employs mutual information and approximate string matching to tackle this problem.
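To make the imprecise-quotation problem concrete, here is a minimal sketch of one standard form of approximate string matching: a dynamic program that finds the substring of a text with the smallest edit distance to a quoted pattern. It is not claimed to be the thesis's algorithm, which according to the abstract combines document similarity with keyword order. The function name and the sample strings are assumptions.

```python
def best_approximate_match(pattern, text):
    """Return (distance, end) for the substring of text closest to pattern.

    distance counts single-character insertions, deletions, and
    substitutions; end is the index just past the best match.
    """
    # prev[i] = edit distance between pattern[:i] and the best substring
    # of the text that ends at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = (prev[-1], 0)
    for j, ch in enumerate(text, start=1):
        cur = [0]  # a match may start anywhere, so the empty prefix costs 0
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i] + 1,              # text character is extra
                cur[i - 1] + 1,           # pattern character is missing
                prev[i - 1] + (p != ch),  # match or substitution
            ))
        if cur[-1] < best[0]:
            best = (cur[-1], j)
        prev = cur
    return best


# Hypothetical source passage and three quotations of it.
text = "如是我聞一時佛在舍衛國祇樹給孤獨園"
exact = best_approximate_match("舍衛國", text)
wrong_char = best_approximate_match("一時佛住舍衛國", text)
missing_char = best_approximate_match("一時佛舍衛國", text)
```

A quotation with one wrong character or one missing character is still located, at distance 1. A system could accept matches whose distance is below some fraction of the pattern length, though that cut-off would need tuning on real data.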

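English Abstract
Chinese Abstract
Chapter 1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Inverted Files
2.2 Feature Selection
2.3 Query Expansion
2.4 Clustering
2.5 Approximate String Matching
Chapter 3 System Architecture
3.1 System Analysis
3.2 Requirements
3.3 Design Issues
3.4 System Architecture
3.5 Difficulty Analysis
Chapter 4 Multiple-Version Scripture Retrieval System
4.1 Overall System Architecture Diagram
4.2 Back-End Data Preprocessing
4.2.1 System Architecture
4.2.2 Database Analysis
4.2.3 Implementation
4.3 Application Server
4.3.1 System Architecture
4.3.2 Implementation
4.4 Dynamic Cache
4.4.1 System Architecture
4.4.2 Implementation
4.5 Conclusion
Chapter 5 Experiments
5.1 Source Data Analysis
5.2 System Experimental Data
5.2.1 Query Set
5.2.2 Experimental Parameter Settings
5.2.3 Experimental Results
5.3 Discussion of Experimental Results
Chapter 6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Directions
References
Appendix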

Created

2000.11.14

Last updated

2023.01.16


Serial number
345120
