臺大佛學數位圖書館=NTU Digital Library of Buddhist Studies; 字串比對=string matching; 後分類=post-classification; 停用詞=stopwords; 標記=tagging
Abstract
The NTU Digital Library of Buddhist Studies (臺大佛學數位圖書館) holds a large collection of bibliographic records on Buddhist studies and offers a full-featured search system that researchers use to gather Buddhist materials. Drawing on the fields of the bibliographic metadata, the search system provides six post-classifications of query results: publication year, resource type, source title, keyword, author, and language. All six are metadata fields already present when a record is imported. If records could also be tagged with specialized Buddhist vocabulary, and those tags offered to users as additional post-classifications, researchers could filter and organize bibliographic data more effectively.
This study tags the bibliographic records of the NTU Digital Library of Buddhist Studies with three new tag categories: Buddhist sects, Buddhist persons, and Buddhist sutras. The specialized terms for each category are collected in advance and matched against the records by string matching to produce tags automatically. A stopword list is also built to filter the specialized terms used in string matching. The resulting tags are checked with manual assistance, and the term lists and the stopword list are updated accordingly. Three new post-classifications are added to the search system: "Mentions: Buddhist Sects", "Mentions: Buddhist Persons", and "Mentions: Buddhist Sutras". As the term lists and the stopword list become more complete, we expect the tags and post-classifications to become more accurate and to give users a better experience.
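The string matching and stopword filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names `tag_record` and `facet_counts`, the sample terms, and the decision to apply stopwords by removing them from the term lists before matching are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical term lists; the real vocabularies are collected by the authors.
VOCAB: Dict[str, Set[str]] = {
    "Mentions: Buddhist Sects": {"禪宗", "天台宗"},
    "Mentions: Buddhist Persons": {"玄奘", "一行"},
    "Mentions: Buddhist Sutras": {"法華經", "金剛經"},
}

# Assumed stopword: "一行" is the name of a Tang-dynasty monk, but it is also
# an ordinary phrase, so matching it blindly would produce false tags.
STOPWORDS: Set[str] = {"一行"}


def tag_record(text: str,
               vocab: Dict[str, Set[str]],
               stopwords: Set[str]) -> Dict[str, List[str]]:
    """Return the terms of each category that occur in `text` as substrings.

    Terms listed in `stopwords` are dropped from the vocabulary before matching.
    Categories without any hit are omitted from the result.
    """
    tags: Dict[str, List[str]] = {}
    for category, terms in vocab.items():
        hits = sorted(term for term in terms - stopwords if term in text)
        if hits:
            tags[category] = hits
    return tags


def facet_counts(tagged: Iterable[Dict[str, List[str]]]) -> Dict[str, Dict[str, int]]:
    """Count, per category, how many records mention each term."""
    counts: Dict[str, Dict[str, int]] = {}
    for tags in tagged:
        for category, terms in tags.items():
            bucket = counts.setdefault(category, {})
            for term in terms:
                bucket[term] = bucket.get(term, 0) + 1
    return counts


if __name__ == "__main__":
    record = "玄奘與法華經研究:一行人的旅程"
    print(tag_record(record, VOCAB, STOPWORDS))
    # {'Mentions: Buddhist Persons': ['玄奘'], 'Mentions: Buddhist Sutras': ['法華經']}
```

In this sketch the sample record contains the string "一行", yet no tag is produced for it because the term is on the stopword list. Counts from `facet_counts` over a result set are one way the tags could be presented as post-classification options next to the six existing ones.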