利用分類演算法來判別早期漢譯佛典之朝代 ——以東漢、三國、西晉為對象=A Computer-Based Approach for Predicting the Translation Period of Early Chinese Buddhism Translation——Texts from the Eastern Han, Three Kingdoms and Western Jin Periods
Author 郭捷立 (著)=Kwok, Jie Li (au.)
Publisher 法鼓佛教學院 佛教學系=Dharma Drum Buddhist College, Department of Buddhist Studies
Location 新北市, 臺灣 [New Taipei City, Taiwan]
Content type 博碩士論文=Thesis and Dissertation
Publication year 100(下) [ROC academic year 100, second semester]
Keyword 中古漢語=Ancient Chinese; 漢譯佛典=Chinese Translation; 譯者判別=Authorship Attribution; 費雪線性辨別分析法=Fisher Linear Discriminant Analysis; 可變長度n-gram=Variable Length n-Gram Feature Extraction
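Abstract The translated works in the Chinese Buddhist canon are a treasure for the study of Buddhist culture, yet the translator attributions of some of these scriptures remain doubtful and unresolved. Because early historical sources are difficult to collect in full, the attribution problems are most severe, and hardest to handle, for the earliest periods of scripture translation: the Eastern Han, the Three Kingdoms and the Western Jin. In contrast to the qualitative analysis of traditional philology, this study combines quantitative statistical analysis with information technology to address open questions about the recorded translators of early Buddhist scriptures. Its main purpose is to build a classification mechanism that can accurately determine in which of these three dynasties a text was translated. With this result, the most likely translation period of a scripture of unknown date can be identified, which narrows the range of candidate translators to compare. We first drew on the findings of traditional philologists to compile a list of reliable reference translations for each of the three dynasties, then extracted textual features with a variable-length n-gram algorithm, and used Fisher Linear Discriminant Analysis to classify texts on those features. According to the experimental results, the mechanism is highly effective, reaching an accuracy of at least 89%. In addition, by further analyzing the discriminant functions produced by Fisher Linear Discriminant Analysis, we identified features characteristic of translation in each of the three dynasties; these features can be used to examine cases in which the same foreign term was rendered by different Chinese terms in different dynasties. With the combined mechanism of this study, we found that this kind of quantitative classification can explain some phenomena in scripture translation.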

Buddhism has been spreading in China for more than two thousand years since its introduction in the Eastern Han dynasty (C.E. 25-220), and has become an important part of daily life and of Chinese culture at large. A great number of Buddhist scriptures were translated from Indian originals from the Eastern Han dynasty to the Tang dynasty (C.E. 618-907) and beyond. Over the last few decades, scholars have become increasingly aware that the traditional attributions of authorship and translatorship for the early Chinese works are often unreliable. The current reference edition of the Chinese Buddhist canon (Taishō shinshū daizōkyō (Abbr.: T.) 大正新修大蔵經, collated 1924-1934) contains 3053 works in 85 volumes, including about 1000 texts of Indian (or allegedly Indian) provenance. However, ca. 150 of these texts are marked as shiyi 失譯, indicating that the name(s) of the translator(s) are unknown. In addition to these unattributed texts, many attributions of texts translated between the 2nd and the late 6th century are uncertain, problematic or simply incorrect.
Text-critical and philological studies have significantly advanced research in the field. However, traditional philology is limited in the scale of material it can handle. The research project from which the present thesis stems has therefore designed a statistical model that combines variable-length n-grams with the Fisher Linear Discriminant to establish a highly accurate classification mechanism for predicting the translation period of Chinese texts. The present study focuses on three early Chinese dynasties: the Eastern Han (C.E. 25-220), the Three Kingdoms (C.E. 220-280) and the Western Jin (C.E. 266-316). These three dynasties constitute the earliest phase of Buddhist translation history, and most of the translations from these periods present attribution problems. In this research, we build a classification mechanism for each of the three dynasties. Each can be used to test whether the translation style of a text is similar to the one prevalent during that period. According to our experimental results, each of the three classifiers has an accuracy of more than 89%. By examining the classification results, we also extract translation usages characteristic of Chinese sutras from the different periods. By supplying statistical information on the characteristics and features of Chinese texts, this approach can not only provide new evidence relevant to uncertain authorship but also encourage Buddhist scholars and linguists to pursue further studies.
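The pipeline the abstract describes (n-gram features fed to a Fisher linear discriminant, one classifier per dynasty) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: "variable-length n-gram" is simplified to all character n-grams of length 1 to `max_n`, a small ridge term keeps the scatter matrix invertible, the function names are hypothetical, and the toy strings are synthetic stand-ins rather than sutra text.

```python
from collections import Counter

def ngram_freqs(text, max_n=2):
    """Relative frequency of every character n-gram of length 1..max_n."""
    counts = Counter(text[i:i + n]
                     for n in range(1, max_n + 1)
                     for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return {gram: c / total for gram, c in counts.items()}

def vectorize(text, vocab, max_n=2):
    """Feature vector of n-gram frequencies in the fixed order given by vocab."""
    freqs = ngram_freqs(text, max_n)
    return [freqs.get(gram, 0.0) for gram in vocab]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fisher_direction(pos, neg, ridge=1e-3):
    """Two-class Fisher discriminant: w = (S_w + ridge*I)^-1 (m_pos - m_neg).

    The threshold is the projection of the midpoint of the two class means.
    The ridge term is an assumption added here for numerical stability.
    """
    d = len(pos[0])
    mean = lambda rows: [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    m_pos, m_neg = mean(pos), mean(neg)
    S = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, m in ((pos, m_pos), (neg, m_neg)):
        for r in rows:
            diff = [r[j] - m[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j]
    w = solve(S, [m_pos[j] - m_neg[j] for j in range(d)])
    threshold = sum(w[j] * (m_pos[j] + m_neg[j]) / 2 for j in range(d))
    return w, threshold

def project(w, x):
    """Projection of a feature vector onto the discriminant direction."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Synthetic toy samples: "target period" versus "all other periods".
target = ["ababab", "abbaab"]
others = ["cdcdcd", "cddccd"]
vocab = sorted({g for t in target + others for g in ngram_freqs(t)})
w, threshold = fisher_direction([vectorize(t, vocab) for t in target],
                                [vectorize(t, vocab) for t in others])

def looks_like_target(text):
    """True if the text projects onto the target side of the threshold."""
    return project(w, vectorize(text, vocab)) > threshold

# Features ranked by absolute weight, a rough analogue of weight analysis.
ranked = sorted(zip(vocab, w), key=lambda pair: abs(pair[1]), reverse=True)
```

Training one such target-versus-rest discriminant per dynasty would give three classifiers in the spirit of the abstract, and inspecting `ranked` hints at how heavily weighted n-grams could point to period-specific translation terms. A real experiment would also need held-out samples to estimate accuracy.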
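Table of contents
Chinese abstract
English abstract
Acknowledgements
Contents
List of tables
List of figures
Conventions
Footnote format
Representation of foreign-language terms
References
Chapter 1 Motivation and Purpose
Chapter 2 Literature Review
Chapter 3 Research Method
3.1 Division of the time range
3.2 Corpus processing
3.2.1 Selecting training samples from modern scholarship
3.2.2 Sources of the electronic full texts
3.2.3 Data preparation
3.2.4 Sample segmentation
3.3 Feature processing
3.4 Machine learning
Chapter 4 Experimental Analysis
4.1 Performance evaluation of the classifier for each dynasty
4.2 Distribution of projected values for each dynasty
4.3 Feature weight analysis
4.3.1 Analysis of 「泥洹」
4.3.2 Analysis of 「說法」 and 「說經」
4.4 Effectiveness of the combined mechanism
4.4.1 The "Buddha biography" (佛傳) category among misclassified samples
4.4.2 The "Samādhi section" (三昧部) category among misclassified samples
4.4.3 Summary
Chapter 5 Conclusion
References
(1) Buddhist canons and primary texts
(2) Books, articles and online resources in Chinese and Japanese
(3) Books, articles and online resources in Western languages
(4) Reference works and online resources
Appendix (1)
Comparison of scripture catalogues: Eastern Han
Comparison of scripture catalogues: Three Kingdoms
Comparison of scripture catalogues: Western Jin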
Created date 2015.09.10
Modified date 2016.09.19

