《大正藏》經文中大範圍的文字重用現象之偵測與分析=Detection and Analysis of Textual Reuse in Taishō Tripiṭaka

Author

韓東霖

Date

2020

Pages

1 - 130

Publisher

法鼓文理學院

Publisher Url

https://www.dila.edu.tw/

Location

新北市, 臺灣 [New Taipei City, Taiwan]

Content type

博碩士論文=Thesis and Dissertation

Language

中文=Chinese

Degree

Master's

Institution

法鼓文理學院

Department

Department of Buddhist Studies (佛教學系)

Advisor

洪振洲

Publication year

108

Keyword

文字重用; 大正新脩大藏經; Local Alignment; 量化分析; 數位人文; Textual Reuse; Taishō Tripiṭaka; quantitative analysis; Digital Humanities

Abstract

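One major change in humanities research in recent years has been the surge of interest in the digital humanities. Digital humanities methods exploit the computer's ability to perform large volumes of computation and comparison at high speed, together with the large body of content digitized in recent years, to address large-scale questions that traditional humanities research has difficulty handling. Among these questions, textual reuse, a phenomenon in which texts draw on one another, has attracted attention in recent years. Textual reuse refers to an author borrowing or reusing the writing of an earlier or contemporary author, whether by allusion, paraphrase, or even verbatim quotation. By tracing textual reuse, we can discover implicit citations between scriptures and their historical transmission. Textual reuse is very frequent in Buddhist texts, yet studies of textual reuse across a large range of scriptures are rare. In this study, we therefore propose an algorithm that can effectively find textual reuse and apply it to Buddhist scriptures, which have seldom been analyzed at large scale, with the aim of finding in a single pass the distinctive or intentional textual reuse between scriptures. We present the comparison results statistically and quantitatively, summarize and filter the possible categories of the repeated passages, compute the reuse ratio between scriptures, and compare and verify the results against existing scholarship. The data source is the 《大正新脩大藏經》 (Taishō Tripiṭaka) as included in the CBETA Chinese Electronic Tripiṭaka Collection produced and distributed by 中華電子佛典協會 (Chinese Buddhist Electronic Text Association). Sentences are compared one by one with the Local Alignment algorithm in order to find repeated passages between scriptures whose reused length is sufficiently long. We then examine these results along two dimensions, reuse length and reuse frequency, classify and filter them, and aggregate them upward to obtain the reuse ratios shown by the distinctive or intentional reused passages between scriptures. The results show that the scriptures contain many repeated passages of striking length, most of which come from scripture catalogues and Buddha-name sutras. The large number of highly similar repeated passages, by contrast, are closer to the technical terms and stock expressions of Buddhist scriptures, and are built up from many different similar short phrases that overlap one after another. After these highly similar repeated passages are removed, the reuse ratios between scriptures show that distinctive or intentional reuse comes mostly from different translations of the same text, and that many dhāraṇī and ritual manuals and commentarial texts also have high reuse ratios. There are also many scripture pairs that belong to different categories yet have very high reuse ratios, and these merit deeper study.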

After Buddhism was introduced into China, a large number of Buddhist scriptures were translated. These texts were later collected into Buddhist canons. Today, the Taishō Tripiṭaka is the main source for modern scholars and researchers of Buddhism. This large body of Buddhist texts raises many issues worth studying. One research topic that has been widely discussed in recent years is textual reuse between texts. By analyzing textual reuse, we can discover implicit citations and historical transmission between scriptures. However, because of the immense size of the Taishō Tripiṭaka, studies of textual reuse that cover the whole canon are rare. In recent years, with the development of information technology, Digital Humanities has become an emerging topic in the traditional humanities research community. Digital methods mainly use the computer's high-speed computation and precise comparison capabilities to handle large-scale tasks that are difficult to complete with traditional humanities research methods. In this study, we propose an effective algorithm that detects and analyzes textual reuse in the Taishō Tripiṭaka and calculates the textual reuse ratio between texts. We then compare the results of our algorithm with those of existing studies.
Our research method is as follows: (1) take the XML files of the whole Taishō Tripiṭaka as the main material for this study; (2) split the texts into sentences; (3) pair sentences for a preliminary pairwise comparison, and rule out sentence pairs with fewer than a preset number of characters in common; (4) apply the Local Alignment algorithm, which is commonly used to align long DNA sequences, to identify repeated passages between sentences in the Taishō Tripiṭaka.
The statistical analysis shows that: (1) extremely long repeated passages between texts often occur in texts related to Tripiṭaka catalogues and lists of Buddhas' names; (2) the huge number of similar patterns between texts can be understood as idioms or common usages in Buddhist texts; (3) in terms of the proportion of reused paragraphs between texts, different translations of the same text tend to produce higher textual reuse percentages. However, we have also found many texts that are categorized differently but share many similar paragraphs, some of which have not been noted in previous studies. These results provide interesting clues for future research.
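
Steps (2) to (4) of the method described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the sentence-splitting rule, the scoring values (`match`, `mismatch`, `gap`), the `min_shared` threshold, and all function names are assumptions chosen for illustration. The alignment itself is the standard Smith-Waterman form of local alignment with a linear gap penalty.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text on common Chinese sentence-final punctuation (assumed rule)."""
    return [s for s in re.split(r"[。！？；]", text) if s]


def shared_char_count(a, b):
    """Count the characters two sentences have in common (multiset overlap)."""
    return sum((Counter(a) & Counter(b)).values())


def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment with illustrative scoring values.

    Returns (best_score, aligned_a, aligned_b); "-" marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def reuse_candidates(sents_a, sents_b, min_shared=4):
    """Prefilter sentence pairs by shared characters, then align the survivors.

    min_shared is a hypothetical threshold, not the value used in the thesis.
    """
    for sa in sents_a:
        for sb in sents_b:
            if shared_char_count(sa, sb) >= min_shared:
                yield sa, sb, local_alignment(sa, sb)
```

The prefilter is cheap compared with the quadratic-time alignment, so discarding pairs that share too few characters before aligning keeps the pairwise comparison tractable on a large corpus. The specific cutoff and scoring scheme would need to be tuned against the data.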

Table of contents

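I. Introduction 1
II. Literature Review 5
(1) The definition of "textual reuse" and textual reuse in the content of Buddhist scriptures 5
(2) Using digital techniques to detect textual reuse between documents 8
III. Research Methods and Procedure 11
(1) Definition of terms 11
(2) Redefining textual reuse 12
(3) Method 13
1. Data source and comparison method 13
2. Determining the comparison scope 14
3. Setting the detection threshold for textual reuse 15
4. Preliminary filtering of candidate sentence pairs 16
5. Core comparison algorithm: Local Alignment 16
IV. Comparison Parameters and Preliminary Analysis of Results 21
(1) Preliminary observations 21
1. Repeated passages of striking length 22
2. Highly similar passages occurring in large numbers 23
V. Textual Reuse in the Taishō Tripiṭaka Viewed by Reuse Length 27
(1) Top 50 comparison results by length of the reused portion 27
(2) Top 50 comparison results by score of the reused portion 31
(3) Possible categories of textual reuse observed from reused portions between single sentences 33
1. Textual reuse between scripture catalogues 33
2. Textual reuse between Buddha names 35
3. Textual reuse between different translations of the same text 38
4. Textual reuse between dhāraṇī and ritual manuals 39
5. Textual reuse between scriptures and their commentaries 42
6. Textual reuse between Buddhist encyclopedias and the scriptures they cite 43
VI. Textual Reuse in the Taishō Tripiṭaka Viewed by Reuse Frequency 45
(1) Number of reused segments in which each character of the Taishō Tripiṭaka takes part 47
(2) Number of scriptures in which each reused segment takes part 48
(3) Categories of high-frequency reused segments 50
1. Practice 50
2. Wholesome dharmas 54
3. Technical terms (名相) 55
4. Scripture openings and endings 60
5. Questions and answers 60
6. Numbers and time 60
7. Mantras and Buddha names 61
(4) Summary 63
VII. Textual Reuse between Scriptures in the Taishō Tripiṭaka 65
(1) Top 20 scriptures by reuse ratio 65
(2) Top 20 scriptures by highest single-scripture reuse ratio within a scripture pair 68
(3) Classification and discussion of reuse relationships between scriptures 72
1. Reuse relationships in scripture pairs of the same category 72
2. Reuse relationships in scripture pairs of different categories 89
VIII. Conclusion and Future Prospects 97
References 100
(1) Foreign-language sources 100
(2) Chinese-language sources 102
(3) Japanese-language sources 102
Appendix 1. Output data tables 103
Appendix 2. Top 50 comparison results by length of the reused portion 105
Appendix 3. Top 50 comparison results by score of the reused portion 110
Appendix 4. Top 100 scriptures by reuse ratio 115
Appendix 5. Top 100 scriptures by highest single-scripture reuse ratio within a scripture pair 122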

Hits

738

Created date

2021.08.12

Modified date

2023.01.07

Serial No.
621082
