《中阿含經》與《增壹阿含經》之文本翻譯風格量化分析與相似斷詞自動化擷取=Quantitative Analysis of Translation Styles and Automatic Similar Phrases Identification of the Madhyama-āgama and the Ekottarika-āgama

董惠珠

Author

董惠珠

Date

2018

Pages

1 - 76

Publisher

法鼓文理學院 (Dharma Drum Institute of Liberal Arts)

Publisher Url

https://www.dila.edu.tw/

Location

新北市, 臺灣 [New Taipei City, Taiwan]

Content type

博碩士論文=Thesis and Dissertation

Language

中文=Chinese

Degree

Master's

Institution

法鼓文理學院 (Dharma Drum Institute of Liberal Arts)

Department

佛教學系 (Department of Buddhist Studies)

Advisor

洪振洲

Publication year

106 (Republic of China calendar; academic year 2017-2018)

Keyword

中阿含經; 增壹阿含經; 翻譯風格; 量化分析; 可變長度n-gram; 主成分分析法; 最長共同子序列; Madhyama-āgama; Ekottarika-āgama; translation style; quantitative analysis; variable length n-gram; principal components analysis; longest common subsequence

Abstract

《大正藏》經號T 26《中阿含經》與T 125《增壹阿含經》，兩經之譯者皆記載為僧伽提婆；對於現存《中阿含經》的譯者記錄，目前學界還沒有人提出異議，但是對於現存《增壹阿含經》的譯者記錄，各家說法不一，學界對此尚無定論。本研究嘗試利用統計量化分析的方式，對《中阿含經》與《增壹阿含經》進行翻譯風格分析，以此探討現存《中阿含經》與《增壹阿含經》是否來自相同譯者的作品。研究方法為：以「可變長度n-gram」（variable length n-gram，VL n-gram）為切詞方法，經由適當的篩選門檻找出風格特徵詞，再搭配主成分分析法（principal components analysis，PCA）進行統計分析，以之觀察兩經的翻譯風格是否具有一致性。分析結果顯示，兩經的翻譯風格有顯著的差異。本研究同時使用人工比對的方式從已經找出來的眾多風格特徵詞中尋找意義相似的斷詞，以此觀察兩個文本是否有用字不同卻是意義相似的詞彙或短語。經過人工判讀後，找到諸多例證顯示兩個文本翻譯風格之差異受到譯者用字習慣的影響。研究結果顯示，現存漢譯《中阿含經》和《增壹阿含經》，有極高的機率不是來自相同譯者的作品。在研究過程中，有鑑於以人工比對所需投入的大量工時，本研究也嘗試尋找一個自動化識別相似斷詞的方法，期能提高研究效率，並且因應日後巨量詞組的比對需求。我們以「最長共同子序列」（longest common subsequence，LCS）作為兩兩斷詞之間相似程度的衡量方法。實驗結果顯示，此衡量方法之成效雖非顯著，然而對於大量詞組的比對，仍不失為一個可用的方法；在演算結果中可能包含著關鍵性的線索，能夠提供學者作為進一步研究之用。

In the Taishō Tripiṭaka, the Madhyama-āgama (T 26) and the Ekottarika-āgama (T 125) are both attributed to the same translator, Gautama Saṅghadeva. No scholar has so far disputed the attribution of the Madhyama-āgama to Gautama Saṅghadeva, but opinions differ on who translated the Ekottarika-āgama. This study analyzes the translation styles of the Madhyama-āgama and the Ekottarika-āgama with quantitative methods and discusses whether the two collections are the work of the same translator. The method has three steps: (1) variable length n-grams (VL n-grams) are used to split the text of T 26 and T 125 into shorter segments, called grams; (2) grams that occur in more than a chosen threshold number of documents are adopted as “style features”; and (3) principal components analysis (PCA) is applied to the frequencies of the style features in T 26 and T 125 to examine whether the translation styles of the two collections are consistent. The statistical analysis shows that the translation styles of the two collections differ significantly. To strengthen this result, we manually checked the style features of the two collections for phrases that differ in wording but share a similar meaning. The manual comparison yielded many examples indicating that the differences in translation style between the two collections are indeed affected by the translator’s choice of words. These results support the conclusion that the extant Madhyama-āgama and Ekottarika-āgama are very probably not the work of the same translator. Because manual comparison requires a large number of man-hours, this study also seeks a way to identify similar phrases automatically, in order to improve research efficiency and to handle larger sets of phrases in the future. We use the longest common subsequence (LCS) to measure the degree of similarity between two phrases. The experimental results show that, although the LCS measure is not markedly effective, it remains a usable method for comparing large sets of phrases, and its output may contain clues that scholars can use for further research.
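The LCS-based similarity measure mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the thesis turns an LCS length into a similarity score, so the normalization used here (LCS length divided by the length of the longer phrase) is an assumption, and the function names `lcs_length` and `lcs_similarity` are hypothetical. The sample phrases are illustrative strings only and are not taken from the thesis results.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise keep the better result of dropping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: LCS length over the longer phrase's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Illustrative phrase pair (not data from the thesis).
    print(lcs_similarity("世尊告曰", "世尊告諸比丘"))
```

Under this assumed normalization the score lies between 0 and 1, so a single cut-off value could be tuned to decide which phrase pairs count as similar; other normalizations (for example, dividing by the shorter phrase's length) would behave differently on phrases of unequal length.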

Table of contents

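Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1. Introduction
Chapter 2. Literature Review
Chapter 3. Research Method for Textual Translation Style
(1) Text Sources
(2) Corpus Processing
(3) Feature Selection
(4) Running Principal Components Analysis and Plotting the Results
Chapter 4. Experimental Analysis of Textual Translation Style
(1) PCA Results with the Minimum Fascicle Threshold Set to 20
(2) PCA Results with the Minimum Fascicle Threshold Set to 40 and 60
(3) PCA Results with the Minimum Fascicle Threshold Set to 80, 100, and 111
(4) Summary of the PCA Results
(5) Gram Analysis of the PCA Results
Chapter 5. Identifying Similar Phrases
(1) What Is a "Similar Phrase"
(2) Comparing Phrase Groups
(3) Manual Comparison
(4) Interpretation
(5) Using "Similar Phrases" for Textual Style Analysis
Chapter 6. Research Method for Automatic Similar Phrase Identification
(1) Longest Common Subsequence (LCS)
(2) Precision, Recall, and F1-Measure
(3) K-Fold Cross-Validation
(4) Synonym Corpus
Chapter 7. Experimental Analysis of Automatic Similar Phrase Identification
(1) Training the Optimal LCS Similarity Score
(2) Performance Evaluation and Analysis
(3) Adding Synonyms
(4) Performance Evaluation and Analysis after Adding Synonyms
(5) Summary of Automatic Similar Phrase Identification
Chapter 8. Conclusion
References
I. Buddhist Canonical or Primary Texts (ordered by sutra number)
II. Chinese and Japanese Books, Articles, and Online Resources
III. Western-Language Books, Articles, and Online Resources
Appendix 1. Compilation Dates and Abbreviations of the Historical Sutra Catalogues
Appendix 2. LCS Similarity Scores of Each "Similar Phrase" Group before Adding Synonyms
Appendix 3. LCS Similarity Scores of Each "Similar Phrase" Group after Adding Synonyms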

Created date

2021.08.12

Modified date

2023.01.07

Serial No.
621085
