Tibetan-Chinese-Sanskrit Text Alignment using Intelligent Agents and Genetic Algorithms
Author: Handy, Christopher
Source: International Conference of Digital Archives and Digital Humanities (9th)
Publication date: 2018.12.18
Pages: 43-44
Publisher: Taiwanese Association for Digital Humanities (臺灣數位人文學會)
Place of publication: Taipei, Taiwan
Document type: Proceeding Article
Language: English
Note: 1. Handy, Christopher: Principal Software Engineer, ERC Open Philology, Leiden University.
Keywords: Tibetan; Chinese; Sanskrit; alignment; genetic algorithm; intelligent agent
Abstract: The problem of multilingual text alignment is a frequent concern in the study of Buddhist texts. Often we find ourselves in possession of several Chinese, Tibetan and Sanskrit versions of a given textual work, without a clear sense of exactly how each individual text relates to the others. Two texts may contain some material in common without sharing all of their content, or share the bulk of their content but with phrases in different orders, or have a common vocabulary but no shared content at all. These issues are well known to philologists, and the use of computer software to alleviate some of the mechanical legwork of comparing texts has revolutionized the ways that we do research, both within the narrow field of Buddhist studies and much more broadly in the study of texts in general. Yet ancient texts, and especially ancient Asian texts, pose difficulties that prevent some popular text analysis methods commonly used for modern European languages from working properly with Tibetan, Chinese and Sanskrit. One desirable but reasonably complex task is to compare any two texts across these three languages, quantitatively measure how similar they are, and align the texts based on regions of similarity. The method I describe here can theoretically achieve this goal for any set of input texts in any language, but my examples are restricted to a specific collection of Buddhist works in Chinese, Tibetan and Sanskrit called the Mahāratnakūṭa Sūtra (MRK). I demonstrate here a proof of concept on a few texts from this collection, and then discuss areas for improvement of the basic idea.

My method involves applying a genetic algorithm to intelligent agents to evolve the best alignments naturally from a given set of texts. Intelligent agents are computer programs designed to carry out a specific set of tasks using some kind of deterministic method and knowledge base. This type of system is useful when we know how to describe a decision process, but do not know all possible results of a decision. Genetic algorithms are information transmission schemes modeled on biological processes. They differ from biological processes in that we tend to specify a quantifiable end goal for them to reach without specifying the means of getting to the goal. By stating this goal in terms of a fitness algorithm, we can promote reproduction of the agent genes of those organisms in our model that are least unfit according to the desired output (i.e., consistently improving the accuracy of text alignments). Over multiple generations of this promotion, the gene pool of agents approaches 100% fitness (normally an unreachable ideal). Genetic algorithms are useful for applications in which we know what we want our output to look like but do not know how to obtain it. For our text alignment problem, we have target words in our text that will be “most interesting” in the mathematical sense. We do not care whether the computer finds these in the most efficient way, only that it reliably reports them. What counts as most interesting, however, could change as additional input witnesses are added, so our system must adapt as it analyzes more texts.
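The selection-and-reproduction loop described above can be illustrated with a minimal sketch. It is not the author's implementation: the genome encoding (one target segment index per source segment), the function names `fitness` and `evolve`, and all parameter values are assumptions chosen for illustration, and the fitness score here is simply agreement with a hand-made reference alignment.

```python
import random


def fitness(genome, reference):
    """Fraction of positions where a candidate alignment agrees with the reference."""
    matches = sum(1 for g, r in zip(genome, reference) if g == r)
    return matches / len(reference)


def evolve(reference, n_targets, pop_size=60, generations=200,
           mutation_rate=0.05, seed=0):
    """Evolve candidate alignments toward the reference (hypothetical toy setup)."""
    rng = random.Random(seed)
    length = len(reference)
    # Each genome assigns one target segment index to every source segment.
    population = [[rng.randrange(n_targets) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, reference), reverse=True)
        if fitness(population[0], reference) == 1.0:
            break
        # Keep the fitter half unchanged; they also act as parents.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Each gene mutates to a random target index with small probability.
            child = [rng.randrange(n_targets) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda g: fitness(g, reference))


reference = [0, 1, 2, 3, 4, 5, 6, 7]  # made-up manual alignment
best = evolve(reference, n_targets=8)
print(best, fitness(best, reference))
```

Because the fitter half of each generation survives unchanged, the best score in the population never decreases from one generation to the next. In a realistic setting the reference would cover only a small training sample, and the score would have to generalize to unaligned text.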

Our agents in this scenario are tiny grammatical engines that each perform a sequence of short alignment tasks between strings of syllables encountered in the input texts, based on training they receive from manual alignments. By stacking sequences of successful organisms together, we can obtain various alignment suggestions from the model.
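As a rough illustration of agents working on syllable strings, the sketch below treats each agent as a single rule that pairs a source syllable sequence with a target syllable sequence, and "stacks" several agents by pooling their suggestions. The `Agent` class, `find_all`, `stack`, and the sample syllables are all hypothetical; the abstract does not specify how the agents are actually represented.

```python
from dataclasses import dataclass


def find_all(syllables, pattern):
    """Return every start index at which `pattern` occurs in `syllables`."""
    n = len(pattern)
    return [i for i in range(len(syllables) - n + 1)
            if tuple(syllables[i:i + n]) == pattern]


@dataclass(frozen=True)
class Agent:
    """One tiny rule: a source syllable sequence paired with a target sequence."""
    source_pattern: tuple
    target_pattern: tuple

    def suggest(self, source, target):
        """Propose (source_index, target_index) pairs where both patterns occur."""
        src_hits = find_all(source, self.source_pattern)
        tgt_hits = find_all(target, self.target_pattern)
        return [(i, j) for i in src_hits for j in tgt_hits]


def stack(agents, source, target):
    """Pool the suggestions of several agents, dropping duplicates."""
    suggestions = []
    for agent in agents:
        for pair in agent.suggest(source, target):
            if pair not in suggestions:
                suggestions.append(pair)
    return sorted(suggestions)


# Invented toy input, not taken from the MRK collection.
tibetan = ["sangs", "rgyas", "dang", "byang", "chub", "sems", "dpa'"]
chinese = ["佛", "與", "菩", "薩"]
agents = [
    Agent(("sangs", "rgyas"), ("佛",)),
    Agent(("byang", "chub", "sems", "dpa'"), ("菩", "薩")),
]
print(stack(agents, tibetan, chinese))  # [(0, 0), (3, 2)]
```

Each pair gives the starting syllable positions of a suggested correspondence. In an evolved system the rules would presumably be learned from manual alignments rather than written by hand as they are here.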
Table of contents:
1. Introduction to the ERC Open Philology Project
1.1. The MRK Collection as a test project
1.2. The Buddhist Canon as a Digital Object: Resolution and Scope
1.3. The Problems of Current Software
2. Examples
2.1. Manual tests
2.2. Computer random tests
2.3. Assembling an organism
2.4. Massive population parallel problem solving
3. Conclusions
3.1. Interpretations of Data
3.2. Comparison of Automated and Human Alignments
3.3. Further research
4. Data
Hits: 452
Record created: 2019.01.28
Record updated: 2019.02.26









