太平洋鄰里協會會論文集=Proceedings of EBTI, ECAI, SEER & PNC Joint Meeting
Publication date
1999
Pages
323-328
Publisher
Computing Centre, Academia Sinica (中央研究院計算中心)
Place of publication
Taipei, Taiwan
Resource type
Conference paper (proceedings article)
Language
English
Notes
Conference venue: Academia Sinica, Taipei. Organizer and co-organizer: Computing Centre, Academia Sinica.
Keywords
Moro, Shigeki; Taisho Tripitaka (大正新脩大藏經); Chinese Character Input; Database; Chinese Character Recognition; Buddhist Scriptures; Sutra; Gaiji; SAT
Abstract
In March 1998, the Association for the Computerization of Buddhist Texts (ACBUT) began publishing an electronic text database of the Taisho Tripitaka. The project is known by the nickname SAT.
The Taisho Tripitaka includes both classical Chinese and Japanese texts, so SAT texts are currently encoded in the JIS character set. In the not-too-distant future they will be converted to a larger set such as Unicode, but there will always be characters that cannot be input. Handling these gaiji (missing characters) is therefore the most important issue for projects like SAT. SAT now has about 90 published e-texts, which contain over 7 million characters. Over 17,000 characters cannot be input with JIS, and about 1,500 cannot be input with Unicode.
Following KanjiBase, developed by Dr. Christian Wittern, we now use SGML-style placeholders that are both standardized and system-independent. We are also investigating XML empty-element tags as a new solution.
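As a rough illustration of the two notations named in the abstract, the sketch below rewrites SGML-style entity placeholders as XML empty-element tags. It is a minimal example under stated assumptions, not the SAT or KanjiBase implementation: the placeholder shape (`&` + identifier + `;`), the sample identifier `M012345`, the element name `gaiji`, and the attribute name `ref` are all hypothetical, since the abstract does not specify them.

```python
import re

# Assumed placeholder shape: '&' + identifier + ';', e.g. "&M012345;".
# The real SAT / KanjiBase identifier scheme is not given in the abstract.
PLACEHOLDER = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

# Entity names predefined by XML; these are ordinary markup, not gaiji.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_empty_element(text, tag="gaiji"):
    """Rewrite SGML-style entity placeholders as XML empty-element tags.

    The tag name and the 'ref' attribute are illustrative choices only.
    """

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            # Leave &amp;, &lt; and similar references untouched.
            return match.group(0)
        return f'<{tag} ref="{name}"/>'

    return PLACEHOLDER.sub(replace, text)


if __name__ == "__main__":
    # A made-up line containing one hypothetical gaiji placeholder.
    print(to_empty_element("如&M012345;是"))
```

Either notation keeps the text itself within the base character set (JIS or Unicode) and defers the missing character to an external identifier. The empty-element form would additionally allow attributes to be attached to each occurrence.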