
Outreaching with Print resources in the digital Age

Outreaching with Print resources in the digital Age. Jidong Yang University of Michigan. Problems with CJK encodings.


Presentation Transcript


  1. Outreaching with Print resources in the digital Age. Jidong Yang, University of Michigan.

  2. Problems with CJK encodings • Chinese encodings have expanded slowly: from GB 2312 (about 6,700 characters) and Big5 (about 13,000 characters) to GBK (about 22,000 characters), GB 18030 (about 27,000 characters), GB 18030-2005 (more than 70,000 characters), and Unicode version 5 (similar in coverage to GB 18030-2005). • Not every computer supports every character, and many existing databases are built on the earlier encodings. • The mainstream Japanese encodings, JIS and EUC, each contain fewer than 7,000 kanji.
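The coverage gap between these encodings can be checked directly. The following is a minimal sketch that uses Python's standard codecs as a stand-in for the character sets named on the slide; the codec list and the helper name `coverage` are illustrative assumptions, and a codec's repertoire may not match the tables used by any particular database.

```python
# Python's built-in codecs approximate the character sets named on the slide.
# This is an illustration, not a statement about any specific database.
CODECS = ["gb2312", "big5", "gbk", "gb18030", "euc_jp"]

def coverage(char):
    """Map each codec name to True if `char` can be encoded in it."""
    result = {}
    for name in CODECS:
        try:
            char.encode(name)
            result[name] = True
        except UnicodeEncodeError:
            result[name] = False
    return result

# 鳩 is the traditional-form first character of 鳩摩羅什 (Kumārajīva).
for codec, ok in coverage("鳩").items():
    print(f"{codec:8} {'yes' if ok else 'no'}")
```

A traditional-form character such as 鳩 cannot be stored in a GB 2312 database at all, which is one way a text built on an earlier encoding can silently lose or substitute characters.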

  3. 国立国会図書館資料デジタル化の手引き (National Diet Library Guide to Digitizing Materials)

  4. The issue of OCR accuracy • Even with contemporary Chinese and Japanese publications in good condition, the best OCR software rarely achieves an accuracy rate above 95%. • With pre-modern CJK texts, OCR accuracy drops to 30-40% or even lower. • Many database companies do not disclose their OCR accuracy rates.
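To see what these rates could mean for full-text searching, here is a rough sketch. It assumes that recognition errors are independent from character to character, which is a simplification introduced here and not a claim from the slide; the function name `term_survival` and the four-character term length (as in 鳩摩羅什) are likewise illustrative.

```python
def term_survival(char_accuracy, term_length):
    """Chance that every character of a term was recognised correctly,
    assuming (simplistically) independent per-character errors."""
    return char_accuracy ** term_length

# Compare the accuracy levels quoted on the slide for a 4-character term.
for accuracy in (0.95, 0.40, 0.30):
    p = term_survival(accuracy, 4)
    print(f"accuracy {accuracy:.0%}: 4-character term intact with p = {p:.3f}")
```

Under this assumption, even 95% character accuracy leaves a four-character term intact only about 81% of the time, so an exact-match search would miss roughly one occurrence in five.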

  5. The early stage of digital scholarship • Research methods and tools suited to digital resources are still scarce and have yet to be developed. • A great many research tools in print format retain their value, at least for now.

  6. Databases vs. print indexes • How do you find information about Kumārajīva in the Gaoseng zhuan 高僧傳? • Search for Jiumoluoshi 鳩摩羅什? • Not enough! Also try Jiumoluoqipo 鳩摩羅耆婆, Shi 什, Shigong 什公, Shishi 什師, Tongshou 童壽, and Luoshi 羅什. All of these can be found in Ryō kōsō den sakuin 梁高僧傳索引, compiled by Makita Tairyō 牧田諦亮. • Databases are not necessarily better than print indexes.
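A database search can approximate what the print index does only if someone supplies the list of name variants first. The sketch below searches a text for every variant listed on the slide; the helper names and the sample sentence are made up for illustration, and the sample is not a quotation from the Gaoseng zhuan.

```python
import re

# Name variants for Kumārajīva, as listed on the slide.
VARIANTS = ["鳩摩羅什", "鳩摩羅耆婆", "什", "什公", "什師", "童壽", "羅什"]

def build_pattern(variants):
    # Try longer forms first so "鳩摩羅什" is not reported as "羅什" or "什".
    ordered = sorted(variants, key=len, reverse=True)
    return re.compile("|".join(re.escape(v) for v in ordered))

def find_variants(text, variants=VARIANTS):
    """Return (offset, matched form) pairs for every variant found in text."""
    return [(m.start(), m.group()) for m in build_pattern(variants).finditer(text)]

# Hypothetical sample string written for this example only.
sample = "鳩摩羅什,又稱童壽。什公譯經。"
hits = find_variants(sample)
print(hits)
print("exact-name search finds:", sample.count("鳩摩羅什"))
```

Searching only for 鳩摩羅什 finds one of the three references in this sample. Note also that the single-character variant 什 would match many unrelated passages in a real text, which is exactly the kind of judgment a compiled index has already made for the reader.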

  7. Conclusion • The computer still cannot match the book in its ability to present the full range of East Asian languages and cultures. • Print resources are still necessary for most serious research on East Asia. • It’s our job to make the value of our print collections known to our patrons.
