Make the WorldCat the World Catalog: How to optimize multilingual searching

Presentation Transcript


    1. Make the WorldCat the World Catalog: How to optimize multilingual searching
       Charlene Chou, Columbia University Libraries
       OCLC CJK Users Group 2009 Annual Meeting, AAS/CEAL Conference, March 27, 2009
       http://www.columbia.edu/~cc179/

    2. WorldCat & multilingual searching
       - An ideal world catalog
       - Strengths of multilingual searching in WorldCat
       - Current challenges of multilingual searching in WorldCat
       - Suggestion: language-specific approach + cross-language collaboration
       - A wish list for improvement

    3. An ideal world catalog
       - A global digital library for print, non-print, and electronic resources
       - Cross-language searching
       - Interdisciplinary resources
       - Cross-country information
       - Technology-supported services for librarians to assist multilingual users in using WorldCat
       - The key challenge: multilingual searching

    4. WorldCat: Strengths
       - Many non-English records and vendor records loaded in recent years: a more diverse database
       - Very useful for acquisitions, selection, and reference, since books reach us in the US later than in Asia
       - Publishing information helps cataloging, e.g. serials
       - A great source for authority control, e.g. finding an author's birth year to break a conflict or to separate a record from an undifferentiated-name record (N5L records)
       - Summary notes: CNPITC (China National Publishing Industry Trading Corporation) records
       - Keep 520 summary notes in Chinese, as they may provide a service to some WorldCat users

    5. WorldCat: Strengths
       - Parallel record policy: a non-English record can be used as a base record for creating an English-version/PCC record
       - Can search non-English records in non-Latin scripts, including non-MARC8 characters (N5L records)
       - Automatic conversion between traditional and simplified scripts for searching Chinese in WorldCat, but not in our ILS (Voyager)
       - Non-Latin references added to authority records (NAF) since July 2008; easier to search names/titles in CJK now
       - OCLC language sets available in 17 languages including CJK
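
The traditional/simplified conversion mentioned on this slide can be illustrated with a minimal Python sketch. The slides do not say how WorldCat implements the feature, so this is only an assumption-laden illustration: the three-character mapping table and the names `fold_to_simplified` and `matches` are hypothetical, and a real system would need a far larger table plus handling for one-to-many mappings.

```python
# Hypothetical, tiny traditional-to-simplified table (illustration only).
TRAD_TO_SIMP = str.maketrans({
    "圖": "图",
    "書": "书",
    "館": "馆",
})


def fold_to_simplified(text: str) -> str:
    """Replace each traditional character found in the table with its simplified form."""
    return text.translate(TRAD_TO_SIMP)


def matches(query: str, title: str) -> bool:
    """Substring match after folding both the query and the title to one script."""
    return fold_to_simplified(query) in fold_to_simplified(title)


if __name__ == "__main__":
    # A traditional-script query finds a simplified-script title, and vice versa.
    print(matches("圖書館", "中国图书馆学"))
    print(matches("图书馆", "圖書館學"))
```

Folding both sides at search time leaves the stored record untouched, so the original script of each record is preserved for display.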

    6. Multilingual searching: Challenges
       - The complexity of each language, e.g. Chinese
         - Traditional and simplified scripts & different encoding schemes
         - Chinese forms vary in Japanese, in other minority languages such as Tibetan or Mongolian, and across regions, e.g. Hong Kong or Taiwan
         - For example, "娘" in standard modern Japanese means "daughter", not "mother"; "老婆" in standard modern Japanese means "old lady", not "wife"
         - Also spaces and sorting for original scripts, as well as word division for Romanization
       - Multilingual is not just translation; it is tied to culture
       - Unicode display, esp. for non-Latin scripts, is not ready for certain languages or characters
       - Great to have both Romanized forms and original script as search options; the Romanized form as controlled form keeps consistency but also creates some ambiguity
       - Currently: Web browsers display multiple scripts and fonts; Babel Fish translates websites; websites offer a language choice; speech recognition software is available
       - Machine translation is just one component of a complex multilingual system (John White) (5)
       - Multilingual metadata is cross-cultural retrieval, far more than mere cross-language searching (Cliff Lynch)
       - Multilingual, multicultural and multimedia digital libraries
       - HKCAN is bilingual but may be multicultural in certain contexts
       - MARBI 2001-DP05: rename as "context-sensitive" authority record
       - From multilingual to multi-context? Global vs. localized library
       - VIAF/LEAF: matching the "glocalization" movement? (12)
       - Google: multilingual vs. localized site

    7. WorldCat: Current challenges
       - Conflicting information (quality control)
         - Same author under different headings, e.g. differing birth years
         - Monograph vs. serial, e.g. many monograph records for the same serial title
       - Some catalogers' concerns about non-English records (vendor- or Asian-produced):
         - More time-consuming to enhance an existing record that does not comply with our cataloging rules, e.g. subject headings (650 4)
         - Different Romanized forms, such as word division (the mono-syllabic word division)
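
The word-division problem can be made concrete with a small Python sketch. It assumes, purely for illustration, that two Romanized headings should match when they differ only in case, spacing, or hyphenation; the function names `romanization_key` and `same_romanized_form` are hypothetical and do not describe any OCLC matching rule.

```python
import re


def romanization_key(heading: str) -> str:
    """Build a match key that ignores case, whitespace, and hyphens."""
    return re.sub(r"[\s\-]+", "", heading).casefold()


def same_romanized_form(a: str, b: str) -> bool:
    """True when two headings differ only in word division or case."""
    return romanization_key(a) == romanization_key(b)


if __name__ == "__main__":
    # Mono-syllabic vs. joined word division of the same phrase.
    print(same_romanized_form("Zhong guo tu shu guan", "Zhongguo tushuguan"))
    # The same key also merges forms that may be different words.
    print(same_romanized_form("Xi an", "xian"))
```

The second call shows the cost of this design: dropping word division removes one source of mismatch but can merge headings that are genuinely different, which is the kind of ambiguity Romanized forms carry.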

    8. WorldCat: Current challenges
       - Lack of certain resources: easier to search via Google or other search engines for certain language-specific resources, including academic resources, e.g. professor or writer information
       - Unicode display: MARC-8 limitation for now
       - Certain languages not available, e.g. Tibetan
       - Example: an author's name in original script may appear in several forms in WorldCat, e.g. Wang "Hao" (?, a non-MARC8 character)
       - Easier to view all titles under a given author in other online catalogs in Asia, e.g. CALIS

    9. Search by ??: only three records retrieved in WorldCat

    10. Search "Wang, Hao, 1964-" in Connexion: only 5 correct records retrieved

    11. Same author with an incorrect Chinese character…

    12. If Unicode were fully implemented, all would fall under one name rather than 3 versions…

    13. It gets worse if not in Unicode, e.g. Kai dao tu "mi": mixed forms in Asia too

    14. Unicode in WorldCat: Current challenges
        - Both OCLC and MARC 21 have expanded their coverage to UTF-8
        - Key challenge: slow implementation in ILS
        - Certain scripts are available in Unicode but not implemented in WorldCat, e.g. Tibetan
        - Unicode for Tibetan was released ca. 3 years ago. Recently, MS Vista was released with a Tibetan Unicode font. WorldCat has plans to implement it, but it is not on the priority list.
        - Unicode may be the only choice for now, but a better solution may become available as technology develops
        - Hopefully, all computer systems will provide integral support for multi-script working in the next ten years…
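
As a rough illustration of the point that a script can exist in Unicode while a system still has to implement support for it, the following Python sketch detects Tibetan-block characters and shows their UTF-8 bytes. The helper names `contains_tibetan` and `describe` are hypothetical; the block range U+0F00 to U+0FFF is the Unicode Tibetan block.

```python
import unicodedata

# Unicode Tibetan block: U+0F00 through U+0FFF.
TIBETAN_BLOCK = range(0x0F00, 0x1000)


def contains_tibetan(text: str) -> bool:
    """True if any character falls inside the Tibetan block."""
    return any(ord(ch) in TIBETAN_BLOCK for ch in text)


def describe(text: str) -> list:
    """Return (Unicode character name, UTF-8 bytes as hex) for each character."""
    return [
        (unicodedata.name(ch, "UNKNOWN"), ch.encode("utf-8").hex())
        for ch in text
    ]


if __name__ == "__main__":
    sample = "\u0f40"  # TIBETAN LETTER KA
    print(contains_tibetan(sample))
    print(describe(sample))
```

Encoding is the easy part; display fonts, input methods, indexing, and sorting are the layers an ILS or WorldCat would still need to implement.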

    15. Comparison: far fewer Chinese characters available in MARC8
        - MARC8 (EACC): 16,000+ characters (used for record exchange in the US)
        - Unicode basics: 23,000+ characters, mostly available on the web/OPACs in Asia and on the web in the US
        - Unicode extended, such as the Fang Zhen Extended Dictionary: 71,000 characters, available in the National Library of China but not in most OPACs in Asia
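
The practical effect of a smaller character repertoire can be sketched in Python by asking which characters of a heading a given encoding cannot represent. Python's standard library has no MARC-8/EACC codec, so this sketch uses `ascii` as a stand-in for "a repertoire smaller than Unicode"; the function name `unencodable` is hypothetical.

```python
def unencodable(text: str, codec: str) -> list:
    """List the distinct characters of `text` that `codec` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    heading = "Wang 王"
    # The stand-in legacy codec cannot carry the Chinese character.
    print(unencodable(heading, "ascii"))
    # UTF-8 can encode every character in the heading.
    print(unencodable(heading, "utf-8"))
```

A record-exchange workflow could run a check like this before export and fall back to the Romanized form, or flag the record, when the list is not empty.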

    16. Multilingual Authority Records in WorldCat
        - Adding non-Latin references to the authority records since July 2008

    17. Adding non-Latin scripts to NAF
        - Romanized forms work fine as controlled forms and for non-MARC/Unicode characters
        - Makes updating authority records for undifferentiated names much more efficient
        - Helps disambiguate names
        - Easier to find the authorized form via original-script searching, e.g. Chinese translations of works by Korean, Russian, or Tibetan authors
        - For Chinese cataloging, we rely heavily on online catalogs in Asia and internet resources for authority work, esp. for breaking conflicts in undifferentiated-name records or finding more biographical information for names

    18. Adding a Chinese form to a Russian record: easier to find in Chinese
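
How non-Latin references make an authorized heading findable from any script can be sketched as a simple lookup index in Python. This is a simplified model under stated assumptions, not the NAF data model: the `AuthorityRecord` class, the helper functions, and the sample record (including its variant forms) are illustrative and were not copied from an actual authority file.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    authorized: str                                 # 1XX: authorized (Romanized) heading
    references: list = field(default_factory=list)  # 4XX: variant forms in any script


def build_index(records):
    """Map every form, authorized or variant, to its authorized heading."""
    index = {}
    for rec in records:
        for form in [rec.authorized, *rec.references]:
            index[form.casefold()] = rec.authorized
    return index


def find_authorized(index, query):
    """Return the authorized heading for a query in any indexed script, else None."""
    return index.get(query.casefold())


# Illustrative sample record (hypothetical data).
SAMPLE = [
    AuthorityRecord(
        "Pushkin, Aleksandr Sergeevich, 1799-1837",
        ["Пушкин, Александр Сергеевич", "普希金"],
    )
]

if __name__ == "__main__":
    idx = build_index(SAMPLE)
    # A Chinese-script query resolves to the Romanized authorized heading.
    print(find_authorized(idx, "普希金"))
```

The Romanized heading stays the single controlled form, while each added reference is just another entry point to it.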
