1. Make the WorldCat the World Catalog: How to optimize multilingual searching
Charlene Chou
Columbia University Libraries
at
OCLC CJK Users Group 2009 Annual Meeting
AAS/CEAL Conference
March 27, 2009
http://www.columbia.edu/~cc179/
2. WorldCat & multilingual searching
An ideal world catalog
Strengths of multilingual searching in the WorldCat
Current challenges of multilingual searching in the WorldCat
Suggestion: language-specific approach + cross-language collaboration
A wish list for improvement
3. An ideal world catalog
A global digital library for print/non-print/electronic resources
Cross-language searching
Interdisciplinary resources
Cross-country information
Technology-supported services for librarians to assist multilingual users in using the WorldCat
The key challenge: multilingual searching
4. WorldCat: Strengths
Many non-English records/vendor records loaded in recent years, making for a more diverse database
Very useful for acquisitions/selection/reference, since we receive books in the US later than they appear in Asia
Publishing information helps cataloging, e.g. serials
A great source for authority control, e.g. knowing the author's birth year to break a conflict or to separate a record from an undifferentiated record (N5L records)
Summary notes: CNPITC (China National Publishing Industry Trading Corporation) records
Keep 520 contents notes in Chinese, as they may provide a service to some WorldCat users
5. WorldCat: Strengths
Parallel record policy:
Non-English record can be used as a base record for creating an English-version/PCC record
Can search non-English records in non-Latin scripts including non-MARC8 characters (N5L records)
Automatic conversion between traditional and simplified scripts for searching Chinese in the WorldCat, but not in our ILS--Voyager
Adding non-Latin references to authority records (NAF) since July 2008
Easier to search names/titles in CJK now
OCLC language sets available in 17 languages including CJK
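The automatic traditional/simplified conversion mentioned above can be pictured as folding both the query and the indexed text to one script before comparing. The following is only a minimal sketch of that idea, not a description of how WorldCat implements it: the five-character table `TRAD_TO_SIMP` and the function names are hypothetical, and a real system would need a far larger mapping plus handling of one-to-many cases.

```python
# Hypothetical, tiny traditional-to-simplified table for illustration only.
TRAD_TO_SIMP = {
    "圖": "图",
    "書": "书",
    "館": "馆",
    "學": "学",
    "國": "国",
}


def fold_script(text: str) -> str:
    """Replace each traditional character with its simplified form when known."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def matches(query: str, title: str) -> bool:
    """Compare query and title after folding both to simplified script."""
    return fold_script(query) in fold_script(title)


# A traditional-script query finds a simplified-script title.
print(matches("圖書館", "中国图书馆学"))
```

An ILS without this folding step would treat the two scripts as unrelated strings, which is the gap noted for Voyager.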
6. Multilingual searching: Challenges
The complexity of each language, e.g. Chinese
Traditional and simplified scripts & different encoding schemes
Chinese forms vary in Japanese, in other minority languages such as Tibetan or Mongolian, and across regions, e.g. Hong Kong or Taiwan
For example, 娘 in standard modern Japanese means "daughter" and not "mother"; 老婆 in standard modern Japanese means "old lady" and not "wife."
Also spaces and sorting for original scripts as well as word division for Romanization
Multilingual access is not just translation; it is also tied to culture.
Unicode display, esp. for non-Latin scripts, not ready for certain languages or characters
Great to have both Romanized forms and original script as search options; a Romanized form used as the controlled form keeps consistency, but also creates some ambiguity.
Currently, Web browsers display multiple scripts and fonts; Babel Fish translates websites; websites offer a language choice; speech recognition software is available
Machine translation is just one component of a complex multilingual system (John White) (5)
Multilingual metadata is cross-cultural retrieval--far more than mere cross-language searching (Cliff Lynch)
Multilingual, multicultural and multimedia digital libraries
HKCAN is bilingual but may be multicultural in certain contexts
MARBI 2001-DP05: rename as context-sensitive authority record
From multilingual to multi-context?
Global vs. localized library
VIAF/LEAF: matching the glocalization movement? (12)
Google: multilingual vs. localized site
7. WorldCat: Current challenges
Conflicting information: quality control
Same author under different headings, e.g. birth year
Monograph vs. serial, e.g. many monograph records for the same serial title
Some catalogers' concerns about non-English records (vendor- or Asian-produced):
More time-consuming to enhance an existing record that does not comply with our cataloging rules, e.g. subject headings (650 4)
Different Romanized forms, such as word division (the mono-syllabic word division)
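To illustrate why differing word division in Romanized forms complicates matching, here is a minimal sketch that compares two Romanized strings while ignoring case, spaces, and punctuation. The function name `romanization_key` and the joined form "Kaidao tumi" are hypothetical illustrations, not taken from any cataloging rule or from OCLC practice.

```python
def romanization_key(text: str) -> str:
    """Build a match key that ignores case, spaces, and punctuation."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


syllable_by_syllable = "Kai dao tu mi"  # one syllable per word
joined_words = "Kaidao tumi"            # hypothetical alternative word division

# Both divisions reduce to the same key, so the records can be matched.
print(romanization_key(syllable_by_syllable) == romanization_key(joined_words))
```

The trade-off is the ambiguity noted earlier for Romanized forms: a key like this also merges strings that are genuinely different, e.g. "Xi an" and "Xian", so it suits candidate matching rather than final identification.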
8. WorldCat: Current challenges
Lack of certain resources:
Easier to search via Google or other search engines for certain language-specific resources, including academic resources, e.g. professor or writer information
Unicode display:
MARC-8 limitation for now
Certain languages not available, e.g. Tibetan
Example: an author name in original script may appear in several different forms in the WorldCat, e.g. Wang Hao (?non-MARC8 character). It is easier to view all titles under a given author in other online catalogs in Asia, e.g. CALIS
9. Search by ??: only three records retrieved in the WorldCat
10. Search for Wang, Hao, 1964- in Connexion: only 5 correct records retrieved
11. Same author with an incorrect Chinese character
12. If Unicode is fully implemented, all records fall under one name rather than 3 versions
13. It gets worse if not in Unicode, e.g. Kai dao tu mi: mixed forms in Asia too
14. Unicode in WorldCat: Current challenges
Both OCLC & MARC21 have expanded their coverage to UTF-8
Key challenge: slow implementation in ILS
Certain scripts available in Unicode but not implemented in WorldCat, e.g. Tibetan
Unicode for Tibetan was released ca. 3 years ago. Recently, MS Vista was released with Tibetan Unicode font.
WorldCat has plans to implement it, but it is not on the priority list.
Unicode may be the only choice for now, but a better solution may become available in the future as technology develops. Hopefully, all computer systems will provide integral support for multi-script working within the next ten years
15. Comparison: far fewer Chinese characters available in MARC8
MARC8 (EACC): 16,000+ characters (used for record exchange in US)
Unicode basics: 23,000+ characters, mostly available on web/OPAC in Asia & web in US
Unicode extended, such as Fang Zhen Extended Dictionary: 71,000 characters, available in National Library of China but not in most OPACs in Asia
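One way to see which repertoire a given character falls into is to check its Unicode code point. The sketch below uses three block ranges from the Unicode Standard; the function names are hypothetical, and treating the first two ranges as "basics" and the third as "extended" in the sense of this comparison is an assumption, not something the comparison itself states.

```python
def cjk_block(ch: str) -> str:
    """Name the CJK ideograph block a single character belongs to."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:
        return "CJK Unified Ideographs"
    if 0x3400 <= cp <= 0x4DBF:
        return "CJK Extension A"
    if 0x20000 <= cp <= 0x2A6DF:
        return "CJK Extension B"
    return "other"


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


common = "王"               # U+738B, a common surname character
rare = chr(0x20000)         # first code point of Extension B

print(cjk_block(common), utf8_length(common))
print(cjk_block(rare), utf8_length(rare))
```

Characters in Extension B lie outside the Basic Multilingual Plane and take four bytes in UTF-8, which is one reason systems and fonts that handle common characters may still fail on rare ones.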
16. Multilingual Authority Records in the WorldCat
Adding non-Latin references to the authority records since July 2008
17. Adding non-Latin scripts to NAF
Romanized forms work fine as controlled forms and for non-MARC/Unicode characters
Makes updating authority records for undifferentiated names much more efficient
Helps disambiguate names
Makes it easier to find the authorized form via original-script searching, e.g. Chinese translations of works by Korean, Russian, or Tibetan authors, etc.
For Chinese cataloging, we rely heavily on online catalogs in Asia and internet resources for authority work, esp. for breaking conflicts in undifferentiated-name records or finding more biographical information for names.
18. Adding Chinese form to a Russian record: easier to find in Chinese