Multilingual Interfaces for Biodiversity Information. Guy Baillargeon, Agriculture and Agri-Food Canada, Ottawa. Key Innovations in Biodiversity Informatics, Indaiatuba, SP, Brazil, 21-22 Oct. 2002.
Abstract The Global Biodiversity Information Facility (GBIF) intends to promote standards and software tools designed to facilitate their adaptation into multiple languages. Countries, economies and organizations participating in GBIF are invited to develop novel user interface designs that support their functionality in a multilingual global context, and to develop standards and protocols for indexing, validation, documentation and quality control in multiple human languages, character sets and computer encodings. Ensuring that GBIF applications perform well in any language, and that biodiversity data can be put to good use regardless of the language of the primary records, is a formidable but achievable challenge. Meeting it is essential if GBIF is to fulfill its potential. How some of this can be done will be demonstrated by outlining the steps required to add a new language to the Integrated Taxonomic Information System (ITIS) and to the Biological Observations, Specimens and Collections (BiOSC) Gateway. A new Portuguese version, developed in cooperation with Brazil, will be shown for the first time.
Think globally • 92% of the world's population speaks little or no English • 20 main Asian languages • 15 main European languages Source: http://www.alis.com/pdf/GlobalisationEN.pdf
An enormous diversity Source: http://www.ethnologue.com/ethno_docs/distribution.asp
Human languages hit parade Source: http://www.krysstal.com/spoken.html combined with http://www.multilingualplanet.com/most_spoken_languages.htm
World online population As of Sept. 2002 Total: 619 million Source: http://www.glreach.com/globstats/
Non-English growing faster Source: http://global-reach.biz/globstats/evol.htm
Multiple languages • English-speaking Internet users • 2000: 58% • 2005 (projected): < 35% • Non-English online traffic • 2000: 40% • 2005 (projected): 70% Source: http://www.alis.com/pdf/GlobalisationEN.pdf
GBIF MOU Goals • 2. “It is the intention of the Participants that GBIF: • […] • (d) promote standards and software tools designed to facilitate their adaptation into multiple languages, character sets and computer encodings”. • […] Source: http://www.gbif.net/moufinal.doc
GBIF MOU Scope of activities • 4 (a) (iii) “Developing suitable tools and standards for accessing, linking and analysing new and existing databases, including standards and protocols for indexing, validation, documentation and quality control in multiple human languages, character sets and computer encodings;” Source: http://www.gbif.net/moufinal.doc
ITIS North America • A joint project by • United States • Canada • Mexico • sis.agr.gc.ca/itis
Canadian context • Requirement for a bilingual version of ITIS in Canada • Not changing the underlying data model • Wanted reusable components • Capability to handle other languages as well [Diagram: ITIS, translation module, other multilingual applications, client browser]
Introducing ITIS*Brazil • A new version of ITIS in Portuguese • SIIT*Brasil: Portuguese version • Developed in cooperation with CRIA • August-September 2002 • www.itis.cria.org.br
ITIS/BiOSC data flow diagram [Diagram components: client browser, ITIS, translation module, other multilingual applications, BiOSC Gateway, REMIB, TSA, ENHSIN, AVH, DIGIR, WMS map server, DEMIS map WMS layers, and several databases (DB). Steps: 1: Query ITIS; 2: Click the "Map it!" button; 3: Get record index data from BiOSC; 4: Get full record from data owner.]
How it is done • Selective translation • Semantic partitioning • Automated rendering • Localisation • Cultural conventions (date format, decimal separators, number format) • Alternate spellings
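The "cultural conventions" part of localisation can be sketched in Python as below. The CONVENTIONS table, its language codes, and the separators chosen for each language are illustrative assumptions rather than ITIS's actual settings; a production system would take them from a locale database instead of hard-coding them.

```python
from datetime import date

# Hypothetical per-language conventions, for illustration only.
CONVENTIONS = {
    "en": {"date": "{m:02d}/{d:02d}/{y}", "decimal": ".", "group": ","},
    "fr": {"date": "{d:02d}/{m:02d}/{y}", "decimal": ",", "group": " "},
    "es": {"date": "{d:02d}/{m:02d}/{y}", "decimal": ",", "group": "."},
    "pt": {"date": "{d:02d}/{m:02d}/{y}", "decimal": ",", "group": "."},
}

def format_date(value: date, lang: str) -> str:
    """Render a date using the day/month order of the given language."""
    conv = CONVENTIONS[lang]
    return conv["date"].format(d=value.day, m=value.month, y=value.year)

def format_number(value: float, lang: str, decimals: int = 2) -> str:
    """Render a number with the language's decimal and grouping separators."""
    conv = CONVENTIONS[lang]
    neutral = f"{value:,.{decimals}f}"  # e.g. "12,345.68"
    # Park the grouping commas on a placeholder so "," and "." never collide.
    return (neutral.replace(",", "\0")
                   .replace(".", conv["decimal"])
                   .replace("\0", conv["group"]))

print(format_date(date(2002, 10, 21), "fr"))  # 21/10/2002
print(format_number(12345.678, "pt"))         # 12.345,68
```

Keeping the conventions in data rather than in code means adding a language is a table change, in the same spirit as the translation table described later.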
Architecture • Single multilingual application server • Each stored procedure: • Handles all languages • Is locale sensitive • Single character set in a single encoding • Linguistic sorting
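To show why linguistic sorting matters, here is a minimal Python sketch contrasting binary (code point) order with a simplified linguistic key: base letters first, accents and case only as a tie-breaker. This is a deliberately small stand-in, not a full collation such as the Unicode Collation Algorithm, and the sample names are hypothetical.

```python
import unicodedata

def linguistic_key(word: str):
    """Compare base letters first; fall back to the original word to break ties."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

names = ["Zea", "Érable", "Acer", "erable"]
print(sorted(names))                      # ['Acer', 'Zea', 'erable', 'Érable']
print(sorted(names, key=linguistic_key))  # ['Acer', 'erable', 'Érable', 'Zea']
```

Binary order places "Érable" after "Zea" because É has a higher code point than every ASCII letter. The linguistic key files it with the other E words.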
General issues • Treat look and feel independently of language issues • Determine user preference • Handle non-ASCII form input and query strings • Provide a procedure for content translation • Tag HTML output with encoding information
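Tagging output with its encoding can be sketched as follows. This assumes a generic Python helper (html_page is hypothetical, not part of ITIS) that declares the charset both in the HTTP Content-Type header and in a meta tag, then encodes the page with that same charset so the declaration and the bytes cannot disagree.

```python
from html import escape

def html_page(title: str, body: str, lang: str,
              charset: str = "UTF-8") -> tuple[dict[str, str], bytes]:
    """Build an HTML page whose declared charset matches its actual encoding."""
    # HTTP header: the browser reads this before the document itself.
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Same declaration inside the page, for copies saved without headers.
    page = (
        f'<html lang="{lang}">\n<head>\n'
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        f"<title>{escape(title)}</title>\n</head>\n"
        f"<body>{body}</body>\n</html>\n"
    )
    # Encode with exactly the charset that was declared.
    return headers, page.encode(charset)

headers, page = html_page("Statut", "<p>accepté</p>", "fr")
```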
Look and feel • Stored as blocks of static HTML components • Header • Footer • Background • Images, buttons, logos
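One way to keep look and feel independent of language is to store the static blocks once and wrap them around content that has already been translated. In this Python sketch the LOOK_AND_FEEL blocks and the render helper are hypothetical placeholders.

```python
# Hypothetical static look-and-feel blocks, stored once and shared by every language.
LOOK_AND_FEEL = {
    "header": '<body background="bg.gif"><img src="logo.gif" alt="ITIS">',
    "footer": "<hr></body>",
}

def render(content: str) -> str:
    """Wrap already-translated content in the language-independent blocks."""
    return LOOK_AND_FEEL["header"] + content + LOOK_AND_FEEL["footer"]

english = render("<p>Current Standing: accepted</p>")
french = render("<p>Statut: accepté</p>")
```

A design change then touches only the shared blocks, and adding a language never requires copying the page layout.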
User preference • Language and locale defined by a parameter passed with each request • User selectable
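Reading the language from a request parameter, with a safe default, might look like the Python sketch below. The parameter name p_lang is borrowed from the code snippet later in this talk. The example URLs, the SUPPORTED list, and the English default are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

SUPPORTED = ("en", "fr", "es", "pt")  # assumed set of available languages
DEFAULT_LANG = "en"                   # assumed fallback

def requested_language(url: str, param: str = "p_lang") -> str:
    """Return the language code passed in the query string, or the default."""
    values = parse_qs(urlsplit(url).query).get(param, [])
    lang = values[0].lower() if values else DEFAULT_LANG
    return lang if lang in SUPPORTED else DEFAULT_LANG

print(requested_language("http://example.org/itis?p_lang=pt"))  # pt
print(requested_language("http://example.org/itis?p_lang=xx"))  # en
```

Because the language is carried in the URL, every page stays bookmarkable in the user's chosen language.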
Query string handling • URLs can only be encoded in 7-bit ASCII • Bytes outside that range are transformed into their hexadecimal representation prefixed by a percent sign • Requires decoding by the application • German word “Schloß” converted to • Schlo%c3%9f (UTF-8) • Schlo%DF (ISO-8859-1, Latin-1)
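The Schloß example can be reproduced with Python's standard urllib.parse module. The same word yields different percent-encoded forms depending on the byte encoding, and the application must decode with the encoding that was used to encode.

```python
from urllib.parse import quote, unquote

word = "Schloß"

# Percent-encode the same word under two different byte encodings.
utf8_form = quote(word)                        # UTF-8 is quote()'s default
latin1_form = quote(word, encoding="latin-1")  # ISO-8859-1
print(utf8_form)    # Schlo%C3%9F
print(latin1_form)  # Schlo%DF

# Decoding round-trips only when the right encoding is assumed.
print(unquote(utf8_form) == word)                        # True
print(unquote(latin1_form, encoding="latin-1") == word)  # True
# Decoding the Latin-1 form as UTF-8 yields U+FFFD, the replacement character.
print(unquote(latin1_form) == "Schlo\ufffd")             # True
```

Python emits uppercase hex digits. Percent-encoding is case-insensitive, so %c3%9f and %C3%9F are equivalent.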
Base letter conversion • Convert accented characters to unaccented ones for easier querying • éèêë to e • òóôõö to o • àáâãäå to a • ùúûü to u • ç to c • ñ to n • Output in correct (accented) form
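Base letter conversion can be sketched with Unicode canonical decomposition (NFD). Each accented letter splits into a base letter plus combining marks, and dropping the marks leaves the base letter. This is one standard technique, not necessarily the one ITIS used. The search compares folded keys, while the stored, accented form is still what gets displayed.

```python
import unicodedata

def base_letters(text: str) -> str:
    """Strip diacritics: decompose (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, stored: str) -> bool:
    """Accent- and case-insensitive comparison used for searching."""
    return base_letters(query).casefold() == base_letters(stored).casefold()

print(base_letters("éèêë òóôõö àáâãäå ùúûü ç ñ"))  # eeee ooooo aaaaaa uuuu c n
print(matches("erable a sucre", "Érable à sucre"))  # True
```

Letters such as ø, æ and ß have no canonical decomposition and pass through unchanged, which is one reason separate alternate-spelling rules are needed.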
Alternate spellings • German, Danish and Swedish • ä to ae • ö to oe • ü to ue • å to aa • ø to oe • æ to ae • Output in standard format
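The alternate spellings above map naturally onto a character translation table. This Python sketch uses exactly the substitutions listed on the slide. The NFC normalization step is an added assumption, there so that decomposed input (u followed by a combining diaeresis) is also caught.

```python
import unicodedata

# Substitutions from the slide (German, Danish and Swedish conventions).
ALTERNATES = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa", "ø": "oe", "æ": "ae"}
)

def alternate_key(text: str) -> str:
    """Normalize, fold case, then spell out letters with a two-letter form."""
    return unicodedata.normalize("NFC", text).casefold().translate(ALTERNATES)

print(alternate_key("Müller") == alternate_key("Mueller"))  # True
print(alternate_key("Søren"))                                # soeren
```

Base letter conversion would turn "Müller" into "muller", while this table gives "mueller". One option is to index both keys so that either form of a query finds the record.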
Translation table • All translation strings are externalized to a database table • String_id, Language_id, Translation • Primary key on String_id and Language_id • Translations are retrieved via SQL
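The translation table can be sketched with Python's built-in sqlite3 module. The column names follow the slide, and string 177 with its four translations comes from the code snippet later in this talk. The table name, the text language codes, and the translate helper are assumptions, and SQLite stands in for whatever database ITIS actually used.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE translation (
        string_id   INTEGER NOT NULL,
        language_id TEXT    NOT NULL,
        translation TEXT    NOT NULL,
        PRIMARY KEY (string_id, language_id)  -- one text per string per language
    )""")
conn.executemany(
    "INSERT INTO translation VALUES (?, ?, ?)",
    [(177, "en", "Current Standing"), (177, "fr", "Statut"),
     (177, "es", "Estado actual"), (177, "pt", "Posição atual")],
)

def translate(string_id: int, language_id: str) -> Optional[str]:
    """Retrieve one translation via SQL; None if it has not been entered yet."""
    row = conn.execute(
        "SELECT translation FROM translation WHERE string_id = ? AND language_id = ?",
        (string_id, language_id),
    ).fetchone()
    return row[0] if row else None

print(translate(177, "pt"))  # Posição atual
```

With this design, adding a language means inserting rows, not changing code, and the composite primary key rejects a second translation of the same string in the same language.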
Code snippet
htp.prn(ctislib.multitext(177, p_lang) || ': ' || ctislib.multidata(p_lang, 90, v_info_cursor.currency_rating));
Output by language:
• en: Current Standing: accepted
• fr: Statut: accepté
• es: Estado actual: aceptado
• pt: Posição atual: aceito
• The multitext function accepts (string id number, language parameter) to translate application text
• The multidata function accepts (language parameter, table id number, text to be translated) to translate data
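To make the roles of the two functions concrete, here is a hypothetical Python analogue of multitext and multidata that reproduces the four outputs shown. The in-memory dictionaries, the fallback to English for application text, and the rule that untranslated data values are returned unchanged are all assumptions. They are not the behaviour of the actual ctislib package.

```python
# Hypothetical stand-ins for the translation tables.
APP_TEXT = {  # (string id, language) -> application text
    (177, "en"): "Current Standing", (177, "fr"): "Statut",
    (177, "es"): "Estado actual", (177, "pt"): "Posição atual",
}
DATA_TEXT = {  # (table id, language, stored value) -> translated value
    (90, "fr", "accepted"): "accepté",
    (90, "es", "accepted"): "aceptado",
    (90, "pt", "accepted"): "aceito",
}

def multitext(string_id: int, lang: str) -> str:
    """Translate application text; assumed to fall back to English."""
    text = APP_TEXT.get((string_id, lang))
    return text if text is not None else APP_TEXT[(string_id, "en")]

def multidata(lang: str, table_id: int, value: str) -> str:
    """Translate a data value; assumed to return it unchanged if untranslated."""
    return DATA_TEXT.get((table_id, lang, value), value)

def current_standing(lang: str, currency_rating: str) -> str:
    return multitext(177, lang) + ": " + multidata(lang, 90, currency_rating)

for lang in ("en", "fr", "es", "pt"):
    print(lang, current_standing(lang, "accepted"))
```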
Conclusion • Translation module works well for Western European languages • Could probably easily handle other languages using Latin script • Could probably expand to other alphabets such as Greek and Cyrillic • The big challenge: East Asian scripts • Japanese, Chinese, Korean …
Credits • Canada • Guy Baillargeon • Derek Munro • Brazil • Vanderlei Perez Canhos • Dora Ann Lange Canhos • Sidnei de Souza