LRC-XI: 11th Annual Internationalisation and Localisation Conference. A Paper On Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach. Presented By: Prof. Manikrao L. Dhore, Mr. Abhishek K. Dhote, Department of Computer Engineering
LRC-XI: 11th Annual Internationalisation and Localisation Conference
A Paper On Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach
Presented By: Prof. Manikrao L. Dhore, Mr. Abhishek K. Dhote
Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India
Organised By: Localisation Research Centre (LRC), Department of Computer Science and Information Systems (CSIS), University of Limerick, Limerick, Ireland
Agenda
• Introduction
• Why Web Page Localisation?
• Borderless Integration
• Why Multilingual Web Sites?
• What is Locale and Multi-Locale Operation?
• Internationalisation and Key Challenges
• I18n Standard: Important Issues and Business Context
• Variance: Regional and Cultural Issues
• System Design
• Web Localisation and Rural India
• Localisation Approaches
• Architecture of Servers
• System Implementation and Test Results
• Configuration of Server
• Localisation Test Results
• Alternative Approach
• Conclusion
• References
Why Web Page Localisation?
• Increased sales leads
• Advantage of global growth
• Reduced marketing costs
[Diagram labels: Service Sector; Online Business; Banking Sector; International Market and Customers; Web Localisation; Internet; Information Repository; Closed Linguistic Barriers; Open Linguistic Barriers; Objective; Information; Convenience]
Borderless Integration Model
[Diagram labels: Business Process; Local Business Entities; Customer; Integration Logic; Resource Mapping; Global; Global Integration; Deployment; Business Logic; Market Research; Analyse; Optimize Process; Internet Framework]
Why Multilingual Websites?
• Over 100 million people access the Internet in a language other than English.
• Over 50% of web users speak a native language other than English.
• According to Forrester Research, 50% of all online sales are expected to occur outside the USA.
• Web users are four times more likely to purchase from a site that communicates in the customer's native language.
"Your website is your window to the world…"
Basic Terminology
• Locale: a set of features that can vary depending on the language and culture of the user or the data
• Internationalisation: the process of designing software so that it can be easily adapted to different locales
• Localisation: the process of adapting software to a locale
What is Locale?
• A locale is an abstraction: a data processing structure that identifies a collection of culturally and linguistically affected preferences.
• Java locales are associated with upwards of 300 pieces of data, for example:
  • Time zone names
  • Collation sequences
  • The infinity symbol
  • Number formats
  • Days of the week
• Locales generally do not contain this data themselves. They represent a way of obtaining "localised behaviour" in the system.
• Locales are generally part of the programming context or environment.
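As a hedged illustration of the point that a locale is an identifier rather than a container of data, the Java sketch below asks the platform for two locale-dependent items (a weekday name and the infinity symbol) using a Locale as the lookup key. The class name LocaleLookupDemo and its helper methods are invented for this example and are not taken from the presented system.

```java
import java.text.DecimalFormatSymbols;
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical demo class, not part of the presented system.
final class LocaleLookupDemo {

    // The Locale only names a language/region. The weekday text comes from
    // the platform's locale data, looked up with the Locale as a key.
    static String mondayName(Locale locale) {
        return DayOfWeek.MONDAY.getDisplayName(TextStyle.FULL, locale);
    }

    // Another locale-sensitive item mentioned on the slide: the infinity symbol.
    static String infinitySymbol(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getInfinity();
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN, Locale.forLanguageTag("hi-IN")
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + mondayName(locale) + ", " + infinitySymbol(locale));
        }
    }
}
```

The exact strings printed depend on the locale data shipped with the Java runtime in use.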
Multi-Locale Operation
• APIs provide late-binding localisation
[Diagram labels: Client Locale; Message Passing; Logic Execution; Server Processes; System Context; Context Separation; Design Policy]
Internationalisation
• "I18n" is an abbreviation of the word "Internationalisation". It is formed from the letter "i", the count of the 18 letters that follow it, and the final letter "n": i + (n-t-e-r-n-a-t-i-o-n-a-l-i-s-a-t-i-o = 18 letters) + n
• The extension of this naming convention to the terms Localisation (l10n), Europeanisation (e13n), Japanisation (j10n) and Globalisation (g11n) seems to have come somewhat after the invention of "i18n".
• Internationalised software can potentially handle many of the world's languages and customs:
  • Displaying and inputting characters of the users' native languages
  • Handling the popular encodings for the users' native languages
  • Native characters in file names and other items
  • Character classification and sorting
  • Typesetting and hyphenation rules
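The naming convention described above can be checked with a small worked example. The Java sketch below is illustrative only; the class and method names are made up for this purpose. It keeps the first and last letters of a word and replaces the letters in between with their count.

```java
import java.util.Locale;

// Hypothetical helper illustrating the "i18n" naming convention.
final class NumeronymDemo {

    // first letter + number of letters in between + last letter
    static String numeronym(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() < 3) {
            return w; // nothing to abbreviate
        }
        return w.charAt(0) + String.valueOf(w.length() - 2) + w.charAt(w.length() - 1);
    }

    public static void main(String[] args) {
        String[] terms = {
            "Internationalisation", "Localisation", "Europeanisation",
            "Japanisation", "Globalisation"
        };
        for (String term : terms) {
            System.out.println(term + " -> " + numeronym(term));
        }
    }
}
```

Run on the five terms from the slide, this yields i18n, l10n, e13n, j10n and g11n.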
Key Challenges
• Unicode support and implementation
• Use of language-specific encoding
• Configuring encoding
• Availability, performance
• Continuity of i18n features
• Translation
• UI design
• Handling collation
• Migration of existing data
[Diagram labels: Standards; Encoding and Character Set; Locale and Parameterisation; Data Correspondence; Presentation, Processing; Reference Information]
Important Issues in I18n
• Character encodings
• Date/Time
• Culture context
• Language rules
• UI preferences
• Currency
• Content management
• Localisation
• Business impact
Business Context of I18n
• To improve the effectiveness of globally distributed business users by providing language/culture-specific application, product and service interfaces
• To reach out to a global customer base by providing language/culture-specific interfaces and allowing for international preferences
• Mergers/acquisitions: to consolidate applications or services with the same functionality that were developed and maintained separately for separate languages or regions
• To support region-specific functionality (due to legal aspects, financial practice etc.)
• To provide region-specific value-added services (such as UI, look and feel, sorting/searching)
[Diagram labels: Internationalisation; New Application; New Service; New Product; Old Product; Old Application; Existing Service]
Regional and Cultural Differences
• Software solutions should be designed to fit into the cultural context of the user
• Examples
  • Naming of the product
  • Differences in the meaning of jargon
  • Confusing graphical symbols
  • National rules and conventions
  • Religious beliefs and assumptions
  • Basic cultural values and customs
  • No appropriate translations available for phrases and slogans
  • Favourite sports and slang
  • Cultural anachronisms
  • Reading direction: left-to-right, top-to-bottom etc.
Language and Character Encoding
• Language peculiarities
  • Hyphenation
  • Collation
  • Spelling
  • Transliteration
• Alphabet order differs by language
  • English: ABC...RSTUVWXYZ
  • German: AÄB...NOÖ...SßTUÜV…YZ
  • Swedish/Finnish: AB...STUVWXYZÅÄÖ
  • Norwegian: AB…VWXYÜZÆØÅ
• There are various "standards", and they vary between languages
  • ISO and Windows encodings: ISO-8859-1 to ISO-8859-7, Windows-1252
  • Chinese encodings: Big5, Big5-HKSCS, GB18030, GB2312
  • Japanese and Korean: EUC-JP, EUC-KR, ISO-2022-JP, ISO-2022-KR
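To make the collation point concrete, the following Java sketch (an illustration with invented class and method names, not code from the paper) sorts the same words twice: once by raw code point order, as String.compareTo does, and once with java.text.Collator for a given locale. Non-ASCII letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class contrasting binary and locale-aware ordering.
final class CollationDemo {

    // Plain String ordering compares UTF-16 code units, ignoring language rules.
    static String[] sortByCodePoint(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Collator applies the ordering rules the runtime has for the locale.
    static String[] sortFor(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "Zebra", "\u00C4pfel", "Apfel" }; // \u00C4 is A with diaeresis
        System.out.println("Code point: " + Arrays.toString(sortByCodePoint(words)));
        System.out.println("German:     " + Arrays.toString(sortFor(Locale.GERMAN, words)));
        System.out.println("Swedish:    "
                + Arrays.toString(sortFor(Locale.forLanguageTag("sv"), words)));
    }
}
```

Code point order places "Zebra" before "Äpfel", while the German collator places "Äpfel" before "Zebra". Under Swedish rules one would expect Ä to sort after Z, matching the alphabet listed above; whether a particular runtime does so depends on the collation data it ships, so no output is claimed here.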
Unicode Character Standard
• Developed by the Unicode Consortium
• Covers all major living scripts
• Version 4.0 has 96,000+ characters
• Capacity for 1 million+ characters
• Unicode Character Set = ISO/IEC 10646
• Unicode adds character properties and algorithms
• ISO and Unicode work together to synchronise
• ISO support enhances international acceptance
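The "1 million+" capacity can be worked out from the size of the Unicode code space: 17 planes of 65,536 code points each, that is 1,114,112 code points from U+0000 to U+10FFFF. The small Java sketch below (class and method names are illustrative assumptions) reads that limit from the Character class and also shows that a character outside the first plane occupies two Java char values but counts as one code point.

```java
// Hypothetical demo class for the size of the Unicode code space.
final class UnicodeCodeSpaceDemo {

    // Code points run from 0 to Character.MAX_CODE_POINT (0x10FFFF) inclusive.
    static int codeSpaceSize() {
        return Character.MAX_CODE_POINT + 1;
    }

    // Counts Unicode code points rather than UTF-16 code units.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println("Code space size: " + codeSpaceSize());
        String sample = "a\uD83D\uDE00"; // 'a' followed by U+1F600 as a surrogate pair
        System.out.println("char count: " + sample.length());
        System.out.println("code point count: " + codePointCount(sample));
    }
}
```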
Date/Time Format Variance
• Hour/minute separators, AM/PM markers, time zone
  • India: 4:00 P.M.
  • U.S.A.: 4:00 p.m.
  • France: 16.00
  • Japan: 1600
  • Japan: 4:00
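A minimal Java sketch of how such variance is usually obtained in code rather than hard-coded: the same time of day is formatted with the short time style of several locales. The class name and the choice of java.time classes are assumptions made for this illustration; the slides do not say which API the authors used.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical demo class: one time value, several locale-specific renderings.
final class TimeFormatDemo {

    // Formats a time using the locale's own "short" time pattern.
    static String shortTime(LocalTime time, Locale locale) {
        return DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT)
                .withLocale(locale)
                .format(time);
    }

    public static void main(String[] args) {
        LocalTime fourPm = LocalTime.of(16, 0);
        Locale[] locales = {
            Locale.forLanguageTag("en-IN"), Locale.US, Locale.FRANCE, Locale.JAPAN
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": " + shortTime(fourPm, locale));
        }
    }
}
```

The patterns come from the runtime's locale data, so the separators and AM/PM markers printed may differ between Java versions and from the examples on the slide.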
Numbers / Currency Variance
• Varieties in group and fractional separators
  • India: 12,34,567.89
  • England: 12,345.67
  • Germany: 12.345,67
  • Switzerland: 12’345,67
  • Swiss money: 12’345.67
  • France: 12 345,67
• Varieties in symbol placement, symbol length, precision, number width, rounding rules
  • India: Rs. 12,34,567.89 ; Re. 1
  • U.S.A.: US $1,234,567.89
  • France: 12.345,67 €
  • Portugal: 12$34ESC
  • Portugal: 12$34€
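The separator and symbol differences listed above are typically delegated to java.text.NumberFormat. The sketch below is a hedged example with an invented class name; it formats one value as a plain number and as currency for a few locales.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class for locale-specific number and currency formatting.
final class NumberFormatDemo {

    // Grouping and decimal separators are chosen by the locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Currency symbol, its placement and the precision are chosen by the locale.
    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.forLanguageTag("en-IN"), Locale.UK, Locale.GERMANY,
            Locale.forLanguageTag("de-CH"), Locale.FRANCE, Locale.US
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + number(1234567.89, locale) + " | " + currency(1234567.89, locale));
        }
    }
}
```

For example, the German number format renders 12345.67 as 12.345,67 and the US format renders it as 12,345.67. Results for other locales, including the Indian lakh/crore grouping, depend on the locale data of the Java runtime, so they are not asserted here.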
Languages Usage Index
Data Source: 2001 Census of India
Language            Number         Percentage
Hindi               337,272,114    40.22%
Bengali              69,595,738     8.30%
Telugu               66,017,615     7.87%
Marathi              62,481,681     7.45%
Tamil                53,006,368     6.32%
Urdu                 43,406,932     5.18%
Gujarati             40,673,814     4.85%
Kannada              32,753,676     3.91%
Malayalam            30,377,176     3.62%
Oriya                28,061,313     3.35%
Punjabi              23,378,744     2.79%
Assamese             13,079,696     1.56%
Sindhi                2,122,848     0.25%
Nepali                2,076,645     0.25%
Konkani               1,760,607     0.21%
Manipuri              1,270,216     0.15%
Kashmiri                 56,693     0.01%
Sanskrit                 49,736     0.01%
Other Languages      31,142,376     3.71%
Total               838,583,988   100.00%
Indian Currency Example
[Figure: Indian currency note (value Rs. 10) with its language panel of 15 major Indian languages]
• Population residing in villages of India: 70%
• Total number of languages in India: 40
• Official languages: 22
• Overall literacy rate: 64.20%
• English language literacy: 17.75%
Information Channelisation
• Internationalisation: prepare material for localisation (account for text expansion, avoid embedded text, etc.)
• Text Extraction: extract text from source files (graphics, PDFs etc.)
• Translation: translate content from the extracted materials
• Localisation: replace graphics, change colours, redesign layout to accommodate the target culture
Localisation Process
• Site acceptance factors: colour, image, representation
[Diagram labels: Translation Errors; Text Placement in Separate File; Web page is "dynamically" converted into target language; Language selection; Static web page is selected and displayed; Mapping Techniques; Late Binding Localisation; Translation]
Server Architecture
[Diagram labels: Client Browser_1, Client Browser_2, Client Browser_3, ..., Client Browser_n; Socket API; Parse Request Module; HTML Server; Localised Content; Default Language Response; Alternative Language Response; Property File]
Implementation: Parse Request Module
• Definition
  • To parse the request header
• Responsibilities
  • To parse the request header
  • To analyse and forward the request
  • To provide a log to the administrator
• Compositions
  • Main server loop
  • Threads
• Interfaces/Ports
  • Socket APIs
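The slides do not show the module's code, so the following is only a sketch of what parsing a request header might look like in Java. It assumes plain HTTP/1.x text and assumes that the client's language preference arrives in the standard Accept-Language header; the class RequestHeaderParser, its nested Request type and the method names are all hypothetical. In a server like the one described, a method of this kind would be called by each worker thread on the text read from its socket.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a request-header parser; not the authors' implementation.
final class RequestHeaderParser {

    // Minimal holder for the parsed request line and headers.
    static final class Request {
        final String method;
        final String path;
        final Map<String, String> headers;

        Request(String method, String path, Map<String, String> headers) {
            this.method = method;
            this.path = path;
            this.headers = headers;
        }
    }

    // Parses "METHOD path HTTP/x.y" followed by "Name: value" lines.
    static Request parse(String rawRequest) {
        String[] lines = rawRequest.split("\r?\n");
        String[] parts = lines[0].trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed request line: " + lines[0]);
        }
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so store them in lower case.
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        return new Request(parts[0], parts[1], headers);
    }

    // Takes the first language tag of Accept-Language, ignoring quality values.
    static Locale preferredLocale(Request request, Locale fallback) {
        String header = request.headers.get("accept-language");
        if (header == null || header.isEmpty()) {
            return fallback;
        }
        String first = header.split(",")[0].split(";")[0].trim();
        if (first.isEmpty() || first.equals("*")) {
            return fallback;
        }
        return Locale.forLanguageTag(first);
    }

    public static void main(String[] args) {
        Request request = parse("GET /index.html HTTP/1.1\r\n"
                + "Host: example.org\r\n"
                + "Accept-Language: fr-FR,fr;q=0.9,en;q=0.8\r\n\r\n");
        System.out.println(request.method + " " + request.path + " -> "
                + preferredLocale(request, Locale.ENGLISH).toLanguageTag());
    }
}
```

This sketch deliberately ignores quality-value ordering and request bodies; a production parser would need both.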
Parse Request Module Architecture
[Diagram: Main Server Loop dispatching to Thread 1, Thread 2, Thread 3, Thread 4, Thread 5, ..., Thread n]
HTML Server
• Definition
  • Default implementation of the HTTP protocol
  • Processes static HTML requests
• Responsibilities
  • Process static HTML requests
  • Process dynamic internationalisation requests
• Compositions
  • Server processes
• Interfaces/Ports
  • Socket APIs
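One plausible way to serve a dynamic internationalisation request, given that translatable text is kept in a separate property file, is to substitute keys in an HTML template with the values loaded for the requested language. The Java sketch below illustrates that idea only. The ${key} placeholder syntax, the class HtmlLocaliser and the sample keys are assumptions for the example and are not described in the slides.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: late-binding substitution of translated strings into HTML.
final class HtmlLocaliser {

    // Assumed placeholder syntax: ${some.key}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Replaces each placeholder with its translation. Unknown keys are left
    // untouched so that missing translations are easy to spot.
    static String localise(String html, Properties translations) {
        Matcher matcher = PLACEHOLDER.matcher(html);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = translations.getProperty(matcher.group(1), matcher.group(0));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // In a deployment these entries would be loaded from a .properties file.
        Properties french = new Properties();
        french.setProperty("page.title", "Bienvenue");
        french.setProperty("page.body", "Bonjour le monde");

        String template = "<html><head><title>${page.title}</title></head>"
                + "<body><p>${page.body}</p></body></html>";
        System.out.println(localise(template, french));
    }
}
```

Because only the text nodes are replaced, the page's layout, colours and images are left as they are. The sketch does not HTML-escape the translated values, which a real server would need to consider.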
HTML Server Architecture
[Diagram labels: Parse Protocol GET/POST; GET Request Processor; POST Request Processor; Static Response; Default Language; Alternative Language; .properties]
Java Support for Internationalisation
• The Locale class lets applications identify locales, allowing for truly multilingual applications.
• The ResourceBundle class provides the foundation for localisation, including localisation for multiple locales in a single application container.
• The Date, Calendar, and TimeZone classes provide the basis for time handling around the globe.
• The String and Character classes, as well as the java.text package, contain rich functionality for text processing, formatting, and parsing.
• Text stream input and output classes support converting text between Unicode and other character encodings.
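As a rough sketch of how Locale and ResourceBundle work together, the example below looks up a message for a locale and falls back to a default bundle when the language or the key is missing. To keep it self-contained, the bundles are built in memory with PropertyResourceBundle from inline text; in a deployed application one would more commonly call ResourceBundle.getBundle with a base name and let it find files such as Messages_fr.properties on the classpath. The class name, keys and messages are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class: per-language bundles with a default fallback.
final class BundleLookupDemo {

    // Language code -> bundle; the empty key holds the default (English) bundle.
    private static final Map<String, ResourceBundle> BUNDLES = new HashMap<>();

    static {
        BUNDLES.put("", load("greeting=Welcome\nfarewell=Goodbye\n"));
        BUNDLES.put("fr", load("greeting=Bienvenue\n"));
        BUNDLES.put("de", load("greeting=Willkommen\nfarewell=Auf Wiedersehen\n"));
    }

    // Parses .properties-style text into a bundle.
    private static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Uses the locale's bundle when it has the key, otherwise the default bundle.
    static String message(String key, Locale locale) {
        ResourceBundle bundle = BUNDLES.get(locale.getLanguage());
        if (bundle != null && bundle.containsKey(key)) {
            return bundle.getString(key);
        }
        return BUNDLES.get("").getString(key);
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] { Locale.FRANCE, Locale.GERMANY, Locale.JAPAN }) {
            System.out.println(locale.toLanguageTag() + ": "
                    + message("greeting", locale) + " / " + message("farewell", locale));
        }
    }
}
```

The manual fallback here mimics, in simplified form, the parent-chain lookup that ResourceBundle.getBundle performs automatically.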
Conversion Process
• Character conversion is a fairly straightforward process as long as there is a one-to-one mapping between sequences of Unicode characters on one side and sequences of bytes in another encoding on the other side, and the input consists only of characters or bytes that have mappings.
• In reality:
  • A single character in a non-Unicode encoding may have multiple equivalent representations (say, a precomposed character and a sequence of base character and combining mark).
  • A character in one encoding may not have an equivalent in the other encoding.
  • An invalid sequence of bytes or characters may show up in the input.
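The three complications above can each be probed with standard Java classes. The sketch below is illustrative only (the class and method names are invented): it uses java.text.Normalizer to compare a precomposed character with its base-plus-combining-mark form as they appear on the Unicode side, CharsetEncoder.canEncode to detect a character with no equivalent in the target encoding, and a CharsetDecoder configured to report errors to detect an invalid byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class for the three conversion pitfalls.
final class ConversionDemo {

    // Equivalent representations: compare after normalising both to NFC.
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // Missing equivalent: can every character be represented in ISO-8859-1?
    static boolean canEncodeLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    // Invalid input: decode strictly and report instead of silently replacing.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+00E9 (precomposed e-acute) versus 'e' followed by U+0301 (combining acute)
        System.out.println("Equivalent forms match: " + sameAfterNfc("\u00E9", "e\u0301"));
        // U+20AC (euro sign) has no ISO-8859-1 mapping
        System.out.println("Euro in ISO-8859-1: " + canEncodeLatin1("\u20AC"));
        // 0xC3 must be followed by a continuation byte, 0x28 is not one
        System.out.println("Valid UTF-8: "
                + isValidUtf8(new byte[] { (byte) 0xC3, (byte) 0x28 }));
    }
}
```

Choosing CodingErrorAction.REPORT is a design decision: the default behaviour of convenience methods such as String.getBytes is to substitute a replacement character, which hides data loss.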
Conclusion
• The Java localisation APIs come in handy for dynamically localising a web page into alternative languages
• The rich set of Java class libraries, such as java.util.ResourceBundle and java.util.Locale, provides an efficient approach to working with locale-specific information
• More manageable workspace for users in their native language
• Regional settings, colour and image representation are not disturbed
• Improves the effectiveness of globally distributed business users by providing language/culture-specific application, product and service interfaces
• Supports region-specific functionality (due to legal aspects, financial practice etc.)
• Provides region-specific value-added services (such as UI, look and feel, sorting/searching)
• Consolidates applications or services with the same functionality that were developed and maintained separately for separate languages or regions
References
[1] Fernandez, N. C. (2000), Web Site Localisation and Internationalisation: A Case Study, published, City University
[2] Khachane, J. (2005), Web Page Localisation, published, Pune University
[3] DePalma, D. A. (1999), Strategies for Global Sites, Forrester Research Inc, May 1998 and The eBusiness Report. In: eMarketer
[4] Roche, M. (2000), Managing Multilingual Web Applications, 16th International Unicode Conference, Amsterdam
[5] Nielsen, J. (1999), Designing Web Usability, Indianapolis: New Riders Publishing
[6] Deitsch, Loukides, M, Java Internationalisation