Review • What is multilingual computing? • Bilingual, trilingual, vs. multilingual • What are the fundamental issues in multilingual computing? • Representation of each language in a computer • Ways to distinguish different scripts • How can a system be designed so that it can be used for different languages with minimal changes? • How can a system be designed so that it can be used for multiple languages?
Characteristics of different scripts • What is a script? • What are the different types of scripts, and what are examples of each? • Token-based/alphabet-based scripts • Phonetic-based scripts • Ideographs • What is a phonetic transcription system, and what are some examples? • What is Romanization?
Characteristics of Chinese • Graphemics • Variant writing (e.g. 教 都) • Phonetics (the sound, 音) • Types of phonemes • Semantics (the meaning, 義) • Independence of meaning
Computer representation of characters • Selection of a finite set of characters → character set • Uniqueness → each character/symbol • Design of a coded character set → codeset • Uniqueness → each codepoint assignment • Different coding length → different codesets • What do the following terms mean? • Codepoint • Length of a codepoint • Code space • Size of a code space • Code range • Order of characters (in a character set vs. a codeset)
What are the different numerical notations? • Decimal notation • Binary notation • Hexadecimal notation • Scalar value • Characteristics of the ASCII codeset • What is row-cell notation? • What are character subsets, and why are they needed? • Character set comparison operations • Codeset comparison operations • Character set • Codepoint assignment • Compatibility
What is an encoding method, and why do we need it? • What is the so-called high-bit-on scheme? • What are the characteristics of GB-2312? • No. of rows, no. of columns → code space • Code range? • Major subsets? • Full characters vs. half characters • What are the characteristics of Big5 and ETen Big5? • Rows, columns → code space • Major subsets? • What are UDAs and VDAs for? • HKSCS
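To make the row-cell notation and the high-bit-on scheme concrete, here is a minimal Python sketch. It assumes the EUC-CN encoding of GB 2312, in which each byte is the row or cell number plus 0xA0 (0x20 to reach the 94-position graphic range, then 0x80 to set the high bit). The function name `rowcell_to_euc_cn` is hypothetical and chosen only for illustration.

```python
def rowcell_to_euc_cn(row: int, cell: int) -> bytes:
    """Convert a GB 2312 row-cell position (both 1..94) to two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # +0x20 moves the position into 0x21..0x7E; +0x80 turns the high bit on.
    return bytes([row + 0x20 + 0x80, cell + 0x20 + 0x80])


# Row 16, cell 1 is the first Level 1 hanzi in GB 2312.
encoded = rowcell_to_euc_cn(16, 1)
decoded = encoded.decode("gb2312")
print(encoded.hex(), decoded)
```

Because both bytes are at or above 0xA1, a decoder can tell a two-byte Chinese character apart from single-byte ASCII, whose high bit is always off.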
Other codesets using high-bit-on schemes? • Encodings using designation (指定)? • ISO 2022 • Extended Unix Code (EUC) • What is a charset registry, and why is it needed? • Problems with different codesets? • Compatibility → wrong interpretation of data • Solutions: codeset announcement (using designation) and conversion → conversion problems
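The contrast between designation and the high-bit-on approach can be sketched with Python's built-in `iso2022_jp` and `euc_jp` codecs. This is an illustration under stated assumptions, not a full ISO 2022 implementation: it uses Japanese only because those two codecs ship with Python, and the variable names are illustrative.

```python
text = "日本"

iso = text.encode("iso2022_jp")  # 7-bit bytes, codeset announced by escape sequences
euc = text.encode("euc_jp")      # same codepoints, but with the high bit on

ESC_TO_JISX0208 = b"\x1b$B"  # escape sequence designating JIS X 0208
ESC_TO_ASCII = b"\x1b(B"     # escape sequence designating ASCII again

# Strip the two designation sequences to get the raw two-byte codepoints.
payload = iso[len(ESC_TO_JISX0208):-len(ESC_TO_ASCII)]

print(iso)
print(payload.hex(), euc.hex())
```

The payload bytes stay below 0x80 and are only meaningful because the preceding escape sequence announced the codeset. EUC carries the same codepoints with 0x80 added to each byte, so no announcement is needed inside the stream.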
ISO 10646 and Unicode • What are the design principles of ISO 10646? • What are the different coding structures in ISO 10646? • What is the structure of UCS-4? • What are the characteristics of the BMP? • What is the structure of the BMP? • What is UCS-2? • What is the compatibility zone for? • What is the difference between ISO 10646 and Unicode? • Big-endian vs. little-endian notation: FEFF vs. FFFE
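The FEFF vs. FFFE question can be illustrated with a short Python sketch that uses the byte order mark constants from the standard `codecs` module. The helper name `detect_utf16_byte_order` is hypothetical, and the sketch only inspects the first two bytes.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from its leading byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown"


# U+4E2D (中) serialized in both byte orders, each preceded by U+FEFF.
be = codecs.BOM_UTF16_BE + "中".encode("utf-16-be")
le = codecs.BOM_UTF16_LE + "中".encode("utf-16-le")

print(be.hex(), detect_utf16_byte_order(be))
print(le.hex(), detect_utf16_byte_order(le))
```

The scheme works because U+FFFE is a noncharacter: a reader that sees FF FE knows the bytes were swapped relative to big-endian order.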
What are Extension A and Extension B? • Where are they coded? • What are surrogate pairs, why are they needed, and how do they work? • What is UTF, what is its purpose, and how does UTF-8 work? • What is the difference between a character and a glyph? • What is the difference between a multi-byte character and a wide character?
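The surrogate pair and UTF-8 mechanisms can be sketched by hand in Python and checked against the built-in codecs. The function names `to_surrogate_pair` and `utf8_encode` are hypothetical. U+20000, the first codepoint of Extension B, serves as the example because it lies outside the BMP.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary-plane codepoint into (high, low) UTF-16 surrogates."""
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("only codepoints beyond the BMP need surrogates")
    offset = cp - 0x10000              # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))
print(utf8_encode(0x20000).hex())
```

The lead byte of each UTF-8 sequence states the sequence length and every continuation byte starts with the bits 10, so ASCII stays unchanged and a decoder can resynchronize after a damaged byte.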
Input Methods • What is an input method, and why do we need it? • What are the different types of input methods? • What is a keyboard-based input method? • How to design an IM? • What is the basic requirement? • What are the limitations? • What information can be used in IM design? • Who are the main users? • Efficiency consideration? • What are the two types of IM? • Applicability and limitations • What is keyboard arrangement, and why do we need it?
Software L10N and I18N • What is L10N and why do we need it? • What is I18N and why do we need it? • What are the principles in I18N? • How to design I18N programs? • What is POSIX and what is its purpose? • What is the name of the POSIX facility for a specific region? • What are the components in a POSIX NLS package? • What is a locale and what are the classes in each locale?
POSIX provides a set of interface functions. How are their behaviors defined, and where? • What are the major files in each locale? • If POSIX had never been developed, could you still develop an I18N program on top of an operating system? • What is a symbolic name, and where are symbolic names used? • How do we know the binary code of a symbolic name? • Programming using the wide character data type vs. multi-byte characters • What is collation, and how does it work?
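The practical difference between multi-byte and wide character programming can be sketched in Python. This is an analogy rather than the POSIX C interface: it treats `bytes` as the multi-byte form and `str` as the wide character form, and that mapping is an assumption made for illustration.

```python
text = "Unicode 漢字"              # wide form: one element per character
multibyte = text.encode("utf-8")   # multi-byte form: 1 to 4 bytes per character

print(len(text), len(multibyte))

# Indexing the wide form always yields a whole character.
print(text[8])

# Slicing the multi-byte form can cut a character in half.
truncated = multibyte[:9]          # ends after the first byte of 漢
try:
    truncated.decode("utf-8")
    cut_mid_character = False
except UnicodeDecodeError:
    cut_mid_character = True
print(cut_mid_character)
```

The multi-byte form is compact and suits storage and interchange, while the wide form makes counting, indexing and truncation safe, which is why programs commonly convert at their input and output boundaries.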
Open systems • What is an open system? • Why do we want open systems? • What are the measurements of an open system? • What is an open specification? • What are the two types of portability issues? • What mechanisms can be used to improve portability or how can we write portable programs?
Output • What are characters, glyphs and fonts? • What are their relationships and/or differences? • Internal representation vs. external representation • What is the difference between the character box and the bounding box? • Why should there be space between the character box and the bounding box? • What does rendering mean? • What are the two different glyph/font representations?
What are the characteristics of bitmap fonts and outline fonts? • Representations, scaling (distortion), space requirement, compression • How to deal with distortion in the scaling of bitmap fonts? • Ad hoc smoothing algorithms • Smoothing spline and interpolation • Understanding of Bézier cubic curves • Control points and the equations • Why is bitmap-to-outline conversion needed? • How does erosion work?
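For the control points and equations of a cubic Bézier curve, the standard Bernstein form is B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3 for t in [0, 1]. A minimal Python sketch that evaluates it follows; the function name `cubic_bezier` and the sample control points are illustrative and not taken from any particular font format.

```python
from typing import Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1)."""
    u = 1.0 - t
    # Bernstein weights; they always sum to 1.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


# An arch: the curve starts at P0, ends at P3, and is pulled toward P1 and P2.
P0, P1, P2, P3 = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)
outline = [cubic_bezier(P0, P1, P2, P3, i / 8) for i in range(9)]
print(outline[0], outline[4], outline[8])
```

Scaling an outline glyph only requires scaling the four control points, which is why outline fonts avoid the distortion seen when a bitmap is enlarged.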
Unicode on different platforms • On what platforms is Unicode supported, and in what forms? • Unix, Windows, Mac, Linux • What is a code page? • Can Unicode be used if the operating system is not coded using Unicode? • Why would the encoding need to be specified when compiling a Java program? • What are the data structures supporting multi-byte and Unicode in Java?
I18N vs. multilingual applications • What is the difference between an I18N program and a multilingual application? • Can a multilingual application be designed/implemented using I18N? • What needs to be separately considered in the design of multilingual applications? • What is the relationship between multilingual applications and Unicode?
IDCs and the IDS • What are ideographic description characters (IDCs)? • Different types of IDCs • Why were IDCs introduced? • What is an ideographic description sequence (IDS)? • How is an IDS expressed? • For a given character, is its IDS unique? • For a given IDS, does it uniquely define a character?
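An IDS is written in prefix notation: an IDC comes first, followed by the components it combines, and each component may itself be a nested IDS. The Python sketch below checks whether a string is well formed in that sense. It assumes only the twelve IDCs at U+2FF0 to U+2FFB (later Unicode versions add more), it does not enforce any length limit, and the names `IDC_ARITY` and `is_well_formed_ids` are hypothetical.

```python
# Number of operands each IDC takes: U+2FF2 and U+2FF3 are ternary, the rest binary.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # left, middle, right
IDC_ARITY["\u2ff3"] = 3  # above, middle, below


def _parse(s: str, i: int) -> int:
    """Consume one IDS starting at index i; return the next index, or -1 on error."""
    if i >= len(s):
        return -1                      # an operand is missing
    ch = s[i]
    if ch in IDC_ARITY:
        i += 1
        for _ in range(IDC_ARITY[ch]):
            i = _parse(s, i)
            if i < 0:
                return -1
        return i
    return i + 1                       # a plain component character


def is_well_formed_ids(s: str) -> bool:
    """True if s is exactly one complete ideographic description sequence."""
    return _parse(s, 0) == len(s)


print(is_well_formed_ids("⿰日月"))      # left-right: describes 明
print(is_well_formed_ids("⿱艹⿰日月"))  # nested: describes 萌
print(is_well_formed_ids("⿰日"))        # second operand missing
```

The checker says nothing about uniqueness: the same character can often be decomposed in more than one way, and one sequence may match several variant forms.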
Information retrieval • Differences between an IRS and a database system • Basic components of an IRS • What is the purpose of the VSM? What data are associated with a VSM? • What are the similarity functions for? • What is term selection for, and what methods are used to do term selection? • What kinds of information can be used as weights for the VSM?
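The vector space model and its similarity function can be sketched in a few lines of Python. The sketch assumes raw term frequency as the weight and cosine similarity as the similarity function; whitespace tokenization, the sample documents and all names are illustrative choices, not something prescribed by the slides.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Represent a text as a sparse vector of raw term frequencies."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items())  # a Counter returns 0 for absent terms
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = {
    "d1": "unicode character encoding",
    "d2": "bitmap font scaling",
    "d3": "character set and character encoding",
}
query = tf_vector("character encoding")

# Rank documents by decreasing similarity to the query.
ranked = sorted(docs,
                key=lambda d: cosine_similarity(query, tf_vector(docs[d])),
                reverse=True)
print(ranked)
```

Replacing the raw counts with a scheme such as TF-IDF changes only `tf_vector`; the similarity function and the ranking step stay the same.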