
Health Information Standardization and Asian Languages

Presentation Transcript


  1. Health Information Standardization and Asian Languages
     Michio Kimura, M.D., Ph.D.
     Director and Professor, Medical Informatics Department, Hamamatsu University School of Medicine
     HL7 Japan chair

  2. Three types of representation: we have 2 patient names in HIS
     • Alphabetic
     • Ideographic
     • Phonetic
     • Ideographic names
       • have many ways to pronounce
       • are difficult to sort

  3. Multi-Byte Character Codes in Use in Asia
     • Korea: KS X 1001, and 1001 annex 3
       • Hangul (phonetic) and ideographs
     • China (PR): GB 18030-2000
     • Taiwan (ROC): CNS 11643, and Big-5
     • Japan: JIS X 0208-1997
       • Katakana, Hiragana (phonetic) and ideographs
       • Junior school pupils must read/write 810 letters.
     • Character counts: 6,879 (JIS) to 48,711 (CNS)

  4. ISO 2022-1983 Multi-Byte Extension Technique
     • The base set is usually 1-byte ASCII (ISO 646).
     • Defines ESCAPE sequences that assign a character set to G0 or G2.
     • The set is not necessarily multi-byte; to set ISO 8859-1: ESC . A
     • If the set is 2-byte, the codes that follow are read 2 bytes at a time.
     • To set JIS X 0208: ESC $ B
     • To set KS C 5601: ESC $ ( C
     • To set GB 2312: ESC $ A
     • To return to ASCII: ESC ( B
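The escape-sequence mechanism on this slide can be illustrated with a small scanner. This is a minimal sketch, not a full ISO 2022 implementation: `ESCAPES` and `split_segments` are hypothetical names introduced here, only the four G0 designations listed on the slide are handled, and the G2 designation for ISO 8859-1 is left out because it also requires single-shift handling. The sample input is produced with Python's built-in `iso2022_jp` codec.

```python
# Escape sequences from the slide, mapped to (character set, bytes per character).
ESCAPES = {
    b"\x1b(B": ("ASCII", 1),
    b"\x1b$B": ("JIS X 0208", 2),
    b"\x1b$(C": ("KS C 5601", 2),
    b"\x1b$A": ("GB 2312", 2),
}


def split_segments(data: bytes):
    """Split a 7-bit ISO 2022 byte stream into (charset, [code units]) runs."""
    charset, width = ESCAPES[b"\x1b(B"]  # assumption: the stream starts in ASCII
    segments = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            # An ESC byte must begin one of the known designations.
            for esc, (name, unit_width) in ESCAPES.items():
                if data.startswith(esc, i):
                    charset, width = name, unit_width
                    i += len(esc)
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
            continue
        # Read one character: 1 byte in ASCII, 2 bytes in a 2-byte set.
        unit = data[i:i + width]
        if not segments or segments[-1][0] != charset:
            segments.append((charset, []))
        segments[-1][1].append(unit)
        i += width
    return segments


sample = "ID 日本".encode("iso2022_jp")
print(sample)
for name, units in split_segments(sample):
    print(name, units)
```

The scanner keeps the current character set as state, which is exactly what makes ISO 2022 streams stateful: the same byte pair means different things depending on the last escape sequence seen.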

  5. Byte-wise Representation of ISO 2022

  6. RFC 1468: Japanese Character Encoding for Internet Messages
     • ISO-2022-JP
     • Stays within 7 bits, so it is safe for most nodes
     • Every line starts and ends in ASCII
     • No shift state carries over between lines
     • ISO-2022-KR is also used in Korea
     • The same method is used in DICOM (Supplement 9) and HL7 v.2.3.1
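The two properties named on this slide (7-bit only, every line ending in ASCII) can be checked mechanically. The following is a rough sketch under stated assumptions: the helper names `is_seven_bit`, `ends_in_ascii`, and `encode_lines` are made up for illustration, the sample message is invented, and each line is encoded separately with Python's built-in `iso2022_jp` codec so that no shift state can carry over.

```python
ESC = b"\x1b"
TO_ASCII = b"\x1b(B"  # escape sequence that switches back to ASCII


def is_seven_bit(data: bytes) -> bool:
    """True if no byte has its high bit set."""
    return all(byte < 0x80 for byte in data)


def ends_in_ascii(line: bytes) -> bool:
    """True if the last escape sequence in the line (if any) selects ASCII."""
    last_escape = line.rfind(ESC)
    return last_escape == -1 or line.startswith(TO_ASCII, last_escape)


def encode_lines(text: str) -> list:
    """Encode each line on its own, so every line starts in ASCII."""
    return [line.encode("iso2022_jp") for line in text.split("\n")]


message = "Patient: 木村\n診断 pending"  # invented sample text
for raw in encode_lines(message):
    print(raw, is_seven_bit(raw), ends_in_ascii(raw))
```

Because every line is self-contained, a receiver can decode any single line without knowing what came before it.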

  7. UNICODE: ISO 10646
     • “Allocating 2 bytes for every character, UNICODE can represent every character in the world without any status nor shifting technique.”
     • 16 bits = 65,536
     • → CJK unified ideographs

  8. CJK Unified Ideographs

  9. Why do we not use UNICODE as a message format?
     (I know it is used internally, but we do not like it going outside as a message format.)
     • If the Chinese “Bone” and our “Bone” are to be treated as the same because of symmetry, how about using these?
     • The UNICODE consortium says “Introduction of Language information”.
     • We cannot write a “Chinese language textbook written in Japanese”.
     • We cannot properly accommodate Koreans living in Japan: their name is in Korean letters, but their address is, of course, in Japanese.
     • The original UNICODE dream is gone.
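The unification concern on this slide can be made concrete with the “Bone” character. The sketch below assumes Python's built-in `gb2312` (EUC-CN) and `euc_jp` codecs as stand-ins for the Chinese and Japanese national sets; `codes_for` is a hypothetical helper. It shows that text from either national set becomes the same single code point in Unicode, so the choice between the Chinese and Japanese glyph shapes has to come from information outside the character code.

```python
def codes_for(ch: str) -> dict:
    """Return the code of `ch` in Unicode and in two national character sets."""
    return {
        "Unicode": f"U+{ord(ch):04X}",
        "GB 2312 (EUC-CN bytes)": ch.encode("gb2312").hex(),
        "JIS X 0208 (EUC-JP bytes)": ch.encode("euc_jp").hex(),
    }


bone = "骨"
for name, code in codes_for(bone).items():
    print(f"{name}: {code}")

# Decoding from either national encoding yields the same single code point,
# so the language of the source is no longer visible in the text itself.
from_chinese = bone.encode("gb2312").decode("gb2312")
from_japanese = bone.encode("euc_jp").decode("euc_jp")
print(from_chinese == from_japanese)
```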

  10. UTF-8: Transformation format of UNICODE
     • UNICODE was originally 2 bytes for every character.
     • 0000-007F: 0xxxxxxx
     • 0080-07FF: 110xxxxx 10xxxxxx
     • 0800-FFFF: 1110xxxx 10xxxxxx 10xxxxxx
     • 1 byte: ASCII
     • 2 bytes: Latin extensions, Greek, Russian, Arabic, etc.
     • 3 bytes: Thai, Hangul, Katakana, Hiragana, CJK ideographs
     • ASCII characters stay compatible with ASCII, so ASCII users can say “we are universal, because we use UNICODE,” to the detriment of ideographic users.
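The three bit layouts on this slide translate directly into code. The function below is a minimal sketch covering only the ranges the slide lists (0000-FFFF); `utf8_encode_bmp` is a name chosen here for illustration, and the result is compared against Python's built-in UTF-8 encoder. Running it over sample characters shows how many bytes each script needs.

```python
def utf8_encode_bmp(code_point: int) -> bytes:
    """Encode one code point in 0000-FFFF using the three layouts on the slide."""
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        # (surrogate values D800-DFFF are not valid characters and are not checked here)
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point outside the range covered by this sketch")


for ch in ["A", "é", "Я", "ア", "한", "骨"]:
    encoded = utf8_encode_bmp(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with the built-in encoder
    print(f"U+{ord(ch):04X} {ch}: {len(encoded)} byte(s) {encoded.hex()}")
```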

  11. (no transcribed text)

  12. (no transcribed text)

  13. (no transcribed text)

  14. HL7 Japan’s answer to HL7 v.3
     • In XML, UNICODE will be the default in 2003.
     • Even in UNICODE v3.1, the “over-unification” problem is not solved.
     • But with XML Schema and XML namespaces, font information can be set in each tag.
     • This way, a Korean name in a Japanese address can be described.
     • The original UNICODE dream (all languages at the same time) is gone, but “many 1-byte languages + one 2-byte language” is not bad.
     • Pokémon
     • Answer: “UNICODE can be the default, provided that we can continue to use each local practice now in use.”
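One way to picture per-tag language information is the standard `xml:lang` attribute from the XML namespace. This is only a sketch and a stand-in: the slide speaks of font information set through XML Schema and namespaces and does not say which mechanism HL7 Japan proposed. The element names `patient`, `name`, and `address`, the helper names, and the sample values are all made up for illustration and are not taken from HL7 v.3.

```python
import xml.etree.ElementTree as ET

# Qualified name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_patient(name: str, name_lang: str, address: str, address_lang: str):
    """Build a hypothetical patient record with a language tag on each field."""
    patient = ET.Element("patient")
    ET.SubElement(patient, "name", {XML_LANG: name_lang}).text = name
    ET.SubElement(patient, "address", {XML_LANG: address_lang}).text = address
    return patient


def languages(patient) -> dict:
    """Map each child element's tag to its declared language."""
    return {child.tag: child.get(XML_LANG) for child in patient}


# Invented sample: a Korean name with a Japanese address.
record = build_patient("김철수", "ko", "静岡県浜松市", "ja")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(languages(ET.fromstring(xml_text)))
```

With the language carried on each element, a renderer can pick Korean glyphs for the name and Japanese glyphs for the address even though both are stored as Unicode.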

  15. Language representation is not the only issue
     • Language used in:
     • Conversation with patients
     • School education
     • Medical, Nurse, Technicians
     • Medical record
     • Signs and symptoms
     • Reports
     • Structure of data types
     • Address
     • 250 Wu-Hsing street
     • 1-20-1 Handa cho

  16. Final Remarks
     • Some operating systems (Windows NT 4.0 or later) use UNICODE internally.
     • I do not blame their ignorance; maybe they just didn’t know.
     • I oppose any proposal claiming “UNICODE is the only way”.
     • When using UNICODE, pay attention to each language’s proper fonts.
     • Let’s collaborate and agree on an XML namespace for the language to be used, and submit it to standards bodies.
     • Please take part in the APAMI census for healthcare languages.
