1 / 1

World Languages

College of Engineering Temple University. An Introduction to Mandarin Speech Recognition. Mandarin. English. John Steinberg & Dr. Joseph Picone Department of Electrical and Computer Engineering, Temple University, Philadelphia, Pennsylvania. Abstract

ursa
Download Presentation

World Languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. College of Engineering Temple University An Introduction to Mandarin Speech Recognition Mandarin English John Steinberg & Dr. Joseph PiconeDepartment of Electrical and Computer Engineering, Temple University, Philadelphia, Pennsylvania • Abstract • Speech recognition has recently gained a great deal of popularity in video gaming, robotics, speech prosthetics, and mobile technology. With China becoming an increasingly important player in the world economy, a new demand for Mandarin speech recognition has emerged. • This poster will discuss • Applications of speech recognition • Obstacles in English vs. Mandarin speech recognition • Methods used to successfully implement Mandarin speech recognition systems. • Benchmarks in the history of Mandarin speech recognition Speech Recognition Systems World Languages • Mandarin Chinese • Character Based • 8000+ characters compose 80k-200k common words • Act as morphemes • Monosyllabic • Have single associated tone • Tonal Language • 4 distinct tones, 1 neutral tone • Highly Contextual • Few syllables compared to English • English: ~10,000 • Mandarin: ~1300 • High # of homo-phones • 7 instances of “ma” • Coarticulation • Context can cause changes in tone • Bu4 + Dui4 = Bu2 Dui4 • Ni3 + Hao3 = Ni2 Hao3 Language families across the world [6] [1] • English speakers in the World: ~ 350 million [2] • Estimated # of current English learners in China: 200-350 million [3] • Estimated # of native Mandarin speakers: 1+ Billion Speech Recognition Applications Other Applications Auto/GPS • Translators • Prostheses • Language Educ. • Multimedia Search Mobile Technology National Intelligence Language families by speaker population [6] • A Timeline of Mandarin Speech Recognition • Conclusions • Mandarin speech recognition is a developing and increasingly important field • Mandarin speech recognition involves many challenges including large numbers of homophones, word & sentence flexibility, a lack of a segmented lexicon, and coarticulation effects. • Possible improvements to CER may be found in prosodic features and language models • Reference • [1] Lee, C-H. “Advances in Chinese Spoken Language Processing”, World Scientific Publishing Co., Singapore, 2007 pp 128 • [2] Lin-Shan Lee; Keh-Jiann Chen; et. al. , "Golden Mandarin(II)-an intelligent Mandarin dictation machine for Chinese character input with adaptation/learning functions," ISSIPNN '94, 1994 pp.155-159 vol.1, 13-16 Apr 1994 • [3] Ren-Yuan Lyu; Lee-Feng Chien; et. al., "Golden Mandarin (III)-a user-adaptive prosodic-segment-based Mandarin dictation machine for Chinese language with very large vocabulary," ICASSP-95, vol.1, pp.57-60 vol.1, 9-12 May 1995 [4] “The History of Automatic Speech Recognition Evaluations at NIST,” 2009 http://www.itl.nist.gov/iad/mig/publications/ASRhistory/index.html [5] Lee, L.S.; Tseng, C.Y.; Gu, H.Y.; et. al., "Golden Mandarin (I)-A real-time Mandarin speech dictation machine for Chinese language with very large vocabulary," Speech and Audio Processing, IEEE Transactions on , vol.1, no.2, pp.158-179, Apr 1993 [6] ]   K. Kūriákī, A Grammar of Modern Indo-European, Asociación Cultural Dnghu, 2007 • Modeling Methods • Prosodic Features • Models rhythm, focus, and intonation of speech • Reduces search space (i.e. reduces number of homophones and constrains sentence flexibility) • Language Models • Reduces solution space by using probable combinations of phones, words, phrases, etc. • Challenges in Mandarin Speech Recognition • Highly developed language model is required due to highly contextual nature of Mandarin • Tone modeling • Coarticulation • Large # of homophones • Chinese text is unsegmented • No standard lexicon • Chinese sentence/word structure is very flexible • Ex: Beijing Da Xue -> Bei Da • Prosodic Features • Used for segmentation • Measures pause duration, segment / syllable duration, and F0 reset. • Language Models • N-Grams • Probabilities of the occurrence of N sequences of words • Neural-Networks • High dimensionality • Random Forests • Incorporates morphology into language model • A Timeline of Benchmarks [1] [4]

More Related