Oriental COCOSDA – Country Report Activities on Language Resources and Technologies in Taiwan

Hsiao-Chuan WANG
Department of Electrical Engineering, National Tsing Hua University, Hsinchu
November 24-25, 2010, Kathmandu, Nepal

NGASR II
Next Generation Automatic Speech Recognition, Phase 2
A joint research project sponsored by NSC, Taiwan
Prof. Hsiao-Chuan Wang, National Tsing Hua Univ. (2008 ~ 2011)

(1) Speech event detection and automatic labeling for speech corpora
    - To develop methods for speech event and attribute detection
    - To develop methods for automatic labeling of speech databases
(2) Validation and labeling of speech corpora
    - Labeling of MATBN: Mandarin broadcast news (by Prof. Hsin-Min Wang)
    - Labeling of TCC300: Mandarin read speech (by Prof. Yih-Ru Wang)
(3) Other corpora
    - Sinica COSPRO and Toolkit: fluent continuous speech databases for the study of Mandarin prosody (by Prof. Chiu-yu Tseng)

TWNAESOP
Taiwan Asian English Speech Corpus Project (TWNAESOP)
Chiu-yu Tseng, Academia Sinica (2009-2012)

The corpus is designed to cover a wide range of phonetic features, namely segments, phrases, sentences and discourses, thus providing speech data that can also be used for prosodic investigations.

Language: English
Speech type: microphone speech
Sound file format: *.wav
Sampling rate: 16 kHz
Bits per sample: 16
Channel: mono
Speaker population:
    - L1 American English: 12 speakers (6 males, 6 females)
    - L2 English by Taiwan Mandarin speakers: 488 speakers (231 males, 257 females)
Corpus size: 8.58 GB, around 500 hours at approximately 1 hr/speaker
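As a rough aid for reading the recording specification above, the following sketch works out the raw data rate implied by the stated format (16 kHz, 16-bit, mono) and the total speaker count. It assumes uncompressed linear PCM inside the *.wav files, which the slide does not state, and the function name is hypothetical and used only for illustration.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw data rate of uncompressed linear PCM audio, in bytes per second.

    Assumption: the .wav files hold uncompressed PCM; the slide does not say so.
    """
    return sample_rate_hz * (bits_per_sample // 8) * channels


# Format parameters as listed on the slide.
rate = pcm_bytes_per_second(sample_rate_hz=16_000, bits_per_sample=16, channels=1)
bytes_per_hour = rate * 3600

# Speaker counts as listed on the slide (L1 plus L2 speakers).
total_speakers = 12 + 488

print(f"{rate} bytes/s, {bytes_per_hour / 1e6:.1f} MB per hour of audio")
print(f"{total_speakers} speakers in total")
```

Under these assumptions, one hour of audio occupies roughly 115 MB before any file-header overhead, which gives a quick way to relate recording duration to storage size for this format.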

Hakka Speech Processing
Prosody Hierarchy Construction for Hakka Speech Processing
Prof. Sin-Horng Chen, National Chiao Tung Univ. (2007 ~ 2010)

(1) Hakka speech recognition system (HASR)
(2) Mixed Hakka-Mandarin speech recognition system (HMASR)
(3) HMM-based HTTS

Hakka Read Speech Corpus
    - Speakers: 43 male and 49 female
    - Number of files: 10229
    - Number of syllables: 153911

Parallel speech database of Mandarin and Hakka
    - Context: 100 Chinese sentences
    - Mandarin speakers: 10 male and 10 female
    - Hakka speakers: 10 male and 10 female

Multi-lingual Speech Processing
Knowledge-Based Processing of Multilingual Speech for Diverse Source Language
Prof. Chung-Hsien Wu, National Cheng Kung Univ. (2010 – 2013)

Technologies:
(1) Spontaneous speech recognition
(2) Speech synthesis
(3) Language modeling

The goal is to construct a multilingual speech communication interface that serves users with different languages, speaking styles and environments. The project covers the processing of Mandarin, Min-Nan (usually called Taiwanese in Taiwan), and mixed English.

