REPORT FROM MALAYSIA O’COCOSDA 2010. Zuraidah Mohd Don. Speech Corpus: the Universiti Sains Malaysia (USM) Research Groups. MASS: A Malay language LVCSR corpus resource (collaboration between USM, MMU and NTU from Singapore).
REPORT FROM MALAYSIA O’COCOSDA 2010 Zuraidah Mohd Don
Speech Corpus: the Universiti Sains Malaysia (USM) Research Groups
• MASS: A Malay language LVCSR corpus resource (collaboration between USM, MMU and NTU from Singapore).
• Developing speech, text and pronunciation dictionary resources for building a large-vocabulary speech recognizer for Malay.
• Speech corpus: 70 hours of read-aloud speech (speaker independent/dependent and accent independent/dependent) from 90 L1 speakers, and 10 hours of broadcast news from local TV stations.
• Written corpus: 700 Mbytes of data extracted from Malaysia's local news web pages from 1998-2008.
• The aim: to develop a rule-based G2P tool to generate a pronunciation dictionary.
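To illustrate what a rule-based G2P tool does, here is a minimal Python sketch. It is not the MASS tool: the rule table `RULES` and the functions `g2p` and `build_dictionary` are hypothetical, and the table covers only a few well-known Malay digraphs and consonant letters. It ignores harder cases such as the two pronunciations of the letter "e", so treat it as an outline of the approach rather than a usable converter.

```python
# Hypothetical, deliberately incomplete rule table: (grapheme, IPA phoneme).
# A real Malay G2P tool would need many more rules and context handling.
RULES = [
    ("ng", "ŋ"),
    ("ny", "ɲ"),
    ("sy", "ʃ"),
    ("kh", "x"),
    ("c", "tʃ"),
    ("j", "dʒ"),
    ("y", "j"),
]


def g2p(word):
    """Convert one word to a list of phoneme symbols by longest-match rules."""
    word = word.lower()
    # Try longer graphemes first so that "ng" wins over "n" followed by "g".
    rules = sorted(RULES, key=lambda rule: -len(rule[0]))
    phonemes = []
    i = 0
    while i < len(word):
        for grapheme, phoneme in rules:
            if word.startswith(grapheme, i):
                phonemes.append(phoneme)
                i += len(grapheme)
                break
        else:
            # No rule matched: assume the letter maps to itself.
            phonemes.append(word[i])
            i += 1
    return phonemes


def build_dictionary(words):
    """Build a pronunciation dictionary mapping each word to its phonemes."""
    return {word: g2p(word) for word in words}


if __name__ == "__main__":
    for word, phones in build_dictionary(["bunga", "nyanyi", "cuci"]).items():
        print(word, " ".join(phones))
```

Applying longest-match rules left to right keeps the sketch deterministic, which is the main attraction of a rule-based approach for a language with fairly regular spelling.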
Speech synthesis and recognition: The University of Malaya Research Groups
• Continuous speech recognition for Arabic based on the HTK toolkit
• Continuous speech recognition for Malay based on the HTK toolkit
• Arabic LVCSR using a large speech corpus for Automatic Speech Recognition
• Malay neutral speech database for developing HMM-based Malay TTS
• Malay Emotional Speech Corpus for Emotional Speech Synthesis
• Grapheme-to-phoneme converter for Standard Malay
• Standard Malay phonetic dictionary
• Acoustic Model for Malay Automatic Speech Recognition (ASR)
• Language Model for Malay ASR
• HMM-based emotional speech synthesis for the Malay language
Other applications: Speaker verification
• Speaker Verification using Vector Quantization and Hidden Markov Model
• The aim is to improve the performance of HMM in a speaker verification system.
• It investigates text-dependent speaker verification using an approach combining VQ and HMM.
• The proposed technique is evaluated on a Malay spoken-digit database of 100 speakers recorded in a noise-free environment.
• The results are compared with a standalone HMM.
• Universiti Kebangsaan Malaysia Research Group, the Department of Electrical, Electronic & System Engineering
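The VQ half of such a system can be sketched as follows. This is a generic illustration, not the UKM group's method: it shows only codebook training and distortion scoring and leaves out the HMM stage entirely. The function names, the k-means training loop, the codebook size and the decision threshold are all assumptions made for the example, and the input is assumed to be a NumPy array of per-frame feature vectors (for instance cepstral coefficients) extracted elsewhere.

```python
import numpy as np


def train_codebook(features, codebook_size=4, iterations=20, seed=0):
    """Train a VQ codebook for one speaker with plain k-means.

    features: float array of shape (num_frames, num_coefficients).
    """
    rng = np.random.default_rng(seed)
    # Initialise the codewords from randomly chosen training frames.
    picks = rng.choice(len(features), size=codebook_size, replace=False)
    codebook = features[picks].astype(float)
    for _ in range(iterations):
        # Distance from every frame to every codeword.
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(codebook_size):
            members = features[nearest == k]
            if len(members) > 0:  # leave empty clusters where they are
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean distance from each frame to its nearest codeword."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def verify(features, claimed_codebook, threshold):
    """Accept the identity claim if the distortion is low enough.

    The threshold is an assumed tuning parameter, set on held-out data.
    """
    return average_distortion(features, claimed_codebook) <= threshold
```

In a combined system, a score like `average_distortion` would be one input alongside the HMM likelihood; how the two are fused is not described in the slides, so it is not shown here.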
Other applications: ESL context
• Heuristics and Rule-Based Approach for Automated Marking Tool for ESL Writing
• Developing an automated marking tool for ESL writing, and introducing heuristics and a rule-based approach to detect grammatical errors in tenses in ESL essays.
• The results show that heuristics and a rule-based approach are useful and can improve the effectiveness of an automated essay marking tool for ESL writing.
• UKM Research group: Nur Asma Mohd Razali, Nazlia Omar, Saadiyah Darus
Other applications: database design
• Automation of database design through semantic analysis
• Using syntactic and semantic heuristics to create a database design in terms of the Entity-Relationship (ER) model through natural language processing.
• Research shows that the use of semantic heuristics may help further improve the results in the automatic detection of ER elements.
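As a rough illustration of the syntactic side of this idea, the sketch below applies the commonly cited heuristic that nouns suggest entity types and verbs suggest relationship types. It is not the system described in the slides: the function `extract_er_candidates` and the tag names are hypothetical, the input is assumed to be already part-of-speech tagged by some external tool, and the semantic heuristics the research focuses on are not modelled at all.

```python
def extract_er_candidates(tagged_tokens):
    """Collect candidate ER elements from (word, tag) pairs.

    Assumed heuristic: a NOUN is a candidate entity and a VERB is a
    candidate relationship. Other tags are ignored.
    """
    entities = []
    relationships = []
    for word, tag in tagged_tokens:
        word = word.lower()
        if tag == "NOUN" and word not in entities:
            entities.append(word)
        elif tag == "VERB" and word not in relationships:
            relationships.append(word)
    return {"entities": entities, "relationships": relationships}


if __name__ == "__main__":
    # Hand-tagged requirement sentence: "Student enrols in course".
    sentence = [("Student", "NOUN"), ("enrols", "VERB"),
                ("in", "ADP"), ("course", "NOUN")]
    print(extract_er_candidates(sentence))
```

A purely syntactic pass like this over-generates, which is consistent with the slide's point that semantic heuristics may help refine the detected ER elements.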