Explore the recent trends in speech technology and the commercial market in Korea, including the establishment of venture companies and the government's support for speech technology research. Learn about the role of the Spoken Language Resources and Assessment Center (SLRAC) in the systematic construction and distribution of Korean language resources and speech I/O assessment methodologies. Discover the ongoing projects and distribution policy of speech and language corpora in Korea.
Recent Activities of Speech Corpora and Assessment in Korea
2000. 10. 16
Yong-Ju Lee
Wonkwang University, Korea
Recent trends in the commercial speech technology market in Korea
• The number of newly established venture companies has increased rapidly in recent years
  • Not only companies offering original speech technology solutions
  • but also value-added speech application system or service companies
    • CTI, UMS, VoIP
    • Voice portal, GPS
    • Mobile phone
    • Toy companies, ...
This is because of
• The government's venture business promotion policy
• Speech technology being an attractive item for new technology businesses
• Many researchers and engineers spinning off from government-supported research institutes and big companies
  • ETRI, LG, SAMSUNG, …
• Many university labs also taking part in the business
• Some major international (multinational) corporations also entering the Korean speech technology market
  • This has also helped to promote the market
Common requirements of speech technology companies
• Common use of language resources
• Objective methodology for speech I/O assessment
• They are all deeply interested in establishing a proper organization to handle the above problems
Spoken Language Resources and Assessment Center (SLRAC)
• Role
  • Systematic construction and distribution of Korean language resources
    • Speech only at the initial stage
  • Active role in preparing speech I/O assessment methodologies
  • Technical information center for speech technology
• Status
  • SLRAC will first be set up at the Speech And Language Science Laboratory (SALL) at Wonkwang University
  • It will start work late this year
Distribution items
• First stage
  • Results of speech-related projects funded by the government
  • Joint construction results of educational-industrial consortia
• Next stage
  • Private results of each organization (industries, universities, etc.)
  • Long-term construction and administration program supported by the government
Distribution Policy, ...
• Distribution policy
  • Released to domestic organizations first
  • International use will follow after a short time lag
• Others
  • We hope to keep in contact with LDC and ELRA
• Details will be presented soon!!
Brief introduction of speech and language corpora and assessment related projects
• Construction of speech and language corpora and assessment methodologies (2000 ~ 2001, 2 years)
  • Supported by Ministry of Science and Technology
  • Goal
    • Language part (carried out by KAIST)
      • methodology design for machine translation system performance evaluation
      • methodology design for information retrieval system performance evaluation
    • Speech part (carried out by Wonkwang Univ.)
      • construction of a telephone speech corpus of 2000 speakers
        • isolated and connected digits, Phonetically Rich Words
      • set-up and modification of the K-ToBI transcription system and prototyping of a prosody DB
      • design and construction of a speech and language corpus for a Korean dictation system
Continued
• Results (speech corpora) will be distributed through the SLRAC
• Details will be presented at the next meeting
Continued
• Research on the basic platform of dialog system (Late 2001 ~ 2003, 3 years)
  • Supported by Ministry of Commerce, Industry and Energy
  • Carried out by Seogang Univ. & Wonkwang Univ.
  • Various kinds of dialog speech corpora will be produced
    • various tasks and environments
Continued
• Automatic captioning of TV programs using speech recognition (2000 ~ 2001)
  • Supported by Ministry of Information and Communication
  • Carried out by ETRI
  • A broadcast news speech corpus will be produced
Continued
• Speech interface for internet applications (2000 ~ 2002)
  • Supported by Ministry of Information and Communication
  • Carried out by Korea Telecom
  • Various speech corpora are now being prepared
    • word, phrase and sentence speech corpora for web applications
Continued
• Brain science research (1998 ~ 2007)
  • Supported by Ministry of Science and Technology
  • Carried out by KAIST
  • Some speech corpora will be designed for language perception studies
Continued
• Other private industries
  • Industries and other organizations are preparing (or have already prepared) various speech corpora
    • telephone speech (wireline and mobile)
    • various kinds of speech corpora in a PC environment
    • command words for the car environment (voice dialing, GPS, etc.)
    • etc.
New trial for speech corpora in Korean
• Several companies share the expense of corpus construction
• An experienced group produces the corpus
• SLRAC hosts the project and sells the results
• Benefits are given to the first participants
First trial
• Korean DIGIT speech corpus
  • Difficult but essential item for Korean speech recognition
    • Korean digit = monosyllable
  • First stage: 500 speakers
  • Contents
    • isolated words, connected 4-digit strings
    • digit strings of various lengths
      • telephone number, ID number, date and time, credit card number, etc.
  • Various kinds of collection environments
  • Will be released early next year
• Next candidates
  • names, geographical names, etc.
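As a rough illustration of the digit corpus contents listed above (isolated digits plus connected digit strings of several lengths), the following Python sketch generates a per-speaker prompt sheet. It is not the procedure used for the corpus: the function names, the string lengths, the seeding scheme, and the choice of "공" as the reading of zero are all assumptions made for this example, and the readings are citation forms that ignore sound changes in connected speech.

```python
import random

# Sino-Korean digit readings, one syllable each.
# Assumption: zero is read as "공" (the other common reading is "영").
DIGITS = ["공", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]


def digit_string(rng, length):
    """Return a random digit string of the given length."""
    return "".join(str(rng.randrange(10)) for _ in range(length))


def read_out(digits):
    """Spell a digit string as space-separated Korean syllables."""
    return " ".join(DIGITS[int(d)] for d in digits)


def make_prompt_sheet(speaker_id, lengths=(4, 4, 7, 11), seed=0):
    """Build (digits, reading) prompts for one hypothetical speaker.

    The sheet starts with the ten isolated digits, followed by one
    connected string per entry in `lengths` (illustrative values only).
    """
    # Derive a per-speaker seed so each sheet is reproducible.
    rng = random.Random(seed * 100_000 + speaker_id)
    sheet = [(str(d), DIGITS[d]) for d in range(10)]
    for n in lengths:
        s = digit_string(rng, n)
        sheet.append((s, read_out(s)))
    return sheet


if __name__ == "__main__":
    for digits, reading in make_prompt_sheet(speaker_id=1):
        print(digits, reading)
```

Seeding by speaker number keeps every sheet reproducible while giving different speakers different strings, which is one simple way to spread digit contexts across a 500-speaker collection.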
Korean COCOSDA
• Consults on
  • Establishment of a speech and language resource distribution center (SLRAC etc.)
  • Planning of speech-related national projects
• Sponsors
  • The technical meeting for speech I/O assessment and speech corpora, under the auspices of an academic society
• Korean Science Foundation (KOSEF) has started to provide support through its Special Interest Group promotion program (2000 ~ 2004)
  • “SIG for speech I/O assessment and speech corpora”
  • This will encourage more activity
Oriental COCOSDA 2001
• Host
  • The 2001 Oriental COCOSDA workshop in Korea (24 Aug. 2001)