
Development of Filipino Phonetically Balanced Words Recognizer

This presentation covers the development of a Filipino phonetically balanced word (PBW) list and a test of its recognition accuracy using a Hidden Markov Model. It also discusses previous studies, the Filipino phoneme system, and the structure of the Filipino language.


Presentation Transcript


  1. World Computer Congress 2013 - International Conference for Artificial Intelligence (ICAI2013). Arnel C. Fajardo and Yoon-Joong Kim, Ph.D., Hanbat National University, Korea

  2. Introduction. Objectives: 1. To present the development of a Filipino phonetically balanced word (PBW) list. 2. To test the recognition accuracy of the developed PBW list using a Hidden Markov Model.

  3. Related Studies. In 2003, an ASR system for the Filipino alphabet was developed and reported a recognition accuracy of 85.5%. However, this recognizer handled phoneme utterances using a discrete Hidden Markov Model (HMM) rather than continuous word recognition. Reference: Navaro, R. D., Recognition of Tagalog Alphabets Using the Hidden Markov Model.

  4. Related Studies. Dela Roca G., et al. (2003) attempted to recognize continuous speech using the Filipino Speech Corpus developed by Guevara R., et al. (2002), and reported a recognition accuracy of only 32%. In 2010, an Indonesian speech corpus was added to the training data of a recognizer for Filipino utterances. • The Indonesian corpus contains 80 hours of recordings vs. 4 hours in the Filipino Speech Corpus. • The recognizer achieved 79.50% recognition accuracy. Note: None of these previous studies used a phonetically balanced set of words to develop its speech corpus. References: Sakti, S., Isotani, R., Kawai, H., and Nakamura, S., The Use of Indonesian Speech Corpora for Developing a Filipino Continuous Speech Recognition System (2010). Guevara, R., Co, M., Espina, E., Gracia, I., Tan, E., Ensomo, R., and Sagum, R., Development of a Filipino Speech Corpus (2002).

  5. The Specifics of Filipino Language. Filipino is the language used largely in the Philippines, with 22 million native speakers. • The alphabet consists of 20 letters: 5 vowels (a, e, i, o, u) and 15 consonants (b, k, d, g, h, l, m, n, ng, p, r, s, t, w, y). • Some words are spelled the same but differ slightly in pronunciation, which changes their meaning: • bata /b:a - ta/ - “a child” • bata /ba - ta/ - “to bear or endure”

  6. The Filipino Phoneme System FILIPINO VOWEL SYSTEM

  7. The Filipino Phoneme System FILIPINO CONSONANT SYSTEM

  8. The Filipino Phoneme System - The diphthongs /iw/ /ay/ /aw/ /oy/ /ey/ /uy/ were also included as part of the vowel phoneme list. - Phonemes such as /p:/ /b:/ /m:/ /t:/ /d:/ /n:/ /s:/ /l:/ /k:/ /g:/ were included.

  9. Development of the Filipino Phonetically Balanced Word List. WORD EXTRACTION. Source: 16 articles from the Tagalog textbook Bagwis.

  10. Development of the Filipino Phonetically Balanced Word List. WORD EXTRACTION. Source: 16 articles from the Tagalog textbook Bagwis. Transcribed words: 9768 words.

  11. Development of the Filipino Phonetically Balanced Word List. WORD EXTRACTION. Source: 16 articles from the Tagalog textbook Bagwis. Unique words: 2938 words.

  12. Development of the Filipino Phonetically Balanced Word List. WORD EXTRACTION. Source: 16 articles from the Tagalog textbook Bagwis. Unique words: 2938 words.

  13. Development of the Filipino Phonetically Balanced Word List. WORD EXTRACTION. Source: 16 articles from the Tagalog textbook Bagwis. Unique words: 2938 words. Syllable count: 780 2-syllable words, 912 3-syllable words.

  14. Development of the Filipino Phonetically Balanced Word List. Syllable count: 780 2-syllable words, 912 3-syllable words.

  15. Development of the Filipino Phonetically Balanced Word List. WORD OCCURRENCE must be > 1. Syllable count: 780 2-syllable words, 912 3-syllable words.

  16. Development of the Filipino Phonetically Balanced Word List. Syllable count: 780 2-syllable words, 912 3-syllable words. Word occurrence: 323 2-syllable words, 249 3-syllable words.
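
The extraction steps on these slides (unique words, syllable count, word occurrence greater than 1) can be sketched in Python roughly as follows. This is a minimal sketch and not the authors' tool: the function names are hypothetical, and counting one syllable per vowel letter is a simplifying assumption that the slides do not state.

```python
import re
from collections import Counter

VOWELS = set("aeiou")

def tokenize(text):
    # Lowercase the text and keep letter runs; an apostrophe inside a word
    # is kept so that contracted forms such as "bata'y" stay whole.
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())

def syllable_count(word):
    # Assumption: each vowel letter is the nucleus of one syllable.
    return sum(ch in VOWELS for ch in word)

def candidate_words(text, min_occurrence=2):
    # Count every word, then keep 2- and 3-syllable words that occur
    # more than once (occurrence > 1 means at least 2).
    counts = Counter(tokenize(text))
    return {
        n: sorted(
            w for w, c in counts.items()
            if syllable_count(w) == n and c >= min_occurrence
        )
        for n in (2, 3)
    }

sample = "ang bata ay bata pa at ang bahay ay malaki malaki"
print(candidate_words(sample))
```

In the sample sentence, "bahay" is dropped because it occurs only once, while "bata" and "malaki" pass both filters.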

  17. Development of the Filipino Phonetically Balanced Word List. SYLLABIC STRUCTURE: 80% of Frequency. Syllable count: 780 2-syllable words, 912 3-syllable words. Word occurrence: 323 2-syllable words, 249 3-syllable words.

  18. Development of the Filipino Phonetically Balanced Word List. Syllabic structures kept from the 323 2-syllable words: cv-cvc (130), cv-cv (61), v-cvc (43), cvc-cv (27). Syllabic structures kept from the 249 3-syllable words: cv-cv-cvc (100), cv-cv-cv (38), cv-cvc-cvc (15), cvc-cv-cvc (14), v-cv-cvc (11), cvc-cv-cv (11), cv-cv-vc (9), v-cv-cv (8), cv-cvc-cv (8).
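
A syllabic structure such as cv-cvc can be derived from spelling with a simple rule-based sketch like the one below. The syllable-splitting rule used here (a consonant directly before a vowel opens a new syllable, and two adjacent vowels belong to different syllables) is an assumption for illustration, and the function names are hypothetical. The slides do not say how the authors assigned structures. Apostrophes in contracted forms are not handled.

```python
VOWELS = set("aeiou")

def to_units(word):
    # Split a word into letters, treating the digraph "ng" as one consonant.
    units, i = [], 0
    while i < len(word):
        if word.startswith("ng", i):
            units.append("ng")
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units

def cv_pattern(word):
    # Map each unit to "c" or "v", then group the marks into syllables.
    marks = ["v" if u in VOWELS else "c" for u in to_units(word.lower())]
    syllables, current = [], ""
    for i, mark in enumerate(marks):
        has_vowel = "v" in current
        next_is_vowel = i + 1 < len(marks) and marks[i + 1] == "v"
        # Once the current syllable has its vowel, a second vowel or a
        # consonant that is followed by a vowel starts the next syllable.
        if has_vowel and (mark == "v" or next_is_vowel):
            syllables.append(current)
            current = ""
        current += mark
    if current:
        syllables.append(current)
    return "-".join(syllables)

for w in ["bata", "bahay", "bansa", "akin", "langit", "bituin"]:
    print(w, cv_pattern(w))
```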

  19. Development of the Filipino Phonetically Balanced Word List. Syllable count: 780 2-syllable words, 912 3-syllable words. Word occurrence: 323 2-syllable words, 249 3-syllable words. Syllabic structure: 261 2-syllable words, 214 3-syllable words.

  20. Development of the Filipino Phonetically Balanced Word List. The frequency of each phoneme represented in the word list (F) is calculated by summing, over all words, the frequency of the phoneme in a word (pfw), i.e. the number of times the specific phoneme appears in that word [e.g., pfw(a) = 2 in the word bata], multiplied by the frequency of word occurrence (wf), i.e. the number of times that word occurred in the whole text corpus, and then dividing the sum by the total number of phonemes in the word list (n): F = Σ (pfw × wf) / n. Where: F = frequency of the phoneme represented in the word list; pfw = frequency of the phoneme in a word; wf = frequency of word occurrence; n = total number of phonemes in the word list.
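
The phoneme frequency described on this slide, F = Σ (pfw × wf) / n, can be sketched in Python as below. Two points are assumptions rather than statements from the slides: n is taken as the occurrence-weighted total of all phonemes, so that the frequencies sum to 1, and the `letters_as_phonemes` helper is a hypothetical placeholder that treats each letter as one phoneme instead of using a real phonetic transcription.

```python
from collections import Counter

def letters_as_phonemes(word):
    # Hypothetical placeholder: one letter = one phoneme.
    return list(word)

def phoneme_frequencies(word_freq, phonemize=letters_as_phonemes):
    # word_freq maps each word to wf, its number of occurrences in the corpus.
    weighted = Counter()
    for word, wf in word_freq.items():
        # pfw: how many times each phoneme appears in this word.
        for phoneme, pfw in Counter(phonemize(word)).items():
            weighted[phoneme] += pfw * wf
    # Assumption: n is the weighted total over all phonemes.
    n = sum(weighted.values())
    return {phoneme: count / n for phoneme, count in weighted.items()}

freqs = phoneme_frequencies({"bata": 3, "bato": 1})
print(freqs["a"])  # (2*3 + 1*1) / 16
```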

  21. Development of the Filipino Phonetically Balanced Word List. This value is compared to the acceptance value (threshold value), given by the formula: aV = 1 / (x*m). Where: aV = acceptance value/threshold value; x = average number of vowels/consonants in a phonetic structure; m = total number of words.
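
The threshold aV = 1 / (x*m) and its comparison with F can be sketched as follows. The slides do not state the direction of the comparison, so treating a phoneme as adequately represented when F is at least aV is an assumption here. The function names and the input numbers are hypothetical and are not the values from the study.

```python
def acceptance_value(x, m):
    # x: average number of vowels (or consonants) in a phonetic structure
    # m: total number of words in the list
    return 1.0 / (x * m)

def under_represented(frequencies, av):
    # Assumption: a phoneme passes when its frequency F is at least aV.
    return sorted(p for p, f in frequencies.items() if f < av)

# Hypothetical inputs, for illustration only.
av = acceptance_value(x=2, m=250)
print(av)
print(under_represented({"a": 0.21, "e": 0.0015, "o": 0.05}, av))
```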

  22. Development of the Filipino Phonetically Balanced Word List. FREQUENCY OF PHONEMES, applied to the 261 2-syllable words and 214 3-syllable words kept after the syllabic structure step: F = Σ (pfw × wf) / n. Where: F = frequency of phonemes; pfw = frequency of phonemes in a word; wf = frequency of word occurrence; n = total number of phonemes. RESULTS. Acceptance value: aV = 1 / (x*m). 2 SYLLABLE (VOWEL = 0.0023, CONSONANT = 0.0018). 3 SYLLABLE (VOWEL = 0.0015, CONSONANT = 0.0012).

  23. Results. Phonetically Balanced Word List. Extracted word list: 257 2-syllable words, representing 32 phonemes; 212 3-syllable words, representing 31 phonemes.

  24. Results – PBW LIST. 2 Syllable PBW List: aba’y agad akin aking ako’y alam alon amin aming anak anong apat araw atin ating ayaw ayon bagay bago bagong bahay baka bakit banal bansa bata bata’y batas batay bato bawal bawat bayan belen beses bigat bigay bigla bihis bilang bisa bisig bomba buhay bukas bukid bukod bulsa bunso buwan dagat dahil dahon dakong dala damit dapat dati dating datu dito diwa diyos dugong edad gabi gabing galing ganap gawa gawin gaya gayon gayong gitna gubat guro gusto gutom habang halos hapon harap hari haring hatid hawak higit hindi hiram huli huling huwag iba’t ibang ibig ibon ikaw ilan ilang ilog ina’y inang inay isang isip ito’y itong iya’y iyan iyo’y iyon iyong kahit kalye kamay kami kaming kanya kapag kaya kaya’t kayang kayo kaysa kita kulang kulay kundi kunin kuya laban labas labi labis

  25. Results – PBW LIST. 2 Syllable PBW List: lagay lagi laging lahat lahi lakas lalo lalong laman lamang lamig langit lasa libing lihim likod lima limang linggo lobo lubos lugar lupa lupit maging mahal malay mata mga mismo mula muli muling mundo naging naman namang namin nasa natin ngalan ngayo’y ngayon ngayong ngunit nila nilang nina ninyo nito nito’y nitong niya niyang niyon oras pagod papel para parang pari pasko patay pera perang pero pisngi piso pugad pulis puno punto puso puto rito sabi saging saka sana sana’y sanang sanay sanhi santa sila sila’y sina sino siya siya’y siyang suman sunod tabi takbo tanging tanong tapat tatlo tatlo tawa tawag taynga tayo tayong tinda tingin tinig tiyak tugon tulad tulak tunay tungo tuwing ulit unang upang utak wala walang wari waring wika wikang

  26. Results – PBW LIST. 3 Syllable PBW List: abala agila akala alila animas animo’y anumang asawa bahagi bakuran banayad bayani bihirang binata binigyan bituin bulaklak bulsikot bumili buwaya dahilang dakila dakilang dalangin dalawa dalawang daliri dalisay daluyong damdamin damdaming darating dayuhang debate dinatnan dumating galapong gamitin ganito ganitong gawain gawaing gayundin gayunman gilingan ginawa global gumawa halaman halamang humingi ibayo ibigay ihulog ilalim inagaw inutang kabila kabilang kahapon kakanin kalamnan kanila kanilang kanluran kapatid kapiling kapilit katulad katulong kawawa kawawang kukunin kultural lalaki larawan larawang libangin lipunan lipunang liwanag lumabas lumikha lumipad lumitaw lupain lupaing mabilang mabilis mabuhay mabuti mabuting madilim maganda magbigay magiging magulang mahabang mahina

  27. Results – PBW LIST. 3 Syllable PBW List: mahirap makita makitid malagkit malakas malaki malaking malaman malapit malayo maliban maliit malinaw malinis malungkot maluwag marahan marami maraming marinig marunong masarap matanda matandang matapos materyal matiyak matulog matuto mayamang minana nabuhay nagawa nagbalik nagulat nakita namatay napagod napansin napulot narinig narito nariyan naroon nasaan nasabi natural nawala nilikha nobela pagdalaw pagiging pagitan palakas paligid pamilya panahon panahong pananaw pangalan panlaban patuloy puhunan pumasok puwesto sabado sabungan salapi sapagka’t sapagkat sarili sarili’y sariling sasagi simbahan sinabi sinisi subalit sumagot sumapit sumunod tagalog taginting tagumpay tahanan talagang tanggihan tawagin tinapos tinawag trabaho tumakbo tumigil tumulong turista ulilang umaga umuwi unidos utusan yumaong

  28. Test of PBW. 50 RESPONDENTS: 25 female, 25 male.

  29. Test of PBW. 50 RESPONDENTS (25 female, 25 male), all of whom: are Tagalog native speakers; are able to read and speak; have no speech ailment; are at proper disposition.

  30. Test of PBW. 50 RESPONDENTS, divided into dependent speakers and independent speakers. 25 female: 20 dependent female speakers, 5 independent female speakers. 25 male: 20 dependent male speakers, 5 independent male speakers.
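
The 20/5 division of each gender group into dependent (training) and independent (testing) speakers can be sketched as below. The slides do not say how speakers were assigned to each group, so the seeded random shuffle, the speaker IDs, and the function name are all assumptions made for illustration.

```python
import random

def split_speakers(speaker_ids, n_test=5, seed=0):
    # Shuffle a sorted copy with a fixed seed so the split is repeatable.
    ids = sorted(speaker_ids)
    random.Random(seed).shuffle(ids)
    # The first n_test speakers are held out as independent (testing)
    # speakers; the rest are dependent (training) speakers.
    return ids[n_test:], ids[:n_test]

# Hypothetical speaker IDs: F01..F25 and M01..M25.
female = [f"F{i:02d}" for i in range(1, 26)]
male = [f"M{i:02d}" for i in range(1, 26)]

train_f, test_f = split_speakers(female)
train_m, test_m = split_speakers(male)
print(len(train_f + train_m), len(test_f + test_m))
```

Splitting each gender group separately keeps both the training and testing sets balanced between female and male speakers, as on the slide.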

  31. Test of PBW. RECORDING SPECIFICATION. Sampling rate: 16 kHz, mono. Distance: 5-10 cm away from the mouth. Environment: isolated room. Microphone: unidirectional.
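
A recording can be checked against the sampling rate and channel count on this slide with Python's standard `wave` module. This is only a sketch: the function name and the file name in the usage comment are hypothetical, and the check covers just the two properties that are stored in the .wav header.

```python
import wave

def matches_recording_spec(path, rate=16000, channels=1):
    # Read the .wav header and compare it with the expected
    # sampling rate (16 kHz) and channel count (mono).
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == rate and wav.getnchannels() == channels

# Usage (hypothetical file name):
# print(matches_recording_spec("speaker01_bata.wav"))
```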

  32. Test of PBW. SPEECH RECOGNITION. Recorded speech (.wav file).

  33. Test of PBW. SPEECH RECOGNITION. Recorded speech (.wav file). Hidden Markov Model Toolkit (HTK).

  34. Test of PBW. SPEECH RECOGNITION. Recorded speech (.wav file). MFCC. Hidden Markov Model Toolkit (HTK).

  35. Test of PBW. SPEECH RECOGNITION. Recorded speech (.wav file). MFCC. 20 dependent speakers (training data), 5 independent speakers (testing data). Hidden Markov Model Toolkit (HTK).

  36. Test of PBW. SPEECH RECOGNITION. Recorded speech (.wav file). MFCC. 40 dependent speakers (training data), 10 independent speakers (testing data). Hidden Markov Model Toolkit (HTK). Results.

  37. Results. Test of speech recognition. Recognition results for the independent speakers were lower than for the dependent speakers, since the independent speakers were not part of the training data.

  38. Researchers. Arnel C. Fajardo, Hanbat National University, South Korea. E-mail: acfajardo2000@yahoo.com. Yoon-Joong Kim, Ph.D., Hanbat National University, South Korea. E-mail: yjkim@hanbat.ac.kr

  39. World Computer Congress 2013 - International Conference for Artificial Intelligence (ICAI2013). Thank you for listening! Arnel C. Fajardo, Yoon-Joong Kim, Ph.D.
