Speech Seminar & LTI Speaking Requirement Talk

Speech Seminar & LTI Speaking Requirement Talk. Identification of nativity of a Speaker Rohit Kumar From work done in 11-752 Project: Phonetics and Prosody. Overview. Motivation Hypotheses Perception Experiment Design Data collection Perceptual Tests Results Discussion

Presentation Transcript


  1. Speech Seminar & LTI Speaking Requirement Talk: Identification of Nativity of a Speaker. Rohit Kumar. From work done in 11-752 Project: Phonetics and Prosody

  2. Overview • Motivation • Hypotheses • Perception Experiment • Design • Data collection • Perceptual Tests • Results • Discussion • Acoustic Observations • Study of Phoneme coverage • Conclusions

  3. Motivation • People try to judge the nativity of a speaker by listening to their speech • These judgments are sometimes accurate and sometimes not accurate at all • Does this depend on the speaker? • Does this depend on the listener? • Does this depend on what is being spoken? • Other factors (environment, channel, …) • A combination of all of the above!

  4. Hypotheses • Speakers of different nativities have different peculiarities • Certain peculiarities are more perceptible and are associated with certain speakers more often than others • Certain listeners are better than others at distinguishing the nativity of speakers • May be due to richer phonology of the native language of the listeners

  5. Design • 4 different nativities • American, Arabic, Chinese, Indian • Controlling for content being spoken • Recording proper names of each nativity • Speakers may have large variation in pronunciation of non-native names due to influence of their native language

  6. Data Collection : Recording (1) • Sentences recorded • 23 in total • 2 starters, 1 ending, 20 data • 20 = 5 names of each of the 4 nativities • Recording order is mixed up • [Filler words] (NAME) [filler words] • e.g.: We got to know that Deborah Mitchell is busy • Names chosen from web-based lists of common names • Validated by native speakers for correctness/acceptability

  7. Data Collection : Sentences (1) • Rohit Kumar is working • Brian Langner was there • I asked Joseph Campbell to go home • I got to know that Amina Ahmed was not doing well • Today Zhengqing Fei will stay • And Ranjan Prakash left • Let's ask Mohammad Rasshid to do it • Request Wen Huang to be present • I think Gayatri Sharma did not like the meal • We got to know that Deborah Mitchell is busy • It would be wise to involve Ho Yeok Mao in this matter

  8. Data Collection : Sentences (2) • Everyone liked Gurnaam Singh for his talent • They asked Margaret Anderson to call home • Finally Ousama Adib got back to work • I met Lakshmi Venkatesan at her office • Tomorrow Anthony Roberts will be on duty • I am glad to see Jamila Mansour here • At last Chih-Hao Tsai found some time • Work is keeping Christopher Turner on his toes • He said Tarek Rasshid was away • It will be presented by Yat-Sen Sun tomorrow • It's time for Bharat Patel to step down • Alan Black said ok

  9. Data Collection : Recording (2) • 11 speakers (4 Females, 7 Males) • 3 American (1 Female, 2 Males) • 2 Arabic (2 Males) • 3 Chinese (1 Female, 2 Males) • 3 Indian (2 Females, 1 Male) • All speakers are graduate students (22 – 28 years old) • All recordings done using the same equipment (at 16 kHz) • All but 2 recordings done in the same room; the other 2 rooms were very similar

  10. Data Collection : Recording (3) • 253 utterances recorded using “EMU Recording Tool” • 220 used • The 220 utterances were manually segmented using “Wavesurfer” to extract only the proper names (filler words replaced by silence)
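
The two totals on this slide follow from the counts on the earlier recording slides, assuming every one of the 11 speakers read every sentence exactly once (the slides do not state this outright). A small arithmetic check:

```python
# Counts taken from the recording slides.
speakers = 11          # 3 American, 2 Arabic, 3 Chinese, 3 Indian
sentences_total = 23   # 2 starters + 1 ending + 20 data sentences
data_sentences = 20    # 5 names for each of the 4 nativities

# Assumption: each speaker read each sentence once.
recorded = speakers * sentences_total
used = speakers * data_sentences

print(recorded, used)
```

Under that assumption the 33 unused recordings are the starter and ending sentences (3 per speaker).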

  11. Perceptual Tests (1) • Each listener is asked to listen to a set of 64 utterances (out of the 220 utterances from data collection) • The 64 (8 x 8) utterances consist of • 8 speakers • 2 speakers of each nativity (American, Arabic, Chinese, Indian) • 8 utterances by each speaker • 2 utterances of each nativity • Listener is asked to identify the nativity of the speaker of each utterance after listening • Each set is randomized after selecting the 64 utterances to avoid ordering effects • Balancing for gender among speakers and names to the extent possible • Utterances contain only the proper names (filler words are replaced by silence)
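
The 8 x 8 design above can be sketched as a small sampling routine. This is only an illustration: the speaker IDs and the function name `build_test_set` are invented, and the gender balancing mentioned on the slide is not modeled.

```python
import random

NATIVITIES = ["American", "Arabic", "Chinese", "Indian"]

# Speaker counts per nativity come from the recording slide; the IDs are hypothetical.
SPEAKERS = {
    "American": ["am1", "am2", "am3"],
    "Arabic": ["ar1", "ar2"],
    "Chinese": ["ch1", "ch2", "ch3"],
    "Indian": ["in1", "in2", "in3"],
}
NAMES_PER_NATIVITY = 5  # five proper names of each nativity were recorded


def build_test_set(rng):
    """Return a shuffled list of 64 (speaker, name_nativity, name_index) items."""
    items = []
    for nativity in NATIVITIES:
        # 2 speakers of each nativity -> 8 speakers in total
        for speaker in rng.sample(SPEAKERS[nativity], 2):
            # 2 names of each nativity -> 8 utterances per speaker
            for name_nativity in NATIVITIES:
                for name_index in rng.sample(range(NAMES_PER_NATIVITY), 2):
                    items.append((speaker, name_nativity, name_index))
    rng.shuffle(items)  # randomize presentation order to avoid ordering effects
    return items


test_set = build_test_set(random.Random(0))
print(len(test_set))
```

Passing a seeded `random.Random` keeps each listener's set reproducible; a different seed per listener gives each one a different draw.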

  12. Perceptual Tests (2) • 12 listeners (4 Females, 8 Males) • 3 American (1 Female, 2 Males) • 3 Arabic (3 Males) • 3 Chinese (1 Female, 2 Males) • 3 Indian (2 Females, 1 Male) • All (but 1) listeners are graduate students (22 – 28 years old) • The Chinese and Indian speakers have not been outside their native country for more than 2 to 3 years • An attempt to control listeners’ exposure to speakers of other nativities

  13. Perceptual Tests (3)

  14. Perceptual Tests (3) • INDIAN • AMERICAN • CHINESE • ARABIC

  15. Results (1) • People are pretty bad at identifying the nativity of the speakers • Average Accuracy: 67.44% • Some listeners are better than others • Individual accuracy varies from 41% to 87% • Std. Deviation 16% • Segregated Average Accuracy of Listeners • American: 73.44% • Arabic: 81.25% (Best) • Chinese: 47.92% • Indian: 67.19%
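
As a consistency check, the overall average can be recovered from the four listener-group averages, assuming each group contributes equally (3 listeners per group, each judging 64 utterances, per the perceptual test slides):

```python
# Segregated average accuracy of listeners, as reported on the slide (percent).
listener_group_accuracy = {
    "American": 73.44,
    "Arabic": 81.25,
    "Chinese": 47.92,
    "Indian": 67.19,
}

# Equal group sizes, so the overall average is the unweighted mean of the groups.
overall = sum(listener_group_accuracy.values()) / len(listener_group_accuracy)
print(f"{overall:.2f}")
```

The mean comes to roughly 67.45%, which agrees with the reported 67.44% up to rounding of the group figures.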

  16. Results (2) • Males are worse than Females at identifying (not much though!) • Average male listener accuracy: 65.82% • Average female listener accuracy: 70.70% • Female speakers are identified more accurately than male speakers consistently • 72.57% (Female speakers) vs. 64.48% (Male speakers) • Segregated identification accuracy by listener gender • Male listeners: 70.31% (Female speakers) vs. 63.13% (Male speakers) • Female listeners: 77.08% (Female speakers) vs. 66.88% (Male speakers)

  17. Results (3) • Confusability • Arabic speakers are most often wrongly identified (only 55.73% identified correctly), i.e. Most Confusable (maybe Least peculiar) • Could be because there are no female speakers in the Arabic data • Indian speakers are most often correctly identified (79.69%), i.e. Least Confusable (maybe Most peculiar)

  18. Results (4) Listener-wise Accuracies

  19. Discussion • Hypothesis 1 is validated • Indian speakers are least confusable • Arabic speakers are most confusable • Suggests looking for highly consistent peculiarities in the acoustics of Indian speakers • Hypothesis 2 is evaluated • Arabic listeners are most accurate • Chinese listeners are least accurate • Suggests looking for evidence of richer phonology in Arabic

  20. Perception Experiment : Summary • People are pretty bad at identifying the nativity of the speakers • Average Accuracy: 67.44% • Female speakers’ nativity is identified more accurately than male speakers’ • Accuracy of identification of Nativity of speakers • Indian: 79.69% • American: 70.31% • Chinese: 64.06% (most often confused with American) • Arabic: 55.73% (most often confused with Indian) • Segregated Average Accuracy of Listeners • Arabic: 81.25% • American: 73.44% • Indian: 67.19% • Chinese: 47.92%
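
The per-nativity speaker accuracies can be turned back into raw counts. This sketch assumes the design from the perceptual test slides held for all 12 listeners: each heard 2 speakers of each nativity with 8 utterances per speaker, giving 192 judgments per speaker nativity.

```python
listeners = 12
judgments_per_nativity = listeners * 2 * 8  # 2 speakers x 8 utterances per listener

# Accuracy of identification by speaker nativity, as reported (percent).
reported = {"Indian": 79.69, "American": 70.31, "Chinese": 64.06, "Arabic": 55.73}

# Nearest whole number of correct judgments implied by each percentage.
correct = {nat: round(pct / 100 * judgments_per_nativity) for nat, pct in reported.items()}

total_correct = sum(correct.values())
total_judgments = judgments_per_nativity * len(reported)
overall = 100 * total_correct / total_judgments

print(correct)
print(total_correct, total_judgments, f"{overall:.2f}")
```

Each reported percentage corresponds to a whole number of correct judgments out of 192, and the counts add up to an overall accuracy in line with the reported average.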

  21. Perception Experiment : Summary • People are pretty bad at identifying the nativity of the speakers • Average Accuracy: 67.44% • Female speakers’ nativity is identified more accurately than male speakers’ • Accuracy of identification of Nativity of speakers • Indian: 79.69% • American: 70.31% • Chinese: 64.06% (most often confused with American) • Arabic: 55.73% (most often confused with Indian) • Segregated Average Accuracy of Listeners • Arabic: 81.25% • American: 73.44% • Indian: 67.19% • Chinese: 47.92% • 1. Look out for • Peculiarities in Indian and American speakers which make them easy to identify

  23. Perception Experiment : Summary • People are pretty bad at identifying the nativity of the speakers • Average Accuracy: 67.44% • Female speakers’ nativity is identified more accurately than male speakers’ • Accuracy of identification of Nativity of speakers • Indian: 79.69% • American: 70.31% • Chinese: 64.06% (most often confused with American) • Arabic: 55.73% (most often confused with Indian) • Segregated Average Accuracy of Listeners • Arabic: 81.25% • American: 73.44% • Indian: 67.19% • Chinese: 47.92% • 1. Look out for • Peculiarities in Indian and American speakers which make them easy to identify • 2. Study • Phoneme coverage of the languages to see if one is richer than the others

  24. Selection of Data • Total utterances from data collection: 220 • Created two sets of data • Most Often Correctly Identified • Identified correctly 50% or more of the times they appeared (134 utterances) • 15 utterances of each language selected from these in order of % of correct identification (60 utterances) • Start by looking at utterances which contain the same text • Most Often Wrongly Identified • 47 utterances • Identified incorrectly 50% or more of the times they appeared in listening tests
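
The selection procedure on this slide can be sketched as a threshold-and-rank step. The function name, the data layout, and the toy values below are all hypothetical. Note that the two criteria on the slide overlap at exactly 50%; this sketch assigns such ties to the "correctly identified" set, which is an assumption.

```python
from collections import defaultdict


def select_utterances(correct_rate, nativity_of, per_nativity=15):
    """Split utterances by identification rate and pick the top ones per nativity.

    correct_rate: utterance id -> fraction of presentations identified correctly.
    nativity_of:  utterance id -> speaker nativity.
    """
    mostly_correct = [u for u, r in correct_rate.items() if r >= 0.5]
    mostly_wrong = [u for u, r in correct_rate.items() if r < 0.5]

    # Walk the correctly identified utterances from best to worst and keep
    # at most `per_nativity` of them for each speaker nativity.
    selected = defaultdict(list)
    for u in sorted(mostly_correct, key=lambda u: correct_rate[u], reverse=True):
        if len(selected[nativity_of[u]]) < per_nativity:
            selected[nativity_of[u]].append(u)
    return dict(selected), mostly_wrong


# Toy data, invented for illustration only.
rates = {"u1": 1.0, "u2": 0.75, "u3": 0.5, "u4": 0.25, "u5": 0.0}
nativity = {"u1": "Indian", "u2": "Indian", "u3": "Arabic", "u4": "Arabic", "u5": "Chinese"}

selected, wrong = select_utterances(rates, nativity, per_nativity=1)
print(selected, wrong)
```

With `per_nativity=15` and four nativities, the first return value would hold up to the 60 utterances mentioned on the slide.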

  25. Peculiarities of Indian Speakers • Un-aspirated Stops /t/ and /k/ are common • /r/ are clearly visible and pronounced • Problem with consonant clusters due to syllabic nature of native language (insertion of vowels) • Tendency to speak as written (common for /ah/) • Evident syllable boundaries

  26. Peculiarities of Indian Speakers • Un-aspirated Stops /t/ and /k/ are common

  27. Peculiarities of Indian Speakers • /r/ are clearly visible and pronounced

  28. Peculiarities of Indian Speakers • Problem with consonant clusters due to syllabic nature of native language (insertion of vowels) Margaret Anderson

  29. Peculiarities of Indian Speakers • Tendency to speak as written (common for /ah/ ) Anthony Roberts

  30. Peculiarities of Indian Speakers • Evident syllable boundaries

  31. Peculiarities of American Speakers • Aspirated Stops /th/ and /kh/ are common • /r/ often disappears • Able to pronounce some consonant clusters easily • O in orthography pronounced as /ah/

  32. Peculiarities of American Speakers • Aspirated Stops /th/ and /kh/ are common • /r/ often disappears ?

  34. Peculiarities of American Speakers • Able to pronounce some consonant clusters easily

  35. Peculiarities of American Speakers • O in orthography pronounced as /ah/ Anthony Roberts Technological, Terror

  36. Study of Phone coverage • American English • 24 consonants, 7 vowels, a whole bunch of “free” vowels (SAMPA) • Commonly reported around 1200-1300 diphones • Arabic • 32 / 28 / 25 consonants • 6 vowels, 2 diphthongs • Mandarin • 25 consonants, 11 vowels • Around 400 valid diphones (without considering tone variations) • Hindi • 35 consonants, 11 vowels • Around 1400 – 1500 diphones
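
For scale, the reported diphone counts can be compared with a naive upper bound: every ordered pair of phones in the inventory. This bound is an assumption made here for illustration; it ignores phonotactics and any silence or boundary units, and it is only computed for the two languages whose full inventories are given on the slide.

```python
# (consonants, vowels) as listed on the slide.
inventories = {"Mandarin": (25, 11), "Hindi": (35, 11)}

# Approximate reported diphone counts (lower end of the range for Hindi).
reported_diphones = {"Mandarin": 400, "Hindi": 1400}

upper_bound = {}
for language, (consonants, vowels) in inventories.items():
    phones = consonants + vowels
    upper_bound[language] = phones ** 2  # every ordered phone pair
    share = reported_diphones[language] / upper_bound[language]
    print(language, phones, upper_bound[language], f"{share:.0%}")
```

On these figures Hindi uses a much larger share of its possible phone pairs than Mandarin does, which is one rough way to read the "richness" comparison on the slide.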

  37. Conclusions • Several peculiarities of Indian and American speakers are observed in the acoustics • Due to the prominence of these peculiarities, it can be concluded that listeners are able to identify these speakers more easily • The literature survey on phonetic richness of different languages does not provide supporting evidence for the observation that certain listeners do better than others • Another possible reason: the Arabic listeners are older and relatively more experienced than (most) other listeners
