
Lecture 4

Lecture 4. CS4705 Sound Systems and Text-to-Speech. Sound Systems of Language. Phonetics: the sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced. Phonology: rules that govern how phones are realized differently in different contexts.

michel

Presentation Transcript


  1. Lecture 4. CS 4705: Sound Systems and Text-to-Speech

  2. Sound Systems of Language • Phonetics • The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced • Phonology • Rules that govern how phones are realized differently in different contexts • Technologies: • Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses • Text-to-Speech (TTS) systems take text as input and produce speech

  3. Letters and Sounds
     • same spelling = different sounds
       o: comb, tomb, bomb
       oo: blood, food, good
       c: court, center, cheese
       s: reason, surreal, shy
     • same sound = different spellings
       [i]: sea, see, scene, receive, thief
       [s]: cereal, same, miss
       [u]: true, few, choose, lieu, do
       [ay]: prime, buy, rhyme, lie
     • combination of letters = single sound
       ch: child, beach
       th: that, bathe
       oo: good, foot
       gh: laugh
     • single letter = combination of sounds
       x: exit, Texas
       u: use, music
     • ‘silent’ letters
       k: knife, know
       p: psycho, pterodactyl
       e: moose, bone
       gh: through

  4. Articulators (labeled diagram): lips, teeth, alveolar ridge, palate, velum, uvula, pharyngeal, larynx, vocal folds (glottis), trachea

  5. Articulators in action (Sample from the Queen’s University / ATR Labs X-ray Film Database) “Why did Ken set the soggy net on top of his deck?”

  6. Vocal fold vibration [UCLA Phonetics Lab demo]

  7. Places of articulation (labeled diagram): labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal. http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

  8. Articulatory parameters for English consonants (in ARPAbet). Columns: PLACE OF ARTICULATION. Rows: MANNER OF ARTICULATION. VOICING: in each pair, the voiceless phone is listed first and the voiced phone second.

               bilabial  labio-dental  inter-dental  alveolar  palatal  velar  glottal
     stop      p b                                   t d                k g    q
     fric.               f v           th dh         s z       sh zh           h
     affric.                                                   ch jh
     nasal     m                                     n                  ng
     approx    w                                     l/r       y
     flap                                            dx
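
The consonant chart on this slide can be treated as a lookup table from phone to (place, manner, voicing). The following is a minimal Python sketch under stated assumptions: the names `CONSONANTS` and `natural_class` are hypothetical, the placement of `w` under bilabial follows the column order of the chart, and the voicing values for unpaired phones (`q` and `h` voiceless; nasals, approximants, and the flap voiced) come from standard phonetics rather than from the flattened slide text.

```python
# ARPAbet consonants keyed to (place, manner, voicing).
# Assumption: unpaired phones are given their standard voicing values.
CONSONANTS = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "k": ("velar", "stop", "voiceless"),
    "g": ("velar", "stop", "voiced"),
    "q": ("glottal", "stop", "voiceless"),
    "f": ("labio-dental", "fricative", "voiceless"),
    "v": ("labio-dental", "fricative", "voiced"),
    "th": ("inter-dental", "fricative", "voiceless"),
    "dh": ("inter-dental", "fricative", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "z": ("alveolar", "fricative", "voiced"),
    "sh": ("palatal", "fricative", "voiceless"),
    "zh": ("palatal", "fricative", "voiced"),
    "h": ("glottal", "fricative", "voiceless"),
    "ch": ("palatal", "affricate", "voiceless"),
    "jh": ("palatal", "affricate", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
    "ng": ("velar", "nasal", "voiced"),
    "w": ("bilabial", "approximant", "voiced"),
    "l": ("alveolar", "approximant", "voiced"),
    "r": ("alveolar", "approximant", "voiced"),
    "y": ("palatal", "approximant", "voiced"),
    "dx": ("alveolar", "flap", "voiced"),
}


def natural_class(place=None, manner=None, voicing=None):
    """Return the sorted phones that match every feature that was given."""
    wanted = (place, manner, voicing)
    return sorted(
        phone
        for phone, features in CONSONANTS.items()
        if all(w is None or w == f for w, f in zip(wanted, features))
    )


print(natural_class(manner="nasal"))
print(natural_class(place="alveolar", manner="stop"))
```

Querying by a subset of features is one way to pick out the natural classes (for example, all nasals) that phonological rules tend to refer to.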

  9. American English vowel space (chart; vertical axis HIGH to LOW, horizontal axis FRONT to BACK). Vowels plotted, in ARPAbet: iy, uw, ix, ux, ih, uh, oy, ey, ow, ax, ao, aw, eh, ah, ay, ae, aa

  10. Acoustic landmarks: “Patricia and Patsy and Sally” (figure labels: [p] [ix] [t] [ih] [sh] [ax] [n] [p] [ae] [t] [s] [iy] [n] [s] [ae] [l] [iy] [p] [ix] [t] [ih])

  11. Syllables • Syllabification is important for • pronunciation: deny/denim • speaking rate calculation: syllables per second • word recognition in ASR • (onset) + nucleus + (coda): • cat • a • at • to • Lexical stress: primary, secondary, tertiary • telephone
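
The (onset) + nucleus + (coda) template and the syllables-per-second measure on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the vowel set is the ARPAbet inventory from the vowel-space slide, each vowel is counted as one nucleus, and the ARPAbet transcriptions of the example words are supplied here rather than taken from the slides.

```python
# Assumption: every phone in this set is a syllable nucleus.
VOWELS = {"iy", "ih", "ix", "ey", "eh", "ae", "aa", "ao", "ow",
          "uh", "uw", "ux", "ah", "ax", "ay", "aw", "oy"}


def split_syllable(phones):
    """Split the phones of ONE syllable into (onset, nucleus, coda)."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(nuclei) != 1:
        raise ValueError("expected exactly one vowel in a single syllable")
    i = nuclei[0]
    return phones[:i], phones[i], phones[i + 1:]


def count_syllables(phones):
    """Count syllables as the number of vowel nuclei."""
    return sum(1 for p in phones if p in VOWELS)


def speaking_rate(phones, duration_s):
    """Speaking rate in syllables per second."""
    return count_syllables(phones) / duration_s


print(split_syllable(["k", "ae", "t"]))   # cat: onset, nucleus, coda
print(split_syllable(["ax"]))             # a: nucleus only
print(split_syllable(["ae", "t"]))        # at: nucleus + coda
print(split_syllable(["t", "uw"]))        # to: onset + nucleus
print(speaking_rate(["t", "eh", "l", "ax", "f", "ow", "n"], 0.75))
```

Deciding where the boundary falls between two syllables (for example, in deny versus denim) needs more than this; the sketch only covers the single-syllable template and the rate calculation.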

  12. Phonological Rules • Not all instances of a given phone [x] sound/look alike • Phoneme /x/ may have many allophones • Phonological rules map phonemes in context to allophones, e.g. • simple rules: /{t,d}/ --> [dx] / V’ _ V • FSAs, FSTs • declarative constraints: t:   V’ _ V
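
A context-dependent rewrite rule of the kind shown on this slide can be sketched as a single pass over a phone sequence. The sketch below assumes that the rule's target is the flap (ARPAbet `dx`, listed in the consonant chart), and that stress is marked by a trailing digit on vowels, with `1` for primary stress. That stress notation, the function names, and the example transcriptions are assumptions made for illustration, not something the slides specify.

```python
# Assumption: ARPAbet vowels from the vowel-space slide, optionally followed
# by a stress digit (1 = primary stress, 0 = unstressed, 2 = secondary).
VOWELS = {"iy", "ih", "ix", "ey", "eh", "ae", "aa", "ao", "ow",
          "uh", "uw", "ux", "ah", "ax", "ay", "aw", "oy"}


def is_vowel(phone):
    """True if the phone, minus any stress digit, is a vowel."""
    return phone.rstrip("012") in VOWELS


def is_stressed_vowel(phone):
    """True for a vowel carrying primary stress (V' in the rule)."""
    return is_vowel(phone) and phone.endswith("1")


def apply_flapping(phones):
    """Rewrite t or d as dx between a stressed vowel and a following vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("t", "d")
                and is_stressed_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])):
            out[i] = "dx"
    return out


print(apply_flapping(["s", "ih1", "t", "iy0"]))   # "city": context matches
print(apply_flapping(["ax0", "t", "ae1", "k"]))   # "attack": context does not match
```

The rule inspects the original sequence and writes to a copy, so one rewrite never changes the context seen by a later position. An FST encoding of the same rule would express the left and right contexts as states instead of index checks.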

  13. Allophones of /t/ • What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/: Figure 4.8: Jurafsky & Martin (2000), page 104.

  14. Application: Word Pronunciation for TTS • Pronouncing dictionaries (the: [‘dhax],[‘dhiy]) • Problems: • Homographs (bass/bass, wind/wind, desert/desert) • Abbreviations (dr., st.) • Numbers (2125551212) • Acronyms (NAACL, IDIAP) • Morphological variation (unrelentingly) • Proper names and unknown words • rules + dictionaries / dictionaries + rules
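
Two of the problems on this slide are easy to make concrete: a pronouncing dictionary has to allow several entries per spelling (the, bass), and a digit string needs its own expansion rule. The Python sketch below is illustrative only; the dictionary contents, the names `PRON_DICT`, `lookup`, and `read_digits`, and the choice to read a number digit by digit are all assumptions.

```python
# Hypothetical pronouncing dictionary: each spelling maps to a LIST of
# candidate pronunciations, so variants and homographs can coexist.
PRON_DICT = {
    "the": [["dh", "ax"], ["dh", "iy"]],
    "bass": [["b", "ae", "s"], ["b", "ey", "s"]],  # the fish / the instrument
}

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def lookup(word):
    """Return all dictionary pronunciations, or [] for an unknown word."""
    return PRON_DICT.get(word.lower(), [])


def read_digits(token):
    """Expand a digit string one digit at a time, e.g. a phone number."""
    return " ".join(DIGIT_NAMES[int(d)] for d in token)


print(lookup("bass"))             # two candidates; context must choose
print(lookup("NAACL"))            # not in the dictionary
print(read_digits("2125551212"))
```

The dictionary lookup only returns candidates. Picking the right one for a homograph, or deciding that a digit string is a phone number and not a quantity, needs context that this sketch does not model.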

  15. Hybrid model: • FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t) • FSAs model morphology (e.g. reg-noun-stem + s) • FSTs for pronunciation rules (e.g. s--> z) • special rules to model name and acronym pronunciation • default letter2sound rules for other words
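
The layering in this hybrid model (lexicon, then morphology, then pronunciation rules, then default letter-to-sound rules) can be sketched with plain Python functions standing in for the FSTs and FSAs. This is a simplified illustration, not the lecture's implementation: the lexicon entries other than cat, the voiceless set, the tiny letter-to-sound table, and all names are assumptions, and the s --> z rule here ignores stems ending in sibilants.

```python
# Stand-in for the lexicon FST: spelling -> phones (c:k a:ae t:t for "cat").
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}

# Assumption: phones after which the plural suffix stays voiceless.
VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h"}

# Stand-in for default letter-to-sound rules; unknown letters pass through.
LETTER_TO_SOUND = {"c": "k", "a": "ae", "t": "t"}


def voice_plural(phones):
    """Pronunciation rule s --> z when the preceding phone is voiced."""
    if len(phones) >= 2 and phones[-1] == "s" and phones[-2] not in VOICELESS:
        return phones[:-1] + ["z"]
    return phones


def pronounce(word):
    """Try the lexicon, then stem + s morphology, then letter-to-sound."""
    if word in LEXICON:
        return list(LEXICON[word])
    if word.endswith("s") and word[:-1] in LEXICON:
        return voice_plural(LEXICON[word[:-1]] + ["s"])
    return [LETTER_TO_SOUND.get(ch, ch) for ch in word]


print(pronounce("cats"))
print(pronounce("dogs"))
print(pronounce("tac"))   # not in the lexicon, falls back to letter-to-sound
```

Ordering the stages from most specific to most general means the default rules only apply when nothing better is available, which mirrors the order of the bullets on the slide.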

  16. Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words • Rhyming analogy: varoom/room, todo/dodo • Linguistic origin: Infiniti, vingt, Perez • Abbreviation expansion: • spacious living/dining rm w/frplc --> spacious living/dining room with fireplace • pls?
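
Abbreviation expansion of the kind in the real-estate example can be sketched as a table lookup applied token by token. The table contents and the names `ABBREVIATIONS` and `expand` are assumptions made for illustration; `pls` is deliberately left out of the table because the slide marks its expansion as uncertain.

```python
import re

# Hypothetical expansion table for real-estate ad abbreviations.
ABBREVIATIONS = {
    "rm": "room",
    "frplc": "fireplace",
}


def expand(text):
    """Expand known abbreviations; leave every other token unchanged."""
    # "w/" attaches directly to the next word, so split it off first.
    text = re.sub(r"\bw/", "with ", text)
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())


print(expand("spacious living/dining rm w/frplc"))
print(expand("pls"))   # unknown abbreviation passes through unchanged
```

A fixed table cannot resolve an abbreviation with more than one plausible expansion, which is where approaches such as rhyming analogy or guessing the linguistic origin come in.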

  17. Summary • Phones realize phonemes in different contexts • Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people • Versatile FSTs can model phonological as well as morphological and spelling systems • Many creative approaches toward pronunciation modeling for TTS • Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)
