Lecture 4. CS4705: Sound Systems and Text-to-Speech
Sound Systems of Language • Phonetics • The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced • Phonology • Rules that govern how a phoneme is realized as different phones in different contexts • Technologies: • Automatic Speech Recognition (ASR) systems take speech as input and output word hypotheses • Text-to-Speech (TTS) systems take text as input and produce speech
Letters and Sounds • Same spelling, different sounds: o in comb, tomb, bomb; oo in blood, food, good; c in court, center, cheese; s in reason, surreal, shy • Same sound, different spellings: [i] in sea, see, scene, receive, thief; [s] in cereal, same, miss; [u] in true, few, choose, lieu, do; [ay] in prime, buy, rhyme, lie • Combination of letters, single sound: ch in child, beach; th in that, bathe; oo in good, foot; gh in laugh • Single letter, combination of sounds: x in exit, Texas; u in use, music • ‘Silent’ letters: k in knife, know; p in psycho, pterodactyl; e in moose, bone; gh in through
Articulators [figure: diagram of the vocal tract labeling the lips, teeth, alveolar ridge, palate, velum, uvula, pharynx, larynx, vocal folds (glottis), and trachea]
Articulators in action [X-ray film of a speaker saying “Why did Ken set the soggy net on top of his deck?”; sample from the Queen’s University / ATR Labs X-ray Film Database]
Vocal fold vibration [UCLA Phonetics Lab demo]
Places of articulation [figure: vocal tract with the labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, and laryngeal/glottal places marked] http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
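Articulatory parameters for English consonants (in ARPAbet). Columns give the place of articulation, rows the manner of articulation; where a cell holds two symbols, the voiceless one comes first and the voiced one second.
MANNER \ PLACE | bilabial | labio-dental | inter-dental | alveolar | palatal | velar | glottal
stop | p b | | | t d | | k g | q
fricative | | f v | th dh | s z | sh zh | | h
affricate | | | | | ch jh | |
nasal | m | | | n | | ng |
approximant | w | | | l/r | y | |
flap | | | | dx | | |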
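Reading the consonant chart as a lookup table makes natural classes (e.g. "voiced stops") easy to query. A minimal sketch, assuming the slide's symbol set (which writes h where standard ARPAbet writes hh), and assuming that the single-symbol cells (nasals, approximants, flap) are voiced and the glottal stop q is voiceless; the names CONSONANTS and select are hypothetical.

```python
# Hypothetical encoding of the chart: ARPAbet symbol -> (place, manner, voicing).
CONSONANTS = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "k": ("velar", "stop", "voiceless"),
    "g": ("velar", "stop", "voiced"),
    "q": ("glottal", "stop", "voiceless"),
    "f": ("labio-dental", "fricative", "voiceless"),
    "v": ("labio-dental", "fricative", "voiced"),
    "th": ("inter-dental", "fricative", "voiceless"),
    "dh": ("inter-dental", "fricative", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "z": ("alveolar", "fricative", "voiced"),
    "sh": ("palatal", "fricative", "voiceless"),
    "zh": ("palatal", "fricative", "voiced"),
    "h": ("glottal", "fricative", "voiceless"),
    "ch": ("palatal", "affricate", "voiceless"),
    "jh": ("palatal", "affricate", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
    "ng": ("velar", "nasal", "voiced"),
    "w": ("bilabial", "approximant", "voiced"),
    "l": ("alveolar", "approximant", "voiced"),
    "r": ("alveolar", "approximant", "voiced"),
    "y": ("palatal", "approximant", "voiced"),
    "dx": ("alveolar", "flap", "voiced"),
}


def select(place=None, manner=None, voicing=None):
    """Return the sorted symbols matching every feature that is given (not None)."""
    wanted = (place, manner, voicing)
    return sorted(
        symbol
        for symbol, features in CONSONANTS.items()
        if all(w is None or w == f for w, f in zip(wanted, features))
    )


print(select(manner="stop", voicing="voiced"))       # ['b', 'd', 'g']
print(select(place="alveolar", manner="fricative"))  # ['s', 'z']
```

Pairs such as s/z or f/v come out as entries that differ only in the voicing field, which is exactly the contrast the chart's cell layout encodes.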
American English vowel space [figure: ARPAbet vowels arranged by tongue height (HIGH to LOW) and backness (FRONT to BACK): iy, uw, ix, ux, ih, uh, oy, ey, ow, ax, ao, aw, eh, ah, ay, ae, aa]
Acoustic landmarks [figure: spectrogram of “Patricia and Patsy and Sally” annotated with ARPAbet phone labels, including [p], [ix], [t], [ih], [sh], [ax], [n], [ae], [s], [iy], and [l]]
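Syllables • Syllabification is important for • pronunciation: deny/denim • speaking-rate calculation: syllables per second • word recognition in ASR • Syllable structure is (onset) + nucleus + (coda), where only the nucleus is obligatory: • cat (onset + nucleus + coda) • a (nucleus only) • at (nucleus + coda) • to (onset + nucleus) • Lexical stress: primary, secondary, tertiary • telephone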
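One common way to syllabify a phone string is the maximal onset principle: each vowel is a nucleus, and the consonants between two nuclei go to the following syllable as long as they form a legal onset. A minimal sketch under stated assumptions: the vowel set is taken from the vowel-space chart, the onset inventory LEGAL_ONSETS is a hypothetical, deliberately tiny list, the ARPAbet transcriptions of telephone and extra are illustrative, and stress is ignored entirely. The speaking-rate helper simply counts vowel nuclei per second.

```python
# Vowel symbols from the American English vowel-space chart (ARPAbet).
VOWELS = {"iy", "uw", "ix", "ux", "ih", "uh", "oy", "ey", "ow", "ax",
          "ao", "aw", "eh", "ah", "ay", "ae", "aa"}

# Hypothetical, incomplete inventory of legal English onsets.
LEGAL_ONSETS = {
    (), ("p",), ("t",), ("k",), ("b",), ("d",), ("g",), ("f",), ("s",),
    ("l",), ("r",), ("n",), ("m",), ("sh",), ("th",), ("dh",),
    ("s", "t"), ("t", "r"), ("s", "t", "r"), ("p", "l"), ("k", "l"),
}


def syllabify(phones):
    """Split an ARPAbet phone list into syllables by maximal onset."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [list(phones)]
    syllables = []
    start = 0  # index of the first phone of the current syllable
    for this_nuc, next_nuc in zip(nuclei, nuclei[1:]):
        cluster = phones[this_nuc + 1:next_nuc]
        # Drop consonants from the left until the rest is a legal onset;
        # the dropped consonants become the coda of the current syllable.
        k = 0
        while tuple(cluster[k:]) not in LEGAL_ONSETS:
            k += 1
        boundary = this_nuc + 1 + k
        syllables.append(list(phones[start:boundary]))
        start = boundary
    syllables.append(list(phones[start:]))
    return syllables


def syllables_per_second(phones, seconds):
    """Speaking rate: number of vowel nuclei divided by utterance duration."""
    return sum(p in VOWELS for p in phones) / seconds


print(syllabify(["t", "eh", "l", "ax", "f", "ow", "n"]))
# [['t', 'eh'], ['l', 'ax'], ['f', 'ow', 'n']]
print(syllabify(["eh", "k", "s", "t", "r", "ax"]))
# [['eh', 'k'], ['s', 't', 'r', 'ax']]
print(syllables_per_second(["t", "eh", "l", "ax", "f", "ow", "n"], 0.5))  # 6.0
```

The loop always terminates because the empty onset is in LEGAL_ONSETS, so in the worst case the whole cluster becomes a coda.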
Phonological Rules • Not all instances of a given phone [x] sound or look alike • A phoneme /x/ may have many allophones • Phonological rules map phonemes in context to allophones, e.g. • simple rewrite rules: /{t,d}/ → [ɾ] / V’ _ V (flapping: /t/ or /d/ is realized as a flap, ARPAbet [dx], after a stressed vowel V’ and before a vowel V) • FSAs, FSTs • declarative constraints: t:ɾ in the context V’ _ V
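The flapping rule, which rewrites /t/ and /d/ as the flap [dx] after a stressed vowel and before a vowel, can be applied directly to a phone list. A minimal sketch, not an FST: it assumes vowels carry CMUdict-style stress digits (1 primary, 2 secondary, 0 unstressed), and the helper names are hypothetical. The context is checked against the input, so all matches are rewritten simultaneously.

```python
def is_vowel(phone):
    # Assumption: every vowel symbol ends in a stress digit (0, 1 or 2).
    return phone[-1] in "012"


def is_stressed_vowel(phone):
    # V' in the rule: a vowel with primary (1) or secondary (2) stress.
    return is_vowel(phone) and phone[-1] in "12"


def apply_flapping(phones):
    """/{t,d}/ -> [dx] / V' _ V  (V' = stressed vowel, V = any vowel)."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("t", "d")
                and is_stressed_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])):
            out[i] = "dx"
    return out


print(apply_flapping(["b", "ah1", "t", "er0"]))  # ['b', 'ah1', 'dx', 'er0']
print(apply_flapping(["ax0", "t", "ae1", "k"]))  # ['ax0', 't', 'ae1', 'k']
```

In "butter" the /t/ follows a stressed vowel and flaps; in "attack" it follows an unstressed vowel, so the rule does not fire.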
Allophones of /t/ • What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/: Figure 4.8: Jurafsky & Martin (2000), page 104.
Application: Word Pronunciation for TTS • Pronouncing dictionaries (the: [‘dh ax], [‘dh iy]) • Problems: • Homographs (bass/bass, wind/wind, desert/desert) • Abbreviations (dr. = Doctor or Drive, st. = Saint or Street) • Numbers (2125551212) • Acronyms (NAACL, IDIAP) • Morphological variation (unrelentingly) • Proper names and unknown words • Approaches: rules plus an exception dictionary, or a dictionary plus fallback rules
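Numbers show why text normalization must precede pronunciation: 2125551212 should not be read as a cardinal ("two billion ..."). A minimal sketch, assuming that a bare 10-digit string is a North American phone number read digit by digit in 3-3-4 groups; the function name and the grouping convention are assumptions for illustration.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def read_phone_number(digits):
    """Read a 10-digit string digit by digit, grouped 3-3-4 (assumed format)."""
    if len(digits) != 10 or not all(c in "0123456789" for c in digits):
        raise ValueError("expected exactly 10 ASCII digits")
    groups = [digits[:3], digits[3:6], digits[6:]]
    # Each group becomes space-separated digit words; commas mark the pauses.
    return ", ".join(" ".join(DIGITS[int(d)] for d in g) for g in groups)


print(read_phone_number("2125551212"))
# two one two, five five five, one two one two
```

A real normalizer would first have to decide that the token is a phone number at all, which depends on context such as surrounding words or formatting.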
Hybrid model: • FSTs model individual word pronunciations in the lexicon (e.g. the reg-noun-stem entry c:k a:ae t:t for cat) • FSAs model morphology (e.g. reg-noun-stem + s) • FSTs model pronunciation rules (e.g. s → z after a voiced sound, as in dogs) • Special rules model name and acronym pronunciation • Default letter-to-sound rules handle all other words
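The control flow of the hybrid model (lexicon, then morphology plus a pronunciation rule, then a letter-to-sound fallback) can be sketched with plain dictionaries. This is a minimal sketch under stated assumptions, not an FST implementation: the lexicon, the one-phone-per-letter fallback table NAIVE_LTS, and all function names are hypothetical, and only regular plurals in -s/-es are handled. Besides the slide's s → z rule, it includes the standard [ix z] plural after sibilants so that "buses" comes out right.

```python
# Hypothetical toy lexicon of regular noun stems (ARPAbet, stress omitted).
LEXICON = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"], "bus": ["b", "ah", "s"]}

SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h"}

# Hypothetical naive letter-to-sound fallback: one phone (or two for x) per letter.
NAIVE_LTS = dict(zip("abcdefghijklmnopqrstuvwxyz",
                     ["ae", "b", "k", "d", "eh", "f", "g", "h", "ih", "jh", "k", "l", "m",
                      "n", "aa", "p", "k", "r", "s", "t", "ah", "v", "w", "k s", "y", "z"]))


def plural_suffix(stem_phones):
    """Pronunciation rule: pick the plural suffix from the stem's last phone."""
    last = stem_phones[-1]
    if last in SIBILANTS:
        return ["ix", "z"]  # epenthetic vowel, as in "buses"
    if last in VOICELESS:
        return ["s"]        # "cats"
    return ["z"]            # s -> z after a voiced sound, as in "dogs"


def pronounce(word):
    """Lexicon lookup, then stem + plural analysis, then naive letter-to-sound."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    for suffix in ("s", "es"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in LEXICON:
            return LEXICON[stem] + plural_suffix(LEXICON[stem])
    phones = []
    for letter in word:
        phones.extend(NAIVE_LTS.get(letter, "").split())
    return phones


for w in ["cats", "dogs", "buses", "zip"]:
    print(w, pronounce(w))
```

The fallback is intentionally crude; it marks where the slide's default letter-to-sound rules would plug in, not how they should be written.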
Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words • Rhyming analogy: varoom/room, todo/dodo • Linguistic origin: Infiniti, vingt, Perez • Abbreviation expansion: • spacious living/dining rm w/frplc → spacious living/dining room with fireplace • pls?
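The abbreviation-expansion example can be sketched as a domain-specific lookup applied with a regular expression. A minimal sketch, assuming a hypothetical real-estate table (EXPANSIONS) that contains only the abbreviations from the slide's example; it matches w/ only at the start of a word so that strings like "view/deck" are left alone. A lookup table cannot resolve genuinely ambiguous abbreviations, which need context.

```python
import re

# Hypothetical expansion table for one domain (real-estate listings).
EXPANSIONS = {"rm": "room", "frplc": "fireplace", "w/": "with "}

# "w/" only at a word start; the other keys only as whole words.
PATTERN = re.compile(r"\bw/|\b(?:rm|frplc)\b")


def expand_abbreviations(text):
    """Expand known abbreviations, then collapse any doubled spaces."""
    expanded = PATTERN.sub(lambda m: EXPANSIONS[m.group(0)], text)
    return " ".join(expanded.split())


print(expand_abbreviations("spacious living/dining rm w/frplc"))
# spacious living/dining room with fireplace
```

Expanding w/ to "with " (with a trailing space) handles the glued form w/frplc; the final split and join removes the extra space when the ad writes "w/ pool" instead.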
Summary • Phones realize phonemes in different contexts • Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people • Versatile FSTs can model phonological as well as morphological and spelling systems • Many creative approaches toward pronunciation modeling for TTS • Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)