CS 551/651: Structure of Spoken Language
Lecture 6: Phonological Processes
John-Paul Hosom
Fall 2008
Phonological Processes
• Phonemes undergo systematic variation depending on their context
• For example, forming the past tense:
    cause /k aa z/ → caused /k aa z d/
    talk /t aa k/ → talked /t aa k t/
  /d/ vs. /t/ is predictable based on the voicing of the word-final phoneme
• Allophones can be viewed as systematic variations of phonemes that are a result of cultural and physiological processes, but do not distinguish the meaning of an utterance
• For example, the choice between /p/ and /ph/ in English is predictable: word- or syllable-initial voiceless stops are aspirated
    pit [ph ih t[h]]     tip [th ih p[h]]      kin [kh ih n]
    spit [s p ih t[h]]   stick [s t ih k[h]]   skin [s k ih n]
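The past-tense example above can be sketched in a few lines of Python. This is only an illustration: the `VOICELESS` set and the `past_tense` function are hypothetical names, the set covers only a handful of ARPAbet-style symbols, and stems that already end in /t/ or /d/ (which take a different suffix) are not handled.

```python
# Assumed (partial) set of voiceless phonemes; not taken from the slides.
VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "hh"}

def past_tense(phones):
    """Append /t/ after a voiceless final phoneme, /d/ otherwise."""
    suffix = "t" if phones[-1] in VOICELESS else "d"
    return list(phones) + [suffix]

print(past_tense(["k", "aa", "z"]))  # cause -> ['k', 'aa', 'z', 'd']
print(past_tense(["t", "aa", "k"]))  # talk  -> ['t', 'aa', 'k', 't']
```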
Phonological Processes /ph ih th th ih ph kh ih n/ /s p ih th s t ih kh s k ih n/
Phonological Processes
• Other types of phonetic processes: Assimilation, Deletion, Reduction, Insertion, Substitution, Me'tathesis (switching the order of two phonemes)
• Assimilation: “A feature of one segment is shared by a neighboring segment”
• Examples of assimilation:
    Nasalization of vowels before nasal consonants
    in- (negative prefix) becomes im- in words beginning with a bilabial consonant (imbalance, imperfect vs. indifferent, intolerance)
Phonological Processes
• Assimilation may be due to coarticulation, or it may be language-specific and “arbitrary”:
  “word-final alveolar obstruent may take on place of articulation of following word-initial segment if word-initial segment is palato-alveolar”
    this /dh ih s/ + shop /sh aa ph/ → this shop /dh ih sh sh aa ph/
    this /dh ih s/ + fish /f ih sh/ → this fish /dh ih s f ih sh/
    this /dh ih s/ + thing /th ih ng/ → this thing /dh ih s th ih ng/
• Also, depending on dialect, the rule does not apply within a word:
    misshapen /m ih s sh ei p en/
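A minimal sketch of the cross-word assimilation rule, assuming each word is a list of phoneme symbols. The names `PALATO_ALVEOLAR`, `ASSIMILATED` and `assimilate` are hypothetical, and the mapping covers only the fricatives /s/ and /z/ (an assumption; the quoted rule speaks of alveolar obstruents in general). Because the rule looks only at word boundaries, a word-internal /s sh/ sequence is left alone.

```python
# Assumed inventory of palato-alveolar segments and the alveolar -> palato-alveolar map.
PALATO_ALVEOLAR = {"sh", "zh", "ch", "jh"}
ASSIMILATED = {"s": "sh", "z": "zh"}

def assimilate(words):
    """Apply assimilation across word boundaries; `words` is a list of phoneme lists."""
    out = [list(w) for w in words]
    for cur, nxt in zip(out, out[1:]):
        if cur and nxt and nxt[0] in PALATO_ALVEOLAR and cur[-1] in ASSIMILATED:
            cur[-1] = ASSIMILATED[cur[-1]]
    return out

print(assimilate([["dh", "ih", "s"], ["sh", "aa", "p"]]))  # final /s/ becomes /sh/
print(assimilate([["dh", "ih", "s"], ["f", "ih", "sh"]]))  # unchanged
```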
Phonological Processes • Example of assimilation of /s/ with /sh/ but not /f/: /dh ih sh sh aa pcl ph dh ih s f ih sh/
Phonological Processes
• Substitution: common in foreign accents or speech impairments:
    welcome /v eh l k ah m/
    McDonald /m a k uw d ow n aa r uw d ow/
    Roger /w aa jh er/
• Metathesis: changing the order of two phonemes within a word (dialect variation):
    pretty /p er dx iy/
    ask /ae k s/
Phonological Processes
• Deletion:
    Barbara /b aa r b ax r ah/ → /b aa r b r ah/
    memory /m eh m ax r iy/ → /m eh m r iy/
• Reduction: unstressed vowels become /ax/
    conduct (verb) /k ax n d ah k t/
    conduct (noun) /k aa n d ax k t/
• Insertion:
    A voiceless stop is inserted between a nasal and a voiceless consonant; the inserted stop always has the same place of articulation as the nasal
      fancy /f ae n t s iy/
      Chomsky /ch aa m p s k iy/
    A schwa is inserted after a word-final nasal
      nine /n ay n ax/
    dictionary pronunciation=
Phonological Processes • Deletion: /m eh m r iy/
Phonological Processes • Insertion: /f ae n t s iy ch aa m p s k iy/
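The stop-insertion pattern heard in "fancy" and "Chomsky" can be sketched as follows. `NASAL_TO_STOP`, `VOICELESS_FRICATIVES` and `insert_stops` are hypothetical names, the fricative set is an assumed partial inventory, and the sketch ignores any stress condition on the following vowel.

```python
# Each nasal maps to the voiceless stop with the same place of articulation.
NASAL_TO_STOP = {"m": "p", "n": "t", "ng": "k"}
# Assumed (partial) set of voiceless fricatives.
VOICELESS_FRICATIVES = {"f", "th", "s", "sh"}

def insert_stops(phones):
    """Insert a homorganic voiceless stop between a nasal and a voiceless fricative."""
    out = []
    for i, p in enumerate(phones):
        out.append(p)
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if p in NASAL_TO_STOP and nxt in VOICELESS_FRICATIVES:
            out.append(NASAL_TO_STOP[p])
    return out

print(insert_stops(["f", "ae", "n", "s", "iy"]))         # fancy   -> f ae n t s iy
print(insert_stops(["ch", "aa", "m", "s", "k", "iy"]))   # Chomsky -> ch aa m p s k iy
```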
Phonological Processes: Ladefoged Rules
• [–voiced, +stop] → [+aspirated] when syllable-initial
    pit vs. spit
• [ax] → [–voiced] after syllable-initial [–voiced, +stop] and before [–voiced, +stop]
    potato
• [+consonantal] longer at end of phrase
    bib, did, don, nod
• [–voiced, +stop] → [–aspirated] after syllable-initial /s/
    spew, stew, skew
• [+vowel] shorter before unvoiced phonemes in same syllable
    cap vs. cab, back vs. bag
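The two aspiration rules (pit vs. spit) can be illustrated with a rough sketch. It treats word-initial position as a stand-in for syllable-initial position, which is a simplifying assumption, and `aspirate_initial` is a hypothetical helper name. Aspiration is written by appending "h" to the stop, following the slides' own notation.

```python
VOICELESS_STOPS = {"p", "t", "k"}

def aspirate_initial(phones):
    """Mark a word-initial voiceless stop as aspirated (p -> ph, t -> th, k -> kh).

    Note: "th" here means aspirated /t/, as on the slides, not the fricative in "thing".
    """
    if phones and phones[0] in VOICELESS_STOPS:
        return [phones[0] + "h"] + list(phones[1:])
    return list(phones)

print(aspirate_initial(["p", "ih", "t"]))       # pit:  ['ph', 'ih', 't']
print(aspirate_initial(["s", "p", "ih", "t"]))  # spit: unchanged, stop follows /s/
```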
Phonological Processes: Ladefoged Rules • Devoicing, End-of-Phrase Length: /ph ax tcl th ey dx ow/ /d aa n n aa dcl d/
Phonological Processes: Ladefoged Rules • Length before Voiceless: /kh ae pc ph kh ae bc b b ae kc kh b ae gc g/
Phonological Processes: Ladefoged Rules
• [–voiced] longer when at end of syllable
    sass, shook vs. push
• [+stop] → unreleased before [+stop]
    apt, act (often see some mark in spectrogram)
• [–voiced, +alveolar, +stop] → [+glottal stop] when before an alveolar nasal in same word
    beaten /b iy q en/
• [+nasal] → [+syllabic] at word end when following [+obstruent]
    chasm /k ae z em/, NOT film (obstruent = complete closure of airway; /l/ is not)
• [+liquid] → [+syllabic] at word end and following [+consonant]
    paddle, whistle, kennel, NOT snarl, which requires classifying /r/ as [+vowel, –syllabic]
Phonological Processes: Ladefoged Rules /ae pcl tcl th ae kcl tcl th/ /bcl b iy q tcl en ax_h/
Phonological Processes: Ladefoged Rules
• [+alveolar, +stop] → [+voiced, +flap] when between two vowels, the second of which is unstressed
    This rule has speaker-dependent variations
• [+alveolar, +stop] omitted between two consonants
    most people, sandpaper, grand master
• [+consonant] shortened before identical [+consonant]
• [–voice, +stop] inserted between [+nasal] and [–voice, +fricative] when the following vowel is absent or unstressed
    prince vs. prints (e'penthesis)
• [ax] inserted following word-final [+nasal, +consonantal]
    nine, come, sang (e'penthesis)
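The flapping rule is easy to state as code once stress is available. In this sketch vowels carry a trailing stress digit (0 = unstressed, 1 = stressed), a hypothetical convention modeled on dictionary-style stress marking. `flap` and `is_vowel` are illustrative names, and the speaker-dependent variation noted on the slide is not modeled.

```python
def is_vowel(phone):
    """Assumption: only vowels end in a stress digit, e.g. 'ey1' or 'ow0'."""
    return phone[-1].isdigit()

def flap(phones):
    """Replace /t/ or /d/ with the flap /dx/ between a vowel and an unstressed vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in {"t", "d"}
                and is_vowel(phones[i - 1])
                and phones[i + 1].endswith("0")):
            out[i] = "dx"
    return out

# potato: only the second /t/ precedes an unstressed vowel, so only it flaps.
print(flap(["p", "ax0", "t", "ey1", "t", "ow0"]))
# -> ['p', 'ax0', 't', 'ey1', 'dx', 'ow0']
```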
Phonological Processes: Ladefoged Rules • “most people and grand masters use sandpaper” /m ow s pc ph iy pc ph el n gc g r ae n m ae s tc th er z yu z s ae n pc ph ey pc ph er/
Phonological Processes: Ladefoged Rules • “nine come sang” /n ay n ax kcl kh ah m ax s ae ng ax/
Phonological Processes: Ladefoged Rules
• [+vowel] longer in open syllables
    sea vs. seed vs. seat
    sigh vs. side vs. sight
    (equalizes length of syllables with differing numbers of segments)
• [+vowel] longer in stressed syllable
    below vs. billow
    (stressed syllables are longer in duration than unstressed)
• [+vowel] → [+nasal] before [+nasal] consonant
• [+vowel, –stressed] → schwa (vowel reduction)
    able vs. ability
    Canada vs. Canadian
    photograph vs. photography
Phonological Processes: Ladefoged Rules • “sigh side sight” /s ay s ay dcl d s ay tcl th/
Phonological Processes: Ladefoged Rules • “below billow” /b ax l ow b ih l ow/
Phonological Processes
• Why is this useful?
    (a) Providing models of known phenomena is better than having a classifier learn the phenomena from data
    (b) Provides humans with appropriate cues for understanding, naturalness
    (c) Accurate phonetic modeling improves the ability of a classifier to discriminate between classes
• Example for Text-to-Speech (case (b)):
    Create a TTS system
    Don’t shorten vowels before voiceless plosives
    Creates, by default, an acoustic cue for voiced plosives
    Decreases intelligibility, or at least naturalness, of the system
Phonological Processes
• Example for Automatic Speech Recognition (case (c)):
    Train a speech recognizer using “dictionary” pronunciations
    Then, in all cases where a [–voice, +stop] is inserted between [+nasal] and [–voice, +fricative], such as “fancy” (in the CMU dictionary as /f ae n s iy/), the acoustics show an alveolar stop, but it is trained as either nasal /n/ or fricative /s/
    Decreases ability of model to discriminate classes
    Decreases performance of system
• Difficulty is in providing comprehensive, accurate rules that are not inappropriately “forced” on a system
Stops/Plosives
• There are six plosives (oral stops) in American English:

             bilabial   alveolar   velar
  unvoiced |   /p/        /t/       /k/
  voiced   |   /b/        /d/       /g/

  plus the flap /dx/, which is a very short /t/ or /d/
• Plosives can be difficult to identify and discriminate; contextual cues can be varied
• Cue (1) is the formant transitions of neighboring vowels:
    for bilabials, F2 drops at CV boundary
    for alveolars, F2 goes toward 1800 Hz at CV boundary
    for velars, F2 may meet F3 (velar pinch) or be fairly flat
• Cue (2) is that voiced plosives may have pre-voicing; this is more likely when the plosive is between two vowels
Stops/Plosives
• Cue (3) is that voiced plosives usually have a VOT (voice onset time) of < 30 msec, but unvoiced plosives usually have a VOT of > 50 msec
• Cue (4) is that the VOT is shortest for bilabials, longer for alveolars, and longest for velars (VOT /p/ < /t/ < /k/ and /b/ < /d/ < /g/)
• Cue (5) is that aspirated (unvoiced) plosives show evidence of F2 and F3 during aspiration; voiced plosives usually don’t
• Cue (6) is the spectral shape; in theory, the shape of the spectrum at burst release can be used to distinguish plosives:
    /p/ and /b/ have energy low in frequency or weakly spread throughout the spectrum
    /t/ and /d/ have more energy above 4 kHz (related to alveolar fricatives /s/ and /z/)
    /k/ and /g/ tend to have more well-defined peaks in the spectrum (near formant locations)
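Cue (3) amounts to a simple threshold test, sketched below. The 30 and 50 msec thresholds come from the slide; the "ambiguous" label for the band in between, and the function name `voicing_from_vot`, are assumptions made for illustration. Since the slide says "usually", this is a single weak cue rather than a decision rule.

```python
def voicing_from_vot(vot_ms):
    """Guess plosive voicing from voice onset time in milliseconds (Cue 3 only)."""
    if vot_ms < 30:
        return "voiced"
    if vot_ms > 50:
        return "unvoiced"
    return "ambiguous"  # assumed label for the 30-50 msec band

print(voicing_from_vot(12))  # voiced
print(voicing_from_vot(75))  # unvoiced
print(voicing_from_vot(40))  # ambiguous
```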
Stops/Plosives
• Other cues related to spectral shape:
• Cue (7a): In the context of front vowels, /k/ and /g/ have a spectral peak just above F2 of the adjacent vowel, making them confusable with /t/ and /d/; but front vowels show more “velar pinch”
• Cue (7b): In the context of back vowels, /k/ and /g/ have one spectral peak between 1000 and 1500 Hz and a second peak between 3000 and 4500 Hz
• Cue (8): Velar bursts also sometimes display a “double burst”, or a second burst during the frication
• Cue (9): Post-vocalic consonants are often unreleased; they can be identified by (a) glottalization, (b) a sudden drop in vowel energy, or (c) formant movement at the end of the vowel
Stops/Plosives
• Cue (10): When the plosive is unreleased, the voicing distinction is based more on the length of the preceding vowel; voiced plosives are associated with longer vowels, unvoiced plosives with shorter vowels
• Cue (11): In V1C1C2V2 patterns, where both consonants are plosives, the existence of two plosives is indicated by the different formant transitions in V1 and V2, the longer duration of closure, and sometimes a brief “click” in the spectrum indicating a change in place of articulation
• Cue (12): Plosives have different characteristics in stressed vs. unstressed environments. VOT for unvoiced plosives before unstressed vowels is shorter than VOT for unvoiced plosives before stressed vowels; plosives in an unstressed-vowel environment are less spectrally clear; in unstressed syllables, /t/ and /d/ may be realized as a flap /dx/
Stops/Plosives
• Cue (13): Flaps have short duration (< 30 msec), a dip in energy levels between two vowels, weak F2 and F3, and F2 tends toward 1800 Hz
• Cue (14): Consonant clusters can provide restrictions; for 3-consonant clusters (beginning with /s/-plosive), the only valid combinations are:
    /s p l/, /s p r/, /s p y/
    /s t r/, /s t y/
    /s k l/, /s k r/, /s k y/, /s k w/
• Cue (15): In /s/-plosive-vowel combinations, VOT tends to be shorter and the duration of /s/ shorter than normal
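The cluster restriction in Cue (14) can be used as a lookup: given the third consonant of an /s/-plosive cluster, only some plosives remain possible. The sketch below encodes exactly the nine combinations listed on the slide; the names `VALID_S_CLUSTERS` and `plosive_candidates` are hypothetical.

```python
# The nine valid /s/-plosive-consonant clusters listed under Cue (14).
VALID_S_CLUSTERS = {
    ("s", "p", "l"), ("s", "p", "r"), ("s", "p", "y"),
    ("s", "t", "r"), ("s", "t", "y"),
    ("s", "k", "l"), ("s", "k", "r"), ("s", "k", "y"), ("s", "k", "w"),
}

def plosive_candidates(third):
    """Return the plosives that can sit between /s/ and the given third consonant."""
    return sorted(c[1] for c in VALID_S_CLUSTERS if c[2] == third)

print(plosive_candidates("w"))  # ['k']
print(plosive_candidates("l"))  # ['k', 'p']
print(plosive_candidates("r"))  # ['k', 'p', 't']
```

For example, if the third consonant is identified as /w/, the plosive can only be /k/, regardless of how ambiguous the burst looks.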
Plosives: Unvoiced Initial in Front-Vowel Context /p iy t iy k iy/
Plosives: Voiced Initial in Front-Vowel Context /b iy d iy g iy/
Plosives: Unvoiced Initial in Mid-Vowel Context /p ah t ah k ah/
Plosives: Voiced Initial in Mid-Vowel Context /b ah d ah g ah/
Plosives: Unvoiced Initial in Back-Vowel Context /p aa t aa k aa/
Plosives: Voiced Initial in Back-Vowel Context /b aa d aa g aa/
Plosives: Unvoiced Final in Front-Vowel Context /iy p iy t iy k/
Plosives: Voiced Final in Front-Vowel Context /iy b iy d iy g/
Plosives: Unvoiced Final in Mid-Vowel Context /ah p ah t ah k/
Plosives: Voiced Final in Mid-Vowel Context /ah b ah d ah g/
Plosives: Unvoiced Final in Back-Vowel Context /aa p aa t aa k/
Plosives: Voiced Final in Back-Vowel Context /aa b aa d aa g/