Speech Dynamics

The Main Idea: At an abstract linguistic level, phonetic segments ([b], [p], [r], [k], [i], [ɑ], [u], etc.) are discrete, independent, interchangeable snap-together parts, like beads on a string.

(Figure: artist’s rendition of beads.)

The word [kæt] is built by stringing together 3 distinct, discrete “beads”: [k], [æ], and [t].
• The [æ] bead is snapped out, an [ɪ] bead is snapped in, changing [kæt] into [kɪt].
• The [t] bead is snapped out, an [n] bead is snapped in, changing [kɪt] into [kɪn].
• The [ɪ] bead is snapped out, an [æ] bead is snapped in, changing [kɪn] into [kæn].
• The [k] bead is snapped out, an [m] bead is snapped in, changing [kæn] into [mæn].
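The bead-swapping sequence can be sketched in a few lines of Python. This is only an illustration of the abstract, discrete view: the function name `swap_bead` and the list-of-segments representation are assumptions made for the example, not part of the original material.

```python
def swap_bead(word, position, new_bead):
    """Return a copy of `word` with the bead at `position` replaced.

    `word` is a list of segment symbols. Every other bead is left
    untouched, which is what "independent" means at the abstract level.
    """
    beads = list(word)
    beads[position] = new_bead
    return beads


# Hypothetical encoding of the four swaps described above:
# (position of the bead to snap out, bead to snap in).
steps = [(1, "ɪ"), (2, "n"), (1, "æ"), (0, "m")]

word = ["k", "æ", "t"]
history = ["".join(word)]
for position, new_bead in steps:
    word = swap_bead(word, position, new_bead)
    history.append("".join(word))

print(" -> ".join(history))
```

In this toy model a swap never changes the neighboring beads. The rest of the deck is about why real speech production does not behave this way.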

The phonetic structure of speech is an example of a discrete combinatorial system: morphemes and words are built by combining separate, independent, snap-together parts called phonemes or phonetic segments. DNA works this way as well: Genes are built by combining a finite (and very small) number of discrete, snap-together parts (adenine, guanine, cytosine, & thymine) in an infinite variety of combinations.

The letters that are used in any alphabetic writing system comprise a discrete combinatorial system: you get infinite variety by combining 26 discrete elements (a, b, c, d … z).

Are there aspects of language other than morphemes & words built by combining phonemes that comprise a discrete combinatorial system?

(Answer: Yes. Language is a discrete combinatorial system everywhere you look.
• Words are constructed by using morphological rules to combine discrete parts called morphemes.
• Phrases are constructed by using syntactic and semantic rules to combine words: nouns, verbs, articles, prepositions, etc.
• Sentences are constructed by using syntactic rules to combine phrases, the simplest being S → NP + VP; i.e., a sentence is composed of a noun phrase and a verb phrase.
So, language is a discrete combinatorial system at all levels.)

The snap-together parts in a discrete combinatorial system are: (1) discrete, (2) independent.
• (1) discrete = digital rather than analog (continuous); e.g., in genetics, each base must be one of four discrete choices: A, G, C, or T; there is no such thing as guanine and three-quarters, or adenine and a half. In phonetics, at an abstract level, snap-together phonemes are one of ~40 discrete choices; e.g., /b/ or /p/, not /b/ and two-thirds.
• (2) independent = the snap-together parts do not affect one another; e.g., in genetics, guanine is guanine whether it is attached to adenine, thymine, cytosine, or another guanine. In phonemics (not phonetics), /p/ is /p/, whether the next phoneme is /ɑ/ or /i/ or /æ/ or /r/ or /l/ or whatever.

OK, now the big deal about accommodation, coarticulation, and assimilation. Here’s the idea we started with: At an abstract linguistic level, phonetic segments are discrete, independent, interchangeable snap-together parts. The key phrase here is “at an abstract linguistic level.”

However, at the actual level of speech production (real movements of real articulators generating real speech sounds), phonetic segments are:
• not discrete
• not independent
• not interchangeable

(Figure: spectra measured during the [s] in [isi] and in [usu].)

In a world of discrete, independent, interchangeable parts, the two instances of [s] would be identical. Are they?

The phonetic environment in which a sound occurs can have a strong influence on the way the sound is produced. This is an example of coarticulation: the lip rounding from the [u] carries over to the [s]; i.e., the rounded lips for [u] just stay rounded for [s].

(Figure: cross-spliced syllables. The [s] from [usu] is pasted into [isi], and the [s] from [isi] is pasted into [usu].)

This shows what happens when the [s] of [isi] is cut and pasted into [usu], and vice versa. Do the cross-spliced syllables sound ok? What does this tell us about the interchangeability of the phonetic elements that are being recombined in this discrete combinatorial system? [Answer: They aren’t interchangeable.]

The phenomenon of coarticulation is nothing more than an articulatory shortcut. Without coarticulation, producing [usu] would involve:
• round the lips (and adjust the tongue) for [u]
• retract the lips (and adjust the tongue) for [s]
• round the lips again (and adjust the tongue) for [u]

It’s not that this can’t be done. It can. Easily. The problem is that it slows you down. A lot. Most people speak almost as fast as they can almost all the time. And we talk fast: ~10 speech sounds per second and up.

http://www.youtube.com/watch?v=NeK5ZjtpO-M
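The saving from the lip-rounding shortcut can be made concrete with a toy count of lip-posture changes. Everything in this sketch is an assumption made for illustration: the `LIP_TARGET` table, the two posture labels, and the rule that [s] simply inherits the lip posture of the preceding segment. It is not a model of real speech motor control.

```python
# Hypothetical lip targets for each segment produced in isolation.
LIP_TARGET = {"i": "spread", "u": "rounded", "s": "spread"}


def lip_plan(segments, coarticulate):
    """Return the lip posture used for each segment.

    With coarticulate=True, a non-initial [s] keeps whatever lip posture
    the preceding segment had (carryover), instead of its own target.
    """
    plan = []
    for index, segment in enumerate(segments):
        target = LIP_TARGET[segment]
        if coarticulate and segment == "s" and index > 0:
            target = plan[-1]
        plan.append(target)
    return plan


def count_lip_movements(plan):
    """Count changes of lip posture between neighboring segments."""
    return sum(1 for a, b in zip(plan, plan[1:]) if a != b)


for word in ("isi", "usu"):
    careful = count_lip_movements(lip_plan(list(word), coarticulate=False))
    shortcut = count_lip_movements(lip_plan(list(word), coarticulate=True))
    print(word, "without shortcut:", careful, "with shortcut:", shortcut)
```

Under these assumptions, [usu] needs two lip-posture changes without the shortcut and none with it, while [isi] needs none either way.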

In terms of the number of muscles, the complexity of the movements, and the precision of the movements, there is nothing we do that approaches speech for speed, complexity, and efficiency. The primary reason for this: coarticulation. Coarticulation is not an arcane, egghead detail. It’s the single most important fact about speech motor control.

If you understand what’s going on in this one example, [isi] vs. [usu], then you understand most of what you need to know about coarticulation.

Basic Motor-Planning Principle: The motor planning system will take any shortcut that the ear and the rules of the language will tolerate.

That’s it. (Memorize that sentence, and be sure you understand what it means. No kidding.)

[isi] vs. [usu]: If the lip-rounding shortcut produced a fricative that was so distorted that it no longer sounded like an [s], the shortcut wouldn’t be used. But the [s] remains quite intelligible, so the shortcut is used.

Another example (we’ve already seen this one): geese vs. gone ([gis] vs. [gɔn])

Is the place of articulation the same? [Hint: No. Place for [gɔn] is velar, place for [gis] is palatal.]

What’s going on here; i.e., what’s the shortcut? Why is the place of articulation further forward for [gis] than [gɔn]? [Answer: The stop is made further forward before the front vowel than before the back vowel, so the movements take less time.]

Would this shortcut work for a language that has both velar and palatal stops? [Answer: Nah] This is, in part, what was meant earlier by “and the rules of the language.”

Another example we’ve seen: bat vs. man ([bæt] vs. [mæn])

What happens to the velum during the [æ] in these two words? Phonetically, this is: [bæt] vs. [mæ̃n]

What’s the shortcut? Why is the shortcut tolerated? [A: Listeners are ok with nasalized vowels]
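The bat vs. man contrast can be written as a small context rule. This is a deliberately simplified sketch: the `VOWELS` and `NASALS` sets are hypothetical and cover only a handful of symbols, and the rule (nasalize a vowel whenever either neighbor is a nasal consonant) is an assumption chosen to reproduce this one example, not a full account of nasalization.

```python
# Hypothetical segment classes, limited to symbols used in this deck.
VOWELS = {"i", "ɪ", "æ", "ɑ", "ɔ", "u"}
NASALS = {"m", "n", "ŋ"}

NASAL_MARK = "\u0303"  # combining tilde, as in [æ̃]


def nasalize(segments):
    """Mark each vowel that has a nasal consonant on either side."""
    result = []
    for index, segment in enumerate(segments):
        before = segments[index - 1] if index > 0 else None
        after = segments[index + 1] if index + 1 < len(segments) else None
        if segment in VOWELS and (before in NASALS or after in NASALS):
            # The velum stays lowered through the vowel.
            segment = segment + NASAL_MARK
        result.append(segment)
    return result


for word in (["b", "æ", "t"], ["m", "æ", "n"]):
    print("".join(word), "->", "".join(nasalize(word)))
```

The input list is the abstract, phonemic string. The output is closer to what the articulators actually do: the vowel in “man” is no longer the same bead as the vowel in “bat”.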

Assimilation

There are times when the shortcut is not tolerated by the ear directly, but it is still allowed by listeners. These are examples of assimilation.

Transcribe these: this steak vs. this shoe
What’s the shortcut? Do you get an acceptable [s] in the word “this” in both cases?

Produce this sentence as naturally as you can: I get my news by listening to NPR.
How about this: I called her from a phone booth.
And this, naturally and at a fairly rapid speaking rate: did he vs. did you

Terminology

Accommodation: MacKay’s word for an articulatory shortcut.

Two Varieties of Accommodation
• Assimilation: The sound that’s modified by the shortcut is not directly tolerated by the ear; e.g., the underlying /s/ of “this” in “this shoe” no longer sounds like an /s/; the modified sound, now an [ʃ], is tolerated by the language system.
• Coarticulation: The sound that’s modified by the shortcut is directly tolerated by the ear; e.g., the [s] in [usu] still sounds like an [s]; the [g] in [gis] still sounds like a [g], etc.
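Assimilation can be sketched as a lookup on pairs of neighboring segments: the first sound is replaced when it is followed by a particular second sound. The `RULES` table below is hypothetical and holds only two pairs, chosen to cover “this shoe” and “phone booth”. The transcriptions are simplified, and merging doubled segments afterwards is an extra assumption made for the example.

```python
# Hypothetical assimilation rules: (sound, following sound) -> replacement.
RULES = {
    ("s", "ʃ"): "ʃ",  # "this shoe": /s/ becomes [ʃ] before /ʃ/
    ("n", "b"): "m",  # "phone booth": /n/ becomes [m] before /b/
}


def assimilate(segments):
    """Replace each segment that has a rule for its right-hand neighbor."""
    out = list(segments)
    for index in range(len(out) - 1):
        pair = (out[index], out[index + 1])
        out[index] = RULES.get(pair, out[index])
    return out


def merge_doubles(segments):
    """Collapse runs of identical neighboring segments into one."""
    merged = []
    for segment in segments:
        if not merged or merged[-1] != segment:
            merged.append(segment)
    return merged


this_shoe = ["ð", "ɪ", "s", "ʃ", "u"]
phone_booth = ["f", "o", "n", "b", "u", "θ"]

for phrase in (this_shoe, phone_booth):
    surface = merge_doubles(assimilate(phrase))
    print("".join(phrase), "->", "".join(surface))
```

Unlike a coarticulation shortcut, each rule here swaps one phoneme-sized category for another, which is why the result is tolerated by the language system rather than directly by the ear.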

Last point: The impression you may have is that the motor planning system cares only about speed. This can’t be. Imagine a speaker who says only this: “buh-buh-buh”. It’s very fast, but you can’t distinguish one word from another. Mature speakers do not pronounce cupcake as cupake [kʌpek] or cukake [kʌkek], though either would be faster. (Toddlers take the cukake kind of shortcut all the time, but NOT for reasons of speed.) Why?

The motor planning system strikes some poorly understood compromise among three factors:
• speed
• intelligibility
• what listeners will accept as natural

The last one is the trickiest. For reasons that are not too clear, listeners are ok with thish shoe [ðɪʃu] and foam booth [fombuθ], but they’re not ok with cukake [kʌkek] or cupake [kʌpek].

Summary

Details aside, here’s the key idea: The basic motor-planning principle underlying accommodation (coarticulation & assimilation) is this: The motor planning system will take any shortcut that listeners will tolerate (i.e., the speech needs to be intelligible and it needs to sound natural).

Q: Why do we take these shortcuts?
A: Simple: They allow us to speak faster.
