Sounds and Prosodies in Communicative Phonetic Science

Klaus J. Kohler, University of Kiel, Germany. Paper presented at the Symposium in Honour of Hans Basbøll, Odense, 20 August 2013

1 From Sound to Phoneme • For thousands of years, homo sapiens loquens has invented ways of capturing the fleeting sound of spoken words in timeless symbols on durable material. • The aim of all the systematic writing systems that have resulted is to represent lexical items in graphic form • either ideographically, or with reference to sound units in syllabic or alphabetic scripts • An alphabetic writing system has been invented only once, in the Semitic language family. • All other alphabetic systems are derivatives from it. • Why should that be so?

3-consonant roots for semantic fields of the lexicon
• k'atab: he wrote
• y'iktib: he writes, will write
• k'aatib: clerk
• k'ataba: clerks
• kit'aab: book
• k'utub: books
• makt'uub: written
• m'aktab: office, desk
• makt'aba: library
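The shared consonantal skeleton of these forms can be checked mechanically. The following is a minimal sketch, not part of the original talk: the helper name `has_root` is hypothetical, the apostrophe is treated as a stress mark as in the transliterations above, and the check is a plain in-order subsequence test, not a morphological analysis.

```python
# Word forms as transliterated on the slide; the apostrophe marks stress.
FORMS = {
    "k'atab": "he wrote",
    "y'iktib": "he writes, will write",
    "k'aatib": "clerk",
    "k'ataba": "clerks",
    "kit'aab": "book",
    "k'utub": "books",
    "makt'uub": "written",
    "m'aktab": "office, desk",
    "makt'aba": "library",
}


def has_root(form, root="ktb"):
    """Return True if the root consonants occur, in order, within the form."""
    letters = iter(form.replace("'", ""))
    # `c in letters` consumes the iterator up to the first match,
    # so each consonant must follow the previous one.
    return all(c in letters for c in root)


if __name__ == "__main__":
    for form, gloss in FORMS.items():
        print(f"{form:10s} {gloss:24s} contains k-t-b: {has_root(form)}")
```

Every form in the list passes the test, while vowels and affixes vary freely around the three consonants. That constant skeleton is what an alphabetic notation of consonants captures.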

This was the birth of the “phonemic” principle in tight association of lexical meaning and form. • No other language had this, so no other language developed an indigenous alphabetic script. • When the phoneticians of the newly-founded IPA at the end of the 19th c. devised a phonetic alphabet to indicate pronunciation in languages like English or French, whose Latin orthographies had become deficient in the representation of sounds, they reinvented the phonemic principle • broad and narrow transcription

The linguists of the Prague Circle turned this into a phonological theory with the distinctive phoneme for the differentiation of the intellectual meaning of words, and allophonic variation in context. • They kept the function-form link • but dissociated it from graphic representation • and turned it into a principle of sound structures • every language having its own phonemic system • The American Structuralists, in their behaviouristic philosophy, went one step further and removed the link to meaning, being unable to formalize it.

Grouping of sounds into phonemes was now governed by • complementary distribution • phonetic similarity • But Pike still recognised the original “phonemic principle” because he gave his book Phonemics the subtitle “A technique for reducing languages to writing”. • After that, “phonology” became a separate discipline and had a metalinguistic purpose in itself, practised by desk phonologists.
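The distributional criterion can be stated as a set operation. The sketch below is an illustration only: the environments and sound labels are hypothetical toy data, not taken from the talk, and the second criterion, phonetic similarity, is not modelled.

```python
def in_complementary_distribution(envs_a, envs_b):
    """Two sounds are in complementary distribution if they share no environment."""
    return set(envs_a).isdisjoint(envs_b)


# Hypothetical environments, written as (left context, right context);
# "#" stands for a word boundary and "V" for any vowel.
ENVIRONMENTS = {
    "ph": {("#", "V")},             # aspirated stop, word-initial
    "p": {("s", "V")},              # unaspirated stop, after s
    "b": {("#", "V"), ("V", "V")},  # voiced stop
}

if __name__ == "__main__":
    print(in_complementary_distribution(ENVIRONMENTS["ph"], ENVIRONMENTS["p"]))
    print(in_complementary_distribution(ENVIRONMENTS["ph"], ENVIRONMENTS["b"]))
```

In this toy data the first pair never occurs in the same environment and would be grouped into one phoneme, while the second pair overlaps and would be kept apart. No reference to meaning is needed, which is the point of the structuralist procedure.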

Generative Phonology, Optimality Theory, Markedness, Feature Hierarchy • Phonological categories were moved again from behaviouristic groupings to entities in the ideal speaker/listener’s mind. • At this point, psycholinguists got hold of them and started taking them into the lab for experiments on “the phoneme as a perceptual unit”. • This has been the MPI Nijmegen paradigm for the past 20 years, e.g. in phoneme spotting. • But is this extrapolation justified?

2 From Phoneme to Fine Phonetic Detail • Pronunciation “white please” vs. “black please” ordering coffee • [wA>«? pli:z] by a Londoner • mistaken for [bl³A>k pli:z] by a Scottish listener • expecting [ãÃi? pli:z]. • In this situational context, the listener’s task was to understand one of two possible meanings • wrong understanding triggered by “graveness” instead of “acuteness” of the sound • not by wrong phoneme perception.

Listeners process speech signals with perceptual categories shaped by attention and memory, not by abstraction from sound to phoneme • they aim at understanding messages in all their facets of meaning, even from incomplete “segmental” signal information • stable multidimensional fine phonetic detail plays an important role • based on episodic memory, exemplar recognition and contextual information • This is mandatory in the processing of reduced speech, especially of function word form variability.

Here is an example from the Kiel Corpus of Spontaneous Speech: OLV g122a009 • I shall first play a stretch of speech that even native speakers of German will not be able to understand, which phoneticians find very difficult to represent as a string of segments, and German phoneticians as a sequence of phonemes. • Then I shall add the next stretch which will most likely trigger understanding of both stretches. • A third stretch will complete understanding.

n oâù â nVŒ)(M a k H U N0

0 m0 m I t H v8 x f Ò8 ai I s
nun wollen wir mal kucken, ob Mittwoch frei ist
/nuùn vl(«)n vIŒ maùl kUk(«)n ?p mItvx frai Ist/

n uù n v l « nv IŒ m aù l k H U k N

[kHUN0] is identified as the verb <kucken>. • The sound stretch that immediately precedes must be the modal particle <mal>, which commonly occurs in verbal context as [ma]. • But then an inflected auxiliary verb must precede. • The dark vocalic stretch ending in a labiodentalized nasal, which is in turn followed by [Œ], can be associated with <wollen wir>, because it commonly reduces in the direction of [VnVŒ]. <werden, sollen, müssen> do not fit.

The initial stretch of [n] + dark vowel with strong nasalization across the long vocalic section can be associated with <nun>. • The result is an understanding of what in English is <“Now let’s see if Wednesday is free.”>. • This theoretical account of how the highly reduced utterance may be recognised puts sound perception into an integrated framework of cognitive processing for the understanding of meaning. • Phonemes and canonical forms play no role in it.

Phonetic traces, which need not be segmental but may be spread over indefinite stretches (articulatory prosodies), trigger the recognition process. • Such articulatory prosodies are • nasalization • glottalization • labialization, labiodentalization • palatalization, velarization, pharyngealization

These phonetic traces work in conjunction with morphological, syntactic and situational constraints • memory of multiple phonetic forms of lexical items is essential • complete phonetic identification of acoustic sequences is not required

The spontaneous speech sample provides further interesting data • signalling boundaries • phrase boundaries: <mal kucken, ob> • word boundaries: <frei ist> • in both cases canonical phonology has [?] • signalling articulatory breaks for stops in nasal environments: [kHUkN], [pmItvx]

k H U N0 0 m0 m I

The junctions between the words <kuck(e)n>, <ob>, <Mittwoch> have an overlay of continuous glottalization, with nasalization through the stops [N0 0 m0] • glottalization in the nasal provides a phonatory break to signal stop + nasal • do Danish listeners perceive stød? • and glottalization in the vowel is a phrase boundary break mark between <kucken> and <ob>

f Ò8 a i I s

The word boundary between <frei> and <ist> is neither marked by [?] nor by glottalization • but by a dip in f0 and energy • heightened by vocalic duration • Do Danish listeners perceive stød? • The word boundary may be • strengthened by introducing glottalization • or weakened by removing the f0/energy dip • only leaving vowel length to mark bisyllabicity

kHU g* NbmI t v8 x f Ò8 aI I0 s

kHU g* N bmI t v8 x f Ò8 aI I s
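The f0/energy dip that marks the <frei ist> boundary can be quantified in a simple way. The following is a rough sketch under stated assumptions: the function name `dip_depth`, the contour values and the depth measure itself are illustrative and are not the analysis used in the talk.

```python
def dip_depth(contour):
    """Depth of the deepest interior dip in a contour (0 if there is none).

    For each interior sample, the depth is how far it lies below the lower
    of the two highest values on either side of it.
    """
    depths = [
        min(max(contour[:i]), max(contour[i + 1:])) - contour[i]
        for i in range(1, len(contour) - 1)
    ]
    return max(0.0, max(depths, default=0.0))


# Hypothetical f0 samples in Hz across a vowel sequence.
DIPPED = [120, 125, 100, 125, 120]   # dip between two vocalic portions
SMOOTH = [100, 120, 130, 120, 100]   # unitary rise-fall, no dip

if __name__ == "__main__":
    print("dipped:", dip_depth(DIPPED))
    print("smooth:", dip_depth(SMOOTH))
```

The same measure could be applied to an energy contour. Removing the dip, as in the weakened version of the boundary, corresponds to driving this value towards zero while leaving vowel duration unchanged.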

There is another German example of a word boundary that is signalled by prosody rather than [?], which one can hear all the time around Easter • Frohe Ostern [fÒoùoùstŒn] “Happy Easter” • Does it coincide with [fÒoùstŒn]? • Here are naturally produced • Frohe Ostern [fÒoùoùstŒn] • and the non-word *Frohstern [fÒoùstŒn] • they differ in vowel duration, f0 and energy timing

Frohe Ostern *Frohstern

Frohe Ost- is bisyllabic: the low f0 precursor is perceived as the prehead to the rise-fall. • *Frohst- is monosyllabic: there is a unitary rise-fall. • Shortening Frohe Ost- to the vowel duration of *Frohst- squeezes the pitch pattern into a monosyllabic slot with a late peak pattern. • Lengthening *Frohst- creates an oscillating pattern of bisyllabic and excessively long monosyllabic. • We can now lengthen original *Frohst- (x 1.4) and shorten original Frohe Ost- (x 0.7) to the same value in between the two original vowel durations. • Then f0 and energy timing are manipulated.

orig. Frohe Ostern x0.7 smoothed f0 smoothed energy smoothed f0 + energy

orig. Frohstern x1.4 dipped f0 dipped energy dipped f0 + energy
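The two scaling factors carry an implicit relation between the original vowel durations: if 1.4 times the *Frohst- duration equals 0.7 times the Frohe Ost- duration, the longer vowel is about twice the shorter one. The sketch below works this out. The durations in milliseconds are hypothetical, since the talk gives only the factors, and the factors themselves are presumably rounded.

```python
LENGTHEN = 1.4  # factor applied to original *Frohst- (from the slide)
SHORTEN = 0.7   # factor applied to original Frohe Ost- (from the slide)


def implied_ratio(lengthen=LENGTHEN, shorten=SHORTEN):
    """Ratio long/short of the original durations if both scaled values coincide."""
    # lengthen * d_short == shorten * d_long  =>  d_long / d_short == lengthen / shorten
    return lengthen / shorten


def scaled(duration_ms, factor):
    """Duration after linear time scaling by the given factor."""
    return duration_ms * factor


if __name__ == "__main__":
    d_short = 150.0                     # hypothetical *Frohst- vowel duration
    d_long = d_short * implied_ratio()  # duration implied for Frohe Ost-
    print("implied ratio:", implied_ratio())
    print("lengthened *Frohst-:", scaled(d_short, LENGTHEN))
    print("shortened Frohe Ost-:", scaled(d_long, SHORTEN))
```

With the durations equalised in this way, any remaining difference in perceived syllable count has to come from f0 and energy timing, which is what the subsequent manipulations test.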

The variability between presence of glottalization and dips in prosodic parameters for word boundary marking is reminiscent of what is found in the broad scale of stød realization in Danish.
læser (reads) • læser (reads) • læser (reader)

The two stød realizations have in common an abrupt fall of f0 and energy in the vowel • comparable to the dip in the German word boundary marking • and different from the smooth f0/energy timing in the stødless word form. • In both German and Danish we thus find the use of phonetic features to signal a break vs. smooth transition in articulation • but the function is, of course, different • non-tonal phrase prosody to mark boundaries vs. a non-tonal syllable prosody to mark lexical class.

Another language area that can be added to the discussion of this break prosody comprises the Frankish dialects in the border districts between Germany, Holland and Belgium, where the phenomenon is known as “Rheinische Schärfung”, e.g. “Nase” (nose) vs. “nass” (wet) in Cologne. • Dealing with these data in terms of phonemes, tones and canonical forms misses insights into production and perception across languages.

3 From Auditory Observation to Signal Analysis • The technological advance in speech signal analysis, initially the spectrograph, now computer programs, • inevitably led to taking the phoneme concept into the lab • in order to substantiate phonological entities and structures by objective measurement • thus to supplement auditory impressions by testable physical properties • finally to replace auditory observation altogether. • This development has culminated in Laboratory Phonology.

4 From Sound to Sense • The origin of speech technology after World War II had the communicative component incorporated • communications engineering, technological development to improve communication • Speech Communications Conference at MIT 1950 • Menzerath and Meyer-Eppler invited • > Institut für Phonetik u. Kommunikationsforschung • Research Laboratory of Electronics, Speech Communication Group, MIT

Speech Communication Seminar, Stockholm 1974 • From Sound to Sense: 50+ years of discoveries in speech communication, MIT 2004 • invited paper by Sarah Hawkins: Puzzles and patterns in 50 years of research on speech perception

“… new theories will aim to include the following attributes. They should be biologically plausible; include roles for attention, memory, and learning; focus on understanding meaning rather than identifying phonological form; allow for multiple potential ‘units of perception’, possibly with no obligatory units; and they should allow meaning and linguistic structure to be understood from incomplete information. …”

5 From Sense to Sound • But we also need to include the complement • Jakobson, Fant, Halle, Preliminaries to speech analysis, 1952 “given the evident fact that we speak to be heard to be understood” • Speakers transmit meaning • by coding it in words and syntactic structures with fine phonetic detail of segments and prosodies • generating acoustic signals for listeners to decode

There are two questions: • How is the phonetic form of words represented mentally to trigger physiological and articulatory processes for acoustic sound production? • What are the rules for producing reduced or elaborated phonetic forms?

Answers to the first question must specify essential phonetic elements that define the whole formal set of a lexical item • this specification must include segmental units as well as articulatory prosodies • both are related to lexical, morphological and speech style categories • which allow for phonetic under-specification

The answer to the second question goes well beyond descriptive accounts of large databases • it needs to include the coupling of reduction/elaboration with lexical class, morphology, syntax and speaking style, closely linked to the answer to the first question

6 From Sense to Sound to Sense • Finally, we have to combine the Speaker’s Sense-to-Sound with the Listener’s Sound-to-Sense in dialogue interaction. • At this point, the Propositional, Expressive and Appeal functions of speech communication and their prosodic coding come to the fore. • And for this we need a new methodology of data acquisition that is adaptable to the specific research questions asked by speech scientists • going beyond isolated words and sentences.

If we take the steps I have outlined, we will be progressively providing answers to the central question of Phonetics: How do humans communicate with speech in all types of speech interactions in the world’s languages? • developing an integrated framework of Sounds and Prosodies in Communicative Phonetic Science
