Whither Phonetic Science? Why are we doing what we are doing, and what should we be doing?

Whither Phonetic Science? Why are we doing what we are doing, and what should we be doing?. Klaus J. Kohler University of Kiel, Germany. Welcoming address to Sound-to-Sense, Kiel 14 December, 2012. 1 Introduction. Welcome to Germany to Kiel to Phonetics and Digital Speech Processing


Presentation Transcript


  1. Whither Phonetic Science? Why are we doing what we are doing, and what should we be doing? Klaus J. Kohler University of Kiel, Germany Welcoming address to Sound-to-Sense, Kiel 14 December, 2012

  2. 1 Introduction • Welcome • to Germany • to Kiel • to Phonetics and Digital Speech Processing • the Institute was closed on 1 April 2011 • due to the inscrutable wisdom of our Alma Mater • but its spirit is still very much alive and kicking • and, like Phoenix, it is rising from the ashes • thanks to Oliver Niebuhr’s enthusiasm and drive in speech science research and teaching

  3. You have come to this discussion meeting because, in some way or other, you are affiliated to the EC Marie Curie Research Training Network Sound to Sense • either because you actively worked on it • or because you want to be part of the interdisciplinary network paradigm which the funding program developed for the advance of speech science • So, this is a good opportunity to reflect on where phonetic science has got to and where it should be going.

  4. These questions have been asked at various stages in the history of speech science. • The most famous case was JR Pierce in two papers in JASA (1969, 1970), “Whither speech recognition?” in connection with ASR: “…before embarking upon such work, the worker should candidly ask and answer the following questions: Why am I working in this field? What particular thing do I hope to accomplish? Why is it worthwhile? Am I likely to succeed? How will I know whether or not I have succeeded? Where will success take or leave me?”

  5. One and a half decades later, Manfred Schroeder in the Preface to the Bibliotheca Phonetica volume Speech and Speaker Recognition [1985] says about the state-of-the-art of automatic recognition of speech at the time: "… one of the main impacts of the computer has been to demonstrate the manifest inadequacy of superficial algorithms that take no account of context and meaning. The simple-minded computer per se was not the hoped-for cure-all, and speech recognition was in acute danger of withering in the laboratory rather than blooming in the field…"

  6. So, what IS the phonetic scientist’s ultimate goal? • To find answers to the question “How do humans communicate with speech in all types of speech interactions in the languages of the world?” • This question has always been asked and partial answers have been proposed • by creating categories of phonetic description • but they have always ended up as concepts abstracted from their original life contexts and reified in metalinguistic pursuits in their own right • Let’s have a look at some cornerstones in the history of phonetic science.

  7. 2 From Sound to Phoneme • For thousands of years, homo sapiens loquens has invented ways of capturing the fleeting sound of spoken words in timeless symbols on durable material. • The aim of all the systematic writing systems that have resulted is to represent lexical items in graphic form • either ideographically, or with reference to sound units in syllabic or alphabetic scripts • An alphabetic writing system has been invented only once, in the Semitic language family. • All other alphabetic systems are derivatives from it. • Why should that be so?

  8. 3-consonant roots for semantic fields of the lexicon • k'atab ‘he wrote’ • y'iktib ‘he writes, will write’ • k'aatib ‘clerk’ • k'ataba ‘clerks’ • kit'aab ‘book’ • k'utub ‘books’ • makt'uub ‘written’ • m'aktab ‘office, desk’ • makt'aba ‘library’

  9. This was the birth of the “phonemic” principle in tight association of lexical meaning and form. • No other language had this, so no other language developed an indigenous alphabetic script. • When the phoneticians of the newly-founded IPA at the end of the 19th c. devised a phonetic alphabet to indicate pronunciation in languages like English or French, whose Latin orthographies had become deficient in the representation of sounds, they reinvented the phonemic principle • broad and narrow transcription

  10. The linguists of the Prague Circle turned this into a phonological theory with the distinctive phoneme for the differentiation of the intellectual meaning of words, and allophonic variation in context. • They kept the function-form link • but dissociated it from graphic representation • and turned it into a principle of sound structures • every language having its own phonemic system • The American Structuralists, in their behaviouristic philosophy went one step further and removed the link to meaning, being unable to formalize it.

  11. Grouping of sounds into phonemes now governed by • complementary distribution • phonetic similarity • But Pike still recognised the original “phonemic principle” because he gave his book Phonemics the subtitle “A technique for reducing languages to writing”. • After that “phonology” became a separate discipline and had a metalinguistic purpose in itself practised by desk phonologists.

  12. Generative Phonology, Optimality Theory, Markedness, Feature Hierarchy • Phonological categories were moved again from behaviouristic groupings to entities in the ideal speaker/listener’s mind. • At this point, psycholinguists got hold of them and started taking them into the lab for experiments on “the phoneme as a perceptual unit”. • This has been the MPI Nijmegen paradigm for the past 20 years, e.g. in phoneme spotting. • But is this extrapolation justified?

  13. 3 From Phoneme to Fine Phonetic Detail • Pronunciation “white please” vs. “black please” ordering coffee • [wA>«? pli:z] by a Londoner • mistaken for [bl³A>k pli:z] by a Scottish listener • expecting [ãÃi? pli:z]. • In this situational context, the listener’s task was to understand one of two possible meanings • wrong understanding triggered by “graveness” instead of “acuteness” of the sound • not by wrong phoneme perception.

  14. Listeners process speech signals with perceptual categories shaped by attention and memory, not by abstraction from sound to phoneme • they aim at understanding messages in all their facets of meaning, even from incomplete “segmental” signal information • stable multidimensional fine phonetic detail plays an important role • based on episodic memory, exemplar recognition and contextual information • This is mandatory in the processing of reduced speech, especially of function word form variability.

  15. Here is an example from the Kiel Corpus of Spontaneous Speech: OLV g122a009 • I shall first play a stretch of speech that even native speakers of German will not be able to understand, which phoneticians find very difficult to represent as a string of segments, and German phoneticians as a sequence of phonemes. • Then I shall add the next stretch which will most likely trigger understanding of both stretches. • A third stretch will complete understanding. • The fine phonetic detail in the stretches will be discussed.

  16. n oâù â nVŒ)(M a k H U N0

  17. 0 m0 m I t H v8  x f Ò8 ai I s nun wollen wir mal kucken, ob Mittwoch frei ist /nuùn vl(«)n vIŒ maùl kUk(«)n ?p mItvx frai Ist/

  18. n uù n v  l « nv IŒ m aù l k H U k N

  19. [kHUN0] is identified as the verb <kucken>. • The sound stretch that immediately precedes must be the modal particle <mal>, which commonly occurs in verbal context as [ma]. • But then an inflected auxiliary verb must precede. • The dark vocalic stretch ending in a labiodentalized nasal, which is in turn followed by [Œ], can be associated with <wollen wir>, because it commonly reduces in the direction of [VnVŒ]. <werden, sollen, müssen> do not fit.

  20. The initial stretch of [n] + dark vowel with strong nasalization across the long vocalic section can be associated with <nun> . • The result is an understanding of what in English is <“Now let’s see if Wednesday is free.”>. • This theoretical account of how the highly reduced utterance may be recognised puts sound perception into an integrated framework of cognitive processing for the understanding of meaning. • Phonemes and canonical forms play no role in it.

  21. Phonetic traces that need not be segmental but may be spread over indefinite stretches (articulatory prosodies) trigger the recognition process, in conjunction with • morphological, syntactic and situational constraints • memory of multiple phonetic forms of lexical items is essential • complete phonetic identification of acoustic sequences is not required • These components of the recognition process must work in parallel to allow for real-time processing.

  22. How they are implemented in real situations is an interesting and pressing question for future research in cooperation with neuroscientists (Event-Related Potentials) • Important suprasegmental articulatory prosodies are • nasalization • glottalization • labialization, labiodentalization • palatalization, velarization, pharyngealization

  23. z o¢_Œ d « s m a x= « n0 <soller ><das ><machen > /zl eùŒ das max«n/

  24. z )Œ)n à s m a x= « n0 <sollenwir ><das ><machen > /zl«n viùŒ das max«n/

  25. z )0 Œ0)d¢ à s m8 a Ä n0 <solltenwir><das > <machen > /zlt«n viùŒ das max«n/

  26. The fact that no role is attributed to phonemes and canonical forms in speech recognition does not mean that they are useless concepts. • The relevance of the phoneme concept in devising economical alphabetic writing systems has already been referred to. • The concept of canonical forms is useful in compiling pronunciation dictionaries listing variants under a lexical heading. • It is also useful for training automatic speech recognisers.

  27. But neither concept should be extrapolated beyond these specific domains of application without special justification. • They are both inappropriate in (semi)automatic segmentation of acoustic databases for phonetic research, because they cannot capture articulatory prosodies, which are essential in speech production and perception. • The Munich Automatic Segmentation System (MAUS) fails to provide annotation files that are usable for such a research goal. • At present there is no adequate shortcut to manual phonetic annotation by competent phoneticians.

  28. The concept of articulatory prosodies was integrated into the annotation of the Kiel Corpus of Read and Spontaneous Speech n u: -MA n-+ &0 v- O- l- @- n+ &0 -MA v- i:6-6+ &0 m a: l-+ &1^ g-k -h 'U k @- n-N , &0 Q- -q O -MA p-m+ &2. &2^ m 'I t v O x &1. &2^ f r 'aI &0 Q- I s t-+ .

  29. Several publications: • K.J. Kohler, Articulatory prosodies in German reduced speech, ICPhS 1999 • Complementary Phonology – A theoretical frame for labelling an acoustic database of dialogues, ICSLP 1994 • O. Niebuhr, K.J. Kohler, Perception of phonetic detail in the identification of highly reduced words, JP 2011 • K.J. Kohler, O. Niebuhr, On the role of articulatory prosodies in German message decoding, Phonetica 2011

  30. Phonemes and canonical forms are also inappropriate for gaining insight into speech and language acquisition, be it L1 or L2 • although they have provided the standard paradigm • e.g. the Contrastive Structures Series, ed. by Charles Ferguson • but MacNeilage, P. The Origin of Speech. 2008; Frame and Content theory. Piske, T. Artikulatorische Muster im frühen Laut- und Lexikonerwerb. Tübingen: Gunter Narr (2001)

  31. 4 From Auditory Observation to Signal Analysis • The technological advance in speech signal analysis, the spectrograph to start with, and latterly computer programs, • inevitably led to taking the phoneme concept into the lab • in order to substantiate phonological entities and structures by objective measurement • thus to supplement auditory impressions by testable physical properties • finally to replace auditory observation altogether

  32. This development has culminated in Laboratory Phonology and has publication platforms in Journal of Phonetics, Laboratory Phonology • useless questions are asked and badly answered • e.g. Incomplete Neutralization of voicing in German final obstruents: rund(e) vs. bunt(e) • the latest analysis is Röttger, Winter, Grawunder, The robustness of incomplete neutralization in German, ICPhS 2011

  33. in production a difference of 8 ms was found in vowel duration before voiced/voiceless plosives • below JND, thus has no communicative value • in the subsequent perception experiment 8 subjects classified 54% of the /ptkbdg/ stimuli as voiceless, 46% as voiced • logistic regression and t tests gave significant differences between voiceless and voiced classification across all stimuli

  34. however, the distribution of voiceless and voiced judgements across /ptk/ and /bdg/ separately, i.e. hits, misses and false alarms, was not tested, and the frequencies are not given • but they can be estimated from other indices as • 56% voiceless and 44% voiced for /ptk/ • 52% voiced and 48% voiceless for /bdg/ • chi2 testing gives no significance for an association of /ptk/ or /bdg/ stimuli with voiceless or voiced judgements, nor significant deviation from equal distribution for /bdg/
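The chi-squared checks mentioned on this slide can be sketched as follows. This is only an illustration of the computation, not a reproduction of the talk's analysis: the slides give percentages but no counts, so the figure of 100 judgements per stimulus class (`N_PER_CLASS`) is a purely hypothetical assumption, and the function names are invented for this sketch.

```python
def chi2_independence_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_equal_split(x, y):
    """Goodness-of-fit statistic for two counts against a 50:50 split."""
    expected = (x + y) / 2
    return ((x - expected) ** 2 + (y - expected) ** 2) / expected


# Critical value of the chi-squared distribution, df = 1, alpha = 0.05.
CRITICAL_DF1_05 = 3.841

# HYPOTHETICAL sample size: the slides report percentages only.
N_PER_CLASS = 100

# (voiceless, voiced) judgement counts derived from the estimated percentages.
ptk = (56 * N_PER_CLASS // 100, 44 * N_PER_CLASS // 100)
bdg = (48 * N_PER_CLASS // 100, 52 * N_PER_CLASS // 100)

association = chi2_independence_2x2([ptk, bdg])
bdg_split = chi2_equal_split(*bdg)

print(f"association /ptk/ vs /bdg/: chi2 = {association:.3f}")
print(f"/bdg/ against 50:50:        chi2 = {bdg_split:.3f}")
print(f"critical value (df=1, 5%):  {CRITICAL_DF1_05}")
```

Under the assumed 100 judgements per class, both statistics (about 1.28 and 0.16) stay below the 5% critical value of 3.841. For fixed percentages the statistic grows in proportion to the number of judgements, so the real counts would be needed to check the slide's conclusion.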

  35. So, the judgements are random • and therefore neither the results of production nor of perception have any communicative value • and the robustness in the title is a phantom. • We can well do without such l’art pour l’art experimentation, which abounds in Laboratory Phonology. • This is time, effort and public money badly spent. • It does not advance our knowledge of how people communicate one bit. • Sense has to be reintroduced into measurement

  36. 5 From Sound to Sense • The origin of speech technology after World War II had of course the communicative component incorporated • communications engineering, technological development to improve communication • Speech Communications Conference at MIT 1950 • Menzerath and Meyer-Eppler invited • >Institut für Phonetik u. Kommunikationsforschung • Research Laboratory of Electronics, Speech Communication Group, MIT

  37. Speech Communication Seminar, Stockholm 1974 • From Sound to Sense: 50+ years of discoveries in speech communication, MIT 2004 • invited paper by Sarah Hawkins: Puzzles and patterns in 50 years of research on speech perception

  38. “It seems reasonable to hope that new theories will aim to include the following attributes. They should be biologically plausible; include roles for attention, memory, and learning; focus on understanding meaning rather than identifying phonological form; allow for multiple potential ‘units of perception’, possibly with no obligatory units; and they should allow meaning and linguistic structure to be understood from incomplete information.”

  39. “A … key issue is to re-evaluate the distinction between bottom-up and top-down information. On the one hand, fine phonetic information that systematically indicates linguistic structure should make many model ‘top-down processes’ unnecessary. For example, fine allophonic detail can provide segmentation information that makes top-down use of abstract knowledge about possible word constraints redundant.

  40. On the other hand, such fine phonetic detail cannot be used in the absence of top-down knowledge about how it should be used —for this language, this accent, this speaker. The traditional distinction between signal and knowledge is thus likely to be blurred in future models. This seems entirely consistent with current understanding of brain functioning.”

  41. This is the theoretical background, including the name, for the EC Marie Curie RTN. • There is a strong influence from Firthian linguistics. • This embedding of sound into sense in speech communication was, and is again, the research and teaching strategy of Phonetics in Kiel • and it naturally led to the integration of prosody in the study of sounds and their phrasal variability • thus looking at the exchange of meaning between speakers and listeners with the full array of phonetic form and substance.

  42. 6 From Sense to Sound • But we also need to include the complement • Jakobson, Fant, Halle, Preliminaries to speech analysis, 1952 “given the evident fact that we speak to be heard to be understood” • Speakers transmit meaning • by coding it in words and syntactic structures with fine phonetic detail of segments and prosodies • generating acoustic signals for listeners to decode

  43. We need to answer two questions: • How is the phonetic form of words represented mentally to trigger physiological and articulatory processes for acoustic sound production? • What are the rules for producing reduced or elaborated phonetic forms? • A global answer to the first question is that the representation can certainly not be canonical phonemic form

  44. essential phonetic elements that define the whole formal set of a lexical item will need to be specified (Niebuhr’s phonetic essence) • this specification must include segmental units as well as articulatory prosodies • both are related to lexical, morphological and speech style categories • which allow for phonetic under-specification

  45. e.g. the ending of infinitives and 1st, 3rd persons plural of the German verb can be specified as [nasal] • the presence of a preceding vowel depends on a reduction-elaboration coefficient related to speaking style and speaking situation, > [«] > [E] • the realization of the nasal as [m n N] depends on the preceding vocalic or consonantal stretch • as in the spontaneous-speech example discussed earlier, the nasality feature may be realised as an articulatory prosody on the preceding vocalic stretch instead of a nasal consonant, when the reduction coefficient increases in more casual style
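One way to picture the under-specified [nasal] ending described on this slide is as a small rule function. The sketch below is an illustration only: the numeric thresholds on the reduction coefficient, the segment inventories, and the name `realize_ending` are all assumptions made for this example and are not taken from the talk. The place assimilation to a preceding labial or velar is the familiar German pattern (e.g. a labial nasal after [b], a velar nasal after [k]).

```python
# Illustrative segment classes (assumed, deliberately incomplete).
LABIALS = {"p", "b", "m"}
VELARS = {"k", "g", "ŋ"}
VOWELS = set("aeiouəɛɪʊɔ")

NASALIZATION = "\u0303"  # combining tilde: nasality carried by the preceding vowel


def realize_ending(stem_final: str, reduction: float) -> str:
    """Return one surface form of the [nasal] verb ending.

    stem_final: last segment of the stem.
    reduction:  0.0 (most elaborated) to 1.0 (most reduced);
                the cut-off values below are hypothetical.
    """
    if reduction < 0.2:
        return "ɛn"  # most elaborated: full vowel + nasal
    if reduction < 0.5:
        return "ən"  # intermediate: schwa + nasal
    if stem_final in VOWELS:
        # Vocalic stretch: in very casual style nasality becomes an
        # articulatory prosody on the vowel instead of a nasal consonant.
        return NASALIZATION if reduction >= 0.8 else "n"
    if stem_final in LABIALS:
        return "m"
    if stem_final in VELARS:
        return "ŋ"
    return "n"


for final, r in [("k", 0.1), ("k", 0.3), ("k", 0.6), ("b", 0.6), ("t", 0.6)]:
    print(final, r, realize_ending(final, r))
```

The point of the sketch is that a single lexical specification, [nasal], plus a style-dependent coefficient and the local phonetic context, is enough to generate the whole set of surface variants, without a canonical phonemic form being stored.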

  46. The answer to the second question goes well beyond descriptive accounts of large databases (e.g. Kohler, Articulatory dynamics of vowels and consonants in speech communication, JIPA 2001) • it needs to include the coupling of reduction/elaboration with lexical class, morphology, syntax and speaking style, closely linked to the answer to the first question • e.g. the German sequence of preposition + definite article masc. mit dem has two sets of realizations

  47. I. containing the deictic marker [d], as in local and temporal pointers da, dort, dann and demonstrative pronouns dieser, der (da): [mI(t) de(ù)m] [mId«m] • II. not containing [d]: [mIpm] [mI(b)m] [mIm¼m] • II. is appropriate in phrases with generic reference, e.g. means of transport: mit dem Auto, mit dem Bus, mit dem Zug, mit dem Flugzeug “by car, bus, train, plane” • I. has a specific reference, e.g. ich fahr mit dem Auto, und zwar mit dem BMW meiner Frau “I go by car, and I take my wife’s BMW”

  48. These two sets need to have separate mental representations, because they have different functions in the transmission of meaning • both representations must contain [mI__ m] • for I. the deictic marker is inserted with variable vocalic release according to the situationally determined reduction coefficient • for II. bilabial plosive interruption of sonority is possible with any phonation feature. • Thus mental lexical representation is multivalued.

  49. You might call this proposition speculative • but it is no more speculative than the assumption of underlying canonical forms in the mental lexicon as a basis for 20 years of MPI Nijmegen perception research • we simply need to develop the adequate new experimentation to find answers for it • which means for researchers to give up cherished postulates and procedures to move in new directions • the Sense-to-Sound approach will make it possible.
