Annotation of speech from the phonetics/phonology perspective
Bettina Braun & Jürgen Trouvain
Fachrichtung 4.7, Institut für Phonetik
15.02.2002

Manipulating text vs. speech [1]
• text file manipulation: "vowel-only" version
• remove all consonant letters and replace each with a space, so that only the vowels are left

e ea e o e a o o o o : a e ou y i e o i i a e u y e i e a e oo .

Annotation of speech

Manipulating text vs. speech [2]
• text file manipulation: "consonants-only" version
• remove all vowel letters and replace each with a space, so that only the consonants are left

Th w th r f r c st f r t m rr w: r th r cl d n th m rn ng w th f w s nn sp lls n th ft rn n.

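The two text manipulations can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own script: the function name `keep_only` is hypothetical, and treating "y" as a vowel letter is an assumption read off the slide output (it survives in "cloudy" and "sunny" in the vowel-only version).

```python
VOWEL_LETTERS = set("aeiouyAEIOUY")  # assumption: "y" counts as a vowel letter


def keep_only(text: str, vowels: bool) -> str:
    """Blank out consonant letters (vowels=True) or vowel letters (vowels=False).

    Every removed letter becomes exactly one space, so the output has the
    same length as the input. Non-letters (spaces, punctuation) are kept.
    """
    out = []
    for ch in text:
        is_vowel = ch in VOWEL_LETTERS
        if not ch.isalpha() or is_vowel == vowels:
            out.append(ch)
        else:
            out.append(" ")
    return "".join(out)


sentence = "The weather forecast for tomorrow"
vowel_version = keep_only(sentence, vowels=True)
consonant_version = keep_only(sentence, vowels=False)
print(vowel_version)
print(consonant_version)
```

Because each removed letter is replaced by a space, the positions of the remaining letters are preserved. That is the textual counterpart of replacing speech segments with silence instead of cutting them out.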
Manipulating text vs. speech [3]
• The weather forecast for tomorrow: rather cloudy in the morning with a few sunny spells in the afternoon.
• speech file manipulation
  • original recording, not manipulated
  • "consonants-only" version: vowel segments replaced with silence
  • "vowels-only" version: consonant segments replaced with silence

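The speech file manipulation presupposes a segment annotation: only labelled segment boundaries tell us which samples to silence. The following sketch assumes a signal held as a list of samples and a list of `(start, end, class)` segments in seconds. The function name, the class labels "C" and "V", and the toy data are all hypothetical.

```python
def silence_segments(samples, segments, rate, silence_class):
    """Return a copy of `samples` with every segment of `silence_class` zeroed.

    samples:  sequence of sample values
    segments: list of (start_s, end_s, segment_class), times in seconds
    rate:     sampling rate in Hz
    """
    out = list(samples)
    for start_s, end_s, seg_class in segments:
        if seg_class != silence_class:
            continue
        first = max(0, round(start_s * rate))
        last = min(len(out), round(end_s * rate))
        for i in range(first, last):
            out[i] = 0  # silence: same duration, zero amplitude
    return out


# Toy signal: 10 samples at 10 Hz, labelled consonant - vowel - consonant.
samples = [3, 4, 5, 9, 8, 9, 7, 2, 1, 2]
segments = [(0.0, 0.3, "C"), (0.3, 0.7, "V"), (0.7, 1.0, "C")]

consonants_only = silence_segments(samples, segments, rate=10, silence_class="V")
vowels_only = silence_segments(samples, segments, rate=10, silence_class="C")
print(consonants_only)
print(vowels_only)
```

The silenced stretches keep their original duration, so the timing of the remaining segments is unchanged.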
Coarticulation
• articulating means:
  • articulators are in motion, not in a fixed position
  • articulators move continuously, not discretely
  • articulatory movements overlap in time

original | vowels only | vowels only without silences

Timing
• information in consonant durations: silence is more than nothing

Speech melody
• information about fundamental frequency (F0) in the voiced vowel segments
  • with F0 variation
  • without any F0 variation (monotonous)

Annotation of sound segments: discreteness in mind & in physics
• "Es ist 8 Uhr morgens." ("It is 8 o'clock in the morning.")
• "morgens" on three levels:
  graphemes  m o r g e n s
  phonemes   m O r g @ n s
  phones     m O6 N s

Annotation of sound segments: discrete units?
• "Die Nacht haben Maiers gut geschlafen." ("The Maiers slept well that night.")
• "…………… haben Maier ……………………."
• phonemic: h a: b @ n m aI @ r s
• acoustic-phonetic: h a: b m aI 6 s
• articulatory-phonetic: h a: b n m aI 6 s (possibly)

Segmentation of sound segments: degree of discreteness
• "Wer möchte noch Milch?" ("Who would like some more milk?")
• clear segmentation:
  • closure and closure release of [t] in "möch t e"
• unclear segmentation:
  • [I l] in "M il ch"

Kiel Corpus: read & spontaneous speech
• orthography
• phonemic (canonical) form
• realised form
• word & sentence boundary
• manually labelled

From sounds to syllables: how many syllables?
• semi-vowels: syllabic or not?
  • Studie: Stu - di - e vs. Stu - die
  • Piano: Pi - a - no vs. Pia - no
• size of auditory window
  • "… mit mir diese Dienstreise zu unternehmen, …" ("… to go on this business trip with me, …")
  • rei - se - zu - un - ter
  • zu - un - ter
  • zu - un

From sounds to syllables: where is the syllable boundary?
• ambisyllabic consonants & onset principles
  • Mitte: /m I - t @/ vs. /m I _t @/
  • Adler: /a: t - l @ r/ vs. /a: - d l @ r/
  • Fenster: /f E n s - t E r/ vs. /f E n - s t E r/
• resyllabification
  • "Wenn es Ihnen da 5 Tage lang irgendwo passen würde." ("If five days somewhere in there would suit you.")
  • /v E n - E s/ vs. [v E _ n E s]

Controlled elicitation of spontaneous speech
• Monologues
  • narration (Erzählung)
  • picture description (Bildbeschreibung)
• Dialogues: task-oriented data collection
  • Map Task
  • Appointment-making
• Degree of naturalness?
• Controlled elicitation

Problems for annotation: non-speech in speech
• Many non-linguistic signal portions:
  • swallowing
  • lip-smacking
  • breathing
  • unfilled, filled pauses
  • laughter
  • hesitational lengthening
• Partly overlapping with speech

Functions of prosody
• Generally: features above the segmental level (suprasegmental)

Phonetic encoding of prosody
• perceived pitch over time
• duration
• intensity
• spectral quality

Prosodic annotation: Signal oriented
• Tilt model (Taylor 2000)
• intonational "events"
• continuous parameters (tilt parameters):
  • amplitude: sum of the magnitudes of rise and fall
  • duration: sum of rise and fall durations
  • tilt: shape of the event
[Figure labels: 1.0, 0.5, 0]

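The three parameters can be computed from the rise and fall components of one intonational event. Amplitude and duration follow the slide directly. The tilt formula is not given on the slide; the sketch below uses the common statement of the Tilt model, in which tilt is the mean of an amplitude tilt and a duration tilt. Treat that as an assumption to check against Taylor (2000). The function and argument names are hypothetical.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude, duration, tilt) for one intonational event.

    a_rise, a_fall: F0 excursion of the rise and of the fall (e.g. in Hz)
    d_rise, d_fall: durations of the rise and of the fall (in seconds)
    """
    amplitude = abs(a_rise) + abs(a_fall)   # sum of the magnitudes
    duration = d_rise + d_fall              # sum of the durations

    # Assumed definition: tilt is the mean of amplitude tilt and duration tilt.
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude if amplitude else 0.0
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    tilt = 0.5 * (tilt_amp + tilt_dur)
    return amplitude, duration, tilt


pure_rise = tilt_parameters(20.0, 0.0, 0.2, 0.0)
rise_fall = tilt_parameters(10.0, -10.0, 0.1, 0.1)
pure_fall = tilt_parameters(0.0, -20.0, 0.0, 0.2)
print(pure_rise, rise_fall, pure_fall)
```

Under this definition tilt ranges from +1 (pure rise) through 0 (symmetric rise-fall) to -1 (pure fall), so one continuous value describes the shape of the event.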
Prosodic annotation: Autosegmental, phonological
• GToBI (Grice et al.)
• Tonal tier, break tier
• Two levels of pitch height (L, H)
• Simple and complex pitch accents
• Association with word stress marked by *
• Exact temporal alignment
• Boundary tones marked by %
• Strength of prosodic breaks (3, 4)

Prosodic annotation: Example
[Figure: labelled example with the tiers tonal, orth., break, misc]

GToBI label files

orthographic
46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine

tones
47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.851534 115 <
48.320061 115 !H*
48.812822 115 !H*
49.240958 115 L-%

breaks
47.555339 123 3
49.249036 123 4

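Label files of this kind are plain text and easy to read with a script. The sketch below parses lines of the form `time code label` and groups them by tier. The mapping of the middle number to a tier (113, 115, 123) is read off the three files shown here; what that number means in the underlying file format is not stated on the slide, so treat the mapping as an assumption. The function name is hypothetical.

```python
from collections import defaultdict

# A few lines copied from the three label files shown above.
LABEL_LINES = """\
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.555339 123 3
""".splitlines()

# Assumption: the middle column identifies the tier, as in the files shown.
TIER_NAMES = {"113": "orthographic", "115": "tones", "123": "breaks"}


def parse_labels(lines):
    """Group 'time code label' lines into {tier: [(time_s, label), ...]}."""
    tiers = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        time_s, code, label = parts
        tiers[TIER_NAMES.get(code, code)].append((float(time_s), label))
    for events in tiers.values():
        events.sort()  # chronological order within each tier
    return dict(tiers)


tiers = parse_labels(LABEL_LINES)
for name, events in tiers.items():
    print(name, events)
```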
Prosodic annotation: Phonological, single-layer
• KIM (Kohler 1995)
• no suprasegmental tiers => efficient analysis of segment-prosody interaction
• prosodic labels differentiated from segmental labels by special diacritics
• time marks for prosodic events anchored to word boundaries
• Example:

13 #c: 0.0007500
13 #&2 0.0007500
13 ##v: 0.0007500
13 $Q- 0.0007500
13 $E: 0.0007500
2147 $m 0.1341250
4787 #&PGn 0.2991250
4787 #&2( 0.2991250
4787 ##d 0.2991250
6243 $-h 0.3901250
6619 $'i: 0.4136250
7569 $n 0.4730000
8265 $s 0.5165000
9202 $t 0.5750625
9527 $-h 0.5953750
9995 $a: 0.6246250
10648 $k-x 0.6654375
11405 #&0 0.7127500
11405 ##v 0.7127500
12528 $Y6 0.7829375
13946 $d 0.8715625
14275 $@+ 0.8921250
14721 #&0 0.9200000
14721 ##m 0.9200000
16051 $i:6+ 1.0031250
16935 #&0 1.0583750
16935 ##g 1.0583750
18093 $-h 1.1307500
18564 $'u: 1.1601875
19314 $t 1.2070625
19981 $-h 1.2487500
20336 #&0. 1.2709375
20336 #&2) 1.2709375
20336 ##p 1.2709375
21501 $-h 1.3437500
22440 $'a 1.4024375
23700 $s 1.4811875
25408 $@- 1.5879375
25408 $n 1.5879375
28935 #, 1.8083750

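In this single-layer format, prosodic and segmental labels share one file, so a script has to tell them apart by their prefix. The sketch below makes two assumptions that are inferred from the example and not stated on the slide: labels beginning with `#&` are the prosodic ones, and the time column equals `(sample - 1) / 16000`, i.e. a 16 kHz signal with sample numbers counted from 1. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16000  # assumption: inferred from the numbers in the example

# A few lines copied from the example: sample number, label, time in seconds.
KIM_LINES = """\
13 #c: 0.0007500
13 #&2 0.0007500
13 ##v: 0.0007500
2147 $m 0.1341250
4787 #&PGn 0.2991250
4787 ##d 0.2991250
6619 $'i: 0.4136250
""".splitlines()


def parse_kim(lines):
    """Parse 'sample label time' lines into (sample, label, time_s) tuples."""
    entries = []
    for line in lines:
        sample, label, time_s = line.split()
        entries.append((int(sample), label, float(time_s)))
    return entries


def is_prosodic(label):
    """Assumption: prosodic labels are the ones prefixed with '#&'."""
    return label.startswith("#&")


entries = parse_kim(KIM_LINES)
prosodic = [e for e in entries if is_prosodic(e[1])]
others = [e for e in entries if not is_prosodic(e[1])]

# Check the assumed relation between sample number and time stamp.
consistent = all(
    abs((sample - 1) / SAMPLE_RATE_HZ - time_s) < 1e-9
    for sample, _, time_s in entries
)
print(prosodic, consistent)
```

In the example, each prosodic label carries the same sample number as the word-initial label that follows it, which matches the statement that prosodic time marks are anchored to word boundaries.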
Data structures and retrieval
• Mostly plain text files, aligned to the signal
• "Retrieval" using scripting languages
• (GToBI in EMU format)
• XML formats

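A typical retrieval task with time-aligned text files is to find the words that carry a pitch accent. The sketch below uses the word and tone labels from the GToBI example and assumes that each word's time stamp marks the end of that word; the slide does not state this, but the label times are consistent with it. The function name is hypothetical.

```python
from bisect import bisect_left

# (time_s, label) pairs taken from the GToBI label files shown earlier.
words = [
    (46.836392, "also"), (46.958899, "ich"), (47.171623, "bin"),
    (47.555335, "genau"), (48.180049, "waagerecht"), (48.468170, "rechts"),
    (48.613576, "von"), (48.726670, "der"), (49.246344, "Goldmine"),
]
tones = [
    (47.469173, "L+H*"), (47.555339, "H-"), (47.768061, "H*"),
    (47.851534, "<"), (48.320061, "!H*"), (48.812822, "!H*"),
    (49.240958, "L-%"),
]


def accented_words(words, tones):
    """Pair each pitch accent (label containing '*') with its word.

    Assumption: word time stamps are word END times, so an accent belongs
    to the first word whose end time is not earlier than the accent time.
    """
    ends = [time_s for time_s, _ in words]
    hits = []
    for time_s, tone in tones:
        if "*" not in tone:
            continue  # skip phrase accents, boundary tones and "<"
        i = bisect_left(ends, time_s)
        if i < len(words):
            hits.append((words[i][1], tone))
    return hits


result = accented_words(words, tones)
print(result)
```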
What for?
• Basic research
  • Rhythmic patterns
  • Speech rate measurements (units, domains)
  • Temporal alignment & scaling of pitch accents
  • Differentiated analysis of pitch range
• Speech technology
  • Modelling accentuation in ASR
  • Speech rate in ASR
  • Intonation and timing for synthesis

Bibliography
• Alwan, A., H. Bourlard and S. Furui (eds). 2001. Speech Communication 33. Special Issue on Speech Annotation and Corpus Tools.
• Grice, M., S. Baumann and R. Benzmüller. To appear. German ToBI. In: S. Jun (ed). Prosodic Typology.
• Grice, M. et al. 2000. Representation and annotation of dialogue. In: Handbook of Multimodal and Spoken Dialogue Systems. Resources, Terminology and Product Evaluation. Kluwer, pp. 1-101.
• Kohler, K.J. (ed). 1995. Kieler Arbeitsberichte 29.
• Taylor, P. 2000. Analysis and Synthesis of Intonation Using the Tilt Model. In: JASA 107(3), pp. 1697-1714.