Investigating visual prosody using articulography
Johan Frid, Malin Svensson Lundmark, Gilbert Ambrazaitis (Linné), Susanne Schötz and David House (KTH)
DHN 2019, Mar 6, 2019
Self check: are we DHN?
• From the Call:
Background: EMA (ElectroMagnetic Articulography)
• Measurement + recording of movements in 3D
• EM field and sensors attached to tongue, jaw, lips, head
• Up to 16 sensors simultaneously
• Sampling frequency 1250 Hz (AG501) / 200 Hz (AG500)
• Captures quick movements
• LU Humanities Lab
2D view of movements during speech through time (head and articulators)
[Figure labels: Nose, Ear, Lips (2 sensors), Tongue (3 sensors), Jaw]
Earlier project: MUMOP/Swe-Clarin
• Newsreaders
• Audiovisual recordings
• Ambrazaitis & House (2017)
  • Head movements occur more often in the second part of a news event
  • To some extent dependent on information structure
  • Initial clause is the theme of the news
• Frid, Ambrazaitis, Svensson-Lundmark & House (2017)
  • Machine-learning-based detection of head movements
Head movements using analysis of video recordings
• Face detection using OpenCV
• Position of face determined in each frame
• Black square is the detected face, white dot reflects head movements
• But also general body movements
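A minimal sketch of how a per-frame face position could be turned into a movement track. It assumes face boxes arrive as (x, y, w, h) rectangles in pixels, which is the form OpenCV's cascade detectors return; the function names and the example boxes are hypothetical and not taken from the project.

```python
from math import hypot

def box_centers(boxes):
    """Return the (x, y) center of each (x, y, w, h) face box."""
    return [(x + w / 2.0, y + h / 2.0) for (x, y, w, h) in boxes]

def frame_displacements(centers):
    """Distance in pixels moved by the box center between consecutive frames."""
    return [hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(centers, centers[1:])]

# Hypothetical detections for four video frames (pixels).
boxes = [(100, 80, 60, 60), (100, 83, 60, 60), (100, 87, 60, 60), (104, 87, 60, 60)]
centers = box_centers(boxes)
moves = frame_displacements(centers)
```

A track like this cannot separate head movement from movement of the whole body, which matches the last bullet above.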
Current project: PROGEST
• The production of prosodic prominence: integrating bodily and articulatory gestures
• House, VR 2017-02140
• Multimodal prominence
• Interplay between
  • Verbal prosody (rhythm, intonation, intensity)
  • Visual prosody (gestures, head and face movements)
Use EMA for head movements?
• Compared to video
  • Movement data in 3D
  • Better time resolution: EMA has 1250/200 measurements/s
  • Better precision: direct registration instead of post-processing
  • Better sync between sound and movement
  • Can also measure tongue movements
• Disadvantages
  • has to be done on-line
  • rather complicated procedure
• 2 studies on head movement and its interplay with prosody
  • reuse of old ('found') data
Angle of head nod
[Figure labels: Nose, Ear, Lips (2 sensors), Tongue (3 sensors), Jaw]
Angle + how it changes (velocity)
[Figure panels: velocity, angle, waveform]
Interpretation:
• positive mean velocity: head is tilting upwards
• negative mean velocity: head is tilting downwards
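The slides do not give the exact formula for the nod angle. As a rough sketch, one could take the angle of the line from the ear sensor to the nose sensor in the sagittal plane and differentiate it over time. The angle definition, the coordinate convention, and all names and sample values below are assumptions made for illustration.

```python
from math import atan2

def nod_angle(nose, ear):
    """Angle (rad) of the ear-to-nose line in the sagittal plane.

    Points are (front_back, up_down) coordinates; the angle grows
    as the nose moves up relative to the ear.
    """
    return atan2(nose[1] - ear[1], nose[0] - ear[0])

def angular_velocity(angles, sample_rate_hz):
    """First difference of the angle track, scaled to rad/s."""
    return [(a1 - a0) * sample_rate_hz for a0, a1 in zip(angles, angles[1:])]

# Hypothetical samples at 200 Hz: ear fixed, nose rising by 1 mm per sample.
ear = (0.0, 0.0)
nose_track = [(100.0, 0.0), (100.0, 1.0), (100.0, 2.0)]
angles = [nod_angle(n, ear) for n in nose_track]
velocity = angular_velocity(angles, 200.0)
```

With the nose rising, every velocity value is positive, in line with the interpretation above. A real track would probably need smoothing before differencing, since a first difference amplifies sensor noise.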
Study 1: material from VOKART (Schötz)
• Dialectal variation, mainly in vowel articulation
• 12 sensors
• 29 speakers
  • 9 Stockholm, 10 Gothenburg, 10 Malmö
  • X men, Y women
  • Ages 20-63
• 3-4 reps of two read sentences
Data
• 3-4 readings of two sentences
  • 1) Mobiltelefonen är nittiotalets stora fluga, både bland företagare och privatpersoner. "The mobile phone is the big hit of the nineties, both among business people and private persons."
  • 2) Flyget, tåget och bilbranschen tävlar om lönsamhet och folkets gunst. "Airlines, train companies and the automobile industry are competing for profitability and people's appreciation."
• Possible prosodic boundaries
  • S1: Possible phrase boundary after fluga
  • S2: starts with list intonation
• Can we see any head movements associated with these?
• Exclusions because of bad or no sound, incomplete sentences
• 86 examples of 1), 80 examples of 2)
• Semi-automatic segmentation into words by means of forced alignment in Praat
Sentence 1: mean angular velocity per word, all speakers + repetitions (n=86)
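A per-word mean of this kind could be computed roughly as follows, assuming an angular velocity track sampled at a fixed rate and word boundaries in seconds from the forced alignment. The function name, the 10 Hz rate, and all values are hypothetical and are not the study's data.

```python
def mean_velocity_per_word(velocity, sample_rate_hz, words):
    """Average the velocity samples that fall inside each word interval.

    words: list of (label, start_s, end_s); labels are assumed unique.
    Returns {label: mean angular velocity in rad/s}.
    """
    result = {}
    for label, start_s, end_s in words:
        first = int(round(start_s * sample_rate_hz))
        last = int(round(end_s * sample_rate_hz))
        segment = velocity[first:last]
        if segment:  # skip words with no samples
            result[label] = sum(segment) / len(segment)
    return result

# Hypothetical velocity track (rad/s) at 10 Hz and hypothetical word boundaries.
velocity = [-0.1, -0.1, -0.3, -0.3, 0.2, 0.2, 0.4, 0.4]
words = [("stora", 0.0, 0.4), ("fluga", 0.4, 0.8)]
mav = mean_velocity_per_word(velocity, 10.0, words)
```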
Linear mixed effects
• Response: mav (mean angular velocity); fixed effect: word; random effect: speaker; likelihood ratio tests
• fluga – både
  • word affected mav (χ²(1) = 8.5201, p = 0.003512), lowering it by about 0.077 rad/s ± 0.017 (standard errors)
• stora – fluga
  • word affected mav (χ²(1) = 8.4946, p = 0.003562), increasing it by about 0.077 rad/s ± 0.017 (standard errors)
• Mobiltelefonen – nittiotalets
  • word affected mav (χ²(1) = 5.8811, p = 0.0153), lowering it by about 0.043 rad/s ± 0.012 (standard errors)
• flyget – tåget
  • word affected mav (χ²(1) = 3.913, p = 0.04792), lowering it by about 0.043 rad/s ± 0.017 (standard errors)
• tävlar – om
  • word affected mav (χ²(1) = 4.3803, p = 0.03636), lowering it by about 0.032 rad/s ± 0.012 (standard errors)
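For readers who want to see how the p-values relate to the test statistics: with one degree of freedom, the upper tail of the chi-square distribution can be written with the complementary error function, p = erfc(sqrt(χ²/2)). The sketch below only checks reported numbers against that identity; it does not fit the mixed models, and the function name is hypothetical.

```python
from math import erfc, sqrt

def lrt_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi_square / 2.0))

# Two of the statistics reported above (fluga/både and flyget/tåget).
chi_squares = [8.5201, 3.913]
p_values = [lrt_p_value_df1(x) for x in chi_squares]
```

The two results come out at roughly 0.0035 and 0.048, which agrees with the reported values.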
Study 2: PhD project by Malin Svensson Lundmark
• 18 speakers, South Swedish dialect, ages 23-75
• 8 target words
  • Varied by word accent and vowel length
  • Embedded in QA pairs like
    • Where did grandpa leave mom? Grandpa left mom with the doctor.
    • (in order to avoid 'big' accent on target word)
• 8 reps/word, misreadings etc. removed, in total 1092 tokens
(work in progress…)
• Exploratory: Can we see any head movement patterns in these?
• GAM analysis (Generalized Additive Modeling)
  • non-linear regression method
  • identify general patterns over dynamically varying data
  • See Wieling 2018
Conclusions
• Tendencies for
  • upward movement before phrase boundary
  • downward movement after phrase boundary
  • nodding pattern synced to vowel
  • differences depending on word accent and vowel length