An empirical study analyzing verbal, prosodic, gesture, gaze, and other channels in multichannel communication using the Russian Pear Chats and Stories corpus. Explore vocal, gestural, and oculomotor aspects to understand interpersonal interaction.
An empirical study of multichannel communication • Andrej A. Kibrik, Olga V. Fedorova, Alla O. Litvinenko, Julia V. Nikolaeva • Institute of Linguistics RAS and Lomonosov MSU • aakibrik@gmail.com • ESLP-2017, Moscow, 2017-09-12 • Project #14-18-03819
Situated language use • When we communicate naturally, we not only produce chains of words, but also • intonate • gesticulate • assume various postures • interact with eye gaze • etc. • These processes are traditionally studied by different academic disciplines • But the actual communication process is whole and undivided • Hence multimodal approach • Gibbon et al. eds. 2000, Kress 2002, Granström et al. eds. 2002, Scollon 2006, Kibrik 2010, Knight 2011, Adolphs & Carter 2013, Müller et al. eds. 2014 …
Citizen Minin and Prince Pozharsky (image labels: gaze, gesture, talk, posture, touch)
Discourse • Vocal / auditory modality • Kinetic / visual modality • Other modalities • Verbal channel • Prosodic channel • Gaze channel • Gesture channel • Proxemics channel • Other channels • Intonation • Numerous other components • Facial gestures • Cephalic gestures • Manual gestures • Torso gestures • Other gestures • Spoken multimodal (multichannel) discourse
Current stage: Russian Pear Chats and Stories • Character of interaction: structured vs. unstructured • Character of environment: prepared vs. unprepared • Idea behind this stage: sharpen tools for the subsequent stage of free conversation
The Pear Film (Chafe 1980) Russian Pear Chats and Stories
Outline 1. The Russian Pear Chats and Stories corpus • Design • Technical solutions • Annotation • Vocal • Gestural • Oculomotor 2. Some avenues of research
Design • Participants: Narrator, Commentator, Reteller, Listener • The Narrator and the Commentator watch the film • The Narrator tells the Reteller about the film • The Commentator adds details, all three discuss the film • The Reteller tells about the film to the Listener, who has just joined the group • The Listener writes down the contents of the film • Stages: First telling, Conversation, Retelling, 2nd retelling (written)
Audio recording • Six channels ZOOM H6 Handy Recorder • 96 kHz / 24 bit • Each participant recorded with a lapel SONY ECM-88B mic, mono • Inbuilt mic records all vocal events, stereo • Automatic synchronization of all audio files
Video recording: Cover shot • GoPro Hero 4 (wide angle) • Frame rate: 50 FPS • Resolution: 2700×1500
Video recording: Individual frontal cameras • Industrial high-speed cameras JAI GO-5000M-USB • Frame rate: 100 FPS • Crucial for analysis of kinetic behavior • File format: mjpeg • No interframe compression • Resolution: 1392×1000 • No audio
Eye trackers • Tobii Glasses II Eye Tracker • Sampling rate: 50 Hz • Video recording of the scene • Resolution: 1920×1080 • 25 FPS • Superimposed eye movements • Software that produces temporal coordinates of fixations and saccades
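The recording streams run at different rates (100 FPS frontal cameras, 50 Hz gaze samples, 25 FPS scene video), so relating a gaze sample to a video frame is a matter of simple arithmetic. The following is a minimal sketch of that arithmetic, not the project's actual tooling: the function names are hypothetical, and it assumes all streams share a common zero point and do not drift.

```python
def gaze_sample_time_ms(sample_index, rate_hz=50):
    """Time of a gaze sample in ms, assuming evenly spaced samples from t = 0."""
    return sample_index * 1000 // rate_hz


def frame_index(t_ms, fps):
    """Index of the video frame on screen at time t_ms (integer milliseconds)."""
    return t_ms * fps // 1000


# Hypothetical example: gaze sample no. 125 from the 50 Hz eye tracker.
t = gaze_sample_time_ms(125)            # time of the sample in ms
frontal_frame = frame_index(t, 100)     # frame in a 100 FPS frontal camera file
scene_frame = frame_index(t, 25)        # frame in the 25 FPS scene video
```

At these rates, each gaze sample spans two frontal-camera frames, and each scene-video frame spans two gaze samples.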
The scene (image labels: Tobii glasses; Listener, Narrator, Reteller, Commentator)
Russian Pear Chats and Stories: Quantitative parameters • 40 sessions recorded in 2015 and in 2017 • 160 participants in all • 18 to 36 yrs old • Gender: 50 men and 110 women • Education: persons with higher education, students • 15 hours • About 160 K words • Open resource (informed consent)
Corpus web site (mostly in Russian) • http://multidiscourse.ru/annotation/ • As of now, three sessions (“recordings”) uploaded • Media files • Audio • Video • Eye trackers • Annotations • Vocal • Gestural • Manual • Cephalic • Oculomotor
Vocal transcription (see www.spokencorpora.ru) • Verbal structure • Division into elementary discourse units (EDUs) • Quanta of talk (Ščerba 1955, Cruttenden 1986, Chafe 1994) • Elementary behavioral acts of discourse production • Identified on the basis of prosodic criteria: tempo, pausing, etc. • Temporal dynamics • Pauses • Accents • Tone in accents • Illocutionary characteristics • Phase • Tempo • Emphasis • Reduction • Tonal register • Disfluencies • Comments on specific EDUs • Etc., etc.
Annotation of manual gestures (ELAN) • Movements (separately for each hand) • Self-adaptors • Gestures as such • Gestures (= Kendon’s gesture phrases) • Handedness • Gesture phases • Gesture chains • Movement chains • Tags • Multi-stroke • Overlaps
Oculomotor annotation (Olga Fedorova) • Fixations on: • Interlocutor: face, hands, torso, other • Surroundings • Saccades • Durations
Multilayer annotation • Go to ELAN
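ELAN stores its annotation layers as tiers in an XML file (.eaf), with annotations pointing to shared time slots. As a rough illustration of how such a multilayer file can be read programmatically, here is a minimal sketch using only the Python standard library. The tier name "LeftHand-Stroke" and the sample document are invented for illustration and are not the corpus's actual tier inventory.

```python
import xml.etree.ElementTree as ET

# Invented miniature EAF document; real corpus files have many more tiers.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1850"/>
  </TIME_ORDER>
  <TIER TIER_ID="LeftHand-Stroke" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>stroke</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tier(eaf_xml, tier_id):
    """Return (start_ms, end_ms, value) for each time-aligned annotation on a tier."""
    root = ET.fromstring(eaf_xml.strip())
    # Map time slot ids to millisecond values; slots without a value are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    result = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # annotation refers to an unaligned time slot
            result.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
    return result


strokes = read_tier(SAMPLE_EAF, "LeftHand-Stroke")
```

Reading each tier into a list of timed intervals makes it straightforward to compare layers, for example gesture intervals against EDU intervals.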
Automatic detection of motion (Mikhail Buryakov) • The algorithm compares pairs of consecutive frames of the video file (100 FPS) • Pixels are compared in brightness • Two pixels are considered different if the difference in brightness exceeds a threshold X • If the share of different pixels exceeds a threshold Y, the frame pair is interpreted as movement • Visualization • Movements are written into ELAN annotation
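The frame-differencing procedure above can be sketched as follows. This is an illustrative reconstruction from the slide, not the project's own code: it assumes grayscale frames given as NumPy arrays of 8-bit brightness values, and the function names, the merging of flagged frame pairs into intervals, and the example threshold values are all assumptions.

```python
import numpy as np


def changed_share(prev, curr, x_threshold):
    """Share of pixels whose brightness differs by more than x_threshold (X)."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return float(np.mean(diff > x_threshold))


def detect_motion(frames, x_threshold, y_threshold):
    """One boolean per pair of consecutive frames: True means movement."""
    return [
        changed_share(prev, curr, x_threshold) > y_threshold
        for prev, curr in zip(frames, frames[1:])
    ]


def motion_intervals(flags, fps=100):
    """Merge runs of True flags into (start_ms, end_ms) intervals."""
    intervals = []
    start = None
    for i, moving in enumerate(flags):
        if moving and start is None:
            start = i
        elif not moving and start is not None:
            intervals.append((start * 1000 // fps, i * 1000 // fps))
            start = None
    if start is not None:
        intervals.append((start * 1000 // fps, len(flags) * 1000 // fps))
    return intervals


# Tiny synthetic example: half of the pixels brighten between frames 1 and 2.
still = np.zeros((4, 4), dtype=np.uint8)
moved = still.copy()
moved[:2, :] = 200
frames = [still, still, moved, moved]

flags = detect_motion(frames, x_threshold=30, y_threshold=0.1)
intervals = motion_intervals(flags, fps=100)
```

The resulting millisecond intervals are the kind of output that could then be written to an annotation tier. The values of X and Y would need tuning per recording, since lighting and sensor noise affect brightness differences.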
Some avenues of research • Many studies address the relationships between “words” and “gestures”. How does prosody fit into this picture? • In kinetic behavior, different body parts are relatively independent. Should we postulate not only gestures, but also postures in different kinetic components (torso, head, hands, face)? • Gestures vary strongly depending on individual differences. How can we account for this variation?
Prosody in the theory of language/communication • Many specific similarities between prosodic and gestural phenomena (tempo, acceleration/deceleration, intensity, accents/beats, emphasis on most prominent semantic elements…) • High degree of coordination between units belonging to different channels: manual gestures and EDUs • 50% of whole gestures fit with the HM of 0.65 (Fedorova et al. 2016) • Prosody as an interface between the verbal and gestural channels • Relating the verbal and gestural components is much more fruitful if prosody is included
Individual variation and the portrait methodology • Our prior studies (Kibrik 2009): prosodic portraits as essential for assessing specific phenomena in specific speakers • E.g. the “period intonation” can only be identified if one knows a speaker’s bottom of the F0 range • Gesticulation portrait: • (Dis)inclination to stillness • Particular self-adaptors • Amplitude of gestures • Oculomotor portrait: • Mean durations of fixations • Total durations of fixations • on interlocutors’ hands: 1.9% 5.2% 11.1% • Distributions • Targets • Stages
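The oculomotor portrait measures listed above (mean and total fixation durations, and their distribution over targets) can be computed from a list of annotated fixations. Below is a minimal sketch under stated assumptions: the input format (target label, start and end in milliseconds), the function name, and the sample data are hypothetical and do not come from the corpus.

```python
from collections import defaultdict


def oculomotor_portrait(fixations):
    """Summarize fixations given as (target, start_ms, end_ms) tuples."""
    if not fixations:
        raise ValueError("at least one fixation is required")
    per_target = defaultdict(int)
    for target, start, end in fixations:
        per_target[target] += end - start
    total = sum(per_target.values())
    return {
        "mean_duration_ms": total / len(fixations),
        "total_duration_ms": total,
        # Share of total fixation time spent on each target.
        "share_by_target": {t: d / total for t, d in per_target.items()},
    }


# Invented data for one participant.
sample = [("face", 0, 300), ("hands", 300, 400), ("face", 400, 1000)]
portrait = oculomotor_portrait(sample)
```

Computing the same summary per participant gives comparable figures across speakers, such as the share of fixation time directed at an interlocutor's hands.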
Conclusions • The corpus • Natural communication in a group of four participants • Clear communicative intention in each participant • Structured interaction and prepared environment as a step towards documenting completely unrestricted communication • Capturing what actually happens: • High quality video: 100 FPS, mjpeg format • Eye tracking • In-depth annotation of talk, gesture, and eye gaze • All this is a necessary prerequisite for a systematic study of multichannel communication