Efficient user-centred access to multimedia meeting content Simon Tucker and Steve Whittaker University of Sheffield {s.tucker, s.whittaker}@shef.ac.uk
AMI Project • Meetings are a critical way in which knowledge is created and shared within organisations • Most of this knowledge is never recorded • AMI provides Multimodal Access to Multimedia Records of Meetings • 16 Partners • Follow-on project AMIDA: Real Time
Sheffield AMI Work • User Requirements • Temporal Compression of Speech • Reducing the amount of time required to listen to a meeting recording but still getting the important information. • Dynamic Visual Summarization Techniques • A number of methods for dynamically presenting summary information interactively. • Temporal Compression of Video • Audio motivated video compression.
Meeting browsers • The primary means of accessing meeting records is via a browser. • In previous work we segregated browsers into four categories according to their focus. • The focus is either the primary means of presentation or navigation that the browser used. • This segregation allowed us to get a good idea of the current browser space.
Browser Examples Audio Video Artefact Discourse
User Requirements • Two different methods can be used to collect user requirements • Practice-centric • Examination of current practices. • Collection through observation. • Technology-centric • Exposure to new technology. • Collection through user opinion.
Practice-centric AMI study • Meetings already generate a large amount of information exchange. • Personal Notes. • Minutes. • Post-meeting email discussion. • Informal meeting discussions. • Approach taken is to record (where possible) and then analyse these records. • Use this analysis to determine how meeting records are used and what problems are associated with such records.
Study details • We examined the meeting recording practices of two firms. • We studied a core team over a series of meetings. • Thus we can study the lifecycle of meeting documents. • Meetings in both firms were task oriented rather than being about the generation of ideas. • We obtained permission to make recordings from each meeting participant. • We also allowed participants to request that the recordings be switched off. • Names were removed from transcripts.
Analysis of State of the Art Tools • Important to assess the state of the art. • Assessed the efficiency of the first generation AMI meeting browser in answering typical questions about a meeting. • Generated a number of questions about a single meeting. • Subjects were asked to answer these questions using the meeting browser. • Subjects were encouraged to think aloud, and we examined the accuracy of the answers. • The questions were either about specific information (what was the total budget?) or were more general (what was Ed’s contribution to the meeting?).
Tools Analysis Results • Inefficient for access • Too much low level detail • Assumption of large display • Users need abstraction / summarisation tools
Efficient Access to Meeting Data • There is a clear need for efficient access to meeting data. • Meetings contain a lot of irrelevant information (both in general and for specific participants). • Minutes and notes capture important information but lack contextual information. • State of the art tools lack abstraction – generally present the raw recordings, unfiltered. • We focus on lightweight components allowing for efficient access to meeting data.
Temporal Compression of Speech • Intended for environments which necessitate speech only access. • e.g. Mobile phone, travelling in car etc. • Aim is to reduce the length of the recording but to retain the important content. • Two techniques for reducing the length: • Speed Up: Play the full clip back at a faster rate. • Excision: Remove sections of the recording.
Speed Up • Simplest approach is to directly alter the playback rate. • Has the side effect of altering the pitch of the speakers. • Use an overlap and add algorithm to speed up whilst keeping pitch constant. • Has the problem of not reflecting how speakers naturally increase their speech rate. • Use a variable playback rate to better match how human speakers alter their speech rate.
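The overlap-and-add idea can be illustrated with a minimal sketch. The slides do not say which overlap-add variant was used, so this is the plain form, which produces audible artefacts on real speech; practical systems usually align frames before adding them. The function names, the frame length and the Hann window are assumptions made for illustration.

```python
import math

def hann(n):
    """Periodic Hann window; at 50% overlap the shifted windows sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def ola_speed_up(samples, rate, frame_len=512):
    """Shorten `samples` by roughly `rate` (> 1 means faster) using overlap-add.

    Frames are read every `analysis_hop` samples but written every
    `synthesis_hop` samples. The waveform inside each frame is untouched,
    so the pitch is kept while the overall duration shrinks.
    """
    synthesis_hop = frame_len // 2
    analysis_hop = int(round(synthesis_hop * rate))
    window = hann(frame_len)
    if len(samples) < frame_len:
        return []
    n_frames = (len(samples) - frame_len) // analysis_hop + 1
    out = [0.0] * (synthesis_hop * (n_frames - 1) + frame_len)
    for k in range(n_frames):
        src = k * analysis_hop   # where the frame is read from
        dst = k * synthesis_hop  # where the frame is written to
        for i in range(frame_len):
            out[dst + i] += samples[src + i] * window[i]
    return out
```

With `rate=2.0` the output is about half as long as the input. A variable playback rate, as in the last bullet, could be approximated by changing `analysis_hop` from frame to frame.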
Excision • Simple approach is to remove non-informational parts of the recording e.g. silence. • Limited by the amount of silence. • Derive measures of word importance and only play back the important words; missing words are mentally replaced. • Far from “natural” speech. • Use larger parts of speech (utterances) and locate important utterances and play only those back.
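Utterance-level excision can be sketched as a selection problem: keep the most important utterances that fit a time budget, then play them back in their original order. The `Utterance` type, the `importance` field and the greedy selection are assumptions for illustration; the slides do not describe how importance was computed or how utterances were chosen.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    start: float       # seconds from the start of the recording
    end: float
    importance: float  # hypothetical score, higher means more important

    @property
    def duration(self):
        return self.end - self.start

def excise(utterances, keep_fraction):
    """Greedily keep the most important utterances within a time budget.

    `keep_fraction` is the share of the original duration to retain,
    e.g. 0.5 for a recording half as long.
    """
    budget = keep_fraction * sum(u.duration for u in utterances)
    kept, used = [], 0.0
    for u in sorted(utterances, key=lambda u: u.importance, reverse=True):
        if used + u.duration <= budget:
            kept.append(u)
            used += u.duration
    # Restore chronological order so playback follows the meeting.
    return sorted(kept, key=lambda u: u.start)
```

Silence removal is the special case where silent segments receive the lowest importance.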
Experimental Overview • Initial Exploratory Experiment • Gain an understanding of the space. • Informally assessed a large number of techniques. • Located promising directions for research. • Follow up detailed study • Examined a subset of the techniques explored. • Used a measure of gisting ability to assess success. • Examined short and long meeting clips. • Also examined effect of a user interface.
Measuring Gisting Ability • A key facet of our techniques is that they support the discovery of gist rather than facts. • Therefore the metrics we have used previously do not adequately capture the proposed usage of these tools. • Key components of the performance metric: • Must be quick to assess and to score (experimenter and subject time) • Objective measure
Measuring Gisting Ability (2) • Our solution was to use a hybrid gold standard scheme. • We measure the importance of utterances from the transcript and select a number of utterances from the full range of importance. • We then ask judges to rank these utterances in order of importance. • Subjects then listen to the meetings and perform the same ranking. • The objective score is then the difference between the gold standard and subject rankings
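One simple way to turn "the difference between the gold standard and subject rankings" into a number is the sum of absolute rank differences (the Spearman footrule). The slides do not state which distance was actually used, so the measure and the function names below are assumptions.

```python
def ranking_distance(gold, subject):
    """Sum of absolute rank differences between two orderings of the same items."""
    if sorted(gold) != sorted(subject):
        raise ValueError("both rankings must contain the same items")
    subject_rank = {item: rank for rank, item in enumerate(subject)}
    return sum(abs(rank - subject_rank[item]) for rank, item in enumerate(gold))

def normalised_distance(gold, subject):
    """Scale the distance to [0, 1]; 0 means the subject matched the judges."""
    n = len(gold)
    if n < 2:
        return 0.0
    worst = (n * n) // 2  # largest possible footrule distance for n items
    return ranking_distance(gold, subject) / worst

gold = ["u3", "u1", "u4", "u2"]      # judges' ranking, most important first
subject = ["u1", "u3", "u4", "u2"]   # one listener's ranking
print(ranking_distance(gold, subject))  # 2
```

A lower score indicates that the listener's sense of what mattered is closer to the gold standard.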
Results • Removing unimportant utterances performed better than speed up. • Listeners understood the gist of a recording faster. • All techniques performed better than applying no compression. • With longer clips understanding was the same. • Speed up required more interface interactions than excision.
Dynamic Summarization • Using summary information to locate points of interest within a meeting transcript. • Traditional summaries can be customized but are largely presented statically. • Underpinned by two concepts: • User is able to dynamically alter the summarization level. • Alteration shown in real time. • Applying different presentation techniques.
Development Procedure • Using the same process to evaluate as was used for the speech work. • An initial lightweight evaluation of a number of UI concepts intended to find promising directions of research. • A follow up study examining the techniques in more detail with a more rigorous evaluation protocol.
Dynamic Summary Display • Two unit levels examined: • Words • Utterances • Two presentation techniques: • Unit shading. • Unit excision. • Two hybrid techniques: • Combining the four techniques into one • An experimental fish-eye view
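The two word-level presentation techniques can be sketched as follows. The per-word importance scores, the `level` slider value and the grey-level mapping are all assumptions for illustration; the slides do not describe how the displays were implemented.

```python
def excise_words(words, scores, level):
    """Word excision: show only words whose score reaches the current level."""
    return [w for w, s in zip(words, scores) if s >= level]

def shade_words(words, scores, level):
    """Word shading: show every word, drawing less important ones lighter.

    Returns (word, grey) pairs where grey 0 is black and larger is lighter.
    """
    shaded = []
    for w, s in zip(words, scores):
        grey = 0 if s >= level else int(round(200 * (1 - s)))
        shaded.append((w, grey))
    return shaded

words = ["we", "agreed", "the", "budget", "is", "fixed"]
scores = [0.1, 0.8, 0.0, 0.9, 0.1, 0.7]   # hypothetical importance in [0, 1]
print(excise_words(words, scores, 0.5))   # ['agreed', 'budget', 'fixed']
```

Calling either function again whenever the user moves the slider gives the real-time alteration of the summarization level. The same functions work at the utterance level if each unit is an utterance rather than a word.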
Examples • Word Excision • Word Shading
Initial results • Shading works well. • Operating at the word level is satisfactory. • Fish-eye was not liked. • The combinatorial approach did not really offer anything novel.
Follow Up Study • Focus solely on the Word Excision and Word Shading techniques (highest rated in the previous experiment). • Two questions (one specific, one general) about a number of meetings. • Use the two interfaces (plus a control plain text transcript) to answer the questions (one question per meeting). • Measure the time taken to answer, the accuracy and the number of interface actions used when answering the questions. • Collect subjective preference data and user comments about each of the techniques.
Follow Up Study Results • Subjects were largely accurate; there was no effect of interface type on accuracy. • No effect of interface type on time taken to answer, i.e. there was no efficiency loss as a result of using the dynamic interfaces.
Preference and Process Results • Subjects overwhelmingly preferred the Word Excision condition. • Subjects scored the Word Excision and Plain Transcript conditions equally. • The Word Shading condition required fewer interface actions than the Word Excision condition. • Specifically, users spent more time changing compression levels in the Word Excision condition.
Video Compression • The same techniques for audio can also be applied to video. • Compress the audio recording and use this compressed version to derive an audio-video recording. • Informal evaluation indicates a different modality for video.
Video Examples • Type of video being used • Word excised video • The cuts are now much more disconcerting. • Sped Up video • More comfortable to watch but disconcerting at high compression levels. • Can also do non-linear compressed video • Speed up only the non-silent parts. • Can also e.g. speed up through unimportant parts
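Non-linear compression amounts to giving each segment of the recording its own playback rate and laying the results end to end. The sketch below maps original segments onto the compressed timeline; the segment boundaries and rates are hypothetical, and how segments were labelled in the actual system is not described in the slides.

```python
def compressed_timeline(segments):
    """Map (start, end, rate) segments onto the compressed output timeline.

    `segments` must be in playback order. A rate of 1.0 plays a segment
    unchanged and 2.0 plays it twice as fast.
    Returns (start, end, out_start, out_end) tuples in seconds.
    """
    timeline, t = [], 0.0
    for start, end, rate in segments:
        length = (end - start) / rate
        timeline.append((start, end, t, t + length))
        t += length
    return timeline

# Hypothetical 40 s clip whose middle 20 s is judged unimportant.
plan = compressed_timeline([(0, 10, 1.0), (10, 30, 2.0), (30, 40, 1.0)])
print(plan[-1][3])  # 30.0
```

Because the audio and video share one timeline, the same mapping can drive both streams so that they stay synchronised.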
Summary • Looking at Interfaces for Browsing Meeting Recordings • Problems with abstraction in current meeting recording technology and automatic browsing systems • Temporal Compression of Speech • Reducing the time required to listen to a speech recording but keeping the important information. • Utterance Excision.
Summary • Dynamic presentation of meeting transcripts • Real time selection of summary level. • Word Shading. • Temporal Compression of Video • Applying the above to video recordings. • Speed up more effective.