This presentation outlines a system that combines video and audio processing to capture and analyze meetings. The system performs fidget detection using frame differencing and temporal histograms, and locates speakers using fast Bayesian acoustic localization. Results are shown for participant mug-shot extraction, height estimation, and acoustic localization on recorded meetings.
Fidget Detection for Audio Video Meeting Analysis Prashant K. Oswal Department of Electrical and Computer Engineering, Clemson University, Clemson, SC. 7th July, 2006
OUTLINE • Introduction • Video Processing • Audio Processing • System Architecture • Results • Conclusion
INTRODUCTION Importance of Meeting Capture and Analysis: • Takes care of schedule conflicts • Avoids note taking • Helps in decision making
INTRODUCTION Sub-systems:
INTRODUCTION Related work in Data Capture: • Portable Meeting Recorder: Ricoh Innovations • Distributed Meetings: Microsoft Research • CAMEO: Carnegie Mellon University
INTRODUCTION Related work in Data Analysis: • DM System: sound source localization (SSL); head and shoulder profiles, face detection, multi-cue tracking, hierarchical verification; key frame extraction. • Portable Meeting Recorder: SSL; luminance variation and geometric feature analysis; background/foreground extraction. • CAMEO system: parts-based face detection. • Yong Rui et al.: motion detection and statistical skin-color tracking.
INTRODUCTION Related work in UI design: • DM System • Portable Meeting Recorder
INTRODUCTION Other systems: • TeamSpace (Georgia Tech, IBM, Boeing) • Media Enriched Conference Room (FX Palo Alto Laboratory) • Quindi Meeting Companion
INTRODUCTION Our approach: • DATA CAPTURE (off-the-shelf microphones and web cameras) • DATA ANALYSIS (Fidget detection and Fast Bayesian acoustic localization) • Direction of Sound Source: θ, φ
VIDEO PROCESSING Fidget detection: • Frame differencing. • No background image required. • Temporal histograms. • Participant mug-shots. • Short-term histograms • Works well on low resolution images.
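The frame-differencing idea listed on this slide can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows of integers. The function name `motion_mask` and the threshold of 20 gray levels are hypothetical choices for illustration; the slides do not state a threshold.

```python
def motion_mask(prev, curr, threshold=20):
    """Mark pixels whose absolute gray-level change exceeds the threshold.

    Only two consecutive frames are compared, so no background image is needed.
    The threshold value is an assumption, not a figure from the slides.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


# Two tiny 3x4 frames: two pixels in column 2 change strongly,
# one pixel in column 1 changes by only 2 gray levels (treated as noise).
frame_prev = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame_curr = [[10, 10, 90, 10],
              [10, 10, 95, 10],
              [10, 12, 10, 10]]

mask = motion_mask(frame_prev, frame_curr)
```

Because the mask depends only on the change between frames, a stationary participant produces no response; this is why small, repeated movements (fidgets) are what the detector relies on.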
FIDGET DETECTION: FLOWCHART Inputs: Image at time (t-1), Image at time t • MOTION DETECTION: Difference Image → Connected Components → Fit 1D Gaussian on largest component • TEMPORAL HISTOGRAM: Construct Temporal Histogram → Differentiate Histogram to detect slope changes • PEAK DETECTION: Detect Peaks → Estimate Height → Extract Mug Shot • Outputs: Participant 1 Mug Shot, ..., Participant n Mug Shot
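The flowchart stages after the difference image can be sketched as follows. This is an illustrative reading of the flowchart, not the presented implementation. It assumes a binary motion mask as input, 4-connectivity, a Gaussian fitted to the column coordinates of the largest component, and a histogram with one bin per image column. All function names, the minimum standard deviation, and the `min_height` threshold are hypothetical. Height estimation and mug-shot extraction are omitted.

```python
import math


def largest_component_columns(mask):
    """Column indices of the pixels in the largest 4-connected component."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, comp = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    comp.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return best


def fit_gaussian_1d(columns):
    """Mean and standard deviation of the column coordinates (std floored at 0.5)."""
    mean = sum(columns) / len(columns)
    var = sum((x - mean) ** 2 for x in columns) / len(columns)
    return mean, max(math.sqrt(var), 0.5)


def add_to_histogram(hist, mean, std):
    """Accumulate one fitted Gaussian into the per-column temporal histogram."""
    for x in range(len(hist)):
        hist[x] += math.exp(-0.5 * ((x - mean) / std) ** 2)


def detect_peaks(hist, min_height):
    """Columns where the histogram slope changes from positive to non-positive."""
    diff = [hist[i + 1] - hist[i] for i in range(len(hist) - 1)]
    return [i for i in range(1, len(hist) - 1)
            if diff[i - 1] > 0 and diff[i] <= 0 and hist[i] >= min_height]


def blob(width, height, cols, extra=()):
    """Synthetic motion mask: a two-row blob over `cols`, plus optional noise pixels."""
    mask = [[0] * width for _ in range(height)]
    for r in range(2):
        for c in cols:
            mask[r][c] = 1
    for r, c in extra:
        mask[r][c] = 1
    return mask


width, height = 14, 3
mask_a = blob(width, height, range(2, 5), extra=[(2, 12)])  # fidget near column 3 + noise
mask_b = blob(width, height, range(9, 12))                  # fidget near column 10

hist = [0.0] * width
for mask in [mask_a, mask_b] * 5:  # ten synthetic motion masks
    mean, std = fit_gaussian_1d(largest_component_columns(mask))
    add_to_histogram(hist, mean, std)

peaks = detect_peaks(hist, min_height=1.0)  # one peak per simulated participant
```

Keeping only the largest component discards the isolated noise pixel, and accumulating Gaussians over time lets each participant's repeated small movements build a distinct peak at their column position.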
VIDEO PROCESSING [Figure panels: Input Images • Motion Detection • Temporal Histograms, Peak Detection • Extracted Mug-shots]
AUDIO PROCESSING Fast Bayesian acoustic localization: • Computationally efficient approach. • Sampled hemisphere. • Cross-correlation. • Azimuth and elevation angles.
SAMPLED HEMISPHERE AND CORRELATION VECTOR INDICES • FIND CANDIDATE LOCATIONS (inputs: Number of Latitudes, Number of Longitudes) • FIND CANDIDATE TO MICROPHONE TIME, once per microphone (inputs: Microphone 1 Location or Microphone 2 Location, Speed of Sound) • FIND CORRELATION INDICES (input: Sampling Rate) • Output: Correlation Vector Indices
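The precomputation in this flowchart can be sketched as below. The sketch assumes an evenly spaced azimuth/elevation grid, a hemisphere radius chosen by the caller, a speed of sound of 343 m/s, and a lag convention in which a positive index means the sound reaches the second microphone first. None of these details are given on the slide, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def hemisphere_candidates(n_lat, n_lon, radius):
    """Sample candidate source locations on a hemisphere above the array.

    Returns a list of (azimuth, elevation, (x, y, z)) tuples, angles in radians.
    """
    candidates = []
    for i in range(n_lat):
        phi = (math.pi / 2) * i / n_lat        # elevation in [0, pi/2)
        for j in range(n_lon):
            theta = 2 * math.pi * j / n_lon    # azimuth in [0, 2*pi)
            xyz = (radius * math.cos(phi) * math.cos(theta),
                   radius * math.cos(phi) * math.sin(theta),
                   radius * math.sin(phi))
            candidates.append((theta, phi, xyz))
    return candidates


def correlation_index(candidate_xyz, mic_a, mic_b, sampling_rate):
    """Expected lag in samples (t_a - t_b) for a source at the candidate location."""
    t_a = math.dist(candidate_xyz, mic_a) / SPEED_OF_SOUND
    t_b = math.dist(candidate_xyz, mic_b) / SPEED_OF_SOUND
    return round((t_a - t_b) * sampling_rate)


# Hypothetical geometry: two microphones 20 cm apart on the x axis.
mic_1 = (-0.1, 0.0, 0.0)
mic_2 = (0.1, 0.0, 0.0)

candidates = hemisphere_candidates(n_lat=3, n_lon=4, radius=2.0)
lags = [correlation_index(xyz, mic_1, mic_2, sampling_rate=44100)
        for _, _, xyz in candidates]
```

Since the lags depend only on geometry and sampling rate, they can be computed once before the meeting starts; at run time each candidate only requires a table lookup into the correlation vector.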
FAST BAYESIAN ACOUSTIC LOCALIZATION • Inputs: Microphone 1 Signal, Microphone 2 Signal • PRE-FILTER each signal • CORRELATE • Find Probability of Source at Each Candidate Location (input: Correlation Vector Indices) → Source Probability Vector • Source Probability Vectors are obtained by correlating signals from each microphone pair: Mic 1 and Mic 2, Mic 1 and Mic 3, Mic 1 and Mic 4, Mic 2 and Mic 3, Mic 2 and Mic 4, Mic 3 and Mic 4 • SUM PROBABILITIES • Find Candidate Location with Highest Probability • Output: Estimated θ, φ
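The pairwise correlate-and-sum structure of this flowchart can be sketched as follows. This is a simplified illustration: the pre-filter is omitted, and the raw cross-correlation value is used as an unnormalized score in place of a probability, since the slide does not say how probabilities are derived. The function names and the synthetic delays are hypothetical.

```python
from itertools import combinations


def cross_correlation(a, b, max_lag):
    """Return {lag: sum over n of a[n] * b[n - lag]} for lags in [-max_lag, max_lag]."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(a)):
            m = n - lag
            if 0 <= m < len(b):
                total += a[n] * b[m]
        out[lag] = total
    return out


def localize(signals, candidate_lags, max_lag):
    """Sum per-pair correlation scores for each candidate and pick the best one.

    signals: one list of samples per microphone.
    candidate_lags: per candidate, a dict {(i, j): expected lag of mic i relative to mic j}.
    """
    scores = [0.0] * len(candidate_lags)
    for i, j in combinations(range(len(signals)), 2):   # six pairs for four mics
        corr = cross_correlation(signals[i], signals[j], max_lag)
        for k, lags in enumerate(candidate_lags):
            scores[k] += corr[lags[(i, j)]]              # table lookup, no search
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def delayed(pulse, delay, length, start=5):
    """Place a pulse into a silent signal, shifted by `delay` samples."""
    out = [0.0] * length
    for k, v in enumerate(pulse):
        out[start + delay + k] = v
    return out


# Synthetic test: the same pulse reaches four microphones with different delays.
true_delays = [0, 1, 2, 1]
signals = [delayed([1.0, 2.0, 1.0], d, length=16) for d in true_delays]
pairs = list(combinations(range(4), 2))

candidate_lags = [
    {(i, j): true_delays[i] - true_delays[j] for i, j in pairs},  # matches the source
    {(i, j): 0 for i, j in pairs},                                # broadside guess
    {(i, j): true_delays[j] - true_delays[i] for i, j in pairs},  # mirrored guess
]

best, scores = localize(signals, candidate_lags, max_lag=4)
```

Each microphone pair is correlated once per audio block, and every candidate then reads a single precomputed entry from each correlation vector, which is what keeps the per-candidate cost low.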
AUDIO PROCESSING Mapping audio results onto image frame: • Azimuth → Column • Elevation → Row [Figure: Top View of Compact Array with Camera at centre (X and Y axes, microphones 1 to 4). Side View of Compact Array with Camera at centre (microphones 2 and 4, camera field of view).]
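One possible form of the azimuth-to-column and elevation-to-row mapping is a linear interpolation across the camera field of view. The sketch below assumes angles measured relative to the camera's optical axis and uses hypothetical field-of-view values and a hypothetical function name. The slides give only the correspondence itself, so this should be read as one plausible instance rather than the presented method.

```python
def angles_to_pixel(azimuth_deg, elevation_deg, width, height, h_fov_deg, v_fov_deg):
    """Map a sound direction to (column, row), or None if outside the field of view.

    Assumes azimuth 0 and elevation 0 point at the image centre, positive azimuth
    moves right, and positive elevation moves up (towards row 0).
    """
    if abs(azimuth_deg) > h_fov_deg / 2 or abs(elevation_deg) > v_fov_deg / 2:
        return None
    col = (azimuth_deg / h_fov_deg + 0.5) * (width - 1)
    row = (0.5 - elevation_deg / v_fov_deg) * (height - 1)
    return round(col), round(row)


# Hypothetical camera: 321x241 pixels, 40 x 30 degree field of view.
centre = angles_to_pixel(0, 0, 321, 241, 40, 30)
top_right = angles_to_pixel(20, 15, 321, 241, 40, 30)
bottom_left = angles_to_pixel(-20, -15, 321, 241, 40, 30)
outside = angles_to_pixel(30, 0, 321, 241, 40, 30)
```

A linear map ignores lens distortion and any offset between the array centre and the camera, so a real system would likely need calibration.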
SYSTEM ARCHITECTURE VIDEO CAPTURE (Logitech QuickCam web camera) AUDIO CAPTURE (Delta44 sound card, Microphone element, Pre-amplifier) MEETING CAPTURE SYSTEM
SYSTEM BLOCK DIAGRAM Components: • CPU • M-Audio Sound Card (PCI Slot) • USB Port • Pre-Amplifier (two blocks) • Microphones 1, 2, 3, 4 • Web Camera • MEETING AREA
SYSTEM ARCHITECTURE Software architecture consists of five classes: • Meeting Analysis • Capture Direct Show • Capture M-Audio • Fidget Detector • Acoustic Localizer
[Figure: SHORT-TERM HISTOGRAM and LONG-TERM HISTOGRAM at frames 192, 3600, 3837, 4255]
[Figure: SHORT-TERM HISTOGRAM and LONG-TERM HISTOGRAM at frames 80, 1425, 2851, 3583]
HEIGHT ESTIMATION [Figure: frames 63, 217, 1440 and frames 2396, 2821, 2963]
ACOUSTIC LOCALIZATION & FIDGET DETECTION [Figure: frames 1613, 1813, 2217, 2648 and frames 11, 139, 812, 1477]
ACOUSTIC LOCALIZATION RESULTS [Figure panels: Actual Results • Filtered Acoustic Localization Results]
CONCLUSION • New approach to meeting analysis. • Meeting capture using off-the-shelf microphones and web cameras. • Fidget detection - no clean background image required. • Fast Bayesian acoustic localization.
FUTURE WORK • Omni-directional camera system. • Tracking participants. • Mapping audio results to image frame. • User interface design.
ACKNOWLEDGEMENTS • Dr. Stanley T. Birchfield (Adviser) • Dr. John N. Gowdy (Committee) • Dr. Stephen J. Hubbard (Committee) • Miheer Gurjar and Prashanth Govindaraju – Experiments.