MoVi: Mobile Phone based Video Highlights via Collaborative Sensing. Xuan Bao, Romit Roy Choudhury
Context: Next-generation smartphones will have a large number of sensors: cameras, microphones, accelerometers, GPS, compasses, health monitors, …
A mobile phone can be regarded as a window of views: it retrieves views from the physical world.
Abundance of electronic information: which view is of “interest”?
Information distillation is a broad topic. To get a handle on it, we narrow down the problem space: can phones create a highlight movie of a social gathering without human intervention?
MoVi Goal • Envisioning the end product: imagine a social party of the future; assume phones are wearable • GOAL: Create 10-min movie highlights without human intervention • The Idea: mobile phones sense ambience; collaboratively infer an “interesting event”; select a phone with a good view of the event; stitch the recorded clips to form the highlights. Pictured: Nokia Morph, Apple iPod Nano, Microsoft SenseCam.
The problem is analogous to “event monitoring” in conventional sensor networks, except that events are social … and occur over multiple sensing dimensions.
MoVi: Mobile Video Highlights • Introduction • Challenges • System Design • Experiment Results • Limitation and Ongoing Work
1. Who collaborates with whom? • Identifying social groups • Groups are dynamic over time, not necessarily spatial (Figure: collaboration group)
2. What is interesting? The notion of “interesting events” is subjective. Need patterns … training … or a rule book.
3. Which camera angle to choose? View selection: need to select the best view from the available cameras. (Figure labels: face sky, good, blur, blocked)
4. When did the event start and end? • Need to select the right time span of an event • Clues may come after the start of the event • Requires rewinding to the logical start of the event (Timeline figure: Joe talks, Alice jokes, Bob talks, laughter sensed; a marker shows where recording should start)
MoVi Architecture: (1) phones are grouped according to ambience; (2) multi-modal event triggers are identified; (3) the video with the best view is selected; (4) the time span of each event is selected.
Who collaborates with whom? Group Management
Group management: Acoustic (ringtone, ambience); Visual (view similarity, light)
Acoustic: Ringtone. A high-frequency ringtone acts like a wireless beacon … it measures distance to the transmitter. Grouping is based on overhearing.
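Grouping by overhearing could be sketched as connected components over a "who heard whom" relation. This is only an illustration: the phone ids, the `heard` input format, and the function name `group_by_overhearing` are hypothetical, and the slides do not say how overhearing reports are actually merged.

```python
def group_by_overhearing(heard):
    """Group phones into connected components of the overhearing relation.

    heard: dict mapping a phone id to the set of phone ids whose
    ringtone it overheard (hypothetical input format).
    Returns a sorted list of sorted groups.
    """
    parent = {}

    def find(x):
        # Register unseen phones as their own group, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for listener, sources in heard.items():
        find(listener)  # phones that heard nobody still form a group
        for src in sources:
            union(listener, src)

    groups = {}
    for phone in list(parent):
        groups.setdefault(find(phone), set()).add(phone)
    return sorted(sorted(g) for g in groups.values())


reports = {"A": {"B"}, "B": set(), "C": {"D"}, "E": set()}
print(group_by_overhearing(reports))
```

Treating overhearing as symmetric and transitive is a design assumption; a real deployment might require mutual overhearing or a signal-strength threshold before merging groups.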
Acoustic: Ambience. Grouping is based on ambient sound correlation, using MFCCs (Mel-Frequency Cepstral Coefficients) as features and an SVM as classifier. Result: phones in the same group are coupled together most of the time, with few misclassifications.
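The idea of ambient sound correlation can be illustrated with a much simpler stand-in than the MFCC + SVM pipeline named on the slide: a Pearson correlation between two phones' loudness envelopes. The envelope input, the `same_ambience` helper, and the 0.8 threshold are assumptions made for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat signal carries no correlation information
    return cov / math.sqrt(vx * vy)


def same_ambience(env_a, env_b, threshold=0.8):
    """Hypothetical rule: two phones share an ambience if their
    loudness envelopes are strongly correlated."""
    return pearson(env_a, env_b) >= threshold


phone_a = [1, 5, 2, 8, 3, 9]
phone_b = [2, 10, 4, 16, 6, 18]   # same pattern, louder microphone
phone_c = [9, 3, 8, 2, 5, 1]      # different room
print(same_ambience(phone_a, phone_b), same_ambience(phone_a, phone_c))
```

Correlation ignores gain differences between microphones, which is why a scaled copy of the same envelope still matches.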
Visual: View Similarity & Light. View similarity: grouping based on view similarity, exploiting spatiograms (spatial + color histograms), which were originally used to track objects. Light: classify by light intensity into 3 simple buckets.
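The light-based grouping amounts to bucketing each phone by its mean image intensity. A minimal sketch follows, assuming 0-255 grayscale means; the bucket names and the thresholds 60 and 170 are hypothetical, since the slide only says there are three buckets.

```python
def light_bucket(mean_intensity, dark_max=60, bright_min=170):
    """Map a mean grayscale intensity (0-255) to one of three buckets.
    Thresholds and bucket names are illustrative assumptions."""
    if mean_intensity < dark_max:
        return "dark"
    if mean_intensity < bright_min:
        return "regular"
    return "bright"


def group_by_light(readings):
    """readings: dict of phone id -> mean intensity.
    Returns dict of bucket name -> list of phone ids."""
    groups = {}
    for phone, value in readings.items():
        groups.setdefault(light_bucket(value), []).append(phone)
    return groups


print(group_by_light({"A": 20, "B": 100, "C": 30, "D": 220}))
```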
Identifying interesting social events: Group Management, Event Detection
Event Detection: Specific signature (laughter); Group behavior (view similarity, ambience fluctuation, group rotation); Neighbor assistance
Specific Event Signature: Laughter. Detected using MFCC features and an SVM trained on an audio training set. Detects the time at which laughter occurs.
Group Behavior: View Similarity. Detecting people paying unusual attention to the same object.
Group Behavior: Ambience Fluctuation. Detecting bursts of sound, fluctuations in accelerometer readings, unusual changes of light, …
Neighbor Assistance: if a human expresses interest, phones can follow. Phones taking pictures send out signals. This brings the human into the loop; human choices are given priority.
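An ambience-fluctuation trigger can be sketched as a test of each new sample against the recent history of the same sensor. The sliding window of 5 samples, the factor `k`, and the `min_std` floor below are assumptions for illustration; the slides do not give the actual detection rule.

```python
import statistics


def fluctuation_triggers(samples, window=5, k=3.0, min_std=1.0):
    """Return indices whose value deviates from the mean of the previous
    `window` samples by more than k standard deviations.

    min_std keeps a perfectly flat history from flagging tiny changes.
    All parameters are illustrative assumptions.
    """
    triggers = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mean = statistics.fmean(history)
        std = max(statistics.pstdev(history), min_std)
        if abs(samples[i] - mean) > k * std:
            triggers.append(i)
    return triggers


sound_level = [10, 11, 10, 9, 10, 10, 30, 10, 11, 10]
print(fluctuation_triggers(sound_level))
```

The same function could be applied to sound level, accelerometer magnitude, or light intensity, since the rule only looks at one numeric stream at a time.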
Which camera view to use? Group Management, Event Detection, View Selector
View Selection: face count, accelerometer ranking, human assistance, light intensity ranking
View Selection: four heuristics. (1) Face count: faces are often interesting to humans. (2) Accelerometer ranking: prefer a stable camera. (3) Light intensity: rule out blocked views. (4) Human in the loop: manually taken pictures are better. (Figure: good view)
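One way the four heuristics could be combined is to filter first and rank second. In this sketch the `View` fields, the `min_light` threshold, and the priority order (human-taken, then face count, then steadiness) are all assumptions; the slides list the heuristics but not how they are weighted.

```python
from dataclasses import dataclass


@dataclass
class View:
    phone: str
    face_count: int          # faces detected in the clip
    accel_variance: float    # lower means a steadier camera
    light: float             # mean intensity, 0-255
    human_taken: bool = False


def select_view(views, min_light=40.0):
    """Pick one view, or None if every candidate looks blocked."""
    # (3) Light intensity: drop views that are probably blocked.
    usable = [v for v in views if v.light >= min_light]
    if not usable:
        return None
    # (4) human choice first, then (1) more faces, then (2) steadier camera.
    return max(
        usable,
        key=lambda v: (v.human_taken, v.face_count, -v.accel_variance),
    )


candidates = [
    View("A", face_count=3, accel_variance=0.9, light=120),
    View("B", face_count=3, accel_variance=0.2, light=110),
    View("C", face_count=5, accel_variance=0.1, light=10),   # blocked
    View("D", face_count=1, accel_variance=0.1, light=130),
]
print(select_view(candidates).phone)
```

A lexicographic key keeps the rule easy to read; a weighted score would be an equally plausible alternative.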
When did the event start … end? Group Management, Event Detection, View Selector, Event Segmentation
Event Segmentation: classify sound states (voice gender, music, pitch, …), then search for distinct transitions before and after the trigger. (Timeline figure: Joe talks, Alice jokes, Bob talks, laughter sensed)
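The search for transitions around a trigger can be sketched over a sequence of per-time-step sound-state labels. The label strings, the function name, and the `segments_before` / `segments_after` parameters (how many whole sound states to include around the trigger) are hypothetical; the slides do not say how far the search rewinds.

```python
def segment_event(states, trigger, segments_before=1, segments_after=0):
    """Return (start, end) indices, end exclusive, of the clip around a trigger.

    states: list of sound-state labels, one per time step.
    trigger: index at which the event trigger fired.
    """
    if not 0 <= trigger < len(states):
        raise ValueError("trigger index out of range")
    # A segment starts wherever the label differs from the previous one.
    starts = [0] + [i for i in range(1, len(states)) if states[i] != states[i - 1]]
    # Segment that contains the trigger.
    seg = max(k for k, s in enumerate(starts) if s <= trigger)
    first = max(seg - segments_before, 0)
    last = min(seg + segments_after, len(starts) - 1)
    start = starts[first]
    end = starts[last + 1] if last + 1 < len(starts) else len(states)
    return start, end


timeline = ["joe"] * 3 + ["alice"] * 4 + ["bob"] * 2 + ["laugh"] * 3
# Laughter is sensed at index 10; rewind two sound states to Alice's joke.
print(segment_event(timeline, trigger=10, segments_before=2))
```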
Field Experiments: real social gatherings (Thanksgiving party, SmartHome tour), exercising all system features. (Photos: Thanksgiving, Duke Smart Home)
Field Experiments: Experiment Setup. 5 students taped iPod Nanos to their shirt pockets and carried Nokia N95 phones on belt clips. 1 dedicated video camera recorded the entire party (2 hours). Offline Evaluation: automatic highlights (20 min) are compared with manually created highlights, and the overlap is evaluated.
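Evaluating overlap between automatic and manual highlights can be sketched by comparing the seconds each one covers. The interval format and the two ratios returned below are assumptions chosen for illustration; the slides do not define the overlap measure they report.

```python
def seconds_covered(intervals):
    """Expand (start, end) second intervals, end exclusive, into a set."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def overlap_ratios(auto_intervals, manual_intervals):
    """Return (share of automatic highlights also picked by the human,
    share of human highlights also picked automatically)."""
    auto = seconds_covered(auto_intervals)
    manual = seconds_covered(manual_intervals)
    common = auto & manual
    of_auto = len(common) / len(auto) if auto else 0.0
    of_manual = len(common) / len(manual) if manual else 0.0
    return of_auto, of_manual


auto_clips = [(0, 10), (20, 30)]      # 20 seconds chosen automatically
manual_clips = [(5, 15), (25, 45)]    # 30 seconds chosen by a person
print(overlap_ratios(auto_clips, manual_clips))
```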
Zoom In View (figure: MoVi vs. human selections)
Thanksgiving Party (figure legend: MoVi selected, non-overlap, human selected, captured)
SmartHome Tour (figure legend: MoVi selected, non-overlap, human selected, captured)
Metrics (three measures, Thanksgiving / SmartHome): 38% / 31%; 39% / 48%; 21% / 23%
Limitations: MoVi is at a very early stage; limited trigger space … sensor set; difficult to infer what is “socially interesting”; battery, privacy, computation power. Ongoing work: exploring a larger set of triggers; more sensors (virtual sensors); energy efficiency of phones; possibility of combining with other devices (wall-mounted cameras, webcams, …).
Take Away: Sensing, computing, and communications … converging on the mobile platform. The future will allow users to zoom into the world and look at it at much higher resolution. However, let’s not take this for granted. Our lives already have excessive information … Let’s not add more noise.
Questions? Thank You! Visit the SyNRG research group @ http://synrg.ee.duke.edu/
Controlled Experiment: artificial social gathering of students. 5 students taped phones to their shirt pockets and gathered in a group, watching movies and playing video games. Triggers were used to select “interesting” clips, mainly to test that the triggers performed correctly.
Controlled Experiment Results: trigger detection (table columns: event name, time of occurrence, effective trigger, time of detection)