
Rating Online Content (Movies)

Your Reactions Suggest You Liked the Movie: Automatic Content Rating via Reaction Sensing. Xuan Bao, Songchun Fan, Romit Roy Choudhury, Alexander Varshavsky, Kevin A. Li. Rating Online Content (Movies). Manual rating not incentivized, not easy … does not reflect experience.


Presentation Transcript


  1. Your Reactions Suggest You Liked the Movie: Automatic Content Rating via Reaction Sensing. Xuan Bao, Songchun Fan, Romit Roy Choudhury, Alexander Varshavsky, Kevin A. Li

  2. Rating Online Content (Movies) Manual rating not incentivized, not easy … does not reflect experience

  3. Our Vision Reaction-based highlights Reaction tags Overall star rating

  4. Our Vision Reaction-based highlights Reaction tags Overall star rating Automatically

  5. Reactions / Ratings Key Intuition Multi-modal sensing / learning 02:43 - Action 09:21 - Hilarious 12:01 - Suspense …… Overall – 5 Stars

  6. Specific Opportunities • Visual • Facial expressions, eye movements, lip movements … • Audio • Laughter, talking • Motion • Device stability • Touch screen activities • Fast forward, rewind, checking emails and IM chats … • Cloud • Aggregate knowledge from others’ reactions • Labeled scores from some users

  7. Pulse: System Sketch

  8. Applications (Beyond Movie Ratings) • Annotated movie timeline • Slide forward to the action scenes • Platform for ad analytics • Assess which ads are grabbing attention … • Customize ads based on scenes that the user reacts to • Personalized replays and automatic highlights • User reacts to a specific tennis shot, TV shows a personalized replay • Highlights of all exciting moments in the Super Bowl game • Online video courses (MOOCs) • May indicate which parts of a lecture need clarification • Early disease symptom identification • ADHD among young children, and other syndromes

  9. First Step: A Sensor Assisted Video Player

  10. Developed on Samsung Galaxy tablet (Android) • Sensor meta-data layered on video as output Pulse Media Player Media player control functions monitored Observe the user from front cam Sensing threads control

  11. Basic Design Data Distillation Process Tag Cloud Final Rating English Adjectives Numeric Rating Reaction to Rating & Adjective (R2RA) Reactions: Laugh, Giggle, Doze, Still, Music … Signals to Reactions (S2R) Features from Raw Sensor Readings Microphone, Camera, Acc, Gyro, Touch, Clicks

  12. Basic Design Cloud Data Distillation Process Tag Cloud Final Rating English Adjectives Numeric Rating Reaction to Rating & Adjective (R2RA) Reactions: Laugh, Giggle, Doze, Still, Music … Signals to Reactions (S2R) Features from Raw Sensor Readings Microphone, Camera, Acc, Gyro, Touch, Clicks

  13. Visual Reactions • Facial expressions (face size, eye size, blink, etc.) • Track viewers’ face through the front camera • Track eye position and size (challenging with spectacles) • Track partial faces (via SURF points matching) Partial Face Face Tracking Eye Tracking (Green) Blink (Red)

  14. Visual Reactions • Facial expressions (face size, eye size, blink, etc.) • Track viewers’ face through the front camera • Track eye position and size (challenging with spectacles) • Track partial faces (via SURF points matching) • Detect blinks, lip size Look for difference between frames
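The "look for difference between frames" idea on this slide can be illustrated with a minimal sketch. It assumes grayscale frames stored as 2D lists of pixel intensities and an eye region already located by the tracker; the function names, the region format, and the threshold are hypothetical and are not taken from the Pulse implementation.

```python
def mean_abs_diff(prev, curr, region):
    """Mean absolute pixel difference inside region = (top, left, bottom, right)."""
    top, left, bottom, right = region
    total = 0
    count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            total += abs(curr[r][c] - prev[r][c])
            count += 1
    return total / count if count else 0.0


def detect_eye_changes(frames, eye_region, threshold=30.0):
    """Return indices of frames whose eye region changed sharply
    relative to the previous frame (threshold is an assumed value)."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i], eye_region) > threshold
    ]


# Tiny synthetic example: a bright "open eye" patch that darkens for one frame.
def make_frame(eye_value):
    frame = [[50] * 4 for _ in range(4)]
    for r in range(1, 3):
        for c in range(1, 3):
            frame[r][c] = eye_value
    return frame


frames = [make_frame(200), make_frame(200), make_frame(80), make_frame(200)]
changes = detect_eye_changes(frames, eye_region=(1, 1, 3, 3))
print(changes)  # [2, 3]: the eye closing and then reopening
```

Both the closing and the reopening frame are flagged, so a blink shows up as a pair of consecutive changes. A real detector would also need to compensate for head motion and lighting, which this sketch ignores.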

  15. Acoustic Reactions • Laughter, Conversation, Shout-outs … • Cancel out (known) movie sound from recorded sound • Laughter detection, conversation detection Even with knowledge of the original movie audio (Blue), it is hard to identify user conversation (distinguish Red and Green)

  16. Acoustic Reactions • Separating movie from user’s audio • Spectral energy density comparison not adequate • Different techniques for different volume regimes Low Volume High Volume

  17. Acoustic Reactions • Laughter, Conversation, Shout-outs … • Cancel out (known) movie sound from recorded sound • Laughter detection, conversation detection Early results demonstrate promise of detecting acoustic reactions
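As a baseline for "cancel out (known) movie sound from recorded sound", the sketch below subtracts a least-squares scaled copy of the known movie audio from the microphone recording. It assumes the two signals are already time-aligned and that a single gain models the speaker-to-microphone path; both are simplifying assumptions, and the slides note that simple comparisons are not adequate and that different techniques are needed at low and high volume. All names are hypothetical.

```python
def cancel_reference(recorded, reference):
    """Subtract the least-squares scaled reference from the recording.

    Returns (gain, residual), where gain minimizes
    sum((recorded[i] - gain * reference[i]) ** 2).
    """
    n = min(len(recorded), len(reference))
    num = sum(recorded[i] * reference[i] for i in range(n))
    den = sum(reference[i] ** 2 for i in range(n))
    gain = num / den if den else 0.0
    residual = [recorded[i] - gain * reference[i] for i in range(n)]
    return gain, residual


def energy(signal):
    """Mean squared amplitude, a crude indicator of remaining user sound."""
    return sum(x * x for x in signal) / len(signal) if signal else 0.0


# Synthetic example: the microphone hears the movie at half volume plus a user sound.
movie = [1.0, -1.0, 1.0, -1.0]
user = [0.0, 0.0, 0.5, 0.5]
mic = [0.5 * m + u for m, u in zip(movie, user)]

gain, residual = cancel_reference(mic, movie)
print(gain, residual, energy(residual))
```

In this toy case the user sound is uncorrelated with the movie audio, so the residual recovers it exactly. Real recordings involve delay, echo, and distortion, so a single gain would leave substantial movie sound behind.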

  18. Motion Reactions • Reactions also leave footprint on motion dimensions • Motionless during intense scene • Fidget during boredom Calm scene Intense scene Time to stretch

  19. Motion Reactions • Reactions also leave footprint on motion dimensions • Motionless during intense scene • Fidget during boredom

  20. Motion Reactions • Reactions also leave footprint on motion dimensions • Motionless during intense scene • Fidget during boredom Motion readings correlate with changes in ratings …

  21. Motion Reactions • Reactions also leave footprint on motion dimensions • Motionless during intense scene • Fidget during boredom Motion readings correlate with changes in ratings … Timing of motions also correlates with timing of scene changes
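One way to turn the "motionless vs. fidget" observation into a feature is to compute the variance of gyroscope magnitude over fixed windows. The sketch below is illustrative only: the window size, the threshold, and the function names are assumptions, and a fixed threshold like this would not carry over across users or contexts.

```python
import math
from statistics import pvariance


def gyro_magnitudes(samples):
    """Convert (x, y, z) gyroscope samples to rotation-rate magnitudes."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def window_variances(values, window):
    """Population variance of each non-overlapping window of readings."""
    return [
        pvariance(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def label_windows(variances, still_threshold):
    """Label a window 'still' when its variance is below the assumed threshold."""
    return ["still" if v < still_threshold else "fidget" for v in variances]


# Synthetic readings: four nearly motionless samples, then four jittery ones.
samples = [(0.01, 0.0, 0.0)] * 4 + [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
]
variances = window_variances(gyro_magnitudes(samples), window=4)
labels = label_windows(variances, still_threshold=0.05)
print(labels)  # ['still', 'fidget']
```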

  22. Extract Reaction Features – Player control Collect users’ player control operations Pause, fast forward, jump, roll back, … All slider movement Seek bar
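Player-control operations can be summarized as per-segment counts. The sketch below assumes each logged event is a `(playback_time_seconds, action)` pair and groups events into 1-minute segments, matching the segment length mentioned in the backup slide; the event format and action names are hypothetical.

```python
from collections import Counter, defaultdict


def control_features(events, segment_seconds=60):
    """Count player-control actions per playback segment.

    events: iterable of (playback_time_seconds, action) pairs.
    Returns {segment_index: {action: count}}.
    """
    per_segment = defaultdict(Counter)
    for t, action in events:
        per_segment[int(t // segment_seconds)][action] += 1
    return {seg: dict(counts) for seg, counts in sorted(per_segment.items())}


# Hypothetical log: one pause and one fast-forward in minute 0, two pauses in minute 1.
events = [(5, "pause"), (30, "fast_forward"), (65, "pause"), (70, "pause")]
features = control_features(events)
print(features)  # {0: {'pause': 1, 'fast_forward': 1}, 1: {'pause': 2}}
```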

  23. Challenges in Learning

  24. Problem – A Generalized Model Does Not Work Directly trained model does not capture the rating trend Why?

  25. The Reason it Does Not Work is … • Human behaviors are heterogeneous • Users are different • Environments are different even for same user (home vs. commute) commute home Sensed motion patterns are very different when the same movie is watched during a bus commute vs. in bed at home.

  26. The Reason it Does Not Work is … • Human behaviors are heterogeneous • Users are different • Environments are different even for same user (home vs. commute) • Gyroscope readings from same user (at home and office)

  27. The Reason it Does Not Work is … • Human behaviors are heterogeneous • Users are different • Environments are different even for same user (home vs. commute) • Gyroscope readings from same user (at home and office) • Naïve solution → build specific models one by one • Impossible to acquire data for all <User, Context, Movie> tuples … Office Home Commute

  28. Challenges in Learning Approach: Bootstrap from Reaction Agreements

  29. Approach: Bootstrap from Agreement • Thoughts • What behavior means positive/negative for a particular setting • How do we acquire data without explicitly asking the user every time • Approach: Utilize reactions that most people agree on Climax Boring Time Cloud Knowledge (Other users’ opinions) Sensor Reading Ratings

  30. Approach: Bootstrap from Agreement • Solution: spawn from consensus • Learn user reactions during the “climax” and the “boring” moments • Generalize this knowledge of positive/negative reactions • Gaussian process regression (ratings) and svm (labels) GPR SVM
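The bootstrapping idea can be sketched as follows: segments that most viewers agree on ("climax" or "boring") supply labels for this user's sensor features, and a model trained on those segments labels the remaining ones. The slides use Gaussian process regression and an SVM; this sketch deliberately swaps in a nearest-centroid classifier to stay dependency-free, so it illustrates the data flow rather than the authors' model. All names and feature values are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bootstrap_labels(features, consensus):
    """Label segments without consensus using this user's own reactions.

    features:  {segment: feature_vector} sensed for one user in one context.
    consensus: {segment: label} only for segments most viewers agree on.
    Returns {segment: predicted_label} for the remaining segments.
    """
    by_label = {}
    for seg, label in consensus.items():
        by_label.setdefault(label, []).append(features[seg])
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    return {
        seg: min(centroids, key=lambda lab: sq_dist(vec, centroids[lab]))
        for seg, vec in features.items()
        if seg not in consensus
    }


# Hypothetical features: [motion variance, touch activity] per segment.
features = {
    0: [0.1, 0.0], 1: [0.2, 0.1],   # crowd-agreed climax: viewer is still
    2: [0.9, 1.0], 3: [1.0, 0.8],   # crowd-agreed boring: viewer fidgets
    4: [0.15, 0.05], 5: [0.8, 0.9], # no consensus, to be labeled
}
consensus = {0: "climax", 1: "climax", 2: "boring", 3: "boring"}
predicted = bootstrap_labels(features, consensus)
print(predicted)  # {4: 'climax', 5: 'boring'}
```

Because the centroids are learned per user and per viewing session, the same procedure adapts to a commute or a home setting without asking the user for labels.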

  31. Evaluation

  32. User Experiment Setting • 11 participants watch preloaded movies (~50 movies) • 2 comedies, 2 dramas, 1 horror movie, 1 action movie • Users provide manual ratings and labels • For ground truth • We compare Pulse’s ratings with manual ratings

  33. Preliminary Results – Final (5 Star) Rating

  34. Preliminary Results – Final (5 Star) Rating Difference with true 5 star manual rating

  35. Preliminary Results – Myth behind the Error Final ratings can deviate significantly from the average segment ratings User-given scores may not be linearly related to quality

  36. Preliminary Results – Lower Segment Rating Error Final ratings come from averaging segment ratings Our system outperforms other methods Per-segment ratings 3 4 4 2 2 2 5 Mean Error (5-point scale) Random ratings Collaborative filtering Our system
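The averaging step and the error metric on this slide can be written out as a short sketch. The per-segment ratings are the ones shown on the slide; the predicted and manual ratings used for the error example are made-up illustration values, not results from the study.

```python
def final_rating(segment_ratings):
    """Average per-segment ratings into one overall rating."""
    if not segment_ratings:
        raise ValueError("need at least one segment rating")
    return sum(segment_ratings) / len(segment_ratings)


def mean_abs_error(predicted, actual):
    """Mean absolute error between predicted and manual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("rating lists must be non-empty and equally long")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Per-segment ratings from the slide.
segments = [3, 4, 4, 2, 2, 2, 5]
print(round(final_rating(segments), 2))  # 3.14

# Hypothetical predicted vs. manual segment ratings on a 5-point scale.
print(mean_abs_error([3, 4, 4], [4, 4, 2]))  # 1.0
```

As the previous slide points out, a user's final rating can deviate noticeably from this average, so the mean is only a starting point for the overall score.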

  37. Preliminary Results – Better Tag Quality Tags capture users’ feelings better than SVM alone Intense Warm Happy Intense Warm Happy

  38. Preliminary Results – Reasonable Energy Overhead Reasonable energy overhead compared to without sensing More tolerable on tablets. May need duty-cycling on smart phones

  39. Closing Thoughts • Human reactions are in the mind • However, they manifest as bodily gestures, activities • Rich, multi-modal sensors on mobile devices • A wider net for “catching” these reactions • Pulse is an attempt to realize this opportunity • Distilling semantic meanings from sensor streams • Rating movies … tagging any content with reaction meta data • Enabler for • Recommendation engines • Content/video search • Information retrieval, summarization

  40. Thoughts?

  41. Backup – potential questions • Privacy concern • Like every technology, Pulse may attract early adopters • If only final ratings are uploaded, the privacy level is similar to current ratings • Why not just emotion sensing/just laughter detection • Emotion sensing is a broad and challenging problem…but its goal is different from ours (rating)… • Explicit signs like laughter usually only account for a small duration of movie viewing, so we need to explore other opportunities (motion) • Our approach takes advantage of the specific task – 1. we know the user is watching a movie 2. we can observe the user for a longer duration (than most emotion sensing work) 3. we know other users’ opinions • How is this possible…human mind is too complex • Human thoughts are complicated… but they may produce footprints in behaviors • Collaborative filtering explicitly uses knowledge of other users’ thoughts to bootstrap our algorithm • The sample size is small…only 11 users • The sample size is limited, but • Each user watched multiple movies (50+ movies viewed)… segment ratings are for 1-minute segments (thousands of points) • Collaborative filtering shows that even within this data set, the ratings can diverge and the naïve solution does not work as well as ours

  42. Preliminary Results – Better Retrieval Accuracy Viewers care more about the highlights of a movie Find the contribution by using sensing Gain Additional error Overall achieved performance Total goal

  43. Challenges in Learning

  44. Problem – A Generalized Model Does Not Work • Directly trained model does not capture the rating trend Why?

  45. The Reason it Does Not Work is … • Human behaviors are heterogeneous • Users are different • Environments are different (e.g., home vs. commute) commute home Sensed motion patterns are very different when the same movie is watched during a bus commute vs. in bed at home.

  46. The Reason it Does Not Work is … • Human behaviors are heterogeneous • Users are different • Environments are different (e.g., home vs. commute) • Impact on sensor readings → histograms

  47. The Reason it Does Not Work is … • Human behaviors are heterogeneous • Users are different • Environments are different (e.g., home vs. commute) • Impact on sensor readings → histograms • Naïve solution → build specific models one by one • Impossible to acquire data for all <User, Context, Movie> tuples … Office Home Commute

  48. Challenges in Learning Approach: Bootstrap from Reaction Agreements

  49. Approach: Bootstrap from Agreement • Thoughts • What behavior means positive/negative for a particular setting • How do we acquire data without explicitly asking the user every time • Approach: Utilize reactions that most people agree on Climax Boring Time Cloud Knowledge (Other users’ opinions) Sensor Reading Ratings

  50. Approach: Bootstrap from Agreement • Solution: spawn from consensus • Learn user reactions during the “climax” and the “boring” moments • Generalize this knowledge of positive/negative reactions • Gaussian process regression (ratings) and svm (labels) GPR SVM
