Detection of Illicit Content in Video Streams Niall Rea & Rozenn Dahyot http://www.pixalert.com/
What? Why?
• By illicit content, we mean pornographic material
• Applications:
  • Child protection (parental control)
  • Company protection (scanning of company computers)
  • Paedophilia (cyber surveillance)
State of the Art
• Text analysis (Internet filtering)
  • Filenames
  • Text surrounding images
  • Known URLs
• Image analysis
  • Skin detection
  • Geometrical constraints + orientation
  • Face localisation
• Video analysis
  • Current approach: keyframe extraction at regular time intervals and still image analysis
Extension to Illicit Video Detection
• Using video information
  • Color
  • Texture
  • Motion
• Using audio information
  • Audio energy (loudness)
Video Analysis
• Considered 2 approaches exploiting features from the partially decoded MPEG video stream
  • Smart keyframe selection for real-time performance, based on macroblock type
  • Exploiting motion vectors for periodic motion detection
• Optimised open-source ffdshow decoder (extraction of compressed-domain motion features from MPEG-1/2/4)
• Poesia filter for skin colour detection
Shot Cut detection
• Shot cut detection based on the ratio of macroblock types of consecutive inter-coded frames in a sub-GOP
  • (e.g. shot cut occurs between the first reference frame and the first B-frame)
• Macroblocks in both B-frames will be heavily backward predicted (indicated by the heavier arrows). A shot cut is deemed to have occurred if [condition missing] and [condition missing]
[Figure: sub-GOP with frames n-3, n-2, n-1, n and frame types I, B, B, P; the shot cut is marked]
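The two decision conditions are missing from this slide, so the following is only a sketch of the general idea: a cut just before the first B-frame should leave both B-frames dominated by backward-predicted macroblocks. The macroblock labels, the function names and the 0.9 threshold are all assumptions made for illustration, not values from the talk.

```python
def backward_ratio(mb_types):
    """Share of backward-predicted blocks among the uni-directionally predicted ones.

    mb_types: list of labels, one per macroblock, e.g. "forward", "backward",
    "bidirectional" or "intra" (hypothetical labels for this sketch).
    """
    back = mb_types.count("backward")
    fwd = mb_types.count("forward")
    return back / (back + fwd) if (back + fwd) else 0.0


def cut_before_first_b(b_frame_1, b_frame_2, threshold=0.9):
    """Flag a cut between the first reference frame and the first B-frame.

    threshold is an assumed value; the slide does not state the actual test.
    """
    return (backward_ratio(b_frame_1) >= threshold
            and backward_ratio(b_frame_2) >= threshold)
```

Only macroblock types are inspected, so a test of this kind needs no pixel decoding, which fits the real-time goal of working on the partially decoded stream.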
Motion extraction
• Compressed (MPEG motion vectors)
• Background / global motion compensation (Coudray 2004)
  • 4-parameter motion model
  • Calculate zoom
  • Calculate translation
• Assume global motion only occurs in non-skin and reasonably high-texture areas
• Compute a 2D histogram of those motion vectors
• Global translation is the mode of the histogram
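The last two steps (2D histogram, then mode) can be sketched as below. This is a minimal illustration, assuming integer motion vectors and a precomputed mask of non-skin, textured macroblocks; the function name and the zero-motion fallback are assumptions, and zoom estimation is left out.

```python
from collections import Counter


def global_translation(motion_vectors, usable):
    """Return the most frequent (dx, dy) among the usable macroblocks.

    motion_vectors: list of (dx, dy) integer pairs, one per macroblock.
    usable: list of booleans, True where the block is non-skin and textured
            (building that mask is outside this sketch).
    """
    # The Counter plays the role of the 2D histogram over (dx, dy).
    votes = Counter(mv for mv, ok in zip(motion_vectors, usable) if ok)
    if not votes:
        return (0, 0)  # assumption: no usable block means no global motion
    return votes.most_common(1)[0][0]
```

Taking the mode instead of the mean keeps a few fast-moving foreground blocks from biasing the background estimate.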
Motion fields
[Figure: motion fields for ‘When Harry met Sally’ and for an illicit video]
Motion and color segmentation
• Likelihoods computed from Poesia 32^3-bin RGB skin / non-skin histograms
• Priors set empirically
• Assume a simple local homogeneous motion field and global homogeneous motion
• k-means clustering (2 clusters) of the motion field
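How histogram likelihoods and an empirical prior combine into a per-pixel skin probability can be sketched with Bayes' rule. The dictionary layout, the function names and the default prior are assumptions for illustration; this does not reproduce the Poesia filter's actual interface or data.

```python
def bin_index(r, g, b, bins=32):
    """Map an 8-bit RGB triple to its cell in a bins^3 histogram."""
    step = 256 // bins
    return (r // step, g // step, b // step)


def skin_posterior(pixel, skin_hist, nonskin_hist, prior_skin=0.5):
    """P(skin | colour) from two normalised colour histograms.

    skin_hist / nonskin_hist: dicts mapping a bin index to p(colour | class)
    (a hypothetical storage format). prior_skin is the empirically set prior.
    """
    idx = bin_index(*pixel)
    p_skin = skin_hist.get(idx, 0.0) * prior_skin
    p_non = nonskin_hist.get(idx, 0.0) * (1.0 - prior_skin)
    total = p_skin + p_non
    # Colours never seen in either histogram fall back to the prior.
    return p_skin / total if total > 0 else prior_skin
```

Thresholding this posterior gives a skin mask, which could then be combined with the two motion clusters.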
Audio stream
• Content recognition using audio data
  • Sport
• Specificity of illicit content: (pseudo-)periodicity
• Simple feature used: loudness (audio energy)
  • Does not discriminate between different sources of noise (i.e. voices, specific sounds, etc.)
  • Captures the dominant pattern of the audio data
Audio stream
• Illicit audio: scene from ‘When Harry met Sally’
[Figure not preserved]
Loudness
• Audio energy computed over 40 ms windows (the duration of one video frame at 25 fps)
[Figure: loudness over two 5 s extracts, ‘Sally is faking it’ and ‘Harry and Sally are talking’]
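A minimal sketch of this feature, assuming mono PCM samples held in a plain list and taking energy as the mean squared amplitude per window (the slide does not specify the exact energy definition):

```python
def loudness(samples, sample_rate, window_s=0.040):
    """Mean squared amplitude over consecutive, non-overlapping windows.

    window_s defaults to 40 ms, i.e. one value per video frame at 25 fps.
    A trailing partial window is dropped.
    """
    n = int(round(sample_rate * window_s))  # samples per window
    return [
        sum(x * x for x in samples[i:i + n]) / n
        for i in range(0, len(samples) - n + 1, n)
    ]
```

With this window length a 5 s extract yields 125 loudness values, one per frame.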
Periodicity
• Periodic signal: s(t) = 1 + sin(t)
• Random signal: s(t) = rand(t)
• Autocorrelation: [formula or plot missing]
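The contrast between the two example signals can be reproduced with a standard normalised autocorrelation. The slide's own formula is not preserved, so the mean-removed, variance-normalised form used here is an assumption, as are the sampling step and the signal length.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Normalised autocorrelation of the mean-removed signal, lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    var = sum(v * v for v in c)
    if var == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(c[i] * c[i + lag] for i in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


dt = 0.1  # assumed sampling step
periodic = [1.0 + math.sin(k * dt) for k in range(1000)]  # s(t) = 1 + sin(t)
rng = random.Random(0)
noise = [rng.random() for _ in range(1000)]               # s(t) = rand(t)

r_periodic = autocorrelation(periodic, 100)
r_noise = autocorrelation(noise, 100)
```

For the sinusoid the correlation rises again at lags close to one period (about 63 samples at this step), whereas for the random signal it stays near zero at every non-zero lag.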
Periodicity of audio energy
• Correlation of the loudness computed over 5 s
[Figure: two panels, ‘Sally is faking it’ and ‘Harry and Sally are talking’]
Measure of periodicity in audio data
• Measure of periodicity: difference between the surface defined by the maxima and the minima
[Figure: two panels, ‘Harry and Sally are talking’ and ‘Sally is faking it’]
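One possible reading of this measure is the area between an upper envelope drawn through the local maxima of the correlation curve and a lower envelope drawn through its local minima. The sketch below implements that reading with piecewise-linear envelopes; the interpretation, the envelope construction and the function names are assumptions, not the authors' stated method.

```python
def local_extrema(y):
    """Indices of interior local maxima and local minima of a sequence."""
    maxima = [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]
    minima = [i for i in range(1, len(y) - 1) if y[i - 1] > y[i] <= y[i + 1]]
    return maxima, minima


def envelope(idx, y, k):
    """Piecewise-linear curve through (i, y[i]) for i in idx, held flat outside."""
    if k <= idx[0]:
        return y[idx[0]]
    if k >= idx[-1]:
        return y[idx[-1]]
    for a, b in zip(idx, idx[1:]):
        if a <= k <= b:
            w = (k - a) / (b - a)
            return (1 - w) * y[a] + w * y[b]


def periodicity_measure(corr):
    """Area between the maxima envelope and the minima envelope."""
    maxima, minima = local_extrema(corr)
    if not maxima or not minima:
        return 0.0  # no oscillation at all
    return sum(
        envelope(maxima, corr, k) - envelope(minima, corr, k)
        for k in range(len(corr))
    )
```

A strongly periodic correlation curve keeps its peaks and troughs far apart and scores high, while a curve that only jitters around zero gives a small value.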
Measure of periodicity in audio data Measure of periodicity over the sequence ‘When Harry met Sally’
Other results
• False alarm rate (FA) assessed on 20 minutes of non-illicit material (scenes from movies and music videos)
• Detection rate (DR) assessed on 10 minutes of illicit material (8 extracts)
• FA: 2%
• DR: 5 extracts (~9 minutes of recordings) are flagged as illicit; 3 extracts (~ minutes of recordings, duration missing in the source) are missed
Future work: Motion and periodicity
• Periodic motion on P-frames
• Mean motion vector over skin regions
• Correlation between periodic motion and audio?