Video Shot Boundary Detection at RMIT University • Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams • School of Computer Science & IT, RMIT University • {tvolkmer, saied, hugh}@cs.rmit.edu.au
Overview • Our general approach • The moving query window • Details of the approach • How we measure frame similarity • Improvements for 2004 cut detection • Detection of gradual transitions • Evaluation • Experimental results • Conclusions
The Moving Query Window • A moving query window consists of two equal-sized half windows surrounding a current frame: the pre-frames before it and the post-frames after it • The moving query window is advanced through the video frame by frame • Cut detection and gradual transition detection are performed in separate decision stages during a single pass • [Figure: pre-frames, current frame, post-frames]
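The window mechanics described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `moving_query_window` and the parameter `half_window` are hypothetical, and skipping positions near the start and end of the video (where a half window would be incomplete) is an assumption, since the slides do not say how the boundaries are handled.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def moving_query_window(
    frames: Sequence[T], half_window: int
) -> Iterator[Tuple[List[T], T, List[T]]]:
    """Yield (pre_frames, current_frame, post_frames) for each window position.

    Assumption: positions without two complete half windows are skipped.
    """
    for i in range(half_window, len(frames) - half_window):
        pre_frames = list(frames[i - half_window:i])
        post_frames = list(frames[i + 1:i + 1 + half_window])
        yield pre_frames, frames[i], post_frames


# Usage: integers stand in for frames here.
windows = list(moving_query_window(list(range(7)), half_window=2))
```

Both decision stages (cuts and gradual transitions) could consume the same triples, which is one way to keep the analysis to a single pass.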
Frame feature representation • We use one-dimensional, localised histograms with 4x4 regions in the HSV colour space (16 bins per colour component) • A colour histogram represents each frame region. Corresponding regions are compared • Different weights can be applied to each region during comparison
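A rough sketch of the localised histogram feature, assuming NumPy and a frame already converted to HSV with each component scaled to [0, 1]. The function names, the normalisation of each histogram, and the use of the Manhattan (L1) distance for comparing regions are assumptions for illustration; the slides do not name the distance measure.

```python
import numpy as np


def region_histograms(hsv_frame: np.ndarray, grid: int = 4, bins: int = 16) -> np.ndarray:
    """Return normalised per-component histograms, shape (grid * grid, 3, bins).

    hsv_frame: array of shape (height, width, 3) with values in [0, 1].
    """
    height, width, _ = hsv_frame.shape
    row_edges = np.linspace(0, height, grid + 1, dtype=int)
    col_edges = np.linspace(0, width, grid + 1, dtype=int)
    hists = np.zeros((grid * grid, 3, bins))
    for r in range(grid):
        for c in range(grid):
            region = hsv_frame[row_edges[r]:row_edges[r + 1],
                               col_edges[c]:col_edges[c + 1]]
            for ch in range(3):
                counts, _ = np.histogram(region[..., ch], bins=bins, range=(0.0, 1.0))
                hists[r * grid + c, ch] = counts / max(counts.sum(), 1)
    return hists


def frame_distance(h1: np.ndarray, h2: np.ndarray, weights=None) -> float:
    """Weighted sum of per-region L1 distances (the L1 choice is an assumption)."""
    if weights is None:
        weights = np.ones(h1.shape[0])
    per_region = np.abs(h1 - h2).sum(axis=(1, 2))
    return float(np.dot(weights, per_region))
```

Only corresponding regions are compared, so a change confined to one part of the picture affects one term of the sum, and the per-region weights can suppress that term entirely.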
Cut detection • We disregard the four central regions of each frame to avoid the effect of rapid activity (that is, their weight = 0) • Using the remaining regions, each frame in the moving window is ranked by decreasing similarity to the current frame • Frame similarity is the sum of the inter-region similarities • The number of pre-frames that are ranked in the top half of the rankings is monitored • When a cut is passed, the number of top ranked pre-frames (usually) rises to a maximum and falls to a minimum within a few frames • We have determined an optimum window size and optimum thresholds that are effective for all our training sets • Our cut detection is (now) parameter free
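The ranking step can be illustrated as follows. The sketch assumes that a distance to the current frame has already been computed for every frame in the window (smaller distance means higher similarity); the function names are hypothetical, and the thresholds applied to the resulting count are not given on the slides, so they are left out.

```python
from typing import List, Sequence


def cut_region_weights(grid: int = 4) -> List[float]:
    """Weight 0 for the four central regions of a grid x grid layout, 1 elsewhere."""
    centre = {grid // 2 - 1, grid // 2}
    return [0.0 if (r in centre and c in centre) else 1.0
            for r in range(grid) for c in range(grid)]


def count_top_ranked_pre_frames(
    pre_distances: Sequence[float], post_distances: Sequence[float]
) -> int:
    """Rank all window frames by decreasing similarity to the current frame
    and count how many pre-frames land in the top half of the ranking."""
    labelled = [(d, "pre") for d in pre_distances]
    labelled += [(d, "post") for d in post_distances]
    labelled.sort(key=lambda item: item[0])  # smallest distance first
    top_half = labelled[:len(labelled) // 2]
    return sum(1 for _, side in top_half if side == "pre")


# Current frame is the last frame of the old shot: pre-frames dominate.
before_cut = count_top_ranked_pre_frames([0.1, 0.2, 0.1], [0.9, 0.8, 0.95])
# Current frame is the first frame of the new shot: post-frames dominate.
after_cut = count_top_ranked_pre_frames([0.9, 0.8, 0.95], [0.1, 0.2, 0.1])
```

In this toy example the count drops from its maximum (3) to its minimum (0) as the window passes the cut, which is the pattern the slide describes.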
Gradual transition detection • Pre-frames and post-frames are combined into two distinct sets of frames. The average distance of each set to the current frame is computed • We use all frame regions (with identical weights) • The ratio between the pre-frame set distance and the post-frame set distance, the PrePostRatio, is monitored • The end of most gradual transitions is indicated by a peak in the PrePostRatio curve • We maintain a moving average PrePostRatio for calculating a dynamic threshold to detect transitions • As a final decision step, we require a minimum difference between the last frame of the previous shot and the first frame of the new shot
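A minimal sketch of the PrePostRatio and a dynamic threshold, under stated assumptions: the threshold is taken to be a level factor times the moving average of the most recent ratios, and a detection is a local peak above that threshold. The slides name the history buffer and the threshold level factor but not the exact formula, so `detect_ratio_peaks`, `history`, and `level` are illustrative only, and the final decision step is omitted.

```python
from collections import deque
from statistics import mean
from typing import Deque, List, Sequence


def pre_post_ratio(
    pre_distances: Sequence[float], post_distances: Sequence[float], eps: float = 1e-9
) -> float:
    """Average pre-frame distance divided by average post-frame distance."""
    return mean(pre_distances) / (mean(post_distances) + eps)


def detect_ratio_peaks(ratios: Sequence[float], history: int, level: float) -> List[int]:
    """Return indices of local peaks exceeding level * moving average (assumed rule)."""
    if history < 1:
        raise ValueError("history must be at least 1")
    buffer: Deque[float] = deque(maxlen=history)
    peaks: List[int] = []
    for i, ratio in enumerate(ratios):
        if len(buffer) == history:
            threshold = level * mean(buffer)
            rising = ratios[i - 1] < ratio
            not_followed_by_higher = i + 1 == len(ratios) or ratio >= ratios[i + 1]
            if ratio > threshold and rising and not_followed_by_higher:
                peaks.append(i)
        buffer.append(ratio)  # update the moving-average history
    return peaks


ratios = [1.0, 1.0, 1.0, 1.0, 1.2, 2.0, 4.0, 2.0, 1.0, 1.0]
peaks = detect_ratio_peaks(ratios, history=3, level=2.0)
```

Because the threshold follows the recent average, a steadily noisy passage raises the bar for detection, while a quiet passage lowers it.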
PrePostRatio in detail • [Figure: a schematised dissolve between a shot A and a shot B] • The PrePostRatio is usually at its minimum at the beginning of a gradual transition and rises to a maximum at the end of the transition
PrePostRatio curve example • The curve shows two short gradual transitions and two cuts within a range of 1000 frames
Training and Evaluation • We have trained on the TRECVID 2003 shot boundary test set • The main parameters for gradual transition detection are the query window size, the size of the history buffer for dynamic thresholding, and a threshold level factor • Results are discussed on the next slides. (We achieve similar or better results on the 2002 and 2001 test sets in blind runs.)
Discussion • Cut detection is highly effective • This year, recall is 94% and precision is 92%. The improvement over 2003 is due to ignoring the centre regions • Gradual transition detection has improved significantly since 2003 • Recall is now between 68% and 85%, precision between 67% and 84% • A high detection threshold favours precision, a low one favours recall • A short detection threshold history was found to be preferable • The final decision step reduces false positives • For television news, we are able to use a fixed moving query window size of 24 frames • We experimented with a simple ASR technique in 10 additional runs, which removed detected transitions that coincided with spoken words. Ad hoc, very unsuccessful…
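For reference, recall and precision follow the standard definitions: recall is the fraction of reference transitions that were detected, and precision is the fraction of reported transitions that are correct. The counts in this sketch are made up purely for illustration and are not taken from the evaluation.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of reference transitions that were detected."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported transitions that are correct."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts, not results from the slides.
tp, fp, fn = 47, 3, 5
r = recall(tp, fn)
p = precision(tp, fp)
```

Raising the detection threshold typically removes false positives (raising precision) at the cost of more missed transitions (lowering recall), which matches the trade-off noted on the slide.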
Conclusions • Disregarding the focus area of frames for cut detection has improved our results by 3% in recall and 9% in precision • Our parameter-free ranking scheme is highly effective in cut detection on a wide variety of footage • Our gradual transition detection method is relatively simple and needs only a few parameters • The additional final decision step reduces false positives and improves results significantly • The use of localised histograms and more dynamic thresholding also improved results in gradual transition detection • Our approach is computationally inexpensive, simple to implement, and effective • 15,500 seconds to process the video (around 4 hours, 18 minutes)