340 likes | 561 Views
Unsupervised Modelling , Detection and Localization of Anomalies in Surveillance Videos. Project Advisor : Prof. Amitabha Mukerjee Deepak Pathak (10222) Abhijit Sharang (10007). What is an “Anomaly” ?. Anomaly refers to the unusual (or rare event) occurring in the video
E N D
Unsupervised Modelling , Detection and Localization of Anomalies in Surveillance Videos Project Advisor : Prof. AmitabhaMukerjee Deepak Pathak (10222) AbhijitSharang (10007)
What is an “Anomaly” ? • Anomaly refers to the unusual(or rare event) occurring in the video • Definition is ambiguous and depends on context Idea : • Learn the “usual” events in the video and use the information to tag the rare events.
Step 1 : Unsupervised Modelling • Model the “usual” behaviour of scene using parametric bayesianmodelling. • Topic Models : Leveraged from Natural Language Processing • Given: Document and Vocabulary • Document is histogram over vocabulary • Goal: Identify topics in a given set of Documents [Topics are latent variables]Alternate view : • Clustering in topic space • Dimensionality reduction
Video Clips (or Documents) • 45 minute video footage of traffic available • 25 frames per second • 4 kinds of anomaly • Divided into clips of fixed size of 4 seconds (obtained empirically last semester)
Feature Extraction • Three components of visual word : • Location • Spatio-Temporal Gradient and Flow Information • Object size • Features are extracted only from foreground pixels for increasing the efficiency
Foreground Extraction • Extracted using ViBe foreground algorithm and smoothened afterwards using morphological filters
Visual Word • Location : • Each frame of dimension m x n is divided into blocks of 20 x 20 • HOG - HOF descriptor : • For each block, a foreground pixel was selected at random and spatio-temporal descriptor was computed around it. • From the descriptors obtained from the training set, 200,000 descriptors were randomly selected. 20 cluster centres were obtained from these descriptors by k-means clustering. • Each descriptor was assigned to one of these centres. • Size : • In each block , we compute the connected components of the foreground pixels • The size of the connected components is quantised to two values: large and small
pLSA : Topic Model • Fixed number of topics : . Each word in the vocabulary is attached with a single topic. • Topics are hidden variables. Used for modelling the probability distribution • Computation • Marginalise over hidden variables • Conditional independence assumption: p(w|z) and p(d|z) are independent of each other
Step 2 : Detection • We propose “Projection Model Algorithm” with the following key idea – Project the information learnt in training onto the test document word space, and analyze each word individually to tag it as usual or anomalous. • Robust to the quantity of anomaly present in video clip.
Preliminaries • Bhattacharyya Distance between documents : • For documents and represented by the probability distributions in topic space and respectively, the distance is defined by • Cumulative histogram of m documents: • Ahistogram obtained by stacking the word count histogram of the mdocuments. • Spatial neighbourhood of a word : • For a word at location , all words at locations , and with the same flow and size quantisation • Significant distribution of neighbourhood word : The distribution of a word is significant if its frequency in the cumulative histogram is greater than a threshold
Test document Bhattacharya distance m nearest training documents word Check Frequency Cumulative histogram of words Eight Spatial neighbours of word More than neighbours have significant distribution Word occurs more than times Word is “Usual”
Detection : • Now each visual word has been labelled as “anomalous” or “usual”. • Depending on the amount of anomalous words, call the complete test document as anomalous or usual.
Step 3 : Localization • Spatial Localization : Since every word has location information in it, w can directly localize the anomalous words in test document to their spatial locality. • Temporal Localization :This requires some book-keeping while creating term-frequency matrix of documents. We could maintain a list of frame numbers corresponding to document-word pair.
Results Demo • Anomaly detection • Anomaly localization
Main Contributions • Richer word feature space by incorporating local spatio-temporal gradient-flow information. • Proposed “projection model algorithm” which is agnostic to quantity of anomaly present. • Anomaly Localization in spatio-temporal domain. • Other Benefit : Extraction of common actions corresponding to most probable topics.
References • Varadarajan, Jagannadan, and J-M. Odobez. "Topic models for scene analysis and abnormality detection." Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on. IEEE, 2009. • Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318. • Olivier Barnich and Marc Van Droogenbroeck. “Vibe: A universal background subtraction algorithm for video sequences”. Image Processing, IEEE Transactions on, 20(6):1709-1724, 2011. • Mahadevan, Vijay, et al. "Anomaly detection in crowded scenes." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010. • Roshtkhari, MehrsanJavan, and Martin D. Levine. "Online Dominant and Anomalous Behavior Detection in Videos.“ • Ivan Laptev, MarcinMarszalek, CordeliaSchmid, and Benjamin Rozenfeld. “Learning realistic human actions from movies”. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8. IEEE, 2008. • Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1999. • Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent dirichlet allocation." the Journal of machine Learning research 3 (2003): 993-1022.
Summary (Last Semester) • Related Work • Image Processing • Foreground Extraction • Dense Optical Flow • Blob extraction • Implementing adapted pLSA • Empirical estimation of certain parameters • Tangible Actions/Topics Extraction
Extra Slides • About • Background subtraction • HOG HOF • pLSA and its EM • Previous results
Background subtraction • Extraction of foreground from image • Frame difference • D(t+1) = | I(x,y,t+1) – I(x,y,t) | • Thresholding on the value to get a binary output • Simplistic approach(can do with extra data but cannot miss any essential element) • Foreground smoothened using median filter
Optical flow example (a) Translation perpendicular to a surface. (b) Rotation about axis perpendicular to image plane. (c) Translation parallel to a surface at a constant distance. (d) Translation parallel to an obstacle in front of a more distant background. Slides from Apratim Sharma’s presentation on optical flow,CS676
Optical flow mathematics • Gradient based optical flow • Basic assumption: • I(x+Δx,y+Δy,t+Δt) = I(x,y,t) • Expanded to get IxVx+IyVy+It = 0 • Sparse flow or dense flow • Dense flow constraint: • Smoothness : motion vectors are spatially smooth • Minimise a global energy function
pLSA : Topic Model • Fixed number of topics : . Each word in the vocabulary is attached with a single topic. • Topics are hidden variables. Used for modelling the probability distribution • Computation • Marginalise over hidden variables • Conditional independence assumption: p(w|z) and p(d|z) are independent of each other
EM Algorithm: Intuition • E-Step • Expectation step where expectation of the likelihood function is calculated with the current parameter values • M-Step • Update the parameters with the calculated posterior probabilities • Find the parameters that maximizes the likelihood function
EM in pLSA: E Step • It is the probability that a word w occurring in a document d, is explained by aspect z (based on some calculations)
EM in pLSA: M Step • All these equations use p(z|d,w) calculated in E Step • Converges to local maximum of the likelihood function