Action Recognition Robust to Occlusion Using an Efficient Part-Based Approach 強健於遮蔽之一種有效率組件式動作辨識 Student: Jih-Sheng Tsai (蔡日昇) Advisor: Prof. Li-Chen Fu (傅立成) 1
Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 2
Introduction ─ Background • Human action recognition has become popular, with many applications • Human-robot interaction • Surveillance • Video games • A natural interaction style is preferable: over the past 10 years, interfaces have moved from intrusive to cumbersome to natural 4
Introduction ─ Motivation • Human action recognition in complex environments • Occlusion is an important issue for vision-based systems • Occlusion is a challenging problem[1–3] [1] S. Vishwakarma and A. Aggarwal, “A Survey on Activity Recognition and Behavior Understanding in Video Surveillance,” The Visual Computer, 2012. [2] J. Aggarwal and M. S. Ryoo, “Human activity analysis: A review,” ACM Computing Surveys (CSUR), 2011. [3] Weinland et al., “A survey of vision-based methods for action representation, segmentation and recognition,” Computer Vision and Image Understanding, 2010. 5
Introduction ─ Challenges • Occlusion can be categorized into • Self occlusion • Partial occlusion • Temporary complete occlusion (example images of the three cases) 6
Introduction ─ Related Work (comparison table; legend: T.C. = temporary complete, O = yes, ─ = no, ∆ = not mentioned) [1] Ke et al., “Event detection in crowded videos,” IEEE International Conference on Computer Vision, 2007. [2] Ahad et al., “Analysis of motion self-occlusion problem due to motion overwriting for human activity recognition,” Journal of Multimedia, 2010. [3] Weinland et al., “Making action recognition robust to occlusions and viewpoint changes,” European Conference on Computer Vision, 2010. [4] Wang et al., “Robust 3d action recognition with random occupancy patterns,” European Conference on Computer Vision, 2012. 7
Introduction ─ Objective • Build an efficient vision-based action recognition system • Accurately recognize human actions • Robustly handle occlusion • Design an efficient part-based approach • Correctly spot continuous actions 8
Introduction ─ System Overview • Off-line training phase: RGB and depth image sequences → Part-Based Representation → SVM Training → action classifiers, sub-action classifiers, and action prior database • On-line testing phase: RGB and depth image sequences → Part-Based Representation → Action Spotting → Action Recognition (using the trained classifiers and the action prior database) → result 9
Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 10
Part-Based Representation ─ Flowchart RGB and depth image sequences → Preprocessing → Feature Extraction → Part Assignment → Temporal-Pyramid BoW (BoW: Bag-of-Words) 11
Part-Based Representation ─ Preprocessing & Feature Extraction • Human segmentation ▪ To remove complex background ▪ To detect occlusion • Noise removal ▪ To eliminate features outside the human segment • Spatio-temporal interest points[1] ▪ Local features are attractive for handling occlusion ▪ Shape and motion are two important features of an action (non-occlusion vs. occlusion examples) 12 [1] Laptev et al., “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.
Part-Based Representation─ Part Assignment (1/3) • Define part based on skeleton provided by OpenNI • With physical meaning • Take within-class variation into consideration ■ Head ■ Torso ■ Hand ■ Foot 13
Part-Based Representation ─ Part Assignment (2/3) • Some joints of the skeleton are vulnerable when there is occlusion (out of camera view, self occlusion, unreasonable depth) • Every pair of parts is independent in our definition ■ Head ■ Torso ■ Hand ■ Foot (examples of valid joints under self occlusion and partial occlusion) 14
Part-Based Representation ─ Part Assignment (3/3) • Every feature is assigned to one part in a nearest-neighbor scheme, using the set of valid joints, the Euclidean distance between the 2D position of the feature and the 2D position of each joint, and the part label of the nearest joint ■ Head ■ Torso ■ Hand ■ Foot (non-occlusion vs. occlusion examples) 15
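The nearest-neighbor assignment above can be sketched as follows. This is a minimal illustration, not the thesis's actual data format: the joint positions, part labels, and the `valid_joints` structure are assumptions for the example.

```python
import math

def assign_part(feature_xy, valid_joints):
    """Assign a feature to the part of its nearest valid joint.

    feature_xy   -- (x, y) image position of the interest point
    valid_joints -- list of ((x, y), part_label) for joints that passed
                    the validity checks (visible, reasonable depth)
    """
    best_part, best_dist = None, float("inf")
    for (jx, jy), part in valid_joints:
        d = math.hypot(feature_xy[0] - jx, feature_xy[1] - jy)
        if d < best_dist:
            best_dist, best_part = d, part
    return best_part

# Toy skeleton: one joint per part, positions are illustrative.
joints = [((100, 50), "head"), ((100, 120), "torso"),
          ((60, 130), "hand"), ((110, 220), "foot")]
print(assign_part((65, 125), joints))  # → hand
```

Because only valid joints enter the search, features near an occluded limb are simply absorbed by the nearest surviving part rather than being discarded.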
Part-Based Representation ─ Action Representation • An action is represented by a set of RGB-D Temporal-Pyramid BoWs[1] (BoW: Bag-of-Words): one global BoW, plus one per part (Part 1 … Part k) after part assignment [1] J.-S. Tsai and L.-C. Fu, “An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013. 16
Part-Based Representation ─ Temporal-Pyramid Bag-of-Words (1/2) • Bag-of-Words (BoW) • Independent of the number of features • Loses temporal layout 1. Generate a codebook from the training features during the training phase 2. Represent each feature by its type 3. Count the frequency of each feature type in a given sample 17
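The three steps above can be sketched as a histogram over codewords. The hard-coded codebook below stands in for one learned from training descriptors (the deck does not specify the clustering method; k-means is the usual choice), and the L1 normalization is an assumption that makes the histogram independent of the feature count.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    # Step 2: represent each descriptor by its nearest codeword ("type").
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :],
                           axis=2)
    words = dists.argmin(axis=1)
    # Step 3: count the frequency of each type; normalize so the result
    # does not depend on how many features the sample produced.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Step 1 (stand-in): a tiny 3-word codebook in a 2-D descriptor space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
descs = np.array([[0.1, 0.0], [0.9, 1.1], [0.95, 0.9], [0.05, 0.95]])
print(bow_histogram(descs, codebook))  # fractions 0.25, 0.5, 0.25
```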
Part-Based Representation ─ Temporal-Pyramid Bag-of-Words (2/2) • Temporal-Pyramid BoW • Can distinguish actions with reversed temporal orders • The RGB and depth image sequences are split along the time axis at temporal levels 0, 1, …, L−1, and the per-segment BoWs are concatenated into the RGB-D Temporal-Pyramid BoW 18
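A sketch of the pyramid over per-frame codeword labels. Splitting level l into 2**l segments is an assumption borrowed from standard pyramid constructions; the deck only shows levels 0 through L−1. The toy example shows why the pyramid separates temporally reversed actions while a plain BoW cannot.

```python
import numpy as np

def temporal_pyramid_bow(frame_words, vocab_size, levels):
    """Concatenate normalized BoW histograms over ever-finer temporal
    segments (2**l segments at level l, an assumed split)."""
    hists = []
    for l in range(levels):
        for seg in np.array_split(np.asarray(frame_words), 2 ** l):
            h = np.bincount(seg, minlength=vocab_size).astype(float)
            hists.append(h / max(h.sum(), 1.0))
    return np.concatenate(hists)

# Two actions with reversed temporal order over a 2-word vocabulary.
a = [0, 0, 1, 1]
b = [1, 1, 0, 0]
pa = temporal_pyramid_bow(a, 2, levels=2)
pb = temporal_pyramid_bow(b, 2, levels=2)
print(np.allclose(pa[:2], pb[:2]))  # level 0 (plain BoW) is identical
print(np.allclose(pa, pb))          # the full pyramids differ
```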
Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 19
Action Recognition ─ Training (1/2) • Two kinds of SVMs (SVM: Support Vector Machine) are trained for each class • A global SVM • A local SVM for each part • Each SVM is trained in two stages[1] Step 1: Train a preliminary SVM on the training examples Step 2: Search the training database for false positive examples (“hard examples”) and re-train a new version of the SVM with them [1] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” IEEE Conference on Computer Vision and Pattern Recognition, 2005. 20
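The two-stage procedure can be sketched with synthetic data (the real system trains on Temporal-Pyramid BoW vectors). Everything below, including the data distributions and the choice of a linear SVM, is illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, size=(50, 2))    # positive class examples
neg = rng.normal(loc=-2.0, size=(200, 2))  # full negative pool

# Step 1: train a preliminary SVM on an initial subset of the examples.
X1 = np.vstack([pos, neg[:50]])
y1 = np.array([1] * 50 + [0] * 50)
svm = LinearSVC(C=1.0).fit(X1, y1)

# Step 2: scan the whole negative pool for false positives ("hard
# examples") and re-train a new version of the SVM with them included.
hard = neg[svm.predict(neg) == 1]
X2 = np.vstack([X1, hard]) if len(hard) else X1
y2 = np.concatenate([y1, np.zeros(len(hard), dtype=int)])
svm = LinearSVC(C=1.0).fit(X2, y2)

acc = svm.score(np.vstack([pos, neg]), np.array([1] * 50 + [0] * 200))
print(acc)
```

The payoff of step 2 is that the retrained boundary accounts for the negatives the preliminary model got wrong, which matters more as the negative pool grows.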
Action Recognition ─ Training (2/2) • An action prior database is constructed to help action recognition • Some actions are strongly associated with particular parts (e.g., boxing, running, kicking) ■ Head ■ Torso ■ Hand ■ Foot • The prior is computed over the training database from each training example's features belonging to each part, the number of examples labeled as each action class, and a small positive constant for smoothing 21
Action Recognition ─ Action Spotting (1/2) • Starting frame detection: a sliding window-based approach • Get the next window, run the sub-action classifiers, and repeat until an action start is detected 22
Action Recognition ─ Action Spotting (2/2) • Ending frame detection: a sequential approach • Get the next frame and run the action classifiers; when an action end is detected, record a new candidate • Once the time buffer is filled, perform winner selection 23
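The starting-frame loop above can be sketched as a sliding window. The `sub_action_score` callable stands in for the trained sub-action classifiers, and the window length and threshold are illustrative assumptions.

```python
def spot_start(frames, sub_action_score, win=10, thresh=0.5):
    """Slide a fixed-length window over the sequence; return the first
    index whose window the sub-action classifier accepts as an action
    start, or None if no start is found."""
    for t in range(len(frames) - win + 1):
        if sub_action_score(frames[t:t + win]) > thresh:
            return t
    return None

# Toy usage: frames are scalar activity levels and the "classifier" is
# just their mean, so the start fires once activity ramps up.
frames = [0.0, 0.1, 0.0, 0.2, 0.9, 1.0, 0.8, 0.9, 1.0, 0.9, 1.0, 0.8]
score = lambda w: sum(w) / len(w)
print(spot_start(frames, score, win=4))  # → 2
```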
Action Recognition ─ Action Recognition (1/2) • A recognition score is computed for each action class of an input example • After part assignment, the global (part 0) and per-part (part 1 … part k) RGB-D Temporal-Pyramid BoWs each produce an SVM score • The SVM scores are combined by a weighted sum with the action prior information from the action prior database 24
Action Recognition ─ Action Recognition (2/2) • The weight is associated with the degree of occlusion, so actions are recognized mainly by the less occluded parts • Each part's weight is the ratio of valid joints over all joints belonging to that part in a time segment (counting, at each frame, the valid joints among the joints composing the part) • The action is recognized as the class with the maximal recognition score for the input example 25
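The occlusion-weighted decision can be sketched as follows. The part names, scores, and the normalization of the weighted sum are illustrative assumptions; the action prior term from the previous slide is omitted for brevity.

```python
def recognition_score(part_scores, valid_ratio):
    """Weighted sum of per-part SVM scores, weighted by each part's
    ratio of valid (non-occluded) joints over the time segment."""
    total_w = sum(valid_ratio.values())
    return sum(valid_ratio[p] * s for p, s in part_scores.items()) / total_w

def classify(per_class_part_scores, valid_ratio):
    # The class with the maximal recognition score wins.
    return max(per_class_part_scores,
               key=lambda c: recognition_score(per_class_part_scores[c],
                                               valid_ratio))

# The hands are fully visible while the feet are mostly occluded, so the
# hand-driven class wins even though the foot SVM prefers "kicking".
ratio = {"head": 1.0, "torso": 1.0, "hand": 1.0, "foot": 0.2}
scores = {
    "boxing":  {"head": 0.1, "torso": 0.3, "hand": 0.9, "foot": 0.1},
    "kicking": {"head": 0.1, "torso": 0.3, "hand": 0.2, "foot": 0.95},
}
print(classify(scores, ratio))  # → boxing
```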
Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 26
Experiments─ Experimental Setting (1/2) • Experimental platform 27
Experiments ─ Experimental Setting (2/2) • Measurement • Action recognition: precision and recall, computed from true positives, false positives, and false negatives • Action spotting: a detection is correct if it has more than 50% overlap with the ground truth • Classifier type • Non-linear SVM with RBF (Radial Basis Function) kernel 28
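The spotting metric above can be sketched as interval matching. Measuring overlap as intersection-over-union is an assumption; the deck does not state the exact overlap definition.

```python
def overlap(a, b):
    """Intersection-over-union of two (start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, thresh=0.5):
    # A detection is a true positive when it overlaps some ground-truth
    # interval by more than the threshold.
    tp = sum(1 for d in detections
             if any(overlap(d, g) > thresh for g in ground_truth))
    fp = len(detections) - tp
    fn = sum(1 for g in ground_truth
             if not any(overlap(d, g) > thresh for d in detections))
    return tp / (tp + fp), tp / (tp + fn)

dets = [(0, 10), (40, 55), (80, 90)]  # spotted intervals (frames)
gt = [(2, 12), (41, 50)]              # ground-truth intervals
print(precision_recall(dets, gt))     # → (0.666..., 1.0)
```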
Experiments ─ Temporal-Pyramid BoW Evaluation (1/3) • Datasets • KTH[1]: 6 types of actions, 4 scenarios (outdoors; outdoors with scale variation; outdoors with different clothes; indoors) • RGB-D HuDaAct[2]: 12 types of human daily activities [1] Schuldt et al., “Recognizing human actions: a local SVM approach,” International Conference on Pattern Recognition, 2004. [2] Ni et al., “RGBD-HuDaAct: A color-depth video database for human daily activity recognition,” IEEE International Conference on Computer Vision Workshops, 2011. 29
Experiments ─ Temporal-Pyramid BoW Evaluation (2/3) • Validation scheme • Leave-one-subject-out cross validation • Result on KTH [1] Laptev et al., “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008. 30
Experiments ─ Temporal-Pyramid BoW Evaluation (3/3) • Result on RGB-D HuDaAct (comparing the Temporal-Pyramid BoW against Ni et al.[1]) [1] Ni et al., “RGBD-HuDaAct: A color-depth video database for human daily activity recognition,” IEEE International Conference on Computer Vision Workshops, 2011. [2] Zhao et al., “Combing RGB and Depth Map Features for human activity recognition,” Signal & Information Processing Association Annual Summit and Conference, Asia-Pacific, 2012. 31
Experiments ─ Recognition Performance (1/4) • Dataset • 8 types of actions: baseball striking, boxing, jumping, kicking, tennis serving, swimming, running, basketball shooting • 3 cases of occlusion 32
Experiments ─ Recognition Performance (2/4) • Non-occlusion test • Manually segmented actions • Continuous actions (measured by precision and recall over true positives, false positives, and false negatives) 33
Experiments ─ Recognition Performance (3/4) • Occlusion test • Manually segmented actions • Two settings: ▪ Exc/Clean — training data: all non-occlusion data except those involving a subject who also appears in the testing data; testing data: non-occlusion data ▪ Exc/Occlusion — same training data; testing data: occlusion data [1] Weinland et al., “Making action recognition robust to occlusions and viewpoint changes,” European Conference on Computer Vision, 2010. 34
Experiments─ Recognition Performance (4/4) • Occlusion test • Continuous actions 35
Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 36
Conclusion • We proposed an action recognition system that handles occlusion robustly by relying on reliable parts • We proposed an efficient part-based approach with an effective Temporal-Pyramid BoW representation • We used action prior information to help action recognition 37
Future Work • The performance of action spotting still has room for improvement • Parallel computing techniques can be exploited to reduce latency 38
Thank you for listening!! Q&A 39
Appendix ─ Preprocessing & Feature Extraction • Human segmentation • Align the depth image to the RGB image • Normalize the depth values to the range [0, 255] • Segment the human according to depth and motion
Appendix ─ Preprocessing & Feature Extraction • Spatio-temporal interest point[1] • Interest point detection: construct a scale-space by convolving the input image with a spatio-temporal separable Gaussian kernel, then compute the response function (weighted by a constant) [1] Laptev et al., “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.
Appendix ─ Preprocessing & Feature Extraction • Spatio-temporal interest point[1] • Use HOG (Histogram of Oriented Gradients) and HOF (Histogram of Optical Flow) as feature descriptors • Extract features from the RGB image sequence and the depth image sequence separately [1] Laptev et al., “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.
Appendix ─ Preprocessing & Feature Extraction • Noise removal • An interest point is considered noise if there are many non-human pixels around it • Compute the ratio of pixels with zero-valued depth within a window
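The rule above can be sketched directly: reject an interest point when too many pixels in its surrounding window have zero depth (background after segmentation). The window size and ratio threshold are illustrative assumptions.

```python
import numpy as np

def is_noise(depth, x, y, win=5, max_zero_ratio=0.5):
    """Return True when the fraction of zero-depth (non-human) pixels in
    the window around (x, y) exceeds the threshold."""
    h, w = depth.shape
    x0, x1 = max(0, x - win), min(w, x + win + 1)
    y0, y1 = max(0, y - win), min(h, y + win + 1)
    patch = depth[y0:y1, x0:x1]
    return (patch == 0).mean() > max_zero_ratio

# Toy depth map: zero background with a segmented human region.
depth = np.zeros((40, 40))
depth[10:30, 10:30] = 120.0
print(is_noise(depth, 20, 20))  # inside the human → False
print(is_noise(depth, 2, 2))    # in the background → True
```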
Appendix ─ Part Assignment • Invalid joints • Illegitimate joint: a joint with an illegitimate projective position or an unreasonable depth value • Occluded joint: detected from the difference between the regularized depth value and the depth intensity at the joint, compared (together with the adjacent joints of the joint) against two thresholds
Appendix ─ Part Assignment • Complexity • Our approach scales with the number of features and the number of joints • Approaches based on part filters scale with the number of parts, the frame size, and the window size of the part filter
Appendix─ Training • Support Vector Machine (SVM) • Linear SVM
Appendix ─ Training • Support Vector Machine (SVM) • Non-linear SVM • Transform the original input space into a high-dimensional space by a transform function
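The non-linear SVM idea, and the RBF kernel the experiments use, can be illustrated on XOR-shaped data, which no linear SVM can separate. The kernel parameters below are illustrative, chosen so the toy problem is fit exactly.

```python
import numpy as np
from sklearn.svm import SVC

# XOR data: not linearly separable in the original 2-D input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# The RBF kernel implicitly maps inputs into a high-dimensional space
# where a linear separator exists, so the non-linear SVM succeeds.
clf = SVC(kernel="rbf", gamma=4.0, C=100.0).fit(X, y)
print(clf.predict(X))  # recovers the XOR labels 0, 1, 1, 0
```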