
Real-Time Vision-Based Gesture Recognition Using Haar-like Features

Real-Time Vision-Based Gesture Recognition Using Haar-like Features. By: Qing Chen, Nicolas D. Georganas and Emil M. Petriu IMTC 2007, Warsaw, Poland, May 1-3, 2007. Outline. 1. Introduction 2. Two-level Approach 3. Posture Recognition 4. Gesture Recognition 5. Conclusions.



  1. Real-Time Vision-Based Gesture Recognition Using Haar-like Features By: Qing Chen, Nicolas D. Georganas and Emil M. Petriu IMTC 2007, Warsaw, Poland, May 1-3, 2007

  2. Outline • 1. Introduction • 2. Two-level Approach • 3. Posture Recognition • 4. Gesture Recognition • 5. Conclusions

  3. 1. Introduction • Human-Virtual Environment (H-VE) interaction requires utilizing different modalities (e.g. speech, body position, hand gestures, haptic response, etc.) and integrating them for a more immersive user experience. • Hand gestures are an intuitive yet powerful communication modality that has not been fully explored for H-VE interaction. • The latest computer vision and image processing techniques make real-time vision-based hand gesture recognition feasible for human-computer interaction. • A vision-based hand gesture recognition system must meet requirements for real-time performance, robustness and recognition accuracy.

  4. 1. Introduction (cont’d) • Vision-based gesture recognition techniques can be divided into two categories: • Appearance-based approaches (√): - Pros: simple hand models; efficient implementation; real-time performance easier to achieve. - Cons: limited capability to model 3D hand gestures. - We choose this approach to achieve real-time performance. • 3D hand model-based approaches: - Pros: potential to model more natural hand gestures. - Cons: complex hand model; real-time performance is difficult to achieve; user-dependent.

  5. 2. Two-level Approach • Definition 1 (Posture/Pose) A posture or pose is defined solely by the (static) hand configurations and hand locations. • Definition 2 (Gesture) A gesture is a series of postures over a time span connected by motions (global hand motion and local finger motion).

  6. 2. Two-level Approach (cont’d) • Given the hierarchical nature of these definitions, it is natural to decouple the gesture classification problem into two levels: • Lower level: recognition of primitives (postures); • Solution: Viola and Jones algorithm • Higher level: recognition of structure (gestures); • Solution: grammar-based analysis [Diagram: posture level (Viola & Jones algorithm) and gesture level (grammar-based analysis)]

  7. 3. Posture Recognition • Viola and Jones Algorithm (2001): • A statistical approach originally developed for human face detection and tracking. • 15 times faster than any previous face detection approach while achieving accuracy equivalent to the best published results. • Employs three techniques: • Haar-like features • Integral image • AdaBoosting learning algorithm • Issues for hand postures: • Applicability • Classification besides detection • Selection of posture sets • Calibration

  8. 3. Posture Recognition (cont’d) • Haar-like features: • The value of a Haar-like feature: $f(x) = \sum_{\text{black rectangle}} (\text{pixel gray level}) - \sum_{\text{white rectangle}} (\text{pixel gray level})$ • Compared with raw pixels, Haar-like features reduce in-class variability and increase out-of-class variability, which makes classification easier. Figure 1: The set of basic Haar-like features. Figure 2: The set of extended Haar-like features.
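As a minimal sketch of the feature value defined on this slide, the Python below evaluates a two-rectangle (edge) feature directly from pixel gray levels. The function names, the toy image, and the choice of the left half as the "black" rectangle are illustrative assumptions, not details taken from the presentation.

```python
def rect_sum(image, top, left, height, width):
    """Sum of gray levels inside a rectangle, computed directly from the pixels."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def two_rect_feature(image, top, left, height, width):
    """Edge feature: sum over the left (black) half minus sum over the right (white) half.

    Assumes `width` is even; which half counts as black is an arbitrary choice here.
    """
    half = width // 2
    black = rect_sum(image, top, left, height, half)
    white = rect_sum(image, top, left + half, height, half)
    return black - white


# Toy 4x4 image: bright left half, dark right half (a vertical edge).
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
edge_value = two_rect_feature(image, 0, 0, 4, 4)  # feature straddles the edge
flat_value = two_rect_feature(image, 0, 0, 4, 2)  # feature lies inside a uniform area
```

The feature responds strongly where it straddles the edge and not at all on the uniform patch, which is the sense in which such features separate classes better than raw pixels.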

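  9. 3. Posture Recognition (cont’d) • Rectangular Haar-like features can be computed rapidly using the “integral image”. • The integral image at location (x, y) contains the sum of the pixel values above and to the left of (x, y), inclusive: $ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$, where $i$ is the input image and $ii$ is its integral image. • The sum of pixel values within region “D” can be computed as $P_1 + P_4 - P_2 - P_3$, where $P_1$ to $P_4$ are the integral image values at the corner points of D. [Figure: rectangular regions A, B, C and D with corner points P1, P2, P3 and P4, and a point P (x, y).]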
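The integral image and the four-corner lookup can be sketched in a few lines of Python. This is an illustrative implementation under the inclusive definition given on the slide; the function names and the 3 × 3 test image are assumptions made for the example.

```python
def integral_image(image):
    """ii[y][x] = sum of image[y'][x'] for all y' <= y and x' <= x (inclusive)."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += image[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def region_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom and columns left..right, as P1 + P4 - P2 - P3."""
    def at(y, x):
        # Positions outside the image (index -1) contribute zero.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    p1 = at(top - 1, left - 1)  # just above and left of the region
    p2 = at(top - 1, right)     # just above the region's top-right pixel
    p3 = at(bottom, left - 1)   # just left of the region's bottom-left pixel
    p4 = at(bottom, right)      # the region's bottom-right pixel
    return p1 + p4 - p2 - p3


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
lower_right = region_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Once the integral image is built in a single pass, every rectangle sum costs four lookups regardless of the rectangle's size, which is why the features can be evaluated at any scale in constant time.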

  10. 3. Posture Recognition (cont’d) • To detect the hand, the image is scanned by a sub-window containing a Haar-like feature. • Based on each Haar-like feature $f_j$, a weak classifier $h_j(x)$ is defined as: $h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$, and $h_j(x) = 0$ otherwise, where x is a sub-window, $\theta_j$ is a threshold, and $p_j$ is a parity indicating the direction of the inequality sign.
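A weak classifier of this kind is a single thresholded comparison. The sketch below assumes the standard Viola and Jones form, in which the parity is +1 or -1 and flips the direction of the inequality; the function name and the sample values are hypothetical.

```python
def weak_classify(feature_value, threshold, parity):
    """Return 1 if parity * f < parity * threshold, otherwise 0.

    parity = +1 accepts feature values below the threshold;
    parity = -1 accepts feature values above it.
    """
    return 1 if parity * feature_value < parity * threshold else 0


# With parity -1 the classifier fires on large feature values.
strong_edge = weak_classify(64, threshold=30, parity=-1)
no_edge = weak_classify(0, threshold=30, parity=-1)
# With parity +1 the same threshold accepts small feature values instead.
small_value = weak_classify(10, threshold=30, parity=1)
```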

  11. 3. Posture Recognition (cont’d) • In machine vision: • HARD to find a single accurate classification rule; • EASY to find rules with classification accuracy slightly better than 50% (weak classifiers). • AdaBoosting (Adaptive Boosting) is an iterative algorithm that improves accuracy stage by stage by combining a series of weak classifiers. • Adaptive: later classifiers are tuned to favor the samples misclassified by previous classifiers.

  12. 3. Posture Recognition (cont’d) • AdaBoost starts with a uniform distribution of “weights” over the training examples. The weights tell the learning algorithm the importance of each example. • Obtain a weak classifier, hj(x), from the weak learning algorithm. • Increase the weights on the training examples that were misclassified. • (Repeat) • At the end, form a weighted linear combination of the weak classifiers obtained at all iterations.
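The loop on this slide can be sketched as follows. This is the textbook discrete AdaBoost with labels in {+1, -1} and an exhaustive search over threshold stumps, which may differ in detail from the variant the authors trained with; all function names and the toy data are assumptions.

```python
import math


def stump_predict(x, feature, threshold, parity):
    """Threshold stump on one feature, returning +1 or -1."""
    return 1 if parity * x[feature] < parity * threshold else -1


def best_stump(samples, labels, weights):
    """Weak learner: the stump with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            for parity in (1, -1):
                err = sum(
                    w for x, y, w in zip(samples, labels, weights)
                    if stump_predict(x, feature, threshold, parity) != y
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, parity)
    return best


def adaboost(samples, labels, rounds):
    n = len(samples)
    weights = [1.0 / n] * n                      # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, parity = best_stump(samples, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, feature, threshold, parity))
        # Misclassified examples (y * prediction == -1) gain weight.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, feature, threshold, parity))
            for x, y, w in zip(samples, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def strong_predict(ensemble, x):
    """Sign of the weighted linear combination of the weak classifiers."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: feature 0 is uninformative, feature 1 separates the classes.
samples = [[5, 1], [3, 2], [4, 3], [5, 7], [3, 8], [4, 9]]
labels = [1, 1, 1, -1, -1, -1]
ensemble = adaboost(samples, labels, rounds=3)
predictions = [strong_predict(ensemble, x) for x in samples]
```

In a detector, each stump would threshold one Haar-like feature value, so selecting the best stump in each round also selects which feature to use.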

  13. 3. Posture Recognition (cont’d) • A series of classifiers (a cascade) is applied to every sub-window. • The first classifier: • Eliminates a large number of negative sub-windows; • Passes almost all positive sub-windows, at the cost of a high false positive rate, with very little processing. • Subsequent layers eliminate additional negative sub-windows (those passed by the first classifier) but require more computation. • After several stages of processing, the number of negative sub-windows has been reduced radically.
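The early-rejection behaviour of the cascade can be sketched as below. The stage layout (a list of weighted weak classifiers plus a stage threshold), the function name, and the toy stages are assumptions made for illustration; a trained cascade would use boosted Haar-like features in each stage.

```python
def run_cascade(stages, window_features):
    """Apply the stages in order and stop at the first rejection.

    Each stage is (weak_classifiers, stage_threshold), where weak_classifiers
    is a list of (alpha, h) pairs and h maps the window's features to 0 or 1.
    Returns (accepted, number_of_stages_evaluated).
    """
    for index, (weak_classifiers, stage_threshold) in enumerate(stages, start=1):
        score = sum(alpha * h(window_features) for alpha, h in weak_classifiers)
        if score < stage_threshold:
            return False, index  # rejected: later, costlier stages are skipped
    return True, len(stages)


# Hypothetical two-stage cascade: a cheap first stage, a larger second stage.
stages = [
    ([(1.0, lambda f: 1 if f[0] > 10 else 0)], 0.5),
    ([(0.6, lambda f: 1 if f[1] > 5 else 0),
      (0.4, lambda f: 1 if f[2] < 3 else 0)], 0.5),
]

hand_like = run_cascade(stages, [20, 6, 9])    # passes both stages
background = run_cascade(stages, [2, 6, 1])    # rejected by the first stage
near_miss = run_cascade(stages, [20, 1, 1])    # rejected by the second stage
```

Because most sub-windows in an image are background, most of them exit after the first stage, and only the rare promising windows pay for the full computation.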

  14. 3. Posture Recognition (cont’d) • Four hand postures have been tested with the Viola & Jones algorithm: “two-finger”, “palm”, “fist” and “little finger”. • Input device: a low-cost Logitech QuickCam web camera with a resolution of 320 × 240 at up to 15 frames per second.

  15. 3. Posture Recognition (cont’d) • Training samples collection: • Negative samples: images that must not contain object representations. We collected 500 random images as negative samples. • Positive samples: hand posture images collected from human hands or generated with a 3D hand model. For each posture, we collected around 450 positive samples. For the initial test, we used a white wall as the background.

  16. 3. Posture Recognition (cont’d) • After the training process based on the AdaBoosting learning algorithm, we get a cascade classifier for each hand posture when the required accuracy is achieved: • “Two-finger” posture: 15-stage cascade classifier; • “Palm” posture: 10-stage cascade classifier; • “Fist” posture: 15-stage cascade classifier; • “Little finger” posture: 14-stage cascade classifier. • The performance of trained classifiers for 100 testing images:

  17. 3. Posture Recognition (cont’d) • To recognize these different hand postures, a parallel structure that includes all of the cascade classifiers is implemented:

  18. 3. Posture Recognition (cont’d) • The real-time performance of the posture recognition:

  19. 4. Gesture Recognition • As a gesture is a series of postures, a grammar-based syntactic analysis is suitable for describing composite gestures in terms of postures, which enables the system to recognize gestures from their representations. • For pattern recognition, a grammar G = (N, T, P, S) consists of: • A finite set N of non-terminal symbols; • A finite set T of terminal symbols that is disjoint from N; • A finite set P of production rules; • A distinguished symbol S ∈ N that is the start symbol. • Issues in modeling the structure of hand gestures: • Choice of basic primitives • Choice of appropriate grammar type (context free, stochastic context free, regular, HMM)
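To make the tuple G = (N, T, P, S) concrete, the sketch below encodes a tiny context-free grammar whose terminals are the four tested postures and checks whether a posture sequence derives from a gesture symbol. The gesture names ("Grasp", "Release") and every production rule are hypothetical; the presentation does not specify the grammar it uses.

```python
# T: terminal symbols, here the recognized postures.
TERMINALS = {"palm", "fist", "two_finger", "little_finger"}

# P: production rules keyed by non-terminal (the keys form N).
# These rules are invented for illustration only.
PRODUCTIONS = {
    "Gesture": [["Grasp"], ["Release"]],
    "Grasp": [["palm", "fist"], ["palm", "Grasp"]],      # one or more palms, then a fist
    "Release": [["fist", "palm"], ["fist", "Release"]],  # one or more fists, then a palm
}

# S: the start symbol.
START = "Gesture"


def derives(symbols, tokens):
    """True if the symbol sequence can derive exactly the token sequence.

    A simple top-down search; it terminates here because every recursive
    rule consumes a terminal before recursing.
    """
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in TERMINALS:
        return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])
    return any(derives(body + rest, tokens) for body in PRODUCTIONS[head])


def recognize(postures):
    """Name of the first gesture non-terminal that derives the sequence, or None."""
    for (gesture,) in PRODUCTIONS[START]:
        if derives([gesture], postures):
            return gesture
    return None


grasp = recognize(["palm", "palm", "fist"])
release = recognize(["fist", "palm"])
unknown = recognize(["fist", "fist"])
```

The recursive rules absorb repeated detections of the same posture across consecutive frames, which is one reason a grammar is a convenient description of gestures built from per-frame posture labels.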

  20. 5. Conclusions • The parallel cascade structure based on Haar-like features and the AdaBoosting learning algorithm can achieve satisfactory real-time hand posture classification results; • The experimental results show that the Viola and Jones algorithm is very robust to changes in scale and has a certain degree of robustness against in-plane rotation (±15˚) and out-of-plane rotation; • The Viola and Jones algorithm also shows good performance under different illumination conditions, but poor performance against different backgrounds; • A two-level architecture that captures the hierarchical nature of gesture classification is proposed: the lower level focuses on posture recognition, while the higher level focuses on the description of composite gestures using grammar-based syntactic analysis.

  21. Thank you
