1. Vision-Based Recognition of Continuous Dynamic Hand Gestures Yuanxin Zhu
Department of Computer Science & Technology
Tsinghua University, Beijing, China
2. Outline 1. Introduction
2. Literature Review
3. Interaction Model and Prototype System Design
4. Real-time Segmentation of Hand Gestures
5. Parameterized Image Motion Model and Robust Regression
6. Spatio-temporal Appearance Modeling
7. Dynamic Time Warping Based Recognition
8. Experiment Results
9. Summary
10. Future Work
3. 1 Introduction Human-computer interaction (HCI) has become an increasingly important part of our daily lives.
Keyboards and mice are the most popular mode of HCI.
Virtual Reality and Wearable Computing require novel interaction modalities with the following characteristic:
interaction in the way that humans communicate with each other.
Hand gesture is a natural and intuitive communication mode.
Other applications: Sign Language Recognition, video transmission, and so on.
4. 1 Introduction Vision-based recognition of dynamic hand gestures is a challenging interdisciplinary project.
hand gestures are diverse, carry multiple meanings, and vary in space and time.
the human hand is a complex non-rigid object.
computer vision itself is an ill-posed problem.
5. 1 Introduction To recognize continuous dynamic hand gestures:
Design of gesture command set and interaction model.
Real-time segmentation of gesture streams.
Modeling, analysis, and recognition of gestures.
Real-time processing is mandatory for practically using hand gestures in HCI.
6. 2. State of the Art of Hand Gesture Recognition 2.1 Hand gesture taxonomy and interaction model
2.2 Hand gesture modeling
2.3 Hand gesture Analysis
2.4 Hand gesture recognition techniques
7. 2.1 Taxonomy of Gesture for Human-computer Interaction
8. 2.2 Hand Gesture Modeling Fig. 2: Classification of hand gesture models
9. 2.2 Hand Gesture Modeling (a) (b) (c) (d) (e)
Fig.3: Representing the same hand posture by different hand models. (a) 3-D textured volumetric model; (b) 3-D wireframe volumetric model; (c) 3-D skeletal model; (d) Binary silhouette; (e) Contour model.
10. 2.3 Gesture Analysis Gesture detection and feature extraction
skin color clues based approaches
motion clues based approaches
multiple clues based approaches
features include gray image, binary silhouette, moving region, edge, contour, and so on.
11. 2.3 Gesture Analysis Recovering gesture model parameters
Estimation of 3-D hand /arm model parameters
two sets of parameters: angular (joint angles) and linear (palm dimensions)
the initial parameter estimation
the parameter update as the hand gesture evolves in time.
Estimation of appearance based model parameters
image motion estimation (e.g. optical flow)
shape analysis (e.g. computing moments)
histogram based feature parameters (e.g. )
active contour model.
12. 2.4 Gesture Recognition Techniques Fig. 4: Classification of hand gesture recognition techniques
13. 3.1 Interaction Model Strength and weakness of gesture based interaction
Structure of interaction model
users performing gestures follow three steps.
suitable feedback
apply gesture based input to appropriate tasks
A set of rules for designing gesture command set.
Gestures should be performed intentionally and intensively, be easy to learn, be symmetrical ...
14. 3.2 A Prototype System: Gesture-controlled Panoramic Map Browser (a) (b)
Fig. 5: Gesture-controlled panoramic map browser. (a) System setting; (b) User interface.
15. 3.3 Gesture Command Set Four translation gesture commands
move up (1); move down (2); move left (3); move right (4)
Six rotation gesture commands
yaw right (7); yaw left (8); roll clockwise (9); roll counterclockwise (10); pitch down (11); pitch up (12)
Two other gesture commands
zoom in (5); zoom out (6).
16. 4 Real-Time Segmentation of Continuous Dynamic Hand Gestures Goals
segment the moving hand from the background.
partition gesture streams into meaningful sections.
Methodology
integrating multiple clues: skin color, motion.
post-processing (morphological filtering techniques).
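A minimal sketch of this methodology, assuming NumPy only. The chrominance ranges, the motion threshold, the 3x3 opening, and all function names are illustrative assumptions; the slides do not give the actual values or filters.

```python
import numpy as np

def skin_mask(frame_rgb, r_range=(0.36, 0.47), g_range=(0.28, 0.35)):
    """Skin-color cue: threshold normalized r-g chrominance (assumed ranges)."""
    rgb = frame_rgb.astype(np.float64)
    total = rgb.sum(axis=2) + 1e-6          # avoid division by zero on black pixels
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    return ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]))

def motion_mask(prev_gray, curr_gray, thresh=15.0):
    """Motion cue: coarse frame differencing (assumed threshold)."""
    return np.abs(curr_gray - prev_gray) > thresh

def _neighbourhood(mask):
    """Stack the zero-padded 3x3 neighbourhood of every pixel."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def erode(mask):
    return _neighbourhood(mask).all(axis=0)

def dilate(mask):
    return _neighbourhood(mask).any(axis=0)

def segment_hand(prev_rgb, curr_rgb):
    """Fuse both cues, then apply a morphological opening to drop speckle."""
    prev_gray = prev_rgb.astype(np.float64).mean(axis=2)
    curr_gray = curr_rgb.astype(np.float64).mean(axis=2)
    fused = skin_mask(curr_rgb) & motion_mask(prev_gray, curr_gray)
    return dilate(erode(fused))
```

Requiring both cues rejects static skin-colored background and moving non-skin objects; the opening removes isolated pixels that pass both tests by chance.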
17. Fig. 6: Processing flow chart of real-time segmentation
18. 5. Recovering Image Motion Model Parameters by Robust Regression 5.1 Parameterized Image Motion Model
5.2 Constructing Objective Function
5.3 Robust Error Norms
5.4 Simultaneous Over Relaxation with Continuation Method.
5.5 Multi-resolution Analysis.
5.6 Examples of Experiment Results.
19. 5.1 Parameterized Image Motion Models Define:
Translation Model:
Affine Model:
Planar Model:
For example:
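The equations on this slide did not survive extraction. As a hedged reference, the three models are commonly written as follows in the parameterized optical flow literature; the symbols $a_0,\dots,a_5$, $p_0$, $p_1$ and the flow components $u$, $v$ are assumptions and may differ from the author's notation.

```latex
% (u, v): horizontal and vertical image velocity at pixel (x, y),
% with (x, y) measured relative to the region centre.
\begin{align*}
\text{Translation:}\quad & u(x,y) = a_0, & v(x,y) &= a_3 \\
\text{Affine:}\quad & u(x,y) = a_0 + a_1 x + a_2 y, & v(x,y) &= a_3 + a_4 x + a_5 y \\
\text{Planar:}\quad & u(x,y) = a_0 + a_1 x + a_2 y + p_0 x^2 + p_1 x y, & v(x,y) &= a_3 + a_4 x + a_5 y + p_0 x y + p_1 y^2
\end{align*}
```

Each model contains the previous one as a special case, so the number of parameters to be regressed grows from 2 to 6 to 8.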
20. 5.2 Constructing Objective Function Brightness Constancy assumption:
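The slide's formulas are missing from the extracted text. A standard way to state the brightness constancy assumption and the resulting robust objective is sketched below; the notation ($I$, $\mathbf{a}$, $\rho$, $\sigma$, region $R$) is assumed, not taken from the slides.

```latex
% Brightness constancy: a point keeps its intensity as it moves by (u, v).
I(x, y, t) = I\bigl(x + u(x,y),\; y + v(x,y),\; t + 1\bigr)

% First-order Taylor expansion gives the gradient constraint.
I_x\, u + I_y\, v + I_t = 0

% Objective over the hand region R, with motion parameters a and a robust norm rho.
E(\mathbf{a}) = \sum_{(x,y) \in R} \rho\bigl(I_x\, u(x,y;\mathbf{a}) + I_y\, v(x,y;\mathbf{a}) + I_t,\; \sigma\bigr)
```

With a quadratic $\rho$ this reduces to ordinary least squares; a robust $\rho$ limits the influence of pixels that violate the constraint.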
21. 5.3 Robust Error Norms
22. 5.3 Robust Error Norms Fig. 7: Geman-McClure function. (a) Geman-McClure function; (b) Its derivative function.
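For reference, a small sketch of the Geman-McClure norm named in Fig. 7 and its derivative (the influence function). The scale convention used here, with sigma squared in the denominator, is an assumption; some formulations use sigma itself.

```python
import numpy as np

def geman_mcclure_rho(x, sigma):
    """Geman-McClure error norm: x^2 / (sigma^2 + x^2), bounded in [0, 1)."""
    x2 = np.square(x)
    return x2 / (sigma ** 2 + x2)

def geman_mcclure_psi(x, sigma):
    """Derivative of rho with respect to x: 2 x sigma^2 / (sigma^2 + x^2)^2."""
    return 2.0 * x * sigma ** 2 / np.square(sigma ** 2 + np.square(x))
```

Under this convention the influence peaks at |x| = sigma / sqrt(3) and decays towards zero beyond it, so large residuals (outliers) contribute little to the regression.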
23. 5.4 Simultaneous Over Relaxation with Continuation Method
24. 5.5 Multi-resolution Analysis
25. 5.6 Examples of Image Motion Estimation (d) (e) (f)
Fig.9: An example of robust image motion regression. (a) and (b) are the 2nd and 3rd frames in an image sequence. (c) Inliers and outliers identified according to the result of the first regression. (d) Segmentation of the moving hand. (e) Outliers identified according to the result of the second regression. (f) The difference image between (a) and (b).
26. 5.6 Examples of Image Motion Estimation (d) (e) (f)
Fig. 10: Another example of robust image motion regression.
27. 6. Spatio-Temporal Appearance Modeling 6.1. Inter-frame Motion Appearance
6.2. Inner-frame Shape Appearance
6.3. Spatio-temporal Appearance
28. 6.1. Inter-frame Motion Appearance
29. 6.2. Inner-frame Shape Appearance
30. 6.3. Spatio-temporal Appearance
31. 7.1 Dynamic Time Warping
32. 7.2 Modified DTW Our experiments find that the traditional DTW is not adequate to match two spatio-temporal appearance patterns.
Unlike the high sampling rate used in speech recognition, the sampling rate is usually 10 Hz in hand gesture recognition. Therefore, the fluctuation in the time axis of hand gesture patterns is much sharper than that of speech patterns.
A modified DTW algorithm, a kind of non-linear re-sampling technique, is developed to dynamically warp each spatio-temporal pattern to a fixed temporal length while preserving the necessary temporal information and spatial distribution of the original patterns.
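For background, the traditional DTW that the modified algorithm departs from can be sketched as below. This is the textbook recurrence with a Euclidean frame distance, not the author's modified variant, whose details are not given in the slides; the function name is illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a, b: arrays of shape (T,) or (T, D); the lengths may differ.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i, j] = d + min(cost[i - 1, j],      # frame of a repeated in b
                                 cost[i, j - 1],      # frame of b repeated in a
                                 cost[i - 1, j - 1])  # one-to-one step
    return float(cost[n, m])
```

A sequence and a time-stretched copy of it, for example `[1, 2, 3]` and `[1, 1, 2, 2, 3, 3]`, have zero DTW cost, which is the time-invariance property the recognition step relies on.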
33. 7.3 Template based Recognition The distance between two spatio-temporal appearance patterns is calculated based on the correlation between their warped patterns.
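A hedged sketch of template matching on warped patterns. Linear interpolation stands in for the author's modified DTW, and the distance is taken as one minus the normalized correlation; both choices, the default warping length of 16, and all names are assumptions made for illustration.

```python
import numpy as np

def resample_to_length(pattern, length):
    """Warp a (T, D) pattern to a fixed temporal length by linear
    interpolation (a stand-in for the modified DTW, not the real thing)."""
    pattern = np.asarray(pattern, dtype=float)
    src = np.linspace(0.0, 1.0, len(pattern))
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, pattern[:, k])
                     for k in range(pattern.shape[1])], axis=1)

def correlation_distance(p, q):
    """1 - normalized correlation between two equally sized patterns."""
    p = p.ravel() - p.mean()
    q = q.ravel() - q.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    if denom == 0.0:
        return 1.0  # a constant pattern carries no correlation information
    return 1.0 - float(p @ q) / denom

def classify(pattern, templates, length=16):
    """Return the label of the template closest to the warped pattern."""
    warped = resample_to_length(pattern, length)
    distances = {label: correlation_distance(warped, resample_to_length(t, length))
                 for label, t in templates.items()}
    return min(distances, key=distances.get)
```

Usage, with hypothetical two-feature patterns: a template dictionary such as `{"move right": right_pattern, "move left": left_pattern}` is passed to `classify` together with the observed pattern, and the label with the smallest distance is returned.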
34. 8. Experiment Results 8.1 Examples of Hand Gesture Segmentation.
8.2 Choosing Image Motion Models.
8.3 Examples of Spatio-temporal Appearance.
8.4 Examples of Warped Spatio-temporal Appearance.
8.5 Motion Appearance versus Shape Appearance.
8.6 Testing.
35. 8.1 Examples of hand gesture segmentation
36. 8.1 Examples of hand gesture segmentation
37. 8.1 Examples of hand gesture segmentation
38. 8.1 Examples of hand gesture segmentation
39. 8.2 Choosing Image Motion Model
40. 8.3 Examples of Spatio-temporal Appearances
41. 8.3 Examples of Spatio-temporal Appearances
42. 8.3 Examples of Spatio-temporal Appearances
43. 8.3 Examples of Spatio-temporal Appearances
44. 8.4 Determining the Warping Length
45. 8.5 Examples of Warped Spatio-temporal Appearance
46. 8.5 Examples of Warped Spatio-temporal Appearances
47. 8.5 Examples of Warped Spatio-temporal Appearances
48. 8.5 Examples of Warped Spatio-temporal Appearances
49. 8.6 Motion Appearance Vs Shape Appearance To explore the discrimination power of motion appearance or shape appearance separately, two experiments are carried out, one with only motion appearances being feature vectors and the other with only shape appearances being feature vectors.
50. 8.7 Testing Experiment The average recognition rate achieved on the test set is 89.6%.
Gesture-controlled panoramic map controller.
The prototype system can recognize hand gestures performed by a trained user with accuracy ranging from 83% to 92%.
51. 9. Summary Aiming at real-time gesture-controlled human-computer interaction, we propose novel approaches for visual modeling, analysis, and recognition of continuous dynamic hand gestures.
52. 9. Summary A spatio-temporal appearance model is proposed to represent dynamic hand gestures.
The model integrates temporal information, motion and shape appearances.
The motion appearance represents the image appearance changes caused by motion itself, not a temporal sequence of static configurations.
The shape appearance is based on the geometrical features of an ellipse fitted to the hand image region rather than on simple moment-based features.
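The slides do not say how the ellipse is fitted. One common approach, assumed here purely for illustration, takes the ellipse with the same centroid and second-order moments as the binary hand region and reads off its geometric features (centre, axes, orientation).

```python
import numpy as np

def ellipse_from_mask(mask):
    """Fit an ellipse to a binary region via its second-order moments.

    Returns ((cx, cy), semi_major, semi_minor, angle), with angle in
    radians in [0, pi) measured from the x axis. Needs >= 2 region pixels.
    """
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    cov = np.cov(np.vstack([xs - cx, ys - cy]))
    evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    evals = np.maximum(evals, 0.0)            # guard against tiny negatives
    # A uniformly filled ellipse with semi-axis a has variance a^2 / 4 along it.
    semi_minor, semi_major = 2.0 * np.sqrt(evals)
    vx, vy = evecs[:, 1]                      # direction of the major axis
    angle = np.arctan2(vy, vx) % np.pi
    return (cx, cy), semi_major, semi_minor, angle
```

Features such as the centre position, axis lengths, and orientation could then be tracked from frame to frame as shape descriptors.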
53. 9. Summary Novel approaches are developed to extract model parameters by hierarchically integrating multiple clues.
At the low level, fusion of flesh chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures.
At the high level, the model parameters are recovered by integrating fine image motion estimation and shape analysis.
The approaches achieve both real-time processing and high recognition rates.
54. 9. Summary A modified Dynamic Time Warping algorithm is proposed to eliminate the time variation of spatio-temporal appearance patterns caused by varying gesturing rates.
It is a kind of non-linear re-sampling technique.
It preserves the necessary temporal information and spatial distribution of the original patterns.
55. 9. Summary A prototype system, gesture-controlled panoramic map browser, is designed and implemented to demonstrate the usability of gesture-controlled real-time interaction.
Dynamic hand gestures are recognized without resorting to any special markers, restricted or uniform background, or particular illumination.
Only one uncalibrated video camera is utilized.
Higher recognition rates are achieved.
The user may perform continuous hand gestures, starting at any point within the camera's field of view.
56. 10. Future Work We currently assume that the moving skin-colored region in the scene is the gesturing hand, an assumption that fails when a moving human face appears. Exploiting a simple geometrical model of the human body can alleviate this problem, although multiple cameras may then be necessary.
57. 10. Future Work To practically use hand gestures in HCI, more gestural commands will be needed.
Some kinds of commands would be more reasonably input by static hand gestures (hand postures).
On the other hand, speech commands will be an alternative to some gestural commands.
Incorporating hand gesture recognition into a multi-modal interface (MMI) is our next step.
58. Thanks!