Real Time Gesture Recognition of Human Hand Wu Hai Atid Shamaie Alistair Sutherland
Overview: • What are gestures? • What can gestures be used for? • How to find a hand in an image? • How to recognise its shape? • How to recognise its motion? • How to find its position in 3D space?
What is a Gesture? A movement of a limb or the body as an expression of thought or feeling. --Oxford Concise Dictionary 1995
Mood, emotion • Mood and emotion are expressed by body language • Facial expressions • Tone of voice • Recognising these cues allows computers to interact with human beings in a more natural way
Human Computer Interface using Gesture • Replace mouse and keyboard • Pointing gestures • Navigate in a virtual environment • Pick up and manipulate virtual objects • Interact with a 3D world • No physical contact with computer • Communicate at a distance
Public Display Screens • Information display screens • Supermarkets • Post Offices, Banks • Allows control without having to touch the device
Sign Language • a vocabulary of around 5,000 gestures • each gesture consists of a hand shape, a hand motion and a location in 3D space • facial expressions are important • full grammar and syntax • each country has its own sign language • Irish Sign Language is different from British Sign Language or American Sign Language
Example hand-shapes: F, C, A
Datagloves • Datagloves provide very accurate measurements of hand-shape • But are cumbersome to wear • Expensive • Connected by wires, which restricts freedom of movement
Datagloves - the future • Will get lighter and more flexible • Will get cheaper ~ $100 • Wireless?
Our vision-based system • Wireless and flexible • No specialised hardware • Single camera • Real-time
Coloured Gloves • User must wear coloured gloves • Very cheap • Easy to put on • BUT get dirty • Eventually we wish to use natural skin
Processing pipeline: Colour Segmentation, Noise Removal, Scale by Area, giving a 32 × 32 image
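The segmentation pipeline can be sketched in a few lines of NumPy. Everything below (the RGB thresholds, the neighbour-support noise filter, the function name) is illustrative, not the authors' actual implementation, which would more likely use HSV thresholds and morphological filtering:

```python
import numpy as np

def segment_and_normalise(img, lo, hi, out_size=32):
    """Segment a coloured glove by RGB thresholds, remove isolated noise
    pixels, crop to the hand's bounding box (scale by area), and resample
    to an out_size x out_size binary image."""
    # Colour segmentation: keep pixels whose RGB lies inside [lo, hi].
    mask = np.all((img >= lo) & (img <= hi), axis=-1)
    # Crude noise removal: drop pixels with no 4-connected neighbour.
    p = np.pad(mask, 1)
    support = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    mask = mask & support
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                       # no hand found
        return np.zeros((out_size, out_size), dtype=bool)
    # Scale by area: cropping the bounding box normalises the hand's size.
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Nearest-neighbour resampling down to out_size x out_size.
    ri = np.arange(out_size) * crop.shape[0] // out_size
    ci = np.arange(out_size) * crop.shape[1] // out_size
    return crop[np.ix_(ri, ci)]
```

Normalising to a fixed 32 × 32 image makes every hand image the same size regardless of how close the hand is to the camera, which is what lets later stages compare images directly.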
Demo • Gesture Video
Feature Space • Each point represents a different image • Clusters of points represent different hand-shapes • Distance between points depends on how similar the images are
• A continuous gesture creates a trajectory in feature space • We can project a new image onto the trajectory
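As a sketch of the subspace idea (function names and the use of SVD-based PCA are my own; the system's exact construction may differ): each gesture's frames span a low-dimensional subspace, and a new image is projected into it to find where it lies along the trajectory.

```python
import numpy as np

def fit_subspace(frames, k=3):
    """Fit a k-dimensional PCA subspace to one gesture's frames.
    frames: (n_frames, n_pixels) array, each row a flattened image."""
    mean = frames.mean(axis=0)
    # Rows of vt are orthonormal principal axes in pixel space.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Coordinates of one image in the gesture's subspace; successive
    frames of a gesture trace a trajectory of such points."""
    return (image - mean) @ basis.T
```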
Multiple sub-spaces • Each gesture (Gesture 1, Gesture 2, …) has its own sub-space within the global space • A new unknown image is classified against these sub-spaces
3D spatial position of hand • Subspaces and trajectories are calculated with the hand at the origin • We know the image co-ordinates and the area of the hand in the original image • From these we can calculate depth and x-y position relative to the camera
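A minimal sketch of the depth-from-area idea, assuming a pinhole camera; the focal length and reference calibration constants below are illustrative, not taken from the system. Apparent area falls off as 1/Z², so depth follows from the square root of an area ratio:

```python
import numpy as np

def hand_position_3d(cx, cy, area, f=500.0, ref_area=2000.0, ref_depth=1.0):
    """Estimate the hand's 3-D position from its image centroid (cx, cy)
    and pixel area, assuming a pinhole camera calibrated so that a hand
    at ref_depth projects to ref_area pixels."""
    # Apparent area scales as 1/Z^2, so Z = ref_depth * sqrt(ref_area / area).
    z = ref_depth * np.sqrt(ref_area / area)
    # Back-project image coordinates (measured from the principal point).
    x = cx * z / f
    y = cy * z / f
    return x, y, z
```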
A sequence of Yes/No decisions narrows the search down through the candidate hand-shapes (Y, A, B, C)
Hierarchical Search • We need to search thousands of images • How to do this efficiently? • We need to use a “coarse-to-fine” search strategy
Blurring: the original image smoothed with Blurring Factor = 1, 2 and 3
Multi-scale Hierarchy: search at Factor = 3.0 first, then refine at Factor = 2.0 and Factor = 1.0
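The coarse-to-fine idea can be sketched as follows. This is an assumption-laden toy version: images are flattened to 1-D vectors and a box blur stands in for the Gaussian blurring factors above; the pruning width `keep` is arbitrary.

```python
import numpy as np

def blur(vec, factor):
    """Box blur of a flattened image, standing in for Gaussian smoothing."""
    if factor <= 1:
        return vec
    return np.convolve(vec, np.ones(factor) / factor, mode="same")

def coarse_to_fine(query, templates, factors=(3, 2, 1), keep=4):
    """Coarse-to-fine search: at each blur level keep only the `keep`
    nearest candidates, then refine them at the next sharper level."""
    candidates = list(range(len(templates)))
    for f in factors:
        q = blur(query, f)
        dists = sorted((np.linalg.norm(q - blur(templates[i], f)), i)
                       for i in candidates)
        candidates = [i for _, i in dists[:keep]]
    return candidates[0]
```

Because most templates are discarded at the cheap, heavily blurred level, far fewer full-resolution comparisons are needed than in a flat search over thousands of images.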
Motion Recognition • Hidden Markov Models (HMMs) model the time sequence of images • Each word has its own model: HMM1 (Hello), HMM2 (Good), HMM3 (Bad), HMM4 (House) • A feature sequence f is assigned to the model with the highest likelihood, comparing P(f | HMM1), P(f | HMM2), …
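A compact sketch of that classification rule, using the standard scaled forward algorithm for a discrete-observation HMM (the model parameters and word names below are invented for illustration):

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM with
    initial probabilities pi (S,), transitions A (S, S), emissions B (S, V)."""
    alpha = pi * B[:, obs[0]]
    s = alpha.sum()
    log_p = np.log(s)
    alpha = alpha / s
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()          # rescale each step to avoid underflow
        log_p += np.log(s)
        alpha = alpha / s
    return log_p

def recognise(obs, models):
    """Assign the sequence to the word whose HMM gives it the highest
    likelihood, e.g. comparing P(f | HMM_Hello) with P(f | HMM_Good)."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```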
Prediction and Tracking • Given previous frames we can predict what will happen next • Speeds up the search • Helps the tracker to cope with occlusions
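The simplest form of such prediction is a constant-velocity extrapolation of the hand centroid; a real tracker would more likely use a Kalman filter, but this shows the idea:

```python
def predict_next(positions):
    """Constant-velocity prediction of the next hand centroid from the
    last two observed positions. Predicting narrows the search window in
    the next frame and lets the tracker coast through brief occlusions."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    return (2 * x1 - x0, 2 * y1 - y0)
```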
Co-articulation • In fluent dialogue signs are modified by the preceding and following signs • This produces intermediate forms between signs, e.g. between A and B
Future Work: • Occlusions (Atid) • Grammars in Irish Sign Language. --- Sentence Recognition • Body Language.
Facial Expressions: Anger, Fear, Disgust, Happy, Sad, Surprise
Face Recognition • Summary • Single pose • Multiple pose • Principal components analysis • Model-based recognition • Neural Networks
Single Pose • Standard head-and-shoulders view with uniform background • Easy to find face within image
Aligning Images • Faces in the training set must be aligned with each other to remove the effects of translation, scale, rotation etc. • It is easy to find the position of the eyes and mouth and then shift and resize images so that they are aligned with each other
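One common way to do this is to fit a similarity transform (scale, rotation, translation) that maps the detected eye positions onto fixed canonical positions. The sketch below is illustrative; the canonical coordinates for a 40 × 40 face chip are invented:

```python
import numpy as np

def eye_alignment(left_eye, right_eye,
                  canon_left=(12.0, 16.0), canon_right=(28.0, 16.0)):
    """Similarity transform (scale s, rotation R, translation t) mapping
    detected eye positions to canonical positions, so that an aligned
    point is s * R @ p + t."""
    v = np.subtract(right_eye, left_eye)     # detected inter-eye vector
    w = np.subtract(canon_right, canon_left) # canonical inter-eye vector
    s = np.linalg.norm(w) / np.linalg.norm(v)
    # Rotation taking the direction of v to the direction of w.
    a = np.arctan2(w[1], w[0]) - np.arctan2(v[1], v[0])
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    t = np.asarray(canon_left) - s * R @ np.asarray(left_eye)
    return s, R, t
```

Warping every training and test image with such a transform removes translation, scale and in-plane rotation before any comparison is made.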
Nearest Neighbour • Once the images have been aligned you can simply search for the member of the training set which is nearest to the test image • There are a number of distance measures, including Euclidean distance and cross-correlation
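With Euclidean distance, the whole classifier is a couple of lines (a minimal sketch, assuming faces are already aligned and flattened into the rows of `train`):

```python
import numpy as np

def nearest_neighbour(test, train, labels):
    """Label a test face with the identity of the nearest aligned training
    image (rows of `train`), using Euclidean distance."""
    d = np.linalg.norm(train - test, axis=1)
    return labels[int(np.argmin(d))]
```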
Principal Components • PCA reduces the number of dimensions and so the memory requirement is much reduced. • The search time is also reduced
Two ways to apply PCA (1) • We could apply PCA to the whole training set. • Then each face is represented by a point in the PC space • We could then apply nearest neighbour to these points
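The first approach can be sketched as follows (the SVD-based PCA and the default of k = 10 components are my own choices for illustration):

```python
import numpy as np

def pca_nn(test, train, labels, k=10):
    """Nearest neighbour computed in a k-dimensional PC space instead of
    pixel space: each face is reduced to k coefficients first."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    basis = vt[:k]                      # top-k principal axes
    coords = (train - mean) @ basis.T   # every training face as a point
    q = (test - mean) @ basis.T         # the test face in the same space
    return labels[int(np.argmin(np.linalg.norm(coords - q, axis=1)))]
```

Each comparison now touches k numbers instead of every pixel, which is where the memory and search-time savings come from.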
Two ways to apply PCA (2) • Alternatively we could apply PCA to the set of faces belonging to each person in the training set • Each class (person) is then represented by a different ellipsoid and Mahalanobis distance can be used to classify a new unknown face • You need a lot of images of each person to do this
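A sketch of the second approach: each person's images define a mean and covariance (the ellipsoid), and a new face goes to the class with the smallest Mahalanobis distance. The per-class statistics are assumed to be estimated beforehand from many images:

```python
import numpy as np

def mahalanobis_classify(x, class_stats):
    """Assign x to the class (person) with the smallest Mahalanobis
    distance. class_stats maps name -> (mean, covariance), where each
    person's ellipsoid is estimated from many images of that person."""
    best, best_d = None, np.inf
    for name, (mu, cov) in class_stats.items():
        diff = x - mu
        # Squared Mahalanobis distance: diff^T C^-1 diff.
        d = float(diff @ np.linalg.inv(cov) @ diff)
        if d < best_d:
            best, best_d = name, d
    return best
```

Unlike plain Euclidean distance, this weights each direction by how much that person's appearance actually varies along it.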
Problems with PCA • The same person may sometimes appear differently due to • Beards, moustaches • Glasses, • Makeup • These have to be represented by different ellipsoids
Problems with PCA • Differing facial expressions • Opening and closing the mouth • Raised eyebrows • Widening the eyes • Smiling, frowning etc. • These mean that the class is no longer ellipsoidal and must be represented by a manifold
Facial Expressions • There are six basic types of facial expression: Anger, Fear, Disgust, Happy, Sad, Surprise • We could use PCA on the eyes and mouth, giving eigeneyes and eigenmouths
Multiple Poses • Heads must now be aligned in 3D world space • Classes now form trajectories in feature space • It becomes difficult to recognise faces because the variation due to pose is greater than the variation between people
Model-based Recognition • We can fit a model directly to the face image • Model consists of a mesh which is matched to facial features such as the eyes, nose, mouth and edges of the face. • We use PCA to describe the parameters of the model rather than the pixels.
Model-based Recognition • The model copes better with multiple poses and changes in facial expression.