
Real Time Gesture Recognition of Human Hand

Real Time Gesture Recognition of Human Hand. Wu Hai, Atid Shamaie, Alistair Sutherland. Overview: What are gestures? What can gestures be used for? How to find a hand in an image? How to recognise its shape? How to recognise its motion? How to find its position in 3D space?


Presentation Transcript


  1. Real Time Gesture Recognition of Human Hand • Wu Hai, Atid Shamaie, Alistair Sutherland

  2. Overview: • What are gestures? • What can gestures be used for? • How to find a hand in an image? • How to recognise its shape? • How to recognise its motion? • How to find its position in 3D space?

  3. What is a Gesture? "A movement of a limb or the body as an expression of thought or feeling." (Oxford Concise Dictionary, 1995)

  4. Mood, emotion • Mood and emotion are expressed by body language • Facial expressions • Tone of voice • Recognising these allows computers to interact with human beings in a more natural way

  5. Human Computer Interface using Gesture • Replace mouse and keyboard • Pointing gestures • Navigate in a virtual environment • Pick up and manipulate virtual objects • Interact with a 3D world • No physical contact with computer • Communicate at a distance

  6. Public Display Screens • Information display screens • Supermarkets • Post Offices, Banks • Allows control without having to touch the device

  7. Sign Language • 5,000 gestures in the vocabulary • Each gesture consists of a hand shape, a hand motion and a location in 3D space • Facial expressions are important • Full grammar and syntax • Each country has its own sign language • Irish Sign Language is different from British Sign Language and American Sign Language

  8. [Figure: images labelled F, C and A]

  9. Datagloves

  10. Datagloves • Datagloves provide very accurate measurements of hand-shape • But they are cumbersome to wear • Expensive • Connected by wires, which restricts freedom of movement

  11. Datagloves: the future • Will get lighter and more flexible • Will get cheaper (~$100) • Wireless?

  12. Our vision-based system • Wireless and flexible • No specialised hardware • Single camera • Real-time

  13. Coloured Gloves • User must wear coloured gloves • Very cheap • Easy to put on • But they get dirty • Eventually we wish to work with natural skin instead of gloves

  14. Pipeline: Colour Segment → Noise Removal → Scale by Area (to 32 × 32)
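
The pipeline on slide 14 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the colour threshold `tol`, the `window_scale` factor and the helper `make_frame` are hypothetical, the noise-removal step is omitted, and "scale by area" is interpreted as cropping a window whose side is proportional to the square root of the segmented area before resampling to 32 × 32.

```python
import math

SIZE = 32  # output resolution shown on the slide

def segment(rgb_image, glove_rgb, tol=60):
    """Binary mask: 1 where a pixel is within `tol` of the glove colour (assumed rule)."""
    return [[1 if sum((c - g) ** 2 for c, g in zip(px, glove_rgb)) <= tol * tol else 0
             for px in row]
            for row in rgb_image]

def normalise(mask, window_scale=2.0):
    """Crop a square window around the centroid, sized by sqrt(area), and
    resample it to SIZE x SIZE with nearest-neighbour sampling."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pts)
    if area == 0:
        return None, 0, None
    cx = sum(x for x, _ in pts) / area
    cy = sum(y for _, y in pts) / area
    side = window_scale * math.sqrt(area)  # window grows with the hand
    out = []
    for j in range(SIZE):
        row = []
        for i in range(SIZE):
            # Map the centre of each output pixel back into the source mask.
            sx = math.floor(cx + ((i + 0.5) / SIZE - 0.5) * side)
            sy = math.floor(cy + ((j + 0.5) / SIZE - 0.5) * side)
            inside = 0 <= sy < len(mask) and 0 <= sx < len(mask[0])
            row.append(mask[sy][sx] if inside else 0)
        out.append(row)
    return out, area, (cx, cy)

def make_frame(size, top_left, square, colour=(0, 200, 0)):
    """Hypothetical test frame: a coloured square on a black background."""
    x0, y0 = top_left
    return [[colour if x0 <= x < x0 + square and y0 <= y < y0 + square else (0, 0, 0)
             for x in range(size)] for y in range(size)]

# The same "hand" seen large (near) and small (far) normalises to the same size.
near, near_area, _ = normalise(segment(make_frame(64, (20, 20), 20), (0, 200, 0)))
far, far_area, _ = normalise(segment(make_frame(64, (20, 20), 10), (0, 200, 0)))
```

The area and centroid returned here are the quantities that slide 19 later uses to recover depth and xy-position.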

  15. Demo • Gesture Video

  16. Feature Space • Each point represents a different image • Clusters of points represent different hand-shapes • Distance between points depends on how similar the images are

  17. A continuous gesture creates a trajectory in feature space • We can project a new image onto the trajectory
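
One way to picture the projection on slide 17 is to treat the trajectory as a polyline through feature space and find the closest point on it. The sketch below assumes exactly that; the polyline representation and the function name `project_onto_trajectory` are illustrative choices, not taken from the slides.

```python
def project_onto_trajectory(point, trajectory):
    """Return (position, distance, closest_point) for the nearest point on a polyline.

    `position` is segment index plus the fraction along that segment, so it
    increases monotonically as the gesture progresses.
    """
    best = None
    for k in range(len(trajectory) - 1):
        a, b = trajectory[k], trajectory[k + 1]
        ab = [bj - aj for aj, bj in zip(a, b)]
        ap = [pj - aj for aj, pj in zip(a, point)]
        denom = sum(v * v for v in ab)
        # Fraction along the segment, clamped so we stay between a and b.
        t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
        closest = [aj + t * v for aj, v in zip(a, ab)]
        dist = sum((pj - cj) ** 2 for pj, cj in zip(point, closest)) ** 0.5
        if best is None or dist < best[1]:
            best = (k + t, dist, closest)
    return best

# Toy 2-D feature space: an L-shaped gesture trajectory and one new image.
trajectory = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
position, distance, closest = project_onto_trajectory((1.0, 0.5), trajectory)
```

The distance says how well the new image fits this gesture, and the position says how far through the gesture the hand is.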

  18. Multiple sub-spaces • Classifying a new unknown image • [Figure: sub-spaces for Gesture 1 and Gesture 2 within the global space]

  19. 3D spatial position of hand • [Figure: camera with x and y axes] • Subspaces and trajectories are calculated with the hand at the origin • We know the image co-ordinates and the area of the hand in the original image • From these we can calculate depth and xy-position
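
The slide does not give the formula, but under a simple pinhole-camera assumption the apparent area of the hand falls off with the square of its depth, so depth can be estimated from one calibrated reference measurement. In the sketch below the reference area, reference depth, focal length in pixels and image centre are all hypothetical calibration inputs.

```python
import math

def hand_position_3d(u, v, area, ref_area, ref_depth, focal_px, centre=(0.0, 0.0)):
    """Estimate (x, y, depth) of the hand from its image position and area.

    Assumes a pinhole camera: area is proportional to 1 / depth**2, so
    depth = ref_depth * sqrt(ref_area / area). The image offset from the
    principal point is then scaled by depth / focal length.
    """
    depth = ref_depth * math.sqrt(ref_area / area)
    x = (u - centre[0]) * depth / focal_px
    y = (v - centre[1]) * depth / focal_px
    return x, y, depth

# A hand that covered 1000 px at 0.5 m now covers 250 px: it is twice as far away.
x, y, depth = hand_position_3d(u=100, v=50, area=250,
                               ref_area=1000, ref_depth=0.5, focal_px=500)
```

This ignores changes in apparent area caused by hand shape and orientation, which is presumably why the subspaces are built with the hand normalised at the origin first.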

  20. [Figure: tree of Yes/No tests leading to the classes A, B and C]

  21. Hierarchical Search • We need to search thousands of images • How to do this efficiently? • We need to use a "coarse-to-fine" search strategy

  22. [Figure: the original image and versions blurred with factors 1, 2 and 3]

  23. Multi-scale Hierarchy • [Figure: hierarchy levels at blurring factors 3.0, 2.0 and 1.0]
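
A coarse-to-fine search over such a hierarchy could look like the sketch below. It is a simplification under assumptions: block averaging stands in for blurring, the `factors` are block sizes rather than the blurring factors on the slides, and the `keep` fraction is an arbitrary pruning rule. A real system would precompute the coarse versions of the database instead of recomputing them per query.

```python
def downsample(img, factor):
    """Block-average an image: a cheap stand-in for blur-then-subsample."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equally sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def coarse_to_fine_search(query, database, factors=(4, 2, 1), keep=0.5):
    """Rank candidates at the coarsest level, prune, and repeat at finer levels."""
    candidates = list(range(len(database)))
    for factor in factors:
        q = downsample(query, factor)
        candidates.sort(key=lambda i: sq_dist(q, downsample(database[i], factor)))
        if factor != factors[-1]:
            # Only the best fraction survives to the next, more expensive level.
            candidates = candidates[:max(1, int(len(candidates) * keep))]
    return candidates[0]

def quadrant_image(k, size=8):
    """Hypothetical test image: a bright block in quadrant k of a dark image."""
    half = size // 2
    x0, y0 = (k % 2) * half, (k // 2) * half
    return [[1.0 if x0 <= x < x0 + half and y0 <= y < y0 + half else 0.0
             for x in range(size)] for y in range(size)]

database = [quadrant_image(k) for k in range(4)]
query = quadrant_image(2)
query[0][0] = 0.2  # small perturbation so the match is not exact
best = coarse_to_fine_search(query, database)
```

Most of the database is rejected using tiny images, so the full-resolution comparison runs on only a few candidates.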

  24. Motion Recognition • Hidden Markov Model (HMM) • Models a gesture as a time sequence of images • [Figure: a feature sequence f is scored against each model, giving P(f | HMM1), P(f | HMM2) and so on, with HMM1 (Hello), HMM2 (Good), HMM3 (Bad), HMM4 (House)]
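
The scoring step on slide 24, computing P(f | HMM) for each gesture model and choosing the largest, can be sketched with the standard forward algorithm. The example assumes discrete observations (for instance, indices of hand-shape clusters); the two toy models borrow the labels "Hello" and "Good" from the slide, but their parameters are invented for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

def classify(obs, models):
    """Return the name of the model that gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Two-state left-to-right models; all numbers below are made up for the example.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    # (start probabilities, transition matrix, emission matrix over symbols 0..2)
    "Hello": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "Good":  ([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

label_a = classify([0, 0, 1, 1], models)  # shape 0 followed by shape 1
label_b = classify([2, 2, 0, 0], models)  # shape 2 followed by shape 0
```

For long sequences the raw products underflow, so practical implementations rescale `alpha` at each step or work with log-probabilities.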

  25. Prediction and Tracking • Given previous frames we can predict what will happen next • Speeds up search • Occlusions
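
The slide does not say which predictor is used. As a minimal illustration of how prediction can speed up the search, the sketch below assumes a constant-velocity model in feature space and only compares the new frame against stored images near the predicted point; the `radius` parameter is a hypothetical tuning value.

```python
def predict_next(history):
    """Constant-velocity prediction: extrapolate from the last two feature points."""
    if len(history) < 2:
        return tuple(history[-1])
    prev, last = history[-2], history[-1]
    return tuple(2 * b - a for a, b in zip(prev, last))

def candidates_near(predicted, database, radius):
    """Indices of stored feature points within `radius` of the prediction."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [i for i, p in enumerate(database) if dist(p, predicted) <= radius]

history = [(0.0, 0.0), (1.0, 2.0)]          # feature points of the previous frames
predicted = predict_next(history)            # where we expect the next frame to land
database = [(0.0, 0.0), (2.1, 3.9), (5.0, 5.0), (1.8, 4.2)]
shortlist = candidates_near(predicted, database, radius=0.5)
```

Only the shortlisted images need a full comparison with the new frame.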

  26. Co-articulation • In fluent dialogue, signs are modified by the preceding and following signs • [Figure: intermediate forms between signs A and B]

  27. Future Work: • Occlusions (Atid) • Grammars in Irish Sign Language: sentence recognition • Body language

  28. Face Recognition

  29. A noisy environment

  30. Errors

  31. Model-based Recognition

  32. Pose-tracking

  33. Facial Expressions • Anger • Fear • Disgust • Happy • Sad • Surprise

  34. Human Body Tracking

  35. Face Recognition • Summary • Single pose • Multiple pose • Principal components analysis • Model-based recognition • Neural Networks

  36. Single Pose • Standard head-and-shoulders view with uniform background • Easy to find face within image

  37. Aligning Images • Faces in the training set must be aligned with each other to remove the effects of translation, scale, rotation etc. • It is easy to find the position of the eyes and mouth and then shift and resize the images so that they are aligned with each other
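
As a sketch of the alignment step, the code below computes the similarity transform (rotation, uniform scale and translation) that moves two detected eye positions onto fixed target positions. It uses only the eyes, not the mouth, and the target coordinates are hypothetical values for a small canonical image. Points are treated as complex numbers so the transform is simply z → a·z + b.

```python
def similarity_from_eyes(left_eye, right_eye,
                         target_left=(10.0, 12.0), target_right=(22.0, 12.0)):
    """Return complex (a, b) such that a * eye + b lands on the target positions."""
    s0, s1 = complex(*left_eye), complex(*right_eye)
    t0, t1 = complex(*target_left), complex(*target_right)
    a = (t1 - t0) / (s1 - s0)  # encodes rotation (angle of a) and scale (abs of a)
    b = t0 - a * s0            # translation
    return a, b

def transform_point(a, b, point):
    """Apply the similarity transform to one (x, y) point."""
    z = a * complex(*point) + b
    return (z.real, z.imag)

# An upright face whose eyes are 24 px apart is shrunk by a factor of 2.
a, b = similarity_from_eyes((40.0, 60.0), (64.0, 60.0))
aligned_right = transform_point(a, b, (64.0, 60.0))
scale = abs(a)
```

Every pixel of the face image would then be resampled through the same transform so that all training faces share one coordinate frame.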

  38. Nearest Neighbour • Once the images have been aligned you can simply search for the member of the training set which is nearest to the test image • There are a number of measures of distance, including Euclidean distance and cross-correlation
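
The two measures named on slide 38 can be compared on a toy example. The sketch below treats each image as a flat list of pixel values and uses the normalised (zero-mean) form of cross-correlation, which is one common variant and an assumption here. The data are invented so that a brightness and contrast change fools Euclidean distance but not the correlation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two flattened images (smaller is better)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalised_cross_correlation(a, b):
    """Zero-mean normalised correlation in [-1, 1] (larger is better)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0

def nearest_neighbour(test, training, labels, use_correlation=False):
    """Label of the training image that best matches the test image."""
    if use_correlation:
        best = max(range(len(training)),
                   key=lambda i: normalised_cross_correlation(test, training[i]))
    else:
        best = min(range(len(training)), key=lambda i: euclidean(test, training[i]))
    return labels[best]

training = [[0, 1, 2, 3], [7, 7, 6, 6]]
labels = ["A", "B"]
test = [5, 7, 9, 11]  # image A with doubled contrast and added brightness

by_distance = nearest_neighbour(test, training, labels)
by_correlation = nearest_neighbour(test, training, labels, use_correlation=True)
```

Here Euclidean distance picks "B" because the raw pixel values are closer, while the correlation picks "A" because the pattern matches once brightness and contrast are factored out.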

  39. Principal Components • PCA reduces the number of dimensions and so the memory requirement is much reduced. • The search time is also reduced

  40. Two ways to apply PCA (1) • We could apply PCA to the whole training set. • Then each face is represented by a point in the PC space • We could then apply nearest neighbour to these points
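
The first approach, PCA over the whole training set followed by nearest neighbour in the reduced space, can be sketched in plain Python. Power iteration with deflation is used here only to keep the example dependency-free; it is a stand-in for whatever eigen-solver a real system would use, and the tiny three-pixel "faces" are invented data.

```python
def mean_vector(data):
    return [sum(col) / len(data) for col in zip(*data)]

def covariance(data, mean):
    centred = [[x - m for x, m in zip(row, mean)] for row in data]
    d = len(mean)
    return [[sum(r[i] * r[j] for r in centred) / (len(data) - 1) for j in range(d)]
            for i in range(d)]

def principal_components(cov, k, iters=200):
    """Top-k eigenvectors of a covariance matrix by power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [1.0 + i for i in range(d)]          # arbitrary non-symmetric start
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Remove the found direction so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components

def project(x, mean, components):
    """Coordinates of x in the principal-component space."""
    return [sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, c)) for c in components]

def nearest_in_pc_space(test, training, labels, mean, components):
    t = project(test, mean, components)
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(t, project(training[i], mean, components)))
    return labels[min(range(len(training)), key=dist)]

faces = [[-2, -2, 0.1], [-1, -1, -0.1], [0, 0, 0.0], [1, 1, 0.1], [2, 2, -0.1]]
labels = ["a", "b", "c", "d", "e"]
mean = mean_vector(faces)
components = principal_components(covariance(faces, mean), k=1)
match = nearest_in_pc_space([0.9, 1.1, 0.0], faces, labels, mean, components)
```

Each face is stored as one number instead of three here, which is the memory and search-time saving that slide 39 describes.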

  41. Two ways to apply PCA (2) • Alternatively we could apply PCA to the set of faces belonging to each person in the training set • Each class (person) is then represented by a different ellipsoid, and Mahalanobis distance can be used to classify a new unknown face • You need a lot of images of each person to do this
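
To illustrate the per-class ellipsoids, the sketch below fits a mean and covariance to each person's points and classifies with the squared Mahalanobis distance. It is restricted to a two-dimensional feature space so the covariance can be inverted in closed form; the two classes and their points are invented.

```python
def fit_class(points):
    """Mean and covariance (sxx, sxy, syy) of one person's 2-D feature points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # zero if the class has too few or degenerate samples
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def classify(x, classes):
    """Name of the class whose ellipsoid is closest to x."""
    return min(classes, key=lambda name: mahalanobis_sq(x, *classes[name]))

classes = {
    # Person A varies a lot along the first axis; person B is a tight cluster.
    "A": fit_class([(-4, 0), (4, 0), (0, -1), (0, 1), (0, 0)]),
    "B": fit_class([(5, 0), (7, 0), (6, -1), (6, 1), (6, 0)]),
}
unknown = (3.5, 0.0)
person = classify(unknown, classes)
```

The unknown point is closer to B's mean in Euclidean terms (2.5 versus 3.5) but is assigned to A, because it lies well inside A's elongated ellipsoid. Estimating a full covariance per person is what makes many images of each person necessary.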

  42. Problems with PCA • The same person may sometimes appear differently due to • Beards, moustaches • Glasses • Makeup • These have to be represented by different ellipsoids

  43. [Figure: panels numbered (2) to (10)]

  44. Problems with PCA • Facial expressions • Differing facial expressions • Opening and closing the mouth • Raised eyebrows • Widening the eyes • Smiling, frowning etc. • These mean that the class is no longer ellipsoidal and must be represented by a manifold

  45. Facial Expressions • There are six types of facial expression: Anger, Fear, Disgust, Happy, Sad, Surprise • We could use PCA on the eyes and mouth, giving "eigeneyes" and "eigenmouths"

  46. Multiple Poses • Heads must now be aligned in 3D world space • Classes now form trajectories in feature space • It becomes difficult to recognise faces because the variation due to pose is greater than the variation between people

  47. Model-based Recognition • We can fit a model directly to the face image • Model consists of a mesh which is matched to facial features such as the eyes, nose, mouth and edges of the face. • We use PCA to describe the parameters of the model rather than the pixels.

  48. Model-based Recognition • The model copes better with multiple poses and changes in facial expression.
