1.48k likes | 1.52k Views
Gesture Recognition. CSE 4310 – Computer Vision Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington. Gesture Recognition. What is a gesture?. Gesture Recognition. What is a gesture? Body motion used for communication. Gesture Recognition.
E N D
Gesture Recognition CSE 4310 – Computer Vision Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Gesture Recognition • What is a gesture?
Gesture Recognition • What is a gesture? • Body motion used for communication.
Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures.
Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures. • Hand gestures (e.g., waving goodbye). • Head gestures (e.g., nodding). • Body gestures (e.g., kicking).
Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures. • Hand gestures (e.g., waving goodbye). • Head gestures (e.g., nodding). • Body gestures (e.g., kicking). • Example applications:
Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures. • Hand gestures (e.g., waving goodbye). • Head gestures (e.g., nodding). • Body gestures (e.g., kicking). • Example applications: • Human-computer interaction. • Controlling robots, appliances, via gestures. • Sign language recognition.
Dynamic Gestures • What gesture did the user perform? Class “8”
Gesture Recognition Example • Recognize 10 simple gestures performed by the user. • Each gesture corresponds to a number, from 0, to 9. • Only the trajectory of the hand matters, not the handshape. • This is just a choice we make for this example application. Many systems need to use handshape as well.
Decomposing Gesture Recognition • We need modules for:
Decomposing Gesture Recognition • We need modules for: • Computing how the person moved. • Person detection/tracking. • Hand detection/tracking. • Articulated tracking (tracking each body part). • Handshape recognition. • Recognizing what the motion means.
Decomposing Gesture Recognition • We need modules for: • Computing how the person moved. • Person detection/tracking. • Hand detection/tracking. • Articulated tracking (tracking each body part). • Handshape recognition. • Recognizing what the motion means. • Motion estimation and recognition are quite different tasks.
Decomposing Gesture Recognition • We need modules for: • Computing how the person moved. • Person detection/tracking. • Hand detection/tracking. • Articulated tracking (tracking each body part). • Handshape recognition. • Recognizing what the motion means. • Motion estimation and recognition are quite different tasks. • When we see someone signing in ASL, we know how they move, but not what the motion means.
Nearest-Neighbor Recognition Query M1 M2 • Question: how should we measure similarity? MN
Motion Energy Images • A simple approach. • Representing a gesture: • Sum of all the motion occurring in the video sequence. • For every frame (except the first one and the last one): • Compute the frame difference image for that frame. • The motion energy image is the sum of all frame difference images.
Motion Energy Images Each image is a motion energy image of a digit from 0 to 9. Can you guess which?
Motion Energy Images 7 6 1 2 9 3 8 0 5 4
Motion Energy Images • Assumptions/Limitations: • No clutter. • We know the times when the gesture starts and ends.
If Hand Location Is Known: • Frequently made assumption: hand location is known in all frames of the training gestures. • The training set is built offline. • In worst case, we can manual annotation to mark the hand locations, or to correct mistakes made by a hand detector. Example training gesture
Comparing Trajectories We can make a trajectory based on the location of the hand at each frame. 6 9 2 4 6 2 8 3 4 5 7 8 1 3 7 5 1
Comparing Trajectories How do we compare trajectories? 6 9 2 4 6 2 8 3 4 5 7 8 1 3 7 5 1
Comparing i-th frame to i-th frame is problematic. What do we do with frame 9? Matching Trajectories 2 2 4 6 8 3 1 5 7 8 9 6 4 3 7 5 1
Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Matching Trajectories 2 2 4 6 8 3 1 5 7 8 9 6 4 3 7 5 1
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))
Matching Trajectories 6 8 6 7 8 2 5 4 1 3 4 9 2 7 3 1 5 M = (M1, M2, …, M8). Q = (Q1, Q2, …, Q9). • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Can be many-to-many. • M1 is matched to Q2 and Q3.
Matching Trajectories 6 8 6 7 8 2 5 4 1 3 4 9 2 7 3 1 5 M = (M1, M2, …, M8). Q = (Q1, Q2, …, Q9). • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Can be many-to-many. • M4 is matched to Q5 and Q6.
Matching Trajectories 6 8 6 7 8 2 5 4 1 3 4 9 2 7 3 1 5 M = (M1, M2, …, M8). Q = (Q1, Q2, …, Q9). • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Can be many-to-many. • M5 and M6 are matched to Q7.
Matching Trajectories 6 8 7 6 5 8 2 3 4 1 9 2 4 7 3 1 5 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Cost of alignment:
Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Cost of alignment: cost(s1, t1) + cost(s2, t2) + … + cost(sm, tn) 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5
Matching Trajectories 6 8 7 6 5 8 2 3 4 1 9 2 4 7 3 1 5 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Cost of alignment: • cost(s1, t1) + cost(s2, t2) + … + cost(sm, tn) • Example: cost(si, ti) = Euclidean distance between locations. • Cost(3, 4) = Euclidean distance between M3 and Q4.
Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Rules of alignment. Is alignment ((1, 5), (2, 3), (6, 7), (7, 1)) legal? 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5
Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Rules of alignment. Is alignment ((1, 5), (2, 3), (6, 7), (7, 1)) legal? Depends on what makes sense in our application. 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5
Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: boundaries s1 = 1, t1 = 1. sp = m = length of first sequence tp = n = length of second sequence. 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 first elements match last elements match
Matching Trajectories Illegal alignment (violating monotonicity): (…, (3, 5), (4, 3), …). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: monotonicity. 0 <= (st+1 - st|) 0 <= (tt+1 - tt|) 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 The alignment cannot go backwards.
Matching Trajectories Illegal alignment (violating continuity). (…, (3, 5), (6, 7), …). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: continuity (st+1 - st|) <= 1 (tt+1 - tt|) <= 1 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 The alignment cannot skip elements.
Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: monotonicity, continuity 0 <= (st+1 - st|) <= 1 0 <= (tt+1 - tt|) <= 1 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 The alignment cannot go backwards. The alignment cannot skip elements.
Dynamic Time Warping Dynamic Time Warping (DTW) is a distance measure between sequences of points. The DTW distance is the cost of the optimal alignment between two trajectories. The alignment must obey the DTW rules defined in the previous slides. 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5
DTW Assumptions • The gesturing hand must be detected correctly. • For each gesture class, we have training examples. • Given a new gesture to classify, we find the most similar gesture among our training examples. • What type of classifier is this?
DTW Assumptions • The gesturing hand must be detected correctly. • For each gesture class, we have training examples. • Given a new gesture to classify, we find the most similar gesture among our training examples. • Nearest neighbor classification, using DTW as the distance measure.
Computing DTW 6 8 7 5 8 6 2 9 2 4 3 1 4 7 3 1 5 Q M • Training example M = (M1, M2, …, M8). • Test example Q = (Q1, Q2, …, Q9). • Each Mi and Qj can be, for example, a 2D pixel location.
Computing DTW • Training example M = (M1, M2, …, M10). • Test example Q = (Q1, Q2, …, Q15). • We want optimal alignment between M and Q. • Dynamic programming strategy: • Break problem up into smaller, interrelated problems (i,j). • Problem(i,j): find optimal alignment between (M1, …, Mi) and (Q1, …, Qj). • Solve problem(1, 1):