Activity Analysis of Sign Language Video

Presentation Transcript


  1. Activity Analysis of Sign Language Video Generals exam Neva Cherniavsky

  2. MobileASL goal: • ASL communication using video cell phones over current U.S. cell phone network • Challenges: • Limited network bandwidth • Limited processing power on cell phones • FAQ

  3. Activity Analysis and MobileASL • Use qualities unique to sign language • Signing/Not signing/Finger spelling • Information at beginning and ending of signs

  4. Activity Analysis and MobileASL • Use qualities unique to sign language • Signing/Not signing/Finger spelling • Information at beginning and ending of signs • Decrease cost of sending video

  5. Activity Analysis and MobileASL • Use qualities unique to sign language • Signing/Not signing/Finger spelling • Information at beginning and ending of signs • Decrease cost of sending video • Maximum bandwidth

  6. Activity Analysis and MobileASL • Use qualities unique to sign language • Signing/Not signing/Finger spelling • Information at beginning and ending of signs • Decrease cost of sending video • Maximum bandwidth • Total data sent and received

  7. Activity Analysis and MobileASL • Use qualities unique to sign language • Signing/Not signing/Finger spelling • Information at beginning and ending of signs • Decrease cost of sending video • Maximum bandwidth • Total data sent and received • Power consumption

  8. Activity Analysis and MobileASL • Use qualities unique to sign language • Signing/Not signing/Finger spelling • Information at beginning and ending of signs • Decrease cost of sending video • Maximum bandwidth • Total data sent and received • Power consumption • Processing cost

  9. One Approach: Variable Frame Rate

  10. Variable Frame Rate • Decrease frame rate during “listening” • Goal: reduce cost while maintaining or increasing intelligibility • Maximum bandwidth? NO • Total data sent and received? YES • Power consumption? YES • Processing cost? YES
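
The variable frame rate idea can be sketched as a frame-dropping rule driven by a per-frame activity label. This is a minimal illustration, not the MobileASL implementation: the function names and the frame rates (10 fps while signing, 1 fps while listening, 10 fps capture) are assumptions chosen for the example, since the slides do not state them.

```python
def choose_frame_rate(activity, signing_fps=10, listening_fps=1):
    """Return the target frame rate for an activity label (rates are assumed)."""
    if activity == "signing":
        return signing_fps
    if activity == "listening":
        return listening_fps
    raise ValueError(f"unknown activity: {activity!r}")


def frames_to_send(labels, capture_fps=10, signing_fps=10, listening_fps=1):
    """Return the indices of captured frames that would be transmitted.

    labels holds one activity label per captured frame. After each sent
    frame, the next one is scheduled a whole number of frames later, based
    on the target rate for the label of the frame just sent.
    """
    sent = []
    next_index = 0  # index of the next frame that is due to be sent
    for i, label in enumerate(labels):
        if i >= next_index:
            sent.append(i)
            target = choose_frame_rate(label, signing_fps, listening_fps)
            step = max(1, round(capture_fps / target))
            next_index = i + step
    return sent


if __name__ == "__main__":
    labels = ["signing"] * 5 + ["listening"] * 20
    print(frames_to_send(labels))
```

Fewer frames are sent during listening, which is why total data, power, and processing can drop while the peak bandwidth during signing stays the same. A real system would probably also resume the full rate as soon as signing is detected rather than waiting for the next scheduled frame.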

  11. Demo

  12. The story so far... • Showed variable frame rate can reduce cost (25% savings in bit rate) • Conducted user studies to determine intelligibility of variable frame rate videos • Quality of each frame held constant (data transmitted decreased with decreased frame rate) • Lowering frame rate did not affect intelligibility • Freeze frame thought unnatural

  13. Outline • Introduction • Completed Activity Analysis Research • Feature extraction • Classification • Proposed Activity Analysis Research • Timeline to complete dissertation

  14. Activity Analysis, big picture [diagram labels: Raw Data, Feature Extraction, Classification Engine, Classification, Modification]

  15. Activity Analysis, thus far [diagram labels: Feature Extraction, Classification; classes: Signing, Listening]

  16. Features • H.264 information: • Type of macroblock • Motion vectors

  17. Features cont. • (x,y) motion vector, face • (x,y) motion vector, left • (x,y) motion vector, right • # of I blocks
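
The feature list above amounts to a seven-number vector per frame. The sketch below shows one way such a vector could be assembled from decoded macroblock data. The input layout (a list of dicts with `region`, `mv`, and `intra` keys) and the choice of averaging motion vectors within each region are assumptions made for illustration; the slides do not say how the per-region vector is summarized.

```python
def mean_vector(vectors):
    """Average a list of (x, y) motion vectors; (0.0, 0.0) if the list is empty."""
    if not vectors:
        return (0.0, 0.0)
    n = len(vectors)
    return (sum(x for x, _ in vectors) / n, sum(y for _, y in vectors) / n)


def frame_features(macroblocks):
    """Build [face_x, face_y, left_x, left_y, right_x, right_y, num_I_blocks].

    Each macroblock is a dict (hypothetical layout):
      'region': 'face', 'left', 'right', or None
      'mv':     (x, y) motion vector, ignored for intra-coded blocks
      'intra':  True if the block is an I (intra-coded) block
    """
    features = []
    for region in ("face", "left", "right"):
        vectors = [
            mb["mv"]
            for mb in macroblocks
            if mb["region"] == region and not mb["intra"]
        ]
        features.extend(mean_vector(vectors))
    features.append(sum(1 for mb in macroblocks if mb["intra"]))
    return features


if __name__ == "__main__":
    blocks = [
        {"region": "face", "mv": (2, 0), "intra": False},
        {"region": "face", "mv": (4, 2), "intra": False},
        {"region": "left", "mv": (-1, 1), "intra": False},
        {"region": None, "mv": (0, 0), "intra": True},
    ]
    print(frame_features(blocks))
```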

  18. Classification • Train via labeled examples • Training can be performed offline, testing must be real-time • Support vector machines • Hidden Markov models

  19. Support vector machines • More accurately called support vector classifier • Separates training data into two classes so that they are maximally apart

  20. Maximum margin hyperplane [figure labels: Small Margin, Large Margin, Support vectors]

  21. What if it’s non-linear?

  22. Implementation notes • May not be separable • Use linear separation, but allow training errors • Higher cost for errors = more accurate model, may not generalize • libsvm, publicly available Matlab library • Exhaustive search on training data to choose best parameters • Radial basis kernel function • As originally published, no temporal information • Use “sliding window”, keep track of classification • Majority vote gives result
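
The "sliding window" and majority vote step can be sketched independently of the SVM itself. The example below takes a sequence of per-frame labels (as a frame-by-frame classifier would emit) and replaces each with the most common label among the last few frames. The window length of 3 and the function name are assumptions for illustration only.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=3):
    """Majority vote over the most recent `window` per-frame classifications."""
    recent = deque(maxlen=window)  # oldest label drops out automatically
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        winner, _count = Counter(recent).most_common(1)[0]
        smoothed.append(winner)
    return smoothed


if __name__ == "__main__":
    raw = ["S", "S", "L", "S", "S", "L", "L", "L"]
    print(smooth_labels(raw))
```

With an odd window and two classes, ties can only occur while the window is still filling. The vote suppresses isolated misclassifications at the cost of reacting a frame or two late to a real change of activity.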

  25. SVM Classification Accuracy

  26. Hidden Markov models • Markov model: finite state model, obeys Markov property: $\Pr[X_n = x \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots, X_1 = x_1] = \Pr[X_n = x \mid X_{n-1} = x_{n-1}]$ • Current state depends only on previous state • Hidden Markov model: states are hidden, infer through observations
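
To make the Markov property concrete, here is a small sketch that samples a two-state chain in which the next state is drawn using only the current state. The state names and transition probabilities are made up for illustration and do not come from the presentation.

```python
import random

# Hypothetical transition probabilities: TRANSITIONS[current][next]
TRANSITIONS = {
    "signing": {"signing": 0.9, "listening": 0.1},
    "listening": {"signing": 0.2, "listening": 0.8},
}


def next_state(current, rng):
    """Draw the next state; only `current` is consulted, never earlier history."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def sample_chain(start, length, seed=0):
    """Sample a state sequence of the given length, starting from `start`."""
    rng = random.Random(seed)
    chain = [start]
    while len(chain) < length:
        chain.append(next_state(chain[-1], rng))
    return chain


if __name__ == "__main__":
    print(sample_chain("signing", 10))
```

In a hidden Markov model this state sequence would not be visible. Each state would instead emit an observation (here, a feature vector) and the states would have to be inferred from those observations.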

  27. [Figure: only the probability labels survive in this transcript]

  28. Different models [Figure: only the probability labels survive in this transcript]

  29. Two ways to solve recognition • Given observation sequence O and a choice of models, pick the model that maximizes Pr(O | model) • Speech recognition: which word produced observation? • Given observation sequence and model, find the most likely state sequence • Has been used for continuous sign recognition

  31. Two ways to solve recognition • Given observation sequence O and a model, what is Pr(O | model)? • Speech recognition: which word produced observation? • Given observation sequence and model, find the most likely state sequence • Has been used for continuous sign recognition [Starner95]
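
The first formulation, computing Pr(O | model) and choosing the model that scores highest, is commonly done with the forward algorithm. The sketch below is a plain-Python version for discrete observations; it is not how htk is invoked, and all model parameters in the usage example are invented. It assumes a non-empty observation sequence.

```python
def forward_likelihood(observations, start, trans, emit):
    """Return Pr(O | model) using the forward algorithm.

    start[i]    : probability of beginning in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits observation symbol o
    """
    n = len(start)
    # alpha[i] = Pr(observations so far, current state = i)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


def most_likely_model(observations, models):
    """Return the name of the model with the highest Pr(O | model)."""
    return max(
        models,
        key=lambda name: forward_likelihood(observations, *models[name]),
    )


if __name__ == "__main__":
    # Observation symbols (hypothetical): 0 = low motion, 1 = high motion
    shared_trans = [[0.7, 0.3], [0.0, 1.0]]
    models = {
        "signing": ([1.0, 0.0], shared_trans, [[0.2, 0.8], [0.3, 0.7]]),
        "listening": ([1.0, 0.0], shared_trans, [[0.9, 0.1], [0.8, 0.2]]),
    }
    print(most_likely_model([1, 1, 0, 1], models))
```

The second formulation (most likely state sequence) would use the Viterbi algorithm instead, which replaces the sum over previous states with a maximum and keeps back-pointers.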

  32. Implementation notes • Use htk, publicly available library written in C • Model signing/not signing as “words” • Other possibility is to trace state sequence • Each is a 3 state model, no backward transitions • Must include some temporal info, else degenerate (biased coin flip) • Use 3, 4, and 5 frame window
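
A 3-state model with no backward transitions has an upper-triangular transition matrix. The sketch below builds one such matrix in which each state either repeats or advances to the next. This is only an illustration of the topology: the self-loop probability of 0.6 is an assumed starting value, skips over a state are disallowed here as a simplification, and the layout is not htk's model file format.

```python
def left_to_right_transitions(num_states=3, stay=0.6):
    """Build a transition matrix where state i can only stay or move to i + 1.

    `stay` is a hypothetical initial self-loop probability; training would
    normally re-estimate these values from labeled data.
    """
    if not 0.0 <= stay <= 1.0:
        raise ValueError("stay must be a probability")
    rows = []
    for i in range(num_states):
        row = [0.0] * num_states
        if i == num_states - 1:
            row[i] = 1.0  # final state has nowhere further to go
        else:
            row[i] = stay
            row[i + 1] = 1.0 - stay
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in left_to_right_transitions():
        print(row)
```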

  34. HMM Classification Accuracy

  35. Outline • Motivation • Completed Activity Analysis Research • Proposed Activity Analysis Research • Recognize finger spelling • Recognize movement epenthesis • Timeline to complete dissertation

  36. Activity Analysis, thus far [diagram labels: Feature Extraction, Classification; classes: Signing, Listening]

  37. Activity Analysis, proposed [diagram labels: Feature Extraction, Classification; classes: Signing, Listening, Finger spelling, Movement epenthesis]

  38. Proposed Research • Recognize new activity • Finger spelling • Movement epenthesis (= sign segmentation) • Questions • Why is this valuable? • Is it feasible? • How will it be solved?

  39. Why? Finger spelling • Believe that increased frame rate will increase intelligibility • Will confirm optimal frame rate through user studies

  40. Why? Movement epenthesis • Choose frames so that low frame rate more intelligible • Potentially first step in continuous sign language recognition engine • Irritation must not outweigh savings; verify through user studies

  41. Is it feasible? • Previous (somewhat successful) work: • Direct measure device • Rules-based • Change in motion trajectory, low motion [Sagawa00] • Finger flexion [Liang98] • Previous very successful work (98.8%) • Neural Network + direct measure device • Frame classified as left boundary, right boundary, or interior [Fang01]

  42. Is it feasible? • Previous (somewhat successful) work: • Direct measure device • Rules-based • Change in motion trajectory, low motion [Sagawa00] • Finger flexion [Liang98] • Previous very successful work (98.8%) • Neural Network + direct measure device • Frame classified as beginning of sign, end of sign, or interior [Fang01]

  43. How? • Improved feature extraction • Use the part of sign to inform extraction • See what works from the sign recognition literature • Improved classification

  44. Parts of sign • Handshape • Most work in sign language recognition focused here • Includes expensive techniques (time, power) • Movement • We only use this right now! • Often implicitly recognized in machine learning • Location • Palm orientation • Nonmanual signals (facial expression)

  47. Add center of gravity to features
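
A center of gravity is the mean position of the pixels belonging to a region. The sketch below computes it for a binary mask given as nested lists. What the mask marks (for example, a hand or face region) and how it would be obtained are assumptions; the slide does not specify them.

```python
def center_of_gravity(mask):
    """Return the (x, y) centroid of the truthy cells in a 2-D mask, or None.

    mask is a list of rows; x is the column index and y is the row index.
    """
    count = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None  # empty region: no centroid is defined
    return (sum_x / count, sum_y / count)


if __name__ == "__main__":
    mask = [
        [0, 1, 1],
        [0, 1, 1],
        [0, 0, 0],
    ]
    print(center_of_gravity(mask))
```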

  48. Parts of sign recognized by center of gravity • Handshape • Movement • Location • Palm orientation • Nonmanual signals (facial expression)

  49. Accurate COG • Bayesian filters • Very similar to hidden Markov models • What state are we in, given the (noisy) observations? • Find posterior pdf of state • Kalman filter, particle filter • Viola and Jones [01] object detection

  50. Bayesian filters • Predict • Kalman: add in noise, guess state • Particle: add in noise, guess particle location • Update • Kalman: assume linear system, minimize MSE; measure • Particle: sum of weighted samples; measure, update weights
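
The predict and update steps can be illustrated with the simplest possible Kalman filter: a scalar state (say, one coordinate of the center of gravity) that is assumed to stay roughly constant between frames. The noise variances `q` and `r`, the constant-position model, and the function names are all assumptions for this sketch.

```python
def kalman_step(x, p, z, q=0.01, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new noisy measurement
    q, r : process and measurement noise variances (assumed values)
    """
    # Predict: the state is assumed unchanged, but uncertainty grows by q.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def track(measurements, x0=0.0, p0=1.0, q=0.01, r=1.0):
    """Filter a sequence of measurements and return the state estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    print(track([10.2, 9.7, 10.1, 10.4, 9.9]))
```

A particle filter follows the same two-step pattern but represents the posterior with weighted samples, which lets it handle non-linear motion at a higher processing cost.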
