
Video Analysis

Mei-Chen Yeh, May 29, 2012

Presentation Transcript


  1. Video Analysis Mei-Chen Yeh May 29, 2012

  2. Outline • Video representation • Motion • Actions in Video

  3. Videos • A natural video stream is continuous in both the spatial and temporal domains. • A digital video stream samples pixels in both domains.

  4. Video processing [Figure labels: YCbCr, YCbCr]

  5. Video signal representation (1) • Composite color signal • R, G, B • Y, Cb, Cr • Why Y, Cb, Cr? • Backward compatibility (black-and-white to color TV) • The eye is less sensitive to changes in the Cb and Cr components • Luminance: Y • Chrominance: Cb + Cr

  6. Video signal representation (2) • Y is the luma component, and Cb and Cr are the blue and red chroma components. [Figure: Y, Cb, and Cr planes]
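
The slides do not give the conversion formulas, so the following is only a minimal sketch of how Y, Cb, and Cr could be computed from R, G, B. It assumes 8-bit full-range values and the BT.601 weights in the JPEG/JFIF convention; broadcast video typically uses a limited-range variant with different offsets and scaling. The function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range 8-bit YCbCr.

    Assumption: BT.601 luma weights, JPEG/JFIF full-range convention.
    """
    # Luma is a weighted sum of R, G, B (green contributes most).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma components are scaled blue/red differences, centered at 128.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp to the 8-bit range.
    return tuple(min(255, max(0, round(c))) for c in (y, cb, cr))


if __name__ == "__main__":
    for name, rgb in [("black", (0, 0, 0)), ("gray", (128, 128, 128)), ("red", (255, 0, 0))]:
        print(name, rgb_to_ycbcr(*rgb))
```

Any gray pixel (R = G = B) maps to Cb = Cr = 128, which is why a black-and-white receiver can use the Y channel alone.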

  7. Sampling formats (1) • 4:4:4 • 4:2:2 (DVB) • 4:1:1 (DV) Slide from Dr. Ding

  8. Sampling formats (2) 4:2:0 (VCD, DVD)
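
The sampling diagrams are not part of the transcript. As a rough illustration, the sketch below counts samples per frame for each format and subsamples a chroma plane to 4:2:0 by averaging each 2x2 block. Averaging is an assumption: real codecs may filter or site chroma samples differently. The names `samples_per_frame` and `subsample_420` are hypothetical.

```python
# Fraction of chroma samples kept per chroma plane, relative to luma.
CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:1:1": 0.25, "4:2:0": 0.25}


def samples_per_frame(width, height, scheme):
    """Total number of Y + Cb + Cr samples in one frame."""
    luma = width * height
    chroma = int(luma * CHROMA_FRACTION[scheme])
    return luma + 2 * chroma


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    `plane` is a list of rows with even width and height (an assumption).
    """
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


if __name__ == "__main__":
    for scheme in CHROMA_FRACTION:
        print(scheme, samples_per_frame(720, 480, scheme))
    print(subsample_420([[1, 3], [5, 7]]))
```

Under these assumptions, 4:2:0 and 4:1:1 both carry half as many samples as 4:4:4; they differ only in where the chroma samples are taken.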

  9. TV encoding system (1) • PAL • Phase Alternating Line is a color encoding system used in broadcast television in large parts of the world. • SECAM • Séquentiel couleur à mémoire (French for "sequential color with memory") is an analog color television system first used in France. • NTSC • National Television System Committee is the analog television system used in most of North America, parts of South America, Burma, South Korea, Taiwan, Japan, the Philippines, and some Pacific island nations and territories.

  10. TV encoding system (2)

  11. Uncompressed bitrate of videos Slide from Dr. Chang
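
The bitrate table from this slide did not survive in the transcript. As a hedged sketch of the underlying arithmetic, uncompressed bitrate can be estimated as width × height × samples per pixel × bits per sample × frame rate. The 720×480, 30 fps, 8-bit example below is a hypothetical configuration, not a figure taken from the slide.

```python
# Average number of samples per pixel (Y plus subsampled Cb and Cr).
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:1:1": 1.5, "4:2:0": 1.5}


def uncompressed_bitrate(width, height, fps, bits_per_sample=8, scheme="4:2:0"):
    """Return the raw video bitrate in bits per second."""
    return width * height * SAMPLES_PER_PIXEL[scheme] * bits_per_sample * fps


if __name__ == "__main__":
    bps = uncompressed_bitrate(720, 480, 30)  # hypothetical SD example
    print(f"{bps / 1e6:.1f} Mbit/s")
```

Even this modest resolution needs well over 100 Mbit/s without compression, which motivates the use of motion for compressing videos listed later in the deck.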

  12. Outline • Video representation • Motion • Actions in Video

  13. Motion and perceptual organization • Sometimes, motion is the foremost cue

  14. Motion and perceptual organization • Even poor motion data can evoke a strong percept

  15. Motion and perceptual organization • Even poor motion data can evoke a strong percept

  16. Uses of motion • Estimating 3D structure • Segmenting objects based on motion cues • Learning dynamical models • Recognizing events and activities • Improving video quality (motion stabilization) • Compressing videos • ……

  17. Motion field • The motion field is the projection of the 3D scene motion into the image

  18. Motion field • P(t) is a moving 3D point • Velocity of scene point: V = dP/dt • p(t) = (x(t), y(t)) is the projection of P in the image • Apparent velocity v in the image: vx = dx/dt, vy = dy/dt • These components are known as the motion field of the image [Figure: P(t) moves to P(t+dt) with velocity V; its projection p(t) moves to p(t+dt) with velocity v]
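
The slide does not say which projection is used. Assuming a pinhole camera with focal length f, so that x = f·X/Z and y = f·Y/Z, differentiating with respect to time gives vx = f·(Vx·Z − X·Vz)/Z² and vy = f·(Vy·Z − Y·Vz)/Z². The sketch below implements that formula and checks it against a finite difference; the names `project` and `motion_field` and the parameter `f` are hypothetical.

```python
def project(P, f=1.0):
    """Pinhole projection of a 3D point P = (X, Y, Z) with Z > 0 (assumed model)."""
    X, Y, Z = P
    return (f * X / Z, f * Y / Z)


def motion_field(P, V, f=1.0):
    """Image velocity (vx, vy) of the projection of P moving with 3D velocity V."""
    X, Y, Z = P
    Vx, Vy, Vz = V
    return (f * (Vx * Z - X * Vz) / Z**2, f * (Vy * Z - Y * Vz) / Z**2)


if __name__ == "__main__":
    P, V, dt = (1.0, 2.0, 4.0), (1.0, 0.0, 2.0), 1e-6
    # Finite-difference check: move P by V*dt and compare the projections.
    P2 = tuple(p + v * dt for p, v in zip(P, V))
    (x1, y1), (x2, y2) = project(P), project(P2)
    print("analytic:", motion_field(P, V))
    print("numeric: ", ((x2 - x1) / dt, (y2 - y1) / dt))
```

Note that motion toward or away from the camera (Vz ≠ 0) changes the image velocity even when Vx = Vy = 0.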

  19. Motion estimation techniques • Based on temporal changes in image intensities • Direct methods • Directly recover image motion at each pixel from spatio-temporal image brightness variations • Dense motion fields, but sensitive to appearance variations • Suitable when image motion is small • Feature-based methods • Extract visual features (corners, textured areas) and track them over multiple frames • Sparse motion fields, but more robust tracking • Suitable when image motion is large

  20. Optical flow • The velocity of observed 2-D motion vectors • Can be caused by • object motions • camera movements • illumination condition changes

  21. Optical flow vs. the true motion field • No motion field but shading changes • Motion field exists but no optical flow

  22. Problem definition: optical flow • How to estimate pixel motion from image I(x,y,t) to image I(x,y,t+dt)? • Solve the pixel correspondence problem: given a pixel in frame I(t), look for nearby pixels of the same color in frame I(t+dt) • This is called the optical flow problem. Key assumptions • color constancy: a point in frame I(t) looks the same in frame I(t+dt) • For grayscale images, this is brightness constancy • small motion: points do not move very far

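  23. Optical flow constraints (grayscale images) Let’s look at these constraints more closely: • brightness constancy: I(x+u, y+v, t+dt) = I(x, y, t) • small motion (u and v are small): using Taylor’s expansion, I(x+u, y+v, t+dt) ≈ I(x, y, t) + (∂I/∂x) u + (∂I/∂y) v + (∂I/∂t) dt

  24. Optical flow equation • Combining these two equations: (∂I/∂x) u + (∂I/∂y) v + (∂I/∂t) dt = 0 • Dividing both sides by dt: (∂I/∂x) vx + (∂I/∂y) vy + ∂I/∂t = 0 • (u, v) is the displacement vector, (vx, vy) = (u/dt, v/dt) is the velocity vector, and (∂I/∂x, ∂I/∂y) is the spatial gradient vector • Known as the optical flow equation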

  25. Q: how many unknowns and equations per pixel? • 2 unknowns, one equation • What does this constraint mean? • The component of the flow perpendicular to the gradient (i.e., parallel to the edge) is unknown • If (vx, vy) satisfies the equation, so does (vx+u’, vy+v’) whenever (u’, v’) is perpendicular to the gradient, i.e. (∂I/∂x) u’ + (∂I/∂y) v’ = 0 [Figure: an edge with its gradient, the flow (vx, vy), and the alternative flow (vx+u’, vy+v’)]

  26. Q: how many unknowns and equations per pixel? • 2 unknowns, one equation • What does this constraint mean? • The component of the flow perpendicular to the gradient (i.e., parallel to the edge) is unknown • This explains the Barber Pole illusion
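
To make the "2 unknowns, one equation" point concrete, the sketch below assumes the standard form of the optical flow equation, Ix·vx + Iy·vy + It = 0, where Ix, Iy, It denote the partial derivatives of the image with respect to x, y, and t. Only the flow component along the gradient (often called the normal flow) is determined; adding any vector along the edge leaves the equation satisfied. The function names and the sample derivative values are hypothetical.

```python
def residual(Ix, Iy, It, vx, vy):
    """Left-hand side of the optical flow equation; zero if (vx, vy) is consistent."""
    return Ix * vx + Iy * vy + It


def normal_flow(Ix, Iy, It):
    """Flow component along the gradient, the only part one pixel determines."""
    g2 = Ix * Ix + Iy * Iy
    if g2 == 0:
        return None  # no gradient: the pixel carries no motion information
    return (-It * Ix / g2, -It * Iy / g2)


if __name__ == "__main__":
    Ix, Iy, It = 3.0, 4.0, -5.0  # hypothetical derivatives at one pixel
    vx, vy = normal_flow(Ix, Iy, It)
    for k in (0.0, 1.0, -2.5):
        # (-Iy, Ix) is parallel to the edge, i.e. perpendicular to the gradient.
        cand = (vx - k * Iy, vy + k * Ix)
        print(cand, residual(Ix, Iy, It, *cand))
```

Every candidate printed has a residual of (numerically) zero, so a single pixel cannot distinguish between them.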

  27. The aperture problem Perceived motion

  28. The aperture problem Actual motion

  29. The barber pole illusion http://en.wikipedia.org/wiki/Barberpole_illusion

  30. The barber pole illusion http://en.wikipedia.org/wiki/Barberpole_illusion

  31. To solve the aperture problem… • We need more equations for a pixel. • Example • Spatial coherence constraint: pretend the pixel’s neighbors have the same (vx,vy) • Lucas & Kanade (1981)
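
A minimal sketch of the spatial coherence idea, in the spirit of Lucas & Kanade but not a reproduction of their full method (no windowing weights, pyramids, or iteration): every pixel in a small window contributes one equation Ix·vx + Iy·vy + It = 0, and the shared (vx, vy) is found by least squares through the 2x2 normal equations. The input format, a list of (Ix, Iy, It) derivative triples, and the function name are assumptions made for illustration.

```python
def lucas_kanade_window(gradients, eps=1e-9):
    """Least-squares flow (vx, vy) for one window.

    `gradients` is an iterable of (Ix, Iy, It) triples, one per pixel.
    Returns None when the 2x2 system is singular (aperture problem remains).
    """
    a = b = c = p = q = 0.0
    for Ix, Iy, It in gradients:
        a += Ix * Ix      # sum of Ix^2
        b += Ix * Iy      # sum of Ix*Iy
        c += Iy * Iy      # sum of Iy^2
        p -= Ix * It      # right-hand side, x component
        q -= Iy * It      # right-hand side, y component
    det = a * c - b * b
    if abs(det) < eps:
        return None
    # Solve [[a, b], [b, c]] @ [vx, vy] = [p, q] by Cramer's rule.
    return ((c * p - b * q) / det, (a * q - b * p) / det)


if __name__ == "__main__":
    # Synthetic window consistent with a true flow of (1, -2).
    window = [(1.0, 0.0, -1.0), (0.0, 1.0, 2.0), (1.0, 1.0, 1.0)]
    print(lucas_kanade_window(window))
    # All gradients parallel (a straight edge): flow cannot be recovered.
    print(lucas_kanade_window([(1.0, 0.0, -1.0), (2.0, 0.0, -2.0)]))
```

The singular case corresponds to a window that contains only a single edge direction, which is exactly the situation in the barber pole illusion.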

  32. Outline • Video representation • Motion • Actions in Video • Background subtraction • Recognition of actions based on motion patterns

  33. Using optical flow: recognizing facial expressions • Recognizing Human Facial Expression (1994) by Yaser Yacoob, Larry S. Davis

  34. Example use of optical flow: visual effects in films http://www.fxguide.com/article333.html

  35. Slide credit: Birgi Tamersoy

  36. Background subtraction • Simple techniques can do ok with static camera • …But hard to do perfectly • Widely used: • Traffic monitoring (counting vehicles, detecting & tracking vehicles, pedestrians), • Human action recognition (run, walk, jump, squat), • Human-computer interaction • Object tracking

  37. Slide credit: Birgi Tamersoy

  38. Slide credit: Birgi Tamersoy

  39. Slide credit: Birgi Tamersoy

  40. Slide credit: Birgi Tamersoy

  41. Frame differences vs. background subtraction • Toyama et al. 1999

  42. Slide credit: Birgi Tamersoy

  43. Pros and cons Advantages: • Extremely easy to implement and use • Fast • Background models need not be constant; they change over time Disadvantages: • Accuracy of frame differencing depends on object speed and frame rate • Median background model: relatively high memory requirements • Setting global threshold Th… Slide credit: Birgi Tamersoy
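
The slides that defined these techniques survive only as credits, so the following is a sketch under common assumptions rather than the deck's exact formulation: frame differencing marks a pixel as foreground when |frame(t) − frame(t−1)| > Th, and the median model uses the per-pixel median of the last n frames as the background. Frames are grayscale images stored as lists of rows; all function names are hypothetical.

```python
from statistics import median


def threshold_difference(image, reference, th):
    """Foreground mask: True where |image - reference| exceeds the global threshold."""
    return [
        [abs(v - r) > th for v, r in zip(img_row, ref_row)]
        for img_row, ref_row in zip(image, reference)
    ]


def median_background(frames):
    """Per-pixel median over a buffer of frames (all frames must be kept in memory)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    frames = [[[10, 10]], [[10, 200]], [[12, 10]]]  # three tiny 1x2 frames
    # Frame differencing: compare the last two frames.
    print(threshold_difference(frames[2], frames[1], th=30))
    # Median background: compare a frame against the median of the buffer.
    bg = median_background(frames)
    print(bg, threshold_difference(frames[1], bg, th=30))
```

The frame buffer passed to `median_background` is the source of the memory cost noted above, and the single `th` value is the global threshold whose choice the slide flags as a disadvantage.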

  44. Background subtraction with depth How can we select foreground pixels based on depth information? Leap: http://www.leapmotion.com/
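
One possible answer to the question on this slide, offered as an assumption rather than the method the lecture had in mind: if the objects of interest are closer to the sensor than the background, a pixel can be labeled foreground when its depth is below a cutoff. The sketch treats a depth of 0 as "no measurement", which is a convention of some depth sensors and not something the slide states; `depth_foreground`, `max_depth`, and `invalid` are hypothetical names.

```python
def depth_foreground(depth, max_depth, invalid=0):
    """Foreground mask: True where a valid depth reading is closer than max_depth.

    `depth` is a list of rows of distances (units are up to the caller).
    """
    return [[d != invalid and d < max_depth for d in row] for row in depth]


if __name__ == "__main__":
    depth_map = [[0, 500, 1500], [800, 2000, 0]]  # hypothetical readings in mm
    print(depth_foreground(depth_map, max_depth=1000))
```

Unlike intensity-based subtraction, this rule is unaffected by shading or illumination changes, though it depends on choosing a sensible cutoff for the scene.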

  45. Outline • Video representation • Motion • Actions in video • Background subtraction • Recognition of action based on motion patterns

  46. Motion analysis in video • “Actions”: atomic motion patterns -- often gesture-like, single clear-cut trajectory, single nameable behavior (e.g., sit, wave arms) • “Activity”: series or composition of actions (e.g., interactions between people) • “Event”: combination of activities or actions (e.g., a football game, a traffic accident) Modified from Venu Govindaraju

  47. Surveillance http://users.isr.ist.utl.pt/~etienne/mypubs/Auvinetal06PETS.pdf

  48. Interfaces • https://flutterapp.com/

  49. Human activity in video: basic approaches • Model-based action/activity recognition: • Use human body tracking and pose estimation techniques, relate to action descriptions • Major challenge: accurate tracks in spite of occlusion, ambiguity, low resolution • Activity as motion, space-time appearance patterns • Describe overall patterns, but no explicit body tracking • Typically learn a classifier • We’ll look at a specific instance…

  50. The 30-Pixel Man • Recognize actions at a distance [ICCV 2003] • Low resolution, noisy data, not going to be able to track each limb. • Moving camera, occlusions • Wide range of actions (including non-periodic) [Efros, Berg, Mori, & Malik 2003] http://graphics.cs.cmu.edu/people/efros/research/action/
