
Using a Webcam as a Game Controller



  1. Using a Webcam as a Game Controller Jonathan Blow GDC 2002

  2. Motivation • A potentially rich control paradigm, allowing for nuance. • Removes the barrier of some funny plastic controller. • Successful experiment: Konami’s Police 911

  3. My game: Air Guitar • A “beat-matching” game where you stand and play air guitar to your favorite songs. • Previous beat-matching games (Parappa, DDR) are very digital; I want to use a webcam to make Air Guitar more organic and to allow the user to be expressive. • Technically demanding as a vision app (needs semantics about what is what).

  4. Real-World Concerns • Noise • Illumination changes • Camera auto-adjusts • Background changes / camera moves • Shadows • Camera saturation / under-excitement

  5. Varying Lighting Conditions • Can’t rely on RGB values to identify pixels • Need context… hmm… this becomes a hard AI problem.

  6. Vision Techniques That Suck • Background subtraction (shadows, motion!) • Noise reduction by smoothing (resolution!) • Turning functions (unstable) • Frame coherence (just a band-aid) • Edge detection • Hysteresis (Latin for “cheap hack”) • Discreteness

  7. General Paradigm • Technique should: • Work on a still image • Be robust: avoid discrete decisions wherever possible. • Work in as general a case as we can manage, but we won’t strive to be ideally general. • We will do “whatever it takes” to get the job done.

  8. Restrained Ambition • Only trying to roughly determine the positions of torso and arms • Okay to say “the user must wear a long-sleeved shirt of uniform color that contrasts with the background” • We won’t dictate the color of the shirt (too restrictive!) • We won’t dictate colors of other things (user’s skin, background).

  9. Early Segmentation • Divide up the image into regions of “like” pixels to ease computation. • Ad hoc technique: iterate over scanlines potentially adding each pixel to its neighbor’s group. • This technique sucks.
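
  A minimal sketch of this ad hoc grouping, to make the idea concrete. The pixel type, threshold value, and colors_alike test are all illustrative assumptions; the union-find merge also hints at why the method is fragile, since one noisy pixel can split or fuse groups from frame to frame.

    #include <math.h>
    #include <vector>

    struct Pixel { float r, g, b; };

    // Assumed similarity test: a simple per-channel threshold (placeholder).
    bool colors_alike(const Pixel &a, const Pixel &b) {
        const float T = 0.05f;
        return fabsf(a.r - b.r) < T && fabsf(a.g - b.g) < T && fabsf(a.b - b.b) < T;
    }

    // Union-find root lookup with path halving, used to merge group labels.
    int find_root(std::vector<int> &parent, int i) {
        while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    // One pass over scanlines: each pixel joins its left or upper neighbor's
    // group when the colors are "alike", otherwise it starts a new group.
    std::vector<int> segment(const Pixel *image, int width, int height) {
        std::vector<int> label(width * height);
        std::vector<int> parent;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = y * width + x, group = -1;
                if (x > 0 && colors_alike(image[i], image[i - 1]))
                    group = find_root(parent, label[i - 1]);
                if (y > 0 && colors_alike(image[i], image[i - width])) {
                    int up = find_root(parent, label[i - width]);
                    if (group >= 0 && up != group) parent[up] = group;  // merge
                    else group = up;
                }
                if (group < 0) { group = (int)parent.size(); parent.push_back(group); }
                label[i] = group;
            }
        }
        for (int i = 0; i < width * height; i++) label[i] = find_root(parent, label[i]);
        return label;
    }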

  10. The Unreasonable Instability of Approximate Clustering • “Real” clustering is slow • “Loose” clustering is interactively unstable • Even a small amount of camera noise makes things go berserk… motion is even worse. • Clustering is about continuous ==> discrete. We wanted to avoid that, so we should be very careful.

  11. My solution: Be Inflexible • Simply divide the image into square regions of constant size. • If any region needs more detail, subdivide it. • Noise still affects this system (some regions subdividing / recombining from frame to frame) but it’s relatively stable.
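
  A sketch of the fixed grid, assuming the “needs more detail” test is a color-variance threshold; the talk doesn't specify the criterion, and all names and constants here are hypothetical.

    // A square grid cell that subdivides, quadtree-style, when it needs
    // more detail. color_variance is an assumed helper measuring how
    // uniform the pixel colors inside the square are.
    float color_variance(int x, int y, int size);

    struct Region {
        int x, y, size;        // square extent in pixels
        Region *children[4];   // null when this region is a leaf
    };

    const float DETAIL_LIMIT = 0.01f;  // hypothetical variance threshold
    const int   MIN_SIZE     = 4;      // stop subdividing below this

    void maybe_subdivide(Region *r) {
        for (int i = 0; i < 4; i++) r->children[i] = 0;
        if (r->size <= MIN_SIZE) return;
        if (color_variance(r->x, r->y, r->size) < DETAIL_LIMIT) return;
        int h = r->size / 2;
        int ox[4] = { 0, h, 0, h }, oy[4] = { 0, 0, h, h };
        for (int i = 0; i < 4; i++) {
            Region *c = new Region;
            c->x = r->x + ox[i]; c->y = r->y + oy[i]; c->size = h;
            maybe_subdivide(c);
            r->children[i] = c;
        }
    }

  Because the cell boundaries never move, noise can only toggle a cell's subdivision level; it can't shift the cells themselves, which is where the relative stability comes from.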

  12. Which color space do we work in? • Want to group pixels that are “alike”: nearby in some color space. • Choices: nonlinear RGB, linear-light RGB, CIE LAB, many others. • CIE LAB produced nicer results for some ad hoc segmentation experiments, but is expensive to compute. • Linear-light RGB is the right thing for inverse rendering techniques; it is cheap to compute. • I started with CIE LAB, but now use linear RGB.
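
  For reference, a sketch of the nonlinear-to-linear conversion, assuming the camera output is approximately sRGB-encoded; real webcams vary, and a plain 2.2 power curve is a common fallback when the transfer curve is unknown.

    #include <math.h>

    // Convert one gamma-encoded channel (0..1) to linear light (sRGB curve).
    float srgb_to_linear(float c) {
        if (c <= 0.04045f) return c / 12.92f;
        return powf((c + 0.055f) / 1.055f, 2.4f);
    }

    // In practice, precompute a 256-entry table so per-pixel work is a lookup.
    float linear_table[256];
    void init_linear_table() {
        for (int i = 0; i < 256; i++) linear_table[i] = srgb_to_linear(i / 255.0f);
    }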

  13. Simple Inverse Rendering • Assume all surfaces have Lambertian reflectance • p = m·l·cos θ, where θ is the angle between the light direction and the surface normal. • Can’t disambiguate material color m from illuminant color l • The compound color ml, under varying scale, forms a vector through the origin in RGB space. • This is a much more specific relation than e.g. Euclidean distance.
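
  A sketch of the comparison this relation suggests: since varying cos θ only scales the compound color ml, pixels of one material under one light lie near a single ray through the RGB origin, so colors are compared by direction (angle) rather than by Euclidean distance. The angle threshold below is a hypothetical value.

    #include <math.h>

    struct Vec3 { float x, y, z; };

    float dot(const Vec3 &a, const Vec3 &b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    float length(const Vec3 &v) { return sqrtf(dot(v, v)); }

    // Two linear-light RGB colors match the same compound color ml when their
    // vectors point in nearly the same direction; brightness (cos θ) only
    // scales them along that direction.
    bool same_compound_color(const Vec3 &a, const Vec3 &b) {
        float la = length(a), lb = length(b);
        if (la < 1e-6f || lb < 1e-6f) return false;  // too dark to judge direction
        float cos_angle = dot(a, b) / (la * lb);
        return cos_angle > 0.998f;  // about 3.6 degrees; tune for camera noise
    }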

  14. Covariance Bodies: • 5 numbers’ worth of storage • Ellipsoid-shaped (take eigenvectors of matrix) • Statistical significance: expected value of points • Advantage: consistency under summation • Can use them to vaguely characterize shapes. • Generalizes to n dimensions.
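
  A sketch of a 2-D covariance body stored as raw moments, which is exactly what makes summation consistent; mean, covariance, and the ellipse axes are derived on demand. Names are illustrative, not from the talk.

    #include <math.h>

    struct CovarianceBody2 {
        float mass;            // number of points (or total weight)
        float sx, sy;          // first moments:  Σx, Σy
        float sxx, sxy, syy;   // second moments: Σx², Σxy, Σy²

        void add_point(float x, float y) {
            mass += 1; sx += x; sy += y;
            sxx += x*x; sxy += x*y; syy += y*y;
        }
        void add_body(const CovarianceBody2 &o) {  // summing two bodies
            mass += o.mass; sx += o.sx; sy += o.sy;
            sxx += o.sxx; sxy += o.sxy; syy += o.syy;
        }
        // Eigenvalues of the 2x2 covariance matrix are the variances along
        // the ellipse axes; the eigenvectors give the axis directions.
        void get_ellipse(float &lambda0, float &lambda1) const {
            float mx = sx / mass, my = sy / mass;
            float cxx = sxx / mass - mx*mx;
            float cxy = sxy / mass - mx*my;
            float cyy = syy / mass - my*my;
            float tr = cxx + cyy, det = cxx*cyy - cxy*cxy;
            float d = sqrtf(tr*tr*0.25f - det);
            lambda0 = tr*0.5f + d;  // major axis variance
            lambda1 = tr*0.5f - d;  // minor axis variance
        }
    };

  The eigenvalue ratio is what vaguely characterizes shape: a large ratio means a long, thin region, while a ratio near 1 means a round one.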

  15. Covariance Bodies for Color Plane Fitting • Least-squares plane fit uses the same matrix. • Track the right-hand side as well: 3 more numbers. • Sum these to get group plane fits. • (example)
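
  Continuing the previous sketch: the least-squares fit of a plane z = a·x + b·y + c uses normal equations whose 3x3 matrix is built from the covariance body's moments; only the right-hand side (Σxz, Σyz, Σz) is extra storage, and it sums across regions the same way. solve3x3 stands in for any small linear solver.

    // Assumed helper: solve the 3x3 system m·out = r; returns false if singular.
    bool solve3x3(const float m[3][3], const float r[3], float out[3]);

    struct PlaneFitBody {
        CovarianceBody2 body;   // supplies mass, Σx, Σy, Σx², Σxy, Σy²
        float sxz, syz, sz;     // the tracked right-hand side: 3 more numbers

        void add_point(float x, float y, float z) {
            body.add_point(x, y);
            sxz += x*z; syz += y*z; sz += z;
        }
        void add_body(const PlaneFitBody &o) {  // group fits by summation
            body.add_body(o.body);
            sxz += o.sxz; syz += o.syz; sz += o.sz;
        }
        // Normal equations for least-squares z = a·x + b·y + c:
        //   [Σxx Σxy Σx ][a]   [Σxz]
        //   [Σxy Σyy Σy ][b] = [Σyz]
        //   [Σx  Σy  n  ][c]   [Σz ]
        bool fit(float &a, float &b, float &c) const {
            float m[3][3] = { { body.sxx, body.sxy, body.sx   },
                              { body.sxy, body.syy, body.sy   },
                              { body.sx,  body.sy,  body.mass } };
            float r[3] = { sxz, syz, sz }, out[3];
            if (!solve3x3(m, r, out)) return false;
            a = out[0]; b = out[1]; c = out[2];
            return true;
        }
    };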

  16. Calibration Mode • Stand in a fixed pose • Pose designed to be easily recognizable • Gives us things that help later: • Body measurements • Background of scene • Shirt color (and histogram) • Skin color • Coarse model of environment illumination

  17. How We Recognize This Pose • Pick a color to look for; isolate it. • Project this color to the X and Y image axes • Find spikes in projection • Use heuristics to judge shape and give a confidence value: • Outliers • Relative spike sizes • Screen real-estate occupied • (example)
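
  A sketch of the projection step, assuming a binary mask marking pixels of the isolated color. The spike test (contiguous runs above half the peak) is a hypothetical heuristic; the Y-axis projection is identical but per row.

    #include <vector>

    // Project the color mask onto the X axis: count matching pixels per column.
    std::vector<int> project_to_x(const bool *mask, int width, int height) {
        std::vector<int> counts(width, 0);
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (mask[y * width + x]) counts[x]++;
        return counts;
    }

    // A spike is a contiguous run of columns that stays above the threshold.
    struct Spike { int start, end, total; };

    std::vector<Spike> find_spikes(const std::vector<int> &proj) {
        int peak = 0;
        for (size_t i = 0; i < proj.size(); i++) if (proj[i] > peak) peak = proj[i];
        int threshold = peak / 2;  // hypothetical cutoff
        std::vector<Spike> spikes;
        for (size_t i = 0; i < proj.size(); ) {
            if (proj[i] <= threshold) { i++; continue; }
            Spike s; s.start = (int)i; s.total = 0;
            while (i < proj.size() && proj[i] > threshold) s.total += proj[i++];
            s.end = (int)i - 1;
            spikes.push_back(s);
        }
        return spikes;
    }

  The heuristics above then score the spikes (relative sizes, outliers, screen area covered) to produce the confidence value.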

  18. Try many colors. • Sort colors present in scene by popularity; cluster them. • Create a fuzzy color cone through each cluster. • Vary the cone radius. • Do the recognition listed on the previous slide; select the color cone with the best score. • Fixed color grid (to combat instability!)
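
  A sketch of fuzzy cone membership, reusing the Vec3 helpers from the slide-13 sketch. The cone passes through the RGB origin around a cluster's mean color direction, matching the compound-color relation; the linear falloff is an assumption, since the talk only says “fuzzy”.

    struct ColorCone {
        Vec3  axis;        // unit vector: the cluster's mean color direction
        float cos_radius;  // cosine of the cone half-angle (this is what we vary)
    };

    // Returns membership in 0..1: 1 on the cone's axis, 0 at or beyond its edge.
    float cone_membership(const ColorCone &cone, const Vec3 &rgb) {
        float len = length(rgb);
        if (len < 1e-6f) return 0.0f;           // too dark to classify
        float c = dot(cone.axis, rgb) / len;    // cosine of angle to axis
        if (c <= cone.cos_radius) return 0.0f;  // outside the cone
        return (c - cone.cos_radius) / (1.0f - cone.cos_radius);
    }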

  19. (demo of calibration mode)

  20. Head Finding • Many heuristics: • Medium-detail region (flatness + sharpness) • But not a long sharp edge • Compact body • Skin-colored • Not the background

  21. Skin color? • Fit points in RGB space with an approximating surface? • Where do I get a good skin color database?

  22. www.hotornot.com! • I get to work and check people out at the same time. • (app demo)

  23. Gameplay Recognition Mode • Goal: Find positions of user’s torso and arms. • When we’re actually playing the game, we use the info provided by calibration to help us. • Currently only use shirt + skin color.

  24. Body Shape Analysis • Slide a square window across the image; for each window position, use the pixel regions falling within the window to perform a local shape analysis. • Examine the resulting ellipses to find the arms. These are long, centered ellipses; round regions are the torso. (example) • Path-trace these to get an ordered series of points representing each arm. • Fit one or two line segments to this series of points (one segment = straight arm, two = bent).
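
  A sketch of the final one-or-two-segment fit, using an exhaustive break-point search; the tolerance and the search strategy are assumptions, since the talk doesn't give the fitting method.

    #include <math.h>
    #include <vector>

    struct Point2 { float x, y; };

    // Distance from point p to the infinite line through a and b.
    float line_distance(Point2 p, Point2 a, Point2 b) {
        float dx = b.x - a.x, dy = b.y - a.y;
        float len = sqrtf(dx*dx + dy*dy);
        if (len < 1e-6f) return 0.0f;
        return fabsf((p.x - a.x) * dy - (p.y - a.y) * dx) / len;
    }

    // Sum of squared deviations of the interior points from the chord i0-i1.
    float chord_error(const std::vector<Point2> &pts, int i0, int i1) {
        float e = 0;
        for (int i = i0 + 1; i < i1; i++) {
            float d = line_distance(pts[i], pts[i0], pts[i1]);
            e += d * d;
        }
        return e;
    }

    // Returns -1 if one segment fits (straight arm), else the index of the
    // break point (the elbow) that minimizes the two-segment error.
    // STRAIGHT_LIMIT is a hypothetical per-point tolerance.
    int fit_arm(const std::vector<Point2> &pts, float STRAIGHT_LIMIT = 4.0f) {
        int n = (int)pts.size();
        if (chord_error(pts, 0, n - 1) < STRAIGHT_LIMIT * n) return -1;
        int best = 1; float best_e = 1e30f;
        for (int k = 1; k < n - 1; k++) {
            float e = chord_error(pts, 0, k) + chord_error(pts, k, n - 1);
            if (e < best_e) { best_e = e; best = k; }
        }
        return best;
    }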

  25. Hands in front of body? • The arm will blend into the body. • The hands will look like “holes” in the body. • This messes up arm detection.

  26. Multi-step Process: • Do a sliding window pass; approximate extents of torso using initial set of regions (holes may be there). • Look for hand-colored blobs in this area. • Merge those blobs with the set of torso regions. • Do another sliding window pass, now detecting elongated shapes (for arms).

  27. Creating a 3D character pose from 2D information • Resolve ambiguities with game-domain constraints (e.g. hands always within some plane in front of torso). • Use inverse kinematics and some simple body knowledge to recover 3D joint angles. • See the column “The Inner Product” in the April 2002 issue of Game Developer for an explanation of 3D IK, and source code.
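
  For a flavor of the IK step, here is a minimal two-bone solver in a plane (shoulder at the origin, target hand position at (tx, ty)) using the law of cosines. This is a generic sketch of the standard technique, not the column's code.

    #include <math.h>

    float clamp1(float c) { return c > 1 ? 1.0f : (c < -1 ? -1.0f : c); }

    // Given upper-arm and forearm lengths, recover shoulder and elbow angles
    // (radians) that place the hand at the target. elbow = 0 means a straight
    // arm; the shoulder angle chosen here is one of the two mirror solutions.
    bool two_bone_ik(float tx, float ty, float len1, float len2,
                     float &shoulder, float &elbow) {
        float d2 = tx*tx + ty*ty;
        float d = sqrtf(d2);
        if (d < 1e-6f || d > len1 + len2 || d < fabsf(len1 - len2))
            return false;  // target unreachable (or degenerate)
        // Law of cosines across the shoulder-elbow-hand triangle:
        elbow    = acosf(clamp1((d2 - len1*len1 - len2*len2) / (2 * len1 * len2)));
        shoulder = atan2f(ty, tx)
                 - acosf(clamp1((d2 + len1*len1 - len2*len2) / (2 * len1 * d)));
        return true;
    }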

  28. Method Advantages • It’s reasonably fast • Works with moving background / camera • Doesn’t care much about shadows

  29. Method Shortcomings • Currently confused by similar colors (low clustering resolution) • Requires a few more technical solutions before it will be truly robust (e.g. auto gamma detection).

  30. Future Work • Performance: 640x480 @ 30fps • More inverse rendering work (specularity) • Local surface modeling (eliminate confusion due to similar colors) • Texture classification • Mental model feedback

  31. Coding Issues • How do you get video images from a webcam in Windows? • VFW code by Nathan d’Obrenan in Game Programming Gems 2 • Unfortunately, VFW is a legacy API • DirectShow is the thing you need to use for future compatibility.
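
  For reference, the usual shape of the VFW path (the pattern the Gems 2 article covers): create a capture window, connect the first driver, and receive frames in a callback. A minimal sketch; error handling and format negotiation (capSetVideoFormat) are omitted.

    #include <windows.h>
    #include <vfw.h>   // link with vfw32.lib

    // VFW calls this on every captured frame; lpVHdr->lpData holds the bits.
    LRESULT CALLBACK on_frame(HWND hwnd, LPVIDEOHDR lpVHdr) {
        // Copy lpVHdr->dwBytesUsed bytes from lpVHdr->lpData somewhere safe;
        // this runs on the capture driver's schedule, so keep it quick.
        return (LRESULT)TRUE;
    }

    HWND start_capture(HWND parent) {
        HWND cap = capCreateCaptureWindow(TEXT("capture"), WS_CHILD,
                                          0, 0, 320, 240, parent, 0);
        if (!cap) return 0;
        if (!capDriverConnect(cap, 0)) return 0;  // connect driver index 0
        capSetCallbackOnFrame(cap, on_frame);
        capPreviewRate(cap, 33);                  // request a frame every ~33 ms
        capPreview(cap, TRUE);                    // preview mode drives the callback
        return cap;
    }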

  32. DirectShow is terrible! • Needlessly complex and bloated. • The base classes provided in the DirectX SDK induce a lot of latency (latency = death) • A minimal implementation of “just give me a damn frame from the camera” took 1,500 lines of code; should have taken 8. • Ask me if you want the source code (jon@bolt-action.com) • Or use VFW or a proprietary API.

  33. Blatant Plug • Experimental Gameplay Workshop • Friday, 4pm-7pm, Fairmont Regency I
