A portable vision-based HCI system that operates on a projected interface, allowing real-time detection of user hand motion from a PDA/Smartphone's video camera. The system aims to provide an efficient method to run on portable devices, enabling a more instinctive way of data manipulation.
Portable Vision-Based HCI: A Real-Time Hand Mouse System on Portable Devices 連矩鋒 (Burt C.F. Lien) Department of Computer Science and Information Engineering, National Taiwan University
Problems • A portable vision-based HCI • A hand mouse operating on a projected interface • Real-time detection of the user's hand motion from the video camera of a PDA/smartphone (target platform) • An efficient method is needed to run this idea on portable devices
Why important • Vision-based HCI is a more instinctive way to manipulate data
Related Works I • A Portable System for Anywhere Interactions • Sukaviriya et al., IBM Research • Real-time hand tracking using a set of cooperative classifiers based on Haar-like features • Barczak et al., Institute of Information & Mathematical Sciences, Massey University
Everywhere Display (IBM) Figure 1: Interactive store application
Related Works II • Rapid Object Detection Using a Boosted Cascade of Simple Features • P. Viola and M. Jones (2001) • Robust real-time object detection • P. Viola and M. Jones • Robust real-time face detection • P. Viola and M. Jones • Adaboost-based real-time pedestrian detection • P. Viola, M. Jones, and D. Snow • James W. Davis. "Hierarchical Motion History Images for Recognizing Human Motion," p. 39, IEEE Workshop on Detection and Recognition of Events in Video (EVENT'01), 2001 • Tim Weingaertner, Stefan Hassfeld, Ruediger Dillmann. "Human Motion Analysis: A Review," p. 90, IEEE Workshop on Motion of Non-Rigid and Articulated Objects (NAM '97), 1997
Reference code • Intel OpenCV Libraries • Motion Template • Motion History Image
Contribution • An efficient method to run a real-time vision-based HCI system on a portable device • Experimental result: typically 5~7% CPU usage (Intel Pentium M processor 730, 1.6 GHz) at 640x320 resolution (3 fps) • The motion-based method used in this system needs no training process. This removes most of the training effort and can make object detection more robust to lighting changes, even with a blurred image.
System Configuration • Wireless projector • Projected contents • Data transmission • Hand motion capture and interpretation • Interactive interface
Platform and Tools • Platform (prototype) • “Laptop” + “Low Cost Camera (USB) – NT300” • Software tools • “MS VC++” + “Intel OpenCV library”
Assumptions • The screen is rectangular • The background is static most of the time • Only one user
Adaboost (old version) • Goal: to recognize a “hand” • Adaboost training (1397 hand images + 3000 background images) • Training an 11-stage classifier takes 2 days (Viola & Jones: on the order of weeks) • Result: the classifier is too weak to recognize hands, and the false-alarm rate is high
Haartraining Result • Original test image • Outline of the hand emphasized manually • Background darkened
Motion Template • The Adaboost-trained classifier is abandoned • Motion Template • Motion History Image: image ring buffer (N=3) • Reduces computation (complex mathematical computation is replaced with simple heuristics) • Acquires and records the front edge of a motion • Defines orientation (for different instructions) • Detects a “touch” behavior (density drop rate)
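The ring buffer and image difference mentioned above can be sketched roughly as follows. This is a minimal illustration, not the system's code: frames are plain nested lists of grey levels, the names `frame_diff` and `push_frame` are hypothetical, and the threshold of 30 is an assumed value that does not come from the slides. Differencing against the oldest buffered frame is also an assumption.

```python
from collections import deque

N = 3                 # ring-buffer length given on the slide
DIFF_THRESHOLD = 30   # assumed grey-level threshold (not from the slides)

def frame_diff(old, new, threshold=DIFF_THRESHOLD):
    """Return a binary mask: 1 where a pixel changed by more than threshold."""
    return [[1 if abs(n - o) > threshold else 0 for o, n in zip(orow, nrow)]
            for orow, nrow in zip(old, new)]

def push_frame(buffer, frame):
    """Append a frame; return the motion mask against the oldest buffered frame."""
    buffer.append(frame)          # deque(maxlen=N) drops the oldest frame itself
    if len(buffer) < 2:
        return None               # nothing to compare against yet
    return frame_diff(buffer[0], buffer[-1])

frames = deque(maxlen=N)
push_frame(frames, [[0, 0], [0, 0]])
mask = push_frame(frames, [[0, 100], [0, 0]])   # one pixel changed
```

A bounded `deque` keeps memory constant, which matters on the handheld target the slides aim for.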
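Motion History Image
$$\mathrm{MHI}_\delta(x,y) = \begin{cases} \tau & \text{if } \Psi(I(x,y)) \neq 0 \\ 0 & \text{else if } \mathrm{MHI}_\delta(x,y) < \tau - \delta \\ \mathrm{MHI}_\delta(x,y) & \text{otherwise} \end{cases}$$
where each pixel $(x,y)$ in the MHI is marked with the current timestamp $\tau$ if the function $\Psi$ signals object presence (or motion) in the current video image $I(x,y)$; the remaining timestamps in the MHI are removed if they are older than the decay value $\delta$. This update function is called for every new video frame analyzed in the sequence.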
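The update rule described above can be written out in a few lines. The sketch below is an illustration under stated assumptions: the MHI and the motion mask are nested lists, `update_mhi` is a hypothetical name, and timestamps are in seconds. It mirrors the behaviour of OpenCV's motion-template update but does not call the library.

```python
def update_mhi(mhi, mask, timestamp, duration):
    """Update a motion history image in place and return it.

    mhi       -- 2D list of floats (last time motion was seen, 0 = none)
    mask      -- 2D list of 0/1 values (1 = motion in the current frame)
    timestamp -- time of the current frame
    duration  -- decay value: older entries are cleared
    """
    for y, row in enumerate(mask):
        for x, moving in enumerate(row):
            if moving:
                mhi[y][x] = timestamp            # stamp pixels with motion
            elif mhi[y][x] < timestamp - duration:
                mhi[y][x] = 0.0                  # forget stale motion
            # otherwise the previous timestamp is kept
    return mhi

mhi = [[0.0, 1.0, 2.5]]
update_mhi(mhi, [[1, 0, 0]], timestamp=3.0, duration=1.0)
```

In this example the first pixel is stamped with 3.0, the second is cleared because 1.0 is older than 3.0 - 1.0, and the third keeps 2.5.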
Motion trajectory • Note: the last 50 front edges are recorded
System Flow Chart • Start • Capture from CAM • Noise filter • Find the screen (edge detection) • Mouse/keyboard events • Motion interpretation • MHI update • Find front point • Image diff
Find the Screen • The projected screen is located during initialization • Algorithm • Canny edge detection • Find the screen • Find all the squares in the image and choose the biggest one • Adaptive • Re-detect the screen every 10 seconds in case the camera has moved
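The "choose the biggest square" step can be illustrated on its own. The sketch below assumes that edge detection and contour approximation (done with OpenCV in the original system) have already produced candidate polygons as lists of corner points; `polygon_area` and `pick_screen` are hypothetical helper names, and the slides do not say how area was measured.

```python
def polygon_area(corners):
    """Area of a simple polygon from its corner points (shoelace formula)."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]   # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def pick_screen(candidates):
    """Return the largest four-cornered candidate, or None if there is none."""
    quads = [c for c in candidates if len(c) == 4]
    return max(quads, key=polygon_area) if quads else None

small = [(0, 0), (10, 0), (10, 10), (0, 10)]
big = [(0, 0), (100, 0), (100, 80), (0, 80)]
triangle = [(0, 0), (5, 0), (0, 5)]
screen = pick_screen([small, big, triangle])
```

Running `pick_screen` again on a timer would correspond to the periodic re-detection mentioned on the slide.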
Position (pixel) Mapping • Screen mapping (camera and computer) • Define the scale for coordinate translation • e.g. 800x600 (camera resolution) to 1280x800 (computer resolution) • scale-x = 1280/800 • scale-y = 800/600 • Figure: camera resolution 800x600 with its origin; computer resolution 1280x800 with a new origin at the detected screen
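A worked version of the scale computation, using the slide's 800x600 and 1280x800 example. The function name `camera_to_computer` is hypothetical, and subtracting the detected screen's origin before scaling is an assumption drawn from the "new origin" label in the figure; with the default origin of (0, 0) the code reduces to the two scale factors given above.

```python
def camera_to_computer(px, py, cam_size=(800, 600), pc_size=(1280, 800),
                       origin=(0, 0)):
    """Map a camera pixel to computer-screen coordinates.

    origin is the top-left corner of the detected screen in the camera
    image (assumed; the slides only show it in a figure).
    """
    scale_x = pc_size[0] / cam_size[0]   # 1280 / 800 = 1.6
    scale_y = pc_size[1] / cam_size[1]   # 800 / 600, about 1.333
    return ((px - origin[0]) * scale_x, (py - origin[1]) * scale_y)

center = camera_to_computer(400, 300)   # centre of the camera image
```

If the detected screen covers only part of the camera image, `cam_size` would presumably be the detected screen's width and height rather than the full camera resolution.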
Event definition • Mouse and keyboard events are defined as follows • Mouse click • If the image density drops dramatically (by more than 70%~80%), the position of the last front edge is taken as the coordinate of the mouse click • Page Up (PgUp) • If the above action happens from the left side of the screen, it is defined as a “PgUp” event • Close the current Windows application • Three consecutive error detections within 8 seconds
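The click and PgUp rules above might be expressed as below. This is a sketch under assumptions: "density" is taken to be a count of motion pixels, the 0.7 threshold is the lower end of the slide's 70%~80% range, and `density_drop`, `classify_event`, and the `from_left` flag are hypothetical names. How the system decides that a motion came from the left is not described in the slides.

```python
def density_drop(prev_density, curr_density):
    """Fraction by which the motion density fell between two frames."""
    if prev_density <= 0:
        return 0.0
    return (prev_density - curr_density) / prev_density

def classify_event(prev_density, curr_density, last_front_edge, from_left,
                   threshold=0.7):
    """Return ("click", (x, y)), ("PgUp", None), or None if nothing happened."""
    if density_drop(prev_density, curr_density) <= threshold:
        return None                       # no dramatic drop: no event
    if from_left:
        return ("PgUp", None)             # same gesture entering from the left
    return ("click", last_front_edge)     # click at the last front edge

event = classify_event(1000, 200, (320, 240), from_left=False)
```

Here the density falls from 1000 to 200, a drop of 80%, so the call yields a click at (320, 240).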
Noise filtering • False positives • Motion trajectories are recorded to filter out false-positive signals (partly implemented) • Signal bouncing • A 10-second debounce interval is introduced after a valid mouse/keyboard event is detected
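The bounce suppression described above amounts to ignoring events that arrive too soon after an accepted one. A minimal sketch, assuming timestamps in seconds and a hypothetical `Debouncer` class; the 10-second default is the interval stated on the slide.

```python
class Debouncer:
    """Reject events that arrive within `interval` seconds of the last accepted one."""

    def __init__(self, interval=10.0):
        self.interval = interval
        self.last_accepted = None    # time of the last accepted event

    def accept(self, timestamp):
        """Return True if the event at `timestamp` should be passed on."""
        if (self.last_accepted is not None
                and timestamp - self.last_accepted < self.interval):
            return False             # still inside the quiet interval
        self.last_accepted = timestamp
        return True

debouncer = Debouncer()
results = [debouncer.accept(t) for t in (0.0, 5.0, 10.0, 12.0)]
```

Only the events at 0.0 and 10.0 seconds pass; rejected events do not restart the interval in this version, which is a design choice the slides do not specify.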
Performance • CPU: Pentium M Processor 730 (1.6 GHz) • HaarDetectObjects (typical) • 5 fps (640x480): 80% CPU usage • 3 fps (640x480): 30% CPU usage • 3 fps (640x480, hand+face classifier): 50% CPU usage • Motion Template (typical) • 3 fps (176x144): 2~5% CPU usage • 3 fps (640x480): 5~8% CPU usage • 3 fps (800x600): 10% CPU usage
System Limitations • High error rate when the hand moves fast • Can be solved by increasing the frame rate • An unexpected stop in the middle of the screen causes a false alarm • Shadows reduce correctness • If the screen is not detected well, or if the mapping is distorted, position accuracy will be very low
Future Work • To improve the accuracy • To port the system to a handheld device • To advance to a truly steerable interface (something like “Minority Report”) in which a user can drag icons directly on the projected screen