REAL TIME EYE TRACKING FOR HUMAN COMPUTER INTERFACES
Subramanya Amarnag, Raghunandan S. Kumaran and John Gowdy
Dept. of Electrical and Computer Engineering, Clemson University
Email: {asubram, ksampat, jgowdy}@clemson.edu
Abstract
In recent years considerable interest has developed in real-time eye tracking for various applications, including lip tracking. Although many lip tracking algorithms exist, their successful implementation is bound by a number of constraints, such as the color, size and shape of the lips and the constant motion of the lips. Eye tracking algorithms, however, may be designed to overcome these constraints. Eye tracking therefore appears to be a reasonable solution to the lip tracking problem, since a fix on the speaker's eyes gives a rough estimate of the position of the lips.

Eye tracking
Intrusive
• Advantages: Can be highly accurate.
• Disadvantages: Can be very cumbersome for the user. Not ideal for practical purposes.
Non Intrusive
• Advantages: User friendly.
• Disadvantages: The accuracy of systems developed thus far is not good when compared to intrusive systems.

Our System - Highlights
• Non IR based
• Non Intrusive
• Uses an ordinary camera to track the eyes
• Utilizes a dynamic training strategy, thus making it user and lighting condition invariant.
• Ideal for systems where high accuracy is not required

[Flowchart: system overview]
Input Frame -> Frame Search Region -> Pre-Processing -> Bayesian Classifier -> Clustering -> Post-Processing -> Eyes Located Successfully?
• Yes: Update Means And Covariance; Update frame Search Region; output Location Of the Eyes.
• No: Process Next Frame.

Pre-Processing
• In this stage the intensity of the pixels is considered for eliminating a number of pixels.
• A threshold of 0.27 has been experimentally determined to be ideal for most cases.
• If the intensity of a pixel is above the threshold, then that pixel is eliminated.
• The remaining pixels are passed to the next stage.

Bayesian Classifier
• In this stage the problem consists of classifying the pixels into eye and non-eye classes.
• A Bayesian classifier is used as the binary classifier.
• Gaussian PDFs are used to model both the eye and non-eye classes.
• The means and covariances of the classes are dynamically updated after processing each frame.

Clustering
• The Bayesian classifier does not eliminate all the non-eye pixels, especially facial hair and other dark pixels.
• Clustering is performed to identify the 'dark islands' in the remaining image.
• Our algorithm can be considered an unsupervised c-means algorithm, the difference being that no assumptions are made here regarding the number of clusters or the cluster centers.

[Flowchart: clustering; noe = number of exemplars]
Exemplar(1) = x(1); noe = 0
For i = 1 to N
    For j = 1 to noe
        Is dist( x(i), exemplar(j) ) < threshold?
            Yes: Update exemplar
            No, and j = noe: Create a new cluster, noe = noe + 1

Post Processing
• Clustering returns the total number of 'dark islands' in the image.
• Post processing is done to identify the 'eyes' among these 'dark islands'.
• The first step is to merge clusters which are close to each other (less than 5 pixels apart).
• The next step uses the geometrical features of the clusters, such as the size, width and height, to eliminate clusters that cannot be eyes.
• Finally we should be left with 2 clusters, which represent the eyes.
• The location of the eyes is used to limit the search region for the next frame.

Results
• The system was implemented on an Intel Pentium III 997 MHz machine and achieved a frame rate of 26 fps.
• The system was tested on 2 databases: the Clemson University Audio Visual Experiments (CUAVE) database and the CMU audio-visual dataset.
• Accuracy achieved:
    • CMU database: 88.3%
    • CUAVE database, stationary speaker: 86.4%
    • CUAVE database, moving speaker: 76.5%

[Figure: Results for a sequence of frames from the CMU dataset. This figure illustrates the performance of the system against complex backgrounds.]
[Figure: Results for a sequence of frames from the CUAVE dataset.]

References
[1] S. Baluja and D. Pomerleau, "Non Intrusive gaze tracking using Artificial Neural Networks," Technical Report CMU-CS-94-102, Carnegie Mellon University.
[2] Advanced Multimedia Processing Lab, CMU, http://amp.ece.emu.edu/projects/AudioVisualSpeechProcessing/.
[3] E.K. Patterson, S. Gurbuz, Z. Tufekci, and J.N. Gowdy, "CUAVE: A New Audio-Visual Database for Multimodal Human-Computer Interface Research," ICASSP, Orlando, May 2002.
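
Sketch: pre-processing and Bayesian classification
The pre-processing and Bayesian classifier stages described on the poster can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes that intensities are normalized to the range 0..1 (so that the 0.27 threshold applies directly), that each class is modelled by a one-dimensional Gaussian over intensity (the poster mentions covariances, so the real features are probably multi-dimensional), and that the per-frame update is a simple blend of old and new statistics. The names preprocess, GaussianClass, classify and the rate parameter are hypothetical.

```python
import math

INTENSITY_THRESHOLD = 0.27  # value from the poster; assumed to be on a 0..1 scale


def preprocess(pixels):
    """Drop pixels brighter than the threshold; keep the dark candidates."""
    return [p for p in pixels if p <= INTENSITY_THRESHOLD]


class GaussianClass:
    """One class (eye or non-eye) modelled by a 1-D Gaussian over intensity."""

    def __init__(self, mean, var, prior):
        self.mean, self.var, self.prior = mean, var, prior

    def log_score(self, x):
        # log prior + log Gaussian density at x
        return (math.log(self.prior)
                - 0.5 * math.log(2 * math.pi * self.var)
                - (x - self.mean) ** 2 / (2 * self.var))

    def update(self, samples, rate=0.1):
        # Hypothetical update rule: blend the stored statistics with the
        # statistics of the pixels assigned to this class in the last frame.
        if not samples:
            return
        m = sum(samples) / len(samples)
        v = sum((s - m) ** 2 for s in samples) / len(samples)
        self.mean = (1 - rate) * self.mean + rate * m
        self.var = max((1 - rate) * self.var + rate * v, 1e-6)


def classify(pixels, eye, non_eye):
    """Return the pixels whose eye score exceeds their non-eye score."""
    return [p for p in pixels if eye.log_score(p) > non_eye.log_score(p)]
```

In use, a frame's intensities would first pass through preprocess, then classify, and the pixels accepted as eye pixels would be fed to eye.update once the eyes have been located, mirroring the "Update Means And Covariance" step of the system flowchart.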
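
Sketch: exemplar-based clustering
The clustering stage, which makes no assumption about the number of clusters, might look like the sketch below. It follows the loop structure of the clustering flowchart: each point is compared with the existing exemplars in turn, joins the first one that is closer than a threshold, and otherwise starts a new cluster. The poster does not state the distance threshold or how an exemplar is updated, so the threshold argument and the running-mean update are assumptions, and cluster_points is a hypothetical name.

```python
import math


def cluster_points(points, threshold):
    """Group 2-D points around exemplars without fixing the cluster count."""
    exemplars = []  # running centroid of each cluster
    members = []    # points assigned to each cluster
    for p in points:
        for j, e in enumerate(exemplars):
            if math.dist(p, e) < threshold:
                members[j].append(p)
                n = len(members[j])
                # Assumed update rule: the exemplar becomes the mean of its members.
                exemplars[j] = (e[0] + (p[0] - e[0]) / n,
                                e[1] + (p[1] - e[1]) / n)
                break
        else:
            # No exemplar was close enough: start a new cluster at this point.
            exemplars.append((float(p[0]), float(p[1])))
            members.append([p])
    return exemplars, members
```

Because a point joins the first exemplar within range rather than the nearest one, the result can depend on the order in which pixels are visited. That matches the sequential test in the flowchart, but a nearest-exemplar rule would be an equally plausible reading.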
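
Sketch: post-processing of clusters
The post-processing steps (merging clusters less than 5 pixels apart, then filtering by geometry) could be sketched as below. This is an illustration under several assumptions: clusters are represented by bounding boxes (x0, y0, x1, y1), "close" is measured as the gap between boxes, and the width and height limits in plausible_eye are invented placeholders, since the poster does not give its actual geometric criteria. All function names are hypothetical.

```python
def box_gap(a, b):
    """Gap in pixels between two boxes (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def merge_close(boxes, min_gap=5):
    """Repeatedly merge any two boxes whose gap is below min_gap pixels."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if box_gap(boxes[i], boxes[j]) < min_gap:
                    a, b = boxes[i], boxes[j]
                    # Replace the pair with the box that encloses both.
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


def plausible_eye(box, min_w=8, max_w=40, min_h=4, max_h=25):
    """Placeholder size limits; the poster does not give its actual values."""
    w, h = box[2] - box[0], box[3] - box[1]
    return min_w <= w <= max_w and min_h <= h <= max_h and w >= h
```

A frame would count as a success when exactly two boxes survive merge_close followed by the plausible_eye filter. Their positions could then define the search region for the next frame.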