A Non-obtrusive Head Mounted Face Capture System
Chandan K. Reddy, Master’s Thesis Defense
Thesis Committee: Dr. George C. Stockman (Main Advisor), Dr. Frank Biocca (Co-Advisor), Dr. Charles Owen, Dr. Jannick Rolland (External Faculty)
Modes of Communication
• Text only, e.g. mail, electronic mail
• Voice only, e.g. telephone
• PC-camera-based conferencing, e.g. webcam
• Multi-user teleconferencing
• Teleconferencing through virtual environments
• Augmented-reality-based teleconferencing
Problem Definition
• Face Capture System (FCS)
• Virtual view synthesis
• Depth extraction and 3D face modeling
• Head-mounted projection displays
• 3D tele-immersive environments
• High-bandwidth network connections
Thesis Contributions
• Complete hardware setup for the FCS.
• Camera-mirror parameter estimation for the optimal configuration of the FCS.
• Generation of quality frontal videos from two side videos.
• Reconstruction of a texture-mapped 3D face model from two side views.
• Evaluation mechanisms for the generated frontal views.
Existing Face Capture Systems
• FaceCap3d, a product from Standard Deviation
• Optical Face Tracker, a product from Adaptive Optics
Advantages: freedom for head movements
Drawbacks: obstruction of the user’s field of view
Main applications: character animation and mobile environments
Existing Face Capture Systems
• “Sea of Cameras” (UNC Chapel Hill), National Tele-immersion Initiative
Advantages: no burden on the user
Drawbacks: highly equipped environments and restricted head motion
Main applications: teleconferencing and collaborative work
Proposed Face Capture System
(F. Biocca and J. P. Rolland, “Teleportal face-to-face system”, patent filed, 2000.)
A novel face capture system under development: two cameras capture the corresponding side views of the face through mirrors.
Advantages
• User’s field of view is unobstructed
• Portable and easy to use
• Gives very accurate, high-quality face images
• Can process in real time
• Simple and user-friendly system
• Static with respect to the human head
• Flipping the mirrors lets the cameras capture the user’s own viewpoint
Applications
• Mobile environments
• Collaborative work
• Multi-user teleconferencing
• Medical areas
• Distance learning
• Gaming and entertainment industry
• Others
Optical Layout
Three components to be considered:
• Camera
• Mirror
• Human face
Specification Parameters
Camera:
• Sensing area: 3.2 mm × 2.4 mm (¼″).
• Pixel dimensions: the sensed image is 768 × 494 pixels; the digitized image is 320 × 240 due to RAM-size restrictions.
• Focal length (Fc): 12 mm (VCL-12UVM).
• Field of view (FOV): 15.2° × 11.4°.
• Diameter (Dc): 12 mm.
• f-number (Nc): 1, to gather maximum light.
• Minimum working distance (MWD): 200 mm.
• Depth of field (DOF): to be estimated.
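The DOF left “to be estimated” above can be sketched with the standard hyperfocal-distance approximation. The circle of confusion (one pixel on the ¼″ sensor: 3.2 mm / 320 px = 0.01 mm) and the subject distance (Dcm + Dmf = 350 mm) are assumptions for illustration, not figures from the thesis.

```python
# Depth-of-field sketch for the FCS camera (VCL-12UVM lens, f = 12 mm, N = 1).
# Assumed: circle of confusion c = one digitized pixel (0.01 mm),
# subject distance u = Dcm + Dmf = 150 + 200 = 350 mm.

def depth_of_field(f_mm, n, c_mm, u_mm):
    """Return (near limit, far limit, DOF) in mm via the hyperfocal formula."""
    h = f_mm ** 2 / (n * c_mm) + f_mm              # hyperfocal distance
    near = u_mm * (h - f_mm) / (h + u_mm - 2 * f_mm)
    far = u_mm * (h - f_mm) / (h - u_mm)
    return near, far, far - near

near, far, dof = depth_of_field(f_mm=12.0, n=1.0, c_mm=0.01, u_mm=350.0)
print(f"near {near:.1f} mm, far {far:.1f} mm, DOF {dof:.1f} mm")
```

Under these assumptions the usable depth of field is only a few centimetres, which is why the trade-off between field of view and depth of field matters in the lens choice.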
Specification Parameters (Contd.)
Mirror:
• Diameter (Dm) / f-number (Nm)
• Focal length (fm)
• Magnification factor (Mm)
• Radius of curvature (Rm)
Human face:
• Height of the face to be captured (H ≈ 250 mm)
• Width of the face to be captured (W ≈ 175 mm)
Distances:
• Distance between the camera and the mirror (Dcm ≈ 150 mm)
• Distance between the mirror and the face (Dmf ≈ 200 mm)
Customization of Cameras and Mirrors
Off-the-shelf cameras:
• Customizing a camera lens is a tedious task
• A trade-off has to be made between the field of view and the depth of field
• The Sony DXC LS1 with a 12 mm lens is suitable for our application
Custom-designed mirrors:
• A plano-convex lens with a 40 mm diameter is coated black on the planar side
• The radius of curvature of the convex surface is 155.04 mm
• The thickness at the center of the lens is 5 mm
• The thickness at the edge is 3.7 mm
Problem Statement
Generating a virtual frontal view from two side views
Data Processing
• Two synchronized videos are captured simultaneously in real time (30 frames/sec).
• For effective capturing and processing, the data is stored in uncompressed format.
• Machine specifications (Lorelei @ metlab.cse.msu.edu):
  • Pentium III processor
  • Processor speed: 746 MHz
  • RAM size: 384 MB
  • Hard-disk write speed (practical): 9 MB/s
  • MIL-LITE is configured to use 150 MB of RAM
Data Processing (Contd.)
• Size of 1 second of video = 30 × 320 × 240 × 3 bytes ≈ 6.59 MB
• Using 150 MB of RAM, only 10 seconds of video from the two cameras can be captured
• Why does the processing have to be offline?
  • The calibration procedure is not automatic
  • The disk write speed must be at least 14 MB/s, but only 9 MB/s is available
  • To capture two videos at 640 × 480 resolution, the disk write speed would have to be roughly 53 MB/s
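The bandwidth arithmetic above can be sketched as a short calculation (1 MB = 1024² bytes, 3 bytes per RGB pixel, 30 fps):

```python
# Raw-video bandwidth check for the two-camera capture described above.

def video_rate_mb_per_s(width, height, fps=30, bytes_per_pixel=3):
    """Uncompressed video data rate in MB/s (1 MB = 1024**2 bytes)."""
    return width * height * bytes_per_pixel * fps / 1024 ** 2

one_cam = video_rate_mb_per_s(320, 240)        # ~6.59 MB/s per camera
two_cams = 2 * one_cam                          # ~13.2 MB/s: needs ~14 MB/s disk
vga_pair = 2 * video_rate_mb_per_s(640, 480)    # ~53 MB/s for two VGA streams
print(one_cam, two_cams, vga_pair)
```

Since the practical disk write speed is only 9 MB/s, both streams must be buffered in RAM (150 MB ≈ 10 seconds for the pair) and written out afterwards.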
Structured Light Technique
• A grid is projected onto the frontal view of the face
• A square of the grid in the frontal view appears as a quadrilateral (with curved edges) in the real side view
Color Balancing
• Hardware-based approach: white balancing of the cameras
• Why is this more robust than a software-based approach?
  • There is no change to the input camera
  • Better handling of varying lighting conditions
  • No prior knowledge of the skin color is required
  • No additional overhead
  • It is enough if the two cameras are color balanced relative to each other
Off-line Calibration Stage
Left calibration face image + right calibration face image → transformation tables
Operational Stage
Left and right face images are warped through the transformation tables into left and right warped face images, which are mosaiced into the final frontal face image.
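The operational stage can be sketched as a per-pixel table lookup followed by a midline mosaic. The table layout and function names below are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

# Hypothetical sketch: offline calibration yields a lookup table per side,
# table[y, x] = (row, col) of the side-view pixel that maps to frontal
# pixel (x, y); warping is then pure indexing, and the mosaic joins the
# two warped halves at the vertical midline.

def warp(side_image, table):
    """side_image: (H, W, 3) array; table: (H, W, 2) of (row, col) indices."""
    rows, cols = table[..., 0], table[..., 1]
    return side_image[rows, cols]

def mosaic(left_warped, right_warped):
    """Join the left and right warped halves at the vertical midline."""
    mid = left_warped.shape[1] // 2
    return np.concatenate([left_warped[:, :mid], right_warped[:, mid:]], axis=1)
```

Because the tables are fixed after calibration, the per-frame cost is just two lookups and a concatenation, which is what makes real-time operation plausible.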
Comparison of the Frontal Views
First row: virtual frontal views; second row: original frontal views
Video Synchronization (Eye Blinking)
First row: virtual frontal views; second row: original frontal views
Coordinate Systems
There are five coordinate systems in our application:
• World Coordinate System (WCS)
• Face Coordinate System (FCS)
• Left Camera Coordinate System (LCCS)
• Right Camera Coordinate System (RCCS)
• Projector Coordinate System (PCS)
Camera Calibration
Conversion from 3D world coordinates to 2D camera coordinates uses the perspective transformation model:

s [u_j  v_j  1]^T = [ c11 c12 c13 c14 ; c21 c22 c23 c24 ; c31 c32 c33 1 ] · [x_j  y_j  z_j  1]^T

Eliminating the scale factor s gives, for each calibration point j:

u_j = (c11 − c31·u_j)·x_j + (c12 − c32·u_j)·y_j + (c13 − c33·u_j)·z_j + c14
v_j = (c21 − c31·v_j)·x_j + (c22 − c32·v_j)·y_j + (c23 − c33·v_j)·z_j + c24
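Stacking the two linear equations per calibration point gives an overdetermined system in the 11 unknown coefficients (c34 = 1), solvable by least squares. A minimal sketch, not the thesis code:

```python
import numpy as np

# Solve the 11 perspective-transformation coefficients (c34 = 1) from
# N >= 6 known 3D-2D correspondences, per the equations above.

def calibrate(points_3d, points_2d):
    """points_3d: (N, 3) world coords; points_2d: (N, 2) image coords.
    Returns the 3x4 camera matrix C with C[2, 3] = 1."""
    a, b = [], []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        # u = c11*x + c12*y + c13*z + c14 - c31*u*x - c32*u*y - c33*u*z
        a.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        # v = c21*x + c22*y + c23*z + c24 - c31*v*x - c32*v*y - c33*v*z
        a.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        b.extend([u, v])
    c, *_ = np.linalg.lstsq(np.asarray(a, float), np.asarray(b, float), rcond=None)
    return np.append(c, 1.0).reshape(3, 4)
```

With noise-free correspondences the recovered matrix is exact; with real calibration data the least-squares fit absorbs measurement error.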
Calibration Sphere
• A sphere can be used for calibration
• Calibration points on the sphere are chosen such that the azimuthal angle is varied in steps of 45° and the polar angle in steps of 30°
• The locations of these calibration points are known in 3D coordinates with respect to the origin of the sphere
• The origin of the sphere defines the origin of the World Coordinate System
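The 3D locations of those sampling points follow directly from spherical coordinates; a sketch (the radius value is an assumption, and the poles are skipped since azimuth is degenerate there):

```python
import math

# Calibration points on a sphere of radius r, sampled at 45-degree
# azimuth steps and 30-degree polar steps, with the sphere centre as
# the world-coordinate origin.

def sphere_points(radius=100.0):
    points = []
    for polar_deg in range(30, 180, 30):       # polar angle in 30-degree steps
        for az_deg in range(0, 360, 45):       # azimuth in 45-degree steps
            polar, az = math.radians(polar_deg), math.radians(az_deg)
            points.append((radius * math.sin(polar) * math.cos(az),
                           radius * math.sin(polar) * math.sin(az),
                           radius * math.cos(polar)))
    return points
```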
Projector Calibration
• Similar to camera calibration
• 2D image coordinates cannot be obtained directly from a 2D image
• A “blank image” is projected onto the sphere
• The 2D coordinates of the calibration points on the projected image are noted
• More points can be seen from the projector’s point of view; some points are common to both camera views
• Results show slightly larger errors than the camera calibration
3D Face Model Construction
Why?
• To obtain different views of the face
• To generate the stereo pair for viewing in the HMPD
Steps required:
• Computation of 3D locations
• Customization of the 3D model
• Texture mapping
Computation of 3D Points
• 3D point estimation using stereo
• Stereo between the two cameras is not possible because of occlusion by the facial features
• Hence, two stereo pairs are used:
  • Left camera and projector
  • Right camera and projector
• Using stereo, the 3D points of prominent facial feature points are computed in the FCS
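Given the calibrated 3×4 matrices of a camera and the projector, each matched 2D point pair yields four linear constraints on the 3D point, solvable by least squares (linear triangulation). A sketch under these assumptions, not the thesis code:

```python
import numpy as np

# Camera-projector stereo: recover a 3D point from its 2D coordinates
# in two calibrated views via linear triangulation.

def triangulate(c1, c2, uv1, uv2):
    """c1, c2: 3x4 projection matrices; uv1, uv2: (u, v) in each view.
    Returns the 3D point (x, y, z)."""
    rows = []
    for c, (u, v) in ((c1, uv1), (c2, uv2)):
        rows.append(u * c[2] - c[0])   # each view contributes two linear
        rows.append(v * c[2] - c[1])   # constraints on (x, y, z, 1)
    a = np.asarray(rows)
    xyz, *_ = np.linalg.lstsq(a[:, :3], -a[:, 3], rcond=None)
    return xyz
```

Each facial feature point visible in a side view and the projected grid is triangulated this way, giving the sparse 3D points used to customize the generic face model.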
3D Generic Face Model
A generic face model with 395 vertices and 818 triangles (left: front view; right: side view)
Evaluation Schemes
• Evaluation of facial expressions is not studied extensively in the literature
• Evaluation can be done for facial alignment and face recognition on static images
• Lip and eye movements in a dynamic event
• Perceptual quality: how well are moods conveyed?
• Two types of evaluation:
  • Objective evaluation
  • Subjective evaluation
Objective Evaluation
• Theoretical evaluation; no human feedback required
• This evaluation can give us a measure of:
  • Face recognition
  • Face alignment
  • Facial movements
• Methods applied:
  • Normalized cross-correlation
  • Euclidean distance measures
Evaluation Images
Five frames were considered for objective evaluation (first row: virtual frontal views; second row: original frontal views)
Normalized Cross-Correlation
Regions considered for normalized cross-correlation (left: real image; right: virtual image)
Normalized Cross-Correlation
• Let V be the virtual image and R be the real image
• Let w be the width and h be the height of the images
• The normalized cross-correlation between V and R is given by

NCC = Σx Σy (V(x,y) − V̄)(R(x,y) − R̄) / sqrt( Σx Σy (V(x,y) − V̄)² · Σx Σy (R(x,y) − R̄)² )

where V̄ and R̄ are the mean intensities of V and R, and the sums run over all w × h pixels.
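The normalized cross-correlation of a virtual and a real region can be computed directly from its definition; a minimal sketch for grayscale regions:

```python
import numpy as np

# Normalized cross-correlation between two equal-shaped grayscale regions,
# computed from the mean-subtracted definition above. Returns a value in
# [-1, 1]; 1 means the regions match up to brightness and contrast.

def ncc(v, r):
    """v: virtual-image region; r: real-image region (equal-shaped 2D arrays)."""
    dv, dr = v - v.mean(), r - r.mean()
    return float((dv * dr).sum() / np.sqrt((dv ** 2).sum() * (dr ** 2).sum()))
```

Because the means are subtracted and the result is normalized, the score is insensitive to global brightness and contrast differences between the virtual and real views.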
Euclidean Distance Measures
• The Euclidean distance between two points i and j is d_ij = sqrt( (x_i − x_j)² + (y_i − y_j)² )
• Let R_ij be the Euclidean distance between points i and j in the real image
• Let V_ij be the Euclidean distance between points i and j in the virtual image
• The error measure is D_ij = | R_ij − V_ij |
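The distance-based error measure is a one-liner per feature pair; a sketch:

```python
import math

# D_ij = |R_ij - V_ij|: compare the separation of a pair of corresponding
# feature points in the real image against the same pair in the virtual image.

def dist(p, q):
    """Euclidean distance between 2D points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def distance_error(real_i, real_j, virt_i, virt_j):
    """|R_ij - V_ij| for one pair of corresponding feature points."""
    return abs(dist(real_i, real_j) - dist(virt_i, virt_j))
```

A small D_ij over many feature pairs indicates that facial geometry is preserved in the virtual frontal view.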
Subjective Evaluation
• Evaluates human perception: measurement of the quality of a talking face
• Factors that might affect the rating:
  • Quality of the video
  • Facial movements and expressions
  • Synchronization of the two halves of the face
  • Color and texture of the face
  • Quality of audio
  • Synchronization of audio
• A preliminary study has been conducted to assess the quality of the generated videos