Natural User Interface with Kinect for Windows
Clemente Giorio & Paolo Patierno
Hardware Overview

Sensor components: 3-axis accelerometer, IR projector, mic array, depth camera, RGB camera, tilt motor.

Hardware Requirements:
• Windows 7, Windows 8, Windows Embedded Standard 7, or Windows Embedded POSReady 7
• x86 or x64 CPU, dual-core 2.66 GHz
• Dedicated USB 2.0 bus
• 2 GB RAM
IR Projector

The projected pattern is composed of 3x3 sub-patterns of 211x165 dots each (633x495 dots in total). In each sub-pattern, one spot is much brighter than all the others. The IR laser emits at 827 nm.
Depth Camera

A CMOS sensor with an IR-pass filter, up to 640x480 pixels. Each pixel, encoded in 11 bits, can represent 2048 levels of depth.
RGB Camera

A CMOS sensor that can also capture the raw IR frame. It delivers 1280x960 @ 12 fps, or 640x480 @ 30 fps with 8 bits per channel, producing a Bayer filter output with an RGGB pattern.
Tilt Motor & 3-axis Accelerometer

The tilt motor adjusts the sensor's elevation angle. The 3-axis accelerometer is configured for a 2g range (g is the acceleration due to gravity) with 1-3 degrees of accuracy.
Mic Array

• 4 microphones with a 24-bit analog-to-digital converter
• The captured audio is encoded using Pulse-Code Modulation (PCM) with a sampling rate of 16 kHz and 16-bit depth
• Advantages of multiple microphones: enhanced noise suppression, acoustic echo cancellation, and a beam-forming technique
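The SDK exposes the array through KinectAudioSource. As a minimal sketch (assuming an already-started sensor field named sensor), Start() returns the raw 16 kHz, 16-bit mono PCM stream, and the DSP features listed above are toggled through properties:

    KinectAudioSource audioSource = this.sensor.AudioSource;

    // Enable the array's DSP features described above.
    audioSource.NoiseSuppression = true;
    audioSource.EchoCancellationMode = EchoCancellationMode.CancellationAndSuppression;
    audioSource.EchoCancellationSpeakerIndex = 0;       // speaker used for AEC
    audioSource.BeamAngleMode = BeamAngleMode.Automatic; // beam-forming steered by the SDK

    // Start() returns a standard Stream of 16 kHz, 16-bit mono PCM samples.
    System.IO.Stream audioStream = audioSource.Start();
    byte[] buffer = new byte[4096];
    int bytesRead = audioStream.Read(buffer, 0, buffer.Length);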
Step 1: Register for the ColorFrameReady Event

/// Active Kinect sensor
private KinectSensor sensor;

// Turn on the color stream to receive color frames
this.sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

// Add an event handler to be called whenever there is new color frame data
this.sensor.ColorFrameReady += this.SensorColorFrameReady;

// Start the sensor!
this.sensor.Start();
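The snippet above assumes sensor has already been bound to a device, and Step 2 below uses colorPixels and colorBitmap fields that must exist. A minimal sketch of that setup, following the pattern of the SDK's ColorBasics sample:

    // Pick the first connected sensor.
    foreach (KinectSensor potentialSensor in KinectSensor.KinectSensors)
    {
        if (potentialSensor.Status == KinectStatus.Connected)
        {
            this.sensor = potentialSensor;
            break;
        }
    }

    // Allocate the pixel buffer and the bitmap that Step 2 writes into.
    this.colorPixels = new byte[this.sensor.ColorStream.FramePixelDataLength];
    this.colorBitmap = new WriteableBitmap(this.sensor.ColorStream.FrameWidth,
                                           this.sensor.ColorStream.FrameHeight,
                                           96.0, 96.0, PixelFormats.Bgr32, null);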
Step 2: Read the Stream

/// Event handler for the Kinect sensor's ColorFrameReady event
private void SensorColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (ColorImageFrame colorFrame = e.OpenColorImageFrame())
    {
        if (colorFrame != null)
        {
            // Copy the pixel data from the image to a temporary array
            colorFrame.CopyPixelDataTo(this.colorPixels);

            // Write the pixel data into our bitmap
            this.colorBitmap.WritePixels(
                new Int32Rect(0, 0, this.colorBitmap.PixelWidth, this.colorBitmap.PixelHeight),
                this.colorPixels,
                this.colorBitmap.PixelWidth * sizeof(int),
                0);
        }
    }
}
DepthFrameReady Event

void sensor_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using (DepthImageFrame depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame != null)
        {
            // Copy the pixel data from the image to a temporary array
            depthFrame.CopyDepthImagePixelDataTo(this.depthPixels);

            // Convert the depth pixels to colored pixels
            ConvertDepthData2RGB(depthFrame.MinDepth, depthFrame.MaxDepth);

            this.depthBitmap.WritePixels(
                new Int32Rect(0, 0, this.depthBitmap.PixelWidth, this.depthBitmap.PixelHeight),
                this.colorDepthPixels,
                this.depthBitmap.PixelWidth * sizeof(int),
                0);

            UpdateFrameRate();
        }
    }
}
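ConvertDepthData2RGB is not shown on the slide; a plausible minimal implementation (hypothetical, assuming depthPixels is a DepthImagePixel[] and colorDepthPixels a byte[] in BGR32 layout) maps each depth value to a grey intensity:

    private void ConvertDepthData2RGB(int minDepth, int maxDepth)
    {
        int colorIndex = 0;
        for (int i = 0; i < this.depthPixels.Length; ++i)
        {
            short depth = this.depthPixels[i].Depth;

            // Clamp out-of-range readings to 0, then scale valid ones to 0..255.
            byte intensity = (byte)(depth >= minDepth && depth <= maxDepth
                ? (255 * (depth - minDepth)) / (maxDepth - minDepth)
                : 0);

            this.colorDepthPixels[colorIndex++] = intensity; // Blue
            this.colorDepthPixels[colorIndex++] = intensity; // Green
            this.colorDepthPixels[colorIndex++] = intensity; // Red
            colorIndex++;                                    // Skip the alpha byte
        }
    }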
Depth Data

• ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• The array starts at the top left of the image, moves left to right, then top to bottom, and represents the distance for each pixel in millimeters
Distance

• 2 bytes per pixel (16 bits)
• Depth – distance per pixel
  – Bit-shift the second byte by 8
  – Distance (0,0) = (int)(Bits[0] | Bits[1] << 8);
  – VB: CInt(Bits(0)) Or (CInt(Bits(1)) << 8)
• DepthAndPlayerIndex – includes the player index
  – Bit-shift the first byte by 3 (its low 3 bits hold the player index), the second byte by 5
  – Distance (0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
  – VB: (CInt(Bits(0)) >> 3) Or (CInt(Bits(1)) << 5)
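As a small illustration of the arithmetic above (a hypothetical helper, not part of the SDK):

    // Returns the distance in millimeters for the pixel starting at `index`
    // in the raw depth byte array.
    int GetDistanceMillimeters(byte[] bits, int index, bool hasPlayerIndex)
    {
        if (hasPlayerIndex)
        {
            // The low 3 bits of the first byte hold the player index,
            // so the distance value starts 3 bits in.
            return bits[index] >> 3 | bits[index + 1] << 5;
        }
        return bits[index] | bits[index + 1] << 8;
    }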
Skeleton Tracking

• Skeleton data is expressed in a 3D coordinate system with X, Y, and Z axes
Skeleton

• Seated mode: 10 joints
• Default mode: 20 joints
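Switching between the two modes is a one-line setting (a sketch, assuming an initialized sensor field; seated mode requires SDK 1.5 or later):

    // Track only the 10 upper-body joints.
    this.sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;

    // Or track the full 20-joint skeleton (the default).
    this.sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Default;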
Joint Data

• Maximum of two players tracked at once
• Six player proposals
• Each player has a set of <x, y, z> joints, in meters
• Each joint has an associated state: Tracked, Not Tracked, or Inferred
• Inferred means the joint is occluded, clipped, or low-confidence
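In code, the state is checked before a joint's position is trusted; a minimal sketch, assuming a tracked skeleton like the ones obtained in Step 1 below:

    Joint head = skeleton.Joints[JointType.Head];
    if (head.TrackingState == JointTrackingState.Tracked)
    {
        // Position is in meters, in the sensor's coordinate space.
        SkeletonPoint p = head.Position;
        Console.WriteLine("Head at ({0:F2}, {1:F2}, {2:F2}) m", p.X, p.Y, p.Z);
    }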
Step 1: SkeletonFrameReady Event

// Turn on the skeleton stream to receive skeleton frames
this.sensor.SkeletonStream.Enable();

// Add an event handler to be called whenever there is new skeleton frame data
this.sensor.SkeletonFrameReady += this.SensorSkeletonFrameReady;

/// Event handler for the Kinect sensor's SkeletonFrameReady event
private void SensorSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    Skeleton[] skeletons = new Skeleton[0];

    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame != null)
        {
            skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];
            skeletonFrame.CopySkeletonDataTo(skeletons);
        }
    }
    // (the handler continues in Step 2)
Step 2: Read the skeleton data

    using (DrawingContext dc = this.drawingGroup.Open())
    {
        // Draw a transparent background to set the render size
        dc.DrawRectangle(Brushes.Black, null, new Rect(0.0, 0.0, RenderWidth, RenderHeight));

        if (skeletons.Length != 0)
        {
            foreach (Skeleton skel in skeletons)
            {
                RenderClippedEdges(skel, dc);

                if (skel.TrackingState == SkeletonTrackingState.Tracked)
                {
                    this.DrawBonesAndJoints(skel, dc);
                }
                else if (skel.TrackingState == SkeletonTrackingState.PositionOnly)
                {
                    dc.DrawEllipse(this.centerPointBrush, null,
                        this.SkeletonPointToScreen(skel.Position),
                        BodyCenterThickness, BodyCenterThickness);
                }
            }
        }

        // Prevent drawing outside of our render area
        this.drawingGroup.ClipGeometry = new RectangleGeometry(new Rect(0.0, 0.0, RenderWidth, RenderHeight));
    }
}
Step 3: Use the joint data

// Left Arm
this.DrawBone(skeleton, drawingContext, JointType.ShoulderLeft, JointType.ElbowLeft);
this.DrawBone(skeleton, drawingContext, JointType.ElbowLeft, JointType.WristLeft);
this.DrawBone(skeleton, drawingContext, JointType.WristLeft, JointType.HandLeft);
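DrawBone itself is not shown on the slide; a sketch along the lines of the SDK's SkeletonBasics sample (the pen fields and SkeletonPointToScreen helper are assumed to exist in the class):

    private void DrawBone(Skeleton skeleton, DrawingContext drawingContext,
                          JointType jointType0, JointType jointType1)
    {
        Joint joint0 = skeleton.Joints[jointType0];
        Joint joint1 = skeleton.Joints[jointType1];

        // Nothing to draw if either joint is completely untracked.
        if (joint0.TrackingState == JointTrackingState.NotTracked ||
            joint1.TrackingState == JointTrackingState.NotTracked)
        {
            return;
        }

        // Use a different pen for inferred joints than for fully tracked ones.
        bool tracked = joint0.TrackingState == JointTrackingState.Tracked &&
                       joint1.TrackingState == JointTrackingState.Tracked;
        Pen drawPen = tracked ? this.trackedBonePen : this.inferredBonePen;

        drawingContext.DrawLine(drawPen,
            this.SkeletonPointToScreen(joint0.Position),
            this.SkeletonPointToScreen(joint1.Position));
    }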
Audio

• As a microphone
• For speech recognition
Speech Recognition

• A Kinect grammar is available to download
• Grammar: what we are listening for
  – In code: GrammarBuilder, Choices
  – Speech Recognition Grammar Specification (SRGS)
• Sample grammars: C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars\
Grammar <grammarversion="1.0"xml:lang="it-IT"root="rootRule"tag-format="semantics/1.0-literals"xmlns="http://www.w3.org/2001/06/grammar"> <ruleid="rootRule"> <one-of> <item> <tag>FORWARD</tag> <one-of> <item> avanti </item> <item> vai avanti </item> <item> avanza </item> </one-of> </item> <item> <tag>BACKWARD</tag> <one-of> <item> indietro </item> <item> vai indietro </item> <item> indietreggia </item> </one-of> </item> </one-of> </rule> </grammar>
Netduino Plus based robot

• Magician chassis
• Structure: 2 DC motors, a motor driver, a WiFi bridge, and the Netduino Plus board
Demo: MotionControl

(Architecture diagram from the slide: the MotionControlRemote app issues connect and command messages to a MotionServer; a MotionClient relays them to the MotionControl logic, which drives the TB6612FNG motor driver.)
DEMO

To watch a few moments recorded during the session:
• Gesture recognition demo: https://vimeo.com/58336449
• Speech recognition demo in Neapolitan: https://vimeo.com/58336020
Resources & Contact

• Kinect for Windows: http://www.microsoft.com/en-us/kinectforwindows/
• MSDN: http://msdn.microsoft.com/en-us/library/hh855347.aspx
• Clemente Giorio: http://it.linkedin.com/pub/clemente-giorio/11/618/3a
• Paolo Patierno: http://it.linkedin.com/in/paolopatierno