EEC-492/592 Kinect Application Development Lecture 19 Wenbing Zhao wenbing@ieee.org
Outline • Administrative • This Wednesday, a guest seminar on entrepreneurship by Mr. Benjamin Rosolowski, CEO, CPI Group Limited • Project final presentation signup • Project self-assessment report due • Image processing with Emgu CV • Shape detection • Object detection • Using Emgu CV with Kinect
Shape Detection • Contours • The Hough transform, which enables us to detect regular shapes such as lines and circles in images • Calculation of bounding boxes
Contours • Difference between contours and edges • Edges are local maxima of intensity gradients • These gradient maxima are not all on the outlines of objects, and they may be noisy • Canny edges are a lot like contours • Contours: sequences of connected points that most likely lie on the outlines of objects
Finding Contours public Contour<Point> FindContours(CHAIN_APPROX_METHOD method, RETR_TYPE type, MemStorage stor); • Obtain Canny edges: Image<>.Canny() • Call FindContours() on the cannyEdges image • Parameters: • method: The type of approximation method • type: The retrieval type • stor: The storage used by the sequences • Returns: the first contour found (further contours are reached through its HNext and, for hierarchical retrieval types, VNext properties); null if no contour is found
Hough Transform • The Hough transform transforms a contour from X-Y space to a parameter space (typically the polar coordinate system) • It then uses certain parametric properties of target curves (like lines or circles) to identify the points that fit the target curve in parameter space
Detecting Lines with Hough Transform • A point in a 2D image can be represented in Cartesian coordinates as (x, y), or in polar coordinates as (r, θ) • A line can be written as $r = x\cos\theta + y\sin\theta$, where r is its distance from the origin and θ is the angle of its normal • Each point (x, y) in the Cartesian space therefore corresponds to a sinusoid in the Hough space (i.e., the (r, θ) space) • The sinusoids of points on the same line intersect at a common (r, θ) in the Hough space => this is how lines are detected
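The voting idea can be made concrete with a small C# sketch. It is a brute-force illustration in plain .NET, not Emgu CV's HoughLinesBinary; the class name HoughLineSketch, the 1-degree θ bins, the 1-pixel r bins, and the maxR parameter (which must be at least the largest distance of any point from the origin) are all assumptions made for this example.

```csharp
using System;

// Hypothetical helper: a brute-force Hough line accumulator, for illustration only.
public static class HoughLineSketch
{
    const int ThetaBins = 180; // 1-degree bins over [0, 180)

    // Each point (x, y) votes for every cell (r, theta) with r = x*cos(theta) + y*sin(theta).
    // maxR must be at least the largest distance of any point from the origin.
    public static (int R, int ThetaDeg, int Votes) StrongestLine((int X, int Y)[] points, int maxR)
    {
        // r can be negative for theta in [0, 180), so the r index is offset by maxR.
        var acc = new int[2 * maxR + 1, ThetaBins];
        foreach (var (x, y) in points)
        {
            for (int t = 0; t < ThetaBins; t++)
            {
                double theta = t * Math.PI / 180.0;
                int r = (int)Math.Round(x * Math.Cos(theta) + y * Math.Sin(theta));
                acc[r + maxR, t]++;
            }
        }

        // The accumulator peak is the (r, theta) cell supported by the most points.
        int bestR = 0, bestT = 0, bestVotes = -1;
        for (int ri = 0; ri < acc.GetLength(0); ri++)
            for (int t = 0; t < ThetaBins; t++)
                if (acc[ri, t] > bestVotes)
                {
                    bestVotes = acc[ri, t];
                    bestR = ri - maxR;
                    bestT = t;
                }
        return (bestR, bestT, bestVotes);
    }
}
```

For example, for ten points on the vertical line x = 5 plus one outlier at (2, 7), the peak cell collects the ten collinear votes at |r| = 5 with θ near 0° (or near 180°, where r is negative). Several neighboring θ bins tie at this coarse resolution, which illustrates why the rhoResolution and thetaResolution parameters matter.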
Detecting Circles with Hough Transform • Need 3 parameters: two for the center and one for the radius, (a, b, r) • Given the center (a, b), a point (x, y) on the circle satisfies: • $x = a + r\cos\theta$ • $y = b + r\sin\theta$ • When θ sweeps through the full 360-degree range, the points (x, y) trace the perimeter of the circle • Search for a, b, r • Use an accumulator: a 3D array
Hough Transform: Accumulator • Accumulator: an array used for shape detection in Hough space • The number of dimensions of the array equals the number of unknown parameters of the given Hough transform • The transform is implemented by quantizing the Hough parameter space into finite intervals called accumulator cells • Each cell corresponds to one quantized combination of parameter values • For each edge point (x, y), every cell whose curve passes through (x, y) is incremented • Bins with the highest values (peaks) in the accumulator array are strong evidence that the shape has been detected
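A direct, if slow, rendering of the 3D (a, b, r) accumulator is sketched below in C#. It assumes the edge points have already been extracted (e.g., by Canny), sweeps θ in 1-degree steps, and lets each point vote at most once per cell; HoughCircleSketch and its parameters are hypothetical names for illustration. As far as we know, the HoughCircles method that Emgu CV wraps uses OpenCV's faster gradient-based variant rather than this brute-force search.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: brute-force 3D (a, b, r) Hough accumulator for circles.
public static class HoughCircleSketch
{
    public static (int A, int B, int R, int Votes) StrongestCircle(
        (int X, int Y)[] edgePoints, int width, int height, int minR, int maxR)
    {
        var acc = new int[width, height, maxR + 1];
        foreach (var (x, y) in edgePoints)
        {
            for (int r = minR; r <= maxR; r++)
            {
                // (x, y) lies on every circle of radius r centered at
                // (x - r cos θ, y - r sin θ); sweep θ through 360 degrees.
                var voted = new HashSet<(int, int)>(); // one vote per cell per point
                for (int deg = 0; deg < 360; deg++)
                {
                    double th = deg * Math.PI / 180.0;
                    int a = (int)Math.Round(x - r * Math.Cos(th));
                    int b = (int)Math.Round(y - r * Math.Sin(th));
                    if (a < 0 || a >= width || b < 0 || b >= height) continue;
                    if (voted.Add((a, b))) acc[a, b, r]++;
                }
            }
        }

        // The peak cell is the best-supported circle.
        (int A, int B, int R, int Votes) best = (0, 0, 0, -1);
        for (int a = 0; a < width; a++)
            for (int b = 0; b < height; b++)
                for (int r = minR; r <= maxR; r++)
                    if (acc[a, b, r] > best.Votes) best = (a, b, r, acc[a, b, r]);
        return best;
    }
}
```

For example, with 24 edge points sampled every 15 degrees from a circle of radius 8 centered at (20, 20), the peak is the cell (20, 20, 8) with 24 votes. The work per point grows with the number of radii times the number of θ steps, and the accumulator grows with width × height × radius range, which is why this brute-force form is rarely used on full images.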
Hough Transform APIs public LineSegment2D[][] HoughLinesBinary(double rhoResolution, double thetaResolution, int threshold, double minLineWidth, double gapBetweenLines); • Method defined on Image<> class. Must find edges first: Image<>.Canny() method • Parameters: • rhoResolution: Distance resolution in pixel-related units. • thetaResolution: Angle resolution measured in radians • threshold: A line is returned by the function if the corresponding accumulator value is greater than threshold • minLineWidth: Minimum width of a line • gapBetweenLines: Minimum gap between lines • Returns: The line segments detected for each of the channels
Hough Transform APIs public CircleF[][] HoughCircles(TColor cannyThreshold, TColor accumulatorThreshold, double dp, double minDist, int minRadius, int maxRadius); • Also defined on the Image<> class; it performs the Canny edge detection itself as well as the circle detection, so no separate Canny() call is needed • Parameters: • cannyThreshold: The higher of the two thresholds passed to the Canny edge detector • accumulatorThreshold: Accumulator threshold at the center detection stage. The smaller it is, the more false circles may be detected. • dp: Resolution of the accumulator, given as the inverse ratio of the accumulator resolution to the image resolution (e.g., 2 means the accumulator has half the image's width and height) • minRadius: Minimal radius of the circles to search for • maxRadius: Maximal radius of the circles to search for • minDist: Minimum distance between centers of the detected circles • Returns: The circles detected for each of the channels
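Bounding Box • The smallest up-right (axis-aligned) rectangle enclosing a set of points • Defined in the Contour class as a property public override Rectangle BoundingRectangle { get; } • For the minimum-area rotated rectangle, call GetMinAreaRect() on the contour instead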
Building Shape Detection App • Can be built on top of the image filtering app by adding a button for shape detection
Building Shape Detection App
//Convert the image to grayscale and filter out the noise
Image<Gray, Byte> gray = img.Convert<Gray, Byte>().PyrDown().PyrUp();
// Detecting circles
Gray cannyThreshold = new Gray(180);
Gray cannyThresholdLinking = new Gray(120);
Gray circleAccumulatorThreshold = new Gray(500);
CircleF[] circles = gray.HoughCircles(
    cannyThreshold,
    circleAccumulatorThreshold,
    4.0, //Resolution of the accumulator used to detect centers of the circles
    15.0, //min distance
    5, //min radius
    0 //max radius
    )[0]; //Get the circles from the first channel
Building Shape Detection App
// detecting lines
Image<Gray, Byte> cannyEdges = gray.Canny(cannyThreshold, cannyThresholdLinking);
LineSegment2D[] lines = cannyEdges.HoughLinesBinary(
    1, //Distance resolution in pixel-related units
    Math.PI / 45.0, //Angle resolution measured in radians.
    20, //threshold
    30, //min Line width
    10 //gap between lines
    )[0]; //Get the lines from the first channel
// find triangles and rectangles
List<Triangle2DF> triangleList = new List<Triangle2DF>();
List<MCvBox2D> boxList = new List<MCvBox2D>(); //a box is a rotated rectangle
Building Shape Detection App
//allocate storage for contour approximation
using (MemStorage storage = new MemStorage())
for (
    Contour<Point> contours = cannyEdges.FindContours(
        Emgu.CV.CvEnum.CHAIN_APPROX_METHOD.CV_CHAIN_APPROX_SIMPLE,
        Emgu.CV.CvEnum.RETR_TYPE.CV_RETR_LIST,
        storage);
    contours != null;
    contours = contours.HNext)
{
    Contour<Point> currentContour = contours.ApproxPoly(contours.Perimeter * 0.05, storage);
Building Shape Detection App
    //only consider contours with area greater than 250
    if (currentContour.Area > 250)
    {
        if (currentContour.Total == 3) //The contour has 3 vertices, it is a triangle
        {
            Point[] pts = currentContour.ToArray();
            triangleList.Add(new Triangle2DF(pts[0], pts[1], pts[2]));
        }
        else if (currentContour.Total == 4) //The contour has 4 vertices.
        {
            // determine if all the angles in the contour are within [80, 100] degrees
            bool isRectangle = true;
            Point[] pts = currentContour.ToArray();
            LineSegment2D[] edges = PointCollection.PolyLine(pts, true);
            for (int i = 0; i < edges.Length; i++)
            {
                double angle = Math.Abs(
                    edges[(i + 1) % edges.Length].GetExteriorAngleDegree(edges[i]));
                if (angle < 80 || angle > 100)
                {
                    isRectangle = false;
                    break;
                }
            }
            if (isRectangle) boxList.Add(currentContour.GetMinAreaRect());
        }
    }
}
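The [80, 100]-degree corner test in the code above can be tried in isolation. The C# sketch below is a hypothetical plain-.NET helper (QuadCheckSketch.IsRoughlyRectangle), not Emgu CV's GetExteriorAngleDegree: it computes the turning angle between consecutive edges of a closed 4-gon with atan2 and accepts the shape only if every angle lies within the bounds.

```csharp
using System;

// Hypothetical helper mirroring the [80, 100]-degree corner test.
public static class QuadCheckSketch
{
    // Turning angle (degrees, 0..180) between edge p0->p1 and edge p1->p2.
    static double TurnAngleDegrees((double X, double Y) p0, (double X, double Y) p1, (double X, double Y) p2)
    {
        double ux = p1.X - p0.X, uy = p1.Y - p0.Y;
        double vx = p2.X - p1.X, vy = p2.Y - p1.Y;
        double cross = ux * vy - uy * vx;
        double dot = ux * vx + uy * vy;
        return Math.Abs(Math.Atan2(cross, dot)) * 180.0 / Math.PI;
    }

    public static bool IsRoughlyRectangle((double X, double Y)[] quad, double minDeg = 80, double maxDeg = 100)
    {
        if (quad.Length != 4) return false;
        for (int i = 0; i < 4; i++)
        {
            // Indices wrap around because the polygon is closed.
            double angle = TurnAngleDegrees(quad[i], quad[(i + 1) % 4], quad[(i + 2) % 4]);
            if (angle < minDeg || angle > maxDeg) return false;
        }
        return true;
    }
}
```

An axis-aligned or 45-degree-rotated square passes, while a rhombus whose edges turn by about 58 degrees at a corner is rejected.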
Building Shape Detection App
Image<Bgr, Byte> shapesImg = img.CopyBlank();
// draw triangles and rectangles
foreach (Triangle2DF triangle in triangleList)
    shapesImg.Draw(triangle, new Bgr(Color.DarkBlue), 2);
foreach (MCvBox2D box in boxList)
    shapesImg.Draw(box, new Bgr(Color.DarkOrange), 2);
// draw circles
foreach (CircleF circle in circles)
    shapesImg.Draw(circle, new Bgr(Color.Brown), 2);
image2.Source = ToBitmapSource(shapesImg);
The Object Detection Problem • Object detection is typically done by comparing two images • One image is the model of the object you want to detect • The other image contains the object you want to detect • Keypoint-based object detection methods • Avoid using the characteristics of the whole object model for comparison • Find certain “important” points, i.e., keypoints, in the object model, and compare only the keypoints • Use some notion of similarity to see how many keypoint descriptors match
Keypoints and Descriptors • Keypoint descriptors are often called features • Object detection algorithms are typically scale and rotation invariant • Scale invariance: actual objects can be bigger or smaller than in the model image • Rotation invariance: actual objects may be rotated compared with the model image • Each keypoint has an associated scale and orientation
SIFT Keypoint Detection • Convolves the image with Gaussians of successively increasing variance => each time the scale doubles (one octave), the image is down-sampled by a factor of 2 • This creates a scale pyramid • Successive images in this pyramid are subtracted from each other => the resulting images are said to be the output of a Difference of Gaussians (DoG) operator on the original image • A point is selected as a keypoint only if it is higher or lower than all of its 26 neighbors (8 in its own DoG image and 9 in each of the DoG images at the adjacent scales above and below)
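The last two steps, forming DoG images and keeping only points that beat all 26 neighbors, can be sketched in C# as follows. This is an illustration under simplifying assumptions: the Gaussian-blurred images are taken as given (same size, stored as double[,] indexed [y, x]), and DogSketch, Difference, and IsExtremum are hypothetical names.

```csharp
using System;

// Hypothetical helpers illustrating the DoG step and the 26-neighbor test.
public static class DogSketch
{
    // DoG image: difference of two Gaussian-blurred images at adjacent scales.
    public static double[,] Difference(double[,] blurredHigher, double[,] blurredLower)
    {
        int h = blurredHigher.GetLength(0), w = blurredHigher.GetLength(1);
        var dog = new double[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dog[y, x] = blurredHigher[y, x] - blurredLower[y, x];
        return dog;
    }

    // True if dog[s][y, x] is strictly greater (or strictly smaller) than all 26 neighbors:
    // 8 in its own DoG image and 9 in each of the images at scales s - 1 and s + 1.
    // The caller must keep s, y, x away from the borders of the stack.
    public static bool IsExtremum(double[][,] dog, int s, int y, int x)
    {
        double v = dog[s][y, x];
        bool isMax = true, isMin = true;
        for (int ds = -1; ds <= 1; ds++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                {
                    if (ds == 0 && dy == 0 && dx == 0) continue; // skip the candidate itself
                    double n = dog[s + ds][y + dy, x + dx];
                    if (n >= v) isMax = false;
                    if (n <= v) isMin = false;
                }
        return isMax || isMin;
    }
}
```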
SIFT Keypoint Detection • Keypoint locations are further filtered by checking that they have sufficient contrast and that they are not a part of an edge • A square region around the keypoint corresponding to a circle with a radius of 1.5 times the scale is selected in the image • The gradient orientation at every point in this region is also computed • An orientation histogram is constructed. Peaks in the histogram => orientation of the keypoints
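A simplified C# sketch of the orientation histogram follows. It assumes a grayscale patch around the keypoint, central-difference gradients, and 36 bins of 10 degrees (the bin count used in Lowe's SIFT paper); the Gaussian weighting, peak interpolation, and secondary peaks of the full algorithm are omitted, and OrientationHistogramSketch is a hypothetical name. Angles follow atan2 on array coordinates, so with y pointing down, 90 degrees means intensity increasing downward.

```csharp
using System;

// Hypothetical helper: dominant gradient orientation of a patch via a 36-bin histogram.
public static class OrientationHistogramSketch
{
    public static double DominantOrientationDegrees(double[,] patch, int bins = 36)
    {
        var hist = new double[bins];
        double binWidth = 360.0 / bins;
        int h = patch.GetLength(0), w = patch.GetLength(1);
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++)
            {
                // Central-difference gradient at (x, y).
                double gx = patch[y, x + 1] - patch[y, x - 1];
                double gy = patch[y + 1, x] - patch[y - 1, x];
                double mag = Math.Sqrt(gx * gx + gy * gy);
                if (mag == 0) continue;
                double deg = Math.Atan2(gy, gx) * 180.0 / Math.PI;
                if (deg < 0) deg += 360.0;
                // Each gradient votes for its orientation bin, weighted by its magnitude.
                hist[(int)(deg / binWidth) % bins] += mag;
            }
        // The histogram peak gives the keypoint orientation (returned as the bin center).
        int best = 0;
        for (int i = 1; i < bins; i++)
            if (hist[i] > hist[best]) best = i;
        return (best + 0.5) * binWidth;
    }
}
```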
SURF Keypoint Detection • Significantly faster than SIFT • SURF replaces complicated continuous, real-valued functions (Gaussian second-derivative filters) with rectangular, discretized integer approximations (box filters) • SURF uses maxima of the Hessian matrix determinant
SURF Keypoint Detection • The Hessian matrix consists of the second-order spatial derivatives of the image after smoothing it with a Gaussian filter • Second-order spatial derivatives of the image peak where there are sudden intensity changes • However, edges also count as sudden intensity changes; the determinant of the matrix helps us distinguish edges from corners • The determinant of the Hessian matrix at point x and scale s: • $\det(H) = L_{xx}(x, s)\,L_{yy}(x, s) - L_{xy}(x, s)^2$
SURF Keypoint Detection $\det(H) = L_{xx}(x, s)\,L_{yy}(x, s) - L_{xy}(x, s)^2$ • $L_{xx}$ and $L_{yy}$ respond to vertical and horizontal edges, respectively • $L_{xy}$ responds most favorably to corners formed by diagonal edges • The determinant therefore has a high value when • there is an intersecting pair of horizontal and vertical edges (making a corner), or • there is a corner made from diagonal edges
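To see the edge-versus-corner behavior numerically, the C# sketch below evaluates det(H) = Lxx*Lyy - Lxy^2 at one pixel using central finite differences. It assumes the image has already been Gaussian-smoothed and is not how SURF computes the derivatives (SURF uses box-filter approximations); HessianSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper: determinant of the Hessian at one pixel (img indexed [y, x]),
// using central finite differences on an image assumed to be already smoothed.
public static class HessianSketch
{
    public static double DetHessian(double[,] img, int y, int x)
    {
        double lxx = img[y, x + 1] - 2 * img[y, x] + img[y, x - 1];
        double lyy = img[y + 1, x] - 2 * img[y, x] + img[y - 1, x];
        double lxy = (img[y + 1, x + 1] - img[y - 1, x + 1]
                    - img[y + 1, x - 1] + img[y - 1, x - 1]) / 4.0;
        return lxx * lyy - lxy * lxy;
    }
}
```

On a 5×5 image with a straight vertical step edge, det(H) at the edge pixel (2, 2) is 0 because Lyy and Lxy vanish; on a 5×5 image whose bright region is the quadrant x ≥ 2, y ≥ 2 (a corner at pixel (2, 2)), the same formula gives 0.9375.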
Determinant of a Matrix • The determinant is a value associated with a square matrix • The absolute value of the determinant gives the scale factor by which area or volume (or a higher-dimensional analogue) is multiplied under the associated linear transformation, while its sign indicates whether the transformation preserves orientation
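As a small worked example of the definition above (plain 2×2 cases, nothing assumed beyond the definition), the determinant of a 2×2 matrix and two instances are:

$$
\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc, \qquad
\det\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = 6, \qquad
\det\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1
$$

The first instance stretches x by 2 and y by 3, so every area is multiplied by 6; the second swaps the axes, which keeps areas unchanged but reverses orientation, hence the sign −1. The Hessian is the symmetric case b = c = Lxy, which gives det(H) = Lxx Lyy − Lxy².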
SURF Keypoint Detection • Convolution with a kernel: element-wise multiplication of the pixel values with the kernel elements, followed by summing up the products • If the kernel elements are constant, we can sum up the pixel values under the kernel and multiply the sum by the constant kernel value • Box filter: a kernel with constant elements in rectangular regions • Integral image: for any image, its integral image at a pixel is the cumulative sum of the pixel values up to that pixel, starting from the origin (top-left corner) • Mathematically, if I is an image and H is its integral image, the pixel (x, y) of H is given by: $H(x, y) = \sum_{x' \le x} \sum_{y' \le y} I(x', y')$
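The integral image and the constant-time box sum it enables can be sketched in C#. IntegralImageSketch, Build, and BoxSum are hypothetical names; the sketch pads H with a leading row and column of zeros (so H[y + 1, x + 1] holds the sum of I over rows 0..y and columns 0..x), which is a common convenience rather than part of the definition above.

```csharp
using System;

// Hypothetical helper: integral image with a zero-padded first row and column.
public static class IntegralImageSketch
{
    // H[y + 1, x + 1] = sum of img[y', x'] for all y' <= y and x' <= x.
    public static long[,] Build(int[,] img)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        var H = new long[h + 1, w + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                // Inclusion-exclusion: add the pixel plus the sums above and to the left,
                // then subtract the top-left block that was counted twice.
                H[y + 1, x + 1] = img[y, x] + H[y, x + 1] + H[y + 1, x] - H[y, x];
        return H;
    }

    // Sum of img over columns x0..x1 and rows y0..y1 (inclusive) using four lookups.
    public static long BoxSum(long[,] H, int x0, int y0, int x1, int y1)
        => H[y1 + 1, x1 + 1] - H[y0, x1 + 1] - H[y1 + 1, x0] + H[y0, x0];
}
```

Because any rectangle sum needs only four lookups, a box filter with constant value c over a rectangle costs c × BoxSum(...) regardless of the rectangle's size, which is what makes large box filters cheap.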
SURF Keypoint Detection • To construct the scale-space pyramid, SURF increases the size of the Gaussian filter rather than reducing the size of the image • It then finds the extrema of the Hessian matrix determinant values at different scales by comparing each point with its 26 neighbors in the pyramid, just like SIFT • This gives the SURF keypoints with their scales • Keypoint orientation is determined from a circular neighborhood around the keypoint with a radius of 6 times the keypoint's scale • At every point in this neighborhood, responses to horizontal and vertical box filters (called Haar wavelets) are recorded
SURF Keypoint Detection • The responses are weighted and represented as vectors in a space with the horizontal response strength along the x-axis and the vertical response strength along the y-axis • A sliding window covering an angle of 60 degrees is rotated through this space • All the responses within the window are summed to give a new vector; there are as many of these vectors as there are positions of the sliding window • The largest of these vectors lends its orientation to the keypoint
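The sliding-window step can be sketched in C# as follows. It assumes the weighted Haar responses are already available as (dx, dy) vectors and that the 60-degree window advances in 5-degree steps; both the step size and the names SurfOrientationSketch and DominantOrientationDegrees are assumptions for illustration.

```csharp
using System;

// Hypothetical helper: dominant orientation from a rotating 60-degree window.
public static class SurfOrientationSketch
{
    public static double DominantOrientationDegrees((double Dx, double Dy)[] responses, double stepDeg = 5.0)
    {
        double bestLen2 = -1, bestAngle = 0;
        for (double start = 0; start < 360.0; start += stepDeg)
        {
            // Sum all response vectors whose angle falls inside [start, start + 60).
            double sx = 0, sy = 0;
            foreach (var (dx, dy) in responses)
            {
                double a = Math.Atan2(dy, dx) * 180.0 / Math.PI;
                if (a < 0) a += 360.0;
                double offset = a - start;
                if (offset < 0) offset += 360.0; // wrap around 0/360 degrees
                if (offset < 60.0) { sx += dx; sy += dy; }
            }
            // Keep the longest summed vector; its direction becomes the orientation.
            double len2 = sx * sx + sy * sy;
            if (len2 > bestLen2)
            {
                bestLen2 = len2;
                bestAngle = Math.Atan2(sy, sx) * 180.0 / Math.PI;
            }
        }
        return bestAngle < 0 ? bestAngle + 360.0 : bestAngle;
    }
}
```

For example, with three responses pointing at 45 degrees and one weak response at 180 degrees, no 60-degree window can contain both groups, so the longest summed vector points at 45 degrees.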
Keypoint Descriptor • Each SIFT keypoint has a 128-element descriptor • Each SURF keypoint has a 64-element descriptor • Matching keypoint descriptors • Two descriptors are considered to match if the Euclidean distance between them is low • The matches are filtered using special algorithms called “nearest neighbor searches”, which find the test descriptor closest in the Euclidean sense to each train descriptor • A descriptor match is eliminated if the distance from the train descriptor to its 1st nearest neighbor is greater than a threshold times the distance to its 2nd nearest neighbor • This threshold is usually set to 0.8, which forces the matches to be unique and highly “discriminative” • The lower the threshold, the more stringent the matching
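The 1st/2nd nearest-neighbor ratio test can be sketched in C# with a brute-force search (practical systems typically use faster approximate nearest-neighbor structures). RatioTestSketch and Match are hypothetical names; descriptors are plain float arrays of equal length, and a match is kept when d1 ≤ ratio × d2, i.e., eliminated when d1 is greater than the threshold times d2, as described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: brute-force nearest-neighbor matching with the ratio test.
public static class RatioTestSketch
{
    static double Distance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.Sqrt(sum); // Euclidean distance
    }

    // For each train descriptor, find its 1st and 2nd nearest test descriptors and
    // keep the match only if d1 <= ratio * d2.
    public static List<(int Train, int Test)> Match(float[][] train, float[][] test, double ratio = 0.8)
    {
        var matches = new List<(int Train, int Test)>();
        if (test.Length < 2) return matches; // the ratio test needs two neighbors
        for (int i = 0; i < train.Length; i++)
        {
            int best = -1;
            double d1 = double.MaxValue, d2 = double.MaxValue;
            for (int j = 0; j < test.Length; j++)
            {
                double d = Distance(train[i], test[j]);
                if (d < d1) { d2 = d1; d1 = d; best = j; }
                else if (d < d2) { d2 = d; }
            }
            if (d1 <= ratio * d2) matches.Add((i, best));
        }
        return matches;
    }
}
```

For example, a train descriptor whose two nearest test descriptors are at distances 4.9 and 5.2 is rejected, since 4.9 > 0.8 × 5.2 = 4.16.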
EmguCV MKeyPoint Struct • public struct MKeyPoint • float Angle; // orientation of the keypoint • int ClassId; // object id • int Octave; // octave (pyramid layer) from which the keypoint has been extracted • PointF Point; // coordinates of the keypoint • float Response; // the response by which the strongest keypoints have been selected: an indicator of how good a point is • float Size; // diameter of the useful keypoint adjacent area
EmguCV ImageFeature Struct • public struct ImageFeature • public float[] Descriptor; • public MKeyPoint KeyPoint;
Build SURFeature App
• Import namespaces
using System.Diagnostics;
using System.Drawing;
using System.Runtime.InteropServices;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Features2D;
using Emgu.CV.Structure;
using Emgu.CV.UI;
• Add member variables
Image<Gray, Byte> modelImg;
Image<Gray, Byte> testImg;
Build SURFeature App
• Constructor and init()
public MainWindow()
{
    InitializeComponent();
    init();
}
private void init()
{
    this.modelImg = new Image<Gray, Byte>("box.png");
    this.testImg = new Image<Gray, Byte>("box_in_scene.png");
    image1.Source = ToBitmapSource(this.modelImg);
    image2.Source = ToBitmapSource(this.testImg);
}
Build SURFeature App
• Add button click event handler
private void button1_Click(object sender, RoutedEventArgs e)
{
    featureTest();
}
• Feature detection
private void featureTest()
{
    SURFDetector surfParam = new SURFDetector(500, false);
    //extract features from the object image
    ImageFeature[] modelFeatures = surfParam.DetectFeatures(this.modelImg, null);
    //Create a Feature Tracker
    Features2DTracker tracker = new Features2DTracker(modelFeatures);
    // extract features from the observed image
    ImageFeature[] imageFeatures = surfParam.DetectFeatures(this.testImg, null);
Build SURFeature App
• Feature detection (continued)
    Features2DTracker.MatchedImageFeature[] matchedFeatures = tracker.MatchFeature(imageFeatures, 2, 20);
    matchedFeatures = Features2DTracker.VoteForUniqueness(matchedFeatures, 0.8);
    matchedFeatures = Features2DTracker.VoteForSizeAndOrientation(matchedFeatures, 1.5, 20);
    HomographyMatrix homography = Features2DTracker.GetHomographyMatrixFromMatchedFeatures(matchedFeatures);
    //Merge the object image and the observed image into one image for display
    Image<Gray, Byte> res = this.modelImg.ConcateHorizontal(this.testImg);
• matchedFeatures are refined first via Features2DTracker.VoteForUniqueness() and then via Features2DTracker.VoteForSizeAndOrientation()
• Homography: a 3x3 matrix defining a perspective transform
Build SURFeature App
• Feature detection (continued)
    //draw lines between the matched features
    foreach (Features2DTracker.MatchedImageFeature matchedFeature in matchedFeatures)
    {
        PointF p = matchedFeature.ObservedFeature.KeyPoint.Point;
        // due to concatenation, need to shift horizontally by Width
        p.X += this.modelImg.Width;
        res.Draw(new LineSegment2DF(matchedFeature.SimilarFeatures[0].Feature.KeyPoint.Point, p), new Gray(0), 1);
    }
Build SURFeature App
• Feature detection (continued)
    // draw the projected region on the image
    if (homography != null)
    {
        Rectangle rect = this.modelImg.ROI; // region of interest
        PointF[] pts = new PointF[] {
            new PointF(rect.Left, rect.Bottom),
            new PointF(rect.Right, rect.Bottom),
            new PointF(rect.Right, rect.Top),
            new PointF(rect.Left, rect.Top)};
        homography.ProjectPoints(pts);
        // shift the projected corners into the observed-image half of the concatenated image
        for (int i = 0; i < pts.Length; i++)
            pts[i].X += this.modelImg.Width;
        res.DrawPolyline(Array.ConvertAll<PointF, Point>(pts, Point.Round), true, new Gray(255.0), 5);
    }
    image3.Source = ToBitmapSource(res);
}
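Conceptually, projecting a point through a homography, as homography.ProjectPoints(pts) does above for the four corners of the model region, means multiplying the homogeneous point (x, y, 1) by the 3x3 matrix and dividing by the third coordinate. The C# sketch below shows that arithmetic with a plain double[3, 3] array; HomographySketch is a hypothetical name and this is not Emgu CV's implementation.

```csharp
using System;

// Hypothetical helper: applies a 3x3 homography (row-major double[3, 3]) to a 2D point.
public static class HomographySketch
{
    public static (double X, double Y) Project(double[,] h, double x, double y)
    {
        // Multiply the homogeneous point (x, y, 1) by the matrix...
        double u = h[0, 0] * x + h[0, 1] * y + h[0, 2];
        double v = h[1, 0] * x + h[1, 1] * y + h[1, 2];
        double w = h[2, 0] * x + h[2, 1] * y + h[2, 2];
        // ...then divide by the third coordinate to return to image coordinates.
        return (u / w, v / w);
    }
}
```

A pure translation [[1, 0, 5], [0, 1, -3], [0, 0, 1]] maps (2, 4) to (7, 1); a nonzero bottom row, as in [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]], divides by w = 0.5x + 1 and produces the perspective effect, mapping (2, 4) to (1, 2).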
Build a StopSign Detection App • Need one stop sign model image and a test image
Build a StopSign Detection App
• Import namespaces
using System.Collections.Generic; // for List<T>
using System.Diagnostics;
using System.Drawing;
using System.Text;
using Emgu.CV;
using Emgu.CV.Features2D;
using Emgu.CV.Structure;
using Emgu.Util;
using System.Runtime.InteropServices; // for DllImport
• Add member variables
private Features2DTracker tracker;
private SURFDetector detector;
private MemStorage octagonStorage;
private Contour<Point> octagon;
Image<Bgr, Byte> img;
Build a StopSign Detection App
• Constructor and initialization
public MainWindow()
{
    InitializeComponent();
    DisplayImage();
}
private void DisplayImage()
{
    this.img = new Image<Bgr, Byte>("stop-sign.jpg");
    image1.Source = ToBitmapSource(this.img);
}
• Button event handler
private void button1_Click(object sender, RoutedEventArgs e)
{
    ProcessImage();
}
Build a StopSign Detection App
private void ProcessImage()
{
    this.detector = new SURFDetector(500, false);
    using (Image<Bgr, Byte> imgModel = new Image<Bgr, Byte>("stop-sign-model.png"))
    using (Image<Gray, Byte> redMask = GetRedPixelMask(imgModel))
    {
        this.tracker = new Features2DTracker(this.detector.DetectFeatures(redMask, null));
    }
    this.octagonStorage = new MemStorage();
    this.octagon = new Contour<Point>(this.octagonStorage);
    this.octagon.PushMulti(new Point[] {
        new Point(1, 0), new Point(2, 0), new Point(3, 1), new Point(3, 2),
        new Point(2, 3), new Point(1, 3), new Point(0, 2), new Point(0, 1)},
        Emgu.CV.CvEnum.BACK_OR_FRONT.FRONT);
    List<Image<Gray, Byte>> stopSignList = new List<Image<Gray, byte>>();
    List<Rectangle> stopSignBoxList = new List<Rectangle>();
    DetectStopSign(this.img, stopSignList, stopSignBoxList);
Build a StopSign Detection App
    Point startPoint = new Point(10, 10);
    for (int i = 0; i < stopSignList.Count; i++)
    {
        Rectangle rect = stopSignBoxList[i];
        this.img.Draw(rect, new Bgr(Color.Aquamarine), 2);
        image1.Source = ToBitmapSource(this.img);
        label1.Content = "Stop sign detected!";
    }
    if (stopSignList.Count < 1) label1.Content = "No stop sign detected";
}
Build a StopSign Detection App
public void DetectStopSign(Image<Bgr, byte> img, List<Image<Gray, Byte>> stopSignList, List<Rectangle> boxList)
{
    Image<Bgr, Byte> smoothImg = img.SmoothGaussian(5, 5, 1.5, 1.5);
    Image<Gray, Byte> smoothedRedMask = GetRedPixelMask(smoothImg);
    //Use Dilate followed by Erode to eliminate small gaps in some contour.
    smoothedRedMask._Dilate(1);
    smoothedRedMask._Erode(1);
    using (Image<Gray, Byte> canny = smoothedRedMask.Canny(new Gray(100), new Gray(50)))
    using (MemStorage stor = new MemStorage())
    {
        Contour<Point> contours = canny.FindContours(
            Emgu.CV.CvEnum.CHAIN_APPROX_METHOD.CV_CHAIN_APPROX_SIMPLE,
            Emgu.CV.CvEnum.RETR_TYPE.CV_RETR_TREE,
            stor);
        FindStopSign(img, stopSignList, boxList, contours);
    }
}
Build a StopSign Detection App
private static Image<Gray, Byte> GetRedPixelMask(Image<Bgr, byte> image)
{
    using (Image<Hsv, Byte> hsv = image.Convert<Hsv, Byte>())
    {
        Image<Gray, Byte>[] channels = hsv.Split();
        try
        {
            //channels[0] is the mask for hue less than 20 or larger than 160
            CvInvoke.cvInRangeS(channels[0], new MCvScalar(20), new MCvScalar(160), channels[0]);
            channels[0]._Not();
            //channels[1] is the mask for saturation greater than 10; this is mainly used to filter out white pixels
            channels[1]._ThresholdBinary(new Gray(10), new Gray(255.0));
            CvInvoke.cvAnd(channels[0], channels[1], channels[0], IntPtr.Zero);
        }
        finally
        {
            channels[1].Dispose();
            channels[2].Dispose();
        }
        return channels[0];
    }
}
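Per pixel, the mask computed by GetRedPixelMask reduces to a simple predicate on hue and saturation. The C# sketch below restates it, assuming OpenCV's 8-bit HSV convention (hue stored as 0 to 179, i.e., degrees divided by 2) and following the code's comments: red means hue below 20 or above 160, and saturation strictly above 10 (_ThresholdBinary keeps only values greater than the threshold). RedMaskSketch and IsRed are hypothetical names, and the boundary hue values 20 and 160 may be classified differently by cvInRangeS than by this sketch.

```csharp
using System;

// Hypothetical helper restating the red-pixel test of GetRedPixelMask.
public static class RedMaskSketch
{
    // Hue is assumed to be in OpenCV's 8-bit range 0..179 (degrees / 2).
    public static bool IsRed(byte hue, byte saturation)
        => (hue < 20 || hue > 160) && saturation > 10;

    // Builds a 0/255 mask from separate hue and saturation channels of the same size.
    public static byte[,] Mask(byte[,] hue, byte[,] saturation)
    {
        int h = hue.GetLength(0), w = hue.GetLength(1);
        var mask = new byte[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                mask[y, x] = IsRed(hue[y, x], saturation[y, x]) ? (byte)255 : (byte)0;
        return mask;
    }
}
```

Red hues sit at both ends of the hue circle, which is why the test needs two ranges; the saturation check drops near-white and gray pixels whose hue is essentially noise.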
Build a StopSign Detection App
private void FindStopSign(Image<Bgr, byte> img, List<Image<Gray, Byte>> stopSignList, List<Rectangle> boxList, Contour<Point> contours)
{
    for (; contours != null; contours = contours.HNext)
    {
        contours.ApproxPoly(contours.Perimeter * 0.02, 0, contours.Storage);
        if (contours.Area > 200)
        {
            double ratio = CvInvoke.cvMatchShapes(this.octagon, contours,
                Emgu.CV.CvEnum.CONTOURS_MATCH_TYPE.CV_CONTOURS_MATCH_I3, 0);
            if (ratio > 0.1) //not a good match of contour shape
            {
                Contour<Point> child = contours.VNext;
                if (child != null) FindStopSign(img, stopSignList, boxList, child);
                continue;
            }
Build a StopSign Detection App
            Rectangle box = contours.BoundingRectangle;
            Image<Gray, Byte> candidate;
            using (Image<Bgr, Byte> tmp = img.Copy(box))
                candidate = tmp.Convert<Gray, byte>();
            //set the value of pixels not in the contour region to zero
            using (Image<Gray, Byte> mask = new Image<Gray, byte>(box.Size))
            {
                mask.Draw(contours, new Gray(255), new Gray(255), 0, -1, new Point(-box.X, -box.Y));
                double mean = CvInvoke.cvAvg(candidate, mask).v0;
                candidate._ThresholdBinary(new Gray(mean), new Gray(255.0));
                candidate._Not();
                mask._Not();
                candidate.SetValue(0, mask);
            }