Image Understanding: A Crash Course for Robot Racers (17 January 2013)
Terminology • Image processing • Input is image, output is image • Goal: make image look better to human viewer • Machine vision • Input is image, output is information about content • Goal: determine what is in image • AKA computer vision, image understanding • Our task is machine vision, not image processing
Typical vision system organization • Pipeline: Raw data → Feature Measurement → Feature vector → Pattern Classifier → Class identity • Possible block contents: noise removal, segmentation, shape analysis, feature extraction, matching, consistency analysis
Identifying/evaluating objects • Critical in many applications • Inspection in industrial setting • Automatic target recognition • Designer knows a priori what to look for • Feature set is application specific • Environment often simplified for robotic applications • Limited set of visually distinctive objects • Example: vertical pylons in racers (2008-2009) • A “general recognizer” is far more difficult • Consider Google’s self-driving cars
Typical building blocks • Common image operators can be found in • MATLAB, OpenCV, similar libraries • Can they help us? • Real-time operation critical for our application • Not ported to our platform • Developing a vision system • Find effective algorithm; use whatever is convenient • Implement simple C version from scratch, verify • Move to hardware if necessary
Follow the data: source • Sensor (MT9V024) captures Bayer RGB • Global shutter • Frame rate: 60 fps • Active array 752 x 480 (10 bits/pixel) • Camera gives you the image a byte at a time in: • Bayer, or YCbCr/YUV, or RGB (565, 555, or 444) • Non-Bayer formats are interpolated • Camera offers unexplored capabilities • We’re hoping to get the data sheet – proprietary
Follow the data: camera interface • VHDL core • Performs required handshaking with camera • Buffers pixels, performs desired processing • Writes resulting pixel values to memory • Informs CPU (via interrupt) that new image is available
Follow the data: format choice • Typical approach of previous teams • Let camera do Bayer-to-RGB conversion • Get RGB from camera • Latest technique • Get full Bayer from camera • Do conversion to RGB in VHDL • Results: • Greater color depth (8 bits/channel) • Better color discrimination in vision algorithms
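To make the format choice concrete, here is a minimal C sketch of what a Bayer-to-RGB step computes, assuming an RGGB mosaic and even image dimensions. It uses the simplest possible "superpixel" scheme (one RGB pixel per 2x2 Bayer cell, so half resolution); the team's actual conversion is done in VHDL and would interpolate instead.

```c
#include <stdint.h>

/* Minimal "superpixel" Bayer-to-RGB sketch (assumes an RGGB mosaic and
 * even width/height).  Each 2x2 Bayer cell  [ R G ]
 *                                           [ G B ]
 * becomes one RGB pixel, so the output is half resolution in each
 * dimension.  Real demosaicing interpolates full resolution; this only
 * shows the data layout the conversion has to deal with. */
void bayer_rggb_to_rgb(const uint8_t *bayer, int width, int height,
                       uint8_t *rgb /* (width/2)*(height/2)*3 bytes */)
{
    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x += 2) {
            uint8_t r  = bayer[y * width + x];
            uint8_t g1 = bayer[y * width + x + 1];
            uint8_t g2 = bayer[(y + 1) * width + x];
            uint8_t b  = bayer[(y + 1) * width + x + 1];

            uint8_t *out = &rgb[((y / 2) * (width / 2) + (x / 2)) * 3];
            out[0] = r;
            out[1] = (uint8_t)((g1 + g2) / 2);  /* average the two greens */
            out[2] = b;
        }
    }
}
```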
Follow the data: software • Execution triggered by interrupt • Only real constraint on your processing is time • Need to finish processing before next image appears (at the camera’s 60 fps, that’s roughly 16.7 ms per frame) • Clever optimizations can help to speed processing
Noise: anticipate it • Actual images from camera • Probably more extreme than you will experience • Kalman filter, anyone?
Our (visual) simplifications • Only objects we must consider: • Trucks and base stations (both have light towers) • Landmarks and obstacles (will have distinctive appearance)
A simple approach: segmentation • Definition: • Segmentation is partitioning an image into connected, homogeneous regions • Example: • Isolating dark objects on a tan conveyor belt for inspection • Easy to separate light and dark with consistent lighting • For us, segments might be • Lights on towers • Obstacles • Navigation markers
Color segmentation: find red pixels • In RGB, requires 3 tests per pixel (see the sketch below) • Is red channel in range? • Is green? • Is blue? • Observation: 3D nature of RGB adds complexity • Easier with gray-scale images
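A minimal C sketch of the three per-pixel range tests, producing a binary mask of "red" pixels; the threshold values here are illustrative placeholders, not tuned numbers.

```c
#include <stdint.h>

/* Mark pixels whose RGB values fall inside a "red" box in color space.
 * The thresholds below are placeholders you would tune experimentally. */
void find_red_pixels(const uint8_t *rgb, int num_pixels, uint8_t *mask)
{
    for (int i = 0; i < num_pixels; i++) {
        uint8_t r = rgb[3 * i + 0];
        uint8_t g = rgb[3 * i + 1];
        uint8_t b = rgb[3 * i + 2];

        /* Three tests per pixel: one per channel. */
        mask[i] = (r > 150 && g < 80 && b < 80) ? 1 : 0;
    }
}
```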
It gets worse… • Teams used segmentation to find pylons, BUT • Brightness and color changed with ambient lighting, view angle, camera settings, etc. • For light towers, appearance is more consistent because of LEDs, BUT • We’ll want to see things (landmarks, obstacles) that won’t have LEDs • We probably can’t rely on segmentation alone
Making segmentation fast • Method 1: • Process band of pixels near center of image • Process other rows only if candidate region identified • Rationale: Location of towers in images will be consistent • Method 2: • For each target, process all pixels & produce binary image • Sum each row of pixels, each column of pixels; find high values (see the sketch below) • Rationale: Tower lights will appear as rectangles in image
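A minimal C sketch of the row/column sums in Method 2, assuming a binary mask like the one produced above; deciding what counts as a "high" value is left to the caller.

```c
#include <stdint.h>

/* Project a binary mask onto its rows and columns.  A tower light that
 * appears as a bright rectangle produces a run of high values in both
 * projections; thresholding / peak finding is done by the caller. */
void project_mask(const uint8_t *mask, int width, int height,
                  int *row_sums /* height entries */,
                  int *col_sums /* width entries */)
{
    for (int y = 0; y < height; y++) row_sums[y] = 0;
    for (int x = 0; x < width;  x++) col_sums[x] = 0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (mask[y * width + x]) {
                row_sums[y]++;
                col_sums[x]++;
            }
        }
    }
}
```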
Reducing dimensionality • Segmentation in RGB is inherently 3D • What can we do to reduce the 3 tests per pixel? • Solution: use a different color space: • Consider HSI/HSV rather than RGB • Advantage: 1D color discrimination • VHDL cores exist to convert image to HSI/HSV
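For reference, a minimal floating-point RGB-to-HSV conversion in C; after converting, discriminating "red" reduces to a single range test on hue (perhaps plus a minimum saturation). The VHDL cores mentioned above would use a fixed-point variant of the same math.

```c
#include <stdint.h>

/* RGB (8-bit channels) to HSV: H in [0, 360), S and V in [0, 1]. */
void rgb_to_hsv(uint8_t r8, uint8_t g8, uint8_t b8,
                float *h, float *s, float *v)
{
    float r = r8 / 255.0f, g = g8 / 255.0f, b = b8 / 255.0f;
    float max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    float min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    float delta = max - min;

    *v = max;                               /* value = brightness        */
    *s = (max > 0.0f) ? delta / max : 0.0f; /* saturation = colorfulness */

    if (delta == 0.0f) {                    /* gray: hue undefined, use 0 */
        *h = 0.0f;
    } else if (max == r) {
        *h = 60.0f * ((g - b) / delta);
    } else if (max == g) {
        *h = 60.0f * ((b - r) / delta + 2.0f);
    } else {
        *h = 60.0f * ((r - g) / delta + 4.0f);
    }
    if (*h < 0.0f) *h += 360.0f;
}
```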
RGB vs. HSI: the gist • [Diagram: HSI color space, with Intensity (I) running from Black to White, Saturation (S) as distance from the gray axis, and Hue as the angle around it] • Think about what happens to pixel values when lighting changes
Back to basics • What attracts our eye in an image? • Contrast plays a big part. • In image to right: • High contrast: man and background • Low contrast: features on coat.
Measuring contrast • Assume gray scale: 0 (black) to 255 (white) • Proposed algorithm: • Work through image array comparing intensity of adjacent pixels. • Effectively computing partial derivative or slope • If difference is high, pay attention. • Experiment: • Let’s construct a new image where each new pixel value is the old pixel value minus the pixel value to its left (saturating to 0). • High contrast in image 1 should be white in image 2.
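A minimal C sketch of that experiment, assuming an 8-bit grayscale image stored row-major; the handling of the leftmost column (which has no left neighbor) is an arbitrary choice.

```c
#include <stdint.h>

/* new pixel = old pixel minus the pixel to its left, clamped at 0.
 * The leftmost column has no left neighbor, so it is set to 0 here. */
void horizontal_diff(const uint8_t *in, int width, int height, uint8_t *out)
{
    for (int y = 0; y < height; y++) {
        out[y * width] = 0;
        for (int x = 1; x < width; x++) {
            int d = (int)in[y * width + x] - (int)in[y * width + x - 1];
            out[y * width + x] = (d > 0) ? (uint8_t)d : 0;  /* saturate to 0 */
        }
    }
}
```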
Discussion • Clearly we’re on to something • We can make out tripod, parts of head in result image. • But it is far from perfect. • It completely missed left side of coat – why? • Pixel difference was large but negative; saturated to 0 (black). • In noisy picture (say white pixel surrounded by black), you’d get bogus result.
Algorithm revisited • Let’s visualize the computation performed • Let array I(x,y) represent pixels in original picture. • Computation is equivalent to a dot product of each horizontally adjacent pixel pair with the small vector [-1 1].
Generalizing • Cross correlation produces new image by • Sliding “kernel” over image in all possible positions • Computing sum of products of matching elements (dot product) at each position • Using numerical result at each point as new pixel value
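A minimal C sketch of cross-correlation as just described, assuming 8-bit grayscale images, a square k x k kernel, and the simplest possible border handling; the same routine covers the blur and Laplacian kernels on the following slides by passing different kernel arrays and scale factors.

```c
#include <stdint.h>

/* Slide a k x k kernel over an 8-bit grayscale image and write the
 * (scaled, clamped) dot product at each position.  Border pixels that
 * lack a full neighborhood are simply left at 0.  "scale" allows
 * normalization, e.g. 1/9 for a 3x3 box blur. */
void cross_correlate(const uint8_t *in, int width, int height,
                     const float *kernel, int k, float scale, uint8_t *out)
{
    int half = k / 2;
    for (int i = 0; i < width * height; i++) out[i] = 0;

    for (int y = half; y < height - half; y++) {
        for (int x = half; x < width - half; x++) {
            float sum = 0.0f;
            for (int ky = 0; ky < k; ky++)
                for (int kx = 0; kx < k; kx++)
                    sum += kernel[ky * k + kx] *
                           in[(y + ky - half) * width + (x + kx - half)];
            sum *= scale;
            if (sum < 0.0f)   sum = 0.0f;    /* saturate, as in the earlier */
            if (sum > 255.0f) sum = 255.0f;  /* experiment                  */
            out[y * width + x] = (uint8_t)sum;
        }
    }
}
```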
Kernels • A wide variety of kernels can be used that vary in size and function computed. • Sometimes kernels are chosen to implement specific steps • Example: blur image based on Gaussian distribution and differentiate • Kernels are often tweaked until they work • Both size and values can be changed • Let’s explore a bit
Kernels • Limitation of [-1 1] kernel: • Estimate of change depends only on one adjacent pixel. • Idea: consider both left and right neighbors: [-1 0 1] • Improvement of [-1 0 1] over [-1 1] is not striking (compare the two result images)
Kernels • Limitation of [-1 0 1] kernel: • Sensitive to noise • Considers just one row • Idea: improve by averaging vertically • New kernel:
-1 0 1
-1 0 1
-1 0 1
Kernels • Problem with that kernel: • Why give equal weight to all rows? • New kernel (Sobel):
-1 0 1
-2 0 2
-1 0 1
Kernels • Problems with Sobel kernel: • Catches edges going from black to white, not white to black. • Misses horizontal lines. (Could rotate kernel 90° and double the processing…)
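One common way to address both limitations (not necessarily what the course expects) is to correlate with the Sobel kernel and its 90° rotation and combine the absolute values. A minimal C sketch assuming 8-bit grayscale input:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sobel edge magnitude: |Gx| + |Gy| at each interior pixel, clamped to
 * 255.  Absolute values catch both black-to-white and white-to-black
 * edges; adding the rotated (vertical-gradient) kernel catches
 * horizontal lines as well.  Border pixels are left at 0. */
void sobel_magnitude(const uint8_t *in, int width, int height, uint8_t *out)
{
    for (int i = 0; i < width * height; i++) out[i] = 0;

    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            const uint8_t *p = &in[y * width + x];
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];
            int mag = abs(gx) + abs(gy);
            out[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```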
Other kernels • Suppose you just want to remove noise. • Could use a kernel to smooth. • Try:
1 1 1
1 1 1
1 1 1
• Oops! What happened? Our kernel did not preserve intensity: the kernel elements sum to 9.
Other kernels • Try again with (1/9) ×
1 1 1
1 1 1
1 1 1
• Note how image is blurred
Other kernels • Try again with (1/25) ×
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
• Note increased blurring
Other kernels • Example: an approximation of the Laplacian of brightness (related to 2nd derivative):
-1 2 -1
2 -4 2
-1 2 -1
Kernel limitations • Edge operators based on kernel operations have problems with noisy images: • Edges will be • Too thick in places • Missing in places • Extraneous in places • More sophisticated techniques have been developed to solve these problems. • Most likely too complex for our project, platform.
Impressive results (Renegades of Funk) • From 2012 team website • How useful might this edge detection be? • [Images: original image, and result with Sean Thomas’s Sobel kernel]
The Hough transform • Uses voting procedure to find lines (shapes) • Finds edge points based on local pixel values • Each edge pixel votes for lines in discretized parameter space • Could use (intercept, slope), but vertical lines a problem • Instead uses (r, θ): r = x cos θ + y sin θ • r is distance from origin to line, θ is angle from origin to closest point on line • After processing image, votes above some threshold in 2D array indicate most likely lines
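A minimal C sketch of the voting step, assuming a binary edge mask as input, 1-degree θ bins, and 1-pixel r bins; r_max would typically be the image diagonal, and finding the high-vote cells afterwards is omitted.

```c
#include <stdint.h>
#include <string.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N_THETA 180  /* 1-degree steps */

/* Every nonzero pixel in the edge mask votes for all (r, theta) pairs
 * of lines it could lie on.  r is quantized to whole pixels and offset
 * by r_max so the accumulator index is non-negative. */
void hough_vote(const uint8_t *edges, int width, int height,
                int *accumulator /* (2*r_max+1) * N_THETA entries */,
                int r_max)
{
    memset(accumulator, 0, (size_t)(2 * r_max + 1) * N_THETA * sizeof(int));

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (!edges[y * width + x])
                continue;
            for (int t = 0; t < N_THETA; t++) {
                double theta = t * M_PI / N_THETA;
                int r = (int)lround(x * cos(theta) + y * sin(theta));
                if (r >= -r_max && r <= r_max)
                    accumulator[(r + r_max) * N_THETA + t]++;
            }
        }
    }
}
```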
Example • See Wikipedia article
Moving forward: a suggestion • Prototype with MATLAB or OpenCV • Take many images of light towers and landmarks, from varying distances in different lighting • Code and develop edge/shape/color detection algorithms, test thoroughly • Support for many image operators is built-in. • Design, implement, and test simplified version that can run on the Helios board • Critical you understand what functions do; must go beyond black-box understanding.
We further recommend... • Assign one team member responsibility for vision algorithms. • Look for online tutorials, demos, examples. • Don’t worry too much (initially) about the underlying mathematics: • Focus on (1) does it do what I want? and (2) can I build it? • Do lots of experiments in software • Make sure your approach is robust, reliable • Move to hardware (VHDL) only if the operation is simple (e.g., color-space conversion) or the software version is too slow (e.g., yielding just 1 fps).
Big-picture: things to consider • At what frame rate must images be processed? • How noisy are images, how will you handle noise? • How will you recognize and distinguish objects? • If we add obstacles and landmarks, how should they be marked? • How will you estimate distance to objects? • How can you adapt to dynamic changes in lighting?