PERCEPTION Damien Blond Alim Fazal Tory Richard April 11th, 2000
Outline 9.1: Introduction 9.2: Image Formation 9.3: Image Processing Operations for Early Vision 9.4: Extracting 3D Information using Vision 9.5: Using Vision for Manipulation and Navigation 9.6: Object Representation and Recognition 9.7: Summary
Introduction • Perception provides agents with information about the world they inhabit. • A sensor is anything that can change the computational state of the agent in response to a change in the state of the world. • The sensors that agents share with humans are vision, hearing, and touch.
Introduction • The main focus will be on processing the raw information that the sensors provide. • The sensory stimulus S is a function of the world W: • $S = f(W)$ • To gain information about the world, the straightforward approach is to invert this equation: • $W = f^{-1}(S)$
Introduction • A drawback of the straightforward approach is that it is trying to solve too difficult a problem. • In many cases, the agent does not need to know everything about the world. • Sometimes just one or two predicates are needed.
Introduction • Some of the possible uses for vision: • Manipulation – Grasping and insertion need local shape information and feedback for motor control. • Navigation – Finding clear paths, avoiding obstacles, and calculating one’s current velocity and orientation. • Object Recognition – A useful skill for distinguishing between multiple objects.
Outline 9.2: Image Formation Pinhole Camera Lens Systems Photometry of Image Formation
Image Formation • Vision works by gathering light scattered from objects in the scene and creating a 2-D image. • It’s important to understand the geometry of this process in order to obtain information about the scene.
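Image Formation • Perspective projection equations for a pinhole camera: a scene point (X, Y, Z) projects to the image point (x, y), where f is the distance from the pinhole to the image plane. • $\frac{-x}{f} = \frac{X}{Z}$, $\frac{-y}{f} = \frac{Y}{Z}$, so $x = \frac{-fX}{Z}$ and $y = \frac{-fY}{Z}$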
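The perspective projection equations above can be tried out numerically. The following is a minimal sketch, not part of the original slides; the function name `project_pinhole` and the sample coordinates are assumptions made for illustration.

```python
def project_pinhole(X, Y, Z, f):
    """Project the scene point (X, Y, Z) to image coordinates (x, y).

    Implements x = -f*X/Z and y = -f*Y/Z. The minus signs mean the
    image is inverted relative to the scene.
    """
    if Z == 0:
        raise ValueError("Z must be nonzero: the point lies in the plane of the pinhole")
    return (-f * X / Z, -f * Y / Z)


# Hypothetical sample points: the same (X, Y) at two different depths.
near = project_pinhole(1.0, 2.0, 4.0, f=1.0)
far = project_pinhole(1.0, 2.0, 8.0, f=1.0)
print(near, far)
```

Doubling Z halves both image coordinates, which is the familiar effect of distant objects appearing smaller.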
Image Formation • Perspective projection is often approximated by orthographic projection, but there is an important difference. • Orthographic projection does not project rays through a pinhole. • Instead, the projection rays run parallel to each other, either perpendicular to the image plane or at a consistent angle to it.
Lens Systems • Both human and artificial eyes use a lens. • The lens is wider than a pinhole, allowing more light to enter and increasing the information collected. • The human eye focuses by changing the shape of the lens. • Artificial eyes focus by changing the distance between the lens and the image plane.
Photometry of Image Formation • A processed image plane contains a brightness value for each pixel. • The brightness of a pixel p in the image is proportional to the amount of light directed toward the camera by the surface patch $S_p$ that projects to pixel p. • The reflected light is characterized as either diffuse or specular.
Photometry of Image Formation • Diffuse reflection redirects light equally in all directions, and is common for dull surfaces. • It is described by the following equation, known as Lambert's formula: • $E = \rho E_0 \cos\theta$ • where $\rho$ is the coefficient of diffuse reflection (the albedo) of the surface, $E_0$ is the intensity of the light source, and $\theta$ is the angle between the light direction and the surface normal.
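Lambert's formula can be evaluated directly. This is a minimal sketch rather than anything from the original slides: the function name `lambert_brightness` is hypothetical, and clamping the cosine at zero (so a patch facing away from the light receives nothing) is an added assumption that the formula itself does not state.

```python
import math


def lambert_brightness(rho, e0, theta_deg):
    """Diffuse brightness E = rho * E0 * cos(theta), with theta in degrees."""
    cos_theta = math.cos(math.radians(theta_deg))
    # Assumption: a surface turned away from the light (cos < 0) gets no light.
    return rho * e0 * max(0.0, cos_theta)


# Brightness falls off as the surface tilts away from the light direction.
for angle in (0, 30, 60, 90):
    print(angle, round(lambert_brightness(0.5, 10.0, angle), 3))
```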
Photometry of Image Formation • Specular reflection, common for shiny surfaces, is described by Phong's formula: • $E = \rho E_0 \cos^m\theta$ • $\rho$ is the coefficient of specular reflection • $E_0$ is the intensity of the light source • m is the 'shininess' of the surface • $\theta$ is the angle between the direction of perfect mirror reflection and the viewing direction.
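The effect of the shininess exponent m in Phong's formula is easiest to see numerically. The sketch below is illustrative only: the function name `phong_specular` is hypothetical, the angle is passed in directly as a number of degrees, and clamping the cosine at zero is an added assumption.

```python
import math


def phong_specular(rho, e0, theta_deg, m):
    """Specular brightness E = rho * E0 * cos(theta)**m, with theta in degrees."""
    # Assumption: negative cosines are clamped to zero before exponentiation.
    cos_theta = max(0.0, math.cos(math.radians(theta_deg)))
    return rho * e0 * cos_theta ** m


# At a fixed angle of 20 degrees, a larger m gives a much dimmer value,
# so the bright highlight is confined to a narrower range of angles.
for m in (1, 10, 50):
    print(m, round(phong_specular(1.0, 1.0, 20.0, m), 4))
```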
Photometry of Image Formation • In real life, surfaces exhibit a combination of diffuse and specular properties. • Modeling this on the computer is what computer graphics is all about. • Rendering realistic images is usually done by ray tracing.
Outline 9.3: Image Processing Operations for Early Vision Edge Detection
Image-Processing Operations • Edge Detection • Edges are curves in the image plane across which there is a “significant” change in image brightness. • The goal of edge detection is the construction of an idealized line drawing.
Image-Processing Operations • One idea to detect edges is to differentiate the image and look for places where the brightness undergoes a sharp change • Consider a 1-D example. Below is an intensity profile for a 1-D image.
Image-Processing Operations • Below we have the derivative of the previous graph. • The derivative has peaks at x=18, x=50 and x=75. • Only the peak at x=50 corresponds to a true edge; the peaks at x=18 and x=75 are errors caused by noise in the image.
Image-Processing Operations • This problem is countered by smoothing the image before differentiating it; the smoothing is done by convolving the image with a smoothing function. • The mathematical concept of convolution allows us to perform many useful image-processing operations.
Image-Processing Operations • One standard form of smoothing is to use a Gaussian function. • By convolving with the Gaussian function, we can revisit the 1-D example.
Image-Processing Operations • With the convolution applied, we can more easily see the edge at x=50. • Convolution lets us find where edges are located, which in turn allows us to make an accurate line drawing.
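The smooth-then-differentiate idea from the 1-D example can be sketched in a few lines. This is an illustration under stated assumptions, not the slides' own procedure: the intensity profile is synthetic (a step edge at x=50 plus two artificial noise spikes at x=18 and x=75), the Gaussian is discretized and truncated at three standard deviations, and the derivative is a central difference. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Discrete Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal, kernel):
    """1-D convolution with same-length output; border samples are replicated."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + radius - k, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def derivative(signal):
    """Central-difference derivative; the two end values are set to zero."""
    inner = [(signal[i + 1] - signal[i - 1]) / 2.0
             for i in range(1, len(signal) - 1)]
    return [0.0] + inner + [0.0]


def peak_index(values):
    """Index of the largest absolute value (first one if there are ties)."""
    return max(range(len(values)), key=lambda i: abs(values[i]))


# Synthetic profile (an assumption, not the graph from the slides).
profile = [1.0] * 50 + [2.0] * 50
profile[18] += 1.2
profile[75] -= 1.2

sigma = 3.0
raw_peak = peak_index(derivative(profile))
smoothed = convolve_same(profile, gaussian_kernel(sigma, radius=int(3 * sigma)))
smoothed_peak = peak_index(derivative(smoothed))
print(raw_peak, smoothed_peak)
```

Without smoothing, the strongest derivative response sits next to a noise spike; after Gaussian smoothing it sits at the step between x=49 and x=50. A larger sigma suppresses more noise but also blurs the edge location.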
Image-Processing Operations • Here is an example of applying convolution to a 2-D picture of the Mona Lisa.
Outline 9.4: Extracting 3D Information using Vision Motion Binocular Stereopsis Texture Gradient Shading Contour
Extracting 3-D Information Using Vision • We need to extract 3-D information to perform tasks such as manipulation, navigation, and recognition. • There are three aspects: 1. Segmentation 2. Position and orientation 3. Shape • To recover 3-D information there are a number of cues available, including motion, binocular stereopsis, texture, shading, and contour.
Extracting 3-D Information Using Vision • Motion • Optical flow is the apparent motion in the image that results when a camera moves relative to the 3-D scene.
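Extracting 3-D Information Using Vision • To measure optical flow, we need to find corresponding points between one time frame and the next. • One measure of how well two image patches match is the sum of squared differences (SSD): • $SSD(D_x, D_y) = \sum_{(x,y)} \left(I(x, y, t) - I(x + D_x, y + D_y, t + D_t)\right)^2$ • The sum runs over the pixels (x, y) in a window around the point of interest; the displacement $(D_x, D_y)$ that minimizes the SSD is taken as the optical flow at that point.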
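A small block-matching sketch shows how the SSD measure can be used to find a displacement. It is illustrative only: the two 7x7 frames are made up, the window is 3x3, the search covers shifts of at most one pixel, and names such as `make_frame` and `best_displacement` are hypothetical.

```python
def make_frame(patch, left, top, size=7):
    """Return a size x size frame of zeros with `patch` pasted at (left, top)."""
    frame = [[0] * size for _ in range(size)]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame


def ssd(frame_t, frame_next, x0, y0, dx, dy, half=1):
    """Sum of squared differences over a window centered at (x0, y0)."""
    total = 0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            diff = frame_t[y][x] - frame_next[y + dy][x + dx]
            total += diff * diff
    return total


def best_displacement(frame_t, frame_next, x0, y0, max_shift=1, half=1):
    """Try every (dx, dy) within max_shift and return the one with minimum SSD."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda d: ssd(frame_t, frame_next, x0, y0, d[0], d[1], half))


# A textured 3x3 patch that moves one pixel to the right between frames.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
frame_t = make_frame(patch, left=2, top=2)
frame_next = make_frame(patch, left=3, top=2)

print(best_displacement(frame_t, frame_next, x0=3, y0=3))
```

The exhaustive search over candidate shifts is the simplest possible strategy; it is used here only to keep the example short.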
Extracting 3-D Information Using Vision • Another measure of similarity is cross-correlation (CC): • $CC(D_x, D_y) = \sum_{(x,y)} I(x, y, t)\, I(x + D_x, y + D_y, t + D_t)$ • Cross-correlation works best when there is texture in the scene, because texture produces significant brightness variation among the pixels.
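The role of texture in cross-correlation can be illustrated with two toy scenes. This is a sketch under assumptions, not material from the slides: the frames are made up, the window is 3x3, the best match is taken to be the displacement with the largest CC score, and the helper names are hypothetical.

```python
def make_frame(patch, left, top, size=7):
    """Return a size x size frame of zeros with `patch` pasted at (left, top)."""
    frame = [[0] * size for _ in range(size)]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame


def cross_correlation(frame_t, frame_next, x0, y0, dx, dy, half=1):
    """Sum of products of corresponding pixels over a window centered at (x0, y0)."""
    total = 0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            total += frame_t[y][x] * frame_next[y + dy][x + dx]
    return total


def cc_scores(frame_t, frame_next, x0, y0, max_shift=1, half=1):
    """Map every candidate displacement (dx, dy) to its cross-correlation score."""
    return {(dx, dy): cross_correlation(frame_t, frame_next, x0, y0, dx, dy, half)
            for dy in range(-max_shift, max_shift + 1)
            for dx in range(-max_shift, max_shift + 1)}


# Textured scene: a 3x3 patch moves one pixel to the right between frames.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
textured = cc_scores(make_frame(patch, 2, 2), make_frame(patch, 3, 2), 3, 3)
best = max(textured, key=textured.get)

# Textureless scene: every pixel has the same brightness in both frames.
flat_frame = [[5] * 7 for _ in range(7)]
flat = cc_scores(flat_frame, flat_frame, 3, 3)

print(best, textured[best])
print(sorted(set(flat.values())))
```

In the textured scene one displacement clearly scores highest. In the uniform scene every displacement receives the same score, so the match is ambiguous.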
Extracting 3-D Information Using Vision • Binocular Stereopsis • Binocular stereopsis uses multiple images separated in space, whereas motion uses multiple images separated in time. • Because a given feature in the scene is in a different place relative to the z-axis of each camera, superposing the two images reveals a disparity in the location of important features.
Extracting 3-D Information Using Vision • This disparity also allows us to determine depth. • Given the distance between the cameras and the point at which their lines of sight intersect, a few simple geometric calculations give the depth coordinate z for any given (x, y) coordinate.
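For intuition, the depth calculation is shortest in a simplified configuration that differs from the converging cameras described above. The sketch below assumes two identical pinhole cameras with parallel optical axes, focal length f, and baseline b (the distance between the cameras); in that special case similar triangles give $Z = f b / d$, where d is the disparity. The function name and the sample numbers are hypothetical.

```python
def depth_from_disparity(focal_length, baseline, x_left, x_right):
    """Depth Z = f * b / d for two parallel, identical pinhole cameras.

    Assumption: x_left and x_right are the horizontal image coordinates of the
    same scene point in the two images, in the same units as focal_length.
    Only the magnitude of the disparity is used, so the sign convention of
    the image coordinates does not matter.
    """
    disparity = abs(x_left - x_right)
    if disparity == 0:
        raise ValueError("zero disparity: the point is effectively at infinity")
    return focal_length * baseline / disparity


# Halving the disparity doubles the estimated depth.
print(depth_from_disparity(2.0, 0.5, 0.75, 0.5))
print(depth_from_disparity(2.0, 0.5, 0.625, 0.5))
```

Because depth is inversely proportional to disparity, small disparity errors matter most for distant points.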
Extracting 3-D Information Using Vision • Texture Gradient • Texture refers to a spatially repeating pattern on a surface that can be sensed visually. • In an image, the apparent size, shape, and spacing of the repeating texture elements (texels) vary.
Extracting 3-D Information Using Vision • The two main causes of this variation are: • Varying distance from the camera to the different texture elements. • Varying orientation of the texels relative to the line of sight from the camera. • Mathematical analysis lets us express the rate of change of these texel features; these rates of change are called texture gradients.
Extracting 3-D Information Using Vision Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.
Extracting 3-D Information Using Vision • Shading • Shading is the variation in the intensity of light received from different portions of a surface in the scene. • Given the image brightness I(x, y), our hope is to recover the scene geometry and the reflectance properties of the object. • This has proved difficult to do in anything but the simplest cases.
Extracting 3-D Information Using Vision • The main problem is dealing with interreflections. • In most scenes the surfaces are illuminated not only by the light sources, but also by light reflected from other surfaces, which serve as secondary light sources. • These mutual illumination effects are quite significant.
Extracting 3-D Information Using Vision • Contour • The lines in a line drawing give a vivid perception of 3-D shape and layout. • The task is to determine the exact significance of each line in an image. • This is also called the line labeling problem, since each line is labeled according to its significance.
Extracting 3-D Information Using Vision • In a simplified world, where all surface marks and shadows have been removed, all the lines can be classified as either limbs or edges. • A limb is the locus of points on the surface where the line of sight is tangent to the surface. • An edge is a surface normal discontinuity. • Edges can be further classified as convex, concave, or occluding.
Extracting 3-D Information Using Vision • "+" and "-" labels represent convex and concave edges respectively. • "<-" and "->" labels represent occluding edges. • "<-<-" and "->->" labels represent limbs.
Extracting 3-D Information Using Vision In 1971, Huffman and Clowes independently studied the line labeling problem for trihedral solids: objects in which exactly three plane surfaces come together at each vertex.
Extracting 3-D Information Using Vision For this trihedral world, Huffman and Clowes made an exhaustive list of all the different vertex types and the different ways in which they could appear when seen from a general viewpoint.
Extracting 3-D Information Using Vision They created a junction dictionary that is used to find a labeling for a line drawing. Later this work was generalized to arbitrary polyhedra and to piecewise smooth curved objects.
Outline 9.5: Using Vision for Manipulation and Navigation Driving Example Lateral Control Longitudinal Control
Using Vision for Manipulation and Navigation • One of the main uses of vision is to provide information for manipulating objects and for navigating in a scene while avoiding obstacles. • Driving is a good example of this use of vision.