Image Databases
• In a conventional relational database, the user types in a query and obtains an answer in response
• Image databases are different
  • a police officer may issue a query: "Retrieve all pictures from the image database that are 'similar' to this person and give the identities of the people."
• This query is fundamentally different from ordinary queries for 2 reasons:
  1. The query includes a picture as part of the query
  2. The query asks about similar pictures and therefore uses a notion of "imprecise match"
Raw images
• The content of an image consists of all "interesting" objects in that image
• Each object is characterized by
  • a shape descriptor: describes the shape/location of the region within which the object is located inside the image
  • a property descriptor: describes the properties of individual pixels (e.g. RGB values of a pixel, RGB values aggregated over a group of pixels, grayscale levels)
• A property consists of
  • a property name, e.g. red, green, blue, texture
  • a property domain: the range of values the property can assume, e.g. {0, 1, ..., 7}
Images
• Every image is associated with a pair of positive integers (m, n), called the grid resolution, which divides the image into m × n cells of equal size (the image grid)
• Each cell consists of a collection of pixels
• A cell property is a triple (Name, Values, Method)
• Examples
  • (bwcolor, {b, w}, bwalgo), where the possible values are b (black) and w (white), and bwalgo is an algorithm that takes a cell as input and returns either black or white by somehow combining the black/white levels of the pixels in the cell
  • (graylevel, [0, 1], grayalgo), where the possible values are real numbers in the interval [0, 1]
Image Database
• An image database is a triple (GI, Prop, Rec)
  • GI is a set of gridded images (Image, m, n)
  • Prop is a set of cell properties
  • Rec is a mapping that associates with each image a set of rectangles denoting objects (in fact these do not necessarily have to be rectangles)
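As a concrete illustration, here is a minimal Python sketch of the (GI, Prop, Rec) triple and of a cell property such as (bwcolor, {b, w}, bwalgo); the class and field names are illustrative choices, not part of the original definition.

```python
# A minimal sketch of the (GI, Prop, Rec) model; all names are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

import numpy as np

@dataclass
class CellProperty:
    """A cell property (Name, Values, Method)."""
    name: str                                  # e.g. "bwcolor"
    values: tuple                              # e.g. ("b", "w") or an interval (0.0, 1.0)
    method: Callable[[np.ndarray], object]     # maps the pixels of a cell to a value

def bwalgo(cell_pixels: np.ndarray) -> str:
    """Combine the pixel intensities of a cell into a single black/white value."""
    return "w" if cell_pixels.mean() > 0.5 else "b"

@dataclass
class GriddedImage:
    """(Image, m, n): the raw pixels plus the grid resolution."""
    pixels: np.ndarray     # 2-D array of intensities
    m: int                 # grid rows
    n: int                 # grid columns

Rectangle = Tuple[int, int, int, int]          # (row_lo, col_lo, row_hi, col_hi) of an object

@dataclass
class ImageDatabase:
    """(GI, Prop, Rec) from the definition above."""
    gi: List[GriddedImage]
    prop: List[CellProperty]
    rec: Dict[int, List[Rectangle]]            # image index -> rectangles denoting objects
```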
Problems with image databases
• Images are often very large
  • it is infeasible to explicitly store the properties on a pixel-by-pixel basis
  • this led to a family of image "compression" techniques that attempt to compress the image into one containing fewer pixels
• There is a need to determine the "features" of the image (compressed or raw)
  • done by "segmentation": breaking up the image into a set of homogeneous rectangular regions called segments
• There is a need to support "match" operations that compare either a whole image or a segmented image against another
Image Compression
• Lossy compression
  • an image may contain details that the human eye cannot recognize
  • get rid of those details
  • DCT (Discrete Cosine Transform)
  • DFT (Discrete Fourier Transform)
  • DWT (Discrete Wavelet Transform)
  • convert images from the time (spatial) domain to the frequency domain
  • get rid of the frequencies which do not carry information
• Transforms
  • DCT and DFT are similar concepts
  • they map the time domain to the frequency domain
  • given a signal of length n, these transforms return a sequence of n frequencies:
    Sample1, Sample2, ..., Sample n transforms to Freq1, Freq2, ..., Freq n
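A small sketch of the "n samples in, n frequencies out" behaviour and of the lossy-compression idea, using NumPy/SciPy; the sample signal and the coefficient threshold are made up for illustration.

```python
import numpy as np
from scipy.fft import dct, fft, idct

samples = np.array([8.0, 7.5, 7.0, 6.5, 6.0, 1.0, 0.5, 0.0])   # Sample1 .. Sample8

freqs_dft = fft(samples)                  # 8 complex frequency coefficients
freqs_dct = dct(samples, norm="ortho")    # 8 real frequency coefficients

# Lossy compression idea: drop the small (low-information) coefficients ...
compressed = freqs_dct.copy()
compressed[np.abs(compressed) < 1.0] = 0.0

# ... and reconstruct an approximation of the original signal.
approx = idct(compressed, norm="ortho")
print(np.round(approx, 2))
```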
Why do we use the transform?
• Noise removal is easier in the frequency domain
• Various filters are easier to implement in the frequency domain
• Compression (similar values are gathered together)
Desirable Properties of Transforms
• DFT
  • Invertibility: it is possible to get back the original image I from its DFT representation (useful for decompression)
  • Note: practical implementations often combine the DFT with other, non-invertible operations, thus sacrificing invertibility
  • Distance preservation: the DFT preserves Euclidean distance
    • this is important in image matching applications, where we often use distance measures to represent similarity levels
• DCT
  • the DCT preserves all of the above
  • a given signal can be represented with fewer frequencies
• DWT
  • the DFT and DCT have no temporal locality
    • a change in one single part of the data changes all frequencies
  • wavelets introduce locality
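The distance-preservation property can be checked numerically; the sketch below compares Euclidean distances before and after an orthonormal DFT (Parseval's relation), with random vectors standing in for image data.

```python
import numpy as np

x = np.random.rand(64)
y = np.random.rand(64)

# Distance in the time (spatial) domain ...
dist_time = np.linalg.norm(x - y)
# ... equals the distance between the orthonormal DFTs in the frequency domain.
dist_freq = np.linalg.norm(np.fft.fft(x, norm="ortho") - np.fft.fft(y, norm="ortho"))

print(abs(dist_time - dist_freq) < 1e-9)   # True: matching in either domain is equivalent
```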
Fractal Compression
• Transform-based approaches exploit the differences in visual perception across frequencies
• What else can we use for compression?
  • Self-similarity
  • we can find self-similarities in a given image and describe the image in terms of these similarities
Image Processing: Segmentation
• Segmentation is the process of taking an image as input and cutting it up into disjoint homogeneous regions
• Connected region R:
  • a set of cells C1, ..., Cn in R such that the Euclidean distance between Ci and Ci+1 is 1 for all i < n
• Example
  • R1, R2, and R3 are each connected
  • R1 ∪ R2 is connected
  • R2 ∪ R3 is connected
  • R1 ∪ R2 ∪ R3 is connected
  • R1 ∪ R3 is not connected, because the Euclidean distance between (2,3) and (3,4) is √2 > 1
[Figure: a 4 × 4 grid (rows 1-4, columns 1-4) showing the cells of regions R1, R2, and R3]
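A minimal sketch of a connectivity test under this definition, reading "connected" as the usual graph connectivity via distance-1 (edge-sharing) neighbours; the function name and the breadth-first formulation are my own choices.

```python
from collections import deque

def is_connected(region):
    """region: a set of (row, col) cells; True if every cell is reachable
    from every other through distance-1 steps."""
    if not region:
        return True
    cells = set(region)
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # The four neighbours at Euclidean distance exactly 1.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (nr, nc) in cells and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen == cells

# The counter-example from the slide: (2,3) and (3,4) are sqrt(2) apart,
# so a region containing only these two cells is not connected.
print(is_connected({(2, 3), (3, 4)}))   # False
```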
Measuring Homogeneity
• Homogeneity predicate: a function H that takes any connected region as input and returns either true or false
• Example 1:
  • Suppose t is some real number between 0 and 1
  • Hbw^t can be defined as: Hbw^t(R) is true if over (100·t)% of the cells in R have the same color

Region | Hbw^0.8 | Hbw^0.89 | Hbw^0.92
R1     | true    | false    | false
R2     | true    | true     | false
R3     | true    | true     | false

Region | # of black cells | # of white cells
R1     | 800              | 200
R2     | 900              | 100
R3     | 100              | 900
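A sketch of the Hbw^t predicate from Example 1; to reproduce the table above (e.g. Hbw^0.8(R1) = true with exactly 80% black cells), the comparison is taken as ≥ rather than strictly greater.

```python
from collections import Counter

def h_bw(cell_colors, t):
    """cell_colors: one 'b'/'w' value per cell of the region R."""
    # Count how many cells share the most frequent colour.
    most_common_count = Counter(cell_colors).most_common(1)[0][1]
    return most_common_count / len(cell_colors) >= t

# Region R1 from the table: 800 black cells, 200 white cells.
r1 = ["b"] * 800 + ["w"] * 200
print(h_bw(r1, 0.8), h_bw(r1, 0.89), h_bw(r1, 0.92))   # True False False
```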
Measuring Homogeneity
• Example 2:
  • Suppose each cell has a real value between 0 and 1; this value is its bw-level
  • Suppose f assigns a value between 0 and 1 to each cell
  • Assume δ is the noise factor and t a threshold
  • H_{δ,f,t}(R) is true if card({(x,y) ∈ R : |bwlevel(x,y) − f(x,y)| < δ}) / (m·n) > t
Segmentation
• Given an image I with m × n cells, a segmentation of I w.r.t. a homogeneity predicate H is a set of regions R1, ..., Rk such that
  • Ri ∩ Rj = ∅ for all 1 ≤ i ≠ j ≤ k
  • I = R1 ∪ ... ∪ Rk
  • H(Ri) = true for all 1 ≤ i ≤ k
  • for all distinct i, j with 1 ≤ i, j ≤ k such that Ri ∪ Rj is a connected region, it is the case that H(Ri ∪ Rj) = false
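A small checker that tests a candidate segmentation against the four conditions above; the homogeneity predicate H and the connectivity test are assumed to be supplied by the caller (e.g. the sketches earlier in this section).

```python
from itertools import combinations

def is_valid_segmentation(regions, all_cells, H, is_connected):
    """regions: list of sets of (row, col) cells; all_cells: set of every cell of I."""
    # 1. The regions are pairwise disjoint.
    if any(r1 & r2 for r1, r2 in combinations(regions, 2)):
        return False
    # 2. The regions cover the whole image.
    if set().union(*regions) != all_cells:
        return False
    # 3. Every region is homogeneous.
    if not all(H(r) for r in regions):
        return False
    # 4. Maximality: no two regions whose union is connected may still be homogeneous.
    for r1, r2 in combinations(regions, 2):
        if is_connected(r1 | r2) and H(r1 | r2):
            return False
    return True
```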
An Example of Segmentation
• Applying H_{dyn,0.03}(R) to the following (4 × 4) image yields the following segmentation
  • R1 = {(1,1),(1,2)}
  • R2 = {(1,3),(2,1),(2,2),(2,3)}
  • R3 = {(3,1),(3,2),(3,3),(4,1),(4,2)}
  • R4 = {(3,4),(4,3),(4,4)}
  • R5 = {(1,4),(2,4)}

Row/Col | 1    | 2    | 3    | 4
1       | 0.1  | 0.25 | 0.5  | 0.5
2       | 0.05 | 0.30 | 0.6  | 0.6
3       | 0.35 | 0.30 | 0.55 | 0.8
4       | 0.6  | 0.63 | 0.85 | 0.90
Segmentation Algorithm
• Split:
  • if the whole image is homogeneous, we are done
  • otherwise, split the image into two parts and recursively repeat this process until we find a set of regions R1, ..., Rn such that each region is homogeneous
• Merge:
  • check whether any of the Ri's can be merged together
  • at the end of this step, we obtain a valid segmentation R1, ..., Rk
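A compact sketch of the split-and-merge idea under these assumptions: the split phase recursively cuts a non-homogeneous cell set into two halves along its longer dimension, and the merge phase greedily unions adjacent homogeneous regions. H is any homogeneity predicate over a set of cells; this is an illustration, not an optimized implementation.

```python
def split(cells, H):
    """cells: set of (row, col) cells; returns a list of homogeneous regions."""
    if len(cells) == 1 or H(cells):
        return [cells]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    # Split along the dimension with the larger extent, at its midpoint.
    if max(rows) - min(rows) >= max(cols) - min(cols):
        mid = (min(rows) + max(rows)) / 2
        part1 = {(r, c) for r, c in cells if r <= mid}
    else:
        mid = (min(cols) + max(cols)) / 2
        part1 = {(r, c) for r, c in cells if c <= mid}
    part2 = cells - part1
    return split(part1, H) + split(part2, H)

def merge(regions, H, is_connected):
    """Greedily merge regions whose union is connected and still homogeneous."""
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                union = regions[i] | regions[j]
                if is_connected(union) and H(union):
                    regions[i] = union
                    del regions[j]
                    merged = True
                    break
            if merged:
                break
    return regions
```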
Similarity Based Retrieval
• The metric approach:
  • uses a distance measure d that can compare two images
  • the smaller the distance, the more similar they are
  • i.e., given an input image I, find the "nearest neighbor" of I in the image archive
• The transformation approach:
  • the metric approach assumes that the notion of similarity is fixed
  • the transformation approach instead computes the cost of transforming one image into another, based on user-specified cost functions that may vary from one query to another
The Metric Approach
• We define a distance function d on a k-dimensional space (k = n + 2)
• The distance function satisfies the following properties
  • d(x,y) = d(y,x)
  • d(x,y) ≤ d(x,z) + d(z,y)
  • d(x,x) = 0
• Example: let the image object consist of (256 × 256) cells with 3 attributes (red, green, blue), each of which assumes a value from {0, ..., 7}
  • d(o1,o2) = Σ_{i,j} (diffr[i,j] + diffg[i,j] + diffb[i,j])
  • where diffr[i,j] = (o1[i,j].red − o2[i,j].red)²
  • diffg[i,j] = (o1[i,j].green − o2[i,j].green)²
  • diffb[i,j] = (o1[i,j].blue − o2[i,j].blue)²
• Such computations can be cumbersome (65,536 expressions being computed inside the sum)
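The same cell-by-cell distance, written with NumPy so that the 65,536 per-cell terms are evaluated in one vectorised expression; images are assumed to be arrays of shape (256, 256, 3) holding the red/green/blue values in {0, ..., 7}.

```python
import numpy as np

def pixel_distance(o1, o2):
    """Sum over all cells of the squared per-channel differences, as in the formula above."""
    diff = (o1.astype(float) - o2.astype(float)) ** 2
    return diff.sum()

o1 = np.random.randint(0, 8, size=(256, 256, 3))
o2 = np.random.randint(0, 8, size=(256, 256, 3))
print(pixel_distance(o1, o2))
```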
The Metric Approach
• How can this massive similarity computation be avoided?
  • Through feature extraction!
• Use a good feature extraction function fe to map objects into single points in an s-dimensional space, where s is typically small compared to n + 2
• This leads to two reductions
  • an object is originally a set of points in an (n+2)-dimensional space; in contrast, fe(o) is a single point
  • fe(o) is a point in an s-dimensional space where s << (n+2)
• The feature extraction mapping must preserve the distance relationships in the original space
• Pipeline: objects in the (n+2)-dimensional space are mapped by fe into the s-dimensional space and inserted by an indexing algorithm into an index over the object repository (the index could be, e.g., a quadtree or an R-tree for s-dimensional data)
Searching
• Finding the best matches
  • find the nearest neighbors of fe(o) in the tree using a nearest-neighbor search technique
• Finding sufficiently similar objects
  • execute a range query in the tree with center fe(o) and radius r, where r is the desired similarity threshold
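A sketch of the fe-then-index pipeline: a crude per-channel colour histogram plays the role of fe, and a k-d tree over the resulting s-dimensional points answers both nearest-neighbour and range queries. The histogram feature, the random archive, and the radius value are all illustrative assumptions, not the book's choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def fe(image, bins=8):
    """Map an image of shape (H, W, 3) with values in {0..7} to a point in s = 3*bins dimensions."""
    hist = [np.histogram(image[..., ch], bins=bins, range=(0, 8), density=True)[0]
            for ch in range(3)]
    return np.concatenate(hist)

# A toy archive of random images standing in for the object repository.
archive = [np.random.randint(0, 8, size=(256, 256, 3)) for _ in range(100)]
points = np.array([fe(img) for img in archive])
tree = cKDTree(points)

query = fe(archive[0])
# Best matches: the k nearest neighbours of fe(o).
dists, idxs = tree.query(query, k=3)
# Sufficiently similar objects: a range query with centre fe(o) and radius r.
similar = tree.query_ball_point(query, r=0.05)
print(idxs, similar)
```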
The Transformation Approach
• The main principle
  • the level of dissimilarity between o1 and o2 is proportional to the cost of transforming o1 into o2, or vice versa
• Transformation operators
  • translation
  • rotation
  • scaling (uniform and nonuniform)
  • excision
• A transformation of o into o' is a sequence of transformation operations TS = (to1, to2, ..., tor) such that
  • to1(o) = o1
  • ...
  • tor(o_{r-1}) = o'
• Cost of the transformation: cost(TS) = Σ_i cost(toi)
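A tiny sketch of the cost model: a transformation sequence TS is represented as a list of (operator, cost) pairs, and cost(TS) is just the sum of the individual operator costs; the operators and cost values below are arbitrary examples.

```python
# An illustrative transformation sequence with made-up, user-specified costs.
transformation_sequence = [
    ("translation", 1.0),
    ("rotation", 2.5),
    ("nonuniform scaling", 4.0),
    ("excision", 3.0),
]

def cost(ts):
    """cost(TS) = sum of the costs of the individual operators."""
    return sum(c for _, c in ts)

print(cost(transformation_sequence))   # 10.5 -- the dissimilarity between o and o'
```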
Transformation vs. Metric
• Advantages of the transformation model
  • the user can set up his own notion of similarity by specifying certain transformation operators
  • the user may associate a cost function with each transformation operator
• Advantages of the metric model
  • by forcing the user to use only one similarity metric, the system can facilitate the indexing of data so as to optimize nearest-neighbor search