Learning the Relative Importance of Features in Image Data
Aparna Varde, Elke Rundensteiner, Giti Javidi, Ehsan Sheybani and Jianyu Liang
IEEE ICDE’s DBRank, Istanbul, Turkey, April 2007
Introduction
• Scientific domains
  • Images from phenomena
• Image Features
  • Visual Features
  • Metadata Features
• Comparison of Images
  • Based on features
[Figure panels: Silicon Nanopore; Herb Leaf]
Motivation
• Consider a similarity search process
• Some features more important than others
• Experts have subjective notions of comparison
• Need to learn feature-based distance function
[Figure: Target Image; Source Images]
Goals
• Given
  • Training data on images and their applicable features
• Learn
  • Distance function for image comparison
  • Function should preserve relative importance of features in the domain
Proposed Approach: FeaturesRank
• Input
  • Training samples: pairs of images
  • Level of similarity for each pair
  • Distance function: weighted sum of features
• Process: Iterative approach
  • Cluster images in levels using distance function
  • Error: difference between similarity levels in clusters and samples
  • Adjust distance function based on error
• Output
  • Distance function giving minimal error
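Process of Learning
• Use a clustering algorithm
• Notion of distance
  • $\Delta = \sum_{f=1}^{F} \alpha_f \Delta_f$, where $\alpha_f$ is the weight of feature f and $\Delta_f$ is the distance due to feature f
  • Features given as inputs
  • Guess initial weights
• Cluster images in L levels
  • L = number of levels in samples
[Figure: Clusters]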
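The weighted-sum distance on this slide can be illustrated with a minimal Python sketch. The function name `weighted_distance` and the numeric values are hypothetical, chosen only for illustration; the slides do not say how the per-feature distances Δf are computed, so they are taken here as given inputs.

```python
from typing import Sequence


def weighted_distance(feature_distances: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Return sum over f of alpha_f * Delta_f for one pair of images."""
    if len(feature_distances) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(a * d for a, d in zip(weights, feature_distances))


# Hypothetical per-feature distances for one image pair (F = 3 features).
delta_f = [0.2, 0.5, 0.1]
# Guessed initial weights, one per feature.
alpha = [1.0, 2.0, 0.5]

total = weighted_distance(delta_f, alpha)
print(f"weighted distance: {total:.2f}")
```

A larger weight makes differences in that feature count more toward the overall distance, which is how the learned weights come to express relative importance.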
Process of Learning
Training samples (pair of images, level of similarity LT):
P1: (I1, I16), LT(P1) = 2
P2: (I5, I14), LT(P2) = 1
P3: (I2, I3), LT(P3) = 0
P4: (I6, I18), LT(P4) = 1
P5: (I7, I9), LT(P5) = 0
P6: (I12, I19), LT(P6) = 2
P7: (I17, I20), LT(P7) = 1
P8: (I4, I11), LT(P8) = 3
P9: (I8, I10), LT(P9) = 2
P10: (I13, I15), LT(P10) = 3
• Error pair: level of similarity in clusters not equal to level of similarity in samples
• Error: ratio of number of error pairs over total number of pairs
• Error threshold: fraction of total number of pairs allowed to be error pairs
[Figure: Training Samples; Clusters]
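As a rough illustration of the error definition, the sketch below counts error pairs for the ten training samples listed on this slide. The cluster levels in `levels_clusters` are invented for the example (two pairs are deliberately made to disagree); the function names are also hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Levels of similarity from the training samples on the slide.
levels_samples: Dict[Pair, int] = {
    ("I1", "I16"): 2, ("I5", "I14"): 1, ("I2", "I3"): 0,
    ("I6", "I18"): 1, ("I7", "I9"): 0, ("I12", "I19"): 2,
    ("I17", "I20"): 1, ("I4", "I11"): 3, ("I8", "I10"): 2,
    ("I13", "I15"): 3,
}

# Hypothetical levels read off the clusters: same as the samples,
# except for two pairs that the clustering places at the wrong level.
levels_clusters: Dict[Pair, int] = dict(levels_samples)
levels_clusters[("I1", "I16")] = 3
levels_clusters[("I4", "I11")] = 1


def find_error_pairs(samples: Dict[Pair, int],
                     clusters: Dict[Pair, int]) -> List[Pair]:
    """Pairs whose level in the clusters differs from the training level."""
    return [p for p in samples if clusters[p] != samples[p]]


def error_ratio(samples: Dict[Pair, int], clusters: Dict[Pair, int]) -> float:
    """Number of error pairs divided by the total number of pairs."""
    return len(find_error_pairs(samples, clusters)) / len(samples)


error = error_ratio(levels_samples, levels_clusters)
print(f"error = {error:.2f}")
```

With two error pairs out of ten, the error would be above an error threshold of 0.1 measured as "below threshold", so learning would continue.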
Process of Learning
• If level of similarity of pair in clusters greater than in samples
  • Images considered closer to each other in clusters than they should be
  • To push them apart, increase weights of some features in distance function
[Figure: Training Samples (pairs P1 to P10); Clusters]
Process of Learning
• Step: Difference between similarity levels
  • |Level of similarity in training samples - Level of similarity in clusters|
  • $\text{Step} = |L_T(I_a, I_b) - L_C(I_a, I_b)|$
• Blame: Responsibility of a feature for error
  • Distance due to feature f / Total distance between images
  • $\text{Blame} = \Delta_f(I_a, I_b) / \Delta(I_a, I_b)$
• Feature Weight Heuristic
  • To increase weights
    • New weight of feature f = Old weight + Step × Blame
  • Conversely, to decrease weights
    • New weight = Old weight - Step × Blame
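The Step and Blame heuristic can be sketched in Python for a single error pair. This is an interpretation, not the authors' code: it assumes Δf is the unweighted per-feature distance and Δ is the weighted total, and that weights are increased when the pair's level of similarity in the clusters is greater than in the samples and decreased otherwise. The name `adjust_weights` and the numbers are hypothetical.

```python
from typing import List, Sequence


def adjust_weights(weights: Sequence[float],
                   feature_distances: Sequence[float],
                   level_samples: int,
                   level_clusters: int) -> List[float]:
    """Apply New weight = Old weight +/- Step * Blame for one error pair."""
    step = abs(level_samples - level_clusters)
    # Total distance between the two images under the current weights.
    total = sum(a * d for a, d in zip(weights, feature_distances))
    if step == 0 or total == 0:
        return list(weights)  # not an error pair, or blame is undefined
    # Clusters consider the images too close: increase weights to push
    # them apart. Otherwise decrease weights to pull them together.
    sign = 1.0 if level_clusters > level_samples else -1.0
    return [a + sign * step * (d / total)  # d / total is the Blame of f
            for a, d in zip(weights, feature_distances)]


# Hypothetical pair: two features, equal initial weights.
old = [1.0, 1.0]
dists = [3.0, 1.0]
pushed_apart = adjust_weights(old, dists, level_samples=1, level_clusters=2)
pulled_closer = adjust_weights(old, dists, level_samples=2, level_clusters=1)
print(pushed_apart, pulled_closer)
```

The feature that contributes most to the pair's distance receives the largest correction. The slides do not say how negative weights are handled, so this sketch leaves them unclamped.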
Process of Learning
• Consider effect of each error pair and adjust weights
• Use adjusted distance function for another iteration of clustering
• Repeat until error below threshold or maximum number of iterations reached
• Output the distance function giving lowest error
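The overall loop might look like the sketch below, under several loudly stated assumptions. The slides do not name the clustering algorithm, so the clustering step is passed in as a callback (`assign_levels`, hypothetical), and the toy callback used here is only a stand-in that buckets weighted distances with fixed cut-offs; it is not the clustering used by FeaturesRank. Weights are updated one error pair at a time, which is one reading of "consider effect of each error pair".

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def learn_weights(feature_distances: Dict[Pair, Sequence[float]],
                  levels_samples: Dict[Pair, int],
                  assign_levels: Callable[[List[float]], Dict[Pair, int]],
                  initial_weights: Sequence[float],
                  error_threshold: float = 0.1,
                  max_iterations: int = 1000) -> Tuple[List[float], float]:
    """Iterate: get levels, measure error, adjust weights; keep the best."""
    weights = list(initial_weights)
    best_weights, best_error = list(weights), float("inf")
    for _ in range(max_iterations):
        levels_clusters = assign_levels(weights)
        wrong = [p for p in levels_samples
                 if levels_clusters[p] != levels_samples[p]]
        error = len(wrong) / len(levels_samples)
        if error < best_error:
            best_weights, best_error = list(weights), error
        if error < error_threshold:
            break
        for pair in wrong:
            dists = feature_distances[pair]
            total = sum(w * d for w, d in zip(weights, dists))
            if total == 0:
                continue  # blame is undefined for a zero distance
            step = abs(levels_samples[pair] - levels_clusters[pair])
            sign = 1.0 if levels_clusters[pair] > levels_samples[pair] else -1.0
            weights = [w + sign * step * d / total
                       for w, d in zip(weights, dists)]
    return best_weights, best_error


# Toy data (invented): feature 0 tracks similarity, feature 1 is noise.
toy_distances = {("I1", "I2"): [0.2, 1.0],
                 ("I3", "I4"): [1.5, 0.1],
                 ("I5", "I6"): [3.0, 0.2]}
toy_levels = {("I1", "I2"): 2, ("I3", "I4"): 1, ("I5", "I6"): 0}


def toy_assign_levels(weights: List[float],
                      cutoffs: Sequence[float] = (1.0, 2.0)) -> Dict[Pair, int]:
    """Stand-in for clustering: smaller weighted distance, higher level."""
    levels = {}
    for pair, dists in toy_distances.items():
        d = sum(w * x for w, x in zip(weights, dists))
        levels[pair] = sum(1 for c in cutoffs if d <= c)
    return levels


learned, err = learn_weights(toy_distances, toy_levels,
                             toy_assign_levels, [1.0, 1.0])
print(learned, err)
```

On this toy data the loop lowers the weight of the noisy feature more than that of the informative one, which is the kind of relative importance the approach aims to learn.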
Experimental Evaluation
• Real images from nanotechnology and bioinformatics used for evaluation
• Parameters: error threshold 0.1 to 0.05, maximum number of iterations = 1000, clustering seeds altered
• Training Data
  • Nanotechnology: 60 images, 3 levels of similarity
  • Bioinformatics: 40 images, 2 levels of similarity
• User Study with Test Data
  • Similarity search performed using learned distance function
  • Experts evaluate effectiveness of results
Learning Behavior: Nanotechnology
• Convergence to error below threshold in fewer than 300 iterations
• Experiments with 5% threshold take longer to converge than those with 10% threshold
• Not much difference in behavior between random and equal initial weights
[Figure panels: Random Initial Weights; Equal Initial Weights]
Learning Behavior: Bioinformatics
• Error in bioinformatics data fluctuates more than in nanotechnology data
• Possible reasons
  • Fewer images were used as training samples
  • Fewer levels of similarity were used
• Other observations similar to nanotechnology data
[Figure panels: Random Initial Weights; Equal Initial Weights]
Similarity Search
• Using learned distance function, target image compared with source images in distinct test set
• Top 4 matches ranked in order of similarity
• Experts verify that ranking is accurate
[Figure panels: Nanotechnology; Bioinformatics (each with Target Image and Top 4 Matches among Source Images)]
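Ranking source images against a target with a learned distance function could be sketched as follows. The feature vectors, weights, and image names are invented, and the per-feature distance is assumed to be an absolute difference, which the slides do not specify.

```python
from typing import Dict, List, Sequence


def top_matches(target: Sequence[float],
                sources: Dict[str, Sequence[float]],
                weights: Sequence[float],
                k: int = 4) -> List[str]:
    """Return the names of the k source images closest to the target."""
    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        # Weighted sum of per-feature distances (absolute differences here).
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

    ranked = sorted(sources, key=lambda name: distance(target, sources[name]))
    return ranked[:k]


# Hypothetical learned weights and feature vectors (two features each).
learned_weights = [2.0, 0.5]
target_image = [0.5, 0.5]
source_images = {
    "S1": [0.5, 0.9],
    "S2": [0.7, 0.5],
    "S3": [0.2, 0.5],
    "S4": [0.5, 0.5],
    "S5": [0.9, 0.1],
}

matches = top_matches(target_image, source_images, learned_weights, k=4)
print(matches)
```

Because the first feature carries the larger weight, a source image that differs mainly in the second feature (S1) ranks ahead of one that differs in the first (S2).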
Conclusions
• Contributions of this work
  • FeaturesRank approach proposed to learn distance function for relative importance of features in images
  • Learned distance function assessed by ranking images for similarity search with real data from nanotechnology and bioinformatics
• Ongoing work
  • Defining objective measures for accuracy
  • Performing comparative studies with state-of-the-art