Evaluating the Quality of Image Synthesis and Analysis Techniques Matthew O. Ward Computer Science Department Worcester Polytechnic Institute
Evaluation is Important • Has my modification improved the results? • Which method works best for my data? • What are limitations of my technique? • Is my method better than XXX?
Evaluation is Difficult • What aspects to test? • How to measure? • What are limitations of evaluation procedure? • How to recruit evaluators?
Evaluation Often Avoided • Majority of papers show no substantive evaluation • Most common approach is subjective, by authors, on 1-3 test cases • Quantitative measures exist for computational performance, but not quality of results • Not much “glory” in evaluation
The Problem..... • Lack of rigorous assessment of visualization techniques • Lack of good test cases • Limited comparison with other techniques • Lack of guidelines for selection of appropriate techniques
A Possible Solution • Create a list of goals of visualization: what is the overall task? what is the desired/acceptable level of accuracy? what are we looking for? • Locate/create data sets which contain desired features • Test users on a wide range of tasks using different visualization methods
Goals of Visualization • Identification - is there some interesting feature in the image? • Classification - what is it? • Quantification - how many? how big? how close? • Understanding - are there correlations conveyed by the image? • Comparison - does the image have characteristics similar to one generated with a different set of data?
Advantages of Synthetic Data • Easy to adjust characteristics • Less ambiguous than real data • Easy to create data which contains a single structure or phenomenon • Real data can be noisy • Hard to find real data with desired characteristics
Advantages of Real Data • Results using real data are more “believable” • Reality is hard to simulate accurately • Real data has context which can help justify usefulness of tasks
Our Experiments • Select two data characteristics of interest (outliers and clusters) • Locate real data sets containing these features (validate with statistical analysis) • Create synthetic data sets containing these features (also validate) • Select three visualization techniques to test (scatterplots, parallel coordinates, principal components analysis with glyphs)
Our Experiments (continued) • Train subjects on interpreting different display techniques • Train subjects on the desired data characteristics • Test subjects on each characteristic, varying: number of outliers/clusters; degree or size; amount of noise in synthetic sets; location of outliers/clusters
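A synthetic cluster data set with an adjustable amount of noise, as varied in these experiments, might be generated along the following lines. This is only a minimal sketch: the function name `make_cluster_data`, the Gaussian cluster shape, the uniform background noise, and every parameter value are assumptions made for illustration, not the generator used in the study.

```python
import random

def make_cluster_data(n_points, n_dims, centers, spread, noise_fraction, seed=0):
    """Return (points, labels) for Gaussian clusters plus uniform noise.

    labels[i] is the index of the cluster a point belongs to, or -1 for a
    background noise point. All modelling choices here are illustrative.
    """
    rng = random.Random(seed)
    n_noise = int(round(n_points * noise_fraction))
    n_clustered = n_points - n_noise
    points, labels = [], []
    # Clustered points: cycle through the centers, jitter each coordinate.
    for i in range(n_clustered):
        k = i % len(centers)
        points.append([rng.gauss(centers[k][d], spread) for d in range(n_dims)])
        labels.append(k)
    # Noise points: uniform over the unit hypercube.
    for _ in range(n_noise):
        points.append([rng.uniform(0.0, 1.0) for _ in range(n_dims)])
        labels.append(-1)
    return points, labels

# Two 4-dimensional clusters, first without noise, then with 30% noise.
centers = [[0.25] * 4, [0.75] * 4]
original, original_labels = make_cluster_data(200, 4, centers, 0.05, 0.0)
noisy, noisy_labels = make_cluster_data(200, 4, centers, 0.05, 0.3)
```

Keeping the ground-truth labels alongside the points is what makes later scoring of subject responses possible.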
Visualization Techniques Tested • Glyphs • Scatterplot Matrix • Parallel Coordinates
Cluster Example • [Figure panels: Original, Added Noise]
Assessing the Results • Detection - did subjects identify some structure in the image? • Classification - did subjects correctly classify structure? • Measurement - number of clusters or outliers, outlier and cluster degree of separation, size of cluster • Errors - false positives, missed structure, measurement accuracy
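The error categories above could be tallied per subject response roughly as follows. This is a hedged sketch: the function `score_response`, the idea of identifying outliers by integer record IDs, and the example IDs are all hypothetical and are not taken from the original experiments.

```python
def score_response(true_items, reported_items):
    """Compare the structures a subject reported against ground truth.

    Both arguments are collections of hypothetical record IDs (for example,
    the indices of the outliers planted in a synthetic data set).
    """
    true_set, reported_set = set(true_items), set(reported_items)
    return {
        # Detection: did the subject report any structure at all?
        "detected": len(reported_set) > 0,
        # Correctly identified structures.
        "hits": len(true_set & reported_set),
        # Missed structure: present in the data but not reported.
        "missed": len(true_set - reported_set),
        # False positives: reported but not present in the data.
        "false_positives": len(reported_set - true_set),
        # Measurement accuracy for the "how many?" question.
        "count_error": len(reported_set) - len(true_set),
    }

# Three planted outliers; the subject marks four records, two correctly.
result = score_response(true_items={3, 17, 42}, reported_items={3, 42, 58, 60})
```

Scoring each category separately keeps a technique that produces many false positives distinguishable from one that simply misses structure.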
Summary of Experiments • Scatterplot matrix: best overall; weak on overlapping clusters, size estimation for large clusters, interior outliers • Glyphs: best for interior outliers; good for conveying outlier separation, overlapping clusters, measuring cluster size; poor for differentiating non-outliers • Parallel coordinates: generally worse than others; good for differentiating non-outliers
Future Work • Test alternate data characteristics (e.g. repeated patterns) • Test alternate perceptual tasks (e.g. correlation) • Test other visualization techniques (e.g. alternate glyphs, VisDB, dimensional stacking, …) • Create publicly available benchmark suite for data sets and analysis tools (submissions from other researchers always welcome) • Compare with other multivariate visualization assessment methods as they arise
Problem Statement • Image segmentation algorithms traditionally classified as model-based or context-free • Model-based methods highly effective, but expensive to design and execute • Context-free methods are fast, but quality of results often poor • Is there some way to improve the results of context-free systems without incurring costs of model-based methods?
Conjecture • In most image analysis domains, expectations can be placed on the likely occurrence of certain shapes, colors, and region/segment sizes: objects in an office scene are mostly planar and non-specular; in medical images, boundaries are mostly smooth, and regions are usually small or moderate in size; outdoor scenes contain a lot of fine texture • We should be able to use high-level knowledge of domain constraints to improve the segmentation process by: selecting a segmentation method likely to produce good results; setting the segmentation parameters to their most effective values
Defining a Good Segmentation • All physical object boundaries should be isolated • False boundaries should be minimized • Boundary shape should be comparable to internal model of object in scene • Precision in shape and position needed varies based on application and importance of individual objects to task at hand
Defining a Good Evaluation Procedure • Should be based on real images • Influence of human subjectivity minimized • Errors categorized by type, severity, and significance • Magnitude of error should accurately reflect difference from ideal • Tolerance must be permitted
Problems with Pixel Counting • [Figure: two images with similar error counts, uniform dilation (left) and bad merge (right)]
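The weakness of pixel counting can be reproduced on a toy label image. The sketch below is an illustration only: the grid size, region placement, and helper names (`fill`, `pixel_errors`, `labels_present`) are assumptions chosen so that the two errors yield the same count, and they do not come from the original figure.

```python
ROWS, COLS = 8, 12

def blank():
    """Label image filled with background (label 0)."""
    return [[0] * COLS for _ in range(ROWS)]

def fill(grid, label, r0, r1, c0, c1):
    """Paint an inclusive rectangle with a region label; returns the grid."""
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            grid[r][c] = label
    return grid

def pixel_errors(ideal, computed):
    """Pixel counting: number of pixels whose label differs from the ideal."""
    return sum(a != b for row_i, row_c in zip(ideal, computed)
               for a, b in zip(row_i, row_c))

def labels_present(grid):
    """Distinct non-background region labels that survive in a segmentation."""
    return {v for row in grid for v in row if v != 0}

# Ideal: region 1 (4x4) directly next to region 2 (4x5).
ideal = fill(fill(blank(), 1, 2, 5, 2, 5), 2, 2, 5, 6, 10)
# Uniform dilation: region 1 grows by one pixel on every side.
dilated = fill(fill(blank(), 2, 2, 5, 6, 10), 1, 1, 6, 1, 6)
# Bad merge: region 2 is absorbed into region 1.
merged = fill(fill(blank(), 1, 2, 5, 2, 5), 1, 2, 5, 6, 10)

dilation_count = pixel_errors(ideal, dilated)
merge_count = pixel_errors(ideal, merged)
```

Both segmentations differ from the ideal by the same number of pixels, yet `labels_present(dilated)` still contains both regions while `labels_present(merged)` has lost one, which is the kind of difference in severity that a raw count hides.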
Procedure • Acquire representative set of images for multiple domains • Approximate constraints on edge/region features in domain • Interactively segment each image and label edges/regions with a tolerance and priority to create the ideal segmentation • Compute errors between ideal and algorithmically generated segmentation • If error > acceptable, adjust parameters (simplex algorithm) and recompute errors • Associate segmentation parameters with domain constraints
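The adjust-and-recompute step of this procedure might look like the loop below. Several things here are assumptions: the slide says only "simplex algorithm", which is read here as the Nelder-Mead downhill simplex method; `segmentation_error` is a hypothetical stand-in (a smooth bowl) for running a segmenter and comparing against the ideal segmentation; and the parameter names, start values, and threshold are invented for illustration.

```python
def segmentation_error(params):
    """Hypothetical stand-in for 'segment, then compare with the ideal'.

    params = [edge_threshold, min_region_size]; the error is lowest at the
    made-up optimum (0.4, 12.0).
    """
    threshold, min_size = params
    return (threshold - 0.4) ** 2 + ((min_size - 12.0) / 10.0) ** 2

def tune(error, start, steps, acceptable, max_iter=500):
    """Nelder-Mead search that stops once the error is acceptable."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):                      # one extra vertex per parameter
        vertex = list(start)
        vertex[i] += steps[i]
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=error)
        best, worst = simplex[0], simplex[-1]
        if error(best) <= acceptable:       # "if error > acceptable, adjust"
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if error(reflect) < error(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if error(expand) < error(reflect) else reflect
        elif error(reflect) < error(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if error(contract) < error(worst):
                simplex[-1] = contract
            else:                           # shrink every vertex toward best
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=error)

best_params = tune(segmentation_error, start=[0.8, 5.0], steps=[0.1, 2.0],
                   acceptable=1e-4)
```

A derivative-free search suits this setting because the error is the output of a full segmentation run and has no usable gradient.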
Creating the Ideal Segmentation • Start with initial region-based segmentation • Click on a region of interest • Merge, split, set tolerance level, set priority level • Iterate until all significant regions labeled • Results are domain and task specific
Comparing Ideal to Computed Results • Edge Detection: 78% detected essentials, 209% oversegmentation • Region Growing: 67% detected essentials, 120% oversegmentation
More Results • Split and Merge: 79% detected essentials, 73% oversegmentation • Rule-based System: 88% detected essentials, 93% oversegmentation
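The slides do not define the two percentages, so the sketch below rests on assumed definitions: "detected essentials" is taken to be the share of high-priority ideal boundaries matched within their tolerance, and "oversegmentation" the number of unmatched computed boundaries relative to the number of ideal boundaries (which is why it can exceed 100%). The function name, the 1-D boundary positions, and the sample values are hypothetical.

```python
def compare_to_ideal(ideal, computed):
    """Score computed boundaries against an ideal segmentation.

    ideal    -- list of (position, tolerance, essential) tuples
    computed -- list of boundary positions from the algorithm under test
    Assumes at least one ideal boundary is marked essential.
    """
    unmatched = list(computed)
    essentials = found = 0
    for position, tolerance, essential in ideal:
        # First still-unmatched computed boundary within this tolerance.
        match = next((c for c in unmatched if abs(c - position) <= tolerance),
                     None)
        if match is not None:
            unmatched.remove(match)
        if essential:
            essentials += 1
            if match is not None:
                found += 1
    return {
        "detected_essentials_pct": 100.0 * found / essentials,
        # Leftover computed boundaries are treated as false boundaries.
        "oversegmentation_pct": 100.0 * len(unmatched) / len(ideal),
    }

ideal = [(10, 2, True), (30, 1, True), (50, 3, False), (70, 1, True)]
computed = [11, 33, 49, 52, 70, 90]
scores = compare_to_ideal(ideal, computed)
```

In this example the boundary at 33 falls outside the tolerance of the ideal boundary at 30, so it counts both as a missed essential and as a false boundary.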
Summary and Future Work • Domain constraints produced better segmentations than context-free methods (after training) • Future work includes investigating other types of constraints (e.g., texture) and improving the tolerance specification and error calculation
General Procedure for Assessing Image-Based Algorithms • Determine the task to be performed by the user of the image • Determine the image features most relevant to this task, and ascertain the level of accuracy needed in detection, classification, and measurement • Create a benchmark suite of data containing these features in varying degrees (real and synthetic data) • Create and administer user tests to evaluate how accurately and reliably the algorithm conveys the desired data features, or • Develop image processing algorithms to identify the desired data features and calculate error types and severities in images generated by the algorithm being assessed
Summary of Presentation • Formal assessment has proven useful in both visualization and image processing applications • Results can be used to guide algorithm development and selection • Quantitative and qualitative approaches can provide many insights into effectiveness of image analysis and synthesis tasks