EVALUATION OF IMAGE SEGMENTATION METHODS
Jayaram K. Udupa
Medical Image Processing Group - Department of Radiology
University of Pennsylvania
423 Guardian Drive - 4th Floor Blockley Hall
Philadelphia, Pennsylvania - 19104-6021
CAVA: Computer-Aided Visualization and Analysis
The science underlying computerized methods of image processing, analysis, and visualization to facilitate new therapeutic strategies, basic clinical research, education, and training.
CAD: Computer-Aided Diagnosis
The science underlying computerized methods for the diagnosis of diseases via images.
Image Segmentation
• Recognition: Determining the object’s whereabouts in the scene. (humans > computer)
• Delineation: Determining the object’s spatial extent and composition in the scene. (computer > humans)
In CAVA, segmentation amounts to delineation. Recognition is usually manual.
SEGMENTATION EVALUATION
Can be considered to consist of two components:
• Theoretical: Study mathematical equivalence among algorithms.
• Empirical: Study practical performance of algorithms in specific application domains.
SEGMENTATION EVALUATION: Theoretical
Segmentation approaches may be broadly classified into two groups:
• pI approaches: Purely image based. They rely mostly on information available in the given image only.
• SM approaches: Shape model based. They employ prior shape models for the objects of interest.
SEGMENTATION EVALUATION: Theoretical
pI approaches
• Boundary-based: optimum boundary, active contours/surfaces, level sets
• Region-based: clustering (kNN, CM, FCM), graph cut, fuzzy connectedness, MRF, watersheds, optimum partitioning (Mumford-Shah, Chan-Vese)
SM approaches
• manual tracing, live wire, active shape, active appearance, m-Reps, atlas-based
SEGMENTATION EVALUATION: Theoretical
Fundamental challenges in image segmentation:
(Ch1) Are major pI frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, and watersheds truly distinct, or does some level of equivalence exist among them?
(Ch2) How to develop truly distinct methods constituting real advance?
(Ch3) How to choose a method for a given application domain?
(Ch4) How to set an algorithm optimally for an application domain?
Currently any method A can be shown empirically to be better than any method B, even when they are equivalent.
SEGMENTATION EVALUATION: Theoretical
A general theory of image segmentation:
• An idealized image F: a function F: Ω → R^k, where Ω is a bounded open subset of R^n.
• A digital image f: a function f: C → R^k, where f is a digitization of F and C is a subset of Ω.
• A delineation model M: a mapping (F, p) ↦ O, where O is a segment of image F and p is a parameter vector.
Ciesielski, Udupa, SPIE Proceedings 6512:65120W-1-65120W-12, 2007.
Ciesielski, Udupa, MIPG Technical Report 335, U of Pennsylvania, November 2007.
SEGMENTATION EVALUATION: Theoretical
• A delineation algorithm A: a mapping (f, p) ↦ S, where p is a parameter vector and S is the resulting segment of the digital image f.
• Algorithm A represents model M: a limiting process. As the resolution of f increases, S approaches O.
• Algorithms A1 and A2 are model-equivalent if there exists a model M such that both A1 and A2 represent M.
SEGMENTATION EVALUATION: Theoretical
(1) Theorem: The Malladi-Sethian-Vemuri (PAMI-17, 1995) level set algorithm is model equivalent to the Udupa-Samarasekera (GMIP-58, 1996) fuzzy connectedness algorithm with gradient based fuzzy affinity. (The FC method has definite computational advantages over LS.)
(2) Audigier and Lotufo have shown by a different approach (Image Foresting Transform) equivalence between particular forms of watershed and fuzzy connectedness.
SEGMENTATION EVALUATION: Theoretical
Attributes used by some well known delineation models

| Model | Connectedness | Gradient | Texture | Smoothness | Shape | Noise | Optimization |
|---|---|---|---|---|---|---|---|
| Fuzzy Connectedness | Yes | Gradient + homogeneity affinity | Object feature affinity | No | No | Scale based FC | In RFC |
| Chan-Vese | No | No | Yes | Yes | No | No | Yes |
| Mumford-Shah | No | No (not for edge detection) | Yes | Yes | No | Yes | Yes |
| KWT snake | Boundary | Yes | No | Yes | No | No | Yes |
| Malladi-Sethian-Vemuri LS | Foreground when expanding | Yes | No | No | No | No | No |
| Live wire | Boundary | Yes | Yes | Yes | User | No | Yes |
| Active shape | Yes | No | No | No | Yes | No | Yes |
| Active appearance | Yes | No | Yes | No | Yes | No | Yes |
| Graph cut | Usually not | Yes | Possible | No | Usually not | No | Yes |
| Clustering | No | No | Yes | No | No | No | Yes |
SEGMENTATION EVALUATION: Empirical
Need to specify the application domain:
• A task, T. Example: Estimating the volume of brain.
• A body region, B. Example: Head.
• Imaging protocol, P. Example: T2 weighted MR imaging with a particular set of parameters.
Application domain: A particular triple ⟨T, B, P⟩.
From now on, we denote a digital image (scene) by C.
SEGMENTATION EVALUATION: Empirical
The segmentation efficacy of a method M in an application domain may be characterized by three groups of factors:
• Precision (Reliability): Repeatability taking into account all subjective actions influencing the result.
• Accuracy (Validity): Degree to which the result agrees with truth.
• Efficiency (Viability): Practical viability of the method.
Udupa et al., Computerized Medical Imaging and Graphics, 30:75-87, 2006.
SEGMENTATION EVALUATION: Empirical
For determining accuracy, we need true delineations or surrogates of true delineations.
S: A given set of images in ⟨T, B, P⟩.
Std: The corresponding set of images with true delineations.
(1) Manual delineation in images in S: trace or paint → Std.
(2) Simulated Images I: Create an ensemble of “cut-outs” of the object from different images and bury them realistically in different images → S. The cut-outs are segmented carefully → Std.
Figure: A slice (a) of an image simulated from an acquired MR proton density image of a Multiple Sclerosis patient’s brain and its “true” segmentation (b) of the lesions.
(3) Simulated Images II: Start from (binary/fuzzy) objects (Std) segmented from real images. Add intensity contrast, blur, noise, and background variation realistically → S.
Figure: White matter (WM) in a gray matter background, simulated by segmenting WM from real MR images and by adding blur, noise, and background variation to various degrees: (a) low, (b) medium, and (c) high.
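The four steps of approach (3) can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figure: the function name `simulate_image`, its parameters, the 3x3 mean filter used for blur, the linear ramp used for background variation, and the Gaussian noise model are all assumptions chosen for brevity.

```python
import random

def simulate_image(mask, obj=200.0, bg=100.0, blur_passes=1,
                   noise_sd=5.0, ramp=20.0, seed=0):
    """Turn a binary mask (list of rows of 0/1) into a simulated gray image."""
    rng = random.Random(seed)
    rows, cols = len(mask), len(mask[0])
    # 1. Intensity contrast: object and background get different mean values.
    img = [[obj if mask[r][c] else bg for c in range(cols)] for r in range(rows)]
    # 2. Blur: repeated 3x3 mean filter (neighbors outside the image are skipped).
    for _ in range(blur_passes):
        out = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                vals = [img[rr][cc]
                        for rr in range(max(0, r - 1), min(rows, r + 2))
                        for cc in range(max(0, c - 1), min(cols, c + 2))]
                out[r][c] = sum(vals) / len(vals)
        img = out
    # 3. Background variation: a slow left-to-right intensity ramp.
    # 4. Noise: zero-mean Gaussian noise added independently per pixel.
    for r in range(rows):
        for c in range(cols):
            drift = ramp * c / max(1, cols - 1)
            img[r][c] += drift + rng.gauss(0.0, noise_sd)
    return img
```

Raising `blur_passes`, `noise_sd`, and `ramp` together would give the low, medium, and high degradation levels; the input mask itself serves as the true delineation for every simulated image.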
(4) Simulated Images III: As in (3) or (1), but apply realistic deformations to the images in S and Std.
Figure: Simulating more images (c) and their “true” segmentations (d) from existing images (a) and their manual segmentation (b) by applying known realistic deformations.
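The key point of approach (4) is that the same known deformation is applied to both the image and its true segmentation, so the deformed pair stays consistent. A minimal sketch, assuming 2D images stored as lists of rows and nearest-neighbor resampling; `warp`, `deform_pair`, and `shift_right` are hypothetical names, and a one-pixel shift stands in for a realistic deformation field.

```python
def warp(grid, inverse_map, fill=0):
    """Resample a 2D grid: output pixel (r, c) takes the value at inverse_map(r, c)."""
    rows, cols = len(grid), len(grid[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = inverse_map(r, c)
            sr, sc = int(round(sr)), int(round(sc))  # nearest-neighbor lookup
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = grid[sr][sc]
    return out

def deform_pair(image, truth, inverse_map):
    """Apply one deformation to an image and to its true segmentation."""
    return warp(image, inverse_map), warp(truth, inverse_map)

def shift_right(r, c):
    """Toy deformation: output pixel (r, c) comes from one column to the left."""
    return r, c - 1
```

Any smooth mapping could replace `shift_right`; for fuzzy (non-binary) segmentations, interpolation rather than nearest-neighbor lookup would likely be preferable.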
(5) Simulated Images IV: Start from realistic mathematical phantoms (Std). Simulate the imaging process with noise, blur, background variation, etc., to create images S.
http://www.bic.mni.mcgill.ca/brainweb/
(6) Estimating surrogate segmentations from manual segmentations. Have many manual segmentations for each image in S. Estimate the segmentation that represents the best estimate of truth → Std.
Warfield, S.K., Zou, K.H., Wells, W.M.: “Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation.” IEEE Trans Med Imaging 23(7):903-921, 2004.
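To make the idea of combining several manual segmentations concrete, the sketch below uses a per-pixel majority vote. This is deliberately not STAPLE: STAPLE additionally estimates each rater's performance level and weights the raters accordingly. Majority voting is only a simple stand-in, and `majority_vote` is a hypothetical name.

```python
def majority_vote(segmentations):
    """Per-pixel majority vote over several binary masks (lists of rows of 0/1)."""
    n = len(segmentations)
    rows, cols = len(segmentations[0]), len(segmentations[0][0])
    # A pixel is labeled object when more than half of the raters marked it.
    return [[1 if 2 * sum(seg[r][c] for seg in segmentations) > n else 0
             for c in range(cols)]
            for r in range(rows)]
```

With an even number of raters, ties are resolved toward background here; that choice is arbitrary and could be reversed.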
SEGMENTATION EVALUATION: Empirical
Precision
Repeatability taking into account all subjective actions that influence the segmentation result.
• Intra operator variations
• Inter operator variations
• Intra scanner variations
• Inter scanner variations
Inter scanner variations include variations due to the same brand and different brands.
SEGMENTATION EVALUATION: Empirical - Precision
A measure of precision for method M in a trial that produces two segmentations for situation Ti is given by the degree of agreement between the two.
Situations Ti: intra/inter operator, intra/inter scanner.
Surrogates of truth are not needed.
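One way to quantify agreement between two repeated segmentations is the ratio of their intersection to their union. The sketch below assumes that overlap measure and assumes binary segmentations given as collections of pixel coordinates; `precision_overlap` is a hypothetical name, and other agreement measures are possible.

```python
def precision_overlap(seg1, seg2):
    """Overlap of two repeated segmentations: |S1 ∩ S2| / |S1 ∪ S2|."""
    a, b = set(seg1), set(seg2)
    union = a | b
    if not union:
        return 1.0  # two empty segmentations agree perfectly
    return len(a & b) / len(union)
```

A value of 1 means the two trials produced identical segmentations, and 0 means they share no pixel. Only the two trial results are compared, so no true delineation enters the computation.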
SEGMENTATION EVALUATION: Empirical
Accuracy
The degree to which segmentations agree with the true segmentation.
Surrogates of truth are needed.
For any scene C acquired for application domain ⟨T, B, P⟩, the segmentation produced by M is compared with the true delineation of C.
SEGMENTATION EVALUATION: Empirical - Accuracy
Ud: A binary image representing a reference super set (for example, the imaged body region B).
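The role of the reference super set Ud becomes clearer with a worked computation. The sketch below assumes binary segmentations given as sets of pixel identifiers and assumes the usual volume-fraction definitions: false negatives and true positives are normalized by the size of the true object, while false positives and true negatives are normalized by the part of Ud outside the true object. The function name `accuracy_fractions` and the dictionary keys are hypothetical.

```python
def accuracy_fractions(seg, truth, universe):
    """Return FNVF, FPVF, TPVF, TNVF for a binary segmentation.

    seg, truth, universe: sets of pixel identifiers, with truth inside universe.
    Assumes truth is non-empty and does not fill the whole universe.
    """
    seg = set(seg) & set(universe)      # ignore anything outside Ud
    truth = set(truth)
    outside = set(universe) - truth     # everything in Ud that is not object
    return {
        "FNVF": len(truth - seg) / len(truth),      # object pixels missed
        "TPVF": len(truth & seg) / len(truth),      # object pixels found
        "FPVF": len(seg - truth) / len(outside),    # non-object pixels included
        "TNVF": len(outside - seg) / len(outside),  # non-object pixels excluded
    }
```

Under these definitions FNVF + TPVF = 1 and FPVF + TNVF = 1, so two of the four fractions are enough to describe the result. Without Ud, the false positive fraction would have no natural denominator.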
SEGMENTATION EVALUATION: Empirical - Accuracy
Requirements for accuracy metrics:
• Capture M’s behavior of trade-off between FP and FN.
• Satisfy fractional relations.
• Capable of characterizing the range of behavior of M.
• Boundary-based FN and FP metrics may also be devised.
• Any monotonic function g(FNVF, FPVF) is fine as a metric.
• Appropriate for ⟨T, B, P⟩.
SEGMENTATION EVALUATION: Empirical - Accuracy
Delineation Operating Characteristic (DOC)
• Each value of the parameter vector of M gives a point on the DOC curve.
• The DOC curve characterizes the behavior of M over a range of parametric values of M.
Figure: DOC curve for brain WM segmentation in PD MRI images. Labels: 1-FNVF, FPVF, area under the DOC curve.
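A DOC curve can be traced by sweeping a parameter of the method and recording one (FPVF, 1 - FNVF) point per setting. The sketch below assumes the simplest possible method, intensity thresholding with a single parameter, on a flattened image with a known true labeling; `doc_curve` and `area_under_curve` are hypothetical names, and the area is computed with the trapezoid rule.

```python
def doc_curve(intensities, truth, thresholds):
    """Return DOC points (FPVF, 1 - FNVF), one per threshold, sorted by FPVF.

    intensities: flat list of pixel values; truth: matching list of 0/1 labels.
    Assumes truth contains at least one object and one non-object pixel.
    """
    n_true = sum(truth)
    n_false = len(truth) - n_true
    points = []
    for t in thresholds:
        seg = [1 if v >= t else 0 for v in intensities]
        tp = sum(1 for s, g in zip(seg, truth) if s and g)
        fp = sum(1 for s, g in zip(seg, truth) if s and not g)
        # 1 - FNVF equals the fraction of true object pixels that were found.
        points.append((fp / n_false, tp / n_true))
    return sorted(points)

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For a method with a multi-dimensional parameter vector, each parameter combination would contribute one point, and the curve would be taken over the chosen range of combinations.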
SEGMENTATION EVALUATION: Empirical
Efficiency
Describes the practical viability of a method. Four factors should be considered:
(1) Computational time for one-time training of M
(2) Computational time for segmenting each scene
(3) Human time for one-time training of M
(4) Human time for segmenting each scene
(2) and (4) are crucial. (4) determines the degree of automation of M.
Summary
Accuracy:
• FPVF: FP fraction for delineation
• FNVF: FN fraction for delineation
• Area under the DOC curve
Precision:
• intra operator
• inter operator
• intra scanner
• inter scanner
Efficiency:
• computational time for algorithm training
• computational time for scene segmentation
• operator time for algorithm training
• operator time for scene segmentation
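The eleven parameters in this summary can be kept together in one record per method and application domain, which makes a parameter-by-parameter comparison of two methods straightforward. The sketch below is one possible layout; the class name `Efficacy`, all field names, the time unit, and the assumption that higher precision values are better are illustrative choices.

```python
from dataclasses import dataclass, fields

@dataclass
class Efficacy:
    # Accuracy
    fpvf: float
    fnvf: float
    doc_area: float
    # Precision (assumed here to be agreement scores, higher is better)
    intra_operator: float
    inter_operator: float
    intra_scanner: float
    inter_scanner: float
    # Efficiency (times, assumed to be in seconds)
    comp_time_training: float
    comp_time_per_scene: float
    operator_time_training: float
    operator_time_per_scene: float

# Parameters for which a smaller value is the better one.
LOWER_IS_BETTER = {
    "fpvf", "fnvf", "comp_time_training", "comp_time_per_scene",
    "operator_time_training", "operator_time_per_scene",
}

def compare(m1, m2):
    """Return, per parameter, which method is better: 'M1', 'M2', or 'tie'."""
    report = {}
    for f in fields(Efficacy):
        a, b = getattr(m1, f.name), getattr(m2, f.name)
        if a == b:
            report[f.name] = "tie"
        else:
            m1_wins = a < b if f.name in LOWER_IS_BETTER else a > b
            report[f.name] = "M1" if m1_wins else "M2"
    return report
```

The report is descriptive rather than a single verdict: one method may win on accuracy while the other wins on operator time.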
SEGMENTATION EVALUATION: Empirical
Software Systems for Segmentation

| Software | OS | Cost | Tools |
|---|---|---|---|
| 3D Doctor [162] | W | fee | Manual tracing |
| 3D Slicer [163] | W, L, U | no fee | Manual, EM methods, level sets |
| 3DVIEWNIX [164] | L, U | binary, no fee | Manual, optimal thresh., FC family, live wire family, fuzzy thresh., clustering, live snake |
| Amira [165] | W, L, U, M | fee | Manual, snakes, region growing, live wire |
| Analyze [166] | W, L, U | fee | Manual, region growing, contouring, math morph, interface to ITK |
| Aquarius [167] | Unknown | fee | Unknown |
| Brain Voyager [168] | W, L, U | fee | Thresholding, region growing, histogram methods |
| CAVASS [169] | W, L, U, M | no fee | Manual, opt thresh., FC family, live wire family, fuzzy thresh., clustering, live snake, active shape, interface to ITK |
| etdips [170] | W | no fee | Manual, thresholding, region growing |
| Freesurfer [171] | L, M | no fee | Atlas-based (for brain MRI) |
| Advantage Windows | U, W | fee | Unknown |
| Image Pro [172] | W | fee | Color histogram |
SEGMENTATION EVALUATION: Empirical
Software Systems (cont’d)

| Software | OS | Cost | Tools |
|---|---|---|---|
| Imaris [173] | W | fee | Thresholding (microscopic images) |
| ITK [174] | W, L, U, M | no fee | Thresh., level sets, watershed, fuzzy connectedness, active shape, region growing, etc. |
| MeVisLab [175] | W, L | binary, no fee | Manual, thresh., region growing, fuzzy connectedness, live wire |
| MRVision [176] | L, U, M | fee | Manual, region growing |
| Osiris [177] | W, M | no fee | Thresholding, region growing |
| RadioDexter [178] | Unknown | fee | Unknown |
| SurfDriver [179] | W, M | fee | Manual |
| SliceOmatic [180] | W | fee | Thresholding, watershed, region growing, snakes |
| Syngo InSpace [181] | Unknown | fee | Automatic bone removal |
| VIDA [182] | Unknown | fee | Manual, thresholding |
| Vitrea [183] | Unknown | fee | Unknown |
| VolView [184] | W, L, U | fee | Level sets, region growing, watershed |
| Voxar [185] | W, L, U | fee | Unknown |
SEGMENTATION EVALUATION: Empirical
Publicly Available Data Sets

| Data set | Description | True Segmentation | Number of Images | CAVA/CAD |
|---|---|---|---|---|
| BrainWeb [186] | Simulated brain T1, T2, PD MR images. Objects: CSF, GM, WM, vessels, skull, .. | binary, fuzzy | 20 (3D) | CAVA |
| DDSM [187] | Digital database for screening mammography. Objects: lesions | no | 2,500 (2D) | CAD |
| ICBM [188] | International consortium for brain mapping, MRI, images warped to template | binary | 3,000 (3D) | CAVA |
| LIDC [189] | Lung spiral CT images. Objects: nodules | binary (4 readers) | 85 (3D) | CAD |
| OAI [190] | Osteoarthritis initiative, x-ray and MRI knee images | no | 160 (2D, 3D) | CAVA |
| RIDER [191] | Chest CT images over time of lung cancer patients, radiation therapy followup | no | 140 (3D) | CAD |
| VCC [192] | Virtual colonoscopy; CT images of colon | no | 835 (3D) | CAD |
| VH [193-195] | Visible human data sets; whole body sectional, CT, and MR images | binary | 2 (3D) | CAVA |
SEGMENTATION EVALUATION: Empirical
An Evaluation Framework for CAVA should consist of:
(FW1) Real life image data for several application domains ⟨T, B, P⟩.
(FW2) Reference segmentations (of all images) that can be used as surrogates of true segmentations.
(FW3) Specification of computable, effective, meaningful metrics for precision, accuracy, efficiency.
(FW4) Several reference segmentation methods optimized for each ⟨T, B, P⟩.
(FW5) Software incorporating (FW1) - (FW4).
SEGMENTATION EVALUATION: Empirical
Remarks
(1) Precision, accuracy, efficiency are interdependent.
• Higher accuracy tends to come at the cost of efficiency.
• Achieving both precision and accuracy is difficult.
(2) “Automatic segmentation method” has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, efficiency, and with zero operator time per scene.
(3) A descriptive answer to “is method M1 better than M2 under ⟨T, B, P⟩?” in terms of the 11 parameters is more meaningful than a “yes” or “no” answer.
(4) DOC is essential to describe the range of behavior of M.
Concluding Remarks
• Need unifying segmentation theories that can explain equivalences/distinctness of existing algorithms. This can ensure true advances in segmentation.
• Need evaluation frameworks with FW1-FW5. This can standardize methods of empirical comparison of competing and distinct algorithms.