Medical Image Classification with Advanced Markov Random Field/Gibbs Classification: A Dissertation Proposal. Zhihong Yang, May 30, 2000. Advisory committee: Dr. Ian R. Greenshields, Dr. Howard Sholl, Dr. Reda Ammar.


  1. Medical Image Classification with Advanced Markov Random Field/Gibbs Classification: A Dissertation Proposal • Zhihong Yang • May 30, 2000 • Advisory committee: Dr. Ian R. Greenshields, Dr. Howard Sholl, Dr. Reda Ammar

  2. Outline • Medical image classification and its time-consuming properties • Introduction to MRF/Gibbs classification • Previous work and novelty of this study • Methods and techniques to be used: partial parallel algorithm; ensemble-parallel algorithm; locally adaptive cooling schedule based on multiresolution tiling • Summary of research goals and expected contributions • Result evaluation plan • Availability and location of research facilities

  3. Medical Image Classification with MRF/Gibbs Methods--1/2 • Image classification is a procedure that extracts the desired information from the original image data by means of a designed algorithm. • Four elements are involved in the definition of image classification: the original data, the classified data, the classification algorithm, and the estimation criterion. • The scale of the image classification problem: • original input data: a 256 x 256 lattice grey-level image • classified image: a 256 x 256 lattice binary image • The number of possible outputs is 2^(256*256) = 2^65536, a number with 19,729 decimal digits, so exhaustive search is out of the question.
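The size of the output space can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical and not part of the proposal. It counts the labelings exactly for a tiny lattice and uses logarithms to get the number of decimal digits for the 256 x 256 binary case.

```python
import math

def count_labelings(height, width, num_classes=2):
    """Exact number of distinct labelings of a height x width lattice."""
    return num_classes ** (height * width)

def decimal_digits(height, width, num_classes=2):
    """Number of decimal digits of that count, computed via logarithms."""
    return math.floor(height * width * math.log10(num_classes)) + 1

print(count_labelings(2, 2))     # 16 labelings for a 2 x 2 binary image
print(decimal_digits(256, 256))  # 19729 digits for the 256 x 256 case
```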

  4. Medical Image Classification with MRF/Gibbs Methods--2/2 • Markov Random Field/Gibbs classification is a well-established method for classifying images by statistical inference. • To introduce the problem we are facing, we first summarize two properties of MRF/Gibbs classification: • it computes a MAP estimate • it converges slowly because of the cooling schedule • The principles of MRF/Gibbs classification follow the problem statement.

  5. The Problem & Proposed Methods • The problem: MRF/Gibbs classification of medical image data is time-consuming, because of both the volume of the input data and the nature of the algorithm. • Proposed methods to reduce the time the MRF/Gibbs algorithm needs to converge: • partial parallel algorithm • ensemble parallel algorithm • locally adaptive cooling schedule based on multiresolution tiling

  6. Introduction to MRF/Gibbs Classification • Priors and posteriors • Bayes decision rules • Maximum a posteriori (MAP) estimate • Markov random fields/neighborhood systems • Gibbs fields • Markov chains, limit theorems, convergence of the algorithm • Gibbs sampler/visiting scheme • Simulated annealing/cooling schedule

  7. Prior and Posterior Distributions • The prior gives the model that we expect to see in an image before observation, for example: • How many classes are actually in the image? • What is the percentage distribution of these tissue types? • Given a class, what is the distribution of the data? • If it is a normal distribution, what are its parameters? • The posterior can be interpreted as an adjustment of the prior to the real data after the observation.

  8. Gibbsian form of priors and posteriors • A strictly positive probability distribution will always have the Gibbsian form
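The formula on this slide is not preserved in the transcript. For orientation, the standard Gibbsian form, written here in the notation of Winkler [5], is given below; the symbols H (energy function) and X (space of configurations) are assumptions of this sketch rather than notation confirmed by the slide.

```latex
\Pi(x) \;=\; \frac{\exp\bigl(-H(x)\bigr)}{\sum_{z \in X} \exp\bigl(-H(z)\bigr)},
\qquad x \in X .
```

Conversely, any strictly positive distribution \Pi can be put in this form by choosing H(x) = -\ln \Pi(x), which is why strict positivity is enough.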

  9. Bayes Decision Rules • What is the best estimate? Two elements have to be taken into account: • the ideal model represented by the prior • the real data that we observed • The Bayesian approach takes both requirements into account simultaneously by working with the posterior. • An estimator that minimizes the risk is called a Bayes estimator. • MAP estimators are Bayes estimators for the 0-1 loss function.

  10. MAP Estimator • An estimate that maximizes the posterior distribution given the observed data is called a maximum a posteriori (MAP) estimate. MAP estimates the image as a whole, so it is a contextual estimate. Maximizing the posterior is equivalent to minimizing the energy function of the posterior distribution. • Why the MAP estimator? It is the best estimator under the 0-1 loss function. • Computational complexity: exponential in the number of sites if the configurations are searched exhaustively.

  11. Markov Random Fields/Neighborhood System • Random field: a strictly positive probability distribution on the space of configurations • index set of sites (pixels) • space of states (classes) • space of configurations • Neighborhood system, defined by two axioms: • A site is not its own neighbor. • Site s is a neighbor of site t if and only if site t is a neighbor of site s. • Local characteristic: the conditional probability of the state at site s, given the configuration on all other sites. • Markov field: a random field in which each site's local characteristic depends only on the states of its neighbors. We want the neighborhoods to be small.
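The two neighborhood axioms can be checked mechanically. The following is a minimal sketch, assuming a rectangular lattice and the 4-neighborhood used later in the talk; the function names are hypothetical.

```python
def four_neighbors(site, height, width):
    """North, south, west and east neighbors of `site` that lie inside the lattice."""
    r, c = site
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < height and 0 <= j < width]

def satisfies_axioms(neighbors, height, width):
    """Check both axioms: no site is its own neighbor, and the relation is symmetric."""
    for r in range(height):
        for c in range(width):
            s = (r, c)
            ns = neighbors(s, height, width)
            if s in ns:
                return False
            if any(s not in neighbors(t, height, width) for t in ns):
                return False
    return True

print(four_neighbors((0, 0), 3, 3))            # [(1, 0), (0, 1)]
print(satisfies_axioms(four_neighbors, 3, 3))  # True
```

A one-sided relation, for example "only the site to the south is a neighbor", fails the symmetry axiom and is therefore not a neighborhood system.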

  12. Gibbs Fields • Probabilities of Gibbsian form are always strictly positive and hence are random fields. • A Gibbs field is induced by its energy function. • The energy function is the sum of the potentials. • A potential depends on the class/state/position of a site and reflects its relation to the classes/states/positions of other sites. • Thus image configurations whose posterior has Gibbsian form constitute a Gibbs field.

  13. Markov Chains: Limit Theorem • Markov kernel: the probability of moving from an old state x to a new state y. It is a matrix whose entry in row x and column y is this probability. Markov kernels with a strictly positive power are called primitive. • Markov chain: on the finite space X, a chain is given by an initial distribution v and Markov kernels P1, P2, P3, .... If Pi = P for all i, the chain is called homogeneous.

  14. Gibbs Sampler (1) • The limit theorem tells us that we should look for a strictly positive Markov kernel for which the target distribution is invariant. • One natural construction is based on the local characteristics of the probability measure of the random field: a Markov kernel is defined by a site's local characteristic. • A Gibbs field and its local characteristics fulfill the detailed balance equation. • If a distribution u and a kernel P fulfill the detailed balance equation, then u is invariant for P. In particular, Gibbs fields are invariant for their local characteristics. • After a very large number of iterations, one ends up with a sample from a distribution close to the Gibbs field.

  15. Gibbs Sampler (2) • A visiting scheme is an enumeration of the sites whose classes are to be determined. • Gibbs sampler: the homogeneous Markov chain above, with transition probability P, induces the following algorithm. An initial configuration x is chosen, or picked at random according to some initial distribution v. In the first step, x is updated at site 1 by sampling from the single-site local characteristic. • This yields a new configuration $y = y_1 x_{S \setminus \{1\}}$ (the new state y_1 at site 1, x everywhere else), which in turn is updated at site 2. In this way all the sites in S are updated sequentially; this is called a sweep. The first sweep results in a sample from vP. Running the chain for many sweeps produces a sample from vP...P. The procedure is repeated sweep after sweep.
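One sweep of a Gibbs sampler in raster-scan order can be sketched as follows. The energy used here (a squared-difference data term plus a Potts-style penalty for disagreeing with 4-neighbors) is an assumption chosen for illustration, not the potential function of the proposal, and all names are hypothetical.

```python
import math
import random

def local_energy(labels, data, site, label, means, smoothness=1.0):
    """Assumed local energy of putting `label` at `site`: data term + Potts penalty."""
    r, c = site
    h, w = len(labels), len(labels[0])
    energy = (data[r][c] - means[label]) ** 2
    for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= i < h and 0 <= j < w and labels[i][j] != label:
            energy += smoothness
    return energy

def gibbs_sweep(labels, data, means, rng, inverse_temperature=1.0):
    """Visit every site once in raster order and resample its label in place."""
    classes = range(len(means))
    for r in range(len(labels)):
        for c in range(len(labels[0])):
            energies = [local_energy(labels, data, (r, c), k, means) for k in classes]
            low = min(energies)  # subtract the minimum for numerical stability
            weights = [math.exp(-inverse_temperature * (e - low)) for e in energies]
            labels[r][c] = rng.choices(classes, weights=weights)[0]
    return labels

# With well-separated data and a large inverse temperature the local
# characteristic is effectively deterministic, so one sweep recovers the classes.
data = [[0.0, 0.0, 10.0], [0.0, 10.0, 10.0]]
start = [[1, 0, 0], [1, 1, 0]]
print(gibbs_sweep(start, data, [0.0, 10.0], random.Random(0), inverse_temperature=50.0))
# [[0, 0, 1], [0, 1, 1]]
```

At moderate inverse temperatures the same function draws genuinely random labels, and repeated sweeps play the role of the chain vP...P described above.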

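  16. Simulated Annealing/Cooling Schedule • The computation of MAP estimators for Gibbs fields amounts to the minimization of energy functions. • The Gibbs field is considered at an inverse temperature β. • A cooling schedule is an increasing sequence of positive numbers β(n). • For every n >= 1, a Markov kernel P_n is defined by the local characteristics of the Gibbs field at inverse temperature β(n). • A theorem by S. Geman and D. Geman (1984) gives conditions on the cooling schedule under which the algorithm converges.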
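The formulas on this slide are not preserved in the transcript. A hedged reconstruction, following the standard presentation in Winkler [5], is sketched below. The symbols \sigma (number of sites) and \Delta (maximal local oscillation of the energy H) are assumptions taken from that presentation, and the exact statement on the original slide may differ.

```latex
% Gibbs field at inverse temperature \beta
\Pi^{\beta}(x) = \frac{\exp\bigl(-\beta H(x)\bigr)}{\sum_{z \in X} \exp\bigl(-\beta H(z)\bigr)}

% Cooling schedule: an increasing sequence of positive numbers
0 < \beta(1) \le \beta(2) \le \dots , \qquad \beta(n) \to \infty

% P_n is one sweep of the Gibbs sampler for \Pi^{\beta(n)}

% Convergence (Geman and Geman, 1984): if eventually
\beta(n) \le \frac{1}{\sigma \Delta} \ln n ,
% then \nu P_1 \cdots P_n converges to the uniform distribution
% on the minimizers of H, for every initial distribution \nu.
```

The logarithmic bound on \beta(n) is the source of the slow convergence that the proposal sets out to reduce.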

  17. A Brief History of MRF/Gibbs Classification • S. Geman and D. Geman (1984) built the theoretical framework for this algorithm. • M.C. Zhang (1990) applied the Gemans' work to the image classification problem. The definitions of potential function and energy from that work are inherited in our study. • Tom Daggett and I.R. Greenshields (1998) explored the classification of 3-D volume medical images. The site aging they proposed will be refined in our study. • There have been numerous other attempts to apply the Gemans' theory to classification.

  18. Literature Review--Parallel Simulated Annealing • Parallel computation by speculative computation • [Sohn, 1995], [Nabhan, 1995], [Witte, 1991] • Partial parallel algorithm • cluster algorithm [Swendsen, 1987], [Sokal, 1989], [Fox, 1995] • synchronous updating based on independent set partition • [Vemuri, 1997]: not based on independent set partition; no rigorous result on speedup • [Jeng, 1993]: failed to take the communication cost into consideration • Multiple trials/ensemble parallel • Pseudo-parallel algorithm [Daggett, 1999]

  19. Comparison of the proposed parallel techniques

  20. Shortening the run time by adjusting the cooling schedule • Various cooling schedules • polynomial [Young, 1999], [Yuan, 1999] • adaptive cooling [Steinhofel, 1998] • Characteristics • problem-specific • limited speedup • only preliminary consideration of cooling schedules that adapt to the data • regular data partition

  21. The novelty of this study • Partial parallel algorithm based on independent set partition of identified simple neighborhood systems • Locally adaptive cooling schedule based on multiresolution tiling • Application to medical image classification

  22. Method 1--Partial Parallel Algorithm • An example • For a 4-neighborhood with northern, eastern, southern and western neighbors on a checkerboard-colored lattice, an update at a 'black' site needs no information about the states at other 'black' sites. Hence, given a configuration x, all 'black' processing units may do their job simultaneously and produce a new configuration y' on the basis of x; then the 'white' processors may update y' in the same way and end up with a configuration y. Thus a sweep is finished after two time steps, and the transition probability is the same as for sequential updating over a sweep in |S| time steps.

  23. Independent sets • Given a neighborhood system, a subset T of S is independent if it contains no pair of neighbors. • The transition probability of partially parallel updating based on independent sets coincides with the transition probability of one sequential sweep. • The limit theorem for sampling stays the same as in the sequential case.
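Why simultaneous updating is harmless on an independent set can be seen with a small experiment. The sketch below replaces random sampling with a deterministic majority rule (an assumption made purely so that the two results can be compared exactly); the point is only that an update at one site of an independent set never reads another site of the same set. All names are hypothetical.

```python
import random

def neighbors(r, c, h, w):
    """4-neighbors of (r, c) inside an h x w lattice."""
    return [(i, j) for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= i < h and 0 <= j < w]

def local_update(grid, r, c):
    """Deterministic stand-in for sampling: majority label of the neighbors, ties keep the label."""
    ns = neighbors(r, c, len(grid), len(grid[0]))
    ones = sum(grid[i][j] for i, j in ns)
    if 2 * ones > len(ns):
        return 1
    if 2 * ones < len(ns):
        return 0
    return grid[r][c]

def update_sequential(grid, sites):
    out = [row[:] for row in grid]
    for r, c in sites:
        out[r][c] = local_update(out, r, c)   # reads values already updated in this pass
    return out

def update_simultaneous(grid, sites):
    out = [row[:] for row in grid]
    for r, c in sites:
        out[r][c] = local_update(grid, r, c)  # reads only the old configuration
    return out

rng = random.Random(1)
grid = [[rng.randint(0, 1) for _ in range(7)] for _ in range(6)]
black = [(r, c) for r in range(6) for c in range(7) if (r + c) % 2 == 0]
print(update_sequential(grid, black) == update_simultaneous(grid, black))  # True

tiny = [[1, 0]]
everything = [(0, 0), (0, 1)]  # not independent: the two sites are neighbors
print(update_sequential(tiny, everything) == update_simultaneous(tiny, everything))  # False
```

The second comparison shows what goes wrong when neighboring sites are updated at the same time, which is the reason for restricting parallel updates to independent sets.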

  24. Independent set partition--The graph coloring problem • The smallest number of independent sets that partition S is called the chromatic number of the neighborhood system. It is the smallest number of colors needed to paint the sites in such a fashion that neighbors never have the same color. • Two extreme cases: if the classes at the sites are independent, then there are no neighborhoods at all and the chromatic number is 1. If all sites interact with each other, then the chromatic number is |S|. • In general this is the graph coloring problem, which is NP-complete.

  25. Independent set partition with small neighborhood • Although in general the independent set partition is a problem even harder than the Gibbs algorithm itself, it is still solvable for a small neighborhood system. • For example, a 2-D four-neighborhood (east, west, north, and south) is partitioned in checkerboard style (B: black, W: white): • BWBWB • WBWBW • BWBWB • WBWBW
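The checkerboard partition and its independence property can be sketched in a few lines. The coloring rule `(r + c) % 2` and the helper names are assumptions of this illustration.

```python
FOUR_NEIGHBORHOOD = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east

def checkerboard_classes(height, width):
    """Split the lattice into 'black' sites ((r + c) even) and 'white' sites ((r + c) odd)."""
    black = [(r, c) for r in range(height) for c in range(width) if (r + c) % 2 == 0]
    white = [(r, c) for r in range(height) for c in range(width) if (r + c) % 2 == 1]
    return black, white

def is_independent(sites, offsets):
    """True if no two sites in `sites` are neighbors under the given offsets."""
    site_set = set(sites)
    return not any((r + dr, c + dc) in site_set for r, c in sites for dr, dc in offsets)

black, white = checkerboard_classes(4, 5)
for r in range(4):
    print("".join("B" if (r + c) % 2 == 0 else "W" for c in range(5)))
print(is_independent(black, FOUR_NEIGHBORHOOD), is_independent(white, FOUR_NEIGHBORHOOD))  # True True
```

Adding a diagonal offset such as `(1, 1)` makes the black class dependent, so larger neighborhoods need more than two colors.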

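  26. 5-neighborhood partition • A 2-D five-neighborhood (east, west, north, south, and southwest) is partitioned with three colors, each row shifted one site to the right of the row above so that no site shares a color with its southwestern neighbor (R: red, G: green, B: blue): • RGBRGBRGBRGB • BRGBRGBRGBRG • GBRGBRGBRGBR • RGBRGBRGBRGB • BRGBRGBRGBRG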
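Whether a row-shifted three-color pattern separates every pair of neighbors depends on the direction of the shift. The sketch below assumes rows are listed top to bottom, so that the southwestern neighbor of site (r, c) is (r + 1, c - 1), and it adds the northeastern offset required by the symmetry axiom. Both coloring formulas and all names are hypothetical illustrations.

```python
# north, south, west, east, southwest (row index grows downward)
FIVE_NEIGHBORHOOD = [(-1, 0), (1, 0), (0, -1), (0, 1), (1, -1)]

def symmetric_closure(offsets):
    """Add the opposite of every offset, as the symmetry axiom requires."""
    return sorted(set(offsets) | {(-dr, -dc) for dr, dc in offsets})

def conflicts(color_of, offsets, height, width):
    """List all ordered pairs of neighboring sites that share a color."""
    bad = []
    for r in range(height):
        for c in range(width):
            for dr, dc in offsets:
                i, j = r + dr, c + dc
                if 0 <= i < height and 0 <= j < width and color_of(r, c) == color_of(i, j):
                    bad.append(((r, c), (i, j)))
    return bad

def shift_right(r, c):
    """Each row is the row above moved one site to the right."""
    return (c - r) % 3

def shift_left(r, c):
    """Each row is the row above moved one site to the left."""
    return (r + c) % 3

offsets = symmetric_closure(FIVE_NEIGHBORHOOD)
for r in range(3):
    print("".join("RGB"[shift_right(r, c)] for c in range(12)))
print(len(conflicts(shift_right, offsets, 5, 12)))      # 0: a valid 3-coloring
print(len(conflicts(shift_left, offsets, 5, 12)) > 0)   # True: southwest pairs collide
```

With the opposite shift the colors repeat along the southwest/northeast diagonal, so that variant is only valid for a neighborhood whose diagonal runs southeast/northwest.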

  27. 3-D checkerboard • Partitioning a 3-D neighborhood system is much more difficult than partitioning a 2-D one, but for the 6-neighborhood (east, west, north, south, up, down) the partition is elegant. Suppose each plane carries a checkerboard: let one plane begin with a black block in the upper left corner and the next plane begin with a white block, alternating from plane to plane. The result is a 3-D checkerboard.

  28. Parallel strategy based on independent partition • With the independent set partition in hand, we can consider the parallel strategy for this algorithm. On a cluster computer, a simple approach is to cut the image into several data blocks and assign each block to one computer. The key to this implementation is that each computer does not update its sites in raster-scan order: it updates the white sites first and the black sites afterwards, or the other way around, and every computer uses the same order. After the computers complete one color in their assigned data blocks, they exchange the neighboring boundary columns/rows. Then they proceed to update the other color. After all colors have been updated, the next iteration begins.
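The block strategy can be simulated in a single process to check that it reproduces the result of updating the whole image color by color. This is a sketch under several assumptions: the image is cut into horizontal strips, the "exchange" is a plain copy of boundary rows rather than real message passing, and a deterministic majority rule stands in for the sampler so the results can be compared exactly. All names are hypothetical.

```python
import random

def update_color(grid, color, row_offset=0, first=0, last=None):
    """Update in place the sites of one checkerboard color in rows first..last-1."""
    h, w = len(grid), len(grid[0])
    last = h if last is None else last
    for r in range(first, last):
        for c in range(w):
            if (r + row_offset + c) % 2 != color:
                continue
            ns = [grid[i][j] for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                  if 0 <= i < h and 0 <= j < w]
            ones = sum(ns)
            if 2 * ones > len(ns):
                grid[r][c] = 1
            elif 2 * ones < len(ns):
                grid[r][c] = 0

def parallel_sweep(image, num_blocks):
    """One sweep over row strips with a boundary-row exchange before each color."""
    h = len(image)
    bounds = [(b * h // num_blocks, (b + 1) * h // num_blocks) for b in range(num_blocks)]
    blocks = [[row[:] for row in image[lo:hi]] for lo, hi in bounds]
    for color in (1, 0):  # white first, then black, the same order on every block
        # Exchange: every block receives copies of its neighbors' boundary rows.
        padded = []
        for b in range(num_blocks):
            top = [blocks[b - 1][-1][:]] if b > 0 else []
            bottom = [blocks[b + 1][0][:]] if b < num_blocks - 1 else []
            padded.append((top + blocks[b] + bottom, len(top)))
        # Each block now works alone; on a cluster this loop runs on separate machines.
        for b, (lo, hi) in enumerate(bounds):
            local, pad = padded[b]
            update_color(local, color, row_offset=lo - pad, first=pad, last=pad + hi - lo)
            blocks[b] = local[pad:pad + hi - lo]
    return [row for block in blocks for row in block]

rng = random.Random(3)
image = [[rng.randint(0, 1) for _ in range(6)] for _ in range(8)]
serial = [row[:] for row in image]
update_color(serial, 1)
update_color(serial, 0)
print(parallel_sweep(image, 3) == serial)  # True
```

Only one boundary row per neighbor has to be exchanged per color, which is what keeps the communication cost small compared with the work inside a block.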

  29. Data exchange after the black blocks are updated

  30. Data exchange after the white blocks are updated

  31. parallel algorithm on multigrid clusters

  32. Partial parallel algorithm

  33. Method 2--Ensemble parallel • Multiple trials: the simultaneous execution of N instances of the same algorithm over the same dataset, seeded with different starting configurations. If these ensemble instances are driven by the same prior, then there is evidence that they converge locally to a single instance, offering the possibility that K ensemble instances can be collapsed into one instance over identified point sets in the volume.
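One simple way to identify the point sets on which ensemble instances have converged to the same answer is a per-site agreement test. The sketch below is an assumption about how such a collapse could be detected, not the criterion of the proposal; the names are hypothetical.

```python
def consensus_mask(instances):
    """True at every site where all ensemble instances assign the same label."""
    first = instances[0]
    return [[all(inst[r][c] == first[r][c] for inst in instances)
             for c in range(len(first[0]))]
            for r in range(len(first))]

def collapse(instances, undecided=None):
    """Keep the common label where the instances agree, mark the rest as undecided."""
    mask = consensus_mask(instances)
    first = instances[0]
    return [[first[r][c] if mask[r][c] else undecided
             for c in range(len(first[0]))]
            for r in range(len(first))]

runs = [
    [[0, 0, 1], [0, 1, 1]],
    [[0, 0, 1], [1, 1, 1]],
    [[0, 1, 1], [0, 1, 1]],
]
print(collapse(runs))  # [[0, None, 1], [None, 1, 1]]
```

Sites marked undecided are the ones where the instances still disagree and where further sampling would be concentrated.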

  34. Method 3--Locally adaptive cooling schedule based on multiresolution tiling • Site aging • Multiresolution tiling with self-similar sets • Local moments • Rank temperature based on tile

  35. Site aging • Observation: background areas • If, once the schedule has reached an inverse temperature N0, the probability that the configuration of a site's neighborhood changes over further iterations is small, then we can change the visitation schedule so that the site no longer invokes the sampler once the inverse temperature exceeds N0.
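A rough sketch of site aging: count, for every site, how many consecutive sweeps its 4-neighborhood (the site itself included) has stayed unchanged, and drop sites from the visiting scheme once that count reaches a threshold. The `patience` threshold and the function names are assumptions made for illustration; the proposal ties the decision to the inverse temperature N0 instead.

```python
def update_ages(previous, current, ages):
    """Increase a site's age if it and its 4-neighbors kept their labels, else reset it."""
    h, w = len(current), len(current[0])
    for r in range(h):
        for c in range(w):
            hood = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            stable = all(previous[i][j] == current[i][j]
                         for i, j in hood if 0 <= i < h and 0 <= j < w)
            ages[r][c] = ages[r][c] + 1 if stable else 0
    return ages

def active_sites(ages, patience):
    """Sites that should still invoke the sampler."""
    return [(r, c) for r in range(len(ages)) for c in range(len(ages[0]))
            if ages[r][c] < patience]

previous = [[0, 0, 0], [0, 0, 0]]
current = [[0, 0, 1], [0, 0, 0]]   # only site (0, 2) changed in the last sweep
ages = update_ages(previous, current, [[0, 0, 0], [0, 0, 0]])
print(ages)                   # [[1, 0, 0], [1, 1, 0]]
print(active_sites(ages, 1))  # [(0, 1), (0, 2), (1, 2)]
```

In a stable background region every age grows with each sweep, so those sites leave the visiting scheme first.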

  36. Seek a heuristic that predicts the inverse temperature • Need a flexible and tunable partition of the plane • multiresolution tiling with self-similar sets • Develop a function that reflects the complexity of the data in the sense of classification • wavelet decomposition (Haar wavelet) • local moments • Rank the cooling temperature based on these functions

  37. Multiresolution tilings --1 • A well-established mathematical theory • dilation transformation • translation transformation • multiresolution analysis • scaling function • definition of wavelet • If the scaling function of a multiresolution analysis is the characteristic function of a measurable set Q with |Q| = 1, then Q can tile the plane.

  38. Multiresolution tilings --2 • If |Q| = 1, Q is the attractor of an affine transformation A. Q can be computed by the iteration $Q_{n+1} = \bigcup_{k \in K} A^{-1}(Q_n + k)$, where K is a complete set of coset representatives and the union runs over all of K. Each intermediate Q_n can tile the plane as well. • So we have a geometrically tunable, irregular system over which heuristics for the cooling schedule can be developed.

  39. Local Moments • Local moments definition • first moment--mean • second moment--variance • higher moments
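Local moments per tile can be computed directly from their definitions. The sketch below uses square tiles as a stand-in for the self-similar tiles of the proposal (an assumption made to keep the example short) and ranks tiles by variance as one possible complexity measure; the names are hypothetical.

```python
def tile_moments(image, tile):
    """Mean (first moment) and variance (second central moment) of each tile x tile block."""
    stats = {}
    for r0 in range(0, len(image), tile):
        for c0 in range(0, len(image[0]), tile):
            values = [v for row in image[r0:r0 + tile] for v in row[c0:c0 + tile]]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            stats[(r0, c0)] = (mean, variance)
    return stats

def rank_tiles(stats):
    """Tile origins ordered from the smoothest tile to the busiest one."""
    return sorted(stats, key=lambda origin: stats[origin][1])

image = [[1, 1, 0, 4],
         [1, 1, 4, 0]]
stats = tile_moments(image, 2)
print(stats)              # {(0, 0): (1.0, 0.0), (0, 2): (2.0, 4.0)}
print(rank_tiles(stats))  # [(0, 0), (0, 2)]
```

A ranking of this kind could then drive the cooling schedule, for example by cooling low-variance tiles faster, which is the heuristic direction the talk describes.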

  40. Heuristic • Choose a tile (an inverse problem; we are considering choosing one from a bank of tiles based on some similarity measure) • Wavelet decomposition over the tiling, with a tree representation • Ranking • Assign the cooling schedule

  41. Evaluation Plan • Validation test • Apply the algorithm to synthetic image with known priors • comparison with existing classification result • Speed up • efficiency • scalability
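Speedup and efficiency are commonly defined as the ratio of serial to parallel run time and that ratio divided by the number of processors. A minimal sketch with hypothetical timings (the numbers below are made up for illustration and are not measurements from this study):

```python
def speedup(t_serial, t_parallel):
    """Ratio of the serial run time to the parallel run time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    """Speedup per processor; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / processors

# Hypothetical timings in seconds for 1 processor and for 4 processors.
print(speedup(120.0, 40.0))        # 3.0
print(efficiency(120.0, 40.0, 4))  # 0.75
```

Scalability can then be examined by tabulating efficiency as the number of processors grows for a fixed problem size.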

  42. Summary of research goals and expected contributions • Development of a SIMD MRF/Gibbs classification algorithm based on independent set partition • Rigorous medical image classification results from this algorithm with random initial configurations • Rigorous speedup and efficiency results for this algorithm • Exploration of the ensemble parallel algorithm applied to medical image classification • Exploration of multiresolution tiling for a locally adaptive cooling schedule

  43. Research facilities • In general, the research facilities for this study are available or can be accessed in the Computer Science and Engineering Department and the Booth Research Center at UConn. • Hardware: SUN, PC, and SGI workstations • Software: mobile agents, MPI, OpenMP, ... • Network: 100BASE-T Ethernet, Gigabit Ethernet, OC-3 ATM, ... • Data: the Visible Human data, which the Image Processing Lab is licensed to acquire • Supercomputer facilities: NPACI

  44. Bibliography--1 • [1] The Visible Human Dataset, National Library of Medicine, http://www.nlm.nih.gov/pubs/factsheets/visible_human.html • [2] Stuart Geman and Donald Geman, “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images,” IEEE PAMI, Vol. 6, No. 6, 1984 • [3] Ming Chuang Zhang, Robert M. Haralick, James B. Campbell, “Multispectral Image Context Classification Using Stochastic Relaxation,” IEEE PAMI, Vol. 20, No. 1, 1990 • [4] T. Daggett, I.R. Greenshields and G. Weerasinghe, “Asynchronous, Parallel Pseudo Gibbs Classification of the VF Dataset,” Proceedings of the Twelfth IEEE Symposium on Computer-Based Medical Systems • [5] Gerhard Winkler, “Image Analysis, Random Fields and Dynamic Monte Carlo Methods: A Mathematical Introduction,” Springer, 1995 • [6] W.T. Tutte, “Graph Theory,” Encyclopedia of Mathematics and its Applications, Cambridge University Press, 1984 • [7] Laurent Younes, “Synchronous Random Fields and Image Restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 4, April 1998, pp. 380-390 • [8] Soo-Young Lee, Kyung Geun Lee, “Synchronous and Asynchronous Parallel Simulated Annealing with Multiple Markov Chains,” IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 10, October 1996, pp. 993-1008

  45. Bibliography--2 • [9] Hao Chen, Nicholas S. Flann, and Daniel W. Watson, “Parallel Genetic Simulated Annealing: A Massively Parallel SIMD Algorithm,” IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, February 1998, pp. 126-136 • [10] B. Hajek, “Cooling Schedules for Optimal Annealing,” Mathematics of Operations Research, Vol. 13, pp. 311-329, 1988 • [11] Andrew Sohn, “Parallel N-ary Speculative Computation of Simulated Annealing,” IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 10, October 1995, pp. 997-1005 • [12] Tarek M. Nabhan and Albert Y. Zomaya, “A Parallel Simulated Annealing Algorithm with Low Communication Overhead,” IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, December 1995, pp. 1226-1233 • [13] E.E. Witte, R.D. Chamberlain and M.A. Franklin, “Parallel Simulated Annealing Using Speculative Computation,” IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 4, April 1991, pp. 483-495 • [14] Baba C. Vemuri, Chhandomay Mandal, and Shang-Hong Lai, “A Fast Gibbs Sampler for Synthesizing Constrained Fractals,” IEEE Transactions on Visualization and Computer Graphics, Vol. 3, No. 4, October-December 1997, pp. 337-351 • [15] Fure-Ching Jeng, John W. Woods, and Sanjeev Rastogi, “Compound Gauss-Markov Random Fields for Parallel Image Processing,” in Markov Random Fields: Theory and Applications, pp. 11-38, edited by Rama Chellappa and Anil Jain, Academic Press, 1993

  46. Bibliography--3 • [16] Zhihong Yang, Ian R. Greenshields, “Volume Visible Human Data Classification with Parallel Dynamic Monte Carlo Methods,” to appear in the 4th World Multiconference on Systemics, Cybernetics and Informatics and the 6th International Conference on Information Systems, Analysis and Synthesis • [17] Madych, W., “Some Elementary Properties of Multiresolution Analyses of,” in Wavelets: A Tutorial in Theory and Applications, ed. C.K. Chui, Academic Press, 1992 • [18] Ian R. Greenshields, Zhihong Yang, “A Multigrid Approach to the Gibbsian Classification of Mammograms,” to appear in the 13th IEEE Symposium on Computer-Based Medical Systems • [19] Ian R. Greenshields, “Local Moments, Contractive IFS and Multiresolution Decompositions of 3D Imagery,” Proceedings of Microscopy, Holography and Interferometry in Biomedicine, SPIE Vol. 2083, pp. 174-183, 1993 • [20] http://www.npaci.com • [21] Danny B. Lange, Mitsuru Oshima, “Programming and Deploying Java Mobile Agents with Aglets,” Addison-Wesley, 1998 • [22] Ian T. Foster, “Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering,” Addison-Wesley Publishing Company, 1994 • [23] Joel A. Rosiene, Ph.D. dissertation, “Affine Transformations and Image Representation,” University of Connecticut, 1994 • [24] Thomas A. Daggett, Ph.D. dissertation, “MRF-Gibbs Context-Dependent Classification on a Small-scale Cluster Computing System,” University of Connecticut, 1998
