Compact and Understandable Descriptions of Mixtures of Bernoulli Distributions

Jaakko Hollmén and Jarkko Tikka, Helsinki Institute of Information Technology, Helsinki University of Technology, Espoo, Finland.


Presentation Transcript


  1. Compact and Understandable Descriptions of Mixtures of Bernoulli Distributions. Jaakko Hollmén and Jarkko Tikka, Helsinki Institute of Information Technology, Helsinki University of Technology, Espoo, Finland. Hollmén, Tikka: Compact and Understandable Descriptions of Mixtures of Bernoulli Models

  2. Background on the problem
      • Collaboration: Knuutila, Myllykangas at the University of Helsinki
      • DNA copy number amplifications are mutations in the DNA structure → cancer
      • Bibliomics survey of 838 journal articles during 1992-2002
      • Data: chromosomal mutations of 4500 cancer patients

  3. Example of the data collection
      S. Myllykangas, J. Himberg, T. Böhling, B. Nagy, J. Hollmén, and S. Knuutila. DNA copy number amplification profiling of human neoplasms. Oncogene, 25(55):7324-7332, November 2006.

  4. Chromosomal regions: names
      • Standardized nomenclature for chromosomal regions (spatial)
      • 1p36.2: chromosome 1, the arm p, region 36, subregion 2
      • Ranges: 1p36.1-p36.3
      • Hierarchical, irregular naming scheme used in literature
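
The naming scheme on this slide can be read mechanically. The following is a minimal sketch, assuming names follow the regular pattern chromosome, arm, region, optional subregion; the names `Band`, `parse_band` and `parse_range` are hypothetical helpers, not part of the presentation. Irregular forms found in the literature (for example a range end with no arm letter) are rejected rather than guessed.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Band(NamedTuple):
    chromosome: str            # "1".."22", "X" or "Y"
    arm: str                   # "p" or "q"
    region: str                # e.g. "36"
    subregion: Optional[str]   # e.g. "2", or None when absent


# chromosome, arm, region, optional ".subregion"
_BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_band(name: str) -> Band:
    """Split a name such as '1p36.2' into its four parts."""
    m = _BAND.match(name)
    if m is None:
        raise ValueError(f"not a recognised band name: {name!r}")
    return Band(*m.groups())


def parse_range(text: str) -> Tuple[Band, Band]:
    """Parse a range such as '1p36.1-p36.3'.

    The end of the range may omit the chromosome, in which case the
    chromosome of the start is reused. A single name is returned as a
    range from itself to itself.
    """
    start_text, sep, end_text = text.partition("-")
    start = parse_band(start_text)
    if not sep:
        return start, start
    if end_text[:1] in ("p", "q"):
        end_text = start.chromosome + end_text
    return start, parse_band(end_text)
```

For instance, `parse_range("1p36.1-p36.3")` yields two `Band` values on chromosome 1, arm p, region 36, with subregions 1 and 3.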

  5. DNA copy number amplification data as 0-1 data
      (Figure: 0-1 data matrix; axes: cancer patients (i) and chromosomal areas, spatial coordinates (j))

  6. Mixture models for 0-1 data
      • Cancer is a collection of diseases
      • Finite mixture model of multivariate Bernoulli distributions (formula shown on the original slide)
      • Learn the model with the EM algorithm
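
The formula itself did not survive in the transcript. The block below gives the standard textbook form of a finite mixture of multivariate Bernoulli distributions and the usual EM updates for it. It is a sketch under assumed notation, not a reproduction of the slide: n indexes patients (N in total), i indexes the d chromosomal areas, j indexes the J components, and \pi_j and \theta_{ji} denote mixing proportions and per-area amplification probabilities.

```latex
% Mixture density for one 0-1 vector x_n (assumed notation, see lead-in)
\[
p(\mathbf{x}_n) = \sum_{j=1}^{J} \pi_j \prod_{i=1}^{d}
  \theta_{ji}^{x_{ni}} \, (1-\theta_{ji})^{1-x_{ni}},
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{J} \pi_j = 1
\]

% E-step: posterior probability that patient n belongs to component j
\[
\gamma_{nj} =
\frac{\pi_j \prod_{i=1}^{d} \theta_{ji}^{x_{ni}} (1-\theta_{ji})^{1-x_{ni}}}
     {\sum_{k=1}^{J} \pi_k \prod_{i=1}^{d} \theta_{ki}^{x_{ni}} (1-\theta_{ki})^{1-x_{ni}}}
\]

% M-step: re-estimate the parameters from the posteriors
\[
\pi_j = \frac{1}{N} \sum_{n=1}^{N} \gamma_{nj},
\qquad
\theta_{ji} = \frac{\sum_{n=1}^{N} \gamma_{nj} \, x_{ni}}{\sum_{n=1}^{N} \gamma_{nj}}
\]
```

With this parameterization the model has J mixing proportions and J·d Bernoulli parameters.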

  7. Model selection: how many components in a mixture?
      • 5-fold cross validation repeated 10 times
      • Try different solutions, based on average likelihood for a validation set → J=6
      (Figure legend: training, validation)
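
The selection procedure on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' code: `repeated_kfold` and `select_num_components` are hypothetical names, the `score` callable (fit a mixture with J components on the training rows, return the average validation log-likelihood) is left to the caller, and picking the J with the highest average is one plausible reading of "based on average likelihood".

```python
import random
from statistics import mean
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple


def repeated_kfold(n: int, k: int = 5, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)               # a fresh random partition per repeat
        for fold in range(k):
            val = order[fold::k]         # every k-th index forms one fold
            held_out = set(val)
            train = [i for i in order if i not in held_out]
            yield train, val


def select_num_components(
    data: Sequence[Sequence[int]],
    candidates: Iterable[int],
    score: Callable[[list, list, int], float],
) -> int:
    """Return the candidate J with the highest mean validation score.

    score(train_rows, val_rows, J) is assumed to fit a J-component
    model on train_rows and return its average log-likelihood on
    val_rows.
    """
    averages: Dict[int, float] = {}
    for J in candidates:
        scores = [
            score([data[i] for i in train], [data[i] for i in val], J)
            for train, val in repeated_kfold(len(data))
        ]
        averages[J] = mean(scores)
    return max(averages, key=averages.get)
```

With the defaults, each candidate J is scored on 5 × 10 = 50 train/validation splits.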

  8. Mixture model: Chromosome 1
      • Model is summarized by J + Jd parameters (about 200 parameters altogether)
      (Figure: axes: mixture components j=1,...,6 and chromosomal areas, spatial coordinates)
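
The parameter count on this slide is simple arithmetic: J mixing proportions plus J·d Bernoulli parameters. A small sketch follows; `num_parameters` is a hypothetical helper, and the value d = 32 in the usage note is an assumption chosen only because it lands near 200. The transcript does not state d.

```python
def num_parameters(J: int, d: int, free_only: bool = False) -> int:
    """Count the parameters of a J-component Bernoulli mixture over d areas.

    J mixing proportions plus J*d Bernoulli parameters. With
    free_only=True one proportion is dropped, because the proportions
    sum to one.
    """
    mixing = J - 1 if free_only else J
    return mixing + J * d
```

For example, `num_parameters(6, 32)` gives 198, which is consistent with "about 200" for J = 6 under the assumed d.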

  9. Mixture model in clustering
      (Figure: axes: clustered cancer patients and chromosomal areas, spatial coordinates)
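
One common way to use a fitted mixture for clustering is to assign each patient to the component with the highest posterior probability. The sketch below assumes that rule; the slide does not say which assignment rule was used, and the function names are hypothetical. Computations are done in the log domain to avoid underflow when d is large.

```python
import math
from typing import List, Sequence


def log_bernoulli(x: Sequence[int], theta: Sequence[float],
                  eps: float = 1e-9) -> float:
    """Log-probability of a 0-1 vector under independent Bernoullis."""
    total = 0.0
    for xi, t in zip(x, theta):
        t = min(max(t, eps), 1.0 - eps)   # keep log() finite
        total += math.log(t) if xi else math.log(1.0 - t)
    return total


def posteriors(x: Sequence[int], pis: Sequence[float],
               thetas: Sequence[Sequence[float]]) -> List[float]:
    """Posterior probability of each component given x (pis must be > 0)."""
    logs = [math.log(p) + log_bernoulli(x, th) for p, th in zip(pis, thetas)]
    top = max(logs)                       # subtract the max for stability
    weights = [math.exp(value - top) for value in logs]
    z = sum(weights)
    return [w / z for w in weights]


def assign_cluster(x: Sequence[int], pis: Sequence[float],
                   thetas: Sequence[Sequence[float]]) -> int:
    """Index of the component with the highest posterior probability."""
    post = posteriors(x, pis, thetas)
    return max(range(len(post)), key=post.__getitem__)
```

Sorting patients by their assigned index groups rows of the 0-1 matrix by cluster.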

  10. Solution creates a problem
      • We solved the modeling problem, but created a communications problem!
      • How do the cancer experts understand and refer to our models? Names?

  11. Compact and Understandable Descriptions
      • Understandable (language, nomenclature)
      • Compact (size of the description)
      • Describe the parameters of the model
      • Use the model to cluster the data and describe the data in the clusters

  12. Describe the model parameters
      • Mode of the component distribution
        • most probable chromosomal area
      • Hypothetical mean organism (HMO)
        • quantize the parameters to represent a hypothetical case of data
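
Both parameter-based descriptions can be sketched for a single component. The code below is illustrative only: the function names are hypothetical, and the 0.5 quantization threshold is an assumption, since the slide does not give the rule used to quantize the parameters.

```python
from typing import List, Sequence


def most_probable_area(theta: Sequence[float], names: Sequence[str]) -> str:
    """Name of the chromosomal area with the largest parameter value."""
    idx = max(range(len(theta)), key=theta.__getitem__)
    return names[idx]


def hypothetical_mean_organism(theta: Sequence[float],
                               threshold: float = 0.5) -> List[int]:
    """Quantize the parameters into one hypothetical 0-1 data vector.

    The threshold of 0.5 is an assumed choice, not taken from the slides.
    """
    return [1 if t >= threshold else 0 for t in theta]
```

For a component with parameters `[0.1, 0.7, 0.55, 0.2]` over areas named `["1p36", "1q21", "1q22", "1q23"]`, the first function returns `"1q21"` and the second returns `[0, 1, 1, 0]`.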

  13. Describe the clustered data
      • Describe the margins of the clusters with maximal frequent itemsets
      • Why maximal: they describe the largest representative commonality in the data; extracting all frequent itemsets is not feasible
      • Express the itemsets as ranges of contiguous chromosomal areas
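
The idea can be illustrated on a toy cluster. The sketch below is a deliberately naive level-wise search that enumerates all frequent itemsets and then filters the maximal ones, which is exactly what does not scale to real data; a dedicated maximal-itemset miner would be needed in practice. All function names are hypothetical, and areas are represented by integer column indices, assumed to be in chromosomal order.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple


def frequent_itemsets(rows: Sequence[Set[int]],
                      min_count: int) -> List[FrozenSet[int]]:
    """All itemsets contained in at least min_count rows (toy sizes only)."""
    def count(items: FrozenSet[int]) -> int:
        return sum(1 for row in rows if items <= row)

    singles = sorted({i for row in rows for i in row})
    level = [frozenset([i]) for i in singles
             if count(frozenset([i])) >= min_count]
    found: List[FrozenSet[int]] = []
    while level:
        found.extend(level)
        # join pairs of frequent k-sets that differ in exactly one item
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if count(c) >= min_count]
    return found


def maximal(itemsets: Sequence[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


def contiguous_runs(itemset: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse column indices into (first, last) runs of adjacent areas."""
    runs: List[Tuple[int, int]] = []
    for i in sorted(itemset):
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs
```

Each run can then be written as a named range by looking up the names of its first and last area.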

  14. Descriptions, Chromosome 1
      • Maximal frequent itemsets extracted globally: 1q21-q22, 1q22-q23
      • Shadowing and spurious mutations

  15. Amplification models and patterns
      1q32-q44, 1q11-q44, 1q21-q25, 1q21-q23, 1p35-p32, 2p15-p14, 2q32, 2p25-2p24, 2p24-p23, 2p25-2p11.1, 3q26.1-q26.3, 3q11.1-q29, 3p26-q29, 3q25-q29, 3p24, 3q27-q29, 4q12, 4p15.3-p12, 5p13-p12, 5p15.3-p11, 5p15.3 5q35, 6q22, 6p25-q27, 6p25-p22, 6p12, 6p25-p11.1, 6q21-q27, 7q3-q36, 7p21, 7p13-p11.2, 7q21, 7p22-q36, 7p22-p11.1, 8p23-q24.3, 8q24.1-q24.3, 8q23, 8q21.1-q22, 8q21.1-q24.3, 8q11.1-q24.3, 9q11-q34, 9p24 q34, 9q34, 9p24-p21, 10q11.1-q26, 10p15-p12, 11q11-q25, 11p15-q25, 11q23, 11q13, 11q14-q22, 11p12-p11.2, 11q12-q13, 12p13-p11.1, 12q13-q15, 12q11-q21, 12q12-12q23, 12q24.1-q24.3, 12p12, 12q14-q15, 13q32-q34, 13p13-q34, 13q13-q14, 13q22-q34, 13q22-q31, 13q11-q34, 14q12-q21, 14q12-q32, 14q32, 15q11.1-q26, 15q24-q25, 16p13.3-q24, 16p13.3-p11.1, 16q22, 16p13.1-p12, 17q11.1-q25, 17p13-11.1, 17q21-q25, 17q12-q21, 17p13-q25, 17q24-q25, 17q22, 18q11.1-q23, 18q21, 18p11.3-18q23, 18p11.3-11.1, 19q13.1, 19p13.3-p13.2, 19p13.3-q13.4, 19q13.1-q13.4, 20q12, 20p12-p11.2, 20q11.1-q13.3, 20p13-q13.3, 20q13.1-q13.3, 20q11.1-q12, 21p13-q22, 21q11.2-q21, 21q21-q22, 21q11.1-q22, 22q11.1-q13, 22q13, 22p13-q13, Xp22.3-q28, Xp22.1-p11.2, Xq26-q28, Xq11-q28

  16. Summary and Conclusions
      • DNA copy number amplifications (mutations) in cancer: database collected from literature
      • Mixture modeling of 0-1 data
      • Models summarized based on: parameters and clustered data with maximal frequent itemsets
      • The collection of DNA copy number amplifications forms a new basis for cancer classification
