Robust methodologies for partition clustering

A comprehensive overview of robust methodologies for partition clustering, including decomposition of covariance matrix, landscape mapping, and validation using synthetic data sets and metabolic sub-typing.

rmarian

Presentation Transcript


  1. Robust methodologies for partition clustering. Paulo Lisboa, Terence Etchells, Ian Jarman and Simon Chambers

  2. Overview • Partition clustering - critique • Decomposition of the covariance matrix • Landscape mapping of cluster solutions • Validation for two synthetic data sets and metabolic sub-typing

  3. Bioinformatics: Nottingham Tenovus Primary Breast Carcinoma Series. Consecutive series of 1,944 cases of primary operable invasive breast cancer (n=1,076 with all markers present). Patients presenting during 1986-98. Protein expression comprising 25 immunohistochemical markers related to tumour malignancy, derived through high-throughput protein expression using TMA. Abd El-Rehim et al, Int J Cancer, 116, 340-350, 2005.

  4. Partition clustering – relevance to bioinformatics (figure panels: p53, CK 5/6, C-erbB-2, BRCA1, ER, PgR)

  5. Partition clustering – open issues
      • Identify a suitable algorithm: model-based or model-free? Hierarchical, K-means, PAM?
      • Return {Sa,...,Sz} solutions
      • Validate & interpret each solution
      K-means:
      i. Assume #K
      ii. Initialise #N?
      iii. Sort by optimality?
      iv. Select best for #K?
      v. Select #K(s)?
      vi. Single cluster or ensemble?

  6. Separation index: decomposition of the scatter matrix • Scatter matrices (figure labels: S_W1, S_W2, S_B)

  7. Separation index: decomposition of the scatter matrix • Invariant separation matrix and index (figure labels: S_W1, S_W2, S_B)
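The scatter-matrix equations on slides 6 and 7 did not survive in this transcript. As a minimal sketch, the standard decomposition of the total scatter matrix into within-cluster and between-cluster parts, S_T = S_W + S_B, can be checked numerically as below. The index J = tr(S_T^+ S_B) used here is an assumption: it is one common separation criterion that is unchanged by invertible linear transformations of the data, and it may differ from the index defined on the slides. The function names are hypothetical.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Return (S_W, S_B, S_T) for data X (n x d) and a cluster label per row."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))  # within-cluster scatter, summed over clusters
    S_B = np.zeros((d, d))  # between-cluster scatter
    for k in np.unique(labels):
        Xk = X[labels == k]
        mean_k = Xk.mean(axis=0)
        centred = Xk - mean_k
        S_W += centred.T @ centred
        diff = (mean_k - grand_mean)[:, None]
        S_B += len(Xk) * (diff @ diff.T)
    centred_all = X - grand_mean
    S_T = centred_all.T @ centred_all  # total scatter
    return S_W, S_B, S_T


def separation_index(X, labels):
    """Assumed index J = tr(pinv(S_T) @ S_B); not confirmed by the slides."""
    _, S_B, S_T = scatter_matrices(X, labels)
    # The pseudo-inverse keeps the expression defined when S_T is singular.
    return float(np.trace(np.linalg.pinv(S_T) @ S_B))
```

With K clusters this assumed index lies between 0 and K-1, so values near K-1 indicate well-separated clusters.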

  8. N.B. If |S_T| = 0 → project onto the subspace of cohort means (figure labels: a1, a2, a3)

  9. Theorem: [expression missing from the transcript; presumably the separation index] is invariant to dimensionality reduction under Mahalanobis rotations (figure labels: ã1, ã2, ã3)

  10. K-means clustering

  11. Adaptive Resonance Theory (ART) clustering

  12. Adaptive Resonance Theory (ART) clustering

  13. Concordance measure
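The transcript keeps only the title of this slide. Assuming that the "CV" referred to on the next slide is Cramér's V (an assumption, not stated in the transcript), a concordance measure between two partitions of the same cases could be sketched as follows. The function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels_a, labels_b):
    """Cramér's V between two label sequences over the same cases."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same cases")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table counts
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    dof = min(len(rows), len(cols)) - 1
    if dof == 0:
        return 0.0  # convention chosen here: V is undefined for a single cluster
    return sqrt(chi2 / (n * dof))
```

The measure depends only on the contingency table, so it is unaffected by how the clusters are numbered in each run, which is what makes it usable for comparing repeated initialisations.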

  14. Optimality principle
      i. N initialisations
      ii. Sort by J
      iii. Select top p%
      iv. Calculate pairwise CV
      v. Retain med(CV)
      vi. Plot (J, med_CV)
      • Reproducibility with best separation, max(J), and best concordance, max(CV), under repeated initialisations
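Steps i to vi above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: J is taken to be tr(S_T^+ S_B), CV is taken to be Cramér's V, med(CV) is read as the median concordance of each retained solution with the other retained solutions, and a plain Lloyd's k-means stands in for whichever clustering algorithm is used. All function and parameter names are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm from one random initialisation."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


def separation(X, labels):
    """Assumed separation index J = tr(pinv(S_T) @ S_B)."""
    mean = X.mean(axis=0)
    S_T = (X - mean).T @ (X - mean)
    S_B = np.zeros_like(S_T)
    for j in np.unique(labels):
        Xj = X[labels == j]
        d = (Xj.mean(axis=0) - mean)[:, None]
        S_B += len(Xj) * (d @ d.T)
    return float(np.trace(np.linalg.pinv(S_T) @ S_B))


def cramers_v(a, b):
    """Assumed concordance measure: Cramér's V of the contingency table."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    dof = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * dof))) if dof > 0 else 0.0


def seco_points(X, k, n_init=30, top_frac=0.2, seed=0):
    """Return (J, med_CV) pairs for the top fraction of solutions by J."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_init)]          # i
    scored = sorted(((separation(X, lab), lab) for lab in runs),
                    key=lambda t: t[0], reverse=True)          # ii
    top = scored[:max(2, int(round(top_frac * n_init)))]       # iii
    points = []
    for i, (J, lab) in enumerate(top):
        cvs = [cramers_v(lab, other)                           # iv
               for j, (_, other) in enumerate(top) if j != i]
        points.append((J, float(np.median(cvs))))              # v
    return points                                              # vi: plot these
```

Calling `seco_points(X, k)` for each candidate k and plotting the returned pairs gives one cloud of points per k, from which solutions that are both well separated and reproducible can be picked out.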

  15. Synthetic data (10 cohorts)

  16. Synthetic data (10 cohorts)

  17. Synthetic data (10 cohorts)

  18. Synthetic data – mixing structure (Sammon Map)

  19. Synthetic data – Visualisation in data space

  20. Synthetic data (10 cohorts) (figure: diagram annotated with case counts; the layout of the counts is not recoverable from the transcript)

  21. Synthetic data (10 cohorts) (figure panels: Max J, SeCo, Max Cv)

  22. Bioinformatics: Nottingham Tenovus Primary Breast Carcinoma Series. Consecutive series of 1,944 cases of primary operable invasive breast cancer (n=1,076 with all markers present). Patients presenting during 1986-98. Protein expression comprising 25 immunohistochemical markers related to tumour malignancy, derived through high-throughput protein expression using TMA. Abd El-Rehim et al, Int J Cancer, 116, 340-350, 2005.

  23. Marginal distributions

  24. Landscape map (SeCo)

  25. Stability index (Cv)

  26. Landscape map (SeCo)

  27. Cluster hierarchy (1) (figure: hierarchy of cluster solutions, each node labelled with a cluster and its size, e.g. C1, 781 and C2, 295; the links between nodes are not recoverable from the transcript)

  28. Cluster hierarchy (2) (figure: hierarchy of cluster solutions, each node labelled with a cluster and its size, e.g. C1, 781 and C2, 295; the links between nodes are not recoverable from the transcript)

  29. Solution A

  30. Solution A

  31. Solution B

  32. Solution A

  33. Sub-type profiling (figure panels: Clusters A, Clusters B; sub-type labels: Luminal New 2, Luminal N)

  34. Sub-type profiling (figure panels: Clusters A, Clusters B; sub-type labels: Luminal A, HER2)

  35. Sub-type profiling (figure panels: Clusters A, Clusters B; sub-type labels: Basal p53 -, Basal muc1 +, Basal p53 +, Basal muc1 -)

  36. Consistency with consensus clustering

  37. Molecular sub-typing

  38. Molecular sub-typing

  39. Summary • Partition clustering - critique • Decomposition of the covariance matrix • Landscape mapping of cluster solutions • Validation for two synthetic data sets and metabolic sub-typing

  40. Ferrara data (n=633)

  41. Ferrara data (n=633)

  42. Ferrara data (n=633)

  43. Ferrara data (n=633) (figure panels: JMU Cluster 1/5, JMU Cluster 2/5, JMU Cluster 3/5, JMU Cluster 4/5, JMU Cluster 5/5)

  44. Ferrara data (n=633)

  45. Ferrara data (n=633)
