
Pattern Detection and Co-methylation Analysis of Epigenetic Features in Human Embryonic Stem Cells

Ben Niu, Qiang Yang, Jinyan Li, Hong Xue, Simon Chi-keung Shiu, Weichuan Yu, Huiqing Liu, Sankar Kumar Pal. HKPolyU.



  1. Pattern Detection and Co-methylation Analysis of Epigenetic Features in Human Embryonic Stem Cells Ben Niu, Qiang Yang, Jinyan Li, Hong Xue, Simon Chi-keung Shiu, Weichuan Yu, Huiqing Liu, Sankar Kumar Pal HKPolyU

  2. Computational Epigenetics • An emerging and exciting area that combines the state of the art in • Machine learning • Molecular biology • Aims to understand the epigenetic processes in gene transcriptional regulation • Seeks to turn that knowledge into new tools for treating human diseases.

  3. The Research • Human Epigenome Project (HEP): the next wave after the Human Genome Project (HGP) • Started in 2003 after completion of the Human Genome Project. • HEP aims to identify the epigenetic markers associated with human diseases • ‘Journal of Epigenetics’ has been launched: the first journal dedicated to communications in epigenetics, started in 2006. • Series of publications in highly cited journals in 2005-07: • Nature • Focus issue on epigenetics, Nature Reviews Genetics, April 2007. • Cell • Special issue on epigenetics, Cell, February 2007. • J. Bioinformatics • We have been jointly invited to write a review paper on computational epigenetics for the Journal of Bioinformatics.

  4. The Industry • Epigenetics opens a rapidly growing market for epigenetic medical services (diagnostics, drugs) • According to a 2007 MarketResearch report (see figure), the global market for epigenetic applications (i.e., drugs plus diagnostic services) will reach US$4 billion by 2012; the annual growth rate at present is 60.4%. A promising direction!

  5. What we know • Basically: • Genes can be turned on/off through cytosine methylation or histone modifications, a reversible process • Epigenetic events are heritable and can change a cell’s phenotype without altering its DNA sequence • Functionally: • Epigenetic events dominate the growth of cancer and embryonic stem cells • These two types of cells are of great medical interest • Cancer is a leading cause of human death • hESCs are seen as the key to regenerative treatments • For the two points see: Nature Insight: Epigenetics, Vol. 447, 2007.

  6. What we don’t know • The logic by which DNA methylation governs cell behavior remains unclear • How does DNA methylation orchestrate the products of the molecular machinery to carry out cell functions? • In the context of epigenetics, we need to address two issues: • What are the DNA methylation rules that distinguish cancer cells, normal cells, and human ES cells from each other? • What are the interaction patterns of the genes in these cells, and what role does methylation play in coordinating gene activity?

  7. State of the art in Methylation Analysis • SVMs and ANNs have been successfully applied to predict epigenetic events, for example: • Methylation status of CpG sites • ‘Computational prediction of methylation status in human genomic sequence’, PNAS, Vol. 103(28), 2006. • CpG islands / promoter regions in DNA sequence • ‘CpG island mapping by epigenome prediction’, PLoS Computational Biology, Vol. 3(6), 2007. • ‘Promoter prediction analysis on the whole human genome’, Nature Biotechnology, Vol. 22, 2004. • Cancers • ‘Tumour class prediction and discovery by microarray-based DNA methylation analysis’, NAR, Vol. 30, 2002. • Co-regulation analysis through clustering • Clustering of methylation arrays • Marjoram P, Chang J, Laird PW, Siegmund KD: ‘Cluster analysis for DNA methylation profiles having a detection threshold’, BMC Bioinformatics, Vol. 7, 2006.

  8. Two Problems • Problem 1: traditional methods such as SVMs and ANNs are • ‘black box’ models • The extracted knowledge is encoded in connection weights and support vectors • Hard for biologists to understand • Problem 2: investigate the co-methylation patterns in • Cancer cells • Human embryonic stem cells (hESCs) • Co-methylation analysis can help to uncover hidden pathways, leading to new drug designs

  9. Methodology • Two computational methods are proposed • Adaptive Cascade Sharing Trees (ACS4) for problem 1 • To learn human-understandable DNA methylation rules • Adaptive clustering for problem 2 • To highlight how genes are orchestrated for function through the methylation mechanism

  10. ACS4 method (1) • Promoters are regulatory elements located upstream (5’) of the transcription start site (TSS). • Methylation of promoter CpGs remodels the chromatin structure and thereby affects gene expression • [Figure labels: methyl-binding proteins (MeCP), methylated CpG, histone deacetylases (HDAC), methyltransferase]

  11. ACS4 method (2) • Methylation levels of promoters can be measured using microarrays • Each spot on the array corresponds to a promoter CpG site. • The methylation intensity is a numerical value between 0 and 1.

  12. ACS4 method (3) • Objective: learn human-understandable rules that define the epigenetic process in cancer and embryonic stem cells • Idea: • Adaptively partition the numeric attributes into a set of linguistic domains, e.g., ‘Very High’, ‘High’, ‘Medium’, ‘Low’, ‘Very Low’. • Train a committee of trees to select the most salient features and predict through voting.
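The partitioning idea on this slide can be sketched in a few lines. The transcript does not say how ACS4 chooses its cut points, so the sketch below substitutes plain equal-frequency (quantile) cuts; the function names, the five-label order, and the sample intensities are all illustrative assumptions, not the authors' method.

```python
from bisect import bisect_right

# Label order from lowest to highest methylation (names taken from the slide)
LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]

def quantile_cut_points(values, n_bins=5):
    """Return n_bins - 1 cut points at equal-frequency positions (assumed scheme)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def to_label(value, cut_points, labels=LABELS):
    """Map a methylation intensity in [0, 1] to a linguistic label."""
    return labels[bisect_right(cut_points, value)]

# Made-up intensities for one promoter CpG site across ten samples
intensities = [0.02, 0.05, 0.10, 0.35, 0.40, 0.55, 0.60, 0.85, 0.90, 0.97]
cuts = quantile_cut_points(intensities)
print(cuts)                  # [0.1, 0.4, 0.6, 0.9]
print(to_label(0.02, cuts))  # Very Low
print(to_label(0.97, cuts))  # Very High
```

Once every attribute is expressed as a label, tree-based learners can produce rules of the form "IF attribute is ‘High’ THEN class", which is what makes the result readable.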

  13. ACS4 method (4)

  14. ACS4 method (5)

  15. ACS4 method (6)

  16. ACS4 method (7) • Suppose we have learned k rules • Given a test sample, compute pi • Rules are weighted according to their coverage, i.e., the number of matched samples • The overall prediction is made by voting across the rules.
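A minimal sketch of coverage-weighted voting, under stated assumptions: each rule is represented as a (predicate, class label, coverage) triple, and a rule votes only when its predicate matches the sample. The quantity written as "pi" on the slide is not defined in the transcript, so it is not modelled here. The rule set, attribute names "A" and "B", and coverage counts are hypothetical.

```python
from collections import defaultdict

def vote(sample, rules):
    """Return the class with the largest total coverage among matching rules."""
    totals = defaultdict(float)
    for matches, label, coverage in rules:
        if matches(sample):
            totals[label] += coverage  # weight of a rule = its coverage
    if not totals:
        return None  # no rule fired for this sample
    return max(totals, key=totals.get)

# Hypothetical rules over two made-up attributes "A" and "B"
toy_rules = [
    (lambda s: s["A"] == "High", "hESC", 30),
    (lambda s: s["B"] == "Low", "Normal", 9),
    (lambda s: s["A"] == "High" and s["B"] == "Low", "Normal", 5),
]

print(vote({"A": "High", "B": "Low"}, toy_rules))  # hESC (30 against 9 + 5)
print(vote({"A": "Low", "B": "Low"}, toy_rules))   # Normal
```

Weighting by coverage means a rule supported by many training samples outvotes several narrow rules, which is one simple way to resolve conflicts in a rule committee.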

  17. ACS4 method (8) • Dataset: • 37 hESC and 33 non-hESC samples (24 cancer cell lines, 9 normal cell lines). • 1,536 attributes • Result • Just 2 attributes are enough to separate the 3 cell types • No need for the 40 attributes selected by Fisher’s score in [1]. • Wet-lab cost can be reduced by testing 2 attributes only, instead of 40. • Accuracy is better than that of the compared methods except SVM, but an SVM cannot tell us ‘why’. • The rules are easy for biologists to understand and can guide new wet-lab experiments that seek to confirm them. [1] ‘Human embryonic stem cells have a unique epigenetic signature’, Genome Research, Vol. 16, 2006

  18. ACS4: Biological interpretation (1) • Example: • IF PI3-504 is ‘High’ THEN hESC • IF PI3-504 is ‘Low’ AND NPY-1009 is ‘Low’ THEN Normal • IF PI3-504 is ‘Low’ AND NPY-1009 is ‘High’ THEN Cancer
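Written as code, the three example rules amount to a two-level decision function. This is only a transcription of the rules on the slide; the function name is hypothetical, and the inputs are assumed to be the linguistic labels already assigned to the two attributes.

```python
def classify_cell(pi3_504, npy_1009):
    """Apply the three example rules; inputs are labels such as 'High' or 'Low'."""
    if pi3_504 == "High":
        return "hESC"
    if pi3_504 == "Low" and npy_1009 == "Low":
        return "Normal"
    if pi3_504 == "Low" and npy_1009 == "High":
        return "Cancer"
    return None  # label combination not covered by the example rules

print(classify_cell("High", "Low"))   # hESC
print(classify_cell("Low", "High"))   # Cancer
```

Only two measured attributes are consulted, which is the basis for the claim that wet-lab testing can be limited to two sites.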

  19. ACS4: Biological interpretation (2) • The two marker genes • PI3 (PI 3-kinases): activate cell growth, proliferation, differentiation, motility, and intracellular trafficking • Down-regulated in hESCs • Maintains a stable state • Keeps cells from growth, proliferation, differentiation… • Neuropeptide Y (NPY): signal protein produced by nerves • [Immunology: Stress and Immunity, Science, Vol. 311, 2006.] • Experiments show that NPY deficiency causes immune defects • Consistent with our computational result

  20. ACS4: Biological interpretation (3) • Example: • IF PI3-504 is ‘High’ THEN hESC • The PI3 gene is silenced to maintain a stable cell context in hESCs • IF PI3-504 is ‘Low’ AND NPY-1009 is ‘Low’ THEN Normal • Normal cells can grow, and grow safely with immune defenses • IF PI3-504 is ‘Low’ AND NPY-1009 is ‘High’ THEN Cancer • Cancer cells grow, and grow out of control, due to the immune deficiency

  21. Adaptive clustering (1) • Co-methylation of genes is important • Because we want to know how genes work together in the epigenetic framework • Clustering should reflect the true distribution of the gene space. • We assume the data are normally distributed, which is often the case in real-world applications • Fisher’s criterion is computed to validate each clustering result and to choose the best one.

  22. Adaptive clustering (2) • For embryonic and cancer cells we optimally cluster the 1536 genes • In each round of k-means clustering, we start from a different number of initial centers. • The candidate clustering with the largest Fisher’s discriminant score is selected for further analysis. • The genes in each cluster may be functionally related and participate in the same DNA methylation pathway. • By further analysis of the sequences, we can find the characteristic binding sites for each cluster of genes and discover previously unknown epigenetic binding factors.
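The selection loop described on this slide can be sketched as follows. Several choices here are assumptions, since the transcript gives no formulas: the Fisher-style score is taken as between-cluster over within-cluster scatter, each divided by its degrees of freedom (the Calinski-Harabasz form, chosen because the raw ratio tends to grow with the number of clusters); k-means uses Euclidean distance and a deterministic farthest-first seeding so the example is reproducible; and the twelve 2-D points stand in for gene methylation profiles.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers)

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster index per row of X."""
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers)

def fisher_score(X, labels):
    """Between/within scatter ratio, normalised by degrees of freedom (assumed form)."""
    ids = np.unique(labels)
    n, k = len(X), len(ids)
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ids:
        members = X[labels == j]
        mean_j = members.mean(axis=0)
        between += len(members) * ((mean_j - overall) ** 2).sum()
        within += ((members - mean_j) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))

def best_k(X, k_max):
    """Cluster once per candidate k and keep the k with the largest score."""
    scores = {k: fisher_score(X, kmeans(X, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Toy data: three tight groups of four points each
offsets = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]])
group_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + offsets for c in group_centers])

chosen, scores = best_k(X, k_max=6)
print(chosen)  # 3
```

On real data the rows of X would be genes and the columns samples, and the candidate range would extend much further than 6.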

  23. Adaptive clustering (3) • For cancer cells and hESCs, 41 and 59 clusters, respectively, give the best separation • So 41 and 59 functional domains are thought to underlie the 1536 genes in the two cell types.

  24. Adaptive clustering (4) • In experiments: • The distance measure d is based on Pearson’s correlation score. • N = 60.
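The slide says only that the distance is based on Pearson's correlation. One common choice, assumed here, is d = 1 - r, which is 0 for perfectly co-methylated profiles and 2 for perfectly opposed ones. The sketch also checks a standard identity: for profiles standardised to zero mean and unit variance, the squared Euclidean distance equals 2n(1 - r), so Euclidean clustering of standardised profiles behaves like correlation-based clustering. The profiles are made up.

```python
import numpy as np

def pearson_distance(x, y):
    """Assumed form d = 1 - r, where r is Pearson's correlation coefficient."""
    r = np.corrcoef(x, y)[0, 1]
    return 1.0 - r

def zscore(x):
    """Standardise a profile to zero mean and unit (population) variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Made-up methylation profiles of three genes across four samples
gene_a = np.array([0.10, 0.40, 0.80, 0.90])
gene_b = np.array([0.20, 0.50, 0.70, 0.95])
gene_c = 1.0 - gene_a  # perfectly anti-correlated with gene_a

print(pearson_distance(gene_a, gene_b))  # small: similar profiles
print(pearson_distance(gene_a, gene_c))  # 2 up to rounding: opposite profiles

# Identity check: ||z(a) - z(b)||^2 == 2 * n * (1 - r)
lhs = ((zscore(gene_a) - zscore(gene_b)) ** 2).sum()
rhs = 2 * len(gene_a) * pearson_distance(gene_a, gene_b)
print(np.isclose(lhs, rhs))  # True
```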

  25. Adaptive clustering (5) • For hESCs, the clusters of co-methylated genes, e.g., MAGEA1, STK23, EFNB1, MKN3, TMEFF2, AR, and FMR1, are mostly related to hESC differentiation, self-renewal, and migration.

  26. Adaptive clustering (6) • For cancer cells, the clusters of co-methylated genes, e.g., RASGRF1, MYC, and CFTR, are highly involved in apoptosis, DNA repair, tumour suppression, and ion transport, which are typical immunological activities of cells against DNA damage.

  27. Adaptive clustering (7) • In particular, we discover: • The gene CFTR (7q31), long a focus of medical research, is co-methylated with MT1A (16q13) and KCNK4 (11q13). • CFTR defects contribute to cystic fibrosis (CF). • One in twenty-two people of European descent carries one gene for CF, making it the most common lethal genetic disease among such people; there is still no cure at present. • The CFTR and KCNK4 proteins form ion channels across cell membranes, while MT1A proteins bind ions as transporters. All three are involved in the transport of ions across the cell membrane, so they are functionally related. • They may participate in the same pathway, the breakdown of which could explain the process of tumorigenesis

  28. Adaptive clustering (8) • To summarize: • Co-methylation occurs widely across the whole genome • It dominates the growth and development of various types of cells • Different cells exhibit different patterns of co-methylation • Our adaptive clustering algorithm can naturally capture the group-wise activities in these cells.

  29. Conclusion • Genome-wide epigenetic analysis: a promising direction for research and industry • The logic of DNA methylation can be learned and interpreted using our proposed ACS4 algorithm • Just 2 attributes are enough to separate the 3 cell types • No need for the 40 attributes selected by Fisher’s score in the Genome Research paper. • Wet-lab cost is significantly reduced by testing just 2 attributes instead of 40, which is more cost-effective. • More accurate thanks to adaptive partitioning of the attribute domain • The learned knowledge is human-understandable and helps biologists design wet-lab tests for further investigation • Adaptive clustering • Epigenetic events are highly active in cancer cells and hESCs. • Functionally related genes are co-methylated • Patterns of co-methylation differ considerably between cancer cells and hESCs, highlighting the versatile roles of epigenetic events in cell function.

  30. Thanks!
