Integrating Genomes D. R. Zerbino, B. Paten, D. Haussler Science 336, 179 (2012)

Integrating Genomes D. R. Zerbino, B. Paten, D. Haussler Science 336, 179 (2012). Teacher: Professor Chao, Kun-Mao Speaker: Ho, Bin-Shenq June 4, 2012. Outline. Overview Obtaining Genomic Sequences Modeling Evolution of Genotype From Genotype to Phenotype Looking Ahead to Applications

Presentation Transcript


  1. Integrating Genomes. D. R. Zerbino, B. Paten, D. Haussler. Science 336, 179 (2012). Teacher: Professor Chao, Kun-Mao. Speaker: Ho, Bin-Shenq. June 4, 2012

  2. Outline • Overview • Obtaining Genomic Sequences • Modeling Evolution of Genotype • From Genotype to Phenotype • Looking Ahead to Applications • Conclusion

  3. Overview • Specialization in computational genomics • Integration of genetic, molecular, and phenotypic information • Impact on diverse fields of science • New window into the story of life • Population genetics, phylogenetics, human disease genetics + graph theory, signal processing, statistics, computer science

  4. Milestones • First genome sequences, 1970s: bacteriophage MS2 RNA, 3,569 nucleotides long (1976) • Computational genomics, 1980: Smith and Waterman; Stormo et al. • 16-fold improvement in computational power under Moore's law • 10,000-fold improvement in sequencing performance in the past 8 years
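The two growth figures on this slide can be compared with a line of arithmetic. The sketch below assumes a two-year doubling period for Moore's law and assumes the 16-fold figure covers the same 8-year window as the sequencing figure; `moore_fold` is a hypothetical helper name, not something taken from the paper.

```python
def moore_fold(years: float, doubling_period_years: float = 2.0) -> float:
    """Fold improvement after `years`, assuming one doubling per period (assumed: 2 years)."""
    return 2.0 ** (years / doubling_period_years)


compute_fold = moore_fold(8)           # 2 ** 4 = 16.0
sequencing_fold = 10_000               # figure quoted on the slide
gap = sequencing_fold / compute_fold   # how far sequencing outpaced computing

print(f"compute: {compute_fold:.0f}x, sequencing: {sequencing_fold}x, gap: {gap:.0f}x")
```

Under these assumptions, sequencing throughput grew about 625 times faster than computing power over the window, which is one way to read the slide's concern about analysis keeping up with data.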

  5. Computational Genomics • Organismal phenotype: gene product acting in cellular pathways affecting organisms (function) • Evolution • Genomic data: DNA sequence evolving in time (history) • Molecular phenotype: chromatin piece interacting with other molecules (mechanism)

  6. Obtaining Genomic Sequences: genome assembly, given sufficient read redundancy • Large redundant regions (repeats) → complex networks of read-to-read overlaps, not all of which reflect actual overlaps → need to determine which overlaps are legitimate and which are spurious → NP-hard problem → undetermined, error-prone, costly-to-finish regions • Newer sequencing technologies with longer reads
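How a repeat produces spurious read-to-read overlaps can be seen in a toy overlap graph. This is a minimal sketch, not the method of any particular assembler: the genome, the reads, the `min_len` threshold, and the function names are all made up for illustration. The 7-base repeat `GATTACA` occurs twice, so each read that enters a repeat copy overlaps equally well with both reads that leave one.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_graph(reads: list[str], min_len: int) -> dict[str, list[tuple[str, int]]]:
    """Directed graph: read -> list of (next_read, overlap_length)."""
    graph = {r: [] for r in reads}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n:
            graph[a].append((b, n))
    return graph


# Hypothetical genome ACGATTACATGGATTACACC, with GATTACA at two positions.
reads = [
    "ACGATTACA",  # enters the first repeat copy
    "GATTACATG",  # leaves the first repeat copy
    "TGGATTACA",  # enters the second repeat copy
    "GATTACACC",  # leaves the second repeat copy
]
graph = overlap_graph(reads, min_len=5)
for read, edges in graph.items():
    print(read, "->", edges)
```

Both "entering" reads get two outgoing edges of equal length, and only one edge of each pair reflects the true genome. A read long enough to span the whole repeat plus flanking sequence would remove the ambiguity, which is the appeal of the longer-read technologies mentioned on the slide.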

  7. Obtaining Genomic Sequences: reference-based assembly • Tends to bias results toward the reference genome • Newer sequencing technologies with longer reads

  8. Modeling Evolution of Genotype • Diversity of Genomes • Alignment • Phylogenetic analysis

  9. Diversity of Genomes: every genome is the result of a 3.8-billion-year evolutionary journey from the origin of life; genomes are mostly shared and partly unique • Single-base change: substitution, SNP • Indel: insertion, deletion • Tandem duplication • Recombination • Transposition • Rearrangement: inversion, segmental deletion, segmental duplication, fusion, fission, translocation • Whole-genome duplication

  10. Diversity of Genomes • Germline selections → Evolution • Somatic selections → Cancer / Immunity

  11. Assembly and Alignment Fig. 1. Assembly and alignment.

  12. Alignment • Alignment assumes derivation from a suitably recent common ancestor • Shows what was conserved and what changed during evolution from the common ancestor • Substitutions, indels, segment order, copy number • Local alignment for conserved functional regions of more distantly related genomes • Global (genome) alignment for genomes from closely related species
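Local alignment of the kind introduced by Smith and Waterman can be sketched in a few lines. The version below computes only the best local score, uses a linear gap penalty, and keeps two rows of the dynamic-programming table; the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative assumptions, not values from the paper or the slides.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# A short conserved core (ACG) inside otherwise unrelated flanks.
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

Because scores never drop below zero, the unrelated flanks do not penalize the conserved core. That is why local alignment suits distantly related genomes, where only functional regions remain similar.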

  13. Phylogenetic Analysis • A single tree provides an explicit order of gene descent through shared ancestry • Finding the optimal phylogeny under probabilistic or parsimony models of substitutions and indels is NP-hard • Complicated further by homologous recombination • Goal: a tractable unified theory of genome evolution, with stochastic processes jointly describing the diversification events of genomes
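The hard part of parsimony is the search over tree shapes; scoring one fixed tree is cheap. The sketch below uses Fitch's small-parsimony algorithm for a single site on a rooted binary tree, with the tree written as nested tuples and leaves as single characters. The tree encoding and the function name are assumptions made for this example.

```python
def fitch(tree):
    """Return (candidate_states, substitutions) for one site on a nested-tuple binary tree."""
    if isinstance(tree, str):          # leaf: observed base, no substitutions yet
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:                         # children agree on at least one state: no new change
        return common, left_cost + right_cost
    # Children disagree: take the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


# Same four leaves (A, A, C, C) grouped under two different topologies.
print(fitch((("A", "A"), ("C", "C")))[1])
print(fitch((("A", "C"), ("A", "C")))[1])
```

The first topology explains the data with one substitution and the second needs two. Finding the best tree means comparing such scores across a number of topologies that grows super-exponentially with the number of leaves, which is where the NP-hardness noted on the slide comes from.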

  14. From Genotype to Phenotype Fig. 2. The dynamic processes that affect and are affected by the genome.

  15. Genomes, Mechanisms, Functions • Active molecules of the cell, including proteins, messenger RNAs, and other functional RNAs • Epigenetic mechanisms regulating RNA and protein production and function • Gene regulatory networks • Protein signaling cascades • Metabolic pathways • Regulatory network motifs

  16. From Genotype to Phenotype • Exploring unfolding history and diversity of life • Deriving experimental data from an expansion of cell culture resources for diverse species / tissues and newer single-cell assay methodologies • Correlating specific segregating variants with phenotypic traits or diseases • Identifying causal variants by complete genome analysis in related as well as unrelated cases and controls and in combination with better prediction of possible effects of genome variants

  17. From Genotype to Phenotype • Constructing models of molecular phenotypes involving epigenetic state, RNA expression, and (inferred) protein levels through hidden Markov models, factor graphs, Bayesian networks, and Markov random fields • Incorporating biological knowledge into classification and regression methods (e.g., general linear models, neural networks, and support vector machines)

  18. Looking Ahead to Applications • Genome data growing collectively from petabytes (10^15 bytes) today to exabytes (10^18 bytes) tomorrow • Cancer diagnosis and treatment • Immunology • Stem cell therapy • Agriculture • Human prehistory study

  19. Conclusion • Challenge: obtaining maximum information from every sequencing experiment • Borrowing and tying together advances from a spectrum of research fields into foundational mathematical models • Trading off model comprehensiveness against computational efficiency • To be shaped by increasing knowledge of biology

  20. Thank You For Your Attention
