
Of Sea Urchins, Birds and Men


kelli

Presentation Transcript


  1. Algorithmic Functions of Computational Biology – Course 1 Professor Istrail Of Sea Urchins, Birds and Men

  2. Darwin’s Finches and Coco

  3. The Father of All Dot Plots • The Human Genome

  4. The Synteny Problem • Between distant species: conservation reveals selective pressure and can reveal function • Between near species: conservation reveals evolutionary history • Between similar or the same species: recent events in subpopulations, phenotypic differences

  5. Matching, Chaining, Extension • Matching Phase • Chaining Phase • Extension Phase
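
The slide names the three phases without spelling them out. The following is a minimal sketch of one common way to realize them, not the course's own method: exact k-mer seeds for matching, a quadratic dynamic program for chaining, and gap-free extension. The function names, the parameter `k`, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Matching phase: every (i, j) with a[i:i+k] == b[j:j+k], sorted."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            seeds.append((i, j))
    return sorted(seeds)


def chain_seeds(seeds, k):
    """Chaining phase: a longest chain of collinear, non-overlapping seeds.

    Simple O(n^2) dynamic program over the sorted seed list.
    """
    n = len(seeds)
    if n == 0:
        return []
    best = [1] * n   # best[t] = length of the longest chain ending at seed t
    prev = [-1] * n  # back-pointer used to recover that chain
    for t in range(n):
        for s in range(t):
            fits = (seeds[s][0] + k <= seeds[t][0]
                    and seeds[s][1] + k <= seeds[t][1])
            if fits and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(seeds[end])
        end = prev[end]
    return chain[::-1]


def extend_seed(a, b, i, j, k):
    """Extension phase: grow a seed left and right while letters agree.

    Returns (start_in_a, start_in_b, length); no gaps or mismatches allowed.
    """
    start_a, start_b = i, j
    while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
        start_a -= 1
        start_b -= 1
    end_a, end_b = i + k, j + k
    while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
        end_a += 1
        end_b += 1
    return start_a, start_b, end_a - start_a


a, b = "ACGTTTGACC", "ACGTAAGACC"  # toy sequences, not from the slides
seeds = find_seeds(a, b, 3)
chain = chain_seeds(seeds, 3)
extended = [extend_seed(a, b, i, j, 3) for i, j in chain]
```

Real synteny pipelines typically use faster chaining and allow mismatches or gaps during extension; this sketch keeps each phase as small as possible.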

  6. Dot Plots 101 • a, b, c, d stand for letters; A, B, C, D stand for words • Where letters match, put a dot • Where words match, put a line (words can be reverse-complemented)

  7. Dot Plots 101 • When words line up • Reversed • Misplaced • Something gained (relative to horizontal) • Something lost (relative to horizontal)
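
The dot-plot rules above (a dot where letters match, a line where words match, with words allowed to be reverse-complemented) can be sketched as follows. This is an illustrative sketch under assumptions: the function names, the word length `k`, and the `+`/`-` strand labels are invented here and do not come from the slides.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[c] for c in reversed(seq))


def dot_plot(horizontal, vertical):
    """Letter level: matrix[row][col] is True where the two letters match."""
    return [[v == h for h in horizontal] for v in vertical]


def word_matches(horizontal, vertical, k):
    """Word level: (col, row, strand) for each matching k-letter word.

    strand is '+' for a forward match and '-' when the vertical word
    matches the horizontal sequence after reverse complementation.
    A word equal to its own reverse complement is reported on both strands.
    """
    positions = {}
    for c in range(len(horizontal) - k + 1):
        positions.setdefault(horizontal[c:c + k], []).append(c)
    hits = []
    for r in range(len(vertical) - k + 1):
        word = vertical[r:r + k]
        for c in positions.get(word, []):
            hits.append((c, r, "+"))
        for c in positions.get(reverse_complement(word), []):
            hits.append((c, r, "-"))
    return hits


def render(matrix):
    """Draw the letter-level plot with '*' for a dot and '.' for no match."""
    return "\n".join("".join("*" if cell else "." for cell in row)
                     for row in matrix)


hits = word_matches("ACGTT", "AACGT", 3)
print(render(dot_plot("ACGTT", "AACGT")))
```

Runs of `+` hits along a diagonal correspond to words that line up; runs of `-` hits along an anti-diagonal correspond to reversed segments.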

  8. Some large reversals in GP

  9. NCBI has more of the centromere than anyone else (or is that N’s?)

  10. Many reversals in GP; a piece of the end is re-ordered to the middle; Celera assemblies boringly good.

  11. Again everyone misses the first 10 Mb (or are those N’s?) of NCBI31

  12. Rube Goldberg’s Innovation • GENOMIC REGULATORY SYSTEMS • Mixed character of the problem: continuous mathematics, discrete mathematics

  13. Rube Goldberg’s Pencil Sharpener invention • Open window (A) and fly kite (B). String (C) lifts small door (D), allowing moths (E) to escape and eat red flannel shirt (F). As weight of shirt becomes less, shoe (G) steps on switch (H) which heats electric iron (I) and burns hole in pants (J). Smoke (K) enters hole in tree (L), smoking out opossum (M), which jumps into basket (N), pulling rope (O) and lifting cage (P), allowing woodpecker (Q) to chew wood from pencil (R), exposing lead. Emergency knife (S) is always handy in case opossum or the woodpecker gets sick and can't work.

  14. A Tale of Two Networks • Drosophila • Sea Urchin

  15. One gene, 30 years of study, 300 docs and postdocs • A Proposal for Nobel Prize • “Programs built into the DNA of every animal.” • Eric H. Davidson, Genomic Regulatory Systems

  16. The Dogma

  17. Genomic Regulatory Regions

  18. TF Binding Site Complexity

  19. Genome Complexity • 1 Billion DNA bases • 20,000 Genes

  20. cis-Regulatory Modules Complexity • 200,000 cis-Modules

  21. THE FIRST GENE • The DNA program that regulates the expression of endo16 in sea urchin

  22. THE FIRST NETWORK

  23. The View from the Genome

  24. The View from the Nucleus

  25. Building Protein-DNA Assemblies • DNA • cismodule • Cooperativity • Linear-amp • Gates • Potentiality • Inter-cismodule linkage • Insulation • Communication

  26. The Building Blocks • Free Energy: free energy is the “GLUE” • Protein • DNA • Protein-DNA Binding (free energy)

  27. Information Processing

  28. Boolean Circuit • Synchronous input and output • Completely defined gates • [Figure: circuit with signal values 0 1 1 0 0 1 0 0 0]

  29. Boolean Circuit: synchronous input and output, completely defined gates • Boolinear Circuit: asynchronous input and output, incompletely defined gates • [Figure: circuits with signal values 1.4 0.5 0 1 1 0 0 1 0 0 0 1.1]

  30. 1 1 0 1 • AND OR NOT OR • 1 • IF (x1 = 1 AND x2 = 1) THEN … • … GTAGGATTAAG … • … CATCCTAATTC … • … GTATCTAGAAG …
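
The Boolean-circuit view on the preceding slides (synchronous inputs and outputs, completely defined AND/OR/NOT gates) can be illustrated with a small evaluator. This is only a sketch: the wiring below and the names `x1`, `x2`, `x3`, `g1`, `g2`, `out` are hypothetical and are not the circuit drawn on the slide.

```python
# Completely defined gates: each maps 0/1 inputs to a 0/1 output.
GATES = {
    "AND": lambda *xs: int(all(xs)),
    "OR": lambda *xs: int(any(xs)),
    "NOT": lambda x: int(not x),
}


def evaluate(circuit, inputs):
    """Evaluate gates in the order given and return every signal value.

    circuit: list of (output_name, gate_name, [operand_names]), listed so
    that each operand is defined before it is used.
    inputs: dict mapping input names to 0 or 1.
    """
    values = dict(inputs)
    for name, gate, operands in circuit:
        values[name] = GATES[gate](*(values[o] for o in operands))
    return values


# Hypothetical wiring: out = (x1 AND x2) OR (NOT x3)
circuit = [
    ("g1", "AND", ["x1", "x2"]),
    ("g2", "NOT", ["x3"]),
    ("out", "OR", ["g1", "g2"]),
]

result = evaluate(circuit, {"x1": 1, "x2": 1, "x3": 1})
```

In the regulatory reading suggested by the slide's `IF (x1 = 1 AND x2 = 1) THEN …` rule, each input would stand for whether a particular site is occupied; that mapping is an interpretation, not something the transcript states.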

  31. Web page: http://www.its.caltech.edu/~chyuh/cathy-mirsky-info.html • Caltech, Davidson Lab • October 2004

  32. Introduction: SNPs, HAPLOTYPES

  33. Single Nucleotide Polymorphism (SNP) • GATTTAGATCGCGATAGAG • GATTTAGATCTCGATAGAG • The most abundant type of polymorphism • A SNP is a position in a genome at which two or more different bases occur in the population, each with a frequency >1%. • The two alleles at the site are G and T
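
As a small illustration of the SNP definition, the sketch below compares the slide's two sequences position by position and applies the >1% frequency rule to a column of alleles. The function names and the `min_freq` parameter are assumptions made for this example.

```python
from collections import Counter


def find_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) wherever two aligned sequences differ.

    Positions are 0-based; the sequences must have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def is_snp(column, min_freq=0.01):
    """Apply the slide's definition to one site across a population sample.

    column: one base per sampled chromosome at this site.
    True when at least two bases each occur with frequency above min_freq.
    """
    counts = Counter(column)
    total = len(column)
    common = [base for base, n in counts.items() if n / total > min_freq]
    return len(common) >= 2


diffs = find_differences("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG")
print(diffs)  # the single differing site, with alleles G and T
```

Two sequences alone only show that a site differs; whether it counts as a SNP under the >1% rule depends on allele frequencies in a larger sample, which is what `is_snp` checks.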

  34. [Figure: a long stretch of human genomic DNA sequence shown as background] • Human Genome contains ~3 G basepairs arranged in 46 chromosomes. • Two individuals are 99.9% the same, i.e., they differ in ~3 M basepairs. • SNPs occur once every ~600 bp • Average gene in the human genome spans ~27 Kb • ~50 SNPs per gene
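
The figures on this slide follow from simple arithmetic, worked through below using only the slide's own round numbers. The variable names are chosen here for illustration.

```python
genome_bp = 3_000_000_000   # ~3 G basepairs
identity = 0.999            # two individuals are 99.9% the same
snp_spacing_bp = 600        # one SNP every ~600 bp
gene_span_bp = 27_000       # average gene spans ~27 Kb

# 0.1% of 3 G basepairs differ between two individuals.
differing_bp = round(genome_bp * (1 - identity))

# One SNP per ~600 bp across an average ~27 Kb gene.
snps_per_gene = gene_span_bp / snp_spacing_bp

print(differing_bp)    # about 3 million
print(snps_per_gene)   # 45.0, which the slide rounds to ~50
```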

  35. Haplotype • Two individuals: G C T C G A C A A C A G and G T T C G T C A A C A G • Three sites marked SNP • Haplotypes: C A G and T T G

  36. Mutations • Infinite Sites Assumption: each site mutates at most once

  37. Haplotype Pattern • C A G T T T G A C A T G C T G T • 0 0 0 0 1 1 0 1 0 0 1 0 0 1 0 1 • At each SNP site, label the two alleles as 0 and 1. The choice of which allele is 0 and which is 1 is arbitrary.
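
A minimal sketch of the 0/1 labelling follows. Because the choice of label is arbitrary, the sketch assumes one convention (the first allele seen at a site becomes 0); that convention, the function name, and the third haplotype `CTG` are assumptions made for this example, while `CAG` and `TTG` are the haplotypes from slide 35.

```python
def encode_haplotypes(haplotypes):
    """Relabel the alleles at every SNP site as 0 and 1.

    haplotypes: equal-length strings, one letter per SNP site.
    Returns (binary_rows, allele_maps), where allele_maps[s] records
    which letter became 0 and which became 1 at site s.
    Assumed convention: the first allele encountered at a site is 0.
    """
    n_sites = len(haplotypes[0])
    allele_maps = []
    for s in range(n_sites):
        seen = []
        for h in haplotypes:
            if h[s] not in seen:
                seen.append(h[s])
        if len(seen) > 2:
            raise ValueError(f"site {s} has more than two alleles")
        allele_maps.append({allele: label for label, allele in enumerate(seen)})
    rows = ["".join(str(allele_maps[s][h[s]]) for s in range(n_sites))
            for h in haplotypes]
    return rows, allele_maps


rows, maps = encode_haplotypes(["CAG", "TTG", "CTG"])
```

Swapping 0 and 1 at any site gives an equally valid encoding, so two binary matrices that differ only by such swaps describe the same data.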

  38. Recombination • Parental chromosomes: G T T C G A C A A C A T and A C G T A T C T A T T A • Recombinant: G T T C G A C T A T T A

  39. Recombination • The two alleles are linked, i.e., they are “traveling together” • Parental chromosomes: G T T C G A C A A C A T and A C G T A T C T A T T A • Recombination disrupts the linkage • Recombinant: G T T C G A C T A T T A
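
The recombinant on this slide is the first half of one parental chromosome joined to the second half of the other. A single-crossover sketch reproduces it; the function name and the `breakpoint` parameter are assumptions made for illustration.

```python
def crossover(parent_a, parent_b, breakpoint):
    """Single crossover: parent_a up to the breakpoint, parent_b after it."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be the same length")
    if not 0 <= breakpoint <= len(parent_a):
        raise ValueError("breakpoint out of range")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


parent_a = "GTTCGACAACAT"
parent_b = "ACGTATCTATTA"
recombinant = crossover(parent_a, parent_b, 6)
print(recombinant)
```

Alleles on the same side of the breakpoint stay together, while alleles on opposite sides end up in a combination that neither parent carried. That is the sense in which recombination disrupts linkage.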

  40. Emergence of Variations Over Time • Variations in Chromosomes Within a Population • [Figure labels: Common Ancestor, Disease Mutation, present time, Linkage Disequilibrium (LD)]

  41. Extent of Linkage Disequilibrium • Disease-Causing Mutation • [Figure panels: 2,000 generations ago, 1,000 generations ago, Time = present]

  42. A Data Compression Problem • Select SNPs to use in an association study • Would like to associate single nucleotide polymorphisms (SNPs) with disease. • Very large number of candidate SNPs • Chromosome-wide studies, whole-genome scans • For cost effectiveness, select only a subset. • Closely spaced SNPs are highly correlated • It is less likely that there has been a recombination between two SNPs if they are close to each other.
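
The claim that closely spaced SNPs are highly correlated is usually quantified with a pairwise linkage disequilibrium measure. The sketch below computes the standard r² statistic between two sites from 0/1-encoded haplotypes. The slides do not say which measure the course uses, so treat r² as an assumption, along with the function name and the toy data.

```python
def r_squared(haplotypes, site_i, site_j):
    """Squared correlation (r^2) between the alleles at two SNP sites.

    haplotypes: equal-length strings over '0' and '1', one per chromosome.
    """
    n = len(haplotypes)
    p_i = sum(h[site_i] == "1" for h in haplotypes) / n
    p_j = sum(h[site_j] == "1" for h in haplotypes) / n
    p_ij = sum(h[site_i] == "1" and h[site_j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Toy sample: sites 0 and 1 always agree, site 2 is independent of site 0.
sample = ["000", "001", "110", "111"]
print(r_squared(sample, 0, 1))  # perfectly correlated pair
print(r_squared(sample, 0, 2))  # uncorrelated pair
```

When r² between two sites is close to 1, genotyping one of them says almost everything about the other, which is what makes selecting only a subset of SNPs cost-effective.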

  43. Disease Associations

  44. Association studies • Control (Non-responder), Disease (Responder) • Marker A: Allele 0 = , Allele 1 = • Marker A is associated with Phenotype

  45. Association studies • Evaluate whether nucleotide polymorphisms associate with phenotype • T T C T C T • A G G G G A • G A A A G G • A C A A A A • T T G T G G

  46. Association studies • T T T C C T • G G A G A G • G A G G A A • A A C A A A • G T T T G G

  47. Association studies • 1 1 1 0 0 1 • 0 0 1 0 1 0 • 1 0 1 1 0 0 • 0 0 1 0 0 0 • 0 1 1 1 0 0
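
One standard way to evaluate whether a marker associates with phenotype is a chi-square test on the 2×2 table of allele counts in cases versus controls. The slides do not name a test, so the sketch below is an assumption, and the allele and phenotype vectors are invented toy data rather than values read off the slides.

```python
def allele_table(alleles, is_case):
    """Count alleles by group: returns (a, b, c, d) where
    a = cases with allele 1, b = cases with allele 0,
    c = controls with allele 1, d = controls with allele 0.
    """
    a = sum(1 for x, case in zip(alleles, is_case) if case and x == 1)
    b = sum(1 for x, case in zip(alleles, is_case) if case and x == 0)
    c = sum(1 for x, case in zip(alleles, is_case) if not case and x == 1)
    d = sum(1 for x, case in zip(alleles, is_case) if not case and x == 0)
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom


# Toy data: allele at one marker for 8 chromosomes, first 4 from cases.
alleles = [1, 1, 1, 0, 0, 1, 0, 0]
is_case = [True, True, True, True, False, False, False, False]

table = allele_table(alleles, is_case)
statistic = chi_square_2x2(*table)
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05. Genome-wide scans test many markers, so a much stricter threshold is needed in practice.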

  48. Data Compression • Selecting Tagging SNPs in blocks • Haplotype Blocks based on LD (Method of Gabriel et al. 2002) • Haplotypes: ACGATCGATCATGAT, GGTGATTGCATCGAT, ACGATCGGGCTTCCG, ACGATCGGCATCCCG, GGTGATTATCATGAT • Tagging SNPs: A------A---TG--, G------G---CG--, A------G---TC--, A------G---CC--, G------A---TG--
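
In the slide's example, the four unmasked columns (0-based positions 0, 7, 11 and 12) are enough to tell the five haplotypes apart. The sketch below checks that and also runs a simple greedy selection. The greedy heuristic is a stand-in chosen for brevity, not the block-based method of Gabriel et al. or the selection algorithm of Zhang et al., and the function names are assumptions.

```python
from itertools import combinations

HAPLOTYPES = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]


def distinguishes_all(haplotypes, sites):
    """True when the chosen sites alone give every distinct haplotype
    its own pattern."""
    patterns = {tuple(h[s] for s in sites) for h in haplotypes}
    return len(patterns) == len(set(haplotypes))


def greedy_tag_snps(haplotypes):
    """Greedy heuristic: repeatedly pick the site that separates the most
    haplotype pairs not yet told apart. Returns sorted 0-based sites."""
    n_sites = len(haplotypes[0])
    unresolved = set(combinations(sorted(set(haplotypes)), 2))
    chosen = []
    while unresolved:
        best = max(range(n_sites),
                   key=lambda s: sum(x[s] != y[s] for x, y in unresolved))
        separated = {(x, y) for x, y in unresolved if x[best] != y[best]}
        chosen.append(best)
        unresolved -= separated
    return sorted(chosen)


slide_tags = [0, 7, 11, 12]  # columns left unmasked on the slide
tags = greedy_tag_snps(HAPLOTYPES)
print(distinguishes_all(HAPLOTYPES, slide_tags))
print(tags, distinguishes_all(HAPLOTYPES, tags))
```

The greedy choice may differ from the slide's set and may use fewer sites, since it only aims to distinguish the haplotypes in this small sample and ignores block structure.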

  49. Real Haplotype Data • A region of Chr. 22 • 45 Caucasian samples • Our block-free algorithm • Two different runs of the Gabriel et al. Block Detection method + Zhang et al. SNP selection algorithm
