What is Bioinformatics?

Presentation Transcript


  1. What is Bioinformatics? The Data; The Analysis; Comparison; Evolution; Long Distance: Comparative Genomics; Short Distance: Variation Analysis; Homology; Non-homology; Physical/Chemical/Statistical; Mathematical Modelling.

  2. The Data & its Growth. • 1976/77: the first viral genomes, MS2 and φX174 • 1995: the first prokaryotic genome, H. influenzae • 1996: the first unicellular eukaryotic genome, yeast • 1998: the first multicellular eukaryotic genome, C. elegans • 2001: the human genome (3 Gb) • Known as of 1.5.03: >1000 viral genomes, 96 prokaryotic genomes, 16 archaebacterial genomes; a series of multicellular genomes is on the way. • There is also a general increase in data on higher-order structures and on the dynamics of biological systems.

  3. Genomes & Tree of Life • 3.5-3.8 Gyr: Origin of Life • 3+ Gyr: LUCA (last universal common ancestor) • ~1.4 Gyr: Origin of Eukaryotes • 500-600 Myr: Origin of Vertebrates • 200+ Myr: Origin of Mammals • 80-100 Myr: Mouse-Mammalian Split • 5-7 Myr: Chimp-Human Split • 100 Kyr to Myr: Age of Polymorphisms. From Janssen, 2003

  4. Comparison of Evolutionary Objects. Figure: examples of objects that can be compared: sequences (ACTGT vs ACTCCT), RNA (secondary) structure, gene order/orientation (cabbage vs turnip), protein structure (renin vs HIV proteinase), gene structure, and interaction networks (any graph). General theme: a formal model of structure combined with a stochastic model of structure evolution.
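
Of the objects listed above, gene order is the simplest to compare directly. A minimal sketch, assuming unsigned gene orders on a linear chromosome that contain the same genes: it counts breakpoints, i.e. neighbouring genes in one order that are not neighbours in the other. Breakpoint counting is a standard measure swapped in here for illustration, not the method behind the slide, and the gene orders in the usage lines are hypothetical.

```python
def breakpoints(order_a, order_b):
    """Count breakpoints of order_a relative to order_b (unsigned, linear)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both gene orders must contain the same genes")
    # Relabel every gene by its position in order_b, so order_b reads 1..n.
    rank = {gene: i + 1 for i, gene in enumerate(order_b)}
    # Framing genes 0 and n+1 mark the two ends of the chromosome.
    seq = [0] + [rank[g] for g in order_a] + [len(order_a) + 1]
    # A breakpoint is a pair of neighbours that are not neighbours in order_b.
    return sum(1 for x, y in zip(seq, seq[1:]) if abs(x - y) != 1)


print(breakpoints([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 0
print(breakpoints([1, 4, 3, 2, 5], [1, 2, 3, 4, 5]))  # 2: one inversion of 2..4
```

Because orientation is ignored, an inversion shows up only through its two endpoints; a signed version would also track the strand of each gene.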

  5. The Phylogeny for Evolutionary Objects. Figure: three observable sequences (ATTGCGTATATAT….CAG) at the leaves of a tree whose root is the MRCA (most recent common ancestor), with the direction of time marked; the internal states and the evolutionary paths between them are unobservable. Parameters: time, rates, selection. 3 problems: i. Test all possible relationships. ii. Examine unknown internal states. iii. Explore unknown paths between states at nodes.
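
Problem i is hard because the number of possible relationships grows very fast. A minimal sketch, assuming unrooted, bifurcating trees with labelled leaves: the count is (2n - 5)!!, because each new leaf can be attached to any branch of the smaller tree. The function name is ours.

```python
def num_unrooted_trees(n):
    """Number of unrooted, bifurcating trees with n labelled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Adding leaf k (k = 4..n) to a tree with k - 1 leaves: it can be attached
    # to any of that tree's 2k - 5 branches.
    for branches in range(3, 2 * n - 4, 2):
        count *= branches
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Already for 20 taxa the count is roughly 2.2 × 10^20, which is why exhaustive testing of all relationships quickly becomes impossible.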

  6. Gene and Genome Evolution. Figure: sequences (e.g. TGTGTATA, TGCGTATC) compared across chimp, mouse, E. coli, higher cells and fish. • Basic events: substitutions; insertions/deletions. • Chromosome-level events: inversions, duplications, transpositions, ... • Average number of mitoses per male generation: (15:35 .. 20:150); per female generation: ~24. • Rates: single nucleotide substitutions ~10^-7; microsatellites (~100,000): ~10^-2; small insertions/deletions ~10^-8.

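  7. Principles of String Comparison: Alignment. Figure: ACTGT compared with ACTCCT through two alignments, ACTG-T over ACTCCT and ACT-GT over ACTCCT, which pass through the intermediate sequences ACTGCT and ACTCGT respectively; each is labelled .41. Cost: 2. Probability: e^-16.47.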
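
The cost of 2 above is what a unit-cost alignment gives: one substitution plus one insertion. A minimal sketch of Needleman-Wunsch-style dynamic programming, assuming unit costs for mismatches and gaps (the slide does not state its cost scheme); the function name align and its parameters are ours. It returns one optimal alignment, not all of them, and it does not compute the probability e^-16.47, which comes from a stochastic model.

```python
def align(x, y, mismatch=1, gap=1):
    """Minimum-cost global alignment of x and y with linear costs.

    Returns (cost, aligned_x, aligned_y); '-' marks a gap."""
    n, m = len(x), len(y)
    # cost[i][j] = cheapest alignment of the prefixes x[:i] and y[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap

    def sub(i, j):
        return 0 if x[i - 1] == y[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i, j),  # match/mismatch
                             cost[i - 1][j] + gap,            # gap in y
                             cost[i][j - 1] + gap)            # gap in x
    # Trace back one optimal alignment from the bottom-right cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))


cost, top, bottom = align("ACTGT", "ACTCCT")
print(cost)  # 2
print(top)
print(bottom)
```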

  8. Maximum Likelihood Phylogeny and Alignment (Gerton Lunter, Istvan Miklos, Alexei Drummond, Yun Song). Data: human alpha hemoglobin, human beta hemoglobin, human myoglobin, bean leghemoglobin. Probability of data: e^-1560.138. Probability of data and alignment: e^-1593.223. Probability of alignment given data: 4.279 × 10^-15 = e^-33.085. Ratio of insertions/deletions to substitutions: 0.0334.
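
The three probabilities on this slide are tied together by the definition of conditional probability. Writing D for the data and A for the alignment (our notation, not the slide's), the quoted figures check out:

```latex
P(A \mid D) = \frac{P(D, A)}{P(D)}
            = \frac{e^{-1593.223}}{e^{-1560.138}}
            = e^{-1593.223 + 1560.138}
            = e^{-33.085}
            \approx 4.279 \times 10^{-15}
```

So the alignment considered carries only a tiny share of P(D), the probability of the data with the alignment summed out.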

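  9. Rooting Using Irreversibility (Lunter). Figure: the likelihood of a rooted tree written as a product of factors, P( ) = P( ) * P( ) * P( ), with illustrations of reversibility and of the pulley principle (under a reversible model, moving the root does not change the likelihood). Contagious dependence: CG avoidance creates irreversibility, which makes the root position detectable. Lunter and Hein, ISMB 2004.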
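
The pulley principle can be checked numerically. A minimal sketch, assuming the Jukes-Cantor (JC69) model, chosen only because it is reversible (the slide names no model), and a two-leaf tree: wherever the root is placed along the path between the leaves, the likelihood depends only on the total branch length. The function names are ours.

```python
import math

BASES = "ACGT"


def jc69(i, j, t):
    """Jukes-Cantor probability of base i becoming base j after branch
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def two_leaf_likelihood(a, b, t1, t2):
    """Likelihood of leaves a and b when the root sits t1 above a and t2 above b.
    The unobserved root state is summed out, weighted by its stationary
    frequency 1/4."""
    return sum(0.25 * jc69(r, a, t1) * jc69(r, b, t2) for r in BASES)


# The same value (up to rounding) for every split of total length 0.5.
for t1, t2 in [(0.0, 0.5), (0.1, 0.4), (0.25, 0.25)]:
    print(t1, t2, two_leaf_likelihood("A", "G", t1, t2))
```

A context-dependent process such as CG avoidance breaks this reversibility, so the likelihood then changes with the root position, which is what makes rooting possible.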

  10. Comparison of Evolutionary Objects. Figure: an RNA sequence (C C A A G C A U U) with observable and unobservable components. Goldman, Thorne & Jones (96); Knudsen & Hein (99); Eddy & co.; Meyer and Durbin (02); Pedersen & Hein (03); Siepel & Haussler (03).

  11. The Rise of Comparative Genomics. Lander et al. (2001), Figure 25A

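  12. Recursive Definition of Strings. Figure: a gene (Exon 1, Exon 2, Exon 3) and an RNA molecule whose positions are single-stranded (s) or double-stranded (d), each generated by a grammar. Gene grammar: S -> E I; E -> eE | eI | e; I -> iE | iI | e. RNA grammar: S -> sS | Ss | dSd | SS | ε. Example RNA derivation: S -> sS -> ssS -> ssdSd -> ssddSdd -> ...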

  13. Stochastic Grammars. S -> (0.3) aSa | (0.5) bSb | (0.1) aa | (0.1) bb. Example: S -> aSa -> abSba -> abaaba, with probability 0.3 × 0.5 × 0.1 = 0.015. If a string has exactly one derivation (creation), its probability is the product of the probabilities of the rules applied.
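
Because every string in this grammar has at most one derivation, its probability can be computed by peeling matching symbols off both ends. A minimal sketch using the rule probabilities of this slide; the dictionary names WRAP and FINAL and the function name are ours.

```python
# Rule probabilities from the slide; the dictionary layout is ours.
WRAP = {"a": 0.3, "b": 0.5}     # S -> aSa (0.3), S -> bSb (0.5)
FINAL = {"aa": 0.1, "bb": 0.1}  # S -> aa (0.1), S -> bb (0.1)


def string_probability(s):
    """Probability that the grammar generates s from S.

    Each string has at most one derivation, so its probability is the
    product of the probabilities of the rules applied."""
    p = 1.0
    while len(s) > 2:
        # Only S -> aSa or S -> bSb can produce the outermost pair of symbols.
        if s[0] != s[-1] or s[0] not in WRAP:
            return 0.0
        p *= WRAP[s[0]]
        s = s[1:-1]
    # The derivation must end with S -> aa or S -> bb.
    return p * FINAL.get(s, 0.0)


print(string_probability("abaaba"))  # about 0.015 (0.3 * 0.5 * 0.1)
print(string_probability("abab"))    # 0.0: not derivable
```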

  14. Grammars: Finite Set of Rules for Generating Strings. • A starting symbol. • A set of substitution rules applied to variables in the present string. • The derivation is finished when no variables remain. Classes: Regular, Context Free, Context Sensitive, General (also erasing).
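
The recipe above (a start symbol, rules rewriting variables in the current string, stop when no variables remain) fits in a few lines. A minimal sketch, assuming a hypothetical RNA-style grammar S -> sS | Ss | dSd | SS | ε (our reading of the RNA grammar on slide 12), always rewriting the leftmost variable; the erasing rule S -> ε makes this a grammar "with erasing". The names RULES, derive and is_finished are ours.

```python
# Hypothetical RNA-style grammar: s = single-stranded, d = paired position.
# The empty string "" is the erasing rule S -> epsilon.
RULES = {"S": ["sS", "Ss", "dSd", "SS", ""]}


def is_finished(string):
    """A derivation is finished when the string contains no variables."""
    return not any(symbol in RULES for symbol in string)


def derive(start, choices):
    """Rewrite the leftmost variable at each step; choices[k] is the index of
    the rule used at step k. Returns every intermediate string."""
    steps = [start]
    current = start
    for choice in choices:
        positions = [i for i, symbol in enumerate(current) if symbol in RULES]
        if not positions:
            raise ValueError("no variables left to rewrite")
        i = positions[0]
        current = current[:i] + RULES[current[i]][choice] + current[i + 1:]
        steps.append(current)
    return steps


steps = derive("S", [0, 0, 2, 2, 4])
print(" -> ".join(steps))      # S -> sS -> ssS -> ssdSd -> ssddSdd -> ssdddd
print(is_finished(steps[-1]))  # True
```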

  15. Structure Dependent Evolution: RNA. Figure (from Bjarne Knudsen): the RNA sequence U A C A C C G U, positions numbered 1-8, shown several times together with its secondary structure.

  16. RNA Structure Application. From Knudsen & Hein (1999); Knudsen & Hein (2003).

  17. Observing Evolution Has 2 Parts: P(x), the probability of observing x, and P(further history of x). Figure: the RNA sequence C C A A G C A U U.

  18. Inter- and Intra-species Comparisons at Shorter Time Scales • For sequences sampled within a population, their relationship is determined by population structure; there is no analogue of this for interspecies sequences. • Is within-species variation a short time slice of long-term variation? • Where do the species and population perspectives meet?

  19. Short Time Evolution: Population Genetics and History. Figure: an ancestral recombination graph for sequences 1 and 2 in a population of size N, traced back in time. Interpretation of variation. Three large areas of application: human population history, gene mapping, pathogen evolution. Cardon, Donnelly, Griffiths, McVean, Wiuf, Song, Schierup.

  20. Time Slices. Figure: sequences 1 and 2 in a population of size N, traced back in time. Two slices are marked: one where all positions have found a common ancestor on one sequence, and one where all positions have found common ancestors.
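
How far back the common ancestors lie can be simulated with the coalescent. A minimal sketch, assuming the standard coalescent without recombination (so all positions share one genealogy, unlike the ancestral recombination graph on these slides): with k lineages the waiting time to the next merger is exponential with rate k(k-1)/2 in units of 2N generations, which gives an expected time to the MRCA of 2(1 - 1/n). The function names and the seed are ours.

```python
import random


def simulate_tmrca(n, rng):
    """Time to the MRCA of n sampled sequences under the standard coalescent
    (no recombination), in units of 2N generations."""
    t = 0.0
    for k in range(n, 1, -1):
        # With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1.
        t += rng.expovariate(k * (k - 1) / 2)
    return t


def expected_tmrca(n):
    """Sum over k = 2..n of 2/(k(k-1)), which telescopes to 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(1)
n, reps = 10, 20000
mean = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
print(round(mean, 2), expected_tmrca(n))  # the simulated mean should be close to 1.8
```

As n grows the expectation approaches 2 units, i.e. 4N generations.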

  21. Applications to Human Genome (Chr 1) (Wiuf and Hein, 97). Figure: chromosome 1 (260 Mb) zoomed ×35 to a 7.5 Mb region and ×250 to a 30 kb region, with time measured up to 4Ne. Figures quoted: 20,000 segments; 52,000 ancestors; 6,800. A randomly picked ancestor: (ancestral material comes in batteries!)

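  22. The Origin of Variation. Figure: sequences differing at C/T and G/A positions, traced back in time through a population of size N, showing where present-day variation arose. The International SNP Map Working Group (2001): A map of human genome sequence variation containing 1.42 million SNPs. Nature 409: 928-33.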
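
Linking variation back to mutation is often done through the number of segregating sites. A minimal sketch of Watterson's estimator θ_W = S / a_n, a standard population-genetics result brought in here for illustration (it is not on the slide), assuming the infinite-sites model (every mutation hits a new site) and aligned sequences of equal length; θ is 4Nμ for the region. The sample sequences are made up.

```python
def segregating_sites(seqs):
    """Number of alignment columns showing more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: theta_W = S / a_n, a_n = 1 + 1/2 + ... + 1/(n-1)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


# Hypothetical aligned sample of n = 4 sequences.
sample = ["ACGTAC", "ACGTTC", "ATGTAC", "ACGTAC"]
print(segregating_sites(sample))          # 2
print(round(watterson_theta(sample), 4))  # 1.0909 (= 2 / (11/6))
```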

  23. Slice in Space. Figure: a population of size N traced back in time.

  24. Minimal ARGs and Haplotype Blocks (Song) a: (3,4) b: (3,4) c: (15,16) d: (16,17) e: (35,36) f: (35,36) g: (36,37)

  25. Yun Song, 2004

  26. Genotype and Phenotype Covariation: Gene Mapping. Reich et al. (2001); Rafnar et al. (2004); Morris et al. (2001).

  27. Finding Homologies. A new sequence is searched against a database; a match is scored by comparing the probability of the two sequences as related, P( , ), with the product of their separate probabilities, P( ) * P( ). R. Doolittle et al. (1983). New sequence: Simian Sarcoma Virus onc gene. Similar sequence: Platelet-Derived Growth Factor.
    P28SIS  51  GGELESLARGSLGSLSVAEPAMIAECKTRTEVFEISAALIDATNANFLVWPPCVEVQACSGCCNNRN..
    PDGF-1   1  ----------SLGSLTIAEPAMIAECKTREEVCFCIAAL?DA????????PPCVEVKACTGCCNNRN..
                          *****  ************ **    *** **        ****** ** *******
  Properties of the known sequence are transferred to the new sequence, immediately yielding biological hypotheses about it.
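
A simple way to quantify the similarity in the Doolittle alignment is percent identity. A minimal sketch, assuming that '-' (gap) and '?' (unknown residue) positions are skipped; this is a plain count, not the probability ratio the slide alludes to. The sequences are copied from the slide; the names are ours.

```python
P28SIS = "GGELESLARGSLGSLSVAEPAMIAECKTRTEVFEISAALIDATNANFLVWPPCVEVQACSGCCNNRN"
PDGF_1 = "----------SLGSLTIAEPAMIAECKTREEVCFCIAAL?DA????????PPCVEVKACTGCCNNRN"
IGNORE = set("-?")  # gap and unknown-residue symbols are not compared


def match_line(a, b):
    """'*' under identical residues, blank elsewhere."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x not in IGNORE else " " for x, y in zip(a, b))


def identity(a, b):
    """Return (identical positions, compared positions), skipping IGNORE symbols."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in IGNORE and y not in IGNORE]
    same = sum(1 for x, y in pairs if x == y)
    return same, len(pairs)


same, compared = identity(P28SIS, PDGF_1)
print(match_line(P28SIS, PDGF_1))
print(f"{same}/{compared}")  # 39/48
```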

  28. “Knowledge Based..”: The Products of Evolution - An Example (D. Baker). Figure: sequence and structure. Make a list; choose a global structure that doesn't create new local structures!

  29. What is Bioinformatics? The Data; The Analysis; Comparison; Evolution; Long Distance: Comparative Genomics; Short Distance: Variation Analysis; Homology; Non-homology; Physical/Chemical/Statistical; Mathematical Modelling.

  30. Funding: MRC & EPSRC. People: Jotun Hein, Alexei Drummond, Roald Forsberg, Bjarne Knudsen, Istvan Miklos, Jakob Skou Pedersen, Santiago Schnell, Carsten Wiuf, …, Gerton Lunter, Rune Lyngsoe, Irmtraud Meyer, Yun Song, Jennifer Taylor, Lizhong Hao, Ben Holtom, Stephen McCauley. • Methodology: evolutionary models; alignment; expression data; genome and gene evolution; sequence variation data & recombination; RNA secondary structure and evolution; … • Collaborations: William Cookson (WCHG); John Hancock (Harwell MRC); Peter Simmonds (Edinburgh); Bioinformatics Research Centre, Dk; … Homepage: http://www.stats.ox.ac.uk/mathgen/bioinformatics/
