
Introduction to bioinformatics (I617)

Introduction to bioinformatics (I617). Haixu Tang School of Informatics Email: hatang@indiana.edu Office: EIG 1008 Tel: 812-856-1859. Textbook. A Primer of Genome Science (2nd Edition) by Greg Gibson, Spencer V. Muse, Sinauer Associates, 2004


Presentation Transcript


  1. Introduction to bioinformatics (I617) • Haixu Tang, School of Informatics • Email: hatang@indiana.edu • Office: EIG 1008 • Tel: 812-856-1859

  2. Textbook • A Primer of Genome Science (2nd Edition) by Greg Gibson, Spencer V. Muse, Sinauer Associates, 2004 • Suggested reading materials will be posted on the class wiki page: http://cheminfo.informatics.indiana.edu/djwild/I617_2006_wiki/index.php/Main_Page • Office Hour: MW 11:00-12:00, EIG 1008 or appointment

  3. Grading • Class project: selected from one of the four covered areas (bioinformatics, chemical informatics, laboratory informatics and health informatics): 25% • Suggested bioinformatics topics will be posted on the class wiki page • Homework: 25% in bioinformatics • 4 assignments, each 6.25%

  4. Bioinformatics = BIOlogy + informatics? • Not really: it is a term (somewhat arbitrarily chosen) for a multi-disciplinary area that combines life sciences, physical sciences and computer science / informatics; • It addresses biological problems using theoretical informatics approaches, not vice versa; • It is transforming classical biology into an information science.

  5. The birth of bioinformatics • A revolution in biology research: the emergence of Genome Science • Technology advancement in both biology and information science

  6. Genome science: a revolution of biology • Classical Biology: hypothesis-driven approach (diagram labels: Hypothesis, Data, Knowledge) • Genome Science: data-driven approach (diagram labels: Data, Hypothesis, Knowledge)

  7. Bioinformatics: from data analysis to data mining • Classical Biology: low throughput data, hypothesis confirmation / rejection • Genome Science: high throughput data, hypothesis generation

  8. Bioinformatics: in the driver’s seat • Classical Biology: data analysis (diagram labels: Hypothesis, Data, Knowledge) • Genome Science: data mining (diagram labels: Data, Hypothesis, Knowledge)

  9. Key technology advancements • High throughput biotechnologies • Genome sequencing techniques • DNA microarray • Mass spectrometry • Large-scale experiments • HGP, HapMap • Omics / Systems Biology • Massive data generation, storage, exchange and analysis • CPU, storage, etc. • High speed network (Internet) • Bioinformatics

  10. Bioinformatics: mutually beneficial • For biologists • Fragment assembly in genome sequencing • Genome comparison • Gene clustering in DNA microarray analysis • Protein identification in proteomics • For computer scientists • String algorithms / Tree algorithms • Alternative Eulerian path (BEST theorem) • Reversal distances • Probabilistic graphical models (HMMs, BNs, etc.)

  11. Two origins of bioinformatics • Combinatorial pattern matching in theoretical computer science • DNA and protein sequence analysis • Physical and analytical chemistry of biomolecules • Protein structure analysis → Structural bioinformatics • Bio-analytical chemistry → Proteomics

  12. Bioinformatics addresses computational challenges in life and medical sciences • New computational problems for automatic data analysis • Reformulation of old problems using new high throughput data • Formulating new problems using high throughput data

  13. Bioinformatics addresses computational challenges in life and medical sciences • New computational problems for automatic data analysis • Genome sequencing • Proteomics • Transcriptomics • Data representation and visualization • Genome Browser • Solving biological problems by in silico approaches • Reformulation of old problems using new high throughput data • Gene finding • Protein structure and function • Formulating new problems using high throughput data • Comparative genomics • Polymorphisms / Population genetics • Systems Biology

  14. Bioinformatics resources • Databases • Nucleic Acids Research (NAR) annual database issue • Organization • ISCB (International Society for Computational Biology) • Conferences • ISMB • RECOMB • Many other smaller or regional conferences, e.g. ECCB, CSB, PSB, etc., including the local Indiana Bioinformatics conference

  15. A case study • How does bioinformatics help and transform classical biological topics? • Molecular evolutionary studies: from anatomical features to molecular evidence • Genome evolution: comparison of gene orders

  16. Early Evolutionary Studies • From Darwin's time until the early 1960s, anatomical features were the dominant criteria used to derive evolutionary relationships between species

  17. Early Evolutionary Studies • From Darwin's time until the early 1960s, anatomical features were the dominant criteria used to derive evolutionary relationships between species • The evolutionary relationships derived from these relatively subjective observations were often inconclusive. Some were later proved incorrect

  18. Evolution and DNA Analysis: the Giant Panda Riddle • For roughly 100 years scientists were unable to figure out which family the giant panda belongs to • Giant pandas look like bears but have features that are unusual for bears and typical for raccoons, e.g., they do not hibernate

  19. Evolution and DNA Analysis: the Giant Panda Riddle • In 1985, Steven O’Brien and colleagues solved the giant panda classification problem using DNA sequences and bioinformatics algorithms

  20. Evolutionary Tree of Bears and Raccoons

  21. Evolutionary Trees: DNA-based Approach • 40 years ago: Emile Zuckerkandl and Linus Pauling brought reconstructing evolutionary relationships with DNA into the spotlight • In the first few years after Zuckerkandl and Pauling proposed using DNA for evolutionary studies, the possibility of reconstructing evolutionary trees by DNA analysis was hotly debated • Now it is a dominant approach to studying evolution.

  22. Evolutionary Trees How are these trees built from DNA sequences?

  23. Evolutionary Trees How are these trees built from DNA sequences? • leaves represent existing species • internal vertices represent ancestors • root represents the common evolutionary ancestor

  24. Rooted and Unrooted Trees • In the unrooted tree the position of the root (“common ancestor”) is unknown. Otherwise, they are like rooted trees

  25. Distances in Trees • Edges may have weights reflecting: • Number of mutations on the evolutionary path from one species to another • Time estimate for evolution of one species into another • In a tree T, we often compute d_ij(T), the length of the path between leaves i and j • d_ij(T): tree distance between i and j

  26. Distance in Trees: an Example • d_1,4 = 12 + 13 + 14 + 17 + 12 = 68
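
The path-length sum above can be reproduced with a short sketch. The slide's figure did not survive in this transcript, so the tree below is hypothetical: only the five edge weights on the path between leaves 1 and 4 (12, 13, 14, 17, 12) come from the slide, while the vertex names, the other leaves and their edge weights are invented for illustration.

```python
from collections import defaultdict


def tree_distance(edges, source, target):
    """Sum of edge weights on the unique path between two vertices of a tree."""
    adjacency = defaultdict(list)
    for u, v, weight in edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    # Depth-first search: a tree has exactly one simple path between two vertices,
    # so the first time we reach the target we have the tree distance.
    stack = [(source, None, 0)]
    while stack:
        vertex, parent, distance = stack.pop()
        if vertex == target:
            return distance
        for neighbor, weight in adjacency[vertex]:
            if neighbor != parent:
                stack.append((neighbor, vertex, distance + weight))
    raise ValueError("target is not reachable from source")


# Hypothetical tree. Integers are leaves, letters are internal vertices.
EDGES = [
    (1, "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17), ("d", 4, 12),  # path from the slide
    (2, "a", 5), (3, "b", 6), (5, "c", 7), (6, "d", 8),  # invented side branches
]

print(tree_distance(EDGES, 1, 4))  # 68
```

Storing the tree as a weighted edge list keeps the sketch independent of whether the tree is rooted, which matches the unrooted trees discussed earlier.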

  27. Distance Matrix • Given n species, we can compute the n x n distance matrix D_ij • D_ij may be defined as the edit distance between a gene in species i and species j, where the gene of interest is sequenced for all n species • D_ij: edit distance between i and j
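
As a minimal sketch of how such a matrix could be filled in, the code below uses the standard Levenshtein edit distance (unit cost for each substitution, insertion and deletion). The slides do not say which edit-distance variant or scoring is intended, so that choice is an assumption, and the four short sequences are invented toy data rather than real genes.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def distance_matrix(sequences):
    """n x n matrix D with D[i][j] = edit distance between sequences i and j."""
    n = len(sequences)
    return [[edit_distance(sequences[i], sequences[j]) for j in range(n)] for i in range(n)]


# Invented toy sequences standing in for one gene sequenced in four species.
GENES = ["ACGTACGT", "ACGTTCGT", "ACGACGT", "TCGTACGA"]

for row in distance_matrix(GENES):
    print(row)
```

The resulting matrix is symmetric with zeros on the diagonal, which is what the tree-fitting step on the next slide takes as input.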

  28. Fitting Distance Matrix • Given n species, we can compute the n x n distance matrix D_ij • Evolution of these genes is described by a tree that we don’t know. • We need an algorithm to construct a tree that best fits the distance matrix D_ij

  29. Reconstructing a 3 Leaved Tree • Tree reconstruction for any 3x3 matrix is straightforward • We have 3 leaves i, j, k and a center vertex c • Observe: • d_ic + d_jc = D_ij • d_ic + d_kc = D_ik • d_jc + d_kc = D_jk
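
The three equations above form a linear system with a closed-form solution: adding the first two and subtracting the third gives 2 d_ic = D_ij + D_ik - D_jk, and the other two edge lengths follow by symmetry. The sketch below applies that solution; the function name and the example distances (7, 8, 9) are illustrative choices, not values from the slides.

```python
def three_leaf_tree(d_ij, d_ik, d_jk):
    """Edge lengths (d_ic, d_jc, d_kc) of the star tree that fits a 3x3 distance matrix."""
    d_ic = (d_ij + d_ik - d_jk) / 2
    d_jc = (d_ij + d_jk - d_ik) / 2
    d_kc = (d_ik + d_jk - d_ij) / 2
    return d_ic, d_jc, d_kc


# Illustrative distances: D_ij = 7, D_ik = 8, D_jk = 9.
print(three_leaf_tree(7, 8, 9))  # (3.0, 4.0, 5.0)
```

All three edge lengths are non-negative exactly when the input distances satisfy the triangle inequality.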

  30. Turnip vs Cabbage: Look and Taste Different • Although cabbages and turnips share a recent common ancestor, they look and taste different

  31. Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information

  32. Turnip vs Cabbage: Almost Identical mtDNA gene sequences • In the 1980s Jeffrey Palmer studied evolution of plant organelles by comparing mitochondrial genomes of the cabbage and turnip • 99% similarity between genes • Despite these nearly identical gene sequences, the two genomes differed in gene order • This study helped pave the way to analyzing genome rearrangements in molecular evolution

  33. Turnip vs Cabbage: Different mtDNA Gene Order • Gene order comparison (figure: gene order before and after) • Evolution is manifested as the divergence in gene order

  34. Turnip vs Cabbage: Different mtDNA Gene Order • Gene order comparison:

  35. Turnip vs Cabbage: Different mtDNA Gene Order • Gene order comparison:

  36. Turnip vs Cabbage: Different mtDNA Gene Order • Gene order comparison:

  37. Turnip vs Cabbage: Different mtDNA Gene Order • Gene order comparison:

  38. Transforming Cabbage into Turnip Reversal distance
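
Reversal distance is the minimum number of segment reversals needed to turn one gene order into another. The sketch below finds it by exhaustive breadth-first search over signed permutations, where the sign of a gene stands for its orientation and a reversal both reverses a segment and flips its signs. The signed model and the example permutation are assumptions for illustration, since the gene orders from the slide figures are not in this transcript. This brute-force search is practical only for a handful of genes and is not an efficient algorithm.

```python
from collections import deque


def signed_reversals(perm):
    """Yield every permutation reachable from perm by one signed reversal."""
    n = len(perm)
    for start in range(n):
        for end in range(start + 1, n + 1):
            # Reverse the order of the segment and flip the orientation of each gene.
            segment = tuple(-gene for gene in reversed(perm[start:end]))
            yield perm[:start] + segment + perm[end:]


def reversal_distance(source, target):
    """Minimum number of signed reversals transforming source into target."""
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("both gene orders must contain the same genes")
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distances[current]
        for neighbor in signed_reversals(current):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    raise ValueError("target is not reachable from source")


# Hypothetical gene orders with five genes.
print(reversal_distance((1, -5, 4, -3, 2), (1, 2, 3, 4, 5)))  # 3
```

For this example, one shortest scenario reverses the last four genes to get (1, -2, 3, -4, 5) and then flips genes 2 and 4 individually.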

  39. History of Chromosome X (Rat Consortium, Nature, 2004)
