1 / 18

Computational Genomics

Computational Genomics. Lecture 1, Tuesday April 1, 2003. Biology in One Slide: 2 Paradigms. Molecular Paradigm. Evolution Paradigm. High Throughput Biology. Gene Expression. Biology is becoming an information science. DNA Sequencing. …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT

tawny
Download Presentation

Computational Genomics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computational Genomics Lecture 1, Tuesday April 1, 2003

  2. Biology in One Slide: 2 Paradigms Molecular Paradigm Evolution Paradigm

  3. High Throughput Biology Gene Expression Biology is becoming an information science DNA Sequencing …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

  4. Goals of this course • Introduction to Computational Biology • Basic biology for computer scientists • Breadth: mention many topics & applications • In-depth coverage of Computational Genomics • Algorithms for sequence analysis • Current applications, trends, and open problems • Coverage of useful algorithms • Hidden Markov models • Dynamic Programming • String algorithms • Applications of AI techniques

  5. Topics in CS262 Part 1: In-depth coverage of basic computational methods for analysis of biological sequences • Sequence Alignment & Dynamic Programming • Hidden Markov models These methods are used heavily in most genomics applications: • DNA sequencing • Comparison of DNA and proteins across organisms • Discovery of genes, promoters, regulatory sites

  6. Topics in CS262 Part 2: Topics in computational genomics, more algorithms, and areas of active research • DNA sequencing & assembly: reading a complete genome such as the human DNA • Gene finding: marking genes on the DNA sequence • Large-scale comparative genomics: comparing whole genomes from multiple organisms • Microarrays & regulation: understanding the regulatory code, and potential disease-causing genes • RNA structure: predicting the folding of RNA • Phylogeny and evolution: quantifying the evolution of biological sequences

  7. Course responsibilities • Homeworks [72%] • 4 challenging problem sets, 4-5 problems/pset • Collaboration allowed – please give credit • Hws due Thursday, solutions explained Friday • Two worst problems in all hws do not count • Final [18%] • Takehome, 1 day • Collaboration not allowed • Basic questions – much easier than homeworks • Scribing [10%] • Due one week after the lecture, except special permission

  8. Reading material • Books • “Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchinson • Chapters 1-4, 6, (7-8), (9-10) • “Algorithms on strings, trees, and sequences”by Gusfield • Chapters (5-7), 11-12, (13), 14, (17) • Papers • Lecture notes

  9. Topic 1. Sequence Alignment

  10. Complete genomes

  11. Evolution

  12. Evolution at the DNA level C …ACGGTGCAGTCACCA… …ACGTTGCAGTCCACCA… SEQUENCE EDITS REARRANGEMENTS

  13. Evolutionary Rates next generation OK OK OK Changes in non-functional sites are OK, so will be propagated X X Still OK? Most changes in functional sites are deleterious and will be rejected

  14. Sequence conservation implies function 100% Interleukin region in human and mouse 40%

  15. Sequence Alignment AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings x = x1x2...xM, y = y1y2…yN, an alignment is an assignment of gaps to positions 0,…, M in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gap in the other sequence

  16. What is a good alignment? Alignment: The “best” way to match the letters of one sequence with those of the other How do we define “best”? Alignment: A hypothesis that the two sequences come from a common ancestor through sequence edits Parsimonious explanation: Find the minimum number of edits that transform one sequence into the other

  17. Scoring Function • Sequence edits: AGGCCTC • Mutations AGGACTC • Insertions AGGGCCTC • Deletions AGG.CTC Scoring Function: Match: +m Mismatch: -s Gap: -d Score F = (# matches)  m - (# mismatches)  s – (#gaps)  d

More Related