
Theory and Application of Multiple Sequence Alignments



Presentation Transcript


  1. Theory and Application of Multiple Sequence Alignments a.k.a What is a Multiple Sequence Alignment, How to Make One, and What to Do With It Brett Pickett, PhD

  2. History • Structure of DNA discovered (1953) • First genome (a phage) sequenced in 1977 • Human Genome Project begun in 1990 • First free-living organism (Haemophilus influenzae) sequenced in 1995 • Human “rough draft” completed in 2000 • NHGRI (public) vs. J. Craig Venter (private) • Used a “super” computer to assemble the human genome in the right order

  3. What is a Genome? • Genetic material required for an organism to replicate • Eukaryotes (humans): multiple chromosomes • Prokaryotes (bacteria): typically 1 chromosome • Viruses: “what’s a chromosome?” • Human genome: 3.2 Gb, about 2 m of DNA in each cell • 2 m X 10 trillion cells in the human body: • 780,000 times around Earth • 67.8 round trips to the sun • Bacteria (580 kb – 10 Mb) • Virus (3.5 kb – 1.3 Mb) http://www.rsc.org/chemsoc/timeline/pages/2001.html

  4. Why are Genomes so Important? • Encode all organismal functions • DNA -> RNA -> protein • Unique to each organism • Find differences (mutations) only by comparing genomes with each other www.thednastore.com/images/cells/mrdna1.jpg

  5. How are Sequences Made? • Make lots of copies of original sequence (PCR) • Put the copies into a machine to make even more copies • Fluorescent (glow-in-the-dark) bases get incorporated randomly into new DNA molecule • Laser detects glowing bases and tells the computer the order of bases = sequence http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg

  6. What’s the Next Step? • After sequence is determined, then what? • Make sense of it by comparing with other related (homologous) sequences • Multiple Sequence Alignment

  7. What is an Alignment? [Figure: the same sequences shown unaligned and aligned] • Lining up related (homologous) positions • Allows comparison

  8. Comparing Sequences (Genomes) • All DNA contains a unique genetic “fingerprint” • Similarity reveals • Related function • Shared evolutionary history education.vetmed.vt.edu/.../FINGERPRINT.jpg

  9. Aligning with Computational Methods • Computers can’t “see” patterns • Use math to find best alignment by assigning scores: • Match • Mismatch • Gap: internal (insertion / deletion, “indel”) or terminal (missing information?)

  10. What is a Gap? • Allows bases to be lined up even if sequences are different lengths • Insertions / deletions (indels): from the alignment alone, impossible to tell whether one sequence lost information or the other gained it • Terminal gaps: sequence is either naturally shorter or artificially cut off

  11. Nucleotide Alignment • Custom scores: match, mismatch • Gap-opening penalty: penalized for not having a letter (beginning a gap) • Why? • Gap-extension penalty: little or no penalty for lengthening a gap • Why? • Scores set the balance between mismatches and gaps [Figure: gaps vs. mismatches]
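The scoring scheme on this slide can be sketched in a few lines of Python. This is only an illustration, not the presenter's code: the function name `score_alignment` is made up, the default values are borrowed from the Manual Alignment slide (match 5, mismatch -2, gap opening -4, gap extension 0), and it assumes the convention that the first position of a gap costs the opening penalty and every further position costs the extension penalty.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-2, gap_open=-4, gap_extend=0):
    """Score two already-aligned rows of equal length; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            # Continuing a gap in the same row is cheap; starting one is not.
            score += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            score += match if x == y else mismatch
            prev_gap_row = None
    return score


print(score_alignment("AATC", "-ATC"))    # one gap, three matches
print(score_alignment("AAATC", "A--TC"))  # one longer gap, three matches
```

With a gap-extension penalty of 0, a two-base gap costs the same as a one-base gap. That answers the second "Why?": a single insertion or deletion event often adds or removes several bases at once, so the scheme charges mainly for the event and little for its length.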

  12. Dynamic Programming • Used to calculate alignment • Breaks a very complicated process into smaller steps • Helps computers to solve the problem faster [Image: http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm]

  13. Manual Alignment • Match = 5 • Mismatch = -2 • Gap opening = -4 • Gap extension = 0 • Traceback: follow the highest scores back to the beginning • Up or sideways = gap, diagonal = homology (line up) [Figure: scoring matrix with traceback; alignment residue: A - A A T T C C]
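The fill-then-traceback procedure can be sketched with the standard Needleman-Wunsch global alignment algorithm. To keep the example short, it makes one simplifying assumption that differs from the slide: every gap position costs a flat -4 (a linear gap penalty) rather than separate opening and extension penalties. The function name `global_align` and the test sequences are illustrative.

```python
def global_align(a, b, match=5, mismatch=-2, gap=-4):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for lining up a[i-1] with b[j-1] (a diagonal move).
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the score table; row 0 and column 0 are all-gap prefixes.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # diagonal
                              score[i - 1][j] + gap,            # up
                              score[i][j - 1] + gap)            # sideways

    # Traceback from the bottom-right cell to the top-left cell.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = global_align("AATC", "ATC")
print(best)
print(top)
print(bottom)
```

When several paths tie, this sketch prefers the diagonal move, then the upward move. Other tie-breaking rules give different but equally scored alignments.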

  14. Computer-Generated Alignment • Much faster than we are • 2 GHz ≈ 2 billion clock cycles per second • Don’t get tired, make arithmetic slips, or get hand cramps

  15. Alignment Process

  16. Types of Alignment • Global: aligns the entire length of the sequences • Permits gaps • Forced even if sequences are not homologous • Local: aligns only the best-matching region, with few or no gaps
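The difference between the two types shows up as a small change in the dynamic programming recurrence. The sketch below uses the standard Smith-Waterman idea for local alignment: a cell score may never drop below zero, so an alignment can start and end anywhere. The scores (match 5, mismatch -2, flat gap -4) and the name `local_align_score` are assumptions made for illustration.

```python
def local_align_score(a, b, match=5, mismatch=-2, gap=-4):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the score table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets a new local alignment begin at this cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core ACGT is found even though the flanking bases differ.
print(local_align_score("TTTACGTTTT", "ACGT"))
```

A global alignment of the same pair would also have to pay for the unmatched flanking bases, which is why it is described as "forced".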

  17. Beware! • The computer is not always right • Alignments: optimal (highest score) vs. true (evolutionarily correct) • Can be improved • Hard for computer to accurately place indels (gaps) • Apply prior knowledge, e.g. codons [Figure: nucleotide alignments with the gap placed within a codon vs. between codons, and the resulting amino acid translations]

  18. BLAST • Basic Local Alignment Search Tool • Most frequently used alignment tool • Local alignment of 1 sequence (query) against all known sequences (subjects) in database • Uses a “heuristic” to reduce number of sequences it actually has to align • Like using “Google” to find most homologous sequences
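The "heuristic" bullet can be illustrated with a toy word-seeding filter: only database sequences that share at least one short exact word (k-mer) with the query are passed on to the slower alignment step. This is a sketch of the general idea, not BLAST's actual implementation; the function names, the word size, and the tiny database are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(subjects, k):
    """Map every k-mer to the names of the subject sequences containing it."""
    index = defaultdict(set)
    for name, seq in subjects.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k, min_shared=1):
    """Count distinct query k-mers found in each subject; keep the likely hits."""
    hits = defaultdict(int)
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    return {name: n for name, n in hits.items() if n >= min_shared}


database = {"s1": "ACGTACGTGG", "s2": "TTTTTTTTTT"}  # made-up sequences
index = build_kmer_index(database, k=4)
print(candidate_subjects("CGTACG", index, k=4))  # only s1 shares words with the query
```

The index is built once and reused for every query, so most of the database is never aligned at all. This is the sense in which the search resembles a lookup more than an exhaustive comparison.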

  19. BLAST Input

  20. BLAST Output

  21. How Does This Impact Me? • Human Microbiome project • Sequence all bacteria in intestines • Millions of bacteria in each gram of excrement • Which ones make us sick? How different is flora between people? • Ocean Virus Metagenomics project • Try to get an idea of virus diversity across the globe • Boat goes around N.A. collecting samples • Billions of viruses in each gallon of seawater

  22. How Does This Impact Me (cont’d)? • Used to take swabs, grow colonies on agar • Antimicrobial resistance in turkeys • Sequencing removes middle step • How to quickly assign genus and species to new sequences? • BLAST • Project: New Phage from ponds

  23. Other Uses for Alignments

  24. SNP Detection • Single Nucleotide Polymorphism • A single-base change present in at least one sequence of the alignment • May have biological significance • Antibiotic resistance • Changes could avoid detection by immune system • Cause of genetic disease (e.g. cystic fibrosis, CF)
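Once sequences are aligned, candidate SNPs are simply the columns in which more than one base appears. The sketch below is a minimal illustration with made-up sequences. The name `variable_sites` is hypothetical, and treating `-` and `N` as "no information" is an assumption.

```python
def variable_sites(rows, ignore="-N"):
    """Return (1-based column, bases seen) for every column with >1 base."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    sites = []
    for pos, column in enumerate(zip(*rows), start=1):
        bases = {c for c in column if c not in ignore}
        if len(bases) > 1:
            sites.append((pos, "".join(sorted(bases))))
    return sites


alignment = ["ACGTA",
             "ACGTA",
             "ATGTA",   # C -> T at column 2
             "AC-TA"]   # the gap at column 3 is not counted as a SNP
print(variable_sites(alignment))
```

Whether a variable column has biological significance cannot be read from the alignment alone. This step only lists where the sequences differ.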

  25. Phylogenetic Trees • Computer generated by: • Examining alignment • Looking for shared mutations • Show relationship(s) between sequences • History of sequences • Where they came from • Genetic changes that have occurred [Figure: tree with labeled clade, node, leaf, and branch] • iOS Phylogram app (free)
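One simple way a program can start turning an alignment into a tree is to measure how different each pair of sequences is and join the closest pair first. Note that this is a distance-based shortcut, swapped in here for brevity; the slide itself describes looking for shared mutations. The function names and sequences are illustrative, and a real tree-building method would keep merging groups until one tree remains.

```python
from itertools import combinations


def p_distance(row_a, row_b):
    """Fraction of compared (ungapped) columns where the two rows differ."""
    compared = differing = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # skip columns where either row has a gap
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def closest_pair(aligned):
    """Names of the two sequences with the smallest p-distance."""
    return min(combinations(sorted(aligned), 2),
               key=lambda pair: p_distance(aligned[pair[0]], aligned[pair[1]]))


aligned = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TCGAACTA"}  # made-up data
print(closest_pair(aligned))  # the pair that would share the most recent node
```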

  26. Recombination • Can occur in all types of organisms • Eukaryotes • Prokaryotes • Viruses • May change characteristic of organism • Make you sick (or not) • Not recognized by immune system • Fast way of getting lots of genetic changes [Figure labels: RdRP, Genome 1, Genome 2, Major Parent, Minor Parent, Breakpoint, Daughter Sequence]

  27. Reassortment • Chromosomes (segments) from one organism replace those from another • May change characteristic of organism • Make you sick (or not) • Not recognized by immune system • Fast way of getting lots of genetic changes [Figure: segments from two genomes combining into one]

  28. Other Analysis Options • Align Sequences • Look for genetic changes (genotype) that are associated with traits (phenotype) • Host • How sick it makes you • Drug resistance • Inherited disease • Do any mutations consistently accompany the traits? • Genome Wide Association Studies http://lovestats.wordpress.com/dman/

  29. How Does an Alignment Get a Score? • Amino acids • Identical >> Similar >> Dissimilar

  30. Score Lookup Table (Matrix) • Symmetrical • Positive scores on diagonal (matches) • Some mismatches get negative scores • Some mismatches don’t
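A score lookup table with these properties can be sketched as a small dictionary. The numbers below are invented for illustration and are not taken from BLOSUM, PAM, or the matrix shown on the slide. They only follow the pattern identical >> similar >> dissimilar for four amino acids (L, I, D, E). Storing one triangle and sorting the pair before lookup makes the table symmetrical by construction.

```python
# Toy scores: identical = 4, similar (D/E, I/L) = 2, dissimilar = -3.
# Hypothetical values chosen for this example only.
TOY_SCORES = {
    ("D", "D"): 4, ("D", "E"): 2, ("D", "I"): -3, ("D", "L"): -3,
    ("E", "E"): 4, ("E", "I"): -3, ("E", "L"): -3,
    ("I", "I"): 4, ("I", "L"): 2,
    ("L", "L"): 4,
}


def pair_score(x, y):
    """Look up a pair in either order, so score(x, y) == score(y, x)."""
    return TOY_SCORES[tuple(sorted((x, y)))]


def score_ungapped(row_a, row_b):
    """Sum the pair scores of two equal-length, gap-free protein rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return sum(pair_score(x, y) for x, y in zip(row_a, row_b))


print(pair_score("L", "I"), pair_score("I", "L"))  # a mismatch that still scores positively
print(score_ungapped("LIDE", "LLEE"))
```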
