This article explores the concept of multiple sequence alignments, including the process of creating alignments and their significance in comparing and analyzing genomes. It also discusses computational methods, the importance of gaps in alignments, and the use of dynamic programming. The article concludes with a discussion on nucleotide and amino acid alignments, as well as the application of BLAST in sequence analysis.
Theory and Application of Multiple Sequence Alignments (a.k.a. What Is a Multiple Sequence Alignment, How to Make One, and What to Do With It) Brett Pickett, PhD
History • Structure of DNA discovered (1953) • First genome (a phage) sequenced in 1977 • Human Genome Project began in 1990 • First free-living organism (Haemophilus influenzae) sequenced in 1995 • Human “rough draft” completed in 2000 • NHGRI (public) vs. J. Craig Venter (private) • Used a “super” computer to put the human genome together in the right order
What is a Genome? • Genetic material required for an organism to replicate • Eukaryotes (humans): multiple chromosomes • Prokaryotes (bacteria): 1 chromosome • Viruses: “what’s a chromosome?” • Human genome: 3.2 Gb, about 2 m of DNA per cell × 10 trillion cells in the body • 780,000 times around Earth • 67.8 round trips to the sun • Bacteria (580 kb – 10 Mb) • Virus (3.5 kb – 1.3 Mb) http://www.rsc.org/chemsoc/timeline/pages/2001.html
Why are Genomes so Important? • Encode all organismal functions • DNA -> RNA -> protein • Unique to each organism • Find differences (mutations) only by comparing genomes with each other www.thednastore.com/images/cells/mrdna1.jpg
How are Sequences Made? • Make lots of copies of original sequence (PCR) • Put the copies into a machine to make even more copies • Fluorescent (glow-in-the-dark) bases get incorporated randomly into new DNA molecule • Laser detects glowing bases and tells the computer the order of bases = sequence http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg
What’s the Next Step? • After sequence is determined, then what? • Make sense of it by comparing with other related (homologous) sequences • Multiple Sequence Alignment
What is an Alignment? • Lining up related (homologous) positions • Allows comparison [Figure: the same sequences shown unaligned and aligned]
Comparing Sequences (Genomes) • All DNA contains a unique genetic “fingerprint” • Similarity reveals • Related function • Shared evolutionary history education.vetmed.vt.edu/.../FINGERPRINT.jpg
Aligning with Computational Methods • Computers can’t “see” patterns • Use math to find the best alignment by assigning scores to each match, mismatch, and gap • Internal gaps: insertion / deletion (indel) • Terminal gaps: missing information?
What is a Gap? • Allows bases to be lined up even if sequences are different lengths • Insertions / deletions (indels) • From the alignment alone, impossible to tell whether one sequence lost information or the other gained it • Terminal gaps • Sequence is either naturally shorter or artificially cut off
Nucleotide Alignment • Custom scores: match, mismatch, gap-opening penalty, gap-extension penalty • Gap-opening penalty: penalized for not having a letter (beginning a gap). Why? • Gap-extension penalty: little or no penalty for lengthening a gap. Why? • Scores set the balance between mismatches and gaps [Figure labels: Gaps, Mismatches]
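The scoring scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's tool: the function name `score_alignment` is hypothetical, the default values are borrowed from the Manual Alignment slide (match 5, mismatch -2, gap opening -4, gap extension 0), and the two inputs are assumed to be already aligned, with `-` marking a gap.

```python
def score_alignment(a, b, match=5, mismatch=-2, gap_open=-4, gap_extend=0):
    """Score two already-aligned, equal-length strings ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    prev_gap_row = None  # row that held a gap in the previous column: "a", "b", or None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # First column of a gap run pays the opening penalty,
            # later columns of the same run pay only the extension penalty.
            total += gap_extend if row == prev_gap_row else gap_open
            prev_gap_row = row
        else:
            total += match if x == y else mismatch
            prev_gap_row = None
    return total


print(score_alignment("AATTCC", "ACTTCC"))  # one mismatch
print(score_alignment("AATTCC", "A-TTCC"))  # one gap of length 1
print(score_alignment("AATTCC", "A--TCC"))  # one gap of length 2
```

With these particular values a single mismatch costs less than opening a gap, so the scores push the alignment toward mismatches unless a gap recovers enough matches to pay for itself.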
Dynamic Programming • Used to calculate alignment • Breaks a very complicated process into smaller steps • Helps computers to solve the problem faster [Figure labels: Math, Read] http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm
Manual Alignment • Scores: Match = 5, Mismatch = -2, Gap Opening = -4, Gap Extension = 0 • Traceback: follow the highest scores back to the beginning • Up or sideways = gap, diagonal = homology (line up) [Figure: scoring matrix for a short example; recoverable labels: A - A A T T C C]
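The fill-and-traceback procedure can be sketched as a small Needleman-Wunsch style routine. Two assumptions to note: the function name `needleman_wunsch` is just a label for this sketch, and it uses a single linear gap penalty of -4 per gap column instead of the slide's separate opening and extension penalties, which keeps the table to one matrix.

```python
def needleman_wunsch(s, t, match=5, mismatch=-2, gap=-4):
    """Global alignment with a linear gap penalty (a simplification)."""
    n, m = len(s), len(t)
    # F[i][j] = best score for aligning s[:i] with t[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,  # diagonal: line up two letters
                          F[i - 1][j] + gap,      # up: gap in t
                          F[i][j - 1] + gap)      # sideways: gap in s
    # Traceback from the bottom-right corner back to the beginning.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = needleman_wunsch("AATTCC", "ATTCC")
print(score)
print(top)
print(bottom)
```

When several paths tie, this traceback prefers the diagonal, so it returns one optimal alignment out of possibly many.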
Computer-Generated Alignment • Computers are much faster than we are • A 2 GHz processor runs about 2 billion cycles per second • Computers don’t get tired, make arithmetic slips, or get hand cramps
Types of Alignment • Global • Aligns the entire length of the sequences • Permits gaps • Forces an alignment even if the sequences are not homologous • Local • Aligns only the longest well-matching region, with minimal (or no) gaps
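The difference between the two types shows up in one line of the dynamic programming recurrence. The sketch below is a Smith-Waterman style local score under assumed values (match 5, mismatch -2, linear gap -4); the function name is hypothetical. Allowing each cell to fall back to 0 lets the alignment start fresh anywhere, so only the best-matching region contributes.

```python
def local_alignment_score(s, t, match=5, mismatch=-2, gap=-4):
    """Best local alignment score, keeping only two rows of the table."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # The extra 0 is what makes this local: a bad prefix is dropped
            # instead of being carried along as in a global alignment.
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Only the shared ACGTACGT region scores; the unrelated flanks are ignored.
print(local_alignment_score("GGGGACGTACGT", "TTTACGTACGTTT"))
print(local_alignment_score("AAAA", "CCCC"))
```

A global alignment of two unrelated sequences would still return some (negative) score, while the local score simply stays at 0.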
Beware! • The computer is not always right • Alignments • Optimal: highest score • True: evolutionarily correct • Can be improved • Hard for computer to accurately place indels (gaps) • Apply prior knowledge, e.g. codons [Figure: nucleotide sequence vs. amino acid sequence; recoverable labels: AA- ACC C → ??? Thr ?; Asn Lys -; AAA CCC → Lys Pro]
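One way to apply the codon knowledge mentioned above is to flag gaps that would split a codon. The sketch below is only a heuristic with a hypothetical function name, and it assumes the reading frame starts at the first alignment column and that any earlier gaps are themselves in frame.

```python
import re


def frame_breaking_gaps(aligned_seq):
    """Return (start, length) for gap runs that do not fall on codon boundaries."""
    suspicious = []
    for run in re.finditer(r"-+", aligned_seq):
        start, length = run.start(), len(run.group())
        # A codon-friendly gap starts on a codon boundary and removes whole codons.
        if start % 3 != 0 or length % 3 != 0:
            suspicious.append((start, length))
    return suspicious


print(frame_breaking_gaps("AA-ACCC"))    # gap inside the first codon
print(frame_breaking_gaps("AAA---CCC"))  # one whole codon removed
```

Flagged gaps are candidates for manual adjustment; the check does not decide where the gap should go.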
BLAST • Basic Local Alignment Search Tool • Most frequently used alignment tool • Local alignment of 1 sequence (query) against all known sequences (subjects) in database • Uses a “heuristic” to reduce number of sequences it actually has to align • Like using “Google” to find most homologous sequences
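The idea behind the heuristic can be illustrated with a toy word index: only subjects that share at least one short word with the query are kept for real alignment. This is a rough sketch of the seeding idea only, not BLAST itself, which also scores neighboring words, extends hits, and computes statistics. The function names, the word size `k=4`, and the example sequences are all assumptions.

```python
from collections import defaultdict


def build_word_index(subjects, k=4):
    """Map every k-letter word to the names of the subjects that contain it."""
    index = defaultdict(set)
    for name, seq in subjects.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k=4):
    """Rank subjects by how many query words they share (most first)."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


database = {"s1": "ACGTACGTAA", "s2": "TTTTTTTT", "s3": "GGACGTCC"}
index = build_word_index(database)
print(candidate_subjects("ACGTAC", index))
```

Subjects with no shared word never reach the alignment step, which is where the time saving comes from.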
How Does This Impact Me? • Human Microbiome Project • Sequence all bacteria in intestines • Millions of bacteria in each gram of excrement • Which ones make us sick? How different is the flora between people? • Ocean Virus Metagenomics project • Try to get an idea of virus diversity across the globe • Boat goes around North America collecting samples • Billions of viruses in each gallon of seawater
How Does This Impact Me (cont’d)? • Researchers used to take swabs and grow colonies on agar • Example: antimicrobial resistance in turkeys • Sequencing removes the middle step • How to quickly assign genus and species to new sequences? • BLAST • Project: new phage from ponds
SNP Detection • Single Nucleotide Polymorphism • Genetic changes occurring in at least one sequence • May have biological significance • Antibiotic resistance • Changes could avoid detection by immune system • Cause of genetic disease, e.g. cystic fibrosis (CF)
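Once sequences are aligned, candidate SNPs are simply the columns where more than one base appears. A minimal sketch, assuming the alignment is a list of equal-length strings and that gaps (`-`) and unknown bases (`N`) should be ignored; the function name is hypothetical.

```python
def variable_sites(alignment):
    """Return {column: set of bases} for columns with more than one base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    sites = {}
    for col in range(length):
        # Gaps and unknown bases are not counted as alleles.
        bases = {seq[col].upper() for seq in alignment} - {"-", "N"}
        if len(bases) > 1:
            sites[col] = bases
    return sites


print(variable_sites(["ACGT", "ACGA", "AC-T"]))
```

Columns are 0-based here; whether a variable site matters biologically still has to be judged separately.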
Phylogenetic Trees • Computer generated by: • Examining alignment • Looking for shared mutations • Show relationship(s) between sequences • History of sequences • Where they came from • Genetic changes that have occurred • Parts of a tree (figure labels): clade, node, leaf, branch • iOS Phylogram app (free)
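A common first step toward a tree is a table of pairwise distances computed from the alignment. The sketch below uses the proportion of differing sites (p-distance), which is a swapped-in simplification rather than the shared-mutation approach named on the slide; distance-based tree methods such as neighbor joining take a table like this as input. Function names and example sequences are assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of ungapped columns at which two aligned sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each pair of sequence names to its p-distance."""
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m, n in combinations(sorted(alignment), 2)}


aligned = {"a": "ACGT", "b": "ACGA", "c": "TCAA"}
distances = distance_matrix(aligned)
print(distances)
print(min(distances, key=distances.get))  # the closest pair joins first
```

The pair with the smallest distance is the natural candidate for the first clade.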
Recombination • Can occur in all types of organisms • Eukaryotes • Prokaryotes • Viruses • May change characteristic of organism • Make you sick (or not) • Not recognized by immune system • Fast way of getting lots of genetic changes [Figure labels: RdRP, Genome 1, Genome 2, major parent, minor parent, breakpoint, daughter sequence]
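Given an alignment of a daughter sequence with two candidate parents, a breakpoint can be roughly located by asking, window by window, which parent the daughter resembles more. This is a simplified sketch with a hypothetical function name and an arbitrary window size; dedicated recombination tools use statistical tests rather than a raw identity count.

```python
def closer_parent_by_window(daughter, parent1, parent2, window=4):
    """Label each non-overlapping window '1', '2', or '=' by the closer parent."""
    labels = []
    for start in range(0, len(daughter) - window + 1, window):
        end = start + window
        same1 = sum(d == p for d, p in zip(daughter[start:end], parent1[start:end]))
        same2 = sum(d == p for d, p in zip(daughter[start:end], parent2[start:end]))
        labels.append("1" if same1 > same2 else "2" if same2 > same1 else "=")
    return labels


# The label switches from parent 1 to parent 2 at the last window,
# which suggests a breakpoint near column 8.
print(closer_parent_by_window("AAAAAAAACCCC", "AAAAAAAAAAAA", "CCCCCCCCCCCC"))
```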
Reassortment • Chromosomes (segments) from one organism replace those from another • May change characteristic of organism • Make you sick (or not) • Not recognized by immune system • Fast way of getting lots of genetic changes
Other Analysis Options • Align sequences • Look for genetic changes (genotype) that are associated with traits (phenotype) • Host • How sick it makes you • Drug resistance • Inherited disease • Do any mutations consistently accompany the traits? • Genome-Wide Association Studies (GWAS) http://lovestats.wordpress.com/dman/
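The question "do any mutations consistently accompany the traits?" can be asked of an alignment directly. The sketch below reports only columns where every sequence with the same trait carries the same base and different traits carry different bases. It is a toy filter with hypothetical names and made-up example data; real association studies rely on statistical tests and far larger samples.

```python
from collections import defaultdict


def perfectly_associated_sites(alignment, traits):
    """Columns whose base is constant within each trait and differs between traits."""
    names = list(alignment)
    length = len(alignment[names[0]])
    sites = []
    for col in range(length):
        bases_by_trait = defaultdict(set)
        for name in names:
            bases_by_trait[traits[name]].add(alignment[name][col])
        groups = list(bases_by_trait.values())
        one_base_each = all(len(group) == 1 for group in groups)
        all_different = len(set().union(*groups)) == len(groups)
        if len(groups) > 1 and one_base_each and all_different:
            sites.append(col)
    return sites


aligned = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGT", "s4": "TCGA"}
phenotype = {"s1": "resistant", "s2": "sensitive",
             "s3": "resistant", "s4": "sensitive"}
print(perfectly_associated_sites(aligned, phenotype))
```

In the example, column 0 varies but does not track the trait, while the last column does.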
How Does an Alignment Get a Score? • Amino acids • Identical >> Similar >> Dissimilar
Score Lookup Table (Matrix) • Symmetrical • Positive scores on the diagonal (matches) • Some mismatches get negative scores • Some mismatches don’t
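A lookup table with these properties can be sketched as a small dictionary. The numbers below are illustrative values for three amino acids (leucine L, isoleucine I, aspartate D), chosen to show the pattern rather than taken from the slide; the names `PAIR_SCORES`, `pair_score`, and `score_protein_alignment` are hypothetical.

```python
# Only one triangle is stored; the lookup handles symmetry.
PAIR_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,  # identical: positive diagonal
    ("L", "I"): 2,                                # similar: mismatch, still positive
    ("L", "D"): -4, ("I", "D"): -3,               # dissimilar: negative
}


def pair_score(x, y):
    """Symmetric lookup: (x, y) and (y, x) give the same score."""
    if (x, y) in PAIR_SCORES:
        return PAIR_SCORES[(x, y)]
    return PAIR_SCORES[(y, x)]


def score_protein_alignment(a, b):
    """Sum the pair scores over an ungapped amino acid alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


print(score_protein_alignment("LID", "LID"))  # identical
print(score_protein_alignment("LID", "ILD"))  # similar residues swapped
print(score_protein_alignment("LID", "DIL"))  # dissimilar residues swapped
```

The three calls reproduce the ordering identical >> similar >> dissimilar.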