Learn about the fundamentals of sequence alignments, such as nucleotide and amino acid sequences, as well as pairwise and multiple alignments. Discover how alignments can help determine gene function, evolutionary relationships, and protein structure. Explore visualization techniques, genetic mutations, and scoring alignments. Understand the differences between global and local alignments and where to find sequences to work with.
Sequence Alignments • Chi-Cheng Lin, Ph.D. • Associate Professor • Department of Computer Science • Winona State University – Rochester Center • clin@winona.edu
Sequence Alignments • Cornerstone of bioinformatics • What is a sequence? • Nucleotide sequence • Amino acid sequence • Pairwise and multiple sequence alignments • What alignments can help with • Determine the function of a newly discovered gene sequence • Determine evolutionary relationships among genes, proteins, and species • Predict the structure and function of proteins
Why Align Sequences? • The draft human genome is available • Automated gene finding is possible • Gene: AGTACGTATCGTATAGCGTAA • What does it do? • One approach: Is there a similar gene in another species? • Align sequences with known genes • Find the gene with the “best” match
Visualization of Sequence Alignment • Dot Plot • One of the simplest and oldest methods for sequence alignment • Visualization of regions of similarity • Assign one sequence to the horizontal axis • Assign the other to the vertical axis • Place a dot in every cell where the two sequences match • Diagonal lines mean adjacent regions of identity
A Simple Example • Construct a simple dot plot for TAGTCGATG and TGGTCATC • The alignment is
TAGTCGATG
TGGTC-ATC
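The dot plot described above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's own tool: it assumes the slide's two sequences are TAGTCGATG (horizontal axis) and TGGTCATC (vertical axis), and the function names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(horizontal, vertical):
    """Return one text row per vertical residue: '*' for a match, '.' otherwise."""
    rows = []
    for v in vertical:
        rows.append("".join("*" if h == v else "." for h in horizontal))
    return rows


def render(horizontal, vertical):
    """Label the plot with the horizontal sequence on top and the vertical one on the left."""
    lines = ["  " + horizontal]
    for v, row in zip(vertical, dot_plot(horizontal, vertical)):
        lines.append(v + " " + row)
    return "\n".join(lines)


# Assumed reading of the slide: two sequences, one per axis.
print(render("TAGTCGATG", "TGGTCATC"))
```

Runs of `*` along a diagonal in the printed grid correspond to the adjacent regions of identity mentioned on the previous slide.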
Genes Accumulate Mutations over Time • Mistakes in gene replication or repair • Deletions, duplications • Insertions, inversions • Translocations • Point mutations • Environmental factors • Radiation • Oxidation
Deletions • Codon deletion: ACG ATA GCG TAT GTA TAG CCG… • Effect depends on the protein, position, etc. • Almost always deleterious • Sometimes lethal • Frame shift mutation: ACG ATA GCG TAT GTA TAG CCG… → ACG ATA GCG ATG TAT AGC CG?… • Almost always lethal
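The difference between the two kinds of deletion is easy to see by re-splitting the sequence into codons. The sketch below uses the slide's sequence; which codon the original slide deleted is not recoverable from the text, so removing TAT is an assumption made only for illustration, and `codons` is a hypothetical helper name.

```python
def codons(seq):
    """Split a nucleotide string into consecutive groups of three."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


gene = "ACGATAGCGTATGTATAGCCG"

# Codon deletion (assumed here to be the fourth codon, TAT): the reading frame is kept.
codon_deleted = gene[:9] + gene[12:]

# Single-base deletion (the first T of TAT): every downstream codon changes.
frame_shifted = gene[:9] + gene[10:]

print(" ".join(codons(gene)))           # ACG ATA GCG TAT GTA TAG CCG
print(" ".join(codons(codon_deleted)))  # ACG ATA GCG GTA TAG CCG
print(" ".join(codons(frame_shifted)))  # ACG ATA GCG ATG TAT AGC CG
```

After the single-base deletion the last group is incomplete, which is what the "CG?" on the slide indicates.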
Indels • When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
ACGTCTGATACGCCGTATCGTCTATCT
ACGTCTGAT---CCGTATCGTCTATCT
The Genetic Code • Substitutions are mutations accepted by natural selection. • Synonymous: CGC → CGA (both code for Arg) • Non-synonymous: GAU → GAA (Asp → Glu)
Point Mutation Example: Sickle-cell Disease • Wild-type hemoglobin DNA: 3’----CTT----5’ • mRNA: 5’----GAA----3’ • Normal hemoglobin: ------[Glu]------ • Mutant hemoglobin DNA: 3’----CAT----5’ • mRNA: 5’----GUA----3’ • Mutant hemoglobin: ------[Val]------
image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.
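The substitution examples on the two preceding slides can be checked with a small lookup. This is a sketch under stated assumptions: `CODON_TABLE` holds only the five codons that appear on the slides (not the full genetic code), and the function names are hypothetical.

```python
# Partial codon table: only the codons used on the slides.
CODON_TABLE = {"CGC": "Arg", "CGA": "Arg", "GAU": "Asp", "GAA": "Glu", "GUA": "Val"}

# Base pairing from a DNA template strand to mRNA.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Transcribe a DNA template written 3'->5' into mRNA written 5'->3'."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_3_to_5)


def classify(before, after):
    """Label a codon substitution by whether the encoded amino acid changes."""
    same = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same else "non-synonymous"


print(classify("CGC", "CGA"))  # synonymous
print(classify("GAU", "GAA"))  # non-synonymous

for template in ("CTT", "CAT"):  # wild-type and mutant template codons
    mrna = transcribe(template)
    print(template, "->", mrna, "->", CODON_TABLE[mrna])
```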
Comparing Two Sequences • Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
• Indels are difficult, must align sequences:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
becomes
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
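For the point-mutation case, the two sequences have the same length, so a position-by-position comparison is enough. A minimal sketch (the function name `mismatch_positions` is hypothetical; positions are 0-based):

```python
def mismatch_positions(a, b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length; indels need an alignment")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


first = "ACGTCTGATACGCCGTATAGTCTATCT"
second = "ACGTCTGATTCGCCCTATCGTCTATCT"
print(mismatch_positions(first, second))  # [9, 14, 18]
```

The same function refuses the indel case because the lengths differ (27 versus 20), which is why those sequences must be aligned first.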
Scoring a Sequence Alignment • Example • Match score: +1 • Mismatch score: 0 • Gap penalty: -1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
• Matches: 18 × (+1) • Mismatches: 2 × 0 • Gaps: 7 × (-1) • Score = 18 + 0 + (-7) = +11 • Various scoring schemes exist.
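The scoring rule on this slide can be written out directly. The sketch below assumes the slide's scheme (match +1, mismatch 0, gap -1) as default parameters; `score_alignment` is a hypothetical name, and other schemes can be passed in as arguments.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Count matches, mismatches, and gap columns of an alignment and total the score."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1          # a column containing a gap character
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


print(score_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                      "----CTGATTCGC---ATCGTCTATCT"))
# (18, 2, 7, 11)
```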
How can we find an optimal alignment? • Finding the best alignment by trying every possibility is computationally hard:
ACGTCTGATACGCCGTATAGTCTATCT
CTGAT---TCG-CATCGTC--T-ATCT
• There are ~888,000 ways to align the two sequences given above. • Algorithms using a technique called “dynamic programming” are used; they are outside the scope of this workshop.
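Although the algorithms are outside the workshop's scope, a short sketch may help make the two claims concrete. The count of ~888,000 is consistent with choosing which 7 of the 27 columns hold a gap in the shorter sequence, C(27, 7) = 888,030; that reading is an inference, not something the slide states. The dynamic-programming function is a generic global-alignment scorer in the style of Needleman-Wunsch, assuming the earlier slide's scheme (match +1, mismatch 0, gap -1); `global_alignment_score` is a hypothetical name. Requires Python 3.8+ for `math.comb`.

```python
from math import comb


def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score of a and b, computed row by row."""
    # prev[j] = best score aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing: i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


long_seq = "ACGTCTGATACGCCGTATAGTCTATCT"
short_seq = "CTGATTCGCATCGTCTATCT"

# Ways to place the 7 gap columns among 27 positions.
print(comb(len(long_seq), len(long_seq) - len(short_seq)))  # 888030

# The table has only 28 x 21 cells, far fewer than 888,030 candidate alignments.
print(global_alignment_score(long_seq, short_seq))
```

The function returns only the score; recovering the alignment itself would need a traceback through the table, which is omitted here to keep the sketch short.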
Global and Local Alignments • Global alignment: score the alignment over the entire length of both sequences • Local alignment: find the best-matching subsequences • Why local sequence alignment? • Global alignment is useful only if the sequences to be aligned are very similar • Subsequence comparison between a DNA sequence and a genome • Identify • Conserved regions • Protein functional domains
Example • Compare the two sequences: TTGACACCCTCCCAATT and ACCCCAGGCTTTACACAG
• Global alignment (does it look good?)
TTGACACCCTCC-CAATT
    ||  ||   ||
ACCCCAGGCTTTACACAG
• Local alignment (does it look good?)
---------TTGACACCCTCCCAATT
         || ||||
ACCCCAGGCTTTACACAG--------
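A local alignment score can be computed with a small change to the dynamic-programming idea: no cell is allowed to drop below zero, so a good match can start and end anywhere. The sketch below is in the style of Smith-Waterman. The scoring values (match +1, mismatch -1, gap -1) are an assumption chosen for illustration, since the slides do not give a scheme for local alignment, and `local_alignment_score` is a hypothetical name.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over all pairs of subsequences of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Resetting to 0 lets an alignment start fresh at any cell.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG"))
```

With a mismatch score of 0, as in the earlier scoring slide, extending a local alignment through mismatches would cost nothing, which is why a negative mismatch score is assumed here.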
Where do we get sequences to work with? • Biological databases • NCBI Entrez (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=) • Wet labs • Simulations • Other people’s results • On-line education resources • BEDROCK (http://www.bioquest.org/bedrock/) • BLAST results