Doug Raiford Lesson 4. Evolution.

Darwinian Evolution
• Heritable traits
• Variation in population (parental combinations and mutational changes)
• Visible, phenotypical differences lead to different survival rates (natural selection)

Meiosis
• Parental combinations
• Each of us has two copies of each chromosome (diploid)
  • One allele from each parent
  • Allele: one of a series of different forms of a gene
  • Each chromatid has a copy of each gene
• Sperm and egg have only one copy

DNA and evolution
• Over time, genes accumulate mutations
  • Environmental factors
    • Radiation
    • Oxidation
  • Mistakes in replication or repair
• Evolution: change in allele frequency over time

Classification
• In the past, phenotypical differences were used to classify species
• Now that genomes have been sequenced, we can look at the similarity of DNA
  • Chimps and humans are ~99% identical
  • They diverged about 6 million years ago

Homologs, Paralogs, and Orthologs
• To compare species, we must look at similar genes
• Homologous genes: similar because of shared ancestry
  • Orthologous genes: separated by speciation
  • Paralogous genes: similar due to a gene duplication event

Why do homologs drift apart? Types of mutations
• Point mutations
• Insertions, deletions
• Duplications, inversions, translocations
  • Remember paralogs?
• Causes
  • Radiation (cosmic, UV, X-ray)
  • Errors in replication (mitosis) or crossover (meiosis)

If we want to compare two homologous genes…
• Point mutations (substitutions) are easy: compare the sequences position by position
    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT

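The position-by-position comparison of the two sequences above can be sketched in a few lines of Perl. The function name `count_substitutions` is made up for this illustration, and the sketch assumes the two sequences have equal length (no indels).

```perl
use strict;
use warnings;

# Count positions where two equal-length sequences differ (substitutions only).
sub count_substitutions {
    my ($seq1, $seq2) = @_;
    die "sequences must be the same length\n"
        if length($seq1) != length($seq2);

    my $count = 0;
    for my $i (0 .. length($seq1) - 1) {
        $count++ if substr($seq1, $i, 1) ne substr($seq2, $i, 1);
    }
    return $count;
}

my $gene1 = "ACGTCTGATACGCCGTATAGTCTATCT";
my $gene2 = "ACGTCTGATTCGCCCTATCGTCTATCT";
print count_substitutions($gene1, $gene2), "\n";   # prints 3
```

This only works because substitutions leave every other base in the same column; once a base is inserted or deleted, the columns no longer line up.
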
Insertions and deletions
• Indels are difficult; the sequences must be aligned first
  Unaligned:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT
  Aligned:
    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT

Deletions
• Codon deletion (a whole triplet is removed):
    ACG ATA GCG TAT GTA TAG CCG…
  • Effect depends on the protein, position, etc.
  • Almost always deleterious
  • Sometimes lethal
• Frame shift mutation (e.g., many cases of Duchenne muscular dystrophy; sickle-cell disease, by contrast, is caused by a point substitution):
    ACG ATA GCG TAT GTA TAG CCG…
    ACG ATA GCG ATG TAT AGC CG?…
  • Almost always lethal

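To see why a single-base deletion shifts every downstream codon, here is a small Perl sketch that reproduces the frame shift example above. The helper name `codons` is an assumption for illustration; it simply cuts a string into triplets and keeps any trailing partial codon.

```perl
use strict;
use warnings;

# Split a sequence into codons (triplets); a trailing partial codon is kept.
sub codons {
    my ($seq) = @_;
    return $seq =~ /(.{1,3})/g;
}

my $original = "ACGATAGCGTATGTATAGCCG";

# Delete one base at 0-based position 9 (the first T of the fourth codon).
my $shifted = $original;
substr($shifted, 9, 1, "");

print join(" ", codons($original)), "\n";   # ACG ATA GCG TAT GTA TAG CCG
print join(" ", codons($shifted)),  "\n";   # ACG ATA GCG ATG TAT AGC CG
```

Every codon after the deletion point changes, which is why the effect is usually far more severe than losing one whole codon.
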
Insertion or deletion?
• When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
    ACGTCTGATACGCCGTATCGTCTATCT
    ACGTCTGAT---CCGTATCGTCTATCT

Synonymous and non-synonymous substitutions
• Synonymous: serine stays serine (UCU to UCC)
  • No change in protein
• Non-synonymous: serine to tyrosine (UCU to UAU)
  • Does it change the protein?

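A substitution can be classified by looking both codons up in the genetic code. The Perl sketch below uses only a small excerpt of the standard codon table (the four UCN serine codons and the two tyrosine codons); the hash name `%amino_acid_for` and the function `is_synonymous` are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Small excerpt of the standard genetic code (RNA codons); not the full table.
my %amino_acid_for = (
    UCU => 'Ser', UCC => 'Ser', UCA => 'Ser', UCG => 'Ser',
    UAU => 'Tyr', UAC => 'Tyr',
);

# Return 1 if both codons encode the same amino acid, 0 otherwise.
sub is_synonymous {
    my ($before, $after) = @_;
    for my $codon ($before, $after) {
        die "codon $codon is not in the excerpt\n"
            unless exists $amino_acid_for{$codon};
    }
    return $amino_acid_for{$before} eq $amino_acid_for{$after} ? 1 : 0;
}

for my $change (["UCU", "UCC"], ["UCU", "UAU"]) {
    my ($before, $after) = @$change;
    printf "%s -> %s: %s\n", $before, $after,
        is_synonymous($before, $after) ? "synonymous" : "non-synonymous";
}
```

A third-position change within the UCN family leaves serine intact, while the second-position change to UAU swaps in tyrosine.
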
Gene duplication event
• One copy can continue to perform the original function
• The other can accumulate mutations (an experiment)
• The two copies are paralogs
• Nature experiments

Why align sequences?
• Already said…
  • Can then measure differences between genes to determine evolutionary distance
  • See where indels and substitutions are
• Why else?
  • What if we wanted to do a database search (a sequence query against a database of sequences)?
  • Databases are great at perfect matches
  • But to find homologous genes we need "fuzzy" matches

But, how to align?
• Exhaustively: could try all possible alignments
  From
    ---ACGT
    ACT----
  To
    ACGT---
    ----ACT
  and everything in between

Exhaustive placement of spaces
• Could set up a loop and place gaps in all possible locations
• Or, could solve recursively
    ---ACGT
    ACT----

    --A-CGT
    ACT----

    --AC-GT
    ACT----
    . . .
    ACGT---
    ----ACT
• Tricky
  • Have to avoid all gap-gap situations
  • Must find a way to look at ALL possibilities

Recursion
• A function that calls itself
• Can often be an elegant solution to difficult problems
  • Elegant: a non-obvious solution that is much simpler in design than the problem would suggest
• Example: factorial
  • Definition of f(n): return n * f(n-1)

A more practical example
• Factorial recurrence relation
  • factorial(n) = n * factorial(n-1)
• Define f(n): return n * f(n-1)
• Example: f(3)
  • f(3) = 3 * f(2)
  • f(2) = 2 * f(1)
  • f(1) = 1
  • So f(3) = 3 * 2 * 1 = 6
• How did the program know to stop at f(1)?

Base case
• To know when to stop, must have a base case
• Base case for factorial is when n equals 1 (checking n <= 1 also stops factorial(0) from recursing forever)

    #Example
    use strict;
    use warnings;

    my $answer = factorial(10);
    print "$answer\n";

    sub factorial {
        my $passedArg = shift;
        # check base case
        if ($passedArg <= 1) {
            return 1;
        } else {
            # if base case not satisfied, recurse
            return $passedArg * factorial($passedArg - 1);
        }
    }

What does this have to do with aligning?
• Recursive solutions can usually be found by breaking a problem into sub-problems
• Three sub-problems:
  • Insert no gap, recurse on rest
  • Insert a gap in string 1, recurse on rest
  • Insert a gap in string 2, recurse on rest

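Example
• Strings: atg and acg
• First level of sub-problems:
  • a over a, then recurse on tg and cg
  • _ over a, then recurse on atg and cg
  • a over _, then recurse on tg and acg
• Second level (under "a over a"):
  • t over c, then recurse on g and g
  • _ over c, then recurse on tg and g
  • t over _, then recurse on g and cg
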
Scoring
• When scoring alignments there must be a gap penalty, a mismatch penalty, and a bonus for a match
• For any two strings the best alignment score will be the maximum of three possibilities
• Recurrence relations (take the max of):
    Match or mismatch of first chars + allign(rest of string1, rest of string2)
    Gap penalty + allign(string1, rest of string2)
    Gap penalty + allign(rest of string1, string2)

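Before searching for the best alignment, it helps to see how one finished alignment is scored column by column. The Perl sketch below is an illustration only: the score values (+1 match, -1 mismatch, -2 gap) and the name `score_alignment` are assumptions, since the lesson does not fix particular numbers. Either `_` or `-` is accepted as the gap character.

```perl
use strict;
use warnings;

# Hypothetical scores chosen for illustration.
my $MATCH    =  1;
my $MISMATCH = -1;
my $GAP      = -2;

# Score two already-aligned rows of equal length, one column at a time.
sub score_alignment {
    my ($row1, $row2) = @_;
    die "aligned rows must be the same length\n"
        if length($row1) != length($row2);

    my $score = 0;
    for my $i (0 .. length($row1) - 1) {
        my $c1 = substr($row1, $i, 1);
        my $c2 = substr($row2, $i, 1);
        if ($c1 =~ /[-_]/ || $c2 =~ /[-_]/) {
            $score += $GAP;        # a gap in either row
        } elsif ($c1 eq $c2) {
            $score += $MATCH;      # identical characters
        } else {
            $score += $MISMATCH;   # substitution
        }
    }
    return $score;
}

# Seven matching columns and one gap: 7 * 1 + (-2) = 5
print score_alignment("atagcgcc", "atag_gcc"), "\n";   # prints 5
```

With these values a gap costs more than a mismatch, so the scoring prefers substitutions over indels when both are possible.
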
What is the base case?
• Base case: either string is down to the empty string
• Return gap penalty * the length of the non-empty string (return 0 if both are empty)

Pseudo code
• Definition of allign(string1, string2)

    If base case satisfied
        return base score
    Otherwise
        return the max of
            Gap penalty + allign(string1, rest of string2)
            Match or mismatch of first chars + allign(rest of string1, rest of string2)
            Gap penalty + allign(rest of string1, string2)

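The pseudo code above translates almost line for line into Perl. This is a minimal sketch rather than the lesson's own program: the score values (+1 match, -1 mismatch, -2 gap) are assumptions, and the function keeps the slides' spelling `allign`. It returns only the best score, not the alignment itself.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical scores chosen for illustration.
my $MATCH    =  1;
my $MISMATCH = -1;
my $GAP      = -2;

# Best alignment score of two strings, by plain (exponential) recursion.
sub allign {
    my ($s1, $s2) = @_;

    # Base case: one string is empty, so the rest of the other pairs with gaps.
    return $GAP * length($s2) if $s1 eq "";
    return $GAP * length($s1) if $s2 eq "";

    my $rest1 = substr($s1, 1);
    my $rest2 = substr($s2, 1);
    my $pair  = substr($s1, 0, 1) eq substr($s2, 0, 1) ? $MATCH : $MISMATCH;

    return max(
        $pair + allign($rest1, $rest2),   # no gap: pair the first characters
        $GAP  + allign($s1,    $rest2),   # gap in string 1
        $GAP  + allign($rest1, $s2),      # gap in string 2
    );
}

print allign("atg", "acg"), "\n";   # prints 1 (match, mismatch, match)
```

Short strings finish instantly, but the running time grows very quickly with length because the same sub-problems are solved again and again.
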
Example
• These two strings
    atagcgcc
    ataggcc
• Align like…
    atagcgcc
    atag_gcc
• Have now taken a problem in biology and mapped it to a common problem-solving technique in computer science: recursion

Life is good, but…
• The previous example (an 8-character string aligned with a 7-character string) took 103,342 invocations of allign
• Why?
  • Each call spawns three sub-problems, each of those spawns three more, and so on
  • Number of sub-problems is on the order of 3^n
  • This is exponential

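The growth in the number of calls can be observed directly by counting them. The sketch below is an assumption-laden illustration: `count_calls` is a made-up name, it works on string lengths only, and it stops as soon as either length reaches zero. The exact count depends on how the base case is written, so it need not reproduce the figure quoted above.

```perl
use strict;
use warnings;

# Number of invocations made by the three-way recursion on strings of the
# given lengths, counting the current call as one.
sub count_calls {
    my ($len1, $len2) = @_;
    return 1 if $len1 == 0 || $len2 == 0;   # base case: no further calls
    return 1
        + count_calls($len1 - 1, $len2 - 1)   # no gap
        + count_calls($len1,     $len2 - 1)   # gap in string 1
        + count_calls($len1 - 1, $len2);      # gap in string 2
}

for my $n (1 .. 6) {
    printf "%d x %d: %d calls\n", $n, $n, count_calls($n, $n);
}
```

Each extra character multiplies the work by a roughly constant factor, which is the signature of exponential growth.
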
Is exponential bad?
• Aligning two strings of size 500
  • On the order of 3^500 (about 3.6 * 10^238) invocations of allign
  • More invocations than there are subatomic particles in the universe
• If it took one nanosecond per invocation
  • Universe is 14 billion years old
  • It would take about 8.2 * 10^211 times the age of the universe to calculate the alignment score
• Exponential = bad
• Corollary: tree (of recursive calls) = exponential

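The size of this estimate can be checked with logarithms. The Perl sketch below assumes 3^500 invocations, one nanosecond per invocation, a 14-billion-year-old universe, and 365.25-day years; the variable names are illustrative. Under those assumptions the ratio comes out near 8.2 * 10^211.

```perl
use strict;
use warnings;

my $n                    = 500;
my $seconds_per_call     = 1e-9;                       # one nanosecond
my $age_universe_seconds = 14e9 * 365.25 * 24 * 3600;  # 14 billion years

# Work in log10 to keep the numbers readable.
my $log10_calls = $n * log(3) / log(10);               # log10 of 3^500
my $log10_ratio = $log10_calls
                + log($seconds_per_call) / log(10)
                - log($age_universe_seconds) / log(10);

my $exponent = int($log10_ratio);                      # whole power of ten
my $mantissa = 10 ** ($log10_ratio - $exponent);       # leading digits

printf "3^%d is about 10^%.1f calls\n", $n, $log10_calls;
printf "about %.1f x 10^%d times the age of the universe\n",
    $mantissa, $exponent;
```

Changing the time per call by a factor of a thousand moves the exponent by only three, so faster hardware does not rescue an exponential algorithm.
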
Complexity analysis
• Fixed (constant): best
• Linear: next best
• Polynomial (n^2): not bad
• Exponential (3^n): very bad
• Big O notation
  • O(1), O(n), O(n^3), O(3^n)

Next lesson
• Speeding things up
• Dynamic programming solution