Chapter Six in R Book Chapter 5 in Mount. M.M. Dalkilic. Lecture VI. Outline. Algorithm vs. Heuristic Statistics FASTA & BLAST brief introduction. Algorithm. General, well-specified sequence of instructions capable of being run on a Turing-complete computing device or formalism.
Chapter Six in R Book • Chapter 5 in Mount • M.M. Dalkilic • Lecture VI
Outline • Algorithm vs. Heuristic • Statistics • FASTA & BLAST brief introduction
Algorithm • General, well-specified sequence of instructions capable of being run on a Turing-complete computing device or formalism. • You’ve already used bottom-up (BU) dynamic programming to find the minimal path from s to t in a structure [Figure: graph with nodes s, w, x, y, z, t]
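As a reminder of the bottom-up idea, here is a minimal Python sketch. The graph in the slide figure is not reproduced in this transcript, so the edge weights in `EDGES` and the processing order in `ORDER` are hypothetical stand-ins, not the values from the lecture.

```python
# Hypothetical weighted DAG on the node names from the slide (weights are made up).
EDGES = {
    "s": {"w": 2, "y": 4},
    "w": {"x": 7, "z": 10},
    "y": {"x": 1, "z": 5},
    "x": {"t": 2},
    "z": {"t": 6},
    "t": {},
}
# Reverse topological order: each node is listed after all of its successors.
ORDER = ["t", "x", "z", "w", "y", "s"]


def min_path_bottom_up(edges, order, source, target):
    """Return (score, path) of the cheapest source-to-target path."""
    # best[node] holds the cheapest (score, path) from node to target.
    best = {target: (0, [target])}
    for node in order:
        if node == target:
            continue
        candidates = [
            (weight + best[nxt][0], [node] + best[nxt][1])
            for nxt, weight in edges[node].items()
            if nxt in best
        ]
        if candidates:
            best[node] = min(candidates)
    return best[source]


score, path = min_path_bottom_up(EDGES, ORDER, "s", "t")
print(path, score)
```

Each subproblem (cheapest path from a node to t) is solved once and reused, which is what distinguishes this from enumerating every path.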
Algorithm • Formal parameters (signature): Input (list of weights), Output (path + score) • Did you make your program generic? • Error checking? • Two other solutions [Figure: graph with nodes s, w, x, y, z, t]
Algorithm • Formal parameters (signature): Input (list of weights), Output (path + score) • Solution (I): Enumerate all (path + score) pairs • (p1 + s1), (p2 + s2), … • Linearly search for the pi with minimal score si
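Solution (I) can be sketched in Python as follows. The graph weights are hypothetical (the slide figure is not available here), and the function names `enumerate_paths` and `min_by_linear_search` are illustrative choices, not names from the lecture.

```python
# Hypothetical weighted DAG (weights are made up for illustration).
EDGES = {
    "s": {"w": 2, "y": 4},
    "w": {"x": 7, "z": 10},
    "y": {"x": 1, "z": 5},
    "x": {"t": 2},
    "z": {"t": 6},
    "t": {},
}


def enumerate_paths(edges, node, target, path=None, score=0):
    """Yield every (path, score) pair from node to target."""
    path = (path or []) + [node]
    if node == target:
        yield path, score
        return
    for nxt, weight in edges[node].items():
        yield from enumerate_paths(edges, nxt, target, path, score + weight)


def min_by_linear_search(pairs):
    """Scan the (path, score) pairs once and keep the one with the lowest score."""
    best = None
    for path, score in pairs:
        if best is None or score < best[1]:
            best = (path, score)
    return best


all_pairs = list(enumerate_paths(EDGES, "s", "t"))
print(len(all_pairs), min_by_linear_search(all_pairs))
```

This always finds the true minimum, but the number of paths can grow exponentially with the size of the graph, which is the motivation for the other two solutions.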
Heuristic • Formal parameters (signature): Input (list of weights, threshold t, fitness function f, error e), Output (path + score) • Solution (II): Genetic Program (a sketch, so you can work out some of the details yourselves) • (1) Encode solution into binary form sb • (2) Randomly change bits in sb to create a family of solutions S = {sb, sb1, sb2, … , sbk} • (3) Form S' = {sbi | f(sbi) > t} • (4) Limit ← max(f(S')) • (5) If |Previous Limit – Current Limit| < e, return the sbi with maximum fitness • (6) Form S'' by randomly swapping bits BETWEEN solutions • (7) Change a few bits randomly in a few solutions of S'' • (8) GOTO 3
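One possible way to fill in the sketch is shown below in Python. Everything concrete here is an assumption made for illustration: the `WEIGHTS` table (cost of choosing bit 0 or 1 at each step), the fitness as negative path score, the family size `k`, the mutation rates, the single-point crossover, the fallback when S' is empty, and the `max_iter` guard. The fixed `seed` is only there to make the run repeatable.

```python
import random

# Hypothetical costs: WEIGHTS[step][bit] is the cost of taking `bit` at `step`.
WEIGHTS = [(3, 1), (2, 5), (4, 2), (1, 6)]


def path_score(bits):
    return sum(WEIGHTS[i][b] for i, b in enumerate(bits))


def fitness(bits):
    # Higher is better, so a short path gets a high (less negative) fitness.
    return -path_score(bits)


def mutate(bits, rng, rate):
    # Flip each bit independently with probability `rate`.
    return tuple(b ^ 1 if rng.random() < rate else b for b in bits)


def crossover(a, b, rng):
    # Swap bits BETWEEN two solutions at a random cut point.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(t, e, k=12, max_iter=50, seed=0):
    rng = random.Random(seed)
    sb = tuple(rng.randint(0, 1) for _ in WEIGHTS)            # step 1
    family = [sb] + [mutate(sb, rng, 0.5) for _ in range(k)]  # step 2
    previous = None
    best = max(family, key=fitness)
    for _ in range(max_iter):                                 # iteration guard
        survivors = [s for s in family if fitness(s) > t]     # step 3
        if not survivors:                                     # assumed fallback
            survivors = [max(family, key=fitness)]
        best = max(survivors, key=fitness)
        limit = fitness(best)                                 # step 4
        if previous is not None and abs(previous - limit) < e:
            break                                             # step 5
        previous = limit
        family = [crossover(rng.choice(survivors), rng.choice(survivors), rng)
                  for _ in range(k)]                          # step 6
        family = [mutate(s, rng, 0.1) for s in family]        # step 7
    return best, path_score(best)


bits, score = genetic_search(t=-12, e=0.5)
print(bits, score)
```

The returned path is a good candidate, not a guaranteed minimum; changing `seed`, `t`, or `e` can change the answer.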
Heuristic • Formal parameters (signature): Input (list of weights, threshold t, fitness function f, error e) • The fitness function measures the “goodness” of a solution • The error is the degree to which you’re willing to differ from the actual solution, were it to exist (think about that)
Heuristic • Problems • Local optima (in this case convex areas) • Difficult to search the entire space (kangaroos in the mist), so you must sometimes make “leaps of faith” • Not guaranteed to converge, so you need to keep track of iterations • Does not necessarily produce the same output, given the same input
More Statistics • Recall that the F test is the ratio of two estimates of the population variance: the estimate from the sample means divided by the estimate from the average of the sample variances • A large F value indicates a difference among the groups; a small one indicates no difference. “Large” can be associated with a P value: the uncertainty you’re willing to accept in assuming the F value truly reflects the population.
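The ratio can be computed by hand as in the Python sketch below. It assumes equal-sized groups, and the sample data are made up purely to show the arithmetic.

```python
from statistics import mean, variance


def f_statistic(groups):
    """F = (variance estimated from sample means) / (average sample variance).

    Assumes every group has the same number of observations n.
    """
    n = len(groups[0])
    group_means = [mean(g) for g in groups]
    # Sample means vary with variance sigma^2 / n, so scale back up by n.
    between = n * variance(group_means)
    within = mean([variance(g) for g in groups])
    return between / within


# Made-up data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(f_statistic(groups))
```

When the groups come from the same population both estimates target the same variance and F is near 1; group means that are far apart inflate the numerator only.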
Handout: comparing two variances (Matlab and P values)
t Test • Most often used test • Most often incorrectly used test • Cannot run a series of pairwise comparisons between groups without taking into account all parameters that affect the P value • t = ratio of the difference of sample means to the standard error of the difference of sample means • When there are two samples, F = t^2
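The identity F = t^2 for two samples can be checked numerically. The Python sketch below assumes the pooled-variance (equal-variance) form of the t test, which is the form for which the identity holds; the data are made up.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err


def f_two_groups(a, b):
    """One-way ANOVA F statistic specialised to two groups."""
    na, nb = len(a), len(b)
    grand_mean = mean(a + b)
    # Between-group mean square has 2 - 1 = 1 degree of freedom.
    between = na * (mean(a) - grand_mean) ** 2 + nb * (mean(b) - grand_mean) ** 2
    within = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return between / within


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
t = pooled_t(a, b)
print(t ** 2, f_two_groups(a, b))
```

With more than two groups there is no single t to square, which is one reason the F test is used there instead of repeated t tests.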
t Test • What if samples differ?
To Do’s Due Next Friday • Pick a disease of unknown etiology and begin accumulating papers on it (minimum of 10) • Rewrite solutions to the BU DP problem using Sol (I) and Sol (II). • Create a 2D plot in R of the solutions you generate from the above. The abscissa is a number created by prefixing the starting node on the leftmost side of the graph (numbered 1, 2, 3, 4, 5 from top to bottom) to the base-10 value of the sequence of 1’s and 0’s for up and down, respectively. The bottommost path would be 5 prefixed to (1010)_2 = 10, giving 510. This is paired with the value of the path, 17. You would then have a point at (510, 17). Plot Sol (I) in RED and Sol (II) in BLUE. Interpret the graph with respect to the search space and solution. • Problem 1, page 222 in Mount • What does TFIIIA bind to? Using BLAST, what orthologues do you find? What is its function? • You have three groups: Control, Group A, Group B. What do you conclude from the data given next about the groups: