Basic String Alignment
• Probability theory and statistics
• String alignment problem
• Basic string alignment algorithms
Author: Roel Wijgers, email: rwijgers@cs.uu.nl
Probability Theory
• Conditional probability: P(A|B) = P(A ∧ B) / P(B), defined when P(B) > 0
• Independence of A and B: A and B are independent when P(A ∧ B) = P(A)P(B)
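The two definitions above can be checked numerically. The sketch below uses a made-up joint distribution over two events A and B (the numbers are hypothetical, chosen only for illustration) and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution over the outcomes (A occurs?, B occurs?).
joint = {
    (True, True): Fraction(1, 8),
    (True, False): Fraction(3, 8),
    (False, True): Fraction(1, 8),
    (False, False): Fraction(3, 8),
}

def prob(event):
    """Sum the joint probability over outcomes (a, b) for which event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_and_b = prob(lambda a, b: a and b)

# Conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# Independence: P(A and B) = P(A) P(B).
independent = p_a_and_b == p_a * p_b
```

In this example P(A|B) equals P(A), which is another way of seeing that A and B are independent.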
String Alignment
• No gaps allowed
• Gaps allowed in one of the strings
• Gaps allowed in both strings
Matching models
The random model: each letter a occurs independently with some frequency q_a. This means that the probability of two sequences x and y is the product of the frequencies of all their letters.
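Written out, the random model gives the product below. The label R for the random model and the index names i and j are notation assumed here, not taken from the slide.

```latex
\[
  P(x, y \mid R) \;=\; \prod_{i} q_{x_i} \, \prod_{j} q_{y_j}
\]
```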
Matching models(2)
Treating the values x_i and y_j as independent is not very useful by itself. Alignments are instead judged by an odds ratio, which compares a model in which aligned letters are related against the random model.
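One common way to write this odds ratio, for an ungapped alignment of two equally long sequences, is sketched below. The match model M and the joint probabilities p_ab of seeing letters a and b aligned are assumptions introduced here, as is the label R for the random model.

```latex
\[
  P(x, y \mid M) \;=\; \prod_{i} p_{x_i y_i},
  \qquad
  \frac{P(x, y \mid M)}{P(x, y \mid R)}
  \;=\; \prod_{i} \frac{p_{x_i y_i}}{q_{x_i}\, q_{y_i}}
\]
```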
Matching models(3)
We would rather have an additive scoring system, so we take the logarithm of the odds ratio. This scoring system is called the log-odds ratio; the score of each aligned pair of letters is the corresponding log-likelihood ratio.
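A minimal sketch of such an additive score, assuming a two-letter alphabet with made-up frequencies q and made-up aligned-pair probabilities p (all values are hypothetical). The pair score s(a, b) is the log-likelihood ratio, and the alignment score is the sum of the pair scores.

```python
import math

# Hypothetical background frequencies q_a for a two-letter alphabet.
q = {"A": 0.5, "B": 0.5}

# Hypothetical probabilities p_ab of seeing a and b aligned (they sum to 1).
p = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}

def s(a, b):
    """Log-likelihood ratio of the pair (a, b): log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

def score(x, y):
    """Log-odds score of an ungapped alignment of two equally long strings."""
    if len(x) != len(y):
        raise ValueError("ungapped alignment needs strings of equal length")
    return sum(s(a, b) for a, b in zip(x, y))

def odds_ratio(x, y):
    """Product form of the same quantity, before taking logarithms."""
    return math.prod(p[(a, b)] / (q[a] * q[b]) for a, b in zip(x, y))
```

With these numbers identical pairs get a positive score and differing pairs a negative one, and `score(x, y)` equals the logarithm of `odds_ratio(x, y)`.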
Log-likelihood table
Gap penalties
We expect to penalise gaps. Different functions of the gap length can be used for this, although a linear function is the most common choice.
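As an illustration, the sketch below writes a linear penalty next to an affine one, a widely used alternative with separate gap-open and gap-extend costs. The function names and the default values of d and e are hypothetical.

```python
def linear_gap(g, d=8):
    """Linear penalty: every gap position costs d, so a gap of length g scores -g*d."""
    return -g * d

def affine_gap(g, d=12, e=2):
    """Affine penalty: opening a gap costs d, each further position costs e."""
    if g == 0:
        return 0
    return -d - (g - 1) * e
```

For example, with these defaults a gap of length 3 scores -24 under the linear penalty and -16 under the affine one, so the affine form is more tolerant of long gaps.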
Gap penalties(2)
Here f(g), the probability of a gap of length g, follows a geometric distribution.
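A sketch of the usual probabilistic reading of gap penalties, under assumptions not stated on the slide: the probability of a gap is f(g) times the random-model probability of the inserted letters, those letter terms cancel in the odds ratio, and the penalty is what remains. The symbols γ, r and d are introduced here for illustration.

```latex
\[
  P(\text{gap}) \;=\; f(g) \prod_{i \in \text{gap}} q_{x_i},
  \qquad
  \gamma(g) \;=\; \log f(g)
\]
\[
  f(g) \propto r^{g},\quad 0 < r < 1
  \;\;\Longrightarrow\;\;
  \gamma(g) \;=\; -g\,d + \text{const},
  \qquad d = -\log r > 0
\]
```

So a geometric length distribution corresponds, up to a constant, to a penalty that is linear in the gap length.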
Alignment algorithms
Global alignment: Needleman-Wunsch algorithm
Find the optimal global alignment between two sequences, allowing gaps.
Global alignment: Needleman-Wunsch algorithm(2)
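A minimal sketch of the Needleman-Wunsch dynamic programme, assuming a linear gap penalty d and a caller-supplied pair score s(a, b). The function names and the +1/-1 scoring used in the usage note are hypothetical, and integer scores are assumed so that the equality tests in the traceback are exact.

```python
def needleman_wunsch(x, y, s, d):
    """Return (score, aligned_x, aligned_y) for an optimal global alignment."""
    n, m = len(x), len(y)
    # F[i][j] = best score of aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d          # x[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = -j * d          # y[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i - 1][j] - d,                          # x_i against a gap
                F[i][j - 1] - d,                          # y_j against a gap
            )
    # Trace back from the bottom-right corner to the top-left corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))

def match_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1
```

For instance, `needleman_wunsch("AC", "A", match_score, 2)` aligns `AC` against `A-`: the whole of both sequences must be covered, so the unmatched C pays the gap penalty.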
Local alignment: Smith-Waterman algorithm
Find the best alignment between subsequences of x and y.
Local alignment: Smith-Waterman algorithm(2)
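A minimal sketch of the Smith-Waterman dynamic programme, assuming a linear gap penalty d and integer pair scores. It differs from the global case in two ways: every cell may restart at 0, and the traceback runs from the best-scoring cell until a 0 is reached. The function names and the +1/-1 scoring are hypothetical.

```python
def smith_waterman(x, y, s, d):
    """Return (score, aligned_x, aligned_y) for a best local alignment."""
    n, m = len(x), len(y)
    # F[i][j] = best score of a local alignment ending at x_i, y_j (0 = start afresh).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] - d,
                F[i][j - 1] - d,
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until the score drops to 0.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))

def match_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1
```

Calling `smith_waterman("CCCGATTACACCC", "TTGATTACATT", match_score, 2)` picks out the shared `GATTACA` and ignores the unrelated flanks.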
Repeated Matches
Search for multiple local matches.
• One of the sequences is fixed and contains the domain or motif.
• A score threshold T excludes short, low-scoring local alignments.
Repeated Matches(2)
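A sketch of one standard dynamic programme for repeated matches, computing only the total score. It assumes that y is the fixed motif (non-empty), x is the sequence being searched, gaps cost a linear penalty d, and each reported match contributes its score minus T. Column 0 holds the "currently unmatched" state. The function names and the example parameters are hypothetical.

```python
def repeated_match_score(x, y, s, d, T):
    """Best total of (match score - T) over non-overlapping matches of motif y in x."""
    n, m = len(x), len(y)
    # Rows 0..n cover x; the extra row n+1 only closes a match ending at x_n.
    F = [[0] * (m + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        # Unmatched state: stay unmatched, or close a match that ended at x_(i-1).
        F[i][0] = max(F[i - 1][0], max(F[i - 1][j] for j in range(1, m + 1)) - T)
        if i == n + 1:
            break
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i][0],                                  # start a new match here
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # extend along the diagonal
                F[i - 1][j] - d,                          # gap in y
                F[i][j - 1] - d,                          # gap in x
            )
    return F[n + 1][0]

def match_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1
```

With `T = 3`, the sequence `GATTACAXXXGATTACA` contains two copies of the motif `GATTACA`, each scoring 7, so the total is (7 - 3) + (7 - 3) = 8.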
Overlap matches
We expect that one of the sequences contains the other, or that they overlap.
Overlap matches(2)
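A minimal sketch of an overlap alignment score, assuming a linear gap penalty d. Overhanging ends are free: the top row and left column are initialised to 0, and the result is the best value found on the bottom row or right column. The function names and the +1/-1 scoring are hypothetical.

```python
def overlap_score(x, y, s, d):
    """Best score of an alignment that runs from a top/left border to a bottom/right border."""
    n, m = len(x), len(y)
    # Leading overhangs cost nothing: F[i][0] = F[0][j] = 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] - d,
                F[i][j - 1] - d,
            )
    # Trailing overhangs cost nothing either: take the best border cell.
    last_row = max(F[n])
    last_col = max(F[i][m] for i in range(n + 1))
    return max(last_row, last_col)

def match_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1
```

Both cases from the slide are covered: `overlap_score("CCCGATTACA", "GATTACATTT", match_score, 2)` scores a suffix of x against a prefix of y, and `overlap_score("GATTACA", "TTGATTACATT", match_score, 2)` scores x contained in y.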
Questions