Sequence Matching and Alignment Algorithms in the Field of Bioinformatics. Presented by Jennifer Johnstone
Introduction • What is Bioinformatics? • Sequence Matching Problem • The Alignment Problem • Future Research
What is Bioinformatics? Bioinformatics is the application of computers in Biology using algorithms, statistics and other mathematical techniques to decipher the language of DNA.
The Sequence Matching Problem Given a string s of length n and a pattern p of length m, find the set I of all start indices in s at which p matches exactly. Example: Let p = ABA and s = AABAAGTABA. Then I = {2, 8} (counting positions from 1), since ABA occurs at positions 2 to 4 and again at positions 8 to 10 of s.
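The problem statement above can be sketched directly as a brute-force search. This is a minimal illustration, not code from the presentation; the function name `naive_match` is hypothetical, and positions are reported 1-based to agree with the example.

```python
def naive_match(s: str, p: str) -> list:
    """Return the 1-based start positions at which p occurs exactly in s."""
    n, m = len(s), len(p)
    hits = []
    for j in range(n - m + 1):       # every candidate start (0-based)
        if s[j:j + m] == p:          # compares up to m characters
            hits.append(j + 1)       # convert to 1-based, as on the slide
    return hits


print(naive_match("AABAAGTABA", "ABA"))  # [2, 8]
```

Each of the roughly n candidate starts costs up to m character comparisons, which is where an O(m*n) bound for the naive approach comes from.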
Algorithms • Naive String Matching Algorithm, O(m*n). • String Matching with Finite Automata, O((m*|Σ|)+n). • Boyer-Moore Algorithm, O(m+n) (in practice). • String Matching with Compact Suffix Trees, O(n log(n) + m*|Σ| + k). • String Matching using Suffix Arrays, O(n + m log(n) + k).
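As one illustration of the index-based entries in the list above, the following is a rough sketch of matching with a suffix array. It is an assumption-laden simplification: the array is built by plain sorting of suffixes, which is far slower than the construction the stated bound presumes, and the names `build_suffix_array` and `suffix_array_match` are hypothetical. Only the search step, two binary searches over the sorted suffixes, reflects the m log(n) term.

```python
def build_suffix_array(s: str) -> list:
    """Start positions (0-based) of all suffixes of s, in lexicographic order.

    Simplification: sorting whole suffixes is not a linear-time construction.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def suffix_array_match(s: str, p: str, sa: list) -> list:
    """Return the sorted 1-based start positions of p in s using suffix array sa."""
    m = len(p)

    # First suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[first:lo] all start with p.
    return sorted(sa[i] + 1 for i in range(first, lo))


text = "AABAAGTABA"
sa = build_suffix_array(text)
print(suffix_array_match(text, "ABA", sa))  # [2, 8]
```

The index is built once per text and can then be reused for many patterns, which is the main appeal when the text is a large, fixed sequence.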
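String Matching with Finite Automata Given a pattern p = aba and a string s = acbababa, we must first define the transition function δ(q, x), which maps the current state q (the number of pattern characters matched so far) and the next input character x to the next state. Reading s one character at a time, the automaton reaches the accepting state q = 3 after positions i = 6 and i = 8, so the match condition is met there. The starting indices are then j = i - 3 + 1, which gives I = {4, 6}.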
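The automaton run described above can be sketched as follows. This is a minimal sketch under assumptions: the alphabet is taken to be {a, b, c}, the names `build_delta` and `automaton_match` are hypothetical, and δ is built by a simple definition-based loop rather than the faster construction that an O(m*|Σ|) preprocessing bound presumes.

```python
def build_delta(p: str, alphabet: str) -> dict:
    """delta[(q, x)] = length of the longest prefix of p that is a suffix of p[:q] + x."""
    m = len(p)
    delta = {}
    for q in range(m + 1):
        for x in alphabet:
            seen = p[:q] + x
            k = min(m, q + 1)
            # Shrink k until the prefix p[:k] is a suffix of what has been read.
            while k > 0 and not seen.endswith(p[:k]):
                k -= 1
            delta[(q, x)] = k
    return delta


def automaton_match(s: str, p: str, alphabet: str) -> list:
    """Return 1-based start positions of p in s; every character of s must be in alphabet."""
    m = len(p)
    delta = build_delta(p, alphabet)
    q = 0
    starts = []
    for i, x in enumerate(s, start=1):   # i is the 1-based position just read
        q = delta[(q, x)]
        if q == m:                       # accepting state: a match ends at i
            starts.append(i - m + 1)     # j = i - m + 1
    return starts


print(automaton_match("acbababa", "aba", "abc"))  # [4, 6]
```

After preprocessing, the scan touches each character of s exactly once, which accounts for the + n term in the running time.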
The Alignment Problem Given two strings, we want to generate an optimal alignment. Aligning two strings may involve inserting gaps and/or accepting mismatched entries. Example: Consider the following possible alignment of the two strings GACGGATTATG and GATCGGAATAG:
GACGGATTATG
GATCGGAATAG
Dynamic vs. Heuristic Dynamic Programming Approach • Computes an optimal alignment using a dynamic programming matrix and a scoring function, O(m*n). Heuristic Approach • Used in practice to speed up searches on large databases. Consider the human genome, which is over 3 billion characters long, and of which you may need to align only a small portion. • FASTP and FASTA programs • BLAST algorithm
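The dynamic programming approach can be sketched as a global alignment that fills an (m+1) by (n+1) score matrix and then traces back through it. The scoring function here (match +1, mismatch -1, gap -1) is an assumption chosen for illustration, not one given in the presentation, and the name `global_alignment` is hypothetical.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment of a and b."""
    m, n = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1] (1-based matrix indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = global_alignment("GACGGATTATG", "GATCGGAATAG")
print(best)
print(row_a)
print(row_b)
```

Filling the matrix takes constant work per cell, which gives the O(m*n) cost quoted above. That cost is also why heuristic tools are preferred when one of the sequences is genome-sized.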
Future Research • The heuristic approaches are constantly being researched and improved, as the algorithms themselves are only 10 to 15 years old. • Development of tools that can perform a 10-way comparison of genomes. Bioinformatics as a whole is an active field of research that strongly needs qualified professionals who have an aptitude for computing and/or biology.