Incorporating Bioinformatics in an Algorithms Course Lawrence D’Antonio Ramapo College of New Jersey
What is Bioinformatics? • Algorithms to analyze DNA, RNA, or protein sequences • Database searches to find homologous sequences • Construction of evolutionary trees • Structure prediction • Human Genome Project
Why use Bioinformatics in an Algorithms Course? • Real-life applications of algorithms • Variety of string processing algorithms • Use of similarity instead of exact matching • Dynamic programming examples • Theory vs. Practice Issues
Models for Incorporating Bioinformatics • Infusion – include material from bioinformatics in computer science courses • Paired Courses – have joint lectures and projects from, e.g., Algorithms and Genetics courses • Tracked Courses – have a separate Algorithms for Bioinformatics course
Biology Basics • Primary DNA structure – Oriented character string • Double strand constructed through base pairing • Central Dogma – Information passes in one direction, from DNA to RNA to protein • Each amino acid is encoded by a triple of bases, called a codon
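The string view of DNA above can be made concrete with a minimal Python sketch. The function names `reverse_complement` and `codons` are illustrative choices, not from the slides, and the sketch assumes an uppercase strand over the alphabet A, C, G, T.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the paired strand, read in its own orientation (reversed)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def codons(strand):
    """Split a strand into consecutive triples, dropping an incomplete tail."""
    usable = len(strand) - len(strand) % 3
    return [strand[i:i + 3] for i in range(0, usable, 3)]


print(reverse_complement("AATAAGC"))  # GCTTATT
print(codons("ATGGCCTAAG"))           # ['ATG', 'GCC', 'TAA']
```

Treating a strand as an ordinary string is what lets the standard string algorithms of an algorithms course apply directly.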
Complexity of DNA Problems • 3 billion base pairs in the human genome • Many NP-complete problems • 10^600 possible alignments for two 1000-character sequences
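A figure of this size can be reproduced with a short calculation. The sketch below assumes the count in question is the central binomial coefficient C(2n, n), the number of ways to interleave two length-n sequences. That interpretation is an assumption, not something the slide states; other counting conventions give even larger numbers.

```python
import math


def interleaving_count(n):
    """C(2n, n): ways to interleave two sequences of length n (assumed model)."""
    return math.comb(2 * n, n)


count = interleaving_count(1000)
# The number of decimal digits shows the order of magnitude.
print(len(str(count)))  # 601, i.e. the count is on the order of 10^600
```

Since C(2n, n) grows roughly like 4^n, enumerating alignments is hopeless even for short sequences, which motivates the dynamic programming approach.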
Sequence Alignment • Determine the alignment of two sequences that maximizes similarity (global alignment) • Determine substrings of two sequences with maximum similarity (local alignment) • Determine the alignment for several sequences that maximizes the sum of pairs similarity (multiple alignment)
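For the local alignment problem, one standard approach is the Smith-Waterman recurrence, which the slides do not name. The sketch below computes only the best local score. The function name and the default scores (+1 match, -1 mismatch, -2 per space) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best similarity over all pairs of substrings of a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("TTTAAGCGGG", "CCAAGCTT"))  # 4 (shared "AAGC")
```

Keeping only two rows of the table suffices when just the score is needed. Recovering the substrings themselves would require the full table or stored back-pointers.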
Edit Operations • Substitution: AATAAGC → ATTAAGC • Insertion: AAT-AAGC → AATTAAGC • Deletion: AATAAGC → AA-AAGC
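Counting the fewest such operations needed to turn one sequence into another gives the classic edit (Levenshtein) distance. The following is a minimal sketch with unit cost per operation. The function name `edit_distance` is illustrative, and the unit costs are an assumption rather than something the slides specify.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions from a to b."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
            ))
        prev = curr
    return prev[-1]


print(edit_distance("AATAAGC", "ATTAAGC"))   # 1 (substitution)
print(edit_distance("AATAAGC", "AATTAAGC"))  # 1 (insertion)
print(edit_distance("AATAAGC", "AAAAGC"))    # 1 (deletion)
```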
Dynamic Programming Alignment Algorithm (Needleman-Wunsch) If a_1, a_2, …, a_i and b_1, b_2, …, b_j have been aligned, there are three possible next moves: • Match a_{i+1} with b_{j+1} • Match a_{i+1} with a space (-) • Match b_{j+1} with a space (-) Choose the move that maximizes the similarity of the two sequences
Alignment Scoring System • +1 for a character match • -1 for a mismatch (substitution) • -2 for each space (indel), or • a penalty of a + b·k for a gap of k spaces (affine gap penalty)
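The three-move recurrence and the simple scoring scheme (+1 match, -1 mismatch, -2 per space) can be combined into a short global alignment routine. This is a sketch under those assumptions, using the linear gap cost rather than the affine penalty. The function name `global_alignment` and the traceback tie-breaking order are illustrative choices.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # pair the characters
                              score[i - 1][j] + gap,      # a[i-1] with a space
                              score[i][j - 1] + gap)      # b[j-1] with a space

    # Trace back from the corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("AATAAGC", "AAAAGC"))  # (4, 'AATAAGC', 'AA-AAGC')
```

The table has (n+1)(m+1) entries and each takes constant time to fill, so the running time is O(nm), compared with the astronomically large number of candidate alignments.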
Other Bioinformatics Algorithms • Palindromes • Tandem Repeats • Longest Common Subsequence • Double Digest (NP-complete) • Shortest Common Superstring (NP-complete)
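Of the problems listed, Longest Common Subsequence is perhaps the easiest to show in full, since it uses the same table-filling pattern as alignment. The sketch below is one textbook formulation. The function name `lcs` and the tie-breaking in the traceback are illustrative choices.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # length[i][j] = LCS length of a[:i] and b[:j]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                length[i + 1][j + 1] = length[i][j] + 1
            else:
                length[i + 1][j + 1] = max(length[i][j + 1], length[i + 1][j])

    # Walk back through the table, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AATAAGC", "AAAAGC"))  # AAAAGC
```

The LCS length equals the best global alignment score when a match scores +1 and both mismatches and spaces cost 0, which makes it a useful bridge between the two topics.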