
Some frequently-used Bioinformatics Tools

Some frequently-used Bioinformatics Tools. Konstantinos Mavrommatis, Prokaryotic Superprogram. Outline: Pairwise Alignment (Global/Local, Scoring; BLAST, BLAT, SIM, LALIGN, Dotlet, Ublast), Multiple Sequence Alignment


Presentation Transcript


  1. Some frequently-used Bioinformatics Tools
     • Konstantinos Mavrommatis
     • Prokaryotic Superprogram

  2. Outline
     • Pairwise Alignment
       • Global/Local, Scoring
       • BLAST, BLAT, SIM, LALIGN, Dotlet, Ublast
     • Multiple Sequence Alignment
       • ClustalW, Kalign, MAFFT, Muscle, T-Coffee, MSA, DIALIGN, Match-Box, Multalin, MUSCA
     • Phylogenetic analysis and tree construction
       • BIONJ, DendroUPGMA, PHYLIP, PhyML, Phylogeny.fr, POWER, BlastO, TraceSuite II
     • HMM
     • Protein family profiles
     http://expasy.org/tools/

  3. Alignment
     • Insert spaces (gaps) at arbitrary locations so that both sequences have the same length and no position holds a space in both sequences.
     • Find the arrangement of two sequences that identifies regions of similarity
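
The two conditions on this slide (equal length after inserting spaces, and no position with a space in both sequences) can be checked mechanically. The following is a minimal sketch; the function names and the use of `-` as the space character are assumptions made for illustration.

```python
def is_valid_alignment(row_a, row_b, gap="-"):
    """Return True if the two gapped rows form a valid pairwise alignment."""
    # Condition 1: both rows must have the same length.
    if len(row_a) != len(row_b):
        return False
    # Condition 2: no column may hold a gap in both rows.
    return all(not (a == gap and b == gap) for a, b in zip(row_a, row_b))


def ungapped(row, gap="-"):
    """Recover the original sequence by removing the inserted gaps."""
    return row.replace(gap, "")


print(is_valid_alignment("AC-GT", "A-TGT"))
print(ungapped("AC-GT"))
```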

  4. Alignment methods: Dot plots

  5. Global vs Local alignment
     • Global alignment: an alignment that assumes the two sequences are basically similar over their entire length
     • Local alignment: an alignment that searches for segments of the two sequences that match well
     • It may seem that one should always use local alignments. However, each has its application.
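
To make the difference concrete, here is a rough sketch of one dynamic-programming routine that computes either a global or a local alignment score. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, not values from the slides; real tools use substitution matrices and affine gap penalties.

```python
def alignment_score(s, t, local, match=1, mismatch=-1, gap=-1):
    """Best alignment score of s and t: local if `local` is True, else global."""
    rows, cols = len(s) + 1, len(t) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the sequence ends.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cell = max(diagonal, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
            dp[i][j] = cell
            best = max(best, cell)
    # Local: best cell anywhere. Global: bottom-right cell (full lengths consumed).
    return best if local else dp[-1][-1]


print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
print(alignment_score("TTTACGTTT", "GGACGGG", local=False))
```

The only differences between the two modes are the initialisation of the first row and column, the floor at zero, and where the final score is read.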

  6. Substitution matrices http://www.russelllab.org/aas/

  7. Scoring an alignment

  8. Global alignment
     • S1 = HGSAQVKGHG
     • S2 = KTEAEMKASEDLKKHGT

  9. KTEAEMKAESEDLKKHGT
     --HG--SA--Q-VKGHG-
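
The two gapped rows on this slide can be scored column by column. The sketch below counts matches, mismatches and gap columns; the simple scoring scheme (match +1, mismatch -1, gap -1) is an assumption for illustration, since the slides do not state which substitution matrix or gap penalty was used.

```python
def summarize_alignment(row_a, row_b, match=1, mismatch=-1, gap=-1):
    """Count column types in a pairwise alignment and compute a simple score."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    score = matches * match + mismatches * mismatch + gaps * gap
    return {"matches": matches, "mismatches": mismatches, "gaps": gaps, "score": score}


summary = summarize_alignment("KTEAEMKAESEDLKKHGT", "--HG--SA--Q-VKGHG-")
print(summary)  # 4 matches, 6 mismatches, 8 gap columns, score -10
```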

  10. Local Alignment

  11. How BLAST works
      • Query: MLVTTILAFALFKNSYAQQCGSQAGGALCSNRLCCSKFGYCGSTDPYCGTGCQSQCGGGG
      • Subject (database): VVWMLLVGGSYGVQCGTEAGGALCPRGLCCSQWGWCGSTIDYCGPGCQSQCGG
      • Common 3-mer -> extend: GCQSQCGG
      • HSP:
        Score = 66.6 bits (161), Expect = 3e-12, Method: Compositional matrix adjust.
        Identities = 32/53 (60%), Positives = 39/53 (74%), Gaps = 0/53 (0%)
        Query  6   ILAFALFKNSYAQQCGSQAGGALCSNRLCCSKFGYCGSTDPYCGTGCQSQCGG  58
                   ++   L   SY  QCG++AGGALC   LCCS++G+CGST  YCG GCQSQCGG
        Sbjct  15  VVWMLLVGGSYGVQCGTEAGGALCPRGLCCSQWGWCGSTIDYCGPGCQSQCGG  67
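
The seed-and-extend idea on this slide can be sketched in a few lines. This is a deliberate simplification and not the BLAST implementation: it uses exact 3-mer matches and exact-match extension only, whereas protein BLAST scores neighbourhood words and extensions with a substitution matrix. The function names are hypothetical.

```python
from collections import defaultdict

QUERY = "MLVTTILAFALFKNSYAQQCGSQAGGALCSNRLCCSKFGYCGSTDPYCGTGCQSQCGGGG"
SUBJECT = "VVWMLLVGGSYGVQCGTEAGGALCPRGLCCSQWGWCGSTIDYCGPGCQSQCGG"


def shared_kmers(query, subject, k=3):
    """Return (query_pos, subject_pos, word) for every k-mer present in both."""
    index = defaultdict(list)
    for s in range(len(subject) - k + 1):
        index[subject[s:s + k]].append(s)
    hits = []
    for q in range(len(query) - k + 1):
        word = query[q:q + k]
        for s in index.get(word, []):
            hits.append((q, s, word))
    return hits


def extend_exact(query, subject, q, s, k=3):
    """Extend a seed left and right while the residues stay identical."""
    left = 0
    while q - left - 1 >= 0 and s - left - 1 >= 0 and query[q - left - 1] == subject[s - left - 1]:
        left += 1
    right = k
    while q + right < len(query) and s + right < len(subject) and query[q + right] == subject[s + right]:
        right += 1
    return query[q - left:q + right]


q, s = QUERY.find("GCQ"), SUBJECT.find("GCQ")
print(extend_exact(QUERY, SUBJECT, q, s))  # GCQSQCGG
```

Indexing the subject once and looking up each query word is what makes the seeding step fast compared with aligning every position against every other.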

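  12. Types of BLAST
      • Example nucleic acid query: atcgatatatatagactgactgact
      • Example protein query: MTAVYHILRALRARARVARARVH
      • blastn: nucleic acid query vs. nucleic acid sequence database
      • blastp: protein query vs. protein sequence database
      • blastx: nucleic acid query (6-frame translation) vs. protein sequence database
      • tblastn: protein query vs. nucleic acid sequence database (6-frame translation)
      • tblastx: nucleic acid query (6-frame translation) vs. nucleic acid sequence database (6-frame translation)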
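
The "6 frame translation" used by blastx, tblastn and tblastx can be illustrated with a short sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names are hypothetical, and codons with other characters are rendered as `X`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base and reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Translate three frames on the forward strand and three on the reverse."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, protein in six_frame_translation("atcgatatatatagactgactgact").items():
    print(name, protein)
```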

  13. Exact multiple alignment by dynamic programming
      • Complexity = $O(N^{S} \cdot 2^{S} \cdot S^{2})$
      • N: length of sequences
      • S: number of sequences
      • Only feasible for 4-5 sequences max.
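
A quick calculation shows why exact multiple alignment stops being practical after a handful of sequences. The sketch below assumes the complexity on this slide reads O(N^S · 2^S · S^2) and simply evaluates that expression; the sequence length of 300 is an arbitrary illustrative choice.

```python
def exact_msa_cost(n, s):
    """Number of elementary steps implied by N^S * 2^S * S^2."""
    return n ** s * 2 ** s * s ** 2


# Growth in the number of steps for sequences of length 300 (illustrative value).
for s in range(2, 7):
    print(f"S = {s}: about {exact_msa_cost(300, s):.2e} steps")
```

Each additional sequence multiplies the work by roughly 2N, which is why heuristic programs such as those listed in the outline are used instead.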

  14. Neighbor Joining

  15. Unrooted NJ tree

  16. Comparison of Multiple sequence alignment programs

  17. Primary sequence changes:

  18. Profiles
      • Sequence: CGGSV
      • Probability: 0.8 * 0.4 * 0.8 * 0.6 * 0.2 = 0.031
      • Log-probability: ln(0.8) + ln(0.4) + ln(0.8) + ln(0.6) + ln(0.2) = -3.48
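
The arithmetic on this slide can be reproduced directly. In the sketch below the five per-position probabilities are the ones shown for CGGSV; the variable and function names are assumptions. Summing logarithms gives the same ranking as multiplying probabilities and avoids numerical underflow for long sequences.

```python
import math

# Per-position probabilities of the residues C, G, G, S, V, as given on the slide.
COLUMN_PROBABILITIES = [0.8, 0.4, 0.8, 0.6, 0.2]


def sequence_probability(probabilities):
    """Multiply the per-position probabilities."""
    return math.prod(probabilities)


def sequence_log_probability(probabilities):
    """Sum the natural logs of the per-position probabilities."""
    return sum(math.log(p) for p in probabilities)


print(round(sequence_probability(COLUMN_PROBABILITIES), 3))      # 0.031
print(round(sequence_log_probability(COLUMN_PROBABILITIES), 2))  # -3.48
```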

  19. Hidden Markov Models
      • Assumptions:
        • Observations are ordered
        • The random process can be represented by a stochastic finite state machine with emitting states
      • Probabilistic parameters of a Hidden Markov Model: x = states, y = possible observations, a = state transition probabilities, b = output/emission probabilities

  20. HMM estimation, usage & applications
      Training/Estimation
      • Feed an architecture (given in advance) a set of observation sequences
      • The training process iteratively alters the model parameters to fit the training set
      • The trained model assigns high probabilities to the training sequences
      Usage
      • Evaluate the probability of an observation sequence given the model (Forward)
      • Find the most likely path through the model for a given observation sequence (Viterbi)
      Applications
      • Gene finding
      • Protein family modeling
      • …
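
The two usage modes, Forward and Viterbi, can be sketched on a toy model. Everything numeric below is hypothetical: the two states, three observation symbols, start probabilities, transition probabilities (a) and emission probabilities (b) are made-up values for illustration, not parameters of any trained model.

```python
STATES = ("x1", "x2")
START = {"x1": 0.6, "x2": 0.4}                      # hypothetical start probabilities
A = {"x1": {"x1": 0.7, "x2": 0.3},                  # hypothetical transition probabilities
     "x2": {"x1": 0.4, "x2": 0.6}}
B = {"x1": {"y1": 0.5, "y2": 0.4, "y3": 0.1},       # hypothetical emission probabilities
     "x2": {"y1": 0.1, "y2": 0.3, "y3": 0.6}}


def forward(observations):
    """Probability of the observation sequence, summed over all state paths."""
    alpha = {s: START[s] * B[s][observations[0]] for s in STATES}
    for y in observations[1:]:
        alpha = {s: B[s][y] * sum(alpha[p] * A[p][s] for p in STATES) for s in STATES}
    return sum(alpha.values())


def viterbi(observations):
    """Return (probability, path) of the single most likely state path."""
    best = {s: (START[s] * B[s][observations[0]], [s]) for s in STATES}
    for y in observations[1:]:
        updated = {}
        for s in STATES:
            # Pick the predecessor state that maximises the path probability.
            prob, path = max(
                ((best[p][0] * A[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            updated[s] = (prob * B[s][y], path + [s])
        best = updated
    return max(best.values(), key=lambda item: item[0])


observed = ["y1", "y2", "y3"]
print(forward(observed))   # about 0.03628
print(viterbi(observed))   # probability about 0.01512, path ['x1', 'x1', 'x2']
```

Forward replaces the `max` over predecessor states with a `sum`, which is the only structural difference between the two routines. For long sequences both are normally computed in log space to avoid underflow.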

  21. Profile HMMs
      • Families of functional biological sequences
      • Primary sequences have diverged due to evolution, while maintaining structure/function.
      • Questions:
        • Does a biological sequence belong to a certain protein family? For example, is a given protein (sequence) a globin?
        • Given a set of sequences, find more sequences of the same family

  22. Trade offs

  23. Questions?
