
Problem: Max Blast Sequence

BLAST: Basic Local Alignment Search Tool. The query is compiled into a list of length-w substrings called w-mers. The database is then searched for “hits”: exact matches between any substring on the w-mer list and a database sequence. The result is a list of matches.
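
The seeding step described above can be sketched in Java as follows. This is a minimal illustration, not BLAST's actual implementation: the class name WmerSeeds and both method names are hypothetical, and the database is scanned naively with String.indexOf rather than with an indexed lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class WmerSeeds {
    // Build the list of all length-w substrings (w-mers) of the query.
    static List<String> wmers(String query, int w) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + w <= query.length(); i++) {
            out.add(query.substring(i, i + w));
        }
        return out;
    }

    // Report every exact hit as {query offset, database offset}.
    static List<int[]> hits(String query, String db, int w) {
        List<int[]> out = new ArrayList<>();
        List<String> words = wmers(query, w);
        for (int q = 0; q < words.size(); q++) {
            int d = db.indexOf(words.get(q));
            while (d >= 0) {
                out.add(new int[] {q, d});
                // Keep scanning so overlapping occurrences are also found.
                d = db.indexOf(words.get(q), d + 1);
            }
        }
        return out;
    }
}
```

For example, the query "agtg" with w = 3 yields the w-mers "agt" and "gtg", and searching the database string "ccagtc" gives a single hit at query offset 0 and database offset 2.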


Presentation Transcript


BLAST: Basic Local Alignment Search Tool

Meng Cao, Arthur Lee, Peiying Li, Matt Prorok
Instructor: Owen Astrachan, CompSci 4G, 2007

Flowchart of Steps

• The query is compiled to form a list of length-w substrings called w-mers.
• The database is searched for “hits”: an exact match between any substring on the w-mer list and the database sequence. A list of matches results.
• Each matching substring is extended locally in both directions until its score no longer improves.
• Scoring: for every matched pair of nucleotides, add 5 to the total score. For every mismatched pair of nucleotides, subtract 4 from the total score.
• A moderate score Sg is selected, to which the calculated score is compared. Sg indicates whether there are too many mismatches for the sequence to be a likely match to the substring and query sequences.
• If the calculated score is less than Sg, the database sequence is discarded from the list of “hits.” If the calculated score is equal to or greater than Sg, the sequence stays in the list.
• The alignment (either gapped or ungapped) with the max score is found.
• Is the max alignment score statistically significant? If yes, the database sequence is displayed as output. If no, the database sequence is discarded.

Introduction

BLAST, which stands for basic local alignment search tool, is a heuristic algorithm used to find similar sequences of amino acids or nucleotides. BLAST approximates the dynamic programming algorithm more directly than its predecessors. Dynamic programming is an optimization technique that solves a problem through a chain of subproblems: the best decision for each subproblem is built on the best decision for the previous one. Unfortunately, applying this method to an entire database would have exceptional computational requirements, hence the use of heuristic algorithms.
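
To make the dynamic programming idea concrete, here is a minimal Java sketch of a local alignment score computed with the Smith-Waterman recurrence (named here for clarity; the text above does not name it). The +5 and -4 scores come from the text. The class name LocalAlignment, the method bestScore, and the gapPenalty parameter are assumptions made for illustration, since the text gives no gap penalty.

```java
// Hypothetical class, for illustration only.
class LocalAlignment {
    static final int MATCH = 5;      // score for a matched pair, from the text
    static final int MISMATCH = -4;  // score for a mismatched pair, from the text

    // Best local alignment score of a and b.
    // gapPenalty is an assumed negative value; the text does not specify one.
    static int bestScore(String a, String b, int gapPenalty) {
        // h[i][j] = best score of an alignment ending at a[i-1] and b[j-1]
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = h[i - 1][j - 1] + sub;    // align the two characters
                int up = h[i - 1][j] + gapPenalty;   // gap in b
                int left = h[i][j - 1] + gapPenalty; // gap in a
                // Each cell reuses the best answers of three smaller subproblems;
                // the floor of 0 lets an alignment start anywhere.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }
}
```

The table has one cell per pair of positions, so the work grows with the product of the two sequence lengths. That cost is modest for two short strands but becomes the bottleneck when a query must be compared against every sequence in a large database.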

In order to find similar sequences, BLAST first finds the highest-scoring pair of segments, called the maximal segment pair (MSP). In one version of BLAST, matching DNA nucleotides are given a score of +5, while mismatches are assigned -4 (as outlined in the flowchart). To approximate dynamic programming more directly, BLAST chooses the boundaries of the MSP to maximize its score by either extending or shortening the two segments being compared. Because molecular biologists are likely to be interested in all the conserved regions, not just the most conserved one, BLAST returns all MSPs that score above a cutoff. While BLAST may be faster than the dynamic programming algorithm, it is a heuristic: it sacrifices accuracy for speed and can sometimes make mistakes.

Figure (left): Table of some of the versions of BLAST (see Reference 4).
Figure (below): Screenshot of the BLAST homepage (see Reference 3).

APT

Problem: Max Blast Sequence

Problem Statement: Given a string of DNA and an array database of several strings, return the string with the highest BLAST score against the original DNA string. The scoring system of BLAST assigns +5 to each perfect nucleotide match and -4 to all other scenarios. In this APT, all compared strings will be of the same length, and there will be no ties for the highest score.

Definition:
Class: BLAST
Method: maxSequence
Parameters: String[] strands, String dna
Returns: String
Method Signature: public String maxSequence(String[] strands, String dna)

Class:
    public class BLAST {
        public String maxSequence(String[] strands, String dna) {
            // fill in code here
        }
    }

Constraints:
- Every string in a test is the same length.
- Only the letters ‘a’, ‘g’, ‘c’, ‘t’ are used.
- Array strands has at most 50 elements.
- A string has at most 50 nucleotides.

Examples:
1) strands = {“aaa”, “ggg”, “ccc”, “ttt”} dna = “ggg”
Returns “ggg” because this strand is exactly the same as the given string.
2) strands = {“aggt”, “accg”, “aacc”, “agtc”} dna = “agtg”
Returns “agtc” because this strand is the most similar to the given string: three of its four positions match, for a score of 5 + 5 + 5 - 4 = 11.

Shortcomings

Though BLAST represents a huge advancement in the ability to compare DNA, it is not without its shortcomings. The basic premise behind the algorithm is that it searches for the segments of DNA that are likely to be most similar, rather than comparing each individual section with every other one. This innovation increases the speed with which DNA can be searched, but it is not perfect: BLAST can return misleading results. This has been shown empirically by using BLAST to analyze gene sequences. Koski and Golding report that, in E. coli, in 27% of cases, BLAST returned hits that were not from E. coli’s nearest phylogenetic neighbor, with 7% of cases returning a hit from a different domain of life. However, as BLAST is refined and its DNA database becomes larger, the accuracy should improve. Nonetheless, it is important to emphasize that the closest BLAST hit is determined by a computer algorithm and thus merely implies biological similarity.

Conclusion

BLAST is currently one of the most popular bioinformatics search programs. The algorithm’s emphasis on speed appeals to many researchers who are trying to solve complex problems. BLAST also reports the statistical significance of its hits and draws on other analytical techniques from computer science. The biological problems that BLAST can help answer deal with DNA and protein sequences. A researcher can use a BLAST search to find and compare gene or protein sequences between organisms and look for similarities. Similar sequences can then be used to describe biological relationships and to give further insight into how systems work.

References

• Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410.
• Korf, I., Yandell, M., & Bedell, J. (2003). BLAST. Cambridge: O’Reilly.
• National Center for Biotechnology Information. (n.d.). BLAST: Basic Local Alignment Search Tool. Retrieved October 24, 2007, from http://www.ncbi.nlm.nih.gov/BLAST/
• Sotiriades, E., & Dollas, A. (2007). A General Reconfigurable Architecture for the BLAST Algorithm. Journal of VLSI Signal Processing, 48, 189–208. Retrieved October 16, 2007, from http://www.springerlink.com/content/p2175ql615589u22/fulltext.pdf
• University of Texas at El Paso. (n.d.). Basic Local Alignment Search Tool (BLAST). Retrieved October 15, 2007, from http://www.math.utep.edu/Faculty/mleung/teaching/m4370s04/slides/blast.pdf
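
Returning to the Max Blast Sequence problem: one possible way to fill in maxSequence is sketched below. It is not an official solution. It assumes, as the problem states, that every strand has the same length as dna and that the top score is unique. The private helper score is a hypothetical name, and the class is declared without `public` only so the snippet compiles from any file name.

```java
// One possible solution sketch for the Max Blast Sequence APT.
class BLAST {
    // Score two equal-length strands position by position:
    // +5 for each matching nucleotide, -4 otherwise (scores from the problem statement).
    private static int score(String a, String b) {
        int total = 0;
        for (int i = 0; i < a.length(); i++) {
            total += (a.charAt(i) == b.charAt(i)) ? 5 : -4;
        }
        return total;
    }

    public String maxSequence(String[] strands, String dna) {
        String best = null;                 // stays null only if strands is empty
        int bestScore = Integer.MIN_VALUE;
        for (String strand : strands) {
            int s = score(strand, dna);
            if (s > bestScore) {            // strict comparison keeps the first maximum
                bestScore = s;
                best = strand;
            }
        }
        return best;
    }
}
```

On the second example, the four strands score 2, 2, -7, and 11 against “agtg”, so “agtc” is returned. With at most 50 strands of at most 50 nucleotides each, this direct comparison needs no more than 2,500 character checks.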
