Explore the history, working mechanisms, types, and future potential of BLAST - a vital tool for comparing biological sequences. Uncover its impact on genomics and bioinformatics.
BLAST and GenBank Zachary Potts
Outline • What is BLAST? • History/Context • How Does it Work? • Types of BLAST • Future Implications • References
What is BLAST? • Basic Local Alignment Search Tool • Program used to search databases for sequences similar to a query sequence • Variants exist for searching different nucleotide and protein databases • GenBank is a database containing all publicly available DNA sequences • Part of the International Nucleotide Sequence Database Collaboration, along with the DNA Data Bank of Japan (DDBJ) and the European Nucleotide Archive (ENA) • Data are exchanged daily among the three databases • GenBank releases occur every two months
History/Context (and why it was important) • 1990: BLAST introduced – official start of the program • 1991: Entrez introduced – system for linking databases, distributed on CD • 1992: GenBank moves to NCBI – large database of nucleotide sequences, with data sharing with the European Molecular Biology Laboratory (EMBL) and DNA Data Bank of Japan (DDBJ) • 1993: Entrez goes online – linked databases become easier to access • 1995: Work with genomes – provided a way for researchers to organize genomic information • 1996: Online Mendelian Inheritance in Man (OMIM) – directory containing information on human genes and genetic disorders • 1999: Human Genome Project – human chromosome 22 sequenced and added to the NCBI database
History/Context continued • 1999: LocusLink, RefSeq, and dbSNP – resources released by NCBI to aid in human genome analysis • 2000: Human Genome Project – draft of the entire human genome sequence made available at NCBI • 2002: Whole Genome Shotgun – sequences available at GenBank • 2003: Entrez Gene – LocusLink now under Entrez Gene, provides information on relationships between data sets • 2007: Genome Reference Consortium – consortium of institutes, including NCBI, formed to improve the quality of the reference genomes for human and model organisms • 2008: 1000 Genomes Project – effort to sequence the genomes of 1000 participants from around the world and make them widely available
How does it work? • Smith-Waterman Algorithm (early 1980s) • Compares two sequences of arbitrary lengths to find similarities • Very time consuming • More accurate • Altschul Algorithm (early 1990s) • First finds matches between short sequences and then extends them to create larger alignments • Very fast • Somewhat prone to false positive and false negative matches; an estimate of the probability of error is provided
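The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not the actual BLAST implementation: the function names, the scoring values (match=2, mismatch=-1, gap=-2), the word size of 3, and the drop-off threshold are all assumptions chosen for the example. The seed-and-extend function extends without gaps only, whereas real BLAST uses substitution matrices and more elaborate extension rules.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (full dynamic programming, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def _extend(query, subject, i, j, step, match, mismatch, drop_off):
    """Ungapped extension from (i, j) in direction step; returns (gain, length kept)."""
    running = best_gain = best_len = n = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += match if query[i] == subject[j] else mismatch
        n += 1
        if running > best_gain:
            best_gain, best_len = running, n
        elif best_gain - running > drop_off:
            break  # score has fallen too far below the best seen; give up
        i += step
        j += step
    return best_gain, best_len


def seed_and_extend(query, subject, word_size=3, match=2, mismatch=-1, drop_off=3):
    """Find exact word matches (seeds), extend each, return the best
    (score, query_start, subject_start, length), or (0, 0, 0, 0) if no seed exists."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    best = (0, 0, 0, 0)
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            right, right_len = _extend(query, subject, i + word_size, j + word_size,
                                       1, match, mismatch, drop_off)
            left, left_len = _extend(query, subject, i - 1, j - 1,
                                     -1, match, mismatch, drop_off)
            score = word_size * match + left + right
            if score > best[0]:
                best = (score, i - left_len, j - left_len,
                        word_size + left_len + right_len)
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
print(seed_and_extend("TTACGTTT", "GGACGTGG"))
```

With these toy settings both functions find the shared "ACGT" region in the example. For a pair such as "ACAC" and "AGAG", which share no exact 3-letter word, the seed-based search reports nothing while the dynamic-programming search still finds a weak local match. That is the speed-versus-sensitivity trade-off described on the slide.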
How does it work? • Compares sequences • A test (query) sequence is compared against the sequences in a large database • Finding similar sequences helps to find relationships and better establish identities of organisms • Similar sequences tend to come from related organisms and to have similar functions
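As a rough picture of searching a query against a database, the sketch below ranks entries by the fraction of the query's 3-letter words they contain. This is a toy stand-in for BLAST's actual scoring and statistics. The database contents, the sequence names, and the function names are hypothetical and exist only for this example.

```python
def kmers(seq, k=3):
    """Set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_database(query, database, k=3):
    """Rank database entries by the fraction of query k-mers they share (toy score)."""
    query_words = kmers(query, k)
    results = []
    for name, seq in database.items():
        shared = len(query_words & kmers(seq, k))
        fraction = shared / len(query_words) if query_words else 0.0
        results.append((name, fraction))
    # Highest-scoring entries first, like a hit list.
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical mini-database: seq_B differs from seq_A by one base, seq_C is unrelated.
database = {
    "seq_A": "ATGGCCATTGTAATGGGCCGC",
    "seq_B": "ATGGCCATTGTTATGGGCCGC",
    "seq_C": "TTTTTTTTTTTTTTTTTTTTT",
}
query = "ATGGCCATTGTAATGGGCCGC"

for name, fraction in rank_database(query, database):
    print(f"{name}\t{fraction:.2f}")
```

The identical entry ranks first, the near-identical one second, and the unrelated one last. A real search does the same job at much larger scale and reports a statistical significance estimate for each hit.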
Types of BLAST • Nucleotide BLAST: nucleotide to nucleotide • Protein BLAST: protein to protein • blastx: translated nucleotide to protein across the six reading frames • tblastn: protein to translated nucleotide across the six reading frames
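The "six reading frames" used by blastx and tblastn can be illustrated as follows: a nucleotide sequence is translated starting at offsets 0, 1, and 2 on the forward strand and on its reverse complement. This is a minimal sketch that assumes the standard genetic code and unambiguous A/C/G/T input. The function names and the frame labels ("+1" to "-3") are choices made for this example, not BLAST's own API.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return a dict mapping frame label ('+1'..'+3', '-1'..'-3') to protein string."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

In blastx the six translations of the query are each compared against a protein database. In tblastn the protein query is compared against six-frame translations of the nucleotide database entries.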
Future Implications • Better access to sequence data for all to use • More ability to look for relationships • Phylogenetic • Genomic • Functions of proteins
References • Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman, D. (1990). Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403-410. • Benton, D. (1990). Recent changes in the GenBank on-line service. Nucleic Acids Research 18(6), 1517-1520. • Bioinformatics Explained: BLAST versus Smith-Waterman [Internet]. CLC Bio (Denmark); 2007. • The NCBI Handbook [Internet]. 2nd edition. Bethesda (MD): National Center for Biotechnology Information (US); 2013. • Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., and Madden, T. (2008). NCBI BLAST: a better web interface. Nucleic Acids Research 36, W5-W9. • https://blast.ncbi.nlm.nih.gov/Blast.cgi (screenshots and general information) • https://www.ncbi.nlm.nih.gov/genbank/ (general information)