BLAST & GenBank. Jeremy Badgett. Using computers for genomic reference and research. Computers offer biologists a tool to use in their ever-expanding kit of molecular techniques.
BLAST & GenBank Jeremy Badgett
Using computers for genomic reference and research
• Computers offer biologists a tool to use in their ever-expanding kit of molecular techniques
• One of the main advantages of using a computer is the ability to complete repetitive tasks at large scale in a relatively short time
• This is important to genetics and sequencing due to the sheer volume of data that needs to be processed
• By creating tools such as GenBank and BLAST, we have moved at a much faster pace than our predecessors
What are GenBank and BLAST
• GenBank and BLAST are two of the current tools of bioinformatics
• Provide information and connectivity to scientists and researchers
• Allow for comparison of genomes
• Used for forming phylogenetic trees
• Continually updated with current information
• Linked to multinational systems, allowing for international data sharing
• Cross-reference DNA with RNA and proteins
• Compare and predict conserved regions of a genome
What is the purpose of these advancements
• Taking tasks that would take people months or even years to complete down to minutes or even seconds through the use of a computer
• These tasks include checking the similarity of sequences in proteins and nucleic acids
• Uploading newly discovered sequences
• Making a general database for genetic information
• In general, bringing genomic data to the world
GenBank: a history
• GenBank was started in 1982
• It was the brainchild of Walter Goad
• It held more than 2,000 sequences one year later
• It was a multi-agency product
• Partners responsible for production were the NIH (National Institutes of Health), NSF (National Science Foundation), DOE (Department of Energy), and DOD (Department of Defense)
• By the end of 1992 GenBank was moved to the then-new NCBI (National Center for Biotechnology Information), which had been established in 1988
• The current GenBank release is number 232
Walter Goad
Life: 1925-2000
1950s: Worked on the H-bomb
1960s: Began taking an interest in biology, even spending a year doing research at UC medical
1970s: Joined the T-10 group at LANL (Los Alamos National Laboratory)
T-10 started focusing on sequences and began working on sequence comparison and analysis
This led to the formation of GenBank after the group received a grant from the NIH
The growth of GenBank
• Throughout its lifetime GenBank has grown in size at an exponential rate
• We are currently on release 232
• With the advent of techniques such as Illumina sequencing, the speed at which new genomic data is added will keep increasing
• We have sequenced only a small portion of the genetic data from microorganisms and from difficult-to-obtain samples
• This suggests that GenBank, in concert with WGS (whole genome shotgun sequencing), will continue to grow
What GenBank Offers
GenBank offers a wide variety of services and data divisions that we as biologists can take advantage of
• Check genomes from multiple domains of life
• Whole genome shotgun sequencing (WGS)
• Metagenomes (genomic data from microorganisms that cannot be cultured)
• TPA (Third Party Annotation)
• TSA (Transcriptome Shotgun Assembly)
• INSDC (International Nucleotide Sequence Database Collaboration)
• HTG (high-throughput genomic sequences)
• dbEST (single-pass cDNA sequences, or expressed sequence tags)
• GSS (genome survey sequences; similar to EST, but genomic in origin rather than derived from mRNA)
• TLS (Targeted Locus Study)
How GenBank works
GenBank is a database whose records follow a data model defined in ASN.1 and can also be represented as XML.
XML is a markup language that encodes data in a form readable by both machines and humans. It is an open standard intended for use across many platforms.
ASN.1 (Abstract Syntax Notation One) is a language for defining data structures so that they can be serialized and exchanged across platforms in a stable way.
GenBank interconnects with multiple genetic databases, including ones across the world, to help create a universal genomic database.
Using these languages and servers, GenBank is open to the public in a secure manner, and its records can be referenced and new sequences uploaded.
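As a rough illustration of what "readable by both machines and humans" means in practice, the sketch below parses a small XML record with Python's standard library. The element names follow the GBSeq-style naming used in NCBI XML output as far as I am aware, but treat them as an assumption here; the record itself (locus EXAMPLE01 and its sequence) is invented for the example and is not a real GenBank entry.

```python
import xml.etree.ElementTree as ET

# Invented record for illustration only; element names are assumed
# to follow GBSeq-style naming and may differ from real exports.
RECORD_XML = """<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE01</GBSeq_locus>
    <GBSeq_length>12</GBSeq_length>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_sequence>acgtacgtacgt</GBSeq_sequence>
  </GBSeq>
</GBSet>"""


def read_records(xml_text):
    """Return a list of dicts, one per GBSeq element in the XML text."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("GBSeq"):
        records.append({
            "locus": seq.findtext("GBSeq_locus"),
            "length": int(seq.findtext("GBSeq_length")),
            "moltype": seq.findtext("GBSeq_moltype"),
            "sequence": seq.findtext("GBSeq_sequence"),
        })
    return records


if __name__ == "__main__":
    for record in read_records(RECORD_XML):
        print(record["locus"], record["moltype"], record["length"])
```

The same text can be read by a person and by a program, which is the property the slide attributes to XML.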
BLAST and its history
• BLAST was developed in 1990 through the NIH
• BLAST is faster than FASTA: by looking only at the most significant sequence patterns, it can derive similarities between two sequences
• BLAST is better than FASTA where time is a concern, because it analyzes only the significant matches instead of performing an exhaustive local sequence alignment
• FASTA was developed by David Lipman and William Pearson, first described in 1985 as FASTP
• We still see its name, and sometimes its usage, in the FASTA file format
• Before this, the Smith-Waterman algorithm was used
• It remains one of the most accurate and complete comparison tools
• It computes a complete local sequence alignment
• It is the most accurate method, but it consumes massive computing power and time
How BLAST works
BLAST breaks the query sequence into short segments called words (3 letters long here, as is typical for proteins), then looks for matching words in the database whose score meets a minimum similarity threshold (seeding).
Once a match has been seeded, BLAST extends it in both directions, continuing for as long as the alignment score keeps increasing.
The algorithm then reports the sequences whose alignments meet a score threshold, along with their respective scores.
The algorithm can be adjusted by changing the values of W (word size) and T (word score threshold); increasing either can speed up the search, but at the cost of sensitivity.
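The seed-and-extend idea above can be sketched in a few lines of Python. This is a simplified illustration, not the actual BLAST implementation: it seeds on exact word matches only (real BLAST also accepts similar words scoring at least T), it extends without gaps, and the scoring values, the `x_drop` cutoff, and all function names are assumptions made for the example.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where a w-letter word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Extension stops when the running score falls more than x_drop below
    the best score seen. Returns (score, query_start, query_end, subject_start).
    """
    def extend(qi, sj, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_gain, right_len = extend(i + w, j + w, +1)
    left_gain, left_len = extend(i - 1, j - 1, -1)
    score = w * match + left_gain + right_gain
    return score, i - left_len, i + w + right_len, j - left_len


def blast_like_search(query, subject, w=3, min_score=5):
    """Seed, extend, and keep the distinct hits scoring at least min_score."""
    hits = set()
    for i, j in find_seeds(query, subject, w):
        hit = extend_seed(query, subject, i, j, w)
        if hit[0] >= min_score:
            hits.add(hit)
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for hit in blast_like_search("ACGTACGT", "TTACGTACGTTT"):
        print(hit)
```

A larger `w` produces fewer seeds and so less extension work, which is the speed versus sensitivity trade-off described above.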
BLAST vs Smith-Waterman algorithm
The main difference is in the alignment algorithms: BLAST focuses on word scores, while Smith-Waterman fills in an M×N matrix.
BLAST keeps a running tally that grows as a match is extended, whereas Smith-Waterman uses a matrix whose size is set by the lengths of the two sequences (M and N) to compute the best local alignment score.
Because the computing requirements grow with the size of the matrix (the product M×N), Smith-Waterman takes longer, though it is more accurate.
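For comparison, here is a minimal sketch of the Smith-Waterman scoring matrix in Python. It assumes a simple linear gap penalty and illustrative match, mismatch, and gap values chosen for this example; it returns only the best local alignment score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Fills an (len(a)+1) x (len(b)+1) matrix, so time and memory grow
    with the product of the two sequence lengths.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            h[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the matrix is visited, which is why the method is thorough but slow on database-sized searches, whereas BLAST only does work around its seeds.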
David J. Lipman
• Got his bachelor's degree at Brown
• MD at SUNY Buffalo
• Father of modern bioinformatics sequence analysis
• Was a primary author of the Wilbur-Lipman algorithm, FASTA, BLAST, gapped BLAST, and PSI-BLAST
• Was the director of the NCBI from 1989 to 2017
• Contributes heavily to the upkeep of GenBank
• Editor in Chief of Biology Direct
• Has received many awards for his work and contributions to the biomolecular and bioinformatics fields
• Major proponent of free and open access to bioinformatics tools and data
Basically, modern biology would be at least 10-15 years behind if not for the work of this one man
BLASTing a tool
• BLAST is a search tool that uses an algorithm to search an uploaded sequence against a reference database of sequences
• Some of the references that can be queried include the human genome, along with other genomic sequences
• Input a sequence in FASTA or GenBank format
• BLASTn: nucleotide query against a nucleotide database
• BLASTx: translated nucleotide query against a protein database
• tBLASTn: protein query against a translated nucleotide database
• BLASTp: protein query against a protein database
• Can be downloaded to search against a custom database, or used online to check against general databases such as GenBank
• Primer-BLAST: a newer BLAST tool for designing PCR primers
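Since FASTA is the most common input format mentioned above, a minimal sketch of reading it may help: each record starts with a ">" header line, followed by one or more lines of sequence. The parser below is an illustrative helper written for this example (the name `parse_fasta` and the sample records are hypothetical), not part of BLAST itself.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Sequence lines belonging to one record are joined and upper-cased;
    blank lines and any text before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">seq1 example nucleotide record\nACGT\nacgt\n\n>seq2\nGGCC\n"
    for name, sequence in parse_fasta(sample):
        print(name, len(sequence))
```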
References
Stevens, Hallam (2013). "From bomb to bank: Walter Goad and the introduction of computers into biology". In Oren Harman and Michael Dietrich, eds., Outsider Scientists: Routes to Innovation in Biology. Chicago, IL: University of Chicago Press: 128–144.
Bosak, Jon; Bray, Tim (May 1999). "XML and the Second-Generation Web". Scientific American.
Altschul, Stephen; Gish, Warren; Miller, Webb; Myers, Eugene; Lipman, David J. (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410.
Wilbur, W. J.; Lipman, D. J. (1983). "Rapid similarity searches of nucleic acid and protein data banks". Proceedings of the National Academy of Sciences of the United States of America. 80 (3): 726–730.
Adapted from Biological Sequence Analysis I, Current Topics in Genome Analysis
https://www.ncbi.nlm.nih.gov/genbank/
https://blast.ncbi.nlm.nih.gov/Blast.cgi