
BLAST & GenBank

BLAST & GenBank. Jeremy Badgett. Using computers for genomic reference and research. Computers offer biologists a tool to add to their ever-expanding kit of molecular techniques.


Presentation Transcript


  1. BLAST & GenBank Jeremy Badgett

  2. Using computers for genomic reference and research
     • Computers offer biologists a tool to add to their ever-expanding kit of molecular techniques
     • One of the main advantages of a computer is its ability to complete repetitive tasks at large scale in a relatively short time
     • This matters for genetics and sequencing because of the sheer volume of data that needs to be processed
     • Tools such as GenBank and BLAST have let us move at a much faster pace than our predecessors

  3. What are GenBank and BLAST?
     • GenBank and BLAST are the current tools of bioinformatics
     • They provide information and connectivity to scientists and researchers
     • They allow genomes to be compared
     • They are used to build phylogenetic trees
     • They are continually updated with current information
     • They are linked to multinational systems, allowing international data sharing
     • They cross-reference DNA with RNA and with proteins
     • They help compare and predict conserved regions of a genome

  4. What is the purpose of these advancements?
     • Cutting tasks that would take people months or even years down to minutes or even seconds through the use of a computer
     • These tasks include:
       • Checking the similarity of protein and nucleic acid sequences
       • Uploading newly discovered sequences
       • Building a general database for genetic information
     • In general, bringing genomic data to the world

  5. GenBank: a history
     • GenBank was started in 1982
     • It was the brainchild of Walter Goad
     • It held more than 2,000 sequences one year later
     • It was a multi-agency product; the partners responsible for production were:
       • NIH (National Institutes of Health)
       • NSF (National Science Foundation)
       • DOE (Department of Energy)
       • DOD (Department of Defense)
     • By the end of 1992 GenBank had moved to the then-new NCBI (National Center for Biotechnology Information)
     • The current GenBank release, at the time of this presentation, is number 232

  6. Walter Goad
     Life: 1925-2000
     1950s: worked on the H-bomb
     1960s: began taking an interest in biology, even spending a year doing research at UC Medical
     1970s: joined the T-10 group at LANL
     T-10 began focusing on sequences and working on sequence comparison and analysis
     This led to the formation of GenBank after the group received a grant from the NIH

  7. The growth of GenBank
     • Throughout its lifetime GenBank has grown in size at an exponential rate
     • At the time of this presentation the current release is number 232
     • With the advent of techniques such as Illumina sequencing, the rate at which new genomic data is added will keep increasing
     • So far we have sequenced only a small portion of the genetic data from microorganisms and difficult-to-obtain samples
     • This suggests that GenBank, in concert with WGS (whole genome shotgun sequencing), will continue to grow

  8. What GenBank offers
     GenBank offers a wide variety of services that we as biologists can take advantage of:
     • Genomes from multiple domains of life
     • Whole genome shotgun (WGS) sequences
     • Metagenomes (genomic data from microorganisms, including those that cannot be cultured)
     • Other divisions and related resources:
       • TPA (Third Party Annotation)
       • TSA (Transcriptome Shotgun Assembly)
       • INSDC (International Nucleotide Sequence Database Collaboration)
       • HTG (high-throughput genomic sequences)
       • dbEST (single-pass cDNA sequences)
       • GSS (genome survey sequences; similar to ESTs, but genomic in origin rather than derived from mRNA)
       • TLS (Targeted Locus Study)

  9. How GenBank works
     GenBank is a database that uses XML, with its data model defined in ASN.1.
     XML is a markup language that encodes data in a form readable by both machines and humans. It is an open standard designed to be used across many platforms.
     ASN.1 (Abstract Syntax Notation One) is a language for defining data structures so that they can be serialized in a stable, cross-platform way.
     GenBank interconnects with multiple genetic databases, including ones around the world, to help create a universal genomic database.
     Using these languages and servers, GenBank is open to the public: records can be accessed, referenced, and uploaded in a secure manner.
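
To make the "readable by both machines and humans" point concrete, here is a minimal Python sketch that reads a small XML record with the standard library. The element names (`record`, `accession`, `organism`, `sequence`) and the accession value are hypothetical, chosen only for illustration; they are not GenBank's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tags and values are illustrative assumptions,
# not the real GenBank XML layout.
RECORD_XML = """
<record>
  <accession>EXAMPLE0001</accession>
  <organism>Escherichia coli</organism>
  <sequence>ATGGCGTACGCTTAG</sequence>
</record>
"""


def read_record(xml_text):
    """Parse one record and return its child elements as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    record = read_record(RECORD_XML)
    for field, value in record.items():
        print(f"{field}: {value}")
```

A person can read the record directly, and a program can pull out any field by name, which is the property the slide describes.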

  10. BLAST and its history
     • BLAST was developed in 1990 through the NIH
     • BLAST is faster than FASTA: by looking at the most significant sequence patterns, it can derive similarities between two sequences
     • BLAST is better than FASTA when time is a concern, because it analyzes these significant patterns rather than relying purely on local sequence alignment
     • FASTA was developed in 1985 (originally as FASTP) by David Lipman and William Pearson
     • Its name survives in the FASTA format, and the program itself is still sometimes used
     • Before these tools, the Smith-Waterman algorithm was used
       • It remains one of the most accurate and complete comparison tools
       • It performs a complete alignment of the sequences rather than a heuristic search
       • It is the most accurate method, but it consumes a great deal of computing power and time

  11. How BLAST works
     BLAST breaks the query sequence into short segments called words (3 letters by default for proteins; nucleotide searches use longer words) and looks for matches whose score meets a minimum threshold. This step is called seeding.
     Once a match has been seeded, BLAST extends it in both directions, adding matching segments for as long as the alignment score keeps increasing.
     The algorithm then reports the sequences whose alignments meet a score threshold, together with their scores.
     The algorithm can be tuned through the word size W and the word-score threshold T; increasing either speeds up the search but decreases sensitivity.
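
The seed-and-extend idea can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: seeds here are exact word matches (real BLAST also accepts similar words scoring at least T under a substitution matrix), extension is ungapped, and the scoring values, the `drop` cutoff, and all function names are assumptions made for the example.

```python
from collections import defaultdict


def index_words(subject, w):
    """Map every length-w word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, w, match=1, mismatch=-1, drop=2):
    """Extend an exact seed in both directions without gaps.

    Each side stops once its running score falls more than `drop`
    below the best score seen on that side.
    Returns (score, query_start, query_end, subject_start).
    """
    def walk(qi, si, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q + w, s + w, 1)
    left_score, left_len = walk(q - 1, s - 1, -1)
    score = w * match + left_score + right_score
    return score, q - left_len, q + w + right_len, s - left_len


def seed_and_extend(query, subject, w=3, threshold=5):
    """Seed on exact w-letter words, extend, keep hits scoring >= threshold."""
    index = index_words(subject, w)
    hits = set()  # different seeds on one diagonal often give the same hit
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            hit = extend(query, subject, q, s, w)
            if hit[0] >= threshold:
                hits.add(hit)
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for hit in seed_and_extend("ACGTACGT", "TTACGTACGTTT"):
        print(hit)
```

Raising `w` or `threshold` in this sketch reduces the number of seeds and reported hits, which mirrors the speed-versus-sensitivity trade-off described for W and T.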

  12. BLAST vs. the Smith-Waterman algorithm
     The main difference lies in the alignment algorithms: BLAST focuses on word scores, while Smith-Waterman fills an M×N matrix, where M and N are the lengths of the two sequences.
     BLAST keeps a running tally of the score as it extends each match, whereas Smith-Waterman scores every cell of the matrix to find the best local match.
     Because the computing requirement grows with the size of the matrix (the product of the two sequence lengths), Smith-Waterman takes longer, though it is more accurate.
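
For comparison, a minimal Python sketch of Smith-Waterman shows where the M×N cost comes from: every cell of the matrix is filled. The scoring values (match 2, mismatch -1, linear gap -1) and the function name are assumptions for the example; real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill all (M+1) x (N+1) cells: this is the M x N cost.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Unlike the word-based search, nothing is skipped here, which is why the method is thorough but slow on database-sized inputs.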

  13. David J. Lipman
     • Earned his bachelor's degree at Brown
     • Earned his MD at SUNY Buffalo
     • Father of modern bioinformatics sequence analysis
     • Was a primary author of the Wilbur-Lipman algorithm, FASTA, BLAST, gapped BLAST, and PSI-BLAST
     • Was the director of the NCBI from 1989 to 2017
     • Contributes heavily to the upkeep of GenBank
     • Editor-in-Chief of Biology Direct
     • Has received many awards for his work and contributions to the biomolecular and bioinformatics fields
     • Major proponent of free and open access to bioinformatics tools and data
     Basically, modern biology would be at least 10-15 years behind if not for the work of this one man

  14. BLAST as a tool
     • BLAST is a search tool that uses an algorithm to compare an uploaded (query) sequence against a reference database of sequences
     • References that can be queried include the human genome, along with other genomic sequences
     • Input is a sequence in FASTA or GenBank format
     • BLAST comes in several forms:
       • BLASTn: nucleotide query against nucleotide sequences
       • BLASTx: translated nucleotide query against protein sequences
       • tBLASTn: protein query against translated nucleotide sequences
       • BLASTp: protein query against protein sequences
     • It can be downloaded to search a custom database, or used online against general databases such as GenBank
     • A newer tool, Primer-BLAST, supports primer design for PCR
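
As a small illustration of the input format and the program choices listed above, the Python sketch below parses FASTA text and looks up which BLAST program fits a given query and database type. The function names, the lookup table, and the sample record are assumptions made for the example; this does not call BLAST itself.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# (query type, database type) -> BLAST program, following the list above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
    ("protein", "protein"): "blastp",
}


def choose_program(query_type, database_type):
    """Pick the BLAST program for a query/database combination."""
    return BLAST_PROGRAMS[(query_type, database_type)]


if __name__ == "__main__":
    # Hypothetical record used only to exercise the parser.
    sample = ">example_seq hypothetical record\nACGTAC\nGTTAGC\n"
    for header, sequence in parse_fasta(sample):
        print(header, sequence, choose_program("nucleotide", "nucleotide"))
```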

  15. References
     • Stevens, Hallam (2013). "From bomb to bank: Walter Goad and the introduction of computers into biology". In Oren Harman and Michael Dietrich, eds., Outsider Scientists: Routes to Innovation in Biology. Chicago, IL: University of Chicago Press: 128–144.
     • Bosak, Jon; Bray, Tim (May 1999). "XML and the Second-Generation Web".
     • Altschul, Stephen; Gish, Warren; Miller, Webb; Myers, Eugene; Lipman, David J. (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410.
     • Wilbur, W. J.; Lipman, D. J. (1983). "Rapid similarity searches of nucleic acid and protein data banks". Proceedings of the National Academy of Sciences of the United States of America. 80 (3): 726–730.
     • Adapted from Biological Sequence Analysis I, Current Topics in Genome Analysis
     • https://www.ncbi.nlm.nih.gov/genbank/
     • https://blast.ncbi.nlm.nih.gov/Blast.cgi
