GenBank • Nucleotide-only sequence database • Archival in nature • Data shared nightly among three collaborating databases: GenBank at NCBI, DNA Database of Japan (DDBJ), EMBL at EBI
The International Sequence Database Collaboration (Source: NCBI)
NCBI site map: A good place to find resources http://www.ncbi.nlm.nih.gov/Sitemap/index.html
GenBank Release 131.0, December 15, 2003 • 30,968,418 sequences • 36,553,368,485 bases • Full release every two months • Incremental and cumulative updates daily • Available only through the Internet: ftp://ftp.ncbi.nih.gov/genbank/
GenBank Record • Header: information that applies to the whole record • Features: annotations on the record • Sequence
GenBank Record: Header • Locus Name • Sequence Length • Molecule Type • GenBank Division • Modification Date • Accession Number • Version Number
GenBank Record: Features • Link to sequence
GenBank Record: Sequence
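The header fields listed above (locus name, sequence length, molecule type, division, modification date) all sit on the first line of a GenBank flat-file record, the LOCUS line. As a minimal sketch, assuming the common whitespace-separated eight-token layout, the line can be split into those fields as shown here. The `LocusLine` class, the `parse_locus` function and the example line are hypothetical illustrations, not part of any NCBI tool, and older or unusual records may not fit this layout.

```python
from dataclasses import dataclass


@dataclass
class LocusLine:
    name: str            # locus name
    length_bp: int       # sequence length in base pairs
    molecule_type: str   # e.g. mRNA, DNA
    topology: str        # linear or circular
    division: str        # GenBank division code, e.g. PRI
    modified: str        # modification date, DD-MON-YYYY


def parse_locus(line: str) -> LocusLine:
    """Split a whitespace-delimited LOCUS line into its header fields.

    Assumes exactly eight tokens:
    LOCUS <name> <length> bp <molecule> <topology> <division> <date>
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    _, name, length, _, molecule, topology, division, modified = tokens
    return LocusLine(name, int(length), molecule, topology, division, modified)


# Hypothetical example line, made up for illustration only.
example = "LOCUS       EXAMPLE01               1234 bp    mRNA    linear   PRI 15-DEC-2003"
record = parse_locus(example)
print(record)
```

For real work a maintained parser is the safer choice; this sketch only shows where the header fields come from.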
Entrez: http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi • Select GenBank
Find mRNA sequence for human “epidermal growth factor receptor”
Specify human as the organism: click Preview/Index, then specify “human” by selecting “Organisms” from the “All Fields” drop-down menu
Limit your search • Exclude all technology-generated records • Select mRNA in the “Molecule” list • Select “RefSeq” in the database list
RefSeq • Database of reference sequences • Curated • Non-redundant: one record for each gene, or each splice variant, from each organism represented • Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article (see the RefSeq FAQ)
Find the gene name by searching LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/ • Select organism
Find mRNA sequence for epidermal growth factor receptor (EGFR): start with the gene name EGFR • Limit the search field to Gene Name • Exclude all technology-generated records • Select mRNA as the molecule • Select “RefSeq” as the source database
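The same limits can also be written as a single fielded query string. The sketch below is an assumption layered on top of the slides, which only describe the web form: it builds an Entrez search term with field tags and a request URL for NCBI's E-utilities `esearch` service, without sending anything over the network. The helper names `build_term` and `build_url` are hypothetical, the `retmax` value is arbitrary, and the "technology-generated records" limit is not modelled.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address of the E-utilities search service (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(gene: str, organism: str) -> str:
    """Combine the slide's limits into one Entrez query term."""
    parts = [
        f"{gene}[Gene Name]",      # limit the search field to Gene Name
        f"{organism}[Organism]",   # restrict to one organism
        "biomol_mrna[PROP]",       # molecule type: mRNA
        "srcdb_refseq[PROP]",      # source database: RefSeq
    ]
    return " AND ".join(parts)


def build_url(term: str, db: str = "nucleotide", retmax: int = 20) -> str:
    """Return an esearch URL; urlencode escapes brackets and spaces."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_term("EGFR", "human")
url = build_url(term)
# Decode the URL again to confirm the term survives the escaping unchanged.
decoded = parse_qs(urlsplit(url).query)["term"][0]
print(term)
print(url)
```

Pasting the printed term into the Entrez search box should behave like ticking the corresponding limits by hand, although the exact result set depends on the current state of the databases.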
Entrez: Neighbors and Hard Links [Diagram of linked Entrez databases. Labels: PubMed abstracts, Taxonomy, Genomes, Nucleotide sequences, Protein sequences, 3-D Structure; neighbor relationships: Word weight, BLAST, VAST, Phylogeny. Source: NCBI]
SRS start page (http://srs.ebi.ac.uk) • Permanent session • Temporary session • List of public servers • Documentation • Database information: which databases are present and when they were indexed
What is SRS? • Central resource for molecular biology data • Data retrieval system: more than 250 databanks have been indexed; more than 35 SRS servers on the WWW • Data analysis applications server: 11 protein applications, 6 nucleic acid applications • Uniform query interface on the web
History of SRS • 1990: Development started at EMBL, Heidelberg; main author Dr. Thure Etzold • 1997: Moved to EBI in Cambridge; development work was supported by various grants, among others from EMBnet • 1998: Etzold and his group joined Lion Bioscience
Why SRS? • Information retrieval • Easy way to retrieve information from sequence and sequence-related databases • Possibility to search for multiple words/other criteria • Linkage between different databases • E.g. Find all primary structures with known three-dimensional structure • ... and much more
Philosophy of SRS • The original database file (plain text, HTML, XML) is parsed into an index file • The index supports data retrieval and searchable links between database entries
The Library Select Page • Workbenches • Query forms • Library groups • Libraries
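The idea of parsing an original database file into an index and then retrieving entries through that index can be illustrated with a small inverted index. This is a generic sketch of the technique, not SRS's actual parser or index format; the entry IDs, the entry texts and the function names `build_index` and `lookup` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical flat-file entries: entry ID -> free-text description.
entries = {
    "E1": "epidermal growth factor receptor mRNA",
    "E2": "insulin receptor precursor",
    "E3": "epidermal keratin gene",
}


def build_index(entries: dict) -> dict:
    """Map each lower-cased word to the set of entry IDs containing it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(entry_id)
    return index


def lookup(index: dict, *words: str) -> list:
    """Return the sorted IDs of entries that contain every given word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(entries)
print(lookup(index, "receptor"))
print(lookup(index, "epidermal", "receptor"))
```

Once the index exists, a query never rescans the original text; it only intersects the ID sets of the query words, which is what makes retrieval over large flat files practical.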
SRS main toolbar tabs • Top Page: displays databases in different database groups • Query: displays either the standard or the extended query form • Results, or “the query manager”: maintains a history of all the results obtained during a session • Projects, or “the project manager”: maintains a history of all queries and views used during a session • Views: allows a user to define a user-specific view for one or more databases • Databanks: contains a list of, and some facts about, the databases available in the system
Search terms in SRS • SRS indexed fields can be searched using any of the following: • Single word search • Multiple word phrases • Numbers and dates • Regular expressions • Wildcards
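To make the difference between wildcard and regular-expression terms concrete, the sketch below matches both kinds of pattern against a small list of indexed words. It uses Python's own `fnmatch` and `re` conventions as a stand-in; SRS's query syntax is not shown in the slides and may differ, and the word list and function names are hypothetical.

```python
import fnmatch
import re

# Hypothetical words taken from an indexed field.
words = ["kinase", "kinases", "kinesin", "receptor", "p53", "1998"]


def wildcard_search(pattern: str, words: list) -> list:
    """Shell-style wildcards: * matches any run of characters, ? exactly one."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]


def regex_search(pattern: str, words: list) -> list:
    """Regular expression that must match the whole word."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.fullmatch(w)]


print(wildcard_search("kin*", words))    # every word starting with "kin"
print(regex_search(r"kinases?", words))  # singular or plural form only
print(regex_search(r"\d{4}", words))     # four-digit numbers such as years
```

A wildcard is quicker to type but coarser: `kin*` also picks up "kinesin", whereas the regular expression can be restricted to exactly the forms wanted.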