NCBI Molecular Biology Resources: Entrez. 王禄山. Mar. 2005. NCBI Resources. About NCBI. NCBI Sequence Databases. Primary Database: GenBank. Derivative Databases: RefSeq. Entrez Databases and Text Searching. BLAST. Bethesda, MD. The National Institutes of Health.
NCBI Molecular Biology Resources: Entrez 王禄山 Mar. 2005
NCBI Resources • About NCBI • NCBI Sequence Databases • Primary Database: GenBank • Derivative Databases: RefSeq • Entrez Databases and Text Searching • BLAST
Bethesda, MD The National Institutes of Health
The National Center for Biotechnology Information • Accepts submissions of primary data • Develops tools to analyze these data • Creates derivative databases based on the primary data • Provides free search, link, and retrieval of these data, primarily through the Entrez system
The National Center for Biotechnology Information (NCBI) • Created as a part of the National Library of Medicine in 1988 • Establish public databases • Research in computational biology • Develop software tools for sequence analysis • Disseminate biomedical information • Tools: Entrez (1992), BLAST (1990) • GenBank (1992) • Free MEDLINE (PubMed, 1997) • Other databases: dbEST, dbGSS, dbSTS, MMDB, OMIM, UniGene, GeneMap, Taxonomy, CGAP, SAGE, LocusLink, RefSeq
[Figure: Number of Users and Hits Per Day, 1997 to 2003; annotated with "Christmas & New Year"]
Homepage - accessing the data all[filter]
all[filter] 1/11/2005
Entrez Nucleotide • Primary Data: GenBank / DDBJ / EMBL 46,974,918 (98.86%) • Derivative Data: RefSeq 533,236 (1.12%); PDB (structures) 5,484; Third Party Annotation (TPA) 4,516 • “Total” 47,518,338
GenBank: NCBI’s Primary Sequence Database • Release 145, Dec 2004: 40.6 × 10^6 records, 44.5 × 10^9 nucleotides, 153 gigabytes, 705 files • full release every two months • incremental and cumulative updates daily • available only through internet • release notes: gbrel.txt • ftp://ftp.ncbi.nih.gov/genbank/ • ftp://genbank.sdsc.edu/pub • ftp://bio-mirror.net/biomirror/genbank
Molecular Databases • Primary Databases: original submissions by experimentalists; database staff organize but don’t add additional information; example: GenBank • Derivative Databases • Human curated: compilation and correction of data; examples: SWISS-PROT, NCBI RefSeq mRNA • Computationally derived; example: UniGene • Combinations; example: NCBI Genome Assembly
Primary vs. Derivative Databases [Diagram] • GenBank: updated ONLY by submitters (labs, sequencing centers); divisions shown: EST, STS, GSS, HTG, INV, VRT, PHG, VRL, PRI, ROD, PLN, MAM, BCT • Derivative databases: updated by NCBI (curators); RefSeq (annotation pipeline; Entrez Gene and Genomes pipelines), UniGene, UniSTS
GenBank Records: The Flatfile Format • Header • Feature Table • Sequence
A Typical GenBank Record
LOCUS       NM_019570   4279 bp   mRNA   linear   INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA   (= Title)
ACCESSION   NM_019570
VERSION     NM_019570.3  GI:50811869
KEYWORDS    .
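The header of a flatfile record is a series of keyword lines, so it can be read with a few lines of code. The following is a minimal sketch, not NCBI's own parser: the function names `parse_header` and `split_version` are hypothetical, the column spacing in the sample text is approximate, and continuation lines are assumed to begin with a space.

```python
import re

# Header fields copied from the slide; exact column spacing is an assumption.
HEADER = """\
LOCUS       NM_019570   4279 bp   mRNA   linear   INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
ACCESSION   NM_019570
VERSION     NM_019570.3  GI:50811869
KEYWORDS    .
"""


def parse_header(text):
    """Map each header keyword (LOCUS, DEFINITION, ...) to its value string."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] != " ":
            # A new keyword starts in the first column.
            keyword, _, value = line.partition(" ")
            current = keyword
            fields[current] = value.strip()
        elif current is not None:
            # Indented lines continue the previous keyword's value.
            fields[current] += " " + line.strip()
    return fields


def split_version(version_value):
    """Split 'ACCESSION.N  GI:12345' into (accession, version number, gi)."""
    match = re.match(r"(\w+)\.(\d+)(?:\s+GI:(\d+))?", version_value)
    if match is None:
        raise ValueError(f"unrecognized VERSION line: {version_value!r}")
    accession, number, gi = match.groups()
    return accession, int(number), gi


header = parse_header(HEADER)
accession, version, gi = split_version(header["VERSION"])
print(header["DEFINITION"])
print(accession, version, gi)
```

The VERSION line is the one worth splitting: the accession stays fixed while the number after the dot changes with each sequence revision.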
GenBank Record: Feature Table (screenshot callouts: Entrez, GenPept identifier, BLAST)
GenBank Record: Sequence (screenshot callout: BLAST)
http://www.ncbi.nlm.nih.gov/ NCBI Homepage
Entrez NCBI Homepage Mendelian Inheritance in Man BLAST
Using Entrez An integrated database search and retrieval system
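Entrez: Neighboring and Hard Links [Diagram] • Linked databases: PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure (MMDB), Genomes, Taxonomy • Neighboring methods: Word weight (PubMed abstracts), BLAST (nucleotide sequences), BLAST (protein sequences), VAST (3-D structure), Phylogeny (Taxonomy)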
GEO (Gene Expression Omnibus): a database that collects and stores microarray gene expression data.
Database Searching with Entrez Using limits and field restriction to find mouse GAPD Linking and neighboring with mouse GAPD
Mouse Entrez Nucleotides
Document Summaries: Mouse[All Fields] 7 million records
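Data Rich, Knowledge Poor: do not drown in the "ocean of data and information"; go and find the "islands of knowledge".
What are data, information, and knowledge? Keep in mind that the repositories bioinformatics uses for storage today are called DATABASES.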
Mouse Entrez Nucleotides: Limits: Preview/Index
Entrez Nucleotides: Limits (Mouse) • Field Restriction: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title Word, Uid, Volume • Exclude unwanted categories of sequences • Gene Location: Genomic DNA/RNA, Mitochondrion, Chloroplast • Molecule: Genomic DNA/RNA, mRNA, rRNA • Only From: RefSeq, GenBank, EMBL, DDBJ
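Field restriction amounts to appending a bracketed field name to each search term and joining the terms with a Boolean operator. The sketch below builds such a query for mouse GAPD. The helpers `field_term` and `build_query` are hypothetical names, and the E-utilities `esearch` URL with the `nuccore` database is an assumption added for illustration: the slides use the Entrez web interface, not this endpoint. No network request is made.

```python
from urllib.parse import urlencode


def field_term(value, field):
    """Format one field-restricted Entrez term, e.g. mouse[Organism]."""
    return f"{value}[{field}]"


def build_query(terms, operator="AND"):
    """Join (value, field) pairs into a single Boolean query string."""
    return f" {operator} ".join(field_term(v, f) for v, f in terms)


# Field names are taken from the Limits list on the slide.
query = build_query([("mouse", "Organism"), ("GAPD", "Gene Name")])

# Assumption: the public E-utilities esearch endpoint and the "nuccore" database.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
url = ESEARCH + "?" + urlencode({"db": "nuccore", "term": query})

print(query)
print(url)
```

Keeping the query as a list of (value, field) pairs makes it easy to add or drop a restriction, much as Preview/Index does in the web interface.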
Document Summaries: Mouse[Organism] • 7,247,131 [All Fields] − 6,850,905 [Organism] • 397,226
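The gap between the two counts comes from records that mention "mouse" somewhere other than the Organism field. A toy illustration, assuming four made-up records and plain substring matching (Entrez's actual indexing is more involved, and the `matches` helper is hypothetical):

```python
# Made-up records for illustration only; not real GenBank entries.
records = [
    {"Organism": "Mus musculus (house mouse)", "Title": "Mus musculus Gapd mRNA"},
    {"Organism": "Homo sapiens", "Title": "Human homolog of mouse Rev1l, mRNA"},
    {"Organism": "Rattus norvegicus", "Title": "Rat cDNA similar to mouse Gapd"},
    {"Organism": "Mus musculus (house mouse)", "Title": "Rev1l mRNA, complete cds"},
]


def matches(record, word, field=None):
    """True if word occurs in the given field, or in any field when field is None."""
    word = word.lower()
    if field is None:
        return any(word in value.lower() for value in record.values())
    return word in record.get(field, "").lower()


all_fields = [r for r in records if matches(r, "mouse")]
organism_only = [r for r in records if matches(r, "mouse", "Organism")]

print(len(all_fields), "records match mouse[All Fields]")
print(len(organism_only), "records match mouse[Organism]")
```

The human and rat records are found by the unrestricted search only because their titles mention mouse, which is the kind of unwanted hit the Organism restriction removes.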
Adding Terms: Preview/Index Search History
161 Mouse GAPD Records