NCBI PowerScripting. Lecture 1: Introduction to Entrez. October 16-19, 2007. The Entrez Query System at NCBI. Entrez Help Document. Entrez Functions. Search one or all of 31 databases. Generate brief “document summaries” for a list of records.
NCBI PowerScripting Lecture 1: Introduction to Entrez October 16-19, 2007
Entrez Functions • Search one or all of 31 databases. • Generate brief “document summaries” for a list of records. • Link from one list of records to another. • Perform boolean operations on lists of records. • Format records for display and download.
Links Between and Within Nodes • [Figure: Entrez nodes (PubMed abstracts, Taxonomy, Phylogeny, Genomes, Nucleotide sequences, Protein sequences, 3-D Structures) joined by computational links labeled Word weight, VAST, and BLAST.]
Entrez Transactions • Each record in an Entrez database is assigned an integer called a UID, or “unique identifier”. • Entrez transactions are performed on lists of UIDs. • Transactions include boolean operations and the tracking of links within and between database records.
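The boolean operations on UID lists described above can be illustrated locally. This is a minimal sketch, not Entrez's own implementation; the function names and the UID values are made up for illustration.

```python
def uid_and(a, b):
    """UIDs present in both lists, in the order of the first list."""
    b_set = set(b)
    return [uid for uid in a if uid in b_set]


def uid_or(a, b):
    """UIDs present in either list, without duplicates, first-seen order."""
    seen = set()
    merged = []
    for uid in list(a) + list(b):
        if uid not in seen:
            seen.add(uid)
            merged.append(uid)
    return merged


def uid_not(a, b):
    """UIDs in the first list that are absent from the second."""
    b_set = set(b)
    return [uid for uid in a if uid not in b_set]


# Hypothetical UID lists standing in for two search results.
first = [101, 102, 103]
second = [102, 103, 104]
both = uid_and(first, second)
either = uid_or(first, second)
only_first = uid_not(first, second)
```

Because every record is identified by an integer, AND, OR, and NOT reduce to set intersection, union, and difference.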
Entrez Database Queries • Entrez supports text searches with field restrictions, boolean operators (sometimes implicit), and term grouping • Field restrictions vary among the databases • Term-mapping happens • Explicitly fielded searches are not term-mapped • Quoted phrases are searched as a unit
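A fielded, grouped query of the kind described above can be assembled as a plain string before it is URL-encoded. The sketch below is illustrative only: `fielded` and `combine` are hypothetical helper names, and the field tags are taken from examples elsewhere in this lecture.

```python
from urllib.parse import quote_plus, unquote_plus


def fielded(term, field, phrase=False):
    """Attach a field restriction; quote the term to search it as a unit."""
    text = f'"{term}"' if phrase else term
    return f"{text}[{field}]"


def combine(operator, *parts):
    """Join sub-queries with an explicit boolean operator, inside parentheses."""
    return "(" + f" {operator} ".join(parts) + ")"


query = combine(
    "AND",
    fielded("common cold", "MeSH Terms", phrase=True),
    fielded("miller j", "Author"),
)
# URL-encode the query so it can be used as the value of a "term" parameter.
encoded = quote_plus(query)
```

Here `query` is `("common cold"[MeSH Terms] AND miller j[Author])`. Since both terms carry explicit field tags, neither would be term-mapped.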
Term Mapping (PubMed) • Untagged terms that are entered in the search box are matched (in this order) against: • - a MeSH (Medical Subject Headings) translation table • - a Journals translation table • - the Full Author translation table • - the Author index • - the Full Investigator (Collaborator) translation table • - and an Investigator (Collaborator) index
Term: cold • PubMed: "chronic obstructive pulmonary disease"[Text Word] OR "pulmonary disease, chronic obstructive"[MeSH Terms] OR ("common cold"[TIAB] NOT Medline[SB]) OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word] • PMC: "pulmonary disease, chronic obstructive"[MeSH Terms] OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word] • Nucleotide: cold[All Fields] • Taxonomy: cold[All Names]
Term: mouse • PubMed: ("mice"[TIAB] NOT Medline[SB]) OR "mice"[MeSH Terms] OR mouse[Text Word] • PMC: "mice"[MeSH Terms] OR mouse[Text Word] • Nucleotide: "Mus musculus"[Organism] OR mouse[All Fields] • Taxonomy: mouse[All Names] • Genome: "Mus musculus"[Organism] OR mouse[All Fields]
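The "matched in this order" behavior of term mapping can be pictured as an ordered lookup in which the first table that knows the term wins. The sketch below is a toy model under stated assumptions, not how Entrez is implemented: the table contents are hypothetical stand-ins (the one entry reuses the PMC translation of "mouse" shown above), and the fall-back to `[All Fields]` is an assumption.

```python
# Hypothetical, tiny stand-ins for the translation tables, listed in lookup order.
TABLES = [
    ("MeSH", {"mouse": '"mice"[MeSH Terms] OR mouse[Text Word]'}),
    ("Journals", {}),
    ("Full Author", {}),
]


def map_term(term, tables):
    """Return (table name, translation) from the first table containing the term."""
    key = term.lower()
    for name, table in tables:
        if key in table:
            return name, table[key]
    # Assumption: an unmatched term is searched in all fields.
    return None, f"{term}[All Fields]"


matched = map_term("Mouse", TABLES)
unmatched = map_term("xyzzy", TABLES)
```

The order of the list matters: a term found in an earlier table is never looked up in the later ones.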
Viewing Indexed Terms on the Web Preview-Index Tab
Patterns are Recognized • PubMed, PMC, Nucleotide, Protein, Structure and others: • miller baker: miller[All Fields] AND baker[All Fields] • miller j baker m: miller j[Author] AND baker m[Author] • All Databases: • AF123456, P12243, 555: direct retrieval of record
Search History • Separate search history is maintained for each database. • Previous searches can be recalled and combined using a query key and a cookie, called a “WebEnv”. • Available on the Web under the 'History Tab'
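When a search is run through the E-utilities with history enabled, the query key and WebEnv come back in the XML response. The following sketch pulls them out with the standard library. The sample document is hand-written for illustration (the WebEnv value is a placeholder, not a real cookie), and `parse_history` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative response; a real WebEnv is a long opaque string.
SAMPLE = """<eSearchResult>
  <Count>161815</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""


def parse_history(xml_text):
    """Extract the hit count, query key, and WebEnv from an ESearch result."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "query_key": root.findtext("QueryKey"),
        "webenv": root.findtext("WebEnv"),
    }


history = parse_history(SAMPLE)
```

The `query_key` and `webenv` values are what later requests pass back to refer to the stored result set instead of resending the UIDs.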
DocSums • Brief summaries of database records are generated quickly on frontend servers. • Full records are retrieved from backend machines.
PubMed • 17,454,100 Records • Biomedical literature citations and abstracts • Key Field Restrictions • [author] • [title] • [pdat] – publication date • [mesh] – Medical Subject Headings • [journal] • [volume]
CoreNucleotide • 41,888,768 Records • Sequence database (GenBank) • Key Field Restrictions • [organism] • [accession] • [author] • [title] • [sequence length] • [properties] • [gene]
Protein • 18,192,257 Records • Protein sequence records • Key Field Restrictions • [organism] • [title] • [author] • [molecular weight] • [sequence length] • [gene] • [ecno] – enzyme commission number
Gene • 3,723,441 Records • Gene database: locus-centered records • Key Field Restrictions • [organism] • [gene] – official symbol of gene locus • [chromosome] • [title] • [accession]
Eutilities • A set of eight server-side programs. • Support a uniform URL syntax. • Translate a standard set of URL-encoded input parameters for the array of programs comprising the Entrez system.
Entrez Functions and EUtils • Searches: esearch.fcgi • DocSums: esummary.fcgi • Links: elink.fcgi • Uploads: epost.fcgi • Downloads: efetch.fcgi • Global Query: egquery.fcgi • Spelling: espell.fcgi • Information: einfo.fcgi
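Because the programs share a uniform URL syntax, one small helper can build a request for any of them. This is a sketch under stated assumptions: the base URL is the one that appears in this lecture's examples, and `PROGRAMS` and `eutils_url` are hypothetical names. No network request is made.

```python
from urllib.parse import urlencode

# Base URL as used in this lecture's examples.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Entrez function -> server-side program, as listed above.
PROGRAMS = {
    "search": "esearch.fcgi",
    "docsum": "esummary.fcgi",
    "link": "elink.fcgi",
    "upload": "epost.fcgi",
    "download": "efetch.fcgi",
    "global_query": "egquery.fcgi",
    "spelling": "espell.fcgi",
    "info": "einfo.fcgi",
}


def eutils_url(function, **params):
    """Build a URL for one E-utility with URL-encoded parameters."""
    return BASE + PROGRAMS[function] + "?" + urlencode(params)


url = eutils_url("search", db="gene", term="mammalia[orgn]", usehistory="y")
```

`urlencode` percent-encodes the square brackets in the term, so the same helper works for any fielded query.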
An Esearch Followed by Multiple Rounds of Efetch
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=gene&term=mammalia[orgn]
Elapsed time: 0 seconds
0%, 0 records of 161815 retrieved.
Tue Jan 25 20:46:32 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=0&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 40 seconds
0.3%, 500 records of 161815 retrieved.
Tue Jan 25 20:47:09 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 79 seconds
0.61%, 1000 records of 161815 retrieved.
Tue Jan 25 20:47:48 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 118 seconds
0.92%, 1500 records of 161815 retrieved.
Tue Jan 25 20:48:27 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 158 seconds
1.23%, 2000 records of 161815 retrieved.
Tue Jan 25 20:49:07 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 204 seconds
1.54%, 2500 records of 161815 retrieved.
Tue Jan 25 20:49:53 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
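The batched download logged above follows a simple pattern: one search with history, then repeated Efetch calls that advance `retstart` by `retmax`. Here is a minimal sketch of generating those Efetch URLs. It only builds strings and sends nothing; the function names are hypothetical, and the WebEnv value is a placeholder for the one a real search returns.

```python
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_batches(total, webenv, query_key, db="gene", retmax=500, tool="datahog"):
    """Yield one Efetch URL per batch of `retmax` records from a stored search."""
    for retstart in range(0, total, retmax):
        params = {
            "tool": tool,
            "db": db,
            "retmax": retmax,
            "retstart": retstart,
            "rettype": "native",
            "WebEnv": webenv,
            "query_key": query_key,
            "retmode": "xml",
        }
        yield EFETCH + "?" + urlencode(params)


def batch_count(total, retmax=500):
    """Number of Efetch calls needed to cover `total` records."""
    return -(-total // retmax)  # ceiling division


urls = list(efetch_batches(161815, "EXAMPLE_WEBENV", 1))
```

With 161815 records and batches of 500, this yields 324 requests, the last one starting at record 161500.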
A Download of 161815 Mammalian Entrez Gene Records • [Figure: plot with axes labeled "Seconds" and "Efetch calls".]