PubMed: On-line Access to Searching the Biomedical Literature
PubMed Tutorial • http://www.nlm.nih.gov/bsd/pubmed_tutorial/m1001.html • This is a useful tool. • You need the correct software download installed to view the interactive animations. • http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp • Another useful tool, with lots of Quick Start hints. • I have summarized some of them in this lecture.
Interrelationships: NCBI Entrez • Entrez links records across the NCBI databases • It is a life sciences search engine • http://www.ncbi.nlm.nih.gov/sites/gquery
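Entrez searches can also be run programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL for PubMed using Python's standard library; it makes no network call, and the `retmax` default and the example query are illustrative choices, not recommendations.

```python
from urllib.parse import urlencode

# ESearch is the E-utilities endpoint that returns the UIDs matching a query.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for `term` in Entrez database `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces, quotes and brackets in the query string.
    return f"{ESEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query: a MeSH heading combined with a publication year.
    print(esearch_url('"NF-kappa B"[mh] AND 2005[dp]'))
```

Fetching such a URL returns the matching PMIDs (as XML by default); NCBI publishes usage limits for E-utilities that automated clients should respect.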
PubMed vs. Google • PubMed • Peer-reviewed journals • Multiple layers of quality control • Edited and reviewed text and grammar • Combines automated and manual searching • Structured links to other data sets (nucleic acid and protein sequences) • Google • The internet • Free, but you get what you pay for • Variable document structure and grammar • Fully automated search • Unstructured links
Entrez PubMed • Access: http://www.ncbi.nlm.nih.gov/entrez/ • Entrez covers biomedical research broadly; even recent journals are indexed. • Lack of coverage in: • CS and engineering • Physical chemistry • Plant science • Searchable content • Free text search: title, abstract, indexing, address • Controlled vocabulary: MeSH indexing, journal, dates, substance names, secondary indices
PubMed Central • The U.S. National Library of Medicine's digital archive of life sciences journal literature • Full text of many journal archives • Not the most recent issues • Limited journal collection • Access to PMC is free and unrestricted: http://www.pubmedcentral.nih.gov/about/faq.html
PubMed Entry • A PubMed entry includes: • Citation • Link to the full-text paper (when available) • Abstract • PMID# (the PubMed unique identifier, which also serves as the record's Entrez UID#)
Searching the Biomedical Literature • PubMed records are also available in a flat-file format with labeled fields. • Knowing these fields lets you focus your search and find what you are looking for more quickly. • For example, if you are looking for a specific person’s work and know where it was published, you can search by author and journal.
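One common flat-file layout for PubMed records is the MEDLINE display format, in which each line starts with a short field tag such as PMID, TI (title), AU (author), TA (journal title abbreviation) or MH (MeSH heading). The minimal parser below assumes the usual layout (a tag padded to four characters, a hyphen, then the value, with continuation lines indented six spaces). The sample record and its PMID are made up, and `matches` is a hypothetical helper for the author-plus-journal example.

```python
from collections import defaultdict


def parse_medline(record: str) -> dict:
    """Parse one MEDLINE-format record into {tag: [value, ...]}."""
    fields = defaultdict(list)
    last_tag = None
    for line in record.splitlines():
        if not line.strip():
            continue
        if line.startswith("      ") and last_tag is not None:
            # Continuation line: extend the most recent value of the last tag.
            fields[last_tag][-1] += " " + line.strip()
        elif len(line) > 4 and line[4] == "-":
            # Field line: tag in columns 1-4 (space-padded), hyphen in column 5.
            last_tag = line[:4].strip()
            fields[last_tag].append(line[5:].strip())
    return dict(fields)


def matches(rec: dict, author: str, journal: str) -> bool:
    """Hypothetical filter: True if `author` is listed and the journal matches."""
    return author in rec.get("AU", []) and journal in rec.get("TA", [])


# Made-up record for illustration only.
SAMPLE = """PMID- 10000000
TI  - A hypothetical study of NF-kappa B signaling in
      cultured cells.
AU  - Smith J
AU  - Doe A
TA  - Nat Genet
MH  - NF-kappa B/metabolism
"""

if __name__ == "__main__":
    rec = parse_medline(SAMPLE)
    print(rec["TI"][0])
    print(matches(rec, "Doe A", "Nat Genet"))
```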
Uses and Limits of MeSH • Manually indexed • Major topics => intelligent filtering • Picks up concepts that are not in the title/abstract • Indexing takes time (citations from the most recent several months have no MeSH headings) • People are fallible, so some misclassification occurs • Subheadings can be very useful, but are less reliable • Strong medical focus • Good for biomedical searches • Not as useful in technical areas, agriculture and plants
MeSH Vocabulary • The MeSH controlled vocabulary is a distinctive feature of MEDLINE. • It imposes uniformity and consistency on the indexing of the biomedical literature. • MeSH terms are arranged in a hierarchical, categorized system. • These MeSH Tree Structures are updated annually.
Curating: Not in a Museum • Curating in bioinformatics is an action taken by someone (often a scientist trained in technical areas) to regularize the language of science. • Science uses too many synonyms: words that mean roughly the same thing. • Curating regularizes those terms so we can search for things. • MeSH is a form of curation.
MeSH Homepage • http://www.nlm.nih.gov/mesh/meshhome.html • MeSH is needed to organize searching efficiently. • It reduces the synonyms and abbreviations in the biomedical literature to standard headings. • Having humans sort terms into these headings is “curation.”
Structure of MeSH • Divisions • Anatomy [A] • Organisms [B] • Diseases [C] • Chemicals and Drugs [D] • Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] • Psychiatry and Psychology [F] • Biological Sciences [G] • Physical Sciences [H] • Anthropology, Education, Sociology and Social Phenomena [I] • Technology and Food and Beverages [J] • Humanities [K] • Information Science [L] • Persons [M] • Health Care [N] • Geographic Locations [Z] • Hierarchy with multiple inheritance: NF-kappa B sits under three parents, with a separate tree number at each location • Amino Acids, Peptides, and Proteins [D12] > Proteins [D12.776] > DNA-Binding Proteins [D12.776.260] > NF-kappa B [D12.776.260.600] • Amino Acids, Peptides, and Proteins [D12] > Proteins [D12.776] > Nuclear Proteins [D12.776.660] > NF-kappa B • Amino Acids, Peptides, and Proteins [D12] > Proteins [D12.776] > Transcription Factors [D12.776.930] > NF-kappa B
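Because tree numbers encode position, a term's parent and its narrower terms can be read off the numbers themselves. The sketch assumes only what the tree numbers above show (each level adds one dot-separated segment); the function names are hypothetical, and `explode` mimics the idea of including all terms below a heading rather than reproducing PubMed's implementation. For a heading with several locations, the explosion would be the union over all of its tree numbers.

```python
from typing import List, Optional


def parent(tree_number: str) -> Optional[str]:
    """Return the parent tree number, e.g. D12.776.260 -> D12.776."""
    if "." not in tree_number:
        return None  # top of a subtree, such as D12
    return tree_number.rsplit(".", 1)[0]


def ancestors(tree_number: str) -> List[str]:
    """Walk up the hierarchy from the parent to the top."""
    chain = []
    node = parent(tree_number)
    while node is not None:
        chain.append(node)
        node = parent(node)
    return chain


def explode(tree_number: str, all_numbers: List[str]) -> List[str]:
    """Return the tree number itself plus every number below it."""
    # Match on "prefix." so that D12.77 would not capture D12.776.
    return [n for n in all_numbers
            if n == tree_number or n.startswith(tree_number + ".")]


NUMBERS = ["D12", "D12.776", "D12.776.260", "D12.776.260.600",
           "D12.776.660", "D12.776.930"]

if __name__ == "__main__":
    print(ancestors("D12.776.260.600"))  # ['D12.776.260', 'D12.776', 'D12']
    print(explode("D12.776.260", NUMBERS))  # ['D12.776.260', 'D12.776.260.600']
```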
MeSH Full Listing: NF-kappa B • Scope note: Ubiquitous, inducible, nuclear transcriptional activator that binds to enhancer elements in many different cell types and is activated by pathogenic stimuli. The NF-kappa B complex is a heterodimer composed of two DNA-binding subunits: NF-kappa B1 and relA. • Year introduced: 1991 • Subheadings: administration and dosage, agonists, analysis, antagonists and inhibitors, biosynthesis, blood, cerebrospinal fluid, chemistry, classification, deficiency, diagnostic use, drug effects, genetics, immunology, isolation and purification, metabolism, pharmacokinetics, pharmacology, physiology, radiation effects, secretion, therapeutic use, toxicity, ultrastructure • Search options: Restrict Search to Major Topic headings only; Do Not Explode this term (i.e., do not include MeSH terms found below this term in the MeSH tree) • Entry Terms: NF-kB; NF kB; Nuclear Factor kappa B; kappa B Enhancer Binding Protein; Immunoglobulin Enhancer-Binding Protein; Enhancer-Binding Protein, Immunoglobulin; Immunoglobulin Enhancer Binding Protein; Transcription Factor NF-kB; Factor NF-kB, Transcription; NF-kB, Transcription Factor; Transcription Factor NF kB; Ig-EBP-1; Ig EBP 1 • Previous Indexing: DNA-Binding Proteins (1987-1990); Transcription Factors (1987-1990) • See Also: I-kappa B • Tree locations: All MeSH Categories > Chemicals and Drugs Category > Amino Acids, Peptides, and Proteins > Proteins > DNA-Binding Proteins > NF-kappa B; All MeSH Categories > Chemicals and Drugs Category > Amino Acids, Peptides, and Proteins > Proteins > Nuclear Proteins > NF-kappa B; All MeSH Categories > Chemicals and Drugs Category > Amino Acids, Peptides, and Proteins > Proteins > Transcription Factors > NF-kappa B
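The entry terms are what make curation pay off at search time: any of them can be mapped to the one preferred heading. The sketch uses a handful of the NF-kappa B entry terms; the normalization rule (ignore case, treat hyphens and spaces alike) is an assumption for illustration and is not PubMed's own term-mapping procedure.

```python
import re
from typing import Optional

PREFERRED = "NF-kappa B"

# A subset of the entry terms from the NF-kappa B record.
ENTRY_TERMS = [
    "NF-kappa B", "NF-kB", "NF kB", "Nuclear Factor kappa B",
    "Immunoglobulin Enhancer-Binding Protein", "Transcription Factor NF-kB",
    "Ig-EBP-1",
]


def normalize(term: str) -> str:
    """Lowercase and collapse runs of spaces/hyphens into a single space."""
    return re.sub(r"[\s-]+", " ", term).strip().lower()


# Every normalized synonym points at the single preferred heading.
LOOKUP = {normalize(t): PREFERRED for t in ENTRY_TERMS}


def to_heading(term: str) -> Optional[str]:
    """Return the preferred MeSH heading for a synonym, or None if unknown."""
    return LOOKUP.get(normalize(term))


if __name__ == "__main__":
    print(to_heading("NF kB"), to_heading("Ig EBP 1"), to_heading("p53"))
```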
Journals Database • Entrez -> Journals • A database of journal names and information • Example entry: Nature genetics. • pISSN: 1061-4036 • MEDLINE Abbr: Nat Genet • ISO Abbr: Nat. Genet. • NLM ID: 9216904
Boolean Logic • Boolean logic symbolically represents relationships between entities. There are three Boolean operators: • AND • Use the AND operator to retrieve a set in which each citation contains ALL the search terms. This operator places no condition on where the terms are found in relation to one another; the terms simply have to appear somewhere in the same citation.
Boolean Logic • OR • Use the OR operator to retrieve citations that contain at least one of the specified search terms. • Use OR when you want to pull together articles on similar subjects. • NOT • Use the NOT operator to exclude citations containing a term from your results. • Be careful with NOT, as you can exclude things you might want.
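Each search returns a set of citations, so the three operators behave like set operations: AND is intersection, OR is union, and NOT is set difference. The PMID sets below are hypothetical.

```python
# Hypothetical PMID sets returned by three single-term searches.
apoptosis = {101, 102, 103, 104}
p53 = {103, 104, 105}
mouse = {104, 106}

both = apoptosis & p53        # apoptosis AND p53: in both sets
either = apoptosis | p53      # apoptosis OR p53: in at least one set
no_mouse = apoptosis - mouse  # apoptosis NOT mouse: drops 104

if __name__ == "__main__":
    print(sorted(both), sorted(either), sorted(no_mouse))
```

NOT removed citation 104 even though it also matches p53, which is the kind of unintended loss the warning about NOT refers to.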
Boolean Logic in PubMed • Boolean operators (AND, OR, NOT) must be entered in uppercase letters. • Boolean operators are processed from left to right. • Use parentheses to nest terms together so they will be processed as a unit and then incorporated into the overall strategy. • You can see how PubMed interpreted the Boolean logic of your query by clicking Details.
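To see why parentheses matter when operators are processed left to right, the sketch evaluates a flat query strictly in order and compares it with a nested version. The sets A, B and C are hypothetical, and the evaluator handles only flat queries without parentheses; the nested unit is computed by hand first.

```python
import operator

OPS = {"AND": operator.and_, "OR": operator.or_, "NOT": operator.sub}


def eval_left_to_right(tokens, sets):
    """Evaluate [term, OP, term, OP, term, ...] strictly from left to right."""
    result = set(sets[tokens[0]])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, sets[term])
    return result


SETS = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}

if __name__ == "__main__":
    # A OR B AND C is read as (A OR B) AND C.
    print(eval_left_to_right(["A", "OR", "B", "AND", "C"], SETS))  # {3}
    # A OR (B AND C): evaluate the parenthesized unit first.
    nested = dict(SETS, BC=SETS["B"] & SETS["C"])
    print(eval_left_to_right(["A", "OR", "BC"], nested))  # {1, 2, 3}
```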
Phrase Searching • Specify phrases with quotes: “transcription factor” vs. “transcription” “factor” • Precomputed • Fast • Often mapped to synonyms and MeSH terms • A “phrase not found” message does not necessarily mean the phrase is absent
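The difference between a quoted phrase and separate terms can be shown on raw text. This toy matcher only illustrates the two behaviors; PubMed's phrase searching works from a precomputed index and may map phrases to synonyms and MeSH terms, which the sketch does not attempt. The two example sentences are made up.

```python
import re


def words(text: str) -> list:
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_all_terms(text: str, terms: list) -> bool:
    """Separate terms: every word present, anywhere in the text."""
    return set(terms) <= set(words(text))


def has_phrase(text: str, phrase: str) -> bool:
    """Quoted phrase: the words must appear adjacent and in order."""
    w, p = words(text), words(phrase)
    return any(w[i:i + len(p)] == p for i in range(len(w) - len(p) + 1))


DOC1 = "The transcription factor binds DNA."
DOC2 = "Transcription was reduced by an unknown factor."

if __name__ == "__main__":
    for doc in (DOC1, DOC2):
        print(has_phrase(doc, "transcription factor"),
              has_all_terms(doc, ["transcription", "factor"]))
```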
Text Neighboring • Related articles link (single or multiple articles) • Term usage similarity • Articles talking about the same thing are likely to use the same words • Good recall (sensitivity) • Precomputed and fast • Limitations • Strictly algorithmic, no understanding • “Ras activates PI3K” vs. “PI3K activates Ras” • Historical and author biases in vocabulary • Poor precision (positive predictive value) • Ranking cannot satisfy everyone
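Term usage similarity can be illustrated with cosine similarity over word counts. This is a stand-in chosen for brevity, not the weighting PubMed actually uses; it does, however, reproduce the "Ras activates PI3K" limitation, since word order is ignored.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag of words: lowercase token -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(count * vb[t] for t, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Opposite meanings, identical bags of words: similarity is 1 (up to rounding).
    print(cosine("Ras activates PI3K", "PI3K activates Ras"))
    print(cosine("Ras activates PI3K", "Ras binds Raf"))  # one shared word
```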
Computational Issues in Statistical Text Retrieval • Stop words • Simple words like “the” and “and” are not worth scoring • Term weights • We should weight matches of rare words more heavily than matches of common words • Stemming and synonyms • Need to stem verbs and plural forms • May or may not be able to reduce to a normalized set of synonyms
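The three issues can be seen together in a small pipeline: drop stop words, stem what remains, and weight each term by inverse document frequency (IDF), so that a term found in every document gets zero weight. The stop-word list and the suffix stripper are toy assumptions (a real system would use something like the Porter stemmer), the documents are made up, and IDF here is the plain log(N/df) variant.

```python
import math
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "of", "in", "a", "is"}


def stem(word: str) -> str:
    """Toy suffix stripper (not the Porter stemmer)."""
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three characters so short words like "ras" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str) -> list:
    """Tokenize, drop stop words, then stem."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def idf(docs: list) -> dict:
    """Inverse document frequency: log(N / number of docs containing the term)."""
    df = {}
    for doc in docs:
        for t in set(terms(doc)):
            df[t] = df.get(t, 0) + 1
    return {t: math.log(len(docs) / d) for t, d in df.items()}


DOCS = ["Ras activates the kinase", "Ras activated PI3K",
        "The kinase is activating Raf"]

if __name__ == "__main__":
    print([terms(d) for d in DOCS])
    print(idf(DOCS))
```

Here "activates", "activated" and "activating" collapse to one stem that occurs in every document, so it scores zero, while the rare "pi3k" gets the highest weight.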
Computational Issues in Statistical Text Retrieval • Normalizing for length • Don’t want to exclude short articles or articles without an abstract • All vs. all comparison is not feasible • 10^7 articles => 10^14 comparisons • Compute demands of the task are growing faster than Moore’s law
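The arithmetic behind the all-vs-all bullet: scoring every one of n articles against every other takes n^2 comparisons, or n(n-1)/2 if each unordered pair is scored only once. The function name is hypothetical.

```python
def comparisons(n: int) -> tuple:
    """Return (ordered, unordered) pair counts for an all-vs-all comparison."""
    ordered = n * n               # every article scored against every article
    unordered = n * (n - 1) // 2  # each distinct pair scored once
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = comparisons(10**7)
    print(f"{ordered:.1e} ordered, {unordered:.1e} unordered")
```

For n = 10^7 this gives 10^14 ordered comparisons and roughly 5 × 10^13 unordered ones, so halving the work by symmetry does not change the order of magnitude.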
Entrez Clipboard • The Clipboard gives you a place to collect selected citations from one search or several searches. • After you add citations to the Clipboard, you may then want to use the print, save, or order buttons. • The maximum number of items that can be placed in the Clipboard is 500.
Entrez Clipboard • Once you have added items to the Clipboard, you can click on Clipboard from the Features bar to view your selections. • PubMed Central uses cookies to add your selections to the Clipboard. To use this feature, your web browser must be set to accept cookies.
Using the Clipboard • Add to Clipboard • To place an item in the Clipboard, click the check box to the left of the citation. • Select Clipboard from the Send to pull-down menu. • Then click the Send to button. • Once you have added a citation to the Clipboard, its record number changes color to green. • You can save results collected from multiple searches. • The Clipboard holds a maximum of 500 items. • Clipboard items are lost after 8 hours of inactivity.
Saving from the Clipboard • Citations are initially displayed in summary format, in relevance order. • Use Sort to change the order. • You can select all or individual citations to display or save in one of the citation display formats.
Saving from the Clipboard • Select the desired format from the pull-down menu, click Save to save your selections to a file, or use the Print feature of your web browser to print the citations.
Saving from the Clipboard • Printing from your web browser will only print the information and citations listed on the web page. • You may also display citations as plain text without the sidebar menu and toolbars by clicking the Text button.
Modifying the Display • PubMed Central citations are initially displayed in a summary format. You can choose other formats: • Single citation: Click the Abstract, Full Text, PDF or PubLink hyperlink for that citation. • All citations: Select a display format from the Display pull-down menu, then click Display to view a different format or Links for all citations on the page. • Selected citations: Click the boxes to the left of the citations you want, select a format or Links from the Display pull-down menu, and click Display. • You can also use the LinkOut function in the Display menu, which can be handy.
Entrez History • Retrieve and reuse your search history • Combine search results with Boolean operators: put # before the search number, e.g., #2 AND #6. • Filter previous search results • On big searches, this helps you remember and build on your terms • Search History is lost after eight hours of inactivity
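Search History behaves like a numbered list of result sets that later searches can refer to by number. The class below is a hypothetical sketch of that bookkeeping with made-up PMIDs; in this sketch, a combined search is simply stored as the next numbered entry.

```python
import operator

OPS = {"AND": operator.and_, "OR": operator.or_, "NOT": operator.sub}


class SearchHistory:
    """Hypothetical model of numbered search results (#1, #2, ...)."""

    def __init__(self):
        self.results = []

    def add(self, pmids) -> int:
        """Store a result set and return its search number."""
        self.results.append(set(pmids))
        return len(self.results)

    def get(self, number: int) -> set:
        """Look up a result set by its search number (numbers start at 1)."""
        return self.results[number - 1]

    def combine(self, left: int, op: str, right: int) -> int:
        """Evaluate e.g. #2 AND #6 and store it as a new search."""
        return self.add(OPS[op](self.get(left), self.get(right)))


if __name__ == "__main__":
    h = SearchHistory()
    h.add({1, 2, 3})
    h.add({2, 3, 4})
    print(h.get(h.combine(1, "AND", 2)))  # {2, 3}
```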
Address Fields • Find a local expert in PubMed: “Marshall University” AND (25755 [ad] OR “West Virginia” [ad]) NOT WVU [ad] • Think about all the ways people write addresses: searching “Joan C. Edwards” fails to pick up “MUSOM.” • Zip codes are very specific, but retrieve only about 70% of articles, since not every author's zip code is listed • Won’t catch articles co-authored with a remote collaborator
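Because addresses are written in many ways, it can help to generate the query from a list of variants rather than typing it by hand, which also keeps the parentheses balanced. The helper names are hypothetical, [ad] is the PubMed address field tag used above, and the variant list is only an example.

```python
def tag(term: str, field: str) -> str:
    """Quote multi-word terms and attach a PubMed field tag, e.g. WVU[ad]."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]"


def any_of(terms: list, field: str) -> str:
    """OR together several variants of the same term, wrapped in parentheses."""
    return "(" + " OR ".join(tag(t, field) for t in terms) + ")"


def balanced(query: str) -> bool:
    """Check that every '(' has a matching ')'."""
    depth = 0
    for ch in query:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            return False
    return depth == 0


query = (any_of(["Marshall University", "Joan C. Edwards", "MUSOM"], "ad")
         + " AND " + any_of(["25755", "West Virginia"], "ad")
         + " NOT " + tag("WVU", "ad"))

if __name__ == "__main__":
    print(query)
    print(balanced(query))
```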
Related Articles • PubMed uses a word-weighted algorithm to compare words from the Title and Abstract of each citation, as well as the MeSH headings assigned. • The best matches for each citation are precalculated and stored as a set. • This is what makes it fast. • You may see a few citations without the Related Articles link; these have not yet gone through the algorithm, which takes several days.
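Precomputation means the expensive scoring happens once, offline, and the Related Articles link is then just a table lookup. In the sketch, Jaccard overlap between term sets stands in for PubMed's word-weighted score, and the documents and `k` are hypothetical; the nested loop is an all-vs-all comparison, so it only suits a toy corpus.

```python
def jaccard(a: set, b: set) -> float:
    """Share of terms in common: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precompute_neighbors(docs: dict, k: int = 2) -> dict:
    """Offline step: for each PMID, store its k most similar PMIDs."""
    table = {}
    for pmid, terms in docs.items():
        scored = [(jaccard(terms, other_terms), other)
                  for other, other_terms in docs.items() if other != pmid]
        # Highest score first; ties broken by PMID for a stable result.
        scored.sort(key=lambda s: (-s[0], s[1]))
        table[pmid] = [other for _, other in scored[:k]]
    return table


DOCS = {1: {"ras", "kinase", "signal"}, 2: {"ras", "kinase"},
        3: {"plant", "root"}, 4: {"ras", "signal"}}

NEIGHBORS = precompute_neighbors(DOCS)

if __name__ == "__main__":
    # Online step: answering "related articles" is a dictionary lookup.
    print(NEIGHBORS[1])
```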