
Getting Started: PCB3063 Term Project and NCBI’s OMIM, PubMed and Sequence Resources

Getting Started: PCB3063 Term Project and NCBI’s OMIM, PubMed and Sequence Resources. Michele R. Tennant, Ph.D., M.L.I.S. Health Science Center Libraries/ U.F. Genetics Institute PCB3063, General Genetics tennantm@ufl.edu. Today’s Session. Your term project


Presentation Transcript


  1. Getting Started: PCB3063 Term Project and NCBI’s OMIM, PubMed and Sequence Resources • Michele R. Tennant, Ph.D., M.L.I.S. • Health Science Center Libraries / U.F. Genetics Institute • PCB3063, General Genetics • tennantm@ufl.edu

  2. Today’s Session • Your term project • Resources to help you with your project … • HSCL Website, Catalog, etc. • NCBI Resources: • OMIM – “review articles” • PubMed – journal articles • Nucleotides/RefSeq – gene sequences • Receive your term project topic

  3. Your Term Project • Scientific poster on an assigned genetic disorder • Should cover all aspects of genetics – • Mode of inheritance • What gene normally does • What protein is encoded by gene • Map location and gene structure • Types of mutations and what they do to protein • Potential for gene therapy • Etc. (more info next time)

  4. Your Term Project • Four assignments for your project: • Part A: • Identify disorder/gene in OMIM and MeSH • Sakai assessment by start of class Feb. 10 • Part B: • Literature and sequence searches • Sakai assessment by start of class Feb. 24 AND paper form and search print-outs in class Feb. 24 • Part C: • Structure, SNP, map and clinical db searches • Sakai assessment by start of class Mar. 24 AND paper form and search print-outs in class Mar. 24 • Poster Presentations – Apr. 14 • Note – keep Parts A, B, and C (and the corresponding search print-outs) once they are returned to you; you may need to resubmit them with your poster.

  5. NCBI • National Center for Biotechnology Information • Located on the Bethesda National Institutes of Health campus • Part of the National Library of Medicine (NLM), which is part of the NIH • Created by Congress in 1988 • Home of GenBank since 1992

  6. NCBI Mandates • Develop automated systems for the storage, retrieval, and analysis of molecular, genetic and biochemical information • Develop software for the study of molecule structure and function

  7. NCBI Mandates • Facilitate the use of molecular databases and programs by both researchers and clinicians • Coordinate international cooperation in gathering molecular, genetics and biochemical data

  8. Effective Searchers ... • know the content of the database • subjects, type of data, years of coverage, curated vs. non-curated • understand the structure of the database • record structure, searchable fields, controlled vs non-controlled vocabularies • understand searching options and tools • thesaurus, limits, AND/OR, etc.

  9. Entrez • Search tool on the NCBI website • Contains a variety of databases: • Nucleotide sequence; Protein sequence; Molecular structure; SNPs; Expression data; Journal literature • Each “database” contains “records” • Each “record” in database contains “fields”

  10. Entrez Search Options • Similar among the various databases • Entrez conventions: AND, OR, NOT, * • Three ways to search: • Basic: just enter your search terms • Advanced: more controlled search - uses limits, preview/index, history • Complex Boolean: command language with qualifiers in brackets; • syntax= term [field] AND term [field] etc.
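
The “term [field] AND term [field]” syntax on this slide can be sketched in a few lines of Python. This is only an illustration of how such a query string is put together: the helper names `fielded` and `combine` are hypothetical, and the field tag `Title` is an assumption, so check each database's own field list before relying on it.

```python
def fielded(term: str, field: str) -> str:
    """Return one 'term[field]' clause; multi-word terms are quoted."""
    term = term.strip()
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with AND, OR, or NOT and wrap the result in parentheses."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported Boolean operator: {operator}")
    return "(" + f" {operator} ".join(clauses) + ")"


# "Title" is an assumed field tag used only to show the syntax.
query = combine("AND", fielded("breast cancer", "Title"), fielded("vaccine", "Title"))
print(query)
```

The resulting string is what you would paste into the search box; the qualifiers in brackets tell Entrez which field each term must appear in.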

  11. Entrez Differences • Differences among the various databases • Different search fields available • Different limits available • Some controlled, some non-controlled • Some archival, some curated

  12. Two Ways to Get to NCBI • Directly at - http://www.ncbi.nlm.nih.gov/ • Through HSC Library’s webpage: • http://www.library.health.ufl.edu/ • Click on “Databases” icon • Click on “NCBI” icon

  13. www.library.health.ufl.edu Click on “Databases” from HSCL Website

  14. OMIM - Online Mendelian Inheritance in Man • Catalog of human genes and genetic disorders • 21,305 records (as of 1/25/11) • Records are basically “review articles” • Records link to PubMed, sequences, structures, etc. • Built on Entrez architecture • Search tip – look for your disease or gene in “title” field on “Limits” page

  15. Choose OMIM from the dropdown and then click on “search” to reach the OMIM page

  16. We will search for information on “Sipple Syndrome”, but first we limit so that we search only in the title field • Limit so that your terms reside only in the “title”

  17. Type in Sipple Syndrome, then click “Go” • Link to discussion of Sipple Syndrome • Link to OMIM Gene Map

  18. Table of Contents for Sipple Syndrome record • Record was retrieved via these words in title • Link to record for the RET Oncogene

  19. Table of Contents for RET gene record

  20. PubMed • Journal literature database • Pre-clinical and clinical information – best literature database to use for Dr. Miyamoto’s project • Approximately 5,200 journals covered; 20,547,557 records (as of 1/25/2011) • Most citations include abstract • Can search via keyword, but has been built to take advantage of controlled vocabulary search

  21. Controlled vs Non-controlled Vocabularies • “Old People” Example

  22. Controlled Vocabulary • Controlled terms act as “umbrella” to pick up all synonyms, spelling differences (hemoglobin/haemoglobin), singular vs plural, etc. • In PubMed, use MeSH Database to find and search controlled MeSH terms (Medical Subject Headings) • Once in MeSH Database, can use additional options to enhance search (major heading, subheadings, etc.)
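
The “umbrella” idea can be pictured as a lookup table in which many entry terms point at one controlled heading. The sketch below is a toy: the table `ENTRY_TERMS` and the function `controlled_term` are hypothetical and hand-written for illustration, whereas the real MeSH vocabulary is far larger and is searched through the MeSH Database.

```python
# Hypothetical mini-thesaurus: each entry term (synonym, spelling variant,
# plural) points at one controlled heading. Written by hand for illustration.
ENTRY_TERMS = {
    "breast cancer": "Breast Neoplasms",
    "breast cancers": "Breast Neoplasms",
    "breast tumor": "Breast Neoplasms",
    "breast tumour": "Breast Neoplasms",
    "hemoglobin": "Hemoglobins",
    "haemoglobin": "Hemoglobins",
}


def controlled_term(user_text):
    """Map whatever the searcher typed to its controlled heading, if known."""
    key = " ".join(user_text.lower().split())  # normalize case and spacing
    return ENTRY_TERMS.get(key)


print(controlled_term("Breast Cancer"))
print(controlled_term("haemoglobin"))
```

Both spellings of hemoglobin land on the same heading, which is why one controlled-term search can stand in for several keyword searches.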

  23. MeSH Example • Find journal articles on the “immunological aspects of breast cancer and vaccines”; but only those papers where “immunological aspects of breast cancer” is the main point of the articles you find. • Search PubMed

  24. Enter PubMed through our direct link (rather than through NCBI) and you will be able to directly see if the HSCL owns the journal articles you find

  25. The “ufhsclib” indicates that you have entered PubMed correctly, and that the journals the library owns will be apparent Use the MeSH Database as a dictionary to find the appropriate MeSH term, and then to refine your search

  26. Note that we have left PubMed and are in the MeSH “dictionary” • You typed “breast cancer” into MeSH database • Use “breast neoplasms” rather than breast cancer • Click on the link to refine the search

  27. Topical subheadings help focus search to one or more aspects of the subject Check here and your topics will be the main point of the articles you find – you won’t get peripheral citations. Not recommended the first time you search a topic – if there are few papers in existence for your topic, you may be left with no articles at all

  28. Note that the term “Breast Neoplasms” will pick up all the more specific types of breast cancer

  29. Send your search to the search builder MeSH automatically builds the search for you – in this example, you are looking for papers in which the immunological aspects of breast cancer are the main point of all the articles you retrieve Click “Search PubMed”

  30. Once you have sent the search to the search box and clicked on “search PubMed”, you leave the MeSH Database, and the search is performed in PubMed • Note that this is the search the MeSH Database built for you – it used the MeSH term “breast neoplasms”, glued “immunology” directly to the search by using the slash, and picked up all the different types of breast neoplasms. MeSH also retrieved only the papers where these topics were the main points of the articles. You did not need to do any of this yourself – MeSH did it for you once you found the proper MeSH term and clicked on the subheading. • Now we need to complete the second half of the search – vaccines

  31. Now we need to complete the second half of the search – vaccines. Pull down the drop-down so you are in MeSH again, and search for the MeSH term. Look through the list to see if there is one that is most appropriate. Since we are looking for vaccines related to breast cancer, perhaps “cancer vaccines” would be useful. Read the “scope note” to be sure. Scope Note

  32. As in the breast cancer search, you can choose a subheading and limit to articles where this topic is the main point; I’ve chosen not to do so here. Send to search builder by clicking “Add to search builder”; click “search PubMed” You’ve now found articles on cancer vaccines, but you need to combine the breast cancer and cancer vaccines concepts

  33. Boolean Operators • Search statements may be combined using AND, OR, NOT
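
One way to picture the three operators is as set operations on the record numbers each search retrieves. The sketch below uses invented record numbers purely for illustration; they are not real PubMed IDs.

```python
# Invented record numbers standing in for two search results.
breast_cancer = {101, 102, 103, 104}
vaccines = {103, 104, 105}

both = breast_cancer & vaccines      # AND: records found by both searches
either = breast_cancer | vaccines    # OR: records found by at least one search
only_bc = breast_cancer - vaccines   # NOT: breast cancer records without vaccines

print(sorted(both))
print(sorted(either))
print(sorted(only_bc))
```

AND narrows a search, OR broadens it, and NOT removes records, so NOT is the one most likely to discard something you wanted.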

  34. To combine searches, choose “Advanced Search” The Advanced Search screen displays your PubMed history; from here you can combine your two searches using the appropriate Boolean operator For Part B, print the PubMed history, which shows your searches.

  35. You have now found papers in which the immunology of breast cancer is the main point of the article, and those papers are also about cancer vaccines

  36. MeSH etc. • MeSH Database: • Found appropriate search terms • Automatically exploded “breast neoplasms”, so narrower terms (“breast neoplasms, male”, “carcinoma, ductal, breast”, etc) were ORed together • Allowed the addition of subheadings (immunology) to narrow to a particular aspect • Allowed narrowing to “main point” • Use History to combine (AND)
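
The search that the MeSH Database assembled in this example can also be written out by hand. The sketch below builds the same kind of string using the PubMed tags `[Mesh]`, `[Majr]` (major topic) and `:NoExp` (do not explode); the helper `mesh_clause` is a hypothetical name, and the exact text PubMed displays for the built search may differ from what this prints.

```python
def mesh_clause(heading, subheading=None, major=False, explode=True):
    """Build one MeSH clause, e.g. "Breast Neoplasms/immunology"[Majr]."""
    term = heading if subheading is None else f"{heading}/{subheading}"
    tag = "Majr" if major else "Mesh"   # Majr restricts to the "main point"
    if not explode:
        tag += ":NoExp"                 # leave out the narrower headings
    return f'"{term}"[{tag}]'


breast = mesh_clause("Breast Neoplasms", subheading="immunology", major=True)
vaccine = mesh_clause("Cancer Vaccines")
query = f"{breast} AND {vaccine}"       # the step done with History on the slide
print(query)
```

Explosion is on by default, so the narrower breast neoplasm headings are included without being listed.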

  37. MeSH Caveats • Performing a MeSH search is usually more precise and exhaustive than a keyword search, however: • The most recent papers do not yet have MeSH terms and so are not retrieved - therefore you should also run a keyword search limited to “in process” records • Very new concepts/scientific terms may not yet be represented by MeSH • Very specific or rare concepts may never be represented by MeSH • So sometimes you will need to do a keyword search as well

  38. In Process • In our “breast cancer, immunology, cancer vaccine” example, perform the following keyword search, only in the newest records (in process) • ((vaccin*) AND (breast cancer* OR breast neoplasm* OR breast tumor*)) AND in process [sb] • Try as many synonyms as possible • [sb] must be included to tell computer to just search the “in process” part of the database • * truncates to word root • This search picks up the current articles that do not yet have MeSH terms
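
The keyword search on this slide follows a pattern: OR the synonyms within each concept, AND the concepts together, then add the “in process” subset. The sketch below rebuilds the slide's search string that way and wraps it in a URL for NCBI's E-utilities ESearch service without sending any request. The function name `in_process_query` is hypothetical, and whether the subset phrase is still accepted in exactly this form by the current PubMed is an assumption taken from the slide.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def in_process_query(concept_groups):
    """OR the synonyms inside each group, AND the groups, limit to in process."""
    groups = ["(" + " OR ".join(group) + ")" for group in concept_groups]
    return "(" + " AND ".join(groups) + ") AND in process [sb]"


query = in_process_query([
    ["vaccin*"],                                               # concept 1
    ["breast cancer*", "breast neoplasm*", "breast tumor*"],   # concept 2
])
print(query)

# Build (but do not send) an ESearch URL; retmax caps the number of IDs returned.
url = ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": 20})
print(url)
```

Adding another synonym is a one-word change to the relevant list, which makes it easy to try as many synonyms as possible.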

  39. Link Out to E-journals • Remember, if you entered PubMed directly from the HSCL’s icon, you can see if the HSCL owns the journal articles you found • Choose the “abstract” or “citation” displays from the pulldown menu • Brown and blue icons tell if the HSCL owns that journal issue electronically or in print • Will NOT tell you what is available at Marston Science Library

  40. What if PubMed does not indicate the article is owned at UF? • Use the “Catalog” to see if the paper is available in print at the HSCL, Marston Science Library or elsewhere on campus • The catalog may also be used to help locate books, government documents, videotapes, etc – items that are not indexed in PubMed

  41. www.library.health.ufl.edu Click on “Catalog” from HSCL Website

  42. Entrez Nucleotides (GenBank) • Database of nucleotide sequences (ATGC) • Actually contains data from several databases - GenBank, EMBL, DDBJ, RefSeq • Hard to search because many submitting scientists send in redundant information and poorly annotated information

  43. Nucleotide Data Domain • As of December 15, 2010 • Over 122,082,812,719 bases • Over 129,902,276 sequence records • Over 400,000 species represented • Some complete genomes and chromosomes

  44. Organisms Represented • Homo sapiens • Many model organisms, including: • Mus musculus • Caenorhabditis elegans • Oryza sativa • Drosophila melanogaster • Arabidopsis thaliana • Non-model organisms as well (trout, etc.)

  45. International Nucleotide Sequence Database Collaboration • Contributors: • GenBank • European Molecular Biology Laboratory (EMBL) • DNA Databank of Japan (DDBJ) • Daily exchange of data among these groups

  46. GenBank Sample Record • Before searching, we will look at the GenBank sample record • Retrieve the sample record from the main page – click on “DNA & RNA”, then “GenBank”, then choose the “record” link. • Note that the “Features” field provides useful biological information, and may be searched
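
To see why the “Features” field is useful, it helps to read it as a table of feature keys, locations and qualifiers. The sketch below parses that table from a tiny record in GenBank flat-file layout. The record `SAMPLE` is invented for illustration (it is not the NCBI sample record), `parse_features` is a hypothetical helper, and multi-line qualifier values are deliberately not handled.

```python
# Invented toy record in GenBank flat-file layout; not a real database entry.
SAMPLE = """\
LOCUS       TOY000001                 24 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Invented toy record for illustration only.
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="Example organism"
     CDS             1..24
                     /gene="toyA"
                     /product="toy protein"
ORIGIN
        1 atggctagct agctagctag ctaa
//
"""


def parse_features(text):
    """Return the FEATURES table as a list of dicts (single-line qualifiers only)."""
    features = []
    in_features = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line and not line.startswith(" "):
            break  # the next top-level keyword (here ORIGIN) ends the table
        if not in_features:
            continue
        stripped = line.strip()
        if stripped.startswith("/"):
            # Qualifier line: attach it to the most recent feature.
            name, _, value = stripped[1:].partition("=")
            features[-1]["qualifiers"][name] = value.strip('"')
        elif stripped:
            # Feature line: key followed by its location on the sequence.
            key, location = stripped.split(None, 1)
            features.append({"key": key, "location": location, "qualifiers": {}})
    return features


features = parse_features(SAMPLE)
genes = [f["qualifiers"]["gene"] for f in features if "gene" in f["qualifiers"]]
print(genes)
```

Searching the Features field works on the same information: the gene and product names stored as qualifiers are what a feature-based search matches against.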
