
Informatics for Molecular Biologists

A comprehensive guide to searching tools, internet resources, and molecular biology databases for molecular biologists and researchers. Learn about PubMed, NCBI, genome browsers, structure visualization tools, and more.

dnowicki


Presentation Transcript


  1. Informatics for Molecular Biologists • Ansuman Chattopadhyay, PhD • Head, Molecular Biology Information Service • Falk Library, Health Sciences Library System • University of Pittsburgh

  2. Molecular Biology Information Service Falk Library of Health Sciences Health Sciences Library System University of Pittsburgh 200 Scaife Hall Desoto and Terrace Streets Pittsburgh, PA 15261

  3. Topics • Searching tools • Internet • PubMed • NCBI developed bioinformatics tools • Entrez Gene • Structure visualization tools • Cn3D • Genome Browsers • UCSC genome browsers • NCBI Map viewer

  4. Information search space • Biomedical literature databases • Molecular databases • Organism whole genome sequences

  5. Literature database • NCBI PubMed • contains over 15 million citations dating back to the mid-1950s • Example searches and hit counts: “apoptosis”: 130,476; “breast cancer”: 160,055; “p53”: 42,418
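Hit counts like those on this slide can also be requested programmatically. The following is a minimal sketch, assuming NCBI's public E-utilities ESearch endpoint; the helper name `pubmed_count_url` is hypothetical and not part of the presentation. It only builds the request URLs and does not contact NCBI, and current counts will differ from the figures quoted above.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint (assumption: not named in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(term: str) -> str:
    """Build an ESearch URL that asks PubMed only for the number of hits."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# The three example searches from the slide.
for term in ("apoptosis", "breast cancer", "p53"):
    print(pubmed_count_url(term))
```

Opening one of the printed URLs in a browser returns a small XML document whose `Count` element holds the number of matching citations.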

  6. Molecular databases

  7. Organisms whole genome sequences http://www.genomesonline.org/

  8. Internet for Biologists • Google vs Clusty • Google: Chronological list of search results • Clusty: Search results categorized into topical clusters • Vivísimo's clustering technology creates topical categories on-the-fly from the search results, using terms in the title, snippet, and any other available textual description in the search results themselves

  9. Google vs Clusty • Search Example: Pittsburgh • Google • Clusty

  10. Clusty Clusters help you see your search results by topic, so you can zero in on exactly what you’re looking for or discover unexpected relationships between items.

  11. Search examples for Clusty • SNP • BLAST • Lupus

  12. Web 2.0 • Website bookmark and tagging tool • Del.icio.us: a social bookmarking web service for storing, sharing, and discovering web bookmarks.

  13. Web 2.0 • Connotea: http://www.connotea.org/

  14. Medline searching tool • PubMed vs ClusterMed Search example: macular degeneration, cell cycle, p53

  15. Molecular databases • DNA Sequence Databases and Analysis Tools   • Enzymes and Pathways   • Gene Mutations, Genetic Variations and Diseases   • Genomics Databases and Analysis Tools   • Immunological Databases and Tools   • Microarray, SAGE, and other Gene Expression   • Organelle Databases   • Other Databases and Tools (Literature Mining, Lab Protocols, Medical Topics, and others)   • Plant Databases   • Protein Sequence Databases and Analysis Tools   • Proteomics Resources • RNA Databases and Analysis Tools   • Structure Databases and Analysis Tools  

  16. HSLS OBRC • http://www.hsls.pitt.edu/guides/genetics/obrc/

  17. Types of databases • By level of curation: • Archival • GenBank, GenPept, ssSNP • Curated • RefSeq, SwissProt, RefSNP

  18. Types of databases • Archival data • repository of information • redundant; might have many sequence records for the same gene, each from a different lab • submitters maintain editorial control over their records: what goes in is what comes out • no controlled vocabulary • variation in annotation of biological features Example: GenBank record

  19. GenBank • archival database of nucleotide sequences from >130,000 organisms • records annotated with coding region (CDS) features also include amino acid translations • each record represents the work of a single lab • redundant; can have many sequence records for a single gene
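The amino acid translation attached to a CDS feature can be reproduced from the nucleotide sequence. This is a minimal sketch, assuming the standard genetic code and a CDS that starts in frame; the function name `translate_cds` and the short example sequence are made up for illustration and do not come from a real GenBank record.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example: ATG GCC AAA TAA
print(translate_cds("ATGGCCAAATAA"))
```

Real records may use a different translation table (for example, mitochondrial genes), which the CDS feature states explicitly; this sketch ignores that case.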

  20. International Nucleotide Sequence Database Collaboration

  21. Types of databases

  22. RefSeq • Curated data • non-redundant; one record for each gene, or each splice variant • each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article • records contain value-added information that has been added by one or more experts

  23. RefSeq • Database of reference sequences • Curated • Non-redundant; one record for each gene, or each splice variant, from each organism represented • A representative GenBank record is used as the source for a RefSeq record • Value-added information is added by one or more experts • Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article • Variety of accession number prefixes (NM_, NP_, etc.) and status codes (provisional, reviewed, etc.). More about those in later slides. • RefSeq database includes genomic DNA, mRNA, and protein sequences, so it organizes information according to the model of the central dogma of biology
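The accession prefixes mentioned above tell you what kind of molecule a RefSeq record describes. The lookup below is a minimal sketch covering only a handful of common prefixes; the function name `refseq_molecule_type` is hypothetical, the table is deliberately incomplete, and the accession numbers in the usage lines are placeholders rather than real records.

```python
from typing import Optional

# A few common RefSeq accession prefixes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "NC_": "complete genomic molecule (e.g. a chromosome)",
    "NG_": "genomic region",
    "XM_": "predicted (model) mRNA",
    "XP_": "predicted (model) protein",
}


def refseq_molecule_type(accession: str) -> Optional[str]:
    """Return the molecule type implied by a RefSeq prefix, or None if unknown."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())


# Placeholder accessions, used only to exercise the lookup.
print(refseq_molecule_type("NM_000000"))  # mRNA
print(refseq_molecule_type("NP_000000"))  # protein
print(refseq_molecule_type("AB000000"))   # None: no RefSeq prefix
```

An accession without an underscore-style prefix is typically an archival (GenBank-style) record rather than a curated RefSeq record.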

  24. RefSeq

  25. Searching GenBank • Find messenger RNA sequence for Human epidermal growth factor (EGF) gene.

  26. Database developers • NCBI • EBI

  27. [Figure: Entrez neighbors and hard links. Nodes: PubMed abstracts, nucleotide sequences, protein sequences, 3-D structures, genomes, taxonomy. Neighbor relationships: word weight, BLAST, VAST, phylogeny. Source: NCBI]

  28. NCBI Tools

  29. Entrez Gene • NCBI’s database for gene-centric information • focuses on genomes that: • have been completely sequenced • have an active research community to contribute gene-specific information • are scheduled for intense sequence analysis • Total taxa: 4,246; total genes: 2,843,587 • 160,000 organisms in the nucleotide sequence database (GenBank)

  30. Entrez Gene • each record represents a single gene from a given organism • Gene record includes: • a unique identifier or GeneID assigned by NCBI • a preferred symbol • and any one or more of: • sequence information • map information • official nomenclature from an authority list • alternate gene symbols • summary of gene/protein function • published references that provide additional information on function • expression • homology data • and more
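The shape of a gene record described above (two required fields plus optional extras) can be pictured as a small data structure. This is a sketch only: the class `GeneRecord` and its field names are hypothetical and do not mirror NCBI's actual record format, and the values in the usage lines are illustrative and should be checked against the live record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneRecord:
    """Hypothetical container mirroring the slide's list of record contents."""
    gene_id: int                       # unique GeneID assigned by NCBI
    symbol: str                        # preferred symbol
    organism: str                      # each record is one gene from one organism
    aliases: List[str] = field(default_factory=list)   # alternate gene symbols
    map_location: Optional[str] = None                  # map information
    summary: Optional[str] = None                       # gene/protein function
    pubmed_ids: List[int] = field(default_factory=list) # published references


# Illustrative values; only the identifier and symbol are mandatory.
record = GeneRecord(gene_id=672, symbol="BRCA1", organism="Homo sapiens")
print(record.symbol, record.gene_id, record.aliases)
```

Making everything except the identifier, symbol, and organism optional reflects the slide's "any one or more of" wording: a sparse record is still a valid record.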

  31. Gene / Protein • Exon-Intron Structure • Chromosomal Localization • mRNA Sequence • Genomic Sequence • Homologous Sequences • SNP • Expression Profile • Amino Acid Sequence • 3D Structure • Interacting Partners • Disease

  32. Searching Entrez Gene

  33. Entrez Gene • Find: • gene symbols and aliases • sequences: genomic, mRNA, protein • intron-exon architecture • genomic context: neighboring and antisense genes • interacting partners • associated gene ontology terms: function, cellular component and biological process

  34. Entrez Gene record • Query: BRCA1 • Search Tips: • Query text box: BRCA1 • Limits: • To limit your search to a specific field, select “Gene name” from the drop-down menu • Limit by taxonomy: select “Homo sapiens” • Highlighted in the record: name and aliases, chromosomal location
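The same limits can be typed directly into the query box using Entrez field tags instead of the drop-down menus. This is a minimal sketch, assuming the `[Gene Name]` and `[Organism]` field tags of Entrez Gene; the helper name `entrez_gene_query` is hypothetical.

```python
def entrez_gene_query(symbol: str, organism: str) -> str:
    """Combine a gene symbol and an organism into one fielded Entrez query."""
    return f"{symbol}[Gene Name] AND {organism}[Organism]"


# The slide's example: BRCA1 limited to human.
print(entrez_gene_query("BRCA1", "Homo sapiens"))
```

Pasting the printed string into the Entrez Gene search box should be equivalent to entering BRCA1 and then setting the two limits by hand.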

  35. Source: NCBI

  36. Entrez Gene: sequences and genomic context • Sequences: mRNA, Genomic, Protein • [Figure labels: mRNA Seq, Genomic Seq, Protein Seq]

  37. Transcription and alternative splicing Alternative splicing: http://www.exonhit.com/UserFiles/Image/epissage.swf?PHPSESSID=d9u8tiu2sioqa8u29bkop3l0l2

  38. Entrez Gene: intron-exon architectures Tips: Change Display to “Gene Table” from “Summary”

  39. [Figure labels: mRNA Seq, Genomic Seq, Protein Seq]

  40. Gene Ontology • Controlled vocabulary tagging • Function • Biological Processes • Cellular Component

  41. Entrez Gene : Gene Ontology

  42. Homologous sequences

  43. Entrez Gene: Homologous sequence • Tips: change Display settings from “Summary” to “Alignment score” to “Multiple Alignment”

  44. Single nucleotide polymorphisms • Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA.
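The slide's example can be checked by comparing the two sequences position by position. This is a minimal sketch, assuming both sequences have the same length and are already aligned; the function name `single_base_differences` is hypothetical.

```python
from typing import List, Tuple


def single_base_differences(ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, altered base) for each mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(ref, alt), start=1)
        if ref_base != alt_base
    ]


# The two sequences from the slide differ at one position.
print(single_base_differences("AAGGCTAA", "ATGGCTAA"))  # [(2, 'A', 'T')]
```

Insertions and deletions shift the alignment and are not handled here; they need a real alignment step before comparison.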

  45. SNPs

  46. Coding SNPs
