Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html) Bio-Trac 25 (Proteomics: Principles and Methods) March 23, 2007 Zhang-Zhi Hu, M.D. Research Associate Professor Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center
What is Bioinformatics? computer (information) + mouse (biology) = bioinformatics • NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Molecular Biology Database Collection: 968 key databases in 14 categories (http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D3/DC1)
2007 Online Access to Database Collection http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html http://www.oxfordjournals.org/nar/database/cap/
Overview Database Contents, Search and Retrieval • Text search / Information retrieval • Sequence & genomics databases • Protein family databases • Database of protein functions • Databases of protein structures • Proteomics databases
Entrez Text Searches (http://www.ncbi.nlm.nih.gov/Entrez/)
PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed) Literature mining
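A PubMed text search can also be issued programmatically. The following is a minimal sketch that only builds the query URL and makes no network call. It assumes the NCBI E-utilities ESearch endpoint, which is not mentioned on the slide; the function name `build_pubmed_query` and the example search term are illustrative.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint (not part of the original slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term: str, max_results: int = 20) -> str:
    """Return an ESearch URL for a PubMed text query (no request is sent)."""
    params = {
        "db": "pubmed",         # Entrez database to search
        "term": term,           # free-text or Boolean query
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # response format
    }
    return ESEARCH_URL + "?" + urlencode(params)


url = build_pubmed_query("alpha crystallin AND phosphorylation", max_results=5)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) should return a list of PubMed IDs matching the term.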
iProLINK: Protein Literature Mining Resource Text mining for protein phosphorylation Gene/protein name thesaurus: synonyms, ambiguous names… http://pir.georgetown.edu/iprolink/
BioThesaurus: Gene/protein name searches (synonyms, ambiguous names…) Synonyms: CRYAA; crystallin, alpha A; CRYA1; HSPB4… http://pir.georgetown.edu/iprolink/biothesaurus
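The idea behind a name thesaurus can be sketched with a small in-memory index. This is a toy illustration, not the BioThesaurus implementation: the only data are the four synonyms shown on the slide, and the helper names `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

# Toy thesaurus built from the synonyms shown on the slide; a real
# thesaurus would hold many more names and many more groups.
SYNONYM_GROUPS = [
    {"CRYAA", "crystallin, alpha A", "CRYA1", "HSPB4"},
]


def build_index(groups):
    """Map each lower-cased name to every synonym group that contains it."""
    index = defaultdict(list)
    for group in groups:
        for name in group:
            index[name.lower()].append(group)
    return index


def lookup(index, name):
    """Return one sorted synonym list per matching group.

    More than one list in the result would mean the name is ambiguous.
    """
    key = name.lower()
    return [
        sorted(n for n in group if n.lower() != key)
        for group in index.get(key, [])
    ]


INDEX = build_index(SYNONYM_GROUPS)
print(lookup(INDEX, "hspb4"))
```

Matching is case-insensitive here; whether a production thesaurus should fold case is a design choice, since some gene symbols differ only by capitalization.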
RLIMS-P: Text mining for protein phosphorylation http://pir.georgetown.edu/iprolink/rlimsp/
UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch) Google-type search vs. Boolean searches: AND, OR, NOT
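Boolean operators map directly onto set operations over an inverted index. The sketch below is a toy model of that idea, not how the UniProt search engine works internally; the record IDs and texts are made up for illustration.

```python
import re

# Hypothetical mini-corpus; record IDs and texts are illustrative only.
RECORDS = {
    "rec1": "alpha crystallin A chain",
    "rec2": "alpha crystallin B chain",
    "rec3": "delta crystallin II argininosuccinate lyase",
}


def build_inverted_index(records):
    """Map each lower-cased word to the set of record IDs containing it."""
    index = {}
    for rec_id, text in records.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(rec_id)
    return index


INDEX = build_inverted_index(RECORDS)


def hits(word):
    """Return the set of record IDs that contain the given word."""
    return INDEX.get(word.lower(), set())


# AND = intersection, OR = union, NOT = set difference
and_result = sorted(hits("crystallin") & hits("alpha"))
or_result = sorted(hits("alpha") | hits("delta"))
not_result = sorted(hits("crystallin") - hits("alpha"))
print(and_result, or_result, not_result)
```

A Google-type search, by contrast, typically ranks records by relevance rather than returning an exact set, which is why the two search modes can give different result lists for the same words.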
PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html) Search: which alpha crystallin A chain entries are classified in protein families? Search for synonyms
PIR Text Search (II) Search: which crystallins are enzymes, and which families do they belong to? Can you find which crystallins have a determined 3D structure?
I. Sequence & Genomics Databases • GenBank: An annotated collection of all publicly available nucleotide and protein sequences. • RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products. • UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function. • Entrez Gene: Gene-centered information at NCBI. • UniGene: Unified clusters of ESTs and full-length mRNA sequences. • OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders. • Model Organism Genome Databases: MGD, RGD, SGD, FlyBase… • GeneCards: Integrated database of human genes, maps, proteins and diseases. • SNP Consortium Database; International HapMap Project: Genes associated with human disease (http://www.oxfordjournals.org/nar/database/cap/)
4.1 million UniProt Consortium Databases Universal Protein Resource (http://www.uniprot.org) UniProtKB UniRef UniParc
UniProt Sequence Report (I) UniProtKB What’s the difference between CRYAA_RABIT & CYRBAA? (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)
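UniProtKB records carry two kinds of identifier, which the deck's own examples pair up as CRYAA_RABIT/P02493: a readable entry name of the form PROTEIN_SPECIES and a stable accession number. The sketch below tells them apart with regular expressions. The accession pattern follows the one published in the UniProt help pages (an assumption that it is still current); the entry-name pattern and the function name `classify_identifier` are illustrative simplifications.

```python
import re

# Accession pattern as given in the UniProt help pages (assumed current).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Simplified entry-name pattern: protein mnemonic, underscore, species code.
ENTRY_NAME_RE = re.compile(r"^([A-Z0-9]{1,10})_([A-Z0-9]{1,5})$")


def classify_identifier(identifier: str) -> dict:
    """Classify a string as a UniProtKB entry name, an accession, or unknown."""
    match = ENTRY_NAME_RE.match(identifier)
    if match:
        return {
            "kind": "entry_name",
            "protein": match.group(1),
            "species": match.group(2),
        }
    if ACCESSION_RE.match(identifier):
        return {"kind": "accession"}
    return {"kind": "unknown"}


for example in ("CRYAA_RABIT", "P02493", "ARLY2_ANAPL", "P24058"):
    print(example, classify_identifier(example))
```

Entry names can change when a protein is renamed or reclassified, so accessions are generally the safer key for cross-referencing between databases.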
UniProt Report (II): UniRef100 & 90 UniRef100 (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489) UniRef90 (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
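UniRef100 merges identical sequences into one entry, while UniRef90 groups sequences that are at least 90% identical. The sketch below illustrates the effect of the identity threshold with a naive greedy clustering. It is a deliberate simplification: the sequences are made-up, equal-length toy strings, identity is computed position by position without alignment, and the real UniRef pipeline uses dedicated clustering software rather than this loop.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions; assumes equal-length, pre-aligned toy sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_cluster(seqs: dict, threshold: float) -> list:
    """Put each sequence into the first cluster whose representative
    (its first member) is at least `threshold` percent identical."""
    clusters = []  # each cluster is a list of sequence IDs
    for seq_id, seq in seqs.items():
        for members in clusters:
            representative = seqs[members[0]]
            if percent_identity(representative, seq) >= threshold:
                members.append(seq_id)
                break
        else:
            clusters.append([seq_id])  # no cluster matched: start a new one
    return clusters


# Hypothetical toy sequences, not real protein data.
SEQS = {
    "s1": "MDVTIQHPWF",
    "s2": "MDVTIQHPWF",  # identical to s1
    "s3": "MDVTIQHPWY",  # 9 of 10 positions match s1
    "s4": "MAAAAQHPWY",  # 5 of 10 positions match s1
}

print(greedy_cluster(SEQS, 100))
print(greedy_cluster(SEQS, 90))
```

Lowering the threshold from 100 to 90 folds the near-identical sequence into the first cluster, which mirrors why a UniRef90 cluster can contain several UniRef100 entries.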
Entrez Gene – Gene centric information http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq
OMIM: Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
II. Protein Family Databases • Whole Proteins • PIRSF: Network Classification Based on Evolutionary Relationship of Whole Protein • COG (Clusters of Orthologous Groups) of Complete Genomes • PANTHER: Proteins Classified into Families/Subfamilies of Shared Function • ProtoNet: Automated Hierarchical Classification of Proteins • Protein Domains • Pfam: Alignments and HMM Models of Protein Domains • SMART: Protein Domain Families • CDD: Conserved Domain Database • Protein Motifs • PROSITE: Protein Patterns and Profiles • BLOCKS: Protein Sequence Motifs and Alignments • PRINTS: Compendium of Protein Fingerprints (a group of conserved motifs) • Integrated Family Databases • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY…
Protein Clustering • Initial version: COGs (http://www.ncbi.nlm.nih.gov/COG/) • New version: includes eukaryotic clusters (KOGs)
PIRSF: Full-Length Classification • iProClass Family Report (http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
Domain Classification – Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT) (http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493)
Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
Protein Motifs: PROSITE, a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://us.expasy.org/prosite/)
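A PROSITE pattern is essentially a compact regular expression over amino-acid letters. The sketch below translates the core pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) into a Python regex. It is a simplified converter that ignores rarer constructs such as anchors inside brackets; the function name `prosite_to_regex` and the test sequences are illustrative. The example pattern is the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate core PROSITE pattern syntax into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # N-terminal anchor
        anchor_end = element.endswith(">")      # C-terminal anchor
        element = element.strip("<>")
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        # [ABC] and single letters are already valid regex syntax.
        piece = core + ("{" + repeat + "}" if repeat else "")
        parts.append(("^" if anchor_start else "") + piece + ("$" if anchor_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)
print(re.search(regex, "MNGSA"))  # toy sequence containing the motif
```

Short patterns like this one match many sequences by chance, so a hit is a candidate site rather than evidence of function.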
Integrated Family Classification InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html) Mapping of families
III. Databases of Protein Functions • Metabolic Pathways, Enzymes, and Compounds • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB) • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes • EcoCyc: Encyclopedia of E. coli Genes and Metabolism • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways) • BRENDA: Enzyme Database • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways • Inter-Molecular interactions and Regulatory Pathways • IntAct: Protein interaction data from literature and user submission • BIND: Descriptions of interactions, molecular complexes and pathways • DIP: Catalogs experimentally determined interactions between proteins • Reactome - A curated knowledgebase of biological pathways • BioCarta: Biological pathways of human and mouse • GO: Gene Ontology Consortium Database • Pathway Resources - Pathguide
Biological Pathway Resource Collection http://www.pathguide.org/ • Protein-protein interactions • Metabolic pathways • Signaling pathways • Pathway diagrams • Transcription factors / gene regulatory networks • Protein-compound interactions • Genetic interaction networks
KEGG Metabolic & Regulatory Pathways • KEGG is a suite of databases and associated software that integrates current knowledge of molecular interaction networks, of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html) (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
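The pathway link on this slide has a regular structure: an organism code (`hsa`, human), a pathway map number (`00220`), and an object to highlight (`4.3.2.1`, the EC number of argininosuccinate lyase, which is the enzyme behind the deck's delta crystallin II example). A minimal sketch that composes such a link is shown below. The base address is copied from the slide and may no longer resolve; the function name `kegg_pathway_url` is hypothetical.

```python
# Legacy KEGG pathway-viewer address, copied from the slide's URL.
# Assumption: it reflects the 2007 site and may no longer resolve.
KEGG_SHOW_PATHWAY = "http://www.genome.ad.jp/dbget-bin/show_pathway"


def kegg_pathway_url(organism: str, map_number: str, highlight: str = "") -> str:
    """Compose '<organism><map_number>' plus an optional '+<object>' to highlight."""
    query = f"{organism}{map_number}"
    if highlight:
        query += f"+{highlight}"
    return f"{KEGG_SHOW_PATHWAY}?{query}"


print(kegg_pathway_url("hsa", "00220", "4.3.2.1"))
```

Swapping the organism code for another species would request the same map for that organism, with the highlighted enzyme shown if that genome encodes it.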
BioCyc: EcoCyc/MetaCyc Metabolic Pathways • The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)
Reactome: http://www.reactome.org/ • Collaboration of CSHL, EBI and GO Consortium • Curated resource of core pathways and reactions in human biology • Authored by biological researchers who are experts in their fields • Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG… • Inferred orthologous events in 22 non-human species (mouse, rat…)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens] (http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&) Reactome: events and objects (including modified forms and complexes) • Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens] • Object -> REACT_7364.1: Phospho-R-SMAD [cytosol] • Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens] • Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol] • Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus • Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm] ……
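The alternation of events and objects listed on this slide can be captured in a small data structure. The sketch below stores each event with the object listed immediately after it, using the identifiers from the slide. Treating that following object as the event's product is an assumption made for illustration; the class `Step` and the helper `producing_event` are hypothetical and not part of any Reactome API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    event_id: str   # Reactome event identifier, from the slide
    event: str      # event description
    object_id: str  # identifier of the object listed right after the event
    obj: str        # object description (assumed to be the event's product)


TGF_BETA_STEPS = [
    Step("REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly",
         "REACT_7364.1", "Phospho-R-SMAD [cytosol]"),
    Step("REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD",
         "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex [cytosol]"),
    Step("REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus",
         "REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]"),
]


def producing_event(steps, object_id: str) -> Optional[str]:
    """Return the ID of the event listed immediately before the given object."""
    for step in steps:
        if step.object_id == object_id:
            return step.event_id
    return None


for step in TGF_BETA_STEPS:
    print(f"{step.event_id}: {step.event} -> {step.object_id}: {step.obj}")
```

The same complex appears twice with different identifiers because Reactome treats an entity in a different cellular compartment as a distinct object.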
Protein-Protein Interaction Database - IntAct (http://www.ebi.ac.uk/intact/)
Gene Ontology (GO) (http://www.geneontology.org/) - Molecular Function - Biological Process - Cellular Component
IV. Databases of Protein Structures • Protein Structure • PDB: Structure Determined by X-ray Crystallography and NMR • PDBsum: Summaries and analyses of PDB structures • MMDB: NCBI’s database of 3D structures, part of NCBI Entrez • SWISS-MODEL Repository: Database of annotated protein 3D models • ModBase: Annotated comparative protein structure models • Structure Classification • CATH: Hierarchical Classification of Protein Domain Structures • SCOP: Familial and Structural Protein Relationships • FSSP: Protein Fold Classification Based on Structure-Structure Alignment
PDB: Experimental 3D Structure Repository Rat gamma-crystallin (chains A, B). Can you do a text search at PIR to find this (CRGE_RAT)? (http://www.rcsb.org/pdb/)
PDBsum: Pictorial Database to Provide Summary and Analysis to PDB Entries Search 3-D structure summary 2-D structure (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
Protein Structural Classification (1) CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/latest/index.html)
Protein Structural Classification (2) SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known. (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
SWISS-MODEL Repository A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)
V. Proteomic Resources • GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes. • SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels • PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences • Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets • PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database Expression Profiling databases • GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases
2D-Gel Image Databases (http://us.expasy.org/ch2d/) Part of WORLD-2DPAGE: index to 2-D PAGE databases and services (http://us.expasy.org/swiss-2dpage/ac=P02489)
GPMdb: MS Data Search (http://gpmdb.thegpm.org/) Craig, et al., J Proteome Res. 2004, 3:1234-42.
PRIDE: centralized, standards compliant, public data repository for proteomics data http://www.ebi.ac.uk/pride/ HUPO Plasma Proteome Project
Lab: • Text search / Information retrieval • Literature search and text mining • Finding synonyms (BioThesaurus) • Information extraction (e.g., protein phosphorylation sites) • Find the sequence for the rabbit alpha crystallin A chain • Find all alpha crystallin A chains classified in protein families • Search for crystallins that have enzyme activity • Find crystallins that have determined 3D structures • Database contents (reports) • Sequence & genomics databases (UniProt) • Protein family databases (PIRSF) • Database of protein functions (KEGG) • Databases of protein structures (PDB) • Proteomics databases (Swiss-2D) • Protein Examples • Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493) • Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058) • Any additional proteins of your interest for search and retrieval