Other biological databases


Presentation Transcript


  1. Other biological databases

  2. Biological systems: sequence data; protein folding and 3D structure; taxonomic data; literature; pathways and networks; protein families and domains; small molecules; whole genome data; ontologies (GO)

  3. Other Biological Databases • Transcription factor binding sites: TRANSFAC • Protein structure databases: PDB, SCOP, CATH • Protein family databases: Pfam, PRINTS, PROSITE etc. • Chemicals and small molecules: ChEBI • Gene expression databases: GEO, ArrayExpress • Metabolic pathways: Reactome, KEGG • Genome databases: Ensembl, FlyBase, WormBase etc. • Human genetics-related databases: HapMap, dbSNP

  4. Transcription factor binding sites • TRANSFAC, a database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac • TESS (Transcription Element Search System), for predicting transcription factor binding sites using TRANSFAC: http://www.cbi.upenn.edu/tess • TFSEARCH, for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

  5. Protein structure databases • The main resource is the Protein Data Bank (PDB): http://www.rcsb.org/pdb/ • Contains the spatial coordinates of the atoms of macromolecules whose 3D structure has been determined by X-ray crystallography or NMR • Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…) • Entries can be searched by PDB code
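
To make "spatial coordinates of atoms" concrete, here is a minimal Python sketch that reads the x, y, z coordinates from one ATOM record, assuming the fixed-width column layout of the legacy PDB file format. The function name `parse_atom_record` and the example record are made up for illustration; they are not taken from the slides or from a real PDB entry.

```python
def parse_atom_record(line: str) -> dict:
    """Extract a few fixed-width fields from one PDB ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "atom_name": line[12:16].strip(),      # columns 13-16
        "residue_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],                  # column 22
        "residue_number": int(line[22:26]),    # columns 23-26
        "x": float(line[30:38]),               # columns 31-38, in angstroms
        "y": float(line[38:46]),               # columns 39-46
        "z": float(line[46:54]),               # columns 47-54
    }


# Hypothetical record, built with explicit field widths so the columns line up.
example_line = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614
)

atom = parse_atom_record(example_line)
print(atom["atom_name"], atom["residue_name"], atom["x"], atom["y"], atom["z"])
```

For real work, an established parser (for example the one in Biopython) is a safer choice than slicing columns by hand, since it also handles multiple models, alternate locations and the newer mmCIF format.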

  6. Searching MSD (http://www.ebi.ac.uk/msd): search by PDB code

  7. Protein structure-related databases • Structural family databases based on PDB –SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) • Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

  8. Protein family databases • Databases that produce signatures for identifying protein families or domains • Used for functional classification of proteins • E.g. Pfam, PROSITE, PRINTS, SMART, TIGRFAMs etc. • Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)
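
One simple kind of signature is a PROSITE-style pattern, which can be translated into a regular expression and scanned against a sequence. The sketch below is only an illustration under assumptions: the helper `prosite_to_regex` is hypothetical, it handles only the basic pattern syntax, the pattern is written in the style of a C2H2 zinc-finger signature, and the test sequence is invented. Other member databases such as Pfam use profile hidden Markov models instead of patterns.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {AB} = any residue except A or B
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) = repeat 2 to 4 times
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence start / end anchors
        parts.append(element)
    return "".join(parts)


# Pattern in PROSITE style (illustrative) and an invented sequence that matches it.
pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
regex = prosite_to_regex(pattern)
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"

match = re.search(regex, sequence)
print(regex)
print(match.start() if match else None)
```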

  9. InterProScan sequence search Stand-alone version available

  10. InterPro text search Search keyword, protein acc or InterPro acc

  11. Results for protein acc

  12. Example InterPro entry

  13. Chemicals and small molecules • Chemical Abstracts: http://www.cas.org/ • ChEBI: http://www.ebi.ac.uk/chebi • KEGG (part of it covers chemicals): http://www.genome.jp/kegg • ChemIDplus (chemicals cited in NLM databases): http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp • MSD-Chem: ligands and chemicals in MSD

  14. ChEBI example entry

  15. Hierarchy for chemicals

  16. Gene expression databases • NCBI Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/ • ArrayExpress: http://www.ebi.ac.uk/arrayexpress • Stanford Microarray Database: http://genome-www5.stanford.edu/ • Can usually search for experiments or for particular expression profiles

  17. GEO search page

  18. Profiles search results

  19. Specific entry and experiment info

  20. ArrayExpress search results

  21. What does the data look like? • Info on the experiment, the array used, etc. • Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples • Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array
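
As a rough illustration of working with such a file, the Python sketch below reads a small tab-delimited table and computes a log2(Cy5/Cy3) ratio per spot. The column names (`spot_id`, `cy3`, `cy5`) and the intensity values are assumptions made up for this example; real GEO or ArrayExpress files use their own column layouts, which are described in the accompanying metadata.

```python
import csv
import io
import math

# Invented stand-in for a processed two-channel data file.
SAMPLE_TABLE = (
    "spot_id\tcy3\tcy5\n"
    "S1\t100\t400\n"
    "S2\t250\t250\n"
    "S3\t800\t200\n"
)


def log2_ratios(handle) -> dict:
    """Return {spot_id: log2(cy5 / cy3)} for every row of a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["spot_id"]: math.log2(float(row["cy5"]) / float(row["cy3"]))
        for row in reader
    }


ratios = log2_ratios(io.StringIO(SAMPLE_TABLE))
for spot_id, value in ratios.items():
    print(spot_id, value)
```

A positive value means the spot is brighter in the Cy5 channel, a negative value means it is brighter in the Cy3 channel, and zero means equal intensity.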

  22. Proteomics: SWISS-2DPAGE

  23. Enzymes and metabolic pathways • These databases contain information describing enzymes, biochemical reactions and metabolic pathways • ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions • IntEnz: Integrated relational Enzyme database

  24. Enzyme nomenclature • E.C. (Enzyme Commission) numbers assigned based on reactions they catalyze • Hierarchy, high level groups: • EC 1 –Oxidoreductases • EC 2 –Transferases • EC 3 –Hydrolases • EC 4 –Lyases • EC 5 –Isomerases • EC 6 –Ligases
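
Because EC numbers are hierarchical, the first field alone identifies the top-level group. A minimal Python sketch, covering only the six classes listed on the slide (the function name `ec_top_level_class` is hypothetical):

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level_class(ec_number: str) -> str:
    """Map an EC number such as 'EC 1.1.1.1' to the name of its top-level class."""
    cleaned = ec_number.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    first_field = cleaned.split(".")[0]
    try:
        return EC_CLASSES[int(first_field)]
    except (ValueError, KeyError):
        raise ValueError(f"unrecognised EC number: {ec_number!r}") from None


print(ec_top_level_class("EC 1.1.1.1"))  # alcohol dehydrogenase
print(ec_top_level_class("3.4.21.4"))    # trypsin
```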

  25. EC example

  26. Metabolic pathway databases • Pathguide: lists more than 200 pathway resources • KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg, which includes a database of chemicals, genes and networks (metabolic, regulatory etc.) and is well curated and quite specific • EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org, with curation of the entire genome • Reactome, curated biological pathways: http://www.reactome.org/ • GenMAPP: pathways contributed by users

  27. KEGG (http://www.genome.ad.jp/kegg): pathways can be displayed for different species, which allows comparison between them

  28. Pathway in Reactome

  29. Example of a pathway in BioCyc

  30. Protein-protein interaction databases • Store pairwise interactions or complexes • A single publication can contribute anywhere from 1 to more than 20,000 interactions • IntAct: http://www.ebi.ac.uk/intact • DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/ • BIND (Biomolecular Interaction Network Database): http://submit.bind.ca:8080/bind/
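
Pairwise interactions are naturally handled as an undirected graph. The Python sketch below builds a partner lookup from a list of pairs; the protein identifiers (`P1` to `P4`) and the function name are invented for illustration and do not come from IntAct, DIP or BIND.

```python
from collections import defaultdict


def build_interaction_map(pairs):
    """Turn (protein_a, protein_b) pairs into {protein: set of interaction partners}."""
    partners = defaultdict(set)
    for protein_a, protein_b in pairs:
        # Interactions are undirected, so record both directions.
        partners[protein_a].add(protein_b)
        partners[protein_b].add(protein_a)
    return dict(partners)


# Hypothetical pairwise interactions.
example_pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1")]
interaction_map = build_interaction_map(example_pairs)

for protein in sorted(interaction_map):
    print(protein, sorted(interaction_map[protein]))
```

Storing partners in sets means a pair reported by several publications is counted once; keeping the supporting publications would need a richer structure than this sketch uses.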

  31. Protein-protein interactions in IntAct

  32. Integrated functional interactions in STRING

  33. Genome browsers • Integrate sequence and functional data for a genome • Ensembl, a genome browser for major eukaryotic genomes (e.g. human, mouse): http://www.ensembl.org • UCSC Genome Browser: http://genome.ucsc.edu/ • FlyBase, the Drosophila genome database: http://www.ebi.ac.uk/flybase • WormBase (C. elegans): http://www.wormbase.org • PlasmoDB (Plasmodium, malaria): http://plasmodb.org • Etc.

  34. Ensembl genome browser

  35. Ensembl gene view 1

  36. Ensembl gene view 2

  37. Gene within context on chromosome

  38. Human genetics databases • GeneCards (http://www.genecards.org/) • HapMap (http://hapmap.ncbi.nlm.nih.gov/) • OMIM (http://www.ncbi.nlm.nih.gov/omim) • HGDP, the Human Genome Diversity Project (http://hagsc.org/hgdp/files.html)

  39. Mutation/polymorphism databases • Most of these databases are disease- or gene-centric, e.g. for p53

  40. dbSNP (http://www.ncbi.nlm.nih.gov/SNP/): repository of known polymorphisms and mutations (human and other organisms)

  41. Where to find the databases • Table of addresses for major databases and tools • Nucleic Acids Research Database issue, published each January • Nucleic Acids Research Software issue (new) • ExPASy list of tools: http://ca.expasy.org/links.html

  42. Large scale data retrieval • Programmatic access to many databases • MySQL access to some • BioMart access –public and private • FTP sites –large data downloads

  43. Other tutorials • http://www.ensembl.org/info/website/tutorials/index.html • http://www.ebi.ac.uk/training/online/ • http://www.ebi.ac.uk/2can/home.html
