Biological systems: Sequence data; Protein folding and 3D structure; Taxonomic data; Literature; Pathways and networks; Protein families and domains; Small molecules; Whole genome data; Ontologies (GO)
Other Biological Databases • Transcription factor binding sites: TRANSFAC • Protein structure databases: PDB, SCOP, CATH • Protein family databases: Pfam, Prints, PROSITE etc. • Chemicals and small molecules: ChEBI • Gene expression databases: GEO, ArrayExpress • Metabolic pathways: Reactome, KEGG • Genome databases: Ensembl, FlyBase, WormBase etc. • Human genetics-related databases: HapMap, dbSNP
Transcription factor binding sites • TRANSFAC –database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac • TESS –Transcription Element Search System –for predicting transcription factor binding sites, uses TRANSFAC: http://www.cbi.upenn.edu/tess • TFsearch –for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html
Protein structure databases • Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/ • Contains the spatial coordinates of macromolecule atoms whose 3D structure has been obtained by X-ray or NMR studies • Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…) • Can search by PDB code
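To make "spatial coordinates of macromolecule atoms" concrete, here is a minimal sketch of reading them from a PDB-format file. It assumes the legacy fixed-width PDB text format, in which ATOM and HETATM records carry x, y and z in columns 31-38, 39-46 and 47-54. The names `Atom`, `parse_atoms` and `make_atom_line` are hypothetical and are not part of any PDB toolkit.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    residue: str   # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract coordinates from the fixed-width ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, etc.
        atoms.append(Atom(
            name=line[12:16].strip(),     # columns 13-16
            residue=line[17:20].strip(),  # columns 18-20
            chain=line[21],               # column 22
            x=float(line[30:38]),         # columns 31-38
            y=float(line[38:46]),         # columns 39-46
            z=float(line[46:54]),         # columns 47-54
        ))
    return atoms


def make_atom_line(serial, name, residue, chain, resseq, x, y, z):
    """Hypothetical helper: build a column-aligned ATOM record for trying
    out the parser without downloading a real entry."""
    return (f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")
```

Calling `parse_atoms` on the text of a downloaded entry returns one `Atom` per coordinate record. Slicing by column rather than splitting on whitespace matters because adjacent fields in this format can touch.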
Searching MSD (Macromolecular Structure Database, http://www.ebi.ac.uk/msd): search by PDB code
Protein structure-related databases • Structural family databases based on PDB –SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) • Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)
Protein family databases • Databases that produce signatures for identifying protein families or domains • Used for functional classification of proteins • E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc. • Integrated into single resource InterPro (http://www.ebi.ac.uk/interpro)
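As an illustration of what a "signature" can look like, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It handles only the basic syntax (`x`, `[..]`, `{..}`, repeat counts, `<` and `>` anchors) and is a simplification, not how InterPro or PROSITE implement matching. `prosite_to_regex` and `find_sites` are hypothetical names.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # '<' anchors the match at the N-terminus, '>' at the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Optional repeat count, e.g. x(2) or x(2,4).
        repeat = ""
        if element.endswith(")"):
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                         # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element                     # one residue or an [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> List[int]:
    """Return 0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]
```

For example, the N-glycosylation site pattern `N-{P}-[ST]-{P}` becomes `N[^P][ST][^P]`. Profile- and HMM-based signatures (as used by Pfam) need dedicated tools and are not covered by this sketch.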
InterProScan sequence search Stand-alone version available
InterPro text search: search by keyword, protein accession or InterPro accession
Chemicals and small molecules • Chemical abstracts- http://www.cas.org/ • ChEBI- http://www.ebi.ac.uk/chebi • KEGG –part of it includes chemicals http://www.genome.jp/kegg • ChemID plus -chemicals cited in NLM databases http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp • MSD-Chem –ligands and chemicals in MSD
Gene expression databases • NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/ • ArrayExpress http://www.ebi.ac.uk/arrayexpress • Stanford microarray database http://genome-www5.stanford.edu/ • Can usually search for experiments or particular expression profiles
What does the data look like? • Info on experiment, array used, etc. • Raw or processed tab-delimited file containing spots and their intensities (cy3/cy5 ratios) across different samples • Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array
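A minimal sketch of working with such a tab-delimited file: it reads per-spot channel intensities and computes a log2(cy5/cy3) ratio for each spot. The column names `spot_id`, `cy3` and `cy5` are assumptions for illustration; real GEO or ArrayExpress files use platform-specific headers, and real analyses also apply background correction and normalization, which are omitted here.

```python
import csv
import io
import math
from typing import Dict


def log2_ratios(tsv_text: str,
                id_col: str = "spot_id",
                cy3_col: str = "cy3",
                cy5_col: str = "cy5") -> Dict[str, float]:
    """Return {spot id: log2(cy5 / cy3)} from tab-delimited text.

    Spots with a non-positive intensity in either channel are skipped,
    because the log ratio is undefined for them.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ratios = {}
    for row in reader:
        cy3 = float(row[cy3_col])
        cy5 = float(row[cy5_col])
        if cy3 <= 0 or cy5 <= 0:
            continue
        ratios[row[id_col]] = math.log2(cy5 / cy3)
    return ratios
```

A positive value means the spot is brighter in the cy5 channel, a negative value means it is brighter in cy3, and 0 means equal intensity.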
Enzymes and metabolic pathways • These databases contain information describing enzymes, biochemical reactions and metabolic pathways • ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions • IntEnz: Integrated relational Enzyme database
Enzyme nomenclature • E.C. (Enzyme Commission) numbers assigned based on reactions they catalyze • Hierarchy, high level groups: • EC 1 –Oxidoreductases • EC 2 –Transferases • EC 3 –Hydrolases • EC 4 –Lyases • EC 5 –Isomerases • EC 6 –Ligases
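The hierarchy can be read straight off an EC number, since its first field is the top-level class. A minimal sketch, assuming four dot-separated fields with `-` allowed for unassigned lower levels; the mapping covers only the six classes listed above, and `ec_class` is a hypothetical helper name.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number: str) -> str:
    """Return the top-level class name for an EC number such as 'EC 1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # drop the optional "EC" prefix
    fields = text.split(".")
    if len(fields) != 4 or not fields[0].isdigit():
        raise ValueError(f"not a valid EC number: {ec_number!r}")
    top_level = int(fields[0])
    if top_level not in EC_CLASSES:
        raise ValueError(f"unknown EC class {top_level} in {ec_number!r}")
    return EC_CLASSES[top_level]
```

For instance, `ec_class("EC 1.1.1.1")` (alcohol dehydrogenase) gives "Oxidoreductases", and a partially specified number such as "2.7.-.-" still resolves to "Transferases".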
Metabolic pathway databases • Pathguide: lists more than 200 pathway resources • KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg, which includes: • Database of chemicals, genes and networks (metabolic, regulatory etc.) • Well-curated and quite specific • EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org, curation of the entire genome • Reactome: curated biological pathways, http://www.reactome.org/ • GenMAPP: pathways contributed by users
KEGG (http://www.genome.ad.jp/kegg): pathways differ between species, so they can be compared across species
Protein-protein interaction databases • Store pairwise interactions or complexes • A single publication can report from 1 to more than 20,000 interactions • IntAct http://www.ebi.ac.uk/intact • DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/ • BIND (Biomolecular Interaction Network Database) http://submit.bind.ca:8080/bind/
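Pairwise interactions map naturally onto an undirected graph. The sketch below builds an adjacency map from a list of (protein, protein) pairs and reports how many partners each protein has. The identifiers in the usage note are made up, and real exports from IntAct, DIP or BIND carry extra columns (detection method, publication, scores) that this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from pairwise interactions."""
    network = defaultdict(set)
    for a, b in pairs:
        # Store both directions; duplicate reports collapse in the sets.
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def partner_count(network: Dict[str, Set[str]], protein: str) -> int:
    """Number of distinct interaction partners (0 if the protein is absent)."""
    return len(network.get(protein, ()))
```

With the hypothetical pairs `[("P1", "P2"), ("P2", "P3"), ("P2", "P1")]`, protein "P2" ends up with two partners, because the repeated P1-P2 report is counted once.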
Genome browsers • Integrate sequence & functional data for a genome • Ensembl –genome browser for major eukaryotic genomes, e.g. human, mouse etc. http://www.ensembl.org • UCSC browser -http://genome.ucsc.edu/ • FlyBase –Drosophila genome database: http://www.ebi.ac.uk/flybase • WormBase –C. elegans: http://www.wormbase.org • PlasmoDB –Plasmodium (malaria): http://plasmodb.org • Etc.
Human genetics databases • GeneCards (http://www.genecards.org/) • HapMap (http://hapmap.ncbi.nlm.nih.gov/) • OMIM (http://www.ncbi.nlm.nih.gov/omim) • HGDP, Human Genome Diversity Project (http://hagsc.org/hgdp/files.html)
Mutation/polymorphism databases: most of the databases are disease- or gene-centric, e.g. p53
dbSNP (http://www.ncbi.nlm.nih.gov/SNP/): repository of all known mutations (human and other organisms)
Where to find the databases • Table of addresses for major databases and tools • Nucleic Acids Research Database issue January each year • Nucleic Acids Research Software issue –new • Expasy list of tools: http://ca.expasy.org/links.html
Large scale data retrieval • Programmatic access to many databases • MySQL access to some • BioMart access –public and private • FTP sites –large data downloads
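As one possible illustration of BioMart access, the sketch below assembles the XML document that a BioMart service expects as its `query` parameter. It only builds the query text and does not contact any server. The dataset, attribute and filter names in the usage note are assumed to match Ensembl's public mart and should be checked against the service before use. `biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def biomart_query(dataset: str,
                  attributes: List[str],
                  filters: Optional[Dict[str, str]] = None) -> str:
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated output
        "header": "0",               # no header row
        "uniqueRows": "1",           # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes are the columns of the result table.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")
```

For example, `biomart_query("hsapiens_gene_ensembl", ["ensembl_gene_id", "external_gene_name"], {"chromosome_name": "21"})` would describe a request for gene identifiers and names on human chromosome 21, under the naming assumptions stated above. For very large result sets, the FTP downloads are usually the more practical route.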
Other tutorials • http://www.ensembl.org/info/website/tutorials/index.html • http://www.ebi.ac.uk/training/online/ • http://www.ebi.ac.uk/2can/home.html