1 / 28

Biological Databases Pharmamatrix Workshop 2010 Philip Winter Ishwar V. Hosamani

This workshop provides an overview of various biological databases, including sequence databases, structural databases, and nucleotide databases. It covers topics such as sequence submission, data retrieval, analysis tools, and cross-referencing with other databases. The workshop also explores databases like NCBI, GenBank, UniProt, Pfam, and PDB, explaining their features, search capabilities, and accession formats.

paniz
Download Presentation

Biological Databases Pharmamatrix Workshop 2010 Philip Winter Ishwar V. Hosamani

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Biological Databases Pharmamatrix Workshop 2010 • Philip Winter • Ishwar V. Hosamani

  2. Some databases in the field of molecular biology… AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb,BBDB, BCGD,Beanref,Biolmage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD4OLbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Picty_cDB, DIP, DOGS, DOMO, DPD, DPlnteract, ECDC, ECGC, EC02DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, Genbank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HlVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP5 Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, 0-lycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnaiRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene,Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS- MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc .................. !!!!

  3. What we expect from a database..!! • Sequence, functional, structural information, related bibliography • Well Structured and Indexed • Well cross-referenced (with other databases) • Periodically updated • Tools for analysis and visualization

  4. Biological Databases • Sequence databases • Structure databases

  5. Nucleotide databases • Protein databases Sequence databases

  6. Sequence databases

  7. Nucleotide databases • International Nucleotide Sequence Database Collaboration (INSDC) • NCBI • EMBL • DDBJ

  8. Standard contents of a sequence database • Sequences • Accession number • References • Taxonomic data • Annotation/curation • Keywords • Cross-references • Documentation

  9. NCBI • Very comprehensive biological database • GENBANK: The nucleotide sequence database • Provides 42 different resource • Provides a simple and easy to use web interface http://www.ncbi.nlm.nih.gov/

  10. Sequence submission: done using Bankit or Sequin • Search Engine for data retrieval: Entrez • Retrieves information across all the resources under NCBI Example: PubMed, taxonomy, SNP, PubChem etc.

  11. Tools for analysis • BLAST • Primer-BLAST • B-Link • ORF finder • Genome workbench

  12. Protein Sequence databases • UniProt • PFAM • Gene Index project

  13. UniProt • Universal Protein Resource • Formed through the merger of : • SIB • EBI-SwissProt • TrEMBL • PIR-PSD

  14. Entry names are often the names of the gene followed by the species. • Accession numbers are of the following format: • e.g. P26367 (PAX6_HUMAN)

  15. Uniprot features Blast Align Retrieve ID mapping

  16. Pfam • Proteins contain conserved regions • Based on the conserved regions, proteins are classified into families • Provides links to external databases like PDB, SCOP, CATH etc.

  17. Pfam: Features • Sequence search • View Pfam family • View a clan • View a sequence • View a structure • Keyword search

  18. Gene Indices • Project aimed at indexing genes and their variants in the various genome sequences. • Creating a catalogue of genes in a wide range of organisms • Reduce redundancy

  19. Gene Indices Software Tools • TGI Clustering tools • Clview • SeqClean • Cdbfasta/cdbyank

  20. Structural databases

  21. PDB – Protein Data Bank • CATH • SCOP – Structural Classification of Proteins

  22. wwPDB • Contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies • RCSB-PDB, PDBe, PDBj, BMRB – repositories of protein structure data • Files in PDB, mmCIF, PDBML/XML formats

  23. Advanced search – provides comprehensive information about a protein. • Sequence info, domain info, sequence similarity, literature, apart from the details of the structure. • Cross referenced to SCOP and CATH

  24. CATH • Classification of proteins based on domain structures • Each protein chopped into individual domains and assigned into homologous superfamilies. • Hierarchial domain classification of PDB entries.

  25. CATH hierarchy • Class –derived from secondary structure content is assigned automatically • Architecture – describes gross orientation of secondary structures, independent of connectivity • Topology – clusters structures according to their topological connections and numbers of secondary structures • Homologous superfamily – this level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous

  26. SCOP • Description of structural and evolutionary relationships between all the proteins with known structures • Uses the PDB entries • Search using keywords or PDB identifiers

  27. Hierarchy in SCOP • Class • Fold • Superfamily • Family • Species

  28. Thank you

More Related