
Databases & Applications

Databases & Applications. Jack da Silva, PhD Bioinformatics Specialist NCSC. Overview. Molecular Biology Databases Bioinformatics Applications User Interfaces Research & Development Summary. Molecular Biology Databases. Public Domain NC Initiatives Rest of the World Commercial


Presentation Transcript


  1. Databases & Applications Jack da Silva, PhD Bioinformatics Specialist NCSC NC BioGrid

  2. Overview • Molecular Biology Databases • Bioinformatics Applications • User Interfaces • Research & Development • Summary

  3. Molecular Biology Databases • Public Domain • NC Initiatives • Rest of the World • Commercial • NC BioGrid Database Service

  4. NCSC Public-Domain Databases • High-Performance Bioinformatics Initiative • Major sequence repositories • GenBank, EMBL, DDBJ, etc. • Formatted for GCG & BLAST • ExPASy (Expert Protein Analysis System) Mirror Site • Peptide databases & associated tools • SWISS-PROT Knowledgebase

  5. Specialized Public-Domain Databases & NC Initiatives • Value-added • Highly annotated (e.g., interactions) • Organism specific (e.g., human) • Molecule specific (e.g., protein) • Data specific (e.g., gene expression) • North Carolina Initiatives • Please come forward

  6. Commercial Databases • Celera Genomics • Assembled & annotated human & mouse genome databases + • DoubleTwist • Assembled & annotated human genome database • LabBook • OSU Annotated Human Genome Database • Free to Academia • Incyte Genomics • Human transcript database +

  7. Molecular Biology Databases Around the World (335) • Major Seq. Repositories (7) • Comparative Genomics (7) • Gene Expression (19) • Gene ID & Structure (31) • Genetic & Physical Maps (9) • Genomic (49) • Intermolecular Interactions (5) • Metabolic Pathways & Cellular Regulation (12) • Mutation (34) • Pathology (8) • Protein (51) • Protein Sequence Motifs (18) • Proteome Resources (8) • Retrieval Systems & DB Structure (3) • RNA Sequences (26) • Structure (32) • Transgenics (2) • Varied Biomedical (18) • Source: Baxevanis, A.D. 2002. Nucleic Acids Research 30: 1-12.

  8. NC BioGrid

  9. NC BioGrid Database Service • Establish service • Housing & updating data • Public-domain & commercial • Virtual data federation • Collaborative effort • High-bandwidth network environment (NCREN)

  10. Federated Databases • Provide uniform access/view of heterogeneous databases • IBM DiscoveryLink • “Provides a single-format virtual database view of multiple heterogeneous data sources” • Lion bioscience SRS • “The power of SRS lies in its ability to effectively integrate heterogeneous data sources behind a single interface and integration framework.” • Data standards development (e.g., XML)
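The "uniform access/view of heterogeneous databases" idea on slide 10 can be illustrated with a small adapter sketch. This is not how DiscoveryLink or SRS work internally; the record shape, the two source formats, and every name below (`Record`, `from_flat_file`, `from_table`, `FederatedView`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical uniform record shape that every source is mapped into.
Record = Dict[str, str]


def from_flat_file(lines: Iterable[str]) -> List[Record]:
    """Adapter for an assumed 'accession<TAB>sequence' flat file."""
    records = []
    for line in lines:
        accession, sequence = line.rstrip("\n").split("\t")
        records.append(
            {"accession": accession, "sequence": sequence, "source": "flatfile"}
        )
    return records


def from_table(rows: Iterable[Tuple[str, str]]) -> List[Record]:
    """Adapter for assumed relational rows ordered (sequence, accession)."""
    return [
        {"accession": accession, "sequence": sequence, "source": "table"}
        for sequence, accession in rows
    ]


class FederatedView:
    """Queries every registered source through one interface."""

    def __init__(self) -> None:
        self._loaders: List[Callable[[], List[Record]]] = []

    def register(self, loader: Callable[[], List[Record]]) -> None:
        self._loaders.append(loader)

    def find(self, accession: str) -> List[Record]:
        # The caller never sees the native format of any source.
        return [
            record
            for load in self._loaders
            for record in load()
            if record["accession"] == accession
        ]
```

The design point is that each heterogeneous source gets its own small adapter, while queries are written once against the shared record shape.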

  11. I3C Workflow Demo • I3C: Interoperable Informatics Infrastructure Consortium • Demo uses XML-in, XML-out paradigm
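A minimal sketch of what an "XML-in, XML-out" step might look like, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`sequences`, `sequence`, `results`, `length`, `id`, `value`) are assumptions made for this example and are not taken from the I3C demo.

```python
import xml.etree.ElementTree as ET


def sequence_lengths(xml_in: str) -> str:
    """Read sequences from an XML document and return their lengths as XML."""
    root = ET.fromstring(xml_in)
    out = ET.Element("results")
    for seq in root.iter("sequence"):
        # Ignore whitespace inside the sequence text before counting residues.
        residues = "".join((seq.text or "").split())
        ET.SubElement(out, "length", id=seq.get("id", ""), value=str(len(residues)))
    return ET.tostring(out, encoding="unicode")
```

Because both the input and the output are XML, a step like this can be chained with other steps that follow the same convention, which is what makes the paradigm attractive for workflows.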

  12. Bioinformatics Applications • Grid-Unaware • Grid-Aware • NC BioGrid Application Service

  13. Grid-Unaware Applications • Any application can run on a grid server • NCSC High-Performance Bioinformatics Apps • Public-domain apps on other NC servers • Commercial apps on NC servers

  14. NCSC Applications • High-Performance Bioinformatics • Parallel applications optimized for parallel supercomputers • Accelrys GCG Wisconsin Package (commercial) • BLAST & HT-BLAST • Parallel Clustal & HT Clustal • Parallel Molecular Systematics Apps • ExPASy tools • High-performance molecular modeling packages (commercial)

  15. Public & Commercial Apps on NC Servers • Any public-domain application • Open source, “Freeware” • Commercial apps will vary in licensing from restrictive to relatively unrestrictive • Please come forward with suggestions

  16. FEATURE • “Grid-unaware”, public-domain application from the Russ Altman Lab, Stanford • Identifies functional or structural sites of interest in a protein • FEATURE is serial! • Multiple instances run concurrently on NPACI-net LEGION grid test bed • Scanned entire PDB (10,911 structures) in ~10 hrs (vs. 177 hrs, or about 1 wk, sequentially)
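The timing figures on slide 16 imply a speedup that is easy to work out. The sketch below uses the slide's numbers (177 hrs sequential, ~10 hrs on the grid); treating the np = 50 from slide 18 as the number of concurrently running instances is an assumption, and since "~10 hrs" is itself approximate, the results are rough.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """How many times faster the concurrent run finished."""
    return serial_hours / parallel_hours


def efficiency(serial_hours: float, parallel_hours: float, instances: int) -> float:
    """Speedup divided by the number of concurrent instances (1.0 is ideal)."""
    return speedup(serial_hours, parallel_hours) / instances


# Slide figures: 177 hrs sequential, ~10 hrs on the grid.
# np = 50 as the instance count is an assumption taken from slide 18.
print(speedup(177, 10))
print(efficiency(177, 10, 50))
```

With these inputs the speedup is about 17.7 and the efficiency about 0.35, so the run was far faster than serial but well short of the ideal 50-fold gain.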

  17. FEATURE Analysis

  18. FEATURE & the Grid • Compiled FEATURE code on LEGION for Intel Linux, DEC Alpha Linux, & Sun Solaris • Registered binaries into “LEGION space” • Provided file specifying where to find input and deposit output • Used legion_run_multi command to spawn multiple instances of FEATURE (np = 50) across nodes, each scanning a single file from the PDB
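The pattern described on slide 18, many independent instances of a serial program with one input file each, can be sketched on a single machine with Python's standard `concurrent.futures`. This does not reproduce LEGION or the `legion_run_multi` command; `scan_structure` is a hypothetical stand-in for one serial scan, and a thread pool stands in for grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def scan_structure(pdb_text: str) -> int:
    """Hypothetical stand-in for one serial scan: counts ATOM records."""
    return sum(1 for line in pdb_text.splitlines() if line.startswith("ATOM"))


def scan_all(
    files: Dict[str, str],
    workers: int = 4,
    scan: Callable[[str], int] = scan_structure,
) -> Dict[str, int]:
    """Run one independent scan per file, several at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the file names.
        results = pool.map(scan, files.values())
        return dict(zip(files.keys(), results))
```

The serial program itself is unchanged. All of the concurrency lives in the launcher, which is why a "grid-unaware" application can still benefit from a grid when its inputs are independent.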

  19. Grid-Aware Applications • Not many – production grids don’t exist • TurboBLAST (TurboGenomics) • Commercial • Not marketed specifically to grids • Distributes BLAST search over heterogeneous network of computers
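One way to think about distributing a search over a heterogeneous network is to hand each piece of work to whichever machine becomes free first, so faster machines naturally take more pieces. The sketch below is a generic greedy scheduler, not TurboBLAST's actual method (which the slides do not describe); the cost model (query length divided by a relative node speed) and all names are assumptions.

```python
import heapq
from typing import Dict, List


def assign(
    query_lengths: Dict[str, int], node_speeds: Dict[str, float]
) -> Dict[str, List[str]]:
    """Greedily give each query to the node that becomes free earliest."""
    # Heap entries are (time at which the node is next free, node name).
    heap = [(0.0, node) for node in sorted(node_speeds)]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {node: [] for node in node_speeds}
    # Longest queries first, with ties broken by name for determinism.
    ordered = sorted(query_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for query, length in ordered:
        free_at, node = heapq.heappop(heap)
        plan[node].append(query)
        # Assumed cost model: run time = length / relative speed.
        heapq.heappush(heap, (free_at + length / node_speeds[node], node))
    return plan
```

For example, with four equal-length queries and one node three times faster than the other, the fast node ends up with three of the four queries.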

  20. TurboBLAST

  21. TurboBLAST

  22. NC BioGrid Application Service • Establish service • Housing & updating binaries, source code, documentation • Public-domain & commercial • Collaborative effort • High-bandwidth network environment (NCREN) • Cross-referenced to databases (NCBDS)

  23. One View of the User’s Views • Database centric • Cross-referenced to appropriate applications • Application centric • Cross-referenced to appropriate databases • Analysis centric • References appropriate databases & applications • Suggests workflows
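The database-centric and application-centric views on slide 23 are two lookups over the same set of links. A minimal sketch, assuming the cross-references are available as (database, application) pairs; the function name and the pair format are hypothetical, and the example pairs in the usage note only echo slide 4's statement that the sequence repositories are formatted for GCG and BLAST.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_cross_refs(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build both lookup directions from (database, application) pairs."""
    db_to_apps: Dict[str, Set[str]] = defaultdict(set)
    app_to_dbs: Dict[str, Set[str]] = defaultdict(set)
    for database, application in pairs:
        db_to_apps[database].add(application)  # database-centric view
        app_to_dbs[application].add(database)  # application-centric view
    return db_to_apps, app_to_dbs
```

For instance, feeding in `("GenBank", "BLAST")`, `("GenBank", "GCG")`, and `("EMBL", "BLAST")` lets a user start from GenBank and find both tools, or start from BLAST and find both repositories.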

  24. User Interfaces to the BioGrid • Single sign-on • Simple, graphical • Allow user to “see” everything on grid • Give the impression that resources are on user’s desktop

  25. UNICORE Grid Technology

  26. European DataGrid Simulator

  27. Vanet LEGION Grid Test Bed (US Nodes)

  28. Research & Development Opportunities • Uniform access/view of data • “Gridize” applications • Database & application services • User interface development • Collaboration required • Spans academic-commercial boundary

  29. Summary • The NC BioGrid aims to provide easy, high-bandwidth access to: • databases • applications • Opportunities for collaborative R&D
