
(near term) Develop Database Requirements to Yield Schema and Interfaces




  1. (near term) Develop Database Requirements to Yield Schema and Interfaces • MoBIoS: Database Management for Data in Metric Spaces • Daniel P. Miranker • Univ. of Texas

  2. What we know for sure: Exploit Commodity Architecture • External Data/DB Sources • Web App Server • Curating New Content • Computing Grid • DB • Users

  3. Repository Schema and Interface Definitions • Issue: • Database organization and data interchange should be addressed simultaneously • Once established, difficult to change, so best to get this right the first time.

  4. What we know for sure: • 1. Data transfer: XML & Nexus files • 2. Curate (manage quality) • Web App Server • Curating New Content • Computing Grid • DB Schema • Users • Both 1 & 2 impact schema (data provenance)

  5. XML and Bioinformatics • Taxonomic Markup Language (TML) • PhyloML • BEAST: Bayesian Evolutionary Analysis Sampling Trees • AGAVE: Architecture for Genomic Annotation, Visualization and Exchange

  6. Answers Start with a Requirements Analysis • Who • What • Why • How • “Use cases”: specific examples of what is to be accomplished

  7. A Head Start • Requirements of Phylogenetic Databases (with Nakhleh, Barbancon, Piel & Donoghue) [BIBE ’03] • Did a requirements analysis • Proof of concept for a correctly normalized database schema • 1 evolutionary (tree) edge = 1 row in the database
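The "1 edge = 1 row" idea can be illustrated with a small sketch. The table name, column names, and toy tree below are assumptions made for illustration only; the slides do not give the actual schema from the BIBE ’03 work.

```python
import sqlite3

# Hypothetical schema: the slides state only the "1 edge = 1 row" principle.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edge (
        tree_id   INTEGER NOT NULL,
        parent_id INTEGER NOT NULL,
        child_id  INTEGER NOT NULL,
        length    REAL,
        PRIMARY KEY (tree_id, child_id)
    )
""")

# Toy tree ((A,B),C): node 1 is the root, node 2 is the ancestor of A and B.
edges = [
    (1, 1, 2, 0.5),  # root -> internal node
    (1, 1, 5, 1.2),  # root -> C
    (1, 2, 3, 0.3),  # internal node -> A
    (1, 2, 4, 0.4),  # internal node -> B
]
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)", edges)


def descendants(conn, tree_id, node_id):
    """Return the ids of all nodes below node_id, via a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id) AS (
            SELECT child_id FROM edge
            WHERE tree_id = ? AND parent_id = ?
            UNION ALL
            SELECT e.child_id FROM edge e
            JOIN sub s ON e.parent_id = s.id
            WHERE e.tree_id = ?
        )
        SELECT id FROM sub ORDER BY id
        """,
        (tree_id, node_id, tree_id),
    ).fetchall()
    return [r[0] for r in rows]


print(descendants(conn, 1, 2))
```

Because each edge is an ordinary row, subtree queries such as the one above stay inside the database rather than requiring a dump of a text or blob field.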

  8. Who is interested in using Phylogenies? • Casual Users • Visualization • Study Development • Super-tree algorithms • Simulation Studies • Parameter Derivation • Comparative Genomics

  9. Super-Tree Algorithms Use-Cases • Construct phylogenies by assembling existing studies • Collect those studies by: • Determining the minimum spanning clade for a set of taxa • Finding all phylogenies sufficiently similar to a given phylogeny
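A minimal sketch of the first use case, assuming the tree is held in memory as a child-to-parent mapping. The function names and the toy tree are hypothetical; a repository would more likely answer this query against its schema.

```python
def ancestors(parent, node):
    """Path from node up to the root, including node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def minimum_spanning_clade(parent, taxa):
    """Return (clade_root, leaves) for the smallest clade containing taxa."""
    taxa = list(taxa)
    # Nodes that lie on every taxon's path to the root.
    common = set(ancestors(parent, taxa[0]))
    for t in taxa[1:]:
        common &= set(ancestors(parent, t))
    # The deepest common ancestor is the first shared node met walking upward.
    root = next(n for n in ancestors(parent, taxa[0]) if n in common)

    # Invert the mapping so the clade can be walked downward.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(node)
    return root, sorted(leaves)


# Toy tree (((A,B),C),D) with hypothetical internal node names.
parent = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2", "n2": "root", "D": "root"}
print(minimum_spanning_clade(parent, ["A", "C"]))
```

Asking for the clade spanning A and C also returns B, since B sits under the same deepest common ancestor.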

  10. Requirements of Phylogenetic Databases

  11. The MoBIoS Project: Molecular Biological Information System • Daniel P. Miranker • University of Texas

  12. MoBIoS – A Simple Idea • Organize the Storage Manager Around Metric Space Indexing
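To make "metric space indexing" concrete, here is a toy single-pivot index. It is a generic textbook technique, not the MoBIoS storage manager; the class name, the Hamming distance, and the sample sequences are assumptions for illustration. The index relies only on the distance function being symmetric and obeying the triangle inequality.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


class PivotIndex:
    """Toy single-pivot index: precomputes d(pivot, p) for every point p."""

    def __init__(self, points, d):
        self.d = d
        self.pivot = points[0]
        self.table = [(p, d(self.pivot, p)) for p in points]

    def range_query(self, q, r):
        """Return (points within r of q, number of distance computations)."""
        dq = self.d(q, self.pivot)
        calls = 1
        hits = []
        for p, dp in self.table:
            # Triangle inequality gives |d(q, pivot) - d(pivot, p)| <= d(q, p),
            # so p can be discarded without ever computing d(q, p).
            if abs(dq - dp) > r:
                continue
            calls += 1
            if self.d(q, p) <= r:
                hits.append(p)
        return hits, calls


points = ["AAAA", "AAAT", "AATT", "TTTT", "CCCC", "AACC"]
index = PivotIndex(points, hamming)
hits, calls = index.range_query("AAAT", 1)
print(hits, calls)
```

A sequential scan would compute the distance to all six stored points; here two of them are pruned using only the precomputed pivot distances. Practical indexes use many pivots arranged in a tree so that the saving grows with database size.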

  13. Biological queries conducted with sequential scans. • Sequence (BLAST) • Phylogenies (Tree of Life) • Mass Spectra (Proteomics) • Ligand Docking (Rational Drug Design)

  14. A Metric Space is • a pair, M = (D, d), where • D is a set of points • d is a [metric] distance function with the following properties: • d(x, y) = d(y, x) (symmetry) • d(x, y) >= 0, with d(x, y) = 0 if and only if x = y (non-negativity) • d(x, y) <= d(x, z) + d(z, y) (triangle inequality)
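The three properties can be checked by brute force on a small sample of points. The sketch below uses Levenshtein edit distance as an example of a distance that passes, and squared difference as one that fails; both choices and the function names are assumptions for illustration and are not metrics named in the slides.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def violated_axiom(points, d):
    """Return the name of the first axiom d violates on points, else None."""
    for x, y in product(points, repeat=2):
        if d(x, y) != d(y, x):
            return "symmetry"
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return "non-negativity"
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):
            return "triangle inequality"
    return None


sequences = ["ACGT", "ACGA", "AGGT", "TTTT", ""]
print(violated_axiom(sequences, edit_distance))

# Squared difference is symmetric and non-negative but not a metric.
print(violated_axiom([0, 1, 2], lambda x, y: (x - y) ** 2))
```

Passing the check on a sample does not prove a function is a metric; it only fails to find a counterexample. A failure, as with squared difference, is conclusive.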

  15. Can Biology Be Modeled by Metrics? • Already metrics for: • Phylogenetic trees • Ligand docking • First Biologically Effective Metric Model of Amino Acid Substitution [Xu & Miranker 03] • In effect, precisely the phylogenetic relationships among sequences are exploited to form a database index. • Metrics for proteomic mass-spectra underway

  16. MoBIoS Architecture (Molecular Biological Information System) • phylogenies

  17. First Application (with Randy Linder) • Compared: {entire Arabidopsis genome} x {“entire” rice genome} • To determine conserved pairs of primer pairs • In O(m log n); will repeat the study soon, faster.

  18. When biological data is put into an RDBMS • Primary data is stored in text or blob fields • Annotations may be relational • Data retrieval: filter DB, sequential dump, O(n), to utilities • E.g. BLAST, TreeBASE, Sequest

  19. Homework: Due tomorrow morning • Who are you (generically)? • Use case involving the database

  20. Don’t know: A General Web Service • ToL Infrastructure @ SDSC • Web App Server • Curating New Content • Computing Grid • Computing Grid • DB Schema
