(near term) Develop Database Requirements to Yield Schema and Interfaces • MoBIoS: Database Management for Data in Metric Spaces • Daniel P. Miranker • Univ. of Texas
What we know for sure: Exploit Commodity Architecture
[Diagram: External Data/DB Sources, Web App Server, Curating New Content, Computing Grid, DB, Users]
Repository Schema and Interface Definitions
Issue:
• Database organization and data interchange should be addressed simultaneously
• Once established, difficult to change
Best to get this right the first time.
What we know for sure:
1. Data transfer: XML & Nexus files
2. Curate (manage quality)
Both 1 & 2 impact the schema (data provenance)
[Diagram: Web App Server, Curating New Content, Computing Grid, DB Schema, Users]
XML and Bioinformatics • Taxonomic Markup Language (TML) • PhyloML • BEAST: Bayesian Evolutionary Analysis Sampling Trees • AGAVE: Architecture for Genomic Annotation, Visualization and Exchange
Answers Start with a Requirements Analysis • Who • What • Why • How “Use cases”: specific examples of what is to be accomplished
A Head Start: Requirements of Phylogenetic Databases (with Nakhleh, Barbancon, Piel & Donoghue) [BIBE ’03] • Did a requirements analysis • Proof of concept for a correctly normalized database schema: 1 evolutionary (tree) edge = 1 row in the database
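One way to picture "1 evolutionary (tree) edge = 1 row" is the small sketch below. It is an illustration only, not the schema from the BIBE ’03 paper: the table names `node` and `edge`, their columns, and the helper `clade_taxa` are assumptions made for this example, and SQLite stands in for whatever RDBMS the repository would use.

```python
import sqlite3

# Hypothetical edge-per-row layout; names and columns are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    tree_id INTEGER NOT NULL,
    taxon   TEXT                      -- NULL for internal nodes
);
CREATE TABLE edge (
    tree_id   INTEGER NOT NULL,
    parent_id INTEGER NOT NULL REFERENCES node(node_id),
    child_id  INTEGER NOT NULL REFERENCES node(node_id),
    length    REAL,
    PRIMARY KEY (tree_id, child_id)   -- each node has at most one parent
);
""")

# Toy tree ((A,B),C): node 1 is the root, node 2 is the ancestor of A and B.
nodes = [(1, 1, None), (2, 1, None), (3, 1, "A"), (4, 1, "B"), (5, 1, "C")]
edges = [(1, 1, 2, 0.1), (1, 2, 3, 0.2), (1, 2, 4, 0.3), (1, 1, 5, 0.5)]
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)", edges)


def clade_taxa(conn, tree_id, root_id):
    """Return the sorted leaf taxa below root_id, walking edges with a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(node_id) AS (
            SELECT ?
            UNION ALL
            SELECT e.child_id
            FROM edge e JOIN sub s ON e.parent_id = s.node_id
            WHERE e.tree_id = ?
        )
        SELECT n.taxon
        FROM node n JOIN sub s ON n.node_id = s.node_id
        WHERE n.taxon IS NOT NULL
        ORDER BY n.taxon
        """,
        (root_id, tree_id),
    ).fetchall()
    return [r[0] for r in rows]


edge_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
```

With edges as rows, a clade becomes an ordinary query result: `clade_taxa(conn, 1, 2)` walks the subtree under node 2 without parsing a text blob.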
Who is interested in using Phylogenies? • Casual Users • Visualization • Study Development • Super-tree algorithms • Simulation Studies • Parameter Derivation • Comparative Genomics
Super-Tree Algorithms Use Cases
Construct phylogenies by assembling existing studies. Collect those studies by:
• Determining the minimum spanning clade for a set of taxa
• Finding all phylogenies sufficiently similar to a given phylogeny
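The minimum spanning clade query can be sketched as follows, assuming it means the smallest clade containing all the given taxa, i.e. the subtree rooted at their deepest common ancestor. The child-to-parent dictionary and the function names are hypothetical, chosen for this example rather than taken from the talk.

```python
def path_to_root(parent, node):
    """Return [node, its parent, ..., root] using a child -> parent mapping."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def minimum_spanning_clade(parent, taxa):
    """Return (clade_root, sorted leaves) of the smallest clade containing all taxa."""
    taxa = list(taxa)
    # Keep only ancestors shared by every taxon, preserving leaf-to-root order.
    common = path_to_root(parent, taxa[0])
    for t in taxa[1:]:
        ancestors = set(path_to_root(parent, t))
        common = [n for n in common if n in ancestors]
    root = common[0]  # the deepest common ancestor comes first

    # Invert the mapping so the clade's leaves can be collected top-down.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    leaves, stack = [], [root]
    while stack:
        n = stack.pop()
        kids = children.get(n, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(n)
    return root, sorted(leaves)


# Toy tree ((A,B)x,(C,D)y)r stored as child -> parent.
parent = {"x": "r", "y": "r", "A": "x", "B": "x", "C": "y", "D": "y"}
```

For example, `minimum_spanning_clade(parent, ["A", "C"])` has to climb to the root `r`, so the clade contains all four taxa, while `["A", "B"]` stays within `x`.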
The MoBIoS Project: Molecular Biological Information System • Daniel P. Miranker • University of Texas
MoBIoS – A Simple Idea: Organize the Storage Manager Around Metric Space Indexing
Biological queries conducted with sequential scans. • Sequence (BLAST) • Phylogenies (Tree of Life) • Mass Spectra (Proteomics) • Ligand Docking (Rational Drug Design)
Metric Space is
• a pair, M = (D, d), where
• D is a set of points
• d is a [metric] distance function with the following properties:
• d(x, y) = d(y, x) (symmetry)
• d(x, y) >= 0, with d(x, y) = 0 if and only if x = y (non-negativity)
• d(x, y) <= d(x, z) + d(z, y) (triangle inequality)
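The triangle inequality is what makes indexing possible: with distances to a fixed pivot stored ahead of time, d(q, p) >= |d(q, pivot) - d(p, pivot)|, so many points can be ruled out without computing d(q, p). The sketch below shows that pruning with a single pivot. It is a deliberately simple stand-in, not the MoBIoS index structure; the class `PivotIndex`, the use of Hamming distance as the metric, and the toy sequences are all assumptions for illustration.

```python
def hamming(a, b):
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Single-pivot table: stores d(p, pivot) for every point p."""

    def __init__(self, points, pivot, dist):
        self.pivot = pivot
        self.dist = dist
        self.table = [(p, dist(p, pivot)) for p in points]
        self.evaluations = 0  # distance computations in the last query

    def range_query(self, q, radius):
        """Return all points within radius of q, pruning by the triangle inequality."""
        dq = self.dist(q, self.pivot)
        self.evaluations = 1
        hits = []
        for p, dp in self.table:
            # d(q, p) >= |d(q, pivot) - d(p, pivot)|, so p cannot be a hit here.
            if abs(dq - dp) > radius:
                continue
            self.evaluations += 1
            if self.dist(q, p) <= radius:
                hits.append(p)
        return hits


points = ["AAAA", "AAAT", "AATT", "TTTT", "GGGG", "AAAC"]
index = PivotIndex(points, pivot="AAAA", dist=hamming)
hits = index.range_query("AAAG", radius=1)
brute_force = [p for p in points if hamming("AAAG", p) <= 1]
```

On this toy data the query discards TTTT and GGGG without comparing them to the query, yet returns the same answer as the sequential scan in `brute_force`.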
Can Biology Be Modeled by Metrics?
Metrics already exist for:
• Phylogenetic trees
• Ligand docking
First biologically effective metric model of amino acid substitution [Xu & Miranker 03]: in effect, precisely the phylogenetic relationships among sequences are exploited to form a database index.
Metrics for proteomic mass spectra are underway.
MoBIoS Architecture (Molecular Biological Information System)
[Diagram; recoverable label: phylogenies]
First Application (with Randy Linder)
Compared: {entire Arabidopsis genome} x {“entire” rice genome}
To determine conserved pairs of primer pairs, in O(m log n). Will repeat the study soon, faster.
When biological data is put into an RDBMS
• Primary data is stored in text or blob fields
• Annotations may be relational
• Data retrieval: filter DB, sequential dump, O(n), to utilities
• E.g. BLAST, TreeBASE, Sequest
Homework: Due tomorrow morning • Who are you (generically)? • Use case involving the database
Don’t know: A General Web Service
ToL Infrastructure @ SDSC
[Diagram: Web App Server, Curating New Content, Computing Grid, Computing Grid, DB Schema]