BODHI, A Bio-diversity Database Pla(n)tform
Jayant Haritsa
Database Systems Lab, Supercomputer Education and Research Centre
Indian Institute of Science
Team
• B. J. Srikanta (next talk)
• Prof. Madhav Gadgil, Prof. V. Nanjundiah (Centre for Ecological Sciences, IISc)
• Several Masters Students
• Funded by DBT
Motivation
• GATT: Patent Laws
  • To be in place by 2005
• Loss
  • Neem
  • Basmati (estimated export value: Rs. 1,198 crore)
  • Turmeric
• Global and local efforts
  • GBIF (Global Biodiversity Information Facility)
  • Karnataka Bio-diversity Board [Deccan Herald - Aug 26 2000]
Bio-diversity Data
• Taxonomy of species
  • Phenetic (physical) characteristics
  • Phylogenetic (evolutionary) characteristics
• Habitat / Spatial distribution
  • Political Layout
  • Geographic Layout
  • Biospheres
• Genetic information
  • Bio-molecular sequences
  • Structural information
MULTI-DOMAIN QUERY
• Retrieve all plant species that, with respect to “Michelia-champa”, share a common habitat, have identical inflorescence characteristics, and have a DNA sequence within a BLAST score of 80.
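The query above combines one predicate from each of the three data domains (spatial, taxonomic, genomic). A minimal in-memory sketch of that combination follows. Everything in it is an assumption made for illustration: the `Species` record, the habitat and inflorescence labels, the sample sequences, and above all `toy_score`, which is a simple string-similarity stand-in and not a real BLAST score. The reading of "within a BLAST score of 80" as "score of at least 80" is also an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Species:
    name: str
    habitats: frozenset   # spatial domain: labels of regions where the species occurs
    inflorescence: str    # taxonomy domain: one phenetic characteristic
    dna: str              # genome domain: a DNA sequence


def toy_score(a: str, b: str) -> float:
    """Stand-in similarity in [0, 100]. This is NOT a BLAST score."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def multi_domain_query(db, target_name, min_score=80.0):
    """Names of species that match the target in all three domains."""
    target = next(s for s in db if s.name == target_name)
    return [
        s.name
        for s in db
        if s.name != target_name
        and s.habitats & target.habitats                 # shares a habitat
        and s.inflorescence == target.inflorescence      # identical characteristic
        and toy_score(s.dna, target.dna) >= min_score    # sequence is similar enough
    ]


# Hypothetical records, invented purely for illustration.
DB = [
    Species("Michelia-champa", frozenset({"H1", "H2"}), "I1", "ACGTACGTAC"),
    Species("species-A", frozenset({"H2"}), "I1", "ACGTACGTAA"),
    Species("species-B", frozenset({"H3"}), "I1", "ACGTACGTAC"),  # no shared habitat
    Species("species-C", frozenset({"H1"}), "I2", "ACGTACGTAC"),  # different inflorescence
    Species("species-D", frozenset({"H1"}), "I1", "TTTTTTTTTT"),  # dissimilar sequence
]

matches = multi_domain_query(DB, "Michelia-champa")
print(matches)
```

Each of the three conditions here is a linear scan; in a real system each would be served by its own index and operator, which is what makes the combined query hard.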
Difficulties:
• Complex range of data types
  • sets, hierarchies, aggregations, sequences, geometries, maps, audio, images …
• Multidimensional data
  • spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates)
• Computationally-intensive operators
  • species relationships, spatial distributions, sequence alignments, ...
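To give a feel for why sequence alignment counts as a computationally intensive operator, here is a minimal sketch of local alignment scoring by dynamic programming (the textbook Smith-Waterman recurrence). It is a swapped-in illustration, not the BLAST or FASTA heuristics and not BODHI code; the function name and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]               # column 0 is always 0 for local alignment
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, so an alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

The table has one cell per pair of positions, so the work grows with the product of the two sequence lengths. Comparing a query against every stored sequence this way is expensive without an index.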
Current Solutions
• Small-Scale
  • MS-Access / FoxPro / Excel / ...
  • Pentium PCs
• Large-Scale
  • RDBMS: Oracle / DB2 / Informix / Sybase / …
  • Unix servers: Sun / SGI / IBM / HP / ...
Limitations:
• RDBMS approach of “the world is a flat collection of tables with simple attributes” suits financial applications, NOT scientific (biological) applications
• In particular, taxonomic / spatial / sequence / multimedia data modeling and processing are very cumbersome and coarse
Limitations (contd)
• Spatial and other specialized applications are not within the database kernel but are connected externally. For example, many GIS systems have ArcInfo and MS-Access hooked up in a “black-box” manner, and BLAST/FASTA run on sequence files generated from Oracle.
• Problem: Slow and ugly!
Is there Hope?
• Object-Oriented DBMS
  • “Natural” for biological applications
• High-performance data access methods
  • Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ...
• High-performance specialized operators
  • spatial join, data mining, sequence processing, …
• XML = HTML + Semantics
Goals of BODHI
• Seamless integration of taxonomic, spatial and genomic data using OO technology
• Latest access methods and operators for all three types of data
• Utilize XML for data exchange
• Low-cost (ideally, free!)
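As an illustration of the XML data-exchange goal, the sketch below serializes a species record to XML and parses it back using Python's standard `xml.etree.ElementTree`. The element and attribute names (`species`, `taxonomy`, `genus`, `family`, `habitats`, `habitat`) and the habitat labels are hypothetical; the slides do not describe BODHI's actual exchange schema.

```python
import xml.etree.ElementTree as ET


def species_to_xml(name, genus, family, habitats):
    """Serialize one species record; the tag names are illustrative only."""
    root = ET.Element("species", attrib={"name": name})
    taxonomy = ET.SubElement(root, "taxonomy")
    ET.SubElement(taxonomy, "genus").text = genus
    ET.SubElement(taxonomy, "family").text = family
    habitat_list = ET.SubElement(root, "habitats")
    for label in habitats:
        ET.SubElement(habitat_list, "habitat").text = label
    return ET.tostring(root, encoding="unicode")


def xml_to_species(doc):
    """Parse a record produced by species_to_xml back into a dict."""
    root = ET.fromstring(doc)
    return {
        "name": root.get("name"),
        "genus": root.findtext("taxonomy/genus"),
        "family": root.findtext("taxonomy/family"),
        "habitats": [h.text for h in root.findall("habitats/habitat")],
    }


doc = species_to_xml("Michelia-champa", "Michelia", "Magnoliaceae", ["H1", "H2"])
record = xml_to_species(doc)
print(doc)
print(record)
```

Because the tags carry meaning (genus, family, habitat), a receiving system can interpret the record without knowing how the sender stores it, which is the sense of "XML = HTML + Semantics".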
Architecture of BODHI
The Internet
Client Interface Framework
Query Processor
Operations: Spatial Operations | Object Operations | Genome Operations
Indexes: Spatial Indexes | Object Indexes | Genome Indexes
Models: Spatial Model | Taxonomy Model | Genome Model
Services: Spatial Services | Object Services | Sequence Services
OBJECT STORAGE MANAGER
Implementation of BODHI
The Internet
Client Interface Framework
Query Processor: λ-DB
Operations: Overlaps, Contains, Closest, Within | Inheritance, Aggregation | Alignment, BLAST, FASTA
Indexes: R*-tree, Hilbert-Rtree | Multi-Key Type, Path-Dictionary | ??? Indexes (next talk)
Models: Country, State, City, River, Road | Species, Genera, Family, Order | DNA, Protein
Services: Spatial Services | Object Services | Sequence Services
Basic Types (Point, Line, Polygon, Sets, Sequences, ...)
SHORE MICRO-KERNEL
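The spatial operators named on this slide (Overlaps, Contains, Closest, Within) can be illustrated on axis-aligned bounding rectangles, the kind of approximation that R-tree-style indexes store for each object. The sketch below is not BODHI's code: the `Rect` class, its methods, the `closest` helper, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding rectangle (illustrative type, not a BODHI class)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Rect") -> bool:
        # True when the rectangles share at least one point (touching counts).
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and self.xmax >= other.xmax and self.ymax >= other.ymax)

    def within(self, other: "Rect") -> bool:
        return other.contains(self)

    def distance(self, other: "Rect") -> float:
        # Gap between the rectangles along each axis; 0 when they overlap.
        dx = max(other.xmin - self.xmax, self.xmin - other.xmax, 0.0)
        dy = max(other.ymin - self.ymax, self.ymin - other.ymax, 0.0)
        return hypot(dx, dy)


def closest(query: Rect, candidates: dict) -> str:
    """Name of the candidate rectangle nearest to the query (linear scan)."""
    return min(candidates, key=lambda name: query.distance(candidates[name]))


# Made-up coordinates for a state, a city, a river and a distant road.
state = Rect(0, 0, 10, 10)
city = Rect(2, 2, 3, 3)
river = Rect(8, 5, 15, 6)
road = Rect(20, 20, 21, 21)

print(state.contains(city), city.within(state), state.overlaps(river))
print(closest(city, {"river": river, "road": road}))
```

An index replaces the linear scan in `closest` and the pairwise tests with a tree search over nested rectangles, which is where the performance gain comes from.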
Query Flow [diagram]
Project Status
• Prototype (minus Client Interface Framework) has been operational since last month!
• Platform: PIII-700MHz running Redhat Linux
• For code, contact “bodhi@dsl.serc.iisc.ernet.in”
Performance Evaluation
• SEQUOIA 2000 spatial benchmark: Competitive with Paradise GIS from Wisconsin
• Taxonomy + Spatial Queries: Reasonably fast
• Genomics queries are much slower due to the absence of indexes (next talk)
More details
• “Design and Implementation of a Biodiversity Information System”, Proc. of Intl. Conf. on Management of Data (COMAD), Pune, December 2000
• “The Building of BODHI, A Bio-diversity Database System”, TechRep-2001-02, DSL/SERC, IISc
• Available at http://dsl.serc.iisc.ernet.in
End of Talk