Use of c-squares spatial indexing and mapping in the 2004 release of OBIS, the Ocean Biogeographic Information System
Tony Rees, Divisional Data Centre, CSIRO Marine Research, Australia (Tony.Rees@csiro.au)
OBIS Concept
• Intention of OBIS is to be an online “marine species atlas”, providing electronic access to data from multiple sources via a single gateway or portal
• Current marine species data are scattered (hundreds to thousands of potential data sources) and require harmonisation of data formats, species names, etc.; potentially 200,000+ species
• OBIS has made a start with connections to 10-15 data sources, holding data on some 30,000 species (approx. 2.5m records); intention is to pass 10m records by 2010
• Initial OBIS portal has been operational since January 2002 (located at Rutgers Univ., NJ): www.iobis.org (upgraded March 2004).
OBIS Concept - diagrammatic
[Diagram labels: Visualisation / analysis tools; Native Portal functions (including data retrieval and integration); OBIS distributed data sources]
OBIS Architecture – initial implementation
Initial version, Jan 2002 – Mar 2004
[Diagram labels: www users 1, 2, 3 (etc.); OBIS Portal; mapping tool 1; real time data queries; custom DB wrappers; data providers 1, 2, 3 (etc.)]
Strengths / weaknesses
Good things about this approach...
• Source data stays with the providers – no versioning problems, good for distributed “ownership” of the OBIS concept, no IP issues, content is always up-to-date
• OBIS portal can be structurally very simple (simply relays requests and responses, and provides access to on-line mapping tools)
• Portal does not have to be a data manager (with associated resourcing, ongoing data integrity issues), or have any intelligent understanding of the data content (simply does matching on text strings)
Strengths / weaknesses
Less good things about this approach...
• System is only as fast as its slowest link, i.e. performance is dependent on factors outside the Portal’s control; can wait minutes to return data on one species from all providers
• One or more providers may be offline at time of query – the user will never know whether or not they have data of interest (and mapping is potentially incomplete)
• Many searches may return no data (only approx. 10-15% of the marine biota is covered at the present time, plus spatial coverage is very patchy); also, user has to spell name/s correctly (a common problem)
• No ability to query by common terms, e.g. “all fishes”, “all whales”, as this information is not held at provider level (typically just the scientific names)
• Some species can be known by multiple / variant names, and user may not be aware of this. Also, some bad / irrelevant data amongst provider input will show up with appropriate searches (not filtered out)
... Portal is basically “dumb”: it cannot provide user with any pre-search information about what content is available or unavailable.
Semi-equivalent situation at author’s agency
• (Component 1): “Data Warehouse” data repository, with 0.25m marine species distribution records, for 3000 species ... can be slow / cumbersome to query, many queries return no data
• (Component 2): Separate “CAAB” master names list – all possible species which occur in the region (c. 20,000 names)
• CAAB upgraded to show which species on the master names list have data in the Warehouse
• Also, parsed all the 0.25m species records and built a spatial index – a list of squares in which each species has been recorded; this table is then stored as part of the CAAB database
• Now ... can do name and spatial queries on the (smaller) CAAB database (= Index) – show all names for which there are data, what species occur in any square (0.1 x 0.1 degrees in this instance), and distribution of any species, direct from the Index, without needing to establish a connection to the full “Warehouse” database
• Can then support full Warehouse queries as “stage 2” if needed.
C-squares spatial indexing ...
• Doesn’t store the point data, just a list of the squares in which data are present, for each species
• Efficient for data reduction, where multiple points occur in the same square
• Easy to store and query ... choice of square size is a design decision (CAAB index uses 0.1 x 0.1 deg. squares, approx. 10 km)
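A minimal sketch of the data reduction described above, assuming each point record has already been assigned a c-squares code. The record layout, the placeholder species names and the function name `build_spatial_index` are hypothetical and are not taken from CAAB or OBIS.

```python
from collections import defaultdict


def build_spatial_index(records):
    """Reduce (species, c-squares code) point records to one sorted
    list of distinct squares per species."""
    squares = defaultdict(set)
    for species, code in records:
        # A set keeps each square once, however many points fall in it.
        squares[species].add(code)
    return {species: sorted(codes) for species, codes in squares.items()}


# Hypothetical point records: three of them fall in the same square.
records = [
    ("Species A", "3414:100:3"),
    ("Species A", "3414:100:3"),
    ("Species A", "3414:100:3"),
    ("Species A", "3414:100:4"),
    ("Species B", "3414:100:3"),
]
index = build_spatial_index(records)
```

Five point records reduce to three index entries here; the saving grows with the number of points per square.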
Hierarchical nomenclature for the c-squares codes
Lat 40.5 S, long 140.2 E is in ...
10 x 10° square “3414”
5 x 5° square “3414:1”
1 x 1° square “3414:100”
0.5 x 0.5° square “3414:100:3”
(etc.)
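The worked example above can be reproduced with a short encoder. This is a sketch only: the digit conventions (leading quadrant digit 1/3/5/7 for NE/SE/SW/NW, and an intermediate quadrant digit 1-4 set by whether the next latitude and longitude digits are 5 or more) are assumed from the pattern of the example and should be checked against the c-squares website. The function name `csquare_code` and the supported resolutions are hypothetical choices.

```python
SCALE = 10_000  # work in integer ten-thousandths of a degree

# resolution in degrees -> (number of ":" cycles, stop at the quadrant digit?)
_LEVELS = {10: (0, False), 5: (1, True), 1: (1, False),
           0.5: (2, True), 0.1: (2, False)}


def csquare_code(lat, lon, resolution=0.5):
    """Return the c-squares code containing (lat, lon) in decimal degrees."""
    cycles, half_step = _LEVELS[resolution]
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7   # NE / NW
    else:
        quadrant = 3 if lon >= 0 else 5   # SE / SW
    # Clamp so that lat 90 and lon 180 fall in the outermost squares.
    a = min(round(abs(lat) * SCALE), 90 * SCALE - 1)
    o = min(round(abs(lon) * SCALE), 180 * SCALE - 1)
    step = 10 * SCALE
    parts = [f"{quadrant}{a // step}{o // step:02d}"]   # 10-degree square
    for i in range(cycles):
        a, o = a % step, o % step
        step //= 10
        lat_digit, lon_digit = a // step, o // step
        inter = 1 + 2 * (lat_digit >= 5) + (lon_digit >= 5)
        if half_step and i == cycles - 1:
            parts.append(str(inter))                      # 5 or 0.5 degrees
        else:
            parts.append(f"{inter}{lat_digit}{lon_digit}")  # 1 or 0.1 degrees
    return ":".join(parts)


codes = [csquare_code(-40.5, 140.2, r) for r in (10, 5, 1, 0.5)]
```

Because each finer code extends the coarser one, the code for any enclosing square is a prefix of the code for the squares it contains.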
Behind the scenes, spatial index looks like this ...
• Index must be refreshed when new data are added to the Warehouse (or records deleted / modified)
• Spatial query logic is very simple (standard text match, on part or all of a “word”)
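As an illustration of the text-match query logic, the sketch below assumes the index holds one string of "|"-separated c-squares codes per species. That storage layout, the placeholder species and the function name `species_in_square` are assumptions made for the example; the slide does not show the actual table design.

```python
def species_in_square(index, square):
    """Return the species with at least one code that equals `square`
    or lies inside it (i.e. begins with it)."""
    hits = []
    for species, code_string in index.items():
        codes = code_string.split("|")
        if any(code.startswith(square) for code in codes):
            hits.append(species)
    return sorted(hits)


# Hypothetical index rows at 0.5 x 0.5 degree resolution.
index = {
    "Species A": "3414:100:3|3414:100:4",
    "Species B": "3414:227:1",
    "Species C": "7500:110:3",
}
in_10_degree = species_in_square(index, "3414")
in_5_degree = species_in_square(index, "3414:1")
in_half_degree = species_in_square(index, "3414:100:4")
```

The same match serves every scale: a whole code finds one square, and a leading part of a code finds everything inside a larger square. The caller is assumed to pass a valid code; a fragment such as "3414:10" would also match, although it does not name a real square.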
OBIS New version ...
• For OBIS – similar approach taken, i.e. introduce name index and spatial index (= new “metadata layer” – the OBIS Index), this time using 0.5 x 0.5 degree squares (~50 km global resolution)
• Name index also enhanced with additional metadata and value-adding...
  • how many records of each species (0-40,000+)
  • which sources have the data
  • date range (start, end year)
  • what group a species belongs to (fishes, whales, barnacles...)
  • common name for the species, where available (plus more)
• Can now do many queries – including name lists / metadata, spatial queries and “quick maps” – direct from the Index (smaller, rapid to query, plus everything runs locally)
• Only need to query the remote data sources for “stage 2” (= get data) queries. In production version, local data cache of key fields introduced as well, for further performance benefits and guaranteed data availability.
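A minimal sketch of a name-index entry carrying the metadata listed above, with a “stage 1” query answered entirely from the local index. The class `NameEntry`, its field names, the function `stage1_names` and all values in the sample entries are hypothetical; they illustrate the idea and do not describe the OBIS schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NameEntry:
    scientific_name: str
    group: str                          # e.g. "fishes", "whales"
    common_name: str = ""               # empty where not available
    record_count: int = 0               # 0 = on the names list, no data yet
    sources: Tuple[str, ...] = ()       # which providers hold the data
    first_year: Optional[int] = None
    last_year: Optional[int] = None


def stage1_names(entries, group=None, prefix="", with_data_only=True):
    """List matching names from the index alone; no provider is contacted."""
    hits = []
    for entry in entries:
        if group is not None and entry.group != group:
            continue
        if not entry.scientific_name.lower().startswith(prefix.lower()):
            continue
        if with_data_only and entry.record_count == 0:
            continue
        hits.append(entry.scientific_name)
    return sorted(hits)


# Made-up counts, sources and years, for illustration only.
entries = [
    NameEntry("Balaenoptera musculus", "whales", "blue whale", 120,
              ("provider 1", "provider 2"), 1950, 2003),
    NameEntry("Lutjanus analis", "fishes", "mutton snapper", 45,
              ("provider 3",), 1978, 2001),
    NameEntry("Lutjanus sebae", "fishes"),
]
all_whales = stage1_names(entries, group="whales")
```

Only a follow-up “stage 2” request for the underlying records would need the data cache or the remote providers.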
OBIS Architecture – initial implementation (reprise)
Initial version, 2002-2004
[Diagram labels: www users 1, 2, 3 (etc.); OBIS Portal; mapping tool 1; real time data queries; custom DB wrappers; data providers 1, 2, 3 (etc.)]
OBIS Architecture – 2004 version
New version, Mar 2004 onwards
[Diagram labels: www users 1, 2, 3 (etc.); OBIS Portal; mapping tools 1 and 2; OBIS Index; global names list (partial); Data Cache (refreshed on regular cycle); “Stage 1” queries; “Stage 2” queries; provider crawling; index building; DiGIR translation software; data providers 1, 2, 3 (etc.)]
OBIS User Interface – 2004 version
• Click-on-a-map spatial search (all categories, or single category)
• Name search (scientific name, common name, partial match, “soundalike” search)
• Browse a list of names – all categories (alphabetic), or subset by category
• Show only names with data, or all names (shows status of content building, also confirms that user has entered a valid name – whether data held or not)
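The slides do not say how the “soundalike” search is implemented. As a stand-in, the sketch below uses plain string similarity from Python's standard `difflib` module, which is not a phonetic algorithm but shows how a misspelled name can still be resolved against the names list. The function name `suggest_names` and the short names list are hypothetical.

```python
import difflib


def suggest_names(query, names, limit=3):
    """Return up to `limit` names from `names` that closely resemble
    `query`, best match first (case-insensitive)."""
    by_lower = {name.lower(): name for name in names}
    close = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=0.6
    )
    return [by_lower[match] for match in close]


names = ["Lutjanus analis", "Lutjanus campechanus", "Balaenoptera musculus"]
suggestions = suggest_names("Lutjanus campechanis", names)  # misspelled query
```

Matching against the full names list, rather than only names with data, also lets the interface confirm that a name is valid even when no records are held.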
Result in practice...
• Can now generate lists of names matching search criteria extremely rapidly, e.g.
  • “all whales” (35 spp.) ... <4 secs, including c-squares distribution data (up to 1000 squares per species)
  • all whales in a 10 x 10 deg. square ... <3 secs, including distribution data
  • all fishes beginning with “lu..” (115 spp.) ... <10 secs, including distribution data
  (Compare with previous situation, of 2 mins+ per species, and numerous “no data returned” messages)
• “Quick maps” available directly from search results page (require no connection to the source data)
• Can also hold summary statistics (nos. of names per category, overall category distribution maps, etc.) as “meta-metadata”, for presentation to user.
Search OBIS for “Lutjanus” ... 64 spp., < 6 secs
Note presence of common names, other metadata, “Quick Map” buttons, plus “Get OBIS Data” (= Stage 2) hyperlinks.
HTML results page has all the c-squares for “quick maps” already loaded – e.g. for 1 species (portion of 1 row of the HTML table) ...
C-squares mapper output
• User can choose from a range of available base maps, at a variety of sizes / scales
• List of squares is returned in the HTML code along with every map as a new form, for re-submission to the mapper if needed (e.g. if user requests a different base map)
• Clicking any point on the map triggers a “Stage 2” request for the source (point) data (implemented as a 5 x 5 degree search on the cache, for the species in question).
Some limitations ...
• Whole world at 0.5 x 0.5 degrees requires 259,200 codes – may exceed present mapper limit (around 60,000 codes), and may also be a problem for storage. One solution: multiple contiguous codes can be “collapsed” into the next larger step of the hierarchy (e.g. 648 10 x 10 degree squares cover the world), giving quadtree-like efficiencies
• Spatial queries are fastest when constrained to a single square (potentially at any of a range of scales). Multiple-square queries are also possible (basically, a Boolean “OR” search), but will be slower to execute
• System becomes somewhat less efficient towards the poles (square size becomes smaller)
• Searching on complex polygons (e.g. country boundaries) not really supported – would require a true GIS or spatial database environment to implement (although can come close).
[Figure: Australia using variable-resolution encoding]
[Figure: Concept for multiple-square searches]
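The counts above follow from the grid arithmetic: 720 x 360 = 259,200 squares at 0.5 degrees, and 36 x 18 = 648 squares at 10 degrees. The sketch below illustrates the “collapse” idea under stated assumptions: that a square splits into 4 children when the finer code adds a single digit and into 25 children when it adds or completes a three-digit group, and that the input holds valid codes at a single resolution. The function names are hypothetical and this is not the mapper's own code.

```python
def world_square_count(size_deg):
    """Number of squares of the given size needed to cover the globe."""
    return round(360 / size_deg) * round(180 / size_deg)


def parent(code):
    """Code of the next larger square, or None for a 10-degree square."""
    head, _, tail = code.rpartition(":")
    if not head:
        return None
    return head if len(tail) == 1 else f"{head}:{tail[0]}"


def collapse(codes):
    """Replace every complete family of child squares by its parent,
    repeating until nothing more can be merged."""
    codes = set(codes)
    while True:
        families = {}
        for code in codes:
            p = parent(code)
            if p is not None:
                families.setdefault(p, set()).add(code)
        merged = False
        for p, kids in families.items():
            tail = next(iter(kids)).rpartition(":")[2]
            needed = 4 if len(tail) == 1 else 25
            if len(kids) == needed:
                codes -= kids
                codes.add(p)
                merged = True
        if not merged:
            return codes


# All 100 half-degree squares inside the 5-degree square "3414:1".
full_block = {f"3414:1{a}{b}:{q}"
              for a in range(5) for b in range(5) for q in range(1, 5)}
collapsed = collapse(full_block)
```

A fully occupied region therefore needs far fewer codes than its count of 0.5 degree squares, while isolated squares keep their full-resolution codes.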
Summary
• Metadata-driven approach provides orders-of-magnitude improvements in application functionality, user interactivity, and response times for OBIS
• C-squares spatial indexing supports both spatial searching and provision of “quick maps” directly from the Index – faster, and efficient for data storage
• Index / front end can be run as a standalone system (decoupled from source data), and requires no GIS environment for implementation (these aspects will be important for a future move to a system of replicated OBIS nodes)
• “Quick maps” form a set of custom GUIs which can be used as direct data access points in an intuitive manner
• C-squares system is available for use in other applications as desired (e.g. see satellite data search presentation, this workshop); OBIS is a demonstration of performance / implementation of a large scale c-squares enabled system in practice.
More information:
• C-squares description: “Oceanography”, March 2003 (vol. 16 no. 1)
• C-squares website: http://www.marine.csiro.au/csquares/
• OBIS website: http://www.iobis.org/