
Tony Rees Divisional Data Centre CSIRO Marine Research, Australia (Tony.Rees@csiro.au)

Use of c-squares spatial indexing and mapping in the 2004 release of OBIS, the Ocean Biogeographic Information System.



Presentation Transcript


  1. Use of c-squares spatial indexing and mapping in the 2004 release of OBIS, the Ocean Biogeographic Information System
     Tony Rees, Divisional Data Centre, CSIRO Marine Research, Australia (Tony.Rees@csiro.au)

  2. OBIS Concept
     • The intention of OBIS is to be an online “marine species atlas”, providing electronic access to data from multiple sources via a single gateway or portal
     • Current marine species data are scattered (hundreds to thousands of potential data sources) and require harmonisation of data formats, species names, etc.; potentially 200,000+ species
     • OBIS has made a start with connections to 10-15 data sources, holding data on some 30,000 species (approx. 2.5m records); the intention is to pass 10m records by 2010
     • The initial OBIS portal has been operational since 01/2002 (located at Rutgers Univ., NJ): www.iobis.org (upgraded March 2004).

  3. OBIS Concept - diagrammatic
     [Diagram labels: Visualisation / analysis tools; Native Portal functions – including data retrieval and integration; OBIS; Distributed data sources]

  4. OBIS Architecture – initial implementation (initial version, Jan 2002 – Mar 2004)
     [Diagram labels: www user 1, www user 2, www user 3 (etc.); OBIS Portal; Mapping tool 1 (shown three times); Real time data queries; Custom DB wrappers; data provider 1, data provider 2, data provider 3 (etc.)]

  5. Strengths / weaknesses
     Good things about this approach...
     • Source data stays with the providers – no versioning problems, good for distributed “ownership” of the OBIS concept, no IP issues, content is always up-to-date
     • OBIS portal can be structurally very simple (simply relays requests and responses, and provides access to on-line mapping tools)
     • Portal does not have to be a data manager (with associated resourcing, ongoing data integrity issues), or have any intelligent understanding of the data content (simply does matching on text strings)

  6. Strengths / weaknesses
     Less good things about this approach...
     • System is only as fast as its slowest link, i.e. performance depends on factors outside the Portal’s control; it can take minutes to return data on one species from all providers
     • One or more providers may be off line at the time of a query – the user will never know whether or not they hold data of interest (and mapping is potentially incomplete)
     • Many searches may return no data (only approx. 10-15% of the marine biota is covered at present, and spatial coverage is very patchy); the user also has to spell name/s correctly (a common problem)
     • No ability to query by common terms, e.g. “all fishes”, “all whales”, as this information is not held at provider level (typically just the scientific names)
     • Some species are known by multiple / variant names, and the user may not be aware of this. Some bad / irrelevant data in the provider input will also show up with appropriate searches (not filtered out)
     ... Portal is basically “dumb”: it cannot provide the user with any pre-search information about what content is available or unavailable.

  7. Semi-equivalent situation at author’s agency
     • (Component 1): “Data Warehouse” data repository, with 0.25m marine species distribution records for 3000 species ... can be slow / cumbersome to query, and many queries return no data
     • (Component 2): Separate “CAAB” master names list – all possible species which occur in the region (c. 20,000 names)
     • CAAB upgraded to show which species on the master names list have data in the Warehouse
     • Also parsed all 0.25m species records and built a spatial index – a list of the squares in which each species has been recorded; this table is then stored as part of the CAAB database
     • Now ... can do name and spatial queries on the (smaller) CAAB database (= Index) – show all names for which there are data, which species occur in any square (0.1 x 0.1 degrees in this instance), and the distribution of any species, direct from the Index, without needing to establish a connection to the full “Warehouse” database
     • Can then support full Warehouse queries as “stage 2” if needed.

  8. C-squares spatial indexing ...
     • Doesn’t store the point data, just a list of the squares in which data are present, for each species
     • Efficient for data reduction, where multiple points occur in the same square
     • Easy to store and query ... choice of square size is a design decision (CAAB index uses 0.1 x 0.1 deg. squares, approx. 10 km)

  9. Hierarchical nomenclature for the c-squares codes
     Lat 40.5 S, long 140.2 E is in...
     • 10 x 10º square “3414”
     • 5 x 5º square “3414:1”
     • 1 x 1º square “3414:100”
     • 0.5 x 0.5º square “3414:100:3”
     (etc.)
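The worked example on this slide can be reproduced with a short function. The following is a minimal sketch, not the OBIS or CSIRO implementation: the function name `csquare_code`, the `resolution` parameter and the micro-degree integer scaling (used to avoid floating-point rounding) are assumptions made for illustration. It follows the c-squares convention of a leading global quadrant digit (1 = NE, 3 = SE, 5 = SW, 7 = NW), followed by the tens of latitude and longitude, then one cycle per decimal place, with an intermediate quadrant digit (1 to 4) introducing each cycle.

```python
SCALE = 10**6  # work in integer micro-degrees (an assumption, to avoid float rounding)

# resolution in degrees -> (number of full digit triplets, trailing quadrant digit?)
LEVELS = {10: (0, False), 5: (0, True), 1: (1, False), 0.5: (1, True), 0.1: (2, False)}


def csquare_code(lat, lon, resolution=0.5):
    """Return the c-squares code containing (lat, lon) at the given resolution."""
    if resolution not in LEVELS:
        raise ValueError("resolution must be one of 10, 5, 1, 0.5, 0.1")
    full_triplets, trailing = LEVELS[resolution]

    # Clamp so that lat 90 / lon 180 fall into the last square rather than a new one.
    lat_u = min(round(abs(lat) * SCALE), 90 * SCALE - 1)
    lon_u = min(round(abs(lon) * SCALE), 180 * SCALE - 1)

    # Global quadrant digit; a coordinate of exactly 0 is treated as N / E here.
    quadrant = {(True, True): 1, (False, True): 3,
                (False, False): 5, (True, False): 7}[(lat >= 0, lon >= 0)]

    step = 10 * SCALE
    code = f"{quadrant}{lat_u // step}{lon_u // step:02d}"
    lat_u, lon_u = lat_u % step, lon_u % step

    for i in range(full_triplets + (1 if trailing else 0)):
        step //= 10
        lat_d, lon_d = lat_u // step, lon_u // step
        # Intermediate quadrant: 1 = low lat/low lon, 2 = low lat/high lon,
        # 3 = high lat/low lon, 4 = high lat/high lon (absolute values).
        inter = 1 + 2 * (lat_d >= 5) + (lon_d >= 5)
        if i == full_triplets:
            code += f":{inter}"            # 5 or 0.5 degree step: quadrant digit only
        else:
            code += f":{inter}{lat_d}{lon_d}"
        lat_u, lon_u = lat_u % step, lon_u % step
    return code


for res in (10, 5, 1, 0.5):
    print(res, csquare_code(-40.5, 140.2, res))
# 10 3414
# 5 3414:1
# 1 3414:100
# 0.5 3414:100:3
```

Because each finer code extends the coarser one, the code for a square is always a text prefix of the codes of the squares it contains.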

  10. Behind the scenes, spatial index looks like this ...
     • Index must be refreshed when new data are added to the Warehouse (or records deleted / modified)
     • Spatial query logic is very simple (standard text match, on part or all of a “word”)
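The index table shown on this slide is not reproduced in the transcript, so the following is only a sketch of the idea, under stated assumptions: the index is modelled as a Python dictionary from species name to a set of c-squares codes, the function names `build_index` and `species_in_square` are hypothetical, and the sample records are invented placeholders. It illustrates the two points made above: duplicate points in the same square collapse to one entry, and a spatial query is a plain text match on the leading part of each code.

```python
from collections import defaultdict


def build_index(records):
    """Build {species: set of c-squares codes} from (species, code) pairs.

    One pair is expected per point record; repeated squares are stored once,
    which is where the data reduction comes from.
    """
    index = defaultdict(set)
    for species, code in records:
        index[species].add(code)
    return index


def species_in_square(index, square):
    """List species with at least one indexed code inside `square`.

    `square` must be a complete code at some level of the hierarchy
    (e.g. "3414", "3414:1", "3414:100"); the match is a simple prefix test.
    """
    return sorted(
        species
        for species, codes in index.items()
        if any(code.startswith(square) for code in codes)
    )


# Placeholder records (invented for illustration), already encoded at 0.5 degrees.
records = [
    ("Species A", "3414:100:3"),
    ("Species A", "3414:100:3"),   # second point in the same square
    ("Species A", "3414:227:1"),
    ("Species B", "3415:100:1"),
]
index = build_index(records)
print(len(index["Species A"]))              # 2 distinct squares from 3 records
print(species_in_square(index, "3414"))     # ['Species A']
print(species_in_square(index, "3415:1"))   # ['Species B']
```

In a relational database the same test would typically be a `LIKE 'code%'` condition on the code column; rebuilding the dictionary corresponds to the index refresh mentioned in the first bullet.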

  11. OBIS New version ...
     • For OBIS, a similar approach was taken, i.e. introduce a name index and a spatial index (= new “metadata layer” – the OBIS Index), this time using 0.5 x 0.5 degree squares (~50 km global resolution)
     • Name index also enhanced with additional metadata and value-adding...
        • how many records of each species (0-40,000+)
        • which sources have the data
        • date range (start, end year)
        • what group a species belongs to (fishes, whales, barnacles...)
        • common name for the species, where available (plus more)
     • Can now do many queries – including name lists / metadata, spatial queries and “quick maps” – direct from the Index (smaller, rapid to query, plus everything runs locally)
     • Only need to query the remote data sources for “stage 2” (= get data) queries. In the production version, a local data cache of key fields was introduced as well, for further performance benefits and guaranteed data availability.
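As a rough illustration of what one row of such a name index might hold, and how a “stage 1” query can be answered without contacting any provider, here is a minimal sketch. The class `IndexEntry`, its field names, the function `stage1_query` and all sample values are assumptions for illustration only; the actual OBIS Index schema is not given in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class IndexEntry:
    """One hypothetical row of the name index, with the metadata listed above."""
    scientific_name: str
    group: str                                  # e.g. "fishes", "whales"
    common_name: Optional[str] = None
    record_count: int = 0                       # 0 means name known, no data held
    sources: List[str] = field(default_factory=list)
    start_year: Optional[int] = None
    end_year: Optional[int] = None
    squares: Set[str] = field(default_factory=set)   # 0.5 degree c-squares codes


def stage1_query(entries, group=None, name_prefix=None, square=None,
                 with_data_only=True):
    """Filter index entries locally; no remote data source is contacted."""
    hits = []
    for e in entries:
        if with_data_only and e.record_count == 0:
            continue
        if group is not None and e.group != group:
            continue
        if name_prefix is not None and not e.scientific_name.lower().startswith(
                name_prefix.lower()):
            continue
        if square is not None and not any(c.startswith(square) for c in e.squares):
            continue
        hits.append(e)
    return hits


# Placeholder entries (invented values, not OBIS content).
entries = [
    IndexEntry("Genus alpha", "fishes", "alpha fish", 12, ["provider 1"],
               1990, 2001, {"3414:100:3"}),
    IndexEntry("Genus beta", "fishes", None, 0),
    IndexEntry("Othergenus gamma", "whales", "gamma whale", 3, ["provider 2"],
               1975, 1980, {"3415:100:1"}),
]
print([e.scientific_name for e in stage1_query(entries, group="fishes")])
# ['Genus alpha']
print([e.scientific_name for e in stage1_query(entries, name_prefix="ge",
                                               with_data_only=False)])
# ['Genus alpha', 'Genus beta']
```

Each hit already carries its list of squares, which is what allows a “quick map” to be drawn straight from the search results; only a request for the underlying point records needs a “stage 2” query.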

  12. OBIS Architecture – initial implementation (reprise) (initial version, 2002-2004)
     [Diagram labels: www user 1, www user 2, www user 3 (etc.); OBIS Portal; Mapping tool 1 (shown three times); Real time data queries; Custom DB wrappers; data provider 1, data provider 2, data provider 3 (etc.)]

  13. OBIS Architecture – 2004 version (new version, Mar 2004 onwards)
     [Diagram labels: www user 1, www user 2, www user 3 (etc.); OBIS Portal; Mapping tool 1 (shown twice), Mapping tool 2; “Stage 1” queries; “Stage 2” queries; OBIS Index; Data Cache (refreshed on regular cycle); global names list (partial); Provider crawling; Index building; DiGIR translation software; data provider 1, data provider 2, data provider 3 (etc.)]

  14. OBIS User Interface – 2004 version
     • Click-on-a-map spatial search (all categories, or single category)
     • Name search (scientific name, common name, partial match, “soundalike” search)
     • Browse a list of names – all categories (alphabetic), or subset by category
     • Show only names with data, or all names (shows status of content building, also confirms that user has entered a valid name – whether data held or not)

  15. Result in practice...
     • Can now generate lists of names matching search criteria extremely rapidly, e.g.
        • “all whales” (35 spp.) ... <4 secs, including c-squares distribution data (up to 1000 squares per species)
        • all whales in a 10 x 10 deg. square ... <3 secs, including distribution data
        • all fishes beginning with “lu..” (115 spp.) ... <10 secs, including distribution data
       (Compare with the previous situation of 2 mins+ per species, and numerous “no data returned” messages)
     • “Quick maps” available directly from the search results page (require no connection to the source data)
     • Can also hold summary statistics (nos. of names per category, overall category distribution maps, etc.) as “meta-metadata”, for presentation to the user.

  16. Result of query for “all whales” ... 35 spp., < 4 secs

  17. Search OBIS for “Lutjanus” ... 64 spp., < 6 secs
     Note the presence of common names, other metadata, “Quick Map” buttons, plus “Get OBIS Data” (= Stage 2) hyperlinks

  18. HTML results page has all the c-squares for “quick maps” already loaded – e.g. for 1 species (portion of 1 row of the HTML table) ...

  19. C-squares mapper output
     • User can choose from a range of available base maps, at a variety of sizes / scales
     • List of squares returned in the HTML code along with every map as a new form, for re-submission to the mapper if needed (e.g. if user requests a different base map)
     • Clicking any point on the map triggers a “Stage 2” request for the source (point) data (implemented as a 5 x 5 degree search on the cache, for the species in question).

  20. Some limitations ...
     [Figure captions: Australia using variable-resolution encoding; Concept for multiple-square searches]
     • Whole world at 0.5 x 0.5 degrees requires 259,200 codes – this may exceed the present mapper limit (around 60,000 codes), and may also be a problem for storage. One solution: multiple contiguous codes can be “collapsed” into the next larger step of the hierarchy (i.e., 648 10 x 10 degree squares cover the world), giving quadtree-like efficiencies
     • Spatial queries are fastest when constrained to a single square (potentially at any of a range of scales). Multiple-square queries are also possible (basically, a Boolean “OR” search), but will be slower to execute
     • System becomes somewhat less efficient towards the poles (square size becomes smaller)
     • Searching on complex polygons (e.g. country boundaries) not really supported – would require a true GIS or spatial database environment to implement (although can come close).
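The “collapsing” idea in the first bullet can be sketched as follows. This is an illustrative reading of the slide, not the mapper's actual routine: the function names `parent` and `collapse` are hypothetical. It relies only on the code structure shown earlier: a square whose code ends in a single quadrant digit (5 or 0.5 degrees) is one of 4 children of its parent, and a square whose code ends in a digit triplet (1 or 0.1 degrees) is one of 25 children.

```python
from collections import defaultdict


def parent(code):
    """Return (parent_code, number_of_children) or None for a 10 degree square."""
    head, sep, last = code.rpartition(":")
    if not sep:
        return None                     # 10 degree square: top of the hierarchy
    if len(last) == 1:
        return head, 4                  # quadrant digit: 5 -> 10 or 0.5 -> 1 degree
    return f"{head}:{last[0]}", 25      # triplet: 1 -> 5 or 0.1 -> 0.5 degree


def collapse(codes):
    """Replace every complete set of sibling squares by their parent square."""
    codes = set(codes)
    changed = True
    while changed:
        changed = False
        groups = defaultdict(set)
        for code in codes:
            p = parent(code)
            if p is not None:
                groups[p].add(code)
        for (parent_code, n_children), children in groups.items():
            if len(children) == n_children:
                codes -= children
                codes.add(parent_code)
                changed = True          # the new parent may itself be collapsible
    return codes


# Four 0.5 degree squares filling one 1 degree square, plus one isolated square.
squares = {"3414:100:1", "3414:100:2", "3414:100:3", "3414:100:4", "3414:227:1"}
print(sorted(collapse(squares)))
# ['3414:100', '3414:227:1']
```

The result is a mixed-resolution list, which is what the “variable-resolution encoding” caption refers to; a prefix-style text match still works against such a list only when the query square is no coarser than the stored codes, so a collapsed list suits mapping better than searching.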

  21. Summary
     • Metadata-driven approach provides orders-of-magnitude improvements in application functionality, user interactivity, and response times for OBIS
     • C-squares spatial indexing supports both spatial searching and provision of “quick maps” directly from the Index – faster, and efficient for data storage
     • Index / front end can be run as a standalone system (decoupled from source data), and requires no GIS environment for implementation (these aspects will be important for a future move to a system of replicated OBIS nodes)
     • “Quick maps” form a set of custom GUIs which can be used as direct data access points in an intuitive manner
     • C-squares system is available for use in other applications as desired (e.g. see satellite data search presentation, this workshop); OBIS is a demonstration of the performance / implementation of a large scale c-squares enabled system in practice.
     More information:
     • C-squares description: “Oceanography”, March 2003 (vol. 16 no. 1)
     • C-squares website: http://www.marine.csiro.au/csquares/
     • OBIS website: http://www.iobis.org/
