The OBIS Index

Where we are – as at October 2003. Tony Rees – CSIRO Marine Research, Hobart. For: OBIS IC meeting, Washington DC.

Presentation Transcript


  1. The OBIS Index: Where we are – as at October 2003. Tony Rees – CSIRO Marine Research, Hobart. For: OBIS IC meeting, Washington DC.

  2. Advance information
     Subject of this talk is ...
     - New (mostly created within the last 8 weeks, some within the last 8 days)
     - Innovative (uses special components available only from CSIRO, plus others custom-created for this project)
     - Powerful (offers a major ramp-up of OBIS functionality, for modest additional complexity)
     - Exciting (opens up the possibility of many new features)
     - So: worth a look!

  3. OBIS: A Distributed System
     Strengths of this approach ...
     • Data sources remain under the custodianship of OBIS contributors (no IP issues, good for community building, owners do their own QA and updates)
     • Portal concerns itself with technical issues; it is not a data manager
     • Portal size and resource requirements don’t increase as OBIS membership and content grow
     • No problems with version control

  4. OBIS: A Distributed System
     Weaknesses of this approach ...
     • Availability, link speed, and speed of responses to/from contributors are critical to proper functioning of the system (the problem compounds as the number of contributors increases) – i.e., system response depends on factors outside OBIS’ control
     • Portal has no knowledge of OBIS provider content (it has to do a live distributed query for every piece of information) – also, a user may search repeatedly on taxa for which no data are held in the system
     • No opportunity to add value, such as search by taxonomic group (as contributors do not provide this information in any enforced way)
     • No opportunities for advanced search functions, e.g. “near match” (would be difficult to do in a real-time distributed query)

  5. One example: The “Zero Records” Problem ...
     • Incorrect spelling?
     • No data available via OBIS for this taxon?
     • Data exist, but are in one of the sources which are off-line?
     NB: these responses are actually the slowest to generate, as well!

  6. A solution – the OBIS Index
     = a reduced subset of OBIS data, stored in a standardized format, in a convenient location
     • Single record per species, with relevant summary information, i.e., number of records, date range, depth range, plus “c-squares” spatial index (sufficient distribution information for “quick maps” and spatial searches)
     • Master genus list, with cross-references to a simple taxonomic hierarchy
     • Degree of QA, i.e. masking informal/unresolved taxa, and freshwater/terrestrial species
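
A minimal sketch of what "a single record per species, with relevant summary information" could look like in code. The class name, field names, record layout and sample rows are all assumptions made for illustration; the slides do not describe the actual Index schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class SpeciesSummary:
    """One index row per species; all field names here are illustrative."""
    name: str
    record_count: int = 0
    first_year: Optional[int] = None
    last_year: Optional[int] = None
    min_depth_m: Optional[float] = None
    max_depth_m: Optional[float] = None
    squares: Set[str] = field(default_factory=set)  # occupied spatial-index cells

    def add(self, year: int, depth_m: float, square: str) -> None:
        """Fold one source record into the running summary."""
        self.record_count += 1
        self.first_year = year if self.first_year is None else min(self.first_year, year)
        self.last_year = year if self.last_year is None else max(self.last_year, year)
        self.min_depth_m = depth_m if self.min_depth_m is None else min(self.min_depth_m, depth_m)
        self.max_depth_m = depth_m if self.max_depth_m is None else max(self.max_depth_m, depth_m)
        self.squares.add(square)


Record = Tuple[str, int, float, str]  # (species, year, depth in m, cell code)


def build_index(records: Iterable[Record]) -> Dict[str, SpeciesSummary]:
    """Reduce many point records to a single summary row per species."""
    index: Dict[str, SpeciesSummary] = {}
    for species, year, depth_m, square in records:
        index.setdefault(species, SpeciesSummary(species)).add(year, depth_m, square)
    return index


# Invented sample records, for illustration only (not OBIS data).
sample = [
    ("Raja clavata", 1998, 40.0, "sq-A"),
    ("Raja clavata", 2001, 120.0, "sq-B"),
    ("Raja clavata", 1995, 60.0, "sq-A"),
]
index = build_index(sample)
```

The point of the reduction is that three source records collapse into one row holding a count, a date range, a depth range and two occupied cells, which is all a "quick map" or a first-pass search needs.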

  7. C-squares spatial indexing ...
     • Doesn’t store the point data, just a list of the squares in which data are present, for each taxon
     • Efficient for data reduction
     • Easy to store and query
     Choice of square size is a design decision (this index uses 0.5 x 0.5 deg. squares, ≈ 50 km)
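
The data reduction described above can be sketched as follows. This is a simplification: real c-squares use their own code-string notation, whereas this sketch identifies each 0.5 x 0.5 degree square by a plain (row, column) pair. The function names and sample coordinates are assumptions for illustration.

```python
import math
from typing import Iterable, Set, Tuple

CELL_DEG = 0.5  # square size used by the index, per the slide


def cell_key(lat: float, lon: float, size: float = CELL_DEG) -> Tuple[int, int]:
    """Grid cell containing a point, as (row, column) counted from 0 deg N, 0 deg E.

    Simplified stand-in for a c-squares code, not the real notation.
    """
    return (math.floor(lat / size), math.floor(lon / size))


def occupied_cells(points: Iterable[Tuple[float, float]],
                   size: float = CELL_DEG) -> Set[Tuple[int, int]]:
    """Discard the points themselves; keep only the set of squares they fall in."""
    return {cell_key(lat, lon, size) for lat, lon in points}


# Invented points: the first two share a square, so four points reduce to three cells.
points = [(-42.9, 147.3), (-42.8, 147.4), (-42.4, 147.3), (10.2, -20.1)]
cells = occupied_cells(points)
```

However many records fall in one square, the stored list grows by at most one entry, which is why the approach suits a species like the one on slide 23 with tens of thousands of records.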

  8. Index benefits ...
     - Initial taxon searches and mapping take place by querying the index, not the remote data sources:
       • rapid response time
       • always complete (irrespective of whether any data sources are off-line)
       • can return lists of multiple taxa as desired (no longer need to search for taxa sequentially)
       • limits user selection to a picklist of species represented in the system – no more “zero records” responses
       • correct user spelling not required (enter part of a name, or browse a category, or ask for “near matches”)
       • can return information for user’s desired taxonomic group(s) only
     - Use as “pre-filter” to answer many queries directly from the index, without needing to do a distributed search until actual data are required – i.e., a 2-stage process
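
The 2-stage process can be sketched roughly as below, assuming the index is a simple in-memory mapping from species name to record count and each remote data source is a callable. All names, counts and the fake provider are hypothetical; the slides do not show the actual implementation.

```python
from typing import Callable, Dict, List

Provider = Callable[[str], List[dict]]  # hypothetical remote data source


def stage1_picklist(index: Dict[str, int], fragment: str) -> List[str]:
    """Stage 1: answer from the local index only; a partial name is enough."""
    frag = fragment.lower()
    return sorted(name for name in index if frag in name.lower())


def stage2_fetch(name: str, providers: List[Provider]) -> List[dict]:
    """Stage 2: contact the remote sources, and only for a species known to have data."""
    rows: List[dict] = []
    for fetch in providers:
        rows.extend(fetch(name))
    return rows


calls: List[str] = []  # records which names actually reached a provider


def fake_provider(name: str) -> List[dict]:
    """Stand-in for a distributed query; returns one invented record."""
    calls.append(name)
    return [{"species": name, "lat": -42.9, "lon": 147.3}]


# Invented index content (species name -> number of records held).
index = {"Bathyraja abyssicola": 12, "Bathylagus antarcticus": 7, "Raja clavata": 3}
picklist = stage1_picklist(index, "bathy")
rows = stage2_fetch(picklist[0], [fake_provider])
```

Because the user can only pick names from the stage 1 list, the distributed query in stage 2 is never issued for a taxon with no data, which is the "no more zero records" benefit listed above.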

  9. Index Development So Far ...
     • Nov. 2002 – July 2003: initial concept development and refinement (Tony, Rainer, Phoebe) – incl. endorsement by OBIS IC, Mar. ’03
     • Aug. – Sept. 2003:
       - design/build initial prototype Index, plus partially populate with summary data (Tony)
       - construct master genus list and taxonomic hierarchy (= “OBIS categories”), and tag most genera with relevant category (Tony)
     • Sept. 2003: circulate URL and background information to OBIS IC, TWG for comment
     • Sept. – Oct. 2003:
       - refine prototype index (Tony)
       - construct “crawler” and finish first-pass population of the index (Tony, Pamela)
       - tag remaining genera with taxonomic attribution (Tony)
       - build spatial search module (Tony)

  10. Reality check – what do users need ...
      Key OBIS functions:
      • Show/get distribution data for a desired species
      • Show/get species information for an area
      (preceded by ...)
      • List species for which data are available! (e.g. by organism type)
      • Show areas for which data are available! (e.g. by organism type)

  11. Current (prototype) OBIS Index Search Interface – as at October 2003 (www.marine.csiro.au/datacentre/obis/quicksearch1.htm)

  12. Current OBIS Categories (Oct. 2003) - page 1 of 2 (approx. 140 in total)

  13. Current OBIS Categories (Oct. 2003) - page 2 of 2

  14. Example Possible Index Searches ...
      “Generate List ...” function:
      • All fishes beginning with “B...”, or “Bathy...” (previously offered? N)
      • All whales, or decapods, or bryozoans (previously offered? N)
      • All species of the genus “Raja” (previously offered? Y/N)
      • All “near matches” to “Coelorhynchus” (previously offered? N)
      “Spatial Search ...” function:
      • All fishes, or hexacorals, or “any invertebrates”, or any OBIS taxa, in any 10 x 10 degree square (previously offered? N)
      • All species of “Raja” in a given 10 x 10 degree square (previously offered? Y/N)
      • Global distribution map for any OBIS taxonomic category, e.g. can use to identify data gaps (previously offered? N)
      (Note: could also offer searching by 5 x 5 degree square or smaller, but data are probably too patchy for this to be useful at the present time)
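
Two of the searches listed above can be sketched in a few lines. The near-match function uses Python's standard `difflib` as a stand-in; the slides do not say which matching method the Index actually uses. The spatial search assumes each species is stored with a set of 0.5-degree cells given as (row, column) pairs, a simplification of c-squares. Function names, the genus list and the cell data are invented for illustration.

```python
import difflib
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a 0.5-degree square; not real c-squares codes


def near_matches(query: str, names: List[str], cutoff: float = 0.8) -> List[str]:
    """Names whose similarity to the query is at least `cutoff` (difflib ratio)."""
    return difflib.get_close_matches(query, names, n=5, cutoff=cutoff)


def species_in_square(cells_by_species: Dict[str, Set[Cell]],
                      lat_min: float, lon_min: float,
                      genus: Optional[str] = None,
                      size: float = 0.5) -> List[str]:
    """Species with an occupied cell in the 10 x 10 degree square whose
    south-west corner is (lat_min, lon_min); optionally limited to one genus."""
    per_side = round(10 / size)  # 20 cells along each side
    row0, col0 = round(lat_min / size), round(lon_min / size)
    hits = []
    for name, cells in cells_by_species.items():
        if genus and not name.startswith(genus + " "):
            continue
        if any(row0 <= r < row0 + per_side and col0 <= c < col0 + per_side
               for r, c in cells):
            hits.append(name)
    return sorted(hits)


# Illustrative genus list and invented cell data.
genera = ["Coelorinchus", "Raja", "Balaenoptera"]
suggestions = near_matches("Coelorhynchus", genera)

cells_by_species = {
    "Raja clavata": {(100, -10)},
    "Raja undulata": {(10, 10)},
    "Gadus morhua": {(101, -9)},
}
rajas = species_in_square(cells_by_species, 50, -10, genus="Raja")
```

Both functions read only the local index, so neither needs a distributed query; that is what makes lists such as "all species of Raja in a square" cheap to produce.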

  15. Live OBIS Index Search Interface ...

  16. Costs associated with the Index ...
      • Design, build costs (i.e., person hours)
        - mostly done, although will be refined further (CSIRO contribution)
      • Hosting costs
        - CSIRO is happy to host, at least for the present; access via web can be seamless, once integrated into the portal
      • Refreshing/content maintenance costs
        - some person time needed, in addition to automated “crawler” – upload taxon lists from new data contributors, check for bad data, flag new genera with relevant taxonomic group as needed
        - crawling ideally should be repeated frequently, to keep index current
      • Continued development and integration into OBIS Portal
        - ??

  17. Recap – what’s new ...
      • Speed, consistency, reliability
        - includes no more “0 records” or “try later” messages (at least on “Stage 1” searches)
      • Many new functions, including
        - User need only enter part of a name
        - Can automatically correct for spelling errors
        - Report on multiple taxa simultaneously (tens to thousands)
        - Spatial searches from clickable maps
      • Introduction of “OBIS categories”
      • OBIS content available at a glance (summary statistics, spatial coverage by category)
      • Screening of irrelevant and/or bad data
      • Expansion of ease-of-use, from expert to increasingly non-expert users, without compromising integrity of the system

  18. Future tasks ...
      • Include common names in search results, search interface
      • Auto-resolution of synonyms, variants ...
      • Quick Images? Quick Species Pages?
      • How to embed seamlessly into Portal
      • Further development of CSIRO mapper, and/or c-squares enabling for other mappers? (KGS, SEAMAP ...)
      • Think about replication, system load issues
      • How to manage development process from here
      • Any overlap with GBIF activities? (OBIS is “marine component of GBIF”; GBIF has indicated interest in indexing)
      • Other ??

  19. Summary ...
      - Interesting challenge thus far!
      - Reasonably complex package (database, software, content building and maintenance)
      - Personal opinion – major step forward in OBIS functionality
      - Close to deployment in “production” version
      - How to integrate with OBIS work plan?

  20. A peek behind the scenes – the master genus table (portion)

  21. Search for “all whales” via OBIS Index Search Interface ...

  22. List for “all whales” ... - result in <4 secs.

  23. “Quick map” for Balaenoptera physalus (38,000+ OBIS records) - in < 6 secs.

  24. Full OBIS search on any species is 1 click away ...

  25. Also: maps are all now “active maps” – a click on or near any red square initiates a “live” OBIS spatial search for the relevant base data.

  26. Another example: “near match ...” for a genus name – where user is unsure of correct spelling ...

  27. Coelorhynchus X

  28. Search result in 2-3 seconds ...

  29. Example “Spatial Search ...” for a category ...

  30. Pre-generated map presented, showing all records for category ...

  31. Search result in <6 secs. ...

  32. Another feature: can use stored record count information to generate OBIS summary statistics per category, e.g.:

  33. OBIS Statistics – from master Index table
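
A rough sketch of how stored record counts could be rolled up into per-category statistics. The row layout, category labels and all counts are invented for illustration; the actual table on the slide is not reproduced in this transcript.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

IndexRow = Tuple[str, str, int]  # (species, category, stored record count)


def records_per_category(rows: Iterable[IndexRow]) -> Dict[str, int]:
    """Total records per category, computed from the stored counts alone."""
    totals: Counter = Counter()
    for _species, category, count in rows:
        totals[category] += count
    return dict(totals)


def species_per_category(rows: Iterable[IndexRow]) -> Dict[str, int]:
    """Number of indexed species per category."""
    return dict(Counter(category for _species, category, _count in rows))


# Invented rows; the counts are placeholders, not OBIS figures.
rows = [
    ("Balaenoptera physalus", "whales", 100),
    ("Megaptera novaeangliae", "whales", 50),
    ("Raja clavata", "fishes", 30),
]
record_totals = records_per_category(rows)
species_totals = species_per_category(rows)
```

Since the counts are already held in the index, statistics of this kind need no access to the contributing data sources.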
