Tony Rees Divisional Data Centre CSIRO Marine Research, Australia (Tony.Rees@csiro.au)

Presentation Transcript


  1. C-squares - a new approach to representing, querying, displaying and exchanging dataset spatial extents at the metadata level Tony Rees Divisional Data Centre CSIRO Marine Research, Australia (Tony.Rees@csiro.au)

  2. Talk Outline
     • Introduce myself, my agency, our approach to data and metadata
     • Review characteristics of metadata, and current handling of spatial extents in metadata records
     • Describe limitations of “bounding rectangles” representation for non-rectangular / patchy data
     • The “C-squares” approach
     • Current c-squares resources / future possibilities

  3. Acknowledgements ...
     • CMR staff and colleagues in Australia, Europe and USA for helpful discussions
     • WMO and Australian “Blue Pages” for nomenclature for the squares and their subdivisions
     • Miroslaw Ryba (CMR) for programming used in the c-squares mapper and search interface
     • David Hastings / NOAA “GLOBE” Task Team and CSIRO Atmospheric Research for images used as base maps
     • Doug Nebert / FGDC for hosting my US visit, and for their interest in the system

  4. Author/Agency Background • From CSIRO Marine Research in Australia (located in Hobart, Tasmania, + 2 other locations; c. 300 staff)

  5. CMR’s Data and Metadata Storage - similar to many other agencies ...
     [Diagram: multiple, heterogeneous stored data forms; a single, searchable metadatabase (dataset descriptions); metadata export; metadata exposure via distributed searches]

  6. Metadata functions
     • Dataset discovery - by providing a filtered subset of all possible records (according to user-specified criteria)
     • Dataset description - permits a degree of resource appraisal (“will this data be what I need?”)
     • Dataset surrogate - may enable some questions to be answered, and/or statistics compiled, without needing to access the actual data
     … Should also provide an access route to the data if required (online link or contact point)
     … “C-squares” assists each of the first three points above.

  7. “Bounding Rectangles” Representation - and “Overlapping Rectangles” Search Method
     • Current metadata systems hold a “bounding rectangle” (bounding box) for each dataset (N, S, E, W bounding coordinates)
     • Spatial searching is carried out by an “overlapping rectangles” test: …
     [Diagram: a “search rectangle” overlapping a “data rectangle” in three configurations, each scored as a “hit”]
     … cases (1) and (2) include the tacit assumption that the “data rectangle” is actually filled with data: all overlaps with the data rectangle are inferred to be overlaps with the actual data.
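As an illustration of the “overlapping rectangles” test, here is a minimal Python sketch. The `Rect` type, the function name and the example coordinates are hypothetical, and rectangles crossing the 180° meridian are not handled. It shows how a search rectangle falling entirely in an empty part of a dataset's bounding rectangle still scores a hit.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A bounding rectangle in decimal degrees (180 degree crossing not handled)."""
    north: float
    south: float
    east: float
    west: float


def rectangles_overlap(a: Rect, b: Rect) -> bool:
    """Return True if two bounding rectangles share any area or edge."""
    lat_overlap = a.south <= b.north and b.south <= a.north
    lon_overlap = a.west <= b.east and b.west <= a.east
    return lat_overlap and lon_overlap


# Hypothetical dataset: its bounding rectangle spans 30-40 N, 130-150 E,
# but (say) it holds no data at all in the north-east corner.
data_rect = Rect(north=40, south=30, east=150, west=130)
search_rect = Rect(north=39, south=36, east=149, west=146)  # the empty corner

print(rectangles_overlap(data_rect, search_rect))  # True: a "false hit"
```

The test only ever sees the four bounding coordinates, so it cannot tell a filled rectangle from a sparsely sampled one.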

  8. The “California” Problem
     • The State of California is a classic (previously cited) case where the bounding rectangle is a poor fit to the real spatial extent ...
     … “search” regions in Nevada, a little of Arizona, and the offshore Pacific Ocean will all intersect this “data” rectangle (= “false hits”)

  9. False Hits from “Overlapping Rectangles” Searches
     [Figure: example dataset shapes tested against a search rectangle - convex edges; convex + concave edges; oblique alignment (linear); oblique alignment (rectangular); exclusion areas; multiple sampled regions; disjunct sampled regions; partial sampling; with a “false hit” marked]
     Potential problems can be deconstructed into 3 contributing ones ...
     (a) Filled polygons, but a poor fit to their bounding rectangle
     (b) Multiple discrete polygons
     (c) Incompletely filled polygons

  10. Consequences of False Hits ...
     • Can get nonsensical results (sea ice at the Equator, marine species in the desert)
     • Time / effort wasted accessing inappropriate datasets
     • Cannot use resultsets quantitatively, e.g. …
        • how many records / species occur in this defined region
        • compare content of one defined region with another
        • sum the results of consecutive searches
        • etc.

  11. Author’s Agency Data (typical):
     [Figure: actual sampling points (= true dataset extent) shown within the dataset's bounding rectangle]

  12. “C-squares” approach (gridded representation):
     • gives flexibility to represent a variety of dataset shapes, and also patchiness (gaps in data coverage)

  13. Highlighted Squares:
     … can be expressed as a set of codes (labels) in an ASCII string, e.g.: code1 | code2 | code5 | code7 | code13 | code14 | code15 | code21 | (etc.)
     • The list of codes is potentially more succinct (concise) than the original data …
        • codes are potentially terse in themselves
        • multiple points in a single square are only coded once
        • empty cells are not coded
     • Querying can now be more precise (on the individual square, not the bounding rectangle)
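A minimal Python sketch of coding only the occupied squares, assuming a simple 1 x 1 degree grid and a hypothetical cell label (the latitude and longitude of each cell's south-west corner) rather than any published notation; the sample points are invented. Repeated points in one cell produce a single code, and empty cells never appear.

```python
import math


def cell_label(lat: float, lon: float) -> str:
    """Hypothetical label for the 1 x 1 degree cell holding a point (its SW corner)."""
    return f"{math.floor(lat)},{math.floor(lon)}"


def encode_points(points: list[tuple[float, float]]) -> str:
    """Code each occupied cell once, in first-seen order; empty cells never appear."""
    occupied: dict[str, None] = {}  # dict keys keep insertion order and drop repeats
    for lat, lon in points:
        occupied.setdefault(cell_label(lat, lon))
    return "|".join(occupied)


# Invented sample points: four observations falling in only two cells.
samples = [(-42.9, 147.3), (-42.8, 147.6), (-43.5, 146.9), (-42.1, 147.2)]
print(encode_points(samples))  # prints -43,147|-44,146
```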

  14. What Notation to Use? (= choosing a taxonomy of space)
     Available coding systems (global grids):
     • Lat/long-based systems
        • 10 x 10 degree squares (WMO squares, Marsden Squares)
        • 6 x 4 degree squares (International Map of the World)
        • 2 x 1 degree squares (Maidenhead locators)
     • Equal-area systems
     • UTM grids
     • other National or local grids (e.g. US, UK national systems; local mapsheet refs)
     • commercial products (e.g. Go2, MapPlanet)
     • Dutton’s “Quaternary Triangular Mesh” (= basis for MS Encarta)
     ... Other numeric systems (e.g. postcodes, numbered features or zones) - unsuitable because of local usage only, and/or lack of scalability

  15. Basis for C-squares Codes ...
     • WMO (World Meteorological Organization) 10 x 10 degree squares chosen as starting point for codes
     • Subsequent subdivisions are base 10 (with intermediate “base 2” divisions embedded), for compatibility with decimal degrees
     • Name = “C-squares” (“Concise Spatial Query and Representation System”)
     • any square (at any resolution) encoded according to this method can also be termed a “c-square”.

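  16. WMO 10 x 10 degree squares - Numbering Principle
     [Map: world from 180°W to 180°E and 90°S to 90°N, divided at the Equator and 0°E/W into four quadrants whose square numbers begin NE = 1xxx, SE = 3xxx, SW = 5xxx, NW = 7xxx. Corner squares labelled: 1000, 1017, 1800, 1817 (NE); 3000, 3017, 3800, 3817 (SE); 5000, 5017, 5800, 5817 (SW); 7000, 7017, 7800, 7817 (NW).]
     i.e. after the quadrant digit, the second digit gives tens of degrees of latitude (0-8) and the last two digits give tens of degrees of longitude (00-17).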
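The numbering principle can be sketched in a few lines of Python. This is an illustration, not one of the published encoders: the function name is hypothetical, and the handling of points exactly on the Equator, the 0° meridian, the poles or 180° is an assumption. The digit layout follows the corner labels above (quadrant digit, then tens of latitude, then two digits of tens of longitude).

```python
def wmo_square(lat: float, lon: float) -> str:
    """Return the 4-character WMO 10 x 10 degree square code for a point.

    Assumptions: lat in [-90, 90] and lon in [-180, 180]; points exactly on
    the Equator or 0 meridian go to the northern / eastern side, and lat = +/-90
    or lon = +/-180 are folded into the outermost square.
    """
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7   # NE = 1, NW = 7
    else:
        quadrant = 3 if lon >= 0 else 5   # SE = 3, SW = 5
    lat_tens = min(int(abs(lat) // 10), 8)    # 0..8
    lon_tens = min(int(abs(lon) // 10), 17)   # 0..17, always two digits
    return f"{quadrant}{lat_tens}{lon_tens:02d}"
```

For example, `wmo_square(38.9, -77.3)` gives `"7307"` and `wmo_square(-42.9, 147.3)` gives `"3414"`, the two example squares used in this talk.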

  17. WMO 10 x 10 degree squares in practice (examples): 7307 (30-40°N, 70-80°W) and 3414 (40-50°S, 140-150°E) (Maps courtesy R. Curry/WHOI)

  18. Basis for Recursive Subdivision (e.g. in NW global quadrant)
     (Principle as used in Australian “Blue Pages” metadata system, 1996)
     10 x 10 deg. square - e.g. 7307
     • divided as follows (“Blue Pages” nomenclature):
        • 7307:4 (5 x 5 deg. square)
        • 7307:487 (1 x 1 deg. square)
     • C-squares then extends this principle recursively, e.g. ...
        • 7307:487:3 (0.5 x 0.5 deg. square)
        • 7307:487:393 (0.1 x 0.1 deg. square)
        • etc.
     (NB, the arrangement is a mirror image across 0º latitude and 0º longitude: 100 (= “1” + “00”) is always closest to the global origin, 499 (= “4” + “99”) is furthest away)
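The recursive scheme can be sketched as a point encoder in Python. The digit rules here are inferred from the 7307:487:393 example (an intermediate digit that is 1 nearest the origin and 4 furthest away, followed by one latitude digit and one longitude digit at each base-10 step); the function names are hypothetical, boundary handling is an assumption, and the encoders listed on slide 25 remain the reference implementations. `Decimal` is used so that an input such as 38.9 is not misread as 38.8999… when extracting digits.

```python
from decimal import Decimal

# Number of ":"-separated steps after the 4-digit WMO square, per resolution (degrees).
_STEPS = {10: 0, 5: 1, 1: 2, 0.5: 3, 0.1: 4}


def _quadrant_digit(lat_digit: int, lon_digit: int) -> int:
    """Intermediate 'base 2' digit: 1 nearest the global origin, 4 furthest away."""
    return 2 * (lat_digit // 5) + (lon_digit // 5) + 1


def encode_csquare(lat: float, lon: float, resolution: float = 0.1) -> str:
    """Encode a point as a c-square at 10, 5, 1, 0.5 or 0.1 degree resolution."""
    steps = _STEPS[resolution]
    # Whole tenths of a degree; Decimal(str(...)) keeps 38.9 from becoming 38.8999...
    t_lat = min(int(abs(Decimal(str(lat))) * 10), 899)    # assumption: 90 joins 89.x
    t_lon = min(int(abs(Decimal(str(lon))) * 10), 1799)   # assumption: 180 joins 179.x
    lat_tens, lat_units, lat_tenths = t_lat // 100, t_lat // 10 % 10, t_lat % 10
    lon_tens, lon_units, lon_tenths = t_lon // 100, t_lon // 10 % 10, t_lon % 10

    # Assumption: points on the Equator / 0 meridian count as northern / eastern.
    if lat >= 0:
        global_quadrant = 1 if lon >= 0 else 7
    else:
        global_quadrant = 3 if lon >= 0 else 5

    q1 = _quadrant_digit(lat_units, lon_units)
    q2 = _quadrant_digit(lat_tenths, lon_tenths)
    suffixes = [
        f":{q1}",                     # 5 x 5 degrees, e.g. 7307:4
        f"{lat_units}{lon_units}",    # 1 x 1 degree, e.g. 7307:487
        f":{q2}",                     # 0.5 x 0.5 degree, e.g. 7307:487:3
        f"{lat_tenths}{lon_tenths}",  # 0.1 x 0.1 degree, e.g. 7307:487:393
    ]
    return f"{global_quadrant}{lat_tens}{lon_tens:02d}" + "".join(suffixes[:steps])
```

For a point at 38.95º N, 77.35º W, `encode_csquare(38.95, -77.35, 0.5)` gives `"7307:487:3"` and the default 0.1-degree resolution gives `"7307:487:393"`. Each coarser code is a prefix of the finer one, which is what makes the nested nomenclature work.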

  19. Actual Size Examples: 10 x 10, 5 x 5 degree squares

  20. Actual Size Examples: 5 x 5, 1 x 1 degree squares (at this latitude, 1 x 1 degree squares are approx. 110 x 87 km)
     • The 1-deg. square 7307:487 follows the template 7307:4, then 7307:487
       … it is bounded by 38º N (the “3” of 7307 plus the “8” of 487) and 77º W (the “07” of 7307 plus the “7” of 487)
       … 7307:487:393 would be bounded by 38.9º N (adding the “9” of 393) and 77.3º W (adding the “3” of 393)

  21. Actual Size Examples: 0.1 x 0.1 degree squares (approx. 11 x 9 km at this latitude)
     [Map: the 0.1-deg. square 7307:487:393 (following the same template), in a window covering parts of the 1-deg. squares 7307:486, 7307:487, 7307:496 and 7307:497; axes 38.8-39.1º N and 76.8-77.4º W]

  22. Efficiency via Data Reduction Available ...
     • Global coverage requires up to ...
        • 648 10 x 10 degree squares
        • 64,800 1 x 1 degree squares
        • 259,200 0.5 x 0.5 degree squares
     • To reduce the number of codes required to represent large areas without compromising resolution, a “wildcard” notation is permitted, e.g.:
        • 3414:* to indicate 3414:1 through 3414:4 (4 codes)
        • 3414:*** to indicate 3414:100 through 3414:499 (100 codes)
        • 3414:***:* to indicate 3414:100:1 through 3414:499:4 (400 codes)
        • (etc.)
     • Result is similar to a quadtree approach (only subdivide as far as necessary, to match varying levels of detail required)
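A sketch of how a client might expand this wildcard notation back into explicit codes, in Python. Only the `*` and `***` forms shown above are handled, and the function names are hypothetical. The rule that quadrant 1 holds the low latitude and longitude digits (0-4), quadrant 4 the high ones (5-9), quadrant 3 high latitude with low longitude and quadrant 2 the reverse, is inferred from the worked examples 7307:487 and 7307:487:393.

```python
from itertools import product


def _base10_codes(quadrant: int) -> list[str]:
    """The 25 three-digit codes (e.g. '487') inside one base-2 quarter (1..4)."""
    lat_digits = range(0, 5) if quadrant in (1, 2) else range(5, 10)
    lon_digits = range(0, 5) if quadrant in (1, 3) else range(5, 10)
    return [f"{quadrant}{a}{o}" for a, o in product(lat_digits, lon_digits)]


def expand_wildcard(code: str) -> list[str]:
    """Expand '*' and '***' segments of a c-squares code into explicit codes."""
    head, *segments = code.split(":")
    results = [head]
    for seg in segments:
        if seg == "*":                  # any of the four base-2 quarters
            options = ["1", "2", "3", "4"]
        elif seg == "***":              # any of the 100 base-10 sub-squares
            options = [c for q in (1, 2, 3, 4) for c in _base10_codes(q)]
        else:                           # an explicit segment is kept as-is
            options = [seg]
        results = [f"{r}:{o}" for r in results for o in options]
    return results
```

`len(expand_wildcard("3414:***:*"))` is 400, and the first and last entries of `expand_wildcard("3414:***")` are `"3414:100"` and `"3414:499"`, matching the ranges given above.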

  23. Real-world c-squares implementation (example 1): 18 squares, at 0.5 deg. resolution = 219 characters

  24. Real-world c-squares implementation (example 2): 603 squares, at 0.1 deg. resolution = 7838 characters (c. 8 KB)

  25. Encode - Decode methods
     • Encoders currently available (3 versions):
        • original at CSIRO Marine Research (Oracle PL/SQL)
        • another in use at OBIS, USA (Java)
        • another at FishBase, ICLARM (ColdFusion)
       … source code for all three available via c-squares website (all these are for encoding point data)
     • Decoding - not needed for searching (see following slide), or for mapping if the c-squares mapper is invoked (the mapper does the decoding) … otherwise, decoding is a very simple algorithm if needed (or can be done by inspection!)
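To show how simple decoding is, here is a Python sketch (hypothetical function name; it assumes a well-formed code without wildcards). It walks the code from left to right, narrowing a square measured outward from the global origin, and mirrors the result into the correct hemisphere at the end.

```python
from decimal import Decimal


def decode_csquare(code: str) -> tuple[float, float, float, float]:
    """Return the (south, north, west, east) bounds of a c-square in decimal degrees.

    Assumes a well-formed code without wildcards, e.g. "7307", "7307:4",
    "7307:487", "7307:487:3" or "7307:487:393".
    """
    head, *segments = code.split(":")
    global_quadrant = int(head[0])      # 1 = NE, 3 = SE, 5 = SW, 7 = NW
    lat0 = Decimal(int(head[1]) * 10)   # distance of the square from the Equator
    lon0 = Decimal(int(head[2:]) * 10)  # distance of the square from 0 degrees E/W
    size = Decimal(10)

    for seg in segments:
        if len(seg) == 1:               # base-2 step: choose one quarter
            size /= 2
            if seg in ("3", "4"):       # the half further from the Equator
                lat0 += size
            if seg in ("2", "4"):       # the half further from the 0 meridian
                lon0 += size
        else:                           # base-10 step, e.g. "487": quarter, lat, lon
            size /= 10
            lat0 += int(seg[1]) * size
            lon0 += int(seg[2]) * size

    south, north = lat0, lat0 + size
    west, east = lon0, lon0 + size
    if global_quadrant in (3, 5):       # southern hemisphere: mirror latitudes
        south, north = -north, -south
    if global_quadrant in (5, 7):       # western hemisphere: mirror longitudes
        west, east = -east, -west
    return float(south), float(north), float(west), float(east)
```

`decode_csquare("7307:487:393")` returns `(38.9, 39.0, -77.4, -77.3)` as (south, north, west, east), consistent with the 38.9º N / 77.3º W corner described for this square.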

  26. C-squares search mechanism (behind the scenes)
     • Look for a text match between the “search” dataset extent (expressed as c-square/s) and the c-squares string for any dataset, e.g.:
       … does “3111:499” (or “3111:4”, or “3111”) appear anywhere in the string 3013:497|3111:468|3111:478|3111:479|3111:488|3111:489|3111:499|3112:122|3112:123|3112:131|3112:132| (etc.)
     • Advantage 1: needs no special, vector-based searching overhead (= simple text search)
     • Advantage 2: the “nested” nomenclature means that searching can be carried out at any level of the hierarchy equal to, or coarser than, the encoded resolution
     • Advantage 3: search precision is now potentially to the level of an individual c-square (much better than a bounding rectangle).
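The matching step can be sketched in Python. Rather than searching the raw string, this version splits it on `|` and tests whether any dataset square starts with the search code; for well-formed codes this gives the same answer as the plain text match and makes the hierarchy rule explicit. The function name is hypothetical, and wildcard entries in the dataset string are not handled.

```python
def csquares_match(search_code: str, dataset_csquares: str) -> bool:
    """True if any square in a '|'-separated c-squares string lies within search_code.

    Every code begins with the codes of its parent squares, so (for well-formed
    codes) a dataset square is inside the search square when it starts with it.
    """
    squares = (sq.strip() for sq in dataset_csquares.split("|"))
    return any(sq.startswith(search_code) for sq in squares if sq)


# The portion of the example string shown on the slide.
example = ("3013:497|3111:468|3111:478|3111:479|3111:488|3111:489|"
           "3111:499|3112:122|3112:123|3112:131|3112:132|")

for code in ("3111:499", "3111:4", "3111", "3111:1", "3111:499:3"):
    print(code, csquares_match(code, example))
```

The searches “3111:499”, “3111:4” and “3111” all succeed, “3111:1” fails because no listed square falls in that 5 x 5 degree quarter, and the finer search “3111:499:3” also fails, since the dataset was only encoded to 1 degree.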

  27. C-squares search interface (example from CMR’s “MarLIN” metadata system)
     • Point-and-click user interface, e.g.:

  28. C-squares Search Result

  29. NB: the c-squares string is already held within the record's HTML source code, as a web call to the c-squares mapper: [Screenshot: View Metadata Record (initial portion) ...]

  30. C-squares Search Result (continued)
     • If no c-squares string is held, the search defaults to a standard “bounding rectangles” search, and the record is returned as a “possible match”, e.g.:
       (this way, c-squares and non-c-squares-enabled records can co-exist in the same metadata repository or in distributed searches)
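One way to combine the two behaviours is sketched below in Python. The record structure, field names and the “match” / “no match” labels are hypothetical; only the fallback to a bounding-rectangle test reported as a “possible match” comes from the slide. The search is supplied both as a c-square code and as its bounding rectangle.

```python
from typing import NamedTuple, Optional

Box = tuple[float, float, float, float]  # (south, north, west, east), decimal degrees


class MetadataRecord(NamedTuple):
    """Hypothetical metadata record: a bounding box plus optional c-squares."""
    title: str
    bbox: Box
    csquares: Optional[str] = None  # "|"-separated c-square codes, if held


def boxes_overlap(a: Box, b: Box) -> bool:
    """Standard overlapping-rectangles test (ignores the 180 degree meridian)."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def classify(record: MetadataRecord, search_code: str, search_box: Box) -> str:
    """Use c-squares when the record holds them, else fall back to the bounding box."""
    if record.csquares:
        squares = (sq.strip() for sq in record.csquares.split("|"))
        hit = any(sq.startswith(search_code) for sq in squares if sq)
        return "match" if hit else "no match"
    # No c-squares string held: only the bounding rectangle can be tested.
    return "possible match" if boxes_overlap(record.bbox, search_box) else "no match"


# Invented records, searched against WMO square 3414 (40-50 S, 140-150 E).
search = ("3414", (-50.0, -40.0, 140.0, 150.0))
with_csquares = MetadataRecord("A", (-44.0, -40.0, 144.0, 149.0), "3414:227|3414:238")
bbox_only = MetadataRecord("B", (-45.0, -35.0, 135.0, 145.0))
print(classify(with_csquares, *search), "/", classify(bbox_only, *search))
```

Because the fallback never claims more than a “possible match”, a single result list can mix both kinds of record without overstating the precision of the older ones.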

  31. C-squares as Explicit Spatial Extent Code/s • C-squares can also be quoted explicitly in metadata records, or any other web document referring to a point or region:

  32. … Can Then Utilize Capabilities of a Standard Internet Search Engine, e.g.:

  33. C-squares applicable to a Variety of Data Types, e.g.:
     • Administrative / topographic region (example using quadtree-like approach)
     • Predicted species distribution
     • Satellite swath footprint

  34. Pause to Take Stock ...
     • Light, portable, metadata-friendly system for describing a wide variety of dataset footprint types
     • Could be expressed as an XML element (e.g. <csquares> … </csquares>)
     • Codes can be easily derived from lats/longs in decimal degrees (and vice versa)
     • Can be used for visualization of dataset spatial extents via web link to the c-squares mapper (or similar)
     • Amenable to text searching via current text / web search technology - no additional hardware or software overhead needed
     • Improves reliability of search resultsets, fewer or no “false hits” (results suitable for quantitative analysis)
     • Could provide an interoperable nomenclature for previously “binned” data (e.g. into 0.1 x 0.1 degree cells, etc.)

  35. C-squares Potential Uses ...
     [Diagram labels: c-squares reference/s: ...; metadata systems; gazetteer]

  36. C-squares Potential Uses - continued
     [Diagram labels: c-squares reference/s: ...]
     • alternative to existing grid-based locator systems?
     • spatially enabled web pages?? (like the “dot.geo” concept, but requiring no administrative / hardware overhead)

  37. Strengths / Weaknesses ...
     Strengths ...
     • “C-squares” is a concise and flexible method of encoding simple to moderately complex forms
     • Encoding/decoding is easy and follows previously documented methods; also directly related to lats and longs in decimal degrees
     • Spatial searching is a standard text string matching operation - already supported by most database search applications (and web search engines)
     • “C-squares mapper” utility available via simple web call
     • Can be used as adjunct to bounding coordinates searches
     • No proprietary software or hardware required to implement the system
     • Potentially globally applicable and interoperable; equally suitable for marine and terrestrial data.

  38. Strengths / Weaknesses ...
     Weaknesses …
     • WMO square nomenclature (and its subdivisions) is only one of several available (competing?) “taxonomies of space” - further effort may be needed to promote it as a common/interoperable solution
     • C-squares is not an equal-area system - not amenable to rapid computation of areas or distances
     • Coding is inefficient near the poles (needs a larger number of codes for same-size areas)
     • Strings can become quite long for large, complex regions (e.g. “Pacific Ocean”) - need to be able to incorporate data reduction using the “wildcard” method
     • Encoding algorithms not yet developed for line/polygon vector data, only for points
     • Method can be ambiguous at boundaries of natural features or administrative areas (since these will not always coincide neatly with c-square boundaries).
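The second and third weaknesses follow from the geometry of latitude/longitude cells. A Python sketch on a spherical Earth (mean radius 6371 km, an approximation; function names hypothetical) computes the area of a cell from the spherical formula area = R² · Δλ · (sin φ2 - sin φ1), and its approximate east-west width at mid-latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def cell_area_km2(lat_south: float, size_deg: float = 1.0) -> float:
    """Area of a size_deg x size_deg lat/long cell whose southern edge is lat_south."""
    d_lon = math.radians(size_deg)
    phi1 = math.radians(lat_south)
    phi2 = math.radians(lat_south + size_deg)
    return EARTH_RADIUS_KM**2 * d_lon * (math.sin(phi2) - math.sin(phi1))


def cell_width_km(lat_south: float, size_deg: float = 1.0) -> float:
    """Approximate east-west width of the cell, measured at its mid-latitude."""
    mid = math.radians(lat_south + size_deg / 2)
    return EARTH_RADIUS_KM * math.radians(size_deg) * math.cos(mid)


for lat in (0, 38, 60, 80):
    print(f"{lat:>2}-{lat + 1} deg: width {cell_width_km(lat):6.1f} km, "
          f"area {cell_area_km2(lat):8.0f} km^2")
```

A 1 x 1 degree cell at 80-81º N has only about one sixth of the area of one at the Equator, so covering the same area near the poles takes several times as many codes.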

  39. Resources Currently Available
     • C-squares website www.marine.csiro.au/csquares/ - includes:
        • C-squares draft specification and general background
        • Sample code for lat/long to c-squares conversion
        • On-line lat/long to c-squares converter
        • How to link to the c-squares mapper
        • Sample presentations, and links to c-squares enabled metadata records
        • Abstracts, presentations from 2 conferences (May, November 2002)
     • Paper describing c-squares submitted for publication in “Oceanography”, late 2002 (anticipated publication date March 2003)

  40. Some Questions to Consider ...
     • Does the system have value in the context of the present audience’s needs?
     • Who would be potential users?
     • What mechanisms could / should be utilized to promote it?
     • Who might have an interest in further concept / system development, if needed?
     • Is there a place for c-squares in formal metadata standards?
