C-squares - a new approach to representing, querying, displaying and exchanging dataset spatial extents at the metadata level Tony Rees Divisional Data Centre CSIRO Marine Research, Australia (Tony.Rees@csiro.au)
Talk Outline • Introduce myself, my agency, our approach to data and metadata • Review characteristics of metadata, and current handling of spatial extents in metadata records • Describe limitations of “bounding rectangles” representation for non-rectangular / patchy data • The “C-squares” approach • Current c-squares resources / future possibilities
Acknowledgements ... • CMR staff and colleagues in Australia, Europe and USA for helpful discussions • WMO and Australian “Blue Pages” for nomenclature for the squares and their subdivisions • Miroslaw Ryba (CMR) for programming used in the c-squares mapper and search interface • David Hastings / NOAA “GLOBE” Task Team and CSIRO Atmospheric Research for images used as base maps • Doug Nebert / FGDC for hosting my US visit and interest in the system
Author/Agency Background • From CSIRO Marine Research in Australia (located in Hobart, Tasmania, + 2 other locations; c. 300 staff)
CMR’s Data and Metadata Storage - similar to many other agencies ... [Diagram: multiple, heterogeneous stored data forms; metadata export to a single, searchable metadatabase (dataset descriptions); metadata exposure via distributed searches]
Metadata functions • Dataset discovery - by providing a filtered subset of all possible records (according to user-specified criteria) • Dataset description - permits a degree of resource appraisal (“will this data be what I need?”) • Dataset surrogate - may enable some questions to be answered, and/or statistics compiled, without need to access the actual data … Should also provide access route to the data if required (online link or contact point) … “C-squares” assists each of the first three points above.
“Bounding Rectangles” Representation - and “overlapping rectangles” search method • Current metadata systems hold a “bounding rectangle” (bounding box) for each dataset (N, S, E, W bounding coordinates) • Spatial searching is carried out by an “overlapping rectangles” test: [Diagram: cases in which a “search rectangle” overlaps the “data rectangle”, each labelled “hit”] … cases (1) and (2) include the tacit assumption that the “data rectangle” is actually filled with data: all overlaps with the data rectangle are inferred to be overlaps with the actual data.
The “California” Problem • The State of California is a classic (previously cited) case where the bounding rectangle is a poor fit to the real spatial extent ... … “search” regions in Nevada, a little of Arizona, plus offshore Pacific Ocean will all intersect this “data” rectangle (=“false hits”)
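A minimal Python sketch of the “overlapping rectangles” test and the kind of false hit described above. The `Box` type, the `boxes_overlap` function and the approximate coordinates are illustrative assumptions, not taken from any particular metadata system; rectangles that cross the 180° meridian are not handled.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding rectangle in decimal degrees (south/west negative). Hypothetical type."""
    north: float
    south: float
    east: float
    west: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area or edge (no antimeridian handling)."""
    return (a.south <= b.north and b.south <= a.north
            and a.west <= b.east and b.west <= a.east)


# Approximate bounding rectangle of California (assumed values, for illustration).
california = Box(north=42.0, south=32.5, east=-114.1, west=-124.4)

# A 1 x 1 degree search rectangle in central Nevada, wholly outside California.
nevada_search = Box(north=40.0, south=39.0, east=-116.0, west=-117.0)

# A search rectangle well away from the data rectangle, for comparison.
atlantic_search = Box(north=40.0, south=39.0, east=-60.0, west=-61.0)

print(boxes_overlap(california, nevada_search))    # overlap reported: a "false hit"
print(boxes_overlap(california, atlantic_search))  # no overlap
```

The Nevada search is reported as a hit only because the test compares rectangles, not the data inside them.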
False Hits from “Overlapping Rectangles” Searches • Potential problems can be deconstructed into 3 contributing ones ... (a) Filled polygons, but a poor fit to their bounding rectangle (b) Multiple discrete polygons (c) Incompletely filled polygons [Diagram labels: search rectangle; convex edges; convex + concave edges; oblique alignment - linear; oblique alignment - rectangular; exclusion areas; partial sampling; multiple sampled regions; disjunct sampled regions; “false hit”]
Consequences of False Hits ... • Can get nonsensical results (sea ice at the Equator, marine species in the desert) • Time / effort wasted accessing inappropriate datasets • Cannot use resultsets quantitatively, e.g. … • how many records / species occur in this defined region • compare content of one defined region with another • sum the results of consecutive searches • etc.
Author’s Agency Data (typical): [Map labels: bounding rectangle; actual sampling points (= true dataset extent)]
“C-squares” approach: • gridded representation • gives flexibility to represent a variety of dataset shapes, also patchiness (gaps in data coverage)
Highlighted Squares: … can be expressed as a set of codes (labels) in an ASCII string, e.g.: code1 | code2 | code5 | code7 | code13 | code14 | code15 | code21 | (etc.) • List of codes is potentially more succinct (concise) than original data … • codes potentially terse in themselves • multiple points in single square only coded once • empty cells not coded • Now has capability for increased precision of querying (on individual square, not bounding rectangle)
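The data reduction described above can be sketched in Python as follows. This is only an illustration of the principle: `cell_label` produces a placeholder label for a 1 x 1 degree cell and is not a c-squares code, and the sample points are invented.

```python
import math


def cell_label(lat: float, lon: float) -> str:
    """Placeholder label for the 1 x 1 degree cell containing a point (not a c-squares code)."""
    return f"{math.floor(lat)}_{math.floor(lon)}"


def points_to_string(points) -> str:
    """Reduce (lat, lon) points to a sorted, '|'-separated list of occupied cells."""
    return "|".join(sorted({cell_label(lat, lon) for lat, lon in points}))


# Four invented sampling points; three of them fall in the same cell.
points = [(-42.9, 147.3), (-42.8, 147.4), (-42.1, 147.9), (-41.5, 148.2)]
print(points_to_string(points))
```

Four points reduce to two labels: points sharing a cell are coded once, and empty cells do not appear at all.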
What Notation to Use? (= choosing a taxonomy of space) Available coding systems (global grids): • Lat/long-based systems • 10 x 10 degree squares (WMO squares, Marsden Squares) • 6 x 4 degree squares (International Map of the World) • 2 x 1 degree squares (Maidenhead locators) • Equal-area systems • UTM grids • other National or local grids (e.g. US, UK national systems; local mapsheet refs) • commercial products (e.g. Go2, MapPlanet) • Dutton’s “Quaternary Triangular Mesh” (=basis for MS Encarta) ...Other numeric systems (e.g. postcodes, numbered features or zones) - unsuitable because of local usage only, and/or lack of scalability
Basis for C-squares Codes ... • WMO (World Meteorological Organization) 10 x 10 degree squares chosen as starting point for codes • Subsequent subdivisions are base 10 (with intermediate “base 2” divisions embedded), for compatibility with decimal degrees • Name = “C-squares” (“Concise Spatial Query and Representation System”) • any square (at any resolution) encoded according to this method can also be termed a “c-square”.
WMO 10 x 10 degree squares - Numbering Principle [Map: world from 180°W to 180°E and 90°N to 90°S, divided at the Equator and at 0°E/W into four quadrants: NW (7xxx), NE (1xxx), SW (5xxx), SE (3xxx). Example squares labelled on the map: 7000, 7017, 7800, 7817; 1000, 1017, 1800, 1817; 5000, 5017, 5800, 5817; 3000, 3017, 3800, 3817]
WMO 10 x 10 degree squares in practice (examples): 7307, 3414 (Maps courtesy R. Curry/WHOI)
Basis for Recursive Subdivision (e.g. in NW global quadrant) (Principle as used in Australian “Blue Pages” metadata system, 1996) 10 x 10 deg. square - e.g. 7307 • divided as follows (“Blue Pages” nomenclature): • 7307:4 (5 x 5 deg. square) • 7307:487 (1 x 1 deg. square) • C-squares then extends this principle recursively, e.g. ... • 7307:487:3 (0.5 x 0.5 deg. square) • 7307:487:393 (0.1 x 0.1 deg. square) • etc. (NB, arrangement is mirror image across 0º latitude and 0º longitude: 100 (= “1” + “00”) is always closest to the global origin, 499 (= “4” + “99”) is furthest away)
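The subdivision principle can be sketched as a small Python encoder for point data. This is an illustration written from the rules on these slides, not the code distributed via the c-squares website; the function names, the `RESOLUTIONS` table and the handling of points lying exactly on 90° or 180° are assumptions.

```python
from decimal import Decimal

# Per resolution in degrees: (number of 3-digit cycles, trailing single quadrant digit?).
RESOLUTIONS = {10: (0, False), 5: (0, True), 1: (1, False), 0.5: (1, True), 0.1: (2, False)}


def intermediate_quadrant(lat_digit: int, lon_digit: int) -> int:
    """Return 1..4; 1 is the quadrant nearest the global origin, 4 the furthest away."""
    return 2 * (lat_digit >= 5) + (lon_digit >= 5) + 1


def encode_csquare(lat: float, lon: float, resolution: float = 1) -> str:
    """Encode a point (decimal degrees, south/west negative) at the given resolution."""
    cycles, half = RESOLUTIONS[resolution]
    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Work in whole hundredths of a degree to avoid floating-point surprises.
    # Assumption: points exactly at 90 or 180 degrees are nudged into the last cell.
    lat_i = min(int(abs(Decimal(str(lat))) * 100), 8999)
    lon_i = min(int(abs(Decimal(str(lon))) * 100), 17999)
    lat_s, lon_s = f"{lat_i:04d}", f"{lon_i:05d}"
    # WMO square: quadrant, tens of latitude, tens of longitude (two digits).
    parts = [f"{quadrant}{lat_s[0]}{lon_s[:2]}"]
    lat_rest, lon_rest = lat_s[1:], lon_s[2:]
    for i in range(cycles):
        a, o = int(lat_rest[i]), int(lon_rest[i])
        parts.append(f"{intermediate_quadrant(a, o)}{a}{o}")
    if half:
        a, o = int(lat_rest[cycles]), int(lon_rest[cycles])
        parts.append(str(intermediate_quadrant(a, o)))
    return ":".join(parts)


for res in (10, 5, 1, 0.5, 0.1):
    print(res, encode_csquare(38.95, -77.35, res))
```

For the point 38.95º N, 77.35º W the loop reproduces the codes listed above, from 7307 down to 7307:487:393.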
Actual Size Examples: 5 x 5, 1 x 1 degree squares (1 x 1 degree squares are approx. 110 x 70 km) [Map: 1-deg. square 7307:487] follows template: 7307:4, 7307:487 … 7307:487 is bounded by 38º N (the “3” of 7307 plus the “8” of 487) and 77º W (the “07” of 7307 plus the “7” of 487) … 7307:487:393 would be bounded by 38.9º N (adding the “9” of 393) and 77.3º W (adding the final “3” of 393)
Actual Size Examples: 0.1 x 0.1 degree squares (approx. 11 x 7 km) [Map: 0.1-deg. square 7307:487:393, following the same template, on a grid from 38.8º to 39.1º N and 76.8º to 77.4º W that covers parts of 1-deg. squares 7307:486, 7307:487, 7307:496 and 7307:497]
Efficiency via Data Reduction Available ... • Global coverage requires up to ... • 648 10 x 10 degree squares • 64,800 1 x 1 degree squares • 259,200 0.5 x 0.5 degree squares • To reduce the number of codes required to represent large areas without compromising resolution, a “wildcard” notation is permitted, e.g.: • 3414:* to indicate 3414:1 through 3414:4 (4 codes) • 3414:*** to indicate 3414:100 through 3414:499 (100 codes) • 3414:***:* to indicate 3414:100:1 through 3414:499:4 (400 codes) • (etc.) • Result is similar to a quadtree approach (only subdivide as far as necessary, to match varying levels of detail required)
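The cell counts and the wildcard expansions above can be checked with a short Python sketch. `expand_wildcards` is a hypothetical helper written for this illustration and is not part of any published c-squares code.

```python
from itertools import product

# Cell counts for global coverage at each resolution.
n_10deg = 36 * 18          # 36 columns x 18 rows of 10 x 10 degree squares
n_1deg = n_10deg * 100     # each holds 100 one-degree squares
n_half_deg = n_1deg * 4    # each of those holds 4 half-degree squares
print(n_10deg, n_1deg, n_half_deg)

QUADRANTS = ["1", "2", "3", "4"]
# All valid 3-digit cycles: quadrant digit, then latitude digit, then longitude digit.
TRIPLETS = [f"{2 * (a >= 5) + (o >= 5) + 1}{a}{o}" for a in range(10) for o in range(10)]


def expand_wildcards(code: str) -> list[str]:
    """Expand '*' and '***' parts of a code into the explicit codes they stand for."""
    options = []
    for part in code.split(":"):
        if part == "*":
            options.append(QUADRANTS)
        elif part == "***":
            options.append(TRIPLETS)
        else:
            options.append([part])
    return sorted(":".join(combo) for combo in product(*options))


for pattern in ("3414:*", "3414:***", "3414:***:*"):
    codes = expand_wildcards(pattern)
    print(pattern, len(codes), codes[0], codes[-1])
```

Only 100 of the numbers between 100 and 499 are valid cycles, because the leading quadrant digit is fixed by the two digits that follow it.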
Real-world c-squares implementation (example 1) 18 squares, at 0.5 deg. resolution = 219 characters
Real-world c-squares implementation (example 2) 603 squares, at 0.1 deg. resolution = 7838 characters / 8 Kb
Encode - Decode methods • Encoders currently available (3 versions): • original at CSIRO Marine Research (Oracle PL/SQL) • another in use at OBIS, USA (Java) • another at FishBase, ICLARM (ColdFusion) … source code for all three available via c-squares website (all these are for encoding point data) • Decoding - not needed for searching (see following slide), or for mapping if the c-squares mapper is invoked (mapper does the decoding) … otherwise, is a very simple algorithm if needed (or can do by inspection!)
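For reference, decoding can be sketched in a few lines of Python. This is an illustration derived from the subdivision rules in this talk, not the mapper's own code; `decode_csquare` is a hypothetical name, and the function assumes a well-formed code with no wildcards.

```python
from decimal import Decimal


def decode_csquare(code: str):
    """Return (lat, lon, size) for a c-square code.

    lat/lon are the signed corner of the square nearest the global origin
    (south/west negative); size is the length of the square's side in degrees.
    """
    parts = code.split(":")
    wmo = parts[0]
    quadrant = int(wmo[0])                      # 1 = NE, 3 = SE, 5 = SW, 7 = NW
    lat_sign = 1 if quadrant in (1, 7) else -1
    lon_sign = 1 if quadrant in (1, 3) else -1
    lat = Decimal(int(wmo[1]) * 10)             # tens of degrees of latitude
    lon = Decimal(int(wmo[2:4]) * 10)           # tens of degrees of longitude
    size = Decimal(10)
    for part in parts[1:]:
        if len(part) == 3:
            # Full cycle: the quadrant digit is redundant, digits 2 and 3 carry the position.
            size /= 10
            lat += int(part[1]) * size
            lon += int(part[2]) * size
        else:
            # Trailing single digit: one of four quadrants of the current square.
            q = int(part) - 1
            size /= 2
            lat += (q // 2) * size
            lon += (q % 2) * size
    return float(lat_sign * lat), float(lon_sign * lon), float(size)


print(decode_csquare("7307:487:393"))
```

The square extends from the returned corner away from the global origin by `size` degrees in both latitude and longitude.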
C-squares search mechanism (behind-the-scenes) • Look for a text match between “search” dataset extent (expressed as c-square/s) and c-squares string for any dataset, e.g.: … does “3111:499” (or “3111:4”, or “3111”) appear anywhere in the string 3013:497|3111:468|3111:478|3111:479|3111:488|3111:489|3111:499|3112:122|3112:123|3112:131|3112:132| (etc.) • Advantage 1: needs no special, vector-based searching overhead (= simple text search) • Advantage 2: “nested” nomenclature means that searching can be carried out at any level of the hierarchy equal to, or coarser than, the encoded resolution • Advantage 3: search precision is now potentially to the level of an individual c-square (much better than bounding rectangle).
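A minimal Python sketch of this text-match search, using the example string from the slide. `csquares_match` is a hypothetical helper; it anchors the match at the start of each code, which is a slightly stricter variant of the plain substring search described above and is not necessarily how MarLIN implements it.

```python
def csquares_match(search_code: str, csquares_string: str) -> bool:
    """True if any code in the '|'-separated string lies within the search square."""
    codes = [c for c in csquares_string.split("|") if c]
    return any(code.startswith(search_code) for code in codes)


dataset = ("3013:497|3111:468|3111:478|3111:479|3111:488|3111:489|3111:499|"
           "3112:122|3112:123|3112:131|3112:132|")

# The plain substring test from the slide:
print("3111:499" in dataset)

# The same search at three levels of the hierarchy, plus one square with no data:
for search in ("3111:499", "3111:4", "3111", "3111:1"):
    print(search, csquares_match(search, dataset))
```

Because coarser codes are prefixes of finer ones, a search on “3111:4” or “3111” finds the dataset without any decoding.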
C-squares search interface (example from CMR’s “MarLIN” metadata system) • Point-and-click user interface, e.g.:
View Metadata Record (initial portion) ... NB, the c-squares string is already held within the record in the HTML source code, as a web call to the c-squares mapper:
C-squares Search Result (continued) • If no c-squares string is held, the search defaults to a standard “bounding rectangles” search, returned as “possible match”, e.g.: (this way, c-squares-enabled and non-c-squares-enabled records can co-exist in the same metadata repository or in distributed searches)
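The fallback behaviour can be sketched in Python as below. The record layout (a dict with optional `csquares` and `bbox` keys), the function name and the “no match” label are assumptions made for this illustration and do not describe MarLIN's actual schema.

```python
def search_record(record: dict, search_code: str, search_box: tuple) -> str:
    """Return 'match', 'possible match' or 'no match' for one metadata record.

    record: hypothetical dict with an optional 'csquares' string and a
            'bbox' tuple of (north, south, east, west) in decimal degrees.
    search_box: the search square as (north, south, east, west).
    """
    csq = record.get("csquares")
    if csq:
        # C-squares held: precise text match on the codes.
        codes = [c for c in csq.split("|") if c]
        return "match" if any(c.startswith(search_code) for c in codes) else "no match"
    # No c-squares held: fall back to the overlapping rectangles test.
    n, s, e, w = record["bbox"]
    sn, ss, se, sw = search_box
    overlap = s <= sn and ss <= n and w <= se and sw <= e
    return "possible match" if overlap else "no match"


# Search square 3111:4 covers 15-20 deg. S, 115-120 deg. E.
box_3111_4 = (-15.0, -20.0, 120.0, 115.0)
with_codes = {"csquares": "3111:468|3111:499|", "bbox": (-10.0, -20.0, 120.0, 110.0)}
bbox_only = {"bbox": (-10.0, -20.0, 120.0, 110.0)}

print(search_record(with_codes, "3111:4", box_3111_4))
print(search_record(bbox_only, "3111:4", box_3111_4))
```

Keeping the two result labels distinct lets a user see which hits rest on actual data coverage and which only on a bounding rectangle.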
C-squares as Explicit Spatial Extent Code/s • C-squares can also be quoted explicitly in metadata records, or any other web document referring to a point or region:
… Can Then Utilize Capabilities of a Standard Internet Search Engine, e.g.:
C-squares applicable to a Variety of Data Types, e.g.: • Administrative / topographic region (example using quadtree-like approach) • Predicted species distribution • Satellite swath footprint
Pause to Take Stock ... • Light, portable, metadata-friendly system for describing a wide variety of dataset footprint types • Could be expressed as an XML element (e.g. <csquares> … </csquares>) • Codes can be easily derived from lats/longs in decimal degrees (and vice versa) • Can be used for visualization of dataset spatial extents via web link to the c-squares mapper (or similar) • Amenable to text searching via current text / web search technology - no additional hardware or software overhead needed • Improves reliability of search resultsets, fewer or no “false hits” (results suitable for quantitative analysis) • Could provide an interoperable nomenclature for previously “binned” data (e.g. into 0.1 x 0.1 degree cells, etc.)
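As a sketch of the XML option mentioned above, the following Python builds and re-reads a record carrying a `<csquares>` element. Only the `csquares` element name comes from the slide; the `metadata` wrapper, the `title` element and the sample codes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record wrapper; only the <csquares> element name comes from the slide.
record = ET.Element("metadata")
ET.SubElement(record, "title").text = "Example dataset"
ET.SubElement(record, "csquares").text = "3111:468|3111:478|3111:499"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Reading the codes back out of the record.
codes = ET.fromstring(xml_text).findtext("csquares").split("|")
print(codes)
```

Because the codes are plain ASCII text, the element needs no special escaping and remains searchable as text.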
C-squares Potential Uses ... • metadata systems • gazetteer (c-squares reference/s: ...)
C-squares Potential Uses - continued • alternative to existing grid-based locator systems? (c-squares reference/s: ...) • spatially enabled web pages ?? - (like “dot.geo” concept, but requiring no administrative / hardware overhead)
Strengths / Weaknesses ... Strengths ... • “C-squares” is a concise and flexible method of encoding simple to moderately complex forms • Encoding/decoding is easy and follows previously documented methods; also directly related to lats and longs in decimal degrees • Spatial searching is a standard text string matching operation - already supported by most database search applications (and web search engines) • “C-squares mapper” utility available via simple web call • Can be used as adjunct to bounding coordinates searches • No proprietary software or hardware required to implement the system • Potentially globally applicable and interoperable; equally suitable to marine and terrestrial data.
Strengths / Weaknesses ... Weaknesses … • WMO square nomenclature (and its subdivisions) is only one of several available (competing?) “taxonomies of space” - further effort may be needed to promote it as a common/interoperable solution • C-squares is not an equal-area system - not amenable to rapid computation of areas or distances • Coding is inefficient near the poles (needs larger number of codes for same size areas) • Strings can become quite long for large, complex regions (e.g. “Pacific Ocean”) - need to be able to incorporate data reduction using “wildcard” method • Encoding algorithms not yet developed for line / polygon vector data, only for points • Method can be ambiguous at boundaries of natural features or administrative areas (since these will not always coincide neatly with c-square boundaries).
Resources Currently Available • C-squares website www.marine.csiro.au/csquares/ - includes: • C-squares draft specification and general background • Sample code for lat/long to c-squares conversion • On-line lat/long to c-squares converter • How to link to the c-squares mapper • Sample presentations, and links to c-squares enabled metadata records • Abstracts, presentations from 2 conferences (May, November 2002) • Paper describing c-squares submitted for publication in “Oceanography”, late 2002 (anticipated publication date March 2003)
Some Questions to Consider ... • Does the system have value in the context of the present audience’s needs? • Who would be potential users? • What mechanisms could / should be utilized to promote it? • Who might have an interest in further concept / system development, if needed? • Is there a place for c-squares in formal metadata standards?