Tony Rees for OBIS TWG, March 2003
“How to deal with place name data for OBIS”
Two (different) aspects to consider ...
• Place name conversion => lats/longs or polygons (data custodian end)
• Place name conversion to search area (portal end)
Need to consider both precision and accuracy (not the same!)
• Precision
  - “a measure of the ability to distinguish between nearly equal values”
  - “the number of decimal places to which a number is computed”
• Accuracy
  - “How close to the real value a measurement is”
Note: a number of dictionaries give accuracy as a synonym for precision, which is incorrect.
Relevance for OBIS …
• What precision is associated with the quoted locality name?
  - locality may be large or small
  - did the specimen come from at/within the actual locality, or the region around it? (i.e., “nearest named place”)
  - how precisely are displaced distances and directions quoted? (e.g. “5 miles NE of point xyz”)
• Precision of available lat/long values for the quoted locality
  - to nearest degree? minute? second? If decimal degrees, how many decimal places quoted? Is precision in fact over-stated?
• Accuracy of available lat/long values
  - if quoted in different sources, do these agree? If checked on a (“reliable”) map, are they in fact correct?
• Need to distinguish between “real” and “apparent” precision (see next slide)
Which is most precise?
• 147.31666666º E (Alexandria Digital Library Gazetteer) - source A
• 147.3167º E (Falling Rain Global Gazetteer) - source B
• 147.317º E (Australian Gazetteer) - source C
• 147º18’59” E (Alexandria Digital Library Gazetteer) - source A
• 147º19’00” E (Falling Rain Global Gazetteer) - source B
• 147º19’ E (Australian Gazetteer) - source C
NB: 1’ of latitude/longitude is approx. 0.02 degrees (actually 0.0167). 1’ of latitude is around 1.85 km (1 nautical mile, a little over 1 statute mile); 1’ of longitude is shorter than this away from the equator.
Questions to consider ...
• What is the true (vs. apparent) precision of the above measurements?
• Which of them is most accurate? (or how would you tell?)
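The unit conversions behind these figures can be sketched as below. This is a minimal illustration, not part of the original slides: the function names are hypothetical, values are treated as unsigned (hemisphere handled separately), and the `truncate` option is only an assumption about how a system might produce whole seconds.

```python
import math

KM_PER_MINUTE_OF_LATITUDE = 1.852  # one nautical mile, by definition


def dms_to_decimal(degrees, minutes=0, seconds=0.0):
    """Convert unsigned degrees, minutes, seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


def decimal_to_dms(value, truncate=False):
    """Convert unsigned decimal degrees to whole (degrees, minutes, seconds).

    Seconds are rounded to the nearest whole second by default;
    truncate=True drops the fractional second instead (an assumed behaviour).
    """
    total = value * 3600
    total_seconds = int(total) if truncate else round(total)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds


def km_per_minute_of_longitude(latitude_deg):
    """Approximate length of one minute of longitude at a given latitude."""
    return KM_PER_MINUTE_OF_LATITUDE * math.cos(math.radians(latitude_deg))
```

With these definitions, `decimal_to_dms(147.3167)` gives `(147, 19, 0)`, while `decimal_to_dms(147.31666666, truncate=True)` gives `(147, 18, 59)`. The second result matches the seconds value listed for source A, though whether that gazetteer actually truncates is not known.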
Tools for Data Capture
• Place names => lats/longs … common museum exercise (“geocoding” or “georeferencing”)
• Requires use of gazetteers or maps
• Some gazetteers/maps available on web, e.g.:
  - MS Encarta, Expedia.com - http://www.expedia.com/pub/agent.dll?qscr=mmfn
  - NGDC/WDC MGG, Boulder Marine Coastline Extractor - http://oas.ngdc.noaa.gov/mgg/plsql/extractor.mapit
  - Alexandria Digital Library Gazetteer (4.4 million names) - http://fat-albert.alexandria.ucsb.edu:8827/gazetteer/
  - Falling Rain Genomics Global Gazetteer (2.8 million names) - http://www.calle.com/world/
  - National, local gazetteers e.g. Australia - http://www.agso.gov.au/map/names/
• May possibly be available as “web services” in future (machine addressable)
Example - search for “Tinderbox, Australia” (small populated place near my home - <500 inhabitants)
MS Encarta, Expedia (max. zoom in) - includes some quite small places, however not all (inconsistent); - best for showing other named places in surrounding area … cf. New Brunswick at same scale:
NGDC/WDC MGG, Boulder - Marine Coastline Extractor (near max. zoom in) … no placenames, but could use overlay lat/long grid for georeferencing where coastal features are unambiguously recognizable
“Falling Rain” Global Gazetteer result (max. zoom in) - gives lat, long and locator map/s - coastline is quite high resolution - max. zoom level is a bit restricting
Alexandria Dig. Libr. Gazetteer search result (max. zoom in)
“Australian Gazetteer” search result - a very well-populated gazetteer, couldn’t find anything missing (on a brief look)
“Australian Gazetteer” search result - cont’d - show on map (not zoomable)
Printed 1:50 000 map (detail), with tick marks on the place names that are indexed in the Australian Gazetteer
Where actually is “Tinderbox”?
• (1) Alexandria Digital Library Gazetteer: 147.31666666º E (147º18’59”), 43.049999º S (43º2’59”)
• (2) Falling Rain Global Gazetteer: 147.3167º E (147º19’00”), 43.0500º S (43º2’60”)
• (3) Australian Gazetteer (Official): 147.317º E (147º19’), 43.050º S (43º03’)
• (4) NGDC/WDC Coastline Extractor (eye estimate): 147.33º E, 43.05º S
• (5) 1981 1:50 000 map (eye estimate): 147º19’30” E, 43º03’30” S
• maybe need to go down there with a hand-held GPS!!
  - all compatible (surprise!)
  - actually (1) - (2) are derived from (3) anyway!!
  - map allows additional accuracy to maybe +/- 1/5 min (12 seconds, ~0.003 degrees or 300 meters)
  - however would need to be careful about the datum actually used at this fine scale (pre-1980s datum slightly different from current)
  - at fine scale, difficulty in determining the exact centre of “Tinderbox”
  - “Official” Australian coordinates stated as only accurate to nearest 1 minute (approx. 1.8 km)
  - so really we are “snapping to a grid” (potentially misleading precision when converted to decimals) - see next slide ...
Printed 1:50 000 map (enlarged detail - grid squares are 1 x 1 km), with graticule marked at 147º19’, 147º20’, 147º21’ E and 43º03’, 43º04’ S
Revisit questions posed earlier ...
• 147.31666666º E (Alexandria Digital Library Gazetteer) - source A
• 147.3167º E (Falling Rain Global Gazetteer) - source B
• 147.317º E (Australian Gazetteer) - source C
• 147º18’59” E (Alexandria Digital Library Gazetteer) - source A
• 147º19’00” E (Falling Rain Global Gazetteer) - source B
• 147º19’ E (Australian Gazetteer) - source C
• What is the true (vs. apparent) precision of the above measurements?
  - true precision is that given by source C (which the others have simply copied to their own systems), i.e. +/- 30” (0.008º) - thus the true quoted value should be 147.32, and seconds are meaningless
  - Sources A and B are wildly overstating the true precision, in both decimal places and seconds
  - Even source C is overstating the true precision when expressed in decimal degrees!
• Which of them is most accurate?
  - all equally inaccurate - by around 0.8 km (0.5 miles) - true position determined from map is 147º19’30” E, = 147.325 (+/- 0.003 approx.)
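The reasoning about “true” precision can be illustrated with a short sketch. It assumes a simple rule of thumb that is not stated on the slides: a value recorded to a given resolution is uncertain by half that resolution, and a decimal place is worth keeping only if one unit in that place is at least as large as the uncertainty. All names are hypothetical.

```python
import math


def half_width_degrees(resolution_arcsec):
    """Uncertainty (+/-) in degrees for a value recorded to a given resolution."""
    return resolution_arcsec / 2 / 3600


def justified_decimals(resolution_arcsec):
    """Number of decimal places supported by the recording resolution.

    Rule of thumb (an assumption): keep a decimal place only while one unit
    in that place is no smaller than the +/- uncertainty.
    """
    half_width = half_width_degrees(resolution_arcsec)
    return max(0, math.floor(-math.log10(half_width)))


def honest_decimal(value_degrees, resolution_arcsec):
    """Round a decimal-degree value to the precision its source supports."""
    return round(value_degrees, justified_decimals(resolution_arcsec))
```

For a longitude recorded to the nearest minute (resolution 60”), `half_width_degrees(60)` is about 0.008º and `honest_decimal(147 + 19/60, 60)` returns 147.32, in line with the value suggested above.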
Potential traps when georeferencing from gazetteers/maps
• May be multiple places with the same name (including some maybe not in Gazetteer)
• May be variant/misspelled/historic names for the same place
• A feature extent can be much larger than the designated lat/long reference would imply (e.g. River, Bay, Island, Channel, Strait, Sea…) - where within (or adjacent to) a larger “polygon” is the real locality? (Centre - as typically quoted - may well be an incorrect assignation)
• Precision of coordinates from any source needs to be known and not unintentionally misrepresented when converted to decimals
• Map, Gazetteer source used should be recorded, in case it contains errors which can be detected/corrected retrospectively if needed
(continued…)
Potential traps when georeferencing from gazetteers/maps (continued)
• Map can be misleading if older datum used or if otherwise erroneous - however often still best source to see “real” feature extents, minor feature names, and detailed coastal topography
• “Distance from”, “Direction from …” may be approximations only and introduce their own errors/uncertainty; also, may be unclear where actually measured from … (e.g., centre or edge of named locality?)
• Numeric values obtained should be believable - e.g. marine locations not on land, species not too far from expected range (otherwise more checks needed - maybe ID is wrong, locality mis-reported, etc.)
• Precision needs to be reported in a consistent manner - e.g. see next slide.
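As an illustration of the “distance from / direction from” case, the sketch below converts a quoted displacement such as “5 miles NE of point xyz” into an estimated position. It is only a flat-earth approximation suitable for short distances, uses signed decimal degrees (south and west negative), and assumes an 8-point compass; the names and constants are hypothetical choices, not an established method.

```python
import math

KM_PER_DEGREE_LAT = 111.2   # approximate length of one degree of latitude
KM_PER_MILE = 1.609344      # statute mile

# 8-point compass bearings in degrees clockwise from north
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def displace(lat, lon, distance_km, direction):
    """Estimate the position `distance_km` from (lat, lon) along a compass direction.

    Flat-earth approximation: reasonable for a few tens of km,
    and not valid close to the poles.
    """
    bearing = math.radians(BEARINGS[direction])
    dlat = distance_km * math.cos(bearing) / KM_PER_DEGREE_LAT
    dlon = (distance_km * math.sin(bearing)
            / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon


# "5 miles NE of" the official Tinderbox position (43º03' S, 147º19' E)
lat, lon = displace(-43.05, 147.3167, 5 * KM_PER_MILE, "NE")
```

Note that a direction quoted only as “NE” could mean anything within roughly 22.5º either side of that bearing, so the result carries its own uncertainty on top of that of the starting point.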
Precision for OBIS ...
My suggestion would be to use a scale e.g. 1-n (1 = best), e.g.:
• 1 = estimated precision better than 100 m / 0.001 degrees
• 2 = estimated precision better than 500 m / 0.005 degrees
• 3 = estimated precision better than 1 km / 0.01 degrees
• 4 = estimated precision better than 5 km / 0.05 degrees
• 5 = estimated precision better than 10 km / 0.1 degrees
• 6 = estimated precision better than 50 km / 0.5 degrees
• 7 = estimated precision better than 100 km / 1 degree
• 8 = estimated precision better than 500 km / 5 degrees
• 9 = estimated precision better than 1000 km / 10 degrees
• 0 = estimated precision > 1000 km (unmappable)
… could then select points of relevant precision when mapping at different scales (e.g. precisions 1-6 acceptable for map at 0.5 deg. resolution, precisions 1-8 acceptable for map at 5 deg. resolution)
Note: Darwin Core already has a field “Coordinate Precision” - to hold a value in meters (although such values may be too precise!!!)
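The suggested scale could be applied along the following lines. This is a sketch only: the function names are hypothetical, “better than” is read as strictly less than the stated distance, and a value of exactly 1000 km or more is assigned code 0.

```python
# (upper limit in km, precision code), from the suggested 1-9 scale
PRECISION_CLASSES = [
    (0.1, 1), (0.5, 2), (1, 3), (5, 4), (10, 5),
    (50, 6), (100, 7), (500, 8), (1000, 9),
]


def precision_code(estimated_km):
    """Return the precision code for an estimated precision in km (0 = unmappable)."""
    for limit_km, code in PRECISION_CLASSES:
        if estimated_km < limit_km:
            return code
    return 0


def acceptable_for_map(code, worst_acceptable_code):
    """True if a point's precision code is good enough for a given map.

    E.g. worst_acceptable_code=6 for a map at 0.5 degree resolution.
    Code 0 (unmappable) is never acceptable.
    """
    return 1 <= code <= worst_acceptable_code
```

For example, a position known only to the nearest minute (roughly 1.8 km) gets `precision_code(1.8) == 4`, which `acceptable_for_map(4, 6)` accepts for a 0.5 degree map.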
1 km (precision “3”) and 5 km (precision “4”) radii from 147º19’ E, 43º03’ S (official quoted position for “Tinderbox”) - note, stated locality may simply be “nearest named place”, not actual location
NB: could only improve on this if (1) locality accurately known (e.g. by named small coastal feature or “X” on map), and (2) coordinates quoted to higher precision.
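A “point and radius” treatment like the one pictured can be checked numerically with a great-circle distance. The sketch below uses the standard haversine formula on a spherical Earth (mean radius 6371 km, an approximation); the function names and the test position are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in signed decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat, lon, centre_lat, centre_lon, radius_km):
    """True if (lat, lon) lies within radius_km of the centre point."""
    return haversine_km(lat, lon, centre_lat, centre_lon) <= radius_km


# Official quoted position for "Tinderbox": 43º03' S, 147º19' E
CENTRE = (-(43 + 3 / 60), 147 + 19 / 60)

# A made-up position for illustration
in_5km = within_radius(-43.07, 147.34, *CENTRE, radius_km=5)
in_1km = within_radius(-43.07, 147.34, *CENTRE, radius_km=1)
```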
Is it worth representing localities by polygons?
• Could be done, but may be too difficult for OBIS to query at this time
• Could represent improved precision and accuracy, cf. “point and radius” treatment - but a tool would be needed for data input, unless a standard lookup table is available (such a tool has been described - see Proctor/Blum/Chaplin, 2001 *); also, polygon boundaries would need to be stored in a standard notation (e.g. ISO metadata format)
• Value might be questionable, considering the likely precision associated with quoted [marine] localities - how useful is a polygon for “Gulf of xyz” if the exact locality is not known? (Would be different if the data actually were polygon, rather than point data.)
* “A Software Tool for Retrospectively Georeferencing Specimen Localities using ArcView” - Elizabeth J. Proctor, Stanley D. Blum and George Chaplin (available on the web at http://www.calacademy.org/research/informatics/georef/Main_Pages/2_Background.html)
Some implications for OBIS data storage ...
• 1. Store original (“verbatim”) locality information as well as designated lat/long -- may contain important information lost during “translation” (and may also be more precise). OBIS may wish to display it for additional user information if available ??
  - ?= Darwin Core “Locality” (or locality + other fields?) -- see below
• 2. Designated lat/long and assigned precision would be fundamental parameters for OBIS to query.
• 3. Need to consider value of storing/accessing polygons if available (or not bother?)
NB comparison with Darwin Core v2:
• Darwin Core has the following “optional” fields available (in addition to lat, long)…
  - “Continent Ocean”
  - “Country” (from ISO list)
  - “State Province”
  - “County”
  - “Locality” (place name + optional displacement from…)
  - “Coordinate Precision” (= radius of circle in meters)
• How much of this is relevant to OBIS usage (marine, cf. terrestrial specimens)?
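One way to picture points 1 and 2 is a record that keeps the verbatim locality alongside the designated position and its assigned precision. The class and field names below are purely hypothetical and are not Darwin Core or OBIS field names; the precision code follows the 1-9 (0 = unmappable) scale suggested earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalityRecord:
    """Illustrative record: verbatim locality plus designated position."""
    verbatim_locality: str          # original text, kept unmodified
    latitude: float                 # signed decimal degrees (south negative)
    longitude: float                # signed decimal degrees (west negative)
    precision_code: int             # 1 (best) to 9, 0 = unmappable
    source: Optional[str] = None    # gazetteer or map used, for later checking

    def is_mappable(self) -> bool:
        """A record with precision code 0 cannot usefully be mapped."""
        return self.precision_code != 0


record = LocalityRecord(
    verbatim_locality="Tinderbox, Australia",
    latitude=-43.05,
    longitude=147.32,
    precision_code=4,
    source="Australian Gazetteer",
)
```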
2: Searching the portal by placename
• Most gazetteers hold only centre point (lat, long) for any place
• Some entries in Alexandria D.L. Gazetteer have bounding box
• My “MarLIN” system (+ others in Australia) has bounding box for ~120 pre-defined areas including named ocean/seas, etc., e.g. ...
Available alternatives to region representation by bounding box:
• Every data point pre-assigned to an item on a controlled list of named regions (in hierarchy) - then could do text/index search (similar to e.g. ASFA indexing terms)
• More realistic geographic “footprint” stored for each defined region - either:
  - Centre point and radius (probably not ideal)
  - True polygon (requires GIS back end for the searching)
  - C-squares representation (or similar) - multiple small rectangles per region
• At present, my own system uses bounding boxes to represent regions; most users seem happy with the approximations required
• Suggest OBIS implements rectangle representation as first step, investigate possible improvements/refinements later
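The rectangle representation suggested as a first step amounts to a simple containment test per data point. The sketch below is an illustration under stated assumptions, not a description of MarLIN or any existing system: the class name is hypothetical, the example extents are invented, and a box whose west edge is numerically greater than its east edge is taken to cross the 180º meridian.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Rectangle in signed decimal degrees (south/west negative)."""
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        if not (self.south <= lat <= self.north):
            return False
        if self.west <= self.east:
            return self.west <= lon <= self.east
        # west > east: the box is assumed to span the 180º meridian
        return lon >= self.west or lon <= self.east


def search(points, box):
    """Return the (lat, lon) points that fall inside the rectangle."""
    return [(lat, lon) for lat, lon in points if box.contains(lat, lon)]


# Invented extents, for illustration only
example_region = BoundingBox(south=-45.0, north=-40.0, west=144.0, east=149.0)
hits = search([(-43.05, 147.317), (-30.0, 147.0)], example_region)
```

A rectangle will generally include some area outside the named region, which is the approximation referred to above.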
Requirement then is to construct a list of geographic search terms to be made available - and to define/store the relevant rectangle for each.
• Q1: How big should the list be? (hundred/s, thousand/s …)
• Q2: What types of locality should be listed - e.g.
  - ocean/sea/gulf/bay name
  - river/estuary name
  - island/cape/point names (coastal features)
  - seafloor feature names (seamounts, canyons, reefs)
  - political/administrative entities e.g. country, state names
  - coastal city names … (list soon gets long!)
• Comment: could eventually envisage more complex queries with full GIS capability, e.g. “within x miles of named region y” (although probably not at this time)
• OBIS would need to expend some resources to build such a list, unless available from a public/commercial source (e.g. as vector data from which rectangles could easily be calculated).
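If region outlines were available as vector data, the rectangle for each search term could be derived roughly as follows. This is a minimal sketch with a hypothetical function name and invented vertices; it assumes the outline does not cross the 180º meridian, where taking simple minima and maxima of longitude would give the wrong box.

```python
def bounding_rectangle(vertices):
    """Return (south, north, west, east) for a list of (lat, lon) vertices.

    Assumes signed decimal degrees and an outline that does not
    cross the 180º meridian.
    """
    if not vertices:
        raise ValueError("at least one vertex is required")
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    return min(lats), max(lats), min(lons), max(lons)


# Invented outline, for illustration only
outline = [(-43.0, 147.2), (-43.2, 147.5), (-42.9, 147.4)]
rectangle = bounding_rectangle(outline)
```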