James Reid, Project Manager, EDINA
The geoXwalk project
• funded under JISC IE Development Programme
• builds on Phase I scoping study
• aims to develop a demonstrator gazetteer service suitable for extension to full service
• time-frame: start 1 June 2002 for 1 year
• project partners: EDINA and UK Data Archive
• aim: to develop a ‘proof of concept’ demonstrator
JISC Information Environment - geoXwalk as ‘shared service’
[Architecture diagram. Layers: Provision layer (Content providers); Fusion layer (Broker/Aggregator); Presentation layer (Portals, End-user). Shared services: Authentication, Authorisation, Collection Description, Service Description, Resolver, Institution Profile, geoXwalk]
Geo-referencing: that’s what’s special about the spatial
• subject content most often referenced by topic … but much (80%?) can be referenced to specific geographic places
• broad disciplinary base for more powerful geographic searching
• across the social, life & physical sciences as well as the humanities
• also from libraries, archives and museums
• now from digital libraries, service providers & data providers
• geo-referencing thus a way of viewing information content:
• subject, people, place and time
• geographic co-ordinates are persistent regardless of name, political boundary or other changes
Why this is difficult...
• How to search ‘geographically’, given that e.g. a postcode, a placename and an administrative area are all valid geographies, and yet no information system can know about all the possible variations of what constitutes a ‘geography’!
• Problem compounded by inconsistency of use even in the ‘standards’, e.g. placenames evolve and have alternative names
• Long history in UK of boundary changes and changes in the geographies used to record things, e.g. electoral ward boundary changes …
There is underlying complexity, such as Multiple Geographies …
The vision
• Make variations in definitions of ‘geography’ transparent
• Provide a means to ‘crosswalk’ geographies, i.e. translate one geography into another - hence the name
• ‘Geographic agnosticism’
How?
• A digital gazetteer that stores the different geographies and can implicitly resolve the relationships between them
• Provision as a service to service other services
Gazetteer - A list of geographic features together with their associated spatial location
Digital Gazetteer - An electronic list of geographic features together with their associated spatial location (an authority database of places (and features?))
Digital Gazetteer Service - A network-addressable middleware server supporting geographic referencing and searching. A shared ‘terminology’ service.
Why not just use hierarchical thesauri? (part of the ‘Document Tradition’)
United Kingdom ……… (nation)
  England ……… (country)
    Devon ……… (county)
      Barton ………
Comment:
• one type of simple relationship between entries is exploited
• entries ordered from very general to very specific (BT, NT)
• can efficiently determine what a given area contains
• normally structured to handle alternative names (SY)
• rigid structure, one view only, typically geo-political
• entities can belong in many hierarchies and new relationships evolve
• names may not be unique
• cannot deal with spatial proximity / contiguity
• no way to relate to other geographies, e.g. postcodes
• lack of simple hierarchies in UK (and other ‘old’) geographies …
Uses of geoXwalk Digital Gazetteer Service
1. As ‘shared service’, enabling other information services to support full range of spatial searching (query constraints)
• no need to hold all data (at service) to resolve spatial query
• uses co-ordinates and (implicit) spatial relationships to ‘cross-walk’ between geographies
• machine-to-machine (m2m) interaction to ‘shared service’
2. As reference facility for researchers, libraries & museums
• including means to resolve variant names etc.
3. As online facility to assist metadata creators and means to semi-automatically geo-reference existing resources
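The ‘cross-walk’ in use 1 can be illustrated with a minimal sketch, assuming each geography is stored as a named bounding box in grid metres. The `Footprint` class, the `crosswalk` function and every coordinate below are hypothetical and invented for illustration; the slides do not describe how geoXwalk implements this, and a real service would more likely compare true polygon footprints than boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Hypothetical gazetteer entry: a named bounding box in grid metres."""
    name: str
    geography: str  # e.g. "postcode", "parish", "district"
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "Footprint") -> bool:
        # Two boxes overlap unless one lies wholly to one side of the other.
        return not (
            self.max_x < other.min_x or other.max_x < self.min_x
            or self.max_y < other.min_y or other.max_y < self.min_y
        )


def crosswalk(source, gazetteer, target_geography):
    """Return sorted names of target_geography entries whose box overlaps source."""
    return sorted(
        entry.name
        for entry in gazetteer
        if entry.geography == target_geography and entry.intersects(source)
    )


# Illustrative data only: these names and coordinates are invented.
GAZETTEER = [
    Footprint("Parish A", "parish", 0, 0, 100, 100),
    Footprint("Parish B", "parish", 100, 0, 200, 100),
    Footprint("Parish C", "parish", 300, 300, 400, 400),
    Footprint("District X", "district", 0, 0, 250, 250),
]
POSTCODE = Footprint("Postcode P", "postcode", 90, 40, 110, 60)

parishes = crosswalk(POSTCODE, GAZETTEER, "parish")
districts = crosswalk(POSTCODE, GAZETTEER, "district")
```

The point of the sketch is that the calling service never needs a lookup table from postcodes to parishes: the relationship is implicit in the co-ordinates and is resolved on demand.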
geoXwalk Use Cases
[Diagram: information servers interacting with the geoXwalk Server for geo-parsing & indexing, searching (1 - use cases) and searching (2) - reference use]
Reference use, e.g.
• Where is Aberdour?
• On what river is Dundee situated?
• By what alternative names has York been known?
• List me all places ending with ‘kirk’
XML query fragments

Query for a placename

<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service xmlns="http://www.alexandria.ucsb.edu/gazetteer" version="1.1">
  <query-request>
    <gazetteer-query>
      <name-query operator="equals" text="Fife"/>
    </gazetteer-query>
    <report-format>standard</report-format>
  </query-request>
</gazetteer-service>

Query by feature type and bounding box

<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service xmlns="http://www.alexandria.ucsb.edu/gazetteer" xmlns:gml="http://www.opengis.net/gml" version="1.1">
  <query-request>
    <gazetteer-query>
      <and>
        <class-query thesaurus="Edina FT Thesaurus" term="towns"/>
        <footprint-query operator="within">
          <gml:Box>
            <gml:coordinates>-0.02988,51.45753 1.30798,52.07042</gml:coordinates>
          </gml:Box>
        </footprint-query>
      </and>
    </gazetteer-query>
    <report-format>standard</report-format>
  </query-request>
</gazetteer-service>
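A client could assemble the placename query programmatically rather than by string concatenation. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names are copied from the fragment above, while the function name `build_name_query` is an assumption made for this example. How the request is delivered to the server (endpoint, transport) is not stated in the slides, so it is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the query fragments in the slides.
ADL_NS = "http://www.alexandria.ucsb.edu/gazetteer"


def build_name_query(text, operator="equals"):
    """Serialise a placename query shaped like the first fragment (hypothetical helper)."""
    # Emit the gazetteer namespace as the default xmlns, as in the slides.
    ET.register_namespace("", ADL_NS)
    service = ET.Element(f"{{{ADL_NS}}}gazetteer-service", {"version": "1.1"})
    request = ET.SubElement(service, f"{{{ADL_NS}}}query-request")
    query = ET.SubElement(request, f"{{{ADL_NS}}}gazetteer-query")
    ET.SubElement(
        query, f"{{{ADL_NS}}}name-query", {"operator": operator, "text": text}
    )
    report = ET.SubElement(request, f"{{{ADL_NS}}}report-format")
    report.text = "standard"
    # Returns bytes, including the XML declaration (Python 3.8+).
    return ET.tostring(service, encoding="UTF-8", xml_declaration=True)


payload = build_name_query("Fife")
```

Building the tree with a library guarantees matched quotes and well-formed nesting, which hand-typed fragments can easily get wrong.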
Developments to Date
• Creation & population of GB gazetteer database with:
• Enhanced OS 1:50,000 Placename Gazetteer
• Digital boundary data (UKBORDERS)
• Additional Place Name Variants (partial for Scotland and Wales)
• Derived multi-source data, e.g. named woodlands and lakes based on hybrid 1:50K gazetteer and OS products
• Development of spatial extensions to database to support enhanced geographic search capabilities
• Development of middleware to support m2m and interactive searching
• Support for and testing of alternative query protocols - ADL / Z39.50(?)
• Development of a geoparser to support semi-automatic indexing
Ongoing Work and Issues
• Merging geo-data from different scales & from different sources
• how to accommodate historical data
• positional accuracy & expression of confidence?
• how to minimise effort in de-duplication of place(s)?
• places have multiple names, types, and footprints
• need to be able to identify duplicate entries for the same place
• Presenting geo-names on different occasions?
• many variant ‘proper’ names, what is preferred?
• what is the ‘name authority body’? - none in Scotland or the UK
• preferred name varies with location and use and culture
• there are language and character code set issues
• ‘standard’ codes for postal addresses and other geographies
• IPR issues in metadata; and hence terms & conditions of use
• Service performance issues and appropriate protocols
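One simple way to flag candidate duplicates when merging sources is to compare normalised names and require the two records to lie close together. The sketch below is only an illustration of that heuristic, not the project's method: the function names, the record layout `(name, easting, northing)` and the 500 m tolerance are all assumptions. It deliberately does not solve the harder problem noted above of one place carrying several unrelated names.

```python
import math
import unicodedata


def normalise(name):
    """Fold case, accents and punctuation so variant spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.casefold() if ch.isalnum())


def likely_duplicates(a, b, tolerance_m=500.0):
    """True if two (name, easting, northing) records probably describe one place.

    Hypothetical rule: normalised names match AND the points are within
    tolerance_m metres of each other.
    """
    (name_a, xa, ya), (name_b, xb, yb) = a, b
    close = math.hypot(xa - xb, ya - yb) <= tolerance_m
    return close and normalise(name_a) == normalise(name_b)


# Invented coordinates, for illustration only.
same_place = likely_duplicates(
    ("Berwick-upon-Tweed", 1000, 2000), ("Berwick upon Tweed", 1200, 2100)
)
different_place = likely_duplicates(("Newport", 1000, 2000), ("Newport", 90000, 2000))
```

Requiring both conditions reflects two of the issues listed above: names are not unique (so a name match alone is unsafe), and sources at different scales rarely agree exactly on position (so a tolerance is needed).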
Contact details • James.Reid@ed.ac.uk EDINA, Data Library, University of Edinburgh telephone +44 (0)131 650 3302 • For information on geoXwalk project: www.geoXwalk.ac.uk
Task: Find resource about ‘Liverpool docks’
Search using a ‘traditional’ gazetteer might yield: 5
Co-ordinates allow (near) co-located places to be co-identified. Using spatial proximity in an active gazetteer, the search can be widened:

Place          County/UA
Liverpool      Liverpool
Bebington      Wirral
Birkenhead     Wirral
Bootle         Sefton
New Brighton   Wirral
Seacombe       Wirral
Seaforth       Wirral
Waterloo       Sefton

… that means more & better hits: 15
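The widening step in the ‘Liverpool docks’ example can be sketched as a radius search over point co-ordinates. This is a minimal illustration under stated assumptions: `widen_search` is a hypothetical helper, and the co-ordinates are invented offsets in metres rather than real grid references for these places.

```python
import math


def widen_search(place, gazetteer, radius_m):
    """Return the place plus every gazetteer entry within radius_m of it."""
    x0, y0 = gazetteer[place]
    return sorted(
        name
        for name, (x, y) in gazetteer.items()
        if math.hypot(x - x0, y - y0) <= radius_m
    )


# Invented offsets in metres from an arbitrary origin; NOT real grid references.
SAMPLE = {
    "Liverpool": (0, 0),
    "Bootle": (0, 4000),
    "Birkenhead": (-3000, -1000),
    "Waterloo": (-1000, 8000),
    "Faraway": (50000, 50000),
}

search_terms = widen_search("Liverpool", SAMPLE, 5000)
```

The returned names would then be used as additional search terms against the target resource, which is how a wider set of hits could be obtained than from the single term ‘Liverpool’.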
Supporting service searching: “Photographs of towns along the River Tweed”
Query:
  Place name: River Tweed
  Feature type: River
  Relation: ‘near’
  Distance: 1/2 km
  Target type: towns
Places: Peebles, Innerleithen, Melrose, Kelso, Coldstream, Berwick upon Tweed
Image finder server (images indexed on place names)
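A ‘near’ relation between point features and a linear feature such as a river can be evaluated as the shortest distance from each point to the line. The sketch below assumes the river is a polyline and towns are points in the same metre-based grid; the function names and all co-ordinates are hypothetical, and only the 1/2 km threshold comes from the slide.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def near_line(points, line, max_dist):
    """Names of points lying within max_dist of any segment of the polyline."""
    segments = list(zip(line, line[1:]))
    return [
        name
        for name, p in points.items()
        if min(distance_to_segment(p, a, b) for a, b in segments) <= max_dist
    ]


# Invented geometry in metres, for illustration only.
RIVER = [(0, 0), (1000, 0), (2000, 1000)]
TOWNS = {"Town A": (500, 300), "Town B": (1000, 2000), "Town C": (2100, 1000)}

towns_near_river = near_line(TOWNS, RIVER, 500)  # 500 m = the slide's 1/2 km
```

The resulting place names would be passed to the image finder server, which indexes images by place name and so needs no spatial capability of its own.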
Supporting cross searching: geoXwalk in the Common Information Environment
Coordinate footprints - Dundee (334995, 729203, 350609, 734710)
Places: Barnhill, Broughty Ferry, Craigie, Douglas And Angus, Fintry, Lochee, Monifieth, West Ferry
Supporting cross searching different services
‘Find resources for this postcode’ (NB postcode often used to geo-reference survey data files)
Post code: L34 0HS?
[Diagram: Portal service, geoXwalk Server, and Content Providers A, B and C. Values shown: coordinate footprints (340900,392300 - 347217,397660), place names (Knowsley), parish names (BX003)]
As online facility to assist metadata creation
• Most of the extant resources in the JISC IE have some form of spatial reference, e.g. placename, county name, postcode
• A ‘geoparser’ has been developed which will assist in the semi-automatic indexing of these resources by using the gazetteer as reference.
• The results of the geoparsing can be used to update the documents’ metadata, making them directly geographically searchable.
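The core of gazetteer-driven geoparsing can be sketched as a dictionary lookup over the text. This is a deliberately naive illustration, not a description of the geoparser the project built: `geoparse` is a hypothetical function, it ignores ambiguity (e.g. ‘Barton’ occurring in many counties), and apart from the Dundee footprint quoted earlier in the slides the footprints are invented placeholders.

```python
import re


def geoparse(text, gazetteer):
    """Return (placename, footprint) for each gazetteer name found in text."""
    if not gazetteer:
        return []
    # Try longer names first so 'New Brighton' is preferred over 'Brighton'.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    canonical = {n.casefold(): n for n in gazetteer}
    found = []
    for match in pattern.finditer(text):
        name = canonical[match.group(1).casefold()]
        if name not in found:  # report each place once, in order of appearance
            found.append(name)
    return [(name, gazetteer[name]) for name in found]


GAZETTEER = {
    "Dundee": (334995, 729203, 350609, 734710),  # footprint quoted in the slides
    "New Brighton": (0, 0, 1, 1),                # invented placeholder
    "Brighton": (2, 2, 3, 3),                    # invented placeholder
}

hits = geoparse("Survey of housing in dundee and New Brighton.", GAZETTEER)
```

Each hit pairs a matched name with a footprint, which is the information a metadata creator would review before writing it into the record; that review step is what makes the process semi-automatic rather than automatic.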