Geographical Web Search Engines and Geographical Information Retrieval (GIR)
Christopher Jones, Cardiff University
Edinburgh Euro GeoInf 2007
Where is Geo-information?
Personal knowledge (in our heads)
• of landscape, of where things, people and services are located, where things happened…
Documents (various media)
• Lists of where facilities, resources, structures are located
• Textual descriptions of geographic phenomena
• Images and videos of geographic space
Maps
GIS and the Web
The World Wide Web is:
• Globally networked
• Supports everyone on the Internet
• Accessed publicly
• Vast range of topics
• Unstructured free text / images
• Finds documents
• Easy to use
A GIS is typically:
• Isolated
• Supports an individual organisation
• Accessed privately
• Small range of topics
• Structured data / geo-coded locations
• Finds answers
• Complicated to use
WWW as a source of geo-information
• Geographic context embedded in natural language descriptions
• Web queries depend on exact match of text terms
• No intelligent interpretation of spatial relationships (“near”, “west”, etc.)
• Place names ambiguous and confused with names of organisations, people, buildings and streets
• No geo-relevance ranking
Current motivation of GIR: find geo-specific resources on the Web (mostly documents and images)
Find web resources about Something related_to Somewhere, where related_to = in, near, within X km, north_of, etc.
• Resolve ambiguity of names (many places have the same name)
• Interpret the spatial relationships in the query to derive a query footprint
• Find documents geographically associated with the region of the query footprint
• Relevance rank geographically by place and subject
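The "Something related_to Somewhere" query form can be illustrated with a minimal Python sketch. The relation vocabulary, the `GeoQuery` type and the `parse_query` function are assumptions made for this illustration; they are not taken from SPIRIT or any other system.

```python
import re
from typing import NamedTuple, Optional


class GeoQuery(NamedTuple):
    subject: str   # the "Something"
    relation: str  # the spatial relationship
    place: str     # the "Somewhere"


# Hypothetical, deliberately small vocabulary of spatial relationships.
_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+"
    r"(?P<relation>within\s+\d+(?:\.\d+)?\s*km\s+of"
    r"|north of|south of|east of|west of|outside|near|in)\s+"
    r"(?P<place>.+)$",
    re.IGNORECASE,
)


def parse_query(text: str) -> Optional[GeoQuery]:
    """Split a query into (subject, relation, place), or return None."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    # Lower-case the relation and collapse repeated whitespace.
    relation = " ".join(match["relation"].lower().split())
    return GeoQuery(match["subject"], relation, match["place"])


print(parse_query("airports near Leicester"))
print(parse_query("hotels within 30 km of Glasgow"))
```

The place name returned here is still ambiguous text; turning it into a query footprint requires the disambiguation and ontology steps described on later slides.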
GIR, GIS and The Web
[Diagram: GIS, Geo-knowledge, GIR, World Knowledge, The Web]
Geographical Search Engines
• Google etc. have “local” versions
  - Based on business (yellow pages) directories
Geographical Search Engines
SPIRIT research prototype: general geo-web search
Structured user interface: dropdown menu of spatial relationships
Geographical search engines
SPIRIT: results listed as URLs, plus symbols on a map
User interface screenshots from Ross Purves et al., University of Zurich
Anatomy of a Geographical Search Engine
[Diagram components: User Interface; Query disambiguation; Place Ontology; Query footprint; Broker (search request + query footprint); Search Engine; Textual and Spatial Indexes; Relevance Ranking (unranked results to ranked results); Web Resources; Text Indexing; Geo-tagging; Document Footprints]
Geo-Tagging = Geo-parsing + Geo-coding
Geo-parsing
• Recognising genuine geographic references (place names, addresses, post codes, phone codes), ignoring non-geographic uses
Geo-coding
• Attaching a unique quantitative location (footprint) to each geographic reference
Geo-Parsing: true & false references
Some types of false geographic reference
• Personal names: Smedes York
• Business names: Dorchester Hotel, York Properties…
• Street names: Oxford Street, London Road…
• Common words that are also places: urban, institute, land, battle, derby, over, well…
Geo-Parsing: distinguishing between false and true geo-references
Look for patterns and context
• Personal names (Jack London, Mr York): <First_name> <Location>; <Title> <Location>
• Business names (Paris Hotel): <Business_type> <Location> (or vice versa)
• Street names (Oxford Street): <Location> <Road_type>
• Detect spatial prepositions (in, near, south of, outside, etc.): “he lived in Over”
Genuine occurrences can be used to train machine learning
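A minimal Python sketch of these context patterns follows. All word lists, the tiny gazetteer and the `classify_mentions` function are illustrative assumptions; a real geo-parser would use much larger resources and, as the slide notes, could learn from genuine occurrences.

```python
import re

# Illustrative word lists (assumptions, not taken from any real resource).
FIRST_NAMES = {"jack", "smedes"}
TITLES = {"mr", "mrs", "ms", "dr"}
BUSINESS_TYPES = {"hotel", "properties"}
ROAD_TYPES = {"street", "road", "avenue", "lane"}
SPATIAL_PREPOSITIONS = {"in", "near", "outside"}
GAZETTEER = {"london", "york", "paris", "oxford", "over"}


def classify_mentions(text):
    """Label each gazetteer match by the pattern its neighbours suggest."""
    tokens = re.findall(r"[A-Za-z]+", text)
    results = []
    for i, token in enumerate(tokens):
        if token.lower() not in GAZETTEER:
            continue
        prev = tokens[i - 1].lower() if i > 0 else ""
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if prev in FIRST_NAMES or prev in TITLES:
            label = "personal name"   # <First_name>/<Title> <Location>
        elif nxt in ROAD_TYPES:
            label = "street name"     # <Location> <Road_type>
        elif nxt in BUSINESS_TYPES or prev in BUSINESS_TYPES:
            label = "business name"   # <Location> <Business_type> or reverse
        elif prev in SPATIAL_PREPOSITIONS:
            label = "place"           # preceded by a spatial preposition
        else:
            label = "unknown"         # no pattern matched
        results.append((token, label))
    return results


sentence = ("Jack London stayed at the Paris Hotel on Oxford Street, "
            "but he lived in Over")
for mention in classify_mentions(sentence):
    print(mention)
```

The order of the checks is a design choice of this sketch: the false-reference patterns are tested before the preposition rule, so "in Oxford Street" would be labelled a street name rather than a place.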
Geo-coding (grounding) the genuine geo-references
Many different places have the same name (referent ambiguity): Newport, Cambridge, Springfield…
• Use context to decide (references to parent or nearby places)
• Or choose the most important one (by population or place type hierarchy)
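The two strategies can be sketched in Python as follows. The `Candidate` type, the `resolve` function and the `importance` values are assumptions for illustration; the importance numbers are made-up stand-ins for population or place-type rank, not real data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str
    importance: int  # made-up stand-in for population or place-type rank


# Hypothetical mini-gazetteer; only the parent relationships are factual.
GAZETTEER = [
    Candidate("Cambridge", "England", 3),
    Candidate("Cambridge", "Massachusetts", 2),
    Candidate("Newport", "Wales", 3),
    Candidate("Newport", "Isle of Wight", 1),
]


def resolve(name: str, context_places: Iterable[str],
            gazetteer: List[Candidate]) -> Optional[Candidate]:
    """Pick one referent for an ambiguous place name."""
    candidates = [c for c in gazetteer if c.name == name]
    if not candidates:
        return None
    context = {p.lower() for p in context_places}
    # Strategy 1: prefer candidates whose parent is mentioned in the context.
    supported = [c for c in candidates if c.parent.lower() in context]
    # Strategy 2 (fallback and tie-break): take the most important candidate.
    pool = supported or candidates
    return max(pool, key=lambda c: c.importance)


print(resolve("Cambridge", ["Massachusetts", "Boston"], GAZETTEER))
print(resolve("Cambridge", [], GAZETTEER))
```

This sketch only checks parent places; the slide also mentions nearby places, which would need footprints and a distance test.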
Indexing Web Resources
Standard text index is an inverted file
Query: “Restaurants in Cardiff”
• Finds documents that contain all the terms
• Works literally for “in”, but won’t find places contained in Cardiff
• Doesn’t work in general for “near”, “X km from”, “north_of”, etc.
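A minimal Python sketch of an inverted file shows the limitation. The two documents and the function names are assumptions made for this illustration; Roath is a district of Cardiff.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "restaurants in cardiff city centre",
    2: "restaurants in roath",  # inside Cardiff, but never says "cardiff"
}
index = build_inverted_index(docs)
print(search_all_terms(index, "restaurants in cardiff"))
```

Document 2 is geographically relevant but is missed, because pure term matching has no knowledge that Roath lies inside Cardiff.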
Why Spatial Indexing?
Query: “Hotels outside and within 30 km of Glasgow”
Need to find documents referring to hotels that are in places other than Glasgow
Query: “Castles in Wales”
Need to find documents that refer to names of places in Wales (perhaps without mentioning “Wales”)
• In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Glasgow but within 30 km of it
Spatial indexing of resources
• Use dominant geographic references of documents to create document footprints (point, polygon, bounding rectangle…)
• Use footprints to index documents
• Convert query to a query footprint
• Match query footprint to document footprints
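The footprint matching step can be sketched in Python using bounding rectangles. The `BBox` type, the function names and the coordinates are assumptions for illustration (abstract units, not real locations). The linear scan stands in for a real spatial index such as a grid or R-tree.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def spatial_search(doc_footprints: Dict[str, BBox], query: BBox) -> List[str]:
    """Return ids of documents whose footprint meets the query footprint."""
    return sorted(doc_id for doc_id, footprint in doc_footprints.items()
                  if intersects(footprint, query))


# Made-up footprints in abstract coordinate units.
doc_footprints = {
    "doc_a": BBox(0, 0, 2, 2),
    "doc_b": BBox(5, 5, 6, 6),
    "doc_c": BBox(1, 1, 4, 4),
}
query_footprint = BBox(1.5, 1.5, 3, 3)
print(spatial_search(doc_footprints, query_footprint))
```

Documents are matched on location alone here; in the architecture described in these slides, the textual index handles the subject part of the query.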
Geographical Relevance Ranking
• Determine “distance” between query footprint and document footprint
• Depends on query spatial operator (in, outside, X km from, north_of, etc.)
Example: airports near Leicester; the further away, the lower the spatial score
[Figure: query footprint Q and document footprint D, with the resulting spatial score]
Figure from Marc van Kreveld, University of Utrecht
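As a minimal sketch, a spatial score in Python could depend on the operator as shown. The exponential decay, the `scale_km` parameter and the restriction to two operators are assumptions of this example; the slides do not specify the scoring function actually used.

```python
import math


def spatial_score(operator: str, distance_km: float,
                  scale_km: float = 50.0) -> float:
    """Score in [0, 1] for a document footprint relative to a query footprint.

    distance_km is the distance from the document footprint to the query
    footprint, with 0 meaning the document lies inside the query footprint.
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if operator == "in":
        # Containment: full score inside, nothing outside.
        return 1.0 if distance_km == 0 else 0.0
    if operator == "near":
        # Assumed decay: the further away, the lower the score.
        return math.exp(-distance_km / scale_km)
    raise ValueError(f"unsupported operator: {operator}")


for km in (0, 10, 50, 200):
    print(km, round(spatial_score("near", km), 3))
```

Other operators on the slide, such as outside or north_of, would need their own functions, for example one based on direction as well as distance.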
Combining textual and spatial scores
• Textual scores: BM25
• Spatial scores: by spatial footprint analysis
[Figure: plot with axes “normalized BM25 score” and “spatial score”, each from 0 to 1, showing the query / ideal footprint and the footprints of documents]
Figure from Marc van Kreveld, University of Utrecht
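One way to combine the two scores is to treat each document as a point in the two-dimensional score space and rank by closeness to the ideal point (1, 1). The Python sketch below does this. The combination rule is an assumption of this example rather than the method confirmed by the slides, and the sample scores are made up.

```python
import math


def combined_score(text_score: float, spatial_score: float) -> float:
    """Closeness to the ideal point (1, 1), rescaled to the range [0, 1]."""
    for value in (text_score, spatial_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be normalized to [0, 1]")
    distance = math.hypot(1.0 - text_score, 1.0 - spatial_score)
    return 1.0 - distance / math.sqrt(2.0)


def rank(documents):
    """Sort (doc_id, text_score, spatial_score) tuples, best first."""
    return sorted(documents,
                  key=lambda doc: combined_score(doc[1], doc[2]),
                  reverse=True)


# Made-up normalized scores for three documents.
documents = [
    ("doc_a", 0.9, 0.2),  # strong text match, far away
    ("doc_b", 0.6, 0.7),  # moderate on both
    ("doc_c", 0.1, 0.9),  # close by, weak text match
]
print([doc_id for doc_id, _, _ in rank(documents)])
```

With this rule a document that is moderately good on both axes can outrank one that is excellent on only one. A weighted sum of the two scores would be a simpler alternative with different trade-offs.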
Place Ontology
Encodes knowledge of terminology and structure of geographic space
• alternative names, languages
• place types (political, topographic, social…)
• footprint (point, MBR, polygon)
• spatial relationships and attributes: containment, adjacency, overlap
• imprecise (vernacular) places (“Midlands”, “south of France”, “Scottish borders”, “Pennines”, “Highlands”…)
Derive from gazetteers, thesauri, maps & the web
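A minimal Python sketch of an ontology entry follows, with containment used to expand a query such as “Castles in Wales” into the places inside Wales. The `Place` fields and the `contained_places` function are assumptions for illustration. Footprints are left empty to avoid inventing coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    name: str
    place_type: str                    # e.g. political, topographic
    parent: Optional[str] = None       # containment relationship
    alt_names: List[str] = field(default_factory=list)
    # Minimum bounding rectangle (min_x, min_y, max_x, max_y), if known.
    footprint: Optional[Tuple[float, float, float, float]] = None
    vague: bool = False                # imprecise (vernacular) place


# Hypothetical fragment of an ontology, keyed by place name.
ONTOLOGY: Dict[str, Place] = {
    p.name: p for p in [
        Place("United Kingdom", "political"),
        Place("Wales", "political", parent="United Kingdom",
              alt_names=["Cymru"]),
        Place("Cardiff", "political", parent="Wales",
              alt_names=["Caerdydd"]),
        Place("Roath", "political", parent="Cardiff"),
        Place("Midlands", "vernacular", parent="United Kingdom", vague=True),
    ]
}


def contained_places(ontology: Dict[str, Place], region: str) -> List[str]:
    """Names of all places directly or indirectly inside the region."""
    found: List[str] = []
    frontier = [region]
    while frontier:
        current = frontier.pop()
        for place in ontology.values():
            if place.parent == current:
                found.append(place.name)
                frontier.append(place.name)
    return sorted(found)


print(contained_places(ONTOLOGY, "Wales"))
```

A single `parent` field only models a strict hierarchy; adjacency and overlap, which the slide also lists, would need explicit relationship records or footprint geometry.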
Roles of Place Ontology
[Diagram: the ontology linked to User Interface, Query Disambiguation, Query Expansion, Search Component (query footprint), Relevance Ranking, Spatial Index, Geo-Tagging (document footprints), Metadata Extraction (document footprints) and the Web collection]
Mining text on the web for vernacular place name knowledge
• Objective: estimate the spatial extent of a vague place
• Documents that refer to vague places may also refer to more precise places inside them
• Places that occur frequently in association with a target named place may have a higher chance of being inside it
• Analyse frequency of occurrence of co-located places
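The frequency analysis can be sketched in Python as a simple co-occurrence count. The documents, the list of known places and the `cooccurring_places` function are made up for illustration, and the substring matching is deliberately naive compared with a real geo-parser.

```python
from collections import Counter
from typing import Iterable, List


def cooccurring_places(documents: Iterable[str], target: str,
                       known_places: List[str]) -> Counter:
    """Count, per known place, the documents that mention it with the target."""
    counts: Counter = Counter()
    target_lower = target.lower()
    for text in documents:
        lowered = text.lower()
        if target_lower not in lowered:
            continue  # only documents that mention the vague place count
        for place in known_places:
            if place.lower() != target_lower and place.lower() in lowered:
                counts[place] += 1
    return counts


# Made-up snippets standing in for retrieved web documents.
documents = [
    "Walking in the Cotswolds near Cirencester",
    "Cotswolds cottages in Cirencester and Stow-on-the-Wold",
    "Trains from London to the Cotswolds",
    "London restaurants",
]
known_places = ["Cirencester", "Stow-on-the-Wold", "London"]
counts = cooccurring_places(documents, "Cotswolds", known_places)
print(counts.most_common())
```

The footprints of the most frequent co-occurring places could then be used to approximate the extent of the vague region. Places such as London, which co-occur for other reasons, show why frequency alone is only a rough signal.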
Places mentioned in documents retrieved by queries on the “Cotswolds”
Figure from Ross Purves et al., University of Zurich
GIR and GIS
• GIR currently dominated by web search
• Unstructured results in multiple documents
• Sometimes a single focused result is wanted:
  - Hotels within 1 kilometre of the British Museum in London
  - Where are pre-sixteenth century dwellings in USA?
  - Which areas of East Anglia would be flooded if sea level rose by 1 metre?
Bringing GIR and GIS together
[Diagram: two arrangements of GIS, Geo-knowledge, GIR, World Knowledge and The Web]
GeoInformation Services
Encode Geo-information in Web Services (Geo-services)
• Parse natural language queries
• Interpret geo-terminology of queries
• Identify the relevant geo-services to match geo and non-geo concepts
• Compose appropriate chain of services
EU - TRIPOD Project
• Improve accessibility of images on web
• Focus on geographical context
• Enhance captions / metadata for archival images
• Automatically generate captions for images from location/orientation-aware cameras
• Web harvesting to enrich metadata
• Interpret (vague) spatial natural language
• Toponym ontology of places and landmarks (including vernacular places)
• Use 3D landscape models to determine what is in camera view
• Prototype image search engine
http://tripod.shef.ac.uk/index.html
Future of GIR?
• Improve “conventional GIR” components:
  - Geo-tagging, spatio-textual indexing and geo-relevance ranking
• Place ontologies with world-wide coverage
• Understanding of spatial natural language
• Integrate time & space (temporal language)
• Open GeoInformation Web services
• Adapt GIR to personal needs & location
More Information
• See www.geo-spirit.org for information on the SPIRIT project and downloads of articles and project deliverables. [N.B. The prototype search engine (with link from the SPIRIT web site) is no longer functional]
• TRIPOD: www.ProjectTripod.org