Geographical Web Search Engines and Geographical Information Retrieval (GIR)


Presentation Transcript


  1. Geographical Web Search Engines and Geographical Information Retrieval (GIR). Christopher Jones, Cardiff University. Edinburgh Euro GeoInf 2007

  2. Where is Geo-information? • Personal knowledge (in our heads): of landscape, of where things, people and services are located, where things happened… • Documents (various media): lists of where facilities, resources and structures are located; textual descriptions of geographic phenomena; images and videos of geographic space • Maps

  3. GIS and the Web
     The World Wide Web is: • Globally networked • Supports everyone on the Internet • Accessed publicly • Vast range of topics • Unstructured free text / images • Finds documents • Easy to use
     A GIS is typically: • Isolated • Supports an individual organisation • Accessed privately • Small range of topics • Structured data / geo-coded locations • Finds answers • Complicated to use

  4. WWW as a source of geo-information • Geographic context embedded in natural language descriptions • Web queries depend on exact match of text terms • No intelligent interpretation of spatial relationships (“near”, “west” etc) • Place names ambiguous and confused with names of organisations, people, buildings and streets • No geo-relevance ranking

  5. Current motivation of GIR: find geo-specific resources on the Web (mostly documents and images). Queries take the form "find web resources about Something related_to Somewhere", where related_to = in, near, within X km, north_of, etc. • Resolve ambiguity of names (many places have the same name) • Interpret the spatial relationship in the query to obtain a query footprint • Find documents geographically associated with the region of the query footprint • Relevance rank geographically by place and subject
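The "Something related_to Somewhere" query form on slide 5 can be illustrated with a small parser. This is only a sketch: the regular expression, the exact list of relation phrases and the function name `parse_query` are assumptions made for illustration, not part of SPIRIT or any other system described in the talk.

```python
import re

# Relation phrases loosely based on the slide ("in, near, within X km, north_of");
# the precise spellings accepted here are an assumption.
RELATION_PATTERN = re.compile(
    r"^(?P<something>.+?)\s+"
    r"(?P<relation>in|near|outside|north of|south of|east of|west of|within \d+\s*km of)\s+"
    r"(?P<somewhere>.+)$",
    re.IGNORECASE,
)


def parse_query(query):
    """Split a query into (something, relation, somewhere), or return None."""
    match = RELATION_PATTERN.match(query.strip())
    if match is None:
        return None
    return (
        match.group("something"),
        match.group("relation").lower(),
        match.group("somewhere"),
    )


if __name__ == "__main__":
    for q in ["castles in Wales", "airports near Leicester", "hotels within 30 km of Glasgow"]:
        print(q, "->", parse_query(q))
```

A real system would still need to resolve the place name (the "Somewhere" part) against a gazetteer or place ontology before a query footprint could be built.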

  6. GIR, GIS and The Web [Diagram; labels: GIS, Geo-knowledge, GIR, World Knowledge, The Web]

  7. Geographical Search Engines • Google etc. have “local” versions, based on business (yellow pages) directories.

  8. Geographical Search Engines: SPIRIT research prototype for general geo-web search. Structured user interface: dropdown menu of spatial relationships.

  9. Geographical search engines: SPIRIT. Results listed as URLs, plus symbols on a map. User interface screenshots from Ross Purves et al., University of Zurich.

  10. Anatomy of a Geographical Search Engine [Architecture diagram; components: User Interface, Broker, Query disambiguation, Place Ontology, Query footprint, Search Engine, Relevance Ranking, Textual and Spatial Indexes, Document Footprints, Text Indexing, Geo-tagging, Web Resources; flows: Search Request + Query footprint, Unranked Results, Ranked Results]

  11. Geo-Tagging = Geo-parsing + Geo-coding • Geo-parsing: recognising genuine geographic references (place names, addresses, post codes, phone codes) while ignoring non-geographic uses • Geo-coding: attaching a unique quantitative location (footprint) to each geographic reference

  12. Geo-Parsing: true & false references. Some types of false geographic reference: • Personal names: Smedes York • Business names: Dorchester Hotel, York Properties… • Street names: Oxford Street, London Road… • Common words that are also places: urban, institute, land, battle, derby, over, well, …

  13. Geo-Parsing: distinguishing between false and true geo-references • Look for patterns and context • Personal names (Jack London, Mr York): <First_name> <Location>; <Title> <Location> • Business names (Paris Hotel): <Business_type> <Location> (or vice versa) • Street names (Oxford Street): <Location> <Road_type> • Detect spatial prepositions (in, near, south of, outside etc.): “he lived in Over” • Genuine occurrences can be used to train machine learning models
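The context patterns on slide 13 can be sketched as a rule-based classifier over the tokens either side of a candidate place name. Everything below is illustrative: the word lists are tiny hypothetical vocabularies, and the function name `classify_candidates` and its labels are assumptions rather than the method used in any particular geo-parser.

```python
import re

# Hypothetical, deliberately tiny vocabularies for illustration only.
GAZETTEER = {"London", "York", "Paris", "Oxford", "Over"}
FIRST_NAMES = {"Jack", "Smedes"}
TITLES = {"Mr", "Mrs", "Dr"}
BUSINESS_TYPES = {"Hotel", "Properties"}
ROAD_TYPES = {"Street", "Road", "Avenue"}
SPATIAL_PREPOSITIONS = {"in", "near", "outside", "of"}


def classify_candidates(text):
    """Label each gazetteer match using the token before and after it."""
    tokens = re.findall(r"[A-Za-z]+", text)
    results = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        if prev_tok in FIRST_NAMES or prev_tok in TITLES:
            label = "personal_name"      # <First_name> <Location>, <Title> <Location>
        elif next_tok in BUSINESS_TYPES or prev_tok in BUSINESS_TYPES:
            label = "business_name"      # <Location> <Business_type> or vice versa
        elif next_tok in ROAD_TYPES:
            label = "street_name"        # <Location> <Road_type>
        elif prev_tok.lower() in SPATIAL_PREPOSITIONS:
            label = "place"              # preceded by a spatial preposition
        else:
            label = "unknown"
        results.append((token, label))
    return results


if __name__ == "__main__":
    sample = "Jack London stayed at the Paris Hotel on Oxford Street; he lived in Over"
    print(classify_candidates(sample))
```

Hand-written rules like these are brittle. As the slide notes, occurrences labelled this way could instead serve as training data for a learned classifier.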

  14. Geo-coding (grounding) the genuine geo-references • Many different places have the same name (referent ambiguity): Newport, Cambridge, Springfield… • Use context to decide (references to parent or nearby places) • Or choose the most important one (by population or place type hierarchy)
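The two strategies on slide 14 (use context, otherwise fall back to the most important candidate) can be sketched as follows. The `Candidate` class, the `ground` function and the importance numbers are all hypothetical; the numbers are placeholders, not real population statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str        # containing region, as it might appear in a gazetteer
    importance: int    # placeholder stand-in for population or place-type rank


# Hypothetical gazetteer entries; importance values are invented for illustration.
CANDIDATES = {
    "Cambridge": [
        Candidate("Cambridge", "England", 100),
        Candidate("Cambridge", "Massachusetts", 90),
    ],
}


def ground(name, context_places):
    """Pick one referent for an ambiguous name.

    Candidates whose parent is mentioned in the document context are preferred;
    if none match, the most important candidate overall is chosen.
    """
    candidates = CANDIDATES.get(name, [])
    if not candidates:
        return None
    supported = [c for c in candidates if c.parent in context_places]
    pool = supported or candidates
    return max(pool, key=lambda c: c.importance)


if __name__ == "__main__":
    print(ground("Cambridge", {"Massachusetts", "Boston"}))
    print(ground("Cambridge", set()))
```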

  15. Anatomy of a Geographical Search Engine [architecture diagram repeated from slide 10]

  16. Indexing Web Resources • Standard text index is an inverted file • Query: “Restaurants in Cardiff”: find documents that contain all terms • Works literally for “in” but won’t find contained places • Doesn’t work in general for “near”, “X km from”, “north_of” etc.
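The limitation described on slide 16 can be demonstrated with a minimal inverted file and a conjunctive (all-terms) query. The three documents are invented for illustration; Roath is a district of Cardiff, so the second document is geographically relevant but is missed by exact term matching.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def and_query(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


if __name__ == "__main__":
    docs = {
        1: "restaurants in cardiff",
        2: "restaurants in roath",   # inside Cardiff, but never says "cardiff"
        3: "hotels near cardiff",
    }
    index = build_inverted_index(docs)
    print(and_query(index, "restaurants in cardiff"))
```

Only document 1 is returned: the word "in" is matched as a literal term, and nothing tells the index that Roath lies within Cardiff.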

  17. Why Spatial Indexing? • Query: “Hotels outside and within 30 km of Glasgow”: need to find documents referring to hotels that are in places other than Glasgow • Query: “Castles in Wales”: need to find documents that refer to names of places in Wales (perhaps without mentioning “Wales”) • In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Glasgow but within 30 km of it

  18. Spatial indexing of resources • Use dominant geographic references of documents to create document footprints (point, polygon, bounding rectangle…) • Use footprints to index documents • Convert query to a query footprint • Match query footprint to document footprints [Figure: spatial query and result]
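The four steps on slide 18 can be sketched with bounding-rectangle footprints and a simple uniform grid as the spatial index. The grid is an assumption chosen for brevity (a production system might well use an R-tree or similar structure), and the class names, document ids and coordinates are hypothetical.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding rectangle used as a footprint."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a, b):
    """True if the two rectangles overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


class GridIndex:
    """Uniform grid: each cell lists the documents whose footprint covers it."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(set)
        self.footprints = {}

    def _cells(self, box):
        x0 = math.floor(box.min_x / self.cell_size)
        x1 = math.floor(box.max_x / self.cell_size)
        y0 = math.floor(box.min_y / self.cell_size)
        y1 = math.floor(box.max_y / self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def add(self, doc_id, footprint):
        self.footprints[doc_id] = footprint
        for cell in self._cells(footprint):
            self.cells[cell].add(doc_id)

    def query(self, query_footprint):
        # Gather candidates from the covered cells, then check exact overlap.
        candidates = set()
        for cell in self._cells(query_footprint):
            candidates |= self.cells.get(cell, set())
        return {d for d in candidates
                if intersects(self.footprints[d], query_footprint)}


if __name__ == "__main__":
    index = GridIndex(cell_size=1.0)
    index.add("doc_a", Box(0.2, 0.2, 0.8, 0.8))
    index.add("doc_b", Box(5.0, 5.0, 6.0, 6.0))
    index.add("doc_c", Box(0.9, 0.9, 2.5, 2.5))
    print(sorted(index.query(Box(0.0, 0.0, 1.0, 1.0))))
```

The query never needs to name the places inside its footprint, which is the property that text indexing alone lacks.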

  19. Anatomy of a Geographical Search Engine [architecture diagram repeated from slide 10]

  20. Geographical Relevance Ranking • Determine the “distance” between query footprint and document footprint • Depends on the query spatial operator (in, outside, X km from, north_of etc.) • The result is a spatial score • Example: airports near Leicester; the further away, the lower the spatial score [Figure with labels Q and D, from Marc van Kreveld, University of Utrecht]
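For the "near" operator in the Leicester example, one simple way to turn distance into a spatial score is a linear decay that reaches zero at a cut-off distance. The decay shape, the 50 km default and the use of planar coordinates in kilometres are all assumptions made for this sketch; the slides do not specify the scoring function, and other operators (in, outside, north_of) would need their own functions.

```python
import math


def near_score(query_point, doc_point, cutoff_km=50.0):
    """Spatial score in [0, 1] for a 'near' query.

    Points are (x, y) pairs in a planar coordinate system measured in km.
    The score is 1 at zero distance and falls linearly to 0 at cutoff_km.
    """
    distance = math.hypot(doc_point[0] - query_point[0],
                          doc_point[1] - query_point[1])
    return max(0.0, 1.0 - distance / cutoff_km)


if __name__ == "__main__":
    query = (0.0, 0.0)
    for doc in [(0.0, 0.0), (30.0, 40.0), (300.0, 400.0)]:
        print(doc, near_score(query, doc, cutoff_km=100.0))
```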

  21. Combining textual and spatial scores • Textual scores: BM25 • Spatial scores: by spatial footprint analysis [Plot: x-axis spatial score (0 to 1), y-axis normalized BM25 score (0 to 1); labels: query / ideal footprint, footprints of documents. Figure from Marc van Kreveld, University of Utrecht]
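The plot on slide 21 places each document in a unit square of spatial score against normalized BM25 score. One possible way to combine the two, shown here purely as an assumption, is to rank documents by their Euclidean distance from the ideal corner (1, 1); a weighted sum would be an equally plausible alternative. BM25 scores are normalized by dividing by the largest score in the result set, which is also an assumption.

```python
import math


def combined_score(text_score, spatial_score):
    """Score in [0, 1]: 1 at the ideal point (1, 1), near 0 at (0, 0)."""
    distance_to_ideal = math.hypot(1.0 - text_score, 1.0 - spatial_score)
    return 1.0 - distance_to_ideal / math.sqrt(2.0)


def rank(results):
    """Rank documents given {doc_id: (raw_bm25, spatial_score)}.

    Raw BM25 scores are divided by the maximum so both axes span [0, 1].
    """
    top = max((bm25 for bm25, _ in results.values()), default=0.0)
    scored = {}
    for doc_id, (bm25, spatial) in results.items():
        text = bm25 / top if top > 0 else 0.0
        scored[doc_id] = combined_score(text, spatial)
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical result set: (raw BM25 score, spatial score).
    results = {"a": (10.0, 0.2), "b": (8.0, 0.9), "c": (2.0, 0.95)}
    print(rank(results))
```

In this example document "b" comes first: it is not the best on either axis, but it is closest to the ideal corner overall.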

  22. Anatomy of a Geographical Search Engine [architecture diagram repeated from slide 10]

  23. Place Ontology. Encodes knowledge of terminology and structure of geographic space: • alternative names, languages • place types (political, topographic, social…) • footprint (point, MBR, polygon) • spatial relationships and attributes: containment, adjacency, overlap • imprecise (vernacular) places (“Midlands”, “south of France”, “Scottish borders”, “Pennines”, “Highlands”…) Derived from gazetteers, thesauri, maps and the web.
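The kinds of knowledge listed on slide 23 can be pictured as a small data structure. The sketch below covers only alternative names, place type, a point footprint, containment and a vernacular flag; the class and field names are hypothetical, and the coordinates are approximate values included just to make the example concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    place_id: str
    name: str
    place_type: str                                  # e.g. political, topographic
    alt_names: List[str] = field(default_factory=list)
    point: Optional[Tuple[float, float]] = None      # approximate (lat, lon)
    parent_id: Optional[str] = None                  # containment relationship
    vernacular: bool = False                         # imprecise, informal region


class PlaceOntology:
    def __init__(self):
        self.places: Dict[str, Place] = {}
        self.by_name: Dict[str, List[str]] = {}

    def add(self, place):
        self.places[place.place_id] = place
        for name in [place.name, *place.alt_names]:
            self.by_name.setdefault(name.lower(), []).append(place.place_id)

    def lookup(self, name):
        """All places known by this name (names can be ambiguous)."""
        return [self.places[i] for i in self.by_name.get(name.lower(), [])]

    def is_within(self, child_id, ancestor_id):
        """Follow parent links upward to test containment."""
        current = self.places[child_id].parent_id
        while current is not None:
            if current == ancestor_id:
                return True
            current = self.places[current].parent_id
        return False


if __name__ == "__main__":
    onto = PlaceOntology()
    onto.add(Place("wales", "Wales", "political", alt_names=["Cymru"]))
    onto.add(Place("cardiff", "Cardiff", "political", alt_names=["Caerdydd"],
                   point=(51.48, -3.18), parent_id="wales"))
    print(onto.lookup("Caerdydd")[0].name, onto.is_within("cardiff", "wales"))
```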

  24. Roles of Place Ontology [Diagram; labels: Web collection, Metadata Extraction, Geo-Tagging, document footprints, Spatial Index, User Interface, Query Disambiguation, Search Component, Query Expansion (query footprint), Relevance Ranking, ontology]

  25. Mining text on the web for vernacular place name knowledge • Objective: estimate the spatial extent of a vague place • Documents that refer to vague places may also refer to more precise places inside them • Places that occur frequently in association with a target named place may have a higher chance of being inside it • Analyse the frequency of occurrence of co-located places
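The frequency analysis on slide 25 can be sketched as a count of how many documents mentioning the vague target place also mention each known precise place. The function name and the sample documents are hypothetical, and plain substring matching stands in for a real geo-parser.

```python
from collections import Counter


def cooccurring_places(documents, target, known_places):
    """Count documents that mention both the target and each known place."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        if target.lower() not in lowered:
            continue  # only documents about the vague place are considered
        for place in known_places:
            if place.lower() in lowered:
                counts[place] += 1
    return counts


if __name__ == "__main__":
    # Invented snippets standing in for retrieved web documents.
    documents = [
        "Walking in the Cotswolds near Cirencester",
        "Cotswolds cottages in Burford and Cirencester",
        "Hotels in Cardiff",
    ]
    counts = cooccurring_places(documents, "Cotswolds",
                                ["Cirencester", "Burford", "Cardiff"])
    print(counts.most_common())
```

Places with high counts are candidates for lying inside the vague region. Their known footprints could then be used to estimate its extent, though how that estimate is formed is not specified on the slide.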

  26. Places mentioned in documents retrieved by queries on the “Cotswolds” [Figure from Ross Purves et al., University of Zurich]

  27. GIR and GIS • GIR is currently dominated by web search • Unstructured results in multiple documents • Sometimes a single focused result is wanted, for example: Hotels within 1 kilometre of the British Museum in London; Where are pre-sixteenth century dwellings in the USA?; Which areas of East Anglia would be flooded if sea level rose by 1 metre?

  28. Bringing GIR and GIS together [Two diagrams; labels in each: GIS, Geo-knowledge, GIR, World Knowledge, The Web]

  29. GeoInformation Services. Encode geo-information in Web Services (Geo-services): • Parse natural language queries • Interpret geo-terminology of queries • Identify the relevant geo-services to match geo and non-geo concepts • Compose an appropriate chain of services

  30. EU TRIPOD Project • Improve accessibility of images on the web • Focus on geographical context • Enhance captions / metadata for archival images • Automatically generate captions for images from location- and orientation-aware cameras • Web harvesting to enrich metadata • Interpret (vague) spatial natural language • Toponym ontology of places and landmarks (including vernacular places) • Use 3D landscape models to determine what is in camera view • Prototype image search engine. http://tripod.shef.ac.uk/index.html

  31. Future of GIR? • Improve “conventional GIR” components: geo-tagging, spatio-textual indexing and geo-relevance ranking • Place ontologies with world-wide coverage • Understanding of spatial natural language • Integrate time & space (temporal language) • Open GeoInformation Web services • Adapt GIR to personal needs & location

  32. More Information • See www.geo-spirit.org for information on the SPIRIT project and downloads of articles and project deliverables. [N.B. The prototype search engine (linked from the SPIRIT web site) is no longer functional.] • TRIPOD: www.ProjectTripod.org
