LAMI Spring 2014

LAMI Spring 2014: Search Engine and Services. Presented by Edgar Cornejo, 03.03.14

Outline • Mobile information search for location-based information • Web-a-Where: Geotagging Web Content • The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet

Mobile information search for location-based information. Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao. Department of Industrial Engineering, Tsinghua University, Beijing, China. April 2010

Mobile search for location-based information • The study investigated the effects of location and information type on mobile searching for location-based information by carrying out two experiments in an airport

Mobile search scenario • Many environmental disturbances • High time pressure • Restricted user operations • Device limitations (screen size, input method)

Mobile searching context • Since most of the information is location-based [1,2], a search engine can improve its results by analyzing both the information query and the user's location • Information queries + location → more suitable results

Features of mobile interaction [3] • Users may be involved in tasks that demand a high level of visual attention • Users' hands are often used to manipulate physical objects

Features of mobile interaction [3] • Users may be highly mobile during the task and interact at high speed

Search queries * According to a large-scale study of European mobile search behavior conducted in 2008 [4]

Proposed factors that may influence mobile information search

Experiment 1 - Hypotheses • Hypothesis 1 • For information searches in mobile versus non-mobile settings: • The average number of clicks is lower • The first search result is more important • Free recall is worse

Experiment 1 - Hypotheses • Hypothesis 2 • For searches for location-based versus non-location-based information: • The number of clicks is lower • The first search result is more important • Free recall is better

Experiment 1 - Tasks

Experiment 1 - Results

Experiment 2 - Hypotheses • Hypothesis 3 • For mobile information searches under high versus low information-requirement pressure: • The average number of clicks is lower • The first search result is more important • Free recall is worse

Experiment 2 - Hypotheses • Hypothesis 4 • For mobile searches with informational or navigational versus transactional queries: • The number of clicks is greater • The first search result is less important • Free recall is worse

Experiment 2 - Tasks

Experiment 2 - Results

Experiment 2 - Results

Summary • Information type (location-based vs. non-location-based) was found to affect user performance during the information search process • Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process • The first two search results were found to be very important for search efficiency and user satisfaction

Web-a-Where: Geotagging Web Content. Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer. IBM Haifa Research Lab, Haifa 31905, Israel. July 2004

Web-a-Where: Geotagging Web Content • A system for associating geography with Web pages • Locates mentions of places and determines the place each name refers to • Assigns to each page a geographic focus: a locality that the page discusses as a whole • Implemented within the framework of the IBM WebFountain data mining system

Web-a-Where: Geotagging Web Content • A page may have two types of geography associated with it: a source and a target • Source geography has to do with the origin of the page: its physical location, the address of its author, etc. • Target geography is determined by the contents of the page and relates to the topic the page is discussing

Ambiguities • Geo/non-geo ambiguity occurs when a place name has another, non-geographic meaning, e.g. Mobile (Alabama) or Reading (England) • Geo/geo ambiguity arises when two or more distinct places have the same name

System Components • Geotagger (main component) • Finds and disambiguates geographic names • Assigns a taxonomy node to each phrase in the text that refers to a place, e.g. Paris/France/Europe • The gazetteer • Database that keeps the list of geographic names, their canonical taxonomy nodes and other information

Tagging individual place names • The processing of a page is done in three phases: • Spotting • Disambiguation • Focus determination

1. Spotting place name candidates • Finding all the possible geographic names in each page • Short abbreviations, e.g. IN (for Indiana) or AT (for Austria), are not spotted on their own but are used to help disambiguate other spots, e.g. Gary, IN
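
A minimal sketch of the spotting phase, assuming a toy in-memory gazetteer. The `GAZETTEER` entries, the `MIN_SPOT_LENGTH` threshold and the `spot_candidates` helper are hypothetical and only illustrate the idea on the slide: find every gazetteer name in the text, but never spot a short abbreviation on its own.

```python
import re

# Hypothetical toy gazetteer: lower-cased place name -> candidate taxonomy nodes.
GAZETTEER = {
    "gary": ["Gary/Indiana/United States"],
    "paris": ["Paris/France/Europe", "Paris/Texas/United States"],
    "fort worth": ["Fort Worth/Texas/United States"],
    "in": ["Indiana/United States"],  # abbreviation
    "at": ["Austria/Europe"],         # abbreviation
}
MIN_SPOT_LENGTH = 3  # assumption: shorter names are never spotted on their own
MAX_WORDS = 3        # assumption: longest place name, in words


def spot_candidates(text):
    """Return (token_index, name) pairs for gazetteer names found in text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    spots = []
    i = 0
    while i < len(tokens):
        matched_words = 0
        # Try the longest phrase first so "Fort Worth" wins over "Fort".
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in GAZETTEER and len(phrase) >= MIN_SPOT_LENGTH:
                spots.append((i, phrase))
                matched_words = n
                break
        i += matched_words or 1
    return spots
```

In this sketch the abbreviation "IN" after "Gary" is skipped as a spot; a later phase can still read it as a qualifier for the preceding name.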

2. Disambiguating spots (Algorithm) • The geotagger assigns a unique meaning to spots that can be uniquely qualified (confidence 95%) • Combinations that are not unique are left unassigned • In a page with multiple spots with the same name where only one is qualified, its meaning is assigned to the others (confidence 80%) • Disambiguation context is also used to resolve spots that remain unassigned or have confidence below 70%
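
The first and third rules above can be sketched as two passes over the spots of a page. This is a simplified illustration, not the paper's implementation: the `CANDIDATES` and `ABBREVIATIONS` tables and the `disambiguate` function are hypothetical, and only the two confidence values (0.95 and 0.80) are taken from the slide.

```python
from collections import defaultdict

# Hypothetical toy data: name -> candidate taxonomy nodes, qualifier -> region.
CANDIDATES = {
    "london": ["London/England/United Kingdom", "London/Ontario/Canada"],
    "paris": ["Paris/France/Europe", "Paris/Texas/United States"],
}
ABBREVIATIONS = {"TX": "Texas", "ON": "Ontario"}

QUALIFIED_CONFIDENCE = 0.95   # uniquely qualified spot (from the slide)
PROPAGATED_CONFIDENCE = 0.80  # meaning copied from a qualified spot (from the slide)


def disambiguate(spots):
    """spots: list of (name, qualifier or None) -> list of (node or None, confidence)."""
    results = [(None, 0.0)] * len(spots)
    qualified = defaultdict(set)  # name -> meanings found through qualifiers

    # Pass 1: a qualifier that leaves exactly one candidate gives a unique meaning.
    for idx, (name, qualifier) in enumerate(spots):
        if qualifier is None:
            continue
        region = ABBREVIATIONS.get(qualifier, qualifier)
        matches = [node for node in CANDIDATES.get(name.lower(), [])
                   if region in node.split("/")[1:]]
        if len(matches) == 1:
            results[idx] = (matches[0], QUALIFIED_CONFIDENCE)
            qualified[name.lower()].add(matches[0])

    # Pass 2: unassigned spots inherit the single qualified meaning of their name.
    for idx, (name, _) in enumerate(spots):
        meanings = qualified.get(name.lower(), set())
        if results[idx][0] is None and len(meanings) == 1:
            results[idx] = (next(iter(meanings)), PROPAGATED_CONFIDENCE)
    return results
```

For a page mentioning "Paris, TX", then "Paris", then "London", the sketch tags both Paris spots as Paris/Texas/United States and leaves London unassigned.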

2. Disambiguating spots (Data sources) • The Geographic Names Information System (GNIS) for U.S. locations • world-gazetteer.com for non-U.S. locations • United Nations Statistics Division (UNSD) for countries and continents • ISO 3166-1 for country and other abbreviations

3. Focus determination • The basic idea is that if several cities from the same region are mentioned, this region is probably the focus • Sometimes a page cannot be said to have only one focus • The confidence score should be taken into account when finding the focus, giving higher weight to locations tagged with higher confidence

Example • A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three of Texas (0.75), eight of Fort Worth/Texas (0.75), three of Dallas/Texas (0.75), one of Garland/Texas (0.75), and one of Iraq (0.5) • A human was asked to judge the geographical focus of this page and responded with “It’s about Texas and perhaps also Orlando” • Indeed, that page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
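
The example can be reproduced with a simplified scoring sketch. It is not the paper's exact algorithm: each mention adds count × confidence to its own taxonomy node and a decayed share to every enclosing region, and the `DECAY` value, the `max_foci` limit and all function names are assumptions made for illustration.

```python
from collections import defaultdict

DECAY = 0.5  # assumption: share of a mention's weight passed up one taxonomy level


def node_scores(mentions):
    """mentions: list of (taxonomy_path, count, confidence), e.g. ('Dallas/Texas', 3, 0.75)."""
    scores = defaultdict(float)
    for path, count, confidence in mentions:
        parts = path.split("/")
        weight = count * confidence
        # Credit the node itself, then each enclosing region with decayed weight.
        for level in range(len(parts)):
            scores["/".join(parts[level:])] += weight * (DECAY ** level)
    return dict(scores)


def related(a, b):
    """True if the nodes are equal or one encloses the other ('Texas', 'Dallas/Texas')."""
    return a == b or a.endswith("/" + b) or b.endswith("/" + a)


def foci(mentions, max_foci=2):
    """Pick the highest-scoring nodes, skipping nodes covered by an earlier pick."""
    scores = node_scores(mentions)
    chosen = []
    for node in sorted(scores, key=scores.get, reverse=True):
        if not any(related(node, c) for c in chosen):
            chosen.append(node)
        if len(chosen) == max_foci:
            break
    return chosen


PAGE = [
    ("Orlando/Florida", 4, 0.5), ("Texas", 3, 0.75), ("Fort Worth/Texas", 8, 0.75),
    ("Dallas/Texas", 3, 0.75), ("Garland/Texas", 1, 0.75), ("Iraq", 1, 0.5),
]
print(foci(PAGE))  # prints ['Texas', 'Orlando/Florida']
```

Under these assumptions Texas scores 6.75 (its own mentions plus half the weight of its three cities), ahead of any single city, and Orlando/Florida is the best node outside Texas, which matches the human judgment quoted above.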

Evaluating geotagging precision • Geotags assigned automatically versus those defined manually

Evaluating focus • Comparison of the Web-a-Where-determined focus to the human-determined one (ODP) for ~1 million pages

Summary • The system is able to correctly tag individual place name occurrences 80% of the time and determine the correct focus of a page 92% of the time • Accuracy can be further improved • The main source of errors is geo/non-geo ambiguity

The design and implementation of SPIRIT. Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang. Department of Geography, University of Zurich, Switzerland · Department of Information Studies, University of Sheffield, UK · School of Computer Science, Cardiff University, UK · Institute of Information and Computing Sciences, Utrecht University, Netherlands · Laboratoire COGIT - Institut Geographique National, France. August 2007

The design and implementation of SPIRIT • This paper describes the design and implementation of a complete solution to geographic information retrieval

Requirements • Exhaustive retrieval of relevant documents in a specified area • Place names should be automatically identified, and interactively disambiguated • Ability to query for geographical areas whose boundaries are imprecise

Requirements • Spatial relationships between geographic entities (e.g. outside, in) should be represented • It should be possible for users to specify the area of interest on a map • Ability to view query results on a map linked to relevant web documents • Document ranking should combine both spatial and thematic aspects of document relevance

Architecture Overview (diagram labels) • Components: user interface, broker, search engine, geographical ontology, relevance ranking, textual index, spatial index, metadata (doc-to-footprint mapping), web document collection • Pre-processing: geo-parsing, geo-coding • Run-time: search request, query disambiguation, query expansion, access indexes, rank results

Functionality of the components: Pre-processing the document collection • Assigning spatial footprints to web documents: • Identify geographical references (geoparsing) • Assign them to spatial coordinates (geocoding) • The result is the document's spatial footprint

Functionality of the components: Building document indexes • Grid-based spatial indexing • For each cell of the grid, a list of document IDs was constructed from the document footprints produced by the geo-tagging process
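
A minimal sketch of such a grid index, assuming footprints are axis-aligned bounding boxes and cells are 1 x 1 coordinate units. The cell size, the box representation and both function names are assumptions for illustration, not details taken from SPIRIT.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # assumption: square grid cells of 1 x 1 coordinate units


def cells_for_box(box, cell_size=CELL_SIZE):
    """Yield the (col, row) of every grid cell overlapped by box = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    for col in range(int(min_x // cell_size), int(max_x // cell_size) + 1):
        for row in range(int(min_y // cell_size), int(max_y // cell_size) + 1):
            yield (col, row)


def build_spatial_index(footprints):
    """footprints: doc ID -> list of bounding boxes. Returns cell -> sorted list of doc IDs."""
    index = defaultdict(set)
    for doc_id, boxes in footprints.items():
        for box in boxes:
            for cell in cells_for_box(box):
                index[cell].add(doc_id)
    return {cell: sorted(ids) for cell, ids in index.items()}
```

A document whose footprint spans several cells appears in the list of each of those cells, so a query only needs to look at the cells its own footprint touches.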

Functionality of the components: Retrieving the results: “T” (Text) Scheme • Simplest approach • Retrieve all the documents that match the concept terms of the query, then filter them to return only those whose footprints intersect the geographical scope (footprint) of the place in the query
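
The text-then-filter idea can be sketched as follows, assuming an inverted index from terms to document IDs, bounding-box footprints, and AND semantics over the query terms. All names and these choices are assumptions for illustration.

```python
def intersects(a, b):
    """True if two bounding boxes (min_x, min_y, max_x, max_y) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_t(text_index, footprints, terms, query_box):
    """Pure text retrieval followed by a spatial filter on the result set."""
    # Step 1: documents that contain every concept term (assumed AND semantics).
    postings = [text_index.get(term, set()) for term in terms]
    matches = set.intersection(*postings) if postings else set()
    # Step 2: keep documents with at least one footprint intersecting the query footprint.
    return sorted(doc for doc in matches
                  if any(intersects(box, query_box) for box in footprints.get(doc, [])))
```

The spatial test runs once per text match, so the cost of this scheme grows with the number of documents matching the terms, regardless of where they are.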

Functionality of the components: Retrieving the results: “ST” (Space-Text) Scheme • More integrated approach • Regarded as a space-primary method • At search time, the cells that intersect the query footprint are determined, and then only the text indexes of those cells are searched
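
A rough sketch of a space-primary lookup, assuming one small text index per grid cell, 1 x 1 cells, and AND semantics over the query terms. The data layout and function names are assumptions for illustration.

```python
def cells_for_box(box, cell_size=1.0):
    """Return the (col, row) cells overlapped by box = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    return [(col, row)
            for col in range(int(min_x // cell_size), int(max_x // cell_size) + 1)
            for row in range(int(min_y // cell_size), int(max_y // cell_size) + 1)]


def search_st(cell_text_indexes, terms, query_box):
    """cell_text_indexes: cell -> {term -> set of doc IDs}. Space first, then text."""
    results = set()
    # Only the text indexes of cells touched by the query footprint are opened.
    for cell in cells_for_box(query_box):
        index = cell_text_indexes.get(cell)
        if index is None:
            continue
        postings = [index.get(term, set()) for term in terms]
        if postings:
            results |= set.intersection(*postings)
    return sorted(results)
```

In this layout a document is stored in the text index of every cell its footprint touches, trading extra index space for searches that never read postings from cells outside the query area.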

Functionality of the components: Retrieving the results: “TS” (Text-Space) Scheme • Better query response time • Regarded as a text-primary method • At search time, for each term, the associated documents are grouped by the spatial index cell they relate to
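
A rough sketch of a text-primary layout, in which each term's posting list is split by grid cell so that a query reads only the groups for the cells it covers. Storing the per-cell grouping inside the index, the AND semantics and all names are assumptions made for illustration.

```python
from collections import defaultdict


def build_ts_index(text_index, doc_cells):
    """text_index: term -> set of doc IDs; doc_cells: doc ID -> set of grid cells.
    Returns term -> {cell -> set of doc IDs}."""
    ts_index = {}
    for term, docs in text_index.items():
        by_cell = defaultdict(set)
        for doc in docs:
            for cell in doc_cells.get(doc, ()):
                by_cell[cell].add(doc)
        ts_index[term] = dict(by_cell)
    return ts_index


def search_ts(ts_index, terms, query_cells):
    """Look up each term first, then read only the groups for the query's cells."""
    per_term = []
    for term in terms:
        by_cell = ts_index.get(term, {})
        docs = set()
        for cell in query_cells:
            docs |= by_cell.get(cell, set())
        per_term.append(docs)
    return sorted(set.intersection(*per_term)) if per_term else []
```

Compared with the space-primary sketch, the term lookup happens once per query term rather than once per cell, which is one plausible reason for the better response time noted on the slide.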

Query interfaces

Results display

Evaluation: Performance analysis • To count as relevant to a query, a document had to be both thematically and spatially relevant • The key result of the work is that spatially aware search outperformed text-only search

Evaluation: Usability analysis

Conclusions • The paper describes a unified approach, as well as the architecture, for introducing spatial-awareness into search-engine technology • A prototype system demonstrated the effectiveness of the strategy
