Using Web Snippets and Query-Logs to Measure Implicit Temporal Intents in Queries
Ricardo Campos (1, 2, 4), Alípio Jorge (3, 4), Gaël Dias (2)
1 Tomar Polytechnic Institute, Tomar, Portugal
2 Centre of Human Language Technology and Bioinformatics, University of Beira Interior, Covilhã, Portugal
3 Faculty of Sciences, University of Oporto, Oporto, Portugal
4 LIAAD-INESC Porto L.A., Oporto, Portugal
QRU 2011: 2nd International Query Representation and Understanding Workshop, in association with SIGIR 2011, Beijing, China, July 28, 2011
[hultig.di.ubi.pt] [www.ipt.pt] [www.liaad.up.pt]
Introduction / Motivations: Official Website
Query: Lady Gaga
Result: the official website
Identifying the intent behind such a query is a particularly hard task, which can become even more difficult if the user is not clear about his or her purpose.
Ricardo Campos, Alípio Jorge, Gaël Dias
Introduction / Motivations: Informative and Rumor Texts
Query: Lady Gaga
Informative texts: "Rihanna passes Gaga as Facebook's most popular lady…"
Rumor texts: "Lady Gaga, queen of extravagant fashion, is planning to intern for ... the milliner confirmed the rumors that the 'Born This Way' singer and he were ..."
Introduction / Motivations: Biography and Discography
Query: Lady Gaga
Biography
Discography release
Introduction / Motivations: Tour Dates
Query: Lady Gaga
Tour dates
Understanding the temporal nature of a query, particularly of implicit temporal queries, is one of the most interesting challenges in Temporal Information Retrieval (T-IR) (Berberich et al., 2010). Meeting it would make it possible to apply specific strategies to improve the retrieval of web search results.
Introduction / Difficulties: Dealing with Implicit Temporal Queries Is Difficult
However, this can prove to be a particularly difficult task, for three reasons:
1. Different semantic concepts can be related to the same query;
2. It is difficult to define the boundary between what is temporal and what is not, and therefore to define temporal ambiguity;
3. Even if human annotators can infer temporal intents, it remains an open question how to transpose this to an automatic process.
Introduction / Objectives: Understanding the Temporal Nature of Implicit Temporal Queries
In our work, we aim to understand whether temporal information can be used to automatically disambiguate query terms, namely in implicit temporal queries.
Introduction / Different Approaches in the Extraction of Temporal Information: Metadata-Based Approach
The extraction of temporal information is usually based on a metadata-based approach over time-tagged controlled collections, such as news articles, using the timestamp of the document. This information is particularly useful to anchor relative temporal expressions found in a document (e.g., "today") to a concrete date (e.g., the document creation time):
"Jun 16, 2009 – The city of São Paulo shall have to make use of the Credicard Hall as the venue for the 2011 Miss Universe. Today was also announced that Miss Morumbi show is going to be on July 27, 2009." (From Miss Universe.com)
However, this can be a tricky process when used to date implicit temporal queries, as the time of the document can differ significantly from the time its content actually refers to.
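To make the metadata-based idea concrete, the following is a minimal sketch of anchoring a relative expression to the document creation time. The lookup table and the function name anchor_relative_expression are hypothetical illustrations, not part of the presented work; a real temporal tagger covers many more expressions.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical lookup table: day offsets for a few relative expressions.
# A real temporal tagger handles far more cases ("last week", weekdays, ...).
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def anchor_relative_expression(expression: str, creation_time: date) -> Optional[date]:
    """Resolve a relative expression against the document creation time."""
    offset = RELATIVE_OFFSETS.get(expression.strip().lower())
    if offset is None:
        return None  # not a relative expression this sketch recognizes
    return creation_time + timedelta(days=offset)


# Timestamp of the Miss Universe news item (Jun 16, 2009).
dct = date(2009, 6, 16)
print(anchor_relative_expression("Today", dct))  # 2009-06-16
```

The sketch also shows the limitation noted on the slide: the anchor is the publication time, so a sentence about the 2011 event is still dated 2009 by this method.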
Introduction / Different Approaches in the Extraction of Temporal Information: Content Approach, Query Dependency and Query Logs
One possible solution is to look for related temporal references in complementary web resources:
Content-related resources, based on a web content approach: these simply require the set of web search results;
Query-log resources, based on similar year-qualified queries: these imply that some versions of the query have already been issued.
Outline: Introduction; Web Snippets (Content-Related Resources); Web Query Logs (Query-Log Resources); Conclusions
Web Snippets / Temporal Information: Temporal Evidence within Web Pages
One of the most interesting approaches to dating implicit temporal queries is to exploit the temporal evidence found within web pages:
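A minimal sketch of how such temporal evidence could be pulled out of a title, snippet or URL, assuming that a "date" is approximated by a four-digit year (the presentation focuses on years but does not specify its extractor). The pattern and the function name extract_years are illustrative assumptions.

```python
import re
from typing import List

# Assumption: a date is approximated by a four-digit year from 1000 to 2999
# that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)([12]\d{3})(?!\d)")


def extract_years(text: str) -> List[int]:
    """Return every candidate year found in a title, snippet or URL, in order."""
    return [int(match) for match in YEAR_PATTERN.findall(text)]


snippet = ("Jun 16, 2009. The city of São Paulo shall have to make use of "
           "the Credicard Hall as the venue for the 2011 Miss Universe.")
print(extract_years(snippet))  # [2009, 2011]
```

Such a pattern will also accept non-date numbers such as "1500 people", which is one reason the association between a found year and the query remains hard.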
Web Snippets / Difficulties: Correlation between the Dates and Query Concepts
However, using web documents to date queries that carry no explicit temporal information can be a tricky process. The main problem lies in associating the year dates found in a document with the query:
Web Snippets / Temporal Value: Measures
In this work we aim to determine the temporal value of web snippets (example queries: Oil Spill; BP Oil Spill; Waka Waka). For each query q, we compute the fraction of retrieved snippets that contain a date:
$$\mathrm{TSnippets}(q) = \frac{\#\,\text{Snippets Retrieved with Dates}}{\#\,\text{Snippets Retrieved}}$$
TTitle(.) and TUrl(.) are defined in the same way over the titles and URLs of the retrieved results.
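The ratio can be sketched as follows, assuming that "contains a date" is approximated by "contains a four-digit year". The function name temporal_value, the zero-results convention and the sample snippets are hypothetical; passing titles or URLs instead of snippets gives the analogous TTitle(.) and TUrl(.) values.

```python
import re
from typing import Sequence

# Assumption: a date is approximated by a standalone four-digit year.
YEAR_PATTERN = re.compile(r"(?<!\d)[12]\d{3}(?!\d)")


def temporal_value(items: Sequence[str]) -> float:
    """Share of retrieved items (snippets, titles or URLs) that contain a year.

    Applied to snippets this approximates TSnippets(q); applied to titles or
    URLs it approximates TTitle(q) and TUrl(q).
    """
    if not items:
        return 0.0  # assumption: no results retrieved means no temporal evidence
    with_dates = sum(1 for item in items if YEAR_PATTERN.search(item))
    return with_dates / len(items)


# Hypothetical snippets for the query "BP Oil Spill" (illustration only).
snippets = [
    "Timeline of the 2010 BP oil spill in the Gulf of Mexico",
    "Live coverage of the BP oil spill clean-up",
    "BP oil spill claims process explained",
    "What we learned from the spill, 2011 report",
]
print(temporal_value(snippets))  # 0.5
```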
Web Snippets / Temporal Classification: Temporal Ambiguity Value
Each query was classified on the basis of its temporal ambiguity value TA(q):
If TA(q) < 10%, then the query is ATemporal; otherwise, the query is Temporal.
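The decision rule itself is a single threshold. In the sketch below, TA(q) is taken as a given input expressed as a fraction, because the slide does not state how TA(q) is computed from TTitle(.), TSnippets(.) and TUrl(.); the function name and the example values are hypothetical.

```python
def classify_query(ta: float, threshold: float = 0.10) -> str:
    """Label a query from its temporal ambiguity value TA(q), given as a fraction in [0, 1]."""
    # TA(q) below 10% -> ATemporal; 10% or more -> Temporal.
    return "ATemporal" if ta < threshold else "Temporal"


# Hypothetical TA(q) values for two placeholder queries (illustration only).
for query, ta in [("query A", 0.04), ("query B", 0.32)]:
    print(query, "->", classify_query(ta))
```

Note that a query whose TA(q) is exactly 10% falls on the Temporal side, matching the "Else" branch of the rule.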
Web Snippets / Temporal Classification: Evaluation
To evaluate our simple classification model, we conducted a user study. Human annotators were asked to consider each of the 176 queries, to look at the web search results, and to classify each query as ATemporal or Temporal. Overall, human annotators classified 35% of the queries as implicitly temporal, whereas our methodology classified only 25% as such.
Web Query Logs / Temporal Information: Completion Search-Engine Features
Another approach to dating implicit temporal queries is to use web query logs, based on similar year-qualified queries. For example, for the query "Bp oil spill", suggested completions include:
Bp oil spill live feed
Bp oil spill 2010
Bp oil spill map
Bp oil spill claims
Web Query Logs / Difficulties: Web Query Logs Drawbacks
Extremely hard to access outside the big industrial labs;
Highly dependent on the users' own intents: queries that have never been typed do not exist in the web search log, e.g., "Blaise Pascal 1623" (the year of his birth);
Not suited to concept disambiguation, e.g., the query "Euro" can refer to Euro 2008 or Euro 2012.
Web Query Logs / Temporal Value: Measures
Explicit temporal queries represent only 1.21% of the overall set of queries [5]. Furthermore, we must take into account that the simple fact that a query is year-qualified does not necessarily mean that it has a temporal intent.
Similarly to TTitle(.), TSnippets(.) and TUrl(.), we define TLogYahoo(.) and TLogGoogle(.) over the suggested queries:
$$\mathrm{TLogGoogle}(q) = \frac{\#\,\text{Suggested Queries Retrieved with Dates}}{\#\,\text{Suggested Queries Retrieved}}$$
TLogYahoo(.) is defined analogously over the Yahoo! suggestions.
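As a worked example of TLogGoogle(q), the sketch below applies the ratio to the four "Bp oil spill" completions listed in the Completion Search-Engine Features example, again assuming that a date is detected as a standalone four-digit year. The function name tlog is hypothetical.

```python
import re
from typing import Sequence

# Assumption: a date is approximated by a standalone four-digit year.
YEAR_PATTERN = re.compile(r"(?<!\d)[12]\d{3}(?!\d)")


def tlog(suggestions: Sequence[str]) -> float:
    """#suggested queries with dates / #suggested queries (0.0 if none returned)."""
    if not suggestions:
        return 0.0
    dated = sum(1 for s in suggestions if YEAR_PATTERN.search(s))
    return dated / len(suggestions)


# Completions for "Bp oil spill" from the auto-complete example.
suggestions = [
    "Bp oil spill live feed",
    "Bp oil spill 2010",
    "Bp oil spill map",
    "Bp oil spill claims",
]
print(tlog(suggestions))  # 0.25
```

Only one of the four completions is year-qualified, so the value is 1/4; the same function would compute TLogYahoo(q) from Yahoo! suggestions.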
Web Query Logs / Temporal Value: Results
We computed the Pearson correlation coefficient between each pair of dimensions: TSnippets(.), TLogGoogle(.), TTitle(.), TLogYahoo(.) and TUrl(.). The results show that, as dates appear in titles and snippets, they also tend to appear, albeit less frequently, in Google's auto-complete query suggestions.
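For reference, the Pearson coefficient between two dimensions (one value per query) can be sketched with the standard library only. The per-query numbers below are hypothetical placeholders, not the study's data.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-query values for two dimensions (illustration only).
t_snippets = [0.20, 0.35, 0.05, 0.50]
t_log_google = [0.00, 0.10, 0.00, 0.25]
print(round(pearson(t_snippets, t_log_google), 3))
```

On Python 3.10 or later, statistics.correlation computes the same coefficient.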
Web Query Logs / Temporal Value: Results
Further analysis showed that temporal information is more frequent in web snippets than in either the Google or the Yahoo! query logs. While most queries have a TSnippets(.) value of around 20%, their TLogYahoo(.) and TLogGoogle(.) values are mostly close to 0%.
Web Query Logs / Temporal Value: Results
Finally, we studied how strongly a given query is associated with a set of different dates, both in web snippets and in web query logs. To this end, we built a confidence interval for the difference of means, for paired samples, between the number of times that dates appear in the web snippets and in each web query log:
Web snippets vs. TLogGoogle(.): [5.12; 6.43]
Web snippets vs. TLogYahoo(.): [5.10; 6.38]
Both intervals lie entirely above zero, so the number of different dates that appear in web snippets is significantly higher than in either of the two web query logs.
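A paired-sample confidence interval for the difference of means can be sketched as below. The default critical value 1.96 is an assumption (a normal approximation for a 95% interval with many queries); for small samples it should be replaced by the Student t quantile with n - 1 degrees of freedom, e.g. scipy.stats.t.ppf(0.975, n - 1). The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_mean_diff_ci(a: Sequence[float], b: Sequence[float],
                        critical: float = 1.96) -> Tuple[float, float]:
    """Confidence interval for mean(a - b) over paired observations (one pair per query)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need paired samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    center = mean(diffs)
    # Standard error of the mean difference, scaled by the critical value.
    half_width = critical * stdev(diffs) / sqrt(len(diffs))
    return center - half_width, center + half_width


# Hypothetical per-query date counts: snippets vs. one query log.
in_snippets = [7, 9, 6, 8, 10]
in_log = [1, 2, 1, 3, 2]
print(paired_mean_diff_ci(in_snippets, in_log))
```

If the whole interval lies above zero, as in both reported intervals, the snippets yield significantly more dates at the chosen confidence level.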
Conclusions: Temporal Value of Web Snippets and Web Query Logs
In this paper, we showed that web snippets are a very rich source of temporal information, especially years. Dates often appear correlated in snippets and titles, and some items even contain more than one date;
Future dates are very common in web snippets but seldom used in queries;
Dates mostly appear together with the automotive, sports and politics categories, both in web snippets and in web query logs;
Contrary to web snippets, web query logs have a very small temporal value (about 1.2%), which is statistically smaller than that of web snippets.
Conclusions: Query Understanding Based on Web Snippets
Our experiments also showed that web snippets can be used for query understanding. We introduced a simple model for the temporal classification of queries, based on the temporal value of web snippets, which classified 25% of the queries as temporal. This contrasts with the 35% obtained in our user study. Therefore, complementary information, such as the number of instances or the number of different dates, should be considered in future approaches.
Conclusions: Thanks for your attention!
Both experimental datasets are available for download at www.ccc.ipt.pt/~ricardo/software
VipAccess is online at http://hultig.di.ubi.pt/vipaccess
HULTIG is online at http://hultig.di.ubi.pt
LIAAD is online at http://liaad.up.pt
Polytechnic Institute of Tomar is online at http://www.ipt.pt
Gaël Dias is online at http://www.di.ubi.pt/~ddg
Alípio Jorge is online at http://liaad.up.pt/~amjorge