Predicting the News of Tomorrow Using Patterns in Web Search Queries

Kira Radinsky, Sagie Davidovich, Shaul Markovitch. Computer Science Department, Technion – Israel Institute of Technology.

Presentation Transcript


  1. Predicting the News of Tomorrow Using Patterns in Web Search Queries. Kira Radinsky, Sagie Davidovich, Shaul Markovitch. Computer Science Department, Technion – Israel Institute of Technology

  2. Goal • Oil Peaks and Stock Market Crashes • "NEW YORK – Crude-oil futures shot up as commodities markets benefited from a surge in investor confidence. Light, sweet crude for January delivery settled $4.57, or 9.2%, higher at $54.50 a barrel on the New York Mercantile Exchange. January Brent crude on the ICE futures exchange settled $4.74, or 9.6%, higher at $53.93 a barrel." • "We find that changes in oil prices strongly predict future stock market returns in many countries in the world... The impact of this predictability on stock returns tends to be large." ("Striking Oil: Another Puzzle?", Gerben Driesprong, Benjamin Maat and Ben Jacobsen) • Humans can predict events. Can it be done automatically?

  3. Solution Outline • Identify events that occur today • More than 0.5 billion daily searches on the web (2008) • Many queries are related to current events • Analyze which events have tended, in the past, to follow events like today's • History repeats itself • Query log archives

  4. Knowledge Sources • Google Hot Trends • Technorati • Online news (Newzingo) [Figure: time axis July 08, Aug 08, Sep 08]

  5. Identifying Events [Figure: search-volume time series; labels: Hurricane Katrina, Hurricane Ivan, Hurricane Gustav, Hurricane Wilma, Hurricane Dean; time axis: July 08, Aug 08, Sep 08] Peak Detection Algorithm: Each maximum point m_y has at most two neighboring minimum points. We consider a maximum point a peak if: 1. The local maximum m_y > Δ1 (high-pass filter). 2. The difference between m_y and the lowest of its neighboring minimum points is above Δ2.
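The two peak conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `detect_peaks` is hypothetical, the series is a plain list of daily search volumes, and the start and end of the series are treated as candidate neighboring minima when no interior minimum exists on that side.

```python
from typing import List


def detect_peaks(series: List[float], delta1: float, delta2: float) -> List[int]:
    """Return indices of local maxima that satisfy both slide conditions."""
    peaks = []
    n = len(series)
    for i in range(1, n - 1):
        # Local maximum: rises into i and does not rise immediately after i.
        if not (series[i] > series[i - 1] and series[i] >= series[i + 1]):
            continue
        # Walk downhill to the neighboring minimum on each side.
        # Assumption: a series boundary counts as a minimum.
        left = i
        while left > 0 and series[left - 1] <= series[left]:
            left -= 1
        right = i
        while right < n - 1 and series[right + 1] <= series[right]:
            right += 1
        lowest = min(series[left], series[right])
        # Condition 1: high-pass filter. Condition 2: prominence above delta2.
        if series[i] > delta1 and series[i] - lowest > delta2:
            peaks.append(i)
    return peaks
```

For example, `detect_peaks(volumes, delta1=6, delta2=3)` keeps only maxima that are both high in absolute terms and clearly above their surrounding valleys; the threshold values here are arbitrary and would need tuning per term.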

  6. Prediction • Goal: For each candidate term, evaluate the likelihood that it will appear in the future, given today's terms. • Indication weight of w1 on w2: how many of the peaks of w2 (future candidate) appeared k days after w1 (today's term) • Saliency of w1: significance of the peak in the search volume. [Figure: diagram linking today's salient terms (e.g. china, hurricane, pope, texans), each with a saliency value, to future candidate terms (Gas, Storm, Weather, Flood, Evacuation, Economics, South Asia, Taliban, War); edges carry the indication weight on the candidate, and each candidate receives a likelihood to appear in k days]
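A minimal sketch of how the two quantities on this slide might be computed and combined. The slide does not give the combination formula, so summing saliency times indication weight over today's terms is an assumption made here for illustration; the function names and the example numbers are hypothetical as well. Peaks are represented as day indices.

```python
from typing import Dict, Iterable, Tuple


def indication_weight(peaks_w1: Iterable[int], peaks_w2: Iterable[int], k: int) -> float:
    """Fraction of w2's peaks that occurred exactly k days after a peak of w1."""
    w1_days = set(peaks_w1)
    w2_days = list(peaks_w2)
    if not w2_days:
        return 0.0
    hits = sum(1 for day in w2_days if day - k in w1_days)
    return hits / len(w2_days)


def score_candidates(
    saliency: Dict[str, float],
    weights: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Score each candidate term from today's salient terms.

    Assumption (not stated on the slide): contributions are combined as
    the sum of saliency(w1) * indication_weight(w1, w2).
    """
    scores: Dict[str, float] = {}
    for (w1, w2), weight in weights.items():
        if w1 in saliency:
            scores[w2] = scores.get(w2, 0.0) + saliency[w1] * weight
    return scores


# Hypothetical illustration values, not taken from the slide's figure.
today = {"hurricane": 0.9, "china": 0.7}
edge_weights = {
    ("hurricane", "gas"): 0.5,
    ("china", "gas"): 0.2,
    ("hurricane", "flood"): 0.4,
}
ranking = sorted(score_candidates(today, edge_weights).items(),
                 key=lambda item: item[1], reverse=True)
```

Sorting the scores gives a ranked list of candidate terms, which is the form the evaluation slides (precision at 100) expect.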

  7. Example: Hurricane and Gas • "Oil, Gas May Soar as Storm Shuts U.S. Gulf Production": Crude-oil and natural-gas prices may soar after Hurricane Katrina moved into production regions of the Gulf of Mexico, forcing companies including Exxon Mobil Corp. and Chevron Corp. to close operations. • "Gas Prices Rise as Industry Assesses Storm Damage": HOUSTON - Gasoline prices rose Saturday by an average of five cents a gallon across the country as the oil industry anticipated disruptions at several refineries along the Texas coast because of Hurricane Ike.

  8. Empirical Methodology • Testing on an aggregation of 4500 online news sources • What does "to appear in the news" mean? • The term appears significantly more often than its average over the past year • Measure: precision at 100
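The evaluation criterion can be sketched as follows. The slide does not define "significantly", so the threshold of the past-year mean plus a multiple of the standard deviation (parameter `num_std`) is an assumption, as are the function names. Precision at 100 is the fraction of the top 100 predicted terms that did appear in the news.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Set


def appears_in_news(todays_count: float, past_year_counts: Sequence[float],
                    num_std: float = 2.0) -> bool:
    """True if today's mention count is well above the past-year average.

    Assumption: "significantly" means more than num_std standard
    deviations above the mean; the slide does not specify the test.
    """
    mu = mean(past_year_counts)
    sigma = pstdev(past_year_counts)
    return todays_count > mu + num_std * sigma


def precision_at_k(ranked_terms: List[str], appeared: Set[str], k: int = 100) -> float:
    """Fraction of the top-k predicted terms that appeared in the news."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    # Divides by the number of terms actually returned (at most k).
    return sum(1 for term in top if term in appeared) / len(top)
```

With 100 predictions per day, multiplying `precision_at_k` by 100 gives the count of predicted terms that appeared, which is the quantity the evaluation slides report per point.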

  9. Empirical Evaluation • Baseline method: what happens today happens tomorrow • Each point shows how many of the 100 predicted terms appeared • A total of 30 days of experiments

  10. Empirical Evaluation • Baseline method: what happens today happens tomorrow • Each point is an average of results from 30 days of tests

  11. Empirical Evaluation • Baseline-related: 100 terms related to today's terms are selected at random • Each point shows how many of the 100 predicted terms appeared • A total of 30 days of experiments

  12. Empirical Evaluation • Cross-correlation: prediction without using indication weights • Each point shows how many of the 100 predicted terms appeared • A total of 30 days of experiments

  13. Conclusions • A new method for predicting global future events using their patterns in the past. • A novel application of an aggregated collection of search queries, with each search term represented as a time series. • A testing methodology for evaluating such news prediction algorithms.