Predicting the News of Tomorrow Using Patterns in Web Search Queries
Kira Radinsky, Sagie Davidovich, Shaul Markovitch
Computer Science Department, Technion – Israel Institute of Technology
Goal
Oil Peaks and Stock Market Crashes
NEW YORK – Crude-oil futures shot up as commodities markets benefited from a surge in investor confidence. Light, sweet crude for January delivery settled $4.57, or 9.2%, higher at $54.50 a barrel on the New York Mercantile Exchange. January Brent crude on the ICE futures exchange settled $4.74, or 9.6%, higher at $53.93 a barrel.
• Humans can predict events
• Can it be done automatically?
“We find that changes in oil prices strongly predict future stock market returns in many countries in the world... The impact of this predictability on stock returns tends to be large.” (“Striking Oil: Another Puzzle?”, Gerben Driesprong, Benjamin Maat and Ben Jacobsen)
Solution Outline
• Identify events that occur today
  • More than 0.5 billion daily searches on the web (2008)
  • Many queries are related to current events
• Analyze which events tended to follow events like today's in the past
  • History repeats itself
  • Query log archives
Knowledge Sources
• Google Hot Trends
• Technorati
• Online news (Newzingo)
[Figure: timeline with axis ticks July 08, Aug 08, Sep 08]
Identifying Events
[Figure: search-volume time series with labeled peaks: Hurricane Katrina, Hurricane Ivan, Hurricane Gustav, Hurricane Wilma, Hurricane Dean; axis ticks July 08, Aug 08, Sep 08]
Peak Detection Algorithm
Each maximum point m_y has at most two neighboring minimum points. We consider a maximum point a peak if:
1. The local maximum satisfies m_y > Δ1 (high-pass filter).
2. The difference between m_y and the lowest of its neighboring minimum points is above Δ2.
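The two peak conditions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the way neighboring minima are located (walking downhill from the maximum until the series turns up again), and the example values of Δ1 and Δ2 are all assumptions.

```python
from typing import List


def neighboring_minima(series: List[float], i: int) -> List[float]:
    """Values at the valleys on each side of index i (assumed definition).

    Walks downhill from i in both directions until the series rises again
    or the series ends. Returns zero, one, or two values.
    """
    valleys = []
    j = i
    while j > 0 and series[j - 1] <= series[j]:
        j -= 1
    if j != i:
        valleys.append(series[j])
    j = i
    while j < len(series) - 1 and series[j + 1] <= series[j]:
        j += 1
    if j != i:
        valleys.append(series[j])
    return valleys


def detect_peaks(series: List[float], delta1: float, delta2: float) -> List[int]:
    """Indices of local maxima that pass both conditions from the slide."""
    peaks = []
    for i in range(1, len(series) - 1):
        is_local_max = series[i - 1] < series[i] >= series[i + 1]
        if not is_local_max:
            continue
        valleys = neighboring_minima(series, i)
        high_enough = series[i] > delta1                                 # condition 1
        prominent = bool(valleys) and series[i] - min(valleys) > delta2  # condition 2
        if high_enough and prominent:
            peaks.append(i)
    return peaks


# Hypothetical daily search volume for one term
volume = [1, 2, 9, 3, 2, 4, 3, 1, 8, 7]
print(detect_peaks(volume, delta1=5, delta2=4))
```

The small bump at index 5 is a local maximum but fails the Δ1 test, so only the two large spikes are reported.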
Prediction
Goal: for each candidate term, evaluate the likelihood that it will appear in the future, given today's terms.
• Indication weight: how many of the peaks of w2 (the future candidate) appeared k days after peaks of w1 (today's term)
• Saliency of w1: significance of the peak in the search volume
[Figure: graph linking today's salient terms to future candidate terms, with an indication weight on each edge and a likelihood of appearing in k days for each candidate. Terms shown: china, hurricane, pope, texans, Gas, Storm, Weather, Flood, Evacuation, Economics, South Asia, Taliban, War.]
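One possible reading of the indication weight and the candidate scoring is sketched below. The slide does not give the exact formulas, so two choices here are assumptions: the weight is normalized to the fraction of w2's peaks that fall exactly k days after a peak of w1, and a candidate's score is the sum over today's terms of saliency times indication weight. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple


def indication_weight(peaks_w1: List[int], peaks_w2: List[int], k: int) -> float:
    """Fraction of w2's peak days that occur exactly k days after a w1 peak.

    Assumption: the slide says "how many"; dividing by the number of
    w2 peaks is one way to turn that count into a comparable weight.
    """
    if not peaks_w2:
        return 0.0
    w1_days = set(peaks_w1)
    hits = sum(1 for day in peaks_w2 if day - k in w1_days)
    return hits / len(peaks_w2)


def score_candidates(
    saliency: Dict[str, float],
    peak_days: Dict[str, List[int]],
    candidates: List[str],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank candidates by sum of saliency(w1) * weight(w1 -> candidate).

    Assumption: the additive combination is illustrative only.
    """
    scores = {}
    for cand in candidates:
        scores[cand] = sum(
            s * indication_weight(peak_days.get(term, []), peak_days.get(cand, []), k)
            for term, s in saliency.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical peak days (as day indices) from a query-log archive
peak_days = {
    "hurricane": [10, 50, 90],
    "gas": [11, 51, 70],
    "pope": [30],
    "war": [31, 60],
}
todays_saliency = {"hurricane": 0.9, "pope": 0.5}
print(score_candidates(todays_saliency, peak_days, ["gas", "war"], k=1))
```

With these made-up inputs, "gas" ranks first because two of its three peaks came one day after a "hurricane" peak and "hurricane" is the more salient term today.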
Hurricane
Oil, Gas May Soar as Storm Shuts U.S. Gulf Production
Crude-oil and natural-gas prices may soar after Hurricane Katrina moved into production regions of the Gulf of Mexico, forcing companies including Exxon Mobil Corp. and Chevron Corp. to close operations.
Gas
Gas Prices Rise as Industry Assesses Storm Damage
HOUSTON – Gasoline prices rose Saturday by an average of five cents a gallon across the country as the oil industry anticipated disruptions at several refineries along the Texas coast because of Hurricane Ike.
Empirical Methodology
• Testing on an aggregation of 4500 online news sources
• What does “to appear in the news” mean?
  • The term appears significantly more often than its average over the past year
• Precision at 100
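The evaluation measure can be illustrated with a short Python sketch. Precision at k is the standard definition (hits among the top k predictions, divided by k). The slide does not quantify "significantly more", so the `factor` threshold in `appears_in_news` is an assumption, as are the function names and the sample data.

```python
from typing import Iterable, List, Set


def appears_in_news(todays_count: int, past_year_counts: List[int], factor: float = 2.0) -> bool:
    """True if today's mention count exceeds the past-year average by `factor`.

    Assumption: `factor` stands in for the unspecified significance test.
    """
    if not past_year_counts:
        return False
    mean = sum(past_year_counts) / len(past_year_counts)
    return todays_count > factor * mean


def precision_at_k(ranked_terms: List[str], appeared: Iterable[str], k: int = 100) -> float:
    """Share of the top-k predicted terms that actually appeared in the news."""
    if k <= 0:
        raise ValueError("k must be positive")
    appeared_set: Set[str] = set(appeared)
    hits = sum(1 for term in ranked_terms[:k] if term in appeared_set)
    return hits / k


# Hypothetical ranking and ground truth, with k shrunk for readability
predicted = ["gas", "storm", "pope", "war"]
in_news = {"gas", "war"}
print(precision_at_k(predicted, in_news, k=4))
```

In the experiments described on the slides, k is 100, so each reported point corresponds to the number of hits among 100 predicted terms.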
Empirical Evaluation
• Baseline method: what happens today happens tomorrow
• Each point shows how many of the 100 predicted terms appeared in the news
• A total of 30 days of experiments
Empirical Evaluation
• Baseline method: what happens today happens tomorrow
• Each point is an average of results from 30 days of tests
Empirical Evaluation
• Baseline-related: 100 terms related to today's terms are selected at random
• Each point shows how many of the 100 selected terms appeared in the news
• A total of 30 days of experiments
Empirical Evaluation
• Cross-correlation: prediction without indication weights
• Each point shows how many of the 100 predicted terms appeared in the news
• A total of 30 days of experiments
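The slides do not say how the cross-correlation comparison was computed. As a rough illustration of the general idea, the sketch below measures the Pearson correlation between one term's daily volume and another term's volume k days later. The function name and data are hypothetical, and this should not be read as the authors' exact procedure.

```python
import math
from typing import List


def lagged_correlation(x: List[float], y: List[float], k: int) -> float:
    """Pearson correlation between x[t] and y[t + k] over the overlapping days."""
    if k < 0:
        raise ValueError("lag k must be non-negative")
    n = min(len(x), len(y) - k)
    if n < 2:
        return 0.0
    a = x[:n]
    b = y[k:k + n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


# Hypothetical volumes: "gas" repeats the "hurricane" pattern one day later
hurricane = [0, 5, 0, 0, 7, 0, 0, 3, 0, 0]
gas = [0, 0, 5, 0, 0, 7, 0, 0, 3, 0]
print(lagged_correlation(hurricane, gas, k=1))
print(lagged_correlation(hurricane, gas, k=0))
```

Unlike the peak-based indication weight, this compares whole series, so background fluctuations between events also influence the score.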
Conclusions
• A new method for predicting global future events using their patterns in the past
• A novel application of an aggregated collection of search queries, with each search term represented as a time series
• A testing methodology for evaluating such news prediction algorithms