Wikipedia Page View Analyzer by Timothy Werner
Wikipedia • The 6th biggest website on the internet (Alexa) • Articles for everything • Small villages and towns in the middle of nowhere • Famous and not-so-famous people • “History of The Simpsons”
Wikipedia • People read Wikipedia to gain knowledge about things they’re curious about • But why are they curious about these things?
Wikipedia Page View Analyzer • Track page views per day • Figure out why things were popular
Data • Provided by Domas Mituzas • Database administrator for the Wikimedia Foundation • Page views for all pages, across all Wikimedia projects and all languages, computed hourly • ~350MB for one hour of data • 350MB/hour * 24 hours/day * 365 days/year ≈ 3TB/year • That’s a lot of data
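The yearly estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the 350MB figure is the approximate size quoted there, and the function name `yearly_volume_tb` is made up for illustration.

```python
MB_PER_HOUR = 350  # approximate size of one hourly file, as quoted on the slide


def yearly_volume_tb(mb_per_hour: float, decimal: bool = True) -> float:
    """Return the yearly data volume in TB for a given hourly volume in MB.

    decimal=True uses 1 TB = 1,000,000 MB; decimal=False uses 1 TB = 1024 * 1024 MB.
    """
    mb_per_year = mb_per_hour * 24 * 365  # hours per day * days per year
    mb_per_tb = 1_000_000 if decimal else 1024 * 1024
    return mb_per_year / mb_per_tb


if __name__ == "__main__":
    print(f"decimal TB/year: {yearly_volume_tb(MB_PER_HOUR):.2f}")
    print(f"binary TB/year:  {yearly_volume_tb(MB_PER_HOUR, decimal=False):.2f}")
```

Either way of counting lands at roughly 3TB per year, which is why the data has to be cut down before it can be visualized.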
Compressing Data • Limit to a short period for initial presentation (14 days) • Limit to pages from the English Wikipedia • Still the most popular wiki, accounting for nearly half of the page views across all Wikimedia projects • Limit to articles that meet a minimum page-view threshold • More than half a million articles receive only one or two page views per hour • Combine hourly files together to remove repetition of article names
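The reduction steps above could look roughly like the following Python sketch. It is not the code used in the project. It assumes each hourly file holds space-separated lines of the form `project page_title view_count bytes`, and the function name `combine_hourly` and the `min_views` threshold of 3 are hypothetical choices for illustration.

```python
from collections import defaultdict


def combine_hourly(hourly_files, project="en", min_views=3):
    """Merge hourly page-view files into one series per article.

    hourly_files: a sequence where each item is an iterable of lines for one hour.
    Assumed line format (not confirmed by the slides): "project title count bytes".
    Returns {title: [views in hour 0, views in hour 1, ...]}, keeping only
    articles from `project` whose busiest hour reached `min_views`.
    """
    hourly_files = list(hourly_files)
    n_hours = len(hourly_files)
    series = defaultdict(lambda: [0] * n_hours)

    for hour, lines in enumerate(hourly_files):
        for line in lines:
            parts = line.split()
            if len(parts) != 4 or parts[0] != project:
                continue  # skip malformed lines and other projects/languages
            try:
                count = int(parts[2])
            except ValueError:
                continue  # skip lines whose count is not a number
            series[parts[1]][hour] += count

    # Each article name is now stored once, instead of once per hourly file.
    return {title: counts for title, counts in series.items()
            if max(counts) >= min_views}


if __name__ == "__main__":
    hour_0 = ["en Main_Page 100 5000", "de Hauptseite 50 100", "en Rare_Page 1 10"]
    hour_1 = ["en Main_Page 120 6000", "en Rare_Page 2 10"]
    print(combine_hourly([hour_0, hour_1]))
```

Storing one list of counts per title is what removes the repetition of article names: the name appears once as a key rather than once in every hourly file.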
Design • Based on the NameVoyager • Similar concept – a variety of individual data points over time
First Try • Google Chart Tools • JavaScript API provided by Google • Good documentation • Provided a “Code Playground” to learn by changing existing data • Can draw data from Google Spreadsheets
First Try • But… • Google reads in the data in a way I didn’t expect • Requires columns to be the different articles, and rows to be the different days • Google Spreadsheets only supports 256 columns of data • Also can’t filter by article name in that layout
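To make the layout problem concrete, here is a small Python sketch that pivots per-article series into the shape described above: one row per day, one column per article. The function name `to_chart_table` is hypothetical, and counting the day-label column toward the 256-column limit is an assumption.

```python
def to_chart_table(series, days, max_columns=256):
    """Pivot {article: [views per day]} into rows of days and columns of articles.

    Raises ValueError when the table would exceed `max_columns`
    (assumed here to include the leading day-label column).
    """
    articles = sorted(series)
    n_columns = 1 + len(articles)  # day label + one column per article
    if n_columns > max_columns:
        raise ValueError(
            f"{n_columns} columns needed, but only {max_columns} are supported"
        )

    header = ["Day"] + articles
    rows = [[day] + [series[a][i] for a in articles]
            for i, day in enumerate(days)]
    return [header] + rows


if __name__ == "__main__":
    views = {"Main_Page": [100, 120], "The_Simpsons": [40, 90]}
    for row in to_chart_table(views, ["Day 1", "Day 2"]):
        print(row)
```

With one column per article, any data set covering more than a couple of hundred articles hits the column limit, which is the constraint that ruled out this approach.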
Second Try • Flare • Flash-based • Allows freedom in how data is entered • More flexible on filtering and related queries
Wikipedia Page View Analyzer • Back to the original goal • “But why are they curious about these things?” • Charts and graphs cannot be analyzed without context
Wikipedia Page View Analyzer • Google News • Service by Google • Archives news articles and blogs from across the world • Connect to the visualization to help explain “why”