Wikipedia Page View Analyzer


Presentation Transcript


  1. Wikipedia Page View Analyzer by Timothy Werner

  2. Wikipedia • The 6th biggest website on the internet (Alexa) • Articles for everything • Small villages and towns in the middle of nowhere • Famous and not-so-famous people • “History of The Simpsons”

  3. Wikipedia • People read Wikipedia to gain knowledge about things they’re curious about • But why are they curious about these things?

  4. Wikipedia Page View Analyzer • Track page views per day • Figure out why things were popular

  5. Data • Provided by Domas Mituzas • Database Administrator for the Wikimedia Foundation • Page views for all pages, over all Wiki projects, over all languages, computed hourly • ~350MB for one hour of data • 350MB/hour * 24 hours/day * 365 days/year ≈ 3TB/year • That’s a lot of data
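The yearly estimate on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 350MB figure is the approximate hourly size quoted above, and decimal units (1TB = 1,000,000MB) are an assumption.

```python
# Approximate size of one hourly page-view file, as quoted on the slide.
MB_PER_HOUR = 350

# 24 hourly files per day, 365 days per year.
mb_per_year = MB_PER_HOUR * 24 * 365

# Assumption: decimal units, 1 TB = 1,000,000 MB.
tb_per_year = mb_per_year / 1_000_000

print(f"{mb_per_year:,} MB/year = {tb_per_year:.2f} TB/year")
# prints "3,066,000 MB/year = 3.07 TB/year"
```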

  6. Compressing Data • Limit to a short period for initial presentation (14 days) • Limit to pages from the English Wikipedia • Still the most popular Wiki, accounting for nearly half of the page views across all Wikimedia projects • Limit to articles that meet a minimum number of page views • Over half a million articles receive only one or two page views per hour • Combine hourly files together to remove repetition of article names
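The reduction steps above could be sketched roughly as follows. The slides do not show the file format or the actual threshold, so this sketch assumes each line looks like `project title count bytes`, and the function name `combine_hourly` and the `min_total_views` cutoff are hypothetical.

```python
from collections import defaultdict


def combine_hourly(hourly_lines, project="en", min_total_views=5):
    """Merge per-hour records into one list of counts per article.

    hourly_lines: a list with one entry per hour, each an iterable of
    text lines in the ASSUMED format "project title count bytes".
    Returns {title: [count_hour0, count_hour1, ...]} for articles of the
    chosen project whose total views reach min_total_views.
    """
    hours = list(hourly_lines)
    counts = defaultdict(lambda: [0] * len(hours))
    for hour_index, lines in enumerate(hours):
        for line in lines:
            parts = line.split()
            # Skip malformed lines and other projects or languages.
            if len(parts) != 4 or parts[0] != project or not parts[2].isdigit():
                continue
            counts[parts[1]][hour_index] += int(parts[2])
    # Drop rarely viewed articles; each surviving title is stored once.
    return {t: c for t, c in counts.items() if sum(c) >= min_total_views}


# Made-up sample data for two hours.
hour0 = ["en Main_Page 100 5000", "en Obscure_Village 1 200", "de Hauptseite 50 3000"]
hour1 = ["en Main_Page 120 6000", "en Obscure_Village 2 400"]

combined = combine_hourly([hour0, hour1], min_total_views=10)
print(combined)  # prints {'Main_Page': [100, 120]}
```

Storing one list of counts per title is what removes the repeated article names: the title appears once rather than once per hourly file.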

  7. Design • Based on the NameVoyager • Similar concept – a variety of individual data points over time

  8. First Try • Google Chart Tools • JavaScript API provided by Google • Good documentation • Provided a “Code Playground” to learn by changing existing data • Can draw data from Google Spreadsheets

  9. First Try • But… • Google reads in the data in a way I didn’t expect • Requires columns to be the different articles, and rows to be the different days • Google Spreadsheets only supports 256 columns of data • Also can’t filter by article name with that layout
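To illustrate the layout problem described on this slide, here is a small sketch that pivots per-article counts into the "rows are days, columns are articles" shape and checks it against the 256-column limit quoted above. It does not call any Google API; the helper names `to_wide_table` and `fits_column_limit` and the sample numbers are hypothetical.

```python
MAX_COLUMNS = 256  # spreadsheet column limit quoted on the slide


def to_wide_table(views_by_article, days):
    """Build a header and rows with one column per article, one row per day."""
    articles = sorted(views_by_article)
    header = ["day"] + articles
    rows = [
        [day] + [views_by_article[article][i] for article in articles]
        for i, day in enumerate(days)
    ]
    return header, rows


def fits_column_limit(header, limit=MAX_COLUMNS):
    """True if the table, including the day column, stays within the limit."""
    return len(header) <= limit


# Made-up sample: two articles over two days.
views = {"Main_Page": [100, 120], "The_Simpsons": [30, 45]}
header, rows = to_wide_table(views, ["day 1", "day 2"])
print(header)  # prints ['day', 'Main_Page', 'The_Simpsons']
print(rows)    # prints [['day 1', 100, 30], ['day 2', 120, 45]]

# With 300 articles the table needs 301 columns and no longer fits.
many = {f"Article_{n}": [0, 0] for n in range(300)}
big_header, _ = to_wide_table(many, ["day 1", "day 2"])
print(fits_column_limit(header), fits_column_limit(big_header))  # prints True False
```

Because every article becomes its own column, the number of articles that can be charted is capped by the column limit, while the number of days is not.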

  10. Second Try • Flare • Flash-based • Allows freedom in how data is entered • More flexible on filtering and related queries

  11. Wikipedia Page View Analyzer • Back to the original goal • “But why are they curious about these things?” • Charts and graphs cannot be analyzed without context

  12. Wikipedia Page View Analyzer • Google News • Service by Google • Archives news articles and blogs from across the world • Connect to the visualization to help explain “why”
