Intro To Data Journalism Marc Ellison || @ marceellison

Presentation Transcript


  1. Intro To Data Journalism Marc Ellison || @marceellison

  2. Overview • Who is this guy? • What is data journalism and why does it matter? • Who’s doing it? • OK, but what have you done? • Data journalism in the Canadian newsroom • Get mappin’ • Get scrapin’ • Resources • Questions

  3. Who is this guy? • Freelance data- and photojournalist • Produced features and multimedia for a variety of publications • Worked in Canada, Rwanda, South Sudan and Uganda • BA in History • MSc in Computer Science + 10 years as a web developer • Pre-midlife crisis

  4. So, what is data journalism? • “Data journalism is obtaining, reporting on, curating and publishing data in the public interest.” [Jonathan Stray] • “Data journalism is [...] the convergence of a number of fields [...] - from investigative research and statistics to design and programming.” [Paul Bradshaw]

  5. So, what is data journalism? • “Data driven journalism is a workflow that consists of the following elements: digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing it and making a story.” [Mirko Lorenz]

  6. Why do it? • Rapid advancement of technology == greater digitalization of data • People’s lives are data • Helps tell and prove a complex story • Reveals “abstract threats” to society • Combined with traditional reporting techniques, we can tell stories in more compelling and innovative ways

  7. Why do it? • Age of open data (debatable in Canada) • Add another string to your bow: being a good writer is no longer enough, e.g. job ads for “multimedia journalists” • Fill a niche: only a handful of recognized data journalists in Canada • As more and more paywalls go up, outlets are looking for inventive ways to drive traffic to their sites and increase subscriptions

  8. Why do it? • “Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars… But now it's also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what's interesting.” [Tim Berners-Lee]

  9. “Data Journalism Is The New Punk”

  10. “Arguably punk was most important in its influence, encouraging kids in the suburbs to take up instruments, with little or no musical training. It represented a DIY ethos and a shake-up of the old established order. It was a change. Crucial to it was the idea: anyone can do it.” [Simon Rogers] http://www.guardian.co.uk/news/datablog/2012/may/24/data-journalism-punk# • No excuses: free tools and tutorials

  11. Who’s doing it? The Guardian: Iraqi War Logs http://www.guardian.co.uk/world/datablog/interactive/2010/oct/23/wikileaks-iraq-deaths-map

  12. Who’s doing it? The Guardian: UK Riots

  13. Who’s doing it? NY Times: Basketball Statistics http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html?ref=multimedia http://flowingdata.com/2012/10/04/more-on-making-heat-maps-in-r/

  14. Who’s doing it? NY Times: Election 2012 http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html

  15. Who’s doing it? LA Times: Murder Map http://projects.latimes.com/homicide/map/

  16. Who’s doing it? Telegraph (UK): MP expenses http://parliament.telegraph.co.uk/mpsexpenses/home

  17. Who’s doing it? OpenFile: CensusFile

  18. OK, but what have you done? Mapping B.C.’s bicycle collisions Vancouver Sun http://www.vancouversun.com/news/map_bicyclecollisions.html

  19. OK, but what have you done? Search Public Sector Salaries Vancouver Sun

  20. OK, but what have you done? Mapping Grow-Op Busts Vancouver Sun

  21. OK, but what have you done? Stanley Cup Riot Charge Database Vancouver Sun http://www.vancouversun.com/riot/chargesearch.html

  22. OK, but what have you done? Failed Restaurant Inspections Vancouver Sun

  23. Data journalism in the Canadian newsroom • Canada and optimism in the Data Journalism Handbook • Look at the outlets in the book (UK, Germany, USA, France): none are Canadian • “They don’t know what they’re looking for” • “Work off the side of your desk” • Hacks ‘n’ Hackers as an outlet?

  24. The process: chicken and the egg? Diagram: Mirko Lorenz

  25. The process [Paul Bradshaw] • Find: searching for data on the web • Clean: filtering and transforming data in preparation for visualization • Visualize: displaying the pattern, either as a static or animated visual • Publish: integrating the visuals, attaching data to stories • Distribute: enabling access on a variety of devices, such as the web, tablets and mobile • Measure: tracking usage of data stories over time and across the spectrum of uses

  26. Get Mappin’ • Case study: bicycle-car collisions in B.C. http://geocommons.com/maps/141588

  27. What data? • What is the story you want to tell? • Does it need telling? Think of your pitch: cycling is a hot-button issue • Brainstorm what data may be available and where you can find it, and plan ahead if an FOI/ATI request is needed (in this case: ICBC) • How will you visualize the data, e.g. map, graph, database?

  28. Get your data • Is the data freely available online? • Speak to people in the know, e.g. city or government officials • Crowdsourcing: BuzzData, GeoCommons etc. • Webscraping • Past ATI/FOI requests: someone may have already requested the data • ATI/FOI: if you need to file a request, plan ahead; it takes time and you’ll likely deal with privacy issues • ATI/FOI: make your request specific, and stay in constant dialogue with the department • Think about the data format you want: .CSV, .XLS, .KML, .KMZ, .SHP…
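The format question matters because mapping tools expect particular inputs. As a rough sketch (not the workflow used for the ICBC map), here is how a CSV of geocoded records could be turned into a minimal KML file using only Python's standard library. The column names `location`, `latitude` and `longitude` are hypothetical; rename them to match your own headers.

```python
import csv
import io
from xml.sax.saxutils import escape


def csv_to_kml(csv_text, name_field="location", lat_field="latitude", lon_field="longitude"):
    """Turn CSV rows with coordinates into a minimal KML document (one Placemark per row).

    The default field names are hypothetical; match them to your file's headers.
    """
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # KML lists coordinates as longitude,latitude (the reverse of the usual spoken order)
        coords = f"{float(row[lon_field])},{float(row[lat_field])}"
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(row[name_field])}</name>\n"
            f"    <Point><coordinates>{coords}</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n"
        + "\n".join(placemarks)
        + "\n</Document>\n</kml>\n"
    )
```

Writing the returned string to a `.kml` file gives something most mapping tools can open; `escape` keeps characters such as `&` in intersection names from breaking the XML.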

  29. Clean your data • The data you get will rarely be good enough to use as is… • Missing data, multiple files, irrelevant columns… • Tools: Google Refine, Data Wrangler, Excel or Google Spreadsheets • ICBC data: missing city names, longitudes and latitudes… • ICBC data: use Excel’s data sort to group and remove bad rows
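A minimal sketch of the cleaning step in code, assuming a CSV with hypothetical `city`, `latitude` and `longitude` columns (not the actual ICBC layout): the function drops rows with missing or impossible coordinates and tidies city names. Tools like Google Refine do the same job interactively; a script simply makes the steps repeatable when a fresh data file arrives.

```python
import csv
import io

# Hypothetical column names; adjust to match the real file's headers.
REQUIRED = ("city", "latitude", "longitude")


def clean_rows(csv_text):
    """Return (kept_rows, dropped_count) after removing unusable records."""
    kept, dropped = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Drop rows where any required field is missing or blank
        if any(not (row.get(field) or "").strip() for field in REQUIRED):
            dropped += 1
            continue
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except ValueError:
            # Non-numeric coordinates, e.g. "N/A"
            dropped += 1
            continue
        # Drop coordinates that cannot exist on Earth
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            dropped += 1
            continue
        # Normalise inconsistent capitalisation, e.g. "VANCOUVER" -> "Vancouver"
        row["city"] = row["city"].strip().title()
        row["latitude"], row["longitude"] = lat, lon
        kept.append(row)
    return kept, dropped
```

Reporting the dropped count alongside the kept rows is deliberate: how much data had to be thrown away is itself worth knowing before you publish a map.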

  30. Map your data • Create a GeoCommons account • Click ‘create a map’ • Upload your clean data to GeoCommons (it supports SHP, CSV, KML and RSS) • While it uploads, give it a name, description and citation • Wait while it processes your data • Choose a theme, e.g. incident by year

  31. Add layers of data • Open data: enhance your map with traffic lights, bike paths, neighbourhood boundaries etc. • http://data.vancouver.ca/datacatalogue/index.htm • Simply download the ZIP file, unzip it, upload the 4 files, and then add them as new layers on your map • …or use crowdsourced data in GeoCommons

  32. Other cool tricks/features • Select your main data layer, then select the ‘analyze’ option • Select aggregation • Select neighbourhoods as the boundary • Attribute = year • Calculation = count • 3D Street View feature: particularly relevant for cyclists • Animate your data: collisions over time! • People can view your data at the bottom of the map or click on map points
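GeoCommons ran this aggregation through its web interface. To illustrate roughly what “neighbourhoods as boundary, attribute = year, calculation = count” computes, here is a simplified sketch: each collision point is assigned to the neighbourhood polygon that contains it (a ray-casting point-in-polygon test), then counted per (neighbourhood, year). The record keys `lon`, `lat` and `year` are hypothetical, and the sketch ignores polygons with holes, multi-part boundaries and map projections, which a real GIS tool would handle.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's (x, y) vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_area_and_year(collisions, areas):
    """Count collisions per (area name, year).

    collisions: iterable of dicts with hypothetical keys "lon", "lat", "year".
    areas: dict mapping area name -> list of (lon, lat) boundary vertices.
    Points outside every area are not counted.
    """
    counts = Counter()
    for c in collisions:
        for name, polygon in areas.items():
            if point_in_polygon(c["lon"], c["lat"], polygon):
                counts[(name, c["year"])] += 1
                break  # assumes neighbourhood boundaries do not overlap
    return counts
```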

  33. Analyze your data • The collision map tells multiple stories… • …Vancouver’s most dangerous intersection • …Bike paths != safety • …Need more bike paths? • …Have collisions reduced over time as a result of bike lanes?
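A couple of these questions reduce to simple counting once the data is clean. The sketch below assumes hypothetical `location` and `year` fields on each collision record; it ranks intersections by collision count and lists yearly totals for one spot. Raw counts say nothing about how many people cycle through an intersection, so treat them as a starting point for reporting rather than a finding.

```python
from collections import Counter


def _key(location):
    """Normalise a location name so case and spacing variants match."""
    return " ".join(location.upper().split())


def most_dangerous(collisions, top=5):
    """Rank locations by collision count, highest first."""
    counts = Counter(_key(c["location"]) for c in collisions)
    return counts.most_common(top)


def yearly_counts(collisions, location):
    """Return sorted (year, count) pairs for a single location."""
    target = _key(location)
    counts = Counter(c["year"] for c in collisions if _key(c["location"]) == target)
    return sorted(counts.items())
```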

  34. Embed and share your data • HTML • iframe • Map complements your written story • Facebook/Twitter • Comments

  35. Get Scrapin’

  36. Get Scrapin’ • Definition: an automated way of getting data from a website • Saves us time (if there are 1000s of pages) and the data comes to us • Data isn’t always available to download in a handy PDF or spreadsheet • Allows us to map or tweet out findings! • We can even send data automatically to Dropbox! • The Data Journalism Handbook and Visualize This include basic introductions

  37. Don’t be scared: Meet ScraperWiki • Free/open source: you can see other people’s code and adapt it! • Learning curve: you have to teach yourself Python • …or pay people to do it for you! • No need for painstaking setup • Built-in libraries for mapping, tweeting, shortening URLs with bit.ly, graphics and storing data in a database • Schedule your scrapers • Great tutorials and online resources
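ScraperWiki provided its own helpers for saving data. As a stand-in, this sketch uses Python's built-in `sqlite3` module to show why a unique key matters for scheduled scrapers: re-running the scraper updates existing rows instead of piling up duplicates. The `inspections` table and its columns (`facility_id`, `name`, `last_inspected`, `result`) are hypothetical.

```python
import sqlite3


def save_inspections(conn, records):
    """Insert or update inspection records keyed by a hypothetical facility_id.

    records: iterable of dicts with keys facility_id, name, last_inspected, result.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS inspections (
               facility_id TEXT PRIMARY KEY,
               name TEXT,
               last_inspected TEXT,
               result TEXT)"""
    )
    # INSERT OR REPLACE overwrites any existing row with the same primary key,
    # so a scheduled re-run refreshes records instead of duplicating them
    conn.executemany(
        "INSERT OR REPLACE INTO inspections (facility_id, name, last_inspected, result) "
        "VALUES (:facility_id, :name, :last_inspected, :result)",
        records,
    )
    conn.commit()
```

Passing `sqlite3.connect("inspections.db")` keeps the data between runs; `":memory:"` is handy for quick experiments.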

  38. Basic steps: RestoCop case study • What do you want to do, e.g. store data, tweet? • Analyze the page hierarchy, e.g. http://www.inspections.vcha.ca/Main • Search the page HTML for easily identifiable, repetitive HTML tags; check out the Firefox plugin OutWit Hub • Start coding!
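Once you have spotted the repetitive tags, the parsing step can be surprisingly short. This is a minimal sketch using only Python's standard library; it assumes, hypothetically, that the records you want sit in ordinary HTML table rows (`<tr>`/`<td>`). It is not based on the actual markup of the VCHA inspection site, which you would need to inspect yourself.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace left over from the page layout
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # skip rows with no <td> cells, e.g. <th> header rows
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag == "td" and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def scrape_rows(html):
    """Parse an HTML string and return one list of cell texts per table row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `scrape_rows(fetch(url))` would return one list of cell strings per table row; from there the rows can be cleaned, stored or mapped like any other data.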

  39. Restaurant Inspections

  40. Web page analysis

  41. Webscraping “challenges” • “Nose-bloodying” learning curve: HTML and Python • Web page changes can ‘break’ your scrapers! • Dealing with cookies • Limitations: badly formatted HTML, CAPTCHAs, and session-based sites
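For the cookie and session problem specifically, Python's standard library can keep cookies between requests. A minimal sketch follows; the User-Agent string is a hypothetical placeholder. It does nothing about CAPTCHAs, which exist precisely to stop automated access.

```python
import http.cookiejar
import urllib.request


def make_session(user_agent="newsroom-scraper/0.1 (contact: you@example.com)"):
    """Return (opener, jar): an opener that stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Identify the scraper honestly; the default string is a hypothetical placeholder
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar
```

Open the site's entry page with `opener.open(...)` first so the server can set its session cookie; later `opener.open` calls send that cookie back automatically.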

  42. Other cool tools • Excel/Google Spreadsheets • Google Fusion Tables • Google Charts • Tableau Public • Many Eyes • Open Heat Map • TileMill • D3.js • R • DocumentCloud • Overview • Meograph • Zeega

  43. Resources [web] • http://www.ire.org/resource-center/listservs/subscribe-nicar-l/ • http://www.guardian.co.uk/data • http://datadesk.latimes.com/ • http://datadrivenjournalism.net • http://flowingdata.com/category/tutorials • http://www.vancouversun.com/news/data-central/index.html • http://www.datajournalismblog.com • http://www.data.gc.ca • http://vancouver.ca/your-government/open-data-catalogue.aspx • http://www.mediabistro.com/10000words/ • http://www.journalism.co.uk • http://onlinejournalismblog.com • http://www.mashable.com

  44. Resources [books+articles] • Data Journalism Handbook • Visualize This by Nathan Yau • 1970-2012: http://interactivetimeline.com/1413/datajournalism-1970-2012// • http://www.journalism.co.uk/skills/how-to-get-to-grips-with-data-journalism/s7/a542402/ • http://www.guardian.co.uk/news/datablog/2010/oct/01/data-journalism-how-to-guide • http://owni.eu/2011/09/07/datajournalism-faith-in-numbers • http://blogs.journalism.co.uk/2011/04/16/ijf11-lessons-in-data-journalism-from-the-new-york-times/

  45. Resources [people] • https://twitter.com/smfrogers • http://www.davidmccandless.com/ • http://www.andydickinson.net • http://blogs.vancouversun.com/author/chadskeltonvansun/ • http://afewtastefulsnaps.wordpress.com/ • https://twitter.com/marshallk/datajournalists/members

  46. Questions • Over to you…