Intro To Data Journalism Marc Ellison || @marceellison. Who is this guy? What is data journalism and why does it matter? Who’s doing it? OK, but what have you done? Data journalism in the Canadian newsroom Get mappin’ Get scrapin’ Resources Questions. Overview.
Intro To Data Journalism Marc Ellison || @marceellison
Who is this guy? • What is data journalism and why does it matter? • Who’s doing it? • OK, but what have you done? • Data journalism in the Canadian newsroom • Get mappin’ • Get scrapin’ • Resources • Questions Overview
Freelance data- and photojournalist • Produced features and multimedia for variety of publications • Worked in Canada, Rwanda, South Sudan and Uganda. • BA in History • MSc in Computer Science + 10 years as web developer • Pre-midlife crisis Who is this guy?
“Data journalism is obtaining, reporting on, curating and publishing data in the public interest.” [Jonathan Stray] “Data journalism is [...] the convergence of a number of fields [...] - from investigative research and statistics to design and programming.” [Paul Bradshaw] So, what is data journalism?
“Data driven journalism is a workflow that consists of the following elements: digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing it and making a story.” [Mirko Lorenz] So, what is data journalism?
Rapid advancement of technology == greater digitalization of data • People’s lives are data • Help/prove a complex story • Reveal “abstract threats” to society • Combined with traditional reporting techniques we can tell stories in more compelling + innovative ways Why do it?
Age of open data (debatable in Canada) • Add a string to your bow: being a good writer is no longer enough i.e. job ads for “multimedia journalists” • Fill a niche: handful of recognized data journalists in Canada • As more and more paywalls go up, outlets are looking for inventive ways to drive traffic to their sites and increase subscriptions Why do it?
“Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars… But now it's also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what's interesting.” [Tim Berners-Lee] Why do it?
“Arguably punk was most important in its influence, encouraging kids in the suburbs to take up instruments, with little or no musical training. It represented a DIY ethos and a shake-up of the old established order. It was a change. “Crucial to it was the idea: anyone can do it.” [Simon Rogers] http://www.guardian.co.uk/news/datablog/2012/may/24/data-journalism-punk# No excuses: free tools and tutorials
Who’s doing it? The Guardian: Iraq War Logs http://www.guardian.co.uk/world/datablog/interactive/2010/oct/23/wikileaks-iraq-deaths-map
Who’s doing it? The Guardian: UK Riots
Who’s doing it? NY Times: Basketball Statistics http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html?ref=multimedia http://flowingdata.com/2012/10/04/more-on-making-heat-maps-in-r/
Who’s doing it? NY Times: Election 2012 http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html
Who’s doing it? LA Times: Murder Map http://projects.latimes.com/homicide/map/
Who’s doing it? Telegraph (UK): MP expenses http://parliament.telegraph.co.uk/mpsexpenses/home
Who’s doing it? OpenFile: CensusFile
OK, but what have you done? Mapping B.C.’s bicycle collisions Vancouver Sun http://www.vancouversun.com/news/map_bicyclecollisions.html
OK, but what have you done? Search Public Sector Salaries Vancouver Sun
OK, but what have you done? Mapping Grow-Op Busts Vancouver Sun
OK, but what have you done? Stanley Cup Riot Charge Database Vancouver Sun http://www.vancouversun.com/riot/chargesearch.html
OK, but what have you done? Failed Restaurant Inspections Vancouver Sun
Canada and optimism in Data Journalism Handbook • Look at outlets in book – UK, Germany, USA, France – none are Canadian • “They don’t know what they’re looking for” • “Work off the side of your desk” • Hacks ‘n’ Hackers as an outlet? Data journalism in the Canadian newsroom
The process: chicken and the egg? Diagram: Mirko Lorenz
Find: Searching for data on the web • Clean: Process to filter and transform data, preparation for visualization • Visualize: Displaying the pattern, either as a static or animated visual • Publish: Integrating the visuals, attaching data to stories • Distribute: Enabling access on a variety of devices, such as the web, tablets and mobile • Measure: Tracking usage of data stories over time and across the spectrum of uses. The process [Paul Bradshaw]
Get Mappin’ Case Study: Bicycle-Car collisions In B.C. http://geocommons.com/maps/141588
What is the story you want to tell? • Does it need telling – think of your pitch? Cycling is a hot-button issue • Brainstorm what data may be available and where you can find it i.e. plan ahead if FOI/ATI is needed? In this case: ICBC • How will you visualize the data i.e. map, graph, database? What data?
Is the data freely available online? • Speak to people in the know i.e. city or govt officials • Crowdsourcing: BuzzData, GeoCommons etc. • Webscraping • Past ATI/FOI requests – someone may have already requested the data • ATI/FOI – if so, plan – this takes time and you’ll likely deal with privacy issues • ATI/FOI – make your request specific, and be in constant dialogue with dept • Think about the data format you want: .CSV, .XLS, .KML, .KMZ, .SHP… Get your data
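On data formats: a minimal sketch of turning spreadsheet-style rows into KML, one of the formats listed above. The column names (name, latitude, longitude) and the sample rows are made up for illustration; real files will use their own headers. Note that KML writes coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def rows_to_kml(rows):
    """Build a KML document with one Placemark per row."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in rows:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML order is longitude first, then latitude.
        coords = f'{row["longitude"]},{row["latitude"]}'
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Hypothetical rows, as they might come out of csv.DictReader.
sample_rows = [
    {"name": "Collision A", "latitude": "49.2827", "longitude": "-123.1207"},
    {"name": "Collision B", "latitude": "48.4284", "longitude": "-123.3656"},
]

kml_text = rows_to_kml(sample_rows)
print(kml_text)
```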
The data you get will rarely be good enough to use as is… • Missing data, multiple files, irrelevant columns… • Tools: Google Refine, Data Wrangler, Excel or Google Spreadsheets • ICBC data: missing city names, longitude and latitudes… • ICBC data: use Excel and data sort to remove bad data Clean your data
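The same clean-up can be scripted instead of done by hand in Excel. A rough sketch in Python, assuming hypothetical column names (city, latitude, longitude) and invented sample rows; the real ICBC file will look different:

```python
import csv
import io

# Hypothetical column names; adjust to match the real file's headers.
REQUIRED = ("city", "latitude", "longitude")


def clean_rows(rows):
    """Keep only rows with a city and a usable latitude/longitude."""
    cleaned = []
    for row in rows:
        # Drop rows where any required field is missing or blank.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except ValueError:
            continue  # coordinates are not numbers
        # Drop coordinates that fall outside the valid ranges.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        cleaned.append(
            {**row, "city": row["city"].strip(), "latitude": lat, "longitude": lon}
        )
    return cleaned


# Invented sample standing in for a downloaded CSV file.
SAMPLE = """city,latitude,longitude,year
Vancouver,49.2827,-123.1207,2009
,49.25,-123.10,2010
Burnaby,,,2008
Victoria,48.4284,-123.3656,2011
Surrey,not available,-122.85,2010
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_rows(rows)
print(len(rows), "rows in,", len(cleaned), "rows kept")
```

Dropping rows is the blunt option. If a missing city or coordinate can be looked up, repairing the row keeps more of the story.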
Create a GeoCommons account • Click ‘create a map’ • Upload your clean data to GeoCommons (they support SHP files, CSV, KML, RSS) • While it uploads, give it a name, description, citation • Wait while it processes your data • Choose a theme – i.e. incident by year Map your data
Open Data – enhance map – traffic lights, bike paths, neighbourhood boundaries etc • http://data.vancouver.ca/datacatalogue/index.htm • Simply download ZIP file, unzip and upload 4 files, and then add as new layers on your map • …or use crowdsourced data in GeoCommons Add layers of data
Select your main data layer – select ‘analyze’ option • Select aggregation • Select neighbourhoods as boundary • Attribute = year • Calculation = count • 3D-Street View feature – particularly relevant for cyclists • Animate your data – collisions over time! • People can view your data at bottom or click on map points Other cool tricks/features
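To see what the aggregation step computes (attribute = year, calculation = count), here is a small sketch in Python. It assumes each collision record already carries a neighbourhood label, whereas GeoCommons works that out from the boundary layer. The records are invented.

```python
from collections import Counter

# Invented records; a real dataset would come from the cleaned file.
collisions = [
    {"neighbourhood": "Downtown", "year": 2008},
    {"neighbourhood": "Downtown", "year": 2008},
    {"neighbourhood": "Downtown", "year": 2009},
    {"neighbourhood": "Kitsilano", "year": 2008},
    {"neighbourhood": "Kitsilano", "year": 2009},
    {"neighbourhood": "Kitsilano", "year": 2009},
]


def count_by(records, *keys):
    """Count records grouped by the given keys (the 'count' calculation)."""
    return Counter(tuple(record[key] for key in keys) for record in records)


per_area_year = count_by(collisions, "neighbourhood", "year")
for (area, year), total in sorted(per_area_year.items()):
    print(area, year, total)
```

Calling count_by(collisions, "year") gives the totals per year only, which is the figure needed to ask whether collisions have dropped over time.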
Collision map tells multiple stories… • …Vancouver’s most dangerous intersection • …Bike paths != safety • …Need more bike paths? • …Have collisions reduced over time as result of bike lanes? Analyze your data
HTML • Iframe • Map complements your written story • Facebook/Twitter • Comments Embed and share your data
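An iframe embed is just a short piece of HTML dropped into the story page. A sketch of generating one in Python, assuming the map URL can be embedded as-is (mapping tools usually hand you their own embed snippet, so treat the width, height and function name here as placeholders):

```python
from html import escape


def iframe_embed(map_url, width=600, height=400):
    """Return an HTML iframe snippet pointing at map_url."""
    # Escape the URL so characters such as & and " cannot break the attribute.
    safe_url = escape(map_url, quote=True)
    return (
        f'<iframe src="{safe_url}" width="{int(width)}" '
        f'height="{int(height)}" frameborder="0"></iframe>'
    )


snippet = iframe_embed("http://geocommons.com/maps/141588")
print(snippet)
```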
Definition: an automated way of getting data from a website • Saves us time (if 1000s of pages) and the data comes to us • Data isn’t always available to download in a handy PDF or spreadsheet • Allows us to map or tweet out findings! • We can even send data automatically to Dropbox! • Data Journalism Handbook and Visualize This include basic introductions Get Scrapin’
Free/open source – you can see other people’s code and adapt it! • Learning curve – you have to teach yourself Python • …or pay people to do it for you! • No need for painstaking setup • Built-in libraries to do mapping, tweeting, encoding of URLs using bit.ly, graphics and storing of data to database • Schedule your scrapers • Great tutorials and online resources Don’t be scared: Meet ScraperWiki
What do you want to do i.e. store data, tweet etc? • Analyze page hierarchy i.e. http://www.inspections.vcha.ca/Main • Search page HTML for easily identifiable, repetitive HTML tags? Check out Firefox plugin Outwit Hub • Start coding! Basic Steps: RestoCop case study
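What “search for repetitive HTML tags, then start coding” can look like, sketched with Python’s built-in html.parser. This is not the RestoCop scraper: the HTML below is an invented stand-in for an inspections page, and the table layout is an assumption. A real scraper would first download the page and then feed it in the same way.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


# Invented sample page; restaurant names are placeholders.
SAMPLE_HTML = """
<table>
  <tr><th>Restaurant</th><th>Result</th></tr>
  <tr><td>Example Cafe</td><td>Fail</td></tr>
  <tr><td>Sample Diner</td><td>Pass</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
scraper.close()

failed = [name for name, result in scraper.rows if result == "Fail"]
print(failed)
```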
“Nose-bloodying” learning curve: HTML and Python • Web page changes can ‘break’ your scrapers! • Dealing with cookies • Limitations: badly-formatted HTML, CAPTCHA, and session-based sites Webscraping “challenges”
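One way to soften the “page changes break your scraper” problem is to check the scraped rows before storing or tweeting them, so a redesign fails loudly instead of quietly producing junk. A sketch, with a made-up exception name and made-up thresholds:

```python
class PageChangedError(Exception):
    """Raised when scraped data no longer has the expected shape."""


def validate_rows(rows, expected_columns, min_rows=1):
    """Return rows unchanged, or raise if the page layout seems to have changed."""
    if len(rows) < min_rows:
        raise PageChangedError(
            f"expected at least {min_rows} rows, got {len(rows)}"
        )
    for index, row in enumerate(rows):
        if len(row) != expected_columns:
            raise PageChangedError(
                f"row {index} has {len(row)} cells, expected {expected_columns}"
            )
    return rows


# Rows as a scraper might return them (placeholder values).
good = validate_rows([["Example Cafe", "Fail"]], expected_columns=2)
print(good)

try:
    validate_rows([["Example Cafe"]], expected_columns=2)
except PageChangedError as error:
    print("Scraper needs attention:", error)
```

This does nothing for CAPTCHAs or session-based sites; it only makes breakage visible sooner.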
Excel/Google spreadsheets • Google Fusion Tables • Google Charts • Tableau Public • Many Eyes • Open Heat Map • Tile Mill • D3.js • R • Document Cloud • Overview • Meograph • Zeega Other cool tools
http://www.ire.org/resource-center/listservs/subscribe-nicar-l/ • http://www.guardian.co.uk/data • http://datadesk.latimes.com/ • http://datadrivenjournalism.net • http://flowingdata.com/category/tutorials • http://www.vancouversun.com/news/data-central/index.html • http://www.datajournalismblog.com • http://www.data.gc.ca • http://vancouver.ca/your-government/open-data-catalogue.aspx • http://www.mediabistro.com/10000words/ • http://www.journalism.co.uk • http://onlinejournalismblog.com • http://www.mashable.com Resources [web]
Data journalism handbook • Visualize this – Nathan Yau • 1970-2012: http://interactivetimeline.com/1413/datajournalism-1970-2012// • http://www.journalism.co.uk/skills/how-to-get-to-grips-with-data-journalism/s7/a542402/ • http://www.guardian.co.uk/news/datablog/2010/oct/01/data-journalism-how-to-guide • http://owni.eu/2011/09/07/datajournalism-faith-in-numbers • http://blogs.journalism.co.uk/2011/04/16/ijf11-lessons-in-data-journalism-from-the-new-york-times/ Resources [books+articles]
https://twitter.com/smfrogers • http://www.davidmccandless.com/ • http://www.andydickinson.net • http://blogs.vancouversun.com/author/chadskeltonvansun/ • http://afewtastefulsnaps.wordpress.com/ • https://twitter.com/marshallk/datajournalists/members Resources [people]
Questions Over to you…