Small team workflow in government analytics Peter Ellis, Manager, Sector Performance 18 March 2014
Today’s talk • Who are we and why is our experience important? • What are “data-intensive economic reports”? • The challenge • The solution • Reflections on analytics in government
The Sector Performance team • 9–10 staff • $5 million budget – mostly for outsourced data collection • One of 3, 4 or 9 analytical teams in MBIE, depending on definitions • But diverse approaches across the different teams • Variety of roles • Manage collection of tourism and science and innovation data • Analyse and publicly disseminate tourism data • Analyse data on all sectors for policy teams and Ministers • Support policy teams in other areas • Midway through a 5-year Tourism Data Improvement Programme • Since MBIE’s creation, applying the tools, skills and techniques to a wider range of data
Whatever the terminology, tools and content, your organisation’s “analytics” team/s need to be in this space http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Capability building for an analytical team • Five key areas needed • Workflow, document management and teamwork • Analytical techniques • Tools • Data reshaping and management • Data storage • Many programmes don’t take all five into account… • IT-led BI programmes may focus on only #3 and #5 • Universities typically only teach #2
Data-intensive economic reports http://www.mbie.govt.nz/what-we-do/business-growth-agenda
The challenge – update the draft overview Sectors Report • The current version had evolved over 24 months – over 200 plots and 50 tables of data • Not all the data sources fully defined • Some of the Excel workbooks lost • Some data was custom-cut by Statistics New Zealand • Home-grown (and inconsistent) concordances to “sector” • Some data was hard-keyed, with no clear record of what was original data, what was analysis, and what was grooming/reshaping • Tight timeframe • High profile, and a quality guarantee essential
This is just one worksheet of around 30 – only 20 of which we could find…
Principles for a solution • Separate the data from the grooming and analysis • Reproducibility • Systemised constant teamwork and peer review, requiring: • Repository-based version control • Centralised and disciplined folder and file structure • Modular code with custom functions, palettes and themes • Frequent integration and continuous testing • Cut the dependencies on externals • Extreme code-based plot polishing • And for our next project (Small Business Report): • Frequent iteration with the client (policy team and Minister) • Separate exploratory analysis from polishing
The toolkit (future warehouse)
The folder structure • raw_data • concordances • NZ.Stat • Infoshare • custom • grooming_code • data • analysis_code • output • Part I • Part II – dashboards • R • .git • Held together with key files in the project’s root directory: • integrate.r (in future to be replaced with a makefile) • sector_report.rproj • .Rprofile
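The deck describes integrate.r as the root-level script that holds the build together (later to be replaced by a makefile). A minimal sketch of what such a one-step build script might look like — all file and object names below are illustrative assumptions, not the actual MBIE scripts:

```r
# integrate.r -- hypothetical one-step build for the Sectors Report
# Run from the project root in a clean session.
rm(list = ls())

# 1. Grooming: turn raw_data/ into tidy objects saved in data/
source("grooming_code/groom_nz_stat.r")     # illustrative name
source("grooming_code/groom_infoshare.r")   # illustrative name

# 2. Analysis: read data/, write plots and tables to output/
source("analysis_code/part_1_overview.r")   # illustrative name
source("analysis_code/part_2_dashboards.r") # illustrative name

# 3. Sanity checks, run on every integration so a broken
#    build is caught immediately rather than at publication
stopifnot(dir.exists("output"))
```

Because every grooming and analysis script is re-run from scratch, any colleague who clones the repository can reproduce the entire report in one step — the "can you make a build in one step?" test from Joel's list.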
[Diagram: the git repository on a shared fileserver, cloned to John’s PC and Jane’s PC]
Particular things that make this approach hum • Git • RStudio projects are a great way of organising • But Notepad++ users can still participate if they use R shortcuts in the root folder of the repo • Clean, pared-back, modular scripts essential for readability • Create your own palette, ggplot2 themes, font variables and functions for image dimensions and resolution • Resource for oversight, coordination and ensuring the build works • The manager needs to be technical enough to dive into the repo • You wouldn’t have a policy manager who couldn’t use Word • Clear spec – or the ability to take an agile, iterative approach with the client
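The shared palette, theme and image-dimension functions mentioned above might be defined once in a file every analysis script sources. A sketch, assuming illustrative colours, fonts and function names (none of these are the actual MBIE definitions):

```r
library(ggplot2)

# Hypothetical shared palette -- defined once so every plot in the
# report uses the same colours (values here are illustrative)
report_palette <- c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")

# Hypothetical shared theme built on top of a stock ggplot2 theme
theme_report <- function(base_size = 10, base_family = "sans") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(panel.grid.minor = element_blank(),
          plot.title = element_text(face = "bold"))
}

# One function controls image dimensions and resolution everywhere,
# so plot polishing never depends on who exported the file
save_report_plot <- function(plot, file, width_cm = 17, height_cm = 10) {
  ggsave(file, plot,
         width = width_cm / 2.54, height = height_cm / 2.54,  # cm to inches
         dpi = 300)
}
```

Centralising these in one sourced file is what makes "extreme code-based plot polishing" maintainable: a branding change is one edit, and every plot in the 200-plot report picks it up on the next build.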
Joel’s 12-point test for software developer teams • Do you use version control for your code?* • Can you make a build in one step? • Do you make frequent builds (at least daily)? • Do you use an issue-tracking system?* • Do you fix bugs before writing new code? • Do you have an up-to-date schedule? • Do you have a spec? • Do programmers have quiet working conditions? • Do you use the best tools available?* • Do you have testers? (not sure this one’s relevant) • Do new candidates write code during their selection? • Do you do hallway usability testing? Surprisingly relevant for analytics teams too Tweaked (*) from http://www.joelonsoftware.com/articles/fog0000000043.html
Five things needed for successful capability building • External demand • Sustained management commitment • Resourcing for trialling, experiments and intensive customised training • Supportive IT team and environment • Preparedness for the process to take years rather than months
Some particular issues in government • Demand from Ministers and senior management is essential • Courage required to raise expectations • Need to push some boundaries • Work with, not against, your ICT team • Common goals • Recognise where ICT projects are needed and when to use “BAU” • Balance of waterfall v. agile and beyond • But be prepared to use personal machines as a trial environment for new tools and techniques • The only way to know what you want to invest in – high costs in packaging up new software for locked-down networks • A team of significant size is essential to build momentum • Recent developments only possible for us with the creation of MBIE