IT Metrics/Dashboards at Duke: Curation, Automation, Aggregation?
CSG January 11, 2012
Background
• Initiated metrics effort in 2008 (1 FTE dedicated)
• Aligned with finance initially, now integrated with service management team
• Focused on availability, capacity, service usage/demand, internal resources
• Consult with units on what metrics they should capture
• Help with collection and analysis of metrics
• Ensure consistent, universal reporting of data
Initial scope and progress
• Monthly reporting process for managers and community
• Internal: data collection, report development, management review, publication of a detailed report
• External: availability, usage, and performance summaries
• Historically a limited range of services: email, network/voice, paging, HR, IT security, telepresence
The challenges of our “curation” process
• Labor-intensive
• Monthly periodicity limits immediacy: fine for long-term trending, but less useful for timely reporting
• Reliance on local units to self-report
• Leveraging Duke’s post-incident review (PIR) process, but the remaining data is still sui generis
Curation: work by content specialists immersed in a specialized discipline and imbued with analysis
Duke’s metrics curation challenges
• No one wants to read “yesterday’s news,” which is hard to avoid with curation
• The stories are too long: a new monthly executive summary focused on trends (like the WSJ’s “What’s News” box) has helped, to a point
• Focus on putting out a daily paper takes away from the “longreads”: time spent producing manual reports is time not spent consulting with units to help them become the primary data-gatherers
Automation: leveraging monitoring data
• System/network monitoring produces thousands of data points a day; how can we use them?
• Daily low-level alert analysis using SPC (statistical process control) methodology (figure not reproduced; a minimal sketch follows)
• Not shown: loss-of-redundancy reports
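To make the SPC idea concrete, here is a minimal, hypothetical sketch, not Duke's actual tooling: it assumes daily alert counts are available as plain numbers, computes Shewhart control limits (mean ± 3σ) from a baseline window, and flags a day whose alert volume falls outside those limits. The data values and function names are invented for illustration.

```js
// Hypothetical sketch: flag an anomalous day of monitoring alerts using
// Shewhart control limits (mean ± 3 sigma) computed from a baseline window.
// Data values and names are invented for illustration.

function controlLimits(baseline) {
  const mean = baseline.reduce((sum, c) => sum + c, 0) / baseline.length;
  const sigma = Math.sqrt(
    baseline.reduce((sum, c) => sum + (c - mean) ** 2, 0) / baseline.length
  );
  return { mean, upper: mean + 3 * sigma, lower: Math.max(0, mean - 3 * sigma) };
}

// Eight "normal" days of low-level alert counts, then today's count.
const baseline = [41, 39, 44, 38, 42, 40, 45, 37];
const today = 112;

const { upper, lower } = controlLimits(baseline);
if (today > upper || today < lower) {
  console.log(
    `Alert volume ${today} is outside control limits ` +
      `[${lower.toFixed(1)}, ${upper.toFixed(1)}]; route to operations review`
  );
}
```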
Duke’s metrics automation challenges
• Automation is a great start, but plenty of “curation” still occurs!
• Monitoring’s “service dashboard” is useful, but not ready to be published directly
• Monitoring events don’t (and can’t) catch everything
• Human adjustment of the raw data is still needed
• Low-level alert reports are automated… and then reviewed daily by operations staff
• Ultimately, there are countless ways to leverage the data; the question is what you actually care about
Duke’s aggregation questions
• What data mix should appear?
• Curated data from monthly metrics, a live report on high-priority user tickets, performance graphs, and availability measures against targets, all on one screen
• Broad enough to be watched; specific enough to be useful
• What platform should we use?
• SharePoint? Improve collection and visualization
• ServiceNow? Leverage dashboards, APIs, and the data structure; easy access to support tickets
• JavaScript/JSON components for easily customized dashboards? node.js and d3.js (see the sketch after this slide)
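As a sketch of the aggregation idea, the minimal Node.js example below exposes a single dashboard endpoint that merges the kinds of sources listed above: curated monthly metrics, a live high-priority ticket count, and availability targets. The three loader functions, their data shapes, and the URL are hypothetical stand-ins; a real implementation would query the monthly report store, the ServiceNow API, and the monitoring system.

```js
// Hypothetical sketch of an aggregated dashboard feed: one Node.js endpoint
// that merges several data sources into a single JSON payload. The loader
// functions below return canned data standing in for the real sources.
const http = require("http");

async function loadCuratedMetrics() {
  // Stand-in for the curated monthly metrics report.
  return { email: { uptimePct: 99.95 }, network: { uptimePct: 99.99 } };
}

async function loadOpenHighPriorityTickets() {
  // Stand-in for a live ServiceNow query on high-priority incidents.
  return { count: 3, oldestOpenHours: 5 };
}

async function loadAvailabilityTargets() {
  // Stand-in for the availability targets each service is measured against.
  return { email: 99.9, network: 99.95 };
}

http
  .createServer(async (req, res) => {
    if (req.url !== "/dashboard.json") {
      res.writeHead(404);
      return res.end();
    }
    const [metrics, tickets, targets] = await Promise.all([
      loadCuratedMetrics(),
      loadOpenHighPriorityTickets(),
      loadAvailabilityTargets(),
    ]);
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ metrics, tickets, targets, asOf: new Date() }));
  })
  .listen(8080);
```

A d3.js front end could then poll the aggregated endpoint and render each piece on one screen, which keeps the visualization layer decoupled from however each underlying source is collected.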