Generating Big Value from Big Data
Chicago Big Data Executive Summit, June 12, 2013
Using Data to Derive Value
Lessons Learned:
• Data size is relative to an organization’s ability to make use of it
• Assumptions and bias can get in the way
• The best insights are actionable
Speaker Introduction
R. Brendan Aldrich
Executive Director, Data Warehousing, City Colleges of Chicago
• 18 years in Information Technology
• 13 years running data warehouse, business intelligence and analytics teams for global high-volume data companies such as The Walt Disney Company, Travelers Insurance and Demand Media
• Currently building a data democracy at the City Colleges of Chicago
• TDWI and AERA membership
The City Colleges of Chicago is the largest community college district in the state of Illinois and one of the largest in the country, with more than 5,800 administrators, staff and faculty educating over 120,000 students annually at facilities located within the city of Chicago.
Colleges:
• Richard J. Daley College
• Kennedy-King College
• Malcolm X College
• Olive-Harvey College
• Harry S Truman College
• Harold Washington College
• Wilbur Wright College
Satellites:
• Lakeview Learning Center
• Dawson Technical Institute
• West Side Learning Center
• South Chicago Learning Center
• Arturo Velasquez Institute
• Humboldt Park Vocational Education Center
Culinary:
• The French Pastry School
• Washburne Culinary Institute
• Parrot Cage Restaurant
• Sikia Banquet Room
Broadcast:
• WYCC TV (Channel 20)
• WKKC FM 89.9
…as well as five child development centers, the Center for Distance Learning and the Workforce Institute
The Origin of Big Data
John Mashey, chief scientist at Silicon Graphics until 2000, gave hundreds of talks to small groups in the mid-to-late 1990s, using the term “Big Data” to describe how the boundaries of computing keep advancing. [1]
Gartner Group
• 2001: Doug Laney first uses “Volume, Velocity & Variety” to describe Big Data [2]
• 2012: Gartner updates the definition to: “Big data are high volume, high velocity and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process automation”
Datafication is Driving Big Data
Datafication: Creating new data that didn’t previously exist in digital form
The more you know about your customer, the better you can differentiate yourself from your competitors.
Disney’s MagicBands [3]
• Customer Value:
  • Disney’s MagicBands will allow park guests to access the park, sign up for ride waitlists (FastPass), interact with characters, purchase items, locate lost parents, etc.
• Company Value:
  • What type of guest are you, and how do you route through the park (rides, concessions, shows, purchases, etc.)?
  • Route optimization, scheduling, ride balancing
  • Know your customer
• Worldwide: 121.4 million guests (2011)
• Florida: 17.1 million guests (2011)
Getting to Big Value (or… Don’t Miss the Trees for the Forest)
• Gathering vs. Understanding
• Assumptions
• Bias
Barrier #1: Gathering vs. Understanding
“Big Data is not defined by its data management challenges, but by the organization’s capabilities in analyzing the data, deriving intelligence from it, and leveraging it to make forward looking decisions.” [4]
- Isaac Sacolick, VP Technology at McGraw-Hill Construction
The “Understanding” Market Takes Off
http://www.bigdatalandscape.com/
Value Derived from Human Interaction
“Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations.” [5]
- Kate Crawford, Principal Researcher @ Microsoft Research
What Does Your Data Weigh?
Data classification based on the value being derived from the data
• Light Data: Easily quantifiable measures and facts
• Mid-Weight Data: Interesting data; trends; patterns
• Heavy Data: Rich, meaningful, verified, and actionable data
Barrier #2: Assumptions
People inherently make assumptions… which can lead you to find what you expect as opposed to the marketable anomalies
Netflix
DVD rental and video streaming company with over 33 million subscribers (29 million streaming) in 40 countries
Big Data Stats:
• More than 50 Cassandra clusters with over 750 nodes
• More than 50,000 reads & 100,000 writes per second
• Claims 75% of its subscribers are influenced by what it suggests they will like [6]
House of Cards
• Netflix’s data indicated that the same subscribers who loved the original BBC production of “House of Cards” also loved movies starring Kevin Spacey or directed by David Fincher. [7]
• Netflix has committed $100 million to create two 13-episode seasons.
Were They Right?
• From a data standpoint, it’s hard to know since Netflix doesn’t release viewership numbers.
• But how else could we evaluate?
  • Facebook likes: 206k
  • Twitter: 34,706 followers
  • Mainstream culture
    • Magazine covers?
    • Talk shows?
    • What do you hear?
• What could we conclude?
Barrier #3: Bias
“Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.” [5]
- Kate Crawford, Principal Researcher @ Microsoft Research
Classification of Bias [8]
• Cognitive: Misunderstanding of the probabilities.
• Selection: Most available, convenient and/or cost-effective as opposed to most relevant.
• Sampling: Most relevant to a subset that may not hold true in the wider population.
• Modeling: Biased assumptions drive selection of wrong variables.
• Funding: Assumptions, interpretations, data and applications skewed to favor the funding party.
• Representation: Larger data sets do not ensure that the data is representative.
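The sampling and representation points above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the customer mix, the spending figures and the function name `simulate` are hypothetical and do not come from the talk. It compares a large sample that over-represents one customer group with a much smaller random sample.

```python
import random
import statistics


def simulate(seed: int = 0):
    """Return (true_mean, biased_sample_mean, random_sample_mean)."""
    rng = random.Random(seed)

    # Hypothetical population: 80% "typical" customers and 20% "heavy" users,
    # each with a made-up monthly spend.
    typical = [rng.gauss(20, 5) for _ in range(80_000)]
    heavy = [rng.gauss(60, 10) for _ in range(20_000)]
    population = typical + heavy
    true_mean = statistics.fmean(population)

    # Large but unrepresentative sample (50,000 records): heavy users make up
    # 30% of it instead of 20%, e.g. because they are easier to collect from.
    biased = rng.sample(heavy, 15_000) + rng.sample(typical, 35_000)

    # Small but representative sample: 1,000 records drawn at random.
    small = rng.sample(population, 1_000)

    return true_mean, statistics.fmean(biased), statistics.fmean(small)


if __name__ == "__main__":
    true_mean, biased_mean, random_mean = simulate()
    print(f"true mean:             {true_mean:.2f}")
    print(f"50,000 biased records: {biased_mean:.2f}")
    print(f"1,000 random records:  {random_mean:.2f}")
```

Under these assumptions the 50,000-record sample lands several units away from the true average, while the 1,000-record random sample stays close to it. More data did not help, because the collection process itself was skewed.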
Accounting for Bias [9]
• Know Your Enemy: Be aware of biases that may affect your analysis. Document them as part of your results.
• Make Use of Subject Matter Experts: Validate your results with domain experts and use them to test your findings and algorithms.
• Continuous Exploration: Don’t settle for satisfactory! Investigate the anomalies and explore the data outside of your focus.
Generating Big Value
• Big Data is quantitative
• Deriving meaningful insights requires people
• Managing assumptions and bias increases value
• Insights identified can be acted upon
• Insights acted upon must be continually reviewed
Anything Else?
Rise of the Data Democracy
“Humans are not an important part of utilizing new data, they are [the] single most important part of the process.” [11]
- Bryce Maddock, CEO of TaskUs.com
Building a Data Democracy: Enable Everyone with Access
• The right data must be available in all areas of the organization.
• Access to and use of data will create positive and lasting change.
• All City Colleges of Chicago employees will be able to use this platform to obtain data and/or run reports.
Only part of this challenge is licensing cost! Organizational acceptance, tool selection, bandwidth, data comprehension and accessible training are critical!
Building a Data Democracy: One-Size Does Not Fit All
A unified data warehouse and web-based interface for accessing and interacting with data
• Reports
• …User-Created Dashboards
• …and Interactive Analytics for all users
Building a Data Democracy: Increase Data Comprehension & Skills
Integrated Data Dictionary and Online Training
By integrating necessary reference and training information directly into the analytics website, we enable our employees to know with certainty what their data means and how to use it effectively.
Takeaways
Generating Big Value from Big Data:
• Datafication is driving differentiation in the marketplace
  • Collect the data that drives your business
• The value in Big Data is derived from human insight
  • How much does your data weigh?
• Be aware of Assumptions and Bias in your approach
  • Evaluate what does and doesn’t benefit your analysis
• Enable everyone with the right data to succeed
  • Data democracy
References
Articles:
[1] Steve Lohr, The New York Times, “The Origins of ‘Big Data’: An Etymological Detective Story”, http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/
[2] Doug Laney, Blog, “Deja VVVu: Others Claiming Gartner’s Construct for Big Data”, 1/14/12, http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/
[3] Jules Polonetsky, LinkedIn Post, “Magic Lessons for Retailers”, 5/31/13, http://www.linkedin.com/today/post/article/20130531031125-258347-magic-lessons-for-retailers
[4] Isaac Sacolick, Blog, “What is Big Data The Real Challenges Beyond Volume, Velocity and Variety”, 12/11/12, http://blogs.starcio.com/2012/12/what-is-big-data-real-challenges-beyond.html
[5] Kate Crawford, Blog, Harvard Business Review, “The Hidden Biases in Big Data”, 4/1/13, http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html
[6] Andrew Leonard, Salon, “How Netflix is turning viewers into puppets”, 2/1/13, http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/
[7] Mary McNamara, Los Angeles Times, “Netflix’s ‘House of Cards’ looks, but doesn’t sound, like a hit”, 4/27/13, http://articles.latimes.com/2013/apr/27/entertainment/la-et-st-house-of-cards-netflix-20130427
[8] James Kobielus, IBM Big Data Hub, “Data Scientist: Bias, Backlash and Brutal Self-Criticism”, 5/16/13, http://www.ibmbigdatahub.com/blog/data-scientist-bias-backlash-and-brutal-self-criticism
[9] Haowen Chan and Robin Morris, GigaOm, “Careful: Your big data analytics may be polluted by data scientist bias”, 5/4/13, http://gigaom.com/2013/05/04/careful-your-big-data-analytics-may-be-polluted-by-data-scientist-bias/
[10] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers, McKinsey Global Institute, “Big data: The next frontier for innovation, competition and productivity”, 5/11, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
[11] Bryce Maddock, Blog, “People and Big Data: Separately Good, Together Great”, 9/26/12, http://www.huffingtonpost.com/bryce-maddock/big-data_b_1908358.html
Infographics:
• Mushroom Networks, Infographic, “Landscape of Big Data”, 2013, http://www.mushroomnetworks.com/infographics/landscape-of-big-data
• Graeme Noseworthy, Infographic, “The Flood of Big Data”, 4/24/12, http://analyzingmedia.com/2012/infographic-big-flood-of-big-data-in-digital-marketing/
• IBM Big Data Hub, Infographic, “Tuning Into Big Data As The Buzz Gets Louder”, 9/26/12, http://www.ibmbigdatahub.com/infographic/tuning-big-data-buzz-gets-louder