Expanding Big Data Science: Forward & Backward. C. Randall (Randy) Howard, Ph.D., PMP Big Data Scientist, Thought Leader, Systems Innovation Analyst, Solutions Architect Sr. Data Scientist, Novetta Solutions Adjunct Professor, Mason’s Volgenau School of Engineering choward@gmu.edu
Expanding Big Data Science: Forward & Backward C. Randall (Randy) Howard, Ph.D., PMP Big Data Scientist, Thought Leader, Systems Innovation Analyst, Solutions Architect Sr. Data Scientist, Novetta Solutions Adjunct Professor, Mason’s Volgenau School of Engineering choward@gmu.edu http://www.crhphdconsulting.net/ May 20, 2014 April 4, 2013 Technology Trends, Big Data and Data-Driven Decisions
C. Randall (Randy) Howard, Ph.D., PMP • Senior Data Scientist, Novetta Solutions • Adjunct Professor, Volgenau School of Engineering, GMU • Big Data Overview • Systems Analysis & Design Determining Needs in Big Data • Big Data, Small Details & Time (Metadata) • 2013 Teaching Excellence Award Nominee • Co-Organizer of Big Data Lecture Series, EIT Award Nominee • Member, Data Science Working Groups & Sub-teams • International Author & Speaker • 30 years IT & systems engineering, architecture, trouble-shooting, change & innovation • Ph.D., Information Technology, GMU • BS, MS: Information Systems, VCU
Agenda Context: What is Big Data All About? Forward: Considering Multiple Perspectives Backward: Refactor/Repurpose Legacy Approaches
Context of Material • How was the big data collected? • Empirical Observations & Applications • Critical Thinking • Where is it stored? • Case Studies • Feverishly Codifying • Move from Rescuing to Preventing • What are the results? • Clarifying and Connecting Disparate, Contentious Pieces • Still Working…
My Positions on Big Data • Big Data Science • Big Data: Problem & Opportunity Space • Data Science: Potential Solution Discipline • Big Data Science: “Applying Data Science to Big Data” • Technology “Reboot” CAN Usher in New Generation of Capabilities • Big Data Today • New “Big Data” Tomorrow • Must Clarify Business Value • Have To Think Horizontally & Corporately • But, I am a professor… • Heresy Now? Genius Tomorrow?
IT Disasters & Dilemmas: Possible w/ Big Data?[IT-Failures] Disasters Dilemmas Economic Winter (Do more w/ Less) What is it? Exactly? NSA Trailblazer* $1.2B: over-budget, ineffective, 7-yr boondoggle FBI’s Trilogy Virtual Case File* $170M:Scrapped UK Inland Revenue* $3.5B:Software Errors Obama Care? Ford’s Purchasing System* $400M:Abandoned
My Big Concern!! Hype cycle stages, in order [Aiken] [Gartner]: • Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven. • Peak of Inflated Expectations: Early publicity produces a number of success stories, often accompanied by scores of failures. Some companies take action; many do not. • Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters. • Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third-generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious. • Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology’s broad market applicability and relevance are clearly paying off. • Curve of Complacency[crh]: Early successes satisfy stakeholders that the problem or opportunity is handled, and it is time to move on to the next issue. Meanwhile, the Plateau of Productivity that is achieved is much lower. (Dr. C. Randall Howard, PMP; not a position of Gartner or Dr. Aiken, yet)
Big Data & Data Science “1-Page Summary” • Big Data “V”s[IBM]: • Volume (How much in total) • Variety (How many sources) • Velocity (How fast does it come in) • Veracity, Variability, Complexity, etc.[various] • “Hard” Data Science[various] • Math, Science, Analytics • Data-Driven Organizations • Creating data products • Looking to the future • “Soft” Data Science? (Hold on) NOTIONAL DEPICTION Creation & Collection Capabilities Capability gaps due to surges in data collections Data V’s [Conway] • Increases in Sensors • Social Media • Mobile Data Processing & Analytical Capabilities Time
Soft Data Science [crh] Changing Term to Tacit Data Science, but that’s another talk Shrink the Capability Gap Creation & Collection Capabilities NOTIONAL DEPICTION • Hardening the “Soft” • Automate “Hard-to-Automate” • Predict Predictable • To-be Performed by Many w/ “Soft” & “Hard” Data Science Data V’s • Backlogs increase exponentially • Signals become noise • “Action” windows lost / missed • We become bottlenecks to partners “Soft Head Start” w/ “Hard” Data Science Alone Processing & Analytical Capabilities • Notoriety to date • Performed by a few • Bottlenecked by a few? Time
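The capability gap on the two notional depictions above can be sketched as a simple backlog model. This is only an illustration under assumed numbers: the function name `simulate_backlog`, the growth rates, and the starting volumes are hypothetical and do not come from the slides. It shows how a backlog accumulates when collection grows faster than processing capability, and how it disappears when the two grow together.

```python
def simulate_backlog(periods, collected0, capacity0, collect_growth, capacity_growth):
    """Return the unprocessed backlog at the end of each period.

    All inputs are hypothetical: volumes are in arbitrary units and the
    growth rates are fractional increases per period (0.5 means +50%).
    """
    backlog = 0.0
    collected = float(collected0)
    capacity = float(capacity0)
    history = []
    for _ in range(periods):
        # Whatever cannot be processed this period carries over to the next.
        backlog = max(0.0, backlog + collected - capacity)
        history.append(backlog)
        collected *= 1 + collect_growth
        capacity *= 1 + capacity_growth
    return history


# Collection grows 50% per period while processing capability stays flat.
gap_grows = simulate_backlog(3, 100, 100, 0.5, 0.0)
# Processing capability grows at the same pace as collection.
gap_closed = simulate_backlog(3, 100, 100, 0.5, 0.5)
print(gap_grows, gap_closed)
```

In this toy model, "shrinking the capability gap" corresponds to raising `capacity_growth` toward `collect_growth`; the slides argue that doing so takes both "soft" and "hard" data science.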
Big Data Science Value Parameters • Increased Actionable Intelligence • Trends Noticed / Confirmed • Leverage Unstructured • Faster Knowledge / Awareness / Ability to Search Data • Flexibility / Extensibility of Data Utilization • New, More Adaptable HW/SW Acquisition Models • More TBD
Other Big Data Considerations • Capabilities Their Own Separate ROI’s • Process Data w/in Acceptable Tolerances: • Time • Errors • Accuracy • Reliability • Etc. • Accountability: Find Critical Intelligence & Make Time Windows • Thus, Big Data Is “Having more data than you can process and manage within acceptable tolerances (e.g. time, quality, cost)”[crh]
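The working definition above ("more data than you can process and manage within acceptable tolerances") can be read as a check against thresholds rather than against raw volume. The sketch below is one possible reading, not the author's method: the class names, field names, and all numbers are hypothetical, and only the three tolerance dimensions (time, quality, cost) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    # Hypothetical thresholds an organization considers acceptable.
    max_hours: float
    max_error_rate: float
    max_cost: float


@dataclass(frozen=True)
class Workload:
    # Hypothetical estimates for processing one data set.
    hours_needed: float
    error_rate: float
    cost: float


def exceeded_tolerances(workload, tolerances):
    """List which tolerance dimensions (time, quality, cost) are exceeded."""
    exceeded = []
    if workload.hours_needed > tolerances.max_hours:
        exceeded.append("time")
    if workload.error_rate > tolerances.max_error_rate:
        exceeded.append("quality")
    if workload.cost > tolerances.max_cost:
        exceeded.append("cost")
    return exceeded


def is_big_data(workload, tolerances):
    """Under the slide's definition, data is "big" if any tolerance is missed."""
    return bool(exceeded_tolerances(workload, tolerances))


limits = Tolerances(max_hours=24, max_error_rate=0.01, max_cost=10_000)
nightly_feed = Workload(hours_needed=30, error_rate=0.005, cost=8_000)
print(exceeded_tolerances(nightly_feed, limits), is_big_data(nightly_feed, limits))
```

Note that data volume never appears in the check: by this definition the same data set can be "big" for one organization and routine for another, depending on their tolerances and capabilities.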
BDLS: A Broader Look Big Data Science • Each channel is difficult • Each complements the other • Complexities are compounded exponentially in cross-sections
Multiple Perspectives in Publications • Multi-disciplinary[Gartner-ERDS] teams[Patil] a “broad sample of the population” & involves “teams that frequently partner w/ diverse roles in an organization… to gather, organize, & make use of their data”[EMC-DS] • “Wetware[Gleichauf]” (vs. HW & SW): “People, their skillsets, corporate policies, & organizational structures that define our analytic communities” • Soft Skills[Gartner-ERDS]: • Communication • Collaboration • Leadership • Creativity • Discipline • Passion • Data Scientist can be invaluable…unique combination of technical & business skills…makes them difficult to find or cultivate. [Gartner-ERDS]
Data Science Teams • Data Science Teams[Patil] • Small-team members should sit close to each other • Mix of skill-sets, some experts, some not • Train people to fish • Functional areas must stay in regular contact and communication. • Impediments • Measuring Performance: Rewarding & Disciplining Teams vs. Individuals • Sharing Intellectual Property w/ Integrated Product Teams (esp. cross-vendor) • “Expert Teams”???? • “Expert Teams” • May find Big Data Science trivial • Typically • have more control over their environment • Don’t need to have the masses engaged But … • Most organizations need to have the knowledge & skills spread out to “Non-experts”
Life-Cycle Service Orchestration Acquisition (FAR) ? Legal Review Life Cycle OODA Loop
Wicked Problems Tip-off Words[Nixon] Networked Integrated Joint Shared Multi-organizational Interoperable Coalition Cross-organizational Community Combined Virtual Big Data is a Wicked Problem!
Wicked Problems[Nixon] • Requires Multiple Stakeholders’ Perspectives • Key Driver: Social Complexity from Integrated Networks • Traditional linear solution styles are not well suited • Needs focus on: • Social Aspects • Gaining Shared Understanding • Try Things • Let Solution Emerge From Cycle of Adaptation • Thus[crh], • Multiple Perspectives Involves Collaboration • Collaboration Technologies MUST BE INNOVATED
Sample Collaboration Innovation[InnovationGames] [InnovationGames] http://innovationgames.com/
Learning Organization [Senge] • Peter Senge (http://www.infed.org/thinkers/senge.htm) • Studied how adaptive capabilities developed • The Fifth Discipline(1990)‘Learning Organization' (LO) • Basic Learning Organization Disciplines: • Systems Thinking • Personal Mastery • Mental Models • Building Shared Vision • Team Learning
Changing Culture • Examples: • Hard-drives • Management Visibility of Data Processing • Target’s former CEO? • Leadership needs to foster a culture of: • Increased curiosity about data • Rewarding experimentation • Counting “Assists” • Need ‘democratization’, or open-access, of data[Patil] • Or Horizontal Orientation / Governance of Data[crh] • Not trivial - Sharing data exposes risks of: • Misinterpretation • Loss of “credit” associated with results from the data
Education • Establish a new baseline of knowledge to advance • Mason’s Big Data Lecture Series Purpose: • Separate Hype from Reality • Have marquee experts expose what in Big Data: • Is really working and making a difference? • Shows promise? • Has failed? Needs another try? • Are the impediments? • Convey that the daunting challenge is feasible, but still a challenge
Learning Revolution [Robinson] • Big Data Science is a REVOLUTION that starts (& continues) w/ LEARNING • Requires new skills • New leadership models • http://www.ted.com/talks/sir_ken_robinson_bring_on_the_revolution.html
What is Legacy? • What “brought us here” • Business Basics (e.g., Planning, ROI) • Structured Systems Analysis (e.g. Waterfall methodology, CMMI) • Yes, • Very Cumbersome • Have Failed too But… • Developed by Very Smart People • For Very Similar Issues • Been “Tested” So….. • Re-invent the Wheel? • To leverage: • Consider Context: Intent & Issues • Re-calibrate / Re-factor For Today • Come Back to “Common Sense”, What Works • Examples: • Meeting Management • Scaled Agile
Enterprise Architecture • “Process of translating business vision and strategy into effective enterprise change by creating, communicating and improving the key requirements, principles and models that describe the enterprise's future state and enable its evolution.”[Gartner-EA] • Short: Simple Structure & Alignment of Technical & Business Capabilities So…. • Take “Business Back to IT”[crh] • Maintain Line-of-Sight to Value[crh] • Focus on the Mission and Mission Capabilities!
Capability Dependencies Hierarchy Example: Tool x requires staff time for training & learning
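A capability dependency hierarchy like the one named on this slide can be sketched as a small directed graph that is walked in dependency order. Only "Tool x" and the staff time for training & learning come from the slide; every other capability name below is a hypothetical placeholder, and the use of Python's standard `graphlib` module (Python 3.9+) is simply one convenient way to order the nodes.

```python
from graphlib import TopologicalSorter

# Each capability maps to the set of capabilities it depends on.
# Only "tool x" and the training-time entry come from the slide;
# the remaining names are hypothetical.
dependencies = {
    "actionable intelligence": {"analytics on tool x"},
    "analytics on tool x": {"tool x", "trained staff"},
    "trained staff": {"staff time for training & learning"},
    "tool x": set(),
    "staff time for training & learning": set(),
}


def build_order(deps):
    """Return capabilities ordered so that prerequisites come first."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
for step, capability in enumerate(order, start=1):
    print(step, capability)
```

Walking the hierarchy this way makes hidden prerequisites visible: the mission-level capability at the top cannot be delivered until the staff time at the bottom has been budgeted.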
Strategic Planning Survey[Bain] • 14-year Compilation of: • 11 Surveys • 8,504 respondents 2006: 88% 3.93
Strategy to Tactics Line-of-Sight[crh] • Establish Enterprise-wide Decision Criteria • Convey & Carry Commander’s Intent to Execution Levels
Engineering “Risky Art” Landscape • Most impactful, hardest to tame, most ignored • Least concrete, hardest to sell / prove • Needs the most “innovation attention”
A Big Data Systems Analysis & Engineering “Success” Story Lots of ways to do this. Lots of requirements. Lots of ways to get requirements across lots of different stakeholders Users Big Data Lecture Series Fall 2012 Session 4: Solving the Risk Equation Big Data Systems Analysis & Engineering “So-What”
Big Data Science Postulates[crh] • If Big Data Science is not a technology problem, then let’s focus on the PROBLEM: the non-technology side, or the human-side. • We must perfect the blending of disciplines to educate & train on Big Data Science (vs. perfecting specific disciplines) • Doing what you are doing will not get you out of the fix you are in since it got you in the fix in the first place – innovate and improve! • Our Big Data Science, Analytics & Intelligence is an ENVIRONMENT and a SYSTEM, not an APP
Big Data / Data Science Postulates (cont’d) Final Quiz: Where do we start? LEARNING!
One last time… How did we do?
References • [1000v] URL: http://www.1000ventures.com/design_elements/selfmade/quaity_cost-4components_6x4.png • [Aiken] Dr. Peter Aiken, Data Blueprint, 2012-2013 • [arcweb] http://www.arcweb.com/events/arc-orlando-forum/pages/analytics-for-industry.aspx • [asq] URL: http://asq.org/learn-about-quality/cost-of-quality/overview/read-more.html • [Bain] http://www.bain.com/management_tools/management_tools_and_trends_2007.pdf • [Barbara’] Dr. Daniel Barbara’, George Mason University, 2012 Big Data Lecture Series • [Batni] Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. 2009. Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41, 3, Article 16 (July 2009), 52 pages. DOI=10.1145/1541880.1541883 http://doi.acm.org/10.1145/1541880.1541883 • [Conway] http://www.drewconway.com/zia/?p=2378 • [coq] URL: http://costofquality.org/wp-content/uploads/2011/02/Cost-of-Quality.jpg • [crh] Dr. C. Randall Howard, PMP, crhPhDConsulting.net • [Crosby] http://www.philipcrosby.com/25years/crosby.html • [ct-bdtech] http://cloudtimes.org/2013/06/13/big-data-techniques-for-analyzing-large-data-sets-infographic/ • [dddm] http://www.clrn.org/elar/dddm.cfm • [DTIC] http://www.dtic.mil/doctrine/new_pubs/ • [econBD] http://www.economistinsights.com/analysis/evolving-role-data-decision-making, August 12th 2013 • [EMC-DS] http://www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf • [Forbes] http://www.forbes.com/sites/christopherfrank/2012/03/25/improving-decision-making-in-the-world-of-big-data/ • [FSAM/BAH] http://www.fsam.gov/about-federal-segment-architecture-methodology.php • [Gartner-EA] http://www.gartner.com/technology/it-glossary/enterprise-architecture.jsp • [Gartner-ERDS] "Emerging Role of the Data Scientist and the Art of Data Science", Gartner, 20 March 2012, ID:G00227058, Douglas Laney, Lisa Kart • [Gartner-HC] http://www.gartner.com/newsroom/id/1763814
References • [gayatri-patele-bay] http://www.slideshare.net/AsterData/gayatri-patele-bay • [Gleichauf] See Bob Gleichauf’s article: http://www.iqt.org/technology-portfolio/on-our-radar/Big_Data_Advanced_Analytics.pdf • [IBM-usingBD] ftp://ftp.software.ibm.com/software/tw/Using_Big_Data_for_Smarter_Decision-Making_v.pdf • [IBM] http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/index.html?cmp=dw&cpb=dwinf&ct=dwnew&cr=dwnen&ccy=zz&csr=051211 • [IBM-Analytics] http://www-935.ibm.com/services/multimedia/Analytics_The_real_world_use_of_big_data_in_Financial_services_Mai_2013.pdf • [Infocus] http://infocus.emc.com/robert_abate/the-business-case-for-big-data-part-1/ • [Infostory] http://infostory.com/2012/03/28/data-information-knowledge-web/ • [InnovationGames] http://innovationgames.com/ • [IT-Failures] http://it-project-failures.blogspot.com; http://it.slashdot.org/submission; http://www.sfgate.com • [Lwanga] The Job of the Information/Data Quality Professional (2010) Lwanga, Walenta, Talburt (IAIDQ Publication) • [Madnick] Stuart E. Madnick, Richard Y. Wang, Yang W. Lee, and Hongwei Zhu. 2009. Overview and Framework for Data and Information Quality Research. J. Data and Information Quality 1, 1, Article 2 (June 2009), 22 pages. DOI=10.1145/1515693.1516680 http://doi.acm.org/10.1145/1515693.1516680 • [Mason-BDLS] George Mason University Volgenau School of Engineering Big Data Lecture Series, 2011-2012 • [MIT] http://lean.mit.edu/downloads/2010-theses/view-category.html • [Nixon] steven.d.nixon@gmail.com - 08/29/2011, Mason Big Data Lecture Series 2011 • [Nonaka, Hirotaka, Knowledge-Creating Company] Nonaka, Ikujiro, and Hirotaka Takeuchi. The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford University Press, USA, 1995. • [O’Reily] https://docs.google.com/present/view?hl=en_US&id=0AXaXKp9bt6OXZGd4YzlnYmRfNThjMmo4dm5yaA from “What is data science?”, O'Reilly Radar
• [p36] http://information-retrieval.info/taipale/papers/p36-popp.pdf • [Patil] Patil, D.J., Building Data Science Teams, 2011 • [RG] http://www.riskglossary.com/link/risk_metric_and_risk_measure.htm • [Robinson] http://www.ted.com/talks/sir_ken_robinson_bring_on_the_revolution.html • [Sagan] Dr. Philip Sagan, Infiniti, 2012 Big Data Lecture Series • [Senge] http://www.infed.org/thinkers/senge.htm • [Talburt] Dr. John Talburt, 2012 Big Data Lecture Series • [Tandem] http://www.tandemlabs.com/documents/CPSA2008.pdf
J. C. R. Licklider's Man-Computer Symbiosis[Aiken] The best approaches combine manual and automated reconciliation!