Managing the Data Lifecycle of Big Data Environments
Brian Vile, Program Director, InfoSphere Product Marketing
Agenda • Trends • Information Governance for Big Data • Recent II&G for Big Data announcements
The Era of Big Data Demands Confidence
• Volume (data at scale): terabytes to petabytes of data
• Variety (data in many forms): structured, unstructured, text, multimedia
• Velocity (data in motion): analysis of streaming data to enable decisions within fractions of a second
• Veracity (data uncertainty): managing the reliability and predictability of inherently imprecise data types
IIG (information integration and governance) maturity is a key characteristic of big data initiatives
IIG required for big data to go live
Survey question: "How important was having information integration and governance…"
[Chart: responses by deployment stage: Production, Pilot, Sandbox]
Base: Variable; Director or VP level professionals with decision-making authority for Big Data technologies
Source: "IBM Data Governance", a commissioned study conducted by Forrester Consulting on behalf of IBM, July 2013
Security comes first; access and quality are top of mind
Survey question: "What best describes how you govern 'big data' today?" (top 5 responses)
Base: 512 Director or VP level professionals with decision-making authority for Big Data technologies
Source: "IBM Data Governance", a commissioned study conducted by Forrester Consulting on behalf of IBM, August 2013
Agenda • Trends • Information Governance for Big Data • Recent II&G for Big Data announcements
InfoSphere IIG Covers Both Analytical and Operational Use Cases
Analytical: Enhanced 360° View of the Customer, Big Data Exploration, Security/Intelligence Extension, Operations Analysis, Data Warehouse Augmentation
Operational: Enhanced 360° View of the Customer, Application Development & Testing, Application Efficiency, Security & Compliance, Application Consolidation & Retirement
The 5 Key Analytical Use Cases
• Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Big Data Exploration: find, visualize, and understand all big data to improve decision making
• Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
• Operations Analysis: analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
Data Warehouse Augmentation
Integrate big data and data warehouse capabilities to increase operational efficiency
Challenges
• Leveraging structured, unstructured, and streaming data sources for deep analysis
• Low latency requirements
• Query access to data
• Optimizing the warehouse for big data volumes
IIG capabilities
• High-performance, high-quality data loads
• Archiving to ensure performance and compliance and to lower costs
• A standardized approach to discovering your data assets
• Metadata management
• Database activity monitoring
IIG Is Essential: Ingest, Understand, and Govern Data
[Diagram: All Data Sources → Big Data Platform → Advanced Analytics/New Insights → New/Enhanced Applications]
• All data sources: streaming data, text data, applications data, time series, geospatial, video & image, relational, social network
• Big data platform capabilities: information ingest, real-time analytics, warehouse & data marts, analytic appliances
• Advanced analytics/new insights: Watson/cognitive (learn dynamically?), prescriptive (best outcomes?), predictive (what could happen?), descriptive (what has happened?), exploration and discovery (what do you have?)
• New/enhanced applications: alerts, automated process, case management, analytic applications, cloud services, ISV solutions
Open Architecture/Multiple Product Entry Points
[Diagram: IBM Big Data & Analytics Reference Architecture]
Components: Information Ingestion and Integration; Real-time Analytics; Data Exploration; Enterprise Warehouse; Data Marts; Archive; Information Governance, Security and Business Continuity
Enhanced 360° View of the Customer
Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
Challenges
• Need a deeper understanding of customer sentiment from internal and external sources
• Desire to increase customer loyalty and satisfaction
• Difficulty getting the right information to the right people for cross-sell and up-sell
IIG capabilities
• Leverage pre-built domains and extend custom data domains
• Use the business services library
• Analyze, validate and monitor data quality; cleanse and enrich data
• Search probabilistically
• Integrate data of any complexity from diverse sources
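"Search probabilistically" means matching customer records on the weight of evidence rather than on an exact key. The sketch below illustrates the idea with Fellegi-Sunter-style log-likelihood weights; it is a swapped-in textbook technique, not the InfoSphere matching engine, and the field list, the m/u probabilities, and the thresholds are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative, not from the deck):
# m = P(field agrees | same customer), u = P(field agrees | different customers)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "phone": (0.85, 0.001),
}


def normalize(value) -> str:
    """Lowercase and drop punctuation/whitespace before comparing."""
    return "".join(ch for ch in str(value).lower() if ch.isalnum())


def match_score(a: dict, b: dict) -> float:
    """Sum log2 agreement/disagreement weights over fields present in both records."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if field not in a or field not in b:
            continue  # a missing value is treated as no evidence either way
        if normalize(a[field]) == normalize(b[field]):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def classify(score: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Map a score to match / review / non-match using assumed thresholds."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"


crm = {"name": "Erica Schafer", "zip": "78704", "phone": "512-555-0100"}
social = {"name": "erica schafer", "zip": "78704"}
print(classify(match_score(crm, social)))
```

Records whose score falls between the two thresholds go to a data steward for review rather than being merged automatically, which is the usual way to trade precision against manual effort.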
Improving the customer experience by better understanding behaviors drives almost half of all active big data efforts
[Chart: big data objectives. Categories: customer-centric outcomes, new business model, employee collaboration, operational optimization, risk/financial management]
Top functional objectives identified by organizations with active big data pilots or implementations. Responses have been weighted and aggregated. Total respondents: n = 1061
Source: 2011 IBM Global Chief Marketing Officer Study and 2012 IBM Global Chief Executive Officer Study
The 5 Key Operational Use Cases
• Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Efficient Application Development & Testing: create and maintain right-sized development, test and training environments
• Improve Application Efficiency: manage data growth, improve performance, and lower the cost of mission-critical applications
• Security and Compliance: protect data, improve data integrity, mitigate breach risks and lower compliance costs
• Application Consolidation and Retirement: archive old application data and streamline new application deployment
Security and Compliance
Protect data, improve data integrity, mitigate breach risks and lower compliance costs.
Challenges
• Inability to identify sensitive data
• Lack of a common definition of sensitive data elements
• Increasing number of regulations
• Shrinking time to comply
• Line-of-business (LOB) variances in privacy rules
• Difficulty monitoring privileged user access
IIG capabilities
• Discover and understand sensitive data in all systems
• Database and file-system-level activity monitoring
• Mask and redact sensitive data
• Compliance reporting
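Discovering sensitive data usually starts by sampling column values and testing them against known formats. The sketch below shows that idea with a few regular expressions; the pattern set, the 80% hit-ratio threshold, and the function names are hypothetical, and production discovery tools add checksums, dictionaries and column-name hints on top of pattern matching.

```python
import re

# Hypothetical detection patterns; real tools use richer rules than bare regexes.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}$"),
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
}


def classify_column(values, min_ratio: float = 0.8) -> list:
    """Return the sensitive types matched by at least min_ratio of non-empty sample values."""
    sample = [str(v).strip() for v in values if v not in (None, "")]
    if not sample:
        return []
    found = []
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= min_ratio:
            found.append(label)
    return found


def discover(table: dict) -> dict:
    """Map column name -> detected sensitive types; columns with no hits are omitted."""
    report = {}
    for column, values in table.items():
        labels = classify_column(values)
        if labels:
            report[column] = labels
    return report


sample_table = {
    "SSN": ["333-22-4444", "123-45-6789"],
    "City": ["Austin", "Elgin"],
    "Contact": ["a@example.com", "b@example.org"],
}
print(discover(sample_table))
```

The output of such a scan can feed a common definition of sensitive data elements, which addresses one of the challenges listed above.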
Mask data in databases and applications
Example record before and after masking:
Field        Before masking     After masking
Patient No   123456             112233
SSN          333-22-4444        123-45-6789
Name         Erica Schafer      Amanda Winters
Address      12 Murray Court    40 Bayberry Drive
City         Austin             Elgin
State        TX                 IL
Zip          78704              60123
Sensitive data to mask
• Names
• Geography
• Credit card numbers
• Telephone numbers
• Email addresses
• Social Security numbers
• Account numbers
• Certificate/license numbers
• Vehicle identifier numbers
• Web URLs
• IP addresses
• Business data
• Corporate intelligence
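One common way to produce masked values like those in the example record is deterministic, format-preserving substitution: the same input always yields the same fake value, so joins across tables stay consistent, while lengths and separators are kept so applications still accept the data. The sketch below is an assumption-based illustration built on a keyed HMAC; the key, the name lists and the field names are hypothetical, and it is not the masking algorithm of any IBM product. It also does not enforce validity rules (for example, real SSN area-number ranges).

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key-management system.
SECRET_KEY = b"replace-with-a-managed-secret"
# Hypothetical substitution lists for names.
FIRST_NAMES = ["Amanda", "Brian", "Carla", "David", "Elena", "Frank"]
LAST_NAMES = ["Winters", "Hale", "Ortiz", "Nguyen", "Baker", "Moss"]


def _digest(value: str, field: str) -> bytes:
    """Keyed SHA-256 of a value, scoped by field so equal values in different fields differ."""
    return hmac.new(SECRET_KEY, f"{field}:{value}".encode(), hashlib.sha256).digest()


def mask_digits(value: str, field: str) -> str:
    """Replace each digit deterministically, keeping length and separators such as SSN dashes."""
    digest = _digest(value, field)
    out, i = [], 0
    for ch in value:
        if "0" <= ch <= "9":
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_name(value: str) -> str:
    """Pick a realistic substitute name deterministically from the lists above."""
    digest = _digest(value, "name")
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_record(record: dict) -> dict:
    """Return a masked copy; fields not listed here (city, state, ...) pass through unchanged."""
    masked = dict(record)
    for field in ("patient_no", "ssn", "zip"):
        if field in masked:
            masked[field] = mask_digits(str(masked[field]), field)
    if "name" in masked:
        masked["name"] = mask_name(str(masked["name"]))
    return masked


original = {"patient_no": "123456", "ssn": "333-22-4444", "name": "Erica Schafer", "city": "Austin"}
print(mask_record(original))
```

Because the substitution is keyed, anyone holding the key can recompute the mapping; treat the key with the same care as the original data.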
Example: Semantic Masking Symptom Code 157:
Example: Semantic Masking Rules
• Age and income must be analyzed in a range
• Ethnicity and symptom codes must be non-identifiable
• Name, address and phone need to be masked
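The three rules above can be read as three different treatments: generalize numbers into ranges, replace codes with stable opaque tokens (so counts and groupings still work), and redact direct identifiers outright. The sketch below applies them to a single record; the bucket widths, the tokenization key, the token format and the field names are all hypothetical choices for illustration, not the product's rule syntax.

```python
import hashlib
import hmac

# Hypothetical parameters; choose bucket widths with your privacy and analytics teams.
AGE_BUCKET = 10
INCOME_BUCKET = 25_000
TOKEN_KEY = b"hypothetical-tokenization-key"


def to_range(value, width: int) -> str:
    """Generalize a non-negative number to the inclusive integer range that contains it."""
    low = (int(value) // width) * width
    return f"{low}-{low + width - 1}"


def tokenize(value, field: str) -> str:
    """Stable opaque token: equal inputs give equal tokens, but the code itself is hidden."""
    mac = hmac.new(TOKEN_KEY, f"{field}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"{field[:3].upper()}-{mac[:8]}"


def apply_semantic_rules(record: dict) -> dict:
    """Return a copy of the record with the three semantic masking rules applied."""
    out = dict(record)
    if "age" in out:
        out["age"] = to_range(out["age"], AGE_BUCKET)          # rule 1: ranges
    if "income" in out:
        out["income"] = to_range(out["income"], INCOME_BUCKET)
    for field in ("ethnicity", "symptom_code"):                # rule 2: non-identifiable
        if field in out:
            out[field] = tokenize(out[field], field)
    for field in ("name", "address", "phone"):                 # rule 3: masked
        if field in out:
            out[field] = "[MASKED]"
    return out


patient = {"name": "Erica Schafer", "age": 34, "income": 52000,
           "symptom_code": "157", "phone": "512-555-0100"}
print(apply_semantic_rules(patient))
```

A caveat on the design: deterministic tokens over low-cardinality fields such as ethnicity can be reversed by frequency analysis, so this treatment may need to be combined with suppression of small groups.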
Protect sensitive data in databases, data warehouses, big data environments and file shares
Hadoop Activity Monitoring (NEW) answers:
• Who is running specific big data requests?
• What MapReduce jobs are they running?
• What data are they accessing?
Big data environments covered: InfoSphere BigInsights, HDFS, MapReduce, Hive, HBase, CouchDB, Cassandra, MongoDB, Greenplum, Hortonworks
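At the HDFS level, "who is accessing what data" can be approximated from the NameNode audit log. The sketch below assumes the tab-separated `key=value` layout that HDFS audit entries commonly use after the `FSNamesystem.audit:` marker (check the exact format of your distribution); the sensitive path prefix and function names are hypothetical. It covers file access only, not which MapReduce jobs were run.

```python
from collections import Counter, defaultdict


def parse_audit_line(line: str):
    """Parse one HDFS audit entry into a dict of key=value fields; None for other lines."""
    marker = "FSNamesystem.audit:"
    if marker not in line:
        return None
    payload = line.split(marker, 1)[1].strip()
    fields = {}
    for part in payload.split("\t"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    # ugi often looks like "alice (auth:SIMPLE)"; keep only the user name
    fields["user"] = fields.get("ugi", "").split(" ")[0]
    return fields


def summarize(lines, sensitive_prefixes=("/data/claims",)):
    """Count (user, command) pairs and collect which users touched sensitive paths.

    sensitive_prefixes is a hypothetical example; list your own protected directories.
    """
    commands = Counter()
    sensitive_access = defaultdict(set)
    for line in lines:
        entry = parse_audit_line(line)
        if entry is None:
            continue
        commands[(entry["user"], entry.get("cmd", "?"))] += 1
        src = entry.get("src", "")
        if src.startswith(sensitive_prefixes):
            sensitive_access[entry["user"]].add(src)
    return commands, dict(sensitive_access)


log = [
    "2013-10-01 12:00:00,123 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:SIMPLE)"
    "\tip=/10.0.0.5\tcmd=open\tsrc=/data/claims/2013.csv\tdst=null\tperm=null",
    "2013-10-01 12:00:01,456 INFO FSNamesystem.audit: allowed=true\tugi=bob (auth:SIMPLE)"
    "\tip=/10.0.0.6\tcmd=listStatus\tsrc=/tmp\tdst=null\tperm=null",
    "unrelated log line",
]
print(summarize(log))
```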
Application Consolidation and Retirement
Archive old application data and streamline new application deployment with test data management, integration, and data quality.
Challenges
• Big data leads to more systems and a greater need to consolidate
• Manual data integration, quality, and archiving are slow and costly
• Difficulty ensuring legal compliance for data retention
• 10-40% of project effort is spent profiling, mapping and retiring data manually
IIG capabilities
• Discover and understand data in all systems
• Retain and dispose of data according to retention policies
• Efficient test data management
• Cleanse and consolidate data
• Rapidly load data into the new system
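"Retain and dispose according to retention policies" boils down to a per-record decision based on the record's age, the applicable schedule and any legal hold. The sketch below encodes that decision; the two-stage policy shape (online period, then archive until total retention ends) and all names are hypothetical, and real retention schedules are set by legal and records-management teams.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy shape: days online, then archived until total retention ends."""
    online_days: int  # how long a closed record stays in the production system
    retain_days: int  # total retention period, counted from the closing date

    def __post_init__(self):
        if not 0 <= self.online_days <= self.retain_days:
            raise ValueError("expected 0 <= online_days <= retain_days")


def disposition(closed_on: date, today: date, policy: RetentionPolicy,
                legal_hold: bool = False) -> str:
    """Decide what to do with a record: keep online, archive, retain, or dispose."""
    if legal_hold:
        return "retain"  # a legal hold overrides the normal schedule
    age_days = (today - closed_on).days
    if age_days < policy.online_days:
        return "keep online"
    if age_days < policy.retain_days:
        return "archive"
    return "dispose"


policy = RetentionPolicy(online_days=365, retain_days=7 * 365)
print(disposition(date(2010, 1, 1), date(2013, 11, 1), policy))
```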
Forrester's four fates for applications
• Monitor and maintain: keep the lights on
• Modernize it: UI, DBMS, enhancements, migrations
• Replace it: BPO, SaaS, package, rewrite, or hybrid
• Retire it: remove from the production environment (retire) or decommission (leave in inquiry mode)
[Diagram labels: classic data management use cases; consolidation and migration of legacy apps use cases; ERP consolidation, retirement and/or migration of legacy apps; ERP consolidation, migration, data archival use cases]
Source: Forrester Research, November 2012
Improve Application Efficiency
Manage data growth, improve performance, and lower the cost of mission-critical applications
Challenges
• Big data growth saddles applications with too much data
• Slower response times
• Increased storage and hardware costs
• Longer downtime periods for batch updates
• Longer downtime for application upgrades
IIG capabilities
• Discover and understand data that may be archived
• Define lifecycle policies
• Archive business objects based on retention policies
• Search and retrieve archived data
• Supply archived data to warehouses or Hadoop for analysis
The pros and cons of a "Keep Everything" strategy: Data Lake vs. Data Swamp
The pros and cons of a "Keep Everything" strategy
Source: IBM 2012 CGOC Summit Survey
Database Archiving
[Diagram: current data stays in production; historical and reference data move to data archives; archived records can be selectively retrieved and restored]
Universal access to application data: InfoSphere Data Explorer, InfoSphere BigInsights, the application, ODBC/JDBC, report writers, XML
Data archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data.
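Archiving by business object means moving a parent record together with all of its dependent rows, so the archive stays complete and any single object can be restored later. The sketch below illustrates this with Python's built-in `sqlite3` and a hypothetical orders/order_items schema; the table names, the `arch_` prefix and the cutoff rule are assumptions for illustration, not how InfoSphere archiving is implemented.

```python
import sqlite3

# Hypothetical schema: production tables plus archive tables with the same layout.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT);
CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
CREATE TABLE arch_orders (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT);
CREATE TABLE arch_order_items (order_id INTEGER, sku TEXT, qty INTEGER);
"""


def _move(conn, order_ids, src_prefix, dst_prefix):
    """Copy each order and all of its items, then delete them from the source tables.

    Prefixes are internal constants, never user input, so the f-strings are safe here.
    """
    for oid in order_ids:
        conn.execute(f"INSERT INTO {dst_prefix}orders SELECT * FROM {src_prefix}orders WHERE id = ?", (oid,))
        conn.execute(f"INSERT INTO {dst_prefix}order_items SELECT * FROM {src_prefix}order_items WHERE order_id = ?", (oid,))
        conn.execute(f"DELETE FROM {src_prefix}order_items WHERE order_id = ?", (oid,))
        conn.execute(f"DELETE FROM {src_prefix}orders WHERE id = ?", (oid,))


def archive_before(conn, cutoff: str) -> list:
    """Archive complete orders closed before cutoff (ISO date) in a single transaction."""
    with conn:  # commits on success, rolls back on error
        ids = [row[0] for row in conn.execute("SELECT id FROM orders WHERE closed_on < ?", (cutoff,))]
        _move(conn, ids, "", "arch_")
    return ids


def search_archive(conn, customer: str) -> list:
    """Search archived orders without restoring them."""
    return conn.execute(
        "SELECT id, closed_on FROM arch_orders WHERE customer = ? ORDER BY id", (customer,)
    ).fetchall()


def restore(conn, order_id: int) -> None:
    """Selectively bring one archived order (and its items) back to production."""
    with conn:
        _move(conn, [order_id], "arch_", "")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "acme", "2009-03-01"), (2, "acme", "2013-09-15")])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [(1, "A-1", 2), (1, "B-7", 1), (2, "A-1", 5)])
conn.commit()
print(archive_before(conn, "2012-01-01"), search_archive(conn, "acme"))
```

Running each move inside one transaction keeps an order from ending up half in production and half in the archive if something fails midway.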
Agenda • Trends • Information Governance for Big Data • Recent II&G for Big Data announcements
IIG Evolves for the Era of Big Data
1. Automated Integration: How do I get access to new big data sources? Business users need rapid data provisioning among the zones.
2. Visual Context: How do I digest all of this new information? Categorize, index, and find big data to optimize its usage.
3. Agile Governance: How do I manage all of this new data? Ensure appropriate actions based on the value of the data.
Innovations in Information Integration and Governance
• Automated Integration: InfoSphere Data Click provides self-service access to a growing variety of big data in traditional, NoSQL and Hadoop sources
• Visual Context: the Information Governance Dashboard provides immediate, visual context for critical decisions and actions, helping you understand big data to leverage it better
• Agile Governance: InfoSphere Privacy & Security finds and protects sensitive big data, with a single point of security for traditional, NoSQL and big data
Highlights: 2-click data integration, 80% faster activity monitoring, 170x faster metadata ingestion
Information Governance Dashboard: Visualize and Control Governance (Visual Context)
Innovation
• Measurements for policies and KPIs
• Rapid creation of tailored dashboards
Value
• Immediate insight into governance policy status
• Interception of issues when they start, right at the source
Usage
• Raises data confidence with visual governance status
Thousands of data points and policies visualized
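A dashboard tile for a governance policy is ultimately a measurement compared against a target. The sketch below computes one such KPI, column completeness, and maps it to a traffic-light status; the policy structure, the warning margin and the green/amber/red convention are hypothetical illustrations, not the Information Governance Dashboard's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessPolicy:
    """Hypothetical policy: at least `target` of rows must have a non-empty `column`."""
    name: str
    column: str
    target: float             # e.g. 0.98 means 98% of rows populated
    warn_margin: float = 0.05  # how far below target still counts as "amber"


def completeness(rows: list, column: str) -> float:
    """Fraction of rows with a non-empty value; an empty feed counts as 0.0 (a failure)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def evaluate(rows: list, policies: list) -> list:
    """Return (policy name, measured score, status) for each policy."""
    results = []
    for p in policies:
        score = completeness(rows, p.column)
        if score >= p.target:
            status = "green"
        elif score >= p.target - p.warn_margin:
            status = "amber"
        else:
            status = "red"
        results.append((p.name, round(score, 3), status))
    return results


customers = [
    {"email": "a@example.com", "zip": "78704"},
    {"email": "", "zip": "60123"},
    {"email": "c@example.com", "zip": None},
    {"email": "d@example.com", "zip": "10001"},
]
policies = [
    CompletenessPolicy("Email populated", "email", 0.98),
    CompletenessPolicy("Zip populated", "zip", 0.78),
]
print(evaluate(customers, policies))
```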
Confidence Is Essential for Actionable Insight
Automated Integration, Visual Context, Agile Governance
• Make decisions with greater certainty
• Analyze rapidly while providing necessary controls
• Increase the value of data