Data Integration for Big Data
Pierre Skowronski, Prague, 23 April 2013
IT is struggling with the cost of Big Data
• Growing data volume is quickly consuming capacity
• Need to onboard, store, and process new types of data
• High cost and a shortage of big data skills
Prove the Value with Big Data: Deliver Value Along the Way
• Cost: lower big data project costs (helps self-fund big data projects)
• Risk: minimize the risk of new technologies (design once, deploy anywhere)
• Delivery: innovate faster with big data (onboard, discover, operationalize)
PowerCenter Big Data Edition: Lower Costs
• Optimize processing with low-cost commodity hardware on a traditional grid
• Increase productivity up to 5X
[Diagram: transactions/OLTP/OLAP, documents and emails, social media and web logs, and machine/device/scientific data feeding the EDW, ODS, and MDM]
Hadoop complements existing infrastructure on low-cost commodity hardware
5x better productivity for similar performance
• At worst, only 20% slower than hand-coding; mostly equal or faster
• Informatica: 1 week vs. hand-coding: 5-6 weeks
PowerCenter Big Data Edition: Minimize Risk
• Quickly staff projects with trained data integration experts
• Design once and deploy anywhere: traditional grid, pushdown to an RDBMS or DW appliance, on-premise or in the cloud
Graphical Processing Logic: Test on Native, Deploy on Hadoop
[Example mapping: separate partial records from completed records, sort records by calling number, select incomplete partial records, and aggregate all completed and partial-completed records]
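The mapping above survives only as diagram labels. As a hedged illustration of the same flow, here is a minimal PySpark sketch; the column names (calling_number, duration, is_complete) and HDFS paths are assumptions for illustration, not part of the original slide.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdr_mapping_sketch").getOrCreate()

# Load raw call records (hypothetical path and schema).
records = spark.read.csv("hdfs:///data/cdr/raw", header=True, inferSchema=True)

# Separate partial records from completed records.
completed = records.filter(F.col("is_complete") == True)
partial = records.filter(F.col("is_complete") == False)

# Sort partial records by calling number so fragments of the same call are adjacent.
partial_sorted = partial.orderBy("calling_number")

# Aggregate all completed and partial-completed records per calling number.
merged = (
    completed.unionByName(partial_sorted)
    .groupBy("calling_number")
    .agg(F.sum("duration").alias("total_duration"),
         F.count("*").alias("record_count"))
)

merged.write.mode("overwrite").parquet("hdfs:///data/cdr/aggregated")

The same logic can be tested locally against a sample file and then run against the full data set on the cluster, which is the point the slide makes.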
Run it simply on Hadoop
• Choose the execution environment
• Press Run
• View the generated Hive query
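The slide describes switching execution environments inside the Informatica Developer tool; the idea generalizes to any engine that can run the same logic locally or on the cluster. A minimal sketch of that idea in PySpark (not the Informatica tooling itself; the app name and master settings are assumptions):

from pyspark.sql import SparkSession

def build_session(on_hadoop: bool) -> SparkSession:
    # Return a session that runs either natively on the workstation
    # or on a YARN-managed Hadoop cluster.
    builder = SparkSession.builder.appName("design_once_deploy_anywhere")
    if on_hadoop:
        builder = builder.master("yarn")      # deploy on the Hadoop cluster
    else:
        builder = builder.master("local[*]")  # test natively on the developer machine
    return builder.getOrCreate()

spark = build_session(on_hadoop=False)  # flip to True when deploying on Hadoop

The transformation logic itself is unchanged; only the execution environment differs, which mirrors the "design once, deploy anywhere" claim.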
Achieving Operational Efficiency with Informatica: Minimize Risk with Informatica Partners and a Certified Developer Community
• Technology: expertise, best practices, and reusability
• People: global systems integrators and Informatica developers
• 9,000+ trained developers
• 45,000+ developers in Informatica TechNet
• 3x more developers than any other vendor*
* Source: U.S. resume search on dice.com, December 2008
Large Global Financial Institution: Lower Costs of Big Data Projects
Saved $20M plus $2-3M ongoing through archiving and optimization
• The Challenge: data warehouse exploding with over 200 TB of data; user activity generating up to 5 million queries a day, impacting query performance
• The Solution: [Diagram: ERP, CRM, and custom sources feeding the EDW and business reports, with archived data and, in phase 2, interaction data]
• The Result: saved 100 TB of space over the past 2.5 years; reduced a rearchitecture project from 6 months to 2 weeks; improved performance by 25%; return on investment in less than 6 months
Large Global Financial Institution: Lower Costs of Big Data Projects
• The Challenge: increasing demand for faster data-driven decision making and analytics as data volumes and processing loads rapidly increase
• The Solution: [Diagram: RDBMS sources feeding near real-time datamarts and the data warehouse on a traditional grid, with web logs added in phase 2]
• The Result: cost-effectively scale performance; lower hardware costs; increased agility by standardizing on one data integration platform
Large Government Agency: Flexible Architecture to Support Rapidly Changing Business Needs
• The Challenge: data volumes growing 3-5x over the next 2-3 years
• The Solution: [Diagram: RDBMS, mainframe, and (phase 2) unstructured data sources feeding the DW and EDW on a traditional grid, with data virtualization serving business reports]
• The Result: manage data integration and loading of 10+ billion records from multiple disparate data sources; a flexible data integration architecture to support changing business requirements in a heterogeneous data management environment
Why PowerCenter Big Data Edition
• Repeatability: predictable, repeatable deployments and methodology
• Reuse of existing assets: apply existing integration logic to load data to and from Hadoop; reuse existing data quality rules to validate Hadoop data (see the sketch after this list)
• Reuse of existing skills: enable ETL developers to leverage the power of Hadoop
• Governance: enforce and validate data security, data quality, and regulatory policies
• Manageability
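One of the reuse claims above, re-applying existing data quality rules to data landed in Hadoop, can be pictured with a minimal sketch; the rules, column names, and path below are illustrative assumptions, not Informatica's rule syntax.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_rules_sketch").getOrCreate()

# Hypothetical customer data already landed in Hadoop.
customers = spark.read.parquet("hdfs:///data/customers")

# Rule 1: customer_id must be present.
# Rule 2: email must match a basic pattern.
invalid = customers.filter(
    F.col("customer_id").isNull() |
    ~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
)

print("Records failing data quality rules:", invalid.count())

In practice the same rule definitions would be maintained once and pushed to whichever environment holds the data, rather than re-coded per platform.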