Modernizing Business with BIG DATA Aashish Chandra Divisional VP, Sears Holdings Global Head, Legacy Modernization, MetaScale
Big Data Fueling Enterprise Agility
• Harvard Business Review cites the Sears Holdings Hadoop use case in "Big Data: The Management Revolution"
• "Sears eschews IBM/Oracle for open source and self-build"
• "Sears' Big Data Swap Lesson: Functionality Over Price?"
• "How banks can benefit from real-time Big Data analytics"
Legacy Rides the Elephant
Hadoop has changed the enterprise big data game. Are you keeping pace, or still clinging to outdated approaches?
Journey to a World with NO Mainframes
Drivers: high TCO, mainframe migration pressure, inert business practices, resource crunch
Outcomes: cost savings, open source platform, simpler and easier code, business agility, business and IT transformation, modernized systems, IT efficiencies
• I. Optimize – Mainframe Optimization
• 5% ~ 10% MIPS reduction
• Quick wins with low-hanging fruit
• II. Convert – Mainframe ONLINE
• Tool-based conversion
• Convert COBOL & JCL to Java
• III. Rewrite – Mainframe BATCH (PIG/Hadoop)
• ETL modernization
• Move batch processing to Hadoop
Why Hadoop and Why Now?
THE ADVANTAGES:
• Cost reduction
• Alleviates performance bottlenecks
• Replaces ETL that has become too expensive and complex
• Moves mainframe and data warehouse processing to Hadoop
THE CHALLENGE:
• Traditional enterprises' lack of awareness
THE SOLUTION:
• Leverage the growing support system for Hadoop
• Make Hadoop the data hub in the enterprise
• Use Hadoop for processing batch and analytic jobs
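The batch pattern behind "make Hadoop the data hub" can be sketched as a Hadoop Streaming style mapper/reducer pair. This is a minimal in-memory illustration, not the deck's actual jobs; the record layout, store IDs, and amounts (in cents) are invented.

```python
# Minimal sketch of a map/reduce batch aggregation in Python.
# Hadoop Streaming would run the map and reduce phases across a cluster;
# here both phases run locally over a handful of fabricated line items.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Map phase: emit (store_id, amount_cents) for each raw line item.
    for line in lines:
        store_id, sku, cents = line.split(",")
        yield store_id, int(cents)

def reducer(pairs):
    # Reduce phase: sum per store; Hadoop's shuffle does the sort at scale.
    for store_id, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield store_id, sum(c for _, c in group)

raw = ["S001,sku1,1999", "S002,sku9,500", "S001,sku2,401"]
print(dict(reducer(mapper(raw))))  # {'S001': 2400, 'S002': 500}
```

The same shape scales from three records to billions because each phase only ever sees a stream of key/value pairs.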
The Sears Holdings Approach
Key to our approach:
• Allowing users to continue using familiar consumption interfaces
• Providing inherent high availability
• Enabling businesses to unlock previously unusable data
The Architecture
• Enterprise solutions using Hadoop must be an ecosystem
• Large companies have a complex environment:
• Transactional systems
• Services
• EDW and data marts
• Reporting tools and needs
• We needed to build an entire solution
PIG/Hadoop Ecosystem (MetaScale)
The Learning
Over two years of experience using Hadoop for enterprise legacy workload.
HADOOP:
• We can dramatically reduce batch processing times for mainframe and EDW workloads
• We can retain and analyze data at a much more granular level, with longer history
• Hadoop must be part of an overall solution and ecosystem
IMPLEMENTATION:
• We developed tools and skills; the learning curve is not to be underestimated
• We can reliably meet our production delivery time windows by using Hadoop
• We can largely eliminate the use of traditional ETL tools
UNIQUE VALUE:
• We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop, with spectacular results
• New tools allow an improved user experience on very large data sets
Some Examples: Use Cases at Sears Holdings
The Challenge – Use Case #1
Data scale: Sales 8.9B line items; Elasticity 12.6B parameters; Offers 1.4B SKUs; Items 11.3M SKUs; Stores 3,200 sites; Inventory 1.8B rows; Price sync daily; Timing weekly
• Intensive computational and large storage requirements
• Needed to calculate item price elasticity based on 8 billion rows of sales data
• Could only be run quarterly and on a subset of data; needed more often
• Business need: react to market conditions and new product launches
The Result – Use Case #1
Business problem:
• Intensive computational and large storage requirements
• Needed to calculate store-item price elasticity based on 8 billion rows of sales data
• Could only be run quarterly and on a subset of data
• Business missing the opportunity to react to changing market conditions and new product launches
Hadoop result:
• Price elasticity calculated weekly
• 100% of data set and granularity
• Meets all SLAs
• New business capability enabled
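The core computation in this use case, price elasticity, is commonly estimated as the slope of log quantity on log price. The deck does not show its formula, so this is a hedged stand-in: a least-squares slope over a made-up toy series, where at Sears scale the same regression would run per SKU/store inside a Hadoop job.

```python
# Price elasticity sketch: elasticity = d ln(Q) / d ln(P), estimated by
# ordinary least squares on log-transformed price/quantity pairs.
from math import log

def elasticity(prices, quantities):
    lp = [log(p) for p in prices]
    lq = [log(q) for q in quantities]
    n = len(lp)
    mp, mq = sum(lp) / n, sum(lq) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(lp, lq))
    var = sum((x - mp) ** 2 for x in lp)
    return cov / var  # slope of ln(Q) on ln(P)

# Toy series with unit elasticity: quantity halves as price doubles.
print(round(elasticity([1, 2, 4], [8, 4, 2]), 3))  # -1.0
```

Running this per item-store pair over 8.9B line items is exactly the kind of embarrassingly parallel work that motivated the weekly Hadoop run.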
The Challenge – Use Case #2
Data scale: Scalability unable to grow 100-fold; Data sources 30+; Mainframe 100 MIPS on 1% of data; Input records in the billions
• Mainframe batch business process would not scale
• Needed to process 100 times more detail to handle business-critical functionality
• Business need required processing billions of records from 30 input data sources
• Complex business logic and financial calculations
• SLA for this cyclic process was 2 hours per run
The Result – Use Case #2
Business problem:
• Mainframe batch business process would not scale
• Needed to process 100 times more detail to handle rollout of high-value, business-critical functionality
• Time-sensitive business need required processing billions of records from 30 input data sources
• Complex business logic and financial calculations
• SLA for this cyclic process was 2 hours per run
Hadoop result:
• Teradata and mainframe data on Hadoop
• Implemented PIG for processing; Java UDFs for financial calculations
• Scalable solution in 8 weeks
• Processing met tighter SLA
• 6,000 lines of code reduced to 400 lines of PIG
• $600K annual savings
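The slides credit Java UDFs inside PIG with the financial calculations but do not show them. As a language-neutral illustration of why such logic gets its own UDF, here is a hedged Python stand-in (function name, discount rule, and rounding convention are all assumptions): financial code needs exact decimal arithmetic and an explicit rounding mode, not floats.

```python
# Sketch of the kind of per-record financial calculation a PIG UDF wraps.
# decimal.Decimal avoids float rounding drift; half-up rounding to cents
# is a common ledger convention, assumed here for illustration.
from decimal import Decimal, ROUND_HALF_UP

def extended_price(unit_price: str, qty: int, discount_pct: str) -> Decimal:
    """Line total with an assumed percentage discount, rounded to cents."""
    gross = Decimal(unit_price) * qty
    net = gross * (1 - Decimal(discount_pct) / 100)
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(extended_price("19.99", 3, "10"))  # 53.97
```

In the actual pipeline this logic would live in a Java UDF registered in the PIG script and applied to each of the billions of input records.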
The Challenge – Use Case #3
Data scale: Data storage in mainframe DB2 tables; Price data 500M records; Processing window 3.5 hours; Mainframe jobs 64
• Mainframe unable to meet SLAs on growing data volume
The Result – Use Case #3
Business problem:
• Mainframe unable to meet SLAs on growing data volume
Hadoop result:
• Source data in Hadoop
• Maintenance improvement: under 50 lines of PIG code
• Job runs over 100% faster: now 1.5 hours, down from 3.5
• $100K in annual savings
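The "over 100% faster" wording is easy to misread, so it is worth checking the arithmetic behind the claim: going from a 3.5-hour window to 1.5 hours is a 2.33x speedup, about 133% faster.

```python
# Sanity check on the slide's timing claim: 3.5h mainframe window vs
# 1.5h on Hadoop.
before_hours, after_hours = 3.5, 1.5
speedup = before_hours / after_hours          # 2.33x
pct_faster = (speedup - 1) * 100              # 133%
print(f"{speedup:.2f}x speedup, {pct_faster:.0f}% faster")
```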
The Challenge – Use Case #4
Before (Teradata via Business Objects): Transformation on Teradata; User experience unacceptable; Batch processing output to .CSV files; History not retained; New report development slow
• Needed to enhance user experience and enable analytics on granular data
• Restricted availability of data due to space constraints
• Needed to retain granular data
• Needed Excel-style interaction, with agility, on data sources of hundreds of millions of records
The Result – Use Case #4
Business problem:
• Needed to enhance user experience and enable analytics on granular data
• Restricted availability of data due to space constraints
• Needed to retain granular data
• Needed Excel-style interaction, with agility, on data sources of hundreds of millions of records
Hadoop result:
• Sourcing data directly to Hadoop; over 50 data sources retained in Hadoop
• Transformation moved to Hadoop
• User experience expectations met
• Redundant storage eliminated
• Business's single source of truth
• Datameer for additional analytics
• PIG scripts to ease code maintenance
• Granular history retained
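The reporting pattern this use case describes, granular history kept in Hadoop with small aggregated extracts handed to spreadsheet users, can be sketched with standard-library tools. Field names, week/store keys, and amounts are invented for illustration.

```python
# Sketch: aggregate granular line items, then emit a small CSV extract
# that analysts can open directly in Excel. In the deck's architecture
# the aggregation step would be a PIG job; the output step is the same.
import csv, io
from collections import defaultdict

granular = [
    ("2013-01-07", "S001", 1999),   # (week, store, sale in cents)
    ("2013-01-07", "S002", 500),
    ("2013-01-14", "S001", 401),
]

totals = defaultdict(int)
for week, store, cents in granular:
    totals[(week, store)] += cents

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["week", "store", "sales_usd"])
for (week, store), cents in sorted(totals.items()):
    writer.writerow([week, store, f"{cents / 100:.2f}"])

print(out.getvalue())
```

The key design point from the slide survives in miniature: the full granular history is never thrown away, only summarized on demand.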
Summary
• Hadoop can revolutionize enterprise workloads and make the business agile
• Can reduce strain on legacy platforms
• Can reduce cost
• Can bring new business opportunities
• Must be an ecosystem
• Must be part of an overall data strategy
• The learning curve is not to be underestimated
The Horizon – What Do We Need Next?
• Automation tools and techniques that ease enterprise integration of Hadoop
• Education for traditional enterprise IT organizations about the possibilities and reasons to deploy Hadoop
• Continued development of a reusable framework for legacy workload migration
Legacy Modernization Made Easy!
For more information, visit: www.metascale.com
Follow us on Twitter: @LegacyModernizationMadeEasy
Join us on LinkedIn: www.linkedin.com/company/metascale-llc
Contact: Kate Kostan, National Solutions, Kate.Kostan@MetaScale.com