Why Spark on Hadoop Matters
MC Srivas, CTO and Founder, MapR Technologies
Apache Spark Summit, July 1, 2014
MapR Overview
• Top Ranked
• 500+ Customers
• Cloud Leaders
• Exponential Growth
3X bookings, Q1 '13 to Q1 '14
80% of accounts expand 3X
90% software licenses
< 1% lifetime churn
> $1B in incremental revenue generated by 1 customer
Rapidly Evolving Landscape
[Diagram: the Apache Hadoop and OSS ecosystem on the MapR Data Platform, divided into EXECUTION ENGINES and DATA GOVERNANCE AND OPERATIONS. Components marked * are on the 2014 timeline.]
Categories: Batch; ML, Graph; SQL; NoSQL & Search; Streaming; Data Integration & Access; Security; Workflow & Data Governance; Provision; Management
Components: Tez*, Spark, Drill*, Savannah*, Cascading, GraphX, Shark, Accumulo*, Hue, Storm*, Juju, Pig, MLLib, Impala, Solr, HttpFS, Spark Streaming, MR v1 & v2, Mahout, Hive, HBase, Flume, Knox*, Falcon*, Whirr, ZooKeeper, YARN, Sqoop, Sentry*, Oozie
The Complete Spark Stack on Hadoop
[Diagram: the same Apache Hadoop and OSS ecosystem chart as the previous slide, on the MapR Data Platform. Spark-related components in the chart: Spark, Spark Streaming, Shark, MLLib, GraphX.]
Spark Advantages:
EASE OF DEVELOPMENT
• Easier APIs
• Python, Scala, Java
IN-MEMORY PERFORMANCE
• RDDs
• DAGs
Unify Processing
COMBINE WORKFLOWS
• Shark, ML, Streaming, GraphX
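A note on the RDD and DAG bullets: Spark records transformations on a dataset as a lineage (a DAG of steps) and only runs them when an action asks for a result. The sketch below is a toy, pure-Python illustration of that idea under stated assumptions. It is not Spark's API: `ToyRDD` and its methods are hypothetical names invented here, and it runs on a single machine with no partitioning or caching.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: base data plus recorded lineage."""

    def __init__(self, source, steps=()):
        self._source = source        # re-readable iterable (e.g. a range or list)
        self._steps = tuple(steps)   # transformations recorded so far, in order

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def lineage(self):
        """Names of the recorded steps, oldest first."""
        return [kind for kind, _ in self._steps]

    def _iterate(self):
        # Replay the lineage over the source as a chain of lazy iterators.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: these actually run the recorded steps.
    def collect(self):
        return list(self._iterate())

    def reduce(self, fn):
        return reduce(fn, self._iterate())


calls = []  # records when the mapped function really executes


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD(range(1, 6))
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)   # still 0: nothing has run yet

result = even_squares.collect()    # [4, 16]
total = even_squares.reduce(lambda a, b: a + b)
```

Because each action replays the lineage from the source, `square` runs again for `reduce`. Real Spark avoids this kind of recomputation when a dataset is explicitly kept in memory, which is where the in-memory performance claim on this slide comes from.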
Hadoop Advantages:
UNLIMITED SCALE
• Multiple data sources
• Multiple applications
• Multiple users
ENTERPRISE PLATFORM
• Reliability
• Multi-tenancy
• Security
WIDE RANGE OF APPLICATIONS
• Files
• Databases
• Semi-structured
The Combination of Spark on Hadoop
Operational Applications Augmented by In-Memory Performance
• UNLIMITED SCALE
• EASE OF DEVELOPMENT
• IN-MEMORY PERFORMANCE
• ENTERPRISE PLATFORM
• COMBINE WORKFLOWS
• WIDE RANGE OF APPLICATIONS
Industry-Leading Ad-Targeting Platform
• High-performance analytics over MapR M7 NoSQL
• Load from M7 table into RDD to augment scoring in real time
• Results fed back to M7 for other applications
Leading Pharma Company: NextGen Genomics
• Existing process takes several weeks to align chemical compounds with genes
• ADAM on Spark allows realignment in a few hours
• Geneticists can minimize engineering dependency
Cisco: Security Intelligence Operations
• Sensor data lands in M7
• Spark Streaming on M7 for first check on known threats
• Data next processed on GraphX and Mahout
• Results queried using SQL via Shark and Impala
Insurance Giant: Addressing Health Care Regulations
• Patient information in M7 combined with clinical records to compute re-admittance probability
• Process uses Spark with transactional data in M7
• Insurance options decided in real time on online portals
MapR is Unbiased Open Source (a la Linux)
• Open source distribution is about providing choice
• Linux includes MySQL, PostgreSQL and SQLite
• Linux includes Apache httpd, nginx and lighttpd
Thank you
Engage with us!
maprtech | @mapr | mapr-technologies | MapR | maprtech
srivas@mapr.com