Revolution Analytics. Overview of Revolution R Enterprise. Joseph B. Rickert, Marketing Manager. For the Dallas R User’s Group. Agenda: Revolution Analytics Today; Revolution R Enterprise; Revolution Analytics in the Enterprise; Big Data with RevoScaleR.
Revolution Analytics • Overview of Revolution R Enterprise • Joseph B. Rickert, Marketing Manager • For the Dallas R User’s Group
Agenda • Revolution Analytics Today • Revolution R Enterprise • Revolution Analytics in the Enterprise • Big Data with RevoScaleR • Deploying R Throughout the Enterprise with RevoDeployR
Corporate Overview & Quick Facts • “Revolution Analytics is the leading commercial provider of software and support for the open-source R statistical computing language.”
Open Source Analytics for the Enterprise • Most advanced statistical analysis software available • “The professor who invented analytic software for the experts now wants to take it to the masses” • Half the cost of commercial alternatives • 2M+ Users • 2,500+ Applications • Power • Productivity • Enterprise Readiness • Finance • Statistics • Life Sciences • Predictive Analytics • Manufacturing • Retail • Data Mining • Telecom • Social Media • Visualization • Government
Revolution R Enterprise: Productivity
Revolution R Enterprise has the open-source R engine at the core • 2,500 community packages and growing exponentially • R Engine • Language Libraries • Community Packages • Technical Support • Multi-Threaded Math Libraries • Web Services API • Big Data Analysis • Parallel Tools • Developer IDE • Build Assurance
A network of partners for integrated, large-scale data analysis • Deployment / Consumption • Advanced Analytics • Data Infrastructure
Revolution R Enterprise: Performance
Performance: Intel MKL Math Libraries • [Chart: benchmark comparison of Open Source R and Revolution R Enterprise] • Sources: 1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php 2. http://r.research.att.com/benchmarks/
Revolution R Enterprise: Big Data Analysis
A common analytic platform across big data architectures • Hadoop • File Based • In-database
Two Big Data problems: capacity and speed • Capacity: problems handling the size of data sets or models • Data too big to fit into memory • Even if it can fit, there are limits on what can be done • Even simple data management can be extremely challenging • Speed: even without a capacity limit, computation may be too slow to be useful
RevoScaleR: Big Data Analysis for Revolution R Enterprise • Addresses performance by distributing computations between cores and computers • Addresses capacity through a collection of functions for chunking through massive data files • External Memory Programming Framework • Distributed Statistical Algorithms • XDF File Format: a novel high-speed file format designed specifically to support statistical analyses • R Language Interface: a familiar, high-productivity programming paradigm for R users
The basis for a solution for capacity, speed, distributed and streaming data: PEMAs • Parallel external memory algorithms (PEMAs) solve both the capacity and the speed problem, and can deal with distributed and streaming data • External memory algorithms are those that allow computations to be split into pieces so that not all data has to be in memory at one time • It is possible to “automatically” parallelize and distribute such algorithms
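The external memory idea above can be sketched in a few lines. This is an illustration in Python, not RevoScaleR code: the function names (`read_chunks`, `update`, `finalize`, `chunked_mean_variance`) and the chunk size are assumptions made for the example. Each chunk is reduced to three running totals, so only one chunk is in memory at a time.

```python
from itertools import islice


def read_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size values from any iterable."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def update(state, chunk):
    """Fold one chunk into the running (count, sum, sum of squares)."""
    n, s, ss = state
    return (n + len(chunk), s + sum(chunk), ss + sum(x * x for x in chunk))


def finalize(state):
    """Turn the accumulated totals into a mean and a sample variance."""
    n, s, ss = state
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)
    return mean, variance


def chunked_mean_variance(values, chunk_size=1000):
    """Hypothetical driver: stream the data chunk by chunk, then finalize."""
    state = (0, 0.0, 0.0)
    for chunk in read_chunks(values, chunk_size):
        state = update(state, chunk)
    return finalize(state)


# Usage: a generator stands in for a file too large to load at once.
mean, variance = chunked_mean_variance((float(i) for i in range(1, 101)), chunk_size=7)
print(mean, variance)
```

Because `update` only adds totals together, chunks can be processed in any order or on different workers, which is what makes this kind of algorithm a candidate for automatic parallelization. The sum-of-squares formula is used here for brevity; it can lose precision on data with a large mean.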
RevoScaleR on a Multicore Server • [Diagram: data blocks are read from disk into shared memory on a multicore processor (4, 8, 16+ cores) running RevoScaleR, with Core 0 (Thread 0) through Core n (Thread n)] • A RevoScaleR algorithm is provided a data source as input • The algorithm loops over data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0). • Other worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory • When all of the data is processed, a master results object is created from the intermediate results objects
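The reader-and-workers pattern on this slide can be sketched as follows. This is a minimal Python illustration under stated assumptions, not RevoScaleR's implementation: `process_in_blocks` is a hypothetical name, the "intermediate results object" is just a per-worker `[count, sum]` pair, and Python threads only illustrate the structure (the interpreter lock limits true CPU parallelism).

```python
import queue
import threading


def process_in_blocks(blocks, n_workers=3):
    """Count and sum values using one reader thread and n_workers worker threads."""
    # Bounded queue: the reader stays only a few blocks ahead of the workers.
    work = queue.Queue(maxsize=n_workers)
    # One intermediate results object per worker: [count, sum].
    partials = [[0, 0.0] for _ in range(n_workers)]

    def reader():
        # Thread 0: read blocks one at a time and hand them to the workers.
        for block in blocks:
            work.put(block)
        for _ in range(n_workers):
            work.put(None)  # one stop signal per worker

    def worker(index):
        # Threads 1..n: process blocks and update this worker's own partial result.
        while True:
            block = work.get()
            if block is None:
                return
            partials[index][0] += len(block)
            partials[index][1] += sum(block)

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Master results object: combine the intermediate results.
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return count, total


# Usage: ten blocks of ten values each stand in for blocks read from disk.
blocks = (list(range(start, start + 10)) for start in range(0, 100, 10))
print(process_in_blocks(blocks))
```

Each worker writes only to its own slot in `partials`, so no lock is needed until the final combination step.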
RevoScaleR for Distributed Computing Clusters • [Diagram: a master node (RevoScaleR) connected to four compute nodes (RevoScaleR), each with its own data partition] • Portions of the data source are made available to each compute node • RevoScaleR on the master node assigns a task to each compute node • Each compute node independently processes its data and returns its intermediate results to the master node • The master node aggregates the intermediate results from all compute nodes and produces the final result
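The master and compute node roles can be illustrated with a simple linear regression, where each node reduces its partition to five sums and the master combines them. This is a sketch only: the nodes are simulated in-process by a plain loop, and `node_task`, `master_aggregate`, and `fit_distributed` are hypothetical names, not RevoScaleR functions.

```python
def node_task(partition):
    """Compute node: reduce its (x, y) rows to five sums (the intermediate result)."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)


def master_aggregate(intermediate):
    """Master node: add up the per-node sums and solve for intercept and slope."""
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*intermediate))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_distributed(partitions):
    """Assign one task per partition, then aggregate on the master."""
    intermediate = [node_task(partition) for partition in partitions]
    return master_aggregate(intermediate)


# Usage: rows of y = 2x + 1 split across three simulated compute nodes.
rows = [(x, 2 * x + 1) for x in range(12)]
partitions = [rows[0:4], rows[4:8], rows[8:12]]
print(fit_distributed(partitions))
```

Only the five sums travel back to the master, so network traffic does not grow with the number of rows held on each node.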
Platform-agnostic Big Data Analytics • Set “compute context” to define hardware (one line of code) • Native job-scheduler handles distribution, monitoring, failover, etc. • Same code runs on other supported architectures: just change the compute context • Supported architectures: Windows (Microsoft HPC Server); Linux (Platform Computing LSF, coming 2012) • 42 seconds instead of 6 minutes
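The "just change the compute context" idea can be sketched as a small abstraction in which the analysis code never names the hardware it runs on. Everything here is hypothetical and written in Python for illustration: `LocalSequential`, `LocalThreads`, `set_compute_context`, and `total_of_squares` are assumed names and are not the RevoScaleR API.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalSequential:
    """Hypothetical context: run every task in the calling thread."""

    def map(self, fn, items):
        return [fn(item) for item in items]


class LocalThreads:
    """Hypothetical context: run tasks on a pool of worker threads."""

    def __init__(self, workers=4):
        self.workers = workers

    def map(self, fn, items):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))


_context = LocalSequential()


def set_compute_context(context):
    """The one line that changes where the work runs."""
    global _context
    _context = context


def block_sum_of_squares(block):
    return sum(x * x for x in block)


def total_of_squares(blocks):
    """Analysis code: identical under every context."""
    return sum(_context.map(block_sum_of_squares, blocks))


# Usage: the same call, first sequentially, then on a thread pool.
blocks = [[1, 2], [3, 4]]
print(total_of_squares(blocks))
set_compute_context(LocalThreads(workers=2))
print(total_of_squares(blocks))
```

A cluster-backed context would expose the same `map` method and hand tasks to a job scheduler; that part is left out because it depends on the scheduler in use.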
R and Hadoop • Hadoop offers a scalable infrastructure for processing massive amounts of data • Storage: HDFS, HBASE • Distributed Computing: MapReduce • R is a statistical programming language for developing advanced analytic applications • Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, … • The RHadoop project makes it possible to write PEMAs for Hadoop using the R language alone.
Massively parallel/distributed analytics: RevoConnectR for Hadoop • Write Map-Reduce analytics using only R code with these R packages: • rhdfs - R and HDFS • rhbase - R and HBASE • rmr - R and MapReduce • [Diagram: a Revolution R Client uses rmr to reach the Job Tracker and the Task Nodes (Map or Reduce), rhdfs to reach HDFS, and rhbase via Thrift to reach HBASE] • More information at: bit.ly/r-hadoop
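The map and reduce steps that such packages expose can be sketched in-process. This Python illustration does not use Hadoop or the rmr API: `map_reduce`, `mapper`, and `reducer` are hypothetical names, and the grouping step stands in for the shuffle that Hadoop performs between the two phases. The example computes a mean per key.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def mapper(record):
    """Emit (region, (count, amount)) for one input record."""
    region, amount = record
    yield region, (1, amount)


def reducer(key, values):
    """Combine the (count, amount) pairs for one region into a mean."""
    count = sum(c for c, _ in values)
    total = sum(a for _, a in values)
    return total / count


# Usage: three hypothetical sales records keyed by region.
records = [("east", 10.0), ("west", 4.0), ("east", 20.0)]
print(map_reduce(records, mapper, reducer))
```

The mapper emits a count alongside each amount so that partial results can be summed in any order before the final division.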
Revolution R Enterprise: Enterprise Deployment
Revolution R Web Services: RevoDeployR • Data Sources & Creation of Analytics • Consumption of Analytics & Results • Data Analysis • Revolution “RevoDeployR” • R / Statistical Modeling Expert • Deployment Expert • Business Intelligence • Interactive Web Apps • Cloud / SaaS
Thank you. The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR