Powering up Analytics with Big Data: the SAS Way!
Priya Sarathy, Ph.D., Analytic Sales Consultant, SAS

Salute to the world run by Statisticians.
Agenda
High Performance Analytics (HPA)
• Meeting challenges
• The what?
• Understanding the analytic paradigm shift
• High Performance Analytics: the SAS way
• What is the business value add?
High Performance Analytics: what is HPA delivering?
What is HPA about?
• Evolving business needs
Why does business need it?
• Leveraging information to compete in the market
• Raising revenue/profits
• Reducing costs and inefficiencies
Big Data Analytics • Big Analytics
High Performance Analytics grew from the need for big data analytics!
[Figure: analytic capabilities (reactive to proactive) plotted against data size (large to big); labels include BI, Big Data BI, Big Data Analytics and Big Analytics.]
High Performance Analytics: HPA is impacting business performance in many areas.
High Performance Analytics: what conversations are you involved in?
What do others think?
“Data measurement is the modern equivalent of the microscope.”*
At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.
A 28-year-old assistant professor at Stanford combined math with political science in his undergraduate and graduate studies, seeing “an opportunity because the discipline is becoming increasingly data-intensive.” His research involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread.
It’s not just more streams of data, but entirely new ones: countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air.
* Quote from Professor Brynjolfsson. Source: Steve Lohr, “The Age of Big Data,” The New York Times.
The new normal: what is HPA doing to analytics?
• Analyze 100% of data
• More/new variables
• More model iterations
• Manage complex models
• More models (per domain area)
• More questions/ideas/scenarios to evaluate
• Multiple deployment options: batch, real-time
• Continuously monitor model effectiveness and retrain
The things you can think!
High Performance Analytics: HPA combines three pillars to deliver results
• Data: leveraging technology to collect, access and manage data
• Analytics: adapting to new technology (in-memory, grid, in-database)
• Platform: positioning analytics within industry leaders' technology solutions
High Performance Analytics: advanced analytics and fast computing capabilities are brought together with SAS HPA
In a recent National Post interview, SAS CEO Jim Goodnight explained it like this: “There's a lot of business processes that will be changing because of the speed at which we can do analytics; using a thousand processes in parallel to do these computations can make it possible to do huge problems that we would never have been able to do before because it would take too long on a single processor.”
A big part of how HPA gets its speed: it breaks larger problems down into smaller pieces.
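The idea of breaking a large problem into smaller pieces can be sketched in Python, assuming a computation (here, mean and sample variance) that decomposes into per-piece partial results that are cheap to combine. The function names and the use of a thread pool in place of separate processors or grid nodes are illustrative assumptions, not how SAS HPA is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(piece):
    """Summarize one piece: count, sum, and sum of squares."""
    return len(piece), sum(piece), sum(x * x for x in piece)

def parallel_mean_var(values, n_pieces=4):
    """Mean and sample variance computed by splitting the data into pieces.

    Hypothetical helper for illustration only. Threads stand in for the
    processors or grid nodes a real HPA deployment would use.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Split the data into roughly equal, non-overlapping pieces.
    size = -(-len(values) // n_pieces)  # ceiling division
    pieces = [values[i:i + size] for i in range(0, len(values), size)]
    # Summarize every piece concurrently.
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        summaries = list(pool.map(partial_stats, pieces))
    # Combine the small per-piece summaries into the whole-problem answer.
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    # Textbook sum-of-squares formula, kept for brevity; production code would
    # use a numerically stable merge of per-piece means and variances.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

mean, variance = parallel_mean_var([1, 2, 3, 4, 5, 6, 7, 8])
print(mean, variance)  # 4.5 6.0
```

Only three numbers per piece travel back to be combined, so the combining step stays small however large each piece is; that is what lets the work spread across many processors.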
High Performance Analytics: HPA helps remove limitations
• From sampling to analysis of the full population
• From 50 attributes to 500+ attributes
• Run times reduced from 18 hours to 30 minutes
• More complex models
• From 3-month lagged modeling to real-time updates
• From structured data only to combined structured and unstructured data
• A shorter model lifecycle
• More frequent updates and model iterations
• Real-time scoring that impacts the business bottom line
You will have more time to think!
Model lifecycle
How much time do you spend on your models? Where would you like to spend more time?
• Data analysis: 45%
• Monitoring & results reporting: 15%
• Model build: 30%
• Validation & implementation: 10%
Responsibilities of a Statistical Analyst: the model-building paradigm shift
• Extract, transform, load data
• Data massaging/mining
• Aggregating, normalizing data
• Identifying the analytic approach
• Building samples
• Building models
• Creating scoring code
• Validation reports/model documentation
• Implementation for production
• Results monitoring
• Updating, refreshing, or rebuilding the model
Shifting responsibilities:
• IT, taking on responsibilities through the EDW/DW: data quality, data integration, ODS, production implementation
• Analyst: building models
What is driving the shift:
• Access to more and better data
• Need for documentation and transparency
• Greater number of business solutions
• Changing market and data dynamics affecting how often models are built and updated
Model lifecycle: changing roles and responsibilities
• New technology, new tools
• New business processes
• New competitive demands
• Data analysis: 25%
• Monitoring & results reporting: 5%
• Model build: 60%
• Validation & implementation: 10%
The Farfalle Model: the basic structure of the analytic function (analytics as a commodity)
Source: IDC, 2012
• 70% of the effort in analytics is typically on the information management side of the model.
• Analytical teams in the middle are small but crucial for translating the data assets into actionable insights.
• The organizational change side highlights the behavior changes needed by business users.
Working with a tsunami of data
[Figure: volume, variety, velocity and value of data, with data size growing from today to the future.]
SAS® High-Performance Analytics
Embracing new technology, building new strengths: Visual Analytics
Physical layout: scalable analytic capability
[Figure: physical layout diagram. Labels include client frame, mid-tier, computing frame and data frame; SAS Metadata Servers; controller node (n cores); nodes 1, 2, …, n with SAS Analytic & Scoring Accelerators; RDBMS; shared/clustered file system; Hadoop.]
High Performance Analytics: changing the way analytics is done, from the bottom up
• HPLOGISTIC
• HPREG
• HPLMIXED
• HPFOREST
• HPNEURAL
• HPREDUCE
• HPNLIN
• SUMMARY/MEANS
• FREQ
• RANK
• DS2
• SORT
SAS® High-Performance Analytics Server: areas of model development that benefit
• Predictive analytics & data mining
  • Binary-target and continuous-numeric predictions
  • Linear & non-linear modeling
  • Complex relationships
  • Tree-based classification
• Text mining
  • Parsing large-scale text collections
  • Extracting entities
  • Automatic stemming & synonym detection
  • Topic discovery
• Optimization*
  • Local search optimization
  • Large-scale linear & mixed integer problems
• Econometrics/time series
  • Probability of one or more events
  • Severity of random events
*Currently available only for Teradata and EMC Greenplum
Financial services: customer acquisition use case
[Diagram phases: data exploration, model development, model deployment; callout: 84 seconds]
Current process → high-performance process
• One algorithm (neural network) → multiple algorithms (e.g., forest, logistic regression)
• 1 model per day → 1 model per 30 minutes
• 5 hours to process a model → 3 minutes to process a model
• Model lift of 1.6% → model lift of 2.5%
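Taking the customer acquisition figures at face value, the gains can be worked out as simple ratios. This is a reader's arithmetic rather than figures reported in the case study, and the throughput line assumes models are built around the clock (24 hours a day).

```latex
\[
\text{processing speedup} = \frac{5\ \text{h}}{3\ \text{min}}
  = \frac{300\ \text{min}}{3\ \text{min}} = 100\times
\]
\[
\text{model throughput} = \frac{24\ \text{h/day}}{0.5\ \text{h/model}}
  = 48\ \text{models/day} \quad (\text{vs. } 1\ \text{model/day})
\]
\[
\text{relative lift gain} = \frac{2.5\% - 1.6\%}{1.6\%}
  = \frac{0.9}{1.6} = 0.5625 \approx 56\%
\]
```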
Think left and think right and think low and think high. Oh, the thinks you can think up if only you try! Oh, the things you can find, if you don't stay behind!
Dr. Seuss (On Beyond Zebra!, 1955)
SAS® High-Performance Analytics Server: key differentiators
• Only in-memory offering in the market delivering high-end analytics, including text mining and optimization
• Addresses the entire model development and deployment lifecycle
• 36 years of proven technology, now faster
• Opens up a vast array of possibilities to get value from big data
Top five ways high-performance analytics will transform marketing
• Faster, more sophisticated, more effective segmentation
  • Segmentation tests can be run against the entire population to determine the best campaign interaction methods.
• Real-time, relevant next-best customer actions or offers
  • This results in a more relevant offer or customer interaction surfacing at the “point of need” in real time.
• Instant deployment and management of marketing models that give you a sustainable advantage
  • Companies can quickly and efficiently update their numerous models without submitting to a slow overnight batch update process.
• 1:1 real-time experiences to bolster brand connections
  • The outcome is more precise, real-time interactions with consumers at the “point of need.”
• Optimized marketing for broader business impact
  • Businesses can now not only determine the customer and financial impacts of their campaigns faster but also adapt instantly to market, competitive and customer changes.
Healthcare payer
Business issue
• Electronic medical records (EMRs) driving a data explosion
• Utilizing all of the unstructured text (records, case notes, emails, transcripts, etc.)
• How to improve quality and cost of care? “Create Healthier Lives”
Solution
• SAS® High-Performance Analytics Server, including HP Text Mining
• Greenplum Data Computing Appliance
Results
• Model processing time reduced from four hours to 10 seconds
• Misclassification rates reduced from 30% to 10%
• Historical models improved with more than 10% lift
I can now tell that a prescription will harm a patient before you write it… I can tell that a customer is dissatisfied before you lose him or her… I can now determine that a claim is fraudulent before you pay it…
“SAS is helping make our member services the best in the industry. In less than one hour, we can load a huge table (169-million-row dataset), find the best variables, compare different models and pick the best model. I would not attempt to model a dataset this large without SAS HPA Server.”
Mark Pitts, Director of Data Science, Solutions and Strategy, United Healthcare Group
SAS High-Performance Analytics: leveraging a database appliance for HPA
The request is sent to the root node inside the appliance.
[Diagram: root node (Teradata Managed Server) with worker nodes 1, 2, …, N.]
SAS High-Performance Analytics: the analytical computation and data request are sent to the worker nodes.
[Diagram: root node with worker nodes 1, 2, …, N.]
SAS High-Performance Analytics: the data request is sent to the database, and each data slice is moved into memory.
SAS High-Performance Analytics: analytic processing with internode communication.
SAS High-Performance Analytics: the worker nodes return results to the root node, and the job is complete.
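The root/worker flow can be illustrated with a small, hypothetical Python sketch: a root function hands each worker an in-memory slice of the data, each worker reduces its slice to a handful of sufficient statistics for a simple linear regression, and the root combines those summaries and solves for the coefficients. The function names, the round-robin slicing, and the use of threads to stand in for appliance nodes are all assumptions for illustration. This is not SAS's implementation, and the internode communication step is simplified to each worker reporting straight back to the root.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_summarize(data_slice):
    """Hypothetical worker node: reduce its in-memory slice of (x, y) pairs
    to sufficient statistics for simple linear regression."""
    n = len(data_slice)
    sx = sum(x for x, _ in data_slice)
    sy = sum(y for _, y in data_slice)
    sxx = sum(x * x for x, _ in data_slice)
    sxy = sum(x * y for x, y in data_slice)
    return n, sx, sy, sxx, sxy

def root_fit(rows, n_workers=4):
    """Hypothetical root node: distribute slices, gather summaries,
    and solve y = b0 + b1 * x by ordinary least squares."""
    # Round-robin assignment of rows to workers, mimicking data that is
    # already spread across the appliance's nodes. Threads stand in for nodes.
    slices = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        summaries = list(pool.map(worker_summarize, slices))
    # Only the small summaries travel back to the root, never the raw rows.
    n, sx, sy, sxx, sxy = (sum(s[k] for s in summaries) for k in range(5))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance; slope is undefined")
    b1 = (n * sxy - sx * sy) / denom
    b0 = (sy - b1 * sx) / n
    return b0, b1

b0, b1 = root_fit([(x, 2 + 3 * x) for x in range(10)])
print(b0, b1)  # 2.0 3.0
```

Because each worker returns only five numbers, the traffic back to the root stays tiny no matter how many rows each node holds in memory, which is the point of keeping the computation next to the data.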