Prescriptive Analytics Part I. Nick Gonzalez, 2/10/14.
Prescriptive Analytics Part I • Nick Gonzalez, 2/10/14
“It is change, continuing change, inevitable change, that is the dominant factor in society today. No sensible decision can be made any longer without taking into account not only the world as it is, but the world as it will be.” - Isaac Asimov
Topics Covered • Reference automated prescriptive analytics system • Automated algorithm selection • Distributed algorithm development
Covered in future presentations • Ontology creation and extraction • Representing solutions using ontologies • Business optimization • Everything else…
Data is outpacing us • Humans cannot keep up • Computers can but…
Prescriptive Analytics • Scalable • Automated understanding • Automated predictive analytics • Actionable • Closed loop
Example: Video Games [Diagram: closed loop between user space and analytics space. Components: game server, metrics, rules, game simulations, learning process, predictive models. Steps: write, modify, deploy, copy to production, generate, start understanding, build / update models]
Problems • Scale • Speed • Adaptability
“I do not fear computers. I fear the lack of them.” - Isaac Asimov
Goals • Remove the human element from analysis phases • Generate accurate, actionable, predictive models • Combine predictive models and simulation to solve problems
Guiding Principle Big data with simple algorithms will outperform sampled data with complex algorithms.
How is this possible? • Focus on a single problem • Limit scope • Goal must be measurable and actionable
Process [Diagram labels: Data, Data Engineering & Understanding, Prep, Modeling, Simulation, Actionable Deployment]
1. Automated Understanding Find the data representation best suited to the problem you are trying to solve.
Automated Understanding: Initial Transform [Diagram labels: Raw Data, Clean Data, Stats, meta]
Automated Understanding [Diagram labels: Clean Data, Stats, meta, Representation A (A.1, A.2, …), Representation B, Representation C, …]
2. Automated Algorithm Selection Find the algorithm that performs best against the problem you are trying to solve, while meeting all criteria.
Automated Algorithm Selection • Choose algorithms best suited for this type of problem • Consider the data, types, sparsity, size, and desired outcome • Try multiple algorithms • Calculate the root mean squared error (RMSE) or another appropriate measure • Consider the problem domain • Use cross validation • Do not just compare the average RMSE • Choose the algorithm(s) that perform best
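The selection loop in the bullets above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system described in the deck: the two candidate models (`fit_mean`, `fit_line`), the synthetic data, and the ranking rule (mean RMSE plus its standard deviation across folds) are all hypothetical choices made here. The point is only that each candidate is scored with cross validation and that the spread of the per-fold RMSE is reported next to the average.

```python
import math
import random
import statistics


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def fit_mean(xs, ys):
    """Hypothetical baseline: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Hypothetical candidate: least squares fit of y = a + b * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_validate(fit, xs, ys, k=5):
    """Return the per-fold RMSE values from k-fold cross validation."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([ys[i] for i in test],
                           [model(xs[i]) for i in test]))
    return scores


def select_algorithm(candidates, xs, ys, k=5):
    """Score every candidate; rank by mean + stdev (an assumed rule)."""
    report = {}
    for name, fit in candidates.items():
        scores = cross_validate(fit, xs, ys, k)
        report[name] = {
            "mean": statistics.fmean(scores),
            "stdev": statistics.stdev(scores),
            "worst": max(scores),
        }
    best = min(report, key=lambda n: report[n]["mean"] + report[n]["stdev"])
    return best, report


# Synthetic data for illustration only: a noisy straight line.
rng = random.Random(0)
xs = [float(i) for i in range(100)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

best, report = select_algorithm({"mean": fit_mean, "line": fit_line}, xs, ys)
print(best, report[best])
```

Ranking on mean plus standard deviation is one simple way to avoid comparing averages alone; a real system might instead apply a statistical test across folds or weight the worst fold more heavily, depending on the problem domain.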
Distributed Processing • Learning to Scale
Approaching the Problem • Two ways to approach a problem • Bottom up • Top down
Bottom Up Approach • Programmer • Design Patterns, Algorithms • C++, Java • C, Pascal • Assembly Language • Hardware
Top Down • Problem Solver • Problem Representation • Distributed System Abstractions • Functional Languages • Hardware
Building Distributed Algorithms • Identify the simplest concepts that describe data processing • Collections • Collection processing
Evolution of thought [Diagram labels: Single “Box”, No “Box”, Data, Algorithm, Collection, Collection Processing]
Coming together [Diagram labels: Single PC, Hadoop, MPI, …; k-means, density, random forest, gradient boost, …; map, mapcat, reduce, filter, sort, group]
Distributed Processing Interface • Simple concept • Focus on building algorithms • Many ways to implement this concept • Works with both shared memory systems and distributed memory systems
Implementation • Functional language - Clojure • Reusable functions as callbacks • Hadoop drivers written on top of Cascalog • Data location and type are abstracted as “collection”
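The idea of hiding data location and type behind a “collection” can be illustrated with a short sketch. It is written in Python rather than Clojure, and every name in it (`Collection`, `LocalCollection`, `word_count`, `to_list`) is hypothetical; it does not reproduce the Cascalog-based drivers mentioned above. It only shows an algorithm written against a small set of collection operations, with reusable functions passed in as callbacks, so that a different backend could implement the same methods.

```python
from abc import ABC, abstractmethod
from functools import reduce as fold


class Collection(ABC):
    """Hypothetical interface: algorithms see operations, not data location."""

    @abstractmethod
    def map(self, fn):
        """Apply fn to every element."""

    @abstractmethod
    def mapcat(self, fn):
        """Apply fn to every element and concatenate the results."""

    @abstractmethod
    def filter(self, pred):
        """Keep only elements for which pred is true."""

    @abstractmethod
    def group(self, key_fn):
        """Group elements into (key, [elements]) pairs."""

    @abstractmethod
    def reduce(self, fn, initial):
        """Combine all elements into a single value."""

    @abstractmethod
    def to_list(self):
        """Materialize the collection as a Python list."""


class LocalCollection(Collection):
    """In-memory implementation for a single machine (illustrative only)."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, fn):
        return LocalCollection(fn(x) for x in self._items)

    def mapcat(self, fn):
        return LocalCollection(y for x in self._items for y in fn(x))

    def filter(self, pred):
        return LocalCollection(x for x in self._items if pred(x))

    def group(self, key_fn):
        groups = {}
        for x in self._items:
            groups.setdefault(key_fn(x), []).append(x)
        return LocalCollection(groups.items())

    def reduce(self, fn, initial):
        return fold(fn, self._items, initial)

    def to_list(self):
        return list(self._items)


def word_count(lines):
    """Algorithm written only against the Collection interface."""
    return (lines.mapcat(str.split)      # split each line into words
                 .map(str.lower)         # normalize case
                 .group(lambda w: w)     # (word, [word, word, ...])
                 .map(lambda kv: (kv[0], len(kv[1]))))


lines = LocalCollection(["big data", "Simple algorithms", "big algorithms"])
counts = dict(word_count(lines).to_list())
print(counts)
```

Because `word_count` never touches `_items` directly, the same function would run unchanged against any other class that implements `Collection`, whether that class is backed by shared memory or by a cluster.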
“Part of the inhumanity of the computer is that once it is completely programmed and working smoothly, it is completely honest.” - Isaac Asimov