Scaling the Data Scientist Dr. Ira Cohen, Chief Data Scientist, HP Software
HP-Software and Data Science
HP-Software products collect huge amounts of IT data: Requirements, Changes, Defects, Security events, System Monitoring, Logs, Events, Configuration, Incidents, Network data, App Monitoring, Test data
• “Big Data & Predictive Analytics: The Future of IT Management”, Mike Gualtieri, Forrester
Customers want us to transform the data into actionable information
Our solution
Developer → Data analytics specialist
How?
• Data infrastructure
• New Dev tool
• Training
• Mentoring
• Community
Training: Practical Machine Learning
• 4-day training
• Commitment to complete a first project
Practical Machine Learning
Ohad Assulin, Efrat Egozi Levi, Ira Cohen
• Early detection of anomalous behavior in IT systems (Yonatan Ben Simhon & Yaneeve Shekel)
• Automatic Vulnerability Categorization (Barak Raz & Ben Feher)
• Sales Pipeline Early Warning (Gabriel, Alvarado)
• Classifying Security Events (Yoni Roit & Omer Weissman)
• Predictive Analytics in Release Management (Sigalit Sade)
• URL to Action Classification (Boaz Shor & Eyal Kenigsberg)
• Cloud Delivery Optimization (CDO) (Ran, Levi)
• Automatic Event Prioritization (Anat Levinger & Roy Wallerstein)
Pushing My Buttons Gil Zieder, Ofer Eliassaf, Boris Kozorovitzky
The process @ work
As a Pusher or DevOps of a project, you would like to know whether a given change set is safe to push into the production branch.
• 9 open source projects, 8806 individual commits
• Get labels of “good” or “bad” commit by running tests after each commit
• “good” – tests pass, “bad” – tests fail
• 80 attributes per commit
• source control, previous commits, and code complexity based attributes
• e.g., average change frequency, previous commit state, cyclomatic complexity
• Classification algorithms
• K-NN, SVM, Decision Tree, Random Forest, …
• Rank based attribute selection
• 87% accuracy with K-NN
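To make the pipeline on this slide concrete, here is a minimal Python sketch of rank-based attribute selection followed by K-NN classification of commits. It is an illustration under stated assumptions, not the project's actual code: the slide does not say which ranking criterion was used, so absolute Pearson correlation with the label stands in for it; the commits are synthetic; and the helper names (`make_commits`, `rank_attributes`, `knn_predict`, `select`) are hypothetical. Only three attributes are generated, loosely named after the examples on the slide, and the sketch makes no attempt to reproduce the 87% figure.

```python
import math
import random


def make_commits(n, seed=0):
    """Generate synthetic commits (assumption: not the real data set).

    Each row is [noise, change_frequency, complexity]; label 1 = "bad"
    (tests fail), label 0 = "good" (tests pass).
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        bad = rng.random() < 0.5
        noise = rng.gauss(0.0, 1.0)                         # uninformative
        change_freq = rng.gauss(3.0 if bad else 1.0, 0.5)   # informative
        complexity = rng.gauss(12.0 if bad else 8.0, 1.0)   # informative
        rows.append([noise, change_freq, complexity])
        labels.append(1 if bad else 0)
    return rows, labels


def rank_attributes(rows, labels):
    """Return attribute indices ordered by |Pearson correlation| with the label."""
    n = len(rows)
    mean_y = sum(labels) / n
    var_y = sum((y - mean_y) ** 2 for y in labels)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean_x = sum(col) / n
        var_x = sum((x - mean_x) ** 2 for x in col)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, labels))
        ok = var_x > 0 and var_y > 0
        scores.append((abs(cov) / math.sqrt(var_x * var_y) if ok else 0.0, j))
    # Highest score first; ties broken by attribute index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


def knn_predict(train_rows, train_labels, query, k=5):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def select(rows, keep):
    """Keep only the chosen attribute columns."""
    return [[r[j] for j in keep] for r in rows]


rows, labels = make_commits(400)
train_rows, test_rows = rows[:300], rows[300:]
train_labels, test_labels = labels[:300], labels[300:]

# Rank attributes on the training split only, then keep the top two.
keep = rank_attributes(train_rows, train_labels)[:2]
train_sel, test_sel = select(train_rows, keep), select(test_rows, keep)

predictions = [knn_predict(train_sel, train_labels, q) for q in test_sel]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print("kept attributes:", keep, "accuracy:", round(accuracy, 3))
```

The ranking is computed on the training split only, so the held-out commits do not influence which attributes are kept. The sketch skips feature scaling; with 80 real attributes on very different scales, normalizing before computing distances would likely matter for K-NN.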
Can we do better?
• Yes. From months to days!
• How?
• Create a simple tool for analytic specialists
• Automate the data scientist as much as possible
Scaling the data scientist
Data Scientist
• Provides expert advice
• Develops new types of machine learning solutions
Analytic specialists
• Develop using standard machine learning
• Use a simplified tool