
Scaling the Data Scientist

Dr. Ira Cohen, Chief Data Scientist, HP Software


Presentation Transcript


  1. Scaling the Data Scientist Dr. Ira Cohen, Chief Data Scientist, HP Software

  2. HP-Software and Data Science
     HP-Software products collect huge amounts of IT data: requirements, changes, defects, security events, system monitoring, logs, events, configuration, incidents, network data, app monitoring, test data.
     • “Big Data & Predictive Analytics: The Future of IT Management”, Mike Gualtieri, Forrester
     Customers want us to transform the data into actionable information.

  3. Need

  4. A tale of two worlds

  5. Our solution Developer Data analytics specialist

  6. How? Data infrastructure, new dev tool, training, mentoring, community

  7. Training: Practical Machine Learning. 4-day training. Commitment to complete first project.

  8. Practical Machine Learning (Ohad Assulin, Efrat Egozi Levi, Ira Cohen)
     • Early detection of anomalous behavior in IT systems: Yonatan Ben Simhon & Yaneeve Shekel
     • Automatic Vulnerability Categorization: Barak Raz & Ben Feher
     • Sales Pipeline Early Warning: Gabriel, Alvarado
     • Classifying Security Events: Yoni Roit & Omer Weissman
     • Predictive Analytics in Release Management: Sigalit Sade
     • URL to Action Classification: Boaz Shor & Eyal Kenigsberg
     • Cloud Delivery Optimization (CDO): Ran, Levi
     • Automatic Event Prioritization: Anat Levinger & Roy Wallerstein

  9. Pushing My Buttons Gil Zieder, Ofer Eliassaf, Boris Kozorovitzky

  10. The process @ work
      As a pusher or DevOps engineer on a project, you would like to know whether a given change set is safe to push into the production branch.
      • 9 open source projects, 8806 individual commits
      • Get labels of “good” or “bad” commit by running tests after each commit
        • “good”: tests pass; “bad”: tests fail
      • 80 attributes per commit
        • Source control, previous commits, and code complexity based attributes
        • e.g., average change frequency, previous commit state, cyclomatic complexity
      • Classification algorithms: K-NN, SVM, Decision Tree, Random Forest, …
      • Rank-based attribute selection
      • 87% accuracy with K-NN
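The pipeline on this slide (label commits, rank attributes, keep the best ones, classify with K-NN) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The commits are synthetic, the helper names (`make_commit`, `rank_attributes`, `knn_predict`) are hypothetical, and ranking by absolute correlation with the label is an assumed criterion, since the slide does not say which ranking was used.

```python
import math
import random
from collections import Counter


def rank_attributes(rows, labels):
    """Rank attribute indices by |Pearson correlation| with the 0/1 label, best first.

    Assumption: the slide only says "rank based attribute selection";
    correlation is one simple ranking criterion among many.
    """
    n = len(rows)
    mean_y = sum(labels) / n
    var_y = sum((y - mean_y) ** 2 for y in labels)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean_x = sum(col) / n
        var_x = sum((x - mean_x) ** 2 for x in col)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, labels))
        denom = math.sqrt(var_x * var_y)
        scores.append((abs(cov / denom) if denom else 0.0, j))
    return [j for _, j in sorted(scores, reverse=True)]


def knn_predict(train_rows, train_labels, query, k=5):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def make_commit(rng):
    """Synthetic commit: 2 informative attributes plus 4 pure-noise attributes.

    Label 1 = "bad" (tests fail), 0 = "good" (tests pass).
    """
    bad = rng.random() < 0.5
    change_freq = rng.gauss(3.0 if bad else 1.0, 0.5)  # stand-in for average change frequency
    complexity = rng.gauss(3.0 if bad else 1.0, 0.5)   # stand-in for cyclomatic complexity
    noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
    return [change_freq, complexity] + noise, int(bad)


rng = random.Random(0)
data = [make_commit(rng) for _ in range(300)]
rows = [r for r, _ in data]
labels = [y for _, y in data]
train_rows, test_rows = rows[:200], rows[200:]
train_labels, test_labels = labels[:200], labels[200:]

# Keep only the two best-ranked attributes, using training data alone.
top = rank_attributes(train_rows, train_labels)[:2]


def project(row):
    """Restrict a row to the selected attributes."""
    return [row[j] for j in top]


train_sel = [project(r) for r in train_rows]
correct = sum(
    knn_predict(train_sel, train_labels, project(r)) == y
    for r, y in zip(test_rows, test_labels)
)
accuracy = correct / len(test_rows)
print("selected attributes:", sorted(top))
print("held-out accuracy:", accuracy)
```

The synthetic attributes share a similar scale, so the sketch skips normalization. With real commit attributes on different scales, distances in K-NN would normally be computed on standardized values.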

  11. Analytic specialist program: Results

  12. Can we do better?
      • Yes. From months to days!
      • How?
        • Create a simple tool for analytic specialists
        • Automate the data scientist as much as possible

  13. Project Titan

  14. Titan: Demo

  15. Scaling the data scientist
      Data scientist
      • Provides expert advice
      • Develops new types of machine learning solutions
      Analytic specialists
      • Develop using standard machine learning
      • Use a simplified tool
