
Data Science in the Wild David Asboth & Shaun McGirr

Learn about Cox Automotive UK's data science initiatives and the different types of data scientists. Discover the essential skills required to excel in the field.


Presentation Transcript


  1. Data Science in the Wild David Asboth & Shaun McGirr

  2. About Cox Automotive UK Our mission: to transform the way the world buys, sells, and owns vehicles, with data

  3. About Cox Automotive UK

  4. About Cox Automotive UK • Valuation • Stock Monitoring & Alerts • “Data as a Service”

  5. Data Science at Cox Automotive

  6. Data Solutions Structure • Data Engineers • Business Intelligence • Product Development • Valuations team • Data Science

  7. What is a Data Scientist?

  8. Different Types of Data Science [Figure: scale with regions labelled Type B, Danger Zone, and Type A. Icons made by Smashicons from https://www.flaticon.com, licensed by Creative Commons BY 3.0]

  9. Data Science Job Descriptions Type A: • Data is a given • Focus is on new knowledge • Questions are clear • Measurable success (target) • Interpretability may or may not matter • Similar to a Kaggle competition Type B: • Data is messy • Focus on helping decision making • Questions are ambiguous • Success is undefined • Interpretability is often important

  10. Data Science Skills Type A: • PhD in a numerical science • Knowledge of a wide range of cutting-edge machine learning • Deep mathematical understanding • Is a researcher at heart Type B: • Experience with “real” data • Knows a few algorithms well • Understanding of business projects • Focused on pragmatic outcomes

  11. Could you do your job just as well if your dataset was unlabelled?

  12. https://xkcd.com/1838

  13. Example 1: “How many blue cars will we sell tomorrow?”

  14. Example: “How many blue cars will we sell tomorrow?” Type A Data Scientist: • Extract data from “previous sales” dataset • SELECT * FROM Sales WHERE colour = 'blue' etc. • Use machine learning to predict the future • Regression problem? • Time series modelling? • Done
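
As a rough illustration of the Type A workflow on this slide, the sketch below forecasts tomorrow's sales as the mean of the most recent days. It is not the speakers' method: the daily counts are invented, and `moving_average_forecast` and its `window` parameter are hypothetical names chosen for this example. A moving average is only one simple baseline among the regression and time-series options the slide mentions.

```python
from statistics import mean

# Invented daily counts of blue cars sold, oldest first (illustration only).
daily_blue_sales = [12, 15, 11, 14, 13, 16, 10, 14, 15, 12]


def moving_average_forecast(history, window=7):
    """Forecast the next value as the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    recent = history[-window:]  # uses fewer points if history is shorter
    return mean(recent)


forecast = moving_average_forecast(daily_blue_sales)
print(f"Forecast for tomorrow: {forecast:.1f} blue cars")
```

Note that this only works once a clean `daily_blue_sales` series exists, which is the assumption the following slides question.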

  15. Example: “How many blue cars will we sell tomorrow?” Type B Data Scientist: • Extract data from “previous sales” dataset • Oh…. • Let’s start by answering another question: How many blue cars did we sell yesterday? That depends…

  16. Example: “How many blue cars did we sell yesterday?” Depends what you mean by: • Car • Blue • Out of 6,974 unique colours, 1,339 contain “blue” including: • Digital Blue • Blue Ambition • Blue/Green/Silver • Danish Blue • Yesterday • Sell
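
A minimal sketch of why the answer depends on what "blue" means. The colour strings below include the names from the slide, but the sales rows and the three matching rules are assumptions made for illustration, not the definitions used at Cox Automotive.

```python
import re

# Hypothetical sales rows; only the colour names come from the slide.
sales = [
    {"id": 1, "colour": "Blue"},
    {"id": 2, "colour": "Digital Blue"},
    {"id": 3, "colour": "Blue Ambition"},
    {"id": 4, "colour": "Blue/Green/Silver"},
    {"id": 5, "colour": "Danish Blue"},
    {"id": 6, "colour": "Red"},
]


def is_blue_exact(colour):
    """Strict rule: the colour is literally 'blue'."""
    return colour.strip().lower() == "blue"


def is_blue_contains(colour):
    """Loose rule: the colour name mentions 'blue' anywhere."""
    return "blue" in colour.lower()


def is_blue_single_colour(colour):
    """Middle rule: mentions 'blue' and is not a multi-colour description."""
    parts = re.split(r"[/,&]", colour)
    return len(parts) == 1 and "blue" in colour.lower()


rules = {
    "exact": is_blue_exact,
    "contains": is_blue_contains,
    "single_colour": is_blue_single_colour,
}

# Count how many sales each definition of "blue" would report.
counts = {
    name: sum(1 for row in sales if rule(row["colour"]))
    for name, rule in rules.items()
}
print(counts)
```

The same six rows give three different counts, so the definition has to be agreed with the business before any modelling starts.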

  17. Example: “How many blue cars will we sell tomorrow?” Type B Data Scientist: • Create “previous sales” dataset • Try to use machine learning to predict the future • Iterate dataset as required • Done?

  18. Example 2: Detecting Bots

  19. Example 2: Bot Detection • You need an accurate count of unique visitors • An estimated 80% of traffic is bots (spiders etc.) • Build a classifier to detect bots

  20. Example 2: Bot Detection Type A: • Take data (which is a given) • Analyse the two classes • Build a classifier • Done • Company will use algorithm to count bots more accurately

  21. Example 2: Bot Detection Type B: • Understand current manual process & business reasons for detecting bots • Take what data we can find • Classes labelled manually based on business assumptions • Relevant features are unknown and/or need to be calculated • Analyse the two classes • Build a classifier • Present to the stakeholders & work together on integration • Help them change how they do things using our findings • Done?
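
The feature-calculation and manual-labelling steps above can be sketched as follows. Everything here is an assumption for illustration: the request log, the `BOT_KEYWORDS` list, and the `MAX_REQUESTS_PER_SECOND` threshold are hypothetical stand-ins for the business assumptions the slide refers to, and a rule like this would only produce initial labels, not the final classifier.

```python
from collections import defaultdict

# Hypothetical request log: (visitor_id, user_agent, seconds_since_midnight).
requests = [
    ("v1", "Mozilla/5.0", 10),
    ("v1", "Mozilla/5.0", 95),
    ("v2", "ExampleSpider/1.0", 11),
    ("v2", "ExampleSpider/1.0", 12),
    ("v2", "ExampleSpider/1.0", 13),
    ("v3", "Mozilla/5.0", 20),
    ("v3", "Mozilla/5.0", 20),
    ("v3", "Mozilla/5.0", 21),
    ("v3", "Mozilla/5.0", 21),
    ("v3", "Mozilla/5.0", 22),
]

BOT_KEYWORDS = ("bot", "spider", "crawler")  # assumed business rule
MAX_REQUESTS_PER_SECOND = 1.0                # assumed threshold


def build_features(rows):
    """Aggregate raw requests into one feature record per visitor."""
    by_visitor = defaultdict(list)
    for visitor, agent, timestamp in rows:
        by_visitor[visitor].append((agent, timestamp))

    features = {}
    for visitor, hits in by_visitor.items():
        times = [timestamp for _, timestamp in hits]
        duration = max(times) - min(times) + 1  # at least one second
        features[visitor] = {
            "requests": len(hits),
            "rate": len(hits) / duration,
            "agent_flag": any(
                keyword in agent.lower()
                for agent, _ in hits
                for keyword in BOT_KEYWORDS
            ),
        }
    return features


def label_bot(feature):
    """Rule-based label: a declared crawler or an implausibly fast visitor."""
    return feature["agent_flag"] or feature["rate"] > MAX_REQUESTS_PER_SECOND


features = build_features(requests)
labels = {visitor: label_bot(f) for visitor, f in features.items()}
human_visitors = sorted(v for v, is_bot in labels.items() if not is_bot)
print(f"Unique human visitors: {len(human_visitors)} {human_visitors}")
```

Labels produced this way encode the stakeholders' assumptions, which is why the slide puts understanding the current manual process first.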

  22. Speedometer: Revisited [Figure: speedometer with regions labelled Type B and Type A. Icons made by Smashicons from https://www.flaticon.com, licensed by Creative Commons BY 3.0]

  23. Summary of skills to be a good Type B Data Scientist • Experience with dirty data • Statistics • Care about the wider context (the business and the data generating process) • Presentation and people skills

  24. Final Thoughts If you: • Care about getting things done even in a messy world • Are excited by helping people make better decisions • Fancy yourself as an amateur philosopher Then there is a world of data science out there for you!

  25. Questions? david.asboth@coxauto.co.uk shaun.mcgirr@coxauto.co.uk
