1 / 17

“Data Mining and Machine Learning Essentials | CSEP 546”

Dive into the fundamentals of data mining and machine learning with CSEP 546. Explore real datasets and implement algorithms for tasks like clickstream mining and recommender systems. Get hands-on with assignments and source materials by top industry experts like T. Mitchell and P. Domingos.

abalzer
Download Presentation

“Data Mining and Machine Learning Essentials | CSEP 546”

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSEP 546Data MiningMachine Learning Instructor: Pedro Domingos

  2. Logistics • Instructor: Pedro Domingos • Email: pedrod@cs • Office: CSE 648 • Office hours: Mondays 5:30-6:20 • TAs: Kenton Lee, Alon Milchgrub • Email: kentonl@cs, alonmil@cs • Office: CSE TBD • Office hours: Mondays 5:30-6:20 • Web: www.cs.washington.edu/csep546

  3. Evaluation • Four assignments (25% each) • Handed out on weeks 2, 4, 6 and 8 • Due two weeks later • Mix of: • Implementing machine learning algorithms • Applying them to real datasets (e.g.: clickstream mining, recommender systems, spam filtering) • Exercises

  4. Source Materials • T. Mitchell, Machine Learning,McGraw-Hill (Required) • R. Duda, P. Hart & D. Stork, Pattern Classification (2nd ed.), Wiley (Required) • P. Domingos, The Master Algorithm,Basic Books (Recommended) • Papers

  5. A Few Quotes • “A breakthrough in machine learning would be worthten Microsofts” (Bill Gates, Founder, Microsoft) • “Machine learning is the next Internet” (Tony Tether, Director, DARPA) • Machine learning is the hot new thing” (John Hennessy, President, Stanford) • “Machine learning is Google’s top priority”(Eric Schmidt, Chairman, Alphabet) • “Machine learning is Microsoft Research’s largest investment area” (Peter Lee, Head, Microsoft Research) • “Machine learning is the single most important technology trend” (Steve Jurvetson, Partner, Draper Fisher Jurvetson) • “‘Data scientist’ is the hottest job title in Silicon Valley”(Tim O’Reilly, Founder, O’Reilly Media)

  6. So What Is Machine Learning? • Automating automation • Getting computers to program themselves • Writing software is the bottleneck • Let the data do the work instead!

  7. Traditional Programming Machine Learning Computer Data Output Program Computer Data Program Output

  8. Magic? No, more like gardening • Seeds = Algorithms • Nutrients = Data • Gardener = You • Plants = Programs

  9. Sample Applications • Web search • Computational biology • Finance • E-commerce • Space exploration • Robotics • Information extraction • Social networks • Debugging • [Your favorite area]

  10. ML in a Nutshell • Tens of thousands of machine learning algorithms • Hundreds new every year • Every machine learning algorithm has three components: • Representation • Evaluation • Optimization

  11. Representation • Decision trees • Sets of rules / Logic programs • Instances • Graphical models (Bayes/Markov nets) • Neural networks • Support vector machines • Model ensembles • Etc.

  12. Evaluation • Accuracy • Precision and recall • Squared error • Likelihood • Posterior probability • Cost / Utility • Margin • Entropy • K-L divergence • Etc.

  13. Optimization • Combinatorial optimization • E.g.: Greedy search • Convex optimization • E.g.: Gradient descent • Constrained optimization • E.g.: Linear programming

  14. Types of Learning • Supervised (inductive) learning • Training data includes desired outputs • Unsupervised learning • Training data does not include desired outputs • Semi-supervised learning • Training data includes a few desired outputs • Reinforcement learning • Rewards from sequence of actions

  15. Inductive Learning • Given examples of a function (X, F(X)) • Predict function F(X) for new examples X • Discrete F(X): Classification • Continuous F(X): Regression • F(X) = Probability(X): Probability estimation

  16. What We’ll Cover • Supervised learning • Decision tree induction • Rule induction • Instance-based learning • Bayesian learning • Neural networks • Support vector machines • Model ensembles • Learning theory • Unsupervised learning • Clustering • Dimensionality reduction

  17. ML in Practice • Understanding domain, prior knowledge, and goals • Data integration, selection, cleaning,pre-processing, etc. • Learning models • Interpreting results • Consolidating and deploying discovered knowledge • Loop

More Related