
CS 657/790 Machine Learning and Data Mining Course Introduction


gaurav




Presentation Transcript


  1. CS 657/790 Machine Learning and Data Mining Course Introduction

  2. Student Survey • Please hand in sheet of paper with: • Your name and email address • Your classification (eg, 2nd year computer science PhD student) • Your experience with MATLAB (none, some or much) • Your undergraduate degree (when, what, where) • Your AI experience (courses at UWM or elsewhere) • Your programming experience

  3. Course Information • Course Instructor: Joe Bockhorst • email: joebock@uwm.edu • office: 1155 EMS • Course webpage: http://www.uwm.edu/~joebock/790.html • office hours: ??? • Possible times: • before class on Monday (3:30-5:30) • Monday morning • Wednesday morning • after class Monday (7:00-9:00)

  4. Textbook & Reading Assignment • Machine Learning (Tom Mitchell) • Bookstore in the Union: $140 new • Amazon.com hardcover: $125 new, $80 used • Amazon.com softcover: < $30 • Read (posted on class web page) • Preface • Chapter 1 • Sections 6.1, 6.2, 6.9, 6.10 • Sections 8.1, 8.2

  5. PowerPoint vs. Whiteboard • PowerPoint encourages words over pictures (not good) • But PowerPoint slides can be saved, tweaked, easily shared, … • Notes posted on course website following lecture • Your thoughts?

  6. Full Disclosure • Slides are a combination of • Jude Shavlik’s notes from his UW-Madison machine learning course (a professor I had) • Textbook slides (Google “machine learning textbook”) • My notes

  7. Class Email List • Is there one?

  8. Course Outline • 1st half covers supervised learning • Algorithms: support vector machines, neural networks, probabilistic models … • Methodology • 2nd half covers graphical probability models • Powerful statistical models very useful for learning in complex and/or noisy settings

  9. Course "Style" • Primarily algorithmic & experimental • Some theory, both mathematical & conceptual (much on statistics) • "Hands on" experience, interactive lectures/discussions • Broad survey of many ML subfields • "symbolic" (rules, decision trees) • "connectionist" (neural nets) • Support Vector Machines • statistical ("Bayes rule") • genetic algorithms (if time)

  10. Two Major Goals • to understand what a learning system should do • to understand how (and how well) existing systems work

  11. Background Assumed • Programming • Data structures and algorithms • CS 535 • Math • Calculus (partial derivatives) • Simple probability & statistics

  12. Programming Assignments in MATLAB • Why MATLAB? • Fast prototyping • Integrated plotting • Widely used in academia (industry too?) • Will save you time in the long run • Why not MATLAB? • Proprietary software • Harder to work from home • Optional Assignment: familiarize yourself with MATLAB, use MATLAB help system

  13. Student Computer Labs • E256, E280, E285, E384, E270 • All have MATLAB installed under Windows XP

  14. Requirements • Bi-weekly programming plus perhaps some “paper & pencil” homework • "hands on" experience valuable • HW0 – build a dataset • HW1 & HW2 supervised learning algorithms • HW3 & HW4 graphical probability models • Midterm exam (after about 8-10 weeks) • Final exam • Final project of your choosing • during last 4-5 weeks of class

  15. Grading • HWs: 25% • Project: 20% • Midterm: 20% • Final: 30% • Quality discussion: 5%
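The grading weights combine as a weighted average. A minimal Python sketch, assuming every component is scored on a 0-100 scale and homework has already been averaged into one score; the component names and example scores are hypothetical:

```python
# Weights from the grading slide, as fractions of the final grade.
WEIGHTS = {
    "homework": 0.25,
    "project": 0.20,
    "midterm": 0.20,
    "final": 0.30,
    "discussion": 0.05,
}

def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student.
example = {"homework": 90, "project": 85, "midterm": 80, "final": 88, "discussion": 100}
print(round(final_grade(example), 2))  # 86.9
```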

  16. Late HW Policy • HWs due @ 4pm • you have 5 late days to use over the semester • (Fri 4pm → Mon 4pm is 1 late "day") • SAVE UP late days! • extensions only for extreme cases • Penalty points after late days exhausted • 10% per day • Can't be more than one week late
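The late-day rules can be sketched as a small function. This is a minimal Python sketch under assumptions the slide does not state: `days_late` is already counted by the slide's convention (Fri 4pm → Mon 4pm is one day), the 10% penalty means 10% of the assignment's raw score per day not covered by free late days, and `apply_late_policy` is a hypothetical name:

```python
FREE_LATE_DAYS = 5       # per semester, from the slide
PENALTY_PER_DAY = 0.10   # 10% per day once free late days are used up
MAX_DAYS_LATE = 7        # "can't be more than one week late"

def apply_late_policy(raw_score, days_late, free_days_left):
    """Return (adjusted_score, free_days_left_after).

    Free late days are spent first; each remaining late day costs
    PENALTY_PER_DAY of raw_score (an assumed reading of "10% per day").
    """
    if days_late > MAX_DAYS_LATE:
        raise ValueError("submissions more than one week late are not accepted")
    covered = min(days_late, free_days_left)
    penalized_days = days_late - covered
    factor = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return raw_score * factor, free_days_left - covered

# 3 days late with only 2 free days left: one penalized day.
print(apply_late_policy(100, 3, 2))  # (90.0, 0)
```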

  17. Machine Learning Vs Data Mining • Machine Learning: computer algorithms that improve automatically through experience [Mitchell]. • Data Mining: Extracting knowledge from large amounts of data. [Han & Kamber] (synonym: knowledge discovery in databases (KDD))

  18. What’s the difference? Topics in ML and DM texts (Mitchell vs. Han & Kamber) • In both ML and DM: supervised learning, decision trees, neural nets, Bayesian networks, k-nearest neighbor, genetic algorithms, unsupervised learning (clustering in DM jargon), … • Mainly ML: reinforcement learning, learning theory, evaluating learning systems, using domain knowledge, inductive logic programming, … • Mainly DM: data warehouse, OLAP, query languages, association rules, presentation, … • We’ll try to cover topics in red

  19. The learning problem • Learning = improving with experience • Example: learn to play checkers • Improve over task T, • with respect to performance measure P, • based on experience E • T: Play Checkers • P: % of games won • E: games played against self

  20. Famous Example: Discovering Genes • T: find genes in DNA sequences • ACGTGCATGTGTGAACGTGTGGGTCTGATGATGT… • P: % of genes found • E: experimentally verified genes * Burge & Karlin, “Prediction of Complete Gene Structures in Human Genomic DNA,” J. Molecular Biology, 1997, 268:78-94
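The performance measure P ("% of genes found") is the fraction of verified genes that the learner recovers, usually called recall. A minimal Python sketch, assuming a gene is a hypothetical (start, end) coordinate pair and counts as found only on an exact match; the coordinates are made up:

```python
def percent_genes_found(predicted, verified):
    """Percentage of verified genes that also appear among the predictions.

    Genes are (start, end) pairs; exact-match scoring is an assumption
    made for illustration.
    """
    if not verified:
        raise ValueError("need at least one verified gene")
    found = set(predicted) & set(verified)
    return 100.0 * len(found) / len(set(verified))

verified = [(120, 980), (1500, 2210), (3000, 3600), (4100, 5050)]
predicted = [(120, 980), (1500, 2200), (4100, 5050)]
print(percent_genes_found(predicted, verified))  # 50.0
```

This P ignores false positives, so a learner that predicted genes everywhere would score well; in practice it is paired with a measure such as precision.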

  21. Famous Example 2: Autonomous Vehicle Driving • T: drive vehicle • P: reach destination • E: machine observation of human driver

  22. ML key to winning DARPA Grand Challenge • Stanford team won the 2005 driverless vehicle race across the Mojave Desert • “The robot's software system relied predominately on state-of-the-art AI technologies, such as machine learning and probabilistic reasoning.” [Winning the DARPA Grand Challenge, Thrun et al., Journal of Field Robotics, 2006]

  23. Why study machine learning (data mining)? • Data is plentiful • Retail, video, images, speech, text, DNA, bio-medical measurements, … • Computational power is available • Budding Industry • ML has great applications • ML still relatively immature

  24. Next Time: HW0 – Create Your Own Dataset • Think about this • will need to create it by week after next • Google to find: • UCI archive (or UCI KDD archive) • UCI ML archive (UCI machine learning repository)

  25. HW0 – Your “Personal Concept” • Step 1: Choose a Boolean (true/false) concept • Subjective Judgement • Books I like/dislike • Movies I like/dislike • Web pages I like/dislike • “Time will tell” concepts • Stocks to buy • Medical outcomes • Sensory interpretation • Face recognition (See text) • Handwritten digit recognition • Sound recognition

  26. HW0 – Your “Personal Concept” • Step 2: Choosing a feature space • We will use fixed-length feature vectors • Choose N features (these define a space) • Each feature has V_i possible values • Each example is represented by a vector of N feature values (i.e., is a point in the feature space), e.g., <red, 50, round> for color, weight, shape • Feature types (in HW0 we will use a subset; see next slide) • Boolean • Nominal • Ordered • Hierarchical • Step 3: Collect examples (“I/O” pairs)
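When every feature is discrete, the feature space has V_1 × V_2 × … × V_N points. A minimal Python sketch of a fixed-length representation, assuming a hypothetical feature space built from the <red, 50, round> example, with weight discretized into a few buckets so every feature has finitely many values:

```python
from math import prod

# Hypothetical feature space; the order of the keys fixes the vector layout.
FEATURES = {
    "color": ["red", "blue", "green"],
    "weight": [10, 50, 100, 500],
    "shape": ["round", "square", "triangle"],
}

def space_size(features):
    """Number of distinct feature vectors: the product of the V_i."""
    return prod(len(values) for values in features.values())

def make_example(**values):
    """Build a fixed-length vector in FEATURES order, rejecting illegal values."""
    vector = []
    for name, allowed in FEATURES.items():
        v = values[name]
        if v not in allowed:
            raise ValueError(f"{v!r} is not a legal value for {name}")
        vector.append(v)
    return tuple(vector)

print(space_size(FEATURES))                                 # 36
print(make_example(color="red", weight=50, shape="round"))  # ('red', 50, 'round')
```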

  27. Standard Feature Types for representing training examples – source of “domain knowledge” • Nominal • No relationship among possible values, e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz) • Linear (or Ordered) • Possible values of the feature are totally ordered, e.g., size ∈ {small, medium, large} ← discrete; weight ∈ [0…500] ← continuous • Hierarchical • Possible values are partially ordered in an ISA hierarchy, e.g., for shape: [Figure: closed → polygon (square, triangle) and continuous (circle, ellipse)]
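A hierarchical feature can be stored as child → parent links, which makes the partial order explicit: square ISA polygon, but square and circle are not comparable. A minimal Python sketch, assuming the shape tree reads closed → {polygon, continuous}, polygon → {square, triangle}, continuous → {circle, ellipse}; the helper names are hypothetical:

```python
# ISA hierarchy for the shape feature, stored as child -> parent links.
PARENT = {
    "polygon": "closed",
    "continuous": "closed",
    "square": "polygon",
    "triangle": "polygon",
    "circle": "continuous",
    "ellipse": "continuous",
}

def ancestors(value):
    """Chain of ancestors of a value, nearest first."""
    chain = []
    while value in PARENT:
        value = PARENT[value]
        chain.append(value)
    return chain

def isa(value, category):
    """True if value equals category or lies below it in the hierarchy."""
    return value == category or category in ancestors(value)

def leaves():
    """Values that are never anyone's parent, i.e. the leaf values."""
    return sorted(set(PARENT) - set(PARENT.values()))

print(ancestors("circle"))       # ['continuous', 'closed']
print(isa("square", "polygon"))  # True
print(isa("circle", "polygon"))  # False
print(leaves())                  # ['circle', 'ellipse', 'square', 'triangle']
```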

  28. Example Hierarchy (KDD* Journal, Vol 5, No. 1-2, 2001, page 17) [Figure: Product → 99 product classes (e.g., Pet Foods, Tea) → 2302 product subclasses (e.g., Dried Cat Food, Canned Cat Food) → ~30k products (e.g., Friskies Liver, 250g)] • Structure of one feature! • “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.” - From eds. intro to special issue (on applications) of KDD journal, Vol 5, 2001 * Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers

  29. Our Feature Types (for homeworks) • Discrete • tokens (char strings, w/o quote marks and spaces) • Continuous • numbers (ints or floats) • If only a few possible values (e.g., 0 & 1) use discrete • i.e., merge nominal and discrete-ordered (or convert discrete-ordered into 1, 2, …) • We will ignore hierarchy info and only use the leaf values (it is rare anyway)
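A minimal Python sketch of these conventions. The token pattern encodes the slide's rule (no quote marks, no spaces); the helper names and the cutoff for "only a few possible values" (`max_discrete=10`) are hypothetical choices, not values given in the course:

```python
import re

# A discrete token: one or more characters, none of them whitespace or quotes.
TOKEN = re.compile(r"""[^\s'"]+""")

def check_token(text):
    """Return text unchanged if it is a legal discrete token, else raise."""
    if not TOKEN.fullmatch(text):
        raise ValueError(f"illegal token: {text!r}")
    return text

def encode_ordered(value, ordering):
    """Map a discrete-ordered value to 1, 2, ... following the given ordering."""
    return ordering.index(value) + 1

def choose_type(values, max_discrete=10):
    """'discrete' for non-numeric columns or columns with few distinct values."""
    numeric = all(isinstance(v, (int, float)) for v in values)
    if not numeric or len(set(values)) <= max_discrete:
        return "discrete"
    return "continuous"

SIZE_ORDER = ["small", "medium", "large"]
print(encode_ordered("medium", SIZE_ORDER))  # 2
print(check_token("sci-fi"))                 # sci-fi
print(choose_type([0, 1, 1, 0]))             # discrete
```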

  30. Today’s Topics • Creating a dataset of fixed-length feature vectors • HW0 out on-line • Due next Monday

  31. Some Famous Examples [Figure: digitized camera image → learned function → steering angle; medical record (age = 13, sex = M, wgt = 18) → learned function → ill vs. healthy] • Car Steering (Pomerleau) • Medical Diagnosis (Quinlan) • DNA Categorization • TV-pilot rating • Chemical-plant control • Backgammon playing • WWW page scoring • Credit application scoring

  32. HW0: Creating your dataset • Choose a dataset • based on interest/familiarity • meets basic requirements • >1000 examples • category (function) learned should be binary valued • ~500 examples labeled class A, other 500 labeled class B → Internet Movie Database (IMD)
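Before committing to a dataset, it is worth checking the basic requirements mechanically. A minimal Python sketch; the function name is hypothetical, and reading "~500 class A, ~500 class B" as "each class within 10 percentage points of 50%" is an assumption:

```python
from collections import Counter

def check_hw0_dataset(labels, min_examples=1000, max_imbalance=0.1):
    """Return a list of problems with a label list (empty list means OK)."""
    problems = []
    counts = Counter(labels)
    if len(labels) < min_examples:
        problems.append(f"only {len(labels)} examples (need about {min_examples})")
    if len(counts) != 2:
        problems.append(f"target must be binary, found {len(counts)} classes")
    else:
        for label, n in counts.items():
            share = n / len(labels)
            if abs(share - 0.5) > max_imbalance:
                problems.append(f"class {label!r} is {share:.0%} of the data")
    return problems

print(check_hw0_dataset(["A"] * 520 + ["B"] * 480))  # []
print(check_hw0_dataset(["A"] * 900 + ["B"] * 100))  # two imbalance problems
```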

  33. HW0: Creating your dataset • IMD has a lot of data that are not discrete, continuous, or binary-valued for the target function (category) [Figure: entity-relationship diagram. Studio (Name, Country, List of movies) made Movie; Actor (Name, Year of birth, Gender, Oscar nominations, List of movies) acted in Movie; Director/Producer (Name, Year of birth, List of movies) directed/produced Movie; Movie (Title, Genre, Year, Opening weekend box office receipts, List of actors/actresses, Release season)]

  34. HW0: Creating your dataset • Choose a boolean or binary-valued target function (category) • Opening weekend box office receipts > $2 million • Movie is drama? (action, sci-fi,…) • Movies I like/dislike (e.g. Tivo)
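Both example targets reduce a movie record to true/false. A minimal Python sketch; the records, titles, and field names are hypothetical and do not reflect IMD's actual schema:

```python
# Hypothetical movie records (field names are illustrative only).
movies = [
    {"title": "Movie A", "opening_weekend_usd": 12_500_000, "genre": "action"},
    {"title": "Movie B", "opening_weekend_usd": 850_000, "genre": "drama"},
    {"title": "Movie C", "opening_weekend_usd": 2_000_000, "genre": "drama"},
]

def big_opening(movie, threshold=2_000_000):
    """Target 1: opening weekend receipts strictly greater than $2 million."""
    return movie["opening_weekend_usd"] > threshold

def is_drama(movie):
    """Target 2: is the movie's genre drama?"""
    return movie["genre"] == "drama"

print([big_opening(m) for m in movies])  # [True, False, False]
print([is_drama(m) for m in movies])     # [False, True, True]
```

Movie C sits exactly on the threshold and is labeled false because the slide's test is "> $2 million"; boundary cases like this are worth deciding explicitly when building the dataset.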

  35. HW0: Creating your dataset • How to turn available attributes into other example attributes (select predictive features) • Movie • Average age of actors • Number of producers • Percent female actors • Studio • Number of movies made • Average movie gross • Percent movies released in US

  36. HW0: Creating your dataset • Director/Producer • Years of experience • Most prevalent genre • Number of award winning movies • Average movie gross • Actor • Gender • Has previous Oscar award or nominations • Most prevalent genre
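Attributes like "average age of actors" flatten a variable-length list of linked records into one number, which is what makes a fixed-length feature vector possible. A minimal Python sketch with hypothetical records; approximating an actor's age as release year minus birth year is an assumption:

```python
from statistics import mean

# Hypothetical linked records: a movie points at its cast and producers.
actors = {
    "a1": {"birth_year": 1970, "gender": "F"},
    "a2": {"birth_year": 1985, "gender": "M"},
    "a3": {"birth_year": 1960, "gender": "M"},
}
movie = {"title": "Movie A", "year": 2005, "cast": ["a1", "a2", "a3"], "producers": ["p1", "p2"]}

def movie_features(movie, actors):
    """Aggregate a movie's variable-length links into fixed-length features."""
    cast = [actors[a] for a in movie["cast"]]
    return {
        "avg_actor_age": mean(movie["year"] - a["birth_year"] for a in cast),
        "num_producers": len(movie["producers"]),
        "pct_female_actors": 100.0 * sum(a["gender"] == "F" for a in cast) / len(cast),
    }

print(movie_features(movie, actors))
```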

  37. HW0: Creating your dataset David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data: • Opening weekend box office receipts > $2 million • 25 attributes • Accuracy = 83.3% • Default accuracy = 56% • Movie is drama? • 12 attributes • Accuracy = 71.9% • Default accuracy = 51% • http://kdl.cs.umass.edu/proximity/about.html
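"Default accuracy" here presumably means the accuracy of always predicting the most common class, which is why 83.3% vs. 56% shows a real gain. A minimal Python sketch; the labels are hypothetical, chosen only to mirror the 56% figure:

```python
from collections import Counter

def default_accuracy(labels):
    """Accuracy of always predicting the most common class in labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return Counter(labels).most_common(1)[0][1] / len(labels)

# Hypothetical test set in which 56% of movies are in the majority class.
labels = [True] * 56 + [False] * 44
print(default_accuracy(labels))  # 0.56
```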

  38. What Do You Think Machine Learning Means?

  39. What is Learning? • “Learning denotes changes in the system that … enable the system to do the same task … more effectively the next time.” - Herbert Simon • “Learning is making useful changes in our minds.” - Marvin Minsky

  40. Major Paradigms of Machine Learning • Inducing Functions from I/O Pairs • Decision trees (e.g., Quinlan’s C4.5 [1993]) • Connectionism / neural networks (e.g., backprop) • Nearest-neighbor methods • Genetic algorithms • SVMs • Learning without a Teacher (not in Mitchell’s textbook; will spend 0-2 lectures on this, but also in CS776) • Conceptual clustering • Self-organizing systems • Discovery systems
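Nearest-neighbor methods induce a function from I/O pairs by storing the training examples and labeling a new point by its closest ones. A minimal k-nearest-neighbor sketch in Python, assuming numeric feature vectors, Euclidean distance, and majority vote; the function name and training points are hypothetical:

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest training examples.

    train is a list of (feature_vector, label) pairs with numeric features;
    distance is Euclidean (math.dist, Python 3.8+).
    """
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]
print(knn_predict(train, (1.2, 1.4)))  # A
print(knn_predict(train, (6.2, 6.1)))  # B
```

An odd k avoids voting ties when there are two classes; with nominal features a different distance (e.g., counting mismatched values) would be needed.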

  41. Will be covered briefly Major Paradigms of Machine Learning • Improving a Multi-Step Problem Solver • Explanation-based learning • Reinforcement learning • Using Preexisting Domain Knowledge Inductively • Analogical learning • Case-based reasoning • Inductive/explanatory hybrids
