CS240B Spring 2014: Work Plan
• Monday, April 1: First week
• Monday 14: Homework 1
• WD 24: Homework 2
• WD 1st: Homework 3
• May 5 (Monday, 6th week): MIDTERM
• May 6: Project assigned
• May 14: Students' presentations begin
• May 19: Project due
• June 4: Last day of class
• June 13: Deadline for turning in the final take-home project and report
* Max cumulative delay on projects and report must not exceed 3 days. 10% penalty for each extra day.
CS240B Spring 2014: Grade Basis
• 3 Homeworks: 8% each
• Project: 16%
• Midterm: 28%
• Presentation: 18%
• Final Project and Report: 30%
CS240B: Presentations & Final Project
• The Presentation:
  • Teams of two students review two or three papers on a topic of their choosing and share the grade.
  • Each team should also propose two or three simple questions that deserve answers.
• The Final Project (typically individual) could be:
  • A survey paper on a topic of your choosing (the presentation topic is OK, but a +Δ of 2 or 3 additional papers is required).
  • An experimental DSMS project (e.g., an advanced application on a prototype DSMS), with report.
  • A research paper developing an original idea (e.g., a new or improved data stream mining algorithm).
Supporting DM Tasks & DM Processes in a DSMS or a CEP System
Motivation: Gaining experience with current DSMSs and the limitations that make it hard to support KDD applications on data streams.
Case Study: Naïve Bayesian Classifiers (NBCs), arguably the simplest mining algorithm, which is doable in SQL on a DBMS. Thus the question is: can we support it using a DSMS and its SQL-like query language?
A slightly more general question is whether the NBC can be supported by the various CEP systems, which claim to be powerful (e.g., they support rules). Could they be extended to support generic versions of the NBC, and perhaps other data stream mining methods?
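To make the "doable in SQL" claim concrete, here is a minimal Python sketch of an NBC built purely from count tables, the kind of aggregates a GROUP BY query would produce. The function names (`train_nbc`, `classify`), the sample layout, and the use of add-one (Laplace) smoothing are assumptions made for illustration; they are not taken from the course material.

```python
import math
from collections import Counter

def train_nbc(samples):
    """Build NBC count tables from (features, label) pairs.

    Each table mirrors a SQL aggregate:
      class_counts ~ SELECT label, COUNT(*) ... GROUP BY label
      value_counts ~ SELECT col, val, label, COUNT(*) ... GROUP BY col, val, label
    """
    class_counts = Counter()
    value_counts = Counter()
    domains = {}  # column index -> set of distinct values (for smoothing)
    for features, label in samples:
        class_counts[label] += 1
        for col, val in enumerate(features):
            value_counts[(col, val, label)] += 1
            domains.setdefault(col, set()).add(val)
    return class_counts, value_counts, domains

def classify(model, features):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for col, val in enumerate(features):
            k = len(domains.get(col, ()))  # number of distinct values in this column
            # Add-one smoothing so an unseen (value, label) pair does not zero the product
            score += math.log((value_counts[(col, val, label)] + 1) / (n_label + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because training only increments counters, the same tables could in principle be maintained incrementally over a stream, which is what makes the NBC a natural first test for a DSMS query language.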
CS240B Project: Due on Monday, May 19.
Download a DSMS or a CEP system of your choice and (after explaining why you have selected this one and not the others) explore how you can implement the following tasks:
1. Testing of a Naïve Bayesian Classifier: you can assume that the NBC has already been trained and that you can read it from the input, a DB, a file, or memory.
2. Assume now that you also have a stream of pre-classified samples. Use it to determine the accuracy of your current classifier at periodic intervals. Output the accuracy and, if it falls below a certain threshold, execute the next step.
3. Periodically retrain a new NBC from the stream of pre-classified tuples; then use the newly built classifier to predict the class of unclassified tuples (Step 1).
4. See if you can generalize your software and, e.g., design/develop generic NBCs, ensemble methods, other classifiers, etc.
It is understood that the limitations of DSMS and CEP systems will probably prevent you from completing all these tasks (listed in order of increasing difficulty). So, you should make sure that you (1) download a good system, and (2) write a clear report explaining your efforts and the reasons that prevented you from going further.
(For test sets, see the CS240A project: http://www.cs.ucla.edu/classes/winter14/cs240A/DMproject.html)
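The monitor-and-retrain loop described in the tasks can be sketched outside any particular DSMS as follows. This is only an illustration in plain Python: `monitor_stream`, its parameters (`check_every`, `threshold`, `window_size`), and the pluggable `train`/`predict` callables are hypothetical names, and a real DSMS or CEP system would express the windows and aggregates in its own query language.

```python
from collections import deque

def monitor_stream(labeled_stream, model, train, predict,
                   check_every=100, threshold=0.8, window_size=500):
    """Periodically report accuracy on pre-classified tuples; retrain when it drops.

    Yields (tuples_seen, accuracy, retrained, model) at every check so the caller
    can use the latest model to classify unlabeled tuples.
    """
    window = deque(maxlen=window_size)  # most recent labeled tuples, kept for retraining
    correct = 0
    for seen, (features, label) in enumerate(labeled_stream, start=1):
        # Score the tuple with the current model before it is used for retraining
        if predict(model, features) == label:
            correct += 1
        window.append((features, label))
        if seen % check_every == 0:
            accuracy = correct / check_every
            retrained = accuracy < threshold
            if retrained:
                # Assumption: retrain from scratch on the sliding window
                model = train(list(window))
            yield seen, accuracy, retrained, model
            correct = 0  # start a fresh accuracy interval
```

Passing `train` and `predict` as arguments keeps the loop independent of the classifier, which is one simple way to approach the generalization task: any learner exposing those two operations can be plugged in.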
Feedback Questionnaire
• CS240B, Spring 2014. Today's date and Your Name:
• Presentation Title:
Your Feedback:
• On a scale from 5 (High) to 1 (Low), evaluate the presentation in the following categories. Please comment. Don't just circle a number!
• 1. Clarity in topic description: 5 4 3 2 1
• 2. Organization of the talk: 5 4 3 2 1
• 3. Quality of presentation: 5 4 3 2 1
• 4. How well did you get the main ideas of the talk? 5 4 3 2 1
• 5. Did the speaker(s) make you interested in the topic? 5 4 3 2 1
• 6. Did the speaker raise/pose interesting questions?
• 7. List the strong points:
• 8. List the weak points:
• 9. Your suggestions for improvements.
You and your contribution to the discussion will be evaluated too: (i) Remind me of the questions you asked during the discussion. (ii) I will later record any answer/comment that you have provided via email.