Supporting DM Tasks & DM Processes in a DSMS or a CEP System. Motivation: Gaining experience with current DSMS and their limitations which make it hard to support KDD applications on data streams.
Supporting DM Tasks & DM Processes in a DSMS or a CEP System
• Motivation: Gaining experience with current DSMS and the limitations that make it hard to support KDD applications on data streams.
• Case Study: Naïve Bayesian Classifiers (NBC), arguably the simplest mining algorithm, which is doable in SQL on a DBMS. Thus the question is: can we support it using a DSMS and its SQL-like query language?
• A slightly more general question is whether the NBC can be supported by various CEP systems, which claim to be powerful (e.g., they support rules). Could they be extended to support generic versions of the NBC, and perhaps other data stream mining methods?
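To illustrate why an NBC is "doable in SQL", here is a minimal sketch in which training reduces to a few GROUP BY aggregates. It assumes a hypothetical verticalized table `train(id, col, val, cls)` held in an in-memory SQLite database driven from Python. The table layout, the toy data, the function name `classify`, and the use of Laplace smoothing are illustrative assumptions, not something the slides prescribe.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical verticalized layout: one row per (tuple id, attribute, value),
# with the class label of the tuple repeated on each row.
conn.execute("CREATE TABLE train(id INTEGER, col TEXT, val TEXT, cls TEXT)")
rows = [
    (1, "outlook", "sunny", "no"), (1, "wind", "strong", "no"),
    (2, "outlook", "sunny", "no"), (2, "wind", "weak", "no"),
    (3, "outlook", "rain", "yes"), (3, "wind", "weak", "yes"),
    (4, "outlook", "overcast", "yes"), (4, "wind", "strong", "yes"),
    (5, "outlook", "rain", "yes"), (5, "wind", "weak", "yes"),
]
conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?)", rows)

# Training: the "model" is nothing more than aggregated counts.
class_counts = dict(conn.execute(
    "SELECT cls, COUNT(DISTINCT id) FROM train GROUP BY cls"))
value_counts = {
    (col, val, cls): n
    for col, val, cls, n in conn.execute(
        "SELECT col, val, cls, COUNT(*) FROM train GROUP BY col, val, cls")
}
# Distinct values per attribute, used only for Laplace smoothing (an assumption).
domain_sizes = dict(conn.execute(
    "SELECT col, COUNT(DISTINCT val) FROM train GROUP BY col"))


def classify(sample):
    """Return the class with the highest smoothed log-posterior.

    `sample` is a dict mapping attribute name -> value.
    """
    total = sum(class_counts.values())
    best_cls, best_score = None, float("-inf")
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # log prior
        for col, val in sample.items():
            count = value_counts.get((col, val, cls), 0)
            score += math.log((count + 1) / (n_cls + domain_sizes[col]))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


print(classify({"outlook": "sunny", "wind": "strong"}))
print(classify({"outlook": "rain", "wind": "weak"}))
```

Whether the same aggregates can be expressed and kept up to date over windows in a given DSMS query language is exactly the open question the case study raises.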
CS240B Project: Due on Monday, May 19.
Download a DSMS or a CEP system of your choice and (after explaining why you selected it and not the others) explore how you can implement the following tasks:
• Testing of a Naïve Bayesian Classifier: you can assume that the NBC has already been trained and that you can read it from the input, a DB, a file, or memory.
• Assume now that you also have a stream of pre-classified samples. Use it to determine the accuracy of your current classifier at periodic intervals. Output the accuracy and, if it falls below a certain threshold, execute the next step.
• Periodically retrain a new NBC from the stream of pre-classified tuples; then use the newly built classifier to predict the class of unclassified tuples (Step 1).
• See if you can generalize your software and, e.g., design/develop generic NBCs, ensemble methods, other classifiers, etc.
It is understood that the limitations of DSMS and CEP systems will probably prevent you from completing all these tasks (listed in order of increasing difficulty). So, you should make sure that you (1) download a good system and (2) write a clear report explaining your efforts and the reasons that prevented you from going further.
(For test sets, see the CS240A project: http://www.cs.ucla.edu/classes/winter14/cs240A/DMproject.html)
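As a reference point for what the first three tasks ask of a DSMS or CEP system, the following is a minimal plain-Python sketch of the test, monitor, and retrain loop. Everything in it is an assumption made for illustration: the `NaiveBayes` class, the `monitor` generator, the tumbling window of 4 tuples, the 0.75 accuracy threshold, Laplace smoothing, and the toy stream. An actual solution would express the same logic in the query or rule language of the chosen system.

```python
import math
from collections import Counter


class NaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing, built from counts."""

    def __init__(self):
        self.class_counts = Counter()
        self.value_counts = Counter()  # (attribute index, value, class) -> count
        self.domains = {}              # attribute index -> set of seen values

    def fit(self, samples):
        for features, label in samples:
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.value_counts[(i, value, label)] += 1
                self.domains.setdefault(i, set()).add(value)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())

        def score(label):
            n = self.class_counts[label]
            s = math.log(n / total)  # log prior
            for i, value in enumerate(features):
                # max(..., 1) guards against attributes never seen in training
                k = max(len(self.domains.get(i, ())), 1)
                s += math.log((self.value_counts[(i, value, label)] + 1) / (n + k))
            return s

        return max(self.class_counts, key=score)


def monitor(labeled_stream, model, window=4, threshold=0.75):
    """Test the current model on each tumbling window of labeled tuples.

    Yields (accuracy, retrained, model) once per full window. When accuracy
    falls below `threshold`, a new model is trained on that window and used
    for subsequent windows.
    """
    buffer = []
    for features, label in labeled_stream:
        buffer.append((features, label))
        if len(buffer) == window:
            correct = sum(model.predict(f) == y for f, y in buffer)
            accuracy = correct / window
            retrained = accuracy < threshold
            if retrained:
                model = NaiveBayes().fit(buffer)
            yield accuracy, retrained, model
            buffer = []


# Toy stream: the concept flips after the first window.
initial = NaiveBayes().fit([(("a",), "pos"), (("b",), "neg")])
stable = [(("a",), "pos"), (("b",), "neg")] * 2
flipped = [(("a",), "neg"), (("b",), "pos")] * 2
stream = stable + flipped + flipped

results = [(acc, retrained) for acc, retrained, _ in monitor(stream, initial)]
print(results)
```

Retraining only on the window that triggered the alarm is one simple design choice; retraining over a longer sliding window or maintaining an ensemble (the fourth task) would be natural variations.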