Supporting DM Tasks & DM Processes in a DSMS or a CEP System

evan

Presentation Transcript


  1. Supporting DM Tasks & DM Processes in a DSMS or a CEP System
  • Motivation: Gaining experience with current DSMSs and with the limitations that make it hard to support KDD applications on data streams.
  • Case Study: Naïve Bayesian Classifiers (NBC), arguably the simplest mining algorithm, which can be implemented in SQL on a DBMS. Thus the question is: can we support it using a DSMS and its SQL-like query language?
  • A slightly more general question is whether the NBC can be supported by the various CEP systems that claim to be powerful (e.g., because they support rules). Could they be extended to support generic versions of NBC, and perhaps other data stream mining methods?
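As a rough illustration of why the NBC is considered doable in SQL, the following hypothetical Python sketch reduces training to two count tables (the same aggregates a SQL query would compute with COUNT(*) ... GROUP BY) and reduces classification to an argmax over summed log-probabilities. It assumes categorical attributes and add-one (Laplace) smoothing, neither of which the slide specifies; the names `train_nbc` and `classify` and the toy weather data are illustrative only.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Model = (class counts, (attribute, value, class) counts, distinct values per attribute)
Model = Tuple[Counter, Counter, Dict[int, Set[Hashable]]]


def train_nbc(samples: Iterable[Tuple[tuple, Hashable]]) -> Model:
    """Build the count tables an NBC needs from (attributes, label) pairs."""
    class_counts: Counter = Counter()  # like SELECT label, COUNT(*) ... GROUP BY label
    cond_counts: Counter = Counter()   # like ... GROUP BY attr, value, label
    values: Dict[int, Set[Hashable]] = defaultdict(set)  # distinct values seen per attribute
    for attrs, label in samples:
        class_counts[label] += 1
        for i, v in enumerate(attrs):
            cond_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, cond_counts, values


def classify(model: Model, attrs: tuple) -> Hashable:
    """Return the class c maximizing log P(c) + sum_i log P(a_i = v_i | c)."""
    class_counts, cond_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(attrs):
            # Add-one (Laplace) smoothing (an assumption) so an unseen
            # (value, class) pair does not zero out the whole product.
            n_values = len(values.get(i, ()))
            score += math.log((cond_counts[(i, v, c)] + 1) / (n_c + n_values))
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    # Toy, made-up training data: (outlook, temperature) -> play?
    toy = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
           (("rain", "mild"), "yes"), (("overcast", "hot"), "yes"),
           (("rain", "cool"), "yes")]
    nbc = train_nbc(toy)
    print(classify(nbc, ("rain", "mild")), classify(nbc, ("sunny", "hot")))
```

Working in log space keeps the product of many small probabilities from underflowing; in a SQL formulation the same sum would typically come from joining the test tuple's attribute values against the count table and aggregating per class.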

  2. CS240B Project: Due on Monday, May 19. Download a DSMS or a CEP system of your choice and (after explaining why you selected it rather than the others) explore how you can implement the following tasks:
  • Testing of a Naïve Bayesian Classifier: you can assume that the NBC has already been trained and that you can read it from the input, a DB, a file, or memory.
  • Assume now that you also have a stream of pre-classified samples. Use it to determine the accuracy of your current classifier at periodic intervals. Output the accuracy, and if it falls below a certain threshold, execute the next step.
  • Periodically retrain a new NBC from the stream of pre-classified tuples; then use the newly built classifier to predict the class of unclassified tuples (Step 1).
  • See if you can generalize your software, e.g., design/develop generic NBCs, ensemble methods, other classifiers, etc.
  It is understood that the limitations of DSMS and CEP systems will probably prevent you from completing all these tasks (listed in order of increasing difficulty). So, you should make sure that you (1) download a good system, and (2) write a clear report explaining your efforts and the reasons that prevented you from going further. (For test sets, see the CS240A project: http://www.cs.ucla.edu/classes/winter14/cs240A/DMproject.html)
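The testing, monitoring, and retraining steps listed above can be sketched as a single loop over one stream in which each tuple is either unclassified (label None) or pre-classified. This is a hypothetical Python sketch, not code from any particular DSMS or CEP system: the names `run_stream`, `window`, and `threshold` are illustrative, accuracy is assumed to be measured over tumbling windows of pre-classified tuples, and retraining is assumed to use only the most recent window. Any trainer/classifier pair can be plugged in; a trivial majority-class learner stands in for a real NBC here.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple

# Each stream element: (attributes, label); label is None for unclassified tuples.
Sample = Tuple[tuple, Optional[Any]]


def run_stream(
    stream: Iterable[Sample],
    model: Any,
    classify: Callable[[Any, tuple], Any],
    train: Callable[[List[Tuple[tuple, Any]]], Any],
    window: int = 100,
    threshold: float = 0.8,
) -> Iterator[tuple]:
    """Yield ("pred", attrs, predicted_label) for each unclassified tuple and
    ("acc", accuracy, retrained) after every `window` pre-classified tuples."""
    buffer: List[Tuple[tuple, Any]] = []  # pre-classified tuples of the current window
    correct = 0
    for attrs, label in stream:
        if label is None:
            # Step 1: predict with whatever model is current.
            yield ("pred", attrs, classify(model, attrs))
            continue
        # Step 2: test the current model on the tuple before buffering it.
        correct += int(classify(model, attrs) == label)
        buffer.append((attrs, label))
        if len(buffer) == window:
            accuracy = correct / window
            retrained = accuracy < threshold
            if retrained:
                # Step 3: rebuild the model from the latest window only (assumption).
                model = train(buffer)
            yield ("acc", accuracy, retrained)
            buffer, correct = [], 0


# Hypothetical stand-in learner: predicts the most frequent label of its training window.
def train_majority(samples: List[Tuple[tuple, Any]]) -> Any:
    return Counter(label for _, label in samples).most_common(1)[0][0]


def classify_majority(model: Any, attrs: tuple) -> Any:
    return model


if __name__ == "__main__":
    demo = [(("x",), "b")] * 4 + [(("y",), None)] + [(("x",), "b")] * 4
    for event in run_stream(demo, "a", classify_majority, train_majority,
                            window=4, threshold=0.75):
        print(event)
```

Because each window is scored by a model that was built from earlier tuples, the reported accuracy is never computed on the data the current model was trained on.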
