270 likes | 393 Views
Enabling Fast Prediction for Ensemble Models on Data Streams. Peng Zhang, Jun Li, Peng Wang Byron Gao, Xingquan Zhu, Li Guo Information Security Research Center, Institute of Computing Technology, Chinese Academy of Sciences, China Texas State University-San Marocs, USA
E N D
Enabling Fast Prediction for Ensemble Models on Data Streams Peng Zhang, Jun Li, Peng Wang Byron Gao, Xingquan Zhu, Li Guo Information Security Research Center, Institute of Computing Technology, Chinese Academy of Sciences, China Texas State University-San Marocs, USA University of Technology, Syndey, Australia
Outline • Motivation • Solutions • Experiments • Related work • Conclusion
Outline • Motivation • Solutions • Experiments • Related work • Conclusion
Data Stream Classification • Information Security • Applications: Intrusion detection, spam filtering, web page stream monitoring • Example 1. Web page stream monitoring • Two requirements • Classify each incoming web page as accurate as possible. • Classify each incoming web page as fast as possible.
Ensemble Models • Technique • Use a divide-and-conquer manner to build models from continuous data streams • Merits • Scale well, adapt quickly to new concepts, low variance errors, and ease of parallelization. • Family • Ensemble weighted classifiers, incremental classifiers, both classifiers and clusters
Limitation of Ensembles • Low Prediction efficiency. • Prediction usually takes linear time complexity. • Previous works combine a small number of base classifiers in ensemble. • Example 2: Prediction time of ensemble models 830 Processors for 50 classifier ensembles !
Outline • Motivation • Solutions • Experiments • Related work • Conclusion
Solutions • Convert ensemble into spatial database • Design a new Ensemble-Tree indexing structure • Prediction becomes search over Ensemble-Tree • Build and update ensemble becomes insertion and deletion. Our solutions
Spatial Indexing for Ensemble • How to convert ensemble into spatial database ? • Example 3: Convert ensemble into spatial database
Spatial Indexing for Ensemble • Why convert ensemble to spatial database ? • Trade space for time. • Example 4. linear vs. sub-linear prediction (b) Spatial database (a) Linear prediction (c) Sub-linear prediction
Ensemble-Tree (E-Tree) Comparisons (classifier_id, weight, pointer) (I, classifier_id, sibling) Parameters M, m • E-Tree is designed for binary classification. • All decision rules are from one class, which is called target class.
Operations on E-Tree • Search • Goal: Predict the label of each incoming stream record x, • Step 1: depth-first traverse to find entries in leaf covering x (i.e., u entries) • Step 2: calculate x’s class label • Example 5. Search • Consider a new record x = (4, 4). • E-Tree achieve 20% improvement in predicting x’s class label.
Operations on E-Tree • Insertion • Goal: Integrates new classifiers into ensemble • Step 1: Insert the classifier into the table structure • Step 2: For each rule R in the classifier • search the leaf node that contains R • If the leaf node is full, split the node; otherwise, add in directly. • Repeat until all nodes meet the [m, M] constraint. • Example 6. Insertion • We insert the five classifiers one by one. Suppose parameters M=3, and m=1.
Operations on E-Tree • Deletion • Goal: Discards outdated classifiers from ensemble • Step 1: Delete an outdated classifier from the table. • Step 2: Delete all entries associated with the classifier • Step 3: After deletion, if a node has less than m entries, then delete the leaf node, and re-insert into the tree. • Step 4: Repeat until all nodes meet the [m, M] constraint.
Ensemble Learning with E-Trees • The architecture • online prediction: search operation • Sideway training and updating: Insertion and deletion operations
Theoretical Analysis of E-Trees • For each incoming record x, the ideal situation is that all target decision rules that cover x are located in one single leaf. (1) What's the probability of x's target decision rules being located in one single leaf node? (2) In the ideal case, what’s the worst prediction time? • The worst case is that the target decision rules are uniformly distributed across m leaf nodes. (3) In the worst case, what’s the worst prediction time?
Theoretical Analysis of E-Trees Sublinear time
Outline • Motivation • Solutions • Experiments • Related work • Conclusion
Experiments • Data sets • Synthetic data • Real world data • Comparison methods • Global E-Tree, Local E-Tree • Global-Ensemble, Local-Ensemble Decision space 1 0 1
Experiments • Prediction time
Experiments • Memory cost
Experiments • Four methods comparisons • LE-tree performs better than L-Ensemble with respect to prediction time. • GE-tree performs better than G-Ensemble with respect to prediction time. • LE-tree achieves the same prediction accuracy as L-Ensemble. • GE-tree achieves the same prediction accuracy as G-Ensemble. E-Trees can reduce Ensemble model’s prediction time, meanwhile, E-Tree can achieve the same prediction accuracy as ensemble.
Outline • Motivation • Solutions • Experiments • Related work • Conclusion
Related Work • Stream classification • Incremental learning, ensemble learning • Stream indexing • Boolean expression indexing in Publish/subscribe systems • Spatial indexing • R-Tree, R*-Tree, R+-Tree, X-Tree, … • Ensemble Pruning • Concept drifting
Outline • Motivation • Solutions • Experiments • Related work • Conclusion
Conclusion • Contributions • Identify and address the prediction efficiency problem for ensemble models on data streams • Convert ensemble model to spatial database, and propose a new type of spatial indexing Ensemble-Tree to achieve sub-linear prediction time. • Future work • For time-critical data mining tasks. • Index SVM and other classification models.
Source codehttp://streammining.org Thank you ! Questions?