
Enhanced Situation Space Mining for Data Streams

Data Stream Mining Assignment 2. Enhanced Situation Space Mining for Data Streams. Rishabh Upadhyay Fr. Conceicao Rodrigues College of Engineering Mumbai, India uhrishabh@gmail.com. Sivan Toledo Tel-Aviv University Tel-Aviv, Israel stoledo@tau.ac.il. Yisroel Mirsky


Presentation Transcript


  1. Data Stream Mining, Assignment 2
     Enhanced Situation Space Mining for Data Streams
     • Rishabh Upadhyay, Fr. Conceicao Rodrigues College of Engineering, Mumbai, India, uhrishabh@gmail.com
     • Sivan Toledo, Tel-Aviv University, Tel-Aviv, Israel, stoledo@tau.ac.il
     • Yisroel Mirsky, Ben-Gurion University, Beer-Sheva, Israel, yisroel@post.bgu.ac.il
     • Tal Halpern, Tel-Aviv University, Tel-Aviv, Israel, talhalpern10@gmail.com
     • Yuval Elovici, Ben-Gurion University, Beer-Sheva, Israel, elovici@bgu.ac.il
     Presented by Pooja Joshi, Waikato University, Hamilton, New Zealand, pj60@students.Waikato.ac.nz

  2. Abstract
     • pcStream
       • An algorithm that extracts knowledge of the present situation from a data stream.
       • A machine learning algorithm that finds contexts (concepts) in a numerical stream in an unsupervised manner.
     • Drawbacks of pcStream
       • Complexity due to Principal Component Analysis (PCA)
       • Situation overlap
     • pcStream2: a variant of pcStream
       • Incremental PCA (IPCA) to reduce complexity and memory requirements
       • Just-In-Time PCA (JIT-PCA): the algorithm used to implement IPCA

  3. Introduction
     • Context Space Theory (CST)
       • CST is applied to infer the actor’s situation from a given data stream.
       • CST is used in many context-aware applications.
     • What is a context?
       • A point in an n-dimensional space (the context space)
     • Drawbacks of CST
       • Situation spaces must be defined manually.
       • Situation space mining is difficult because:
         • data streams are unbounded
         • data streams are subject to concept drift
     Figure 1 - An illustration of a context domain for activity recognition consisting of two situation spaces: walking and running (c1 and c2).

  4. pcStream
     Pseudocode for pcStream
     • Step 1: When an instance X arrives, compute the statistical similarity (Mahalanobis distance) between X and each known context.
     • Step 2: If X falls within the distribution of a known context, assign X to that context.
       • Otherwise, if X does not fit any context, place X in buffer B, which holds up to tmin instances.
       • If the observations that follow X stop being placed in B before tmin is reached, B is labeled as noise and is emptied.
       • Else, X is assigned to the current distribution.
       • If B is full, a new situation space has been found and the content of B is emptied.
     Algorithm 1 - pcStream algorithm
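The buffer logic described in this slide can be sketched roughly as below. This is a simplified illustration, not the authors' implementation: the names `Context`, `pcstream_sketch`, `phi` (the distance threshold) and `t_min` are chosen here, each context is summarized by a per-dimension mean and variance (a diagonal-covariance Mahalanobis distance) rather than by PCA, and context models are not updated after an instance is assigned. Instances that are still waiting in the buffer are labeled `None`.

```python
import math


class Context:
    """A situation space summarized by per-dimension mean and variance.

    Assumption for this sketch: diagonal covariance instead of a PCA model.
    """

    def __init__(self, points):
        n = len(points)
        dims = len(points[0])
        self.mean = [sum(p[d] for p in points) / n for d in range(dims)]
        # A small floor keeps the variance away from zero.
        self.var = [
            max(sum((p[d] - self.mean[d]) ** 2 for p in points) / n, 1e-6)
            for d in range(dims)
        ]

    def distance(self, x):
        """Mahalanobis distance under the diagonal-covariance assumption."""
        return math.sqrt(
            sum((xi - m) ** 2 / v for xi, m, v in zip(x, self.mean, self.var))
        )


def pcstream_sketch(stream, initial_points, phi, t_min):
    """Assign each instance to a context index, or None while it is buffered."""
    contexts = [Context(initial_points)]
    buffer = []
    labels = []
    for x in stream:
        scores = [c.distance(x) for c in contexts]
        best = min(range(len(contexts)), key=scores.__getitem__)
        if scores[best] <= phi:
            # X fits a known context; an unfinished buffer is treated as noise.
            buffer.clear()
            labels.append(best)
        else:
            buffer.append(x)
            if len(buffer) == t_min:
                # The buffer is full: model a new situation space from it.
                contexts.append(Context(buffer))
                buffer.clear()
                labels.append(len(contexts) - 1)
            else:
                labels.append(None)
    return contexts, labels
```

With a starting context around the origin, three consecutive far-away instances fill a buffer of size 3 and create a second context, whereas a single outlier followed by a fitting instance is discarded as noise.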

  5. Drawbacks of the pcStream algorithm
     • Detecting new overlapping situation spaces
       • Similarity score: Mahalanobis distance
     • Algorithm complexity
       • Uses Principal Component Analysis (PCA)
       • Algorithm complexity of O(n^3)
     • Issues overcome by pcStream2
       • Windowing: overcomes overlapping situation spaces.
       • Incremental PCA (JIT-PCA) instead of PCA
     Figure 3 - An illustration of the issues with detecting overlapping situation spaces from a data stream generated from a smartphone accelerometer. Here the ground truth is activity recognition.
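For reference, the Mahalanobis distance named in this slide is conventionally defined as follows, where μ and Σ denote a context's mean vector and covariance matrix (symbols chosen here, not taken from the slides). If n is read as the number of dimensions, the cubic cost is consistent with a dense eigendecomposition of the n × n covariance matrix, which is the usual bottleneck of batch PCA.

```latex
d_M(x) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}
```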

  6. pcStream2
     • 2 changes
       • Persistence
         • Before assigning X to its closest context, first check whether d(X) < threshold.
       • Windowing
         • Consider the latest instances within tmin observations.
     • 3 stages
       • Push
         • When an instance X arrives, it is pushed into buffer B.
       • Process
         • When |B| > tmin, X pops out of B and is processed.
       • Detect
         • When B is full, check whether a new situation is evolving.
     Algorithm 2 – pcStream2 algorithm
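The control flow of the three stages can be sketched as a sliding window. This is only a skeleton under assumptions: `windowed_stream`, `process` and `detect` are hypothetical names, the window is taken to be full when it holds exactly `t_min` instances, and the actual processing and detection rules of pcStream2 are left to the two callbacks because the slide does not spell them out.

```python
from collections import deque


def windowed_stream(stream, t_min, process, detect):
    """Run the push / process / detect cycle over a stream.

    process(x) is called on each instance that leaves the window;
    detect(window) is called whenever the window holds t_min instances.
    """
    window = deque()
    for x in stream:
        window.append(x)                # Push: the new instance enters B
        if len(window) > t_min:         # Process: the oldest instance pops out
            process(window.popleft())
        if len(window) == t_min:        # Detect: B is full, inspect it
            detect(list(window))
```

Passing list-appending callbacks makes the order of events visible: each instance is processed only after `t_min` newer instances have arrived, which is what lets the detection step look at recent data before older data is committed to a context.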

  7. pcStream & IPCA
     • Using IPCA instead of PCA has the following advantages:
       • Reduced complexity
       • Reduced memory consumption
       • Damped window
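To illustrate why an incremental scheme reduces memory and gives a damped window, the sketch below keeps an exponentially weighted running mean and covariance. It is not IPCA or JIT-PCA: it performs no eigen-decomposition, and the class name `DampedCovariance` and the `decay` parameter are assumptions made for this example. The point is only that the state has a fixed size regardless of stream length, and that older instances fade out geometrically.

```python
class DampedCovariance:
    """Exponentially weighted running mean and covariance (illustrative only)."""

    def __init__(self, dims, decay):
        self.dims = dims
        self.decay = decay  # weight of each new instance, 0 < decay < 1 (assumed)
        self.mean = [0.0] * dims
        self.cov = [[0.0] * dims for _ in range(dims)]
        self.seen = 0

    def update(self, x):
        if self.seen == 0:
            # The first instance initializes the mean; covariance stays zero.
            self.mean = [float(v) for v in x]
        else:
            a = self.decay
            delta = [xi - m for xi, m in zip(x, self.mean)]
            self.mean = [m + a * d for m, d in zip(self.mean, delta)]
            for i in range(self.dims):
                for j in range(self.dims):
                    # Old covariance is damped by (1 - a); new deviation is blended in.
                    self.cov[i][j] = (1 - a) * (self.cov[i][j] + a * delta[i] * delta[j])
        self.seen += 1
```

Each update touches only the mean vector and the covariance matrix, so no past instances need to be stored.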

  8. JIT-PCA
     • JIT-PCA is a heuristic randomized incremental PCA.
     • Implements ideas from previous literature to build an effective and fast algorithm
       • QR-based update formula
         • Least updating cost when delta=0
       • Randomized sketching algorithms
     • Compute total mass and average
     • Compute the probability used to decide whether to use the orthogonal part in the update model
     • Relaxation mode: wait until the update has stabilized.
     Algorithm 3 – pcStream2 algorithm

  9. Experimental Results
     • Parameters used for evaluation
       Table 2: The parameters used in the evaluations over each dataset.
     • Datasets used for evaluation
       Table 1: Summaries of the three datasets used for the evaluations.

  10. Experimental Results (cont'd)
     (1) Adjusted Rand Index (ARI)
         Figure 3: The best ARI achieved by pcStream and pcStream2 (left), pcStream with PCA and JIT-PCA (right), for each dataset.
     (2) Parameter selection robustness
         Figure 4: The resulting ARI for every parameter selection for both pcStream and pcStream2 over the SherLock dataset.
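For readers unfamiliar with the metric, the Adjusted Rand Index compares two labelings of the same instances and corrects for chance agreement; a score of 1 means the groupings are identical up to label names. The function below is a plain implementation of the standard pair-counting formula, written for illustration. It is not the evaluation code behind the figures, and the function name is chosen here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same instances."""
    n = len(truth)
    if n != len(predicted) or n < 2:
        raise ValueError("need two labelings of equal length with at least 2 items")
    # Pairs of instances that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    # Pairs that share a cluster within each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case: both labelings are one cluster, or both all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because only co-membership of pairs is counted, the metric suits unsupervised context detection, where the detected context indices need not match the ground-truth label names.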

  11. Experimental Results (cont'd)
     (1) Accuracy and runtime (PCA vs. IPCA)
         Figure 5: A comparison of accuracy (top row) and runtime (bottom row) when using PCA and IPCA with different pcStream parameters over the SCA dataset.
     (2) Runtime evaluation (PCA vs. JIT-PCA)
         Figure 6: The runtimes of pcStream with PCA and JIT-PCA for each dataset.

  12. Experimental Results (cont'd)
     (1) Feature evaluation (PCA vs. JIT-PCA) on the KDD dataset
         Figure 7: The effect the number of dimensions has on pcStream’s runtime on the KDD dataset with PCA and JIT-PCA, respectively. The bars represent the standard deviation.

  13. Questions?
