Classification and Novel Class Detection of Feature Based Stream Data.

Date: 21st of May, 2014. Shri Ramdeo Baba College of Engineering and Management. A Technical Seminar on Classification and Novel Class Detection of Feature Based Stream Data. Presentation by: Rimjhim Singh. Under the guidance of: Dr. M.B. Chandak.


Presentation Transcript


  1. Date: 21st of May, 2014. Shri Ramdeo Baba College of Engineering and Management. A Technical Seminar on Classification and Novel Class Detection of Feature Based Stream Data. Presentation by: Rimjhim Singh. Under the guidance of: Dr. M.B. Chandak.

  2. Contents: • Stream Data Classification. • Novel Class Detection. • Data Generation. • Training Classifiers. • Steps Involved. • Applications. • Conclusion. • Future Scope.

  3. Stream Data Classification: • Stream data: a sequence of data items or packets that arrives continuously. • Managing online transactions requires classifying the data as it arrives. • The space and time required must be minimized. • The data is dynamic in nature.

  4. Example: • Intrusion detection: - Data arriving on a network may contain attacks, viruses, worms, etc., so the traffic must be classified and the cause of each arrival identified. Stream data classification can be used here.

  5. Characteristics of Stream Data: • Infinite length: - Data arrives fast and continuously. - Storing all of it is impractical. - Incremental learning is required. • Concept drift: - The underlying concept of the stream changes over time. - The classifier must be updated. - Classifiers must adapt to the changes.

  6. Characteristics of Stream Data (continued): • Concept evolution: - New classes evolve in the data. - Example: during intrusion detection on a network, a new type of attack emerges. • Feature evolution: - New features evolve. - Example: text streams on Twitter. • Labelling of data: - A difficult process. - Data arrives at high speed.

  7. Novel Class Detection: • Novel class: - Let M be the current ensemble of classification models. A class c is an existing class if at least one of the models Mi in M has been trained with class c; otherwise, c is a novel class. • Either a single model or an ensemble of models can be used.
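
The definition on this slide can be written as a one-line membership test. The sketch below is only an illustration: representing each model Mi by the set of class labels it was trained on, and the example labels themselves, are assumptions made here and do not come from the slides.

```python
from typing import Hashable, Iterable, Set

def is_existing_class(ensemble_classes: Iterable[Set[Hashable]], c: Hashable) -> bool:
    """True if at least one model in the ensemble was trained with class c."""
    return any(c in trained for trained in ensemble_classes)

# Hypothetical ensemble M: each entry is the set of classes one model Mi saw in training.
M = [{"normal", "dos"}, {"normal", "probe"}, {"normal"}]

print(is_existing_class(M, "probe"))  # True: one model was trained with it
print(is_existing_class(M, "worm"))   # False: no model has seen it, so it is novel
```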

  8. Data Generation: • The stream is divided into chunks of data. • Recent chunks are classified. • The classified instances are then labelled. • The labelled data is ready for training.
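
Dividing a stream into chunks can be sketched with a small generator. This is a minimal illustration that assumes fixed-size chunks; the name `chunk_stream` and the parameter `chunk_size` are hypothetical, and the slides do not say how the chunk size is chosen.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunk_stream(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most chunk_size items from a stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:          # the stream is exhausted
            return
        yield chunk

# A finite range stands in for the stream; a real stream may never end.
chunks = list(chunk_stream(range(7), chunk_size=3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the generator pulls items lazily, only one chunk is held in memory at a time, which fits the point made earlier that storing the whole stream is impractical.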

  9. Training a Classifier: • K clusters are built from the training data. • A summary of each cluster is saved; these summaries are also known as pseudopoints. • Each summary contains: - the centroid of the cluster. - the radius of the cluster. - the frequency of data points belonging to each class.
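
A cluster summary of this kind might look as follows. The sketch assumes the clustering step (for example K-means) has already assigned points to clusters, and it takes the radius to be the distance from the centroid to the farthest member; the names `Pseudopoint` and `summarize_cluster` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Hashable, List, Sequence

@dataclass
class Pseudopoint:
    centroid: List[float]
    radius: float                    # distance from the centroid to the farthest member
    class_freq: Dict[Hashable, int]  # number of member points per class label

def summarize_cluster(points: Sequence[Sequence[float]],
                      labels: Sequence[Hashable]) -> Pseudopoint:
    """Reduce one cluster of labelled points to its summary."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    radius = max(math.dist(p, centroid) for p in points)
    return Pseudopoint(centroid, radius, dict(Counter(labels)))

pp = summarize_cluster([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)], ["a", "a", "b"])
print(pp.centroid, pp.radius, pp.class_freq)
```

Once the summary is stored, the raw points of the cluster can be discarded, which keeps the memory footprint of a model proportional to K rather than to the chunk size.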

  10. Properties of Ensemble ‘M’: • Classification of a test instance Xj by Mi: - Let h ∈ Mi be the pseudopoint whose centroid is closest to Xj; the predicted class is the one with the highest frequency in h. - The final class of the point is decided by a vote of all models. • Decision boundary of Mi: - equal to the union of the feature spaces encompassed by its pseudopoints. • Decision boundary of M: - equal to the union of the decision boundaries of all Mi in M.
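
The classification rule and the decision boundary described on this slide can be sketched as below. Each pseudopoint is represented here as a plain `(centroid, radius, class_freq)` tuple, which is an assumption of this example, as are the function names and the toy ensemble.

```python
import math
from collections import Counter

def predict_one(model, x):
    """Majority class of the pseudopoint whose centroid is nearest to x."""
    _centroid, _radius, class_freq = min(model, key=lambda h: math.dist(h[0], x))
    return max(class_freq, key=class_freq.get)

def predict_ensemble(ensemble, x):
    """Each model votes once; the most frequent vote wins."""
    votes = Counter(predict_one(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]

def inside_boundary(ensemble, x):
    """True if x lies within the radius of any pseudopoint of any model."""
    return any(math.dist(c, x) <= r for model in ensemble for c, r, _ in model)

# Hypothetical ensemble of three models with two pseudopoints each.
ensemble = [
    [((0.0, 0.0), 1.0, {"a": 5, "b": 1}), ((5.0, 5.0), 1.0, {"b": 4})],
    [((0.0, 1.0), 1.0, {"a": 3}),         ((6.0, 5.0), 1.5, {"b": 6})],
    [((1.0, 0.0), 1.0, {"a": 1, "b": 2}), ((5.0, 6.0), 1.0, {"b": 3})],
]

print(predict_ensemble(ensemble, (0.2, 0.1)))    # "a": two of the three models vote for it
print(inside_boundary(ensemble, (10.0, 10.0)))   # False: outside every pseudopoint
```

Tie-breaking between equally voted classes is left to `Counter.most_common`, which is a simplification; the slides do not say how ties are resolved.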

  11. Feature Selection: • Lossy fixed: - The same feature set is used for the whole stream. • Lossy local: - Each model or training chunk has its own feature set. • Lossless homogenizing: - Both the model and the incoming instance expand their feature sets. - The union of the two feature sets is used. - The best of the three techniques.
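
Lossless homogenizing can be illustrated with sparse feature dictionaries. In this sketch both the model's centroid and the incoming instance are projected onto the union of their feature sets; filling a missing feature with 0 is an assumption made here, and the function name and the example features are hypothetical.

```python
from typing import Dict, List, Tuple

def homogenize(centroid: Dict[str, float],
               instance: Dict[str, float]) -> Tuple[List[str], List[float], List[float]]:
    """Expand both sides to the union of their features, filling gaps with 0."""
    features = sorted(set(centroid) | set(instance))
    c_vec = [centroid.get(f, 0.0) for f in features]
    x_vec = [instance.get(f, 0.0) for f in features]
    return features, c_vec, x_vec

features, c_vec, x_vec = homogenize(
    {"goal": 0.5, "match": 1.0},   # features the model was trained on
    {"match": 2.0, "vote": 1.0},   # features of the incoming instance
)
print(features)  # ['goal', 'match', 'vote']
print(c_vec)     # [0.5, 1.0, 0.0]
print(x_vec)     # [0.0, 2.0, 1.0]
```

No feature from either side is dropped, which is what makes the conversion lossless; the two vectors can then be compared with an ordinary distance.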

  12. Steps Involved in Classification and Novel Class Detection: • Outlier Detection Using Adaptive Threshold. • Novel Class Detection. • Simultaneous Multiple Novel Class Detection.

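  13. Outlier Detection Using Adaptive Threshold: • Check whether the instance is an outlier: - It may be an F_outlier (filtered outlier, outside the decision boundary of every model) or an ordinary outlier. • An adaptive threshold is used. • This lowers the false alarm rate by reacting to: - marginal false-novel instances. - marginal false-existing instances.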
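
One plausible reading of this step is sketched below. It is not taken from the slides: the weight `exp(radius - distance)`, the starting threshold of 0.7, and the `step` and `margin` values are all assumptions chosen for illustration, and each pseudopoint is reduced to a `(centroid, radius)` pair.

```python
import math

def model_weight(model, x):
    """Largest exp(radius - distance) over the model's pseudopoints.
    The value is 1 on a cluster surface, above 1 inside it, below 1 outside."""
    return max(math.exp(r - math.dist(c, x)) for c, r in model)

def is_f_outlier(ensemble, x, threshold):
    """x is an F_outlier only if every model sees it as an outlier."""
    return all(model_weight(m, x) < threshold for m in ensemble)

def adapt_threshold(threshold, weight, is_novel, step=0.01, margin=0.05):
    """Nudge the threshold once the true label of an instance is known."""
    flagged = weight < threshold
    if flagged and not is_novel and weight >= threshold - margin:
        return threshold - step   # marginal false-novel: be less strict
    if not flagged and is_novel and weight < threshold + margin:
        return threshold + step   # marginal false-existing: be more strict
    return threshold

# Hypothetical ensemble of two single-pseudopoint models.
ensemble = [[((0.0, 0.0), 1.0)], [((0.5, 0.0), 1.0)]]
print(is_f_outlier(ensemble, (3.0, 0.0), threshold=0.7))  # True: far from both models
print(is_f_outlier(ensemble, (1.2, 0.0), threshold=0.7))  # False: just outside, within slack
```

A threshold below 1 leaves some slack around each cluster, so instances that fall slightly outside a boundary are not flagged immediately; adjusting it only on marginal mistakes keeps the threshold from swinging on clear-cut cases.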

  14. Novel Class Detection: • F_outliers occur for three reasons: - noise, concept drift, or concept evolution. • The goal is to identify the F_outliers caused by concept evolution. • For this we need to calculate: - the distance between an outlier and the pseudopoints of the existing classes. - the cohesion among the outliers in the buffer.
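
The two quantities named on this slide can be combined into a single silhouette-style score. The sketch below is an assumption about how that might be done, not the exact measure used in the talk: `q` (the number of nearest neighbours), the function names, and the sample points are hypothetical, and existing classes are represented by their pseudopoint centroids.

```python
import math

def mean_dist_to_nearest(x, points, q):
    """Mean distance from x to its q nearest neighbours among points."""
    nearest = sorted(math.dist(x, p) for p in points)[:q]
    return sum(nearest) / len(nearest)

def cohesion_separation_score(x, other_outliers, existing_by_class, q):
    """Score in [-1, 1]: positive when x is closer to the other outliers
    than to the nearest existing class."""
    d_out = mean_dist_to_nearest(x, other_outliers, q)       # cohesion with outliers
    d_min = min(mean_dist_to_nearest(x, centroids, q)        # nearest existing class
                for centroids in existing_by_class.values())
    return (d_min - d_out) / max(d_min, d_out)

existing = {"a": [(0.0, 0.0), (1.0, 0.0)], "b": [(0.0, 5.0), (1.0, 5.0)]}
buffer = [(10.0, 11.0), (11.0, 10.0), (10.0, 9.0)]

score_far = cohesion_separation_score((10.0, 10.0), buffer, existing, q=2)
score_near = cohesion_separation_score((0.5, 0.0), buffer, existing, q=2)
print(score_far > 0, score_near < 0)  # True True
```

An outlier with a clearly positive score sits in a tight group away from every existing class, which points to concept evolution rather than noise or drift.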

  15. Simultaneous Multiple Novel Class Detection: • Multiple novel classes may occur simultaneously. • Principle: - Cohesion between instances of the same class should be high. - Distance between instances of different classes should be large. • Graphs are used. • Two phases: 1. Separation phase. 2. Merging phase.
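
The two phases can be sketched on a simple distance graph. This is an illustration under stated assumptions, not the procedure from the talk: linking points closer than `link_dist` and merging groups whose centroids are closer than `merge_dist` are criteria chosen here for brevity, and both parameters are hypothetical.

```python
import math
from itertools import combinations

def separate(points, link_dist):
    """Separation phase: connected components of the graph that links
    every pair of points closer than link_dist (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < link_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())

def centroid(group):
    return [sum(coords) / len(group) for coords in zip(*group)]

def merge(groups, merge_dist):
    """Merging phase: fuse groups whose centroids are closer than merge_dist."""
    groups = [list(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(groups)), 2):
            if math.dist(centroid(groups[a]), centroid(groups[b])) < merge_dist:
                groups[a].extend(groups.pop(b))
                merged = True
                break                       # indices changed, restart the scan
    return groups

outliers = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (20.0, 20.0), (20.0, 21.0)]
separated = separate(outliers, link_dist=1.5)
novel_classes = merge(separated, merge_dist=4.0)
print(len(separated), len(novel_classes))  # 3 2
```

Separating first and merging afterwards lets the graph over-split cautiously and then repair the splits, so two genuinely distant groups are never joined by a single stray link.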

  16. Applications: • Network security. • Social media. • Credit card fraud detection, etc.

  17. Problem Definition: • To classify feature-based stream data and detect novel classes in it more efficiently, using a suitable tool.

  18. Conclusion: • Most algorithms for classification and novel class detection either do not handle feature evolution or suffer from a high false alarm rate. • The methodology adapts well to gradual concept drift, but it takes time to handle abrupt drift. • Multiple novel classes are detected and separated efficiently.

  19. Future Scope: • Work can be done on making the cluster size dynamic and adaptive. • Work can be done on handling abrupt drift efficiently. • If an existing class splits into two, work can be done on judging whether the two parts share the same feature space and whether they are novel.

  20. References: • M.M. Masud, J. Gao, L. Khan, J. Han, and B.M. Thuraisingham, “Classification and Novel Class Detection in Feature Based Stream Data,” IEEE Trans. Knowledge and Data Eng., vol. 25, no. 7, July 2013. • M.M. Masud, J. Gao, L. Khan, J. Han, and B.M. Thuraisingham, “Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints,” IEEE Trans. Knowledge and Data Eng., vol. 23, no. 6, pp. 859-874, June 2011. • M.M. Masud, Q. Chen, L. Khan, C. Aggarwal, J. Gao, J. Han, and B.M. Thuraisingham, “Addressing Concept-Evolution in Concept-Drifting Data Streams,” Proc. IEEE Int’l Conf. Data Mining (ICDM), pp. 929-934, 2010.

  21. References: • Manish Rai and Rekha Pandit, “A Review of Classification and Novel Class Detection Technique of Data Streams.” • M.M. Masud, J. Gao, L. Khan, J. Han, and B.M. Thuraisingham, “Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams,” IEEE Trans. Knowledge and Data Eng., vol. 25, no. 7, July 2009. • M.M. Masud, J. Gao, L. Khan, J. Han, and B.M. Thuraisingham, “Classification and Novel Class Detection in Data Streams with Active Mining.”

  22. THANK YOU
