A Technical Seminar on Classification and Novel Class Detection of Feature Based Stream Data. Presentation by: Rimjhim Singh. Under the guidance of: Dr. M.B. Chandak. Shri Ramdeo Baba College of Engineering and Management. Date: 21st of May, 2014.
Contents: • Stream Data Classification. • Novel Class Detection. • Data Generation. • Training Classifiers. • Steps Involved. • Applications. • Conclusion. • Future Scope.
Stream Data Classification: • Stream data: a continuous sequence of data items or packets. • Managing online transactions requires classifying data as it arrives. • The space and time required must be minimized. • The data is dynamic in nature.
Example: • Intrusion detection: - Data arriving on a network may contain attacks, viruses, worms, etc. We need to classify this traffic and identify the cause of its arrival, and stream data classification can be used for this.
Characteristics of Stream Data: • Infinite length: - Data arrives fast and continuously. - Storing all of it is impractical. - Incremental learning is required. • Concept drift: - The underlying concept of the stream changes over time. - The classifier must be updated to adapt to these changes.
• Concept evolution: - New classes evolve in the data. - Example: during network intrusion detection, a new type of attack emerges. • Feature evolution: - New features evolve. - Example: text streams on Twitter. • Labelling of data: - A difficult process. - Data arrives at very high speed.
Novel Class Detection: • Novel class: - Let M be the current ensemble of classification models. A class c is an existing class if at least one of the models Mi in M has been trained with class c. Otherwise, c is a novel class. • A single model or an ensemble of models can be used.
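The definition above can be written as a short check. This is only a sketch: each model is represented by nothing more than the set of class labels it was trained on, and the names `is_existing_class`, `is_novel_class` and the example labels are hypothetical, not taken from the seminar.

```python
from typing import Iterable, Set


def is_existing_class(label: str, ensemble: Iterable[Set[str]]) -> bool:
    """A class is existing if at least one model Mi in M was trained with it."""
    return any(label in trained_classes for trained_classes in ensemble)


def is_novel_class(label: str, ensemble: Iterable[Set[str]]) -> bool:
    """Otherwise the class is novel."""
    return not is_existing_class(label, ensemble)


# Hypothetical ensemble M: each entry is the label set seen by one model Mi.
ensemble = [{"normal", "dos"}, {"normal", "probe"}]
worm_is_novel = is_novel_class("worm", ensemble)
probe_is_novel = is_novel_class("probe", ensemble)
```

Note that "probe" counts as existing even though only one of the two models has seen it.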
Data Generation: • Chunks of data are created. • Recent chunks are classified. • Labelling is done. • Data is ready for training.
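A minimal sketch of the chunking step, assuming fixed-size chunks (the slides do not state how the chunk size is chosen). The helper name `chunked` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most chunk_size items from a stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return  # the stream is exhausted
        yield chunk


# A finite stand-in for the stream; a real stream would never end.
chunks = list(chunked(range(7), 3))
```

Because the helper is a generator, only one chunk is held in memory at a time, which fits the point that storing the whole stream is impractical.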
Training a Classifier: • K clusters are built. • Cluster summaries are saved; these are also known as pseudopoints. • Each summary contains: - the centroid of the cluster. - the radius of the cluster. - the frequency of data points of each class.
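The summary of one cluster could be computed roughly as follows. This sketch assumes the clustering has already been done, takes the radius to be the distance from the centroid to the farthest member, and stores frequencies per class label; these choices, the `Pseudopoint` type and the `summarize` helper are illustrative assumptions rather than the seminar's exact definitions.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Pseudopoint:
    centroid: Tuple[float, ...]
    radius: float
    class_freq: Dict[str, int]


def summarize(cluster: Sequence[Tuple[Sequence[float], str]]) -> Pseudopoint:
    """Reduce one cluster of (point, label) pairs to its summary."""
    points = [p for p, _ in cluster]
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    # Assumption: radius = distance to the farthest point in the cluster.
    radius = max(math.dist(p, centroid) for p in points)
    class_freq = dict(Counter(label for _, label in cluster))
    return Pseudopoint(centroid, radius, class_freq)


cluster = [((0.0, 0.0), "normal"), ((2.0, 0.0), "normal"), ((1.0, 3.0), "dos")]
h = summarize(cluster)
```

Once the summary is saved, the raw points of the cluster can be discarded, which keeps the memory needed per model small.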
Properties of Ensemble 'M': • Classification of test instance Xj by Mi: - Let h ∈ Mi be the pseudopoint whose centroid is closest to Xj; the predicted class is the one with the highest frequency in h. - The point is classified by the vote of all models. • Decision boundary of Mi: - The union of the feature spaces encompassed by its pseudopoints. • Decision boundary of M: - The union of the decision boundaries of all Mi in M.
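The classification rule can be sketched as below. Here a model is assumed to be a plain list of `(centroid, class_freq)` pairs and ties in the vote are not treated specially; `predict_model`, `predict_ensemble` and the sample data are hypothetical.

```python
import math
from collections import Counter


def predict_model(model, x):
    """Predict with one model Mi: take the pseudopoint whose centroid is
    closest to x and return its most frequent class."""
    _, class_freq = min(model, key=lambda h: math.dist(h[0], x))
    return max(class_freq, key=class_freq.get)


def predict_ensemble(ensemble, x):
    """Predict with the ensemble M by majority vote over its models."""
    votes = Counter(predict_model(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical ensemble of three models, two pseudopoints each.
m1 = [((0.0, 0.0), {"normal": 9, "dos": 1}), ((5.0, 5.0), {"dos": 8})]
m2 = [((0.5, 0.5), {"normal": 7}), ((6.0, 5.0), {"dos": 6, "normal": 2})]
m3 = [((4.0, 4.0), {"dos": 5}), ((9.0, 9.0), {"normal": 4})]
ensemble = [m1, m2, m3]

label_a = predict_ensemble(ensemble, (4.5, 4.5))
label_b = predict_ensemble(ensemble, (1.0, 1.0))
```

For the second point, `m3` votes "dos" because its nearest centroid is (4.0, 4.0), but the other two models outvote it.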
Feature Selection: • Lossy Fixed: - The same feature set is used throughout. • Lossy Local: - Each model or training chunk has its own feature set. • Lossless Homogenizing: - Both the model and the incoming instance expand their feature sets. - The union of the two feature sets is used. - This is the best technique of the three.
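A small sketch of the lossless homogenizing idea: both sides are mapped onto the union of their feature sets. Filling a missing feature with 0.0 is an assumption made here for illustration, as are the `homogenize` helper and the sample feature names.

```python
from typing import Dict, List, Tuple


def homogenize(
    model_features: Dict[str, float], instance: Dict[str, float]
) -> Tuple[List[str], List[float], List[float]]:
    """Expand a model vector and an instance to the union of their features."""
    union = sorted(set(model_features) | set(instance))
    # Assumption: a feature missing on one side is treated as 0.0 there.
    model_vec = [model_features.get(f, 0.0) for f in union]
    instance_vec = [instance.get(f, 0.0) for f in union]
    return union, model_vec, instance_vec


centroid = {"free": 0.4, "offer": 0.6}   # features known to the model
tweet = {"offer": 1.0, "lol": 2.0}       # instance with a new feature "lol"
features, centroid_vec, tweet_vec = homogenize(centroid, tweet)
```

Neither side loses a feature, so distances computed on the expanded vectors take the newly evolved feature into account.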
Steps Involved in Classification and Novel Class Detection: • Outlier Detection Using Adaptive Threshold. • Novel Class Detection. • Simultaneous Multiple Novel Class Detection.
Outlier Detection Using Adaptive Threshold: • Check whether the instance is an outlier. - F_outlier or Outlier. • An adaptive threshold is used. • This gives a lower false alarm rate: - Marginal false-novel instance. - Marginal false-existing instance.
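One way to picture the outlier check is sketched below. It assumes an F_outlier is an instance that falls outside the decision boundary of the whole ensemble, that is, outside every pseudopoint of every model. The `slack` multiplier is a hypothetical stand-in for the adaptive threshold; the slides do not say how the threshold is defined or updated.

```python
import math


def outside_pseudopoint(x, centroid, radius, slack=1.0):
    """True if x lies outside the (slack-scaled) sphere of one pseudopoint."""
    return math.dist(x, centroid) > radius * slack


def is_f_outlier(x, ensemble, slack=1.0):
    """True if x lies outside every pseudopoint of every model in M."""
    return all(
        outside_pseudopoint(x, centroid, radius, slack)
        for model in ensemble
        for centroid, radius in model
    )


# Hypothetical ensemble: two models with one (centroid, radius) pair each.
ensemble = [[((0.0, 0.0), 1.0)], [((3.0, 0.0), 1.0)]]

marginal = (1.2, 0.0)  # just outside the first pseudopoint
strict = is_f_outlier(marginal, ensemble, slack=1.0)
relaxed = is_f_outlier(marginal, ensemble, slack=1.25)
far_away = is_f_outlier((10.0, 10.0), ensemble, slack=1.25)
```

In this sketch, loosening the threshold turns the marginal instance back into an existing-class instance, while an instance far from every pseudopoint stays an outlier.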
Novel Class Detection: • F_outliers occur for three reasons: - Noise, concept drift or concept evolution. • Identify the F_outliers that occur due to concept evolution. • For this we need to calculate: - The distance between an outlier and the existing class pseudopoints. - The cohesion among the outliers in the buffer.
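The two quantities can be combined into a single score in several ways. The sketch below uses a silhouette-style ratio over the q nearest neighbours, which is an assumption chosen for illustration; the slides only state that distance and cohesion are calculated. The names `mean_dist_to_nearest` and `novelty_score` and the parameter `q` are hypothetical.

```python
import math


def mean_dist_to_nearest(x, others, q):
    """Mean distance from x to its q nearest points in others."""
    nearest = sorted(math.dist(x, o) for o in others)[:q]
    return sum(nearest) / len(nearest)


def novelty_score(x, other_outliers, existing_centroids, q):
    """Score in [-1, 1]; positive means x is closer to the other outliers
    (high cohesion) than to the existing class pseudopoints."""
    a = mean_dist_to_nearest(x, other_outliers, q)      # cohesion term
    b = mean_dist_to_nearest(x, existing_centroids, q)  # separation term
    if max(a, b) == 0.0:
        return 0.0
    return (b - a) / max(a, b)


existing = [(0.0, 0.0), (1.0, 1.0)]

# An outlier surrounded by other outliers, far from existing classes.
buffer_a = [(10.0, 11.0), (11.0, 10.0), (10.5, 10.5)]
score_novel = novelty_score((10.0, 10.0), buffer_a, existing, q=2)

# An outlier close to an existing class and far from the other outliers.
buffer_b = [(10.0, 10.0), (11.0, 10.0)]
score_drift = novelty_score((1.0, 1.5), buffer_b, existing, q=2)
```

A cluster of outliers with consistently positive scores is a candidate for concept evolution, while isolated or negative-scoring outliers are more likely noise or drift.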
Simultaneous Multi Class Detection: • Multiple novel classes may occur simultaneously. • Principle: - Cohesion between instances of the same class should be high. - Distance between instances of different classes should be large. • Graphs are used. • Two phases: 1. Separation phase. 2. Merging phase.
Applications: • Network security. • Social media. • Credit card fraud detection, etc.
Problem Definition: • To classify feature based stream data and detect novel classes in it more efficiently, using a suitable tool.
Conclusion: • The majority of algorithms used for “Classification and Detection of Novel Classes” suffer from either feature evolution or a high false alarm rate. • The methodology adapts properly to normal concept drift, but takes time to handle abrupt drift. • Multiple novel classes are detected and separated efficiently.
Future Scope: • The cluster size can be made dynamic and adaptive. • Abrupt drift can be handled more efficiently. • If an existing class is divided into two, work can be done on judging whether the resulting classes share the same feature space and whether they are novel.