Learning under concept drift: an overview Zhimin He iTechs – ISCAS 2013-03-21
Agenda • What Is Concept Drift • Causes of Concept Drift • Types of Concept Drift • Detecting and Handling Concept Drift • Implications for Software Engineering Research
Definitions • Prediction • $X_t \in \mathbb{R}^p$ is a vector in p-dimensional feature space observed at time $t$, and $y_t$ is the corresponding label. • We call $X_t$ an instance and a pair $(X_t, y_t)$ a labeled instance. We refer to the instances $(X_1, \dots, X_t)$ as historical data and to the instance $X_{t+1}$ as the target (or testing) instance. • The task is to predict a label $y_{t+1}$ for the target instance $X_{t+1}$.
Definitions (cont.) • Concept Drift • Every instance $X_t$ is generated by a source $S_t$. • If all the data is sampled from the same source, i.e. $S_1 = S_2 = \dots = S_{t+1} = S$, we say that the concept is stable. • If $S_i \neq S_j$ for some two time points $i$ and $j$, we say that there is a concept drift.
Causes of Concept Drift • Let $X \in \mathbb{R}^p$ be an instance in p-dimensional feature space, with $X \in c_i$, where $c_1, c_2, \dots, c_k$ is the set of class labels. • The optimal classifier for assigning $X$ to a class $c_i$ is determined by the prior probabilities of the classes $P(c_i)$ and the class-conditional probability density functions $p(X \mid c_i)$, $i = 1, \dots, k$. • Concept / data source: • a set of prior probabilities of the classes and class-conditional pdfs: $S = \{(P(c_1), p(X \mid c_1)), \dots, (P(c_k), p(X \mid c_k))\}$
Causes of Concept Drift (cont.) • Concept drift may occur in three ways: • The class priors $P(c_i)$ might change over time. • The distributions of one or several classes $p(X \mid c_i)$ might change (virtual drift). • The posterior distributions of the class memberships $p(c_i \mid X)$ might change (real drift).
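The three quantities on this slide are tied together by Bayes' rule: the posterior follows from the priors and the class-conditional densities, so a change in either can move it. The sketch below is only an illustration under stated assumptions: one-dimensional Gaussian class-conditionals, two classes, and made-up numbers. The names `gaussian_pdf`, `posterior`, `s_old`, `s_prior_shift` and `s_cond_shift` are hypothetical and do not come from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(x, source):
    """Return P(c_i | x) for every class, computed with Bayes' rule.

    `source` is a list of (prior, mean, std) triples, one per class.
    It plays the role of a concept: priors plus class-conditional pdfs.
    """
    joint = [prior * gaussian_pdf(x, mean, std) for prior, mean, std in source]
    evidence = sum(joint)  # p(x), the normalising constant
    return [j / evidence for j in joint]

# Hypothetical two-class sources; every number here is an assumption.
s_old = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]
s_prior_shift = [(0.9, 0.0, 1.0), (0.1, 2.0, 1.0)]  # only the priors changed
s_cond_shift = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]   # only p(X | c_2) changed

if __name__ == "__main__":
    x = 1.0
    for name, s in [("old", s_old),
                    ("prior shift", s_prior_shift),
                    ("conditional shift", s_cond_shift)]:
        print(name, [round(p, 3) for p in posterior(x, s)])
```

Comparing the posteriors at the same point `x` under the old and the changed source gives a rough feel for how much a classifier trained on the old source would be affected.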
Types of Concept Drift • Types: • Sudden drift • Gradual drift • Incremental drift • Reoccurring contexts
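One way to picture the four types is to describe, for each time step, which source generates the instance. The sketch below is a simplified illustration, not a definition from the slides. It assumes each source is a 1-D Gaussian that differs only in its mean, and it reads the types as follows: sudden means one switch point, gradual means both sources stay active while the new one is sampled more and more often, incremental means the source itself moves slowly, and reoccurring means the old source returns. The function names and the constants `OLD_MEAN` and `NEW_MEAN` are hypothetical.

```python
import random

OLD_MEAN, NEW_MEAN = 0.0, 5.0  # assumed means of the old and new source

def active_mean(t, n, kind, rng):
    """Mean of the source that generates the instance at step t (0 <= t < n)."""
    if kind == "sudden":
        # one switch point in the middle of the stream
        return OLD_MEAN if t < n // 2 else NEW_MEAN
    if kind == "gradual":
        # both sources are active; the new one is picked with rising probability
        p_new = t / (n - 1)
        return NEW_MEAN if rng.random() < p_new else OLD_MEAN
    if kind == "incremental":
        # the source itself slides from the old mean to the new mean
        return OLD_MEAN + (NEW_MEAN - OLD_MEAN) * t / (n - 1)
    if kind == "reoccurring":
        # old concept, then new concept, then the old concept again
        third = n // 3
        return NEW_MEAN if third <= t < 2 * third else OLD_MEAN
    raise ValueError(f"unknown drift type: {kind}")

def sample_stream(n, kind, seed=0):
    """Draw n instances (n >= 2) from a stream with the given drift type."""
    rng = random.Random(seed)
    return [rng.gauss(active_mean(t, n, kind, rng), 1.0) for t in range(n)]

if __name__ == "__main__":
    for kind in ("sudden", "gradual", "incremental", "reoccurring"):
        stream = sample_stream(300, kind)
        print(kind, round(sum(stream[:50]) / 50, 2), round(sum(stream[-50:]) / 50, 2))
```

Streams like these can be useful for checking how a detector or a windowing scheme reacts to each kind of change before trying it on real data.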
Detecting and Handling Concept Drift • Detecting • Monitoring the raw data • Monitoring parameters of learners • Monitoring prediction errors of learners • Handling • Ensemble learning • Instance selection • Instance weights • Training windows • Training windows are naturally suitable for sudden concept drift, while ensembles are more flexible in terms of change type.
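As a rough illustration of "monitoring prediction errors" combined with "training windows", the sketch below compares the error rate over the most recent predictions with the error rate since the last reset, and drops the old training instances when the gap exceeds a margin. This is a simplified heuristic written for this example, not a published detector and not the method from the slides. The class `WindowErrorMonitor`, the function `run`, the majority-label learner, and the `window` and `margin` values are all assumptions.

```python
from collections import deque

class WindowErrorMonitor:
    """Signal drift when the recent error rate exceeds the long-run rate by a margin."""

    def __init__(self, window=30, margin=0.3):
        self.recent = deque(maxlen=window)  # outcomes of the latest predictions
        self.margin = margin
        self.errors = 0                     # errors since the last reset
        self.count = 0                      # predictions since the last reset

    def update(self, is_error):
        """Record one prediction outcome; return True if drift is signalled."""
        self.recent.append(1 if is_error else 0)
        self.errors += 1 if is_error else 0
        self.count += 1
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough evidence yet
        recent_rate = sum(self.recent) / len(self.recent)
        overall_rate = self.errors / self.count
        return recent_rate - overall_rate > self.margin

    def reset(self):
        self.recent.clear()
        self.errors = 0
        self.count = 0

def run(labels, window=30, margin=0.3):
    """Predict each label, then train on it; return the steps where drift was flagged."""
    monitor = WindowErrorMonitor(window, margin)
    train = []        # current training window of past labels
    detections = []
    for t, y in enumerate(labels):
        # toy learner: predict the majority label of the training window (0 if empty or tied)
        pred = 1 if sum(train) * 2 > len(train) else 0
        if monitor.update(pred != y):
            detections.append(t)
            train = []                      # handle the drift: forget old instances
            monitor.reset()
        train.append(y)
    return detections

if __name__ == "__main__":
    stream = [0] * 200 + [1] * 200          # sudden drift at step 200
    print(run(stream))
```

The reset on detection is what makes this a training-window scheme: after a flagged change the learner is rebuilt only from instances seen since the alarm. An ensemble-based handler would instead keep several learners and reweight them, which this sketch does not attempt.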
Detecting and Handling Concept Drift (cont.) • Overall solution for learning under concept drift
Implications for SE Research • Concept drift is a fundamental issue for SE predictions • Cost estimation, defect prediction… • Especially in the cross-company/cross-project context • It can harm the performance of prediction models • Detecting and handling concept drift is a challenging task! • Quality problems of SE data, e.g., insufficient data • The data generation context is highly unstable. • It has become an increasingly popular research topic in the SE field! • E.g., Burak Turhan [JESE 2012], Jayalath Ekanayake [MSR 2009, JESE 2011]