Dynamic Integration of Virtual Predictors Vagan Terziyan University of Jyvaskyla, Finland e-mail: vagan@it.jyu.fi http://www.cs.jyu.fi/ai/vagan/index.html
Discovering Knowledge from Data - one of the basic abilities of an intelligent agent (Data → Knowledge)
Basic Reference Terziyan V., Dynamic Integration of Virtual Predictors, In: L.I. Kuncheva, F. Steimann, C. Haefke, M. Aladjem, V. Novak (Eds), Proceedings of the International ICSC Congress on Computational Intelligence: Methods and Applications - CIMA'2001, Bangor, Wales, UK, June 19-22, 2001, ICSC Academic Press, Canada/The Netherlands, pp. 463-469.
Acknowledgements • Academy of Finland project (1999): Dynamic Integration of Classification Algorithms • Information Technology Research Institute (University of Jyvaskyla): customer-oriented research and development in Information Technology, http://www.titu.jyu.fi/eindex.html • Multimeetmobile (MMM) project (2000-2001): Location-Based Service System and Transaction Management in Mobile Electronic Commerce, http://www.cs.jyu.fi/~mmm
Contents • The problem • Virtual Predictor • Classification Team • Team Direction • Dynamic Selection of Classification Team • Implementation for Mobile e-Commerce • Conclusion
The Problem: Knowledge Discovery • Knowledge discovery in databases (KDD) is a combination of data warehousing, decision support, and data mining, and an innovative approach to information management. • KDD is an emerging area that considers the process of finding previously unknown and potentially interesting patterns and relations in large databases*. • * Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
Classification Problem (m classes, n training observations, p object features) Given: n training pairs (xi, yi), with xi ∈ R^p and yi ∈ {1, …, m} denoting class membership. Goal: given a new instance x0, select a classifier for x0 and predict its class y0.
The Research Problem During the past several years, in a variety of application domains, researchers in machine learning, computational learning theory, pattern recognition and statistics have tried to combine efforts to learn how to create and combine an ensemble of classifiers. The primary goal of combining several classifiers is to obtain a more accurate prediction than can be obtained from any single classifier alone.
Approaches to Integrate Multiple Classifiers • Combination: Global (voting-type); Local (decontextualization, the "Virtual" Classifier) • Selection: Global (static); Local (dynamic)
Inductive learning with integration of predictors: the learning environment supplies sample instances to the predictors/classifiers P1, P2, …, Pn, whose outputs are integrated into the prediction yt.
Virtual Classifier A Virtual Classifier is a group of seven cooperative agents: • TC - Team Collector • TM - Training Manager • TP - Team Predictor • TI - Team Integrator • FS - Feature Selector • DE - Distance Evaluator • CL - Classification Processor
Classification Team: Feature Selector (FS)
Feature Selection • Feature selection methods try to pick a subset of features that are relevant to the target concept; • Each of these methods has its strengths and weaknesses based on data types and domain characteristics; • The choice of a feature selection method depends on various data set characteristics: (i) data types, (ii) data size, and (iii) noise.
Classification of feature selection methods [Dash and Liu, 1997]
Feature Selector - finds the minimally sized feature subset that is sufficient for correct classification of the instance.
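A minimal sketch of one way such a feature selector could work (an assumption, not the paper's concrete method): a wrapper-style greedy forward selection that keeps adding the feature which most improves leave-one-out 1-NN accuracy on the sample instances, stopping when no feature helps, so the subset stays small.

```python
# Hypothetical wrapper-style feature selector: greedy forward selection
# driven by leave-one-out 1-NN accuracy on the sample instances.

def loo_accuracy(data, feats):
    """Leave-one-out 1-NN accuracy using only the features in `feats`."""
    correct = 0
    for i, (x, y) in enumerate(data):
        best_d, best_y = float("inf"), None
        for j, (x2, y2) in enumerate(data):
            if i == j:
                continue
            d = sum((x[f] - x2[f]) ** 2 for f in feats)
            if d < best_d:
                best_d, best_y = d, y2
        correct += best_y == y
    return correct / len(data)

def select_features(data, n_features):
    """Greedily grow a small feature subset sufficient for classification."""
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved, best_f = False, None
        for f in range(n_features):
            if f in selected:
                continue
            acc = loo_accuracy(data, selected + [f])
            if acc > best_acc:
                best_acc, best_f, improved = acc, f, True
        if improved:
            selected.append(best_f)
    return selected
```

On a toy sample set where only the first feature separates the classes, the selector returns just that feature, illustrating the "minimally sized sufficient subset" goal.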
Classification Team: Distance Evaluator (DE)
Use of distance evaluation • Distance between instances is useful to identify the nearest neighbours of an instance being classified; • Distance between classes is useful to define the misclassification error; • Distance between classifiers is useful to evaluate the weight of every classifier for their further integration.
Distance between Two Instances with Heterogeneous Attributes (e.g. Profiles) For a nominal attribute: d(a, b) = 0 if a = b, and 1 otherwise; for a numeric attribute: d(a, b) = |a - b| / (max - min), where [min, max] is the attribute's value range. For example: d("red", "yellow") = 1; d(15°, 25°) = 10°/((+50°)-(-50°)) = 0.1
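The two slide examples (nominal colours, temperatures in the range -50°…+50°) can be reproduced by a small heterogeneous distance function. Sketch under stated assumptions: nominal attributes use the overlap metric (0 if equal, 1 otherwise), numeric attributes use the range-normalised absolute difference, and per-attribute distances are simply summed.

```python
# Sketch of a heterogeneous (mixed nominal/numeric) instance distance.
# Assumption: per-attribute distances are summed; other combinations
# (e.g. a Euclidean-style root of squares) would work the same way.

def heterogeneous_distance(a, b, ranges):
    """`ranges[k]` is (min, max) for numeric attribute k, or None if nominal."""
    total = 0.0
    for va, vb, r in zip(a, b, ranges):
        if r is None:                       # nominal: overlap metric
            total += 0.0 if va == vb else 1.0
        else:                               # numeric: range-normalised difference
            lo, hi = r
            total += abs(va - vb) / (hi - lo)
    return total
```

This reproduces the slide's numbers: the colour pair contributes 1, and the temperature pair contributes 10/100 = 0.1.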
Distance Evaluator - measures the distance between instances based on their numerical or nominal attribute values.
Classification Team: Classification Processor (CL)
Classification Processor - predicts the class of a new instance based on its selected features and its location relative to the sample instances, cooperating with the Feature Selector and the Distance Evaluator.
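How the three team members cooperate can be sketched as follows. The concrete classification rule here, nearest-neighbour, is an assumption for illustration; the design point is that the processor only consumes the Feature Selector's subset and the Distance Evaluator's function, so either can be swapped.

```python
# Hypothetical classification processor: a 1-NN rule parameterised by the
# team's feature subset (`feats`) and distance function (`distance`).

def classify(sample, new_x, feats, distance):
    """Predict the class of `new_x` as the class of the nearest sample
    instance, comparing only the selected features."""
    def d(x):
        return distance([x[f] for f in feats], [new_x[f] for f in feats])
    nearest_x, nearest_y = min(sample, key=lambda pair: d(pair[0]))
    return nearest_y
```

A usage example: with Euclidean distance and feature subset [0], an instance near x=0 on the first feature is assigned the class of the sample instance closest to it there, regardless of the ignored second feature.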
Team Instructors: Team Collector (TC) - completes classification teams for training
Team Collector - completes classification teams (FSi, DEj, CLk) for future training by combining feature selection methods, distance evaluation functions, and classification rules.
Team Instructors: Training Manager (TM) - trains all completed teams on sample instances
Training Manager - trains all completed classification teams (FSi1, DEj1, CLk1), …, (FSin, DEjn, CLkn) on the sample instances and records the results as sample metadata.
Team Instructors: Team Predictor (TP) - predicts the weight of every classification team at a given location
Team Predictor - given a location and the sample metadata, predicts the weights of the classification teams, e.g. using a weighted nearest neighbour (WNN) algorithm.
Team Predictor - the weight of team j at a point Pi is interpolated from the weight vectors &lt;wk1, wk2, …, wkn&gt; stored in the sample metadata for the nearest neighbours NN1, NN2, NN3 at distances d1, d2, d3: wij = F(w1j, d1, w2j, d2, w3j, d3, dmax)
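The slide names the interpolation function F only abstractly. One plausible choice, sketched here as an assumption, is inverse-distance weighting over the k nearest sample locations recorded in the metadata: closer neighbours contribute more to the predicted team weights.

```python
# Hypothetical WNN-style team predictor. `metadata` holds pairs of
# (sample location, team-weight vector) recorded by the Training Manager.

def predict_team_weights(metadata, x, k=3):
    """Interpolate team weights at location `x` from the k nearest samples."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(metadata, key=lambda rec: dist(rec[0], x))[:k]
    n_teams = len(nearest[0][1])
    num, den = [0.0] * n_teams, 0.0
    for loc, weights in nearest:
        v = 1.0 / (dist(loc, x) + 1e-9)    # closer neighbours vote more
        den += v
        for j, wj in enumerate(weights):
            num[j] += v * wj
    return [nj / den for nj in num]
```

At a point midway between two metadata locations the predicted weights average out, while at a recorded location they reproduce that location's stored vector, which is the behaviour the locality assumption relies on.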
Team Prediction: Locality Assumption • Each team has certain subdomains in the space of instance attributes where it is more reliable than the others; • This assumption is supported by experience: classifiers usually work well not only at certain points of the domain space, but in certain subareas of it [Quinlan, 1993]; • If a team does not work well on the instances near a new instance, then it is quite probable that it will not work well on this new instance either.
Team Instructors: Team Integrator (TI) - produces the classification result for a new instance by integrating the appropriate outcomes of the learned teams
Team Integrator - each classification team (FSi, DEj, CLk) classifies the new instance, producing outcomes yt1, yt2, …, ytn; the integrator combines these outcomes, weighted by the teams' predicted weights at the location of the new instance, into the final result yt.
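One simple integration scheme matching this description (an assumption; the slide does not fix the scheme) is weighted voting: each team's outcome receives that team's predicted weight at the new instance's location, and the class with the largest total wins.

```python
# Hypothetical team integrator: weighted voting over team outcomes.

def integrate(outcomes, weights):
    """`outcomes[i]` is team i's predicted class yt_i; `weights[i]` is that
    team's predicted weight at the new instance's location."""
    totals = {}
    for y, w in zip(outcomes, weights):
        totals[y] = totals.get(y, 0.0) + w
    return max(totals, key=totals.get)
```

For example, two weak teams agreeing on one class can outvote a single stronger team that predicts another.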
Dynamic Selection of the Team: Penalty Kick Example - of the candidate teams x1, x2, x3, x4, the team selected for the kick depends on the current position.
Simple case: static or dynamic selection of a classification team from two Assume that we have two different classification teams, both learned on the same sample set of n instances. Let the first team correctly classify m1 sample instances and the second m2. We consider two possible ways to select the best team for further classification: static selection and dynamic selection.
Static Selection • Static selection means that we try all teams on the sample set and, for further classification, select the one that achieved the best classification accuracy on the whole sample set. Thus we select a team only once and then use it to classify all new domain instances.
Dynamic Selection • Dynamic selection means that a team is selected for every new instance separately, depending on where this instance is located. If it has been predicted that a certain team can classify this new instance better than the other teams, then that team is used. In this case we say that the new instance belongs to the “competence area” of that classification team.
Theorem • The average classification accuracy in the case of (dynamic) selection of a classification team for every instance is expected to be not worse than in the case of (static) selection of one team for the whole domain. • The accuracies of the two cases are equal if and only if k = min(m1, m2), where k is the number of instances correctly classified by both teams.
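The theorem can be checked numerically under the idealised assumption that dynamic selection picks a correct team whenever at least one of the two teams classifies the instance correctly. Then dynamic selection covers the m1 + m2 - k instances correct for at least one team, while static selection covers only max(m1, m2).

```python
# Numeric illustration of the theorem (idealised dynamic selection).

def static_accuracy(n, m1, m2):
    # static: pick the single team with the better overall sample accuracy
    return max(m1, m2) / n

def dynamic_accuracy(n, m1, m2, k):
    # m1 + m2 - k instances are correctly classified by at least one team
    # (k instances are correct for both, so they are counted once)
    return (m1 + m2 - k) / n

n, m1, m2 = 100, 70, 60
for k in range(0, min(m1, m2) + 1):
    assert dynamic_accuracy(n, m1, m2, k) >= static_accuracy(n, m1, m2)
# equality holds exactly when one team's correct set contains the other's
assert dynamic_accuracy(n, m1, m2, min(m1, m2)) == static_accuracy(n, m1, m2)
```

Since k can be at most min(m1, m2), dynamic accuracy is always at least the static one, with equality exactly when the weaker team's correct instances are a subset of the stronger team's.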
“Competence areas” of classification teams in dynamic selection: of the n sample instances, the first team correctly classifies m1, the second m2, and k instances are correctly classified by both.
M-Commerce LBS system http://www.cs.jyu.fi/~mmm In the framework of the Multimeetmobile (MMM) project at the University of Jyväskylä, an LBS pilot system, the MMM Location-based Service system (MLS), has been developed. MLS is a general LBS system for mobile users, offering maps and navigation across multiple geographically distributed services, together with access to location-based information through the map on the terminal's screen. MLS is based on Java and XML and uses dynamic selection of services for customers based on their profile and location.
Architecture of the LBS system: a personal trusted device connects over the mobile network to the location-based service, which draws on a positioning service, geographical/spatial data, and location-based data: (1) a services database (access history); (2) a customers database (profiles).
Positioning systems: cellular-network-based positioning; satellite-based positioning.
Sample from the location-based services' access history: each record pairs a mobile customer description with the ordered service.
Adaptive interface for the MLS client Only the services predicted for a customer with known profile and location are delivered from MLS and displayed on the mobile terminal screen as clickable “points of interest”.
Route-based personalization [Katasonov, 2001]: static and dynamic perspectives.
Conclusion • Knowledge discovery with an ensemble of classifiers is known to be more accurate than with any single classifier alone [e.g. Dietterich, 1997]. • If a classifier consists of a certain feature selection algorithm, a distance evaluation function, and a classification rule, then why not consider these parts as ensembles too, making the classifier itself more flexible? • We expect that classification teams assembled from different feature selection, distance evaluation, and classification methods will be more accurate than any ensemble of known classifiers alone, and we focus our research and implementation on this assumption.