Active Learning on Spatial Data
Christine Körner
Fraunhofer AIS, Uni Bonn
Outline
• Active Learning
• FAW-Project
• Spatial Data
• Experiment Outline
Active Learning
• Difficult / expensive to obtain labelled data
  • manual preparation of documents for text mining
  • analysis of drugs or molecules
• Active learning strategies actively select which data points to query in order to
  • minimize the number of training examples for a given classification quality
  • maximize the quality of results for a given number of data points
Selective Sampling
• Which instance to choose next? Where we
  • have no data?
  • perform poorly?
  • have a low confidence?
  • expect our model to change?
  • previously found data that improved quality?
[Figure: the selected instance is sent to the oracle ("Label?"); the labelled instance is added to the training set]
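The selection loop on this slide can be sketched as follows. This is a minimal illustration, not the method used in the project: it assumes one-dimensional instances, picks the first criterion from the list ("where we have no data") as the selection rule, and uses hypothetical names (`nearest_labelled_distance`, `selective_sampling`, `oracle`).

```python
def nearest_labelled_distance(x, labelled):
    """Distance from x to the closest instance that already has a label."""
    return min(abs(x - lx) for lx, _ in labelled)


def selective_sampling(pool, labelled, oracle, budget):
    """Repeatedly query the oracle for the pool instance farthest from any labelled one.

    pool     -- unlabelled instances (assumption: plain numbers)
    labelled -- non-empty list of (instance, label) pairs
    oracle   -- callable that returns the label of an instance
    budget   -- maximum number of oracle queries
    """
    pool = list(pool)
    labelled = list(labelled)
    for _ in range(min(budget, len(pool))):
        # "Where we have no data": choose the instance in the emptiest region.
        candidate = max(pool, key=lambda x: nearest_labelled_distance(x, labelled))
        pool.remove(candidate)
        # Ask the oracle for the label and add the pair to the training set.
        labelled.append((candidate, oracle(candidate)))
    return labelled


if __name__ == "__main__":
    result = selective_sampling([1, 2, 5, 9], [(0, 0.0)], lambda x: 2 * x, budget=2)
    print(result)
```

Any of the other criteria on the slide (low confidence, expected model change) could replace the `key` function without changing the loop itself.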
The FAW-Project
• FAW: association that regulates outdoor advertising
• Goal: prediction of traffic frequencies for 82 major German cities
• Samples: ~400-1500 poster sites measured per city
Data Characteristics, Prediction
• street name, segment ID
• speed class
• street type
• sidewalks
• one-way road
• POIs
  • no. of restaurants
  • no. of public buildings
  • …
• spatial coordinates
• KNN:
  • similarity calculated based on scalar attributes and spatial coordinates
  • applies weights according to (spatial) distance of neighbors
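A distance-weighted KNN prediction of this kind might look like the sketch below. The slide does not give the actual similarity measure or weighting scheme, so two assumptions are made here: similarity is the sum of the Euclidean distances in attribute space and in coordinate space, and neighbour weights are the inverse of the spatial distance. The function names and the tuple layout `(attrs, xy, frequency)` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query_attrs, query_xy, sites, k=3, eps=1e-9):
    """Predict a frequency from the k most similar measured sites.

    sites -- list of (attrs, xy, frequency) tuples for measured poster sites
    Assumption: similarity = attribute distance + spatial distance.
    Assumption: weight = 1 / spatial distance (eps avoids division by zero).
    """
    ranked = sorted(
        sites,
        key=lambda s: euclidean(query_attrs, s[0]) + euclidean(query_xy, s[1]),
    )
    neighbours = ranked[:k]
    weights = [1.0 / (euclidean(query_xy, s[1]) + eps) for s in neighbours]
    weighted_sum = sum(w * s[2] for w, s in zip(weights, neighbours))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Made-up illustrative sites: (scalar attributes, coordinates, frequency).
    sites = [
        ((1.0,), (0.0, 0.0), 1000.0),
        ((1.0,), (2.0, 0.0), 2000.0),
        ((5.0,), (50.0, 50.0), 100.0),
    ]
    print(knn_predict((1.0,), (1.0, 0.0), sites, k=2))
```

In practice the scalar attributes would need scaling before they are combined with coordinates, since the two live on very different numeric ranges.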
Spatial Data
[Figure: frequency (axis 0 to 2000) of street segments, grouped by street: Nordstraße, Riesenweg]
• Spatial data:
  • spatial covariance between data points
  • high autocorrelation and concentrated linkage* on street name bias test accuracy
    • 1:n relationship between street name and segments
    • frequencies within one street are alike
  • here: complete instance space is known (all street segments of a city)
*David Jensen, Jennifer Neville: Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
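One common way to keep this bias out of a test set is to split by street rather than by segment, so that no street contributes segments to both training and test data. The slides do not say whether the project did this; the sketch below only illustrates the idea, with a hypothetical function name, a hypothetical dict layout, and made-up segments.

```python
import random


def split_by_street(segments, test_fraction=0.3, seed=0):
    """Split segments so that all segments of one street land on the same side.

    segments -- list of dicts with at least a "street" key (assumed layout)
    """
    streets = sorted({s["street"] for s in segments})
    rng = random.Random(seed)
    rng.shuffle(streets)
    # Hold out whole streets; always keep at least one street for testing.
    n_test = max(1, round(len(streets) * test_fraction))
    test_streets = set(streets[:n_test])
    train = [s for s in segments if s["street"] not in test_streets]
    test = [s for s in segments if s["street"] in test_streets]
    return train, test


if __name__ == "__main__":
    # Made-up example: Hauptstraße is a hypothetical third street.
    segments = [
        {"street": "Nordstraße", "segment_id": 1},
        {"street": "Nordstraße", "segment_id": 2},
        {"street": "Nordstraße", "segment_id": 3},
        {"street": "Riesenweg", "segment_id": 4},
        {"street": "Riesenweg", "segment_id": 5},
        {"street": "Hauptstraße", "segment_id": 6},
        {"street": "Hauptstraße", "segment_id": 7},
    ]
    train, test = split_by_street(segments)
    print(len(train), len(test))
```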
Active Learning in FAW
• Usage:
  • additional samples at ~50 places per city
  • KNN needs cross product of street segments with all poster places
    • Cologne: 50 GB, 5 days
• Strategy:
  • Data density
    • mean distance to the k nearest neighbors
  • Model differences
    • build model tree with predicted frequencies
    • disagreement between models?
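Both ranking strategies can be sketched in a few lines. The code below assumes candidates and measured sites are given as coordinate tuples and that the two models are plain callables returning a predicted frequency; all function names are hypothetical, and the slide leaves open how the two scores would be combined.

```python
import math


def mean_knn_distance(xy, measured_xy, k):
    """Mean distance from xy to its k nearest measured poster sites."""
    nearest = sorted(math.dist(xy, m) for m in measured_xy)[:k]
    return sum(nearest) / len(nearest)


def rank_by_sparsity(candidates, measured_xy, k=3):
    """Data density: candidates in sparsely measured regions come first."""
    return sorted(
        candidates,
        key=lambda c: mean_knn_distance(c, measured_xy, k),
        reverse=True,
    )


def rank_by_disagreement(candidates, model_a, model_b):
    """Model differences: candidates where the two predictions differ most come first."""
    return sorted(
        candidates,
        key=lambda c: abs(model_a(c) - model_b(c)),
        reverse=True,
    )


if __name__ == "__main__":
    measured = [(0.0, 0.0), (1.0, 0.0)]
    candidates = [(0.5, 0.0), (10.0, 0.0), (3.0, 0.0)]
    print(rank_by_sparsity(candidates, measured, k=2))
    # Stand-ins for the KNN and model-tree predictors (assumption).
    print(rank_by_disagreement(candidates, lambda c: c[0] * 100, lambda c: 500))
```

The top ~50 entries of either ranking would then be the places proposed for additional measurements.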
Experiment Outline
• Comparison of accuracy increase using
  • ranking vs. random order of added samples
• Alternatives
  • iterative ranking (reality?, greedy search optimal?)
  • rank once, remove similar objects (e.g. exclude segments of same street, …)
• Possible problems:
  • KNN not very stable
  • few samples: the oracle has little choice when providing the requested data sets
[Figure: experiment workflow with the labels Samples, Training, Test, KNN, Distance, Frequencies, Model Tree, Ranking for AL, Oracle, Iterations]
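The ranked-versus-random comparison amounts to recording a learning curve for each ordering of the added samples. The sketch below is a toy stand-in under stated assumptions: a one-dimensional 1-nearest-neighbour predictor replaces the KNN and model tree of the project, mean absolute error is chosen as the quality measure, and the names `learning_curve`, `one_nn_predict`, and `oracle` are hypothetical.

```python
import random


def one_nn_predict(x, training):
    """Toy predictor: label of the closest training instance."""
    return min(training, key=lambda t: abs(t[0] - x))[1]


def mean_abs_error(training, test):
    """Mean absolute prediction error on the test set."""
    return sum(abs(one_nn_predict(x, training) - y) for x, y in test) / len(test)


def learning_curve(initial, order, oracle, test):
    """Add samples in the given order and record the test error after each step."""
    training = list(initial)
    errors = [mean_abs_error(training, test)]
    for x in order:
        training.append((x, oracle(x)))
        errors.append(mean_abs_error(training, test))
    return errors


if __name__ == "__main__":
    oracle = lambda x: 10 * x                      # made-up ground truth
    test = [(x, oracle(x)) for x in range(11)]
    initial = [(0, 0)]
    ranked_order = [10, 5, 8, 3]                   # e.g. output of a ranking strategy
    random_order = list(ranked_order)
    random.Random(0).shuffle(random_order)
    print("ranked:", learning_curve(initial, ranked_order, oracle, test))
    print("random:", learning_curve(initial, random_order, oracle, test))
```

Averaging the random curve over several shuffles would give a fairer baseline than a single random order.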
Thank you!
• Suggestions
• Ideas?
• Questions