Active Learning on Spatial Data

Christine Körner, Fraunhofer AIS, Uni Bonn

Outline
• Active Learning
• FAW-Project
• Spatial Data
• Experiment Outline

Active Learning
• Labelled data is difficult / expensive to obtain
  • manual preparation of documents for text mining
  • analysis of drugs or molecules
• Active learning strategies actively select which data points to query in order to
  • minimize the number of training examples for a given classification quality
  • maximize the quality of results for a given number of data points

Selective Sampling
• Which instance to choose next? Where we
  • have no data?
  • perform poorly?
  • have a low confidence?
  • expect our model to change?
  • previously found data that improved quality?
[Diagram: a selected instance is passed to the oracle, which returns its label; the labelled instance is added to the training set]
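The selection loop on this slide can be sketched as follows. This is a minimal illustration, not the project's implementation: the names `selective_sampling`, `score`, `oracle` and `distance_to_nearest_label` are hypothetical, and the scoring function shown covers only the first criterion ("where we have no data") on a one-dimensional toy pool.

```python
from typing import Callable, Dict, Iterable


def selective_sampling(
    pool: Iterable[float],
    score: Callable[[float, Dict[float, float]], float],
    oracle: Callable[[float], float],
    budget: int,
) -> Dict[float, float]:
    """Query up to `budget` labels, always picking the highest-scoring instance."""
    unlabelled = list(pool)
    training_set: Dict[float, float] = {}  # instance -> label
    for _ in range(budget):
        if not unlabelled:
            break
        # Re-score every remaining candidate against the current training set.
        best = max(unlabelled, key=lambda x: score(x, training_set))
        unlabelled.remove(best)
        training_set[best] = oracle(best)  # ask the oracle for the label
    return training_set


def distance_to_nearest_label(x: float, labelled: Dict[float, float]) -> float:
    """Hypothetical 'no data here' score: distance to the closest labelled point."""
    if not labelled:
        return float("inf")
    return min(abs(x - y) for y in labelled)


# Toy usage: the oracle is a stand-in function that simply doubles the input.
result = selective_sampling([0, 1, 2, 10], distance_to_nearest_label,
                            lambda x: 2 * x, budget=2)
```

Any of the other criteria on the slide (low confidence, expected model change) could be plugged in as a different `score` function without changing the loop.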

The FAW-Project
• FAW: association regulating outdoor advertising
• Goal: prediction of traffic frequencies for 82 major German cities
• Samples: ~400-1500 poster sites measured per city

Data Characteristics, Prediction
• street name
• segment ID
• speed class
• street type
• sidewalks
• one-way road
• POIs
  • no. of restaurants
  • no. of public buildings
  • …
• spatial coordinates
• KNN:
  • similarity calculated based on scalar attributes and spatial coordinates
  • applies weights according to (spatial) distance of neighbors
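A distance-weighted KNN prediction of this kind might look like the sketch below. The slide does not give the actual similarity or weighting formulas, so everything concrete here is an assumption: neighbours are chosen by Euclidean attribute distance plus `spatial_weight` times spatial distance, and their frequencies are averaged with inverse spatial distance weights. The function name `knn_predict` and the data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float], float]  # (attributes, (x, y), frequency)


def knn_predict(query_attrs: Sequence[float], query_xy: Sequence[float],
                samples: List[Sample], k: int = 3,
                spatial_weight: float = 1.0) -> float:
    """Predict a frequency from the k most similar measured samples."""
    scored = []
    for attrs, xy, freq in samples:
        attr_dist = math.dist(query_attrs, attrs)  # scalar attributes
        spat_dist = math.dist(query_xy, xy)        # spatial coordinates
        # Assumed similarity: plain sum of both distances.
        scored.append((attr_dist + spatial_weight * spat_dist, spat_dist, freq))
    neighbours = sorted(scored)[:k]
    # Assumed weighting: inverse spatial distance; epsilon avoids division by zero.
    weights = [1.0 / (spat + 1e-9) for _, spat, _ in neighbours]
    weighted_sum = sum(w * freq for w, (_, _, freq) in zip(weights, neighbours))
    return weighted_sum / sum(weights)


# Toy data: one scalar attribute per sample, invented for illustration.
samples = [
    ((1.0,), (1.0, 0.0), 100.0),
    ((1.0,), (0.0, 3.0), 500.0),
    ((5.0,), (0.0, 0.5), 9000.0),  # spatially close but dissimilar attributes
]
prediction = knn_predict((1.0,), (0.0, 0.0), samples, k=2)
```

With `k=2` the third sample is not among the neighbours, so the prediction is a weighted mean of the first two frequencies, dominated by the spatially closer one.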

Spatial Data
[Figure: frequency (axis 0 to 2000) of the segments of two streets, Nordstraße and Riesenweg]
• Spatial data:
  • spatial covariance between data points
  • high autocorrelation and concentrated linkage* on street name bias test accuracy
  • 1:n relationship between street name and segments
  • frequencies within one street are alike
  • here: complete instance space is known (all street segments of a city)
*David Jensen, Jennifer Neville: Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
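Because segments of one street have similar frequencies, a random split can place near-duplicates of a test segment in the training set. One common way to reduce this bias, shown here only as a sketch and not stated on the slide, is to split by street name so that all segments of a street land on the same side. The function name `split_by_street`, the dictionary layout, and the street "Hypothetical Allee" are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

Segment = Dict[str, object]


def split_by_street(segments: List[Segment], test_fraction: float = 0.3,
                    seed: int = 0) -> Tuple[List[Segment], List[Segment]]:
    """Split segments so that no street contributes to both train and test."""
    streets = sorted({s["street"] for s in segments})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(streets)
    n_test = max(1, round(len(streets) * test_fraction))
    test_streets = set(streets[:n_test])
    train = [s for s in segments if s["street"] not in test_streets]
    test = [s for s in segments if s["street"] in test_streets]
    return train, test


# Toy data: segment IDs and frequencies are invented for illustration.
segments = [
    {"street": "Nordstraße", "segment": 1, "frequency": 1900},
    {"street": "Nordstraße", "segment": 2, "frequency": 1850},
    {"street": "Nordstraße", "segment": 3, "frequency": 1950},
    {"street": "Riesenweg", "segment": 4, "frequency": 400},
    {"street": "Riesenweg", "segment": 5, "frequency": 450},
    {"street": "Hypothetical Allee", "segment": 6, "frequency": 900},
]
train, test = split_by_street(segments)
```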

Active Learning in FAW
• Usage:
  • additional samples at ~50 places per city
  • KNN needs cross product of street segments with all poster places
  • Cologne: 50 GB, 5 days
• Strategy:
  • Data density
    • mean distance to the k nearest neighbors
  • Model differences
    • build model tree with predicted frequencies
    • disagreement between models?
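The two strategy ideas can be sketched as scoring functions. The slide does not say whether sparse or dense regions should be preferred, nor how disagreement is measured, so this sketch assumes that a larger mean distance to the k nearest measured sites and a larger absolute difference between two model predictions both make a candidate more interesting. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mean_knn_distance(candidate_xy: Sequence[float],
                      measured_xy: List[Sequence[float]], k: int = 3) -> float:
    """Mean spatial distance from a candidate to its k nearest measured sites."""
    if not measured_xy:
        raise ValueError("need at least one measured site")
    nearest = sorted(math.dist(candidate_xy, m) for m in measured_xy)[:k]
    return sum(nearest) / len(nearest)


def rank_by_sparsity(candidates: Dict[str, Sequence[float]],
                     measured_xy: List[Sequence[float]], k: int = 3) -> List[str]:
    """Candidate IDs ordered from sparsest to densest neighbourhood."""
    return sorted(candidates,
                  key=lambda c: mean_knn_distance(candidates[c], measured_xy, k),
                  reverse=True)


def model_disagreement(knn_prediction: float, tree_prediction: float) -> float:
    """Assumed disagreement measure: absolute difference of two predictions."""
    return abs(knn_prediction - tree_prediction)


# Toy coordinates, invented for illustration.
measured = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = {"a": (0.5, 0.5), "b": (10.0, 10.0)}
ranking = rank_by_sparsity(candidates, measured)
```

Scoring candidates only against the measured sites avoids computing distances between all pairs of street segments, which may matter given the cross-product cost mentioned on the slide.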

Experiment Outline
• Comparison of accuracy increase using
  • ranking vs. random order of added samples
• Alternatives
  • iterative ranking (reality?, greedy search optimal?)
  • rank once, remove similar objects (e.g. exclude segments of same street, …)
• Possible problems:
  • KNN not very stable
  • few samples, oracle has little choice to provide requested data sets
[Diagram labels: Samples, Training, Test, KNN, Distance, Frequencies, Model Tree, Ranking for AL, Oracle, Iterations]
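The "rank once, remove similar objects" alternative could be sketched as a single pass over an existing ranking that skips any segment whose street has already been chosen. This is one possible reading of the slide, with "similar" taken to mean "same street"; the function name `pick_diverse` and the segment IDs are hypothetical.

```python
from typing import Dict, List


def pick_diverse(ranked_segments: List[str], street_of: Dict[str, str],
                 n: int) -> List[str]:
    """Take the n best-ranked segments, using at most one segment per street."""
    chosen: List[str] = []
    used_streets = set()
    for segment in ranked_segments:
        if len(chosen) >= n:
            break
        street = street_of[segment]
        if street in used_streets:
            continue  # a better-ranked segment of this street was already picked
        chosen.append(segment)
        used_streets.add(street)
    return chosen


# Toy ranking, best candidate first; IDs are invented for illustration.
ranked = ["s1", "s2", "s3", "s4"]
street_of = {"s1": "Nordstraße", "s2": "Nordstraße",
             "s3": "Riesenweg", "s4": "Hypothetical Allee"}
selection = pick_diverse(ranked, street_of, n=2)
```

A random-order baseline for the comparison on this slide would draw the same number of segments with `random.sample` instead of following the ranking.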

Thank you!
• Suggestions
• Ideas?
• Questions
