Active Learning on Spatial Data

Christine Körner, Fraunhofer AIS, Uni Bonn


Presentation Transcript


  1. Active Learning on Spatial Data
     Christine Körner, Fraunhofer AIS, Uni Bonn

  2. Outline
     • Active Learning
     • FAW-Project
     • Spatial Data
     • Experiment Outline

  3. Active Learning
     • Labelled data is difficult / expensive to obtain
       • manual preparation of documents for text mining
       • analysis of drugs or molecules
     • Active learning strategies actively select which data points to query in order to
       • minimize the number of training examples for a given classification quality
       • maximize the quality of results for a given number of data points

  4. Selective Sampling
     [Diagram labels: Instance, Label?, ORACLE, add to training set]
     • Which instance to choose next? Where we
       • have no data?
       • perform poorly?
       • have a low confidence?
       • expect our model to change?
       • previously found data that improved quality?
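The selection loop on this slide can be sketched in a few lines. This is a minimal illustration and not the code used in the project: the names `selective_sampling` and `distance_to_nearest_label` are hypothetical, the instances are plain numbers, and the scoring rule implements only the first criterion ("where we have no data").

```python
import math
from typing import Callable, Dict, List


def selective_sampling(
    pool: List[float],
    oracle: Callable[[float], float],
    score: Callable[[float, Dict[float, float]], float],
    rounds: int,
) -> Dict[float, float]:
    """Query the oracle for the highest-scoring unlabelled instance, `rounds` times."""
    labelled: Dict[float, float] = {}
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        # Choose the instance the scoring criterion considers most useful.
        pick = max(remaining, key=lambda x: score(x, labelled))
        labelled[pick] = oracle(pick)  # ask the oracle for the label
        remaining.remove(pick)         # the instance joins the training set
    return labelled


def distance_to_nearest_label(x: float, labelled: Dict[float, float]) -> float:
    """Hypothetical criterion for 'where we have no data'."""
    if not labelled:
        return math.inf
    return min(abs(x - seen) for seen in labelled)


labelled = selective_sampling(
    pool=[0.0, 1.0, 2.0, 10.0],
    oracle=lambda x: 2 * x,  # toy oracle, stands in for a real measurement
    score=distance_to_nearest_label,
    rounds=3,
)
print(list(labelled))  # query order: [0.0, 10.0, 2.0]
```

Swapping in a different `score` function (low confidence, expected model change, and so on) changes the strategy without touching the loop.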

  5. The FAW-Project
     • FAW: association that regulates outdoor advertising
     • Goal: prediction of traffic frequencies for 82 major German cities
     • Samples: ~400-1500 poster sites measured per city

  6. Data Characteristics, Prediction
     • street name
     • segment ID
     • speed class
     • street type
     • sidewalks
     • one-way road
     • POIs
       • no. restaurants
       • no. public buildings
       • …
     • spatial coordinates
     • KNN:
       • similarity calculated based on scalar attributes and spatial coordinates
       • applies weights according to (spatial) distance of neighbors
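A rough sketch of such a KNN predictor follows. The slide does not say how attribute similarity and spatial distance are combined, so the additive mix with the factor `alpha` and the inverse-distance weights are assumptions made for illustration; `knn_predict` and its parameters are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float], float]  # (attributes, xy, frequency)


def knn_predict(
    query: Tuple[Sequence[float], Sequence[float]],
    samples: List[Sample],
    k: int = 3,
    alpha: float = 1.0,
    eps: float = 1e-9,
) -> float:
    """Predict a frequency from the k most similar measured sites."""
    q_attrs, q_xy = query
    scored = []
    for attrs, xy, freq in samples:
        d_attr = math.dist(q_attrs, attrs)  # distance on scalar attributes
        d_space = math.dist(q_xy, xy)       # spatial distance
        # Assumed combination: attribute distance plus scaled spatial distance.
        scored.append((d_attr + alpha * d_space, d_space, freq))
    nearest = sorted(scored, key=lambda t: t[0])[:k]
    # Assumed weighting: spatially closer neighbours count more.
    weights = [1.0 / (d_space + eps) for _, d_space, _ in nearest]
    return sum(w * f for w, (_, _, f) in zip(weights, nearest)) / sum(weights)


samples = [
    ((1.0,), (1.0, 0.0), 100.0),
    ((1.0,), (-1.0, 0.0), 300.0),
    ((5.0,), (0.0, 50.0), 9999.0),  # dissimilar and far away, ignored for k=2
]
prediction = knn_predict(((1.0,), (0.0, 0.0)), samples, k=2)
```

In this toy call the two selected neighbours are equally far from the query, so the prediction is their plain average.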

  7. Spatial Data
     [Figure: frequency (0 to 2000) per segment for the streets Nordstraße and Riesenweg]
     • Spatial Data:
       • spatial covariance between data points
       • high autocorrelation and concentrated linkage* on street name bias test accuracy
         • 1:n relationship between street name and segments
         • frequencies within one street are alike
       • here: complete instance space is known (all street segments of a city)
     * David Jensen, Jennifer Neville: Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
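One common way to limit this bias is to keep all segments of a street on the same side of the train/test split. The slide does not state which remedy was used in the project, so the sketch below is an assumption; `split_by_street` and the `"street"` key are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Dict[str, object]


def split_by_street(
    segments: List[Segment], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Segment], List[Segment]]:
    """Split so that no street contributes segments to both train and test."""
    by_street = defaultdict(list)
    for seg in segments:
        by_street[seg["street"]].append(seg)
    streets = sorted(by_street)
    random.Random(seed).shuffle(streets)  # reproducible shuffle of street names
    n_test = max(1, round(len(streets) * test_fraction))
    test_streets = set(streets[:n_test])
    train = [s for s in segments if s["street"] not in test_streets]
    test = [s for s in segments if s["street"] in test_streets]
    return train, test


segments = [
    {"street": "Nordstraße", "segment": 1, "frequency": 1500},
    {"street": "Nordstraße", "segment": 2, "frequency": 1450},
    {"street": "Riesenweg", "segment": 3, "frequency": 400},
    {"street": "Riesenweg", "segment": 4, "frequency": 420},
    {"street": "Hauptweg", "segment": 5, "frequency": 900},  # made-up street
]
train, test = split_by_street(segments)
```

The frequencies in the toy data are invented; only the street names Nordstraße and Riesenweg come from the slide.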

  8. Active Learning in FAW
     • Usage:
       • additional samples at ~50 places per city
       • KNN needs cross product of street segments with all poster places
         • Cologne: 50 GB, 5 days
     • Strategy:
       • Data density
         • mean distance of next k neighbors
       • Model differences
         • build Model Tree with predicted frequencies
         • disagreement between models?
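The data-density criterion can be sketched as follows: for each candidate segment, take the mean distance to its k nearest measured sites and rank the candidates with the sparsest neighbourhood first. The function names and the use of plain Euclidean distance on coordinates are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def mean_knn_distance(point: Point, measured: Sequence[Point], k: int) -> float:
    """Mean distance from `point` to its k nearest measured sites."""
    nearest = sorted(math.dist(point, m) for m in measured)[:k]
    return sum(nearest) / len(nearest)


def rank_by_sparsity(
    candidates: Dict[str, Point], measured: Sequence[Point], k: int = 2
) -> List[str]:
    """Candidate IDs ordered from lowest to highest data density."""
    return sorted(
        candidates,
        key=lambda cid: mean_knn_distance(candidates[cid], measured, k),
        reverse=True,  # large mean distance = few measurements nearby
    )


measured = [(0.0, 0.0), (1.0, 0.0)]
candidates = {"a": (0.5, 0.0), "b": (10.0, 0.0)}
ranking = rank_by_sparsity(candidates, measured, k=2)
print(ranking)  # ['b', 'a']
```

Computing every candidate-to-site distance is the cross product mentioned on the slide, so for a full city this naive version would need a spatial index or batching.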

  9. Experiment Outline
     [Diagram labels: Samples, Training, Test, KNN, Frequencies, Model Tree, Distance, Ranking for AL, Oracle, Iterations]
     • Comparison of accuracy increase using
       • ranking vs. random order of added samples
     • Alternatives
       • iterative ranking (reality?, greedy search optimal?)
       • rank once, remove similar objects (e.g. exclude segments of same street, …)
     • Possible problems:
       • KNN not very stable
       • few samples, Oracle has little choice to provide requested data sets
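The "rank once, remove similar objects" alternative could look like the sketch below, which walks a precomputed ranking and keeps at most one segment per street. The function `pick_one_per_street` and the `street_of` mapping are hypothetical; the slide leaves the exact similarity rule open.

```python
from typing import Dict, List


def pick_one_per_street(
    ranked: List[str], street_of: Dict[str, str], limit: int
) -> List[str]:
    """Take the best-ranked segments, skipping streets that were already chosen."""
    seen_streets = set()
    picked: List[str] = []
    for segment in ranked:
        street = street_of[segment]
        if street in seen_streets:
            continue  # a segment of this street has already been requested
        seen_streets.add(street)
        picked.append(segment)
        if len(picked) == limit:
            break
    return picked


ranked = ["s1", "s2", "s3", "s4"]  # best candidate first
street_of = {
    "s1": "Nordstraße",
    "s2": "Nordstraße",
    "s3": "Riesenweg",
    "s4": "Hauptweg",  # made-up street
}
requests = pick_one_per_street(ranked, street_of, limit=2)
print(requests)  # ['s1', 's3']
```

Because frequencies within one street are alike, a second segment of the same street would add little information, which is the motivation for filtering the ranking this way.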

  10. Thank you!
      • Suggestions?
      • Ideas?
      • Questions?
