
游晟佑 2.6.2013

Smarter Outlier Detection and Deeper Understanding of Large-Scale Taxi Trip Records: A Case Study of NYC


Presentation Transcript


  1. Smarter Outlier Detection and Deeper Understanding of Large-Scale Taxi Trip Records: A Case Study of NYC 游晟佑 2.6.2013

  2. About Author • Introduction • Backgrounds and Related Works • Method and Discussions • Experiments and Results • Conclusion • Critique • Appendix

  3. About Author • Jianting Zhang • Assistant professor in Geographic Information Systems (GIS) • CS@CUNY City College, CS@CUNY Graduate Center • Member of the Geospatial Technologies and Environmental Cyberinfrastructure (GeoTECI) Lab

  4. Backgrounds and Related Works Detecting outliers (whole-trip detection / point detection) (few) • Easiest way: a threshold (for example: trip >= 30 km, trip <= 200 m) (Antrip: point detection) • A more general way: compute the distribution of a measurement (location, distance, duration) • Special aspects: if the pickup / drop-off location (point detection) falls in certain land use types (e.g. lake / river), it is an outlier
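
The threshold approach on this slide can be sketched in a few lines of Python. This is only an illustration: the cut-off values come from the slide's example, while the function name, the constants, and the sample distances are hypothetical.

```python
# Cut-offs from the slide's example: trips of 30 km or more, or of 200 m
# or less, are treated as outliers. Names and sample data are hypothetical.
MAX_TRIP_KM = 30.0
MIN_TRIP_KM = 0.2  # 200 m expressed in km

def is_threshold_outlier(distance_km: float) -> bool:
    """Flag a trip whose distance lies at or beyond either cut-off."""
    return distance_km >= MAX_TRIP_KM or distance_km <= MIN_TRIP_KM

trips_km = [0.1, 2.5, 12.0, 45.0]
flags = [is_threshold_outlier(d) for d in trips_km]
```

A fixed threshold is cheap to apply but says nothing about why a trip is suspicious, which is why the slide lists distribution-based and land-use-based checks as alternatives.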

  5. Method and Discussions • Shortest path (A* algorithm; Dijkstra) • Why shortest path? The authors only want to detect outliers, and distance is by nature an important factor. • Instead of these generic algorithms, they use CH (Contraction Hierarchies), developed at the Karlsruhe Institute of Technology (KIT, Germany), as implemented in MoNav (an open source package) • Why CH? CH is designed specifically for road networks and achieves significantly higher efficiency than the generic algorithms.
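
For reference, the generic baseline mentioned on this slide (Dijkstra) can be sketched as follows. This is not CH and not MoNav's code; it is a plain textbook Dijkstra over a hypothetical toy network, shown only to make the "shortest path distance" concrete. The graph, node names, and weights are invented for illustration.

```python
import heapq

def dijkstra(graph, source, target):
    """Return the shortest path distance from source to target, or None if unreachable.

    graph maps a node to a list of (neighbor, edge_length) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return None

# Hypothetical toy street network (undirected, so each edge is listed twice).
toy_network = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 2.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 2.0), ("D", 1.0)],
    "D": [("B", 5.0), ("C", 1.0)],
}
shortest = dijkstra(toy_network, "A", "D")  # route A -> B -> C -> D
```

CH answers the same query but relies on a preprocessing step over the road network, which is what makes it attractive when millions of pickup / drop-off pairs have to be routed.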

  6. Method and Discussions (Cont.) • A taxi trip has the following attributes: • pickup location, pickup time, drop-off location, drop-off time, recorded distance • The trips are matched against a street network with N nodes and M edges

  7. Method and Discussions (Cont.) • Four steps to detect outliers: • If the pickup or drop-off location cannot be snapped to the nearest street segment within a reasonable distance (D0), the trip is considered a Type I outlier • Compute the unique combinations of pickup and drop-off nodes over all trips • Generate the shortest paths for these combinations (using the MoNav-CH module) • Compare the computed shortest path distances with the recorded distances: if the computed distance is greater than a threshold D1 and is W times longer than the recorded distance, the record is marked as a Type II outlier.
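
The four steps above can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Trip` fields, the `shortest_dist_fn` callback (standing in for the MoNav-CH module), the toy distance table, and the reading of "W times longer" as `computed > W * recorded` are all assumptions made for illustration. All distances are taken to be in one consistent, unspecified unit.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

class Trip(NamedTuple):
    pickup_node: int          # street node the pickup snaps to (hypothetical field)
    dropoff_node: int         # street node the drop-off snaps to
    pickup_snap_dist: float   # distance from pickup point to nearest street segment
    dropoff_snap_dist: float  # distance from drop-off point to nearest street segment
    recorded_dist: float      # trip distance as recorded

def classify_trips(trips: List[Trip],
                   shortest_dist_fn: Callable[[int, int], float],
                   d0: float, d1: float, w: float) -> List[str]:
    # Step 1: trips that cannot be snapped within D0 are Type I outliers.
    snapped = [t for t in trips
               if t.pickup_snap_dist <= d0 and t.dropoff_snap_dist <= d0]
    # Step 2: unique (pickup node, drop-off node) combinations.
    pairs = {(t.pickup_node, t.dropoff_node) for t in snapped}
    # Step 3: one shortest path query per unique pair.
    shortest: Dict[Tuple[int, int], float] = {p: shortest_dist_fn(*p) for p in pairs}
    # Step 4: compare computed and recorded distances.
    labels = []
    for t in trips:
        if t.pickup_snap_dist > d0 or t.dropoff_snap_dist > d0:
            labels.append("type_I")
            continue
        computed = shortest[(t.pickup_node, t.dropoff_node)]
        if computed > d1 and computed > w * t.recorded_dist:
            labels.append("type_II")
        else:
            labels.append("ok")
    return labels

# Hypothetical stand-in for the routing engine: a lookup table that also
# counts how many queries were issued.
toy_distances = {(1, 2): 2.1, (1, 4): 8.0}
queries = []
def toy_shortest(u: int, v: int) -> float:
    queries.append((u, v))
    return toy_distances[(u, v)]

trips = [
    Trip(1, 2, 50.0, 60.0, 2.0),   # plausible trip
    Trip(1, 3, 500.0, 10.0, 1.0),  # pickup too far from any street
    Trip(1, 4, 20.0, 30.0, 1.0),   # recorded distance far below shortest path
    Trip(1, 2, 10.0, 10.0, 2.2),   # same node pair as the first trip
]
labels = classify_trips(trips, toy_shortest, d0=200.0, d1=3.0, w=2.0)
```

Routing only the unique node pairs is what keeps step 3 tractable: here four trips trigger two shortest path queries, and the saving grows with the number of trips that share endpoints.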

  8. Experiments and Results • Data and Experimental Setting • Data from NAVTEQ; 166 million taxi trips in NYC in 2009 • Each trip has more than 20 attributes, but only some of them are used

  9. Experiments and Results (Cont.) • Distributions of Trip Distances, Time, Speed and Fare

  10. Experiments and Results (Cont.) • Results on taxi trip outlier detection • Parameters: D0 = 200 feet, D1 = 3, W = 2 • Of the 166 million trips, 1.5% (roughly 2.5 million) are Type I outliers

  11. Experiments and Results (Cont.) • Most trips take place in midtown and downtown

  12. Experiments and Results (Cont.) • About 18,000 trips fall into Type II outliers • Recorded vs. calculated distances (relaxing thresholds such as D1 would flag more outliers, but at the cost of more false positives)

  13. Experiments and Results (Cont.) • Results on betweenness centrality (generated by MoNav-CH) • Just a by-product • Betweenness centrality: for an edge or a node, the total number of shortest paths that pass through it • Used to gauge how important a path is
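
The counting idea behind this by-product can be sketched as follows, assuming the shortest paths are already available as node sequences. The function name and the sample paths are hypothetical, and the counts are left raw; formal betweenness centrality is defined over all node pairs and is often normalized.

```python
from collections import Counter

def edge_path_counts(paths):
    """Count how many of the given shortest paths traverse each undirected edge."""
    counts = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[tuple(sorted((u, v)))] += 1  # (u, v) and (v, u) are the same edge
    return counts

# Hypothetical shortest paths, each given as a sequence of street nodes.
paths = [
    ["A", "B", "C", "D"],
    ["B", "C"],
    ["E", "C", "D"],
]
counts = edge_path_counts(paths)
busiest = counts.most_common(1)[0][1]  # highest count over all edges
```

Edges with high counts are the street segments that many shortest routes share, which is the sense in which the measure indicates importance.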

  14. Experiments and Results (Cont.)

  15. Conclusion • The method detects outliers effectively

  16. Critique • It would be better to classify errors by source (device / human / etc.), not just delete all "suspects" • The following is a digression, unrelated to outlier detection but related to trip planning: • Geospatial data alone is not enough for trip planning • It would be better to take real-time data into account (for example: traffic congestion; peak and off-peak hours may differ) • Shortest path alone is not enough: what if, for the same OD (origin-destination) pair, a 4-mile route takes 20 minutes while a 5-mile route takes 15 minutes?

  17. Appendix MoNav is a Desktop / Mobile application that offers state-of-the-art fast and exact routing with OpenStreetMap Data. http://wiki.openstreetmap.org/wiki/MoNav http://code.google.com/p/monav/

  18. Thanks for listening …
