游晟佑 2.6.2013

Smarter Outlier Detection and Deeper Understanding of Large-Scale Taxi Trip Records: A Case Study of NYC

About Author • Introduction • Backgrounds and Related Works • Method and Discussions • Experiments and Results • Conclusion • Critique • Appendix

About Author • Jianting Zhang • Assistant professor in Geographic Information Systems (GIS) • CS@CUNY City College, CS@CUNY Graduate Center • Member of the Geospatial Technologies and Environmental Cyberinfrastructure (GeoTECI) Lab

Backgrounds and Related Works • Detecting outliers: whole-trip detection vs. point detection (few prior works) • Easiest way: thresholds (e.g., trip >= 30 km or trip <= 200 m) (Antrip: point detection) • A more general way: compute the distributions of the measurements (location, distance, duration) • Special aspects: if a pickup or drop-off location (point detection) falls in certain land-use types (e.g., a lake or river), it is an outlier
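A minimal Python sketch of the three ideas above, assuming trip distances in kilometers and land-use areas given as simple polygons in planar coordinates. The thresholds are the slide's examples; the IQR (interquartile range) rule is one common, swapped-in way to "compute the distribution", not the method of any cited work. All names are hypothetical.

```python
import statistics

# Example thresholds from the slide (kilometers); hypothetical constant names.
MAX_TRIP_KM = 30.0
MIN_TRIP_KM = 0.2


def threshold_outliers(distances_km):
    """Indices of trips at or beyond the fixed distance bounds."""
    return [i for i, d in enumerate(distances_km)
            if d >= MAX_TRIP_KM or d <= MIN_TRIP_KM]


def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] of the sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside polygon [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a rightward horizontal ray with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    print(threshold_outliers([0.1, 5.0, 12.0, 35.0]))   # [0, 3]
    print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [8]
    lake = [(0, 0), (10, 0), (10, 10), (0, 10)]         # hypothetical land-use polygon
    print(point_in_polygon(5, 5, lake))                 # True: a pickup in a lake is an outlier
```

Fixed thresholds need no data pass, while the IQR rule adapts to the observed distribution at the cost of computing quantiles first.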

Method and Discussions • Shortest path (A* algorithm; Dijkstra) • Why shortest paths? The goal is only to detect outliers, and distance is by nature an important factor. • Rather than a generic algorithm, they use CH (Contraction Hierarchies), developed at the Karlsruhe Institute of Technology (KIT, Germany), in MoNav (an open-source package) • Why CH? CH is designed specifically for road networks and achieves significantly higher efficiency than generic algorithms.
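For reference, the generic baseline that CH accelerates is Dijkstra's algorithm. The sketch below is plain Dijkstra with a binary heap, not Contraction Hierarchies and not MoNav's API; the adjacency-list graph format is an assumption.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source on a graph with non-negative edge weights.

    graph: dict mapping node -> list of (neighbor, weight) pairs (hypothetical format).
    Returns dict node -> distance; unreachable nodes are omitted.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)]}
    print(dijkstra(g, "A"))  # {'A': 0.0, 'B': 1.0, 'C': 3.0, 'D': 4.0}
```

Each query explores the network outward from the source; CH instead preprocesses the road network once so that later queries touch far fewer nodes.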

Method and Discussions(Cont.) • A taxi trip has the following attributes: • pickup location, pickup time, drop-off location, drop-off time, recorded distance • Input: the trips and a street network with N nodes and M edges
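One way to hold these inputs in Python, assuming planar coordinates and a street network stored as an adjacency list. The class, field, and alias names are hypothetical, not the authors' schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TaxiTrip:
    """One trip record with the attributes used for outlier detection."""
    pickup_xy: tuple[float, float]   # pickup location (hypothetical planar x, y)
    pickup_time: datetime
    dropoff_xy: tuple[float, float]  # drop-off location
    dropoff_time: datetime
    recorded_distance: float         # trip distance as recorded in the trip record

    def duration_hours(self) -> float:
        return (self.dropoff_time - self.pickup_time).total_seconds() / 3600.0

    def average_speed(self) -> float:
        """Recorded distance per hour; assumes a positive duration."""
        return self.recorded_distance / self.duration_hours()


# Street network with N nodes and M edges as an adjacency list:
# node id -> list of (neighbor id, edge length). Hypothetical format.
StreetNetwork = dict[int, list[tuple[int, float]]]
```

Derived quantities such as duration and average speed fall out of these fields directly.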

Method and Discussions(Cont.) • Four steps to detect outliers: • Step 1: Snap pickup and drop-off locations to the nearest street segments; if either cannot be snapped within a reasonable distance (D0), the trip is a Type I outlier • Step 2: Compute the unique combinations of pickup and drop-off nodes over all trips • Step 3: Compute the shortest paths for these combinations (using the MoNav-CH module) • Step 4: Compare the computed shortest-path distances with the recorded distances: if a computed distance is greater than a threshold D1 and more than W times the recorded distance, the record is marked as a Type II outlier.
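The four steps can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it snaps to the nearest street *node* rather than segment, uses plain Dijkstra in place of MoNav-CH, and all function names are hypothetical.

```python
import heapq
import math


def nearest_node(point, node_xy, max_dist):
    """Nearest street node to point, or None if it is farther than max_dist.

    Simplification (assumption): the slide snaps to the nearest street
    segment; snapping to the nearest node keeps this sketch short.
    """
    best, best_d = None, math.inf
    for node, (x, y) in node_xy.items():
        d = math.hypot(point[0] - x, point[1] - y)
        if d < best_d:
            best, best_d = node, d
    return best if best_d <= max_dist else None


def dijkstra(graph, source):
    """Shortest distances from source (plain Dijkstra stands in for MoNav-CH)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def detect_outliers(trips, node_xy, graph, d0, d1, w):
    """trips: list of (pickup_xy, dropoff_xy, recorded_distance).

    Returns {trip_index: "I" or "II"} for the trips flagged as outliers.
    """
    labels, od = {}, {}
    # Step 1: snap both ends; a trip that cannot be snapped within D0 is Type I.
    for i, (pickup, dropoff, _) in enumerate(trips):
        a = nearest_node(pickup, node_xy, d0)
        b = nearest_node(dropoff, node_xy, d0)
        if a is None or b is None:
            labels[i] = "I"
        else:
            od[i] = (a, b)
    # Step 2: keep only the unique (pickup node, drop-off node) combinations.
    pairs = set(od.values())
    # Step 3: shortest paths, one Dijkstra run per distinct origin node.
    computed = {}
    for origin in {a for a, _ in pairs}:
        dist = dijkstra(graph, origin)
        for a, b in pairs:
            if a == origin:
                computed[(a, b)] = dist.get(b, math.inf)
    # Step 4: computed > D1 and computed > W * recorded -> Type II.
    for i, pair in od.items():
        c, recorded = computed[pair], trips[i][2]
        if math.isfinite(c) and c > d1 and c > w * recorded:
            labels[i] = "II"
    return labels
```

Deduplicating OD pairs in Step 2 is what makes Step 3 affordable: many trips share the same snapped endpoints, so each distinct pair is routed once. Unreachable pairs are left unlabeled here, which is a choice of this sketch.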

Experiments and Results • Data and Experimental Setting • Data from NAVTEQ; 166 million taxi trips in NYC in 2009 • Each trip has more than 20 attributes, but only some of them are used

Experiments and Results(Cont.) • Distributions of Trip Distances, Time, Speed and Fare

Experiments and Results(Cont.) • Results on taxi trip outlier detection • Parameters: D0 = 200 feet, D1 = 3, W = 2 • Of the 166 million trips, 1.5% fall into Type I outliers
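As a quick order-of-magnitude check (the 1.5% on the slide is rounded, so the count is approximate), the Type I share corresponds to roughly 2.5 million trips:

```latex
N_{\text{Type I}} \approx 0.015 \times 166 \times 10^{6} = 2.49 \times 10^{6}\ \text{trips}
```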

Experiments and Results(Cont.) • Most trips occur in Midtown and Downtown

Experiments and Results(Cont.) • About 18,000 trips fall into Type II outliers • Recorded vs. calculated distances (D1 controls the trade-off: a smaller D1 flags more trips as outliers, but also produces more false positives)
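Because a trip must satisfy both conditions (computed > D1 and computed > W × recorded), raising D1 can only shrink the set of Type II outliers, and lowering it can only grow the set. The distance pairs below are hypothetical illustration values, not data from the study.

```python
def count_type2(pairs, d1, w):
    """Number of (computed, recorded) distance pairs flagged as Type II."""
    return sum(1 for computed, recorded in pairs
               if computed > d1 and computed > w * recorded)


# Hypothetical (computed shortest-path, recorded) distances.
pairs = [(2.5, 1.0), (4.0, 1.0), (10.0, 3.0), (10.0, 6.0)]
for d1 in (1, 3, 5):
    print(d1, count_type2(pairs, d1, w=2))
# prints: "1 3", then "3 2", then "5 1"
```

Sweeping D1 this way on real data, and inspecting the extra trips each lower value admits, is one way to choose the threshold.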

Experiments and Results(Cont.) • Results on betweenness centrality (generated by MoNav-CH) • Just a by-product • Betweenness centrality: for an edge or a node, the total number of shortest paths passing through it • Used to reveal the importance of a path
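A simplified, trip-weighted version of this count can be sketched by routing each OD pair and tallying the directed edges its shortest path uses. This is an assumption-laden illustration, not MoNav-CH's output: full betweenness centrality (e.g., Brandes' algorithm) sums over all node pairs and splits credit among tied shortest paths, whereas this sketch follows one path per OD pair. Names are hypothetical.

```python
import heapq
from collections import Counter


def shortest_path(graph, source, target):
    """Dijkstra with predecessor links; returns the node list or None if unreachable."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def edge_pass_counts(graph, od_pairs):
    """How many of the OD shortest paths traverse each directed edge."""
    counts = Counter()
    for s, t in od_pairs:
        path = shortest_path(graph, s, t)
        if path:
            counts.update(zip(path, path[1:]))
    return counts


if __name__ == "__main__":
    g = {"A": [("B", 1), ("C", 5)], "B": [("A", 1), ("C", 1)], "C": [("B", 1), ("A", 5)]}
    print(edge_pass_counts(g, [("A", "C"), ("A", "B"), ("C", "A")]))
```

Edges with high counts are the road segments that many trips' shortest paths share.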

Experiments and Results(Cont.)

Conclusion • Outliers are detected effectively

Critique • It would be better to classify errors by source (device, human, etc.) rather than simply deleting all “suspects” • The following is a digression, unrelated to outlier detection but related to trip planning: • Geospatial data alone is not enough for trip planning • It would be better to take real-time data into account (e.g., traffic congestion, which may differ between peak and off-peak hours) • Shortest paths alone are not enough: what if, for the same OD (origin-destination) pair, one route is 4 miles in 20 mins and another is 5 miles in 15 mins?
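The 4-mile vs. 5-mile example can be made concrete: running the same Dijkstra search with distance or with travel time as the edge cost picks different routes. The two-route network and its per-edge split are hypothetical, built only to reproduce the slide's totals.

```python
import heapq


def best_route(graph, source, target, weight):
    """Dijkstra where weight selects the edge cost key ("miles" or "minutes")."""
    heap = [(0.0, source, [source])]
    settled = set()
    while heap:
        cost, u, path = heapq.heappop(heap)
        if u == target:
            return cost, path  # first pop of the target is optimal
        if u in settled:
            continue
        settled.add(u)
        for v, attrs in graph.get(u, []):
            if v not in settled:
                heapq.heappush(heap, (cost + attrs[weight], v, path + [v]))
    return float("inf"), None


# Hypothetical network: O -> X -> D totals 4 miles / 20 min; O -> Y -> D totals 5 miles / 15 min.
network = {
    "O": [("X", {"miles": 2.0, "minutes": 10.0}), ("Y", {"miles": 2.5, "minutes": 7.5})],
    "X": [("D", {"miles": 2.0, "minutes": 10.0})],
    "Y": [("D", {"miles": 2.5, "minutes": 7.5})],
}
print(best_route(network, "O", "D", "miles"))    # (4.0, ['O', 'X', 'D'])
print(best_route(network, "O", "D", "minutes"))  # (15.0, ['O', 'Y', 'D'])
```

With real-time data, the "minutes" attribute would vary by time of day (peak vs. off-peak), so the fastest route could change even when the network does not.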

Appendix MoNav is a Desktop / Mobile application that offers state-of-the-art fast and exact routing with OpenStreetMap Data. http://wiki.openstreetmap.org/wiki/MoNav http://code.google.com/p/monav/

Thank you for listening.
