Explore a novel geolocation method based on web data and network latency, achieving precise IP location estimation.
Towards Street-Level Client-Independent IP Geolocation Yong Wang, UESTC/Northwestern Daniel Burgener, Northwestern Marcel Flores, Northwestern Aleksandar Kuzmanovic, Northwestern Cheng Huang, Microsoft Research http://networks.cs.northwestern.edu
Problem and Motivation • How to accurately locate IP addresses on the Internet? • Host-dependent solutions: • GPS • WiFi (e.g., Google My Location, Skyhook) • Host-independent solutions: • Server cannot always expect clients’ cooperation • Security / access restrictions • Online service access analytics • Location-based online advertising
A Scenario of Street-Level Online Advertising User’s location Local Businesses
Prior Work • Constraint-Based Geolocation (CBG) [ToN 06] Median error distance = 228 km • Measure delays from active vantage points • Topology-Based Geolocation (TBG) [IMC 06] Median error distance = 67 km • CBG + network topology information • Octant [NSDI 07] Median error distance = 35.2 km • CBG + router locations, geographic and demographic information
Methodology Highlights • Our methodology is based on two insights • Websites often provide the actual geographical location of associated entities • E.g., universities, businesses, government offices, etc. • Develop methods to determine if web- or e-mail servers reside at the corresponding locations • Relative network delays highly correlate with geographical distances • Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results
Institutional Network Example [Figure: Web cloud-sourcing; an institutional IP subnet with a web server and a mail server behind a router to the external network] Address shown: 550 South Hill Street, Suite 890, Los Angeles, CA 90013
The Role of Relative Network Delays [Figure: ordering of measured delays]
A Case Study • Target IP address: 38.100.25.196 • Target postal address: 1850 K Street NW, Washington, DC 20006
Three-Tier Geolocation System Tier 1 Goal: Find the coarse-grained region for the targeted IP • Convert measured delays into geographical distances • Create the intersection of the resulting distance constraints
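The Tier 1 step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes round-trip times in milliseconds, a conservative propagation speed of 4/9 of the speed of light (a value not stated on the slide), and circular constraints around each vantage point. The names `max_distance_km` and `in_intersection` are hypothetical.

```python
import math

SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
PROPAGATION_FACTOR = 4.0 / 9.0         # ASSUMPTION: conservative fraction of c on Internet paths
EARTH_RADIUS_KM = 6371.0               # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms, factor=PROPAGATION_FACTOR):
    """Upper bound on distance implied by a round-trip time (one-way delay x speed)."""
    return (rtt_ms / 2.0) * SPEED_OF_LIGHT_KM_PER_MS * factor


def in_intersection(point, constraints):
    """True if point (lat, lon) lies inside every vantage point's distance circle.

    constraints is a list of ((vp_lat, vp_lon), rtt_ms) pairs (hypothetical format).
    """
    lat, lon = point
    return all(
        haversine_km(lat, lon, vp_lat, vp_lon) <= max_distance_km(rtt_ms)
        for (vp_lat, vp_lon), rtt_ms in constraints
    )
```

A candidate location survives Tier 1 only if it satisfies every constraint, so adding vantage points can only shrink the region.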
Three-Tier Geolocation System Tier 2 Goal: Use passive landmarks to determine a finer-grained region for the targeted IP • Populate the intersection with landmarks • Estimate the delay between landmarks and the target (D1 + D2 < D3 + D4) • Create a new intersection
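One way the landmark-to-target delay could be estimated is sketched below. The slide's figure is not available, so this rests on an assumption: D1 and D2 are taken to be the delays from a router shared by both paths to the landmark and to the target, as seen from one vantage point, and D3 + D4 the same sum from another vantage point. The smaller sum is kept. The path format and function names are hypothetical.

```python
def estimate_delay(path_to_landmark, path_to_target):
    """Estimate landmark-target delay via the last router common to both paths.

    Each path is a list of (hop_id, rtt_ms) pairs from one vantage point,
    with the destination as the final entry (ASSUMED traceroute-like format).
    Returns D1 + D2 in ms, or None if the paths share no intermediate router.
    """
    target_hops = dict(path_to_target[:-1])
    landmark_rtt = path_to_landmark[-1][1]
    target_rtt = path_to_target[-1][1]
    # Walk the landmark path backwards to find the closest shared router.
    for router, rtt in reversed(path_to_landmark[:-1]):
        if router in target_hops:
            d1 = max(landmark_rtt - rtt, 0.0)                # router -> landmark
            d2 = max(target_rtt - target_hops[router], 0.0)  # router -> target
            return d1 + d2
    return None


def best_estimate(measurements):
    """Smallest D1 + D2 over all vantage points; None if none is usable.

    measurements is a list of (path_to_landmark, path_to_target) pairs.
    """
    estimates = [estimate_delay(lp, tp) for lp, tp in measurements]
    usable = [e for e in estimates if e is not None]
    return min(usable) if usable else None
```

Differences are clamped at zero because noisy per-hop measurements can make a later hop appear faster than an earlier one.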
Three-Tier Geolocation System Tier 3 Goal: Geolocate the target IP using passive landmarks • Select the landmark with the minimum delay to the target, and associate the target’s location with it • Measured distance ∝ Geographical distance • 10.6 km vs. 0.103 km
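The selection rule in Tier 3 amounts to a minimum over the remaining landmarks. The sketch below assumes each landmark already carries a delay estimate to the target; the `Landmark` record and its fields are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Landmark(NamedTuple):
    name: str                       # e.g. the website the landmark was mined from
    location: Tuple[float, float]   # (lat, lon) of the published postal address
    delay_ms: float                 # estimated delay between landmark and target


def geolocate(landmarks: Sequence[Landmark]) -> Tuple[float, float]:
    """Return the location of the landmark with the smallest delay to the target."""
    if not landmarks:
        raise ValueError("no landmarks available for the target")
    closest = min(landmarks, key=lambda lm: lm.delay_ms)
    return closest.location
```

Only the ordering of the delays matters here, which is why relative rather than absolute delay is sufficient at this tier.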
Remaining Issues • Verifying landmarks • Sweep out most of the erroneous landmarks • Errors are still possible! • Resilience to errors • The larger the error, the more resilient our method is • We prove that the likelihood that an erroneous landmark will affect the accuracy is small
Evaluation • Three datasets • PlanetLab dataset (Academic) • Collected dataset (Residential) • Online Maps dataset (In the wild) • Factors that impact accuracy • Landmark density • Population density • Access networks
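The error-distance metric used throughout the deck can be computed as the great-circle distance between the estimated and the true location, summarized by the median over a dataset. The sketch below assumes coordinates in degrees and a spherical Earth; the function names are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def error_distance_km(estimate, truth):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, estimate)
    lat2, lon2 = map(math.radians, truth)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(estimates, truths):
    """Median error distance over paired lists of estimated and true locations."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    return statistics.median(
        error_distance_km(e, t) for e, t in zip(estimates, truths)
    )
```

The median is less sensitive than the mean to a few badly mislocated targets, which matters when some landmarks are erroneous.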
Dataset Characteristics Urban areas Rural areas The three datasets cover both urban areas and rural areas.
Landmark Density Density sequence: PlanetLab > Residential > Online Maps The more landmarks we can discover in the vicinity of a target, the more likely we are to geolocate the targeted IP accurately.
The Role of Population Density The error distance is smallest in densely populated areas The error grows as the population density decreases Middle of “nowhere”
The Role of Access Networks 2 km 700 meters Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)
Conclusions • A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method • Our methodology consists of two components • Mining landmarks from the Web and using Web or E-mail servers as landmarks • Using relative network distances as opposed to absolute network distances
Thank You http://networks.cs.northwestern.edu