Towards Street-Level Client-Independent IP Geolocation

Yong Wang, UESTC/Northwestern
Daniel Burgener, Northwestern
Marcel Flores, Northwestern
Aleksandar Kuzmanovic, Northwestern
Cheng Huang, Microsoft Research
http://networks.cs.northwestern.edu

Problem and Motivation
• How to accurately locate IP addresses on the Internet?
• Host-dependent solutions:
  • GPS
  • WiFi (e.g., Google My Location, Skyhook)
• Host-independent solutions:
  • Server cannot always expect clients’ cooperation
  • Security / access restrictions
  • Online service access analytics
  • Location-based online advertising

A Scenario of Street-Level Online Advertising
[Figure labels: User’s location; Local Businesses]

Prior Work
• Constraint-Based Geolocation (CBG) [ToN 06]: median error distance = 228 km
  • Measure delays from active vantage points
• Topology-Based Geolocation (TBG) [IMC 06]: median error distance = 67 km
  • CBG + network topology information
• Octant [NSDI 07]: median error distance = 35.2 km
  • CBG + routers’ locations, geographical and demographic information

Methodology Highlights
• Our methodology is based on two insights
  • Websites often provide the actual geographical location of associated entities
    • E.g., universities, businesses, government offices, etc.
    • Develop methods to determine if Web or e-mail servers reside at the corresponding locations
  • Relative network delays highly correlate with geographical distances
    • Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results

Institutional Network Example
[Figure labels: Web cloud-sourcing; mail server; web server; router; IP subnet; to external network; postal address 550 South Hill Street, Suite 890, Los Angeles, CA 90013]

The Role of Relative Network Delays
[Figure: ordering of measured delays; the compared values are missing from the transcript]

A Case Study
• Target IP address: 38.100.25.196
• Target postal address: 1850 K Street NW, Washington, DC 20006

Three-Tier Geolocation System
Tier 1
Goal: Find the coarse-grained region for the targeted IP
• Map measured delays to geographical distances
• Create intersection
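The Tier 1 step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the delay-to-distance factor (4/9 of the speed of light), the function names, and the sample vantage points are all assumptions made for the example. Each vantage point's round-trip time is turned into a maximum distance, and a candidate point lies in the intersection only if it satisfies every such bound.

```python
import math

EARTH_RADIUS_KM = 6371.0
C_KM_PER_MS = 299.792458   # speed of light in vacuum, km per millisecond
SPEED_FACTOR = 4.0 / 9.0   # ASSUMPTION: fraction of c used to bound packet speed


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_distance_km(rtt_ms):
    """Upper bound on distance implied by a round-trip time (one-way = RTT / 2)."""
    return (rtt_ms / 2) * C_KM_PER_MS * SPEED_FACTOR


def in_intersection(point, measurements):
    """True if `point` lies inside every vantage point's distance bound.

    `measurements` is a list of ((lat, lon), rtt_ms) pairs, one per vantage point.
    """
    return all(haversine_km(point, vp) <= max_distance_km(rtt)
               for vp, rtt in measurements)


# Hypothetical vantage points and RTTs, for illustration only.
measurements = [((38.9, -77.0), 10.0), ((40.7, -74.0), 12.0)]
print(in_intersection((39.0, -77.0), measurements))
print(in_intersection((34.05, -118.24), measurements))
```

A real system would search a grid or polygon of candidate points rather than test individual coordinates; the membership test is the part that corresponds to "create intersection".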

Three-Tier Geolocation System
Tier 2
Goal: Use passive landmarks to determine a finer-grained region for the targeted IP
• Populate the intersection with landmarks
• Estimate the delay between landmarks and the target
• D1 + D2 < D3 + D4
• Create a new intersection
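One possible reading of the D1 + D2 comparison is sketched below. It assumes, and the slide does not confirm this, that D1 and D2 are the delays from a router shared by the traceroute paths to the landmark and to the target, and that the smallest such sum across vantage points is kept as the estimate. The path representation and function names are hypothetical.

```python
def estimate_delay_ms(path_to_landmark, path_to_target):
    """Estimate landmark-to-target delay as D1 + D2 via the last shared router.

    ASSUMPTION: each path is a list of (hop_id, rtt_ms) pairs measured from the
    same vantage point, with the destination itself as the final entry.
    Returns None when the two paths share no intermediate router.
    """
    target_rtts = {hop: rtt for hop, rtt in path_to_target[:-1]}
    shared = None
    for hop, rtt in path_to_landmark[:-1]:
        if hop in target_rtts:
            shared = (rtt, target_rtts[hop])   # keep the last common router
    if shared is None:
        return None
    d1 = path_to_landmark[-1][1] - shared[0]   # shared router -> landmark
    d2 = path_to_target[-1][1] - shared[1]     # shared router -> target
    return max(d1, 0.0) + max(d2, 0.0)         # clamp negative measurement noise


def best_estimate_ms(path_pairs):
    """Smallest D1 + D2 over (landmark path, target path) pairs, or None."""
    estimates = [e for e in (estimate_delay_ms(l, t) for l, t in path_pairs)
                 if e is not None]
    return min(estimates) if estimates else None


# Hypothetical traceroute data, for illustration only.
to_landmark = [("r1", 2.0), ("r2", 5.0), ("landmark", 6.5)]
to_target = [("r1", 2.1), ("r2", 5.2), ("r3", 6.0), ("target", 7.0)]
print(estimate_delay_ms(to_landmark, to_target))
```

Taking the minimum over several vantage points favors the pair of paths that diverge closest to the destinations, which is where the sum is least inflated by the shared part of the route.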

Three-Tier Geolocation System
Tier 3
Goal: Geolocate the target IP using passive landmarks
• Select the landmark with the minimum delay to the target, and associate the target’s location with it.
• Measured distance ∝ geographical distance
• 10.6 km vs. 0.103 km
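The Tier 3 selection rule is simple enough to write down directly. The sketch below assumes each landmark is a record with a location and an already-estimated delay to the target; the field names and sample values are hypothetical and are not taken from the case study.

```python
def pick_closest_landmark(landmarks):
    """Return the landmark with the minimum estimated delay to the target.

    ASSUMPTION: each landmark is a dict with 'name', 'location' (lat, lon)
    and 'delay_ms' keys; these field names are illustrative only.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    return min(landmarks, key=lambda lm: lm["delay_ms"])


# Hypothetical landmarks, for illustration only.
landmarks = [
    {"name": "landmark-a", "location": (38.9020, -77.0430), "delay_ms": 1.8},
    {"name": "landmark-b", "location": (38.9025, -77.0420), "delay_ms": 0.4},
    {"name": "landmark-c", "location": (38.9100, -77.0300), "delay_ms": 2.6},
]

best = pick_closest_landmark(landmarks)
target_location = best["location"]   # the target inherits the landmark's location
print(best["name"], target_location)
```

Only the ordering of the delays matters here, which is why relative rather than absolute delay values are sufficient for this step.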

Remaining Issues
• Verifying landmarks
  • Sweep out most of the erroneous landmarks
  • Errors are still possible!
• Resilience to errors
  • The larger the error, the more resilient our method is
  • We prove that the likelihood that an erroneous landmark will affect the accuracy is small

Evaluation
• Three datasets
  • PlanetLab dataset (Academic)
  • Collected dataset (Residential)
  • Online Maps dataset (In the wild)
• Factors that impact the accuracy
  • Landmark density
  • Population density
  • Access networks

Dataset Characteristics
[Figure labels: Urban areas; Rural areas]
The three datasets cover both urban areas and rural areas.

Baseline Results

Landmark Density
• Density sequence: PlanetLab > Residential > Online Maps
• The more landmarks we can discover in the vicinity of a target, the more likely we are to geolocate the targeted IP accurately.

The Role of Population Density
• The error distance is smallest in densely populated areas
• The error grows as the population density decreases
[Figure label: Middle of “nowhere”]

The Role of Access Networks
[Figure labels: 2 km; 700 meters]
Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)

Conclusions
• A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method
• Our methodology consists of two components
  • Mining landmarks from the Web and using Web or e-mail servers as landmarks
  • Using relative network distances as opposed to absolute network distances

Thank You http://networks.cs.northwestern.edu
