Determining the Geographic Location of Internet Hosts
Venkata N. Padmanabhan, Microsoft Research
Lakshminarayanan Subramanian, University of California at Berkeley
SIGMETRICS 2001
Background
• Location-aware services are relevant in the Internet context too
  • targeted advertising
  • event notification
  • territorial rights management
• Existing approaches:
  • user input: burdensome, error-prone
  • whois: manual updates, host may not be at registered location
• Goal: estimate location based on client IP address
  • challenging problem because an IP address does not inherently indicate location
IP2Geo
Multi-pronged approach that exploits various “properties” of the Internet
• DNS names of router interfaces often indicate location
• Network delay tends to correlate with geographic distance
• Hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically
• GeoTrack
  • determine location of closest router with recognizable DNS name
• GeoPing
  • use delay measurements to triangulate location
• GeoCluster
  • extrapolate partial IP-to-location mapping information using cluster information derived from BGP routing data
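The GeoTrack idea can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the code table `LOCATION_CODES`, the helper names, and the `example.net` host names are all hypothetical, and a real system would need a much larger, ISP-specific dictionary of city and airport codes.

```python
import re

# Hypothetical code table (an assumption for illustration only).
LOCATION_CODES = {
    "sea": "Seattle, WA",
    "sfo": "San Francisco, CA",
    "nyc": "New York, NY",
}

def location_from_dns_name(name):
    """Return the location of the first known code found in a DNS name, or None."""
    for label in name.lower().split("."):
        # Split labels such as "sfo-gw1" into alphabetic tokens.
        for token in re.findall(r"[a-z]+", label):
            if token in LOCATION_CODES:
                return LOCATION_CODES[token]
    return None

def geotrack(path):
    """Walk the path backwards from the target and return the location
    of the closest router whose DNS name is recognizable."""
    for name in reversed(path):
        location = location_from_dns_name(name)
        if location is not None:
            return location
    return None

# Router names as a traceroute-like tool might report them, source first.
path = [
    "gw1.sea.example.net",
    "core2.sfo.example.net",
    "host-12-34.example.net",  # target: no recognizable code
]
print(geotrack(path))
```

The target's own name carries no known code here, so the estimate falls back to the last recognizable router on the path.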
GeoPing
• Delay-based triangulation is conceptually simple
  • delay → distance
  • distance from 3 or more non-collinear points → location
• But there are practical difficulties
  • network path may be circuitous
  • transmission and queuing delays may corrupt delay estimate
  • one-way delay is hard to measure
• GeoPing
  • delay is measured from several distributed probes
  • minimum delay among several samples is picked
• Nearest Neighbor in Delay Space (NNDS) algorithm
  • construct a delay map containing (delay vector, location) tuples
  • given a delay vector, search through the delay map for closest match
  • location corresponding to the closest match is our location estimate
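The NNDS steps above can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance is used as the measure of closeness in delay space, and the delay map, probe count, and delay values are made-up example data, not measurements from the talk.

```python
import math

def min_delays(samples_per_probe):
    """Keep the minimum delay seen from each probe, which filters out
    samples inflated by queuing."""
    return [min(samples) for samples in samples_per_probe]

def delay_distance(a, b):
    """Euclidean distance between two delay vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nnds(delay_map, target_vector):
    """Return the location whose delay vector is closest to the target's."""
    _, location = min(
        delay_map,
        key=lambda entry: delay_distance(entry[0], target_vector),
    )
    return location

# Hypothetical delay map: (delay vector in ms from 3 probes, known location).
delay_map = [
    ((10, 40, 70), "Seattle"),
    ((45, 12, 50), "Chicago"),
    ((72, 48, 9), "New York"),
]

# Several delay samples (ms) per probe for the target host.
samples = [[48, 44, 51], [15, 13, 20], [55, 49, 60]]
target_vector = min_delays(samples)
print(nnds(delay_map, target_vector))
```

A linear scan is enough for a small delay map; a larger map might call for a spatial index, which is outside what the slides describe.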
Validation of Delay-based Approach
Delay tends to increase with geographic distance
Impact of the Number of Probes
Highest accuracy when 7-9 probes are used
GeoCluster
• Basic idea
  • divide up the space of IP addresses into clusters using BGP prefixes
  • use partial IP-to-location mapping data to infer location of each cluster
  • given target IP address, find matching cluster via longest-prefix match
  • location of the matching cluster is our estimate of host location
• Issues
  • partial IP-to-location mapping information may not be entirely accurate
  • BGP prefixes might not correspond to geographic clusters
• Sub-clustering algorithm
  • use partial IP-to-location mapping information to test whether a BGP prefix is likely to correspond to a geographic cluster
  • if the test is negative, divide the prefix into two and recursively apply the test to each half
  • in the end we are only left with geographically clustered prefixes
  • dispersion offers an indication of the accuracy of a location estimate
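A rough sketch of sub-clustering and longest-prefix lookup, using Python's standard `ipaddress` module. The slides do not say what the clustering test is, so the consensus test here (a `threshold` share of known hosts agreeing on one location) and the `max_len` cutoff are assumptions made for illustration, as are the function names and the sample data.

```python
import ipaddress
from collections import Counter

def dominant_location(prefix, known):
    """Return (location, share) for the known hosts inside prefix, or None."""
    counts = Counter(loc for ip, loc in known if ip in prefix)
    if not counts:
        return None
    location, n = counts.most_common(1)[0]
    return location, n / sum(counts.values())

def subcluster(prefix, known, threshold=0.7, max_len=26):
    """Split prefix in half recursively until each piece passes the
    (assumed) consensus test; pieces that never pass are dropped."""
    result = dominant_location(prefix, known)
    if result is None:
        return {}
    location, share = result
    if share >= threshold:
        return {prefix: location}
    if prefix.prefixlen >= max_len:
        return {}  # too small to split further and still not clustered
    clusters = {}
    for half in prefix.subnets(prefixlen_diff=1):
        clusters.update(subcluster(half, known, threshold, max_len))
    return clusters

def locate(ip, clusters):
    """Longest-prefix match of ip against the cluster table."""
    matches = [p for p in clusters if ip in p]
    if not matches:
        return None
    return clusters[max(matches, key=lambda p: p.prefixlen)]

# Hypothetical partial IP-to-location data inside a documentation prefix.
known = [
    (ipaddress.ip_address("192.0.2.10"), "Seattle"),
    (ipaddress.ip_address("192.0.2.20"), "Seattle"),
    (ipaddress.ip_address("192.0.2.130"), "Boston"),
    (ipaddress.ip_address("192.0.2.140"), "Boston"),
]
clusters = subcluster(ipaddress.ip_network("192.0.2.0/24"), known)
print(locate(ipaddress.ip_address("192.0.2.200"), clusters))
```

In this example the /24 fails the test because its known hosts are evenly split, so it is divided into two /25 halves that each pass.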
Performance of IP2Geo
Median error: GeoCluster: 28 km, GeoTrack: 102 km, GeoPing: 382 km
Summary
• IP2Geo combines several techniques that leverage different sources of information
  • GeoTrack: DNS names
  • GeoPing: network delay
  • GeoCluster: address aggregates used for routing
• Median error varies between 20 and 400 km
• Even a 30% success rate is useful, especially since we can tell when the estimate is likely to be accurate
• Forthcoming paper at SIGCOMM 2001
• For more information visit: http://www.research.microsoft.com/~padmanab/