570 likes | 852 Views
Geolocation by IP address. Locating Internet hosts. Sándor Laki lakis@inf.elte.hu http://lakis.web.elte.hu. Outline. RADAR a wireless solution IP2Geo on the Internet Constraint-Based Geolocation (GeoLim) OCTANT framework Topology-Based Geolocation. RADAR. RADAR, a wireless approach.
E N D
Sándor Laki (C) Geolocation by IP address Geolocation by IP address Locating Internet hosts Sándor Laki lakis@inf.elte.hu http://lakis.web.elte.hu
Sándor Laki (C) Geolocation by IP address Outline • RADAR a wireless solution • IP2Geo on the Internet • Constraint-Based Geolocation • (GeoLim) • OCTANT framework • Topology-Based Geolocation
Sándor Laki (C) Geolocation by IP address RADAR, a wireless approach • Focus on the indoor environment • GPS does not work indoors • Dedicated technologies • Goals • Leverage existing infrastucture • Use wireless LAN • Software solution • Better scalability and lower cost than dedicated technology
Sándor Laki (C) Geolocation by IP address RADAR • Key idea: Signal strength matching • Offline calibration • Construct radio map (<location, Sstr>) • Real-time location and tracking • Extract SStr from beacons • Find table entry that best matches the measured SStr
Sándor Laki (C) Geolocation by IP address RADAR: Determine location • Find nearest neighbor in signal space (NNSS) • 1st solution • Physical position of NNSS gives the user location • 2nd solution • K-NNSS • Average the coordinates of k nearest neighbor gives the wanted position
Sándor Laki (C) Geolocation by IP address Correlation between physical location and signal strength • Base system: • INFOCOM 2000 paper • Enhanced system: • Microsoft Technical Report MSR-TR-2000-12
Sándor Laki (C) Geolocation by IP address IP2Geo Single-point localization
Sándor Laki (C) Geolocation by IP address IP2Geo - Motivation • Much focus on location-aware services in wireless and mobile contexts • Such services are relevant in the Internet context too • targeted advertising • event notification • territorial rights management • network diagnostics • It is a challenging problem • IP address does not inherently contain an indication of location
Sándor Laki (C) Geolocation by IP address IP2Geo • Multi-pronged approach that exploits various “properties” of the Internet • DNS names of router interfaces often indicate location • network delay tends to correlate with geographic distance • hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically • GeoTrack • determine location of closest router with a recognizable DNS name • GeoPing • use delay measurements to estimate location • GeoCluster • extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters
Sándor Laki (C) Geolocation by IP address GeoTrack – main idea • Extract geographical information from DNS names of routers on the path • Localizes the target to the last router whose position is known • Example • ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle • 184.atm6-0.xr2.ewr1.alter.net => New York • dnvr-scrm.abilene.ucaid.edu => Denver
Sándor Laki (C) Geolocation by IP address GeoTrack • GeoTrack operation • do a traceroute to the target IP address • determine location of last recognizable router along the path • Key ideas in GeoTrack • partitioned city code database to minimize chance of false match • ISP-specific parsing rules • delay-based correction • Limitations • routers may not respond to traceroute • DNS name may not contain location information or lookup may fail • target host may be behind a proxy or a firewall
Sándor Laki (C) Geolocation by IP address GeoPing - Delay based localization • Delay-based triangulation is conceptually simple • delay to distance • distance from 3 or more non-colinear points => target location • But there are practical difficulties • network path may be circuitous • transmission & queuing delays may corrupt delay estimate • OWD is hard to measure • OWD ≠ RTT/2 because of routing asymmetry
Sándor Laki (C) Geolocation by IP address GeoPing - details • Measure the network delay to the target host from several geographically distributed probes • typically more than 3 probes are used • round-trip delay measured using ping utility • small-sized packets => transmission delay is negligible • pick minimum among several delay samples • Nearest Neighbor in Delay Space (NNDS) • akin to Nearest Neighbor in Signal Space (NNSS) in RADAR • construct a delay map containing (delay vector,location) tuples • given a vector of delay measurements, search through the delay map for the NNDS • location of the NNDS is our estimate for the location of the target host • More robust that directly trying to map from delay to distance
Sándor Laki (C) Geolocation by IP address GeoPing – Delay tends to increase with geographic distance
Sándor Laki (C) Geolocation by IP address GeoPing – Estimation error
Sándor Laki (C) Geolocation by IP address GeoCluster A passive method
Sándor Laki (C) Geolocation by IP address GeoCluster • A passive technique unlike GeoTrack and GeoPing • Basic idea: • breaks the IP address space into clusters • assign a geographical location to each cluster based on IP-to-location third party databases • given a target IP address, first find the matching cluster using longest-prefix match. • location of matching cluster is our estimate of host location
Sándor Laki (C) Geolocation by IP address GeoCluster • Example: • consider the cluster 128.95.0.0/16 (containing 65536 IP addresses) • suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle • then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too
Sándor Laki (C) Geolocation by IP address GeoCluster – Clustering IP addresses • Exploit the hierarchical nature of Internet routing • inter-domain routing in the Internet uses the Border Gateway Protocol (BGP) • BGP operates on address aggregates • we treat these aggregates as clusters • in all we had about 100,000 clusters of different sizes
Sándor Laki (C) Geolocation by IP address IP-to-location mapping • Data sources • e-mail service, business web-hosting companies, etc. • requires a large, fine-grain and fresh database! • Information • partial information (i.e., only for a small subset of addresses) • possibly inaccurate (e.g., manual input from user)
Sándor Laki (C) Geolocation by IP address Extrapolating IP-to-location mapping • Determine location most likely to correspond to a cluster • majority polling • “average” location • dispersion is an indicator of our confidence in the location estimate • What if there is a large geographic spread in locations? • some clusters correspond to large ISPs and the internal subdivisions are not visible at the BGP level • sub-clustering algorithm: keep sub-dividing clusters until there is sufficient consensus in the individual sub-clusters • some clients connect via proxies or firewalls (e.g., AOL clients) • sub-clustering may help if there are local or regional proxies • otherwise large dispersion => no location estimate made • many tools fail in this regard
Sándor Laki (C) Geolocation by IP address Performance of GeoCluster Median errors: GeoCluster ~30km GeoPing ~300km GeoTrack ~100km
Sándor Laki (C) Geolocation by IP address Other database-oriented applications • NetGeo and IP2LL • based on WHOIS DB • not closely regulated • the address information often indicates the head office of the owner which may be far from the actual target • Quova • Commercial service with thier own database • Gtrace • using DNS LOC entries
Sándor Laki (C) Geolocation by IP address Octant framework A very impressive solution
Sándor Laki (C) Geolocation by IP address Octant overview • Combine very different techniques • Active and passive • Constraint-based • Weighted positive and negative constraints • Constraint => region • Using Bézier-regions • Efficient implementations of clipping and union operations are available
Sándor Laki (C) Geolocation by IP address Octant - Notations • bi : the region in which the target node is located • gj : a constraint • It is a region where the node might be reside associated with weight • Set of nodes • Landmarks: physical locations are at least partially known (Lj) • Every Lj has an „estimated” location bLj
Sándor Laki (C) Geolocation by IP address Octant – Landmarks and constraints • Primary landmark • GPS, street address • Low error • Secondary landmark • Position computed by Octant itself • Positive constraints ( set: ) • Node A is within d miles of Lk • g = (x,y) in bkc(x,y,d), where c(x,y,d) is a disc. • Negative constraints( set: ) • Node A is further than d miles from Lk • g = (x,y) in bkc(x,y,d)
Sándor Laki (C) Geolocation by IP address Estimated location • bi = XiXi \ XiXi
Sándor Laki (C) Geolocation by IP address Mapping latencies to distances • Latency between a target and a landmark • bounds thier maximum distance • Calculate with speed of light • delay*2/3*c • Low precision • Octant’s way • Dynamic calibration • For each L landmark compute two bounds RL(d) and rL(d) • where d is the ping time of node i • rL(d) || loc(L) – loc(i) || RL(d) • When queuing delays are dominant then rL(d) = 0.
Sándor Laki (C) Geolocation by IP address Mapping latencies to distances • Each landmark periodically pings all other landmarks => creating a correlation table • Determines the convex hull around the points => R(d) and r(d) • It is sufficient when the target has a direct and congestion-free path to the landmark • Octant introduce a cut off at latency p • a tunable percentage of landmark lie to the left of p • discard the others • (z is a fictitious datapoint, • placed far away)
Sándor Laki (C) Geolocation by IP address Mapping latencies to distances
Sándor Laki (C) Geolocation by IP address Last hop delays • Mapping is further complicated by queuing and transmission delays associated with the last hop • Cable and DSL connections • Overloaded PlanetLAB nodes • Goal: isolate the delay components which artificially inflate latencies • Detailed maps of the underlying physical network, as in network tomography (not in Octant) • Octant introduce a simple metric called height
Sándor Laki (C) Geolocation by IP address Last hop delays in Octant • Based on pair-wise latency measurements between landmarks • Primary landmarks: a, b, c • Measure thier latencies(RTT): [a,b], [a,c], [b,c] • The positions of primary landmarks are known -> we can estimate the transmission delays: (a,b), (a,c), (b,c) • Lasthop delay(a,b) = [a,b] - (a,b) • Landmark coordinates (alon, alat),…
Sándor Laki (C) Geolocation by IP address Last hop delays in Octant • How much of the delays can be attributed to each landmark? • Denoted by a’, b’ and c’ // height • Similarly, for a target t, we can compute t’, as an estimation… • We can solve for • t’, tlon, tlat
Sándor Laki (C) Geolocation by IP address Last hop delays • tlon and tlat has relatively high error • not used in the later stages • Given the target and landmark heights • Each landmark can shift its • RL up if t’ < heights of the other landmarks • rL down if t’ > heights of the other landmarks
Sándor Laki (C) Geolocation by IP address Indirect routes • The preceding assumption • Route lengths are proportional to great circle distances • not the case in practise, due to policy routing • Example: a subscriber Ithaca, NY -> Cornell Univ. (Ithaca) • Syracuse, NY -> Brockport, IL -> New York City -> Cornell Univ. • ~1 mile physical distance VS. ~800 miles length path
Sándor Laki (C) Geolocation by IP address Indirect routes discovery • Landmark’s heigth can indicate… • Localizing routers on the network path • Secondary landmarks • Localization by latencies • Extract location from router names • Reverse DNS lookup – undns tool • Using ZIP code to determine geographical location
Sándor Laki (C) Geolocation by IP address Handling uncertainty • Filter out errorneous constraint • Latency based constraints • Weight system that decreases exponentially with increasing latency • Weight threshold
Sándor Laki (C) Geolocation by IP address Iterative refinement • Two phase: • First, we use accurate and mostly conservative constraints • Second, less acurate and more aggressive constraints to obtain a better estimation (inside the initial estimated region) • …
Sándor Laki (C) Geolocation by IP address Topology-based Geolocation
Sándor Laki (C) Geolocation by IP address Motivations • Problems with CBG • Use constraints that are less than speed of light • Risk of underestimates • When an underestimate occurs, the final region does not contain the true location • Topology based geolocation • using the speed of light to generate constraints • inspired by Sensor Network Localization
Sándor Laki (C) Geolocation by IP address Summary of techniques • Traceroute from landmarks • Map topology • Estimate hop latency • Improve accuracy • Cluster network interfaces • Increase structuring • Validate location hints • Incorporate location hints • Constraint optimization • Geolocate targets
Sándor Laki (C) Geolocation by IP address Estimate hop latencies • Using traceroute tool to infer link latency • Estimate hop latency from the difference in RTT to adjacent routers • Accurate only if the link is traversed both directions (symmetric routing) • How can we discover this property? • Three different techniques
Sándor Laki (C) Geolocation by IP address Estimate hop latencies • First, observing the reverse TTL values • Most routers initialize the TTL values for thier packets from a small set. • 30,32,64,128,150,255 • If TTL values changes significantly from one node to the next => discard the link estimate • Second, measuring paths in both direction between pairs of landmarks • If both paths traverse a particular link => taking the differences of measurements to the two endpoints • This estimation has high confidence • Third, increasing vantage points from which we probe a certain link • For every link on a path from a landmark we probes to both endpoints from all other landmarks… • If these probes pass over the link => estimate for the link
Sándor Laki (C) Geolocation by IP address Clustering interfaces • Clustering interfaces that belong to the same router (IP aliases)
Sándor Laki (C) Geolocation by IP address Clustering interfaces • Two IP-aliases techniques • Mercator • UDP probes are send to high-numbered ports on a set of interfaces • Routers send back a port-unreachable ICMP message with the source address • If two diff. interfaces replie with the same source address => aliases • Ally • Used on pairs of interfaces • Sends probes to the two if. • Examines the IP-ID • Most routers generate the IP-ID using a single counter that has incremented after each packet has been created
Sándor Laki (C) Geolocation by IP address Validating location hints • DNS names -> locations • Some names are incorrect… • Missnamed, reconfig, reassignment of IP addresses • Topology constraints can be used to verify location hints • RTT measurements -> upper bounds… • Clustering -> aliases • Hop latencies