RandPing: A Randomized Algorithm for IP Mapping • Michelle Liu • Yuhan Cai
Outline • Introduction • Related Work • Background • Algorithm Overview • Experimental Evaluation • Conclusions and Future Work
Introduction • Motivations • Collection of personalized information • Authorities of transactions • Problem statement • IP mapping: given an IP address p, find the geographic location of the Internet host with that address. • Challenges • No authoritative database • IP addresses do not contain geographic information
Related Work • DNS-based approach • Using DNS records from databases • IP2LL, NetGeo, and GeoTrack • DNS names might not be related to locations • Delay-based approach • Exploiting the relationship between distances and network delays • GeoPing and CBG • Clustering-based approach • Splitting the IP address space into clusters • Assumption: all hosts within the same cluster are co-located
Background • Best line bound • Above the baseline • Below all data points • Closest to all data points
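The three best-line properties can be read as a small linear program. The sketch below is one possible reading, not the authors' code: it assumes the data points are (distance, RTT) pairs for one probing machine, that "above the baseline" means a slope no smaller than a given baseline slope and a non-negative intercept, and that "closest" means minimum total vertical gap. The function name `best_line` and its parameters are hypothetical.

```python
from itertools import combinations

def best_line(points, baseline_slope):
    """Return (slope, intercept) of a line that lies below all (distance, rtt)
    points, is at least as steep as the baseline, and minimizes the total
    vertical gap to the points. Assumed formulation, solved by vertex search."""
    # Each constraint has the form a*m + c*b <= rhs in the unknowns (m, b).
    cons = [(-1.0, 0.0, -baseline_slope),          # m >= baseline_slope
            (0.0, -1.0, 0.0)]                      # b >= 0
    cons += [(d, 1.0, rtt) for d, rtt in points]   # m*d + b <= rtt
    sum_d, n = sum(d for d, _ in points), len(points)
    best, best_val = None, float("-inf")
    # The optimum of a 2-variable linear program sits at a vertex, i.e. at the
    # intersection of two constraint boundaries, so try every pair.
    for (a1, c1, r1), (a2, c2, r2) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue  # parallel boundaries never intersect
        m = (r1 * c2 - r2 * c1) / det
        b = (a1 * r2 - a2 * r1) / det
        if all(a * m + c * b <= r + 1e-9 for a, c, r in cons):
            val = m * sum_d + n * b  # maximizing this minimizes the total gap
            if val > best_val:
                best, best_val = (m, b), val
    return best
```

The pairwise search is cubic in the number of points, which is acceptable for a handful of probing machines; a linear-programming solver would be the natural replacement for larger inputs.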
Background (cont.) • Clustering • Partitioning Around Medoids (PAM) • Quality of a clustering = average distance from an object to the medoid of its cluster • Outlier detection • An object O in a dataset T is a DB(p, D)-outlier if at least a fraction p of the objects in T lie at a distance greater than D from O. • Scriptroute system • A system that allows network measurements to be conducted from remote vantage points
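The DB(p, D)-outlier definition translates almost directly into code. A minimal sketch, assuming the objects are points in the plane and distance is Euclidean; the function name `db_outliers` is hypothetical.

```python
import math

def db_outliers(points, p, D):
    """Return the points O for which at least a fraction p of all points
    lie farther than distance D from O."""
    outliers = []
    for o in points:
        # Count how many points are strictly farther than D from o.
        far = sum(1 for q in points if math.dist(o, q) > D)
        if far >= p * len(points):
            outliers.append(o)
    return outliers
```

This brute-force version compares every pair of points, which is fine for the small sets of estimates used here.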
Algorithm Overview • Overall idea • Clustering probing machines • Random selection of a small set of probing machines • Reduction of search space by pruning • Major steps • Preprocessing stage • Randomized pinging • Location estimation
Preprocessing Stage • Construction of RTT table and Distance table for probing machines • Computation of the best line for each probing machine subject to the constraint:
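Preprocessing (cont.) • Clustering of probing machines based on their geographic locations • Transformation of the geographic system to a Cartesian coordinate system • x = 2πR cos(T0) (G - G0) / 360 • y = 2πR (T - T0) / 360 • T and G are latitude and longitude in degrees, (T0, G0) is the reference point, and R is the Earth's radius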
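The transformation is a local flat-Earth projection around a reference point. A minimal sketch, assuming T is latitude and G is longitude in degrees, and using 2πR/360 as the arc length per degree; the Earth radius value and the function name `to_cartesian` are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed value

def to_cartesian(lat, lon, lat0, lon0, radius=EARTH_RADIUS_KM):
    """Project (lat, lon) in degrees to planar (x, y) in km around (lat0, lon0)."""
    km_per_degree = 2 * math.pi * radius / 360
    # East-west distances shrink with the cosine of the reference latitude.
    x = km_per_degree * math.cos(math.radians(lat0)) * (lon - lon0)
    y = km_per_degree * (lat - lat0)
    return x, y
```

The approximation degrades as points move far from the reference latitude, so the reference point is best chosen near the middle of the region covered by the probing machines.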
Randomized Pinging • Random selection of m clusters • Random selection of k probing machines within each cluster • Pinging the target machine to get n = m*k RTT measurements
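The two-level random selection can be sketched as follows. The data layout (a dict from cluster id to a list of machine names) and the function name `select_probes` are assumptions, and the sketch only picks the probing machines; it does not send any pings.

```python
import random

def select_probes(clusters, m, k, rng=random):
    """Pick m clusters at random, then k probing machines from each,
    giving n = m * k machines in total."""
    # Sorting the ids gives a stable sequence to sample from.
    chosen_clusters = rng.sample(sorted(clusters), m)
    probes = []
    for cluster_id in chosen_clusters:
        # Raises ValueError if a cluster holds fewer than k machines.
        probes.extend(rng.sample(clusters[cluster_id], k))
    return probes
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing trials.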
Location Estimation • Computation of estimated distances • Determination of the best group of circles by dynamic programming • Keep track of groups of circles • Incrementally build up each group • Pick the biggest group
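One plausible way to compute an estimated distance is to invert a probing machine's best line. The sketch below assumes the line has the form rtt = slope * distance + intercept, so the result is the radius of the circle around that machine; the function name and parameters are hypothetical.

```python
def estimated_distance(rtt, slope, intercept):
    """Invert rtt = slope * distance + intercept to get a distance estimate.
    Clamped at zero in case the measured RTT falls below the intercept."""
    return max(0.0, (rtt - intercept) / slope)
```

Because the best line lies below all measured points, a distance obtained this way tends to overestimate, so each circle is more likely to contain the target than to exclude it.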
Location Estimation (cont.) • Locating the target machine by non-linear programming subject to the constraints:
Location Estimation (cont.) • Repeat the process r times • Computation of the centroid for the r estimated locations • Prune out distance-based outliers • Compute the centroid of the remaining points
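The final aggregation step can be sketched as below, assuming the r estimates are planar (x, y) points and that pruning uses the DB(p, D) rule with Euclidean distance. The function name `robust_centroid` and the fallback to all points when everything is pruned are assumptions.

```python
import math

def robust_centroid(estimates, p, D):
    """Drop DB(p, D)-outliers from the estimates, then average what is left."""
    def centroid(pts):
        n = len(pts)
        return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

    total = len(estimates)
    # Keep a point unless at least a fraction p of all points are farther than D.
    kept = [o for o in estimates
            if sum(1 for q in estimates if math.dist(o, q) > D) < p * total]
    # Fall back to all estimates if pruning removed every point.
    return centroid(kept or estimates)
```

A single far-off estimate would otherwise drag the plain centroid toward it, which is the case the pruning step is meant to handle.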
Experimental Results • Setup • Machines selected from PlanetLab in the US • A small set of machines serve as target machines, the rest as probing machines • Results • Error distance: distance between the real location of the target machine and the estimated one
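The error distance can be computed from two latitude/longitude pairs. The slides do not say which distance formula was used; the sketch below assumes great-circle distance via the haversine formula, with an assumed Earth radius and a hypothetical function name.

```python
import math

def error_distance_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between the real and estimated locations,
    both given in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

If the estimates are kept in the planar coordinates from the preprocessing stage, a plain Euclidean distance would be the simpler alternative.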
Experimental Analysis • Limited number of probing machines • Effect of randomization is not obvious • The best line estimation is too conservative. • Intersection region of the circles is too big.
Conclusions • A randomized approach for IP mapping using clustering and outlier detection • Location estimation based on dynamic programming and non-linear programming
Future Work • Adjusting the algorithm parameters: • number of clusters • number of trials • number of picked machines • Proving a lower bound for the difference in accuracy between the randomized algorithm and a deterministic algorithm