1. Predicting Communication Latency in the Internet
Dragan Milic
Universität Bern
2. Table of Contents
Introduction
IDMaps
Coordinate-based RTT prediction
GNP
Simplex Downhill
Vivaldi
ICS
Principal Component Analysis
Own research ideas
3. Introduction
Why is communication latency prediction in the Internet important?
TCP throughput
QoS
P2P
Problems of latency measurements
One way latency
time synchronization
asymmetric links => RTT != 2 x one way latency
Scaling problem of full mesh measurements
time needed to measure the latency to all potential communication partners
O(n^2)
4. IDMaps [Francis et al. 2001]
Pioneer work about RTT prediction in the Internet
A global service for RTT estimation
Estimation of the RTT using triangulation
Proactive measurements
5. IDMaps: Architecture
Address Prefix (AP): Consecutive address range of IP addresses within which all hosts with assigned addresses are equidistant (with some tolerance) to the rest of the Internet.
Tracer: A host deployed in the access network (AS). The tracer measures the network distance to all other tracers in the Internet.
Virtual Link (VL): A raw distance between two tracers (Tracer-Tracer VL) and between a tracer and an AP (Tracer-AP VL).
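A minimal sketch of how these pieces could combine into an RTT estimate, assuming the estimate is the sum of the two Tracer-AP VLs and the Tracer-Tracer VL between them. The names (`tracer_to_tracer`, `ap_to_tracer`, `estimate_rtt`) and all values are hypothetical, chosen only for illustration.

```python
# Hypothetical virtual links, in milliseconds.
tracer_to_tracer = {("T1", "T2"): 40.0}          # Tracer-Tracer VLs
ap_to_tracer = {"AP-A": ("T1", 5.0),             # AP -> (its tracer, Tracer-AP VL)
                "AP-B": ("T2", 8.0)}


def tracer_distance(t1, t2):
    """Look up a Tracer-Tracer VL, stored once per unordered pair."""
    if t1 == t2:
        return 0.0
    if (t1, t2) in tracer_to_tracer:
        return tracer_to_tracer[(t1, t2)]
    return tracer_to_tracer[(t2, t1)]


def estimate_rtt(ap_src, ap_dst):
    """Estimate the AP-to-AP distance by chaining three virtual links."""
    t_src, d_src = ap_to_tracer[ap_src]
    t_dst, d_dst = ap_to_tracer[ap_dst]
    return d_src + tracer_distance(t_src, t_dst) + d_dst


print(estimate_rtt("AP-A", "AP-B"))  # 5 + 40 + 8 = 53.0
```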
6. IDMaps: Architecture (2)
7. IDMaps: Drawbacks
Deployment: Infrastructure support is needed. One tracer must be deployed to each access AS.
Scalability: Each tracer measures and stores the RTT to all other tracers in the Internet, so storage and measurement traffic grow quadratically with the number of deployed tracers: O(n^2).
8. Coordinate-based RTT prediction
Idea:
Each host in the Internet is assigned one point in a virtual n-dimensional Euclidean space.
The Euclidean distance between two points in this space predicts the RTT between the corresponding hosts.
Problem:
The Internet cannot be mapped exactly onto an ideal Euclidean space.
Solution:
The coordinates of each host must be chosen so that the squared difference between measured and predicted RTTs is minimized.
Practical implementations:
GNP, PIC, NPS
Vivaldi, Big Bang simulation
ICS
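The idea and solution on this slide can be written as an optimization problem. The notation is introduced here for illustration and is not taken from the slides: x_i is the coordinate vector of host i, d_ij the measured RTT between hosts i and j, and \hat{d}_ij the predicted RTT.

```latex
\hat{d}_{ij} = \lVert x_i - x_j \rVert_2
             = \sqrt{\sum_{k=1}^{n} \left(x_{i,k} - x_{j,k}\right)^2},
\qquad
\min_{x_1, \dots, x_N} \; E(x_1, \dots, x_N)
  = \sum_{i < j} \left(d_{ij} - \hat{d}_{ij}\right)^2
```

The listed systems differ mainly in how they search for coordinates with a small E, not in this basic objective.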
9. Global Network Positioning (GNP) [Ng et al. 2002]
GNP procedure:
Each host measures the RTT to a fixed set of hosts (landmarks). To determine the coordinates of a host uniquely, at least n+1 landmarks are needed for an n-dimensional space.
Using the landmark positions and the measured RTTs, each host calculates its own coordinates by minimizing the squared difference between predicted and measured RTTs with the simplex downhill method for function minimization.
Determining landmark coordinates:
Each landmark measures the RTT to all other landmarks.
One landmark (the leading landmark) receives the measurement results from all landmarks and calculates the coordinates of every landmark by minimizing the squared difference between the distances predicted from the landmark coordinates and the measured distances. The function is minimized using the simplex downhill method.
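A minimal sketch of the host-side objective described above, assuming a 2-D space with three landmarks (n+1 for n = 2). The function name `host_error` and all coordinates are hypothetical; the RTTs are generated from a known position so that the true coordinates give zero error. A minimizer such as simplex downhill would search for the point where this function is smallest.

```python
import math

# Hypothetical landmark coordinates, already computed by the leading landmark.
landmarks = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]

# Hypothetical "real" host position, used only to generate consistent RTTs.
true_host = (30.0, 40.0)
measured_rtts = [math.dist(true_host, lm) for lm in landmarks]


def host_error(host, landmarks, measured_rtts):
    """Sum of squared differences between measured and predicted RTTs."""
    return sum((rtt - math.dist(host, lm)) ** 2
               for lm, rtt in zip(landmarks, measured_rtts))


# The objective vanishes at the true position and is positive elsewhere.
print(host_error(true_host, landmarks, measured_rtts))
print(host_error((50.0, 50.0), landmarks, measured_rtts))
```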
10. GNP (2)
11. Simplex Downhill [Nelder and Mead 1965]
A numerical algorithm for n-dimensional function minimization
Simplex: the simplest object that can be constructed using n+1 points in an n-dimensional space (e.g. a triangle in 2-D, a tetrahedron in 3-D).
Input of the algorithm: a function to be minimized, initial corner points of a simplex (usually randomly chosen) and a condition for stopping the iteration (like the maximal number of iterations or the minimal progress for each iteration).
Possible transformations of the simplex in each iteration: reflecting (and optionally expanding), contracting and contracting in all directions.
12. Simplex Downhill (2)
The algorithm:
Find the high and low corner points of the simplex by evaluating the function at each corner point.
Try to replace the high point with a better one by moving it relative to all other points: reflecting, stretching, reflecting and stretching, or contracting. If one of these transformations yields a lower function value, the new point replaces the high point.
If none of the above transformations leads to a better high point, the simplex is contracted in the direction of the low point (all other points are moved in the direction of the low point).
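The steps above can be sketched in a few lines of Python. This is a simplified variant written for illustration, not a reference implementation: the coefficients (reflect by 1, stretch by 2, contract by 0.5), the stopping rule based on simplex size, and the name `simplex_downhill` are assumptions.

```python
def simplex_downhill(f, simplex, max_iter=500, tol=1e-8):
    """Minimize f, starting from n+1 corner points in n dimensions."""
    pts = [list(p) for p in simplex]
    n = len(pts) - 1
    for _ in range(max_iter):
        pts.sort(key=f)                     # pts[0] is the low point, pts[-1] the high point
        low, high = pts[0], pts[-1]
        size = max(abs(p[i] - low[i]) for p in pts for i in range(n))
        if size < tol:                      # simplex has collapsed: stop
            break
        # Centroid of all corner points except the high point.
        c = [sum(p[i] for p in pts[:-1]) / n for i in range(n)]

        def move(t):
            """Point on the line through the centroid and the high point."""
            return [c[i] + t * (high[i] - c[i]) for i in range(n)]

        reflected = move(-1.0)
        if f(reflected) < f(low):
            stretched = move(-2.0)          # reflect and stretch
            pts[-1] = stretched if f(stretched) < f(reflected) else reflected
        elif f(reflected) < f(pts[-2]):
            pts[-1] = reflected             # plain reflection
        else:
            contracted = move(0.5)          # pull the high point towards the centroid
            if f(contracted) < f(high):
                pts[-1] = contracted
            else:
                # Contract in all directions towards the low point.
                pts = [low] + [[(p[i] + low[i]) / 2 for i in range(n)]
                               for p in pts[1:]]
    return min(pts, key=f)


def bowl(p):
    """Test function with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best = simplex_downhill(bowl, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(best)  # close to [3, -1]
```

The method needs only function values, no derivatives, which is why it is easy to apply but slower than derivative-based methods.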
13. Simplex Downhill (3)
14. GNP: Drawbacks
Using the same landmarks for all hosts does not scale well.
High network load on each landmark.
Solution proposed in NPS [Ng 2004].
Three levels of landmarks.
Only the landmarks of the first level are the real landmarks.
The landmarks in the other levels are ordinary hosts that other hosts use as landmarks.
Requires infrastructure (fixed landmarks).
Landmarks must be deployed in the Internet.
Solution proposed in PIC [Costa et al. 2004].
First n+1 hosts which need positioning become landmarks and compute their coordinates as described for GNP landmarks.
Subsequent hosts use the first n+1 hosts as landmarks.
15. GNP: Drawbacks (2)
Simplex downhill
Does not always find the global minimum.
The result depends on the starting points (initial simplex).
Simplex downhill does not converge as fast as other function minimization methods (e.g. nonlinear Newton or Gauss-Newton) that exploit additional knowledge about the function to be minimized, such as its derivatives.
The solution for the coordinates of the landmarks is under-determined: infinite number of solutions.
16. Vivaldi [Dabek et al. 2004]
Determining coordinates using physical model simulation (virtual springs)
All hosts start at the same coordinates.
Each host measures the RTT to a few other hosts.
After each measurement, the host moves its coordinates a small step in the direction of the spring force, so that the difference between the predicted and the measured distance (the potential energy of the virtual spring) is partially reduced.
Requires no infrastructure.
Distributed algorithm.
Similar (but more complex) model: Big Bang Simulation [Shavitt et al. 2004]
Takes the kinetic energy and friction into account.
More complex model without distributed algorithm.
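A minimal sketch of the spring-relaxation update described on this slide, assuming a constant step size `delta` (the published algorithm also adapts the step size, which is omitted here). The function names, the four host positions and the number of rounds are hypothetical.

```python
import math
import random


def vivaldi_update(xi, xj, rtt, delta=0.25, rng=random):
    """Move host i a fraction delta along the spring force exerted by host j."""
    diff = [a - b for a, b in zip(xi, xj)]
    predicted = math.hypot(*diff)
    if predicted == 0.0:
        # Hosts at identical coordinates: push apart in a random direction.
        diff = [rng.uniform(-1.0, 1.0) for _ in xi]
    norm = math.hypot(*diff) or 1.0
    force = rtt - predicted                 # > 0: too close, < 0: too far apart
    return [a + delta * force * d / norm for a, d in zip(xi, diff)]


def total_error(coords, rtt):
    """Sum of absolute prediction errors over all host pairs."""
    n = len(coords)
    return sum(abs(rtt[i][j] - math.dist(coords[i], coords[j]))
               for i in range(n) for j in range(i + 1, n))


rng = random.Random(0)
true_pos = [(0, 0), (60, 0), (0, 80), (60, 80)]          # used only to generate RTTs
rtt = [[math.dist(p, q) for q in true_pos] for p in true_pos]

coords = [[0.0, 0.0] for _ in true_pos]                  # all hosts start at the same point
before = total_error(coords, rtt)
for _ in range(2000):
    i, j = rng.sample(range(len(coords)), 2)             # host i measures the RTT to host j
    coords[i] = vivaldi_update(coords[i], coords[j], rtt[i][j], rng=rng)
after = total_error(coords, rtt)
print(before, after)
```

Each host changes only its own coordinates and needs only the coordinates of the host it just measured, which is what makes the algorithm distributed.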
17. Vivaldi: Example
18. Vivaldi: Drawbacks
Unstable
A new host can affect positions in the whole system.
Numerous “moves” are needed until a newly joined host reaches its ideal position.
Oscillation
When the algorithm is applied to measurements collected from the Internet, the whole system seems to be oscillating (most of the hosts are constantly changing their coordinates).
Possible solution for the oscillation of the system: Stable Vivaldi [de Launois 2004]: using a loss factor increasing with time for each spring.
19. Open Questions for GNP and Vivaldi
GNP: Which hosts should be chosen as landmarks?
GNP and Vivaldi: How many dimensions should the virtual space have?
20. Internet Coordinate System (ICS) [Lim et al. 2003]
Uses Principal Component Analysis (PCA) to determine landmark and host coordinates.
Lower computational overhead than GNP (only basic matrix transformations and eigenvalue decomposition).
Positions of landmarks in the virtual space can be uniquely determined.
The sufficient number of dimensions needed to represent the whole system as a virtual space can be computed!
21. Principal Component Analysis (PCA)
Linear transformation of the sample data using eigenvalues and eigenvectors.
The data is transformed so that its variance decreases with each successive dimension.
The new representation of the data allows reducing the number of dimensions with minimal loss of information.
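A minimal sketch of PCA in plain Python, assuming the covariance matrix is decomposed by power iteration with deflation (a simple technique chosen here to avoid dependencies, not necessarily what ICS uses). The function names and the sample data are hypothetical.

```python
import math


def covariance(rows):
    """Return the sample covariance matrix and the mean-centered data."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered


def top_eigenpair(m, iters=1000):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    d = len(m)
    scale = math.sqrt(sum((k + 1) ** 2 for k in range(d)))
    v = [(k + 1) / scale for k in range(d)]               # arbitrary unit start vector
    for _ in range(iters):
        w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(rows, k):
    """Return the k largest variances, their directions, and the projected data."""
    cov, centered = covariance(rows)
    d = len(cov)
    variances, components = [], []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        variances.append(lam)
        components.append(v)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    projected = [[sum(c[a] * v[a] for a in range(d)) for v in components]
                 for c in centered]
    return variances, components, projected


# Hypothetical 2-D samples lying close to the line y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
variances, components, projected = pca(data, 2)
print(variances)  # the first variance dominates; the second is small
```

Because almost all variance falls into the first dimension here, dropping the second dimension loses little information, which is the property the slide refers to.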
22. ICS: Algorithm
Landmarks:
Determine the RTT between all landmarks (represented as an n x n matrix for n landmarks).
Perform PCA of the RTTs. The result is the PCA transformation matrix.
Determine the number of dimensions that are sufficient to represent most of the measured data.
Scale the calculated transformation using a least-squares estimator so that distances between the landmarks are preserved in the transformed space.
Hosts:
Measure the distance to all (or a subset of) landmarks. Represent the measurements as an n x 1 matrix.
Retrieve the scaled transformation matrix and calculate the host position by multiplying the distance matrix with the received transformation matrix.
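Two steps of the procedure above lend themselves to a short sketch: choosing the number of dimensions and computing a host position. The 95 % threshold, the function names, the eigenvalues and the 2 x 3 transformation matrix are all hypothetical values for illustration; the slides do not specify them.

```python
def sufficient_dimensions(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def host_coordinates(transform, measurements):
    """Multiply the scaled k x n transformation matrix by the n x 1 RTT vector."""
    return [sum(t * m for t, m in zip(row, measurements)) for row in transform]


# Hypothetical PCA eigenvalues for five landmarks.
print(sufficient_dimensions([70.0, 20.0, 7.0, 2.0, 1.0]))  # 3 (70 + 20 + 7 = 97 %)

# Hypothetical scaled transformation (2 dimensions, 3 landmarks) and host RTTs.
transform = [[0.5, 0.5, 0.0],
             [0.0, 0.5, -0.5]]
print(host_coordinates(transform, [10.0, 20.0, 30.0]))  # [15.0, -5.0]
```

A host only performs one matrix-vector product, which is where the lower computational overhead compared with iterative minimization comes from.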
23. Drawbacks of the ICS
Part of the information that could be exploited (e.g. by function minimization) is lost.
Seems to perform worse than GNP (results not yet verified).
24. Own Research Ideas
Using multilateration (non-linear Newton iterative method) to determine the host and landmark coordinates.
Is already used by GPS.
Converges faster than Simplex Downhill.
Can find a global minimum (uses more information about the function that is minimized than the Simplex Downhill method).
Drawback: Solution for the landmarks is under-determined (there are multiple solutions) which leads to divergence of the non-linear Newton iterative method.
Using PCA to determine the number of dimensions that is needed for the virtual space and as a starting point for the non-linear Newton iterative method.
Analyzing how the choice of the landmarks influences the overall error of the system (simulations with generated topologies).
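The multilateration idea above might look like the following sketch. It uses a Gauss-Newton iteration in 2-D, which is one member of the Newton family and an assumption about what the slides intend; the function name, landmark coordinates and starting point are hypothetical. Only the host position is solved for, with landmark coordinates taken as known.

```python
import math


def gauss_newton_position(landmarks, rtts, start, iterations=20):
    """Estimate a 2-D host position from landmark coordinates and measured RTTs."""
    x, y = start
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (lx, ly), rtt in zip(landmarks, rtts):
            dx, dy = x - lx, y - ly
            dist = math.hypot(dx, dy) or 1e-12
            jx, jy = dx / dist, dy / dist        # gradient of the predicted distance
            residual = dist - rtt                # predicted minus measured
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * residual
            g2 += jy * residual
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:                     # degenerate geometry: give up
            break
        # Solve the 2 x 2 normal equations (J^T J) step = J^T r and take the step.
        x -= (a22 * g1 - a12 * g2) / det
        y -= (a11 * g2 - a12 * g1) / det
    return x, y


landmarks = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
true_host = (30.0, 40.0)                         # used only to generate consistent RTTs
rtts = [math.dist(true_host, lm) for lm in landmarks]
estimate = gauss_newton_position(landmarks, rtts, start=(50.0, 50.0))
print(estimate)  # close to (30, 40)
```

The result depends on the starting point, which is one reason a PCA-based initial estimate is attractive.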
25. Questions?