
Imprecise Probability and Network Quality of Service




  1. Imprecise Probability and Network Quality of Service Martin Tunnicliffe

  2. Two Kinds of Probability • Aleatory Probability: The probability of chance. • Example: “When throwing an unweighted die, the probability of obtaining a 6 is 1:6”. • Epistemic Probability: The probability of belief. • Example: “The defendant on trial is probably guilty”.

  3. Probability and Betting Odds • A “fair bet” is a gamble which, if repeated a large number of times, returns the same amount of money in winnings as the amount staked. • Example: if there is a 1:10 chance of winning a game, then the “odds” for a fair gamble would be 10:1. • Problems arise when we do not know exactly what the chance of winning is. Under such circumstances, how can we know what constitutes a fair gamble? • Behavioural interpretation of probability (Bruno de Finetti, 1906-1985): “Probability” in such cases refers to what people will consider or believe a fair bet to be. • Belief stems from experience, i.e. inductive learning.

  4. Inductive Learning • Deduction infers the specific from the general. • Example: “All dogs have four legs. Patch is a dog. Therefore Patch has four legs.” • Induction is the opposite: It infers the general from the specific. • Example: “Patch is a dog. Patch has four legs. Therefore all dogs have four legs.”

  5. Inductive Learning The last statement has little empirical support. However, consider a larger body of evidence: The statement “all dogs have four legs” now has significant plausibility or epistemic probability. However, it remains uncertain: Even with a hundred dogs, there is no categorical proof that the hundred-and-first Dalmatian will not have five legs!

  6. Approaches to Inductive Learning. • Frequentist statistics disallows the concept of epistemic probability (We cannot talk about the “probability of a five-legged Dalmatian”). Thus it offers very little framework for inductive learning. • The Objective Bayesian approach allows epistemic probability, which it represents as a single probability distribution. (This is the Bayesian Dogma of Precision). • The Imprecise Probability approach uses two distributions representing “upper probability” and “lower probability”.

  7. Marble Problem Example (shamelessly “ripped off” from P. Walley, J. R. Stat. Soc. B, 58(1), pp.3-57, 1996): Marbles are drawn blindly from a bag of coloured marbles. The event A constitutes the drawing of a red marble. The composition of the bag is unknown. For all we know, it could contain no red marbles. Alternatively, every marble in the bag may be red. Nevertheless, we are asked to compute the probability associated with a “fair gamble” on A, both a priori (before any marble is drawn) and after n marbles are drawn, j of which are red. (Marbles are replaced before the next draw.)

  8. Binomial Distribution If θ is the true (unknown) chance of drawing a red marble, the probability of drawing j reds in n draws is P(j | n, θ) = C(n, j) θ^j (1 − θ)^(n − j), where C(n, j) is the binomial coefficient. This is proportional to the “likelihood” of θ given that j red marbles have been drawn. Walley actually considers a more complex “multinomial” situation, where three or more outcomes are possible. However, I am only going to consider two possibilities: A = red marble and ~A = any other coloured marble.

  9. Bayes’ Theorem Bayes’ Theorem provides a relationship between likelihood and epistemic probability. Since θ is a continuous variable, its probability must be described by a “probability density function” or pdf, which we can denote f(θ). Let f(θ) be the “prior pdf” (representing our pre-existing beliefs about θ) and f(θ | n, j) the “posterior pdf” (representing our modified beliefs given that n trials have yielded j red marbles). Bayes’ Theorem tells us that f(θ | n, j) = P(j | n, θ) f(θ) / ∫ P(j | n, θ′) f(θ′) dθ′, where the integral runs over 0 ≤ θ′ ≤ 1.
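To make the update concrete, here is a minimal Python sketch of Bayes’ Theorem applied on a discretised grid of θ values. The grid resolution and the uniform example prior are assumptions of mine; the counts n = 10, j = 2 simply anticipate the worked example on slide 13.

```python
import numpy as np

def posterior_pdf(prior_pdf, n, j, grid):
    """Grid approximation of Bayes' Theorem for the chance theta of drawing a red.

    prior_pdf : values of the prior f(theta) on the grid
    n, j      : number of draws made and number of reds observed
    grid      : equally spaced values of theta in (0, 1)
    """
    likelihood = grid**j * (1.0 - grid)**(n - j)        # proportional to P(j | n, theta)
    unnormalised = likelihood * prior_pdf               # numerator of Bayes' Theorem
    return unnormalised / np.trapz(unnormalised, grid)  # divide by the evidence term

theta = np.linspace(1e-6, 1.0 - 1e-6, 1001)
prior = np.ones_like(theta)                  # assumed uniform prior, for illustration
post = posterior_pdf(prior, n=10, j=2, grid=theta)
print(np.trapz(theta * post, theta))         # posterior mean, approximately 0.25 here
```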

  10. Beta Model From the binomial distribution we know that P(j | n, θ) ∝ θ^j (1 − θ)^(n − j). We need a formula for f(θ). Let us assume that it follows a beta distribution: f(θ) ∝ θ^(st − 1) (1 − θ)^(s(1 − t) − 1), i.e. a Beta(st, s(1 − t)) density. Here t is the first moment (or expectation) of the distribution, representing our prior belief. The “hyper-parameter” s is the “prior strength”: the influence this prior belief has upon the posterior probability.

  11. Beta Model: Prior Distributions

  12. Beta Model: Posterior Distributions Multiplying the beta prior by the binomial likelihood gives f(θ | n, j) ∝ θ^(st + j − 1) (1 − θ)^(s(1 − t) + n − j − 1), i.e. a Beta(st + j, s(1 − t) + n − j) density. Thus the beta prior generates a beta posterior (it is the “conjugate prior” for the binomial distribution).

  13. Posterior Expectation The expectation of the posterior distribution can now be calculated: E[θ | n, j] = (st + j)/(s + n). Example: Suppose we are initially willing to bet 2:1 on a red (t = 1/2). However, the next ten draws produce only 2 reds. Assuming s = 2 gives E[θ | n, j] = (2 × 0.5 + 2)/(2 + 10) = 0.25. Under the behavioural interpretation, this is viewed as the posterior probability P(A | n, j) of a red. Thus, in the light of the new information, a fair gamble now requires odds of 4:1 on red and 4:3 against red.
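As a quick cross-check of the slide’s arithmetic, the sketch below evaluates the posterior expectation (st + j)/(s + n) and the corresponding fair odds; the helper name posterior_mean is mine, and the final line assumes the conjugacy result of slide 12.

```python
from scipy.stats import beta

def posterior_mean(j, n, s, t):
    """Posterior expectation of theta under the beta model: (s*t + j)/(s + n)."""
    return (s * t + j) / (s + n)

# Slide 13 example: prior belief t = 1/2, prior strength s = 2, then 2 reds in 10 draws.
p = posterior_mean(j=2, n=10, s=2, t=0.5)
print(p)                                               # 0.25
print("fair odds on red:      %.0f:1" % (1 / p))       # 4:1
print("fair odds against red: %.2f:1" % (1 / (1 - p))) # 1.33:1, i.e. 4:3

# Cross-check via the conjugate posterior Beta(s*t + j, s*(1 - t) + n - j).
print(beta.mean(2 * 0.5 + 2, 2 * 0.5 + 10 - 2))        # also 0.25
```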

  14. Dirichlet Distribution Walley’s paper uses the more general Dirichlet distribution. The beta distribution is the special case of the Dirichlet for which the number of possible outcomes is 2 (the sample set has cardinality 2). This leads to the “Imprecise Dirichlet Model” or IDM. The simpler beta-function model may be called the “Imprecise Beta Model” (IBM).

  15. Objective Bayesian Approach We need an initial value for t, to represent our belief that A will occur when we have no data available (j = n = 0). This is called a “non-informative prior”. Under “Bayes’ Postulate” (in the absence of any information, all possibilities are equally likely) t = 0.5. Under this assumption, the posterior expectation becomes E[θ | n, j] = (0.5s + j)/(s + n). However, a value for s is still needed.

  16. Non-Informative Priors Bayesians favour setting s to the cardinality of the sample space (in this case 2) to give a “uniform” prior: with s = 2 and t = 1/2, the prior is Beta(1, 1), i.e. uniform on [0, 1].

  17. Problems with the Bayesian Approach Problem: the Bayesian formula assigns finite probabilities to events which have never been known to happen, and which might (for all we know) be physically impossible. Even after 10 failures to draw a red, the model still supports betting 10:1 on a red!

  18. Problems with the Bayesian Approach Strict application of Bayes’ Postulate yields prior (and hence posterior) probabilities which depend on the choice of sample space (which should be arbitrary): • Two possibilities, one “successful”: t = 1/2. • Three possibilities, one “successful”: t = 1/3. • Four possibilities, two “successful”: t = 1/2. The experiment is identical in all three cases: only its representation is altered. Thus the Representation Invariance Principle (RIP) is violated.

  19. A Quote from Walley “The problem is not that Bayesians have yet to discover the truly noninformative priors, but rather that no precise probability distribution can adequately represent ignorance.” (Statistical Reasoning with Imprecise Probabilities, 1991) What does Walley mean by “precise probability”?

  20. The “Dogma of Precision” • The Bayesian approach rests upon de Finetti’s “Dogma of Precision”. • Walley (1991): “…for each event of interest, there is some betting rate which you regard as fair, in the sense that you are willing to accept either side of a bet on the event at that rate.” • Example: If there is a 1:4 chance of an event A, I am equally prepared to bet 4:1 on A and 4:3 against A.

  21. The Imprecise Probability Approach The “Imprecise Probability” approach solves the problem by removing the dogma of precision, and thus the requirement for a non-informative prior. It does this by eliminating the need for a single probability associated with A, replacing it with an upper probability and a lower probability.

  22. Upper and Lower Probabilities When no data are available, θ might take any value between 0 and 1; thus the prior lower and upper probabilities are respectively P_lower(A) = 0 and P_upper(A) = 1. Walley: Before any marbles are drawn, “…I do not have any information at all about the chance of drawing a red marble, so I do not see why I should bet on or against red at any odds. This is not a very exciting answer, but I believe that it is the correct one.”

  23. Upper and Lower Probabilities Lower Probability: The degree to which we are confident that the next marble will definitely be red. Upper Probability: The degree to which we are worried that the next marble might be red.

  24. Posterior Upper and Lower Probabilities However, the arrival of new information (j observed reds in n trials) allows these two probabilities to be modified. The prior upper and lower probabilities (1 and 0) can be substituted for t in the Bayesian formula for the posterior mean probability. Thus we obtain the posterior lower and upper probabilities: P_lower(A | n, j) = j/(n + s) and P_upper(A | n, j) = (j + s)/(n + s).
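A one-function sketch of these bounds (the function name is mine; the example numbers reuse j = 2, n = 10, s = 2 from slide 13):

```python
def imprecise_beta(j, n, s):
    """Posterior lower and upper probabilities of drawing a red under the IBM.

    These follow from substituting t = 0 and t = 1 into (s*t + j)/(s + n).
    """
    lower = j / (n + s)
    upper = (j + s) / (n + s)
    return lower, upper

low, up = imprecise_beta(j=2, n=10, s=2)
print(low, up)      # 0.1667 and 0.3333
print(up - low)     # imprecision s/(n + s) = 0.1667, independent of j
```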

  25. Properties of Upper and Lower Probabilities The amount of imprecision is the difference between the upper and lower probabilities, i.e. P_upper − P_lower = s/(n + s). This does not depend on the number of “successes” (occurrences of A). As n → ∞, the imprecision tends to zero and the lower and upper probabilities converge towards j/n, the observed success ratio. As s → ∞, the prior dominates: the imprecision becomes 1, and the lower and upper probabilities return to 0 and 1 respectively. As s → 0, the new data dominate the prior and the lower and upper probabilities again converge to j/n (Haldane’s model).

  26. Interpretation of Upper and Lower Probabilities How do we interpret these upper and lower probabilities? Which do we take as “the probability of red”? It depends on whether you are betting for or against red. If you are betting for red, you take the lower probability, since this represents the most cautious expectation of the probability of red. However, if you are betting against red, you take the upper probability, since this is associated with the lower probability of not-red. Proof: 1 − P_lower(~A | n, j) = 1 − (n − j)/(n + s) = (j + s)/(n + s) = P_upper(A | n, j).

  27. Interpretation of Upper and Lower Probabilities (For consistency, we continue to assume that s = 2.) Betting in favour of the event A, we use the lower probability, here 0.1: a “fair bet” would be 1/0.1 = 10:1 in favour of A. Betting against A, we use the lower probability of ~A, here 0.7: a “fair bet” would be 1/0.7 = 1.429:1 against A.

  28. Analogy with Possibility Theory Consider the axiom of possibility theory N(X) = 1 − Π(~X), i.e. the “necessity” of event X occurring is one minus the “possibility” of X not occurring. Similarly, the expressions for upper and lower probability show us that P_lower(A) = 1 − P_upper(~A). Thus upper probability is analogous to possibility and lower probability to necessity.

  29. Choosing a Value of s

  30. Choosing a Value of s

  31. Confidence Intervals for θ You might be tempted to think that the upper and lower probabilities represent some kind of “confidence interval” for the true value of θ. This is not the case. Upper and lower probabilities are the mean values of belief functions for θ, relevant to people with different agendas (betting for and against A).

  32. Confidence Intervals for θ Suppose we want to determine a “credible interval” (θ−(α), θ+(α)) such that we are at least α × 100 per cent “sure” that θ−(α) < θ < θ+(α). Example: α = 0.95 (95% confidence): 95% of the probability lies within this range, with 2.5% in each tail.

  33. Confidence Intervals for θ

  34. Calculating the Confidence Interval Integrating the two (lower and upper) posterior distributions, we find that the confidence limits can be computed by solving equations of the form I(θ−(α); a, b) = (1 − α)/2 and I(θ+(α); a′, b′) = (1 + α)/2, where I indicates the regularised “Incomplete Beta Function” and (a, b), (a′, b′) are the parameters of the corresponding beta posteriors. No analytic solution exists, but numerical iteration using the partition method is quite straightforward.
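One way the iteration might be carried out in Python is sketched below, using bisection (a “partition” search) on the regularised incomplete beta function. Assigning the lower limit to the t = 0 posterior Beta(j, s + n − j) and the upper limit to the t = 1 posterior Beta(j + s, n − j) is my assumption about which “two probability distributions” are meant, and the example counts are illustrative.

```python
from scipy.special import betainc   # regularised incomplete beta function I_x(a, b)

def solve_tail(a, b, target, tol=1e-10):
    """Bisection search for x in (0, 1) such that I_x(a, b) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if betainc(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def credible_interval(j, n, s, alpha=0.95):
    """Equal-tailed limits: lower from the t = 0 posterior, upper from the t = 1 posterior."""
    theta_minus = solve_tail(j,     s + n - j, (1.0 - alpha) / 2.0)
    theta_plus  = solve_tail(j + s, n - j,     (1.0 + alpha) / 2.0)
    return theta_minus, theta_plus

print(credible_interval(j=2, n=10, s=2))   # widest 95% interval for these counts
```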

  35. Frequentist Confidence Limits (binomial distributions for θ = θ−(α) and θ = θ+(α)).

  36. Frequentist Confidence Limits

  37. Comparison – Frequentist vs. Imprecise Probability When s = 1 (Perks’ prior), Imprecise Probability agrees exactly with Frequentism on the upper and lower confidence limits.

  38. Applications in Networking • Network Management and Control often requires decisions to be made based upon limited information. • This could be viewed as gambling on imprecise probabilities. • Monitoring Network Quality-of-Service. • Congestion Window Control in Wired-cum-Wireless Networks.

  39. Network Quality of Service (QoS) QoS is considered end-to-end, between host/end systems. Different types of applications have different QoS requirements. FTP and HTTP can tolerate delay, but not errors/losses (transmitted and received messages must be exactly identical). Real-time services (voice/video) can tolerate some data losses, but are sensitive to variations in delay.

  40. QoS Metrics Loss: Percentage of transmitted packets which never reach their intended destination (either due to noise corruption or overflow at a queuing buffer). Throughput: The rate at which data can be usefully carried. Latency: A posh word for “delay”; the time a packet takes to travel between end-points. Jitter: Loosely defined as the amount by which latency varies during a transmission. (Its precise definition is problematic.) Most important in real-time applications.
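The sketch below shows how these metrics might be computed from per-packet send/receive timestamps. The record format (dicts keyed by packet id) and the function name are hypothetical, and “jitter” here is taken as the mean absolute difference between successive latencies.

```python
from statistics import mean

def qos_summary(sent, received, bytes_per_packet):
    """Toy QoS metrics from packet timestamps.

    sent     : {packet_id: send_time}  for every transmitted packet
    received : {packet_id: recv_time}  only for packets that arrived
    """
    loss_pct = 100.0 * (len(sent) - len(received)) / len(sent)

    latencies = [received[pid] - sent[pid] for pid in sorted(received)]
    latency = mean(latencies)

    duration = max(received.values()) - min(sent.values())
    throughput = len(received) * bytes_per_packet / duration   # useful bytes per second

    # "Simple" jitter: mean absolute difference between successive latencies.
    jitter = mean(abs(b - a) for a, b in zip(latencies, latencies[1:]))
    return loss_pct, throughput, latency, jitter
```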

  41. User Data User Data Network Monitor Data Monitor Data (n - j) packets “successful” n packets total j packets “failed” Quality of Service (QoS) Monitoring Failure Probability 

  42. Simulation Data Heavily Loaded Network (Average utilisation: 97%)

  43. Simulation Data Lightly Loaded Network (Average utilisation: 46%)

  44. Jitter Definition 1 Ref: http://www.slac.stanford.edu/comp/net/wan-mon/dresp-jitter.jpg

  45. Jitter Definition 2 “Simple” jitter: the difference between successive latencies. “Smoothed” jitter (RFC 3550): each value inherits 15/16 of the previous value.
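Both definitions are easy to state in code. This is a minimal sketch (function names are mine), with the smoothed version following the RFC 3550 estimator J_i = J_{i-1} + (|D_i| − J_{i-1})/16.

```python
def simple_jitter(latencies):
    """'Simple' jitter: differences between successive latencies."""
    return [b - a for a, b in zip(latencies, latencies[1:])]

def smoothed_jitter(latencies):
    """RFC 3550-style jitter: each value inherits 15/16 of the previous estimate.

    J_i = J_{i-1} + (|D_i| - J_{i-1}) / 16, where D_i is the latency difference.
    """
    j, profile = 0.0, []
    for d in simple_jitter(latencies):
        j += (abs(d) - j) / 16.0
        profile.append(j)
    return profile
```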

  46. Jitter Profiles Monitor stream: 10 s; data stream: 10 ms.

  47. Wired-Cum-Wireless Networks Wired network: congestion only. Wireless network: congestion plus random noise.

  48. WTCP: Identifying the Cause of Packet Loss using Interarrival Time A block of lost packets lies between packets i+2 and j−2 in the packet stream. The received packets have arrival times t_i, t_{i+1}, t_{i+2}, t_{j−2}, t_{j−1}, t_j, giving interarrival times t_{i+1} − t_i, t_{i+2} − t_{i+1}, t_{j−1} − t_{j−2} and t_j − t_{j−1}.

  49. WTCP: Identifying the Cause of Packet Loss using Interarrival Time Assume we already know the mean M and standard deviation σ of the interarrival time when the network is uncongested. If M − Kσ < Δ_{i,j} < M + Kσ (where K is a constant), then the losses are assumed to be random, and the sending rate is not altered. Otherwise, we infer that queue sizes are varying: an indication that congestion is occurring. The sending rate is reduced to alleviate the problem. Much work remains to be done on optimising this mechanism to maximise throughput.
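A minimal sketch of the decision rule as described. The function name and string labels are mine, and Δ_{i,j} is assumed to be the measured interarrival time spanning the lost block.

```python
def classify_loss(delta, m, sigma, k):
    """WTCP-style heuristic for deciding the cause of a packet-loss burst.

    delta : measured interarrival time spanning the lost block (Delta_{i,j})
    m     : mean interarrival time M when the network is uncongested
    sigma : its standard deviation
    k     : tolerance constant K
    """
    if m - k * sigma < delta < m + k * sigma:
        return "random loss: keep the sending rate"        # e.g. wireless noise
    return "congestion suspected: reduce the sending rate"  # queue sizes varying
```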
