NETW 707 Modeling and Simulation Amr El Mougy Maggie Mashaly

Lecture (5): Statistical Models in Simulation

Introduction • The time taken by a repair person to fix a machine is a function of: • The complexity of the breakdown • The availability of proper replacement parts and tools • The availability of a repair person • These variations occur by chance and cannot be predicted • Thus, some statistical model is required

Introduction • Development of a statistical model: • Sample the phenomenon of interest • Select a known distribution through an educated guess or using dedicated software • Estimate the parameters of the selected distribution • Test the fit of the chosen distribution
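The four steps above can be sketched end to end. This is a minimal illustration, not a prescribed method: the "sampled" repair times are synthetic, the exponential family is an assumed educated guess, and the helper names `fit_exponential` and `ks_statistic` are hypothetical. The fit check uses the Kolmogorov-Smirnov distance, one of several possible goodness-of-fit measures.

```python
import math
import random

def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(samples) / sum(samples)

def ks_statistic(samples, cdf):
    """Largest gap between the empirical cdf of the samples and a candidate cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the candidate cdf with the empirical cdf just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

rng = random.Random(42)  # fixed seed so the run is repeatable

# Step 1: sample the phenomenon (synthetic repair times, assumed true rate 0.5).
repair_times = [rng.expovariate(0.5) for _ in range(2000)]

# Step 2: select a candidate distribution (assumption: exponential).
# Step 3: estimate its parameter from the data.
rate_hat = fit_exponential(repair_times)

# Step 4: test the fit by measuring the distance between data and fitted cdf.
d_stat = ks_statistic(repair_times, lambda x: 1 - math.exp(-rate_hat * x))

print(f"estimated rate = {rate_hat:.3f}, K-S distance = {d_stat:.4f}")
```

A small K-S distance suggests the fitted distribution tracks the data; a large one suggests returning to step 2 with a different candidate.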

Characterizing a Probability Distribution • Probability mass function (pmf): a function that describes the probability that a discrete random variable is equal to some value, $p(x) = P(X = x)$ • Cumulative distribution function (cdf): a function that describes the probability that a random variable X assumes a value less than or equal to x, $F(x) = P(X \le x)$

Characterizing a Probability Distribution • Expected value: the weighted average of a random variable, $E(X) = \sum_i x_i \, p(x_i)$ in the discrete case • Variance: the expected (or average) squared distance (or deviation) from the mean, $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ • Standard deviation: the square root of the variance
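As a small worked example, the expected value can be computed as the probability-weighted sum of the values, and the variance as E(X²) − [E(X)]². The fair six-sided die and the function names below are illustrative assumptions, not part of the lecture.

```python
import math

def expected_value(pmf):
    """Weighted average of the values, with weights given by their probabilities."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """E(X^2) - [E(X)]^2 for a discrete random variable."""
    mean = expected_value(pmf)
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean ** 2

# Hypothetical example: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}

mean = expected_value(die)
var = variance(die)
std = math.sqrt(var)  # standard deviation is the square root of the variance
print(mean, var, std)
```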

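Example: Continuous Uniform Distribution

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$$

$$F(x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x-a}{b-a} & \text{for } a \le x < b \\ 1 & \text{for } x \ge b \end{cases}$$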
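The density and cdf of the continuous uniform distribution on [a, b] translate directly into code. The sketch below uses the standard forms (density 1/(b − a) inside the interval, cdf rising linearly from 0 to 1); the function names are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Example with assumed limits a = 2 and b = 6.
print(uniform_pdf(3, 2, 6), uniform_cdf(3, 2, 6))
```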

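Example: Binomial Distribution

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$

$$F(x) = \sum_{k=0}^{x} \binom{n}{k} p^k (1-p)^{n-k}$$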
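A short sketch of the binomial pmf and cdf, using the standard formula for the number of successes in n independent trials with success probability p. The packet scenario (10 packets, each received with probability 0.9) is an assumed illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(at most x successes), obtained by summing the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Hypothetical scenario: 10 packets sent, each received with probability 0.9.
prob_at_most_8 = binomial_cdf(8, 10, 0.9)
print(f"P(at most 8 of 10 packets received) = {prob_at_most_8:.4f}")
```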

Heavy-Tailed Distributions • The top 1% of a population owns 60% of the wealth • The top 2% of Twitter users send 60% of the tweets • A heavy-tailed distribution has a tail heavier than that of the exponential distribution • Pareto principle: known as the 80-20 rule, i.e., 80% of the effects come from 20% of the causes
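The 80-20 rule can be reproduced with a Pareto distribution. The sketch below assumes a Pareto law with shape parameter alpha > 1, for which the share of the total held by the top fraction q of the population is q^(1 − 1/alpha); the function name `top_share` is hypothetical.

```python
import math

def top_share(fraction, alpha):
    """Share of the total held by the top `fraction` under a Pareto law (alpha > 1)."""
    return fraction ** (1 - 1 / alpha)

# Shape parameter for which the top 20% hold exactly 80% of the total.
alpha_80_20 = math.log(5) / math.log(4)

print(f"alpha = {alpha_80_20:.3f}")
print(f"top 20% share = {top_share(0.2, alpha_80_20):.2f}")
print(f"top 1% share  = {top_share(0.01, alpha_80_20):.2f}")
```

Smaller values of alpha give a heavier tail and concentrate even more of the total in the top few.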

Long-Tailed Distributions • Distributions where a large number of occurrences are far from the “center” of the distribution • Internet searches have a very long tail: a small number of words are searched for very often, while the majority are rarely searched for • Amazon profits from the long tail by selling rare books • A subset of heavy-tailed distributions [Figure: long-tailed distributions drawn as a subset of heavy-tailed distributions]

Long-Tailed Distributions [Figure annotation: linear = light-tailed]

Discrete Probability Distributions [Figure panels: Binomial, Uniform (Discrete), Geometric, Negative Binomial, Hypergeometric]

Continuous Probability Distributions [Figure panels: Triangular, Uniform (Continuous), Normal, Exponential, Cauchy]

Continuous Probability Distributions [Figure panels: Weibull, Lognormal, Gamma, Minimum Extreme Value]

Discrete Distributions • If each trial of an experiment has only two possible outcomes → binomial distribution for the number of successes (ex: packet successfully received or not) • If we need to count the number of trials until the first success → geometric distribution • If we need to count the number of trials until the kth success, k = 1, 2, … → negative binomial distribution • A negative binomial random variable can be thought of as a sum of k independent geometric random variables • Ex: What is the probability that the third inspected product at a manufacturing plant is the second one accepted? • If we need to count the number of event occurrences within a period of time → Poisson distribution (ex: number of calls in an hour)
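The inspection example can be worked out with the negative binomial pmf. The lecture does not state the acceptance probability, so the value p = 0.8 below is purely an assumption; the function names are hypothetical and the formulas are the standard "number of trials" forms.

```python
from math import comb, exp, factorial

def geometric_pmf(x, p):
    """P(first success occurs on trial x)."""
    return (1 - p) ** (x - 1) * p

def negative_binomial_pmf(x, k, p):
    """P(k-th success occurs on trial x): k-1 successes in the first x-1 trials, then a success."""
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

def poisson_pmf(x, rate):
    """P(exactly x events in a period with mean number of events `rate`)."""
    return exp(-rate) * rate ** x / factorial(x)

# Assumed acceptance probability; not given in the lecture.
p_accept = 0.8
third_is_second_accepted = negative_binomial_pmf(3, 2, p_accept)
print(f"P(third inspected is second accepted) = {third_is_second_accepted:.3f}")
```

With k = 1 the negative binomial pmf reduces to the geometric pmf, which matches the view of the negative binomial as a sum of geometric random variables.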

Continuous Distributions • For events that are highly variable (interarrival times) or instantaneous occurrences (failure of a light bulb) → exponential distribution • Sum of independent exponential random variables with the same rate → gamma distribution. Extremely flexible, used to model non-negative variables • The sum of a large number k of independent random variables → approximately normal distribution • The product of a large number k of independent positive random variables → approximately lognormal distribution • Bounded random variables → beta distribution
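The link between exponential and gamma random variables can be checked by simulation. The sketch below sums k = 3 independent exponential variables with an assumed common rate of 2 and compares the sample mean and variance with the gamma values k/rate and k/rate²; the parameter values and the name `sum_of_exponentials` are illustrative.

```python
import random
import statistics

def sum_of_exponentials(k, rate, rng):
    """One draw of the sum of k independent exponential variables with a common rate."""
    return sum(rng.expovariate(rate) for _ in range(k))

rng = random.Random(1)   # fixed seed so the run is repeatable
k, rate = 3, 2.0         # assumed parameters for the illustration

draws = [sum_of_exponentials(k, rate, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# A gamma distribution with shape k and rate `rate` has mean k/rate and variance k/rate**2.
print(f"sample mean {sample_mean:.3f} vs gamma mean {k / rate:.3f}")
print(f"sample var  {sample_var:.3f} vs gamma var  {k / rate ** 2:.3f}")
```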

Continuous Distributions • The Weibull distribution can be thought of as a stretched exponential distribution. That is why it has a longer tail (when its shape parameter is below 1) • Uniform distribution → complete uncertainty • Triangular distribution → maximum and minimum (and the most likely value) are known • When no theoretical distribution seems appropriate → empirical data

Choosing a Probability Distribution: Queuing Systems • If service times are completely random, the exponential distribution can be used • Gamma and Weibull distributions can be used as well • If large service times (much larger than the mean) occur more frequently than the exponential distribution can account for, the Weibull distribution can be used
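To see how the Weibull distribution can give more weight to large service times, the sketch below compares the probability of a service time exceeding three times the mean under an exponential and a Weibull distribution with the same mean. The mean of 1 and the shape of 0.5 are assumed values, and the function names are hypothetical.

```python
import math

def exponential_sf(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)

def weibull_sf(x, shape, scale):
    """P(X > x) for a Weibull distribution with the given shape and scale."""
    return math.exp(-((x / scale) ** shape))

def weibull_scale_for_mean(mean, shape):
    """Scale that gives a Weibull distribution the requested mean."""
    return mean / math.gamma(1 + 1 / shape)

mean_service = 1.0   # assumed mean service time
shape = 0.5          # assumed shape below 1, which stretches the tail
scale = weibull_scale_for_mean(mean_service, shape)

threshold = 3 * mean_service
p_exp = exponential_sf(threshold, mean_service)
p_weibull = weibull_sf(threshold, shape, scale)
print(f"P(service > 3 x mean): exponential {p_exp:.4f}, Weibull {p_weibull:.4f}")
```

With these assumed values the Weibull probability is noticeably larger than the exponential one, even though both distributions have the same mean.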

Choosing a Probability Distribution: Inventory Systems • There are at least 3 random variables: • Number of units demanded • Time between demands • Lead time (to satisfy demands) • Lead time can often be fitted by a gamma distribution

Choosing a Probability Distribution: Inventory Systems • Poisson and negative binomial distributions can satisfy a variety of demand patterns • If the demand is long-tailed, the negative binomial is more appropriate (Poisson has a shorter tail) • Poisson is often used because it is simpler and extensively tabulated
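The difference in tail length can be made concrete by comparing a Poisson and a negative binomial demand model with the same mean. The sketch assumes a mean demand of 2 units and uses the "number of failures before the r-th success" form of the negative binomial (with r = 2 and p = 0.5, which also has mean 2); these parameters and the function names are illustrative choices.

```python
from math import comb, exp, factorial

def poisson_tail(threshold, mean):
    """P(demand > threshold) under a Poisson distribution."""
    return 1 - sum(exp(-mean) * mean ** k / factorial(k) for k in range(threshold + 1))

def negbin_tail(threshold, r, p):
    """P(demand > threshold) when demand is the number of failures before the r-th success."""
    return 1 - sum(comb(k + r - 1, k) * (1 - p) ** k * p ** r for k in range(threshold + 1))

# Both models have mean demand 2 (negative binomial mean is r * (1 - p) / p).
p_poisson = poisson_tail(6, 2.0)
p_negbin = negbin_tail(6, 2, 0.5)
print(f"P(demand > 6): Poisson {p_poisson:.4f}, negative binomial {p_negbin:.4f}")
```

Under these assumptions the negative binomial assigns several times more probability to large demands than the Poisson does.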

Limited Data • If the data obtained is limited or incomplete, there are usually three distributions that are used:

How to Choose a Random Variable [Flowchart] • First question: is the data discrete or continuous? • Continuous branch: is the data symmetric or asymmetric? Is the data clustered around a central value? How likely are the outliers (no outliers with limits on the data, very low, low)? Are the outliers positive or negative (only positive, mostly positive, mostly negative)? Candidate distributions: Exponential, Lognormal, Gamma, Weibull, Minimum Extreme Value, Uniform, Triangular, Normal, Cauchy • Discrete branch: can you estimate outcomes and probabilities (if yes, estimate the probability distribution)? Is the data symmetric or asymmetric? Are the values clustered around a central value? Are the outliers positive or negative (only positive, mostly positive, mostly negative)? Candidate distributions: Geometric, Negative Binomial, Hypergeometric, Binomial, Uniform Discrete

Estimation of Parameters • If data can be collected from the real system: • Trace-driven simulation: data values are used directly in the simulation. Ex: real customer arrivals at a grocery store • Empirical distribution: the collected data are used to define an empirical distribution, which is then sampled in the simulation to generate random variates • Fitted standard distribution: the collected data are fed as input to a statistical inference algorithm. If a fit is found, then this distribution can be used with the fitted parameters
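The empirical-distribution option can be sketched as follows: the collected values are sorted, and each new variate is drawn by picking a random position along the sorted data and interpolating linearly between neighbours. The observed values and the name `empirical_sampler` are hypothetical, and linear interpolation is only one of several ways to build an empirical distribution.

```python
import random

def empirical_sampler(data, rng):
    """Return a function that draws variates from a piecewise-linear empirical cdf."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    def draw():
        position = rng.random() * (n - 1)   # random position along the sorted data
        i = int(position)                   # index of the observation just below
        frac = position - i                 # how far to move toward the next observation
        return xs[i] + frac * (xs[i + 1] - xs[i])

    return draw

# Hypothetical observed interarrival times (minutes).
observed = [0.4, 1.1, 0.7, 2.9, 1.6, 0.2, 3.8, 1.3]

rng = random.Random(7)   # fixed seed so the run is repeatable
draw = empirical_sampler(observed, rng)
variates = [draw() for _ in range(1000)]
print(min(variates), max(variates))
```

A trace-driven simulation would instead replay `observed` in its recorded order, without generating new values.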

Advantages and Disadvantages

Identifying the Appropriate Distribution • Histograms are usually appropriate (divide data into intervals of equal width and plot the frequency of each interval) [Figure: three histograms of the same data, with interval widths that are too coarse, appropriate, and too ragged]
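Building a histogram amounts to dividing the data range into equal-width intervals and counting the values in each. The sketch below does this for a small made-up data set; the function name `histogram` is hypothetical. Changing `num_bins` shows the trade-off between a histogram that is too coarse and one that is too ragged; a common rule of thumb is to start near the square root of the sample size.

```python
def histogram(data, num_bins):
    """Count how many values fall into each of num_bins equal-width intervals."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The largest value would land one past the end, so clamp it into the last bin.
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    return counts

# Made-up data for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
counts = histogram(data, 3)

# Simple text plot of the frequencies.
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```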

Random Number Generation • What is a random number: • A single number cannot be random. Only an infinite sequence of numbers can be random • Random means the absence of order • Random numbers have various applications: • Simulations • Sampling • Generating security keys

Properties of Random Numbers • A sequence of numbers is called random if the numbers possess these two statistical properties • Uniformity: every number is equally likely to occur • Independence: the occurrence of a number is independent of the occurrence of another
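Both properties can be checked numerically on a generated sequence. The sketch below is one possible pair of checks, not a complete test suite: a chi-square statistic over equal-width bins for uniformity, and the lag-1 autocorrelation for independence. It uses Python's built-in generator as the source under test, and the function names are hypothetical.

```python
import random

def chi_square_uniformity(numbers, num_bins=10):
    """Chi-square statistic comparing bin counts on [0, 1) with equal expected counts."""
    expected = len(numbers) / num_bins
    counts = [0] * num_bins
    for u in numbers:
        counts[min(int(u * num_bins), num_bins - 1)] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

def lag1_autocorrelation(numbers):
    """Correlation between each number and the one that follows it."""
    n = len(numbers)
    mean = sum(numbers) / n
    numerator = sum((numbers[i] - mean) * (numbers[i + 1] - mean) for i in range(n - 1))
    denominator = sum((u - mean) ** 2 for u in numbers)
    return numerator / denominator

rng = random.Random(2024)   # fixed seed so the run is repeatable
sample = [rng.random() for _ in range(10000)]

chi2 = chi_square_uniformity(sample)
rho = lag1_autocorrelation(sample)

# With 10 bins there are 9 degrees of freedom; the 5% critical value is about 16.92.
print(f"chi-square = {chi2:.2f} (compare with 16.92), lag-1 autocorrelation = {rho:.4f}")
```

A chi-square value far above the critical value would cast doubt on uniformity, and an autocorrelation far from zero would cast doubt on independence.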

Generation of Pseudo-Random Numbers • Pseudo means “not genuine” • A sequence of numbers with a repeat period but with the appearance of randomness (if you don’t know the algorithm) • They are called pseudo-random because they are generated by known, deterministic methods, which removes the potential for true randomness

Random Number Generators on Computers • A good generator should be: • Fast • Portable to different computers • Have a sufficiently long cycle • Replicable • Closely approximate the properties of uniformity and independence • The random number seed is the initial value from which the generator produces the rest of the sequence
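The roles of the seed, the cycle length, and replicability can be illustrated with a linear congruential generator (LCG). The lecture text here does not name a specific algorithm, so the LCG and its toy parameters (a = 5, c = 3, m = 16) are assumptions chosen only to make the cycle short enough to see; real generators use far larger moduli.

```python
def lcg(seed, a=5, c=3, m=16):
    """Toy linear congruential generator: x_next = (a * x + c) mod m, scaled to [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

def take(generator, n):
    """Collect the next n values from a generator."""
    return [next(generator) for _ in range(n)]

# Same seed, same sequence: the generator is replicable.
first_run = take(lcg(seed=7), 20)
second_run = take(lcg(seed=7), 20)

# With these parameters the sequence repeats after m = 16 values.
cycle = first_run[:16]
print(first_run == second_run)
print(len(set(cycle)), first_run[16:] == first_run[:4])
```

The short cycle shows why a sufficiently long period matters: after 16 draws this toy generator only replays values it has already produced.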
