1 / 237

Comprehensive Guide to Discrete Event Simulation Techniques

Learn the basics of Discrete Event Simulation, including Event Scheduling, smpl usage, Initialization Routine, and Facility Control concepts. Dive into simulation modeling and imitation techniques for real-world processes.

chrisreyes
Download Presentation

Comprehensive Guide to Discrete Event Simulation Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Simulation Modeling • Imitation of the operation of a real-world process or system over time • Objective: to collect data as if a real system were being observed • Data collected from the simulation are used to estimate the performance/dependability measures of the system

  2. Discrete Event Simulation • modeling of the system as it evolves over time by a representation in which the state variables change only at a countable number of points in time • terminology: • simulation clock: a variable that gives the current value of the simulated time • event: an instantaneous occurrence which may change the state of the system

  3. Simulation Terminology • event list: a list (data structure) consisting of event records with each record containing the time of occurrence of a particular event, e.g., the arrival time, the departure time of a client • timing routine: a subroutine which determines and removes the most imminent event record from the event list and advances the simulation clock to the time when the corresponding event is to occur • event routine: a subroutine which updates the state of the system when a particular type of event occurs • one event routine for each type of event

  4. Event Scheduling • Determine the number of event types in the system, e.g., 1: arrival, 2: request for service, 3: service completion, 4: timer, etc. • Place one or more initial event records in the event list, each containing • event time, event type, customer class, etc. • Determine the most imminent event in the event list (by the timing routine) in a loop until a specified stopping rule is satisfied • update the simulation clock when an event record is removed from the event list

  5. Event Scheduling (cont.) • Pass the control to the event routine corresponding to the event type • Update the state of the system • Gather the statistics if necessary • Report the simulation results when the simulation is completed • For example • the average response time per client • the loss probability of calls • the system throughput • the average number of clients served over a time period

  6. Simulation using smpl • In the smpl view of systems, there are three types of entities: • resources: facilities • smpl provides functions to define, request, release and preempt (queueing) facilities • tokens: active entities of the systems, e.g., tasks, users (indistinguishable or distinguishable) • events: a change of state of any system entity is an event • smpl provides functions for scheduling and for selecting events in the order of event occurrence time

  7. Initialization routine; timing control routine to select the most imminent event from the event list (event clock is updated implicitly) { event type 1: event routine for event type 1; event type 2: event routine for event type 2; . . event type n: event routine for event type n; } statistics reporting routine; Structure of An smpl Program

  8. smpl(m, s) int m=0; /* always 0 */ char *s; smpl provides seeds for 15 streams for generating random numbers. To collect a set of 15 sample values of a particular performance measure, one can invoke smpl() 15 times: loop: repeat 15 times { smpl(0, “hw1”); } One can also use stream(1), stream(2), etc. to specify the stream number to be used in a simulation run Initialization Routine

  9. fd = facility(s, n) char *s; int n; /* # of servers */ => define a queueing server with “n” servers; smpl automatically manages enqueueing/dequeueing activities r = request(fd_id, token_id, pri) intfd_id; inttoken_id; intpri; => request a server of facility “fd_id” be reserved for the token designated by “token_id” with priority “pri” (higher is better) r=0: facility is reserved r=1: facility is busy and the request is blocked in the queue ordered on priority Facility Definition and Control

  10. r = preempt(fd_id, tkn_id, pri) intfd_id, tkn_id, pri; => same as request() except that it will preempt the server if it is busy serving a task with priority < “pri” => the event record corresponding to the preempted token (for the service completion event) is removed from the event list and a queue entry with the residual time is created r=0: facility is reserved r=1: facility is busy and the request is blocked in the queue ordered on priority release(fd_id, tkn_id) intfd_id; inttkn_id; => release the facility and if the queue is not empty, reschedule an event with the event occurrence time at NOW for a blocked task, and reschedule an event with the event occurrence time at NOW+ the residual time for a preempted task. Facility Definition and Control create an event of the same type and put it in the event list

  11. schedule(event_id, te, tkn_id) int event_id; real te; /* time interval relative to the current time */ int tkn_id; => schedule the event with id “event_id” to occur at NOW+te => this essentially inserts an event record with the event occurrence time NOW+te into the event list => part of the information in the event record is event_id, tkn_id and the event occurrence time NOW+te Example: schedule(2, 0.0, token_id) => schedule event type #2 associated with token id “token_id” to occur NOW Scheduling Events

  12. cause(event_id, tkn_id) int *event_id; int *tkn_id; => remove the most imminent event from the event list and automatically advance the simulation clock to the event occurrence time => return the event number (type) and token id to the caller Typically in the smpl program, we use a select statement on the event_id returned, so as to transfer the control to the appropriate event routine. Timing Routine

  13. cancel(event_id) int event_id; => search the event list and remove the first event with the event number = event_id Canceling Events Get Current Simulation Time t = real time() => return the current simulation clock value => real is a predefined type; it is the same as double in C

  14. n= int inq(fd); => returning # of tokens currently in queue (not including the ones in service) r = int status(fd) => r=0: facility is free; r=1: facility is busy u = real U(fd) => mean # of tokens in service n = real Lq(fd) => mean # of tokens in queue excluding the ones in service b = real B(fd) => mean busy period = accumulated busy time/release counts Status Functions

  15. r = real drand48(); /* available on UNIX machines */ => return r in the range of (0,1) r = real expntl(x) double x; => return an exponentially distributed sample value with mean x r = real uniform(a,b) double a,b; => return a real number r in the range of (a,b) k = int random(i,j) int i, j; => return an integer k in the range of (i,j) r = real normal(x,s) => return a normally distributed sample value with mean x and standard deviation s Random Variate Generation (rand.c)

  16. trace(n) int n; => generate trace messages when a facility is defined, requested, or released, or whenever an event is scheduled or caused n=0: trace is off n=1; free-running, i.e., trace messages are generated continuously n=2: screen by screen running (press any key to resume tracing) n=3: message by message running (press any key to resume) Traces and Debugging

  17. #include “smpl.h” main() { real Ta=200, Ts=100, te=200000; int customer=1, event, server; smpl(0, “M/M/1 Queue”); server = facility(“server”,1); schedule(1, 0.0, customer); while (time()< te) { cause(&event, &customer); switch(event) { case 1: /* arrival */ schedule(2,0.0, customer); schedule(1, expntl(Ta), customer); break; case 2: /* request server */ if (request(server, customer,0)==0) schedule(3, expntl(Ts), customer); break; case 3: /* completion */ release(server, customer); break; } } report(); } M/M/1 smpl program

  18. Confidence Interval and Level • Suppose we collect N sample values Y1, Y2, …, YN from N simulation runs • sample mean Y = (Y1 + Y2 + …+ YN )/N • true mean is m • Define 1-a as the probability that the absolute value of the difference between the Y and m is equal to or less than H • that is, prob[ Y-H <= m <= Y+H] = 1- a Confidence interval half-width Confidence interval Confidence level

  19. Confidence Interval and Level (cont.) • When Y1, Y2, …, YN are independent random variables from a normal distribution with the mean m, H is defined by H = ta/2;N-1* s/sqrt(N) where t is the student’s t distribution and s2 is the sample variance given by s2 = Si (Yi - Y)2 /(N-1) (and thus sis the standard deviation).

  20. Batch Mean Analysis by smpl m m samples to generate Y1 samples to generate Y2 • Use a batch size m around 2000 observations to collect a sample value Yito justify the normal distribution assumption (by central limit theorem). • Delete d = 0.1 m initial observations • Collect k = 10 batches and compute the confidence interval half-width H • If the desired accuracy has not been reached, collect another batch and compute H again. Repeat as necessary.

  21. BMA: stat.c and bmeans.c • Based on 95% confidence level (a = 0.05) with 10% confidence accuracy (H/Y = 10%) • The following three routines are provided: • init_bm(d, m): d is number of initial observations to be discarded and m is the number of observations to collect one sample Yi • obs(y): y is the observation value generated out of a simulation run • if the returning value is 1, it means that the required confidence level and accuracy have been reached; otherwise, need to continue calling this function obs(y) • civals(Y, H, k): Y, H and k are passed in by reference. This function returns the final result.

  22. M/M/1 smpl program with BMA case 1: /* arrival */ ts[customer] = time(); schedule(2,0.0, customer); if (++tk_id >= TOKENS) tk_id=0; schedule(1,expntl(Ta),tk_id); break; case 2: /* request server */ if (request(server, customer,0)==0) schedule(3,expntl(Ts),customer); break; case 3: /* release server */ release(server, customer); if (obs(time()-ts[customer])== 1) cont = FALSE; break; } } /* end while */ civals(&mean, &hw, &nb); printf(”Y= %f; H= %f after %d batches\n”, mean, hw, nb); } #include “smpl.h” #define TOKENS 1000 #define TRUE 1 #define FALSE 0 main() { real Ta=200.0,Ts=100.0,mean,hw; inttk_id=0,customer=0,event,server,nb; real ts[TOKENS]; /* start time stamp */ intcont=TRUE; smpl(0,"M/M/1 Queue with BMA"); init_bm(200,2000); /* d=200; m=2000 */ server=facility("server",1); schedule(1,0.0,tk_id); while (cont) { cause(&event,&customer); switch(event) {

  23. Bmeans.c printf("batch %2d mean = %.3f",k,smy); smy=0.0; n=0; /* reset batch variables */ if (k>=10) then { /* compute grand mean & half width */ Y=smY/k; var=(smY2-k*Y*Y)/(k-1); h=T(0.025,k-1)*sqrt(var/k); printf(", rel. HW = %.3f",h/Y); if (h/Y<=0.1) then r=1; } printf("\n"); } return(r); } civals(mean,hw,nb) real *mean,*hw; int *nb; { /* return batch means analysis results */ *mean=Y; *hw=h; *nb=k; } #include "smpl.h" #include "stat.c" static intd,k,m,n; static real smy,smY,smY2,Y, h; init_bm(m0,mb) int m0,mb; { /* set deletion amount & batch size */ d=m0; m=mb; smy=smY=smY2=0.0; k=n=0; } obs(y) real y; { int r=0; real var; if (d) then {d--; return(r);} smy+=y; n++; if (n==m) then { /* batch complete: update sums & counts */ smy/=n; smY+=smy; smY2+=smy*smy; k++;

  24. Chap 2: Reliability and Availability Models Reliability R(t) = prob{S is fully functioning in [0,t]} Suppose from [0,t] time period, we measure out of N components, of which N0 (t): # of components operating correctly at time t Nf (t): # of components which have failed at time t Physical meaning: instantaneous rate at which components are failing

  25. Failing rate of a single component How many unfailing components are there at time t ? N Nf(t+dt) - Nf(t) dt t time t+dt (2.3) text p.28 Z(t) is also called the hazard rate.

  26. For electronic components, Z(t)’s relationship with respect to time is a bathtub curve. Burn-in period Z(t) High failure rate due to faulty design, manufacturing or assembly. Useful time phase Wear-out phase (due to aging) Infant mortality phase Weak components are removed during the “burn-in” period. Failure rate can be assumed to be a constant during the useful time period, say  time

  27. Exponential Failure Law e.g.  = 0.01 hr-1, what is R(t) at t = 100 hrs? Ans: e-0.01*100  For hardware components, exponential failure law is frequently assumed.  For software components, the reliability may grow as the software’s design faults are removed during the testing/debugging phase.

  28. In general, we can assume Z(t) =  ( t )-1 Weibull dist.  R(t) = e- (t ) Z(t) Use this to model software failure rate >1 <1 e.g.  = -1 R(t) = =1 -1 t et This yields  t  , R(t)  1 t  0, R(t)  0  Reliability improves as a function of t

  29. Formal Definition of R(t): Let x be a .. representing the life of a system and let F be the cumulative distribution function (CDF) of x. Then,  For a component obeying the exponential failure law

  30. Mean Time to Failure (MTTF): The expected time that a system will operate before the first failure occurs i first failing time 1 t1 Identical systems N tN Discrete case Continuous case failure time Q: what is the reliability of a system obeying the exponential failure law at t = MTTF? Ans:

  31. MTTR (Mean Time to Repair) If also assume a failed system obeys “Exponential Repair Law”, then Relationship between MTBF (Mean time between failure) , MTTR & MTTF: time MTTF MTTR MTTF MTTR Time of 2nd failure Time of first failure MTBF = MTTR + MTTF  If MTTF >> MTTR Then MTBF  MTTF

  32. Availability Instantaneous (or point) Availability A(t) = prob {the system is functioning at time t} Regardless of the # of times it may have failed & been repaired during [0, t] A(t) A time t 0 Steady-State Availability = MTTF MTTF+MTTR  For a system without repair, A(t) = R(t)

  33. Assume exponential “failure” & “repair” law  See also page 67, text chapter 4 O F  Time domain: Po’(t) = - Po(t) + PF(t) PF’(t) = Po(t) - PF(t) with initial state Po(0) = 1 & PF(0) = 0 Laplace domain: SPo(S) - 1 = - Po(S) + PF(S) SPF(S) - 0 = Po(S) - PF(S) Po(0) PF(0)

  34. LT F(t) L(F(t)) = f(S) Inverse LT 1/S 1/S2 n!/Sn+1 1/(S-a) 1 t tn eat   Po(t) = + e-(+)t + + Physical meaning: Po(t) = prob {the system is functioning at time t }   PF(t) = - e-(+)t + +   + e-(+)t  A(t) = + +  Po(S) Similarly PF(S) Inverse LT to return to time domain

  35. e.g.  = 0.01 &  = 0.1 1.0 0.9 0.90909... A(t) t Q1: unavailability? Ans: PF(t) Q2: R(t)? Still e-t Q3: Steady-state availability?  Ans: A( t) = +

  36. Modeling: Series-Parallel Reliability Block Diagrams  A series-parallel block diagram represents the logical structure of a system with regard to how the reliabilities of its components affect the system reliability.  Components are combined into blocks in  series  parallel or  k-of-n configurations

  37. 1 1 2 n A. Serial system: each element of the system is required to function correctly for the system to function correctly. 1 2 3 n B. Parallel system: only one of several elements must be operational for the system to be operational. Assumptions: independent random variables perfect coverage so up to n-1 failures can be tolerated. 2

  38. Computer 1 Interface 1 Display 1 Bus 1 Computer 2 Interface 2 Display 2 Bus 2 Parallel 1 2 3 4 Where Rj,i is for jth component of ith unit C. Combination of series & parallel systems e.g. Numerical ex: R=0.9 then Rsystem = [1-(1-0.9)2]4 = 0.96 v.s. Rnon-redundant = (0.9)4 = 0.6561

  39. e.g. R1 R1 1 R2 R2 R2 2 R3 3 Rparallel = 1 - ( 1 - Rseries, 1 )( 1 - Rseries, 2 )( 1 - Rseries, 3 ) = 1 - ( 1 - R12 )( 1 - R23 )( 1 - R3 ) Q: Prove the following theorem: Replication at the component level is more effective than replication at the system level in improving system reliability using the same # of components. Ans: Show that is better than Assume R=1/2 for each component

  40. D. k-out-of-n e.g. TMR ( Triple Module Redundancy ) is a 2-out-of-3 system. RTMR(t) = R1(t) * R2(t) * R3(t) + R1(t) R2(t) ( 1 - R3(t) ) + R1(t) R3(t) ( 1 - R2(t) ) + R2(t) R3(t) ( 1 - R1(t) ) when R1(t) = R2(t) = R3(t) = R(t) R2-out-of-3(t) = 3R2(t) - 2R3(t) all are functioning 1 failed & 2 are functioning In general, 3 e.g.

  41. 3 2  R2 - R + 0.5 = 0  R = 1.0 or 0.5 n n ! å - l - l - = - t i t n i R ( t ) ( e ) ( 1 e ) system - ( n i )! i ! = i k ¥ æ ö 1 1 1 ò \ = = + + ç ÷ MTTF R ( t ) dt ... system l n k è ø 0 Rsystem RTMR(t) Q: Is RTMR(t) > Rsystem with a single component(t) ? 1.0 Let 3R2 - 2R3 = R R(t) single RTMR(t) = R(t) when the reliability of a single module is 1.0 or 0.5 More realistic region In fact, when R(t) < 0.5, R(t) > RTMR(t) R(t) 1.0 0.5 Q: what is the MTTF of a k-out-of-n system when each single module follows the exponential failure law with a failure rate of  ? e.g. 2-out-of-3: MTTF= 2-out-of-5: MTTF=

  42. m1 P1 m2 P2 m3 MTTF = 6 0.00903 3 0.01042 6 0.01667 3 0.01806 2 0.02431 1 0.0257 - - - = = + + 226.09 Q: what is the reliability & MTTF of the following structure? Fig. 2.5, p36 A parallel system too Also a parallel system m = 0.00139/day ( 1-out-of-3 ) p = 0.00764/day ( 1-out-of-2 ) Rsystem(t) = [ 1 - ( 1 - e-0.00764t )2 ][ 1 - ( 1 - e-0.00139t )3 ] = 6e-0.00903t-3e-0.01042t-6e-0.01667t+3e-0.01806t+2e-0.02431t-e-0.0257t  Recall that Repair rate Failure rate of component i Ai(t) = Equations & obtained above can also be used to compute 1 2 3 the system availability  by replacing Ri(t) with Ai(t)

  43. 4 5 6 Assuming For steady state availability i i + i Use Ai( t   ) = into equations above & 4 5 6 Specifically,

  44. C D A B C E D C block name {( param-list )} optional < block line > end can be one of the following: Chap 9: Reliability & Availability Modeling Reliability Block Diagramusing sharpe P.358 Appendix B

  45. 2) parallel name name-1 name-2 {name3 name4 …} Name 1 The parallel system is assigned to the first name. Name 2 3) series name name-1 name-2 {name3 name4 …} 4) kofn name expression-1, expression-2, component-name The series system is assigned to the first name. Name 1 Name 2 k n identical components 1) comp name exponential-polynomial Referring to the cumulative dist function ( cdf ) F(t) = 1-e-t R(t) = e-t exp (lambda) meaning cdf (component-name{,state}{;arg-list}) which has been defined before gen triple 1, triple 2, ... in the form of (aj, kj, bj) or See p. 352

  46. Ex: m1 Sharing memory: a k-out-of-n device CPU1 Fig. 2.5 (p. 36) m2 CPU2 m3 Output CDF for system 1-6e-0.00903t+3e-0.0104t+… (1-out-of-3) k=1.00 mean(system;k,3,…) = 2.26*102 rel (10,k,3…) = 9.9981*10-1 rel (365,k,3…) = 8.33*10-1 k=2.00 in semi symbolic form 5) kofn name k-expression, n-expression, name1 name2 {name3…} representing a k-out-of-n system having possibly different components Components do not have identical failure-time dist. block system ( k, n, pfrate, mfrate ) comp CPU exp ( pfrate ) comp mem exp ( mfrate ) parallel CPUs CPU CPU kofn mems k, n, mem series subsystem CPUs mems end

  47. Comments: any line starting with the symbol “*”  printing output.  printing F(t) in symbolic form cdf (system;1, 3, 0.00139, 0.00764)  define reliability function at time t func rel(t, k, n, pf, mf)\ 1-value (t; system; k, n, pf, mf) \ means continuation to the next line Returning F(t) at time t numerically loop k, 1, 3, 1 exprmean (system; k, 3, 0.00139, 0.00764) exprrel (10, k, 3, 0.00139, 0.00764) exprrel (365, k, 3, 0.00139, 0.00764) end end

  48. =0.3 =0.05 C =0.1 D A B C E =0.01 D C =0.25 block block1a comp A exp(0.05) comp B exp(0.01) comp C exp(0.3) comp D exp(0.25) comp E exp(0.1) parallel threeC C C C parallel twoD D D series sys1 A B threeC twoD E end Fig. 9.1 p. 156  printing: 5 decimal places format 5  cdf cdf (block1a) expr 1-value(10; block1a) end Print 1-F(t)=R(t) at t=10

  49. Availability Modeling See p.354 text on a user-defined distribution syntax: poly name(param-list) dist. gen\ triple\ triple of the form (aj, kj, bj ) polyunavail(mu, lambda)\ gen\ 1, 0, 0\ -mu/(lambda+mu), 0, 0\ -lambda/(lambda+mu), 0, -(lambda+mu) block block1a comp A unavail(muA, lambdaA) comp B unavail(muB, lambdaB) . . . end bindmuA 1 bindlambdaA0.0001 . To define 1-Ai(t)  print steady state  availability A() expr pinf(block1a)  print instantaneous  availability at t=100 expr 1-value(100; block1a) end

  50. e.g. failure or M1 M2 M3 P1 P2  AND AND M1 M2 M3 P1 P2 Fault Trees P.39, chap.2  A pictorial representation of events that can cause the occurrence of an undesirable event.  An event at a high level is reduced to a combination of lower level events by means of logic gates AND: when all fail, the failure event occurs. OR: when one fails, the failure event occurs. K out of n: when at least k out of n components fail, the failure event occurs.

More Related