Analyzing the Results of a Simulation and Estimating Errors
Jason Cooper

Types of Error
• Big and obvious errors
• Systematic error
• Statistical (random) error

Big, Obvious Errors
• Arise from gross error, often in the particle configuration.
• Examine intermediate conformations (MD or MC) for obvious problems, regardless of the focus of the study.
• Conformations are typically stored every 5-25 steps.

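One way to screen stored conformations for gross problems is to flag frames in which two particles sit implausibly close together. The sketch below assumes each frame is a list of (x, y, z) tuples; the function names and the `min_allowed` cutoff are hypothetical and would need to be chosen for the system at hand.

```python
import math
from itertools import combinations

def closest_pair_distance(coords):
    """Smallest distance between any two particles in one frame (O(N^2))."""
    return min(math.dist(a, b) for a, b in combinations(coords, 2))

def flag_suspect_frames(frames, min_allowed=0.5):
    """Indices of frames whose closest pair is nearer than min_allowed.

    min_allowed is an illustrative cutoff in the same units as the coordinates.
    """
    return [i for i, coords in enumerate(frames)
            if closest_pair_distance(coords) < min_allowed]

if __name__ == "__main__":
    frames = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],  # reasonable spacing
        [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 1.0, 0.0)],  # two particles overlap
    ]
    print(flag_suspect_frames(frames))
```

A check like this only catches one kind of gross error; visual inspection of the stored frames remains the more general safeguard.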
Systematic Error: Characterization
• Results in a constant bias or skew from the expected result.
[Figure: expected distribution, biased distribution, skewed distribution]

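Systematic Error: Characterization
• Calculated values for simple thermodynamic properties should be normally distributed:
  $$p(A) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(A - \langle A \rangle)^2}{2\sigma^2}\right]$$
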
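Systematic Error: Characterization
1. Sort the data into bins of approximately equal occupancy; the expected number in each bin follows from the normal distribution.
2. Calculate the chi-squared statistic from the observed and expected bin counts ($\chi^2 > 1$ indicates a poor match).
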
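The two steps can be sketched as follows. This is a minimal illustration, not the slide's exact formula: it assumes the common Pearson form of the statistic, bins chosen at equally spaced quantiles of the fitted normal (so each bin expects the same count), and a reduced chi-squared (divided by the degrees of freedom) as the quantity compared against 1. The function name and the choice of 10 bins are hypothetical.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev

def chi_squared_normality(values, n_bins=10):
    """Compare values with a fitted normal; return (chi2, reduced_chi2)."""
    m = len(values)
    fitted = NormalDist(fmean(values), pstdev(values))
    # Edges at equally spaced quantiles, so every bin expects m / n_bins values.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    observed = [0] * n_bins
    for v in values:
        observed[bisect.bisect_right(edges, v)] += 1
    expected = m / n_bins
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    # Assumption: degrees of freedom = bins - 1 - 2 fitted parameters.
    dof = n_bins - 3
    return chi2, chi2 / dof

if __name__ == "__main__":
    n = 1000
    std = NormalDist()
    # Deterministic "ideal" samples placed at evenly spaced quantiles.
    normal_like = [std.inv_cdf((i + 0.5) / n) for i in range(n)]
    skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]  # exponential shape
    print(chi_squared_normality(normal_like))
    print(chi_squared_normality(skewed))
```

The normal-like sample should give a reduced value well below 1, while the skewed sample should give a value far above it.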
Systematic Error: Sources
• Four main sources of systematic error:
  • The model (limitations of the basis set, functional, etc.)
  • The algorithms used (e.g., drift in Euler integration of a differential equation)
  • Numerical precision (round-off and quantization error)
  • Implementation (programming error)

Systematic Error: The Fix
• Systematic errors are most easily isolated when several algorithms are applied:
  • to several different chemical systems,
  • on several different computers,
  • using several different compilers,
  • etc.

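As a small illustration of isolating an algorithmic error by running more than one algorithm on the same problem, the sketch below integrates a unit harmonic oscillator (mass and force constant both 1, an assumed toy system) with explicit Euler and with velocity Verlet. The exact total energy is 0.5, so any steady trend in the energy history comes from the integrator rather than the model.

```python
def explicit_euler(n_steps, dt=0.01):
    """Energy history of a unit harmonic oscillator under explicit Euler."""
    x, v = 1.0, 0.0
    energies = []
    for _ in range(n_steps):
        energies.append(0.5 * (x * x + v * v))
        x, v = x + dt * v, v - dt * x  # both updates use the old state
    return energies

def velocity_verlet(n_steps, dt=0.01):
    """Energy history of the same oscillator under velocity Verlet."""
    x, v = 1.0, 0.0
    energies = []
    for _ in range(n_steps):
        energies.append(0.5 * (x * x + v * v))
        a_old = -x
        x = x + dt * v + 0.5 * dt * dt * a_old
        a_new = -x
        v = v + 0.5 * dt * (a_old + a_new)
    return energies

if __name__ == "__main__":
    euler = explicit_euler(1001)
    verlet = velocity_verlet(1001)
    print("Euler  energy change:", euler[-1] - euler[0])
    print("Verlet energy change:", verlet[-1] - verlet[0])
```

For this system the explicit Euler energy grows by a factor of (1 + dt²) every step, a systematic drift, whereas the velocity Verlet energy only oscillates within a narrow band around 0.5.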
Statistical Error: Characterization
• Values are normally distributed about the set average, whose standard error is:
  $$\sigma_{\langle A \rangle} = \frac{\sigma}{\sqrt{M}}$$
• M is the number of independent data values.

Statistical Error: Relaxation Time and Statistical Inefficiency
• Successive data values are strongly correlated, not independent.
• To find the effective M, we need to know the statistical inefficiency of the system.

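Statistical Error: Relaxation Time and Statistical Inefficiency
• We begin by dividing our M sequential configurations into b blocks, each containing $n_b$ values of the property A, and averaging within each block:
  $$\langle A \rangle_i = \frac{1}{n_b} \sum_{j=(i-1)n_b+1}^{i\,n_b} A_j, \qquad i = 1, \dots, b$$
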
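Statistical Error: Relaxation Time and Statistical Inefficiency
• The variance of the block averages is then given by:
  $$\sigma^2(\langle A \rangle_b) = \frac{1}{b} \sum_{i=1}^{b} \left(\langle A \rangle_i - \langle A \rangle_{total}\right)^2$$
• where $\langle A \rangle_i$ is the average for the ith block and $\langle A \rangle_{total}$ is the average calculated only over those values covered in the blocks.
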
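The blocking step and the variance of the block averages can be sketched in a few lines. The function names are hypothetical; trailing values that do not fill a complete block are dropped, so the overall average covers only the values inside the blocks, as described above.

```python
from statistics import fmean

def block_averages(values, n_b):
    """Averages of consecutive, non-overlapping blocks of n_b values."""
    b = len(values) // n_b  # number of complete blocks
    if b == 0:
        raise ValueError("n_b is larger than the number of values")
    return [fmean(values[i * n_b:(i + 1) * n_b]) for i in range(b)]

def block_variance(values, n_b):
    """Variance of the block averages about the average of the covered values."""
    avgs = block_averages(values, n_b)
    total = fmean(values[:len(avgs) * n_b])  # leftover tail values are excluded
    return sum((a - total) ** 2 for a in avgs) / len(avgs)

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7]  # the final value does not fill a block of 2
    print(block_averages(data, 2))
    print(block_variance(data, 2))
```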
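Statistical Error: Relaxation Time and Statistical Inefficiency
• For large $n_b$, the block averages $\langle A \rangle_i$ become uncorrelated and:
  $$\sigma^2(\langle A \rangle_b) \propto \frac{1}{n_b}$$
• Next, define the statistical inefficiency s:
  $$s = \lim_{n_b \to \infty} \frac{n_b\,\sigma^2(\langle A \rangle_b)}{\sigma^2(A)}$$
  and, finally,
  $$\sigma^2(\langle A \rangle_{total}) = \frac{s}{M}\,\sigma^2(A)$$
  so that
  $$\sigma(\langle A \rangle_{total}) = \sigma(A)\sqrt{\frac{s}{M}}$$
• Here $\sigma^2(\langle A \rangle_b)$ is the variance of the block averages and $\sigma^2(A)$ is the variance of the individual values.
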
Statistical Error: Relaxation Time and Statistical Inefficiency
• Solving for s gives a quantity that can be visualized in two ways:
  • The factor by which the variance exceeds a naïve estimate (statistical inefficiency); or
  • The number of steps per block required to give uncorrelated block averages (relaxation time).

Statistical Error: Relaxation Time and Statistical Inefficiency
• In practice, s is estimated from the plateau of a plot of $n_b\,\sigma^2(\langle A \rangle_b)/\sigma^2(A)$ against block size $n_b$ (or $\sqrt{n_b}$).

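A rough sketch of how such a plateau can be located numerically is given below. It assumes the common block-averaging estimate s ≈ n_b · var(block averages) / var(individual values), and it uses synthetic correlated data from a first-order autoregressive process as a stand-in for simulation output. For that process with coefficient phi, the plateau should lie near (1 + phi) / (1 - phi). All function names are hypothetical.

```python
import math
import random
from statistics import fmean

def variance(xs):
    """Population variance using plain floats."""
    mu = fmean(xs)
    return fmean([(x - mu) ** 2 for x in xs])

def ar1_series(m, phi, seed=0):
    """Synthetic correlated data: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(m):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def inefficiency_estimate(values, n_b):
    """n_b * var(block averages) / var(individual values) for one block size."""
    b = len(values) // n_b
    covered = values[:b * n_b]  # drop the incomplete trailing block
    block_means = [fmean(covered[i * n_b:(i + 1) * n_b]) for i in range(b)]
    return n_b * variance(block_means) / variance(covered)

def error_of_mean(values, s):
    """Standard error of the mean corrected by s: sigma(A) * sqrt(s / M)."""
    return math.sqrt(variance(values) * s / len(values))

if __name__ == "__main__":
    data = ar1_series(200_000, phi=0.9)
    for n_b in (1, 10, 100, 1000, 5000):
        print(n_b, round(inefficiency_estimate(data, n_b), 2))
```

The estimates rise with block size and then level off; at very large block sizes only a few blocks remain, so the estimate becomes noisy and the plateau value should be read before that point.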
Statistical Error: Relaxation Time and Statistical Inefficiency
• Care must be taken to avoid boundary effects.

Statistical Error: Application of Statistical Inefficiency: Sampling
• The simulation is divided into blocks of size $n_b \geq s$.
• Blocks may be sampled in one of three ways:
  • Stratified systematic sampling
  • Stratified random sampling
  • Coarse graining
• Coarse graining is most commonly applied to scalar properties; sampling is applied otherwise.

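The three options might look like this in code. The sketch assumes the usual meanings of the terms: systematic sampling keeps the value at a fixed position in every block, random sampling keeps one randomly chosen value per block, and coarse graining keeps each block's average. The function names and the `offset` parameter are hypothetical.

```python
import random
from statistics import fmean

def blocks(values, n_b):
    """Consecutive complete blocks of n_b values; a short tail is dropped."""
    return [values[i:i + n_b] for i in range(0, len(values) - n_b + 1, n_b)]

def stratified_systematic(values, n_b, offset=0):
    """Keep the value at the same position (offset) in every block."""
    return [blk[offset] for blk in blocks(values, n_b)]

def stratified_random(values, n_b, rng=None):
    """Keep one randomly chosen value from every block."""
    rng = rng or random.Random()
    return [rng.choice(blk) for blk in blocks(values, n_b)]

def coarse_grain(values, n_b):
    """Replace every block by its average."""
    return [fmean(blk) for blk in blocks(values, n_b)]

if __name__ == "__main__":
    data = list(range(10))  # stand-in for a stored scalar property
    print(stratified_systematic(data, 3))
    print(stratified_random(data, 3, random.Random(0)))
    print(coarse_grain(data, 3))
```

Coarse graining needs a quantity that can be averaged, which is why it suits scalar properties; for stored configurations, picking one member per block is the natural alternative.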
Statistical Error: Sources
• Arises from the finite nature of the simulation:
  • Finite number of atoms or molecules considered
  • Finite number of sequential values taken
  • Finite precision retained in intermediate values

Statistical Error: The Fix
• Three main approaches:
  • Increase the number of atoms or molecules considered in the simulation;
  • Increase the duration of the simulation (number of samples taken); or
  • Reduce the statistical inefficiency of the algorithms used.