Sampling and Variability (Chapter 5.1 - 5.4)
Chengyuan Peng 92777A pcy@tcm.hut.fi
Purpose of Sampling
• What is a data population?
• Problems with using all of the data
  • The whole data set is not available
  • Too much data
• It is necessary to sample the data when building models
• Capture a sample:
  • To represent the population using only some part of it
Variability of Variables
• Main feature of a variable
  • Takes on a variety of values
  • Its values follow a distribution pattern
  • Numerical variables
  • Categorical variables
• Graphical display of a distribution pattern
  • Histogram, curve
• Problems
  • Convergence: the true population distribution pattern is unknown
  • Measuring variability: which distribution curve is the right one to use?
Converging
• To create a distribution curve for the sample
  • Select instance values one at a time, at random
  • Recalculate the curve each time a new instance value is added
• Convergence
  • At first: each new value causes a large change
  • After a while: the curve settles down and converges to its final shape
• Summary
  • What is measured is not the shape of the curve but the variability of the sample
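The convergence procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic (Gaussian), the curve is approximated by a 10-bin histogram of relative frequencies, and the "change" is taken to be the summed absolute difference between successive histograms. The names `histogram` and `convergence_trace` are hypothetical and do not come from the slides.

```python
import random


def histogram(values, bins, lo, hi):
    """Relative frequencies of values over equal-width bins on [lo, hi].

    Assumes hi > lo, i.e. the population is not constant.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def convergence_trace(population, steps, bins=10, seed=0):
    """Add one random instance at a time; record how much the histogram moves."""
    rng = random.Random(seed)
    lo, hi = min(population), max(population)
    sample, previous, changes = [], None, []
    for _ in range(steps):
        sample.append(rng.choice(population))  # sampling with replacement (assumption)
        current = histogram(sample, bins, lo, hi)
        if previous is not None:
            # Summed absolute difference between successive histograms.
            changes.append(sum(abs(a - b) for a, b in zip(current, previous)))
        previous = current
    return changes


# Synthetic population, used only for illustration.
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

changes = convergence_trace(population, steps=2000)
early = sum(changes[:20]) / 20    # average change over the first additions
late = sum(changes[-20:]) / 20    # average change over the last additions
print(f"early change: {early:.4f}, late change: {late:.4f}")
```

With this setup the late changes should be much smaller than the early ones, which is the "settling down" behaviour described on the slide.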
Measuring Variability
• Requires some method of measuring variability
  • Without being sensitive to histogram column width or smoothing method
• What is variability?
  • How far the individual instances are from the mean of the sample
• Standard deviation: one popular measure
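As a small worked example of the measure named above, the sketch below computes the standard deviation of a sample directly from its definition: the distance of each instance from the sample mean. The slides do not say which denominator is intended, so the use of n - 1 (the sample standard deviation) is an assumption, as are the function names and the example values.

```python
import math


def mean(values):
    """Arithmetic mean of the sample."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation, using an n - 1 denominator (assumption)."""
    m = mean(values)
    squared_deviations = sum((v - m) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


# Illustrative values only; mean is 5.0, squared deviations sum to 32.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(sample), sample_std(sample))
```

Because the result depends only on the instance values, it is unaffected by histogram column width or smoothing choices.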
Why Confidence
• An alternative to examining the whole population
• To establish some acceptable degree of confidence
  • 95% as a satisfactory level of confidence

Variability of Numeric and Alpha Variables
• Distinction
  • Alpha: nominal / categorical variables, measured on nonnumeric scales
  • Numeric: measured on numeric scales
• The two types differ in how variability is measured
Measuring Variability of Numeric Variables
• Covered above
• Random sampling, without introducing bias
• Measuring variability of alpha variables
  • Used instead of standard deviation: Rate of Discovery (ROD)
  • Measures the rate of change of the relative proportion of values discovered
  • As the sample size increases, the ROD of new alpha values falls
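The slides do not give a formula for the Rate of Discovery, so the following is only one plausible reading: split the sample into consecutive windows and, for each window, report the fraction of instances that introduced a value not seen earlier. The function name `rate_of_discovery`, the window size, and the synthetic 20-category data are all assumptions made for illustration.

```python
import random


def rate_of_discovery(values, window):
    """Fraction of instances per window that introduce a previously unseen value."""
    seen = set()
    rates = []
    for start in range(0, len(values) - window + 1, window):
        new = 0
        for v in values[start:start + window]:
            if v not in seen:
                seen.add(v)
                new += 1
        rates.append(new / window)
    return rates


# Synthetic alpha variable with 20 possible labels (illustrative only).
rng = random.Random(0)
categories = [f"cat_{i}" for i in range(20)]
sample = [rng.choice(categories) for _ in range(1000)]

rates = rate_of_discovery(sample, window=100)
print(rates)
```

Under this reading, the rate is highest in the first window and falls toward zero as the sample grows, matching the last bullet above.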