Explore the purpose of sampling, the problems with using all data, and the variability of variables. Learn how to measure variability and establish confidence levels.
Sampling and Variability (Chapters 5.1 - 5.4) Chengyuan Peng 92777A pcy@tcm.hut.fi
Purpose of Sampling • What is a Data Population • Problems with using all of the data • The whole data set is not available • Too much data • Necessary to sample the data when building models • Capture a Sample: • To represent the population using only some part of it
Variability of Variables • Main Feature of a Variable • Takes on a variety of values • Its values follow a distribution pattern • Numerical variables • Categorical variables • Graphical Display of a Distribution Pattern • Histogram, Curve • Problems • Convergence: the true population distribution pattern is unknown • Measuring Variability: which distribution curve is the right one to use?
Converging • To Create a Distribution Curve for the Sample • Select instance values one at a time, at random • Recalculate the curve each time a new instance value is added • Converge • At first: each new value causes a large change • After a while: the curve settles down -> converges to the final shape • Summary • What is measured is not the shape of the curve, but the variability of the sample
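The convergence described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own procedure: it assumes variability is tracked as the sample standard deviation, and the function name `running_std`, the seed, and the synthetic population are all hypothetical choices made for the example.

```python
import math
import random


def running_std(values, seed=0):
    """Add instances one at a time in random order and return the sample
    standard deviation after each addition (from the second instance on)."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)          # random selection without replacement

    history = []
    n, mean, m2 = 0, 0.0, 0.0      # running count, mean, sum of squared deviations
    for x in shuffled:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # incremental (Welford-style) update
        if n >= 2:
            history.append(math.sqrt(m2 / (n - 1)))
    return history


# Hypothetical population: 2000 values drawn from a normal distribution.
_rng = random.Random(42)
population = [_rng.gauss(50.0, 10.0) for _ in range(2000)]
history = running_std(population)

# Largest step-to-step change early in sampling versus late in sampling.
steps = [abs(b - a) for a, b in zip(history, history[1:])]
early_change = max(steps[:50])
late_change = max(steps[-50:])
print(f"largest early change: {early_change:.4f}")
print(f"largest late change:  {late_change:.4f}")
```

The early changes should be much larger than the late ones, which is the "settling down" the slide refers to. When the late changes fall below a tolerance you find acceptable, the sample can be treated as having captured the variability of the variable.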
Measuring Variability • Requires Some Method of Measuring Variability • Must not be sensitive to histogram column width or smoothing method • What is Variability • How far the individual instances are from the mean of the sample • Standard Deviation: one popular measure
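As a small worked example of "how far the individual instances are from the mean", the standard deviation can be computed step by step. The function name `std_dev`, its `ddof` parameter, and the data values are assumptions made for illustration; they do not come from the slides.

```python
import math


def std_dev(values, ddof=1):
    """Standard deviation of `values`.

    ddof=1 gives the sample standard deviation (divide by n - 1);
    ddof=0 gives the population standard deviation (divide by n).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough values")
    mean = sum(values) / n
    # Squared distance of each instance from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative values, mean = 5
population_sd = std_dev(data, ddof=0)
sample_sd = std_dev(data, ddof=1)
print(population_sd, sample_sd)
```

Because the result depends only on the instance values, it is unaffected by how a histogram of the same data is binned or smoothed.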
Why Confidence • An alternative to examining the whole population • Establish some acceptable degree of confidence that the sample represents the population • 95% is often taken as a satisfactory level of confidence
Variability of Numeric and Alpha Variables • Distinction • Alpha: nominal / categorical; measured on nonnumeric scales • Numeric: measured on numeric scales • The two differ in how variability is measured
Measuring Variability of Numeric Variables • Covered above • Random sampling avoids introducing bias
Measuring Variability of Alpha Variables • Instead of standard deviation • Rate of Discovery (ROD): • Measures the rate of change of the relative proportion of values discovered • As sample size increases, the ROD of new alpha values falls
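The falling rate of discovery can be illustrated with a short sketch. The slides do not give an exact formula, so this assumes one simple reading: the ROD for a batch of randomly drawn instances is the fraction of that batch whose value had not been seen before. The function name `rate_of_discovery`, the `window` size, and the synthetic category data are hypothetical.

```python
import random


def rate_of_discovery(values, window=100, seed=0):
    """Draw instances in random order and, for each consecutive batch of
    `window` instances, return the fraction that revealed a new value."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)

    seen = set()
    rates = []
    for start in range(0, len(shuffled), window):
        batch = shuffled[start:start + window]
        new_values = 0
        for v in batch:
            if v not in seen:      # first time this alpha value appears
                seen.add(v)
                new_values += 1
        rates.append(new_values / len(batch))
    return rates


# Hypothetical alpha variable: 20 categories with unequal frequencies
# (category "c0" appears 10 times, "c19" appears 200 times; 2100 in total).
categories = [f"c{i}" for i in range(20) for _ in range((i + 1) * 10)]
rates = rate_of_discovery(categories, window=100)
print(rates)
```

The first batch should discover most of the categories, while later batches discover few or none. Once the rate stays near zero, drawing more instances is unlikely to reveal new alpha values.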