
Sampling and Variability (Chapter 5.1 - 5.4)

Chengyuan Peng 92777A pcy@tcm.hut.fi




Presentation Transcript


  1. Sampling and Variability (Chapter 5.1 - 5.4)
     Chengyuan Peng 92777A pcy@tcm.hut.fi

  2. Purpose of Sampling
     • What is a data population?
     • Problems with using all of the data
       • The whole data set is not available
       • There is too much data
     • It is necessary to sample the data when building models
     • Capturing a sample
       • The sample represents only some part of the population
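The slide does not say how a sample should be captured. A common choice, and the one assumed in this minimal Python sketch, is a simple random sample drawn without replacement; the `capture_sample` helper and the 10,000-record population are hypothetical.

```python
import random


def capture_sample(population, sample_size, seed=None):
    """Draw a simple random sample, without replacement, from `population`."""
    if not 0 <= sample_size <= len(population):
        raise ValueError("sample_size must be between 0 and len(population)")
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population of 10,000 record IDs.
population = list(range(1, 10_001))
sample = capture_sample(population, 500, seed=42)
print(len(sample), len(set(sample)))  # 500 500 (no record is drawn twice)
```

Drawing without replacement keeps every instance in the sample distinct, so the sample stays a genuine subset of the population.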

  3. Variability of Variables
     • Main feature of a variable
       • Takes on a variety of values
       • Has a distribution pattern, for both numerical and categorical variables
     • Graphical display of a distribution pattern: histogram, curve
     • Problems
       • Convergence: the true population distribution pattern is unknown
       • Measuring variability: which distribution curve is the right one to use?
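To illustrate why the choice of curve is a problem, this minimal sketch bins the same ten made-up values with two different column widths. The `histogram` helper is hypothetical; the point is only that the counts, and therefore the apparent shape of the distribution, change with the width chosen.

```python
def histogram(values, bin_width, start=0):
    """Count values per column [start + i*bin_width, start + (i+1)*bin_width).

    Returns (lower_edge, count) pairs in order; empty columns are omitted.
    """
    counts = {}
    for v in values:
        column = int((v - start) // bin_width)
        counts[column] = counts.get(column, 0) + 1
    return [(start + c * bin_width, counts[c]) for c in sorted(counts)]


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up sample
print(histogram(data, bin_width=1))  # [(1, 1), (2, 2), (3, 3), (4, 2), (5, 1), (9, 1)]
print(histogram(data, bin_width=5))  # [(0, 8), (5, 2)]
```

With width 1 the data look like a peak near 3 with an outlier at 9; with width 5 they look like two lopsided blocks.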

  4. Converging
     • To create a distribution curve for the sample
       • Select instance values one at a time, at random
       • Recalculate the curve each time a new instance value is added
     • Convergence
       • At first, each new value causes a large change
       • After a while the curve settles down and converges to its final shape
     • Summary
       • What is measured is not the shape of the curve but the variability of the sample
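The convergence procedure can be sketched in Python under a few assumptions: the population is synthetic (normally distributed, mean 50, standard deviation 10), instance values are drawn at random with replacement, and the statistic recalculated after each addition is the sample standard deviation. Rather than recomputing from scratch, the hypothetical `running_stdev` uses Welford's online update, a standard technique swapped in here; it gives the same value as recomputing at every step.

```python
import math
import random


def running_stdev(values):
    """Return the sample standard deviation after each value is added.

    Uses Welford's online update, so each step costs O(1) instead of
    recomputing over the whole sample.
    """
    n, mean, m2 = 0, 0.0, 0.0
    history = []
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n          # updated running mean
        m2 += delta * (x - mean)   # running sum of squared deviations
        if n >= 2:                 # standard deviation needs two or more values
            history.append(math.sqrt(m2 / (n - 1)))
    return history


rng = random.Random(1)
# Hypothetical population: 100,000 values from a normal distribution (mean 50, sd 10).
population = [rng.gauss(50, 10) for _ in range(100_000)]
# Select instance values one at a time, at random (with replacement).
draws = [rng.choice(population) for _ in range(2_000)]
history = running_stdev(draws)

print("first changes:", [round(abs(b - a), 3) for a, b in zip(history[:5], history[1:6])])
print("last changes: ", [round(abs(b - a), 4) for a, b in zip(history[-6:-1], history[-5:])])
```

The early step-to-step changes should be large and the late ones tiny, which is the settling down the slide describes.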

  5. Measuring Variability
     • Requires some method of measuring variability
       • One that is not sensitive to the histogram column width or the smoothing method
     • What is variability?
       • How far the individual instances lie from the mean of the sample
     • Standard deviation: one popular measure
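As a worked example of the measure named on the slide, the sample standard deviation is the square root of the average squared distance of the instances from the sample mean, using n - 1 in the denominator: s = sqrt( Σ (x_i - x̄)² / (n - 1) ). The `sample_std_dev` helper and the data below are illustrative only.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation: how far instances lie from the sample mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / (n - 1))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, mean = 5
print(round(sample_std_dev(values), 3))  # 2.138
# Cross-check against the standard library implementation.
print(math.isclose(sample_std_dev(values), statistics.stdev(values)))  # True
```

Because it is computed directly from the raw values, this measure does not depend on any histogram column width or smoothing method.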

  6. Why Confidence
     • An alternative to sampling the whole population
     • Establish some acceptable degree of confidence instead
       • 95% is taken as a satisfactory level of confidence
     Variability of Numeric and Alpha Variables
     • Distinction
       • Alpha: nominal/categorical variables, measured on nonnumeric scales
       • Numeric: measured on numeric scales
     • The two types are handled differently when measuring variability
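The slides do not say which statistic the 95% level is attached to. One common reading, assumed in this sketch, is a normal-approximation confidence interval for the mean: mean ± z · s / sqrt(n), with z ≈ 1.96 for 95%. `mean_confidence_interval` is a hypothetical helper, and for a sample this small a t-based interval would be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err


low, high = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(f"95% interval for the mean: {low:.2f} to {high:.2f}")
```

The interval narrows as the sample grows, since the standard error shrinks with sqrt(n).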

  7. Measuring Variability of Numeric Variables
     • Covered above
     • Random sampling without introducing bias
     Measuring Variability of Alpha Variables
     • Instead of the standard deviation, use the rate of discovery (ROD)
       • Measures the rate of change of the relative proportion of values discovered
       • As the sample size increases, the ROD of new alpha values falls
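The slides define ROD only loosely. The sketch below assumes one concrete version: split the sample, in the order the instances were drawn, into consecutive windows and report, for each window, the fraction of instances whose alpha value had not appeared earlier in the sample. The `rate_of_discovery` function, the window size and the skewed 50-label population are all hypothetical.

```python
import random


def rate_of_discovery(values, window=100):
    """Fraction of instances in each consecutive window whose alpha value
    had not appeared earlier in the sample (a hypothetical ROD definition).
    A trailing partial window is ignored."""
    seen = set()
    rates = []
    new_in_window = 0
    for i, value in enumerate(values, start=1):
        if value not in seen:      # a previously undiscovered alpha value
            seen.add(value)
            new_in_window += 1
        if i % window == 0:        # close the current window
            rates.append(new_in_window / window)
            new_in_window = 0
    return rates


rng = random.Random(7)
# Hypothetical categorical population: 50 labels, a few common and many rare.
labels = [f"cat{k}" for k in range(50)]
weights = [1 / (k + 1) for k in range(50)]
sample = rng.choices(labels, weights=weights, k=1_000)

print(rate_of_discovery(sample, window=100))  # ROD per 100 instances; should trend down
print(rate_of_discovery(["a", "b", "a", "c", "a", "a"], window=2))  # [1.0, 0.5, 0.0]
```

Once most labels have been seen, new windows add few or no new values, so the ROD drops toward zero as the sample grows.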
