CHAPTER 2. Populations and Samples. Graduate school approach to problem solving. OUTLINE. 2.1 Selecting Appropriate Samples Explains why the selection of an appropriate sample has an important bearing on the reliability of inferences about a population 2.2 Why Sample?
CHAPTER 2 Populations and Samples
OUTLINE 2.1 Selecting Appropriate Samples Explains why the selection of an appropriate sample has an important bearing on the reliability of inferences about a population 2.2 Why Sample? Gives a number of reasons sampling is often preferable to census taking 2.3 How Samples Are Selected Explains how samples are selected 2.4 How to Select a Random Sample Illustrates with a specific example the method of selecting a random sample using a computer statistical package 2.5 Effectiveness of a Random Sample Demonstrates the credibility of the random sampling process 2.6 Missing and Incomplete Data Explains the problem of missing or incomplete data and offers suggestions on how to minimize this problem
LEARNING OBJECTIVES 1.Distinguish between a.populations and samples b.parameters and statistics c.various methods of sampling 2.Explain why the method of sampling is important 3.State why samples are used 4.Define random sample 5.Explain why it is important to use random sampling 6.Select a random sample using a computer statistical program 7.Suggest methods for dealing with missing data
SELECTING APPROPRIATE SAMPLES A. Population – a set of persons (or objects) having a common observable characteristic B. Sample – a subset of a population C. The WAY a sample is selected is more important than the size of the sample D. An appropriate sample should be representative of the population E. A characteristic of a population is called a parameter; the corresponding descriptive summary computed from a sample is called a statistic
SELECTING APPROPRIATE SAMPLES F. Random sample 1. Every subject has an equal opportunity of being selected 2. Technique most likely to yield a representative sample 3. Obstacles a. Response rate – how many will respond b. Sampling bias – some segment of the population may be over- or underrepresented c. May be too costly
WHY SAMPLE? A.Random sampling - Each subject in the population has an equal chance of being selected 1.Avoids known and unknown biases on average 2.Helps convince others that the trial was conducted properly 3.Basis for statistical theory that underlies hypothesis tests and confidence intervals B.Convenience samples 1.selected at will or in a particular program 2.seldom representative of the underlying population 3.used when random samples are virtually impossible to select
WHY SAMPLE? C. Systematic sampling 1. used when a sampling frame – a complete, nonoverlapping list of the persons or objects constituting the population – is available 2. randomly select a first case, then proceed by selecting every kth case D. Stratified sampling – used when we wish the sample to represent the various strata (subgroups) of the population proportionately or to increase the precision of the estimate E. Cluster sampling 1. select a simple random sample of clusters (e.g., a number of city blocks) 2. more economical than random selection of persons throughout the city
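The three schemes on this slide can be sketched in a few lines of Python. This is an illustration only: the sampling frame of 100 subject IDs, the two age strata, and the ten "city blocks" are hypothetical, and proportionate allocation is assumed for the stratified case.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k cases, then take every kth case."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from every stratum (proportionate allocation)."""
    chosen = []
    for members in strata.values():
        n = round(len(members) * fraction)
        chosen.extend(rng.sample(members, n))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every person in them."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [person for name in picked for person in clusters[name]]

rng = random.Random(1)                      # fixed seed so the run is repeatable
frame = list(range(1, 101))                 # hypothetical sampling frame of 100 IDs
strata = {"under_40": frame[:60], "40_plus": frame[60:]}              # hypothetical strata
clusters = {f"block_{b}": frame[b * 10:(b + 1) * 10] for b in range(10)}  # hypothetical blocks

systematic = systematic_sample(frame, k=10, rng=rng)
stratified = stratified_sample(strata, fraction=0.10, rng=rng)
clustered = cluster_sample(clusters, n_clusters=3, rng=rng)
print(len(systematic), len(stratified), len(clustered))
```

With these inputs the stratified sample keeps the 60/40 split of the population (6 and 4 subjects), which is the "proportionate" property the slide refers to.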
HOW TO SELECT A RANDOM SAMPLE • Random Numbers Table: Appendix E, pg. 335 • Computer statistical package SPSS
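The slide names a random numbers table and SPSS. As a stand-in, and not the SPSS procedure itself, here is a minimal Python sketch of the same idea using the standard library; the population of 100 subject IDs and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n subjects without replacement; every subject is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(sampling_frame), n)

subject_ids = range(1, 101)    # hypothetical population of 100 subject IDs
sample = simple_random_sample(subject_ids, n=10, seed=2)
print(sorted(sample))
```

Fixing the seed makes the draw reproducible, which helps when others need to verify how the sample was chosen.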
EFFECTIVENESS OF A RANDOM SAMPLE • A. Reliability is usually demonstrated by • 1. defining a fairly small population • 2. selecting from it all conceivable samples of a particular size • 3. computing the mean of each sample • 4. observing the variation of these sample means • 5. comparing these sample means (statistics) with the population mean (parameter), which neatly demonstrates the credibility of the sampling scheme
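The demonstration described on this slide can be run directly on a toy population. The five values below are hypothetical; the sketch enumerates every possible sample of size 2, computes each sample mean, and compares them with the population mean.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]     # hypothetical small population
n = 2                             # sample size

population_mean = mean(population)                        # the parameter
sample_means = [mean(s) for s in combinations(population, n)]  # the statistics

print("number of possible samples:", len(sample_means))
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
print("mean of all sample means:", mean(sample_means))
print("population mean:", population_mean)
```

Individual sample means vary around the population mean, but their average equals it, which is the sense in which random sampling is unbiased.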
MISSING AND INCOMPLETE DATA A. Bias may be introduced because of possible differences between respondents and nonrespondents B. Limits the ability to accurately draw inferences about the population C. Subjects may drop out of the study D. Ways to deal with missing data 1. Last observation carry-forward – take the last observed value prior to dropout and treat it as the final data
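Last observation carry-forward is simple enough to sketch. The example below assumes missing visits are recorded as `None`; the blood pressure readings are made up for illustration.

```python
def last_observation_carried_forward(visits):
    """Replace each missing visit (None) with the most recent observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical subject who dropped out after the third visit
systolic = [142, 138, 135, None, None]
print(last_observation_carried_forward(systolic))  # [142, 138, 135, 135, 135]
```

A value missing before any observation stays `None`, since there is nothing to carry forward.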
Understanding and Reducing Errors • Goals of Data Collection and Analysis • Promoting accuracy and precision • Reducing differential and nondifferential errors • Reducing intraobserver and interobserver variability • Accuracy and Usefulness • False-positive and false-negative results • Sensitivity and specificity • Predictive values • Likelihood ratios, odds ratios, and cutoff points • Receiver operating characteristic (ROC) curves • Measuring Agreement • Overall percentage agreement • Kappa test ratio
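Several of the measures listed above come from a 2×2 table. The slides do not give formulas, so the sketch below uses the standard textbook definitions; the counts and function names are hypothetical.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Standard 2x2 test measures: rows are test result, columns are true status."""
    sensitivity = tp / (tp + fn)          # diseased subjects who test positive
    specificity = tn / (tn + fp)          # nondiseased subjects who test negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
    }

def kappa(both_pos, only_a_pos, only_b_pos, both_neg):
    """Cohen's kappa for two observers rating the same subjects as +/-."""
    n = both_pos + only_a_pos + only_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    expected = ((both_pos + only_a_pos) * (both_pos + only_b_pos)
                + (only_b_pos + both_neg) * (only_a_pos + both_neg)) / n ** 2
    return (observed - expected) / (1 - expected)

summary = diagnostic_summary(tp=80, fp=10, fn=20, tn=90)   # hypothetical counts
agreement = kappa(40, 10, 10, 40)                          # hypothetical counts
print(summary)
print(round(agreement, 2))
```

Here overall percentage agreement is 80%, but kappa is lower because half of that agreement would be expected by chance alone.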
Promoting Precision and Accuracy • Accuracy: The ability of a measurement to be correct on the average. • Precision: the ability of a measurement to give the same result or a very similar result with repetition of the test. (reproducibility, reliability)
Figure (four target diagrams, each marking the True Value): Accurate and precise • Precise only • Accurate only • Neither accurate nor precise
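The distinction can also be shown numerically. In this sketch, accuracy is summarized as the difference between the mean of repeated measurements and the true value, and precision as their standard deviation; both summaries and the readings are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def describe(measurements, true_value):
    """Return (bias, spread): bias near 0 = accurate, small spread = precise."""
    bias = mean(measurements) - true_value
    spread = stdev(measurements)
    return bias, spread

true_value = 100
precise_only = [110, 111, 109, 110, 110]    # tightly grouped but off target
accurate_only = [90, 110, 95, 105, 100]     # centered on target but scattered

bias_p, spread_p = describe(precise_only, true_value)
bias_a, spread_a = describe(accurate_only, true_value)
print(bias_p, round(spread_p, 2))
print(bias_a, round(spread_a, 2))
```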
Differential and nondifferential error • Bias is a differential error • A nonrandom, systematic, or consistent error in which the values tend to be inaccurate in a particular direction. • Nondifferential errors are random errors
Bias • Three most problematic forms of bias in medicine: • 1. Selection (Sampling) Bias: The following are biases that distort results because of the selection process • Admission rate (Berkson’s) bias • Distortions in risk ratios occur as a result of different hospital admission rates among cases with the risk factor, cases without the risk factor, and controls with the risk factor, causing greatly different risk-factor probabilities to interfere with the outcome of interest. • Nonresponse bias • e.g., noncompliance of people who have scheduled interviews in their homes. • Lead time bias • A time differential between diagnosis and treatment among sample subjects may result in erroneous attribution of higher survival rates to superior treatment rather than early detection.
Bias • Three most problematic forms of bias in medicine: • 1. Selection (Sampling) Bias • Admission rate (Berkson’s) bias • Nonresponse bias • Lead time bias • 2. Information (misclassification) Bias • Recall bias • Differentials in memory capabilities of sample subjects • Interview bias • Blinding of interviewers to diseased and control subjects is often difficult. • Unacceptability bias • Patients reply with “desirable” answers
Bias • Three most problematic forms of bias in medicine: • 1. Selection (Sampling) Bias • Admission rate (Berkson’s) bias • Nonresponse bias • Lead time bias • 2. Information (misclassification) Bias • Recall bias • Interview bias • Unacceptability bias • 3. Confounding • A confounding variable has a relationship with both the dependent and independent variables that masks or potentiates the effect of the variable on the study.
Types of Variation • Discrete variables • Nominal variables • Dichotomous (Binary) variables • Ordinal (Ranked) variables • Continuous (Dimensional) variables • Ratio variables • Risks and Proportions as variables
Types of Variation • Nominal variables
Nominal variables (examples) • Blood type: A, B, AB, O • Social Security number: 123 45 6789, 312 65 8432, 555 44 7777
Types of Variation • Nominal variables • Dichotomous (Binary) variables
Dichotomous (Binary) variables (examples) • WNL (within normal limits) / Not WNL • Normal / Abnormal • Accept / Reject
Types of Variation • Nominal variables • Dichotomous (Binary) variables • Ordinal (Ranked) variables
Ordinal (Ranked) variables (example) • Strongly agree, agree, neutral, disagree, strongly disagree • coded a, b, c, d, e or 1, 2, 3, 4, 5
Types of Variation • Nominal variables • Dichotomous (Binary) variables • Discrete variables • Ordinal (Ranked) variables • Continuous (Dimensional) variables
Continuous (Dimensional) variables (examples) • Temperature (e.g., 32 °F) • Height • Blood pressure • Weight
Types of Variation • Nominal variables • Dichotomous (Binary) variables • Discrete variables • Ordinal (Ranked) variables • Continuous (Dimensional) variables • Ratio variables
Ratio variables • A continuous scale that has a true zero point
Types of Variation • Nominal variables • Dichotomous (Binary) variables • Discrete variables • Ordinal (Ranked) variables • Continuous (Dimensional) variables • Ratio variables • Risks and Proportions as variables
Risks and Proportions as variables • Variables created by the ratio of discrete counts in the numerator to counts in the denominator.
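As a small worked example of a variable built from counts, the sketch below computes a risk (events divided by persons at risk) for two groups and their ratio. The counts and the function name `risk` are hypothetical.

```python
def risk(events, persons_at_risk):
    """Proportion: a count in the numerator over a count in the denominator."""
    return events / persons_at_risk

exposed_risk = risk(30, 1000)       # hypothetical: 30 cases among 1000 exposed
unexposed_risk = risk(10, 1000)     # hypothetical: 10 cases among 1000 unexposed
risk_ratio = exposed_risk / unexposed_risk

print(exposed_risk, unexposed_risk, round(risk_ratio, 2))
```

Although the underlying counts are discrete, the resulting proportion can take many fractional values.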
CHAPTER 3 Organizing and Displaying Data
OUTLINE 3.1 CLASSIFYING AND ORGANIZING DATA Explains and illustrates numerical scales and distinguishes among qualitative data, discrete quantitative data, and continuous quantitative data 3.2 FIGURES, TABLES, AND GRAPHS Gives a brief overview of each 3.3 CREATING TABLES Gives instructions on how to organize data in the form of a frequency table 3.4 GRAPHING DATA Discusses and illustrates various methods of graphing, with an emphasis on those that apply specifically to frequency distributions
LEARNING OBJECTIVES 1.Distinguish between a.qualitative and quantitative variables b.discrete and continuous variables c.symmetrical, bimodal, and skewed distributions d.positively and negatively skewed distributions 2.Construct and interpret a frequency table that includes class intervals, class frequency, valid percent, and cumulative percent 3.Indicate the appropriate types of graphs for displaying quantitative and qualitative data 4. Distinguish which forms of data presentation are appropriate for different situations
CLASSIFYING AND ORGANIZING DATA • A. General Data Organization/Presentation Methods • 1. Tables • 2. Graphs • 3. Numerical Techniques • B. Common Scales Used to Measure Data • 1. Qualitative Data – variables that yield nominal-level data • a. Nominal – primarily used for grouping or categorizing data • b. Ordinal – ordered series of relationships • 2. Quantitative Data – numerically measured variables • a. Interval – the number zero is an artificial zero, e.g., temperature • b. Ratio – the number zero is true or absolute (total absence of the characteristic being measured), e.g., $ in your wallet
CLASSIFYING AND ORGANIZING DATA • C.Discrete Quantitative Variables • 1.discontinuous variables • 2.must always be integers – whole numbers • D.Continuous Quantitative Variables • 1. may take fractional values • 2.Examples • a.age • b.height • c.weight
CLASSIFYING AND ORGANIZING DATA • E.Spreadsheet Data Hints • 1.Verify the accuracy of manually input data • 2.For nominal or ordinal data – change the computer default decimal setting to zero decimal places • 3.Subject ID numbers • a.usually use the first column • b.set the decimal number to zero
FIGURES, TABLES, AND GRAPHS As defined by Publication Manual of the American Psychological Association (APA), Fifth Edition
FIGURES, TABLES, AND GRAPHS • A.FIGURES • 1. any type of illustration other than a table • 2.examples • a.charts • b.graphs • c.photographs • d.drawing • B.GRAPH - one particular type of figure • C.TABLE – typically used to display quantitative data • D.Primary Purpose of Graphs & Tables To visually display information in a manner that makes it easy for readers to comprehend
FREQUENCY TABLES • A. Frequency – refers to the number of cases with a particular value • B. Percent • 1. Valid Percent – percentage out of 100, using only those subjects with data • 2. Cumulative Percent – percentage of all previous cases plus the current interval • C. Class Intervals – usually equal in length, thereby aiding comparisons between intervals • D. Interval Width – the number of units between the upper and lower limits (the class limits) • E. Range – difference between the highest and lowest numbers • F. Class Boundaries – true limits, points that demarcate the true upper limit of one class and the true lower limit of the next
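The columns defined above can be computed with a short routine. This is a sketch under stated assumptions: ages are whole numbers, missing values are recorded as `None`, class intervals have equal width, and the function name and data are hypothetical.

```python
from collections import Counter

def frequency_table(values, lower, width, n_classes):
    """Rows of (class interval, frequency, valid percent, cumulative percent)."""
    valid = [v for v in values if v is not None]       # only subjects with data
    counts = Counter((v - lower) // width for v in valid)
    rows, cumulative = [], 0.0
    for i in range(n_classes):
        low = lower + i * width
        high = low + width - 1                         # class limits for integer data
        freq = counts.get(i, 0)
        valid_pct = 100 * freq / len(valid)
        cumulative += valid_pct
        rows.append((f"{low}-{high}", freq, round(valid_pct, 1), round(cumulative, 1)))
    return rows

ages = [23, 25, 31, 34, 36, 38, 41, 45, 47, 52, None, None]   # hypothetical data
rows = frequency_table(ages, lower=20, width=10, n_classes=4)
for row in rows:
    print(row)
```

The two missing ages are excluded from the denominator, so the valid percents sum to 100 over the ten subjects with data.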
GRAPHING DATA • A. Must be self-explanatory • 1. Descriptive title • 2. Labeled axes • 3. Indication of units of observation
GRAPHING DATA • B. Histograms • 1. Pictorial representation of the frequency table • 2. Components • a. Abscissa • i. Horizontal axis, which depicts the class boundaries (not the class limits) • b. Perpendicular Ordinate • i. Vertical axis, which depicts the frequency (or relative frequency) of observations • ii. Should begin at zero • c. Height of the vertical scale should be about three-fourths the length of the horizontal scale
GRAPHING DATA • C.Frequency Polygons • 1.Construction • a.uses the same axes as the histogram • b.constructed by marking a point (at same height as the histogram’s bar) at the midpoint of the class interval • c.These points are then connected • 2.Superior to histograms for comparing two frequency distributions • 3.Shapes • a.Symmetrical Distribution – Bell-Shaped • b.Bimodal Distribution – two peaks • c.Rectangular Distribution – each class interval is equally represented
GRAPHING DATA • D. Cumulative Frequency Polygons • 1. Also called an ogive • 2. Horizontal scale – same as the histogram • 3. Vertical scale indicates cumulative or relative cumulative frequency • 4. Construction • a. Place a point at the upper class boundary of each class interval • b. Each point represents the cumulative relative frequency for that class • c. Points should then be connected • 5. Percentiles – may be obtained from the ogive
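The construction steps above translate into a short calculation. The sketch assumes integer data, so the upper class boundary is the upper class limit plus 0.5, and reads percentiles off the ogive by linear interpolation between plotted points; the class intervals and frequencies are hypothetical.

```python
def ogive_points(class_intervals, frequencies):
    """(upper class boundary, cumulative relative frequency in %) per class."""
    total = sum(frequencies)
    points, running = [], 0
    for (low, high), freq in zip(class_intervals, frequencies):
        running += freq
        points.append((high + 0.5, 100 * running / total))
    return points

def percentile_from_ogive(points, lower_boundary, p):
    """Value at the p-th percentile, interpolating along the connected points."""
    prev_x, prev_y = lower_boundary, 0.0
    for x, y in points:
        if p <= y and y > prev_y:
            return prev_x + (p - prev_y) / (y - prev_y) * (x - prev_x)
        prev_x, prev_y = x, y
    return points[-1][0]

intervals = [(20, 29), (30, 39), (40, 49), (50, 59)]   # hypothetical class limits
frequencies = [2, 4, 3, 1]                             # hypothetical frequencies

points = ogive_points(intervals, frequencies)
median_estimate = percentile_from_ogive(points, lower_boundary=19.5, p=50)
print(points)
print(median_estimate)
```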
GRAPHING DATA • E.Stem-and-Leaf Displays • 1.Innovative technique of summarizing data that utilizes characteristics of the frequency distribution of the histogram • 2.Stems – represent the class intervals • 3.Leaves – strings of values within each class interval
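A stem-and-leaf display is easy to build by hand or in code. The sketch below assumes non-negative whole numbers, uses the tens digit as the stem and the units digit as the leaf, and runs on hypothetical ages.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens) to a string of its sorted leaves (units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

display = stem_and_leaf([23, 25, 31, 34, 36, 38, 41, 45, 47, 52])  # hypothetical ages
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Each row plays the role of a class interval, and the length of its leaf string mirrors the height of the corresponding histogram bar while keeping the original values visible.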
GRAPHING DATA • F.Bar Charts • 1.Particularly useful for displaying nominal or ordinal data • 2.Relative frequencies are shown by heights • 3.Scale on the vertical axis should begin at zero • G.Pie Charts • 1.A common device for displaying data arranged in categories • 2.Useful for conveying data that consists of a small number of categories
GRAPHING DATA • H. Box-and-Whisker Plots • 1. Uses median and quartile statistics to graphically examine data • 2. Median – the score that divides a ranked series into two equal halves • 3. When there is an even number of scores, the median is the average of the two middle scores • 4. Quartiles • a. Locate the median in the ordered list of observations • - 1st quartile is the median of the observations below this median • - 3rd quartile is the median of the observations above the original median
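The quartile rule on this slide (median of the lower half, median of the upper half) can be sketched as follows. Note that statistical packages use several different quartile conventions, so results may differ slightly from this one; the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Minimum, 1st quartile, median, 3rd quartile, maximum."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                 # observations below the median
    upper = data[half + (n % 2):]       # skip the median itself when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

odd_case = five_number_summary([13, 1, 9, 5, 3, 11, 7])
even_case = five_number_summary([8, 2, 6, 4])
print(odd_case)
print(even_case)
```

In the even case the median is the average of the two middle scores (4 and 6), as stated in item 3.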