Action Research: Descriptive Statistics and Surveys, INFO 515, Glenn Booker, Lecture #3
Reliability and Validity • A measure is reliable if it consistently gives the same answer • A key to scientific measurement is the ability to repeat an experiment reliably • A measure is valid if it actually measures the concept under investigation • It tests what you think it tests
Review Std Deviation and CV • Standard Deviation can be used to compare two (or more) groups that have the same units of measure and similar means • Coefficient of Variation (CV, the standard deviation divided by the mean) can compare two (or more) groups that have different reference points (means) and different standard deviations • Either measure shows which groups are more closely distributed around their mean
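The two comparisons can be sketched with Python's standard `statistics` module. The groups and their values are hypothetical, made up only to illustrate the idea. The sketch uses the sample standard deviation (`statistics.stdev`); the slide does not say whether the sample or population form is intended.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (undefined when the mean is 0)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical data: two groups measured around very different means
group_a = [10, 12, 14]      # mean 12, standard deviation 2
group_b = [100, 110, 120]   # mean 110, standard deviation 10

for name, data in (("A", group_a), ("B", group_b)):
    print(name, statistics.mean(data), statistics.stdev(data),
          round(coefficient_of_variation(data), 4))
```

Group B has the larger standard deviation (10 vs 2). However, it has the smaller CV (about 0.091 vs 0.167), so relative to its own mean it is the more tightly clustered group.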
Z Score • The Z Score is the ‘how weird am I’ measure for a given data point* • The standardized or ‘z’ score allows you to do either of the following: • Find where one or more individuals stand in reference to the mean of a single distribution on one unit of measure (one variable) • Where is an individual located relative to a distribution of test scores? • Am I better than average? If so, how much? * This is not an official ISO definition…
Z Score • Find where one or more individuals stand in reference to the mean of two (or more) different distributions that may have different units of measure • Where does an individual stand relative to two tests, each given in a different class (with different distributions)? • Did I do better on the midterm in philosophy than the one in geography?
Z Score • A z score tells you how far above or below the mean any given score is in standard deviation units • Z scores are most useful when the shape of your actual distribution of scores is nearly normal (see slide 9, or Action Research handout p. 11) • What’s the “normal” distribution?
Normal Distribution Example • Consider stopping a car at a traffic light • You don’t stop in exactly the same place each time, but generally stop somewhere behind or near the big white line (I hope!) • Where you are likely to stop can be described by a “normal distribution”
Normal Distribution • The normal, or Gaussian, distribution is the classic “bell curve” which shows that most measurements are somewhere close to the mean, but a few measurements could range far above or below that mean • It is symmetric, and extends forever above and below the mean
Normal Distribution • The normal distribution is described by two math functions • The function f(x) is the probability density function, often called a PDF; it represents how likely the answer is to fall near the current value of x • The function F(x) is the cumulative probability function; it represents the total chance of getting the current value of x or anything less • A.k.a. the cumulative distribution function, or CDF
Normal Distribution [Figure: ‘f(x)’ is the probability density function (the classic bell curve); ‘F(x)’ is the cumulative probability function]
Probability Density Function, f(x) [Figure: bell curve f(x) with points a and b marked on the x axis] • The chance you will stop (the event will occur) between any two distances ‘a’ and ‘b’ is the area under the curve f(x) between those two values
Probability Density Function, f(x) • Notice that f(x) is symmetric from left to right, and that it is defined for all possible values of x (x = negative infinity to x = positive infinity) • f(x) never reaches zero! • The total area under the curve f(x) is one • You will eventually stop somewhere • Unfortunately, f(x) is a messy function to integrate (find the area under it)
Cumulative Probability Function F(x) • Imagine you start at x equals minus infinity (x = -∞) • Then add up the area under f(x) from minus infinity to the current value of x • This is the cumulative probability function, F(x) • That’s why, for a distribution with mean 0, F(0) (F at x = 0) is exactly 0.5 • Half of all events occur left of x = 0, and half occur to the right of x = 0 (symmetry)
Cumulative Probability Function F(x) • So the chance of getting a result between values ‘a’ and ‘b’ is also given by: Probability = F(b) - F(a) • An analogy might be: • The number of babies born between 1940 (a) and 1990 (b) is equal to the total number of babies ever born by 1990 (F(b)), minus the total number of babies ever born by 1940 (F(a))
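F(x) has no simple closed form, but it can be written with the error function: F(x) = ½[1 + erf((x - m) / (s√2))]. Python's `math.erf` provides that function. The sketch below assumes that identity and computes F(b) - F(a). The function names `normal_cdf` and `prob_between` are illustrative, not from the slides.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """F(x) for a normal distribution, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_between(a, b, mean=0.0, sd=1.0):
    """Chance of a result between a and b: F(b) - F(a)."""
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)

print(normal_cdf(0.0))          # F(0) for mean 0: exactly 0.5
print(prob_between(-1.0, 1.0))  # area within one standard deviation, about 0.6827
```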
Standard (Z) Scores • Back to Z scores, our motivation for discussing the normal distribution • Z Scores are standardized scores whose distribution has the following properties: • Retains the shape of the original scores, but • Has a mean of 0 and • Has a variance and standard deviation of 1
Calculating Z scores • Compute the “z” score by subtracting the mean from the raw score and dividing that result by the standard deviation: z = (Xi - m) / s = (Score - Mean) / (Standard Dev) • The z score is not just associated with the normal distribution; it can be used with any kind of distribution
Interpreting Z Scores • The z score describes how many standard deviations a specific score is above or below the mean • A negative z score means that the score is below the mean • A positive z score means that the score is above the mean • A z score of zero (z = 0) means that the score equals the mean
Z Score Example • I own 250 books; I want to know how I compare to other college professors • Suppose that the mean number of books owned by college professors is 150, with a standard deviation of 50 • z = (250 - 150) / 50 = 2 • My z score is 2, meaning I own 2 standard deviations more books than average (‘cuz I’m a pack rat!)
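The z score formula, z = (Xi - m) / s, can be sketched in a few lines of Python. The sketch also checks the claim that standardized scores have mean 0 and standard deviation 1. The list of raw scores is hypothetical. The sketch standardizes with the population standard deviation (`statistics.pstdev`); this is a choice made here, and using the sample form consistently would also give a standard deviation of 1.

```python
import statistics

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def standardize(values):
    """Convert raw scores to z scores using the data's own mean and (population) SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [z_score(x, mean, sd) for x in values]

# Book example: 250 books, mean 150, standard deviation 50
print(z_score(250, 150, 50))   # 2.0

# Hypothetical raw scores (mean 5, SD 2); their z scores have mean 0 and SD 1
zs = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(zs)
print(statistics.mean(zs), statistics.pstdev(zs))
```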
Z Score Tables • Are used to determine the proportion of the area under the curve that lies between the mean and a given standard score (z) • These tables are prepared using integral calculus to save you time • They show only positive ‘z’ values, since the areas for negative ‘z’ are the same as for positive ‘z’ (thanks to symmetry)
Z Score Tables (Yonker p. 29-30) [Table columns: z value (Col. A), Area between 0 and z (Col. B), Area beyond z (Col. C)] • Notice that we always have Col. B + Col. C = 0.5000
Use of Z Score Tables • Z score tables can be used to find the chance of a measurement (or percentage of cases) occurring between any two z values • If the z scores are on opposite sides of the mean (one positive, one negative), add the areas from Column B for each score • If the z scores are on the same side of the mean (both positive, or both negative), subtract the areas from Column B • Subtract the smaller area from the larger area; otherwise you’d get a negative area!
Use of Z Score Table Examples • Between z scores of -1.5 and +2.2, the percent of cases is, from Column B: z(-1.5) has the same area as z(+1.5); z(+1.5) = 0.4332 and z(+2.2) = 0.4861; Percent = 43.32 + 48.61 = 91.93% • Between z scores of +1.5 and +2.2, the percent of cases is: Percent = 48.61 - 43.32 = 5.29%
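Here is a minimal sketch of the table procedure that computes the Column B areas directly instead of looking them up. It assumes the standard identity that the area between 0 and z equals ½·erf(|z|/√2). The helper names are illustrative. It reproduces both worked examples.

```python
import math

def area_0_to_z(z):
    """Column B of a z table: area under the standard normal curve between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

def percent_between(z1, z2):
    """Percent of cases between two z scores, using the add/subtract rule."""
    b1, b2 = area_0_to_z(z1), area_0_to_z(z2)
    if (z1 < 0) != (z2 < 0):      # opposite sides of the mean: add the areas
        area = b1 + b2
    else:                         # same side: subtract the smaller area from the larger
        area = abs(b1 - b2)
    return 100.0 * area

print(round(percent_between(-1.5, 2.2), 2))   # 91.93
print(round(percent_between(1.5, 2.2), 2))    # 5.29
```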
Cumulative Z Score [Figure: normal curve divided at the integer z values from -3 to +3, with areas 0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, 0.13% from left to right] • Percentages shown are the total percent between the integer z score values; between 0 and 1 has 34.13%, between 1 and 2 has 13.59%, etc. • From p. 11 in Yonker
F(x) Values • For x from minus 6 to plus 6, a distribution with mean = 0 and standard deviation of 1.0 gives: [Table of F(x) values not reproduced]
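F(x) values over this range for a mean of 0 and a standard deviation of 1 can be generated with Python's `statistics.NormalDist` (Python 3.8 or later). This is only one way to compute them, not necessarily how the original values were produced. The nine-decimal precision is an arbitrary choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Print F(x) for the integer values of x from -6 to +6
for x in range(-6, 7):
    print(f"{x:+d}  {std_normal.cdf(x):.9f}")
```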
Cumulative Z Score • Key values are: • From z = -1 to +1, total area is 68.26% • From z = -1.96 to +1.96, total area is 95% • From z = -2 to +2, total area is 95.44% • From z = -2.57 to +2.57, total area is 99% • From z = -3 to +3, total area is 99.74%
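The key areas can be checked directly as F(z) - F(-z), again using `statistics.NormalDist` (an implementation choice, not from the slides).

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.96, 2, 2.57, 3):
    print(z, f"{100 * central_area(z):.2f}%")
```

Computed this way, the values come out at about 68.27%, 95.00%, 95.45%, 98.98% and 99.73%. The slide's figures match doubled four-decimal table entries, and 99% corresponds more precisely to z ≈ 2.576.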
Transformed z, or T scores • A.k.a. standardized scores or “T” scores • Z scores are transformed onto an artificial scale • Multiply a z score by the desired standard deviation s and add the desired mean m (e.g. 10 and 50): T = z*s + m, which becomes T = 10*z + 50 • Examples • A z score of -1.5 would give a T score of T = 10*(-1.5) + 50 = 35 • A z of +2.2 would give T = 10*(2.2) + 50 = 72
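The T score transformation, T = z*s + m, is a one-line function. This sketch defaults to s = 10 and m = 50 as in the examples. The function name `t_score` is illustrative.

```python
def t_score(z, desired_sd=10.0, desired_mean=50.0):
    """Transformed (T) score: T = z * s + m, with defaults s = 10 and m = 50."""
    return z * desired_sd + desired_mean

print(t_score(-1.5))            # 35.0
print(round(t_score(2.2), 6))   # rounded to hide floating-point noise
print(t_score(-5.0))            # z = -5 is where T reaches zero
```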
T scores • This is used in many fields of research, especially Psychology and Education (that’s where the “desired” mean and standard deviation values came from) • Benefits: gets rid of negative connotations of negative and zero scores • Only z scores below z = -5.0 would result in a negative T score (typically less than one data point in a million)
Level of Confidence • Since the normal distribution extends to positive and negative infinity, we need a way to limit the range of expected or likely values • Otherwise, any value could occur some of the time in any normal distribution • Define the Level of Confidence as the acceptable limits of predictable behavior • Typically use 95% for most applications, but 99% for medical research
Level of Confidence • Generally, we can say that the actual value of a parameter estimate is in the range of its mean ± twice its standard error, with a 95% level of confidence • Use 1.96 instead of 2 for precise work • Thus the value of a parameter with a mean of 6.2 and a standard error of 1.9 lies between 2.4 (i.e., 6.2 - 2*1.9) and 10.0 (i.e., 6.2 + 2*1.9) with a 95% level of confidence
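The interval rule, mean ± 2 standard errors (or ± 1.96 for precise work), can be sketched as a small function. The name `confidence_interval` and the `multiplier` parameter are illustrative.

```python
def confidence_interval(estimate, std_error, multiplier=2.0):
    """Approximate 95% interval: estimate +/- multiplier * standard error.
    Pass multiplier=1.96 for more precise work."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width

low, high = confidence_interval(6.2, 1.9)
print(round(low, 2), round(high, 2))   # 2.4 10.0

low, high = confidence_interval(6.2, 1.9, multiplier=1.96)
print(round(low, 3), round(high, 3))   # slightly narrower interval
```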
The “t” Statistic • The t-statistic is defined as t = (parameter estimate) / (standard error) • If |t| > 2, then the parameter estimate is significantly different from zero at the 95% level of confidence • For the example above, t = 6.2/1.9 = 3.26 • Hence, because |3.26| > 2, this estimate is statistically significant • This also means the 95% confidence interval does not include zero • Again, use 1.96 instead of 2 for precise work
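The t-statistic and the |t| > 2 rule can be sketched as follows. The cutoff is passed in as a parameter (2 by default, or 1.96). It is a large-sample rule of thumb, not an exact test for every sample size. The helper names are illustrative.

```python
def t_statistic(estimate, std_error):
    """t = (parameter estimate) / (standard error)."""
    return estimate / std_error

def is_significant(estimate, std_error, critical=2.0):
    """Rule of thumb: |t| above the cutoff means significantly different from zero."""
    return abs(t_statistic(estimate, std_error)) > critical

print(round(t_statistic(6.2, 1.9), 2), is_significant(6.2, 1.9))   # 3.26 True
```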
The “t” Statistic • T = ‘t’???? No! • Notice that the T score is a completely different concept from the ‘t’ statistic • We’ll use the ‘t’ statistic to help judge SPSS output later in the course
Sampling Terms • Population = the entire realm of interest: everyone, all books, all publishers, all patrons, etc. • Sample = a subgroup or subset of the population • Accurate inference requires good samples • Samples are used because it is often hard or impossible to measure the entire population
Sampling Terms • Inferential Statistics • Taking samples in order to infer unknown population parameters • Principle of Random Selection • A procedure by which each member of the population has the same chance of being chosen as any other member • Tends to give a sample that is representative of the population
Types of Samples • Probabilistic sample: sampling in which the probability of each element in the population being selected is known and can be specified • In a simple random sample, each element has the same chance • Non-probabilistic sample: each probability is not known a priori (in advance) • E.g. convenience samples, or available samples
Random Sampling Techniques • Simple Random • Stratified Random • Proportional • Disproportional • Cluster • Systematic
Simple Random Sample • Often can’t sample the entire user population • Must be a truly random sample, not just convenient • Can use random number table, or computer-generated pseudo-random numbers (Yonker, p. 31) to choose the sample
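Computer-generated pseudo-random numbers are easy to use for this. The sketch below uses Python's `random.Random.sample`, which draws without replacement so that every member has the same chance of selection. The customer list is hypothetical. The seed is optional and only makes the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)   # seeded pseudo-random generator, for reproducibility
    return rng.sample(population, n)

customers = [f"customer-{i}" for i in range(1, 501)]   # hypothetical population of 500
sample = simple_random_sample(customers, 25, seed=42)
print(sample[:5])
```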
Stratified Random Sampling • Group customers into categories (strata); get simple random samples from each category (stratum). This can be a very efficient method. • Can weight each stratum equally (proportional s.s.) or unequally (disproportional s.s.) • For unequal weights, make the sampling fraction proportional to the standard deviation of the stratum, and inversely proportional to the square root of the cost of sampling it: F ~ s/sqrt(cost), where “sqrt” is “square root” and “~” means ‘proportional to’
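The rule F ~ s/sqrt(cost) only fixes the sampling fractions up to a common scale factor. This sketch assumes, as its own choice, that the factor is set so the total sample is a chosen share of the whole population. The strata, their sizes, standard deviations, and costs are all hypothetical, and so is the helper `sampling_fractions`.

```python
import math

def sampling_fractions(strata, overall_fraction):
    """Hypothetical helper: per-stratum sampling fraction F proportional to s / sqrt(cost).

    strata maps a stratum name to (population size N, standard deviation s, cost per unit c).
    The common scale factor is an assumption: it makes the total sample equal
    overall_fraction of the whole population. Fractions above 1 are not checked.
    """
    weights = {name: s / math.sqrt(c) for name, (N, s, c) in strata.items()}
    total_pop = sum(N for N, _, _ in strata.values())
    target_n = overall_fraction * total_pop
    # Scale factor k such that the sum over strata of (k * weight * N) equals target_n
    k = target_n / sum(weights[name] * N for name, (N, _, _) in strata.items())
    return {name: k * w for name, w in weights.items()}

# Hypothetical strata: (population size, standard deviation, cost per interview)
strata = {
    "urban": (800, 10.0, 4.0),
    "rural": (200, 30.0, 9.0),
}
fractions = sampling_fractions(strata, overall_fraction=0.10)
for name, f in fractions.items():
    print(name, round(f, 4), "sample size:", round(f * strata[name][0], 1))
```

The rural stratum is three times as variable but costs 2.25 times as much per interview. It is therefore sampled at 3/√2.25 = 2 times the urban fraction.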
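Proportional Stratified Random Sampling (sample of 20)
Major            # in Population   % in Population   # in Sample
Education        50                50%               10
Soc./Beh. Sci.   30                30%               6
Business         15                15%               3
Sci./Tech        5                 5%                1
• % in Population = (# in Population / 100) X 100, e.g. 50/100 X 100 = 50% • # in Sample = % in Population X 20, e.g. 50% X 20 = 10 • Data taken from Carpenter and Vasu (1978)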
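Proportional allocation gives each stratum the same share of the sample as it has of the population. The sketch below reproduces the table. It uses simple rounding, so with other numbers the rounded counts may not add up exactly to the requested sample size.

```python
def proportional_allocation(population_counts, sample_size):
    """Give each stratum the same share of the sample as it has of the population."""
    total = sum(population_counts.values())
    return {name: round(sample_size * count / total)
            for name, count in population_counts.items()}

majors = {"Education": 50, "Soc./Beh. Sci.": 30, "Business": 15, "Sci./Tech": 5}
print(proportional_allocation(majors, 20))
# {'Education': 10, 'Soc./Beh. Sci.': 6, 'Business': 3, 'Sci./Tech': 1}
```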
Cluster Sampling • Divide the population into (geographic) clusters, then take simple random samples within each selected cluster • Try for representative clusters • Not as efficient as simple random sampling, but cheaper • Sometimes used for in-person interviews
Cluster Sampling Example • Randomly select n (a certain number of) census tracts • From the randomly selected census tracts, randomly select n blocks • From the randomly selected blocks, randomly select addresses • Interview the family (the unit of study)
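The tract, block, and address steps can be sketched as a three-stage draw, with a simple random sample at each stage. The nested dictionary of tracts, blocks, and addresses is a hypothetical stand-in for real census data, and the sizes are arbitrary.

```python
import random

def multistage_cluster_sample(tracts, n_tracts, n_blocks, n_addresses, seed=None):
    """Tracts -> blocks -> addresses, with a simple random sample at each stage.

    tracts maps tract id -> {block id -> [addresses]} (a hypothetical structure).
    """
    rng = random.Random(seed)
    chosen = []
    for tract in rng.sample(sorted(tracts), n_tracts):
        blocks = tracts[tract]
        for block in rng.sample(sorted(blocks), n_blocks):
            chosen.extend(rng.sample(blocks[block], n_addresses))
    return chosen   # each address is one family to interview

# Hypothetical area: 10 tracts, 8 blocks per tract, 20 addresses per block
tracts = {
    f"T{t}": {f"T{t}-B{b}": [f"T{t}-B{b}-A{a}" for a in range(20)] for b in range(8)}
    for t in range(10)
}
addresses = multistage_cluster_sample(tracts, n_tracts=3, n_blocks=2, n_addresses=5, seed=1)
print(len(addresses))   # 3 tracts * 2 blocks * 5 addresses = 30
```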
Systematic Sampling • Calculate your sampling interval: Interval = (Size of population) / (Size of sample) • Select your first element at random from within the first sampling interval • Move ahead systematically by the sampling interval (e.g. every 10th customer) until you reach your desired sample size
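In this sketch of the three steps, the interval is rounded down to a whole number (an assumption for the case where the sizes do not divide evenly), and the customer IDs are hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Random start within the first interval, then every k-th member.

    k = len(population) // sample_size; requires 1 <= sample_size <= len(population).
    """
    interval = len(population) // sample_size
    start = random.Random(seed).randrange(interval)   # random start in the first interval
    return population[start::interval][:sample_size]

customers = list(range(1, 1001))   # hypothetical list of 1000 customer IDs
chosen = systematic_sample(customers, 100, seed=7)
print(chosen[:5])   # five IDs spaced 10 apart (every 10th customer)
```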
Non-random Sampling Techniques • Quota • Accidental • Judgment
Non-random techniques • Quota sampling • Is economical • Is a non-random version of stratified sampling • Define desired characteristics in advance: gender, race, age, etc. • Example: Interview 20 females and 20 males over the age of 65
Non-random techniques • Accidental sampling • Mall market studies, Internet surveys • Often relies on the interviewee choosing to be sampled • Judgment sampling • Pick people who have some special knowledge • Seek out experts (more of an interview method)
What is a Survey Study (Assessment)? • To describe systematically the facts and characteristics of a given population or area of interest, factually and accurately (Isaac and Michael) • Survey studies are used to: • Describe what is • Establish need • Identify problems • Infer possible solutions
Surveys • A survey often refers to a large data collection effort: • What it involves: personal interviews, telephone interviews, a questionnaire sent through the mail, document survey, literature survey, social area analysis (observation and description of different areas of the city) • “Who” it involves: community, customers, users, employees, literature • Purpose: information gathering and fact finding to describe what exists (such as public library services), establish need, identify problems, and imply possible solutions
Customer Satisfaction Surveys • There are many opportunities to conduct surveys • Customer call-back after x days • Customer complaints • Direct customer visits • Customer user groups • Conferences
Customer Satisfaction Surveys • Want representative sample of all customers • Three main methods are used • Personal interview • Telephone interview • Questionnaire by mail
Personal Interview • Advantages: 1. Explore complex issues 2. Question clarification 3. Rapport 4. Higher response rate 5. Observation
Personal Interview • Disadvantages: 1. Interviewer bias 2. Question uniformity 3. No anonymity 4. Difficult to analyze 5. Time consuming