Biotechnology Laboratory Technician Program • Course: Basic Biotechnology Laboratory Skills for a Regulated Workplace • Lisa Seidman, Ph.D.
STATISTICS A BRIEF INTRODUCTION
WHY LEARN ABOUT STATISTICS? • Statistics provides tools that are used in • Quality control • Research • Measurements • Sports
IN THIS COURSE • We will use some of these tools • Ideas • Vocabulary • A few calculations
VARIATION • There is variation in the natural world • People vary • Measurements vary • Plants vary • Weather varies
Variation among organisms is the basis of natural selection and evolution
EXAMPLE • 100 people take a drug and 75 of them get better • 100 people don’t take the drug but 68 get better without it • Did the drug help?
VARIABILITY IS A PROBLEM • There is variation in response to the illness • There is variation in response to the drug • So it’s difficult to figure out if the drug helped
STATISTICS • Provides mathematical tools to help arrive at meaningful conclusions in the presence of variability
Statistics might help researchers decide whether a drug is helpful • This is a more advanced application of statistics than we will cover in this course
DESCRIPTIVE STATISTICS • Chapter 16 in your textbook • Descriptive statistics is one area within statistics
DESCRIPTIVE STATISTICS • Provides tools to DESCRIBE, organize and interpret variability in our observations of the natural world
DEFINITIONS • Population: • Entire group of events, objects, results, or individuals, all of whom share some unifying characteristic
POPULATIONS • Examples: • All of a person’s red blood cells • All the enzyme molecules in a test tube • All the college students in the U.S.
SAMPLE • Sample: Portion of the whole population that represents the whole population
Example: It is virtually impossible to measure the level of hemoglobin in every cell of a patient • Rather, take a sample of the patient’s blood and measure the hemoglobin level
MORE ABOUT SAMPLES • Representative sample: a sample that truly represents the variability in the population (a good sample)
TWO VOCABULARY WORDS • A sample is random if all members of the population have an equal chance of being drawn • A sample is independent if the choice of one member does not influence the choice of another • Samples need to be taken randomly and independently in order to be representative
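As a rough illustration of the word "random", here is a minimal Python sketch. The population of 100 numbered plants, the sample size of 10, and the fixed seed are all assumptions made up for this example; they are not from the course material.

```python
import random

# Hypothetical population: 100 plants, labeled 1 through 100.
population = list(range(1, 101))

# Fixed seed only so the sketch gives the same result each run.
rng = random.Random(42)

# sample() draws without replacement and gives every member
# of the population the same chance of being selected.
sample = rng.sample(population, 10)
print(sample)
```

Letting a random number generator pick the members, rather than a person, keeps personal preference (for example, choosing the plants closest to the road) out of the sample.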
SAMPLING • How we take a sample is critical and often complex • If a sample is not taken correctly, it will not be representative
EXAMPLE • How would you sample a field of corn?
VARIABLES • Variables: • Characteristics of a population (or a sample) that can be observed or measured • Called variables because they can vary among individuals
VARIABLES • Examples: • Blood hemoglobin levels • Activity of enzymes • Test scores of students
A population or sample can have many variables that can be studied • Example • The same population of six-year-old children can be studied for • Height • Shoe size • Reading level • Etc.
DATA • Data: Observations of a variable (singular is datum) • May or may not be numerical • Examples: • Heights of all the children in a sample (numerical) • Lengths of insects (numerical) • Pictures of mouse kidney cells (not numerical)
ALWAYS UNCERTAINTY • Even if you take a sample correctly, there is uncertainty when you use a sample to represent the whole population • Various samples from the same population are unlikely to be identical • So, need to be careful about drawing conclusions about a population, based on a sample – there is always some uncertainty
SAMPLE SIZE • If a sample is drawn correctly, then the larger the sample, the more likely it is to accurately reflect the entire population • If the sample is not drawn correctly, then a bigger sample may not be any better • How does this apply to the corn field?
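The two ideas above (samples from the same population differ, and larger samples tend to reflect the population better) can be explored with a small simulation. This is only a sketch: the population of 10,000 corn plant heights, the mean of 250 cm, the spread of 20 cm, and the function name `sample_means` are all invented for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 10,000 corn plants.
population = [rng.gauss(250, 20) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sample_means(size, repeats=5):
    """Draw `repeats` random samples of `size` plants; return each sample's mean."""
    return [statistics.mean(rng.sample(population, size)) for _ in range(repeats)]

# Compare how much the sample means wander for small and large samples.
for size in (5, 50, 500):
    means = sample_means(size)
    print(size, [round(m, 1) for m in means])
print("population mean:", round(population_mean, 1))
```

When this is run, the means of the small samples should typically scatter more widely around the population mean than the means of the large samples. Note that the sketch assumes the samples are drawn randomly; it says nothing about samples that are drawn incorrectly.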
INFERENTIAL STATISTICS • Another branch of statistics • Won’t talk about it much • Deals with tools to handle the uncertainty of using a sample to represent a population
EXAMPLE PROBLEM • In a quality control setting, 15 vials of product from a batch are tested. What is the sample? What is the population? • In an experiment, the effect of a carcinogenic compound was tested on 2000 lab rats. What is the sample? What is the population?
In a clinical study, a new drug was tested on fifty patients. What is the sample? What is the population?
ANSWERS • 15 vials, the sample, were tested for QC. The population is all the vials in the batch. • The sample is the rats that were tested. The population is probably all lab rats. • The sample is the 50 patients tested in the trial. The population is all patients with the same condition.
EXAMPLE PROBLEM • An advertisement says that 2 out of 3 doctors recommend Brand X. • What is the sample? What is the population? • Is the sample representative? • Does this statement ensure that Brand X is better than competitors?
ANSWER • Many abuses of statistics relate to poor sampling. The population of interest is all doctors. No way to know what the sample is. The sample could have included only relatives of employees at Brand X headquarters, or only doctors in a certain area. Therefore the statement does not ensure that the majority of doctors recommend Brand X. It certainly does not ensure that Brand X is best.
DESCRIBING DATA SETS • Draw a sample from a population • Measure values for a particular variable • Result is a data set
DATA SETS • Individuals vary, therefore the data set has variation • Data without organization is like letters that aren’t arranged into words
Numerical data can be arranged in ways that are meaningful – or that are confusing or deceptive
DESCRIPTIVE STATISTICS • Provides tools to organize, summarize, and describe data in meaningful ways • Example: • The exam scores for a class are the data set • What is the variable of interest? • Can summarize with the class “average” • What does this tell you?
A measure that describes a data set, such as the average, is sometimes called a “statistic” • Average gives information about the center of the data
MEDIAN AND MODE • Two other statistics that give information about the center of a set of data • Median is the middle value • Mode is most frequent value
MEASURES OF CENTRAL TENDENCY • Measures that describe the center of a data set are called: Measures of Central Tendency • Mean, median, and the mode
HYPOTHETICAL DATA SET • 2 5 6 7 8 3 9 3 10 4 7 4 6 11 9 • Simplest way to organize them is to put them in order: • 2 3 3 4 4 5 6 6 7 7 8 9 9 10 11 • By inspection they center around 6 or 7
MEAN • Mean is basically the same as the average • Add all the numbers together and divide by the number of values • 2 3 3 4 4 5 6 6 7 7 8 9 9 10 11 • What is the mean for this data set?
NOMENCLATURE • Mean = $\bar{X}$ = 6.3 (read “X bar”) • The observations are called $X_1$, $X_2$, etc. • There are 15 observations in this example, so the last one is $X_{15}$ • $\bar{X} = \frac{\sum X_i}{n}$, where n = number of values
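The mean calculation for the 15-value data set can be checked with a short Python sketch. The variable names are chosen here for illustration only.

```python
# The hypothetical data set from the slides (order does not matter for the mean).
data = [2, 5, 6, 7, 8, 3, 9, 3, 10, 4, 7, 4, 6, 11, 9]

n = len(data)        # number of values: 15
total = sum(data)    # sum of all the observations: 94
mean = total / n     # X bar = 94 / 15

print(round(mean, 1))  # 6.3
```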
EXAMPLE • Data set 2 3 3 4 5 6 7 8 9 What is the mode? What is the median?
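One way to check your answers to this example is with Python's standard `statistics` module. This is just a sketch; the same values can be found by hand by sorting the data and counting.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 7, 8, 9]

median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(median, mode)  # 5 3
```

With nine values, the median is the fifth value in sorted order. The value 3 is the only one that appears more than once, so it is the mode.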
MEAN OF A POPULATION VERSUS THE MEAN OF A SAMPLE • Statisticians distinguish between the mean of a sample and the mean of a population • The sample mean is $\bar{X}$ (“X bar”) • The population mean is μ • It is rare to know the population mean, so the sample mean is used to represent it
DISPERSION • Data sets A and B both have the same average: • A: 4 5 5 5 6 6 • B: 1 2 4 7 8 9 • But they are not the same: • A is more clumped around the central value • B is more dispersed, or spread out
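A short Python sketch can confirm that A and B share the same mean while differing in spread. Printing each value's distance from the mean is simply one informal way to see the spread; it is not one of the formal measures of dispersion.

```python
import statistics

a = [4, 5, 5, 5, 6, 6]
b = [1, 2, 4, 7, 8, 9]

# Both sets sum to 31, so they share the same mean (31 / 6).
mean_a = statistics.mean(a)
mean_b = statistics.mean(b)
print(round(mean_a, 2), round(mean_b, 2))  # 5.17 5.17

# Distance of each value from the mean: small for A, larger for B.
print([round(x - mean_a, 2) for x in a])
print([round(x - mean_b, 2) for x in b])
```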
MEASURES OF DISPERSION • Measures of central tendency do not describe how dispersed a data set is • Measures of dispersion do; they describe how much the values in a data set vary from one another
MEASURES OF DISPERSION • Common measures of dispersion are: • Range • Variance • Standard deviation • Coefficient of variation
CALCULATIONS OF DISPERSION • Measures of dispersion, like measures of central tendency, are calculated • Range is the difference between the lowest and highest values in a data set
Example: 2 3 3 4 4 5 6 6 7 7 8 9 9 10 11 • Range: 11 - 2 = 9, or expressed as 2 to 11 • Range is not particularly informative because it is based on only two values from the data set
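The range calculation can be sketched in Python as follows. The helper name `data_range` is made up for this example. Data sets A and B from the dispersion slide are included for comparison.

```python
def data_range(values):
    """Return the range: the highest value minus the lowest value."""
    return max(values) - min(values)

data = [2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9, 10, 11]
print(data_range(data))  # 9

# Data sets A and B have the same mean but different ranges.
a = [4, 5, 5, 5, 6, 6]
b = [1, 2, 4, 7, 8, 9]
print(data_range(a), data_range(b))  # 2 8
```

Because the function looks only at the lowest and highest values, changing any of the values in between leaves the result unchanged, which is why the range is of limited use on its own.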