1. Fundamentals of Biostatistics Luis Maldonado, MD, MPH
2. The Five Basic Words Population – All the members of a group about which you want to draw a conclusion
Sample – The part of the population selected for analysis
Parameter – A numerical measure that describes a characteristic of a population
Statistic – A numerical measure that describes a characteristic of a sample
Variable – A characteristic of an item or an individual that will be analyzed using statistics
3. The Branches of Biostatistics Descriptive statistics – focuses on collecting, summarizing, and presenting a set of data
Inferential statistics – focuses on analyzing sample data to draw conclusions about the population
4. Descriptive Statistics Measures of Central Tendency
The Mean
The Median
The Mode
5. The Mean Easily calculated by adding all the observed data values and dividing by the total sample size of the group
If an individual observation = x
And the total number of observations = n
Mean = (x1+x2+…+xn) divided by n
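The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `mean` and the blood pressure readings are made up for the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count n."""
    n = len(values)
    if n == 0:
        raise ValueError("mean requires at least one observation")
    return sum(values) / n


# Hypothetical systolic blood pressure readings (mmHg) for five patients.
readings = [118, 122, 130, 126, 124]
print(mean(readings))  # 124.0
```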
6. The Mean Also known as the ‘arithmetic average’
The one main weakness of the mean: individual extreme values (also known as ‘outliers’) can distort its ability to represent the typical value of a variable
7. The Median The middle value in a set of data values for a variable when the data values have been ordered from lowest to highest
When the number of values is even, the median is calculated by taking the mean of the two values closest to the middle
8. The Median To locate the median: add 1 to the number of data values and divide the result by 2; the answer is the rank of the median in the ordered data
Example: if there are 7 sample values, divide (7+1) by 2 to get 4: the median is the 4th ranked value when all values are ranked from lowest to highest
If there is an even number of values, let's say 8, divide (8+1) by 2 to get 4.5: the median is the mean of the 4th and 5th ranked values when all values are ranked from lowest to highest
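The rank rule on this slide can be sketched in Python as follows. The function name `median_by_rank` and the data values are assumptions made for illustration; the logic follows the (n + 1) / 2 rule described above.

```python
def median_by_rank(values):
    """Find the median using the (n + 1) / 2 rank rule."""
    ordered = sorted(values)          # rank the values from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    rank = (n + 1) / 2                # e.g. 4.0 for n = 7, 4.5 for n = 8
    if rank.is_integer():
        # Odd n: the median is the value at that rank (ranks start at 1).
        return ordered[int(rank) - 1]
    # Even n: average the two ranked values on either side of the rank.
    lower = ordered[int(rank) - 1]    # 4th ranked value when rank is 4.5
    upper = ordered[int(rank)]        # 5th ranked value when rank is 4.5
    return (lower + upper) / 2


seven_values = [3, 9, 1, 7, 5, 11, 13]
eight_values = seven_values + [15]
print(median_by_rank(seven_values))  # 7
print(median_by_rank(eight_values))  # 8.0
```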
9. The Median Extreme values have little or no effect on the median, making the median a good alternative to the mean for measuring central tendency when such values occur
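A small sketch of this contrast, using Python's standard `statistics` module. The hospital-stay figures are hypothetical; the second list replaces the longest stay with an extreme value.

```python
from statistics import mean, median

# Hypothetical lengths of hospital stay, in days.
typical = [2, 3, 3, 4, 5]
# Same data, but the longest stay is replaced by an extreme value (outlier).
distorted = [2, 3, 3, 4, 60]

print(mean(typical), median(typical))      # 3.4 3
print(mean(distorted), median(distorted))  # 14.4 3
```

In this example the outlier pulls the mean from 3.4 to 14.4 days, while the median stays at 3 days.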
10. The Mode The value (or values) in a set of data values for a variable that appears most frequently
As with the median, extreme values do not affect the mode; however, the mode can vary much more from sample to sample than the median or mean
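Because a data set can have more than one mode, a sketch of the calculation needs to return a list. The function name `modes` and the sample data below are assumptions for illustration only.

```python
from collections import Counter


def modes(values):
    """Return every value that appears with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Keep all values tied for the highest count, in order of first appearance.
    return [value for value, count in counts.items() if count == highest]


# Hypothetical blood types and ages in a small sample.
blood_types = ["O", "A", "O", "B", "A", "AB", "O"]
ages = [34, 41, 34, 52, 41, 29]
print(modes(blood_types))  # ['O']
print(modes(ages))         # [34, 41]
```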
11. Shape of Distributions
12. Skews Skewness is a measure that describes asymmetry in a distribution
Symmetrical shape or no skew indicates a set of data values in which the mean, median, and mode are equal
Left-skewed (also known as negative skew) indicates a set of data values in which the mean is less than the median
Right-skewed (also known as positive skew) indicates a set of data values in which the mean is greater than the median
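The mean-versus-median comparison on this slide can be sketched as below. This is only the rule of thumb stated above, not a formal skewness coefficient; the function name `skew_direction` and the data sets are hypothetical.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "no skew indicated"


print(skew_direction([1, 2, 3, 4, 20]))     # right-skewed (mean 6 > median 3)
print(skew_direction([1, 17, 18, 19, 20]))  # left-skewed (mean 15 < median 18)
print(skew_direction([1, 2, 3, 4, 5]))      # no skew indicated (mean 3 = median 3)
```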
13. Measure of Variation Variation is the amount of dispersion, or ‘spread’ in the data
The Range
The Variance
The Standard Deviation
14. The Range The range is the difference between the largest and smallest data values in a set of values for a variable
Range = Largest value – Smallest value
Represents the largest possible difference between any 2 values in a set of data values for a variable
15. The Range Is not a stable estimate; as sample size increases, the range also tends to increase
Is not amenable to statistical testing
Since the range is derived from the most extreme values, a sample may have a large range even when the majority of observations are fairly close in value
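A short sketch of the range and of its sensitivity to a single extreme value. The function name `value_range` and the resting heart rate figures are made up for the example.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    if not values:
        raise ValueError("range requires at least one observation")
    return max(values) - min(values)


# Hypothetical resting heart rates (beats per minute).
clustered = [70, 71, 72, 72, 73, 74]
one_extreme = [70, 71, 72, 72, 73, 120]
print(value_range(clustered))    # 4
print(value_range(one_extreme))  # 50
```

Five of the six observations are identical in the two lists, yet one extreme value raises the range from 4 to 50.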
16. The Variance and the Standard Deviation Provide a summary of the spread or dispersion of individual observations around the mean
The Standard Deviation is equal to the square root of the Variance
17. Variance First, calculate the difference between each observation (x) and the mean (x̄)
Then, the differences are squared, (x − x̄)², so that negative and positive deviations will not cancel each other out
The resultant quantities are added together: Σ(x − x̄)²
And the sum is divided by the total number of observations minus 1: Σ(x − x̄)² / (n − 1)
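The four steps above can be sketched in Python, with the standard deviation taken as the square root of the result. The function names `sample_variance` and `sample_sd` and the data values are assumptions for illustration.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two observations")
    x_bar = sum(values) / n                            # the mean
    squared = [(x - x_bar) ** 2 for x in values]       # steps 1 and 2
    return sum(squared) / (n - 1)                      # steps 3 and 4


def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [4, 8, 6, 2, 10]                # hypothetical observations, mean = 6
print(sample_variance(data))           # 10.0
print(round(sample_sd(data), 3))       # 3.162
```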
18. Standard Deviation For sets of data values that are approximately normal in distribution, about 68% of the individual values will lie within one standard deviation of the mean, about 95% will fall within two standard deviations, and about 99.7% are within three standard deviations
± 1 standard deviation ≈ 68%
± 2 standard deviations ≈ 95%
± 3 standard deviations ≈ 99.7%
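The percentages on this slide are rounded. As a check, the share of a normal distribution lying within k standard deviations of the mean can be computed with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```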
19. Standard Deviations