14 Descriptive Statistics
Introduction • Descriptive statistics • the type of statistical analysis focused on describing, summarizing, or explaining a set of data • Inferential statistics • the type of statistical analysis focused on making inferences about populations based on sample data
Descriptive Statistics • Start with a data set • Use descriptive statistics to understand and summarize the key numerical characteristics of the data set • e.g., means, frequencies, graphs • Key question in descriptive statistics • how can I communicate the important characteristics of my data?
Frequency Distributions • A systematic arrangement of data values in which the unique data values are rank ordered and the frequencies are provided for each of these values
Frequency Distributions • Column 1 • lowest salary = $24,000 • highest salary = $41,000 • Column 2 (frequencies) • most frequently occurring salary = $32,500 • three of the 25 recent graduates had this starting salary • Column 3 (percentages) • 4% of the cases had a salary of $24,000 • 8% of the cases had a salary of $32,000
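The three columns described above (rank-ordered values, frequencies, percentages) can be built mechanically. A minimal Python sketch, assuming a small made-up list of salaries (the 25-graduate data set is not reproduced here) and a hypothetical helper name `frequency_distribution`:

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, frequency, percentage) rows, rank ordered by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]

# Hypothetical salaries for illustration only, not the data from the slides.
salaries = [24000, 32000, 32500, 32500, 32500, 41000, 32000, 28000, 28000, 30000]

for value, freq, pct in frequency_distribution(salaries):
    print(f"${value:>6,}  {freq:>2}  {pct:5.1f}%")
```

Sorting the unique values gives column 1; the counts and their share of the total give columns 2 and 3.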
Graphic Representations of Data • Bar graph • graph that uses vertical bars to represent the data values of a categorical variable • height of bar represents frequency of category • bars should not touch
Bar Graph Example • Frequencies • 8 psychology majors • 10 philosophy majors • 7 business majors • Percentages • 32% psychology majors • 40% philosophy majors • 28% business majors
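The percentages in this example follow directly from the frequencies. A short Python sketch that recomputes them and prints a rough text version of the bar graph; the variable names are illustrative choices, not part of the original slides:

```python
# Frequencies taken from the bar graph example above.
majors = {"psychology": 8, "philosophy": 10, "business": 7}

total = sum(majors.values())  # 25 students in all
percentages = {major: 100 * count / total for major, count in majors.items()}

# Crude text version of the bar graph: one "#" per student.
for major, count in majors.items():
    print(f"{major:<11} {'#' * count} {count} ({percentages[major]:.0f}%)")
```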
Graphic Representations of Data • Histogram • graph depicting the frequencies and distribution of a quantitative variable • no space between bars • advantage over a frequency distribution • it more clearly shows the shape of the distribution
Graphic Representations of Data • Line graphs • a graph relying on the drawing of one or more lines connecting data points • also used with quantitative variables • particularly useful for interpreting interactions
Graphic Representations of Data • Scatterplots • a graphical depiction of the relationship between two quantitative variables
Measures of Central Tendency • Provide a single value that is typical of the distribution of scores • mode • most frequently occurring number in a data set • least useful measure of central tendency • example • 0, 2, 3, 4, 5, 5, 5, 7, 8, 8, 9, 10 mode = 5 • 0, 2, 3, 4, 5, 5, 5, 7, 8, 8, 8, 9, 10 mode = 5 and 8, bimodal data set • median • the center point in an ordered set of numbers • examples • odd number of data points 1, 2, 3, 4, 5, median = 3 • even number of data points 1, 2, 3, 4, median = 2.5
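The mode and median examples above can be checked with Python's standard `statistics` module. This is only a sketch; `multimode` (Python 3.8+) is used because it returns every value tied for most frequent, which is what a bimodal data set needs:

```python
from statistics import median, multimode

unimodal = [0, 2, 3, 4, 5, 5, 5, 7, 8, 8, 9, 10]
bimodal = [0, 2, 3, 4, 5, 5, 5, 7, 8, 8, 8, 9, 10]

print(multimode(unimodal))      # single most frequent value
print(multimode(bimodal))       # two values tied for most frequent

print(median([1, 2, 3, 4, 5]))  # odd count: the middle value
print(median([1, 2, 3, 4]))     # even count: average of the two middle values
```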
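Measures of Central Tendency • Mean • arithmetic average • most commonly used measure of central tendency • formula • \( \bar{X} = \frac{\sum X}{n} \) • sample statistical symbol = \( \bar{X} \) • example • 1, 2, 3, 4, mean = (1 + 2 + 3 + 4) / 4 = 2.5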
Measures of Variability • Numerical value expressing how spread out or how much variation is present in the values of a quantitative variable • if all data points are the same, then there is no variability • 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 • which data set is more variable? • Data for group one: 44, 45, 45, 45, 46, 46, 47, 47, 48, 49 • Data for group two: 34, 37, 45, 51, 58, 60, 77, 88, 90, 98
Measures of Variability • range • highest score minus lowest score • rarely used as a measure of variability • variance and standard deviation • both account for every score in the data set • as each gets larger, the numbers in the data set are more spread out
Measures of Variability • variance • average of the squared deviations of the data values from their mean (expressed in squared units) • standard deviation • square root of variance • roughly the average amount that individual scores deviate from the mean
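To compare the data sets from the variability slides, the range, variance, and standard deviation can be computed side by side. This is a sketch under one stated assumption: it uses the sample formulas (n - 1 denominator) from Python's `statistics` module, since the slides do not say whether n or n - 1 is intended. `describe_spread` is a hypothetical helper name.

```python
from statistics import stdev, variance

def describe_spread(scores):
    """Return range, sample variance, and sample standard deviation."""
    return {
        "range": max(scores) - min(scores),
        "variance": variance(scores),  # squared units, n - 1 denominator (assumption)
        "sd": stdev(scores),           # square root of the variance
    }

constant = [4] * 10
group_one = [44, 45, 45, 45, 46, 46, 47, 47, 48, 49]
group_two = [34, 37, 45, 51, 58, 60, 77, 88, 90, 98]

for name, scores in (("constant", constant), ("group one", group_one), ("group two", group_two)):
    spread = describe_spread(scores)
    print(f"{name:<9} range={spread['range']:>2}  "
          f"variance={spread['variance']:.2f}  sd={spread['sd']:.2f}")
```

Group two has the larger value on all three measures, which matches the visual impression that its scores are more spread out.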
Measures of Variability • Standard deviation and the normal curve • standard deviation has greatest meaning when the distribution is normally distributed • normal distribution • a distribution that follows the 68, 95, 99.7 percent rule • rule stating percentage of cases falling within 1, 2, and 3 standard deviations from the mean on a normal distribution
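The 68, 95, 99.7 percent rule can be recovered from the normal distribution itself. A brief sketch using `statistics.NormalDist` (Python 3.8+); `share_within` is a hypothetical helper name:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The printed proportions are approximately 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.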
Measures of Variability • Z scores • a score that has been transformed into standard deviation units • mean of z distribution is always zero; standard deviation is always 1 • indicates how far a raw score is above or below its mean in standard deviation units • e.g., a z score of +1.00 indicates a raw score that is one standard deviation unit above the mean • in a normal distribution, the proportion of scores occurring between any two points can be determined • scores on different distributions can be compared
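Z Score Formula and Examples • Formula • \( z = \frac{X - \bar{X}}{SD} \), where \( X \) is the raw score, \( \bar{X} \) is the mean, and \( SD \) is the standard deviation • Examples • for a hypothetical test with a mean of 50 and a standard deviation of 10, a raw score of 60 gives z = (60 - 50) / 10 = +1.00 • on the same test, a raw score of 35 gives z = (35 - 50) / 10 = -1.50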
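A z score is the raw score minus the mean, divided by the standard deviation. The sketch below assumes that usual definition and uses made-up test scores; `z_score` and `standardize` are hypothetical helper names, and `pstdev` (n denominator) is an assumption chosen so the standardized scores have a standard deviation of exactly 1.

```python
from statistics import mean, pstdev

def z_score(raw, mean_value, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean_value) / sd

def standardize(scores):
    """Convert every score in a data set to a z score."""
    m, sd = mean(scores), pstdev(scores)
    return [z_score(x, m, sd) for x in scores]

# Comparing scores from two different (hypothetical) tests.
z_test_a = z_score(85, mean_value=75, sd=10)  # one SD above the mean
z_test_b = z_score(70, mean_value=60, sd=5)   # two SDs above the mean
print(z_test_a, z_test_b)

# After standardizing, the mean is 0 and the standard deviation is 1.
z_values = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(round(mean(z_values), 10), round(pstdev(z_values), 10))
```

Although 85 is the higher raw score, the score of 70 is further above its own mean in standard deviation units, which is the kind of comparison across distributions that z scores make possible.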
Examining Relationships Among Variables • Unstandardized difference between means • a comparison of mean differences (DV) between levels of a categorical independent variable • example • college graduate data set • mean starting salary for males = $34,791.67 • mean starting salary for females = $31,269.23 • the unstandardized difference between these two means • $34,791.67 - $31,269.23 = $3,522.44 • Standardized difference between means • effect size indicator • index of magnitude or strength of a relationship or difference between means • Cohen’s d • the difference between two means in standard deviation units • a common measure of effect size • small, medium, and large effect sizes are indicated by values of at least .2, .5, and .8 respectively
Cohen’s d Example • College graduate data set • gender is the categorical independent or predictor variable • starting salary is the quantitative dependent variable • mean starting salary for males = $34,791.67 • mean starting salary for females = $31,269.23 • standard deviation for females = $4,008.40 • d = ($34,791.67 - $31,269.23) / $4,008.40 = .88 • this says that the mean starting salary for males is .88 standard deviations above the mean for females • using Cohen’s criteria for interpretation, one would consider this a “large” difference between the means
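The arithmetic behind this example can be sketched in Python. The code divides the mean difference by the standard deviation for females, as the slide does; other texts often use a pooled standard deviation instead. `cohens_d` and `label_effect` are hypothetical helper names.

```python
def cohens_d(mean_a, mean_b, sd):
    """Difference between two means expressed in standard deviation units."""
    return (mean_a - mean_b) / sd

def label_effect(d):
    """Apply the .2 / .5 / .8 guidelines for small, medium, and large effects."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Values from the college graduate data set in the slides.
d = cohens_d(mean_a=34791.67, mean_b=31269.23, sd=4008.40)
print(f"d = {d:.2f} ({label_effect(d)})")
```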
Examining Relationships Among Variables • Correlation coefficient • index indicating the strength and direction of linear relationship between two quantitative variables • value ranges from +1.0 to -1.0 • absolute value indicates strength of relationship • sign indicates direction
Examining Relationships Among Variables • Correlation coefficient • positive correlation • correlation in which values of two variables tend to move in the same direction • e.g., the more hours students spend studying for a test, the higher their test grades tend to be • negative correlation • correlation in which values of two variables tend to move in opposite directions • e.g., the more hours students spend partying the night before an exam, the lower their test grades tend to be
Examining Relationships Among Variables • Correlation coefficient • Pearson correlation (r) • used with two quantitative variables • only appropriate if the two variables are related in a linear fashion • partial correlation • a technique that involves examining correlation after controlling for one or more variables • a scatterplot can be used to judge the strength and direction of a correlation
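A Pearson correlation can be computed directly from deviations around the two means. The sketch below is one common way to write it; the study and party data are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: more study hours, higher grades (positive correlation).
hours_studying = [1, 2, 3, 4, 5, 6]
grades_study = [62, 70, 71, 80, 84, 91]

# Hypothetical data: more party hours, lower grades (negative correlation).
hours_partying = [0, 1, 2, 3, 4, 5]
grades_party = [90, 86, 80, 77, 70, 65]

r_study = pearson_r(hours_studying, grades_study)
r_party = pearson_r(hours_partying, grades_party)
print(f"studying: r = {r_study:+.2f}")
print(f"partying: r = {r_party:+.2f}")
```

The sign of each result gives the direction of the relationship and the absolute value gives its strength.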
Regression Analysis • Use of one or more quantitative independent variables to explain or predict the values of a single quantitative dependent variable • Two main types • simple regression • involves the use of one independent or predictor variable • multiple regression • involves two or more independent or predictor variables
Regression Analysis • Prediction is made using the regression equation • \( \hat{Y} = b_0 + b_1 X \) • this equation defines the regression line that best fits the pattern of observations in your data • slope (\( b_1 \)) • how steep the line is • y-intercept (\( b_0 \)) • point where the regression line crosses the y-axis
Regression Analysis • regression coefficient • predicted change in the dependent variable (Y) given a one unit change in the independent variable (X) • partial regression coefficient • the regression coefficient in a multiple regression equation
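For simple regression, the slope and y-intercept of the best-fitting line can be found by ordinary least squares. The sketch below assumes an equation of the form predicted Y = intercept + slope × X and uses invented data; `fit_simple_regression` and `predict` are hypothetical helper names.

```python
def fit_simple_regression(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # regression coefficient
    intercept = mean_y - slope * mean_x  # where the line crosses the y-axis
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted value of Y for a given X."""
    return intercept + slope * x

# Hypothetical data: hours studied (X) and test grade (Y).
hours = [1, 2, 3, 4, 5, 6]
grades = [62, 70, 71, 80, 84, 91]

intercept, slope = fit_simple_regression(hours, grades)
print(f"predicted grade = {intercept:.2f} + {slope:.2f} * hours")
print(f"prediction for 7 hours: {predict(intercept, slope, 7):.1f}")
```

The slope is the regression coefficient: the predicted change in Y for a one unit change in X.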
Contingency Tables • Table used to examine the relationship between two categorical variables • Cells may contain frequencies or percentages
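A contingency table is a count of how often each pair of categories occurs together. A minimal Python sketch, assuming a small invented data set of gender and major; `contingency_table` is a hypothetical helper name:

```python
from collections import Counter

def contingency_table(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))

# Hypothetical observations, one entry per person.
gender = ["male", "female", "female", "male", "female", "male", "female", "female"]
major = ["business", "psychology", "business", "psychology",
         "psychology", "business", "philosophy", "psychology"]

table = contingency_table(gender, major)
n = len(gender)

# Each cell shown as a frequency and as a percentage of all cases.
for g in sorted(set(gender)):
    for m in sorted(set(major)):
        count = table[(g, m)]
        print(f"{g:<7} {m:<11} {count}  {100 * count / n:.1f}%")
```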