STATISTICAL ANALYSIS. Your introduction to statistics should not be like drinking water from a fire hose!!
Nature of the Data
• Two main types: categorical or continuous
• 1. Categorical:
  • Nominal (unordered, unequal categories)
    • E.g.: Female=1 and male=2
  • Ordinal (ordered unequal or ranked categories)
    • E.g.: 1=SD 2=D 3=N 4=A 5=SA (strongly disagree to strongly agree)
• 2. Continuous:
  • Interval (ordered, equal intervals, no true zero)
    • E.g.: 5-point Likert scale with equal intervals or IQ score
  • Ratio (ordered, equal intervals with absolute zero)
    • E.g.: raw scores, class attendance (in days); age (in years)
• Descriptive statistics: Procedures used for summarizing the data in both numerical and graphic form. Includes frequencies, distributions, percents, cumulative percents, pie charts, bar graphs, histograms and scatter plots.
  • (Cross-tabulations: summarize relationships between two variables like a scatter plot but in table form.)
• Measures of central tendency:
  • Mean: arithmetic average (interval & ratio data only)
  • Mode: most frequent value; a distribution can be bimodal or multimodal (all types)
  • Median: midpoint with equal halves above and below (ordinal, interval and ratio)
What do you mean by data??
Statistics 101!!
• Statistics
  • Measures of location: mean vs. median and why
  • Measures of scale: range, interquartile range, standard deviation (and variance)
  • Measures of position: percentiles, deciles, quartiles, median
Note. For categorical variables, we use proportions as the descriptive statistics.
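A minimal sketch of the measures listed above, using only Python's standard `statistics` module. The `scores` list is hypothetical data chosen to include one extreme value, so that the "mean vs. median and why" point is visible; the "inclusive" quartile method is one of several conventions and is an assumption here.

```python
import statistics

# Hypothetical sample with one extreme value (an assumption for illustration).
scores = [2, 3, 3, 4, 5, 6, 40]

# Measures of location: the outlier pulls the mean upward,
# while the median stays at the middle of the sorted data.
mean = statistics.mean(scores)
median = statistics.median(scores)

# Measures of scale.
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
stdev = statistics.stdev(scores)        # square root of the sample variance

# Measures of position: the three quartile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

print(f"mean={mean}, median={median}")
print(f"range={data_range}, stdev={stdev:.2f}, IQR={iqr}")
```

With this sample the mean is well above the median, which is why the median is often the better measure of location for skewed data.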
Why does lack of normality cause problems? When we calculate the p-value for an inference test, we find the probability of seeing a result at least as extreme as ours through sampling variability alone, assuming the null hypothesis is true. Basically, we are trying to see if a recorded value could have occurred by chance and chance alone. When we look for a p-value with a parametric test, we are assuming that the sample means of all samples of the given size are normally distributed around the population mean. This is why the test statistic, which is the number of standard errors the sample mean lies from the hypothesized population mean, can be used. Therefore, without normality (or a sample large enough for the sample means to be approximately normal), the p-value from such a test cannot be trusted.
There are non-parametric tests which are similar to the parametric tests. The following table shows how some of the tests match up.
What is different about Non-Parametric Statistics?
• Sometimes statisticians use what is called “ordinal” data. This data is obtained by taking the raw data and giving each observation a rank. These ranks are then used to create test statistics.
• In non-parametric statistics, one deals with the median rather than the mean. Since a mean can be easily influenced by outliers or skewness, and we are not assuming normality, the mean is no longer a reliable summary. The median is another measure of location, which makes more sense in a non-parametric test. The median is considered the center of a distribution.
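The ranking step can be sketched as follows. The helper name `average_ranks` and the sample values are hypothetical, and giving tied observations the average of their positions is a common convention assumed here, not something the slide specifies.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Average of the 1-based positions pos+1 .. end+1.
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


# Hypothetical raw data with one tie and one extreme value.
raw = [12.5, 3.1, 7.0, 7.0, 150.0]
ranks = average_ranks(raw)
print(ranks)
```

Note that the extreme value 150.0 simply receives the highest rank, so its size no longer dominates any statistic built from the ranks.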
Drawing a histogram: the good, the bad and the downright ugly!! Many modern introductory texts confuse frequency graphs, relative frequency graphs, and histograms. (Figure panels: “Bad” and “Good” example histograms.)
Critical Values
For a given number of degrees of freedom, by the properties of the t-distribution, we know how large the t-statistic must be in order to reject the null. We call that number the “critical value” of the t-statistic; it is typically looked up in a table of the t-distribution. If the value of the t-statistic calculated from the data is greater than this critical value, then we “reject the null hypothesis.”
- This is because, for t-statistics greater than this critical value, our probability of falsely rejecting the null hypothesis is no larger than the significance level we chose.
Example
Suppose our null hypothesis is that the mean of X is less than or equal to 0. The sample mean is 3; the sample standard deviation is 2; there are 121 observations.
Step 1. We need to establish our “critical value.” We wish to reject the null hypothesis at the 95% confidence level, that is, with at most a 5% chance of rejecting it falsely. For 121 observations and a “one-tailed test,” the critical value is 1.66 (which we look up in the table; this corresponds to a significance level of .05 with 120 degrees of freedom).
Step 2. Compute the t-statistic: $t = (3 - 0) / (2 / \sqrt{121}) = 3 / 0.1818 \approx 16.5$.
Step 3. Compare the t-statistic with the critical value. If the t-statistic is greater than the critical value, then you can reject the null hypothesis. In this case, 16.5 is greater than 1.66, so we can reject the null hypothesis that the mean of X is less than or equal to zero.
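The three steps can be checked with a few lines of Python. This is a sketch using the numbers from the example; the variable names are illustrative, and the critical value 1.66 is entered by hand from the table rather than computed. Using the unrounded standard error 2/11 gives t = 16.5, whereas rounding the standard error to 0.18 first gives about 16.7.

```python
import math

# Values taken from the example.
sample_mean = 3.0
sample_sd = 2.0
n = 121
null_value = 0.0

# Step 1: critical value read from a t-table
# (one-tailed, significance level .05, 120 degrees of freedom).
critical_value = 1.66

# Step 2: the t-statistic is the distance of the sample mean from the
# null value, measured in standard errors.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - null_value) / standard_error

# Step 3: reject the null if the t-statistic exceeds the critical value.
reject_null = t_stat > critical_value

print(f"standard error = {standard_error:.4f}")
print(f"t = {t_stat:.2f}, reject null: {reject_null}")
```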
Example
The table to the right is a sample “cross-tab.” Your research hypothesis is that dog ownership and gender are related. How do you test this hypothesis?
Hypothesis Tests about Tables
Step 1. Define null and research hypotheses. The null hypothesis will usually be that there is no relationship between the rows and the columns.
Step 2. Determine your tolerance for falsely rejecting the null hypothesis of no relationship.
Step 3. Empirically analyse the data to determine if there is a relationship.
Example
To test for independence:
1) Identify the number of respondents in each internal cell of the table.
2) Calculate the number of respondents who would be in each cell if the rows and columns were independent (corresponds to the second number under each total), e.g.
   cell(1,1) = .5 * .15 * 1000 = 75
   cell(1,2) = .5 * .85 * 1000 = 425
3) Compute the chi-squared test statistic (next slide).
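Step 2 can be sketched in Python as below. The row and column proportions and the total of 1000 come from the example; the variable names are illustrative only. Under independence, each expected count is the row proportion times the column proportion times the number of respondents.

```python
# Marginal proportions and sample size from the example.
n_total = 1000
row_props = [0.5, 0.5]     # share of respondents in each row
col_props = [0.15, 0.85]   # share of respondents in each column

# Expected count in each cell if rows and columns were independent.
expected = [[r * c * n_total for c in col_props] for r in row_props]

for i, row in enumerate(expected, start=1):
    for j, count in enumerate(row, start=1):
        print(f"cell({i},{j}) expected = {count:.0f}")
```

The expected counts always add up to the total number of respondents, which is a quick sanity check on the arithmetic.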
The Chi-Square Test Statistic
To test for independence:
3) Compute the chi-squared test statistic. The chi-squared test statistic is simply:

$$\chi^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(\text{Observed}_{\text{row},\text{column}} - \text{Expected}_{\text{row},\text{column}})^2}{\text{Expected}_{\text{row},\text{column}}}$$

The chi-squared statistic follows a chi-squared distribution with degrees of freedom = (rows – 1)(columns – 1).
Example
If we look at our table of the $\chi^2$ distribution with 1 degree of freedom, the critical value for our test statistic is 3.84 (at the .05 significance level).

$$\chi^2 = \frac{(100 - 75)^2}{75} + \frac{(400 - 425)^2}{425} + \frac{(50 - 75)^2}{75} + \frac{(450 - 425)^2}{425} \approx 19.6$$

In this case, we reject the null hypothesis that the two variables are statistically independent because our test statistic is greater than our critical value.
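The whole calculation can be reproduced with a short Python sketch. The function name `chi_squared_statistic` is hypothetical; the observed counts are the four values used in the example, and the critical value 3.84 is entered by hand from the table rather than computed. Expected counts are derived from the row and column totals, which is equivalent to multiplying the marginal proportions by the sample size.

```python
def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    return stat


# Observed counts from the example (two rows, two columns).
observed = [[100, 400],
            [50, 450]]

chi2 = chi_squared_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 3.84  # from a chi-squared table, 1 degree of freedom
reject_null = chi2 > critical_value

print(f"chi-squared = {chi2:.1f}, df = {df}, reject null: {reject_null}")
```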