Data Analysis
A Few Necessary Terms
Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)
Continuous Variable: Measurements along a continuum, such as Flow Velocity
What type of variable would “Mottled Sculpin/meter²” be?
What type of variable is “Substrate Type”?
What type of variable is “% of bank that is undercut”?
A Few Necessary Terms
Explanatory Variable: Independent variable. On the x-axis. The variable you use as a predictor.
Response Variable: Dependent variable. On the y-axis. The variable that is hypothesized to depend on, or be predicted by, the explanatory variable.
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
• T-test: A categorical explanatory variable with 2 options.
• ANOVA: A categorical explanatory variable with >2 options.
• Regression: A continuous explanatory variable.
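The rule of thumb above can be sketched as a small lookup. This is only an illustration: the function name `choose_test` and its arguments are hypothetical, and it assumes the response variable is continuous, as stated on the slide.

```python
def choose_test(explanatory_type, n_groups=None):
    """Suggest a test for a continuous response variable.

    explanatory_type: "categorical" or "continuous" (hypothetical labels).
    n_groups: number of options of a categorical explanatory variable.
    """
    if explanatory_type == "continuous":
        return "Regression"
    if explanatory_type == "categorical":
        if n_groups is None or n_groups < 2:
            raise ValueError("a categorical explanatory variable needs at least 2 options")
        # Exactly 2 options -> T-test; more than 2 -> ANOVA.
        return "T-test" if n_groups == 2 else "ANOVA"
    raise ValueError("explanatory_type must be 'categorical' or 'continuous'")


# Type of Reach (Riffle, Run, Pool) has 3 options.
print(choose_test("categorical", n_groups=3))
# Flow Velocity is measured along a continuum.
print(choose_test("continuous"))
```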
Statistical Tests
Hypothesis Testing: In statistics, we test a Null Hypothesis (Ho) against an Alternative Hypothesis (Ha).
Test Statistic: A single number calculated from the sample (such as t or F) that measures how far the data depart from what the null hypothesis predicts.
p-value: The probability of observing our data, or more extreme data, assuming the null hypothesis is correct.
Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
Student’s T-Test
Tests the statistical significance of the difference between means from two independent samples.
Compares the mean of a continuous response variable between the 2 groups of a categorical variable.
[Figure: Mottled Sculpin/m² at Cross Plains vs. Salmo Pond]
Precautions and Limitations
• Meet Assumptions
  • Observations from data with a normal distribution (histogram)
  • Samples are independent
  • Assumed equal variance (boxplot)
  • No other sample biases
• Interpreting the p-value
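A minimal sketch of the equal-variance (pooled) t statistic, using only the Python standard library. The two samples are made-up numbers for illustration, not field data, and the function name is hypothetical. The p-value is then looked up from a t distribution with the returned degrees of freedom (a statistics package or Excel does this step for you).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming equal variance in the two groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Made-up sculpin densities (per m²) at two hypothetical sites.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [3.0, 4.0, 5.0, 6.0, 7.0]
t_stat, dof = pooled_t_statistic(site_a, site_b)
print(t_stat, dof)
```

The further t is from zero, the smaller the p-value for a given number of degrees of freedom.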
Analysis of Variance (ANOVA)
Tests the statistical significance of the difference between means from two or more independent samples.
[Figure: Mottled Sculpin/m² by reach type (Riffle, Pool, Run), with the Grand Mean marked]
ANOVA website
Precautions and Limitations
• Meet Assumptions
  • Observations from data with a normal distribution
  • Samples are independent
  • Assumed equal variance
  • No other sample biases
• Interpreting the p-value
• Pairwise T-tests to follow
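A minimal sketch of the one-way ANOVA F statistic, showing how the grand mean and the group means enter the calculation. The data are made-up numbers and the function name is hypothetical. F compares the variation between group means to the variation within groups; the p-value comes from an F distribution with the two returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Between-group sum of squares: group means vs. the grand mean.
    ss_between = sum(len(group) * (g_mean - grand_mean) ** 2
                     for group, g_mean in zip(groups, group_means))
    # Within-group sum of squares: observations vs. their own group mean.
    ss_within = sum((value - g_mean) ** 2
                    for group, g_mean in zip(groups, group_means)
                    for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up sculpin densities (per m²) for three reach types.
riffle = [1.0, 2.0, 3.0]
run = [2.0, 3.0, 4.0]
pool = [6.0, 7.0, 8.0]
print(one_way_anova_f([riffle, run, pool]))
```

A significant F only says that at least one group mean differs; it does not say which one, which is why pairwise T-tests follow.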
Simple Linear Regression
• What is it? Least squares line
• When is it appropriate to use?
• Assumptions?
• What does the p-value mean? The R-value?
• How to do it in Excel
Simple Linear Regression
Tests the statistical significance of a relationship between two continuous variables, Explanatory and Response.
Precautions and Limitations
• Meet Assumptions
  • Observations from data with a normal distribution
  • Samples are independent
  • Assumed equal variance
  • Relationship is linear
  • No other sample biases
• Interpret the p-value and R-squared value.
Residual Plots
Residuals are the vertical distances from the observed points to the best-fit line.
Residuals from a least-squares line (with an intercept) always sum to zero.
Regression chooses the best-fit line that minimizes the sum of squared residuals. It is called the Least Squares Line.
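A minimal sketch of the least squares line, using only the Python standard library. The x and y values are made-up numbers and the function names are hypothetical. It checks two of the statements above: the residuals sum to zero, and a different line through the same data has a larger sum of squared residuals.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed value minus the value on the line, for each point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squares(values):
    return sum(v ** 2 for v in values)


# Made-up data: explanatory variable x, response variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = least_squares_line(x, y)
best_residuals = residuals(x, y, slope, intercept)
# Same intercept, slightly steeper slope: no longer the least squares line.
other_residuals = residuals(x, y, slope + 0.1, intercept)

print("sum of residuals:", sum(best_residuals))
print("sum of squared residuals (least squares line):", sum_of_squares(best_residuals))
print("sum of squared residuals (other line):", sum_of_squares(other_residuals))
```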
Residual vs. Fitted Value Plots
[Figure: Observed Values (Points) and Model Values (Line)]
Residual Plots Can Help Test Assumptions
[Figure: three residual plots centered on 0: “Normal” Scatter; Curve (linearity); Fan Shape: Unequal Variance]
R-Squared and P-value
• High R-Squared, Low p-value (significant relationship)
• Low R-Squared, Low p-value (significant relationship)
• High R-Squared, High p-value (no significant relationship)
• Low R-Squared, High p-value (no significant relationship)
The p-value indicates the strength of the evidence that a relationship exists between the two variables. It does not measure how strong that relationship is.
R-Squared indicates how much of the variance in the response variable is explained by the explanatory variable. You can think of this as a measure of predictability. If it is low, other variables likely play a role. If it is high, it DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP! Check the p-value for that.
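A minimal sketch of why R-Squared and the p-value can disagree, assuming simple linear regression with one explanatory variable. The function names and the example numbers are hypothetical. It uses the standard relationship between R-Squared, the sample size n, and the t statistic for the slope: for the same R-Squared, more data gives a larger t and so a smaller p-value.

```python
from math import sqrt


def r_squared(observed, fitted):
    """Share of the variance in the response explained by the line."""
    y_bar = sum(observed) / len(observed)
    ss_residual = sum((obs - fit) ** 2 for obs, fit in zip(observed, fitted))
    ss_total = sum((obs - y_bar) ** 2 for obs in observed)
    return 1 - ss_residual / ss_total


def slope_t_statistic(r2, n):
    """Size of the t statistic for the slope, with n - 2 degrees of freedom."""
    return sqrt(r2 * (n - 2) / (1 - r2))


# Fairly high R-Squared, but only 5 points (3 degrees of freedom).
# The two-sided 5% critical value of t for 3 df is about 3.18.
small_sample_t = slope_t_statistic(0.6, 5)

# Low R-Squared, but 100 points (98 degrees of freedom).
# The two-sided 5% critical value of t for 98 df is about 1.98.
large_sample_t = slope_t_statistic(0.1, 100)

print("R-Squared 0.6, n = 5:   t =", round(small_sample_t, 2))
print("R-Squared 0.1, n = 100: t =", round(large_sample_t, 2))
```

In this sketch the first case falls short of its critical value (not significant at 0.05) and the second exceeds it (significant), even though the first has the much higher R-Squared.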