Math 15: Introduction to Scientific Data Analysis
Lecture 5: Association Statistics & Regression Analysis
University of California, Merced
Course Lecture Schedule • Quiz Next Week!
Project #1 – Due March 31st, 2008
• Projects can be performed individually or in groups of up to three, with the following rules:
• Teams turn in one project report and get the same grade.
• A team consists of at most 3 people. No copying between teams!
• A team project report must include a title page on which the team describes each member’s contribution.
• 10% bonus for projects done individually.
• Individual projects must not be copied from anyone else.
• No late projects will be accepted!
Project #1 will be posted at UCMCROP by next Monday!
UC Merced
Review: Measures of dispersion or variability
• Variance or Standard Deviation
• The distribution on the left is more dispersed than the one on the right: it has a higher variance (and standard deviation).
[Figure: two distributions, with the labels "Mode" and "Average"]
Which is the more precise measurement?
Measurement 1: average = 35.49 ml, s = 4.5
Measurement 2: average = 446 mg, s (standard deviation) = 23 mg
• Although the standard deviation is a good measure of the precision of a given set of data, it can be difficult to directly compare the standard deviations of two different types of measurement.
• You might need such a comparison to determine the largest source of uncertainty in an experimentally determined answer.
Get the Right Tool for the Job!
Measures of dispersion or variability
• One way to make this comparison is the relative standard deviation.
• The relative standard deviation, RSD, is simply the ratio of the standard deviation to the mean, expressed here as a percentage.
Average = 35.49 ml, s = 4.5: RSD = 100 × (4.5/35.49) = 12.7%
Average = 446 mg, s = 23 mg: RSD = 100 × (23/446) = 5.2%
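The RSD comparison on this slide can be sketched in Python. The function name `relative_std_dev` is an illustrative choice and not part of the lecture; the numbers are the two measurements shown on the slide.

```python
def relative_std_dev(std_dev, mean):
    """Return the relative standard deviation (RSD) as a percentage."""
    return 100 * std_dev / mean

# (standard deviation, average) pairs taken from the slide
rsd_volume = relative_std_dev(4.5, 35.49)  # measurement reported in ml
rsd_mass = relative_std_dev(23, 446)       # measurement reported in mg

print(f"RSD (ml measurement) = {rsd_volume:.1f}%")
print(f"RSD (mg measurement) = {rsd_mass:.1f}%")
```

Because the RSD is unitless, the two values can be compared directly: the mg measurement has the smaller RSD, so it is the more precise of the two relative to its mean.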
Any Questions?
Common Practice for Data Analysis
• A common task in data analysis is to investigate an association between two variables.
• Correlation: to see if two variables vary together.
• Regression: to see how one variable affects another.
Correlation
• A correlation tells us whether two variables vary together,
• i.e., as one goes up, the other goes up (or goes down).
• It is measured by the correlation coefficient (the Pearson product-moment correlation coefficient, or Pearson’s r).
Correlation Coefficient
• Varies from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation).
Correlation Coefficient – cont.
• Always draw a diagram to check that:
• There are no OUTLIERS. If there are outliers, the analysis that follows may not apply.
• The relation is not curved (r only refers to LINEAR correlation).
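A small sketch of why both checks matter. The helper `pearson_r` is written out here for illustration using the standard Pearson formula, and all data points are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Curved relation: y depends entirely on x, yet there is no LINEAR trend.
xs_curved = [-2, -1, 0, 1, 2]
ys_curved = [x ** 2 for x in xs_curved]
r_curved = pearson_r(xs_curved, ys_curved)

# Straight line, then the same line with one outlier appended.
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_outlier = pearson_r([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, -20])

print(r_curved, r_line, r_outlier)
```

In this made-up data the curved relation gives an r of zero even though y is completely determined by x, and a single outlier is enough to pull r far away from the value for the clean straight line.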
Excel Function – Correlation Coefficient
• = CORREL(array1,array2) or
• = PEARSON(array1,array2)
Example: lengths of a leg bone (in cm) in penguin mating pairs show a positive correlation.
Ice cream sales vs. number of people who drown at sea: correlation coefficient = 0.927
Wait! What kinds of conclusions can we draw from a correlation?
Examples (Not Good Ones!)
• Ice cream sales correlate with the number of people who drown at sea.
• Therefore, ice cream causes people to drown.
• Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply.
• Hence, atmospheric CO2 causes crime.
Ice cream sales vs. number of people who drown at sea: correlation coefficient = 0.927
Correlation does not imply causation
• The fact that A is correlated with B does not, by itself, support any conclusion about the existence or the direction of a cause-and-effect relationship.
• The correlation coefficient only tells you whether the two variables vary together.
• Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
Any Questions?
Regression
• Regression is used when we have some reason to believe that changes in one variable cause changes in the other.
• A correlation coefficient is not evidence of a causal relationship.
• The simplest kind of causal relationship is a straight-line (or linear) relationship, which is modeled by linear regression.
Linear regression
• Linear regression assumes a linear relationship between two variables:
• the dependent factor, y, and the independent factor, x.
• Mathematically, this relationship is described by the linear equation y = ax + b, where a is called the slope and b is called the intercept.
• This equation lets you calculate y (dependent) from x (independent); the values of a and b are found by the least-squares method.
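For reference, the least-squares method picks the line y = ax + b that minimizes the sum of squared vertical distances between the data points and the line. The standard closed-form result is given below as a supplement; it is not taken from the slides. Here n is the number of data points, and x̄ and ȳ are the means of the x and y values.

```latex
\min_{a,\,b}\; \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```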
Review - Math
• Linear Equation
• Slope and Intercept
y = 3x + 8 (slope = 3, intercept = 8)
Slope & Intercept formula
[Spreadsheet screenshot: lengths of a leg bone (in cm) in penguin mating pairs, with the X-values and Y-values marked]
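The exact spreadsheet formula on this slide is not recoverable from the text. Excel does provide SLOPE and INTERCEPT functions for this purpose, and the sketch below computes the same two least-squares quantities in Python. The function name `fit_line` and the data points are assumptions made for illustration; the points were chosen to lie exactly on the review equation y = 3x + 8.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Made-up points lying exactly on y = 3x + 8
xs = [0, 1, 2, 3, 4]
ys = [8, 11, 14, 17, 20]

a, b = fit_line(xs, ys)
print(f"slope = {a}, intercept = {b}")  # prints "slope = 3.0, intercept = 8.0"
```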
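Predicted Y-values
• y = ax + b
• a: slope & b: intercept
• Spreadsheet formula, with the X-value in B3, the slope in C10, and the intercept in C11: =$C$10*B3+$C$11
• Don’t forget the $ signs! They keep the slope and intercept cell references fixed when the formula is copied to other rows.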
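The same calculation can be sketched in Python. The constants play the role of the absolute references ($C$10 and $C$11), while the list of x-values plays the role of the relative reference (B3, B4, ...). The slope, intercept, and x-values are hypothetical, borrowed from the review equation y = 3x + 8 rather than from the penguin data.

```python
SLOPE = 3.0      # plays the role of the fixed cell $C$10 (hypothetical value)
INTERCEPT = 8.0  # plays the role of the fixed cell $C$11 (hypothetical value)

# Each entry plays the role of one row's X-value (B3, B4, B5, ...)
x_values = [1.0, 2.0, 3.0]

# Apply y = a*x + b to every row, reusing the same slope and intercept
predicted = [SLOPE * x + INTERCEPT for x in x_values]
print(predicted)  # prints [11.0, 14.0, 17.0]
```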
Plot a linear regression (or trend) line – Part 1
• You can add a linear regression line.
Plot a linear regression (or trend) line – Part 2
• Right-click on any data point on the graph.
• Choose Add Trendline.
• Click on the Options tab, and select Display equation and Display R-squared. Don’t forget to check these two options!
• Click “OK”.
Plot a linear regression (or trend) line – Part 2 – cont.
• R² value (R-squared value, RSQ)
• A “measure of scatter” around the regression line.
• The closer this value is to 1, the closer the data points lie to the line, and the more reliable predictions from the line are.
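As a sketch of what the R-squared value measures, the code below fits a least-squares line and computes R² as 1 minus the ratio of residual scatter to total scatter. The function name `r_squared` and both data sets are made up for illustration.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line y = a*x + b fitted to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Scatter left over around the line vs. total scatter around the mean
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
tight = [2.1, 3.9, 6.2, 7.8, 10.1]     # made-up points close to a line
scattered = [4.0, 1.5, 7.5, 5.0, 9.0]  # made-up points with more scatter

print(f"tight:     R^2 = {r_squared(xs, tight):.3f}")
print(f"scattered: R^2 = {r_squared(xs, scattered):.3f}")
```

The tightly clustered points give an R² close to 1, while the scattered points give a noticeably lower value.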
Let’s review the process!
• If there is some reason to believe there is a causal relationship between two variables, plot a graph first!
• Example: lengths of a leg bone (in cm) in penguin mating pairs.
• Correlation: to see if two variables vary together.
• Regression: to see how one variable affects another.
Any Questions?