Unit 4: More About Relationships between Two Variables Tatiana Dobretsova Eujin Jang Phillip Kim Greg Reyelts
The big idea • Using logarithms to transform data that follow a curved pattern to achieve a linear relationship. • When both variables are categorical, there is no perfect graph for displaying the data, but bar graphs can be helpful. • This relationship is described by comparing percents. • Strong observed relationships between two variables may exist without a cause-and-effect link between them. • Look for lurking variables that might affect the relationship.
4.1 Transforming to Achieve Linearity • Linear regression (LSRL) • Exponential regression (transform y to log y): • y = ab^x, so log y = log a + x log b, which is linear in x • Common ratio: y_n / y_(n-1), a term divided by the term that comes before it (for equally spaced x-values) • If the common ratio is roughly constant, an exponential model is appropriate: a ratio > 1 indicates exponential growth, and a ratio < 1 indicates exponential decay. • Power growth (transform x to log x and y to log y): • y = ax^b, so log y = log a + b log x, which is linear in log x
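The common-ratio check can be sketched in a few lines of Python. This is only an illustration: the helper name `common_ratios` and the data values are made up for the example, and the x-values are assumed to be equally spaced.

```python
def common_ratios(ys):
    """Return y_n / y_(n-1) for each consecutive pair of values."""
    return [curr / prev for prev, curr in zip(ys, ys[1:])]


# Hypothetical data: y triples each time x increases by 1 (y = 2 * 3^x).
ys = [2, 6, 18, 54, 162]
ratios = common_ratios(ys)
print(ratios)  # [3.0, 3.0, 3.0, 3.0]

# A roughly constant ratio above 1 suggests exponential growth;
# a roughly constant ratio below 1 would suggest exponential decay.
is_growth = all(r > 1 for r in ratios)
print(is_growth)  # True
```

With real data the ratios will not be exactly equal; the point is whether they stay close to one value.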
4.1 Transforming to Achieve Linearity • 1. Graph the initial data (x vs. y) • 2. Test for exponential or power growth (is the common ratio y_n / y_(n-1) roughly constant?) • 3. Perform linear regression on the initial data • 4. Look at the residual plot of the initial data and consider r and r^2 • 5. Transform y to log y for exponential, and transform x to log x as well for power • 6. Graph the transformed data (x vs. log y OR log x vs. log y) • 7. Perform linear regression on the transformed data • 8. Look at the residual plot, which should now show random scatter, and check r and r^2 • 9. Perform the inverse transformation of the linear equation to arrive at the exponential/power model • 10. Graph the initial data with the exponential/power model
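The numeric steps above (3 through 9) can be sketched in Python with NumPy for the exponential case. This is a minimal sketch under stated assumptions: the data are invented from y = 5 · 2^x so the correct answer is known, base-10 logs are used, and the graphing steps are left out.

```python
import numpy as np

# Made-up data generated from y = 5 * 2^x (an assumption for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 5.0 * 2.0 ** x

# Steps 3-4: fit a straight line to the raw data and inspect the residuals.
slope_raw, intercept_raw = np.polyfit(x, y, 1)
resid_raw = y - (intercept_raw + slope_raw * x)
print(np.round(resid_raw, 2))  # positive, negative, positive: a curved pattern

# Steps 5-7: transform y to log y and fit a line to (x, log y).
log_y = np.log10(y)
slope, intercept = np.polyfit(x, log_y, 1)

# Step 8: residuals of the transformed fit, plus r and r^2.
resid_log = log_y - (intercept + slope * x)
r = np.corrcoef(x, log_y)[0, 1]
print(round(r, 4), round(r ** 2, 4))

# Step 9: invert log y = intercept + slope * x to get y = a * b^x.
a = 10 ** intercept
b = 10 ** slope
print(round(a, 3), round(b, 3))
```

For a power model, the same sketch would fit a line to (log x, log y) instead, giving the exponent as the slope and a as 10 raised to the intercept.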
4.2 Relationships between Categorical Variables • Since we’re dealing with categorical variables, we use two-way tables to compare the two variables, divided into a row variable and a column variable. • Marginal Distributions: Simply look at the row totals and column totals for the two separate variables, each expressed as a percent of the table total. Roundoff error may occur, but the percentages should still add up to about 100%.
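As an illustration, the marginal distributions of a small two-way table could be computed like this. The table, its category names, and its counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical two-way table of counts.
# Row variable: answer (Yes/No). Column variable: class year.
counts = {
    "Yes": {"Freshman": 30, "Senior": 10},
    "No": {"Freshman": 20, "Senior": 40},
}

# Row totals and column totals.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
columns = list(next(iter(counts.values())))
col_totals = {col: sum(counts[row][col] for row in counts) for col in columns}
grand_total = sum(row_totals.values())

# Marginal distributions: each total as a percent of the table total.
row_marginal = {row: 100 * t / grand_total for row, t in row_totals.items()}
col_marginal = {col: 100 * t / grand_total for col, t in col_totals.items()}

print(row_marginal)  # {'Yes': 40.0, 'No': 60.0}
print(col_marginal)  # {'Freshman': 50.0, 'Senior': 50.0}
```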
4.2 Relationships between Categorical Variables • To find the conditional distribution of the row variable for one specific value of the column variable, look only at that one column and find each entry as a percent of the column total. • Describing Relationships: With categorical variables, approximate percentages must be calculated from the given counts for comparison. • A conditional distribution depends on one specific category. Check whether the conditional distributions differ from the marginal distribution; if they do, there is likely to be a relationship. • Simpson’s Paradox occurs when data from several groups are combined into a single group and the direction of an association between variables reverses. It is an example of the effect of a lurking variable.
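Simpson's Paradox is easier to see with numbers. The sketch below uses illustrative counts (not taken from these slides) for two hypothetical hospitals, with patient condition playing the role of the lurking variable. The percentages within each condition are conditional distributions; the combined percentages ignore condition.

```python
# Illustrative counts: (deaths, total patients) by hospital and patient condition.
data = {
    "A": {"good": (6, 600), "poor": (57, 1500)},
    "B": {"good": (8, 600), "poor": (8, 200)},
}


def death_rate(deaths, total):
    """Percent of patients who died."""
    return 100 * deaths / total


# Conditional percentages: death rate within each condition group.
by_condition = {
    hosp: {cond: death_rate(d, t) for cond, (d, t) in groups.items()}
    for hosp, groups in data.items()
}

# Combined percentages: condition groups pooled into a single group.
combined = {
    hosp: death_rate(
        sum(d for d, _ in groups.values()),
        sum(t for _, t in groups.values()),
    )
    for hosp, groups in data.items()
}

for hosp in data:
    print(hosp, by_condition[hosp], "combined:", combined[hosp])
```

Hospital A has the lower death rate within each condition group, yet the higher rate once the groups are combined, because it treats far more patients in poor condition.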
4.3 Establishing Causation • ASSOCIATION DOES NOT MEAN CAUSATION. • Causation: The explanatory variable directly changes the response variable, and other influences on the response are controlled. • Common response: Changes in both the explanatory and response variables are caused by changes in lurking variables. • Confounding: Two variables (either explanatory or lurking variables) are confounded when we cannot distinguish their effects on the response variable.
Calculator keystrokes • REMEMBER TO TURN ON STAT DIAGNOSTICS • Mode → ↑ → Stat Diagnostics On • Stat → Calc → 8: LinReg (a+bx) • Stat → Calc → 0: ExpReg • Stat → Calc → A: PwrReg • Find equation (LSRL, exponential, power) → Store (VARS → Y-VARS → Y1) → Graph the scatterplot and equation → 2nd Y= → Set Ylist to the residuals list (2nd List → 7) → Graph → Zoom Stat
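For checking calculator output away from the calculator, a power fit can be approximated in Python by running linear regression on (log x, log y), the transformation described in section 4.1. This is a sketch under assumptions, not a statement of how the calculator computes PwrReg internally; the data are invented from y = 3x^2 so the expected answer is known.

```python
import numpy as np

# Made-up data generated from y = 3 * x^2 (an assumption for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x ** 2

# Linear regression on the transformed data: log y = log a + b * log x.
log_x = np.log10(x)
log_y = np.log10(y)
b_power, log_a = np.polyfit(log_x, log_y, 1)

# Inverse transformation back to y = a * x^b.
a_power = 10 ** log_a
r_power = np.corrcoef(log_x, log_y)[0, 1]

print(round(a_power, 3), round(b_power, 3), round(r_power, 4))
```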
Review problems • 4.5 pg. 276 • 4.1 Review Sheet • 4.2 Review Sheet • 4.41 pg. 312 • 4.47 pg. 313