Learn how to transform data for linearity and analyze relationships between variables in bivariate data sets. Understand Simpson’s paradox and how to detect outliers in regression analysis.
Chapter 4: More about Relationships between Two Variables The Practice of Statistics Third Edition
Ch 4 Objectives • Identify settings in which a transformation might be necessary in order to achieve linearity. • Use transformations involving powers and logarithms to linearize curved relationships. • Explain what is meant by a two-way table, and describe its parts. • Give an example of Simpson’s paradox. • Explain what gives the best evidence for causation. • Explain the criteria for establishing causation when experimentation is not feasible.
4.1 Recap… Objectives • Explain what is meant by transforming (re-expressing) data. • Discuss the advantage of transforming nonlinear data. • Tell where y = log(x) fits into the hierarchy of power transformations. • Explain the ladder of power transformations. • Explain how linear growth differs from exponential growth. • Identify real-life situations in which a transformation can be used to linearize data from an exponential growth model. • Use a logarithmic transformation to linearize a data set that can be modeled by an exponential model. • Identify situations in which a transformation is required to linearize a power model. • Use a transformation to linearize a data set that can be modeled by a power model.
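To make the logarithmic-transformation objective concrete, here is a minimal sketch in Python. The data are hypothetical (generated from y = 5 · 2^x, not taken from the textbook), and the helper name `least_squares` is our own. It shows that taking log10 of an exponentially growing response gives a straight line, and that undoing the transformation recovers the exponential model.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical exponential-growth data: y = 5 * 2**x (not from the textbook).
xs = [0, 1, 2, 3, 4, 5]
ys = [5 * 2 ** x for x in xs]

# log10(y) = log10(5) + x * log10(2), which is linear in x.
log_ys = [math.log10(y) for y in ys]
intercept, slope = least_squares(xs, log_ys)

# Undo the transformation to recover the exponential model y = a * b**x.
a = 10 ** intercept
b = 10 ** slope
print(f"log10(y) = {intercept:.4f} + {slope:.4f} x")
print(f"y = {a:.2f} * {b:.2f}**x")
```

For a power model y = a · x^p, the same idea applies with logarithms taken of both x and y.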
Bivariate Data • Influential points (which may be outliers) See 3.3, page 237 • Effects of outliers
Fig 4.1 Scatterplot and least-squares regression line of brain weight against body weight for 96 species of mammals. Are there influential points? How do you know? There is a standard method for multivariate outlier detection, but it requires, among other things, knowledge of the chi-square distribution.
Fig 4.2 Scatterplot of brain weight against body weight for mammals with outliers removed.
Fig 4.3 Scatterplot and least-squares regression line of brain weight against body weight for 96 species of mammals.
Example 4.2: Fishing tournament (transforming data with powers) Scatterplot of Atlantic Ocean rockfish weight versus length.
Scatterplot of Atlantic Ocean rockfish weight versus length³.
Residual plot versus length³. Analysis: Slight pattern in smaller values of length³. For smaller fish, the residuals are all small negative values. Thus, the model predicts a weight slightly higher than the actual weight. For heavier fish, the points are fairly randomly scattered. The model should be relatively accurate for heavier (tournament-sized) fish.
Atlantic Ocean rockfish data with the model WEIGHT = 4.066 + 0.0147(LENGTH³)
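As a rough sketch, the fitted model from this slide can be written as a small Python function. The coefficients come from the slide; the function names and the sample fish (length 10, observed weight 18.0) are invented for illustration, and units are whatever the textbook's data set uses.

```python
def predict_weight(length):
    """Predicted weight from the slide's model: WEIGHT = 4.066 + 0.0147 * LENGTH**3."""
    return 4.066 + 0.0147 * length ** 3

def residual(observed_weight, length):
    """Residual = observed - predicted; a negative value means the model overpredicts."""
    return observed_weight - predict_weight(length)

# Hypothetical fish, invented for illustration (not from the rockfish data).
print(predict_weight(10))
print(residual(18.0, 10))
```

A negative residual here matches the pattern described for smaller fish: the predicted weight is slightly higher than the observed weight.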
Fig 4.8 The hierarchy of power functions. The logarithm function corresponds to p = 0.
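One way to picture the ladder of powers is a small Python helper that re-expresses a positive value with power p and treats p = 0 as the logarithm. This is only a sketch: the function name `power_transform`, the choice of log base 10, and the sample values are assumptions for illustration.

```python
import math

def power_transform(x, p):
    """Re-express a positive value x on the ladder of powers; p = 0 means log10."""
    if x <= 0:
        raise ValueError("x must be positive")
    return math.log10(x) if p == 0 else x ** p

ladder = [2, 1, 0.5, 0, -0.5, -1]   # rungs of the ladder, from p = 2 down to p = -1
values = [1, 10, 100]               # hypothetical data values
for p in ladder:
    print(p, [round(power_transform(v, p), 3) for v in values])
```

Moving down the ladder compresses large values more and more strongly, with the logarithm sitting between p = 0.5 and p = -0.5.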
Want or need more practice with logarithms? Visit tutorial.math.lamar.edu/AllBrowers/AlgebraTrigReview/SimpLogs.asp
4.2 Relationship between Categorical Variables • Examples: sex, race, occupation • Use counts or percents of individuals in categories
4.2 Objectives • Explain what is meant by a two-way table. • Explain what is meant by marginal distributions in a two-way table. • Describe how changing counts to percents is helpful in describing relationships between categorical variables. • Explain what is meant by a conditional distribution. • Define Simpson’s paradox, and give an example of it.
Marginal Distributions Two-way table: If the row and column totals are missing, CALCULATE THEM!
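Calculating the missing totals can be sketched in a few lines of Python. The counts below are hypothetical (they are not Table 4.5), and the variable names are our own; the point is only that row totals, column totals, and the grand total give the marginal distributions.

```python
# Hypothetical two-way table of counts: rows = age group, columns = sex.
table = {
    "Under 25":    {"Female": 60, "Male": 40},
    "25 and over": {"Female": 70, "Male": 30},
}

# Row totals: the marginal counts for age group.
row_totals = {row: sum(cols.values()) for row, cols in table.items()}

# Column totals: the marginal counts for sex.
col_totals = {}
for cols in table.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

# Marginal distribution of sex, as percents of the grand total.
marginal_sex = {col: 100 * n / grand_total for col, n in col_totals.items()}

print(row_totals)
print(col_totals)
print(marginal_sex)
```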
Fig 4.20: A bar graph of the distribution of age for college students. This is one of the marginal distributions for Table 4.5. EXCEL DATA
Fig 4.21: Bar graph comparing the percent of female college students in four age groups. There are more women than men in all college age groups, but the percent of women is the highest among older students.
Fig 4.22: Computer output of the two-way table of college students by age and sex, along with each entry as a percent of its row total. The percents in each row give the conditional distribution of sex for one age group, and the percents in the “Total” row give the marginal distribution of sex for all college students.
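The row-percent calculation behind output like this can be sketched in Python. The counts are hypothetical (not the textbook's table) and the function name `conditional_distribution` is an assumption; each entry is divided by its own row total.

```python
# Hypothetical two-way table of counts: rows = age group, columns = sex.
table = {
    "Under 25":    {"Female": 60, "Male": 40},
    "25 and over": {"Female": 70, "Male": 30},
}

def conditional_distribution(row_counts):
    """Return each count in one row as a percent of that row's total."""
    total = sum(row_counts.values())
    return {category: 100 * n / total for category, n in row_counts.items()}

# Conditional distribution of sex within each age group.
conditional = {row: conditional_distribution(cols) for row, cols in table.items()}
for row, percents in conditional.items():
    print(row, percents)
```

Comparing the rows (60% versus 70% female in this made-up table) is what reveals a relationship between the two categorical variables.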
Simpson’s Paradox As is the case with quantitative variables, the effects of lurking variables can change or even reverse relationships between two categorical variables.
Example 4.15 Do medical helicopters save lives? pp 299-300
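A short Python sketch can show how the reversal in Simpson's paradox arises. The counts below are illustrative only and are not taken from these slides; the lurking variable is assumed to be the seriousness of the accident, and all names are our own.

```python
# Illustrative counts (died, survived) by transport and accident seriousness.
# These numbers are assumptions for the sketch, not data from the slides.
counts = {
    "serious":      {"helicopter": (48, 52),  "road": (60, 40)},
    "less serious": {"helicopter": (16, 84),  "road": (200, 800)},
}

def death_rate(died, survived):
    """Percent of victims who died."""
    return 100 * died / (died + survived)

# Death rates within each level of the lurking variable.
within = {
    level: {mode: death_rate(*ds) for mode, ds in modes.items()}
    for level, modes in counts.items()
}

# Death rates after combining (aggregating over) the lurking variable.
overall = {}
for mode in ("helicopter", "road"):
    died = sum(counts[level][mode][0] for level in counts)
    survived = sum(counts[level][mode][1] for level in counts)
    overall[mode] = death_rate(died, survived)

print(within)
print(overall)
```

In this sketch the helicopter has the lower death rate within each seriousness level but the higher death rate overall, because it carries a much larger share of serious cases.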
Remaining exercises to be completed and handed in include any we didn’t finish in class as well as 4.31 through 4.35. We will cover 4.3 tomorrow.