
Class 1: Sept. 9

Class 1: Sept. 9. About the instructor: Dylan Small, Assistant Professor, Department of Statistics. How I got interested in statistics. My current research.


Presentation Transcript


  1. Class 1: Sept. 9 • About instructor: Dylan Small, Assistant Professor, Department of Statistics. • How I got interested in statistics.

  2. My Current research • Using the method of “instrumental variables” to compare treatments/policies when a perfectly controlled randomized experiment cannot be done. Applications to: • Treatment of depression among the elderly in primary care practices • Food policy in developing countries • Statistical methods for panel studies, that is, studies that survey the same people repeatedly over time. • Prediction of child morbidity/mortality in Pakistan using previous height and weight measurements.

  3. Course Objectives • To learn how to use two important statistical tools to analyze data: Regression and Analysis of Variance • To get hands-on experience analyzing data and computing with data (using JMP) • To gain experience in interpreting the results of a statistical analysis and communicating the results to others

  4. Course requirements • Responsible for both material covered in the lecture and reading associated with the lecture. • Weekly homework, typically handed out on Thursday, due following Thursday at beginning of class. Late homework will be given at most half credit. • Project: Analysis of data set of interest to you using regression. Work in groups of 2-3 people. Final report, class presentation. More details in October. • Midterm: Tuesday, October 21, 3:00 pm-4:20pm • Final: Tuesday, December 21, 8:30am-10:30am

  5. Grading • Grades will be based on • 20% Homework • 30% Project • 20% Midterm • 30% Final

  6. Web site/Textbooks • Web site: http://www-stat.wharton.upenn.edu/~dsmall/stat112-f04 Can be reached by going to http://www-stat.wharton.upenn.edu, clicking on courses and clicking on Stat 112. • Textbooks: • Moore and McCabe, Introduction to the Practice of Statistics, 4th edition (Required). We will be covering Chapter 2, part of Chapter 3 and Chapters 10-13. • JMP version 5 with handbook. Highly recommended. If you do not own it, you need to sign up for a Wharton account and use it in the Wharton labs. • JMP manual for Introduction to the Practice of Statistics. Recommended.

  7. Instructor Accessibility • E-mail address: dsmall@wharton.upenn.edu • My Office hours (office: 464 Huntsman Hall): • Tuesdays and Thursdays after class, 4:30-5:30. • By appointment. I will be happy to meet with you if you send me an e-mail to arrange a time. • I encourage you to come see me at least once during the semester to chat about your background, interests, concerns about the class and future plans. • TA: Lie Wang, office hours TBA • Stat Lab: Monday-Thursday, 9-3; Friday, 11-5

  8. Class 1 • Reading: Introduction to Chapter 2, Chapter 2.1 • Topic: Relationships between variables measured on the same unit. • Unit could be an individual, a state, a company, a year, etc. • Data set: Penn Alcohol data set (pennalcohol.JMP under datasets on website). Survey given to 123 Penn undergraduates. • Alcohol use: Number of days per month on which a person drinks.

  9. Association • Two variables measured on the same unit are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable. • Two variables are positively associated when above average values of one tend to accompany above average values of the other and below-average values also tend to occur together. • Two variables are negatively associated when above-average values of one accompany below-average values of the other, and vice versa.
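The definitions of positive and negative association can be checked by counting. The following is a minimal sketch, not part of the original slides: the function name `association_direction` and the data are hypothetical, and counting units on the same side of both averages is only one simple way to read the definition.

```python
from statistics import mean


def association_direction(x, y):
    """Classify the association between two variables measured on the same units.

    A unit counts as "same side" when both values are above average or both
    are below average, and as "opposite side" when one is above and the other
    is below.
    """
    x_bar, y_bar = mean(x), mean(y)
    same_side = 0
    opposite_side = 0
    for xi, yi in zip(x, y):
        product = (xi - x_bar) * (yi - y_bar)
        if product > 0:
            same_side += 1
        elif product < 0:
            opposite_side += 1
    if same_side > opposite_side:
        return "positive"
    if opposite_side > same_side:
        return "negative"
    return "no clear direction"


# Hypothetical (made-up) data for six students.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 70, 74, 80]
hours_of_tv = [6, 5, 5, 3, 2, 1]

print(association_direction(hours_studied, exam_score))  # same-side units dominate
print(association_direction(hours_of_tv, exam_score))    # opposite-side units dominate
```

This only reports a direction. It says nothing about how strong the association is.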

  10. Strength of association • Strength of the association: A measure of how strong the positive or negative association is. Statistical associations are overall tendencies, not ironclad rules. • If there is a strong association between two variables, then knowing one helps a lot in predicting the other. But when there is a weak association, information about one variable does not help much in guessing the other.
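One way to make "knowing one helps a lot in predicting the other" concrete is to compare two guesses for Y: the overall average, and the average among units with a similar X. The sketch below is an illustration under assumptions, not the course's method: `error_reduction` is a hypothetical helper, the data are made up, and splitting X at its median is an arbitrary simplification.

```python
from statistics import mean, median


def error_reduction(x, y):
    """Fraction by which the mean absolute error of guessing y shrinks
    once we know whether x is at or below its median, or above it.

    Assumes y is not constant and that x takes values on both sides of
    its median; otherwise a division by zero or an empty group results.
    """
    overall = mean(y)
    baseline = mean(abs(yi - overall) for yi in y)

    cut = median(x)
    low_mean = mean(yi for xi, yi in zip(x, y) if xi <= cut)
    high_mean = mean(yi for xi, yi in zip(x, y) if xi > cut)
    informed = mean(
        abs(yi - (low_mean if xi <= cut else high_mean)) for xi, yi in zip(x, y)
    )
    return 1 - informed / baseline


# Hypothetical (made-up) data: the same x paired with two different responses.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_strong = [10, 12, 13, 15, 18, 19, 21, 24]  # rises steadily with x
y_weak = [15, 22, 11, 19, 14, 24, 12, 18]    # little relation to x

print(error_reduction(x, y_strong))  # large reduction: x helps a lot
print(error_reduction(x, y_weak))    # close to zero: x barely helps
```

A large reduction corresponds to a strong association and a reduction near zero to a weak one.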

  11. Association does not have to be linear or unidirectional • Example: the relationship between gas mileage (miles per gallon) and the speed at which a car is driven.
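A small numerical sketch of an association that changes direction follows. The numbers are invented for illustration and are not the data behind the slide's figure; `successive_changes` is a hypothetical helper.

```python
def successive_changes(x, y):
    """Sort the units by x and return the change in y between neighbouring units."""
    pairs = sorted(zip(x, y))
    return [y2 - y1 for (_, y1), (_, y2) in zip(pairs, pairs[1:])]


# Hypothetical (made-up) numbers: mileage first rises with speed, then falls.
speed_mph = [20, 30, 40, 50, 60, 70, 80]
miles_per_gallon = [24, 28, 30, 31, 29, 26, 22]

changes = successive_changes(speed_mph, miles_per_gallon)
print(changes)
```

In this made-up example the changes are positive at low speeds and negative at high speeds, so the two variables are clearly related even though the association is neither positive nor negative overall.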

  12. Response and Explanatory Variable • Response variable (Y) measures the outcome of a study. • Explanatory variable (X) explains or causes change in the response variable. • Y=gas mileage (miles per gallon), X=speed at which car is driven. • Response and explanatory variables in alcohol study?

  13. Scatterplots • A scatterplot shows the relationship between two quantitative variables measured on the same units. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each unit in the data appears as the point in the plot fixed by the values of both variables for that unit. • Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis of the scatterplot).
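The construction described above can be sketched without any plotting software. The following is a rough character-based illustration with a hypothetical function name and made-up data. It is not how JMP draws its plots.

```python
def text_scatterplot(x, y, width=40, height=12):
    """Return a crude character scatterplot as a string.

    x (the explanatory variable) goes on the horizontal axis and y (the
    response) on the vertical axis. Each unit becomes one "*" at the position
    fixed by its two values. Assumes neither x nor y is constant.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    lines = ["|" + "".join(r) for r in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


# Hypothetical (made-up) data for five units.
explanatory = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 6]

plot = text_scatterplot(explanatory, response)
print(plot)
```

Units that share the same rounded position are drawn as a single "*", so this sketch is only suitable for small data sets.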

  14. Scatterplots in JMP • Click Analyze, Fit Y by X. Left click the response variable (so that it is highlighted) and then left click the Y, response button (so that it appears in the Y, response box). Similarly left click the explanatory variable and then left click the X, factor button. Click OK.

  15. Examining a scatterplot • Look for the overall pattern of the data and for striking deviations from that pattern. • The overall pattern of a scatterplot can be described by the form, direction and strength of the relationship. • An important kind of deviation is an outlier: a point that falls outside the overall pattern of the relationship.

  16. Brain size and body size in 96 mammals (mammalstudy.JMP)

  17. Labeling points in JMP • To label a point in a scatterplot in JMP, put cursor in column that you want to use to name the point (species in the mammal study), then click Cols and then click Label. Then put cursor on the row you want to label, then click Rows and then click Label.

  18. Association is not causation • An association between what we call the response variable and what we call the explanatory variable does not prove that changes in the explanatory variable cause changes in the response variable. • The relationship between two variables can be strongly influenced by other variables that are lurking in the background (lurking variables)
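The effect of a lurking variable can be illustrated with invented numbers. In the sketch below, everything is hypothetical: the data, the variable names, and the helper `concordance_balance`, which scores the direction of association by comparing each unit with the two averages.

```python
from statistics import mean


def concordance_balance(x, y):
    """Share of units on the same side of both averages minus the share on
    opposite sides: +1 is consistently positive, -1 consistently negative,
    0 means no direction."""
    x_bar, y_bar = mean(x), mean(y)
    score = 0
    for xi, yi in zip(x, y):
        product = (xi - x_bar) * (yi - y_bar)
        if product > 0:
            score += 1
        elif product < 0:
            score -= 1
    return score / len(x)


# Hypothetical (made-up) daily data. The lurking variable is the weather:
# on hot days both x and y are high, on cool days both are low.
cool_x, cool_y = [10, 12, 14, 16], [3, 1, 4, 2]
hot_x, hot_y = [30, 32, 34, 36], [9, 7, 10, 8]

pooled = concordance_balance(cool_x + hot_x, cool_y + hot_y)
within_cool = concordance_balance(cool_x, cool_y)
within_hot = concordance_balance(hot_x, hot_y)

print(pooled, within_cool, within_hot)
```

In this made-up example the pooled data show a consistently positive association, yet within cool days and within hot days there is no direction at all. The apparent relationship between x and y comes entirely from the weather.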

  19. Key Points from Lecture • Association: Definition. • Scatterplots: • How to examine them. • How to make them in JMP • Association is not causation. • Next class: 2.2 (correlation), begin 2.3 (least squares regression)
