
STATS 330: Lecture 1


jacob

Presentation Transcript


  1. STATS 330: Lecture 1 Introductory Stuff 330 Lecture 1

  2. Today’s agenda:
     • Introductory Comments:
       • Housekeeping
       • Computer details
       • Plan of the course
     • Statistical Modelling: an overview
     • Our Analysis strategy
     • Goals for the course
     • Role of Graphics
     • Data cleaning

  3. Housekeeping
     Contact details… Plus much else on the course web page, www.stat.auckland.ac.nz/~lee/330/, or via Cecil.

  5. Assignments
     • There will be five assignments (20% of total grade)
     • The due dates are listed in the course summary
     • Assignment 1 is due on August 2
     • No paper will be issued in class: download assignments and data from the Web or Cecil

  6. Tutorials
     • These will cover computing details
     • Held in basement tutorial lab, 303S Extension (Rm 303S-B75)
     • Tutorial 1: 11-12 Wed
     • Tutorial 2: 10-11 Fri
     • Tutorial 3: 2-3 Fri
     • Start second week of semester

  7. Computer details
     • All analyses will be done using R
     • I require homework to be typed: use Word, and cut and paste R text and graphics into Word
     • Use a home/laptop computer or the basement lab (303S)
     • See the web page for info on downloading the R330 package
     • URL is www.stat.auckland.ac.nz/~lee/330

  8. Course Plan
     There will be 33 lectures, divided into chapters as follows:
     • Chapter 1: Introduction (1 lecture, this one!)
     • Chapter 2: Graphics (3 lectures)
     • Chapter 3: Multiple Regression Model (12 lectures)
     • Chapter 4: Factors (4 lectures)
     • Chapter 5: Models for categorical and count responses (12 lectures)
     • Revision (1 lecture)

  9. Course book
     • This covers most of the material we will discuss in the lectures
     • Has 5 chapters corresponding to the division on the previous slide
     • On-line version available on the course web site
     • Paper copies available from the Statistics Department, Commerce A

  10. Overview of Statistical Modelling
      • Statistical models summarize relationships between variables
      • Regression models focus on one variable (the response) and how its distribution can be modelled by one or more explanatory variables

  11. Example: Response is BPD
      BPD: Bi-Parietal Diameter
      For an unborn baby, BPD depends on several factors, including Gestational Age (GA).

  12. [Figure: Ultrasound image]

  13. Points to take into account:
      • Not all babies of the same age have the same BPD.
      • In general, the older the baby, the bigger the BPD.

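  14. Model relating BPD and GA
      [Figure: BPD (vertical axis) against Gestational Age (horizontal axis, marked at 30 and 40 weeks), with a straight line and bell curves]
      • Line is BPD = a + b GA
      • Each bell curve has sd 10 mm
      • Mean BPD for 30 weeks = a + b × 30
      • Mean BPD for 40 weeks = a + b × 40
      • Mean BPD for GA weeks = a + b × GA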

  15. Regression model features
      • Model specifies the distribution of the response
      • Also says how the distribution of the response is affected by the covariates (explanatory variables)
      • In this case the covariate GA determines the mean response: mean BPD = a + b GA, i.e. a straight-line relationship.
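A minimal sketch of the straight-line model above, written in Python for illustration (the course itself uses R). It simulates BPD values whose mean is a + b GA with sd 10 mm, then recovers the line by least squares. The normal distribution (suggested by the "bell curve" on the slide), the values of `A_TRUE` and `B_TRUE`, and the function names are assumptions made for this example, not taken from the lecture.

```python
import random

def simulate_bpd(ga_weeks, a, b, sd=10.0, seed=0):
    """Draw one BPD value (mm) per gestational age from Normal(a + b*GA, sd)."""
    rng = random.Random(seed)
    return [rng.gauss(a + b * ga, sd) for ga in ga_weeks]

def fit_line(x, y):
    """Ordinary least squares estimates (intercept, slope) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical parameter values, chosen only for illustration.
A_TRUE, B_TRUE = 10.0, 2.0

# Gestational ages 20..40 weeks, 100 simulated babies at each age.
ga = [20 + (i % 21) for i in range(2100)]
bpd = simulate_bpd(ga, A_TRUE, B_TRUE)

a_hat, b_hat = fit_line(ga, bpd)
print(f"estimated line: mean BPD = {a_hat:.1f} + {b_hat:.2f} * GA")
print(f"estimated mean BPD at 40 weeks: {a_hat + b_hat * 40:.1f} mm")
```

Individual simulated babies of the same age differ (the sd of 10 mm), while the fitted line estimates how the mean changes with age.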

  16. In general…
      • Response has a distribution depending on (conditional on) the explanatory variables
      • Mean of the distribution given by some function of the explanatory variables
      • Need to
        • describe the function
        • describe the variability
      • Balance accuracy with simplicity
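The general statement above can be written compactly as follows. The symbols Y (response), x_1, …, x_k (explanatory variables) and f (the mean function) are notation assumed for this sketch; the BPD line and the 10 mm standard deviation come from the earlier slides.

```latex
\[
  \mathrm{E}[\,Y \mid x_1, \dots, x_k\,] = f(x_1, \dots, x_k)
\]
\[
  \text{BPD example:}\quad
  \mathrm{E}[\,\mathrm{BPD} \mid \mathrm{GA}\,] = a + b\,\mathrm{GA},
  \qquad
  \mathrm{sd}(\mathrm{BPD} \mid \mathrm{GA}) = 10\ \text{mm}
\]
```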

  17. Our Analysis strategy:
      • Explore the data using graphics and summary statistics
      • Construct a useful model (trial and error)
      • Use the model to gain knowledge about the system under study (How big should a 40-week baby’s BPD be?)
      • Communicate findings (be able to write a report!!)

  18. Goals for 330
      • Get more practice in exploring data
      • Expand knowledge of regression
      • Get better at fitting models
      • Improve your diagnostic skills (how to recognize when a model doesn’t describe the data properly)
      • Improve your interpretation and communication skills, in a more flexible way

  19. Role of Graphics
      • Data Cleaning: Are there gross errors, outliers, special codings (e.g. 999 for missing), missing data, absolute rubbish in the data?
      • Exploratory analysis: what sort of model might be appropriate?
      • Diagnostics: having tentatively selected a model, is it any good? (Residual plots, etc.)
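As a small illustration of the data-cleaning point above, the following Python sketch (the course itself uses R) recodes a special missing-value code and prints basic summaries before and after. The readings in `raw` are made up, and the function names are hypothetical; only the code 999 comes from the slide.

```python
MISSING_CODE = 999  # special coding for "missing", as in the slide's example

def recode_missing(values, missing_code=MISSING_CODE):
    """Replace the special missing-value code with None."""
    return [None if v == missing_code else v for v in values]

def summarize(values):
    """Basic checks: number of missing values plus min and max of the rest."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "n_missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

# Made-up BPD readings in mm (hypothetical, for illustration only).
raw = [78, 85, 999, 91, 8.8, 94, 999, 88]
clean = recode_missing(raw)

print(summarize(raw))    # the 999 codes are treated as data and become the maximum
print(summarize(clean))  # 999s are counted as missing; the min of 8.8 still needs checking
```

Summaries like these complement plots: the recoding handles the special code, but a suspicious value such as 8.8 still has to be checked by hand against the source.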

  20. Exploratory analysis for BPD data
      [Figure: plot of the BPD data, with one point labelled "Outlier"]
      Linear relationship (straight line) appropriate?

  21. Diagnostic Graphics: residuals versus fitted values
      [Figure: residuals plotted against fitted values, with a smooth curve added]
      Smoothing enhances interpretation. Conclusion? Outlier stands out.
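A numeric stand-in for the residuals-versus-fitted plot, sketched in Python (the course itself uses R). It fits a straight line, computes fitted values and residuals, and flags residuals that are large relative to the residual standard deviation. The data are made up, with one deliberately wrong entry; the threshold of 2 and the function names are assumptions of this example, and no smoothing is done here.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (intercept, slope) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fitted_and_residuals(x, y):
    """Fitted values and residuals (observed minus fitted) from the fitted line."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid

def flag_large_residuals(resid, threshold=2.0):
    """Indices whose |residual| exceeds `threshold` residual standard deviations."""
    n = len(resid)
    s = (sum(r * r for r in resid) / (n - 2)) ** 0.5  # n - 2: two fitted parameters
    return [i for i, r in enumerate(resid) if abs(r) > threshold * s]

# Made-up data (hypothetical): the entry at index 4 is deliberately wrong.
ga = [25, 27, 29, 31, 33, 35, 37, 39]
bpd = [62, 66, 70, 74, 30, 82, 86, 90]

fitted, resid = fitted_and_residuals(ga, bpd)
flagged = flag_large_residuals(resid)
for i in flagged:
    print(f"check observation {i}: GA={ga[i]}, BPD={bpd[i]}, residual={resid[i]:.1f}")
```

This rule is crude: a gross error also inflates the residual standard deviation and pulls the fitted line towards itself, which is one reason a plot is usually more revealing than a fixed cut-off.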

  22. Data Cleaning
      • All real-life data sets are likely to contain errors
      • These will usually be revealed by suitable plots, such as the one on the previous slide
      • Before starting any analysis, data should always be carefully checked
      • To encourage this practice, I will occasionally introduce errors into the assignment data supplied on the web

  23. Data Cleaning (2)
      It is your responsibility to check all data supplied on the web against the assignment sheets. Failure to do so will cost marks. YOU HAVE BEEN WARNED!!!
