Statistical Data Analysis STAT221A
Dr. Judi McWhirter
Room G3.29 – third floor of G block
Office hours: by appointment
Web page for this course: http://www.stats.waikato.ac.nz/Courses
We will use this website to distribute lecture notes, assignments, and related documents and computer files.
Statistical Data Analysis - Lecture 1 - 04/03/03
Course Structure
• Data Exploration, Presentation, and Analysis (3 weeks)
• Analysis of Variance (ANOVA) and Design (3 weeks)
• Regression (3 weeks)
• Multivariate Data (3 weeks)
Textbooks
• There are no set texts for this course, but you might like to read:
  • A.J. Lee – Data analysis – an introduction using R
    • This book will be on desk copy in the library; it is currently out of print.
    • Don't use it as a reference for R: the commands it uses are extras written especially for that book.
  • Peter Dalgaard – Introductory Statistics with R
    • This book will be on desk copy in the library.
  • Moore & McCabe – Introduction to the Practice of Statistics, 3rd Ed.
    • This book is on desk copy in the library. If you took 0655.121 last year, you should already own a copy.
Computer Laboratories
• The lab is Lab 5 in R Block, room RG.12.
• Login names/user names will be posted on the R Block notice board.
• If your name is not on the board, see Harry Johnston, room RG.20.
• Get in early! Inability to get into the computer lab is not a valid excuse for late assignments.
Lab times
• A space is reserved (sometimes with demonstrators) for our class on:
  • Tuesday 11am-1pm
  • Thursday 2pm-4pm
  • Friday 2pm-4pm
Assessment
• Internal assessment
  • Computing assignments: 80%
  • One test during class time (30th April): 20%
• Exam : internal assessment ratio is 1:1
• Late assignments get zero. Medical or counsellors' certificates are the only accepted excuses.
• You must get over 40% for your coursework to get credit for it.
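A worked version of the weighting, assuming "ratio 1:1" means the exam and the internal assessment each count for half of the final mark (the 40% coursework rule is not modelled here):

```latex
\begin{aligned}
\text{Internal} &= 0.8 \times \text{Assignments} + 0.2 \times \text{Test} \\
\text{Final}    &= 0.5 \times \text{Exam} + 0.5 \times \text{Internal}
\end{aligned}
```

For hypothetical marks of Assignments = 70, Test = 60, and Exam = 50: Internal = 56 + 12 = 68, and Final = 25 + 34 = 59.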
Some aims of the course
• Learn more about how statistics can help solve real problems, including
  • how to figure out what the problem really is
  • what the statistical result means
• Learn more about communicating the results of statistical analyses to the 'client'
• Gain and improve specific skills in statistical analysis, e.g. regression, analysis of variance, and multivariate analysis
Some aims of the course
• But NOT to learn the mathematical theorems that may lie behind statistical procedures
• Note that the emphasis is on practical statistical inference.
• R will be used to perform all statistical calculations. We will also use Excel for data management.
• R – http://lib.stat.cmu.edu/R/CRAN
The statistical process
[Diagram: Population, Sample, The statistician, Population value, Sample estimate; steps labelled Sampling, Calculating, Inferring, Estimating]
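The sampling and calculating steps of the statistical process can be sketched in a few lines. This is an illustration in Python rather than R, and the population is simulated, not real course data. In a real study only the sample estimate would be available; here the population value is also computed so the two can be compared.

```python
import random
import statistics

def sample_estimate(population, n, seed=None):
    """Sampling then calculating: draw a simple random sample of size n
    (without replacement) and return its mean as an estimate of the population mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling
    return statistics.mean(sample)      # calculating the sample estimate

# Hypothetical population of 1,000 simulated values (not real course data).
gen = random.Random(1)
population = [gen.gauss(50, 10) for _ in range(1000)]

population_value = statistics.mean(population)  # unknown in a real study
estimate = sample_estimate(population, n=40, seed=2)
print(f"population value {population_value:.2f}, sample estimate {estimate:.2f}")
```

Re-running with a different `seed` gives a different estimate; that sample-to-sample variation is what the inferring step has to account for.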
Statistical inference about a population
• Population – the entire set of things we wish to make statements about.
• Sample – the set of things we have data from.
• Statistical inference – making probabilistic statements about population parameters based on sample statistics.
  • This requires that the sample was chosen randomly, by some probabilistic method.
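One concrete kind of probabilistic statement is a confidence interval. The sketch below uses a normal-approximation 95% interval (an assumption chosen for simplicity; a t-based interval is the more usual choice for small samples) and, by simulation from a hypothetical population, checks how often the interval captures the true mean. The "95%" describes the random sampling procedure, which is why the sample must be chosen by a probabilistic method.

```python
import math
import random
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% confidence interval for a population mean:
    sample mean +/- z * (sample standard deviation / sqrt(n))."""
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return m - z * se, m + z * se

def coverage(population, n, reps, seed=0):
    """Proportion of random samples of size n whose interval contains
    the population mean (what the '95%' statement is about)."""
    rng = random.Random(seed)
    mu = statistics.mean(population)
    hits = 0
    for _ in range(reps):
        lo, hi = mean_ci(rng.sample(population, n))
        if lo <= mu <= hi:
            hits += 1
    return hits / reps

# Hypothetical population of 5,000 simulated values.
gen = random.Random(1)
population = [gen.gauss(50, 10) for _ in range(5000)]
print(coverage(population, n=50, reps=2000))  # should be close to 0.95
```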
Exploration and presentation of univariate data
• A single variable (there may be multiple samples of the same variable).
  • E.g. reported rapes for 2001 in NZ
    • We could group these into regions.
  • E.g. results of the “Sex, Drugs, & Rock n Roll” class survey
    • We will look at each of the questions separately.
    • We will look at responses grouped into males and females.
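Summarising one variable separately for each group can be sketched as follows. The records and the `hours` variable are hypothetical, not taken from the class survey, and the code assumes Python 3.8+ for `statistics.quantiles`.

```python
from collections import defaultdict
import statistics

def summarise(values):
    """Five-number summary plus mean for one numeric variable (needs at least 2 values)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {"n": len(values), "min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values), "mean": statistics.mean(values)}

def summarise_by_group(records, group_key, value_key):
    """Split records by a grouping variable and summarise each group separately."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])
    return {g: summarise(v) for g, v in groups.items()}

# Hypothetical survey responses: one numeric variable, grouped by sex.
records = [
    {"sex": "F", "hours": 4}, {"sex": "F", "hours": 7},
    {"sex": "F", "hours": 10}, {"sex": "F", "hours": 12},
    {"sex": "M", "hours": 2}, {"sex": "M", "hours": 5},
    {"sex": "M", "hours": 6}, {"sex": "M", "hours": 15},
]
overall = summarise([r["hours"] for r in records])
by_group = summarise_by_group(records, "sex", "hours")
for g, s in sorted(by_group.items()):
    print(g, s["n"], s["median"], s["mean"])
```

Comparing the group summaries side by side (or as side-by-side boxplots) is usually the first step before any formal comparison of the groups.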
Statistical inference about a relationship
• We have observations on two or more variables for a number of items.
• We believe there is an underlying (linear) relationship between the variables, but we don't know its parameters.
• Assuming a random error structure on the observations, we make inferences about the parameters of the relationship.
• Whether or not the conclusions extend to the population of items depends on how the items we measured were selected.
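For a straight-line relationship y = b0 + b1·x + error, the unknown parameters are usually estimated by least squares. A minimal sketch on simulated data, where the "true" intercept 2 and slope 0.5 are hypothetical values chosen so the estimates can be checked; in the course itself this would be done in R, for example with `lm(y ~ x)`.

```python
import random

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + random error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = ybar - b1 * xbar    # intercept
    return b0, b1

# Simulated data: hypothetical true relationship y = 2 + 0.5x with N(0, 1) errors.
gen = random.Random(7)
x = [gen.uniform(0, 10) for _ in range(100)]
y = [2 + 0.5 * xi + gen.gauss(0, 1) for xi in x]
b0, b1 = fit_line(x, y)
print(f"estimated intercept {b0:.2f}, slope {b1:.2f}")
```

In Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept.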
Inference on a relationship
• Observational study – which items go into the treatment group(s) and which go into the control group is not determined by randomisation.
• Randomised experiment – we use randomisation to determine which items go into the treatment groups and which go into the control group.
• We can infer that the relationship is causal only when the data come from a randomised experiment.
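Randomisation in an experiment can be as simple as shuffling the units and splitting the list. A sketch with hypothetical unit names; fixing the `seed` makes the allocation reproducible so it can be recorded and checked.

```python
import random

def randomise(units, n_treatment, seed=None):
    """Randomly split units into a treatment group of size n_treatment
    and a control group containing the rest."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the original order is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]

units = [f"plot{i}" for i in range(1, 11)]  # hypothetical experimental units
treatment, control = randomise(units, n_treatment=5, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides the groups, any systematic difference in outcome between them can be attributed to the treatment rather than to how the units were chosen.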
Sex, Drugs & Rock n Roll Data
• Census (population: students taking first-year statistics 121 or 122 at Waikato) data with errors (incorrect responses)
• The class was surveyed using the randomised response technique.
  • Each person responded to either the sensitive question or a dummy question, depending on the outcome of a roll of a die.
  • The data therefore contain incorrect responses.
  • We did not get information about any individual from the data.
  • The incorrect responses are generated by a known chance mechanism (the die), so we can still make inferences about population parameters from the data.
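The slide does not say which die outcomes led to the sensitive question or what the dummy question was, so the sketch below assumes the common "unrelated question" design: with probability p the die sends a respondent to the sensitive question, otherwise to a dummy question whose "yes" probability q is known. Then P(yes) = p·π + (1 − p)·q, which can be solved for the population proportion π. The values p = 4/6, q = 0.5, and π = 0.3 are hypothetical.

```python
import math
import random

def rr_estimate(n_yes, n, p_sensitive, q_dummy):
    """Unrelated-question randomised response estimator.
    Solves P(yes) = p*pi + (1 - p)*q for pi; returns (estimate, standard error)."""
    lam = n_yes / n                                  # observed proportion of 'yes'
    pi_hat = (lam - (1 - p_sensitive) * q_dummy) / p_sensitive
    se = math.sqrt(lam * (1 - lam) / n) / p_sensitive
    return pi_hat, se

def simulate_responses(pi_true, n, p_sensitive, q_dummy, seed=0):
    """Simulate the survey: count 'yes' answers without recording who answered what."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < pi_true
        if rng.random() < p_sensitive:       # die says: answer the sensitive question
            yes += has_trait
        else:                                # die says: answer the dummy question
            yes += rng.random() < q_dummy
    return yes

n = 100_000
n_yes = simulate_responses(0.3, n, p_sensitive=4/6, q_dummy=0.5)
pi_hat, se = rr_estimate(n_yes, n, p_sensitive=4/6, q_dummy=0.5)
print(f"estimated proportion {pi_hat:.3f} (SE {se:.3f})")
```

The price of privacy is precision: dividing by p inflates the standard error, and in small samples the estimate can even fall outside [0, 1].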
Questions to think about
• Can we generalise the conclusions from the population of students taking first-year statistics course 121 or 122 at Waikato to any wider population?
• What population do you think they could be generalised to?
• On what basis could you justify generalising them?