Introduction to Statistical Analysis
Yale Braunstein
School of Information Management & Systems
Approximate (!) Schedule
• Today
  • Data, data collection instruments (e.g., surveys)
  • Research design
  • Sample size, sources of error (maybe)
• Wednesday
  • Sample size, sources of error
  • Measures of central tendency
  • Demos of Excel & SPSS
  • Discussion of statistics assignment
• Next Wednesday
  • More on SPSS with lots of examples
  • Q & A on the assignment
Introduction
• We are focusing on “quantitative analysis”
• The general idea is to summarize and analyze data so that it is useful for decision-making
• We do this by calculating “measures of central tendency” and by looking for relationships
• (We will NOT cover formal tests of hypotheses)
• Primary vs. secondary data sources
• Data on uses (system) vs. data on users (people)
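As a small illustration of what "measures of central tendency" summarize, the sketch below computes the mean, median, and mode with Python's standard `statistics` module. The response values are invented for the example and do not come from the course data.

```python
import statistics

# Hypothetical survey responses: library visits per month (invented for illustration).
visits_per_month = [1, 2, 2, 3, 3, 3, 4, 6, 12]

mean_visits = statistics.mean(visits_per_month)      # arithmetic average
median_visits = statistics.median(visits_per_month)  # middle value of the sorted data
mode_visits = statistics.mode(visits_per_month)      # most frequent value

print(f"mean={mean_visits} median={median_visits} mode={mode_visits}")
```

The three measures need not agree, so it is usually worth reporting more than one of them.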
Data
• Data may be continuous or discrete
• Just looking at the data often does not enable one to ascertain what is actually happening
• Solution: Use appropriate descriptive statistics to summarize and present results
Another Data Analysis--Introduction
• The BIG Questions:
  • What are you trying to discover or show?
  • How will you present the results?
• From survey to report
• Flow of information
• Sample survey of California ISPs
• Brief comparison of Excel & SPSS
Data Collection Instruments
• Questionnaires & surveys
• Transaction logs
• Experimental observation
• Bills & invoices
• Census forms & reports
• Pre-packaged data sets
Interviewing & designing surveys require skill & experience. It is often useful to get professional help.
Issues in Research Design
• Case study vs. statistical sample
• What is the universe? (uses, users, etc.)
  • Current political debate over “average tax cut” vs. “tax cut for the average family”
• Is the sample representative?
  • Volumes vs. titles in the library
• Does correlation imply causality?
  • Do we need to identify the pathogen?
• Controlling for outside factors
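The "average tax cut" debate can be made concrete with a few numbers. The sketch below assumes one reading of the two phrases: the "average tax cut" is the mean over all families, while the "tax cut for the average family" is what the family in the middle of the distribution receives. The dollar figures are invented, and the example assumes that cuts rise with income.

```python
import statistics

# Hypothetical tax cuts in dollars for seven families, ordered by income (invented).
tax_cuts = [0, 100, 200, 300, 400, 500, 20000]

average_tax_cut = statistics.mean(tax_cuts)          # mean over all families
cut_for_middle_family = statistics.median(tax_cuts)  # cut received by the middle family

print(f"average tax cut: ${average_tax_cut:,.2f}")
print(f"tax cut for the middle family: ${cut_for_middle_family:,}")
```

Both figures describe the same data, so which universe and which summary are being quoted needs to be stated before the numbers can be compared.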
Sample Size
• How large a sample is needed?
• The larger the sample, the more accurate the results (unless the response rate becomes very low)
• The larger the sample, the greater the cost and effort
• Sample size does NOT depend on the size of the population
• Rules of thumb
  • 100 for 95% confidence, 5% tolerance, 90-10 expected split
  • 400 for 95% confidence, 5% tolerance, 50-50 expected split
  • 30 to 50 in each cell of an n x m table of discrete classes
• Exact formula (use with care):
  • Size = 0.25 * (certainty factor / acceptable error)^2
  • Where the certainty factor = 1.96 for 95% confidence; 2.576 for 99% confidence
  • The 0.25 is the worst case, a 50-50 expected split
[Alternate approach: hire a statistical consultant.]
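The sample size formula can be checked against the rules of thumb with a short calculation. This is a minimal sketch: the function name and the `expected_share` parameter are assumptions added for illustration. The parameter replaces the fixed 0.25 with p * (1 - p), which equals 0.25 when the expected split is 50-50.

```python
import math


def sample_size(certainty_factor, acceptable_error, expected_share=0.5):
    """Sample size from the slide's formula, generalized to other expected splits.

    expected_share is a hypothetical extra parameter: with the default 0.5,
    expected_share * (1 - expected_share) equals the slide's 0.25 constant.
    """
    spread = expected_share * (1 - expected_share)
    size = spread * (certainty_factor / acceptable_error) ** 2
    return math.ceil(size)  # round up to a whole number of respondents


n_95 = sample_size(1.96, 0.05)           # 95% confidence, 5% tolerance, 50-50 split
n_99 = sample_size(2.576, 0.05)          # 99% confidence, 5% tolerance, 50-50 split
n_90_10 = sample_size(1.96, 0.05, 0.9)   # 95% confidence, 5% tolerance, 90-10 split
print(n_95, n_99, n_90_10)
```

With these inputs the 50-50 case comes to 385, in line with the rule of thumb of 400. Under this generalization the 90-10 case comes to 139, somewhat above the round figure of 100, so the rules of thumb are best read as rough guides.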
Sources of Error
• The respondent
• The investigator
• Sampling error
• Change in the system itself
• Coding & analysis
• Other