2. Care and Feeding of Iguanas Iguana iguana
Natural sunlight
Variety of fruits and vegetables
Water
Bathing is a good habit
3. Care and Feeding of Puppies Canis lupus familiaris
Balanced diet
Exercise
Socialization
Bathing is a good habit
4. Care and Feeding of Survey Statisticians Statisticus exemplus repræsentativus
Balanced diet
Exercise
Natural sunlight
Socialization
Bathing is a good habit
5. Survey Sampling
6. Balanced Diet Mathematical and statistical nutrients at university
Sampling courses
Other aspects of training and care
7. Essence of Survey Sampling How to generalize from seen to unseen?
Quantify uncertainty about population
18th, 19th Century
Immanuel Kant
Charles Peirce
John Venn
Adolphe Quetelet
What is P(sun will rise tomorrow)?
8. 1920s and 1930s Convenience, judgment samples
Models (usually not explicitly stated)
Faith
Famous example: Literary Digest Survey
Correct winner, every election 1912-1932
Uncanny accuracy: n = 2.3 million
1936: predicted Landon with 55%
1936: Roosevelt won with 61%
9. 1940+: Probability sampling Revolutionary Idea: Inference is based on random variables for sample inclusion
Fisher, Neyman, Mahalanobis, Hansen
Robust, nonparametric approach
11. 1960s +: Predictive approach
13. HHM Volume I (1953) Sampling Principles
Biases, Nonsampling Errors
Sample Designs
SRS, Stratified, One- & Two-Stage Cluster Sampling, Stratified Multistage
Control of Variation in Cluster Size
Estimating Variances
Regression Estimates, Double Sampling, Other
Case Studies
14. HHM Volume II (1953) Fundamental Theory of Probability
Derivations for Chapters of Volume 1
Response Errors in Surveys
15. What diet are students getting? SRS of 80 university programs that offer MS or PhD in Statistics or Biostatistics
Exclude JPSM, Iowa State, UNC, UNL
Sampling frame: www.amstat.org listings
Thank you, Burcu Eke!
16. Basic syllabus Sampling Principles
Biases, Nonsampling Errors
Sample Designs
SRS, Stratified, One- & Two-Stage Cluster Sampling, Stratified Multistage
Control of Variation in Cluster Size
Estimating Variances
Regression Estimates, Double Sampling, Other
Case Studies
SRS
Stratified
Cluster
Multistage
Ratio, regression estimation
17. Beyond Basics Replication variance estimation
Nonresponse models, calibration
Regression, categorical data
Spatial sampling
Adaptive sampling
Model-based inference
18. SRS of 80 Grad Programs
19. Exercise: Analyze Survey Data Download data from fedstats.gov
Codebook, SAS code
Investigate topics of interest to students
Graph data
Multivariate analyses
Regression, logistic regression, categorical
Discuss nonsampling errors
Variance estimation
20. Exercise: Analyze Survey Data Cholesterol, obesity (NHANES)
Predicting number of friends (Add Health)
Energy-saving systems & consumption (Commercial Buildings Energy Consumption Survey)
Math scores, sex, calculator use (TIMSS)
Jackknife macros
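The jackknife macros mentioned above are not reproduced in the slides. As a rough illustration of the idea only, here is a minimal Python sketch of a delete-one jackknife variance estimate. It assumes a simple random sample with equal weights and ignores the finite population correction; a real survey jackknife for data such as NHANES would delete primary sampling units within strata and adjust the weights. The function name `jackknife_variance` and the simulated `(y, x)` data are hypothetical.

```python
import numpy as np

def jackknife_variance(data, estimator):
    """Delete-one jackknife variance of estimator(data).

    Assumes rows of `data` are independent, equally weighted observations
    (a simplification; no strata, clusters, or weights are handled here).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Recompute the estimate n times, leaving out one observation each time.
    replicates = np.array(
        [estimator(np.delete(data, i, axis=0)) for i in range(n)]
    )
    return (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)

# Hypothetical sample: column 0 is y, column 1 is an auxiliary variable x.
rng = np.random.default_rng(0)
x = rng.uniform(10, 20, size=40)
y = 2.0 * x + rng.normal(0, 1, size=40)
sample = np.column_stack([y, x])

def mean_y(d):
    return d[:, 0].mean()

def ratio_y_over_x(d):
    return d[:, 0].sum() / d[:, 1].sum()

v_mean = jackknife_variance(sample, mean_y)
v_ratio = jackknife_variance(sample, ratio_y_over_x)
print("jackknife variance of mean:", v_mean)
print("jackknife variance of ratio:", v_ratio)
```

For the sample mean, the delete-one jackknife reproduces the textbook estimate s²/n exactly, which makes a convenient check before applying it to a nonlinear statistic such as the ratio.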
21. Exercise: Design Work on all steps of a survey
Survey center helpful, not necessary
Take sample from Internet data
amazon.com
Treat large data set as population
IPUMS, baseball
Compare sampling designs
Generate nonresponse
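One way such a design exercise might look, sketched in Python: a synthetic data set is treated as the population, SRS is compared with proportionally allocated stratified sampling by repeated sampling, and nonresponse is generated with a response probability that depends on y. Everything here is an assumption for illustration: the three strata, their sizes and means, the logistic response model, and the function names are hypothetical and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: three strata with different means.
sizes = np.array([5000, 3000, 2000])
stratum_means = np.array([20.0, 50.0, 80.0])
stratum = np.repeat(np.arange(3), sizes)
y = rng.normal(stratum_means[stratum], 10.0)
N, n, reps = y.size, 200, 2000
members = [np.flatnonzero(stratum == h) for h in range(3)]
n_h = n * sizes // N  # proportional allocation: 100, 60, 40

def srs_mean(rng):
    idx = rng.choice(N, size=n, replace=False)
    return y[idx].mean()

def stratified_mean(rng):
    # Weighted sum of stratum sample means, weights N_h / N.
    est = 0.0
    for h in range(3):
        idx = rng.choice(members[h], size=n_h[h], replace=False)
        est += sizes[h] / N * y[idx].mean()
    return est

def srs_respondent_mean(rng):
    # Generated nonresponse: units with larger y are more likely to respond.
    idx = rng.choice(N, size=n, replace=False)
    p_respond = 1.0 / (1.0 + np.exp(-(y[idx] - y.mean()) / 20.0))
    responded = rng.random(n) < p_respond
    return y[idx][responded].mean()

srs = np.array([srs_mean(rng) for _ in range(reps)])
strat = np.array([stratified_mean(rng) for _ in range(reps)])
nonresp = np.array([srs_respondent_mean(rng) for _ in range(reps)])

var_srs, var_strat = srs.var(ddof=1), strat.var(ddof=1)
bias_nonresp = nonresp.mean() - y.mean()
print("variance of SRS mean:       ", var_srs)
print("variance of stratified mean:", var_strat)
print("bias of respondent mean:    ", bias_nonresp)
```

Because the stratum means differ widely in this made-up population, the stratified design should show a much smaller variance than SRS, while the unadjusted respondent mean is biased upward no matter how many replicates are run.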
22. Exercise: Inferential Framework Population N = 100
Take SRS of size 30
X1 = mean of first sample
Put them back
Take a second SRS of size 30
X2 = mean of second sample
Are X1 and X2 independent?
Model-, design-based simulations in R
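The slide refers to simulations in R; the sketch below uses Python with NumPy instead, and is only one possible reading of the exercise. It assumes normally distributed population values (mean 50, standard deviation 10), which the slide does not specify. In the design-based version the population is held fixed and only the two samples vary; in the model-based version a new population is generated in every replicate.

```python
import numpy as np

rng = np.random.default_rng(2009)
N, n, reps = 100, 30, 5000

def two_srs_means(y, rng):
    """Draw an SRS of size n, put it back, draw a second SRS; return both means."""
    x1 = y[rng.choice(N, size=n, replace=False)].mean()
    x2 = y[rng.choice(N, size=n, replace=False)].mean()
    return x1, x2

# Design-based: one fixed population; randomness comes only from sampling.
y_fixed = rng.normal(50.0, 10.0, size=N)
design = np.array([two_srs_means(y_fixed, rng) for _ in range(reps)])
corr_design = np.corrcoef(design[:, 0], design[:, 1])[0, 1]

# Model-based: the population itself is a random draw in each replicate.
model = np.array(
    [two_srs_means(rng.normal(50.0, 10.0, size=N), rng) for _ in range(reps)]
)
corr_model = np.corrcoef(model[:, 0], model[:, 1])[0, 1]

print("design-based correlation of X1 and X2:", corr_design)
print("model-based correlation of X1 and X2: ", corr_model)
```

Under the design-based framework the two sample means are independent, so their correlation should be near zero. Under this particular model the two samples share on average n²/N units of the same realized population, so the covariance is σ²/N and the correlation is n/N = 0.3. The answer to "Are X1 and X2 independent?" therefore depends on the inferential framework.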
23. Socialization Students need to work with people outside statistics
Socialize with other statisticians
Exposure to new ideas
Integrate sampling with other classes
24. Bathing Need to cleanse old, crusted concepts
What are main goals?
Would I teach this material if starting over?
Do students really need to work out small samples by hand?
Want data-centric training
Problem solvers
25. Sunlight Instead of preparing statisticians for survey problems of 1950, look at
What a survey statistician actually does
What a survey statistician might need to do in the future
26. Current Research Topics Weighting and weight smoothing / trimming
Computer-intensive variance estimation
Visualization
Multi-mode, multi-frame
Small area, disease mapping
Nonparametric, robust models for surveys
Time series / spatial methods
Record linkage, administrative data
Confidentiality
Nonresponse, calibration, imputation
27. Technology and Sampling 1940s: Errors in surveys
Depression, war: Need for data
Sampling: lower cost, fewer errors
Computing
1960s: Telephone, errors, computing
Measurement error
Model-based inference
1980s: Computing → Replication variance estimation methods, data analysis
28. 2000s: Internet Inexpensive data collection
But
Coverage problems
Nonresponse
Measurement error
Opportunity for ingenuity in sample design HHM, V1, p. 456
29. 1920s and 1930s Convenience, judgment samples
Models (usually not explicitly stated)
Faith
Literary Digest Survey
Claimed accuracy
Predicted correct winner, 1912-1932
30. 2000s ACS, other govt surveys: high quality data
Volunteer (or paid) online panel polls
Convenience, judgment samples
Models (usually not explicitly stated)
Faith
Claim accuracy because predicted correct winner in last few elections
But give margin of error
31. From pollster.com blogs September 10, 2009
Justification of convenience samples for estimating population values
Use model-based inference
See Sharon Lohr's Sampling: Design and Analysis
But what is the model, and how do you know it fits non-volunteers?
32. 2000s Coverage
Nonresponse
Measurement error
Massive amounts of data available
Networked data
Multiple sources, linking
Data fusion
33. Danger Ready availability of data
Wilkinson (2008): structural equation software
Correlational studies ?
Designed experiments ?
Designed surveys are important
Careful data collection
Inference to population
34. New uses for survey data Detecting anomalies
False discovery rates
Forecasting
Better survey design
Combining information from surveys
From data sampling to data integration
????
35. New uses for survey methods Relationships in massive data sets
SRS sometimes used, but rarely other designs
Dynamic data collection
Data dispersed on servers
Microarray data
Effectiveness of medical treatments
Value added by teachers
36. Better connections Tukey (1962), "The Future of Data Analysis"
It is, incidentally, both surprising and unfortunate that those concerned with statistical theory and statistical mathematics have had so little contact with the recent developments of sophisticated procedures of empirical sampling.
37. Better connections Efron (2007), "The Future of Statistics"
Statistics is in a period of rapid expansion and change. During such times, it pays to concentrate on basics and not tie oneself too closely to any one technology or analysis fad.
38. Training for the Future Balanced diet: mathematical and statistical background that will give flexibility
Variety of backgrounds
Parallels with 1930s
Economic
Need for more survey theory, expertise
Who foresaw probability sampling in 1920?
39. Statistics Curriculum
40. Statistics Curriculum
41. Training for the Future Still need
mathematical theory for statistics
methodology
probability and model-based sampling
But these need to be updated
Solve problems using statistical thinking
Integrate theory and practice
Emphasize data collection
42. Socialization Better integration of survey sampling with other courses
Asymptotics, probability
Computing
See stat.berkeley.edu/users/statcur
Some students should learn about:
Machine learning
Graph and social network theory
Spatial statistics, bioinformatics,
43. Statistics Curriculum
44. Species Survival Groves Senate Confirmation Hearing, May 15
Sen. Akaka: The federal government is facing major human capital challenges
45% of current Census employees will be eligible to retire next year
Bob Groves: I am terribly worried about this problem
the number of programs in the country training people that have the requisite skills for the Census Bureau is way below the need.
45. SRMS Distribution 1241 SRMS members in US
46. SRMS Members per Million People 1241 SRMS members in US
47. Morris Hansen Born Thermopolis, WY, 1910
Univ. Wyoming (Deming, Bryant)
Bachelor's degree, accounting, 1934
Why did he become a survey statistician?
48. Morris Hansen
49. Teacher: Forest R. Hall Asst prof, 1927
Depression: Regional Director of Dept of Labor
4-state Study of Consumer Purchases
50. Propagating the Species Data not the plural of anecdote
But recruitment is anecdotal, personal
Activities that allow students to experience importance, excitement of subject
Great teaching
Sampling in intro stat, graduate curriculum
Work with survey investigations
Numerical detectives (B. Joiner)
51. Adult Care Balanced diet
Exercise
Natural sunlight
Socialization
Bathing
Reproduction
Good teaching
Collateral reproduction
High pay