
The Care, Feeding, and Training of Survey Statisticians


Presentation Transcript


    2. Care and Feeding of Iguanas Iguana iguana Natural sunlight Variety of fruits and vegetables Water Bathing is a good habit

    3. Care and Feeding of Puppies Canis lupus familiaris Balanced diet Exercise Socialization Bathing is a good habit

    4. Care and Feeding of Survey Statisticians Statisticus exemplus repræsentativus Balanced diet Exercise Natural sunlight Socialization Bathing is a good habit

    5. Survey Sampling

    6. Balanced Diet Mathematical and statistical nutrients at university Sampling courses Other aspects of training and care

    7. Essence of Survey Sampling How to generalize from seen to unseen? Quantify uncertainty about population 18th, 19th Century Immanuel Kant Charles Peirce John Venn Adolphe Quetelet What is P(sun will rise tomorrow)?

    8. 1920s and 1930s Convenience, judgment samples Models (usually not explicitly stated) Faith Famous example: Literary Digest Survey Correct winner, every election 1912-1932 “Uncanny accuracy”: n = 2.3 million 1936: predicted Landon with 55% 1936: Roosevelt won with 61%

    9. 1940+: Probability sampling Revolutionary Idea: Inference is based on random variables for sample inclusion Fisher, Neyman, Mahalanobis, Hansen Robust, nonparametric approach

    11. 1960s +: Predictive approach

    13. HHM Volume I (1953) Sampling Principles Biases, Nonsampling Errors Sample Designs SRS, Stratified, One- & Two-Stage Cluster Sampling, Stratified Multistage Control of Variation in Cluster Size Estimating Variances Regression Estimates, Double Sampling, Other Case Studies

    14. HHM Volume II (1953) Fundamental Theory of Probability Derivations for Chapters of Volume 1 Response Errors in Surveys

    15. What diet are students getting? SRS of 80 university programs that offer MS or PhD in Statistics or Biostatistics Exclude JPSM, Iowa State, UNC, UNL Sampling frame: www.amstat.org listings Thank you, Burcu Eke!

    16. Basic syllabus Sampling Principles Biases, Nonsampling Errors Sample Designs SRS, Stratified, One- & Two-Stage Cluster Sampling, Stratified Multistage Control of Variation in Cluster Size Estimating Variances Regression Estimates, Double Sampling, Other Case Studies SRS Stratified Cluster Multistage Ratio, regression estimation

    17. Beyond Basics Replication variance estimation Nonresponse models, calibration Regression, categorical data Spatial sampling Adaptive sampling Model-based inference

    18. SRS of 80 Grad Programs

    19. Exercise: Analyze Survey Data Download data from fedstats.gov Codebook, SAS code Investigate topics of interest to students Graph data Multivariate analyses Regression, logistic regression, categorical Discuss nonsampling errors Variance estimation

    20. Exercise: Analyze Survey Data Cholesterol, obesity (NHANES) Predicting number of friends (Add Health) Energy-saving systems & consumption (Commercial Buildings Energy Consumption Survey) Math scores, sex, calculator use (TIMSS) Jackknife macros
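
The slide mentions jackknife macros without giving details. As a rough illustration only, here is a minimal delete-one jackknife sketch for the variance of a sample mean. It assumes a simple random sample, ignores the finite population correction, and uses made-up data; the function name `jackknife_variance_of_mean` is hypothetical and is not taken from the presentation. Real survey files such as NHANES would need the replicate structure of their complex design.

```python
import random
import statistics


def jackknife_variance_of_mean(y):
    """Delete-one jackknife variance estimate for the sample mean.

    Assumes a simple random sample and ignores the finite
    population correction (an illustrative simplification).
    """
    n = len(y)
    total = sum(y)
    # Replicate estimates: the mean with one observation left out.
    replicates = [(total - yi) / (n - 1) for yi in y]
    rep_mean = sum(replicates) / n
    # Standard delete-one jackknife formula.
    return (n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates)


# Hypothetical data standing in for a survey variable.
rng = random.Random(2009)
sample = [rng.gauss(200, 40) for _ in range(50)]

v_jk = jackknife_variance_of_mean(sample)
v_textbook = statistics.variance(sample) / len(sample)  # s^2 / n

print(f"jackknife: {v_jk:.4f}   s^2/n: {v_textbook:.4f}")
```

For the mean, the jackknife reproduces the textbook estimate s^2/n, which makes it a convenient first check before students apply replication methods to statistics with no closed-form variance.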

    21. Exercise: Design Work on all steps of a survey Survey center helpful, not necessary Take sample from Internet data amazon.com Treat large data set as population IPUMS, baseball Compare sampling designs Generate nonresponse
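
One way the "treat large data set as population" and "compare sampling designs" ideas could be tried out is sketched below. Everything here is an assumption made for illustration: the population is synthetic rather than IPUMS or baseball data, the three strata and their sizes are invented, and the stratified design uses proportional allocation.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: three strata with very different means.
strata = {
    "small": [rng.gauss(10, 5) for _ in range(500)],
    "medium": [rng.gauss(50, 5) for _ in range(300)],
    "large": [rng.gauss(100, 5) for _ in range(200)],
}
population = [y for units in strata.values() for y in units]
N = len(population)
n = 100  # total sample size for both designs


def srs_mean():
    """Estimate the population mean from an SRS without replacement."""
    return statistics.fmean(rng.sample(population, n))


def stratified_mean():
    """Estimate the mean from a stratified SRS, proportional allocation."""
    estimate = 0.0
    for units in strata.values():
        n_h = round(n * len(units) / N)  # 50, 30, 20 for these sizes
        estimate += len(units) / N * statistics.fmean(rng.sample(units, n_h))
    return estimate


# Repeat each design many times and compare the spread of the estimates.
reps = 2000
srs_estimates = [srs_mean() for _ in range(reps)]
strat_estimates = [stratified_mean() for _ in range(reps)]

var_srs = statistics.variance(srs_estimates)
var_strat = statistics.variance(strat_estimates)
print(f"true mean:            {statistics.fmean(population):.2f}")
print(f"variance, SRS:        {var_srs:.3f}")
print(f"variance, stratified: {var_strat:.3f}")
```

Because the strata differ so much in their means, the stratified estimator varies far less than the SRS estimator here. Students can change the strata to see the gain shrink when strata are similar.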

    22. Exercise: Inferential Framework Population N = 100 Take SRS of size 30 X1 = mean of first sample Put them back Take a second SRS of size 30 X2 = mean of second sample Are X1 and X2 independent? Model-, design-based simulations in R
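
The slide suggests simulations in R; the sketch below uses Python instead and is only one possible reading of the exercise. It assumes that "design-based" means the population of N = 100 values stays fixed across replications, and that "model-based" means a fresh population is generated from a standard normal model each time. The names `two_sample_means` and `new_population` are hypothetical.

```python
import random
import statistics

N, n, reps = 100, 30, 4000
rng = random.Random(22)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def new_population():
    """Generate a population from the assumed model (iid standard normal)."""
    return [rng.gauss(0, 1) for _ in range(N)]


def two_sample_means(pop):
    """Draw two separate SRSs of size n from pop and return (X1, X2)."""
    x1 = statistics.fmean(rng.sample(pop, n))
    x2 = statistics.fmean(rng.sample(pop, n))  # units were "put back"
    return x1, x2


# Design-based: one fixed population, randomness only from sampling.
fixed_pop = new_population()
design_pairs = [two_sample_means(fixed_pop) for _ in range(reps)]

# Model-based: the population itself is redrawn in every replication.
model_pairs = [two_sample_means(new_population()) for _ in range(reps)]

corr_design = pearson(*zip(*design_pairs))
corr_model = pearson(*zip(*model_pairs))
print(f"correlation of X1, X2 with fixed population:   {corr_design:.3f}")
print(f"correlation of X1, X2 with random populations: {corr_model:.3f}")
```

Under these assumptions the two means are uncorrelated when the population is held fixed, but positively correlated when the population is random, because both samples share the same realized population mean. In this setup the theoretical correlation is n/N = 0.3, so the answer to "Are X1 and X2 independent?" depends on the inferential framework.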

    23. Socialization Students need to work with people outside statistics Socialize with other statisticians Exposure to new ideas Integrate sampling with other classes

    24. Bathing Need to cleanse old, crusted concepts What are main goals? Would I teach this material if starting over? Do students really need to work out small samples by hand? Want data-centric training Problem solvers

    25. Sunlight Instead of preparing statisticians for survey problems of 1950, look at What a survey statistician actually does What a survey statistician might need to do in the future

    26. Current Research Topics Weighting and weight smoothing / trimming Computer-intensive variance estimation Visualization Multi-mode, multi-frame Small area, disease mapping Nonparametric, robust models for surveys Time series / spatial methods Record linkage, administrative data Confidentiality Nonresponse, calibration, imputation

    27. Technology and Sampling 1940s: Errors in surveys Depression, war: Need for data Sampling: lower cost, fewer errors Computing 1960s: Telephone, errors, computing Measurement error Model-based inference 1980s: Computing → Replication variance estimation methods, data analysis

    28. 2000s: Internet Inexpensive data collection But Coverage problems Nonresponse Measurement error “Opportunity for ingenuity in sample design” HHM, V1, p. 456

    29. 1920s and 1930s Convenience, judgment samples Models (usually not explicitly stated) Faith Literary Digest Survey Claimed accuracy Predicted correct winner, 1912-1932

    30. 2000s ACS, other govt surveys: high quality data Volunteer (or paid) online panel polls Convenience, judgment samples Models (usually not explicitly stated) Faith Claim accuracy because predicted correct winner in last few elections But give margin of error

    31. From pollster.com blogs September 10, 2009 Justification of convenience samples for estimating population values Use model-based inference “See Sharon Lohr’s Sampling: Design and Analysis” But what is the model, and how do you know it fits non-volunteers?

    32. 2000s Coverage Nonresponse Measurement error Massive amounts of data available Networked data Multiple sources, linking Data fusion

    33. Danger Ready availability of data Wilkinson (2008): structural equation software Correlational studies ? Designed experiments ? Designed surveys are important Careful data collection Inference to population

    34. New uses for survey data Detecting anomalies False discovery rates Forecasting Better survey design Combining information from surveys From data sampling to data integration ????

    35. New uses for survey methods Relationships in massive data sets SRS sometimes used, but rarely other designs Dynamic data collection Data dispersed on servers Microarray data Effectiveness of medical treatments “Value added” by teachers

    36. Better connections Tukey (1962) The Future of Data Analysis “It is, incidentally, both surprising and unfortunate that those concerned with statistical theory and statistical mathematics have had so little contact with the recent developments of sophisticated procedures of empirical sampling.”

    37. Better connections Efron (2007) The Future of Statistics “Statistics is in a period of rapid expansion and change. During such times, it pays to concentrate on basics and not tie oneself too closely to any one technology or analysis fad.”

    38. Training for the Future Balanced diet: mathematical and statistical background that will give flexibility Variety of backgrounds Parallels with 1930s Economic Need for more survey theory, expertise Who foresaw probability sampling in 1920?

    39. Statistics Curriculum

    40. Statistics Curriculum

    41. Training for the Future Still need mathematical theory for statistics methodology probability and model-based sampling But these need to be updated Solve problems using statistical thinking Integrate theory and practice Emphasize data collection

    42. Socialization Better integration of survey sampling with other courses Asymptotics, probability Computing See stat.berkeley.edu/users/statcur Some students should learn about: Machine learning Graph and social network theory Spatial statistics, bioinformatics, …

    43. Statistics Curriculum

    44. Species Survival Groves Senate Confirmation Hearing, May 15 Sen. Akaka: The federal government is facing major human capital challenges … 45% of current Census employees will be eligible to retire next year…. Bob Groves: I am terribly worried about this problem … the number of programs in the country training people that have the requisite skills for the Census Bureau is way below the need.

    45. SRMS Distribution 1241 SRMS members in US

    46. SRMS Members per Million People 1241 SRMS members in US

    47. Morris Hansen Born Thermopolis, WY, 1910 Univ. Wyoming (Deming, Bryant) Bachelor’s degree, accounting, 1934 Why did he become a survey statistician?

    48. Morris Hansen

    49. Teacher: Forest R. Hall Asst prof, 1927 Depression: Regional Director of Dept of Labor 4-state Study of Consumer Purchases

    50. Propagating the Species Data not the plural of anecdote But recruitment is anecdotal, personal Activities that allow students to experience importance, excitement of subject Great teaching Sampling in intro stat, graduate curriculum Work with survey investigations Numerical detectives (B. Joiner)

    51. Adult Care Balanced diet Exercise Natural sunlight Socialization Bathing Reproduction Good teaching Collateral reproduction High pay
