
The obsession with weight in the modelling world


arvin




Presentation Transcript


  1. The obsession with weight in the modelling world, and its ancillary effects on analysis

  2. The basics • The basic idea of sampling • The reason behind complicating a good idea • The implications when modelling data

  3. How Sampling Works. Imagine a well-known picture. Since a picture is made up of points of colour (pixels), we will sample the points of colour at different rates. [Image panels: 1% random (systematic), 10% random, 5% random, 3% random, 2.5% stratified.] Now let’s assume that we had some idea about the picture we wanted to see, and we decide to stratify the sample. In this case we sample different areas of the picture at different rates: the background, the dress, the face, the hands, etc.

  4. How Sampling Works. [Figure: the picture sampled at 1%, 3%, 5% and 10% (random) and at 2.5% (stratified).]
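To make the picture analogy concrete, here is a minimal Python sketch, assuming a toy 100×100 "picture" in which a hypothetical 20×20 block of pixels plays the face and everything else is background. The function name `sample_pixels` and the rates are illustrative, not taken from the presentation. Sampling the face at 20% and the background at 1% puts every part in the sample, but not in the right proportions: the face is 4% of the picture yet close to half of the sample.

```python
import random

def sample_pixels(regions, rates, seed=0):
    """Sample pixels from each region (stratum) at its own rate.

    regions: dict mapping region name -> list of (row, col) pixels
    rates:   dict mapping region name -> sampling fraction in (0, 1]
    Returns a dict mapping region name -> list of sampled pixels.
    """
    rng = random.Random(seed)
    sample = {}
    for name, pixels in regions.items():
        # Round to a whole number of pixels, keeping at least one per region.
        n_h = max(1, round(rates[name] * len(pixels)))
        # Draw without replacement: equal chance for every pixel in the region.
        sample[name] = rng.sample(pixels, n_h)
    return sample

# Hypothetical layout: a 20x20 "face" inside a 100x100 picture.
face = [(r, c) for r in range(40, 60) for c in range(40, 60)]
face_set = set(face)
background = [(r, c) for r in range(100) for c in range(100) if (r, c) not in face_set]
regions = {"face": face, "background": background}
rates = {"face": 0.20, "background": 0.01}  # oversample the detail we care about

sample = sample_pixels(regions, rates)
n_total = sum(len(s) for s in sample.values())
for name in regions:
    pop_share = len(regions[name]) / 10_000
    sample_share = len(sample[name]) / n_total
    print(f"{name}: {pop_share:.1%} of the picture, {sample_share:.1%} of the sample")
```

Within each region every pixel has the same chance of selection; that known, equal chance within a stratum is what later lets design weights undo the distortion.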

  5. How does this affect modelling or analysis? • The sample is no longer simply random • We purposefully biased the sample to gain efficiencies and meet other goals • This bias is corrected when we apply the design weights.
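A minimal sketch of that correction, assuming a hypothetical population of two strata (A: 9,000 units, each with value 10; B: 1,000 units, each with value 50), where B is oversampled at 10% against 1% for A. The unweighted sample mean is pulled towards the oversampled stratum (about 31); weighting each unit by its design weight Nh/nh recovers the population mean of 14.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical strata: population sizes, values and sample sizes.
N = {"A": 9_000, "B": 1_000}
value = {"A": 10.0, "B": 50.0}
n = {"A": 90, "B": 100}  # stratum B deliberately oversampled

values, weights = [], []
for h in N:
    values += [value[h]] * n[h]
    weights += [N[h] / n[h]] * n[h]  # design weight N_h / n_h for each unit

true_mean = sum(N[h] * value[h] for h in N) / sum(N.values())
unweighted = sum(values) / len(values)      # biased towards stratum B
weighted = weighted_mean(values, weights)   # bias corrected by the weights
print(true_mean, unweighted, weighted)
```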

  6. Framework • Each part (stratum) can actually be treated as a survey with a simpler design. • If you were to analyse each stratum separately, there would still be some difficulty with the correction for non-response and the final calibration (post-stratification). • The sampling frame or design allows you to keep all these parts together in a cohesive way for analysis.

  7. How to interpret sampling • The way we sample is reflected, and corrected, by how we weight the data in the end. • If you looked only at the parts we sampled, you wouldn’t get an accurate picture. • All the parts would be there, but not in the right proportions. • The design weights compensate for the known distortions. • The final weights also include estimated distortions.

  8. On what would you base the fundamental multivariate relationships in your model or analysis?

  9. Steps to calculate the weights – Basic overview • At the survey design stage, some factors are used to determine the sample size required • Probability of selection calculated • First series of adjustments for non-response • Post-stratification

  10. Factors to determine the sample size • Characteristics to be estimated (small proportions) • Required precision of the estimates (targeted CV, coefficient of variation) • Variability of the data • Expected non-response rate • Size of the population
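For a small proportion, several of these factors combine in a textbook approximation. The sketch below assumes simple random sampling, where CV(p̂) = sqrt((1 − p)/(n·p)), applies a finite population correction, and inflates the result by the expected response rate. It ignores the design effect of a stratified, clustered design, so it is not the survey’s actual sample-size calculation; `required_sample_size` and its inputs are hypothetical.

```python
import math

def required_sample_size(p, target_cv, population, response_rate):
    """Approximate sample size to estimate a proportion p with a target CV.

    Assumes simple random sampling, where CV(p_hat) = sqrt((1 - p) / (n * p)).
    """
    # Size needed in an infinitely large population.
    n0 = (1 - p) / (p * target_cv ** 2)
    # Finite population correction: the same precision needs fewer units
    # when the sample is a noticeable share of the population.
    n_fpc = n0 / (1 + n0 / population)
    # Select extra units to allow for the expected non-response.
    return math.ceil(n_fpc / response_rate)

# Hypothetical inputs: 10% target CV, 100,000 units, 80% expected response.
print(required_sample_size(0.05, 0.10, 100_000, 0.80))  # 5% characteristic
print(required_sample_size(0.01, 0.10, 100_000, 0.80))  # rarer, 1% characteristic
```

Because n0 grows roughly like 1/p, halving the proportion of interest about doubles the required sample, which is why small proportions drive the sample size; a design effect above 1 would scale the result up further.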

  11. Original design weight • Once the sample is selected in each stratum, calculate the original weight: • Nh/nh, where h is the stratum (Nh: population size of stratum h; nh: sample size in stratum h) • Since the sample is selected from the LFS (Labour Force Survey), the original weight is taken from the LFS. • Adjustments for the number of available children.
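A minimal sketch of the original design weight Nh/nh, assuming hypothetical stratum sizes. It does not model the LFS-based weight or the adjustment for the number of available children; it only shows that each sampled unit stands for Nh/nh population units, so the weights within a stratum add back up to Nh.

```python
def design_weights(stratum_sizes, sample_sizes):
    """Return the original design weight N_h / n_h for each stratum h.

    stratum_sizes: dict stratum -> N_h (units in the population)
    sample_sizes:  dict stratum -> n_h (units selected)
    """
    return {h: stratum_sizes[h] / sample_sizes[h] for h in stratum_sizes}

# Hypothetical strata: a small one sampled heavily, a large one sampled lightly.
N = {"small": 5_000, "large": 1_200_000}
n = {"small": 1_000, "large": 2_000}
w = design_weights(N, n)

for h in N:
    # The n_h sampled units, each carrying weight N_h / n_h, add up to N_h.
    print(f"{h}: weight {w[h]:.0f}, weighted total {w[h] * n[h]:,.0f}")
```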

  12. Non-response adjustment • Adjustments must be made to take into account the total non-response • Characteristics of respondents vs non-respondents are analyzed: • Province, income, level of education of parents, depression scale of the PMK (person most knowledgeable), urban/rural, etc.
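The slide does not say how the adjustment factor is computed. One common approach is a weighting-class adjustment, sketched below under that assumption. Within each class (for example, groups defined by characteristics such as those listed above), respondents’ weights are scaled up so they carry the weight of the whole class, and non-respondents drop to zero. The record layout (`w`, `cls`, `resp`) and the toy data are hypothetical.

```python
from collections import defaultdict

def nonresponse_adjust(records):
    """Weighting-class non-response adjustment (one common method, assumed here).

    records: list of dicts with keys 'w' (current weight), 'cls' (adjustment
             class) and 'resp' (True if the unit responded).
    Returns adjusted weights in the same order; non-respondents get 0.
    """
    total = defaultdict(float)       # weight of everyone selected, per class
    resp_total = defaultdict(float)  # weight of respondents only, per class
    for r in records:
        total[r["cls"]] += r["w"]
        if r["resp"]:
            resp_total[r["cls"]] += r["w"]
    adjusted = []
    for r in records:
        if r["resp"]:
            # Adjustment factor = class total / respondent total (>= 1).
            adjusted.append(r["w"] * total[r["cls"]] / resp_total[r["cls"]])
        else:
            adjusted.append(0.0)
    return adjusted

# Hypothetical sample: 3 of 4 urban units respond, 1 of 2 rural units respond.
records = [
    {"w": 10.0, "cls": "urban", "resp": True},
    {"w": 10.0, "cls": "urban", "resp": True},
    {"w": 10.0, "cls": "urban", "resp": True},
    {"w": 10.0, "cls": "urban", "resp": False},
    {"w": 20.0, "cls": "rural", "resp": True},
    {"w": 20.0, "cls": "rural", "resp": False},
]
adj = nonresponse_adjust(records)
print(adj)
```

The total weight is preserved (80 before and after), but it is now carried only by respondents. A class with no respondents would lose its weight entirely, which is why such classes are usually collapsed with neighbouring ones.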

  13. Post-stratification • Adjustment factor calculated in order to post-stratify the sample to known population counts, by: • Province, age, gender
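A minimal sketch of post-stratification as a simple ratio adjustment, assuming hypothetical post-strata keyed by (province, age group, gender) and made-up population counts. In each post-stratum the factor is the known count divided by the current weighted total, so the adjusted weights reproduce the known counts exactly.

```python
from collections import defaultdict

def poststratify(weights, post_strata, known_counts):
    """Scale weights so they sum to known population counts per post-stratum.

    weights:      current weights (e.g. after the non-response adjustment)
    post_strata:  post-stratum label of each unit, e.g. (province, age, gender)
    known_counts: dict post-stratum -> known population count
    Returns (adjusted weights, dict of adjustment factors).
    """
    estimated = defaultdict(float)
    for w, g in zip(weights, post_strata):
        estimated[g] += w
    # Adjustment factor = known count / weighted estimate of that count.
    factors = {g: known_counts[g] / estimated[g] for g in estimated}
    return [w * factors[g] for w, g in zip(weights, post_strata)], factors

# Hypothetical post-strata and counts (not survey figures).
F = ("P1", "0-5", "F")
M = ("P1", "0-5", "M")
weights = [100.0, 100.0, 200.0]
groups = [F, F, M]
known = {F: 250, M: 180}
adjusted, factors = poststratify(weights, groups, known)
print(factors, adjusted)
```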

  14. Final weight • Wf = Wi × Adj1 × Adj2 • Where • Wf: final weight • Wi: initial weight • Adj1: non-response adjustment • Adj2: post-stratification adjustment
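Written out for one respondent k in design stratum h, the formula can be expanded as below. This assumes the non-response adjustment is done within weighting classes c and that post-stratification is a simple ratio to the known count Ng of post-stratum g (province × age × gender). The class-based form of Adj1 is an assumption, since the slides do not say how it is computed.

```latex
W_{f,k} = W_{i,k} \times \mathrm{Adj}_{1,k} \times \mathrm{Adj}_{2,k},
\qquad W_{i,k} = \frac{N_h}{n_h}

\mathrm{Adj}_{1,k} = \frac{\sum_{j \in c} W_{i,j}}{\sum_{j \in c,\ j\ \text{resp.}} W_{i,j}},
\qquad
\mathrm{Adj}_{2,k} = \frac{N_g}{\sum_{j \in g,\ j\ \text{resp.}} W_{i,j}\,\mathrm{Adj}_{1,j}}
```

Each factor only rescales the previous one, so the final weights of respondents in post-stratum g add up to Ng.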

  15. Link between analysis and the sample design (weight) [Diagram nodes: intelligence, grade level, child’s ability, social environment, teachers, school, materials, subject, curriculum, province.] • The proportion of kids in the sample being taught the PEI (Prince Edward Island) curriculum is much larger than in the population. • Province is a stratum.

  16. Link between analysis and the sample design • There are very few things in a child’s life that are not related to where they live: • in the city versus in a small village • in a small province versus a large one • what social/educational programs are offered • what social support and services are offered • regional cultural differences • to name a few…

  17. Weights for Cycle 4 • Cross-sectional weights • Longitudinal weights, including the converted respondents • Longitudinal weights for children introduced in Cycle 1 who responded in all cycles (NEW) • Not to mention the bootstrap weights, which are used for an entirely different purpose.

  18. Cross-sectional Weights • Available for all cycles, up to Cycle 4. • When are they used? • Cycle 4 cross-sectional weights: • to represent the population aged 0-17 in 2000-01. • … • Cycle 1 weights: • to represent the population aged 0-11 in 1994-95.

  19. Cross-sectional Weights - Cycle 4 - Warning • In Cycle 4, children with a cross-sectional weight come from 4 different cohorts (introduced in 1994, 1996, 1998 and 2000). • By 2000, the 1994 cohort has been around for 6 years: • cross-sectional representativity decreases over time because of sample erosion and population change (immigration).

  20. Cross-sectional Weights - Cycle 5 • For Cycle 5 (2002-2003), no children aged 6 and 7. • In addition, the 1994 cohort’s cross-sectional representativity has declined even further (erosion and immigration). • As a result, cross-sectional weights will be calculated only for children aged 0-5.

  21. Cross-sectional weights in a nutshell • Cross-sectional weights must be used when the analysis concerns a specific year, i.e. when you want a snapshot of the situation at a specific point in time.

  22. Longitudinal Weights • Longitudinal weights represent the population of children at the time they were brought into the survey. • Children introduced in Cycle 1: longitudinal weights represent the population of children aged 0-11 in 1994-95.

  23. Longitudinal Weights (continued) • Children introduced in Cycle 2: longitudinal weights represent the population of children aged 0-1 in 1996-97. • Children introduced in Cycle 3: longitudinal weights represent the population of children aged 0-1 in 1998-99. • Children introduced in Cycle 4: longitudinal weights represent the population of children aged 0-1 in 2000-01.

  24. When are longitudinal weights used? • When you want to track a cohort of children introduced in a particular cycle and see how they’ve developed over time.

  25. Longitudinal Weights - Cycle 4 • Something new in Cycle 4: • 2 sets of longitudinal weights: • Set 1: Weights for children who responded in their first cycle and in Cycle 4 (possible non-response in Cycle 2 or 3) • Set 2: Weights for those introduced in Cycle 1 who responded in every cycle. NEW.

  26. Longitudinal Weights - Cycle 4 • Difference between the 2 sets of longitudinal weights • To avoid cases of total non-response in Cycle 2 or 3, use the set of weights for children who responded in every cycle. • If you’re only interested in the changes between Cycle 1 and Cycle 4 directly, the longitudinal weights including converted respondents can be used.
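The rules of thumb on slides 21, 24 and 26 amount to a small decision procedure. The sketch below encodes them under that reading; `WeightSet`, `choose_weight` and its parameters are hypothetical, and the enum values are descriptive labels, not NLSCY variable names.

```python
from enum import Enum

class WeightSet(Enum):
    """Descriptive labels only; these are not NLSCY variable names."""
    CROSS_SECTIONAL = "cross-sectional weight of the cycle analysed"
    LONGITUDINAL_SET_1 = "longitudinal: first cycle and Cycle 4, incl. converted respondents"
    LONGITUDINAL_SET_2 = "longitudinal: introduced in Cycle 1, responded in every cycle"

def choose_weight(snapshot_of_one_year: bool, uses_cycle_2_or_3: bool,
                  entry_cycle: int) -> WeightSet:
    """Pick a weight set following the rules of thumb on slides 21, 24 and 26."""
    if snapshot_of_one_year:
        # A snapshot of the situation at one point in time.
        return WeightSet.CROSS_SECTIONAL
    if uses_cycle_2_or_3:
        # Tracking development through the intermediate cycles: use children
        # who responded throughout, avoiding total non-response in Cycle 2 or 3.
        if entry_cycle != 1:
            raise ValueError("Set 2 only covers children introduced in Cycle 1")
        return WeightSet.LONGITUDINAL_SET_2
    # Only comparing the first cycle with Cycle 4 directly.
    return WeightSet.LONGITUDINAL_SET_1

print(choose_weight(False, True, 1).value)
```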

  27. Examples • Following are real examples taken from the NLSCY data

  28. Weighting - Examples: average weights in Cycle 4. [Figure: Prince Edward Island; labels "7", "1-year-olds", "5-year-old".]

  29. Weighting - Examples: average weights in Cycle 4 (continued). [Figure: Ontario; labels "712", "15-year-olds", "15-year-old".]

  30. Example: Proportion of children aged 0-17, by province, Cycle 4, UNWEIGHTED • 24% of Canada’s children live in the Maritime provinces … whereas in reality...

  31. Example: Proportion of children aged 0-17, by province, Cycle 4, WEIGHTED • Whereas in reality…7.3% of children live in the Maritime provinces.

  32. The conclusion is obvious… Huge increase in births in 1993 and 1997!!!!!

      Number of children aged 0-15 by year of age, Quebec, Cycle 3, unweighted

      Age   Birth year   Sample size   Percentage
      0     1998         306           4.9%
      1     1997         1,055         16.8%
      2     1996         326           5.2%
      3     1995         449           7.1%
      4     1994         405           6.4%
      5     1993         1,627         25.8%
      6     1992         313           5.0%
      7     1991         201           3.2%
      8     1990         265           4.2%
      9     1989         170           2.7%
      10    1988         221           3.5%
      11    1987         154           2.4%
      12    1986         224           3.6%
      13    1985         167           2.7%
      14    1984         241           3.8%
      15    1983         171           2.7%
      Total              6,295

  33. So much for the pseudo baby boom...

      Number of children aged 0-15 by year of age, Quebec, Cycle 3, WEIGHTED

      Age   Birth year   Population   Percentage
      0     1998         73,254       5.2%
      1     1997         78,769       5.5%
      2     1996         84,713       6.0%
      3     1995         88,662       6.2%
      4     1994         87,895       6.2%
      5     1993         91,466       6.4%
      6     1992         95,101       6.7%
      7     1991         78,882       5.6%
      8     1990         116,752      8.2%
      9     1989         73,451       5.2%
      10    1988         107,819      7.6%
      11    1987         75,130       5.3%
      12    1986         98,202       6.9%
      13    1985         79,400       5.6%
      14    1984         100,385      7.1%
      15    1983         91,205       6.4%
      Total              1,421,086
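The two tables differ only in what is added up: the unweighted table counts sampled children, and the weighted one sums their weights. A minimal sketch with a hypothetical four-child sample (three 1-year-olds with weight 10 and one 2-year-old with weight 70) shows how an oversampled age can look like a baby boom until the weights are applied. The helper `distribution` is illustrative.

```python
from collections import defaultdict

def distribution(categories, weights=None):
    """Share of each category; every unit counts as 1 if weights is None."""
    if weights is None:
        weights = [1.0] * len(categories)
    totals = defaultdict(float)
    for c, w in zip(categories, weights):
        totals[c] += w  # sum of weights = estimated population count
    grand = sum(totals.values())
    return {c: t / grand for c, t in totals.items()}

# Hypothetical sample: age 1 is oversampled, so each 1-year-old has a small weight.
ages = [1, 1, 1, 2]
weights = [10.0, 10.0, 10.0, 70.0]
print("unweighted:", distribution(ages))
print("weighted:  ", distribution(ages, weights))
```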

  34. Conclusion • To be obsessed with weights is a good thing…where statistical analysis is concerned
