Optimal Coordination of Samples in Business Surveys. Lenka Mach, Statistics Canada; Ioana Şchiopu-Kratina, Statistics Canada; Philip T. Reiss, New York University Child Study Center; Jean-Marc Fillion, Statistics Canada. ICES III, June 2007.
OUTLINE OF THE PRESENTATION:
1. Coordinated sampling
2. Optimal sample coordination
   2.1 Transportation Problem
   2.2 Reduced Transportation Problem
   2.3 Variability of the Overlap
3. Example 1: NWCR method for negative coordination of two surveys.
4. Example 2: Reduced TP for positive coordination after re-stratification.
5. Conclusion
1. COORDINATED SAMPLING
• Needed when multiple sample surveys of overlapping populations are conducted.
• Encompasses many different techniques to control the overlap of samples (overlap = number of common units).
• Objective: higher overlap (positive coordination) or lower overlap (negative coordination) than if the samples are selected independently.
• References: Ernst (1999), ICES II (2000), etc.
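1. COORDINATED SAMPLING
First survey: S = set of all possible samples s; $p(s)$ = (marginal) probability distribution on S.
Second survey: S’ = set of all possible samples s’; $p'(s')$ = (marginal) probability distribution on S’.
Integrated surveys: joint probability distribution $p(s, s')$ on S x S’ such that $\sum_{s' \in S'} p(s, s') = p(s)$ and $\sum_{s \in S} p(s, s') = p'(s')$.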
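1. COORDINATED SAMPLING
Overlap of s and s’: o(s, s’) = number of units that s and s’ have in common.
Expected sample overlap: $E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s')$ (1)
where p(s, s’) is the joint probability of selecting the pair (s, s’), and p(s), p’(s’) are the marginal selection probabilities of the two surveys.
Surveys are positively coordinated if $E[o(s, s')]$ is larger than the expected overlap under independent selection, $\sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s)\, p'(s')$, and negatively coordinated if it is smaller.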
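2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem
We integrate two surveys so that the expected overlap is maximized (minimized):
Find the max (min) of the objective function $\sum_{i=1}^{K} \sum_{j=1}^{L} o(s_i, s'_j)\, X_{ij}$ (1)
over all unknowns $X_{ij} = p(s_i, s'_j) \ge 0$ (2)
subject to the constraints $\sum_{j=1}^{L} X_{ij} = p(s_i)$ for each i and $\sum_{i=1}^{K} X_{ij} = p'(s'_j)$ for each j, (3)
where $p(s_i)$ and $p'(s'_j)$ are the marginal selection probabilities of the first and second survey.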
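2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem
TP tableau: rows are the samples s_1, ..., s_K of the first survey, columns are the samples s’_1, ..., s’_L of the second survey. Each cell holds the overlap o(s_i, s’_j) and the unknown X_ij.
        s’_1                s’_2                s’_3                ...   s’_L
s_1     o(s_1,s’_1), X_11   o(s_1,s’_2), X_12   o(s_1,s’_3), X_13   ...   o(s_1,s’_L), X_1L
s_2     o(s_2,s’_1), X_21   o(s_2,s’_2), X_22   o(s_2,s’_3), X_23   ...   o(s_2,s’_L), X_2L
s_3     o(s_3,s’_1), X_31   o(s_3,s’_2), X_32   o(s_3,s’_3), X_33   ...   o(s_3,s’_L), X_3L
...
s_K     o(s_K,s’_1), X_K1   o(s_K,s’_2), X_K2   o(s_K,s’_3), X_K3   ...   o(s_K,s’_L), X_KL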
2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem
The TP is too large: too many variables!
Example: the first survey selects an SRSWOR of n = 20 from N = 40, so the number of possible samples s is $\binom{40}{20}$ = 137,846,528,820.
BUT, for stratified SRSWOR designs, we can reduce the TP by grouping samples!
Condition: the matrix of o(s, s’) within each group must be “symmetric”.
We use a two-stage procedure.
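The size of the full TP can be checked directly. The short sketch below recomputes the number of rows for the SRSWOR example; the second-survey figures (N’ = 41, n’ = 20) are borrowed from Example 1 as an assumption, only to show how quickly the number of unknowns grows.

```python
from math import comb

# First survey: SRSWOR of n = 20 from N = 40 (from the slide).
K = comb(40, 20)          # number of possible samples s (rows of the TP)

# Assumption for illustration: second survey as in Example 1 (N' = 41, n' = 20).
L = comb(41, 20)          # number of possible samples s' (columns of the TP)

unknowns = K * L          # one unknown X_ij per pair (s_i, s'_j)

print(f"rows K     = {K:,}")
print(f"columns L  = {L:,}")
print(f"unknowns   = {unknowns:.3e}")
```

With roughly 10^22 unknowns, solving the full TP directly is not practical, which is the motivation for grouping samples.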
2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem
Notation:
P = frame for Survey 1, P’ = frame for Survey 2, C = P ∩ P’
c = c(s) = number of units in C ∩ s
c’ = c’(s’) = number of units in C ∩ s’
Solution - Stage 1:
• Group samples s → super-rows c
• Group samples s’ → super-columns c’
• Form a matrix of blocks (c, c’), define the block optimum o(c, c’)
• Solve the reduced TP → joint probabilities p(c, c’)
Solution - Stage 2:
Distribute p(c, c’) evenly among the pairs (s, s’) that have the optimum overlap:
• each row s within the block gets the same probability
• each column s’ within the block gets the same probability
2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem Matrix of o(s, s’) within a block.
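2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem
Example 1:
Survey 1: N = 40, SRSWOR n = 20
Survey 2: N’ = 41, SRSWOR n’ = 20
C = 37 common units, D = 3 deaths, B = 4 births
c = 17, 18, 19, 20 → 4 super-rows
c’ = 16, 17, 18, 19, 20 → 5 super-columns
The reduced TP has only 4 x 5 = 20 unknowns.
Constraints: the row and column totals of p(c, c’) must equal the hypergeometric marginals $p(c) = \binom{37}{c}\binom{3}{20-c} / \binom{40}{20}$ and $p'(c') = \binom{37}{c'}\binom{4}{20-c'} / \binom{41}{20}$.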
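The super-row and super-column marginals of Example 1 can be computed as hypergeometric probabilities. The following is a minimal sketch; the function name `hypergeom_marginal` is hypothetical, and exact fractions are used so the totals can be checked without rounding.

```python
from fractions import Fraction
from math import comb

def hypergeom_marginal(n_common, n_other, n_sample):
    """Distribution of c = number of common units in an SRSWOR of n_sample
    units from a frame with n_common common and n_other non-common units."""
    total = comb(n_common + n_other, n_sample)
    lo = max(0, n_sample - n_other)
    hi = min(n_common, n_sample)
    return {c: Fraction(comb(n_common, c) * comb(n_other, n_sample - c), total)
            for c in range(lo, hi + 1)}

# Survey 1: 37 common units + 3 deaths, n = 20.
row_marginal = hypergeom_marginal(37, 3, 20)
# Survey 2: 37 common units + 4 births, n' = 20.
col_marginal = hypergeom_marginal(37, 4, 20)

for c, p in row_marginal.items():
    print(f"p(c = {c})   = {float(p):.5f}")
for cp, p in col_marginal.items():
    print(f"p'(c' = {cp}) = {float(p):.5f}")
```

The support of each distribution reproduces the 4 super-rows and 5 super-columns listed on the slide.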
2. OPTIMAL SAMPLE COORDINATION 2.3 Variability of the Overlap
• Optimal coordination maximizes (minimizes) the expected overlap E[o(s, s’)].
• In practice, only one pair of samples (s, s’) is selected, so its overlap o(s, s’) should be close to E[o(s, s’)]!
• The TP can be used in 2 steps:
  Step 1: as described on Slide 6.
  Step 2: use the optimal E[o(s, s’)] from Step 1 as an additional constraint, with a new objective function. For example, find the minimum of the variance of the overlap, $\sum_{s \in S} \sum_{s' \in S'} \{o(s, s') - E[o(s, s')]\}^2\, p(s, s')$ (4)
3. Example 1: NWCR method for negative coordination of two surveys.
Survey 1: N = 40, SRSWOR n = 20
Survey 2: N’ = 41, SRSWOR n’ = 20
D = 3, C = 37, B = 4
Minimize E[o(s, s’)].
• Stage 1: Solve the reduced TP:
  • Group samples s into super-rows and s’ into super-columns.
  • Order super-rows by ascending c and super-columns by descending c’, and form a matrix of blocks.
  • Block optimum o(c, c’) = max{0, c + c’ - C} = smallest possible overlap o(s, s’) within block (c, c’).
  • Use the NWCR (Northwest Corner Rule) algorithm to obtain a solution.
• Stage 2: Determine p(s, s’) for each pair (s, s’):
  • Distribute p(c, c’) equally among all pairs (s, s’) within the block that have o(s, s’) = o(c, c’).
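Stage 1 can be sketched in a few lines. The code below is an illustrative implementation of the northwest corner rule applied to the Example 1 marginals, not the authors' program; the function names are hypothetical, and exact fractions are used so that row and column totals are exhausted exactly.

```python
from fractions import Fraction
from math import comb

def marginal(n_common, n_other, n_sample):
    """Hypergeometric distribution of the number of common units in an SRSWOR."""
    total = comb(n_common + n_other, n_sample)
    lo, hi = max(0, n_sample - n_other), min(n_common, n_sample)
    return {c: Fraction(comb(n_common, c) * comb(n_other, n_sample - c), total)
            for c in range(lo, hi + 1)}

def northwest_corner(rows, cols):
    """rows, cols: lists of (label, probability) in the chosen order.
    Fill cells from the top-left corner, moving right or down as each
    column or row total is used up."""
    joint = {}
    i = j = 0
    row_left, col_left = rows[0][1], cols[0][1]
    while True:
        mass = min(row_left, col_left)
        joint[(rows[i][0], cols[j][0])] = mass
        row_left -= mass
        col_left -= mass
        if row_left == 0:
            i += 1
            if i == len(rows):
                break
            row_left = rows[i][1]
        if col_left == 0:
            j += 1
            if j == len(cols):
                break
            col_left = cols[j][1]
    return joint

C = 37
rows = sorted(marginal(C, 3, 20).items())                # ascending c
cols = sorted(marginal(C, 4, 20).items(), reverse=True)  # descending c'
joint = northwest_corner(rows, cols)

# Expected overlap of the reduced TP, using the block optimum max{0, c + c' - C}.
expected_overlap = sum(max(0, c + cp - C) * p for (c, cp), p in joint.items())
# Expected overlap if the two samples were selected independently.
independent = Fraction(C * 20 * 20, 40 * 41)

for (c, cp), p in joint.items():
    print(f"block ({c}, {cp}): p = {float(p):.5f}, o = {max(0, c + cp - C)}")
print("expected overlap (NWCR):       ", float(expected_overlap))
print("expected overlap (independent):", float(independent))
```

Ordering the super-rows by ascending c and the super-columns by descending c’ pairs samples holding few common units with samples holding many, which is what keeps c + c’ close to C along the staircase of filled blocks.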
3. Example 1: NWCR method for negative coordination of two surveys. Table 1a: Reduced TP; each cell gives o(c, c’) and the p(c, c’) assigned by NWCR.
3. Example 1: NWCR method for negative coordination of two surveys.
• Stage 2: Distribution of probabilities within blocks
• Consider block (c = 17, c’ = 20) with o(c, c’) = 0:
  • there are $\binom{37}{17}\binom{3}{3}$ = 15,905,368,710 different samples (rows) s
  • there are $\binom{37}{20}\binom{4}{0}$ = 15,905,368,710 different samples (columns) s’
• The matrix of overlaps o(s, s’) is symmetric:
  • For each sample s, there is exactly one sample s’ such that o(s, s’) = 0.
  • For each sample s’, there is exactly one sample s such that o(s, s’) = 0.
• Within this block, each sample s gets probability p(17, 20) / 15,905,368,710.
• Within this block, each sample s’ gets probability p(17, 20) / 15,905,368,710.
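The block counts and the one-to-one pairing can be checked numerically. The sketch below recomputes the two counts for block (17, 20) and then verifies the pairing by brute force on a toy block (5 common units, c = 2, c’ = 3); the toy sizes are an assumption chosen only so that enumeration is feasible.

```python
from itertools import combinations
from math import comb

# Block (c = 17, c' = 20) of Example 1: C = 37, D = 3, B = 4, n = n' = 20.
rows_in_block = comb(37, 17) * comb(3, 3)   # s: 17 common units + all 3 deaths
cols_in_block = comb(37, 20) * comb(4, 0)   # s': 20 common units + no births
print(f"{rows_in_block:,} rows, {cols_in_block:,} columns")

# Toy block (assumed sizes): 5 common units, c = 2, c' = 3, so c + c' = C
# and the block optimum is max{0, 2 + 3 - 5} = 0.
common = range(5)
toy_rows = [set(s) for s in combinations(common, 2)]
toy_cols = [set(s) for s in combinations(common, 3)]

# Number of zero-overlap partners for every row and for every column.
partners_per_row = [sum(1 for sp in toy_cols if not (s & sp)) for s in toy_rows]
partners_per_col = [sum(1 for s in toy_rows if not (s & sp)) for sp in toy_cols]
print(partners_per_row, partners_per_col)
```

In the toy block every row and every column has exactly one zero-overlap partner (the complement within the common units), mirroring the statement for block (17, 20).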
3. Example 1: NWCR method for negative coordination of two surveys.
Theorem:
(a) The joint density X_NWCR obtained by the NWCR method for negative coordination satisfies the constraints given in (3).
(b) X_NWCR has the minimum expected overlap within the set of joint densities that satisfy (3).
(c) X_NWCR has the minimum variance within this set of joint densities.
Proof in Mach, Reiss, Şchiopu-Kratina (2006).
3. Example 1: NWCR method for negative coordination of two surveys.
• Simultaneous Selection
  i) Select one block using the joint probabilities p(c, c’) in Table 1a.
  ii) To draw samples s and s’, randomly select units from each set: C = common units, D = deaths, B = births.
  Suppose block (19, 18) is selected in i).
  • To select s, randomly select 19 units from the 37 in C, and 1 unit from the 3 in D.
  • To select s’, take the remaining 37 - 19 = 18 units from C, and randomly select 2 units from the 4 in B.
• Sequential Selection (s drawn first)
  i) Select one block from the super-row c(s) using the conditional probabilities p{(c, c’) | c(s)} corresponding to the joint probabilities in Table 1b.
  ii) Randomly select units from the C and B sets to form s’.
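Step ii) of the simultaneous selection can be sketched as follows for a block with the minimum overlap max{0, c + c’ - C}. The unit labels, the function name `draw_pair`, and the seed are hypothetical; the block (19, 18) is the one used on the slide.

```python
import random

def draw_pair(common, deaths, births, c, c_prime, n, n_prime, rng):
    """Draw (s, s') within block (c, c') so that the overlap equals the
    block optimum max{0, c + c' - |common|} (negative coordination)."""
    overlap = max(0, c + c_prime - len(common))
    s_common = rng.sample(common, c)                 # common units of s
    s = s_common + rng.sample(deaths, n - c)         # complete s with deaths
    chosen = set(s_common)
    unused = [u for u in common if u not in chosen]  # common units not in s
    # s' reuses only as many units of s as the block optimum forces.
    sp_common = rng.sample(s_common, overlap) + rng.sample(unused, c_prime - overlap)
    s_prime = sp_common + rng.sample(births, n_prime - c_prime)
    return s, s_prime

# Hypothetical unit labels for the three sets of Example 1.
common = [f"C{i}" for i in range(37)]
deaths = [f"D{i}" for i in range(3)]
births = [f"B{i}" for i in range(4)]

rng = random.Random(2007)
s, s_prime = draw_pair(common, deaths, births, c=19, c_prime=18,
                       n=20, n_prime=20, rng=rng)
print(len(s), len(s_prime), len(set(s) & set(s_prime)))
```

For block (19, 18) the second sample takes all 18 common units left over by the first one, so the two samples share no unit.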
3. Example 1: NWCR method for negative coordination of two surveys. [Figure: samples s (n = 20, c = 19) and s’ (n’ = 20, c’ = 18) drawn from Common Units (C = 37), Deaths (D = 3) and Births (B = 4), with o(s, s’) = 0.]
3. Example 1: NWCR method for negative coordination of two surveys. Table 1b: Empirical block probabilities for Sequential SRSWOR (PRN). Table 1c: Expectations.
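4. Example 2: Reduced TP for positive coordination after re-stratification.
Old stratum 1: N1 = 20, n1 = 10; C1 = 2 units in common with the new stratum
Old stratum 2: N2 = 6, n2 = 3; C2 = 3 units in common with the new stratum
Old stratum 3: N3 = 10, n3 = 2; C3 = 10 units in common with the new stratum
New stratum: N’ = 15, n’ = 5
Objective: Maximize E[o(s, s’)].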
4. Example 2: Reduced TP for positive coordination after re-stratification.
Super-rows: c = (c1, c2, c3) with c1 = 0, 1, 2; c2 = 0, 1, 2, 3; c3 = 2 → 3 x 4 x 1 = 12 super-rows
Super-columns: c’ = (c1’, c2’, c3’) with c1’ + c2’ + c3’ = 5: (0, 0, 5), (0, 1, 4), (0, 2, 3), (0, 3, 2), (1, 0, 4), (1, 1, 3), (1, 2, 2), (1, 3, 1), (2, 0, 3), (2, 1, 2), (2, 2, 1), (2, 3, 0) → 12 super-columns
The reduced TP has 12 x 12 = 144 unknowns.
Constraints: the row totals p(c) are products of hypergeometric probabilities; the column totals p’(c’) are multihypergeometric probabilities.
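The super-rows, super-columns and their marginal probabilities can be enumerated from the stratum sizes. The sketch below assumes that the new stratum consists exactly of the 2 + 3 + 10 = 15 common units; variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

# (N_h, C_h, n_h) for the three old strata; new stratum: N' = 15, n' = 5.
old_strata = [(20, 2, 10), (6, 3, 3), (10, 10, 2)]
N_new, n_new = 15, 5
common_sizes = [C_h for _, C_h, _ in old_strata]

def hypergeom(N, C_h, n, c):
    """P(c common units) in an SRSWOR of n from N units, C_h of them common."""
    return Fraction(comb(C_h, c) * comb(N - C_h, n - c), comb(N, n))

# Super-rows: independent SRSWOR in each old stratum -> product of hypergeometrics.
ranges = [range(max(0, n - (N - C_h)), min(C_h, n) + 1) for N, C_h, n in old_strata]
super_rows = {}
for c in product(*ranges):
    p = Fraction(1)
    for (N, C_h, n), c_h in zip(old_strata, c):
        p *= hypergeom(N, C_h, n, c_h)
    super_rows[c] = p

# Super-columns: one SRSWOR of 5 from 15 -> multivariate hypergeometric.
super_cols = {}
for cp in product(*(range(C_h + 1) for C_h in common_sizes)):
    if sum(cp) == n_new:
        ways = 1
        for C_h, c_h in zip(common_sizes, cp):
            ways *= comb(C_h, c_h)
        super_cols[cp] = Fraction(ways, comb(N_new, n_new))

print(len(super_rows), "super-rows,", len(super_cols), "super-columns")
print("p(c = (2, 3, 2)) =", round(float(super_rows[(2, 3, 2)]), 5))
```

The value printed for c = (2, 3, 2) can be compared with the probability quoted for that super-row in the sequential-selection slide.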
4. Example 2: Reduced TP for positive coordination after re-stratification. Table 2a: Block overlap and probabilities p(c, c’) (TP solution). o(c, c’) = min(c1, c1’) + min(c2, c2’) + min(c3, c3’). E_TP[o(s, s’)] = 3.6494, V_TP[o(s, s’)] = 0.7292.
4. Example 2: Reduced TP for positive coordination after re-stratification.
Sequential selection: suppose c = (2,3,2), with p(c) = 0.01184.
Table 2b: Probabilities for c = (2,3,2)
E_TP{o | c = (2,3,2)} = 5, V_TP{o | c = (2,3,2)} = 0
i) Select super-column c’ using p{c’ | c = (2,3,2)}. Suppose c’ = (2,1,2) is selected.
ii) Randomly de-select 2 units from s ∩ C2 to form s’.
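The second step of the sequential selection can be sketched as below: within each set of common units, s’ keeps as many units of s as the selected c’ allows, de-selecting the surplus or adding new units when needed. This is an illustrative reading of the slide, with hypothetical unit labels and a hypothetical function name `form_s_prime`.

```python
import random

def form_s_prime(s_common, frames, c_prime, rng):
    """s_common[h]: units of s in common set C_h; frames[h]: all units of C_h;
    c_prime[h]: number of units s' must take from C_h.
    Returns s' with overlap sum_h min(c_h, c'_h) with s."""
    s_prime = []
    for kept, frame, target in zip(s_common, frames, c_prime):
        if target <= len(kept):
            # De-select the surplus: keep a random subset of the units of s.
            s_prime += rng.sample(kept, target)
        else:
            # Keep all units of s and add randomly chosen new ones.
            extra = [u for u in frame if u not in kept]
            s_prime += kept + rng.sample(extra, target - len(kept))
    return s_prime

# Hypothetical labels for C1 (2 units), C2 (3 units), C3 (10 units).
frames = [[f"a{i}" for i in range(2)],
          [f"b{i}" for i in range(3)],
          [f"c{i}" for i in range(10)]]

rng = random.Random(7)
# A sample s with c = (2, 3, 2): both units of C1, all of C2, two units of C3.
s_common = [list(frames[0]), list(frames[1]), rng.sample(frames[2], 2)]
s_prime = form_s_prime(s_common, frames, c_prime=(2, 1, 2), rng=rng)

s_units = {u for part in s_common for u in part}
print(sorted(s_prime), "overlap =", len(s_units & set(s_prime)))
```

With c = (2,3,2) and c’ = (2,1,2), only two units of s ∩ C2 are dropped, so all five units of s’ are also in s.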
4. Example 2: Reduced TP for positive coordination after re-stratification.
• Is the matrix of overlaps o(s, s’) within a block symmetric?
• Consider block {c = (2,3,2), c’ = (2,1,2)} with o(c, c’) = 5:
  • $\binom{2}{2}\binom{18}{8} \times \binom{3}{3}\binom{3}{0} \times \binom{10}{2}$ = 43,758 x 1 x 45 different samples (rows) s
  • $\binom{2}{2} \times \binom{3}{1} \times \binom{10}{2}$ = 1 x 3 x 45 different samples (columns) s’
• For each s, there are exactly 3 samples s’ such that o(s, s’) = 5.
• For each s’, there are exactly 43,758 samples s such that o(s, s’) = 5.
• Within this block, each s’ gets probability p(c, c’) / (3 x 45) = p(c, c’) / 135.
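The counts for this block follow from the stratum sizes of Example 2 and can be recomputed as a consistency check. The sketch below is only arithmetic; the variable names are hypothetical.

```python
from math import comb

c, c_prime = (2, 3, 2), (2, 1, 2)

# Block optimum for positive coordination: sum of component-wise minima.
block_optimum = sum(min(a, b) for a, b in zip(c, c_prime))

# Rows s: stratum 1 (2 of 2 common + 8 of 18 others), stratum 2 (3 of 3 common),
# stratum 3 (2 of 10, all common).
rows_in_block = comb(2, 2) * comb(18, 8) * comb(3, 3) * comb(3, 0) * comb(10, 2)
# Columns s': 2 of C1, 1 of C2, 2 of C3.
cols_in_block = comb(2, 2) * comb(3, 1) * comb(10, 2)

# Partners with overlap 5: s' must reuse the C1 and C3 units of s and pick 1 of
# the 3 C2 units; s must contain s' and is free only in the 8 non-common units.
partners_per_row = comb(3, 1)
partners_per_col = comb(18, 8)

# Both ways of counting the optimal pairs (s, s') must agree.
optimal_pairs = rows_in_block * partners_per_row
print(block_optimum, rows_in_block, cols_in_block, optimal_pairs)
```

Every row has the same number of optimal partners, and so does every column, which is the sense in which the block is treated as “symmetric”.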
4. Example 2: Reduced TP for positive coordination after re-stratification. Table 2c: Matrix of o(s, s’); block {c = (2,3,2), c’ = (2,1,2)}. [Table: 43,758 rows per group of samples s; columns s’ in groups of 28 and 16.]
4. Example 2: Reduced TP for positive coordination after re-stratification. Table 2d: Empirical block probabilities for Sequential SRSWOR (PRN). Table 2e: Expectations.
5. CONCLUSION
• Optimal sample coordination is a TP.
• For stratified SRSWOR, we can reduce the TP by grouping samples. The groups must be formed so that the matrix of o(s, s’) within each group is symmetric.
• The solution and the selection are done in two stages.
• Different objective functions can be defined, depending on the goal of the sample coordination project.
Optimal Coordination of Samples in Business Surveys. Lenka Mach. E-mail/Courriel: Lenka.Mach@statcan.ca
REFERENCES
Ernst, L.R. (1999), “The Maximization and Minimization of Sample Overlap Problems: A Half Century of Results,” Bulletin of the International Statistical Institute, Proceedings, Tome LVIII, Book 2, pp. 293-296.
Mach, L., Reiss, P.T., and Şchiopu-Kratina, I. (2006), “Optimizing the Expected Overlap of Survey Samples via the Northwest Corner Rule,” Journal of the American Statistical Association, Vol. 101, No. 476, Theory and Methods, pp. 1671-1679.
McKenzie, B. and Gross, B. (2000), “Synchronized Sampling,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 237-243.
Ohlsson, E. (2000), “Coordination of PPS Samples Over Time,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 255-264.
Royce, D. (2000), “Issues in Coordinated Sampling at Statistics Canada,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 245-254.