Tracing a Single User Joint work with Noga Alon
Group Testing Dorfman raised the following problem in 1941: • All American inductees gave blood samples, which were tested for the presence of a syphilitic antigen. • We assume that the number r of infected blood samples is much smaller than the total number m of samples. • Testing each sample separately requires m tests.
Group Testing (cont.) • Instead, one can test pools that contain blood from a set of samples. • If the outcome is negative, none of the samples in the pool is infected. • Otherwise, the pool contains at least one infected sample, which can be found by further tests. • When r is much smaller than m, this requires fewer than m tests.
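The saving can be illustrated with a minimal sketch of the simplest pooling scheme: test each pool once, then retest every sample of a positive pool individually. The function name `two_stage_tests`, the pool size, and the values of m and r are assumptions chosen for illustration; the slides do not fix a particular scheme.

```python
import random

def two_stage_tests(infected, pool_size):
    """Count the tests used by two-stage pooling on a list of booleans."""
    tests = 0
    for start in range(0, len(infected), pool_size):
        pool = infected[start:start + pool_size]
        tests += 1                       # one test for the whole pool
        if any(pool) and len(pool) > 1:
            tests += len(pool)           # retest each sample of a positive pool
    return tests

random.seed(0)
m, r = 1000, 10                          # illustrative sizes, not from the talk
samples = [True] * r + [False] * (m - r)
random.shuffle(samples)
print(two_stage_tests(samples, pool_size=10), "tests instead of", m)
```

With these sizes there are 100 pools and at most 10 of them are positive, so the count never exceeds 200.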
Molecular Biology • In recent years this problem has gained popularity again in the field of molecular biology. • For example, we may be given a large set of DNA sequences and want to find all those that contain a specific short subsequence. • We can use a method similar to that of the blood testing problem.
Molecular Biology (cont.) • In some applications, we are interested in finding one sequence that contains the short subsequence, rather than all of them.
Parallelization • Often, we would prefer to conduct all experiments simultaneously, even at the cost of increasing the number of experiments. • Thus, we need our tests to be non-adaptive, i.e. the pool tested in each experiment is independent of the outcomes of other experiments.
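r-SUT Definition Definition: Let $\mathcal{F}$ be a family of subsets of [n] = {1,…,n}. $\mathcal{F}$ is called r-single-user-tracing superimposed (r-SUT) if for all $\mathcal{F}_1,\dots,\mathcal{F}_k \subseteq \mathcal{F}$ with $1 \le |\mathcal{F}_i| \le r$, $\bigcup_{A \in \mathcal{F}_1} A = \dots = \bigcup_{A \in \mathcal{F}_k} A \implies \bigcap_{i=1}^{k} \mathcal{F}_i \neq \emptyset$. In other words, given the union of up to r sets from $\mathcal{F}$, one can identify at least one of those sets.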
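For very small families the definition can be checked by brute force. The sketch below is an illustration only, not a method from the talk: it groups all subfamilies of size 1 to r by their union and requires that the subfamilies sharing a union have a common member. The name `is_r_sut` is hypothetical, and the input is assumed to consist of distinct sets.

```python
from itertools import combinations

def is_r_sut(family, r):
    """Brute-force check of the r-SUT property for a small family of distinct sets."""
    family = [frozenset(s) for s in family]
    common = {}  # union value -> indices shared by every subfamily with that union
    for size in range(1, r + 1):
        for idx in combinations(range(len(family)), size):
            union = frozenset().union(*(family[i] for i in idx))
            common[union] = common.get(union, set(idx)) & set(idx)
    # r-SUT holds iff every union value still has at least one shared member
    return all(common.values())

print(is_r_sut([{1}, {2}, {3}], 2))      # singletons: every union is unambiguous
print(is_r_sut([{1}, {2}, {1, 2}], 2))   # {1,2} alone and {1} ∪ {2} share no member
```

The running time is exponential in r, so this is only useful for sanity checks on toy examples.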
Communication • Suppose that m users share a common channel. • Each user is associated with a vector in $\{0,1\}^n$. • All active users transmit their vectors, and a single receiver gets the OR of all transmitted vectors. • Given that at most r users are active simultaneously, we would like the receiver to be able to identify at least one of them.
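The receiver's task can be sketched as an exhaustive search, assuming each user's vector is stored as an integer bitmask. The function `trace_one_user` and the three example vectors are hypothetical; the search is exponential in r and is meant only to make the decoding goal concrete.

```python
from itertools import combinations

def trace_one_user(vectors, received, r):
    """Return the users that appear in every set of at most r users whose OR is `received`.

    Returns None if no set of at most r users explains the received word.
    """
    candidates = None
    for size in range(1, r + 1):
        for users in combinations(range(len(vectors)), size):
            combined = 0
            for u in users:
                combined |= vectors[u]   # OR of the transmitted vectors
            if combined == received:
                if candidates is None:
                    candidates = set(users)
                else:
                    candidates &= set(users)
    return candidates

vectors = [0b001, 0b010, 0b100]          # toy vectors, one per user
print(trace_one_user(vectors, 0b101, r=2))
```

Because the true active set is one of the explanations, every returned user is guaranteed to be active; the returned set is nonempty for every admissible input exactly when the vectors form an r-SUT family.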
Maximal r-SUT Families • Let g(n,r) denote the maximum size of an r-SUT family of subsets of [n]. • Let $R_g(r) = \limsup_{n \to \infty} \log g(n,r) / n$. Csűrös and Ruszinkó: There exist constants $c_1, c_2 > 0$ s.t. $c_1/r^2 \le R_g(r) \le c_2/r$. Our result: $R_g(r) = \Omega(1/r)$ (and hence $\Theta(1/r)$).
Lower Bound • Let $m = 2^{n/(20r)}$. • We construct a family $\mathcal{F} = \{F_1,\dots,F_m\}$ of subsets of [n] at random as follows: • For every $1 \le i \le m$ and $1 \le j \le n$, independently, put j in $F_i$ with probability 1/r.
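The random construction translates directly into code. In this minimal sketch the function name `random_family`, the seed parameter, and the rounding of m down to an integer are assumptions; the values of n and r are illustrative.

```python
import random

def random_family(n, r, m, seed=None):
    """Draw m random subsets of {1,...,n}; each element joins each set with probability 1/r."""
    rng = random.Random(seed)
    return [frozenset(j for j in range(1, n + 1) if rng.random() < 1 / r)
            for _ in range(m)]

n, r = 200, 2                            # illustrative parameters
m = int(2 ** (n / (20 * r)))             # m = 2^(n/(20r)), rounded down
family = random_family(n, r, m, seed=1)
print(m, "sets, average size", sum(len(s) for s in family) / m)
```

Each set has expected size n/r. The argument on the slides shows that such a family is r-SUT with positive probability; a single draw is not guaranteed to be.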
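Lower Bound (cont.) • We show that $\mathcal{F}$ is r-SUT with positive probability. • We say that a configuration $\mathcal{F}_1,\dots,\mathcal{F}_k \subseteq \mathcal{F}$ with $1 \le |\mathcal{F}_i| \le r$ and $\bigcap_{i=1}^{k} \mathcal{F}_i = \emptyset$ is bad if all the unions $\bigcup_{A \in \mathcal{F}_i} A$ are equal. • We show that with positive probability there are no bad configurations.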
Lower Bound (cont.) • We show that with probability > ½ no small configuration is bad, and that with probability > ½ no large configuration is bad. • Therefore, with positive probability there is no bad configuration.
Small Configurations Proposition: With probability > ½ the following holds: for all $s < 2r$ and distinct $A_1,\dots,A_s \in \mathcal{F}$, there exists $j \in [n]$ that belongs to exactly one of the sets $A_1,\dots,A_s$. Corollary: With probability > ½ no small configuration is bad.
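The property in the proposition can be tested directly on a small family. This is a brute-force sketch for experimentation, with hypothetical names `has_private_element` and `small_collections_ok`; it enumerates every collection of fewer than 2r sets, so it is practical only for toy inputs.

```python
from collections import Counter
from itertools import combinations

def has_private_element(sets):
    """True if some element belongs to exactly one of the given sets."""
    counts = Counter(x for s in sets for x in s)
    return any(c == 1 for c in counts.values())

def small_collections_ok(family, r):
    """Check the proposition's property for every collection of s < 2r distinct sets."""
    family = list(family)
    return all(has_private_element(group)
               for s in range(1, 2 * r)
               for group in combinations(family, s))

print(small_collections_ok([{1}, {2}, {3}], 2))
print(small_collections_ok([{1, 2}, {2, 3}, {1, 3}], 2))  # every element lies in two sets
```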
Small Configurations (cont.) [Figure: diagram of the sets A1,…,A9]
Large Configurations Proposition: With probability > ½ the following holds. For all distinct $A_1,\dots,A_r, B_1,\dots,B_r \in \mathcal{F}$, Corollary: With probability > ½ no large configuration is bad.
Large Configurations (cont.) [Figure: diagram of the sets A1, A2, A3, Ai and B1, B2, B3]
Tracing Multiple Users • Recently, Laczay and Ruszinkó introduced the following generalization of r-SUT families. • For integers $n$, $r \ge 2$, and $1 \le k \le r$, a family $\mathcal{F}$ of subsets of [n] is called k-out-of-r multiple-user-tracing superimposed ($\mathrm{MUT}_k(r)$) if, given the union of any $\ell \le r$ sets from $\mathcal{F}$, one can identify at least $\min(k,\ell)$ of them.
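Tracing Multiple Users (cont.) • Let h(n,r,k) denote the maximum size of a $\mathrm{MUT}_k(r)$ family of subsets of [n]. • Let $R_h(r,k) = \limsup_{n \to \infty} \log h(n,r,k) / n$. • We have shown that there are constants $c_1, c_2, c_3, c_4 > 0$ s.t. $\min(c_1/r,\; c_3/k^2) \le R_h(r,k) \le \min(c_2/r,\; c_4 \log k / k^2)$.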
Open Problems • We have shown that $R_g(r) = \Theta(1/r)$, but the question of finding the exact constant is still open. • This problem is open even for the case r = 2. • $1/3 \le R_g(2) \le 1/2 + o(1)$. The lower bound follows from a careful analysis of the random construction; the upper bound follows from a result of Coppersmith and Shearer.
Open Problems (cont.) • We show how to construct an r-SUT family in time $m^{O(r)}$, where m is the size of the family. It would be interesting to find explicit constructions for all r. • There are other related problems for which there are still gaps between lower and upper bounds: • Multiple-user tracing families • r-superimposed families • Disjointly r-superimposed families • Graph identifying codes