Group Testing and Coding Theory. Atri Rudra (U. at Buffalo). Joint work with Piotr Indyk (MIT), Hung Ngo (UB), and Ely Porat (Bar-Ilan).
Main Message: Group Testing, Data Stream Algorithms, Coding Theory.
Group Testing Overview: Test a soldier for a disease. WWII example: syphilis.
Group Testing Overview: Test an army for a disease. WWII example: syphilis. What if only one soldier has the disease? Can pool blood samples and check if at least one soldier has the disease.
Group Testing: Tons of applications. Set of items: an (unknown) vector x in {0,1}^n. At most d positives: |x| ≤ d. Tests: each test is a subset S of {1,…,n}. Result of a test: OR of the x_i's such that i is in S. Non-adaptive tests: all t tests are fixed a priori. Goal 1: figure out x (output the positive items). Goal 2: minimize the number of tests t; t = O(d^2 log n) is possible. [Figure: t × n binary test matrix, one row per test and one column per item.]
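The definition of a test result can be sketched in a few lines of Python. This is only an illustration: the function name `run_tests`, the 0-indexed items, and the toy pools are assumptions made for the example, not part of the talk.

```python
def run_tests(tests, x):
    """Return one result per pooled test: the OR of x[i] over all i in the pool."""
    return [int(any(x[i] for i in pool)) for pool in tests]

# Hypothetical toy instance: n = 6 items (0-indexed), item 4 is the only positive.
x = [0, 0, 0, 0, 1, 0]
# Non-adaptive: all pools are fixed before any result is seen.
tests = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 4}, {2, 5}]
results = run_tests(tests, x)
print(results)  # [0, 1, 0, 1, 0]
```

Only the pools that contain item 4 come back positive, which is what the decoder has to work from.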
The Decoding Step: The t × n test matrix is to be designed, the vector x = (x_1, …, x_n) is unknown, and the result vector r = (r_1, …, r_t) is observed. How fast can this step be done? [Figure: test matrix applied to x gives r.]
Our Main Result
# tests (t)        Decoding time    Reference
O(d^2 log n)       O(nt)            Folklore, [PR08]
O(d^4 log n)       O(t)             [GI04]
O(d^2 log^2 n)     poly(t)          [GI04, implicit]
O(d^2 log n)       poly(t)          Our result
Slide annotations: "d is O(log n)", "Big savings".
An application: Heavy Hitters. Stream items are numbers in the range {1,…,n}. Output all items that occur at least a 1/d fraction of the time. One pass, polylog space, polylog update time, polylog report time.
Cormode-Muthukrishnan idea: Use group testing: maintain a counter for each test. Maintain the total count m and, for each test i, the count c_i of stream items that fall in that test. Heavy tail property: the total frequency of non-heavy items is less than a 1/d fraction. Set r_i = 1 iff c_i ≥ m/d, and x_j = 1 iff j is a heavy item (|x| ≤ d). Then r = M × x, so reporting the heavy items is just decoding! [Figure: t × n binary matrix M with counters c_1, …, c_t, one per row.]
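A minimal sketch of the counter idea, assuming a hand-picked 4 × 4 matrix `M`, 0-indexed items, and a toy stream that satisfies the heavy tail property. All names and values are illustrative. The per-item loop over every test is for clarity only and does not reflect the polylog update time of the actual scheme.

```python
def process_stream(M, stream):
    """One pass: maintain a counter per test (row of M) and the total count m."""
    counters = [0] * len(M)
    m = 0
    for item in stream:
        m += 1
        for i, row in enumerate(M):
            if row[item]:
                counters[i] += 1
    return counters, m

def threshold_results(counters, m, d):
    """r_i = 1 iff c_i >= m/d (compared in integers to avoid rounding)."""
    return [int(c * d >= m) for c in counters]

# Hypothetical test matrix: rows are tests, columns are items 0..3.
M = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
d = 3
# Items 1 and 3 are heavy (5 and 4 of 11 occurrences); the rest total 2 < 11/3.
stream = [1] * 5 + [3] * 4 + [0] + [2]
counters, m = process_stream(M, stream)
r = threshold_results(counters, m, d)

x = [0, 1, 0, 1]  # indicator vector of the heavy items
Mx = [int(any(a and b for a, b in zip(row, x))) for row in M]
print(r, Mx)  # [1, 1, 0, 1] [1, 1, 0, 1]
```

Because the non-heavy items together stay below the threshold, the thresholded counters equal the Boolean product of M and x, so any group testing decoder can be run on r.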
Requirements from Group Testing: Non-adaptiveness is crucial. Minimize t (space). Strongly explicit matrix. Minimize decoding time (report time). [Figure: t × n binary matrix with counters c_1, …, c_t.]
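d-disjunct Matrices: Sufficient condition for group testing. For every subset of d columns and every column disjoint from that subset, there exists a row where the disjoint column has a 1 and all d columns have a 0. Consequence: when the d columns are the set of positives, every non-positive column has one test with result 0. [Figure: a row with 0s in the d columns and a 1 in the disjoint column.]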
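The condition can be checked directly on small matrices. The brute-force checker below is a sketch for illustration (the name `is_d_disjunct` and the example matrices are assumptions); it tries every column against every d-subset of the other columns, so it is only usable for tiny n.

```python
from itertools import combinations

def is_d_disjunct(M, d):
    """Brute-force check: every column j has, for every set S of d other columns,
    a row with a 1 in column j and 0s in all columns of S."""
    t, n = len(M), len(M[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for S in combinations(others, d):
            witness = any(
                M[i][j] == 1 and all(M[i][c] == 0 for c in S)
                for i in range(t)
            )
            if not witness:
                return False
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
covered = [[1, 1, 0], [0, 1, 1]]  # column 0 is contained in column 1
print(is_d_disjunct(identity, 2), is_d_disjunct(covered, 1))  # True False
```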
Naïve Decoder for d-disjunct Matrices: If r_j = 0, then for every column i that is in test j, set x_i = 0. If x_i = 1, then all tests that column i participates in will have result 1. O(nt) time when run on all n columns; O(Lt) time when run on only L columns.
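The rule above translates almost word for word into code. The sketch below assumes a small 1-disjunct matrix whose columns are the six 2-element subsets of four rows; the matrix and the name `naive_decode` are illustrative choices, not taken from the talk.

```python
def naive_decode(M, r):
    """Start with every item marked positive; each negative test clears its items."""
    n = len(M[0])
    x = [1] * n
    for row, result in zip(M, r):
        if result == 0:
            for i, entry in enumerate(row):
                if entry:
                    x[i] = 0
    return x

# Hypothetical 1-disjunct matrix: columns are the 2-subsets of rows {0,1,2,3},
# in the order {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3}.
M = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
r = [0, 1, 0, 1]  # results when item 4 (column {1,3}) is the only positive
print(naive_decode(M, r))  # [0, 0, 0, 0, 1, 0]
```

The double loop touches every entry of M once, which is the O(nt) running time quoted on the slide.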
So far… Strongly explicit d-disjunct matrix with t = O(d^2 log^2 n) [Kautz-Singleton 1964]. Deterministic d-disjunct matrix with t = O(d^2 log n) [Porat-Rothschild 2008]. Lower bound of Ω(d^2 log n / log d) [Dyachkov-Rykov 1982]. Decoding a d-disjunct matrix from the results r_1, …, r_t to get the set of positives takes O(nt) time.
Filter-Evaluate Decoding Paradigm: A "filtering" matrix with results y_1, …, y_t' is decoded in poly(t') time to narrow the columns down to L candidates that include the set of positives. The d-disjunct matrix with results r_1, …, r_t is then decoded on those L columns in O(Lt) time.
So all we need is a filtering matrix with o(d^2 log n / log d) tests.
The filtering matrix: New* object: (d,L)-list disjunct matrix. Running the naïve decoder returns the set of positives plus ≤ L bogus columns (at most d+L columns in total). (d,d)-list disjunct matrices exist with O(d log n) tests. *Independently considered by [Cheraghchi 09].
The rest of the talk: (1) Strongly explicit d-disjunct matrix with O(d^2 log^2 n) tests. (2) Strongly explicit (d,d^2)-list disjunct matrix with t' = O(d^1.6 log n) tests that can be decoded in poly(t') time.
Coding Theory is the Bridge: Group Testing, Data Stream Algorithms, Coding Theory.
All you ever needed to know about (Reed-Solomon) codes… at least for this talk: q is a prime power. Codewords: q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions. poly(q) time algorithm for list recovery: given input lists S_1, …, S_q, where each S_i is a subset of [q] with |S_i| ≤ d, output all codewords (c_1, …, c_q) that agree with all the input lists.
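Disjunct matrices from RS codes: Column i gets the ith codeword, so n = q^(q/(d+1)). Code concatenation: each symbol x in [q] is replaced by the length-q binary vector with a single 1 in position x, giving q × q rows. t = q^2 = O(d^2 log^2 n). The result is a d-disjunct matrix [Kautz, Singleton].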
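The construction can be sketched for small parameters as follows. This is an illustration under stated assumptions: q is taken to be prime so that arithmetic mod q is a field (prime powers would need proper finite-field arithmetic), k denotes the number of message symbols (playing the role of q/(d+1) on the slide), and the function name is made up for the example.

```python
from itertools import product

def kautz_singleton(q, k):
    """Build the q*q by q**k matrix: one column per Reed-Solomon codeword,
    each codeword symbol expanded into a length-q unit vector."""
    codewords = []
    for coeffs in product(range(q), repeat=k):
        # Evaluate the polynomial with these coefficients at every point of Z_q.
        cw = [sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
              for a in range(q)]
        codewords.append(cw)
    n = len(codewords)
    M = [[0] * n for _ in range(q * q)]
    for col, cw in enumerate(codewords):
        for pos, sym in enumerate(cw):
            M[pos * q + sym][col] = 1  # block `pos`, row `sym` inside the block
    return M, codewords

M, codewords = kautz_singleton(3, 2)
print(len(M), len(M[0]))  # 9 9
```

With q = 3 and k = 2 every column has exactly 3 ones and two distinct columns share at most one row, which is the agreement bound the disjunctness argument relies on.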
A q=3 example [Figure: Reed-Solomon codewords over {0,1,2}, with each symbol expanded into a length-3 binary unit vector to form the columns of the test matrix.]
1-Agreement between two columns: Two columns agree in ≤ 1 position. Agreement in binary = agreement among RS codewords < q/(d+1). [Figure: the q=3 example matrix with two columns highlighted.]
d-Disjunctness of Kautz-Singleton: Each column has q ones. A column outside the set of d columns agrees with each of them in < q/(d+1) rows, so there are > q − q·d/(d+1) rows where it has a 1 and all d columns have a 0.
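The counting behind this slide can be written out as a short derivation. The symbols T (the set of d columns), c (a column outside T), and agr (the number of rows where two columns both have a 1) are notation introduced here for the sketch.

```latex
% Every column of the Kautz-Singleton matrix has exactly q ones,
% one per Reed-Solomon codeword position.
\begin{align*}
\#\{\text{ones of } c \text{ covered by some column of } T\}
  &\le \sum_{c' \in T} \mathrm{agr}(c, c') < d \cdot \frac{q}{d+1}, \\
\#\{\text{rows where } c = 1 \text{ and all of } T \text{ are } 0\}
  &> q - \frac{qd}{d+1} = \frac{q}{d+1} > 0 .
\end{align*}
```

So at least one such row exists for every choice of T and c, which is exactly the d-disjunct condition.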
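d-disjunct Matrices: Sufficient condition for group testing. For every subset of d columns and every column disjoint from that subset, there exists a row where the disjoint column has a 1 and all d columns have a 0. [Figure: a row with 0s in the d columns and a 1 in the disjoint column.]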
The rest of the talk: (1) Strongly explicit d-disjunct matrix with O(d^2 log^2 n) tests. (2) Strongly explicit (d,d^2)-list disjunct matrix with t' = O(d^1.6 log n) tests that can be decoded in poly(t') time.
A detour: The Kautz-Singleton matrix is a strongly explicit (d,d^2)-list disjunct matrix with t' = O(d^2 log^2 n) tests that can be decoded in poly(t') time.
Back to the example [Figure: the q=3 matrix with the positive items (+ items) marked and the resulting result vector; read block by block, the result vector yields the lists {1,2}, {2}, {0,2}.]
All you ever needed to know about (Reed-Solomon) codes… at least for this talk: q is a prime power. Codewords: q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions. poly(q) time algorithm for list recovery: given input lists S_1, …, S_q, where each S_i is a subset of [q] with |S_i| ≤ d, output all codewords (c_1, …, c_q) that agree with all the input lists.
Connection to List Recovery: Decoding: output all codewords that match the test results. For each codeword position j, the corresponding block of the result vector r gives a list S_j of symbols with |S_j| ≤ d. List recover from S_1, …, S_q to get the positive codewords.
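The step from test results to input lists can be sketched on the q = 3 code. The list recovery here is a brute-force filter over all codewords, standing in for the poly(q) time algorithm mentioned in the talk; prime q, k = 2 message symbols, and all function names are assumptions made for the illustration.

```python
from itertools import product

q, k = 3, 2  # toy parameters; q prime so arithmetic mod q is a field
codewords = [[sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
              for a in range(q)]
             for coeffs in product(range(q), repeat=k)]

def result_vector(positives):
    """OR of the Kautz-Singleton columns of the positive codewords."""
    r = [0] * (q * q)
    for cw in positives:
        for pos, sym in enumerate(cw):
            r[pos * q + sym] = 1
    return r

def lists_from_results(r):
    """S_pos = symbols whose test in block `pos` came back positive."""
    return [{sym for sym in range(q) if r[pos * q + sym]} for pos in range(q)]

def brute_force_list_recover(lists):
    """Stand-in for RS list recovery: keep codewords consistent with every list."""
    return [cw for cw in codewords
            if all(cw[pos] in lists[pos] for pos in range(q))]

positives = [codewords[1], codewords[5]]  # two positives, so each |S_j| <= 2
lists = lists_from_results(result_vector(positives))
recovered = brute_force_list_recover(lists)
print(lists, recovered)
```

In this instance the recovered codewords are exactly the two positives. In general list recovery may also return extra candidates, which is why the matrix is described as list disjunct.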
What does this imply? Implicit in [Guruswami-Indyk 04]: with t = O(d^2 log^2 n) tests, list recovery on the KS matrix narrows the columns down to d^2 candidates in poly(t) time, and the naïve decoder then finds the set of positives among them in O(d^2 t) time.
The rest of the talk: (1) Strongly explicit d-disjunct matrix with O(d^2 log^2 n) tests. (2) Strongly explicit (d,d^2)-list disjunct matrix with t' = O(d^1.6 log n) tests that can be decoded in poly(t') time.
Revisiting the decoding algorithm: For each RS codeword position j in 1, …, q, run the naïve decoder on a d-disjunct matrix to get a list S_j with |S_j| ≤ d. Works but hits a d^3 barrier. [Figure: result vector r split into q blocks, one per position.]
Revisiting the decoding algorithm-II: For each RS codeword position j, run the naïve decoder on a (d,d)-list disjunct matrix instead, which gives a list S_j with |S_j| ≤ 2d. Need to change the parameters of the Reed-Solomon codes a bit.
Some number crunching: RS codewords have q positions and n ~ q^(q/d). Each position uses a (d,d)-list disjunct matrix with d log q rows. t = q × (d log q) ~ (d × log n / log q) × (d log q) = d^2 log n.
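The arithmetic on this slide, written out step by step (constants and rounding are ignored, as on the slide):

```latex
\begin{align*}
n \approx q^{q/d}
  &\;\Longrightarrow\; \log n \approx \frac{q}{d}\,\log q
   \;\Longrightarrow\; q \approx \frac{d \log n}{\log q}, \\
t = q \cdot (d \log q)
  &\approx \frac{d \log n}{\log q} \cdot d \log q = d^{2} \log n .
\end{align*}
```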
What does this imply? t = O(d^2 log n). Matches best known bound! The "filtering" matrix (results y_1, …, y_t) is decoded in poly(t) time down to d^2 columns; the d-disjunct matrix (results r_1, …, r_t) then recovers the set of positives from those columns in O(d^2 t) time.
How we get our hands on… [Figure: RS codeword positions 1, …, q, each paired with a (d,d)-list disjunct matrix with d log q rows; n ~ q^(q/d); t = q × (d log q) ~ (d × log n / log q) × (d log q) = d^2 log n.]
Solution 1 [Indyk, Ngo, R. 10]: Pick the "inner" codes at random to get the (d,d)-list disjunct matrices with d log q rows.
Can also show d-disjunctness: Use different "inner" matrices for different RS codeword positions, each a random matrix. Can show that w.h.p. all matrices are what they should be.
Solution 2 [Ngo, Porat, R. 11]: Use explicit expanders to get the (d,d)-list disjunct matrices with d log q rows!
(d,d)-list disjunct Matrices: For every pair of disjoint subsets of d columns each, there exists a row that has a 0 in all columns of the first subset (the set of positives) and a 1 in at least one column of the second.
The expander connection: View the matrix as a bipartite graph between columns and rows. The construction works if sets of size 2d expand by at least 0.75 × degree.
Solution 2 [Ngo, Porat, R. 10]: Use explicit expanders to get the (d,d)-list disjunct matrices with d log q rows! Some comments: Left degree of the expander is not important. d^(1+o(1)) log q rows possible [GUV 07, Cheraghchi 09]. Use PV codes instead of RS codes.
The rest of the talk: (1) Strongly explicit d-disjunct matrix with O(d^2 log^2 n) tests. (2) Strongly explicit (d,d^2)-list disjunct matrix with t' = O(d^1.6 log n) tests that can be decoded in poly(t') time.
Our Main Result
# tests (t)        Decoding time    Reference
O(d^2 log n)       O(nt)            Folklore, [PR08]
O(d^4 log n)       O(t)             [GI04]
O(d^2 log^2 n)     poly(t)          [GI04, implicit]
O(d^2 log n)       poly(t)          Our result
Other work/Open Questions: Results generalize to compressed sensing [Ngo, Porat, R. 11]. Other applications of group testing? Complexity theory? Strongly explicit construction of optimal disjunct matrices?