
Group Testing and Coding Theory


Presentation Transcript


  1. Group Testing and Coding Theory. Or: A Theoretical Computer Scientist's (Biased) View of Group Testing. Atri Rudra, University at Buffalo, SUNY.

  2. Group testing overview: Test each soldier for a disease. WWII example: syphilis.

  3. Group testing overview: Test an army for a disease (WWII example: syphilis). What if only one soldier has the disease? Can we do better than testing every soldier individually?

  4. Communicating with my 2-year-old: The message x is encoded as C(x), and what arrives is y = C(x) + error. The "code" C is "Akash English"; C(x) is a "codeword". From y, recover x, or give up.

  5. The setup: A mapping C, called an error-correcting code or just a code. Encoding: x → C(x); C(x) is a codeword. The channel delivers y = C(x) + error. Decoding: y → x, or give up.

  6. The fundamental tradeoff: Correct as many errors as possible with as little redundancy as possible. Can one achieve the "optimal" tradeoff with efficient encoding and decoding?

  7. The main message: Coding Theory ↔ Group Testing.

  8. Asymptotic view: compare growth rates, e.g. n! vs. 10n^2 vs. n^2.

  9. O() notation ≤ is O with glasses poly(n) is O(nc) for some fixed c

  10. Group testing overview: Test an army for a disease (WWII example: syphilis). What if only one soldier has the disease? We can pool blood samples and check whether at least one soldier in the pool has the disease.

  11. Group testing: Set of items: an (unknown) vector x in {0,1}^n with at most d positives (|x| ≤ d). A test is a subset S of {1,...,n}; the result of a test is the OR of the x_i with i in S. Non-adaptive tests: all tests are fixed a priori, as the rows of a t × n binary matrix. Goal 1: figure out x, i.e., output the positive items. Goal 2: minimize the number of tests t. t = O(d^2 log n) is possible. Tons of applications.
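A minimal sketch of this model in Python (toy parameters; the random test matrix below is only for illustration and is not one of the constructions discussed later):

import random

random.seed(1)
n, d, t = 100, 2, 40                 # items, max positives, tests

# Non-adaptive tests, fixed a priori: test i is the set of items j
# with M[i][j] = 1.
M = [[random.random() < 1.0 / (d + 1) for _ in range(n)]
     for _ in range(t)]

x = [0] * n
x[13] = x[57] = 1                    # the unknown vector, |x| <= d

# Result of a test = OR of the x_j it pools.
results = [int(any(M[i][j] and x[j] for j in range(n))) for i in range(t)]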

  12. The decoding step: We observe the result vector r = (r_1, ..., r_t), where r = M × x (Boolean matrix-vector product: OR in place of sum, AND in place of product), and must recover the unknown x = (x_1, ..., x_n). The matrix M is to be designed. How fast can this step be done?

  13. An application: heavy hitters. Stream items are numbers in the range {1,...,n}; output all items that occur in at least a 1/d fraction of the stream. Goal: one pass, polylog space, polylog update time, polylog report time.

  14. Cormode-Muthukrishnan idea: Use group testing. Maintain the total count m and, for each test i, a counter c_i of the stream items that belong to test i. Set r_i = 1 iff c_i ≥ m/d, and let x_j = 1 iff j is a heavy item (so |x| ≤ d). Heavy-tail property: the total frequency of the non-heavy items is < 1/d, so the thresholded counters satisfy r = M × x. Reporting the heavy items is just decoding!
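A sketch of the counter maintenance (a minimal reading of the idea; the function and variable names are mine, not from the paper):

def process_stream(stream, M):
    """Every stream item j increments the counters of exactly the tests
    containing j, plus the total count m."""
    t = len(M)
    counters = [0] * t
    m = 0
    for item in stream:              # items are in {0, ..., n-1}
        m += 1
        for i in range(t):
            if M[i][item]:
                counters[i] += 1
    return counters, m

def results_from_counters(counters, m, d):
    """A test containing a heavy item has a counter >= m/d; by the
    heavy-tail property a test with no heavy item stays below m/d.
    So thresholding reproduces r = M x."""
    return [int(c >= m / d) for c in counters]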

  15. Requirements from group testing: Non-adaptiveness is crucial. Minimize t (this is the space). Use a strongly explicit matrix. Minimize the decoding time (this is the report time).

  16. An overview of results (think of d as O(log n)):

      # tests (t)      Decoding time
      O(d^2 log n)     O(nt)       [DR82], [PR08]
      O(d^4 log n)     O(t)        [GI04]
      O(d^2 log^2 n)   poly(t)     [GI04, implicit]
      O(d^2 log n)     poly(t)     [INR10, NPR11]

      Big savings: decoding time falls from linear in n to polynomial in t.

  17. Tackling the first row: t = O(d^2 log n) tests with O(nt) decoding time [DR82], [PR08].

  18. d-disjunct matrices: A sufficient condition for group testing. For every set of d columns (the possible positives) and every column disjoint from that set, there exists a row where the disjoint column has a 1 and all d columns have 0s. Consequently every non-positive column participates in some test whose result is 0.

  19. Naïve decoder for d-disjunct matrices: If r_j = 0, then for every column i that is in test j, set x_i = 0; declare everything that survives positive. This is correct: if x_i = 1 then every test that column i participates in has result 1, so positives are never eliminated, and d-disjunctness guarantees that every non-positive column is eliminated. Running time O(nt), or O(Lt) if run on only L candidate columns.
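A direct implementation of this decoder (a sketch; M and results are in the format of the snippet after slide 11; note the output is only guaranteed to equal the positive set when M is d-disjunct):

def naive_decode(M, results):
    """Any test with result 0 rules out every item it contains;
    d-disjunctness guarantees everything not ruled out is positive.
    O(nt) time, or O(Lt) if the inner loop runs over L candidates."""
    t, n = len(M), len(M[0])
    alive = [1] * n
    for i in range(t):
        if results[i] == 0:
            for j in range(n):
                if M[i][j]:
                    alive[j] = 0
    return {j for j in range(n) if alive[j]}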

  20. What is known: Strongly explicit d-disjunct matrices with t = O(d^2 log^2 n) [Kautz-Singleton 1964]. Randomized d-disjunct matrices with t = O(d^2 log n) [Dyachkov-Rykov 1982]. Deterministic d-disjunct matrices with t = O(d^2 log n) [Porat-Rothschild 2008]. Lower bound of Ω(d^2 log n / log d) tests [Dyachkov-Rykov 1982]. In all cases, the decoder so far is the naïve O(nt)-time one.

  21. Up next: the route to strongly explicit matrices with t = O(d^2 log^2 n) and poly(t) decoding [GI04, implicit].

  22. Error-correcting codes: A mapping C : Σ^k → Σ^m. Dimension k, block length m, with m ≥ k. Rate R = k/m ≤ 1. Decoding time complexity: given y, recover x, or give up. "Efficient" means time polynomial in m.

  23. Noise model: Errors are worst case (Hamming): arbitrary error locations and arbitrary symbol changes, subject only to a limit on the total number of errors.

  24. Hamming's 60-year-old observation: If every two codewords are at Hamming distance ≥ D, then any pattern of fewer than D/2 errors leaves the transmitted codeword as the unique closest one. Large "distance" is good.
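The observation as a one-line triangle-inequality calculation (a worked restatement, with d(·,·) denoting Hamming distance):

\[
d(c_1,c_2) \ge D \ \text{ and } \ d(c_1,y) < \tfrac{D}{2}
\;\implies\;
d(c_2,y) \ge d(c_1,c_2) - d(c_1,y) > D - \tfrac{D}{2} = \tfrac{D}{2} > d(c_1,y),
\]

so a received word y with fewer than D/2 errors is strictly closer to the transmitted codeword c_1 than to any other codeword c_2.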

  25. All you need to remember about Reed-Solomon codes, Part I: q is a prime power. There are q^(q/(d+1)) vectors in [q]^q where every two agree in < q/(d+1) positions.
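A sketch of where these vectors come from, assuming a prime q so that arithmetic mod q is a field: the codewords are evaluations of all polynomials of degree < q/(d+1).

from itertools import product

q, d = 5, 1                          # toy parameters
k = q // (d + 1)                     # polynomials of degree < k

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial mod q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result

# One codeword per polynomial of degree < k: q**k vectors in [q]^q.
codewords = [tuple(evaluate(c, x) for x in range(q))
             for c in product(range(q), repeat=k)]

# Two distinct codewords agree in < k positions, because their
# difference is a nonzero polynomial of degree < k.
agreements = max(sum(a == b for a, b in zip(u, v))
                 for u in codewords for v in codewords if u != v)
assert agreements < k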

  26. Concatenation of codes [Forney 66]: How do we get binary codes? Take an outer code C1 : ({0,1}^k)^K → ({0,1}^k)^M and an inner code C2 : {0,1}^k → {0,1}^m. The concatenated code C1 ∘ C2 : {0,1}^(kK) → {0,1}^(mM) first computes C1(x) = (w_1, w_2, ..., w_M), then encodes each symbol: C1 ∘ C2(x) = (C2(w_1), C2(w_2), ..., C2(w_M)). Typically k = O(log M).

  27. Disjunct matrices from RS codes [Kautz, Singleton]: Column i of the matrix is the i-th codeword, so n = q^(q/(d+1)). Code concatenation with the identity inner code (symbol j in [q] maps to the j-th standard basis vector in {0,1}^q) turns each length-q codeword into a binary column with q × q rows. The result is a d-disjunct matrix with t = q^2 = O(d^2 log^2 n).

  28. A q = 3 example. (Figure: the binary matrix next to its ternary codewords; each column comes from a length-3 codeword over {0,1,2}, with each symbol j replaced by the 3-bit indicator vector that has a 1 in position j, giving 9-row binary columns.)
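A self-contained sketch of the construction (toy parameters q = 5, d = 1 rather than the figure's q = 3, so the disjunctness parameter is nontrivial; the brute-force check is only feasible at this scale):

from itertools import combinations, product

q, d = 5, 1
k = q // (d + 1)

def evaluate(coeffs, x):
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result

# Outer Reed-Solomon codewords: one per polynomial of degree < k.
codewords = [tuple(evaluate(c, x) for x in range(q))
             for c in product(range(q), repeat=k)]

# Inner "identity" code: symbol j -> j-th standard basis vector in
# {0,1}^q. Column i of the t x n matrix encodes codeword i.
n, t = len(codewords), q * q
M = [[0] * n for _ in range(t)]
for i, cw in enumerate(codewords):
    for pos, sym in enumerate(cw):
        M[pos * q + sym][i] = 1

def is_d_disjunct(M, d):
    """Brute force: for every d columns and every other column, some
    row must have a 1 in the other column and 0s in all d columns."""
    for S in combinations(range(n), d):
        union = [max(M[row][j] for j in S) for row in range(t)]
        for i in set(range(n)) - set(S):
            if not any(M[row][i] and not union[row] for row in range(t)):
                return False
    return True

assert is_d_disjunct(M, d)      # here t = 25 tests for n = 25 columns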

  29. Agreement between two columns: Agreement in binary equals agreement among the corresponding RS codewords, which is < q/(d+1). (Figure: two example columns side by side, agreeing in ≤ 1 position.)

  30. d-disjunct matrices (recap): A sufficient condition for group testing: for every d columns (a possible set of positives) and every disjoint column, there exists a row where the disjoint column has a 1 and the d columns all have 0s.

  31. d-disjunctness of Kautz-Singleton: Fix d positive columns and one other column. The other column has q ones (one per block), and it agrees with each positive column in < q/(d+1) rows. So the number of its ones hit by no positive column is > q − q·d/(d+1) = q/(d+1) > 0, and any such row certifies disjunctness.

  32. Up next: decoding the Kautz-Singleton matrix in poly(t) time [GI04, implicit].

  33. The basic idea: Every column of the matrix is a codeword, so n = #codewords = exp(m) while t = poly(m). Show that recovering the unknown x from the observed results (r_1, ..., r_t) is the same as "decoding" the code; then decoding in time poly(m) = poly(t) beats the naïve O(nt).

  34. Decoding: C(x) is sent and y is received, with x in Σ^k and y in Σ^m. How much of y must be correct to recover x? At least k symbols must be correct, so the decoder can tolerate at most a (m − k)/m = 1 − R fraction of errors, where R = k/m. 1 − R is the information-theoretic limit: the largest fraction of errors any decoder can handle.
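The counting behind the limit, written out:

\[
\#\{\text{correct symbols}\} \ge k
\;\iff\;
\#\{\text{errors}\} \le m - k
\;\iff\;
\frac{\#\{\text{errors}\}}{m} \le \frac{m-k}{m} = 1 - \frac{k}{m} = 1 - R .
\]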

  35. Can we get to the limit of 1 − R? Not if we always want to uniquely recover the original message: a received word r can sit at fractional distance (1 − R)/2 from two different codewords c_1 and c_2, and then no decoder can tell which was sent. So for unique decoding the error fraction must satisfy ρ < (1 − R)/2.

  36. List decoding [Elias 57, Wozencraft 58]: Always insisting on a unique codeword is restrictive; the "pathological" received words are rare. In high dimension, almost all of the space (all but an exponentially small, in m, fraction) consists of "typical" received words that can be decoded beyond (1 − R)/2. A better error-recovery model: output a list of answers. Example: a spell checker offers a list of corrections.

  37. Unique decoding vs. the information-theoretic limit (plot: fraction of errors ρ against rate R): the information-theoretic limit is ρ < 1 − R, twice as many errors as unique decoding's (1 − R)/2. Achievable by random codes, but NOT ALGORITHMIC!

  38. Other applications of list decoding: Cryptography: cryptanalysis of certain block ciphers [Jakobsen 98]; efficient traitor-tracing schemes [Silverberg, Staddon, Walker 03]. Complexity theory: hardcore predicates from one-way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]; worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06]. Other algorithmic applications: IP traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]; guessing secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01].

  39. Algorithmic list decoding results (same plot, fraction of errors ρ against rate R): ρ ≤ 1 − R − ε for any ε > 0, achieved by folded RS codes [Guruswami, R. 06], building on [Guruswami-Sudan 98] and [Parvaresh-Vardy 05].

  40. Concatenation of codes [Forney 66], recap: C1 : ({0,1}^k)^K → ({0,1}^k)^M (outer code), C2 : {0,1}^k → {0,1}^m (inner code), and C1 ∘ C2(x) = (C2(w_1), ..., C2(w_M)) where C1(x) = (w_1, ..., w_M). Since typically k = O(log M), brute-force decoding of the inner code is affordable.

  41. List decoding C1 ∘ C2: Brute-force decode each received inner block y_i in {0,1}^m to a list S_i of candidate outer symbols in {0,1}^k. How do we "list decode" the outer code from the lists S_1, S_2, ..., S_M?

  42. List recovery: Given lists S_1, S_2, ..., S_M with each S_i a subset of [q] of size |S_i| ≤ d, output all codewords (c_1, ..., c_M) that agree with all the input lists, i.e., c_i ∈ S_i for every i.

  43. All you need to remember about (Reed-Solomon) codes, Part II: q is a prime power; there are q^(q/(d+1)) vectors in [q]^q where every two agree in < q/(d+1) positions; and there is a poly(q)-time algorithm for list recovery: given lists S_1, ..., S_q ⊆ [q] with |S_i| ≤ d, output all codewords that agree with all the input lists.

  44. Back to the example: Read the result vector in blocks of q bits; block i yields the list S_i of symbols that can occur at position i among the positives (in the q = 3 figure, lists such as {1,2}, {2}, {0,2}). List-recovering the RS code from these lists returns the positive items.
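An end-to-end sketch of this decoding route, with brute force standing in for the poly(q) list-recovery algorithm the talk cites; q = 7, d = 2 are toy parameters chosen so the check runs instantly.

from itertools import product

q, d = 7, 2
k = q // (d + 1)            # degree < k = 2, so n = q**k = 49 items

def evaluate(coeffs, x):
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result

codewords = [tuple(evaluate(c, x) for x in range(q))
             for c in product(range(q), repeat=k)]

positives = {3, 17}         # the unknown x, |x| <= d

# Test (pos, sym) ORs the items whose codeword has symbol sym at pos,
# so block pos of the result vector spells out the list S_pos.
lists = [{sym for sym in range(q)
          if any(codewords[i][pos] == sym for i in positives)}
         for pos in range(q)]
assert all(len(S) <= d for S in lists)

# Brute-force list recovery: keep codewords agreeing with every list.
candidates = {i for i, cw in enumerate(codewords)
              if all(cw[pos] in lists[pos] for pos in range(q))}
assert candidates == positives   # in general: a superset of size <= d^2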

  45. All you ever needed to know about (Reed-Solomon) codes, at least for this talk: q is a prime power; there are q^(q/(d+1)) vectors in [q]^q where every two agree in < q/(d+1) positions; and list recovery (given S_1, ..., S_q ⊆ [q] with |S_i| ≤ d, output all codewords agreeing with all the lists) takes poly(q) time.

  46. What does this imply? Implicit in [Guruswami-Indyk 04]: for the Kautz-Singleton matrix with t = O(d^2 log^2 n), list recovery runs in poly(t) time and leaves at most d^2 candidate columns out of the n; running the naïve decoder on just those candidates takes another O(d^2 t) time.

  47. Up next: the remaining rows: O(d^4 log n) tests with O(t) decoding [GI04], and O(d^2 log n) tests with poly(t) decoding [INR10, NPR11].

  48. The filter-evaluate decoding paradigm: Stack two matrices. A "filtering" matrix (results y_1, ..., y_t') whose decoding, in poly(t') time, narrows the n columns down to L candidates; then a d-disjunct matrix (results r_1, ..., r_t) on which the naïve decoder, restricted to those L candidates, finishes in O(Lt) time.
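A sketch of the paradigm's shape. The filtering step below is a stand-in stub, since the real filtering matrices are the technical content of [GI04], [INR10], [NPR11]; the point here is the evaluate step, whose cost is O(Lt) rather than O(nt).

import random

random.seed(0)
n, t, d = 1000, 60, 2

# A random test matrix (illustration only; assume it is d-disjunct).
M = [[int(random.random() < 1.0 / (d + 1)) for _ in range(n)]
     for _ in range(t)]

positives = {5, 321}
r = [int(any(row[j] for j in positives)) for row in M]

def filter_step(r):
    # Stand-in stub: pretend a filtering matrix reduced the n columns
    # to a small candidate superset of the positives.
    return positives | {7, 100, 999}

def evaluate_step(candidates, r):
    # Naïve decoder restricted to the candidates: O(L*t), not O(n*t).
    return {j for j in candidates
            if all(r[i] == 1 for i in range(t) if M[i][j])}

decoded = evaluate_step(filter_step(r), r)
assert positives <= decoded   # w.h.p. the decoy candidates are culled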

  49. So all we need to do [Indyk, Ngo, R. 10], [Ngo, Porat, R. 11]: build the filtering matrix, which only has to produce a small candidate list rather than pin down x exactly, and so can get away with o(d^2 log n / log d) tests, below the lower bound for d-disjunct matrices.

  50. Overview of the results:

      # tests (t)      Decoding time
      O(d^2 log n)     O(nt)       [DR82], [PR08]
      O(d^4 log n)     O(t)        [GI04]
      O(d^2 log^2 n)   poly(t)     [GI04, implicit]
      O(d^2 log n)     poly(t)     [INR10, NPR11]
