The beauty of prime numbers vs the beauty of the random. Ely Porat, Bar-Ilan University, Israel
Outline • Applications • Prime Numbers Group Testing • De-randomized approach for group testing • Applications getting into details • Length Reduction
Pattern Matching • Given a text T and a pattern P, the problem is to find all the substrings of T that are equal to P.
Streaming Model • The characters of T arrive one by one • We can’t save T • Automata? • Φ(P): our goal is to do this without saving P
Hamming distance with wildcards • Find a pattern in a text with 2 complications: • Don’t cares (wildcards Ø) • Mismatches
Summary of results
• Offline
  • O(nk log^2 m): Hamming distance with wildcards
• Online pattern matching
  • Hamming distance
  • O(k log^2 m): Hamming distance with wildcards
  • O(k log m): edit distance
• Streaming
  • O(log^2 m) space, O(log m) time: exact match
  • O(k^3 log^5 m) space, O(k^2 log^2 m) time: Hamming distance
Open problem • Online convolution in o(log^2 m) time per symbol. • Offline is done by FFT in O(n log m). [Figure: the pattern p1 p2 p3 p4 p5 (m=5) slides along the text t1 t2 t3 t4 t5 t6 ... tn; the first alignment gives t1p1+t2p2+…+t5p5, the second gives t2p1+t3p2+…+t6p5]
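The quantity in the figure can be sketched directly. The following is a naive O(nm) illustration of the sliding dot products only, not the FFT method or an online algorithm; the function name `sliding_dot_products` is hypothetical.

```python
def sliding_dot_products(text, pattern):
    """Return, for every alignment i, the sum of text[i + j] * pattern[j].

    Naive O(n * m) version, for illustration only. The offline bound on the
    slide comes from computing the same values with an FFT.
    """
    n, m = len(text), len(pattern)
    return [
        sum(text[i + j] * pattern[j] for j in range(m))
        for i in range(n - m + 1)
    ]


if __name__ == "__main__":
    # m = 5 as on the slide; the numeric values are made up for the example.
    print(sliding_dot_products([1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 1]))
```

In the online setting each new text symbol completes one more alignment, so one such sum has to be reported per arriving symbol.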
Problem Definition • m people • At most k are sick • Query: is someone in this set sick? • Goal: identify the sick people using only a few tests. • Non-adaptive: all tests are fixed in advance
Motivations • Syphilis, HIV [Dor43] • Mapping genomes [BLC91, BBK+95, TJP00] • Quality control in product testing [SG59] • Searching files in storage systems [KS64] • Sequential screening of experimental variables [Li62] • Efficient contention resolution algorithms for multiple access communication [KS64, Wol85] • Data compression [HL00] • Software testing [BG02, CDFP97] • DNA sequencing [PL94] • Molecular biology [DH00, FKKM97, ND00, BBKT96]
Background
• Same conditions:
  • Deterministic [KS64]
  • Random [KS64]
  • Heavy deterministic [AMS06]
• Lower bound: [CR96]
• Relaxed conditions:
  • Fully adaptive
  • Two-stage group testing and selectors [CGR00, Kni95, BGV03, CMS01, BV03, BGV05]
  • Optimal monotone encoding [AH08]
• Similar problems:
  • Inhibitors [FKKM97, Dam98, BV98, BGV03]
  • Bayesian case [Kni95, BL02, BL03, A.J98, BGV03]
  • Errors [BGV98]
• DIMACS 2006
[Figure: scheme size for deterministic, random and heavy deterministic, and lower bound]
Our Results • Deterministic • Size • Fast construction [Figure: scheme size for deterministic, random and heavy deterministic, and lower bound]
Prime Numbers Group Testing • Positions of the sick people • Bad event: there exists y s.t.
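Prime Numbers Group Testing • Positions of the sick people: x1, x2, x3, x4, ..., xk • Bad event: there exists a healthy y such that every prime p has a sick xi with y mod p = xi mod p (there is a dot below each prime) • Then there exist an xi and primes pi1, pi2, …, pid with pi1·pi2·…·pid > n such that y mod pij = xi mod pij for every j • By CRT, xi = y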
Prime Numbers Group Testing • This gives a group testing scheme of size p1+p2+…+pr • By choosing good enough primes we get O(k^2 log^2 m)
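A minimal sketch of the prime-based scheme, under stated assumptions: people are numbered 0..n-1, one test is made per pair (prime p, residue a), and the primes are simply the first r primes, with r grown until the product of the smallest ceil(r/k) of them reaches n. That choice is an illustrative way to satisfy the CRT argument, not necessarily the "good enough primes" of the talk; all function names are hypothetical.

```python
from math import prod


def first_primes(count):
    """Return the first `count` primes by trial division (fine for small inputs)."""
    primes, cand = [], 2
    while len(primes) < count:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes


def choose_primes(n, k):
    """Grow r until the smallest ceil(r / k) primes multiply to at least n.

    If a healthy y collided with the sick set on all r primes, some sick x
    would agree with y on at least ceil(r / k) of them, and CRT would force
    x == y because |x - y| < n.
    """
    r = 1
    while True:
        primes = first_primes(r)
        need = -(-r // k)  # ceil(r / k)
        if prod(primes[:need]) >= n:
            return primes
        r += 1


def run_tests(primes, sick):
    """Test (p, a) is positive when some sick person x has x mod p == a."""
    return {(p, a): any(x % p == a for x in sick)
            for p in primes for a in range(p)}


def decode(n, primes, outcome):
    """Keep every person whose tests are all positive."""
    return {y for y in range(n) if all(outcome[(p, y % p)] for p in primes)}


if __name__ == "__main__":
    n, k, sick = 100, 2, {17, 64}
    primes = choose_primes(n, k)
    print(primes, "number of tests:", sum(primes))
    print(decode(n, primes, run_tests(primes, sick)))
```

The number of tests is the sum of the chosen primes, which is the p1+p2+…+pr on the slide.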
Randomized Group Testing • Just choose O(k^2 log n) random sets of size n/k.
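The randomized construction can be sketched as follows. The constant `c`, the fixed `seed`, and the elimination decoder are assumptions made for the illustration; the slide only states the number and size of the sets.

```python
import math
import random


def random_scheme(n, k, c=4, seed=0):
    """Draw ceil(c * k^2 * ln n) random sets, each of size n // k.

    The constant c and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    num_tests = math.ceil(c * k * k * math.log(n))
    size = max(1, n // k)
    return [set(rng.sample(range(n), size)) for _ in range(num_tests)]


def decode(n, tests, outcomes):
    """Clear everyone who appears in a negative test; return the rest."""
    cleared = set()
    for members, positive in zip(tests, outcomes):
        if not positive:
            cleared |= members
    return set(range(n)) - cleared


if __name__ == "__main__":
    n, k, sick = 200, 3, {5, 77, 150}
    tests = random_scheme(n, k)
    outcomes = [bool(t & sick) for t in tests]
    # The result always contains the sick set; with enough random tests it is
    # likely, but not guaranteed, to equal it.
    print(len(tests), sorted(decode(n, tests, outcomes)))
```

A sick person never appears in a negative test, so the decoder cannot miss anyone; the randomness only affects how many healthy people remain uncleared.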
Error correction codes • Length of words = m • Number of words = • Distance = • Rate = R • Relative distance = • Linear code: generator matrix of size Rm × m
Good random linear error correction codes • GV bound: there exists … with … • Linear codes: faster construction • Algorithm: pick the entries of the generator matrix uniformly and independently.
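A small sketch of the random construction, assuming a prime alphabet size q so that arithmetic mod q is a field. The brute-force `min_distance` is only there to inspect tiny examples; the function names and the seed are hypothetical.

```python
import itertools
import random


def random_generator_matrix(rows, length, q, seed=0):
    """Pick every entry uniformly and independently from {0, ..., q-1}."""
    rng = random.Random(seed)
    return [[rng.randrange(q) for _ in range(length)] for _ in range(rows)]


def encode(message, G, q):
    """Multiply the message row vector by G over the integers mod q."""
    length = len(G[0])
    return tuple(
        sum(message[i] * G[i][j] for i in range(len(G))) % q
        for j in range(length)
    )


def min_distance(G, q):
    """Minimum Hamming weight over nonzero messages (brute force, tiny codes only).

    For a linear code this equals the minimum distance between codewords.
    """
    rows = len(G)
    return min(
        sum(symbol != 0 for symbol in encode(msg, G, q))
        for msg in itertools.product(range(q), repeat=rows)
        if any(msg)
    )


if __name__ == "__main__":
    G = random_generator_matrix(rows=2, length=6, q=3)
    print(G, min_distance(G, 3))
```

A result of 0 means the random matrix was not full rank, which is one of the bad outcomes a derandomized construction has to avoid.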
Method of conditional probabilities • Algorithm: pick the entries of the generator matrix one by one. • In each step minimize the expected number of collisions between codewords.
Example code: C=[3,2,2]_3-RS [Figure: symbol tables for the code C]
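Reduction from error correction codes to group testing schemes
• C=[3,2,2]_3-RS: 1: 0 0 0, 2: 1 1 1, 3: 2 2 2, 4: 0 1 2, 5: 1 2 0, 6: 2 0 1, 7: 0 2 1, 8: 2 1 0, 9: 1 0 2
• GT scheme: {1,4,7} {2,5,9} {3,6,8} {1,6,9} {2,4,8} {3,5,7} {1,5,8} {2,6,7} {3,4,9}
• Each test corresponds to a pair (position, symbol) and contains every person whose codeword has that symbol in that position.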
Why should it work? • Theorem: Let C be an … code. Then F(C) is a group testing scheme for n people with up to … sick people. • Example: C=[3,2,2]_3-RS gives a scheme for 9 people with up to 2 sick people.
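The reduction can be checked on the example code. The sketch below takes the nine codewords from the slide, builds one test per (position, symbol) pair, and decodes by clearing everyone who appears in a negative test. The decoder and the names `code_to_tests` and `decode` are assumptions for illustration.

```python
from itertools import combinations

# Codewords of C = [3,2,2]_3-RS, numbered as on the slide.
CODEWORDS = {
    1: (0, 0, 0), 2: (1, 1, 1), 3: (2, 2, 2),
    4: (0, 1, 2), 5: (1, 2, 0), 6: (2, 0, 1),
    7: (0, 2, 1), 8: (2, 1, 0), 9: (1, 0, 2),
}


def code_to_tests(codewords, q):
    """One test per (position, symbol): everyone with that symbol there."""
    length = len(next(iter(codewords.values())))
    return [
        {person for person, word in codewords.items() if word[i] == a}
        for i in range(length)
        for a in range(q)
    ]


def decode(codewords, tests, outcomes):
    """Clear everyone who appears in a negative test; return the rest."""
    cleared = set()
    for members, positive in zip(tests, outcomes):
        if not positive:
            cleared |= members
    return set(codewords) - cleared


if __name__ == "__main__":
    tests = code_to_tests(CODEWORDS, 3)
    print(tests)
    # Every set of two sick people is identified exactly.
    for pair in combinations(CODEWORDS, 2):
        sick = set(pair)
        outcomes = [bool(t & sick) for t in tests]
        assert decode(CODEWORDS, tests, outcomes) == sick
```

With three sick people the guarantee is lost: if persons 1, 2 and 3 are sick, every test is positive and nobody can be cleared.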
Why should it work? Proof • Codewords representing the k sick people • A codeword representing a healthy person
Worst Case • Codewords representing the k sick people • A codeword representing a healthy person
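The counting behind the worst case can be written out as a short derivation. The symbols are assumptions where the slides leave them blank: m is the codeword length (as on the error correction codes slide), d denotes the minimum distance, k the number of sick people, and c_y, c_{x_1}, ..., c_{x_k} the codewords of a healthy person and of the sick people.

```latex
% Two distinct codewords differ in at least d positions,
% so they agree in at most m - d positions:
\[
  \bigl|\{\, i : c_y[i] = c_{x_j}[i] \,\}\bigr| \;\le\; m - d
  \qquad \text{for each } j = 1, \dots, k .
\]
% Worst case: the k agreement sets are disjoint, so the sick codewords
% cover at most k(m - d) positions of c_y. If
\[
  k\,(m - d) \;<\; m ,
\]
% some position i^* has c_y[i^*] \ne c_{x_j}[i^*] for every j, and the test
% (i^*, c_y[i^*]) contains y but no sick person, so it is negative.
% For C = [3,2,2]_3: k(3 - 2) < 3 holds exactly when k \le 2.
```

This agrees with the "up to 2 sick people" stated for the example code.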
What did we get? [Figure: scheme size for deterministic, random and heavy deterministic, and lower bound]
Applications getting into details • Streaming • Up to 1 mismatch: • Assume we have a black box for searching for an exact match. • P: p1 p2 p3 p4 p5 … pm • P1,2: p1 p3 p5 … pm • P2,2: p2 p4 … • If both P1,2 and P2,2 fail to match, there is more than one mistake • The other way around isn’t true
Streaming: Up to 1 mismatch • P: p1 p2 p3 p4 p5 … pm • Take primes 2, 3, 5, 7, 11, …, q with 2*3*5*7*11*…*q > m • For each prime, split P by residue: P1,2: p1 p3 p5 … pm, P2,2: p2 p4 …, P1,3: p1 p4 … pm, P2,3: p2 p5 …, P3,3: p3 …, …, Pq,q • With CRT we will be able to find the position of the mismatch. • In order to support more mistakes we will add the prime numbers group testing on top of that
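The CRT step can be sketched in a few lines. This is an offline illustration only: the exact-match black boxes are replaced by direct comparison of subsampled strings, positions are 0-based (so the slide's P1,2 is residue 0 mod 2), and every function name is hypothetical.

```python
def primes_with_product_above(m):
    """Collect 2, 3, 5, ... until their product exceeds m."""
    primes, product, cand = [], 1, 2
    while product <= m:
        if all(cand % p for p in primes):
            primes.append(cand)
            product *= cand
        cand += 1
    return primes


def mismatching_residues(pattern, window, p):
    """Residues a whose subpattern differs (stand-in for exact-match black boxes)."""
    return [a for a in range(p) if pattern[a::p] != window[a::p]]


def crt(residues, moduli):
    """Combine x = r_i (mod p_i) for pairwise coprime moduli."""
    x, modulus = 0, 1
    for r, p in zip(residues, moduli):
        # Lift x so that it also satisfies x = r (mod p).
        step = ((r - x) * pow(modulus, -1, p)) % p
        x += modulus * step
        modulus *= p
    return x


def locate_single_mismatch(pattern, window):
    """Return the 0-based mismatch position, or None on an exact match."""
    primes = primes_with_product_above(len(pattern))
    residues = []
    for p in primes:
        bad = mismatching_residues(pattern, window, p)
        if not bad:
            return None
        if len(bad) > 1:
            raise ValueError("more than one mismatch")
        residues.append(bad[0])
    return crt(residues, primes)


if __name__ == "__main__":
    print(locate_single_mismatch("abracadabra", "abracadzbra"))
```

Because the product of the primes exceeds m, the residues determine the position uniquely, and two distinct mismatches must fall into different residue classes for at least one prime, which is why the sketch can report them.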