Computational Molecular Biology

Presentation Transcript


  1. Computational Molecular Biology Group Testing – Pooling Designs

  2. Group Testing (GT)
     • Definition:
       • Given n items with at most d positive ones
       • Identify all positive ones by the minimum number of tests
       • Each test is on a subset of items
       • Positive test outcome: there exists a positive item in the subset
     My T. Thai mythai@cise.ufl.edu

  3. An Idea of GT
     [Figure: a row of items, some marked +, split into pools whose tests come out Positive or Negative.]

  4. Example 1 – Sequential Method
     [Figure: sequential tests on items 1 to 9, then on items 1 to 5, then on items 4 and 5.]
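
The sets in the figure (items 1 to 9, then 1 to 5, then 4 and 5) are consistent with binary splitting, one common sequential strategy: each test on half of the current candidates decides which half to keep. The sketch below assumes exactly one positive item and models a test as a callable that returns True when the pool contains a positive; `find_single_positive` is a hypothetical name, not something from the slides.

```python
from typing import Callable, List, Tuple


def find_single_positive(items: List[int],
                         test: Callable[[List[int]], bool]) -> Tuple[int, int]:
    """Binary splitting, assuming exactly one positive item among `items`.

    Returns (positive_item, number_of_tests_used).
    """
    candidates = list(items)
    tests_used = 0
    while len(candidates) > 1:
        # Test the first half (rounded up); the outcome says which half to keep.
        half = candidates[: (len(candidates) + 1) // 2]
        tests_used += 1
        if test(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0], tests_used


# Example: nine items, item 4 is the positive one.
positive = 4
print(find_single_positive(list(range(1, 10)), lambda pool: positive in pool))
# → (4, 3): candidates shrink from 1..9 to 1..5 to {4, 5} to {4}
```

Each test has to wait for the previous outcome, which is why the sequential approach uses few tests but many rounds.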

  5. Example 2 – Non-adaptive Method
     Items 1 to 9 are arranged in a 3 × 3 grid; each row (p1, p2, p3) and each column (p4, p5, p6) is a pool:

             p4   p5   p6
        p1    1    2    3
        p2    4    5    6
        p3    7    8    9

     Non-adaptive group testing is called pooling design in biology.
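
A minimal sketch of the 3 × 3 grid design, assuming exactly one positive item: all six pools are tested in a single parallel round, and the positive item is the one lying in every positive pool. The function names are illustrative only.

```python
from typing import Dict, Set


def grid_pools(rows: int, cols: int) -> Dict[str, Set[int]]:
    """Row pools p1..p_rows, then column pools, for items numbered 1..rows*cols row by row."""
    pools: Dict[str, Set[int]] = {}
    for r in range(rows):
        pools[f"p{r + 1}"] = {r * cols + c + 1 for c in range(cols)}
    for c in range(cols):
        pools[f"p{rows + c + 1}"] = {r * cols + c + 1 for r in range(rows)}
    return pools


def run_tests(pools: Dict[str, Set[int]], positives: Set[int]) -> Dict[str, bool]:
    """All pools are tested at once; a pool is positive iff it contains a positive item."""
    return {name: bool(members & positives) for name, members in pools.items()}


def decode_single(pools: Dict[str, Set[int]], outcomes: Dict[str, bool]) -> Set[int]:
    """With exactly one positive item, it lies in every positive pool."""
    candidates: Set[int] = set().union(*pools.values())
    for name, is_positive in outcomes.items():
        if is_positive:
            candidates &= pools[name]
    return candidates


pools = grid_pools(3, 3)   # p1 = {1, 2, 3}, ..., p4 = {1, 4, 7}, ...
print(decode_single(pools, run_tests(pools, {5})))   # → {5}
```

With two positives, for example items 1 and 5, pools p1, p2, p4 and p5 all test positive and the intersection is empty, so this design only guarantees identification of a single positive.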

  6. Sequential and Non-adaptive
     • Sequential GT needs fewer tests, but takes longer (each test waits for the previous outcome).
     • Non-adaptive GT needs more tests, but takes less time (all tests can run in parallel).
     • In molecular biology, non-adaptive GT is usually used. Why?

  7. Because…
     • The same library is screened with many different probes. Preparing a pool for testing is expensive the first time, but once a pool is prepared, it can be screened many times with different probes.
     • Screening one pool at a time is expensive; screening pools in parallel with the same probe is cheaper.
     • There are constraints on pool sizes: if a pool contains too many different clones, a positive pool can become too dilute and be mislabeled as negative.

  8. Pooling Designs
     • Problem definition:
       • Given a set of n clones with at most d positive clones
       • Identify all positive clones with the minimum number of tests
     • Pool: a subset of clones
     • Positive pool: a pool that contains at least one positive clone
     • Clones = items

  9. Relation to Pooling Designs
     • Testing matrix M (t × n): rows are the pools p1, …, pt and columns are the clones c1, …, cn; M[i, j] = 1 iff the i-th pool contains the j-th clone.
     • Outcome vector V (t × 1): V[i] = 1 iff pool pi tests positive.
     • Decoding algorithm: given M and V, identify all positive clones.

  10. Observation

             c1  c2  c3  c4  c5  c6  c7  c8  c9
        p1    1   1   1   0   0   0   0   0   0
        p2    0   0   0   1   1   1   0   0   0
        p3    0   0   0   0   0   0   1   1   1
        p4    1   0   0   1   0   0   1   0   0
        p5    0   1   0   0   1   0   0   1   0
        p6    0   0   1   0   0   1   0   0   1

     Observation: all columns are distinct. To identify up to d positives, all unions of up to d columns should be distinct!
     Union of d columns: the Boolean sum (entrywise OR) of these d columns.
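
The observation can be checked by brute force over all sets of up to d columns. A minimal sketch (helper names are hypothetical) applied to the 6 × 9 grid matrix: every single column is distinct, but two different pairs of clones produce the same union, so this design cannot identify two positives.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# Grid-design matrix from the Observation slide: rows p1..p6, columns c1..c9.
GRID = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1],
]


def union(M: List[List[int]], cols: Tuple[int, ...]) -> Tuple[int, ...]:
    """Boolean sum (entrywise OR) of the given columns; the empty union is all zeros."""
    return tuple(int(any(row[j] for j in cols)) for row in M)


def first_collision(M: List[List[int]],
                    d: int) -> Optional[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return two different sets of at most d columns with equal unions, or None."""
    n = len(M[0])
    seen = {}
    for size in range(d + 1):
        for cols in combinations(range(n), size):
            u = union(M, cols)
            if u in seen:
                return seen[u], cols
            seen[u] = cols
    return None


print(first_collision(GRID, 1))   # → None: all unions of up to 1 column differ
print(first_collision(GRID, 2))   # → ((0, 4), (1, 3)): union of c1, c5 equals union of c2, c4
```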

  11. Challenges
     • Challenge 1: How to construct the binary matrix M such that the unions of any d columns are all distinct
     • Challenge 2: How to design a decoding algorithm with efficient time complexity [O(tn)]

  12. d-separable Matrix
     [Figure: a t × n binary matrix with pools p1, …, pt as rows and clones c1, …, cn as columns.]
     • All unions of d columns are distinct.

  13. d-separable Matrix
     [Figure: the same t × n pooling matrix.]
     • All unions of up to d columns are distinct (this stronger property is usually called d̄-separable).
     • Decoding: O(n^d), by trying every set of at most d clones.
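
The O(n^d) decoding bound comes from trying every set of at most d clones and keeping the one whose union matches the outcome vector V. A minimal sketch, assuming all unions of up to d columns are distinct so the match is unique; the 4 × 6 matrix (each column a different pair of four pools) is an illustrative choice with distinct columns, so it is used with d = 1.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def union(M: List[List[int]], cols: Tuple[int, ...]) -> List[int]:
    """Boolean sum (entrywise OR) of the given columns."""
    return [int(any(row[j] for j in cols)) for row in M]


def decode_bruteforce(M: List[List[int]], V: List[int], d: int) -> Optional[Tuple[int, ...]]:
    """Return the set of at most d clones whose union equals V.

    Tries O(n^d) candidate sets; each check costs O(t * d).
    """
    n = len(M[0])
    for size in range(d + 1):
        for cols in combinations(range(n), size):
            if union(M, cols) == V:
                return cols
    return None  # V is not the union of any d or fewer columns


# Illustrative matrix: column j is a distinct pair of the four pools.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
print(decode_bruteforce(M, [1, 0, 0, 1], d=1))   # → (2,), i.e. clone c3
```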

  14. d-disjunct Matrix
     • Definition: A binary matrix M (t × n) is a d-disjunct matrix (d < t) if the union of any d columns does not contain any other column, i.e., no other column has all of its 1-entries covered by that union.
     • Example: M = [1 0 0; 0 1 0; 0 0 1] (the 3 × 3 identity matrix) is a 2-disjunct matrix.
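
Containment is entrywise: column j is contained in a union when every row where column j has a 1 is covered by one of the d columns. A brute-force checker sketched under that reading (the name is illustrative; it enumerates C(n − 1, d) groups per column, so it is only practical for small matrices):

```python
from itertools import combinations
from typing import List


def is_d_disjunct(M: List[List[int]], d: int) -> bool:
    """True iff no union of d columns contains any other column (entrywise)."""
    t, n = len(M), len(M[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            # Column j is contained if each of its 1-entries is covered by a column in the group.
            if all(any(M[i][c] for c in group) for i in range(t) if M[i][j]):
                return False
    return True


identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_d_disjunct(identity3, 2))   # → True, matching the example above
```

Checking groups of exactly d columns is enough when n > d, because enlarging a group can only enlarge its union; when n − 1 < d there is no group to check and the function returns True vacuously.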

  15. d-disjunct Matrix (cont.)
     • A d-disjunct matrix can efficiently identify up to d positive clones. Why?
     • Theorem 1: All unions of d distinct columns are distinct (thus d-disjunct implies d-separable).
     • Theorem 2: If there are at most d positive clones, the number of clones that appear in no negative pool is always at most d.
     • Corollary 1: The tests with negative outcomes determine all negative clones.
     • Decoding time complexity: O(tn)

  16. Proof of Theorem 2
     • Note that an item does not appear in any negative pool iff its column is contained in the union of the (at most d) positive columns.
     • Therefore, the number of items not appearing in any negative pool is more than d iff there is at least one non-positive item whose column is contained in the union of the positive columns.
     • But M is d-disjunct, so no such item exists, and Theorem 2 follows.

  17. Decoding Algorithm
     Input: d-disjunct matrix M and output vector V
     Output: all positive clones
        for each clone c among the n clones
            if c is in a negative pool
                remove c
        return the remaining clones

     Example (the last column is the output vector V):

             c1  c2  c3  c4  c5  c6   V
        p1    1   1   1   0   0   0   1
        p2    1   0   0   1   1   0   0
        p3    0   1   0   1   0   1   0
        p4    0   0   1   0   1   1   1
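
A runnable version of the decoding loop, applied to the example matrix (0-based indices; the function name is illustrative). Pools p2 and p3 are negative, which eliminates c1, c2, c4, c5 and c6 and leaves c3.

```python
from typing import List


def decode_disjunct(M: List[List[int]], V: List[int]) -> List[int]:
    """Remove every clone that appears in a negative pool; O(t * n) time."""
    t, n = len(M), len(M[0])
    alive = [True] * n
    for i in range(t):
        if V[i] == 0:                    # negative pool: all of its clones are negative
            for j in range(n):
                if M[i][j]:
                    alive[j] = False
    return [j for j in range(n) if alive[j]]


M = [
    [1, 1, 1, 0, 0, 0],   # p1
    [1, 0, 0, 1, 1, 0],   # p2
    [0, 1, 0, 1, 0, 1],   # p3
    [0, 0, 1, 0, 1, 1],   # p4
]
V = [1, 0, 0, 1]
print([f"c{j + 1}" for j in decode_disjunct(M, V)])   # → ['c3']
```

This particular matrix is 1-disjunct (no column is covered by another), so by Theorem 2 at most one clone survives whenever there is at most one positive.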

  18. Fields
     • Field: any set of elements that satisfies the field axioms for both addition and multiplication and is a commutative division algebra
     • E.g.: the complex, rational, and real numbers

  19. Division Algebra

  20. Finite Fields
     • Finite field: a field with a finite order, i.e., a finite number of elements.
     • The order of a finite field is always a prime or a prime power (a power of a prime).
     • E.g.: 16 = 2^4 is a prime power, whereas 6 and 15 are not.
     • E.g.: in GF(5), 4 + 3 = 7, which is reduced to 2 modulo 5.
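
For a prime p, GF(p) is the integers modulo p, so the GF(5) example can be checked directly. The sketch below (function names are illustrative) also computes inverses with Fermat's little theorem, a^(p-2) mod p. It does not cover prime powers such as 16: GF(2^4) needs polynomial arithmetic, not arithmetic modulo 16.

```python
P = 5  # a prime, so the integers modulo P form the field GF(P)


def gf_add(a: int, b: int) -> int:
    return (a + b) % P


def gf_mul(a: int, b: int) -> int:
    return (a * b) % P


def gf_inv(a: int) -> int:
    """Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P."""
    if a % P == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(a, P - 2, P)


print(gf_add(4, 3))                                  # → 2
print([gf_mul(a, gf_inv(a)) for a in range(1, P)])   # → [1, 1, 1, 1]

# Contrast: modulo 6 (not a prime power), 2 has no inverse, so Z_6 is not a field.
print(any((2 * b) % 6 == 1 for b in range(6)))       # → False
```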

  21. How to construct a d-disjunct matrix
     Consider a finite field GF(q). Choose s, q, k satisfying the condition in Theorem 3 below.
     Step 1: Construct matrix A (s × n) as follows:
        for x from 0 to s - 1
            for each polynomial pj of degree at most k
                A[x, pj] = pj(x)
     [Figure: matrix A with rows x = 0, 1, …, s - 1 and columns p1, p2, …, pj, …, pn; the entry in row x, column pj is pj(x).]

  22. Algorithm (cont.)
     Step 2: Construct matrix B (t × n, with t = qs) from A as follows:
        for x from 0 to s - 1
            for y from 0 to q - 1
                for each polynomial pj of degree at most k
                    if A[x, pj] == y then B[(x, y), pj] = 1 else B[(x, y), pj] = 0
     [Figure: A as on the previous slide; B has rows (0, 0), (0, 1), …, (x, y), …, (s - 1, q - 1) and columns p1, …, pn, with B[(x, y), pj] = 1 if pj(x) = y and 0 if pj(x) ≠ y.]
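
A minimal sketch of Steps 1 and 2, assuming q is prime (so arithmetic is modulo q) and reading "polynomials of degree k" as all polynomials of degree at most k, of which there are n = q^(k+1). The brute-force d-disjunct check is included only to test small cases; all names are illustrative.

```python
from itertools import combinations, product
from typing import List, Tuple


def poly_eval(coeffs: Tuple[int, ...], x: int, q: int) -> int:
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo q with Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result


def build_A(q: int, k: int, s: int) -> List[List[int]]:
    """Step 1: A[x][j] = p_j(x) for x = 0..s-1 and every polynomial p_j of degree <= k."""
    polys = list(product(range(q), repeat=k + 1))   # n = q**(k+1) coefficient tuples
    return [[poly_eval(p, x, q) for p in polys] for x in range(s)]


def build_B(A: List[List[int]], q: int) -> List[List[int]]:
    """Step 2: row (x, y) of B has a 1 in column j iff A[x][j] == y, giving t = q*s rows."""
    return [[1 if value == y else 0 for value in row] for row in A for y in range(q)]


def is_d_disjunct(M: List[List[int]], d: int) -> bool:
    """Brute force: no union of d columns contains another column."""
    t, n = len(M), len(M[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            if all(any(M[i][c] for c in group) for i in range(t) if M[i][j]):
                return False
    return True


B = build_B(build_A(q=3, k=1, s=3), q=3)
print(len(B), len(B[0]))        # → 9 9   (t = q*s, n = q**(k+1))
print(is_d_disjunct(B, d=2))    # → True  (kd = 2 < s = 3 <= q = 3)
```

Under this reading the strict inequality matters: with s = kd (for example q = 3, k = 1, s = 2, d = 2) the same check returns False, because two distinct polynomials of degree at most k can agree on k points.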

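  23. Algorithm Analysis
     • Theorem 3 (Correctness): If kd < s ≤ q, then B (t × n) is d-disjunct. (Two distinct polynomials of degree at most k agree on at most k points, so any d other columns cover at most kd of the s 1-entries of column pj.)
     • Theorem 4: The number of tests obtained from this algorithm is t = qs ≤ q^2, i.e., t = O(q^2).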

  24. Errors in Experiments
     • False negative: the pool contains some positive clones, but the test returns a negative outcome.
     • False positive: the pool contains only negative clones, but the test returns a positive outcome.

  25. An e-Error-Correcting Model
     • Definition:
       • Assume that there are at most e errors in testing.
       • All positive clones can still be identified.
     • Hamming distance: the Hamming distance between two column vectors is the number of positions in which they differ.
     • e-error-correcting: a matrix is e-error-correcting if the Hamming distance between any two unions of d columns is at least 2e + 1. (With at most e errors, the observed outcome vector stays within distance e of the true union and at distance at least e + 1 from every other union.)
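
The e-error-correcting condition can be tested by brute force: form every union of d columns and take the minimum pairwise Hamming distance. A sketch using an illustrative matrix that tests each of three clones alone in three separate pools (the matrix and all names are assumptions, not from the slides):

```python
from itertools import combinations
from typing import List, Tuple


def union(M: List[List[int]], cols: Tuple[int, ...]) -> Tuple[int, ...]:
    """Boolean sum (entrywise OR) of the given columns."""
    return tuple(int(any(row[j] for j in cols)) for row in M)


def hamming(u: Tuple[int, ...], v: Tuple[int, ...]) -> int:
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def min_union_distance(M: List[List[int]], d: int) -> int:
    """Minimum Hamming distance between unions of two different sets of d columns.

    Needs at least two such sets (n > d), otherwise min() raises ValueError.
    """
    n = len(M[0])
    unions = [union(M, cols) for cols in combinations(range(n), d)]
    return min(hamming(u, v) for u, v in combinations(unions, 2))


def is_e_error_correcting(M: List[List[int]], d: int, e: int) -> bool:
    return min_union_distance(M, d) >= 2 * e + 1


# Each clone is tested alone, three times (three stacked copies of the 3 x 3 identity).
M = [[1 if i % 3 == j else 0 for j in range(3)] for i in range(9)]
print(min_union_distance(M, 1))         # → 6
print(is_e_error_correcting(M, 1, 2))   # → True  (6 >= 5)
print(is_e_error_correcting(M, 1, 3))   # → False (6 < 7)
```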

  26. (d,e)-disjunct Matrix
     • Definition: A t × n binary matrix M is (d, e)-disjunct if for any column j and any d other columns j1, j2, …, jd, there exist e + 1 rows i0, i1, …, ie such that M[iu, j] = 1 and M[iu, jv] = 0 for u = 0, 1, …, e and v = 1, 2, …, d.
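
The definition can be checked directly by counting, for every column j and every set of d other columns, the rows where column j has a 1 and all d others have a 0. A brute-force sketch (names and the example matrix are illustrative; the cost is exponential in d, so only small cases are practical):

```python
from itertools import combinations
from typing import List


def is_de_disjunct(M: List[List[int]], d: int, e: int) -> bool:
    """True iff for every column j and every d other columns there are at least
    e + 1 rows i with M[i][j] = 1 and M[i][jv] = 0 for all of those d columns."""
    t, n = len(M), len(M[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            private_rows = sum(
                1 for i in range(t) if M[i][j] and not any(M[i][c] for c in group)
            )
            if private_rows < e + 1:
                return False
    return True


# Each of three clones tested alone in three pools (illustrative matrix).
M = [[1 if i % 3 == j else 0 for j in range(3)] for i in range(9)]
print(is_de_disjunct(M, d=2, e=2))   # → True
print(is_de_disjunct(M, d=2, e=3))   # → False
```

With e = 0 this reduces to the d-disjunct condition: a single uncovered row is exactly what keeps column j out of the union.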

  27. e-Error Correcting
     • Theorem 5: For every (d, e)-disjunct matrix, the Hamming distance between any two unions of d columns is at least 2e + 2; hence a (d, e)-disjunct matrix is e-error-correcting.

  28. Theorem 6 • Theorem 6: Suppose testing is based on a (d,e)-disjunct matrix. If the number of errors is at most e, then the number of negative pools containing a positive item is always smaller than the number of negative pools containing a negative item

  29. Proof of Theorem 6
     • Let i be a positive item and j a negative item, and suppose m negative pools contain i.
     • Every pool containing i is truly positive, so these m pools must all have received errors. Hence at most e - m errors turned a negative outcome into a positive one.
     • Moreover, if no errors occurred, at least e + 1 negative pools would contain j, because M is (d, e)-disjunct.
     • Hence at least (e + 1) - (e - m) = m + 1 > m negative pools contain j.

  30. Decoding in e-error-correcting
     • Corollary: By Theorem 6, to decode the positives from testing based on a (d, e)-disjunct matrix, we only need to compute, for each item, the number of negative pools containing it and select the d items with the smallest counts. This runs in O(nt) time.

  31. Decoding Algorithm with e Errors
        T = empty set
        for each clone ci (i = 1…n)
            t(ci) = # negative pools containing ci
            T = T ∪ {t(ci)}
        end for
        Let Td = set of d smallest t(ci) in T
        return ci if t(ci) in Td
     Time complexity: O(tn)
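
A minimal sketch of this decoder. To have a (d, e)-disjunct matrix to run it on, it assumes the polynomial construction from the earlier slides with q = 5 (prime), k = 1 and s = 5: each column has s = 5 ones, and any d = 2 other columns cover at most kd = 2 of them, so the matrix is (2, 2)-disjunct. The helper names, parameters, chosen positives and flipped rows are all illustrative.

```python
import heapq
from itertools import product
from typing import List


def build_matrix(q: int, k: int, s: int) -> List[List[int]]:
    """Polynomial construction: row (x, y), column p is 1 iff p(x) = y (mod prime q)."""
    polys = list(product(range(q), repeat=k + 1))   # coefficients c0..ck

    def ev(p, x):
        return sum(c * x ** i for i, c in enumerate(p)) % q

    return [[1 if ev(p, x) == y else 0 for p in polys] for x in range(s) for y in range(q)]


def decode_with_errors(M: List[List[int]], V: List[int], d: int) -> List[int]:
    """Count the negative pools containing each clone (O(t * n)) and return the d clones
    with the smallest counts, ties broken by index."""
    t, n = len(M), len(M[0])
    counts = [sum(1 for i in range(t) if V[i] == 0 and M[i][j]) for j in range(n)]
    return sorted(heapq.nsmallest(d, range(n), key=lambda j: (counts[j], j)))


M = build_matrix(q=5, k=1, s=5)    # t = 25 pools, n = 25 clones, (2, 2)-disjunct
positives = {3, 17}
V = [int(any(row[j] for j in positives)) for row in M]
for i in (0, 6):                   # inject e = 2 errors by flipping two outcomes
    V[i] ^= 1
print(decode_with_errors(M, V, d=2))   # → [3, 17]
```

Theorem 6 is what makes "take the d smallest counts" safe here: with exactly d positives and at most e flipped outcomes, every positive clone has a strictly smaller count than every negative one.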
