shRNA libraries sequencing using DNA Sudoku Yaniv Erlich Hannon Lab erlich@cshl.edu
Preparing DNA libraries • Programmable microarray • Cloning into plasmids • Transformation • Array single colonies
The problem • Input: 40,000 bacterial colonies • Output: the sequence of the shRNA inserts • Insert type
Motivation • Filtering the correct fragments • Balanced representation • Subset selection
Clone-by-clone sequencing • Sequence each clone on a capillary platform • Caveat: cost of ~$40,000 • Conclusion: use next-generation sequencing
Naïve next-gen [Diagram labels: Solexa, Pooling, ??] • Conclusion: we need to add a source-clone identifier (barcode)
Naive barcoding [Diagram labels: Solexa, Pooling, Barcoding] • Caveats: • Order 40,000 barcodes, each ~95 nt long. • 40,000 PCR reactions. • Conclusion: we need fewer barcodes
Naive Pooling (1) [Figure: barcodes] • Case #1: Which specimen appears in both barcode #5 and barcode #B? Specimen #13!
Naive Pooling (2) [Figure: barcodes] • Case #2: Is ACGTT associated with specimens #25 (D,1) and #34 (E,2)? Or maybe with the specimens at (D,2) and (E,1)? Ambiguity • Conclusion: we should deal with shRNA ‘duplicates’
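The ambiguity in Case #2 can be reproduced with a small sketch. It assumes the source array is numbered row by row with 8 columns; this is an assumption, although it agrees with #13 = (B,5), #25 = (D,1) and #34 = (E,2) on the slides. The helper names `pools_of` and `candidates` are hypothetical.

```python
from itertools import product

ROWS = "ABCDEFGH"
COLS = 8  # assumed plate width (not stated on the slides)

def pools_of(specimen):
    """Return the (row barcode, column barcode) of a 1-based specimen number."""
    row, col = divmod(specimen - 1, COLS)
    return ROWS[row], col + 1

def candidates(rows, cols):
    """All specimens consistent with a set of positive row and column barcodes."""
    return sorted(ROWS.index(r) * COLS + c for r, c in product(rows, cols))

# Case #1: one row barcode and one column barcode pin down a single specimen.
print(candidates({"B"}, {5}))
# Case #2: a sequence present in two clones lights up two rows and two columns,
# so four specimens are consistent with the data although only two are real.
print(candidates({"D", "E"}, {1, 2}))
```

With a duplicated sequence, row and column barcodes alone cannot tell the pair (D,1)/(E,2) from the pair (D,2)/(E,1).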
Lessons learned for the desired scheme
Overview of our solution • ‘Chinese’ Pooling • PE sequencing • Barcoding • Decoding
The pooling design • Combinatorial pooling using the Chinese Remainder Theorem (CRT). • “I have never done anything 'useful'. No discovery of mine has made, or is likely to make, directly or indirectly, for good or ill, the least difference to the amenity of the world.” (G. H. Hardy, A Mathematician's Apology, 1940)
Chinese remainder riddle “An old woman goes to market and a horse steps on her basket and crushes the eggs. The rider offers to pay for the damages and asks her how many eggs she had brought. She does not remember the exact number, but when she had taken them out 2 at a time, there was one egg left. The same happened when she picked them out 3 and 5 at a time, but when she took them 7 at a time they came out even. What is the smallest number of eggs she could have had?” • The Chinese Remainder Theorem says: • There is a one-to-one correspondence between n (0 ≤ n < 2·3·5·7 = 210) and its residues modulo 2, 3, 5 and 7. • There is an easy algorithm to solve the equation system. • Answer: 91 eggs
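The "easy algorithm" can be sketched with the textbook CRT construction. This is one standard way to solve such a system, not necessarily the one used in the talk; the function name `crt` is hypothetical, and the moduli 2, 3, 5, 7 are the ones in the product 2*3*5*7 above. It needs Python 3.8+ for `math.prod` and the modular inverse via `pow`.

```python
from math import prod

def crt(residues, moduli):
    """Smallest n >= 0 with n = r_i (mod m_i), for pairwise-coprime moduli m_i."""
    total = prod(moduli)
    n = 0
    for r, m in zip(residues, moduli):
        partial = total // m                     # product of all the other moduli
        n += r * partial * pow(partial, -1, m)   # pow(.., -1, m): inverse mod m
    return n % total

# One egg left over when counting by 2, 3 and 5; none left when counting by 7.
print(crt([1, 1, 1, 0], [2, 3, 5, 7]))  # -> 91
```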
Pooling construction with modular equations [Figure labels: Specimen, Pooling window, Destination well (different plates)] • One-to-One correspondence…
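A minimal sketch of the one-to-one correspondence between a specimen and its destination wells. It assumes specimens are numbered from 0 and that the destination well in each pooling group is the specimen number modulo that group's pooling window. The windows (5, 8, 9), the array size of 20, and the names `destination_wells` and `decode` are illustrative choices, not values from the experiment.

```python
def destination_wells(specimen, windows):
    """One destination well per pooling window: the residue of the specimen."""
    return tuple(specimen % x for x in windows)

def decode(wells, windows, n_specimens):
    """Specimens consistent with one positive well in every pooling window."""
    consistent = set(range(n_specimens))
    for well, x in zip(wells, windows):
        # Specimens well, well + x, well + 2x, ... all land in this well.
        consistent &= set(range(well, n_specimens, x))
    return sorted(consistent)

windows = (5, 8, 9)                      # pairwise coprime, product 360
wells = destination_wells(13, windows)   # (3, 5, 4)
print(wells, decode(wells, windows, 20))
```

Because the windows are pairwise coprime and their product exceeds the array size, each tuple of wells points back to exactly one specimen.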
Example of Chinese pooling (03/06/09) • Source array: [figure]
Chinese Remainder Pooling Design • Inputs: N (number of specimens in the experiment) and W (weight: the pooling effort) • Algorithm: • 1. Find W numbers {x1, x2, …, xW} such that each is: • Bigger than √N • Pairwise coprime with the others • For instance: {5,8,9} but not {5,6,9} • 2. Generate W modular equations: n ≡ r_i (mod x_i), i = 1, …, W • 3. Construct the pooling design from the modular equations • Output: Pooling design • The Chinese Remainder Theorem asserts: (1) Two specimens will meet in no more than one pool. (2) The number of pools is x1 + x2 + … + xW. • Number of barcodes (bc): [formula missing]
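The three steps can be sketched as follows. The sketch assumes the lower limit on the windows is √N, which is sufficient for property (1) because any two windows then multiply to more than N. The greedy search and the names `choose_windows`, `pooling_design` and `max_shared_pools` are hypothetical and not taken from the talk.

```python
from itertools import combinations
from math import gcd, isqrt

def choose_windows(n_specimens, weight):
    """Step 1: greedily pick `weight` pairwise-coprime integers above sqrt(N)."""
    windows = []
    candidate = isqrt(n_specimens) + 1  # smallest integer strictly above sqrt(N)
    while len(windows) < weight:
        if all(gcd(candidate, x) == 1 for x in windows):
            windows.append(candidate)
        candidate += 1
    return windows

def pooling_design(n_specimens, windows):
    """Steps 2-3: specimen n goes to pool (i, n mod x_i) for every window x_i."""
    return {n: [(i, n % x) for i, x in enumerate(windows)]
            for n in range(n_specimens)}

def max_shared_pools(design):
    """Largest number of pools that any two specimens have in common."""
    return max(len(set(a) & set(b)) for a, b in combinations(design.values(), 2))

windows = choose_windows(20, 3)
print(windows, sum(windows))                            # windows, number of pools
print(max_shared_pools(pooling_design(20, [5, 8, 9])))  # coprime windows
print(max_shared_pools(pooling_design(20, [5, 6, 9])))  # 6 and 9 share a factor
```

With {5,6,9}, specimens 0 and 18 fall into the same pool for both window 6 and window 9, which is why that set is rejected.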
How good is our method?
Barcode reduction • IEEE Transactions on Information Theory (1964): proved from purely combinatorial constraints, the theoretical lower bound on the number of barcodes is [formula missing] • Our method is very close to the theoretical lower bound
How good is our method?
Dealing with duplicates - simulation [Plot: probability of correct decoding vs. duplicates size; 0.99 marked on the probability axis] • 40,000 specimens with only 384 barcodes
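The effect of duplicates can be illustrated with a toy decoder. This is an assumed, naive rule (report a specimen when every one of its pools contains the sequence), not the decoder behind the simulation. The windows (5, 6, 7), the array size of 20, and the names `positive_pools` and `naive_decode` are illustrative. Because two specimens share at most one pool, this rule is exact whenever a sequence occurs in fewer than W specimens; larger duplicate groups can produce false positives.

```python
def positive_pools(specimens, windows):
    """Pools in which a sequence carried by these specimens would be observed."""
    return {(i, n % x) for n in specimens for i, x in enumerate(windows)}

def naive_decode(pools, windows, n_specimens):
    """Report every specimen whose pools are all positive (assumed toy rule)."""
    return [n for n in range(n_specimens)
            if all((i, n % x) in pools for i, x in enumerate(windows))]

windows = (5, 6, 7)  # W = 3, pairwise coprime, each above sqrt(20)

# Duplicate of size 2 (< W): decoded exactly.
print(naive_decode(positive_pools({3, 11}, windows), windows, 20))

# Duplicate of size 3 (= W): specimens that do not carry the sequence
# also pass the test, because each of their pools is covered by a real carrier.
print(naive_decode(positive_pools({5, 6, 7}, windows), windows, 20))
```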
How good is our method? • W=5: • 5 lanes of Solexa • One week and a half of robotics
Real results… • Arabidopsis shRNA library with 17,000 shRNA fragments • Picked 40,320 bacterial colonies • Sequenced 3,000 colonies with capillary sequencing for comparison. • Decoded ~20,500 bacterial colonies with correct inserts • 96% of the assignments were correct. • ~8,000 unique fragments of the library.
Future directions • Developing a more advanced decoder using a machine-learning approach • 2-stage algorithm
Acknowledgements • Greg Hannon • Oron Navon and Roy Ronen • Ken Chang • Michelle Rooks • Assaf Gordon