Watson School of Biological Sciences Cold Spring Harbor Laboratory

Compressed Sensing Approaches for High Throughput Carrier Screen. Watson School of Biological Sciences Cold Spring Harbor Laboratory. Yaniv Erlich. Joint work with Noam Shental , Amnon Amir and Or Zuk. Outline. What is a carrier screen? Our vision - compressed sensing carrier screen

Presentation Transcript


  1. Compressed Sensing Approaches for High Throughput Carrier Screen Watson School of Biological Sciences Cold Spring Harbor Laboratory Yaniv Erlich Joint work with Noam Shental, Amnon Amir and Or Zuk erlich@cshl.edu

  2. Outline • What is a carrier screen? • Our vision - compressed sensing carrier screen • Unique features of our setting • Bayesian reconstruction algorithm • Simulations Compressed sensing carrier screen erlich@cshl.edu

  3. Rare recessive genetic diseases • Example: Cystic Fibrosis (name / genotype [figure] / phenotype) • Normal: healthy, ~29/30 • Carrier: healthy!, ~1/30 • Disease: affected, 0.003%

  4. Carrier breeding may lead to devastating results • Offspring of a carrier couple: non-carrier 1:4, carrier 1:2, affected 1:4
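
The 1:4 / 1:2 / 1:4 split follows from each parent passing on one of two alleles with equal probability. A minimal sketch of that enumeration; the allele labels `A` (normal) and `a` (mutant) and the function name `offspring_distribution` are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1="Aa", parent2="Aa"):
    """Enumerate the equally likely allele pairs a child can inherit."""
    counts = Counter()
    for a1, a2 in product(parent1, parent2):
        mutant_copies = (a1 + a2).count("a")
        label = {0: "non-carrier", 1: "carrier", 2: "affected"}[mutant_copies]
        counts[label] += 1
    total = sum(counts.values())
    return {label: Fraction(n, total) for label, n in counts.items()}

dist = offspring_distribution()
for label, prob in dist.items():
    print(label, prob)
```

Passing `"AA"` for one parent shows why screening both partners matters: with only one carrier parent, no child is affected.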

  5. What can we do? • Several countries employ nationwide programs: they screen the bulk population, but only for a very limited set of genes

  6. Carrier screen - the current mechanism • Input: thousands of specimens • Output: finding carriers for rare genetic diseases • Serial processing: sequence 1 region of 1 person per reaction; expensive and does not scale • A needle in a haystack problem

  7. Carrier screens - our vision • Ultra-high throughput carrier screen: many specimens + many regions • Add more genes to the test panel while keeping the task at a tractable scale • Increase participation by reducing the cost

  8. Next generation sequencers - parallel processing • Sequence 100 million DNA molecules in a single batch (~1 week) • BUT: on pooled samples, the output is only a histogram of the DNA sequence types • How to multiplex many specimens with next generation sequencers? • Example: when pooling 4 normal specimens and 1 carrier [figure: fraction of reads, WT allele vs. mutant]
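
The pooled example can be worked through numerically. A minimal sketch, assuming every specimen contributes the same amount of DNA, every specimen is diploid, a carrier has exactly one mutant allele, and sequencing is error-free; the function name is hypothetical.

```python
def expected_mutant_fraction(n_normal, n_carriers):
    """Expected share of mutant reads in a pool of diploid specimens.

    A normal specimen contributes two wild-type alleles; a carrier
    contributes one wild-type and one mutant allele.
    """
    total_alleles = 2 * (n_normal + n_carriers)
    mutant_alleles = n_carriers
    return mutant_alleles / total_alleles

frac = expected_mutant_fraction(n_normal=4, n_carriers=1)
print(f"mutant: {frac:.0%}, WT: {1 - frac:.0%}")  # mutant: 10%, WT: 90%
```

The histogram alone says that one of the ten alleles in the pool is mutant, but not which of the five specimens carries it.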

  9. Multiplexing - the compressed sensing approach • y = Φx, where Φ is the pooling design (a 0-1 matrix with T pools as rows and N specimens as columns), x marks the carriers, and y is the ratio of carrier reads in each pool • CS principle: when x is sparse, very few measurements are sufficient for faithful reconstruction
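
A small numeric instance of y = Φx may help. This is a sketch under assumptions: the 4 × 6 design below is made up for illustration, and the normalization of carrier counts into a read ratio (dividing by twice the pool size) assumes diploid specimens contributing equal DNA.

```python
def carriers_per_pool(phi, x):
    """Plain matrix-vector product: number of carriers in each pool."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]

def carrier_read_ratio(phi, x):
    """Expected mutant-read ratio per pool (carriers / alleles in the pool)."""
    counts = carriers_per_pool(phi, x)
    return [c / (2 * sum(row)) if sum(row) else 0.0
            for c, row in zip(counts, phi)]

# Hypothetical design: T = 4 pools (rows), N = 6 specimens (columns).
phi = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
x = [0, 0, 0, 0, 1, 0]  # sparse signal: only specimen 4 is a carrier

print(carriers_per_pool(phi, x))   # [0, 1, 1, 0]
print(carrier_read_ratio(phi, x))
```

Here four pooled measurements single out one carrier among six specimens, because specimen 4 is the only one present in both positive pools and absent from both negative pools.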

  10. Distinctions from traditional CS • ‘On a budget’ compressed sensing • Not all pools were born equal • Signal domain

  12. On a budget compressed sensing • Φ = random matrix with p = 0.5 [figure labels: specimens (N), pools (t), weight (w), compression level] • A heavy-weight design requires long pooling steps and higher material consumption • A higher compression level is more prone to technical difficulties • We want a very sparse sensing matrix
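
To see why a dense random design is costly, one can count how many pools each specimen lands in. A minimal sketch, assuming each entry of Φ is drawn independently with probability p; the sizes (1000 specimens, 100 pools) and function names are arbitrary choices for illustration.

```python
import random

def random_design(n_specimens, n_pools, p=0.5, seed=0):
    """Draw a 0-1 pooling matrix whose entries are 1 with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n_specimens)]
            for _ in range(n_pools)]

def mean_weight(design):
    """Average number of pools each specimen is placed in (column weight)."""
    n_specimens = len(design[0])
    return sum(sum(row) for row in design) / n_specimens

design = random_design(n_specimens=1000, n_pools=100)
total_transfers = sum(sum(row) for row in design)

print(mean_weight(design))   # close to p * n_pools = 50
print(total_transfers)       # close to p * n_specimens * n_pools = 50,000
```

With p = 0.5 the weight grows linearly with the number of pools, so every specimen must be pipetted into about half of them.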

  13. Light Chinese Design [figure: pooling windows, mod 6 and mod 7] • Inputs: N (number of specimens in the experiment); weight W (pooling efforts) • Algorithm: 1. Find W numbers {x1, x2, …, xw} that are bigger than [expression not preserved in transcript] and pairwise coprime. 2. Generate W modular equations: [equations not preserved in transcript]. 3. Construct the pooling design upon the modular equations. • Output: sparse pooling design with [expression not preserved in transcript] • Advantages: (w-1)-disjunct matrix; the weight does not explicitly depend on the number of specimens; the compression level is [expression not preserved in transcript]; easy to debug
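
The construction steps can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: it assumes specimen n is placed in pool (n mod x_i) of the i-th group of pools, and it assumes each modulus exceeds √N, a bound chosen here because it guarantees (by the Chinese remainder theorem) that two specimens never share more than one pool. The moduli 6 and 7 echo the figure labels; N = 30 is an arbitrary choice.

```python
from itertools import combinations
from math import gcd

def chinese_pooling_design(n_specimens, moduli):
    """Build a 0-1 pooling matrix from one modular equation per modulus.

    Group i contains moduli[i] pools; specimen n goes to pool
    (n mod moduli[i]) of that group, so every specimen appears in
    exactly len(moduli) pools.
    """
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli must be pairwise coprime")
    # Assumed bound (not stated in the transcript): modulus > sqrt(N).
    if any(m * m <= n_specimens for m in moduli):
        raise ValueError("each modulus must exceed sqrt(n_specimens)")
    rows = []
    for m in moduli:
        for residue in range(m):
            rows.append([1 if n % m == residue else 0
                         for n in range(n_specimens)])
    return rows

design = chinese_pooling_design(n_specimens=30, moduli=[6, 7])
weights = [sum(row[n] for row in design) for n in range(30)]

print(len(design))   # 13 pools in total (6 + 7)
print(set(weights))  # {2}: every specimen is in exactly 2 pools
```

The weight equals the number of moduli regardless of N, which is what makes the design light; it is also easy to check by hand, since pool membership is just a remainder.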

  15. Not all pools were born equal • The sequencer does not report the absolute number of carriers in the pool • Instead: # carrier reads ~ # total sequence reads × (fraction of carriers in the pool / 2) • Pools with ↑ sequence reads and ↓ carriers provide more reliable information • The noise is not additive; it is correlated with the content of the pool • We need a reconstruction algorithm that takes into account the reliability of the data from each pool
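
The reliability claim can be made concrete with a simple noise model. A minimal sketch, assuming the number of carrier reads follows a binomial distribution with the total read count as the number of trials and (fraction of carriers / 2) as the success probability; the binomial model and the function name are assumptions for illustration.

```python
from math import sqrt

def carrier_ratio_std(total_reads, n_carriers, pool_size):
    """Standard deviation of the observed carrier-read ratio in one pool,
    under an assumed binomial sampling model."""
    p = n_carriers / (2 * pool_size)  # fraction of carriers / 2
    return sqrt(p * (1 - p) / total_reads)

# Same pool content, different sequencing depth.
shallow = carrier_ratio_std(total_reads=1_000, n_carriers=1, pool_size=50)
deep = carrier_ratio_std(total_reads=100_000, n_carriers=1, pool_size=50)

# Same depth, more carriers in the pool.
crowded = carrier_ratio_std(total_reads=1_000, n_carriers=5, pool_size=50)

print(shallow, deep, crowded)
```

Under this model the spread shrinks with more reads and grows with more carriers, so the noise depends on the pool content rather than being a fixed additive term.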

  17. Signal domain • In traditional CS: [formula not preserved in transcript] • Traditional CS decoder solves: [formula not preserved in transcript] • In compressed carrier screen: [formula not preserved in transcript] • What are the implications of using a traditional decoder and employing a rounding procedure? • Can we find a reconstruction procedure that directly finds [formula not preserved in transcript]?

  18. Bayesian reconstruction algorithm [figure labels: biological data, pooling data, Φ] • Pooling model and sequencing: only the specimens in the pool affect the pool results • Biological expectations: biologically, the genotype of one specimen does not depend on the genotype of another (unless they are relatives) • Approximation by loopy belief propagation…
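
The two modelling assumptions on this slide (independent specimens, pool results that depend only on pool members) define a posterior that can be computed exactly for a toy problem. The sketch below is not belief propagation: it enumerates all 2^N genotype vectors, which is only feasible for very small N. The binomial read model, the 1/30 prior, the 4 × 6 design and the read counts are all illustrative assumptions.

```python
from itertools import product
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_marginals(phi, total_reads, mutant_reads, prior):
    """Exact P(x_i = 1 | data) by brute-force enumeration."""
    n = len(phi[0])
    carrier_mass = [0.0] * n
    evidence = 0.0
    for x in product((0, 1), repeat=n):
        # Prior: genotypes of different specimens are independent.
        weight = 1.0
        for xi in x:
            weight *= prior if xi else 1 - prior
        # Likelihood: each pool depends only on its own members.
        for row, reads, mutants in zip(phi, total_reads, mutant_reads):
            carriers = sum(a * b for a, b in zip(row, x))
            p = carriers / (2 * sum(row))
            weight *= binom_pmf(mutants, reads, p)
        evidence += weight
        for i, xi in enumerate(x):
            if xi:
                carrier_mass[i] += weight
    return [m / evidence for m in carrier_mass]

phi = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
total_reads = [200, 200, 200, 200]
mutant_reads = [0, 35, 31, 0]  # made-up counts consistent with specimen 4

marginals = posterior_marginals(phi, total_reads, mutant_reads, prior=1 / 30)
print([round(m, 3) for m in marginals])
```

Loopy belief propagation approximates these marginals by passing messages between specimens and pools instead of enumerating every genotype vector, which is what makes large N tractable.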

  19. Advantages of Belief Propagation • Bottom-up approach: weighs the reliability of each individual pool • Bayesian: everything speaks the same language; can incorporate a priori medical information and familial connections • Encoding advantage: Chinese pooling ensures that there are no short cycles • Binary results directly: no rounding procedure at the end [figure labels: pooling data, biological data]

  20. Simulations of compressed carrier screen in Ashkenazi Jews • Finding carriers for two Ashkenazi Jewish diseases: Tay-Sachs and Bloom syndrome • Chinese pooling design • Comparing GPSR (traditional solver) and BP • Evaluating Nmax: the largest number of specimens for which at least 48 out of 50 runs give 100% accuracy
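
The Nmax criterion is simple to state in code. A minimal sketch; the function name and the run outcomes below are placeholders invented to exercise the function, not results from the simulations.

```python
def n_max(results, required=48):
    """Largest N whose runs meet the success threshold.

    results maps a specimen count N to a list of booleans, one per run,
    where True means every genotype was reconstructed correctly.
    Returns None if no N meets the threshold.
    """
    passing = [n for n, runs in results.items() if sum(runs) >= required]
    return max(passing) if passing else None

# Placeholder outcomes (NOT real data): 50 runs per specimen count.
toy_results = {
    1000: [True] * 50,
    2000: [True] * 48 + [False] * 2,
    3000: [True] * 40 + [False] * 10,
}

print(n_max(toy_results))  # 2000
```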

  21. Results [figure: BP vs. GPSR for Bloom and Tay-Sachs; panel labels: Pools/Specimens = 6.5% and Pools/Specimens = 13%]

  22. Conclusions • The CS framework can be utilized for ultra-high throughput carrier screens • Our setting shows several unique features not found in the traditional framework • We suggest tailored encoding (light Chinese) and decoding (BP) procedures • At least in our settings, a tailored decoder, BP, has an advantage over reconstructing with an off-the-shelf CS solver • CS carrier screen has the potential to dramatically reduce the cost of sequencing

  23. The real thing • An ongoing study…

  24. Acknowledgements • Funding: Lindsay Goldberg PhD Fellowship; ACM/IEEE-CS HPC PhD Fellowship • Greg Hannon • Noam Shental & Amnon Amir • Or Zuk • Igor Carron (Nuit Blanche) • For more information: hannonlab.cshl.edu/labmembers/erlich

  25. Loopy belief propagation is tricky • Damping is the key
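
Damping means blending each newly computed value with the previous one instead of replacing it outright. The sketch below shows the effect on a toy one-dimensional fixed-point iteration rather than on belief propagation messages; the update rule, the damping factor 0.5 and the function names are assumptions chosen to make the behaviour easy to see.

```python
def iterate(update, x0, damping=0.0, steps=50):
    """Repeat x <- damping * x + (1 - damping) * update(x)."""
    x = x0
    for _ in range(steps):
        x = damping * x + (1 - damping) * update(x)
    return x

def toy_update(x):
    """Fixed point at 0.5, but the slope (-1.4) makes plain iteration
    overshoot further on every step."""
    return 1.2 - 1.4 * x

undamped = iterate(toy_update, x0=0.4, damping=0.0)
damped = iterate(toy_update, x0=0.4, damping=0.5)

print(undamped)  # far from 0.5: the oscillation keeps growing
print(damped)    # settles at the fixed point 0.5
```

In loopy belief propagation the same blend is applied to every message, trading some speed for stability on graphs with cycles.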

  27. Pooling imperfections • Background contamination • Pooling failures (erasures) [figure: data from a real experiment, mod 377; # reads per pool, with pools not in use marked]

  28. Distinctions from traditional CS • ‘On a budget’ compressed sensing • Not all pools were born equal • Pooling imperfections • Signal domain
