Compressed Sensing Approaches for High Throughput Carrier Screen
Watson School of Biological Sciences, Cold Spring Harbor Laboratory
Yaniv Erlich
Joint work with Noam Shental, Amnon Amir and Or Zuk
erlich@cshl.edu

Outline
• What is a carrier screen?
• Our vision - compressed sensing carrier screen
• Unique features of our setting
• Bayesian reconstruction algorithm
• Simulations
Compressed sensing carrier screen erlich@cshl.edu

Rare recessive genetic diseases
Example: Cystic Fibrosis
Name      Phenotype   Frequency
Normal    Healthy     ~29/30
Carrier   Healthy!    ~1/30
Disease   Affected    0.003%
[Genotype column: shown graphically on the slide]

Carrier breeding may lead to devastating results
Offspring of a carrier couple:
• Non-carrier - 1:4
• Carrier - 1:2
• Affected - 1:4

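
The 1:4, 1:2, 1:4 split follows from enumerating the four equally likely allele combinations of two carrier parents. A minimal sketch, assuming each parent passes on one of its two alleles with equal probability; the function name and the "W"/"m" allele labels are illustrative and not from the talk:

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a, parent_b):
    """Enumerate a single-gene cross; 'W' is the normal allele, 'm' the mutant."""
    counts = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        n_mutant = (allele_a == "m") + (allele_b == "m")
        label = ("non-carrier", "carrier", "affected")[n_mutant]
        counts[label] = counts.get(label, 0) + 1
    total = sum(counts.values())
    return {label: Fraction(n, total) for label, n in counts.items()}


# Both parents are carriers: one normal and one mutant allele each.
dist = offspring_distribution("Wm", "Wm")
print(dist)
```
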
What can we do?
• Several countries employ nationwide programs
  - screen the bulk population
  - very limited set of genes

Carrier screen - the current mechanism
• Input: thousands of specimens
• Output: finding carriers for rare genetic diseases
• Serial processing:
  - sequence: 1 region of 1 person per reaction
  - expensive and does not scale
• A needle in a haystack problem

Carrier screens - our vision
Ultra-high throughput carrier screen: many specimens + many regions
• Adding more genes to the test panel while keeping the task at a tractable scale
• Increasing participation by reducing the cost

Next generation sequencers – parallel processing
• Sequence 100 million DNA molecules in a single batch (~1 week)
• BUT: on pooled samples, the output is only a histogram of the DNA sequence types
• How to multiplex many specimens with next generation sequencers?
Example: when pooling 4 normal specimens and 1 carrier
[Figure: fraction of reads for the WT allele vs. the mutant allele]

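
For the pooled example, the expected read histogram can be worked out by counting alleles. A minimal sketch, assuming every specimen contributes the same amount of DNA, every specimen has two alleles, and each carrier has exactly one mutant allele; the function name is illustrative:

```python
from fractions import Fraction


def expected_read_fractions(n_normal, n_carriers):
    """Expected share of WT and mutant reads in a pool of diploid specimens."""
    total_alleles = 2 * (n_normal + n_carriers)
    mutant = Fraction(n_carriers, total_alleles)  # one mutant allele per carrier
    return {"WT": 1 - mutant, "mutant": mutant}


# 4 normal specimens and 1 carrier: 10 alleles in the pool, 1 of them mutant.
example = expected_read_fractions(4, 1)
print(example)
```

Under these assumptions a single carrier among five specimens shows up as one tenth of the reads, so larger pools push the signal toward the noise floor.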
Multiplexing - the compressed sensing approach
y = Φx
• Φ: pooling design, a 0-1 matrix with T rows (pools) and N columns (specimens)
• x: carrier vector (one entry per specimen)
• y: the ratio of carrier reads in each pool
CS principle: when x is sparse, very few measurements are sufficient for faithful reconstruction.

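
The product y = Φx can be illustrated on a toy design. A minimal sketch with plain lists; the 3-by-6 matrix and the carrier vector are made up for illustration, and the output here is the number of carriers per pool before any normalization into a read ratio:

```python
def pool_carrier_counts(design, x):
    """Multiply a 0-1 pooling design (rows = pools) by a 0-1 carrier vector."""
    return [sum(phi * xi for phi, xi in zip(row, x)) for row in design]


# Hypothetical design: 3 pools (rows) over 6 specimens (columns).
design = [
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
]
x = [0, 0, 0, 0, 1, 0]  # sparse signal: only specimen 4 is a carrier
y = pool_carrier_counts(design, x)
print(y)
```

Only the pool that contains specimen 4 gives a non-zero measurement, which is the sparsity that the CS principle relies on.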
Distinctions from traditional CS
• ‘On a budget’ compressed sensing
• Not all pools were born equal
• Signal domain

On a budget compressed sensing
[Figure: pooling design Φ; axes: Specimens (N), Pools (t); annotations: Weight (w), Compression level]
• Φ = random matrix with p = 0.5
• Heavy weight design requires long pooling steps and higher material consumption
• Higher compression level is more prone to technical difficulties
• We want a very sparse sensing matrix

Light Chinese Design
• Inputs:
  - N (number of specimens in the experiment)
  - w, the weight (pooling efforts)
• Algorithm:
  1. Find w numbers {x1, x2, …, xw} that are:
     - bigger than [bound not preserved]
     - pairwise coprime
  2. Generate w modular equations: [equations not preserved]
  3. Construct the pooling design from the modular equations
• Output: sparse pooling design with [pool count not preserved]
• Advantages:
  - (w-1)-disjunct matrix
  - The weight does not explicitly depend on the number of specimens
  - The compression level is [expression not preserved]
  - Easy to debug
[Figure: example pooling windows, mod 6 and mod 7]

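
The three algorithm steps can be sketched in code. The lower bound on the moduli and the modular equations themselves are not legible in this transcript, so the sketch makes two assumptions that should be read as illustrative: the moduli are the smallest pairwise coprime integers above sqrt(N), and specimen n joins pool (n mod x_i) in the window of modulus x_i. Function names are hypothetical.

```python
from math import gcd, isqrt


def choose_moduli(n_specimens, weight):
    """Pick `weight` pairwise coprime integers above sqrt(N) (assumed bound)."""
    moduli = []
    candidate = isqrt(n_specimens) + 1
    while len(moduli) < weight:
        if all(gcd(candidate, m) == 1 for m in moduli):
            moduli.append(candidate)
        candidate += 1
    return moduli


def light_chinese_design(n_specimens, weight):
    """Build a 0-1 design with one window of pools per modulus."""
    moduli = choose_moduli(n_specimens, weight)
    design = [[0] * n_specimens for _ in range(sum(moduli))]
    offset = 0
    for m in moduli:
        for specimen in range(n_specimens):
            # Assumed rule: specimen n goes to pool (n mod m) of this window.
            design[offset + specimen % m][specimen] = 1
        offset += m
    return moduli, design


moduli, design = light_chinese_design(n_specimens=40, weight=3)
column_weights = [sum(row[s] for row in design) for s in range(40)]
print(moduli, len(design), set(column_weights))
```

With these choices every specimen is pipetted into exactly w pools, and two different specimens can share at most one pool, because agreeing on two moduli would force them to agree modulo a product larger than N.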
Not all pools were born equal
• The sequencer does not report the absolute number of carriers in the pool
• Instead, # carrier reads is a random draw determined by:
  - # total sequence reads
  - fraction of carriers in the pool / 2
• Pools with ↑ sequence reads and ↓ carriers provide more reliable information.
• The noise is not additive; it correlates with the content of the pool.
• We need a reconstruction algorithm that takes into account the reliability of the data from each pool.

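
The reliability argument can be made concrete under one explicit assumption: that the number of carrier reads is binomial, with the total number of reads as the trial count and (fraction of carriers in the pool) / 2 as the success probability. The distribution name is not preserved in this transcript, so the binomial model and the function names below are assumptions for illustration.

```python
from math import sqrt


def mutant_read_probability(n_carriers, pool_size):
    """Fraction of carriers in the pool, divided by 2 (one mutant allele each)."""
    return (n_carriers / pool_size) / 2


def ratio_standard_error(n_carriers, pool_size, total_reads):
    """Standard error of the observed carrier-read ratio under a binomial model."""
    p = mutant_read_probability(n_carriers, pool_size)
    return sqrt(p * (1 - p) / total_reads)


few_reads = ratio_standard_error(n_carriers=1, pool_size=10, total_reads=1_000)
many_reads = ratio_standard_error(n_carriers=1, pool_size=10, total_reads=10_000)
more_carriers = ratio_standard_error(n_carriers=3, pool_size=10, total_reads=1_000)
print(few_reads, many_reads, more_carriers)
```

Under this model the spread shrinks with more reads and grows with more carriers in the pool, so the noise depends on the pool's content rather than being a fixed additive term.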
Signal Domain
• In traditional CS: [signal-domain formula not preserved]
• Traditional CS decoder solves: [optimization problem not preserved]
• In compressed carrier screen: [signal-domain formula not preserved]
• What are the implications of using a traditional decoder and then applying a rounding procedure?
• Can we find a reconstruction procedure that directly finds a solution in the carrier-screen domain?

Bayesian reconstruction algorithm
[Figure: diagram linking biological data and pooling data through Φ]
• Pooling model and sequencing: only the specimens in a pool affect that pool's results
• Biological expectations: the genotype of one specimen does not depend on the genotype of another (unless they are relatives)
• Approximation by loopy Belief Propagation…

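
The two modeling ingredients (a per-pool likelihood and independent per-specimen priors) can be illustrated on a tiny example by exhaustive enumeration. This is not belief propagation: it computes exactly, at a cost exponential in N, the marginals that loopy BP is meant to approximate. The binomial read model, the 1/30 prior, the `error_rate` parameter, and all names and numbers below are assumptions for illustration.

```python
from itertools import product
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def carrier_posteriors(pools, mutant_reads, total_reads, n_specimens,
                       carrier_prior=1 / 30, error_rate=0.001):
    """Exact posterior carrier probability per specimen (brute force)."""
    marginals = [0.0] * n_specimens
    evidence = 0.0
    for x in product((0, 1), repeat=n_specimens):
        weight = 1.0
        for status in x:  # independent prior on each genotype
            weight *= carrier_prior if status else 1 - carrier_prior
        for members, k, n in zip(pools, mutant_reads, total_reads):
            # Only the specimens in a pool affect that pool's reads.
            fraction = sum(x[s] for s in members) / (2 * len(members))
            p = error_rate + (1 - 2 * error_rate) * fraction
            weight *= binomial_pmf(k, n, p)
        evidence += weight
        for s, status in enumerate(x):
            if status:
                marginals[s] += weight
    return [m / evidence for m in marginals]


# Hypothetical toy run: 6 specimens, windows mod 3 and mod 4, carrier = specimen 4.
pools = [[0, 3], [1, 4], [2, 5], [0, 4], [1, 5], [2], [3]]
mutant_reads = [0, 25, 0, 25, 0, 0, 0]
total_reads = [100] * 7
posteriors = carrier_posteriors(pools, mutant_reads, total_reads, n_specimens=6)
calls = [int(p > 0.5) for p in posteriors]
print(calls)
```
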
Advantages of Belief Propagation
• Bottom-up approach – weighs the reliability of each individual pool
• Bayesian – everything speaks the same language; can incorporate a-priori medical information and familial connections
• Encoding advantage – Chinese pooling ensures that there are no short cycles
• Binary results directly – no rounding procedure at the end
[Figure labels: Pooling data, Biological data]

Simulations of compressed carrier screen in Ashkenazi Jews
• Finding carriers for two diseases prevalent among Ashkenazi Jews: Tay-Sachs and Bloom syndrome
• Chinese pooling design
• Comparing GPSR (traditional solver) and BP
• Evaluating Nmax – the largest number of specimens for which at least 48 out of 50 runs give 100% accuracy

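
The Nmax criterion can be written as a small helper. A minimal sketch, reading "largest number of specimens" literally as the largest tested N that meets the 48-of-50 threshold; the function name is hypothetical and the accuracy lists are made-up placeholders, not results from the talk:

```python
def n_max(accuracy_by_n, min_perfect_runs=48):
    """Largest tested N whose runs reach 100% accuracy often enough."""
    passing = [
        n for n, runs in accuracy_by_n.items()
        if sum(1 for accuracy in runs if accuracy == 1.0) >= min_perfect_runs
    ]
    return max(passing) if passing else None


# Placeholder inputs: 50 simulated runs per tested N (invented numbers).
toy_runs = {
    10: [1.0] * 50,
    20: [1.0] * 48 + [0.9] * 2,
    30: [1.0] * 47 + [0.9] * 3,
}
print(n_max(toy_runs))
```
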
Results
[Figure: panels for Bloom and Tay-Sachs comparing BP and GPSR; annotations: Pools/Specimens = 6.5% and Pools/Specimens = 13%]

Conclusions
• The CS framework can be utilized for ultra-high throughput carrier screens.
• Our setting shows several unique features that are absent from the traditional framework.
• We suggest tailored encoding (light Chinese) and decoding (BP) procedures.
• At least in our settings, a tailored decoder, BP, has an advantage over reconstruction with an off-the-shelf CS solver.
• CS carrier screen has the potential to dramatically reduce the cost of sequencing.

The real thing
An ongoing study…

Acknowledgements
Funding:
• Lindsay Goldberg PhD Fellowship
• ACM/IEEE-CS HPC PhD Fellowship
Thanks:
• Greg Hannon
• Noam Shental & Amnon Amir
• Or Zuk
• Igor Carron (Nuit Blanche)
For more information: hannonlab.cshl.edu/labmembers/erlich

Loopy belief propagation is tricky
• Damping is the key

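
The slide does not say which damping scheme is used. A common form mixes each newly computed message with its previous value, and the sketch below shows that form on a toy scalar update that oscillates without damping. The update rule, the names, and the damping factor are all illustrative assumptions, not the authors' implementation.

```python
def damped_update(old, proposed, damping=0.5):
    """Convex mix of the previous message and the newly computed one."""
    return damping * old + (1.0 - damping) * proposed


def iterate(update, start, steps, damping):
    value = start
    for _ in range(steps):
        value = damped_update(value, update(value), damping)
    return value


def flip(message):
    """Toy update whose undamped iteration bounces between two values."""
    return 1.0 - message


undamped = iterate(flip, start=0.2, steps=3, damping=0.0)  # 0.2, 0.8, 0.2, 0.8
damped = iterate(flip, start=0.2, steps=3, damping=0.5)    # settles at 0.5
print(undamped, damped)
```
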
Pooling imperfections
• Background contamination
• Pooling failures (erasures)
[Figure: data from a real experiment, mod 377 window; axes: # Reads vs. Pools; annotation: pools not in use]

Distinctions from traditional CS
• ‘On a budget’ compressed sensing
• Not all pools were born equal
• Pooling imperfections
• Signal domain
