New Results and Open Problems for Insertion/Deletion Channels and Related Problems

Michael Mitzenmacher, Harvard University

The Most Basic Channels • Binary erasure channel. • Each bit is replaced by a ? with probability p. • Very well understood. • Binary symmetric channel. • Each bit flipped with probability p. • Very well understood. • Binary deletion channel. • Each bit deleted with probability p. • We don’t even know the capacity!!! M. Mitzenmacher

Motivation This seems a disturbing, sad state of affairs. It bothers me greatly. And there may be applications… Hard disks, pilot tones, genetics, etc.

What’s the Problem? • Erasure and error channels have pleasant symmetries; deletion channels do not. • Example: • Delete one bit from 1010101010. • Delete one bit from 0000000000. • Understanding this asymmetry seems fundamental. • Requires deep understanding of combinatorics of random sequences and subsequences. • Not a historical strength of coding theorists. • But it is for CS theory….
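The asymmetry in the example can be checked by brute force. The sketch below enumerates the distinct strings obtained by deleting exactly one bit; the helper name `single_deletion_outcomes` is ours for illustration and does not come from the talk.

```python
def single_deletion_outcomes(bits):
    """Return the set of distinct strings obtained by deleting one character."""
    return {bits[:i] + bits[i + 1:] for i in range(len(bits))}

alternating = single_deletion_outcomes("1010101010")
constant = single_deletion_outcomes("0000000000")
print(len(alternating), len(constant))  # prints: 10 1
```

For the alternating string every deletion position gives a different output, while for the all-zeros string the output says nothing about which position was deleted.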

In This Talk • Main result: capacity of binary deletion channel is at least (1 – p)/9. • Compare to capacity (1 – p) for erasure channel. • First within-constant-factor result. • Still not tight…. • We describe path to this result. • Generally, we’ll follow the history chronologically. • We describe recent advances on related problems. • Insertion channels, alternative models. • We describe many related open problems. • What do random subsequences of random sequences look like?

Capacity Lower Bounds Shannon-based approach: • Choose a random codebook. • Define “typical” received sequences. • Construct a decoding algorithm.

Capacity Lower Bounds: Erasures • Choose a random codebook. • Each bit chosen uniformly at random. • Define “typical” received sequences. • No more than (p + ε) fraction of erasures. • Construct a decoding algorithm. • Find unique matching codeword.

Capacity Lower Bounds: Errors • Choose a random codebook. • Each bit chosen uniformly at random. • Define “typical” received sequences. • Between (p – ε) and (p + ε) fraction of errors. • Construct a decoding algorithm. • Find unique matching codeword.

Capacity Lower Bounds: Deletions • Choose a random codebook. • Each bit chosen uniformly at random. • Define “typical” received sequences. • No more than (p + ε) fraction of deletions. • Construct a decoding algorithm. • Find unique matching codeword. Yields poor bounds, and no bound for p > 0.5.

GREEDY Subsequence Algorithm • Is S a subsequence of T? • Start from the leftmost point of S and T. • Move right on T until the next character of S is matched. • Then move to the next character of T and repeat. • Example: T = 000110010, S = 01010.
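A minimal sketch of the GREEDY subsequence test, run on the slide's example. The function name `greedy_is_subsequence` is ours for illustration.

```python
def greedy_is_subsequence(s, t):
    """Scan t left to right, matching the characters of s in order."""
    i = 0  # index of the next unmatched character of s
    for ch in t:
        if i < len(s) and ch == s[i]:
            i += 1  # matched s[i]; look for the next character of s
    return i == len(s)

print(greedy_is_subsequence("01010", "000110010"))  # prints: True
```

Matching each character of S at its earliest possible position in T never hurts, so if S is a subsequence of T at all, this scan finds an embedding.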

Basic Failure Argument • When codeword X of length n is sent, and p is just greater than 0.5, received sequence R has just less than n/2 bits. • Is R a subsequence of another codeword Y? • Consider the GREEDY algorithm. • If Y is chosen u.a.r., on average two bits of Y are needed to cover each bit of R. • So most other codewords match!
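The failure argument can be illustrated with a small Monte Carlo experiment. This is only a sketch under assumed parameters (n = 1000, a received sequence of 480 uniform bits, 300 trials); the function names and the parameter values are ours, not from the talk.

```python
import random

def greedy_is_subsequence(r, y):
    """GREEDY test: scan y left to right, matching bits of r in order."""
    i = 0
    for bit in y:
        if i < len(r) and bit == r[i]:
            i += 1
    return i == len(r)

def fraction_covering(n=1000, r_len=480, trials=300, seed=0):
    """Fraction of uniform random length-n words Y that contain a fixed
    uniform random R of length r_len as a subsequence."""
    rng = random.Random(seed)
    r = [rng.randint(0, 1) for _ in range(r_len)]
    hits = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]
        hits += greedy_is_subsequence(r, y)
    return hits / trials

print(fraction_covering())
```

Since GREEDY consumes about two bits of Y per bit of R, a received sequence a little shorter than n/2 is covered by a large share of unrelated random codewords, which is why unique decoding fails for uniform random codebooks in this regime.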

Deletions: Diggavi/Grossglauser • Choose a random codebook. • Codeword sequences chosen by a symmetric first order Markov chain. • Define “typical” received sequences. • No more than (p + ε) fraction of deletions. • Construct a decoding algorithm. • Find unique matching codeword.

Symmetric First Order Markov Chain [Figure: two-state transition diagram with states 0 and 1; edges labeled (probability/emitted bit) 1–q/1, q/0, q/1, 1–q/0.] 0’s tend to be followed by 0’s, 1’s tend to be followed by 1’s.

Intuition • To send a 0 bit, if deletions are likely, send many copies in a block. • Lowers the rate by a constant factor. • But makes it more likely that the bit gets through. • First order Markov chain gives natural blocks.

Diggavi/Grossglauser Results • Calculate distribution of number of bits required for GREEDY to cover each bit of received sequence R using “random” codeword Y. • If R is a subsequence of Y, GREEDY algorithm will show it! • Received sequence R also behaves like a symmetric first order Markov chain, with parameter q’. • Use Chernoff bounds to determine how many codewords Y of length n are needed before R is covered. • Get a lower bound on capacity!

The Block Point of View • Instead of thinking of codewords being randomly chosen bit by bit: 0, 00, 000, 0001, 00011, 000110, 0001101…. • Think of codewords as being a sequence of maximal blocks: 000, 00011, 000110, ….

Improvements, Random Codebook • Choose a random codebook. • Codeword sequences chosen by laying out blocks according to a given distribution. • Define “typical” received sequences. • No more than (p + ε) fraction of deletions, and number of blocks of each length close to the expectation. • Construct a decoding algorithm. • Find unique matching codeword.

Changing the Codebook • Fix a distribution Z on positive integers. • Probability of j is Z_j. • Start sequence with 0’s. • First block of 0’s has length given by Z. Then block of 1’s has length given by Z. And so on. • Generalizes previous work: first order Markov chains lead to geometric distributions Z.

Choosing a Distribution • Intuition: when a mismatch between received sequence and random codeword occurs under GREEDY, want it to be long lasting with significant probability. • (a,b,q)-distributions: • A short block of length a with probability q, a long block of length b with probability 1–q. • Like Morse code.
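The block construction can be sketched as follows. This is an illustrative sampler, not the talk's code: it fixes the number of blocks rather than the codeword length, and the example values (a, b, q) = (1, 8, 0.3) are hypothetical, chosen only to show the shape of an (a,b,q)-distribution.

```python
import random
from itertools import groupby

def sample_codeword(n_blocks, z, rng):
    """z maps block length j to probability Z_j.
    Blocks alternate between 0's and 1's, starting with 0's."""
    lengths = rng.choices(list(z.keys()), weights=list(z.values()), k=n_blocks)
    return "".join(str(i % 2) * length for i, length in enumerate(lengths))

def maximal_blocks(bits):
    """Decompose a bit string into (bit, length) pairs for its maximal blocks."""
    return [(b, len(list(g))) for b, g in groupby(bits)]

rng = random.Random(0)
abq = {1: 0.3, 8: 0.7}  # hypothetical (a, b, q) = (1, 8, 0.3)
word = sample_codeword(6, abq, rng)
print(word)
print(maximal_blocks(word))
```

Passing a geometric distribution for `z` would correspond to the first order Markov chain codebooks as a special case.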

Results So Far

So Far… • Decoding algorithm has always been GREEDY. • Can’t we do better? • For bigger capacity improvements, it seems we need better decoding. • Best algorithm: maximum likelihood. • Find the most likely codeword given the received sequence.

Maximum Likelihood • Pick the most likely codeword. • Given codeword X and received sequence R, count the number of ways R is obtained as a subsequence of X. Most likely = biggest count. • Via dynamic programming. • Let C(j,k) = number of ways the first k characters of R occur as a subsequence of the first j characters of X. • Potentially exponential time, but we just want capacity bounds.

The Recurrence • C(j,k) = C(j – 1,k) + I[X_j = R_k] · C(j – 1,k – 1). • I would love to analyze this recurrence when: • R is independent of X. • R is obtained from X by random deletions. • If the I[X_j = R_k] values were all independent, would be possible. • But dependence in both cases makes analysis challenging. • I bet someone here can do it. Let’s talk.
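A minimal sketch of the counting dynamic program. It assumes the standard subsequence-counting recurrence with base case C(j,0) = 1; the base case and the function name `count_embeddings` are our assumptions, not taken from the slides.

```python
def count_embeddings(x, r):
    """Number of ways r occurs as a subsequence of x.

    Uses C(j, k) = C(j-1, k) + [x_j == r_k] * C(j-1, k-1), with C(j, 0) = 1,
    keeping only one row of the table at a time."""
    c = [1] + [0] * len(r)  # c[k] holds C(j, k) for the current prefix of x
    for xj in x:
        # Sweep k from right to left so c[k-1] still holds C(j-1, k-1).
        for k in range(len(r), 0, -1):
            if xj == r[k - 1]:
                c[k] += c[k - 1]
    return c[len(r)]

print(count_embeddings("0011", "01"))  # prints: 4
```

The table has (|X| + 1)(|R| + 1) entries, so a single count takes time O(|X| · |R|).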

Maximum Likelihood • Standard union bound argument: • Let sequence R be obtained from codeword X via a binary deletion channel; let S be a random sequence obtained from another random codeword Y. • Let C(R) = # of ways R is a subsequence of X. • Similarly C(S) = # of ways S is a subsequence of X. • What are the distributions of C(R), C(S)? • Unknown; guess is a lognormal or power law type distribution. Also C(S) is often 0 for many parameters. • Want C(R) > C(S) with suitably high probability.

Conclude: Maximum Likelihood • This is really the holy grail. • As far as capacity arguments go. • Questions: • What is the distribution of the number of times a small “random” sequence appears as a subsequence of a larger “random” sequence? • Same question, when the smaller “random” sequence is derived from the larger through a deletion process.

Better Decoding • Maximum likelihood – haven’t got it yet… • An “approximation”, intuitively like mutual information: • Consider a received block of 0’s (or 1’s). What block(s) did it arise from? • Call that sequence a type. • For random codewords and deletions, number of (type,block) pairs for each type/block combination is highly concentrated around its expectation.

Type Examples • Codeword (spaces mark type boundaries): 1111 0000110001000 11100110011 000 • Received sequence (spaces mark block boundaries): 11 00000 1111 00 • (type,block) pairs: (1111, 11), (0000110001000, 00000), (11100110011, 1111), (000, 00)

New Decoding Algorithm • Choose a random codebook. • Codeword sequences chosen by laying out blocks according to a given distribution. • Define “typical” received sequences. • No more than (p + ε) fraction of deletions, and has near the expected number of (type,block) occurrences for each (type,block) pair. • Construct a decoding algorithm. • Find unique codeword that could be derived from the “expected” number of (type,block) occurrences given the received sequence.

Jigsaw Puzzle Decoding Received sequence: …0011001101… [Figure: jigsaw puzzle pieces, each pairing a stretch of codeword bits with the received block beneath it, shown with multiplicities ×44, ×8, ×88, ×16, ×100, ×210.]

Jigsaw Puzzle Decoding: Examples Received sequence: …00110011… [Figure: three different coverings of the received sequence by jigsaw pieces, giving the candidate codeword segments …000001011011011…, …0000111000001011…, and …00000111000011….]

Formal Argument • Calculate upper bound on number of possible jigsaw puzzle coverings. Get lower bound on capacity. • Challenge 1: Don’t get exactly the expected number of pieces for each (type,block) pair; just close. • Challenge 2: For very rare pieces, might not even be close. • End result: an expression that can be numerically computed to give a lower bound, given input distribution.

Calculations • All done by computer. • Numerical precision – not too challenging for moderate deletion probabilities. • Terms in sums become small quickly. • Fairly smooth. • We guarantee our output is a lower bound. • Computations become time-consuming for large deletion probabilities.

Improved Results

Ullman’s Bound • Ullman has an upper bound for synchronization channels. • For insertions of a specific form. • Zero-error probability. • Does not apply to this channel – although it has been used as an upper bound! • We are the first to show Ullman’s bound does not hold for this case. • What is a (non-trivial) upper bound for this channel? • Some initial results…

Insertion/Deletion Channels • Our techniques apply for some insertion/deletion channels. • GREEDY decoding does not; it depends on the received sequence being a subsequence of the original codeword. • Specifically, the case of duplications: 0 becomes 000…. • Maintains block structure.

Poisson Channels • Recall discrete Poisson distribution with mean m. • Consider a channel that replaces each bit with a Poisson number of copies. • Call this a Poisson channel. • Poisson channels can be studied using our duplication/deletion analysis. • Capacity when m = 1 is approx. 0.1171. • From numerical calculations.

Reduction! • A code for any Poisson channel gives a code for every deletion channel. • Intuition: solve deletion problem by replacing every bit by 1/(1 – p) bits. • On average, 1 copy of every original bit gets through. • Distribution of number of copies is approximately Poisson. • So it’s like a Poisson channel, but rate reduced by factor of (1 – p).

Randomized Reduction Picture • Take codeword X for Poisson channel. • Randomly expand to X’ for deletion channel using a Poisson number of copies per bit (expands by a 1/(1 – p) factor). • Send X’ over deletion channel. • Receive R. • Decode R using the Poisson channel codebook.

Reduction! • A code for any Poisson channel gives a code for every deletion channel. • To send codeword over deletion channel with deletion probability p, use a codeword X for the Poisson channel code… • But independently replace each bit by a Poisson distributed number of bits with mean 1/(1 – p). • At output, each bit of X appears as a Poisson distributed number of copies (with mean 1) – a Poisson channel. • Decode for the Poisson channel.
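The key step of the reduction is that a Poisson number of copies with mean 1/(1 – p), each deleted independently with probability p, leaves a Poisson number of copies with mean 1 (Poisson thinning). The sketch below checks this empirically for one bit; the sampler, the function names, and the choice p = 0.5 are ours for illustration.

```python
import math
import random

def poisson(mean, rng):
    """Sample a Poisson variate by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def copies_received(p, rng):
    """Copies of one codeword bit that survive expansion plus deletion."""
    sent = poisson(1.0 / (1.0 - p), rng)
    return sum(1 for _ in range(sent) if rng.random() >= p)

def empirical(p, trials=100_000, seed=1):
    """Return (mean copies received, fraction of bits with zero copies)."""
    rng = random.Random(seed)
    counts = [copies_received(p, rng) for _ in range(trials)]
    return sum(counts) / trials, counts.count(0) / trials

mean, frac_zero = empirical(0.5)
print(mean, frac_zero, math.exp(-1))
```

For a mean-1 Poisson distribution the probability of zero copies is e^(-1), so the last two printed values should be close, and the first should be close to 1.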

Capacity Result • Input to the deletion channel is 1/(1 – p) factor larger than for Poisson channel. • Implies capacity for the deletion channel is at least 0.1171(1 – p) > (1 – p)/9, since 1/9 ≈ 0.1111. • Deletion channel capacity is within a constant factor of the erasure channel capacity (1 – p). • First result of this type that we know of. • Best result (using a different mean) is 0.1185(1 – p).

More New Directions • Information theoretic analysis • Upper bounds • Sticky channels • Segmented deletion/insertion channels • Large alphabets • Trace reconstruction See survey article…

Sticky Channels • Motivation: insertion/deletion channels are hard. So what is the easiest such channel we can study? • Sticky channels: each symbol duplicated a number of times. • Like a sticky keyboard! xxxxxxxxxxxxx • Examples: each bit duplicated with probability p, each bit replaced by a geometrically distributed number of copies. • Key point: no deletions. • Intuitively easy: block structure at sender completely preserved at the receiver.

Sticky Channels: Results • New work: numerical method that gives near-tight bounds on the capacity of such channels. • Key idea: symbols are block lengths. • 000 becomes 3. • Capacity for original channel becomes capacity per unit cost in this channel. • Use techniques for capacity per unit cost for Markovian channels.
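The "symbols are block lengths" view can be sketched in a few lines. The channel model here is an assumption for illustration: each bit is transmitted once and duplicated once more with probability p. The names `run_lengths` and `duplication_channel` are ours.

```python
import random
from itertools import groupby

def run_lengths(bits):
    """Lengths of the maximal blocks of a bit string, e.g. 000 becomes 3."""
    return [len(list(g)) for _, g in groupby(bits)]

def duplication_channel(bits, p, rng):
    """Assumed sticky model: each bit appears twice with probability p, else once."""
    return "".join(b * (2 if rng.random() < p else 1) for b in bits)

rng = random.Random(0)
sent = "0001101"
received = duplication_channel(sent, 0.5, rng)
print(run_lengths(sent))  # prints: [3, 2, 1, 1]
print(run_lengths(received))
```

Because nothing is deleted, the received string has the same number of blocks as the sent one, and each block can only grow. The channel therefore acts on the sequence of block lengths one symbol at a time.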

Bounds Achieved: Duplication

Segmented Channels • Motivation: what about deletions makes them so hard? • Can we restrict deletions and make them easy? • Segmented deletion channel: guarantee at most one deletion per segment. • Example: At most 1 deletion per original byte. • Similarly, can have segmented insertion channel. • Motivation: Clock skew/timing errors.
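A small simulation of a segmented deletion channel, under an assumed probabilistic model: each segment of b bits independently loses one bit, at a uniformly random position, with probability p. The slides only state the "at most one deletion per segment" guarantee, so the model and the function name are ours.

```python
import random

def segmented_deletion_channel(bits, b, p, rng):
    """Delete at most one bit from each consecutive segment of b bits."""
    out = []
    for start in range(0, len(bits), b):
        segment = bits[start:start + b]
        if segment and rng.random() < p:
            drop = rng.randrange(len(segment))  # position of the deleted bit
            segment = segment[:drop] + segment[drop + 1:]
        out.append(segment)
    return "".join(out)

rng = random.Random(0)
sent = "10110010" * 4  # four segments of one byte each
received = segmented_deletion_channel(sent, 8, 0.5, rng)
print(len(sent), len(received))
```

The receiver does not see the segment boundaries, so even with this restriction the decoder still has to work out where each segment of the received string begins.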

Codes for Segmented Deletions: Our Approach • Create a codebook C with strings of b bits. • Codeword is concatenation of blocks from C. • Aim to decode blocks from left to right, without losing synchronization, regardless of errors. • Questions: • How can this be done? • What properties does C need? • How large can C be?
