Compressive Sensing: An Introduction and Survey of Applications
Objectives • Description of theory • Discussion of important results • Study of relevant applications
Introduction to the Problem • CS is a new paradigm that enables fast acquisition of data from a small number of samples • It tries to bring the number of samples acquired as close to the information content of the signal as possible • The classical Shannon-Whittaker sampling theorem has long monopolized the signal acquisition arena • It applies only to band-limited signals • It says the number of samples scales with the desired resolution
Disadvantages of ST • Band-limitedness is not a universal assumption • Most real-life signals have a huge frequency extent • Bandwidth is not always a true measure of information content (e.g., a train of spikes in frequency) • For increased resolution we need to increase the sampling rate (but the speed of most devices is limited) • In some other situations (e.g., MRI) the number of samples that can be acquired is inherently limited
Our objective • We want to investigate the possibility of reconstructing a signal from fewer samples than dictated by the sampling theorem • Signal model assumed: sparsity • A much broader class than band-limited signals • A nonlinear class • Sparsity of a signal = number of non-zero coefficients = the $\ell_0$ "norm" $\|x\|_0$ of its coefficient vector
Sparsity model holds for most known signals • Most naturally-occurring signals are sparse in some basis • This property is exploited in compression by transform coding: bandlimit and sample at the Nyquist rate → transform → threshold the coefficients (a sketch of this pipeline follows)
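A minimal transform-coding sketch, assuming SciPy's DCT as the sparsifying transform; the test signal and the number of kept coefficients are illustrative choices, not from the slides.

```python
# Transform coding: sample at full rate, transform, keep few coefficients.
import numpy as np
from scipy.fft import dct, idct

n = 1024
t = np.linspace(0, 1, n)
f = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)  # smooth signal

c = dct(f, norm='ortho')           # transform: coefficients are nearly sparse
S = 20                             # keep only the S largest coefficients
idx = np.argsort(np.abs(c))[-S:]
c_thr = np.zeros_like(c)
c_thr[idx] = c[idx]                # thresholding: discard the rest

f_hat = idct(c_thr, norm='ortho')  # reconstruct from the few kept coefficients
print("relative error:", np.linalg.norm(f - f_hat) / np.linalg.norm(f))
```

Note that all n samples are acquired only to be thrown away after thresholding, which is exactly the waste CS avoids.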
Even though samples are acquired at the maximum possible rate, we discard most of them • This strategy is wasteful in many ways • CS combines the acquisition and compression stages to acquire signals directly in compressed form • The cost incurred is an increased computational requirement: some reconstruction algorithm must recover the sparse result from the compressed measurements
Questions: • What is the minimum number of measurements we require? • How can reconstruction be done from the reduced number of samples? • The problem can be shown to reduce to solving an underdetermined system of linear equations • Reconstruction is possible because of the sparsity assumption • The method is non-adaptive
Some Notation • $\Phi$ = sensing basis, $\Psi$ = sparsifying basis • $\Psi$ is a set of orthonormal vectors in which the signal is known to have a sparse representation with S non-zero coefficients: $f = \Psi x$ • Measurements are linear functionals of the form $y_k = \langle f, \varphi_k \rangle$, $k \in M \subset \{1,\dots,n\}$ with $|M| = m$ • It follows that the measurements are of the form $y = \Phi\Psi x$, where x is a sparse vector (see the sketch below)
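A minimal sketch of the measurement model $y = \Phi\Psi x$, assuming a Gaussian random sensing matrix and the identity as sparsifying basis; all sizes are illustrative.

```python
# Build an S-sparse signal and take m << n random linear measurements.
import numpy as np

rng = np.random.default_rng(0)
n, m, S = 256, 64, 8                            # ambient dim, measurements, sparsity

Psi = np.eye(n)                                 # sparsifying basis (identity here)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random sensing matrix

x = np.zeros(n)                                 # S-sparse coefficient vector
support = rng.choice(n, S, replace=False)
x[support] = rng.standard_normal(S)

f = Psi @ x                                     # the signal itself
y = Phi @ f                                     # m linear measurements of f
print(y.shape)                                  # (64,)
```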
Effectively we are projecting an n-dimensional vector into an m-dimensional space • We need to make sure unique recovery is possible, i.e., $\Phi\Psi x_1 \neq \Phi\Psi x_2$ whenever $x_1 \neq x_2$ are sparse • Stability: the RIP condition = every sub-matrix of S columns is well-conditioned, with a good condition number (necessary and sufficient) • The sensing matrix and the number of measurements should be chosen so that these conditions are satisfied
Contd. • The RIP condition is related to the uncertainty principle for the two orthobases • Another property of interest is incoherence • The lower the coherence, the better the results • It is like saying the sensing matrix should be as different from the sparsifying basis as possible, or that the measurements should be holographic (non-concentrated) • Each measurement should convey the least amount of information about the signal (the "20 weights" problem) • Good choices: random measurements, or complementary bases like time and frequency
Method of reconstruction • Many candidate solutions exist to the underdetermined system • We need to pick the sparsest solution • The RIP condition ensures it is the unique solution • One could go for combinatorial optimisation (directly sift through all sparse supports to pick the actual one, i.e., $\ell_0$ minimisation) • Then only S+1 samples are required in principle • But it is NP-hard, hence highly intractable • Another choice is the $\ell_2$ norm: very easy to analyse, but it does not give sparse solutions • So the compromise is $\ell_1$-norm minimisation, which is a linear program (see the sketch below)
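A minimal basis-pursuit sketch: $\min \|x\|_1$ s.t. $Ax = y$, recast as a linear program via the standard split $x = u - v$, $u, v \ge 0$. Problem sizes and the SciPy LP solver are illustrative choices.

```python
# l1 minimisation (basis pursuit) as a linear program.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, S = 128, 48, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)

x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
y = A @ x_true

# LP: minimize 1^T [u; v]  subject to  [A, -A] [u; v] = y,  u, v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None),
              method='highs')
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```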
Geometrical Argument • [Figure: the $\ell_1$ ball] • The $\ell_1$ ball is a polytope whose vertices lie on the coordinate axes, so the smallest $\ell_1$ ball touching the affine solution set typically touches it at a vertex, i.e., at a sparse point; the $\ell_2$ ball is round and touches at a generic, non-sparse point
Result • Consider the pair of orthobases Ψ and Φ • If $m \geq C\,\mu^2(\Phi,\Psi)\,S\,\log(n/\delta)$, then with probability exceeding $1-\delta$, an x supported on a fixed set can be recovered by solving $\min_{\tilde{x}} \|\tilde{x}\|_1$ subject to $y = \Phi\Psi\tilde{x}$ • Important requirements: incoherence, randomness, RIP • This ensures that the set of measurements for which reconstruction fails occurs with very small probability: to take advantage of this, take random samples
Non-Uniform Sampling Theorem • Signal composed of S discrete frequencies • Take m random measurements in the time domain • From the above theorem, m > S·log n gives perfect reconstruction with high probability • Time and frequency domains are maximally incoherent (μ = 1), as the snippet below verifies • This is also fundamental in the sense that with fewer samples, reconstruction is virtually impossible • Example: the Dirac comb
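A small check (setup assumed, not from the slides) that the spike (time) and DFT (frequency) bases achieve the minimum possible coherence $\mu(\Phi,\Psi) = \sqrt{n}\,\max_{k,j} |\langle \varphi_k, \psi_j \rangle| = 1$.

```python
# Coherence between the identity basis and the orthonormal DFT basis.
import numpy as np

n = 64
Phi = np.eye(n)                           # time-domain sensing basis (spikes)
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)  # orthonormal DFT basis (sinusoids)

mu = np.sqrt(n) * np.max(np.abs(Phi.conj().T @ Psi))
print(mu)   # 1.0 (up to floating point): maximal incoherence
```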
More on RIP • Under the assumptions for which the above result holds, it can be shown that for x supported on a fixed set T the measurement operator approximately preserves the norm of x • This ensures any such signal is uniquely recoverable • To include all sets T we need to strengthen the above condition (the Uniform Uncertainty Principle, UUP); but then the number of samples required increases, the log factor becoming the 4th-5th power of log n
Definition • Restricted Isometry Constant: the smallest constant $\delta_S$ such that $(1-\delta_S)\|x\|_2^2 \leq \|\Phi\Psi x\|_2^2 \leq (1+\delta_S)\|x\|_2^2$ for every x supported on T with |T| = S • We need $\delta_S < 1$ for the condition to hold • This is an approximate orthogonality condition on every set of S columns • UUP = the bound should hold for all T's of the same size (a Monte-Carlo estimate is sketched below)
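A Monte-Carlo sketch (a lower bound, not a certificate, since checking all supports is intractable) that estimates $\delta_S$ of a random matrix by sampling supports; sizes and trial count are illustrative.

```python
# Empirically lower-bound the restricted isometry constant delta_S.
import numpy as np

rng = np.random.default_rng(2)
n, m, S, trials = 128, 64, 5, 2000
A = rng.standard_normal((m, n)) / np.sqrt(m)

delta = 0.0
for _ in range(trials):
    T = rng.choice(n, S, replace=False)            # random support of size S
    s = np.linalg.svd(A[:, T], compute_uv=False)   # singular values of sub-matrix
    # squared singular values must lie in [1 - delta_S, 1 + delta_S]
    delta = max(delta, abs(s.max()**2 - 1), abs(s.min()**2 - 1))

print("empirical lower bound on delta_S:", delta)
```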
Result • Assuming the existence of a matrix that satisfies the above property with $\delta_{2S}$ sufficiently small (e.g., $\delta_{2S} < \sqrt{2}-1$), $\ell_1$ minimisation recovers every S-sparse x exactly • Here the condition on m depends on the type of matrix we choose • This result is deterministic, not probabilistic
Two concerns • Signals are only approximately sparse • Measurements are noisy • Our reconstruction procedure should be robust against these two cases • RIP to the rescue! • It has been shown that the solution $x^*$ to $\min \|\tilde{x}\|_1$ subject to $\|\Phi\Psi\tilde{x} - y\|_2 \leq \varepsilon$ satisfies $\|x^* - x\|_2 \leq C_0\,\|x - x_S\|_1/\sqrt{S} + C_1\,\varepsilon$, where $x_S$ is the best S-term approximation of x (an iterative solver for the noisy problem is sketched below)
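A minimal ISTA (iterative soft-thresholding) sketch for the noisy case, solving the LASSO surrogate $\min \tfrac{1}{2}\|Ax-y\|_2^2 + \lambda\|x\|_1$; the step size, $\lambda$, and iteration count are illustrative and untuned.

```python
# ISTA: gradient step on the quadratic term, then soft-thresholding.
import numpy as np

def ista(A, y, lam=0.05, iters=500):
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of grad
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L         # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(3)
n, m, S = 128, 64, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[rng.choice(n, S, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(m)  # noisy measurements

x_hat = ista(A, y)
print("error:", np.linalg.norm(x_hat - x_true))
```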
Summary • We need to find sensing matrices that satisfy the restricted isometry property • Simple choice: random matrices • If m > C·S·log(n/δ), random matrices satisfy the RIP with probability at least 1−δ • For random subsets of orthobases the requirement grows to the 4th-5th power of log n • Random matrices, however, present a storage difficulty
Application: Spectrum Sensing • Spectrum sensing is the task of detecting the presence or absence of carriers in a wide band of frequencies • Cognitive radios should be equipped with such a mechanism to enable efficient utilisation of the channel • A major implementation challenge lies in the very high sampling rates required by conventional spectral estimation methods, which have to operate at or above the Nyquist rate • Because of the high rates, the number of samples is limited • The samples may not provide a sufficient statistic
The situation is appropriate for deployment of CS • The spectrum is sparse because only a relatively small number of users are transmitting at any time • Let $[f_0, f_N]$ be the frequency range in use, with total bandwidth $B = f_N - f_0$
Our job is to detect the N frequency bands and classify them as black, grey, or white regions based on the PSD level in each • In the analysis we use a vector $r_t$ of time samples taken at the Nyquist interval $T_0$; in the actual implementation only sub-Nyquist sampling is done • So let $x_t = \mathbf{S}\,r_t$, where $x_t$ is a vector of M values acquired in the duration $[0, MT_0]$ and $\mathbf{S}$ is an M×N sampling matrix • This is a generic model: S can be any matrix of basis vectors • Since the sparsity under consideration is in frequency, the time domain is the best sampling domain; then S is a random subset of M rows of the identity matrix (see the sketch below)
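A minimal sketch (setup assumed) of sub-Nyquist sampling as row selection: keep M of the N Nyquist-rate samples at random time instants.

```python
# Sub-Nyquist sampling matrix S = random subset of rows of the identity.
import numpy as np

rng = np.random.default_rng(4)
N, M = 256, 64
r = rng.standard_normal(N)          # stand-in for Nyquist-rate samples r_t

rows = np.sort(rng.choice(N, M, replace=False))
S = np.eye(N)[rows]                 # M x N selection matrix
x = S @ r                           # the M compressive time samples x_t
print(S.shape, x.shape)             # (64, 256) (64,)
```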
Steps involved • Reconstruct the Nyquist-rate vector $r_t$ from the M measurements $x_t$ • Compute a high-resolution Fourier transform to estimate the spectrum R(f) • Obtain the frequency band edges from R(f) • Estimate the PSD in each band • Later we will see that it is not necessary to reconstruct the full frequency vector from $x_t$ • Only coarse sensing is done, so noise is not a key factor • To reduce noise effects, a wavelet smoothing operation is applied
Edge Detection • The problem is similar to edge detection in images • Let $\phi(f)$ be a wavelet smoothing function with compact support; its dilation by a scale factor s is $\phi_s(f) = \frac{1}{s}\,\phi\!\left(\frac{f}{s}\right)$ • For dyadic scales, s is a power of two • The continuous wavelet transform of R(f) is $W_s R(f) = (R * \phi_s)(f)$ • We then apply a simple differencing to this function (a smoothing-and-differencing sketch follows)
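A toy sketch of smoothing plus differencing for PSD edge detection, assuming a Gaussian as the smoothing function; the synthetic piecewise-flat PSD is illustrative.

```python
# Smooth a noiseless synthetic PSD, difference it, and locate band edges.
import numpy as np

n = 512
f = np.arange(n)
R = np.where((f > 150) & (f < 300), 1.0, 0.1)   # occupied band: [150, 300]

s = 16                                           # dilation (scale) factor
k = np.arange(-4 * s, 4 * s + 1)
phi_s = np.exp(-0.5 * (k / s) ** 2)              # dilated smoothing kernel
phi_s /= phi_s.sum()

WsR = np.convolve(R, phi_s, mode='same')         # smoothed spectrum W_s R(f)
d = np.diff(WsR)                                 # simple differencing
edges = [np.argmax(d), np.argmin(d)]             # rising and falling edges
print("estimated band edges:", sorted(edges))    # approx. [150, 300]
```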
From the above: • Replacing the spectrum R(f) by its estimate $\hat{R}(f)$, computed from the reconstructed time vector $\hat{r}_t$, we get the estimated wavelet transform $\hat{z}_s = W_s \hat{R}(f)$ • Then we take the derivative of this vector and find its local extrema
The vector of derivative values is $\hat{d}_s$ • Take its local peaks and we are done • In each band we can also estimate the average PSD by simply averaging the frequency vector over that band • One major simplification comes from noting that $z_s$ is itself a sparse vector (the smoothed spectrum is nearly piecewise-constant, so its derivative is mostly zero); thus we can eliminate the need to reconstruct $\hat{r}_t$
To do this we define Γ, the differentiation (first-difference) matrix • Then we rewrite the above equation so that the derivative vector is a linear function of the time samples: $\hat{d}_s = \Gamma\, W_s \hat{R}$ • And finally the sparse vector $\hat{d}_s$ is recovered directly from the compressive samples $x_t$ by $\ell_1$ minimisation (a minimal Γ is sketched below)
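A minimal sketch of a first-difference (differentiation) matrix Γ, so that $(\Gamma z)_i = z_{i+1} - z_i$; the test vector is illustrative.

```python
# First-difference matrix: its output is sparse on piecewise-flat input.
import numpy as np

n = 8
Gamma = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # (n-1) x n differencing
z = np.array([0., 0., 1., 1., 1., 0.5, 0.5, 0.5])
print(Gamma @ z)   # non-zeros mark the edges: [0, 1, 0, 0, -0.5, 0, 0]
```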
Channel Estimation • CS also has applications in sparse channel estimation • In a delay-dominant channel, the received signal can be represented as a superposition of delayed copies of the transmitted pilot signal • It can be shown that the total delay spread can be divided into bins, each 1/W in width (W = signal bandwidth) • Not all of these bins are occupied • Each occupied bin represents a dominant scatterer • Number of non-zero bins << p (total bins) = floor(Tm·W), where Tm is the delay spread (a toy model follows)
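A toy sketch of a sparse multipath channel on a 1/W delay grid, with p = floor(Tm·W) bins and only a few dominant scatterers; the bandwidth and delay-spread values are illustrative.

```python
# Sparse channel impulse response on the delay grid.
import numpy as np

rng = np.random.default_rng(5)
W, Tm = 1e6, 2e-5                   # illustrative bandwidth (Hz), delay spread (s)
p = int(np.floor(Tm * W))           # total delay bins (20 here)
k = 3                               # dominant scatterers, k << p

h = np.zeros(p)                     # channel taps, one per delay bin
h[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
print("sparsity:", np.count_nonzero(h), "of", p, "bins")
```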
Two ways to make use of CS • Use proper pilot signals • The pilot can be a sequence of random bits (+1 or −1) • Or it can be a sum of exponentials with random frequencies • The first corresponds to using a random sensing basis (the received signal is the pilot convolved with the sparse channel, as sketched below) • The second is random sampling in frequency, i.e., using a subset of the Fourier basis, which is maximally incoherent with the delay domain • In both cases we obtain a lower MSE than least-squares (LS) estimation
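A sketch of how a random ±1 pilot induces a Toeplitz sensing matrix for the sparse channel: the received signal is the pilot convolved with the taps. Pilot length, bin count, and tap values are illustrative.

```python
# Convolution with a random binary pilot, written as a Toeplitz matrix.
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(6)
L, p = 64, 20                       # pilot length, delay bins
pilot = rng.choice([-1.0, 1.0], L)  # random binary pilot

# column j of A is the pilot delayed by j samples
col = np.concatenate([pilot, np.zeros(p - 1)])
A = toeplitz(col, np.zeros(p))      # (L + p - 1) x p Toeplitz sensing matrix

h = np.zeros(p); h[[2, 7, 15]] = [1.0, -0.5, 0.8]   # sparse channel
y = A @ h                           # received samples = pilot * h
print(np.allclose(y, np.convolve(pilot, h)))        # True
```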
References • Channel estimation: W. U. Bajwa, J. Haupt, G. Raz, and R. Nowak, "Compressed channel sensing," in Proc. Annual Conference on Information Sciences and Systems (CISS), Mar. 2008, pp. 1-6. • Spectrum sensing: Z. Tian and G. B. Giannakis, "Compressed sensing for wideband cognitive radios," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, HI, Apr. 2007, vol. 4, pp. 1357-1360. • CS: E. Candès and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, no. 3, pp. 969-985, 2007. • E. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, 2008. • R. Baraniuk, "Compressive sensing [lecture notes]," IEEE Signal Processing Magazine, vol. 24, no. 4, pp. 118-121, 2007.