Fusion of Live Audio Recordings for Blind Noise Reduction

Aaron Ballew, Aleksandar Kuzmanovic, C. C. Lee
Northwestern University, Dept. of Electrical Engineering and Computer Science
July 7th, 2011

Observation
You Attend a Concert
Bootleggers: At the show, you remember cell phones and cameras in the air
• You’d like a recording of the show
• Live albums exist, but…
• You want the show you went to, back in San Jose, CA on Feb 22nd, 2010

Observation, cont’d
Seek it Out
Online Database
• You find some of those recordings uploaded
• Not just one, but three, four, or five copies of your favorite songs
• Varying quality

Opportunity
• Each song is an unknown source signal with receiver diversity
• There must be a way to take advantage of the diversity in these recordings to generate a new recording whose quality is better than any of the originals

Opportunity, cont’d
• All the recordings have something in common – a sameness from the music that was generated
• They have something uncommon too – a differentness from noisy applause, screaming fans, wind, etc.

Complications
• No reference (except in your mind) that defines which part is music rather than noise
  • A studio recording won’t work in general
• You don’t know the SNR of any signal
• There’s no pilot signal to imply the channel
• No opportunity to pre-code a digital waveform
  • It’s an analog source
  • No M-ary modulation such as QPSK, no matched filters
• Uncountably many sources and relatively few recordings, so not a good fit for independent component analysis (ICA)

Assumptions
• Recordings are mono
  • Stage speakers may be physically separated and multitrack
  • Relative to the venue’s scale and the listener’s perspective, the multitracks arrive synchronized and are recorded as mono by the mic
• Recordings are not synchronized to each other
  • Different start/stop times and durations
• Receivers are distributed arbitrarily among the audience
• Noise at one receiver is not the same as the noise at another
  • Not necessarily true if two receivers are close to each other
  • Not true out of context, such as in a quiet auditorium
[Figure panels: Sample vs. Sample; Noise vs. Noise]

Strategy
• We will never know the absolute SNR of any of the recordings
• However, if we could be confident their signal powers were equal, then the differences in their total powers would be due to the noise
  • Assumes the noise is (close to) uncorrelated
  • Does not assume we know what the signal power actually is
• If we could use the total power as a proxy for noise power (given bullet 2 above), we could:
  • Rank recordings by SNR
  • Apply a classic averaging technique to cancel noise
  • Measure whether noise power went up or down compared to any original recording
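The reasoning on this slide can be written out as a short derivation. It is a sketch under assumptions that the slide only implies: the symbols x_i, s, n_i, S, N_i and P_i are introduced here for illustration, the recordings are already aligned and scaled to equal signal power S, and each noise term is zero-mean and uncorrelated with the music and with every other noise term.

```latex
% Illustrative symbols (not from the slides): x_i, s, n_i, S, N_i, P_i
\begin{align*}
x_i(t) &= s(t) + n_i(t), \qquad i = 1, \dots, K \\
P_i &= \mathrm{E}\!\left[x_i^2\right] = S + N_i
  && \text{(total power = signal power + noise power)} \\
P_i - P_j &= N_i - N_j
  && \text{(the unknown } S \text{ cancels)} \\
\bar{x}_M(t) &= \frac{1}{M}\sum_{i=1}^{M} x_i(t)
  = s(t) + \frac{1}{M}\sum_{i=1}^{M} n_i(t) \\
\bar{P}_M &= S + \frac{1}{M^2}\sum_{i=1}^{M} N_i
  && \text{(averaging leaves } S \text{ unchanged)} \\
\bar{P}_{M+1} < \bar{P}_M
  &\iff N_{M+1} < \left(2 + \frac{1}{M}\right)\frac{1}{M}\sum_{i=1}^{M} N_i
\end{align*}
```

Under these assumptions, ranking by total power is the same as ranking by SNR, and a drop in the composite's total power can only come from a drop in noise power. The last line says that one more recording helps only if its noise power is less than (2 + 1/M) times the mean noise power of the M recordings already averaged. For example, with M = 1 the second recording helps only if its noise power is under three times that of the first.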

Strategy, cont’d • It would look like this:

Step 1 – Internal Reference
Similarity & Synchronization
• Cross-correlations show:
  • Which sample is most similar to all other samples
  • The time-shift (lag) between any sample pair
• No external reference, so pick an internal one from the sample set
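The step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `lag_and_peak` and `pick_reference` are hypothetical, the similarity score (peak of the cross-correlation divided by the product of the two norms) is one plausible choice, and `np.correlate` is only practical for short clips (an FFT-based correlation would be needed for full songs).

```python
import numpy as np

def lag_and_peak(a, b):
    """Return (lag, peak) for two mono clips.

    lag > 0 means a[n + lag] lines up with b[n], i.e. the shared
    content appears `lag` samples later in a than in b.
    peak is the maximum cross-correlation, scaled by the two norms.
    """
    a0 = a - a.mean()
    b0 = b - b.mean()
    xc = np.correlate(a0, b0, mode="full")        # lags -(len(b)-1) .. len(a)-1
    k = int(np.argmax(xc))
    lag = k - (len(b0) - 1)
    peak = xc[k] / (np.linalg.norm(a0) * np.linalg.norm(b0))
    return lag, float(peak)

def pick_reference(recordings):
    """Pick the recording most similar to all others; return (index, lags to it)."""
    n = len(recordings)
    peaks = np.zeros((n, n))
    lags = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            lag, peak = lag_and_peak(recordings[i], recordings[j])
            peaks[i, j] = peaks[j, i] = peak
            lags[i, j], lags[j, i] = lag, -lag    # swapping the pair flips the sign
    ref = int(np.argmax(peaks.sum(axis=1)))       # highest total similarity
    return ref, lags[:, ref]                      # lag of every clip relative to ref
```

Calling `pick_reference(clips)` on a list of 1-D arrays returns the index of the internal reference and, for each clip, the shift needed to line it up with that reference.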

Step 2 – Normalize
In the absence of SNR,
• The effect of combining samples is unclear
• Need a way to isolate changes in signal or noise power
• It would be helpful if signal powers were already equal
  • Implies combining affects only the noise

Step 2 – Normalize, cont’d
Use the Right Tool
• Use covariance, not the correlation coefficient r, to normalize signal powers
• You still don’t know the absolute signal powers
• You only know that the differences in total power are due to noise
• Now you can tell whether noise goes up or down after combining
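One possible reading of "use covariance" is sketched below. It is an assumption, not the method from the talk: if recording i is g_i·s + n_i with mutually uncorrelated noise, its covariance with the reference is about g_i·g_ref·S, which does not depend on either noise power. Dividing each non-reference recording by that covariance therefore leaves all of them with the same (still unknown) signal gain. The correlation coefficient r would not work here, because it also divides by each recording's standard deviation, which includes the noise. The name `normalize_to_reference` is hypothetical, the clips are assumed to be aligned and equal in length, and the reference is used only as a yardstick.

```python
import numpy as np

def normalize_to_reference(recordings, ref_index):
    """Scale non-reference clips so each has unit covariance with the reference.

    Returns {index: scaled clip}. Assumes aligned, equal-length clips with
    mutually uncorrelated noise and a positive covariance with the reference.
    """
    ref = recordings[ref_index]
    normalized = {}
    for i, x in enumerate(recordings):
        if i == ref_index:
            continue                      # reference serves only as the yardstick
        c = np.cov(x, ref)[0, 1]          # ~ g_i * g_ref * S, free of noise power
        normalized[i] = x / c
    return normalized
```

After this scaling, sorting the returned clips by `np.var` ranks them by noise power, since their signal powers are equal by construction.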

Step 3 – Fusion
“Weighted” Average
• Find the average of the first M ranked samples, such that total power is minimized
• Why the first M?
  • A sample’s noise power may be so large that it increases the composite’s noise
[Figure: not to scale]
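The fusion rule above can be sketched as follows, assuming the clips are already aligned, equal in length, and normalized to equal signal power. The function name `fuse_lowest_power` is hypothetical. It ranks clips by total power, forms the running average of the first M for every M, and keeps the M whose composite has the lowest total power.

```python
import numpy as np

def fuse_lowest_power(aligned):
    """Average the first M power-ranked clips, choosing M to minimize total power.

    aligned: sequence of equal-length 1-D arrays with equal signal power.
    Returns (composite, best_m, indices_used).
    """
    X = np.asarray(aligned, dtype=float)
    total_power = np.mean(X ** 2, axis=1)
    order = np.argsort(total_power)                      # quietest (best) first
    counts = np.arange(1, len(X) + 1)[:, None]
    running_mean = np.cumsum(X[order], axis=0) / counts  # row m-1 = mean of first m
    composite_power = np.mean(running_mean ** 2, axis=1)
    best_m = int(np.argmin(composite_power)) + 1
    return running_mean[best_m - 1], best_m, order[:best_m]
```

Because signal power is the same in every clip and in every average, the M with the lowest total power is also the M with the lowest noise power, so a very noisy clip at the end of the ranking is left out automatically.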

Benefits
• Identify a “best” quality recording without having to manually listen to each
• Generate a recording that exceeds the “best” in quality
• Encourage user-generated (crowd-sourced) content sharing
• Applicable to any context where the source signal is completely unknown

Ongoing and Future
• Ongoing: Time-variability of noise
  • Shows up as “low-frequency” noise that downselects against such a recording
  • We window in time (and frequency) to take advantage of the high-quality parts of the recordings
  • Stitching the windows back together post-fusion requires some attention due to an audible discontinuity when adjacent windows generate a different composite
• Future: Maximal Ratio Combining (MRC)
  • Well-known technique that requires channel knowledge
  • Gives optimal weighting of samples for maximal fusion gain
  • I believe we can adapt the inference technique to MRC, such that we get the “maximal” SNR gain, though I may not know exactly what the gain is!
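For the stitching issue listed under Ongoing, one common remedy is to let adjacent windows overlap and crossfade between them. The sketch below is an illustration of that general technique, not the approach used in this work: the function name `stitch`, the linear fade, and the requirement that the overlap be at most half a window are all assumptions.

```python
import numpy as np

def stitch(windows, hop):
    """Overlap-add equal-length windows spaced `hop` samples apart.

    Overlapping regions are blended with a linear crossfade so that a
    change of composite between adjacent windows does not produce a jump.
    """
    win_len = len(windows[0])
    overlap = win_len - hop
    assert 0 <= 2 * overlap <= win_len, "overlap must be at most half a window"
    total = hop * (len(windows) - 1) + win_len
    out = np.zeros(total)
    weight = np.zeros(total)
    ramp = np.ones(win_len)
    if overlap > 0:
        fade = np.linspace(0.0, 1.0, overlap + 2)[1:-1]  # strictly between 0 and 1
        ramp[:overlap] = fade                            # fade in
        ramp[-overlap:] = fade[::-1]                     # fade out
    for k, w in enumerate(windows):
        start = k * hop
        out[start:start + win_len] += ramp * np.asarray(w, dtype=float)
        weight[start:start + win_len] += ramp
    return out / weight    # restores full level at the two outer edges
```

Windows cut from a single signal are reconstructed unchanged, while windows that disagree in the overlap are blended gradually from one to the next.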

Conclusion
Thank You!
http://networks.cs.northwestern.edu/~aaron/fusion.html
