Isolating Speech from a Desired Talker in a Room with Multiple Talkers Using Phase Information

Avram Levi, Harvey Silverman. Brown University, Laboratory for Engineering Man Machine Systems, Division of Engineering, 182 Hope St, Providence, RI 02906.

Problem

GOAL: A computationally modest method that isolates the speech from a desired talker in a real, noisy, and reverberant room with multiple talkers.

Figure. Left: A spectrogram of a speech signal recorded through a close-talking microphone. Right: The same signal recorded using a remote microphone in a noisy, reverberant room with four simultaneous talkers.

The Method

• Key Idea: GCC-PHAT in the frequency domain between two arbitrary microphones j and k aimed at talker S. [equation not recovered]
• Hypothesis:
  • If the microphone signals at a given frequency are dominated by the desired signal, i.e. Signal-to-Interference Ratio (SIR) >> 1: [expression not recovered]
  • Else, if the SIR at a given frequency is <= 1: [expression not recovered]
• Verification: Relationship between [symbol not recovered] and the SIR value. Theory matches measurements from real speech data.

SRP-PHAT Discriminator

• [symbol not recovered]: The real part of the upper-triangular SRP-PHAT value of an array with M microphones, at the desired talker location and at a given frequency, is the sum of the GCC-PHAT values at that frequency over all pairs of microphones.

1) R: Pre-determined threshold value for selecting the time-frequency points where the desired signal dominates, i.e. a time-frequency point is considered desired if [condition not recovered].
2) Attenuate all other time-frequency points to a level that is not audible.

Overview of the Algorithm

1) Aim a delay-and-sum beamformer at the desired talker.
2) Take the short-time Fourier transform.
3) Discriminate the frequency components dominated by the desired signal, i.e. those with a high Signal-to-Interference Ratio (SRP-PHAT discriminator).
4) Attenuate all other frequencies.
5) Reconstruct the signal with the desired components.

Spectrographic Results: Four Talkers, 16 Microphones

Figure. Left: Overview of the room where the data was recorded. Four speakers were placed at the locations shown in red. Simultaneous speech data was recorded at 20 kHz for 10 seconds. The analysis frames were 51.2 ms long, advancing every 6.4 ms. Right: Normalized counts of phase differences between any two microphones in a 16-microphone array for a real recording with 4 simultaneous talkers. We looked at frequency points between 1 and 3 kHz. The signal-to-interference ratios were obtained using separate recordings of the desired talker and the interfering talkers.

Figure. Percentage of time-frequency points that are determined to be desired-talker dominant, as a function of Signal-to-Interference Ratio, for different threshold (R) values.

Discussion and Future Work

• A new algorithm is presented that isolates a talker in the presence of other talkers in a noisy, reverberant environment using a microphone array.
• It requires a point-source location estimate of the desired talker.
• It uses phase information to classify the points of the short-time Fourier transform as desired or interfering.
• Using straightforward discrimination by threshold, almost all interfering signals are removed while approximately 80% of the desired signal is retained, for a real recording with four simultaneous talkers.
• The method has low computational cost, so a real-time implementation is feasible.
• Future work: Many interesting new directions using this idea.
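The discriminate-and-attenuate steps of the algorithm can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the normalization by the number of microphone pairs, the "greater than R" comparison, and the default values of `R` and `floor_gain` are all chosen here for illustration. The sketch also assumes that the multichannel STFT `X` and the propagation delays from the desired talker's location to each microphone are already available.

```python
import numpy as np

def srp_phat_statistic(X, delays, freqs):
    """Per-bin sum of the real parts of GCC-PHAT over all microphone pairs.

    X      : complex STFT, shape (M, T, F) = (microphones, frames, bins)
    delays : talker-to-microphone propagation delays in seconds, shape (M,)
    freqs  : bin centre frequencies in Hz, shape (F,)
    Returns (stat, Y): stat has shape (T, F) and is normalized by the
    number of pairs (an assumption); Y is the steered STFT.
    """
    M = X.shape[0]
    # Undo each channel's propagation delay so that the desired talker's
    # component is phase-aligned across microphones.
    steer = np.exp(2j * np.pi * freqs[None, None, :] * delays[:, None, None])
    Y = X * steer
    # PHAT weighting: keep only the phase of every time-frequency point.
    U = Y / np.maximum(np.abs(Y), 1e-12)
    # Identity: |sum_m U_m|^2 = sum_m |U_m|^2 + 2 * Re sum_{j<k} U_j conj(U_k),
    # so the pairwise sum is obtained without looping over pairs.
    pair_sum = 0.5 * (np.abs(U.sum(axis=0)) ** 2 - (np.abs(U) ** 2).sum(axis=0))
    return pair_sum / (M * (M - 1) / 2), Y

def isolate_talker(X, delays, freqs, R=0.5, floor_gain=0.01):
    """Keep bins judged desired-talker dominant and attenuate the rest.

    The comparison stat > R and both default values are assumptions.
    Returns (masked delay-and-sum STFT, boolean mask, statistic).
    """
    stat, Y = srp_phat_statistic(X, delays, freqs)
    das = Y.mean(axis=0)          # delay-and-sum beamformer output, per bin
    mask = stat > R               # bins treated as desired-talker dominant
    return das * np.where(mask, 1.0, floor_gain), mask, stat
```

With a single dominant source at the steered location, every steered phasor points the same way and the normalized statistic approaches 1; when the steered phases are spread around the circle, it falls toward its minimum of -1/(M-1). An inverse STFT of the returned array would complete the reconstruction step.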

