
Isolating Speech from a Desired Talker in a Room with Multiple Talkers Using Phase Information

Avram Levi, Harvey Silverman. Brown University – Laboratory for Engineering Man Machine Systems, Division of Engineering – 182 Hope St, Providence, RI 02906.


Presentation Transcript


  1. Isolating Speech from a Desired Talker in a Room with Multiple Talkers Using Phase Information

Problem

GOAL: A computationally modest method that isolates the speech from a desired talker in a real, noisy and reverberant room with multiple talkers.

Figure. Left: A spectrogram of a speech signal recorded through a close-talking microphone. Right: The same signal recorded using a remote microphone in a noisy, reverberant room with four simultaneous talkers.

The Method: Overview of the Algorithm

• Aim a Delay-and-Sum Beamformer to Desired Talker
• Take the Short-Time Fourier Transform
• Discriminate Frequency Components Dominated by the Desired Signal, i.e. High Signal-to-Interference Ratio
• Attenuate All Other Frequencies
• Reconstruct the Signal with the Desired Components

SRP-PHAT DISCRIMINATOR

• Key Idea: GCC-PHAT in the frequency domain between two arbitrary microphones j and k aimed at talker S: [equation not preserved in transcript]
• The real part of the upper-triangular SRP-PHAT value of an array with M microphones at the desired talker location at a given frequency is the sum of the GCC-PHAT values at that frequency over all pairs of microphones.
• Relationship between [symbol not preserved in transcript] and SIR value.
• Hypothesis:
  • If the microphone signals at a given frequency are dominated by the desired signal, i.e. Signal-to-Interference Ratio (SIR) >> 1: [expression not preserved in transcript]
  • Else if SIR at a given frequency is <= 1: [expression not preserved in transcript]
• Verification: THEORY MATCHES MEASUREMENTS from REAL SPEECH DATA.

Figure. Normalized counts of phase differences between any two microphones in a 16-microphone array for a real recording with 4 simultaneous talkers. We looked at frequency points between 1 and 3 kHz. The signal-to-interference ratios were obtained using separate recordings of the desired talker and the interfering talkers.

Discriminate Frequency Components Dominated by the Desired Signal

1) R: Pre-determined threshold value for selection of time-frequency points where the desired signal dominates, i.e. consider a time-frequency point as desired if [inequality not preserved in transcript].
2) Attenuate all other time-frequency points to a level that is not audible.

Figure. Percentage of time-frequency points that are determined to be desired-talker dominant as a function of Signal-to-Interference Ratio for different threshold (R) values.

Spectrographic Results: Four Talkers – 16 Microphones

Figure. Overview of the room where the data was recorded. Four speakers were placed in the locations presented in red. Simultaneous speech data was recorded at 20 kHz for 10 seconds. The analysis frames were 51.2 ms long, advancing every 6.4 ms.

Discussion and Future Work

• A new algorithm to isolate a talker in the presence of other talkers in a noisy, reverberant environment using a microphone array is presented.
• Requires the point source location estimate of the desired talker.
• Uses phase information to discriminate the points in the short-time Fourier transform as desired or interfering.
• Using straightforward discrimination by threshold, almost all interfering signals are removed while retaining approximately 80% of the desired signal for a real recording with four simultaneous talkers.
• The method has low computational cost, so real-time implementation is feasible.
• Future work: Many interesting new directions using this idea.
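
The discriminate-and-attenuate steps described above can be sketched for a single analysis frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the normalization of the pair sum by the number of microphone pairs, the attenuation floor of 1e-3 (about -60 dB), and the availability of per-microphone propagation delays derived from the talker location estimate are all assumptions introduced here. The frame constants follow from the figures in the transcript (51.2 ms and 6.4 ms at 20 kHz give 1024 and 128 samples).

```python
import numpy as np

FS = 20_000   # sampling rate reported on the poster (20 kHz)
FRAME = 1024  # 51.2 ms analysis frame at 20 kHz
HOP = 128     # 6.4 ms frame advance at 20 kHz


def steer(X, delays, freqs):
    """Phase-align each channel toward the desired talker.

    X: (M, F) complex spectra of one frame, one row per microphone.
    delays: (M,) assumed talker-to-microphone propagation delays in seconds.
    freqs: (F,) bin frequencies in Hz.
    """
    return X * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])


def delay_and_sum(X, delays, freqs):
    """Frequency-domain delay-and-sum beamformer output, shape (F,)."""
    return steer(X, delays, freqs).mean(axis=0)


def srp_phat_discriminator(X, delays, freqs):
    """Real part of the sum of GCC-PHAT values over all pairs j < k.

    The result is divided by the number of pairs (an assumption made here),
    so a bin whose phases all agree with the steering delays scores 1.
    """
    # PHAT weighting: keep only the phase of each channel.
    unit = X / np.maximum(np.abs(X), 1e-12)
    s = steer(unit, delays, freqs)
    m = X.shape[0]
    # sum_{j<k} Re(s_j * conj(s_k)) = (|sum_j s_j|^2 - sum_j |s_j|^2) / 2
    pair_sum = 0.5 * (np.abs(s.sum(axis=0)) ** 2 - (np.abs(s) ** 2).sum(axis=0))
    return pair_sum / (m * (m - 1) / 2)


def apply_threshold(X_bf, disc, R, floor=1e-3):
    """Keep bins with discriminator >= R; scale all others by `floor`."""
    return X_bf * np.where(disc >= R, 1.0, floor)
```

In a full pipeline these functions would be applied to every STFT frame of the array signals, and the masked beamformer spectra would then be inverted and overlap-added to reconstruct the waveform. The threshold R would have to be tuned on data, since the poster reports results for several R values without the transcript preserving them.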
