FYP0202 Advanced Audio Information Retrieval System By Alex Fok, Shirley Ng
Outline • Overview • Read in the raw speech • MFCC processing • Detect the audio scene change • Audio Clustering • Interleave Audio Clustering • Conclusion
Overview • Automatic segmentation of an audio stream and automatic clustering of audio segments have received considerable attention in recent years. • For example, in the task of automatic transcription of broadcast news, the data contains clean speech, telephone speech, music segments, and speech corrupted by music or noise.
Overview (cont’) • We would like to SEGMENT the audio stream into homogeneous regions according to speaker identity. • We would like to CLUSTER speech segments into homogeneous clusters according to speaker identity.
Step 1: Read in the raw speech • Read in an MPEG file as input • Convert the file from .mpeg format to .wav format • This is necessary because the MFCC library only processes .wav files
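The slide does not say how the conversion is done; a common approach is to shell out to ffmpeg. The sketch below is one hypothetical way to do it (the filenames, sample rate, and helper names are assumptions, not part of the original system), assuming ffmpeg is installed on the machine:

```python
import subprocess

def build_wav_conversion_cmd(mpeg_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command line that decodes an MPEG file
    to a mono, 16-bit PCM .wav file at the given sample rate."""
    return [
        "ffmpeg", "-i", mpeg_path,
        "-ac", "1",                   # downmix to mono
        "-ar", str(sample_rate),      # resample (16 kHz is typical for speech)
        "-acodec", "pcm_s16le",       # 16-bit little-endian PCM
        wav_path,
    ]

def convert_to_wav(mpeg_path, wav_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_wav_conversion_cmd(mpeg_path, wav_path), check=True)
```

Separating command construction from execution makes the conversion easy to test without actually invoking ffmpeg.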
Step 2: MFCC processing • A .wav file is viewed as a sequence of frames (Frame 1, Frame 2, Frame 3, …), each described by a set of features • We make use of the MFCC library to convert the .wav data into MFCC features for processing • We extract 24 features for each frame • The results are stored in feature vectors, one per frame
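The framing step that precedes MFCC extraction can be sketched in plain Python. This is a simplified illustration, not the MFCC library's actual interface: it only splits the sample stream into overlapping frames (e.g. 25 ms frames with a 10 ms hop at 16 kHz); in the real pipeline each frame would then be passed to the MFCC routine, which returns the 24 coefficients per frame mentioned above.

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split a sequence of audio samples into overlapping frames.

    With a 16 kHz sample rate, frame_len=400 and hop=160 correspond
    to 25 ms frames advanced every 10 ms, a common speech setup.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames
```

Each element of the returned list stands in for one frame's feature vector x_i used in the change-detection step that follows.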
Step 3: Detect the audio scene change • Make use of the feature vectors to detect audio scene changes • The input audio stream is modeled as a Gaussian process • A model selection criterion called BIC (Bayesian Information Criterion) is used to detect the change points
Step 3: Detect the audio scene change (cont’) • Denote xi (i = 1,…,N) as the feature vector of frame i • N is the total number of frames • μi : mean vector of frame i • Σi : full covariance matrix of frame i • R(i) = N log |Σ| − N1 log |Σ1| − N2 log |Σ2| • Σ, Σ1, Σ2 are the sample covariance matrices estimated from all the data, from {x1,…,xi}, and from {xi+1,…,xN} respectively, with N1 = i and N2 = N − i
Step 3: Detect the audio scene change (cont’) • BIC(i) = R(i) − P, where P is a penalty term that grows with model complexity • If there is only one change point, the frame with the highest (positive) BIC score is the change point • If there is more than one change point, the algorithm is extended by applying the same single-change test repeatedly to successive sub-segments of the stream
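A minimal sketch of the single-change-point search, under a simplifying assumption not made in the slides: the features are one-dimensional, so the determinant |Σ| reduces to the sample variance. The penalty value is left as a caller-supplied parameter rather than the specific complexity term the system uses.

```python
import math

def variance(xs):
    """Biased sample variance (plays the role of |Sigma| in 1-D)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bic_change_point(xs, penalty):
    """Return (best_index, best_bic) for a single candidate change point.

    Scalar simplification of R(i) = N log|S| - N1 log|S1| - N2 log|S2|,
    with BIC(i) = R(i) - penalty.  A change is declared only when the
    best BIC score is positive; the caller checks best_bic > 0.
    """
    n = len(xs)
    total = n * math.log(variance(xs))
    best_i, best_bic = None, float("-inf")
    for i in range(2, n - 2):  # keep both halves non-degenerate
        left, right = xs[:i], xs[i:]
        r = total - i * math.log(variance(left)) - (n - i) * math.log(variance(right))
        bic = r - penalty
        if bic > best_bic:
            best_i, best_bic = i, bic
    return best_i, best_bic
```

On a stream whose statistics shift partway through, the maximizing index lands at the boundary between the two regimes, matching the "frame with highest BIC score" rule above.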
Step 4: Audio Clustering • To speed up change detection, we only locate the change points roughly. • As a result, some change points may be calculated wrongly. • In this step, we try to combine wrongly segmented neighboring segments: • Compare each segment with its neighbors; if they are speech from the same person, combine them.
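The neighbor-merging pass can be sketched as follows. This is an illustration under an assumed segment representation of (speaker, start, end) tuples; in the real system the "same speaker" decision would come from comparing the segments' acoustic statistics (e.g. with a BIC test), not from pre-existing labels.

```python
def merge_adjacent(segments):
    """Merge neighboring segments that belong to the same speaker.

    Each segment is a (speaker, start, end) tuple.  When two adjacent
    segments share a speaker, they are fused into one segment spanning
    both, repairing spurious change points from the rough detection pass.
    """
    merged = []
    for seg in segments:
        if merged and merged[-1][0] == seg[0]:
            spk, start, _ = merged[-1]
            merged[-1] = (spk, start, seg[2])  # extend previous segment
        else:
            merged.append(seg)
    return merged
```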
Step 5: Interleave Audio Clustering • Group all the segments of the same speaker into one node. • Before: Speaker 1, Speaker 2, Speaker 1, Speaker 1 (separate segments) • After: Combined Speaker 1 (all Speaker 1 segments in one node), Speaker 2
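Unlike Step 4, this pass also merges non-adjacent (interleaved) segments of the same speaker. A minimal sketch, again assuming labeled (speaker, start, end) tuples rather than the acoustic comparison the real system would use:

```python
def group_by_speaker(segments):
    """Collect all segments of each speaker into one node.

    Returns a dict mapping speaker -> list of (start, end) intervals,
    so interleaved segments of one speaker end up grouped together.
    """
    groups = {}
    for spk, start, end in segments:
        groups.setdefault(spk, []).append((start, end))
    return groups
```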
Conclusion • We would like to build a precise and fast engine that recognizes the identity of each speaker in a wave file. • We would like to group all segments of the same speaker in the wave file.
Conclusion (cont’) • Instead of making local decisions based on the distance between fixed-size samples, we expand the decision window as wide as possible • We avoid repeated calculation by using dynamic programming • The detection algorithm can detect acoustic change points with reasonable detectability