
Audio classification Discriminating speech, music and environmental audio


Presentation Transcript


  1. Audio classification: Discriminating speech, music and environmental audio. Rajas A. Sambhare, ECE 539

  2. Objective: Discriminate between speech, music and environmental audio (special effects) using short 3-second samples • Extract a relevant set of feature vectors from the audio samples • Develop a pattern classifier that can successfully discriminate the three classes based on the extracted vectors

  3. Feature extraction: frequency centroid and bandwidth (defining formulas shown on the slide)
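Only the feature names survive from this slide; the formulas themselves did not. For reference, the standard short-time definitions (following Zhang and Kuo [1]; the exact notation used on the slide is an assumption) for frame t with power spectrum |X_t(f)|^2 and sampling rate f_s = 22050 Hz are:

```latex
% Frequency centroid and bandwidth of frame t, computed from the
% short-time power spectrum |X_t(f)|^2 obtained via the 512-point FFT.
FC_t = \frac{\int_0^{f_s/2} f \, |X_t(f)|^2 \, df}{\int_0^{f_s/2} |X_t(f)|^2 \, df}
\qquad
BW_t = \sqrt{\frac{\int_0^{f_s/2} \left(f - FC_t\right)^2 |X_t(f)|^2 \, df}{\int_0^{f_s/2} |X_t(f)|^2 \, df}}
```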

  4. Feature extraction pipeline: • Split the 3-second audio sample (22050 Hz) into 512-sample frames (23.21 ms) with 25% overlap and a Hanning window • Take a 512-point FFT of each frame • Extract the frequency centroid, bandwidth, and energy in 22 critical bands, and compute the log power ratio in each band • Compute the mean and standard deviation of the centroid, log power ratios and bandwidth across all frames • Compute the silence ratio (SR) • Concatenate the means, standard deviations and silence ratio and save the 49-dimension feature vector (a sketch of this pipeline is shown below)
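A minimal Python/NumPy sketch of this pipeline follows. The project's actual implementation was in MATLAB; the function name, the Bark-style critical-band edges, and the RMS silence threshold below are illustrative assumptions, not the report's code.

```python
# Sketch of the 49-dimension feature extraction described on slide 4.
import numpy as np

SR_HZ = 22050
FRAME_LEN = 512                      # 23.21 ms at 22050 Hz
HOP = int(FRAME_LEN * 0.75)          # 25% overlap
# Assumed Bark-style critical-band edges (23 edges -> 22 bands)
CRITICAL_BAND_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                       1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                       4400, 5300, 6400, 7700, 9500]

def extract_features(x, silence_thresh=0.01):
    """Return a 49-dim vector: mean and SD of centroid, bandwidth and
    22 log band-power ratios across frames, plus the silence ratio."""
    window = np.hanning(FRAME_LEN)
    freqs = np.fft.rfftfreq(FRAME_LEN, d=1.0 / SR_HZ)
    per_frame, n_silent, n_frames = [], 0, 0
    for start in range(0, len(x) - FRAME_LEN + 1, HOP):
        raw = x[start:start + FRAME_LEN]
        n_frames += 1
        if np.sqrt(np.mean(raw ** 2)) < silence_thresh:   # silent frame?
            n_silent += 1
        power = np.abs(np.fft.rfft(raw * window)) ** 2
        total = power.sum() + 1e-12
        centroid = (freqs * power).sum() / total
        bandwidth = np.sqrt(((freqs - centroid) ** 2 * power).sum() / total)
        band_energy = [power[(freqs >= lo) & (freqs < hi)].sum()
                       for lo, hi in zip(CRITICAL_BAND_EDGES[:-1],
                                         CRITICAL_BAND_EDGES[1:])]
        log_ratios = np.log10(np.array(band_energy) / total + 1e-12)
        per_frame.append(np.concatenate(([centroid, bandwidth], log_ratios)))
    per_frame = np.array(per_frame)              # shape: (n_frames, 24)
    silence_ratio = n_silent / max(n_frames, 1)
    return np.concatenate([per_frame.mean(axis=0),   # 24 means
                           per_frame.std(axis=0),    # 24 SDs
                           [silence_ratio]])         # +1 -> 49 total
```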

  5. Neural network development: Feedforward multi-layer perceptron with back-propagation training • Create a database of 135 training and 45 testing samples • Develop the neural network using MATLAB • Dynamically partition the training samples, using 25% for tuning • Decide on the network architecture (number of hidden layers and neurons) • Decide on network parameters such as the learning rate and momentum • Attempt classification using various combinations of feature vectors. Designed network: 49-20-3 (49 inputs, 20 hidden neurons, 3 outputs)
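Below is a minimal NumPy sketch of a 49-20-3 MLP trained with back-propagation (gradient descent with momentum). The project itself was built in MATLAB, so this Python version, its tanh/softmax layer choices, and the hyperparameter values are assumptions for illustration only.

```python
# Sketch of the 49-20-3 MLP from slide 5, trained with back-propagation.
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, Y, n_hidden=20, lr=0.1, momentum=0.8, epochs=500):
    """X: (N, 49) feature matrix, Y: (N, 3) one-hot class labels."""
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_out)); b2 = np.zeros(n_out)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    for _ in range(epochs):
        # forward pass: tanh hidden layer, softmax output
        H = np.tanh(X @ W1 + b1)
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # backward pass (cross-entropy loss gradient)
        dZ = (P - Y) / len(X)
        dW2 = H.T @ dZ; db2 = dZ.sum(axis=0)
        dH = dZ @ W2.T * (1 - H ** 2)
        dW1 = X.T @ dH; db1 = dH.sum(axis=0)
        # gradient descent with momentum
        vW2 = momentum * vW2 - lr * dW2; W2 += vW2
        vb2 = momentum * vb2 - lr * db2; b2 += vb2
        vW1 = momentum * vW1 - lr * dW1; W1 += vW1
        vb1 = momentum * vb1 - lr * db1; b1 += vb1
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

In use, the 135 training vectors would be split dynamically, with 25% held out for tuning the architecture and learning parameters, before evaluating on the 45 test samples.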

  6. Results • Classification rate of 82.37% using the critical sub-band ratios, frequency centroid, bandwidth and silence ratio • Classification rate of 79.78% using only the critical sub-band ratios • Classification rate of 84.44% using only the frequency centroid, bandwidth and silence ratio, but with extremely slow training and variable results (2.34% standard deviation in classification rate) • Baseline study: Zhang and Kuo [1] reported a classification rate of about 90% using a rule-based heuristic; however, better results are expected as the database size is increased. References: [1] T. Zhang and C.-C. J. Kuo, "Hierarchical System for Content-based Audio Classification and Retrieval," Proc. SPIE, Vol. 3527, pp. 398-409, Multimedia Storage and Archiving Systems III, 1998.
