Birdsong Recognition. Chang-Hsing Lee (李建興), Professor, Department of Computer Science and Information Engineering, Chung Hua University.
Automatic Classification of Bird Species From Their Sounds Using Two-Dimensional Cepstral Coefficients. Chang-Hsing Lee, Chin-Chuan Han, and Ching-Chien Chuang. IEEE Trans. on Audio, Speech, and Language Processing, Vol. 16, No. 8, Nov. 2008, pp. 1541-1550.
System Framework
• Training: Training syllable → Feature Extraction → PCA → Prototype Vectors Generation → LDA → Feature Database
• Testing: Test syllable → Feature Extraction → PCA Transformation → LDA Transformation → Classification → Classified Bird Species
Feature Extraction • Two-dimensional Mel-frequency cepstral coefficients (TDMFCC) [Figure: MFCCs over time → DCT → TDMFCC]
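The figure suggests that TDMFCC is obtained by applying a DCT to the sequence of per-frame MFCC vectors. A minimal sketch of that idea follows, assuming the DCT runs along the time axis of each cepstral coefficient and that only the first few time-domain coefficients are kept. The function names, the unnormalised DCT-II, and the toy input are illustrative assumptions, not the authors' implementation.

```python
import math

def dct_ii(x):
    """Unnormalised DCT-II of a sequence (assumed transform, for illustration)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def tdmfcc(mfcc_frames, n_time_coeffs):
    """mfcc_frames: list of frames, each a list of MFCCs.
    Returns one row per cepstral index, holding the first n_time_coeffs
    DCT coefficients of that cepstral coefficient's trajectory over time."""
    n_ceps = len(mfcc_frames[0])
    rows = []
    for q in range(n_ceps):
        trajectory = [frame[q] for frame in mfcc_frames]
        rows.append(dct_ii(trajectory)[:n_time_coeffs])
    return rows

# Toy input: 4 frames, 2 MFCCs per frame (made-up values).
frames = [[1.0, 0.5], [1.0, -0.5], [1.0, 0.5], [1.0, -0.5]]
coeffs = tdmfcc(frames, 2)
```

Keeping only the low-order time coefficients gives a fixed-length descriptor for syllables of different durations, which is the usual motivation for a two-dimensional cepstrum.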
Feature Extraction (cont.) • Dynamic Two-dimensional MFCC ( DTDMFCC )
Prototype Vector Generation • Gaussian mixture model (GMM) vs. Vector quantization (VQ) • Acoustic Model Selection – Bayesian information criterion (BIC) • Component Number Selection – self-splitting Gaussian mixture learning (SGML)
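Model selection with BIC can be sketched as follows. This assumes the common form BIC = -2 ln L + k ln n, where lower is better; the slides do not give the exact formula or sign convention used, and the candidate names, log-likelihoods, and parameter counts below are made-up illustrative values.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion in the 'lower is better' convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def select_model(candidates, n_samples):
    """candidates: dict mapping a model name to (log_likelihood, n_params).
    Returns the name of the candidate with the smallest BIC."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_samples))

# Hypothetical fits of two acoustic models to the same 100 feature vectors.
candidates = {"VQ": (-120.0, 4), "GMM": (-118.0, 9)}
chosen = select_model(candidates, n_samples=100)
```

In this toy case the GMM fits slightly better but pays a larger complexity penalty, so the simpler model is selected.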
Experimental Results
• 28 bird species
• Training set – 3143 syllables, from the Yushan National Park CDs "Sound of the Mountain IV: The Songs of Wild Birds" and "Sound of the Mountain V: The Songs of Wild Birds"
• Test set – 646 syllables, downloaded from the website of the National Fonghuanggu Bird Park
Experimental Results (cont.) Comparison of classification results for different PCA thresholds
Experimental Results (cont.) SUMMARIZATION OF CLASSIFICATION ACCURACY (CA), SELECTED MODEL (EVQ OR GMM), THE CLUSTER NUMBER (NS) FOR EACH BIRD SPECIES USING SDTDMFCC WHEN PCA THRESHOLD = 0.97
Experimental Results (cont.) SUMMARIZATION OF CLASSIFICATION ACCURACY (CA), SELECTED MODEL (EVQ OR GMM), THE CLUSTER NUMBER (NS) FOR EACH BIRD SPECIES USING SDTDMFCC WHEN PCA THRESHOLD = 0.97 (cont.)
Continuous Birdsong Recognition Using Gaussian Mixture Modeling of Image Shape Features. Chang-Hsing Lee, Sheng-Bin Hsu, Jau-Ling Shih, and Chih-Hsun Chou. IEEE Trans. on Multimedia, Vol. 15, No. 2, Feb. 2013, pp. 454-463.
Feature Extraction Angular Radial Transformation (ART) Feature
Feature Extraction (cont.) • Step 1: Spectrogram Generation [Figure: waveform, zoomed in and divided into overlapping frames]
Feature Extraction (cont.) • Step 1: Spectrogram Generation (cont.) [Figure: frame decomposition followed by spectrum analysis of each frame; vertical axis: frequency]
Feature Extraction (cont.) • Step 1: Spectrogram Generation (cont.) Waveform Spectrogram
Feature Extraction (cont.) • Step 1: Spectrogram Generation (cont.) Example spectrograms: Taiwan Firecrest (火冠戴菊鳥), Taiwan Sibia (白耳畫眉), Vivid Niltava (黃腹琉璃), Crested Goshawk (鳳頭蒼鷹)
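Step 1 (overlapping frames, then a spectrum per frame) can be sketched as below. The frame length, hop size, Hamming window, and direct DFT are assumptions chosen to keep the example self-contained; the slides do not state the parameters actually used.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames (overlap = frame_len - hop)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (assumed window type)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Magnitude of the DFT for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len=64, hop=32):
    """Rows are frames (time), columns are frequency bins."""
    window = hamming(frame_len)
    return [magnitude_spectrum([x * w for x, w in zip(frame, window)])
            for frame in frame_signal(signal, frame_len, hop)]

# Synthetic tone that falls exactly on DFT bin 8 of a 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

A real system would use an FFT routine instead of the direct DFT, which is quadratic in the frame length and is used here only for clarity.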
Feature Extraction (cont.) • Step 2: Recognition window segmentation
Feature Extraction (cont.) • Step 3: Sector image generation
Feature Extraction (cont.) • Step 3: Sector image generation (cont.)
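Feature Extraction (cont.) • Step 4: ART feature extraction • $V_{n,m}(\rho, \theta)$: the ART basis function of order n and m, which is separable along the angular and radial directions: $V_{n,m}(\rho, \theta) = A_m(\theta)\, R_n(\rho)$ • where, in the standard MPEG-7 definition, $A_m(\theta) = \frac{1}{2\pi} e^{jm\theta}$ and $R_n(\rho) = 1$ for $n = 0$, $R_n(\rho) = 2\cos(\pi n \rho)$ for $n \neq 0$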
Feature Extraction (cont.) • Step 4: ART feature extraction (cont.) The 12×12 (N = 12 and M = 12) complex ART basis functions (a) real parts of ART basis functions (b) imaginary parts of ART basis functions
Feature Extraction (cont.) • Step 4: ART feature extraction (cont.)
Feature Extraction (cont.) • Step 4: ART feature extraction (cont.)
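Step 4 can be sketched as projecting an image onto the ART basis and keeping the coefficient magnitudes. The sketch assumes the standard MPEG-7 basis (angular part exp(jmθ)/2π, radial part 1 for n = 0 and 2cos(πnρ) otherwise), a square image mapped onto the unit disk, and normalisation by the (0, 0) magnitude. The function names, default orders, and toy image are illustrative assumptions rather than the authors' code.

```python
import cmath
import math

def art_basis(n, m, rho, theta):
    """ART basis value at polar position (rho, theta), MPEG-7 form (assumed)."""
    radial = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
    angular = cmath.exp(1j * m * theta) / (2.0 * math.pi)
    return angular * radial

def art_features(image, n_radial=3, n_angular=12):
    """image: square 2-D list of intensities. Returns a dict mapping (n, m)
    to the ART coefficient magnitude, normalised by the (0, 0) magnitude."""
    half = (len(image) - 1) / 2.0
    mags = {}
    for n in range(n_radial):
        for m in range(n_angular):
            acc = 0j
            for r, row in enumerate(image):
                for c, value in enumerate(row):
                    x = (c - half) / half   # map pixel centre to [-1, 1]
                    y = (half - r) / half
                    rho = math.hypot(x, y)
                    if value and rho <= 1.0:  # only pixels inside the unit disk
                        basis = art_basis(n, m, rho, math.atan2(y, x))
                        acc += basis.conjugate() * value
            mags[(n, m)] = abs(acc)
    norm = mags[(0, 0)] or 1.0
    return {key: mag / norm for key, mag in mags.items()}

# Toy 9x9 binary shape and the same shape rotated by 90 degrees.
shape = [[0] * 9 for _ in range(9)]
shape[2][5] = shape[3][5] = shape[3][6] = 1
rotated = [list(row) for row in zip(*shape[::-1])]
features = art_features(shape)
features_rotated = art_features(rotated)
```

Because a rotation only changes the phase of each coefficient, the magnitudes of the original and rotated shapes agree, which is the property that makes ART attractive as a shape descriptor.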
Experimental Results COMMON AND LATIN NAMES OF BIRD SPECIES IN THE BIRDSONG DATABASE AND THE NUMBER OF BIRDSONG SEGMENTS IN THE TRAINING SET (NTr) AND TEST SET (NTe) FOR BIRDSONG SEGMENTS OF DIFFERENT DURATIONS (D)
Experimental Results (cont.) Comparison of classification accuracy for different numbers of GMM Gaussian components (G) and distinct PCA thresholds using 6×24 ART basis functions for the recognition of birdsong segments having distinct durations (D)
Experimental Results (cont.) Comparison of classification accuracy on distinct ART basis functions (N×M) for the classification of birdsong segments having different durations (D) with a fixed number of GMM components (G = 5)
Experimental Results (cont.) Comparison of various feature descriptors in terms of classification accuracy (CA)