Improving Musical Genre Classification with RBF Networks Douglas Turnbull Department of Computer Science and Engineering University of California, San Diego June 4, 2003
motivation:
goal: The goal of this project is to improve automatic musical classification by genre.
previous work: A method proposed by Tzanetakis and Cook extracts high-level features from a large database of songs and then uses Gaussian Mixture Model (GMM) and K-nearest neighbor (KNN) classifiers to decide the genre of a novel song.
idea: Use the existing audio feature extraction technology but improve the classification accuracy using Radial Basis Function (RBF) networks.
motivation:
secondary goal: Find techniques for improving RBF network performance.
previous work: An RBF network is a commonly used classifier in machine learning. We would like to explore ways to improve its ability to classify novel data.
ideas: Merge supervised and unsupervised initialization methods for the basis function parameters. Use feature subset selection methods to eliminate unnecessary features.
music: audio feature extraction:
[Diagram: a digital signal (…1001011001…) is passed through MARSYAS digital signal processing to produce a feature vector]
MARSYAS extracts 30 features from each 30-second audio track:
• Timbral Texture (19): music-speech discrimination
• Rhythmic Content (6): beat strength, amplitude, tempo analysis
• Pitch Content (5): frequency of dominant chord, pitch intervals
For this application, the dimension D of our feature vector x = (x1, …, xi, …, xD) is 30.
radial basis functions:
A radial basis function measures how far an input vector (x) is from a prototype vector (μ). We use Gaussians for our M basis functions, Φj(x) = exp(−‖x − μj‖² / 2σj²). We will see three methods for initializing the parameters (μ, σ).
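As a concrete illustration, here is a minimal NumPy sketch of computing the M Gaussian activations for one input vector; the function and argument names are ours, not the paper's.

```python
import numpy as np

def rbf_activations(x, mus, sigmas):
    """Gaussian basis-function outputs Phi_1(x) ... Phi_M(x) for one input x.

    x:      (D,) feature vector
    mus:    (M, D) prototype vectors mu_j
    sigmas: (M,) widths sigma_j
    """
    sq_dist = np.sum((mus - x) ** 2, axis=1)        # ||x - mu_j||^2 for each j
    return np.exp(-sq_dist / (2.0 * sigmas ** 2))   # Phi_j(x)
```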
linear discriminant:
The output vector is a weighted sum of the basis functions: yk(x) = Σj wkj Φj(x). We find the optimal set of weights (W) by minimizing the sum-of-squares error function over a training set:
E = ½ Σn Σk ( yk(xⁿ) − tⁿk )²
where the target value tⁿk is 1 if the nth data point belongs to the kth class, and 0 otherwise.
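Since the error is quadratic in W, the optimum has a closed form. A minimal sketch, assuming the standard least-squares (pseudo-inverse) solution; the names are ours:

```python
import numpy as np

def solve_weights(Phi, T):
    """Least-squares output weights.

    Phi: (N, M) basis-function outputs for the N training points
    T:   (N, C) targets with T[n, k] = 1 iff point n belongs to class k
    Returns W of shape (M, C) minimizing ||Phi @ W - T||^2, so y = Phi @ W.
    """
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W
```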
a radial basis function network:
[Diagram: inputs x (x1 … xi … xD) feed the basis functions Φ (Φ1 … Φj … ΦM); the weights W (w11 … wkj) combine them into outputs y (y1 … yk … yC), which are compared against targets t (t1 … tk … tC)]
constructing RBF networks:
number of basis functions
• Too few make it hard to separate the data
• Too many can cause over-fitting
• The right number depends on the initialization method
initializing the basis function parameters (μ, σ)
• unsupervised: 1. K-means clustering (KM)
• supervised: 2. Maximum Likelihood for Gaussian (MLG), 3. In-class K-means clustering (ICKM)
• the above methods can also be used together (see the sketch below)
improving the basis function parameters (μ, σ)
• use gradient descent
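A minimal sketch of one reading of MLG initialization, with a note on how KM and ICKM differ; using a single spherical width per basis function is our assumption, not necessarily the paper's exact choice:

```python
import numpy as np

def init_mlg(X, y, num_classes):
    """Maximum Likelihood for Gaussian (MLG): one basis function per class,
    centered at the class mean, width taken from the class's spread."""
    mus, sigmas = [], []
    for k in range(num_classes):
        Xk = X[y == k]                      # training points labeled class k
        mu = Xk.mean(axis=0)
        # width from the average squared distance of class-k points to mu
        sigma = np.sqrt(np.mean(np.sum((Xk - mu) ** 2, axis=1)))
        mus.append(mu)
        sigmas.append(sigma)
    return np.array(mus), np.array(sigmas)

# KM instead runs k-means over all of X (ignoring labels) and uses the cluster
# centers as prototypes; ICKM runs k-means separately within each class.
# Merging methods amounts to concatenating their (mu, sigma) lists.
```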
gradient descent on μ, σ:
We differentiate our error function E with respect to σj and μji, then update each parameter by moving down the error surface:
μji ← μji − η1 ∂E/∂μji    σj ← σj − η2 ∂E/∂σj
The learning rate scale factors, η1 and η2, decrease each epoch. (A sketch of one such epoch follows below.)
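A sketch of one epoch of these updates, assuming Gaussian basis functions and the sum-of-squares error defined earlier; the paper's exact update equations may differ in constants:

```python
import numpy as np

def gd_epoch(X, T, mus, sigmas, W, eta1, eta2):
    """One pass over the training set, descending the error surface in
    (mu, sigma). X is (N, D), T is (N, C), W is (M, C)."""
    for x, t in zip(X, T):
        diff = x - mus                              # (M, D): x - mu_j
        sq = np.sum(diff ** 2, axis=1)              # ||x - mu_j||^2
        phi = np.exp(-sq / (2 * sigmas ** 2))       # Phi_j(x)
        err = W.T @ phi - t                         # y - t, shape (C,)
        delta = (W @ err) * phi                     # error reaching Phi_j
        mus -= eta1 * (delta / sigmas ** 2)[:, None] * diff   # dE/dmu_j
        sigmas -= eta2 * delta * sq / sigmas ** 3             # dE/dsigma_j
    return mus, sigmas   # the caller shrinks eta1 and eta2 after each epoch
```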
constructing RBF networks:
• number of basis functions
• initializing the basis function parameters (μ, σ)
• improving the basis function parameters (μ, σ)
• feature subset selection
There exist noisy and/or harmful features that hurt network performance. By isolating and removing these features, we can find better networks. We may also wish to sacrifice accuracy to create a more robust network that requires less computation during training.
Three heuristics for ranking features:
• Wrapper methods: Growing Set (GS) Ranking, Two-Tuple (TT) Ranking
• Filter method: Between-Class Variance (BCV) Ranking
growing set (GS) ranking:
A greedy heuristic that adds the next best feature to a growing set of features (see the sketch below). This method requires training |D|²/2 RBF networks, where the first D networks use 1 feature, the next D−1 networks use 2 features, and so on.
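A sketch of the greedy procedure; score() is a hypothetical helper standing in for "train an RBF network on this feature subset and return its accuracy":

```python
def growing_set_ranking(all_features, score):
    """Greedy forward selection: repeatedly add the feature that, combined
    with the already-chosen set, yields the highest accuracy."""
    chosen, remaining = [], list(all_features)
    while remaining:
        best = max(remaining, key=lambda f: score(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen   # ranked order; trains D + (D-1) + ... + 1 ~ |D|^2/2 nets
```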
two-tuple (TT) ranking:
This greedy heuristic finds the classification accuracy of a network trained on every combination of two features. We select the pair of features that produces the best classification result. The next feature is the one with the largest minimum accuracy when used with the first two features, and so on. This method also requires training |D|²/2 RBF networks, but all of them are trained using only 2 features.
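A sketch of one reading of this heuristic (the "and so on" step is ambiguous, so we keep scoring candidates against the seed pair); score_pair() is a hypothetical helper returning the accuracy of a 2-feature network:

```python
from itertools import combinations

def two_tuple_ranking(all_features, score_pair):
    """Rank features from the accuracies of all 2-feature networks."""
    acc = {frozenset(p): score_pair(*p) for p in combinations(all_features, 2)}
    ranked = list(max(acc, key=acc.get))            # best-scoring pair first
    remaining = [f for f in all_features if f not in ranked]
    while remaining:
        # next feature: largest minimum accuracy when paired with the seed pair
        best = max(remaining,
                   key=lambda f: min(acc[frozenset((f, g))] for g in ranked[:2]))
        ranked.append(best)
        remaining.remove(best)
    return ranked
```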
between-class variance (BCV) ranking:
This method ranks features by their between-class variance. The assumption is that if the class averages for a particular feature are far from the average across all of the data, that feature will be useful for separating novel data.
[Plot: a poorly separating feature (fbad) vs. a well separating feature (fgood)]
Unlike the previous two methods, it does not require training RBF networks; it can be computed in a matter of seconds rather than minutes.
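A sketch of the filter; how the per-feature scores are normalized is our assumption, and the paper may scale them differently:

```python
import numpy as np

def bcv_ranking(X, y):
    """Score each feature by how far the per-class means sit from the global
    mean, relative to the feature's overall variance. X is (N, D)."""
    global_mean = X.mean(axis=0)
    between = np.mean([(X[y == c].mean(axis=0) - global_mean) ** 2
                       for c in np.unique(y)], axis=0)
    scores = between / (X.var(axis=0) + 1e-12)   # avoid division by zero
    return np.argsort(-scores)                   # feature indices, best first
```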
music classification with RBF networks:
experimental setup:
• 1000 30-second songs, 100 songs per genre
• 10 genres: classical, country, disco, hip hop, jazz, rock, blues, reggae, pop, metal
• 30 features extracted per song: timbral texture, rhythmic content, pitch content
• 10-fold cross validation (a fold-split sketch follows below)
results:
• comparison of initialization methods (KM, MLG, ICKM) with and without gradient descent
• comparison of feature ranking methods (GS, TT, BCV)
• table of best classification results
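For concreteness, a minimal sketch of the 10-fold split over the 1000 songs; random fold assignment is our assumption:

```python
import numpy as np

def ten_fold_splits(n_songs=1000, seed=0):
    """Yield (train, test) index arrays for 10-fold cross validation."""
    order = np.random.default_rng(seed).permutation(n_songs)
    for test in np.array_split(order, 10):
        yield np.setdiff1d(order, test), test
```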
basis function initialization methods:
MLG does as well as the other methods while using fewer basis functions.
feature ranking methods: Growing Set (GS) ranking outperforms the other methods
results table:
observations:
• Using multiple initialization methods produces better classification than using only one initialization method.
• Gradient descent boosts performance.
• Subsets of features produce better results than using all of the features.
comparison with previous results:
• RBF networks: 70.9%* (std 0.063)
• GMM with 3 Gaussians per class (Tzanetakis & Cook 2001): 61% (std 0.04)
• Human classification in a similar experiment (Tzanetakis & Cook 2001): 70%
• Support Vector Machine (SVM) (Li & Tzanetakis 2003): 69.1% (std 0.053)
• Linear Discriminant Analysis (LDA) (Li & Tzanetakis 2003): 71.1% (std 0.073)
*(Found by constructing a network with MLG using 26 features (Experiment J) and running gradient descent for 100 epochs)
discussion:
1. create more flexible musical labels
It is not our opinion that music classification is limited to ~70%; rather, the data set used is the limiting factor. The next steps are to find a better system for labeling music and then to create a data set that uses the new labeling system. This involves working with experts such as musicologists. However, two initial ideas are:
• Non-mutually exclusive genres
• A rating system based on the strength of the relationship between a song and each genre
These ideas are cognitively plausible in that we naturally classify music into a number of genres, streams, movements, and generations that are neither mutually exclusive nor always agreed upon. Both ideas can easily be handled by an RBF network by altering the target vectors, as illustrated below.
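For instance (an illustration of ours, not taken from the paper), graded targets drop into the same training machinery as one-hot targets:

```python
import numpy as np

# Standard one-hot target for a song labeled "rock" (index 5 of the 10 genres):
t_hard = np.zeros(10)
t_hard[5] = 1.0

# Hypothetical graded target for non-mutually-exclusive genres; the values are
# illustrative only, expressing the strength of membership in each genre:
t_soft = np.zeros(10)
t_soft[5], t_soft[9], t_soft[2] = 0.9, 0.5, 0.2   # rock, metal, disco
```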
discussion:
2. larger feature sets and feature subset selection
Borrowing from computer vision, one technique that has been successful is to automatically extract tens of thousands of features and then use feature subset selection to find a small set (~30) of good features.
Computer vision features:
• select sub-images of different sizes and locations
• alter resolution and scale factors
• apply filters (e.g., Gabor filters)
Computer audition analogs:
• select sound samples of different lengths and starting locations
• alter pitches and tempos within the frequency domain
• apply filters (e.g., comb filters)
Future work will involve extracting new features and improving existing feature subset selection algorithms.