Teaching machines to appreciate music: classifying songs into three genres using a trusty Multi-Layer Perceptron. A project by Chad Ostrowski & Curtis Reinking for EE 456 (Neural Networks) in the spring of 2009.
Our genres-to-classify are Post-Rock, Folk, and Hip-Hop. Can you guess which is which?
Let's feed songs-as-data-arrays into an MLP and let it do its thing! The sample rate of the music files MATLAB insists on is 44,100 Hz: good ol' .wav, but the right kind of .wav (ACM Waveform, if you're wondering; there are at least four kinds). A lot of our songs run longer than 6 minutes, and 6 minutes * 60 seconds/minute * 44,100 samples/second = 15,876,000 data points, so we'd need an input layer of that size and hidden layers on the order of 2e7 units. Worse: the input size would vary from song to song.
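To make the size problem concrete, here is a quick back-of-the-envelope check (written in Python for illustration; the project itself lived in MATLAB):

```python
# Why feeding raw samples into an MLP is hopeless: the input layer alone
# would need ~16 million units for a six-minute song.
sample_rate = 44_100        # Hz, the sample rate of our .wav files
song_length_s = 6 * 60      # a typical six-minute song, in seconds
input_size = sample_rate * song_length_s
print(input_size)           # 15876000 input neurons -- and it varies per song
```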
Let us chop it up. Let us extract features.
[Figure: raw waveform clips of a folk song and a hip-hop song]
Features, anyone?
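The slides don't spell out exactly how the chopping was done; a minimal sketch, assuming four equal-length clips per song (consistent with the four per-clip means listed later), might look like this:

```python
import numpy as np

def chop_into_clips(samples: np.ndarray, n_clips: int = 4) -> list[np.ndarray]:
    """Split a mono song into n_clips roughly equal-length pieces."""
    return np.array_split(samples, n_clips)
```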
The FFT (the Fast Fourier Transform, an efficient way to compute the Discrete Fourier Transform) ought to be a good way for a computer to learn about the music it listens to.
[Figure: FFT magnitude spectra of a hip-hop clip and a folk clip]
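As a sketch of what that looks like in code, NumPy's real FFT gives the magnitude spectrum of a clip (illustrative only; the original analysis was done in MATLAB):

```python
import numpy as np

def magnitude_spectrum(clip: np.ndarray, sample_rate: int = 44_100):
    """Return the frequency (Hz) and magnitude of each FFT bin for a mono clip."""
    spectrum = np.abs(np.fft.rfft(clip))                   # magnitude per frequency bin
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sample_rate)
    return freqs, spectrum
```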
It turns out double peaks are frightfully common. We mute them.
[Figure: FFT spectra of a hip-hop clip and a folk clip]
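The slides don't define "double peaks"; one plausible reading, assumed here, is two peaks sitting nearly on top of each other in the spectrum, which SciPy's find_peaks can suppress by enforcing a minimum spacing between detected peaks:

```python
from scipy.signal import find_peaks

def spectrum_peaks(spectrum, min_separation_bins=10):
    """Locate spectral peaks while suppressing near-duplicate ("double") peaks.

    min_separation_bins is an illustrative knob, not a value from the original project.
    """
    peak_indices, _ = find_peaks(spectrum, distance=min_separation_bins)
    return peak_indices
```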
So all of our inputs are:
- Size of the song
- Means of the un-FFT-ed clips (that's four inputs)
- Means of the FFT-ed clips (that's four more)
- The average number of "big" peaks
- Locations of the five tallest peaks in the FFT of each clip (twenty inputs)
30 total inputs.
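Putting the pieces together, here is one way that 30-element input vector could be assembled (a sketch under assumptions: four clips per song, five peaks per clip, and a made-up threshold for what counts as a "big" peak; none of this is the original MATLAB code):

```python
import numpy as np

def feature_vector(samples: np.ndarray, n_clips: int = 4, n_peaks: int = 5) -> np.ndarray:
    """Build the 30 inputs: 1 song size + 4 raw-clip means + 4 FFT-clip means
    + 1 average count of "big" peaks + 20 peak locations (5 per clip)."""
    clips = np.array_split(samples, n_clips)
    features = [float(len(samples))]                     # size of the song

    raw_means, fft_means, big_peak_counts, peak_locations = [], [], [], []
    for clip in clips:
        spectrum = np.abs(np.fft.rfft(clip))
        raw_means.append(np.mean(np.abs(clip)))          # mean of the un-FFT-ed clip
        fft_means.append(np.mean(spectrum))              # mean of the FFT-ed clip
        # "big" peaks: bins well above the clip's mean magnitude (threshold is a guess)
        big_peak_counts.append(float(np.sum(spectrum > 5 * np.mean(spectrum))))
        # indices of the five tallest FFT bins in this clip
        peak_locations.extend(sorted(np.argsort(spectrum)[-n_peaks:].tolist()))

    features += raw_means + fft_means
    features.append(float(np.mean(big_peak_counts)))
    features += [float(i) for i in peak_locations]
    assert len(features) == 30
    return np.array(features)
```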
How to output? We recall that output vectors have no boundary problems and give greater accuracy than a single output scalar.
- 100 denotes folk
- 010 denotes post rock
- 001 denotes hip hop
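In code form, the targets are just one-hot vectors, for example:

```python
# One-hot target vectors, ordered (folk, post rock, hip hop) as on the slide.
GENRE_TARGETS = {
    "folk":      [1, 0, 0],
    "post rock": [0, 1, 0],
    "hip hop":   [0, 0, 1],
}
```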
Drum-roll please!! We tested various network sizes, settling on 30x100x100x3. It crapped on us.
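For reference, a rough present-day equivalent of the 30x100x100x3 topology could be written with scikit-learn (the original network was built in MATLAB; everything below is illustrative, not the settings actually used):

```python
from sklearn.neural_network import MLPClassifier

# Two hidden layers of 100 units; the 3-unit output layer follows from having
# three genre classes. Hyperparameters here are placeholders, not tuned values.
mlp = MLPClassifier(hidden_layer_sizes=(100, 100), activation="logistic", max_iter=1000)
# mlp.fit(X_train, y_train)   # X_train: N x 30 feature matrix, y_train: genre labels
```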
(and that’s all we got) (so far)