Segmenting Popular Music Sentence by Sentence

Wan-chi Lee

Basic Idea • In a song, the energy of the audio signal is low in the gaps between sentences. • Try to detect these energy gaps. • Problems: • Accompaniment is present, so the gaps in the singing are not silent. • The dynamic range of the audio signal varies a lot, so a fixed threshold is hard to choose.

Examples of audio signal

Methods • Band-pass filter the signal: • Here I use a 6th-order elliptic filter with a passband of 800 Hz to 1.6 kHz. • For a short sliding window, calculate the average energy of the signal: • I use a 0.1-second window. • Detect the valleys of the average energy by piecewise linear approximation.
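A minimal sketch of the windowed-energy step, assuming the signal has already been band-pass filtered and is available as a plain list of samples. The function name `short_time_energy` and the `hop_sec` parameter are illustrative assumptions; the slides give only the 0.1-second window length, not the hop between windows.

```python
def short_time_energy(samples, sample_rate, window_sec=0.1, hop_sec=None):
    """Average energy (mean of squared samples) per sliding window.

    hop_sec is an assumption: by default the window advances by its own
    length, so consecutive windows do not overlap.
    """
    win = max(1, int(round(window_sec * sample_rate)))
    hop = win if hop_sec is None else max(1, int(round(hop_sec * sample_rate)))
    energies = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        energies.append(sum(x * x for x in frame) / win)
    return energies
```

Each returned value corresponds to one window position, so frame index `i` starts at `i * hop / sample_rate` seconds. A smaller hop gives a smoother energy curve at the cost of more frames to approximate later.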

Piecewise Linear Approximation • I used a top-down method to determine the approximation. • Specify an error bound. • Find the segmentation point that best improves the approximation. • Calculate a linear regression for each segment as the approximation. • If the error bound is not met, repeat the steps above.
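The top-down procedure above can be sketched roughly as follows. This is one possible reading, not necessarily the implementation behind the slides: it assumes the error bound applies to the total squared error over all segments, and it tries every split position exhaustively. The names `fit_line`, `top_down_pla`, and `min_len` are hypothetical.

```python
def fit_line(ys, lo, hi):
    """Least-squares line over ys[lo:hi]; returns (slope, intercept, sse)."""
    n = hi - lo
    xs = range(lo, hi)
    mean_x = sum(xs) / n
    mean_y = sum(ys[lo:hi]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ys[x] - mean_y) for x in xs)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((ys[x] - (slope * x + intercept)) ** 2 for x in xs)
    return slope, intercept, sse


def top_down_pla(ys, error_bound, min_len=2):
    """Split [0, len(ys)) until the total squared error is within error_bound.

    Returns a list of (start, end, slope, intercept), end exclusive.
    """
    bounds = [0, len(ys)]
    while True:
        spans = list(zip(bounds, bounds[1:]))
        errors = [fit_line(ys, a, b)[2] for a, b in spans]
        if sum(errors) <= error_bound:
            break
        # Find the single split that reduces the total error the most.
        best_gain, best_split = 0.0, None
        for (a, b), err in zip(spans, errors):
            for s in range(a + min_len, b - min_len + 1):
                gain = err - fit_line(ys, a, s)[2] - fit_line(ys, s, b)[2]
                if gain > best_gain:
                    best_gain, best_split = gain, s
        if best_split is None:  # no split helps any further
            break
        bounds.append(best_split)
        bounds.sort()
    return [(a, b) + fit_line(ys, a, b)[:2] for a, b in zip(bounds, bounds[1:])]
```

For example, `top_down_pla([0, 1, 2, 3, 4, 10, 9, 8, 7, 6], 1e-6)` splits the curve once, at index 5, into a rising and a falling segment. The exhaustive search is quadratic in the number of frames per split, which is acceptable for a sketch but would need running sums to scale to long recordings.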

Segmentation Point • After finding the linear approximation, choose the points that represent gaps in energy. • Apply some restrictions to keep the segments at a reasonable length.
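The slides do not spell out how the gap points are chosen or which length restrictions are applied, so the following is only an illustrative sketch under stated assumptions: a valley is taken to be a boundary where a falling line segment is followed by a rising one, deeper valleys are preferred, and a hypothetical `min_gap` parameter enforces a minimum distance (in frames) between chosen points.

```python
def pick_cut_points(segments, min_gap):
    """Choose valley boundaries that are at least min_gap frames apart.

    segments: list of (start, end, slope, intercept), end exclusive,
    in frame indices, e.g. the output of a piecewise linear fit.
    """
    candidates = []
    for left, right in zip(segments, segments[1:]):
        if left[2] < 0 and right[2] > 0:  # falling, then rising: a valley
            x = left[1]
            # Fitted energy on either side of the boundary; lower is deeper.
            depth = min(left[2] * (x - 1) + left[3], right[2] * x + right[3])
            candidates.append((depth, x))
    chosen = []
    for depth, x in sorted(candidates):  # deepest valleys first
        if all(abs(x - c) >= min_gap for c in chosen):
            chosen.append(x)
    return sorted(chosen)
```

With non-overlapping 0.1-second windows, a `min_gap` of 20 frames would correspond to sentences of at least 2 seconds. A maximum segment length could be enforced in a similar way, for instance by relaxing the valley test inside any segment that ends up too long.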

Demo and Discussion • I used only one feature; other features can be incorporated. • Heuristic method: no training needed, but many parameters to tune. • It should be integrated with onset detection so that the segmentation points coincide with onsets.
