This presentation describes a method for segmenting popular music sentence by sentence by detecting energy gaps in the audio signal. The main challenges are the accompaniment sound and the widely varying dynamic range, which make a fixed threshold hard to choose. The method band-pass filters the signal, computes the average energy over a short sliding window, and uses a top-down piecewise linear approximation to locate energy valleys as candidate segmentation points. The approximation is refined iteratively until a specified error bound is met.
Segmenting Popular Music Sentence by Sentence
Wan-chi Lee
Basic Idea
• In a song, the energy of the audio signal is low in the gaps between sentences.
• The approach is to detect these energy gaps.
• Problems:
  • Accompaniment sound is present, so the gaps are not silent.
  • The dynamic range of the audio signal varies a lot, so a threshold is hard to choose.
Methods
• Band-pass filter the signal.
  • I use a 6th-order elliptic filter with a passband of 800 Hz to 1.6 kHz.
• Calculate the average energy of the signal over a short sliding window.
  • I use a 0.1-second window.
• Detect the valleys of the average energy by piecewise linear approximation.
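The first two steps can be sketched as follows. This is a minimal illustration using NumPy and SciPy, not the author's implementation. The function name `bandpass_energy`, the pass-band ripple (1 dB), the stop-band attenuation (40 dB), the hop size (half a window), and the 8 kHz sample rate of the synthetic signal are all assumptions, since the slides give only the filter order, the passband, and the window length. It also assumes that "6th order" refers to the final band-pass filter, which in SciPy corresponds to `N=3`.

```python
import numpy as np
from scipy import signal


def bandpass_energy(x, fs, low_hz=800.0, high_hz=1600.0, win_s=0.1,
                    hop_s=0.05, rp_db=1.0, rs_db=40.0):
    """Band-pass filter x, then return (frame centre times, mean energy per frame)."""
    # N=3 with btype="bandpass" gives a 6th-order filter overall.
    # rp_db and rs_db are illustrative; the slides do not state ripple values.
    sos = signal.ellip(3, rp_db, rs_db, [low_hz, high_hz],
                       btype="bandpass", output="sos", fs=fs)
    y = signal.sosfilt(sos, np.asarray(x, dtype=float))

    win = max(1, int(round(win_s * fs)))
    hop = max(1, int(round(hop_s * fs)))  # hop size is an assumption
    n_frames = 1 + (len(y) - win) // hop if len(y) >= win else 0
    energy = np.array([np.mean(y[i * hop:i * hop + win] ** 2)
                       for i in range(n_frames)])
    times = (np.arange(n_frames) * hop + win / 2) / fs
    return times, energy


# Synthetic check: a 1 kHz tone, half a second of silence, then the tone again.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
x = np.concatenate([tone, np.zeros(fs // 2), tone])
times, energy = bandpass_energy(x, fs)
gap_time = times[np.argmin(energy)]  # falls inside the silent stretch
```

On real recordings the accompaniment keeps the energy in the gaps well above zero, which is why the valleys are located by curve fitting rather than by a fixed threshold.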
Piecewise Linear Approximation
• I used a top-down method to determine the approximation:
  • Specify an error bound.
  • Find the segmentation point that best improves the approximation.
  • Calculate a linear regression for each segment as the approximation.
  • If the error bound is not achieved, repeat the steps above.
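The top-down procedure can be sketched in plain Python as follows. The slides do not say how the error is measured, so this sketch assumes the total sum of squared residuals over all segments. The names `fit_sse` and `top_down_pla` and the `min_len` parameter are hypothetical.

```python
def fit_sse(y, a, b):
    """Sum of squared residuals of a least-squares line fitted to y[a:b]."""
    m = b - a
    if m < 3:
        return 0.0  # one or two points are fitted exactly
    mean_x = (a + b - 1) / 2
    mean_y = sum(y[a:b]) / m
    sxx = sum((x - mean_x) ** 2 for x in range(a, b))
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in range(a, b))
    slope = sxy / sxx
    return sum((y[x] - mean_y - slope * (x - mean_x)) ** 2 for x in range(a, b))


def top_down_pla(y, error_bound, min_len=2):
    """Return breakpoints [0, ..., len(y)] of a top-down piecewise linear fit."""
    breaks = [0, len(y)]
    errors = [fit_sse(y, 0, len(y))]
    # Assumed error measure: total squared residual over all segments.
    while sum(errors) > error_bound:
        best = None  # (gain, segment index, split position, left sse, right sse)
        for i, (a, b) in enumerate(zip(breaks, breaks[1:])):
            for s in range(a + min_len, b - min_len + 1):
                left, right = fit_sse(y, a, s), fit_sse(y, s, b)
                gain = errors[i] - left - right
                if best is None or gain > best[0]:
                    best = (gain, i, s, left, right)
        if best is None:
            break  # no segment is long enough to split further
        _, i, s, left, right = best
        breaks.insert(i + 1, s)
        errors[i:i + 1] = [left, right]
    return breaks


# A V-shaped curve needs a single split near its lowest point.
energy = [abs(i - 10) for i in range(21)]
breaks = top_down_pla(energy, error_bound=1e-6)
```

This version re-evaluates every candidate split on each pass, which is simple but quadratic in the number of frames. That is acceptable for a sketch, while a practical version would likely cache the per-segment sums.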
Segmentation Point
• After finding the linear approximation, choose the points that represent gaps in energy.
• Place some restrictions on the segments to keep them at a reasonable length.
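One possible way to realise this selection is sketched below. The slides do not specify the selection rule or the length restrictions, so everything here is an assumption: a valley is taken to be a vertex of the piecewise linear fit that is lower than both neighbouring vertices, and the only restriction applied is a minimum spacing `min_len_s` between chosen points, with deeper valleys given priority. The function name `pick_segmentation_points` is hypothetical, and a maximum segment length is not enforced.

```python
def pick_segmentation_points(knots, min_len_s=2.0):
    """Choose valley vertices that are at least min_len_s seconds apart.

    knots: list of (time_s, energy) vertices of the piecewise linear fit,
    sorted by time.
    """
    # A valley is an interior vertex lower than both of its neighbours.
    valleys = [
        (t, e)
        for (_, e_prev), (t, e), (_, e_next) in zip(knots, knots[1:], knots[2:])
        if e < e_prev and e < e_next
    ]
    chosen = []
    # Deepest valleys first, so a shallow valley never displaces a deep one.
    for t, _ in sorted(valleys, key=lambda v: v[1]):
        if all(abs(t - c) >= min_len_s for c in chosen):
            chosen.append(t)
    return sorted(chosen)


# Illustrative vertices only; these are not measurements from the slides.
knots = [(0, 1.0), (1, 0.2), (2, 0.9), (2.5, 0.1), (4, 1.0), (6, 0.3), (8, 0.8)]
points = pick_segmentation_points(knots, min_len_s=2.0)
```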
Demo and Discussion
• I used only one feature; other features can be incorporated.
• The method is heuristic: no training is needed, but there are many parameters to tune.
• It should be integrated with onset detection so that the segmentation points coincide with onsets.
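The suggested integration with onset detection is not described further in the slides. As one hedged illustration, segmentation points could be moved to the nearest detected onset when one lies close enough. The onset times are assumed to come from a separate detector that is not shown, and the name `snap_to_onsets` and the tolerance `max_shift_s` are hypothetical.

```python
from bisect import bisect_left


def snap_to_onsets(points, onsets, max_shift_s=0.5):
    """Move each point to the nearest onset within max_shift_s seconds.

    onsets must be sorted in ascending order. Points with no onset close
    enough are left unchanged.
    """
    snapped = []
    for p in points:
        i = bisect_left(onsets, p)
        candidates = onsets[max(0, i - 1):i + 1]  # neighbours on either side
        nearest = min(candidates, key=lambda o: abs(o - p), default=None)
        if nearest is not None and abs(nearest - p) <= max_shift_s:
            snapped.append(nearest)
        else:
            snapped.append(p)
    return snapped


# Illustrative values only.
aligned = snap_to_onsets([2.5, 6.0], [0.4, 2.7, 5.0, 9.0], max_shift_s=0.5)
```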