A System for Hybridizing Vocal Performance By Kim Hang Lau
Parameters of the singing voice • Parameters of the singing voice can be loosely classified as: • Timbre • Pitch contour • Time contour (rhythm) • Amplitude envelope (projections)
Vocal Modification • Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre • Commercially available units include • Intonation corrector • Pitch/formant processor • Harmonizer • Vocoder
Objectives • Prototype a system for vocal modification • Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung, target vocal sample • Simulates a transfer of singing techniques from a target vocalist to a source vocalist – thus a hybridizing vocal performance
Order of Presentation • System Overview • Individual components • System evaluation • System limitations • Conclusions and recommendations
System Overview • Three components • Pitch-marking • Time-alignment • Time/pitch/amplitude modification engine • Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances
Pitch-marking and Glottal Closure Instants (GCIs) • Information generated from pitch-marking • Pitch period • Amplitude envelope • Voiced/unvoiced segment boundaries • [Figure: pitch-marks P and P’, with 5 ms annotations]
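As a rough illustration of how these three kinds of information can be read off a list of pitch-marks, here is a minimal Python sketch. It is not the presented system: the function name `analyze_pitch_marks`, the 20 ms gap threshold, and the choice of peak absolute amplitude per period as the envelope are all assumptions made for the example. Pitch-marks are assumed to be a non-empty, increasing list of sample indices.

```python
def analyze_pitch_marks(signal, marks, fs, max_period_s=0.02):
    """Derive pitch periods, a per-period amplitude envelope, and voiced
    segment boundaries from pitch-marks given as sample indices.

    Assumption: a gap between marks longer than max_period_s (hypothetical
    default of 20 ms, i.e. a pitch below 50 Hz) separates voiced segments.
    """
    periods = []    # (mark index, period in seconds)
    envelope = []   # (mark index, peak |amplitude| within that period)
    segments = []   # (first mark, last mark) of each voiced segment
    seg_start = marks[0]
    for prev, cur in zip(marks, marks[1:]):
        period = (cur - prev) / fs
        if period > max_period_s:
            # Gap too long to be one pitch period: close the segment here.
            segments.append((seg_start, prev))
            seg_start = cur
            continue
        periods.append((prev, period))
        envelope.append((prev, max(abs(x) for x in signal[prev:cur])))
    segments.append((seg_start, marks[-1]))
    return periods, envelope, segments
```

The reciprocal of each period gives the local pitch in Hz, so the same list doubles as a pitch contour.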
Pitch-marking using the Dyadic Wavelet Transform (DyWT) • Kadambe adapted Mallat’s algorithm for edge detection in images to the detection of GCIs in speech signals • He assumed a correspondence between edges in an image and GCIs in a speech signal • Computing the DyWT for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking • If a peak detected in the DyWT matches across two consecutive scales, starting from the lower scale, that time instant is taken as a GCI
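The matching rule in the last bullet can be sketched as follows. This Python fragment covers only the decision step and assumes the DyWT peaks have already been located at each scale; the function name `select_gcis` and the `tolerance` window (in samples) are hypothetical and not taken from Kadambe's description.

```python
def select_gcis(peaks_by_scale, tolerance=2):
    """Pick GCIs from DyWT peak locations.

    peaks_by_scale: list of peak-index lists, ordered from the lowest
    scale to the highest (for example 2^3, 2^4, 2^5).
    tolerance: hypothetical matching window in samples.
    """
    gcis = []
    # Compare each scale with the next higher one, lowest pair first.
    for lower, upper in zip(peaks_by_scale, peaks_by_scale[1:]):
        for p in lower:
            matched = any(abs(p - q) <= tolerance for q in upper)
            already = any(abs(p - g) <= tolerance for g in gcis)
            if matched and not already:
                gcis.append(p)
    return sorted(gcis)
```

For instance, `select_gcis([[10, 50, 73, 90], [11, 49, 91, 130], [12, 50, 90, 131]])` keeps 10, 50, 90 and 130 and discards the isolated peak at 73, which has no counterpart at the next scale.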
[Figure: DyWT decomposition of the original signal into scales 2^1 to 2^5 plus base-band; labels: Mallat, Kadambe]
The proposed pitch-marking scheme • Detection principle • Detection of the scale that contains the fundamental period • Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered • Features • 4X decimation to support high sampling rates • Frame based processing and error correction for possible quasi-real-time detection
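The detection principle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: per-scale frame powers are taken as given, the name `find_fundamental_scale` is hypothetical, and the `jump_ratio` threshold is an invented placeholder for whatever "considerable jump" means in the actual scheme. The sketch also cannot detect the case where the highest scale itself contains the fundamental.

```python
def find_fundamental_scale(frame_power_by_scale, jump_ratio=4.0):
    """Return the scale exponent at which frame power jumps, or None.

    frame_power_by_scale: dict {scale exponent: frame power}.
    Scales are visited from the highest exponent (lowest frequency) down.
    jump_ratio is a hypothetical threshold, not a value from the slides.
    """
    scales = sorted(frame_power_by_scale, reverse=True)
    for higher, lower in zip(scales, scales[1:]):
        prev_power = frame_power_by_scale[higher]
        power = frame_power_by_scale[lower]
        # A large power increase relative to the previous (higher) scale
        # is taken as the sign that this scale holds the fundamental.
        if power > jump_ratio * max(prev_power, 1e-12):
            return lower
    return None
```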
Comparison of results with Auto-Tune • [Figure panels: Proposed system, Auto-Tune]
Time/pitch/amplitude modification engine • Parameters, each a function of n: time-modification factor, pitch-modification factor, amplitude-modification factor • D(n): time-warping function
TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add) • Time-domain splicing overlap-add method • Used in prosodic modification of speech
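To make the overlap-add idea concrete, here is a minimal TD-PSOLA sketch in Python for a single voiced segment. It is a simplification and not the engine presented here: the factors are constants rather than functions of n, amplitude modification and gain normalization are omitted, and the name `td_psola` and its parameters are assumptions. Pitch-marks must be at least two increasing integer sample indices.

```python
import math

def td_psola(signal, marks, time_factor=1.0, pitch_factor=1.0):
    """Minimal TD-PSOLA sketch for one voiced segment.

    signal: list of samples; marks: increasing integer pitch-mark indices.
    time_factor > 1 lengthens the output; pitch_factor > 1 raises pitch.
    Constant factors are a simplification made for this sketch.
    """
    out_len = int(len(signal) * time_factor)
    out = [0.0] * out_len
    t_syn = float(marks[0]) * time_factor   # first synthesis mark
    while t_syn < out_len:
        # Map the synthesis instant back to the source time axis and
        # pick the nearest analysis pitch-mark.
        t_src = t_syn / time_factor
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - t_src))
        # Local pitch period from the neighbouring marks.
        if i + 1 < len(marks):
            period = marks[i + 1] - marks[i]
        else:
            period = marks[i] - marks[i - 1]
        # Overlap-add a two-period Hann-windowed grain centred on the mark.
        centre = int(round(t_syn))
        for k in range(-period, period + 1):
            src, dst = marks[i] + k, centre + k
            if 0 <= src < len(signal) and 0 <= dst < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * k / period))
                out[dst] += w * signal[src]
        # Advance by the pitch-modified period.
        t_syn += period / pitch_factor
    return out
```

With both factors at 1 and evenly spaced marks, the overlapping Hann windows sum to one, so the interior of the output reproduces the input. Repeating or skipping grains changes duration, while changing the spacing between grains changes pitch.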
Evaluation of the modification engine • [Figure panels: Original, TD-PSOLA, Auto-Tune]
Time-alignment • Based on Verhelst’s prototype system that applies Dynamic Time Warping (DTW) • He claimed that the basic local constraint produces the most accurate time-warping path • Rapid (roughly quadratic) increase in computation as the length of comparison increases • Accuracy deteriorates as the length of comparison increases
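For reference, DTW with the basic local constraint can be sketched as below. This is a textbook formulation in Python, not Verhelst's implementation: the name `dtw_path` is hypothetical, and scalar features with an absolute-difference distance are assumed for brevity where a real system would compare feature vectors.

```python
def dtw_path(source, target):
    """DTW with the basic local constraint: each cell is reached from its
    left, lower, or diagonal neighbour.

    Features are scalars here for brevity (an assumption of this sketch).
    Returns (total cost, warping path as (source index, target index) pairs).
    """
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(source[i - 1] - target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return cost[n][m], path[::-1]
```

The nested loops fill an n-by-m table, which is why the cost grows with the product of the two sequence lengths. Because horizontal and vertical steps are unrestricted, a single source frame can be paired with arbitrarily many target frames.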
Adaptations from Verhelst’s method • Time-alignment is performed on a voiced/unvoiced segmental basis • DTW for voiced segments • Linear Time Warping (LTW) for unvoiced segments • Global constraints are introduced to further reduce computation • Synchronization of voiced/unvoiced segments is required and is done by manual editing in the current implementation
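A small Python sketch of the two simpler pieces follows. Both functions are illustrative assumptions: `linear_time_warp` maps target frames to source frames along a straight line, and `in_global_band` shows one common form of global constraint (a band around the diagonal). The slides do not say which global constraint was actually used, and the `width` parameter is hypothetical.

```python
def linear_time_warp(n_src, n_tgt):
    """Map each target frame index to a source frame index along a
    straight line from (0, 0) to (n_src - 1, n_tgt - 1)."""
    if n_tgt == 1:
        return [0]
    scale = (n_src - 1) / (n_tgt - 1)
    # int(x + 0.5) rounds half up for the non-negative values used here.
    return [int(j * scale + 0.5) for j in range(n_tgt)]

def in_global_band(i, j, n_src, n_tgt, width):
    """True if cell (i, j) lies within `width` source frames of the
    diagonal. A diagonal band is an assumed example of a global
    constraint; DTW would skip cells for which this returns False."""
    diagonal = j * (n_src - 1) / max(n_tgt - 1, 1)
    return abs(i - diagonal) <= width
```

Restricting DTW to such a band means only the cells near the diagonal are evaluated, which is where the saving in computation comes from.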
Manipulation of modification parameters • Simple smoothing of the modification factors using linear-phase FIR low-pass filters is performed before feeding them to the modification engine
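One way such smoothing could look is sketched below in Python. The filter design is an assumption: a normalized Hamming window is used as a simple symmetric (and therefore linear-phase) low-pass FIR, with an odd number of taps and edge padding so that the output stays aligned with the input. The actual filter lengths and cutoffs are not given in the slides, and the names `hamming_lowpass` and `smooth` are hypothetical.

```python
import math

def hamming_lowpass(num_taps):
    """Symmetric, unity-gain low-pass FIR taps from a Hamming window.
    Symmetry of the taps is what makes the filter linear phase."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
         for k in range(num_taps)]
    total = sum(w)
    return [x / total for x in w]

def smooth(values, taps):
    """Filter a parameter track with an odd-length symmetric FIR.
    The track is padded with its end values, and the output is centred
    on each input sample so the constant filter delay is compensated."""
    half = len(taps) // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [sum(t * padded[n + k] for k, t in enumerate(taps))
            for n in range(len(values))]
```

Because the filter delay is the same at all frequencies, the smoothed contour is not skewed in time relative to the signal it will modify.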
System Limitations • Segmentation • Lack of a reliable technique for voiced/unvoiced segmentation • Segmentation and classification of different vocal sounds is key to devising rules for modification • Modification engine • Lacks the capability to handle pitch transitions • Total dependence on the pitch-marking stage
System Limitations • Pitch-marking • Proposed system lacks robustness • Despite desirable time-response of the wavelet filter bank, its frequency response is not capable of isolating harmonics effectively and efficiently • Time-alignment • The DTW basic local constraint allows infinite time expansion and compression. • This factor often causes distortions in the synthesized vocal sample
Conclusions and Recommendations • The current system works well for slow and continuous singing • Further improvements to the individual components are recommended to handle greater dynamic changes in the vocal signal, thereby extending the current good results to a wider range of singing styles
Questions & Answers