Tuning the Unit Selection Voices (Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)
Kishore Prahallad
Email: kishore@iiit.ac.in
International Institute of Information Technology (IIIT) Hyderabad, India & Language Technologies Institute, Carnegie Mellon University
Kishore Prahallad (kishore@iiit.ac.in), IIIT-Hyderabad
Better Labels
• Research:
  • Automatic segmentation models such as HMMs or neural networks can be tuned to obtain better labels.
• Practical:
  • Use an existing state-of-the-art speech segmentation algorithm.
  • Manually verify and correct the misaligned labels.
  • For small databases, manual correction is the more appropriate choice.
  • Emulabel and Wavesurfer are tools suited to this purpose.
Pitch Marks
• Why pitch marks:
  • In speech synthesis, pitch-synchronous processing is commonly used both to extract features and during concatenation (unlike block processing, which works on fixed-size frames).
  • Pitch-synchronous processing leads to smoother concatenation of two speech segments, and thus better quality.
• Pitch extraction is done with an autocorrelation-based algorithm.
• You may need to understand the implementation details in order to tune the pitch extraction.
• Tune the pitch extraction parameters to the specific speaker (your voice talent).
• Reference: http://festvox.org/bsv/bsv-pitchmarks-sect.html
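To make the idea of autocorrelation-based pitch extraction concrete, here is a minimal sketch in plain Python. It is not the Festvox or Edinburgh Speech Tools implementation: the function name, the synthetic 125 Hz test signal, and the 16 kHz sample rate are all assumptions chosen for illustration. It searches only the lags that fall between a minimum and a maximum expected pitch period and returns the lag with the strongest autocorrelation.

```python
import math


def autocorr_pitch_period(frame, sample_rate, min_period, max_period):
    """Estimate the pitch period (seconds) of one frame by autocorrelation.

    Only lags between min_period and max_period (seconds) are searched.
    """
    min_lag = int(round(min_period * sample_rate))
    max_lag = int(round(max_period * sample_rate))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the frame with itself at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Synthetic test signal (an assumption for illustration): a 125 Hz sine,
# 40 ms long, sampled at 16 kHz. Its true period is 1 / 125 = 0.008 s.
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 125.0 * n / SAMPLE_RATE) for n in range(640)]

period = autocorr_pitch_period(frame, SAMPLE_RATE, 0.005, 0.012)
print(f"estimated period: {period:.4f} s, F0: {1.0 / period:.1f} Hz")
```

The search window matters: if the speaker's true pitch period lies outside the minimum and maximum bounds, the strongest peak inside the window is a wrong one, which is why the bounds need tuning per speaker.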
What do you need to do to extract better pitch marks?
• Read: http://festvox.org/bsv/bsv-pitchmarks-sect.html
• STEP 1:
  • Open bin/make_pm_wave
  • Edit the PM_ARGS line
  • -min and -max correspond to the *expected* time difference, in seconds, between two major peaks in the autocorrelation sequence (that is, the shortest and longest expected pitch period)
  • -min 0.005 (-min 0.0016 for female)
  • -max 0.012 (-max 0.007 for female)
  • -lx_lf 200 (400 for female, depending on the speaker)
  • -lx_hf 40 (200 for female, depending on the speaker)
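Because -min and -max are pitch periods in seconds, it can help to translate them into fundamental frequency (F0 = 1 / period) and back when choosing values for a new speaker. The sketch below does only that arithmetic; the helper names and the 100 to 250 Hz example speaker are hypothetical, not part of Festvox.

```python
def f0_range_hz(min_period_s, max_period_s):
    """Return (lowest, highest) F0 in Hz covered by a pitch-period range."""
    # The longest period gives the lowest F0 and vice versa.
    return 1.0 / max_period_s, 1.0 / min_period_s


def period_range_s(f0_low_hz, f0_high_hz):
    """Return (min, max) pitch period in seconds for an F0 range in Hz."""
    return 1.0 / f0_high_hz, 1.0 / f0_low_hz


# Values quoted on the slide.
male_low, male_high = f0_range_hz(0.005, 0.012)
female_low, female_high = f0_range_hz(0.0016, 0.007)
print(f"male:   {male_low:.0f} to {male_high:.0f} Hz")
print(f"female: {female_low:.0f} to {female_high:.0f} Hz")

# Hypothetical speaker whose F0 was measured between 100 Hz and 250 Hz.
pm_min, pm_max = period_range_s(100.0, 250.0)
print(f"-min {pm_min:.4f} -max {pm_max:.4f}")
```

Read this way, the slide's male settings cover roughly 83 to 200 Hz and the female settings roughly 143 to 625 Hz.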
Step 2: Check the output
• Modify the script to your approximate needs,
• run it on a single file,
• then run the script that translates the pitch mark file into a label file suitable for emulabel:
    bin/make_pm_wave wav/awb_0001.wav
    bin/make_pm_pmlab pm/awb_0001.pm
• You can then display the pitch marks with:
    emulabel etc/emu_pm awb_0001
Step 2
• Good pitch marks appear as red lines at the maximum-amplitude positions of the waveform (as in the screenshot on the original slide).
• If they do not, repeat Step 1.
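The visual check can be complemented by a rough automatic one. The following is a sketch under stated assumptions, not a Festvox tool: it assumes you have already loaded the waveform as a list of samples and the pitch mark times as a list of seconds (reading the .wav and .pm files is left out), and the function name, search window, and tolerance are hypothetical. It flags every mark that is not close to the largest sample in its neighbourhood.

```python
def misplaced_marks(samples, sample_rate, mark_times,
                    window_s=0.002, tolerance_s=0.0005):
    """Return the pitch mark times that are not at a local amplitude maximum.

    For each mark, the largest sample within +/- window_s is located; the
    mark is flagged if it is more than tolerance_s away from that peak.
    """
    flagged = []
    half = int(window_s * sample_rate)
    for t in mark_times:
        center = int(round(t * sample_rate))
        lo = max(0, center - half)
        hi = min(len(samples), center + half + 1)
        peak = max(range(lo, hi), key=lambda i: samples[i])
        if abs(peak - center) / sample_rate > tolerance_s:
            flagged.append(t)
    return flagged


# Toy signal (an assumption for illustration): 8 kHz, one peak every 10 ms.
SAMPLE_RATE = 8000
samples = [0.0] * 400
for peak_index in (80, 160, 240, 320):
    samples[peak_index] = 1.0

# The third mark is deliberately placed 1 ms late.
marks = [0.01, 0.02, 0.031, 0.04]
bad = misplaced_marks(samples, SAMPLE_RATE, marks)
print("marks to inspect:", bad)
```

A long list of flagged marks for a file suggests the pitch mark parameters still need adjusting; a few isolated ones are better checked by eye in emulabel.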
Step 3: Rebuild the Voice
Once new labels and new pitch marks have been extracted, repeat the following steps.
• 6. Smooth the pitch marks
    bin/make_pm_fix pm/*.pm
• 7. Generate Mel cepstral coefficients
    bin/make_mcep wav/*.wav
• 8. Generate the utterance structure
    festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")'
• 9. Cluster the units
    festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")'
• 10. Test the voice
    festival festvox/iiit_time_pra_ldom '(voice_iiit_time_pra_ldom)'
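Since these steps are repeated every time the labels or pitch marks change, it may be convenient to collect them in a small driver script. The sketch below is one possible way to do that; the command strings are copied from steps 6 to 9 above, while the function name and the dry-run default are assumptions. It must be run from the voice directory, and step 10 is left out because testing the voice is interactive.

```python
import subprocess

# Commands copied from steps 6-9 above; run them from the voice directory.
REBUILD_STEPS = [
    ("Smooth the pitch marks", "bin/make_pm_fix pm/*.pm"),
    ("Generate Mel cepstral coefficients", "bin/make_mcep wav/*.wav"),
    ("Generate utterance structure",
     "festival -b festvox/build_ldom.scm '(build_utts \"etc/time.data\")'"),
    ("Cluster the units",
     "festival -b festvox/build_ldom.scm '(build_clunits \"etc/time.data\")'"),
]


def rebuild_voice(dry_run=True):
    """Run (or, by default, only print) the rebuild commands in order."""
    executed = []
    for description, command in REBUILD_STEPS:
        print(f"{description}: {command}")
        if not dry_run:
            # shell=True lets the shell expand pm/*.pm and wav/*.wav;
            # check=True stops at the first failing step.
            subprocess.run(command, shell=True, check=True)
        executed.append(command)
    return executed


# Dry run: only prints the commands. Call rebuild_voice(dry_run=False)
# inside the voice directory to actually execute them.
commands = rebuild_voice(dry_run=True)
```

Stopping at the first failure avoids clustering units on top of stale or partial cepstral files.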
Evaluation
• Compare the voice samples synthesized:
  • before and after changing the pitch marks (with no change in the labels)
  • with both better labels and better pitch marks
References
• http://festvox.org
• 11-752 CMU course slides: http://festvox.org/festtut/
• 11-752 CMU course lecture notes: http://festvox.org/festtut/notes/festtut_toc.html
• http://festvox.org/bsv/bsv-pitchmarks-sect.html