Topic: Tuning the Pitch Markers
Prerequisite: Pitch Extraction
Kishore Prahallad
Email: skishore@cs.cmu.edu
Carnegie Mellon University & International Institute of Information Technology Hyderabad
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Objective of this Lecture
• To tune the pitch markers for better-quality synthesis
Pitch Marks
• Why pitch marks:
  • In speech synthesis, pitch-synchronous processing (as opposed to block processing) is commonly used to extract features and during concatenation
  • Pitch-synchronous processing leads to smoother concatenation of two speech segments, and thus better quality
• Pitch extraction is done with an autocorrelation-based algorithm
• You may need to understand the implementation details in order to tune the pitch extraction
• Tune the parameters of pitch extraction to the specific speaker (your voice talent)
• Reference: http://festvox.org/bsv/bsv-pitchmarks-sect.html
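To make the autocorrelation idea concrete, here is a minimal sketch in plain Python. It is not the implementation used by the festvox scripts; the function name `autocorr_pitch_period` and the synthetic 100 Hz test tone are assumptions made purely for illustration. It picks the lag with the highest autocorrelation inside an expected range of pitch periods.

```python
import math

def autocorr_pitch_period(samples, sample_rate, min_period, max_period):
    """Return the lag (in seconds) with the highest autocorrelation.

    Only lags between min_period and max_period (both in seconds)
    are searched. Hypothetical helper, for illustration only.
    """
    min_lag = max(1, int(round(min_period * sample_rate)))
    max_lag = min(len(samples) - 1, int(round(max_period * sample_rate)))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the signal with itself at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# Synthetic 100 Hz tone at 16 kHz, standing in for a voiced speech frame.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

period = autocorr_pitch_period(tone, sample_rate, 0.005, 0.012)
print(f"estimated period: {period:.4f} s, F0: {1 / period:.1f} Hz")
```

The search range is what makes speaker-specific tuning matter: if the true pitch period lies outside the range, the strongest peak inside the range is a wrong one (for example a multiple of the true period).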
What do you need to do to extract better pitch marks?
• Read: http://festvox.org/bsv/bsv-pitchmarks-sect.html
• STEP 1:
  • Open bin/make_pm_wave
  • Edit the line PM_ARGS
  • -min and -max correspond to the *expected* time difference, in seconds, between two major peaks in the autocorrelation sequence
    • -min 0.005 (-min 0.0016 for female)
    • -max 0.012 (-max 0.007 for female)
  • -lx_lf 200 (400 for female, depending on the speaker)
  • -lx_hf 40 (200 for female, depending on the speaker)
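Since -min and -max are time differences between successive peaks, their reciprocals give the pitch range the settings allow. The short sketch below does that arithmetic for the values listed above; the helper name `period_to_hz` and the dictionary layout are only illustrative.

```python
def period_to_hz(period_seconds):
    """Convert a pitch period in seconds to a frequency in Hz."""
    return 1.0 / period_seconds

# (-min, -max) values from the slide, in seconds.
pm_ranges = {
    "male": (0.005, 0.012),
    "female": (0.0016, 0.007),
}

for speaker, (min_period, max_period) in pm_ranges.items():
    # The longest period gives the lowest pitch, and vice versa.
    low = period_to_hz(max_period)
    high = period_to_hz(min_period)
    print(f"{speaker}: {low:.0f}-{high:.0f} Hz")
# male: 83-200 Hz
# female: 143-625 Hz
```

If your voice talent's pitch regularly falls outside the printed range, that is a sign that -min or -max needs adjusting.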
Step 2: Check the output
• Modify the script to your approximate needs,
• run it on a single file,
• then run the script that translates the pitchmark file into a labeled file suitable for emulabel:
    bin/make_pm_wave wav/awb_0001.wav
    bin/make_pm_pmlab pm/awb_0001.pm
• You can then display the pitchmarks with:
    emulabel etc/emu_pm awb_0001
Step 2
• Good pitch marks look like those in the slide's figure: red lines at the maximum-amplitude positions
• If yours do not, repeat Step 1
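Visual inspection can be complemented by a rough numeric check. The sketch below assumes you already have the pitch mark times in seconds as a Python list (reading the .pm file is not shown, and the sample times are made up). It flags consecutive marks whose spacing falls outside the -min/-max range from Step 1. The function name `suspicious_intervals` is hypothetical.

```python
def suspicious_intervals(mark_times, min_period, max_period):
    """Return (start, end, gap) for consecutive pitch marks whose
    spacing lies outside [min_period, max_period] seconds."""
    flagged = []
    for start, end in zip(mark_times, mark_times[1:]):
        gap = end - start
        if gap < min_period or gap > max_period:
            flagged.append((start, end, gap))
    return flagged

# Made-up pitch mark times in seconds, for illustration only.
marks = [0.100, 0.108, 0.116, 0.119, 0.127, 0.150]

for start, end, gap in suspicious_intervals(marks, 0.005, 0.012):
    print(f"{start:.3f} -> {end:.3f}: gap {gap * 1000:.1f} ms")
```

A long gap is not necessarily an error, since unvoiced or silent regions have no pitch periods to mark. Many very short gaps in voiced speech, however, suggest that the parameters need another pass.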
Step 3: Rebuild the Voice
Once new labels and new pitch marks are extracted, repeat the following steps.
6. Smooth the pitch markers
    bin/make_pm_fix pm/*.pm
7. Generate Mel cepstral coefficients
    bin/make_mcep wav/*.wav
8. Generate the utterance structure
    festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")'
9. Cluster the units
    festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")'
10. Test the voice
    festival festvox/iiit_time_pra_ldom '(voice_iiit_time_pra_ldom)'
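Because steps 6 to 9 have to be repeated after every change to the pitch marks, it can help to wrap them in a small driver. The sketch below is one possible way to do that, assuming it is run from the voice directory. The commands are copied from the steps above, while the `rebuild` function and its dry-run default are illustrative choices. Step 10 is left out because it starts an interactive Festival session.

```python
import subprocess

# Steps 6-9 from the slide, as (description, shell command) pairs.
REBUILD_STEPS = [
    ("Smooth the pitch markers", "bin/make_pm_fix pm/*.pm"),
    ("Generate Mel cepstral coefficients", "bin/make_mcep wav/*.wav"),
    ("Generate the utterance structure",
     "festival -b festvox/build_ldom.scm '(build_utts \"etc/time.data\")'"),
    ("Cluster the units",
     "festival -b festvox/build_ldom.scm '(build_clunits \"etc/time.data\")'"),
]

def rebuild(run=False):
    """Print each step; execute it only when run=True.

    Returns the list of commands in the order they are (or would be) run.
    """
    commands = []
    for description, command in REBUILD_STEPS:
        print(f"{description}: {command}")
        if run:
            # shell=True so that the pm/*.pm and wav/*.wav globs expand;
            # check=True stops at the first failing step.
            subprocess.run(command, shell=True, check=True)
        commands.append(command)
    return commands

# Dry run by default: nothing is executed, the commands are only listed.
rebuild()
```

Calling `rebuild(run=True)` executes the steps in order and raises an error if any of them fails, so a broken pitchmark pass does not silently feed into clustering.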
Evaluation
• Compare the voice samples synthesized:
  • before and after changing the pitch marks (no change in the labels)
  • with better labels plus better pitch marks
Additional Reading for the lecture
• http://festvox.org
• 11-752 CMU Course Lecture Notes: http://festvox.org/festtut/notes/festtut_toc.html
• http://festvox.org/bsv/bsv-pitchmarks-sect.html