
Evaluation of Corpus based Synthesizers




Presentation Transcript


1. Evaluation of Corpus based Synthesizers
• The Blizzard Challenge – 2005: Evaluating corpus-based speech synthesis on common datasets (Alan W. Black and Keiichi Tokuda)
• Large Scale Evaluation of Corpus based Synthesizers: Results and Lessons from the Blizzard Challenge 2005 (Christina L. Bennett)
• Presented by: Rohit Kumar

2. What are they Evaluating?
• Corpus based speech synthesis systems
• 2 primary elements of any such system:
  • Corpus (high quality speech data)
  • Approach to build a Text to Speech system
• The quality of the Text to Speech system built from the corpus is heavily tied to the quality of the speech corpus
• How do we evaluate the approach, then? A common corpus (database)

3. What are they Evaluating?
• Quality of the approach
  • Not considering how good the corpus itself is
• Capability to quickly build systems given the corpus
  • TTS development has evolved from being a science to being a toolkit
  • Again, not considering the time to create the corpus
• Tug of war between the time taken to create a high quality corpus, fine tuning the system (manual work), and the merit of the approach itself
• Reliability of each particular evaluation method (Black & Tokuda)
• Reliability of each listener group for evaluation (Black & Tokuda) (Bennett); a sketch of one way to check this follows below
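Neither paper publishes its analysis scripts, so the following is only a minimal illustration of the listener-group reliability question: compare how two listener groups rank the same set of systems using Spearman rank correlation. The group names and scores below are invented placeholders, not Blizzard 2005 results.

```python
# Minimal sketch: checking agreement between listener groups by comparing
# how they rank the same systems. Scores are invented placeholders.

def rank(scores):
    """Return the rank (1 = highest) of each system by its mean score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {system: i + 1 for i, system in enumerate(ordered)}

def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for untied ranks over the same systems."""
    systems = list(ranks_a)
    n = len(systems)
    d_squared = sum((ranks_a[s] - ranks_b[s]) ** 2 for s in systems)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical per-group mean MOS for six systems plus natural speech.
group_volunteers = {"A": 3.1, "B": 2.8, "C": 3.4, "D": 2.5,
                    "E": 3.0, "F": 2.9, "natural": 4.7}
group_undergrads = {"A": 3.0, "B": 2.6, "C": 3.5, "D": 2.4,
                    "E": 3.1, "F": 2.7, "natural": 4.8}

rho = spearman_rho(rank(group_volunteers), rank(group_undergrads))
print(f"Spearman rho between listener groups: {rho:.2f}")
```

A rho close to 1 would mean the two groups order the systems almost identically, which is the kind of agreement the papers look for when deciding whether cheaper listener groups are reliable.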

4. Alright. How to Evaluate?
• Create common databases
• Issues with common databases:
  • Design parameters, size of databases, etc.
  • Non-technical logistics: cost of creating databases
• Using the CMU-ARCTIC databases

5. Alright. How to Evaluate?
• Evaluate different quality measures
  • Quality is a really broad term: intelligibility, naturalness, etc.
• 5 tests of 3 types (a minimal scoring sketch follows this slide):
  • 3 Mean Opinion Score (MOS) tests over different domains: novels (in-domain), news, conversation
  • DRT/MRT: phonetically confusable words embedded in sentences
  • Semantically Unpredictable Sentences
• Create common databases
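The papers do not give their scoring code; as a rough sketch, MOS ratings (1 = bad, 5 = excellent) can be aggregated per system and per test domain as below. All system names and ratings here are hypothetical.

```python
# Minimal sketch: aggregating Mean Opinion Score ratings per system and
# per test domain. Ratings are hypothetical placeholders.
from collections import defaultdict
from statistics import mean

# Each record: (system, domain, rating on a 1-5 scale).
ratings = [
    ("A", "novel", 4), ("A", "novel", 3), ("A", "news", 3),
    ("A", "conversation", 4), ("B", "novel", 2), ("B", "news", 3),
    ("B", "conversation", 2), ("B", "conversation", 3),
]

by_condition = defaultdict(list)
for system, domain, score in ratings:
    by_condition[(system, domain)].append(score)

for (system, domain), scores in sorted(by_condition.items()):
    print(f"system {system}, {domain}: MOS = {mean(scores):.2f} (n = {len(scores)})")
```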

6. Alright. How to Evaluate?
• Evaluate different quality measures: 5 tests of 3 types
• Create common databases
• 6 teams = 6 systems: different approaches
• A 7th system added: real human speech

7. Alright. How to Evaluate?
• Evaluate different quality measures: 5 tests of 3 types
• Create common databases
• 6 teams = 6 systems: different approaches
• 2 databases released in Phase 1 to develop approaches (practice databases)
• Another 2 databases released in Phase 2 (with time-bounded submission)

8. Alright. How to Evaluate?
• Evaluate different quality measures: 5 tests of 3 types
• Create common databases
• 6 teams = 6 systems: different approaches
• 2-phase challenge
• Web based evaluation:
  • Participants choose a test and complete it
  • Can do the whole set of tests in multiple sessions
  • Evaluates 100 sentences per participant

9. Alright. How to Evaluate?
• Evaluate different quality measures: 5 tests of 3 types
• Create common databases
• 6 teams = 6 systems: different approaches
• 2-phase challenge
• Web based evaluation
• Different types of listeners:
  • Speech experts, volunteers, US undergrads
  • Special incentive to take the test a 2nd time

10. Alright. How to Evaluate?
• Evaluate different quality measures: 5 tests of 3 types
• Create common databases
• 6 teams = 6 systems: different approaches
• 2-phase challenge
• Web based evaluation
• Different types of listeners
• Any questions about the evaluation setup?

11. Fine. So what did they get?
• Evaluation of 6 systems + 1 real speech
• Observations:
  • Real speech consistently best
  • Lots of inconsistency across tests, but agreement on the best system
  • Listener groups V & U very similar for the MOS tests

12. Additional Agenda
• Comparing voices
• Exit poll: votes for voices
• Inconsistencies between votes and scores
• Consistency in votes for voices across listener groups

13. Additional Agenda (contd.)

14. Discussion
• Numbers given are all averages; no variance figures
  • Consistency of scores of each system?
• Ordering of tests: participant's choice
• Measuring speed of development?
  • Nothing in the evaluation method as such measures speed of development
  • Some participants who submitted papers about their systems in this competition did give those figures
  • Also, no control on number of man-hours or computational power
• Testing the approach on quality of speech only
  • Issues like how much computational effort it takes are not looked at
• Web based evaluation (Black & Tokuda)
  • Uncontrolled random variables: participant's environment, network connectivity
  • Ensuring usage of the common database (and no additional corpus)
• Voice conversion: similarity tests (Black & Tokuda)
• Word Error Rate calculation for phonetically ambiguous pairs? (see the sketch below)
• Non-native participants' effect on word error rates (Bennett)
• Homophone words (bean/been) (Bennett)
• Looking back at what they were evaluating
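The papers do not spell out the WER scoring code, so here is a minimal sketch of one way the word-error-rate question above could be handled: Levenshtein edit distance over word lists, with an optional homophone map so that bean/been style confusions are not counted as errors. The HOMOPHONES table and the example sentences are hypothetical illustrations, not the challenge's actual word lists.

```python
# Minimal sketch: word error rate via Levenshtein distance over word lists,
# with an optional homophone map so that e.g. "been" typed for "bean" is
# not counted as an error. The map is a tiny hypothetical example.

HOMOPHONES = {"been": "bean", "there": "their"}

def normalize(words, fold_homophones=False):
    words = [w.lower().strip(".,?!") for w in words]
    if fold_homophones:
        words = [HOMOPHONES.get(w, w) for w in words]
    return words

def word_error_rate(reference, hypothesis, fold_homophones=False):
    ref = normalize(reference.split(), fold_homophones)
    hyp = normalize(hypothesis.split(), fold_homophones)
    # Standard dynamic-programming edit distance: substitutions,
    # insertions, and deletions all cost 1.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution/match
    return dist[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the bean was cold", "the been was cold"))                        # 0.25
print(word_error_rate("the bean was cold", "the been was cold", fold_homophones=True))  # 0.0
```

The same per-sentence scores could also be summarized with a variance (e.g. statistics.pvariance) rather than just a mean, which addresses the first discussion point about averages hiding the consistency of each system.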
