
Automatic Pruning of Unit Selection Speech Databases for Synthesis without loss of Naturalness

Rohit Kumar (+), S. P. Kishore (+, **). (+) International Institute of Information Technology, Hyderabad, India; (**) Carnegie Mellon University, Pittsburgh. ICSLP – INTERSPEECH 2004, Jeju Island, Korea.

Presentation Transcript


  1. Automatic Pruning of Unit Selection Speech Databases for Synthesis without loss of Naturalness
     Rohit Kumar (+), S. P. Kishore (+, **)
     (+) International Institute of Information Technology, Hyderabad, India
     (**) Carnegie Mellon University, Pittsburgh
     ICSLP – INTERSPEECH 2004, Jeju Island, Korea

  2. Organization of the Talk
     • Introduction to Unit Selection based T.T.S.
     • Need / Scope for Pruning of Speech Databases
     • Our Aims in this Work
     • Low Memory Device Synthesizer
     • Definitions of Neutral and Optimal Units
     • Implementation and Results
     • How do we prune?
     • Ranking of Units
     • Heuristic for Creating a Database of Any Size
     • Optimal Sized Database: Perceptual Evaluation
     • Conclusions

  3. Unit Selection based T.T.S.
     [Block diagram] Synthesis path: Text → Text / Linguistic Processing → Sequence of Basic Units and Linguistic Information → Unit Selection Algorithms → Selected Units → Signal Processing → Speech.
     Inventory building path: Speech Corpus (Transcriptions, Signal, Labels, Features) → Inventory Building Modules → Basic Unit Inventory.
     Basic Unit Inventory layout:
       Unit   Instance   Features
       A      1 .. n     x(A1), y(A1), .. ; x(A2), y(A2), .. ; .. ; x(An), y(An), ..
       B      1 .. m     x(B1), y(B1), .. ; .. ; x(Bm), y(Bm), ..
       …

  4. Unit Pruning
     • The typical size of a high-quality Unit Selection Database is large (~100 MB to 500 MB)
     • Using Unit Pruning techniques, the database size can be significantly reduced without loss in quality
     • Unit Pruning refers to the removal of unit instances from the Unit Selection Database that do not add to (or may even harm) the quality of synthesized speech.

  5. Need / Scope for Pruning
     1. To Improve Quality: << Deviant Units >>
        • Units having features too deviant from the usual values of such features
        • These units rarely get selected due to their high selection costs
        • Removal of such units from the database improves the quality of the synthesized output
     2. To Reduce Size: << Redundant Units >>
        • Units having very similar feature sets
        • Do not contribute significantly to the diversity of units in the database
        • Removal of such units from the database does no harm to synthesis quality and helps in reducing database size

  6. Our Aims in this Work
     Multiple aims:
     • To build a Low Memory Device Synthesizer (for PDAs, mobiles)
     • To be able to create a Database of Any Size with a corresponding quality
     • To arrive at an Optimal Size of the Database without any loss of quality

  8. Low Memory Device Synthesizer
     • The requirement is a Speech Synthesizer that fits into a low-memory device like a PDA
     • The database of a normal Unit Selection based TTS is prohibitively large for a small device
     • So we trade off: [ Quality ↔ Size ]
     • But keep all possible basic units
     • So instead of multiple instances of each basic unit, we keep only one instance of each basic unit
     QUESTION: How to choose the one most suitable instance from the various instances of each unit?

  9. QUESTION: How to choose the one most suitable instance from the various instances of each unit?
     NEUTRAL UNITS
     Hypothesis: The best instance is the one that is prosodically neutral and will have minimal contextual effects. Neutral Units will join best with Neutral Units.
     Definition: The Neutral (Average) Unit is the unit instance that has features closest to the average of the features:
     • Average Pitch
     • Average Duration
     • Average Energy
     So <P_Neutral, D_Neutral, E_Neutral> is closest to <P_Ideal, D_Ideal, E_Ideal>
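
The neutral-unit definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which distance is used, so the sketch assumes a squared Euclidean distance with each feature scaled by its standard deviation, and the function name `neutral_instance` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def neutral_instance(instances):
    """Return the index of the instance whose (pitch, duration, energy)
    triple is closest to the per-feature average over all instances."""
    columns = list(zip(*instances))
    means = [mean(col) for col in columns]
    # Assumption: scale each feature by its standard deviation so that
    # pitch, duration and energy are comparable; fall back to 1.0 when a
    # feature has no spread, to avoid dividing by zero.
    scales = [pstdev(col) or 1.0 for col in columns]

    def distance(inst):
        return sum(((v - m) / s) ** 2 for v, m, s in zip(inst, means, scales))

    return min(range(len(instances)), key=lambda i: distance(instances[i]))

# Hypothetical (pitch in Hz, duration in ms, energy) triples for one unit.
instances = [(180.0, 90.0, 0.60), (210.0, 120.0, 0.80), (150.0, 60.0, 0.40)]
best = neutral_instance(instances)
```

Here the first instance coincides with the average of all three features, so it is the one picked as the neutral unit.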

  10. QUESTION: How to choose the one most suitable instance from the various instances of each unit? (contd.)
     OPTIMAL UNITS
     Alternative Hypothesis: The best instance is the one that joins most suitably in all the contexts that it is likely to appear in.
     Definition: The unit instance that joins most suitably with all the units that appear in the context of the instances of the unit under consideration.

  11. Optimal Units
     Let {A_1, A_2, …, A_i, …, A_n} be the instances of a basic unit A.
     Let A_{i-1} and A_{i+1} be the units preceding and succeeding the instance A_i in the corpus.
     Global Prosodic Mismatch Function (GPMF): [equation not preserved in this transcript]
     But P_Expected[A_{i-1}] = P[A_i], D_Expected[A_{i-1}] = D[A_i], E_Expected[A_{i-1}] = E[A_i]
     By definition, the Optimal Instance of the unit is the one that minimizes GPMF.
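
The GPMF equation itself is missing from the transcript, so the following Python sketch uses a stand-in and should not be read as the authors' formula. It assumes that the features expected in the context of an instance equal that instance's own pitch, duration and energy (as the slide states), and that the mismatch of a candidate is the weighted sum of absolute feature differences against every instance of the same unit. The names `gpmf` and `optimal_instance`, the weights, and the sample data are all hypothetical.

```python
def gpmf(candidate, instances, weights=(1.0, 1.0, 1.0)):
    """Stand-in mismatch: weighted absolute difference between the
    candidate's (pitch, duration, energy) and the features expected in
    the context of every instance of the same unit (assumed form)."""
    return sum(
        w * abs(c - expected)
        for other in instances
        for w, c, expected in zip(weights, candidate, other)
    )

def optimal_instance(instances, weights=(1.0, 1.0, 1.0)):
    """Return the index of the instance that minimizes the stand-in GPMF."""
    return min(
        range(len(instances)),
        key=lambda i: gpmf(instances[i], instances, weights),
    )

# Hypothetical (pitch, duration, energy) triples for one unit.
instances = [(100.0, 50.0, 1.0), (110.0, 55.0, 1.2), (200.0, 90.0, 3.0)]
best = optimal_instance(instances)
```

Under this assumed form the optimal instance is effectively the medoid of the unit's instances, which is why the middle triple is chosen here rather than either extreme.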

  12. Low Memory Device Synthesizer: Implementation
     [Block diagram] Synthesis path: Text → Text / Linguistic Processing → Sequence of Basic Units and Linguistic Information → Signal Processing → Speech.
     Inventory: the full inventory (Unit A with instances 1 .. n, Unit B with instances 1 .. m, …, each with features x(A1), y(A1), ..) passes through Most Suitable Instance Selection to give a Basic Unit Inventory with one instance per unit (Unit: A, B, C, …, P, Q → Instance: x, y, z, …, w, m).
     Neutral / Optimal Unit Selection has now moved from Synthesis Time to Inventory Building Time.

  13. Low Memory Device Synthesizer: Results
     Perceptual tests: 8 subjects scored 10 sentences on a scale of 0 (worst) to 5 (best).
     Database F: Optimal Units is the best-performing approach.

  14. Low Memory Device Synthesizer: Results
     An LMDS system was implemented on a handheld computing device.
     Hindi database consisting of 2786 unique basic units (syllables, phonemes) collected using the Optimal Unit approach.
     • Actual database size @ 16 kHz, 256 kbps = 180 MB
     • GSM coded @ 8 kHz → database size = 1.27 MB
     • G722 coded @ 16 kHz → database size = 5.02 MB

  16. How do we Prune?
     For every unit in the database,
     • Score each instance of the unit for the (un)desirability of that particular instance given all the other instances
     • Pick the top x% of the instances of the unit and remove all the others
     (x) is the Pruning Control Parameter.
     Question: How do we rank the units?
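
The two pruning steps above can be sketched as follows. This is a minimal illustration, assuming that the scores are already available, that "top" means the highest score, and that at least one instance of every unit is kept so that no basic unit disappears; the function name `prune` and the sample scores are hypothetical.

```python
import math

def prune(scored_instances, keep_percent):
    """Keep the top keep_percent of instances of every unit.

    scored_instances maps a unit name to a list of (instance_id, score).
    """
    pruned = {}
    for unit, scored in scored_instances.items():
        # Rank the instances of this unit from highest to lowest score.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        # Assumption: always keep at least one instance per unit.
        keep = max(1, math.ceil(len(ranked) * keep_percent / 100.0))
        pruned[unit] = [inst_id for inst_id, _ in ranked[:keep]]
    return pruned

# Hypothetical scores for two units.
database = {
    "ka": [("ka_1", 0.2), ("ka_2", 0.9), ("ka_3", 0.5), ("ka_4", 0.1)],
    "a": [("a_1", 0.3)],
}
pruned = prune(database, keep_percent=50)
```

With x = 50, the unit "ka" keeps its two best-scoring instances and the single-instance unit "a" is left untouched.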

  17. Ranking the Unit Occurrences
     Using the weighted Global Prosodic Mismatch Function (GPMF), two quantities are computed for each instance x:
     1. Measure of Undesirability of the instance (U_x)
     2. Inter-Instance Repulsion (R_x)
     SCORE_x = U_x - (W_REPULSION × R_x)
     Ranking is in descending order of Score.
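
The scoring rule on this slide is simple enough to write out directly. The sketch below takes the undesirability and repulsion values as given inputs, since the transcript does not show how they are derived from the GPMF; the function name `rank_instances` and the numbers are hypothetical.

```python
def rank_instances(undesirability, repulsion, w_repulsion):
    """Score each instance as U_x - W_REPULSION * R_x and return the
    instance ids in descending order of score, plus the scores."""
    scores = {
        inst: undesirability[inst] - w_repulsion * repulsion[inst]
        for inst in undesirability
    }
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores

# Hypothetical per-instance values for one unit.
U = {"a1": 1.0, "a2": 0.8, "a3": 0.5}
R = {"a1": 0.4, "a2": 0.1, "a3": 0.0}

order_high_w, _ = rank_instances(U, R, w_repulsion=2.0)
order_low_w, _ = rank_instances(U, R, w_repulsion=0.5)
```

Changing the repulsion weight reorders the instances: with the larger weight the repulsion term dominates, with the smaller weight the ranking follows the first term more closely.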

  18. Database of Any Desired Size
     Having the instances of units in ranked order, we need a Pruning Control Parameter (x%) to decide what kind of database we want.
     Experiment: Hindi Speech Corpus (96 minutes), with 2 kinds of basic units:
     • Syllables: Unique: 2391, Total: 23096
     • Phonemes: Unique: 49, Total: 54734
     Pruning Control Parameters:
     • P%: percentage of phonemes to be kept
     • S%: percentage of syllables to be kept
     • W_REPULSION: Inter-Instance Repulsion weight in scoring
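
To give a feel for what P% and S% mean for this corpus, the following Python sketch works through the arithmetic on the counts quoted above. It is a rough estimate only: it applies the percentage to the corpus totals, whereas pruning unit by unit would round per unit and give slightly different numbers. The names and the example percentages are hypothetical.

```python
# Counts quoted on the slide for the 96-minute Hindi corpus.
CORPUS = {
    "syllable": {"unique": 2391, "total": 23096},
    "phoneme": {"unique": 49, "total": 54734},
}

def average_instances(kind):
    """Average number of instances available per unique unit."""
    counts = CORPUS[kind]
    return counts["total"] / counts["unique"]

def estimated_kept(kind, keep_percent):
    """Approximate number of instances left after keeping keep_percent."""
    return round(CORPUS[kind]["total"] * keep_percent / 100.0)

syllables_kept = estimated_kept("syllable", 50)  # hypothetical S% = 50
phonemes_kept = estimated_kept("phoneme", 10)    # hypothetical P% = 10
```

Syllables average fewer than ten instances each while phonemes average over a thousand, which suggests why the two unit types get separate control parameters.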

  19. Database of Any Desired Size
     We created several pruned databases using different sets of pruning control parameters.
     A database-specific empirical formula has been derived to produce a pruned database of any desired size.

  20. Optimal Database Size
     Goal: to find an optimal set of pruning parameters, so that the minimal database size can be achieved without degradation in the quality (naturalness, perceptibility) of speech.
     Perceptual tests were conducted on several pruned databases created with different Pruning Control Parameters:
     • 10 databases with different Pruning Control Parameter values
     • 8 subjects ranked 5 sentences each on a scale of 0 (worst) to 5 (best)

  21. Optimal Database Size (contd.) [results figure not preserved in this transcript]

  22. Optimal Database Size (contd.) [figure with results for W_REPULSION = 2.0 and W_REPULSION = 0.5]

  23. Optimal Database Size (contd.) [same figure for W_REPULSION = 2.0 and W_REPULSION = 0.5, annotated "Optimal Size Around here"]

  24. Conclusions
     • Various approaches for selecting the most suitable instance of a unit type in a unit selection database proposed.
     • GPMF-based Optimal Unit found to be most suitable.
     • Technique for ranking unit instances using GPMF described
     • Used for pruning the unit selection database
     • A Low Memory Device Synthesizer implemented
     • Database-specific empirical formula derived to produce a database of any desired size (based on a set of suitable pruning control parameters)
     • Optimal Sized Database created (pruned) without loss of any naturalness
