Building a Catalan diphone voice
Ariadna Font Llitjos
May 10, 2001
Defining the phoneset
• Most Catalan phones (34) plus 2 Spanish phones (th and jj)
• Reason: all Catalan speakers also have the Spanish phones, and many words borrowed from Spanish are in most Catalan speakers' lexicons
• Left out phones that need a much finer classification than the one made for English phones (beta, gamma, etc.)
Generating Diphone Schema
• Mostly the same as Spanish, but with the new set of phones
• Catalan has 8 vowels (not counting stress), whereas Spanish has only 5, so a level of vheight had to be added (high, mid-high, mid-low, low) (draw graph on the board)
• Mapping Catalan phones to a predefined set of phones
• Overgenerative: the voice is better suited to pronounce foreign or nonsense words whose phones are in the language but whose combinations are not legal in it
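To make the "overgenerative" point concrete, here is a minimal Python sketch of a schema that lists every ordered phone pair without any phonotactic filter. It is not the festvox schema generator; the function name `diphone_schema` and the toy inventory are assumptions made for illustration.

```python
from itertools import product


def diphone_schema(phones, silence="pau"):
    """Return every ordered phone pair, including pairs with silence."""
    inventory = [silence] + [p for p in phones if p != silence]
    # Overgenerative on purpose: no phonotactic filter is applied, so
    # pairs that never occur in real words of the language are still listed.
    return [
        f"{left}-{right}"
        for left, right in product(inventory, repeat=2)
        if not (left == silence and right == silence)
    ]


# Tiny hypothetical inventory, not the actual 36-phone set.
toy_phones = ["a", "o", "l", "s"]
schema = diphone_schema(toy_phones)
print(len(schema))  # 5 * 5 pairs minus the silence-silence pair
print(schema[:5])
```

A real schema would usually be pruned or structured by phone class, but the unfiltered version shows why the resulting voice can also handle foreign and nonsense words.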
Mapping Catalan phones to a predefined set of phones
• Options: Spanish and English
• My choice: English
• Reasons:
  • English has more vowel phones, so it is more appropriate than Spanish
  • Spanish phones have already been mapped to English phones, so it is better to map the Catalan phones directly to English rather than indirectly
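A direct mapping of this kind can be pictured as a simple lookup table. The Python sketch below is only an illustration: the table `CATALAN_TO_ENGLISH`, its entries, and the function `map_phones` are hypothetical and are not the mapping actually used for this voice.

```python
# Hypothetical Catalan-to-English phone table, for illustration only.
# The English-side names are assumed, not taken from the original slides.
CATALAN_TO_ENGLISH = {
    "pau": "pau",
    "a": "aa",
    "o": "ow",
    "i": "iy",
    "l": "l",
    "s": "s",
    "k": "k",
    "r": "r",
    "d": "d",
    "n": "n",
}


def map_phones(phones, table=CATALAN_TO_ENGLISH):
    """Map each phone through the table, failing loudly on gaps."""
    missing = sorted({p for p in phones if p not in table})
    if missing:
        raise KeyError(f"no mapping for: {', '.join(missing)}")
    return [table[p] for p in phones]


mapped = map_phones("pau o l a pau".split())
print(mapped)
```

Failing on an unmapped phone, rather than passing it through silently, makes gaps in the table visible before any prompts are generated.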
Generating and recording the prompts
• 1109 prompts (recorded on festvox0)
• Lots of room noise (typing, door, talking, etc.)
• Microphone not always in the same position
• Different power, and even different intonation and duration, throughout the whole recording process
Labeling nonsense words
• Automatically:
  • make_labs
  • make_diph_index
• Manually:
  • Find a set of diphones that are wrong and look them up in dic/afldiph.est
  • Edit and correct the corresponding file with emulabel
  • Rerun make_diph_index (etc.)
Extracting pitchmarks and LPC coefficients
• Automatically:
  • make_pm_wav (edit to modify the pitch range of the speaker)
  • find_powerfactors (tells us what general power difference exists between files; calculates a table of power modifiers for each file)
  • make_lpc
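The idea of a per-file power modifier can be sketched in a few lines of Python. This is not the algorithm inside find_powerfactors; it simply assumes that each file is scaled towards the mean RMS level of the set, and the file names and sample values are made up.

```python
import math


def rms(samples):
    """Root-mean-square level of one recording."""
    if not samples:
        raise ValueError("empty recording")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def power_factors(recordings):
    """Return a gain per file that moves its RMS level to the set's mean."""
    levels = {name: rms(samples) for name, samples in recordings.items()}
    if any(level == 0 for level in levels.values()):
        raise ValueError("cannot normalise a silent recording")
    target = sum(levels.values()) / len(levels)
    return {name: target / level for name, level in levels.items()}


# Hypothetical file names and toy samples: one quiet file, one loud file.
toy = {
    "afl_0001": [0.2, -0.2, 0.2, -0.2],
    "afl_0002": [0.4, -0.4, 0.4, -0.4],
}
factors = power_factors(toy)
for name, gain in sorted(factors.items()):
    print(name, round(gain, 3))
```

The quiet file receives a gain above 1 and the loud file a gain below 1, which is the kind of table that evens out recordings made with a moving microphone.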
Testing phone synthesis
• (SayPhones '(pau o l a pau s o k l a r i a d n a pau))
• Catalan voice
• Spanish voice
• English voice (modifying the phones)
Catalan voice is still quite bad
• Bad example
• But it does have a basic Spanish phone… and without it, it would sound like this
• And here is how kal_diphone sounds
Added tokenization
• To be able to say numbers in Catalan (followed the Spanish tokenizer)
• Show file
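As a rough picture of what number tokenization has to do, the Python sketch below expands one- and two-digit numbers into Catalan words. It is not the tokenizer file shown in the talk, which follows the Spanish tokenizer; the function names are assumptions, and only 0 to 99 are covered, using the central Catalan forms (for example "u" for 1).

```python
import re

UNITS = [
    "zero", "u", "dos", "tres", "quatre", "cinc", "sis", "set", "vuit", "nou",
    "deu", "onze", "dotze", "tretze", "catorze", "quinze", "setze",
    "disset", "divuit", "dinou",
]
TENS = {
    20: "vint", 30: "trenta", 40: "quaranta", 50: "cinquanta",
    60: "seixanta", 70: "setanta", 80: "vuitanta", 90: "noranta",
}


def number_to_catalan(n):
    """Spell out an integer from 0 to 99 in Catalan."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return UNITS[n]
    tens, unit = n - n % 10, n % 10
    if unit == 0:
        return TENS[tens]
    # 21-29 use "-i-" (vint-i-dos); from 31 on a plain hyphen (trenta-dos).
    joiner = "-i-" if tens == 20 else "-"
    return TENS[tens] + joiner + UNITS[unit]


def expand_numbers(text):
    """Replace standalone one- or two-digit numbers with their words."""
    return re.sub(
        r"\b\d{1,2}\b", lambda m: number_to_catalan(int(m.group())), text
    )


print(expand_numbers("tinc 42 anys"))
```

Longer numbers, ordinals, dates and gendered forms are left untouched here; a full tokenizer would need rules for each of them.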
Added some lexical entries
• Letters of the alphabet, symbols, punctuation, some content words…
Phrasing, duration and intonation
• Not there yet
• Nor can I get it to SayText
Summary: building a diphone voice
• Define phoneset
• Generate diphone schema
• Generate prompts
• Record prompts
• Label prompts
• Extract pitchmarks and LPC coefficients
• Test phone synthesis
• Hand correct labels
• Add tokenizer
• Add lexicon
• Add prosody, durations and intonation
• Test and evaluate voice
• Package for distribution