A Text-to-Speech Synthesis System Presented By: Michael Beddaoui and Abdel-Aziz El-Solh
Presentation Outline • Introduction • Background • 3 Components of TTS System • Text Pre-processing (Aziz) • Prosody (Mike) • Concatenation (Mike) • Summary • What has been done / Future Work • Conclusion • Questions
What is a TTS System? Definition: • A system which takes a sequence of words as input and converts it to speech Applications: • Services for the hearing impaired • Reading email aloud Existing TTS Systems: • Festival • Bell Labs TTS
Different TTS Systems Phoneme-Based TTS System • Phonemes are: • The minimal distinctive phonetic units • Relatively small in number (39 phonemes in English) • Disadvantage: phonemes ignore the transitional sound between neighbouring units
Different TTS Systems (cont’d) Diphone-Based TTS System • Diphones are: • Made up of 2 phonemes • Incorporate transitional sound • Make for better sounding speech • Disadvantage: over 1500 diphones in the English language
Fundamental Components • TTS System: words → Text Pre-processing → Prosody → Concatenation
Text Pre-Processing • Input • String of characters (sentence) • Output • String of diphone symbols • Objective • Perform sentence level analysis • Punctuation marks • Pauses between words • Convert all input to corresponding diphones
Text Pre-Processing (Block Diagram) • Number Converter → Acronym Converter → Word Segmenter → Word to Diphone Translator (Phonetization) → MLDS • The Diphone Dictionary feeds the Word to Diphone Translator
Number Converter • Replace numerals with their textual versions: 100 → one hundred • Handle fractional and decimal numbers: 0.25 → point two five
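The two conversions on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the presenters' implementation; the function names (`int_to_words`, `numeral_to_words`, `convert_numbers`) and the 0 to 999,999 range limit are assumptions made for the example. A zero whole part is dropped so that 0.25 reads "point two five", as on the slide.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n):
    """Spell out an integer in the (assumed) supported range 0..999,999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("sketch only supports 0..999,999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + int_to_words(rest)
    return int_to_words(thousands) + " thousand" + tail


def numeral_to_words(token):
    """Convert a numeral such as '100' or '0.25' to its spoken form."""
    whole, _, frac = token.partition(".")
    words = []
    # Keep the whole part unless it is a bare zero in front of a fraction.
    if int(whole) != 0 or not frac:
        words.append(int_to_words(int(whole)))
    if frac:
        words.append("point")
        words.extend(ONES[int(digit)] for digit in frac)  # digit by digit
    return " ".join(words)


def convert_numbers(sentence):
    """Replace every numeral in a sentence with its textual version."""
    return re.sub(r"\d+(?:\.\d+)?",
                  lambda m: numeral_to_words(m.group()), sentence)
```

For example, `convert_numbers("Add 0.25 to 100.")` leaves the final full stop in place for the later punctuation analysis.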
Acronym Converter • Replace acronyms with single letter components: A.B.C. → A B C • Change abbreviations to full textual format: Mr. → Mister
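A small sketch of both replacements, written in Python for illustration. Only the "Mr." entry comes from the slide; the other entries in the `ABBREVIATIONS` table, the regular expression, and the function name `expand_acronyms` are assumptions.

```python
import re

# Hypothetical abbreviation table; only "Mr." appears on the slide.
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Missus", "Dr.": "Doctor"}

# Two or more single letters, each followed by a period, e.g. "A.B.C."
ACRONYM = re.compile(r"\b(?:[A-Za-z]\.){2,}")


def expand_acronyms(sentence):
    """Spell acronyms letter by letter and expand known abbreviations."""
    def spell(match):
        # "A.B.C." -> "A B C"
        return " ".join(match.group().replace(".", " ").split())

    sentence = ACRONYM.sub(spell, sentence)
    for abbreviation, full in ABBREVIATIONS.items():
        sentence = re.sub(r"\b" + re.escape(abbreviation), full, sentence)
    return sentence
```

Note that the acronym pattern also consumes the last period of "A.B.C.", so a sentence that ends in an acronym loses its final full stop in this sketch.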
Word Segmenter • Divide sentence into word segments • Special delimiter to separate segments (i.e. ‘||’) • Segments can be: • A single word • An acronym • A numeral • Identify punctuation marks
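The segmentation step can be sketched as a tokenizer that labels each segment and then joins the segments with the delimiter. This is an illustrative Python sketch; the token patterns, the label names, and the functions `segment_sentence` and `join_segments` are assumptions rather than the presenters' code.

```python
import re

DELIMITER = "||"  # delimiter shown on the slide

# One named alternative per segment type; the order matters, because an
# acronym such as "A.B.C." must be tried before a plain word.
TOKEN = re.compile(
    r"(?P<acronym>(?:[A-Za-z]\.){2,})"
    r"|(?P<numeral>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z']+)"
    r"|(?P<punctuation>[.,;:!?])"
)


def segment_sentence(sentence):
    """Return a list of (segment, kind) pairs in sentence order."""
    return [(m.group(), m.lastgroup) for m in TOKEN.finditer(sentence)]


def join_segments(segments):
    """Join the segment texts with the delimiter."""
    return DELIMITER.join(text for text, _ in segments)
```

Keeping the kind next to each segment lets a later stage turn punctuation marks into pauses without re-parsing the text.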
Word To Diphone Converter (Phonetization) • Purpose • Translate words to their diphone representations • Resource • Dictionary of words and their diphones (derived from CMU phoneme database) • Over 175,000 words supported
W-to-D Converter Cont’d • Implementation • Binary Search Algorithm in C • Start with whole dictionary as search range • start index, end index, middle index • If target word is alphabetically less than middle word, • then ignore second half (i.e. end index = middle index) • else ignore first half (i.e. start index = middle index + 1) • Repeat until word found or range contains zero words
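The search loop can be sketched as follows. The presenters implemented it in C; this Python version is only an illustration, and the dictionary layout (a sorted list of `(word, diphones)` pairs) and the diphone symbols in the usage example are assumptions. The range is half-open, so the lower bound moves to `middle + 1`, which lets the range shrink to zero words when the target is absent.

```python
def lookup(dictionary, target):
    """Binary search over a list of (word, diphones) pairs sorted by word.

    Returns the diphone list for `target`, or None if the word is missing.
    """
    start, end = 0, len(dictionary)      # half-open search range [start, end)
    while start < end:                   # stop when the range holds zero words
        middle = (start + end) // 2
        word, diphones = dictionary[middle]
        if target == word:
            return diphones
        if target < word:
            end = middle                 # ignore the second half
        else:
            start = middle + 1           # ignore the first half
    return None


# Hypothetical entries; the real dictionary is derived from the CMU data.
SAMPLE = [
    ("bat", ["B_AE", "AE_T"]),
    ("cat", ["K_AE", "AE_T"]),
    ("dog", ["D_AO", "AO_G"]),
]
```

With this sample, `lookup(SAMPLE, "cat")` returns the diphone list for "cat" and `lookup(SAMPLE, "zebra")` returns `None`.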
W-to-D Converter Cont’d • Advantages • Fast search times • Search range halves with each iteration, so about 18 comparisons cover 175,000 words (lookups currently take at most 1 sec) • Less complicated to implement • Compared to indexing the dictionary or • Importing the dictionary into an internal structure
The Multi-Level Data Structure • Contains all necessary data for the next sub-system: • Word • Diphone representation • Prosodic parameters for each diphone • This reflects both word-level and sentence-level prosody • Allows for modularization
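One possible shape for such a structure, sketched in Python. The slides do not give the actual layout, so the class names, the field names, and the choice of pitch, duration and amplitude scale factors as the prosodic parameters are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    """Lowest level: one diphone with its (assumed) prosodic parameters."""
    symbol: str
    pitch_scale: float = 1.0       # 1.0 means "leave unchanged"
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    """Middle level: a word and the diphones that make it up."""
    word: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


def build_mlds(words_with_diphones):
    """Top level: a sentence as a list of WordEntry objects."""
    return [
        WordEntry(word, [DiphoneEntry(symbol) for symbol in symbols])
        for word, symbols in words_with_diphones
    ]
```

Because the prosody stage only reads and updates these entries, it does not need to know how the pre-processing stage produced them, which is the modularization the slide refers to.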
Prosody • [Flowchart: MLDS → Diphone Retrieval (from Diphone Database) → Acoustic Manipulation → done? (no: process next diphone; yes: Concatenation)]
Diphone Retrieval • Database of recorded diphones • Every diphone matched with txt file • Distinguished by type (CC, CV, VC, VV) • References to specific components within waveform • Store diphone waveform and prosodic parameters in variables
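The four types (CC, CV, VC, VV) can be derived from the two phonemes in a diphone. The sketch below is a Python illustration that assumes diphone symbols are two ARPAbet-style phoneme labels joined by an underscore; the slides do not specify the symbol format beyond the 'C_A' example, and the function name `diphone_type` is hypothetical.

```python
# Vowel phonemes of the ARPAbet set used by the CMU dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def diphone_type(symbol):
    """Classify a diphone symbol such as 'K_AE' as CC, CV, VC or VV."""
    first, second = symbol.upper().split("_")
    kind = lambda phoneme: "V" if phoneme in VOWELS else "C"
    return kind(first) + kind(second)
```

The type matters later because pitch alteration is applied to the periodic (vowel) portion of a diphone only.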
Properties of Speech Signals • e.g. cat.wav • c: non-periodic • a: periodic • t: non-periodic
Acoustic Manipulation - MATLAB • Recognizes wave files (.wav) • load, play, write • Vast array of signal processing tools • Built-in functions • Ease of debugging • GUI-capable
Pitch/Duration/Amplitude Alteration • As pitch increases, pitch period shrinks • As pitch decreases, pitch period expands • Need to alter the spacing between pitch marks in order to alter the pitch of the speech signal • Pitch is altered on vowels only
Altering Pitch • Original diphone: ‘C_A’ • Extracted pitch period × Hanning window = Hanned pitch period
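The windowing step multiplies the extracted pitch period sample by sample with a Hann (Hanning) window. The presenters worked in MATLAB; this pure-Python sketch uses the symmetric Hann definition, which is zero at both ends, and may differ in endpoint handling from the MATLAB function they used. The function names are assumptions.

```python
import math


def hann_window(n):
    """Symmetric Hann window of length n (zero at both ends for n >= 2)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def window_pitch_period(samples):
    """Multiply an extracted pitch period by a Hann window of equal length."""
    window = hann_window(len(samples))
    return [s * w for s, w in zip(samples, window)]
```

Tapering each extracted period to zero at its edges is what lets neighbouring periods be overlapped and added without audible clicks.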
Altering Pitch Cont’d • PSOLA: Pitch Synchronous Overlap and Add • Hanned pitch periods are combined by 50% overlap and add • Pitch up: overlap > 50% • Pitch down: overlap < 50%
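The overlap-and-add step can be sketched as below. This is a simplified Python illustration of the overlap-add idea only, not a full PSOLA implementation: it assumes the segments have already been extracted and windowed, and the function name and the `overlap` parameter (a fraction between 0 and 1) are assumptions.

```python
def overlap_add(segments, overlap=0.5):
    """Overlap-add windowed segments.

    A larger overlap places the segments closer together (shorter pitch
    period, higher pitch); a smaller overlap spreads them out (lower pitch).
    """
    if not segments:
        return []
    length = max(len(segment) for segment in segments)
    hop = max(1, round(length * (1.0 - overlap)))   # spacing between starts
    output = [0.0] * (hop * (len(segments) - 1) + length)
    for k, segment in enumerate(segments):
        offset = k * hop
        for i, sample in enumerate(segment):
            output[offset + i] += sample            # add where segments overlap
    return output
```

With three 4-sample segments, a 50% overlap gives a hop of 2 samples, 75% gives a hop of 1 sample (pitch up), and 25% gives a hop of 3 samples (pitch down).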
Altering Pitch Cont’d • [Figure: pitch period × 12, multiplied by a Kaiser window] • Naturally spoken vowels contain 12-18 pitch marks
Altering Duration • Increase number of PSOLA iterations (overlaps) to increase duration • Decrease number of PSOLA iterations (overlaps) to decrease duration Altering Amplitude • Multiply the signal by a constant • If constant > 1, amplitude increases • If constant < 1, amplitude decreases
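Both adjustments are simple to sketch. In this Python illustration, duration is controlled by how many times a windowed pitch period is overlap-added, and amplitude by a constant gain; the function names and the `hop` parameter (spacing in samples between successive copies) are assumptions.

```python
def repeat_periods(period, iterations, hop):
    """Overlap-add `iterations` copies of one pitch period, `hop` samples apart.

    More iterations give a longer signal; fewer give a shorter one.
    """
    if iterations <= 0 or not period:
        return []
    output = [0.0] * (hop * (iterations - 1) + len(period))
    for k in range(iterations):
        for i, sample in enumerate(period):
            output[k * hop + i] += sample
    return output


def scale_amplitude(signal, gain):
    """Multiply every sample by a constant (gain > 1 louder, gain < 1 quieter)."""
    return [gain * sample for sample in signal]
```

For a 4-sample period with a hop of 2, 12 iterations yield 26 samples and 18 iterations yield 38, so the output length grows linearly with the iteration count.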
Concatenation • Diphones → Words • Using PSOLA at the joining ends • Ensures smooth transition • Words → Sentence • Straight joining at the end points due to presence of pauses
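The two joining strategies can be contrasted in a short sketch. The diphone join below uses a plain linear crossfade as a stand-in for the PSOLA-based join described on the slide, so it shows the idea of a smoothed boundary rather than the presenters' method. The function names, the `overlap` length, and the `pause_samples` parameter are assumptions.

```python
def crossfade_join(a, b, overlap):
    """Join two diphone waveforms, blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    head = a[:len(a) - overlap]
    tail = b[overlap:]
    mixed = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)          # ramps from a towards b
        mixed.append((1.0 - weight) * a[len(a) - overlap + i] + weight * b[i])
    return head + mixed + tail


def join_words(words, pause_samples):
    """Join word waveforms end to end with silence between them."""
    sentence = []
    for index, word in enumerate(words):
        if index > 0:
            sentence.extend([0.0] * pause_samples)  # pause between words
        sentence.extend(word)
    return sentence
```

Words can be joined directly because the pause between them already separates the two waveforms, whereas diphones inside a word meet mid-sound and need the smoothed seam.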
Summary • TTS System: words → Text Pre-processing → Prosody → Concatenation • System modularized
Progress • Work Completed / Current Status • Text pre-processing and prosodic manipulation for a multi-syllable word • Diphone concatenation • 200+ diphones in database • Fully functional GUI implemented • Work To Be Done • Sentence level synthesis • Expand diphone database • Fine-tuning and enhancing • Prepare for Poster Fair • Write final report
Questions? Contact Information Michael Beddaoui mich121212@hotmail.com Abdel-Aziz El-Solh zizo01@hotmail.com