Fundamental Frequency Contour Synthesis for Turkish Text to Speech Erkan Abdullahbeşe
Content: • TTS systems and prosody • Turkish Intonation, Stress • Observations on Collected Data • Methodology • Improvements on Methodology • Discussion • Conclusion
Introduction to Text to Speech (TTS) Systems • Text -> speech signal • Widespread applications • Message to speech generation • Man-machine dialogue • Multimedia applications • Talking aids for handicapped CHALLENGE: Machine Accent -> Natural Speech SOLUTION: Prosody Generation Modules
What is Prosody? • Properties of speech that cannot be derived from the phoneme sequence • Modulation of voice pitch • Rhythm, changes in durations • Fluctuations of loudness • Related to domains larger than one phoneme (supra-segmental properties)
Basic Acoustic Parameters • Fundamental Frequency F0(pitch) • Duration • Intensity Prosodic Phenomena • Modulate the basic acoustic parameters • Modulation of fundamental frequency • Intonation • Stress (accent)
Intonation • Ensemble of pitch variations • Perceived as speech melody Stress • Modulate all the basic acoustic parameters • Increase in F0 and intensity (loudness) • Lengthening in duration • Three types: • Word stress • Phrase stress • Sentence stress • Stress on a single syllable • Phrase and sentence stress coincide with word stress
Prosody Generation Modules in TTS • Prosodic description • Prosodic phrasing -> phrase boundaries • Accent labeling -> accents on syllables • Prosodic labels -> F0 contour PROBLEMS • Complex linguistic processing units (morphology, syntax, semantics) • Speaker-dependence • Articulation-related problems: microprosody vs. macroprosody
Basic Intonation Models • Tone Sequence Models: Pitch contour as a sequence of fluctuations generated by local accents • Pierrehumbert: A sequence of independent H and L tones (orthography) • Pitch accent -> pitch movements on stressed syllables • Boundary tone -> at phrase boundaries • Phrase accent -> between stressed syllable and phrase boundary • Superposition Models: Pitch contour as the superposition of several components with different domains: syllables, words, phrases, sentences, paragraphs, whole text • Fujisaki: purely mathematical model -> parametric • A basic F0 • A phrase component (critically damped second-order response to an impulse) • An accent component (critically damped second-order response to a rectangular command) • Optimization of parameter values with respect to F0 (Analysis by Synthesis) • Möbius -> Fujisaki + Linguistics -> German
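The superposition idea behind the Fujisaki model can be sketched in a few lines. This is a minimal illustration of the commonly published formulation (log F0 = log of the base frequency + phrase components + accent components), not the implementation used in this work. The function names and the default values of alpha, beta and gamma are assumptions chosen for illustration.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of a critically damped second-order filter (phrase component)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of a critically damped second-order filter, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    fb      : base frequency in Hz
    phrases : list of (onset_time, amplitude) impulse commands
    accents : list of (onset_time, offset_time, amplitude) rectangular commands
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        # A rectangular command is a step up at t1 minus a step at t2.
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

In analysis by synthesis, the command times and amplitudes would be adjusted until the generated contour matches a measured F0 contour; that optimization step is not shown here.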
Approaches • Perform an analysis on a speech corpus • Transcribe the corpus • Define F0 labels (rise, fall, peak etc.) and boundary labels (minor, major etc.) • Labeling • By hand • Examination -> rules -> automatic • Automatic learning of: labels -> F0 values (or parametrized) • Neural Networks • Stochastic methods • Intonation pattern dictionary (from natural speech) • Store pitch values in semitones (ST) and key information (labels) for each pattern • For the patterns in input sentence -> compare key info -> find closest pattern from dictionary -> apply pitch
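The intonation pattern dictionary approach can be sketched as a lookup over stored patterns. The sketch below is illustrative only: the label names, the reference frequency of 100 Hz, the mismatch-count distance and the two example entries are all assumptions, since the slides do not specify how key information is compared.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def key_distance(key_a, key_b):
    """Hypothetical distance: number of mismatched labels plus the length difference."""
    mismatches = sum(a != b for a, b in zip(key_a, key_b))
    return mismatches + abs(len(key_a) - len(key_b))

def closest_pattern(dictionary, query_key):
    """Return the stored pattern whose key information is closest to query_key."""
    return min(dictionary, key=lambda entry: key_distance(entry["key"], query_key))

# Made-up entries: key labels plus pitch values in semitones.
PATTERNS = [
    {"key": ("rise", "peak", "fall"), "pitch_st": [0.0, 4.0, -2.0]},
    {"key": ("fall", "fall"), "pitch_st": [1.0, -3.0]},
]
```

For example, `closest_pattern(PATTERNS, ("rise", "peak", "rise"))` selects the first entry, whose semitone values would then be applied to the input pattern.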
Approaches • For integration into TTS (labeling input sentence from text) • Complex linguistic processing units • Morphology • Syntax • Semantics • Stochastic methods • Syntax -> most probable label sequence
Sentence Intonation Types • Terminal intonation • pitch decreases at the end -> message completed • Interrogative intonation • pitch slightly increases on the last syllable -> waiting for response • Progressive intonation • pitch either increases slightly or does not show any lowering at the end -> message not completed yet
Turkish Intonation • Classification of sentences • Type: • Declaratives(↓) • wh-questions(↑) • yes-no questions(↓) • Structure: • Simple • Compound: (↑) at the end of subordinate • Meşgul olduğundan(↑) bizimle sinemaya gelemedi(↓).
Turkish Intonation • Tone groups (phrase or segment) • Division into tone groups • / Oraya varınca beni arayın. / • / Oraya varınca / beni arayın. / • Focus (new information) in each tone group • / Oraya varınca beni arayın. / • / Oraya varınca beni arayın. / • / Oraya varınca beni arayın. / • / Oraya varınca beni arayın. / • Pitch variations on focus
Turkish Intonation • Four levels of pitch: low(1), mid(2), high(3), extra high(4) • gi2di3yoru1m • sa2hi4 mi1 • Speech melody <–> musical melody (Nash) • Hierarchy of intonation units(phrase -> text) • Each intonation unit -> melody • Successive intonation units related by motifs -> melody of the upper level • Music: reiteration of motifs -> musical melody
Turkish Stress Word Stress • Fixed (bound) stress vs. free stress (Turkish) • Stress on a single syllable of a word in Turkish • Effect of suffixes on stress • Stress on final syllable of root + stressable suffix: yolcu + -lar → yolcular • Stress on final syllable of root, unstressable suffix involved: oku + -yor → okuyor, + -lar → okuyorlar • Stress on non-final syllable of root: karınca + -lar → karıncalar • May disappear in sentence
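The three suffix cases above can be read as a small rule set. The sketch below is one possible reading, assuming that stress moves onto each stressable suffix until an unstressable suffix is reached, after which it stays put. The function name and the syllable-count encoding are hypothetical, and the stressed syllable of karınca is assumed to be the second one (index 1).

```python
def stressed_syllable(root_syllables, root_stress, suffixes):
    """Return the 0-based index of the stressed syllable of root + suffixes.

    root_syllables : number of syllables in the root
    root_stress    : 0-based index of the stressed syllable in the root
    suffixes       : list of (syllable_count, stressable) in attachment order
    """
    # Case 3: stress on a non-final root syllable does not move.
    if root_stress != root_syllables - 1:
        return root_stress
    stress = root_stress
    total = root_syllables
    for count, stressable in suffixes:
        if not stressable:
            # Case 2: an unstressable suffix keeps the stress where it is,
            # and later suffixes no longer attract it.
            return stress
        # Case 1: a stressable suffix pulls the stress to the new final syllable.
        total += count
        stress = total - 1
    return stress

# yolcu + -lar -> stress on the last syllable (index 2)
# oku + -yor + -lar -> stress stays on the root-final syllable (index 1)
# karınca + -lar -> stress stays on the assumed non-final syllable (index 1)
```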
Turkish Stress Sentence Stress • Signals the prominence of the most information-bearing element in a sentence • Types • Unmarked (preverbal position) • Yarın İstanbul’a gidiyorlar. • Marked (any position) • Yarın İstanbul’a gidiyorlar. • Focusing elements • Precede focus: sadece, daha • Mehmet daha bugün ödevine başlayabildi. • Follow focus: -mi, da, bile • Ayla mı bugün Ankara’dan dönüyor?
Turkish Stress Phrase Stress • Phrase: modifier or complement and head • Phrase stress on modifier in Turkish • Types • Phrases used as nouns • telefon ahizesi • güzel çiçekler • Phrases used as verbs • hızlı koş • severek yaşa • Others • senin için • yarından sonra • Preserved in the sentence
Motivation • Nevin bugün menemen yemeli. (template) N Z F V • Nevin menemen yemeli. N F V • Bizim Nevin domatesli menemen yemeli. P N A F V • Nalan yarın ayna alıyor. N Z F V • Nalan ayna alıyor. N F V • Kardeşim Nalan yeni ayna alıyor. N N A F V
Nevin bugün menemen yemeli. Nevin menemen yemeli.
Nevin bugün menemen yemeli. Bizim Nevin domatesli menemen yemeli.
Nevin bugün menemen yemeli. Nalan yarın ayna alıyor.
Nevin bugün menemen yemeli. Nalan ayna alıyor.
Nevin bugün menemen yemeli. Kardeşim Nalan yeni ayna alıyor.
Sentences • 100 database sentences, by sentence type (positive / negative): Declaratives 25 / 15, Wh-questions 10 / 5, Yes-no questions 10 / 5, Conditionals 6 / 4, Imperatives 6 / 4, Exclamations 6 / 4 • 19 close test sentences (add/remove categories) • 18 random test sentences • Syllable-based hand labeling • Pitch extraction
Observations Declaratives • Pitch decrease at the end (terminal intonation) • Division into phrases • Pitch increase on the phrase-final syllable (progressive intonation) Nevin/bugün/menemen yemeli.
Observations Declaratives • Pitch decrease at the end (terminal intonation) • Division into phrases • Pitch increase on the phrase-final syllable (progressive intonation) Evvelki gün/ikimiz de/kuyumcu Ali’ye uğradık.
Observations Wh-questions • Pitch increase on the last syllable (interrogative intonation) • Evident pitch increase on the stressed syllable of the wh-word • No division into phrases • Word stress often disappears Dün neden zamanımı aldın?
Observations Wh-questions • Pitch increase on the last syllable (interrogative intonation) • Evident pitch increase on the stressed syllable of the wh-word • No division into phrases • Word stress often disappears Kimler yarın sınıf gezisine katılacaklar?
Observations Yes-no questions • Pitch decrease at the end • Evident pitch increase on the stressed syllable of the word before -mi • No division into phrases • Word stress often disappears Oraları yine eskisi gibi güzel mi?
Observations Yes-no questions • Pitch decrease at the end • Evident pitch increase on the stressed syllable of the word before -mi • No division into phrases • Word stress often disappears Mudanya’da bu sene de çok yağmur yağıyor mu?
Observations Conditionals • Pitch decrease at the end (terminal intonation) • Division into phrases • Pitch increase on the phrase-final syllable (progressive intonation) • -se always a phrase-final syllable İnsan azimliyse herşeyi başarabilir.
Observations Conditionals • Pitch decrease at the end (terminal intonation) • Division into phrases • Pitch increase on the phrase-final syllable (progressive intonation) • -se always a phrase-final syllable Babam keyifsizse ona konuyu bu akşam anlatamam.
Observations Imperatives • Pitch decrease at the end (terminal intonation) • Division into phrases • Pitch increase on the phrase-final syllable (progressive intonation) Akşam yemeği için çarşıdan birşeyler alsınlar.
Observations Imperatives • Pitch decrease at the end (terminal intonation) • Division into phrases • Pitch increase on the phrase-final syllable (progressive intonation) Sevgiyi ve mutluluğu yarınlara erteleme.
Observations Exclamations • Diverse • Pitch decrease at the end (terminal intonation) • Evident pitch increase on the stressed syllable of interjection or of another word Aman büyüklerine bir saygısızlık yapma!
Observations Exclamations • Diverse • Pitch decrease at the end (terminal intonation) • Evident pitch increase on the stressed syllable of interjection or of another word Haydi bugün hep birlikte pikniğe gidelim!
Local Observations • At most single stressed syllable excluding phrase-final increase • Stress within the sentence coincides with the word stress • Phrase stress preserved Ekonomik kriz / her kesimden insanı / olumsuz etkiledi.
Local Observations • At most single stressed syllable excluding phrase-final increase • Stress within the sentence coincides with the word stress • Phrase stress preserved Evvelki gün / ikimiz de / kuyumcu Ali’ye uğradık.
Local Observations • Word stress may disappear Beden sağlığımız için akşamları erken yatmalıyız. Mehmet daha bugün ödevine başlayabildi.
Local Observations • Word stress disappears at the end of positives (terminal intonation) Nevin bugün menemen yemeli. Merve evine zamanında dönemez.
Local Observations • Sentence stress (stress on focus) Nevin bugün menemen yemeli. Mehmet daha bugün ödevine başlayabildi.
Local Observations • Effects on neighbour syllables • Unstressed + stressed (ne+vin) • Stressed + stressed • nevin+bu+gün Nevin bugün menemen yemeli.
Local Observations • Effects on neighbour syllables • Stressed + stressed (Partiye+gelmeyeceğim) Ben akşam partiye gelmeyeceğim.
Local Observations • Effects on neighbour syllables • Stressed + unstressed (Gece+rüyasında) Kardeşim beni dün gece rüyasında görmüş.
Local Observations • Effects on neighbour syllables • Stressed + unstressed (ney+le) Bu geç vakitte sizin eve neyle döneceğiz?
Local Observations • Effects on neighbour syllables • Stressed + unstressed (last syllable, terminal intonation) (değil+di) Akşamki yemek pek güzel değildi.
Local Observations • Effects on neighbour syllables • Stressed + unstressed (last syllable, terminal intonation) • (güzel+mi) Oraları yine eskisi gibi güzel mi?
Methodology Overview • Flowchart: Read Files -> Choose Best Sentence -> Generate Regional Durations -> Apply Pitch • Choose best sentence from a sentence database • Apply its pitch to the matching regions of input sentence • Compression / Stretching • Interpolation • Fit data to remaining regions using interpolation
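Compression and stretching of a pitch contour can be illustrated with plain linear interpolation. This is a minimal sketch assuming the contour of a matched region is a list of F0 samples that must be fitted to a region of a different length. The function name is hypothetical and the slides do not state which interpolation scheme is used.

```python
def resample_contour(contour, new_length):
    """Linearly interpolate a pitch contour to new_length samples.

    A shorter new_length compresses the contour, a longer one stretches it.
    The first and last samples are preserved.
    """
    if new_length <= 0 or not contour:
        return []
    if new_length == 1 or len(contour) == 1:
        return [float(contour[0])] * new_length
    scale = (len(contour) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * scale                      # position in the source contour
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1.0 - frac) + contour[hi] * frac)
    return out
```

For example, stretching `[100.0, 200.0]` to three samples gives the two endpoints plus their midpoint, and compressing a five-sample contour to three samples keeps the first, middle and last values.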
Methodology Read Files • Input information used for sentences • Sentence type (declarative, wh-question, yes-no question, conditional, imperative, exclamation) • Sentence state (positive or negative) • Categories of each word • Number of syllables of each word • The index of the syllable bearing word stress, for each word (stress in sentence coincides with word stress)
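The input information listed above could be held in a small record per sentence. The sketch below is only an assumed representation: the class and field names are hypothetical, and the example sentence is constructed for illustration from the word-stress examples yolcular and okuyorlar rather than taken from the database.

```python
from dataclasses import dataclass
from typing import List

SENTENCE_TYPES = {
    "declarative", "wh-question", "yes-no question",
    "conditional", "imperative", "exclamation",
}

@dataclass
class Word:
    category: str        # e.g. "noun", "verb", "focus"
    syllables: int       # number of syllables in the word
    stress_index: int    # 0-based index of the syllable bearing word stress

@dataclass
class Sentence:
    sentence_type: str   # one of SENTENCE_TYPES
    positive: bool       # sentence state: positive (True) or negative (False)
    words: List[Word]

    def __post_init__(self):
        if self.sentence_type not in SENTENCE_TYPES:
            raise ValueError(f"unknown sentence type: {self.sentence_type}")
        for w in self.words:
            if not 0 <= w.stress_index < w.syllables:
                raise ValueError("stress index outside the word")

    def total_syllables(self):
        return sum(w.syllables for w in self.words)

    def stressed_positions(self):
        """Sentence-level 0-based indices of the stressed syllables."""
        positions, offset = [], 0
        for w in self.words:
            positions.append(offset + w.stress_index)
            offset += w.syllables
        return positions

# Constructed example: "Yolcular okuyorlar." (noun + verb, positive declarative)
example = Sentence("declarative", True, [Word("noun", 3, 2), Word("verb", 4, 1)])
```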
Methodology Read Files • Word categories rely mainly on part-of-speech (POS) categories: • Category: example (gloss) • noun: elma (apple) • adjective: güzel (beautiful) • pronoun: biz (we) • verb: geliyorum (I’m coming) • adverb: akşamleyin (in the evening) • postposition: kadar (as…as) • conjunction: fakat (but) • interjection: aman • wh-word: hangi (which) • question suffix word: almış mı (did he take) • conditional: iyiyse (if good) • number: beş (five) • auxiliary: şikayet (etti) (he complained) • component: Ali’nin (Ali’s) • focus: kitap (okuyor) ((he reads) book) • comma: (,)