This project aims to evaluate the effectiveness of prosody prediction in synthesis with respect to Modern Greek prenuclear accents. Theoretical background, tools and methods, pilot experiment results, and future work are discussed.
Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents Elisabeth Chorianopoulou MSc in Speech and Language Processing University of Edinburgh Supervisor: Prof. D. R. Ladd External Advisor: Robert Clark (CSTR)
Today’s presentation • Project’s main goal • Theoretical background • Hypothesis • Tools & Methods • Pilot experiment • design • results • Future work
Prosody prediction in modern TTS systems • Abstract level, acoustics vs perception: f0 / pitch, duration / rhythm, amplitude / loudness • Interaction of correlates not always clear… • Not necessarily adequate information from text • Speaker variability (production & perception)
F0 prediction • Global f0 properties: declination, reset. • Local f0 properties: contour shape, tonal targets, alignment. • F0 predictors: syllable properties, word properties, rhythm, syntactic structure, information structure
Project’s Main Goal • Link intonational phonetics & phonology with prosody prediction in synthesis • Synthetic speech: insight on role of tonal alignment • Naturalness judgements • effect • distribution • TTS system design?
Pre-nuclear accents • Prosodic units: IP (intonational phrase), iP (intermediate phrase) • iP contains one or more pitch accents • Final accent in iP is the nuclear accent • All non-final accents are pre-nuclear
The case of Modern Greek (Arvaniti et al., 1998) • Tonal targets: scaling & alignment • Modern Greek pre-nuclear accents: two tonal targets, an L and an H. • Stability of valley (F0 min) vs variability of peak (F0 max): type of accent? • bitonal L*+H • L* accent followed by H phrase tone
The case of Modern Greek (Arvaniti et al., 1998) • [Diagram: segmental string C0 V0* C1 V1 with tonal targets L (-5 ms) and H (+15 ms)] • Tonal targets independently aligned with specific points in segmental string. • Duration & slope of f0 movement depend on segmental quality.
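
The independent alignment of the two targets can be illustrated with a small Python sketch. The -5 ms and +15 ms offsets come from the slide; the choice of anchor points (L measured from the onset of C0, H from the onset of V1), the `Segments` class and the example onset times are assumptions made only for illustration.

```python
from dataclasses import dataclass

# Offsets from the slide, in seconds. The anchor points are an assumption.
L_OFFSET_S = -0.005  # L target: 5 ms before its anchor
H_OFFSET_S = 0.015   # H target: 15 ms after its anchor


@dataclass
class Segments:
    """Onset times (seconds) of C0, V0* (accented vowel), C1 and V1."""
    c0: float
    v0: float
    c1: float
    v1: float


def tonal_target_times(seg: Segments) -> dict:
    """Return the times of the L and H targets for one accented word.

    Assumption: L is anchored to the onset of C0 and H to the onset of V1.
    Each target depends only on its own anchor, so the duration of the
    rise varies with the segmental material in between.
    """
    return {"L": seg.c0 + L_OFFSET_S, "H": seg.v1 + H_OFFSET_S}


# Hypothetical onset times, not measured data.
targets = tonal_target_times(Segments(c0=0.200, v0=0.260, c1=0.380, v1=0.440))
rise_duration = targets["H"] - targets["L"]
```

Because each target follows its own anchor, lengthening V0* or C1 in this sketch lengthens the rise without moving the L target.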
What does the project actually involve? • Presuppose validity of Arvaniti et al.’s findings • Apply them in synthetic speech (DEMOSTHeNES Speech Composer) • Move alignment points of both L and H (Praat) • Perceptual experiments (E-Prime)
Original hypothesis • Movements in alignment are not going to influence perception of naturalness significantly. • In case perception is affected, late alignment of the F0 max is expected to have the greatest influence.
Test Sentences • At least one unaccented syllable preceding accented one • Accented vowel between nasals, lateral • At least two syllables before following accent • Example sentence: Το ανώνυμο γράμμα την αναστάτωσε. To ano*nimo gra*ma tin anasta*tose ("The anonymous letter upset her.")
DEMOSTHeNES • University of Athens, M-PIRO project • a modular system like Edinburgh’s Festival (HRG, VSERVER, VCOM, VMOD) • Prosody in DEMOSTHeNES • duration, pitch, amplitude offered as VCOMs linked to the HRG • Current prosodic model: phrasing & lexical stress
Output (Praat) • f0 declination • reset at phrase breaks • limited pitch range • limited movements
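
A minimal Python sketch of a contour with declination and reset at phrase breaks is given below. It is not the DEMOSTHeNES model: the linear shape, the function name `declining_f0`, the 200 Hz starting value and the -20 Hz/s slope are all hypothetical values chosen for illustration.

```python
def declining_f0(t, phrase_starts, start_hz=200.0, slope_hz_per_s=-20.0):
    """F0 (Hz) at time t (s) under linear declination with reset.

    Declination restarts from start_hz at every phrase start.
    phrase_starts must contain at least one value <= t.
    All numeric defaults are illustrative, not taken from any system.
    """
    # The most recent phrase start at or before t determines the reset point.
    current_start = max(s for s in phrase_starts if s <= t)
    return start_hz + slope_hz_per_s * (t - current_start)


# Two phrases, the second starting at 1.5 s (hypothetical break position).
phrase_starts = [0.0, 1.5]
before_break = declining_f0(1.0, phrase_starts)
at_reset = declining_f0(1.5, phrase_starts)
after_reset = declining_f0(2.0, phrase_starts)
```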
Towards naturalness I • Apply results of Arvaniti et al. to default pitch contour of DEMOSTHeNES. • [Diagram: segmental string C0 V0* C1 V1 with tonal targets L (-5 ms) and H (+15 ms)] • Applied not only to the first but also to the second stressed syllable
Output (Praat) • f0 declination • same pitch range • more f0 movements
Towards naturalness II: modifications in alignment • Targets moved independently earlier or later than normal alignment points • Early – Late • Late – Early • Normal – Late etc… • Shift size: 40 – 80 ms, 50 – 100 ms or 60 – 120 ms?
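
The alignment modifications can be sketched in Python as independent time shifts of the two targets. This sketch reads each pair of shift sizes as (L shift, H shift), for example 50 ms for L and 100 ms for H; that reading, the function name `shift_targets` and the example target times are assumptions. In the project itself the targets are moved in Praat.

```python
# Candidate shift sizes in ms, read as (L shift, H shift). This reading is
# an assumption.
SHIFT_PAIRS_MS = [(40, 80), (50, 100), (60, 120)]

# Early moves a target back in time, Late moves it forward.
DIRECTION = {"Early": -1, "Normal": 0, "Late": 1}


def shift_targets(l_time_s, h_time_s, l_dir, h_dir, shift_pair_ms):
    """Move the L and H targets independently and return their new times (s)."""
    l_ms, h_ms = shift_pair_ms
    new_l = l_time_s + DIRECTION[l_dir] * l_ms / 1000.0
    new_h = h_time_s + DIRECTION[h_dir] * h_ms / 1000.0
    return new_l, new_h


# Hypothetical normal alignment: L at 0.195 s, H at 0.455 s.
early_late = shift_targets(0.195, 0.455, "Early", "Late", (50, 100))
normal_late = shift_targets(0.195, 0.455, "Normal", "Late", (40, 80))
```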
Design of pilot perceptual experiment • 2 sentences: standard vs modified alignment • N – N vs Early – Late, Late – Early, Normal – Late • Naturalness judgement of pair-comparisons • 12 native Greek speakers, students in Edinburgh • Aim: 40 – 80, 50 – 100 or 60 – 120 ms?
Future Work • 10 sentences: standard vs modified alignment • N – N vs all possible combinations of Early – Normal – Late • Modifications by 40 – 80 and 60 – 120 ms • Native Greek speakers, Greece, July :-) • Aim: patterns in perception of naturalness?
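
The size of the planned design can be estimated by enumerating the conditions, as in the Python sketch below. It assumes that "all possible combinations" means every Early/Normal/Late setting for L crossed with every setting for H, excluding N – N (the standard reference), and that each combination is produced at both shift sizes. The resulting counts hold only under that reading.

```python
from itertools import product

LEVELS = ["Early", "Normal", "Late"]
SHIFT_PAIRS_MS = [(40, 80), (60, 120)]  # read as (L shift, H shift), an assumption
N_SENTENCES = 10


def modified_conditions():
    """List every (L setting, H setting, shift pair) other than N - N."""
    conditions = []
    for l_dir, h_dir in product(LEVELS, repeat=2):
        if (l_dir, h_dir) == ("Normal", "Normal"):
            continue  # standard alignment is the reference, not a modification
        for pair in SHIFT_PAIRS_MS:
            conditions.append((l_dir, h_dir, pair))
    return conditions


conditions = modified_conditions()
# One standard-vs-modified pair per condition and sentence.
n_pairs = N_SENTENCES * len(conditions)
```

Under these assumptions there are 16 modified conditions per sentence and 160 comparison pairs in total, which gives a rough idea of how long a session would run.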
The contribution of this project • Insight on role of alignment in perceiving a synthetic utterance as natural • TTS system design • results not restricted to Greek • evidence for segmental anchoring in other languages – studies of Dutch, German, English
Sound files • DEMOSTHeNES • Arvaniti et al. • Early L (50 ms) – Late H (100 ms) • Late L (50 ms) – Early H (100 ms)