Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents. Elisabeth Chorianopoulou, MSc in Speech and Language Processing, University of Edinburgh. Supervisor: Prof. D. R. Ladd. External Advisor: Robert Clark (CSTR).
Today’s presentation • Project’s main goal • Theoretical background • Hypothesis • Tools & Methods • Pilot experiment • design • results • Future work
Prosody prediction in modern TTS systems • Abstract level, acoustics vs. perception: f0 vs. pitch, duration vs. rhythm, amplitude vs. loudness • Interaction of correlates not always clear… • Not necessarily adequate information from text • Speaker variability (production & perception)
F0 prediction • Global f0 properties: declination, reset. • Local f0 properties: contour shape, tonal targets, alignment. • F0 predictors: syllable properties, word properties, rhythm, syntactic structure, information structure
Project’s Main Goal • Intonational phonetics & phonology prosody prediction in synthesis • Synthetic speech: insight on role of tonal alignment • Naturalness judgements • effect • distribution • TTS system design?
Pre-nuclear accents • Prosodic units: IP (intonational phrase), iP (intermediate phrase) • iP contains one or more pitch accents • Final accent in iP is the nuclear accent • All non-final accents are pre-nuclear
The case of Modern Greek (Arvaniti et al., 1998) • Tonal targets: scaling & alignment • Modern Greek pre-nuclear accents: two tonal targets, an L and an H. • Stability of valley (F0 min) vs. variability of peak (F0 max): type of accent? • bitonal L* + H • L* accent followed by H phrase tone
The case of Modern Greek (Arvaniti et al., 1998) [Figure: L and H targets over the segmental string C0 V0* C1 V1, labelled -5 ms and +15 ms] • Tonal targets independently aligned with specific points in segmental string. • Duration & slope of f0 movement depend on segmental quality.
What does the project actually involve? • Presuppose validity of Arvaniti et al.’s findings • Apply them in synthetic speech (DEMOSTHeNES Speech Composer) • Move alignment points of both L and H (Praat) • Perceptual experiments (E-Prime)
Original hypothesis • Movements in alignment are not going to influence perception of naturalness significantly. • In case perception is affected, late alignment of the F0 max is expected to have the greatest influence.
Test Sentences • At least one unaccented syllable preceding accented one • Accented vowel between nasals or laterals • At least two syllables before following accent • Example sentence: Το ανώνυμο γράμμα την αναστάτωσε. To ano*nimo gra*ma tin anasta*tose ("The anonymous letter upset her.")
DEMOSTHeNES • University of Athens, M-PIRO project • a modular system like Edinburgh’s Festival (HRG, VSERVER, VCOM, VMOD) • Prosody in DEMOSTHeNES • duration, pitch, amplitude offered as VCOMs linked to the HRG • Current prosodic model: phrasing & lexical stress
Output (Praat) • f0 declination • reset at phrase breaks • limited pitch range • limited movements
Towards naturalness I • Apply results of Arvaniti et al. to default pitch contour of DEMOSTHeNES. [Figure: L and H targets over the segmental string C0 V0* C1 V1, labelled -5 ms and +15 ms] • Not only first but also second stressed syllable
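As a rough sketch of how the alignment findings might be turned into target times, the snippet below places an L and an H target from segment onsets. The slide gives only the two offsets (-5 ms and +15 ms); measuring L from the onset of C0 and H from the onset of V1 is an assumption here, and the class, function and field names are hypothetical rather than part of DEMOSTHeNES or Praat.

```python
from dataclasses import dataclass

# Offsets taken from the slide. Which boundary each offset is measured from
# is an ASSUMPTION: L from the onset of C0, H from the onset of V1.
L_OFFSET_MS = -5.0
H_OFFSET_MS = 15.0


@dataclass
class AccentedStretch:
    """Segment onsets (ms) for one C0 V0* C1 V1 stretch (hypothetical type)."""
    c0_onset: float
    v0_onset: float
    c1_onset: float
    v1_onset: float


def tonal_targets(seg: AccentedStretch) -> dict:
    """Return the times (ms) of the L and H targets for one prenuclear accent."""
    return {
        "L": seg.c0_onset + L_OFFSET_MS,
        "H": seg.v1_onset + H_OFFSET_MS,
    }


# Illustrative onsets only, not measurements from the project.
stretch = AccentedStretch(c0_onset=200.0, v0_onset=260.0,
                          c1_onset=350.0, v1_onset=410.0)
targets = tonal_targets(stretch)
print(targets)
```

The same function would be called once per stressed syllable, which matches the slide's point that both the first and the second accent receive the adjustment.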
Output (Praat) • f0 declination • same pitch range • more f0 movements
Towards naturalness II: modifications in alignment • Targets (L – H) moved independently earlier or later than normal alignment points • Early – Late • Late – Early • Normal – Late etc… • Shift size (L – H): 40 – 80 ms, 50 – 100 ms, or 60 – 120 ms?
Design of pilot perceptual experiment • 2 sentences: standard vs. modified alignment • N – N vs. Early – Late, Late – Early, Normal – Late • Naturalness judgement of pair-comparisons • 12 native Greek speakers, students in Edinburgh • Aim: choose shift size: 40 – 80, 50 – 100, or 60 – 120 ms?
Future Work • 10 sentences: standard vs. modified alignment • N – N vs. all possible combinations between Early – Normal – Late • Modifications by 40 – 80 and 60 – 120 ms • Native Greek speakers, Greece, July :-) • Aim: patterns in perception of naturalness?
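The alignment modifications can be sketched as a small helper that moves each target independently. This assumes that the first number of each pair is the shift applied to L and the second the shift applied to H (as the sound-file labels "Early L (50ms) – Late H (100ms)" suggest); the function name and the example times are hypothetical.

```python
# Direction of a shift relative to the normal alignment point.
DIRECTION = {"Early": -1, "Normal": 0, "Late": +1}


def shift_targets(l_time_ms, h_time_ms, l_cond, h_cond, l_shift_ms, h_shift_ms):
    """Move the L and H targets independently; returns (new_L, new_H) in ms."""
    new_l = l_time_ms + DIRECTION[l_cond] * l_shift_ms
    new_h = h_time_ms + DIRECTION[h_cond] * h_shift_ms
    return new_l, new_h


# Illustrative target times (ms), not measurements from the project.
early_late = shift_targets(195.0, 425.0, "Early", "Late", 50, 100)
normal_late = shift_targets(195.0, 425.0, "Normal", "Late", 50, 100)
print(early_late, normal_late)
```

Keeping the direction and the magnitude as separate arguments makes it easy to try the three candidate shift sizes without touching the condition labels.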
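One way to lay out the pair-comparison trials for the pilot is sketched below. It assumes that every modified condition is crossed with every candidate shift size and that each pair is presented in both orders; neither detail is stated on the slide, and the sentence labels and function name are placeholders. The actual experiment was run in E-Prime, so this only illustrates the design.

```python
import random
from itertools import product

SENTENCES = ["sentence_1", "sentence_2"]            # placeholder labels
CONDITIONS = [("Early", "Late"), ("Late", "Early"), ("Normal", "Late")]
SHIFTS_MS = [(40, 80), (50, 100), (60, 120)]        # (L shift, H shift)


def build_trials(seed=0):
    """Return a shuffled list of (first, second) stimulus pairs."""
    trials = []
    for sent, cond, shift in product(SENTENCES, CONDITIONS, SHIFTS_MS):
        standard = (sent, ("Normal", "Normal"), (0, 0))
        modified = (sent, cond, shift)
        # ASSUMPTION: both presentation orders, to balance order effects.
        trials.append((standard, modified))
        trials.append((modified, standard))
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
print(len(trials))
```

Under these assumptions each listener hears 2 × 3 × 3 × 2 = 36 pairs.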
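The size of the planned stimulus set follows from the design and can be checked with a few lines. The sketch assumes that "all possible combinations" means every L/H pairing of Early, Normal and Late other than the standard N – N, and that each combination is produced at both shift sizes; variable names are illustrative.

```python
from itertools import product

LEVELS = ["Early", "Normal", "Late"]
SHIFTS_MS = [(40, 80), (60, 120)]   # (L shift, H shift)
N_SENTENCES = 10

# Every (L, H) combination except the standard Normal-Normal alignment.
modified_conditions = [
    (l, h) for l, h in product(LEVELS, repeat=2) if (l, h) != ("Normal", "Normal")
]

stimuli = [
    (sentence, cond, shift)
    for sentence in range(1, N_SENTENCES + 1)
    for cond in modified_conditions
    for shift in SHIFTS_MS
]

print(len(modified_conditions), len(stimuli))
```

This gives 8 modified conditions and 160 modified stimuli, plus the 10 standard versions they are compared against.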
The contribution of this project • Insight on role of alignment in perceiving a synthetic utterance as natural • TTS system design • results not restricted to Greek • evidence for segmental anchoring in other languages – studies of Dutch, German, English
Sound files • DEMOSTHeNES • Arvaniti et al. • Early L (50 ms) – Late H (100 ms) • Late L (50 ms) – Early H (100 ms)