ProZed: an Editor for the Automatic Processing of Prosodic Variation C. AURAN, C. BOUZON & D.J. HIRST, Laboratoire Parole et Langage, CNRS UMR6057, Université de Provence
Summary • 1. Prosodic systems: Prosody as a multidimensional macro-system; Levels of representation • 2. ProZEd: General conceptions; Demonstrations (a few modules): long sound file fragmentation, speaker separation, duration manipulation, silence detection and fragmentation, MOMEL-INTSINT coding, phonological resynthesis • 3. Perspectives
Prosody as a macro-system « Prosody » does not mean « intonation » • Prosody seen as consisting of 3 systems (Di Cristo 2001): • Tonal system • Temporal system • Metrical system • Intimate interactions between elements from these 3 systems • Complex relations between the acoustic, the phonetic and the phonological levels
Orthogonal dimensions • Tonal and temporal systems make use of 2 orthogonal dimensions (Ladd 1996, Di Cristo et al. 2003 and forthcoming): • Linear dimension (tonal sequences, syllable length distribution, …) • Frame dimension (register level and span, downtrends, tempo, …) Both dimensions play a major part in the organisation of discourse and the linguistic characterisation of dialects (ref.)
Levels of representation (1) • 4 levels of representation (cf. Hirst et al. 2000): • 0. Physical level (acoustic data) • 1. Phonetic level (continuous quantitative variables) • 2. Surface phonological level (abstract qualitative characteristics) • 3. Underlying phonological level • Interpretability constraint → each level is interpreted locally, in relation to the adjacent levels • Mapping: • between level 0 and level 1: phonetic representation • between level 1 and level 2: surface phonological representation
Levels of representation (2) • Phonetic representation: • Temporal system: unit alignment with the speech signal • Tonal system: quadratic spline modelling of fundamental frequency (MOMEL algorithm)
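The quadratic spline idea on this slide can be sketched in a few lines. The sketch below is not the MOMEL algorithm itself (which also has to find the target points); it only shows how a smooth F0 curve might be rebuilt from targets that are already known. It assumes that neighbouring targets are joined by two parabolas meeting at the temporal midpoint, with zero slope at each target. The function name `quadratic_spline` and the input layout are hypothetical.

```python
def quadratic_spline(targets, times):
    """Interpolate F0 between target points with pairs of parabolas.

    targets: list of (time_s, f0_hz) pairs with strictly increasing times
             (hypothetical input layout, not a ProZEd data format).
    times:   query times in seconds.
    Assumption: each pair of neighbouring targets is joined by two parabolas
    that meet at the temporal midpoint and are flat at the targets.
    """
    curve = []
    for t in times:
        # Outside the first/last target the curve is held constant
        # (a simplification made for this sketch).
        if t <= targets[0][0]:
            curve.append(targets[0][1])
            continue
        if t >= targets[-1][0]:
            curve.append(targets[-1][1])
            continue
        for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
            if t1 <= t <= t2:
                x = (t - t1) / (t2 - t1)  # normalised position in [0, 1]
                if x <= 0.5:
                    # First parabola: leaves h1 with zero slope.
                    f0 = h1 + 2.0 * (h2 - h1) * x * x
                else:
                    # Second parabola: arrives at h2 with zero slope.
                    f0 = h2 - 2.0 * (h2 - h1) * (1.0 - x) ** 2
                curve.append(f0)
                break
    return curve
```

For example, `quadratic_spline([(0.0, 100.0), (1.0, 200.0)], [0.25, 0.5, 0.75])` gives a curve that passes through the halfway value at the midpoint and flattens out towards each target.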
Levels of representation (3) • Surface phonological representation: • Temporal system: categorical coding (--, -, , +, ++) • Base dimension: raw segment duration • Frame dimension: tempo factor on raw segment duration • Tonal system: INTSINT coding of MOMEL targets (M, T, B, L, H, U, D) • Purely formal coding (≠ ToBI but cf. narrow IPA transcription) • Base dimension + frame dimensions (register level, register span, declination effect)
INTSINT: base dimension • Absolute tones • T (Top) • M (Mid) • B (Bottom) • Relative tones • non-iterative • H (Higher) • S (Same) • L (Lower) • iterative • U (Up) • D (Down)
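To make the absolute/relative distinction concrete, here is a minimal sketch that turns a sequence of INTSINT symbols back into F0 target values. The slides do not say how ProZEd maps tones to frequencies, so everything numeric here is an assumption: a speaker `key` in Hz, a register `span` in octaves, absolute tones placed at the centre and edges of that register, and relative tones computed as geometric means between the previous target and the top or bottom of the register. The function name `intsint_to_hz` is hypothetical.

```python
import math

def intsint_to_hz(tones, key=150.0, span=1.0):
    """Convert INTSINT symbols to F0 targets in Hz (illustrative only).

    Assumptions, not taken from the slides:
      key  = centre of the speaker's register in Hz
      span = width of the register in octaves
    """
    top = key * math.sqrt(2.0 ** span)     # upper edge of the register
    bottom = key / math.sqrt(2.0 ** span)  # lower edge of the register
    targets = []
    prev = key  # assumed reference for a relative tone at the very start
    for tone in tones:
        if tone == "T":
            p = top
        elif tone == "M":
            p = key
        elif tone == "B":
            p = bottom
        elif tone == "H":   # non-iterative: large step towards the top
            p = math.sqrt(prev * top)
        elif tone == "L":   # non-iterative: large step towards the bottom
            p = math.sqrt(prev * bottom)
        elif tone == "S":   # same as the previous target
            p = prev
        elif tone == "U":   # iterative: smaller step up
            p = math.sqrt(prev * math.sqrt(prev * top))
        elif tone == "D":   # iterative: smaller step down
            p = math.sqrt(prev * math.sqrt(prev * bottom))
        else:
            raise ValueError(f"unknown INTSINT symbol: {tone!r}")
        targets.append(p)
        prev = p
    return targets
```

Absolute tones depend only on `key` and `span`, whereas every relative tone depends on the target that precedes it, which is why the loop carries `prev` along.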
INTSINT: Frame dimension • Downdrift • Register level and register span codings (cf. Portes & Di Cristo 2003)
General conceptions (1) • ProZEd: « Prosodic Editor » • Multi-functional • Preliminary processing (file segmentation, speaker separation, …) • Specific processing (duration processing, silence detection, intonation processing, resynthesis, …) • « Theory independent » (cf. Mixdorff’s work) • Multi-platform (Praat, Perl), freeware and open source (GPL)
General conceptions (2) • ProZEd: Representation levels • Reversible mapping (for intonation) between: • 0. Physical level • 1. Phonetic level • 2. Surface phonological level • [Diagram: mappings between levels, labelled MBROLA, MOMEL, QSP, INTSINT, INT2PHO]
Demonstrations • Long sound file fragmentation • Duration manipulation • Silence detection and fragmentation • MOMEL-INTSINT coding • Phonological resynthesis [ Launch ProZEd ]
Perspectives • Improved modelling of duration (z-score method) • Automatic generation of both XML and more easily human-readable data sheets (polymetrical expressions for instance) • Ex.: _<M>(nV, <H>)(TIN, <BU>)_ • New modules for: • automatic pseudo-segment detection and processing (IRIT’s Vocalis software) • automatic extraction of complementary information • automatic alignment using iterative DTW (Di Cristo & Hirst 1997)
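The z-score method mentioned for duration modelling can be illustrated with a short sketch. The slides give no detail on the planned implementation, so this assumes the usual definition: each segment's duration is expressed as its distance from the mean duration of its phoneme class, in units of that class's standard deviation. The function name `duration_z_scores` and the `(label, duration)` input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def duration_z_scores(segments):
    """Normalise segment durations per phoneme class (illustrative only).

    segments: list of (phoneme_label, duration_s) pairs
              (hypothetical layout, not a ProZEd data format).
    Returns a list of (phoneme_label, z_score) pairs in the same order.
    """
    by_label = defaultdict(list)
    for label, dur in segments:
        by_label[label].append(dur)

    stats = {}
    for label, durs in by_label.items():
        # The sample standard deviation needs at least two observations.
        sd = stdev(durs) if len(durs) >= 2 else 0.0
        stats[label] = (mean(durs), sd)

    scored = []
    for label, dur in segments:
        mu, sd = stats[label]
        # With no measurable spread, the z-score is reported as 0.0
        # (a convention chosen for this sketch).
        z = (dur - mu) / sd if sd > 0 else 0.0
        scored.append((label, z))
    return scored
```

Normalising per phoneme class removes intrinsic duration differences between segments, so that the remaining variation can be attributed to prosodic lengthening or shortening.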
Thank you for your attention. Presentation available from www.lpl.univ-aix.fr/~EPGA/ (ProZEd modules also available shortly…)