Design and Implementation of Voice Conversion Application (VOCAL)

Elizabeth Kwan (26406025)
Supervised by: Ms. Liliana, M.Eng and Mr. Resmana Lim, M.Eng

DEFINITION: What is Voice Conversion?
A method of transforming an input speech signal so that the output signal is perceived as having been produced by another speaker.

BACKGROUND: Why Voice Conversion?
• Speech technology is developing rapidly
• Speech recognition and text-to-speech have been the research priorities for improving human-machine (computer) interaction
• Goal: improve the naturalness of human-machine (computer) interaction
• Voice conversion is used to personify speech-enabled systems

SCOPE & LIMITATION: Scope and limitation of project
GENERAL:
• Format: wave file (.wav), single channel (mono)
INPUT:
• Source speaker and target speaker speak the same utterances
• Home recording
• One person per recording, with minimal noise (no background sound)
• Speech only

SCOPE & LIMITATION: Scope and limitation of project
PROCESS:
• Not real-time; pre-recorded speech is needed
• Text-dependent
OUTPUT:
• The output signal should be perceived as produced by another speaker, as judged subjectively by human listeners
• Dialect is not included

SCOPE & LIMITATION: Scope and limitation of project
• Tested using the Mean Opinion Score (MOS)
• Developed in the .NET environment (C# .NET, Visual Studio 2005)
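For reference, a MOS is conventionally the arithmetic mean of the individual listener ratings. The formula below is a minimal sketch of that definition; the symbols N and r_n and the 1-to-5 rating scale are assumptions, since the slides only report results "of 5.0".

```latex
\mathrm{MOS} = \frac{1}{N} \sum_{n=1}^{N} r_n, \qquad r_n \in \{1, 2, 3, 4, 5\}
```

Here N counts every collected rating (listeners times utterances), and r_n is one rating.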

VOICE CONVERSION METHOD: Brief explanation on Voice Conversion
Different voice conversion systems use different methods. A general system consists of:
• A method to represent the speaker-specific characteristics of the speech waveform
• A method to map the source and target acoustic spaces
• A method to modify the characteristics of the source speech using the mapping obtained in the previous step

VOICE CONVERSION METHOD: [Figure; see Page 33]

VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 30)
SEGMENTATION → ANALYSIS or MODELING → TRANSFORMATION → SYNTHESIS

WHY IS IT DIFFICULT? External Problems
Complexity of human language:
• Speech is more than a sequence of phones that form words and sentences. It also carries information such as rhythm, intonation, and word stress
• This information varies from one person to another
• This practically unlimited variety increases the complexity of the application, especially in segmentation

WHY IS IT DIFFICULT? External Problems
Speaker variability: every voice is unique, and speech produced by a single person may also vary.
- Realization
- Speaking style
- Sex of speaker
- Anatomy of vocal tract
- Speed of speech
- Dialects

WHY IS IT DIFFICULT? Internal Problems
The digital form of the signal contains only amplitude values, one per sampling period.
• Amplitude cannot be used directly to determine the speech parameters (a problem for the analysis process)
• Manipulating (adding or deleting) part of the sound affects the whole sound

SEGMENTATION: Flow Chart see Page 34
It is difficult to process an entire phrase at once, because tone, pitch, and other characteristics may vary over the whole signal.
• Split the speech by syllable
• Use end-point detection, combining volume (two volume thresholds) and zero-crossing rate (ZCR)

SEGMENTATION: Flow Chart see Page 34
• Volume: loudness of the audio signal
• Zero-Crossing Rate (ZCR): rate at which the signal changes from positive to negative, and vice versa
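The two features and the two-threshold end-point search can be sketched as follows. This is an illustrative Python sketch, not the application's C# code: the function names, the use of summed absolute amplitude as "volume", non-overlapping frames, and all threshold values are assumptions.

```python
import math

def frame_features(samples, frame_len):
    """Return a (volume, zcr) pair for each non-overlapping frame.

    volume: sum of absolute amplitudes in the frame (one common definition).
    zcr: number of sign changes between neighbouring samples in the frame.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        volume = sum(abs(s) for s in frame)
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((volume, zcr))
    return feats

def detect_endpoints(feats, vol_high, vol_low, zcr_min):
    """Return (first_frame, last_frame) of the speech region, or None.

    1. Frames louder than vol_high are certainly speech.
    2. The region grows outward while volume stays above vol_low.
    3. It grows further over frames whose ZCR exceeds zcr_min, which
       may be quiet unvoiced sounds.
    """
    loud = [i for i, (v, _) in enumerate(feats) if v > vol_high]
    if not loud:
        return None
    start, end = loud[0], loud[-1]
    last = len(feats) - 1
    while start > 0 and feats[start - 1][0] > vol_low:
        start -= 1
    while end < last and feats[end + 1][0] > vol_low:
        end += 1
    while start > 0 and feats[start - 1][1] > zcr_min:
        start -= 1
    while end < last and feats[end + 1][1] > zcr_min:
        end += 1
    return start, end

# Hypothetical input: silence, a short tone, then silence again.
signal = [0.0] * 100 + [math.sin(2 * math.pi * n / 10) for n in range(200)] + [0.0] * 100
features = frame_features(signal, 50)
print(detect_endpoints(features, vol_high=10.0, vol_low=1.0, zcr_min=100))
```

This sketch finds a single speech region; splitting an utterance into syllables would repeat the search on each remaining stretch of the signal.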

SEGMENTATION: [Figure] Flow Chart see Page 34

ANALYSIS OR MODELING: Main Process (Flow Chart see Page 36)
• Linear Predictive Coding
• Pitch Period Computation

ANALYSIS OR MODELING: Modeling Vocal Tract [Figure]

ANALYSIS OR MODELING: Modeling Vocal Tract
• Source: signal x(t) [excitation signal]
• Filter: linear time-invariant filter h(t) [impulse response of the vocal-tract filter]
• Speech: convolution of source and filter, y(t) = x(t) * h(t)
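In discrete time the source-filter relation becomes a sum over samples. The sketch below illustrates it in Python with a hypothetical impulse-train excitation and a made-up three-tap impulse response; neither value comes from the project.

```python
def convolve(x, h):
    """Discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Hypothetical excitation: one pulse every three samples.
excitation = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
# Hypothetical impulse response of the filter.
impulse_response = [1.0, 0.5, 0.25]

speech = convolve(excitation, impulse_response)
print(speech)
```

Each pulse in the excitation produces one copy of the impulse response in the output, which is why recovering the source and the filter separately requires undoing the convolution.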

ANALYSIS OR MODELING: Modeling Vocal Tract
• De-convolution is needed to separate the source from the filter
• LPC is used: it predicts a sample of the speech signal from several previous samples

ANALYSIS OR MODELING: Linear Predictive Coding [Figure]
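One standard way to obtain the predictor coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which variant the application uses, so the Python sketch below is an assumption; it uses the convention s[n] ≈ a_1 s[n-1] + ... + a_p s[n-p], and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[i] * frame[i + lag] over the valid range."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Return (coeffs, error) where coeffs[k-1] is a_k in
    s[n] ~ a_1*s[n-1] + ... + a_order*s[n-order],
    and error is the remaining prediction error energy."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the predictors
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break  # silent frame: nothing left to predict
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err

# Hypothetical test signal: each sample is 0.9 times the previous one.
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(frame, 2)
print(coeffs)
```

For this test signal the first coefficient comes out very close to 0.9 and the second close to zero, because one previous sample is already enough to predict the next.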

VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 36)
Pitch Period Computation:
• Pitch Analysis
• Glottal Pulse Computation
• Pitch Tier Computation

ANALYSIS OR MODELING: Pitch Period Computation
Pitch Analysis: based on the autocorrelation method (Boersma 1993)
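The core idea of autocorrelation pitch analysis is that a voiced frame resembles itself when shifted by one pitch period. The Python sketch below shows only that idea. It is a simplified illustration, not Boersma's full algorithm (which adds windowing correction, multiple candidates per frame, and path finding), and the function name and the default 75 to 600 Hz search range are assumptions.

```python
import math

def estimate_pitch(frame, sample_rate, f_min=75.0, f_max=600.0):
    """Return the pitch in Hz of a voiced frame, or None if none is found.

    Searches for the lag (in samples) at which the frame correlates
    best with a shifted copy of itself, within the allowed pitch range.
    """
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return None
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag)) / energy
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None
    return sample_rate / best_lag

# Hypothetical frame: a pure 200 Hz tone sampled at 8000 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
print(estimate_pitch(tone, rate))
```

The pitch period in seconds is the reciprocal of the returned frequency.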

ANALYSIS OR MODELING: Pitch Period Computation
Glottal Pulse Computation: voiced sound shows a repeated pattern (τ: glottal pulse)

ANALYSIS OR MODELING: Pitch Period Computation
Pitch Tier: the total number of points is calculated from the total number of voiced frames in the pitch contour obtained in the previous step

TRANSFORMATION: Transform the speech parameters obtained

SYNTHESIS: Flow Chart see Page 46
An LPC filter is used to reconstruct the transformed speech.
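An LPC synthesis filter is an all-pole filter: each output sample is the excitation plus a weighted sum of earlier output samples. The Python sketch below illustrates that recursion, assuming the convention s[n] = e[n] + a_1 s[n-1] + ... + a_p s[n-p]; the function name and the example values are hypothetical and not taken from the application.

```python
def lpc_synthesize(excitation, coeffs):
    """Run the excitation through the all-pole LPC filter.

    coeffs[k-1] is a_k, the weight of the output sample k steps back.
    Samples before the start of the signal are treated as zero.
    """
    out = []
    for n, e in enumerate(excitation):
        predicted = sum(a * out[n - k]
                        for k, a in enumerate(coeffs, start=1)
                        if n - k >= 0)
        out.append(e + predicted)
    return out

# Hypothetical input: a single pulse through a one-coefficient filter.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
```

A single pulse decays geometrically here, since every output sample is half of the previous one.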

EXPERIMENTAL RESULT

TESTING: Effect of the choice of recording hardware

TESTING: Test on segmentation
Speech: “Hai” from 4 (four) different speakers
Percentage result: for speech with only 1 (one) syllable, 100% success

TESTING: Test on segmentation
Speech: “Saya” from 4 (four) different speakers
Percentage result: for speech with 2 (two) syllables and no pause, 0% success (all were detected as only 1 (one) syllable)
However, it works well in the application: 100% success

TESTING: Test on segmentation
Speech: “Sistem Cerdas” from 4 (four) different speakers
Percentage result: for speech with more complex forms, 50% success
This is related to speaker variability.

TESTING: Test on pitch modification
Average percentage result: 98.67%

TESTING: Subjectivity Test
Similarity (based on human auditory perception)
• Tested on 20 people, 5 utterances
• Overall result: 3.71 out of 5.0

TESTING: Subjectivity Test
Based on gender
• Tested on 22 people, 2 utterances
• 4 gender combinations for each utterance

TESTING: Subjectivity Test
Similarity of speaker characteristics
• Tested on 22 people, 5 utterances
• Overall result: 3.64 out of 5.0
