Rethinking Algorithm Design and Development in Speech Processing

T. Stadelmann, Y. Wang, M. Smith, R. Ewerth, and B. Freisleben
Universities of Marburg and Hannover, Germany
ICPR'2010 - 20th International Conference on Pattern Recognition, 23.-26. August, Istanbul, Turkey

Problem statement
• What to do if algorithms do not behave as expected?
  • Reimplementation does not reach published results
  • Adaptation to new data & problem does not work
  • Implementation does not show what theory suggests
• How to select competing techniques and parameters?
  • Effect of a particular choice on the whole process unclear
  • Effect of a specific parameter combination unknown
• How to arrive at a promising hypothesis?

Eidetic Design
• Conceptualize a method like “know your data” from data mining, but for speech processing
• Create a methodology for making failures in complex speech processing algorithms graspable by humans
• Use intuition, but how?
• Other disciplines naturally gain intuition via visualization
• But visualization is not enough: it is just one possible transformation of the data that lets humans perceive meaning through their natural abilities
• Instead: recast algorithmic sub-results to the specific perceptual domain in which humans are experts in intuitively grasping the context, the character and the reasons of the issue at hand
• I.e., visualization, audibilization, “perceptualization”, …
• Implement a culture of perceptually motivated speech research [Hill, 2007]
• Motivate the use of intuition beyond visualization
• Facilitate its use by conceptualizing a workflow
• Enable the use of intuition by providing free tools

Proposed workflow
1. Existing algorithm/process
2. Unexpected outcome => question/problem
3. Generate data from intermediate results
4. Find suitable domain
5. Use transformation tool
[Workflow diagram: prerequisites, step 1, step 2, …, step n, with intermediate results data 1, data 2, …, data n and the outcome "… = ?"; further labels: methodology, data & result, suitable domain, intuition]

Available tools
• WebVoice: resynthesize speech and speaker features and models
• PlotGMM: plot Gaussian mixture models
• Visit http://www.informatik.uni-marburg.de/~stadelmann/eidetic.html

Case study
• Initial question: why does MFCC+GMM not work reliably for speaker clustering whereas it does for speaker identification?
• Algorithm: MFCC extraction and GMM building algorithm
• Problem: the techniques seem not expressive enough for the more difficult task => where is the bottleneck?
• Data: MFCC matrix, GMM parameter vectors
• Suitable domain: features and models originate from the auditory domain => resynthesize to the domain of auditory perception to hear if they include what makes up a voice
• Result: found the bottleneck in the missing time coherence information in the GMM; improved DER (diarization error rate) by 56% in an experiment with a prototype [Stadelmann et al. 2009]
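The case study's result (missing time coherence information in the GMM) can be illustrated numerically with a minimal sketch. This is not the authors' WebVoice or PlotGMM tooling, and it does not reproduce their listening-based analysis. It only shows a general property of GMM scoring: the likelihood of a frame sequence does not depend on frame order. The function name `gmm_avg_loglik`, the random-walk data standing in for an MFCC matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    Hypothetical helper for illustration only.
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2)                # (T, K)
    log_weighted = np.log(weights)[None, :] + log_comp
    # Numerically stable log-sum-exp over the K mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

rng = np.random.default_rng(0)
T, D, K = 200, 13, 4

# Assumption: a smooth random walk stands in for a real MFCC matrix.
frames = 0.1 * np.cumsum(rng.normal(size=(T, D)), axis=0)

# Assumption: arbitrary GMM parameters, not a trained speaker model.
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

# Destroy the temporal order of the frames.
shuffled = frames[rng.permutation(T)]

score_original = gmm_avg_loglik(frames, weights, means, variances)
score_shuffled = gmm_avg_loglik(shuffled, weights, means, variances)

# A simple order-sensitive statistic: mean absolute frame-to-frame change.
delta_original = float(np.mean(np.abs(np.diff(frames, axis=0))))
delta_shuffled = float(np.mean(np.abs(np.diff(shuffled, axis=0))))

print("GMM score unchanged by shuffling:",
      bool(np.isclose(score_original, score_shuffled)))
print("Frame-to-frame change grows after shuffling:",
      delta_shuffled > delta_original)
```

The GMM score treats the sequence as an unordered bag of frames, so shuffling leaves it unchanged while the frame-to-frame statistic changes. That is the kind of information loss the poster reports having found by resynthesizing features and models and listening to them.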
