Rethinking Algorithm Design and Development in Speech Processing
T. Stadelmann, Y. Wang, M. Smith, R. Ewerth, and B. Freisleben
Universities of Marburg and Hannover, Germany

[Figure: proposed workflow. 1. existing algorithm/process (step 1: prerequisites, step 2, …, step n); 2. unexpected outcome => question/problem; 3. generate data from intermediate results (data 1, data 2, …, data n); 4. find suitable domain; 5. use transformation tool; methodology, data & result => intuition]

Problem statement
• What to do if algorithms do not behave as expected?
  • Reimplementation does not reach published results
  • Adaptation to new data & problem does not work
  • Implementation does not show what theory suggests
• How to select competing techniques and parameters?
  • Effect of a particular choice on the whole process unclear
  • Effect of a specific parameter combination unknown
• How to arrive at a promising hypothesis?

Eidetic Design
• Conceptualize a method like “know your data” from data mining – for speech processing
• Create a methodology for making failures in complex speech processing algorithms graspable by humans
• Use intuition – but how?
• Other disciplines naturally gain intuition via visualization
• But visualization is not enough: it is just one possible transformation of the data that lets humans perceive meaning through their natural abilities
• Instead: recast algorithmic sub-results to the specific perceptual domain in which humans are experts at intuitively grasping the context, the character and the reasons of the issue at hand
• I.e., visualization, audibilization, “perceptualization”, …
• Implement a culture of perceptually motivated speech research [Hill, 2007]
• Motivate the use of intuition beyond visualization
• Facilitate its use by conceptualizing a workflow
• Enable the use of intuition by providing free tools

Proposed workflow
Available tools (for transforming data into a suitable domain)
• WebVoice: resynthesize speech and speaker features and models
• PlotGMM: plot Gaussian mixture models
• Visit http://www.informatik.uni-marburg.de/~stadelmann/eidetic.html

Case study
• Initial question: why does MFCC+GMM not work reliably for speaker clustering, whereas it does for speaker identification?
• Algorithm: MFCC extraction and GMM building
• Problem: the techniques seem not expressive enough for the more difficult task => where is the bottleneck?
• Data: MFCC matrix, GMM parameter vectors
• Suitable domain: features and models originate from the auditory domain => resynthesize them to the domain of auditory perception to hear whether they include what makes up a voice
• Result: found the bottleneck in the missing time-coherence information in the GMM; improved the DER (diarization error rate) by 56% in an experiment with a prototype [Stadelmann et al. 2009]

ICPR 2010 - 20th International Conference on Pattern Recognition, 23–26 August, Istanbul, Turkey
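The case study's result (the GMM misses time-coherence information) can be made concrete: a GMM scores each feature vector independently, so the log-likelihood of a whole frame sequence is a sum of per-frame terms and cannot depend on frame order. The following is a minimal Python sketch under stated assumptions, not the authors' code: it uses a hypothetical 2-component, diagonal-covariance model and random toy vectors in place of a real MFCC matrix; all function and variable names are illustrative.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def gmm_frame_loglik(x, weights, means, variances):
    """log p(x) under the mixture sum_k w_k * N(x; mu_k, diag(var_k))."""
    return logsumexp([
        math.log(w) + log_gauss_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ])


def gmm_sequence_loglik(frames, weights, means, variances):
    """Frames are treated as independent, so the total is an order-free sum."""
    return sum(gmm_frame_loglik(f, weights, means, variances) for f in frames)


# Hypothetical toy model: 2 components over 3-dimensional feature vectors.
weights = [0.4, 0.6]
means = [[0.0, 1.0, -1.0], [2.0, -0.5, 0.5]]
variances = [[1.0, 0.5, 2.0], [0.8, 1.2, 0.6]]

# Hypothetical toy frame sequence standing in for an MFCC matrix (one row per frame).
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.5) for _ in range(3)] for _ in range(200)]

# Destroy the temporal order in two different ways.
shuffled = frames[:]
rng.shuffle(shuffled)

original_ll = gmm_sequence_loglik(frames, weights, means, variances)
shuffled_ll = gmm_sequence_loglik(shuffled, weights, means, variances)
reversed_ll = gmm_sequence_loglik(frames[::-1], weights, means, variances)

print(f"original: {original_ll:.6f}")
print(f"shuffled: {shuffled_ll:.6f}")
print(f"reversed: {reversed_ll:.6f}")
```

Shuffling or reversing the frames, which would destroy the temporal structure of a real voice, leaves the score unchanged up to floating-point rounding; capturing such structure requires something beyond a frame-independent GMM.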
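For the auditory domain, a first step toward resynthesizing MFCCs is undoing the final DCT of MFCC extraction to recover an approximate log mel-filterbank spectrum. The sketch below assumes MFCCs are the first 13 orthonormal DCT-II coefficients of 26 log mel energies (a common but not universal setup); the toy log-mel values and all names are hypothetical, and this is not a description of how WebVoice performs resynthesis. The mel-to-linear mapping and phase reconstruction needed to reach an audible waveform are omitted.

```python
import math

N_BANDS = 26  # assumed number of mel filterbank channels
N_CEPS = 13   # assumed number of retained MFCCs


def _scale(k, n):
    """Orthonormal DCT normalisation factor for coefficient k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii_ortho(x):
    """Orthonormal DCT-II, the transform applied to log mel energies in MFCC extraction."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_ortho(coeffs, n):
    """Inverse of dct_ii_ortho (orthonormal DCT-III); dropped high-order coefficients count as zero."""
    padded = list(coeffs) + [0.0] * (n - len(coeffs))
    return [
        sum(_scale(k, n) * padded[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for k in range(n))
        for i in range(n)
    ]


# Hypothetical log mel energies with two broad spectral peaks (stand-in for one real frame).
log_mel = [
    math.log(1.0 + 10.0 * math.exp(-((b - 6) / 4.0) ** 2) + 3.0 * math.exp(-((b - 18) / 3.0) ** 2))
    for b in range(N_BANDS)
]

mfcc = dct_ii_ortho(log_mel)[:N_CEPS]   # truncated cepstrum, as in MFCC extraction
envelope = idct_ortho(mfcc, N_BANDS)     # smoothed log mel spectrum recovered from the MFCCs

for band, (orig, approx) in enumerate(zip(log_mel, envelope)):
    print(f"band {band:2d}: original {orig:6.3f}  from MFCCs {approx:6.3f}")
```

With all 26 coefficients kept the inversion is exact; with only 13 the result is a smoothed spectral envelope, and because the transform is orthonormal the squared reconstruction error equals the energy of the discarded coefficients. Listening to reconstructions of this kind is one way to judge whether the features retain what makes up a voice.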