Kernel-Based Detectors and Fusion of Phonological Attributes Brett Matthews Mark Clements
Outline
• Frame-Based Detection
  • One-vs-all detectors
  • Context-dependent framewise detection
  • Probabilistic outputs
• Kernel-Based Attribute Detection
  • SVM
  • Least-Squares SVM
• Evaluating Probabilistic Estimates
  • Naïve Bayes combinations
  • Hierarchical manner classification
• Detector Fusion
  • Genetic programming
Frame-Based Detection
• One-vs-all classifiers
  • Manner of articulation: vowel, fricative, stop, nasal, glide/semivowel, silence
  • Place of articulation: dental, labial, coronal, palatal, velar, glottal, back, front
  • Vowel manners: high, mid, low, back, round
• Framewise detection (see the sketch after this list)
  • 10 ms frame rate
  • 12 MFCCs + energy
  • 8 context-dependent frames
• Classifier types & posterior probabilities
  • Artificial neural nets: probabilistic outputs
  • Kernel-based classifiers
    • SVM: empirically determined posterior probabilities
    • LS-SVMs: probabilistic outputs
[Figure: one-vs-all attribute detectors (vowel, silence, dental, velar, voicing) feeding event fusion]
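For concreteness, a minimal sketch of how the frame-level feature vectors described above might be assembled: each 10 ms frame carries 12 MFCCs plus energy (13 dimensions), and 8 context frames are stacked around the center frame. The symmetric 4+4 window split is an assumption; the slides do not specify how the 8 context frames are arranged.

```python
import numpy as np

def stack_context(feats, left=4, right=4):
    """Stack context frames around each center frame.

    feats: (n_frames, 13) array of 12 MFCCs + energy per 10 ms frame.
    Returns (n_frames, 13 * (left + 1 + right)) context-dependent features.
    Edge frames are padded by repeating the first/last frame.
    """
    n, d = feats.shape
    padded = np.vstack([np.repeat(feats[:1], left, axis=0),
                        feats,
                        np.repeat(feats[-1:], right, axis=0)])
    return np.hstack([padded[i:i + n] for i in range(left + 1 + right)])

# Example: 100 frames of 13-dim features -> 100 x 117 context features
x = stack_context(np.random.randn(100, 13))
```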
Kernel-Based Classifiers
• Support Vector Machines (SVM)
• LS-SVM classifier
  • Kernel-based classifier like the SVM
  • Least-squares formulation
  • Probabilistic output scores
  • LS-SVMlab package (Katholieke Universiteit Leuven)
  • Same decision function as the SVM (shown below)
  • Subject to equality constraints instead of inequality constraints
  • No margin optimization; the solution comes from a linear system
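Both classifiers share the same kernel decision function; in its standard form (stated here from the common SVM/LS-SVM literature rather than copied from the slides),

$$f(x) = \operatorname{sign}\Big( \sum_{k=1}^{N} \alpha_k \, y_k \, K(x, x_k) + b \Big)$$

where the support values \( \alpha_k \) come from the margin optimization for the SVM and from a linear system for the LS-SVM.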
Least-Squares SVMs
• "Support vectors" α found by solving a linear system (shown below)
• Kernel functions: linear, polynomial, RBF
• Probabilistic outputs
  • Bayesian inference for posterior probabilities
  • Moderated outputs can be directly interpreted as posterior probabilities
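The linear system referred to above has, in the standard LS-SVM classification formulation (Suykens & Vandewalle; written here from the literature rather than copied from the slides), the form

$$\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix}, \qquad \Omega_{kl} = y_k \, y_l \, K(x_k, x_l),$$

with the three kernel choices listed above:

$$K(x,z) = x^{\top} z \ \text{(linear)}, \quad K(x,z) = (x^{\top} z + c)^d \ \text{(polynomial)}, \quad K(x,z) = \exp\!\big(-\|x - z\|^2 / 2\sigma^2\big) \ \text{(RBF)}.$$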
Evaluating Probabilistic Estimates
• Reliability and accuracy of probabilistic scores
• Initial fusion experiments
  • Hierarchical manner classification (LS-SVM, SVM)
  • Naïve Bayes combination for phone detection (LS-SVM, SVM, ANN)
[Figure: reliability plots for LS-SVM and SVM]
Hierarchical Combinations
• Probabilistic phonetic-feature hierarchy for classifying frames into 6 manner classes
• Train binary detectors on each split in the hierarchy: 5 detectors, 6 classes
  • silence vs. speech
  • sonorant | speech
  • vowel | sonorant
  • stop | non-sonorant
  • semivowel | sonorant consonant
• Leaf posteriors are products along the path, e.g. (see the sketch below):
  P(fricative | x) = (1 − P(stop | non-sonorant)) · (1 − P(sonorant | speech)) · P(speech | x)
[Figure: fricative detection vs. ground truth]
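A minimal sketch of the path-product combination, following the five splits listed above; the only product the slides spell out is the fricative one, so the remaining branches are filled in from the stated hierarchy:

```python
def manner_posteriors(p_speech, p_son, p_vowel, p_stop, p_semi):
    """Combine 5 binary detector posteriors into 6 manner-class posteriors.

    Arguments are the conditional posteriors at each split of the hierarchy:
      p_speech = P(speech | x)
      p_son    = P(sonorant | speech)
      p_vowel  = P(vowel | sonorant)
      p_stop   = P(stop | non-sonorant)
      p_semi   = P(semivowel | sonorant consonant)
    """
    return {
        "silence":   1.0 - p_speech,
        "vowel":     p_speech * p_son * p_vowel,
        "semivowel": p_speech * p_son * (1 - p_vowel) * p_semi,
        "nasal":     p_speech * p_son * (1 - p_vowel) * (1 - p_semi),
        "stop":      p_speech * (1 - p_son) * p_stop,
        "fricative": p_speech * (1 - p_son) * (1 - p_stop),
    }
```

By construction the six posteriors sum to one, which is what makes the five binary detectors sufficient for a six-class decision.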
Hierarchical Combinations
• Reliability of posterior probabilities (see the sketch below)
  • Plot probabilistic estimates of combinations vs. observed frequencies
  • Hierarchical combinations are much more reliable for SVM than for LS-SVM
• Classification accuracy
  • Higher classification accuracy for SVMs, especially for fricatives
• Upper-bound comparison
  • One-vs-all classifiers trained directly for each class
  • Combinations nearly as accurate as one-vs-all classifiers
  • LS-SVM combinations perform poorly for semivowel and nasal
[Figures: reliability plots for LS-SVM (combined) and SVM (combined); table: classification accuracy (%) for vowel, stop, fricative, semivowel/glide, nasal, silence]
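A minimal sketch of the reliability evaluation described above: bin the predicted posteriors and compare each bin's mean prediction with the observed frequency of the positive class. The bin count and equal-width layout are assumptions, not taken from the slides.

```python
import numpy as np

def reliability_curve(p_pred, y_true, n_bins=10):
    """Mean predicted posterior vs. observed frequency, per probability bin.

    p_pred: predicted posteriors in [0, 1]; y_true: binary labels (0/1).
    A well-calibrated detector yields points near the diagonal.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_pred, bins) - 1, 0, n_bins - 1)
    occupied = [b for b in range(n_bins) if np.any(idx == b)]
    mean_pred = np.array([p_pred[idx == b].mean() for b in occupied])
    obs_freq = np.array([y_true[idx == b].mean() for b in occupied])
    return mean_pred, obs_freq
```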
Naïve Bayes Combinations
• One-vs-all frameworks are desired; phonetic hierarchies are cumbersome
• Phone detection
  • Combine phonological attribute scores with a Naïve Bayes product, e.g. (see the sketch below):
    P(/f/ | x) = P(labial | x) · P(fricative | x) · (1 − P(voicing | x))
• Initial experiments in evaluating probabilities
  • Compare accuracy and reliability of probabilistic outputs for ANN, SVM and LS-SVM
  • Limited training data (LS-SVM is limited to 3000 samples by memory restrictions)
  • Detect phones with combinations of relevant phonetic attributes
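A sketch of the Naïve Bayes product over attribute posteriors. Only the /f/ combination is given in the slides; the attribute map below is otherwise hypothetical and would in practice list the relevant attributes for every phone.

```python
# Hypothetical attribute map: which attribute posteriors define each phone,
# and whether the phone requires the attribute present (+1) or absent (-1).
PHONE_ATTRIBUTES = {
    "/f/": [("labial", +1), ("fricative", +1), ("voicing", -1)],
    "/v/": [("labial", +1), ("fricative", +1), ("voicing", +1)],
}

def phone_posterior(phone, attr_probs):
    """Naive Bayes product of attribute posteriors for one frame.

    attr_probs: dict mapping attribute name -> P(attribute | x).
    """
    p = 1.0
    for attr, sign in PHONE_ATTRIBUTES[phone]:
        p *= attr_probs[attr] if sign > 0 else 1.0 - attr_probs[attr]
    return p

# Example: P(/f/ | x) = P(labial | x) * P(fric | x) * (1 - P(voicing | x))
print(phone_posterior("/f/", {"labial": 0.8, "fricative": 0.9, "voicing": 0.1}))
```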
Naïve Bayes Combinations
• Phone detection
  • Compare combined attributes with direct training on phones as an upper bound
• ROC stats (see the sketch below)
  • SVMs best for attribute detection
  • Mixed results for NB combinations; no clear winner between LS-SVM and SVM
  • Direct training outperforms combinations
• Reliability
  • Naïve Bayes combinations give poor reliability for all detector types
• Rare phones & vowels
  • For /v/, /ng/ and /oy/, improvements in EER and AUC across detector types
  • Most vowels saw improvements as well
[Figures: ROC stats; direct vs. combined]
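The EER and AUC figures of merit used above can be read off the ROC curve; a minimal sketch using scikit-learn (an assumption, the original toolchain is not stated in the slides):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def eer_and_auc(y_true, scores):
    """Equal error rate (where FPR = FNR = 1 - TPR) and area under the ROC curve."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    eer = fpr[np.nanargmin(np.abs(fpr - (1.0 - tpr)))]
    return eer, roc_auc_score(y_true, scores)
```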
Genetic Programming
• Evolutionary algorithm for tree-structured feature "creation" (extraction)
• Maximize a fitness function across a number of generations (iterations)
• Operations like crossover and mutation control the evolution of the algorithm
• Trees are algebraic networks
  • Inputs are multi-dimensional features
  • Tree nodes are unary or binary mathematical operators (+, −, ×, (·)², log)
  • Algebraic networks are simpler and more transparent than neural nets
• GPLab package from Universidade de Coimbra, Portugal
  • http://gplab.sourceforge.net
Genetic Programming
• Trained GP trees on SVM outputs
  • Develop algebraic networks for combining detector outputs
  • Produce a 1-D feature from a nonlinear combination of detector outputs (see the sketch below)
  • Choose the fitness function, set of node operators, tree depth, etc. to maximize separation
[Figure: GP tree combining attribute and phone detector outputs (vowel, silence, dental, velar, voicing, /aa/, /ae/, /zh/) into a 1-D feature]
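To make the tree-as-algebraic-network idea concrete, a hypothetical sketch of evaluating one such tree over detector outputs. The node operator set is the one listed above; the example tree itself is invented for illustration, not one of the evolved trees.

```python
import math

# A GP individual as a nested tuple: (operator, child, ...) or a detector name.
OPS = {
    "+":   lambda a, b: a + b,
    "-":   lambda a, b: a - b,
    "*":   lambda a, b: a * b,
    "sq":  lambda a: a * a,
    "log": lambda a: math.log(max(a, 1e-10)),  # protected log, a common GP convention
}

def evaluate(tree, detector_outputs):
    """Recursively evaluate a GP tree into a 1-D feature for one frame."""
    if isinstance(tree, str):                  # leaf: a detector output
        return detector_outputs[tree]
    op, *children = tree
    return OPS[op](*(evaluate(c, detector_outputs) for c in children))

# Invented example tree: log(P(fric)^2) + (P(labial) - P(voicing))
tree = ("+", ("log", ("sq", "fricative")), ("-", "labial", "voicing"))
print(evaluate(tree, {"fricative": 0.9, "labial": 0.8, "voicing": 0.1}))
```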
Genetic Programming
• The full system is complex for speech recognition (a tree plus a classifier for each phone), but the GP trees themselves provide insights for combination
  • Fitness function
  • Tree node operators
  • Important features
• Initial results
  • Mixed: good separation for some phones, not good for most
  • GP trees select attributes of interest and discard others
  • Still in progress
[Figures: 1-D GP feature distributions for /oy/ and /th/]
Summary
• Evaluating posterior probabilities
  • ANNs, SVMs, LS-SVMs
  • SVMs are best for reliability and accuracy
  • With limited training data, rare phones may benefit from overlapping phonetic classes
• Genetic programming for detector fusion
  • Small, transparent algebraic networks for combining attribute detectors
  • GP trees select relevant attributes, but there is much room for improvement
  • Limiting tree node operators and selecting fitness functions should provide insights into detector fusion
Extras
[Figures: feature-space correlation matrices (1)-(3)]
[Table: training data; the kernel function K and the range of kernel parameters]
Extras
Determine w and b by solving the optimization problem

$$\min_{w,\,b,\,e}\; J(w, e) = \tfrac{1}{2}\, w^{\top} w + \tfrac{\gamma}{2} \sum_{k=1}^{N} e_k^2$$

subject to

$$y_k \big( w^{\top} \varphi(x_k) + b \big) = 1 - e_k, \qquad k = 1, \ldots, N,$$

where \( \tfrac{1}{2} w^{\top} w \) is the generalization/regularization term, \( e_k \) is the regression error for training sample k, and the positive scale parameter \( \gamma \) expresses the trade-off between generalization and training-set error.
Extras
• Support Vector Machines
  • Good performance, but the majority of training points became support vectors
  • Posterior probabilities