Tutorial on Neural Network Models for Speech and Image Processing
B. Yegnanarayana, Speech & Vision Laboratory, Dept. of Computer Science & Engineering, IIT Madras, Chennai-600036, yegna@cs.iitm.ernet.in
WCCI 2002, Honolulu, Hawaii, USA, May 12, 2002
Need for New Models of Computing for Speech & Image Tasks
• Speech & image processing tasks
• Issues in dealing with these tasks by human beings
• Issues in dealing with the tasks by machine
• Need for new models of computing in dealing with natural signals
• Need for effective (relevant) computing
• Role of Artificial Neural Networks (ANN)
Organization of the Tutorial
Part I: Feature extraction and classification problems with speech and image data
Part II: Basics of ANN
Part III: ANN models for feature extraction and classification
Part IV: Applications in speech and image processing
PART I: Feature Extraction and Classification Problems in Speech and Image
Feature Extraction and Classification Problems in Speech and Image
• Distinction between natural and synthetic signals (unknown model vs known model generating the signal)
• Nature of speech and image data (non-repetitive data, but repetitive features)
• Need for feature extraction and classification
• Methods for feature extraction and models for classification
• Need for nonlinear approaches (methods and models)
Speech vs Audio
• Audio (audible) signals (noise, music, speech and other signals)
• Categories of audio signals
• Audio signal vs non-signal (noise)
• Signal from speech production mechanism vs other audio signals
• Non-speech vs speech signals (as in natural language)
Speech Production Mechanism
Different Types of Sounds
Nature of Speech Signal
• Digital speech: sequence of samples or numbers
• Waveform for the word “MASK” (figure)
• Characteristics of speech signal
  • Excitation source characteristics
  • Vocal tract system characteristics
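Source-System Model of Speech Production
[Figure: an impulse train generator (controlled by the pitch period) and a random noise generator are selected by a voiced/unvoiced switch to give the excitation u(n); u(n) is scaled by the gain G and drives a time-varying digital filter (controlled by the vocal tract parameters), whose output is the speech signal s(n).]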
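The source-system model can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code: it assumes an all-pole vocal tract filter whose coefficients `a` are held fixed over one short frame (a real synthesizer would update them frame by frame, which is what makes the filter time-varying), and the names `impulse_train`, `all_pole_filter` and `synthesize` are hypothetical.

```python
import random


def impulse_train(n_samples, pitch_period):
    """Voiced excitation: a unit impulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(u, a, gain):
    """Vocal tract sketch: s(n) = G*u(n) + sum_k a_k * s(n - k), with a_k = a[k-1]."""
    s = []
    for n in range(len(u)):
        acc = gain * u[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * s[n - k]
        s.append(acc)
    return s


def synthesize(n_samples, voiced, pitch_period, a, gain, seed=0):
    """One frame of synthetic speech; the switch picks impulse-train or noise excitation."""
    if voiced:
        u = impulse_train(n_samples, pitch_period)
    else:
        rng = random.Random(seed)
        u = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return all_pole_filter(u, a, gain)


# Example: a 160-sample voiced frame with an 80-sample pitch period.
frame = synthesize(160, voiced=True, pitch_period=80, a=[1.3, -0.6], gain=1.0)
```

The filter coefficients in the example are arbitrary values chosen so that the filter is stable; they do not model any particular sound.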
Features from Speech Signal (demo)
• Different components of speech (speech, source and system)
• Different speech sound units (Alphabet in Indian Languages)
• Different emotions
• Different speakers
Speech Signal Processing Methods
• To extract source-system features and suprasegmental features
• Production-based features
• DSP-based features
• Perception-based features
Models for Matching and Classification
• Dynamic Time Warping (DTW)
• Hidden Markov Models (HMM)
• Gaussian Mixture Models (GMM)
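As a rough illustration of the first of these models, the DTW distance between two feature sequences can be computed with the standard dynamic-programming recursion. The sketch assumes scalar features and an absolute-difference local distance purely for brevity; in practice each frame would be a feature vector and `dist` a vector distance. The function name is hypothetical.

```python
def dtw_distance(x, y, dist=lambda p, q: abs(p - q)):
    """Cost of the best monotonic alignment between sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = minimum accumulated cost of aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            # predecessors: advance in x only, advance in y only, or advance in both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# A slowed-down version of a template still matches it with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```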
Applications of Speech Processing
• Speech recognition
• Speaker recognition/verification
• Speech enhancement
• Speech compression
• Audio indexing and retrieval
Limitations of Feature Extraction Methods and Classification Models
• Fixed frame analysis
• Variability in the implicit pattern
• Not pattern-based analysis
• Temporal nature of the patterns
Need for New Approaches
• To deal with ambiguity and variability in the data for feature extraction
• To combine evidence from multiple sources (classifiers and knowledge sources)
Images
• Digital image: matrix of numbers
• Types of images
  • Line sketches, binary, gray level and color
  • Still images, video, multimedia
Image Analysis
• Feature extraction
• Image segmentation: gray level, color, texture
• Image classification
Processing of Texture-like Images
2-D Gabor Filter
[Figure: a typical Gaussian filter with σ = 30; a typical Gabor filter with σ = 30, ω = 3.14 and θ = 45°]
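A sampled 2-D Gabor kernel (its real, even-symmetric part) is a Gaussian envelope multiplied by an oriented cosine, so the Gaussian filter is the special case of zero frequency. The example calls use the values 30, 3.14 and 45 from the slide, under the assumption that they are the envelope width σ in pixels, the radial frequency ω in radians per pixel and the orientation θ in degrees; `gabor_kernel` is a hypothetical helper, not the tutorial's code.

```python
import math


def gabor_kernel(sigma, omega, theta_deg, half_size):
    """Real (even) 2-D Gabor kernel on a (2*half_size+1)^2 grid; omega=0 gives a Gaussian."""
    theta = math.radians(theta_deg)
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            # coordinate along the filter orientation
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            # isotropic Gaussian envelope
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(omega * x_rot))
        kernel.append(row)
    return kernel


gaussian = gabor_kernel(sigma=30, omega=0.0, theta_deg=0, half_size=90)
gabor = gabor_kernel(sigma=30, omega=3.14, theta_deg=45, half_size=90)
```

Convolving a texture image with a bank of such kernels at several orientations and frequencies gives one feature image per filter.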
Limitations
• Feature extraction
• Matching
• Classification methods/models
Need for New Approaches
• Feature extraction: PCA and nonlinear PCA
• Matching: stereo images
• Smoothing: using knowledge of the image rather than of the noise
• Edge extraction and classification: integration of global and local information, or combining evidence
Artificial Neural Networks
• Problem solving: pattern recognition tasks by human and machine
• Pattern vs data
• Pattern processing vs data processing
• Architectural mismatch
• Need for new models of computing
Biological Neural Networks
• Structure and function: neurons, interconnections, dynamics for learning and recall
• Features: robustness, fault tolerance, flexibility, ability to deal with a variety of data situations, collective computation
• Comparison with computers: speed, processing, size and complexity, fault tolerance, control mechanism
• Parallel and Distributed Processing (PDP) models
Basics of ANN
• ANN terminology: processing unit (fig), interconnection, operation and update (input, weights, activation value, output function, output value)
• Models of neurons: MP neuron, perceptron and adaline
• Topology (fig)
• Basic learning laws (fig)
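The operation of a single processing unit (activation value = weighted sum of the inputs minus a threshold, output value = output function of the activation value) can be sketched as follows. Using a hard limiter for the McCulloch-Pitts (MP) neuron and the perceptron, and a linear output for the adaline, follows the usual textbook descriptions; the function names are illustrative.

```python
def step(x):
    """Hard-limiting output function (MP neuron / perceptron)."""
    return 1 if x >= 0 else 0


def linear(x):
    """Linear output function (used by the adaline during learning)."""
    return x


def processing_unit(inputs, weights, theta, output_fn):
    # activation value: weighted sum of the inputs minus the threshold
    x = sum(w * a for w, a in zip(weights, inputs)) - theta
    # output value: output function applied to the activation value
    return output_fn(x)


# An MP neuron computing logical AND: unit weights, threshold 2.
print(processing_unit([1, 1], [1, 1], 2, step))  # 1
print(processing_unit([1, 0], [1, 1], 2, step))  # 0
```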
Model of a Neuron
Topology
Basic Learning Laws
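The figure for the basic learning laws did not survive. As a hedged illustration, two of the laws as commonly stated are Hebb's law, $\Delta\mathbf{w} = \eta\, f(\mathbf{w}^{T}\mathbf{a})\,\mathbf{a}$, and the Widrow-Hoff (LMS) law, $\Delta\mathbf{w} = \eta\,(b - \mathbf{w}^{T}\mathbf{a})\,\mathbf{a}$. The sketch implements both for a single unit; the learning rate, the training pairs and the helper names are assumptions made for the example.

```python
def weighted_sum(w, a):
    return sum(wi * ai for wi, ai in zip(w, a))


def hebb_update(w, a, eta, f=lambda x: x):
    """Hebb's law: dw = eta * f(w.a) * a (unsupervised; the unit's own output drives learning)."""
    s = f(weighted_sum(w, a))
    return [wi + eta * s * ai for wi, ai in zip(w, a)]


def widrow_hoff_update(w, a, b, eta):
    """Widrow-Hoff (LMS) law: dw = eta * (b - w.a) * a (supervised, linear output)."""
    error = b - weighted_sum(w, a)
    return [wi + eta * error * ai for wi, ai in zip(w, a)]


# Hypothetical training pairs generated by b = 2*a1 - a2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([1.0, -1.0], 3.0)]
w = [0.0, 0.0]
for _ in range(100):
    for a, b in samples:
        w = widrow_hoff_update(w, a, b, eta=0.1)
# w is now very close to [2.0, -1.0]
```

Because the targets here are exactly linear in the inputs, the LMS iterations converge to the generating weights; with noisy targets they would settle near the least-squares solution instead.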
Activation and Synaptic Dynamic Models
• General activation dynamics model
  • Passive decay term
  • Excitatory term
  • Inhibitory term
• Synaptic dynamics model
  • Correlation term
  • Passive decay term
• Stability and convergence
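The slide's equations are missing. One widely used model with exactly these three activation terms is Grossberg's shunting model, $\dot{x} = -Ax + (B - x)I^{+} - (C + x)I^{-}$, and a simple synaptic model with a passive decay and a correlation term is $\dot{w}_{ij} = -w_{ij} + s_i s_j$. These forms are assumptions, not necessarily the tutorial's equations. The Euler-integration sketch below shows both settling to their equilibria, $x^{*} = (BI^{+} - CI^{-})/(A + I^{+} + I^{-})$ and $w^{*}_{ij} = s_i s_j$, which illustrates the stability and convergence question the slide raises.

```python
def simulate_activation(A, B, C, excitatory, inhibitory, x0=0.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = -A x + (B - x) I_exc - (C + x) I_inh."""
    x = x0
    for _ in range(steps):
        passive_decay = -A * x
        excitation = (B - x) * excitatory   # shrinks as x approaches the upper bound B
        inhibition = -(C + x) * inhibitory  # shrinks as x approaches the lower bound -C
        x += dt * (passive_decay + excitation + inhibition)
    return x


def simulate_synapse(s_i, s_j, w0=0.0, dt=0.01, steps=5000):
    """Euler integration of dw/dt = -w + s_i * s_j (passive decay + correlation)."""
    w = w0
    for _ in range(steps):
        w += dt * (-w + s_i * s_j)
    return w


x_eq = simulate_activation(A=1.0, B=1.0, C=1.0, excitatory=2.0, inhibitory=1.0)
w_eq = simulate_synapse(0.8, 0.5)
```

The shunting form keeps the activation bounded between $-C$ and $B$ however large the inputs become, which is one reason it is popular for stability analysis.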
Functional Units and Pattern Recognition Tasks
• Feedforward ANN
  • Pattern association
  • Pattern classification
  • Pattern mapping/classification
• Feedback ANN
  • Autoassociation
  • Pattern storage (LTM)
  • Pattern environment storage (LTM)
• Feedforward and Feedback (Competitive Learning) ANN
  • Pattern storage (STM)
  • Pattern clustering
  • Feature map
Two Layer Feedforward Neural Network (FFNN)
PR Tasks by FFNN
• Pattern association
  • Architecture: Two layers, linear processing, single set of weights
  • Learning: Hebb's (orthogonal) rule, Delta (linearly independent) rule
  • Recall: Direct
  • Limitation: Linear independence, number of patterns restricted to input dimensionality
  • To overcome: Nonlinear processing units, leads to a pattern classification problem
• Pattern classification
  • Architecture: Two layers, nonlinear processing units, geometrical interpretation
  • Learning: Perceptron learning
  • Recall: Direct
  • Limitation: Linearly separable functions, cannot handle hard problems
  • To overcome: More layers, leads to a hard learning problem
• Pattern mapping/classification
  • Architecture: Multilayer (hidden), nonlinear processing units, geometrical interpretation
  • Learning: Generalized delta rule (backpropagation)
  • Recall: Direct
  • Limitation: Slow learning, does not guarantee convergence
  • To overcome: More complex architecture
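For the pattern association case, Hebb's rule sets $W = \sum_l \mathbf{b}_l \mathbf{a}_l^{T}$ and recall is the direct product $W\mathbf{a}$. When the input vectors are orthonormal, recall is exact because the cross terms $\mathbf{a}_l^{T}\mathbf{a}_k$ vanish for $l \ne k$. A minimal sketch with hypothetical helper names and example vectors:

```python
def hebb_association_weights(pairs):
    """W = sum over (a, b) pairs of the outer product b a^T."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * n_in for _ in range(n_out)]
    for a, b in pairs:
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += b[i] * a[j]
    return W


def associative_recall(W, a):
    """Direct recall: b = W a."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) for row in W]


# Orthonormal input vectors, arbitrary output vectors.
pairs = [([1.0, 0.0, 0.0], [1.0, -1.0]), ([0.0, 1.0, 0.0], [0.5, 2.0])]
W = hebb_association_weights(pairs)
print(associative_recall(W, [0.0, 1.0, 0.0]))  # [0.5, 2.0]
```

With merely linearly independent (non-orthogonal) inputs the cross terms no longer vanish, which is why the slide pairs the delta rule with that case.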
Perceptron Network
• Perceptron classification problem
• Perceptron learning law
• Perceptron convergence theorem
• Perceptron representation problem
• Multilayer perceptron
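A minimal sketch of the perceptron learning law $\Delta\mathbf{w} = \eta\,(b - s)\,\mathbf{a}$, with the threshold absorbed as a weight on a constant input. The AND function used as training data is an assumption; it is linearly separable, so the perceptron convergence theorem guarantees that training stops after a finite number of corrections. Function names are hypothetical.

```python
def hard_limit(x):
    return 1 if x > 0 else 0


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Perceptron learning law: w <- w + eta * (b - s) * a (nonzero only on errors)."""
    n = len(samples[0][0]) + 1
    w = [0.0] * n
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for a, b in samples:
            x = list(a) + [1.0]  # constant input; its weight plays the role of -threshold
            s = hard_limit(sum(wi * xi for wi, xi in zip(w, x)))
            if s != b:
                errors += 1
                w = [wi + eta * (b - s) * xi for wi, xi in zip(w, x)]
        if errors == 0:
            return w, epoch
    return w, max_epochs


def classify(w, a):
    return hard_limit(sum(wi * xi for wi, xi in zip(w, list(a) + [1.0])))


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epochs = train_perceptron(AND)
```

Replacing AND with XOR, which is not linearly separable, makes the loop run to `max_epochs` without ever reaching zero errors, which is the representation problem listed above.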
Geometric Interpretation of Perceptron Learning
Generalized Delta Rule (Backpropagation Learning)
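The generalized delta rule propagates output errors back through the hidden layer: for sigmoid units and squared error, $\delta_k = (b_k - s_k)\,s_k(1 - s_k)$ at the output and $\delta_j = h_j(1 - h_j)\sum_k \delta_k w_{kj}$ at the hidden layer, with weight changes $\eta\,\delta_k h_j$ and $\eta\,\delta_j a_i$. The sketch below (one hidden layer, no bias terms, hypothetical function names) applies one update for one pattern.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(V, W, a):
    """V: hidden weights (H x n), W: output weights (K x H); no bias terms in this sketch."""
    h = [sigmoid(sum(v * ai for v, ai in zip(row, a))) for row in V]
    s = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W]
    return h, s


def error(V, W, a, b):
    """Squared error E = 1/2 * sum_k (b_k - s_k)^2 for one pattern."""
    _, s = forward(V, W, a)
    return 0.5 * sum((bk - sk) ** 2 for bk, sk in zip(b, s))


def backprop_step(V, W, a, b, eta):
    """One generalized-delta-rule update for a single input/output pair (a, b)."""
    h, s = forward(V, W, a)
    # output-layer error terms
    delta_out = [(bk - sk) * sk * (1.0 - sk) for bk, sk in zip(b, s)]
    # hidden-layer error terms: output deltas propagated back through W
    delta_hid = [hj * (1.0 - hj) * sum(delta_out[k] * W[k][j] for k in range(len(W)))
                 for j, hj in enumerate(h)]
    new_W = [[W[k][j] + eta * delta_out[k] * h[j] for j in range(len(h))]
             for k in range(len(W))]
    new_V = [[V[j][i] + eta * delta_hid[j] * a[i] for i in range(len(a))]
             for j in range(len(V))]
    return new_V, new_W
```

Comparing `-(new_V[j][i] - V[j][i]) / eta` with a central-difference estimate of the derivative of `error` is a convenient way to check such an implementation.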
Issues in Backpropagation Learning
• Description and features of error backpropagation
• Performance of backpropagation learning
• Refinements of backpropagation learning
• Interpretation of results of learning
• Generalization
• Tasks with backpropagation network
• Limitations of backpropagation learning
• Extensions to backpropagation
PR Tasks by FBNN
• Autoassociation
  • Architecture: Single layer with feedback, linear processing units
  • Learning: Hebb (orthogonal inputs), Delta (linearly independent inputs)
  • Recall: Activation dynamics until stable states are reached
  • Limitation: No accretive behavior
  • To overcome: Nonlinear processing units, leads to a pattern storage problem
• Pattern storage
  • Architecture: Feedback neural network, nonlinear processing units, states, Hopfield energy analysis
  • Learning: Not important
  • Recall: Activation dynamics until stable states are reached
  • Limitation: Hard problems, limited number of patterns, false minima
  • To overcome: Stochastic update, hidden units
• Pattern environment storage
  • Architecture: Boltzmann machine, nonlinear processing units, hidden units, stochastic update
  • Learning: Boltzmann learning law, simulated annealing
  • Recall: Activation dynamics, simulated annealing
  • Limitation: Slow learning
  • To overcome: Different architecture
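Hopfield Model
• Model
• Pattern storage condition: $a_{li} = \mathrm{sgn}\left(\sum_{j} w_{ij} a_{lj}\right)$ for every unit $i$ and every stored pattern $l$, where $w_{ij} = \frac{1}{N}\sum_{l} a_{li} a_{lj}$
• Capacity of Hopfield model: number of patterns that can be stored for a given probability of error
• Energy analysis: $V(\mathbf{s}) = -\frac{1}{2}\sum_{i}\sum_{j} w_{ij} s_i s_j$
• Continuous Hopfield model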
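A minimal sketch of a discrete Hopfield network, assuming bipolar (±1) states, zero thresholds and no self-connections: weights come from the Hebbian outer-product rule, recall updates one unit at a time until nothing changes, and the energy $V = -\frac{1}{2}\sum_i\sum_j w_{ij}s_is_j$ never increases during recall. The function names and the two example patterns are hypothetical.

```python
def hopfield_weights(patterns):
    """Hebbian storage: w_ij = (1/N) * sum_l a_li * a_lj, with w_ii = 0."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W


def energy(W, s):
    """V(s) = -1/2 * sum_i sum_j w_ij * s_i * s_j (thresholds taken as zero)."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def hopfield_recall(W, s, max_sweeps=100):
    """Asynchronous recall: update units one at a time until no unit changes."""
    s = list(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            x = sum(W[i][j] * s[j] for j in range(len(s)))
            new = s[i] if x == 0 else (1 if x > 0 else -1)  # keep state when net input is zero
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hopfield_weights([p1, p2])
noisy = [-1] + p1[1:]  # p1 with its first bit flipped
```

Starting from a stored pattern with one bit flipped, `hopfield_recall` typically returns the stored pattern, provided the number of stored patterns is well below the network's capacity.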
State Transition Diagram
Computation of Weights for Pattern Storage
Patterns to be stored: (111) and (010). Requiring both patterns to be stable states results in a set of inequalities that the weights and thresholds must satisfy.
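Assume binary (0/1) states, symmetric weights, no self-connections, and a unit that switches on when its net input exceeds its threshold (this convention is an assumption). Requiring (111) and (010) to be stable then gives six inequalities: $w_{12} + w_{13} > \theta_1$, $w_{12} + w_{23} > \theta_2$ and $w_{13} + w_{23} > \theta_3$ from (111), and $w_{12} \le \theta_1$, $0 > \theta_2$ and $w_{23} \le \theta_3$ from (010). The sketch checks one hypothetical solution, $w_{13} = 1$, $w_{12} = w_{23} = 0$, $\theta = (0.5, -0.5, 0.5)$; the helper names are illustrative.

```python
def next_state(W, theta, s):
    """Update every unit once from state s (units indexed 0..2 for 1..3)."""
    n = len(s)
    return tuple(1 if sum(W[i][j] * s[j] for j in range(n)) > theta[i] else 0
                 for i in range(n))


def is_stable(W, theta, s):
    """A state is stable if no unit would change on update."""
    return next_state(W, theta, s) == tuple(s)


# Hypothetical weights/thresholds satisfying all six inequalities.
W = [[0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
theta = [0.5, -0.5, 0.5]
print(is_stable(W, theta, (1, 1, 1)), is_stable(W, theta, (0, 1, 0)))  # True True
```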
Pattern Storage Tasks
• Hard problems: conflicting requirements on a set of inequalities
• Hidden units: problem of false minima
• Stochastic update
  • Stochastic equilibrium: Boltzmann-Gibbs law
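Under stochastic update, a network in equilibrium at temperature $T$ visits state $\mathbf{s}$ with the Boltzmann-Gibbs probability $P(\mathbf{s}) = e^{-E(\mathbf{s})/T}/Z$, and each binary unit turns on with probability $1/(1 + e^{-x_i/T})$ given its net input $x_i$. A small sketch with hypothetical helper names, showing how temperature flattens or sharpens the distribution over a few enumerated state energies:

```python
import math


def gibbs_probabilities(energies, T):
    """P(s) = exp(-E(s)/T) / Z over an enumerated list of state energies."""
    e_min = min(energies)  # shifting all energies changes Z but not the probabilities
    weights = [math.exp(-(e - e_min) / T) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights]


def firing_probability(net_input, T):
    """Probability that a stochastic binary unit turns on (numerically stable logistic)."""
    z = net_input / T
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for T in (0.1, 1.0, 10.0):
    print(T, [round(p, 3) for p in gibbs_probabilities([0.0, 1.0, 2.0], T)])
```

At low temperature nearly all probability sits on the lowest-energy state; at high temperature the states become almost equally likely.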
Simulated Annealing
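Simulated annealing runs the stochastic update while the temperature is lowered gradually, so that the network tends to settle into low-energy states instead of the nearest false minimum. The sketch assumes binary units, the energy $E = -\frac{1}{2}\sum_i\sum_j w_{ij}s_is_j + \sum_i\theta_i s_i$ with symmetric weights and no self-connections, and a geometric schedule $T \leftarrow \alpha T$; the schedule, all parameter values and the example network are illustrative, not taken from the tutorial.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def energy(W, theta, s):
    """E(s) = -1/2 * sum_ij w_ij s_i s_j + sum_i theta_i s_i (symmetric W, zero diagonal)."""
    n = len(s)
    quad = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(t * si for t, si in zip(theta, s))


def anneal(W, theta, s0, T0=10.0, alpha=0.9, T_min=0.01, sweeps_per_T=20, seed=0):
    """Stochastic updates at a slowly decreasing temperature; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    s = list(s0)
    best, best_E = list(s), energy(W, theta, s)
    T = T0
    while T > T_min:
        for _ in range(sweeps_per_T):
            for i in range(len(s)):
                net = sum(W[i][j] * s[j] for j in range(len(s))) - theta[i]
                # unit i turns on with probability 1 / (1 + exp(-net/T))
                s[i] = 1 if rng.random() < sigmoid(net / T) else 0
                E = energy(W, theta, s)
                if E < best_E:
                    best, best_E = list(s), E
        T *= alpha  # geometric cooling schedule
    return best, best_E


W = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
theta = [0.5, -0.5, 0.5]
state, E = anneal(W, theta, [0, 0, 0])
```

The update probability is the one that makes the Boltzmann-Gibbs distribution the equilibrium at each temperature, so a slow enough schedule concentrates the visited states on the global minima.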
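Boltzmann Machine
• Pattern environment storage
• Architecture: Visible units, hidden units, stochastic update, simulated annealing
• Boltzmann learning law: $\Delta w_{ij} = \frac{\eta}{T}\left(p^+_{ij} - p^-_{ij}\right)$, where $p^+_{ij}$ and $p^-_{ij}$ are the average probabilities that units $i$ and $j$ are both on when the visible units are clamped and when the network runs freely, respectively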
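Given sampled network states from the clamped phase and from the free-running phase (obtained, for example, by simulated annealing, which is not shown here), the Boltzmann learning law $\Delta w_{ij} = \frac{\eta}{T}(p^+_{ij} - p^-_{ij})$ reduces to comparing average co-activations. A hedged sketch with hypothetical names; estimating $p^{\pm}_{ij}$ as sample averages of $s_i s_j$ is an assumption of the sketch.

```python
def co_activation(states):
    """Average of s_i * s_j over a list of sampled binary states."""
    n, m = len(states[0]), len(states)
    return [[sum(s[i] * s[j] for s in states) / m for j in range(n)] for i in range(n)]


def boltzmann_update(W, clamped_states, free_states, eta, T):
    """Delta w_ij = (eta / T) * (p+_ij - p-_ij); self-connections are left untouched."""
    p_plus = co_activation(clamped_states)   # visible units clamped to the environment
    p_minus = co_activation(free_states)     # network running freely
    n = len(W)
    return [[W[i][j] + (eta / T) * (p_plus[i][j] - p_minus[i][j]) if i != j else W[i][j]
             for j in range(n)] for i in range(n)]
```

The update strengthens a connection when the two units are on together more often in the clamped phase than in the free phase (learning) and weakens it in the opposite case (unlearning), and it uses only quantities local to the two units.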
Discussion on Boltzmann Learning
• Expression for Boltzmann learning
  • Significance of $p^+_{ij}$ and $p^-_{ij}$
  • Learning and unlearning
  • Local property
  • Choice of $\eta$ and initial weights
• Implementation of Boltzmann learning
  • Algorithm for learning a pattern environment
  • Algorithm for recall of a pattern
  • Implementation of simulated annealing
  • Annealing schedule
• Pattern recognition tasks by Boltzmann machine
  • Pattern completion
  • Pattern association
  • Recall from noisy or partial input
• Interpretation of Boltzmann learning
  • Markov property of simulated annealing
  • Clamped-free energy and full energy
• Variations of Boltzmann learning
  • Deterministic Boltzmann machine
  • Mean-field approximation
Competitive Learning Neural Network (CLNN)
[Figure: an input layer feeding an output layer with on-center and off-surround connections]
PR Tasks by CLNN
• Pattern storage (STM)
  • Architecture: Two layers (input and competitive), linear processing units
  • Learning: No learning in FF stage, fixed weights in FB layer
  • Recall: Not relevant
  • Limitation: STM, no application, theoretical interest
  • To overcome: Nonlinear output function in FB stage, learning in FF stage
• Pattern clustering (grouping)
  • Architecture: Two layers (input and competitive), nonlinear processing units in the competitive layer
  • Learning: Only in FF stage, competitive learning
  • Recall: Direct in FF stage, activation dynamics until stable state is reached in FB layer
  • Limitation: Fixed (rigid) grouping of patterns
  • To overcome: Train neighbourhood units in competition layer
• Feature map
  • Architecture: Self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
  • Learning: Weights leading to the neighbourhood units in the competitive layer
  • Recall: Apply input, determine winner
  • Limitation: Only visual features, not quantitative
  • To overcome: More complex architecture
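For pattern clustering, the winner-take-all behavior of the competitive layer can be emulated by picking the unit whose weight vector is nearest to the input, after which only the winner's weights move toward the input, $\Delta\mathbf{w}_k = \eta(\mathbf{a} - \mathbf{w}_k)$. Choosing the winner by nearest weight vector (instead of simulating the on-center off-surround feedback dynamics), the learning rate, the example data and the helper names are all assumptions of this sketch.

```python
def winner(weights, x):
    """Index of the unit whose weight vector is closest to x (stands in for the FB competition)."""
    return min(range(len(weights)),
               key=lambda k: sum((wk - xk) ** 2 for wk, xk in zip(weights[k], x)))


def competitive_learning(data, weights, eta=0.1, epochs=50):
    """Move only the winning unit's weights toward each presented input."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in data:
            k = winner(weights, x)
            weights[k] = [wk + eta * (xk - wk) for wk, xk in zip(weights[k], x)]
    return weights


# Two hypothetical clusters; each unit drifts toward the centre of "its" cluster.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
trained = competitive_learning(data, [(1.0, 1.0), (9.0, 9.0)])
```

The grouping found this way is fixed by which unit happens to win first, which is the rigidity the slide lists as the limitation of plain competitive learning.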
Learning Algorithms for PCA Networks
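The slide's algorithms are not recoverable from the text. One well-known learning rule for a single linear PCA unit is Oja's rule, $\Delta\mathbf{w} = \eta\,y\,(\mathbf{a} - y\,\mathbf{w})$ with $y = \mathbf{w}^{T}\mathbf{a}$, which drives $\mathbf{w}$ toward the unit-length principal eigenvector of the input correlation matrix for zero-mean data. It is offered as an illustration, not as the tutorial's specific algorithm; the data and parameter values are hypothetical.

```python
def oja_first_component(data, w0, eta=0.01, epochs=1000):
    """Oja's rule for one linear unit: dw = eta * y * (a - y * w), with y = w.a."""
    w = list(w0)
    for _ in range(epochs):
        for a in data:
            y = sum(wi * ai for wi, ai in zip(w, a))
            # Hebbian term eta*y*a plus a decay -eta*y^2*w that keeps |w| near 1
            w = [wi + eta * y * (ai - y * wi) for wi, ai in zip(w, a)]
    return w


# Hypothetical zero-mean data stretched along the (1, 1) direction.
data = [(3.0, 3.0), (-3.0, -3.0), (0.5, -0.5), (-0.5, 0.5)]
w = oja_first_component(data, w0=(1.0, 0.0))
# w approaches the unit vector (0.7071, 0.7071)
```

Extensions of this rule (for example Sanger's generalized Hebbian algorithm) extract further principal components with additional output units.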
Self Organization Network
[Figure: (a) network structure, with an input layer and an output layer; (b) neighborhood regions at different times in the output layer]
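A feature-map update in the spirit of the self-organization network: find the winning output unit, then move the winner and every unit within a neighborhood radius toward the input, shrinking both the radius and the learning rate over time (the shrinking neighborhood regions of panel (b)). The sketch assumes a 1-D chain of output units, a rectangular neighborhood and linearly decaying schedules; all names and parameter values are hypothetical.

```python
def som_step(weights, x, eta, radius):
    """One update of a 1-D self-organizing map; returns new weights and the winner index."""
    dists = [sum((wk - xk) ** 2 for wk, xk in zip(w, x)) for w in weights]
    c = dists.index(min(dists))  # winning output unit
    new_weights = []
    for k, w in enumerate(weights):
        if abs(k - c) <= radius:  # inside the winner's neighborhood
            new_weights.append([wk + eta * (xk - wk) for wk, xk in zip(w, x)])
        else:
            new_weights.append(list(w))
    return new_weights, c


def train_som(data, weights, epochs=20, eta0=0.5, radius0=2):
    """Shrink the learning rate and neighborhood radius linearly over the epochs."""
    for t in range(epochs):
        eta = eta0 * (1 - t / epochs)
        radius = round(radius0 * (1 - t / epochs))
        for x in data:
            weights, _ = som_step(weights, x, eta, radius)
    return weights
```

Because neighboring units are pulled toward the same inputs, nearby output units end up responding to similar inputs, which is what makes the map topology-preserving.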