Audio Features & Machine Learning
E.M. Bakker
Features for Speech Recognition and Audio Indexing
• Parametric Representations
  • Short Time Energy
  • Zero Crossing Rates
  • Level Crossing Rates
  • Short Time Spectral Envelope
• Spectral Analysis
  • Filter Design
  • Filter Bank Spectral Analysis Model
  • Linear Predictive Coding (LPC)
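Not part of the original slides: a minimal numpy sketch of two of the parametric representations listed above, short-time energy and zero-crossing rate, computed over fixed-length frames. The frame length, hop size, and test signal are assumed values (roughly 25 ms frames with a 10 ms hop at 16 kHz).

```python
import numpy as np

def short_time_energy(x, frame_len=400, hop=160):
    """Short-time energy per frame (frame_len and hop in samples)."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    return np.array([np.sum(f.astype(float) ** 2) for f in frames])

def zero_crossing_rate(x, frame_len=400, hop=160):
    """Fraction of adjacent-sample sign changes per frame."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    return np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])

# Hypothetical test signal: 1 s of a 100 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 100 * t)
print(short_time_energy(signal)[:3], zero_crossing_rate(signal)[:3])
```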
Methods
• Vector Quantization
  • Finite code book of spectral shapes
  • The code book codes for 'typical' spectral shapes
  • Applicable to all spectral representations (e.g. Filter Banks, LPC, ZCR, etc.)
• Ensemble Interval Histogram (EIH) Model
  • Auditory-based spectral analysis model
  • More robust to noise and reverberation
  • Expected to be an inherently better representation of the relevant spectral information because it models the mechanics of the human cochlea
Pattern Recognition (block diagram)
Speech, Audio, … → Parameter Measurements → Test / Query Pattern → Pattern Comparison (against Reference Patterns) → Decision Rules → Recognized Speech, Audio, …
Pattern Recognition (feature-based block diagram)
Speech, Audio, … → Feature Detector 1 … Feature Detector n → Feature Combiner and Decision Logic → Hypothesis Tester (using a Reference Vocabulary of Features) → Recognized Speech, Audio, …
Spectral Analysis Models
• Pattern Recognition Approach
  • Parameter Measurement => Pattern
  • Pattern Comparison
  • Decision Making
• Parameter Measurements
  • Bank of Filters Model
  • Linear Predictive Coding Model
Band Pass Filter
Audio signal s(n) → Bandpass filter F(·) → Filtered audio signal F(s(n))
• Note that the bandpass filter can be defined as:
  • a convolution with a filter response function in the time domain,
  • a multiplication with a filter response function in the frequency domain
Bank of Filters Analysis Model
Bank of Filters Analysis Model
• Speech signal: s(n), n = 0, 1, …
  • Digital, with F_s the sampling frequency of s(n)
• Bank of q band pass filters: BPF_1, …, BPF_q
  • Spanning a frequency range of, e.g., 100-3000 Hz or 100 Hz-16 kHz
• BPF_i(s(n)) = x_n(e^{jω_i}), where ω_i = 2π f_i / F_s is the normalized (centre) frequency f_i, for i = 1, …, q.
• x_n(e^{jω_i}) is the short-time spectral representation of s(n) at time n, as seen through BPF_i with centre frequency ω_i, for i = 1, …, q.
• Note: each BPF independently processes s to produce the spectral representation x
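Not from the slides: a minimal sketch of such a filter bank using scipy Butterworth band-pass filters. The number of bands, the band edges, and the filter order are assumptions chosen only to illustrate the structure above.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def filter_bank(x, fs, bands):
    """Pass x through a bank of band-pass filters; one output per (low, high) band in Hz."""
    outputs = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        outputs.append(sosfilt(sos, x))
    return np.array(outputs)

# Example: 4 uniformly spaced bands covering 100 Hz - 3 kHz (assumed layout)
fs = 16000
bands = [(100, 825), (825, 1550), (1550, 2275), (2275, 3000)]
x = np.random.randn(fs)                   # stand-in for one second of speech
bank_out = filter_bank(x, fs, bands)
energies = (bank_out ** 2).sum(axis=1)    # energy per channel -> spectral representation
```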
Typical Speech Waveforms
MFCCs
Pipeline: Speech, Audio, … → Preemphasis → Windowing → Fast Fourier Transform → Mel-Scale Filter Bank → Log(·) → Discrete Cosine Transform → MFCCs (first 12 most significant coefficients)
MFCCs are calculated using the formula below, where:
• C_i is the i-th cepstral coefficient
• P is the order (12 in our case)
• K is the number of discrete Fourier transform magnitude coefficients
• X_k is the k-th log-energy output from the Mel-scale filter bank
• N is the number of filters
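The formula itself appeared as a figure on the original slide. A standard form of the mel-cepstrum computation consistent with the symbols listed above, with the cosine transform taken over the N filter-bank channels, is the following reconstruction (other equivalent normalizations exist):

```latex
C_i = \sum_{k=1}^{N} X_k \cos\!\left(\frac{i\,(k - \tfrac{1}{2})\,\pi}{N}\right), \qquad i = 1, \dots, P
```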
Linear Predictive Coding Model
Filter Response Functions
Short Time Fourier Transform
• s(m): the signal
• w(n-m): a fixed low pass window
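The defining equation was shown as a figure on the original slide; the standard short-time Fourier transform consistent with the symbols above and with the x_n(e^{jω}) notation used elsewhere in the deck is:

```latex
x_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m}
```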
Short Time Fourier Transform: Long Hamming Window, 500 samples (= 50 msec), Voiced Speech
Short Time Fourier Transform: Short Hamming Window, 50 samples (= 5 msec), Voiced Speech
Short Time Fourier Transform: Long Hamming Window, 500 samples (= 50 msec), Unvoiced Speech
Short Time Fourier Transform: Short Hamming Window, 50 samples (= 5 msec), Unvoiced Speech
Short Time Fourier Transform: Linear Filter Interpretation
Linear Predictive Coding (LPC) Model
• Speech signal: s(n), n = 0, 1, …
  • Digital, with F_s the sampling frequency of s(n)
• Spectral analysis on blocks of speech with an all-pole modeling constraint
  • LPC analysis of order p
  • s(n) is blocked into frames [n, m]
• Again consider x_n(e^{jω}), the short-time spectral representation of s(n) at time n (where ω = 2π f / F_s is the normalized frequency f).
• Now the spectral representation x_n(e^{jω}) is constrained to be of the form σ / A(e^{jω}), where A(e^{jω}) is the p-th order polynomial with z-transform:
  A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + … + a_p z^{-p}
• The output of the LPC parametric conversion on block [n, m] is the vector [a_1, …, a_p].
• It specifies parametrically the spectrum of an all-pole model that best matches the signal spectrum over the period of time in which the frame of speech samples was accumulated (p-th order polynomial approximation of the signal).
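Not from the slides: a minimal sketch of how the vector [a_1, …, a_p] can be estimated for one frame with the autocorrelation method and the Levinson-Durbin recursion. The frame length, order p = 12, and the Hamming analysis window are assumed values.

```python
import numpy as np

def lpc(frame, p=12):
    """LPC coefficients [a1..ap] of an all-pole model sigma / A(z) for one frame,
    via the autocorrelation method and the Levinson-Durbin recursion."""
    frame = frame * np.hamming(len(frame))                        # analysis window
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # r[0], r[1], ...
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])   # sum_{j=1}^{i-1} a_j r_{i-j}
        k = -acc / err                               # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a[1:], err   # coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p, and gain

# Example on a synthetic frame (assumed 25 ms at 16 kHz)
x = np.random.randn(400)
coeffs, gain = lpc(x, p=12)
```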
Vector Quantization
• Data are represented as feature vectors.
• A VQ training set is used to determine a set of code words that constitute a code book.
• Code words are centroids, found using a similarity or distance measure d.
• The code words together with d divide the space into Voronoi regions.
• A query vector falls into a Voronoi region and will be represented by the respective code word.
Vector Quantization
Distance measures d(x, y):
• Euclidean distance
• Taxi cab distance
• Hamming distance
• etc.
Vector Quantization: Clustering the Training Vectors
1. Initialize: choose M arbitrary vectors out of the L vectors of the training set. This is the initial code book.
2. Nearest-neighbor search: for each training vector, find the code word in the current code book that is closest and assign that vector to the corresponding cell.
3. Centroid update: update the code word in each cell using the centroid of the training vectors assigned to that cell.
4. Iteration: repeat steps 2-3 until the average distance falls below a preset threshold.
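Not from the slides: a minimal numpy sketch of the training loop above (k-means style, Euclidean distance d), together with the nearest-code-word lookup used for vector classification on the next slide. The initialization, the fixed number of iterations (in place of the threshold test in step 4), and the empty-cell handling are assumptions.

```python
import numpy as np

def train_codebook(train, M, n_iter=50, seed=0):
    """K-means style codebook training. train: (L, dim) array; returns (M, dim) code words."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), M, replace=False)].astype(float)  # step 1: init
    for _ in range(n_iter):
        # step 2: nearest-neighbor search (Euclidean distance)
        dists = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=-1)
        cells = dists.argmin(axis=1)
        # step 3: centroid update (keep the old code word if a cell is empty)
        for m in range(M):
            members = train[cells == m]
            if len(members):
                codebook[m] = members.mean(axis=0)
    return codebook

def quantize(v, codebook):
    """Index m* of the best code word for vector v."""
    return int(np.linalg.norm(codebook - v, axis=1).argmin())
```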
Vector Classification
For an M-vector code book CB = {y_i | 1 ≤ i ≤ M}, the index m* of the best code book entry for a given vector v is:
m* = arg min_{1 ≤ i ≤ M} d(v, y_i)
VQ for Classification
A code book CB_k = {y_{k,i} | 1 ≤ i ≤ M} can be used to define a class C_k.
Example: audio classification
• Classes 'crowd', 'car', 'silence', 'scream', 'explosion', etc.
• Determine the class of a query by using a VQ code book CB_k for each of the classes.
• VQ is very often used as a baseline method for classification problems.
Sound, DNA: Sequences!
• DNA: a helix-shaped molecule whose constituents are two parallel strands of nucleotides
• Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
• DNA is usually represented by sequences of these four nucleotides
• This assumes only one strand is considered; the second strand is always derivable from the first by pairing A's with T's and C's with G's, and vice versa
Biological Information: From Genes to Proteins
Gene (DNA) → Transcription → RNA → Translation → Protein → Protein folding
(fields involved: genomics, molecular biology, structural biology, biophysics)
From Amino Acids to Protein Functions
Example DNA sequence: CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG…
Example amino acid sequence: TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATML
DNA / amino acid sequence → 3D structure → protein functions
DNA (gene) →→→ pre-RNA →→→ RNA →→→ Protein
(via RNA-polymerase, spliceosome, ribosome)
Motivation for Markov Models
• There are many cases in which we would like to represent the statistical regularities of some class of sequences:
  • genes
  • proteins in a given family
  • sequences of audio features
• Markov models are well suited to this type of task
A Markov Chain Model
Transition probabilities (example, out of state g):
• Pr(x_i = a | x_{i-1} = g) = 0.16
• Pr(x_i = c | x_{i-1} = g) = 0.34
• Pr(x_i = g | x_{i-1} = g) = 0.38
• Pr(x_i = t | x_{i-1} = g) = 0.12
Definition of a Markov Chain Model
A Markov chain[1] model is defined by:
• a set of states
  • some states emit symbols
  • other states (e.g., the begin state) are silent
• a set of transitions with associated probabilities
  • the transitions emanating from a given state define a distribution over the possible next states
[1] A. A. Markov, "Extension of the law of large numbers to quantities depending on one another", Izvestiya Fiziko-matematicheskogo obshchestva pri Kazanskom universitete, 2nd series, Vol. 15 (1906), pp. 135-156.
Markov Chain Models: Properties
• Given some sequence x of length L, we can ask how probable the sequence is given our model
• For any probabilistic model of sequences, we can write this probability as a chain of conditional probabilities (see below)
• Key property of a (1st order) Markov chain: the probability of each x_i depends only on the value of x_{i-1}
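The equations on this slide were figures; the standard decomposition, and its simplification under the first-order Markov property, consistent with the notation above is:

```latex
\Pr(x) = \Pr(x_L \mid x_{L-1}, \dots, x_1)\,\Pr(x_{L-1} \mid x_{L-2}, \dots, x_1)\cdots\Pr(x_1)
       = \Pr(x_1)\prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})
```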
The Probability of a Sequence for a Markov Chain Model
Pr(cggt) = Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)
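Not from the slides: a small Python sketch of this computation for a first-order Markov chain over the DNA alphabet. Only the transitions out of g were given on the earlier slide; the initial distribution and the remaining transition rows are made-up placeholders for illustration.

```python
# 1st-order Markov chain: Pr(x) = Pr(x1) * prod_i Pr(x_i | x_{i-1})
initial = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}          # placeholder
transitions = {
    "g": {"a": 0.16, "c": 0.34, "g": 0.38, "t": 0.12},          # from the slide
    "a": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},          # placeholder
    "c": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},          # placeholder
    "t": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},          # placeholder
}

def sequence_probability(seq):
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= transitions[prev][cur]
    return p

print(sequence_probability("cggt"))   # = Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)
```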
Example Application: CpG Islands
• CG di-nucleotides are rarer in eukaryotic genomes than expected given the marginal probabilities of C and G
• but the regions upstream of genes are richer in CG di-nucleotides than elsewhere: CpG islands
• useful evidence for finding genes
Application: predict CpG islands with Markov chains
• one Markov chain to represent CpG islands
• another Markov chain to represent the rest of the genome
Markov Chains for Discrimination
• Suppose we want to distinguish CpG islands from other sequence regions
• Given sequences from CpG islands, and sequences from other regions, we can construct
  • a model to represent CpG islands
  • a null model to represent the other regions
• We can then score a test sequence by the log-odds ratio below
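The scoring formula was a figure on the original slide; the usual log-odds score for this setup, with a^CpG and a^null the transition probabilities of the two chains, is:

```latex
\mathrm{score}(x) = \log \frac{\Pr(x \mid \mathrm{CpG})}{\Pr(x \mid \mathrm{null})}
                  = \sum_{i=2}^{L} \log \frac{a^{\mathrm{CpG}}_{x_{i-1} x_i}}{a^{\mathrm{null}}_{x_{i-1} x_i}}
```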
Markov Chains for Discrimination
• Why can we use this ratio of likelihoods as a score?
• According to Bayes' rule (see below), the posterior probability of a class given the sequence x is proportional to the likelihood of x under that class times the class prior.
• If we are not taking into account the prior probabilities Pr(CpG) and Pr(null) of the two classes, then from Bayes' rule it is clear that we just need to compare Pr(x|CpG) and Pr(x|null), as is done in our scoring function score().
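The Bayes-rule equations were also figures on the original slide; a standard statement for this two-class setting is:

```latex
\Pr(\mathrm{CpG} \mid x) = \frac{\Pr(x \mid \mathrm{CpG})\,\Pr(\mathrm{CpG})}{\Pr(x)}, \qquad
\Pr(\mathrm{null} \mid x) = \frac{\Pr(x \mid \mathrm{null})\,\Pr(\mathrm{null})}{\Pr(x)}
```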
Higher Order Markov Chains
• The Markov property specifies that the probability of a state depends only on the probability of the previous state
• But we can build more "memory" into our states by using a higher order Markov model
• In an n-th order Markov model, the probability of the current state depends on the previous n states
Selecting the Order of a Markov Chain Model
• But the number of parameters we need to estimate grows exponentially with the order
  • for modeling DNA we need on the order of 4^{n+1} parameters for an n-th order model
• The higher the order, the less reliable we can expect our parameter estimates to be
  • estimating the parameters of a 2nd order Markov chain from the complete genome of E. coli (5.44 x 10^6 bases), we'd see each word ~85,000 times on average (divide by 4^3)
  • estimating the parameters of a 9th order chain, we'd see each word ~5 times on average (divide by 4^10 ≈ 10^6)
Higher Order Markov Chains
• An n-th order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples
• Example: a 2nd order Markov model for DNA can be treated as a 1st order Markov model over the alphabet
  AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT
A Fifth Order Markov Chain
Pr(gctaca) = Pr(gctac) Pr(a|gctac)
Hidden Markov Model: A Simple HMM
(Two state diagrams, Model 1 and Model 2, were shown on the slide.)
Given the observed sequence AGGCT, which state emits each item?
Tutorial on HMMs
L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, February 1989.
HMM for Hidden Coin Tossing
Observed sequence: T H H … H H T T H T H H T T H T T T T T
Hidden State
• We'll distinguish between the observed parts of a problem and the hidden parts
• In the Markov models we've considered previously, it is clear which state accounts for each part of the observed sequence
• In the model above, there are multiple states that could account for each part of the observed sequence
  • this is the hidden part of the problem
Learning and Prediction Tasks (in general, i.e., this applies to both MMs and HMMs)
• Learning
  • Given: a model, a set of training sequences
  • Do: find model parameters that explain the training sequences with relatively high probability (the goal is to find a model that generalizes well to sequences we haven't seen before)
• Classification
  • Given: a set of models representing different sequence classes, and a test sequence
  • Do: determine which model/class best explains the sequence
• Segmentation
  • Given: a model representing different sequence classes, and a test sequence
  • Do: segment the sequence into subsequences, predicting the class of each subsequence
Algorithms for Learning & Prediction
• Learning
  • correct path known for each training sequence -> simple maximum likelihood or Bayesian estimation
  • correct path not known -> Forward-Backward (Baum-Welch) algorithm + ML or Bayesian estimation
• Classification
  • simple Markov model -> calculate the probability of the sequence along the single path for each model
  • hidden Markov model -> Forward algorithm to calculate the probability of the sequence along all paths for each model
• Segmentation
  • hidden Markov model -> Viterbi algorithm to find the most probable path for the sequence
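Not from the slides: a minimal Viterbi sketch for the segmentation/decoding task above, for an HMM given as dictionaries of transition and emission probabilities. The two-state "island"/"background" toy model, its probabilities, and the example sequence are made up for illustration.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for an observation sequence (log domain)."""
    # best[i][s]: best log-probability of any path ending in state s at position i
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, lp = max(((r, best[i - 1][r] + math.log(trans_p[r][s])) for r in states),
                           key=lambda x: x[1])
            best[i][s] = lp + math.log(emit_p[s][obs[i]])
            back[i][s] = prev
    # traceback from the best final state
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path)), best[-1][last]

# Toy CpG-island style example (all numbers are made up)
states = ["island", "background"]
start_p = {"island": 0.5, "background": 0.5}
trans_p = {"island": {"island": 0.9, "background": 0.1},
           "background": {"island": 0.1, "background": 0.9}}
emit_p = {"island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
          "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
print(viterbi("AGGCT", states, start_p, trans_p, emit_p))
```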
The Parameters of an HMM
• Transition probabilities
  • probability of a transition from state k to state l
• Emission probabilities
  • probability of emitting character b in state k
Note: HMMs can also be formulated using an emission probability associated with a transition from state k to state l.
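The defining equations were figures on the original slide; in the usual notation (state path π, observed symbols x), the two parameter sets described above are:

```latex
a_{kl} = \Pr(\pi_i = l \mid \pi_{i-1} = k), \qquad
e_k(b) = \Pr(x_i = b \mid \pi_i = k)
```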