Wild Dolphin Project 11-751 Speech Final Project by Jiazhi Ou jzou@cs.cmu.edu Tal Blum blum@cs.cmu.edu
Outline • Wild Dolphin Project, Dolphin Speech • Data, Labeling, Labeling problems • Previous work • Models training • Experiments & Results • Conclusions
The Wild Dolphin Project (WDP) • The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved the photographing, videotaping, and audio taping of a group of resident dolphins, aiming to learn about their lives. • http://www.wilddolphinproject.org/index.cfm
Dolphin’s Speech • Dolphin speech is very different from human speech • Range of frequencies is wider • Two mechanisms for producing sound simultaneously • Directionality of some of the frequencies • Carried in water • Can travel large distances
Dolphin’s Speech(2) • Is used for: • Identification • Communicating • Fighting • Defending • Courting • Warning • Calling • Hunting
Dolphin’s Speech(3) • 3 main types • Whistles (signature and non-signature) • Clicks • Spike trains
What do we know • Not much • We know that each dolphin has a unique whistle, called its signature whistle. • A dolphin's signature whistle is similar to the whistles of the dolphins that were in close contact with it when it was a baby
Data • 164 files, each containing sounds of one dolphin whose name is known • Average file length is 7 sec • Total data length is less than 20 minutes, of which about half is silence • The data does not contain all of the relevant frequencies
Labeling • Dolphin Names • Dolphin ID project • Pause, Noise, Dolphin Signature Whistles, Dolphin Non-Signature whistles.
Labeling Problems • How do we distinguish between the 2 whistle types (signature and non-signature)? • How do we distinguish between whistles and non-whistles? • They co-occur • How do we determine the duration of a label? • Should labels that are close together be merged into one label? • This has an effect on the model • Some signals are weak, probably due to a change in the dolphin's direction
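The question of whether close labels should become one label can be made concrete with a small sketch. The segment format `(start_sec, end_sec, label)`, the label names, and the `max_gap` threshold are illustrative assumptions; the slides do not say which rule, if any, was applied.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_sec, end_sec, label)
Segment = Tuple[float, float, str]

def merge_close_segments(segments: List[Segment], max_gap: float) -> List[Segment]:
    """Merge consecutive same-label segments separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for start, end, label in sorted(segments):
        if merged:
            prev_start, prev_end, prev_label = merged[-1]
            if label == prev_label and start - prev_end <= max_gap:
                # Extend the previous segment instead of starting a new one.
                merged[-1] = (prev_start, max(prev_end, end), prev_label)
                continue
        merged.append((start, end, label))
    return merged

# Made-up example labels, not taken from the project data.
labels = [(0.0, 0.5, "SIG"), (0.6, 1.0, "SIG"), (1.8, 2.0, "SIG"), (2.05, 2.4, "NONSIG")]
print(merge_close_segments(labels, max_gap=0.2))
```

A larger `max_gap` yields fewer, longer segments, so the choice changes how much in-between water noise each whistle model has to absorb during training.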
Previous Work • Dolphin-ID Project by Tanja, Alan and Yue • Task: To identify dolphin ID using their signature whistles • 51 labeled files by Alan • 13 HMMs: 10 (one for each dolphin) + DOLPHIN, PAUSE, and GARBAGE • Use Janus to do training and testing • Try different kinds of features
Our Work • Model Generalized Signature Whistles • Label More Files • Create HMMs for signature whistles, non-signature whistles, garbage, and pause • Train and test the HMMs using Janus • Evaluate the test results with our own method • Compare different model selections
Signal Processing • Tanja scripts • Down sampling • High Pass Filter • FFT • LDA
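The first three steps on this slide can be sketched in plain Python. This is a toy illustration and not the actual scripts: the sample rate, the block-averaging downsampler, the first-order filter with `alpha=0.95`, and the 64-sample frame are all assumptions, and the LDA step is omitted because it needs labeled training frames.

```python
import cmath
import math

def downsample(samples, factor):
    """Average each block of `factor` samples (a crude anti-alias step), one value per block."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

def high_pass(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    out = [0.0]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT computes the same values faster)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

RATE = 8000  # hypothetical sample rate in Hz
# Synthetic test signal: a 1000 Hz tone riding on a constant offset.
signal = [0.5 + math.sin(2 * math.pi * 1000 * n / RATE) for n in range(400)]

reduced = downsample(signal, 2)            # effective rate is now RATE / 2
filtered = high_pass(reduced)              # removes the constant offset
spectrum = magnitude_spectrum(filtered[64:128])
peak_bin = spectrum.index(max(spectrum))
peak_hz = peak_bin * (RATE / 2) / 64       # bin index -> frequency in Hz
print(peak_bin, peak_hz)
```

In a real pipeline the spectra of successive frames would form the feature vectors that LDA then projects to a lower dimension.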
HMM Topologies • [Figure: HMM state diagrams, with states labeled b, m, and e, for Signature Whistles, Non-Signature Whistles, Garbage, and Pause (Water)]
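As a rough illustration of what an HMM topology specifies, the sketch below builds a transition matrix, assuming a left-to-right layout with self-loops. The slides do not give the exact transitions or probabilities, so the function name, the `self_loop` and `loop_back` parameters, and the state counts are hypothetical; exit transitions between models are not represented.

```python
def left_to_right_transitions(n_states, self_loop=0.5, loop_back=0.0):
    """Row-stochastic transition matrix for a left-to-right HMM.

    Each non-final state either stays put (self_loop) or advances one state.
    The final state stays put, or returns to state 0 with probability loop_back.
    """
    matrix = [[0.0] * n_states for _ in range(n_states)]
    last = n_states - 1
    for i in range(last):
        matrix[i][i] = self_loop
        matrix[i][i + 1] = 1.0 - self_loop
    matrix[last][last] = 1.0 - loop_back
    matrix[last][0] += loop_back  # for a 1-state model this folds back into the self-loop
    return matrix

three_state = left_to_right_transitions(3)                  # e.g. states b, m, e
with_loop = left_to_right_transitions(3, loop_back=0.2)     # "loop back" variant
one_state = left_to_right_transitions(1)                    # single-state model
for row in with_loop:
    print(row)
```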
Model Selection • Scheme 1 • Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE • Scheme 2 • Signature Whistles, GARBAGE, PAUSE • Scheme 3 • 10 HMMs (one for each dolphin), GARBAGE, PAUSE
Evaluation • We cannot use word error rate (WER) here since there are no words, just segments. • The method we used was to compute a confusion matrix over hidden states. • Janus treats silence differently and doesn’t show the silence classification, which complicates the evaluation.
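A confusion matrix of this kind can be sketched as follows, assuming the reference labels and the decoder output have both been expanded to one label per frame. The class names and the example sequences are made up for illustration, and this is not the evaluation script used in the project.

```python
from collections import Counter

def confusion_matrix(reference, hypothesis, classes):
    """Count frames by (reference label, hypothesized label); rows follow `classes`."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    counts = Counter(zip(reference, hypothesis))
    return [[counts[(ref, hyp)] for hyp in classes] for ref in classes]

classes = ["SIG", "NONSIG", "GARBAGE", "PAUSE"]
# Made-up per-frame labels.
reference = ["PAUSE", "SIG", "SIG", "SIG", "NONSIG", "PAUSE"]
hypothesis = ["PAUSE", "SIG", "SIG", "GARBAGE", "SIG", "PAUSE"]

matrix = confusion_matrix(reference, hypothesis, classes)
correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / len(reference)
for name, row in zip(classes, matrix):
    print(name, row)
print("frame accuracy:", accuracy)
```

Reading along a row shows where frames of one true class ended up, which is more informative here than a single error rate because the classes are very unevenly represented.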
Experiments • Data • 162 labeled files were used • Half of the data for training, half for testing • Swap the training set and test set • 162 test results all together • Features • The same as those in dolphin-ID project • Model Selection • 3 different schemes
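The swap of training and test sets amounts to two-fold cross-validation, sketched below. The file names are placeholders, and splitting the list at its midpoint is an assumption; the slides do not say how the two halves were chosen.

```python
def two_fold_swap(files):
    """Split files into two halves and return (train, test) pairs with the roles swapped."""
    half = len(files) // 2
    first, second = files[:half], files[half:]
    return [(first, second), (second, first)]

# Placeholder names standing in for the 162 labeled files.
files = [f"file_{i:03d}" for i in range(162)]

tested = []
for train, test in two_fold_swap(files):
    # A real run would train the HMMs on `train` and decode `test` here.
    tested.extend(test)

print(len(tested))  # each file is tested exactly once
```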
Analysis of Results • The models can only be as good as the labels • Scheme 3 is the best at aligning signature whistles -- speaker dependent • Scheme 1 is the worst – Not enough data to model non-signature whistles and garbage • Scheme 2 is in the middle – speaker independent • Pause is the most difficult to model – It contains all kinds of different things. We modeled it with only 1 state
Conclusion • Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of the dolphin sounds. • There is a lot of work to be done in the signal processing stage • Partly supervised training • It might be better to construct a model only for the labels we are sure of and let the model learn what the signature whistles are, or which units discriminate between different labels.
We also tried … • One-state model for non-signature whistles, garbage, and pause -- Segmentation fault in training • “Loop back” model for signature whistles -- The loop back transition makes no difference
Acknowledgement Tanja Schultz Yue Pan Alan W Black Szu-Chen Stan Jou Hua Yu
Thank You! Jiazhi Ou Tal Blum {jzou, tblum}@cs.cmu.edu