Wild Dolphin Project 11-751 Speech Final Project

Wild Dolphin Project 11-751 Speech Final Project by Jiazhi Ou (jzou@cs.cmu.edu) and Tal Blum (blum@cs.cmu.edu)

Outline • Wild Dolphin Project, Dolphin Speech • Data, Labeling, Labeling problems • Previous work • Model training • Experiments & Results • Conclusions

The Wild Dolphin Project (WDP) • The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved the photographing, videotaping, and audio taping of a group of resident dolphins, aiming to learn about their lives. • http://www.wilddolphinproject.org/index.cfm

Dolphin’s Speech • Dolphin speech is very different from human speech • Wider range of frequencies • Two mechanisms that can produce sound simultaneously • Some of the frequencies are directional • Carried in water • Can travel long distances

Dolphin’s Speech(2) • Is used for: • Identification • Communicating • Fighting • Defending • Courting • Warning • Calling • Hunting

Dolphin’s Speech(3) • 3 main types • Whistles • Signature • Non-signature • Clicks • Spike trains

What do we know • Not much • Each dolphin has a unique whistle, called its signature whistle. • A dolphin’s signature whistle is similar to the whistles of the dolphins that were in close contact with it as a calf

Data • 164 files, each containing sounds of one dolphin whose name is known • Average file length is 7 sec • Total data length is less than 20 minutes, of which about half is silence • The data does not contain all of the relevant frequencies
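The figures on this slide are mutually consistent. With an average of 7 s per file, the total duration and the amount of non-silent audio work out roughly as follows (approximate, since 7 s is an average):

```latex
164 \times 7\ \text{s} = 1148\ \text{s} \approx 19.1\ \text{min}, \qquad
\tfrac{1}{2} \times 19.1\ \text{min} \approx 9.6\ \text{min of non-silent audio}
```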

Labeling • Dolphin names • Dolphin ID project • Labels: Pause, Noise, Dolphin Signature Whistles, Dolphin Non-Signature Whistles

Labeling Problems • How do we distinguish between signature and non-signature whistles? • How do we distinguish between whistles and non-whistles? • They co-occur • How do we determine the duration of a label? • Should labels that are close in time be merged into one label? • This affects the model • Some signals are weak, probably due to a change in the dolphin’s direction
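One way to make the "merge close labels" question concrete is a gap threshold. Consecutive segments with the same label are joined when the silence between them is short enough. The sketch below is a minimal illustration only. The `(start, end, label)` tuple format, the label names, and the `max_gap` value are all hypothetical and are not taken from the project's labeling tools.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds, label).
Segment = Tuple[float, float, str]


def merge_close_segments(segments: List[Segment], max_gap: float) -> List[Segment]:
    """Merge consecutive segments that share a label and are separated by a
    gap of at most max_gap seconds (max_gap is a hypothetical tuning knob)."""
    merged: List[Segment] = []
    for start, end, label in sorted(segments):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


if __name__ == "__main__":
    labels = [(0.0, 0.8, "SIG"), (0.9, 1.5, "SIG"), (2.5, 3.0, "SIG"), (3.0, 3.4, "PAUSE")]
    print(merge_close_segments(labels, max_gap=0.2))
```

The choice of `max_gap` directly changes how many training tokens each model sees and how long they are. This is one reason the slide notes that merging has an effect on the model.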

Mapping from Labels to Models

Label Statistics

Previous Work • Dolphin-ID Project by Tanja, Alan and Yue • Task: identify dolphins by their signature whistles • 51 files labeled by Alan • 13 HMMs: 10 (one for each dolphin) + DOLPHIN, PAUSE, and GARBAGE • Used Janus for training and testing • Tried different kinds of features

Our Work • Model generalized signature whistles • Label more files • Create HMMs for signature whistles, non-signature whistles, garbage, and pause • Train and test the HMMs using Janus • Evaluate the test results with our own method • Compare different model-selection schemes

Signal Processing • Tanja’s scripts • Downsampling • High-pass filter • FFT • LDA
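The front-end steps listed above can be sketched with numpy: downsampling, then high-pass filtering, then a framed FFT. This is an illustration under stated assumptions, not a reproduction of Tanja’s scripts. The filters are simple stand-ins: boxcar averaging for decimation and a first-order RC high-pass. The sample rate, cutoff, and frame sizes are hypothetical. LDA is omitted because it needs labeled training frames to estimate its projection.

```python
import numpy as np


def downsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Crude decimation: average each block of `factor` samples (a boxcar
    low-pass that limits aliasing) and keep one value per block."""
    n = len(x) // factor * factor
    return x[:n].reshape(-1, factor).mean(axis=1)


def high_pass(x: np.ndarray, sample_rate: float, cutoff_hz: float) -> np.ndarray:
    """First-order (RC) high-pass filter: attenuates low-frequency hum and drift."""
    if len(x) == 0:
        return np.empty(0)
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * (y[i - 1] + x[i] - x[i - 1])
    return y


def frame_spectra(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Magnitude spectrum of each Hann-windowed frame (rows = frames)."""
    if len(x) < frame_len:
        return np.empty((0, frame_len // 2 + 1))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    rate = 48_000  # hypothetical input sample rate (Hz)
    t = np.arange(rate) / rate  # one second of audio
    # Synthetic test signal: an 8 kHz whistle-like tone plus a 50 Hz hum.
    signal = np.sin(2 * np.pi * 8000 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
    x = downsample(signal, 2)                     # 48 kHz -> 24 kHz
    x = high_pass(x, rate / 2, cutoff_hz=1000.0)  # hypothetical cutoff
    spec = frame_spectra(x, frame_len=512, hop=256)
    print(spec.shape)  # (frames, frequency bins)
```

Each row of `spec` is one frame's feature vector. In a full pipeline, an LDA projection trained on labeled frames would follow.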

HMM Topologies • Signature Whistles • Non-Signature Whistles • Garbage • Pause (Water) [figure: state diagrams built from begin (b), middle (m), and end (e) states]
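The begin/middle/end states in the topology figure suggest left-to-right HMMs. The sketch below builds such a transition matrix and runs a generic Viterbi alignment of frames to states. Janus performs training and decoding internally, so this code is not the Janus API. The state count, self-loop probability, and emission scores are hypothetical. The alignment assumes there are at least as many frames as states.

```python
import numpy as np


def left_to_right(n_states: int, p_stay: float) -> np.ndarray:
    """Transitions for a left-to-right HMM such as b -> m -> e.
    Each state loops on itself with p_stay or moves one state forward;
    the extra last column is the probability of leaving the model."""
    a = np.zeros((n_states, n_states + 1))
    for i in range(n_states):
        a[i, i] = p_stay
        a[i, i + 1] = 1.0 - p_stay
    return a


def viterbi_align(log_emis: np.ndarray, trans: np.ndarray) -> list:
    """Most likely state per frame, forced to start in the first state and
    end in the last one. log_emis has shape (frames, states)."""
    n_frames, n_states = log_emis.shape
    with np.errstate(divide="ignore"):
        log_a = np.log(trans[:, :n_states])  # drop the exit column
    delta = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    delta[0, 0] = log_emis[0, 0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_a  # scores[i, j]: move i -> j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(n_states)] + log_emis[t]
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    trans = left_to_right(3, p_stay=0.6)  # hypothetical 3-state b-m-e model
    # Hypothetical per-frame emission probabilities favoring b, b, m, m, e, e.
    emis = np.array([[.8, .1, .1], [.8, .1, .1], [.1, .8, .1],
                     [.1, .8, .1], [.1, .1, .8], [.1, .1, .8]])
    print(viterbi_align(np.log(emis), trans))  # [0, 0, 1, 1, 2, 2]
```

Calling `left_to_right(1, p)` gives a one-state model, matching the single-state Pause model mentioned in the analysis of results.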

Model Selection • Scheme 1 • Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE • Scheme 2 • Signature Whistles, GARBAGE, PAUSE • Scheme 3 • 10 HMMs (one for each dolphin), GARBAGE, PAUSE

Evaluation • We cannot use WER here since there are no words, just segments. • Instead, we computed a confusion matrix over hidden states. • Janus treats silence differently and does not report silence classifications, which complicates the evaluation.
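A frame-level confusion matrix can be computed by pairing each frame's reference label with the decoder's label and counting the pairs. The sketch below also shows one possible workaround for the silence problem. Frames the decoder leaves unlabeled (`None`) are counted as a fill label. That fill rule, the label names, and the per-class recall summary are assumptions for illustration and are not necessarily the method used in the project.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def frame_confusion(
    reference: Sequence[str],
    hypothesis: Sequence[Optional[str]],
    fill_label: str = "PAUSE",
) -> Dict[Tuple[str, str], int]:
    """Count (reference, hypothesis) label pairs frame by frame.
    Frames the decoder left unlabeled (None) are counted as fill_label,
    an assumed stand-in for silence the decoder does not report."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")
    hyp = [h if h is not None else fill_label for h in hypothesis]
    return dict(Counter(zip(reference, hyp)))


def per_class_recall(conf: Dict[Tuple[str, str], int]) -> Dict[str, float]:
    """Fraction of each reference label's frames that were labeled correctly."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    for (ref, hyp), n in conf.items():
        totals[ref] += n
        if ref == hyp:
            correct[ref] += n
    return {label: correct[label] / totals[label] for label in totals}


if __name__ == "__main__":
    ref = ["SIG", "SIG", "SIG", "PAUSE", "PAUSE", "GARBAGE"]
    hyp = ["SIG", "SIG", None, None, "SIG", "GARBAGE"]
    conf = frame_confusion(ref, hyp)
    print(conf)
    print(per_class_recall(conf))
```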

Experiments • Data • 162 labeled files were used • Half of the data for training, half for testing • Then the training and test sets were swapped • 162 test results altogether (each file tested once) • Features • The same as those in the Dolphin-ID project • Model Selection • 3 different schemes
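The train/test swap amounts to two-fold cross-validation. Each half serves once as the training set and once as the test set, so all 162 files are tested exactly once. The sketch below is a minimal version of that split. The file names are hypothetical, and the random shuffle is an assumption because the slides do not say how the halves were chosen.

```python
import random
from typing import List, Sequence, Tuple


def two_fold_splits(files: Sequence[str], seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return [(train, test), (test, train)] for a two-fold swap.
    Shuffling with a fixed seed is an assumption, not the original procedure."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    return [(first, second), (second, first)]


if __name__ == "__main__":
    files = [f"dolphin_{i:03d}.wav" for i in range(162)]  # hypothetical names
    for train, test in two_fold_splits(files):
        print(len(train), len(test))
```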

Results – Scheme 1

Analysis of Results • Results can only be as good as the labels • Scheme 3 aligns signature whistles best: its models are speaker (dolphin) dependent • Scheme 1 is the worst: there is not enough data to model non-signature whistles and garbage • Scheme 2 is in between: its signature-whistle model is speaker independent • Pause is the most difficult to model: it contains many different sounds, and we modeled it with only 1 state

Conclusion • Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adapted to the characteristics of dolphin sounds. • A lot of work remains to be done in the signal processing stage • Partly supervised training • It might be better to build models only for the labels we are sure of and let the model learn what signature whistles are, or which units discriminate between labels.

We also tried … • One-state model for non-signature whistles, garbage, and pause -- Segmentation fault in training • “Loop back” model for signature whistles -- The loop back transition makes no difference

Acknowledgement Tanja Schultz Yue Pan Alan W Black Szu-Chen Stan Jou Hua Yu

Thank You! Jiazhi Ou Tal Blum {jzou, tblum}@cs.cmu.edu
