
Wild Dolphin Project 11-751 Speech Final Project



Presentation Transcript


  1. Wild Dolphin Project 11-751 Speech Final Project by Jiazhi Ou jzou@cs.cmu.edu Tal Blum blum@cs.cmu.edu

  2. Outline • Wild Dolphin Project, Dolphin Speech • Data, Labeling, Labeling problems • Previous work • Models training • Experiments & Results • Conclusions

  3. The Wild Dolphin Project (WDP) • The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved the photographing, videotaping, and audio taping of a group of resident dolphins, aiming to learn about their lives. • http://www.wilddolphinproject.org/index.cfm

  4. Dolphin’s Speech • Dolphin speech is very different from human speech • The range of frequencies is wider • Two sound-producing mechanisms can operate simultaneously • Some of the frequencies are directional • It is carried in water • It can travel large distances

  5. Dolphin’s Speech(2) • Is used for: • Identification • Communicating • Fighting • Defending • Courting • Warning • Calling • Hunting

  6. Dolphin’s Speech(3) • 3 main types • Whistles • Signature • Non-signature • Clicks • Spike trains

  7. What do we know • Not much • We know that each dolphin has a unique whistle, called its signature whistle. • A dolphin’s signature whistle is similar to the whistles of the dolphins that were in close contact with it when it was a baby

  8. Data • 164 files, each containing sounds of one dolphin whose name is known • Average file length is 7 sec • Total data length is less than 20 minutes, about half of which is silence • The data does not contain all of the relevant frequencies

  9. Labeling • Dolphin Names • Dolphin ID project • Pause, Noise, Dolphin Signature Whistles, Dolphin Non-Signature whistles.

  10. Labeling Problems • How do we distinguish between signature and non-signature whistles? • How do we distinguish between whistles and non-whistles? • They co-occur • How do we determine the duration of a label? • Should labels that are close together be merged into one label? • This has an effect on the model • Some signals are weak, probably due to a change in the dolphin’s direction
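The question of whether close labels should become one label can be made concrete with a small sketch. This is only an illustration, not the procedure used in the project: the `(start, end, label)` segment format, the function name `merge_close_segments`, and the `max_gap` threshold are all assumptions introduced here.

```python
def merge_close_segments(segments, max_gap):
    """Merge neighboring segments with the same label when the gap between
    them is at most max_gap seconds.

    segments: list of (start, end, label) tuples, times in seconds
    (a hypothetical format chosen for this sketch).
    """
    merged = []
    for start, end, label in sorted(segments):
        same_label = bool(merged) and merged[-1][2] == label
        if same_label and start - merged[-1][1] <= max_gap:
            # Extend the previous segment instead of starting a new one.
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


labels = [
    (0.0, 1.0, "SIG"),
    (1.1, 2.0, "SIG"),      # 0.1 s after the previous label: merged
    (3.0, 4.0, "SIG"),      # 1.0 s gap: kept separate
    (4.05, 4.5, "NONSIG"),  # different label: never merged
]
print(merge_close_segments(labels, max_gap=0.2))
```

The choice of `max_gap` changes how many segments each model sees and how long they are, which is one way the labeling decision affects the model.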

  11. Mapping from Labels to Models

  12. Label Statistics

  13. Previous Work • Dolphin-ID Project by Tanja, Alan and Yue • Task: To identify dolphin ID using their signature whistles • 51 files labeled by Alan • 13 HMMs: 10 (one for each dolphin) + DOLPHIN, PAUSE, and GARBAGE • Used Janus for training and testing • Tried different kinds of features

  14. Our Work • Model Generalized Signature Whistles • Label More Files • Create HMMs for signature whistles, non-signature whistles, garbage, and pause • Train and test the HMMs using Janus • Evaluate the test results with our own method • Compare different model selections

  15. Signal Processing • Tanja’s scripts • Down sampling • High Pass Filter • FFT • LDA
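The first three steps of this front end can be sketched generically in pure Python. This is not the actual script used in the project: the function names, the first-order filter, the `alpha` value, and the naive DFT are assumptions chosen to keep the example self-contained, and the LDA step is omitted because it requires labeled training frames.

```python
import cmath
import math


def downsample(samples, factor):
    """Keep every factor-th sample (no anti-aliasing filter; illustration only)."""
    return samples[::factor]


def high_pass(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def split_frames(samples, size, step):
    """Cut the signal into overlapping frames of `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# A toy tone: 4 cycles across 64 samples, then downsampled by 2.
raw = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
spectrum = magnitude_spectrum(downsample(raw, 2))
print(spectrum.index(max(spectrum)))  # index of the strongest frequency bin
```

In practice a library FFT and a properly designed filter would replace these toy versions; the sketch only shows the order of operations named on the slide.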

  16. HMM Topologies • Signature Whistles • Non-Signature Whistles • Garbage • Pause (Water) [Figure: HMM state diagrams for each model, with states labeled b, m, and e]
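A left-to-right topology of the kind usually drawn in such diagrams can be written as a transition matrix. The sketch below is an assumption-laden illustration: the function name, the `self_loop` probability, and the three-state example are hypothetical, and the actual state counts used per model are not recoverable from the slide.

```python
def left_to_right_transitions(n_states, self_loop=0.5):
    """Row-stochastic transition matrix for a simple left-to-right HMM.

    Each state either stays where it is (probability self_loop) or moves
    to the next state. The last state keeps all of its probability mass
    on itself here for simplicity.
    """
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            matrix[i][i] = self_loop
            matrix[i][i + 1] = 1.0 - self_loop
        else:
            matrix[i][i] = 1.0
    return matrix


# Hypothetical example: a three-state model and a one-state model.
for row in left_to_right_transitions(3):
    print(row)
print(left_to_right_transitions(1))
```

A one-state model reduces to a single self-loop, so it can only describe the overall distribution of frames, not any temporal structure within a segment.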

  17. Model Selection • Scheme 1 • Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE • Scheme 2 • Signature Whistles, GARBAGE, PAUSE • Scheme 3 • 10 HMMs (one for each dolphin), GARBAGE, PAUSE

  18. Evaluation • We cannot use WER here since there are no words, just segments. • The method we used was to compute a confusion matrix over hidden states. • Janus treats silence differently and does not show silence classification, which complicates the evaluation.
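A confusion matrix of this kind can be sketched as a count of (reference, hypothesis) pairs. The sketch assumes that both the hand labels and the recognizer output have already been expanded to one label per frame; that alignment step, the label names, and both function names are assumptions made for the example.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis):
    """Count (reference label, hypothesized label) pairs, one pair per frame."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    return Counter(zip(reference, hypothesis))


def per_label_accuracy(counts):
    """Fraction of frames of each reference label that were recognized correctly."""
    totals, correct = Counter(), Counter()
    for (ref, hyp), n in counts.items():
        totals[ref] += n
        if ref == hyp:
            correct[ref] += n
    return {label: correct[label] / totals[label] for label in totals}


ref = ["SIG", "SIG", "PAUSE", "PAUSE", "GARBAGE"]
hyp = ["SIG", "PAUSE", "PAUSE", "PAUSE", "SIG"]
counts = confusion_matrix(ref, hyp)
print(counts)
print(per_label_accuracy(counts))
```

Unlike WER, this measure needs no notion of word boundaries, only a frame-by-frame comparison of labels.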

  19. Experiments • Data • 162 labeled files were used • Half of the data for training, half for testing • Swap the training set and test set • 162 test results all together • Features • The same as those in dolphin-ID project • Model Selection • 3 different schemes
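The swap of training and test sets amounts to a two-fold cross-validation, which can be sketched as follows. How the files were actually divided into halves is not stated; splitting the list in order, the file names, and the function name are assumptions for illustration.

```python
def two_fold_swap(files):
    """Return two (train, test) pairs: train on one half and test on the
    other, then swap the roles of the two halves."""
    half = len(files) // 2
    first, second = files[:half], files[half:]
    return [(first, second), (second, first)]


# Hypothetical file names standing in for the 162 labeled files.
files = [f"file_{i:03d}.wav" for i in range(162)]
folds = two_fold_swap(files)
tested = [name for _, test in folds for name in test]
print(len(tested))  # every file is tested exactly once across the two folds
```

Because each file appears in exactly one test set, the two folds together give one test result per labeled file.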

  20. Results – Scheme 1

  21. Results – Scheme 2

  22. Results – Scheme 3

  23. Analysis of Results • Results can only be as good as the labels • Scheme 3 is the best at aligning signature whistles (speaker dependent) • Scheme 1 is the worst: there is not enough data to model non-signature whistles and garbage • Scheme 2 is in the middle (speaker independent) • Pause is the most difficult to model: it contains many different things, and we modeled it with only 1 state

  24. Conclusion • Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of the dolphin sounds. • There is a lot of work to be done in the signal processing stage • Partly supervised training • It might be better to construct a model only for the labels we are sure of and let the model learn what the signature whistles are, or which units discriminate between different labels.

  25. We also tried … • One-state model for non-signature whistles, garbage, and pause -- Segmentation fault in training • “Loop back” model for signature whistles -- The loop back transition makes no difference

  26. Acknowledgement Tanja Schultz Yue Pan Alan W Black Szu-Chen Stan Jou Hua Yu

  27. Thank You! Jiazhi Ou Tal Blum {jzou, tblum}@cs.cmu.edu
