
What does speech “look” like

What does speech “look” like to an Automatic Speech Recognition System? Jiang Wu, jiang.wu@binghamton.edu, Electrical Engineering Department. What are “speech features”? Suppose you are only allowed to use 39 values to represent a speech segment 1 second long… What are good features?


Presentation Transcript


  1. What does speech “look” like to an Automatic Speech Recognition System? Jiang Wu jiang.wu@binghamton.edu Electrical Engineering Department

  2. What are “speech features”? • Suppose you are only allowed to use 39 values to represent a speech segment 1 second long… • What are good features? • Discriminative • Compact enough to avoid the “Curse of Dimensionality”

  3. How do we extract features from speech? From both the time and frequency domains…

  4. Frequency Domain Features: “Static” Features • For each frame of pre-recorded speech, we extract features that compress its spectrum into a few values.
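One way to picture "compressing the spectrum" of each frame is the minimal sketch below. It is an illustration under stated assumptions, not the method used in the talk: the slide does not name a specific feature type, so this uses a plain cepstrum-style pipeline (Hamming window, log magnitude DFT, then a DCT that keeps only the first few coefficients). The function names, the frame length of 64 samples, the hop of 32 samples, and the choice of 13 coefficients are all hypothetical.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a tail shorter than frame_len is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_magnitude_spectrum(frame):
    """Hamming-window one frame and return the log magnitude of DFT bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(math.log(abs(s) + 1e-10))  # small offset avoids log(0)
    return spectrum

def dct_compress(log_spec, num_coeffs):
    """Keep only the first num_coeffs DCT-II coefficients of the log spectrum."""
    m = len(log_spec)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / m)
                for j, v in enumerate(log_spec))
            for c in range(num_coeffs)]

def static_features(samples, frame_len=64, hop=32, num_coeffs=13):
    """Return one short 'static' feature vector per frame (assumed parameters)."""
    return [dct_compress(log_magnitude_spectrum(f), num_coeffs)
            for f in frame_signal(samples, frame_len, hop)]

# Usage: a synthetic 440 Hz tone sampled at an assumed 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = static_features(tone)
print(len(features), "frames x", len(features[0]), "values per frame")
```

Each 64-sample frame is reduced to 13 numbers here. Practical systems usually add a perceptually motivated filterbank before the DCT, which this sketch leaves out.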

  5. Time Domain Features: Trajectories of Static Spectral Features over Time • Recent research has shown that spectral trajectories over time also play an important role in ASR. • Thus, we also want to let computers see how each static feature changes over time, in a window centered on each frame.
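A common way to capture such trajectories is with "delta" features: a regression slope of each static feature over a few neighboring frames. The sketch below assumes that formulation, which the slide does not confirm. The function names and the window half-width of 2 frames are hypothetical. Stacking 13 static values with their deltas and delta-deltas gives 39 values per frame, which is one widespread convention and may or may not be the 39 values the talk has in mind.

```python
def delta(features, width=2):
    """Regression slope of each feature over +/- width frames (edges repeat the end frames)."""
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = features[min(t + n, num_frames - 1)][d]
                behind = features[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

def add_trajectories(static):
    """Append delta and delta-delta values to each frame's static features."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Usage: a feature that rises by 1 per frame next to one that stays constant.
ramp = [[float(t), 5.0] for t in range(6)]
print(delta(ramp)[2])             # slope of each feature around frame 2
print(len(add_trajectories(ramp)[0]))  # values per frame after stacking
```

Away from the edges, the rising feature gets a delta of 1 per frame and the constant feature gets 0, so the delta values describe how each static feature moves around the center frame.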

  6. So finally to the ASR system, the speech features look like…

  7. Other Features for Future Study… • Pitch Contour: • Strongly related to “tones” • Very popular feature type for tonal languages: Mandarin, Cantonese, some Korean dialects, etc. • “Perceptual” Features: • Analyze the speech signal according to how the human auditory system “perceptually” processes sound • Frequency resolution and time resolution both depend on frequency and time.
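To make "pitch contour" concrete, the sketch below estimates one pitch value per frame from the peak of the frame's autocorrelation and collects the values over time. This is a basic textbook technique swapped in for illustration; the slide does not say which pitch tracker is used. The function names, the 60 to 400 Hz search range, and the frame and hop sizes are assumptions.

```python
import math

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return the pitch in Hz of one frame from its strongest autocorrelation lag."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def pitch_contour(samples, sample_rate, frame_len=400, hop=200):
    """Return one pitch estimate per frame, i.e. the pitch trajectory over time."""
    return [estimate_pitch(samples[i:i + frame_len], sample_rate)
            for i in range(0, len(samples) - frame_len + 1, hop)]

# Usage: a synthetic 200 Hz tone sampled at an assumed 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / rate) for t in range(1200)]
print(pitch_contour(tone, rate))
```

On real speech the contour would rise and fall with the speaker's tone, and a practical tracker would also need to detect unvoiced frames, which this sketch does not attempt.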

  8. Our Speech Lab: Our Current Projects • Project 1: To create an open-source multi-language audio database for spoken language processing applications. • Project 2: To understand tonal languages. • Dr. Montri Karnjanadecha (Forced Alignment) • Chandra Vootkuri (Ph.D.) (Landmark Theory) • Brian Wong (M.S.) (Freq. Non-linearity) • Andrew Hwang (M.S.) (Feature Transform)

  9. Background you need… • Signal processing (A/D) • Probability theory, pattern recognition and machine learning • An understanding of the human auditory system, linguistics, or music will be a bonus!

  10. Speech Research is interesting! Tons of interesting applications!! • Pronunciation therapy • Hearing aids • Singing voice processing

  11. Talk to a public computer in your real life… • Not just the Microsoft speech-to-text software on your PC.

  12. Thank you! Questions?
