Speech Processing

AEGIS RET: Applications of Images and Signals in High Schools
All-Hands Meeting, University of Central Florida, July 20, 2012

Contributors
• Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu
• Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu
• Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

Speech Processing Project
• Speech recognition requires speech first to be characterized by a set of "features".
• The features are used to determine which words were spoken.
• Our project implements the feature extraction stage of a speech processing application.

Timeline
• 1874: Alexander Graham Bell shows that the frequency harmonics of an electrical signal can be separated
• 1952: Bell Labs develops the first effective speech recognizer
• 1971-1976: DARPA research aims for speech to be understood, not just recognized
• 1980s: Call-center and text-to-speech products become commercially available
• 1990s: PC processing power makes speech recognition (SR) software practical for ordinary users
Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Applications
• Call-center speech recognition
• Speech-to-text applications (e.g., dictation software)
• Hands-free user interfaces (e.g., OnStar, Xbox Kinect, Siri)
• Science fiction, 1968: Stanley Kubrick's 2001: A Space Odyssey, http://www.youtube.com/watch?v=6MMmYyIZlC4
• Science fact, 2011: Apple iPhone 4S Siri, http://www.apple.com/iphone/features/siri.html
• Medical applications
  • Parkinson's Voice Initiative
  • Detection of sleep disorders

Difficulties
• Continuous speech (finding word boundaries)
• Noise
  • Background noise
  • Other speakers
• Differences between speakers
  • Dialects and accents
  • Male vs. female voices

Speech Recognition
[Figure: block diagram. Speech (large amount of data, e.g., 256 samples) → Front End: Pre-processing → Features (reduced data size, e.g., 13 features) → Back End: Recognition → Recognized speech]
• Front end: reduces the amount of data passed to the back end while keeping enough information to describe the signal accurately. Its output is a feature vector.
  • 256 samples → 13 features
• Back end: uses statistical models to classify each feature vector as a particular sound in speech

Front-End Processing of a Speech Recognizer
Pre-emphasis → Window → FFT → Mel-Scale → log → IFFT
• Pre-emphasis: a high-pass filter compensates for the high-frequency roll-off in human speech
• Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame
• FFT: transform the signal from the time domain to the frequency domain, since the human ear perceives sound based on its frequency content
• Mel-Scale: convert the linear frequency scale (Hz) to the approximately logarithmic mel scale
• log: take the log of the magnitudes so that multiplication becomes addition; the speech spectrum is roughly the product of the vocal-source excitation and the vocal-tract response, and the log turns that product into a sum whose parts can be separated
• IFFT: apply the inverse FFT to move into the cepstral domain; the result is the set of "features"
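The pipeline above can be sketched in plain Python with only the standard library. This is a minimal illustration under stated assumptions, not the SASE Lab's MATLAB code: the pre-emphasis coefficient 0.97, 256-sample frames with a 128-sample hop, the Hamming window, 20 triangular mel filters, and the mel formula m = 2595 log10(1 + f/700) are common textbook choices assumed here. For the last step it swaps in a DCT-II, the cosine transform usually used in place of the inverse FFT when computing mel cepstra (the two are closely related for a real, symmetric log spectrum), and keeps the first 13 coefficients to match the 13 features per frame mentioned earlier.

```python
import math

# Assumed textbook defaults: alpha=0.97, 256/128 framing, 20 mel filters, 13 coefficients.

def pre_emphasis(x, alpha=0.97):
    """High-pass filter y[n] = x[n] - alpha * x[n-1], boosting high frequencies."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def split_frames(x, size=256, hop=128):
    """Cut the signal into overlapping frames of `size` samples, `hop` samples apart."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]

def hamming(frame):
    """Multiply by a Hamming window to taper the frame edges toward zero."""
    n_len = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)))
            for n, s in enumerate(frame)]

def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N/2 from a direct DFT (an FFT returns the same values faster)."""
    n_len = len(frame)
    mags = []
    for k in range(n_len // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * n / n_len) for n, s in enumerate(frame))
        im = -sum(s * math.sin(2 * math.pi * k * n / n_len) for n, s in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_energies(mags, fs, n_filters=20):
    """Weight the spectrum with triangular filters whose edges are evenly spaced in mel."""
    bin_hz = (fs / 2) / (len(mags) - 1)
    top = hz_to_mel(fs / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for k, m in enumerate(mags):
            f = k * bin_hz
            if lo < f < mid:        # rising edge of the triangle
                total += m * (f - lo) / (mid - lo)
            elif mid <= f < hi:     # falling edge of the triangle
                total += m * (hi - f) / (hi - mid)
        energies.append(total)
    return energies

def cepstral_coefficients(log_energies, n_coeffs=13):
    """DCT-II of the log mel energies, used here in place of the inverse FFT."""
    m_len = len(log_energies)
    return [sum(e * math.cos(math.pi * q * (j + 0.5) / m_len)
                for j, e in enumerate(log_energies))
            for q in range(n_coeffs)]

def frame_features(frame, fs):
    """Window -> spectrum -> mel energies -> log -> cepstrum, for one frame."""
    mags = magnitude_spectrum(hamming(frame))
    logs = [math.log(e + 1e-10) for e in mel_energies(mags, fs)]  # 1e-10 avoids log(0)
    return cepstral_coefficients(logs)

# Example: 1024 samples of a 440 Hz tone sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 440 * n / fs) for n in range(1024)]
features = [frame_features(f, fs) for f in split_frames(pre_emphasis(signal))]
print(len(features), "frames x", len(features[0]), "features")  # prints: 7 frames x 13 features
```

Replacing `magnitude_spectrum` with a real FFT (for example MATLAB's `fft`) changes only the speed, not the values.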

Speech Analysis and Sound Effects (SASE) Project
• Graphical user interface (GUI)
• Speech input
  • Record and save audio
  • Read sound files (*.wav, *.ulaw, *.au)
• Graphs the entire audio signal
• Processes a user-selected speech frame and displays the output of each processing stage
• Displays a spectrogram
• Applies audio effects
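A spectrogram like the one SASE displays amounts to repeated FFTs of short, overlapping frames (a short-time Fourier transform). A minimal Python sketch, assuming 256-sample Hamming-windowed frames with 50% overlap and levels in decibels; SASE's actual settings are not given here. Each row of the result is one frame, so drawing the grid as an image with time along one axis and frequency along the other gives the familiar spectrogram picture.

```python
import cmath
import math

def spectrogram(x, size=256, hop=128):
    """Short-time Fourier transform magnitudes in dB (assumed frame size and hop).

    Returns one row per frame; row[k] is the level of bin k (k = 0..size/2),
    which corresponds to k * fs / size Hz for sampling rate fs.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]
    rows = []
    for start in range(0, len(x) - size + 1, hop):
        seg = [x[start + n] * window[n] for n in range(size)]
        row = []
        for k in range(size // 2 + 1):
            X_k = sum(s * cmath.exp(-2j * math.pi * k * n / size) for n, s in enumerate(seg))
            row.append(20 * math.log10(abs(X_k) + 1e-12))  # small offset avoids log10(0)
        rows.append(row)
    return rows

# Example: 500 Hz for 512 samples, then 1500 Hz for 512 samples, at fs = 8000 Hz.
fs = 8000
x = [math.sin(2 * math.pi * (500 if n < 512 else 1500) * n / fs) for n in range(1024)]
S = spectrogram(x)
loudest = [row.index(max(row)) for row in S]
print([k * fs / 256 for k in loudest])
```

For this two-tone test signal the loudest bin moves from 500 Hz in the first frame to 1500 Hz in the last, which is exactly the kind of change a spectrogram makes visible over time.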

MATLAB Code
• Graphical user interface (GUI)
  • GUIDE (GUI Development Environment)
  • Callback functions
• Front-end speech processing
  • Modular functions for reusability
  • Graphs display the output of each stage
• Sound effects
  • Echo, reverb, flange, chorus, vibrato, tremolo, voice changer
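Two of the listed effects have one-line formulas. A minimal Python sketch, assuming a single delayed copy for echo and cosine amplitude modulation for tremolo; the function and parameter names (`echo`, `tremolo`, `delay_s`, `gain`, `rate_hz`, `depth`) are hypothetical and are not the SASE Lab's MATLAB functions. Reverb, flange, chorus, and vibrato build on similar ideas: mixing in delayed copies of the signal, possibly with a delay that varies over time.

```python
import math

def echo(x, fs, delay_s=0.25, gain=0.5):
    """Single echo: y[n] = x[n] + gain * x[n - D], where D = delay_s * fs samples.

    The output keeps the input's length, so any echo past the end is cut off.
    """
    d = int(round(delay_s * fs))
    return [s + (gain * x[n - d] if n >= d else 0.0) for n, s in enumerate(x)]

def tremolo(x, fs, rate_hz=5.0, depth=0.5):
    """Amplitude modulation: the gain swings between 1 and 1 - depth, rate_hz times a second."""
    return [s * (1.0 - depth * 0.5 * (1.0 - math.cos(2 * math.pi * rate_hz * n / fs)))
            for n, s in enumerate(x)]

# Example on a unit impulse: the echo appears 3 samples later at half the amplitude.
impulse = [1.0] + [0.0] * 9
print(echo(impulse, fs=10, delay_s=0.3, gain=0.5))
# prints: [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Because the output keeps the input's length, padding the input with delay_s * fs zeros lets the full echo tail be heard.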

GUI Components
[Figure: SASE GUI, with the plotting axes and buttons labeled]

SASE Lab Demo
• Record, play, and save audio to a file; open existing audio files
• Select and process a speech frame; display graphs of each stage of front-end processing
• Display a spectrogram of the entire speech signal or of a user-selectable 3-second sample
• Play speech: all of it or the selected 3-second sample
• Show how different sounds (e.g., the vowels "a e i o u") differ in the spectrogram and in the features, so the audience understands what these graphs tell us about the sounds
• Apply sound effects and show their user-configurable parameters
  • Graph the spectrogram and speech processing of the sound effects
  • Show the echo effect in the spectrogram
• Use as a teaching tool

Future Work on SASE Lab
• Audio effects
  • e.g., pitch removal
• Noise filtering

Applications of Signal Processing in High Schools
• Convey the relevance and importance of math to high school students
• Bring knowledge of engineering, technological innovation, and academic research into high school classrooms
• Give students the opportunity to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of signal processing
• Encourage students to pursue higher education and careers in STEM fields

Unit Plan: Speech Processing
• A collection of lesson plans that introduces high school students to the fundamentals of speech and sound processing
• Connections to Pre-Calculus mathematics standards (NGSSS and Common Core)
  • Mathematical modeling
  • Trigonometric functions
  • Complex numbers in rectangular and polar form
  • Function operations
  • Logarithmic functions
  • Sequences and series
  • Matrices
• Hands-on lessons involving MATLAB projects
• Teacher notes

Unit Introduction
• Students research, explore, and discuss current applications of speech and audio processing

Lesson 1: The Sound of a Sine Wave
• Modeling sound as a sinusoidal function
• Concepts covered:
  • Continuous vs. discrete functions
  • Frequency of a sine wave
  • Composite signals
• Connections to real-world applications:
  • Synthesis of digital speech and music
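The step from a continuous to a discrete sine wave is sampling at times t = n/f_s. Writing A for the amplitude, f for the frequency in Hz, and f_s for the sampling rate (symbols introduced here, not taken from the slides), a single tone and a composite of K tones are:

```latex
x(t) = A \sin(2\pi f t)
\qquad\xrightarrow{\; t \,=\, n/f_s \;}\qquad
x[n] = A \sin\!\left(\frac{2\pi f n}{f_s}\right), \quad n = 0, 1, 2, \dots

x_{\text{composite}}[n] = \sum_{i=1}^{K} A_i \sin\!\left(\frac{2\pi f_i n}{f_s}\right)
```

For example, with f = 440 Hz and f_s = 8000 Hz, one period of the wave spans f_s / f = 8000 / 440 ≈ 18.2 samples.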

Lesson 1: The Sound of a Sine Wave
• Student MATLAB project
  • Create discrete sine waves with given frequencies
  • Create a composite signal from the sine waves
  • Plot graphs and play the sounds of the sine waves
  • Analyze the effect of frequency on the graphs and sounds of the sine functions
• Project extensions
  • Play songs using sine waves
  • Synthesize vowel sounds with sine waves
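A Python counterpart to the Lesson 1 project, offered as a sketch rather than the lesson's MATLAB code (where `sound` would play the samples directly). It writes mono 16-bit WAV files with the standard `wave` module so the tones can be played in any audio player; the function names, the 8 kHz rate, and the 0.5 default amplitude are illustrative assumptions.

```python
import math
import struct
import wave

def sine(freq_hz, duration_s, fs=8000, amp=0.5):
    """Discrete sine wave x[n] = amp * sin(2*pi*freq_hz*n/fs), n = 0..duration_s*fs - 1."""
    return [amp * math.sin(2 * math.pi * freq_hz * n / fs) for n in range(int(duration_s * fs))]

def composite(*signals):
    """Add equal-length signals sample by sample; rescale only if the sum exceeds [-1, 1]."""
    total = [sum(samples) for samples in zip(*signals)]
    scale = max(1.0, max(abs(v) for v in total))
    return [v / scale for v in total]

def to_pcm16(samples):
    """Convert floats in [-1, 1] to little-endian 16-bit integer bytes, clipping outliers."""
    return b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples)

def write_wav(path, samples, fs=8000):
    """Save a mono 16-bit WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(to_pcm16(samples))
```

For instance, `write_wav("a4.wav", sine(440, 1.0))` saves one second of a 440 Hz tone, and `write_wav("a4_e5.wav", composite(sine(440, 1.0), sine(659.25, 1.0)))` adds a second tone; listening to both and plotting the first few hundred samples shows how adding a frequency changes the sound and the waveform.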

Lesson 2: Frequency Analysis
• Use of the Fourier transform to convert functions from the time domain to the frequency domain
• Concepts covered:
  • Modeling harmonic signals as a series of sinusoids
  • Sine wave decomposition
  • Fourier transform
  • Euler's formula
  • Frequency spectrum
• Connections to real-world applications:
  • Speech processing and recognition
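For reference, the discrete Fourier transform that the FFT computes can be written out with Euler's formula, which ties this lesson to complex numbers in rectangular and polar form. Here N is the number of samples, f_s the sampling rate, and j the imaginary unit (the engineering convention for i); these symbols are introduced for illustration.

```latex
e^{j\theta} = \cos\theta + j\sin\theta

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}
     = \sum_{n=0}^{N-1} x[n] \left[ \cos\!\left(\frac{2\pi k n}{N}\right) - j \sin\!\left(\frac{2\pi k n}{N}\right) \right],
\quad k = 0, 1, \dots, N-1

f_k = \frac{k f_s}{N}, \qquad
|X[k]| = \sqrt{\operatorname{Re}(X[k])^2 + \operatorname{Im}(X[k])^2}
```

For example, with f_s = 8000 Hz and N = 256 the bins are 8000 / 256 = 31.25 Hz apart, so a 1000 Hz tone shows up in bin k = 32.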

Lesson 2: Frequency Analysis
• Student MATLAB project
  • Create a composite signal as the sum of harmonic sine waves
  • Plot graphs and play the sounds of the sine waves
  • Compute the FFT of the composite signal
  • Plot and analyze the frequency spectrum
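A Python sketch of the Lesson 2 project (the lesson itself uses MATLAB): it builds a harmonic signal, computes its DFT straight from the definition (MATLAB's `fft` or NumPy's `numpy.fft.fft` return the same values much faster), and converts bins to one-sided amplitudes. The 250 Hz fundamental, the amplitudes, the 8 kHz rate, and the 256-sample length are example values chosen so that every harmonic lands exactly on a DFT bin.

```python
import cmath
import math

FS = 8000  # sampling rate in Hz (example value)
N = 256    # number of samples analyzed (example value)

def harmonic_signal(f0, amplitudes, fs=FS, n=N):
    """x[i] = sum over h of a_h * sin(2*pi*h*f0*i/fs), where h = 1, 2, ... is the harmonic number."""
    return [sum(a * math.sin(2 * math.pi * h * f0 * i / fs)
                for h, a in enumerate(amplitudes, start=1))
            for i in range(n)]

def dft(x):
    """X[k] = sum over i of x[i] * exp(-j*2*pi*k*i/N), computed directly from the definition."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def one_sided_amplitudes(X):
    """Sinusoid amplitude for bins k = 0..N/2; bins other than 0 and N/2 are doubled
    because a real sine splits its energy between bins k and N - k."""
    n = len(X)
    return [abs(X[k]) / n * (1 if k in (0, n // 2) else 2) for k in range(n // 2 + 1)]

x = harmonic_signal(250, [1.0, 0.5, 0.25])   # harmonics at 250, 500 and 750 Hz
amps = one_sided_amplitudes(dft(x))
for k, a in enumerate(amps):
    if a > 0.01:
        print(f"{k * FS / N:7.1f} Hz  amplitude {a:.3f}")
```

The printed spectrum should show peaks only at 250, 500, and 750 Hz, with amplitudes 1.0, 0.5, and 0.25, recovering the sinusoids the signal was built from.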

Lesson 3: Sound Effects
• Concepts covered:
• Connections to real-world applications:
  • Digital music effects and speech sound effects

Lesson 3: Sound Effects
• Student MATLAB project

Unit Conclusion
• Student presentation and report or poster
• Summarize and reflect on the lessons
• Ask research questions
• Develop new ideas for applications of speech processing

References
• Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.
• Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010.
• Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007.
• Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

AEGIS Project
• AEGIS website: http://research2.fit.edu/aegis-ret/
• Lesson plans available for download
• Contacts:
  • Becky Dowell, dowell.jeanie@brevardschools.org
  • Dr. Veton Këpuska, vkepuska@fit.edu
  • Jacob Zurasky, jzuraksy@my.fit.edu

Thank you! Questions?
