Chapter 6 Speech

By Frankie K. F. Yip

Lecture 6 - Sound

Sound Waves

• Sound waves are longitudinal waves.
• Period is the time required for one complete cycle.
• Wavelength is the distance travelled in one cycle.
[Figure: pressure vs. time, marking the period and amplitude; pressure vs. distance, marking the wavelength and amplitude]


Basic Sound Concepts
• Frequency is the number of pressure waves that pass a reference point per unit time. It is measured in hertz (Hz), or cycles per second (Frequency = 1/Period).
• Relationship between frequency and pitch:
  • An increase in frequency is perceived as a higher-pitched sound.
  • A decrease in frequency is perceived as a lower-pitched sound.
• Humans generally hear sound waves whose frequencies are between 20 Hz and 20,000 Hz.
  • Sounds below 20 Hz are referred to as infrasonic.
  • Sounds above 20,000 Hz are referred to as ultrasonic.
• The frequency of middle “C” on a piano is about 261.6 Hz.
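The relationship Frequency = 1/Period and the hearing-range labels above can be sketched in a few lines of Python. This is a minimal illustration: the function names `period_seconds` and `classify_frequency` are hypothetical, and the 20 Hz and 20,000 Hz boundaries are taken from the slide (real hearing limits vary from person to person).

```python
def period_seconds(frequency_hz: float) -> float:
    """Period is the reciprocal of frequency: Period = 1 / Frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


def classify_frequency(frequency_hz: float) -> str:
    """Label a frequency relative to the 20 Hz - 20,000 Hz range of human hearing."""
    if frequency_hz < 20:
        return "infrasonic"
    if frequency_hz > 20_000:
        return "ultrasonic"
    return "audible"


if __name__ == "__main__":
    for f in (10, 400, 25_000):
        print(f"{f} Hz: {classify_frequency(f)}, period = {period_seconds(f):.6f} s")
```

For example, a 400 Hz tone has a period of 1/400 s = 2.5 ms.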

Basic Sound Concepts
• Humans can hear: 20 Hz to 20 kHz. Human speech: 0 Hz to 3.3 kHz.
• Amplitude of a sound is the measure of the displacement of the air pressure wave from its normal state (i.e. the volume of the sound).
• Sound intensity is defined as the sound power per unit area (W/m²).
• Sound intensity is measured in decibels (dB), a logarithmic scale:
  • 0 dB: essentially no sound heard
  • 35 dB: quiet home
  • 70 dB: noisy street
  • 120 dB: discomfort
• Sound level measurements in decibels are generally referenced to a standard threshold of hearing at 1000 Hz for the human ear.
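Because the decibel scale is logarithmic, the levels listed above are far apart in actual intensity. The sketch below uses the standard definition L = 10 × log10(I / I0), with I0 = 10⁻¹² W/m² as the threshold of hearing at 1000 Hz; the helper names are hypothetical.

```python
import math

# Standard reference intensity: the threshold of hearing at 1000 Hz, 1e-12 W/m^2.
REFERENCE_INTENSITY = 1e-12


def intensity_level_db(intensity_w_per_m2: float) -> float:
    """Sound intensity level in dB: L = 10 * log10(I / I0)."""
    return 10.0 * math.log10(intensity_w_per_m2 / REFERENCE_INTENSITY)


def intensity_from_db(level_db: float) -> float:
    """Inverse conversion: I = I0 * 10 ** (L / 10)."""
    return REFERENCE_INTENSITY * 10 ** (level_db / 10.0)


if __name__ == "__main__":
    for level in (0, 35, 70, 120):
        print(f"{level:>3} dB -> {intensity_from_db(level):.3e} W/m^2")
```

Each 10 dB step is a tenfold change in intensity, so the 120 dB discomfort level (1 W/m²) is 10¹² times the threshold of hearing.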

Sound Nature
• Sound in the natural world is analog.
• Sound stored on a computer is digital.
• Natural sound is the result of a continuous stream of vibrations (pressure changes) in the atmosphere.

How to Convert Analog into Digital?
• The process of converting natural analog sound into discrete digital sound is digitization (e.g. PCM, Pulse Code Modulation).
• Digitization of analog sound is composed of two phases:
  1. sampling;
  2. quantization.

What is PCM?
• The two-step process just described, sampling and quantizing sound digitally, is termed Pulse Code Modulation (PCM).
• PCM is a technique for digitizing analog signals, especially audio signals.
• PCM is an uncompressed audio encoding scheme used by audio CDs and DVD-Audio. Because the samples are stored without compression, it maintains very high-fidelity sound.

Why Is PCM Needed?
• PCM is essentially "raw" audio.
• It is generally the format that audio hardware interacts with directly.
• Although some hardware can play other formats directly, software generally must convert any audio stream to PCM before playing it.

Sampling
• Sampling determines the rate at which the converted sound is captured.
• The higher the sampling rate:
  • the higher the sound quality (fidelity);
  • the higher the storage requirements.
[Figure: a waveform sampled at regular intervals along the time axis]

Relationship between Sampling and Frequency of the Sound Wave
• A 400 Hz wave sampled at 400 Hz (sampling rate = frequency of the sound wave) becomes a flat wave.
• A 400 Hz wave sampled at 600 Hz (sampling rate = 1.5 × frequency): the digitized wave will not sound like the original.
• A 400 Hz wave sampled at 800 Hz (sampling rate = 2 × frequency): still not enough, but acceptable.
• A 400 Hz wave sampled at 1200 Hz (sampling rate = 3 × frequency): this sampling rate is sufficient to recreate the sound wave.
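The cases above can be reproduced numerically, assuming ideal sampling of a pure sine wave with no anti-aliasing filter. `apparent_frequency` (a hypothetical helper) gives the frequency the samples actually represent: 0 Hz (flat) at a 400 Hz sampling rate, 200 Hz at 600 Hz, and the correct 400 Hz at 1200 Hz.

```python
import math


def sample_sine(freq_hz: float, rate_hz: float, n: int, phase: float = 0.0) -> list[float]:
    """Take n ideal samples of sin(2*pi*f*t + phase) at t = k / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz + phase) for k in range(n)]


def apparent_frequency(freq_hz: float, rate_hz: float) -> float:
    """Frequency a pure tone appears to have once sampled: aliasing folds it
    relative to the nearest multiple of the sampling rate."""
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))


if __name__ == "__main__":
    for rate in (400, 600, 800, 1200):
        print(f"400 Hz sampled at {rate} Hz appears as {apparent_frequency(400, rate)} Hz")
```

At exactly 800 Hz the outcome depends on where the samples land: with zero phase every sample falls on a zero crossing and the wave comes out flat, while a quarter-cycle shift (`phase=math.pi / 2`) captures the peaks. This is one way to see why 2× is only borderline.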

More Details of PCM
• The amplitude of the sound is sampled at regular intervals, very frequently: thousands of times per second.
• Each sample is converted into a binary number, which is stored in the file or transmitted.
• To play back the data, each binary number is converted to an analog voltage at the same rate that was used during recording.
• PCM is generally considered the simplest, uncompressed method of holding sound information, as used in conventional WAV files.
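The record and playback cycle described above can be sketched as follows. The sketch assumes the analog input is modeled as a Python function of time returning amplitudes in [-1, 1], and stores 16-bit signed code words (the resolution used on audio CDs). `pcm_encode` and `pcm_decode` are hypothetical names, not a library API.

```python
import math
from typing import Callable


def pcm_encode(signal: Callable[[float], float], rate_hz: int,
               duration_s: float, bits: int = 16) -> list[int]:
    """Sample signal(t) at regular intervals and convert each sample to a
    signed binary integer (a PCM code word)."""
    max_code = 2 ** (bits - 1) - 1              # 32767 for 16-bit samples
    n_samples = round(rate_hz * duration_s)
    codes = []
    for k in range(n_samples):
        x = signal(k / rate_hz)                 # sample at t = k / rate
        x = max(-1.0, min(1.0, x))              # clip to the assumed [-1, 1] range
        codes.append(round(x * max_code))       # nearest integer code word
    return codes


def pcm_decode(codes: list[int], bits: int = 16) -> list[float]:
    """Convert code words back to amplitudes; a DAC would output these as
    voltages at the same rate used during recording."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]


if __name__ == "__main__":
    tone = lambda t: 0.5 * math.sin(2 * math.pi * 440 * t)   # hypothetical input
    codes = pcm_encode(tone, rate_hz=8000, duration_s=0.01)
    print(len(codes), codes[:5])
```

Decoding recovers each sample to within half a code step (0.5/32767 here); that residual is quantization error.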

Nyquist Effect
• Knowing the type of audio to be sampled and its intended purpose allows a reasonable choice of sampling rate.
• In deciding on a sampling rate, one must be aware of the difference between the playback rate (here, the highest frequency that must be reproduced) and the capturing (sampling) rate. These two rates are not the same.
• In fact, the sampling rate must be more than twice the playback rate. The reason is the Nyquist Effect (also called the Nyquist Theorem or Nyquist–Shannon sampling theorem).
  Sampling Rate > 2 × Playback Rate

What Is the Minimum Sampling Rate to Digitize a Voice Signal?
• Sampling rate = at least 8000 samples per second.
• Since human speech lies below 3400 Hz, it is convenient to round the highest frequency up to 4000 Hz.
• By the Nyquist Theorem: Sampling Rate > 2 × Playback Rate = 2 × 4000 Hz = 8000 Hz (which comfortably exceeds 2 × 3400 Hz = 6800 Hz).
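The same arithmetic as a small check, treating the lecture's "playback rate" as the highest frequency to be reproduced (an interpretation, not a definition from the slides). The function names are hypothetical.

```python
def nyquist_rate(highest_freq_hz: float) -> float:
    """Sampling rates must exceed this value: twice the highest frequency."""
    return 2 * highest_freq_hz


def is_adequate(rate_hz: float, highest_freq_hz: float) -> bool:
    """True if rate_hz satisfies Sampling Rate > 2 x highest frequency."""
    return rate_hz > nyquist_rate(highest_freq_hz)


if __name__ == "__main__":
    print(nyquist_rate(3400))          # 6800: strict requirement for speech
    print(nyquist_rate(4000))          # 8000: after rounding 3400 Hz up
    print(is_adequate(8000, 3400))     # True
    print(1_000_000 / 8000, "microseconds between samples")
```

Note that 8000 Hz is exactly 2 × 4000 Hz, so it would not satisfy the strict inequality for a 4000 Hz tone; it works for voice because speech is treated as ending near 3400 Hz, and the rounding leaves a margin. At 8000 samples per second, one sample is taken every 125 µs.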

Quantization
• The process of converting a sampled sound into a digital value is termed “quantization”.
• The number of distinct sound levels that can be represented is determined by the number of bits used to store the quantization value.
• With 3 bits per sample, the amplitude can be divided into 2³ = 8 quantization levels.
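A minimal uniform quantizer illustrates how the bit count fixes the number of levels. It assumes amplitudes normalized to [-1, 1] and equal-width steps; the function names are hypothetical. Telephone systems commonly use non-uniform μ-law or A-law quantization instead, which this sketch does not model.

```python
def quantization_levels(bits: int) -> int:
    """Number of distinct levels representable with `bits` bits: 2 ** bits."""
    return 2 ** bits


def quantize(x: float, bits: int) -> int:
    """Map an amplitude in [-1, 1] to a level index 0 .. 2**bits - 1
    using equal-width steps (a uniform quantizer)."""
    levels = quantization_levels(bits)
    step = 2.0 / levels                              # width of each level
    x = max(-1.0, min(1.0, x))                       # clip out-of-range input
    return min(int((x + 1.0) / step), levels - 1)    # x == 1.0 joins the top level


def reconstruct(index: int, bits: int) -> float:
    """Amplitude represented by a level: the midpoint of its step."""
    step = 2.0 / quantization_levels(bits)
    return -1.0 + (index + 0.5) * step


if __name__ == "__main__":
    for x in (-0.9, -0.3, 0.1, 0.8):
        q = quantize(x, bits=3)
        print(f"{x:+.2f} -> level {q} -> {reconstruct(q, 3):+.3f}")
```

With 3 bits the step is 0.25, so any input is reproduced to within ±0.125; each additional bit doubles the number of levels and halves that error.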

What Is the Minimum Number of Bits per Sample in Voice Digitization?
• Minimum = at least 8 bits per sample (2⁸ = 256 quantization levels).
• With too few quantization levels, the quantization noise becomes large.
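To see why too few levels produce large quantization noise, the sketch below measures the signal-to-quantization-noise ratio (SQNR) of a full-scale sine wave and compares it with the common rule of thumb SQNR ≈ 6.02 × N + 1.76 dB for N bits. The test tone, the sample count and the function names are assumptions for illustration.

```python
import math


def quantize_midrise(x: float, bits: int) -> float:
    """Uniform quantizer over [-1, 1]; returns the reconstructed amplitude."""
    levels = 2 ** bits
    step = 2.0 / levels
    x = max(-1.0, min(1.0, x))
    index = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


def measured_sqnr_db(bits: int, n: int = 100_000,
                     cycles_per_sample: float = 0.0123457) -> float:
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n):
        x = math.sin(2 * math.pi * cycles_per_sample * k)   # full-scale test tone
        error = quantize_midrise(x, bits) - x               # quantization noise
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (4, 8, 12):
        theory = 6.02 * bits + 1.76
        print(f"{bits:>2} bits: measured {measured_sqnr_db(bits):5.1f} dB, "
              f"rule of thumb {theory:5.1f} dB")
```

By that rule, 8 bits gives roughly 50 dB, and every extra bit adds about 6 dB.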

Practical Value of PCM
• Higher sampling rate → higher quality of digital voice.
• More bits per sample → higher quality of digital voice.

Bit Rate Calculation
• Bit resolution = number of bits per sample.
• Byte rate of a monophonic digital recording (bytes/s) = sampling rate × (bit resolution / 8) × 1.
• Byte rate of a stereo digital recording (bytes/s) = sampling rate × (bit resolution / 8) × 2.
• File size of a digital recording (bytes) = sampling rate × duration of recording (s) × (bit resolution / 8) × number of tracks.
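The formulas translate directly into code. The helper names are hypothetical; the example parameters are CD audio (44,100 Hz, 16-bit, stereo) and a telephone-style voice channel (8000 Hz, 8-bit, mono, matching the minimums given earlier in this lecture).

```python
def byte_rate(sampling_rate_hz: int, bit_resolution: int, tracks: int) -> float:
    """Bytes per second = sampling rate x (bit resolution / 8) x tracks."""
    return sampling_rate_hz * (bit_resolution / 8) * tracks


def file_size_bytes(sampling_rate_hz: int, duration_s: float,
                    bit_resolution: int, tracks: int) -> float:
    """File size = sampling rate x duration x (bit resolution / 8) x tracks."""
    return sampling_rate_hz * duration_s * (bit_resolution / 8) * tracks


if __name__ == "__main__":
    # CD audio: 44,100 Hz, 16-bit, stereo (2 tracks).
    print(byte_rate(44_100, 16, 2))              # 176400.0 bytes per second
    print(file_size_bytes(44_100, 60, 16, 2))    # 10584000.0 bytes for one minute
    # Telephone-style voice: 8000 Hz, 8-bit, mono.
    print(byte_rate(8000, 8, 1))                 # 8000.0 bytes per second
```

One minute of CD audio is therefore about 10.6 MB, while the voice channel needs 8000 bytes/s, i.e. 64 kbit/s.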

CD, Radio, Phone Quality

Downsampling
• Downsampling helps to decrease the size of audio files.
• Downsampling refers to reducing the sampling rate and/or the quantization level (bit resolution) of the file.
• For example, suppose you have a CD-quality music clip that you want to post on your web page. You can downsample it to radio quality and reduce the file size by 87.5% while still maintaining a respectable quality level, equivalent to a good non-stereo FM radio station.
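The 87.5% figure follows if radio quality is taken to be 22,050 Hz, 8-bit, mono. Those parameters are an assumption here (the slide does not list them), chosen because halving each of sampling rate, bit resolution and channel count leaves 1/8 of the original size. The `naive_downsample` helper is hypothetical and deliberately crude.

```python
CD_QUALITY = (44_100, 16, 2)      # sampling rate (Hz), bits per sample, channels
RADIO_QUALITY = (22_050, 8, 1)    # assumption: half the rate, 8-bit, mono


def size_reduction(source: tuple[int, int, int], target: tuple[int, int, int]) -> float:
    """Fraction of file size saved when converting between formats."""
    src_rate, src_bits, src_channels = source
    dst_rate, dst_bits, dst_channels = target
    return 1 - (dst_rate * dst_bits * dst_channels) / (src_rate * src_bits * src_channels)


def naive_downsample(left: list[int], right: list[int]) -> list[int]:
    """Very rough CD -> radio conversion of 16-bit integer samples:
    average stereo to mono, keep every second sample, drop the low 8 bits."""
    mono = [(a + b) // 2 for a, b in zip(left, right)]   # stereo -> mono
    halved = mono[::2]                                    # 44.1 kHz -> 22.05 kHz
    return [s >> 8 for s in halved]                       # 16-bit -> 8-bit range


if __name__ == "__main__":
    print(f"{size_reduction(CD_QUALITY, RADIO_QUALITY):.1%}")   # 87.5%
```

Dropping every second sample without first low-pass filtering can alias content above 11,025 Hz back into the audible band, so real converters filter before decimating.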

Pure Music Audio File (AU Files, .au)
• Developed by Sun Microsystems.
• A standard for Unix computers.
• Most common sound format on the web.

Pure Music Audio File (MIDI File, .mid)
• MIDI (Musical Instrument Digital Interface) music.
• Sequencer software and a sound synthesizer are required to create MIDI scores.
• Since MIDI files are small, those embedded in web pages load and play promptly.
• The playback length (tempo) of a MIDI file can be changed without affecting the pitch of the music or degrading audio quality.

Audio File Formats
• A sound file’s format is a recognized methodology for organizing the data bits of digitized sound into a data file.
• On the Macintosh, digitized sounds may be stored as data files (such as AIFF or AIFC), resources, or applications.
• In Windows, digitized sounds are usually stored as WAV files.

Uncompressed Audio Files (Wave File, .wav)
• A Wave file (.wav) is a high-quality, uncompressed audio file that can be embedded in a web page.
• Native sound format for Windows.
• Short for WAVEform audio format.
• A Microsoft and IBM audio file format standard for storing audio on PCs.
• Great as an archive format and for editing on a DAW (digital audio workstation).
• Takes longer for the end user to download.
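Since a WAV file is essentially PCM samples plus a RIFF header, Python's standard-library `wave` module can write one directly. This sketch assumes 16-bit mono samples; `write_wav` is a hypothetical helper, and the test tone is illustrative.

```python
import io
import math
import struct
import wave


def write_wav(fileobj, samples: list[int], rate_hz: int = 44_100) -> None:
    """Store 16-bit mono PCM samples (ints in -32768..32767) as a WAV file.
    `fileobj` may be a filename or a binary file-like object."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(rate_hz)
        # WAV stores 16-bit PCM samples as little-endian signed integers.
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # One second of a 440 Hz tone at 8000 samples per second.
    tone = [round(10_000 * math.sin(2 * math.pi * 440 * k / 8000)) for k in range(8000)]
    buffer = io.BytesIO()
    write_wav(buffer, tone, rate_hz=8000)
    data = buffer.getvalue()
    print(data[:4], data[8:12], len(data), "bytes")
```

Writing to an in-memory buffer keeps the example self-contained; passing a filename such as "tone.wav" instead writes the same bytes to disk.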

What is Streaming?
• Streaming is a method of delivering an audio signal to your computer over the Internet.
• It differs from the "normal" method of receiving Internet audio in one important way: instead of having to download a ".wav", ".au" or other type of file completely before being able to listen to it, you hear the sound as it arrives at your computer. You therefore do not have to wait for a complete download (which would be difficult with a live broadcast anyway!).

What is Streaming?
• As the data arrives, it is buffered for a few seconds and then playback begins.
• While the audio is playing, more data is constantly arriving (or streaming). As long as you receive a constant stream of data, you should hear constant audio.
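The buffering behaviour can be illustrated with a toy model: audio arrives in fixed-length chunks at irregular (hypothetical) times, and the player waits for a few chunks before starting. This is a simplification for illustration, not how any particular player is implemented, and it assumes chunks arrive in order.

```python
def simulate_playback(arrival_times: list[float], prebuffer_chunks: int,
                      chunk_seconds: float = 1.0) -> tuple[float, int]:
    """Toy model of a streaming player.

    arrival_times[i] is when chunk i (chunk_seconds of audio) has fully arrived.
    The player waits until `prebuffer_chunks` chunks are buffered, then plays
    the chunks in order. If the next chunk has not arrived when it is needed,
    playback stalls until it does. Returns (start_time, number_of_stalls).
    """
    start = arrival_times[prebuffer_chunks - 1]
    clock = start
    stalls = 0
    for arrived in arrival_times:
        if arrived > clock:          # buffer ran dry: wait for this chunk
            stalls += 1
            clock = arrived
        clock += chunk_seconds       # play the chunk
    return start, stalls


if __name__ == "__main__":
    # Hypothetical, jittery arrival times for five one-second chunks.
    arrivals = [0.5, 1.4, 2.9, 3.1, 4.8]
    for prebuffer in (1, 3):
        print(prebuffer, simulate_playback(arrivals, prebuffer))
```

With these arrivals, starting after one chunk causes a stall when the third chunk is late, while buffering three chunks first delays the start but plays without interruption.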

Compressed Audio Files
Different compressed audio formats on the Internet.
Streaming-enabled audio formats:
• WMA (Windows Media Audio)
  • A proprietary compressed audio file format developed by Microsoft.
  • Played using Windows Media Player, Winamp and other media players.

Compressed Audio Files
Streaming-enabled audio formats:
• RA (RealAudio)
  • Developed by RealNetworks.
  • A proprietary audio codec.
  • Designed for low-bandwidth connections.
  • Used as a streaming audio format.
  • Played with RealOne Player.
  • The current version of the codec is RealAudio 10.

Compressed Audio Files
Non-streaming audio formats:
• Ogg
  • Developed by the Xiph.Org Foundation.
  • Open-source software for digital multimedia (free for commercial use).
• MP3 (MPEG-1 Audio Layer 3; MPEG stands for Moving Picture Experts Group)
  • Originated in the Digital Audio Broadcast (DAB) project initiated by Fraunhofer IIS-A.
  • An audio compression algorithm that reduces the amount of data required to reproduce audio.

Compressed Audio Files
Non-streaming audio formats:
• MP3 (MPEG: Moving Picture Experts Group)
  • Data rates and compression ratios for MPEG-1 Layers 1, 2 and 3:
    • Layer 1: 384 kbit/s, 4:1
    • Layer 2: 192 to 256 kbit/s, 6:1 to 8:1
    • Layer 3: 112 to 128 kbit/s, 10:1 to 12:1
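The compression ratios can be checked against the data rates, assuming they are measured relative to uncompressed CD audio (44,100 Hz × 16 bits × 2 channels = 1411.2 kbit/s). The slide does not state its reference, so this is an assumption, and `compression_ratio` is a hypothetical helper.

```python
# Assumption: ratios are relative to uncompressed CD audio
# (44,100 Hz x 16 bits x 2 channels).
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000      # 1411.2 kbit/s


def compression_ratio(encoded_kbps: float) -> float:
    """How many times smaller the encoded stream is than CD audio."""
    return CD_BITRATE_KBPS / encoded_kbps


if __name__ == "__main__":
    for layer, kbps in [("Layer 1", 384), ("Layer 2", 192), ("Layer 2", 256),
                        ("Layer 3", 112), ("Layer 3", 128)]:
        print(f"{layer} at {kbps} kbit/s: about {compression_ratio(kbps):.1f}:1")
```

The results come out close to, but not exactly, the slide's figures (for example, 128 kbit/s gives about 11:1), consistent with those ratios being approximate.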

Comparison: Audio File Sizes
*** These numbers are not meant to be exact and could change at any time ***

Normalization
• Normalization is the same as volume balancing.
• For example, if your MP3s are recorded at different volume levels, it can be quite frustrating to have to change the volume on the player after each song. Instead, you can use normalization to give all MP3s approximately the same sound level.
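A minimal sketch of normalizing several tracks to the same level: one constant gain per track, chosen so each track's RMS level matches a target. The target value and function names are hypothetical, and RMS is only a rough stand-in for perceived loudness; tools such as ReplayGain or EBU R 128 loudness normalization use perceptually weighted measures.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level, a simple measure of average loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_to_rms(samples: list[float], target_rms: float = 0.1) -> list[float]:
    """Apply one constant gain to the whole track so its RMS level equals
    target_rms. Samples are floats in [-1, 1]; anything pushed past that
    range is clipped, which would leave the final RMS slightly lower."""
    level = rms(samples)
    if level == 0:
        return list(samples)            # silence: nothing to scale
    gain = target_rms / level
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


if __name__ == "__main__":
    # Two hypothetical "songs" recorded at very different volumes.
    quiet = [0.02 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
    loud = [0.6 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
    for name, track in (("quiet", quiet), ("loud", loud)):
        print(name, round(rms(track), 4), "->", round(rms(normalize_to_rms(track)), 4))
```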

Normalization
• A closely related process is compressing/limiting (dynamic range compression): it softens the loud parts of a song and turns up the quiet parts.
• The result is a consistent volume level without any shocking surprises for the listener.
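Compression/limiting, unlike a single normalization gain, changes the gain depending on the level. The sketch below applies a static compression curve sample by sample, with hypothetical default settings (threshold -20 dB, ratio 4:1, 10 dB makeup gain). Real compressors smooth the gain with attack and release times; applying it per sample, as here, distorts the waveform and is for illustration only.

```python
import math


def compress(samples: list[float], threshold_db: float = -20.0,
             ratio: float = 4.0, makeup_db: float = 10.0) -> list[float]:
    """Static dynamic-range compressor (no attack/release smoothing).

    Levels above threshold_db are reduced: every `ratio` dB of input above the
    threshold becomes 1 dB of output. A fixed makeup gain then raises
    everything, so quiet passages end up louder. A very large ratio
    approximates a limiter.
    """
    makeup = 10 ** (makeup_db / 20)
    out = []
    for s in samples:
        magnitude = abs(s)
        if magnitude > 0:
            level_db = 20 * math.log10(magnitude)
            if level_db > threshold_db:
                level_db = threshold_db + (level_db - threshold_db) / ratio
            magnitude = 10 ** (level_db / 20)
        out.append(math.copysign(magnitude * makeup, s))
    return out


if __name__ == "__main__":
    # 1.0 is 0 dB (loud), 0.01 is -40 dB (quiet): a 40 dB spread before compression.
    loud, quiet = compress([1.0, 0.01])
    print(round(20 * math.log10(loud), 1), round(20 * math.log10(quiet), 1))
```

With these settings, a 0 dB peak comes out at -5 dB and a -40 dB passage at -30 dB, so a 40 dB spread shrinks to 25 dB.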

Audio File Conversion Software: .WAV / .MP3
• Acoustica MP3 Audio Mixer
• Acoustica MP3 To WAVE Converter PLUS
• Ashampoo MP3 Studio Deluxe
• Easy CD Ripper
• P2P Music Jukebox

Audio File Conversion Software: .WAV / .WMA
• Acoustica MP3 Audio Mixer
• Advanced WMA Workshop
• All To WMA Converter
• Easy CD Ripper
• P2P Music Jukebox

Audio File Conversion Software: .WAV / .RA
• FairStars Audio Converter
• MP3 Audio Mixer
