Speech & Hearing Perception

Speech & Hearing Perception Perry C. Hanavan

Recommendation • My Fair Lady (musical adaptation) • Pygmalion

Review • Peripheral Auditory Mechanism • Outer ear (pinna & external auditory canal) • Acoustic transmission • Quarter wave resonator • Middle ear (TM, ossicles, Eustachian tube, tympanum) • Mechanical transmission/transduction • Inner ear (cochlea, semicircular canals, saccule, utricle) • Hydraulic transmission/transduction • Mechanical transduction • Auditory Nerve (afferent, efferent) • Chemical-electrical transmission

Outer Ear • Pinna • External auditory meatus • Quarter wave resonator • The resonant frequency of the average adult ear canal is about 3000 Hz. • Smaller ear canals, like in children, have higher resonant frequencies around 4000 Hz
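The quarter wave resonator idea can be checked with a little arithmetic. The sketch below treats the ear canal as a uniform tube open at the pinna and closed at the eardrum, so its first resonance is f = c / (4L). The speed of sound (343 m/s) and the function names are assumptions made for illustration; a real ear canal is neither uniform nor rigidly closed, so the lengths it returns are only effective lengths.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def quarter_wave_resonance_hz(length_m: float) -> float:
    """First resonance of a tube open at one end and closed at the other."""
    return SPEED_OF_SOUND_M_S / (4.0 * length_m)


def canal_length_for_resonance_m(frequency_hz: float) -> float:
    """Tube length whose first quarter-wave resonance equals frequency_hz."""
    return SPEED_OF_SOUND_M_S / (4.0 * frequency_hz)


if __name__ == "__main__":
    # Resonant frequencies quoted on the slide for adults and children.
    for label, freq in (("adult", 3000.0), ("child", 4000.0)):
        length_cm = 100.0 * canal_length_for_resonance_m(freq)
        print(f"{label}: {freq:.0f} Hz implies an effective length near {length_cm:.1f} cm")
```

Under these assumptions a shorter tube gives a higher resonance, which is consistent with the higher resonant frequency reported for children's ear canals.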

Localization • 2 ears • The two most important localization cues are the interaural time difference, or ITD, and the interaural intensity difference, or IID. • Head shadow effect of the sound wave: a sound coming from a source located to one side of the head will have a higher intensity, or be louder, at the ear nearest the sound source. • Phase differences also play a role in localization
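To get a feel for the size of the ITD cue, the sketch below uses Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ). This formula is not part of the slides; it is a common textbook model brought in here as an assumption, as are the head radius (8.75 cm), the speed of sound (343 m/s), and the function name.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average adult head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def woodworth_itd_s(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is 0 straight ahead and 90 directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = woodworth_itd_s(azimuth) * 1e6
        print(f"{azimuth:3d} degrees: ITD is roughly {itd_us:.0f} microseconds")
```

With these assumed values the ITD is zero straight ahead and grows to well under a millisecond for a source directly to one side.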

Middle Ear • Impedance mismatch • Air vs. fluid • Area ratio hypothesis • Lever hypothesis (1.3:1) • Stiffness and mass have inverse effects on frequency in a resonant system: f = (1/2π)√(k/m), where k is stiffness and m is mass • Mass dominated systems have a lower resonant frequency than stiffness dominated systems. • Increasing stiffness in any ear component (membranes, ossicles, cavity) improves the efficiency of transmission of high frequencies. • Adding mass to the system, e.g., by increasing cavity volume or increasing ossicular chain mass, favors low frequencies.
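The inverse effects of stiffness and mass follow from the resonance relation for a simple mass-spring system, f = (1/2π)√(k/m), with k the stiffness and m the mass. The sketch below shows the trend numerically. The stiffness and mass values are hypothetical numbers chosen only to illustrate it; they are not measurements of any ear structure.

```python
import math


def resonant_frequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Resonant frequency of an ideal mass-spring system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


if __name__ == "__main__":
    base_k = 1000.0  # hypothetical stiffness in N/m
    base_m = 0.001   # hypothetical mass in kg

    base = resonant_frequency_hz(base_k, base_m)
    stiffer = resonant_frequency_hz(4.0 * base_k, base_m)
    heavier = resonant_frequency_hz(base_k, 4.0 * base_m)

    print(f"baseline:        {base:.1f} Hz")
    print(f"4x stiffness:    {stiffer:.1f} Hz (resonance moves up)")
    print(f"4x mass:         {heavier:.1f} Hz (resonance moves down)")
```

Because frequency depends on the square root of k/m, quadrupling stiffness doubles the resonant frequency and quadrupling mass halves it.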

Middle Ear

Inner Ear • The cochlea is a fluid-filled spiral with a resonator, the basilar membrane, and neuroreceptor, the Organ of Corti • Inner ears are tuned in that inner ear stiffness and mass characteristics are major determinants of hearing ranges • Differences in hearing ranges are dictated largely by differences in stiffness and mass of the basilar membrane that are the result of basilar membrane thickness and width variations along the cochlear spiral. • Basilar membranes are essentially tonotopically arranged resonator arrays, ranging high to low from base to apex.

Basilar Membrane

Traveling Wave • http://www.lloydwatts.com/collaborators.shtml

Inner vs. Outer Hair Cells

Inner Ear Mechanics • Basilar membrane • Basilar membrane animations • Hair Cells • Outer Hair Cell Motility

Central Auditory Path

CNS • Cochlear nuclei (modulate motility of OHC, acoustic reflex) • Trapezoid Body • Superior Olivary Complex (reflex centers for Moro, startle, auropalpebral, acoustic reflexes) • Lateral Lemniscus • Inferior Colliculus • Medial Geniculate Body • Primary Auditory Cortex • Wernicke's Area • Corpus Callosum

Auditory CNS Path Central Auditory Pathway

Excellent Brief Review • Review of Function

Hearing Threshold

Auditory Masking • Blocking or obscuring a sound • Simultaneous masking • Presentation of target sound and masking sound • Broadband Noise (BBN) vs Narrowband Noise (NBN) • Critical bandwidth (when using NBN) • Upward spread of masking • Central masking

Precedence Effect • Fusion of sounds and initial echoes into one auditory event and the localization of that fused sound at the source of the earliest arriving sound • Audiologists use the Stenger test, which relies on this effect, when an individual is suspected of malingering

Equal Level Contours

Music Analysis • Pure Tone (Periodic) • Periodic Complex Tone • Aperiodic Complex (Noise) • Fundamental – lowest tone in complex periodic sound • Harmonics – whole number multiple of fundamental • Missing fundamental – auditory illusion

Fundamentals • 100, 200, 300 Hz (100) • 800, 900, 1000 Hz (100)
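Both example sets imply a 100 Hz fundamental, even though the second set contains no 100 Hz component. For exact whole-number harmonics, the implied fundamental is the greatest common divisor of the component frequencies. The sketch below shows that arithmetic only; the function name is made up for illustration, and actual pitch perception is not a literal GCD computation.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(harmonics_hz):
    """Greatest common divisor of a list of integer harmonic frequencies."""
    return reduce(gcd, harmonics_hz)


if __name__ == "__main__":
    for tones in ([100, 200, 300], [800, 900, 1000]):
        print(f"{tones} Hz: implied fundamental {implied_fundamental_hz(tones)} Hz")
```

The second set is the missing fundamental case: the listener hears a pitch corresponding to 100 Hz although the lowest tone present is 800 Hz.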

Frequency • Place principle: Helmholtz suggested that the basilar membrane resonates at specific places in response to a tone, which Békésy later confirmed • Frequency principle: proposed by Seebeck and revived by Wever, suggests that the firing rate of spike potentials in the auditory nerve determines pitch • Volley principle: neurons fire in groups; while one neuron is recovering, another is firing

Auditory Scene Analysis • ASA, a concept created by Albert Bregman, is a process in which the auditory system takes the mixture of sound that it derives from a complex natural environment and sorts it into packages of acoustic evidence in which each package probably has arisen from a single source of sound. • This grouping helps pattern recognition not to mix information from different sources. • Online Examples • Compact disc of ASA Link • Segregating and Grouping

Speech Production • Formants

Speech Production • Phonemes (sound units of language) • Consonants (s, z) • Voiced (b, d, g) • Unvoiced (p, t, k) • Vowels (a, e, o, i, u) • Diphthongs (oy, ei)

Formants • Vowels • Greater intensity, formant structure, all voiced, constriction of air flow less than consonant • Diphthongs • Vowel characteristics, but transition (glide) • Consonants • Less intensity, greater constriction of air flow

Pattern Playback • Haskins Laboratories

Vocal Tract • Approximately 17 cm for males • 5/6 the length for females • Children roughly half the length of adult male
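These lengths can be turned into rough formant estimates by treating the vocal tract as a uniform tube closed at the glottis and open at the lips, which resonates at odd quarter-wavelength multiples: F_n = (2n − 1) c / (4L). This is a sketch for a neutral (schwa-like) vowel only. The uniform-tube model, the rounded speed of sound (34,000 cm/s), and the function name are assumptions made for illustration and are not taken from the course handouts.

```python
SPEED_OF_SOUND_CM_S = 34000.0  # assumed round value for warm, moist air


def neutral_vowel_formants_hz(tract_length_cm: float, count: int = 3):
    """Resonances of a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * tract_length_cm)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    male_cm = 17.0
    speakers = {
        "adult male": male_cm,
        "adult female (5/6 length)": male_cm * 5.0 / 6.0,
        "child (half length)": male_cm / 2.0,
    }
    for label, length_cm in speakers.items():
        formants = [round(f) for f in neutral_vowel_formants_hz(length_cm)]
        print(f"{label}: {length_cm:.1f} cm, F1-F3 near {formants} Hz")
```

Under this model a 17 cm tract gives formants near 500, 1500, and 2500 Hz, and shorter tracts scale every formant upward in proportion.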

Math Model for Vowel Formants • Formant Calculation Handout • Formant Plotting Handout • Excel Model

Source Filter • F0 (source produced at vocal folds) • Formants (F1, F2, F3, …) created by vocal tract resonance • Source which is emphasized and not modulated by vocal tract resonance (F1, F2, F3, shown at left)

Perception of Vowels • The vowel /a/ has the greatest intensity, with the unvoiced consonant /θ/ as the weakest speech sound • Front vowels are perceived on the basis of F1 frequency and the average of F2 and F3, whereas back vowels are perceived on the basis of the average of F1 and F2, as well as F3 • So is it the absolute frequency values of the formants? • Or the ratio of F2 to F1? • Perhaps it is the invariant cues (frequency changes that occur with coarticulation)? • Figure labels: F1/F2, F3; F1, F2/F3

Formant with Tongue Position More pictorials

Vowel Spectrograph

Chart Vowel Formants • Acoustics and Tongue Position • Video Clip

Lip Rounding

Vowel Formants

Online Examples of Formants • Sound to Graph • Spectral Cues Homepage

Perception of Diphthongs • Perceived on basis of their formant transitions • Salient feature: rapidity of transition

Diphthongs

Consonants • Perception different for consonants than vowels • Greater variety of consonant types than vowels • Greater complexity for consonants

International Phonetic Alphabet (consonants) • 26 letters of alphabet • abcdefghijklmnopqrstuvwxyz • Only list phonemes • bdfghjklmnprstvwz • Digraph phonemes • ch, sh, th • Other phonemes

Production of Consonants • Place of production • Where major constriction occurs in vocal tract • Manner of production • How consonant is produced • Voicing • Voiced or unvoiced

Place of Production • Examples of some consonant phonemes: • Bilabial p b m w • Labiodental f v • Dental th • Alveolar t d s z l r • Palatal ch sh • Velar k g ng • Glottal h

Manner of Production • Examples of some consonants: • Stops p t k b d g • Fricatives f v s sh z h • Affricates ch dg • Nasals m n ng • Semivowels w l r j

Voicing Examples of some consonants • Voiced b d g v z l r w • Unvoiced p t k f s h

Stops • Produced with a closure within the oral cavity, a build up of pressure behind this closure and a release of the closure allowing the air to be rapidly expelled. • Acoustically these events can be divided into five components: • Occlusion • Transient • Frication • Aspiration • Transition • More info

Fricatives • Fricative production involves two articulators being brought together and held close enough for the escaping air to become turbulent, creating an aperiodic (noise) sound. • May be voiced or unvoiced. • The closure phase of fricatives is characterized by the continuant noisy aperiodic component. • The characteristics of the noise are the result of the position of the constriction, the shape of the orifice, and the aerodynamic forces of the air stream. • Acoustic characteristics include: high-frequency hiss, long duration, weak to moderate intensity

Affricates • Stop with a fricative release – but palatal. • Combination of stop and fricative characteristics. • Closure, burst followed by short silence then frication • The affricates can be distinguished from the fricatives by the presence of closure and by the duration of noise which is longer for the fricatives. • The shorter the duration of noise, the shorter the silence necessary to elicit an affricate response. • Affricates have a shorter rise time than fricatives. Rise time is the time from onset to peak intensity of frication.

Nasals • Like the oral tract, the nasal tract has its own resonant frequencies or formants. • The most commonly reported nasal formants occur at 300 Hz, 1 kHz, 2.2 kHz, 2.9 kHz, and 4 kHz. • Antiresonances enter whenever there is a side branch in the main acoustic pathway. An antiresonance or zero serves to decrease the spectral energy at specific frequencies by absorbing the sound at or near the antiresonant frequencies. These cumulatively have the effect of reducing the total amplitude of the sound generated.
