
Hearing and Speech



Presentation Transcript


  1. Cognitive Neuroscience and Embodied Intelligence: Hearing and Speech. Based on the book Cognition, Brain and Consciousness, ed. Bernard J. Baars. Janusz A. Starzyk

  2. Sound and hearing basics. Figure: a time-domain sinewave signal and the same signal in the time-frequency domain. • Complex sound signals can be decomposed into a series of sinewave signals of various frequencies. • The human auditory system detects sounds in the range of about 20 Hz to 20 kHz. • Bats and whales can hear frequencies up to 100 kHz and beyond. • Musicians can detect the difference between 1000 Hz and 1001 Hz.
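The first bullet can be made concrete with a minimal sketch. It builds a "complex sound" from three sinewaves and recovers their frequencies and amplitudes with a naive discrete Fourier transform. The sample rate, frequencies and amplitudes are arbitrary illustrative choices, not values from the slides. Real audio analysis would use an FFT library rather than this O(N²) loop.

```python
import math

def dft_magnitudes(signal):
    """Naive O(N^2) discrete Fourier transform; returns |X[k]| for k = 0..N//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        mags.append(math.hypot(re, im))
    return mags

# Illustrative complex sound: three sinewaves, 1 s sampled at 800 Hz.
FS = 800
COMPONENTS = [(50, 1.0), (120, 0.5), (300, 0.25)]  # (frequency in Hz, amplitude)
signal = [sum(amp * math.sin(2 * math.pi * f * t / FS) for f, amp in COMPONENTS)
          for t in range(FS)]

mags = dft_magnitudes(signal)
# With exactly 1 s of data, bin k corresponds to k Hz, and a sinewave of
# amplitude A produces a peak of A * N / 2 in its bin.
peaks = [k for k, m in enumerate(mags) if m > 0.1 * FS / 2]
amplitudes = [round(2 * mags[k] / FS, 3) for k in peaks]
print(peaks, amplitudes)  # [50, 120, 300] [1.0, 0.5, 0.25]
```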

  3. Sound and hearing basics • About 20 ms is needed for the onset of a consonant. • An average syllable lasts about 200 ms. • About 2000 ms is needed for a sentence. • These different time scales, together with other sound parameters such as timbre and intensity, must be processed correctly to recognize speech or music. Figure: a spectrogram of a speech signal, with frequency on the y-axis.

  4. Sound and hearing basics. Typical sound levels: near total silence, 0 dB; a whisper, 15 dB; normal conversation, 60 dB; a lawnmower, 90 dB; a car horn, 110 dB; a rock concert, 120 dB; a gunshot, 140 dB. • The dynamic range of the human hearing system is very broad. It extends from the threshold of hearing (0 dB SPL, where SPL is sound pressure level) up to intensities about 10^15 times greater, i.e. 150 dB SPL. Figure: human and cat hearing sensitivity.
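The decibel figures above follow from a logarithmic scale. The worked example below uses the standard acoustics definitions, which are assumed here rather than stated on the slide:
- An intensity ratio in dB is 10·log10(I/I0).
- Sound pressure level in dB is 20·log10(p/p0), with p0 = 20 µPa, because intensity grows with the square of pressure.

So an intensity range of 10^15 corresponds to 150 dB.

```python
import math

P0 = 20e-6  # reference pressure for 0 dB SPL, in pascals (20 µPa)

def intensity_ratio_to_db(ratio):
    """Level difference in dB for an intensity (power) ratio: 10 * log10(I / I0)."""
    return 10 * math.log10(ratio)

def pressure_to_db_spl(pressure_pa):
    """Sound pressure level in dB SPL: 20 * log10(p / p0), because intensity ~ pressure**2."""
    return 20 * math.log10(pressure_pa / P0)

def db_to_intensity_ratio(level_db):
    """Inverse of intensity_ratio_to_db."""
    return 10 ** (level_db / 10)

print(intensity_ratio_to_db(1e15))        # about 150 dB: the range quoted above
print(db_to_intensity_ratio(60))          # conversation: about 10^6 x threshold intensity
print(db_to_intensity_ratio(140 - 60))    # gunshot vs. conversation: about 10^8 x
print(round(pressure_to_db_spl(1.0), 1))  # 1 Pa is about 94 dB SPL
```

Every 10 dB step on the slide's list is a tenfold change in intensity. That is why a scale spanning 0 to 150 dB can cover such an enormous physical range.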

  5. Sound and hearing basics. There are two cochlear windows, the oval and the round window. The stapes conveys sound vibrations through the oval window to the inner-ear fluids. • A sound wave caused by vibrating objects moves through the air, enters the external auditory canal and reaches the tympanic membrane, or eardrum. • Vibrations propagate through the middle ear by the mechanical action of three bones: the hammer, anvil and stirrup (malleus, incus and stapes). • Because of its length, the ear canal resonates and amplifies sounds with frequencies of approximately 3000 Hz.
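The roughly 3000 Hz figure can be checked with a common simplification. If the ear canal is treated as a tube closed at one end by the eardrum, it resonates when its length equals a quarter wavelength. The canal lengths and the speed of sound below are assumed, typical values, not taken from the slides.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)

def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND):
    """Lowest resonance (Hz) of a tube open at one end and closed at the other.

    Such a tube resonates when its length is a quarter wavelength: f = c / (4 * L)."""
    return c / (4.0 * length_m)

# Assumed adult ear-canal lengths of 2.5 cm and 3.0 cm (illustrative values).
for length_cm in (2.5, 3.0):
    f = quarter_wave_resonance(length_cm / 100)
    print(f"{length_cm} cm canal -> resonance near {f:.0f} Hz")
```

Both lengths give resonances in the 2.9 to 3.4 kHz region. That is consistent with the slide's "approximately 3000 Hz".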

  6. Sound and hearing basics • The cochlea and the semicircular canals are filled with a water-like fluid. • The cochlea in the inner ear contains the basilar membrane. • A traveling wave of sound moves along the basilar membrane, moving the small sensory hair cells that sit on it.

  7. Sound and hearing basics • The inner surface of the cochlea is lined with over 16,000 sensory hair cells, which play one of the most critical roles in our ability to hear. • Each hair cell is most sensitive to a particular frequency of vibration. • The brain decodes sound frequencies based on which hair cells along the basilar membrane are activated; this is known as the place principle. Figure: pathways at the auditory brainstem.
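The place principle is often summarized quantitatively by the Greenwood function. This function maps relative position along the basilar membrane to the frequency that position responds to best. The sketch below assumes the commonly cited human constants (A = 165.4, a = 2.1, k = 0.88), which do not come from these slides. The mapping spans roughly 20 Hz at the apex to about 20 kHz at the base, matching the hearing range quoted earlier.

```python
import math

# Greenwood-function constants commonly quoted for the human cochlea (assumed here).
A, a, k = 165.4, 2.1, 0.88

def greenwood_frequency(x):
    """Best frequency (Hz) at relative position x along the basilar membrane,
    where x = 0 is the apex and x = 1 is the base."""
    return A * (10 ** (a * x) - k)

def greenwood_position(freq_hz):
    """Inverse map: relative position (0 = apex, 1 = base) that responds best to freq_hz."""
    return math.log10(freq_hz / A + k) / a

for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f} -> {greenwood_frequency(x):8.1f} Hz")
print(f"1000 Hz is best represented at x = {greenwood_position(1000):.2f}")
```

The logarithmic form means equal distances along the membrane cover roughly equal frequency ratios rather than equal frequency differences.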

  8. Inner ear details. From E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  9. Inner ear details. From E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  10. Inner ear details. From E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  11. Inner ear details. From E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  12. Inner ear details. From E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  13. Inner ear details. Figure 30-5 from E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  14. Inner ear details. Figure 30-5 from E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  15. Inner ear details. From E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

  16. The central auditory system • The auditory system has many stages: the ear, the brainstem, subcortical nuclei and the cortex. • Ascending (afferent) pathways transmit information from the periphery to the cortex. • Neural signals travel from the auditory nerve to the ventral (lower) cochlear nucleus. • The signal then travels through the lateral lemniscus, inferior colliculus and thalamus to the auditory cortex. • A key task of the ascending pathway is to localize sound in space.

  17. The central auditory system • The descending (efferent) pathways run from the auditory cortex down to the periphery, under cortical control. • This control extends all the way to the hair cells in the cochlea. • The descending pathway provides ‘top-down’ information critical for selective attention and for perception in a noisy environment. • Besides the ascending and descending pathways, the left and right auditory pathways are connected through the corpus callosum and other brain regions.

  18. Auditory cortex • The auditory cortex specializes in sound processing. • It serves as a hub for sound processing. It interacts with other systems within the cortex and, via the descending pathway, back down to the cochlea. • These processes support a wide range of perceptual abilities, such as picking out a single person's voice in a crowded space or recognizing a melody even when it is played off-key.

  19. Auditory cortex • In humans, the primary auditory cortex is located within Heschl’s gyrus. • Heschl’s gyrus corresponds to Brodmann area 41. • Another important region of the auditory cortex is the planum temporale, located posterior to Heschl’s gyrus. • In right-handed individuals, the planum temporale is much larger (up to 10 times) in the left hemisphere. • It plays an important role in language understanding. • Posterior to the planum temporale is Brodmann area 22, which Carl Wernicke associated with speech comprehension (Wernicke’s area).

  20. Auditory cortex • There are several types of neurons in the auditory system. • They have different response properties for coding frequency, intensity and timing information in sounds. They also differ in how they encode the spatial information used to localize sounds in space. • Figure: the main cell types of the cochlear nucleus and their corresponding post-stimulus time (PST) histograms. • The stimulus used is typically a 25 ms tone burst at the neuron's center frequency, presented 30 dB above threshold.
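A PST histogram is built by repeating the same stimulus many times and recording spike times relative to stimulus onset. The spikes from all presentations are then pooled into time bins. The sketch below shows that computation. The spike times are invented purely for illustration, and the function names are hypothetical.

```python
def pst_histogram(trials, bin_ms=1.0, window_ms=50.0):
    """Post-stimulus time (PST) histogram.

    trials: one list of spike times (ms, relative to stimulus onset) per stimulus presentation.
    Returns spike counts per time bin, pooled over all trials."""
    n_bins = int(window_ms / bin_ms)
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = int(t // bin_ms)
            if t >= 0 and idx < n_bins:  # ignore spikes outside the analysis window
                counts[idx] += 1
    return counts

def to_rate_hz(counts, n_trials, bin_ms=1.0):
    """Convert pooled counts to mean firing rate (spikes/s) in each bin."""
    return [c / (n_trials * bin_ms / 1000.0) for c in counts]

# Hypothetical spike times (ms) from three presentations of a 25 ms tone burst:
# a strong onset response around 2-3 ms followed by sparse later firing.
trials = [[2.1, 2.5, 10.0, 21.7], [2.3, 14.2], [2.9, 18.8, 30.2]]
counts = pst_histogram(trials, bin_ms=1.0, window_ms=40.0)
rates = to_rate_hz(counts, n_trials=len(trials))
print(counts[:5])  # [0, 0, 4, 0, 0]
```

The shape of such a histogram differs between cell types, for example a sharp onset peak versus sustained firing. That is what allows the cell types in the figure to be told apart.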

  21. Auditory cortex • The receptive fields of auditory neurons differ in their sensitivity to the location of the sound source (azimuth angle) and to its loudness (in dB). • The top neuron responds to a broad range of sound intensities from sources located to the right, responding more strongly to louder signals. • The lower neuron is narrowly tuned to sounds of 30 to 60 dB located slightly left of center. • Broadly tuned neurons are useful for detecting a sound source. Narrowly tuned neurons give the more precise information needed to locate it, such as its exact direction and loudness level.

  22. Auditory cortex • Tonotopic auditory cortical fields of a cat: • a) lateral view • b) lateral view ‘unfolded’ to show parts hidden within sulci. • The four tonotopic fields are: • Anterior (A) • Primary (AI) • Posterior (P) and • Ventroposterior (VP) • The positions of the lowest and highest center frequencies in these fields are indicated in (b). • Other cortical areas show little tonotopy: secondary (AII), ventral (V), temporal (T) and dorsoposterior (DP).

  23. Functional mapping of auditory processing • The location of the planum temporale (PT), close to Wernicke’s area for speech comprehension, points to its role as a site for auditory speech and language processing. • However, neuroimaging studies provide evidence that the functional role of PT is not limited to speech. • PT is a hub for auditory scene analysis, decoding sensory inputs and comparing them to memories and past experiences. • PT further directs cortical processing to decode spatial location and identify auditory objects. • Figure: the planum temporale and its major associations: lateral superior temporal gyrus (STG), superior temporal sulcus (STS), middle temporal gyrus (MTG), parieto-temporal operculum (PTO) and inferior parietal lobe (IPL).

  24. Functional mapping of auditory processing • PT as a hub for auditory and spatial analysis. • In a crowded environment it is important to decode auditory objects such as a friend’s voice, an alarm signal or a squeaking wheel. • To do so, the auditory system must determine where sounds are occurring in space and what they represent. • All of this is then associated with other sensory inputs, such as vision, smell or touch, and with memories.

  25. Functional mapping of auditory processing • Figure: neurons’ responses to interaural time difference (ITD) and interaural level difference (ILD). Abbreviations: CN, cochlear nucleus; MSO, medial superior olive; LSO, lateral superior olive; MNTB, medial nucleus of the trapezoid body. • To determine where a sound is coming from, two cues are used: • the interaural (between-ear) time difference • the interaural level difference • The system must be sensitive to time differences well below a millisecond. • The head casts a ‘sound shadow’, so the sound reaching the farther ear is slightly weaker.
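Two calculations show why sub-millisecond timing sensitivity is needed.
- The first uses Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ). This is a standard textbook formula, assumed here rather than taken from the slides. With an assumed head radius of 8.75 cm, the largest ITD, for a source directly to one side, is only about 0.66 ms.
- The second estimates the ITD between two ear signals by finding the lag that maximizes their cross-correlation. This is a common engineering abstraction of coincidence-detector models of the MSO, such as the Jeffress model, not a claim about the exact neural mechanism.

The noise signal and the delay are hypothetical.

```python
import math
import random

HEAD_RADIUS_M = 0.0875   # assumed average adult head radius
SPEED_OF_SOUND = 343.0   # m/s

def woodworth_itd(azimuth_deg, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Spherical-head approximation of the interaural time difference (seconds)
    for a distant source at 0-90 degrees azimuth (0 = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (r / c) * (theta + math.sin(theta))

def estimate_itd(left, right, fs, max_lag):
    """Return the lag (seconds) that maximizes the cross-correlation of the two ear
    signals; a positive value means the sound reached the left ear first."""
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs

# Hypothetical stimulus: broadband noise that reaches the right ear 20 samples late.
fs = 44100
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
delay = 20                                   # about 0.45 ms at 44.1 kHz
left = noise
right = [0.0] * delay + noise[:-delay]

print(f"Woodworth ITD at 90 deg: {woodworth_itd(90) * 1e3:.2f} ms")
print(f"Estimated ITD: {estimate_itd(left, right, fs, max_lag=40) * 1e3:.2f} ms")
```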

  26. Functional mapping of auditory processing • It was demonstrated that musical conductors were better able to locate sound sources. • They showed higher sensitivity to sounds presented in peripheral auditory space than other groups, including other musicians.

  27. Functional mapping of auditory processing • Auditory objects are categorized into human voices, musical instruments, animal sounds, etc. • Auditory objects are learned over our lifetime, and their associations are stored in memory. • Auditory areas in the superior temporal cortex are activated by both recognized and unrecognized sounds. • Recognized sounds also activate the superior temporal sulcus and the middle temporal gyrus (MTG). Fig. (c) shows the difference between activations for recognized and unrecognized sounds.

  28. Functional mapping of auditory processing • Binder and colleagues propose that the middle temporal gyrus (MTG) is the region that associates sounds with images. • This agrees with case studies of patients who suffered from auditory agnosia (the inability to recognize sounds). • These results show that auditory object perception is a complex process involving multiple brain regions in both hemispheres. Figure: brain activity in auditory processing, cross sections at different depths.

  29. Cocktail party effect • How does the auditory system separate sounds coming from different sources? • Bregman (1990) proposed a model for such segregation. • It contains four elements: • the source • the stream • grouping • stream segregation • The source is the sound signal; it has physical features such as frequency, intensity and spatial location. • The stream is the percept of the sound and reflects psychological aspects that depend on the individual. • Grouping creates streams: • simultaneous grouping, e.g. instruments in an orchestra • sequential grouping, e.g. grouping sounds across time • Stream segregation separates the streams into objects.

  30. Cocktail party effect • Bregman’s grouping principles: • Proximity: sounds that are close in time are grouped. • Closure: a sound that does not belong to the stream (like a cough during a lecture) is excluded. • Good continuation: sounds that follow each other smoothly are grouped (similar to proximity). • Common fate: sounds that come from the same location or coincide in time are grouped (as in an orchestra). • Exclusive allocation: selective listening (focusing on one stream). Figure: cortical areas of auditory stream analysis; the intraparietal sulcus (IPS) is involved in binding multimodal information (vision, touch, sound).
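The proximity principle can be illustrated with a toy greedy grouping rule. Each tone joins an existing stream only if it is close to that stream's last tone in both time and frequency; otherwise it starts a new stream. The stimulus is a repeating A-B-A tone pattern, a classic streaming stimulus. This is a hypothetical sketch, not Bregman's model: the thresholds (3 semitones, 350 ms) and function names are arbitrary assumptions. It reproduces the qualitative effect that a small frequency separation yields one stream and a large one yields two.

```python
import math

def semitones(f1, f2):
    """Frequency distance between two tones in semitones."""
    return abs(12 * math.log2(f2 / f1))

def group_into_streams(tones, max_semitones=3.0, max_gap_ms=350.0):
    """Toy sequential grouping. Each tone (onset_ms, freq_hz) joins the stream whose last
    tone is closest in frequency, provided it is within max_semitones and no more than
    max_gap_ms earlier; otherwise it starts a new stream."""
    streams = []
    for onset, freq in sorted(tones):
        candidates = [s for s in streams
                      if onset - s[-1][0] <= max_gap_ms
                      and semitones(s[-1][1], freq) <= max_semitones]
        if candidates:
            best = min(candidates, key=lambda s: semitones(s[-1][1], freq))
            best.append((onset, freq))
        else:
            streams.append([(onset, freq)])
    return streams

def aba_sequence(f_a, f_b, repeats=4, soa_ms=100.0):
    """Repeating A-B-A triplets, one tone every soa_ms milliseconds."""
    freqs = [f_a, f_b, f_a] * repeats
    return [(i * soa_ms, f) for i, f in enumerate(freqs)]

print(len(group_into_streams(aba_sequence(500, 530))))   # about 1 semitone apart
print(len(group_into_streams(aba_sequence(500, 1000))))  # an octave apart
```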

  31. Cocktail party effect • There is growing evidence that, as in the visual system, cortical networks decode ‘what’ and ‘where’ information in sound in separate but highly interactive processing streams. Figure (human brain processing): blue, language-specific phonological structure; lilac, phonetic cues and speech features; purple, intelligible speech; pink, verbal short-term memory; green, auditory spatial tasks. Figure: auditory (blue) and visual (pink) processing areas in the macaque brain, and the ‘what’ and ‘where’ auditory processing streams.

  32. Speech perception • There is no agreement on how speech is coded in the brain. • What are the ‘building blocks’ of speech? • A natural way would be to code words as phonemes. • The word ‘dig’ would then be recognized by identifying a sequence of phonemes. • Or perhaps the syllable is the appropriate unit? • We must decode not only ‘what’ but also ‘who’ and ‘when’, to understand the temporal order of phonemes, syllables, words and sentences. • The speech signal must be evaluated on time scales from 20 ms to 2000 ms, regardless of its pitch (high for a child, low for a man), loudness (loud or quiet) and rate (fast or slow).

  33. Speech perception • Early attempts to simplify speech processing were made at Bell Labs by Homer Dudley, who developed the vocoder. • The vocoder (voice + coder) reduced the speech signal for transmission over long telephone circuits by analyzing and recoding speech. • Cochlear implants, which stimulate the auditory system in some types of hearing loss, are based on vocoder technology.
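A channel vocoder works by analysis and resynthesis. It splits the signal into frequency bands, keeps only the slowly varying envelope of each band, and uses those envelopes to modulate carrier signals in the same bands. The sketch below is a minimal noise-excited channel vocoder, a simplified modern relative of the idea rather than Dudley's actual design. Its parameters are arbitrary illustrative assumptions:
- the band centers;
- the filter Q;
- the 50 Hz envelope cutoff.

The band-pass filter uses the widely published "Audio EQ Cookbook" biquad coefficients. Noise vocoders of this kind are also widely used to simulate what cochlear-implant users hear, because they preserve band envelopes while discarding fine structure.

```python
import math
import random

def bandpass(signal, fs, f0, q=2.0):
    """Second-order (biquad) band-pass filter centred on f0 Hz, using the
    Audio EQ Cookbook coefficients with 0 dB peak gain."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, fs, cutoff_hz=50.0):
    """Temporal envelope: full-wave rectification, then a one-pole low-pass filter."""
    k = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in signal:
        y += k * (abs(x) - y)
        env.append(y)
    return env

def noise_vocoder(signal, fs, centres=(250, 500, 1000, 2000, 4000), seed=0):
    """Measure each band's envelope, use it to modulate noise filtered into
    the same band, and sum the channels."""
    rng = random.Random(seed)
    out = [0.0] * len(signal)
    for f0 in centres:
        env = envelope(bandpass(signal, fs, f0), fs)
        carrier = bandpass([rng.uniform(-1, 1) for _ in signal], fs, f0)
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out

# Usage: a 1000 Hz tone burst (100 ms) with 100 ms of silence on each side.
fs = 16000
burst = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
signal = [0.0] * 1600 + burst + [0.0] * 1600
vocoded = noise_vocoder(signal, fs)
before = sum(x * x for x in vocoded[:1600])
during = sum(x * x for x in vocoded[1600:3200])
print(f"energy before burst: {before:.3f}, during burst: {during:.3f}")
```

The output is noise whose loudness in each band follows the input's band envelopes. The output is silent before the burst and carries energy during it, even though none of the tone's original waveform survives.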

  34. Speech perception • A second invention, the spectrograph, developed at Bell Labs during World War II, produced a ‘voice picture’ with frequency on the y-axis, time on the x-axis and intensity shown as a level of grey. • Problems in analyzing spectrograms: • Gaps or silences do not mark where words begin and end. • Individual phonemes change depending on which phonemes come before and after them.
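A digital spectrogram is usually computed with a short-time Fourier transform. The signal is cut into short overlapping windowed frames, and the magnitude spectrum of each frame is computed. Each frame then becomes one column of the "voice picture": time along x, frequency along y, magnitude as grey level. The sketch below uses a naive DFT so that it stays dependency-free. The frame length, hop size and test tones are assumed illustrative values.

```python
import math

FRAME_LEN = 128
HOP = 64

def stft_magnitude(signal, frame_len=FRAME_LEN, hop=HOP):
    """Short-time Fourier transform magnitudes: one spectrum per Hann-windowed frame.
    Returns a list of frames (time axis), each a list of |X[k]| for k = 0..frame_len // 2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            im = sum(x * math.sin(2 * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            spectrum.append(math.hypot(re, im))
        frames.append(spectrum)
    return frames

def tone(freq_hz, n_samples, fs):
    """A pure sinewave of the given frequency and length."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

# Illustrative signal: 125 ms at 500 Hz followed by 125 ms at 2000 Hz.
fs = 8000
signal = tone(500, fs // 8, fs) + tone(2000, fs // 8, fs)
spec = stft_magnitude(signal)
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spec]
# Each bin is fs / FRAME_LEN = 62.5 Hz wide, so 500 Hz -> bin 8 and 2000 Hz -> bin 32.
print(peak_bins[0] * fs / FRAME_LEN, peak_bins[-1] * fs / FRAME_LEN)  # 500.0 2000.0
```

The frame length sets a trade-off. Longer frames resolve frequency more finely but blur rapid events such as the roughly 20 ms consonant onsets mentioned earlier.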

  35. Figure: short-term spectrum plotted against frequency. What is wrong with the short-term spectrum? It is inconsistent: the same message can have different representations. Shannon (1998) showed that the minimum information needed for speech decoding is contained in the slowly varying shape of the speech signal, called the temporal envelope.

  36. Speech perception • The lack of invariant features in speech spectrograms forced researchers to look for other explanations of speech perception. • The motor theory developed by Liberman (1985) assumes a domain-specific approach to speech. • This theory suggests that speech perception is tightly coupled with speech production. • While the acoustics of phonemes lack invariance, the motor gestures used to produce speech are invariant and can be accessed during speech perception. • Another theory, developed by Tallal, assumes that speech and language are domain-general. • In this theory, left-hemisphere language organization is not the result of domain-specific development. Instead, it results from a domain-general bias of the left hemisphere for decoding rapidly changing sounds (such as those contained in speech). • It is likely that the neural system uses a combination of domain-specific and domain-general processing for speech perception.

  37. Speech perception • A process model for word comprehension. • Language areas.

  38. Speech perception. Figure: brain response to words, pseudowords and reversed speech. • Binder and colleagues (1997) studied brain activation for words, reversed speech and pseudowords. They found that Heschl’s gyrus and the planum temporale were activated similarly by all stimuli. • This supports the notion of hierarchical processing of sounds, with Heschl’s gyrus representing early sensory analysis. • In the posterior superior temporal gyrus and superior temporal sulcus, speech signals activated a larger portion of auditory cortex than non-speech sounds. However, there was no difference in activation between words, pseudowords and reversed speech. • The conclusion is that these regions do not reflect semantic processing of words but phonological processing of speech sounds.

  39. Speech perception and production • Speech perception and production are tightly coupled. • One explanation is that when we speak, we hear our own voice. • Wernicke proposed a model for language processing that links auditory speech perception to motor speech production: • The verbal signal enters the primary auditory cortex (A) and then Wernicke’s area (WA). • The response is formulated in Broca’s area (B) and the primary motor cortex (M). • We can listen and respond to our own speech using the same brain regions. • Producing an internal response to a question results in silently speaking to ourselves.

  40. Figure: phonetic foils. Damage to the speech perceptual system • Damage to the speech perceptual system may be caused by strokes that block the blood flow to a brain area and cause neurons to die. • When a stroke impairs language functions, the impairment is called aphasia. • Paul Broca linked aphasia to a region in the frontal lobe important for speech production. • Carl Wernicke discovered a region in the temporal lobe important for speech perception. • Experiments by Blumstein tested phonetic and semantic deficits by giving patients four choices: • the correct word, a semantic foil, a phonetic foil and an unrelated foil (e.g. peas, carrots, keys and calculator).

  41. Learning and plasticity • An important theme in studying human cognition is how new information is encoded during learning and how the brain adapts (plasticity). • Much of what is known about plasticity of the auditory system comes from deprivation studies in animals. • Both the cochlea and the brainstem are organized tonotopically, and this organization is reflected in the auditory cortex. • After the cochlea or brainstem is lesioned, some frequencies are no longer transmitted to the auditory cortex. The cortex is then studied for changes reflecting neural plasticity. • Changes in neural responses in auditory cortex have been observed in humans after sudden hearing loss. • Children with hearing loss showed some maturational lag compared to typical development. However, after receiving cochlear implants, their auditory systems continued to mature in a typical fashion. • This indicates plasticity of the auditory cortex.

  42. Learning and plasticity • Plasticity due to learning was observed in laboratory animals using classical conditioning. Presented tones were paired with a mild electrical shock, so the animal learned which sounds were relevant to survival (avoiding the shock). • Plasticity-related changes were more pronounced at higher motivational levels. • Trained tones were 4.1 to 8 kHz, and • motivational levels were high (red), medium (black) and low (blue). Figure (untrained vs. trained): change in the cortical area representing the trained signal frequency at different motivational levels.

  43. Auditory awareness • The auditory system is the last to fall asleep and the first to wake up. • Sleeping people respond to their own names better than to other sounds. • The figure compares responses in auditory cortex during waking and sleep states.

  44. Auditory imagery. Figure: brain areas active for imagined sounds. • Sounds play in our heads all day, even when we do not actually hear them. • Some are involuntary and uncalled for, like a melody or your inner voice. • Some are planned, like when you rehearse a verse or a telephone number in your head. • Halpern and colleagues (2004) showed that non-primary auditory cortex is active during imagined (not heard) sounds.

  45. Auditory imagery • Related results were obtained by Jancke and colleagues (2005). • They used fMRI to compare neural responses to real sounds and to imagined sounds. • Imagined sounds activate regions in auditory cortex similar to those activated by real ones.

  46. Summary • We discussed the organization of the auditory system • Learned sound and hearing basics • Traced auditory pathways • Analyzed the organization of auditory cortex • Observed functional mapping of auditory processing • Discussed sound and music perception • Discussed the effect of learning on sound processing • Research on animals confirmed the existence of ‘what’ and ‘where’ pathways in the auditory system; however, these pathways may be organized differently in humans. • When you hear an uncalled-for melody in your head, think about which of your brain areas are activated.
