1 / 29

Radiation Directivity of Human and Artificial Speech

Radiation Directivity of Human and Artificial Speech. Teemu Halkosaari 1 & Markus Vaalgamaa 2 1 Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing 2 Nokia Technology, Audio Quality e-mail: markus.vaalgamaa@nokia.com

yagil
Download Presentation

Radiation Directivity of Human and Artificial Speech

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Radiation Directivity of Human and Artificial Speech Teemu Halkosaari 1 & Markus Vaalgamaa 2 1 Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing 2 Nokia Technology, Audio Quality e-mail: markus.vaalgamaa@nokia.com Workshop of Wideband Speech Quality in Terminals and Networks: Assessment and Prediction, 8th-9th June 2004, Mainz, Germany

  2. Agenda • Motivation and Background • Goal • Measurements • Analyzes • Modelling • Results • Improvements to telephonometry • Conclusions Part I: WHY? Part II: The STUDY Part III: The OUTCOME

  3. Part IWHY? • Motivation and Background • Goal

  4. Part I: WHY? 1. Motivation and Background • Artificial mouths and HATS were originally designed for narrowband speech measurements with big landline handsets. • In past years, the size of phone has become smaller and use of headsets is becoming common. • In future, wideband phones are coming… • It seems that HATS and artificial mouths will be used in future, although • we do not know exactly the directivities of artificial mouths or • human mouth especially on near field locations • Additional interesting topic is that how does the speech content affect to the directivity? • Extensive literature on the subject is not available • Standardization is inadequate • ITU-T P.58 Head and torso simulator for telephonometry • ITU-T P.51 Arficial mouth • narrow band: 300-3400Hz, wide band: 150-7000Hz

  5. Part I: WHY? 2. Goal • Measure frequency responses for artificial mouths to positions where microphones of handsets and headsets are situated. • Repeat the same measurements for a group of test subjects • Most important part of the study is to analyze the difference between artificial mouths and an average person. • In practice a set of transfer functions is acquired.

  6. Part IIThe STUDY • Measurements • Analyzes • Modelling • Results

  7. Part II: The STUDY 3. Measurements: Arrangements • Two separate systems • Test subjects were measured in Helsinki University of Technology, anechoic chamber, 8 channel parallel recording system. • Artificial mouths were measured in Nokia facilities in Salo, Finland, anechoic chamber, Audio Precision APwin measurement system, 2 channel parallel MLS measurement. • Probes • B&K 1/8 and ¼ inch measurement microphones • 8 Sennheiser KE-4-211-2 electret microphones • In addition, some phones: large and small handset, and accessories: boom headset, mono and stereo headsets, were measured

  8. Part II: The STUDY 3. Measurements: Positions • Referred to • lip plane and its perpendicular • on chest: axes from throat downwards and sidewards • Two reference positions for transfer functions • MRP = 25 mm in front of mouth for artificial mouths • 0.5 m in front of mouth for test subjects. • Measurement positions: On chest Near cheek In front of mouth

  9. Part II: The STUDY 3. Measurements: Setup for artificial mouths • Two B&K measurement microphones. • Two impulse responses parallel (reference and point of interest) • Apwin measurement system, MLS

  10. Part II: The STUDY 3. Measurements: Positioners • Helmet and chest positioner were developed to keep microphones in right positions.

  11. Part II: The STUDY pre-amp E.A.A. 2 channels pre-amp E.A.A. 2 channels pre-amp E.A.A. 2 channels E.A.A. pre-amp 3. Measurements: Setup for test subjects (1) • Parallel 8 channel audio recording at 32kHz sample rate Data acquisition system 8 channels 8 channels Biasing unit IOtech WaveBook/516 LapTop computer . . . . LPT A/V-monitoring and Test Subject guidance . . 8 channels Microphones Mixer Loudspeaker Test subject or HATS Camera VHS Recorder Display Anechoic chamber Monitoring room

  12. Part II: The STUDY 3. Measurements : Setup for test subjects (2) • Small anechoic chamber at Helsinki University of Technology and monitoring system

  13. Part II: The STUDY 3. Measurements: Test subjects • Test subjects: • 8 male, 5 female, mainly young students, Finnish as mother tongue • Speech material: • ”Kaksi vuotta sitten kävimme ravintola Gabrielissa Helsingissä ja söimme siellä padallisen fasaania banaanilla höystettynä” (includes all Finnish phonemes) • Translation: “Two years ago we went to restaurant Gabriel and we ate there a pot of pheasant larded with banana “ • Separate phonemes \n m h s r\ and \a e i o u y ä ö\ • Different speech volumes: loud, normal, and silent

  14. Part II: The STUDY 4. Analyzes • Starting point: • parallel 8-channel 32kHz 16bit raw audio data. • Impulse responses in measurements for artificial mouths -> directly converted to transfer functions. • Phonemes and active speech were segmented manually (CoolEdit Pro, around 10000 manual marks by the way). • All analyzes were made in Matlab environment: • SNR and Coherences to ensure reliability • Power spectra estimates • Transfer function estimates • Confidential intervals consideration for transfer functions • FFT 1024 samples that corresponds to 32 ms, Hanning-window, 50% overlap • Averaging and Smoothed to 1/3-octave bands • Speech was averaged weighting with coherence

  15. Part II: The STUDY 5. Modelling • In addition to measurements two models were built based on acoustic field theory • Results were validated by modelling: Sphere and piston source • Head = sphere • Mouth = radially vibrating round surface of the sphere (piston). • An infinite baffle can be added to simulate the upper body. • Mirror source method • Numerical implementation in Matlab

  16. Part II: The STUDY 6. Results: HATS Nearfield • Measured and modelled results for B&K 4128 HATS • Transfer functions from Mouth Reference Position (MRP) to other positions • Transfer functions from MRP to near cheek positions: • Large handset (blue circles) • Small handset (red squares) • Boom headset (green triangles) • Corresponding models (black dashed) • Transfer functions from MRP to chest positions: • Near throat with vest (blue circles, solid) • Near throat without (blue circles, dashed) • Middle of chest with vest (red squares, solid) • Middle of chest without vest (red squares, dashed) • Corresponding models (black dashed)

  17. Part II: The STUDY 6. Results: HATS Far field (1) • Measured results with B&K 4128 HATS • Transfer function from MRP to positions in the front of mouth: • 10 cm (blue circles) • 25 cm (red squares) • 50 cm (green triangles) • 100 cm (violet quadrangles) • with (solid) and without vest (dotted)

  18. Part II: The STUDY 6. Results: HATS Far field (2) • Measured and modelled for B&K 4128 HATS • Transfer function for positions in front of mouth on 50 cm distance. • Vest on (blue circles) • Vest off (red squares) • Bare head, torso unattached (green triangles) • Model with the baffle (black dotted, thin) • Model without the baffle (black dasked, thick)

  19. Part II: The STUDY 6. Results: Human – B&K HATS • Comparison between transfer functions, referred to 0.5m in front of mouth position. Positions, colors and symbols are same as in the slide 16. • TFHuman / TFHATS • 1/3 mouth cross-section area ratio in models • 0 dB values indicate that there is no difference between human and HATS • Positive values indicate that HATS is more directive, i.e. has less SPL on measured microphone positions on next to cheek or on chest. Next to cheek On chest

  20. Part II: The STUDY 6. Results: B&K 4227 and Head Acoustics HMS II.3 HATS • Comparison between transfer functions, referred to 0.5m position • TFHuman / TFHATS • Positive values indicate that HATS is more directive, i.e. has less SPL on measured microphone positions on side. • Transfer functions from MRP to near cheek positions: Large handset (blue circles), Small handset (red squares), and Boom headset (green triangles) Next to cheek for B&K 4227 Next to cheek for HA HATS

  21. Part II: The STUDY 6. Results: Speech content • Speech content changes the directivity pattern • Speech style: loud – silent etc.. • Articulation: continuous phonation – natural speech • Phonemes (Smoothed Human/HATS difference for mic. pos. 1.3 below with 95% confidence intervals) • Mouth aperture size = steepness of the response on high frequencies • Statistical approach: ANOVA model did not apply = speech data scattered TFs human vs. HATS by vowel groups

  22. Part II: The STUDY 6. Results: Reliability • Beforehand by careful design of the measurement processes • Reliability of human measurements: • SNR and Coherence (estimates) • SNR on wide-band: >25dB • Coherence on narrow-band: >0.8, on wide-band: >0.65 • Normal distribution hypothesis • 95% confidential intervals on wide-band: <±0.5dB • Reliability of HATS measurements is better than for humans i.e. very good (and thus omitted here)

  23. Part IIIThe OUTCOME • Improvements to telephonometry • Conclusions

  24. Part III: The OUTCOME 7. Improvement proposal: Equalization (1) • Artificial mouths (especially B&K 4128) does not correspond to average test subject • Difference curves are ”well-behaving” for B&K HATS -> A low order filter could equalize the difference near cheek. • Example: Yulewalk recursive IIR filter design (least-squares method) was applied in Matlab for smoothed and averaged differences (B&K HATS). Difference and 95% confidence limits between human and B&K HATS near cheek positions: Large handset (blue circles), Small handset (red squares), and Boom headset (green triangles) Average of curves on right (blue) IIR N = 3 (red) IIR N = 7 (green) 1.3 1.1 1.4

  25. Part III: The OUTCOME 7. Improvement proposal: Equalization (2) • Smoothed difference curves and 95% confidence limits between human - B&K 4227 (left) and Human - Head Acoustics HMS II.3 (right) on near cheek positions • Large handset (blue circles) • Small handset (red squares) • Boom headset (green triangles) Human - B&K 4227 Human - HA HATS

  26. Part III: The OUTCOME 7. Improvement proposal: Mouth size • B&K 4128 HATS • Positions 1.3 (left) and 1.1 (right) where measured with 3 different mouth sizes • small, 15 mm x 11 mm, adaptor partly blocked (blue circles) • normal, 30 mm x 11 mm, adaptor (red squares) • big, 42 mm x 16 mm, mouth adaptor removed (green triangles) Position 1.3 Human - B&K HATS Position 1.1 Human – B&K HATS

  27. Part III: The OUTCOME 7. Improvement proposal: Vest • About 2cm thick vest should be used in B&K HATS measurements. How does it change the directivity in position on the chest? • Let’s take off the vest and compare to average test subject (dashed lines) • The near-chest reflection shifts to higher frequency • Difference between human and HATS measurements on chest positions: • Near throat with (blue circles, solid) and without vest (blue circles, dashed) • Middle of chest with (red squares, solid) and without vest (red squares, dashed) 4.1 vest on (solid) vest off (dashed) 4.2

  28. Part III: The OUTCOME 8. Conclusions • Key results: • On high frequencies B&K HATS is too directional. • Near cheek: the difference (Human/Artificial mouth) is slighly increasing by moving away from the lip plane. • On averate the difference between human and artificial mouth is between 1 and 6 dB. • Near chest: the differences in chest reflections are significant. • Speech content affects the directivity. • Reasons: • Mouth cross-section area during speech is smaller than in artificial mouths. Speech content and mouth cross-section area are linked. • The torso and head design of the artificial mouths do not correspond to the acoustical characteristics of a real body. • What could be improved? • Near cheek: simple equalization was proposed. • Improvement of the structure of HATS (mouth size, torso, etc.). This path requires more measurements and analysis. • Vest usage should always be considered carefully.

  29. Agenda • Motivation and Background • Goals • Measurements • Analyzes • Modelling • Results • Improvements to telephonometry • Conclusions Part I: WHY? Part II: The STUDY Part III: The OUTCOME

More Related