0 Aperitif
References: 1) academic development of the field 2) technical development of the field
References: the Internet
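Which qualities (skills) matter?
The R&D process of a project: What is there? What is it? Why? How is it done?
Skills involved: English, "physical" concepts, way of thinking, mathematics, tools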
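1 Getting started: raw material for experiments
WAV files. Example: keep friends with.wav
A WAV file consists of a format (header) section followed by a data section.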
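Header layout (int = 2-byte and long = 4-byte little-endian integer):

Offset  Bytes  Type  Contents
00H     4      char  "RIFF" marker
04H     4      long  File length - 8, i.e. data length + 0x24 (file length = data length + 0x2C)
08H     4      char  "WAVE" marker
0CH     4      char  "fmt " marker (note the trailing space)
10H     4      long  Size of the fmt chunk (16 for plain PCM; may differ for other formats)
14H     2      int   Format tag (1 = PCM)
16H     2      int   Number of channels: 1 for mono, 2 for stereo
18H     4      long  Sample rate (samples per second)
1CH     4      long  Average byte rate = channels × sample rate × bits per sample / 8. Players can use it to estimate buffer size.
20H     2      int   Block align (in bytes) = channels × bits per sample / 8. Players process data in multiples of this many bytes and use it to align their buffers.
22H     2      int   Bits per sample, per channel. With several channels, every channel uses the same sample size.
24H     4      char  "data" marker
28H     4      long  Length of the audio data in bytes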
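A minimal sketch of how the derived fields in the table fit together when writing a header. It assumes a canonical 44-byte PCM header with no extra chunks, and writes each field byte by byte in little-endian order instead of dumping a struct, so it does not depend on compiler padding or host byte order. The function name make_pcm_wav_header is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Store 16- and 32-bit values in little-endian byte order, as WAV requires. */
static void put_u16le(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill hdr[0..43] with a canonical PCM header (fmt chunk size 16).
   data_len is the number of audio bytes that follow the header. */
void make_pcm_wav_header(uint8_t hdr[44], uint16_t channels,
                         uint32_t sample_rate, uint16_t bits_per_sample,
                         uint32_t data_len) {
    uint16_t block_align = (uint16_t)(channels * bits_per_sample / 8);
    uint32_t byte_rate = sample_rate * channels * bits_per_sample / 8;

    memcpy(hdr + 0x00, "RIFF", 4);
    put_u32le(hdr + 0x04, data_len + 36);   /* file length - 8 */
    memcpy(hdr + 0x08, "WAVE", 4);
    memcpy(hdr + 0x0C, "fmt ", 4);
    put_u32le(hdr + 0x10, 16);              /* fmt chunk size for plain PCM */
    put_u16le(hdr + 0x14, 1);               /* format tag: 1 = PCM */
    put_u16le(hdr + 0x16, channels);
    put_u32le(hdr + 0x18, sample_rate);
    put_u32le(hdr + 0x1C, byte_rate);       /* channels * rate * bits / 8 */
    put_u16le(hdr + 0x20, block_align);     /* channels * bits / 8 */
    put_u16le(hdr + 0x22, bits_per_sample);
    memcpy(hdr + 0x24, "data", 4);
    put_u32le(hdr + 0x28, data_len);
}
```

For example, one second of 16 kHz mono 16-bit audio gives data_len = 32000, a byte rate of 32000, a block align of 2, and a file length of 32044 bytes.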
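#include <stdint.h>

/* Fixed-width types keep each field at its table offset (44 bytes total);
   "unsigned long" is 8 bytes on 64-bit Linux/macOS. Data is little-endian. */
typedef struct {
    char     Riff[4];         /* "RIFF" */
    uint32_t sizeOfFile;      /* file length - 8 */
    char     WAVEfmt[8];      /* "WAVEfmt " */
    uint32_t sizeOfFmt;
    int16_t  wFormatTag;      /* 1 = PCM */
    int16_t  nChannels;
    uint32_t nSamplesPerSec;
    uint32_t navgBytesPerSec;
    int16_t  nBlockAlign;
    uint16_t nBitPerSample;
    char     Cdata[4];        /* "data" */
    uint32_t sizeOfData;
} HeadOfWave;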
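Notes:
* File length vs. data length
* Key quantities: sample rate / number of channels / quantization mode / quantization bits
* How navgBytesPerSec and nBlockAlign are computed
* Example program and explanation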
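The example program mentioned in these notes is not reproduced here. As a stand-in, this hypothetical sketch (parse_pcm_wav_header and WavInfo are illustrative names) reads the key quantities from a 44-byte header and checks the relations between file length, data length, navgBytesPerSec and nBlockAlign. It assumes the canonical layout; real files may carry extra chunks (e.g. LIST) before "data", which this sketch rejects rather than skips.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t format_tag, channels, block_align, bits_per_sample;
    uint32_t riff_size, sample_rate, byte_rate, data_len;
} WavInfo;

/* Read little-endian integers independently of host byte order. */
static uint16_t get_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 if hdr is a canonical PCM header with consistent derived
   fields, otherwise a negative code for the first failed check. */
int parse_pcm_wav_header(const uint8_t hdr[44], WavInfo *out) {
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0)
        return -1;                      /* not a RIFF/WAVE file */
    if (memcmp(hdr + 12, "fmt ", 4) != 0 || get_u32le(hdr + 16) != 16)
        return -2;                      /* fmt chunk missing or not 16 bytes */
    if (memcmp(hdr + 36, "data", 4) != 0)
        return -3;                      /* other chunks precede "data" */
    out->riff_size = get_u32le(hdr + 4);
    out->format_tag = get_u16le(hdr + 20);
    out->channels = get_u16le(hdr + 22);
    out->sample_rate = get_u32le(hdr + 24);
    out->byte_rate = get_u32le(hdr + 28);
    out->block_align = get_u16le(hdr + 32);
    out->bits_per_sample = get_u16le(hdr + 34);
    out->data_len = get_u32le(hdr + 40);
    if (out->format_tag != 1)
        return -4;                      /* not PCM */
    if (out->block_align != out->channels * out->bits_per_sample / 8)
        return -5;                      /* nBlockAlign inconsistent */
    if (out->byte_rate != out->sample_rate * out->block_align)
        return -6;                      /* navgBytesPerSec inconsistent */
    if (out->riff_size != out->data_len + 36)
        return -7;                      /* file length - 8 != data + 0x24 */
    return 0;
}
```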
2 Basic concepts: sampling rate, quantization bits
2.1 Sampling rate
Common rates: 48k/44k/32k/22k/16k/11k/8kHz. Two families: 44k/22k/11k (exactly 44.1/22.05/11.025 kHz) and 32k/16k/8k. Why these values?
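2.2 Audio signal bandwidth
[Figure: spectra of keep_friend_with.wav (sampled at 44 kHz). The frequency axis value 32 corresponds to 22 kHz; marked frequencies: 7 kHz and 4 kHz.]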
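The file above is special: it was recorded in a very good environment. The usual rule of thumb is:
* Speech: 300-3400 Hz, sampled at 8kHz
* Wideband speech: 7kHz bandwidth (50 Hz-7 kHz), sampled at 16kHz
* Audio: 20kHz bandwidth (20 Hz-20 kHz), sampled at 44.1kHz or 48kHz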
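2.2 Audio signal bandwidth
Why are the sampling rates those values? Nyquist Sampling Theorem: the sampling rate must exceed twice the highest frequency in the signal.
Why 44.1kHz? 20kHz ->(Nyquist) 40kHz ->(room for the filter's rolloff from passband to stopband) 44kHz -> but why exactly 44.1kHz?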
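The Nyquist argument can be checked numerically. This hypothetical helper pair (the names are illustrative) tests the strict condition fs > 2·f_max and reports how much room is left between the top of the band and fs/2 for the anti-aliasing filter's rolloff.

```c
#include <stdint.h>

/* Nyquist: a signal whose highest frequency is f_max can be sampled
   without aliasing only if fs > 2 * f_max. */
int nyquist_ok(uint32_t fs_hz, uint32_t f_max_hz) {
    return fs_hz > 2u * f_max_hz;
}

/* Width of the transition band between the passband edge f_max and fs/2,
   i.e. the room the anti-aliasing filter has to roll off. */
double transition_band_hz(double fs_hz, double f_max_hz) {
    return fs_hz / 2.0 - f_max_hz;
}
```

Applied to the rule-of-thumb bands, this gives 600 Hz of room for speech at 8 kHz, 1 kHz for wideband speech at 16 kHz, and 2.05 kHz for 20 kHz audio at 44.1 kHz. At exactly 40 kHz there would be no room at all, which is why the rate has to sit somewhat above 40 kHz.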
At the time the choice was made, the only recorders capable of storing such high sample rates were VCRs:
NTSC: 490 lines/frame, 3 samples/line, 30 frames/s = 44100 samples/s
PAL: 588 lines/frame, 3 samples/line, 25 frames/s = 44100 samples/s
(Prof. Brian L. Evans, Dept. of Electrical and Computer Engineering, The University of Texas at Austin)
Listen to the sounds…
keep_friends_with(44k_mono).wav
keep_friends_with(22k_mono).wav
keep_friends_with(16k_mono).wav
keep_friends_with(11k_mono).wav
keep_friends_with(8k_mono).wav
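For speech, 8kHz and 11kHz sampling sound about the same, and everything from 16kHz up sounds about the same. So for speech it is enough to distinguish two classes: voice (narrowband) and wideband speech.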
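The slides don't say how the lower-rate files were produced. One simple way to go from 44.1kHz to 22.05kHz or 11.025kHz is integer-factor decimation (factors 2 and 4). The sketch below assumes that approach and uses a moving average as a deliberately crude anti-aliasing filter; a proper design would use a sharper low-pass FIR. Rates such as 16kHz or 8kHz are not integer divisors of 44.1kHz and would need rational resampling, which this sketch does not cover. decimate_boxcar is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Downsample by an integer factor m (m >= 1): average each group of m
   input samples (a crude boxcar low-pass filter) and keep one output per
   group. Returns the number of samples written, in_len / m; a trailing
   partial group is dropped. */
size_t decimate_boxcar(const int16_t *in, size_t in_len, unsigned m,
                       int16_t *out) {
    size_t n_out = in_len / m;
    for (size_t k = 0; k < n_out; k++) {
        int32_t acc = 0;                 /* safe for m up to 65536 */
        for (unsigned j = 0; j < m; j++)
            acc += in[k * m + j];
        out[k] = (int16_t)(acc / (int32_t)m);
    }
    return n_out;
}
```

A boxcar of length m only reaches its first null at the new sampling rate, not at the new Nyquist frequency, so some aliasing remains. That is acceptable for a quick listening test but not for production resampling.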
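2.3 Quantization bits
Linear quantization / nonlinear quantization
Quantization SNR: roughly 6b dB for b bits; more precisely 6.02b + 1.76 dB (uniform quantizer, full-scale sine wave).
Specification for language-learning repeater devices: when sound is copied from tape onto the chip and the chip's playback is heard through headphones, the level difference between the wanted signal and the noise must be at least 34 dB.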
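The 6.02b + 1.76 dB rule can be turned around to ask how many bits a given SNR target needs. The sketch below (hypothetical helper names) does that. For the 34 dB repeater figure it gives 6 bits, but the formula only covers quantization noise, while the 34 dB requirement applies to the whole tape-to-chip-to-headphone chain, so treat the result as a lower bound, not a design value.

```c
/* Theoretical SNR of a uniform b-bit quantizer driven by a full-scale
   sine wave: SNR ~= 6.02*b + 1.76 dB. */
double quant_snr_db(int bits) {
    return 6.02 * bits + 1.76;
}

/* Smallest bit depth whose theoretical quantization SNR meets target_db. */
int min_bits_for_snr(double target_db) {
    int b = 1;
    while (quant_snr_db(b) < target_db)
        b++;
    return b;
}
```

For reference, the same formula gives about 49.9 dB for 8 bits and about 98.1 dB for 16 bits.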
Listen to the sounds…
keep_friends_with(16k_mono).wav
keep_friends_with(16k_mono)_8b.wav
The file with 8-bit linear quantization clearly carries background noise. From experience, how many quantization bits are acceptable?
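An 8-bit version like the _8b file can be made by linear requantization. The slides don't show how that file was actually produced, so this is an assumed approach. In WAV files, 8-bit PCM samples are unsigned (silence = 128) while 16-bit samples are signed, so the conversion must also shift the offset. The sketch truncates rather than rounds; pcm16_to_pcm8 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Requantize signed 16-bit PCM to unsigned 8-bit PCM (WAV convention).
   Shifting [-32768, 32767] up to [0, 65535] and keeping the top 8 bits
   discards the low 8 bits, i.e. linear requantization by truncation. */
void pcm16_to_pcm8(const int16_t *in, size_t n, uint8_t *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(((int32_t)in[i] + 32768) >> 8);
}
```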
Getting started: raw material for experiments: speech files sampled at 16kHz or 8kHz; music files sampled at 44.1kHz; 16-bit or 14-bit linear quantization.
3 Audio processing tools I use regularly: VC6.0 (using C); Matlab; Cool Edit
Matlab (Mathworks): math environment. Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction. voicebox (speech processing toolbox for Matlab).
Matlab pros: open, powerful, scripting, excellent plotting. Cons: weak speech community and little support for speech standards; not designed for big files.
Other speech analysis tools?
• Goldwave (audio editor)
• Esps Xwaves (routines + visualization)
• Praat (speech analysis)
• Wavesurfer (speech editor)
• Transcriber (annotation tool)
• OGI speech tools (routines + application development)
• … winpitch, pitchworks, phonedit …
Goldwave: self-described as a “top rated, professional digital audio editor”.
Goldwave pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, chained effects, easy interface. Cons: nothing for speech (pitch, formants), Windows only, no scripting. Good for editing files, not for speech.
Esps/Waves: developed by Entropic and AT&T; now public. The comp.speech FAQ says: Esps is a comprehensive set of speech analysis/processing tools; Waves is a graphical front end for speech processing (waveforms, spectrograms, pitch) and includes a signal-labeling utility.
Esps/Waves pros: powerful, designed for big files. Cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped.
Praat: developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam. A general-purpose speech tool: editing, segmentation and labeling, prosodic manipulation.
Praat pros: designed for speech analysis (not just sound editing or spectrogram display), nice GUI, scripting, active development and community, prosodic manipulation. Cons: limited scripting language; transcription and pitch files use Praat's native formats.
WaveSurfer: open-source tool for sound visualization and manipulation, speech/sound analysis, and sound annotation/transcription. It is also a platform for more advanced or specialized applications, either by extending WaveSurfer with custom plug-ins or by embedding its visualization components in other applications. Requires the Snack toolkit.
Transcriber: authors C. Barras, E. Geoffrois. Relies on Snack (Tcl/Tk). Good for annotation; nice, simple GUI; no speech analysis.
OGI speech tools / CSLU Toolkit
Development started in 1992, in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI. Includes:
* an X Windows display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels and other information;
* a set of C library routines (LIBNSPEECH);
* utilities for converting file formats, filtering, neural-network training, vector quantization, and a database utility to automate speech-database queries;
* a set of Perl scripts, used mainly to automate the use of the OGI Speech Tools;
* man pages.
RAD (rapid application development), with points of entry at the package (C), script (Tcl) and GUI (Tk) levels. Free for research use.
Summary: yes, but it requires some development work.