Access audio data in real time and apply it to speech recognition Final Exam Project By Hesheng Li Instructor: Dr. Kepuska Department of Electrical and Computer Engineering
Overview • Introduction • Three models for accessing live audio data • How to capture audio data with the low-level API model • Application in speech recognition • Comparison and analysis • Conclusion
Introduction Why? How? Live audio data access has a wide range of applications!
Three models for accessing live audio data • High-level digital audio API: MCI • DirectSound • Low-level digital audio API: waveX
High-level Digital Audio API: MCI • The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files • There are two different ways to send a device a command: 1. Command message interface 2. Command string interface
Command message interface • Passing binary values and structures to an audio device is referred to as using the "command message interface" • We use the function mciSendCommand() to send commands with this approach. • Example MCI_OPEN_PARMS waveParams; waveParams.lpstrDeviceType = (LPCSTR)MCI_DEVTYPE_WAVEFORM_AUDIO; waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV"; mciSendCommand(0, MCI_OPEN, MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID, (DWORD)(LPVOID)&waveParams);
Command string interface • Passing strings to an audio device is referred to as using the "command string interface" • We use the function mciSendString() to send commands with this approach. • Example mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", NULL, 0, NULL);
MCI: some other commands: Command message interface: 1. Start recording with MCI_RECORD 2. Write data to a wave file with MCI_SAVE 3. Stop with MCI_STOP 4. Play with MCI_PLAY Command string interface: 1. Play with "play %s %s %s" 2. Stop with "stop %s %s %s"
DirectSound • Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way. Here are some other things that DirectSound makes easy: • Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration • Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound • Low-latency mixing of audio streams for rapid response • Implementing three-dimensional (3-D) sound
DirectSound • DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers. • DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.
Low-level Digital Audio API: waveX Open audio device → Prepare structures for recording → Start recording → Data processing → Close audio device → Release structures
Open Audio Device • There are several different approaches you can take, depending on how flexible you want your program to be: • Pass the value WAVE_MAPPER to open the preferred audio input/output device • Call a function to get the list of devices, then open the one you want • waveInOpen() and waveOutOpen()
EXAMPLE result = waveInOpen(&inHandle, WAVE_MAPPER, &waveFormat, (DWORD)myWindow, 0, CALLBACK_WINDOW); if (result) { printf("There was an error opening the preferred audio input device!\r\n"); }
EXAMPLE iNumDevs = waveInGetNumDevs(); for (i = 0; i < iNumDevs; i++) { if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS))) { printf("Device ID #%u: %s\r\n", i, wic.szPname); } } /* open a chosen device ID in the range 0 .. iNumDevs-1 */ result = waveInOpen(&inHandle, uDeviceID, &waveFormat, (DWORD)myWindow, 0, CALLBACK_WINDOW);
Example WAVEFORMATEX waveFormat; /* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */ waveFormat.wFormatTag = WAVE_FORMAT_PCM; waveFormat.nChannels = 2; waveFormat.nSamplesPerSec = 44100; waveFormat.wBitsPerSample = 16; waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8); waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign; waveFormat.cbSize = 0;
Recording engine (callback model): waveInAddBuffer() queues buffer1–buffer4 to the audio device; waveInStart() begins capture; when a buffer fills, the callback function receives a msg and hands the data to data processing.
Recording engine (circular buffer): buffer1–buffer4 are reused in rotation; as each buffer fills, the callback function receives a msg, the data is processed, and the buffer is returned to the audio device.
1+3+1 Three important functions: • Prepare a buffer for wave-audio input, function: waveInPrepareHeader() • Send the buffer to the audio device; when the buffer is full the application is notified, function: waveInAddBuffer() • Start recording, function: waveInStart()
Example if (MMSYSERR_NOERROR != waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR))) { printf("prepare buffer failure!"); } waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR)); waveInStart(m_hWaveIn);
Message • Windows messages: MM_WIM_DATA: this message is sent to a window when data is present in the buffer and the buffer is being returned to the application. Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN • Callback function messages: WIM_DATA: this message is sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application. Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN
Message Example • Callback message waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format, (DWORD)waveInProc, 0L, CALLBACK_FUNCTION); waveInProc(...) { switch (msg) { case WIM_OPEN: ... break; case WIM_DATA: ... break; case WIM_CLOSE: ... } } • Window message waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format, (DWORD)hWnd, 0L, CALLBACK_WINDOW);
Application in Real-time Key Word Recognition • Practical problems when we apply this model to speech recognition: • Asynchrony • Efficiency
Application in Real-time Key Word Recognition Callback function receives a msg and CALLs data processing for each filled buffer: buffer1, buffer2, buffer3, buffer4, …, buffer500.
Comparison and Analysis • MCI is the easiest model, very convenient, but offers the least amount of control ("file level") • waveX is more complicated, but allows flexible control of the audio data ("buffer level") • DirectSound is the most efficient method, but also the most complicated ("buffer level")
Conclusion • Apply MCI to the audio-document part of a video conference • Apply waveX to real-time speech recognition and also to video conferencing • DirectSound is widely used in computer game design