Speech tools Jean-Philippe Goldman 03.03.2004
Two questions • What kind of data? • Which task?
What kind of data? • Speech content (noise, multiple voices, …) • Data file • Sound/transcription/pitch curve • Sampling/quantization: 16 kHz, 12 kHz, 8 kHz, 4 kHz; 8-bit • Size: 16 kHz, 16-bit = 256 kbps, about 1.9 MB/min or 115 MB/h • Format • Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere • Transcription: HTK, TIMIT, TextGrid, Phondat • Number of files
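The size figures on this slide follow directly from the sampling rate and the sample width. As a minimal sketch (the function name `pcm_size` is hypothetical, and the calculation assumes uncompressed mono PCM with decimal megabytes and no file header), the arithmetic can be checked like this:

```python
def pcm_size(sample_rate_hz, bits_per_sample, channels=1):
    """Return (kbit/s, MB per minute, MB per hour) for uncompressed PCM audio.

    Assumes decimal units (1 MB = 1,000,000 bytes) and ignores file headers.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    bytes_per_second = bits_per_second / 8
    kbps = bits_per_second / 1000
    mb_per_min = bytes_per_second * 60 / 1e6
    mb_per_hour = bytes_per_second * 3600 / 1e6
    return kbps, mb_per_min, mb_per_hour


# 16 kHz, 16-bit, mono: the case quoted on the slide
kbps, mb_min, mb_hour = pcm_size(16000, 16)
print(f"{kbps:.0f} kbps, {mb_min:.2f} MB/min, {mb_hour:.1f} MB/h")
# prints "256 kbps, 1.92 MB/min, 115.2 MB/h"
```

Halving the sampling rate or the sample width halves every figure, which is worth keeping in mind when a corpus contains many hours of recordings.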
Which task? • Visualization and editing: • Record, play, edit, mix, add effects • Analysis: • Spectral, pitch • Speech manipulation: • Filtering, mixing, adding effects, prosodic manipulation • Annotation: • Segmentation, labeling • Scripting: • Batch processing, communication with external programs • Plotting
Examples of tasks • Build stimuli for an experiment (e.g. cross-splicing) • Manage a speech database for a TTS engine • Create a prosodic database • Analyze a speech corpus from experiment recordings • Verify/correct an automatic segmentation
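To make the first task concrete: cross-splicing joins the beginning of one recording to the end of another. The sketch below is only an illustration on plain lists of samples; the function name `cross_splice` and the cut points are hypothetical. It assumes both signals share the same sampling rate, and it does nothing about choosing the cut points, which would normally be placed with care (for example at zero crossings) to avoid audible clicks.

```python
def cross_splice(samples_a, samples_b, cut_a, cut_b):
    """Return samples_a up to cut_a followed by samples_b from cut_b onward.

    cut_a and cut_b are sample indices; both signals are assumed to have
    the same sampling rate.
    """
    if not 0 <= cut_a <= len(samples_a):
        raise ValueError("cut_a is outside samples_a")
    if not 0 <= cut_b <= len(samples_b):
        raise ValueError("cut_b is outside samples_b")
    return list(samples_a[:cut_a]) + list(samples_b[cut_b:])


# Toy signals standing in for two recorded words
word_a = [1, 2, 3, 4]
word_b = [10, 20, 30, 40]
stimulus = cross_splice(word_a, word_b, cut_a=2, cut_b=2)
print(stimulus)  # prints [1, 2, 30, 40]
```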
Two questions • What kind of data? • Which task? Two rules • No single tool does everything • There are plenty of ways to do any one thing
Tool features • Visualization/editing • Analysis • Speech manipulation • Annotation • Scripting • Plotting • Supported formats • Platform/installation • Evolution/community • Accessibility • Price
Software • GoldWave (audio editor) • ESPS/xwaves (routines + visualization) • Praat (speech analysis) • WaveSurfer (speech editor) • Transcriber (annotation tool) • Matlab (general-purpose software) • OGI speech tools (routines + application development) • … WinPitch, PitchWorks, Phonedit, CoolEdit …
Goldwave • self-defined as “top rated, professional digital audio editor”
Goldwave • pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chains, easy interface • cons: nothing speech-specific (pitch, formants), Windows only, no scripting • Good for file editing, not for speech analysis
Esps - Waves • Developed by Entropic and AT&T; now publicly available • The comp.speech FAQ says: • ESPS: a comprehensive set of speech analysis/processing tools • Waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch) that includes a signal labeling utility
Esps – waves • pros: powerful, designed for big files • cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped
Praat • Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam • General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
Praat • pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation • cons: limited scripting language, its own native formats for transcription and pitch files
WaveSurfer • Open-source tool for sound visualization and manipulation • Speech/sound analysis and sound annotation/transcription • Platform for more advanced/specialized applications: WaveSurfer can be extended with custom plug-ins, or its visualization components embedded in other applications • Requires the Snack Sound Toolkit
Transcriber • Authors: C. Barras, E. Geoffrois • Relies on Snack (Tcl/Tk) • Good for annotation • Nice, simple GUI • No speech analysis
Matlab (Mathworks) • Mathematical environment • Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction • VOICEBOX (2002), mike.brookes@ic.ac.uk • Pitch determination algorithm (2002), Xuejing Sun, sunxj@northwestern.edu • COLEA speech editor (1998), Philip Loizou, loizou@utdallas.edu, Univ. of Texas-Dallas
Matlab (Mathworks) • pros: open, powerful, scripting, excellent plotting • cons: small speech community, few standards, not designed for big files
OGI speech tools/CSLU Toolkit • Development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI • Includes: • An X Window display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels, and other information • A set of C library routines (LIBNSPEECH): utilities for converting file formats, filtering, neural network training, vector quantization, and a database utility to automate speech-database queries • A set of Perl scripts, used mainly to automate the use of the OGI Speech Tools • Man pages • RAD (rapid application development) • Points of entry: package (C), script (Tcl), and GUI (Tk) levels • Free for research use
Summary • [table not recovered; surviving legend entry: "yes, but requires some development"]
Expect to do conversions • Sound files • GoldWave (Windows) • SoX (Unix) • Transcription files • Scripts to convert text-formatted label files
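As an illustration of the kind of label-conversion script meant here, the sketch below turns HTK-style label lines into TIMIT-style ones. It relies on two format facts: HTK label times are expressed in units of 100 ns, and TIMIT label files give boundaries as sample indices. Everything else is an assumption made for the example: the function name `htk_to_timit` is hypothetical, the input is assumed to have one `start end label` entry per line, any extra fields (scores, additional levels) are ignored, and the sampling rate defaults to 16 kHz.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def htk_to_timit(htk_text, sample_rate_hz=16000):
    """Convert 'start end label' lines from HTK time units to sample indices.

    Lines with fewer than three fields are skipped; fields after the label
    are ignored. Returns the converted label file as a single string.
    """
    converted = []
    for line in htk_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        start_sample = round(start * sample_rate_hz / HTK_UNITS_PER_SECOND)
        end_sample = round(end * sample_rate_hz / HTK_UNITS_PER_SECOND)
        converted.append(f"{start_sample} {end_sample} {label}")
    return "\n".join(converted)


# Two segments: 0-0.25 s of silence, then 0.25-0.40 s of a vowel
htk_labels = "0 2500000 sil\n2500000 4000000 ah"
print(htk_to_timit(htk_labels))
# prints:
# 0 4000 sil
# 4000 6400 ah
```

Other pairs of text-based label formats can be handled the same way: parse each line into start, end, and label, rescale the times, and write them out in the target layout.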
Links • www.goldwave.com • www.speech.kth.se/software/#esps • www.praat.org • www.speech.kth.se/software/#wavesurfer • www.cse.ogi.edu/toolkit • www.mathworks.com (Matlab) • www.lpl.univ-aix.fr/~sqlab/ (Phonedit) • www.sciconrd.com/pworks.htm (PitchWorks) • www.winpitch.com (WinPitch) • www.adobe.com (CoolEdit, now Audition)