SPEECH & IMAGE PROCESSING (TSI/LMM - Laboratoire MultiMédia)
Contacts:
• Frédéric Chartier, Tel: +33 1 46 13 31 05, email: frederic.chartier@fr.thalesgroup.com
• Gwénaël Guilmin, Tel: +33 1 46 13 28 35, email: gwenael.guilmin@fr.thalesgroup.com
• Fax: +33 1 46 13 25 55
Missions
• Propose technical strategy, research innovation and advanced studies
• Perform advanced and feasibility studies, and deliver demonstrators and SP modules for Thales Com. products
• Maximise efficiency and synergy within Thales Com. for SP R&D
• Maintain close links with the French administration, SMEs, university laboratories and European research actors
• Provide expertise and support in its field to Thales Com. units
• Hire and train young engineers in the SP domain
• Disseminate new technologies and best practices within Thales Com.
• Represent Thales Com. within the Thales Common Efficiency Teams
Technical and Technological Challenges
[Figure: diagram centred on "Signal & Image Processing Evolutions". Labels shown:]
• Civilian Technologies
• Software Radio
• High Data Rate Radio modem
• Antenna Processing
• Wireless Telecom
• Multimedia and Internet
• SIP framework
• DSP use generalised
• Electronic Warfare
Team
• Multimedia: 11 engineers (4 experts)
• Radiocommunications: 16 engineers (5 experts)
• Sensor Processing: 26 engineers (5 experts)
• Software Development: 16 engineers (2 experts)
• 2 technicians, 1 secretary, 8 thesis students
Total: 80 persons
Active participation in the CNRS SP working group; memberships in IEEE, SEE and EURASIP
6 patents and 15 publications per year on average
Domains of expertise
• Multimedia
  • Low and very low rate speech compression
  • Watermarking
  • JPEG 2000 & Video Codec
• Modem
  • VLF, HF, VUHF and satellite modems
  • Single and multi-carrier modulations
  • Spread spectrum and CDMA
  • Source and channel coding optimisation
  • Spectral efficiency optimisation
• Antenna Processing
  • Antenna diversity & jammer-interference rejection
  • High resolution direction finding
  • Array optimisation on perturbing platforms
  • Smart antennas and SDMA
• Signal Analysis
  • Detection, numbering (energy, cyclic, high order stats., ...)
  • Recognition/identification of modulation and coding schemes
  • Blind demodulation and equalisation
  • Localisation
• Software radio
  • Digital exciters and receivers, amplifier linearisation
Speech processing
• Compression
  • Research and development activity on low and very low bit rate compression
    • LPC: 800 and 2400 bit/s
    • HSX: 1200, 2400 and 3200 bit/s
    • CELP 4.8 kbit/s and TETRA (4567 bit/s)
    • VLBR: 200 to 400 bit/s (combining recognition and synthesis)
    • Wide Band Low Bit Rate speech coder: 3200 bit/s
  • Knowledge and implementation of higher bit rate coders, but no research activity
• Voice activity detector, echo cancellation
• Noise reduction: passive pre-processing, or processing included in the vocoder
• System optimisation of channel and source coding
  • Best adaptation to the service and to the system/propagation environment
Speech processing
[Figure: MOS (scale 1 to 5) versus bit rate (axis from 1k to 64k). Coders plotted, with the year given on the slide: G711 (72), G726 (88), G 728 (92), G 729 (96), G 723-1 (96), GSM (87), FS 1016 (90), ST 4591 (02, two points), ST4209 (83), ST 4198 (87), ST 4479 (93), LPC 10 (83), HSX, WBLBR, VLBR.]
Speech processing
[Figure: indicative quality (scale 1 to 5) versus year (1970 to 2000). Coders plotted: G.711 (64 kb/s), G.721 (32 kb/s), G.728 (16 kb/s), G.729 (8 kb/s), G.723 (5.3 kb/s), HSX (2.4 kb/s), LPC 10 (2.4 kb/s). Quality levels marked: minimum quality for high-cost applications, consumer quality, minimum quality for low-cost applications.]
THC Major achievements
• Standards
  • THC coders chosen for STANAG 4479 (800 bit/s) in 1994
  • ETSI TETRA (4567 bit/s) for PMR (licensed to Motorola, Nokia, Philips/Simoco, ...)
  • Ongoing participation in NATO work on the new low bit rate coder STANAG 4591 (1200 and 2400 bit/s, with associated noise reduction)
• Products
  • LPC10e implementation within Spartacus, Syracuse, HF processor
  • Vocoder ASIC for the PR4G (LPC 800, LPC10e 2400, ACELP 4800)
  • Software vocoders for the PR4G/VS4 (LPC 800, LPC10e 2400, ACELP 4800)
  • HSX in Sawari; synthesis in a consumer pager (Info-realité) and analysis on PC; OKI (ASIC); Leo (Singapore)
  • TETRA coders in base station for ISR
  • G723.1 and G726 in ATM switch
Existing vocoders at THC
Vocoder: simulation code (For, C, FixC as labelled on the slide)
• STANAG 4479, 800 b/s: For/C/FixC
• STANAG 4198, 2400 b/s LPC: For/C/FixC
• HSX 2400 b/s: C/FixC
• HSX 1200 b/s: C/FixC
• ACELP, 4800 b/s: For/C/FixC
• TETRA, 4567 b/s: FixC
• ITU G723-1, 6.4/5.3 kb/s: C/FixC
• ITU G726, 16, 24, 32, 40 kb/s: C
• ITU G728, LD CELP 16 kb/s: C/FixC
• ITU G729, CS ACELP 8 kb/s: FixC
• GSM: C
• STANAG 4591 (2400/1200 b/s): C/FixC
[Table residue: TR target columns and the number of vocoders marked in each: PC (9), C25/C50 (4), C54x (6, three of them marked "(*)" and one marked "xS"), ASIC (3), C30/C40 (1), C62 (3), SHARC (2). The row each mark belongs to is not recoverable from the flattened slide.]
Product column, in slide order: PR4G, PHF, Sawari, Info Tel; PR4G, PHF, Spartacus, Syr. II; Aztec, Sawari, OKI; InfoTelecom, OKI; PR4G, PHF, Spartacus; Rameau, ISR; ATM switch; ATM switch; ATM switch
Cooperations
• Sherbrooke University (Canada)
  • ACELP specialists
• University of Rennes (noise reduction)
  • Hands-free telephone
• ENST Paris & ESIEE
  • Very Low Bit Rate speech coding (combining recognition and synthesis)
  • Wide Band Low Bit Rate speech coder
• Fraunhofer institute
  • MPEG II layer 3, MPEG 4 audio coders
VLBR speech Codec
• The speech encoding solution developed here targets very-low-bit-rate channels, below 400 bit/s.
• The technology could also be applied to:
  • speech recognition,
  • speaker/language identification.
VLBR speech Codec
• Very low bit rate speech coding by indexing natural speech units of variable size
• Solution based on a new concept that combines several speech processing technologies
  • Temporal Decomposition (TD) for robust segmentation of speech
  • HMM modelling for determination of speech units
  • Harmonic/Stochastic modelling for speech re-synthesis by concatenating identified speech units
• Jan Cernocky, PhD Thesis (Orsay), 1998
VLBR speech Codec: VLBR Encoder
[Figure: encoder block diagram. Input: speech signal. Blocks: Spectral Analysis, Prosody Analysis, HMM-based Recognition (with a codebook of HMM models), Determination of optimal synthesis units (DTW) (with a codebook of synthesis units), Prosody Encoding. Coder outputs: HMM index, index of synthesis unit, pitch and energy profiles.]
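The encoder diagram names DTW as the method for determining the optimal synthesis unit. As a rough illustration only, the sketch below applies textbook dynamic time warping to pick, from a small set of stored units, the one that best matches an input segment. The feature vectors, the Euclidean local distance, the step pattern, and the names `dtw_distance` and `best_unit` are assumptions made for this example; the slides do not specify any of them.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses the basic step pattern (insertion, deletion, match) and a
    Euclidean local distance. Both choices are assumptions.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def best_unit(segment, units):
    """Index of the stored unit with the lowest DTW cost to `segment`."""
    return min(range(len(units)), key=lambda k: dtw_distance(segment, units[k]))


# Toy data: one-dimensional "spectral" features (hypothetical values).
segment = [(0.0,), (1.0,), (2.0,)]
units = [
    [(5.0,), (5.0,)],
    [(0.0,), (0.0,), (1.0,), (2.0,)],  # same shape, different duration
    [(2.0,), (1.0,), (0.0,)],
]
print(best_unit(segment, units))  # 1
```

Because DTW tolerates differences in duration, the second unit matches the segment with zero cost even though it has one more frame, which is the property that makes it suitable for variable-size speech units.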
VLBR speech Codec: VLBR Decoder
[Figure: decoder block diagram. Decoder inputs: HMM index, index of synthesis unit, pitch and energy profiles. Blocks: Prosody Decoding, Extraction of synthesis units (from the codebook of synthesis units), HNM Synthesis. Output: synthesised speech signal.]
WLBR speech Codec
• WLBR speech codec algorithms
  • Parametric wideband speech coder (50 Hz to 7000 Hz)
  • Bit rate: below 4 kbit/s (3200 bit/s and 3600 bit/s)
• Wideband speech pre-processing
  • Noise reduction, spectral compression, temporal speed modification
  • Voice activity detection
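To give a feel for what these bit rates mean per frame, the short calculation below combines them with the 360-sample frame length quoted on a later slide of this deck. The 16 kHz sampling rate is an assumption (it is the usual choice for 50 Hz to 7000 Hz wideband speech, but the slides do not state it), and the function names are hypothetical.

```python
from fractions import Fraction

FRAME_LEN = 360        # samples per frame, as quoted later in the deck
SAMPLE_RATE = 16000    # Hz, ASSUMPTION: not stated on the slides


def frame_duration_ms(frame_len=FRAME_LEN, fs=SAMPLE_RATE):
    """Frame duration in milliseconds, as an exact fraction."""
    return Fraction(frame_len * 1000, fs)


def bits_per_frame(bit_rate, frame_len=FRAME_LEN, fs=SAMPLE_RATE):
    """Number of bits available to encode one frame at `bit_rate` bit/s."""
    return Fraction(bit_rate * frame_len, fs)


print(float(frame_duration_ms()))   # 22.5
for rate in (3200, 3600):
    print(rate, bits_per_frame(rate))
```

Under that assumption each frame lasts 22.5 ms, so the coder has 72 bits per frame at 3200 bit/s and 81 bits per frame at 3600 bit/s to describe the spectral envelope, voicing, pitch and energy.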
WLBR speech Codec
• Value for professional applications:
  • Offers a product advantage (no coder of this type exists yet); the target customers are very interested in this kind of improvement.
  • The bit rate remains compatible with HF/VUHF networks.
  • A "simple" evolution of the HSX coder (well-mastered implementation, fixed-point C available).
• Value for civilian applications:
  • The only existing civilian WB standard (AMR WB) has a bit rate above 10 kbit/s. Users will increasingly demand WB quality.
  • Our product offering: a proprietary very low bit rate WB coder. Potential markets: web portals, digital recorders, PDAs, digital radio.
WLBR speech Codec
• Wideband coding (0-7 kHz)
  • Improved perceived quality
  • Helps discriminate fricatives
  • Enhanced intelligibility
• Full-band extension
  • Parametric model over the whole band (AR, order 16)
• Algorithmic choices
  • Frame length: 360 samples
  • Voicing over 0-4 kHz
  • 4 cut-off frequencies
  • Unvoiced high band
[Figure: spectrum sketch from 0 to 7 kHz showing the order-16 model and a cut-off frequency fc.]
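The slide specifies an order-16 AR model over 360-sample frames but not how the coefficients are estimated. A common way to fit such a model is the autocorrelation method with the Levinson-Durbin recursion; the sketch below shows that standard technique, not necessarily the one used in the coder. The 16 kHz sampling rate, the Hamming window, the synthetic test frame and all function names are assumptions made for illustration.

```python
import math
import random

ORDER = 16          # AR order, from the slide
FRAME_LEN = 360     # samples per frame, from the slide
FS = 16000          # Hz, ASSUMPTION: not stated on the slide


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1, so that the prediction of x[t] is
    -sum(a[k] * x[t - k] for k in 1..order); err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        k_i = -acc / err                      # reflection coefficient
        new_a = a[:]
        for k in range(1, i):
            new_a[k] = a[k] + k_i * a[i - k]
        new_a[i] = k_i
        a = new_a
        err *= 1.0 - k_i * k_i
    return a, err


# Synthetic frame (hypothetical): two tones plus a little noise, Hamming-windowed.
random.seed(0)
frame = [
    (math.sin(2 * math.pi * 500 * t / FS)
     + 0.5 * math.sin(2 * math.pi * 3000 * t / FS)
     + 0.1 * random.gauss(0.0, 1.0))
    * (0.54 - 0.46 * math.cos(2 * math.pi * t / (FRAME_LEN - 1)))
    for t in range(FRAME_LEN)
]

r = autocorrelation(frame, ORDER)
a, err = levinson_durbin(r, ORDER)
print(len(a) - 1, err / r[0])  # model order and normalised residual energy
```

The 17 values in `a` describe the spectral envelope of the frame; in a parametric coder these would typically be converted to a quantisation-friendly form before transmission, a step the slides do not detail.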