Robust Audio Identification for Commercial Applications
Matthias Gruhne, ghe@emt.iis.fhg.de
Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany

Overview
• What is AudioID?
• Requirements
• System Architecture
• MPEG-7
• Recognition Performance
• Applications
• Conclusions
• Demonstration

What is AudioID?
Purpose:
• Identify audio material (artist, song, etc.) by analysis of the signal itself
• “Content-Based Identification”
Conditions:
• No associated information (headers, ID3 tags) is required
• No embedded signals (e.g. watermarks) are required
• Some knowledge about the music to be identified must be available (reference database)

Requirements
Recognition rate:
• High recognition rates (> 95%), even with distorted signals
Robustness:
• Robust against various distortions:
  • volume change, equalization, noise addition, audio coding (e.g. MP3), ...
  • “analog” artifacts (e.g. D/A, A/D conversion)
Compactness:
• Small “signature” size
Scalability:
• Extensibility of the database (> 10^6 items) while keeping processing time low (a few ms per item)

System Architecture
FeatureExtractor:
• Signal preprocessing
• Extract the “essence” of the audio signal
FeatureProcessor:
• Increase discriminance & efficiency
• Temporal grouping of features (super vector)
• Statistics calculation (mean, variance, etc.)

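The two FeatureProcessor steps (temporal grouping and statistics calculation) can be illustrated with a minimal sketch. The function names `group_frames` and `summarize`, the group size, and the toy frame values are assumptions made for illustration; the slides do not specify how AudioID implements these steps.

```python
from statistics import fmean, pvariance

def group_frames(frames, group_size):
    """Concatenate each run of `group_size` consecutive frame vectors
    into one longer "super vector". Leftover frames are dropped."""
    usable = len(frames) - len(frames) % group_size
    return [
        [value for frame in frames[i:i + group_size] for value in frame]
        for i in range(0, usable, group_size)
    ]

def summarize(frames):
    """Per-band mean and (population) variance over a block of frames."""
    bands = list(zip(*frames))  # transpose: one tuple of values per band
    means = [fmean(band) for band in bands]
    variances = [pvariance(band) for band in bands]
    return means, variances

# Toy input: 4 frames with 2 feature values (bands) each. Hypothetical data.
frames = [[0.1, 0.9], [0.3, 0.7], [0.2, 0.8], [0.4, 0.6]]
super_vectors = group_frames(frames, 2)
means, variances = summarize(frames)
```

Grouping keeps short-term temporal structure inside one vector, while the statistics shrink a block of frames to a few numbers per band; which of the two (or which combination) a real system uses is a design choice this sketch does not settle.
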
System Architecture
Class generator:
• Clustering of processed feature vectors:
  • further reduces the amount of data
  • enhances robustness (avoids overfitting)
• Add class with associated metadata to the database
Classification:
• Compare feature vectors against the classes in the database by means of some metric
• Find the class yielding the best approximation
• Retrieve the associated metadata

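As a rough sketch of the class generator and classification stages: each reference item is reduced to a class representative, and a query vector is matched to the nearest class. The single mean vector per item, the Euclidean metric, and the names `centroid` and `classify` are all assumptions of this sketch; the slides only say "clustering" and "some metric".

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Collapse a set of feature vectors into one representative (their mean).
    A real system would likely keep several cluster centres per item."""
    return [sum(column) / len(column) for column in zip(*vectors)]

def classify(query, database, top_n=1):
    """Return the metadata of the `top_n` classes closest to the query."""
    ranked = sorted(database, key=lambda entry: euclidean(query, entry["centroid"]))
    return [entry["metadata"] for entry in ranked[:top_n]]

# Hypothetical reference database with two items.
database = [
    {"centroid": centroid([[0.1, 0.2], [0.3, 0.2]]), "metadata": "Song A"},
    {"centroid": centroid([[0.8, 0.9], [0.6, 0.9]]), "metadata": "Song B"},
]
best = classify([0.25, 0.25], database)
ranking = classify([0.25, 0.25], database, top_n=2)
```

A linear scan like this grows with the database size; meeting the scalability requirement for very large databases would need a faster search structure, which is outside this sketch.
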
MPEG-7 - Elements for Robust Audio Matching
Low-level data:
• “AudioSpectrumFlatness” LLD (low-level descriptor)
• Derived from the Spectral Flatness Measure (SFM)
• Describes the flatness or unflatness of the spectrum in frequency bands (tonal vs. noise-like)
“Fingerprint”:
• “AudioSignature” Description Scheme
• Statistical data summarization of the “AudioSpectrumFlatness” LLD
• Textual description in XML syntax

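The Spectral Flatness Measure is the ratio of the geometric mean to the arithmetic mean of the power spectrum coefficients in a band: values near 1 indicate a flat, noise-like band, and values near 0 a tonal one. The sketch below assumes the power spectrum is already available as a list; the band edges, the `eps` guard, and the function names are illustrative choices, not the MPEG-7 reference extraction.

```python
import math

def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of the power values."""
    eps = 1e-12  # assumption of this sketch: guards against log(0) and 0/0
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / (arithmetic + eps)

def band_flatness(power_spectrum, band_edges):
    """Flatness per band; band i covers bins band_edges[i]..band_edges[i+1]-1."""
    return [
        spectral_flatness(power_spectrum[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]

# Hypothetical 8-bin power spectrum: a flat band followed by a band
# dominated by a single strong coefficient.
spectrum = [1.0, 1.0, 1.0, 1.0, 4.0, 1e-6, 1e-6, 1e-6]
flat_band, tonal_band = band_flatness(spectrum, [0, 4, 8])
```

Because the measure is a ratio of two means of the same values, scaling the whole band by a constant gain leaves it essentially unchanged, which fits the robustness to volume change listed under Requirements.
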
MPEG-7 - Benefits
• Standardized feature format guarantees worldwide interoperability
• Published, open format: descriptive data can be produced easily
• Large MPEG-7-compliant databases (incl. “fingerprints”) are expected to become available in the near future
• Long-term format stability / lifetime

Recognition Performance - Conditions
Conditions:
• Training and test sets (mostly rock / pop):
  • 15,000 items
  • 90,000 items
Considered feature:
• Spectral Flatness Measure (SFM)
Classification performance:
• Number of correctly identified items (both “single best” and “within top 10”)

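The two performance figures ("single best" and "within top 10") can be computed from ranked classifier output as sketched below. The function name `recognition_rates` and the toy result list are assumptions for illustration only; they are not measurements from the evaluation described in these slides.

```python
def recognition_rates(results, n=10):
    """`results` is a list of (true_id, ranked_candidate_ids) pairs.
    Returns (top-1 rate, top-n rate) as fractions between 0 and 1."""
    top1 = sum(1 for truth, ranked in results if ranked[:1] == [truth])
    topn = sum(1 for truth, ranked in results if truth in ranked[:n])
    return top1 / len(results), topn / len(results)

# Made-up classifier output for four test items.
results = [
    ("a", ["a", "b", "c"]),  # correct as single best
    ("b", ["c", "b", "a"]),  # correct only within the candidate list
    ("c", ["a", "b", "d"]),  # missed entirely
    ("d", ["d", "a", "b"]),  # correct as single best
]
top1_rate, top10_rate = recognition_rates(results)
```
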
Top 1 / Top 10 Recognition Performance - 15k items
• 16 bands
• Advanced matching with temporal tracking

Recognition Performance - 90k items
• 16 bands
• Advanced matching with temporal tracking

Applications
• Retrieve associated metadata by identifying audio content
• Automated search for audio content on the Internet
• Broadcast monitoring by logging the transmission of audio material
• Feature-based indexing of audio databases (similarity search)
• ...

Conclusions
• High recognition rates (> 99%, tested with 90,000 items)
• Robust to “real world” signal distortions
• Fast and reliable extraction and classification
• The underlying feature is specified in the MPEG-7 standard, which ensures worldwide interoperability; licensing is available to everyone

Real-Time Demonstration
• Demo running on a laptop (Pentium III @ 500 MHz)
• Local database with 15,000 items (rock / pop genre)
• Acoustic transmission: MP3 -> D/A -> speakers -> noisy environment -> microphone -> A/D -> AudioID