Sound Source Separation using 3D Correlogram, Fuzzy Logic, and Neural Networks A RESEARCH PROJECT Eduardo Dias Trama
Table of Contents • INTRODUCTION • PROJECT OVERVIEW • THE PREPROCESSOR • THE LEARNING PROCESSOR • THE SEPARATION PROCESSOR • PROJECT EXPERIMENTS • CONCLUSION
INTRODUCTION • Overview of sound source separation • Sound separation methods • Related applications of sound separation
Overview of sound source separation • What is sound separation? • Psychoacoustic properties • Timbre • How can sound be modeled?
Sound separation methods • CASA (Computational Auditory Scene Analysis), Marrian • Spatial and Periodicity-and-Harmonicity • CASA: 3D Correlogram analysis • Blind source separation and prediction-driven
Related applications of sound separation • Sound and voice recognition • Noise removal • Compression
PROJECT OVERVIEW • Overview • Auditory model analysis • Sound data library and classification • Sound data matching • Complete sound separation system
Overview • What is a piano sound? • Memory • Clustering
Auditory model analysis • Properties • Grouping • Past knowledge • Correlation
Sound data library and classification • Sound memory • How much information is needed for later analysis? • Does it matter if audio data is compressed? • Structure of classification
THE PREPROCESSOR • The Cochlea Filter Model • Correlogram • 3-D Correlogram
The Cochlea Filter Model • Filtering: basilar membrane (BM) • Detection: inner hair cell (IHC) • Compression: automatic gain control (AGC) • Cochleagram
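A minimal sketch of this stage, assuming a gammatone-style filter bank as a stand-in for basilar membrane filtering, half-wave rectification for the inner hair cells, and a fixed power-law compression in place of the AGC; the channel count, centre frequencies, and filter form are illustrative and not the project's actual cochlea model.

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a gammatone filter centred at fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc                      # equivalent rectangular bandwidth
    b = 1.019 * erb
    return t**(order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def cochleagram(x, fs, n_channels=32, f_lo=100.0, f_hi=4000.0):
    """Filter -> half-wave rectify -> compress; one row per cochlear channel."""
    fcs = np.geomspace(f_lo, f_hi, n_channels)   # assumed log spacing of best frequencies
    rows = []
    for fc in fcs:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")  # basilar membrane filtering
        y = np.maximum(y, 0.0)                                  # inner-hair-cell detection (rectification)
        y = y ** 0.3                                            # crude compression standing in for AGC
        rows.append(y)
    return np.array(rows), fcs
```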
Correlogram • Short-time autocorrelations of the neural firing rates as a function of cochlear place (best frequency) versus time • Correlogram movie
Correlogram • Speech processing • Extract the formants of voiced and unvoiced sounds • Short duration • Auto-correlation window size
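As a sketch of how one frame could be computed, the snippet below takes the short-time autocorrelation of each cochlear channel over a window; the window length and maximum lag are assumed values, not the ones used in the project.

```python
import numpy as np

def correlogram_frame(cochleagram, start, win=512, max_lag=256):
    """One correlogram frame: windowed autocorrelation of each cochlear channel.

    Returns an array of shape (n_channels, max_lag): rows are cochlear place
    (best frequency), columns are lag.  start must leave at least win samples.
    """
    n_channels = cochleagram.shape[0]
    frame = np.zeros((n_channels, max_lag))
    window = np.hanning(win)
    for ch in range(n_channels):
        seg = cochleagram[ch, start:start + win] * window
        ac = np.correlate(seg, seg, mode="full")[win - 1:win - 1 + max_lag]
        frame[ch] = ac / (ac[0] + 1e-12)          # normalise so lag 0 equals 1
    return frame
```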
Correlogram Frame • Vertical axis shows low to high frequencies from bottom to top • Horizontal axis represents the lag or time delay
Correlogram Frame • Dark areas in the image show activity in the Correlogram frame • Vertical lines: cochlear channels firing in the same period
Correlogram Frame • Horizontal bands are indicators of large amounts of energy within a frequency band
3-D Correlogram • A series of Correlograms over time • Frequency information comes from a cochlea filter bank • A finite time/frequency analysis • It depends on the initial time
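A short sketch of the stacking step, assuming a per-frame function like the one above is passed in; window, hop, and lag sizes are again illustrative.

```python
import numpy as np

def correlogram_3d(cochleagram, frame_fn, win=512, hop=256, max_lag=256):
    """Stack correlogram frames over time: result is (n_frames, n_channels, max_lag).

    frame_fn is any function that turns a window of the cochleagram into a
    single (n_channels, max_lag) frame, e.g. the sketch above.
    """
    n_samples = cochleagram.shape[1]
    starts = range(0, n_samples - win, hop)
    return np.stack([frame_fn(cochleagram, s, win, max_lag) for s in starts])
```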
THE LEARNING PROCESSOR • Creating the network input • Classification • Artificial neural network fuzzy classification
Creating the network input • Responsible for learning each Correlogram frame of a selected sound • It should be exposed to many small variations of the target (selected) sound • The total number of neural nets (NN) is: NN = FB x CF
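A toy reading of this count, assuming FB is the number of filter-bank channels and CF the number of correlogram frames per learned sound (the slide does not define either symbol, so the values below are placeholders):

```python
# Hypothetical reading of NN = FB x CF: one small network per
# (filter-bank channel, correlogram frame) cell of the 3-D correlogram.
FB = 32        # filter-bank (cochlear) channels -- assumed value
CF = 40        # correlogram frames per learned sound -- assumed value

NN = FB * CF   # total number of small neural nets to train
print(f"{NN} networks, one per channel/frame cell")
```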
Classification • Class • Family • Length • Frequency range • Number of Correlogram frames • Sufficient to classify one particular sound • Make the matching process faster • Intensive parallel processing
Artificial neural network fuzzy classification • Fuzzy IF-THEN rules to describe a classifier • An adaptive-network-based fuzzy classifier to solve fuzzy classification problems • ANFIS (adaptive-network-based fuzzy inference system)
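The sketch below shows the forward pass of a zero-order Sugeno fuzzy system, the kind of rule structure ANFIS tunes; the rule centres, widths, and consequents are invented for illustration and are not the project's trained classifier.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function, common in ANFIS-style systems."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_forward(x, rules):
    """Forward pass of a zero-order Sugeno fuzzy system.

    Each rule is (centres, sigmas, consequent): IF x1 is A1 AND x2 is A2 ...
    THEN output = consequent.  Firing strengths are products of memberships.
    """
    strengths = np.array([np.prod(gauss_mf(x, c, s)) for c, s, _ in rules])
    consequents = np.array([q for _, _, q in rules])
    return np.dot(strengths, consequents) / (strengths.sum() + 1e-12)

# Toy example: two rules on a 2-D feature vector (all numbers illustrative)
rules = [
    (np.array([0.2, 0.8]), np.array([0.3, 0.3]), 1.0),   # "target class"
    (np.array([0.7, 0.1]), np.array([0.3, 0.3]), 0.0),   # "not target"
]
print(sugeno_forward(np.array([0.25, 0.75]), rules))      # close to 1.0
```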
THE SEPARATION PROCESSOR • Choosing method for sound matching • The Matching Fuzzy Logic sound library • Sound separation
Choosing method for sound matching • Preamble, search, matching and interpolation • Target and precision • Fuzzy clustering algorithms
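Fuzzy c-means is one common fuzzy clustering algorithm; a minimal version is sketched below, with the cluster count, fuzzifier m, and iteration count as assumed parameters rather than the project's settings.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns (centres, membership matrix U).

    X: (n_samples, n_features); U[i, j] is the degree to which sample i
    belongs to cluster j (each row sums to 1).  m > 1 controls fuzziness.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]          # weighted cluster centres
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                        # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centres, U
```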
The Matching Fuzzy Logic sound library • A set of fuzzy sound elements will be used for matching (FIS) • The initial values for search need to be determined by external inputs • ANFIS (Adaptive Neuro-Fuzzy Inference Systems)
Sound separation • Search, match and extract • Step 1: Input process • Step 2: Classification • Step 3: Choosing what to separate • Step 4: Dynamics and pitch extraction • Step 5: Re-synthesis
Step 1: Input process • Analog to digital conversion • Cochlea filter bank • Cochleagram • Correlogram frames • Neuro-Fuzzy input matrix
Step 3: Choosing what to separate • Rule 1: Assume that the human auditory system can recognize one or more sounds from the audio input mixture • Rule 2: One recognizable sound should be selected for separation • Rule 3: Assume that complete or partial information about the selected audio class exists in the sound library
Step 5: Re-synthesis • Re-synthesis of selected sound Correlogram frames at unit pitch • Apply dynamics to each Correlogram frame • Correlogram frame inversion
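The "apply dynamics" step might look like the sketch below, assuming the selected sound's unit-pitch correlogram frames are simply scaled by a per-frame amplitude envelope taken from the matching stage; the correlogram frame inversion back to a waveform is not sketched here.

```python
import numpy as np

def apply_dynamics(frames, envelope):
    """Scale each unit-pitch correlogram frame by an extracted amplitude envelope.

    frames:   (n_frames, n_channels, n_lags) correlogram of the library sound
    envelope: (n_frames,) per-frame gains estimated from the mixture (assumed input)
    """
    return frames * envelope[:, None, None]
```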
PROJECT EXPERIMENTS • Experiment setup • Experiment procedures • Experiment results
Experiment procedures • Recorded wave data: 5 sec. @ 44100 Hz sample rate, 16-bit resolution, two channels (stereo) • Down-sampled to 11025 Hz and mixed down to one channel (mono) • Mixed combinations without delay • Mixed combinations with 0.5 sec. delay
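A sketch of this data preparation and mixing, assuming SciPy for reading and resampling; the file names and helper functions are hypothetical, but the rates, channel handling, and 0.5 s delay follow the procedure above.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

def load_and_downsample(path):
    """Load a stereo 44.1 kHz recording, mix to mono, and resample to 11025 Hz."""
    fs, x = wavfile.read(path)                      # expects 16-bit, 44100 Hz, stereo
    x = x.astype(np.float64).mean(axis=1)           # stereo -> mono
    return resample_poly(x, up=1, down=4), 11025    # 44100 / 4 = 11025

def mix(a, b, fs, delay_s=0.0):
    """Mix two sources, optionally delaying the second by delay_s seconds."""
    b = np.concatenate([np.zeros(int(delay_s * fs)), b])
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    return a + b

# Hypothetical usage mirroring the experiment setup
# piano, fs = load_and_downsample("piano.wav")
# flute, _  = load_and_downsample("flute.wav")
# mix_no_delay   = mix(piano, flute, fs)
# mix_half_delay = mix(piano, flute, fs, delay_s=0.5)
```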
Experiment results • Single sound source • Two sound sources without delay • Two sound sources with delay • Modeling ANFIS for Correlogram frames • Correlogram frame channel training (classification) • Correlogram frame channel evaluation (matching)