Unsupervised feature learning for audio classification using convolutional deep belief networks

Unsupervised feature learning for audio classification using convolutional deep belief networks Honglak Lee, Yan Largman, Peter Pham and Andrew Y. Ng Presented by Bo Chen, 5.7,2010

Outline • 1. What’s Deep Learning? • 2. Why use Deep Learning? • 3. Foundations of Deep Learning • 4. Convolutional Deep Belief Networks • 5. Results

Deep Architecture • Deep architectures: compositions of many layers of adaptive non-linear components. Difficulty: parameter searching (local minima) • Deep belief nets: probabilistic generative models that are composed of multiple layers of stochastic, latent variables. (Hinton et al., 2006) Deep Learning Wiki

Why Use Deep Learning • Insufficient depth can hurt Usually our experiences tell us that one-layer machine only gives us a set of general dictionary elements, unless a huge number of dictionary elements. • The brain has a deep architecture • Cognitive processes seem deep • Learn a feature hierarchies or the complicated functions that can represent high-level abstractions For example, PixelsEdgletsMotifsPartsObjectsScenes Some from Yoshua Bengio’s course notes and Yann Lecun, et.al.,2010

One-layer dictionary 30 16x16 dictionary elements and reconstructed images 250 16x16 dictionary elements and reconstructed images

Restricted Boltzmann Machine Binary-valued Energy function Real-valued Contrastive divergence is used to solve the problem. (Hinton et al., 2006) Figure from R Salakhutdinov et. al.

Deep Architectures RBM in the different layers can be independently trained.

Convolutional Network Architecture Intuitively, in each layer the weight matrix will catch the most consistent ‘structures’ through all of the images. Figure from Yann LeCun et. al, 1998

3-dimensional Dictionary elements in the second layer D: the first-layer dictionary element E: the second-layer dictionary element S: the convolution of the image and the first-layer elements. The dictionary element in the second layer is a 3-dimensional matrix.

Convolutional RBM with Probabilistic Max-Pooling Layer Max-pooling Layer

Convolutional Deep Belief Networks : the weight matrix Connecting pooling unit Pk to detection unit H’l.

Results on Natural Images

Results Caltech101 Images

Unsupervised feature learning for audio classification using convolutional deep belief networks