
ALTERNATE LAYER SPARSITY & INTERMEDIATE FINE-TUNING FOR DEEP AUTOENCODERS


Presentation Transcript


  1. ALTERNATE LAYER SPARSITY & INTERMEDIATE FINE-TUNING FOR DEEP AUTOENCODERS • Submitted by: Ankit Bhutani (Y9227094) • Supervised by: Prof. Amitabha Mukerjee, Prof. K S Venkatesh

  2. AUTOENCODERS • AUTO-ASSOCIATIVE NEURAL NETWORKS • OUTPUT SIMILAR TO THE INPUT
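A minimal NumPy sketch of the auto-associative idea (not taken from the slides; all sizes and names are illustrative): the network is trained so that its output reproduces its own input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative sizes: 784 inputs (e.g. MNIST pixels), 30 hidden units.
n_in, n_hid = 784, 30
W_enc = rng.normal(scale=0.01, size=(n_in, n_hid))
W_dec = rng.normal(scale=0.01, size=(n_hid, n_in))
b_enc, b_dec = np.zeros(n_hid), np.zeros(n_in)

def reconstruct(x):
    h = sigmoid(x @ W_enc + b_enc)      # encoder
    z = sigmoid(h @ W_dec + b_dec)      # decoder
    return z

x = rng.random((5, n_in))               # dummy batch
z = reconstruct(x)
# Training (not shown) minimises a reconstruction loss such as ||x - z||^2,
# i.e. the target of the network is its own input.
```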

  3. DIMENSIONALITY REDUCTION • BOTTLENECK CONSTRAINT • LINEAR ACTIVATION – PCA [Baldi et al., 1989] • NON-LINEAR PCA [Kramer, 1991] – 5-LAYER NETWORK • ALTERNATE SIGMOID AND LINEAR ACTIVATIONS • EXTRACTS NON-LINEAR FACTORS
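A sketch of the five-layer non-linear PCA architecture the slide attributes to Kramer (1991), with sigmoid and linear activations alternating around a low-dimensional linear bottleneck; the layer sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# input -> mapping (sigmoid) -> bottleneck (linear) -> demapping (sigmoid) -> output (linear)
sizes = [784, 100, 10, 100, 784]          # illustrative: 10-unit bottleneck
acts  = [sigmoid, lambda a: a, sigmoid, lambda a: a]
Ws = [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    h = x
    for W, b, act in zip(Ws, bs, acts):
        h = act(h @ W + b)
    return h

x = rng.random((3, 784))
print(forward(x).shape)                   # (3, 784): reconstruction of the input
```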

  4. ADVANTAGES OF NETWORKS WITH MULTIPLE LAYERS • ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS • TACKLE THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA • HIERARCHICAL REPRESENTATION • RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYER NETWORK WOULD NEED AN EXPONENTIALLY LARGE NUMBER OF HIDDEN UNITS

  5. PROBLEMS WITH DEEP NETWORKS • DIFFICULTY IN TRAINING DEEP NETWORKS • NON-CONVEX NATURE OF THE OPTIMIZATION • GETS STUCK IN LOCAL MINIMA • VANISHING GRADIENTS DURING BACKPROPAGATION • SOLUTION • "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" – [Hinton et al., 2006] • GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING

  6. HOW TO TRAIN DEEP NETWORKS? • PRE-TRAINING • INCREMENTAL LAYER-WISE TRAINING • EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN LAYER ACTIVATIONS OF THE PREVIOUS LAYER
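A hedged sketch of the incremental layer-wise scheme: each new layer is trained only on the hidden activations produced by the layer below. The train_shallow_layer helper is hypothetical and stands for any single-layer trainer (an RBM or a shallow autoencoder).

```python
import numpy as np

def train_shallow_layer(data, n_hidden):
    """Hypothetical single-layer trainer: returns (W, b, hidden_activations).
    In practice this would run contrastive divergence for an RBM or
    backpropagation for a shallow autoencoder; here it is only a placeholder."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    b = np.zeros(n_hidden)
    h = 1.0 / (1.0 + np.exp(-(data @ W + b)))   # sigmoid hidden activations
    return W, b, h

def greedy_pretrain(x, layer_sizes):
    """Each layer is trained only on the activations of the previous layer."""
    weights, biases, current = [], [], x
    for n_hidden in layer_sizes:
        W, b, current = train_shallow_layer(current, n_hidden)
        weights.append(W)
        biases.append(b)
    return weights, biases

x = np.random.default_rng(1).random((100, 784))   # dummy data
Ws, bs = greedy_pretrain(x, [500, 250, 30])       # illustrative layer sizes
```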

  7. FINE-TUNING • INITIALIZE THE AUTOENCODER WITH WEIGHTS LEARNT BY PRE-TRAINING • PERFORM BACKPROPAGATION AS USUAL
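A sketch of how the pre-trained weights might initialize the deep autoencoder before backpropagation; tying the decoder to the transposed encoder weights is a common choice assumed here, not something the slides state.

```python
import numpy as np

def unroll(pretrained_Ws, pretrained_bs):
    """Build the deep autoencoder's initial weights from the pre-trained stack.
    Assumption: the decoder layers start from the transposed encoder weights."""
    encoder = list(zip(pretrained_Ws, pretrained_bs))
    decoder = [(W.T, np.zeros(W.shape[0])) for W, _ in reversed(encoder)]
    return encoder + decoder        # encoder layers followed by mirrored decoder

# Illustrative usage with dummy pre-trained weights (784 -> 500 -> 30 encoder).
Ws = [np.zeros((784, 500)), np.zeros((500, 30))]
bs = [np.zeros(500), np.zeros(30)]
stack = unroll(Ws, bs)              # backpropagation then trains this whole stack
```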

  8. MODELS USED FOR PRE-TRAINING • STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs) • HIDDEN LAYER ACTIVATIONS (IN [0, 1]) ARE USED AS PROBABILITIES FOR SAMPLING BINARY 0/1 STATES • THE MODEL LEARNS THE JOINT DISTRIBUTION OF TWO BINARY VECTORS – ONE IN THE INPUT LAYER AND THE OTHER IN THE HIDDEN LAYER • EXACT METHODS – COMPUTATIONALLY INTRACTABLE • NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
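A minimal NumPy sketch of one contrastive-divergence (CD-1) update for a binary RBM, matching the slide's description of sampling 0/1 hidden states from their activation probabilities; the learning rate and layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

n_vis, n_hid, lr = 784, 500, 0.1
W = rng.normal(scale=0.01, size=(n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

def cd1_step(v0):
    # Positive phase: sample binary hidden states from their probabilities.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: reconstruct the visibles, recompute hidden probabilities.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Contrastive-divergence approximation to the log-likelihood gradient.
    dW = v0.T @ h0_prob - v1_prob.T @ h1_prob
    return dW / len(v0)

v = (rng.random((20, n_vis)) > 0.5).astype(float)   # dummy binary batch
W += lr * cd1_step(v)
```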

  9. MODELS USED FOR PRE-TRAINING • DETERMINISTIC – SHALLOW AUTOENCODERS • HIDDEN LAYER ACTIVATIONS (IN [0, 1]) ARE USED DIRECTLY AS INPUT TO THE NEXT LAYER • TRAINED BY BACKPROPAGATION • DENOISING AUTOENCODERS • CONTRACTIVE AUTOENCODERS • SPARSE AUTOENCODERS
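As one concrete example of the deterministic modules listed above, here is a hedged sketch of the denoising variant: the input is corrupted, but the reconstruction target is the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, level=0.3):
    """Masking noise (a common choice): randomly zero a fraction of the inputs."""
    mask = rng.random(x.shape) > level
    return x * mask

# Training pairs for a denoising autoencoder: the input is the corrupted
# version, but the reconstruction target is the original, clean x.
x_clean = rng.random((10, 784))
x_noisy = corrupt(x_clean)
# loss = cross_entropy(reconstruct(x_noisy), x_clean)   # trained by backprop
```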

  10. CLASSIFIERS & AUTOENCODERS

  11. DATASETS • MNIST • Big and Small Digits

  12. DATASETS • Square & Room • 2d Robot Arm • 3d Robot Arm

  13. Libraries used • Numpy, Scipy • Theano – takes care of parallelization • GPU Specifications (Tesla C1060) • Memory – 256 MB • Frequency – 33 MHz • Number of Cores – 240
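A minimal example of the Theano usage implied by the slide: the model is written as a symbolic expression, Theano derives the gradients and compiles the graph, and the same code can run on the GPU via THEANO_FLAGS (e.g. device=gpu). The tied-weight autoencoder shown here is only an illustration.

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
rng = np.random.default_rng(0)

x = T.matrix('x')                                   # symbolic mini-batch
W = theano.shared(rng.normal(scale=0.01, size=(784, 30)).astype(floatX), name='W')
b = theano.shared(np.zeros(30, dtype=floatX), name='b')

h = T.nnet.sigmoid(T.dot(x, W) + b)                 # hidden layer
cost = T.mean((T.dot(h, W.T) - x) ** 2)             # tied-weight reconstruction
grads = T.grad(cost, [W, b])                        # automatic differentiation

# theano.function compiles the graph; with device=gpu in THEANO_FLAGS the same
# computation is executed on the GPU without code changes.
train_step = theano.function(
    [x], cost,
    updates=[(p, p - 0.1 * g) for p, g in zip([W, b], grads)])
```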

  14. MEASURE FOR PERFORMANCE • REVERSE CROSS-ENTROPY • X – Original input • Z – Output • Θ – Parameters – Weights and Biases
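The slide's formula itself is not reproduced in the transcript; assuming the standard cross-entropy reconstruction error between input x and output z, it would read:

```latex
L(x, z; \theta) = -\sum_{k} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right]
```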

  15. BRIDGING THE GAP • RESULTS FROM PRELIMINARY EXPERIMENTS

  16. PRELIMINARY EXPERIMENTS • TIME TAKEN FOR TRAINING • CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN

  17. SPARSITY FOR DIMENSIONALITY REDUCTION • EXPERIMENT USING SPARSE REPRESENTATIONS • STRATEGY A – BOTTLENECK • STRATEGY B – SPARSITY + BOTTLENECK • STRATEGY C – NO CONSTRAINT + BOTTLENECK
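The slides do not spell out the sparsity constraint used in strategies B and C; a common choice, assumed here, is the KL-divergence penalty that pushes each hidden unit's average activation towards a small target value ρ.

```python
import numpy as np

def kl_sparsity_penalty(hidden_acts, rho=0.05):
    """KL(rho || rho_hat) summed over hidden units, where rho_hat is the mean
    activation of each unit over the mini-batch. Added (with a weight) to the
    reconstruction cost to encourage sparse hidden codes."""
    rho_hat = np.clip(hidden_acts.mean(axis=0), 1e-6, 1 - 1e-6)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```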

  18. ALTERNATE SPARSITY
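The transcript carries no detail for this slide. Reading the title literally, one plausible realisation, sketched purely as an assumption and reusing the kl_sparsity_penalty helper from the previous sketch, is to apply the sparsity penalty only to every other hidden layer of the stack.

```python
# Hypothetical sketch: penalise sparsity only on alternate hidden layers.
def total_cost(recon_cost, hidden_layers, beta=3.0, rho=0.05):
    cost = recon_cost
    for i, h in enumerate(hidden_layers):
        if i % 2 == 0:                      # alternate layers only (assumption)
            cost += beta * kl_sparsity_penalty(h, rho)
    return cost
```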

  19. OTHER IMPROVEMENTS • MOMENTUM • INCORPORATING THE PREVIOUS UPDATE • CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION • ADDS UP COMPONENTS IN SAME DIRECTION – SPEEDS UP TRAINING • WEIGHT DECAY • REGULARIZATION • PREVENTS OVER-FITTING
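A sketch of the update rule combining the two points above: momentum accumulates the previous update, and weight decay (L2 regularization) shrinks the weights at every step. The hyperparameter values are illustrative.

```python
import numpy as np

def sgd_step(W, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One update: momentum reuses the previous step, weight decay adds an
    L2 pull of the weights towards zero (illustrative hyperparameters)."""
    velocity = momentum * velocity - lr * (grad + weight_decay * W)
    return W + velocity, velocity

W = np.zeros((5, 3))
velocity = np.zeros_like(W)
grad = np.ones_like(W)                  # dummy gradient
W, velocity = sgd_step(W, grad, velocity)
```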

  20. COMBINING ALL • USING ALTERNATE LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS BEST RESULTS

  21. INTERMEDIATE FINE-TUNING • MOTIVATION

  22. PROCESS

  23. PROCESS

  24. RESULTS

  25. RESULTS

  26. RESULTS

  27. CONCLUDING REMARKS

  28. NEURAL NETWORK BASICS

  29. BACKPROPAGATION

  30. RBM

  31. RBM

  32. AUTOENCODERS
