Document Classification using Deep Belief Nets Lawrence McAfee 6/9/08 CS224n, Spring ‘08
Overview
• Corpus: Wikipedia XML Corpus
• Single-labeled data – each document falls under a single category
• Binary feature vectors
• Bag-of-words
• ‘1’ indicates the word occurred one or more times in the document
[Figure: documents such as Doc#1 “Food”, Doc#2 “Brazil”, Doc#3 “Presidents” converted to feature vectors and fed into a classifier]
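As a concrete illustration, below is a minimal Python sketch of the binary bag-of-words features described on this slide. The toy vocabulary and document are made up for the example, not taken from the Wikipedia corpus.

```python
# Minimal sketch of binary bag-of-words features.
# Vocabulary and document are illustrative, not from the original corpus.
import numpy as np

vocab = ["food", "recipe", "brazil", "president", "election"]
word_index = {w: i for i, w in enumerate(vocab)}

def binary_bow(document, word_index):
    """Return a 0/1 vector: 1 if the word occurs at least once."""
    vec = np.zeros(len(word_index), dtype=np.int8)
    for word in document.lower().split():
        if word in word_index:
            vec[word_index[word]] = 1   # frequency information is discarded
    return vec

doc = "recipe recipe food food food"
print(binary_bow(doc, word_index))      # [1 1 0 0 0]
```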
Background on Deep Belief Nets
• Unsupervised, clustering training algorithm
[Figure: stack of RBMs – the training data feeds RBM 1, which learns features/basis vectors for the training data; RBM 2 learns higher-level features; RBM 3 learns very abstract features]
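A rough Python sketch of the greedy layer-wise stacking pictured above: each RBM is trained on the hidden activations of the layer below. Layer sizes and data are illustrative, and the per-RBM training step is stubbed here; a contrastive-divergence sketch follows the next slide.

```python
# Sketch of greedy layer-wise DBN pre-training (shapes and sizes illustrative).
# `train_rbm` stands in for contrastive-divergence training, sketched after
# the next slide; here it is stubbed so the stacking logic is runnable.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden):
    """Stub: returns randomly initialised weights and hidden biases."""
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    b = np.zeros(n_hidden)
    return W, b

layer_sizes = [200, 100, 50]   # hidden sizes for RBM 1, 2, 3 (illustrative)
data = np.random.default_rng(1).integers(0, 2, size=(64, 500)).astype(float)

layers = []
inputs = data
for n_hidden in layer_sizes:
    W, b = train_rbm(inputs, n_hidden)   # train this RBM on the layer below
    layers.append((W, b))
    inputs = sigmoid(inputs @ W + b)     # hidden activations become the
                                         # training data for the next RBM
```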
Inside an RBM
• Goal in training an RBM is to minimize the energy of configurations corresponding to the input data
• Train the RBM by repeatedly sampling hidden and visible units for a given data input
[Figures: bipartite graph of visible units i (the input/training data) and hidden units j; energy vs. configuration (v,h) plot with low-energy minima at training inputs such as “Golf” and “Cycling”]
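Below is a minimal Python sketch of one contrastive-divergence (CD-1) update for a binary RBM, following the “repeatedly sample hidden and visible units” recipe above; the energy being lowered is the standard E(v,h) = -v·Wh - b_v·v - b_h·h. All sizes, the data batch, and the learning rate are illustrative.

```python
# One CD-1 update for a binary-binary RBM (sizes and data illustrative).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 500, 200, 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)

# a batch of binary bag-of-words vectors
v0 = rng.integers(0, 2, size=(32, n_visible)).astype(float)

# positive phase: sample hidden units given the data
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

# negative phase: reconstruct visibles, then recompute hidden probabilities
p_v1 = sigmoid(h0 @ W.T + b_v)
v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
p_h1 = sigmoid(v1 @ W + b_h)

# update lowers the energy of configurations near the training data
W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]
b_v += lr * (v0 - v1).mean(axis=0)
b_h += lr * (p_h0 - p_h1).mean(axis=0)
```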
Depth
• Binary representation does not capture word-frequency information
• Inaccurate features are learned at each level of the DBN
Training Iterations
• Accuracy increases with more training iterations
• Increasing iterations may (partially) make up for learning poor features
[Figure: two energy vs. configuration (v,h) plots, with category labels “Lions” and “Tigers”]
Comparison to SVM, NB (30 categories)
• Binary features do not provide a good starting point for learning higher-level features
• Binary is still useful, as 22% is better than random (chance is roughly 3% with 30 categories)
• Training time: DBN 2 h 13 min; SVM 4 s; NB 3 s
Lowercasing
• Supposedly yields a richer vocabulary when lowercasing
• Overfitting: we don’t need these extra words
• Other experiments show only the top 500 words are relevant
Suggestions for Improvement
• Use appropriate continuous-valued neurons
• Linear or Gaussian neurons
• Slower to train
• Not much documentation on using continuous-valued neurons with RBMs
• Implement backpropagation to fine-tune weights and biases (sketched below)
• Propagate error derivatives from the top-level RBM back to the inputs
• Unsupervised training gives good initial weights, while backpropagation slightly modifies weights/biases
• Backpropagation cannot be used alone, as it tends to get stuck in local optima
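A hedged Python sketch of the backpropagation fine-tuning suggested above: the pre-trained layer weights are unrolled into a feed-forward network, a softmax classifier over 30 categories is added on top, and one gradient step propagates error derivatives from the classifier back toward the inputs, slightly adjusting the pre-trained weights. All sizes, data, labels, and the learning rate are illustrative.

```python
# Sketch of supervised fine-tuning of a pre-trained DBN by backpropagation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# pretend these weights came from unsupervised pre-training
W1 = 0.01 * rng.standard_normal((500, 200)); b1 = np.zeros(200)
W2 = 0.01 * rng.standard_normal((200, 50));  b2 = np.zeros(50)
Wc = 0.01 * rng.standard_normal((50, 30));   bc = np.zeros(30)  # 30 categories

X = rng.integers(0, 2, size=(32, 500)).astype(float)   # binary feature vectors
y = rng.integers(0, 30, size=32)                        # category labels
lr = 0.1

# forward pass through the unrolled network
h1 = sigmoid(X @ W1 + b1)
h2 = sigmoid(h1 @ W2 + b2)
probs = softmax(h2 @ Wc + bc)

# backward pass (cross-entropy loss): error derivatives flow from the
# classifier back toward the inputs, slightly modifying pre-trained weights
d_out = probs.copy()
d_out[np.arange(len(y)), y] -= 1.0
d_out /= len(y)

d_h2 = (d_out @ Wc.T) * h2 * (1 - h2)
d_h1 = (d_h2 @ W2.T) * h1 * (1 - h1)

Wc -= lr * h2.T @ d_out; bc -= lr * d_out.sum(axis=0)
W2 -= lr * h1.T @ d_h2;  b2 -= lr * d_h2.sum(axis=0)
W1 -= lr * X.T  @ d_h1;  b1 -= lr * d_h1.sum(axis=0)
```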