Predicting Signal Peptides using Deep Neural Networks
Cecilie Anker, Casper Sønderby and Søren Sønderby
02459 MACHINE LEARNING FOR SIGNAL PROCESSING, DTU COMPUTE, SPRING 2013
Purpose
• Identify cleavage sites in signal peptides
• Compare SignalP 4.0 neural networks with deep neural networks
Cleavage Site
Sequence: TTGNGLFINESAKLVDTFLEDVKNLHHSKAFSINFRDAEEAK
Labels:   SSSSSSSSSSSSSSSC....................TTTT..
• Slide window across sequence
• Encoding: ...000001000010000000000001000000010000000001...
• Input window feeds the neural network
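The sliding-window encoding on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the 20-letter alphabet order, the window half-width of 2, the `-` padding character, and the function names `one_hot` and `encode_windows` are all assumptions made for the example. The slide only shows that a window slides across the sequence and is turned into a sparse binary input vector.

```python
# Assumed alphabet: the 20 standard amino acids in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-element 0/1 list; padding or unknown residues map to all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1
    return vec


def encode_windows(sequence, half_width=2, pad="-"):
    """Yield (center_position, flat_input_vector) for every residue in the sequence.

    The sequence is padded on both sides so that every residue can sit at the
    center of a window of 2 * half_width + 1 residues (an assumed convention).
    """
    padded = pad * half_width + sequence + pad * half_width
    window = 2 * half_width + 1
    for pos in range(len(sequence)):
        residues = padded[pos:pos + window]
        flat = [bit for r in residues for bit in one_hot(r)]
        yield pos, flat


# Example sequence taken from the slide.
sequence = "TTGNGLFINESAKLVDTFLEDVKNLHHSKAFSINFRDAEEAK"
windows = list(encode_windows(sequence, half_width=2))
```

Each window becomes one input vector of length 20 × window size, and the network's target for that vector would be the label of the center residue.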
Dataset
• SignalP 4.0 data [1]
  • With signal peptides (n = 1640)
  • Nuclear (n = 5133)
  • With transmembrane region (n = 687)
Dataset
[Figure: dataset samples, axis in thousands]
Prediction Model
• DNN 5-ensemble
• HMM-DNN hybrid model
• [Figure: DNN model output]
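One common way to combine an ensemble of networks is to average their per-position outputs and pick the highest-scoring position. The slide does not say how the five DNNs were combined, so plain averaging is an assumption here, as are the function names and the made-up probabilities in the example.

```python
def ensemble_average(member_outputs):
    """Average per-position probabilities across ensemble members.

    member_outputs: list of equal-length lists, one list per network.
    """
    n_members = len(member_outputs)
    n_positions = len(member_outputs[0])
    if any(len(m) != n_positions for m in member_outputs):
        raise ValueError("all ensemble members must score the same positions")
    return [
        sum(m[i] for m in member_outputs) / n_members
        for i in range(n_positions)
    ]


def predict_cleavage_site(member_outputs):
    """Return the index of the position with the highest averaged score."""
    avg = ensemble_average(member_outputs)
    return max(range(len(avg)), key=avg.__getitem__)


# Illustrative, made-up cleavage-site probabilities from five networks
# for a four-residue stretch (not results from the slides).
outputs = [
    [0.1, 0.2, 0.7, 0.1],
    [0.2, 0.1, 0.6, 0.2],
    [0.1, 0.6, 0.3, 0.1],
    [0.0, 0.2, 0.8, 0.1],
    [0.1, 0.1, 0.6, 0.0],
]
averaged = ensemble_average(outputs)
site = predict_cleavage_site(outputs)
```

Averaging lets a single network's outlier (the third member above) be outvoted by the others.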
Training
• Backpropagation
• Dropout
• L2-norm regularization
• Early stopping
• Decaying learning rate
• Momentum
• Minibatches
• No pretraining
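The training ingredients listed above can be shown together in a small sketch. To stay self-contained it trains a single logistic unit on toy data rather than a deep network, so it illustrates the update rules only and is not the authors' setup. All hyperparameter values, the exponential decay schedule, the toy dataset, and the function names are assumptions.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def dropout(x, rate, rng):
    """Inverted dropout: zero each input with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def loss(w, data):
    """Mean cross-entropy, evaluated without dropout."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(train_set, valid_set, n_features, epochs=50, batch_size=4,
          lr0=0.5, decay=0.95, momentum=0.9, l2=1e-3, drop_rate=0.2,
          patience=5, seed=0):
    rng = random.Random(seed)
    data = list(train_set)
    w = [0.0] * n_features
    velocity = [0.0] * n_features
    best_w, best_loss, bad_epochs = list(w), loss(w, valid_set), 0
    for epoch in range(epochs):
        lr = lr0 * decay ** epoch            # decaying learning rate
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):  # minibatches
            batch = data[start:start + batch_size]
            grad = [l2 * wi for wi in w]     # gradient of the L2 penalty
            for x, y in batch:
                xd = dropout(x, drop_rate, rng)
                err = predict(w, xd) - y     # gradient of cross-entropy w.r.t. z
                for j, xj in enumerate(xd):
                    grad[j] += err * xj / len(batch)
            for j in range(n_features):      # momentum update
                velocity[j] = momentum * velocity[j] - lr * grad[j]
                w[j] += velocity[j]
        val = loss(w, valid_set)
        if val < best_loss:                  # early stopping bookkeeping
            best_w, best_loss, bad_epochs = list(w), val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_w, best_loss


# Toy data (hypothetical): a bias input plus one feature, label 1 when feature > 0.
_rng = random.Random(1)
points = [_rng.uniform(-1.0, 1.0) for _ in range(40)]
dataset = [([1.0, p], 1.0 if p > 0 else 0.0) for p in points]
train_set, valid_set = dataset[:30], dataset[30:]
best_w, best_loss = train(train_set, valid_set, n_features=2)
```

Early stopping here keeps the weights with the lowest validation loss seen so far and stops after `patience` epochs without improvement, so the returned loss is never worse than that of the initial weights.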
Further work
• ReLU
• Maxout networks
• DBN + pretraining + SVM
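For reference, the two activation functions named above can be written in a few lines. This is a sketch of the standard definitions operating on a list of pre-activations; the function names and the grouping of consecutive values into maxout pieces are choices made for this example.

```python
def relu(z):
    """Rectified linear unit: max(0, z) applied elementwise."""
    return [max(0.0, zi) for zi in z]


def maxout(z, pieces):
    """Maxout: split pre-activations into consecutive groups of `pieces`
    values and output the maximum of each group."""
    if pieces <= 0 or len(z) % pieces != 0:
        raise ValueError("len(z) must be a positive multiple of pieces")
    return [max(z[i:i + pieces]) for i in range(0, len(z), pieces)]


relu_out = relu([-1.0, 0.5, 2.0])
maxout_out = maxout([-1.0, 0.5, 2.0, -3.0], pieces=2)
```

A maxout layer therefore has `pieces` times as many linear units as outputs, whereas ReLU keeps one output per unit.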
Resources
• DeepLearnToolbox (MATLAB)
• DeepLearnToolbox GPU (MATLAB)
• MATLAB script for DTU servers
• Theano (Python)
• Theano script for DTU servers
• Theano tutorial/examples
• Python GPU matrix operations (cudamat)
• Pylearn2
• Questions: Skaaesonderby@gmail.com
References
[1] Petersen, T. N., et al. (2011). SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8(10), 785-786.
[2] Hinton, G. E., et al. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
[3] Qian, N., et al. (1988). Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology, 202(4), 865-884.
[4] Nanni, L., et al. (2011). A new encoding technique for peptide classification. Expert Systems with Applications, 38(4), 3185-3191.
[5] Wu, C. H., et al. (Eds.). (2000). Neural networks and genome informatics (Vol. 1). Elsevier Science.
[6] Zamani, M., et al. (2011). Amino acid encoding schemes for machine learning methods. In Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on (pp. 327-333). IEEE.
[7] Bourlard, H., et al. (1994). Connectionist speech recognition: a hybrid approach. Springer.
[8] Palm, R. B. (2012). Prediction as a candidate for learning deep hierarchical models of data. Master's thesis.