How Microsoft Made Deep Learning Red-Hot in the IT Industry Zhijie Yan, Microsoft Research Asia USTC visit, May 6, 2014
Self Introduction • 鄢志杰 (Zhijie Yan) @ MSRA • 996 – studied at USTC from 1999 to 2008 • Graduate student – studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang • Intern – worked at MSR Asia from 2005 to 2006 • Visiting scholar – visited Georgia Tech in 2007 • FTE – at MSR Asia since 2008 • Research interests • Speech, deep learning, large-scale machine learning
In Today’s Talk • Deep learning has become very hot in the past few years • How Microsoft made deep learning hot in the IT industry • Deep learning basics • Why Microsoft can turn all these ideas into reality • Further reading materials
How Hot is Deep Learning • “This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton’s research group to support further work in the area of neural nets.” – U. of T. website
Microsoft Made Deep Learning Hot in the IT Industry • Initial attempts by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task • Prof. Hinton’s student visited MSR as an intern, and good results were obtained on the Microsoft Bing voice search task • MSR Asia and Redmond collaborated and got amazing results on the Switchboard task, which shocked the whole industry
Microsoft Made Deep Learning Hot in the IT Industry *figure borrowed from MSR principal researcher Li DENG
Microsoft Made Deep Learning Hot in the IT Industry • Followed by others, with the results confirmed on various speech recognition tasks • Google / IBM / Apple / Nuance / Baidu / iFlytek • Continuously advanced by MSR and others • Expanded to solve more and more problems • Image processing • Natural language processing • Search • …
Deep Learning From Speech to Image • ILSVRC-2012 competition on ImageNet • Classification task: classify an image into 1 of the 1,000 classes with your 5 bets (top-5 guesses) (example images: lifeboat, airliner, school bus)
Deep Learning Basics • Deep learning ≈ deep neural networks: a multi-layer perceptron (MLP) with a deep structure (many hidden layers) (figure: a shallow MLP with an input layer, one hidden layer, and an output layer, weights W0/W1, vs. a deep MLP with many hidden layers and weights W0 through W3)
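To make the definition concrete, here is a minimal sketch of a deep MLP forward pass in NumPy. The layer sizes and the sigmoid/softmax choices are illustrative assumptions, not the actual networks from the talk.

```python
# Minimal sketch of a deep MLP forward pass (assumed toy layer sizes).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Propagate x through nonlinear hidden layers, then a softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)            # hidden layers: linear + nonlinearity
    logits = h @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
sizes = [39, 256, 256, 256, 10]           # input, 3 hidden layers, output
weights = [rng.standard_normal((a, b)) * 0.1
           for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
print(mlp_forward(rng.standard_normal(39), weights, biases))
```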
Deep Learning Basics • Does this sound new at all? It probably looks familiar from what you learned in class • Things that have not changed over the years • Network topology / activation functions / … • Backpropagation (BP) • Things that changed recently • Data → big data • General-purpose computing on graphics processing units (GPGPU) • “A bag of tricks” accumulated over the years
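Backpropagation really is the same classic algorithm taught in class. As a reminder, one BP + SGD step for a single hidden layer might look like the sketch below; the dimensions and learning rate are arbitrary assumptions.

```python
# One backpropagation + SGD step on a toy one-hidden-layer network.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 20))             # mini-batch of 8 inputs
t = np.eye(5)[rng.integers(0, 5, 8)]         # one-hot targets
W0 = rng.standard_normal((20, 30)) * 0.1
W1 = rng.standard_normal((30, 5)) * 0.1

h = np.tanh(x @ W0)                          # forward pass
y = np.exp(h @ W1)
y /= y.sum(axis=1, keepdims=True)            # softmax outputs

g_logits = (y - t) / len(x)                  # backward pass: output error
g_W1 = h.T @ g_logits
g_h = (g_logits @ W1.T) * (1 - h ** 2)       # tanh derivative
g_W0 = x.T @ g_h

lr = 0.1                                     # plain SGD update
W0 -= lr * g_W0
W1 -= lr * g_W1
```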
E.g. Deep Neural Network for Speech Recognition • Three key components that make DNN-HMM work • Tied triphones as the basic units for HMM states • Many layers of nonlinear feature transformation • A long window of frames as input (see the frame-stacking sketch below) *figure borrowed from MSR senior researcher Dong YU
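The "long window of frames" simply means each acoustic frame is concatenated with its neighbors before entering the DNN. A sketch, assuming a context of ±5 frames and 39-dim MFCC features (both common choices, but assumed here):

```python
# Sketch of frame stacking: each frame plus its +/- `context` neighbors.
import numpy as np

def stack_frames(feats, context=5):
    """feats: (T, D) acoustic features -> (T, (2*context+1)*D) DNN input."""
    T, D = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

utt = np.random.randn(100, 39)      # 100 frames of 39-dim features (toy)
print(stack_frames(utt).shape)      # (100, 429)
```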
E.g. Deep Neural Network for Image Classification • The ILSVRC-2012 winning solution *figure copied from Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
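For reference, the basic building blocks of that winning solution (convolution, ReLU, max pooling) can be sketched naively as below. The sizes are toy, single-channel choices, nothing like the actual layer configuration in the paper.

```python
# Naive sketches of a CNN's building blocks: conv, ReLU, max pooling.
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution of a single-channel image (naive loops)."""
    H, W = img.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, s=2):
    """Non-overlapping s-by-s max pooling."""
    H, W = x.shape
    return x[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

img = np.random.randn(28, 28)
fmap = max_pool(relu(conv2d(img, np.random.randn(5, 5))))
print(fmap.shape)   # (12, 12)
```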
Scale Out Deep Learning • Training speed was a major problem of DL • A speech recognition model trained with 1,800 hours of data (~650,000,000 vector frames) took 2 weeks using 1 GPU • An image classification model trained with ~1,000,000 images took 1 week using 2 GPUs* • How to scale out if 10x or 100x training data becomes available? *Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
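A back-of-envelope calculation shows why this is alarming, under the simplifying assumption that training time scales roughly linearly with the amount of data:

```python
# If 1,800 h (~650M frames) takes 2 weeks on 1 GPU, and training time
# scales ~linearly with data (an assumption), more data quickly becomes
# impractical on a single GPU.
base_hours, base_weeks = 1_800, 2
for factor in (10, 100):
    weeks = base_weeks * factor
    print(f"{factor}x data ({base_hours * factor:,} h): ~{weeks} weeks "
          f"(~{weeks / 52:.1f} years)")
```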
DNN-GMM-HMM • Joint work with USTC-MSRA Ph.D. program student, Jian XU (许健, 0510) • The “DNN-GMM-HMM” approach for speech recognition* • DNN as a hierarchical nonlinear feature extractor, trained using a subset of the training data • GMM-HMM as the acoustic model, trained using the full data *Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
DNN-GMM-HMM • GMM-HMM modeling of DNN-derived features: combining the best of both worlds
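A hedged sketch of this two-stage idea: hidden-layer activations from a trained DNN serve as features for a GMM fitted on the full data. The `dnn_hidden` helper and all sizes here are hypothetical stand-ins, and scikit-learn's GaussianMixture replaces the full GMM-HMM acoustic model of the paper.

```python
# Two-stage sketch: DNN-derived features -> GMM density model.
import numpy as np
from sklearn.mixture import GaussianMixture

def dnn_hidden(x, weights):
    """Hypothetical feature extractor: forward pass up to the last hidden layer."""
    h = x
    for W in weights:
        h = np.tanh(h @ W)
    return h

rng = np.random.default_rng(2)
weights = [rng.standard_normal((429, 64)) * 0.05,   # stands in for a trained DNN
           rng.standard_normal((64, 64)) * 0.05]
full_data = rng.standard_normal((5000, 429))        # stacked frames (toy)

feats = dnn_hidden(full_data, weights)              # DNN-derived features
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(feats)
print(gmm.score(feats))                             # avg log-likelihood
```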
Experimental Results • 300hr DNN (18k states, 7 hidden layers) + 2,000hr GMM-HMM (18k states)* • Training time reduced from 2 weeks to 3-5 days *Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
A New Optimization Method • Joint work with USTC-MSRA Ph.D. program student, Kai Chen (陈凯, 0700) • Using 20 GPUs, the time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss • The details of the method are yet to be published • We believe the scalability issue in DNN training for speech recognition is now solved!
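The method itself is unpublished, so the following is explicitly not it: just a generic data-parallel SGD sketch showing the textbook idea of splitting a mini-batch across workers and averaging their gradients, simulated here on one CPU with a linear toy model.

```python
# Generic data-parallel SGD (NOT the talk's unpublished method):
# each "GPU" computes a gradient on its own data shard, then the
# gradients are averaged and applied, as if after an all-reduce.
import numpy as np

def grad(W, x, t):
    """Gradient of the squared error 0.5*||xW - t||^2 w.r.t. W."""
    return x.T @ (x @ W - t) / len(x)

rng = np.random.default_rng(3)
W = np.zeros((10, 2))
for step in range(100):
    grads = []
    for worker in range(4):                 # 4 simulated workers
        x = rng.standard_normal((32, 10))   # this worker's shard
        t = x @ np.ones((10, 2))            # synthetic regression targets
        grads.append(grad(W, x, t))
    W -= 0.1 * np.mean(grads, axis=0)       # averaged-gradient update
print(np.round(W[:3], 2))                   # entries converge toward 1.0
```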
Why Microsoft Can Do All These Good Things • Research • Bridge the gap between academia and industry via our intern and visiting scholar programs • Scale out from toy problems to real-world, industry-scale applications • Product teams • Solve practical issues and deploy technologies to serve users worldwide via our services • All together • We continuously improve our work towards larger scale, higher accuracy, and more challenging tasks • Finally • We have big data + a world-leading computational infrastructure
If You Want to Know More About Deep Learning • Neural networks for machine learning: https://class.coursera.org/neuralnets-2012-001 • Prof. Hinton’s homepage: http://www.cs.toronto.edu/~hinton/ • DeepLearning.net: http://deeplearning.net/ • Open-source • Kaldi (speech): http://kaldi.sourceforge.net/ • cuda-convnet (image): http://code.google.com/p/cuda-convnet/