
Presentation Transcript


  1. How Microsoft Had Made Deep Learning Red-Hot in IT Industry Zhijie Yan, Microsoft Research Asia USTC visit, May 6, 2014

  2. Self Introduction • 鄢志杰 (Zhijie Yan) @ MSRA • 996 – studied at USTC from 1999 to 2008 • Graduate student – studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang • Intern – worked at MSR Asia from 2005 to 2006 • Visiting scholar – visited Georgia Tech in 2007 • FTE – worked at MSR Asia since 2008 • Research interests • Speech, deep learning, large-scale machine learning

  3. In Today’s Talk • Deep learning has become very hot in the past few years • How Microsoft had made deep learning hot in the IT industry • Deep learning basics • Why Microsoft can turn all these ideas into reality • Further reading materials

  4. How Hot is Deep Learning • “This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton’s research group to support further work in the area of neural nets.” – U. of T. website

  5. How Hot is Deep Learning

  6. How Hot is Deep Learning

  7. How Hot is Deep Learning

  8. How Hot is Deep Learning

  9. Microsoft Had Made Deep Learning Hot in IT Industry • Initial attempts by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task • Prof. Hinton’s student visited MSR as an intern, and good results were obtained on the Microsoft Bing voice search task • MSR Asia and Redmond collaborated and obtained amazing results on the Switchboard task, which shocked the whole industry

  10. Microsoft Had Made Deep Learning Hot in IT Industry *figure borrowed from MSR principal researcher Li DENG

  11. Microsoft Had Made Deep Learning Hot in IT Industry • Followed by others, with results confirmed on various speech recognition tasks • Google / IBM / Apple / Nuance / Baidu (百度) / iFlytek (讯飞) • Continuously advanced by MSR and others • Expanding to solve more and more problems • Image processing • Natural language processing • Search • …

  12. Deep Learning From Speech to Image • ILSVRC-2012 competition on ImageNet • Classification task: classify an image into 1 of the 1,000 classes – correct if the true class is in your 5 bets [example images labeled: lifeboat, airliner, school bus]

  13. Deep Learning From Speech to Image • ILSVRC-2012 competition on ImageNet • Classification task: classify an image into 1 of the 1,000 classes – correct if the true class is in your 5 bets [example images labeled: lifeboat, airliner, school bus]
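As a rough illustration of the "5 bets" scoring rule (a prediction counts as correct when the true class is among the model's five highest-scoring labels), here is a minimal sketch; the function name and the toy score vector are purely illustrative, not from the talk:

```python
import numpy as np

def top5_correct(scores, true_label):
    """Return True if true_label is among the 5 highest-scoring classes."""
    top5 = np.argsort(scores)[-5:]          # indices of the 5 largest scores
    return true_label in top5

# Toy example: a 1,000-class score vector where the true class is 42
rng = np.random.default_rng(0)
scores = rng.random(1000)
scores[42] += 1.0                           # make class 42 score highest
print(top5_correct(scores, 42))             # True
```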

  14. Deep Learning Basics • Deep learning → deep neural networks → multi-layer perceptron (MLP) with a deep structure (many hidden layers) [Diagram: a shallow MLP (input layer, W0, hidden layer, W1, output layer) next to a deep MLP (input layer, W0, hidden layer, W1, hidden layer, W2, hidden layer, W3, output layer)]
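To make the slide's structure concrete, below is a minimal sketch of a forward pass through such a deep MLP. The layer sizes, random weights, and function names are illustrative assumptions, not the models discussed in the talk:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative layer sizes: input -> 3 hidden layers -> output
sizes = [100, 256, 256, 256, 10]
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Forward pass through the deep MLP (weights W0..W3 as in the slide's diagram)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                        # hidden layers: affine + nonlinearity
    return softmax(h @ weights[-1] + biases[-1])   # output layer: class posteriors

x = rng.normal(size=100)
print(forward(x).shape)                            # (10,)
```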

  15. Deep Learning Basics • Doesn’t sound new at all? Sounds like something you’ve already learned in class? • Things that have not changed over the years • Network topology / activation functions / … • Backpropagation (BP) • Things that have changed recently • Data → big data • General-purpose computing on graphics processing units (GPGPU) • “A bag of tricks” accumulated over the years
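Since backpropagation itself is the unchanged ingredient, here is a minimal sketch of one backpropagation / SGD update on a tiny one-hidden-layer network; the network size, learning rate, and squared-error loss are illustrative assumptions, not a recipe from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny network: 4 inputs -> 8 hidden units -> 1 output, squared-error loss
W1, b1 = rng.normal(0, 0.5, (4, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)
lr = 0.1

def sgd_step(x, t):
    """One backpropagation / SGD update on a single (input, target) pair."""
    # Forward pass
    h = sigmoid(x @ W1 + b1)
    y = h @ W2 + b2
    # Backward pass (chain rule)
    dy = y - t                              # gradient of 0.5*(y-t)^2 w.r.t. y
    dW2, db2 = np.outer(h, dy), dy
    dh = (dy @ W2.T) * h * (1 - h)          # back through the sigmoid
    dW1, db1 = np.outer(x, dh), dh
    # Parameter update (in place)
    for P, dP in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        P -= lr * dP
    return 0.5 * ((y - t) ** 2).item()

x, t = rng.normal(size=4), np.array([1.0])
print(sgd_step(x, t))                       # loss before the update
```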

  16. E.g. Deep Neural Network for Speech Recognition • Three key components that make DNN-HMM work: • Tied tri-phones as the basic units for HMM states • Many layers of nonlinear feature transformation • A long window of frames *figure borrowed from MSR senior researcher Dong YU
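The third component, the long window of frames, simply means each frame is spliced together with its neighbours before being fed to the DNN. A minimal sketch, assuming a +/-5-frame context and 40-dimensional features (both illustrative, not the exact setup in the figure):

```python
import numpy as np

def splice_frames(feats, context=5):
    """Stack each frame with its +/- `context` neighbours to form the DNN input.

    feats: (num_frames, dim) acoustic features, e.g. log filterbanks.
    Returns: (num_frames, (2*context+1)*dim) spliced features.
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + len(feats)] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

feats = np.random.default_rng(0).normal(size=(300, 40))   # ~3 s at 100 frames/s
print(splice_frames(feats).shape)                          # (300, 440)
```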

  17. E.g. Deep Neural Network for Image Classification • The ILSVRC-2012 winning solution *figure copied from Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
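The winning solution is a deep convolutional network. As a hedged illustration of its basic building block (convolution + ReLU + max pooling) rather than the actual architecture from the paper, a naive numpy sketch with illustrative sizes:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' convolution (cross-correlation) for a single channel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))                      # toy single-channel image
kernel = rng.normal(size=(5, 5))                       # one filter (random here, learned in practice)
fmap = np.maximum(0.0, conv2d_valid(image, kernel))    # convolution + ReLU
print(max_pool(fmap).shape)                            # (14, 14)
```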

  18. Scale Out Deep Learning • Training speed was a major problem of DL • A speech recognition model trained with 1,800-hour data (~650,000,000 vector frames) takes 2 weeks using 1 GPU • An image classification model trained with ~1,000,000 images takes 1 week using 2 GPUs* • How to scale out if 10x, 100x training data becomes available? *Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
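The ~650,000,000-frame figure is consistent with the usual 10 ms frame shift (100 frames per second); that frame rate is an assumption about the front end, not something stated on the slide:

```python
hours = 1800
frames_per_second = 100          # assuming the common 10 ms frame shift
frames = hours * 3600 * frames_per_second
print(frames)                    # 648,000,000 ~= 650 million frames
```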

  19. DNN-GMM-HMM • Joint work with USTC-MSRA Ph.D. program student, Jian XU (许健, 0510) • The “DNN-GMM-HMM” approach for speech recognition* • DNN as a hierarchical nonlinear feature extractor, trained using a subset of the training data • GMM-HMM as the acoustic model, trained using the full data *Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
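A toy sketch of the DNN-GMM-HMM idea, assuming a stand-in (random) DNN as the nonlinear feature extractor and a single scikit-learn GaussianMixture in place of the per-state GMMs of a real GMM-HMM system; all sizes and names are illustrative, and this is not the recipe from the cited paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def dnn_features(x, weights):
    """Push frames through the DNN's hidden layers to get DNN-derived features."""
    h = x
    for W in weights:
        h = np.maximum(0.0, h @ W)           # affine + ReLU per hidden layer
    return h

# Toy stand-in for a trained DNN: 440-dim spliced input -> 2 hidden layers -> 40-dim features
weights = [rng.normal(0, 0.05, (440, 256)),
           rng.normal(0, 0.05, (256, 40))]

frames = rng.normal(size=(5000, 440))        # acoustic frames (full training set in practice)
feats = dnn_features(frames, weights)

# GMM on the derived features (a real system trains one GMM per tied HMM state)
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(feats)
print(gmm.score(feats))                      # average log-likelihood per frame
```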

  20. DNN-GMM-HMM • GMM-HMM modeling of DNN-derived features: combining the best of both worlds

  21. Experimental Results • 300hr DNN (18k states, 7 hidden layers) + 2,000hr GMM-HMM (18k states)* • Training time reduced from 2 weeks to 3-5 days *Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”

  22. A New Optimization Method • Joint work with USTC-MSRA Ph.D. program student, Kai Chen (陈凯, 0700) • Using 20 GPUs, time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss • The magic is to be published • We believe the scalability issue in DNN training for speech recognition is now solved!

  23. Why Microsoft Can Do All These Good Things • Research • Bridge the gap between academia and industry via our intern and visiting scholar programs • Scale out from toy problems to real-world industry-scale applications • Product team • Solve practical issues and deploy technologies to serve users worldwide via our services • All together • We continuously improve our work towards larger scale, higher accuracy, and more challenging tasks • Finally • We have big data + world-leading computational infrastructure

  24. If You Want to Know More About Deep Learning • Neural networks for machine learning: https://class.coursera.org/neuralnets-2012-001 • Prof. Hinton’s homepage: http://www.cs.toronto.edu/~hinton/ • DeepLearning.net: http://deeplearning.net/ • Open-source • Kaldi (speech): http://kaldi.sourceforge.net/ • cuda-convnet (image): http://code.google.com/p/cuda-convnet/

  25. Thanks!
