The world’s best chatbot. Marcus Liwicki, EISLAB Machine Learning (chair), Luleå University of Technology. EISLAB: Embedded Intelligent Systems LAB.
Areas of Machine Learning • Machine Learning • Supervised Learning (training data with labels available): Classification, Regression • Unsupervised Learning (data is available, but no labels): Clustering, Feature Learning • Reinforcement Learning: Artificial Curiosity
Machine Learning @LTU • Fundamental Research • Document Analysis • eHealth • Space • Speech • And of course: natural language processing • http://bit.ly/liwicki-vdl-17 (all my lecture material)
Overview of Today • Background: a short overview of ML • Tools and links – just for your reference (slides available) • Creating the world's best chatbot • Natural language processing • Semantic hashing • Conclusion
Useful Toolkits (Most Popular) • Keras: https://elitedatascience.com/keras-tutorial-deep-learning-in-python • deeplearnjs: https://deeplearnjs.org/ • Deeplearning4j: https://deeplearning4j.org/ • MXNet: https://mxnet.incubator.apache.org/how_to/finetune.html • TensorFlow (and interesting visualizations in TensorBoard) • https://www.tensorflow.org/get_started/ • https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard • Caffe2 and Caffe • https://caffe2.ai/ • http://caffe.berkeleyvision.org/ • PyTorch & Torch • http://pytorch.org/ • http://torch.ch/ • From research to business: easy-to-use UIs and more for end-users • https://cloud.google.com/ml-engine/docs/ • https://aws.amazon.com/machine-learning/ • https://azure.microsoft.com/en-us/overview/machine-learning/ • https://developer.nvidia.com/embedded/learn/tutorials • https://developer.nvidia.com/digits • http://deepcognition.ai/resources/ • https://orange.biolab.si/
Useful Models • The number one place for finding pre-trained models • https://github.com/BVLC/caffe/wiki/Model-Zoo • (also gives hints for successful applications) • A bit easier to understand, because it is curated • https://modeldepot.io/ • Small, but with demos • http://pretrained.ml/ • https://github.com/keras-team/keras/tree/master/examples • Individual task: Look at both websites and try to find a model working in a domain which is interesting for YOU
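Once you have picked a model from one of these zoos, loading it usually takes only a few lines. A minimal, hedged sketch (assuming TensorFlow/Keras is installed; ResNet50 stands in for whatever model you chose, and the image file name is made up):

```python
# Minimal sketch: load a pre-trained ImageNet model and classify one image.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")           # downloads pre-trained weights on first use

img = image.load_img("my_photo.jpg", target_size=(224, 224))  # hypothetical file name
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])     # top-3 predicted ImageNet classes
```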
Other Useful Links • https://teachablemachine.withgoogle.com/ • http://playground.tensorflow.org • https://experiments.withgoogle.com/ai • https://transcranial.github.io/keras-js/#/imdb-bidirectional-lstm • https://transcranial.github.io/keras-js/#/mnist-acgan • https://quickdraw.withgoogle.com/
We Can Learn From Failures & Successes • Deep Learning and AI are not the answer to everything • https://www.techrepublic.com/article/top-10-ai-failures-of-2016/ • An extension of reinforcement learning is Artificial Curiosity • It could (and definitely would) go terribly wrong • https://blog.statsbot.co/deep-learning-achievements-4c563e034257
Recent Achievement @LTU: Intent Classification • Example queries (multilingual): I want.., Can I.., I need.., Gimme.., What.., Do I.., Je veux.., Ich.., Jag.., Hur.. • Challenges • Varying domain • Small data (but DL needs data) • Writing errors • Approach • Standard ML (a baseline is sketched below) • DL
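As an illustration of the "Standard ML" branch (a hedged sketch, not the actual LTU system): a TF-IDF plus linear-classifier baseline copes with small data and, via character n-grams, with writing errors. The intents and example queries below are invented:

```python
# Minimal intent-classification baseline: character n-gram TF-IDF + linear SGD classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

train_texts   = ["I want a ticket", "Can I book a room", "Gimme the price", "What does it cost"]
train_intents = ["buy", "book", "price", "price"]        # made-up intent labels

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams tolerate typos
    SGDClassifier(max_iter=1000),
)
clf.fit(train_texts, train_intents)
print(clf.predict(["how much is it?"]))        # predicts one of the intents above
```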
LTU is Leading in Intent Classification • Benchmark comparison of LTU against top start-ups, Microsoft, IBM, Google, open-source tools, and SAP • https://github.com/kumar-shridhar/HackathonLulea
Natural Language Processing • Languages often seem to behave in arbitrary ways and forms • cabz, cats • Ambiguity, sarcasm and irony are often not apparent from purely textual information • Domain-specific terms and phrases that may not even be grammatically correct • to short a stock • no woman no cry
Differences in Principles • Word order • English: John ate apples • Japanese: Jon wa ringo o tabeta • Null subject • English: It is raining • Spanish: Está lloviendo • Ambiguity • John and Henry’s parents arrived at the house. -> how many people? • Recursion • He said that _ she knew that _ they are there _ where we have been _ when
Discussion • Which is better? • Rule-based NLP (grammar rules, like software constructs) • Deep-Learning-based NLP (exploits the large amounts of text available)
Word Embeddings • Example sentence: the cat sat on the mat • Classical (Bag of Words, BoW) • Various ways of choosing the values: binary, count, TF-IDF • Note: we lose information. Which? (see the sketch below) • One-hot encoding • dog: 1 0 0 0 0 … 0 • cat: 0 0 1 0 0 … 0 • No semantic similarity conveyed, lots of data required
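A minimal sketch that makes the lost information visible (the tiny vocabulary is an assumption for illustration):

```python
# Bag-of-Words over the slide's example sentence: counting word occurrences
# keeps the words but drops their order, which is exactly what BoW loses.
from collections import Counter

vocab = ["the", "cat", "sat", "on", "mat", "dog"]

def bow(sentence):
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]          # count variant; >0 gives the binary variant

print(bow("the cat sat on the mat"))   # [2, 1, 1, 1, 1, 0]
print(bow("the mat sat on the cat"))   # same vector, different sentence -> order is lost
```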
Vectors can Represent Words • One-hot representation • Let us assume we have 1000 words in the dictionary • A word would be represented by a vector with 1 one and 999 zero elements • Even a sentence or a document can be represented similarly, just with more ones • Example: for the sentence "Hello I am Marcus", the vector has ones at the dictionary positions of "Hello", "I", "am", and "Marcus" (dictionary on the slide: are, am, ..., Hello, I, ..., Marcus) and zeros everywhere else (see the sketch below)
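The same idea as a minimal sketch, with a toy dictionary standing in for the 1000-word one:

```python
# One-hot encoding for a single word, multi-hot encoding for a whole sentence.
dictionary = ["are", "am", "hello", "i", "marcus", "the", "cat"]   # toy dictionary

def one_hot(word):
    return [1 if w == word.lower() else 0 for w in dictionary]

def multi_hot(sentence):
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in dictionary]

print(one_hot("Marcus"))               # exactly one 1, all other entries 0
print(multi_hot("Hello I am Marcus"))  # ones at 'am', 'hello', 'i', 'marcus'
```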
Learning from Data: Distributional Hypothesis • Firth, 1957: “You shall know a word by the company it keeps” • The cat sat on the mat • The dog sat on the mat • The elephant sat on the mat • The quickly sat on the mat • Idea: embed words with just 300 or 500 values, not |V| • More dense • Fewer dimensions • Should embed domain semantics • Generalize easily
Word2Vec • Mikolov et al., 2013 (while at Google) • Family of models to train word embeddings (E) • Linear models in an encoder-decoder structure • Two models for training embeddings in an unsupervised manner: • Continuous Bag-of-Words (CBOW): the target word is the output of the combined (summed) context-word embeddings • Skip-Gram: the target word is the input used to predict each context word • [Architecture diagram: words enter and leave as 1-hot vectors of size |V|; a shared embedding matrix E maps them to d-dimensional vectors; example target "cat" with context "PAD the sat on", PAD = padding (ε)] • A sketch of the training pairs follows below
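A minimal sketch (not Mikolov's implementation) of the training pairs the two models actually see, assuming a window size of 2 and the example sentence from the earlier slides:

```python
# Build the (input, output) pairs that CBOW and Skip-Gram are trained on.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
PAD, window = "<PAD>", 2

def context(tokens, i, window):
    """Context words around position i, padded at the sentence borders."""
    padded = [PAD] * window + tokens + [PAD] * window
    centre = i + window
    return padded[centre - window:centre] + padded[centre + 1:centre + window + 1]

for i, target in enumerate(sentence):
    ctx = context(sentence, i, window)
    print("CBOW:     ", ctx, "->", target)   # summed context embeddings predict the target
    print("Skip-Gram:", target, "->", ctx)   # target embedding predicts each context word
```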
And These can be Really Useful and Fun • Task: test the word2vec online demo • https://rare-technologies.com/word2vec-tutorial/#app • Try out your own word combinations • Are there cases where it is particularly good/bad?
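If you prefer to experiment locally rather than in the online demo, here is a hedged gensim sketch (assuming gensim 4.x; the three-sentence corpus is far too small for meaningful vectors and only keeps the code runnable):

```python
# Train a tiny Word2Vec model and query it; real use needs a much larger corpus.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
    ["the", "elephant", "sat", "on", "the", "mat"],
]
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> Skip-Gram

print(model.wv.most_similar("cat", topn=3))                              # nearest neighbours
print(model.wv.most_similar(positive=["dog", "mat"], negative=["cat"]))  # word arithmetic
```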
Advanced Techniques & Tricks • Semantic Hashing • Reduces vocabulary size • Works with unknown words and spelling errors • Pipeline: word "good" -> add # -> "#good#" -> trigrams "#go, goo, ood, od#" -> 1-hot encoding (a vocabulary of 500k words is reduced to 30k trigrams; then a typical embedding follows); see the sketch below • Data Augmentation • Word shuffling • Mis-spelling • Keyboard key closeness • Data Balancing • Imbalanced data for classes • SGD classifier • Classical approach
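A minimal sketch of the semantic-hashing step (an illustration, not the LTU implementation; the three-word vocabulary is made up):

```python
# Semantic hashing: wrap a word in '#', split it into character trigrams, and
# one-hot encode the trigrams instead of the word itself, so typos and unseen
# words still share features with known words.
def semantic_hash(word):
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

vocab_trigrams = sorted({t for w in ["good", "goood", "great"] for t in semantic_hash(w)})

def encode(word):
    trigrams = set(semantic_hash(word))
    return [1 if t in trigrams else 0 for t in vocab_trigrams]

print(semantic_hash("good"))   # ['#go', 'goo', 'ood', 'od#']
print(encode("goood"))         # the misspelling still overlaps with 'good'
```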
Marvin2025 is happy! • Most of our work is Open Source • Including data and documentation • https://diva-dia.github.io/DeepDIVAweb/ • https://diuf.unifr.ch/main/hisdoc/divaservices • Available as an interactive IPython notebook with tutorial-like explanations: https://github.com/kumar-shridhar/HackathonLulea
Engaging Education with Music • iMuSciCA: www.imuscica.eu • Ongoing EU project • Try it out with Chrome: https://workbench.imuscica.eu/ • Team Teaching with STEAM • Science, Technology, Engineering & Mathematics combined with Arts • Workshop & Concert in Luleå (2019-03-02) • http://www.kulturenshus.com/evenemang/imuscica/ • http://www.ltu.se/eu-steam-2019
Conclusion • Deep Learning is really good (state of the art) in many tasks • Speech, image, handwriting, video recognition • Intent recognition, sentiment analysis • Stock market prediction, big-data forecasting • However, it does not solve everything • More than 1000 classes? • Often biased towards the training set • https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf • And: https://medium.com/@GaryMarcus/in-defense-of-skepticism-about-deep-learning-6e8bfd5ae0f1
Thank You • Lab members & beyond: Gustav, Marcus, Fotini, Pedro, Rajkumar, Priamvada, Oluwatosin, György • And colleagues at LTU, Kaiserslautern, Fribourg, and internationally