
Introducing Apache Mahout

Scalable Machine Learning for All! Grant Ingersoll, Lucid Imagination.


Presentation Transcript


  1. Introducing Apache Mahout: Scalable Machine Learning for All! Grant Ingersoll, Lucid Imagination

  2. Overview: what is Machine Learning, then Mahout.

  3. Definition: “Machine Learning is programming computers to optimize a performance criterion using example data or past experience” (Introduction to Machine Learning, E. Alpaydin). A subset of Artificial Intelligence. Draws on many other fields: computer science, biology, math, psychology, etc.

  4. Types. Supervised: using labeled training data, create a function that predicts the output of unseen inputs. Unsupervised: using unlabeled data, create a function that predicts output. Semi-supervised: uses both labeled and unlabeled data.
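
The supervised case on slide 4 can be illustrated with a minimal sketch, assuming a 1-nearest-neighbor rule: the "function" simply copies the label of the closest labeled training example. The class and method names here are hypothetical and are not part of Mahout.

```java
// Hypothetical illustration of supervised learning; not Mahout code.
class NearestNeighborSketch {

    // Squared Euclidean distance between two equal-length feature vectors.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Predict the label of an unseen input by copying the label
    // of the closest labeled training example.
    static String predict(double[][] features, String[] labels, double[] input) {
        if (features.length == 0 || features.length != labels.length) {
            throw new IllegalArgumentException("need one label per training example");
        }
        int best = 0;
        for (int i = 1; i < features.length; i++) {
            if (squaredDistance(features[i], input) < squaredDistance(features[best], input)) {
                best = i;
            }
        }
        return labels[best];
    }
}
```

Calling `NearestNeighborSketch.predict(features, labels, input)` with a handful of labeled vectors returns the label of whichever training vector lies nearest to `input`.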

  5. Characterizations: lots of data; identifiable features in that data; too big/costly for people to handle; people can still help.

  6. Clustering (unsupervised): find natural groupings of documents, search results, people, genetic traits in groups; many, many more uses.

  7. Example: Clustering Google News

  8. Collaborative Filtering: unsupervised; recommend people and products. User-User: user likes X, you might too. Item-Item: people who bought X also bought Y.
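
The item-item idea on slide 8 ("people who bought X also bought Y") can be sketched as plain co-occurrence counting over purchase baskets. This is a single-machine illustration under that assumption, with hypothetical names; it is not how Taste is implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of item-item co-occurrence; not Mahout code.
class CoOccurrenceSketch {

    // For every ordered pair (x, y) with x != y, count how many baskets contain both.
    static Map<String, Map<String, Integer>> countPairs(List<Set<String>> baskets) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Set<String> basket : baskets) {
            for (String x : basket) {
                for (String y : basket) {
                    if (!x.equals(y)) {
                        counts.computeIfAbsent(x, k -> new HashMap<>())
                              .merge(y, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Return the item most often bought together with `item`, or null if
    // `item` never appears with anything else. Ties are broken arbitrarily.
    static String alsoBought(Map<String, Map<String, Integer>> counts, String item) {
        Map<String, Integer> row =
                counts.getOrDefault(item, Collections.<String, Integer>emptyMap());
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : row.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Raw counts favor popular items; a real system would normally normalize them with a similarity measure, which this sketch leaves out for brevity.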

  9. Example: Collab Filtering Amazon.com

  10. Classification/Categorization: many, many types, including spam filtering, named entity recognition, phrase identification, sentiment analysis, and classification into a taxonomy.

  11. Example: NER (Named Entity Recognition). Excerpt from Yahoo News.

  12. Example: Categorization

  13. Info. Retrieval: learning ranking functions; learning spelling corrections; user click analysis and tracking.

  14. Other: image analysis; robotics; games; higher-level natural language processing; many, many others.

  15. What is Apache Mahout? A mahout is an elephant trainer/driver/keeper, hence… Hadoop (and other distributed techniques) + Machine Learning = Mahout

  16. What? Hadoop brings: the Map/Reduce API and HDFS; in other words, scalability and fault tolerance. Mahout brings: a library of machine learning algorithms, plus examples.

  17. Why Mahout? Many open source ML libraries either lack community, lack documentation and examples, lack scalability, lack the Apache License ;-), or are research-oriented.

  18. Why Mahout? Intelligent apps are the present and future; thus, Mahout’s goal is scalable machine learning with the Apache License.

  19. Current Status. What’s in it: a simple Matrix/Vector library; Taste collaborative filtering; clustering (Canopy, K-Means, Fuzzy K-Means, Mean-shift, Dirichlet); classifiers (Naïve Bayes, Complementary Naïve Bayes); evolutionary (integration with Watchmaker for fitness functions).

  20. How? Examples: Taste, clustering, classification, evolutionary.

  21. Taste: Movie Recommendations. Given users' ratings of movies, recommend other movies. Demo: http://lucene.apache.org/mahout/taste.html#demo
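
The task on slide 21 (given ratings by users of movies, recommend other movies) can be sketched with a small user-based recommender. This is an illustration only: it assumes a Euclidean-distance similarity and a similarity-weighted average, uses hypothetical class and method names, and does not call the Taste API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of user-based recommendation; not the Taste API.
class UserBasedSketch {

    // Similarity in (0, 1] derived from the Euclidean distance over movies
    // both users rated; 0 if they have no rated movie in common.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double sumSq = 0.0;
        int shared = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double d = e.getValue() - other;
                sumSq += d * d;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Recommend the movie the user has not rated whose similarity-weighted
    // average rating among other users is highest; null if there is none.
    static String recommend(Map<String, Map<String, Double>> ratings, String user) {
        Map<String, Double> mine = ratings.get(user);
        if (mine == null) {
            return null; // unknown user
        }
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double sim = similarity(mine, other.getValue());
            if (sim <= 0.0) {
                continue; // nothing in common, so no evidence
            }
            for (Map.Entry<String, Double> r : other.getValue().entrySet()) {
                if (mine.containsKey(r.getKey())) {
                    continue; // already rated by this user
                }
                weightedSum.merge(r.getKey(), sim * r.getValue(), Double::sum);
                weightTotal.merge(r.getKey(), sim, Double::sum);
            }
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : weightedSum.entrySet()) {
            double score = e.getValue() / weightTotal.get(e.getKey());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Users whose past ratings agree with yours get more say in the prediction, which is the "user likes X, you might too" intuition in code form.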

  22. Taste Demo: http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=12&debug=true and http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=43&debug=true

  23. Clustering: Synthetic Control Data (http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series). Each clustering implementation has an example Job for running, in <MAHOUT_HOME>/examples under o.a.mahout.clustering.syntheticcontrol.* Outputs clusters…
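
K-Means is one of the clustering implementations listed on slide 19. As a rough picture of what such a job computes, here is a single-machine sketch of the standard assign-then-average loop on one-dimensional points. It is an assumption-laden illustration with hypothetical names: the real example jobs run on Hadoop and the synthetic control series are multi-dimensional.

```java
import java.util.Arrays;

// Hypothetical illustration of K-Means on 1-D points; not Mahout's Map/Reduce job.
class KMeansSketch {

    // Index of the centroid nearest to x.
    static int nearest(double[] centroids, double x) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(x - centroids[c]) < Math.abs(x - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Repeat "assign each point to its nearest centroid, then move each centroid
    // to the mean of its points" until nothing moves or maxIterations is reached.
    static double[] cluster(double[] points, double[] initial, int maxIterations) {
        double[] centroids = Arrays.copyOf(initial, initial.length);
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (double p : points) {
                int c = nearest(centroids, p);
                sum[c] += p;
                count[c]++;
            }
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] == 0) {
                    continue; // leave the centroid of an empty cluster where it is
                }
                double mean = sum[c] / count[c];
                if (mean != centroids[c]) {
                    centroids[c] = mean;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
        }
        return centroids;
    }
}
```

The assignment step and the averaging step correspond naturally to a map phase and a reduce phase, which is one reason the algorithm suits Hadoop.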

  24. Classification: NB and CNB examples. 20 Newsgroups: http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups Wikipedia: http://cwiki.apache.org/confluence/display/MAHOUT/WikipediaBayesExample

  25. Evolutionary. Traveling Salesman: http://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman Class Discovery: http://cwiki.apache.org/confluence/display/MAHOUT/Class+Discovery

  26. What’s Next? More examples; Winnow/Perceptron (MAHOUT-85); text clustering; association rules (MAHOUT-108); logistic regression; Solr integration (SOLR-769); GSOC.

  27. When, Who. When? Now! Mahout is growing. Who? You! We want programmers who are comfortable with math and like to work on hard problems. We want others to kick the tires.

  28. Where? • http://lucene.apache.org/mahout • Hadoop - http://hadoop.apache.org • http://cwiki.apache.org/MAHOUT • mahout-{user|dev}@lucene.apache.org • http://www.lucidimagination.com/search/p:mahout

  29. Resources: “Programming Collective Intelligence” by Segaran; “Data Mining - Practical Machine Learning Tools and Techniques” by Witten and Frank; “Taming Text” by Ingersoll and Morton.
