An introduction to Apache Mahout: what is it and how does it work? What is machine intelligence? How can Mahout be installed and tested on Hadoop?
Apache Mahout
• What is it?
• How does it work?
• Machine Learning
• Algorithms
• Install
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Mahout – What is it?
• Machine learning
• For large data sets
• Based on Hadoop
• But can work on a non-Hadoop cluster
• Scalable
• Released under the Apache License
Mahout – How does it work?
• Uses Hadoop MapReduce
• Has many supplied algorithms
• Supports four use cases
  • Recommendation mining
  • Clustering
  • Classification
  • Frequent itemset mining
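To make the MapReduce point concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, counting how often each item appears across a set of baskets (the first step of frequent itemset mining). It is plain Python, not Hadoop or Mahout code; the function names and the basket data are invented for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (item, 1) pair for every item in a single basket."""
    for item in record:
        yield item, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory data set."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical shopping baskets, invented for this example.
baskets = [["milk", "bread"], ["milk", "eggs"], ["bread", "milk"]]
counts = run_job(baskets)
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different machines, which is what lets the same pattern scale to large data sets.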
Mahout – Machine Learning
Machine learning: what does it mean?
• A branch of artificial intelligence
• Systems that learn from data
• Classify data after learning
• Learn on training data sets
• Generalisation: the ability to classify unseen data sets after learning
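The ideas of learning from data and then generalising to unseen data can be sketched with a very small classifier. The example below assumes a nearest-centroid rule, chosen only because it is short; it is not one of Mahout's algorithms, and the data points and labels are invented.

```python
def train(samples):
    """Learn one centroid (mean point) per label from labelled training data."""
    sums, counts = {}, {}
    for point, label in samples:
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(model, point):
    """Assign the label whose centroid lies closest to the point."""
    return min(model, key=lambda label: squared_distance(model[label], point))


# Invented toy data: the model only ever sees `training`.
training = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
            ((8.0, 8.0), "large"), ((9.0, 7.5), "large")]
unseen = [((2.0, 1.5), "small"), ((7.5, 9.0), "large")]

model = train(training)
correct = sum(classify(model, point) == label for point, label in unseen)
accuracy = correct / len(unseen)
print(accuracy)
```

Measuring accuracy on points that were held back from training is what tells us whether the system has generalised rather than memorised.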
Mahout – Algorithms
Some of the available algorithms (among many others)
• Collaborative filtering
  • Narrow sense: make predictions about a user's interests by collecting preferences from many users
  • General sense: multi-agent collaboration for information filtering
• Mean shift clustering
  • Mode seeking, used for visual tracking
• Parallel frequent pattern mining
  • Find item sets that frequently occur together
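Collaborative filtering in the narrow sense can be illustrated with a short user-based sketch: a missing rating is predicted as the similarity-weighted average of other users' ratings. This assumes cosine similarity over co-rated items, which is only one of several possible choices, and it does not use Mahout's API; the users, items and ratings are invented.

```python
from math import sqrt

# Hypothetical preference data: user -> {item: rating}.
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob": {"book_a": 4, "book_b": 2, "book_c": 5, "book_d": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 2},
}


def similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in shared)
    norm_a = sqrt(sum(ratings[a][i] ** 2 for i in shared))
    norm_b = sqrt(sum(ratings[b][i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of the other users' ratings for the item."""
    weighted, total = 0.0, 0.0
    for other, prefs in ratings.items():
        if other != user and item in prefs:
            weight = similarity(user, other)
            weighted += weight * prefs[item]
            total += weight
    return weighted / total if total else None


prediction = predict("alice", "book_d")
print(round(prediction, 2))
```

Because alice's ratings resemble bob's more than carol's, the prediction sits closer to bob's rating of book_d than to carol's.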
Mahout – Install
So how do we install Mahout and test it?
• Install Maven
  • sudo apt-get install maven3
• Install Apache Mahout
  • You will need Subversion installed
  • svn co http://svn.apache.org/repos/asf/mahout/trunk
  • Go to the directory containing the pom.xml file
  • mvn install ## in ./trunk
Full details are available in the Mahout install guide on our web site shop
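Before starting the build it can help to confirm that the two command-line tools the steps above rely on are actually on the PATH. The following is a small optional sketch; the helper name `find_tools` is invented here, and it assumes the tools are installed under their usual executable names `mvn` and `svn`.

```python
import shutil

# Executable names assumed for Maven and Subversion.
REQUIRED_TOOLS = ("mvn", "svn")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


status = find_tools()
for name, path in status.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If either tool is reported as missing, install it before running the checkout and `mvn install` steps.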
Mahout – Test Install
So let us run a test
• cd $MAHOUT_HOME/examples/bin
• ./build-reuters.sh
• Choose option 1, kmeans clustering
• The run should finish with the output shown on the next slide
Mahout – Test Install
cd $MAHOUT_HOME/examples/bin ; ./build-reuters.sh
Please call cluster-reuters.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
.................................
Inter-Cluster Density: NaN
Intra-Cluster Density: 0.0
CDbw Inter-Cluster Density: NaN
CDbw Intra-Cluster Density: NaN
CDbw Separation: NaN
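For readers who have not met k-means before, the sketch below shows the basic idea behind option 1 on a handful of invented 2-D points. It is a plain, single-machine version of Lloyd's algorithm that simply takes the first k points as starting centroids; Mahout's own implementation runs on Hadoop over document vectors and differs in many details.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm, using the first k points as the starting centroids."""
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters


# Invented sample points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The two steps repeat until the assignments stop changing; here a fixed iteration count is used to keep the sketch short.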
Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You can just pay for those hours that you need to solve your problems