Pump it Up: Data Mining the Water Table
Project Plan Milestone 1
Team Members
• Alex Pang
• Terry Scates
• Ibrahim Oyekan
• Patrick Merker
Tools Chosen
• Patrick Merker (Rattle, based on R)
• Ibrahim Oyekan (RapidMiner)
• Terry Scates (Vowpal Wabbit)
• Alex Pang (Prediction I/O)
Rattle on R
• Rattle is a GUI built on R that lets the user analyze data and use many of the available R packages.
• Rattle simplifies manipulation of data sets and lets the user perform tasks easily without getting lost in the detail.
• One of its most productive features is the ability to use R to build decision trees that predict outcomes from data.
• Once we as a team determine which parts of the tool we want to use, we will extract the functions being used and create a standalone R program that will process the data and give us the decision tree that we will use.
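To make the decision-tree idea above concrete, here is a minimal sketch. It is written in Python rather than R and is not the program the team plans to extract from Rattle. The rows, the two feature columns (pump age and a payment-scheme flag), and all function names are hypothetical toy values, and the tree is cut down to a single split chosen by Gini impurity.

```python
from collections import Counter

# Hypothetical toy rows: (pump_age_years, has_payment_scheme) -> status label.
# These values are invented for illustration; they are not the project data.
ROWS = [
    ((2, 1), "functional"),
    ((3, 1), "functional"),
    ((5, 0), "functional"),
    ((12, 0), "non functional"),
    ((15, 1), "non functional"),
    ((20, 0), "non functional"),
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows):
    """Return the (feature_index, threshold) with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def majority(labels):
    """Most common label in a group."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Fit a one-level tree: one split, one predicted label per side."""
    f, t = best_split(rows)
    return {
        "feature": f,
        "threshold": t,
        "left": majority([y for x, y in rows if x[f] <= t]),
        "right": majority([y for x, y in rows if x[f] > t]),
    }


def predict(stump, x):
    """Follow the single split to a predicted label."""
    side = "left" if x[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


STUMP = fit_stump(ROWS)
print(STUMP)
print(predict(STUMP, (4, 0)))
```

A full tree repeats the same split search on each side until the groups are pure or too small to split further.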
Rattle on R Demonstration
Vowpal Wabbit Demonstration
RapidMiner
• RapidMiner is a popular, open-source integrated environment for predictive analysis, data science, and machine learning.
• It combines a handy, extensively customizable GUI with scripting functionality to help train models on and visualize statistical input in order to predict derived outputs.
• It provides embedded data mining and machine learning algorithms (regression, random forests, decision trees, etc.), called operators, which can be applied to your datasets, and it gives you the option to add your own algorithms or modify the operators' defaults.
RapidMiner
• It also provides functionality for visualizing statistical data in graphs, histograms, scatter plots, etc.
• After your dataset is loaded and wired up to an operator, it lets you build a process on the dashboard, which is then executed and produces the requested output.
• The free edition only allows up to 10,000 rows of data, and the lowest paid tier costs $2,500 a year, but I applied for an educational license that supports unlimited functionality, and it has just been approved.
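The idea of wiring a dataset through operators into a process can be sketched in a few lines of Python. This is only an analogy: none of the names below are RapidMiner APIs, the operators are hypothetical stand-ins, and the rows are invented.

```python
from statistics import mean


def drop_missing(rows):
    """Hypothetical operator: discard rows that contain a missing value."""
    return [r for r in rows if None not in r.values()]


def add_mean_column(column):
    """Hypothetical configurable operator: attach the column mean to each row."""
    def operator(rows):
        avg = mean(r[column] for r in rows)
        return [{**r, column + "_mean": avg} for r in rows]
    return operator


def run_process(rows, operators):
    """Execute the process: feed each operator's output into the next one."""
    for op in operators:
        rows = op(rows)
    return rows


# Invented example rows; "depth" is a made-up column name.
data = [{"depth": 10}, {"depth": None}, {"depth": 30}]
result = run_process(data, [drop_missing, add_mean_column("depth")])
print(result)
```

Swapping or reordering the entries in the operator list plays the role of rewiring the process on the dashboard.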
RapidMiner Demonstration
Prediction I/O
• A machine learning server, deployed as a web service, that creates predictive models in real time
• Separated into three layers: the Platform, the Event Server, and the Engine
• The platform controls and maintains the event server and the engine
• Data is input via the event server
• The engine builds predictive models based on the input
• Each engine for creating predictive models is highly customizable
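The split between an event server that receives data and an engine that builds a model from it can be sketched as below. This is an in-memory toy under stated assumptions, not the Prediction I/O API: the class names, method names, the `pump_type` field, and the events are all hypothetical, and the "model" is just a most-common-label lookup.

```python
from collections import Counter, defaultdict


class EventServer:
    """Hypothetical stand-in for the layer that receives input data."""

    def __init__(self):
        self.events = []

    def record(self, features, label):
        """Store one observed event as a (features, label) pair."""
        self.events.append((features, label))


class Engine:
    """Hypothetical engine: predicts the most common label per feature value."""

    def __init__(self, feature):
        self.feature = feature  # the customizable part: which field to model on
        self.model = {}

    def train(self, events):
        """Build the lookup table from the events held by the event server."""
        counts = defaultdict(Counter)
        for features, label in events:
            counts[features[self.feature]][label] += 1
        self.model = {
            value: c.most_common(1)[0][0] for value, c in counts.items()
        }

    def predict(self, features, default=None):
        """Return the learned label, or the default for an unseen value."""
        return self.model.get(features[self.feature], default)


# Invented events for illustration only.
server = EventServer()
server.record({"pump_type": "hand"}, "functional")
server.record({"pump_type": "hand"}, "functional")
server.record({"pump_type": "hand"}, "non functional")
server.record({"pump_type": "motor"}, "non functional")

engine = Engine("pump_type")
engine.train(server.events)
print(engine.predict({"pump_type": "hand"}))
```

In this sketch the code that creates the server and engine and connects them stands in for the platform layer.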
Prediction I/O Demonstration