Data Mining
Scratching the surface
And now you have… Datameer, RapidMiner, Windows Azure Marketplace
by Prateek Burman
Datameer
• Targeted at Hadoop users
• Around since 2009
• Integrate, Analyze, Visualize
• Scalable
• Secured access
• Excel-like interface
Datameer cont’d.
Integration
• Oracle, DB2, MS SQL, MySQL
• Teradata, Greenplum
• XML, JSON, CSV
• HBase, Cassandra
• Twitter, Facebook, LinkedIn
• Email
• Log files
• SaaS: CRM, GitHub, JIRA
Analytics
• Time series analysis
• Clustering
• Decision trees
• Built-in recommendation engine
• Column dependencies
• Predictive analysis with R, PMML
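To make the "Clustering" bullet concrete, here is a minimal k-means sketch in plain Python. It is a generic illustration of the technique, not Datameer's implementation; the function name `kmeans`, the sample points, and the starting centroids are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], centroids: List[Point],
           iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Plain k-means on 2-D points, starting from the given centroids."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

The starting centroids are fixed rather than random so the run is repeatable; a production tool would normally pick them from the data.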
Datameer cont’d.
Visualization
• Graphs
• Maps
• Shapes
• Tables
• Dashboard
• HTML5
• Visualization apps from apps market
RapidMiner – formerly Yet Another Learning Environment (YALE)
• Around since 2001
• Open source (older versions)
• Client/server model, with the server offered as SaaS
• Most popular for data analytics
• GUI-based – no need to write code
• Predictive analysis
• Text mining
• Sentiment analysis
• Direct marketing
• Predictive maintenance
RapidMiner cont’d…
• LabVIEW-style layout
• No coding – minimal likelihood of error
• One operator's output is another operator’s input
• Only structured datasets
• 3D graphics & interactive dashboards
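The "one operator's output is another operator's input" idea can be sketched in plain Python. This is not RapidMiner's API: the operator names `drop_missing` and `normalize`, the runner `run_process`, and the sample rows are hypothetical, chosen only to show how chained operators pass a dataset along.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Operator = Callable[[List[Row]], List[Row]]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Operator: discard rows that contain a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def normalize(column: str) -> Operator:
    """Build an operator that rescales one column to the range [0, 1]."""
    def op(rows: List[Row]) -> List[Row]:
        if not rows:
            return []
        values = [r[column] for r in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        return [{**r, column: (r[column] - lo) / span} for r in rows]
    return op


def run_process(rows: List[Row], operators: List[Operator]) -> List[Row]:
    """Feed each operator's output into the next operator, in order."""
    return reduce(lambda data, op: op(data), operators, rows)


# Hypothetical structured dataset with one incomplete row.
data = [
    {"age": 20.0, "income": 30.0},
    {"age": None, "income": 50.0},
    {"age": 40.0, "income": 70.0},
]
result = run_process(data, [drop_missing, normalize("age")])
print(result)
```

Order matters here: `normalize` assumes no missing values, so `drop_missing` has to run first, much as operators are wired in sequence on a visual canvas.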
Windows Azure Marketplace
• Launched in 2010
• Hundreds of apps
• Thousands of subscriptions
• Trillions of data points
• Scalable – load balance
• No need to move data
Data Marketplaces
• GitHub/svn of data
• Point of discoverability
• Clean, ready-to-use data
• An economic model for broad access
• OData standard
• Excel, SQL Server, Office
• Deliver using RESTful web-service access
• Infochimps
• Factual
• Datamarket
• Gnip
• Datasift
• Kasabi
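As a rough illustration of "OData standard" plus "RESTful web-service access", the sketch below builds an OData query URL from the standard system query options `$filter`, `$select`, and `$top`. The service root `https://example.com/odata/v1`, the entity set `CityCrime`, and the helper `build_odata_url` are placeholders invented for this example, not a real marketplace feed; no request is actually sent.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode


def build_odata_url(service_root: str, entity_set: str, *,
                    filter_expr: Optional[str] = None,
                    select: Optional[List[str]] = None,
                    top: Optional[int] = None) -> str:
    """Compose an OData query URL from standard system query options."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    if top is not None:
        options["$top"] = str(top)
    # Leave "$", "," and "'" unescaped so the URL stays readable;
    # spaces are percent-encoded as %20.
    query = urlencode(options, quote_via=quote, safe="$,'")
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return f"{url}?{query}" if query else url


# Placeholder service root and entity set (assumptions for illustration).
url = build_odata_url(
    "https://example.com/odata/v1",
    "CityCrime",
    filter_expr="State eq 'Texas'",
    select=["City", "Population"],
    top=5,
)
print(url)
```

Because the query is just a URL, any HTTP client can fetch the result, and the same feed can be opened from spreadsheet or database tooling.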
R
• Cutting-edge algorithms
• Learning curve
• Need to import data
• Slow
RapidMiner
• Intuitive
• Can execute R scripts
• Can be extended using Java or Ruby scripts
• Pretty graphics
• Need to import data
• Cron scheduler
Datameer
• Point & click
• Excel-like interface
• Extensible to R, Python etc.
• Need to import data
• Supports many Hadoop distributions
• Optimized for Hadoop
• Business infographics & dashboards
• HTML5 – view anywhere
Azure Marketplace
• Known tools like Excel
• Data readily available
• Cleaner data
• Other Windows services