Team: #19 Presenter: Xiaozhe Wang Yue Gu

Ricardo: Integrating R and Hadoop Team: #19 Presenter: Xiaozhe Wang Yue Gu

Agenda
• Background
• Introduction to R
• Disadvantages of Current Strategies
• Introduction to Ricardo
• Overview of Ricardo’s Architecture
• Evaluation
• Reference

Background Data Mining Examples
• Amazon: personalized product recommendations
• Netflix: movie recommendations based on each customer's tastes

Introduction to R R’s Functionality for Data Mining
• Principal and independent component analysis
• k-means clustering
• SVM classification
• Generalized-linear models
• Latent-factor models
• Bayesian models
• Time-series models

Introduction to R R: Simplified Method for Data Mining
• K-means algorithm
• K-means in R
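To make the contrast on this slide concrete, here is a minimal sketch of the k-means algorithm (Lloyd's iteration) written out by hand. It is illustrative only and is not taken from the slides: the function name `kmeans`, its parameters, and the sample points are assumptions. In R the same task is typically a single call to the built-in `kmeans` function, which is the simplification the slide points to.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k data points as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    # clusters holds the result of the last assignment step
    return centers, clusters


# Hypothetical data: two well-separated groups of 2-D points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(sample, k=2)
```

The two steps inside the loop (assign, then update) are the whole algorithm; everything else is bookkeeping that a statistical environment such as R hides behind one function call.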

Disadvantages of Current Strategies for Scaling Data Mining
• Exploit vertical scalability
  • Limited
  • Expensive
• Sample the dataset
  • Loses important features
  • Loses accuracy
• Use a large-scale data management system (DMS)
  • Less functionality

Introduction to Ricardo Ricardo: R and Hadoop

Architecture Overview of Ricardo’s Architecture
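The Ricardo paper describes splitting an analysis into a part handled by Hadoop (large-scale data management) and a part executed by R (statistical modeling on small results). The sketch below illustrates that division of labor in plain Python under stated assumptions: it does not use Ricardo, Jaql, Hadoop, or R, and the names `map_record`, `combine`, `aggregate`, and `fit_line` are hypothetical. The "data tier" reduces many records to a handful of sums, and the "analytics tier" fits a straight line from those sums alone.

```python
from functools import reduce


def map_record(record):
    """Data tier, map step: emit per-record statistics (count, x, y, x*x, x*y)."""
    x, y = record
    return (1, x, y, x * x, x * y)


def combine(a, b):
    """Data tier, reduce step: element-wise sum of two statistics tuples."""
    return tuple(u + v for u, v in zip(a, b))


def aggregate(records):
    """Scan all records once and return the five summed statistics."""
    return reduce(combine, map(map_record, records), (0, 0.0, 0.0, 0.0, 0.0))


def fit_line(stats):
    """Analytics tier: least-squares line y = slope * x + intercept from the sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical records lying exactly on y = 2x + 1.
records = [(0, 1), (1, 3), (2, 5), (3, 7)]
slope, intercept = fit_line(aggregate(records))
```

Only the five aggregated numbers cross from the data tier to the modeling step, so the modeling code never needs to hold the full dataset in memory. That is the general idea this sketch is meant to convey, not a description of Ricardo's actual interfaces.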

Evaluation Performance and Scalability
• Objective: simulate a real recommender system
• Dataset: original Netflix competition dataset
• Jaql requires about twice as much time as raw Hadoop
  • Jaql works at a higher level of abstraction

Conclusion Conclusion • Ricardo, a scalable platform

Reference S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: Integrating R and Hadoop. In SIGMOD, 2010. http://www.mpi-inf.mpg.de/~rgemulla/publications/das10ricardo.pdf

Questions?

Thanks!

