Explorations into Internet Distributed Computing Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu
Project Overview • Design and implement a simple internet distributed computing framework • Compare application development in this environment with a traditional parallel computing environment
Grapevine An Internet Distributed Computing Framework - Kunal Agrawal, Kevin Chu
Motivation • Supercomputers are very expensive • Large numbers of personal computers and workstations around the world are naturally networked via the internet • Huge amounts of computational resources are wasted because many computers spend most of their time idle • Growing interest in grid computing technologies
Internet Distributed Computing Issues • Node reliability • Network quality • Scalability • Security • Cross-platform portability of object code • Computing paradigm shift
[Architecture diagram: a client application submits work to the Grapevine server, which distributes tasks to multiple Grapevine volunteer nodes]
Grapevine Features • Written in Java • Parameterized tasks (sketched below) • Inter-task communication • Result reporting • Status reporting
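To make "parameterized tasks" concrete, here is a minimal sketch of what a task interface in a framework like Grapevine might look like. The interface name and methods are assumptions for illustration; the slides do not show Grapevine's actual API.

```java
// A minimal sketch of a parameterized task; names are hypothetical.
import java.io.Serializable;

public interface GrapevineTask extends Serializable {
    // Parameters are supplied when the task is created, so the same
    // task class can be reused with different inputs.
    void setParameters(Serializable params);

    // Executed on a volunteer node; the return value is sent back
    // to the server for result reporting.
    Serializable run() throws Exception;
}
```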
Unaddressed Issues • Node reliability • Load balancing • Non-intrusive operation • Interruption semantics • Deadlock
Meta Classifier - Ang Huey Ting, Li Guoliang
Classifier • A function mapping an instance to {True, False} • Machine learning approach • Build a model on the training set • Use the model to classify new instances (a WEKA sketch follows this list) • Publicly available packages: WEKA (in Java), MLC++
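As an illustration of the build-then-classify workflow, here is a minimal WEKA sketch; the file names and the choice of Naïve Bayes are placeholder assumptions.

```java
// Train a single classifier on a training set, then classify a new instance.
import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class SingleClassifier {
    public static void main(String[] args) throws Exception {
        // Load the training set; the last attribute is the class label.
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
        train.setClassIndex(train.numAttributes() - 1);

        // Build a model on the training set.
        Classifier model = new NaiveBayes();
        model.buildClassifier(train);

        // Use the model to classify a new instance.
        Instances test = new Instances(new BufferedReader(new FileReader("test.arff")));
        test.setClassIndex(test.numAttributes() - 1);
        double label = model.classifyInstance(test.instance(0));
        System.out.println("Predicted class: " + test.classAttribute().value((int) label));
    }
}
```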
Meta Classifier • An assembly (ensemble) of classifiers • Typically gives better accuracy than any single member • Two ways of generating the assembly • Different training data sets • Different algorithms • Predictions combined by voting (sketched below)
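A minimal sketch of majority voting over an ensemble's predictions, assuming each prediction is a class index in the WEKA convention:

```java
// Majority vote over an array of class-index predictions.
import java.util.HashMap;
import java.util.Map;

public class MajorityVote {
    // Returns the class index predicted by the most classifiers;
    // ties are broken by whichever class reaches the top count first.
    public static double vote(double[] predictions) {
        Map<Double, Integer> counts = new HashMap<Double, Integer>();
        double winner = predictions[0];
        int best = 0;
        for (double p : predictions) {
            int c = counts.containsKey(p) ? counts.get(p) + 1 : 1;
            counts.put(p, c);
            if (c > best) {
                best = c;
                winner = p;
            }
        }
        return winner;
    }
}
```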
Building the Meta Classifier • Different training data sets - bagging • Randomly generated 'bags': selection with replacement • Creates different 'flavors' of the training set (a bag-generation sketch follows this list) • Different algorithms • E.g. Naïve Bayes, neural network, SVM • Different algorithms work well on different training sets
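A minimal sketch of generating bags with WEKA's Instances.resample, which draws a bootstrap sample (selection with replacement) of the same size as the original set; the seed handling is an illustrative choice.

```java
// Generate numBags bootstrap samples of the training set.
import java.util.Random;
import weka.core.Instances;

public class BagGenerator {
    public static Instances[] makeBags(Instances train, int numBags, long seed) {
        Instances[] bags = new Instances[numBags];
        for (int i = 0; i < numBags; i++) {
            // Each bag is a different 'flavor' of the training set.
            bags[i] = train.resample(new Random(seed + i));
        }
        return bags;
    }
}
```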
Why Parallelise? • Computationally intensive: one classifier = 0.5 hr, so a meta classifier built from 10 classifiers = 10 × 0.5 = 5 hr sequentially • Distributed environment - Grapevine • Classifiers can be built in parallel, independently • Little communication required
Distributed Meta Classifiers • WEKA - machine learning package • University of Waikato, New Zealand • http://www.cs.waikato.ac.nz/~ml/weka/ • Implemented in Java • Includes most of the popular machine learning algorithms
Distributed Meta-Classifiers on Grapevine Distributed Bagging • Generate the different bags • Define a bag and an algorithm for each task • Submit the tasks to Grapevine • Volunteer nodes build the classifiers • Receive the results • Perform voting (a workflow sketch follows this list)
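The workflow above might look roughly like the following sketch. The GrapevineClient interface and its methods are hypothetical stand-ins, since the slides do not show the framework's actual API; the voting helper is the one sketched earlier.

```java
// Distributed bagging: one bag per task, built remotely, combined by voting.
import java.util.Random;
import weka.classifiers.Classifier;
import weka.core.Instances;

public class DistributedBagging {

    // Hypothetical client interface for submitting tasks and collecting results.
    interface GrapevineClient {
        void submit(Instances bag) throws Exception;                 // ship one bag as a task
        Classifier[] collectResults(int expected) throws Exception;  // receive built models
    }

    public static double classify(GrapevineClient client, Instances train,
                                  Instances test, int numBags) throws Exception {
        // 1. Generate a different bag for each task and submit it.
        for (int i = 0; i < numBags; i++) {
            client.submit(train.resample(new Random(i)));
        }
        // 2. Volunteer nodes build the classifiers; receive the results.
        Classifier[] models = client.collectResults(numBags);
        // 3. Perform voting over the ensemble's predictions.
        double[] predictions = new double[models.length];
        for (int i = 0; i < models.length; i++) {
            predictions[i] = models[i].classifyInstance(test.instance(0));
        }
        return MajorityVote.vote(predictions); // voting helper sketched earlier
    }
}
```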
Preliminary Study • Bagging on Quick Propagation in OpenMP • Implemented in C
Trial Domain • Benchmark corpus Reuters-21578 for text categorization • 9,000+ training documents • 3,000+ test documents • 90+ categories • Perform feature selection • Preprocess documents into feature vectors (a preprocessing sketch follows this list)
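A minimal sketch of the preprocessing step using WEKA's StringToWordVector filter; the ARFF file name and vocabulary size are placeholder assumptions.

```java
// Convert raw documents (string attributes) into bag-of-words feature vectors.
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class Preprocess {
    public static Instances toFeatureVectors(String arffFile) throws Exception {
        // Raw documents stored as string attributes in an ARFF file.
        Instances raw = new Instances(new BufferedReader(new FileReader(arffFile)));
        raw.setClassIndex(raw.numAttributes() - 1);

        // Turn each document into a word-count feature vector, keeping a
        // bounded vocabulary as a simple form of feature selection.
        StringToWordVector filter = new StringToWordVector();
        filter.setWordsToKeep(1000);
        filter.setInputFormat(raw);
        return Filter.useFilter(raw, filter);
    }
}
```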
Summary • Successful internet distributed computing requires addressing many issues outside of traditional computer science • Distributed computing is not a fit for every application: it pays off mainly for compute-heavy tasks that need little communication