400 likes | 499 Views
Project in Networked Software Systems (044167) Distributed System. May 2009. Supervisors & Staff . Suprevisors : Mr. Maxim Gurevich Dr. Ilana David Developers: Saeed Mhameed Hani Ayoub. Agenda. Problem Definition & Solution Why bothering? Implementation Techniques
E N D
Project inNetworked Software Systems(044167)Distributed System May 2009
Supervisors & Staff • Suprevisors: • Mr. Maxim Gurevich • Dr. Ilana David • Developers: • SaeedMhameed • Hani Ayoub
Agenda • Problem Definition & Solution • Why bothering? • Implementation Techniques • Problems (& Solutions) • HLD (High-Level Design) • Main Communication Data-Flows • UI snapshots • Demo Example: Distributed webpage downloading and parsing • Performance Analysis
Problem Definition • Main Problem: Executing large set of small computational tasks consumes numerous processing time on a single machine. • Tasks are homogeneous and non-related • Executing is serial. • Execution order is not significant.
Solution Distribute tasks over more than one machine and exploit computational power of remote machines.
Why bothering? • Several solution already exist, such as: • Condor • Complex Syntax • One task per run • Not developer-friendly • MPI • Networking understanding needed • Executing and Synchronizing tasks is the user responsibility • And more…
Why bothering? (cont.) • Implementing new solution: • User-friendly API. (ease of usage) • User transparent. • Dynamic System-Management. • Task generic. • Easy to convert from serial to parallel.
Implementation Techniques • Java: • Object oriented • Cross-Platform • Java RMI (Remote Method Invocation) • Easy to use • Transparent networking mechanism
Problems • Firewall • Load Balancing • Auto-Update • Efficiency • Execution Transparency • Fault Tolerance
Firewall • Problem: • A firewall can exist between user machine and remote-machines (Executers). • Solution: • One side connection (user side)
B A Executers Machines Client Machine Send Task Get Results Send Task Send Task Send Result Send Result Send Result Firewall Firewall (cont.)
Load Balancing (Scheduling) • Problem: • Tasks can be submitted to the system asynchronously. Load balancing needed for efficiency. • Solution: • Distributing tasks considering remote-machine load, task weight and priority. • Round-Robin based. • Prevent starvation.
Auto-Update • Problem: • In regular system, when end-user need to update the tasks and the code executing them (for fixing bugs or changing tasks executing purposes) , he needs to go over all remote-machines (Executers) -which can be far far away- and update the code on the machines themselves. • Solution: • Support the ability of updating the code from the System Manager machine –which is always near the end-user. • RMI connection problem arises.
Efficiency • Problem: • Executing large amount of small tasks one by one could be inefficient (overhead expenses). • For each task: • Scheduling • Sending RMI messages • Solution: • formation of smaller units of information into large coordinated units – Chunking. • Send a collection of tasks to the same remote-machine (Executer).
Execution Transparency • Problem: • Same I/O operation’s output must be the same in both serial and distributed system. • Feel like running locally. • Output Stream • Exceptions • More… • Solution: • Simulating these operations.
Fault Tolerance • Problem: • Remote-Machines may disconnect the network anytime. • Executing tasks on the machine will be lost. • Solution: • Failure Detector • Save the machine state until it connect again, then resumes its work.
High-Level Design Task System Manager Client Executer UI Result
High-Level Design (cont.) • Concrete Components • (Provided by user)
Common Components • Item • Base communication item for the System. • Task, Result, Chunk etc… • Task • Type of task user wants to execute. • Result • the result of the Task execution. • Chunk • holds a bundle of Items for network optimizations • Used for efficiency
Common Components (cont.) • Synchronized Counter Mechanism • Used to determine whether the user is idle or not. • Log Tracing • Log tracer, resided on each remote-object. • Used basically for debug reasons and I/O redirection. • Networking Layer • Responsible for communication purposes. • Taking in account Firewall existence.
Executer Component • Main Functionality • Resides on remote-machine waiting for tasks to execute. • Task Executer • Holds the concrete task Executer provided by the user. • Results Organization • Preparing results for Clients
Client • Main Functionality • Resides on user-machine. • Provides the implementation for the user API. • Results Collector • Polling prepared results from Executers.
System Manager • Main functionality • Match-making between Clients and Executers. (Scheduling) • Holds lists of Executers/Clients connected • Manages common system operations • Auto-Update • Clean Exit • Etc… • Failure Detection Mechanism
Task1 Task2 Main User Scenario Updates Client System Manager User Interface Firewall Result1 Result2 Executer 1 Executer 2 Executer 3
User Interface • System Manager Tab
User Interface • Executers Tab
User Interface • Executer Trace Log
User Interface • Update Tab
Demo Example: Distributed webpage downloading and parsing • Distributed webpage downloading and parsing demo • Created in order to test • System performance • Ease of usage • From Serial code to Distributed using the system API • Tested on Windows (XP) and Linux (Suse)
Before publicclassDownloadFiles { privatefinal String rootDir_; privatefinalDocIndexRepositorydownloadedDocs_; privatevoid run(String inputFileName) throws Exception { BufferedReader input = FileHandler.openFileReader(inputFileName); while(true) { • String url = input.readLine(); try { String text = downloadAndParseFile(url); • String fullFileName = createOutputFile(rootDir_); • writeResultToFile(text, fullFileName); • outputFileStream.close(); } catch (Exception e) { System.out.println(e.getMessage()); } } This Line needs to be distributed to make it possible run over more than one machine
Changes to be done • Implement • Task class • Result class • Executer class (downloadAndParseFile function) • Modify the code using the API
After • Task Code: • importdiSys.Common.Item; publicclassDownloadTaskextends Item { public String url; publicDownloadTask() { super(0); • this.url=""; } • publicDownloadTask(long id, String url) { super(id); this.url=url; } public String toString(){ return"Task ID: " + this.getId(); } } Result Code: importdiSys.Common.Item; publicclassDownloadResultextends Item { public String text; public String url; publicDownloadResult(long id) { super(id); } }
After (cont.) • Executer Code: publicclassDownloadExecuterimplementsIExecutor<DownloadTask,DownloadResult> { publicDownloadResult run(DownloadTask task) throws Exception { DownloadTask task=task; DownloadResult res=newDownloadResult(); • //Output will be redirected (appears on Client machine) System.out.println("trying to download url: "+task.url); //Do the job... res.text = downloadAndParseFile(task.url); res.url = task.url; System.out.println("url: " + task.url + " downloaded!); return res; } }
publicclassDownloadClient { Modified Client Code privatefinal String rootDir_; privatefinalDocIndexRepositorydownloadedDocs_; public static void main(Sting[]args)throws Exception { • //Initialization (connecting to system manager) • RemoteClient<DownloadTask, DownloadResult> client = • newRemoteClient<DownloadTask,DownloadResult>("localhost",5000 /*port*/, 10 /*chunk size*/); • //Start the Client • client.Start(); BufferedReader input = FileHandler.openFileReader(inputFileName); • while((String url = input.readLine())!= null) {client.addTask(new DownloadTask(url)); //Submit tasks to System • //String text = downloadAndParseFile(url); • } • while(client.GetTaskNum() > 0) //While there are tasks in execution, try to get results try{ DownloadResultdr = client.GetResult(); • //May throw exception (if exception thrown in Executer code) • String fullFileName = createOutputFile(rootDir_); • writeResultToFile(dr.text, fullFileName); • outputFileStream.close(); } catch (Exception e) { System.out.println(e.getMessage()); } } cilent.Stop() } After (cont.)
Performance AnalysisStudy on a sample application: Distributed webpage downloading and parsing • The system has been tested with the following workload: the Task is downloading a web page given its URL, and the Result is text extracted from the HTML of the web-page using external HtmlParser library.
Benchmark #1 • the Performance (Total time) of executing 150 Tasks on 1, 2, 4 and 8 Executers which ran over 2 windows different machines
Benchmark #2 • the Performance (Total time) of executing 150 Tasks on 3, 6, 9 and 12 Executers which ran over 3 machines • (2 windows machines and 1 Linux machine).