This progress report highlights the resolution of issues related to client-side local file system control and explores the challenges of combining and reducing data within the cloud environment. It also discusses non-intuitive experiences with Hadoop's mapping algorithm and the exploration of special files within Hadoop. Additionally, it covers the development of a cloud shell program and expert feedback received through travel engagements at the ASP-DAC conference and HKU.
Progress Report 2010/3/30
1. Successful client-side behavior
Since the last report, we have solved the issues involving client-side local file system control. It seems that the primary problem was that we had installed our Hadoop system in a configuration unsuited to our purposes: the client computer's firewall prevented jobs from running locally, and with no jobs running on the client, we could not observe any client-side behavior.
2. Combine/Reduce exploration
Now that we can trigger processes on the client computer, we can execute Linux shell scripts locally. The difficulty, however, is that the result of the shell script will be a local file. We must now identify the proper way to bring this local information into the cloud. This is, in effect, the problem of diverting the input of the combine/reduce stage so that it can come from a local file.
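One straightforward fallback we are evaluating is simply to copy the script's local output into HDFS once the script finishes, so that it becomes ordinary input for a later reduce-side job. The following is a minimal sketch using Hadoop's FileSystem API; the local and HDFS paths are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: push the result file of a locally executed shell script into
// HDFS so a subsequent combine/reduce stage can read it.
public class PushLocalResult {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up core-site.xml
        FileSystem hdfs = FileSystem.get(conf);    // handle to the cluster FS

        Path local = new Path("/tmp/script-output.txt");   // placeholder local result
        Path remote = new Path("/user/cloudshell/input/"); // placeholder HDFS directory

        // Copy the local file into the distributed file system; after this,
        // it can serve as ordinary input to a reduce-side job.
        hdfs.copyFromLocalFile(local, remote);
    }
}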
2. Combine/Reduce exploration (cont'd)
Our students are studying Hadoop's combiner mechanism to see how such a redirection might be accomplished. There are also alternatives, such as communication through the file system, and we are measuring the overheads of each alternative.
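For reference, the combiner is normally installed through a single job-configuration call; the sketch below shows that standard hook (Hadoop 0.20 API, using the identity Mapper/Reducer as placeholders), which is the point at which we are investigating whether input could be redirected.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: the standard hook for Hadoop's combiner mechanism. The combiner
// runs on map output locally, before anything crosses the network.
public class CombinerHook {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "combiner-demo");
        job.setMapperClass(Mapper.class);     // placeholder map stage
        job.setCombinerClass(Reducer.class);  // local, map-side aggregation
        job.setReducerClass(Reducer.class);   // cluster-side reduce stage
        // ... input/output paths and key/value types would be set here ...
    }
}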
3. Reasoning about Map
We have had some non-intuitive experiences with Hadoop's algorithm for mapping jobs onto local processes. In some cases, jobs were assigned to clients that did not locally own the file to be processed; in such cases, our entire approach becomes irrelevant. Further study has shown that this happens only occasionally. We are trying to determine the cause, so that we can properly tune our piping feature.
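To investigate these occasional non-local assignments, HDFS can be asked directly which hosts physically hold a file's blocks, and that list can then be compared against the host a task actually ran on. A minimal sketch follows; the file path is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list which hosts hold each block of a file, so a task's actual
// host can be checked against the owners of its input.
public class WhereIsMyFile {
    public static void main(String[] args) throws Exception {
        FileSystem hdfs = FileSystem.get(new Configuration());
        Path file = new Path("/user/cloudshell/input/data.txt"); // placeholder
        FileStatus status = hdfs.getFileStatus(file);
        BlockLocation[] blocks =
            hdfs.getFileBlockLocations(status, 0, status.getLen());
        for (int i = 0; i < blocks.length; i++) {
            // Each block may live on several hosts (one per replica).
            System.out.println("block " + i + " on: "
                + java.util.Arrays.toString(blocks[i].getHosts()));
        }
    }
}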
4. Exploring special files
Hadoop allows a file to be stored in multiple copies, and a large file can even be split into blocks. These situations will affect the behavior of our pipes; we do not want to process a file twice, for instance. We are studying how Hadoop maps these situations onto tasks.
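One standard way to keep a pipe from seeing a file in pieces is to tell the input format not to split it, so each file is handed whole to exactly one map task; the sketch below shows that override. Replication, by contrast, should not by itself cause duplicate processing, since replicas are extra copies rather than extra splits, though speculative execution can run a task twice.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch: an input format that refuses to split large files, so each file
// reaches exactly one map task as a whole stream. If pipes have side
// effects, speculative execution (which may run a task twice) can be
// disabled via mapred.map.tasks.speculative.execution.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: one file -> one map task
    }
}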
5. Cloud Shell development
Our cloud shell program must be able to take Unix scripts, divide them into parallelizable fragments, ship these fragments to the clients, and trigger the execution of the fragments, in sequence, from Hadoop. Currently, we are attempting to analyze scripts to extract the parallelizable portions.
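As a first cut at that analysis, one toy heuristic is to split a one-line pipeline at its '|' boundaries and treat the record-at-a-time stages before the first order-sensitive command (such as sort) as candidates for per-split, map-style parallel execution. The sketch below is only an illustration of that idea, not our actual analyzer; the example pipeline and the list of order-sensitive commands are assumptions.

import java.util.Arrays;
import java.util.List;

// Toy sketch of fragment extraction: split a pipeline on '|' and mark the
// prefix of record-at-a-time stages (before the first order-sensitive
// command) as parallelizable, map-style fragments.
public class PipelineFragmenter {
    private static final List<String> ORDER_SENSITIVE =
        Arrays.asList("sort", "uniq", "tail", "wc");

    public static void main(String[] args) {
        String script = "grep ERROR | cut -f2 | sort | uniq -c"; // example
        String[] stages = script.split("\\|");
        int barrier = stages.length;
        for (int i = 0; i < stages.length; i++) {
            String cmd = stages[i].trim().split("\\s+")[0];
            if (ORDER_SENSITIVE.contains(cmd)) { barrier = i; break; }
        }
        System.out.println("parallelizable (map-side): "
            + Arrays.toString(Arrays.copyOfRange(stages, 0, barrier)));
        System.out.println("sequential (reduce-side): "
            + Arrays.toString(Arrays.copyOfRange(stages, barrier, stages.length)));
    }
}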
6. Expert feedback through travel
• The professor attended ASP-DAC in Taipei. At this conference there were several papers related to cloud/GRID computing, and he discussed our research issues with some of the attendees.
• He also visited HKU, where Professor Wang works on JESSICA2, which I have been looking at for possible use. (I had met Dr. Wang at ISPAN in December and been invited to visit his lab and talk with him.) We had an extensive discussion and are exploring the potential for collaboration. He is also looking into cloud applications for JESSICA2 (and 3).