This presentation discusses the research goals, overview, comparative benchmarks, and applications of Tycho, a framework that combines registry and messaging services into a single software framework for simplified binding of distributed systems. The presentation also includes a demo of Tycho Swarm, a distribution file utility.
Tycho: A Resource Discovery and Messaging Framework for Distributed Applications Matthew Grove, m.grove@reading.ac.uk. Viva Presentation, November 2006
Outline • Research Goals, • An Overview Of Tycho, • Comparative Benchmarks, • Applications of Tycho, • Tycho Swarm, a Distribution File Utility - (Demo), • Summary.
Some Background • Two key services for distributed systems are a mechanism for discovering remote components (such as a registry) and a mechanism for sending messages between these components: • These two services are interdependent. • Current solutions require application scientists to assemble their systems from a diverse range of services. • One approach has been to produce toolkits which bundle pre-selected sets of services together, for example Globus.
Research Goals • The thesis of this research work is that by combining registry and messaging into a single software framework, the task of binding together distributed systems can be simplified. • The proposed solution uses an Internet-based architecture that keeps complexity at the edges of a robust and secure set of core services - a novel approach! • This framework facilitates extensibility while limiting the installation and management costs of using the software. • The design and development of the framework - known as Tycho - has an overarching goal of reducing the complexity of developing distributed applications.
High-level Requirements These are the desirable features for Tycho - as argued in the dissertation: • Scalability, be able to cope with the sizes typical of modern distributed systems, • High-performance, • Extensibility, be able to add new features and interoperate with other systems, • Security out of the box, • Manageability, ease of installation and use: • For example minimizing elements like software dependencies, firewall requirements and the amount of configuration needed to deploy Tycho.
The Tycho Implementation • Tycho is the reference implementation of the framework developed during the PhD: • The Tycho components are: • Mediators, • Clients (Producers and Consumers), • Utilities: • The Tycho mediator provides services that allow clients to discover each other using a Virtual Registry (VR) made up of a network of mediators – this also aids communication over both LAN and WAN. • Utilities are extensions to Tycho’s functionality. • Tycho used to be called javaGMA or jGMA (poor choice of name!)
General Design Philosophy • Reuse existing software components, if possible, rather than reinvent existing services or functionality. • Try to make use of existing software infrastructure. • Ensure that Tycho is simple to install, configure and use. • Provide a ‘basic release’ with the ability to extend functionality with a further more sophisticated component - Tycho utilities. • Because we require portability and interoperability with other distributed systems, Java was a good choice of implementation language.
Tycho Mediator Implementation • Tycho provides a choice of implementations for each core service. • Tycho’s design was described in a paper for the "Work-in-Progress Novel Grid Technologies" track of the IEEE International Conference Cluster Computing and Grid 2005 (CCGrid 2005).
Tycho Clients & Utilities • The Tycho Connector provides the API for building producers and consumers. • Extra functionality can be added as utilities.
Tycho Benchmarks • Three rounds of benchmarking to measure the performance of Tycho compared to state-of-the-art and widely used systems: • Communications - measured the performance of inter-client and inter-mediator messaging for Tycho and NaradaBrokering. • Virtual Registry tests - measured and compared the performance of the Tycho VR to Globus MDS4 and gLite R-GMA. • Component Tests - different components of the VR were tested in various configurations. • Results presented in a paper in proceedings of the IEEE International Conference on Cluster Computing 2006 (Cluster 2006).
Sample VR Benchmark Results • [Figure: sample Virtual Registry benchmark plot, annotated "MDS4 out of memory"]
Benchmark Results Summary • Tycho has better performance and client scalability than R-GMA, MDS4 and NaradaBrokering. • R-GMA, MDS4 and NaradaBrokering all crashed during testing when they exceeded the maximum memory available for the tests (1.5 Gbytes). • Memory management in Java systems is an issue: • Without bounded buffering or flow control, the Java heap can be exhausted. • Storing information internally using XML seems to be a source of some of these memory problems: • Java database solutions such as HSQLDB can provide a high-performance solution for off-loading some of the storage requirements to disk.
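The buffering point above can be illustrated with a minimal sketch. It is not Tycho code: the class `BoundedMessageBuffer` and its method names are hypothetical, and it assumes messages are plain byte arrays. The idea is that a fixed-capacity queue refuses new messages once it is full, so a fast producer is told to back off instead of filling the heap.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper, not part of Tycho: a message buffer with a hard capacity.
class BoundedMessageBuffer {
    private final BlockingQueue<byte[]> queue;

    BoundedMessageBuffer(int capacity) {
        // ArrayBlockingQueue never grows beyond the capacity given here.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking publish: returns false when the buffer is full,
    // which is the signal for the producer to slow down or retry later.
    boolean tryPublish(byte[] message) {
        return queue.offer(message);
    }

    // Non-blocking consume: returns null when nothing is waiting.
    byte[] poll() {
        return queue.poll();
    }

    int pending() {
        return queue.size();
    }
}
```

With a capacity of two, a third `tryPublish` call returns `false` until the consumer has called `poll()`. Memory use is then bounded by the capacity multiplied by the largest message size, whatever the producer's rate.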
Tycho Core – Future Work • Some more performance improvements: • Caching of local mediator queries to reduce response times, • Use of a hybrid VR-interconnect to use IRC for query routing and HTTP for transporting large responses. • Additional functionality can be added to provide advanced services: • WS-based transport handlers for interoperability.
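One way the proposed query caching could look is sketched below. This is an assumption, not the planned Tycho design: the class `QueryCache`, the use of strings for queries and results, and the least-recently-used eviction policy are all chosen here for illustration. The registry lookup is passed in as a function so the sketch stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Tycho code: a small LRU cache in front of a registry lookup.
class QueryCache {
    private final Map<String, String> cache;
    int misses = 0; // counts how often the underlying registry was consulted

    QueryCache(final int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached result if present; otherwise queries the registry and stores the result.
    String lookup(String query, Function<String, String> registry) {
        String cached = cache.get(query);
        if (cached != null) {
            return cached;
        }
        misses++;
        String result = registry.apply(query);
        cache.put(query, result);
        return result;
    }
}
```

The sketch ignores staleness. Registry entries change as producers come and go, so a real cache would also need an expiry time or an invalidation message.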
Tycho Applications • We developed a number of applications to further validate the implementation. • These include: • Demonstrations of publishing and discovering distributed webcams, • Remote resource discovery for the VOTechBroker project: • Part of the European Virtual Observatory project, Tycho provides automatic resource discovery for job submission. • Binding components together for the Semantic Log Analyser (Slogger) project: • Here Tycho helps locate and gather distributed logs for analysis.
Content Distribution With Tycho • We wanted to develop a Tycho utility that would demonstrate and validate the utility concept: • We wanted to create something useful! • We created a content distribution system called the Tycho swarm utility. • The swarm utility provides content distribution similar to BitTorrent and overcomes the common ‘2 Gigabyte file size problem’. • Content is split into ‘chunks’ and the VR is used to store chunk availability. • Peers use the VR to locate each other and decide what chunks to download. • Tycho messages are used to transfer the chunks between peers and peers cooperate to distribute the content throughout the swarm.
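The chunking and chunk-selection steps can be sketched as follows. This is an illustration under stated assumptions, not the swarm utility's code: the class `SwarmSketch`, the fixed chunk size, and the map from chunk index to peer names (standing in for what the VR would report) are all hypothetical. The selection policy shown, lowest-numbered missing chunk that some peer advertises, is also an assumption, since the slides do not say which policy the utility uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Tycho swarm utility itself.
class SwarmSketch {
    // Split content into fixed-size chunks; the last chunk may be shorter.
    static List<byte[]> split(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < content.length; start += chunkSize) {
            int end = Math.min(content.length, start + chunkSize);
            chunks.add(Arrays.copyOfRange(content, start, end));
        }
        return chunks;
    }

    // Pick the lowest-numbered chunk we lack that at least one peer advertises.
    // 'availability' stands in for chunk records a peer would read from the registry.
    // Returns -1 when no missing chunk is currently available.
    static int nextChunk(BitSet have, int totalChunks, Map<Integer, Set<String>> availability) {
        for (int i = have.nextClearBit(0); i < totalChunks; i = have.nextClearBit(i + 1)) {
            Set<String> holders = availability.get(i);
            if (holders != null && !holders.isEmpty()) {
                return i;
            }
        }
        return -1;
    }
}
```

A peer would call `nextChunk` repeatedly, setting the corresponding bit in `have` after each successful transfer, until the method returns -1 with every bit set.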
Swarm Utility Summary. • The utility was developed to test the potential of Tycho utilities and also further stress test the overall infrastructure: • By simultaneously utilising the VR and messaging functionality, • Storing and updating thousands of entry records in the VR, • Sending thousands of multi-megabyte messages between clients. • Its potential uses include: • Distributing files for collaboration purposes, • Staging data for computation, • Mirroring and managing large data sets.
Summary • The reference implementation of Tycho has been completed. • Tycho has been released under the LGPL Open Source license: • http://acet.rdg.ac.uk/projects/tycho/ • The focus now is on developing Tycho utilities to provide more feature-rich functionality. • This work has been summarised in a paper accepted for a special issue of The Journal of Supercomputing.
Research Goals • Scalability and high-performance have been demonstrated by the benchmarking. • Extensibility has been shown with the development of the swarm utility and the different services and protocols supported by Tycho. • Tycho has security ‘out of the box’, using HTTPS and passwords or certificates for wide-area access control and encryption - no comparable system we reviewed has this currently. • Manageability has been maximised: Tycho requires one firewall port, has no external dependencies other than a JVM and can run with zero configuration.
Some Experiences / Observations • Java developers should think carefully about how memory is used in their applications. • Systems which store their data internally as XML will probably have relatively poor performance and require large amounts of memory and resources to work. • If you use a servlet container, Jetty offers much better performance than Apache Tomcat. • Instead of using a separate database, consider the Java-based HSQLDB: we have shown it can achieve excellent performance and it removes an external dependency from your software. • Java is not a magic bullet for portability; systems such as R-GMA are evidence of this.
Links • Project Web page: • http://acet.rdg.ac.uk/projects/tycho/ • The DSG Web page: • http://dsg.port.ac.uk/ • The ACET Web page: • http://acet.port.ac.uk/