myGrid Architectural issues in a bioinformatics Grid http://www.mygrid.org.uk Luc Moreau, University of Southampton, UK
Overview • Bioinformatics background • myGrid facts • Service oriented architecture • Architectural issues • Notification service • Grid component model • Service directory • Conclusions
Bioinformatics & Genomics • Large amounts of data • Highly heterogeneous • Data types • Data forms • Community • Highly complex and inter-related • Volatile
Bioinformatics Data • Descriptive as well as numeric • Literature • Analogy/knowledge-based • Text extraction
Bioinformatics Analysis • Different algorithms • BLAST, FASTA, pSW • Different implementations • WU-BLAST, NCBI-BLAST • Different service providers • NCBI, EBI, DDBJ
The Human Genome Project • The HGP will make available potentially thousands of targets for • Understanding biology & genetics • Drug discovery • Diagnostics • Many genes will be linked with diseases • Cancer • HIV • Parkinson’s • Asthma • Malaria • Autoimmune (arthritis) • Cardiovascular • Antibacterial & antifungal
In silico experimentation • Discovery of resources and tools, staging of operations, sharing of results • Process is as important as outcome • Science is dynamic – change happens • Scientific discovery is personal & global • Provenance and history
myGrid • EPSRC-funded pilot project • Generic middleware within an application setting • 36 months within a 42-month performance period • Start 1st October • 16 full-time postdocs altogether • 6 DTA studentships • 1 technical project manager • 1 system manager • 1 secretarial post
myGrid consortium • Scientific Team • Biologists and Bioinformaticians • GSK, AZ, Merck KGaA, Manchester, EBI • Technical Team • Manchester, Southampton, Newcastle, Sheffield, EBI, Nottingham • IBM, SUN • GeneticXchange • Network Inference, Epistemics Ltd
myGrid outcomes • e-Scientists • Bioinformatics demonstrator (on cold carp) • Developers • myGrid-in-a-Box developer's kit • Integrating some existing bioinformatics tools with myGrid
Architectural Issues • Notification service
Vision • Asynchronous delivery and persistence of messages • Topics can be created and discovered on the fly • Subscribers can subscribe to topics, publishers can publish messages on a given topic • Peer to peer network of notification services • Topology can be re-organized to enhance reliability • Subscribers and publishers can negotiate over QoS
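The core of this vision (topics created and discovered on the fly, with messages held until the subscriber collects them) can be illustrated with a minimal in-memory sketch. This is not the myGrid implementation: the class name `NotificationService` and all method names are hypothetical, "persistence" here only means that messages stay in memory until collected, and the peer-to-peer and QoS aspects are left out.

```python
from collections import defaultdict, deque


class NotificationService:
    """Toy in-memory topic-based notification service (illustrative only)."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # topic -> subscriber ids
        self.mailboxes = defaultdict(deque)   # subscriber id -> pending messages

    def create_topic(self, topic):
        # Topics are created on the fly; creating an existing topic is a no-op.
        self.subscribers[topic]

    def topics(self):
        # Discovery: list the topics currently known to this service.
        return sorted(self.subscribers)

    def subscribe(self, subscriber_id, topic):
        self.subscribers[topic].add(subscriber_id)

    def publish(self, topic, message):
        # Messages are stored per subscriber, so delivery is decoupled
        # from publication and a message survives until it is collected.
        for subscriber_id in self.subscribers[topic]:
            self.mailboxes[subscriber_id].append((topic, message))

    def collect(self, subscriber_id):
        # Hand over and clear everything pending for this subscriber.
        pending = list(self.mailboxes[subscriber_id])
        self.mailboxes[subscriber_id].clear()
        return pending


ns = NotificationService()
ns.create_topic("blast/results")
ns.subscribe("alice", "blast/results")
ns.publish("blast/results", "job 42 finished")
print(ns.topics())
print(ns.collect("alice"))
```

The publisher never waits for the subscriber: `publish` returns as soon as the message is queued, and `collect` can be called at any later time.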
A notification service instance [Figure; labels shown: publisher, publisher stub, publisher delegator, subscriber delegator, subscriber stub, subscriber, notifications, QoS]
Federated notification services • Strong communication links between hubs • Efficient data replication • Simple notification routing [Figure: three hubs (Hub-1, Hub-2, Hub-3); notification services NS-1-1, NS-1-2, NS-1-3, NS-2-1, NS-2-2, NS-3-1, NS-3-2; publishers P-1-1-1, P-1-1-2, P-1-3-1, P-1-3-2, P-2-2-1, P-2-2-2, P-3-1-1, P-3-2-1; subscribers S-1-1-1, S-2-1-1, S-3-1-1]
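One way to read "simple notification routing" in a hub-based federation is sketched below. The topology is an assumption drawn only from the figure labels: each notification service is taken to belong to one hub, and the hubs are taken to be fully connected to each other. The classes `Hub` and `NotificationServiceNode` and the helper `connect` are hypothetical, and subscription matching is omitted, so every service receives every notification.

```python
class Hub:
    """A hub that routes notifications to its own services and to peer hubs."""

    def __init__(self, name):
        self.name = name
        self.services = []   # notification services attached to this hub
        self.peers = []      # other hubs (assumed fully connected)

    def route(self, topic, message, from_peer):
        for service in self.services:
            service.delivered.append((topic, message))
        # Forward to peer hubs only when the notification originated locally,
        # so it crosses each hub-to-hub link at most once.
        if not from_peer:
            for peer in self.peers:
                peer.route(topic, message, from_peer=True)


class NotificationServiceNode:
    """A notification service attached to exactly one hub."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.delivered = []  # notifications available to local subscribers
        hub.services.append(self)

    def publish(self, topic, message):
        # A publisher talks only to its own notification service,
        # which hands the notification to its hub for routing.
        self.hub.route(topic, message, from_peer=False)


def connect(hubs):
    # Assumption: every hub has a direct link to every other hub.
    for hub in hubs:
        hub.peers = [other for other in hubs if other is not hub]


hubs = [Hub("Hub-1"), Hub("Hub-2"), Hub("Hub-3")]
connect(hubs)
ns_1_1 = NotificationServiceNode("NS-1-1", hubs[0])
ns_2_1 = NotificationServiceNode("NS-2-1", hubs[1])
ns_3_1 = NotificationServiceNode("NS-3-1", hubs[2])
ns_1_1.publish("workflow", "step 3 finished")
print(ns_2_1.delivered, ns_3_1.delivered)
```

With this shape a notification needs at most two hops between hubs' services, which is one plausible reason for calling the routing simple.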
Current status • Push and pull messaging • Topic, message and publisher filters • WSDL interface • Workflow interaction • Integration with MySQL, OpenJMS, Tomcat and Axis • Federated service (undergraduate project) • QoS negotiation (PhD work underway) • OGSA compliance
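The difference between push and pull delivery, and the role of topic, message and publisher filters, can be sketched as follows. This is a hedged illustration rather than the service's actual WSDL interface: the `Subscription` class, its parameters and the `publish` function are all hypothetical names.

```python
class Subscription:
    """One subscription with optional filters and a delivery mode (hypothetical)."""

    def __init__(self, topic, callback=None, publisher=None, predicate=None):
        self.topic = topic            # topic filter
        self.callback = callback      # set -> push delivery; None -> pull delivery
        self.publisher = publisher    # publisher filter; None accepts any publisher
        self.predicate = predicate    # message filter; None accepts any message
        self.pending = []             # messages waiting to be pulled

    def accepts(self, topic, publisher, message):
        if topic != self.topic:
            return False
        if self.publisher is not None and publisher != self.publisher:
            return False
        if self.predicate is not None and not self.predicate(message):
            return False
        return True

    def deliver(self, message):
        if self.callback is not None:
            self.callback(message)        # push: the service calls the subscriber
        else:
            self.pending.append(message)  # pull: the subscriber fetches later

    def pull(self):
        messages, self.pending = self.pending, []
        return messages


def publish(subscriptions, topic, publisher, message):
    for sub in subscriptions:
        if sub.accepts(topic, publisher, message):
            sub.deliver(message)


pushed = []
push_sub = Subscription("workflow", callback=pushed.append)
pull_sub = Subscription("workflow", publisher="enactor-1",
                        predicate=lambda m: "failed" in m)
subs = [push_sub, pull_sub]
publish(subs, "workflow", "enactor-1", "step 3 failed")
publish(subs, "workflow", "enactor-2", "step 4 failed")
publish(subs, "workflow", "enactor-1", "step 5 done")
print(pushed)
print(pull_sub.pull())
```

The push subscriber sees all three workflow messages as they are published, while the pull subscriber later fetches only the failure reported by `enactor-1`.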
Experimentation • Windows and Unix platforms with Tomcat 4.0.5, Axis beta 3.0, OpenJMS 0.7.2 and MySQL 3.23.51 • Aggregation test with 500 topics, 2,000 subscribers, 2,000 publishers, 10,000 registered subscriptions and 10,000 notifications • 72 hours of non-stop subscribing/publishing with the above populations
Architectural Issues • Notification service • Grid component model
Grid Component Model The myGrid framework is a component model for flexible, simple and future-proof deployment and use of services on the Grid.
Problems Addressed • For service developers and deployers: • Ease of development of sophisticated services by separation of concerns and re-use of third party functionality. • Consistent distribution of functionality over a set of services, e.g. access control, support for fault-tolerance. • Application of solutions to the above to services deployed using technologies such as OGSA Grid Services, Web Services and Enterprise JavaBeans.
Problems Addressed • For service clients: • Development of service clients that are not limited by the range of standards known at deployment time. • Control over how service operations are invoked, so that they can make use of the most suitable protocols supported by a service. • Provision of a standard client interface hiding the differences in deployment philosophy that each middleware technology brings. • Application of solutions to the above to services deployed using technologies such as OGSA Grid Services, Web Services and Enterprise JavaBeans.
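The client-side goals (choosing the most suitable protocol a service supports, and not being tied to the protocols known at deployment time) can be sketched as below. This is only an illustration under stated assumptions: `ServiceClient`, `register_handler` and `invoke` are hypothetical names, and the protocol handlers are stand-in functions rather than real SOAP or RMI stacks.

```python
class ServiceClient:
    """Client that picks the preferred protocol a service supports (hypothetical)."""

    def __init__(self, preferences):
        self.preferences = list(preferences)  # most preferred protocol first
        self.handlers = {}                    # protocol name -> callable

    def register_handler(self, protocol, handler):
        # New protocols can be plugged in after the client was deployed.
        self.handlers[protocol] = handler

    def invoke(self, service_protocols, operation, *args):
        # Same call shape whatever middleware sits behind the service.
        for protocol in self.preferences:
            if protocol in service_protocols and protocol in self.handlers:
                return self.handlers[protocol](operation, *args)
        raise LookupError("no usable protocol shared with the service")


client = ServiceClient(["rmi", "soap"])
client.register_handler("soap", lambda op, *a: ("soap", op, a))
print(client.invoke({"soap", "rmi"}, "align", "ACGT"))   # only SOAP is available so far

client.register_handler("rmi", lambda op, *a: ("rmi", op, a))
print(client.invoke({"soap", "rmi"}, "align", "ACGT"))   # now the preferred protocol is used
```

The calling code does not change when the second handler is added; only the route taken by the invocation does.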
Current Status • Startpoints for Web Services • Deployment within nested containers • Facades for exposing EJBs as Web Services • Performance tests
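Deployment within nested containers can be pictured as wrapping a service in layers, each adding one concern such as access control or auditing, so the service code itself stays free of them. The sketch below is an assumption about the general idea, not the myGrid framework's API: `Container`, `AccessControlContainer`, `AuditContainer` and `EchoService` are hypothetical names.

```python
class EchoService:
    """Stand-in for a deployed service; it just echoes the request."""

    def invoke(self, caller, operation, *args):
        return (operation, args)


class Container:
    """Base container: forwards every invocation to whatever it wraps."""

    def __init__(self, inner):
        self.inner = inner

    def invoke(self, caller, operation, *args):
        return self.inner.invoke(caller, operation, *args)


class AccessControlContainer(Container):
    """Rejects callers that are not on the allowed list."""

    def __init__(self, inner, allowed):
        super().__init__(inner)
        self.allowed = set(allowed)

    def invoke(self, caller, operation, *args):
        if caller not in self.allowed:
            raise PermissionError(f"{caller} may not call {operation}")
        return super().invoke(caller, operation, *args)


class AuditContainer(Container):
    """Records who attempted which operation before forwarding the call."""

    def __init__(self, inner):
        super().__init__(inner)
        self.log = []

    def invoke(self, caller, operation, *args):
        self.log.append((caller, operation))
        return super().invoke(caller, operation, *args)


# Nesting order decides behaviour: the outer audit layer also sees rejected calls.
deployed = AuditContainer(AccessControlContainer(EchoService(), allowed={"alice"}))
print(deployed.invoke("alice", "blast", "ACGT"))
try:
    deployed.invoke("bob", "blast", "ACGT")
except PermissionError as err:
    print("rejected:", err)
print(deployed.log)
```

Each layer costs one extra method call per invocation, which is the kind of overhead the performance tests on nesting would be measuring.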
Current Work • Automated deployment in nested containers • Definition of containers for deployment-time configuration • Using containers to provide minimal functionality of OGSA Grid Services • Startpoints for EJBs, Grid Services
Experimentation • Our experiments have shown that nesting in our containers is not costly compared to method invocation and nested inner classes • The cost of calling EJBs via the Web Service façade comes mostly from the use of SOAP, and the consequential requirement for conversion to/from objects
Architectural Issues • Notification service • Grid component model • Service directory
Service Directory Views • Multiple service directories will co-exist (IBM, Microsoft, EBI, local institutions) • Need to attach metadata to service directory entries • Metadata is personal to the scientist: trust, perceived QoS, ontological description • Need for a mechanism to allow scientists to add their metadata and to make it available to other users as a “regular service directory”.
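A view can be thought of as a layer over an existing directory that stores one scientist's metadata separately and still answers lookups like a regular directory. Since the design was still in progress at the time of these slides, the sketch below is purely illustrative: `ServiceDirectory`, `DirectoryView`, their methods, the metadata keys and the `example.org` endpoint are all hypothetical.

```python
class ServiceDirectory:
    """Stand-in for an existing service directory."""

    def __init__(self):
        self.entries = {}

    def register(self, name, endpoint):
        self.entries[name] = {"name": name, "endpoint": endpoint}

    def find(self, name):
        return dict(self.entries[name])  # copy, so callers cannot alter the entry

    def names(self):
        return sorted(self.entries)


class DirectoryView:
    """A scientist's personal view: same lookup interface, plus attached metadata."""

    def __init__(self, base, owner):
        self.base = base
        self.owner = owner
        self.annotations = {}            # entry name -> metadata dict

    def annotate(self, name, **metadata):
        self.base.find(name)             # raises KeyError for unknown entries
        self.annotations.setdefault(name, {}).update(metadata)

    def find(self, name):
        entry = self.base.find(name)
        entry["metadata"] = dict(self.annotations.get(name, {}))
        return entry

    def names(self):
        return self.base.names()


directory = ServiceDirectory()
directory.register("blast", "http://example.org/blast")   # hypothetical endpoint

view = DirectoryView(directory, owner="alice")
view.annotate("blast", trust="high", perceived_qos="slow at peak times")
print(view.find("blast"))
print(directory.find("blast"))   # the underlying directory is left untouched
```

Because the view offers the same `find` and `names` operations as the directory it wraps, other users could query it without knowing it is a view.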
Views: status • Currently in design phase • Use cases in the process of being finalized • Preliminary specification of interfaces • More work is needed on policy languages • Design to be finalized by end of January • First prototype of core functionality 4 months later
Conclusions • More architectural issues being addressed • Security (GSI, RBAC), but where is the community going? • Fault tolerance
Workflow enactment • WSFL compatible enactment engine • Support for fault tolerance, checkpointing, migration • Editor