540 likes | 651 Views
The Ibis e-Science Software Framework. Henri Bal Frank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands. Introduction. Distributed systems continue to change
E N D
The Ibis e-Science Software Framework Henri BalFrank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands
Introduction • Distributed systems continue to change • Clusters, grids, clouds, mobile devices • Distributed applications continue to change • e-Science, web, pervasive applications • Distributed programming continues to be notoriously difficult
Distributed Systems: 1980sMultiple PCs on a (local) network • Networks of Workstations (NOWs) • Collections of Workstations (COWs) • Processor pools • Condor pools • Clusters
Distributed Systems: 1990sSharing wide-area resources • Metacomputing (Smarr & Catlett, CACM) • Flocking Condor (Epema) • DAS (Distributed ASCI Supercomputer) • Grid Blueprint (Foster & Kesselman) • Desktop grids, SETI@home
Distributed Systems: 2000s Cloud computing Pay-on-demand Virtualization Hardware diversity /heterogeneous computing Green IT The Networked World Sensor networks Smart phones
Our approach • Study fundamental underlying problems • … hand-in-hand with realistic applications • … integrate solutions in one system: Ibis ! User Distributed Systems • Funding from NWO (2002), VL-e (2003-2009), EU (JavaGAT, XtreemOS, Contrail), VU, COMMIT
Outline • ‘Problem Solving’ vs. ‘System Fighting’ • Jungle Computing • Example applications: • Computational Astrophysics • Multimedia Content Analysis • The Ibis Software Framework • The 3 Common Uses of Ibis • ‘Master Key’ + ‘Glue’ + ‘HPC’ • Some current work: Green Clouds
A Random Example: Supernova Detection • DACH 2008, Japan • Distributed multi-cluster system • Heterogeneous • Distributed database (image pairs) • Large vs small databases/images • Partial replication • Image-pair comparison given (in C) • Find all supernova candidates • Task 1: As fast as possible • Task 2: Idem, under system crashes
‘Problem Solving’ vs. ‘System Fighting’ • All participating teams struggled (1 month) • Middleware instabilities… • Connectivity problems… • Load balancing… • But not the Ibis team • Winner (by far) in both categories • Note: many Japanese teams with years of experience • Hardware, middleware, network, C-code, image data… • Focus on ‘problem solving’, not ‘system fighting’ • incl. ‘opening’ of black-box C-code
Ibis Results: Awards & Prizes • Many domains; data/compute intensive, real-time... • Winner Sustainability Award in the Enlighten Your Research (EYR) competition, 7 Dec. 2011 (Frank Seinstra) 1st Prize: DACH 2008 - BS 1st Prize: DACH 2008 - FT AAAI-VC 2007 Most Visionary Research Award WebPie: A Web-Scale Parallel Inference Engine J. Urbani, S. Kotoulas, J. Maassen, N. Drost, F.J. Seinstra, F. van Harmelen, and H.E. Bal 3rd Prize: ISWC 2008 1st Prize: SCALE 2008 1st Prize: SCALE 2010
Ibis Users… …and many more
Jungle Computing (Frank Seinstra) • ‘Worst case’ computing as required by end-users • Distributed • Heterogeneous • Hierarchical (incl. multi-/many-cores)
Why Jungle Computing? • Scientists often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to: • Desire for scalability • Distributed nature of (input) data • Software heterogeneity (e.g.: mix of C/MPI and CUDA) • Ad hoc hardware availability • Energy consumption (use most energy-efficient resource) • … • Note: most users do not need ‘worst case’ jungle • Ibis aims to apply to any subset
Example Application Domains • Computational Astrophysics (Leiden) • AMUSE: multi-model / multi-kernel simulations • “Simulating the Universe on an Intercontinental Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010) • Climate Modeling (Utrecht) • CPL: multi-model / multi-kernel simulations • Atmosphere, ocean, source rock formation, … - hardware: (potentially) very diverse - high resolution => speed & scalability - …
Domain Example #1: Computational Astrophysics
Domain Example #1: Computational Astrophysics Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA
Domain Example #1: Computational Astrophysics • The AMUSE system (Leiden University) • Early Star Cluster Evolution, including gas • Gravitational dynamics (N-body): GPU / GPU-cluster • Stellar evolution: Beowulf cluster / Cloud • Hydro-dynamics, Radiative transport: Supercomputer gravitational dynamics stellar evolution AMUSE hydro-dynamics radiative transport
Domain Example #1: Computational Astrophysics Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA
Domain Example #2: Multimedia Content Analysis
Multimedia Content Analysis (MMCA) • Aim: • Automatic extraction of ‘semantic concepts’ from image sets and video streams • Depending on specific problem & size of data set: • May take hours, days, weeks, months, years…
Multimedia Content Analysis (MMCA) • Applications in (a.o): • Remote Sensing • Security / Surveillance • Medical Imaging • Document Analysis • Multimedia Systems • Astronomy • Application types: • Real-time vs. off-line • Fine-grained vs. coarse-grained • Data-intensive / compute-intensive / information-intensive
Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog Seinstra et al (IEEE Multimedia, Oct-Dec 2007) Seinstra et al (AAAI’07: Most Visionary Research Award)
Successful… • …but many fundamental problems unsolved! • Scaling up to very large systems • Platform independence • Middleware independence • Connectivity (a.o. firewalls, …) • Fault-tolerance • … • Software support tool(s) urgently needed! • Jungle-aware + transparent + efficient • No progress until ‘discovery’ of Ibis
The Ibis Software Framework • Offers all functionality to efficiently & transparently implement & run Jungle Computing applications • Designed for dynamic / hostile environments • Modular and flexible • Allow replacement of Ibis components by external ones, including native code • Open source • Download: http://www.cs.vu.nl/ibis/
Programming Logical Likes math Deployment Practical Visual (GUI) Ibis Design • Applications need functionality for • Programming (as in programming languages) • Deployment (as in operating systems)
JavaGAT • Java Grid Application Toolkit • High-level API for developing (Grid) applications independentlyof the underlying (Grid) middleware • Use (Grid) services; file cp, resource discovery, job submission, … • Developed in EU GridLab project • Thilo Kielmann, Rob van Nieuwpoort • SAGA API standardized by OGF • Simple API for Grid Applications (a.o. with LSU) • SAGA on top of JavaGAT (and v.v.)
Zorilla • A prototype P2P middleware • A Zorilla system consists of a collection of nodes, connected by a P2P network • Each node independent & implements all middleware functionality • No central components • Supports fault-tolerance and malleability • Easily combines resources in multiple administrative domains
Ibis Portability Layer (IPL) • Java-centric ‘run-anywhere’ communication library • Sent along with your application • “MPI for the Grid” • Supports fault-tolerance and malleability • Resource tracking (Join-Elect-Leave model) • Open-world / Closed world • Efficient • Highly optimized object serialization • Can use optimized native libraries (e.g. MPI, Infiniband)
SmartSockets • Robust connection setup • Always connection in 30 different scenarios Problems: Firewalls Network Address Translation (NAT) Non-routed networks Multi-homing …
Ibis Programming Models • IPL-based programming models, a.o.: • Satin: • A divide-and-conquer model • MPJ: • The MPI binding for Java • RMI: • Object-Oriented remote Procedure Call • Jorus: • A ‘user transparent’ parallel model for multimedia applications
Ibis as ‘Master Key’ (or ‘Passepartout’) • Use JavaGAT to access ‘any’ system • Develop/run applications independentlyof available middlewares • JavaGAT ‘adaptors’ required for each middleware • ‘Intelligent dispatching’ even allows for transparent use of multiplemiddlewares • Example: file copy • JavaGAT vs. Globus • Simple, portable, … • SAGA API standardized package org.gridlab.gat.io.cpi.rftgt4; import java.net.MalformedURLException; import java.net.URL; import java.rmi.RemoteException; import java.security.cert.X509Certificate; import java.util.Calendar; import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Vector; import javax.xml.namespace.QName; import javax.xml.rpc.ServiceException; import javax.xml.rpc.Stub; import javax.xml.soap.SOAPElement; import org.apache.axis.message.addressing.EndpointReferenceType; import org.apache.axis.types.URI.MalformedURIException; import org.globus.axis.util.Util; import org.globus.delegation.DelegationConstants; import org.globus.delegation.DelegationException; import org.globus.delegation.DelegationUtil; import org.globus.gsi.GlobusCredential; import org.globus.gsi.GlobusCredentialException; import org.globus.gsi.gssapi.GlobusGSSCredentialImpl; import org.globus.gsi.jaas.JaasGssUtil; import org.globus.rft.generated.BaseRequestType; import org.globus.rft.generated.CreateReliableFileTransferInputType; import org.globus.rft.generated.CreateReliableFileTransferOutputType; import org.globus.rft.generated.DeleteRequestType; import org.globus.rft.generated.DeleteType; import org.globus.rft.generated.OverallStatus; import org.globus.rft.generated.RFTFaultResourcePropertyType; import org.globus.rft.generated.RFTOptionsType; import org.globus.rft.generated.ReliableFileTransferFactoryPortType; import org.globus.rft.generated.ReliableFileTransferPortType; import org.globus.rft.generated.Start; import org.globus.rft.generated.TransferRequestType; import org.globus.rft.generated.TransferType; import org.globus.transfer.reliable.client.BaseRFTClient; import org.globus.transfer.reliable.service.RFTConstants; import org.globus.wsrf.NotificationConsumerManager; import org.globus.wsrf.NotifyCallback; import org.globus.wsrf.ResourceException; import org.globus.wsrf.WSNConstants; import org.globus.wsrf.container.ContainerException; import org.globus.wsrf.container.ServiceContainer; import org.globus.wsrf.core.notification.ResourcePropertyValueChangeNotificationElementType; import org.globus.wsrf.encoding.DeserializationException; import org.globus.wsrf.encoding.ObjectDeserializer; import org.globus.wsrf.impl.security.authentication.Constants; import org.globus.wsrf.impl.security.authorization.Authorization; import org.globus.wsrf.impl.security.authorization.HostAuthorization; import org.globus.wsrf.impl.security.authorization.IdentityAuthorization; import org.globus.wsrf.impl.security.authorization.SelfAuthorization; import org.globus.wsrf.impl.security.descriptor.ClientSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.ContainerSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.GSISecureMsgAuthMethod; import org.globus.wsrf.impl.security.descriptor.GSITransportAuthMethod; import org.globus.wsrf.impl.security.descriptor.ResourceSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.SecurityDescriptorException; import org.globus.wsrf.security.SecurityManager; import org.gridlab.gat.CouldNotInitializeCredentialException; import org.gridlab.gat.CredentialExpiredException; import org.gridlab.gat.GATContext; import org.gridlab.gat.GATInvocationException; import org.gridlab.gat.GATObjectCreationException; import org.gridlab.gat.Preferences; import org.gridlab.gat.URI; import org.gridlab.gat.io.cpi.FileCpi; import org.gridlab.gat.security.globus.GlobusSecurityUtils; import org.ietf.jgss.GSSCredential; import org.ietf.jgss.GSSException; import org.oasis.wsn.Subscribe; import org.oasis.wsn.TopicExpressionType; import org.oasis.wsrf.faults.BaseFaultType; import org.oasis.wsrf.lifetime.SetTerminationTime; import org.oasis.wsrf.properties.GetMultipleResourcePropertiesResponse; import org.oasis.wsrf.properties.GetMultipleResourceProperties_Element; import org.oasis.wsrf.properties.ResourcePropertyValueChangeNotificationType; class RFTGT4NotifyCallback implements NotifyCallback { RFTGT4FileAdaptor transfer; OverallStatus status; public RFTGT4NotifyCallback(RFTGT4FileAdaptor transfer) { super(); this.transfer = transfer; this.status = null; } @SuppressWarnings("unchecked") public void deliver(List topicPath, EndpointReferenceType producer, Object messageWrapper) { try { ResourcePropertyValueChangeNotificationType message = ((ResourcePropertyValueChangeNotificationElementType) messageWrapper) .getResourcePropertyValueChangeNotification(); this.status = (OverallStatus) message.getNewValue().get_any()[0] .getValueAsType(RFTConstants.OVERALL_STATUS_RESOURCE, OverallStatus.class); if (status.getFault() != null) { transfer.setFault(getFaultFromRP(status.getFault())); } // RunQueue.getInstance().add(this.resourceKey); } catch (Exception e) { } transfer.setStatus(status); } private BaseFaultTypegetFaultFromRP(RFTFaultResourcePropertyType fault) { if (fault == null) { return null; } if (fault.getDelegationEPRMissingFaultType() != null) { return fault.getDelegationEPRMissingFaultType(); } else if (fault.getRftAuthenticationFaultType() != null) { return fault.getRftAuthenticationFaultType(); } else if (fault.getRftAuthorizationFaultType() != null) { return fault.getRftAuthorizationFaultType(); } else if (fault.getRftDatabaseFaultType() != null) { return fault.getRftDatabaseFaultType(); } else if (fault.getRftRepeatedlyStartedFaultType() != null) { return fault.getRftRepeatedlyStartedFaultType(); } else if (fault.getTransferTransientFaultType() != null) { return fault.getTransferTransientFaultType(); } else if (fault.getRftTransferFaultType() != null) { return fault.getRftTransferFaultType(); } else { return null; } } } @SuppressWarnings("serial") public class RFTGT4FileAdaptor extends FileCpi { public static final Authorization DEFAULT_AUTHZ = HostAuthorization .getInstance(); Integer msgProtectionType = Constants.SIGNATURE; static final int TERM_TIME = 20; static final String PROTOCOL = "https"; private static final String BASE_SERVICE_PATH = "/wsrf/services/"; public static final int DEFAULT_DURATION_HOURS = 24; public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE; public static final String DEFAULT_FACTORY_PORT = "8443"; private static final int DEFAULT_GRIDFTP_PORT = 2811; NotificationConsumerManagernotificationConsumerManager; EndpointReferenceTypenotificationConsumerEPR; EndpointReferenceTypenotificationProducerEPR; String securityType; String factoryUrl; GSSCredential proxy; Authorization authorization; String host; OverallStatus status; BaseFaultType fault; String locationStr; ReliableFileTransferFactoryPortTypefactoryPort; public RFTGT4FileAdaptor(GATContextgatContext, Preferences preferences, URI location) throws GATObjectCreationException { super(gatContext, preferences, location); if (!location.isCompatible("gsiftp") && !location.isCompatible("gridftp")) { throw new GATObjectCreationException("cannot handle this URI"); } String globusLocation = System.getenv("GLOBUS_LOCATION"); if (globusLocation == null) { throw new GATObjectCreationException("$GLOBUS_LOCATION is not set"); } System.setProperty("GLOBUS_LOCATION", globusLocation); System.setProperty("axis.ClientConfigFile", globusLocation + "/client-config.wsdd"); this.host = location.getHost(); this.securityType = Constants.GSI_SEC_MSG; this.authorization = null; this.proxy = null; try { proxy = GlobusSecurityUtils.getGlobusCredential(gatContext, preferences, "globus", location, DEFAULT_GRIDFTP_PORT); } catch (CouldNotInitializeCredentialException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (CredentialExpiredException e) { // TODO Auto-generated catch block e.printStackTrace(); } this.notificationConsumerManager = null; this.notificationConsumerEPR = null; this.notificationProducerEPR = null; this.status = null; this.fault = null; factoryPort = null; this.factoryUrl = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME; locationStr = setLocationStr(location); } String setLocationStr(URI location) { if (location.getScheme().equals("any")) { return "gsiftp://" + location.getHost() + ":" + location.getPort() + "/" + location.getPath(); } else { return location.toString(); } } protected boolean copy2(String destStr) throws GATInvocationException { EndpointReferenceTypecredentialEndpoint = getCredentialEPR(); TransferType[] transferArray = new TransferType[1]; transferArray[0] = new TransferType(); transferArray[0].setSourceUrl(locationStr); transferArray[0].setDestinationUrl(destStr); RFTOptionsTyperftOptions = new RFTOptionsType(); rftOptions.setBinary(Boolean.TRUE); // rftOptions.setIgnoreFilePermErr(false); TransferRequestType request = new TransferRequestType(); request.setRftOptions(rftOptions); request.setTransfer(transferArray); request.setTransferCredentialEndpoint(credentialEndpoint); setRequest(request); while (!transfersDone()) { try { Thread.sleep(1000); } catch (InterruptedException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } } return transfersSucc(); } public void copy(URI dest) throws GATInvocationException { String destUrl = setLocationStr(dest); if (!copy2(destUrl)) { throw new GATInvocationException( "RFTGT4FileAdaptor: file copy failed"); } } public void subscribe(ReliableFileTransferPortTyperft) throws GATInvocationException { Map<Object, Object> properties = new HashMap<Object, Object>(); properties.put(ServiceContainer.CLASS, "org.globus.wsrf.container.GSIServiceContainer"); if (this.proxy != null) { ContainerSecurityDescriptorcontainerSecDesc = new ContainerSecurityDescriptor(); SecurityManager.getManager(); try { containerSecDesc.setSubject(JaasGssUtil .createSubject(this.proxy)); } catch (GSSException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ContainerSecurityDescriptor failed, " + e); } properties.put(ServiceContainer.CONTAINER_DESCRIPTOR, containerSecDesc); } this.notificationConsumerManager = NotificationConsumerManager .getInstance(properties); try { this.notificationConsumerManager.startListening(); } catch (ContainerException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: NotificationConsumerManager failed, " + e); } List<Object> topicPath = new LinkedList<Object>(); topicPath.add(RFTConstants.OVERALL_STATUS_RESOURCE); ResourceSecurityDescriptorsecurityDescriptor = new ResourceSecurityDescriptor(); String authz = null; if (authorization == null) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceofHostAuthorization) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceofSelfAuthorization) { authz = Authorization.AUTHZ_SELF; } else if (authorization instanceofIdentityAuthorization) { // not supported throw new GATInvocationException( "RFTGT4FileAdaptor: identity authorization not supported"); } else { // throw an sg throw new GATInvocationException( "RFTGT4FileAdaptor: set authorization failed"); } securityDescriptor.setAuthz(authz); Vector<Object> authMethod = new Vector<Object>(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { authMethod.add(GSISecureMsgAuthMethod.BOTH); } else { authMethod.add(GSITransportAuthMethod.BOTH); } try { securityDescriptor.setAuthMethods(authMethod); } catch (SecurityDescriptorException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setAuthMethods failed, " + e); } RFTGT4NotifyCallback notifyCallback = new RFTGT4NotifyCallback(this); try { notificationConsumerEPR = notificationConsumerManager .createNotificationConsumer(topicPath, notifyCallback, securityDescriptor); } catch (ResourceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: createNotificationConsumer failed, " + e); } Subscribe subscriptionRequest = new Subscribe(); subscriptionRequest.setConsumerReference(notificationConsumerEPR); TopicExpressionTypetopicExpression = null; try { topicExpression = new TopicExpressionType( WSNConstants.SIMPLE_TOPIC_DIALECT, RFTConstants.OVERALL_STATUS_RESOURCE); } catch (MalformedURIException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: create TopicExpressionType failed, " + e); } subscriptionRequest.setTopicExpression(topicExpression); try { rft.subscribe(subscriptionRequest); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: subscription failed, " + e); } } protected EndpointReferenceTypegetCredentialEPR() throws GATInvocationException { this.status = null; URL factoryURL = null; try { factoryURL = new URL(factoryUrl); } catch (MalformedURLException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factoryURL failed, " + e); } try { factoryPort = BaseRFTClient.rftFactoryLocator .getReliableFileTransferFactoryPortTypePort(factoryURL); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factoryPort failed, " + e); } setSecurityTypeFromURL(factoryURL); return populateRFTEndpoints(factoryPort); } protected void setRequest(BaseRequestType request) throws GATInvocationException { CreateReliableFileTransferInputType input = new CreateReliableFileTransferInputType(); if (request instanceofTransferRequestType) { input.setTransferRequest((TransferRequestType) request); } else { input.setDeleteRequest((DeleteRequestType) request); } Calendar termTimeDel = Calendar.getInstance(); termTimeDel.add(Calendar.MINUTE, TERM_TIME); input.setInitialTerminationTime(termTimeDel); CreateReliableFileTransferOutputType response = null; try { response = factoryPort.createReliableFileTransfer(input); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set createReliableFileTransfer failed, " + e); } EndpointReferenceTypereliableRFTEndpoint = response .getReliableTransferEPR(); ReliableFileTransferPortTyperft = null; try { rft = BaseRFTClient.rftLocator .getReliableFileTransferPortTypePort(reliableRFTEndpoint); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: getReliableFileTransferPortTypePort failed, " + e); } setStubSecurityProperties((Stub) rft); subscribe(rft); Calendar termTime = Calendar.getInstance(); termTime.add(Calendar.MINUTE, TERM_TIME); SetTerminationTimereqTermTime = new SetTerminationTime(); reqTermTime.setRequestedTerminationTime(termTime); try { rft.setTerminationTime(reqTermTime); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setTerminationTime failed, " + e); } try { rft.start(new Start()); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: start failed, " + e); } } private void setSecurityTypeFromURL(URL url) { if (url.getProtocol().equals("http")) { securityType = Constants.GSI_SEC_MSG; } else { Util.registerTransport(); securityType = Constants.GSI_TRANSPORT; } } private void setStubSecurityProperties(Stub stub) { ClientSecurityDescriptorsecDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType()); } secDesc.setAuthz(getAuthorization()); if (this.proxy != null) { // set proxy credential secDesc.setGSSCredential(this.proxy); } stub._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc); } public Integer getMessageProtectionType() { return (this.msgProtectionType == null) ? RFTGT4FileAdaptor.DEFAULT_MSG_PROTECTION : this.msgProtectionType; } public Authorization getAuthorization() { return (authorization == null) ? DEFAULT_AUTHZ : this.authorization; } private EndpointReferenceTypepopulateRFTEndpoints( ReliableFileTransferFactoryPortTypefactoryPort) throws GATInvocationException { EndpointReferenceType[] delegationFactoryEndpoints = fetchDelegationFactoryEndpoints(factoryPort); EndpointReferenceTypedelegationEndpoint = delegate(delegationFactoryEndpoints[0]); return delegationEndpoint; } private EndpointReferenceType delegate( EndpointReferenceTypedelegationFactoryEndpoint) throws GATInvocationException { GlobusCredential credential = null; if (this.proxy != null) { credential = ((GlobusGSSCredentialImpl) this.proxy) .getGlobusCredential(); } else { try { credential = GlobusCredential.getDefaultCredential(); } catch (GlobusCredentialException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } } int lifetime = DEFAULT_DURATION_HOURS * 60 * 60; ClientSecurityDescriptorsecDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType()); } secDesc.setAuthz(getAuthorization()); if (this.proxy != null) { secDesc.setGSSCredential(this.proxy); } // Get the public key to delegate on. X509Certificate[] certsToDelegateOn = null; try { certsToDelegateOn = DelegationUtil.getCertificateChainRP( delegationFactoryEndpoint, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } X509Certificate certToSign = certsToDelegateOn[0]; // FIXME remove when there is a DelegationUtil.delegate(EPR, ...) String protocol = delegationFactoryEndpoint.getAddress().getScheme(); String host = delegationFactoryEndpoint.getAddress().getHost(); int port = delegationFactoryEndpoint.getAddress().getPort(); String factoryUrl = protocol + "://" + host + ":" + port + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH; // send to delegation service and get epr. EndpointReferenceTypecredentialEndpoint = null; try { credentialEndpoint = DelegationUtil.delegate(factoryUrl, credential, certToSign, lifetime, false, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } return credentialEndpoint; } public EndpointReferenceType[] fetchDelegationFactoryEndpoints( ReliableFileTransferFactoryPortTypefactoryPort) throws GATInvocationException { GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element(); request .setResourceProperty(new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTORY }); GetMultipleResourcePropertiesResponse response; try { response = factoryPort.getMultipleResourceProperties(request); } catch (RemoteException e) { e.printStackTrace(); throw new GATInvocationException( "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e); } SOAPElement[] any = response.get_any(); EndpointReferenceType epr1 = null; try { epr1 = (EndpointReferenceType) ObjectDeserializer.toObject(any[0], EndpointReferenceType.class); } catch (DeserializationException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ObjectDeserializer, " + e); } EndpointReferenceType[] endpoints = new EndpointReferenceType[] { epr1 }; return endpoints; } synchronized void setStatus(OverallStatus status) { this.status = status; } public inttransfersActive() { if (status == null) { return 1; } return status.getTransfersActive(); } public inttransfersFinished() { if (status == null) { return 0; } return status.getTransfersFinished(); } public inttransfersCancelled() { if (status == null) { return 0; } return status.getTransfersCancelled(); } public inttransfersFailed() { if (status == null) { return 0; } return status.getTransfersFailed(); } public inttransfersPending() { if (status == null) { return 1; } return status.getTransfersPending(); } public inttransfersRestarted() { if (status == null) { return 0; } return status.getTransfersRestarted(); } public booleantransfersDone() { return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0); } public booleantransfersSucc() { return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0); } /* * private BaseFaultTypegetFaultFromRP(RFTFaultResourcePropertyType fault) { * if (fault == null) { return null; } * * if (fault.getRftTransferFaultType() != null) { return * fault.getRftTransferFaultType(); } else if * (fault.getDelegationEPRMissingFaultType() != null) { return * fault.getDelegationEPRMissingFaultType(); } else if * (fault.getRftAuthenticationFaultType() != null) { return * fault.getRftAuthenticationFaultType(); } else if * (fault.getRftAuthorizationFaultType() != null) { return * fault.getRftAuthorizationFaultType(); } else if * (fault.getRftDatabaseFaultType() != null) { return * fault.getRftDatabaseFaultType(); } else if * (fault.getRftRepeatedlyStartedFaultType() != null) { return * fault.getRftRepeatedlyStartedFaultType(); } else if * (fault.getTransferTransientFaultType() != null) { return * fault.getTransferTransientFaultType(); } else { return null; } } */ /* * private BaseFaultTypedeserializeFaultRP(SOAPElement any) throws * Exception { return getFaultFromRP((RFTFaultResourcePropertyType) * ObjectDeserializer .toObject(any, RFTFaultResourcePropertyType.class)); } */ void setFault(BaseFaultType fault) { this.fault = fault; } } package tutorial; import org.gridlab.gat.GAT; import org.gridlab.gat.GATContext; import org.gridlab.gat.URI; import org.gridlab.gat.io.File; public class RemoteCopy { public static void main(String[] args) throws Exception { GATContext context = new GATContext(); URI src = new URI(args[0]); URI dest = new URI(args[1]); File file = GAT.createFile(context, src); file.copy(dest); GAT.end(); } }
Ibis as ‘Glue’ • Use IPL + SmartSockets, generally for wide-area communication • Linking up separate ‘activities’ of an application • Activities: often largely ‘independent’ tasks implemented in any popular language or model (e.g. C/MPI, CUDA, Fortran, Java…) • Each typically running on a single GPU/node/Cluster/Cloud/… • Automatically circumvent connectivity problems • Example: With SmartSockets: No SmartSockets:
Ibis as ‘HPC Solution’ • Use Ibis as replacement for e.g. C++/MPI code • Benefits: • (better) portability • malleability (open world) • fault-tolerance • (run-time) task migration • Downside: • requires recoding • Comparable speedups:
MMCA: Situation in 2004/2005 • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’ SSH Parallel Horus Client Parallel Horus Server Sockets + SSH Tunneling Parallel Horus Client C++/MPI
Phase 1: Ibis as ‘Master Key’ (2006) JavaGAT + IbisDeploy Parallel Horus Client Parallel Horus Server Sockets + SSH Tunneling Parallel Horus Client C++/MPI • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’
Phase 2: Ibis as ‘Glue’ (2006/2007) JavaGAT + IbisDeploy Parallel Horus Client Parallel Horus Server IPL + SmartSockets Parallel Horus Client C++/MPI • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’
Phase 3: Ibis as ‘HPC Solution’ (2008) JavaGAT + IbisDeploy Parallel Jorus Client Parallel Jorus Server IPL + SmartSockets Parallel Jorus Client Ibis/Java • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’
‘Master Key’ + ‘Glue’ + ‘HPC’ • Step-wise conversion to 100% Ibis / Java • Phase 1: JavaGAT as ‘Master Key’ • Phase 2: IPL + SmartSockets as ‘Glue’ • Phase 3: Ibis as ‘HPC Solution’ • After each phase a fully functional, working solution was available! • Eventual result: • ‘wall-socket computing from a memory stick’ • Remember: the ‘Promise of the Grid’? • Awards at AAAI 2007 and CCGrid 2008
100% Ibis Implementation (2008++) Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner) Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)
Green Clouds • NWO Smart Energy Systems project with Univ. of Amsterdam (Cees de Laat) & SARA • How to map high-performance applications onto hybrid distributed computing system, taking both performance & energy consumption into account • System-level approach to reduce HPC energy consumption
DAS-4: infrastructure for Green IT UvA/MultimediaN (16/36) Dual quad-core Xeon E5620 Various accelerators (GPUs, multicores, ….) Scientific Linux Built by ClusterVision VU (74) SURFnet6 ASTRON (23) 10 Gb/s lambdas TU Delft (32) Leiden (16)
Main ideas • Adapt resources to application needs dynamically, accounting for computational & energy efficiency • Using Ibis malleability support • Exploit hardware diversity • Graphics Processing Units (GPUs) have much higher FLOPS/Watt for many applications • Use optical and photonic networks • Build a knowledge base & semantic infrastructure description