The Ibis e-Science Software Framework. Henri Bal, Frank J. Seinstra, Jason Maassen, Niels Drost. High Performance Distributed Computing Group, Department of Computer Science, VU University, Amsterdam, The Netherlands
Introduction • Distributed systems continue to change • Clusters, grids, clouds, mobile devices • Distributed applications continue to change • e-Science, web, pervasive applications • Distributed programming continues to be notoriously difficult
Distributed Systems: 1980s. Multiple PCs on a (local) network • Networks of Workstations (NOWs) • Collections of Workstations (COWs) • Processor pools • Condor pools • Clusters
Distributed Systems: 1990s. Sharing wide-area resources • Metacomputing (Smarr & Catlett, CACM) • Flocking Condor (Epema) • DAS (Distributed ASCI Supercomputer) • Grid Blueprint (Foster & Kesselman) • Desktop grids, SETI@home
Distributed Systems: 2000s • Cloud computing: pay-on-demand, virtualization • Hardware diversity / heterogeneous computing • The Networked World: sensor networks, smart phones
Our approach • Study fundamental underlying problems • … hand-in-hand with realistic applications • … integrate solutions in one system: Ibis! (figure: User, Ibis, Distributed Systems) • Funding from NWO (2002), VL-e (2003-2009), EU (JavaGAT, XtreemOS, Contrail), VU, COMMIT
A Random Example: Supernova Detection • DACH 2008, Japan • Distributed multi-cluster system • Heterogeneous • Distributed database (image pairs) • Large vs. small databases/images • Partial replication • Image-pair comparison code provided (in C) • Find all supernova candidates • Task 1: As fast as possible • Task 2: The same, under system crashes
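Task 2 required the same search to complete while nodes crash. A common way to get both load balancing on heterogeneous clusters and crash tolerance is a master that hands out one image pair at a time and re-queues the pair held by any worker that disappears. The Java sketch below shows that bookkeeping in a single thread. The class and method names (`TaskFarm`, `request`, `complete`, `crash`) are hypothetical, each worker is assumed to run one task at a time, and this is not claimed to be the Ibis team's actual solution.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical master of a pull-based task farm that survives worker crashes. */
class TaskFarm {
    private final Deque<Integer> pending = new ArrayDeque<>();     // task ids not yet handed out
    private final Map<String, Integer> inFlight = new HashMap<>(); // worker -> task it is running
    private final List<Integer> done = new ArrayList<>();

    TaskFarm(int numTasks) {
        for (int task = 0; task < numTasks; task++) {
            pending.add(task);
        }
    }

    /** A worker pulls one task; faster workers simply pull more often. Returns null if none is left. */
    Integer request(String worker) {
        Integer task = pending.poll();
        if (task != null) {
            inFlight.put(worker, task);
        }
        return task;
    }

    /** A worker reports that its current task is finished. */
    void complete(String worker) {
        Integer task = inFlight.remove(worker);
        if (task != null) {
            done.add(task);
        }
    }

    /** A crash was detected: put the worker's unfinished task back at the head of the queue. */
    void crash(String worker) {
        Integer task = inFlight.remove(worker);
        if (task != null) {
            pending.addFirst(task);
        }
    }

    boolean finished() {
        return pending.isEmpty() && inFlight.isEmpty();
    }

    List<Integer> done() {
        return done;
    }
}
```

Because workers pull work only when they are idle, a fast node and a slow node are balanced without any speed estimates. Crash detection itself (for example, by timeouts) is outside this sketch.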
‘Problem Solving’ vs. ‘System Fighting’ • All participating teams struggled (1 month) with: • Middleware instabilities… • Connectivity problems… • Load balancing… • But not the Ibis team • Winner (by far) in both categories • Note: many Japanese teams had years of experience with the hardware, middleware, network, C code, image data… • Focus on ‘problem solving’, not ‘system fighting’ • incl. ‘opening’ of the black-box C code
Ibis Results: Awards & Prizes • Many domains; data/compute intensive, real-time… • Winner of the Sustainability Award in the Enlighten Your Research (EYR) competition, 7 Dec. 2011 (Frank Seinstra) • 1st Prize: DACH 2008 - BS • 1st Prize: DACH 2008 - FT • AAAI-VC 2007 Most Visionary Research Award • WebPie: A Web-Scale Parallel Inference Engine (J. Urbani, S. Kotoulas, J. Maassen, N. Drost, F.J. Seinstra, F. van Harmelen, and H.E. Bal) • 3rd Prize: ISWC 2008 • 1st Prize: SCALE 2008 • 1st Prize: SCALE 2010
Ibis Users… …and many more
Jungle Computing (Frank Seinstra) • ‘Worst case’ computing as required by end-users • Distributed • Heterogeneous • Hierarchical (incl. multi-/many-cores)
Why Jungle Computing? • Scientists are often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to: • Desire for scalability • Distributed nature of (input) data • Software heterogeneity (e.g. a mix of C/MPI and CUDA) • Ad hoc hardware availability • Energy consumption (use the most energy-efficient resource) • … • Note: most users do not need the ‘worst case’ jungle • Ibis aims to apply to any subset of it
Example Application Domains • Computational Astrophysics (Leiden) • AMUSE: multi-model / multi-kernel simulations • “Simulating the Universe on an Intercontinental Grid”, Portegies Zwart et al. (IEEE Computer, Aug 2010) • Climate Modeling (Utrecht) • CPL: multi-model / multi-kernel simulations • Atmosphere, ocean, source rock formation, … • Hardware: (potentially) very diverse • High resolution => speed & scalability • …
Domain Example #1: Computational Astrophysics
Domain Example #1: Computational Astrophysics. Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA (two weeks ago)
Domain Example #1: Computational Astrophysics • The AMUSE system (Leiden University) • Early star cluster evolution, including gas • Gravitational dynamics (N-body): GPU / GPU-cluster • Stellar evolution: Beowulf cluster / Cloud • Hydro-dynamics, radiative transport: Supercomputer • (figure: AMUSE coupling gravitational dynamics, stellar evolution, hydro-dynamics and radiative transport)
Domain Example #2: Multimedia Content Analysis
Multimedia Content Analysis (MMCA) • Aim: • Automatic extraction of ‘semantic concepts’ from image sets and video streams • Depending on specific problem & size of data set: • May take hours, days, weeks, months, years…
Multimedia Content Analysis (MMCA) • Applications in (among others): • Remote Sensing • Security / Surveillance • Medical Imaging • Document Analysis • Multimedia Systems • Astronomy • Application types: • Real-time vs. off-line • Fine-grained vs. coarse-grained • Data-intensive / compute-intensive / information-intensive
Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog • Seinstra et al. (IEEE Multimedia, Oct-Dec 2007) • Seinstra et al. (AAAI’07: Most Visionary Research Award)
Successful… • …but many fundamental problems unsolved! • Scaling up to very large systems • Platform independence • Middleware independence • Connectivity (e.g. firewalls, …) • Fault-tolerance • … • Software support tool(s) urgently needed! • Jungle-aware + transparent + efficient • No progress until ‘discovery’ of Ibis
The Ibis Software Framework • Offers all functionality to efficiently & transparently implement & run Jungle Computing applications • Designed for dynamic / hostile environments • Modular and flexible • Allows replacement of Ibis components by external ones, including native code • Open source • Download: http://www.cs.vu.nl/ibis/
Ibis Design • Applications need functionality for • Programming (as in programming languages) • Deployment (as in operating systems) • (figure: Programming: logical, likes math; Deployment: practical, visual (GUI))
JavaGAT • Java Grid Application Toolkit • High-level API for developing (Grid) applications independently of the underlying (Grid) middleware • Use (Grid) services: file copy, resource discovery, job submission, … • Note: SAGA API standardized by OGF • Simple API for Grid Applications (developed with, among others, LSU) • SAGA on top of JavaGAT (and vice versa)
Zorilla • A prototype P2P middleware • A Zorilla system consists of a collection of nodes, connected by a P2P network • Each node independent & implements all middleware functionality • No central components • Supports fault-tolerance and malleability • Easily combines resources in multiple administrative domains
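Without central components, a Zorilla node that receives a job must find workers through its peers. The sketch below shows one generic way to do this on a P2P overlay: a breadth-first flood bounded by a time-to-live (TTL) that collects nodes with free cores. The overlay representation, the `FloodScheduler` name and the TTL policy are illustrative assumptions, not Zorilla's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical TTL-bounded flooding over a P2P overlay to find nodes with free capacity. */
class FloodScheduler {
    static List<String> findWorkers(Map<String, List<String>> neighbours,
                                    Map<String, Integer> freeCores,
                                    String origin, int ttl, int wanted) {
        List<String> workers = new ArrayList<>();
        Map<String, Integer> hops = new HashMap<>(); // nodes already reached, and at what distance
        Queue<String> frontier = new ArrayDeque<>();
        hops.put(origin, 0);
        frontier.add(origin);
        while (!frontier.isEmpty() && workers.size() < wanted) {
            String node = frontier.poll();
            if (freeCores.getOrDefault(node, 0) > 0) {
                workers.add(node); // this node volunteers to run part of the job
            }
            int h = hops.get(node);
            if (h == ttl) {
                continue; // TTL exhausted: do not forward the request any further
            }
            for (String next : neighbours.getOrDefault(node, List.of())) {
                if (!hops.containsKey(next)) { // each node handles a request only once
                    hops.put(next, h + 1);
                    frontier.add(next);
                }
            }
        }
        return workers;
    }
}
```

A larger `ttl` widens the search but costs more messages. Every node runs the same code, so no single node's failure stops the search.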
Ibis Portability Layer (IPL) • Java-centric ‘run-anywhere’ communication library • Sent along with your application • “MPI for the Grid” (quote 2005) • Supports fault-tolerance and malleability • Resource tracking (JEL model) • Open-world / Closed world • Efficient • Highly optimized object serialization • Can use optimized native libraries (e.g. MPI, Infiniband)
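In an open world, processes may join and leave while the application runs, and the IPL's resource tracking reports these changes using the JEL model (Join, Elect, Leave). The sketch below is a hypothetical single-process registry that illustrates these semantics. Members produce join and leave events, and an election returns the same winner to every caller until that winner leaves. The `ToyRegistry` class and its methods are assumptions and are not the IPL registry API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory registry illustrating join/elect/leave semantics. */
class ToyRegistry {
    private final Set<String> members = new LinkedHashSet<>();
    private final Map<String, String> elections = new HashMap<>(); // election name -> winner
    private final List<String> events = new ArrayList<>();

    void join(String member) {
        if (members.add(member)) {
            events.add("joined " + member);
        }
    }

    void leave(String member) {
        if (members.remove(member)) {
            events.add("left " + member);
            // Re-open any election whose winner has just left.
            elections.values().removeIf(member::equals);
        }
    }

    /** The first candidate to ask wins; later callers learn the existing winner. */
    String elect(String electionName, String candidate) {
        if (!members.contains(candidate)) {
            throw new IllegalStateException(candidate + " is not a member");
        }
        return elections.computeIfAbsent(electionName, name -> candidate);
    }

    List<String> events() {
        return events;
    }

    Set<String> members() {
        return members;
    }
}
```

A typical use is electing a "master" among an unknown, changing set of nodes. When the master leaves, the next caller of `elect` takes over, which is one way an application can be malleable and fault-tolerant.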
SmartSockets • Robust connection setup • Established a connection in all 30 different test scenarios • Problems: • Firewalls • Network Address Translation (NAT) • Non-routed networks • Multi-homing • …
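SmartSockets hides connection setup behind a single call and tries several strategies until one works. The sketch below shows this ordered fallback against a simulated view of the network. The strategies are a direct connection, a reverse connection (the target connects back), TCP splicing (simultaneous open through NATs) and routing through a hub overlay, which are the kinds of strategies described for SmartSockets. The `NetworkView` fields, the `Strategy` enum and the fixed order are illustrative assumptions, not the SmartSockets API.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical description of what is reachable between a source and a target. */
class NetworkView {
    final boolean targetAcceptsInbound; // no firewall/NAT blocks connections to the target
    final boolean sourceAcceptsInbound; // the target could connect back to the source
    final boolean natAllowsSplicing;    // both NATs tolerate a simultaneous-open TCP connect
    final boolean hubsReachable;        // both sides can reach a hub of the overlay

    NetworkView(boolean targetAcceptsInbound, boolean sourceAcceptsInbound,
                boolean natAllowsSplicing, boolean hubsReachable) {
        this.targetAcceptsInbound = targetAcceptsInbound;
        this.sourceAcceptsInbound = sourceAcceptsInbound;
        this.natAllowsSplicing = natAllowsSplicing;
        this.hubsReachable = hubsReachable;
    }
}

/** Strategies in the order they are tried: cheapest first, hub routing as the last resort. */
enum Strategy {
    DIRECT(n -> n.targetAcceptsInbound),
    REVERSE(n -> n.sourceAcceptsInbound),
    SPLICING(n -> n.natAllowsSplicing),
    HUB_ROUTED(n -> n.hubsReachable);

    private final Predicate<NetworkView> works;

    Strategy(Predicate<NetworkView> works) {
        this.works = works;
    }

    boolean worksOn(NetworkView net) {
        return works.test(net);
    }
}

class ConnectionSetup {
    /** Returns the first strategy that succeeds, or empty if the target is unreachable. */
    static Optional<Strategy> connect(NetworkView net) {
        for (Strategy s : Strategy.values()) {
            if (s.worksOn(net)) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }
}
```

The application only sees "connected" or "failed". Which workaround made the connection stays inside the library.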
Ibis Programming Models • IPL-based programming models, among others: • Satin: • A divide-and-conquer model • MPJ: • The MPI binding for Java • RMI: • Object-oriented remote procedure call • Jorus: • A ‘user transparent’ parallel model for multimedia applications
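Satin programs express parallelism by spawning recursive subcalls and then waiting for their results (spawn/sync), while idle machines steal pending work. On a single JVM, the closest standard-library analogue is the fork/join framework. The sketch below therefore uses `java.util.concurrent.RecursiveTask` to show the same divide-and-conquer shape. It does not use Satin's API and runs on one machine only. The `ParallelSum` class and its threshold are illustrative choices.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Divide-and-conquer sum over an array slice, in the spawn/sync style that Satin uses. */
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, solve sequentially
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        left.fork();                                            // "spawn": an idle worker may steal this half
        long right = new ParallelSum(data, mid, to).compute();  // work on the other half ourselves
        return right + left.join();                             // "sync": wait for the spawned half
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }
}
```

In Satin the same recursion can be spread over many machines because work stealing also happens between nodes. Here, stealing happens only between threads of the common pool.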
Ibis as ‘Master Key’ (or ‘Passepartout’) • Use JavaGAT to access ‘any’ system • Develop/run applications independently of the available middleware • JavaGAT ‘adaptors’ required for each middleware • ‘Intelligent dispatching’ even allows for transparent use of multiple middlewares • Example: file copy • JavaGAT vs. Globus • Simple, portable, … • SAGA API standardized

File copy through Globus GT4 Reliable File Transfer (the JavaGAT RFT adaptor):

```java
package org.gridlab.gat.io.cpi.rftgt4;

import java.net.MalformedURLException;
import java.net.URL;
import java.rmi.RemoteException;
import java.security.cert.X509Certificate;
import java.util.Calendar;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Vector;

import javax.xml.namespace.QName;
import javax.xml.rpc.ServiceException;
import javax.xml.rpc.Stub;
import javax.xml.soap.SOAPElement;

import org.apache.axis.message.addressing.EndpointReferenceType;
import org.apache.axis.types.URI.MalformedURIException;
import org.globus.axis.util.Util;
import org.globus.delegation.DelegationConstants;
import org.globus.delegation.DelegationException;
import org.globus.delegation.DelegationUtil;
import org.globus.gsi.GlobusCredential;
import org.globus.gsi.GlobusCredentialException;
import org.globus.gsi.gssapi.GlobusGSSCredentialImpl;
import org.globus.gsi.jaas.JaasGssUtil;
import org.globus.rft.generated.BaseRequestType;
import org.globus.rft.generated.CreateReliableFileTransferInputType;
import org.globus.rft.generated.CreateReliableFileTransferOutputType;
import org.globus.rft.generated.DeleteRequestType;
import org.globus.rft.generated.DeleteType;
import org.globus.rft.generated.OverallStatus;
import org.globus.rft.generated.RFTFaultResourcePropertyType;
import org.globus.rft.generated.RFTOptionsType;
import org.globus.rft.generated.ReliableFileTransferFactoryPortType;
import org.globus.rft.generated.ReliableFileTransferPortType;
import org.globus.rft.generated.Start;
import org.globus.rft.generated.TransferRequestType;
import org.globus.rft.generated.TransferType;
import org.globus.transfer.reliable.client.BaseRFTClient;
import org.globus.transfer.reliable.service.RFTConstants;
import org.globus.wsrf.NotificationConsumerManager;
import org.globus.wsrf.NotifyCallback;
import org.globus.wsrf.ResourceException;
import org.globus.wsrf.WSNConstants;
import org.globus.wsrf.container.ContainerException;
import org.globus.wsrf.container.ServiceContainer;
import org.globus.wsrf.core.notification.ResourcePropertyValueChangeNotificationElementType;
import org.globus.wsrf.encoding.DeserializationException;
import org.globus.wsrf.encoding.ObjectDeserializer;
import org.globus.wsrf.impl.security.authentication.Constants;
import org.globus.wsrf.impl.security.authorization.Authorization;
import org.globus.wsrf.impl.security.authorization.HostAuthorization;
import org.globus.wsrf.impl.security.authorization.IdentityAuthorization;
import org.globus.wsrf.impl.security.authorization.SelfAuthorization;
import org.globus.wsrf.impl.security.descriptor.ClientSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.ContainerSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.GSISecureMsgAuthMethod;
import org.globus.wsrf.impl.security.descriptor.GSITransportAuthMethod;
import org.globus.wsrf.impl.security.descriptor.ResourceSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.SecurityDescriptorException;
import org.globus.wsrf.security.SecurityManager;
import org.gridlab.gat.CouldNotInitializeCredentialException;
import org.gridlab.gat.CredentialExpiredException;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.GATInvocationException;
import org.gridlab.gat.GATObjectCreationException;
import org.gridlab.gat.Preferences;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.cpi.FileCpi;
import org.gridlab.gat.security.globus.GlobusSecurityUtils;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.oasis.wsn.Subscribe;
import org.oasis.wsn.TopicExpressionType;
import org.oasis.wsrf.faults.BaseFaultType;
import org.oasis.wsrf.lifetime.SetTerminationTime;
import org.oasis.wsrf.properties.GetMultipleResourcePropertiesResponse;
import org.oasis.wsrf.properties.GetMultipleResourceProperties_Element;
import org.oasis.wsrf.properties.ResourcePropertyValueChangeNotificationType;

class RFTGT4NotifyCallback implements NotifyCallback {
    RFTGT4FileAdaptor transfer;

    OverallStatus status;

    public RFTGT4NotifyCallback(RFTGT4FileAdaptor transfer) {
        super();
        this.transfer = transfer;
        this.status = null;
    }

    @SuppressWarnings("unchecked")
    public void deliver(List topicPath, EndpointReferenceType producer,
            Object messageWrapper) {
        try {
            ResourcePropertyValueChangeNotificationType message = ((ResourcePropertyValueChangeNotificationElementType) messageWrapper)
                    .getResourcePropertyValueChangeNotification();
            this.status = (OverallStatus) message.getNewValue().get_any()[0]
                    .getValueAsType(RFTConstants.OVERALL_STATUS_RESOURCE,
                            OverallStatus.class);
            if (status.getFault() != null) {
                transfer.setFault(getFaultFromRP(status.getFault()));
            }
            // RunQueue.getInstance().add(this.resourceKey);
        } catch (Exception e) {
        }
        transfer.setStatus(status);
    }

    private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) {
        if (fault == null) {
            return null;
        }
        if (fault.getDelegationEPRMissingFaultType() != null) {
            return fault.getDelegationEPRMissingFaultType();
        } else if (fault.getRftAuthenticationFaultType() != null) {
            return fault.getRftAuthenticationFaultType();
        } else if (fault.getRftAuthorizationFaultType() != null) {
            return fault.getRftAuthorizationFaultType();
        } else if (fault.getRftDatabaseFaultType() != null) {
            return fault.getRftDatabaseFaultType();
        } else if (fault.getRftRepeatedlyStartedFaultType() != null) {
            return fault.getRftRepeatedlyStartedFaultType();
        } else if (fault.getTransferTransientFaultType() != null) {
            return fault.getTransferTransientFaultType();
        } else if (fault.getRftTransferFaultType() != null) {
            return fault.getRftTransferFaultType();
        } else {
            return null;
        }
    }
}

@SuppressWarnings("serial")
public class RFTGT4FileAdaptor extends FileCpi {
    public static final Authorization DEFAULT_AUTHZ = HostAuthorization
            .getInstance();

    Integer msgProtectionType = Constants.SIGNATURE;

    static final int TERM_TIME = 20;

    static final String PROTOCOL = "https";

    private static final String BASE_SERVICE_PATH = "/wsrf/services/";

    public static final int DEFAULT_DURATION_HOURS = 24;

    public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE;

    public static final String DEFAULT_FACTORY_PORT = "8443";

    private static final int DEFAULT_GRIDFTP_PORT = 2811;

    NotificationConsumerManager notificationConsumerManager;

    EndpointReferenceType notificationConsumerEPR;

    EndpointReferenceType notificationProducerEPR;

    String securityType;

    String factoryUrl;

    GSSCredential proxy;

    Authorization authorization;

    String host;

    OverallStatus status;

    BaseFaultType fault;

    String locationStr;

    ReliableFileTransferFactoryPortType factoryPort;

    public RFTGT4FileAdaptor(GATContext gatContext, Preferences preferences,
            URI location) throws GATObjectCreationException {
        super(gatContext, preferences, location);
        if (!location.isCompatible("gsiftp")
                && !location.isCompatible("gridftp")) {
            throw new GATObjectCreationException("cannot handle this URI");
        }
        String globusLocation = System.getenv("GLOBUS_LOCATION");
        if (globusLocation == null) {
            throw new GATObjectCreationException("$GLOBUS_LOCATION is not set");
        }
        System.setProperty("GLOBUS_LOCATION", globusLocation);
        System.setProperty("axis.ClientConfigFile", globusLocation
                + "/client-config.wsdd");
        this.host = location.getHost();
        this.securityType = Constants.GSI_SEC_MSG;
        this.authorization = null;
        this.proxy = null;
        try {
            proxy = GlobusSecurityUtils.getGlobusCredential(gatContext,
                    preferences, "globus", location, DEFAULT_GRIDFTP_PORT);
        } catch (CouldNotInitializeCredentialException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (CredentialExpiredException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        this.notificationConsumerManager = null;
        this.notificationConsumerEPR = null;
        this.notificationProducerEPR = null;
        this.status = null;
        this.fault = null;
        factoryPort = null;
        this.factoryUrl = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT
                + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME;
        locationStr = setLocationStr(location);
    }

    String setLocationStr(URI location) {
        if (location.getScheme().equals("any")) {
            return "gsiftp://" + location.getHost() + ":" + location.getPort()
                    + "/" + location.getPath();
        } else {
            return location.toString();
        }
    }

    protected boolean copy2(String destStr) throws GATInvocationException {
        EndpointReferenceType credentialEndpoint = getCredentialEPR();
        TransferType[] transferArray = new TransferType[1];
        transferArray[0] = new TransferType();
        transferArray[0].setSourceUrl(locationStr);
        transferArray[0].setDestinationUrl(destStr);
        RFTOptionsType rftOptions = new RFTOptionsType();
        rftOptions.setBinary(Boolean.TRUE);
        // rftOptions.setIgnoreFilePermErr(false);
        TransferRequestType request = new TransferRequestType();
        request.setRftOptions(rftOptions);
        request.setTransfer(transferArray);
        request.setTransferCredentialEndpoint(credentialEndpoint);
        setRequest(request);
        while (!transfersDone()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
            }
        }
        return transfersSucc();
    }

    public void copy(URI dest) throws GATInvocationException {
        String destUrl = setLocationStr(dest);
        if (!copy2(destUrl)) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: file copy failed");
        }
    }

    public void subscribe(ReliableFileTransferPortType rft)
            throws GATInvocationException {
        Map<Object, Object> properties = new HashMap<Object, Object>();
        properties.put(ServiceContainer.CLASS,
                "org.globus.wsrf.container.GSIServiceContainer");
        if (this.proxy != null) {
            ContainerSecurityDescriptor containerSecDesc = new ContainerSecurityDescriptor();
            SecurityManager.getManager();
            try {
                containerSecDesc.setSubject(JaasGssUtil
                        .createSubject(this.proxy));
            } catch (GSSException e) {
                throw new GATInvocationException(
                        "RFTGT4FileAdaptor: ContainerSecurityDescriptor failed, "
                                + e);
            }
            properties.put(ServiceContainer.CONTAINER_DESCRIPTOR,
                    containerSecDesc);
        }
        this.notificationConsumerManager = NotificationConsumerManager
                .getInstance(properties);
        try {
            this.notificationConsumerManager.startListening();
        } catch (ContainerException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: NotificationConsumerManager failed, "
                            + e);
        }
        List<Object> topicPath = new LinkedList<Object>();
        topicPath.add(RFTConstants.OVERALL_STATUS_RESOURCE);
        ResourceSecurityDescriptor securityDescriptor = new ResourceSecurityDescriptor();
        String authz = null;
        if (authorization == null) {
            authz = Authorization.AUTHZ_NONE;
        } else if (authorization instanceof HostAuthorization) {
            authz = Authorization.AUTHZ_NONE;
        } else if (authorization instanceof SelfAuthorization) {
            authz = Authorization.AUTHZ_SELF;
        } else if (authorization instanceof IdentityAuthorization) {
            // not supported
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: identity authorization not supported");
        } else {
            // throw an sg
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set authorization failed");
        }
        securityDescriptor.setAuthz(authz);
        Vector<Object> authMethod = new Vector<Object>();
        if (this.securityType.equals(Constants.GSI_SEC_MSG)) {
            authMethod.add(GSISecureMsgAuthMethod.BOTH);
        } else {
            authMethod.add(GSITransportAuthMethod.BOTH);
        }
        try {
            securityDescriptor.setAuthMethods(authMethod);
        } catch (SecurityDescriptorException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: setAuthMethods failed, " + e);
        }
        RFTGT4NotifyCallback notifyCallback = new RFTGT4NotifyCallback(this);
        try {
            notificationConsumerEPR = notificationConsumerManager
                    .createNotificationConsumer(topicPath, notifyCallback,
                            securityDescriptor);
        } catch (ResourceException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: createNotificationConsumer failed, "
                            + e);
        }
        Subscribe subscriptionRequest = new Subscribe();
        subscriptionRequest.setConsumerReference(notificationConsumerEPR);
        TopicExpressionType topicExpression = null;
        try {
            topicExpression = new TopicExpressionType(
                    WSNConstants.SIMPLE_TOPIC_DIALECT,
                    RFTConstants.OVERALL_STATUS_RESOURCE);
        } catch (MalformedURIException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: create TopicExpressionType failed, "
                            + e);
        }
        subscriptionRequest.setTopicExpression(topicExpression);
        try {
            rft.subscribe(subscriptionRequest);
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: subscription failed, " + e);
        }
    }

    protected EndpointReferenceType getCredentialEPR()
            throws GATInvocationException {
        this.status = null;
        URL factoryURL = null;
        try {
            factoryURL = new URL(factoryUrl);
        } catch (MalformedURLException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set factoryURL failed, " + e);
        }
        try {
            factoryPort = BaseRFTClient.rftFactoryLocator
                    .getReliableFileTransferFactoryPortTypePort(factoryURL);
        } catch (ServiceException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set factoryPort failed, " + e);
        }
        setSecurityTypeFromURL(factoryURL);
        return populateRFTEndpoints(factoryPort);
    }

    protected void setRequest(BaseRequestType request)
            throws GATInvocationException {
        CreateReliableFileTransferInputType input = new CreateReliableFileTransferInputType();
        if (request instanceof TransferRequestType) {
            input.setTransferRequest((TransferRequestType) request);
        } else {
            input.setDeleteRequest((DeleteRequestType) request);
        }
        Calendar termTimeDel = Calendar.getInstance();
        termTimeDel.add(Calendar.MINUTE, TERM_TIME);
        input.setInitialTerminationTime(termTimeDel);
        CreateReliableFileTransferOutputType response = null;
        try {
            response = factoryPort.createReliableFileTransfer(input);
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: set createReliableFileTransfer failed, "
                            + e);
        }
        EndpointReferenceType reliableRFTEndpoint = response
                .getReliableTransferEPR();
        ReliableFileTransferPortType rft = null;
        try {
            rft = BaseRFTClient.rftLocator
                    .getReliableFileTransferPortTypePort(reliableRFTEndpoint);
        } catch (ServiceException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: getReliableFileTransferPortTypePort failed, "
                            + e);
        }
        setStubSecurityProperties((Stub) rft);
        subscribe(rft);
        Calendar termTime = Calendar.getInstance();
        termTime.add(Calendar.MINUTE, TERM_TIME);
        SetTerminationTime reqTermTime = new SetTerminationTime();
        reqTermTime.setRequestedTerminationTime(termTime);
        try {
            rft.setTerminationTime(reqTermTime);
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: setTerminationTime failed, " + e);
        }
        try {
            rft.start(new Start());
        } catch (RemoteException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: start failed, " + e);
        }
    }

    private void setSecurityTypeFromURL(URL url) {
        if (url.getProtocol().equals("http")) {
            securityType = Constants.GSI_SEC_MSG;
        } else {
            Util.registerTransport();
            securityType = Constants.GSI_TRANSPORT;
        }
    }

    private void setStubSecurityProperties(Stub stub) {
        ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor();
        if (this.securityType.equals(Constants.GSI_SEC_MSG)) {
            secDesc.setGSISecureMsg(this.getMessageProtectionType());
        } else {
            secDesc.setGSITransport(this.getMessageProtectionType());
        }
        secDesc.setAuthz(getAuthorization());
        if (this.proxy != null) {
            // set proxy credential
            secDesc.setGSSCredential(this.proxy);
        }
        stub._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc);
    }

    public Integer getMessageProtectionType() {
        return (this.msgProtectionType == null) ? RFTGT4FileAdaptor.DEFAULT_MSG_PROTECTION
                : this.msgProtectionType;
    }

    public Authorization getAuthorization() {
        return (authorization == null) ? DEFAULT_AUTHZ : this.authorization;
    }

    private EndpointReferenceType populateRFTEndpoints(
            ReliableFileTransferFactoryPortType factoryPort)
            throws GATInvocationException {
        EndpointReferenceType[] delegationFactoryEndpoints = fetchDelegationFactoryEndpoints(factoryPort);
        EndpointReferenceType delegationEndpoint = delegate(delegationFactoryEndpoints[0]);
        return delegationEndpoint;
    }

    private EndpointReferenceType delegate(
            EndpointReferenceType delegationFactoryEndpoint)
            throws GATInvocationException {
        GlobusCredential credential = null;
        if (this.proxy != null) {
            credential = ((GlobusGSSCredentialImpl) this.proxy)
                    .getGlobusCredential();
        } else {
            try {
                credential = GlobusCredential.getDefaultCredential();
            } catch (GlobusCredentialException e) {
                throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
            }
        }
        int lifetime = DEFAULT_DURATION_HOURS * 60 * 60;
        ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor();
        if (this.securityType.equals(Constants.GSI_SEC_MSG)) {
            secDesc.setGSISecureMsg(this.getMessageProtectionType());
        } else {
            secDesc.setGSITransport(this.getMessageProtectionType());
        }
        secDesc.setAuthz(getAuthorization());
        if (this.proxy != null) {
            secDesc.setGSSCredential(this.proxy);
        }
        // Get the public key to delegate on.
        X509Certificate[] certsToDelegateOn = null;
        try {
            certsToDelegateOn = DelegationUtil.getCertificateChainRP(
                    delegationFactoryEndpoint, secDesc);
        } catch (DelegationException e) {
            throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
        }
        X509Certificate certToSign = certsToDelegateOn[0];
        // FIXME remove when there is a DelegationUtil.delegate(EPR, ...)
        String protocol = delegationFactoryEndpoint.getAddress().getScheme();
        String host = delegationFactoryEndpoint.getAddress().getHost();
        int port = delegationFactoryEndpoint.getAddress().getPort();
        String factoryUrl = protocol + "://" + host + ":" + port
                + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH;
        // send to delegation service and get epr.
        EndpointReferenceType credentialEndpoint = null;
        try {
            credentialEndpoint = DelegationUtil.delegate(factoryUrl,
                    credential, certToSign, lifetime, false, secDesc);
        } catch (DelegationException e) {
            throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
        }
        return credentialEndpoint;
    }

    public EndpointReferenceType[] fetchDelegationFactoryEndpoints(
            ReliableFileTransferFactoryPortType factoryPort)
            throws GATInvocationException {
        GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element();
        request
                .setResourceProperty(new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTORY });
        GetMultipleResourcePropertiesResponse response;
        try {
            response = factoryPort.getMultipleResourceProperties(request);
        } catch (RemoteException e) {
            e.printStackTrace();
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e);
        }
        SOAPElement[] any = response.get_any();
        EndpointReferenceType epr1 = null;
        try {
            epr1 = (EndpointReferenceType) ObjectDeserializer.toObject(any[0],
                    EndpointReferenceType.class);
        } catch (DeserializationException e) {
            throw new GATInvocationException(
                    "RFTGT4FileAdaptor: ObjectDeserializer, " + e);
        }
        EndpointReferenceType[] endpoints = new EndpointReferenceType[] { epr1 };
        return endpoints;
    }

    synchronized void setStatus(OverallStatus status) {
        this.status = status;
    }

    public int transfersActive() {
        if (status == null) {
            return 1;
        }
        return status.getTransfersActive();
    }

    public int transfersFinished() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersFinished();
    }

    public int transfersCancelled() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersCancelled();
    }

    public int transfersFailed() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersFailed();
    }

    public int transfersPending() {
        if (status == null) {
            return 1;
        }
        return status.getTransfersPending();
    }

    public int transfersRestarted() {
        if (status == null) {
            return 0;
        }
        return status.getTransfersRestarted();
    }

    public boolean transfersDone() {
        return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0);
    }

    public boolean transfersSucc() {
        return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0);
    }

    /*
     * private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) {
     * if (fault == null) { return null; }
     *
     * if (fault.getRftTransferFaultType() != null) { return
     * fault.getRftTransferFaultType(); } else if
     * (fault.getDelegationEPRMissingFaultType() != null) { return
     * fault.getDelegationEPRMissingFaultType(); } else if
     * (fault.getRftAuthenticationFaultType() != null) { return
     * fault.getRftAuthenticationFaultType(); } else if
     * (fault.getRftAuthorizationFaultType() != null) { return
     * fault.getRftAuthorizationFaultType(); } else if
     * (fault.getRftDatabaseFaultType() != null) { return
     * fault.getRftDatabaseFaultType(); } else if
     * (fault.getRftRepeatedlyStartedFaultType() != null) { return
     * fault.getRftRepeatedlyStartedFaultType(); } else if
     * (fault.getTransferTransientFaultType() != null) { return
     * fault.getTransferTransientFaultType(); } else { return null; } }
     */

    /*
     * private BaseFaultType deserializeFaultRP(SOAPElement any) throws
     * Exception { return getFaultFromRP((RFTFaultResourcePropertyType)
     * ObjectDeserializer .toObject(any, RFTFaultResourcePropertyType.class)); }
     */

    void setFault(BaseFaultType fault) {
        this.fault = fault;
    }
}
```

The same copy with JavaGAT:

```java
package tutorial;

import org.gridlab.gat.GAT;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.File;

public class RemoteCopy {
    public static void main(String[] args) throws Exception {
        GATContext context = new GATContext();
        URI src = new URI(args[0]);   // source location
        URI dest = new URI(args[1]);  // destination location
        File file = GAT.createFile(context, src);
        file.copy(dest);              // the middleware-specific work happens in an adaptor
        GAT.end();                    // shut down JavaGAT
    }
}
```
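The `RemoteCopy` program shows only the application side: it creates a file object and calls `copy`. Behind such a call, JavaGAT picks an adaptor for the URI, and intelligent dispatching can fall back to another adaptor when one fails. The sketch below models that idea with a hypothetical `CopyAdaptor` interface, a `Dispatcher` and a stand-in `FakeAdaptor`. None of these are JavaGAT's internal classes, and the adaptor names in the test are invented.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical middleware adaptor: one implementation per middleware. */
interface CopyAdaptor {
    String name();

    Set<String> schemes(); // URI schemes this adaptor understands

    void copy(URI src, URI dest) throws Exception;
}

/** Hypothetical dispatcher: tries every compatible adaptor until one succeeds. */
class Dispatcher {
    private final List<CopyAdaptor> adaptors;

    Dispatcher(List<CopyAdaptor> adaptors) {
        this.adaptors = adaptors;
    }

    /** Performs the copy and returns the name of the adaptor that succeeded. */
    String copy(URI src, URI dest) {
        List<String> errors = new ArrayList<>();
        for (CopyAdaptor adaptor : adaptors) {
            if (!adaptor.schemes().contains(src.getScheme())) {
                continue; // this adaptor cannot handle the URI at all
            }
            try {
                adaptor.copy(src, dest);
                return adaptor.name();
            } catch (Exception e) {
                errors.add(adaptor.name() + ": " + e.getMessage()); // remember why, try the next one
            }
        }
        throw new IllegalStateException("no adaptor could copy " + src + " " + errors);
    }
}

/** Stand-in adaptor for demonstration: succeeds or fails on demand. */
class FakeAdaptor implements CopyAdaptor {
    private final String name;
    private final Set<String> schemes;
    private final boolean fails;

    FakeAdaptor(String name, Set<String> schemes, boolean fails) {
        this.name = name;
        this.schemes = schemes;
        this.fails = fails;
    }

    public String name() {
        return name;
    }

    public Set<String> schemes() {
        return schemes;
    }

    public void copy(URI src, URI dest) throws Exception {
        if (fails) {
            throw new Exception("service unavailable");
        }
    }
}
```

The application code never names a middleware. Adding support for a new system means adding an adaptor, not changing the application.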
Ibis as ‘Glue’ • Use IPL + SmartSockets, generally for wide-area communication • Linking up separate ‘activities’ of an application • Activities: often largely ‘independent’ tasks implemented in any popular language or model (e.g. C/MPI, CUDA, Fortran, Java…) • Each typically running on a single GPU/node/cluster/cloud/… • Automatically circumvent connectivity problems • Example: connectivity with SmartSockets vs. without SmartSockets (figure)
Ibis as ‘HPC Solution’ • Use Ibis as a replacement for e.g. C++/MPI code • Benefits: • (better) portability • malleability (open world) • fault-tolerance • (run-time) task migration • Downside: • requires recoding • Comparable speedups (figure)
MMCA: Situation in 2004/2005 • Code pre-installed at each cluster site • Unstable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’ • (diagram: Parallel Horus clients and Parallel Horus server (C++/MPI), started via SSH, communicating over sockets + SSH tunneling)
Phase 1: Ibis as ‘Master Key’ (2006) • (diagram: Parallel Horus clients and Parallel Horus server (C++/MPI), deployed with JavaGAT + IbisDeploy, communicating over sockets + SSH tunneling) • Code pre-installed at each cluster site • Unstable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’
Phase 2: Ibis as ‘Glue’ (2006/2007) • (diagram: Parallel Horus clients and Parallel Horus server (C++/MPI), deployed with JavaGAT + IbisDeploy, communicating over IPL + SmartSockets) • Code pre-installed at each cluster site • Unstable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’
Phase 3: Ibis as ‘HPC Solution’ (2008) • (diagram: Parallel Jorus clients and Parallel Jorus server (Ibis/Java), deployed with JavaGAT + IbisDeploy, communicating over IPL + SmartSockets) • Code pre-installed at each cluster site • Unstable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’
‘Master Key’ + ‘Glue’ + ‘HPC’ • Step-wise conversion to 100% Ibis / Java • Phase 1: JavaGAT as ‘Master Key’ • Phase 2: IPL + SmartSockets as ‘Glue’ • Phase 3: Ibis as ‘HPC Solution’ • After each phase a fully functional, working solution was available! • Eventual result: • ‘wall-socket computing from a memory stick’ • Remember: the ‘Promise of the Grid’? • Awards at AAAI 2007 and CCGrid 2008
100% Ibis Implementation (2008++) Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner) Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)
Green Clouds • NWO Smart Energy Systems project with Univ. of Amsterdam (Cees de Laat) & SARA • How to map high-performance applications onto hybrid distributed computing systems, taking both performance & energy consumption into account • System-level approach to reduce HPC energy consumption
DAS-4: infrastructure for Green IT • Sites: UvA/MultimediaN (16/36), VU (74), ASTRON (23), TU Delft (32), Leiden (16) • Dual quad-core Xeon E5620 • Various accelerators (GPUs, multicores, …) • Scientific Linux • Built by ClusterVision • SURFnet6, 10 Gb/s lambdas
Main ideas • Adapt resources to application needs dynamically, accounting for computational & energy efficiency • Using Ibis malleability support • Exploit hardware diversity • Graphics Processing Units (GPUs) have much higher FLOPS/Watt for many applications • Use optical and photonic networks • Build a knowledge base & semantic infrastructure description
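The first idea, adapting resources while accounting for both computational and energy efficiency, can be illustrated with a simple selection rule. Among the resources that meet a deadline, pick the one with the lowest estimated energy, E = P × t (average power times run time). The `Resource` and `EnergyAwareSelector` classes and all power and time figures in the test are invented for illustration. The project's actual scheduling model may differ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical profile of one resource for a given application. */
class Resource {
    final String name;
    final double watts;   // average power draw while running the application
    final double seconds; // estimated run time on this resource

    Resource(String name, double watts, double seconds) {
        this.name = name;
        this.watts = watts;
        this.seconds = seconds;
    }

    double joules() {
        return watts * seconds; // energy = power x time
    }
}

class EnergyAwareSelector {
    /** Among resources that finish before the deadline, returns the one using the least energy. */
    static Optional<Resource> select(List<Resource> candidates, double deadlineSeconds) {
        return candidates.stream()
                .filter(r -> r.seconds <= deadlineSeconds)
                .min(Comparator.comparingDouble(Resource::joules));
    }
}
```

With a tight deadline, a fast but power-hungry GPU can still be the most energy-efficient choice. With a loose deadline, a slow low-power node may win. Malleability lets an application move between such resources at run time.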
Conclusions • Ibis enables problem solving (avoids system fighting) • Successfully applied in many domains • Astronomy, multimedia analysis, climate modeling, remote sensing, semantic web, medical imaging, … • Data intensive, compute intensive, real-time… • Open source, download: • www.cs.vu.nl/ibis/