1 / 54

The Ibis e-Science Software Framework

The Ibis e-Science Software Framework. Henri Bal Frank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands. Introduction. Distributed systems continue to change

mohawk
Download Presentation

The Ibis e-Science Software Framework

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Ibis e-Science Software Framework Henri BalFrank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands

  2. Introduction • Distributed systems continue to change • Clusters, grids, clouds, mobile devices • Distributed applications continue to change • e-Science, web, pervasive applications • Distributed programming continues to be notoriously difficult

  3. Distributed Systems: 1980sMultiple PCs on a (local) network • Networks of Workstations (NOWs) • Collections of Workstations (COWs) • Processor pools • Condor pools • Clusters

  4. Distributed Systems: 1990sSharing wide-area resources • Metacomputing (Smarr & Catlett, CACM) • Flocking Condor (Epema) • DAS (Distributed ASCI Supercomputer) • Grid Blueprint (Foster & Kesselman) • Desktop grids, SETI@home

  5. Distributed Systems: 2000s Cloud computing Pay-on-demand Virtualization Hardware diversity /heterogeneous computing Green IT The Networked World Sensor networks Smart phones

  6. Our approach • Study fundamental underlying problems • … hand-in-hand with realistic applications • … integrate solutions in one system: Ibis ! User Distributed Systems • Funding from NWO (2002), VL-e (2003-2009), EU (JavaGAT, XtreemOS, Contrail), VU, COMMIT

  7. Outline • ‘Problem Solving’ vs. ‘System Fighting’ • Jungle Computing • Example applications: • Computational Astrophysics • Multimedia Content Analysis • The Ibis Software Framework • The 3 Common Uses of Ibis • ‘Master Key’ + ‘Glue’ + ‘HPC’ • Some current work: Green Clouds

  8. Ibis: ‘Problem Solving’ vs. ‘System Fighting’

  9. A Random Example: Supernova Detection • DACH 2008, Japan • Distributed multi-cluster system • Heterogeneous • Distributed database (image pairs) • Large vs small databases/images • Partial replication • Image-pair comparison given (in C) • Find all supernova candidates • Task 1: As fast as possible • Task 2: Idem, under system crashes

  10. ‘Problem Solving’ vs. ‘System Fighting’ • All participating teams struggled (1 month) • Middleware instabilities… • Connectivity problems… • Load balancing… • But not the Ibis team • Winner (by far) in both categories • Note: many Japanese teams with years of experience • Hardware, middleware, network, C-code, image data… • Focus on ‘problem solving’, not ‘system fighting’ • incl. ‘opening’ of black-box C-code

  11. Ibis Results: Awards & Prizes • Many domains; data/compute intensive, real-time... • Winner Sustainability Award in the Enlighten Your Research (EYR) competition, 7 Dec. 2011 (Frank Seinstra) 1st Prize: DACH 2008 - BS 1st Prize: DACH 2008 - FT AAAI-VC 2007 Most Visionary Research Award WebPie: A Web-Scale Parallel Inference Engine J. Urbani, S. Kotoulas, J. Maassen, N. Drost, F.J. Seinstra, F. van Harmelen, and H.E. Bal 3rd Prize: ISWC 2008 1st Prize: SCALE 2008 1st Prize: SCALE 2010

  12. Ibis Users… …and many more

  13. Jungle Computing

  14. Jungle Computing (Frank Seinstra) • ‘Worst case’ computing as required by end-users • Distributed • Heterogeneous • Hierarchical (incl. multi-/many-cores)

  15. Why Jungle Computing? • Scientists often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to: • Desire for scalability • Distributed nature of (input) data • Software heterogeneity (e.g.: mix of C/MPI and CUDA) • Ad hoc hardware availability • Energy consumption (use most energy-efficient resource) • … • Note: most users do not need ‘worst case’ jungle • Ibis aims to apply to any subset

  16. Example Application Domains • Computational Astrophysics (Leiden) • AMUSE: multi-model / multi-kernel simulations • “Simulating the Universe on an Intercontinental Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010) • Climate Modeling (Utrecht) • CPL: multi-model / multi-kernel simulations • Atmosphere, ocean, source rock formation, … - hardware: (potentially) very diverse - high resolution => speed & scalability - …

  17. Domain Example #1: Computational Astrophysics

  18. Domain Example #1: Computational Astrophysics Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA

  19. Domain Example #1: Computational Astrophysics • The AMUSE system (Leiden University) • Early Star Cluster Evolution, including gas • Gravitational dynamics (N-body): GPU / GPU-cluster • Stellar evolution: Beowulf cluster / Cloud • Hydro-dynamics, Radiative transport: Supercomputer gravitational dynamics stellar evolution AMUSE hydro-dynamics radiative transport

  20. Domain Example #1: Computational Astrophysics Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA

  21. Domain Example #2: Multimedia Content Analysis

  22. Multimedia Content Analysis (MMCA) • Aim: • Automatic extraction of ‘semantic concepts’ from image sets and video streams • Depending on specific problem & size of data set: • May take hours, days, weeks, months, years…

  23. Multimedia Content Analysis (MMCA) • Applications in (a.o): • Remote Sensing • Security / Surveillance • Medical Imaging • Document Analysis • Multimedia Systems • Astronomy • Application types: • Real-time vs. off-line • Fine-grained vs. coarse-grained • Data-intensive / compute-intensive / information-intensive

  24. Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog Seinstra et al (IEEE Multimedia, Oct-Dec 2007) Seinstra et al (AAAI’07: Most Visionary Research Award)

  25. Successful… • …but many fundamental problems unsolved! • Scaling up to very large systems • Platform independence • Middleware independence • Connectivity (a.o. firewalls, …) • Fault-tolerance • … • Software support tool(s) urgently needed! • Jungle-aware + transparent + efficient • No progress until ‘discovery’ of Ibis

  26. The Ibis Software Framework

  27. The Ibis Software Framework • Offers all functionality to efficiently & transparently implement & run Jungle Computing applications • Designed for dynamic / hostile environments • Modular and flexible • Allow replacement of Ibis components by external ones, including native code • Open source • Download: http://www.cs.vu.nl/ibis/

  28. Programming Logical Likes math Deployment Practical Visual (GUI) Ibis Design • Applications need functionality for • Programming (as in programming languages) • Deployment (as in operating systems)

  29. Ibis Software Stack

  30. JavaGAT • Java Grid Application Toolkit • High-level API for developing (Grid) applications independentlyof the underlying (Grid) middleware • Use (Grid) services; file cp, resource discovery, job submission, … • Developed in EU GridLab project • Thilo Kielmann, Rob van Nieuwpoort • SAGA API standardized by OGF • Simple API for Grid Applications (a.o. with LSU) • SAGA on top of JavaGAT (and v.v.)

  31. Zorilla • A prototype P2P middleware • A Zorilla system consists of a collection of nodes, connected by a P2P network • Each node independent & implements all middleware functionality • No central components • Supports fault-tolerance and malleability • Easily combines resources in multiple administrative domains

  32. IbisDeploy

  33. Ibis Portability Layer (IPL) • Java-centric ‘run-anywhere’ communication library • Sent along with your application • “MPI for the Grid” • Supports fault-tolerance and malleability • Resource tracking (Join-Elect-Leave model) • Open-world / Closed world • Efficient • Highly optimized object serialization • Can use optimized native libraries (e.g. MPI, Infiniband)

  34. SmartSockets • Robust connection setup • Always connection in 30 different scenarios Problems: Firewalls Network Address Translation (NAT) Non-routed networks Multi-homing …

  35. Ibis Programming Models • IPL-based programming models, a.o.: • Satin: • A divide-and-conquer model • MPJ: • The MPI binding for Java • RMI: • Object-Oriented remote Procedure Call • Jorus: • A ‘user transparent’ parallel model for multimedia applications

  36. The 3 Common Uses of Ibis

  37. Ibis as ‘Master Key’ (or ‘Passepartout’) • Use JavaGAT to access ‘any’ system • Develop/run applications independentlyof available middlewares • JavaGAT ‘adaptors’ required for each middleware • ‘Intelligent dispatching’ even allows for transparent use of multiplemiddlewares • Example: file copy • JavaGAT vs. Globus • Simple, portable, … • SAGA API standardized package org.gridlab.gat.io.cpi.rftgt4; import java.net.MalformedURLException; import java.net.URL; import java.rmi.RemoteException; import java.security.cert.X509Certificate; import java.util.Calendar; import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Vector; import javax.xml.namespace.QName; import javax.xml.rpc.ServiceException; import javax.xml.rpc.Stub; import javax.xml.soap.SOAPElement; import org.apache.axis.message.addressing.EndpointReferenceType; import org.apache.axis.types.URI.MalformedURIException; import org.globus.axis.util.Util; import org.globus.delegation.DelegationConstants; import org.globus.delegation.DelegationException; import org.globus.delegation.DelegationUtil; import org.globus.gsi.GlobusCredential; import org.globus.gsi.GlobusCredentialException; import org.globus.gsi.gssapi.GlobusGSSCredentialImpl; import org.globus.gsi.jaas.JaasGssUtil; import org.globus.rft.generated.BaseRequestType; import org.globus.rft.generated.CreateReliableFileTransferInputType; import org.globus.rft.generated.CreateReliableFileTransferOutputType; import org.globus.rft.generated.DeleteRequestType; import org.globus.rft.generated.DeleteType; import org.globus.rft.generated.OverallStatus; import org.globus.rft.generated.RFTFaultResourcePropertyType; import org.globus.rft.generated.RFTOptionsType; import org.globus.rft.generated.ReliableFileTransferFactoryPortType; import org.globus.rft.generated.ReliableFileTransferPortType; import org.globus.rft.generated.Start; import org.globus.rft.generated.TransferRequestType; import org.globus.rft.generated.TransferType; import org.globus.transfer.reliable.client.BaseRFTClient; import org.globus.transfer.reliable.service.RFTConstants; import org.globus.wsrf.NotificationConsumerManager; import org.globus.wsrf.NotifyCallback; import org.globus.wsrf.ResourceException; import org.globus.wsrf.WSNConstants; import org.globus.wsrf.container.ContainerException; import org.globus.wsrf.container.ServiceContainer; import org.globus.wsrf.core.notification.ResourcePropertyValueChangeNotificationElementType; import org.globus.wsrf.encoding.DeserializationException; import org.globus.wsrf.encoding.ObjectDeserializer; import org.globus.wsrf.impl.security.authentication.Constants; import org.globus.wsrf.impl.security.authorization.Authorization; import org.globus.wsrf.impl.security.authorization.HostAuthorization; import org.globus.wsrf.impl.security.authorization.IdentityAuthorization; import org.globus.wsrf.impl.security.authorization.SelfAuthorization; import org.globus.wsrf.impl.security.descriptor.ClientSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.ContainerSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.GSISecureMsgAuthMethod; import org.globus.wsrf.impl.security.descriptor.GSITransportAuthMethod; import org.globus.wsrf.impl.security.descriptor.ResourceSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.SecurityDescriptorException; import org.globus.wsrf.security.SecurityManager; import org.gridlab.gat.CouldNotInitializeCredentialException; import org.gridlab.gat.CredentialExpiredException; import org.gridlab.gat.GATContext; import org.gridlab.gat.GATInvocationException; import org.gridlab.gat.GATObjectCreationException; import org.gridlab.gat.Preferences; import org.gridlab.gat.URI; import org.gridlab.gat.io.cpi.FileCpi; import org.gridlab.gat.security.globus.GlobusSecurityUtils; import org.ietf.jgss.GSSCredential; import org.ietf.jgss.GSSException; import org.oasis.wsn.Subscribe; import org.oasis.wsn.TopicExpressionType; import org.oasis.wsrf.faults.BaseFaultType; import org.oasis.wsrf.lifetime.SetTerminationTime; import org.oasis.wsrf.properties.GetMultipleResourcePropertiesResponse; import org.oasis.wsrf.properties.GetMultipleResourceProperties_Element; import org.oasis.wsrf.properties.ResourcePropertyValueChangeNotificationType; class RFTGT4NotifyCallback implements NotifyCallback { RFTGT4FileAdaptor transfer; OverallStatus status; public RFTGT4NotifyCallback(RFTGT4FileAdaptor transfer) { super(); this.transfer = transfer; this.status = null; } @SuppressWarnings("unchecked") public void deliver(List topicPath, EndpointReferenceType producer, Object messageWrapper) { try { ResourcePropertyValueChangeNotificationType message = ((ResourcePropertyValueChangeNotificationElementType) messageWrapper) .getResourcePropertyValueChangeNotification(); this.status = (OverallStatus) message.getNewValue().get_any()[0] .getValueAsType(RFTConstants.OVERALL_STATUS_RESOURCE, OverallStatus.class); if (status.getFault() != null) { transfer.setFault(getFaultFromRP(status.getFault())); } // RunQueue.getInstance().add(this.resourceKey); } catch (Exception e) { } transfer.setStatus(status); } private BaseFaultTypegetFaultFromRP(RFTFaultResourcePropertyType fault) { if (fault == null) { return null; } if (fault.getDelegationEPRMissingFaultType() != null) { return fault.getDelegationEPRMissingFaultType(); } else if (fault.getRftAuthenticationFaultType() != null) { return fault.getRftAuthenticationFaultType(); } else if (fault.getRftAuthorizationFaultType() != null) { return fault.getRftAuthorizationFaultType(); } else if (fault.getRftDatabaseFaultType() != null) { return fault.getRftDatabaseFaultType(); } else if (fault.getRftRepeatedlyStartedFaultType() != null) { return fault.getRftRepeatedlyStartedFaultType(); } else if (fault.getTransferTransientFaultType() != null) { return fault.getTransferTransientFaultType(); } else if (fault.getRftTransferFaultType() != null) { return fault.getRftTransferFaultType(); } else { return null; } } } @SuppressWarnings("serial") public class RFTGT4FileAdaptor extends FileCpi { public static final Authorization DEFAULT_AUTHZ = HostAuthorization .getInstance(); Integer msgProtectionType = Constants.SIGNATURE; static final int TERM_TIME = 20; static final String PROTOCOL = "https"; private static final String BASE_SERVICE_PATH = "/wsrf/services/"; public static final int DEFAULT_DURATION_HOURS = 24; public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE; public static final String DEFAULT_FACTORY_PORT = "8443"; private static final int DEFAULT_GRIDFTP_PORT = 2811; NotificationConsumerManagernotificationConsumerManager; EndpointReferenceTypenotificationConsumerEPR; EndpointReferenceTypenotificationProducerEPR; String securityType; String factoryUrl; GSSCredential proxy; Authorization authorization; String host; OverallStatus status; BaseFaultType fault; String locationStr; ReliableFileTransferFactoryPortTypefactoryPort; public RFTGT4FileAdaptor(GATContextgatContext, Preferences preferences, URI location) throws GATObjectCreationException { super(gatContext, preferences, location); if (!location.isCompatible("gsiftp") && !location.isCompatible("gridftp")) { throw new GATObjectCreationException("cannot handle this URI"); } String globusLocation = System.getenv("GLOBUS_LOCATION"); if (globusLocation == null) { throw new GATObjectCreationException("$GLOBUS_LOCATION is not set"); } System.setProperty("GLOBUS_LOCATION", globusLocation); System.setProperty("axis.ClientConfigFile", globusLocation + "/client-config.wsdd"); this.host = location.getHost(); this.securityType = Constants.GSI_SEC_MSG; this.authorization = null; this.proxy = null; try { proxy = GlobusSecurityUtils.getGlobusCredential(gatContext, preferences, "globus", location, DEFAULT_GRIDFTP_PORT); } catch (CouldNotInitializeCredentialException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (CredentialExpiredException e) { // TODO Auto-generated catch block e.printStackTrace(); } this.notificationConsumerManager = null; this.notificationConsumerEPR = null; this.notificationProducerEPR = null; this.status = null; this.fault = null; factoryPort = null; this.factoryUrl = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME; locationStr = setLocationStr(location); } String setLocationStr(URI location) { if (location.getScheme().equals("any")) { return "gsiftp://" + location.getHost() + ":" + location.getPort() + "/" + location.getPath(); } else { return location.toString(); } } protected boolean copy2(String destStr) throws GATInvocationException { EndpointReferenceTypecredentialEndpoint = getCredentialEPR(); TransferType[] transferArray = new TransferType[1]; transferArray[0] = new TransferType(); transferArray[0].setSourceUrl(locationStr); transferArray[0].setDestinationUrl(destStr); RFTOptionsTyperftOptions = new RFTOptionsType(); rftOptions.setBinary(Boolean.TRUE); // rftOptions.setIgnoreFilePermErr(false); TransferRequestType request = new TransferRequestType(); request.setRftOptions(rftOptions); request.setTransfer(transferArray); request.setTransferCredentialEndpoint(credentialEndpoint); setRequest(request); while (!transfersDone()) { try { Thread.sleep(1000); } catch (InterruptedException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } } return transfersSucc(); } public void copy(URI dest) throws GATInvocationException { String destUrl = setLocationStr(dest); if (!copy2(destUrl)) { throw new GATInvocationException( "RFTGT4FileAdaptor: file copy failed"); } } public void subscribe(ReliableFileTransferPortTyperft) throws GATInvocationException { Map<Object, Object> properties = new HashMap<Object, Object>(); properties.put(ServiceContainer.CLASS, "org.globus.wsrf.container.GSIServiceContainer"); if (this.proxy != null) { ContainerSecurityDescriptorcontainerSecDesc = new ContainerSecurityDescriptor(); SecurityManager.getManager(); try { containerSecDesc.setSubject(JaasGssUtil .createSubject(this.proxy)); } catch (GSSException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ContainerSecurityDescriptor failed, " + e); } properties.put(ServiceContainer.CONTAINER_DESCRIPTOR, containerSecDesc); } this.notificationConsumerManager = NotificationConsumerManager .getInstance(properties); try { this.notificationConsumerManager.startListening(); } catch (ContainerException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: NotificationConsumerManager failed, " + e); } List<Object> topicPath = new LinkedList<Object>(); topicPath.add(RFTConstants.OVERALL_STATUS_RESOURCE); ResourceSecurityDescriptorsecurityDescriptor = new ResourceSecurityDescriptor(); String authz = null; if (authorization == null) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceofHostAuthorization) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceofSelfAuthorization) { authz = Authorization.AUTHZ_SELF; } else if (authorization instanceofIdentityAuthorization) { // not supported throw new GATInvocationException( "RFTGT4FileAdaptor: identity authorization not supported"); } else { // throw an sg throw new GATInvocationException( "RFTGT4FileAdaptor: set authorization failed"); } securityDescriptor.setAuthz(authz); Vector<Object> authMethod = new Vector<Object>(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { authMethod.add(GSISecureMsgAuthMethod.BOTH); } else { authMethod.add(GSITransportAuthMethod.BOTH); } try { securityDescriptor.setAuthMethods(authMethod); } catch (SecurityDescriptorException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setAuthMethods failed, " + e); } RFTGT4NotifyCallback notifyCallback = new RFTGT4NotifyCallback(this); try { notificationConsumerEPR = notificationConsumerManager .createNotificationConsumer(topicPath, notifyCallback, securityDescriptor); } catch (ResourceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: createNotificationConsumer failed, " + e); } Subscribe subscriptionRequest = new Subscribe(); subscriptionRequest.setConsumerReference(notificationConsumerEPR); TopicExpressionTypetopicExpression = null; try { topicExpression = new TopicExpressionType( WSNConstants.SIMPLE_TOPIC_DIALECT, RFTConstants.OVERALL_STATUS_RESOURCE); } catch (MalformedURIException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: create TopicExpressionType failed, " + e); } subscriptionRequest.setTopicExpression(topicExpression); try { rft.subscribe(subscriptionRequest); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: subscription failed, " + e); } } protected EndpointReferenceTypegetCredentialEPR() throws GATInvocationException { this.status = null; URL factoryURL = null; try { factoryURL = new URL(factoryUrl); } catch (MalformedURLException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factoryURL failed, " + e); } try { factoryPort = BaseRFTClient.rftFactoryLocator .getReliableFileTransferFactoryPortTypePort(factoryURL); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factoryPort failed, " + e); } setSecurityTypeFromURL(factoryURL); return populateRFTEndpoints(factoryPort); } protected void setRequest(BaseRequestType request) throws GATInvocationException { CreateReliableFileTransferInputType input = new CreateReliableFileTransferInputType(); if (request instanceofTransferRequestType) { input.setTransferRequest((TransferRequestType) request); } else { input.setDeleteRequest((DeleteRequestType) request); } Calendar termTimeDel = Calendar.getInstance(); termTimeDel.add(Calendar.MINUTE, TERM_TIME); input.setInitialTerminationTime(termTimeDel); CreateReliableFileTransferOutputType response = null; try { response = factoryPort.createReliableFileTransfer(input); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set createReliableFileTransfer failed, " + e); } EndpointReferenceTypereliableRFTEndpoint = response .getReliableTransferEPR(); ReliableFileTransferPortTyperft = null; try { rft = BaseRFTClient.rftLocator .getReliableFileTransferPortTypePort(reliableRFTEndpoint); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: getReliableFileTransferPortTypePort failed, " + e); } setStubSecurityProperties((Stub) rft); subscribe(rft); Calendar termTime = Calendar.getInstance(); termTime.add(Calendar.MINUTE, TERM_TIME); SetTerminationTimereqTermTime = new SetTerminationTime(); reqTermTime.setRequestedTerminationTime(termTime); try { rft.setTerminationTime(reqTermTime); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setTerminationTime failed, " + e); } try { rft.start(new Start()); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: start failed, " + e); } } private void setSecurityTypeFromURL(URL url) { if (url.getProtocol().equals("http")) { securityType = Constants.GSI_SEC_MSG; } else { Util.registerTransport(); securityType = Constants.GSI_TRANSPORT; } } private void setStubSecurityProperties(Stub stub) { ClientSecurityDescriptorsecDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType()); } secDesc.setAuthz(getAuthorization()); if (this.proxy != null) { // set proxy credential secDesc.setGSSCredential(this.proxy); } stub._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc); } public Integer getMessageProtectionType() { return (this.msgProtectionType == null) ? RFTGT4FileAdaptor.DEFAULT_MSG_PROTECTION : this.msgProtectionType; } public Authorization getAuthorization() { return (authorization == null) ? DEFAULT_AUTHZ : this.authorization; } private EndpointReferenceTypepopulateRFTEndpoints( ReliableFileTransferFactoryPortTypefactoryPort) throws GATInvocationException { EndpointReferenceType[] delegationFactoryEndpoints = fetchDelegationFactoryEndpoints(factoryPort); EndpointReferenceTypedelegationEndpoint = delegate(delegationFactoryEndpoints[0]); return delegationEndpoint; } private EndpointReferenceType delegate( EndpointReferenceTypedelegationFactoryEndpoint) throws GATInvocationException { GlobusCredential credential = null; if (this.proxy != null) { credential = ((GlobusGSSCredentialImpl) this.proxy) .getGlobusCredential(); } else { try { credential = GlobusCredential.getDefaultCredential(); } catch (GlobusCredentialException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } } int lifetime = DEFAULT_DURATION_HOURS * 60 * 60; ClientSecurityDescriptorsecDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType()); } secDesc.setAuthz(getAuthorization()); if (this.proxy != null) { secDesc.setGSSCredential(this.proxy); } // Get the public key to delegate on. X509Certificate[] certsToDelegateOn = null; try { certsToDelegateOn = DelegationUtil.getCertificateChainRP( delegationFactoryEndpoint, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } X509Certificate certToSign = certsToDelegateOn[0]; // FIXME remove when there is a DelegationUtil.delegate(EPR, ...) String protocol = delegationFactoryEndpoint.getAddress().getScheme(); String host = delegationFactoryEndpoint.getAddress().getHost(); int port = delegationFactoryEndpoint.getAddress().getPort(); String factoryUrl = protocol + "://" + host + ":" + port + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH; // send to delegation service and get epr. EndpointReferenceTypecredentialEndpoint = null; try { credentialEndpoint = DelegationUtil.delegate(factoryUrl, credential, certToSign, lifetime, false, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } return credentialEndpoint; } public EndpointReferenceType[] fetchDelegationFactoryEndpoints( ReliableFileTransferFactoryPortTypefactoryPort) throws GATInvocationException { GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element(); request .setResourceProperty(new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTORY }); GetMultipleResourcePropertiesResponse response; try { response = factoryPort.getMultipleResourceProperties(request); } catch (RemoteException e) { e.printStackTrace(); throw new GATInvocationException( "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e); } SOAPElement[] any = response.get_any(); EndpointReferenceType epr1 = null; try { epr1 = (EndpointReferenceType) ObjectDeserializer.toObject(any[0], EndpointReferenceType.class); } catch (DeserializationException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ObjectDeserializer, " + e); } EndpointReferenceType[] endpoints = new EndpointReferenceType[] { epr1 }; return endpoints; } synchronized void setStatus(OverallStatus status) { this.status = status; } public inttransfersActive() { if (status == null) { return 1; } return status.getTransfersActive(); } public inttransfersFinished() { if (status == null) { return 0; } return status.getTransfersFinished(); } public inttransfersCancelled() { if (status == null) { return 0; } return status.getTransfersCancelled(); } public inttransfersFailed() { if (status == null) { return 0; } return status.getTransfersFailed(); } public inttransfersPending() { if (status == null) { return 1; } return status.getTransfersPending(); } public inttransfersRestarted() { if (status == null) { return 0; } return status.getTransfersRestarted(); } public booleantransfersDone() { return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0); } public booleantransfersSucc() { return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0); } /* * private BaseFaultTypegetFaultFromRP(RFTFaultResourcePropertyType fault) { * if (fault == null) { return null; } * * if (fault.getRftTransferFaultType() != null) { return * fault.getRftTransferFaultType(); } else if * (fault.getDelegationEPRMissingFaultType() != null) { return * fault.getDelegationEPRMissingFaultType(); } else if * (fault.getRftAuthenticationFaultType() != null) { return * fault.getRftAuthenticationFaultType(); } else if * (fault.getRftAuthorizationFaultType() != null) { return * fault.getRftAuthorizationFaultType(); } else if * (fault.getRftDatabaseFaultType() != null) { return * fault.getRftDatabaseFaultType(); } else if * (fault.getRftRepeatedlyStartedFaultType() != null) { return * fault.getRftRepeatedlyStartedFaultType(); } else if * (fault.getTransferTransientFaultType() != null) { return * fault.getTransferTransientFaultType(); } else { return null; } } */ /* * private BaseFaultTypedeserializeFaultRP(SOAPElement any) throws * Exception { return getFaultFromRP((RFTFaultResourcePropertyType) * ObjectDeserializer .toObject(any, RFTFaultResourcePropertyType.class)); } */ void setFault(BaseFaultType fault) { this.fault = fault; } } package tutorial; import org.gridlab.gat.GAT; import org.gridlab.gat.GATContext; import org.gridlab.gat.URI; import org.gridlab.gat.io.File; public class RemoteCopy { public static void main(String[] args) throws Exception { GATContext context = new GATContext(); URI src = new URI(args[0]); URI dest = new URI(args[1]); File file = GAT.createFile(context, src); file.copy(dest); GAT.end(); } }

  38. Ibis as ‘Glue’ • Use IPL + SmartSockets, generally for wide-area communication • Linking up separate ‘activities’ of an application • Activities: often largely ‘independent’ tasks implemented in any popular language or model (e.g. C/MPI, CUDA, Fortran, Java…) • Each typically running on a single GPU/node/Cluster/Cloud/… • Automatically circumvent connectivity problems • Example: With SmartSockets: No SmartSockets:

  39. Ibis as ‘HPC Solution’ • Use Ibis as replacement for e.g. C++/MPI code • Benefits: • (better) portability • malleability (open world) • fault-tolerance • (run-time) task migration • Downside: • requires recoding • Comparable speedups:

  40. MMCA: Situation in 2004/2005 • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’ SSH Parallel Horus Client Parallel Horus Server Sockets + SSH Tunneling Parallel Horus Client C++/MPI

  41. Phase 1: Ibis as ‘Master Key’ (2006) JavaGAT + IbisDeploy Parallel Horus Client Parallel Horus Server Sockets + SSH Tunneling Parallel Horus Client C++/MPI • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’

  42. Phase 2: Ibis as ‘Glue’ (2006/2007) JavaGAT + IbisDeploy Parallel Horus Client Parallel Horus Server IPL + SmartSockets Parallel Horus Client C++/MPI • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’

  43. Phase 3: Ibis as ‘HPC Solution’ (2008) JavaGAT + IbisDeploy Parallel Jorus Client Parallel Jorus Server IPL + SmartSockets Parallel Jorus Client Ibis/Java • Code pre-installed at each cluster site • Instable / faulty communication • Connectivity problems • Execution on each cluster ‘by hand’

  44. ‘Master Key’ + ‘Glue’ + ‘HPC’ • Step-wise conversion to 100% Ibis / Java • Phase 1: JavaGAT as ‘Master Key’ • Phase 2: IPL + SmartSockets as ‘Glue’ • Phase 3: Ibis as ‘HPC Solution’ • After each phase a fully functional, working solution was available! • Eventual result: • ‘wall-socket computing from a memory stick’ • Remember: the ‘Promise of the Grid’? • Awards at AAAI 2007 and CCGrid 2008

  45. 100% Ibis Implementation (2008++) Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner) Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)

  46. Some current work

  47. Green Clouds • NWO Smart Energy Systems project with Univ. of Amsterdam (Cees de Laat) & SARA • How to map high-performance applications onto hybrid distributed computing system, taking both performance & energy consumption into account • System-level approach to reduce HPC energy consumption

  48. DAS-4: infrastructure for Green IT UvA/MultimediaN (16/36) Dual quad-core Xeon E5620 Various accelerators (GPUs, multicores, ….) Scientific Linux Built by ClusterVision VU (74) SURFnet6 ASTRON (23) 10 Gb/s lambdas TU Delft (32) Leiden (16)

  49. Main ideas • Adapt resources to application needs dynamically, accounting for computational & energy efficiency • Using Ibis malleability support • Exploit hardware diversity • Graphics Processing Units (GPUs) have much higher FLOPS/Watt for many applications • Use optical and photonic networks • Build a knowledge base & semantic infrastructure description

More Related