Distributed Computing Environments Team Marian Bubak bubak@agh.edu.pl Department of Computer Science and Cyfronet AGH University of Science and Technology Krakow, Poland dice.cyfronet.pl
DICE Team
• Investigation of methods for building complex scientific collaborative applications
• Elaboration of environments and tools for e-Science
• Integration of large-scale distributed computing infrastructures
• Knowledge-based approach to services, components, and their semantic composition
AGH University of Science and Technology (1919): 16 faculties, 36000 students, 4000 employees, http://www.agh.edu.pl/en
Academic Computer Centre CYFRONET AGH (1973): 120 employees, http://www.cyfronet.pl/en/
Faculty of Computer Science, Electronics and Telecommunication (2012): 2000 students, 200 employees, http://www.iet.agh.edu.pl/
Other 15 faculties
Department of Computer Science AGH (1980): 800 students, 70 employees, http://www.ki.agh.edu.pl/uk/index.htm
Distributed Computing Environments (DICE) Team: http://dice.cyfronet.pl
Current research objectives
• Investigating applicability of the cloud computing model for complex scientific applications
• Optimization of resource allocation for applications on clouds
• Resource management for services on heterogeneous resources
• Urgent computing scenarios on distributed infrastructures
• Billing and accounting models
• Procedural and technical aspects of ensuring efficient yet secure data storage, transfer and processing
• Methods for component dependency management, composition and deployment
• Information representation model for a cloud federating platform, its components and operating procedures
Topics for collaboration
• Supporting system-level (e)Science
  • tools for effective scientific research and collaboration
  • advanced scientific analyses using HPC/HTC resources
• Cloud security
  • security of data transfer
  • reliable storage and removal of the data
• Cross-cloud service deployment based on container model
• Optimization of service deployment on clouds
  • Constraint satisfaction and optimization of multiple criteria (cost, performance)
  • Static deployment planning and dynamic auto-scaling
• Billing and accounting model
  • Adapted for the federated cloud infrastructure
  • Handle multiple billing models
Spatial and temporal dynamics in grids
• Grids increase research capabilities for science
• Large-scale federation of computing and storage resources
  • 300 sites, 60 countries, 200 Virtual Organizations
  • 10^5 CPUs, 20 PB data storage, 10^5 jobs daily
• However, operational and runtime dynamics have a negative impact on reliability and efficiency
  • asynchronous and frequent failures and hardware/software upgrades
  • long and unpredictable job waiting times
[Figure labels: ~95%, 1 job; 100 jobs, <10%; 3 hours, seconds]
J.T. Moscicki: Understanding and mastering dynamics in Computing Grids, UvA PhD thesis, promoter: M. Bubak, co-promoter: P. Sloot; 12.04.2011
User-level overlay with late binding scheduling
• Improved job execution characteristics
• HTC-HPC interoperability
• Heuristic resource selection
• Application-aware task scheduling
[Figure labels: 1.5 hours, 40 hours; captions: "Completion time with late binding", "Completion time with early binding"]
J.T. Moscicki, M. Lamanna, M. Bubak, P.M.A. Sloot: Processing moldable tasks on the Grid: late job binding with lightweight user-level overlay, FGCS 27(6), pp. 725-736, 2011
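The difference between early and late binding can be illustrated with a toy model. This is only a sketch under stated assumptions, not the overlay described in the cited paper: all tasks take the same time, each worker becomes available after its own queue waiting time, and the function names and numbers are hypothetical.

```python
import heapq


def early_binding_makespan(num_tasks, task_time, worker_start_times):
    """Tasks are split across workers before anyone knows when each worker starts."""
    n = len(worker_start_times)
    makespan = 0.0
    for i, start in enumerate(worker_start_times):
        # Round-robin split: worker i is assigned every n-th task up front.
        assigned = len(range(i, num_tasks, n))
        if assigned:
            makespan = max(makespan, start + assigned * task_time)
    return makespan


def late_binding_makespan(num_tasks, task_time, worker_start_times):
    """Workers pull the next task from a shared queue once they are actually running."""
    # Heap of (time at which the worker is next free, worker index).
    free_at = [(start, i) for i, start in enumerate(worker_start_times)]
    heapq.heapify(free_at)
    makespan = 0.0
    for _ in range(num_tasks):
        t, i = heapq.heappop(free_at)  # earliest available worker takes the task
        finish = t + task_time
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, i))
    return makespan


# Two workers start immediately, one waits 10 time units in a batch queue.
starts = [0, 0, 10]
early = early_binding_makespan(12, 1, starts)
late = late_binding_makespan(12, 1, starts)
```

In this toy setup the slow-starting worker holds a third of the tasks hostage under early binding, while under late binding the two running workers simply absorb the whole bag.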
Cloud performance evaluation
• Performance of VM deployment times
• Virtualization overhead
• Evaluation of open source cloud stacks (Eucalyptus, OpenNebula, OpenStack)
• Survey of European public cloud providers
• Performance evaluation of top cloud providers (EC2, RackSpace, SoftLayer)
• A grant from Amazon has been obtained
M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski and S. Varma: Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013 - 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013
Cost optimization of applications on clouds
• Infrastructure model
  • Multiple compute and storage clouds
  • Heterogeneous instance types
• Application model
  • Bag of tasks
  • Layered workflows
• Modeling with AMPL (A Modeling Language for Mathematical Programming)
• Cost optimization under deadline constraints
• Mixed integer programming
• Bonmin, Cplex solvers
M. Malawski, K. Figiela, J. Nabrzyski: Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, Volume 29, Issue 7, September 2013, Pages 1786-1794, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2013.01.004
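The shape of the optimization problem (integer instance counts, a cost objective, a deadline constraint) can be shown on a tiny bag-of-tasks example. This is a minimal sketch, not the AMPL model from the cited paper: it enumerates all allocations by brute force instead of calling a MIP solver, assumes every instance runs for the whole deadline window, and uses hypothetical instance names, prices and throughputs.

```python
from itertools import product


def cheapest_allocation(num_tasks, deadline_hours, instance_types, max_per_type=5):
    """Return (cost, allocation) minimizing cost while finishing before the deadline.

    instance_types maps a name to {"price_per_hour": ..., "tasks_per_hour": ...}.
    Returns None when no allocation within max_per_type meets the deadline.
    """
    names = list(instance_types)
    best = None
    # Enumerate every integer vector of instance counts (the decision variables).
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        rate = sum(c * instance_types[n]["tasks_per_hour"] for c, n in zip(counts, names))
        # Deadline constraint: total throughput over the window must cover all tasks.
        if rate * deadline_hours < num_tasks:
            continue
        hourly = sum(c * instance_types[n]["price_per_hour"] for c, n in zip(counts, names))
        cost = hourly * deadline_hours
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best


# Hypothetical instance types, for illustration only.
TYPES = {
    "small": {"price_per_hour": 1, "tasks_per_hour": 2},
    "large": {"price_per_hour": 3, "tasks_per_hour": 8},
}
result = cheapest_allocation(num_tasks=40, deadline_hours=2, instance_types=TYPES)
```

Brute force is only viable for a handful of instance types; the slide's approach of stating the model in AMPL and handing it to a solver such as Bonmin or Cplex is what scales.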
Resource allocation management
The Atmosphere Cloud Platform is a one-stop management service for hybrid cloud resources, ensuring optimal deployment of application services on the underlying hardware.
Customized applications may interface with Atmosphere directly via its RESTful API, called the Cloud Facade.
[Architecture diagram labels: Admin; Developer; Scientist; External application; VPH-Share Master Int.; VPH-Share Core Services Host; Atmosphere Management Service (AMS); Cloud Facade (secure RESTful API); Cloud Facade client; Cloud Manager; Atmosphere Internal Registry (AIR); Cloud stack plugins (Fog); Development Mode; Generic Invoker; Workflow management; OpenStack/Nova Computational Cloud Site (Head Node, Worker Nodes); Image store (Glance); Amazon EC2; Other CS]
P. Nowakowski, T. Bartynski, T. Gubala, D. Harezlak, M. Kasztelnik, M. Malawski, J. Meizner, M. Bubak: Cloud Platform for Medical Applications, eScience 2012 (2012)
Data reliability and integrity
DRI is a tool which can keep track of binary data stored in a cloud infrastructure, monitor data availability and facilitate optimal deployment of application services in a hybrid cloud (bringing computations to data or the other way around).
DRI Service: a standalone application service, capable of autonomous operation. It periodically verifies access to any datasets submitted for validation and is capable of issuing alerts to dataset owners and system administrators in case of irregularities.
[Architecture diagram labels: LOBCDER; DRI Service; Metadata extensions for DRI; Validation policy; Register files; Get metadata; Migrate LOBs; Get usage stats (etc.); Configurable validation runtime (registry-driven); Runtime layer; Extensible resource client layer; End-user features (browsing, querying, direct access to data, checksumming); Binary data registry; Store and marshal data; VPH Master Int.; OpenStack Swift; Cumulus; Amazon S3; Data management portlet (with DRI management extensions); Distributed Cloud storage]
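The periodic validation step (check that each registered dataset is still reachable and unchanged) can be sketched as follows. This is an illustration only, not the DRI implementation: the registry and the storage backend are plain in-memory dictionaries, SHA-256 is an assumed checksum choice, and all function names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest of a binary object, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def register(registry: dict, name: str, data: bytes) -> None:
    """Record the expected checksum of a dataset submitted for validation."""
    registry[name] = checksum(data)


def validate(registry: dict, storage: dict) -> list:
    """Return (dataset name, problem) pairs for every irregularity found."""
    alerts = []
    for name, expected in sorted(registry.items()):
        data = storage.get(name)  # stands in for a read from cloud storage
        if data is None:
            alerts.append((name, "unavailable"))
        elif checksum(data) != expected:
            alerts.append((name, "checksum mismatch"))
    return alerts


# Hypothetical datasets held in an in-memory "cloud store".
storage = {"scan-001": b"binary payload A", "scan-002": b"binary payload B"}
registry = {}
for dataset_name, payload in storage.items():
    register(registry, dataset_name, payload)

clean_run = validate(registry, storage)

storage["scan-001"] = b"silently corrupted"  # simulate bit rot
del storage["scan-002"]                      # simulate a lost object
alerts = validate(registry, storage)
```

A real service would run `validate` on a schedule and route each alert to the dataset owner or an administrator, as the slide describes.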
Data security in clouds
• Data should be securely stored and reliably deleted when no longer needed
• Clouds are not secure enough, and data optimisations prevent ensuring that data were deleted
• A solution:
  • end-to-end encryption (decryption key stays in protected/private zone)
  • data dispersal (portions of data are dispersed between nodes so that it is non-trivial/impossible to recover the whole message)
• To ensure security of data in transit
  • Modern applications use secure transport protocols (e.g. TLS)
  • For legacy unencrypted protocols, if absolutely needed, or as an additional security measure:
    • Site-to-Site VPN, e.g. between cloud sites is outside of the instance, might use
    • Remote access – for individual users accessing e.g. from their laptops
Jan Meizner, Marian Bubak, Maciej Malawski, and Piotr Nowakowski: Secure storage and processing of confidential data on public clouds. In: Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM) 2013
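The data dispersal idea can be illustrated with XOR secret splitting, one simple scheme in which every share is required to recover the message and any smaller subset looks like random bytes. This is a sketch of that generic technique, not necessarily the scheme used in the cited paper; the function names and the sample message are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(message: bytes, num_shares: int) -> list:
    """Split message into num_shares shares, one per storage node."""
    if num_shares < 2:
        raise ValueError("need at least two shares")
    # All but the last share are uniformly random pads.
    shares = [secrets.token_bytes(len(message)) for _ in range(num_shares - 1)]
    # The last share is the message XORed with every pad.
    shares.append(reduce(xor_bytes, shares, message))
    return shares


def recover(shares: list) -> bytes:
    """XOR all shares together; the pads cancel out and the message remains."""
    return reduce(xor_bytes, shares)


secret = b"confidential record"
shares = disperse(secret, num_shares=3)
restored = recover(shares)
```

The scheme tolerates no node loss: if one share disappears the message is gone, so a production design would need a threshold scheme or replication of shares.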
Semantic workflow composition
• GWorkflowDL language (with A. Hoheisel)
• Dynamic, ad-hoc refinement of workflows based on semantic description in ontologies
• Novelty
  • Abstract, functional blocks translated automatically into computation unit candidates (services)
  • Expansion of a single block into a subworkflow with proper concurrency and parallelism constructs (based on Petri Nets)
  • Runtime refinement: unknown or failed branches are re-constructed with different computation unit candidates
T. Gubala, D. Harezlak, M. Bubak, M. Malawski: Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism. In: "The 2nd IEEE International Conference on e-Science and Grid Computing", IEEE Computer Society Press, http://doi.ieeecomputersociety.org/10.1109/E-SCIENCE.2006.127, 2006
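How a Petri net expresses the concurrency of an expanded block can be shown with a minimal token-firing sketch. It is a generic place/transition net, not GWorkflowDL: the net below (one block expanded into two parallel branches with a split and a join) and all place and transition names are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _outputs = transition
    return all(marking.get(place, 0) >= count for place, count in inputs.items())


def fire(marking, transition):
    """Consume tokens from input places, produce tokens in output places."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place, count in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - count
    for place, count in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + count
    return new_marking


# transition name -> (input places, output places)
NET = {
    "split": ({"start": 1}, {"a_ready": 1, "b_ready": 1}),
    "run_a": ({"a_ready": 1}, {"a_done": 1}),
    "run_b": ({"b_ready": 1}, {"b_done": 1}),
    "join": ({"a_done": 1, "b_done": 1}, {"end": 1}),
}


def run(net, marking, max_steps=100):
    """Fire the first enabled transition repeatedly until none is enabled."""
    order = []
    for _ in range(max_steps):
        ready = [name for name, t in net.items() if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, net[ready[0]])
        order.append(ready[0])
    return marking, order


final_marking, firing_order = run(NET, {"start": 1})
```

After `split`, both `run_a` and `run_b` are enabled at once, which is the net's way of saying the two branches may execute in parallel; `join` cannot fire until both have produced their tokens.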
Cooperative virtual laboratory for e-Science
• Design of a laboratory for virologists, epidemiologists and clinicians investigating the HIV virus and the possibilities of treating HIV-positive patients
• Based on the notion of in-silico experiments built and refined by cooperating teams of programmers, scientists and clinicians
• Novelty
  • Employed the full concept-prototype-refinement-production cycle for virology tools
  • A set of dedicated yet interoperable tools binds together programmers and scientists for a single task
  • Support for system-level science with the concept of result reuse between different experiments
T. Gubala, M. Bubak, P.M.A. Sloot: Semantic Integration of Collaborative Research Environments, chapter XXVI in "Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare", Information Science Reference IGI Global 2009, ISBN: 978-1-60566-374-6, pages 514-530
Semantic integration for science domains
• Concept of describing scientific domains for in-silico experimentation and collaboration within laboratories
• Based on separation of the domain model, containing concepts of the subject of experimentation, from the integration model, regarding the method of (virtual) experimentation (tools, processes, computations)
• Facets defined in the integration model are automatically mixed into concepts from the domain model: any piece of data may show any desired behavior
• Proposed, designed and deployed the method for 3 domains of science:
  • Computational chemistry inside InSilicoLab chemistry portal
  • Sensor processing for early warning and crisis simulation in UrbanFlood EWS
  • Processing of results of massive bioinformatic computations for protein folding method comparison
  • Composition and execution of multiscale simulations
  • Setup and management of VPH applications
T. Gubala, K. Prymula, P. Nowakowski, M. Bubak: Semantic Integration for Model-based Life Science Applications. In: SIMULTECH 2013 Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications, Reykjavik, Iceland 29 - 31 July, 2013, pp. 74-81
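The idea of mixing integration-model facets into domain-model concepts has a loose analogue in Python mixins. The sketch below is only that analogy, assuming facets can be modeled as small behavior classes; it is not the semantic framework from the cited paper, and `Molecule`, `Exportable`, `Taggable` and `with_facets` are hypothetical names.

```python
class Molecule:
    """Domain model concept: knows nothing about tools or processing."""

    def __init__(self, formula):
        self.formula = formula


class Exportable:
    """Integration model facet: adds export behavior to any concept."""

    def to_record(self):
        return dict(vars(self))


class Taggable:
    """Integration model facet: adds free-form tagging to any concept."""

    def add_tag(self, tag):
        self.__dict__.setdefault("tags", []).append(tag)


def with_facets(concept_cls, *facets):
    """Build a new class that combines a domain concept with the given facets."""
    name = concept_cls.__name__ + "With" + "".join(f.__name__ for f in facets)
    return type(name, (*facets, concept_cls), {})


# The domain class stays untouched; behavior is attached from the outside.
FacetedMolecule = with_facets(Molecule, Exportable, Taggable)
water = FacetedMolecule("H2O")
water.add_tag("solvent")
record = water.to_record()
```

Because the facets never reference `Molecule`, the same `with_facets` call would work for any other domain concept, which mirrors the slide's claim that any piece of data may show any desired behavior.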
GridSpace - platform for e-Science applications
• Experiment: an e-science application composed of code fragments (snippets), expressed in either general-purpose scripting programming languages, domain-specific languages or purpose-specific notations. Each snippet is evaluated by a corresponding interpreter.
• GridSpace2 Experiment Workbench: a web application - an entry point to GridSpace2. It facilitates exploratory development, execution and management of e-science experiments.
• Embedded Experiment: a published experiment embedded in a web site.
• GridSpace2 Core: a Java library providing an API for development, storage, management and execution of experiments. Records all available interpreters and their installations on the underlying computational resources.
• Computational Resources: servers, clusters, grids, clouds and e-infrastructures where the experiments are computed.
E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: Exploratory Programming in the Virtual Laboratory. In: Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628, October 2010, the best paper award.
Collage - executable e-Science publications
Goal: Extending the traditional scientific publishing model with computational access and interactivity mechanisms; enabling readers (including reviewers) to replicate and verify experimentation results and browse large-scale result spaces.
Challenges:
• Scientific: A common description schema for primary data (experimental data, algorithms, software, workflows, scripts) as part of publications; deployment mechanisms for on-demand reenactment of experiments in e-Science.
• Technological: An integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources.
• Organizational: Provisioning of executable paper services to a large community of users representing various branches of computational science; fostering further uptake through involvement of major players in the field of scientific publishing.
P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Proceedings of the International Conference on Computational Science, ICCS 2011 (2011), Winner of the Elsevier/ICCS Executable Paper Grand Challenge
E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service. In: Procedia Computer Science, vol. 18, 2013
GridSpace2 / Collage - Executable e-Science Publications
[Timeline labels: Jun 2011, Dec 2011, Jun 2012]
• Goal: Extend the traditional way of authoring and publishing scientific methods with computational access and interactivity mechanisms, thus bringing reproducibility to scientific computational workflows and publications
• Scientific challenge: Conceive a model and methodology to embrace reproducibility in scientific workflows and publications
• Technological challenge: support these by modern Internet technologies and available computing infrastructures
• Solution proposed:
  • GridSpace2 – web-oriented distributed computing platform
  • Collage – authoring environment for executable publications
GridSpace2 / Collage - Executable e-Science Publications
• Results:
  • GridSpace2/Collage won the Executable Paper Grand Challenge in 2011
  • Collage was integrated with the Elsevier ScienceDirect portal so papers can be linked and presented with corresponding computational experiments
  • A Special Issue of the Computers & Graphics journal featuring Collage-based executable papers was released in May 2013
  • GridSpace2/Collage has been applied to multiple computational workflows in the scope of the PL-Grid, PL-Grid Plus and Mapper projects
E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service. In: Procedia Computer Science, vol. 18, 2013
E. Ciepiela, P. Nowakowski, J. Kocot, D. Harężlak, T. Gubała, J. Meizner, M. Kasztelnik, T. Bartyński, M. Malawski, M. Bubak: Managing entire lifecycles of e-science applications in the GridSpace2 virtual laboratory–from motivation through idea to operable web-accessible environment built on top of PL-grid e-infrastructure. In: Building a National Distributed e-Infrastructure–PL-Grid, 2012
P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Procedia Computer Science, vol. 4, 2011
Common Information Space (CIS)
• Facilitates creation, deployment and robust operation of Early Warning Systems in a virtualized cloud environment
• Early Warning System (EWS): any system working according to four steps: monitoring, analysis, judgment, action (e.g. environmental monitoring)
• Common Information Space
  • connects distributed components into an EWS and deploys it on the cloud
  • optimizes resource usage taking into account the EWS importance level
  • provides EWS and self-monitoring
  • is equipped with auto-healing
B. Balis, M. Kasztelnik, M. Bubak, T. Bartynski, T. Gubala, P. Nowakowski, J. Broekhuijsen: The UrbanFlood Common Information Space for Early Warning Systems. In: Elsevier Procedia Computer Science, vol. 4, pp. 96-105, ICCS 2011.
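The four EWS steps (monitoring, analysis, judgment, action) can be pictured as a small processing pipeline. This is a toy sketch, not the UrbanFlood implementation: the moving-average analysis, the thresholds, the status names and the actions are all assumptions made for illustration.

```python
from statistics import mean


def monitor(sensor_feed):
    """Monitoring: collect the raw readings (here, just materialize the feed)."""
    return list(sensor_feed)


def analyze(readings, window=3):
    """Analysis: smooth the signal with a mean over the most recent readings.

    Requires at least one reading.
    """
    return mean(readings[-window:])


def judge(level, warn_at, alarm_at):
    """Judgment: map the analyzed level onto a status."""
    if level >= alarm_at:
        return "alarm"
    if level >= warn_at:
        return "warning"
    return "normal"


def act(status):
    """Action: choose a response for the status (hypothetical responses)."""
    responses = {
        "normal": "no action",
        "warning": "notify operators",
        "alarm": "raise public alert",
    }
    return responses[status]


def ews_cycle(sensor_feed, warn_at=2.5, alarm_at=3.5):
    """Run one monitoring-analysis-judgment-action cycle."""
    readings = monitor(sensor_feed)
    status = judge(analyze(readings), warn_at, alarm_at)
    return status, act(status)


# Hypothetical water-level readings in metres.
outcome = ews_cycle([1.0, 1.2, 2.0, 3.0, 4.0])
```

Keeping the four steps as separate functions reflects the slide's point that an EWS is assembled from distributed components, each of which could be deployed and monitored on its own.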
HyperFlow: model & execution engine
Simple yet expressive model for complex scientific apps
App = set of processes performing well-defined functions and exchanging signals
• Supports a rich set of workflow patterns
• Suitable for various application classes
• Abstracts from other distributed app aspects (service model, data exchange model, communication protocols, etc.)
HyperFlow model JSON serialization (annotations on the right are not part of the JSON):
  {
    "name": "...",          name of the app
    "processes": [...],     processes of the app
    "functions": [...],     functions used by processes
    "signals": [...],       exchanged signals info
    "ins": [...],           inputs of the app
    "outs": [...]           outputs of the app
  }
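The top-level layout of the serialization can be checked mechanically. The sketch below uses only the six keys named on the slide; the inner structure of each list is not given there, so the lists are left empty, and the app name and the helper `missing_keys` are hypothetical rather than part of HyperFlow.

```python
import json

# Top-level keys taken from the slide's JSON skeleton.
REQUIRED_KEYS = ("name", "processes", "functions", "signals", "ins", "outs")


def missing_keys(workflow: dict) -> list:
    """Return the top-level keys from the slide that the document lacks."""
    return [key for key in REQUIRED_KEYS if key not in workflow]


# An empty application skeleton; element formats are deliberately left out.
skeleton = {
    "name": "example_app",
    "processes": [],
    "functions": [],
    "signals": [],
    "ins": [],
    "outs": [],
}

serialized = json.dumps(skeleton, indent=2)  # text form, ready to save to a file
round_trip = json.loads(serialized)
problems = missing_keys(round_trip)
```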
Platform for distributed applications
• HyperFlow model & engine for distributed apps
• App optimization & scheduling
• Autoscaling and dynamic app reconfiguration
• Multi-cloud resource provisioning
Collaborative metadata management
Objectives
• Provide means for ad-hoc metadata model creation and deployment of corresponding storage facilities
• Create a research space for metadata model exchange and discovery with associated data repositories with access restrictions in place
• Support different types of storage sites and data transfer protocols
• Support the exploratory paradigm by making the models evolve together with data
Architecture
• Web Interface is used by users to create, extend and discover metadata models
• Model repositories are deployed in the PaaS Cloud layer for scalable and reliable access from computing nodes through REST interfaces
• Data items from Storage Sites are linked from the model repositories
Multiscale programming and execution tools
• A method and an environment for composing multiscale applications from single-scale models
• Validation of the method against real applications structured using tools
• Extension of application composition techniques to multiscale simulations
• Support for multisite execution of multiscale simulations
• Proof-of-concept transformation of high-level formal descriptions into actual execution using e-infrastructures
Tools:
• MAPPER Memory (MaMe): a semantics-aware persistence store to record metadata about models and scales
• Multiscale Application Designer (MAD): visual composition tool transforming high-level description into executable experiment
• GridSpace Experiment Workbench (GridSpace): execution and result management of experiments
K. Rycerz, E. Ciepiela, G. Dyk, D. Groen, T. Gubala, D. Harezlak, M. Pawlik, J. Suter, S. Zasada, P. Coveney, M. Bubak: Support for Multiscale Simulations with Molecular Dynamics, Procedia Computer Science, Volume 18, 2013, pp. 1116-1125, ISSN 1877-0509
K. Rycerz, M. Bubak, E. Ciepiela, D. Harezlak, T. Gubala, J. Meizner, M. Pawlik, B. Wilk: Composing, Execution and Sharing of Multiscale Applications, submitted to Future Generation Computer Systems, after 1st review (2013)
K. Rycerz, M. Bubak, E. Ciepiela, M. Pawlik, O. Hoenen, D. Harezlak, B. Wilk, T. Gubala, J. Meizner, and D. Coster: Enabling Multiscale Fusion Simulations on Distributed Computing Resources, submitted to PLGrid PLUS book 2014
Building scientific software based on Feature Model
Research on Feature Modeling:
• modelling eScience applications family component hierarchy
• modelling requirements
• methods of mapping Feature Models to Software Product Line architectures
Research on adapting Software Product Line principles in scientific software projects:
• automatic composition of distributed eScience applications based on Feature Model configuration
• architectural design of Software Product Line engine framework
B. Wilk, M. Bubak, M. Kasztelnik: Software for eScience: from feature modeling to automatic setup of environments, Advances in Software Development, Scientific Papers of the Polish Information Processing Society Scientific Council, 2013, pp. 83-96
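A feature model and the check that a chosen configuration respects it can be sketched in a few lines. This is a minimal illustration covering only parent/child and mandatory/optional relations, not the framework from the cited paper; the feature names and the helper `validate_configuration` are hypothetical.

```python
# Hypothetical feature model of an eScience application family:
# feature name -> parent feature and whether it is mandatory under that parent.
FEATURE_MODEL = {
    "app": {"parent": None, "mandatory": True},
    "storage": {"parent": "app", "mandatory": True},
    "visualization": {"parent": "app", "mandatory": False},
    "cloud_backend": {"parent": "storage", "mandatory": False},
}


def validate_configuration(model, selected):
    """Return a list of rule violations for a set of selected features."""
    errors = []
    for feature in sorted(selected):
        if feature not in model:
            errors.append(f"unknown feature: {feature}")
            continue
        parent = model[feature]["parent"]
        # A child feature may only be selected together with its parent.
        if parent is not None and parent not in selected:
            errors.append(f"feature {feature} requires parent {parent}")
    for feature, spec in model.items():
        parent_active = spec["parent"] is None or spec["parent"] in selected
        # A mandatory feature must be present whenever its parent is.
        if spec["mandatory"] and parent_active and feature not in selected:
            errors.append(f"missing mandatory feature: {feature}")
    return errors


valid = validate_configuration(FEATURE_MODEL, {"app", "storage", "cloud_backend"})
invalid = validate_configuration(FEATURE_MODEL, {"app", "cloud_backend"})
```

A configuration that passes this check is the kind of input from which an automatic composition step could then select and wire up components.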
Topics for collaboration
• Optimization of service deployment on clouds
  • Constraint satisfaction and optimization of multiple criteria (cost, performance)
  • Static deployment planning and dynamic auto-scaling
• Billing and accounting model
  • Adapted for the federated cloud infrastructure
  • Handle multiple billing models
• Supporting system-level (e)Science
  • tools for effective scientific research and collaboration
  • advanced scientific analyses using HPC/HTC resources
• Cloud security
  • security of data transfer
  • reliable storage and removal of the data
• Cross-cloud service deployment based on container model
dice.cyfronet.pl