
eScience on Distributed Infrastructure in Poland

This presentation discusses the eScience initiatives and distributed infrastructure in Poland, focusing on the PL-Grid Consortium, training and support for users, platforms and tools, and international cooperation.


Presentation Transcript


  1. eScience on Distributed Infrastructure in Poland Marian Bubak AGH University of Science and Technology ACC Cyfronet Krakow, Poland dice.cyfronet.pl PLAN-E, the Platform of National eScience/Data Research Centers in Europe, 29-30 September 2014, Amsterdam

  2. Outline • ACC Cyfronet AGH • PL-Grid Consortium and Programme • Focus on users: training and support • Platforms and tools: towards the PL-Grid ecosystem • International cooperation, conferences • Summary

  3. Credits • ACC Cyfronet AGH: Michał Turała, Krzysztof Zieliński, Karol Krawentek, Agnieszka Szymańska, Maciej Twardy, Angelika Zaleska-Walterbach, Andrzej Oziębło, Zofia Mosurska, Marcin Radecki, Renata Słota, Tomasz Gubała, Darin Nikolow, Aleksandra Pałuk, Patryk Lasoń, Marek Magryś, Łukasz Flis • ICM: Marek Niezgódka, Piotr Bała, Maciej Filocha • PCSS: Maciej Stroiński, Norbert Meyer, Krzysztof Kurowski, Bartek Palak, Tomasz Piontek, Dawid Szejnfeld, Paweł Wolniewicz • WCSS: Paweł Tykierko, Paweł Dziekoński, Bartłomiej Balcerek • TASK: Rafał Tylman, Mścislaw Nakonieczny, Jarosław Rybicki … and many other domain experts…

  4. ACC Cyfronet AGH • Participation in and coordination of national and international scientific projects • Computational power, storage and libraries for scientific research • Coordinator of PL-Grid Infrastructure development • 40 years of expertise • High Performance Computing Centre of Competence • High Performance Networking: main node of the Cracow MAN, South Poland main node of the PIONIER network, access to the GEANT network

  5. Motivation and background • World progress in Big Science: theory, experiment, simulation • Experiments in silico: advanced, distributed computing; big international collaborations; e-Science and e-Infrastructure interaction • Numerically intensive computing and data intensive computing • Needs: increase of resources, support for doing science • Computational Science problems to be addressed: algorithms, environments and deployment; 4th paradigm, Big Data, Data Farming

  6. PL-Grid Consortium • Consortium creation – January 2007 • a response to requirements from Polish scientists • due to ongoing Grid activities in Europe (EGEE, EGI_DS) • Aim: significant extension of the amount of computing resources provided to the scientific community (start of the PL-Grid Programme) • Development based on: • projects funded by the European Regional Development Fund as part of the Innovative Economy Programme • close international collaboration (EGI, …) • previous projects (FP5, FP6, FP7, EDA…) • National network infrastructure available: PIONIER national project • Computing resources: Top500 list • Polish scientific communities: ~75% of highly rated Polish publications come from 5 communities • PL-Grid Consortium members: 5 Polish High Performance Computing centres, representing these communities, coordinated by ACC Cyfronet AGH

  7. PL-Grid and PLGrid Plus in short • PL-Grid project (2009–2012) • Budget: total 21 M€, from the EU 17 M€ • Outcome: common base infrastructure • National Grid Infrastructure (NGI_PL) • Resources: 230 Tflops, 3.6 PB • PLGrid Plus project (2011–2014) • Budget: total ca. 18 M€, from the EU ca. 15 M€ • Expected outcome: • focus on users • specific computing environments • QoS by SLM • Extension of resources and services by 500 Tflops and 4.4 PB • Keeping diversity for users • Clusters (thin and thick nodes, GPU) • SMP, vSMP, Clouds

  8. Polish Infrastructure for Supporting Computational Science in the European Research Space – PL-Grid • PL-Grid project • Budget: total 21 M€, from the EC 17 M€ • Duration: 1.1.2009 – 31.3.2012 • Managed by the PL-Grid Consortium, made up of 5 Polish supercomputing and networking centres • Project coordinator: Academic Computer Centre Cyfronet AGH, Krakow, Poland • Project web site: projekt.plgrid.pl • PL-Grid aimed at significantly extending the amount of computing resources provided to the Polish scientific community (by approximately 215 TFlops of computing power and 2500 TB of storage capacity) and at constructing a Grid system that would facilitate effective and innovative use of the available resources. • Main project objectives: • common (compatible) base infrastructure • capacity to construct specialized, domain Grid systems for specific applications • efficient use of available financial resources • focus on HPC and scalability of computing for domain-specific Grids

  9. PL-Grid project – results • First working NGI in Europe within the framework of EGI.eu (since March 31, 2010) • Number of users (March 2012): 900+ • Number of jobs per month: 750,000 – 1,500,000 • Resources available: • computing power: ca. 230 TFlops • storage: ca. 3600 TBytes • High level of availability and reliability of the resources • Facilitating effective use of these resources by providing: • innovative grid services and end-user tools such as Efficient Resource Allocation, the Experimental Workbench and Grid middleware • scientific software packages • user support: helpdesk system, broad training offer • Various well-executed dissemination activities, carried out at national and international levels, which contributed to increasing awareness and knowledge of the Project and of grid technology in Poland • Publication of the book presenting the scientific and technical achievements of the Polish NGI by Springer, in March 2012: • „Building a National Distributed e-Infrastructure – PL-Grid” • Lecture Notes in Computer Science, Vol. 7136, subseries: Information Systems and Applications • Content: 26 articles describing the experience and the scientific results obtained by the PL-Grid project partners, as well as the outcome of research and development activities carried out within the Project.

  10. Domain-oriented services and resources of the Polish Infrastructure for Supporting Computational Science in the European Research Space – PLGrid Plus • PLGrid Plus project • Budget: total ca. 18 M€, including funding from the EC: ca. 15 M€ • Duration: 1.10.2011 – 31.12.2014 • Five PL-Grid Consortium Partners • Project Coordinator: ACC CYFRONET AGH • The main aim of the PLGrid Plus project is to increase the potential of Polish science by providing the necessary IT services for research teams in Poland, in line with European solutions. • Preparation of specific computing environments – so-called domain grids – i.e. solutions, services and extended infrastructure (including software), tailored to the needs of different groups of scientists. • Domain-specific solutions created for 13 groups of users, representing the strategic areas and important topics for Polish and international science.

  11. PLGrid Plus project – activities • Integration services • national and international levels • dedicated portals and environments • unification of distributed databases • virtual laboratories • remote visualization • service value = utility + warranty • SLA management • Computing-intensive solutions • specific computing environments • adoption of suitable algorithms and solutions • workflows • cloud computing • porting of scientific packages • Data-intensive computing • access to distributed scientific databases • homogeneous access to distributed data • data discovery, processing, visualization, validation… • 4th paradigm of scientific research • Instruments in the Grid • remote, transparent access to instruments • sensor networks • Organizational • organizational backbone • professional support for specific disciplines and topics

  12. PLGrid Plus project – results • New domain-specific services for 13 identified scientific domains • Extension of the resources available in the PL-Grid Infrastructure by ca. 500 TFlops of computing power and ca. 4.4 PBytes of storage capacity • Design and start-up of support for new domain grids • Deployment of a Quality of Service system for users by introducing SLA agreements • Deployment of new infrastructure services • Deployment of a Cloud infrastructure for users • Broad consultancy, training and dissemination offer • Publication of the book presenting the scientific and technical achievements of PLGrid Plus by Springer, in September 2014: • „eScience on Distributed Computing Infrastructure” • Lecture Notes in Computer Science, Vol. 8500, subseries: Information Systems and Applications • Content: 36 articles describing the experience and the scientific results obtained by the PLGrid Plus project partners, as well as the outcome of research and development activities carried out within the Project • A huge effort of 147 authors, 76 reviewers and the editors' team in Cyfronet

  13. New generation domain-specific services in the PL-Grid infrastructure for Polish Science – PLGrid NG project • Budget: total ca. 14 889 773,23 PLN, including funding from the EC: 12 651 715,38 PLN • Duration: 01.01.2014 – 31.10.2015 • Five PL-Grid Consortium Partners • Project Coordinator: ACC CYFRONET AGH • The aim of the PLGrid NG project is to provide a set of dedicated, domain-specific computing services for 14 new groups of researchers and to implement these services in the PL-Grid national computing infrastructure. • The 14 new domains: Biology, OpenOxides, Medicine, Complex Networks, UNRES, eBaltic-Grid, Personalized Medicine, Nuclear Power and CFD, Computational Chemistry, Mathematics, Geoinformatics, Metal Processing Technologies, Meteorology, Hydrology

  14. PLGrid NG project – activities • Tasks: • Additional groups of experts involved – 14 communities/scientific topics identified • Development and maintenance of the IT infrastructure, in line with best IT Service Management (ITSM) practices such as ITIL or ISO 20000 • Security of new applications, audits – in the development stage, before deployment and during exploitation • Optimization of resource usage – IT experts, Operation Center • Optimization of application porting • User support: first-line support, Helpdesk, domain experts, training • [Diagram: new advanced service platforms – applications on top of domain grids, the grid infrastructure (grid services), PL-Grid clusters, high performance computers and data repositories, all connected by the national computer network PIONIER]

  15. Competence Centre in the Field of Distributed Computing Grid Infrastructures – PLGrid Core project • Budget: total 104 949 901,16 PLN, including funding from the EC: 89 207 415,99 PLN • Duration: 01.01.2014 – 30.11.2015 • Project Coordinator: Academic Computer Centre CYFRONET AGH • The main objective of the project is to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data.

  16. PLGrid Core project – services • Basic infrastructure services • uniform access to distributed data • PaaS Cloud for scientists • application maintenance environment of the MapReduce type (see the sketch below) • End-user services • technologies and environments implementing the Open Science paradigm • computing environment for interactive processing of scientific data • platform for development and execution of large-scale applications organized in a workflow • automatic selection of scientific literature • environment supporting data farming mass computations
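The "MapReduce type" application maintenance environment is only named on the slide. Purely as a hypothetical illustration of the programming model it refers to (not of the actual PLGrid Core service), a word count expressed as map and reduce steps in plain Python could look like this:

```python
from collections import Counter
from functools import reduce

# Illustrative input: one text document per list element.
documents = [
    "distributed computing on the grid",
    "cloud computing for big data",
]

def map_phase(doc):
    """Map step: emit word counts for a single document."""
    return Counter(doc.split())

def reduce_phase(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right

# In a real MapReduce environment the map calls would run in parallel
# on the cluster; here they are simply applied one after another.
partial_counts = map(map_phase, documents)
total_counts = reduce(reduce_phase, partial_counts, Counter())
print(total_counts.most_common(3))
```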

  17. Focus on users • [Diagram linking: Real Users, Grants, Help Desk, QoS/SLM, Domain Experts, User-friendly Services, Computer Centres, Hardware/Software]

  18. User support • Interdisciplinary team of IT experts with extensive knowledge of: • different programming methods used in research: parallel, distributed and GPGPU programming • various scientific software • the specifics of work with HPC/Cloud systems • various aspects of work with large data sets • Support methods • PL-Grid Infrastructure user support systems (Helpdesk, User's Forum) • documentation services, PL-Grid User's Manual • face-to-face meetings and consultations at ACC Cyfronet AGH and at users' home institutions • International cooperation • cooperation with various institutions and initiatives dedicated to training scientists: Software Sustainability Institute (UK), Software Carpentry, Data Carpentry, Mozilla Science Lab, ELIXIR UK • Cyfronet is making every effort to become a Software Carpentry regional centre in Poland or Central Europe • Users of the Cyfronet computing resources are provided with support and professional help in solving any problems related to access to and effective use of these resources.

  19. Training • Training on basic and advanced services • traditional – at ACC Cyfronet AGH or at the interested users' home scientific institutions • remote – using a teleconference platform (Adobe Connect) and e-learning platforms (Blackboard Learn – currently; Moodle – planned) • Courses are prepared based on the experts' experience gained, among others, during previous projects • A survey assessing the training is carried out after each course

  20. PL-Grid Infrastructure users • [Chart: number of accounts over time – PL-Grid users, employees, all accounts]

  21. Grid users of global services

  22. PL-Grid users of domain-specific services

  23. GridSpace: a platform for e-Science applications • Experiment: an e-science application composed of code fragments (snippets), expressed in either general-purpose scripting programming languages, domain-specific languages or purpose-specific notations. Each snippet is evaluated by a corresponding interpreter. • GridSpace2 Experiment Workbench: a web application - an entry point to GridSpace2. It facilitates exploratory development, execution and management of e-science experiments. • Embedded Experiment: a published experiment embedded in a web site. • GridSpace2 Core: a Java library providing an API for development, storage, management and execution of experiments. Records all available interpreters and their installations on the underlying computational resources. • Computational Resources: servers, clusters, grids, clouds and e-infrastructures where the experiments are computed.
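GridSpace2 itself is accessed through the Experiment Workbench and a Java API; purely as an illustration of the snippet/interpreter model described above (and not of the GridSpace2 API), an experiment can be pictured as an ordered list of snippets, each dispatched to its declared interpreter:

```python
import subprocess

# Illustrative mapping from snippet type to an interpreter command.
INTERPRETERS = {
    "bash": ["bash", "-c"],
    "python": ["python3", "-c"],
}

# An "experiment": an ordered list of (interpreter, code snippet) pairs.
experiment = [
    ("bash", "echo preparing input data"),
    ("python", "print(sum(x * x for x in range(10)))"),
]

def run_experiment(snippets):
    """Evaluate each snippet with the interpreter registered for it."""
    for kind, code in snippets:
        result = subprocess.run(
            INTERPRETERS[kind] + [code],
            capture_output=True, text=True, check=True,
        )
        print(f"[{kind}] {result.stdout.strip()}")

run_experiment(experiment)
```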

  24. Collage: executable e-Science publications • Goal: extending the traditional scientific publishing model with computational access and interactivity mechanisms, enabling readers (including reviewers) to replicate and verify experimentation results and to browse large-scale result spaces. • Challenges: • Scientific: a common description schema for primary data (experimental data, algorithms, software, workflows, scripts) as part of publications; deployment mechanisms for on-demand re-enactment of experiments in e-Science. • Technological: an integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources. • Organizational: provisioning of executable paper services to a large community of users representing various branches of computational science; fostering further uptake through involvement of major players in the field of scientific publishing.

  25. DataNet: collaborative metadata management • Objectives • Provide means for ad-hoc metadata model creation and deployment of corresponding storage facilities • Create a research space for metadata model exchange and discovery, with associated data repositories and access restrictions in place • Support different types of storage sites and data transfer protocols • Support the exploratory paradigm by making the models evolve together with the data • Architecture • A web interface is used by users to create, extend and discover metadata models • Model repositories are deployed in the PaaS Cloud layer for scalable and reliable access from computing nodes through REST interfaces (see the sketch below) • Data items on storage sites are linked from the model repositories
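The slide states only that model repositories are reachable from computing nodes through REST interfaces. Assuming such an interface, a compute job might register a metadata record roughly as below; the endpoint URL, entity name and fields are hypothetical and not part of the actual DataNet API:

```python
import requests

# Hypothetical DataNet-style model repository endpoint.
REPOSITORY_URL = "https://datanet.example.org/models/simulation_run/entities"

record = {
    "experiment_id": "exp-042",
    "input_file": "https://storage.example.org/inputs/run42.dat",
    "temperature_K": 300.0,
    "status": "finished",
}

# POST a metadata record describing a data item kept on a storage site.
response = requests.post(REPOSITORY_URL, json=record, timeout=30)
response.raise_for_status()
print("stored record id:", response.json().get("id"))
```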

  26. Cloud Platform: resource allocation management • The Atmosphere Cloud Platform is a one-stop management service for hybrid cloud resources, ensuring optimal deployment of application services on the underlying hardware. • [Architecture diagram showing: Admin, Developer, Scientist; external application; VPH-Share Master Int.; VPH-Share Core Services Host; CloudFacade (secure RESTful API); Atmosphere Management Service (AMS); Atmosphere Internal Registry (AIR); cloud stack plugins (Fog); Cloud Manager; Development Mode; Generic Invoker; workflow management; image store (Glance); head node and worker nodes; computational cloud sites: OpenStack/Nova, Amazon EC2, other CS] • Customized applications may directly interface Atmosphere via its RESTful API, called the CloudFacade.
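Beyond naming the CloudFacade as a secure RESTful API, the slide gives no call details. A client request to start an appliance might look roughly like the sketch below; the base URL, resource names, payload shape and token handling are assumptions made for illustration only:

```python
import requests

# Hypothetical CloudFacade endpoint and user API token.
CLOUD_FACADE = "https://cloudfacade.example.org/api/v1"
HEADERS = {"Authorization": "Bearer user-api-token"}

# Ask Atmosphere to instantiate an appliance from a stored template.
payload = {"appliance": {"template_id": 17, "name": "my-analysis-vm"}}
resp = requests.post(f"{CLOUD_FACADE}/appliances", json=payload,
                     headers=HEADERS, timeout=30)
resp.raise_for_status()
print("appliance state:", resp.json().get("state"))
```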

  27. InSilicoLab science gateway framework • Goals • Complex computations done in a non-complex way • Separating users from the concept of jobs and from the infrastructure • Modelling the computation scenarios in an intuitive way • Different granularity of the computations • Interactive nature of applications • Dependencies between applications • Summary • The framework proved to be an easy way to integrate new domain-specific scenarios, even when done by external teams • Natively supports multiple types of computational resources, including private resources – e.g. private clouds • Supports various types of computations • [Figure: architecture of the InSilicoLab framework – the Domain Layer, the Mediation Layer with its Core Services, and the Resource Layer; in the Resource Layer, Workers ('W') of different kinds (marked with different colours) are shown]

  28. Scalarm • Scalarm overview • Self-scalable platform adapting to experiment size and simulation type • Exploratory approach to conducting experiments • Supports online analysis of partial experiment results • Integrates with clusters, Grids and Clouds • What problems are addressed by Scalarm? • Data farming experiments with an exploratory approach • Parameter space generation with support for design-of-experiment methods (see the sketch below) • Access to heterogeneous computational infrastructure • Self-scalability of the management part
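A data farming experiment starts from a generated parameter space. As a minimal sketch of the full-factorial case (the parameter names, values and simulation stub are invented and do not come from Scalarm), the design-of-experiment step can be pictured as:

```python
from itertools import product

# Illustrative parameter ranges for a data farming experiment.
parameters = {
    "velocity": [1.0, 2.0, 5.0],
    "temperature": [300, 350, 400],
    "friction": [0.1, 0.3],
}

def simulate(point):
    """Stub standing in for a single simulation run."""
    return point["velocity"] * point["temperature"] * (1 - point["friction"])

# Full-factorial design: every combination of the parameter values.
names = list(parameters)
space = [dict(zip(names, values)) for values in product(*parameters.values())]

# In Scalarm these runs would be scheduled on clusters, Grids or Clouds,
# with partial results analysed online; here they are evaluated locally.
for point in space:
    print(point, "->", simulate(point))
```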

  29. VeilFS • A system operating in user space (via FUSE) which virtualizes organizationally distributed, heterogeneous storage systems to provide uniform and efficient access to data. • End users access the data stored within VeilFS through one of the provided user interfaces: • FUSE client, which implements a file system in user space to hide the data location and exposes a standard POSIX file system interface, • Web-based GUI, which allows data management via any Internet browser, • REST API. • [Figure: functionalities provided by VeilFS]
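Once the FUSE client has mounted a user's data space, ordinary POSIX I/O works on it. The sketch below assumes a hypothetical mount point and REST endpoint (neither is taken from the VeilFS documentation) and shows both access paths:

```python
import os
import requests

# Hypothetical mount point exposed by the VeilFS FUSE client.
MOUNT = "/mnt/veilfs/my_space"

# Standard POSIX access: the client hides where the data physically lives.
path = os.path.join(MOUNT, "results", "run42.csv")
if os.path.exists(path):
    with open(path) as f:
        print(f.readline().strip())

# The same file fetched through an assumed REST endpoint.
resp = requests.get(
    "https://veilfs.example.org/api/files/results/run42.csv", timeout=30)
if resp.ok:
    print(resp.text.splitlines()[0])
```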

  30. Chemistry • InSilicoLab for chemistry • The service aims to support the launch of complex computational quantum chemistry experiments in the PL-Grid Infrastructure. • Experiments in this service facilitate the planning of sequential computation schemes that require the preparation of a series of data files based on a common schema (see the sketch below).
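As a rough illustration of preparing such a series of data files from a common schema (the template and parameter values below are invented and not taken from the service), a short generator script might look like this:

```python
from pathlib import Path

# Illustrative quantum chemistry input template with varying fields.
TEMPLATE = """%mem=2GB
# {method}/{basis} opt

water geometry optimization

0 1
O  0.000  0.000  0.117
H  0.000  0.757 -0.467
H  0.000 -0.757 -0.467

"""

# Generate one input file per basis set, all built from the common schema.
out_dir = Path("inputs")
out_dir.mkdir(exist_ok=True)
for basis in ["sto-3g", "6-31g", "cc-pvdz"]:
    text = TEMPLATE.format(method="b3lyp", basis=basis)
    (out_dir / f"water_{basis}.inp").write_text(text)
    print("wrote", out_dir / f"water_{basis}.inp")
```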

  31. Metallurgy • Simulations of the extrusion process in 3D • Main objective: optimization of the metallurgical process of profile extrusion. The optimization includes: shape of the forechamber, channel position on the die, calibration strips, extrusion velocity, ingot temperatures, tools. • The proposed grid-based software simulates extrusion of thin profiles and rods made of special magnesium alloys containing calcium additions. These alloys are characterized by extremely low technological plasticity during metal forming. • A FEM mathematical model was developed.

  32. Life Science • Integromics – a system for researchers in biomedicine and biotechnology • The system was developed to allow: • data collection from experiments, laboratory diagnostics, diagnostic imaging, instrumental analysis and medical interviews, • integration, management, processing and analysis of the collected data using specialized software and data mining techniques, • hypothesis generation, • data sharing and presentation of the results. • Example: the diagram of an artificial neural network used to classify patients based on the expression of selected genes. The method will allow new hypotheses to be raised about the influence of individual genes on changes in the organism.
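The neural-network classification mentioned in the example can be sketched with scikit-learn; the random stand-in data below is not the Integromics data set and the network layout is an arbitrary assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: expression levels of 50 genes for 120 patients, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = rng.integers(0, 2, size=120)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small feed-forward network classifying patients from gene expression.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```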

  33. SynchroGrid • Elegant – a service for those involved in the design and operation of a synchrotron • Objectives: • Preparation of the tools needed for synchrotron deployment and operation, aimed at operating and studying the beam line. • Addressing the estimated users' needs in this scientific area, focusing on data access and management – especially metadata for the experimental data gathered during beam time. • The developed service consists of: • provision of the elegant (ELEctron Generation ANd Tracking) application in its parallel version on a cluster, • configuring the Matlab software to read the output files produced by this application in the Self Describing Data Sets (SDDS) format and to generate the final results in the form of plots.
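The slide describes providing the parallel version of elegant on a cluster; a rough driver for such a run could look as below. The mpirun invocation, the run-file name and the availability of the Pelegant binary are assumptions, and on the PL-Grid clusters the command would normally be submitted through the batch system rather than run interactively:

```python
import subprocess
from pathlib import Path

# Assumed input: an elegant run file describing the lattice and tracking task.
run_file = Path("storage_ring.ele")

# Launch the parallel version of elegant (Pelegant) on four MPI ranks.
subprocess.run(["mpirun", "-np", "4", "Pelegant", str(run_file)], check=True)

# elegant writes its results as SDDS-format files; per the slide,
# Matlab is configured to read them and to produce the final plots.
print("run finished, SDDS output files are in", run_file.parent.resolve())
```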

  34. International cooperation – EU-funded projects • ACC Cyfronet AGH is involved in numerous projects co-financed by EU funds and the Polish government. Research conducted in Cyfronet focuses on: • grid and cloud environments, • programming paradigms, • research portals, • efficient use of computing and storage resources, • reconfigurable FPGA and GPGPU computing systems.

  35. National projects

  36. Organization of conferences • Cyfronet has for many years been organizing national and international conferences, workshops and seminars, which bring together computer scientists and researchers involved in the creation, development and application of information technologies, as well as users of these technologies. The Centre has also initiated a series of conferences: • CGW Workshop, held yearly since 2001 • ACC Cyfronet AGH Users' Conference, held yearly since 2008 • as well as the International Conference on Computational Science (ICCS), organized twice: in 2004 and 2008 • http://www.cyfronet.krakow.pl/cgw14/

  37. Organization of conferences CGW Workshop Proceedings

  38. Summary: what we offer • We develop and deploy research e-infrastructure in three dimensions: • Network & Future Internet • HPC/GRID/CLOUDs • Data & Knowledge layer • Deployments have a national scope, however with close European links • Developments oriented towards end users & projects • Achieving synergy between research projects and e-infrastructures by close cooperation and by offering relevant services • Durability for at least 5 years after the projects finish – confirmed in contracts • Future plans: continuation of the current policy with support from EU Structural Funds • Center of Excellence in Life Science • CGW as a place to exchange experience and for collaboration between eScience centers in Europe

  39. More information • www.cyfronet.krakow.pl/en • www.plgrid.pl/en • www.cyfronet.krakow.pl/cgw14 • www.cyfronet.krakow.pl/kdm14 • dice.cyfronet.pl
