EPOS e-Infrastructure Keith G Jeffery Natural Environment Research Council keith.jeffery@stfc.ac.uk
Structure of Presentation • Who? • EPOS Rationale and approach • e-Infrastructure Basics • Related Projects (Torild van Eck) • Proposed Approach • ICT Board • Conclusion
Who? • STFC Director IT & International Strategy • Note STFC runs large research facilities • Author of original paper on e-Science 1999 • Led to a £500m programme in UK • Led to large EC-funded programme • Led to ERCIM-led CoreGRID NoE • Chair EC Expert Group on GRIDs (2002-2007) • Co-Convenor EC-Expert Group on CLOUDs (2009-2010) • Executive Secretary of ERF (national facilities / international access) www.europeanresearchfacilities.eu • President ERCIM www.ercim.org • President euroCRIS www.eurocris.org • Chair Alliance for Permanent Access to the Records of Science www.alliancepermanentaccess.eu • Board Member EOS (Enabling Open Access) www.openscholarship.org
BUT.... (important for EPOS) • BSc (1968) and PhD (1971) are in Geology • (the PhD with a large IT content)
STFC Rutherford Appleton Laboratory
EPOS Concept Massimo Cocco
e-Infrastructure Basics • GRIDs • Clouds • Web 2.0 • SOA (Service-Oriented Architecture) • Research process • Fourth paradigm (Data Intensive Scientific Discovery) • Virtualisation • Autonomicity • Security, Privacy, Trust • Performance • Development • Maintenance
Context • Internet: 1.5 billion fixed connections; estimated 4 billion mobile connections • Digital storage: estimated 280 billion gigabytes (280 exabytes, 280*10**18 bytes) • (Expect all to grow ~1 order of magnitude in 4 years, and accelerating) • Users: Asia 550 million (14% penetration); Europe 350 million (50% penetration); USA 250 million (70% penetration) • Last 20 years: CPU 10**16; Storage 10**18; Networks 10**4 • Scalability • Trust & security & privacy • Manageability • Accessibility • Usability • Representativity
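The storage and growth figures on the context slide can be checked with a little arithmetic. This is a minimal sketch, assuming SI units and steady exponential growth; the variable and function names are illustrative and do not come from the slides.

```python
# Back-of-envelope check of the storage and growth figures.
# Assumption: SI units (1 GB = 10**9 bytes, 1 EB = 10**18 bytes).
GIGABYTE = 10**9
EXABYTE = 10**18

storage_bytes = 280 * 10**9 * GIGABYTE       # 280 billion gigabytes
storage_exabytes = storage_bytes // EXABYTE  # the same quantity in exabytes

# "One order of magnitude in 4 years", assuming steady exponential growth,
# corresponds to a yearly factor of 10 ** (1/4), roughly 1.78.
annual_growth_factor = 10 ** (1 / 4)


def projected(value, years, factor=annual_growth_factor):
    """Project a quantity forward under the steady-growth assumption."""
    return value * factor ** years


print(storage_exabytes, round(annual_growth_factor, 2))
```

Under these assumptions, the 280 exabytes would reach about 2800 exabytes after four years. The slide notes that growth is accelerating, so a fixed yearly factor is only a lower-bound style estimate.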
The GRIDs Vision • The end-user interacts with the GRIDs environment to clarify the request • using a ‘device’ or ‘appliance’ • The GRIDs environment proposes a ‘deal’ to satisfy the request • which may or may not involve money • The user accepts or rejects the ‘deal’
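The request / deal / accept-or-reject interaction can be sketched in a few lines. This is only an illustration under stated assumptions: the `Request`, `Deal`, `propose_deal`, `negotiate` and `within_budget` names are hypothetical, the environment simply proposes the cheapest offer, and the request-clarification dialogue is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    description: str
    budget: float  # hypothetical: the most the user is willing to pay


@dataclass
class Deal:
    provider: str
    cost: float  # may be 0.0, since a deal need not involve money


def propose_deal(offers: List[Deal]) -> Optional[Deal]:
    """The environment proposes the cheapest offer it can find, if any."""
    return min(offers, key=lambda deal: deal.cost, default=None)


def negotiate(
    request: Request,
    offers: List[Deal],
    accepts: Callable[[Request, Deal], bool],
) -> Optional[Deal]:
    """Return the proposed deal if the user accepts it, otherwise None."""
    deal = propose_deal(offers)
    if deal is not None and accepts(request, deal):
        return deal
    return None


def within_budget(request: Request, deal: Deal) -> bool:
    """One possible user policy: accept any deal that fits the budget."""
    return deal.cost <= request.budget
```

The acceptance policy is passed in as a function so that the user, not the environment, decides whether a deal is acceptable.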
The GRIDs Vision • The GRIDs environment is such that • A user can interact with it intelligently • It provides transparent access to • data, information, knowledge • computation • instrumentation / detectors • http://epubs.cclrc.ac.uk/work-details?w=28736
The GRIDs Architecture: Layering • Knowledge Layer • Information Layer • Computation / Data Layer • Data to Knowledge • Control
The GRIDs Architecture: Components (a possible architecture) • The GRIDs Environment • U: User • R: Resource • S: Source • Um: User Metadata • Ua: User Agent • Sm: Source Metadata • Sa: Source Agent • Rm: Resource Metadata • Ra: Resource Agent • Brokers
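One way to picture the broker's role is as a matcher between user metadata and source metadata. The sketch below is an assumption-laden illustration, not the architecture itself: the metadata fields (`topics`, `interests`, the open-access flags) and the `Broker` class are hypothetical, and resource agents are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SourceAgent:
    """Sa: acts for a data source, described by source metadata (Sm)."""
    name: str
    topics: Set[str]          # hypothetical Sm field
    open_access: bool = True  # hypothetical Sm field


@dataclass
class UserAgent:
    """Ua: acts for a user, described by user metadata (Um)."""
    name: str
    interests: Set[str]             # hypothetical Um field
    needs_open_access: bool = True  # hypothetical Um field


class Broker:
    """Matches user agents to source agents using their metadata only."""

    def __init__(self) -> None:
        self.sources: List[SourceAgent] = []

    def register(self, source: SourceAgent) -> None:
        self.sources.append(source)

    def match(self, user: UserAgent) -> List[str]:
        # A source matches if it shares a topic with the user and its
        # access conditions are compatible with what the user needs.
        return [
            s.name
            for s in self.sources
            if s.topics & user.interests
            and (s.open_access or not user.needs_open_access)
        ]
```

The broker never touches the data itself; it works purely on the metadata the agents expose.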
Cloud Computing: The Technology • A very large number of processors • Clustered in racks as blades • In one major computer centre • May be replicated for business continuity • With massive online storage • RAID for resilience • And excellent communications links • For access
Cloud Computing: The Intention • Low cost of entry for customers • Device and location independence • Capacity at reasonable cost (performance, space) • Cloud Operator manages resource sharing balancing different peak loads • Scalable as demand rises from user • Security due to data centralisation and software centralisation • Sustainable and environmentally friendly – concentrated power • it is a service and the user does not know or care from where, by whom, and how it is provided • as long as the SLA (service level agreement) is satisfied
Cloud Computing: What is it? • Is cluster computing • with the advantages that brings: performance, scheduling • With GRIDs features • Scheduling / resource allocation • self-* • virtualisation • ASP (Application Service Provider) service • Can be used: • for infrequently required supercomputing • for business continuity / disaster recovery • for total ICT outsourced solution
Cloud Problems • Proprietary offerings (Amazon, Google, Microsoft...) so lock-in • Interoperation attempt failed • Inefficient to move data to the cloud • Despite SLA/QoS guarantees some concerns • performance • security/trust/privacy • So maybe GRID of proprietary CLOUDs?
Web 2.0 • Features: • creativity, communications, secure information sharing, collaboration and functionality • Examples: • Social networking, video-sharing, wikis, blogs, folksonomies • Crowdsourcing to gather information / knowledge (wisdom?) • If you don’t know what Web 2.0 is, your kids do!
Web 2.0: Based on web services • XML datastreams: hierarchic structure linearised – inefficient, inexpressive • Mobile code (Java): is this the best language? – procedural and low-level • Plug-ins on browsers: security implications and increasing memory requirements • (Commonly) an easy-to-use software development environment: usually rather informal
Web 2.0: Criticism • Uses Web 1.0 technology – nothing really new • i.e. http, URI, html/xml • Ideas not new • see Lotus Notes / Domino, videoconferencing etc (CSCW) before Web 2.0 • Tim Berners-Lee dismisses it as hype
Service Oriented Architecture Services: Challenges 1 • Location • Requirements matching • Composing • Utilising metadata • Parts of a service (diagram): input parameter definitions; service description (descriptive metadata); output parameter definitions; functional program code (to deliver the service); restrictions on use of service (restrictive metadata)
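The five parts of a service named on this slide map naturally onto a small record type, and requirements matching then becomes a check against the declared parameters. This is a minimal sketch under assumptions: the `Service` class, its method names, and the `to_metres` example service are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Service:
    description: str                    # descriptive metadata
    inputs: Dict[str, type]             # input parameter definitions
    outputs: Dict[str, type]            # output parameter definitions
    code: Callable[..., dict]           # functional program code
    restrictions: Dict[str, str] = field(default_factory=dict)  # restrictive metadata

    def matches(self, have: Dict[str, type], want: Iterable[str]) -> bool:
        """Requirements matching: the caller can supply every declared
        input, and the service declares every output the caller wants."""
        inputs_ok = all(
            name in have and issubclass(have[name], typ)
            for name, typ in self.inputs.items()
        )
        return inputs_ok and all(name in self.outputs for name in want)

    def invoke(self, **kwargs) -> dict:
        """Check arguments against the input definitions, then run the code."""
        for name, typ in self.inputs.items():
            if not isinstance(kwargs.get(name), typ):
                raise TypeError(f"parameter {name!r} must be {typ.__name__}")
        return self.code(**kwargs)


# Hypothetical example service: convert a depth from kilometres to metres.
to_metres = Service(
    description="Convert depth in km to depth in m",
    inputs={"depth_km": float},
    outputs={"depth_m": float},
    code=lambda depth_km: {"depth_m": depth_km * 1000.0},
    restrictions={"licence": "academic use only"},
)
```

Because the matching works on the declared metadata alone, a locator or composer can decide whether a service fits without ever executing its code.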
Service Oriented Architecture Services: Challenges 2 • Multiple instances • Parallel execution • Composition • End-to-end FR (functional requirements) satisfaction • End-to-end NFRs (non-functional requirements) satisfaction • Avoiding emergent properties • Conditions of use of services
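End-to-end NFR satisfaction is harder than it looks because properties combine when services are composed. The sketch below assumes just two NFRs (latency and availability), independent failures, and hypothetical names (`Nfr`, `in_sequence`, `in_parallel`, `satisfies`); real compositions involve many more properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nfr:
    latency_s: float     # time to respond, in seconds
    availability: float  # probability the service is up, between 0 and 1


def in_sequence(a: Nfr, b: Nfr) -> Nfr:
    """One service feeds the next: latencies add, and both must be up."""
    return Nfr(a.latency_s + b.latency_s, a.availability * b.availability)


def in_parallel(a: Nfr, b: Nfr) -> Nfr:
    """Both run at once and both results are needed: the slower one
    sets the latency, and both must still be up."""
    return Nfr(max(a.latency_s, b.latency_s), a.availability * b.availability)


def satisfies(actual: Nfr, required: Nfr) -> bool:
    """End-to-end check of the composed service against the requirement."""
    return (
        actual.latency_s <= required.latency_s
        and actual.availability >= required.availability
    )
```

Two services that each meet a 0.99 availability requirement give roughly 0.98 when chained, so the composition can fail a requirement that every component satisfies. This is one simple example of an emergent property.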
Bringing it Together: e-, i-, k-infrastructure • k-: Deduction & induction – human or machine • i-: Information Systems • e-: Servers, physical detectors
Middleware – and as SOKUs (Service-Oriented Knowledge Utilities) • k- • K- upper middleware (resolves semantic heterogeneity) • K- lower middleware (presents declared semantics) • i- • Upper middleware (hides syntactic heterogeneity) • Lower middleware (hides physical heterogeneity) • e-
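The difference between hiding syntactic heterogeneity and resolving semantic heterogeneity can be shown with two toy data sources. This is a sketch under assumptions: the record layout, the `depth` field, the declared units, and all function names are hypothetical, and the physical layer (where the bytes actually live) is not modelled.

```python
import csv
import io
import json
from typing import Dict, List


# Upper middleware: hides syntactic heterogeneity. Whatever the wire
# format, the caller gets a list of dict records.
def parse_csv(text: str) -> List[Dict]:
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text: str) -> List[Dict]:
    return json.loads(text)


# K- lower middleware: each source declares its semantics; here the
# declaration is just the unit of the depth field.
# K- upper middleware: resolves the difference by converting to one unit.
UNITS_PER_KM = {"km": 1, "m": 1000}


def depth_km(record: Dict, declared_unit: str) -> float:
    """Return the record's depth in kilometres, given its declared unit."""
    return float(record["depth"]) / UNITS_PER_KM[declared_unit]
```

After both layers have done their work, a depth of 1500 from a metre-based CSV source and 1.5 from a kilometre-based JSON source compare as equal.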
Research Process: 4th Paradigm • DATA-INTENSIVE SCIENCE (concept from Jim Gray 1944-2007) • Observational Science: observations; contextual metadata; pre-processing; digital preservation; availability; analysis; visualisation • Experimental Science: hypothesis; experimentation; observations; contextual metadata; pre-processing; digital preservation; availability; analysis; visualisation • Modelling Science: hypothesis; characterisation; simulation/modelling; observations; contextual metadata; pre-processing; digital preservation; availability; analysis; visualisation
Related Projects (from Torild van Eck) • EPOS e-infrastructure has to fit in with: • ESFRI Roadmap projects in Environmental Cluster (Wouter Los) • ESFRI roadmap projects in other clusters • Physical sciences (STM) • Economic/social science • Arts and humanities • PRACE (supercomputing) • National e-infrastructures for e-Research • Especially geoscience • Other international projects (North America, Pacific Rim, South America...)
EPOS IT-relevant EC projects + proposals (summary) • EC projects starting 2010: • GEM (Hazard) • SHARE (Hazard): ETHZ (D. Giardini) • NERA (Seismology & Earthquake Eng.): ETHZ + ORFEUS/KNMI (D. Giardini; T. van Eck) • EPOS PP: INGV (Massimo Cocco) • QUEST (training network, Computational Seismology): LMU (H. Igel) • EPOS (ESFRI roadmap) • Project proposals 2010 (INFRASTR. 2011-1, Call 8/9): • VERCE (INFRA-2011-1.2.1, deadline Nov 23): IPGP (J-P Vilotte); ORFEUS/KNMI, EMSC, INGV, LMU, Univ Liverpool, BAW, CINECA, Fraunhofer, UoE (IT) • EUDAT (INFRA-2011-1.2.2, deadline Nov 23): CSC Finland (Kimmo Koski); EPOS (GFZ, INGV), LifeWatch, …, CINECA (IT), UoE (IT), … • ENVRI (INFRA-2011-2.3.3, deadline Nov 25): LifeWatch (Wouter Los); EPOS (ORFEUS/KNMI), LifeWatch, EPOS, EMSO, EISCAT, ICOS, …, STFC (IT), UoE (IT), …
Three layers (slide by Peter Wittenburg and Wouter Los) • Data generators & Users (data providers & users; humans & instruments) • Roles: sensors, curators, researchers, observers, aggregators, public • Functionalities: virtual environments & collaborative organisations; security & protection • Community Support Services: data discovery & navigation; (meta)data tagging tools; data submission tools; operational semantic interoperability; workflow generator; data correlation; knowledge management; virtualisation • Data Services: persistent storage capacity; 24/7 operation; preservation & sustainability; authenticity; certification & integrity; GUIDs • Generic interoperability: technical, legal, semantic
e-Infrastructure Requirement • Data collection, calibration, validation • Data cataloguing and indexing • Data preservation and curation • Information processing – retrieval, analysis, visualisation • Hypothesis processing – simulation, modelling, analysis, visualisation • Hypothesis generation – data mining • Knowledge processing – integration of ICT with human processing – theory processing, user interface, scholarly communication (open access) • External interoperation – physical and medical sciences, economic and social sciences, arts and humanities • Dissemination – outreach (website plus) • Education and training • Management and Coordination
Key e-Infrastructure Principles • Mobile code: ability to move code to data because data large and costly to transport • Virtualisation: user neither knows nor cares where computing done or where data located as long as QoS/SLA met • Autonomicity: (self-*) because human management of ICT too expensive / slow
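The mobile-code principle follows from simple transfer arithmetic: shipping a small program is far cheaper than shipping a large dataset. The sketch below uses assumed sizes and an assumed 1 Gbit/s link, and the `DataNode` class is a hypothetical stand-in for a site that holds data and runs code sent to it.

```python
from typing import Callable, Iterable, List


def transfer_seconds(size_bytes: int, bandwidth_bytes_per_s: int) -> float:
    """Idealised transfer time: size divided by bandwidth, no overheads."""
    return size_bytes / bandwidth_bytes_per_s


# Assumed figures for illustration only.
LINK = 125 * 10**6   # 1 Gbit/s expressed in bytes per second
DATASET = 10**12     # 1 terabyte of observations
PROGRAM = 10**4      # 10 kilobytes of analysis code

seconds_to_move_data = transfer_seconds(DATASET, LINK)  # 8000 s, over 2 hours
seconds_to_move_code = transfer_seconds(PROGRAM, LINK)  # well under 1 ms


class DataNode:
    """Hypothetical node that keeps its data local and runs code sent to it."""

    def __init__(self, values: Iterable[float]) -> None:
        self._values: List[float] = list(values)

    def run(self, func: Callable[[List[float]], float]) -> float:
        # Only the (small) result leaves the node, never the raw data.
        return func(self._values)


node = DataNode([1.0, 2.0, 3.0, 4.0])
mean = node.run(lambda values: sum(values) / len(values))
```

In practice, running foreign code at the data site raises the security and trust questions listed among the challenges, which this sketch does not address.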
Key e-Infrastructure Challenges • Interoperation • Access to heterogeneous distributed data sources • Schema integration – syntactic and semantic • Security/privacy/trust • Identification – authentication – authorisation – accounting • Performance • Towards exascale processing (simulation/modelling) • Towards exabyte data streams (1.0*10**18)
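The identification, authentication, authorisation and accounting chain can be sketched as four small steps applied to every access. Everything here is a toy under assumptions: the user store, the permission names, and the in-memory log are hypothetical, and a real deployment would use salted, deliberately slow password hashing or federated identity rather than a bare SHA-256 digest.

```python
import hashlib
import hmac
from typing import List, Tuple

# Hypothetical stores, for illustration only.
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}
PERMISSIONS = {"alice": {"read:waveforms"}}
ACCOUNT_LOG: List[Tuple[str, str, bool]] = []


def authenticate(user: str, password: str) -> bool:
    """Identification plus authentication: is this a known user, and does
    the supplied password match the stored digest?"""
    stored = USERS.get(user)
    if stored is None:
        return False
    supplied = hashlib.sha256(password.encode()).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(stored, supplied)


def authorise(user: str, action: str) -> bool:
    """Authorisation: is this user allowed to perform this action?"""
    return action in PERMISSIONS.get(user, set())


def access(user: str, password: str, action: str) -> bool:
    """Run the full chain and record the outcome (accounting)."""
    granted = authenticate(user, password) and authorise(user, action)
    ACCOUNT_LOG.append((user, action, granted))
    return granted
```

Refused attempts are logged as well as granted ones, since accounting is useful both for charging and for spotting misuse.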
Steps to achieve EPOS e-Infrastructure 1 • Define / Agree requirements of end-user (document dynamically) • Including expected future requirements • Survey available data/information sources (document dynamically) • Detector systems • Repositories / databases / file systems • Data, documents, metadata, contextual data • Conditions of use – QoS, SLA (link to governance) • Define schema mappings, convertors for interoperation (document dynamically) • Canonical interoperation standard? • Note CERIF (Common European Research Information Format)
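The case for a canonical interoperation standard is that each source then needs only one mapping, to the canonical form, instead of one convertor per pair of sources. The sketch below assumes a made-up three-field canonical schema and two made-up source schemas; none of the field names come from CERIF or from any EPOS data source.

```python
from typing import Dict

# Hypothetical canonical schema and per-source field mappings.
CANONICAL_FIELDS = ("station", "latitude", "longitude")

MAPPINGS: Dict[str, Dict[str, str]] = {
    "source_a": {"sta": "station", "lat": "latitude", "lon": "longitude"},
    "source_b": {"StationCode": "station", "Y": "latitude", "X": "longitude"},
}


def to_canonical(source: str, record: Dict) -> Dict:
    """Rename a source record's fields to the canonical names.

    Fields without a mapping are dropped; a record that cannot fill
    every canonical field is rejected.
    """
    mapping = MAPPINGS[source]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    missing = [name for name in CANONICAL_FIELDS if name not in out]
    if missing:
        raise ValueError(f"{source} record lacks canonical fields: {missing}")
    return out


def convertors_needed(n_sources: int, canonical: bool) -> int:
    """Mappings to maintain: one per source with a canonical form,
    one per ordered pair of sources without it."""
    return n_sources if canonical else n_sources * (n_sources - 1)
```

With ten sources, the canonical approach needs 10 mappings where pairwise conversion would need 90, which is why the choice of canonical standard matters early.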
Steps to achieve EPOS e-Infrastructure 2 • Survey available computing and computation resources (document dynamically) • Detector systems • Data servers • HPC • Conditions of use – QoS, SLA (link to governance) • Define access and utilisation of ICT (document dynamically) • User identification, authentication, authorisation, accounting (security, privacy) • Available services • Conditions of use – QoS, SLA (link to governance) • Design first-cut ICT architecture (document dynamically) • GEANT network • GRIDs (EGI) middleware • Web services software • Web portal(s) user interface
ICT Board • The ICT Board will provide expert support, evaluate the e-infrastructure implementation plan, and oversee the development of an architectural model for the EPOS infrastructure. In particular, the ICT Board will evaluate the data flow organisation, the workflow management, and the user interface definition and services. The board will consist of experts nominated through ERCIM (www.ercim.org) to ensure quality and integration with other e-infrastructure initiatives.
Conclusion (take-home messages) • EPOS is a HUGE CHALLENGE • EPOS requires LEADING EDGE ICT to support LEADING EDGE GEOSCIENCE • EPOS e-Infrastructure is the ‘GLUE’ • EPOS is going to be FUN! ********* Prof Keith G Jeffery CEng, CITP, FGS, FBCS, HFICS keith.jeffery@stfc.ac.uk