DFC Vision
• Build collaboration environment
  • Sharing of data, information, and knowledge
• Form national data cyberinfrastructure
  • Federation of existing data management systems
• Support reproducible data-driven research
  • Encapsulate knowledge within shared workflows
• Enable student participation in research
  • Policy-controlled analysis of “live” data
Diagram labels: Compute Resources – HPC centers, institutional clusters; DFC Collaboration Environment – Data Grid; NEW; Community Resources – Repository, Catalog
Data-Driven Science and Engineering
Collaboration Environments
• Plant biology – the iPlant Collaborative
  • Enable collaborative research across existing data repositories
• Cognitive science – the Temporal Dynamics of Learning Center
  • Manage research data, apply IRB policies
• Social science – the Odum Institute
  • Integrate policy-based data management with the existing Dataverse repository
• Oceanography – Ocean Observatories Initiative
  • Archiving climatic data records from real-time sensor data streams
• Engineering – CIBER-U
  • Engineering Digital Library: curating civil engineering data, materials data, archaeology data, student training materials
• Hydrology – EarthCube
  • Automating hydrology research workflows (data retrieval, transformation, analysis)
Challenges
• Federated national data cyberinfrastructure
  • Existing projects have web services, data repositories, digital libraries, archives, processing pipelines, science portals
  • What are the interoperability mechanisms needed to enable federation of existing resources?
DFC Builds on the iRODS Data Grid (integrated Rule-Oriented Data System)
• Astrophysics – Auger supernova search
• Atmospheric science – NASA Langley Atmospheric Sciences Center
• Biology – Phylogenetics at CC IN2P3
• Climate – NOAA National Climatic Data Center
• Cognitive Science – Temporal Dynamics of Learning Center
• Computer Science – GENI experimental network
• Cosmic Ray – AMS experiment on the International Space Station
• Dark Matter Physics – Edelweiss II
• Earth Science – NASA Center for Climate Simulations
• Ecology – CEED Caveat Emptor Ecological Data
• Engineering – CIBER-U
• High Energy Physics – BaBar / Stanford Linear Accelerator
• Hydrology – Institute for the Environment, UNC-CH; Hydroshare
• Genomics – Broad Institute, Wellcome Trust Sanger Institute, NGS
• Medicine – Sick Kids Hospital
• Neuroscience – International Neuroinformatics Coordinating Facility
• Neutrino Physics – T2K and dChooz neutrino experiments
• Oceanography – Ocean Observatories Initiative
• Optical Astronomy – National Optical Astronomy Observatory
• Particle Physics – Indra multi-detector collaboration at IN2P3
• Plant genetics – the iPlant Collaborative
• Quantum Chromodynamics – IN2P3
• Radio Astronomy – Cyber Square Kilometer Array, TREND, BAOradio
• Seismology – Southern California Earthquake Center
• Social Science – Odum, TerraPop
Policy Concept Graph
[Figure: concept graph relating Purpose, Collection, Property, Policy, Procedure, Persistent State Information, Policy Enforcement Point, Workflow, Function, Operation, and Client Action. Edge labels: Isa, Has, HasFeature, SubType, Defines, Updates, Controls, Chains, Invokes.]
Node labels recoverable from the figure:
• Persistent state attributes: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
• Policies: Replication Policy, Checksum Policy, Quota Policy, Data Type Policy
• Properties: Integrity, Authenticity, Access control
• Periodic Assessment Criteria: Completeness, Correctness, Consensus, Consistency
• Functions: GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj
• Other nodes: Digital Object, Attribute
Policy-based Data Management – Implementation in iRODS
[Figure: the policy concept graph annotated with iRODS counts. Edge labels: Isa, Has, HasFeature, SubType, Defines, Updates, Controls, Chains, Invokes.]
Node labels recoverable from the figure:
• Purpose (5 main types): Archive, Data grid, Collection, Digital Library, Processing Pipeline
• Persistent State Information (338), including DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
• Property (7 default), including Integrity, Authenticity, Access control
• Policy (11 default), including Replication Policy, Checksum Policy, Quota Policy, Data Type Policy
• Procedure (11 default)
• Periodic Assessment Criteria: Completeness, Correctness, Consensus, Consistency
• Policy Enforcement Points (70)
• Micro-service (317), including msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, msiSysChksumDataObj
• Clients (50)
• Other nodes: Digital Object, Attribute, Workflow, Operation
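The relationships named in the graph (a client action reaches a policy enforcement point, the policy there invokes a chain of procedures, and each procedure updates persistent state that a periodic assessment can later check) can be sketched in a few lines of Python. This is an illustrative model only, not iRODS code: `PolicyEngine`, the `after_put` enforcement point name, and both stand-in functions are hypothetical, and `DATA_REPL_NUM` is simplified to the highest replica number recorded so far.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Persistent state attribute names taken from the slide.
DATA_ID = "DATA_ID"
DATA_REPL_NUM = "DATA_REPL_NUM"
DATA_CHECKSUM = "DATA_CHECKSUM"

State = Dict[str, Any]
MicroService = Callable[[bytes, State], None]


def checksum_microservice(data: bytes, state: State) -> None:
    """Hypothetical stand-in for msiSysChksumDataObj: record a checksum."""
    state[DATA_CHECKSUM] = hashlib.sha256(data).hexdigest()


def replicate_microservice(data: bytes, state: State) -> None:
    """Hypothetical stand-in for msiDataObjRepl: note one more replica.

    Simplification: only the highest replica number is tracked.
    """
    state[DATA_REPL_NUM] = int(state.get(DATA_REPL_NUM, 0)) + 1


@dataclass
class PolicyEngine:
    """Maps each policy enforcement point to the procedures it chains."""

    rules: Dict[str, List[MicroService]] = field(default_factory=dict)

    def add_policy(self, enforcement_point: str, chain: List[MicroService]) -> None:
        self.rules.setdefault(enforcement_point, []).extend(chain)

    def trigger(self, enforcement_point: str, data: bytes, state: State) -> State:
        # Run every chained procedure; each one updates persistent state.
        for microservice in self.rules.get(enforcement_point, []):
            microservice(data, state)
        return state


def put_object(engine: PolicyEngine, data_id: str, data: bytes) -> State:
    """Client action: store an object, then fire the enforcement point."""
    state: State = {DATA_ID: data_id, DATA_REPL_NUM: 0}
    return engine.trigger("after_put", data, state)


def assess_integrity(data: bytes, state: State) -> bool:
    """Periodic assessment: does the stored checksum still match the data?"""
    return state.get(DATA_CHECKSUM) == hashlib.sha256(data).hexdigest()
```

A usage sketch: register `[checksum_microservice, replicate_microservice]` under `"after_put"`, call `put_object(engine, "obj-1", payload)`, and later pass the same payload and the returned state to `assess_integrity` to verify the Integrity property.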
Federation Approach
• Use middleware to implement unifying name spaces for:
  • Users – Single sign-on
  • Collections – Directories, workflow, time series
  • Objects – Files, soft links, workflows
  • Storage systems – Cloud, tape, file systems, objects
  • Metadata – Provenance, description, state
  • Policies – Management, assessment
  • Micro-services – Procedures, interactions
DFC - CNI
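One way to picture a unifying name space for objects and storage systems is a catalog that maps a logical path to one or more physical replicas, so clients never see where the bytes live. The sketch below is a minimal model under that assumption; `LogicalNamespace`, `Replica`, and the physical paths used with it are hypothetical and do not reflect the actual middleware catalog schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    resource: str       # logical name of a storage system
    physical_path: str  # where the bytes live on that system


@dataclass
class LogicalNamespace:
    """Maps logical object names to physical replicas within one zone."""

    zone: str
    catalog: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        # Logical names are rooted at the zone, independent of storage layout.
        if not logical_path.startswith(f"/{self.zone}/"):
            raise ValueError(f"{logical_path!r} is outside zone {self.zone!r}")
        self.catalog.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> List[Replica]:
        """Return every known replica of a logical object (possibly none)."""
        return list(self.catalog.get(logical_path, []))

    def list_collection(self, collection: str) -> List[str]:
        """List logical objects under a collection, like a directory listing."""
        prefix = collection.rstrip("/") + "/"
        return sorted(path for path in self.catalog if path.startswith(prefix))
```

The design point is that the same logical path can resolve to replicas on cloud, tape, or file-system resources without the client changing how it names the object.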
DFC Federation Hub
Hub: Zone dfcmain, Port 1237
Federated zones (zone – host:port):
• ooi – icat.oceanobservatories.org:1247
• renci – iren2.renci.org:1247
• engineering – irods.ischool.drexel.edu:1247
• hydrology – iren2.renci.org:2823
• odumMain – iodum1.irss.unc.edu:1247
• TDLC – tdlc-01.sdsc.edu:6688
• dfctest – dfctest.renci.org:1248
Hub servers and resources (name – host):
• iCAT – iren2.renci.org
• res-bk15 – srbbrick15.ucsd.edu
• res-dfcmain – iren2.renci.org
• demoResc – iren2.renci.org
• hydroResc – hydro.renci.org
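The zone list above can be read as a routing table: a logical path whose first component is a zone name is served by that zone's catalog host. A minimal sketch of such a lookup follows, using the zone names, hosts, and ports listed on the slide. The `Endpoint` type and `endpoint_for` function are hypothetical illustrations, not part of any iRODS client API.

```python
from typing import Dict, NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int


# Zone name -> catalog server, as listed on the slide.
FEDERATED_ZONES: Dict[str, Endpoint] = {
    "ooi": Endpoint("icat.oceanobservatories.org", 1247),
    "renci": Endpoint("iren2.renci.org", 1247),
    "engineering": Endpoint("irods.ischool.drexel.edu", 1247),
    "hydrology": Endpoint("iren2.renci.org", 2823),
    "odumMain": Endpoint("iodum1.irss.unc.edu", 1247),
    "TDLC": Endpoint("tdlc-01.sdsc.edu", 6688),
    "dfctest": Endpoint("dfctest.renci.org", 1248),
}


def endpoint_for(logical_path: str) -> Endpoint:
    """Pick the federated zone that serves a path of the form /<zone>/..."""
    parts = logical_path.split("/")
    # An absolute path splits into ["", "<zone>", ...].
    if len(parts) < 2 or parts[0] != "" or not parts[1]:
        raise ValueError(f"not an absolute logical path: {logical_path!r}")
    zone = parts[1]
    if zone not in FEDERATED_ZONES:
        raise LookupError(f"zone {zone!r} is not federated with this hub")
    return FEDERATED_ZONES[zone]
```

For example, `endpoint_for("/hydrology/home/alice/data.nc")` selects the hydrology zone's server on port 2823, even though it shares a host with the renci zone.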
National Infrastructure
Diagram labels:
• Existing infrastructure: XSEDE, Kepler, OOI, TDLC, iPlant, CUAHSI, NCDC, Dataverse, GeoBrain, DataONE, NCSA Polyglot
• Research Environment – Portals, Applications, Workflows
• DFC Collaboration Environment – Data Grid
• Community Resource Repository, Community Resource Catalog, Community Resource Services
The Future: Reproducible Research
Diagram labels: Sensors, Simulation, Literature, Archives, Experiments; Petabytes, doubling every two years
The Challenge: Support reproducible data-driven research
Deliver the capability to manage, mine, and publish knowledge through collaboration environments.
National Infrastructure Approach
• Build national data cyberinfrastructure prototype
  • Support multiple science and engineering domains by loosely coupling their existing infrastructure with a collaboration environment
• Develop generic interoperability framework
  • Define the generic infrastructure needed for the national infrastructure to manage knowledge as well as data and information
• Define interoperability mechanisms
  • Support access across the disparate types of infrastructure in common use
• Define domain-specific extensions
  • Support three levels: technical interoperability, project-level policy, and end-user usage requirements
Interoperability Mechanisms
Policies control execution of each interoperability mechanism
Diagram labels: Analysis Workflows Knowledge Creation Knowledge Procedures : Micro-services Knowledge Management Soft Links Collection Registration Information Message Queue Information Exchange Database Query Information Manipulation Micro-services Data Access Data Storage Driver Data Manipulation
DataNet Interoperability
Diagram labels: Research Environment – Portals, Applications, Workflows; DFC Data Grid; DFC Collaboration Environment; SEAD Portal (VIVO); Message Queue; Web Service; DataONE Coordinating Node; DataONE Member Node; SEAD Data; TerraPop Server; DFC Data Grid; SEAD Engagement Center
DFC Interoperability Layers
(layer – systems – mechanism)
• Authentication – InCommon, GSI, Kerberos, Shibboleth, LDAP – PAM / GSSAPI
• Data Access – DataONE, Data Conservancy, CUAHSI, NCDC – Micro-Services
• Data Manipulation – NetCDF, HDF5, THREDDS, ERDDAP – Format Drivers
• Workflows – Kepler, NCSA Cyberintegrator, Taverna, NCSA Polyglot – Micro-Services
• Networks – HTTPS, TCP/IP, Parallel TCP/IP, RBUDP – Network Drivers
• Clients – Web browsers, Web Services, Workflows, FUSE, Synchronization, MediaWiki – OpenSocial
• Storage Systems – File Systems, Tape Archives, Object Stores, Cloud Storage – Storage Drivers
• Messaging – AMQP, iRODSXmsg – Micro-Services
• Vocabulary – HIVE, (Cheshire) – Micro-Services
• Management – (RDA Policies), (ISO 16363 Criteria) – Policies
Interoperability Mechanisms
• Drivers
  • Encapsulate knowledge to support your operations at the remote repository: partial I/O, parsing of formats, manipulation of data structures
  • Authentication, format, storage
• Micro-services
  • Encapsulate knowledge needed to interact with an external system or with a data set using the remote protocol
  • Data access, external workflows, semantics, messaging
• Policies
  • Encapsulate knowledge needed for management functions
  • Federation control, administrative tasks, validation checks
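The division of labor among the three mechanisms can be illustrated with a small sketch: a driver performs partial I/O against a repository, a micro-service-style procedure uses the driver to recognize a data format, and a policy turns that result into a validation check. All class and function names here are hypothetical, and the in-memory driver stands in for a real storage system. The signatures tested are the HDF5 file signature and the NetCDF classic magic number.

```python
from typing import Dict, FrozenSet, Protocol


class StorageDriver(Protocol):
    """Driver: hides how a repository performs partial reads."""

    def read(self, path: str, offset: int, length: int) -> bytes: ...


class InMemoryDriver:
    """Toy driver backed by a dict, standing in for a remote repository."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str, offset: int, length: int) -> bytes:
        return self._files[path][offset:offset + length]


HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def identify_format(driver: StorageDriver, path: str) -> str:
    """Micro-service-style procedure: inspect only the first 8 bytes."""
    header = driver.read(path, 0, 8)
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5"
    # "CDF" followed by version byte 1 (classic) or 2 (64-bit offset).
    if header[:3] == b"CDF" and header[3:4] in (b"\x01", b"\x02"):
        return "NetCDF classic"
    return "unknown"


def validation_policy(
    driver: StorageDriver,
    path: str,
    accepted: FrozenSet[str] = frozenset({"HDF5", "NetCDF classic"}),
) -> bool:
    """Policy: a management decision built on the procedure's result."""
    return identify_format(driver, path) in accepted
```

Swapping `InMemoryDriver` for a driver that talks to tape or cloud storage would leave the procedure and the policy unchanged, which is the point of separating the three layers.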
Assertion
• Three basic types of interoperability mechanisms are sufficient for assembling national data cyberinfrastructure
• Example: Linked software-defined networks to data grids
  • From an iRODS data grid, controlled the selection of three disjoint network paths for optimizing data transport by adding appropriate policy enforcement points and micro-services
  • Expect functionality currently in data grid middleware to migrate into network middleware
Future Architecture
Diagram labels: Clients (shown twice); Virtual collection; Data Grid Middleware (shown twice); Virtual network; Resources; Network Middleware; DFC Federation Resources
GEMI - GENI
Contacts
http://datafed.org
http://irods.org
Reagan W. Moore, rwmoore@renci.org
National Science Foundation Cooperative Agreement: OCI-0940841