Cyberinfrastructure for the 21st Century (CIF21): Data MRI and STCI EarthCube CASC

Presentation Transcript


  1. Cyberinfrastructure for the 21st Century (CIF21): Data MRI and STCI EarthCube CASC
     Sept 9, 2011
     Rob Pennington, Office of Cyberinfrastructure (OCI), National Science Foundation, rpenning@nsf.gov

  2. Framing the Challenge: Science and Society Transformed by Data
     • Modern science
        • Data- and compute-intensive
        • Integrative, multiscale
     • Multidisciplinary collaborations for complexity
        • Individuals, groups, teams, communities
     • Sea of data
        • Age of observation
        • Distributed, central repositories, sensor-driven, diverse, etc.

  3. Advisory Committee for Cyberinfrastructure Task Force Reports
     • More than 25 workshops and Birds of a Feather sessions, with more than 1,300 people involved
     • Final recommendations presented to the NSF Advisory Committee for Cyberinfrastructure (ACCI), Dec 2010
     • Final reports online at: http://www.nsf.gov/od/oci/taskforces/
     Task forces: Campus Bridging; Data and Viz; High Performance Computing (HPC); Grand Challenges; Cyberlearning; Software

  4. Data Task Force Recommendations
     • Infrastructure: Recognize data infrastructure and services (including visualization) as essential long-term research assets fundamental to today's science
     • Economic sustainability: Develop realistic cost models to underpin institutional and national business plans for research repositories and data services
     • Culture change: Emphasize expectations for data sharing; support the establishment of new citation models in which data and software tool providers and developers are recognized and credited for their contributions
     • Data management guidelines: Identify and share best practices for the critical areas of data management
     • Ethics and IP: Train researchers in privacy-preserving data access

  5. Evolution of Cyberinfrastructure for the 21st Century (CIF21) and Data
     [Diagram elements: National Science Board (NSB); ongoing input; Science & Engineering Research + Cyberinfrastructure; ACCI Data Task Force; NSF CIF21 Data Programs; DataNet Program; Community Input]

  6. Cyberinfrastructure Ecosystem (CIF21)
     • Organizations: universities, schools; government labs, agencies; research and medical centers; libraries, museums; virtual organizations; communities
     • Scientific Instruments: large facilities, MREFCs, telescopes; colliders, shake tables; sensor arrays (ocean, environment, weather, buildings, climate, etc.)
     • Expertise: research and scholarship; education; learning and workforce development; interoperability and operations
     • Cyberscience: discovery; collaboration; education
     • Data: databases, data repositories; collections and libraries; data access, storage, navigation, management, mining tools, curation, privacy
     • Computational Resources: supercomputers; clouds, grids, clusters; visualization; compute services; data centers
     • Networking: campus, national, international networks; research and experimental networks; end-to-end throughput; cybersecurity
     • Software: applications, middleware; software development and support; cybersecurity (access, authorization, authentication); maintainability, sustainability, and extensibility

  7. CIF21: Four Major Thrust Areas
     [Diagram: the cyberinfrastructure ecosystem from slide 6, overlaid with the CIF21 thrust areas]
     • Community Research Networks
     • Data-Enabled Science
     • New Computational Resources
     • Access and Connections to CI Resources
     • Education: integral and embedded

  8. Scientific Data Challenges
     [Figure: axes include volume (gigabytes to exabytes), bytes per day, useful lifetime, distribution, and data access, with time points 2012 and 2020; labeled examples: Square Kilometer Array, climate/environment, genomics, TeraGrid/Blue Waters, LHC, LSST, DataNet, and "many smaller datasets…"]

  9. CIF21 Data Goals
     • Support data-intensive and multidisciplinary science
     • Provide reliable digital access, integration, management, and preservation capabilities for science and engineering data over a decades-long timeline
     • Develop innovative data analysis and mining tools to support data manipulation, modeling, and discovery
     • Engage at the frontiers of technological innovation and transformative science to drive the leading edge forward

  10. DataNet Role in CIF21
     • DataNet is a strategic part of Foundation-wide investments in data in CIF21
     • Focus on center-scale awards
     • DataNet efforts effectively balance:
        • Production infrastructure to provide operational services
        • Research to create next-generation infrastructure
     • DataNet awards are partnerships:
        • Responsive to user communities to define their meaningful and useful scope
        • Form a coordinated network to provide national, interdisciplinary data models and infrastructure

  11. DataNet: A Multi-tiered and Multidisciplinary Landscape
     [Diagram elements: Modeling and Simulation communities; Population, Climate, Environment communities; Genomics communities; Data-enabled Science; Data Curation; Data Storage; "DataNet supported" label]

  12. Data Storage
     • National storage infrastructure for scientific data
     • Accommodate the scale and heterogeneity of scientific data through robust, open, and broadly accepted standards
     • A cost model that can be implemented with governmental, academic, nonprofit, and commercial stakeholders so that the infrastructure is sustainable
     • Make strategic investments that:
        • Leverage existing resources in TeraGrid, commercial clouds, and federal data centers
        • Meet growing capacity needs at optimum cost
        • Provide coordinating and integrative functions for integrity, access control, availability, and persistence
        • Catalyze a national data infrastructure, playing a role similar to the one NSFNet played for the Internet

  13. Data Curation
     • Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context
     • Overcome challenges of culture change, policy development and implementation, sustainable operations, and quality and usability control
     • Strategic awards that address heterogeneity in the formats, complexity, and semantics of data collections valued by science communities of significant breadth
     • Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability

  14. Data-Enabled Science
     • Provide critical tools and services for data mining, integration, analysis, modeling, and visualization
     • Overcome barriers to scaling, synthesis, and interoperability to promote effective use of large-scale, shared data resources
     • Strategic investments that concentrate tools, resources, and expertise in support of compelling grand-challenge science questions

  15. Cross-Cutting Challenges
     • Balancing research into next generations of infrastructure with operation and maintenance of current capacity
        • Stimulate innovation and manage transitions
     • Sustainable, long-term programs
        • Technical design, development of business models, and integration with the research cycle
     • Integration
        • Vertical: linking low-level bit storage infrastructure to data collections, and finally to applications
        • Horizontal: achieving connectivity and interoperability between activities that vary in scale, disciplinarity, and funding source

  16. DataNet Program Management
     • Life-cycle perspective covering the use of the data
        • Research, development, implementation, operations, sustainability, close-out
     • Apply project management methods
        • Work breakdown structure (WBS), risk management, change control, schedule, milestones, deliverables
     • Standardized process:
        • Evaluate science merit and conceptual design
        • Develop a draft Project Execution Plan (PEP), design, and reporting metrics
        • Critical review: prototype, finalize baseline (approval / mid-course correction / off-ramp)
        • Implementation and operations: subject to change control, with oversight based on milestones and metrics
        • Final operational review: informs the decision on renewal or termination

  17. DataNet Federation Consortium: Data-Driven Science
     • Implement a national data grid
     • Federate existing discipline-specific data management systems to enable national research collaborations
     • Enable collaborative research on shared data collections
     • Manage the collection life cycle as the user community broadens
     • Integrate "live" research data into education initiatives
     • Enable student research participation through control policies
     [Diagram elements: collection life cycle with Project Shared Collection, Processing Pipeline, Digital Library, Reference Collection, and Federation; policy-based data management]
     Science and engineering initiatives: Ocean Observatories Initiative; the iPlant Collaborative; CUAHSI; CIBER-U; Odum Social Science Institute; Temporal Dynamics of Learning Center
     Cyberinfrastructure partners: Univ. of North Carolina, Chapel Hill; Univ. of California, San Diego; Arizona State University; Drexel University; Duke University; University of Arizona; University of South Carolina
     National Science Foundation Cooperative Agreement: OCI-0940841

  18. MRI 2011
     • CUNY SI: Instrumentation for Enabling Data Analysis, Sharing, Storage, and Preservation
     • CU Boulder: Acquisition of a Scalable Petascale Storage Infrastructure for Data-Collections and Data-Intensive Discovery
     • RPI: Acquisition of a Balanced Environment for Simulation
     • NCA&T: Acquisition of a Complete High-Performance Modeling and Visualization System for Research in Mathematical Biology and Mathematical Geosciences
     • OSU: Acquisition of a High Performance Compute Cluster for Multidisciplinary Research

  19. What is EarthCube?

  20. A Call to Action
     Over the next decade, the geosciences community commits to developing a framework to understand and predict responses of the Earth as a system, from the space-atmosphere boundary to the Earth's core, including the influences of humans and ecosystems.
     Reports:
     • Transitions and Tipping Points in Complex Environmental Systems, NSF Advisory Committee for Environmental Research and Education, 2009
     • Earth Science and Applications from Space: National Imperatives for the Next Decade and Beyond, 2007
     • High-Performance Computing Requirements for the Computational Solid Earth Sciences, 2005

  21. Goal of EarthCube
     To transform the conduct of research in the geosciences by supporting community-based cyberinfrastructure that integrates data and information for knowledge management across the geosciences.

  22. What Needs To Be Done?
     • Integrate data, tools, and communities through cyberinfrastructure
     • Establish a governance mechanism that is inclusive and adopted by the community
     • Use current and emerging technologies to create transparent infrastructure for the geosciences community

  23. Convergence to a Unifying Architecture
     [Diagram: modes of support, contrasting "Loosely or Not Connected" with "Well-Connected through EarthCube"]

  24. EARTHCUBE ASSUMPTIONS
     • The geosciences community is ready to take on the EarthCube challenge
     • The community will start self-organizing prior to EarthCube activities, such as the Nov 1-4 Charrette
     • Current and emerging technology will help achieve the convergence envisioned for EarthCube
     • A broad range of expertise and resources must be engaged to shape EarthCube

  25. [Timeline]
     • Jun 2011: DCL (Dear Colleague Letter) released
     • Jul-Sept 2011: Two WebEx events
     • Nov 1-4, 2011: Charrette
     • Nov 2011-Apr 2012: Proposed framework approaches developed through EAGERs
     • May 2012: Sandpit/IdeasLab to determine 18-month prototype award(s)

  26. EARTHCUBE TIMELINE
     • Online community information: August to November 2011
     • EarthCube Charrette: early November 2011
     • EarthCube Ideas Lab: tentatively early May 2012
     • Prototype development: May to December 2013
     • Fully integrated geosciences infrastructure: 2014-2022

  27. Pre-Charrette Organization (August to September)
     • Second WebEx on Aug. 22
     • NSF seeks input from a wide range of sources:
        • Individuals, institutions/organizations, representatives of scientific groups or communities
        • Facilities and managers of CI endeavors
        • Industry, federal labs, federal agencies, and international partners
     • NSF will establish online resources and forums to:
        • Gather community inputs/requirements
        • Facilitate partnerships and collaborations
        • Encourage submission of approaches to the EarthCube design

  28. Charrette Process
     • Stakeholders focus EarthCube ideas and activities
     • Plenary sessions to:
        • Discuss user requirements
        • Refine approaches and designs for EarthCube
        • Develop partnerships and new collaborations
     • Remote participation and a real-time comment system will be available
     • Summary session
        • Comments from NSF, facilitators, and participants on the process
        • NSF provides guidance on post-Charrette activities

  29. Questions?