The NSF Cyberinfrastructure for the 21st Century Program (CIF21)
Rob Pennington, Program Director
Office of Cyberinfrastructure, National Science Foundation
The Shift Towards a “Sea of Data”
• All science is becoming data-dominated
  – Experiment, computation, theory
  – Fourth paradigm
• Classes of data
  – Collections, observations, experiments, simulations
  – Software
  – Publications
• Totally new methodologies
  – Algorithms, mathematics, culture
• Data become the medium for
  – Multidisciplinarity, communication, publication… science
Implications: fundamental questions become focused around data
• How to remove boundaries?
• How to incentivize sharing?
• What is a publication in the modern data-rich world?
• How are data peer reviewed?
• How do we attribute credit for this new publication form?
Scientific Data Challenges
[Figure: data sources such as the Square Kilometer Array, LHC, LSST, TeraGrid/Blue Waters, DataNet, climate and environment, genomics, and many smaller datasets, compared by volume (gigabytes to exabytes; bytes per day), useful lifetime, distribution, and data access, 2012 to 2020]
CIF21 and Transforming Research
• Science, innovation, discovery, economic competitiveness
• Grand Challenges: EarthCube, Understanding the Phenome, clean energy, climate prediction, social networking, complex networks, health records, cybersecurity, matter-by-design, disaster recovery, etc.
• Multi-disciplinary and multi-scale integration
• CIF21 elements: software; expertise and research; compute and modeling; analytic tools; communities; networks; a sea of data
NSF CIF21 Major Areas
• Organizations: universities, schools; government labs, agencies; research and medical centers; libraries, museums; virtual organizations; communities
• Scientific Instruments: large facilities, MREFCs, telescopes; colliders, shake tables; sensor arrays (ocean, environment, weather, buildings, climate, etc.)
• Expertise: research and scholarship; education; learning and workforce development; interoperability and operations
• Cyberscience: discovery, collaboration, education
• Data: databases, data repositories; collections and libraries; data access, storage, navigation, management, mining tools, curation, privacy
• Computational Resources: supercomputers; clouds, grids, clusters; visualization; compute services; data centers
• Networking: campus, national, and international networks; research and experimental networks; end-to-end throughput; cybersecurity
• Programs: Data Infrastructure Program; Advanced Computational Infrastructure
• Software: applications, middleware; software development and support; cybersecurity (access, authorization, authentication)
Broad Principles to Lead CIF21
• Builds national infrastructure for S&E
• Leverages common methods, approaches, and applications, with a focus on interoperability
• Catalyzes other CI investments across NSF
• Provides focus and is a vehicle for coordinating efforts and programs
• Based upon a shared governance model involving all parts of NSF
• Managed as a coherent program by OCI
• Spiral development methodology
Evolution of CIF21 and NSF Data Programs
[Diagram elements: NSB; Science & Engineering Research + Cyberinfrastructure; ACCI Task Force; NSF CIF21; Data Programs; DataNet Awards; ongoing community input]
Data-Related Context
• National Science and Technology Council (NSTC)
  – http://www.whitehouse.gov/blog/2012/01/30/your-comments-access-federally-funded-scientific-research-results
• Networking and Information Technology Research and Development (NITRD)
  – http://www.nitrd.gov/subcommittee/bigdata.aspx
• National Science Board Data Policies Task Force
  – http://www.nsf.gov/nsb/committees/tskforce_dp.jsp
• Advisory Committee for Cyberinfrastructure (ACCI)
  – www.nsf.gov/od/oci/taskforces/
NSTC RFIs for Public Comment - Context
• Two Requests for Information (RFIs), Nov 2011
  – Public Access to Digital Data Resulting from Federally Funded Scientific Research
    – Preservation, Discovery and Access
    – Standards for Interoperability, Re-Use and Re-Purposing
  – RFI for Scholarly Publications
  – http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications
• Comment period closed on 12 Jan 2012
  – Digital Data: 118 responses
  – Scholarly Publications: 377 responses
  – Individual and institutional responses
NSB Data Policy Task Force - Context
• Dec 2011: NSB 11-79 Recommendations
  – http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
• #1: Provide leadership … in the development and implementation of digital research data policies …
• #2: … require grantees to make both the data and the methods and techniques used in the creation and analysis of the data accessible … Data should be shared using persistent electronic identifiers …
• #3: Continue to expand the support of computational and data-enabled science and engineering …
• #4: Convene a panel … to explore and develop a range of viable long-term business models …
• #5: Further the expansion of sustainable data management, including preservation and curation of pre-existing and newly generated long-lived data …
NSF Advisory Committee for Cyberinfrastructure (ACCI) Task Force - Context
• Task forces: Grand Challenges, HPC, Data/Viz, Software, Campus Bridging, Cyberlearning
• More than 25 workshops and Birds of a Feather sessions, involving more than 1,300 people
• Final reports: http://www.nsf.gov/od/oci/taskforces/
ACCI Data Task Force Recommendations
• Recognize data infrastructure and services as essential research assets fundamental to today’s science and as long-term investments in national prosperity
• Create new citation models in which data and software tool providers are credited with their data contributions
• Develop and publish realistic cost models to underpin institutional and national business plans for research repositories and data services
• Identify and share best practices for the critical areas of data management
CIF21 and Data-Enabled Science
• Provide critical tools and services for data mining, integration, analysis, modeling, and visualization
• Overcome barriers to scaling, synthesis, and interoperability to promote effective use of large-scale, shared data resources
• Make strategic investments that concentrate tools, resources, and expertise in support of compelling grand challenge science questions
Data Infrastructure: A Multi-Tiered and Multi-Disciplinary Landscape
[Diagram elements: modeling and simulation communities; population, climate, and environment communities; observational communities; data-enabled science; data content; data storage; DataNet supported]
CIF21: Data-Enabled Science
• Data-Intensive Science Program (knowledge)
  – Intensive disciplinary efforts, multi-disciplinary discovery and innovation
• Data Analysis and Tools Program (information)
  – Data mining, manipulation, modeling, visualization, decision-making systems
• Data Services Program (data)
  – Provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline
[Image: news headline “Dumped On by Data: Scientists Say a Deluge Is Drowning Research”]
Data Curation
• Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context
• Overcome challenges of culture change, policy development and implementation, sustainable operations, and quality and usability control
• Strategic awards that address heterogeneity in the formats, complexity, and semantics of data collections valued by science communities of significant breadth
• Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability
Data Storage
• National storage infrastructure for scientific data
• Accommodate scale and heterogeneity through robust, open, and broadly accepted standards
• Business model implemented with governmental, academic, nonprofit, and commercial stakeholders
• Make strategic investments that:
  – Leverage existing resources in XSEDE, commercial clouds, and federal data centers
  – Meet growing capacity needs at optimum cost
  – Provide coordinating and integrative functions for integrity, access control, availability, and persistence
  – Catalyze a national data infrastructure
Cross-Cutting Challenges
• Balancing research into next-generation infrastructure with operation and maintenance of current capacity
• Sustainability through technical design, development of business models, and integration with the research cycle
• Integration
  – Vertical: linking low-level bit storage infrastructure to data collections and to applications
  – Horizontal: achieving connectivity and interoperability between activities that vary in scale, disciplinarity, and funding source
Summary
• CIF21 is focused on effective ways to approach and respond to these challenges
• Critical concepts and goals
  – Realistic and innovative
  – Spiral process with strong, ongoing feedback
• Structure for longevity
  – Scalable, open, inclusive governance
  – Long-term business models
  – International collaborations and programs