National and International Efforts in Research Data Access and Sharing
Dr. Francine Berman
Chair, Research Data Alliance / US
Edward P. Hamilton Distinguished Professor in Computer Science, RPI
Research Data Driving Solutions to Complex Scientific and Societal Challenges
• How can we increase wheat yields?
• How can we best address energy needs and sustain the environment?
• How accurate is the Standard Model of Physics?
• Who is most at risk of contracting asthma?
Images: Lucas Taylor; Ceinturion, Wikipedia
Data Infrastructure Needed to Explore Solutions
• Research Dissemination and Reproducibility
• Data Use and Re-use
• Data Access (now) and Preservation (later)
• Data Discovery and Data Sharing

• Data discoverability tools
• Data access via portals, science gateways, etc.
• Database and data collection systems
• Data services to support use and re-use
• Data analysis algorithms
• Data-driven models and simulations
• Data visualization tools
• Semantic frameworks
• Data management systems
• Data storage
• …
Fran Berman
Social, Organizational, and Human Infrastructure Equally Important
• Social and Organizational Infrastructure
• Data-focused Curriculum and Training
• Human Infrastructure / Workforce
• Common Standards
• Community Practice
• Policy
• Sustainable Economics
• Data Scientists
McKinsey Global Institute 2011 Report; Traffic Image: Mike Gonzalez
Today’s Presentation: Emerging Efforts in the Development of Effective Research Data Infrastructure
• Global Data Infrastructure: How do we accelerate open access data sharing and exchange?
• National Data Infrastructure: How do we support stewardship and preservation of publicly accessible research data?
Data-Sharing Driving Discovery Across Sectors and Communities
World-wide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access, Use
• Science, Humanities, Arts Communities
• Libraries, Archives, Repositories, Museums
• E-Infrastructure professionals, data analysts, data center staff, …
• Data Scientists
Research Data Alliance Created to Accelerate Development of Research Data Sharing Infrastructure Worldwide
• RDA is an emerging, global community-driven organization created to accelerate the development of research data-sharing infrastructure world-wide.
• RDA community efforts focus on building social, organizational and technical infrastructure to
  • reduce barriers to data sharing and exchange
  • accelerate the development of coordinated global data infrastructure
RDA Approach: CREATE, ADOPT, USE
RDA Members come together as
• Working Groups: 12-18 month efforts to build, adopt, and use specific pieces of infrastructure
• Interest Groups: longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified
Working Group efforts focus on the development and use of data sharing infrastructure
• Code, policy, infrastructure, standards, or best practices that are adopted and used by communities to enable data sharing
• “Harvestable” efforts for which 12-18 months of work can eliminate a roadblock
• Efforts that have substantive applicability to groups within the data community, but may not apply to everyone
• Efforts on which working scientists and researchers can start today
The RDA Community Today: Over 1600 members from 70+ countries (as of 15/3/14)
Map labels: Africa 2%, South America 1%, Asia 4%, Austral-pacific 4%
Map courtesy traveltip.org
Community Growth
• First RDA organizational telecon: August 2012
• Global Data Planning Meeting: October 2012
• RDA Launch / First Plenary, March 2013, Gothenburg, Sweden: first Working Groups and Interest Groups; 240 participants
• RDA Second Plenary, September 2013, Washington, DC: first “neutral space” community meeting (Data Citation Summit); first Org. Partner Meet-up; first BOFs; 380 participants from 22 countries
• RDA Third Plenary, March 2014, Dublin, Ireland: first Organizational Assembly; 6 co-located events; 14 BOF, 12 Working Groups, 22 Interest Groups; 497 participants
• RDA Fourth Plenary, September 2014, Amsterdam
• First Working Group exchange meeting
RDA Interest Groups (IG) and Working Groups (WG) by Focus (as of 15/3/14)
Community Needs - focused
• Community Capability Model IG
• Engagement IG
• Clouds in Developing Countries IG
Reference and Sharing - focused
• Data Citation IG
• Data Categories and Codes WG
• Legal Interoperability IG
Data Stewardship - focused
• Research Data Provenance IG
• Certification of Digital Repositories IG
• Preservation e-infrastructure
• Long-tail of Research Data IG
• Publishing Data IG
• Domain Repositories IG
• Global Registry of Trusted Data Repositories and Services IG
Base Infrastructure - focused
• Data Foundations and Terminology WG
• Metadata Standards WG
• Practical Policy WG
• PID Information Types WG
• Data Type Registries WG
• Metadata IG
• Big Data Analytics IG
• Data Brokering IG
Domain Science - focused
• Toxicogenomics Interoperability IG
• Structural Biology IG
• Biodiversity Data Integration IG
• Agricultural Data Interoperability IG
• Digital History and Ethnography IG
• Defining Urban Data Exchange for Science IG
• Marine Data Harmonization IG
• Materials Data Management IG
First RDA Infrastructure Deliverables coming this Fall
Data Type Registries WG
• Deliverables: System of data type registries, formal model for describing types, working model of a registry.
• Initial Adopters and Users: CNRI, International DOI Foundation, Deep Carbon Observatory
Practical Code Policies
• Deliverables: Survey of policies in production use, testbed of machine actionable policies, deployment of 5 policy sets, policy starter kits
• Initial Adopters and Users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT
Persistent Identifier Information Types
• Deliverables: Minimal set of PID types, API
• Initial Adopters and Users: Data Conservancy, DKRZ
Language Codes
• Deliverables: Operationalization of ISO language categories for repositories.
• Initial Adopters and Users: Language Archive, Paradisec
Data Foundations and Terminology
• Deliverables: Common vocabulary for data terms, formal definitions and open registry for data terms
• Initial Adopters and Users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS
Metadata Standards
• Deliverables: Use cases and prototype directory of current metadata standards starting from DCC directory
• Initial Adopters and Users: JISC, DataOne
RDA/US: Collaborate Globally, Contribute Locally
• NSF-supported RDA/US initiatives:
  • Outreach (RDA RDA/US)
  • RDA Deliverables Amplification
  • Student / Early Career Engagement
• RDA/US Steering Committee
  • Fran Berman, RPI
  • Larry Lannom, CNRI
  • Mark Parsons, RPI
  • Beth Plale, IU
RDA/US Goals:
• Contribute to RDA “international” efforts and leadership
• Bring US efforts to broader RDA community
• Build the RDA community within the US
• Leverage and implement RDA deliverables in the US to amplify impact
• Collaborate closely with other RDA “regions” on key programs and initiatives
RDA/US Opportunities for Students and Early Career Professionals
• RDA/US Interns
  • $5K for summer of work/mentorship with RDA Interest or Working Group
  • Interns attend Fall Plenary ($2500 participant support) and present a poster on their project
  • Interns attend a kick-off meeting at the beginning of the summer.
• RDA/US Fellows
  • Fellows engage with an RDA WG/IG and attend 3 Plenaries ($2.5K per Plenary participant costs)
  • First Plenary: Identify a group to work with
  • Second and Third Plenaries: Present interim and final progress on common efforts
Sustainable Stewardship to Support Data-Driven Innovation
• Global Data Infrastructure: How do we accelerate open access data sharing and exchange?
• National Data Infrastructure: How do we support stewardship and preservation of publicly accessible research data?
Increasing R&D Agency Requirements for Data Access and Management
Research Data Infrastructure particularly important
Publicly Accessible Data has to Live Somewhere
Public Access, Use, and Re-Use of Data Now and in the Future Presupposes Sustainable Stewardship Today
• Stewardship and Preservation are critical: “Homeless” data ceases to exist
• Economically sustainable data infrastructure necessary to support
  • Federally mandated data management plans
  • Public access to research data
  • Use and re-use
  • Reproducibility
• The “bigger”, more long-term, more complex, or more valuable the data is, the greater the importance of sustainable data stewardship and infrastructure
It’s Not Just “Big Data” and It’s Not Just the Cost of Storage.
Data Management, Stewardship, and Use Incur Continuing Infrastructure Costs
• Costs include
  • Maintenance and upkeep
  • Software tools and packages
  • Utilities (power, cooling)
  • Space
  • Networking
  • Security and failover systems
  • People (expertise, help, infrastructure management, development)
  • Training, documentation
  • Monitoring, auditing
  • Reporting costs
  • Costs of compliance with regulation, etc.
Resources and Resource Refresh
• Most valuable data replicated
• As research collections increase, storage capacity must stay ahead of demand
Figure: SDSC Data Storage Growth ‘97-’09. Information courtesy of Richard Moore, SDSC
Economics of Public Access: Who Pays the Data Bill?
Article: Science Magazine, August 9, 2013. Free public access link at http://www.cs.rpi.edu/~bermaf/
Op-Ed Recommendations: Partner Across Sectors to Distribute the Preservation and Stewardship Responsibilities
Images: Charleston Ballet blog, http://allianceblog.org/tag/charleston-ballet/; iTunes gift card
Value Proposition: Why Data Infrastructure Is Important
The Research landscape is changing
• Data is accelerating new innovation and discovery
• Greater need for access, ease-of-use, interoperability of data
• Traditional modes of research recognition evolving: new approaches to collaboration / competition, publication, citation, analysis all involve digital data
The Educational landscape is changing
• University curricula becoming more data-driven
• Increasing integration of on-line / on-site options supported by data infrastructure
• More digital monitoring, tracking, accountability needed; more policy and regulation involving digital data
The Workforce is changing
• More data literacy required from everyone
• More data science embedded in everything
• Data scientists increasingly critical for competitiveness and leadership
Image: CAIDA Internet visualization; Article: HBR October 2012
Your part: Things you can do on Monday morning
Small steps:
• If you don’t have one, create a data management plan for your current project that covers a reasonable, fixed period of time
• Make your data available to the community (as appropriate) by curating it and ingesting it into a publicly accessible repository
• Cite and publish your data when you write about your results
• Join the RDA and get involved in (or start) an Interest Group or Working Group that will help you develop needed data infrastructure.