
FAIR data and Open Science in Africa - the need for a holistic approach

This article explores the importance of adopting FAIR data principles and promoting open science in Africa. It discusses the benefits of data sharing, good data practices, and the significance of data skills in advancing research and decision-making. The text emphasizes the need for a collaborative approach to enhance data interoperability, foster innovation, and accelerate scientific discovery. It highlights key initiatives by CODATA, such as the development of data policies, data science education, and regional open science platforms.



Presentation Transcript


  1. FAIR data and Open Science in Africa - the need for a holistic approach Simon Hodson Executive Director CODATA www.codata.org Science Ouverte au Sud: enjeux et perspectives Université Cheikh Anta Diop de Dakar, Sénégal 24 October 2019

  2. CODATA, the Committee on Data of the International Science Council • CODATA supports ISC’s mission of advancing science as a global public good: http://bit.ly/ISC-Science-Action-Plan • CODATA promotes global collaboration to make data work to improve science to support decision-making • CODATA develops policy recommendations to enhance data sharing and re-use, promotes good data practices, explores frontier issues in data science, and advances education and training in data skills. • CODATA membership: national committees, international scientific unions and associations, and organisations. • CODATA is a community of interest and expertise. We act strategically through ISC supported initiatives, through projects and Task Groups.

  3. Data Policies Data Science Data Skills Data Good Practices • CODATA Data Policy Committee http://bit.ly/data-policy-committee; • One major policy report per year. • 20-Year Review of GBIF currently underway. • New Centre of Excellence in Data for Society being set up at University of Arizona. • CODATA-RDA School of Research Data Science. • CODATA China, PASTD and other training activities. • #terms4FAIRskills and FAIRsFAIR Competence Centres. • Data Science Journal: https://datascience.codata.org/ • International Data Week and CODATA Conference series. • Task Groups and Working Groups. • Regional Open Science Platforms • Data Interoperability for Multi-Disciplinary Research. • Survey and recommendation of good practices.

  4. Open Science and FAIR Data • Good scientific practice depends on communicating the evidence. • Open research data are essential for reproducibility and self-correction. • Academic publishing has not kept up with the age of digital data. • Danger of a replication / evidence / credibility gap. • Boulton: to fail to communicate the data that supports scientific assertions is malpractice. • Open data practices have transformed certain areas of research. • Genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data… • FAIR data helps use of data at scale, by machines, harnessing technological potential. • Research data often have considerable potential for reuse, reinterpretation, use in different studies. • Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system. • Research data produced by publicly funded research are a public asset.

  5. Open Science and FAIR Data • CODATA President, 2014-2018, Geoffrey Boulton, lead author of ‘Science as an Open Enterprise’. • Significant driver is good scientific practice: ‘to fail to communicate the data that supports scientific assertions is tantamount to malpractice’. • Data should be intelligently open – and intelligibly open. • CODATA General Assembly Nov 2018, elected new CODATA President, Barend Mons and a new Executive Committee. • One of the originators of the FAIR principles and of the GO FAIR initiative. • FAIR is a neat encapsulation of what is needed for data to be usable at scale.

  6. A World that Counts: mobilising the data revolution for sustainable development • A World that Counts - A World that Matters. • A world that counts and measures itself in order to understand itself. • Measurement of things is a sine qua non to understanding. • Qualitative understanding is essential. • The things we measure are interrelated and often affected by the act of measuring. • Overemphasis on one or other metric can have very negative effects. • ‘[T]here is an urgent need to mobilise the data revolution for all people and the whole planet in order to monitor progress, hold governments accountable and foster sustainable development.’ • Need to upskill data providers and harness many other data sources. • Role of scientific community is essential. • Indicators and metrics are complicated and need to be viewed critically. They are interrelated and need to be examined as a system.

  7. FAIR

  8. (Wilkinson, M. D., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, 2016, http://dx.doi.org/10.1038/sdata.2016.18) Image CC-BY-SA by Sangya Pundir

  9. FAIR Guiding Principles • To be Findable: • F1. (meta)data are assigned a globally unique and persistent identifier • F2. data are described with rich metadata (defined by R1 below) • F3. metadata clearly and explicitly include the identifier of the data it describes • F4. (meta)data are registered or indexed in a searchable resource • To be Accessible: • A1. (meta)data are retrievable by their identifier using a standardized communications protocol • A1.1 the protocol is open, free, and universally implementable • A1.2 the protocol allows for an authentication and authorization procedure, where necessary • A2. metadata are accessible, even when the data are no longer available • To be Interoperable: • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. • I2. (meta)data use vocabularies that follow FAIR principles • I3. (meta)data include qualified references to other (meta)data • To be Reusable: • R1. (meta)data are richly described with a plurality of accurate and relevant attributes • R1.1. (meta)data are released with a clear and accessible data usage license • R1.2. (meta)data are associated with detailed provenance • R1.3. (meta)data meet domain-relevant community standards • (Wilkinson, M. D., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, 2016, http://dx.doi.org/10.1038/sdata.2016.18)

  10. Findable: have sufficiently rich metadata and a unique and persistent identifier, to enable discovery. • Accessible: retrievable by humans and machines through a standard protocol; authentication and authorization where necessary. • Allows programmatic access for analysis. • Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language for knowledge representation’. • The descriptions of variables etc follow a shared specification and are commensurable. • Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance. • Both humans and their analytical tools know what can be done with the data (license) and can assess its provenance. European Commission Expert Group, Chaired by Simon Hodson, Turning FAIR into Reality (2018) https://doi.org/10.2777/1524
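The four facets summarised above can be read as a checklist applied to a metadata record. The following Python sketch is illustrative only: the field names, the grouping of fields by facet, and the placeholder identifier are assumptions made for this example, not part of the FAIR principles or of any particular metadata schema.

```python
# Hypothetical metadata record. Field names and values are illustrative
# assumptions; the DOI below is a placeholder, not a real dataset.
record = {
    "identifier": "https://doi.org/10.1234/example-dataset",
    "title": "Example dataset",
    "access_protocol": "https",
    "vocabulary": "http://purl.org/dc/terms/",
    "license": "CC-BY-4.0",
    "provenance": "Collected 2019; cleaned with script v1.0",
}

# Assumed mapping from each FAIR facet to the fields this sketch expects.
FAIR_FIELDS = {
    "Findable": ["identifier", "title"],
    "Accessible": ["access_protocol"],
    "Interoperable": ["vocabulary"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(metadata):
    """Return, per FAIR facet, the expected fields that are absent or empty."""
    gaps = {}
    for facet, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not metadata.get(name)]
        if absent:
            gaps[facet] = absent
    return gaps


# A record without a license is flagged under "Reusable".
print(missing_fair_fields(dict(record, license="")))
```

A presence check like this says nothing about the quality of the metadata; it only shows that each facet translates into concrete, machine-checkable fields.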

  11. FAIR and Open • FAIR ≠ Open: FAIR and Open are distinct but complementary. • Drivers for FAIR: • not enough to make data Open by dumping it raw onto the Web • important to have a dialogue with research areas in which much data cannot be Open • FAIR is useful because it applies as much to data that MUST be restricted as to data that can be Open • FAIR does NOT detract from Open • Research data should be as Open as possible, Open by default.

  12. FAIR Digital Objects and the FAIR Ecosystem

  13. Funder and Donor Policies • Bill and Melinda Gates Foundation, Open Access and Open Data Policy https://www.gatesfoundation.org/how-we-work/general-information/open-access-policy • ‘Data Underlying Published Research Results Will Be Accessible and Open Immediately. The foundation will require that data underlying the published research results be immediately accessible and open. This too is subject to the transition period and a 12-month embargo may be applied.’ • MSF Data Sharing Policy: http://www.msf.org/en/msf-data-sharing-policy • ‘MSF recognizes the ethical imperative it has to share its data openly, transparently and in a timely manner for the greater public health good.’ • Appropriate restrictions for consent, privacy, etc. • European Commission Data Policy: ‘as open as possible, as closed as necessary’, FAIR Data • Wellcome Trust: strong support for Open Data sharing, with appropriate restrictions.

  14. Dryad Joint Data Archiving Policy • Dryad (ecology and evolutionary biology journals) Joint Data Archiving Policy, 2011 • [Journal] requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as [list of approved archives here]. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

  15. Journal Data Policies • PLOS Data Availability Policy, revised Feb 2014: http://www.plosone.org/static/policies.action#sharing • PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exceptions. • Springer Nature initiative to standardise policies: http://www.springernature.com/gp/group/data-policy/policy-types • RDA Interest Group developing standardised journal data policies. • See FAIRsharing https://fairsharing.org/ for policy listings.

  16. https://aasopenresearch.org/ African Academy of Sciences publishing collaboration with F1000 – open peer review and requirement to make data available.

  17. Big Data: Of course, the capacity to create huge volumes of extremely valuable data (Earth observation, sensor networks, environmental monitoring, human activity…) has huge scientific and societal potential. • Little Data: Data that are essential to understanding a given phenomenon can be very small (in volume) and require considerable stewardship and analysis. • NO DATA: The major challenge remains that for many issues we simply do not have the data we need!

  18. Slide Credit: Fernando Gouveia Reis and Laura Merson, IDDO Infectious Diseases Data Observatory, Oxford. IDDO collects and integrates clinical, laboratory and epidemiological data relating to a number of infectious diseases. Analysis of combined datasets increases the power to determine optimal treatments and to identify the most effective interventions in outbreaks.

  19. Slide Credit: Laura Merson, IDDO West African Ebola Outbreak, 2014-2016.

  20. Slide Credit: Laura Merson, IDDO West African Ebola Outbreak, 2014-2016. Pisani et al. (2018) estimate that 65% of study data were not made available or shared.

  21. Data aggregation is essential for research and action. • Barriers to data aggregation impede research and action. • An estimated 65% of data was not shared or made available (finding in E. Pisani et al., Data sharing in public health emergencies, Wellcome Trust, 2018). • Most data cannot be accessed directly at the record level (e.g. summarised in studies and not shared). • Most clinical records from the outbreak are PDF scans. • Lack of metadata (data / information about the data which allows the data to be discovered, aggregated, integrated). • Lack of a data dictionary (a set of definitions that allows the variables in the data to be understood). • Technically challenging to integrate and analyse trials data and clinical data, and other relevant data (e.g. genomic data, vector data, transport and environmental data etc). Slide Credit: Fernando Gouveia Reis and Laura Merson, IDDO
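To make the idea of a data dictionary concrete, here is a minimal Python sketch of a dictionary of variable definitions and a check of individual records against it. The variable names, definitions, and sample values are invented for illustration and are not taken from IDDO or from any outbreak dataset.

```python
# Hypothetical data dictionary: variable name -> definition and expected type.
DATA_DICTIONARY = {
    "patient_id": {"definition": "Pseudonymised patient identifier", "type": str},
    "age_years": {"definition": "Age at admission, in years", "type": int},
    "temperature_c": {"definition": "Body temperature at admission, in degrees Celsius", "type": float},
}


def check_record(row):
    """Return a list of problems found when comparing one record to the dictionary."""
    problems = []
    # Every variable defined in the dictionary should be present with the right type.
    for name, spec in DATA_DICTIONARY.items():
        if name not in row:
            problems.append(f"missing variable: {name}")
        elif not isinstance(row[name], spec["type"]):
            problems.append(f"wrong type for {name}: expected {spec['type'].__name__}")
    # Every variable in the record should be defined in the dictionary.
    for name in row:
        if name not in DATA_DICTIONARY:
            problems.append(f"undefined variable: {name}")
    return problems


# A record with an undefined column and a mistyped age is reported, not silently merged.
print(check_record({"patient_id": "P002", "age_years": "34", "temp": 38.5}))
```

Without shared definitions of this kind, the same check cannot be written at all, which is one reason aggregation across studies becomes so costly.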

  22. Data that characterise many of the factors influencing the progression of an outbreak are available, but remain isolated in silos within the various domain-specific communities, often with their own domain-specific formats, vocabularies and ontologies. • Availability of datasets from industry, the research community, national public health surveillance, climate and environmental monitoring systems, health systems administration, social media feeds, and animal health services will then be sought in order to understand how their integration can fill critical knowledge gaps across disciplines. Reports and lessons learned from previous infectious disease outbreaks have identified clinical, genomic, demographic, pathogen and vector surveillance, communications, land-use, health administration, and environmental data as powerful inputs to support planning and operationalising outbreak response. We can anticipate data in numerous formats such as tabular data in spreadsheets, CSV, TSV, and/or plain text, geospatial point-wise data, geographic data, and a variety of XML and JSON dialects. For the domains of interest, available ontologies will be sourced and compared to determine methods for integration and interchange. Slide Credit: Fernando Gouveia Reis and Laura Merson, IDDO
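As a rough illustration of what integrating such formats involves, the Python sketch below reads small CSV, TSV, and JSON samples and renames their columns to shared variable names. The column mapping and the sample values are made up for this example; real integration would rely on agreed vocabularies and ontologies rather than a hand-written lookup table.

```python
import csv
import io
import json

# Hypothetical mapping from source-specific column names to shared variable names.
COLUMN_MAP = {
    "pid": "patient_id", "PatientID": "patient_id", "id": "patient_id",
    "district": "location", "Location": "location", "loc": "location",
}


def harmonise(row):
    """Rename the keys of one record to the shared variable names."""
    return {COLUMN_MAP.get(key, key): value for key, value in row.items()}


def read_delimited(text, delimiter):
    """Parse CSV or TSV text into a list of harmonised records."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [harmonise(row) for row in reader]


def read_json(text):
    """Parse a JSON array of objects into a list of harmonised records."""
    return [harmonise(row) for row in json.loads(text)]


# Made-up samples: the same two variables under three different naming schemes.
csv_text = "pid,district\nP001,Site A\n"
tsv_text = "PatientID\tLocation\nP002\tSite B\n"
json_text = '[{"id": "P003", "loc": "Site C"}]'

combined = (
    read_delimited(csv_text, ",")
    + read_delimited(tsv_text, "\t")
    + read_json(json_text)
)
print(combined)
```

Parsing the formats is the easy part; the mapping table stands in for the much harder work of agreeing what each variable means across communities.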

  23. Open Data in a Big Data World (2015) • Science International Accord on Open Data in a Big Data World: https://council.science/publications/open-data-in-a-big-data-world • Supported by four major international science organisations. • Profound transformations mean that data should be: • Open by default: as open as possible, as closed as necessary • Intelligently open, FAIR data • Makes a case for a global response: leaving no-one behind. • Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved.

  24. Ignited at Science Forum SA • 2015 SFSA: Hosted the first meeting of Science International (ICSU, ISSC, TWAS, IAP), which led to the publication of an international accord on open data in a big data world. • 2016 SFSA: Launched the Pilot Phase of the AOSP as an outcome of the 2015 Science International meeting. • 2017 SFSA: Reported on progress in building an AOSP community of practice and challenges. • 2018 SFSA: Launch of the vision and strategy of the Operationalization of the AOSP.

  25. The AOSP Vision African scientists are at the cutting edge of contemporary, data-intensive science as a fundamental resource for a modern society. They are innovative global exponents and advocates of Open Science, and leaders in addressing African and Global Challenges.

  26. African Open Science Platform • See http://www.codata.org/strategic-initiatives/african-open-science and http://africanopenscience.org.za/ • Three-year pilot project funded by DST and NRF in South Africa, delivered by ASSAf, directed by CODATA. • Advocacy and community building. • Landscape survey of Open Science and data initiatives in Africa. • Framework documents on: data policies, incentives, training, RDM in institutions and data infrastructure. • Vision and Strategy Document: https://doi.org/10.5281/zenodo.2222418 • Federated network and cloud facilities. • Open science, FAIR and RDM tools. • Data Science and AI Institute. • Inter-disciplinary global challenge projects. • Community network for education and skills in FAIR data and Open Science. • Network for societal engagement and participation. • Delivery Phase Planning Workshop – First Members’ Meeting, Alexandria, Egypt, 2-3 September • Stakeholders: international donors, grant funding councils, academies, NRENs, pan-African research projects, data initiatives, university consortia… • Moving to phase two…

  27. Ecosystem for FAIR Data • Ecosystem of components which are created and governed internationally. • Reporting Research Outputs: information systems for research output reporting (CRIS), metadata standards e.g. CERIF, managed by euroCRIS. • Persistent and Unique Identifiers: DOIs for articles (CrossRef); DOIs for data sets (DataCite); author IDs (ORCID). • Data and Metadata Standards: CIF in crystallography, FITS in astronomy, DDI in social science surveys, Darwin Core in biodiversity, etc, etc. • DCC Registry of Metadata Standards http://www.dcc.ac.uk/resources/metadata-standards ; now maintained by RDA IG http://rd-alliance.github.io/metadata-directory/ • Data Repositories: listed in Re3Data, registry of data repositories: https://www.re3data.org/ • Trusted Data Repositories: Core Trust Seal https://www.coretrustseal.org/, a merger of Data Seal of Approval and the World Data System criteria.
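Persistent identifiers are useful partly because they can be checked by machine. The Python sketch below shows two such checks: a syntactic test for the common shape of a DOI, and the ISO 7064 MOD 11-2 check-digit calculation that ORCID iDs use. The function names are chosen for this example, and the DOI pattern is a heuristic assumption; it does not confirm that a DOI is registered or resolves.

```python
import re

# Heuristic for the common "10.<registrant>/<suffix>" DOI shape (an assumption
# made for this sketch; it does not cover every DOI ever issued).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value):
    """Return True if the string has the usual syntactic shape of a DOI."""
    return bool(DOI_PATTERN.match(value))


def orcid_checksum_ok(orcid):
    """Verify the final character of an ORCID iD using ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", digits):
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


# The DOI is the one cited earlier in this deck for the AOSP strategy document;
# the ORCID iD is the well-known test identifier used in ORCID documentation.
print(looks_like_doi("10.5281/zenodo.2222418"))
print(orcid_checksum_ok("0000-0002-1825-0097"))
```

Checks like these catch transcription errors early, before a broken identifier is written into a metadata record or repository listing.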

  28. Delivery Phase Planning Workshop • September 2019 • Meeting of stakeholders, funders. • Particularly important is the Granting Councils Initiative – route to sustainability. • AOSP strategy implementation towards the platform launch in 2020-1 • Hosted by Dr Ismail Serageldin, Emeritus Librarian, and member of the Board of Trustees of the Library of Alexandria; AOSP Co-Chair

  29. Regional Open Science Platforms… • Another strand of the draft ISC Science Action Plan. • Exploring similar initiatives in Asia Pacific Region and in Latin America. • Mission for the ISC Regional Offices. • Malaysian Open Science Platform: Malaysian Academy working with the five research • ISC ROAP will lead an APEC (Asia Pacific Economic Cooperation) project to survey and scope the Open Science landscape. • Exploring options for ASEAN or wider regional cooperation around Open Science.

  30. CODATA Conferences • VizAfrica 2019 Botswana: https://vizafrica.codata.org/2019-Botswana/ and http://bit.ly/VizAfrica-Botswana • University of Botswana, Gaborone, 18-19 November • Registration still open: https://vizafrica.codata.org/conference/2019-Botswana/register/ • Call for Applications to Host International Data Week 2021 or 2023, deadline: http://www.codata.org/news/338/62/Call-for-Applications-to-Host-International-Data-Week-2021-or-2023 • Next CODATA Conference, Oct-Dec 2020: TBA

  31. Data Skills and Training • Regular Beijing Data Science Training Workshops (last one Sept 2019) • FAIRsFAIR and GO Train activities: adding data stewardship and train-the-trainer components. • Initiative to develop a community recognised terminology for FAIR data competencies: #terms4FAIRskills • See: http://www.codata.org/fair-data-training and https://terms4fairskills.github.io/Announcement.html • CODATA-RDA Schools of Research Data Science: http://bit.ly/CODATA-RDA-data_schools • Film: https://vimeo.com/299263596 • Deadline for Pretoria, 31 October: http://bit.ly/data-school-pretoria • 2020: Pretoria, Abuja, Trieste… • 2019: Addis, Trieste, Trieste Advanced Workshops, Costa Rica. • 2018: Brisbane, Trieste, Trieste Advanced Workshops, Kigali, São Paulo • 2017: Trieste, Trieste Advanced Workshops, São Paulo • 2016: Trieste

  32. CODATA Early Career Data Scientists and Stewards • Aim to re-create an active group of Early Career Data Scientists and Stewards. • Alumni of Data Schools and Training Workshops. • Students > Helpers > Instructors > Directors… • Alumni Sara El Jadid and Marcela Alfaro now co-chairs of the Data Schools. • Felix Emeka Anyiam and others run the careers list. • Shaily Ghandi has been a helper at Trieste. • Shaily and Felix organised a school on urban data science https://sws.cept.ac.in/course-detail/urban-data-science-S19FT001 • Small grants to assist participation, running events, managing the careers list etc. • Sign up to CODATA lists for more information.

  33. Follow CODATA! • CODATA Website: http://www.codata.org/ • CODATA Blog: http://codata.org/blog/ • CODATA International News and Discussion List: http://bit.ly/CODATA-International-List • CODATA Data Science and Data Stewardship Careers List: http://bit.ly/CODATA_Careers_List • CODATA on Twitter: @CODATANews and @simonhodson99

  34. Thank you for your attention • Simon Hodson, CODATA • www.codata.org • simon@codata.org • @simonhodson99 ; @CODATAnews
