Sharing of social science and humanities data. Professor Denise Lievesley, Head of School of Social Science and Public Policy, King’s College London, and Chair, European Statistical Advisory Committee.
Principles • Policy • Practice • Partnerships
Principles • Scientific principle – research findings together with the data should be available for others to refute, confirm, clarify, or extend the results – part of public accountability • Responsibility to funders and to society to use resources efficiently (data are often under-exploited) • Important to reduce response burden • Increasing international responsibilities
Scientific paradigm • Many codes of professional conduct espouse these principles, e.g. • The International Statistical Institute’s Declaration on Professional Ethics states that “A principle of all scientific work is that it should be open to scrutiny, assessment and possible validation by fellow scientists.”
Two important publications • Fienberg S., Martin and Straf (1985) ‘Sharing research data’, National Academy Press • Arzberger P., Schroeder, Beaulieu, Bowker, Casey, Laaksonen, Moorman, Uhlir, Wouters (2004) ‘Promoting Access to Public Research Data for Scientific, Economic, and Social Development’, Data Science Journal
“Publicly funded research data are a public good, produced in the public interest. As such they should remain in the public realm. Availability should be restricted only by legitimate considerations of national security restrictions; protection of confidentiality and privacy; intellectual property rights; and time-limited exclusive use by principal investigators.”
“In recent years, the debate on e-science has tended to focus on the “open access” to the digital output of scientific research, namely, the results of research published by researchers as the articles in the scientific journals. This focus on publications often overshadows the issues of access to the input of research - the research data, the raw material at the heart of the scientific process and the object of significant annual public investments. In terms of access, availability of research data generally poses more serious problems than access to publications.” Arzberger et al (2004)
Reduction of response burden • Compliance costs are important, especially in small countries and in surveys of elites, businesses and institutions • Fresh data collection takes time and resources • Secondary data analysis can take place in a resource-constrained (including time-constrained) environment
Conclusion • Deliberate replication is to be encouraged • Duplication in ignorance of previous research is to be abhorred • There is growing awareness that failure to exploit the full potential of data has costs for society, and many institutions and agencies now espouse the aim of ensuring that data are used as extensively as possible.
Importance of establishing policies on data access, sharing and preservation by • funding agencies • universities or university consortia • professional societies • data producers • Policies need an implementation plan, which must pay attention to the sticks and carrots and to the means of achieving the plan
Example policy – UK Economic and Social Research Council • limits new data collection • encourages secondary analysis • requires deposit of new data and derived data in the UK Data Archive • determines the date for deposit • sets standards for documentation • provides resources for data access and preservation • builds data commons • funds data use workshops
Barriers to data access • legal obstacles, especially with respect to confidentiality and commitments to respondents • technical and financial obstacles, including in-house capacity to handle the complex aspects of micro-data dissemination such as data anonymisation • political obstacles • psychological obstacles: the tendency to control access, perhaps because of concerns over misinterpretation of the data or because ‘data is power’
Incentives in academic system • In 1985 the report of the US Committee on National Statistics pointed out that “A scientist is recognised and rewarded through the scientific community and its institutions. Researchers will have greater incentives to share data if the community and its institutions foster the idea that the practice advances science and is part of what is recognised as necessary and proper scientific behaviour”. • Competition, performance targets, etc.
Policies must pay attention to the responsibilities of data users • acknowledge and give credit • respect conditions of access • use data responsibly • provide feedback on use • Value and role of data intermediaries
Benefits to universities of sharing data • Development of knowledge • Encourage greater exploitation of data and therefore greater impact • Contribute to sound policy decisions • Foster multiple perspectives on data • Facilitate comparative research • Create knowledgeable data community • Provide feedback on data and improve data quality • Improve citations and competitiveness • Improve quality and relevance of teaching
Putting the plan into practice • Promotion of the plan • Clear guidance for data producers • Resources • for providing access • for preservation
Access – one size doesn’t fit all • Needs of users/usages differ • especially in relation to their sophistication and the need for individual level data • Data sets vary especially in relation to sensitivity of content and possibility of disclosure • Particular challenges are posed by • Integrated, longitudinal data • Qualitative data • Administrative data • Cross-national data
Shared resources? • Centralisation v. disseminated model • Specialised services v. generic • Delivering data remotely v. ‘safe havens’
Partnership – with data intermediaries • For both technical work and advocacy, partnership across the data archiving, data librarian, statistical and research communities is to be encouraged • Preservation • Metadata and documentation • Providing access • Keeping records • Running user training
Preservation is essential • Having collected data at some cost to the taxpayer, it behoves us to manage them well. • Alongside dissemination, this entails data preservation. • Due to poor data management, human error as well as technical change and inadequate use of technology, many data sets are no longer readable. • Thus all that remains of this important legacy are the, often quite superficial, reports or papers that were produced at the time. • To this extent an important part of our heritage is lost and we are severely limited in our analysis of change.
Long term preservation of electronic material is not a straightforward task especially with data sets which have embedded software • It can be hard to persuade financial authorities to spend money on the preservation of data for historians and researchers of the future, when there are so many pressing problems today.
Partnerships – with government data agencies • to broaden data use and reuse; • to foster diversity and deepen the quality of data analysis, thereby extracting more information from the data; • to add value to data by bringing subject-matter knowledge to data analysis; • to improve data quality (data analysts can and often do detect errors in data, and when they provide feedback to statistical agencies this can lead to improvements in future data collection)
Such agencies aim to graduate from being data producers to generators of information and knowledge • attention to data collection at expense of generation of information and knowledge • collection costly and difficult • importance of quality of data • mountains of data – insufficiently processed and analysed • most people not adept at understanding data • important for government agencies to get involved in interpretation and use of information
It is the responsibility of official agencies to ensure that the widest possible use is made of data, consistent of course with the legal constraints and ethical undertakings. • Partnership with universities is a key way of enabling them to deliver on this responsibility.
Case study – building the Secondary Uses Service • National Health Service in England: individual patient care records • Conducting audits of clinical practice • Surveillance of infectious diseases • Management of the health system • Monitoring equity of access and provision • Evidence-based health policy • Providing better information to the general public • Improving the quality and safety of care
Aim of SUS – to promote the widest possible informed use of the data whilst maintaining trust in the system • Hierarchy of data access consistent with ensuring lowest risk of patient identification • Need to know • Role of honest brokers and safe havens • Development of ‘virtual’ safe havens
Information governance of Secondary Uses Service • aggregate data widely available • anonymised by default, or pseudonymised • if identifiers are needed, consent should be obtained • full justification in terms of benefits to be made for exceptions • exceptions assessed by a transparent, equitable, replicable and open process involving patients’ representatives • requirement for safety and security of information (i.e. accountability)
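The "anonymised or pseudonymised by default" rule above can be illustrated with a minimal sketch. It is not a description of how the Secondary Uses Service works: the field names, the secret key and the choice of a keyed hash (HMAC-SHA256) are all assumptions made for illustration. Note too that removing direct identifiers does not on its own rule out re-identification from the remaining fields.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (an assumption for this sketch).
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names; real record layouts will differ.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonymise_id(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash of an identifier (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and replace the patient id with a pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymise_id(record["patient_id"], key)
    return cleaned


if __name__ == "__main__":
    # Entirely made-up example record.
    example = {
        "patient_id": "A123",
        "name": "Example Person",
        "address": "1 Example Street",
        "diagnosis_code": "J45",
    }
    print(pseudonymise_record(example))
```

Because the same identifier and key always give the same pseudonym, records for one person can still be linked across extracts, while reversing the mapping requires the key held by the custodian (the "honest broker" role in this sketch).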
Partnerships – internationally • Data archives • Cochrane Collaboration • Campbell Collaboration • National Library of Health • Communities of practice • Principle of reciprocity
Results of a meta-analysis • Collation of the results of many studies contradicts this advice • Extract from publicity prepared for the UK ‘Reduce the Risk’ Campaign (early 1990s) “The risk of cot death is reduced if babies are not put on the tummy to sleep. Place your baby on the back to sleep. ….Healthy babies placed on their backs are not more likely to choke.”
Iain Chalmers • “No doubt like millions of his other readers, I passed on and acted on this apparently rational and authoritative advice.” • “We now know that the advice promulgated so successfully in Spock's book led to thousands, if not tens of thousands, of avoidable cot deaths.” (Letter to BMJ)
Communities of practice • International Social Survey Programme • CROP – the Comparative Research Programme on Poverty, whose major aim is to produce sound and reliable knowledge which can serve as a basis for poverty reduction • RENCORE – to encourage and enhance comparative empirical research on individual, national and institutional level data from the states of western, central and eastern Europe • Cleveland conference on education research • African Programme on Rethinking Development Economics
Concluding remarks • Social scientists and humanities researchers are involved in the creation of a diverse range of datasets, many of which are unique, rich in information content and incapable of replication. Sharing allows scientists to extend the value of these datasets through new, high quality, ethical research and exploitation. It also reduces unnecessary duplication of data collection. Building preservation systematically into routine data management is part of good research practice: it strengthens quality, enables replication and audit, and provides a sound basis for data sharing.
Research data grow in value the more they are used, unlike most commodities which are diminished with use.