290 likes | 353 Views
Research Databases for NRES. London 29 th Feb 2012. JHC roles. Research chair at UoN –epidemiology, risk prediction and drug safety Member of the ECC NIGB Developed and run the not-for-profit QResearch database with EMIS Inner city GP. Outline. Background Key ethical issues
E N D
Research Databases for NRES London 29th Feb 2012
JHC roles • Research chair at UoN –epidemiology, risk prediction and drug safety • Member of the ECC NIGB • Developed and run the not-for-profit QResearch database with EMIS • Inner city GP
Outline Background Key ethical issues Scientific Confidentiality Example of QResearch Data linkage and pseudonymisation Discussion /questions
Background • Large volumes of electronic data now collected in the NHS • Huge potential for useful research • Technology exists to extract data and assemble it into databases • Databases popular with academics and DH • Large numbers for studies • Relative efficiency • Increasing potential for data linkages
Definition research database in NRES SOP • “a structured collection of individual level personal information, which is stored for potential research purposes beyond tehh life of a specific research project with defined end points” • Includes databases set up for research • Re-use of databases established for • - audit • - disease registers
Research databases • Included in NRES SOPs • Specific section within IRAS form • Approvals generally for 5 years renewable • Can include generic approval • Can include providing data to third parties as part of a research service • Detailed protocol required on purpose, operation, methods, policies, governance
New research databases • What is the purpose? • Do we need a new one or can an existing database be used? • Who is will ‘own’ it and be responsible for it? • What data will it contain and how will it be accessed? • What is the governance framework ? • Will it contain identifiable data +/- consent? • ? S251 support required
Key objectives for safe data sharing Maximise public benefit Patient and their data Minimise risk Privacy Maintain public trust
Three main options for data access s251 Maximise public benefit consent Pseudo nymisation Patient and their data Minimise risk Privacy Maintain public trust
De-identification • Various methods to reduce identifiability of data • Pseudonymisation • Use of samples and limited data items rather than whole database • Conversion of dob to year of birth or age. • Contracts/data sharing agreements with clear liabilities and penalities
Example for QResearch • Established in 2002 to support ethical medical research • Largest of three UK databases & expanding • Management board – UoN and EMIS. • Advisory board – professional and lay representation. Advises on policy, strategy etc. • Scientific Board – review science and risk assessment.
QResearch key facts • Large pseudonymised database • >700 GP practices, 14 million patients • Patient and event level data • Demographics – year birth, sex, ethnicity • Diagnoses, Lab results, clinical values • Medication , referrals • No free text. No strong identifiers • All research peer reviewed & published.
QResearch uploads • informed consent from practice • Practice displays notice in waiting room • Practice activates upload software • Data pseudonymised BEFORE data leaves practice • Patients can be opted out of upload • Secure upload to server at EMIS with full NHS security clearance • Backups delivered to University
QResearch - security • Full database stored on off line server • Full encryption of hard drive • Key padded server room with limited access • 24 hour CCTV with monitoring • Confidentiality clauses in staff contracts • Full log of all data accesses • Log of all uses of data • No losses data or breaches in 10 years
QResearch policy • Whilst all data are pseudonymised, we have same safeguard as it identifiable • To minimise any risks of re-identification patients (and practices) • To maintain public and professional trust • Explicit policy to ensure all results of research studies are widely and freely available for public benefit.
Researcher access • University based academics • One must be GMC registered • Standard application form • Clarify research question and methods • Independent Scientific review • Provided with sample size and data items needed to answer question • Data only used for agreed purpose • Data destroyed after project completed
Why is it important to ensure robust scientific methods • Published research must give valid results which don’t mislead or misinform doctors, patients, policy makers • Equally need to avoid unpublished research – eg a good study with important results • Avoid duplication effort • Avoid publication bias • Avoid suppression of unpopular results (eg side effects medicines)
Ensuring scientific quality • Is there a clear research question? • Can the data answer the question? • Are the methods scientifically valid? • Are the results likely to be generalisable? • Does team have skills to do the project • Is the researcher free to publish? • Some databases with generic REC agreement will organise independent scientific review to answer the above.
Risk to confidentialty • Each study needs risk assessment even if pseudonymised • Could the study lead to identification of the patients because of • - other data that the researcher might have • - small numbers/rare events • Minimise risk by de-identification data • Data sharing agreement & sanctions for misconduct .
QResearch data linkage study • Linked to deprivation in 2002 • Linked to ONS cause death in 2007 • Currently being linked to HES and cancer registry • Testing out new method of data linkage using pseudonymised data linkage • Exceptionally high levels of valid, complete NHS numbers for ONS data, HES, GP data
Open pseudonymiser project • Need approach which doesn’t extract identifiable data but still allows linkage • Legal ethical and NIGB approvals • Secure, Scalable • Reliable, Affordable • Generates ID which are Unique to project • Can be used by any set of organisations wishing to share data • Pseudoymisation applied as close as possible to identifiable data ie within clinical systems
Pseudonymisation: method • Scrambles NHS number BEFORE extraction from clinical system • Takes NHS number + project specific encrypted ‘salt code’ • One way hashing algorithm (SHA2-256) – no collisions and US standard from 2010 • Applied twice - before leaving clinical system & on receipt by next organisation • Apply identical software to second dataset • Allows two pseudonymised datasets to be linked • Cant be reversed engineered
Web tool to create encrypted salt: proof of concept • Web site private key used to encrypt user defined project specific salt • Encrypted salt distributed to relevant data supplier with identifiable data • Public key in supplier’s software to decrypt salt at run time and concatenate to NHS number (or equivalent) • Hash then applied • Resulting ID then unique to patient within project
Openpseudonymiser.org • Website • Desktop application • Software for integration • Test data • Documentation • Utility to generate encrypted salt codes • Source code GNU GPL
Progress so far • Pseudonymised entired • HES database since 1997 • Cause of death data since 1993 • Cancer registrations since 1990 • Linked all three datasets based only on pseudo NHS number - >99% complete • Due to linked GP data Spring 2012 • Implementing into major GP computer systems
Key points • Pseudonymisation at source • Instead of extracting identifiers and storing lookup tables/keys centrally, then technology to generate key is stored within the clinical systems • Use of project specific encrypted salted hash ensures secure sets of ID unique to project • Full control of data controller • Can work in addition to existing approaches • Open source technology so transparent & free
Definition of clinical care team • Important as determines whether s251 required • Tendency by research community to adopt v broad definition to justify access • Definition is tricky as a guide • Individual has a duty of care to patient • Has duty of confidence • Would be recognised in that role by a reasonable patient
Implications of Open Data • VERY difficult to see how patient level data can be suitably de-identified so that it can published on line to meet Cameron’s promises • Current work on de-identification standard by IC/DH to help custodians decided when data can be published.