Synchronising Diversely Implemented Databases to Support Administration of Clinical Research. Stuart Anderson Mark Hartswood Conrad Hughes CRISP (Clinical Research Information Systems Project) School of Informatics University of Edinburgh. Many administrative databases, much the same data.
Many administrative databases, much the same data Project R#30 Title “A very important study” …
Share the data automatically! Project R#30 Title “A very important study” …
Research organisations • Research Organisations (ROs) in NHS Lothian administering clinical research projects: • NHS Research & Development Office • Wellcome Trust Clinical Research Facility (WTCRF) • Scottish Cancer Research Network (SCRN) • Experimental Cancer Medicine Centre (ECMC) • NHS R&D involved in all projects, at least in terms of handling approvals
Project meta-information • Project title • Project start and end dates • Project ethics status and research approval • Project sponsors, funders and finance data • Project personnel • Sponsor and personnel contact details • Patient lists and activity records • … Supposedly the same data, but in different databases
A CRISPy Opportunity • We could: • Reduce data entry costs • Improve data quality • Improve awareness of activity • ...if we find ways to share common data between databases • Suits government “bureaucracy busting”
Options • Looked at commercial solutions: • Some didn’t understand the complexity and risks (e.g. rsync in two directions) • Competent-sounding ones were prohibitively expensive (e.g. £170k per site) • Our solution: DIY approach using free software
Harmony • Document synchronisation framework • By Benjamin Pierce et al.: http://www.seas.upenn.edu/~harmony • Reconciles changes made to multiple disconnected structured documents containing the same data (or subsets thereof - the “view update” problem), e.g. • Internet browser bookmarks files • Calendar applications • Strong theoretical approach with emphasis on provable safety: changes only propagated under well-defined circumstances
Overview of Harmony • Inputs: Archive (~Old X), RO1’s Document X, RO2’s Document X • Harmony • Outputs: Updated Document X (RO1), Updated Document X (RO2), Log of changes and conflicts, New Archive
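The data flow in the overview (archive plus two documents in; two updated documents, a conflict log and a new archive out) can be illustrated with a minimal three-way reconcile over flat records. This is a simplified sketch and not Harmony's actual algorithm or API: the function names `reconcile` and `_set`, the flat field-to-value records, and the use of `None` for an absent field are all assumptions made for illustration.

```python
def _set(record, field, value):
    """Store value in record, treating None as 'field absent' (assumption)."""
    if value is None:
        record.pop(field, None)
    else:
        record[field] = value


def reconcile(archive, doc_a, doc_b):
    """Three-way reconcile of two flat records against a shared archive.

    A change is propagated only when exactly one side differs from the
    archive; if both sides changed a field differently, the field is
    reported as a conflict and left untouched everywhere.
    """
    new_a, new_b, new_archive = dict(doc_a), dict(doc_b), dict(archive)
    conflicts = []
    for field in sorted(set(archive) | set(doc_a) | set(doc_b)):
        old, va, vb = archive.get(field), doc_a.get(field), doc_b.get(field)
        if va == vb:
            # Both sides agree (equality): just record the agreed value.
            _set(new_archive, field, va)
        elif va == old:
            # Only doc_b changed: propagate its value to doc_a.
            _set(new_a, field, vb)
            _set(new_archive, field, vb)
        elif vb == old:
            # Only doc_a changed: propagate its value to doc_b.
            _set(new_b, field, va)
            _set(new_archive, field, va)
        else:
            # Both changed, differently: conflict for humans to resolve.
            conflicts.append((field, va, vb))
    return new_a, new_b, new_archive, conflicts
```

Leaving conflicting fields untouched, rather than picking a winner, mirrors the emphasis on propagating changes only under well-defined circumstances.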
Harmony operation: Equality After running Harmony:
Project conflicts • 600 pre-roll-out conflicts to resolve; these examples are fairly trivial
Provenance issues? • Trust • Alignment • Form and meaning • Authority • Control
Trust • Organisations are allowing other participants to write to their databases • Do you trust them? • Alignment of goals • Need to establish confidence in each other’s procedures and practices • Established through regular meetings • Others might know more than you do
Alignment: record identity • Need to identify which records in different databases refer to the same project, funding body or person • Use R&D Number, assigned by NHS R&D, for projects • Creation complicated because projects may initially be entered (without R&D#) by ROs • Deletion complicated because some projects may leave scope but no projects should really be deleted • Funding bodies and persons are handled more loosely • Identity and duplication less critical here
Syncing two database tables: establishing identity • [Figure: Database 1 and Database 2, each with a shared key (SK) column holding the values 3, 7 and 9 in different row orders; unique shared keys identify records across databases] • Synchronising tables across two databases depends on having a unique shared key. This value has to be guaranteed to be unique within each table, and to identify corresponding records uniquely across databases.
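A minimal sketch of the shared-key requirement, assuming each table is a list of dict rows and that the key column is called `rd_number` (a hypothetical name, as are `index_by_key` and `align`). It checks uniqueness within each table, then pairs up corresponding records regardless of row order.

```python
from collections import Counter


def index_by_key(rows, key="rd_number"):
    """Index rows by shared key, refusing tables where the key repeats."""
    counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"shared key not unique: {duplicates}")
    return {row[key]: row for row in rows}


def align(rows1, rows2, key="rd_number"):
    """Pair records that share a key; report keys present on one side only."""
    table1, table2 = index_by_key(rows1, key), index_by_key(rows2, key)
    both = sorted(table1.keys() & table2.keys())
    only1 = sorted(table1.keys() - table2.keys())
    only2 = sorted(table2.keys() - table1.keys())
    pairs = [(table1[k], table2[k]) for k in both]
    return pairs, only1, only2
```

Records found on one side only are returned rather than created or deleted automatically, since creation and deletion are the complicated cases.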
Do they have the same meaning? • Start/end date • Approval? Funding? Recruitment? Analysis? • Often driven by reporting requirements • Some fields too contentious, not useful to share, so not included in sync • Option to synchronise separate meanings as separate fields • Get parties to agree on common meanings • Valuable communications exercise among participants
Shared meaning = shared form? • Field types/sizes • Field values • N/A na None No Pending • Funder classification varies from DB to DB • Personnel roles • One column per role or one row per role? • Some adjustment and convergence possible to participants’ databases • Transform data to “standard” on export/import
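Transforming to a “standard” on export could look like the sketch below. The set of spellings treated as missing, the column names and the function names are all assumptions for illustration; in particular, whether values such as “No” or “Pending” count as missing is a decision for the participants, so they are passed through unchanged here.

```python
# Spellings treated as "no value" (hypothetical; to be agreed by participants).
MISSING_SPELLINGS = {"n/a", "na", "none", ""}


def normalise_value(value):
    """Map the various spellings of 'no value' to None; trim everything else."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in MISSING_SPELLINGS else text


def roles_to_rows(record, role_columns, key="rd_number"):
    """Convert one-column-per-role into one-row-per-role tuples."""
    rows = []
    for role in role_columns:
        person = normalise_value(record.get(role))
        if person is not None:
            rows.append((record[key], role, person))
    return rows
```

For example, a record with `principal_investigator` set to a name and `research_nurse` set to `na` yields a single (project, role, person) row for the investigator.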
Authority • Harmony is symmetric: no peer to a sync gets priority • Some information should only be sourced by R&D (responsible for approvals) • Some information is best sourced by ROs (personnel, funding) • But: • Databases involved don’t record sources of information • Strict rules impair usability and make for an unpopular (and unused) system • Solution: • Emphasise audit over control • But provide limited inter-site control at data import stage
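One way to picture “audit over control, with limited control at import” is the sketch below. It is an illustration only, not the project's import script: the field names in `RD_ONLY_FIELDS`, the source label `"R&D"` and the function `apply_incoming` are hypothetical. Every incoming change is logged, and only changes to a small set of R&D-owned fields are refused when they come from elsewhere.

```python
# Fields that only R&D should source (hypothetical names).
RD_ONLY_FIELDS = {"ethics_status", "research_approval"}


def apply_incoming(local, incoming, source, audit_log):
    """Apply incoming field changes to a local record, auditing each one.

    Changes to R&D-owned fields are accepted only when source is "R&D";
    all other changes are accepted. Nothing is silently dropped: refused
    changes still appear in the audit log with accepted=False.
    """
    updated = dict(local)
    for field, new_value in incoming.items():
        old_value = local.get(field)
        if new_value == old_value:
            continue  # no change, nothing to audit
        accepted = field not in RD_ONLY_FIELDS or source == "R&D"
        audit_log.append({
            "field": field,
            "old": old_value,
            "new": new_value,
            "source": source,
            "accepted": accepted,
        })
        if accepted:
            updated[field] = new_value
    return updated
```

Keeping the restricted set small leaves most fields open to every participant, in line with the point that strict rules make for an unpopular system.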
Control • Each database contains organisation-specific (and private) information • Some content is just irrelevant to others • Some patient data! • Solution: import/export script run locally by each organisation only exports a chosen subset of tables, rows and columns
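A minimal sketch of such an export step, assuming the database is available as a dict of table name to list of row dicts and that the policy format shown here (a `columns` list plus an optional `row_filter` per table) is hypothetical. Anything not named in the policy never leaves the organisation.

```python
def export_subset(database, policy):
    """Export only the tables, rows and columns named in the policy.

    database: {table_name: [row_dict, ...]}
    policy:   {table_name: {"columns": [...], "row_filter": callable}}
    Tables missing from the policy (e.g. patient data) are not exported.
    """
    exported = {}
    for table, rule in policy.items():
        keep = rule.get("row_filter", lambda row: True)
        exported[table] = [
            {column: row.get(column) for column in rule["columns"]}
            for row in database.get(table, [])
            if keep(row)
        ]
    return exported
```

Because the policy is an allow-list, a newly added private table or column stays private until someone deliberately adds it to the export.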
Benefits • Data only entered once for all • Everyone takes responsibility for data they’re most expert in • Disagreement (“conflict”) is permitted, and may be resolved through human-human communication • Limited (inter-site) audit operating so expect/hope for responsible behaviour
Conclusion • Real data synchronisation application has been far from the theoretical ideal • Issues of alignment, scope, identity, policy, trust, data quality, form and meaning • Solutions to problems encountered aren’t just technical: organisational engagement and trust have been essential in keeping the task tractable • Rolling out now, so reality yet to be seen • Depends on fair balance of effort and reward among participants
Thank you! conrad.hughes@ed.ac.uk School of Informatics University of Edinburgh