Development of a cross-institutional database and data management system for research, clinical management and quality indicators in GI disease. Jaroslaw Pillardy, Bioinformatics Facility, Cornell University; Andrew Talal, Center for the Study of Hepatitis C, Weill Cornell Medical College.
Cornell University • Bioinformatics Facility Provides state-of-the-art bioinformatics computational resources and analysis tools, and expertise in their application, to the university community and to outside investigators. • Research collaboration • Access to bioinformatics computational resources • Software, database and website development • Project design and data analysis consultation • Bioinformatics workshops and training
Development of a cross-institutional database and data management system for research, clinical management and quality indicators in GI disease. The database and data management system was initially developed for the Andrew Talal Lab at Weill Cornell Medical College. The system (Datamart) integrated tissue bank data and clinical data. Development started in 2008.
Presentation Overview • Overview of the Datamart • Current limitations and future development • Applications
What is Datamart? • Information Management System (IMS) with • Flexible user-defined data structure • Integration of existing data from multiple sources • Management of lab data (e.g. tissue bank, sequencing) • Web browser access • Point-and-click interface for all functions • Data export to file or database • Text data mining (in progress)
Datamart components: server Two parts: Web interface: ASP.NET C# application; requires Windows Server. Database: MS SQL database; requires MS SQL Server. The HepC database with 13,000 patients and 14,000,000 labs required 400 GB of disk space. Both components may run on the same computer.
Datamart components: client • Three major web browsers, plus Mozilla API based browsers, are supported: • Internet Explorer (7 or newer) • Firefox (3 or newer) • Safari (5 or newer) • Mozilla API based (1 or newer) • Most mobile browsers work; a mobile application may be developed to improve browsing on mobile devices if there is demand • The interface refuses to work with incompatible browsers
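A minimal sketch of the version gate described above. It assumes the browser family and major version have already been parsed from the request (User-Agent parsing is left out), and the names `MIN_VERSIONS` and `is_supported` are hypothetical; Datamart itself is an ASP.NET application and its actual check is not shown in this presentation.

```python
# Hypothetical table of minimum major versions, taken from the supported-browser list.
MIN_VERSIONS = {
    "internet explorer": 7,
    "firefox": 3,
    "safari": 5,
    "mozilla": 1,  # Mozilla API based browsers
}


def is_supported(browser: str, major_version: int) -> bool:
    """Return True if the browser family is listed and meets its minimum version."""
    minimum = MIN_VERSIONS.get(browser.strip().lower())
    return minimum is not None and major_version >= minimum
```

For example, `is_supported("Firefox", 3)` passes, while `is_supported("Internet Explorer", 6)` and any unlisted browser are refused, which is the behavior the last bullet describes.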
How is Datamart different? Meta-IMS: users define how data is stored and what the relations between forms are. Various data sources: integrates with hospital clinical databases. Internet access: accessible from any device with a web browser. Data mining: converts text-based records into numeric records. Using previous data: easy import of previous lab data from Access or Excel repositories.
Data structure defined by the user In a classical system, any change in data structure requires programmatic changes to the database and the interface. In Datamart, the data layout is defined by authorized users via the web interface. Data is stored in forms and fields. Each field may hold unrestricted data of a given type (txt, int, dec, etc.) or be restricted to a list of values (pulldown).
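To make the forms-and-fields idea concrete, here is a minimal sketch in which the schema is ordinary data that an authorized user could edit, rather than table definitions. The classes `Form` and `Field`, the type codes, and the example field names are hypothetical; the presentation does not describe Datamart's internal storage.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass
class Field:
    name: str
    data_type: str                       # "txt", "int" or "dec"
    choices: Optional[list[str]] = None  # set only for pulldown fields

    def validate(self, raw: str):
        """Convert a raw string to this field's type, or raise ValueError."""
        if self.choices is not None:
            if raw not in self.choices:
                raise ValueError(f"{self.name}: {raw!r} is not one of {self.choices}")
            return raw
        if self.data_type == "txt":
            return raw
        if self.data_type == "int":
            return int(raw)  # raises ValueError for non-integers
        if self.data_type == "dec":
            try:
                return Decimal(raw)
            except InvalidOperation:
                raise ValueError(f"{self.name}: {raw!r} is not a decimal") from None
        raise ValueError(f"{self.name}: unknown type {self.data_type!r}")


@dataclass
class Form:
    name: str
    fields: dict[str, Field] = field(default_factory=dict)

    def add_field(self, new_field: Field) -> None:
        """Adding a field is a data change, not a code change."""
        self.fields[new_field.name] = new_field

    def validate_record(self, raw: dict[str, str]) -> dict:
        """Check every submitted value against the form's current definition."""
        unknown = set(raw) - set(self.fields)
        if unknown:
            raise ValueError(f"{self.name}: unknown fields {sorted(unknown)}")
        return {name: self.fields[name].validate(value) for name, value in raw.items()}
```

With a hypothetical "Visit" form holding a pulldown `genotype` field and an `int` field `ALT`, `validate_record({"genotype": "1a", "ALT": "42"})` yields typed values, while a value outside the pulldown list raises `ValueError`.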
Accessing Data Browsing data with filters: data can be traversed using a browsing form, with filters on all form fields. Patient information can also be displayed if needed. Advanced search: custom queries, built online using the web interface.
Browsing data with filters Data can be traversed using a browsing form, with filters on all form fields. Patient information can also be displayed if needed.
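A small sketch of filtered browsing, assuming records are plain dictionaries and each filter is a predicate on one field. The `PATIENT_FIELDS` set and the `show_patient_info` switch are hypothetical stand-ins for the "display patient information if needed" option.

```python
from typing import Any, Callable

Record = dict[str, Any]
Predicate = Callable[[Any], bool]

# Hypothetical set of identifying fields hidden unless explicitly requested.
PATIENT_FIELDS = {"last name", "DOB", "MRN"}


def browse(records: list[Record], filters: dict[str, Predicate],
           show_patient_info: bool = False) -> list[Record]:
    """Return records passing every field filter, optionally hiding patient fields."""
    result = []
    for record in records:
        # A record must contain each filtered field and satisfy its predicate.
        if all(name in record and pred(record[name]) for name, pred in filters.items()):
            if not show_patient_info:
                record = {k: v for k, v in record.items() if k not in PATIENT_FIELDS}
            result.append(record)
    return result
```

Filters are keyed by field name in a dictionary rather than keyword arguments, because user-defined field names such as "last name" are not valid Python identifiers.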
Advanced search Queries can be built using a graphical interface, by choosing forms, fields and filters. Queries can be stored on the server for other users. Resulting data can be exported to a file or a database table.
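One way the graphical query designer could turn a user's choices into SQL is sketched below. It assumes, hypothetically, that each form maps to a table of the same name and that the driver uses DB-API `qmark` placeholders (`?`); saving a query for other users then amounts to storing its spec as JSON. None of these names come from Datamart itself.

```python
import json

# Operators the query designer is assumed to offer; anything else is rejected.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def quote_ident(name: str) -> str:
    """Quote a form or field name as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def build_query(spec: dict) -> tuple[str, list]:
    """Turn a query spec (form, fields, filters) into parameterized SQL."""
    columns = ", ".join(quote_ident(f) for f in spec["fields"])
    sql = f"SELECT {columns} FROM {quote_ident(spec['form'])}"
    clauses, params = [], []
    for field_name, op, value in spec.get("filters", []):
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator {op!r}")
        clauses.append(f"{quote_ident(field_name)} {op} ?")
        params.append(value)  # values are bound, never pasted into the SQL text
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# A query saved on the server is just its spec serialized to JSON.
saved = json.dumps({"form": "labs", "fields": ["patient_id", "ALT"],
                    "filters": [["ALT", ">", 40]]})
sql, params = build_query(json.loads(saved))
```

Binding filter values as parameters and whitelisting operators keeps user-built queries from injecting SQL, which matters once queries are shared between users.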
Entering Data Manually: data can be entered or modified manually if needed. Each record can be edited, and linked forms can be edited together. Data import: the preferred way of entering data is data import.
Importing clinical data from hospital database (e.g. EPIC) Data can be imported from various database sources via database queries; any data source that can be queried in SQL can be imported. Imported data are matched to the existing data and converted to the appropriate format. External data is then added, or the matching internal data is updated.
Importing clinical data from hospital database (e.g. EPIC) Data matching is done using multiple fields (e.g. for patients “last name”, “DOB”, “PAT_ID”, “MRN”, “IDX”, etc.). Records with an inconsistent match or otherwise inconsistent data are reported to the operator for manual inspection. Import can be stopped and restarted at any time. If the external source does not support incremental export, a dump of the previous version of the export can be used to limit the amount of data to process.
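A minimal sketch of multi-field matching, assuming patients are dictionaries keyed by an internal id. Identifiers (`PAT_ID`, `MRN`, `IDX`) find candidates; last name and DOB are used here only as consistency checks, which is a design choice of this sketch rather than a documented Datamart rule. The linear scan is for clarity; a real import would use indexed lookups.

```python
from typing import Optional

# Identifier fields used to find candidate matches (examples from the text).
ID_FIELDS = ("PAT_ID", "MRN", "IDX")
# Fields that must not disagree once a candidate is found.
CHECK_FIELDS = ID_FIELDS + ("last name", "DOB")


def _agree(a: dict, b: dict, name: str) -> Optional[bool]:
    """True/False if both records carry the field, None if either lacks it."""
    if a.get(name) is None or b.get(name) is None:
        return None
    return a[name] == b[name]


def match_patient(incoming: dict, patients: dict[str, dict]) -> tuple[str, Optional[str]]:
    """Classify a record as ("new", None), ("match", key) or ("inconsistent", None)."""
    candidates = [key for key, existing in patients.items()
                  if any(_agree(incoming, existing, f) for f in ID_FIELDS)]
    if not candidates:
        return "new", None
    if len(candidates) > 1:
        return "inconsistent", None  # identifiers point at different patients
    key = candidates[0]
    if any(_agree(incoming, patients[key], f) is False for f in CHECK_FIELDS):
        return "inconsistent", None  # one patient found, but other fields disagree
    return "match", key
```

Records classified as "inconsistent" would go to the operator's review list instead of being added or updated.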
Importing clinical data from hospital database (e.g. EPIC) Rules for data matching and conversion are defined by an authorized user via a graphical interface. A consistency check is performed during import; any data failing the check can be ignored or reported as inconsistent for inspection and correction.
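The conversion-then-check flow can be sketched as follows. The rule and check tables are hypothetical examples of what a user might define in the graphical interface (source column names such as `BIRTH_DATE` and `ALT_RESULT` are invented), and `on_fail` mirrors the choice between ignoring and reporting failing data.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical conversion rules: target field -> (source column, converter).
RULES = {
    "DOB": ("BIRTH_DATE", lambda s: datetime.strptime(s, "%m/%d/%Y").date()),
    "MRN": ("MRN", str.strip),
    "ALT": ("ALT_RESULT", float),
}

# Hypothetical consistency checks on the converted record: (description, test).
CHECKS = [
    ("DOB not in the future", lambda r: r["DOB"] <= date.today()),
    ("ALT not negative", lambda r: r["ALT"] >= 0),
]


def convert_and_check(row: dict) -> tuple[Optional[dict], list[str]]:
    """Apply every rule, then every check; no problems means the row is consistent."""
    try:
        record = {target: convert(row[source]) for target, (source, convert) in RULES.items()}
    except (KeyError, ValueError) as exc:
        return None, [f"conversion failed: {exc!r}"]
    return record, [desc for desc, check in CHECKS if not check(record)]


def import_rows(rows: list[dict], on_fail: str = "report") -> tuple[list[dict], list]:
    """Accept consistent rows; failing rows are reported or ignored, per on_fail."""
    accepted, reported = [], []
    for row in rows:
        record, problems = convert_and_check(row)
        if not problems:
            accepted.append(record)
        elif on_fail == "report":
            reported.append((row, problems))
        # on_fail == "ignore": the row is skipped without a report
    return accepted, reported
```

Reported rows keep the original source values alongside the list of failed checks, so an operator can see exactly what to correct.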
Importing data from other repositories Data collected in other media, such as text files, Access databases or Excel spreadsheets, can also be imported. This type of import is not automated and requires programming help. Datamart was initialized with data from an Access database.
Exporting data to text or database • Data can be exported from data browsing or from a query. • Data can be exported to: • a tab-separated text file, which can be further converted into any appropriate format (e.g. SAS) or opened in Excel • a database table for further processing • This is very useful for clinical trials or for transferring data between labs. Data can be further processed or modified using scripts.
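A sketch of the two export targets, assuming query results arrive as a list of dictionaries plus an ordered column list. Python's `sqlite3` stands in for the destination database purely to keep the example self-contained; Datamart itself runs on MS SQL Server. The function names are hypothetical.

```python
import csv
import io
import sqlite3


def to_tsv(rows: list[dict], columns: list[str]) -> str:
    """Render rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(c) for c in columns])  # None becomes an empty field
    return buf.getvalue()


def export_table(rows: list[dict], columns: list[str],
                 conn: sqlite3.Connection, table: str) -> None:
    """Write rows into a new table; names are assumed to come from trusted query output."""
    cols = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})',
                     [[row.get(c) for c in columns] for row in rows])
    conn.commit()
```

To save the text export, write the returned string to a file opened with `newline=""`. The `csv` writer quotes any value that itself contains a tab or newline, so free-text fields do not break the column layout when the file is opened in Excel or read into SAS.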
Datamart security The IMS is protected with passwords and encryption/decryption keys. Every user is assigned a role that defines their access and modification rights. Sensitive data (names, DOB, MRN) are encrypted and can only be accessed by users with encryption/decryption keys. Not even database administrators have access to these data.
Datamart security User activity in the IMS is logged, including logins, logouts, data export and import. No data can be physically removed from the database: a new version of a record is simply shown in place of the old one, which is kept. The IMS has auditing and data recovery procedures.
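The "never delete, only add a newer version" rule can be illustrated with a small append-only store. This is an in-memory sketch under assumed names (`VersionedStore`, `restore`); it shows the idea of keeping history for auditing and recovery, not how Datamart lays it out in SQL Server.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Version:
    record_id: str
    number: int          # 1 for the first save, then 2, 3, ...
    data: dict
    user: str
    saved_at: datetime


class VersionedStore:
    """Append-only store: saving never overwrites, it adds a newer version."""

    def __init__(self) -> None:
        self._versions: list[Version] = []
        self.audit_log: list[tuple[datetime, str, str, str]] = []  # (time, user, action, record)

    def save(self, record_id: str, data: dict, user: str) -> Version:
        number = len(self.history(record_id)) + 1
        version = Version(record_id, number, dict(data), user, datetime.now(timezone.utc))
        self._versions.append(version)
        self.audit_log.append((version.saved_at, user, "save", record_id))
        return version

    def history(self, record_id: str) -> list[Version]:
        """All versions of a record, oldest first (used for auditing)."""
        return [v for v in self._versions if v.record_id == record_id]

    def current(self, record_id: str) -> Optional[Version]:
        """The version users see: the newest one, shown in front of older ones."""
        versions = self.history(record_id)
        return versions[-1] if versions else None

    def restore(self, record_id: str, number: int, user: str) -> Version:
        """Recover an old version by saving a copy of it as the newest version."""
        versions = self.history(record_id)
        if not 1 <= number <= len(versions):
            raise ValueError(f"{record_id} has no version {number}")
        return self.save(record_id, versions[number - 1].data, user)
```

Recovery is itself just another save, so even a restore leaves the intermediate versions and an audit entry behind.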
Datamart security: no-key login
Datamart security: auditing previous record versions
Datamart security • Client-related security • Encrypted communication is enforced • Only selected browser versions are accepted • Pages log out automatically when not used • Client browser caching is switched off • Users cannot use the browser's “back” function
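Several of these client-side rules are commonly implemented with standard HTTP response headers plus a server-side idle timer. The sketch below shows that common pattern; the header values and the 15-minute limit are illustrative assumptions, not Datamart's configuration.

```python
import time
from typing import Optional

# Response headers commonly used to stop browsers and proxies from caching pages
# and to tell browsers to use HTTPS only. Values are illustrative assumptions.
SECURITY_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",                              # for old HTTP/1.0 caches
    "Expires": "0",
    "Strict-Transport-Security": "max-age=31536000",   # one year, HTTPS only
}

IDLE_TIMEOUT_SECONDS = 15 * 60  # hypothetical inactivity limit


class Session:
    def __init__(self, user: str, now: Optional[float] = None) -> None:
        self.user = user
        self.last_seen = time.monotonic() if now is None else now
        self.active = True

    def touch(self, now: Optional[float] = None) -> bool:
        """Record a request; log the session out if it has been idle too long."""
        now = time.monotonic() if now is None else now
        if not self.active or now - self.last_seen > IDLE_TIMEOUT_SECONDS:
            self.active = False
            return False
        self.last_seen = now
        return True
```

A request handler would add `SECURITY_HEADERS` to every response and send the user to the login page whenever `touch()` returns `False`. With `no-store` in effect, browsers typically re-request a page on "back" instead of showing a cached copy, so an expired session is caught there too.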
Datamart security • Users • Access to data can be controlled on multiple levels • allow or deny access to encrypted data • allow or deny access to particular forms • allow or deny access to editing functions • allow or deny access to export/import functions • Users can be assigned group rights based on projects.
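The access levels listed above map naturally onto a set of permission flags attached to roles, with roles granted per project. This sketch uses Python's `enum.Flag`; the role names, form names and project names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Optional


class Right(Flag):
    VIEW_ENCRYPTED = auto()  # see decrypted names, DOB, MRN
    EDIT = auto()
    EXPORT = auto()
    IMPORT = auto()


@dataclass(frozen=True)
class Role:
    name: str
    forms: frozenset  # forms this role may open
    rights: Right     # Right(0) means read-only


def allowed(user_roles: dict[str, list[Role]], project: str, form: str,
            right: Optional[Right] = None) -> bool:
    """True if one of the user's roles in `project` opens `form` (and grants `right`)."""
    return any(
        form in role.forms and (right is None or right in role.rights)
        for role in user_roles.get(project, [])
    )
```

For example, a user who is a read-only viewer of "Labs" in one project and an editor with export rights in another gets different answers from `allowed` depending on which project the request belongs to, which is what project-based group rights imply.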
Datamart data mining Many records from hospital databases contain useful data in free-text format. This data can be extracted and imported as numerical or categorized data points into other forms. Both formats are preserved: the free-text data is imported and stored first, then converted. This feature is under development.
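Since this module is still under development, the following is only a sketch of the "store the text, then convert" idea, using regular expressions. The patterns and the "not detected" category are hypothetical examples; real lab text would need many more rules and a review step.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extracted:
    raw: str                 # the original free text, always kept
    value: Optional[float]   # numeric value, if one was found
    unit: Optional[str]
    category: Optional[str]  # categorical result when there is no number


# A number (optionally with thousands separators or decimals) followed by an optional unit.
NUMBER_UNIT = re.compile(r"(?P<num>\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z/%]+)?")
NOT_DETECTED = re.compile(r"\b(?:not detected|undetectable|negative)\b", re.IGNORECASE)


def extract(raw: str) -> Extracted:
    """Pull one numeric or categorical data point out of a free-text result."""
    if NOT_DETECTED.search(raw):
        return Extracted(raw, None, None, "not detected")
    match = NUMBER_UNIT.search(raw)
    if match is None:
        return Extracted(raw, None, None, None)  # left for manual review
    number = float(match.group("num").replace(",", ""))
    return Extracted(raw, number, match.group("unit"), None)
```

For instance, `extract("HCV RNA: 1,250,000 IU/mL")` gives the value 1250000.0 with unit "IU/mL", and the raw string is kept next to it, matching the rule that both formats are preserved.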
Optimization • The query design module needs to be expanded to support more complex searches, including text search inside pre-defined entries and more flexible joining of forms. • The database now includes over 30,000 patients, 8,500,000 lab orders and 31,000,000 lab results. Interaction with the database should be optimized to take into account the balance between the different types of data, which should result in a significant speed increase.
Data import • Datamart is designed to import data from one external source at a time. • The import module itself is very flexible and can accept a variety of formats and data source types (any data source supporting SQL queries is compatible), but only one source at a time can be configured and used.
Data import • The import module should be generalized so that it can switch between pre-defined sources. • Pre-defined import templates for particular EMR systems (in addition to a generic SQL import source) should be available. • Some external data sources (e.g. EPIC) do not provide incremental data export, so the entire database needs to be scanned and imported. In such cases, completing the data import quickly is a challenge. The data import module needs to be optimized for import speed.
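When a source only offers full exports, comparing each new dump with a fingerprint of the previous one can shrink the expensive matching and conversion work to rows that are new or changed. This is a sketch of that snapshot-diff technique, assuming each row has a stable key column (such as a hypothetical `ORDER_ID`); the full source still has to be read.

```python
import hashlib


def row_digest(row: dict) -> str:
    """Fingerprint of a row's contents; keys are sorted so the digest is stable."""
    payload = "\x1f".join(f"{k}={row[k]!r}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def snapshot(rows, key_field: str) -> dict:
    """Key -> digest map saved after each import, to compare against the next dump."""
    return {row[key_field]: row_digest(row) for row in rows}


def diff_snapshots(previous: dict, current_rows, key_field: str):
    """Return (new rows, changed rows, keys missing from the source, new snapshot)."""
    current, new, changed = {}, [], []
    for row in current_rows:
        key = row[key_field]
        digest = row_digest(row)
        current[key] = digest
        if key not in previous:
            new.append(row)
        elif previous[key] != digest:
            changed.append(row)
    missing = [key for key in previous if key not in current]
    return new, changed, missing, current
```

Only `new` and `changed` rows would then go through matching and import. Keys that disappear from the source are returned separately so they can be reported rather than deleted, consistent with data never being physically removed.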
Data mining • The data mining module is still under development. • A limited form of data mining is available in import procedures.
Research • Research collaborations. • Multiple research groups using the same data system, accessible remotely. • Retrospective reviews performed on data from the Datamart • Example needed. • Translational research studies using samples from the repository • Example needed.