Tutorial - Design and Implementation of Clinical Databases with openEHR Pablo Pazos Gutiérrez, Koray Atalag, Luis Marco-Ruiz, Erik Sundvall, Sérgio Miranda Freire
Foundations
• Modern Clinical Databases need to ...
  • handle many types of information
  • handle lots of different data structures
  • be flexible and generic
  • be consistent, standardized and future-proof (able to evolve)
• CDBs are difficult to design!
  • design is 10% about storing data, 90% about querying, retrieving and using data
• To achieve a good design we need to:
  • have deep knowledge of clinical record structures
  • apply good practices and standards, and support generic requirements
  • know about different technologies / solutions
Agenda
• Clinical Information Requirements
• Clinical Information Organization
• Database Technologies & Features
• openEHR
  • goals, information model, knowledge model, data store & query, versioning & audit
• openEHR Data Storage Techniques
  • Relational + ORM
  • Hybrid
• Data Querying
Clinical Information Requirements: Storing & Accessing Data
Minimal Information Set (ISO 18308)
• Related to storage, from the user's point of view:
  • Patient history
  • Physical examination
  • Psychological, social, environmental, family and self-care information
  • Allergies and other therapeutic precautions
  • Preventative and wellness measures such as vaccinations and lifestyle interventions
  • Diagnostic tests and therapeutic interventions such as medications and procedures
  • Clinical observations, interpretations, decisions and clinical reasoning
  • Requests/Orders for further investigations, treatments or discharge
  • Problems, diagnoses, issues, conditions, preferences and expectations
  • Healthcare plans, health and functional status, and health summaries
  • Disclosures and consents
  • Suppliers, model and manufacturer of devices (e.g. implants or prostheses)
• Internally we want more generic information elements
  • especially in our database designs
Several ways of accessing clinical data
• Related to clinical data querying for clinical usage (patient level):
  • Chronological (e.g. to sort medical consultations)
  • Problem-oriented (access data by condition or disease)
    • Health records are associated with a health problem
    • Each problem evolves until it is solved/inactivated, unless it is chronic
  • By medical specialty (e.g. cardiology)
  • By department, sector, unit or service (e.g. emergency, ICU, ...)
  • Episode
    • One or many contacts / visits on different dates
    • May include hospitalizations
    • Associated with a health problem (e.g. asthma attack)
  • Access to individual documents or data points
    • e.g. all the blood pressure measurements for a patient
• There is also Public Health and Epidemiology (population level)
  • we had a tutorial about that yesterday: “Enabling Clinical Data Reuse with openEHR Data Warehouse Environments”
Infrastructure Requirements
• Related to user experience and quality
• Be aware of the CAP theorem!
• Scalability (grow while maintaining service level)
• High availability (% of operational time)
• Transactionality (all or nothing)
• Performance (run, Forrest, run!)
• Concurrency (we all want that resource)
• Audit (what, when, who, where, why, ...)
• Encryption (data at rest)
• Version management (history of changes)
• ...
• We want it all!
• We might need to use different technologies
Clinical Information Organization
• Clinical records & information are highly hierarchical, whether paper-based or electronic
DBs for different kinds of usage
• operative / transactional databases (OLTP)
  • read/write oriented
  • support business processes, small historical data
• querying databases
  • read oriented
  • read-only data, might be in memory
• document database
  • audit, versioning, electronic signature (authenticity, incorruptibility)
• analysis database
  • read oriented
  • might need ETL
  • data linking + data mining + statistical analysis + prediction techniques (trends)
• data warehouse database
  • ETL from many data sources
  • batch calculations of indicators over loads of historical data
Which to choose? Relational, document, key/value, graph, object, ...? And which brand?
Consider that many use cases can be met efficiently by relational databases, but each project is different and there is no one-size-fits-all solution. We are not worried about performance just yet; we'll focus first on how to design Clinical Databases with openEHR!
First approach
• The choice depends on the context
  • use cases, estimated # of operations / # of records, organizational knowledge, ...
• For the operative / transactional DB, let's go with a relational database:
  • MySQL, Postgres, Oracle, SQL Server, …
• NOT a recommendation: we are just focusing on one option to understand some common clinical database design concepts that also apply to other technologies.
First approach
• Loads of reads? Complex queries and JOINs? Low performance?
  • try relational for writes + document for reads
    • needs ETL: relational => doc (JSON, XML)
  • you can denormalize the relational DB, and/or
  • use the document capabilities of some RDBs (e.g. Postgres supports XML & JSON)
  • make good use of indexes
  • analyze query plans (Postgres / MySQL EXPLAIN query)
• Most systems won't have problems with any of these options
  • there is always a way to optimize things!
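A minimal sketch of the "relational for writes, document for reads" idea, using only Python's built-in sqlite3 and json modules. SQLite stands in for Postgres/MySQL here, and the table and column names (composition, data_point, ehr_uid, ...) are hypothetical, not taken from the tutorial. The last lines ask SQLite for its query plan, analogous to EXPLAIN in Postgres / MySQL.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings

# Hypothetical, heavily simplified relational schema (the "write" side).
conn.executescript("""
CREATE TABLE composition (
    id INTEGER PRIMARY KEY,
    ehr_uid TEXT,
    start_time TEXT
);
CREATE TABLE data_point (
    id INTEGER PRIMARY KEY,
    composition_id INTEGER REFERENCES composition(id),
    path TEXT,
    magnitude REAL
);
CREATE INDEX idx_dp_composition ON data_point(composition_id);
""")
conn.execute("INSERT INTO composition VALUES (1, 'ehr-1', '2015-08-19T10:00:00')")
conn.executemany(
    "INSERT INTO data_point (composition_id, path, magnitude) VALUES (?, ?, ?)",
    [(1, "systolic", 142.0), (1, "diastolic", 91.0)],
)


def composition_as_document(conn, composition_id):
    """ETL step: join the relational rows into one JSON document (the "read" side)."""
    comp = conn.execute(
        "SELECT * FROM composition WHERE id = ?", (composition_id,)
    ).fetchone()
    points = conn.execute(
        "SELECT path, magnitude FROM data_point WHERE composition_id = ? ORDER BY id",
        (composition_id,),
    ).fetchall()
    doc = dict(comp)
    doc["data_points"] = [dict(p) for p in points]
    return json.dumps(doc)


document = composition_as_document(conn, 1)

# Inspect the query plan to see whether the index is being considered.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT path FROM data_point WHERE composition_id = 1"
).fetchall()
plan_text = " ".join(row["detail"] for row in plan)
```

The JSON document could then be loaded into a document store or a JSON column; how often to refresh it is a design decision this sketch does not address.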
Also, we have transformations between models • When we need to • migrate to another technology (e.g. from RDBs to Doc) • integrate different technologies (hybrid solution)
We can:
• transform openEHR XML to an equivalent JSON representation, and back (canonical transformation)
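To illustrate what an XML to JSON transformation involves, here is a generic sketch using Python's standard library. It is not the canonical openEHR transformation: it ignores XML attributes (such as xsi:type) and keeps every value as a string. The sample fragment and the function name element_to_dict are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to nested dicts; repeated child tags become lists."""
    children = list(elem)
    if not children:
        return elem.text.strip() if elem.text else None
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Second occurrence of a tag: promote the entry to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical fragment, loosely shaped like a quantity value.
xml_fragment = """
<value>
  <magnitude>142</magnitude>
  <units>mm[Hg]</units>
</value>
"""

root = ET.fromstring(xml_fragment.strip())
as_json = json.dumps({root.tag: element_to_dict(root)})
```

A production transformation would also need to carry type information and numeric types across, which is where a well-defined canonical mapping helps.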
Non-RDB-based approaches?
• XML: BaseX, Sedna, eXist, ...
• JSON: Couchbase, CouchDB, MongoDB, ...
• Often suitable if your client-side GUI primarily wants XML or JSON documents/chunks (avoids conversion needs) …or if you go all-in on JavaScript on server + client?
• Auto-translating AQL to hierarchy-friendly query languages (e.g. XQuery, N1QL, SPARQL, SQL++?) is often straightforward. Consider using a parser generator.
• XML databases are fast for transactional (clinical?) queries, but often slow for population-wide (epidemiology?) queries.
• Solutions such as Couchbase can be very fast for both, after specific indexing is done (example on next slide).
• Very little is published regarding graph/network databases (Neo4j etc.) and object databases for openEHR usage. Please test and publish!
Scaling? Size & Performance tests, 4.2M patients
Source: as yet unpublished results, working title: "Comparing the Performance of NoSQL Approaches for Storing and Retrieving Archetype-Based Electronic Health Record Data". Authors: Sergio M Freire, Douglas Teodoro, Fang Wei-Kleiner, Erik Sundvall, Daniel Karlsson, Patrick Lambrix
More about the test data and some of the setup is already published in http://www.ep.liu.se/ecp/070/009/ecp1270009.pdf
Please note:
• All DBs work fine/fast for "clinical" patient-specific queries; the graph shows population queries
• The RDB, here used as source and reference, is an epidemiology-optimised, non-openEHR-based reference that we try to match in end-user speed (not size). The XML/JSON-based DB examples have the flexibility of openEHR to add new archetypes etc. without manually reworking the DB schema; the RDBMS reference example does not have that flexibility.
openEHR
• Open standard for creating really flexible, future-proof (maintainable in the long term, at large scale, at low cost), interoperable EHRs.
• Defines an Infostructure!
• Created, maintained, tested, validated and implemented by an international community of professionals.
• The community provides Modeling Tools and Open Source Reference Implementations in many technologies (Java, Eiffel, .Net, Ruby, Python).
• Key elements:
  • technological independence
  • multi-level models, clean and complete
    • information, clinical concepts, terminology bindings, querying, services, ...
  • formal methodology for knowledge management
  • open & free access to specifications
    • à la W3C / IETF (which enabled the implementation of the Internet and the Web)
• Please join us!
  • openEHR Foundation: http://openehr.org/community/mailinglists
  • openEHR en español: http://openehr.org.es
Information Model
• Our Clinical DB Design will be based on this!
Information Model Clinical records & information are highly hierarchical
Record Entries
• Different types of entries a clinical document can have
• Clinical records are highly hierarchical!
Specifying Clinical Records: Key Points for Clinical Database Design for openEHR data
Archetypes & ADL
• Represent clinical concepts by constraints over a generic Information Model
  • defined in Archetype Definition Language
  • globally valid, multi-language
• Important elements for DB design and implementation!
  • multi-axial identifier
    • openEHR-EHR-OBSERVATION.blood_pressure.v1
  • node identifier
    • atNNNN
  • node path (e.g. path to systolic BP)
    • /data[at0001]/events[at0006]/data[at0003]/items[at0004]/value
  • archetype id + path
    • unique semantic identifier
    • will use them in our databases!
• Need archetypes? No problem: http://ckm.openehr.org/
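The identifiers above are plain strings, so they are easy to take apart and store. The following sketch splits a multi-axial archetype id into its axes, extracts the atNNNN node ids from a path, and concatenates archetype id + path into one semantic key. The regular expressions are simplified assumptions that fit the examples on this slide, not the full openEHR identifier grammar, and the function names are made up for illustration.

```python
import re
from typing import NamedTuple


class ArchetypeId(NamedTuple):
    originator: str  # e.g. openEHR
    rm_name: str     # e.g. EHR
    rm_class: str    # e.g. OBSERVATION
    concept: str     # e.g. blood_pressure
    version: str     # e.g. v1


# Simplified pattern: originator-rm_name-rm_class.concept.version
ARCHETYPE_ID = re.compile(
    r"^(?P<originator>[A-Za-z]\w*)-(?P<rm_name>[A-Za-z]\w*)-(?P<rm_class>[A-Za-z]\w*)"
    r"\.(?P<concept>[A-Za-z][\w-]*)\.(?P<version>v\d+(?:\.\d+)*)$"
)
# Node ids appear right after an opening bracket in a path, e.g. [at0001]
NODE_ID = re.compile(r"\[(at\d{4}(?:\.\d+)*)")


def parse_archetype_id(text: str) -> ArchetypeId:
    match = ARCHETYPE_ID.match(text)
    if match is None:
        raise ValueError(f"not an archetype id: {text!r}")
    return ArchetypeId(**match.groupdict())


def node_ids(path: str) -> list:
    return NODE_ID.findall(path)


def semantic_key(archetype_id: str, path: str) -> str:
    """Archetype id + path identifies one data point definition."""
    return archetype_id + path


bp_id = "openEHR-EHR-OBSERVATION.blood_pressure.v1"
systolic_path = "/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value"

bp = parse_archetype_id(bp_id)
systolic_nodes = node_ids(systolic_path)
systolic_key = semantic_key(bp_id, systolic_path)
```

Storing the archetype id and the path as two separate columns, rather than only the concatenated key, keeps both available for filtering.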
Operational Templates (OPT)
• "Big archetypes"
  • Combine archetypes to represent clinical documents
  • Allow adding more constraints
  • Defined in XML
• Used for specific contexts
  • one language
  • locally valid (organization, federation, national)
• Used by EHR/EMR software directly
  • for validating data
  • for generating UIs
  • for indexing data
  • for querying
  • …
Operational Templates (OPT)

<template_id>
  <value>Consulta Médica</value>
</template_id>
<definition>
  <rm_type_name>COMPOSITION</rm_type_name>
  ...
  <node_id>at0000</node_id>
  <attributes xsi:type="C_SINGLE_ATTRIBUTE">
    <rm_attribute_name>category</rm_attribute_name>
    ...
    <children xsi:type="C_COMPLEX_OBJECT">
      <rm_type_name>DV_CODED_TEXT</rm_type_name>
      ...
      <attributes xsi:type="C_SINGLE_ATTRIBUTE">
        <rm_attribute_name>defining_code</rm_attribute_name>
        ...
        <children xsi:type="C_CODE_PHRASE">
          <rm_type_name>CODE_PHRASE</rm_type_name>
          ...
          <terminology_id>
            <value>openehr</value>
          </terminology_id>
          <code_list>433</code_list> -- category = event
        </children>
      </attributes>
    </children>
  </attributes>
  ...

Legend: rm_type_name = openEHR IM class; rm_attribute_name = openEHR IM attribute
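Since OPTs are XML, software can walk them to learn which IM classes and attributes a template constrains. The sketch below does this with Python's standard library over a small fragment modeled on the slide. The fragment is an assumption: the elided parts ("...") are simply left out, the enclosing template element is added to make it well-formed, and the default XML namespace used by real OPT files is omitted, so the element lookups would need namespace handling in practice. The generated paths also omit the [atNNNN] node predicates.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for an OPT (not a complete template).
OPT_FRAGMENT = """
<template xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <template_id><value>Consulta Médica</value></template_id>
  <definition>
    <rm_type_name>COMPOSITION</rm_type_name>
    <node_id>at0000</node_id>
    <attributes xsi:type="C_SINGLE_ATTRIBUTE">
      <rm_attribute_name>category</rm_attribute_name>
      <children xsi:type="C_COMPLEX_OBJECT">
        <rm_type_name>DV_CODED_TEXT</rm_type_name>
        <attributes xsi:type="C_SINGLE_ATTRIBUTE">
          <rm_attribute_name>defining_code</rm_attribute_name>
          <children xsi:type="C_CODE_PHRASE">
            <rm_type_name>CODE_PHRASE</rm_type_name>
            <terminology_id><value>openehr</value></terminology_id>
            <code_list>433</code_list>
          </children>
        </attributes>
      </children>
    </attributes>
  </definition>
</template>
"""


def walk(obj, path=""):
    """Yield (attribute path, IM class name) for each constrained object."""
    yield path or "/", obj.findtext("rm_type_name")
    for attr in obj.findall("attributes"):
        name = attr.findtext("rm_attribute_name")
        for child in attr.findall("children"):
            yield from walk(child, f"{path}/{name}")


root = ET.fromstring(OPT_FRAGMENT.strip())
template_id = root.findtext("template_id/value")
constraints = list(walk(root.find("definition")))
```

A list like constraints is the kind of input a data indexer or UI generator could start from.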
Information & Metadata
References to Archetypes and Templates (semantic content definitions)
• Link between Archetypes and the Information Model
• Will use those fields in our persistence model
• Are important for queries!
openEHR Data Storage Design
• openEHR doesn't define how to store data
  • The IM is not a Persistence Model
  • The Persistence Model will depend on requirements and technologies
• Our work is to adapt the IM to our persistence needs
  • We can simplify, adapt or use part of it (openEHR is very flexible)
• openEHR doesn't care about how we store data but does care about:
  • structural and semantic consistency (defined by archetypes & OPTs)
  • processable / accessible / queryable data
• Tips:
  • archetype id, path, template id and node id are important for querying
  • references (OBJECT_REF) can be simplified (FKs in relational)
  • structured data can be simplified (e.g. DV_CODED_TEXT)
  • …
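As one possible reading of the tips above, this sketch flattens a nested DV_CODED_TEXT (value, plus defining_code with its terminology_id and code_string) into a single record that also carries the template id, archetype id and path for querying. The class name CodedTextIndex, the dict-based input and the function name are hypothetical; a real persistence model may simplify differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedTextIndex:
    """Hypothetical flattened row for one DV_CODED_TEXT data point."""
    template_id: str
    archetype_id: str
    path: str
    value: str           # DV_CODED_TEXT.value
    code: str            # defining_code.code_string
    terminology_id: str  # defining_code.terminology_id.value


def flatten_coded_text(template_id, archetype_id, path, dv):
    """Collapse the nested structure into one flat record."""
    code_phrase = dv["defining_code"]
    return CodedTextIndex(
        template_id=template_id,
        archetype_id=archetype_id,
        path=path,
        value=dv["value"],
        code=code_phrase["code_string"],
        terminology_id=code_phrase["terminology_id"]["value"],
    )


# Nested form, as it might arrive from a parsed document (assumed shape).
category = {
    "value": "event",
    "defining_code": {
        "terminology_id": {"value": "openehr"},
        "code_string": "433",
    },
}

record = flatten_coded_text(
    "Consulta Médica",
    "openEHR-EHR-COMPOSITION.encounter.v1",
    "/category",
    category,
)
```

Three nested objects become one row with three extra columns, which is usually cheaper to query than three joined tables.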
Object-Relational Mapping (ORM)
• OO system (openEHR IM) & Relational DB => ORM
  • OO: class, attribute, attr. type, relationship, inheritance
  • Relational: table, column, column type, reference
• Key elements:
  • identity representation
  • data type mapping
  • association mapping (different cardinalities 1..1, 1..N, N..N)
  • inheritance mapping
Identity in Object-Oriented Model
• Objects have an identity to:
  • differentiate between objects of the same class
  • reference those objects
• In the relational model we have Primary Keys
• Solution:
  • add an "id" column to each table
    • of type "int" or "long", and use it as the PK
  • FKs reference only the "id" PKs
    • they represent relationships in the OO model
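A small sketch of the "id" column solution, using SQLite through Python's sqlite3 module as a stand-in for any relational DBMS. The composition and entry tables are hypothetical simplifications. The foreign key represents the OO relationship "an entry belongs to a composition", and the last function shows the database rejecting a reference to an object that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE composition (
    id INTEGER PRIMARY KEY,          -- object identity
    archetype_id TEXT NOT NULL
);
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,          -- object identity
    composition_id INTEGER NOT NULL REFERENCES composition(id),  -- relationship
    archetype_id TEXT NOT NULL
);
""")

composition_id = conn.execute(
    "INSERT INTO composition (archetype_id) VALUES (?)",
    ("openEHR-EHR-COMPOSITION.encounter.v1",),
).lastrowid

entry_id = conn.execute(
    "INSERT INTO entry (composition_id, archetype_id) VALUES (?, ?)",
    (composition_id, "openEHR-EHR-OBSERVATION.blood_pressure.v1"),
).lastrowid


def orphan_entry_accepted(conn):
    """Try to reference a composition id that does not exist."""
    try:
        conn.execute(
            "INSERT INTO entry (composition_id, archetype_id) VALUES (?, ?)",
            (999, "openEHR-EHR-OBSERVATION.blood_pressure.v1"),
        )
    except sqlite3.IntegrityError:
        return False
    return True


orphan_accepted = orphan_entry_accepted(conn)
```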
Data Type Mapping
• Each type we use in the OO model should be mapped to a type in the DBMS we choose.
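One way to make the type mapping explicit is a lookup table from OO types to column types. The sketch below uses Python types and generic SQL type names; the particular choices (BIGINT, VARCHAR(255), ...) are assumptions for illustration and would be adjusted for the chosen DBMS. The DvQuantity class is a cut-down stand-in with only two fields.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# Assumed mapping from OO types to SQL column types; adjust per DBMS.
SQL_TYPES = {
    int: "BIGINT",
    str: "VARCHAR(255)",
    float: "DOUBLE PRECISION",
    bool: "BOOLEAN",
    datetime: "TIMESTAMP",
}


@dataclass
class DvQuantity:
    """Cut-down stand-in for a quantity value: just magnitude and units."""
    magnitude: float
    units: str


def create_table_sql(cls, table_name):
    """Build a CREATE TABLE statement from a dataclass and the type mapping."""
    columns = ["id BIGINT PRIMARY KEY"]
    for field in fields(cls):
        columns.append(f"{field.name} {SQL_TYPES[field.type]} NOT NULL")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


ddl = create_table_sql(DvQuantity, "dv_quantity")
```

ORM frameworks perform this mapping internally; writing it out once makes it easier to spot types that have no obvious column equivalent.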
Mapping Inheritance
• TIP: with table-per-class mapping, it is better to use the same "id" value for the rows of the same instance that are distributed across different tables.
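The tip can be sketched with two tables for a parent class and a subclass, here DV_TEXT and DV_CODED_TEXT, where the subclass row reuses the "id" of its parent row. SQLite via Python's sqlite3 stands in for the DBMS, and the column layout is a simplified assumption. With a shared id, reassembling the full object is a single join on "id".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Table per class: dv_coded_text.id is both its PK and an FK to dv_text.id.
conn.executescript("""
CREATE TABLE dv_text (
    id INTEGER PRIMARY KEY,
    value TEXT NOT NULL
);
CREATE TABLE dv_coded_text (
    id INTEGER PRIMARY KEY REFERENCES dv_text(id),
    code_string TEXT NOT NULL,
    terminology_id TEXT NOT NULL
);
""")


def save_coded_text(conn, value, code_string, terminology_id):
    """Insert the parent row first, then the subclass row with the same id."""
    shared_id = conn.execute(
        "INSERT INTO dv_text (value) VALUES (?)", (value,)
    ).lastrowid
    conn.execute(
        "INSERT INTO dv_coded_text (id, code_string, terminology_id) VALUES (?, ?, ?)",
        (shared_id, code_string, terminology_id),
    )
    return shared_id


def load_coded_text(conn, shared_id):
    """Rebuild the full instance by joining both tables on the shared id."""
    return conn.execute(
        "SELECT t.id, t.value, c.code_string, c.terminology_id "
        "FROM dv_text t JOIN dv_coded_text c ON c.id = t.id WHERE t.id = ?",
        (shared_id,),
    ).fetchone()


event_id = save_coded_text(conn, "event", "433", "openehr")
row = load_coded_text(conn, event_id)
```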
Database Schema Examples
• Some databases we have designed for openEHR data, each with a different purpose
EHRServer + generic data storage + focused on querying + doesn’t map the whole IM + training purposes (for now)
+ operational DB + for an EMR system + pretty normalized
Hybrid approach
• Considerations
  • Use only if it makes sense!
    • for example, if it improves querying performance / scalability
  • Modern relational DBMSs compete with some NoSQL features:
    • support documents
    • scale through clusters
    • some allow in-memory tables or views
Data Querying AQL and path-based queries
Archetype Query Language
• AQL is like SQL for EHRs
• Archetype ID is "like" a table (type of info we want)
  • openEHR-EHR-OBSERVATION.blood_pressure.v1
• Data points are identified by paths, "like" columns (defined by each archetype)
  • Systolic BP: /data[at0001]/events[at0006]/data[at0003]/items[at0004]/value

Get high BP data

SELECT
  obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude,
  obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude
FROM EHR [ehr_id/value=$ehrUid]
  CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
WHERE
  obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude >= 140 OR
  obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude >= 90

https://openehr.atlassian.net/wiki/display/spec/Archetype+Query+Language+Description
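To connect the "archetype id is like a table, path is like a column" analogy with a relational store, here is a sketch of the same high-BP question answered in SQL over a generic index table keyed by archetype id and path. This is not how any particular openEHR server executes AQL. The dv_quantity_index table, its columns and the sample rows are hypothetical, and the join assumes one blood pressure reading per composition.

```python
import sqlite3

BP = "openEHR-EHR-OBSERVATION.blood_pressure.v1"
SYSTOLIC = "/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value"
DIASTOLIC = "/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value"

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dv_quantity_index (
    id INTEGER PRIMARY KEY,
    ehr_uid TEXT,
    composition_id INTEGER,
    archetype_id TEXT,
    path TEXT,
    magnitude REAL
)
""")
# Made-up sample data: three compositions across two EHRs.
conn.executemany(
    "INSERT INTO dv_quantity_index "
    "(ehr_uid, composition_id, archetype_id, path, magnitude) VALUES (?, ?, ?, ?, ?)",
    [
        ("ehr-1", 1, BP, SYSTOLIC, 150.0),
        ("ehr-1", 1, BP, DIASTOLIC, 85.0),
        ("ehr-1", 2, BP, SYSTOLIC, 120.0),
        ("ehr-1", 2, BP, DIASTOLIC, 80.0),
        ("ehr-2", 3, BP, SYSTOLIC, 160.0),
        ("ehr-2", 3, BP, DIASTOLIC, 95.0),
    ],
)

# s = systolic row, d = diastolic row of the same composition.
HIGH_BP_SQL = """
SELECT s.composition_id, s.magnitude, d.magnitude
FROM dv_quantity_index s
JOIN dv_quantity_index d
  ON d.composition_id = s.composition_id AND d.archetype_id = s.archetype_id
WHERE s.ehr_uid = ? AND s.archetype_id = ? AND s.path = ? AND d.path = ?
  AND (s.magnitude >= 140 OR d.magnitude >= 90)
ORDER BY s.composition_id
"""


def high_bp(conn, ehr_uid):
    """Return (composition_id, systolic, diastolic) rows with high BP for one EHR."""
    return conn.execute(HIGH_BP_SQL, (ehr_uid, BP, SYSTOLIC, DIASTOLIC)).fetchall()


high_for_ehr_1 = high_bp(conn, "ehr-1")
```

The SQL is noticeably longer than the AQL because the self-join has to rebuild what AQL expresses with one archetype and two paths.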