OpenMRS Data Synchronization
Implementing OpenMRS in loosely connected environments
27-Nov-2008, Maros Cunderlik, openmrs.org
OpenMRS Software Architecture
• Software Architecture
  • “Architecture is defined by the recommended practice as the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution”
  • WHAT does it do?
  • HOW does it do it?
  • WHY does it do it the way it does?
• Architecture Documentation: ‘Views’
  • Logical: documents functional composition of the system elements and their relationships
  • Physical: distribution of the logical units onto physical resources
    • Servers, technologies, protocols, ports, etc.
References:
http://www.sei.cmu.edu/architecture/
http://www.sei.cmu.edu/architecture/published_definitions.html
OpenMRS: Physical View
[Diagram: Reporting, Data Entry, Clinical Decision Support → Internet (HTTP) → Tomcat + Spring + Hibernate (HTTP + Java Application Server + Java Persistence) → JDBC → MySQL DB]
• OpenMRS Server:
  • OS: Windows or Unix
  • Application server: Tomcat, Java
  • Database: MySQL or other RDBMS
OpenMRS: Challenges in rural areas
• Goals:
  • Allow convenient and up-to-date access to the system in rural districts and health centers
  • Data collected in rural areas must be available to central systems in a timely manner
• Challenges?
  • Power
  • Connectivity (packet loss, corruption) and bandwidth
  • Travel: data cannot be easily shipped to/from central locations
  • HW and SW maintenance and upgrades in remote areas
    • HW failures, patches, SW upgrades, etc.
OpenMRS: Loosely connected
• Solutions?
  • #1: Collect data on paper and ship it back to a central location
    • Pros vs. Cons?
  • #2: OpenMRS ‘Lite’
    • Make a lightweight copy of OpenMRS that supports minimal functionality and distribute it to remote areas
    • Pros vs. Cons?
  • #3: Separate Desktop Application
    • Make a completely separate application that works in disconnected mode
    • Pros vs. Cons?
  • #4: Connected installs of OpenMRS with data synchronization
    • Pros vs. Cons?
OpenMRS: #4 Data Synchronization
• Reasons for #4:
  • Health centers need functionality beyond simple data entry (e.g. reporting, updated drug information), so building a separate application would be costly
  • Health centers also need to *receive* data about patients from other centers in their district/province, so a ‘one-way’ data flow is not sufficient
  • On-site connectivity: there *will* be on-site Internet connectivity, but it may be sporadic and at times unreliable
  • Given the limited amount of dev resources, reuse as much of the core OpenMRS Java code as possible
OpenMRS: Data Synchronization Design
• Q: What capabilities must exist in a system for two installations to exchange data?
• A: Four things
  1. Serialization
    • Facility to reliably export and then import business objects
  2. Globally Unique Identification of data/records
    • Primary keys are unique only to a single *local* database
  3. Change tracking mechanism
    • How do we know what changed on a given system since ‘last time’?
  4. Transport mechanism capable of working on unreliable networks
Data Synchronization: Implementation
• #1: Serialization: Serializing object graphs can be tricky

#1
    public class Person {
        protected Address primaryAddress;
        protected Address secondaryAddress;
        …
        public Address getPrimaryAddress() {..}
        public void setPrimaryAddress(Address a) {..}
        public Address getSecondaryAddress() {..}
        public void setSecondaryAddress(Address a) {..}
    }

#2
    public static void main(String[] args) {
        Address a = new Address("Kigali");
        Person p = new Person();
        // Both fields reference the same Address instance.
        p.setPrimaryAddress(a);
        p.setSecondaryAddress(a);
        a.setValue("Kirehe");
        assert(p.getPrimaryAddress().equals(p.getSecondaryAddress()));
    }
    ? true

#3
    <Person>
        <primaryAddress value="Kigali" />
        <secondaryAddress value="Kigali" />
    </Person>

#4
    ..
    // After importing the XML in #3: do both fields still share one Address?
    Address a1 = p.getPrimaryAddress();
    Address a2 = p.getSecondaryAddress();
    a1.setValue("Rwinkwavu");
    assert(a1.equals(a2));
    ? true
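The aliasing question above can be tried out in a few lines. This is a minimal sketch, not the OpenMRS implementation: `Address`, `Person`, and `AliasingDemo` are hypothetical stand-ins, and Java native serialization is used only because it records shared references within a single stream, which is exactly the information the flat XML in #3 drops.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins for the slide's classes, not the OpenMRS domain model.
class Address implements Serializable {
    private static final long serialVersionUID = 1L;
    private String value;
    Address(String value) { this.value = value; }
    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
}

class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    Address primaryAddress;
    Address secondaryAddress;
}

class AliasingDemo {
    // Serialize and deserialize a Person through an in-memory byte array.
    static Person roundTrip(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if both address fields of the copy still point at one shared object.
    static boolean sharedAfterRoundTrip() {
        Address a = new Address("Kigali");
        Person p = new Person();
        p.primaryAddress = a;
        p.secondaryAddress = a;
        Person copy = roundTrip(p);
        copy.primaryAddress.setValue("Rwinkwavu");
        return copy.primaryAddress == copy.secondaryAddress
                && "Rwinkwavu".equals(copy.secondaryAddress.getValue());
    }

    public static void main(String[] args) {
        System.out.println("shared reference preserved: " + sharedAfterRoundTrip());
    }
}
```

A serializer that writes each address as an independent element would produce two separate `Address` objects on import, so the update to the primary address would no longer be visible through the secondary one.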
Data Synchronization: Implementation
• Serialization Options:
  • Java native: java.io.Serializable
    • doesn’t work well for durable state; cannot move from one JVM to another
  • 3rd party tools: Simple, XStream
  • Custom
    • e.g. implement java.io.Externalizable
• Data Synchronization: leverage Hibernate Persistence Mechanism
  • Pros:
    • Reuse what is already in use in OpenMRS
    • Also provides a simple solution to problem #3 (change tracking)
  • Cons:
    • Dependent on persistence layer: any changes made outside of it will not be serialized or understood
  • Longer-term: replace with a robust serialization framework in core OpenMRS
Data Synchronization: Implementation
• #2: Record Uniqueness
  • How do we know the patient_id of a given patient in two different databases?
  • Example: 2 servers, Rwinkwavu and Kirehe
    • 0. Both Rwink and Kirehe have the exact same # of patients in their tables
    • 1. Rwink: Add patient Joe, system assigns next id, assume patient_id = 34
    • 2. Kirehe: Add patient Patrick, system assigns next available primary key, say 34
    • 3. If Rwink sends its patient data, patient #34 will be Joe but Kirehe ‘thinks’ it is Patrick: how to fix?
  • Two common solutions:
    • Create mapping tables: server_id, table_id, pk
      • Cons: using one central mapping table creates a single point of failure; keeping a distributed version of the mapping table up to date is tricky
    • Use something that *is* globally unique: Universally Unique IDentifier (UUID)
      • java.util.UUID, http://en.wikipedia.org/wiki/UUID
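The Joe/Patrick collision can be sketched with `java.util.UUID`. The `PatientRecord` class and its fields are hypothetical and only illustrate the idea: the local primary key stays as it is, and a UUID assigned at creation time travels with the record between servers.

```java
import java.util.UUID;

// Hypothetical record type: a local primary key plus a globally unique identifier.
class PatientRecord {
    final int patientId; // local auto-increment key, unique only within one database
    final UUID uuid;     // assigned once at creation, sent along with the record
    final String name;

    PatientRecord(int patientId, String name) {
        this.patientId = patientId;
        this.uuid = UUID.randomUUID();
        this.name = name;
    }
}

class UuidDemo {
    // Two servers hand out the same local id, yet the records remain distinguishable.
    static boolean sameLocalIdDifferentUuid() {
        PatientRecord joe = new PatientRecord(34, "Joe");         // created at Rwinkwavu
        PatientRecord patrick = new PatientRecord(34, "Patrick"); // created at Kirehe
        return joe.patientId == patrick.patientId && !joe.uuid.equals(patrick.uuid);
    }

    public static void main(String[] args) {
        System.out.println("same patient_id, different uuid: " + sameLocalIdDifferentUuid());
    }
}
```

On import, the receiving server would match incoming records on `uuid` and assign its own local `patient_id`, so neither server has to consult a mapping table.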
Data Synchronization: Implementation
• #3: Change tracking mechanism
  • Servers need to somehow ‘tell’ each other the last time they ‘saw’ each other
  • Classic problem in distributed computing
  • Two (at least) common solutions:
    • #1: Versioning
    • #2: Change logs/journals
  • OpenMRS sync: Change journal + Hibernate interceptor (for now)
    • We actually want versioning, but doing so requires extensive changes to the data model; compromise: a change journal table analogous to a DB transaction log
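A change journal can be pictured as an append-only list of entries with increasing sequence numbers, from which a peer asks for everything after the last entry it saw. The sketch below is an in-memory illustration under that assumption; `ChangeJournal`, `Entry`, and the operation names are hypothetical, and in the real system the entries would be written by the persistence-layer interceptor into a database table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a change journal table.
class ChangeJournal {
    static final class Entry {
        final long sequence;     // monotonically increasing position in the journal
        final String recordUuid; // globally unique id of the changed record
        final String operation;  // e.g. "INSERT", "UPDATE", "DELETE"

        Entry(long sequence, String recordUuid, String operation) {
            this.sequence = sequence;
            this.recordUuid = recordUuid;
            this.operation = operation;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private long nextSequence = 1;

    // Append one change; a persistence hook would call this on every save.
    Entry record(String recordUuid, String operation) {
        Entry entry = new Entry(nextSequence++, recordUuid, operation);
        entries.add(entry);
        return entry;
    }

    // Everything the peer has not yet seen, in journal order.
    List<Entry> changesSince(long lastSeenSequence) {
        List<Entry> pending = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.sequence > lastSeenSequence) {
                pending.add(entry);
            }
        }
        return pending;
    }
}

class JournalDemo {
    static int pendingAfterFirstSync() {
        ChangeJournal journal = new ChangeJournal();
        journal.record("uuid-a", "INSERT");
        long lastSeen = journal.record("uuid-b", "UPDATE").sequence; // peer synced up to here
        journal.record("uuid-a", "UPDATE");                          // change made after the sync
        return journal.changesSince(lastSeen).size();
    }

    public static void main(String[] args) {
        System.out.println("changes pending for peer: " + pendingAfterFirstSync());
    }
}
```

Because the journal records changes rather than versions, the data model itself stays untouched, which is the compromise the slide describes.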
Data Synchronization: Implementation
• #4: Robust Transport
  • Needs:
    • Cannot be a connected protocol (e.g. RPC)
    • Efficient on the wire
    • Backup mode for transport needs to be available in case of no connectivity
    • Able to withstand network transport corruption
  • OpenMRS Sync solution:
    • HTTP + checksum + compression
    • ‘USB’ flat file based data interchange as backup
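The "checksum + compression" pairing can be sketched with the standard `java.util.zip` classes. The slides do not say which algorithms OpenMRS Sync uses, so GZIP and CRC32 here are assumptions chosen for illustration, and `SyncPayload` with its methods is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: GZIP and CRC32 are illustrative choices, not confirmed by the slides.
class SyncPayload {
    // Sender side: compress the serialized changes before transmission.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checksum computed over the bytes that actually travel on the wire.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Receiver side: reject the payload unless its checksum matches, then decompress.
    static byte[] receive(byte[] wire, long expectedChecksum) {
        if (checksum(wire) != expectedChecksum) {
            throw new IllegalArgumentException("checksum mismatch, request retransmission");
        }
        return decompress(wire);
    }

    static String roundTrip(String text) {
        byte[] wire = compress(text.getBytes(StandardCharsets.UTF_8));
        return new String(receive(wire, checksum(wire)), StandardCharsets.UTF_8);
    }

    // Flip one bit in transit and confirm the receiver refuses the payload.
    static boolean detectsCorruption() {
        byte[] wire = compress("patient data".getBytes(StandardCharsets.UTF_8));
        long expected = checksum(wire);
        wire[wire.length / 2] ^= 0x01;
        try {
            receive(wire, expected);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip: " + roundTrip("<Person uuid=\"...\"/>"));
        System.out.println("corruption detected: " + detectsCorruption());
    }
}
```

The same compressed bytes and checksum could be written to a flat file for the USB fallback, so both transports can share one verification step.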
Data Synchronization: Implementation
• Addressing the Challenges
  • Power and maintenance
    • Use Ubuntu to minimize need for patches
    • Application will automatically start up when server boots up and sync on schedule; no intervention needed
    • Investment in key infrastructure: solar power, sat. connections
    • TODOs: application self-update, database self-migrate
  • Connectivity, Travel
    • Sync transmissions checksummed and compressed
    • Data can be retransmitted without concern about corruption/duplication
    • If prolonged outage: sync-via-USB available
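Safe retransmission implies that applying the same change twice has no further effect. The slides do not describe how OpenMRS Sync achieves this, so the sketch below shows one common approach under an explicit assumption: every change carries a globally unique id, and the receiver remembers which ids it has already applied. `SyncReceiver` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical receiver that ignores changes it has already applied.
class SyncReceiver {
    private final Set<String> appliedChangeIds = new HashSet<>();
    private final List<String> appliedPayloads = new ArrayList<>();

    // Returns true if the change was applied, false if it was a duplicate.
    boolean receive(String changeId, String payload) {
        if (!appliedChangeIds.add(changeId)) {
            return false; // already seen: a retransmission is a no-op
        }
        appliedPayloads.add(payload);
        return true;
    }

    int appliedCount() {
        return appliedPayloads.size();
    }

    public static void main(String[] args) {
        SyncReceiver receiver = new SyncReceiver();
        receiver.receive("change-1", "add patient Joe");
        receiver.receive("change-1", "add patient Joe"); // resent after a dropped connection
        System.out.println("changes applied: " + receiver.appliedCount());
    }
}
```

With this in place the sender can simply resend anything it has not seen acknowledged, whether over HTTP or from a USB file.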