Uniting i2b2.org and caGrid • National scale data sharing networks for biomedical informatics research • Rob Wynden – UCSF • A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Mayo Clinic, Wash U, and Partners HealthCare
Challenges • Several challenges impede the task of launching an IDR (integrated data repository) and sharing that information for research purposes • Data Governance and Standardization • Meeting the needs of researchers • Semantic Interoperability
Data Governance • It is very difficult to get approval to import data into an IDR installation • If we also required that data be encoded at the source in a particular standard format, approval would be even more difficult • Data translation during ETL (extract, transform, and load) is also hard, because not all data needs to be encoded in a standard format and data must often be translated into multiple standard formats
Meeting the needs of Researchers • Researchers need data to be encoded in the format that is appropriate for their research specialty. No single data encoding is appropriate for all purposes • Researchers will also require access to the source information in unmodified form for verification purposes
Semantic Interoperability • For researchers within the same domain of study to share information and work together, that information must be encoded in a consistent format • Each research institution has information encoded in a unique fashion, which depends on the particular mix of source software environments used in clinical care, clinical research, and bench science
Benefits of i2b2 • An open source translational informatics warehouse platform (an IDR) • An active open source user community • Industry support (Sybase, HP, Sun …) • A relatively easy platform into which to import source data regardless of its encoding • Availability of a general purpose instance mapper for the translation of source data into standard encodings
Problems with i2b2 related to data sharing • i2b2 lacks a mature data sharing capability that includes both general purpose semantic interoperability and security • i2b2 cannot interoperate with other IDRs that are not on the same platform
Benefits of caGRID • Developed as part of the caBIG translational informatics effort, caGRID is a mature data sharing network • caGRID offers secure user authentication • caGRID offers data sharing over a semantically interoperable network • caGRID is platform agnostic and can be used to interconnect IDR environments regardless of the underlying technology (the design of caGRID is NOT specific to caBIG related systems)
Problems with caGRID • It is currently difficult to use caGRID on IDR projects. The caBIG project does not currently offer a general purpose IDR software environment • It is currently difficult to translate data into a format suitable for publication over caGRID • All caGRID based systems require that shared data be encoded in one or more standard formats, which usually do not match the formats of our data sources
The best of both worlds • By combining the advantages of i2b2.org and caGRID we will provide a comprehensive solution to national scale data sharing • i2b2.org provides a relatively easy way of importing source data and translating that information into one or more standard formats • caGRID supplies a secure and semantically interoperable national scale network
CTSA Collaborative Development • The effort to combine i2b2.org with caGRID is a collaboration involving several CTSA sites • i2b2.org was first launched into open source by Partners HealthCare, and its community includes many CTSA award sites, including Harvard Med, UCSF, UCD, U Washington, Cincinnati Children’s, UT Houston, Rochester, UPenn, and others
Ontology Mapper Cell • The Ontology Mapper Cell within i2b2 is a general purpose instance mapper which can translate messy local data into one or more standard formats. In other words, the Ontology Mapper maps local data into ontologies • Maps will be created and annotated in a Protégé Prompt plug-in and can be shared over HL7 CTS II, either as open source or as commercially sold assets • Maps contain routing and provenance information and a scriptlet payload of SQL, Perl, SPARQL, Horn or R • The Ontology Mapper Cell within i2b2 is a collaborative effort involving UCSF, UCD, Rochester, UPenn, and U Washington • This has been a highly active collaboration which is now in an Alpha release cycle
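As a rough illustration of what a map carries, the sketch below models one map as a small record holding routing, provenance, and a scriptlet payload. The class name, field names, and example values are all hypothetical and are not taken from the Ontology Mapper code base; only the list of scriptlet languages comes from the slide above.

```python
from dataclasses import dataclass

# Scriptlet languages listed on the slide.
SCRIPTLET_LANGUAGES = {"SQL", "Perl", "SPARQL", "Horn", "R"}


@dataclass(frozen=True)
class ConceptMap:
    """Hypothetical record for one map (illustrative only)."""

    source_concept: str  # routing: where the local data comes from
    target_concept: str  # routing: which standard concept it maps to
    provenance: str      # who created or annotated the map
    language: str        # language of the scriptlet payload
    payload: str         # the scriptlet text itself

    def __post_init__(self):
        # Reject payload languages that the slide does not list.
        if self.language not in SCRIPTLET_LANGUAGES:
            raise ValueError(f"unsupported scriptlet language: {self.language}")


# Example values are invented for illustration.
example_map = ConceptMap(
    source_concept="local:gluc",
    target_concept="standard:glucose",
    provenance="created by a hypothetical curator",
    language="SQL",
    payload="SELECT value FROM local_fact WHERE local_code = 'gluc'",
)
```

Keeping the payload as plain text alongside its language tag is one simple way to let a single record type carry scriptlets in any of the listed languages.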
caGRID Cell • The caGRID Cell is a development project which is a collaboration of OSU (Ohio State) and UCSF • This component allows any i2b2 data mart which has been translated into a standard format by the Ontology Mapper to share data over caGRID • This system will allow i2b2 to share data (via a federated query) with any caGRID based data source (not just with other i2b2 instances)
Three pilot projects under way • There are currently three projects which have all based their architectures on this work • HSDB (Human Studies Database) – The project for which this i2b2-caGRID architecture was first developed; it shares clinical research metadata • QSN (The Quality Safety Network) – A national network of payer-derived and IDR-derived quality data • STIR (Cardiovascular Imaging Network) – A national scale network of information to be used for cardiovascular research
So how does it work? • STEP 1 • First, data is loaded via ETL (extract, transform, load) into the i2b2 schema • The i2b2 schema is based on concept table design, which is a derivative of fact table design • In concept table design each ‘name’ in the fact table is a hierarchical string of concepts • This architecture can be used to import (ETL) source data in any encoding, without requiring data standardization as a data governance task
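The idea of a hierarchical concept string can be sketched with a toy fact table. This is a minimal illustration only: the table name, column names, "/" separator, and sample rows are assumptions made for the example and do not reproduce the actual i2b2 schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory fact table with hierarchical concept paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (patient_id INTEGER, concept_path TEXT, value TEXT)"
    )
    # Invented sample rows; each 'name' is a path through a concept hierarchy.
    conn.executemany(
        "INSERT INTO fact VALUES (?, ?, ?)",
        [
            (1, "/Labs/Chemistry/Glucose", "105"),
            (1, "/Diagnoses/Endocrine/Diabetes", "present"),
            (2, "/Labs/Chemistry/Sodium", "140"),
            (3, "/Labs/Hematology/Hemoglobin", "13.5"),
        ],
    )
    return conn


def facts_under(conn, prefix):
    """Return all facts whose concept path starts with the given prefix."""
    cursor = conn.execute(
        "SELECT patient_id, concept_path, value FROM fact "
        "WHERE substr(concept_path, 1, length(?)) = ? "
        "ORDER BY patient_id, concept_path",
        (prefix, prefix),
    )
    return cursor.fetchall()


conn = build_demo_db()
chemistry = facts_under(conn, "/Labs/Chemistry/")
```

Because the hierarchy lives in the string itself, a new kind of source data only needs a new path, not a new column, which is why no up-front standardization is required to load it.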
So how does it work? • STEP 2 • As data is imported it is translated into one or more standard formats by the Ontology Mapper Cell • The Ontology Mapper, a general purpose instance mapper, uses HL7 CTS II shareable data translation rules to translate local data into one or more standard formats • One-to-one maps, aggregates and archetype generation are all supported • The Ontology Mapper then publishes data into a data mart. Ontology Mapper data marts are database views which can be ‘materialized’ into physical data marts if required
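A one-to-one map published as a view, and then materialized, might look roughly like the sketch below. It uses SQLite purely for illustration; the table names, view name, local codes, and the placeholder standard code are all assumptions, and the real Ontology Mapper drives translation from HL7 CTS II rules rather than a hard-coded lookup table.

```python
import sqlite3


def build_mart():
    """Load local facts, a one-to-one code map, and a data mart view."""
    mart_conn = sqlite3.connect(":memory:")
    mart_conn.executescript(
        """
        CREATE TABLE local_fact (patient_id INTEGER, local_code TEXT, value REAL);
        CREATE TABLE code_map (local_code TEXT PRIMARY KEY, standard_code TEXT);
        INSERT INTO local_fact VALUES
            (1, 'GLU_SITE_A', 105.0), (2, 'gluc', 98.0), (2, 'UNMAPPED', 1.0);
        INSERT INTO code_map VALUES
            ('GLU_SITE_A', 'STD:GLUCOSE'), ('gluc', 'STD:GLUCOSE');
        -- The data mart is only a view: unmapped local codes are left out.
        CREATE VIEW data_mart AS
            SELECT f.patient_id, m.standard_code, f.value
            FROM local_fact f
            JOIN code_map m ON m.local_code = f.local_code;
        """
    )
    return mart_conn


def materialize(mart_conn):
    """Copy the current contents of the view into a physical table."""
    mart_conn.execute("CREATE TABLE data_mart_physical AS SELECT * FROM data_mart")


mart_conn = build_mart()
view_rows = mart_conn.execute(
    "SELECT * FROM data_mart ORDER BY patient_id"
).fetchall()
materialize(mart_conn)
```

The view always reflects the latest source rows, while the materialized table is a snapshot; that trade-off is the usual reason to materialize only when required.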
So how does it work? • STEP 3 • The Ontology Mapper translates data into an ISO/IEC 11179 compliant data model • The Ontology Mapper Cell then publishes that data as a data mart (a view within the underlying database) with permissions within i2b2 aligned with the study protocol • Each data model is checked into the caDSR (Cancer Data Standards Registry and Repository) to serve as a common standard reference • The caGRID Cell then provides a grid data service, created with the Introduce tool, which automatically performs the EAV (entity-attribute-value) to object-relational transform needed for i2b2 based data to be interoperable over caGRID • Data can then be queried via standard caGRID tools or via custom caGRID query environments if required (permissions are handled via Grid Grouper) • Queries can be both intra- and inter-institutional
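The EAV to object transform mentioned above can be sketched as a simple pivot from (entity, attribute, value) rows into typed objects. This is only an illustration of the general technique: the `Study` class, its fields, and the sample rows are hypothetical, and the actual caGRID Cell generates its data service with the Introduce tool rather than with code like this.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Study:
    """Hypothetical object model for one entity (illustrative only)."""

    study_id: int
    title: Optional[str] = None
    phase: Optional[str] = None


def pivot_eav(rows):
    """Group (entity_id, attribute, value) rows into one Study per entity."""
    grouped = defaultdict(dict)
    for entity_id, attribute, value in rows:
        grouped[entity_id][attribute] = value
    # Attributes missing from the EAV rows stay None on the object.
    return [
        Study(study_id=entity_id, title=attrs.get("title"), phase=attrs.get("phase"))
        for entity_id, attrs in sorted(grouped.items())
    ]


# Invented sample rows in EAV form.
eav_rows = [
    (1, "title", "Trial A"),
    (1, "phase", "II"),
    (2, "title", "Trial B"),
]
studies = pivot_eav(eav_rows)
```

Once the data is in object form it can be described by a fixed data model, which is what allows a shared registry entry to serve as the common reference for every site.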
Combining i2b2 and caGRID • By combining these techniques we can achieve the goal of a national scale, semantically interoperable data sharing network within the CTSA • This is a national collaborative effort involving many CTSA and caBIG based sites around the country • By working together as a team, we are better equipped to achieve our goals of launching IDRs and sharing research information
Thank you • Questions please • A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Mayo Clinic, Wash U, and Partners HealthCare • If you are interested in becoming a contributing member of this effort, please contact rob.wynden@ucsf.edu