
Uniting i2b2 and caGrid

Uniting i2b2.org and caGrid: national scale data sharing networks for biomedical informatics research. Rob Wynden – UCSF. A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, and Partner’s Health.


Presentation Transcript


  1. Uniting i2b2.org and caGrid: National scale data sharing networks for Biomedical Informatics research • Rob Wynden – UCSF • A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, and Partner’s Health

  2. Challenges • Several challenges impede the task of launching an IDR (integrated data repository) and sharing that information for research purposes • Data Governance and Standardization • Meeting the needs of researchers • Semantic Interoperability

  3. Data Governance • It is very difficult to get approval to import data into an IDR installation • If we also required that data be encoded at the source in a particular standard format, approval would be even more difficult • Data translation during ETL (extract, transform, and load) is also hard, because not all data needs to be encoded this way and data must often be translated into multiple standard formats

  4. Meeting the needs of Researchers • Researchers need data to be encoded in the format which is appropriate for their research specialty. No single data encoding is appropriate for all purposes • Researchers will also require access to the source information in un-modified form for verification purposes

  5. Semantic Interoperability • For researchers within the same domain of study to share information and work together, that information must be encoded in a consistent format • Each research institution has information encoded in a unique fashion, which depends on its particular mix of source software environments used in clinical care, clinical research, and bench science.

  6. Ontology Mapper • The Ontology Mapper maps local data (which is usually not formally encoded) into formally encoded data based on ISO/IEC 11179 data models which have been checked into the caDSR (Data Standards Repository). (It is an instance mapper.) • XML based instance map definitions can be shared between institutions either under a Creative Commons license or under a commercial license after purchase.

  7. Benefits of i2b2 • An open source translational informatics warehouse platform (an IDR) • An active open source based user community • Industry support (Sybase, HP, Sun …) • A relatively easy platform into which to import source data regardless of its encoding • Availability of a general purpose instance mapper for the translation of source data into standard encodings

  8. Problems with i2b2 related to data sharing • i2b2 lacks a mature data sharing capability that includes both general purpose semantic interoperability and security • i2b2 cannot interoperate with other IDRs which may not be on the same platform

  9. Benefits of caGRID • Developed as part of the caBIG translational informatics effort, caGRID is a mature data sharing network • caGRID offers secure user authentication • caGRID offers data sharing over a semantically interoperable network • caGRID is platform agnostic and can be used to interconnect IDR environments regardless of the underlying technology (the design of caGRID is NOT specific to caBIG related systems) • caGRID will eventually interoperate with Science Commons for accessing legal data access agreements

  10. Problems with caGRID • It is currently difficult to use caGRID on IDR projects. The caBIG project does not currently offer a general purpose IDR software environment • It is currently difficult to translate data into a format suitable for publication over caGRID • All caGRID based systems require that shared data be encoded in one or more standard formats, which usually do not match the format of our data sources.

  11. The best of both worlds • By combining the advantages of i2b2.org and caGRID we will provide a comprehensive solution to national scale data sharing • i2b2.org provides a relatively easy way of importing source data and translating that information into one or more standard formats • caGRID supplies a secure and semantically interoperable national scale network.

  12. CTSA Collaborative Development • The effort to combine i2b2.org with caGRID is a collaborative effort involving several CTSA sites • i2b2.org was first released as open source by Partner’s Health, and its community includes many CTSA award sites, including Harvard Med, UCSF, UCD, U Washington, Cincinnati Children’s, UT Houston, Rochester, and UPenn, among others

  13. Ontology Mapper Cell • The Ontology Mapper Cell within i2b2 is a general purpose instance mapper which can translate messy local data into one or more standard formats. In other words, the Ontology Mapper maps local data into ontologies • Maps will be created and annotated in a Protégé Prompt plug-in and can be shared over HL7 CTS II either as open source or as commercially sold assets • Maps contain routing information, provenance information and a scriptlet payload of SQL, Perl, SPARQL, Horn or R • The Ontology Mapper Cell within i2b2 is a collaborative effort involving UCSF, UCD, Rochester, UPenn, and U Washington • This has been a highly active collaborative effort which is now in an Alpha release cycle
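The slide says a map carries routing, provenance information and a scriptlet payload in one of five languages. As a rough illustration only, the sketch below models such a map as a Python record. The class name `InstanceMap`, its field names and the `validate` helper are all hypothetical; the actual Ontology Mapper uses XML based map definitions whose schema is not given in this presentation.

```python
from dataclasses import dataclass

# Scriptlet languages listed on the slide.
SUPPORTED_LANGUAGES = {"SQL", "Perl", "SPARQL", "Horn", "R"}


@dataclass
class InstanceMap:
    """Hypothetical in-memory form of one instance map (field names are assumptions)."""
    source_concept: str   # routing: local concept the map consumes
    target_concept: str   # routing: standard concept the map produces
    author: str           # provenance: who wrote the map
    version: str          # provenance: which revision this is
    language: str         # language of the scriptlet payload
    scriptlet: str        # the payload itself


def validate(instance_map: InstanceMap) -> None:
    """Reject maps whose payload language is not one of the five listed."""
    if instance_map.language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported scriptlet language: {instance_map.language}")
    if not instance_map.scriptlet.strip():
        raise ValueError("scriptlet payload is empty")


# Illustrative map: the concept names and SQL text are invented.
example_map = InstanceMap(
    source_concept="local/lab/GLU_SERUM",
    target_concept="standard/lab/glucose",
    author="example-author",
    version="0.1",
    language="SQL",
    scriptlet="SELECT patient_id, value FROM local_fact WHERE local_code = 'GLU_SERUM'",
)
validate(example_map)
```

Keeping routing and provenance next to the payload is what would let a map be shared between institutions as a self-describing unit.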

  14. caGRID Cell • The caGRID Cell is a development project which is a collaboration of OSU (Ohio State) and UCSF • This component allows any i2b2 data mart, which has been translated into standard format by the Ontology Mapper, to share data over caGRID • This system will allow i2b2 to share data (a federated query) across any caGRID based data source (not just between other i2b2 instances)

  15. Query Interfaces • caGRID based query: Work is under way to create a caGRID based query interface for the HSDB project (Wash U) • i2b2 based query: This environment will be implemented as a plug-in for the i2b2 SHRINE environment

  16. Five pilot projects under way • There are currently FIVE data sharing projects which have all based their architectures on this work • HSDB (Human Studies Database – Ida Sim) – The project for which this i2b2-caGRID architecture was first developed; it shares clinical research metadata – UCSF, Mayo Clinic, Wash U, UTSW, UCD • QSN (The Quality Safety Network – Andy Auerbach) – A national network of payer and IDR derived quality data – UCSF, Tufts, Northwestern, Kaiser, Michigan and 17 Payers • STIRS (Cardiovascular Imaging Research Grid – Max Wintermark) – UCSF, Georgetown, UCLA, Sutter Health Corp • CHORI (Collab for Oral Health-Related Informatics – Joel White) – UCSF, Harvard, UT Houston • DBRD (Distributed Biobank for Rare Diseases – Jennifer Puck) – UCSF, UT Southwestern, Emory, Duke • Total number of unique sites: 37 • Number of sites already involved with the CTSA: 20 (almost all of these sites are heavily involved with at least one of these grid projects)

  17. So how does it work? • STEP 1 • First, data is ETL’ed (extract, transform, load) into the i2b2 schema • The i2b2 schema is based on Concept Table design, which is a derivative of fact table design. • In concept table design each ‘name’ in the fact table is a hierarchical string of concepts • This architecture can be used to import (ETL) source data in any encoding without requiring data standardization as a data governance task
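To make the idea of a "hierarchical string of concepts" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, `/` separator and sample rows are all assumptions chosen for illustration; this is not the actual i2b2 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_path TEXT PRIMARY KEY, name TEXT);
CREATE TABLE fact (patient_id INTEGER, concept_path TEXT, value TEXT);
""")

# Each concept is named by its full path in the hierarchy (illustrative paths).
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    ("/Labs/Chemistry/Glucose/", "Glucose"),
    ("/Labs/Chemistry/Sodium/", "Sodium"),
    ("/Diagnoses/Endocrine/Diabetes/", "Diabetes"),
])

# Facts reference a concept path; values are stored as-is, in the local encoding.
conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", [
    (1, "/Labs/Chemistry/Glucose/", "182 mg/dL"),
    (1, "/Diagnoses/Endocrine/Diabetes/", "yes"),
    (2, "/Labs/Chemistry/Sodium/", "140 mmol/L"),
])


def facts_under(prefix):
    """Return every fact whose concept path lies under the given branch."""
    cursor = conn.execute(
        "SELECT patient_id, concept_path, value FROM fact "
        "WHERE concept_path LIKE ? ORDER BY patient_id, concept_path",
        (prefix + "%",),
    )
    return cursor.fetchall()


lab_facts = facts_under("/Labs/")
```

Because the hierarchy lives in the path string, adding a new kind of source data means adding rows, not columns, which is why no up-front standardization of the source encoding is needed at load time.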

  18. Concept Table Design

  19. So how does it work? • STEP 2 • As data is imported it is translated into one or more standard formats with the Ontology Mapper Cell. • The Ontology Mapper uses HL7 CTS II shareable data translation rules to translate local data into one or more standard formats. (It is a general purpose instance mapper.) • One-to-one maps, aggregates and archetype generation are all supported. • The Ontology Mapper then publishes data into a data mart. Ontology Mapper data marts are database Views which can be ‘materialized’ into physical data marts if required.
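The sketch below illustrates a one-to-one map published as a database view and then materialized, again with Python's `sqlite3`. It is only a conceptual sketch: the table names, the local codes and the `STD:` codes are invented, and the real Ontology Mapper applies shareable translation rules rather than a simple lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE local_fact (patient_id INTEGER, local_code TEXT, value REAL);
CREATE TABLE code_map (local_code TEXT PRIMARY KEY, standard_code TEXT);

INSERT INTO local_fact VALUES (1, 'GLU_SERUM', 182.0);
INSERT INTO local_fact VALUES (2, 'NA_SERUM', 140.0);
INSERT INTO local_fact VALUES (3, 'UNMAPPED', 1.0);

-- One-to-one map from local codes to (invented) standard codes.
INSERT INTO code_map VALUES ('GLU_SERUM', 'STD:GLUCOSE');
INSERT INTO code_map VALUES ('NA_SERUM', 'STD:SODIUM');

-- The data mart is a view: translated data is computed on demand.
CREATE VIEW data_mart AS
    SELECT f.patient_id, m.standard_code, f.value
    FROM local_fact AS f
    JOIN code_map AS m ON m.local_code = f.local_code;

-- 'Materializing' copies the current view contents into a physical table.
CREATE TABLE data_mart_materialized AS SELECT * FROM data_mart;
""")

QUERY = "SELECT patient_id, standard_code, value FROM {} ORDER BY patient_id"

# A fact loaded after materialization appears in the view but not in the copy.
conn.execute("INSERT INTO local_fact VALUES (4, 'GLU_SERUM', 95.0)")
view_rows = conn.execute(QUERY.format("data_mart")).fetchall()
materialized_rows = conn.execute(QUERY.format("data_mart_materialized")).fetchall()
```

The source data in `local_fact` is never modified, which matches the earlier requirement that researchers keep access to the unmodified source for verification.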

  20. So how does it work? • STEP 3 • The Ontology Mapper translates data into an ISO/IEC 11179 compliant data model • The Ontology Mapper Cell then publishes that data as a data mart (a View within the underlying database) with permissions within i2b2 aligned with the study protocol • Each data model is checked into the caDSR (data standards repository) to serve as a common standard reference • The caGRID Cell then provides a grid data service (created with the Introduce tool) which automatically performs the EAV to object relational transform needed for i2b2 based data to be interoperable over caGRID • Data can then be queried via standard caGRID tools or via custom caGRID query environments if required (permissions are handled via Grid Grouper) • Queries can be both intra and inter institutional
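The EAV (entity-attribute-value) to object transform mentioned above can be sketched in a few lines of Python. This is a conceptual illustration only and not the caGRID Cell's implementation: the `LabPanel` class, the `ATTRIBUTE_FOR_CODE` table and the `STD:` codes are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabPanel:
    """Hypothetical object model: one typed record per patient."""
    patient_id: int
    glucose: Optional[float] = None
    sodium: Optional[float] = None


# Which object attribute each (invented) standard code populates.
ATTRIBUTE_FOR_CODE = {"STD:GLUCOSE": "glucose", "STD:SODIUM": "sodium"}


def eav_to_objects(rows):
    """Pivot (entity, attribute, value) rows into one object per entity."""
    by_patient = {}
    for patient_id, code, value in rows:
        attribute = ATTRIBUTE_FOR_CODE.get(code)
        if attribute is None:
            continue  # codes outside the object model are skipped
        if patient_id not in by_patient:
            by_patient[patient_id] = LabPanel(patient_id)
        setattr(by_patient[patient_id], attribute, value)
    return [by_patient[key] for key in sorted(by_patient)]


eav_rows = [
    (1, "STD:GLUCOSE", 182.0),
    (1, "STD:SODIUM", 138.0),
    (2, "STD:SODIUM", 140.0),
    (2, "STD:OTHER", 7.0),
]
panels = eav_to_objects(eav_rows)
```

The narrow EAV layout is convenient for loading heterogeneous data, while the object form gives each shared record a fixed set of named attributes that can be tied to a registered data model.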

  21. Combining i2b2 and caGRID • By combining these techniques we can achieve the goal of a national scale semantically interoperable data sharing network within the CTSA • This is a national collaborative effort involving many CTSA and caBIG based sites around the country • By working together as a team we are better equipped to achieve our goals of launching IDRs and sharing research information.

  22. Thank you • Questions please • A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, Partner’s Health and many others. If you are interested in becoming a contributing member to this effort please contact rob.wynden@ucsf.edu
