
i2b2 Clinical Research Chart and Hive Architecture


Presentation Transcript


  1. i2b2 Clinical Research Chart and Hive Architecture • Henry Chueh • Shawn Murphy • Isaac Kohane, PI • i2b2 National Center for Biomedical Computing

  2. Summary • Background • Intro to the Clinical Research Chart (CRC) • Hive / Cell Software Architecture • More details on establishing and using the CRC

  3. Background • Clinical documentation is…clinical • Lack of systematic approach for organizing clinical data for research • Ownership issues are unique • Consent issues are a challenge

  4. Driving Biological Projects • Asthma • Hypertension • Huntington’s Disease • Diabetes

  5. Clinical Research Chart (CRC) • Organize and transform clinical data to maximize its utility for research • Develop an Application and Database framework to serve this goal • Establish an architecture that allows data from different studies done on this platform to be integrated

  6. Design of Clinical Research Chart (diagram) • Services: Ontology, Consent/Tracking, Application Pool, Management • Legend: SOAP/HTTP interfaces, data flowing, a program, custom interfaces • Inputs shown: clinical trials, HL7 (MSH|^/&|736401….. PID|102|3231285.….), text files, XML (<Patient1> <image>.….), database • Central label: CRC DB

  7. Design of Clinical Research Chart (diagram, continued) • Services: Ontology, Consent/Tracking, Application Pool, Management • Legend: SOAP/HTTP interfaces, data flowing, a program, custom interfaces • Inputs shown: clinical trials, HL7 (MSH|^/&|736401….. PID|102|3231285.….), text files, XML (<Patient1> <image>.….), database • Central label: CRC DB • Components: Data pipeline/workflow application, Pheno/Genotype Database, Visualization and Analysis of database contents

  8. i2b2 Skeletal Data Flow (diagram) • Enterprise Systems: Registration, ADT, Labs, Reports, Clinical Notes, etc. • Local Systems: systems not gathered into Enterprise data warehouses • Other labels: EDC Service, EDC applications, Shared data, Enterprise data source (RPDR), i2b2 ETL workflow, Annotation Service, Annotation UI, Study specific data, Analytic workflow, Clinical Research Chart

  9. Overall Themes • Framework to allow development of application services in a maximally decoupled fashion • Linux and Windows OS support • Java and C++ programming languages • Use cases for construction of the CRC come from the Driving Biological Projects and experience with clients of the Partners Research Patient Data Registry

  10. Focus on Workflow • Necessary for both pre-CRC and post-CRC processes • Needed for scientific flexibility • Implies a consistent environment for data pipelining and flow control

  11. i2b2 Hive • Formed as a collection of interoperable Cells, or services • Loosely coupled • Makes no assumptions about proximity • Connected by Web services • Activated by a workflow engine that forms basis of choreography among Cells for complex interactions

  12. Complex choreography

  13. i2b2 Cell • Behaves as a functional service • Separates interactions conceptually into transactions and semantics • Focuses on facilitating transactions with simple semantics (e.g., datatype) • Leaves deep semantics to be defined by the services provided by a Cell • Does not restrict language implementation
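
The split between transactions and semantics on this slide can be illustrated with a minimal sketch. It assumes a hypothetical message envelope that carries only a datatype tag and an opaque payload. The names `CellMessage`, `uppercase_cell`, and `dispatch` are invented for illustration and are not part of any i2b2 API, and a real Cell would be reached over Web services rather than an in-process call.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CellMessage:
    """Hypothetical transaction envelope: a simple datatype tag plus a payload."""
    datatype: str  # simple semantics, e.g. "text/plain"
    payload: str   # opaque to the routing layer; only the Cell interprets it


# A Cell is modelled here as a plain function from message to message.
Cell = Callable[[CellMessage], CellMessage]


def uppercase_cell(msg: CellMessage) -> CellMessage:
    """Toy Cell performing a basic text transformation."""
    if msg.datatype != "text/plain":
        raise ValueError(f"unsupported datatype: {msg.datatype}")
    return CellMessage(datatype="text/plain", payload=msg.payload.upper())


def dispatch(registry: Dict[str, Cell], name: str, msg: CellMessage) -> CellMessage:
    """Route a message to a named Cell without inspecting the payload."""
    return registry[name](msg)


registry = {"uppercase": uppercase_cell}
reply = dispatch(registry, "uppercase", CellMessage("text/plain", "asthma"))
print(reply.payload)
```

The routing function only needs the Cell name and the envelope, which is the sense in which deep semantics stay inside the Cell in this sketch.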

  14. Target layer for i2b2 (diagram) • Layers shown: Semantic Objects, i2b2 platform, Web Services, TCP/IP

  15. Cell examples • Concept extraction from clinical narratives • Simple transformations; e.g., basic text format conversion • Complex encoding; e.g., encoding MIAME in MAGE • Microarray data normalization • …

  16. Exposing Cells • Protocols layered on top of SOAP • At the WSDL level for integrators; i.e., bioinformaticians and software engineers • At a functional level for investigators • i2b2 toolkits to allow integrators to expose controlled functionality to investigators (Automator)

  17. Automator Approach (diagram) • Extend Kepler workflow engine • Labels: informaticians, i2b2 Automator, investigators

  18. Bird’s eye view (diagram) • Labels: Investigator Portal, Workflow engine, CRC Repository

  19. Current Implementation • Extending Kepler workflow engine for i2b2 • Data model for CRC repository • Defining protocols necessary for interaction (in addition to SOAP) • Created Cell for concept extraction from narratives • Early designs for Automator toolkit

  20. i2b2 Architecture Key Points • Leverage existing workflow standards and software • Use Web services as basic form of interaction • Assume unlimited choreography, but… • Provide tools to distill complexity into basic automation for clinical investigators

  21. SW Licensing and Distribution • Commit to Open Source software • Use GNU Lesser General Public License • Establish local i2b2 repository exposed through i2b2 website • Contribute to a more global NCBC SourceForge-style repository if it emerges (NIH Forge?) • Keep i2b2 protocols fully open

  22. Interoperability across NCBC • Strongly consider Web services as basic protocol for generic shared interactions • Consider sharing datasets • Promote diversity of approach and use of shared software (don’t impose uniformity) • Facilitate/promote NCBC Open Source project teams

  23. Pre-CRC Data Pipeline/Workflow: Populating the Clinical Research Chart (CRC)

  24. Pre-CRC Data Pipeline/Workflow • Use workflow framework to choreograph application services in specific sequences • Used to extract, transform, conform, and load data and metadata into the CRC
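
The extract, transform, conform, and load sequence named on this slide can be sketched as a chain of steps run in a fixed order. This is only an illustration under stated assumptions: the step functions, the record layout, and the sample rows are all made up, and an in-memory list stands in for the CRC. The slides describe a workflow engine choreographing services, which this plain loop does not attempt to reproduce.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def extract(_: List[Record]) -> List[Record]:
    # Stand-in for pulling rows from a source system (made-up data).
    return [{"subject": "4", "term": " Asthma "}, {"subject": "7", "term": "asthma"}]


def transform(records: List[Record]) -> List[Record]:
    # Clean up free-text values by trimming whitespace.
    return [{**r, "term": r["term"].strip()} for r in records]


def conform(records: List[Record]) -> List[Record]:
    # Bring values to one agreed form (here simply lower case).
    return [{**r, "term": r["term"].lower()} for r in records]


def make_loader(target: List[Record]) -> Step:
    # Build a load step that appends records to the given target store.
    def load(records: List[Record]) -> List[Record]:
        target.extend(records)
        return records
    return load


def run_pipeline(steps: Iterable[Step]) -> List[Record]:
    # Feed the output of each step into the next, in the order given.
    data: List[Record] = []
    for step in steps:
        data = step(data)
    return data


crc_rows: List[Record] = []  # in-memory stand-in for the CRC
run_pipeline([extract, transform, conform, make_loader(crc_rows)])
print(crc_rows)
```

Because each step has the same signature, steps can be reordered or swapped without touching the runner, which is one simple way to keep the sequence itself configurable.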

  25. Pre-CRC Data Pipeline/Workflow (diagram) • Services: Ontology, Consent/Tracking, Application Pool, Management • Legend: SOAP/HTTP interfaces; data flowing, local or through SOAP service; a program; custom interfaces • Labels: Input, Output, increasingly useful

  26. Ontology Service • Manages mappings of terms to common vocabularies • Provides lists of acceptable (enumerated) values for various attribute and value slots • Allows for management of hierarchies, groupings, and relationships between terms
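
The three responsibilities listed for the Ontology Service (term mappings, enumerated values, hierarchies) can be pictured with a small in-memory sketch. The class `OntologySketch`, its method names, and the codes such as `C-ASTHMA` are hypothetical and chosen only for illustration. They do not reflect the actual i2b2 ontology interface or any real vocabulary.

```python
from typing import Dict, List, Optional, Set


class OntologySketch:
    """Toy in-memory stand-in for an ontology service."""

    def __init__(self) -> None:
        self._term_to_code: Dict[str, str] = {}
        self._parent: Dict[str, str] = {}
        self._allowed: Dict[str, Set[str]] = {}

    def map_term(self, local_term: str, code: str) -> None:
        # Record a mapping from a local term to a common-vocabulary code.
        self._term_to_code[local_term.lower()] = code

    def lookup(self, local_term: str) -> Optional[str]:
        # Case-insensitive lookup; None when the term is unmapped.
        return self._term_to_code.get(local_term.lower())

    def set_parent(self, code: str, parent: str) -> None:
        # Register a hierarchy edge (assumes the hierarchy is acyclic).
        self._parent[code] = parent

    def ancestors(self, code: str) -> List[str]:
        # Walk up the hierarchy from nearest parent to root.
        chain: List[str] = []
        while code in self._parent:
            code = self._parent[code]
            chain.append(code)
        return chain

    def allow_values(self, slot: str, values: Set[str]) -> None:
        # Define the enumerated values accepted for an attribute slot.
        self._allowed[slot] = set(values)

    def is_valid(self, slot: str, value: str) -> bool:
        return value in self._allowed.get(slot, set())


onto = OntologySketch()
onto.map_term("Asthma", "C-ASTHMA")            # made-up code
onto.set_parent("C-ASTHMA", "C-RESP")          # made-up hierarchy
onto.set_parent("C-RESP", "C-DISEASE")
onto.allow_values("smoking_status", {"never", "former", "current"})
print(onto.lookup("asthma"), onto.ancestors("C-ASTHMA"))
```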

  27. Person Consent/Tracking Service • Provides mappings between patient/subject identifiers • Tracks patient/subject consent information • Allows identification of the patient/subject based upon fuzzy demographic matches
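
The slide does not say how fuzzy demographic matching is done, so the following is only one possible sketch. It assumes an exact match on birth date combined with a string-similarity score on the name, computed with the Python standard library's `difflib.SequenceMatcher`. The `Person` record, the `best_match` function, the 0.85 threshold, and the sample people are all invented for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    subject_id: str
    name: str
    birth_date: str  # ISO format, yyyy-mm-dd


def similarity(a: str, b: str) -> float:
    # Case-insensitive similarity ratio between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name: str, birth_date: str, registry: List[Person],
               threshold: float = 0.85) -> Optional[Person]:
    """Return the most similar person with the same birth date, if any
    candidate reaches the (arbitrarily chosen) threshold."""
    best: Optional[Person] = None
    best_score = 0.0
    for person in registry:
        if person.birth_date != birth_date:
            continue
        score = similarity(name, person.name)
        if score > best_score:
            best, best_score = person, score
    return best if best_score >= threshold else None


# Made-up registry entries, not real people.
people = [
    Person("4", "Jane Doe", "1970-01-01"),
    Person("9", "John Roe", "1970-01-01"),
]
match = best_match("Jane Do", "1970-01-01", people)
print(match.subject_id if match else None)
```

A production matcher would weigh several demographic fields and treat near-ties with care; the single threshold here is only meant to show the shape of the lookup.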

  28. Application Pool (CVS) Service • Stores programs/scripts used in pipeline • Provides applications to be downloaded when needed • Manages versioning of software • Provides documentation

  29. Management Service • Stores workflow execution plan • Starts and controls workflow execution • Schedules workflow execution • Monitors workflow execution and data locations • Controls permissions associated with workflow execution

  30. Data Pipeline/Workflow Application: Use Case for Asthma Data (diagram) • Services: Ontology, Consent/Tracking, Application Pool, Management • Legend: SOAP/HTTP interfaces, data flowing, a program, custom interfaces, Input, Output • Data stores: RPDR, CRC DB, AsthmaMart • Steps shown: Data retrieval, Language processing, Data de-identification, Vocabulary matching, Load Data into Mart

  31. Data Pipeline/Workflow: Implementation • Define standard XML representation for workflow (MoML) • Define standards for SOAP services and resource discovery • Adopt and extend open source workflow package (Kepler) • Prototypes by July timeframe • BIRN -> NAMIC and LONI collaboration • Can follow construction details at http://diagon/i2b2

  32. Phenotype/Genotype Database

  33. Phenotype/Genotype Database: Principles • Analytical database schema that does not need to change with new data types and concepts • Defined fundamental unit of data (atomic fact) = observation • Defined metadata strategy • Various levels of de-identification (reviewed and approved by IRB)
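
One way to read "atomic fact = observation" is that every datum is stored as a row of the same shape, so a new concept becomes a new code value rather than a new column. The sketch below assumes four fields (person, concept, date, raw value), echoing the column labels visible in the Final Assembly slide. The class name `Observation`, the concept codes, and the sample values are invented, and the real CRC schema is not shown in this transcript.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Observation:
    """Hypothetical atomic fact: one observed value for one subject."""
    subject_id: str
    concept_code: str
    observed_on: date
    raw_value: str


# Made-up facts; every concept uses the same record shape.
facts: List[Observation] = [
    Observation("4", "SMOKING_STATUS", date(2005, 3, 4), "current"),
    Observation("4", "FEV1_LITERS", date(2005, 3, 4), "2.8"),
    # A new kind of data needs no schema change, only a new concept code.
    Observation("4", "GENE_CHIP_RUN", date(2005, 4, 6), "run-17"),
]


def observations_for(all_facts: List[Observation], subject_id: str,
                     concept_code: str) -> List[Observation]:
    # Filter facts by subject and concept.
    return [f for f in all_facts
            if f.subject_id == subject_id and f.concept_code == concept_code]


smoking = observations_for(facts, "4", "SMOKING_STATUS")
print([o.raw_value for o in smoking])
```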

  34. Phenotype/Genotype Database: Architecture (see preprint)

  35. Phenotype/Genotype Database: Use Case • Smoking observations represented in database

  36. Phenotype/Genotype Database: Implementation • Asthma CRC DB “primed” with data from 90,000 patients from Research Patient Data Registry • Serves as fundamental data structure for the i2b2-supported data Querying and Visualization Application Suite • CRC DBs able to fuse seamlessly together • Various levels of de-identification to be supported for data sharing and publication

  37. Visualization and Analysis of CRC database: Post-CRC workflow

  38. Visualization and Analysis: Principles • Supported application suite to query and view CRC database contents • Outside applications for analysis and viewing able to plug in to application suite • Pipeline/Workflow framework may be used for analysis and re-entry of derived data into CRC database

  39. Visualization and Analysis: Architecture • Supported Applications, Querying and Visualization • Standard querying • Data exploration

  40. Visualization and Analysis: Architecture • Supported Applications, ontology management • Ontology Management • Integrate (outside?) population analysis applications

  41. Visualization and Analysis: Architecture • Supported applications have plug-in architecture for outside analytic tools: • Standard web-link support with GET and POST oriented data transfer • Support transfer of specifically transformed data to outside applications • Complex analysis supported with workflow application
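
As a rough illustration of GET- and POST-oriented data transfer to an outside tool, the sketch below encodes the same parameters once as a query string on a link and once as a form-encoded request body, using only `urllib.parse` from the Python standard library. The endpoint `https://analysis.example.org/launch` and the parameter names `query_id` and `format` are placeholders, not anything defined by i2b2.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def build_launch_url(base_url: str, params: Dict[str, str]) -> str:
    # GET style: parameters travel in the query string of the link.
    return f"{base_url}?{urlencode(params)}"


def build_post_body(params: Dict[str, str]) -> bytes:
    # POST style: the same parameters as a form-encoded request body.
    return urlencode(params).encode("ascii")


# Placeholder endpoint and parameter names, for illustration only.
params = {"query_id": "42", "format": "csv"}
url = build_launch_url("https://analysis.example.org/launch", params)
body = build_post_body(params)

print(url)
print(parse_qs(urlsplit(url).query))
```

GET keeps the whole request in a shareable link, while POST suits larger or transformed data sets that would not fit comfortably in a URL.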

  42. Visualization and Analysis: Architecture - Query Launch

  43. Visualization and Analysis: Architecture - Exploration Launch

  44. Visualization and Analysis: Architecture - Ontology management

  45. Visualization and Analysis: Use Case

  46. Visualization and Analysis: Implementation of analysis tools • Workflow framework to accommodate external analytic applications • Diagram labels: CRC DB; SNOMED CODE SN8745, PA5683; patient id 0000004; subject id 4; account # 347; ProgID AA3.3, CA2.3, CN2.3, XN0.9, SN5.4, CX2.3, PN5.1, TH3.0

  47. Final Assembly (diagram) • Labels: statistics, application server, population, ownership, registry, manager, database, encryption • Annotations: Gene expression in APOE e4 Allele; Outcomes calculated every week • Table columns: person, raw value, concept, date • Example entries: persons Z5937X and Z5956X; dates 3/4, 4/6, 5/2, 3/9; values including Surgery, ER visit, ER visits, Clinic visits, Seizure, Seizures, Trauma, Gene-Chips, microarray (encrypted), Alzheimer’s, Multiple sclerosis, Diabetes, CT Scan, Hemorrhage, Thalamus
