i2b2 Clinical Research Chart and Hive Architecture
Henry Chueh, Shawn Murphy, Isaac Kohane (PI)
i2b2 National Center for Biomedical Computing
Summary
• Background
• Introduction to the Clinical Research Chart (CRC)
• Hive / Cell Software Architecture
• More details on establishing and using the CRC
Background
• Clinical documentation is… clinical
• There is no systematic approach to organizing clinical data for research
• Ownership issues are unique
• Consent issues are a challenge
Driving Biological Projects
• Asthma
• Hypertension
• Huntington’s Disease
• Diabetes
Clinical Research Chart (CRC)
• Organize and transform clinical data to maximize its utility for research
• Develop an application and database framework to serve this goal
• Establish an architecture that allows data from different studies done on this platform to be integrated
Design of Clinical Research Chart (diagram)
• Services: Ontology, Consent/Tracking, Application Pool, Management (SOAP/HTTP interfaces)
• Inputs: HL7 messages (MSH|^/&|736401….. PID|102|3231285.….), text files, XML (<Patient1> <image>.….), databases, and clinical trials
• Data flows through a data pipeline/workflow application (a program or custom interfaces) into the Phenotype/Genotype Database (CRC DB)
• Visualization and analysis of database contents
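HL7 v2 messages such as the one in the diagram are made of pipe-delimited segments (MSH, PID, and so on). A minimal sketch of splitting segments into fields and components before loading, assuming the conventional `|` field separator and `^` component separator; the sample message is hypothetical and does not reconstruct the elided message above.

```python
def parse_segment(segment: str, field_sep: str = "|", comp_sep: str = "^") -> dict:
    """Split one HL7 v2 segment into its segment ID and a list of fields.

    Each field is split further into components on comp_sep. Simplification:
    real MSH segments count the field separator itself as MSH-1 and declare
    their own encoding characters in MSH-2; this sketch ignores both.
    """
    parts = segment.split(field_sep)
    return {"id": parts[0], "fields": [f.split(comp_sep) for f in parts[1:]]}


def parse_message(message: str) -> list:
    """Segments are separated by carriage returns; newlines are accepted too."""
    lines = message.replace("\r", "\n").split("\n")
    return [parse_segment(line) for line in lines if line.strip()]


# Hypothetical sample message; fields[0] is PID-1, so PID-3 is fields[2].
sample = "PID|1||12345^^^HOSP||Doe^Jane||19700101|F\rOBX|1|NM|GLU^Glucose||98|mg/dL"
segments = parse_message(sample)
print(segments[0]["id"], segments[0]["fields"][2][0], segments[0]["fields"][4])
```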
i2b2 Skeletal Data Flow (diagram)
• Enterprise Systems: Registration, ADT, Labs, Reports, Clinical Notes, etc.
• Local Systems: systems not gathered into enterprise data warehouses
• Enterprise data source (RPDR), i2b2 ETL workflow, Clinical Research Chart, shared data, and study-specific data
• EDC Service and EDC applications; Annotation Service and Annotation UI; analytic workflow
Overall Themes
• A framework that allows application services to be developed in a maximally decoupled fashion
• Linux and Windows OS support
• Java and C++ programming languages
• Use cases for construction of the CRC come from the Driving Biology Projects and from experience with clients of the Partners Research Patient Data Registry
Focus on Workflow
• Necessary for both pre-CRC and post-CRC processes
• Needed for scientific flexibility
• Implies a consistent environment for data pipelining and flow control
i2b2 Hive
• Formed as a collection of interoperable Cells, or services
• Loosely coupled
• Makes no assumptions about proximity
• Connected by Web services
• Activated by a workflow engine that forms the basis of choreography among Cells for complex interactions
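The Hive idea, loosely coupled Cells invoked in sequence by a workflow engine, can be sketched in-process. In this sketch each Cell is a plain Python callable standing in for a Web service call; the `Cell` class, the cell names, the code `HYPOTHETICAL-001`, and the sequential `choreograph` runner are hypothetical illustrations, not the i2b2 API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cell:
    """A hypothetical stand-in for a Cell exposed as a Web service."""
    name: str
    handler: Callable[[dict], dict]

    def invoke(self, message: dict) -> dict:
        # In a real Hive this would be a SOAP/HTTP request; here it is a local call.
        return self.handler(message)


def choreograph(cells: list, message: dict) -> tuple:
    """Run cells in order, feeding each one's output to the next, and record the path."""
    trace = []
    for cell in cells:
        message = cell.invoke(message)
        trace.append(cell.name)
    return message, trace


# Two toy cells: one extracts a concept from a note, one maps it to a (hypothetical) code.
extract = Cell("concept-extraction",
               lambda m: {**m, "concept": "asthma" if "asthma" in m["note"].lower() else None})
encode = Cell("vocabulary-mapping",
              lambda m: {**m, "code": {"asthma": "HYPOTHETICAL-001"}.get(m["concept"])})

result, path = choreograph([extract, encode], {"note": "History of Asthma since childhood."})
print(path, result["code"])
```

Because each Cell only sees the message it is handed, cells can be reordered or replaced without changing the others, which is the loose coupling the slide describes.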
Complex choreography (diagram)
i2b2 Cell
• Behaves as a functional service
• Separates interactions conceptually into transactions and semantics
• Focuses on facilitating transactions with simple semantics (e.g., datatype)
• Leaves deep semantics to be defined by the services a Cell provides
• Does not restrict the implementation language
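One way to read "simple semantics (e.g., datatype)" is a message envelope whose transport layer checks only a declared datatype, while the payload's deeper meaning is left to the receiving Cell. A minimal sketch under that assumption; the JSON envelope, its field names, the datatype set, and `word_count_cell` are all hypothetical.

```python
import json

# Datatypes the transport layer knows how to validate (hypothetical set).
SIMPLE_TYPES = {"text": str, "integer": int, "number": (int, float)}


def wrap(datatype: str, payload) -> str:
    """Build an envelope; only the datatype is checked, not the payload's meaning."""
    expected = SIMPLE_TYPES.get(datatype)
    if expected is None:
        raise ValueError(f"unknown datatype: {datatype}")
    if not isinstance(payload, expected) or isinstance(payload, bool):
        raise TypeError(f"payload is not {datatype}")
    return json.dumps({"datatype": datatype, "payload": payload})


def unwrap(envelope: str) -> tuple:
    msg = json.loads(envelope)
    return msg["datatype"], msg["payload"]


def word_count_cell(envelope: str) -> str:
    """A toy Cell: interpreting the text (deep semantics) is this Cell's job."""
    datatype, text = unwrap(envelope)
    if datatype != "text":
        raise TypeError("word_count_cell expects text")
    return wrap("integer", len(text.split()))


print(unwrap(word_count_cell(wrap("text", "patient reports mild wheezing"))))
```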
Layer diagram, labeled "Target layer for i2b2": Semantic Objects, i2b2 platform, Web Services, TCP/IP
Cell examples
• Concept extraction from clinical narratives
• Simple transformations, e.g., basic text format conversion
• Complex encoding, e.g., encoding MIAME in MAGE
• Microarray data normalization
• …
Exposing Cells
• Protocols layered on top of SOAP
• At the WSDL level for integrators, i.e., bioinformaticians and software engineers
• At a functional level for investigators
• i2b2 toolkits (Automator) that allow integrators to expose controlled functionality to investigators
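At the WSDL level, integrators exchange SOAP messages with a Cell. A minimal sketch of building and reading a SOAP 1.1 envelope with the standard library; the operation name `extractConcepts`, the parameter `noteText`, and the namespace `urn:example:i2b2-cell` are hypothetical, not taken from any i2b2 WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CELL_NS = "urn:example:i2b2-cell"                        # hypothetical Cell namespace
ET.register_namespace("soap", SOAP_ENV)


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation and its parameters in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{CELL_NS}}}{operation}")
    for key, value in params.items():
        ET.SubElement(op, f"{{{CELL_NS}}}{key}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_param(envelope: bytes, operation: str, key: str) -> str:
    """What a receiving Cell would do: find the operation element and read a parameter."""
    root = ET.fromstring(envelope)
    op = root.find(f"{{{SOAP_ENV}}}Body/{{{CELL_NS}}}{operation}")
    return op.find(f"{{{CELL_NS}}}{key}").text


request = build_request("extractConcepts", {"noteText": "Denies tobacco use."})
print(request.decode("utf-8"))
```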
Automator Approach (diagram)
• Extend the Kepler workflow engine
• Diagram labels: informaticians, i2b2 Automator, investigators
Bird’s eye view (diagram): Investigator, Portal, Workflow engine, CRC Repository
Current Implementation
• Extending the Kepler workflow engine for i2b2
• Data model for the CRC repository
• Defining the protocols necessary for interaction (in addition to SOAP)
• Created a Cell for concept extraction from narratives
• Early designs for the Automator toolkit
i2b2 Architecture Key Points
• Leverage existing workflow standards and software
• Use Web services as the basic form of interaction
• Assume unlimited choreography, but…
• Provide tools to distill complexity into basic automation for clinical investigators
SW Licensing and Distribution
• Commit to Open Source software
• Use the GNU Lesser General Public License
• Establish a local i2b2 repository exposed through the i2b2 website
• Contribute to a more global NCBC SourceForge-style repository if one emerges (NIH Forge?)
• Keep i2b2 protocols fully open
Interoperability across NCBC
• Strongly consider Web services as the basic protocol for generic shared interactions
• Consider sharing datasets
• Promote diversity of approach and use of shared software (don’t impose uniformity)
• Facilitate and promote NCBC Open Source project teams
Pre-CRC Data Pipeline/Workflow: Populating the Clinical Research Chart (CRC)
Pre-CRC Data Pipeline/Workflow
• Use the workflow framework to choreograph application services in specific sequences
• Used to extract, transform, conform, and load data and metadata into the CRC
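The four stages named above (extract, transform, conform, load) can be chained as a simple pipeline. A minimal sketch, assuming CSV source rows, a hypothetical local-to-standard code map for the conform step, and a hypothetical `fact` table as the load target; in i2b2 the stages would be choreographed by the workflow engine rather than called directly.

```python
import csv
import io
import sqlite3

# Hypothetical mapping from local lab codes to a shared vocabulary.
LOCAL_TO_STANDARD = {"GLU": "LAB:GLUCOSE", "HBA1C": "LAB:HBA1C"}


def extract(text: str) -> list:
    """Read raw rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Normalize field names, whitespace, case, and types."""
    return [{"patient": r["mrn"].strip(), "code": r["test"].strip().upper(),
             "value": float(r["result"])} for r in rows]


def conform(rows: list) -> list:
    """Map local codes to the shared vocabulary; drop rows with unmapped codes."""
    return [{**r, "code": LOCAL_TO_STANDARD[r["code"]]}
            for r in rows if r["code"] in LOCAL_TO_STANDARD]


def load(rows: list, conn: sqlite3.Connection) -> int:
    """Insert conformed rows and report how many were loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact (patient TEXT, code TEXT, value REAL)")
    conn.executemany("INSERT INTO fact VALUES (:patient, :code, :value)", rows)
    return len(rows)


source = "mrn,test,result\n001,glu,98\n002,HbA1c,6.1\n003,XYZ,1\n"
conn = sqlite3.connect(":memory:")
loaded = load(conform(transform(extract(source))), conn)
print(loaded, conn.execute("SELECT code, value FROM fact ORDER BY patient").fetchall())
```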
Pre-CRC Data Pipeline/Workflow (diagram)
• Services: Ontology, Consent/Tracking, Application Pool, Management (SOAP/HTTP interfaces)
• Data flows from input to output through programs and custom interfaces, run locally or through a SOAP service
• Diagram annotation: "increasingly useful"
Ontology Service
• Manages mappings of terms to common vocabularies
• Provides lists of acceptable (enumerated) values for various attribute and value slots
• Allows management of hierarchies, groupings, and relationships between terms
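The three responsibilities listed above can be illustrated with a small in-memory ontology. A minimal sketch; the `Ontology` class, its methods, the terms, and the `smoking_status` slot are hypothetical placeholders, not a real vocabulary or the i2b2 ontology API.

```python
from typing import Optional


class Ontology:
    """Hypothetical in-memory ontology: synonyms, a term hierarchy, enumerated values."""

    def __init__(self) -> None:
        self.preferred = {}  # lower-cased term or synonym -> preferred term
        self.parent = {}     # term -> broader term (assumed acyclic)
        self.allowed = {}    # attribute slot -> set of enumerated values

    def add_term(self, term: str, parent: Optional[str] = None, synonyms: tuple = ()) -> None:
        self.preferred[term.lower()] = term
        for synonym in synonyms:
            self.preferred[synonym.lower()] = term
        if parent is not None:
            self.parent[term] = parent

    def normalize(self, text: str) -> Optional[str]:
        """Map a local term to the preferred vocabulary term, or None if unknown."""
        return self.preferred.get(text.strip().lower())

    def ancestors(self, term: str) -> list:
        """Walk up the hierarchy from a term to its root."""
        chain = []
        while term in self.parent:
            term = self.parent[term]
            chain.append(term)
        return chain

    def is_allowed(self, attribute: str, value: str) -> bool:
        """Check a value against the enumerated list for an attribute slot."""
        return value in self.allowed.get(attribute, set())


onto = Ontology()
onto.add_term("Respiratory disease")
onto.add_term("Asthma", parent="Respiratory disease", synonyms=("bronchial asthma",))
onto.add_term("Status asthmaticus", parent="Asthma")
onto.allowed["smoking_status"] = {"current", "former", "never"}
print(onto.normalize("Bronchial Asthma "), onto.ancestors("Status asthmaticus"))
```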
Person Consent/Tracking Service
• Provides mappings between patient/subject identifiers
• Tracks patient/subject consent information
• Allows identification of the patient/subject based on fuzzy demographic matches
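Fuzzy demographic matching can be approximated by scoring name similarity while requiring other fields to agree. A minimal sketch using the standard library's `difflib.SequenceMatcher`; the record fields, the weights, the 0.85 threshold, and the subject IDs are hypothetical choices, and a production service would use a validated record-linkage method.

```python
from difflib import SequenceMatcher
from typing import Optional


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec: dict, candidate: dict) -> float:
    """Weighted score: names compared fuzzily, birth date and sex exactly (hypothetical weights)."""
    score = 0.5 * name_similarity(rec["last"], candidate["last"])
    score += 0.2 * name_similarity(rec["first"], candidate["first"])
    score += 0.2 * (rec["dob"] == candidate["dob"])
    score += 0.1 * (rec["sex"] == candidate["sex"])
    return score


def find_subject(rec: dict, registry: dict, threshold: float = 0.85) -> Optional[str]:
    """Return the best-matching subject id, or None if no score reaches the threshold."""
    best_id, best = None, 0.0
    for subject_id, candidate in registry.items():
        s = match_score(rec, candidate)
        if s > best:
            best_id, best = subject_id, s
    return best_id if best >= threshold else None


registry = {
    "S-001": {"last": "Johnson", "first": "Maria", "dob": "1980-02-14", "sex": "F"},
    "S-002": {"last": "Jonson", "first": "Mark", "dob": "1975-07-01", "sex": "M"},
}
incoming = {"last": "Jonhson", "first": "Maria", "dob": "1980-02-14", "sex": "F"}  # misspelled
print(find_subject(incoming, registry))
```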
Application Pool (CVS) Service
• Stores programs/scripts used in the pipeline
• Provides applications to be downloaded when needed
• Manages versioning of software
• Provides documentation
Management Service
• Stores the workflow execution plan
• Starts and controls workflow execution
• Schedules workflow execution
• Monitors workflow execution and data locations
• Controls permissions associated with workflow execution
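A stored workflow execution plan can be represented as a dependency graph and executed in an order that respects it, after a permission check. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the step names echo the asthma use case, but the dependencies between them, the `pipeline_operator` role, and the `run_plan` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: step -> set of steps it depends on.
PLAN = {
    "data_retrieval": set(),
    "language_processing": {"data_retrieval"},
    "vocabulary_matching": {"language_processing"},
    "de_identification": {"data_retrieval"},
    "load_into_mart": {"vocabulary_matching", "de_identification"},
}


def run_plan(plan: dict, user_roles: set, required_role: str = "pipeline_operator") -> list:
    """Check permission, then execute steps in dependency order and log their status."""
    if required_role not in user_roles:
        raise PermissionError("user may not execute this workflow")
    log = []
    for step in TopologicalSorter(plan).static_order():
        # A real Management service would dispatch the step, then monitor it and its data location.
        log.append((step, "done"))
    return log


order = [step for step, _ in run_plan(PLAN, {"pipeline_operator"})]
print(order)
```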
Data Pipeline/Workflow Application: Use Case for Asthma (diagram)
• Services: Ontology, Consent/Tracking, Application Pool, Management (SOAP/HTTP interfaces)
• Data stores: RPDR, CRC DB, AsthmaMart
• Processing steps: data retrieval, language processing, data de-identification, vocabulary matching, and loading data into the mart
Data Pipeline/Workflow Implementation
• Define a standard XML representation for workflow (MoML)
• Define standards for SOAP services and resource discovery
• Adopt and extend an open source workflow package (Kepler)
• Prototypes by the July timeframe
• BIRN -> NAMIC and LONI collaboration
• Construction details can be followed at http://diagon/i2b2
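Kepler stores workflows as MoML, the XML dialect of Ptolemy II, in which actors are `entity` elements, connections are `relation` elements, and `link` elements attach actor ports to relations. A simplified sketch that emits MoML-style XML for a linear pipeline; it omits the DOCTYPE, director, and parameters a real Kepler workflow needs, and the actor class names and the `input`/`output` port names are hypothetical.

```python
import xml.etree.ElementTree as ET


def linear_workflow(name: str, actors: list) -> ET.Element:
    """actors: list of (actor_name, class_name); each actor's output feeds the next actor's input."""
    top = ET.Element("entity", name=name, **{"class": "ptolemy.actor.TypedCompositeActor"})
    for actor_name, class_name in actors:
        ET.SubElement(top, "entity", name=actor_name, **{"class": class_name})
    for i, ((src, _), (dst, _)) in enumerate(zip(actors, actors[1:]), start=1):
        rel = f"relation{i}"
        # Declare the relation, then link the upstream output and downstream input to it.
        ET.SubElement(top, "relation", name=rel, **{"class": "ptolemy.actor.TypedIORelation"})
        ET.SubElement(top, "link", port=f"{src}.output", relation=rel)
        ET.SubElement(top, "link", port=f"{dst}.input", relation=rel)
    return top


wf = linear_workflow("asthma_pipeline", [
    ("retrieve", "example.RetrieveActor"),      # hypothetical actor classes
    ("deidentify", "example.DeidentifyActor"),
    ("load", "example.LoadActor"),
])
print(ET.tostring(wf, encoding="unicode"))
```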
Phenotype/Genotype Database
Phenotype/Genotype Database Principles
• An analytical database schema that does not need to change with new data types and concepts
• The fundamental unit of data (the atomic fact) is defined as the observation
• A defined metadata strategy
• Various levels of de-identification (reviewed and approved by the IRB)
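An observation-centered schema keeps the table structure fixed: a new data type becomes new concept codes (rows) rather than new columns. A minimal sketch in SQLite, assuming a single `observation` table with patient, concept, date, and value columns plus a `concept` metadata table; the table names, columns, and codes are hypothetical and this is not the published CRC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observation (
    patient_id   TEXT NOT NULL,
    concept_code TEXT NOT NULL,  -- hypothetical codes such as 'DX:ASTHMA'
    obs_date     TEXT NOT NULL,  -- ISO-8601 date
    value_text   TEXT,           -- used when the fact carries a coded or text value
    value_num    REAL            -- used when the fact carries a number
);
CREATE TABLE concept (           -- metadata: what each code means
    concept_code TEXT PRIMARY KEY,
    name         TEXT NOT NULL
);
""")

# A diagnosis, a lab value, a genotype, and a smoking status share one table.
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    ("DX:ASTHMA", "Asthma"), ("LAB:IGE", "Serum IgE"),
    ("SNP:EXAMPLE", "Example SNP"), ("SOC:SMOKING", "Smoking status"),
])
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", [
    ("P1", "DX:ASTHMA", "2005-03-04", None, None),
    ("P1", "LAB:IGE", "2005-03-04", None, 250.0),
    ("P1", "SNP:EXAMPLE", "2005-04-06", "A/G", None),
    ("P1", "SOC:SMOKING", "2005-03-04", "former", None),
])

rows = conn.execute("""
    SELECT c.name, o.obs_date, COALESCE(o.value_text, o.value_num)
    FROM observation o JOIN concept c USING (concept_code)
    WHERE o.patient_id = 'P1'
    ORDER BY o.obs_date, c.name
""").fetchall()
print(rows)
```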
Phenotype/Genotype Database Architecture (see preprint)
Phenotype/Genotype Database Use Case
• Smoking observations represented in the database
Phenotype/Genotype Database Implementation
• Asthma CRC DB “primed” with data from 90,000 patients from the Research Patient Data Registry
• Serves as the fundamental data structure for the i2b2-supported Querying and Visualization Application Suite
• CRC DBs can be fused seamlessly
• Various levels of de-identification to be supported for data sharing and publication
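Supporting several de-identification levels can be pictured as applying progressively stronger transformations to the same observation rows. A minimal sketch under assumed levels (0 = identified, 1 = pseudonymized IDs, 2 = pseudonymized IDs plus a per-patient date shift); these levels, the salted SHA-256 pseudonyms, the ±180-day shift range, and the salt value are hypothetical and are not an IRB-approved scheme.

```python
import hashlib
import random
from datetime import date, timedelta


def pseudonym(patient_id: str, salt: str) -> str:
    """Stable stand-in ID derived from a secret salt (salted SHA-256, truncated)."""
    return hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()[:12]


def deidentify(rows: list, level: int, salt: str = "hypothetical-site-secret") -> list:
    """rows are (patient_id, concept, iso_date) tuples; returns transformed copies."""
    out = []
    for patient_id, concept, iso_date in rows:
        new_id = pseudonym(patient_id, salt) if level >= 1 else patient_id
        day = date.fromisoformat(iso_date)
        if level >= 2:
            # One shift per patient, keyed by the secret salt, so intervals
            # between that patient's events are preserved.
            shift = random.Random(salt + ":shift:" + patient_id).randint(-180, 180)
            day += timedelta(days=shift)
        out.append((new_id, concept, day.isoformat()))
    return out


rows = [("MRN123", "DX:ASTHMA", "2005-03-04"), ("MRN123", "LAB:IGE", "2005-03-10")]
for level in (0, 1, 2):
    print(level, deidentify(rows, level))
```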
Visualization and Analysis of the CRC Database: Post-CRC Workflow
Visualization and Analysis Principles
• A supported application suite to query and view CRC database contents
• Outside applications for analysis and viewing can plug in to the application suite
• The pipeline/workflow framework may be used for analysis and for re-entry of derived data into the CRC database
Visualization and Analysis Architecture
• Supported applications for querying and visualization:
• Standard querying
• Data exploration
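Standard querying against an observation table often means finding the patients who have every one of several concepts. A minimal sketch of such an AND query using SQL `GROUP BY ... HAVING`; the table layout, the concept codes, and the `patients_with_all` helper are hypothetical and self-contained here.

```python
import sqlite3


def patients_with_all(conn: sqlite3.Connection, concepts: list) -> list:
    """Return patient ids with at least one observation for every listed (distinct) concept."""
    placeholders = ",".join("?" for _ in concepts)  # one bound parameter per concept
    sql = f"""
        SELECT patient_id FROM observation
        WHERE concept_code IN ({placeholders})
        GROUP BY patient_id
        HAVING COUNT(DISTINCT concept_code) = ?
        ORDER BY patient_id
    """
    return [row[0] for row in conn.execute(sql, [*concepts, len(concepts)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (patient_id TEXT, concept_code TEXT, obs_date TEXT)")
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", [
    ("P1", "DX:ASTHMA", "2005-03-04"), ("P1", "SOC:SMOKER", "2005-03-04"),
    ("P2", "DX:ASTHMA", "2005-05-02"),
    ("P3", "SOC:SMOKER", "2005-03-09"), ("P3", "DX:ASTHMA", "2005-03-09"),
    ("P3", "DX:ASTHMA", "2005-06-01"),
])
print(patients_with_all(conn, ["DX:ASTHMA", "SOC:SMOKER"]))
```

Counting distinct concepts per patient keeps repeated observations (P3 has two asthma rows) from inflating the match.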
Visualization and Analysis Architecture
• Supported applications for ontology management:
• Ontology management
• Integrate (outside?) population analysis applications
Visualization and Analysis Architecture
• Supported applications have a plug-in architecture for outside analytic tools:
• Standard web-link support with GET- and POST-oriented data transfer
• Support for transferring specifically transformed data to outside applications
• Complex analysis supported with the workflow application
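The web-link plug-in style described above passes data to an outside tool either in a GET query string or in a POST body. A minimal sketch using `urllib` to build (but not send) both kinds of request; the tool URL `https://analysis.example.org/run`, the parameter names, and the JSON body format are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

TOOL_URL = "https://analysis.example.org/run"  # hypothetical outside analytic tool


def get_link(patient_set_id: str, concept: str) -> str:
    """Small selections travel as query-string parameters on a plain web link."""
    return f"{TOOL_URL}?{urlencode({'patient_set': patient_set_id, 'concept': concept})}"


def post_request(rows: list) -> Request:
    """Larger, transformed data sets travel in a POST body (JSON here, by assumption)."""
    body = json.dumps({"observations": rows}).encode("utf-8")
    return Request(TOOL_URL, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


link = get_link("set-42", "DX:ASTHMA")
req = post_request([{"patient": "a1b2", "concept": "DX:ASTHMA", "date": "2005-03-04"}])
print(link, parse_qs(urlparse(link).query))
print(req.get_method(), req.full_url, len(req.data))
```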
Visualization and Analysis Architecture: Query Launch (diagram)
Visualization and Analysis Architecture: Exploration Launch (diagram)
Visualization and Analysis Architecture: Ontology Management (diagram)
Visualization and Analysis Use Case
Visualization and Analysis: Implementation of Analysis Tools
• Workflow framework to accommodate external analytic applications
(Diagram labels: CRC DB; identifiers patient id 0000004, subject id 4, and account # 347; SNOMED codes SN8745 and PA5683; program IDs AA3.3, CA2.3, CN2.3, XN0.9, SN5.4, CX2.3, PN5.1, and TH3.0)
Final Assembly (diagram): statistics application servers, a population registry database, an ownership manager database, and encryption, with labels "Gene expression in APOE e4 Allele", "Outcomes calculated every week", and "microarray (encrypted)". A sample observation table (columns: person, raw value, concept, date) lists persons Z5937X and Z5956X with concepts such as Surgery, ER visits, Clinic visits, Seizures, Trauma, Gene-Chips, Alzheimer’s, Multiple sclerosis, Diabetes, CT Scan, Hemorrhage, and Thalamus.