Harmonizing Terminology Steven Hirschfeld, MD PhD Captain, U.S. Public Health Service Associate Director for Clinical Research Acting Director, National Children’s Study Eunice Kennedy Shriver National Institute of Child Health and Human Development COMET International Meeting Bristol UK July 11, 2011
Disclosures • No financial or governance interests to disclose • All opinions expressed are those of the author and may not represent the views and policies of the U.S. Federal government or any of its agencies
Why terminology? • Outcomes in research are based on concepts • For example, man with pneumonia • Concepts require terms that are specific to describe them and differentiate them from other concepts • For example, man with respiratory inflammation, influenza, tuberculosis or silicosis • Terminology is the tool for precision to allow consistency and multiple analyses
What are current options? • Systematized Nomenclature of Medicine (SNOMED): in use for medical records and research • International Classification of Diseases (ICD): in use for epidemiology and reimbursement; several versions are in use concurrently • Medical Dictionary for Regulatory Activities (MedDRA): in use for therapeutic and diagnostic product development and registration • Multiple subspecialty and niche terminologies
What is the dilemma? • The major terminologies do not readily map to one another • None are robust for child health and development, particularly at the youngest ages • All are episodic in that they describe a single circumstance and do not relate concepts across a developmental time line
Abbreviations: AAP: American Academy of Pediatrics; CDC: Centers for Disease Control and Prevention; CDISC: Clinical Data Interchange Standards Consortium; EPA: Environmental Protection Agency; ICH-E11: International Conference on Harmonisation, guideline E11; SNOMED: Systematized Nomenclature of Medicine
What is new and different? • The NICHD terminology system differs from other terminology systems by incorporating into all concepts a dimension of time and position along a developmental scale to relate concepts to one another
Rationale for terminology initiative • The NICHD has an ongoing effort to establish, through stakeholder consensus, a core library of consistent and harmonized pediatric terms. Reaching stakeholder consensus on terminology will benefit pediatric clinical researchers in the following ways: • Provide the infrastructure necessary to compare and aggregate data and information. • Prevent misinterpretation. • Improve precision of data sharing. • Permit more robust meta-analysis. • Establish consistency with the health care delivery system across the NICHD’s clinical research portfolio, across the portfolios of other NIH Institutes/Centers, and with the broader research community.
Harmonization Process • The terminology harmonization process involves identifying relevant concepts, identifying terms and definitions to describe the concepts, and graphically depicting the structure of and relationships between the concepts.
What is the framework? • A model developed in the Unified Modeling Language (UML) can be used to map the concepts of interest and can be leveraged by modeling tools to efficiently and extensibly harmonize terminology
Example of UML Framework • [Figure: UML model of the Examination Tool] • Examination Tool parts: Identification & Demographics; Physical Examination; Behavioral & Neurological Examination; Biochemical/Physiologic/Genetic Examination; Imaging & Other Findings • http://nichd.nih.gov/clinres/terminology
Terminology Development Process
1. Identify concepts and reference terminology: Determine the terms that will require harmonization. Identify which of the terms are unique concepts. Reference terminology resources to find matching concepts.
2. Develop model: Develop a model for concepts as a terminology that depicts concepts and their attributes and the relationships between concepts. Prepare terminology to be incorporated into a reference terminology, such as the NCI Thesaurus.
3. Annotate model: Use terminology curation tools, such as the NCI’s Semantic Integration Workbench (SIW), to annotate the model with the reference terminology.
4. Review concepts with the community: Inform experts in the pediatric community of the terminology development effort and facilitate collaboration. Solicit input and feedback on proposed concepts from the pediatric community and harmonize with the model.
5. Load metadata and generate tools: Load metadata from the annotated model to a metadata repository, such as the NCI’s cancer Data Standards Repository (caDSR). Leverage open source clinical research tools to extract metadata to generate content-specific clinical research tools.
Visual Depiction of Terminology Development Process • [Figure: six-step cycle] 1. Trace list of sources 2. Draft tool 3. Structure concepts 4. Develop model 5. Curate Common Data Elements 6. Generate Research Tool • Figure labels: American Health Information Community Sources; Draft Examination Tool; UML Model; NCI Thesaurus; Common Data Elements browser; Final Examination Tool (Final Newborn Examination Tool, with sections (1) Demographics and (2) Physical Examination)
Core Terminology Library • NICHD terminology files are available for download from an NCI EVS ftp site (http://evs.nci.nih.gov/ftp1/NICHD/) in three formats. A textual representation of the hierarchy of NICHD terms is provided as well; the terms in this hierarchy, which are restricted to NICHD terms only, do not necessarily have a direct parent-child relationship within the NCI Thesaurus (NCIt). The Changes file is published monthly and contains all changes that have been made to NICHD content in the current production version of the NCIt when compared to the most recently posted previous file. • Instructions at: http://evs.nci.nih.gov/ftp1/NICHD/About.html
The NICHD Pediatric Terminology Metastructure provides a common information model associated with various child life stages • [Figure: concepts linked by the relationships associatedWith, hasLifeStage, resultsIn, occursIn, evidenceOf, and affects]
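One way to picture such a metastructure is as a set of subject-relation-object statements. The following is a minimal sketch, not the NICHD implementation: only the relationship names come from the slide, while the class, the direction of each relationship, and the example concepts are assumptions made for illustration.

```python
from collections import defaultdict

# Relationship names as they appear on the slide. How each one is directed
# between concepts is an assumption of this sketch.
RELATIONS = {
    "associatedWith", "hasLifeStage", "resultsIn",
    "occursIn", "evidenceOf", "affects",
}


class Metastructure:
    """Hypothetical in-memory store of (subject, relation, object) statements."""

    def __init__(self):
        # Maps (subject, relation) to the set of related objects.
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        """Return everything the subject is linked to by the relation."""
        return sorted(self._edges.get((subject, relation), set()))

    def concepts_in_stage(self, stage):
        """Return every concept tagged with the given life stage."""
        return sorted(
            subject
            for (subject, relation), objs in self._edges.items()
            if relation == "hasLifeStage" and stage in objs
        )


# Illustrative concepts only; these are not entries from the NICHD library.
model = Metastructure()
model.add("jaundice", "hasLifeStage", "neonate")
model.add("jaundice", "evidenceOf", "hyperbilirubinemia")
model.add("respiratory distress", "hasLifeStage", "neonate")
```

Attaching a life stage to every concept is what lets a query such as `model.concepts_in_stage("neonate")` relate otherwise separate concepts along a developmental time line.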
Current Activities • Focus on neonatal terminology because: • Largest gaps in major terminology schema • Existence of robust research networks • Multi-step process • Identify general domains • Align concepts • Map concepts to a common resource • Agree on mapping • Publish map
Advantages to Current Process • Retention of legacy tools • Ability to pool data and perform meta-analyses • Systematic identification of knowledge gaps and opportunities • Path forward for further harmonization and consensus terminology incorporating model and framework
The National Children’s Study as a case study
Overview of the National Children’s Study (NCS) • The NCS is mandated by the U.S. Congress and implemented by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health with advice and input from other NIH components, the Centers for Disease Control and Prevention and the Environmental Protection Agency • It is a multi-year research study that will examine the effects of environmental influences on the health and development of more than 100,000 children across the United States, following them from before birth until age 21 years • The goal of the Study is to improve the health and well-being of children and contribute to understanding the influence of various factors on health and disease
The NCS is an integrated system of activities • The Study is an integrated system of activities that include • a pilot Study which began in January 2009 with the goal of determining the feasibility, acceptability and cost of Study activities, • a Main Study scheduled to begin in calendar year 2012 to determine exposure-response relationships and • various substudies and formative research projects to examine specific methodological questions • The pilot Study, also known as the Vanguard Study, will run for 21 years, enroll about 4000 families, and precede Main Study activities by about 3 years so that every aspect of the Main Study is field tested prior to scale up and implementation
NCS Vanguard Study Goals • The Vanguard Study is designed to evaluate the feasibility (technical performance), acceptability (impact on participants, study personnel, and infrastructure), and cost (personnel, time, effort, money) of: • Study recruitment • Logistics and operations • Study visits and study visit assessments
The National Children’s Study takes an informatics approach that is flexible to support innovation and accommodate evolving technology • The approach to informatics for the National Children’s Study is informed by several trends in informatics, including: • modular architecture • use of standardized terminology with curation • semantic awareness • scalability • defined transmission standards • open architecture and open source platforms with development communities • vertical and horizontal integration of process • interoperability
The National Children’s Study informatics approach is standards-based • During the Vanguard phase of the NCS, multiple informatics platforms and tools are in the field to determine the performance characteristics of each. • This approach entails the use of NCS specifications with which each potential informatics solution must comply, plus a systematic evaluation scheme to compare performance • Use of such standards complements an interoperable approach that allows support for common interfaces and data exchange specifications • Such standards include: • Data Documentation Initiative (DDI) • Clinical Data Acquisition Standards Harmonization (CDASH) • CDISC Operational Data Model (ODM) • ISO 11179 / 21090 • CRoss-Industry Standard Process for Data Mining (CRISP-DM)
Standards + Modular = Flexibility • The NCS emphasis on interoperable modular architecture means that any component of a data system can accurately and efficiently communicate with other data systems, while adhering to international data standards such as ones developed by the Clinical Data Interchange Standards Consortium (www.cdisc.org), such that its components can be reused or adapted for other studies
NCS Data Life Cycle • From concept to archive, the NCS has a consistent approach to the data life cycle • Description can be found in the NCS Data Life Cycle Concepts of Operation http://www.nationalchildrensstudy.gov/about/overview/Pages/NCS_concept_of_operations_04_28_11.pdf
The NCS incorporates Operational Data Elements • Operational Data Elements are defined as data elements that capture the research process. In some contexts the term paradata is used. • The Operational Data Elements will allow systematic and objective evaluation of how the study is conducted and provide a basis for continuous improvement of efficiency • The NCS developed a catalog or code list of about 500 Operational Data Elements for various study operations • The NCS would like to contribute to the establishment of standards for Operational Data Elements
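As an illustration of how Operational Data Elements could support objective evaluation of study conduct, here is a minimal sketch. The element code `CONSENT_VISIT`, the record fields, and the duration metric are all hypothetical; the actual NCS catalog of about 500 elements is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class OperationalRecord:
    """One captured observation about the research process (paradata)."""
    element: str        # hypothetical Operational Data Element code
    visit_id: str
    started: datetime
    ended: datetime

    @property
    def minutes(self) -> float:
        return (self.ended - self.started).total_seconds() / 60


def mean_minutes_by_element(records):
    """Average duration per element, a simple efficiency indicator."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.element, []).append(record.minutes)
    return {element: mean(values) for element, values in grouped.items()}


# Illustrative data only.
records = [
    OperationalRecord("CONSENT_VISIT", "V001",
                      datetime(2011, 7, 11, 9, 0), datetime(2011, 7, 11, 9, 20)),
    OperationalRecord("CONSENT_VISIT", "V002",
                      datetime(2011, 7, 11, 10, 0), datetime(2011, 7, 11, 10, 30)),
]
summary = mean_minutes_by_element(records)
```

Tracking a summary like this over time is one plausible basis for the continuous improvement the slide describes.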
Metadata derived from harmonized terminology for the Study provides a layer of semantic interoperability across the data life cycle • The NCS data life cycle follows data from the initial approach through acquisition to analysis, maximizing transparency and the understanding of NCS data • Study data elements are guided by the NICHD Pediatric Terminology framework developed across many sources in the research, healthcare delivery, and standards development spectrum • Consistent metadata will assure: • Semantic interoperability and compliance with international data standards • Syntactic interoperability between NCS information management systems as they exchange data in line with the data plan
Various semantic schemas are harmonized so that data may be accurately exchanged and analyzed among pre-existing systems • A bridging schema, or metastructure, provides a mapping among concepts and codes from individual terminology schema used by networks or research endeavors • The metastructure is publicly available, and the source terminology schema is the property of and maintained by the original owners • As the National Children's Study proceeds, all developmental stages through age 21 years will be covered in a Pediatric Terminology Metastructure and many fields of research will be included
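The bridging idea can be sketched as a two-way lookup through a shared concept identifier. This is an illustrative sketch under stated assumptions: the class, the scheme names `NETWORK_A` and `NETWORK_B`, the codes, and the concept identifier are all invented placeholders, not actual metastructure content.

```python
class BridgingSchema:
    """Hypothetical bridge: each source code points to one shared concept."""

    def __init__(self):
        self._to_concept = {}    # (scheme, code) -> shared concept id
        self._from_concept = {}  # shared concept id -> {scheme: code}

    def register(self, concept_id, scheme, code):
        self._to_concept[(scheme, code)] = concept_id
        self._from_concept.setdefault(concept_id, {})[scheme] = code

    def translate(self, scheme, code, target_scheme):
        """Return the equivalent code in target_scheme, or None if unmapped."""
        concept_id = self._to_concept.get((scheme, code))
        if concept_id is None:
            return None
        return self._from_concept[concept_id].get(target_scheme)


# Placeholder schemes and codes, for illustration only.
bridge = BridgingSchema()
bridge.register("CONCEPT-0001", "NETWORK_A", "A-001")
bridge.register("CONCEPT-0001", "NETWORK_B", "B-17")
```

Because each source schema is only mapped, never modified, the original owners can keep maintaining their own terminology while `bridge.translate("NETWORK_A", "A-001", "NETWORK_B")` still resolves across systems.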
A metadata model has emerged that meets the semantic and syntactic requirements of the Study • The Data Documentation Initiative (DDI) is a metadata specification and international standard for describing data from the social, behavioral and economic sciences • The DDI model is aligned with CDISC SDTM vocabularies and the CDISC BRIDG protocol definition • The CDISC family of standards (including BRIDG, SDTM and CDASH) includes objects useful in describing health research that are not found in DDI • [Figure: DDI Combined Life Cycle Model]
The DDI-based metadata repository supports cyclical processes for both pre-analytical datasets and analytical datasets • Pre-analytical datasets can be produced and repurposed for new uses such as support for additional performance metrics or linkage with extant datasets • Data analysis may uncover recruitment, retention and/or compliance problems which lead to protocol change
The DDI-based metadata repository for the NCS is an end-to-end solution that allows scoping and incremental development • The CRoss-Industry Standard Process for Data Mining (CRISP-DM) can be used to standardize project management and to maximize data transparency and eventual analysis of NCS data
[Figure: process phases, in order] Business Understanding; Data Understanding; Data Preparation; Analysis Preparation; Analysis Execution; Results Evaluation
[Figure: activities listed under the phases] • Identify key business objectives • Identify key constraints & assumptions • Translate business objectives into metrics or questions • Identify potential data sources • Assess suitability of each data source for analysis • Extract data • Describe data • Explore data • Assess data quality • Match and merge data • Clean data • Reformat data for analysis • Select analytic algorithms • Code algorithms • Validate algorithms • Validate assumptions • Execute algorithms • Capture and interpret results • Iteratively improve any discrepancies / shortcomings • Translate results into business metrics or answers to questions • Present results including detailed documentation of entire process
DDI covers the entire NCS data lifecycle from protocol definition and sampling strategy through data collection, analysis, and distribution
In the NCS data lifecycle, forms and questionnaires are first specified • Domain groups define measures to capture study operations and the child development life cycle • Forms and questionnaires are specified around the measures • These specifications occur early in the NCS data life cycle and are captured by the NCS end-to-end metadata repository • [Figure: An NCS Incident Report is captured in the DDI model]
Data elements corresponding to the questions are typed • Form and questionnaire code lists are typically composed without regard to common data elements and standard code lists • Instead they are responsive to context and the exigencies of form and questionnaire design • This has led us to the approach that corresponds to DDI classifications in which there are categories and codes or, in other words, master code lists and specific code lists
Master and specific code lists are maintained for internal (Study-specific) and external harmonization • Form and questionnaire specific code lists almost always include missing values • In NCS the Incident Report is not restricted to adverse events, but encompasses other classifications • All of these classifications are captured in the Incident Type code list which goes with the question “What category best describes the incident (mark one)?”
Specific code lists are the product of master code lists • A code list for a question is compositional • The composition of question specific code lists typically includes a subset from a category/master collection of missing values • Mixing and matching questions across many categories leads to better comparison of answers across forms and across time in a longitudinal study
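The compositional approach can be sketched as follows. This is a minimal illustration, assuming hypothetical codes and labels: neither the missing-value categories nor the Incident Type categories below are the actual NCS code lists.

```python
# Hypothetical master list of missing-value categories (codes and labels are
# illustrative, not the NCS values).
MASTER_MISSING = {
    -1: "Refused",
    -2: "Don't know",
    -3: "Legitimate skip",
    -4: "Missing in error",
}

# Hypothetical substantive categories for the Incident Type question.
INCIDENT_TYPE = {
    1: "Adverse event",
    2: "Protocol deviation",
    3: "Privacy or confidentiality incident",
    4: "Other",
}


def compose_code_list(substantive, master, selected_codes):
    """Build a question-specific code list from substantive codes plus a
    chosen subset of a master list."""
    combined = dict(substantive)
    for code in selected_codes:
        if code not in master:
            raise KeyError(f"code {code} is not in the master list")
        if code in combined:
            raise ValueError(f"code {code} collides with a substantive code")
        combined[code] = master[code]
    return combined


# The specific list for one question reuses only the missing values it needs.
incident_type_codes = compose_code_list(INCIDENT_TYPE, MASTER_MISSING, [-1, -2])
```

Because every specific list draws its missing values from the same master list, a code such as -1 carries the same meaning on every form, which is what makes answers comparable across forms and across time.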
Metadata tagging provides a path for mapping to external references • Data elements are associated with concepts and, through unique identifiers, with external references • External references can be made to an ISO 21090 Concept Descriptor and/or an OpenEHR archetype • In these external references each value of a code list might be linked to a concept • This is our path to ISO 11179 compliance and, in the case of incident type, an NCS code list that combines code lists from many vocabularies
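A rough sketch of such tagging appears below. `ConceptRef` is a simplified stand-in loosely modeled on the idea of a coded concept reference (a code, a code system, and a display name); it is not the ISO 21090 data type itself, and every identifier and code system name shown is a hypothetical placeholder.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptRef:
    """Simplified external reference: a code within a named code system."""
    code: str
    code_system: str
    display_name: str


@dataclass
class DataElement:
    """A data element tagged with a concept, plus per-value concept links."""
    name: str
    concept: ConceptRef
    value_concepts: dict = field(default_factory=dict)

    def tag_value(self, value, ref):
        """Link one permitted value of the code list to an external concept."""
        self.value_concepts[value] = ref

    def untagged_values(self, code_list):
        """List code list values that still lack an external reference."""
        return [value for value in code_list if value not in self.value_concepts]


# Placeholder identifiers and code systems, for illustration only.
incident_type = DataElement(
    name="incident_type",
    concept=ConceptRef("X-100", "hypothetical-vocabulary", "Incident type"),
)
incident_type.tag_value(1, ConceptRef("X-101", "hypothetical-vocabulary", "Adverse event"))
remaining = incident_type.untagged_values([1, 2, 3])
```

A check like `untagged_values` gives a simple way to see how far a code list is from being fully mapped to external vocabularies.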
The NCS data life cycle reaches the production of analysis datasets that conform to the CDISC ODM interchange standard • Variables are packaged into logical records • Physical dataset definitions are constructed that document the various datasets researchers will request and receive
Ongoing and Future Collaboration • As the National Children’s Study evolves, it will continue to seek input and partnership from willing collaborators and will adhere to and inform international data standards to the greatest extent possible • The National Children’s Study aims to connect people, data and diverse systems to exchange and use information and to work together as a platform for innovative research and analysis, ultimately to improve the health and well-being of children
Summary and Plans • The NCS utilizes standards from multiple sources to ensure an open source, sustainable, and interoperable informatics environment • In the Vanguard (pilot) phase, the NCS is field testing several tools and platforms concurrently and systematically to determine their performance characteristics • The integration of multiple standards and models allows exploration of metadata analyses, operational data elements and project management across all study activities • The NCS will publicly disseminate findings as rapidly as possible and actively seeks collaborators
For more information on the National Children’s Study • Please visit the main website: http://www.nationalchildrensstudy.gov/ • Organizations, groups or individuals that are interested in contributing to the effort or learning more are encouraged to contact: Steven Hirschfeld, MD PhD hirschfs@mail.nih.gov