caBIG IVIWS SIG: Imaging Vocabularies & Common Data Elements Breakout Overview • Curtis P. Langlotz, MD, PhD, University of Pennsylvania • Daniel L. Rubin, MD, MS, Stanford University • Mike Keller, PhD, Booz Allen Hamilton
NCI Informatics Long Range Planning, circa 1999: “The CII” • Common Data Elements • Sharable Information Repositories • National Information Infrastructure • Clinical Trials Tools/Templates • Clinical Trial Building Blocks
Importance of Common Data Collection Methods, circa 1999 • Serve as building blocks for the CII • Allow pooling of data and comparison of results among clinical trials • Facilitate enrollment of patients in clinical trials • Avoid redundant data collection (enter-once, use-many principle) • Automate and expedite administration of clinical trials
Medical Vocabularies: Completeness for Radiology Langlotz & Caldwell, J Digit Imaging 15(1S):201, 2002
What is RadLex? • 26 participating organizations • 9 committees • 92 radiologist participants • 5,308 anatomic concepts • 10-30 percent of these concepts are not found in SNOMED-CT
Motivations for Common Imaging Terminology (RadLex Motivations; caBIG IVI Motivations) • Automatic indexing and retrieval of teaching files • Point and click “structured” reporting systems • Comparison or unification of disparate research databases • Reference datasets for cancer imaging research • Standardized image mark-up and annotation tools • Common vocabulary and data elements for cancer imaging
Fundamentals of Imaging Terminology & Ontology • Daniel L. Rubin, MD, MS, Stanford University
Terminologies • A constrained list of terms • Usually shown as a list or taxonomy • Usually few attributes (e.g., ID code, synonyms) • Usually 1 or no relationships • No relations: list of terms • 1 relation: taxonomy • Use • Coding • Indexing • Simple search • Example: ------Diseases----- 003. @ OTHER SALMONELLA INFECTIONS 003.0 SALMONELLA GASTROENTERITIS 003.2 @ LOCALIZED SALMONELLA INFECTIONS 003.20 LOCALIZED SALMONELLA INFECTION, UNSPECIFIED 003.21 SALMONELLA MENINGITIS 003.29 OTHER LOCALIZED SALMONELLA INFECTIONS ------Procedures----- 01. @ INCISION AND EXCISION OF SKULL, BRAIN,... 01.0 @ CRANIAL PUNCTURE 01.01 CISTERNAL PUNCTURE 01.09 OTHER CRANIAL PUNCTURE
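As a rough illustration of the slide above, a terminology with a single parent-child relation can be held in a small lookup table. The sketch below reuses a few of the disease codes shown on the slide; the Term class, the ancestors and search helpers, and the parent links (inferred here from the code prefixes) are assumptions made for illustration, not part of any coding standard or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    code: str                      # ID code, one of the few attributes a term carries
    name: str
    parent: Optional[str] = None   # the single relation that makes this a taxonomy


# A handful of the codes from the slide; parent links are assumed from the prefixes.
TERMS = {
    t.code: t
    for t in [
        Term("003", "OTHER SALMONELLA INFECTIONS"),
        Term("003.0", "SALMONELLA GASTROENTERITIS", parent="003"),
        Term("003.2", "LOCALIZED SALMONELLA INFECTIONS", parent="003"),
        Term("003.21", "SALMONELLA MENINGITIS", parent="003.2"),
    ]
}


def ancestors(code: str) -> list[str]:
    """Walk the parent links from a code up to the root of the taxonomy."""
    chain = []
    parent = TERMS[code].parent
    while parent is not None:
        chain.append(parent)
        parent = TERMS[parent].parent
    return chain


def search(text: str) -> list[str]:
    """Simple search: codes whose name contains the text, ignoring case."""
    return sorted(c for c, t in TERMS.items() if text.upper() in t.name)
```

With only one relation available, the most a query can do is move up or down the hierarchy or match on the term name, which is the "coding, indexing, simple search" use listed on the slide.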
What is an ontology? • Similar to terminologies, specifying concepts (entities) and attributes • Also specifies multiple relationships among concepts • Permits rich knowledge representation • Supports complex inference • Use • Coding, indexing, and retrieval (like terminologies) • Reasoning and intelligent applications • Information integration • Semantic Web
Anatomy ontology: explicit representation of knowledge in various relationships
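To make the contrast with a plain taxonomy concrete, here is a minimal sketch of an ontology stored as subject-relation-object triples, with a small inference step that follows one relation transitively. The anatomical facts, the relation names, and the build_index and infer helpers are illustrative assumptions; they are not taken from RadLex or any published anatomy ontology.

```python
from collections import defaultdict

# Illustrative triples only: (subject, relation, object).
TRIPLES = [
    ("left ventricle", "is_a", "cardiac chamber"),
    ("left ventricle", "part_of", "heart"),
    ("mitral valve", "part_of", "heart"),
    ("heart", "part_of", "thorax"),
]


def build_index(triples):
    """Index the triples by (subject, relation) for fast lookup."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[(subj, rel)].add(obj)
    return index


def infer(concept, relation, index):
    """Return every concept reachable by following one relation transitively."""
    found, stack = set(), [concept]
    while stack:
        for target in index.get((stack.pop(), relation), ()):
            if target not in found:
                found.add(target)
                stack.append(target)
    return found
```

Here infer("left ventricle", "part_of", build_index(TRIPLES)) reaches both "heart" and "thorax", although the second fact is never stated directly. That kind of derived answer is what separates an ontology from a flat term list.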
Vocabulary/CDE Strategy • Vocabularies & Metadata: Metadata for Images; Terminologies & CDEs • Formats & Tools: Metadata storage formats; Image Annotation; NLP • Applications: Queries & Analysis
Vocabulary/CDE Strategy • Metadata & Terminology • Define image metadata useful to collect for cancer research • Develop an image mark-up standard and associated open source and free annotation creation and display tools • Determine vocabularies & ontologies to populate the metadata • Formats & Tools • Define formats for associating data and metadata with images • Identify/develop tools for annotating images • Develop/reuse NLP methods to extract metadata from text • Testbed/applications using Vocabulary/CDE (tools and methods to use metadata to support cancer research) • Retrieve cases based on terminology-based queries and image annotations (e.g., trends in tumor size, image features) • Use ontology annotations on images to combine image data with clinical and molecular data
Vocabularies & Common Data Elements: Proposed Work Items #1 and #2 • Curtis P. Langlotz, MD, PhD, University of Pennsylvania
Proposed Work Items • Create caDSR-compatible CDEs from standard imaging vocabulary terms • Cancer imaging research “playbook”: Devices, procedures, and protocols • Using terminology/ontology to mark up or annotate images • Evaluate natural language processing (NLP) tools for prose image metadata (e.g., radiology reports)
ACRIN • American College of Radiology Imaging Network • NCI-funded imaging clinical trial cooperative group • Dozens of trials funded, including some very high profile trials (DMIST, NLST) • Tens of thousands of subjects • Case report forms containing hundreds of potential CDEs
Data Collection CDE Example • Please describe the margins of the mass: • Smooth • Lobulated • Irregular • Spiculated • Obscured
Vocabulary Concepts Data Collection CDE Example • Please describe the margins of the mass: • Smooth • Lobulated • Irregular • Spiculated • Obscured
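One way to picture the link between a data collection question and vocabulary concepts is a small record that pairs the question with its permissible values. The sketch below is an assumption-laden illustration: the CommonDataElement class, its encode method, and the "EXAMPLE:" concept identifiers are hypothetical placeholders, not caDSR structures or real vocabulary codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    question: str
    permissible_values: dict  # display label -> vocabulary concept ID

    def encode(self, answer: str) -> str:
        """Return the concept ID for an answer, rejecting values off the list."""
        label = answer.strip().capitalize()
        if label not in self.permissible_values:
            raise ValueError(f"{answer!r} is not a permissible value")
        return self.permissible_values[label]


MASS_MARGIN = CommonDataElement(
    question="Please describe the margins of the mass:",
    permissible_values={
        label: f"EXAMPLE:{label.lower()}"  # placeholder IDs, not real codes
        for label in ("Smooth", "Lobulated", "Irregular", "Spiculated", "Obscured")
    },
)
```

Because every stored answer is a concept identifier and never free text, responses collected on different case report forms can be pooled and compared, which is the purpose the earlier slides give for common data elements.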
The “Playbook” for Imaging in Cancer Research • Cancer Research Imaging Procedures and Protocols • An ontology of the imaging devices, procedures, and protocols that are used for experimental cancer imaging • (e.g., 7T 18-cm horizontal bore; 4.7T 33-cm bore magnet operating at 200 MHz for 1-H imaging experiments) • Common, vendor-independent language to describe experimental imaging instruments.
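A vendor-independent instrument description could be as simple as a record of physical parameters. The sketch below is one possible shape, with hypothetical field names; it also derives the proton resonance frequency from the field strength using the 1H gyromagnetic ratio (about 42.577 MHz per tesla), which is consistent with the 4.7T magnet at 200 MHz quoted on the slide.

```python
from dataclasses import dataclass

PROTON_MHZ_PER_TESLA = 42.577  # 1H gyromagnetic ratio divided by 2*pi


@dataclass(frozen=True)
class MRInstrument:
    """Hypothetical vendor-neutral description of an experimental MR magnet."""
    field_tesla: float
    bore_cm: float
    orientation: str = "unspecified"

    @property
    def proton_frequency_mhz(self) -> float:
        """Approximate 1H operating frequency implied by the field strength."""
        return self.field_tesla * PROTON_MHZ_PER_TESLA


# The two instruments mentioned on the slide.
SMALL_BORE = MRInstrument(field_tesla=7.0, bore_cm=18.0, orientation="horizontal")
LARGE_BORE = MRInstrument(field_tesla=4.7, bore_cm=33.0)
```

Storing the field strength and deriving the frequency, instead of recording both by hand, keeps the two values from drifting apart across sites.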
Proposed Work Items • Create caDSR-compatible CDEs from standard imaging vocabulary terms • Cancer imaging research “playbook”: Devices, procedures, and protocols • Using terminology/ontology to mark up or annotate images • Evaluate natural language processing (NLP) tools for prose image metadata (e.g., radiology reports)
Vocabularies & Common Data Elements: Proposed Work Items #3 and #4 and Summary • Daniel L. Rubin, MD, MS, Stanford University
Formats & Tools • Metadata Storage Formats • Need to define a format to associate instantiations of metadata (annotations) with images • Image Annotation (“mark-up”) • Need tools that annotate images and conform to the metadata standards adopted by caBIG • NLP • Goal: access free text to allow correlative research with images • Medium: radiology/pathology reports; published literature • Uses: indexing/retrieval, information extraction
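As a minimal sketch of what "associating an annotation with an image" might look like, the record below ties a controlled term and a region outline to an image identifier and serializes it to JSON. Every field name here is a hypothetical choice for illustration; the slides do not define a format, and this is not a caBIG or DICOM structure.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Annotation:
    image_id: str            # hypothetical identifier of the annotated image
    concept: str             # controlled term describing the finding
    region: list             # polygon outline as [x, y] pixel pairs
    author: str = "unknown"


def to_json(annotation: Annotation) -> str:
    """Serialize an annotation so it can be stored alongside the image."""
    return json.dumps(asdict(annotation), sort_keys=True)


def from_json(text: str) -> Annotation:
    """Rebuild an annotation from its stored JSON form."""
    return Annotation(**json.loads(text))
```

A record such as Annotation("img-001", "mass", [[10, 12], [40, 12], [25, 30]]) survives a round trip through to_json and from_json unchanged, which is the basic property any storage format for annotations needs.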
Metadata & Terminology • Metadata • Determine requirements for metadata • Interview cancer researchers (NCI-funded Cooperative Clinical Trial Therapy Groups, ACRIN, industry) re image access/analysis needs • Review prior image-based cancer trials • Inventory other image metadata & standards efforts • DICOM, HL7, Commercial systems • Consider analogy to MIAME (Minimum Information About a Microarray Experiment): the minimal information necessary to describe a medical image • Identify PHI data fields to help other applications to anonymize data
Image Annotation • Inventory existing tools for annotating images • Create custom tools for associating metadata with images • Image annotation tool • Structured data acquisition tool that is part of clinical trial data collection process, or integrates with existing clinical trial tools
Natural Language Processing • Determine requirements for NLP • E.g., extract entities and relations from radiology reports; map to ontologies, etc. • Inventory existing NLP tools • caTIES, MedLEE, Ricky Taira tools, MetaMap, and open source • Select or develop NLP tools to fulfill requirements
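To give a feel for the "extract entities and map to ontologies" requirement, here is a deliberately naive dictionary lookup over report text. It is only a sketch: the LEXICON entries and concept identifiers are made up for illustration, the approach ignores negation and context, and it does not reflect how caTIES, MedLEE, or MetaMap work.

```python
import re

# Hypothetical lexicon: surface phrase -> placeholder concept ID.
LEXICON = {
    "spiculated": "margin:spiculated",
    "lobulated": "margin:lobulated",
    "mass": "finding:mass",
    "right breast": "anatomy:right breast",
}

# Longest phrases first so multi-word terms win over their sub-phrases.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_concepts(report: str) -> list[tuple[str, str]]:
    """Return (matched text, concept ID) pairs in order of appearance."""
    return [
        (m.group(0), LEXICON[m.group(0).lower()])
        for m in _PATTERN.finditer(report)
    ]
```

A sentence such as "No mass is seen" would still yield a "mass" match here, which is one reason the work item calls for evaluating dedicated NLP tools instead of relying on string matching.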
Overall Mission Motivating the Breakout Session • Extract meaning from imaging data to improve outcomes for patients with cancer or pre-cancer • Support correlative imaging science • Clinical trials are conducted by Cancer Centers, Consortia, and Cooperative Groups • Need to structure imaging content of such trials • Transmit the pertinent imaging data and metadata together with clinical trials data to an archive maintained by the NCI • Need query and data mining capability to determine trends and patterns in imaging data across clinical trials
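Once imaging measurements from different trials share a common structure, a cross-trial query becomes a short grouping operation. The sketch below computes the percent change in tumor size per subject; the record layout, the function name, and all values are made up for illustration and do not come from any trial.

```python
from collections import defaultdict

# Made-up records for illustration only; field names are hypothetical.
RECORDS = [
    {"trial": "A", "subject": "s1", "day": 0, "tumor_mm": 20.0},
    {"trial": "A", "subject": "s1", "day": 60, "tumor_mm": 15.0},
    {"trial": "B", "subject": "s2", "day": 0, "tumor_mm": 10.0},
    {"trial": "B", "subject": "s2", "day": 90, "tumor_mm": 12.0},
]


def size_change_by_subject(records):
    """Percent change in tumor size from first to last measurement per subject."""
    series = defaultdict(list)
    for r in records:
        series[(r["trial"], r["subject"])].append((r["day"], r["tumor_mm"]))
    changes = {}
    for key, points in series.items():
        points.sort()  # order measurements by study day
        first, last = points[0][1], points[-1][1]
        changes[key] = 100.0 * (last - first) / first
    return changes
```

The grouping works across trials only because both trials use the same field names and units, which is the practical argument for common data elements made throughout the deck.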
Vocabulary/CDE Strategy • Vocabularies & Metadata: Metadata for Images; Terminologies & CDEs • Formats & Tools: Metadata storage formats; Image Annotation; NLP • Applications: Queries & Analysis
Vocabulary/CDE Strategy • Metadata & Terminology • Define image metadata useful to collect for cancer research • Develop an image mark-up standard and associated open source and free annotation creation and display tools • Determine vocabularies & ontologies to populate the metadata • Formats & Tools • Define formats for associating data and metadata with images • Identify/develop tools for annotating images • Develop/reuse NLP methods to extract metadata from text • Testbed/applications using Vocabulary/CDE (tools and methods to use metadata to support cancer research) • Retrieve cases based on terminology-based queries and image annotations (e.g., trends in tumor size, image features) • Use ontology annotations on images to combine image data with clinical and molecular data
Proposed Work Items • Create caDSR-compatible CDEs from standard imaging vocabulary terms • Cancer imaging research “playbook”: Devices, procedures, and protocols • Using terminology/ontology to mark up or annotate images • Evaluate natural language processing (NLP) tools for prose image metadata (e.g., radiology reports)