OIA Key Attributes +DRAFT May 18, 2011 Comment: focus on medical molecular imaging; pathology in a later effort
Contributors • Michael Ackerman, NLM • Rick Avila, Kitware • Andy Buckler, Buckler Biomedical • Terry Yoo, NLM • David Clunie, Core Lab Partners • James Luo, NIBIB • Tony Reeves, Cornell University • Daniel Rubin, Stanford University • Brandon Whitcher, Mango Solutions • Alden Dima
Key Attributes • Contribution Support • Quality of the data curation process • Speed to post datasets • Support for imaging data types and metadata • User Support • Robust querying and ease of performing a download • Advanced computing services • General • Long-term integrity and support
Contribution Support: Quality of the data curation process • De-identification support • Validate/verify that de-identification was successful • Example: BIRN DUP application that de-identifies • Support for de-identification standards (DICOM Supplement 142) • Metadata preparation tools (clear definition needed if used in final document; split: experiment description; clinical data non-image data) • Tools for efficient capture and organization of metadata • Utilization of common nomenclature • Example: OSA ISP metadata tool; ontologies: BRIDG, imaging biomarker ontology, UMLS? … • NLM: Numerous ontologies are being developed; this must be considered carefully. • For recommendation document: Appropriate domain-specific (clinical) metadata; those would come from relevant sources specific to the domain; always provide a specific example illustrating the recommendation • Distinguish between bulk data and individual data upload (e.g. eCRF) • Revision control (depending on use case, e.g. data sharing) • Apply revision control concepts to data elements • Examples: Commercial institutions do this routinely. EHRs. NBIA may have this capability (Eliot Siegel). • Capturing provenance • Capturing important information on the acquisition process is needed • Examples: Perhaps “data papers” will help • There are also goal-specific requirements
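The "validate/verify that de-identification was successful" item could be approached with an automated check along the following lines. This is only a minimal sketch: the header is assumed to be a plain dictionary keyed by DICOM attribute keyword, and the short keyword list, the placeholder values, and the function name `find_residual_identifiers` are illustrative assumptions, not the content of DICOM Supplement 142 or of the BIRN tool.

```python
# Hypothetical, deliberately short list of identifying attributes.
# A real profile (for example DICOM Supplement 142) covers far more.
IDENTIFYING_KEYWORDS = (
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
)


def find_residual_identifiers(header, allowed_placeholders=("", "ANONYMIZED")):
    """Return the identifying keywords that still carry a real-looking value.

    `header` is assumed to be a dict mapping attribute keyword to value.
    Absent attributes and attributes holding an allowed placeholder pass.
    """
    residual = []
    for keyword in IDENTIFYING_KEYWORDS:
        value = header.get(keyword)
        if value is None:
            continue  # attribute was removed entirely
        if str(value).strip() not in allowed_placeholders:
            residual.append(keyword)
    return residual


if __name__ == "__main__":
    sample = {"PatientName": "DOE^JANE", "PatientID": "ANONYMIZED", "Modality": "PT"}
    print(find_residual_identifiers(sample))
```

A check like this only confirms that known fields were cleared; it says nothing about identifiers burned into pixel data or hidden in free-text fields.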
Contribution Support: Speed to post datasets • Avoid limits on data upload size and speed • Protocols to load the data • FTP, DICOM, SOAP-based interfaces, WebDAV • Re-de-identify using automated methods (needs example and explanation); usage-based de-identification • Retain certain fields for potential future purposes • Automated methods to check that the data complies with expectations • Example: PET SUV, need to know patient weight and height • Goal is to try to obtain high-quality data, but we would not throw away data that does not conform • Some expectations we may know in advance, others not • Organization • David: Try to be agnostic on data organization (? Rick); data model flexibility and context (?); HealthVault example
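The PET SUV example above suggests a check that reports missing fields without rejecting the upload. The sketch below assumes the header is a plain dictionary keyed by DICOM keyword (`PatientWeight` and `PatientSize` are the standard keywords for weight and height); the rule table and the function name `check_expectations` are hypothetical.

```python
# Hypothetical expectation table: attributes each modality is expected to carry.
# "PT" is the DICOM modality code for PET.
EXPECTED_BY_MODALITY = {
    "PT": ("PatientWeight", "PatientSize"),
}


def check_expectations(header):
    """Return a list of warnings for missing expected attributes.

    The dataset is never rejected here; the caller decides what to do
    with the warnings (for example, store them alongside the data).
    """
    warnings = []
    modality = header.get("Modality")
    for keyword in EXPECTED_BY_MODALITY.get(modality, ()):
        value = header.get(keyword)
        if value in (None, "", 0):
            warnings.append(f"{keyword} missing or empty for modality {modality}")
    return warnings


if __name__ == "__main__":
    for warning in check_expectations({"Modality": "PT", "PatientWeight": 70.0}):
        print(warning)
```

Returning warnings instead of raising an error matches the stated goal of keeping non-conforming data while still flagging it.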
Data Upload Attributes Continued • DICOM conformance checks • Automated methods are preferred (UID replacement, integrity check, etc.); limit manual interaction steps (NBIA) • ADNI is performing automated QA (specific use cases) • Metadata Expectations • Utilize a standard information model • Example: Use AVT to • Store results of computation/analysis • Manual annotations • Computational algorithms • Summary statistics • Utilize emerging data models (e.g. AIM) • Definition: Ontologies vs. information models • Ontology: standard terminology (e.g. RadLex, SNOMED, …) • Information model: the syntax for making statements (DICOM Structured Reporting; NBIA has a proprietary XML format)
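As an illustration of automated UID replacement, one possible approach is to derive each new UID deterministically from the original, so that files from the same study still share a study UID after replacement. The sketch assumes headers are plain dictionaries; the namespace `archive.example.org` and the function names are hypothetical. It relies on the DICOM convention that a UID may be formed as `2.25.` followed by the decimal value of a UUID.

```python
import uuid

# Hypothetical per-project namespace; any fixed UUID would serve.
PROJECT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "archive.example.org")

# Standard DICOM keywords for the UIDs that link images into series and studies.
UID_KEYWORDS = ("StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID")


def replace_uid(original_uid):
    """Map an original UID to a new UUID-derived UID under the 2.25 root."""
    derived = uuid.uuid5(PROJECT_NAMESPACE, original_uid)
    return "2.25." + str(derived.int)


def replace_uids(header):
    """Return a copy of `header` with every known UID attribute replaced."""
    updated = dict(header)
    for keyword in UID_KEYWORDS:
        if keyword in updated:
            updated[keyword] = replace_uid(updated[keyword])
    return updated


if __name__ == "__main__":
    print(replace_uids({"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.4"}))
```

Because the mapping is deterministic, anyone who knows the namespace and can guess an original UID can confirm the match; a keyed or random mapping kept in a lookup table would be the alternative if that matters.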
User Support • Query Capabilities • More generic than web page queries • More sophisticated query methods will drive database design • Outside applications can access and perform queries and get a response using a service model • Flexibility to support a range of use cases • Support both plain-text search and structured queries • Eventually support content-based retrieval • If we were to support data papers, there would be additional content and terms we could use • Revision control and review on datasets • Query on computation/analysis results • Have multiple indices for the same dataset
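The difference between plain-text search and a structured query can be shown over a small in-memory catalog. This is a sketch only: the record fields (`modality`, `body_part`, `description`) and both function names are hypothetical, and a real archive would expose these through a database or web service.

```python
def text_search(records, phrase):
    """Return records in which any field contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return [
        record
        for record in records
        if any(needle in str(value).lower() for value in record.values())
    ]


def structured_query(records, **criteria):
    """Return records whose fields exactly match every keyword criterion."""
    return [
        record
        for record in records
        if all(record.get(field) == wanted for field, wanted in criteria.items())
    ]


if __name__ == "__main__":
    catalog = [
        {"id": 1, "modality": "PT", "body_part": "CHEST", "description": "Lung nodule follow-up"},
        {"id": 2, "modality": "CT", "body_part": "CHEST", "description": "Baseline lung screening"},
        {"id": 3, "modality": "MR", "body_part": "BRAIN", "description": "Tumor perfusion study"},
    ]
    print([r["id"] for r in text_search(catalog, "lung")])
    print([r["id"] for r in structured_query(catalog, modality="CT", body_part="CHEST")])
```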
User Support Continued • Download Support • Shopping carts are good for certain situations • Use of standard protocols (e.g. rsync, FTP) • Web services • Support for portable hard drive • Annotated manifest (images, text descriptions summarizing the metadata) • Computational Support (nice to have)
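An annotated manifest could be as simple as a JSON document listing each file with its size and a short description. The sketch below assumes the files are available as an in-memory mapping of name to bytes; the JSON layout and the function name `build_manifest` are illustrative assumptions, not an established format.

```python
import json


def build_manifest(dataset_title, files, descriptions):
    """Build a JSON manifest for a download.

    `files` maps a file name to its content as bytes; `descriptions` maps a
    file name to a short human-readable summary (missing entries are allowed).
    """
    entries = []
    for name in sorted(files):
        entries.append(
            {
                "file": name,
                "bytes": len(files[name]),
                "description": descriptions.get(name, ""),
            }
        )
    manifest = {"dataset": dataset_title, "file_count": len(entries), "files": entries}
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(
        build_manifest(
            "Example PET collection",
            {"scan_001.dcm": b"\x00" * 128, "notes.txt": b"acquisition notes"},
            {"scan_001.dcm": "PET slice, subject 001"},
        )
    )
```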
User Support • Contributor Agreement • User License • Least restrictive • Standard license
General • Long-term integrity • Backup/mirror solutions • Handle.net type solution • Local copies of format specification for non-standard data • Not just images, but metadata as well • Cryptographic hashes as unique identifiers; checksums • Hash is helpful for data analysis and retrieval • Support • Check de-identification (service?) • Need to encourage, and we prefer, automated systems
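To make the hash item concrete: a cryptographic digest of a file's bytes can serve both as a content-derived identifier and as a checksum for later integrity verification. The sketch uses SHA-256 from Python's standard `hashlib`; the choice of SHA-256 and the function names are assumptions, since the slides do not name an algorithm.

```python
import hashlib


def content_id(data):
    """Return the SHA-256 hex digest of `data`, usable as a content identifier."""
    return hashlib.sha256(data).hexdigest()


def verify(data, expected_id):
    """Return True if `data` still hashes to the identifier recorded earlier."""
    return content_id(data) == expected_id


if __name__ == "__main__":
    original = b"example image bytes"
    recorded = content_id(original)
    print(recorded)
    print(verify(original, recorded))
    print(verify(original + b"!", recorded))
```

Identical content always yields the same identifier, which also makes duplicate detection across mirrors straightforward.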
Encouragement & Credit • Making data available as a requirement of: • Funding • Publication • Data papers • Very early yet • Infrastructure should support this if adopted by the community
Recognition • Major Problem: medical imaging field started off not sharing • Open Data Papers • http://eu-demo-dih.slidepath.com/dih/webViewer.php?snapshotId=1294305660 • Open Data Awards • Requirement for Tenure? • Allow/encourage users/readers to contribute to a body of knowledge, similar to papers • Other Ideas?
Next Steps for OIA • Present recommendations to the RSNA radiology informatics committee • Prepare manuscript on OIA Committee recommendations • NIH movement to make funded data available (e.g. datamedcentral) • Publishers • OSA • BioMed Central • Elsevier: Media