The Tissue Microarray Data Exchange Specification
Presented for: Cambridge Healthtech Institute, Microarrays in Medicine, Boston, MA, April 26, 2004
Jules J. Berman, Ph.D., M.D.
Program Director for Pathology Informatics, Cancer Diagnosis Program, National Cancer Institute
National Institutes of Health, Rockville, MD
This presentation is a U.S. government-sponsored work in the public domain.
In brief: The TMA Specification is an open access document that can be used without any restriction. Its development was sponsored by the NCI and by the Association for Pathology Informatics. All the documents and software that you might need to obtain, understand, and implement the specification are available in two recently published open access manuscripts.
Basics of the specification: Jules J Berman, Mary Edgerton, Bruce Friedman. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak. 2003 May 23;3:5.
Real-world implementation example: Jules J Berman, Milton Datta, Andre Kajdacsy-Balla, Jonathan Melamed, Jan Orenstein, Kevin Dobbin, Ashok Patel, Rajiv Dhir, Michael J Becich. The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics. 2004 Feb 27;5:19.
Why is it important to have a data exchange specification for TMAs?
The greatest value of TMAs lies in linking TMA data with data from other TMAs and with other databases that provide information about the samples in the TMA database. That value is essentially untapped because there has been no way to publish, exchange, merge, and link TMA datasets in a manner that everyone can use and understand. The data exchange specification provides a common intermediate structure for TMA data that can be used to exchange data between different TMA databases.
Analogous situation: document formats such as WordPerfect (different versions), Word (different versions), AbiWord, PostScript, and PDF. One vendor's software often cannot open files prepared in another vendor's software. But any good word processor should be able to export a file as RTF (plain ASCII text with formatting markup), and should be able to import an RTF file and convert it to its own preferred proprietary format.
We wanted to make a flexible specification for TMAs that would permit researchers with proprietary systems to port their TMA data into a file that could be easily disassembled and reassembled into other formats.
• The basic properties of the file:
  • Self-describing
  • Made from commonly understood data structures
  • Extremely simple (most of our stakeholders are not sophisticated bioinformaticians, computer scientists, or metadata experts)
  • Infinitely scalable (can be endlessly combined with other data sources)
The first draft of the specification was developed through open workshops held at meetings sponsored by the Association for Pathology Informatics and the National Cancer Institute:
• May 30, 2001, Ann Arbor, Michigan. Chair of speaker session: Mark A. Rubin. Speakers: David Rimm, Steve Bova, Matt Van de Rijn, Jules Berman.
• October 6, 2001, Pittsburgh, PA, co-sponsored by the National Cancer Institute. Chair: Mary Edgerton. Speakers: Olli Kallioniemi, Chris Chute, Richard Lieberman, Paul Spellman. Chair of Data Exchange Workshop: Mary Edgerton.
• May 22, 2002, Ann Arbor, Michigan, co-sponsored by the National Cancer Institute. Chair of speaker session: Mark A. Rubin. Speakers: James Bacus, Angelo de Marzo, Peggy Porter, David Rimm, and Guido Sauter. Chair of Data Exchange Workshop: Mary Edgerton.
• October 4, 2002, Pittsburgh, PA, held in conjunction with Advancing Pathology Informatics, Imaging and the Internet. Chair of speaker session: Mary Edgerton. Speakers: Steve Hewitt, Ulysses Balis. Chair of Data Exchange Workshop: Mary Edgerton.
The specification is XML.
XML allows heterogeneous systems to communicate and exchange their data. It achieves this through metadata (data about data). XML can produce an ideal document that completely describes itself, including all of its data and all of its metadata.
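To make "self-describing" concrete, here is a minimal Python sketch (our own illustration, not part of the specification or its Perl validator). It shows that a generic XML parser can list every element, and the namespace that defines it, without any prior knowledge of the TMA schema. The function name describe() is hypothetical.

```python
import xml.etree.ElementTree as ET

def describe(xml_text):
    """Return (namespace_uri, local_name) for every element, in document order.

    No schema is needed: ElementTree reports each tag as '{uri}local',
    so the file itself tells us which vocabulary every element comes from.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter():  # iter() starts with the root element itself
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = "", elem.tag  # element in no namespace
        pairs.append((uri, local))
    return pairs
```

Applied to a small <histo> fragment that declares the Dublin Core namespace, it returns a pair such as ("http://dublincore.org", "title") for each dc:title element, alongside the pairs for histo, tma, and header in the default namespace.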
Four required sections:
1) Header, containing the Dublin Core identifiers for the file,
2) Block, describing the paraffin-embedded array of tissues,
3) Slide, describing the glass slides produced from the Block, and
4) Core, containing all data related to the individual tissue samples contained in the array.
Eighty Common Data Elements (CDEs), conforming to the ISO/IEC 11179 standard for data elements, make up the XML tags used in the TMA data exchange specification. Only a handful of these are required in TMA files. A set of six simple semantic rules describes the complete data exchange specification. Anyone using the data exchange specification can validate their TMA files using a software implementation written in Perl and distributed as a supplemental file with the 2003 specification paper (BMC Med Inform Decis Mak. 2003;3:5).
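The six semantic rules and the Perl validator are not reproduced here. As a rough illustration of one kind of check a validator can make, the Python sketch below flags elements in the TMA default namespace whose tag is not in an allowed vocabulary. The vocabulary ALLOWED_TMA_TAGS is a hypothetical, deliberately incomplete stand-in built only from the default-namespace tags that appear in this presentation; a real check would use the full list of eighty CDEs. The namespace URI is the one declared in the CPCTR sample file, and unknown_tags() is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumption: default namespace taken from the CPCTR sample file.
TMA_NS = "http://65.222.228.150/jjb/tma_cde.htm"

# HYPOTHETICAL, incomplete vocabulary: only the default-namespace tags visible
# in this presentation, not the real list of eighty CDEs.
ALLOWED_TMA_TAGS = {"histo", "tma", "header", "block", "slide", "core",
                    "record", "array_locations"}

def unknown_tags(xml_text):
    """Return sorted local names of TMA-namespace elements not in the vocabulary.

    Elements from other namespaces (e.g. dc:, cpctr:) are skipped, because
    their definitions live with those namespaces.
    """
    root = ET.fromstring(xml_text)
    prefix = "{" + TMA_NS + "}"
    found = set()
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            local = elem.tag[len(prefix):]
            if local not in ALLOWED_TMA_TAGS:
                found.add(local)
    return sorted(found)
```

Skipping foreign namespaces mirrors how the CPCTR file mixes its own cpctr: elements and Dublin Core dc: elements with the TMA tags.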
<histo>
  <tma>
    <header>
    </header>
    <block>
      <slide>
      </slide>
      <core>
      </core>
    </block>
  </tma>
</histo>
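A minimal Python sketch that reports which of the four required sections are missing from a file, following the nesting of the skeleton above (header and block inside tma; slide and core inside block). It assumes the file uses the default namespace declared in the CPCTR sample file; missing_sections() is a hypothetical name, and this is our illustration rather than the distributed validator.

```python
import xml.etree.ElementTree as ET

# Assumption: the default namespace used by the CPCTR sample file.
NS = {"t": "http://65.222.228.150/jjb/tma_cde.htm"}

# The four required sections and where each sits in the skeleton:
# <histo><tma><header/><block><slide/><core/></block></tma></histo>
REQUIRED_PATHS = {
    "header": "t:tma/t:header",
    "block": "t:tma/t:block",
    "slide": "t:tma/t:block/t:slide",
    "core": "t:tma/t:block/t:core",
}

def missing_sections(xml_text):
    """Return the required sections absent from the file, in specification order."""
    root = ET.fromstring(xml_text)  # the root element is <histo>
    return [name for name, path in REQUIRED_PATHS.items()
            if root.find(path, NS) is None]
```

Using path expressions rather than a search anywhere in the tree means a <core> placed outside its <block> is reported as missing, which matches the nesting shown in the skeleton.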
<?xml version="1.0" ?>
<histo xmlns="http://65.222.228.150/jjb/tma_cde.htm"
       xmlns:cpctr="http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"
       xmlns:dc="http://dublincore.org">
  <tma>
    <header>
      <dc:title>Cooperative Prostate Cancer Tissue Resource (CPCTR) Prostate Cancer Microarray 1-2</dc:title>
      <dc:creator>CPCTR</dc:creator>
      <dc:subject>Prostate tissue microarray</dc:subject>
      <dc:description>CPCTR TMA XML datafile for Microarray 1-2</dc:description>
      <dc:publisher>CPCTR</dc:publisher>
      <dc:contributor>CPCTR</dc:contributor>
      <dc:date>2003-10-05</dc:date>
      <dc:type>Prostate Cancer Tissue Microarray</dc:type>
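The Dublin Core fields in the header can be read with any namespace-aware XML parser. A Python sketch, using the namespace URIs declared in the excerpt above; header_fields() is a hypothetical helper. Because the excerpt stops partway through <header>, the function expects a complete, well-formed file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the CPCTR sample file.
NS = {"t": "http://65.222.228.150/jjb/tma_cde.htm",
      "dc": "http://dublincore.org"}

def header_fields(xml_text):
    """Return {dc element name: text} for the dc:* children of the first <header>."""
    root = ET.fromstring(xml_text)
    header = root.find("t:tma/t:header", NS)
    if header is None:
        return {}
    prefix = "{" + NS["dc"] + "}"
    return {child.tag[len(prefix):]: (child.text or "").strip()
            for child in header              # direct children only
            if child.tag.startswith(prefix)}
```

For the header above, the result would map "creator" to "CPCTR" and "date" to "2003-10-05".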
<record>
  <cpctr:IMS_Case_Identifier>1053371588</cpctr:IMS_Case_Identifier>
  <cpctr:Location_Code>G61</cpctr:Location_Code>
  <cpctr:Race>Caucasian</cpctr:Race>
  <cpctr:Year_of_Birth>1926</cpctr:Year_of_Birth>
  <cpctr:Year_of_Diagnosis>1991</cpctr:Year_of_Diagnosis>
  <cpctr:Year_of_Prostatectomy>1991</cpctr:Year_of_Prostatectomy>
  <cpctr:Is_Residual_Carcinoma_Present>Yes</cpctr:Is_Residual_Carcinoma_Present>
  <cpctr:Most_Prominent_Histologic_Type>adenocarcinoma NOS aka acinar</cpctr:Most_Prominent_Histologic_Type>
  <cpctr:Gleason_Primary_Grade>4</cpctr:Gleason_Primary_Grade>
  <cpctr:Gleason_Secondary_Grade>3</cpctr:Gleason_Secondary_Grade>
  <cpctr:Gleason_Sum_Score>7</cpctr:Gleason_Sum_Score>
  <cpctr:Number_of_Nodes_Examined>5</cpctr:Number_of_Nodes_Examined>
  <cpctr:Number_of_Nodes_Positive>0</cpctr:Number_of_Nodes_Positive>
  <cpctr:Distant_Mets__1_at_Time_of_Diagn>Bladder</cpctr:Distant_Mets__1_at_Time_of_Diagn>
  <cpctr:pT_Stage>pT3b</cpctr:pT_Stage>
  <cpctr:pN_Stage>pN0</cpctr:pN_Stage>
  <cpctr:pM_Stage>pMX</cpctr:pM_Stage>
  <cpctr:Vital_Status>Alive</cpctr:Vital_Status>
  <cpctr:Year_of_PSA_Recurrence></cpctr:Year_of_PSA_Recurrence>
  <cpctr:PSA_Recurrence_Status>Unknown</cpctr:PSA_Recurrence_Status>
  <cpctr:Recurrence_Free_Year></cpctr:Recurrence_Free_Year>
  <array_locations>row 9, column 18|row 10, column 4</array_locations>
</record>
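Once a record is in this form, a receiving site can unpack it with a few lines of code. The Python sketch below splits array_locations into (row, column) pairs and checks that the Gleason sum equals the primary plus secondary grade. Assumptions: the "row N, column M" pattern with "|" between cells is inferred from this single record, not taken from the specification; the namespace URIs are those of the CPCTR sample file; parse_locations() and summarize_record() are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the CPCTR sample file.
NS = {"t": "http://65.222.228.150/jjb/tma_cde.htm",
      "cpctr": "http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"}

# Assumed cell format, inferred from the single example "row 9, column 18".
CELL = re.compile(r"row\s*(\d+),\s*column\s*(\d+)")

def parse_locations(text):
    """Split 'row 9, column 18|row 10, column 4' into [(9, 18), (10, 4)]."""
    if not text.strip():
        return []
    cells = []
    for part in text.split("|"):
        match = CELL.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognized array location: {part!r}")
        cells.append((int(match.group(1)), int(match.group(2))))
    return cells

def summarize_record(record):
    """Return a record's array cells and whether its Gleason sum is consistent.

    gleason_ok is None when any of the three Gleason values is missing.
    """
    def text(path):
        element = record.find(path, NS)
        return (element.text or "").strip() if element is not None else ""

    grades = [text("cpctr:" + name) for name in (
        "Gleason_Primary_Grade", "Gleason_Secondary_Grade", "Gleason_Sum_Score")]
    gleason_ok = None
    if all(grades):
        primary, secondary, total = map(int, grades)
        gleason_ok = primary + secondary == total
    return {"cells": parse_locations(text("t:array_locations")),
            "gleason_ok": gleason_ok}
```

For the record above, this would yield cells [(9, 18), (10, 4)] and gleason_ok True (4 + 3 = 7).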
Implementing the specification
• We provide:
  • The specification (XML data structure and 80 common data elements)
  • A Perl script validator
  • A paper that describes a real-world implementation (porting TMA data from an Excel spreadsheet)
• You provide:
  • Whatever database you like for storing your TMA data
  • A script (Java, Perl, Python, whatever) that can port your data into the TMA specification format
  • A script that can port TMA files in the data exchange specification format into whatever database you prefer
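As one possible shape for the "script that can port your data", here is a Python sketch that turns spreadsheet rows, exported as CSV, into <record> elements like the CPCTR record. Assumptions: the CSV has a column named array_locations; every other column header is already a valid cpctr: CDE name; the namespace URIs are those of the CPCTR sample file. rows_to_records() is a hypothetical name, and this is not the procedure described in the CPCTR implementation paper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: namespace URIs from the CPCTR sample file.
TMA_NS = "http://65.222.228.150/jjb/tma_cde.htm"
CPCTR_NS = "http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"

# Serialize with the same prefixes as the sample file: TMA tags unprefixed
# (default namespace), CPCTR tags as cpctr:...
ET.register_namespace("", TMA_NS)
ET.register_namespace("cpctr", CPCTR_NS)

def rows_to_records(csv_text):
    """Convert CSV rows into <record> elements.

    Every column except 'array_locations' becomes a cpctr:<column> child;
    'array_locations' becomes the unprefixed <array_locations> child.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.Element(f"{{{TMA_NS}}}record")
        for column, value in row.items():
            if column == "array_locations":
                continue
            ET.SubElement(record, f"{{{CPCTR_NS}}}{column}").text = value
        ET.SubElement(record, f"{{{TMA_NS}}}array_locations").text = row["array_locations"]
        records.append(record)
    return records
```

Values that contain commas, such as "row 9, column 18|row 10, column 4", must be quoted in the CSV. ET.tostring(record, encoding="unicode") gives the XML text of each record, which would still need to be placed into a complete TMA file with its header, block, slide, and core sections.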