Informatics Tools for Molecular-Based Specimen Banks • Current Resources • CaTIS Database • VGSR Concept Mark A. Watson, M.D., Ph.D.
Informatics Limitations of Specimen Banks • Inability to completely and accurately annotate specimens with clinical and pathological data. • Inefficient tracking of specimens and specimen quality to multiple different research projects. • Inability to perform real-time queries of readily available samples from secure data servers. • Lack of biological (research) data annotation for specimens.
Siteman Cancer Center (SCC) Tissue Procurement Core Overview • Established 1997 • 24 studies • Archival bank • Institutional studies • American College of Surgeons Oncology Group (ACOSOG) • 21,000 specimens / 8,000 patients (sporadic cancer) • Frozen Tissue / Paraffin Blocks / Serum • 6,300 DNA and RNA samples • 112 sample distributions • Intra- and extramural investigators
SCC Tissue Procurement Bioinformatics • Mark A. Watson, M.D., Ph.D. • Dir. Siteman Cancer Center Tissue Procurement Core Facility • Dir. ACOSOG Central Specimen Bank • Dir. Siteman Cancer Center Multiplexed Gene Analysis Facility • Rakesh Nagarajan, M.D., Ph.D. • Asst. Dir. Siteman Cancer Center Bioinformatics Core Facility • Jeff Milbrandt, M.D., Ph.D. • Dir. Siteman Cancer Center Bioinformatics Core Facility • Richard Wilson, Ph.D. • Dir. Washington Univ. Genome Sequencing Center • (5) Laboratory FTEs • Persistent Systems (Software Development)
SCC Tissue Procurement Core Capabilities • Specimen Collection • Multiple institutional sites • Outside institutions • Specimen Storage • LN2, -80 °C, 4 °C • Specimen Processing and QA • DNA and RNA samples • Sample Arraying • Tissue / DNA / RNA • Data Management (Tracking)
A Paradigm for Specimen Data Utilization • [Diagram: Specimen Data and the Multi-Dimensional Data Space, illustrated with a T2N0 NSCLC case] • Specimen and data types shown: Tumor, Non-Malignant Tissue, Bone Marrow, Serum, Germline DNA, DNA, RNA, Clinical Data, Pathology Data • Analyses shown: Polymorphism Analysis, Methylation Studies, RTK Mutational Profiling, Gene Expression Arrays, qRT/PCR, Tissue Microarray, Proteomics, IHC Validation, FISH
CaTIS Specimen Database • MS Access Based • Non-networked • Scalable • Rapid customization • Functionality • Patient / Specimen / Sample Accession • Storage / QA data (tissues and samples) • Pathology data (manual entry) • Distribution data • Mapping to sample arrays / robotics • Tracking through to outside institutions (e.g. WU-GSC) • Mapping to experimental results
CaTIS Specimen Database • [Diagram] Linked to: Pathology Data, Clinical Databases, Investigators • Patient Data: Patient Code Number, Demographics, Study / IRB • Specimen Data: Patient Code Number, Specimen Code Number, Specimen Info / QC, Submission Data, Path Data • Sample Data: Specimen Code Number, Sample Code Number, Sample QC, Investigator Distribution, Experimental Results
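The three record types named on this slide (Patient Data, Specimen Data, Sample Data) share code numbers, which suggests a chain of keys from sample to specimen to patient. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not the actual CaTIS design (which the slides describe as MS Access based); all table names, column names, and demo rows are hypothetical and only loosely follow the diagram labels.

```python
import sqlite3

# Hypothetical schema derived from the slide's diagram labels;
# not the real CaTIS tables.
SCHEMA = """
CREATE TABLE patient (
    patient_code TEXT PRIMARY KEY,   -- Patient Code Number
    demographics TEXT,
    study_irb    TEXT                -- Study / IRB
);
CREATE TABLE specimen (
    specimen_code TEXT PRIMARY KEY,  -- Specimen Code Number
    patient_code  TEXT NOT NULL REFERENCES patient(patient_code),
    info_qc       TEXT,              -- Specimen Info / QC
    submission    TEXT,              -- Submission Data
    path_data     TEXT               -- Path Data
);
CREATE TABLE sample (
    sample_code   TEXT PRIMARY KEY,  -- Sample Code Number
    specimen_code TEXT NOT NULL REFERENCES specimen(specimen_code),
    sample_qc     TEXT,              -- Sample QC
    distribution  TEXT,              -- Investigator Distribution
    results       TEXT               -- Experimental Results
);
"""


def build_demo_db():
    """Create an in-memory database with one invented patient/specimen/sample."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the key chain
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO patient VALUES ('P001', 'demo', 'demo study')")
    conn.execute(
        "INSERT INTO specimen VALUES ('SP001', 'P001', 'ok', 'demo', 'demo')"
    )
    conn.execute(
        "INSERT INTO sample VALUES ('SA001', 'SP001', 'ok', NULL, NULL)"
    )
    conn.commit()
    return conn


def trace_sample(conn, sample_code):
    """Follow a sample back to its specimen and patient codes."""
    return conn.execute(
        "SELECT s.sample_code, sp.specimen_code, sp.patient_code "
        "FROM sample AS s "
        "JOIN specimen AS sp ON s.specimen_code = sp.specimen_code "
        "WHERE s.sample_code = ?",
        (sample_code,),
    ).fetchone()


if __name__ == "__main__":
    print(trace_sample(build_demo_db(), "SA001"))
```

Keeping the patient code only on the specimen row, and the specimen code only on the sample row, means a sample can always be traced back to its patient without duplicating identifiers at every level.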
Aim 1: Commercialize and Distribute Key CaTIS Components • Data security scheme • Migrate to web based data entry / query • Migrate to platform independence • Open interface to other pathology data systems • Open interface to genomics data systems (e.g. CHIPDB) • Use of caBIG-defined common data elements • Pathology • Molecular • Additional tools (e.g. pedigrees) suggested by adopters
Virtual Genomic Sample Repository (VGSR) • Most specimen resources do not allow direct querying to sample detail. • Most specimen resources do not allow querying by biological (experimental) data. • e.g. Caucasian male / > 60 years old / T2N0 NSCLC • e.g. T2N0 NSCLC / p53- / 50 µg DNA / U133 Array # 4567 • Single specimens may be used for multiple studies at the genome / transcriptome / proteome level • Maximum specimen utility • Integrative systems biology
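The second example query on this slide combines pathology stage, an experimental result, available DNA, and an array identifier. A minimal sketch of such a sample-level filter is shown below. The `SampleRecord` fields, the `find_samples` function, and every record except the array number 4567 are invented for illustration and are not part of any described VGSR interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleRecord:
    """Hypothetical per-sample annotation; field names are assumptions."""
    sample_id: str
    stage: str                  # pathology stage, e.g. "T2N0"
    diagnosis: str              # e.g. "NSCLC"
    p53_status: Optional[str]   # "-" or "+"; None if not assayed
    dna_ug: float               # DNA available, in micrograms
    array_id: Optional[str]     # expression array ID, if one was run


# Invented demo records.
RECORDS = [
    SampleRecord("S1", "T2N0", "NSCLC", "+", 80.0, None),
    SampleRecord("S2", "T2N0", "NSCLC", "-", 50.0, "4567"),
    SampleRecord("S3", "T2N0", "NSCLC", "-", 10.0, None),
    SampleRecord("S4", "T1N0", "NSCLC", "-", 120.0, "4568"),
]


def find_samples(records, *, stage=None, diagnosis=None, p53_status=None,
                 min_dna_ug=0.0, array_id=None):
    """Return IDs of samples matching every criterion that was supplied."""
    hits = []
    for r in records:
        if stage is not None and r.stage != stage:
            continue
        if diagnosis is not None and r.diagnosis != diagnosis:
            continue
        if p53_status is not None and r.p53_status != p53_status:
            continue
        if r.dna_ug < min_dna_ug:
            continue
        if array_id is not None and r.array_id != array_id:
            continue
        hits.append(r.sample_id)
    return hits


if __name__ == "__main__":
    # T2N0 NSCLC / p53- / at least 50 ug DNA / array 4567
    print(find_samples(RECORDS, stage="T2N0", diagnosis="NSCLC",
                       p53_status="-", min_dna_ug=50.0, array_id="4567"))
```

The point of the sketch is that clinical, sample-level, and experimental criteria sit in one record, so a single query can mix them.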
Virtual Genomic Sample Repository (VGSR) Goal: To develop an informatics system and scientific culture to facilitate the sharing of molecular biospecimens and biological data associated with their use.
VGSR – Conceptual Data Flow • [Diagram labels] • Institutional Tissue Banks • Specimen Registration • VGSR • Specimen Distribution for New Study • Genome Analysis / Transcriptome Analysis / Proteome Analysis • Primary Use in Correlative Studies • Primary Data Mining • DATA • Data Registration • Specimen Query Based on New Data Mining
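The data flow named on this slide (specimen registration, data registration, query on registered data, distribution for a new study) can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class `VirtualRepository`, its method names, and the idea that the registry stores a pointer to the owning bank rather than the specimen itself are all hypothetical, not a described VGSR implementation.

```python
from collections import defaultdict


class VirtualRepository:
    """Toy registry: specimens stay at their bank; only records live here."""

    def __init__(self):
        self._specimens = {}             # specimen_id -> owning bank
        self._data = defaultdict(dict)   # specimen_id -> {assay: result}
        self._requests = []              # (specimen_id, bank, study)

    def register_specimen(self, specimen_id, bank):
        """Specimen Registration: an institutional bank lists a specimen."""
        if specimen_id in self._specimens:
            raise ValueError(f"{specimen_id} is already registered")
        self._specimens[specimen_id] = bank

    def register_data(self, specimen_id, assay, result):
        """Data Registration: attach an experimental result to a specimen."""
        if specimen_id not in self._specimens:
            raise KeyError(f"{specimen_id} is not registered")
        self._data[specimen_id][assay] = result

    def query(self, assay, predicate):
        """Specimen query based on registered data; returns sorted IDs."""
        return sorted(
            sid for sid, results in self._data.items()
            if assay in results and predicate(results[assay])
        )

    def request(self, specimen_id, study):
        """Log a distribution request and return the bank that holds it."""
        bank = self._specimens[specimen_id]
        self._requests.append((specimen_id, bank, study))
        return bank


if __name__ == "__main__":
    repo = VirtualRepository()
    repo.register_specimen("SP1", "Bank A")   # invented demo data
    repo.register_specimen("SP2", "Bank B")
    repo.register_data("SP1", "p53", "-")
    repo.register_data("SP2", "p53", "+")
    for sid in repo.query("p53", lambda v: v == "-"):
        print(sid, "held by", repo.request(sid, "new study"))
```

In this sketch, results from one study become searchable annotation for the next, which is the loop the diagram's "Specimen Query Based on New Data Mining" label appears to describe.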
VGSR - Components • Policy / Governance • Can samples be shared (IRB / MTA) ? • Will samples be shared (ownership / IP) ? • How will samples be shared (review / prioritization) ? • Sample QA / Data standards • Central Database Server • Web-based Applications • Sample Registration • Sample Query / Request • Sample Manager • Data Registration
VGSR – Key Features • No other system currently meets this need • Cooperative Group Tissue Banks • Pathology Database Tools • Unique opportunity to test the ‘Virtual Specimen Bank’ concept • Numerous points of interoperability with caBIG systems • Based on caBIG Architecture Workspace recommendations • Use of caBIG vocabularies and common data elements • Integrated with NCICB tools (caIMAGE) • Works with other software developed in Integrative Cancer Research Tools Workspace (e.g. Function Express, Mutation Viewer)
CaTIS / VGSR – Required Resources • Developmental Requirements • Consensus building / SOPs / CDEs • Personnel • 1 Architect (shared with Architecture Workspace) • 1 Project manager (50%, shared with Integrative Cancer Research Tools Workspace) • 1 DBA (50%, shared with Integrative Cancer Research Tools Workspace) • 2 Programmers • Hardware • Enterprise-class server / programmer workstations • Software • Rational Rose/XDE, ClearCase (CVS), ClearQuest • JBuilder, C++ Builder, Ant • RDBMS (e.g. Oracle, DB2, MSSQL)
CaTIS – Project Timeline • Month 1-6 • Conversion of CaTIS to “Commercialized” CaTIS • Month 7-12 • Deploy CaTIS API to extramural sites (adopters)
VGSR – Project Timeline (1-12 months) • Month 1-8 • Consensus / Policy building • Architecture design / CDEs and vocabularies • Month 9-12 • Database building • Web tools design
CaTIS / VGSR Outcome Measures • Utilization of CaTIS by adopter sites • Successful registration of samples to VGSR • Utilization of VGSR samples by new collaborative research teams for translational cancer research • Funding opportunities • Publications track record