Cancer-research biostatistics motivation for ACGT architecture
Thierry Sengstag, Swiss Institute of Bioinformatics
European-Japanese Workshop on Medical ICT, Hokkaido University, 14-15 Sept 2009
Why is ACGT needed in clinical research?
• Definitions
  • Clinical trials: aim at answering a single predefined question (e.g. is the new treatment better than the old one?)
  • Clinical research: uses all available data for exploratory purposes, i.e. to answer many questions (limitation: human imagination)
• Clinical trials until the 1990s:
  • A few tens of parameters measured
  • Clinical research not very different from clinical trials (from a statistical viewpoint)
Why is ACGT needed in clinical research?
• End of 1990s to early 2000s: Human Genome Project (about 20,000 genes), microarray technology
  • Gene expression data represent thousands to millions of individual measurements
[Figure: Image of hybridized probe array (1.28 cm)]
Why is ACGT needed in clinical research?
• The curse of dimensionality
  • It is difficult to find conclusive evidence with just a few tens or hundreds of patients!
[Figure: Gene expression of patients in a cancer-related clinical trial]
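The multiple-testing side of the curse of dimensionality can be made concrete with a small simulation. This is a minimal sketch under assumed parameters (20,000 genes, 10 patients per arm, unit-variance normal data, a known-variance z-test per gene) and is not the analysis behind the figure. No gene truly differs between the two arms, yet roughly 5% of 20,000, i.e. about 1,000 genes, pass an uncorrected 5% threshold; a Bonferroni threshold (alpha divided by the number of genes) removes nearly all of them, at the cost of much lower power to detect real effects.

```python
import random
from statistics import NormalDist, fmean


def null_screen(n_genes=20_000, n_per_group=10, alpha=0.05, seed=1):
    """Simulate a gene-by-gene two-arm screen in which NO gene truly differs.

    Every value is drawn from N(0, 1) in both arms, so each "significant" gene
    is a false positive. With known unit variance, the statistic
    z = (mean_a - mean_b) / sqrt(2 / n) is exactly N(0, 1) under the null.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n_per_group) ** 0.5
    raw_hits = 0
    bonferroni_hits = 0
    for _ in range(n_genes):
        arm_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        arm_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (fmean(arm_a) - fmean(arm_b)) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        raw_hits += p < alpha
        bonferroni_hits += p < alpha / n_genes
    return raw_hits, bonferroni_hits


raw, bonf = null_screen()
print(f"uncorrected hits: {raw} (expected about {0.05 * 20_000:.0f})")
print(f"Bonferroni hits:  {bonf}")
```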
Why is ACGT needed in clinical research?
• Partial solution: meta-analytical approaches (combining data from multiple experiments)
  • e.g. breast cancer
[Figure: Breast cancer microarray studies published worldwide (2002-2006)]
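One common way to combine results across studies is inverse-variance weighted (fixed-effect) pooling. The slide does not say which meta-analytic method is meant, so the sketch below is only an illustration, and the per-study numbers are hypothetical rather than taken from the breast cancer studies in the figure. Each study's estimate is weighted by 1/SE², and the pooled standard error is smaller than that of any single study, which is why combining datasets helps when each study has few patients.

```python
from math import sqrt


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooling of per-study estimates.

    Returns the pooled estimate and its standard error. Assumes all studies
    estimate the same underlying effect (no between-study heterogeneity).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates (e.g. log hazard ratios for one gene) from three studies.
estimates = [0.30, 0.10, 0.25]
std_errors = [0.20, 0.15, 0.25]
pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"pooled estimate {pooled:.3f}, SE {pooled_se:.3f}")
```

When the studies are expected to differ systematically (different platforms, populations, or protocols), a random-effects model such as DerSimonian-Laird is the usual alternative to this fixed-effect form.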
Why is ACGT needed in clinical research?
Data curation
Why is ACGT needed in clinical research?
Data mining
So, why is ACGT needed in clinical research?
• Automatic federation of multiple datasets
• “Free” annotation of patient samples through the Master Ontology
• Bioinformatics-driven projects can be terminated, e.g. the International Genomics Consortium ExpO database (started 2004, last updated Sep 2007)
• Clinician-driven databases are unlikely
• Single-pass data entry (fewer errors)
• High-performance computing resources
• Easy-to-develop workflows
• Reproducible analyses
• Clinical trial monitoring
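The idea behind automatic federation with shared annotation can be sketched as mapping each source's local field names and value codes onto common terms, then merging the rows. This is a toy illustration only: the site names, field names, and shared term names (standing in for Master Ontology concepts) are all hypothetical, and the sketch does not reproduce ACGT's own implementation.

```python
from typing import Any

# Hypothetical local field names at two sites, each mapped to an invented
# shared term that stands in for a Master Ontology concept.
FIELD_MAP = {
    "site_a": {"er_stat": "EstrogenReceptorStatus", "age_dx": "AgeAtDiagnosis"},
    "site_b": {"ER": "EstrogenReceptorStatus", "age": "AgeAtDiagnosis"},
}

# Hypothetical local value codes mapped to one shared vocabulary.
VALUE_MAP = {
    ("site_a", "EstrogenReceptorStatus"): {"pos": "positive", "neg": "negative"},
    ("site_b", "EstrogenReceptorStatus"): {1: "positive", 0: "negative"},
}


def harmonize(site: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename local fields and recode local values into the shared terms."""
    row: dict[str, Any] = {"source": site}
    for local_field, value in record.items():
        term = FIELD_MAP[site].get(local_field)
        if term is None:
            continue  # fields with no mapping are left out of the federated view
        row[term] = VALUE_MAP.get((site, term), {}).get(value, value)
    return row


def federate(sources: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Return one list of harmonized rows drawn from every source."""
    return [harmonize(site, rec) for site, recs in sources.items() for rec in recs]


rows = federate({
    "site_a": [{"er_stat": "pos", "age_dx": 52, "local_id": "A-17"}],
    "site_b": [{"ER": 0, "age": 61}],
})
for row in rows:
    print(row)
```

Once every source is expressed in the same terms, a single query (for example, all ER-negative patients over 60) can run across the merged rows without per-site logic.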
[Figure: ACGT Data Architecture V2 rev 8]
Components labelled in the diagram:
• Portal, Workflow Editor/Enactor, Workflow repository (RepoServices)
• Analysis tools: ACGT KD tools, R, Prep, BEA, …
• Layers: Data Management Layer, Data Access Layer
• Data and semantic services: Mediator, Data Access Services (BASE, DICOM, Web Service, CTMS), GridFTP/DMS, GAS, DataGrid FileSystem, V.O. Management, Credentials DB, Service ontology DB, ACGT ontology DB, Pending terms DB, Ontology Mapping Tool, External ontologies
• Data sources: anonymized mirrors of hospital DBs; public data repositories (GEO, AE)
• Hospital wall: only anonymized data on the ACGT side; patient private data on the hospital side
• Hospital side: hospital DBs (DICOM, BASE, CTMS), hospital data entry tool, ACGT Trial Builder (ObTiMA), data export tool, Pseudonym DB, CAT: Custodix Anonymization Tool
• Color code: ACGT environment vs. outside ACGT development scope
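The hospital-wall principle (patient identifiers stay inside the hospital, and only pseudonymized data leave it through the data export tool, with a Pseudonym DB on the hospital side) can be sketched as follows. This is a hypothetical illustration, not the Custodix Anonymization Tool: the field names, the list of direct identifiers, and the in-memory pseudonym table are assumptions, and a real deployment would also need persistent, access-controlled storage and handling of indirect identifiers.

```python
import secrets


class PseudonymTable:
    """Hospital-side mapping from real patient IDs to random pseudonyms.

    Kept in memory here; it stands in for a persistent Pseudonym DB.
    """

    def __init__(self) -> None:
        self._by_patient: dict[str, str] = {}
        self._used: set[str] = set()

    def pseudonym_for(self, patient_id: str) -> str:
        # Reuse the same pseudonym so records of one patient stay linkable.
        if patient_id not in self._by_patient:
            candidate = secrets.token_hex(8)
            while candidate in self._used:  # avoid the (unlikely) collision
                candidate = secrets.token_hex(8)
            self._used.add(candidate)
            self._by_patient[patient_id] = candidate
        return self._by_patient[patient_id]


# Hypothetical set of direct identifiers that must not cross the hospital wall.
DIRECT_IDENTIFIERS = {"patient_id", "name", "date_of_birth", "address"}


def export_record(record: dict, table: PseudonymTable) -> dict:
    """Drop direct identifiers and attach the patient's pseudonym."""
    exported = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    exported["pseudonym"] = table.pseudonym_for(record["patient_id"])
    return exported


table = PseudonymTable()
visit_1 = {"patient_id": "H-0042", "name": "Jane Doe", "tumor_grade": 2}
visit_2 = {"patient_id": "H-0042", "name": "Jane Doe", "tumor_grade": 3}
print(export_record(visit_1, table))
print(export_record(visit_2, table))
```

Random pseudonyms held in a table are used here instead of simply hashing the patient ID, because an unsalted hash of a short ID can be reversed by enumerating candidate IDs; the trade-off is that the table itself must stay secret on the hospital side.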
Demonstrator grid infrastructure
• Microarray databases* (Lund)
• Data Access Services (Eindhoven)
• DMS (data management)
• VO management (Gent)
• Grid Resource Mgmt (Poznan)
• GridR execution nodes (St-Augustin, Poznan)
• ObTiMA (St-Ingbert, Sapporo)
• GridR service node (St-Augustin)
• Mediator (Madrid)
• Portal (Bucharest)
• Clinical databases* (Heraklion)
• Meta-data repository (Malaga)
• Workflow enactor
* hospital and anonymized