Phenotype database interoperability and integration
Damian Smedley, EBI
Mouse models for human disease
Why do we need data integration and interoperability?
Centralised vs distributed solutions
[Diagram: three architectures compared: Distributed solution, Centralised warehouse v2, Centralised warehouse v1. Labels: portal, central database, nightly data syncs, web services. Data source categories: Strains, Genomics, IKMC projects, Phenotype/Expression. Resources shown: MGI, JaxMice, IMSR, EMMA, Ensembl, KOMP, EUCOMM, NorCOMM, TIGM, Eurexpress/GXD etc., Europhenome.]
Centralised solutions
Advantages
• Better query performance for large datasets
• Easier to analyse raw data in one location
Disadvantages
• Regular data deposition is non-trivial
• Designing a single schema to store different types of data is not simple
• Persuading people to “give up” their data/databases/websites
• Will still need to make interoperable with other data sources
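The centralised warehouse described above can be sketched as a single generic table that is reloaded from each source's nightly dump. This is a minimal illustration under stated assumptions, not the schema of any real warehouse: the table layout, the function names, the source names and the MGI identifier `MGI:0000001` are all made up for the example.

```python
import json
import sqlite3

# Hypothetical nightly dumps from two sources; note that the record shapes differ.
STRAIN_DUMP = [{"mgi_id": "MGI:0000001", "strain": "example-strain-1"}]
PHENOTYPE_DUMP = [
    {"mgi_id": "MGI:0000001", "parameter": "body weight", "significant": True}
]


def create_warehouse(conn):
    """Create one generic table; heterogeneous fields are kept as a JSON payload."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS record (
               mgi_id  TEXT NOT NULL,
               source  TEXT NOT NULL,
               payload TEXT NOT NULL
           )"""
    )


def nightly_sync(conn, source, rows):
    """Replace everything previously loaded from `source` with the latest dump."""
    with conn:  # single transaction: the delete and the reload commit together
        conn.execute("DELETE FROM record WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO record (mgi_id, source, payload) VALUES (?, ?, ?)",
            [(row["mgi_id"], source, json.dumps(row)) for row in rows],
        )


def records_for_gene(conn, mgi_id):
    """Return (source, record) pairs for one gene, ordered by source name."""
    cursor = conn.execute(
        "SELECT source, payload FROM record WHERE mgi_id = ? ORDER BY source",
        (mgi_id,),
    )
    return [(source, json.loads(payload)) for source, payload in cursor]


conn = sqlite3.connect(":memory:")
create_warehouse(conn)
nightly_sync(conn, "strains", STRAIN_DUMP)
nightly_sync(conn, "phenotypes", PHENOTYPE_DUMP)
```

Storing each record as an opaque JSON payload sidesteps the single-schema design problem, but at the cost of not being able to query individual fields in SQL, which is one way the "not simple" trade-off shows up in practice.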
Distributed solutions
Advantages
• Domain expertise at production site exploited
• Different types of data easily integrated as long as they share something in common such as a gene identifier
• No need for nightly data flow to keep data up to date
• No need for redundant data in each database
• Easier to persuade people to collaborate in a distributed scenario
Disadvantages
• Technical knowledge required to deploy the web services
• Potential query performance problems for large datasets (may need to provide summary level data)
• Potential problems performing analysis over all datasets
• Problems with services going down
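The distributed approach above can be sketched as a fan-out query: each source is asked about the same gene identifier, and the answers are collected even if one service is down. The three service functions below are hypothetical stand-ins for remote web service calls, and the identifier and field values are made up; this is an illustration of the pattern, not any project's actual client code.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote web services, each keyed by a gene identifier.
def strain_service(mgi_id):
    return [{"mgi_id": mgi_id, "strain": "example-strain-1"}]


def expression_service(mgi_id):
    return [{"mgi_id": mgi_id, "tissue": "example-tissue"}]


def phenotype_service(mgi_id):
    # Simulates the "services going down" disadvantage.
    raise ConnectionError("service is down")


SERVICES = {
    "strains": strain_service,
    "expression": expression_service,
    "phenotypes": phenotype_service,
}


def federated_query(mgi_id, services=SERVICES, timeout=5):
    """Query every service for one gene; return (results, failures) by service name."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {name: pool.submit(fn, mgi_id) for name, fn in services.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # an unavailable source must not sink the query
                failures[name] = str(exc)
    return results, failures


results, failures = federated_query("MGI:0000001")
```

Returning the failures alongside the results lets a portal show partial answers and flag which sources were unavailable, rather than failing the whole query.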
1000 Genomes - centralisation
International Cancer Genome Consortium
• France: Liver (alcohol-related), Breast (HER2+ve)
• UK: Breast (several subtypes)
• Japan: Liver (virus related)
• Canada: Pancreas
• China: Stomach
• Spain: CLL
• India: Oral Cavity
• Australia: Pancreas
ICGC - distributed
Joint Ensembl and EurExpress query
IKMC portal: knockoutmouse.org
[Diagram: resources linked through the portal: TIGM, GXD, EUCOMM, Eurexpress, KOMP, NorCOMM, EMMA, KOMP rep, CMMR, IMSR, Europhenome, Ensembl, CREATE]
IKMC interoperability strategy
[Diagram: resources joined by MGI ID through BioMart query interface(s). Resources: CREATE, Ensembl, GXD, IKMC, EURExpress, MGI, ES cells + lines (EMMA (UK), KOMP (USA), CMMR (Canada)), Phenotype (EuroPhenome etc). Sites: EBI, UK; JAX, USA; Sanger, UK; Edinburgh, UK; Harwell, UK]
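As an illustration of using the MGI ID as the shared key across BioMart interfaces, the sketch below builds a BioMart-style XML query that filters one dataset by an MGI identifier. It only constructs the query text and makes no network call. The dataset name, the filter name `mgi_id`, the attribute names and the identifier `MGI:0000001` are assumptions for the example and would need checking against the configuration of whichever mart is actually queried.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, mgi_id, attributes, filter_name="mgi_id"):
    """Build a BioMart query document selecting `attributes` for one MGI ID.

    `filter_name` is an assumption: each mart defines its own filter names.
    """
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",
            "header": "0",
            "uniqueRows": "1",
            "datasetConfigVersion": "0.6",
        },
    )
    dataset_el = ET.SubElement(
        query, "Dataset", {"name": dataset, "interface": "default"}
    )
    # The shared identifier is what lets the same gene be looked up in every mart.
    ET.SubElement(dataset_el, "Filter", {"name": filter_name, "value": mgi_id})
    for attribute in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": attribute})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


# Assumed dataset and attribute names, for illustration only.
xml_query = build_biomart_query(
    "mmusculus_gene_ensembl",
    "MGI:0000001",
    ["ensembl_gene_id", "external_gene_name"],
)
```

The same function could be called once per mart with a different dataset and attribute list, keeping the MGI ID filter value constant, which is the essence of the strategy on this slide.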
www.knockoutmouse.org/martsearch
Europhenome: raw and summary data
Possible strategy for phenotype data
[Diagram: high throughput phenotyping centres feed a central database, which supports presentation of raw results and analysis to assign phenotypes to genes. The central database and the other resources are joined by MGI ID through BioMart query interface(s). Resources: CREATE, Ensembl, GXD, IKMC, EURExpress, MGI, ES cells + lines (EMMA (UK), KOMP (USA), CMMR (Canada)), High throughput phenotyping. Sites: EBI, UK; JAX, USA; Sanger, UK; Edinburgh, UK]
Linking from IKMC portal
[Screenshot labels: Phenotype searches, Phenotyping]
Linking from IKMC portal
Mouse models for human disease
Acknowledgements
The whole CASIMIR consortium and in particular: Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock: MouseFinder tool
MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora