DESPRAD subproject

DESPRAD subproject. Alvis Brazma EMBL-EBI Hinxton, October 20, 2003. DESPRAD – Development and Establishment of Standards and Prototype Repository for Array Data. Participants. EBI UMC Utrecht University of Bergen RZPD Cambridge University EMBL Heidelberg


Presentation Transcript


  1. DESPRAD subproject Alvis Brazma EMBL-EBI Hinxton, October 20, 2003

  2. DESPRAD – Development and Establishment of Standards and Prototype Repository for Array Data

  3. Participants • EBI • UMC Utrecht • University of Bergen • RZPD • Cambridge University • EMBL Heidelberg • University of Marseille (CIML) • University of Madrid (CMB)

  4. Three major sets of WPs: • Developing standards and an international infrastructure for microarray data sharing (WP1 – WP4) • Establishing a public repository for microarray data – ArrayExpress (WP4 – WP9) • Research in gene expression data analysis and gene networks (WP9 – WP12)

  5. ArrayExpress goals • Serving as an archival repository for microarray data supporting publications • Providing easy access to microarray data in a structured and standardised format for the research community • Facilitating the sharing of microarray designs and protocols

  6. ArrayExpress approach • To collect the information the user needs to understand how to interpret the data • To try to represent the information in a structured way, potentially allowing for automated analysis and mining • To work towards a community agreement to represent microarray data in a standard way – founding of the MGED society

  7. 1. Standards • Founding the Microarray Gene Expression Data (MGED) society • Development of the standards • MIAME • MAGE • MGED ontology

  8. Sharing microarray data – which data? [Figure with panels A–D; labels: Array scans, Quantitations, Spots, Samples, Genes]

  9. Gene expression matrix (genes × samples) [Figure labels: Sample annotations; Gene annotations; Annotations; problem 1; Gene expression levels – problem 2]
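The structure on this slide, a genes × samples matrix with annotations on both axes, can be pictured with a small sketch. This is only an illustration: the class name `ExpressionMatrix`, its fields, and the toy values are assumptions made for this example and are not taken from ArrayExpress or MAGE.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix with annotations on both axes (illustrative only)."""
    genes: list
    samples: list
    values: list  # values[i][j] is the level of gene i in sample j
    gene_annotations: dict = field(default_factory=dict)
    sample_annotations: dict = field(default_factory=dict)

    def __post_init__(self):
        # Check that the matrix shape agrees with the two axes.
        if len(self.values) != len(self.genes):
            raise ValueError("expected one row per gene")
        if any(len(row) != len(self.samples) for row in self.values):
            raise ValueError("expected one column per sample")

    def level(self, gene, sample):
        """Expression level of one gene in one sample."""
        return self.values[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key, value):
        """Samples whose annotation `key` equals `value`, in matrix order."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]


# Toy data, invented for the example.
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    values=[[1.0, 2.5, 0.3], [0.9, 1.1, 4.2]],
    gene_annotations={"geneA": {"description": "hypothetical kinase"}},
    sample_annotations={
        "s1": {"tissue": "liver"},
        "s2": {"tissue": "brain"},
        "s3": {"tissue": "liver"},
    },
)
```

Keeping the annotations next to the matrix is what makes a query such as `matrix.samples_where("tissue", "liver")` possible without consulting the original publication.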

  10. MGED Society • The Microarray Gene Expression Data Society is an international organisation for facilitating the sharing of functional genomics and proteomics array data • Meetings: MGED 1, Hinxton, November 1999; MGED 2, Heidelberg, May 2000; MGED 3, Stanford University, April 2001; MGED 4, Boston, February 2002; MGED 5, Tokyo, September 2002; MGED 6, Aix-en-Provence, September 2003; MGED 7, Toronto, September 2004 • Board of directors – EBI, Stanford, UCB, TIGR, Affymetrix, Rosetta,…

  11. [Diagram of a microarray experiment; labels: Experiment, Sample, RNA extract, labelled nucleic acid, hybridisation, array, Array design, Microarray, Protocol, normalization, integration, genes, Gene expression data matrix]

  12. The first database model - developed in collaboration with DKFZ in 1999

  13. MGED standards - MIAME

  14. Nature editorial

  15. MGED standards – MAGE-ML

  16. The organisations and software supporting MAGE-ML include • Affymetrix • Agilent • Biodiscovery (Imagene5.5) • BASE (open source project coordinated at Lund) • Iobion (Gene Traffic) • Manchester University (MAXDB) • Molmine (J-Express) • NCI • NIEHS • Rosetta Biosoftware (Rosetta Resolver) • RZPD • Sanger Institute LIMS (MIDAS) • Silicon Genetics (GeneNet) • Stanford University (SMD) • TIGR (MADAM) • UC at Berkeley • University of Pennsylvania (RAD) • UMC Utrecht

  18. Data in ArrayExpress [Chart: hybs over time, 2002–2004; y-axis ticks 1000, 2000, 3000; data labels 6, ~100, ~250, 1172, ~3000; time-axis labels November, February, September, September, April]

  19. ArrayExpress content (experiments) [Chart: by experiment; note on chart: +1 Drosophila experiment]

  20. Submissions by labs (in hybs)

  21. Submissions by country (in experiments)

  22. Expression Profiler (component interface) [Screenshot; labels: SUBSELECT, CLUSTER; callouts 1 and 2]

  23. ArrayExpress web-page hits • 2002 – 49 245 • 2003 – 274 983 (by 12 September)

  24. ArrayExpress components [Diagram labels: Submissions; Queries, Analysis; Large-scale microarray facilities; Smaller labs; Internet; www; MAGE-ML; ArrayExpress; MIAMExpress – online submission tool; Expression Profiler – online analysis tool; Export to local analysis tools]

  25. MIAMExpress • Online since December 1, 2002 • 2002 – 15 951 hits • 2003 – 112 871 hits by 12 September • So far ~20 submissions completed through MIAMExpress, i.e., about 25% of all experiments in ArrayExpress • MIAMExpress is open source software - installed in at least 15 labs (EMBL, RZPD, Leipzig, Leuven, Vancouver, VIB) • Tox-MIAMExpress – a specialised version for Toxicology

  26. ArrayExpress infrastructure [Diagram labels: Submissions; Access; ArrayExpress; Repository (Oracle); Query interface (Tomcat); Queries; MAGE-ML retrieval; MAGE-ML pipelines; MIAMExpress (MySQL); MIAMExpress local installations (Cambridge,…); Local databases (RZPD, Stanford); Local databases LIMS (EMBL, TIGR); Array Manufacturers (Affymetrix, Agilent); Expression Profiler; Desktop Data Analysis software; www]

  27. Submissions by pipeline (in hybs)

  28. ArrayExpress development [Diagram labels: submissions; curation; Repository (MAGE-OM model); Simple queries (species, author, lab, array types, etc); Warehouse (simple gene-centric model); More complex queries (genes, expression levels, etc); EnsMart; Links back to the evidence; Hyperlinks to other databases; Database integration]

  29. Gene expression data matrix (genes × samples) [Figure labels: Sample annotations; Gene expression levels; Gene annotations]

  30. ArrayExpress development [Diagram labels: submissions; curation; Repository (MAGE-OM model); Simple queries (species, author, lab, array types, etc); Warehouse (simple gene-centric model); More complex queries (genes, expression levels, etc); Gene Expression Atlas – summarised information about which gene is expressed where; EnsMart; Links back to the evidence; Hyperlinks to other databases; Database integration]

  31. New in ArrayExpress • Password protected logins • Can be used to support anonymous refereeing of microarray papers • Discussions with Nature

  32. Data growth in ArrayExpress [Chart: hybs by year, 2002–2004; y-axis ticks 1000, 2000, 3000 and "4000 ?"]

  33. Distributed data collection [Diagram labels: Small lab (×8); National microarray centre (×3); EMBL; Stanford; Sanger; TIGR; ArrayExpress]

  34. Data analysis tools • Expression Profiler – complete redevelopment of the earlier tool – new interface, new functionality, XML-based modularity – beta version will be ready in month 24 • J-Express (developed in Bergen) – talk by Inge Jonassen

  35. Research • Microarray-based gene network analysis – 2 publications out, 1 in print, 1 submitted • S. pombe gene expression data analysis (in collaboration with the Sanger Institute) – publication in preparation • New algorithms for clustering and cluster comparison – 2 publications in preparation

  36. Transcription factor binding network • Chromatin IP experiments on a chip (ChIP-on-chip) • Using microarrays to find genomic (intragenic) sequences, a few hundred bp long, where a particular transcription factor is likely to bind • ChIP by Lee et al. (Science 2002) – binding site location data in the yeast genome for 107 transcription factors (out of about 250 yeast transcription factors in total) • Identified around 4500 binding locations

  37. ChIP on chip network by Lee et al

  38. Gene disruption network [Figure labels: gene A, gene B, gene C, gene D; nodes A, B, C, D; DA, DB, DC]

  39. Data for over 200 gene disruptions in yeast – Hughes et al., Cell 102 (2000)
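One way to read the gene disruption network is as a directed graph with an arc from a disrupted gene to every gene whose expression changes noticeably in that mutant. The sketch below follows that reading. The slides do not state the rule actually used, so the function name `disruption_network`, the log-ratio input layout, and the threshold of 1.0 are all assumptions for illustration.

```python
def disruption_network(log_ratios, threshold=1.0):
    """Build arcs (disrupted_gene, affected_gene) from expression changes.

    log_ratios[d][g] is assumed to hold log2(expression of g in the mutant
    with d disrupted / expression of g in wild type). The threshold is a
    hypothetical cut-off, not a value taken from the slides.
    """
    arcs = set()
    for disrupted, profile in log_ratios.items():
        for target, ratio in profile.items():
            # Skip the disrupted gene itself: its own drop is trivial.
            if target != disrupted and abs(ratio) >= threshold:
                arcs.add((disrupted, target))
    return arcs


# Toy profiles, invented for the example.
toy_ratios = {
    "geneA": {"geneA": -5.0, "geneB": 1.8, "geneC": 0.1},
    "geneD": {"geneB": -0.2, "geneC": -1.4},
}
toy_arcs = disruption_network(toy_ratios)
```

With the toy data, `toy_arcs` contains the arcs geneA → geneB and geneD → geneC; both up- and down-regulation count because the absolute value is compared with the threshold.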

  40. Mutation network for S. cerevisiae

  41. Three networks in yeast • ChIP network (Lee et al) • Mutation network (Hughes et al) • In silico network – matching 38 experimentally known transcription factor binding sites (Pilpel et al) against yeast genome sequence

  42. Intersection of the networks Red – 39 arcs present in all networks Green – arcs present in at least 2 networks and adjacent to one of SWI4, SWI6 or MBP1
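The two arc classes on this slide (arcs present in all networks, and arcs present in at least two networks that touch SWI4, SWI6 or MBP1) can be expressed as plain set operations. The sketch below assumes each network is a set of `(source, target)` pairs. The helper names and the toy arcs are invented for the example and are not real binding or disruption data.

```python
from collections import Counter


def arcs_in_all(*networks):
    """Arcs present in every network."""
    return set.intersection(*(set(net) for net in networks))


def arcs_in_at_least(k, *networks):
    """Arcs present in at least k of the networks."""
    counts = Counter(arc for net in networks for arc in set(net))
    return {arc for arc, n in counts.items() if n >= k}


def adjacent_to(arcs, nodes):
    """Arcs that start or end at one of the given nodes."""
    return {(s, t) for s, t in arcs if s in nodes or t in nodes}


# Toy networks; gene names g1..g5 and all arcs are made up.
chip = {("SWI4", "g1"), ("MBP1", "g2"), ("g3", "g4")}
mutation = {("SWI4", "g1"), ("g3", "g4"), ("g5", "g1")}
in_silico = {("SWI4", "g1"), ("MBP1", "g2")}

red = arcs_in_all(chip, mutation, in_silico)
green = adjacent_to(
    arcs_in_at_least(2, chip, mutation, in_silico),
    {"SWI4", "SWI6", "MBP1"},
)
```

Here `red` holds the single arc shared by all three toy networks, and `green` keeps only the arcs that are supported twice and touch one of the three named factors.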

  43. How do the ChIP-chip and disruption networks relate? [Figure labels: All genes; Transcription factors, t; Disrupted genes, h; Regulation set of t; Effectual set of h]
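A simple way to compare the regulation set of a transcription factor t with the effectual set of a disrupted gene h is to count their shared genes and ask how surprising that overlap is. The slides do not say which statistic was used. The sketch below assumes a one-sided hypergeometric test, and the function names and toy arcs are hypothetical.

```python
from math import comb


def regulation_set(chip_arcs, t):
    """Genes bound by transcription factor t in the ChIP network."""
    return {target for tf, target in chip_arcs if tf == t}


def effectual_set(disruption_arcs, h):
    """Genes whose expression changes when gene h is disrupted."""
    return {target for gene, target in disruption_arcs if gene == h}


def overlap_p_value(n_all, set_a, set_b):
    """P(overlap >= observed) if set_b were drawn at random from n_all genes."""
    k = len(set_a & set_b)
    a, b = len(set_a), len(set_b)
    # Hypergeometric tail: sum the probabilities of overlaps k, k+1, ...
    tail = sum(comb(a, i) * comb(n_all - a, b - i)
               for i in range(k, min(a, b) + 1))
    return tail / comb(n_all, b)


# Toy arcs, invented for the example; 10 genes in total.
chip_arcs = {("t", "g1"), ("t", "g2"), ("t", "g3")}
disruption_arcs = {("h", "g1"), ("h", "g2"), ("h", "g4")}

reg = regulation_set(chip_arcs, "t")
eff = effectual_set(disruption_arcs, "h")
p = overlap_p_value(10, reg, eff)
```

In the toy case the two sets share 2 of their 3 genes, and the tail probability is (21 + 1) / 120, about 0.18, so an overlap of this size would not be remarkable among only 10 genes.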
