ArrayExpress: a public database for microarray-based gene expression data
http://www.ebi.ac.uk/microarray/
European Bioinformatics Institute (EMBL-EBI)
Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team
MGED IV, Boston, February 2002
ArrayExpress: opened to the public on Tuesday, February 12th, 2002
• Standards: MIAME-compliant
• Data model: MAGE-OM
• Data input: MAGE-ML, web
• Data output: HTML, MAGE-ML, TAB-delimited, link to Expression Profiler
• Data curation: team of curators
• Data sets: yeast, human
General overview [diagram: ArrayExpress, MIAMExpress and Expression Profiler, connected via MAGE-ML and accessed over the Internet (www)]
ArrayExpress component architecture [diagram]
• Internet / www
• Application server: Java servlets, MAGE-OM
• ArrayExpress main database: SQL, derived from MAGE-OM
• Data warehouse: gene-centred queries
• Submission/curation
• Images file server
• MAGE-ML
ArrayExpress: features
• MIAME-compliant; MAGE-ML, MAGE-OM
• Can deal with:
  • raw quantitation data
  • processed data
  • data transformations
• Independent of:
  • experimental platforms
  • image analysis methods
  • data normalization methods
ArrayExpress: details
• Database schema derived from MAGE-OM
• Standard SQL (we use Oracle)
• Data loader for MAGE-ML (generated)
• Web interface (first release 12 February 2002)
• Queries by experiment, array, sample
• Browsing
• Object model-based query mechanism, automatically mapped to SQL
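To illustrate what an object model-based query with automatic mapping to SQL can look like, here is a minimal Python sketch. The class names, table and column names, ObjectQuery and to_sql are all hypothetical; they are not the actual ArrayExpress schema or query engine. The sketch handles only equality filters on a single class and emits qmark-style (?) placeholders as used by Python's sqlite3; an Oracle driver would expect a different placeholder style.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical object-model-to-schema mapping (illustrative names only).
CLASS_TO_TABLE: Dict[str, str] = {"Experiment": "experiment", "BioSample": "biosample"}
ATTRIBUTE_TO_COLUMN: Dict[Tuple[str, str], str] = {
    ("Experiment", "name"): "name",
    ("Experiment", "organism"): "organism",
    ("BioSample", "type"): "sample_type",
}


@dataclass
class ObjectQuery:
    cls: str                                               # object-model class to query
    filters: Dict[str, str] = field(default_factory=dict)  # attribute -> required value


def to_sql(query: ObjectQuery) -> Tuple[str, List[str]]:
    """Translate equality filters on one class into a parameterised SELECT."""
    table = CLASS_TO_TABLE[query.cls]
    clauses, params = [], []
    for attribute, value in query.filters.items():
        column = ATTRIBUTE_TO_COLUMN[(query.cls, attribute)]
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = to_sql(ObjectQuery("Experiment", {"organism": "Saccharomyces cerevisiae"}))
print(sql)     # SELECT * FROM experiment WHERE organism = ?
print(params)  # ['Saccharomyces cerevisiae']
```

Keeping user-supplied values out of the SQL text as bound parameters means the mapping layer never has to quote or escape them itself.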
MIAMExpress
• Data annotation and submission tool
• MIAME-based web interface
• Experiment, array and protocol submissions
• Uses controlled vocabularies (CVs)/ontologies wherever possible
• Creates MAGE-ML files for loading into ArrayExpress
• Based on MySQL, Perl, CGI, Apache
MIAMExpress submission procedure [diagram]
• Create account, log in
• Pending/new experiment
• Samples 1…n (sample protocol)
• Extracts 1…n for each sample (extraction protocol)
• Hybridisations to arrays 1…n (hyb protocol)
• Data 1…n (scanning protocol, image analysis protocol)
• Transformation protocol
• Combined experiment data
• Final free-text comment
• Submit
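The submission steps form a hierarchy (samples, extracts, hybridisations, data) with a protocol attached at each step. A minimal sketch of that hierarchy as Python dataclasses follows; the class and field names are hypothetical and simplified, not the actual MIAMExpress or MAGE-OM classes, and the missing_steps check is an illustrative assumption of how a tool could flag an incomplete submission.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified classes mirroring the MIAMExpress submission steps.


@dataclass
class Protocol:
    name: str


@dataclass
class Sample:
    name: str
    protocol: Protocol        # sample protocol


@dataclass
class Extract:
    name: str
    sample: Sample
    protocol: Protocol        # extraction protocol


@dataclass
class Hybridisation:
    name: str
    extracts: List[Extract]
    array: str                # name of the array used
    protocol: Protocol        # hybridisation protocol


@dataclass
class DataFile:
    name: str
    hybridisation: Hybridisation
    scanning: Protocol        # scanning protocol
    image_analysis: Protocol  # image analysis protocol


@dataclass
class Experiment:
    title: str
    hybridisations: List[Hybridisation] = field(default_factory=list)
    data: List[DataFile] = field(default_factory=list)
    transformation: Optional[Protocol] = None
    comment: str = ""         # final free-text comment

    def missing_steps(self) -> List[str]:
        """Report steps that still need input before submission (illustrative)."""
        missing = []
        if not self.hybridisations:
            missing.append("hybridisations")
        covered = {d.hybridisation.name for d in self.data}
        for hyb in self.hybridisations:
            if hyb.name not in covered:
                missing.append(f"data for {hyb.name}")
        return missing


p = Protocol("example protocol")
sample = Sample("Sample1", p)
extract = Extract("Extract1", sample, p)
hyb = Hybridisation("Hyb1", [extract], "Array1", p)
exp = Experiment("Example experiment", hybridisations=[hyb])
print(exp.missing_steps())  # ['data for Hyb1']
exp.data.append(DataFile("Data1", hyb, p, p))
print(exp.missing_steps())  # []
```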
MIAMExpress design and future
• Species- and domain-specific pages and ontologies; ontology development
• The life-span of data submissions is long
• Curation control, submission tracking
• Interaction with ArrayExpress
• Full MAGE-OM support, data updating
• Usability, flexibility, scalability, platform independence
• User needs: free in-house installation
ArrayExpress curation effort
• User support and help documentation
• Submission support for MIAMExpress
• Support on ontologies and CVs
• Minimizing free text; removal of synonyms
• Encouraging MIAME compliance
• Help on MAGE-ML
• Goal: to provide high-quality, well-annotated data that allow automated data analysis
Accession numbers
• E-MEXP-234: experiment 234, submitted via MIAMExpress
• E-SANG-25: experiment 25 from the Sanger Institute
• A-AFFY-1034: array description 1034 from Affymetrix
• P-LABL-5: protocol 5 for labeling
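The accession numbers combine an object type, a source code and a serial number. A minimal Python sketch that splits such an identifier, assuming (from these four examples only) a single type letter E, A or P, a four-letter upper-case source code and a decimal number; the regular expression and parse_accession are illustrative, not an official ArrayExpress specification.

```python
import re
from typing import NamedTuple

# Type letters as used in the examples; other letters are not handled here.
OBJECT_TYPES = {"E": "experiment", "A": "array description", "P": "protocol"}

# Assumed layout: <type letter>-<four-letter source code>-<serial number>
ACCESSION_RE = re.compile(r"([EAP])-([A-Z]{4})-(\d+)")


class Accession(NamedTuple):
    object_type: str
    source: str
    number: int


def parse_accession(accession: str) -> Accession:
    """Split an accession such as 'E-MEXP-234' into its three parts."""
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    kind, source, number = match.groups()
    return Accession(OBJECT_TYPES[kind], source, int(number))


print(parse_accession("E-MEXP-234"))
# Accession(object_type='experiment', source='MEXP', number=234)
```

Using fullmatch rather than search rejects strings that merely contain an accession, such as one with trailing text.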
Data in ArrayExpress
Now:
• Human data (ironchip) from EMBL
• Yeast data from EMBL
• S. pombe data from the Sanger Institute
• TIGR array descriptions
• Affymetrix chip designs
Work underway:
• Direct pipeline from Sanger (Rob Andrews)
• HGMP mouse
• EMBL mosquito
• (Add your name here!)
Expression Profiler [screenshot: EPCLUST (Folder, Data, Select, Analyze, Cluster) and URLMAP, linking to GeneOntology, pathways databases, SPEXS and other tools]
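EPCLUST works on gene expression profiles; one basic operation is to take a query gene and retrieve the genes whose profiles are most similar. A minimal Python sketch, assuming 1 minus the Pearson correlation as the distance (a common choice for expression data, not necessarily EPCLUST's default) and a plain dictionary of profiles; pearson_distance and nearest_genes are hypothetical names.

```python
import math
from typing import Dict, List


def pearson_distance(x: List[float], y: List[float]) -> float:
    """1 - Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def nearest_genes(profiles: Dict[str, List[float]], query: str, k: int) -> List[str]:
    """Return the query gene followed by its k most similar genes."""
    target = profiles[query]
    others = [gene for gene in profiles if gene != query]
    others.sort(key=lambda gene: pearson_distance(target, profiles[gene]))
    return [query] + others[:k]


profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],   # same shape as G1, different scale
    "G3": [4.0, 3.0, 2.0, 1.0],   # opposite shape
    "G4": [1.0, 3.0, 2.0, 4.0],   # similar, but not identical
}
print(nearest_genes(profiles, "G1", 2))  # ['G1', 'G2', 'G4']
```

A correlation-based distance compares the shape of profiles rather than their absolute level, which is why G2 counts as closest to G1 here despite its doubled values.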
101 sequences relative to ORF start: YGR128C + 100
>YAL036C chromo=1 coord=(76154-75048(C)) start=-600 end=+2 seq=(76152-76754)
TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTGCTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTTCTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTTCACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTTTTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTGTTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_
>YAL025C chromo=1 coord=(101147-100230(C)) start=-600 end=+2 seq=(101145-101747)
CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACCACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTTGTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTATAATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACCTTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTGACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_
...
>YBR084W chromo=2 coord=(411012-413936) start=-600 end=+2 seq=(410412-411014)
CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCATTACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACGTATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTTCTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGGACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTACTGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_
GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33
G.GATGAG.T 1:39/49 2:193/222 R:13.244 BP:2.49026e-33
AAAATTTT 1:63/77 2:833/911 R:4.95687 BP:5.02807e-32
TGAAAA.TTT 1:45/53 2:333/350 R:8.85687 BP:1.69905e-31
TG.AAA.TTT 1:53/61 2:538/570 R:6.45662 BP:3.24836e-31
TG.AAA.TTTT 1:40/43 2:254/260 R:10.3214 BP:3.84624e-30
TGAAA..TTT 1:54/65 2:608/645 R:5.82106 BP:1.0887e-29
...
Patterns shown: GATGAG.T, TGAAA..TTT
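Each pattern line reports counts such as 1:52/70 and 2:453/508, a ratio R and a probability BP. One plausible reading, which is an assumption here rather than something the slide states: k/n gives the number of sequences containing the pattern and its total number of occurrences, for the selected set (1) and a background set (2); R compares the fraction of sequences containing the pattern in the two sets; BP is a binomial probability. A minimal Python sketch of the counting part under that reading, where "." in a pattern matches any single base; support and enrichment_ratio are hypothetical helpers, and the sketch does not compute BP.

```python
import re
from typing import List, Tuple


def pattern_regex(pattern: str):
    """Compile a pattern in which '.' stands for any single base.

    The lookahead lets findall report overlapping occurrences too."""
    body = "".join("." if c == "." else re.escape(c) for c in pattern)
    return re.compile(f"(?=({body}))")


def support(pattern: str, sequences: List[str]) -> Tuple[int, int]:
    """Return (sequences containing the pattern, total occurrences)."""
    regex = pattern_regex(pattern)
    counts = [len(regex.findall(seq)) for seq in sequences]
    return sum(1 for c in counts if c > 0), sum(counts)


def enrichment_ratio(k1: int, n1: int, k2: int, n2: int) -> float:
    """Fraction of set-1 sequences with the pattern over the same fraction in set 2."""
    return (k1 / n1) / (k2 / n2)


cluster = ["AAGATGAGCTAA", "GATGAGATGATGAGTT", "CCCCCCCC"]
print(support("GATGAG.T", cluster))  # (2, 3)
```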
[Figure: occurrences of GATGAG.T and TGAAA..TTT allowing 1 mismatch (W/30) in the upstream sequences (600 bp)]
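The figure shows pattern occurrences allowing one mismatch within 600 bp upstream sequences. A minimal Python sketch of such approximate matching, assuming that "." matches any base and never counts as a mismatch, and that index 0 of each sequence lies 600 bp upstream of the ORF start; find_approx and relative_positions are hypothetical names, not the PATMATCH interface.

```python
from typing import List


def matches_at(pattern: str, seq: str, pos: int, max_mismatches: int) -> bool:
    """True if pattern aligns at seq[pos:] with at most max_mismatches mismatches.

    A '.' in the pattern matches any base and is never counted as a mismatch."""
    mismatches = 0
    for offset, expected in enumerate(pattern):
        if expected != "." and seq[pos + offset] != expected:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_approx(pattern: str, seq: str, max_mismatches: int = 1) -> List[int]:
    """0-based start indices of all (possibly overlapping) approximate matches."""
    last_start = len(seq) - len(pattern)
    return [i for i in range(last_start + 1)
            if matches_at(pattern, seq, i, max_mismatches)]


def relative_positions(starts: List[int], upstream_len: int = 600) -> List[int]:
    """Convert 0-based indices to positions relative to the ORF start,
    assuming index 0 is upstream_len bases upstream of the ORF start."""
    return [i - upstream_len for i in starts]


seq = "CCGATGAGATCC"
print(find_approx("GATGAG.T", seq, max_mismatches=0))  # [2]
print(find_approx("GATCAG.T", seq, max_mismatches=1))  # [2]
```

Stopping as soon as the mismatch budget is exceeded keeps the scan cheap for short patterns over many 600 bp sequences.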
Components of Expression Profiler (http://ep.ebi.ac.uk/)
• Expression data: EPCLUST
• External data, tools: pathways, function, etc.
• EP:PPI: protein-protein interactions
• EP:GO: GeneOntology
• GENOMES: sequence, function, annotation
• URLMAP: provide links
• SPEXS: discover patterns
• PATMATCH, SEQLOGO: visualise patterns
Acknowledgments: the team (3), November 1999
• MGED 1 in Hinxton, EBI: Alvis Brazma, Alan Robinson, Jaak Vilo
Acknowledgments: the team (5), August 2000
• Alvis Brazma, Alan Robinson
• Database: Ugis Sarkans
• Expression Profiler, research, students: Jaak Vilo, Thomas Schlitt
Acknowledgments: the team (9), June 2001
• Alvis Brazma
• Database, curation, MIAMExpress: Ugis Sarkans, Helen Parkinson, Mohammadreza Shojatalab
• Expression Profiler, research, students: Jaak Vilo, Thomas Schlitt, Patrick Kemmeren, Katja Kivinen, Johan Rung
Acknowledgments: the team (19), February 2002
• Alvis Brazma
• Database, curation, MIAMExpress: Ugis Sarkans, Helen Parkinson, Mohammadreza Shojatalab, Ahmet Oezcimen, Susanna Sansone, Gonzalo Garcia, Philippe Rocca-Serra, Niran Abeygunawardena, Ele Holloway
• Expression Profiler, research, students: Jaak Vilo, Thomas Schlitt, Lev Soinov, Patrick Kemmeren, Katja Kivinen, Anastasia Samsonova, Misha Kapushesky, Johan Rung, Koichi Tazaki