
ExpressDB: mRNA Expression Database for Flexible and Organism-General Analysis

ExpressDB is a web-based database designed to faithfully represent yeast RNA expression data, supporting multiple measurement methodologies with a flexible schema and cross-data set queries. It aims to provide a generalized 2D table with comprehensive data for research analysis.


Presentation Transcript


  1. ExpressDB mRNA expression database
     John Aach, PhD
     Lecturer, Department of Genetics, Harvard Medical School
     Church Lab, Lipper Center for Computational Genetics

  2. ExpressDB design goals
     • faithfully represent available yeast RNA expression data
     • organism-general version in testing
     • flexible schema
     • support multiple RNA level measurement methodologies
     • web-based interface supporting cross data set queries
     • potential for pipelining to applications (esp. clustering)

  3. ExpressDB design
     [Schema diagram. Entities shown: ORF Group, Gene Name, ORF Descrip, ORF, SGD Info, Expression Data Set, Measure, and Expression Data Point in five typed variants: Binary, Character, Decimal, Floating Point, Integer]

  4. ExpressDB is a generalized 2D table

     Individual researcher datasets:

     ORF       Glu    Gal
     YAL003W   1.58   1.28
     YAL005C   1.37   1.38
     YAL007C   0.26   0.22
     YAL008W   0.18   0.14
     .........

     ORF       G/R:0  G/R:0.5
     YAL003W   0.82   3.87
     YAL004W   0.73   1.45
     YAL005C   0.63   1.17
     YAL007C   0.90   1.64
     .........

     ExpressDB:

     EDS table       Dataset1 Dataset2 ....
     Measure table   Glu Gal G/R:0 G/R:0.5 .....
     EDP table       1.58 1.28 0.82 3.87 0.73 1.45 ...
     ORF table       YAL003W YAL004W YAL005C YAL007C YAL008W ....
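The idea of folding per-researcher tables into one generalized table can be sketched in Python. This is only an illustration, not the ExpressDB schema or code: the function names `to_long` and `cross_query` are hypothetical, and labeling the Glu/Gal table as "Dataset1" and the G/R table as "Dataset2" is an assumption based on the order in which they appear on the slide. The values are the ones shown on the slide.

```python
from collections import defaultdict

# Wide, per-researcher tables as shown on the slide: ORF -> {measure: value}.
# The dataset labels are an assumption (see the note above).
dataset1 = {
    "YAL003W": {"Glu": 1.58, "Gal": 1.28},
    "YAL005C": {"Glu": 1.37, "Gal": 1.38},
    "YAL007C": {"Glu": 0.26, "Gal": 0.22},
    "YAL008W": {"Glu": 0.18, "Gal": 0.14},
}
dataset2 = {
    "YAL003W": {"G/R:0": 0.82, "G/R:0.5": 3.87},
    "YAL004W": {"G/R:0": 0.73, "G/R:0.5": 1.45},
    "YAL005C": {"G/R:0": 0.63, "G/R:0.5": 1.17},
    "YAL007C": {"G/R:0": 0.90, "G/R:0.5": 1.64},
}


def to_long(dataset_name, table):
    """Flatten a wide table into (dataset, measure, orf, value) records."""
    return [
        (dataset_name, measure, orf, value)
        for orf, row in table.items()
        for measure, value in row.items()
    ]


def cross_query(records, measures):
    """Return {orf: {(dataset, measure): value}} for ORFs that have a value
    for every requested (dataset, measure) pair."""
    by_orf = defaultdict(dict)
    for dataset, measure, orf, value in records:
        if (dataset, measure) in measures:
            by_orf[orf][(dataset, measure)] = value
    return {orf: vals for orf, vals in by_orf.items() if len(vals) == len(measures)}


records = to_long("Dataset1", dataset1) + to_long("Dataset2", dataset2)
wanted = {("Dataset1", "Glu"), ("Dataset2", "G/R:0.5")}
result = cross_query(records, wanted)

# Tab-delimited rows for ORFs present in both data sets.
for orf in sorted(result):
    print(orf, result[orf][("Dataset1", "Glu")], result[orf][("Dataset2", "G/R:0.5")], sep="\t")
```

Storing one record per (data set, measure, ORF) combination is what lets a single query span data sets whose column layouts differ.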

  5. ExpressDB status
     • Data from 11 studies using 3 different kinds of RNA assay
     • Velculescu 97, DeRisi 97, Cho 98, Roth 98, Chu 98, Holstege 98, Marton 98, Eisen 98, Spellman 98, Myers 99, Cohen 99
     • Measure records: 2503
     • Data Point records: 17515209
     • 53.8% int, 25.6% dec, 16.6% char, 3.7% bin, 0.3% float
     • approx. data size: 800.7 MB
     • 570.8 MB (data) + 229.9 MB (indices)

  6. Generalized ExpressDB design
     [Schema diagram. Entities shown: Organism, Feature Group, Feature Attribution, Feature, Database xref, Expression Data Set, Measure, and Expression Data Point in five typed variants: Binary, Character, Decimal, Floating Point, Integer]
     • Support for multiple organisms
     • Support for RNAs other than ORFs
     • Generalized database xref via BIGED

  7. Query application
     • http://arep.med.harvard.edu/ExpressDB
     • supports queries across multiple data sets
     • produces tab-delimited output that can be saved or copy/pasted for further analysis
     • Performance
     • Untuned, prototype system on development server
     • you’ll be happier if you restrict yourself to simple queries
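Because the query application produces tab-delimited output, saved or pasted results can be loaded with standard tools. The sketch below uses Python's `csv` module. The column layout (an `ORF` column followed by one numeric column per measure) and the function name `parse_tab_delimited` are assumptions for illustration; the actual output format is not described on the slide.

```python
import csv
import io

# Hypothetical pasted query output. The header row and column order are
# assumptions, not the documented ExpressDB output format.
pasted = "ORF\tGlu\tGal\nYAL003W\t1.58\t1.28\nYAL005C\t1.37\t1.38\n"


def parse_tab_delimited(text):
    """Parse tab-delimited text into a list of (orf, {measure: float}) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        orf = row.pop("ORF")
        rows.append((orf, {name: float(value) for name, value in row.items()}))
    return rows


rows = parse_tab_delimited(pasted)
for orf, values in rows:
    print(orf, values)
```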

  8. ExpressDB data preferences
     • Data that can be directly used for data mining
     • expression levels for an RNA in a condition, plus error estimates
     • “summary” data rather than raw data
     • data that is directly comparable across all experiments and RNA assays
     • common, normalized form for expression level
     No common form exists today, so ExpressDB is a ‘compromise’ that can handle any data form.

  9. Multiple data values

     ORF       t=0   10    20    30
     YAL001C   5     -3    5     3
     YAL001C   108   72    109   108

     • 2277 ORFs in ExpressDB have multiple data values
     • complicate database and queries

     Recommendations for improving comparability: Data form
     Publish summary versions of data that
     • Provide expression level data at functional RNA level
     • Use stable identifiers instead of names
     • database-of-record accession numbers (SGDID, UniGene…)
     • Exclude data values called as errors
     • Consolidate multiple non-erroneous values
     • Clearly document all measures and computations
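The "exclude errors, then consolidate" recommendation can be sketched as follows, using the two YAL001C rows from the slide. The slide does not say how values should be consolidated or which values are errors, so the per-time-point mean and the `is_error` hook below are placeholders, and the names `group_by_orf` and `consolidate` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Rows as on the slide: (ORF, values at t=0, 10, 20, 30).
rows = [
    ("YAL001C", [5, -3, 5, 3]),
    ("YAL001C", [108, 72, 109, 108]),
]


def group_by_orf(rows):
    """Collect every data series reported for each ORF."""
    grouped = defaultdict(list)
    for orf, values in rows:
        grouped[orf].append(values)
    return grouped


def consolidate(series_list, is_error=lambda series: False, combine=mean):
    """Drop series flagged as errors, then combine the rest per time point.

    Both `is_error` and `combine` are assumptions; a real rule depends on
    what each reported row actually measures.
    """
    kept = [s for s in series_list if not is_error(s)]
    return [combine(point) for point in zip(*kept)]


grouped = group_by_orf(rows)
duplicated = {orf for orf, series in grouped.items() if len(series) > 1}
consolidated = {orf: consolidate(series) for orf, series in grouped.items()}
print(duplicated)
print(consolidated)
```

With rows as discordant as these two, a plain mean is unlikely to be a meaningful summary, which illustrates why the slide asks data producers to do the consolidation before publishing.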

  10. Recommendations for improving comparability: Data content
     • Standardize microarray control conditions and move towards estimated relative abundances (ERAs) as the principal reported summary data
     • ERA = fractional abundance of RNA in total mRNA population in one condition (not a ‘fold change’ or ratio of 2 conditions)
     • investigate nucleic acid species quantifiable controls (genomic DNA, probe pools)
     • Verify that different RNA assays yield the same results on the same RNA samples, and develop calibrations if they don’t.
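The ERA definition above (an RNA's share of the total mRNA population within a single condition) amounts to dividing each RNA's abundance estimate by the sum over all RNAs in that condition. A minimal sketch, assuming per-RNA abundance estimates are already available; the function name and the input numbers are hypothetical and are not real data:

```python
def estimated_relative_abundances(levels):
    """Convert per-RNA abundance estimates for ONE condition into fractions
    of the total, so that the returned values sum to 1."""
    total = sum(levels.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {rna: level / total for rna, level in levels.items()}


# Hypothetical abundance estimates for a single condition (not real data).
condition = {"RNA_A": 30.0, "RNA_B": 10.0, "RNA_C": 60.0}
eras = estimated_relative_abundances(condition)
print(eras)
```

Unlike a fold change, this calculation involves only one condition, so ERAs from different experiments can in principle be compared directly.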

  11. Acknowledgements
     Pat Brown, Joe DeRisi, Mike Eisen, Vishy Iyer, Paul Spellman
     anonymous reviewers at Genome Research
     Lipper Foundation, HHMI, HMR, DOE
     George Church, Wayne Rindone, Barak Cohen, Pete Estep, Jason Hughes, Rob Mitra, Martin Steffen, Saeed Tavazoie, plus many other members of Church Lab
     Affymetrix
