ExpressDB is a web-based database designed to faithfully represent yeast RNA expression data, supporting multiple measurement methodologies with a flexible schema and cross-data set queries. It aims to provide a generalized 2D table with comprehensive data for research analysis.
ExpressDB mRNA expression database
John Aach, PhD
Lecturer, Department of Genetics
Harvard Medical School
Church Lab, Lipper Center for Computational Genetics
ExpressDB design goals
• faithfully represent available yeast RNA expression data
• organism-general version in testing
• flexible schema
• support multiple RNA level measurement methodologies
• web-based interface supporting cross data set queries
• potential for pipelining to applications (esp. clustering)
ExpressDB design
[Schema diagram. Entities shown: ORF Group, Gene Name, ORF Descrip, ORF, Expression Data Set, Measure, Expression Data Point Binary, Expression Data Point Character, Expression Data Point Decimal, Expression Data Point Floating Point, Expression Data Point Integer, SGD Info]
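The design stores data points in separate tables by value type (binary, character, decimal, floating point, integer). A minimal sketch of how a loader might pick the table for a value is shown here. The table names and the mapping from Python types to those tables are assumptions made for illustration; the slides do not give the actual table names or loading logic.

```python
from decimal import Decimal

# Hypothetical table names, one per typed data point entity in the diagram.
TYPED_TABLES = [
    (bytes, "edp_binary"),
    (str, "edp_character"),
    (Decimal, "edp_decimal"),
    (float, "edp_floating_point"),
    (int, "edp_integer"),
]


def table_for(value):
    """Return the (hypothetical) data point table that would hold this value."""
    # bool is a subclass of int in Python; reject it rather than store it silently.
    if isinstance(value, bool):
        raise TypeError("bool values are not supported in this sketch")
    for py_type, table in TYPED_TABLES:
        if isinstance(value, py_type):
            return table
    raise TypeError(f"no data point table for {type(value).__name__}")


print(table_for(Decimal("1.58")))
print(table_for(108))
```

Keeping one table per type lets each study's values be stored in the form they were published in, which fits the stated goal of representing the data faithfully.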
ExpressDB is a generalized 2D table

Individual researcher datasets

Dataset1
ORF      Glu   Gal
YAL003W  1.58  1.28
YAL005C  1.37  1.38
YAL007C  0.26  0.22
YAL008W  0.18  0.14
.........

Dataset2
ORF      G/R:0  G/R:0.5
YAL003W  0.82   3.87
YAL004W  0.73   1.45
YAL005C  0.63   1.17
YAL007C  0.90   1.64
.........

ExpressDB
EDS table: Dataset1, Dataset2, ....
Measure table: Glu, Gal, G/R:0, G/R:0.5, .....
ORF table: YAL003W, YAL004W, YAL005C, YAL007C, YAL008W, ....
EDP table (one cell per ORF and measure):

ORF      Glu   Gal   G/R:0  G/R:0.5
YAL003W  1.58  1.28  0.82   3.87
YAL004W              0.73   1.45
...
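The "generalized 2D table" idea can be sketched in a few lines: each researcher's table is flattened into (data set, measure, ORF, value) records, and those records can then be pivoted back into one wide table spanning all data sets. The values are the ones on the slide; the function names and the record layout are assumptions for illustration, not ExpressDB's actual schema.

```python
# Per-study tables as published: rows are ORFs, columns are measures.
dataset1 = {
    "YAL003W": {"Glu": 1.58, "Gal": 1.28},
    "YAL005C": {"Glu": 1.37, "Gal": 1.38},
    "YAL007C": {"Glu": 0.26, "Gal": 0.22},
    "YAL008W": {"Glu": 0.18, "Gal": 0.14},
}
dataset2 = {
    "YAL003W": {"G/R:0": 0.82, "G/R:0.5": 3.87},
    "YAL004W": {"G/R:0": 0.73, "G/R:0.5": 1.45},
    "YAL005C": {"G/R:0": 0.63, "G/R:0.5": 1.17},
    "YAL007C": {"G/R:0": 0.90, "G/R:0.5": 1.64},
}


def to_data_points(name, table):
    """Flatten one 2D table into (data set, measure, ORF, value) records."""
    return [
        (name, measure, orf, value)
        for orf, row in table.items()
        for measure, value in row.items()
    ]


def generalized_table(points):
    """Pivot records into one wide table keyed by ORF; missing cells are None."""
    columns = []
    for dataset, measure, _, _ in points:
        if (dataset, measure) not in columns:
            columns.append((dataset, measure))
    cells = {(orf, ds, m): value for ds, m, orf, value in points}
    orfs = sorted({orf for _, _, orf, _ in points})
    rows = {orf: [cells.get((orf, ds, m)) for ds, m in columns] for orf in orfs}
    return columns, rows


data_points = to_data_points("Dataset1", dataset1) + to_data_points("Dataset2", dataset2)
columns, wide = generalized_table(data_points)
for orf, row in wide.items():
    print(orf, row)
```

ORFs that appear in only one data set (YAL004W, YAL008W here) simply get empty cells under the other data set's measures.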
ExpressDB status
• Data from 11 studies using 3 different kinds of RNA assay
  • Velculescu 97, DeRisi 97, Cho 98, Roth 98, Chu 98, Holstege 98, Marton 98, Eisen 98, Spellman 98, Myers 99, Cohen 99
• Measure records: 2,503
• Data Point records: 17,515,209
  • 53.8% int, 25.6% dec, 16.6% char, 3.7% bin, 0.3% float
• approx. data size: 800.7 MB
  • 570.8 MB (data) + 229.9 MB (indices)
Generalized ExpressDB design
[Schema diagram. Entities shown: Organism, Feature Group, Feature Attribution, Feature, Expression Data Set, Measure, Expression Data Point Binary, Expression Data Point Character, Expression Data Point Decimal, Expression Data Point Floating Point, Expression Data Point Integer, Database xref]
• Support for multiple organisms
• Support for RNAs other than ORFs
• Generalized database xref via BIGED
Query application
• http://arep.med.harvard.edu/ExpressDB
• supports queries across multiple data sets
• produces tab-delimited output that can be saved or copy/pasted for further analysis
• Performance
  • Untuned, prototype system on development server
  • you’ll be happier if you restrict yourself to simple queries
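To make "queries across multiple data sets" with tab-delimited output concrete, here is a small sketch using an in-memory SQLite database. It is not the ExpressDB query application: the `data_point` table, its columns, and the query are assumptions for illustration, loaded with a few values from the 2D table slide.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table layout: one row per (data set, measure, ORF) value.
conn.execute(
    "CREATE TABLE data_point (dataset TEXT, measure TEXT, orf TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO data_point VALUES (?, ?, ?, ?)",
    [
        ("Dataset1", "Glu", "YAL003W", 1.58),
        ("Dataset1", "Glu", "YAL005C", 1.37),
        ("Dataset1", "Glu", "YAL007C", 0.26),
        ("Dataset2", "G/R:0.5", "YAL003W", 3.87),
        ("Dataset2", "G/R:0.5", "YAL004W", 1.45),
        ("Dataset2", "G/R:0.5", "YAL005C", 1.17),
        ("Dataset2", "G/R:0.5", "YAL007C", 1.64),
    ],
)

# Cross data set query: ORFs with Glu > 1.0 in Dataset1, shown next to
# their G/R:0.5 value from Dataset2.
query = """
SELECT a.orf, a.value, b.value
FROM data_point AS a
JOIN data_point AS b ON a.orf = b.orf
WHERE a.dataset = 'Dataset1' AND a.measure = 'Glu'
  AND b.dataset = 'Dataset2' AND b.measure = 'G/R:0.5'
  AND a.value > 1.0
ORDER BY a.orf
"""
rows = conn.execute(query).fetchall()

# Write the result as tab-delimited text, ready to save or paste elsewhere.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["ORF", "Glu", "G/R:0.5"])
writer.writerows(rows)
tsv = buffer.getvalue()
print(tsv)
```

The self-join on ORF is what lets one row of output combine measures from two different studies.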
ExpressDB data preferences
• Data that can be directly used for data mining
• expression levels for an RNA in a condition, plus error estimates
• “summary” data rather than raw data
• data that is directly comparable across all experiments and RNA assays
• common, normalized form for expression level
No common form exists today, so ExpressDB is a ‘compromise’ that can handle any data form.
Multiple data values

ORF      t=0   10   20   30
YAL001C    5   -3    5    3
YAL001C  108   72  109  108

• 2277 ORFs in ExpressDB have multiple data values
• complicate database and queries

Recommendations for improving comparability: Data form
Publish summary versions of data that
• Provide expression level data at functional RNA level
• Use stable identifiers instead of names
• database-of-record accession numbers (SGDID, UniGene…)
• Exclude data values called as errors
• Consolidate multiple non-erroneous values
• Clearly document all measures and computations
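The recommendations to exclude values called as errors and to consolidate multiple non-erroneous values could be sketched as follows, using the two YAL001C rows from the slide. The slides do not say how values should be consolidated or how errors are flagged, so averaging and the caller-supplied `is_error` predicate are both assumptions made here for illustration.

```python
from collections import defaultdict


def consolidate(rows, is_error=lambda value: False):
    """Merge repeated rows per ORF into one row of per-condition means.

    rows: iterable of (orf, [value per condition]).
    is_error: hypothetical predicate marking values to exclude.
    A condition with no usable values is reported as None.
    """
    grouped = defaultdict(list)
    for orf, values in rows:
        grouped[orf].append(values)

    result = {}
    for orf, series in grouped.items():
        merged = []
        for column in zip(*series):  # all values for one condition
            kept = [v for v in column if not is_error(v)]
            merged.append(sum(kept) / len(kept) if kept else None)
        result[orf] = merged
    return result


rows = [
    ("YAL001C", [5, -3, 5, 3]),
    ("YAL001C", [108, 72, 109, 108]),
]
print(consolidate(rows))
```

With rows as different as these two, an average may not be meaningful, which is one reason the slide also asks publishers to document all measures and computations clearly.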
Recommendations for improving comparability: Data content
• Standardize microarray control conditions and move towards estimated relative abundances (ERAs) as the principal reported summary data
• ERA = fractional abundance of RNA in total mRNA population in one condition (not a ‘fold change’ or ratio of 2 conditions)
• investigate nucleic acid species quantifiable controls (genomic DNA, probe pools)
• Verify that different RNA assays yield the same results on the same RNA samples, and develop calibrations if they don’t.
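The ERA definition above amounts to dividing each RNA's abundance by the total over all RNAs measured in the same condition. A minimal sketch, with made-up abundance numbers chosen only for illustration (they are not from any of the studies), and with a fold change shown only to mark the contrast:

```python
def estimated_relative_abundances(abundances):
    """Return each RNA's fraction of the total abundance in ONE condition."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {rna: level / total for rna, level in abundances.items()}


def fold_change(level_a, level_b):
    """Ratio of one RNA's level in two conditions; this is NOT an ERA."""
    return level_a / level_b


# Hypothetical abundance estimates for three RNAs in a single condition.
condition = {"YAL003W": 30.0, "YAL005C": 60.0, "YAL007C": 10.0}
eras = estimated_relative_abundances(condition)
print(eras)
```

Because ERAs within one condition sum to 1, they can be compared across experiments without reference to a second condition, which a ratio cannot offer.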
Acknowledgements
Pat Brown, Joe DeRisi, Mike Eisen, Vishy Iyer, Paul Spellman, anonymous reviewers at Genome Research, Lipper Foundation, HHMI, HMR, DOE, George Church, Wayne Rindone, Barak Cohen, Pete Estep, Jason Hughes, Rob Mitra, Martin Steffen, Saeed Tavazoie, plus many other members of Church Lab, Affymetrix