BarleyBase: BarleyBase.org. BARLEYBASE – A MIAME-COMPLIANT EXPRESSION PROFILING DATABASE FOR PLANTS Lishuang Shen, Jian Gong, Jianqiang Xin, Xiaoyun Tang, Rico A. Caldo, Stacy Turner, Dan Nettleton, Roger P. Wise, Julie A. Dickerson*
Virtual Reality Applications Center, Iowa State University, Ames, Iowa 50011

BarleyExpress - Web-Based Submission
• BarleyExpress is a MIAME-compliant microarray submission and annotation tool adapted from MIAMExpress.
• Submitters first input experiment design information.
• Annotate the experiment as a factorial design with factors and factor levels.
• Batch upload raw GeneChip data files.
• Associate raw data files with each studied treatment.
• Protocol submission (optional).
• Input sample preparation details for each hybridization. Use templates to reuse previous sample submissions.
• Finalize experiment submission.
• Submitters grant access to designated individuals and groups.
• Plant ontology and controlled vocabulary are enforced at each step.

Abstract
BarleyBase (www.BarleyBase.org) is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix Barley1 and Arabidopsis ATH1 GeneChips, plus experiment and sample annotation, and it is expanding to other plant microarray platforms. BarleyBase features a web-based, MIAME-compliant experiment submission tool, BarleyExpress. BarleyExpress allows users to efficiently submit and manage their experiment descriptions, array design and expression analysis information. BarleyBase offers a broad set of query and display options at all data levels, from experiment and hybridization to probe set and probe. Users can run cross-experiment queries on probe sets by expression profile and by biological information. Probe set queries are seamlessly integrated with visualization and analysis tools such as scatter plots, the R statistical toolbox, and data filters.
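The factorial annotation step in the BarleyExpress submission workflow above (factors, factor levels, and raw data files associated with each treatment) can be sketched as follows. This is only an illustration, not BarleyExpress code: the factor names, levels, file names, and the helper `enumerate_treatments` are all hypothetical.

```python
from itertools import product


def enumerate_treatments(factors):
    """Return one dict per treatment, i.e. per combination of factor levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors and factor levels, for illustration only.
factors = {
    "genotype": ["wild type", "mutant"],
    "hours_after_inoculation": [0, 8, 16],
}

treatments = enumerate_treatments(factors)

# Associate hypothetical raw data file names with each treatment
# (two replicate files per treatment in this sketch).
files_by_treatment = {}
for index, treatment in enumerate(treatments, start=1):
    key = tuple(treatment.items())
    files_by_treatment[key] = [
        f"treatment{index}_rep{rep}.CEL" for rep in (1, 2)
    ]

for key, files in files_by_treatment.items():
    print(dict(key), files)
```

A full factorial design yields one treatment per combination of levels, so the number of treatments is the product of the number of levels of each factor.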
BarleyBase collaborates with PlantGDB, Gramene and GrainGenes to perform gene prediction and cross-species comparison with Barley1 GeneChip exemplar sequences. NASCArrays shares ATH1 data. As of August 31, 2004, BarleyBase houses 20 experiment submissions from barley and Arabidopsis, with a total of 741 hybridizations.

Figure 4. Expression & Annotation for Exemplar Barley1_11969

Visualization & Analysis
• Web-based microarray data analysis pipelines integrate a broad set of probe set query and display options with analysis tools.
• Interactive visualization at all data levels for experiments, hybridizations, probe sets, and probes.
• Gene list creation with cross-experiment and cross-platform probe set queries for generating hypotheses about genes of interest.
• Identification of differentially expressed and co-expressed genes with multiple statistical tests and expression profile filters.
• Pattern recognition on gene lists; methods include hierarchical clustering, k-means partitioning, PCA, SOM, and multi-dimensional scaling (MDS).
• Gene list classification by Gene Ontology.
• Data analysis & visualizations use R and Bioconductor.
• Probe alignments with exemplar sequence.
• Gene prediction through interconnections with the PlantGDB database.
• Cross-species comparative genomics through the Gramene and GrainGenes databases.

[Diagram: BarleyBase Data Processing Pipeline. Labels: Batch Download, MAGE-ML, Raw Data, CSV, BarleyExpress, Query & Analysis, MAS5.0, RMA]

Figure 2. Major Steps in Experiment Submission

Data Access
• Batch download complete data sets for experiment annotation, raw and normalized expression data in MAGE-ML, comma-separated values (CSV), or CEL-file formats.
• Navigate experiment, hybridization, sample data, exemplars.
• Gene list creation & management for gene-centric analysis.
• Access probe sets based on expression profiles with single- or cross-experiment query.
• Search genes by biological criteria: annotation, sequence, gene ontology category, pathway, gene family membership.
• Flexible, submitter-controlled data access, with group access to private submissions.

Figure 1. BarleyBase Overview [diagram labels: Internet, User]

Data Acquisition & Processing
• Experiment and raw expression data are submitted by the submitter.
• BarleyBase normalizes submitted raw data with the Affymetrix MAS 5 statistical algorithm and Robust Multi-Array Analysis (RMA).
• Compute summary statistics and graphs for raw and normalized expression data.
• Store all types of data in an open-source MySQL database.
• BarleyBase assigns unique accession numbers to experiments, hybridizations & samples.
• BarleyBase generates MAGE-ML and CSV files for batch download and data exchange.
• Submission and associated data are available for online access and analysis.

Figure 5. Visualization for Hybridizations & Gene Cluster

Future Plans
• Evolve into PlExDB, a comprehensive Plant Expression Data Base.
• Support other major plant species: maize, rice, soybean, wheat.
• Support spotted cDNA and long-oligo microarray platforms.
• Analysis & visualization tool development.
• Cross-experiment, cross-platform & cross-species data analysis.
• Exemplar annotation with Gene Ontology and pathway information.

BarleyBase Data Model
• BarleyBase uses a hierarchical data model to store microarray gene expression data.
• The top-level data structure is the experiment. Each experiment contains one or more treatments, each treatment has one or more samples as replicates, and each sample has one or more hybridizations.
• Protocols are associated with an experiment at the hybridization level.
• Five table types: Array, Expression, Experiment, Protocol, Submitter.
• Follows the MIAME principles recommended by MGED and implemented in MIAMExpress, tuned for plants and with the Extract level removed.
• Adds fields for the factors of a statistical factorial experimental design.
• Enforces plant ontology and controlled vocabulary in the experiment description.
• Biological annotation for probe sets and exemplars with Gene Ontology.
• Supports expression data from Affymetrix GeneChips; spotted microarray support will be added.

Acknowledgments
• The BarleyBase project is funded by the USDA National Research Initiative (NRI) grant no. 02-35300-12619 and the USDA-CSREES North American Barley Genome Project.
• PlantGDB, Gramene, GrainGenes, KEGG, and TAIR share tools and genomic data.
• NASCArrays and TAIR share Arabidopsis ATH1 GeneChip data.
• BarleyBase is hosted at the Iowa State University Virtual Reality Applications Center.
• Exemplar sequences and BLASTX NR annotations were provided by HarvEST:Barley.

Figure 3. Gene List Creation, Management & Analysis
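
The hierarchy described under BarleyBase Data Model (an experiment contains treatments, a treatment contains replicate samples, a sample contains hybridizations) can be illustrated with a small sketch. The poster does not show the actual MySQL schema, so the class names, field names, accession strings, and file names below are hypothetical and serve only to make the nesting concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hybridization:
    accession: str        # hypothetical accession format
    raw_data_file: str    # hypothetical CEL file name


@dataclass
class Sample:
    accession: str
    hybridizations: List[Hybridization] = field(default_factory=list)


@dataclass
class Treatment:
    # One level per factor of the factorial design.
    factor_levels: Dict[str, str]
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Experiment:
    accession: str
    treatments: List[Treatment] = field(default_factory=list)

    def count_hybridizations(self) -> int:
        """Walk experiment -> treatment -> sample -> hybridization."""
        return sum(
            len(sample.hybridizations)
            for treatment in self.treatments
            for sample in treatment.samples
        )


# Build a tiny example: one treatment with two replicate samples,
# each hybridized once.
experiment = Experiment(
    accession="EXP-0001",
    treatments=[
        Treatment(
            factor_levels={"genotype": "wild type"},
            samples=[
                Sample("SAM-0001", [Hybridization("HYB-0001", "rep1.CEL")]),
                Sample("SAM-0002", [Hybridization("HYB-0002", "rep2.CEL")]),
            ],
        )
    ],
)

print(experiment.count_hybridizations())
```

Keeping hybridizations as the lowest level mirrors the poster's statement that the Extract level of MIAMExpress is removed; in a relational database each class would typically correspond to a table linked to its parent by a foreign key.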