Project Plan-Objective 2 • Taner Sen & Mary Schaeffer • Create tools to enhance access to expanded datasets that reveal gene function and datasets for genetic and breeding analyses.
Project Plan-Objective 2a • Goal 2a. Enable researchers to access high-quality functional descriptions for maize gene products by documenting their potential involvement in particular biochemical and metabolic pathways.
The Need: Interactions -> Networks -> Phenotypes • Interactions between the products of these genes? • [Figure: Gene -> ? -> Phenotype diagram; genetic interactions among Gene X, Gene Y, and Gene Z on a chromosome; biochemical interactions (Sharma et al. 2011)]
Building a metabolic network • Workflow: Computational Enzymatic Function Assignment -> Pathway Assignment (Pathway Tools) -> Manual Curation • Enzymatic Function Assignment (current resources): • MaizeCyc (Gramene in collaboration with MaizeGDB): Ensembl xref pipeline (sequence similarity) • CornCyc (PMN in collaboration with MaizeGDB): averaged weighted integration algorithm based on BLAST / CatFam / Priam results • BLAST: e-value cutoff <= 1e-30, subset of SwissProt 15.3 • CatFam: version 2.0, 1% FDR profile library • Priam: Release November 2010
Comparing Enzyme Function Assignment Performance of MaizeCyc and CornCyc • A set of 177 UniProt proteins with experimental evidence and alignment to B73 RefGen_v2 FGS protein models • Definitions to measure performance: • TP (True Positive): EC match for a given gene model • FP (False Positive): EC mismatch for a given gene model • FN (False Negative): missing EC for a given gene model • TN (True Negative): missing gene model-EC pairs • In collaboration with PMN
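The TP / FP / FN definitions above can be illustrated with a minimal Python sketch. It assumes one EC number per gene model, which is a simplification and not necessarily how the comparison was actually run. The gene model IDs (`gm1` to `gm4`), the toy EC assignments, and the function name `classify_assignments` are hypothetical and used for illustration only.

```python
def classify_assignments(reference, predicted):
    """Count TP, FP, and FN for EC assignments (assumes one EC per gene model)."""
    counts = {"TP": 0, "FP": 0, "FN": 0}
    for gene_model, true_ec in reference.items():
        predicted_ec = predicted.get(gene_model)
        if predicted_ec is None:
            counts["FN"] += 1   # missing EC for this gene model
        elif predicted_ec == true_ec:
            counts["TP"] += 1   # EC match
        else:
            counts["FP"] += 1   # EC mismatch
    return counts


# Hypothetical toy data: experimentally supported ECs vs. database assignments.
reference = {"gm1": "1.1.1.1", "gm2": "2.7.1.1", "gm3": "4.2.1.11", "gm4": "5.3.1.9"}
predicted = {"gm1": "1.1.1.1", "gm2": "2.7.1.2", "gm4": "5.3.1.9"}

counts = classify_assignments(reference, predicted)
print(counts)
```

In this toy set, `gm2` is a mismatch (FP) and `gm3` has no assignment (FN).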
Performance Measures • MaizeCyc: Precision = 0.711, Recall = 0.245, F = 0.365 • CornCyc: Precision = 0.873, Recall = 0.913, F = 0.893 • Precision (P) = TP / (TP + FP) • Recall (R) = TP / (TP + FN) • F-measure = 2 * (P * R) / (P + R) • In collaboration with PMN
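The three formulas on this slide translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the zero-denominator guards are an added assumption that the slide does not specify.

```python
def precision(tp, fp):
    """Precision (P) = TP / (TP + FP); returns 0.0 when nothing was predicted."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall (R) = TP / (TP + FN); returns 0.0 when there are no positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """F-measure = 2 * (P * R) / (P + R), the harmonic mean of P and R."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Recompute F from the rounded precision and recall values reported on the slide.
maizecyc_f = f_measure(0.711, 0.245)
corncyc_f = f_measure(0.873, 0.913)
print(round(maizecyc_f, 3), round(corncyc_f, 3))
```

Because the inputs here are the already-rounded P and R values, the recomputed F can differ from the reported one in the third decimal place. For MaizeCyc it comes out near 0.364, against the reported 0.365.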
MaizeCyc vs. CornCyc • MaizeCyc and CornCyc are complementary resources • MaizeCyc: A higher coverage resource in assigned pathways • CornCyc: A higher stringency resource, which includes splice variants
Metabolic Pathways Curation • Project Plan - Columbia • Sub-objective 4.2: Curate maize metabolism and pathways data for release as a BioCyc database and as GO annotation files • Goal 4.2: Annotate maize genome sequence with critical, experimentally confirmed gene function.
Metabolism curation workflow • [Diagram; labels: Experimental Evidence, MetaCyc, MaizeCyc, CornCyc, PMN, MaizeGDB, GO annotation, GO, UniProt]
Project Plan / Goal 2a / 5-year • Gramene will move MaizeCyc to iPlant and will not update it (their focus will shift to Reactome). MaizeGDB will maintain the MaizeCyc instance for B73 RefGen_v2 as is • Plant Metabolic Network is funded through NSF to support CornCyc at least until August 2015. MaizeGDB will consult with PMN on how best to support CornCyc and whether to continue with CornCyc after August 2015 • Coordinate with PMN to enable curators to perform pathway and functional annotations on CornCyc • Maintain and update CornCyc database instances at MaizeGDB • Make the CornCyc database available in BioPAX and Bioconductor-compatible formats
Project Plan-Objective 2b • Goal 2b. Develop and deploy network-based data access and analysis tools that support predictive biological investigations routinely pursued by basic biologists and breeders.
Challenges for network analysis and visualization • Analysis and visualization bottlenecks created by “Big Data” are inherently complex • They require integration of heterogeneous and incomplete data with varying numbers of false positives • Physical, genetic, and expression networks • Breeding: pedigree and similarity relationships between lines (e.g., there is a real need for maize researchers to learn how pedigree-based similarities correspond to SNP-based similarities)
Project Plan / Goal 2b / 5-year • Determine which software tools and which network and pedigree data are available for implementation and deployment, to be used in a survey (Year 1) • Interaction networks, breeding similarity networks, pedigree visualization software • Some examples of visualization tools: Cytoscape Web, CurlyWhirly, FlapJack, GeneaQuilts • Survey maize researchers, including basic scientists and breeders, to determine their needs for network software, data analysis, and visualization (Years 2 & 4) • Deploy visualization tools for displaying interaction and pedigree datasets (Years 3 & 5)
A proof-of-concept Pedigree Visualization • Curating data into machine-readable form is a huge challenge • Pedigree data are from Gerdes et al. from a small number of states (9) • Nodes: inbred lines, Edges: relationship between inbred lines
A Cytoscape Web Representation • Stiff Stalk lines with expired Plant Variety Protection (from Mikel et al. 2006)
Enriching interface functionality • Menu items can be created to choose different visualization options (e.g., color according to states, calculate and display the shortest distance between inbred lines in the network) • The Cytoscape Web visualization tool can be extended to display SNP-based relationships • Other visualization software can be used to compare various networks & incorporate heterogeneous data onto these network representations
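As an illustration of the "shortest distance between inbred lines" option, the sketch below models a pedigree network with inbred lines as nodes and relationships as edges, then finds a shortest path by breadth-first search. It is a minimal sketch, not the deployed tool's implementation. The line names (`LineA` to `LineE`) and edges are hypothetical, and treating relationships as undirected and unweighted is an assumption.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (line, line) relationship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal via BFS, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start node
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical pedigree relationships between inbred lines.
edges = [("LineA", "LineB"), ("LineB", "LineC"), ("LineC", "LineD"),
         ("LineA", "LineE"), ("LineE", "LineD")]
graph = build_graph(edges)
path = shortest_path(graph, "LineA", "LineD")
print(path, "distance:", len(path) - 1)
```

The distance is the number of edges on the path. A weighted variant (for example, weighting edges by SNP-based similarity) would need Dijkstra's algorithm instead of plain BFS.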
Project Plan-Objective 2 • Create tools to enhance access to expanded datasets that reveal gene function and datasets for genetic and breeding analyses. • Goal 2a. Enable researchers to access high-quality functional descriptions for maize gene products by documenting their potential involvement in particular biochemical and metabolic pathways. • Goal 2b. Develop and deploy network-based data access and analysis tools that support predictive biological investigations routinely pursued by basic biologists and breeders.