A New Approach to Searching Omics Databases
The Strategies Web Development Kit (WDK)
Steve Fischer, The EuPathDB Project
www.gusdb.org/wdk
The EuPathDB Genome Resources
• 21 annotated eukaryotic pathogen species (parasites)
• 6 sites organized by genus
• A EuPathDB portal that aggregates all data
• Sample research applications:
  • Find malaria drug targets
  • Find sleeping sickness vaccine targets
  • Find Toxoplasma proteins that transit to the host
Examples of Omics Data We Integrate (beyond standard genome annotation)
• RNA sequencing
• Protein expression
• Transcript expression (multiple platforms)
• ChIP-chip
• SNPs
• Isolates
Genomics data is highly dimensional
[Figure: diagram of interrelated data types, including annotated gene, gene structure, gene qualities, genome location, sequence, ORF, transcript assembly, transcript expression, expression level, differential expression, SAGE tag, library, protein expression, protein structure, protein qualities, protein interactions, cellular location, BLAST similarity, motif similarity, GO annotation, E.C. annotation, pathways, ortholog group, phyletic pattern, SNPs, allele frequency, isolate assay, isolate comparison, phenotype]
90 Searches on PlasmoDB
• 51 gene searches
• 39 searches for other data types
• ~250 parameters in total
Challenge: help users ask hard questions of the integrated data to find targeted sets of genes
Solution: the Strategies search interface
• Induces users to move away from one-click searching toward seamless use of Boolean set operations
• Makes “advanced” easy
• Portable to any omics database
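The Boolean set operations a strategy relies on can be sketched in a few lines. This is a minimal illustration, assuming each search step returns a set of gene IDs. The operator names, the `combine` helper, and the gene IDs (`G1`, `G2`, and so on) are hypothetical placeholders and are not WDK identifiers or real PlasmoDB records.

```python
from typing import Callable, Dict, FrozenSet

GeneSet = FrozenSet[str]

# Operators a user might choose when adding a step to a strategy.
# These names are illustrative only, not the WDK's own identifiers.
OPERATORS: Dict[str, Callable[[GeneSet, GeneSet], GeneSet]] = {
    "UNION": lambda a, b: a | b,      # genes in either result
    "INTERSECT": lambda a, b: a & b,  # genes in both results
    "MINUS": lambda a, b: a - b,      # genes in the left result only
}


def combine(left: GeneSet, op: str, right: GeneSet) -> GeneSet:
    """Combine the strategy's running result (left) with a new step (right)."""
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op}")
    return OPERATORS[op](left, right)


# Hypothetical search results (placeholder gene IDs).
kinases = frozenset({"G1", "G2", "G3", "G4"})
signal_peptide = frozenset({"G2", "G4", "G5"})
transmembrane = frozenset({"G3", "G6"})
already_studied = frozenset({"G3"})

exposed = combine(signal_peptide, "UNION", transmembrane)
candidates = combine(kinases, "INTERSECT", exposed)
novel = combine(candidates, "MINUS", already_studied)
print(sorted(candidates))  # ['G2', 'G3', 'G4']
print(sorted(novel))       # ['G2', 'G4']
```

Each step takes the previous result as its left operand. That is what lets a one-click search grow into a multi-step strategy without any query syntax.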
Live demo: build the strategy shown above
• Find parasite kinases that are likely exposed to the host
More Complex Malaria Drug Targets Strategy
This strategy has been shared; the URL is in the Abstract.
• Enzymes: union of EC number and GO term searches
• Two expression experiments showing expression in the trophozoite life stage
• Mass spec trophozoite experiment
• Phylogenetic profile indicating presence in the parasite but not the host (mammals)
• Under purifying selection: essential to the parasite
• Ortholog transform: transform the P. falciparum set into all Plasmodium species
• Save and share
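The drug-target strategy can be modeled as a pipeline: a union, a chain of intersections, and a final ortholog transform. The sketch below rests on three assumptions that the slide does not state:

- Every criterion other than the EC/GO union is combined by intersection.
- The ortholog transform returns all members of each matched ortholog group, including the original P. falciparum genes.
- All gene IDs, ortholog group IDs, and step results are hypothetical placeholders, not real data.

```python
from typing import Dict, FrozenSet, Set

GeneSet = FrozenSet[str]


def union(*sets: GeneSet) -> GeneSet:
    """OR: genes returned by any of the searches."""
    out: Set[str] = set()
    for s in sets:
        out |= s
    return frozenset(out)


def intersect(first: GeneSet, *rest: GeneSet) -> GeneSet:
    """AND: genes returned by every search."""
    out = set(first)
    for s in rest:
        out &= s
    return frozenset(out)


def ortholog_transform(genes: GeneSet, group_of: Dict[str, str]) -> GeneSet:
    """Map genes to every member (any species) of their ortholog groups."""
    groups = {group_of[g] for g in genes if g in group_of}
    return frozenset(g for g, grp in group_of.items() if grp in groups)


# Hypothetical P. falciparum step results (placeholder IDs, not real data).
ec_enzymes = frozenset({"PF_A", "PF_B"})
go_enzymes = frozenset({"PF_B", "PF_C"})
expr_experiment_1 = frozenset({"PF_A", "PF_B", "PF_C", "PF_D"})
expr_experiment_2 = frozenset({"PF_A", "PF_B", "PF_C"})
mass_spec = frozenset({"PF_A", "PF_B", "PF_D"})
parasite_not_mammal = frozenset({"PF_A", "PF_B", "PF_C"})
purifying_selection = frozenset({"PF_A", "PF_B"})

# Hypothetical ortholog groups spanning three Plasmodium species.
group_of = {
    "PF_A": "OG1", "PV_A": "OG1", "PB_A": "OG1",
    "PF_B": "OG2", "PV_B": "OG2",
    "PF_C": "OG3", "PF_D": "OG4",
}

enzymes = union(ec_enzymes, go_enzymes)
pf_targets = intersect(enzymes, expr_experiment_1, expr_experiment_2,
                       mass_spec, parasite_not_mammal, purifying_selection)
all_species = ortholog_transform(pf_targets, group_of)
print(sorted(pf_targets))   # ['PF_A', 'PF_B']
print(sorted(all_species))  # ['PB_A', 'PF_A', 'PF_B', 'PV_A', 'PV_B']
```

The transform differs from the Boolean steps in one respect. It changes which genes are in the set (P. falciparum genes become genes from all species) rather than filtering an existing set.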
Strategies WDK Architecture
• Runs on any relational database
• Model-View-Controller design
• Model
  • Configured in XML
  • Abstracts records and searches
  • Specifies columns
  • Arbitrary data sources (e.g., BLAST) via web services
• View
  • JSP and CSS
  • (JavaScript and Ajax)
• Controller
  • Struts
[Figure: architecture diagram with components View (web), Model, Controller, Genomics Database, Genomics Data, WDK Model (XML), WDK Model (Java objects), WDK Engine, WDK Query Engine (Java), Query Cache, Web Services Framework, Processes (e.g., BLAST), JavaBeans (JSP compatible), JSP Tag Library, JSP and CSS, Struts controller, User Login and Search History, WDK Sanity Test; a legend marks which components you provide and which WDK provides]
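The architecture diagram places a Query Cache inside the WDK Engine, but the slides do not say how it works. The sketch below is one plausible design, assuming the cache memoizes a search's result keyed by the search name and its parameter values. A strategy would benefit from such a cache because revising one step can reuse the cached results of the other steps. The class name, the `fake_search` stand-in, and the `GenesByEcNumber` search name are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Mapping, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]
SearchFn = Callable[[str, Mapping[str, str]], FrozenSet[str]]


class QueryCache:
    """Memoizes search results keyed by search name and parameter values.

    Illustrative sketch only; not the actual WDK Query Cache design.
    """

    def __init__(self, run_search: SearchFn) -> None:
        self._run_search = run_search
        self._results: Dict[CacheKey, FrozenSet[str]] = {}
        self.misses = 0

    @staticmethod
    def _key(name: str, params: Mapping[str, str]) -> CacheKey:
        # Sort parameters so that the same values in any order share one entry.
        return name, tuple(sorted(params.items()))

    def get(self, name: str, params: Mapping[str, str]) -> FrozenSet[str]:
        key = self._key(name, params)
        if key not in self._results:
            self.misses += 1  # only run the underlying search on a miss
            self._results[key] = self._run_search(name, params)
        return self._results[key]


def fake_search(name: str, params: Mapping[str, str]) -> FrozenSet[str]:
    # Stand-in for a SQL query or an external process such as BLAST.
    return frozenset({f"{name}:{params['organism']}"})


cache = QueryCache(fake_search)
r1 = cache.get("GenesByEcNumber", {"organism": "P. falciparum", "ec": "2.7.11.1"})
r2 = cache.get("GenesByEcNumber", {"ec": "2.7.11.1", "organism": "P. falciparum"})
print(cache.misses)  # 1
```

Canonicalizing the parameters is the key design choice. Without it, the same search submitted with its parameters in a different order would miss the cache.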
User-Driven Development
• Computer-human interaction (CHI) studies
  • During prototyping
  • Video and audio capture of workshop participants doing exercises
  • Drove the design and showed high user enthusiasm
• User feedback has been very positive
• Usage statistics
  • Show a 3-fold increase in use of Boolean operations (in comparable two-month periods)
Upcoming Features
• Genes Basket (delivered 1/7/10)
  • Cherry-pick genes
  • Generate reports from the basket
  • Send to a postprocessing tool (e.g., MSA)
  • Add as a step in a strategy (e.g., subtract known genes)
• Web services (delivered 1/7/10)
  • Run searches via RESTful web services
• Weighted searches
  • Assign weights to searches for increased filtering discrimination
  • For example, weight EST data more heavily than SAGE data
• Span logic
  • Transform a set of one type into another type based on genome span relations
  • For example, find all ESTs that overlap with the set of genes I found
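Weighted searches and span logic each reduce to a simple operation. The sketch below makes two assumptions the slides do not state:

- A weighted combination scores each gene by the summed weights of the searches that returned it.
- Span logic is an interval-overlap test on the same chromosome, using 1-based inclusive coordinates.

All function names, feature IDs, and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Mapping


def weighted_union(results: Mapping[str, FrozenSet[str]],
                   weights: Mapping[str, float]) -> Dict[str, float]:
    """Score each gene by the summed weights of the searches that returned it."""
    scores: Dict[str, float] = {}
    for search, genes in results.items():
        for gene in genes:
            scores[gene] = scores.get(gene, 0.0) + weights[search]
    return scores


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # inclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def span_transform(genes: List[Feature], ests: List[Feature]) -> List[str]:
    """Transform a gene set into the ESTs that overlap at least one gene."""
    return [e.name for e in ests if any(overlaps(e, g) for g in genes)]


# Weighted searches: EST evidence counts twice as much as SAGE evidence here.
results = {"est": frozenset({"G1", "G2"}), "sage": frozenset({"G2", "G3"})}
scores = weighted_union(results, {"est": 2.0, "sage": 1.0})
ranked = sorted(scores, key=lambda g: (-scores[g], g))
print(ranked)  # ['G2', 'G1', 'G3']

# Span logic: find ESTs overlapping a (hypothetical) gene set.
genes = [Feature("G1", "chr1", 100, 500), Feature("G2", "chr2", 50, 80)]
ests = [Feature("E1", "chr1", 450, 700),
        Feature("E2", "chr1", 600, 900),
        Feature("E3", "chr2", 10, 60)]
print(span_transform(genes, ests))  # ['E1', 'E3']
```

The overlap test checks each EST against every gene, which is fine for a sketch. A production implementation over whole genomes would more likely sort features or use an interval index.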
Quick Tour of the Template Site
• Simple demo site
• You can install it and use it as a template
• Has a simple model
• And a simple view
• Show: record page, searches, report maker
Quick Look at WDK Model
• Defined in XML
• Records
  • Like “objects,” but data only; no methods
  • Two data types
    • Attributes, e.g., gene name or gene location
    • Tables, e.g., gene xrefs or gene GO associations
  • Data acquired from the DB through SQL
• Searches
  • Return sets of records
  • Columns are attributes
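The model described above can be sketched end to end: an XML definition names the SQL for a record's attributes and tables, and a search returns a list of such records. The XML element names below are hypothetical and are not the real WDK model schema. SQLite stands in for the relational database, and the genes are placeholders (the two GO IDs are real terms for protein kinase activity and ATP binding).

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model fragment; element names are illustrative, NOT the WDK schema.
MODEL_XML = """
<recordClass name="Gene">
  <attributeQuery>SELECT source_id, name, location FROM genes WHERE source_id = ?</attributeQuery>
  <tableQuery name="GoTerms">SELECT go_id FROM gene_go WHERE source_id = ? ORDER BY go_id</tableQuery>
</recordClass>
"""


@dataclass
class Record:
    """Data only, no behavior: scalar attributes plus named tables of rows."""
    attributes: Dict[str, object]
    tables: Dict[str, List[tuple]] = field(default_factory=dict)


def load_record(conn: sqlite3.Connection, model: ET.Element, source_id: str) -> Record:
    # Attributes: one row whose column names become attribute names.
    cur = conn.execute(model.findtext("attributeQuery"), (source_id,))
    columns = [d[0] for d in cur.description]
    record = Record(attributes=dict(zip(columns, cur.fetchone())))
    # Tables: zero or more rows per record, one query per named table.
    for tq in model.findall("tableQuery"):
        record.tables[tq.get("name")] = conn.execute(tq.text, (source_id,)).fetchall()
    return record


def run_search(conn: sqlite3.Connection, model: ET.Element,
               search_sql: str, params: Tuple) -> List[Record]:
    """A search returns a set of records: it selects IDs, then loads each record."""
    ids = [row[0] for row in conn.execute(search_sql, params)]
    return [load_record(conn, model, i) for i in ids]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genes (source_id TEXT PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE gene_go (source_id TEXT, go_id TEXT);
INSERT INTO genes VALUES ('G1', 'kinase A', 'chr1:100-500'),
                         ('G2', 'transporter B', 'chr2:50-80');
INSERT INTO gene_go VALUES ('G1', 'GO:0005524'), ('G1', 'GO:0004672');
""")
model = ET.fromstring(MODEL_XML.strip())
records = run_search(conn, model,
                     "SELECT source_id FROM genes WHERE name LIKE ? ORDER BY source_id",
                     ("%kinase%",))
print(records[0].attributes["name"], records[0].tables["GoTerms"])
# kinase A [('GO:0004672',), ('GO:0005524',)]
```

Keeping the SQL in the model file, rather than in code, reflects the slide's point that records are pure data. A results page could then show each record's attributes as columns.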
Acknowledgements
The EuPathDB User Interface Team and Principal Investigators: Brian Brunk, Jerric Gao, Omar Harb, Charles Treatman, David Roos, Chris Stoeckert, Cristina Aurrecoechea, Mark Heiges, Cary Pennington, Eileen Kramer, Jessica Kissinger
• EuPathDB is an NIAID Bioinformatics Resource Center
• Supported by
  • NIAID Contract No. HHSN266200400037C
  • The Bill & Melinda Gates Foundation
Following slides are demo backup