caArray: Cancer Array Informatics
Open Source Tools for Microarray Data Management, Analysis and Annotation
• caArray overview & demo: Mervi Heiskanen (15 min)
• caArray architecture: Scott Gustafson (15 min)
• webCGH overview & demo: David Hall (15 min)
http://caarray.nci.nih.gov/
caArray Data Portal & Data Analysis Tools
• Data Portal: promotes data sharing through submission of original, raw data files with associated experiment and sample information.
• Data analysis and visualization tools:
  • webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC)
• caBIG tools:
  • caWorkbench - Columbia
  • DWD - UNC Lineberger
  • GenePattern - MIT/Broad ?
  • Magellan - UC San Francisco
  • VISDA - Georgetown
  • Cancer Molecular Pages - Burnham
  • Function Express - Wash U Siteman
  • GoMiner - NCI/CCR
caArray version 1.0
• Key features:
  • MIAME 1.1 compliant data annotation forms
  • Support for Affymetrix and GenePix native files
  • MAGE-ML import and export
  • Controlled vocabularies (MGED Ontology)
  • Access to data via the MAGE-OM API
• caArray installations:
  • The NCICB caArray instance supports NCI-funded programs.
  • Local installations at the cancer centers: caBIG-funded caArray adopters (Lombardi, Wistar, NYU)
caArray listservs: • caArray developers • caArray users • caArray team
caArray: Compliance with Standardization Efforts
• MIAME
  • Minimum Information About a Microarray Experiment
  • 1.1 Draft 6 (April 1, 2002)
  • http://www.mged.org/Workgroups/MIAME/miame_1.1.html
• MAGE-ML
  • MicroArray and GeneExpression Object Model and Markup Language
  • 1.1 (October 2003)
  • http://www.omg.org/docs/formal/03-10-01.pdf
• MGED Ontology
  • Microarray Gene Expression Data Ontology
  • 1.1.8 (April 2004)
  • http://mged.sourceforge.net/ontologies/MGEDontology.php
• caBIG compatibility guidelines
  • http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document
class TechnologyType
• namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
• documentation: The technology type or platform of the reporters on the array.
• type: primitive
• superclasses: ArrayDesignPackage
• used in classes: FeatureGroup
• used in individuals: in_situ_oligo_features, spotted_antibody_features, spotted_colony_features, spotted_ds_DNA_features, spotted_protein_features, spotted_ss_oligo_features

class CellLineDatabase
• namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
• documentation: Database of cell line information.
• type: primitive
• superclasses: Database
• used in classes: CellLine
• used in individuals: ATCC_Cultures, CABRI_Human_and_Animal_Cell_lines
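A controlled-vocabulary check of the kind these ontology classes make possible can be sketched as follows. This is an illustrative Python sketch, not caArray code: the `ALLOWED_TERMS` table and the `validate_term` helper are hypothetical names, and the term lists are simply the individuals listed for the two classes above.

```python
# Hypothetical in-memory copy of the individuals listed for two MGED Ontology classes.
ALLOWED_TERMS = {
    "TechnologyType": {
        "in_situ_oligo_features",
        "spotted_antibody_features",
        "spotted_colony_features",
        "spotted_ds_DNA_features",
        "spotted_protein_features",
        "spotted_ss_oligo_features",
    },
    "CellLineDatabase": {
        "ATCC_Cultures",
        "CABRI_Human_and_Animal_Cell_lines",
    },
}


def validate_term(ontology_class: str, value: str) -> bool:
    """Return True if value is a listed individual of ontology_class."""
    return value in ALLOWED_TERMS.get(ontology_class, set())
```

An annotation form could call `validate_term("TechnologyType", user_input)` before accepting a value, so that free-text entries never reach the database.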
caArray Phase 2
• caArray 1.2 (June 2005)
  • Support for additional file formats via a software toolkit
  • Public search without login
  • Copy bio sample information
• caArray 1.5 (September 2005)
  • XpressionWay, pathway visualization tool
  • Integration with caDSR 3.0
• caArray 1.7 (December 2005)
  • Store filtered and normalized data
  • User management user interface
• caArray 2.0 (March 2006)
  • Embedded MAGE-ML validation
All releases: defect fixes and usability enhancements
Acknowledgements
• NCICB/SAIC
  • Development team: Hangjiong Chen, Scott Gustafson, Juergen Lorenz, John Moy, Sumeet Muju, Beth Neuberger, Phu Tran, Jim Zhou
  • QA: Durga Addepalli, Andrew Shinohara, Ye Wu
• NCICB/TerpSys
  • Don Swan, Jamie Keller
• Research Triangle Institute
  • David Hall (webCGH)
• NCICB
  • Sue Dubman, Mervi Heiskanen, Xiaopeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel… and Ken Buetow
caArray Architecture
Credits to Sumeet Muju and Phu Tran
caArray Architecture (diagram). Components labeled in the figure:
• Clients: browser, FTP applet
• Tomcat web container: Struts, servlet, JSP, security
• EJB container: Vocab Mgr EJB, Security Mgr EJB, Protocol Mgr EJB, Experiment Mgr EJB, ArrayDesign Mgr EJB, other Mgr EJBs, MAGE Manager, Data Transfer Objects (DTO), Object Relational Bridge (OJB), MAGE-stk objects
• caCORE interface: caBIO, caDSR, EVS
• Import and upload: MAGE-ML and native data importer MDB, file uploader MDB, FTP staging area, file share
• Storage: caArray DB, NetCDF files (NetCDF API)
• MAGE-OM API: MAGE-OM JAR, RMI, MAGE-OM objects, MAGE-OM persistence manager
caArray Interfaces: caArray EJB API
• caArray EJB API: provides transaction control, asynchronous processing, service location, common security, and distributed capabilities for submission and retrieval of microarray experiments.
• The caArray presentation layer uses this functionality via the caArray EJB API.
• Data Transfer Objects (DTOs) carry data between the calling application and the EJBs.
• The APIs can be used for federated access and submission of transaction data.
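The DTO idea in the bullets above can be sketched in a few lines. This is a minimal Python illustration of the pattern, not the caArray Java API: `ExperimentDTO`, `ExperimentManager`, and their fields are hypothetical names chosen for the example, and the in-memory dictionary stands in for whatever persistence sits behind the EJBs.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExperimentDTO:
    """Hypothetical flat, serializable view of an experiment."""
    identifier: str
    name: str
    description: str


class ExperimentManager:
    """Hypothetical stand-in for a manager facade; keeps plain records in memory."""

    def __init__(self):
        self._rows = {}

    def submit(self, dto: ExperimentDTO) -> None:
        # The caller hands over a DTO; the facade copies its fields into storage.
        self._rows[dto.identifier] = asdict(dto)

    def retrieve(self, identifier: str) -> ExperimentDTO:
        # The facade returns a fresh DTO, never its internal record.
        return ExperimentDTO(**self._rows[identifier])
```

The point of the pattern is that callers only ever see immutable value objects, so the storage representation can change without affecting the presentation layer.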
caArray Interfaces: MAGE-OM API
• MAGE-OM API: provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API.
• The MAGE-OM API maps the MAGE objects to the new caArray database schema.
• An RMI security module is incorporated for user- and group-level data access.
• NetCDF API logic is incorporated for faster retrieval of data.
• Built to be grid enabled.
caArray Middleware
• Data Representation
  • Data Transfer Objects (DTO)
  • MicroArray Gene Expression Software Toolkit (MAGE-stk)
  • DTO - MAGE-stk Conversion
• Data Persistence
  • Data Access Layer
    • ObJectRelationalBridge (OJB)
    • OJB Abstraction Layer and Data Access Objects (DAO)
  • EJB Layer
    • Stateless Session Façade
    • Bean-managed Persistence
  • NetCDF Files
    • Large Data Set
    • Fast Binary Access
• MAGE-ML Import and Export
  • Message-Driven Beans
MAGE-ML Import and Export: An Example
<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1" lastName="Doe" firstName="John">
      </Person>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1" name="Sample Experiment">
        <Descriptions_assnlist>
          <Description text="This is a sample experiment."></Description>
        </Descriptions_assnlist>
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>
Callouts on the slide: "Identifiable element" (for example, Person) and "Referenced Identifiable element to be resolved" (for example, Person_ref).
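To make the reference-resolution idea concrete, here is a minimal Python sketch that parses the example document and pairs each `*_ref` element with the Identifiable element carrying the same identifier. It uses only the standard-library `xml.etree.ElementTree` parser rather than the MAGE-stk parser, and the `resolve_references` helper and its "tag ends with `_ref`" rule are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

MAGE_ML = """<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1" lastName="Doe" firstName="John"/>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1" name="Sample Experiment">
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>"""


def resolve_references(xml_text):
    """Map each reference's identifier to the element it points at (or None)."""
    root = ET.fromstring(xml_text)
    # Index every non-reference element that carries an identifier attribute.
    identifiable = {
        el.get("identifier"): el
        for el in root.iter()
        if el.get("identifier") and not el.tag.endswith("_ref")
    }
    # Look up each *_ref element in that index; unresolved references map to None.
    return {
        el.get("identifier"): identifiable.get(el.get("identifier"))
        for el in root.iter()
        if el.tag.endswith("_ref")
    }
```

Calling `resolve_references(MAGE_ML)` returns a dictionary whose single key is the Person identifier and whose value is the parsed `Person` element.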
MAGE-ML Import
• The importer is a modified version of the MAGE-stk's SAX-based MAGE-ML parser, extended with a persistence mechanism to insert, update, and resolve (look up) parsed objects.
• Any valid MAGE-ML can be imported. The importer assumes the MAGE-ML is valid; validation is typically done beforehand using ArrayExpress's MAGEValidator.
• Identifiable objects are first resolved from the database by matching their identifier. If a match is found, the existing object is updated from the incoming one.
• The identifier is the globally unique key of a MAGE object across domains for its entire lifecycle.
• The identifier is separate from the persisted MAGE-stk object's primary key, which is internal to caArray.
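The resolve-then-update behavior described above can be sketched as a small "upsert by identifier" routine. This is an illustrative Python sketch under stated assumptions, not the caArray persistence code: `IdentifiableStore`, its `save` method, and the integer `pk` counter are hypothetical, and a dictionary stands in for the database.

```python
import itertools


class IdentifiableStore:
    """Hypothetical in-memory store keyed by the global MAGE identifier."""

    def __init__(self):
        self._by_identifier = {}
        self._next_pk = itertools.count(1)  # internal primary keys, unrelated to identifiers

    def save(self, identifier, attributes):
        """Insert a new record, or update the existing one with the same identifier."""
        existing = self._by_identifier.get(identifier)
        if existing is None:
            record = {"pk": next(self._next_pk), "identifier": identifier, **attributes}
            self._by_identifier[identifier] = record
            return "inserted", record
        # Resolved: merge the incoming attributes into the stored record.
        existing.update(attributes)
        return "updated", existing
```

Importing the same Person twice therefore yields one stored record: the first call inserts it and assigns a primary key, and the second call updates it while the primary key and identifier stay unchanged.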
MAGE-ML Export
• The entire object graph of an object (for example, an ArrayDesign or Experiment) is traversed to collect all Identifiable objects.
• The MAGE-stk's MAGEJava object serves as the container for the collected Identifiable objects.
• When an Identifiable object is encountered, the appropriate method on the MAGEJava object is discovered and invoked using reflection to store the object in it.
• Finally, MAGEJava.writeMAGEML(Writer) is invoked, which recursively invokes the same method on all the contained Identifiable objects.
• Xerces's XMLSerializer pretty-prints the XML content as it is written, adding new lines and indentation.
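The collection step of the export (graph traversal plus reflective dispatch into a container) can be sketched as follows. This is a Python analogue for illustration only: `Container`, its `add_person` and `add_experiment` methods, and the `collect` function are hypothetical stand-ins for the MAGEJava object and the Java reflection calls, and the XML-writing step is omitted.

```python
class Person:
    def __init__(self, identifier):
        self.identifier = identifier


class Experiment:
    def __init__(self, identifier, providers):
        self.identifier = identifier
        self.providers = providers  # list of Person objects


class Container:
    """Hypothetical stand-in for the top-level export container."""

    def __init__(self):
        self.items = {}

    def add_person(self, obj):
        self.items.setdefault("Person", []).append(obj)

    def add_experiment(self, obj):
        self.items.setdefault("Experiment", []).append(obj)


def collect(obj, container, seen=None):
    """Walk the object graph once, storing each Identifiable object in container."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return  # already visited; also guards against cycles
    seen.add(id(obj))
    if hasattr(obj, "identifier"):
        # Discover the matching add_<type> method by name and invoke it.
        adder = getattr(container, "add_" + type(obj).__name__.lower())
        adder(obj)
    children = vars(obj).values() if hasattr(obj, "__dict__") else ()
    for value in children:
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            if hasattr(item, "__dict__"):
                collect(item, container, seen)
```

Because visited objects are tracked by identity, a Person referenced from several places in the graph is stored in the container only once.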
A caArray Configuration (diagram). Labels shown in the figure: caArray 1 and an NCICB installation, each with a caArray schema, Security, caDSR / EVS, caBIO, caArray EJB, and MAGE-OM API; caWorkbench; Java app; MAGE-ML; Grid (future).
webCGH: A web application for the visualization and analysis of array-based CGH and gene expression data
David Hall, Ph.D., Research Triangle Institute
webCGH Functions • Visualization of copy number and gene expression levels • Interrogation of genome features • Data normalization and analysis • Virtual experiments
Data Flow (diagram). Components shown in the figure: two Databases, each with an Adaptor; a Transformer; an Analytical Pipeline made up of operations (Op); a Cache; and a Plot Generator.
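One way to read the components named on this slide is as a chain: an adaptor normalizes database rows, a pipeline applies a sequence of operations, and the result is cached before plotting. The Python sketch below illustrates that reading only. The ordering, the function names, and the two example operations (`center`, `clip`) are assumptions made for the example, since the slide does not specify them, and plot generation is left out.

```python
from functools import reduce


def adaptor(rows):
    """Hypothetical adaptor: turn raw database rows into (position, value) pairs."""
    return [(row["pos"], row["value"]) for row in rows]


def center(points):
    """Example op: subtract the mean value from every point."""
    mean = sum(v for _, v in points) / len(points)
    return [(p, v - mean) for p, v in points]


def clip(points, limit=1.0):
    """Example op: clamp values to the range [-limit, limit]."""
    return [(p, max(-limit, min(limit, v))) for p, v in points]


def run_pipeline(points, ops):
    """Apply each op in order to the list of points."""
    return reduce(lambda data, op: op(data), ops, points)


class CachedPipeline:
    """Runs the pipeline once per key and serves later requests from the cache."""

    def __init__(self, ops):
        self.ops = ops
        self._cache = {}
        self.runs = 0

    def result(self, key, rows):
        if key not in self._cache:
            self.runs += 1
            self._cache[key] = run_pipeline(adaptor(rows), self.ops)
        return self._cache[key]
```

With a cache in front of the plot generator, redrawing the same data set with different display options would not need to rerun the analytical operations.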
Past, Present, Future
• Dec. 2003 – Version 1.0
  • Basic plots, analytics, GEDP
• March 2005 – Version 2.0
  • More plots, analytics, caArray
• Late April 2005 – Version 2.1
  • Mouse/human plots
  • CGH/gene expression
  • SKY/M-FISH & CGH integration
webCGH Team
• NCICB
  • Mervi Heiskanen
• RTI
  • David Hall
  • Vesselina Bakalov
  • Ying Chen
  • Matt Westlake
  • Bing Liu
  • Laxminarayana Ganapathi
  • Sheping Li
  • Stuart Allen