Director’s Challenge IT Overview NCICB NCICB-SAIC
Agenda • Goal • Build a Microarray Data and Analysis Portal • Development Overview • Object and Data Models • Software Development Process • Application Architecture • Currently Developed and Deployed Functionality • Future Enhancements (Use Cases) • Data Analysis • Biological Analysis (Integration with caBIO) • LIMS • EVS
Overall Goal of Microarray Data Portal: Transform Numerical Data into Biological Data • 1. Provide Convenient Means of Submitting Experiments • 2. Provide a Variety of Methods to Query the Database • 3. Integrate and Develop Cluster and Pattern Analysis Tools • 4. Integrate Ontology and Annotation Tools • 5. Develop Architecture to Facilitate Items 1-4
Reporting a Microarray Experiment • Experimental Data • Image Files • Data Files • Experimental Description • Purpose of Study • Experimental Details • Sample Information • Clinical Data • A Standard Is Needed to Describe a Microarray Experiment
Microarray Standards • MIAME • Minimum Information About a Microarray Experiment • Experimental Design, Array Design, Hybridization, Samples, Measurements and Normalization Controls • MAML • Microarray Markup Language • XML Implementation of the MIAME Standard • Industry moving towards MAGEML • MAGEML • XML Implementation of the MIAME Standard • Formed Via Merge of MAML and GEML Standards • De Facto Widespread Industry Support
Director’s Challenge Data Model • Based Upon MAGEML Object Model • Facilitates Data Exchange and Standard Upload • Support for Annotation, Ontology, and Analysis Tools • Tables Required for Integration with caBIO • Additional Tables to Hold Clinical Data in Upcoming Months • One of the First Public MAGEML Databases • Instantiated and Populated • Schema Available for Download • ERwin Diagram Available for Download
Director’s Challenge Artifacts • dc.nci.nih.gov/informatics • Object Model • Use Cases • Sequence Diagrams • Data Models • SQL Script • ERwin Diagram • Java API • Links to Industry Standards • Evolution of Existing Microarray Tools
Director’s Challenge Object Model • Based Upon the MAGEML Standard • Classes Model MAGEML Elements and Relationships • Objects Encapsulate Data and Methods to Access Data • Java Applications can Easily Exchange Objects • Objects Written to MAGEML (XML) for Non-Java Applications • Integration with NCICB caBIO Objects
NCICB Development Standards • Java Programming Language • Objects Encapsulate Data and Methods to Manage Data • Many APIs to Facilitate Rapid Application Development • Open Standards • Java Community Process • Industry Standards (e.g., MAGEML) • Open Source Architecture • Web and Application Servers (Apache, Tomcat, JBoss) • No Database-Specific Code (Triggers, Stored Procedures) • Open Access • XML, HTTP, SOAP, RDF – Variety of Languages
Software Development Process • Rational Unified Process (RUP) • Use Cases to Capture Business Requirements • Sequence Diagrams Mapping Process Across Application Layers • Class Diagrams to Map Business Concepts to Class Objects • eXtreme Programming • Assignments Partitioned Between Small Teams • Application Segmented into Smaller Deliverables • Tailored Specifically for Dynamic Requirements Environment • NCICB-SAIC Approach Combines Both, Resulting in an Iterative, Flexible, and Highly Responsive Software Development Process
Director’s Challenge API • Design Patterns – Judicious Use of GoF and J2EE Patterns • DynamicJavaBean – Versatile Implementation of JavaBean • Object Factories – Control of Object Instantiation • UserProfileBean – Customizes User Experience • Metadata-Driven Configuration – Ease of Development • Result: An Extensible and Configurable API
Available Configuration Parameters • Database or DTD Metadata • Error Codes (Type, Message) • Dependent Fields • TextParseBean Field – Element/Column • Field Name – Field Title • Form Name – Elements/Tables • Placeholder Name – Column/Element Name • Required Fields • Retrieved Element Name – Id • Non-Persisted Elements • Auto Assigned Elements – Form • Query Statement Metadata Configuration Parameters Loaded into Memory on Startup
Metadata-Driven Configuration – Mapping Package • Metadata XML File Generated by DatabaseMetadataUtil Class • Encapsulates Referential Constraints for XML or Database • Element – Data Type Mapping (for Conversion or Type Check) • Element – Primary Key or Id Mapping • Exported Keys Map – Associative Table or IDREFS Constraints • Metadata Parameters Generated and Loaded on Startup
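The element-to-data-type mapping described on this slide can be illustrated with a minimal sketch. It is not the project's DatabaseMetadataUtil class: the class name TypeMappingSketch and both method names are hypothetical, and the set of handled types is an assumption. The first method reads column types through the standard JDBC DatabaseMetaData.getColumns call; the second uses a mapped type code to convert a raw string, which is one way a conversion or type check could work.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual DatabaseMetadataUtil class.
class TypeMappingSketch {

    // Builds a column-name -> java.sql.Types code map for one table
    // using standard JDBC metadata.
    static Map<String, Integer> columnTypes(DatabaseMetaData meta, String table)
            throws SQLException {
        Map<String, Integer> types = new LinkedHashMap<>();
        try (ResultSet rs = meta.getColumns(null, null, table, "%")) {
            while (rs.next()) {
                types.put(rs.getString("COLUMN_NAME"), rs.getInt("DATA_TYPE"));
            }
        }
        return types;
    }

    // Converts a raw form or XML string according to the mapped SQL type.
    // A NumberFormatException here doubles as a failed type check.
    static Object convert(int sqlType, String raw) {
        switch (sqlType) {
            case Types.INTEGER:
                return Integer.valueOf(raw.trim());
            case Types.FLOAT:
            case Types.DOUBLE:
                return Double.valueOf(raw.trim());
            default:
                return raw; // assumption: everything else is kept as text
        }
    }
}
```

Because the map is built from live metadata, a schema change would be picked up the next time the map is generated, which matches the slide's point about generating and loading the parameters on startup.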
Benefits of Dynamic Configuration • Changes to Database – Auto Update of O/R Mapping Layer • Specify Application Behavior via XML Configuration Files • Facilitates DynamicJavaBean Implementation • Object Reuse via Object Factories • Redeployment or Reconfiguration via XML Files – No Recompile • Configuration Parameters Loaded into Memory on Startup
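As a rough illustration of specifying behavior through an XML file instead of compiled code, the sketch below loads a list of required fields per form at startup. The XML layout (form and field elements with name and required attributes) and the class name ConfigSketch are assumptions made for this example; the slides do not show the real configuration format. Only standard JDK DOM parsing is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical configuration loader; the XML layout is assumed, not the project's.
class ConfigSketch {

    // Returns form name -> names of fields marked required="true".
    static Map<String, List<String>> requiredFields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new HashMap<>();
            NodeList forms = doc.getElementsByTagName("form");
            for (int i = 0; i < forms.getLength(); i++) {
                Element form = (Element) forms.item(i);
                List<String> required = new ArrayList<>();
                NodeList fields = form.getElementsByTagName("field");
                for (int j = 0; j < fields.getLength(); j++) {
                    Element field = (Element) fields.item(j);
                    if ("true".equals(field.getAttribute("required"))) {
                        required.add(field.getAttribute("name"));
                    }
                }
                result.put(form.getAttribute("name"), required);
            }
            return result;
        } catch (Exception e) {
            // Malformed configuration is reported as a bad argument.
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }
}
```

With this arrangement, marking another field as required means editing the XML file and reloading it, with no recompile.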
Dynamic Java Beans • Properties Are Not Hardcoded into DynamicJavaBean • Implementing Classes Extend or Are Composed of a Hashtable • Facilitates Object Reuse via Factory Design Pattern • Metadata-Driven, Dynamic Object Definition • Changes to Class Definition – XML File Update
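A minimal sketch of the idea, assuming the composition variant (a bean that holds a Hashtable instead of extending it). The class name DynamicBeanSketch and its methods are hypothetical and are not the project's DynamicJavaBean API. The allowed property names are passed in at construction time, standing in for a definition that would come from metadata.

```java
import java.util.Hashtable;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; property names are supplied at runtime, not hardcoded.
class DynamicBeanSketch {
    private final Set<String> allowed;                         // names defined by metadata
    private final Hashtable<String, Object> values = new Hashtable<>();

    DynamicBeanSketch(Set<String> allowedProperties) {
        this.allowed = new LinkedHashSet<>(allowedProperties);
    }

    // Stores a value; unknown property names are rejected.
    void set(String name, Object value) {
        checkKnown(name);
        values.put(name, value); // Hashtable does not accept null keys or values
    }

    // Returns the stored value, or null if the property was never set.
    Object get(String name) {
        checkKnown(name);
        return values.get(name);
    }

    private void checkKnown(String name) {
        if (!allowed.contains(name)) {
            throw new IllegalArgumentException("Unknown property: " + name);
        }
    }
}
```

Adding a property to such a bean means adding a name to the metadata that feeds the constructor; the class itself stays unchanged, which is what allows a factory to reuse one class for many object definitions.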
Director’s Challenge Architecture (diagram components) • ManagerServlet • RequestHandler • Input • Persistence • RDBMS
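The slide names the components but not how they interact. The sketch below assumes one plausible reading: a front controller dispatches each request to a handler, the handler validates input, and a persistence layer stores the result. Every type name here is hypothetical, the servlet API is left out so the example stays self-contained, and an in-memory list stands in for the RDBMS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Persistence layer boundary; a real implementation would talk to the RDBMS.
interface PersistenceSketch {
    void save(Map<String, String> record);
}

// One handler per kind of request.
interface RequestHandlerSketch {
    String handle(Map<String, String> params);
}

// In-memory stand-in for the database.
class InMemoryPersistence implements PersistenceSketch {
    final List<Map<String, String>> rows = new ArrayList<>();

    public void save(Map<String, String> record) {
        rows.add(new HashMap<>(record));
    }
}

// Validates input, then delegates to persistence.
class SubmitHandlerSketch implements RequestHandlerSketch {
    private final PersistenceSketch persistence;

    SubmitHandlerSketch(PersistenceSketch persistence) {
        this.persistence = persistence;
    }

    public String handle(Map<String, String> params) {
        String name = params.get("experimentName"); // assumed parameter name
        if (name == null || name.trim().isEmpty()) {
            return "error: experimentName is required";
        }
        persistence.save(params);
        return "ok";
    }
}

// Plays the role of the front controller: routes an action to its handler.
class ManagerSketch {
    private final Map<String, RequestHandlerSketch> handlers = new HashMap<>();

    void register(String action, RequestHandlerSketch handler) {
        handlers.put(action, handler);
    }

    String dispatch(String action, Map<String, String> params) {
        RequestHandlerSketch handler = handlers.get(action);
        return handler == null ? "error: unknown action " + action : handler.handle(params);
    }
}
```

Keeping persistence behind an interface is consistent with the stated standard of no database-specific code in the application.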
Challenge of Experiment Submission: Capturing a Rich Set of MIAME Information vs. Ease of Use • Prepopulate Fields with UserProfile and FormInputBean Data • Dynamically Tailor Form Fields Based Upon Previous Entries • Personalize Drop Down Lists via UserProfile Preferences • Capture Common Field Data and Autogenerate Missing Items
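Prepopulating a form can be sketched as a merge of two sources. The example assumes that values the user entered earlier take precedence over profile defaults; the slides do not state the precedence, and the class FormPrefillSketch is hypothetical, not the project's UserProfileBean or FormInputBean.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; precedence of previous entries over profile is an assumption.
class FormPrefillSketch {

    // Returns a value for every form field: previous entry first,
    // then profile default, then an empty string.
    static Map<String, String> prefill(List<String> formFields,
                                       Map<String, String> profileDefaults,
                                       Map<String, String> previousEntries) {
        Map<String, String> filled = new LinkedHashMap<>();
        for (String field : formFields) {
            String value = previousEntries.containsKey(field)
                    ? previousEntries.get(field)
                    : profileDefaults.get(field);
            filled.put(field, value == null ? "" : value);
        }
        return filled;
    }
}
```

Fields that come back as empty strings are the ones the submitter still has to supply, so the form can focus attention on those.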
Targeted Submission Functionality • XML or Form-Based Submission of MIAME-MAGE Information • Upload of Data Text Files • Upload or Manual Submission of Image Files • Leverage Architecture Design to Facilitate Ease of Use
Queries • Currently Implemented • Basic Search/Detail • Hardware Search • Software Search • Coming Soon • Advanced Search/Detail • Protocol Search • Chip Search
Future Directions • Implement Domain Object Model • Fully Implement All Search Use Cases • Develop Annotation and Ontology Tools (Integrate with caBIO) • Integrate xClust Cluster Analysis Tool • Data Retrieval and Processing to Support Analysis Tools • Develop Pattern Analysis Tools • Batch Upload/Download • Generate MAGEML XML File upon Experiment Submission
Integration with NCICB caBIO • Value-added Functionalities • Java API for Annotations and Ontologies • Easy Retrieval of Information in the Form of Objects
Annotation Using caBIO to Access Gene Information • Reporter on Chip: • IMAGE clone • Affy probe set • Genbank ID • UniGene ID • Gene Info: • Annotation • Ontology • caBIO • Sequence • Gene AnnotationBean
Gene Ontology: Gene Expression by Functional Aspects • OntologyBean • (GoOntology) • getGenes() • getAllGenes() • (ontology/children) • Genes • Sequences • Categorize genes of interest • Explore data by gene categories • Reporters on Chip
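The slide pairs getGenes() with getAllGenes() and mentions ontology children. One plausible reading, assumed here and not confirmed by the slides, is that getGenes() returns genes annotated directly to a term while getAllGenes() also collects genes from child terms. The class GoTermSketch is hypothetical and is not the caBIO OntologyBean.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical ontology term; not the caBIO OntologyBean API.
class GoTermSketch {
    final String name;
    final Set<String> genes = new LinkedHashSet<>();        // genes annotated directly to this term
    final List<GoTermSketch> children = new ArrayList<>();  // more specific child terms

    GoTermSketch(String name) {
        this.name = name;
    }

    // Genes annotated directly to this term.
    Set<String> getGenes() {
        return new LinkedHashSet<>(genes);
    }

    // Genes of this term plus those of all descendant terms, without duplicates.
    Set<String> getAllGenes() {
        Set<String> all = new LinkedHashSet<>(genes);
        for (GoTermSketch child : children) {
            all.addAll(child.getAllGenes());
        }
        return all;
    }
}
```

Collecting into a set matters because a gene can be annotated to several related terms and should be counted once when exploring data by category.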
Gene Ontology Implementation • Goal: Enable the user to obtain microarray data for a list of genes based on a gene ontology term • Steps: • G1. Get GO term by browsing GO Browser and by searching CGAP’s GO database • G2. Get a gene list based on a user-specified GO term • G3. Get expression data for a gene list by searching the microarray database
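Steps G2 and G3 can be sketched as two lookups chained together. The maps below stand in for the GO database and the microarray database; the class GoQuerySketch, its method names, and the use of in-memory maps are all assumptions for illustration, since the slides do not show the real query interfaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup helpers; in-memory maps stand in for the real databases.
class GoQuerySketch {

    // Step G2: genes annotated with the given GO term (empty list if the term is unknown).
    static List<String> genesForTerm(Map<String, List<String>> termToGenes, String goTerm) {
        List<String> genes = termToGenes.get(goTerm);
        return genes == null ? Collections.<String>emptyList() : new ArrayList<>(genes);
    }

    // Step G3: expression values for those genes that appear in the microarray data.
    static Map<String, double[]> expressionFor(Map<String, double[]> expressionByGene,
                                               List<String> genes) {
        Map<String, double[]> hits = new LinkedHashMap<>();
        for (String gene : genes) {
            double[] values = expressionByGene.get(gene);
            if (values != null) {
                hits.put(gene, values);
            }
        }
        return hits;
    }
}
```

Genes that have an annotation but no reporter on the chip simply drop out at step G3, so the result only contains genes with measured expression.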
Gene Ontology Term • Goal: Enable the user to obtain an accurate GO term • Approaches: • G1. Get GO term by browsing GO Browser • G2. Get GO term by searching CGAP’s GO database • Vocabulary Control • Help users determine a GO term for their biological question
Summary • Capture MIAME data in a MAGEML-compliant database • Data Portal – value-added functionality • Bioinformatics Integration • Analytic tools
Acknowledgements • Development Team • John Yost • Jennifer Long • Cheng-Cheng Huang • Nick Xiao • Johnita Beasley • Additional Thanks • caBIO • CGAP • madB