Support for MAGE-TAB in caArray 2.0: Overview and feedback
MAGE-TAB Workshop, January 24, 2008
Agenda
• Brief overview of caArray 2.0
• caArray 2.0 and MAGE-TAB
• MAGE-TAB feedback
What is caArray?
• caArray is a caBIG™-compliant microarray data repository at the NCICB
• Developed to support a federated model of microarray data sharing
• Developed in line with MIAME and MAGE guidelines
[Screenshots: caArray 1.6 and caArray 2.0]
Goals of caArray 2.0
• Address adopter feedback gathered during the caArray 1.x releases
• Improve the user experience for storing and retrieving array data
• Simplify data access through the API and grid service for analytical applications, and improve its performance
• Harmonize with the caBIG™ tissue repository (caTissue) and annotation repository (caBIO)
• Support additional array platforms, including SNP arrays
• Organize the application around the workflow between investigators and the labs that serve them
• Use an agile software development approach that allows more frequent feature additions and better responsiveness to the user community
Features of caArray 2.0
• Store array data associated with experiment and sample annotations
• Data entry through the graphical user interface or MAGE-TAB
• Parse Affymetrix, Illumina and GenePix formats for expression and SNP arrays
• Role-based permissions for data access
• Programmatic access via a Java API and grid service
• Manage protocols and controlled vocabularies
• MGED Ontology 1.3.1 comes pre-loaded
• Basic browse and search functionality
caArray 2.0 annotations
• Captures information for:
  • Experiment information
    – Contacts
    – Publications
  • Sample annotations
    – Source
    – Sample
    – Extract
    – Labeled Extracts
    – Hybridizations
caArray 2.0 supported formats
• Parsed file formats
  – Annotation: MAGE-TAB (ADF, IDF, SDRF)
  – Array data:
    – Affymetrix expression and SNP: .CDF, .CEL, .CHP
    – Illumina expression and SNP: .CSV
    – GenePix: .GAL, .GPR
• Unparsed formats
  – Affymetrix: .dat, .exp, .rpt, .txt
  – Illumina: .txt, .idat
  – Agilent: .txt, .tsv
  – ImaGene: .txt, .tiv
  – NimbleGen: .txt, .gff
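As a rough illustration of the parsed/unparsed split above, the sketch below maps array data file extensions to a handling category. The class and enum names (`FormatCatalog`, `Handling`) are hypothetical and are not part of caArray. The table covers only the array data files listed on the slide, not the MAGE-TAB annotation files.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical lookup of the array data file extensions listed on the slide.
 * Illustration only; this is not caArray's own file-type registry.
 */
class FormatCatalog {

    enum Handling { PARSED, UNPARSED, UNKNOWN }

    // Extension -> handling, transcribed from the "supported formats" slide.
    private static final Map<String, Handling> BY_EXTENSION = Map.ofEntries(
            // Parsed array data: Affymetrix, Illumina, GenePix
            Map.entry("cdf", Handling.PARSED),
            Map.entry("cel", Handling.PARSED),
            Map.entry("chp", Handling.PARSED),
            Map.entry("csv", Handling.PARSED),
            Map.entry("gal", Handling.PARSED),
            Map.entry("gpr", Handling.PARSED),
            // Supported but not parsed: Affymetrix, Illumina, Agilent, ImaGene, NimbleGen
            Map.entry("dat", Handling.UNPARSED),
            Map.entry("exp", Handling.UNPARSED),
            Map.entry("rpt", Handling.UNPARSED),
            Map.entry("txt", Handling.UNPARSED),
            Map.entry("idat", Handling.UNPARSED),
            Map.entry("tsv", Handling.UNPARSED),
            Map.entry("tiv", Handling.UNPARSED),
            Map.entry("gff", Handling.UNPARSED));

    /** Classifies a file by its last extension, case-insensitively. */
    static Handling classify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Handling.UNKNOWN; // no extension to look up
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, Handling.UNKNOWN);
    }
}
```

The extension alone is ambiguous. For example, .txt appears under several vendors (all unparsed here), so an importer built this way would need more than the file suffix to tell an Agilent text file from an ImaGene one.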
caArray 2.0 permissions
• Role-based permissions for each installation
  – Anonymous user
  – System administrator
  – Principal investigator / biostatistician / lab administrator / lab scientist
• Data is private until made public
  – Anonymous users can see the experiment title, PI and number of samples, but not the experiment content
• The PI can manage collaboration groups for collaboration before the data is made public
• CSM 4.0 (Common Security Module)
• Experiment-level and sample-level security
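The visibility rules above can be sketched as a small decision function. This is a hypothetical model, not the caArray or CSM API. The names are invented, and the PI is identified by user name. The slide does not say that logged-in users outside the collaboration group are treated like anonymous users, so that behavior is an assumption. Sample-level security is not modeled.

```java
import java.util.Set;

/**
 * Hypothetical model of the experiment visibility rules on the permissions slide.
 * Not the caArray or CSM API; names are invented for illustration.
 */
class ExperimentVisibility {

    enum Access { SUMMARY_ONLY, FULL }

    record Experiment(String title, String pi, int sampleCount,
                      boolean isPublic, Set<String> collaborators) { }

    /**
     * @param user user name, or null for an anonymous user
     */
    static Access accessFor(String user, Experiment exp) {
        if (exp.isPublic()) {
            return Access.FULL; // public data is visible to everyone
        }
        if (user != null && (user.equals(exp.pi()) || exp.collaborators().contains(user))) {
            return Access.FULL; // the PI and the PI's collaboration group see private content
        }
        // Assumption: everyone else, anonymous or not, sees only title, PI and sample count.
        return Access.SUMMARY_ONLY;
    }
}
```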
caArray 2.0 API and grid service
• Supports the MAGE-TAB level of annotation
  – Simplified implementation of MAGE
• The API provides a data service and analytical services
  – The data service lets users issue CQL queries that traverse the domain model
  – Analytical services provide convenience methods for data access
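The sketch below contrasts the two kinds of service using a toy in-memory domain model. The record names and both methods are hypothetical and greatly simplified. This is neither the caArray Java API nor CQL. It only shows the difference between a query that explicitly walks Experiment → Hybridization → data file and a convenience call that hides that walk.

```java
import java.util.List;

/**
 * Hypothetical, heavily simplified domain model used only to contrast
 * query-style traversal with a convenience method. Not the caArray API, not CQL.
 */
class DomainTraversal {

    record ArrayDataFile(String name) { }
    record Hybridization(String name, List<ArrayDataFile> dataFiles) { }
    record Experiment(String title, String organism, List<Hybridization> hybridizations) { }

    // Query-style access: the caller spells out the path
    // Experiment -> Hybridization -> ArrayDataFile and the filter to apply.
    static List<String> dataFileNamesForOrganism(List<Experiment> experiments, String organism) {
        return experiments.stream()
                .filter(e -> e.organism().equals(organism))
                .flatMap(e -> e.hybridizations().stream())
                .flatMap(h -> h.dataFiles().stream())
                .map(ArrayDataFile::name)
                .toList();
    }

    // Convenience-style access: one call answers a common question
    // ("which data files belong to this experiment?") without exposing the path.
    static List<String> dataFilesForExperiment(List<Experiment> experiments, String title) {
        return experiments.stream()
                .filter(e -> e.title().equals(title))
                .flatMap(e -> e.hybridizations().stream())
                .flatMap(h -> h.dataFiles().stream())
                .map(ArrayDataFile::name)
                .toList();
    }
}
```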
caArray 2.0 browse and search
• Browse by: Experiments, Organism, Provider, Array design
• Search by specifying: Keyword, Category
MAGE-TAB in caArray 2.0
• Supports MAGE-TAB v1.0: ADF, IDF, SDRF
• Term Source providers and their associated terms are captured as controlled vocabularies (Manage Vocabularies)
• Protocols are imported and viewable in Manage Protocols
• Characteristics are displayed on the relevant detail pages
• Original files are stored in association with the experiment
  – Edits made to the information in the UI are not reflected in these files
• Future feature: MAGE-TAB export based on current database values
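Characteristics come from `Characteristics[...]` columns in the SDRF, which is a tab-delimited file. The sketch below extracts those category names from an SDRF header row. It is a simplified illustration, not caArray's parser. It ignores quoting, comment columns and the rest of the MAGE-TAB grammar. Accepting a space before the bracket and matching case-insensitively are choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch: find the Characteristics[...] columns in an SDRF header row.
 * Not caArray's parser; quoting, comments and the full MAGE-TAB grammar are ignored.
 */
class SdrfCharacteristics {

    // Matches e.g. "Characteristics[OrganismPart]" or "Characteristics [OrganismPart]".
    private static final Pattern CHARACTERISTIC =
            Pattern.compile("^Characteristics\\s*\\[\\s*(.+?)\\s*\\]$", Pattern.CASE_INSENSITIVE);

    /** Returns the bracketed category of each Characteristics column, in header order. */
    static List<String> characteristicCategories(String headerLine) {
        List<String> categories = new ArrayList<>();
        for (String column : headerLine.split("\t", -1)) {
            Matcher m = CHARACTERISTIC.matcher(column.trim());
            if (m.matches()) {
                categories.add(m.group(1));
            }
        }
        return categories;
    }
}
```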
MAGE-TAB for data migration
• caArray 1.6 → caArray 2.0
  – Experiments being migrated from caArray 1.6 to 2.0 are exported in MAGE-TAB format, along with their associated native array data files
  – Challenges included:
    – MAGE-OM → MAGE-TAB mapping
    – Validating that all data “made it” over, which caused most of the difficulty (not really a MAGE-TAB issue)
  – Manual checking is still needed
• Jackson Labs internal MAD database → caArray 2.0
MAGE-TAB feedback
• Initial experience with end-user customers shows a learning curve with the SDRF, especially when applying controlled vocabularies
  – Tools are needed to make this easier
• Source vs. Sample vs. Extract vs. Labeled Extract
  – Users are often confused about “what goes where”
• From Jackson Labs:
  – The documentation works well for a biologist end user, but a software engineer would like more detail
  – More real-life examples would be helpful
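One way to explain “what goes where” is to read a single SDRF row as a path through the material chain. The sketch below is hypothetical and not part of caArray. It picks out the `Source Name`, `Sample Name`, `Extract Name`, `Labeled Extract Name` and `Hybridization Name` values from one row, in that order. It assumes each of these columns appears at most once and that header names match exactly.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal sketch: read one SDRF data row and report the material chain
 * Source -> Sample -> Extract -> Labeled Extract -> Hybridization.
 * Not caArray's parser; assumes each node column appears at most once.
 */
class SdrfChain {

    // Node columns in the order the material flows through the lab.
    private static final List<String> NODE_COLUMNS = List.of(
            "Source Name", "Sample Name", "Extract Name",
            "Labeled Extract Name", "Hybridization Name");

    /** Returns node column -> value for the non-blank nodes in this row, in chain order. */
    static Map<String, String> chain(String headerLine, String dataLine) {
        List<String> header = List.of(headerLine.split("\t", -1));
        String[] values = dataLine.split("\t", -1);
        Map<String, String> result = new LinkedHashMap<>();
        for (String node : NODE_COLUMNS) {
            int i = header.indexOf(node); // exact, case-sensitive header match
            if (i >= 0 && i < values.length && !values[i].isBlank()) {
                result.put(node, values[i]);
            }
        }
        return result;
    }
}
```

Each SDRF row describes one such path. When the same Source Name appears on several rows, those rows refer to one source that yielded several samples.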
Specific requests to consider
• A way to specify required fields for particular implementations
  – The caArray UI has certain required fields; these need to be specifiable in a MAGE-TAB template
• Associate “supplemental” files with an experiment
• In the IDF, add a field that specifies the type of array experiment (gene expression, SNP, aCGH, etc.)
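The first request, required fields for particular implementations, could be handled by a template that lists required SDRF columns and checks a header against that list. The required-column list in the example is hypothetical and does not reflect caArray's actual required fields. Matching column names case-insensitively after trimming whitespace is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Minimal sketch of an implementation-specific "template": a list of SDRF
 * columns that one installation requires, checked against a header row.
 * Hypothetical; not caArray's validation logic.
 */
class SdrfTemplateCheck {

    /** Returns the required columns that are absent from the header, in template order. */
    static List<String> missingColumns(String headerLine, List<String> requiredColumns) {
        Set<String> present = new HashSet<>();
        for (String column : headerLine.split("\t", -1)) {
            // Assumption: compare names case-insensitively, ignoring surrounding whitespace.
            present.add(column.trim().toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String required : requiredColumns) {
            if (!present.contains(required.trim().toLowerCase(Locale.ROOT))) {
                missing.add(required);
            }
        }
        return missing;
    }
}
```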