ACT 119153 (NISR+Τ), 3rd Phase "Open Access Repositories and Electronic Journals", supervised by the Greek Information Society / FP6. Subtask 6: Development of an integrated DNA microarray data processing and meta-analysis platform plus a microarray experimental data repository, in Grid.
Overview • Introduction • MicroArray Experiments • Problems • The GRISSOM Portal • System Architecture Overview • Case Study • Technical Issues • GRISSOM Platform Benefits
MicroArray Experiments Gene expression microarrays are a promising high-throughput methodology for measuring the expression of an organism's whole genome simultaneously at a specific instant. In practice they are used to compare transcription levels among different conditions in order to: i) Understand the mechanisms implicated in the various stages of the biological system under investigation ii) Classify diseases or, more generally, pathologies, e.g. tumours with different prognoses that are indistinguishable by microscopic histological examination iii) Monitor the response to therapy iv) Identify and categorize diagnostic or prognostic biomarkers
Problems • The computational processing steps for microarray experiment data are laborious and are by far the most significant bottleneck in the successful exploitation of the technology. • Consequently, large storage and computing facilities are required. • This compounds the cost of an important yet already expensive technology, setting back research progress in the field. • Technical setbacks: array artifacts, scratches, scanner sensitivity and settings. • The curse of dimensionality: tens of thousands of genes (variables) measured on a small number of samples pose major challenges for statistical inference. • Noise: non-specific hybridization, as well as the difference between the actual amount of mRNA per cell and the relative differential expression measured by microarrays, introduces variance and noise into experiments.
The GRISSOM Portal • http://www.grissom.gr • Access • Restricted Web Access • Registered Users • Special Security Mechanism
The GRISSOM Portal • http://www.grissom.gr • Web Portal Access (SSL) • Two Access Modes: • HellasGrid certificate validation • Custom certificate validation (certificates signed by NHRF)
The GRISSOM Portal • HellasGrid Authentication & Access • MyProxy • MyProxy Certification Authority • Diagram labels: MyProxy-Server, Grid.Auth.GR, User, MyProxy-logon
The GRISSOM Portal http://www.grissom.gr • Features • Experimental data upload • Versatile Data Processing: Normalization, Filtering, Statistical Selection, Clustering, Gene Annotation • Automated experiment submission to the HellasGrid Infrastructure, with monitoring • Biological Experiment Repository • Meta-Analysis Methods, including gene annotation and GO Analysis
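The processing steps named on this slide (normalization, filtering, statistical selection) can be illustrated with a minimal Python sketch. It is not GRISSOM's implementation: the slides do not say which methods the portal uses, so global median centering and a fold-change cutoff are swapped in as simple stand-ins, and the function names, the thresholds `min_intensity` and `min_abs_log2`, and the sample values are all hypothetical.

```python
import math
from statistics import median


def log2_ratios(treated, control):
    """Per-gene log2(treated / control) for two equally long intensity sequences."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def median_center(values):
    """Global median normalization: shift the values so that their median is 0."""
    m = median(values)
    return [v - m for v in values]


def select_genes(gene_ids, treated, control, min_intensity=100.0, min_abs_log2=1.0):
    """Filter, normalize and select genes; the thresholds are illustrative assumptions."""
    # Filtering: drop genes whose signal is weak in both channels.
    kept = [(g, t, c) for g, t, c in zip(gene_ids, treated, control)
            if max(t, c) >= min_intensity]
    if not kept:
        return {}
    ids, t_vals, c_vals = zip(*kept)
    # Normalization: median-centered log2 ratios.
    normalized = median_center(log2_ratios(t_vals, c_vals))
    # Selection: keep genes whose absolute normalized log2 ratio reaches the cutoff.
    return {g: r for g, r in zip(ids, normalized) if abs(r) >= min_abs_log2}


# Hypothetical intensities, invented purely to exercise the functions.
genes = ["g1", "g2", "g3", "g4", "g5"]
treated = [800.0, 200.0, 400.0, 50.0, 100.0]
control = [200.0, 200.0, 400.0, 40.0, 400.0]
selected = select_genes(genes, treated, control)
```

Here `g4` is removed by the intensity filter and only the genes with at least a two-fold change after centering remain in `selected`. A real analysis would normally replace the fold-change cutoff with a statistical test across replicates.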
The GRISSOM Portal http://www.grissom.gr • Input: • Raw Dataset Files (various image formats, for cDNA/Affymetrix) • Analysis Parameters • Output: • Expressed Gene Lists • Interactive Graphs • Annotated Genes • References to similar Experiments
The GRISSOM Portal http://www.grissom.gr • Distributed Database: • Data records are instantiated through PHP calls to a distributed MySQL database, while the actual data are stored on Grid Storage Elements (SEs) • Interconnection with other open biological databases (EBI ArrayExpress, NCBI GEO) to find related experiments • Gene annotation performed using specialized databases (BioMart)
System Architecture Overview • Main Components: • Web Portal (User Interface) • Local DB • Grid Middleware • PHP + Java • gLite 3.1 • Parallel Execution Code (MPI + Octave); job submission through gLite DAG for fully distributed code execution is under development • Grid Storage Elements
System Architecture Overview Analysis steps are executed using MPI over multiple nodes. The number of nodes is equal to the number of experimental conditions in each experiment.
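The one-node-per-condition rule can be sketched locally in Python. This is only an illustration under stated assumptions: a thread pool stands in for the MPI ranks (the platform itself uses MPI with Octave on Grid nodes), and `analyse_condition`, `run_per_condition` and the sample experiment are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def analyse_condition(name, intensities):
    """Placeholder per-condition analysis step: here just the mean intensity."""
    return name, mean(intensities)


def run_per_condition(experiment):
    """Run one worker per experimental condition and gather the results.

    `experiment` maps a condition name to its list of measurements. The worker
    count equals the number of conditions, mirroring the rule on the slide;
    threads are a local stand-in for MPI ranks, not the real mechanism.
    """
    if not experiment:
        return 0, {}
    n_workers = len(experiment)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda item: analyse_condition(*item), experiment.items())
        # Consume the results while the pool is still open.
        return n_workers, dict(results)


# Hypothetical experiment with two conditions, so two workers are started.
experiment = {"control": [1.0, 2.0, 3.0], "treated_6h": [4.0, 6.0]}
n_workers, summary = run_per_condition(experiment)
```

Tying the worker count to the number of conditions keeps the decomposition simple, but it also caps the achievable parallelism at the number of conditions in the experiment.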
Case Study – Test Scenarios The system was tested using multiple datasets that differ in size and architecture:
Case Study – Performance Measures System used: Intel Core 2 Duo E4300 1.8 GHz processor with 2.0 GB RAM, running GNU Octave 2.1.73 on Ubuntu Linux 7.10
Case Study – Performance Measures Analysis run using the same dataset with different parallelization levels. First run: 3 nodes. Second run: 9 nodes.
Grid-related Performance Limitations • Different node H/W generations • Heterogeneity of the S/W installed on nodes (esp. biocomputing packages like Bioconductor) • Maintenance Issues
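Two runs at different parallelization levels are usually compared through speedup and relative parallel efficiency. The Python sketch below shows only that arithmetic. The function names are hypothetical, and the timings in the usage lines are invented placeholders, not the measurements from this case study, which are not reproduced in the text.

```python
def speedup(t_base, t_scaled):
    """Ratio of the baseline wall-clock time to the scaled-run wall-clock time."""
    return t_base / t_scaled


def relative_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Speedup divided by the factor by which the node count grew.

    A value of 1.0 means the extra nodes were used perfectly; lower values
    indicate overhead such as communication, scheduling or load imbalance.
    """
    return speedup(t_base, t_scaled) / (n_scaled / n_base)


# Placeholder timings in minutes, NOT measured values from the case study.
t_3_nodes, t_9_nodes = 90.0, 45.0
s = speedup(t_3_nodes, t_9_nodes)
e = relative_efficiency(t_3_nodes, 3, t_9_nodes, 9)
```

With the real timings of the 3-node and 9-node runs substituted, `e` indicates how much of the threefold increase in nodes translated into shorter run time.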
GRISSOM Platform Benefits • Parallelization • Time optimization • User Transparency • Automated Job Submission + Monitoring • Open Access Biological Experiment Repository • Shell fully concealing the Grid
GRISSOM Development Team Aristotle Chatziioannou (achatzi@eie.gr) Ilias Maglogiannis (imaglo@ucg.gr) Ioannis Kanaris (kanaris.i@aegean.gr) Charalambos Doukas (doukas@aegean.gr) Eleftherios Pilalis (epilalis@eie.gr) Panagiotis Moulos (pmoulos@eie.gr) • Under the supervision of the Institute of Biological Research & Biotechnology, National Hellenic Research Foundation Fragiskos Kolisis (kolisis@eie.gr) • in collaboration with the National Documentation Center, National Hellenic Research Foundation • Funded by ACT 119153 (NISR+Τ) 3rd Phase ”Open Access Repositories and Electronic Journals” supervised by the Greek Information Society /FP6
http://www.grissom.gr Thank you. Questions?