Genopolis Microarray DB: a Progress Report
Marco Brandizi <marco.brandizi@unimib.it>
Dec 12, 2005
Dottorato in Informatica, XIX Ciclo
Outline
• Introduction
• GCA Application
  • Main features
  • Demo
  • Demo/Gene Browser
• Recently added features
  • Access control
  • Search & Save
• Ongoing and future
  • MAGE Export
  • Migration on cluster
  • Management of knowledge about Higher Level Analysis
  • Other possible developments
Genes Machine
[Figure: diagram with the labels gene, DNA, mRNA, protein, Cell/Life]
Microarray Data Management Issues
• Experimental data vs. sequence data:
  • Context dependent (living system, experimental conditions)
  • Lack of a standard unit of measure
  • Several normalization methods
  • Multiple platforms and methods
• No standard for data annotation
  • Vocabularies and terminology coherence
  • Details about: experiment, source, protocols, experimental conditions
Microarray Data Management Issues / 2
• Evidence about data quality
• What to store?
  • Raw images
  • Computed values
  • Normalized values
• How to find data
  • Complex vocabulary-aware systems (ontologies)
  • Data mining and experiment comparison tools
• Data access control
GCA Features
• Curated experimental design representation
  • MIAME-compliant (although with a simplified model)
  • Use of controlled vocabularies
  • Experiment checking/publishing, with supervision
• Targeted to the Affymetrix platform
  • Chip description is simple, imported from NetAffx
  • Single-channel technology
• Access control
  • Users are organized into groups and access roles
  • Experiments belong to user groups
GCA Features
• Data retrieval and visualization
  • Gene browser, a graphical visualization interface based on the matrix model
  • Search & Save data
• Current content:
  • A set of time courses of DCs stimulated with different stimuli
• Implementation & deployment
  • LAMP application (Linux + Apache + MySQL + PHP)
  • Model-View-Controller as much as possible:
    • Business objects layer
    • Presentation widgets (DAO-lib)
    • Other application control layers
GCA Features
• In short:
  • Gene expression database software, focused on Affymetrix technology, useful as a facility for a distributed community of users
[Figure: access control diagram. Groups: Besta, Bicocca. Role labels: ADMIN; All rights; All but admin; R, W, -publish; Read only. Users: Andrea, Brandizi, Granucci, Tiranti, Norman, Ottavio]

Experiment 123
User              Permissions
Brandizi, Andrea  All
Granucci          Read
Norman            Read, Write
Tiranti           All (except admin)
Ottavio           None
Access management
• Based on a core library
  • Recent developments (security lib)
  • Code has been changed so that it uses the security lib
• All the code that interacts with the user has been wrapped with access management controls
• Even malicious access attempts have been considered:
  • Manually typing a URL
  • Manually requesting an uploaded file (to be completed)
• Does it work?
  • Yes, pretty sure
  • But more testing is needed
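As an illustration of what "wrapping user-facing code with access management controls" can look like, here is a minimal Python sketch. It is not the GCA security lib (which is written in PHP and not shown in these slides): the names `Permission`, `EXPERIMENT_ACL`, `check_access`, `requires` and `publish_experiment` are all hypothetical, and only the per-user permissions for Experiment 123 are taken from the slide above.

```python
import functools
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical permission bits; the names loosely mirror the slide's role labels."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    PUBLISH = auto()
    ADMIN = auto()
    ALL_BUT_ADMIN = READ | WRITE | PUBLISH
    ALL = READ | WRITE | PUBLISH | ADMIN


# Effective permissions on Experiment 123, copied from the slide's table.
# How they derive from groups and roles is not shown there, so it is not modelled.
EXPERIMENT_ACL = {
    123: {
        "Brandizi": Permission.ALL,
        "Andrea": Permission.ALL,
        "Granucci": Permission.READ,
        "Norman": Permission.READ | Permission.WRITE,
        "Tiranti": Permission.ALL_BUT_ADMIN,
        "Ottavio": Permission.NONE,
    }
}


class AccessDenied(Exception):
    """Raised when a user lacks the permission an action requires."""


def check_access(user, experiment_id, needed):
    """Return True only if every bit in `needed` has been granted to `user`."""
    granted = EXPERIMENT_ACL.get(experiment_id, {}).get(user, Permission.NONE)
    return (granted & needed) == needed


def requires(needed):
    """Decorator that runs the access check before the wrapped action."""
    def decorator(action):
        @functools.wraps(action)
        def wrapper(user, experiment_id, *args, **kwargs):
            if not check_access(user, experiment_id, needed):
                raise AccessDenied(f"{user} may not {action.__name__} on {experiment_id}")
            return action(user, experiment_id, *args, **kwargs)
        return wrapper
    return decorator


@requires(Permission.PUBLISH)
def publish_experiment(user, experiment_id):
    # Placeholder for the real action; it is reached only after the check passes.
    return f"experiment {experiment_id} published by {user}"
```

With this layout, unknown users and unknown experiments fall back to `Permission.NONE`, so a hand-typed URL that reaches an action still goes through the same check as a normal request.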
MAGE Export
• Will allow exporting a GCA experiment to MAGE/ArrayExpress
• A collaboration with EBI
  • In the context of u-GENE
• So far:
  • Schema of GCA -> MAGE (in AE-compatible form)
  • Basic code fragments (business objects in Java)
• Still to do:
  • Full code
  • Mappings with the MGED Ontology
  • Tests with AE
GCA on cluster architecture
• Three machines, the minimum to have a cluster
  • Master (Xeon 3.2 GHz, 2 GB RAM)
  • + Master clone that ensures high availability
  • Computation node computers (P4 3 GHz, 512 MB)
• 1 TB of SCSI disk, shared via NFS
• Based on:
  • Debian (Linux)
  • Linux Virtual Server (load balancer)
  • Heartbeat (high availability)
GCA on cluster architecture
• Code needs slight changes:
  • PHP side and sessions:
    • Objects that are saved in the session need to be reloaded properly
    • See: http://it2.php.net/manual/en/language.oop.magic-functions.php#14473
    • __wakeup() is already used
    • __sleep() with a proper return value is to be implemented
  • MySQL side:
    • The stable DB:
      • We need to specify the type of DB access: read-only mode vs. read/write mode
      • RO access uses the local copy of the DB
      • RW access uses the master copy
    • The temporary DB:
      • Only the master copy exists (port 3307, current deployment)
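The MySQL routing rule above can be sketched as a small lookup. This is an assumed illustration in Python rather than the application's PHP code: `DbTarget`, `select_target` and the host names are hypothetical, and port 3306 for the stable DB is an assumption (the MySQL default); only port 3307 for the temporary DB comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbTarget:
    host: str
    port: int


# Hypothetical host names; the slides do not name the machines.
MASTER_HOST = "master.example"
LOCAL_HOST = "localhost"

STABLE_PORT = 3306     # assumption: MySQL default port
TEMPORARY_PORT = 3307  # from the slide (current deployment)


def select_target(database, mode):
    """Pick the server for a connection, given the DB and the access mode.

    database: "stable" or "temporary"
    mode:     "ro" (read-only) or "rw" (read/write)
    """
    if mode not in ("ro", "rw"):
        raise ValueError(f"unknown access mode: {mode!r}")
    if database == "temporary":
        # Only the master copy of the temporary DB exists, whatever the mode.
        return DbTarget(MASTER_HOST, TEMPORARY_PORT)
    if database == "stable":
        # Reads are served by the node's local copy, writes go to the master.
        host = LOCAL_HOST if mode == "ro" else MASTER_HOST
        return DbTarget(host, STABLE_PORT)
    raise ValueError(f"unknown database: {database!r}")
```

Requiring callers to state the mode explicitly keeps the decision in one place: a page that only displays data asks for `"ro"` and never touches the master.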