Enabling Grids for E-sciencE
A GRID based platform to host multiple repositories for digital content
Antonio Calanducci (1), J.M. González (3), R. Ramos (2), M. Rubio (2), D. Tcaci (3)
(1) INFN Catania, (2) CETA-CIEMAT, (3) MAAT-G Knowledge
3rd EGEE User Forum, 11-14 February 2008, Clermont-Ferrand (France)
www.eu-egee.org • EGEE-II INFSO-RI-031688 • EGEE and gLite are registered trademarks
Introduction
• Need for a GRID based platform to host arbitrary repositories
• A digital repository is a set of annotated digitized data offered to users in a structured manner
• Both the digitized data and the annotations can vary greatly from one repository to another, but the following commonalities are acknowledged:
  • There is a basic informational unit of digitized data (a mammogram, a page of an ancient manuscript, a 3D model, …)
  • There is metadata around each unit of digitized data (patient info, diagnoses, translation, historical context, physical properties, …)
  • Specific algorithms process the data (searching for microcalcifications, automatic translation, …)
  • Users browse, search and update the repository, and launch algorithms (GRID WMS)
  • Data is stored in a federated way: each institution owns and manages its content
  • Metadata goes to a DB, digitized data to an archive (GRID SE)
EGEE-II INFSO-RI-031688 • 3rd EGEE User Forum, 13 Feb 08, Clermont-Ferrand 2
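To make the common structure above concrete, here is a minimal Java sketch of one informational unit. The class `RepositoryItem`, its methods and the example LFN path are hypothetical illustrations, not part of gLibrary/DRI: the item keeps its annotations as key/value metadata bound for the database and refers to the digitized file on the Storage Element only through a logical file name (LFN).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one basic informational unit: metadata destined for
// the metadata DB, plus the LFN of the digitized payload stored on a GRID SE.
final class RepositoryItem {
    private final String id;
    private final String payloadLfn;
    private final Map<String, String> metadata = new LinkedHashMap<>();

    RepositoryItem(String id, String payloadLfn) {
        this.id = id;
        this.payloadLfn = payloadLfn;
    }

    // Adds one annotation (patient info, translation, ...) and returns this item.
    RepositoryItem annotate(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    // The row that would be written to the metadata DB: the annotations plus
    // the id and a pointer (LFN) to the file, never the file content itself.
    Map<String, String> dbRecord() {
        Map<String, String> row = new LinkedHashMap<>(metadata);
        row.put("id", id);
        row.put("lfn", payloadLfn);
        return row;
    }
}

class RepositoryItemDemo {
    public static void main(String[] args) {
        // Hypothetical LFN path, for illustration only.
        RepositoryItem page = new RepositoryItem("page-042", "/grid/example-vo/manuscripts/page-042.tif")
            .annotate("translation", "(text)")
            .annotate("century", "XIV");
        System.out.println(page.dbRecord());
    }
}
```

Keeping only the LFN in the database row mirrors the split described on the slide: metadata is small and queryable, while the bulky file stays on the SE.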
Goals of gLibrary/DRI
• To host multiple repositories of arbitrary structure
• On a GRID infrastructure (security, federation, …)
• Reduce the “cost-to-deploy” and reach new communities
• Open architecture
• Easy-to-use platform with a web-based interface
• Collaboration between INFN and CETA-CIEMAT
• Builds on INFN gLibrary
INFN gLibrary
• Created by the GILDA team at INFN Catania
• Secure, robust, easy-to-use interface to handle digital assets stored in GRID SEs
• Interface to browse entries and find files in SEs
• “à la iTunes” browsing allows searches by mouse clicks
• Built on top of gLite GRID services:
  • any SRM SE, LFC, AMGA, VOMS authorization
• Authentication/Authorization
  • Via an applet that creates a proxy certificate on the user’s PC
  • The proxy is used to interact directly with GRID elements (LFC, SE, AMGA)
  • Files are transferred directly from the SE to the applet and vice versa
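The “à la iTunes” browsing can be pictured as faceted narrowing: each click fixes one attribute value, and the other columns only offer values that still occur in the filtered set. This is a sketch of that assumed behaviour, not gLibrary’s own code; `FacetBrowser` and the sample attributes are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical faceted browser over entries described by attribute maps.
final class FacetBrowser {
    private final List<Map<String, String>> entries;

    FacetBrowser(List<Map<String, String>> entries) {
        this.entries = entries;
    }

    // Entries matching every selected (attribute -> value) click.
    List<Map<String, String>> select(Map<String, String> clicks) {
        return entries.stream()
            .filter(e -> clicks.entrySet().stream()
                .allMatch(c -> c.getValue().equals(e.get(c.getKey()))))
            .collect(Collectors.toList());
    }

    // Sorted distinct values still available in one column after the clicks.
    Set<String> column(String attribute, Map<String, String> clicks) {
        return select(clicks).stream()
            .map(e -> e.get(attribute))
            .filter(v -> v != null)
            .collect(Collectors.toCollection(TreeSet::new));
    }
}

class FacetBrowserDemo {
    public static void main(String[] args) {
        FacetBrowser browser = new FacetBrowser(List.of(
            Map.of("author", "Dante", "century", "XIV"),
            Map.of("author", "Petrarca", "century", "XIV"),
            Map.of("author", "Tasso", "century", "XVI")));
        // Clicking "XIV" in the century column narrows the author column.
        System.out.println(browser.column("author", Map.of("century", "XIV"))); // [Dante, Petrarca]
    }
}
```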
[Figure: gLibrary screenshots]
gLibrary/DRI
• Extends gLibrary by:
  • Making it multi-repository
  • Imposing no predefined repository content structure: each repository describes itself
  • Decoupling navigation and management from repository specifics
• DRI: Digital Repositories Infrastructure
• A repository must provide:
  • A description of its navigational structures (trees, filters) and a viewer
  • A description of its data model
  • A storage engine (for data model persistence)
• The DRI API specification describes HOW this is provided
• A repository provider can:
  • Make its own implementation of the specification
  • Use (or extend) the default one provided
[Figure: gLibrary/DRI web interface]
[Figure: DICOM viewer]
gLibrary/DRI API specification
• A repository has to provide:
  • Data Model:
    • XML description of the repository’s data
    • Relational data model supported
    • Indication of which parts of the data model are saved in the federated DB and which in the Storage System
  • Storage Module:
    • takes care of data persistence
    • Load() and Save() methods must be provided for loading and saving instances of the data model
  • User Interface Module:
    • definition of the navigation trees and filters
    • viewer for the specific repository
gLibrary/DRI API specification
• Contract between the gLibrary/DRI platform and specific repository implementations
• Each application must provide three Java modules implementing the following interfaces:
  • DRIUIInterface for describing trees, filters and viewers
  • DRIStorageInterface for storing and retrieving data
  • DRINodeInterface for defining the repository data model
• The gLibrary/DRI engine orchestrates API calls to the different interface implementations
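The engine has to call repository modules it does not know at compile time. One plausible way to do this in Java, consistent with the “Java Introspection” entry in the technology list, is to register each module by class name and instantiate it by reflection. This is a sketch under that assumption: `UIModule` is a hypothetical one-method reduction of `DRIUIInterface`, and `MGPlusUIModule` and `ModuleRegistry` are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, one-method stand-in for DRIUIInterface.
interface UIModule {
    List<String> treeNames(String repositoryName);
}

// Example provider module; in practice it would ship in the provider's jar.
class MGPlusUIModule implements UIModule {
    public MGPlusUIModule() {}

    @Override
    public List<String> treeNames(String repositoryName) {
        return List.of("alphabetical", "pathologies");
    }
}

// The engine stores only class names and instantiates modules by reflection,
// so it needs no compile-time knowledge of any repository.
final class ModuleRegistry {
    private final Map<String, String> uiClassByRepository = new HashMap<>();

    void register(String repository, String uiClassName) {
        uiClassByRepository.put(repository, uiClassName);
    }

    UIModule uiFor(String repository) {
        String className = uiClassByRepository.get(repository);
        if (className == null) {
            throw new IllegalArgumentException("unknown repository: " + repository);
        }
        try {
            // asSubclass throws ClassCastException if the contract is not implemented
            return Class.forName(className)
                .asSubclass(UIModule.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }
}

class ModuleRegistryDemo {
    public static void main(String[] args) {
        ModuleRegistry engine = new ModuleRegistry();
        engine.register("mgplus", MGPlusUIModule.class.getName());
        System.out.println(engine.uiFor("mgplus").treeNames("mgplus"));
    }
}
```

Registering by class name keeps the engine and the repository modules decoupled: a new repository only needs to put its classes on the engine's classpath.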
gLibrary/DRI UI API extract
public interface DRIUIInterface {
    public Vector<Tree> getRepositoryTrees(String repositoryName);
    public TreeHierarchy getTreeHierarchy(String treeName);
    public Vector getFilterNameInstances();
    public Vector<FilterEntry> getFilterEntries(String filterName);
    public void loadViewer(String viewerClass);
}

public class MyRepositoryUI implements DRIUIInterface {
    public Vector<Tree> getRepositoryTrees(String repositoryName) {
        // access repository config file/db/etc. to get tree data
        …
        // Vector has no varargs constructor: build the result element by element
        Vector<Tree> trees = new Vector<Tree>();
        trees.add(new Tree("By author"));
        trees.add(new Tree("By date"));
        return trees;
    }
    …
}
gLibrary/DRI Engine Orchestration
Registered repositories: the engine queries each repository’s UI module (MGUI in the example):
• MGUI.getRepositoryTrees(): what are your navigation trees?
• MGUI.getFilterNameInstances(): what are your filters?
• MGUI.getFilterEntries(): what are the possible values for the selected filter?
• MGUI.loadViewer(): return an applet with the viewer application to display and manipulate the selected repository item
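The question-and-answer sequence above amounts to a loop over the registered repositories. This sketch assumes a simplified `RepositoryUI` interface: the method names follow the slide, return types are reduced to strings, and `viewerClass()` stands in for `loadViewer()`. `NavigationBuilder` and `DemoUI` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified version of the UI contract.
interface RepositoryUI {
    List<String> getRepositoryTrees(String repositoryName);
    List<String> getFilterNameInstances();
    List<String> getFilterEntries(String filterName);
    String viewerClass();
}

// Engine side: everything it shows comes from what the repository reports.
final class NavigationBuilder {
    static List<String> describe(Map<String, RepositoryUI> registered) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, RepositoryUI> repo : registered.entrySet()) {
            RepositoryUI ui = repo.getValue();
            lines.add(repo.getKey() + " trees=" + ui.getRepositoryTrees(repo.getKey()));
            for (String filter : ui.getFilterNameInstances()) {
                lines.add("  filter " + filter + "=" + ui.getFilterEntries(filter));
            }
            lines.add("  viewer=" + ui.viewerClass());
        }
        return lines;
    }
}

// Hypothetical repository used only to drive the sketch.
class DemoUI implements RepositoryUI {
    public List<String> getRepositoryTrees(String repositoryName) { return List.of("By author", "By date"); }
    public List<String> getFilterNameInstances() { return List.of("century"); }
    public List<String> getFilterEntries(String filterName) { return List.of("XIV", "XV"); }
    public String viewerClass() { return "ManuscriptViewerApplet"; }
}

class NavigationDemo {
    public static void main(String[] args) {
        Map<String, RepositoryUI> registered = new LinkedHashMap<>();
        registered.put("manuscripts", new DemoUI());
        NavigationBuilder.describe(registered).forEach(System.out::println);
    }
}
```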
gLibrary/DRI Storage API
public interface DRIStorageInterface {
    public DRIGenericNode Load(String Id);
    public void Remove(String Id);
    public void CreateNew(DRIGenericNode Node);
    public void Save(DRIGenericNode Node);
}

public class MyRepositoryStorage implements DRIStorageInterface {
    // MyRepositoryNode must extend DRIGenericNode for this override to compile
    public MyRepositoryNode Load(String id) {
        // access db, GRID SE, etc. and assemble one instance of the data model
        …
        MyRepositoryNode node = new MyRepositoryNode(db, data, …);
        return node;
    }
    …
}
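To exercise the storage contract without AMGA or a Storage Element, this sketch implements the slide’s `DRIStorageInterface` with an in-memory map. `DRIGenericNode` here is a minimal stand-in (the real class is not shown on the slide), and the error behaviour of `CreateNew()` and `Save()` (reject duplicates, reject unknown ids) is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the platform's DRIGenericNode: an id plus field values.
class DRIGenericNode {
    private final String id;
    final Map<String, Object> fields = new HashMap<>();

    DRIGenericNode(String id) { this.id = id; }

    String getId() { return id; }
}

// The storage contract as listed on the slide.
interface DRIStorageInterface {
    DRIGenericNode Load(String Id);
    void Remove(String Id);
    void CreateNew(DRIGenericNode Node);
    void Save(DRIGenericNode Node);
}

// In-memory implementation for testing only; a real module would write to
// AMGA and the GRID SEs instead of a HashMap.
class InMemoryStorage implements DRIStorageInterface {
    private final Map<String, DRIGenericNode> nodes = new HashMap<>();

    @Override
    public DRIGenericNode Load(String id) {
        DRIGenericNode node = nodes.get(id);
        if (node == null) throw new IllegalArgumentException("no node " + id);
        return node;
    }

    @Override
    public void Remove(String id) { nodes.remove(id); }

    @Override
    public void CreateNew(DRIGenericNode node) {
        if (nodes.putIfAbsent(node.getId(), node) != null) {
            throw new IllegalStateException("node exists: " + node.getId());
        }
    }

    @Override
    public void Save(DRIGenericNode node) {
        if (!nodes.containsKey(node.getId())) {
            throw new IllegalStateException("unknown node: " + node.getId());
        }
        nodes.put(node.getId(), node);
    }
}

class StorageDemo {
    public static void main(String[] args) {
        DRIStorageInterface storage = new InMemoryStorage();
        DRIGenericNode patient = new DRIGenericNode("42");
        patient.fields.put("PatientName", "Jane Doe");
        storage.CreateNew(patient);
        patient.fields.put("PatientAge", 51);
        storage.Save(patient);
        System.out.println(storage.Load("42").fields.get("PatientAge")); // prints 51
    }
}
```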
gLibrary/DRI default implementation
• We provide a default implementation for the UI and Storage APIs:
  • public class DRIUIModule implements DRIUIInterface
  • public class DRIStorageModule implements DRIStorageInterface
• UI default implementation:
  • Loads repository trees from AMGA
  • Loads filter definitions from AMGA
  • Loads field display definitions from AMGA
• Storage default implementation:
  • Reads the repository data model from an XML file
  • Stores/loads the data model in AMGA and specially marked items in SEs
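The point of the default modules is that a repository can inherit all behaviour and supply only configuration. In this sketch, `MetadataCatalog` is a hypothetical stand-in for AMGA (it is not the AMGA client API), and passing the configuration root through the constructor is an assumption; the `/ceta/mgplus/config/trees` path is the one shown in the MGPlus AMGA dump.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the AMGA catalogue: "list a collection's entries".
@FunctionalInterface
interface MetadataCatalog {
    List<String> listCollection(String path);
}

// Sketch of a default UI module: tree definitions are looked up in the
// catalogue under <configRoot>/trees.
class DefaultUIModule {
    private final MetadataCatalog catalog;
    private final String configRoot;

    DefaultUIModule(MetadataCatalog catalog, String configRoot) {
        this.catalog = catalog;
        this.configRoot = configRoot;
    }

    List<String> getRepositoryTrees() {
        return catalog.listCollection(configRoot + "/trees");
    }
}

// A repository relying entirely on the defaults: it overrides no behaviour
// and only supplies its configuration root.
class MGPlusDefaultUI extends DefaultUIModule {
    MGPlusDefaultUI(MetadataCatalog catalog) {
        super(catalog, "/ceta/mgplus/config");
    }
}

class DefaultUIDemo {
    public static void main(String[] args) {
        // In-memory map playing the role of AMGA for this sketch.
        Map<String, List<String>> amgaStub = Map.of(
            "/ceta/mgplus/config/trees", List.of("alphabetical", "pathologies"));
        MetadataCatalog catalog = path -> amgaStub.getOrDefault(path, List.of());
        System.out.println(new MGPlusDefaultUI(catalog).getRepositoryTrees());
    }
}
```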
XML Data model definition example
• public class MyRepStorageModule extends DRIStorageModule {}
• public class MyRepNode extends DRIGenericNode {}
• The DRI Storage module reads the data model from XML files:
<TableName name="StoragePatient" primaryIdAttr="PatientID" foreignAttr="NULL">
  <attr name="PatientID">
    <dbAttrName>PatientID</dbAttrName>
    <type>int</type>
    <dbAttrType>int</dbAttrType>
  </attr>
  <attr name="PatientName">
    <dbAttrName>PatientName</dbAttrName>
    <type>String</type>
    <dbAttrType>Varchar(80)</dbAttrType>
  </attr>
  <attr name="AGE">
    <dbAttrName>PatientAge</dbAttrName>
    <type>int</type>
    <dbAttrType>int</dbAttrType>
  </attr>
  <attr name="studies">
    <dbAttrName>studies</dbAttrName>
    <type>Entity</type>
    <refEntity>StorageStudy</refEntity>
  </attr>
</TableName>
<TableName name="StorageStudy" primaryIdAttr="StorageID" foreignAttr="PatientID">
  <attr name="StorageID">
    <dbAttrName>StorageID</dbAttrName>
    <type>int</type>
    <dbAttrType>int</dbAttrType>
  </attr>
  <attr name="Diagnose">
    <dbAttrName>Diagnose</dbAttrName>
    <type>String</type>
    <dbAttrType>Varchar(255)</dbAttrType>
  </attr>
  <attr name="Mammogram">
    <dbAttrName>Mammogram</dbAttrName>
    <type>LFN</type>
    <dbAttrType>Varchar(255)</dbAttrType>
  </attr>
</TableName>
• DRIStorageModule stores regular fields in AMGA
• DRIStorageModule stores specially marked fields in a GRID Storage Element and registers them in the File Catalog
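The split between AMGA and the Storage Element can be derived mechanically from the XML. This sketch uses the JDK’s DOM parser to route each `<attr>` of one `<TableName>`. Treating `LFN`-typed fields as the “specially marked” SE fields and `Entity`-typed fields as references to another table is an assumption based on the example, and `DataModelReader` and its routing labels are hypothetical. The parser needs well-formed input: quoted attribute values and a single root element, so each table is parsed separately here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Routes each attribute of a table definition to "AMGA", "SE" or "ref:<table>".
final class DataModelReader {
    static Map<String, String> route(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> routing = new LinkedHashMap<>();
            NodeList attrs = doc.getElementsByTagName("attr");
            for (int i = 0; i < attrs.getLength(); i++) {
                Element attr = (Element) attrs.item(i);
                String type = text(attr, "type");
                String target;
                switch (type) {
                    case "LFN": target = "SE"; break;                          // file on a Storage Element
                    case "Entity": target = "ref:" + text(attr, "refEntity"); break; // link to another table
                    default: target = "AMGA";                                  // regular metadata field
                }
                routing.put(attr.getAttribute("name"), target);
            }
            return routing;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid data model XML", e);
        }
    }

    // Trimmed text of the first child element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}

class DataModelDemo {
    public static void main(String[] args) {
        String study =
            "<TableName name=\"StorageStudy\" primaryIdAttr=\"StorageID\" foreignAttr=\"PatientID\">"
            + "<attr name=\"StorageID\"><type>int</type></attr>"
            + "<attr name=\"Diagnose\"><type>String</type></attr>"
            + "<attr name=\"Mammogram\"><type>LFN</type></attr>"
            + "</TableName>";
        System.out.println(DataModelReader.route(study)); // {StorageID=AMGA, Diagnose=AMGA, Mammogram=SE}
    }
}
```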
Using the UI default implementation
• public class MyRepUIModule extends DRIUIModule {} (note the EMPTY implementation: it extends DRIUIModule instead of implementing DRIUIInterface)
• AMGA dump
  • Collection where the MGPlus trees are stored:
    /ceta/mgplus/config/trees
  • Content:
    /ceta/mgplus/config/trees/alphabetical (Collection): alphabetical patient tree definition
      > ls
      Query> getattr 0 tag parentid path filter fields
      Contents of the alphabetical tree:
      >> FromAtoD
      >> FromEtoJ
      >> FromKtoO
      >> FromPtoU
      >> FromVtoZ
    /ceta/mgplus/config/trees/pathologies (Collection): pathologies tree definition
      > ls
      >> 0
      Query> getattr 0 tag parentid path filter fields PathologyId
      Contents of the pathologies tree:
      >> Benign
      >> TumorMorphology
      >> Spread
      >> Microcalcifications
      >> study
      Filter definition for the Microcalcifications branch:
      >> ‘/ceta/mgplus/data/patient/study:PathologyId=0 and /ceta/mgplus/data/patient:MGPlusPatientId=/ceta/mgplus/data/patient/study:MGPlusStudyId’
      /ceta/mgplus/data/patient:MGPlusPatientId,PatientId,PatientName,Gender,AgeAtMenarche,AgeAtMenopause
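The branch names of the alphabetical tree encode letter ranges. As an illustration only, this sketch assigns a patient name to a branch by parsing the range out of each `FromXtoY` name; in the platform, branch contents are presumably defined by the filter attributes stored in AMGA, and `AlphabeticalTree` is a hypothetical helper.

```java
import java.util.List;

// Hypothetical helper mapping a patient name to an alphabetical tree branch.
final class AlphabeticalTree {
    static final List<String> BRANCHES =
        List.of("FromAtoD", "FromEtoJ", "FromKtoO", "FromPtoU", "FromVtoZ");

    // Returns the branch whose letter range contains the name's first letter,
    // or null if it falls outside every range (digits, accented letters, ...).
    static String branchFor(String patientName) {
        if (patientName == null || patientName.isEmpty()) {
            throw new IllegalArgumentException("empty name");
        }
        char first = Character.toUpperCase(patientName.charAt(0));
        for (String branch : BRANCHES) {
            char from = branch.charAt(4);                 // letter after "From"
            char to = branch.charAt(branch.length() - 1); // letter after "to"
            if (first >= from && first <= to) {
                return branch;
            }
        }
        return null;
    }
}

class AlphabeticalTreeDemo {
    public static void main(String[] args) {
        System.out.println(AlphabeticalTree.branchFor("garcia")); // FromEtoJ
    }
}
```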
Mammography repository example
• Goal: a GRID based repository for mammograms, patient history and collaborative diagnoses
• Uses the UI and Storage default implementations
• Provides its own viewer, which accepts an MGPlusNode:
  • Based on the open-source TUDOR DICOM viewer
  • Adapted to comply with the DRI API
  • Converted into an applet
  • Extended functionality (display of specific patient data, annotations directly on the mammograms, etc.)
  • Save() method transfers data files directly to/from the SEs using GridFTP
[Figure: Repository-specific viewer]
[Figure: gLibrary/DRI architecture]
Technologies
• Web 2.0 interface (AJAX)
• PHP 5 for the front-end engine
• Java Servlets for the back-end DRI engine
• Use of a Java-PHP bridge
• Applets
  • For user authentication with their VO certificate
  • For viewer implementations
• Java Introspection
• XML
• gLite Java APIs: AMGA, LFC wrappers, JGlobus GridFTP client
Where we are
• Engine deployed and working; API and default implementation working
• MGPlus repository implemented on DRI
• Current work:
  • Interface to launch and manage jobs on the Grid WMS
  • Generic uploader
Conclusions and future work
• The APIs and the default implementation effectively reduce the cost of deploying a repository. New repository providers must:
  • Provide empty implementations of UI and Storage (very easy)
  • Describe their data model in XML (very easy)
  • Adapt or write a viewer (difficult)
• The platform provides:
  • A generic multi-repository platform, making GRID facilities easily accessible
  • A way to attract new communities, with ease of hosting
• Future work:
  • SOA and JSR 170 compliance
  • Generic viewer and tree management interface (almost ZERO cost for repository providers)
  • EELA-II official Digital Library product
Contacts
• Mailing list:
  • glibrary@ct.infn.it
• Authors:
  • antonio.calanducci@ct.infn.it
  • manuel.rubio@ciemat.es
  • raul.ramos@ciemat.es
  • dtcaci@maat-g.com
  • jmgonzalez@maat-g.com
• Prototypes:
  • https://glibrary.ct.infn.it (INFN gLibrary platform)
  • https://dri-dev.ceta-ciemat.es (gLibrary/DRI platform)
Questions?