ILDG Middleware Status Chip Watson ILDG-6 Workshop May 12, 2005
Outline • Status: small changes from Dec 2004 • Quick review of architecture • Minimal implementation facts • Next steps
Status (quick look) • Only a small amount of middleware work has been done in the last 6 months • development of new metadata catalog prototype at Adelaide based on XML database • modifications to metadata catalog prototype at Fermilab to conform to new interface • small amount of work on replica catalog prototypes at several sites (JLab, Adelaide, Fermilab) • Architecture remains unchanged
Architecture (review) • Web Services • Metadata Catalog maps metadata to a global name • Replica Catalog maps a global name to one or more instances • Storage Resource Manager (optional) manages a disk, or disk + tape, resource • Draft schemas (WSDL) for these services exist
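The two-step lookup described above (metadata to global name, then global name to instances) can be sketched with in-memory stand-ins for the two services. This is only an illustration: the metadata keys, GFNs, host names and function names below are hypothetical, and the real services are web services described by the draft WSDL schemas.

```python
# Hypothetical in-memory stand-ins for the two web services.
# Metadata Catalog: physics metadata -> global file name (GFN).
METADATA_CATALOG = [
    ({"collaboration": "example", "beta": "5.29"}, "gfn://example/ens1/cfg_0001"),
    ({"collaboration": "example", "beta": "5.40"}, "gfn://example/ens2/cfg_0001"),
]

# Replica Catalog: GFN -> one or more instances (URLs).
REPLICA_CATALOG = {
    "gfn://example/ens1/cfg_0001": [
        "http://site-a.example.org/ens1/cfg_0001",
        "ftp://site-b.example.org/ens1/cfg_0001",
    ],
    "gfn://example/ens2/cfg_0001": ["http://site-a.example.org/ens2/cfg_0001"],
}


def find_gfns(**criteria):
    """Return the GFNs whose metadata matches every given key/value pair."""
    return [
        gfn
        for meta, gfn in METADATA_CATALOG
        if all(meta.get(key) == value for key, value in criteria.items())
    ]


def find_replicas(gfn):
    """Return all known instances of a GFN (empty list if unknown)."""
    return list(REPLICA_CATALOG.get(gfn, []))


def resolve(**criteria):
    """Chain both lookups: metadata -> GFNs -> instances."""
    return {gfn: find_replicas(gfn) for gfn in find_gfns(**criteria)}


if __name__ == "__main__":
    for gfn, urls in resolve(beta="5.29").items():
        print(gfn, urls)
```

Keeping the two mappings separate is what lets a file be replicated at several sites without touching its metadata record.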
Architecture (review) • File based directories contain... • Master directory of all collaborations’ MDC, RC and membership lists, stored as XML files • Distributed group membership lists (XML) • An initial version of the schemas (XML) exists
Implementation View (diagram) • Master Directory http://www.lqcd.org/<tbd>.xml contains for each collaboration: metadata catalog, replica catalog, group membership • Diagram labels: MDC for UKQCD, MDC for USQCD, MDC for Japan • RC for UKQCD, RC for USQCD, RC for Japan • file X • UKQCD group file, USQCD group file, Japan group file • subgroup A file, subgroup B file
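As a rough illustration of how a client might read such a master directory, the sketch below parses a small XML document into a per-collaboration table. The element and attribute names, the collaboration name and the URLs are all assumptions made for this example; the actual XML schema is not reproduced in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical master directory content; the real schema may differ.
MASTER_XML = """<masterDirectory>
  <collaboration name="example">
    <metadataCatalog>http://mdc.example.org/service</metadataCatalog>
    <replicaCatalog>http://rc.example.org/service</replicaCatalog>
    <groupMembership>http://www.example.org/group.xml</groupMembership>
  </collaboration>
</masterDirectory>"""


def load_master_directory(xml_text):
    """Map each collaboration name to its service entries (tag -> URL)."""
    root = ET.fromstring(xml_text)
    directory = {}
    for collab in root.findall("collaboration"):
        entries = {child.tag: (child.text or "").strip() for child in collab}
        directory[collab.get("name")] = entries
    return directory


if __name__ == "__main__":
    for name, entries in load_master_directory(MASTER_XML).items():
        print(name, entries)
```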
MetaData Catalog • ILDG schema defines only a query interface • multiple query languages (syntax) allowed for now (no clear winner yet) • queries map from physics metadata values to Global File Name (GFN) • proposed minor modification can also return the full physics metadata
Minimal Implementation • Master XML directory to be held at www.lqcd.org/<tbd>.xml • For each collaboration, need at least these: • MetaDataCatalog (e.g. running at www.usqcd.org/<tbd>) • trivial Replica Catalog (does 1:1 name mapping) • standard web or ftp server to serve files
Getting going...(or, what must a collaboration do?) First: Deploy a metadata catalog • choose an existing prototype and deploy it • populate the catalog with QCDML v1.1 compliant documents, with ILDG compliant GFNs (global file names). Note: each name must include the collaboration name, which must match the entry name in the master directory: gfn://collaboration/local-name • ask watson@jlab.org to add your MDC to the master directory on www.lqcd.org
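The naming rule above (gfn://collaboration/local-name, with the collaboration name matching a master directory entry) can be checked mechanically. The following is a minimal sketch: the function names are hypothetical, and treating everything after the first slash as the local name (so that it may itself contain slashes) is an assumption, not something the slides specify.

```python
GFN_PREFIX = "gfn://"


def parse_gfn(gfn):
    """Split a GFN into (collaboration, local_name); raise ValueError if malformed."""
    if not gfn.startswith(GFN_PREFIX):
        raise ValueError(f"not a GFN (missing {GFN_PREFIX!r}): {gfn!r}")
    collaboration, sep, local_name = gfn[len(GFN_PREFIX):].partition("/")
    if not collaboration or not sep or not local_name:
        raise ValueError(f"expected gfn://collaboration/local-name, got {gfn!r}")
    return collaboration, local_name


def check_gfn(gfn, known_collaborations):
    """Verify that the GFN's collaboration appears in the master directory."""
    collaboration, _ = parse_gfn(gfn)
    return collaboration in known_collaborations


if __name__ == "__main__":
    print(parse_gfn("gfn://example/ens1/cfg_0001"))
    print(check_gfn("gfn://example/ens1/cfg_0001", {"example"}))
```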
Getting going...(or, what must a collaboration do?) Second: Deploy a replica catalog • (option 1) write a simple function which maps your collaboration’s GFN naming convention into a static URL pointing to the file (i.e., no database, just string shuffling) OR • (option 2) get / implement a true RC, with multiple instance tracking (a database) • request watson@jlab.org to add your RC to the master directory on www.lqcd.org Third: Serve the files (http, ftp, srm, ...)
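Option 1 above (a trivial replica catalog with no database) amounts to a pure string function. A minimal sketch, assuming a hypothetical collaboration name "example" and a hypothetical static base URL; a real deployment would substitute its own naming convention and server:

```python
from urllib.parse import quote

# Hypothetical values for illustration only.
COLLABORATION = "example"
BASE_URL = "http://www.example.org/ildg/"


def gfn_to_url(gfn, collaboration=COLLABORATION, base_url=BASE_URL):
    """Map a GFN 1:1 onto a static URL by string manipulation alone."""
    prefix = f"gfn://{collaboration}/"
    if not gfn.startswith(prefix):
        raise ValueError(f"{gfn!r} does not belong to collaboration {collaboration!r}")
    local_name = gfn[len(prefix):]
    # quote() leaves "/" intact and percent-encodes unsafe characters.
    return base_url + quote(local_name)


if __name__ == "__main__":
    print(gfn_to_url("gfn://example/ens1/cfg_0001"))
```

Because the mapping is 1:1, this approach cannot report copies held at other sites; that is what the true RC of option 2 adds.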
Nice things to also do... • Deploy a real RC, which can track another collaboration’s copies of your files • Populate a group membership file, to support group read/write access (otherwise your collaboration is relegated to “world” status) • Deploy an SRM (with protocol negotiation) and also at least one file server that supports parallel streams (gridftp, bbftp, ...) for higher performance file retrieval • Implement a web interface to your metadata catalog
Near Term Expectations • Adelaide will deploy an MDC and an RC within the next few months • USQCD will try to match this within the next 6 months, but is currently distracted by getting machines into production • others have not committed yet
Australian ILDG Node Paul Coddington School of Computer Science, University of Adelaide South Australian Partnership for Advanced Computing paul.coddington@adelaide.edu.au May 2005
Overview • A prototype ILDG node has been set up in Australia for data from the Centre for the Subatomic Structure of Matter (CSSM). • We have developed a metadata catalog, replica catalog and web portal. • Currently just allows searching, browsing and downloading of QCDML metadata • ability to download configuration files will be added later. • Metadata for around 50 ensembles is currently available.
Metadata Catalog • Ensemble and configuration QCDML metadata is generated as XML files which are loaded into Apache Xindice, an XML database. • The metadata catalog web service was developed in Java using Xindice's implementation of the XML:DB API for XML databases. • So should work with other XML databases • It (almost) conforms to the metadata catalog interface defined by the ILDG Middleware Working Group. • Added additional parameter to specify returning GFNs or XML • XPath queries are passed directly to the XML database.
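The query behaviour described above (an XPath expression passed to the XML store, with a parameter choosing between GFNs and XML as the return type) can be imitated in a few lines. This sketch uses Python's standard ElementTree, which supports only a limited XPath subset, in place of Xindice and the XML:DB API; the element names are invented for the example and are not the QCDML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata documents; not the real QCDML layout.
CATALOG_XML = """<catalog>
  <config><gfn>gfn://example/ens1/cfg_0001</gfn><beta>5.29</beta></config>
  <config><gfn>gfn://example/ens2/cfg_0001</gfn><beta>5.40</beta></config>
</catalog>"""


def query(xpath, return_xml=False):
    """Run an XPath query; return matching GFNs, or the matching XML if asked."""
    root = ET.fromstring(CATALOG_XML)
    matches = root.findall(xpath)
    if return_xml:
        return [ET.tostring(m, encoding="unicode").strip() for m in matches]
    return [m.findtext("gfn") for m in matches]


if __name__ == "__main__":
    print(query("./config[beta='5.29']"))
    print(query("./config[beta='5.29']", return_xml=True))
```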
Other Components • Replica catalog is a web service wrapper around the Replica Location Service for Globus Toolkit 3. • Plan to change this to GT4 RLS or something else. • No mechanism for downloading files yet • Will initially generate wget script, like Japanese portal. • Then investigate using SRM. • Web portal written using JSP. • http://www.sapac.edu.au/ildg/cssm/ • All software will be made freely available after code is cleaned up and documented.
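The planned wget-script approach to downloading could look roughly like the sketch below, which turns a list of file URLs into a shell script. The function name and URLs are hypothetical, and this is not the portal's actual code, which had not been released at the time of these slides.

```python
import shlex


def make_wget_script(urls):
    """Build a shell script that fetches each URL with wget."""
    lines = ["#!/bin/sh", "set -e"]
    # shlex.quote protects URLs containing shell metacharacters such as '&'.
    lines += [f"wget {shlex.quote(url)}" for url in urls]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_wget_script([
        "http://data.example.org/ens1/cfg_0001",
        "http://data.example.org/get?ens=1&cfg=2",
    ]), end="")
```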
Middleware Working Group Near Term Task List • Approve minor changes to MDC interface • Decide on the URL for, and deploy, the master directory file and the master membership file • Collect official CA certificates from all collaborations and post at www.lqcd.org for all to easily retrieve (for configuring servers for strongly authenticated operations)
Most Significant Challenges • Get data into ILDG compliant format • create, or automate the creation of, metadata compliant with QCDML v1.1 • write files in ILDG format (or write a translation program for on-the-fly translation): will LQCD application developers do this, or will manpower need to be found for translation programs? • Get the MDC operational and populated (other tasks are comparatively easy)
Other Challenges • Manpower to implement a nice user interface for browsing, and optionally retrieving files (once per collaboration, or shared, even hosted at www.lqcd.org ?) • Manpower to write some simple command line client tools to be used in workflow scripting Goal of reaching an operational status by June 2006 is still feasible!