Experience of the SRB in support of collaborative grid computing • Martin Dove • University of Cambridge
A voyage of discovery • We aimed to focus on grid computing to support molecular-scale simulations ... • ... but discovered the important role of data and information delivery • We thought that the SRB would provide a means to archive data ... • ... but discovered that it could be much more useful than that • The SRB has radically changed our view of how we should carry out the scientific process
My view of eScience Computing grids Collaborative grids Data grids
Science beyond the lab-book • Management of too many tasks • Management of the resultant data deluge • Sharing the information content with collaborators • Maintaining accuracy and verification
Expansion of calcite Neutron diffraction experiments 5% increase in c small decrease in a
BaCO3: lattice parameters Molecular dynamics simulations on the NGS
Challenge for the researcher • Short-term collation of the data • Longer-term management of the data • Sharing the data with collaborators
SRB and grid computing • It was important to build the data grid – in our case the SRB – into the heart of the computing grid environment • Then we needed tools that make the integration of the data and compute grids seamless, and that are easy to use and non-intrusive
Profile of our users • They want maximum control over their work processes – they don’t want to access them through portals or GUIs • They also don’t want their applications pre-wrapped as services: they want to have complete control over their applications, e.g. to add capability • They know what they are doing ... • ... and they don’t want to be told how to do things!
[Architecture diagram. Labels: Researcher; Internet; Globus (four gateways); compute clusters, desktop pools, campus grids and parallel (HPC) clusters, each behind a Condor or cluster job manager; access to external facilities and grids; four data vaults; application server] • Globus is used a) to provide user authentication via digital certificates and b) as job submission middleware • Our data grid is based on the San Diego Storage Resource Broker • The application server provides databases and server capabilities for the SRB, metadata tools, and the job submission tool
Job submission process • We have developed RMCS to run the job submission process • It integrates with the use of the data grid, specifically with the SRB • RMCS can be run from the user’s desktop via a shell-command client tool
[Workflow diagram. Labels: Researcher; data vault; grid compute resources; application server] 1. Upload data files and application to data vault 2. Submit job to grid via RMCS 3. Data files and application are transferred to the grid resource 4. Job runs on grid compute resources 5. Metadata is sent to the application server 6. Output files are transferred to the data vault 7. Researcher interacts with the metadata database to extract core output values
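The seven numbered steps on this slide can be walked through with a toy, in-memory model. This is only a sketch: `ToyGrid`, `run_job` and `total_energy` are hypothetical names, plain dictionaries stand in for the SRB data vault, the compute resource and the metadata database, and nothing here calls RMCS, Globus or the SRB.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGrid:
    """In-memory stand-ins for the data vault, compute resource and metadata store."""
    vault: dict = field(default_factory=dict)     # data vault: path -> content
    scratch: dict = field(default_factory=dict)   # working area on the compute resource
    metadata: dict = field(default_factory=dict)  # metadata database: job -> values
    log: list = field(default_factory=list)       # order in which steps happened

    def run_job(self, job_name, inputs, application):
        # 1. Upload data files and application to the data vault.
        for name, content in inputs.items():
            self.vault[f"{job_name}/{name}"] = content
        self.vault[f"{job_name}/app"] = application
        self.log.append("upload")
        # 2. Submit the job (in this toy model, just record the request).
        self.log.append("submit")
        # 3. Transfer data files and application to the compute resource.
        self.scratch = {name: self.vault[f"{job_name}/{name}"] for name in inputs}
        app = self.vault[f"{job_name}/app"]
        self.log.append("stage-in")
        # 4. Run the job on the compute resource.
        outputs = app(self.scratch)
        self.log.append("run")
        # 5. Send metadata (here: the numeric outputs) to the metadata store.
        self.metadata[job_name] = {
            key: value for key, value in outputs.items()
            if isinstance(value, (int, float))
        }
        self.log.append("metadata")
        # 6. Transfer all output files back to the data vault.
        for name, content in outputs.items():
            self.vault[f"{job_name}/out/{name}"] = content
        self.log.append("stage-out")
        # 7. The researcher queries the metadata store for core output values.
        self.log.append("query")
        return self.metadata[job_name]


def total_energy(files):
    """Hypothetical application: sum the numbers in one input file."""
    values = [float(token) for token in files["energies.dat"].split()]
    return {"total": sum(values), "report.txt": f"n={len(values)}"}


if __name__ == "__main__":
    grid = ToyGrid()
    print(grid.run_job("job1", {"energies.dat": "1.0 2.0 3.0"}, total_energy))
```

The point of the model is the ordering: the researcher only touches the vault (step 1) and the metadata store (step 7), while everything in between happens without manual file copying.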
Parameter sweeps We have Perl programs that • implement bulk file upload to the SRB or other data grid • generate a set of RMCS input files • submit all the RMCS jobs Bulk job creation and submission is a one-command procedure
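The "generate a set of input files" step of a parameter sweep can be sketched as follows. The talk's own tools were written in Perl and are not shown; this is a Python illustration under stated assumptions. The function name `make_sweep_inputs`, the `job_NNNN/job.in` layout and the `key = value` file format are all hypothetical and do not reflect the real RMCS input format. Upload and submission are left out deliberately.

```python
import itertools
import tempfile
from pathlib import Path


def make_sweep_inputs(base_dir, parameters):
    """Write one input file per combination of parameter values.

    `parameters` maps a parameter name to the list of values to sweep over.
    Returns the files written, in a deterministic order.
    """
    base = Path(base_dir)
    names = sorted(parameters)
    written = []
    combinations = itertools.product(*(parameters[name] for name in names))
    for index, values in enumerate(combinations):
        job_dir = base / f"job_{index:04d}"
        job_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical "key = value" layout; a real RMCS input file differs.
        lines = [f"{name} = {value}" for name, value in zip(names, values)]
        job_file = job_dir / "job.in"
        job_file.write_text("\n".join(lines) + "\n")
        written.append(job_file)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        sweep = {"temperature_K": [100, 300, 500], "pressure_GPa": [0, 1]}
        for path in make_sweep_inputs(scratch, sweep):
            print(path.relative_to(scratch))
```

Keeping one directory per job makes the later bulk upload and bulk submission a simple loop over the returned list, which is what lets the whole sweep collapse into a single command.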
Data and information ? XML data representation instead
SciSpace.net [Diagram. Labels: Researcher A; Researcher B; data vault] • Instant messaging • Access Grid with JMAST • Upload XML data files to data vault for sharing with collaborator • View information content of data files using ccViz
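Why an XML representation helps a collaborator get at the information content can be shown with a short sketch. The element and attribute names (`simulation`, `property`, `name`, `units`) and the numbers are invented for illustration; they are not the schema used in the talk and not results from it, and this is not how ccViz works internally.

```python
import xml.etree.ElementTree as ET

# Illustrative file content only: made-up schema and made-up numbers.
SAMPLE = """<simulation>
  <property name="a" units="angstrom">5.0</property>
  <property name="c" units="angstrom">17.0</property>
</simulation>"""


def core_values(xml_text):
    """Return {name: (value, units)} for every <property> element."""
    root = ET.fromstring(xml_text)
    return {
        prop.get("name"): (float(prop.text), prop.get("units"))
        for prop in root.iter("property")
    }


if __name__ == "__main__":
    for name, (value, units) in core_values(SAMPLE).items():
        print(f"{name} = {value} {units}")
```

Because each value carries its own name and units, a collaborator who downloads the file from the data vault does not need to know the layout of the program's raw output to read it.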
SRB: some early positives • When we started, it was the only show in town to facilitate easy data sharing • It was affordable in terms of capital and person ££££ • It is easily extended through addition of new vaults • It proved easy to use
Anecdote: Lucy’s project Lucy was a third-year project student, and we let her perform her project using all our grid infrastructure with no compromises • Lucy learned to use the SRB-based data grid very easily • Using our data tools, she was able to provide me with remote access to the information content of her data very easily
Some caveats • We didn’t actually need to federate or distribute different data sources ... • ... and by distributing our data we discovered that such an approach gives an unnecessary weak link and issues of ownership • We didn’t need the access-control tools, nor the data replication tools, in which case some of the infrastructure was heavier than needed
So what is different now? • We now expect to be able to share our data with collaborators ... • ... and we expect this to be easy (i.e. not via a multi-stage process) • We now routinely produce complete archives of all files associated with a study easily and automatically, rather than have stuff dumped to our desktops • And we now expect a single place to deposit data, and for this process to be easy and automatic
Summary • The SRB was critical to the successes of the eMinerals project • The SRB was easy to use, and affordable • We have developed some tools on top of the SRB to make access, display of data, and access control easier (e.g. WebDAV access, web interface) • The SRB has radically changed the way we think about managing data – but I don’t think that this was an easy change to acquire
Credits Cambridge: Kat Austen, Richard Bruin, Mark Calleja, Gen-Tao Chiang, Ian Frame, Peter Murray-Rust, Toby White, Andrew Walker STFC: Kerstin Kleese van Dam, Phil Couch, Tom Mortimer-Jones, Rik Tyer Funded by NERC