TeraGrid Gateway User Concept – Supporting Users. V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai. Oak Ridge National Laboratory. In collaboration with many teams: NSTG, SNS Scientific Computing, McStas group, Open Science Grid, Tech-X Corp, and the TeraGrid Partners teams.
What is a Science Gateway? • A Science Gateway • Enables scientific communities of users with a common scientific goal • Uses high performance computing • Has a common interface • Leverages community investment • Three common forms: • Web-based Portals • Application programs running on users' machines but accessing services in TeraGrid • Coordinated access points enabling users to move seamlessly between TeraGrid and other grids.
How can a Gateway help? • Make science more productive • Researchers use same tools • Complex workflows • Common data formats • Data sharing • Bring TeraGrid capabilities to the broad science community • Lots of disk space • Lots of compute resources • Powerful analysis capabilities • A nice interface to information
What is the TeraGrid? • 20 computers at 11 facilities • 10 Gbps network • Over a petaflop of computing power • 136,470 CPU cores • 60 petabytes of long-term storage • Growing [Figure: map of TeraGrid sites: UW, UC/ANL, PSC, NCAR, PU, NCSA, UNC/RENCI, IU, Caltech, ORNL, NICS, USC/ISI, SDSC, LONI, TACC. Legend: Resource Provider (RP); Software Integration Partner; Grid Infrastructure Group (GIG), UChicago]
Neutron Science TeraGrid Gateway • Focus is neutron science • Connects facilities with cyberinfrastructure • Bridges cyberinfrastructure • Combines TeraGrid computational resources with neutron datasets • Data movement across TeraGrid • Outreach to neutron science
Community Certificate and Account • Gateways with community accounts scale to thousands of facility users • Have Jimmy Neutron community accounts on 14 TeraGrid computers • Use Jimmy Neutron Community Certificate from SNS community account • Record end-user identification for auditing and return of results [Figure: Community Account]
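Because every job runs under one shared community account, the gateway itself has to remember which portal user asked for each job. The following is a minimal sketch of that bookkeeping, not the gateway's actual implementation: the class `AuditLog`, its methods, and the account, user, job, and resource names are all hypothetical placeholders.

```python
from datetime import datetime, timezone


class AuditLog:
    """Maps jobs run under one shared community account back to portal users."""

    def __init__(self, community_account):
        self.community_account = community_account  # hypothetical account name
        self._records = {}

    def record(self, job_id, portal_user, resource):
        """Store who requested a job, where it ran, and when it was submitted."""
        self._records[job_id] = {
            "portal_user": portal_user,
            "resource": resource,
            "run_as": self.community_account,
            "submitted_utc": datetime.now(timezone.utc).isoformat(),
        }

    def owner_of(self, job_id):
        """Return the portal user whose results this job produced."""
        return self._records[job_id]["portal_user"]

    def jobs_for(self, portal_user):
        """List job ids submitted on behalf of one portal user (for auditing)."""
        return sorted(
            job_id
            for job_id, rec in self._records.items()
            if rec["portal_user"] == portal_user
        )


# Hypothetical usage: two portal users share one community account.
log = AuditLog("community_account")
log.record("job-001", "alice", "resource-a")
log.record("job-002", "bob", "resource-b")
log.record("job-003", "alice", "resource-b")
```

With a record like this, results can be routed back to the right person (`log.owner_of("job-002")`) and a resource provider's audit request for one user can be answered (`log.jobs_for("alice")`).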
Before Gateways • Large facilities: • Recorded histogram data from experiments • Users: • Took their data home on floppy disk in pocket • Saved permanent copy on hard disk • Did not have event data to change histogram • Translated data into format needed for analysis • Wrote their own code to read and analyze data • Installed discipline focused software on their PC for analysis • Installed plotting programs/libraries to plot analysis output
After Gateways • Large facilities: • Record data from multiple facilities • SNS, HFIR, LENS, IPNS, LUJAN, … • Save permanent copy of raw data • Bin event data into histogram • Translate data into standard NeXus format • Have analysis and simulation programs available from portal • Use remote TeraGrid cyberinfrastructure for computations • Have visualization capability in portal • Users: • Use portal from web for all data, analysis, and visualization
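Keeping the raw event data matters because the same events can be rebinned into a different histogram later, which users with only a recorded histogram could not do. A minimal sketch of that binning step follows, assuming each event is reduced to a single time-of-flight value; the function name `bin_events`, the event values, and the bin edges are illustrative and not taken from the facility software.

```python
from bisect import bisect_right


def bin_events(tof_values, bin_edges):
    """Count events per time-of-flight bin; bin_edges must be sorted ascending.

    Bin i covers the half-open interval [bin_edges[i], bin_edges[i + 1]).
    """
    counts = [0] * (len(bin_edges) - 1)
    for tof in tof_values:
        # Skip events that fall outside the histogram range.
        if tof < bin_edges[0] or tof >= bin_edges[-1]:
            continue
        counts[bisect_right(bin_edges, tof) - 1] += 1
    return counts


# Hypothetical event list (microseconds), histogrammed two different ways.
events = [120.0, 450.5, 451.0, 980.2, 1500.0, 1999.9]
coarse = bin_events(events, [0.0, 1000.0, 2000.0])
fine = bin_events(events, [0.0, 500.0, 1000.0, 1500.0, 2000.0])
```

Here `coarse` and `fine` come from the same stored events, so changing the binning needs no new measurement.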
Gateway Savings • Users do not duplicate efforts • Facilities do not duplicate efforts • Data is not lost • Data is easily shared • Natural way to integrate community-contributed, instrument-specific software, hosting it and making it widely available to many facility users • Analysis is done quickly on high performance computers
Simulation Service • Simulation of neutron instruments is available in the portal with McStas • Simulations agree with experimental results • Linear scaling to 1024 cores • Output is NeXus • Use cases: • Instrument design and construction • Experiment planning and analysis
Fitting Service • Fits theoretical models to the NeXus data files from the experiments • Adaptive nonlinear least squares algorithm implemented in parallel • Linear speedup to 32 cores • Service to run on TeraGrid
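The slide does not state the objective function. As an illustration only, a common weighted least-squares form is shown below, where $y_i$ and $\sigma_i$ are the measured values and their uncertainties, $f(x_i; \mathbf{p})$ is the theoretical model with parameters $\mathbf{p}$, and $I_1, \dots, I_P$ is an assumed partition of the $N$ data points among $P$ processors. Because the sum splits into independent partial sums, each processor can evaluate its own share before a final reduction, which is one plausible source of the near-linear speedup.

```latex
\chi^2(\mathbf{p})
  = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i; \mathbf{p})}{\sigma_i} \right)^2
  = \sum_{k=1}^{P} \; \sum_{i \in I_k} \left( \frac{y_i - f(x_i; \mathbf{p})}{\sigma_i} \right)^2
```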
Reduction Service • Reduction software is available for backscattering and reflectometry through the portal • Calculations will be sent to the local cluster and TeraGrid • Attempted to parallelize this calculation by distributing regions of the time-of-flight range to each processor • Each processor reads only its region of the NeXus input data file and writes a new file containing only that region • Each processor performs the data reduction on its file • The results are merged at the end of the calculation [Figures: Reflectometry Data Reduction; Backscattering Data Reduction]
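The split, reduce, and merge pattern described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the portal's reduction code: in-memory lists stand in for the per-region NeXus files, a thread pool stands in for the separate processors, and `reduce_region` is a placeholder (normalizing counts by monitor counts) for the real reduction. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(n_bins, n_workers):
    """Return contiguous (start, stop) index pairs that cover range(n_bins)."""
    base, extra = divmod(n_bins, n_workers)
    regions, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional bin each.
        stop = start + base + (1 if rank < extra else 0)
        regions.append((start, stop))
        start = stop
    return regions


def reduce_region(counts, monitor):
    """Placeholder reduction: normalize each bin by its monitor count."""
    return [c / m if m else 0.0 for c, m in zip(counts, monitor)]


def parallel_reduce(counts, monitor, n_workers):
    """Reduce each time-of-flight region independently, then merge in order."""
    regions = split_regions(len(counts), n_workers)

    def work(region):
        start, stop = region
        # Each worker touches only its own slice of the input.
        return reduce_region(counts[start:stop], monitor[start:stop])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(work, regions))  # map preserves region order

    merged = []
    for part in parts:
        merged.extend(part)
    return merged


# Hypothetical histogram and monitor counts.
result = parallel_reduce([2, 4, 6, 8, 10], [2, 2, 2, 2, 0], n_workers=2)
```

This decomposition only works unchanged when the reduction of one time-of-flight bin does not depend on bins in another region; a reduction that mixes neighboring bins would need overlap between regions.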
Job Information Service • Portal Job Information Service tells where a job is running, when it started, and its status • Daily tests submit five simultaneous remote portal jobs • Success rate is above 82%
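The success percentage for such daily tests is simply the fraction of submitted jobs that completed. A small sketch of that arithmetic, using made-up outcomes rather than the measured results (the function name and the data are hypothetical):

```python
def success_percentage(outcomes):
    """Return the percentage of True entries in a list of job outcomes."""
    if not outcomes:
        raise ValueError("no job outcomes recorded")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical outcomes for two days of five simultaneous test jobs each.
day_one = [True, True, True, True, False]
day_two = [True, True, True, True, True]
overall = success_percentage(day_one + day_two)
```

Pooling the jobs across days, as `overall` does, weights every job equally; averaging the per-day percentages gives the same number only when each day submits the same count of jobs.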
Tests of Remote Job Submission • Difficult to diagnose the problem from the Globus output. • Check the status of the computer • Look at the output files • Some problems diagnosed: • Updated software on a computer that required relinked executables • Globus software setting the wrong time limit • Batch prologue script that killed jobs on same core • Long queue waits • Firewall installed
Conclusions • Gateways help facilities scale to a large number of users • Gateways give facilities access to high performance computing such as the TeraGrid • Gateways enable a scientific community to use community software through a common interface • Researchers are more productive if they use the same tools, use a common data format, and share data easily