PRAGMA Grid – Lessons Learned Cindy Zheng, David Abramson, Peter Arzberger, Shahaan Ayyub, Colin Enticott, Slavisa Garic, Wojtek Goscinski, Mason J. Katz, Bu Sung Lee, Phil M. Papadopoulos, Sugree Phatanapherom, Somsak Sriprayoonsakul, Yoshio Tanaka, Yusuke Tanimura, Osamu Tatebe, Putchong Uthayopas and the whole PRAGMA Grid team Pacific Rim Application and Grid Middleware Assembly http://www.pragma-grid.net http://goc.pragma-grid.net
Overview • PRAGMA • PRAGMA Grid • People • Hardware • Software • Operations • Grid Applications • Grid Middleware • Security • Infrastructure • Services • Grid Interoperations Heterogeneity People Collaborations Integrations Lessons learned
PRAGMA “A Practical Collaborative Framework” People and applications Overarching Goals Strengthen Existing and Establish New Collaborations Work with Science Teams to Advance Grid Technologies and Improve the Underlying Infrastructure In the Pacific Rim and Globally http://www.pragma-grid.net
PRAGMA Member Institutions CRAY PNWG USA JLU China KBSI KISTI Konkuk Korea CNIC China AIST CCS CMC NARC OsakaU TITech Japan UUtah USA UoHyd India CalIT2 CRBS SDSC UCSD USA APAN Japan NCSA StarLight TransPAC2 USA ASGCC NCHC Taiwan CICESE Mexico KU NECTEC TNGC Thailand APAC Australia BII IHPC NGO Singapore BeSTGRID New Zealand MIMOS USM Malaysia 37 institutions from 12 countries/regions Founded 2002 Supported by Members MU Australia http://www.pragma-grid.net
Workshops and Organization Information Exchange Planning and Review New Collaborations New Members Expand Users Expand Impact Routine Use Lab/Testbed Testing Applications Building Grid and GOC Multiway Dissemination Key Middleware Overview and Approach: Process to Promote Routine Use Team Science Application-Driven Collaborations Applications Middleware Outcomes Improved middleware Broader Use New Collaborations Transfer Tech. Standards Publications New Knowledge Data Access Education
PRAGMA Working Groups • Bioscience • Telescience • Geo-science • Resources and data • Grid middleware interoperability • Global grid usability and productivity The PRAGMA Grid effort is led by the resources and data working group, but relies on collaborations and contributions among all working groups.
PRAGMA Grid JLU China CNIC GUCAS China AIST OsakaU UTsukuba TITech Japan NCSA USA CNIC GUCAS AIST NCSA UZH Switzerland KISTI Korea BU USA UZH UUtah USA SDSC USA SDSC LZU China LZU UPRM Puerto Rico ASGC NCHC Taiwan UoHyd India ASGC CICESE Mexico CUHK Hong Kong UNAM Mexico CUHK NECTEC ThaiGrid Thailand NECTEC ThaiGrid HCMUT IOIT-HCM Vietnam ITCR Costa Rica IOIT-HCM APAC QUT Australia MIMOS MIMOS USM Malaysia BII IHPC NGO NTU Singapore UCN Chile BESTGrid New Zealand NGO UChile Chile MU Australia 32 institutions in 16 countries/regions, 27 compute sites, 14 Gfarm sites (+ 6 in preparation)
PRAGMA Grid Members and Team http://goc.pragma-grid.net/wiki/index.php/Site_status_and_tasks • Sites • 23 sites from PRAGMA member institutions • 15 sites from non-PRAGMA member institutions • 27 sites contributed compute clusters • Team members • 160 and growing • 1 management contact / site • 1~3 technical support contacts / site • 1~4 application drivers / application • 1~5 / middleware development team
PRAGMA Grid Compute Resources http://goc.pragma-grid.net/pragma-doc/computegrid.html
Characteristics of PRAGMA Grid • Grass-roots • Voluntary contribution • Open (PRAGMA member or not, Pacific Rim or not) • Long-term collaborative working experiment • Heterogeneous • Funding • No uniform infrastructure management • Variety of sciences and applications • Site policies, system and network environments • Realistically tough • Good for development, collaborations, integrations and testing
PRAGMA Grid Software Layers http://goc.pragma-grid.net/pragma-doc/userguide/join.html • Applications: FMO, Savannah, MM5, CSTFT, Siesta, AMBER, Phylogenetic, … • Application middleware: Ninf-G, Nimrod/G, Mpich-GX, … • Infrastructure middleware: Gfarm, SCMSWeb, CSF, MOGAS, … • Globus (required) • Local job scheduler (one required): SGE, PBS, LSF, SQMS, …
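The layering above implies a simple admission rule for a site: Globus must be present, and at least one local job scheduler must be installed. A minimal sketch of that rule follows. The `SiteInventory` class, the `missing_requirements` function and the site names are hypothetical, introduced only for illustration; the actual join procedure is the one described in the user guide linked on the slide.

```python
from dataclasses import dataclass, field

# Package names come from the slide; the check itself is an illustrative assumption.
REQUIRED = {"Globus"}
LOCAL_SCHEDULERS = {"SGE", "PBS", "LSF", "SQMS"}


@dataclass
class SiteInventory:
    """Hypothetical record of the software installed at one site."""
    name: str
    software: set = field(default_factory=set)


def missing_requirements(site):
    """Return a list of problems; an empty list means the site meets the rule."""
    problems = []
    for pkg in sorted(REQUIRED - site.software):
        problems.append(f"missing required package: {pkg}")
    if not (LOCAL_SCHEDULERS & site.software):
        problems.append("no local job scheduler installed")
    return problems


ok_site = SiteInventory("site-a", {"Globus", "PBS", "Ninf-G"})
bad_site = SiteInventory("site-b", {"SGE", "Gfarm"})

print(missing_requirements(ok_site))
print(missing_requirements(bad_site))
```

Everything above the two base layers stays optional in this sketch, which matches the "software inventory instead of software stack" approach the deck describes.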
One of the major lessons from PRAGMA Grid, that everybody has noticed and would agree– “You have to Grid People before you can Grid machines” Rajesh Chhabra Australia
Grid Operation http://goc.pragma-grid.net, http://wiki.pragma-grid.net • Develop and maintain mutually beneficial and happy relationships among all people involved • Geographies, time-zones, languages • Funding, chain-of-command, priorities • Mutual benefit, consensus, active leadership • Coordinator, site contacts • Collaboration tools • Mailing lists, VTCs, Skype, semi-annual workshops • Grid Operation Center (GOC) • Wiki, all sites and application, middleware teams collaborate • Heterogeneity • Tolerate, technology, overcome and take advantage • Software inventory instead of software stack • Many sub-grids for applications • Recommendation instead of requirements • Software license
Create New Ways To Operate http://goc.pragma-grid.net, http://wiki.pragma-grid.net • Lack of precedent • Everyone contributes ideas, suggestions • Evolving and improving over time • Everyone documents and updates (wiki) • Create new procedures • New site setup to join PRAGMA Grid http://goc.pragma-grid.net/pragma-doc/userguide/join.html • New user/application to run in PRAGMA grid http://goc.pragma-grid.net/pragma-doc/userguide/pragma_user_guide.html • Tabulate information • Application pages, site pages, resources tables, status pages • Publish instructions • Software deployment procedures, tools
Applications and Middleware http://goc.pragma-grid.net/applications/default.html • Real science applications are paired with, and drive, middleware development • Open to applications of all scientific disciplines • Achieve long runs and scientific results • ~30 applications in 3 years: • Climate simulation • Savannah/Nimrod (MU, Australia) • MM5/Mpich-GX (CICESE, Mexico; KISTI, Korea) • Quantum-mechanics, quantum-chemistry: • TDDFT, QM-MD, FMO/Ninf-G (AIST, Japan) • Genomics and meta-genomics • iGAP/Gfarm/CSF (UCSD, USA; AIST, Japan; JLU, China) • HPM: genomics (IOIT-HCM, Vietnam) • mpiBlast/Mpich-G2 (ASGC, Taiwan) • Phylogenetic/Gfarm/CSF (UWisc and UCSD, USA) • Computational chemistry and fluid dynamics • CSE-Online (UUtah, USA) • e-AIRS (KISTI, Korea) • Gamess-APBS/Nimrod (UZurich, Switzerland) • Molecular simulation • Siesta/Nimrod (UZurich, Switzerland; MU, Australia) • Amber/Gfarm (USM, Malaysia; AIST, Japan) • Environmental Science • CSTFT/Ninf-G (UPRM, Puerto Rico) • Computer Science • Load Balancer (VAST-HCM, Vietnam) • GriddLeS (MU, Australia)
Applications By PRIME Students http://prime.ucsd.edu/student_collections2007.htm Providing UCSD undergraduate students with international interdisciplinary research internships and cultural experiences since 2004. Sample applications run in PRAGMA grid this year: • Climate modeling • Multi-walled carbon nanotube and polyethylene oxide composite computer visualization model • Metabolic regulation of ionic currents and pumps in rabbit ventricular myocyte model • Improving binding energy using quantum mechanics • Cardiac mechanics modeling • H5N1 simulation • Shp2 Protein Tyrosine Phosphatase Inhibitor simulation for cancer research
Lessons Learned From Running Applications • PRAGMA grid and its heterogeneous environment are great for • Testing • Collaborating • Integrating • Sharing • Not easy • Middleware needs improvements • Work in heterogeneous environments • Fault tolerance • Need user-friendly portals and services • Automate and integrate • Information collections (grid monitoring, workflow) • Decisions and executions (scheduling) • Domain-specific easy user interfaces (portals, CE tools) • …
Ninf-G http://ninf.apgrid.org • Developed by AIST, Japan • Based on GridRPC model • Supports parallel computing • OGF standard • Integrated into NMI release 8 (first non-US software in NMI) • Integrated with Rocks • 4 applications ran in PRAGMA grid, 2 ran in multi-grid • TDDFT • QM/MD • FMO • CSTFT (UPRM) • Achieved long runs (50 days) • Improved fault-tolerance • Simplified deployment procedures • Sped up development cycles
Nimrod/G http://www.csse.monash.edu.au/~davida/nimrod [Figure: a plan file (description of parameters) expanded into Job 1 through Job 18] • Developed by Monash University (MU), Australia • Supports large scale parameter sweeps on Grid infrastructure • Easy user interface – Nimrod portals • MU, Australia • UZurich, Switzerland • UCSD, USA • 3 applications ran in PRAGMA grid and 1 runs in multi-grids • Savannah climate simulation (MU) • GAMESS/APBS (UZurich) • Siesta (UZurich) • Developed interface to Unicore • Achieved long runs (90 different scenarios of 6 weeks each) • Improved fault-tolerance (innovate time_step) • Enhancements in data and storage handling 1:30pm – Tutorial by David Abramson, Blair Bethwaite
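The idea of a parameter sweep, in which a description of parameters is expanded into many independent jobs, can be sketched as a Cartesian product. This is not Nimrod/G's plan file syntax or API. The function name and the parameter names and values are hypothetical, chosen so that the sweep yields 18 jobs like the slide's figure.

```python
import itertools


def expand_sweep(parameters):
    """Expand {name: [values]} into one job description per combination."""
    names = list(parameters)
    combos = itertools.product(*(parameters[n] for n in names))
    return [
        {"job": job_id, **dict(zip(names, values))}
        for job_id, values in enumerate(combos, start=1)
    ]


# Hypothetical sweep: 3 x 2 x 3 = 18 combinations.
sweep = {
    "rainfall_mm": [200, 400, 600],
    "fire_season": ["early", "late"],
    "grazing": [0.1, 0.5, 0.9],
}

jobs = expand_sweep(sweep)
print(len(jobs))
print(jobs[0])
print(jobs[-1])
```

Because every job is independent, a failed combination can be resubmitted elsewhere without affecting the rest, which is one reason this workload suits a heterogeneous grid.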
Mpich-GX http://www.moredream.org/mpich.htm • Mpich-GX • Korea Institute of Science and Technology Information (KISTI), Korea • Based on Mpich-G2 • Grid-enabled MPI, supports • Private IP • Fault tolerance • MM5 and WRF • CICESE, Mexico • Medium scale atmospheric simulation model • Experiment • KGrid • WRF works well with Mpich-GX • MM5 experienced scaling problems with Mpich-GX when using more than 24 processors in a cluster • Functionality of the private IP is usable • Performance of the private IP is reasonable
MM5-WRF/Mpich-GX Experiment • Hurricane Marty Simulation • Santana Winds Simulation • Mpich-GX: private IP support, fault tolerance support • [Figure labels: KGrid output; SDSC, USA; CICESE, Ensenada, México; eolo; pluto] 4pm tomorrow – Tutorial by Oh-kyoung Kwon
PRAGMA is a great model and needs to be emulated. Has helped weaken barriers between different research groups across different continents and allowed people to trust and collaborate rather than compete. Arun Agarwal UoHyd, India
Collaborations With Science and Technology Teams • Grid security • Naregi (Japan), APGrid, GAMA (SDSC, USA) • Grid infrastructure • Monitoring - SCMSWeb (ThaiGrid, Thailand) • Accounting - MOGAS (NTU Singapore) • Metascheduling - Community Scheduler Framework (JLU, China) • Cyber-environment - CSE-Online (UUtah, USA) • Rocks and middleware (SDSC, USA; …) • Ninf-G, SCE, Gfarm, Bio, K*Rocks, Condor, … • Science, datagrid, sensor, network • Biosciences – Avian Flu, portal, … • Gfarm-fuse (AIST, Japan) • GEON data network • GLEON sensor network • OptIPuter • High performance networked TDW • Telescience
Grid Security • Trust in PRAGMA grid, http://goc.pragma-grid.net/pragma-doc/certificates.html • IGTF distribution • Non-IGTF distribution (trust all PRAGMA Grid sites) • APGrid PMA • One of three IGTF founding PMAs • Many PRAGMA grid sites are members • PRAGMA CA • Naregi-CA • AIST, UCSD, UChile, UoHyd, UPRM • PRAGMA CA (experimental and production) • Based on Naregi-CA • Catch-all CA for PRAGMA • Production CA is IGTF compliant • Myproxy and VOMS services • APAC • Work with GAMA • Integrate with Naregi-CA (Naregi, UCSD) • Integration with VOMS (AIST) • Add servlet for account management (UChile) Lessons learned • Leverage resources, setups and expertise • Balance security against ease of access and use • Get more user communities involved with grid security
Gfarm Grid File System http://datafarm.apgrid.org • AIST, UTsukuba, open source development at SourceForge.net • Grid file system that federates storage of each site • Meta-server keeps track of file copies and locations • Can be mounted from cluster nodes and clients (GfarmFS-FUSE) • Parallel I/O, near site copy for scalable performance • Replication for fault tolerance • Uses GSI authentication • Easy application deployment, file sharing
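The roles described above (a meta-server that tracks where file copies live, reads served from a nearby copy, and replication that survives a site outage) can be illustrated with a toy replica catalog. This is not Gfarm's API or data model. The class, the file path and the hop-count table are hypothetical and exist only to show the idea.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy stand-in for a meta-server: maps a file path to the sites holding a copy."""

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, path, site):
        self._locations[path].add(site)

    def replicas(self, path):
        return sorted(self._locations.get(path, ()))

    def pick_replica(self, path, client_site, distance):
        """Choose the copy closest to the client ("near site copy")."""
        sites = self.replicas(path)
        if not sites:
            raise FileNotFoundError(path)
        return min(sites, key=lambda s: distance(client_site, s))

    def drop_site(self, site):
        """Simulate a site outage; other replicas keep the file readable."""
        for sites in self._locations.values():
            sites.discard(site)


# Hypothetical network distances between sites (symmetric hop counts).
HOPS = {("sdsc", "aist"): 2, ("sdsc", "usm"): 3, ("aist", "usm"): 1}


def distance(a, b):
    if a == b:
        return 0
    return HOPS.get((a, b), HOPS.get((b, a), 99))


catalog = ReplicaCatalog()
catalog.register("/data/genome.fa", "aist")
catalog.register("/data/genome.fa", "usm")

first_choice = catalog.pick_replica("/data/genome.fa", "sdsc", distance)
catalog.drop_site("aist")
fallback = catalog.pick_replica("/data/genome.fa", "sdsc", distance)
print(first_choice, fallback)
```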
PRAGMA Gfarm Datagrid http://goc.pragma-grid.net/pragma-doc/datagrid.html (map legend: Compute Cluster)
Develop and Test GfarmFS-FUSE in PRAGMA Grid http://goc.pragma-grid.net/wiki/index.php/Resources_and_Data Testing with applications • iGAP (Gfarm, Japan; UCSD, USA; JLU, China) • Huge number of small files • High meta-data access overhead • Meta-data cache server • Dramatic improvements (44 sec -> 3.54 sec, roughly 12x faster) • AMBER (USM, Malaysia; Gfarm, Japan) • Remote Gfarm meta-server • Meta-server is a bottleneck • File sharing permission, security • 2.0 improved performance • Use as a shared storage only Version 1.4 works well in local or regional grid • GeoGrid, Japan • CLGrid, Chile Integration • SCMSWeb (ThaiGrid, Thailand) • Rocks (SDSC, USA; UZH, Switzerland)
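Why a meta-data cache helps a workload with a huge number of small files can be shown with a short sketch: repeated lookups of the same path are answered locally instead of crossing the wide-area network to a remote meta-server. The `MetadataCache` class and the stand-in `slow_remote_lookup` function are hypothetical and say nothing about how the Gfarm cache server is implemented.

```python
class MetadataCache:
    """Answer repeated metadata lookups locally; go remote only on a miss."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable doing the expensive remote lookup
        self._entries = {}
        self.remote_calls = 0

    def lookup(self, path):
        if path not in self._entries:
            self.remote_calls += 1
            self._entries[path] = self._fetch(path)
        return self._entries[path]

    def invalidate(self, path):
        """Drop a cached entry, for example after the file changes."""
        self._entries.pop(path, None)


def slow_remote_lookup(path):
    # Stand-in for a wide-area round trip to a remote meta-server.
    return {"path": path, "size": len(path)}


cache = MetadataCache(slow_remote_lookup)
for _ in range(3):
    for p in ("/a/1", "/a/2"):
        cache.lookup(p)

print(cache.remote_calls)  # six lookups, two remote calls
```

The sketch ignores consistency between cache and meta-server, which a real deployment would have to handle.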
SCMSWeb http://www.opensce.org/components/SCMSWeb • Developed by Kasetsart University and ThaiGrid • Web-based real-time grid monitoring system • System usage, Job/queue status • Probe – Globus authentication, job submission, gridftp, Gfarm access, … • Network bandwidth measurements with Iperf • PRAGMA grid geo map • Supports Linux, Solaris. Good meta-view, easy user interface, excellent user support • Develop and test in PRAGMA grid • Deployed in 27 sites, improve scalability and performance • Sites help with porting to ia64 and Solaris • Demands push fast expansion of functionalities • More regional/national grids learned and adopted
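The probe idea above (test each service at each site and tabulate the results) can be sketched as a site-by-probe matrix. This is not SCMSWeb code. The probe names, the site names and the canned `FAKE_STATUS` table are hypothetical, and real probes would contact the Globus, gridftp or Gfarm services instead of reading a dictionary.

```python
def run_probe_matrix(sites, probes):
    """Run every probe against every site; return {site: {probe: status}}."""
    matrix = {}
    for site in sites:
        row = {}
        for name, probe in probes.items():
            try:
                row[name] = "ok" if probe(site) else "fail"
            except Exception as exc:  # a broken probe must not stop the sweep
                row[name] = f"error: {exc}"
        matrix[site] = row
    return matrix


# Canned results standing in for real service checks (assumption).
FAKE_STATUS = {
    ("site-a", "auth"): True,
    ("site-a", "job_submit"): True,
    ("site-a", "gridftp"): False,
    ("site-b", "auth"): True,
    ("site-b", "gridftp"): True,
}


def make_probe(service):
    def probe(site):
        if (site, service) not in FAKE_STATUS:
            raise RuntimeError("no response")
        return FAKE_STATUS[(site, service)]
    return probe


probes = {s: make_probe(s) for s in ("auth", "job_submit", "gridftp")}
matrix = run_probe_matrix(["site-a", "site-b"], probes)
for site, row in matrix.items():
    print(site, row)
```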
SCMSWeb Collaborations and Integrations • Grid Interoperation Now (GIN, OGF) http://forge.gridforum.org/sf/wiki/do/viewPage/projects.gin/wiki/GinOps • Worked with PRAGMA grid, TeraGrid, OSG, NorduGrid and EGEE on GIN testbed monitoring http://goc.pragma-grid.net/cgi-bin/scmsweb/probe.cgi, added probes to handle various grid service configurations/tests. • Worked with CERN and implemented an XML -> LDIF translator for GIN geo map http://maps.google.com/maps?q=http://lfield.home.cern.ch/lfield/gin.kml • Worked with many grid monitor software developers on a common schema for cross-grid monitoring http://wiki.pragma-grid.net/index.php?title=GIN_%28Grid_Inter-operation_Now%29_Monitoring • Software integration and interoperations • Rocks – SCE roll • MOGAS, grid accounting • CSE-Online, CSF, provide resource info • Things are being worked on and planned • Data federator for grid applications • Provide site software information • Standardize data extractions and formats • Improve data storage with RDBMS • Interoperate with other monitoring software • Ganglia support
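To make the XML -> LDIF translation concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML layout (`<sites>` containing `<site name="...">` elements), the attribute names, the `o=grid` base DN and the sample coordinates are all assumptions for illustration. The schema actually used for the GIN geo map is not given in the slides.

```python
import xml.etree.ElementTree as ET


def sites_xml_to_ldif(xml_text, base_dn="o=grid"):
    """Turn each <site> element into one LDIF entry (simplified: no base64, no line folding)."""
    root = ET.fromstring(xml_text)
    entries = []
    for site in root.findall("site"):
        name = site.get("name")
        lines = [f"dn: cn={name},{base_dn}", f"cn: {name}"]
        for child in site:
            lines.append(f"{child.tag}: {(child.text or '').strip()}")
        entries.append("\n".join(lines))
    # LDIF separates entries with a blank line.
    return "\n\n".join(entries) + "\n"


# Hypothetical input document with made-up coordinates.
SAMPLE = """
<sites>
  <site name="site-a"><latitude>10.5</latitude><longitude>20.25</longitude></site>
  <site name="site-b"><latitude>-5.0</latitude><longitude>100.0</longitude></site>
</sites>
"""

ldif = sites_xml_to_ldif(SAMPLE)
print(ldif)
```

A production translator would also need to base64-encode values that LDIF cannot carry as plain text, which this sketch leaves out.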
MOGAS http://ntu-cg.ntu.edu.sg/pragma/index.jsp • Multi-Organization Grid Accounting System (MOGAS) • Led by Nanyang Technological University (NTU), funded by National Grid Office in Singapore • Built on Globus core (gridftp, GRAM, GSI) • Supports GT2, 3, 4, SGE, PBS • Job/user/cluster/OU/grid level usage; job logs; metering and charging tools • Develop and test in PRAGMA grid • Deployed on 14 sites: different GT versions, job schedulers, GRAM scripts, security policies • Feedback, improve, automate deployment procedure • Decentralized servers and better database to improve scalability and performance • Collaborations and integrations with applications and other middleware teams push the development of an easy database interface 4pm – MOGAS tutorial by Francis Lee
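Accounting at job, user, cluster, OU and grid levels amounts to summing the same job log under different keys. The sketch below shows that roll-up. The record fields (`job_id`, `user`, `cluster`, `ou`, `cpus`, `wall_hours`) and the sample records are hypothetical and do not reflect the MOGAS database schema.

```python
from collections import defaultdict


def aggregate_usage(job_logs, level):
    """Sum CPU-hours per key; level is 'job_id', 'user', 'cluster', 'ou' or 'grid'."""
    totals = defaultdict(float)
    for job in job_logs:
        key = "grid" if level == "grid" else job[level]
        totals[key] += job["cpus"] * job["wall_hours"]
    return dict(totals)


# Hypothetical job log records.
LOGS = [
    {"job_id": 1, "user": "alice", "cluster": "c1", "ou": "ou-x", "cpus": 4, "wall_hours": 2.5},
    {"job_id": 2, "user": "bob", "cluster": "c2", "ou": "ou-y", "cpus": 8, "wall_hours": 1.0},
    {"job_id": 3, "user": "alice", "cluster": "c2", "ou": "ou-x", "cpus": 2, "wall_hours": 0.5},
]

for level in ("user", "cluster", "ou", "grid"):
    print(level, aggregate_usage(LOGS, level))
```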
CSF4 http://goc.pragma-grid.net/wiki/index.php/CSF_server_and_portal • Community Scheduler Framework, v4 – meta-scheduler • Developed by Jilin University, China • Grid services hosted in GT4, WSRF compliant, execution component in Globus Toolkit 4 • Open Source, http://sourceforge.net/projects/gcsf • Supports GT2 & 4, LSF, PBS, SGE, Condor • Easy user interface - portal • Testing and collaborating in PRAGMA • Testing with application iGAP (UCSD, AIST, KISTI, …) • Collaborate and integrate with Gfarm on data staging (AIST, Japan) • Set up a CSF server and portal (SDSC, USA) • Collaborate/integrate with SCMSWeb for resource information (ThaiGrid, Thailand) • Leverage resources and global grid testing environment 1:30pm – CSF4 Tutorial by Zhao-hui Ding
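A meta-scheduler sits above the local schedulers and decides which cluster receives a job, using resource information such as that supplied by a monitoring system. The sketch below shows one simple selection policy. It is an assumption for illustration and not CSF4's actual algorithm; the record fields and cluster names are hypothetical.

```python
def choose_cluster(job, clusters):
    """Pick the reachable cluster with the most free CPUs that can run the job."""
    candidates = [
        c for c in clusters
        if c["up"]
        and c["scheduler"] in job["schedulers"]
        and c["free_cpus"] >= job["cpus"]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["free_cpus"])["name"]


# Hypothetical resource information.
CLUSTERS = [
    {"name": "c1", "scheduler": "SGE", "free_cpus": 16, "up": True},
    {"name": "c2", "scheduler": "PBS", "free_cpus": 64, "up": True},
    {"name": "c3", "scheduler": "LSF", "free_cpus": 128, "up": False},
]

job = {"cpus": 32, "schedulers": {"PBS", "LSF", "SGE"}}
print(choose_cluster(job, CLUSTERS))
```

Here `c3` has the most free CPUs but is down, and `c1` is too small, so the job goes to `c2`.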
Computational Science & Engineering Online http://cse-online.net • Developed by University of Utah, USA (Thanh N. Truong) • Desktop tool with a user-friendly interface that enables seamless access to remote data, tools and grid computing resources • Currently supports computational chemistry • Can be customized for other domain sciences • Developed interface to TeraGrid • Collaborate with ThaiGrid as case study • Used for Computational workshop • Extend grid access to portal architecture • Improved security • Working on an interface to PRAGMA grid • Heterogeneity [Figure labels: Quantum Chemistry, Drug Design, Nano-materials]
Collaborations with OptIPuter http://www.optiputer.net • OptIPuter (Optical networking, Internet Protocol, computer storage, processing and visualization technologies) • Infrastructure that will tightly couple computational resources over parallel optical networks using the IP communication mechanism • Central architectural element is optical networking, not computers • Enable scientists who are generating terabytes and petabytes of data to interactively visualize, analyze, and correlate their data from multiple storage sites connected to optical networks • Rocks VIS-roll (SDSC) • Networked Tile Display Walls (TDW) • Low cost • For research collaboration • For remote education and conferencing • Deployed in PRAGMA grid • 9 sites and more to follow • Future plan • Global Lambda Integrated Facility (GLIF) • Solve grid application bandwidth problem CNIC, China UCSD, USA
Grid Interoperation Now (GIN) http://forge.gridforum.org/sf/wiki/do/viewPage/projects.gin/wiki/GinOps OGF – GIN-OPS • GIN testbed (February, 2006 – on-going) • Application driven • TDDFT/Ninf-G (PRAGMA - AIST, Japan) • PRAGMA, TeraGrid, OSG, NorduGrid, EGEE • Savannah fire simulation (PRAGMA – Monash University, Australia) • PRAGMA, TeraGrid, OSG • Multi-Grid monitoring • SCMSWeb probe matrix (PRAGMA - ThaiGrid, Thailand) • Common schema (PRAGMA, TeraGrid, EGEE, NorduGrid)
OSG-PRAGMA Grid Interoperation Experiments http://goc.pragma-grid.net/wiki/index.php/Main_Page#Grid_Inter-operations • More resources and support from each grid, but no special arrangements • Application long-run • GridFMO/Ninf-G – Large scale quantum chemistry (Tsutomu Ikegami, AIST, Japan) • 240 CPUs from OSG and PRAGMA grid, 10 days x 7 calculations • Fault-tolerance enabled long-run • Meaningful and usable scientific results
Lessons Learned From Grid Interoperation • Grid interoperation makes large scale calculations possible • Differences among grids provide learning, collaboration and integration opportunities • IGTF, VOMS (GIN) • Common Software Area (TeraGrid) • Ninf-G (AIST/PRAGMA) interface to NorduGrid • Nimrod-G (MU/PRAGMA) interface to Unicore (PRIME) • VDT (OSG) and Rocks (SDSC/PRAGMA) integration • Differences in grid environments are a source of difficulties for users and applications • Different user access setup procedures take extra effort • Different job submission protocols • GRAM, Sandbox, gridftp, modified GRAM, … • One-to-one interface building is neither scalable nor desirable. Standards are needed. • Middleware fault tolerance and flexible resource management are important
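The scalability point about one-to-one interfaces is plain counting: connecting every pair of n grids directly needs n(n-1)/2 interfaces, while a shared standard needs one adapter per grid. A short sketch of that arithmetic follows; the function names are introduced here only for illustration.

```python
def pairwise_interfaces(n):
    """Interfaces needed if every pair of grids builds its own bridge."""
    return n * (n - 1) // 2


def standard_adapters(n):
    """Adapters needed if every grid implements one common standard."""
    return n


for n in (2, 5, 10, 20):
    print(n, pairwise_interfaces(n), standard_adapters(n))
```

For the five grids named in the GIN testbed (PRAGMA, TeraGrid, OSG, NorduGrid, EGEE) this gives 10 pairwise interfaces against 5 adapters, and the gap widens quadratically as more grids join.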
Collaborate in Publishing Research Results Some published papers in 2007: • Amaro RE, Minh DDL, Cheng LS, Lindstrom WM Jr, Olson AJ, Lin JH, Li WW, and McCammon JA. Remarkable Loop Flexibility in Avian Influenza N1 and Its Implications for Antiviral Drug Design. J. Am. Chem. Soc. 2007, 129, 7764-7765 (PRIME) • Choi Y, Jung S, Kim D, Lee J, Jeong K, Lim SB, Heo D, Hwang S, and Byeon OH. "Glyco-MGrid: A Collaborative Molecular Simulation Grid for e-Glycomics," in 3rd IEEE International Conference on e-Science and Grid Computing, Bangalore, India, 2007. Accepted. • Ding Z, Wei W, Luo Y, Ma D, Arzberger PW, and Li WW, "Customized Plug-in Modules in Metascheduler CSF4 for Life Sciences Applications," New Generation Computing, 2007. In press. • Ding Z, Wei S, Ma D, and Li WW, "VJM -- A Deadlock Free Resource Co-allocation Model for Cross Domain Parallel Jobs," in HPC Asia 2007, Seoul, Korea, 2007. In press. • Görgen K, Lynch H, Abramson D, Beringer J and Uotila P. "Savanna fires increase monsoon rainfall as simulated using a distributed computing environment", to appear, Geophysical Research Letters. • Ichikawa K, Date S, Krishnan S, Li W, Nakata K, Yonezawa Y, Nakamura H, and Shimojo S, "Opal OP: An extensible Grid-enabling wrapping approach for legacy applications", GCA2007 - Proceedings of the 3rd workshop on Grid Computing & Applications, pp. 117-127, Singapore, June 2007 a. (PRIUS) • Ichikawa K, Date S, and Shimojo S. "A Framework for Meta-Scheduling WSRF Based Services", Proceedings of 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 2007), Victoria, Canada, pp. 481-484, Aug. 2007 b. (PRIUS) • Kuwabara S, Ichikawa K, Date S, and Shimojo S. "A Built-in Application Control Module for SAGE", Proceedings of 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 2007), Victoria, Canada, pp. 117-120, Aug. 2007. (PRIUS) • Takeda S, Date S, Zhang J, Lee BU, and Shimojo S. "Security Monitoring Extension For MOGAS", GCA2007 - Proceedings of the 3rd workshop on Grid Computing & Applications, pp. 128-137, Singapore, June 2007. (PRIUS) • Tilak S, Hubbard P, Miller M, and Fountain T, "The Ring Buffer Network Bus (RBNB) DataTurbine Streaming Data Middleware for Environmental Observing Systems," to appear in the Proceedings of the e-Science 2007 • Zheng C, Katz M, Papadopoulos P, Abramson D, Ayyub S, Enticott C, Garic S, Goscinski W, Arzberger P, Lee B S, Phatanapherom S, Sriprayoonsakul S, Uthayopas P, Tanaka Y, Tanimura Y, Tatebe O. Lesson Learned Through Driving Science Applications in the PRAGMA Grid. Int. J. Web and Grid Services, Vol. 3, No. 3, pp. 287-312. 2007 …
Summary • PRAGMA grid • A shared vision lowers resistance to using others' software and testing on others' resources • Formed new development collaborations • Size and heterogeneity expose issues that a functional grid must resolve • Management, resources and software coordination • Identity and fault management • Scalability and performance • Feedback between applications and middleware helps improve software and promotes software integration • Heterogeneous global grid • Is realistic and challenging • Can be good for middleware development and testing • Can be useful for real science • Impact • Software dissemination (Rocks, Ninf-G, Nimrod, SCMSWeb, Naregi-CA, …) • Help new national/regional grids (Chile, Vietnam, Hong Kong, …) • Key is people, is collaboration
A Grass Roots Effort “One of the most important lessons of the Internet is that it grows most successfully where grass roots initiatives are encouraged and enabled. The Internet has historically grown from the bottom up, and this aspect continues to fuel its continued growth in the academic and commercial sectors.” • Vint Cerf, UN Economic and Social Council in 2000
PRAGMA is supported by the National Science Foundation (Grant No. INT-0216895, INT-0314015, OCI-0627026) and by member institutions • PRIME is supported by the National Science Foundation under NSF INT 04007508 • PRAGMA grid is the result of contributions and support from all PRAGMA grid team members Thank You http://www.pragma-grid.net http://goc.pragma-grid.net