
National Research Grid Initiative (NAREGI)


Presentation Transcript


  1. National Research Grid Initiative (NAREGI). Satoshi Matsuoka: Sub-Project Leader, NAREGI Project; Visiting Professor, National Institute of Informatics; Professor, GSIC, Tokyo Inst. Technology

  2. Inter-university Computer Centers (excl. National Labs) circa 2002
  • Hokkaido University: HITACHI SR8000, HP Exemplar V2500, HITACHI MP5800/160, Sun Ultra Enterprise 4000
  • University of Tsukuba: FUJITSU VPP5000, CP-PACS 2048 (SR8000 proto)
  • Kyoto University: FUJITSU VPP800, FUJITSU GP7000F model 900/32, FUJITSU GS8000
  • Tohoku University: NEC SX-4/128H4 (soon SX-7), NEC TX7/AzusA
  • Kyushu University: FUJITSU VPP5000/64, HP GS320/32, FUJITSU GP7000F 900/64
  • University of Tokyo: HITACHI SR8000, HITACHI SR8000/MPP, others (in institutes)
  • Tokyo Inst. Technology (Titech): NEC SX-5/16, Origin2K/256, HP GS320/64
  • Nagoya University / Osaka University: FUJITSU VPP5000/64, FUJITSU GP7000F model 900/64, FUJITSU GP7000F model 600/12, NEC SX-5/128M8, HP Exemplar V2500/N

  3. Simply Extend the Campus Grid? 100,000 users/machines, 1000 km networking, PetaFlops/Petabytes… Problems!
  • Grid software stack deficiency
  • Large-scale resource management
  • Large-scale Grid programming
  • User support tools – PSE, visualization, portals
  • Packaging, distribution, troubleshooting
  • High-performance networking vs. firewalls
  • Large-scale security management
  • “Grid-enabling” applications
  • Manufacturer experience and support
  Q: How can the Grid become a ubiquitous national research computing infrastructure?

  4. National Research Grid Initiative (NAREGI) Project: Overview
  • A new Japanese MEXT National Grid R&D project
  • ~$(US)17M FY’03 (similar until FY’07) + $45 mil
  • One of two major Japanese Govt. Grid Projects (c.f. “BusinessGrid”)
  • Collaboration of National Labs, Universities, and major computing and nanotechnology industries
  • Acquisition of computer resources underway (FY2003)
  MEXT: Ministry of Education, Culture, Sports, Science and Technology

  5. Petascale Grid Infrastructure R&D for Future Deployment (National Research Grid Infrastructure, NAREGI, 2003-2007)
  • $45 mil (US) + $16 mil x 5 (2003-2007) = $125 mil total
  • Hosted by the National Institute of Informatics (NII) and the Institute for Molecular Science (IMS)
  • PL: Ken Miura (Fujitsu → NII); SLs: Sekiguchi (AIST), Matsuoka (Titech), Shimojo (Osaka-U), Hirata (IMS)…
  • Participation by multiple (>= 3) vendors (NEC, Fujitsu, Hitachi); resource contributions by university centers as well
  [Diagram] Focused “Grand Challenge” Grid application areas: Nanotech Grid Apps (“NanoGrid”, IMS, ~10TF), Biotech Grid Apps (BioGrid, RIKEN), other apps and institutes, running on the National Research Grid Middleware R&D (Grid middleware, Grid and network management) over the Grid R&D infrastructure (15 TF-100 TF) and SuperSINET, with various partners: Osaka-U, Titech, AIST, U-Tokyo, U-Kyushu.

  6. National Research Grid Initiative (NAREGI) Project: Goals
  (1) R&D in Grid Middleware → a Grid software stack for “petascale”, nation-wide “Research Grid” deployment
  (2) Testbed validating a 100+ TFlops (2007) Grid computing environment for nanoscience apps on the Grid
      - Initially ~17 Teraflops, ~3000-CPU dedicated testbed
      - SuperSINET (>10 Gbps research AON backbone)
  (3) International collaboration with similar projects (U.S., Europe, Asia-Pacific incl. Australia)
  (4) Standardization activities, esp. within GGF

  7. NAREGI Research Organization and Collaboration
  [Organization chart] MEXT funds the Center for Grid Research & Development (National Institute of Informatics), advised by the Grid R&D Advisory Board and the Grid R&D Program Management Committee. Project Leader: K. Miura (NII).
  • Grid Middleware and Upper Layer R&D: joint research with AIST (GTRC) and with universities and research labs (Titech, Osaka-U, Kyushu-U, etc.); group leaders coordinate technical requirements.
  • Grid Networking R&D: SuperSINET joint research; coordination in network research, network technology refinement, and R&D utilization of the network.
  • National Supercomputing Centers: operations coordination/deployment and utilization of computing resources; testbed resources (acquisition in FY2003): NII ~5 TFlop/s, IMS ~11 TFlop/s.
  • Computational Nano-science Center (Institute for Molecular Science), Director: Dr. Hirata (IMS): nano-science applications; R&D of grand-challenge Grid applications (ISSP, Tohoku-U, AIST, etc., with industrial partners) together with the Consortium for Promotion of Grid Applications in Industry.
  • Joint research and operations with the ITBL Project (JAIRI).

  8. Participating Organizations
  • National Institute of Informatics (NII) (Center for Grid Research & Development)
  • Institute for Molecular Science (IMS) (Computational Nano-science Center)
  • Universities and National Labs (Joint R&D): AIST Grid Tech. Center, Titech GSIC, Osaka-U Cybermedia, Kyushu-U, Kyushu Inst. Tech., etc.
  • Project collaborations (ITBL Project, SC Center Grid Deployment Projects, etc.)
  • Participating vendors (IT and NanoTech)
  • Consortium for Promotion of Grid Applications in Industry

  9. NAREGI R&D Assumptions & Goals: Future Research Grid Metrics
  • 10s of institutions/centers, various project VOs; >100,000 users, >100,000 CPUs/machines
  • Machines very heterogeneous: SCs, clusters, desktops; 24/7 usage, production deployment
  • Server Grid, Data Grid, metacomputing…
  • Do not reinvent the wheel: build on, collaborate with, and contribute to the “Globus, Unicore, Condor” trilogy
  • Scalability and dependability are the key
  • Win the support of users: application and experimental deployment is essential, but do not let the apps get a “free ride”
  • R&D for production-quality (free) software

  10. NAREGI Work Packages
  • WP-1: National-Scale Grid Resource Management: Matsuoka (Titech), Kohno (ECU), Aida (Titech)
  • WP-2: Grid Programming: Sekiguchi (AIST), Ishikawa (AIST)
  • WP-3: User-Level Grid Tools & PSE: Miura (NII), Sato (Tsukuba-U), Kawata (Utsunomiya-U)
  • WP-4: Packaging and Configuration Management: Miura (NII)
  • WP-5: Networking, National-Scale Security & User Management: Shimojo (Osaka-U), Oie (Kyushu Tech.)
  • WP-6: Grid-Enabling Nanoscience Applications: Aoyagi (Kyushu-U)

  11. NAREGI Software Stack (a science Grid environment at the 100 TFlops scale)
  • WP6: Grid-Enabled Apps
  • WP3: Grid Visualization, Grid PSE, Grid Workflow
  • WP2: Grid Programming – GridRPC, GridMPI
  • WP4: Packaging
  • WP1: Grid Monitoring & Accounting, SuperScheduler (Globus, Condor, UNICORE → OGSA), Grid VM
  • WP5: Grid PKI, High-Performance Grid Networking

  12. WP-1: National-Scale Grid Resource Management
  • Build on Unicore, Condor, and Globus, and bridge their gaps as well (OGSA in the future)
  • Condor-U and Unicore-C (c.f. EU GRIP, Condor-G, Globus Universe)
  • SuperScheduler
  • Monitoring & Auditing/Accounting
  • Grid Virtual Machine
  • PKI and Grid Account Management (WP5)

  13. WP1: SuperScheduler (Fujitsu)
  • Hierarchical SuperScheduling structure, scalable to 100,000s of users, nodes, and jobs among 20+ sites (see the sketch below)
  • Fault tolerance
  • Workflow engine
  • NAREGI resource schema (joint w/ Hitachi)
  • Resource brokering w/ resource policy and advance reservation (NAREGI Broker)
  • Initially prototyped on Unicore AJO/NJS/TSI (OGSA in the future)
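
To make the hierarchical delegation idea concrete, here is a minimal Python sketch: a top-level "super-scheduler" brokers jobs across per-site local schedulers by a trivial free-CPU policy. All names (SuperScheduler, LocalScheduler, site-A, site-B) are hypothetical stand-ins; this is not the NAREGI implementation, only the pattern the slide describes.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    cpus: int

@dataclass
class LocalScheduler:
    """Stand-in for a site-level scheduler behind a local broker."""
    site: str
    free_cpus: int
    queue: list = field(default_factory=list)

    def can_run(self, job: Job) -> bool:
        return job.cpus <= self.free_cpus

    def submit(self, job: Job) -> None:
        self.free_cpus -= job.cpus
        self.queue.append(job.name)

class SuperScheduler:
    """Top level of the hierarchy: brokers jobs across many sites."""
    def __init__(self, sites):
        self.sites = sites

    def submit(self, job: Job) -> str:
        # Trivial policy standing in for policy-based brokering:
        # pick the site with the most free CPUs that can fit the job.
        candidates = [s for s in self.sites if s.can_run(job)]
        if not candidates:
            raise RuntimeError(f"no site can currently run {job.name}")
        best = max(candidates, key=lambda s: s.free_cpus)
        best.submit(job)
        return best.site

if __name__ == "__main__":
    grid = SuperScheduler([
        LocalScheduler("site-A", free_cpus=256),
        LocalScheduler("site-B", free_cpus=64),
    ])
    for j in [Job("nano-md", 128), Job("post-proc", 32)]:
        print(j.name, "->", grid.submit(j))
```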

  14. WP1: SuperScheduler (Fujitsu) (cont’d)
  [Architecture diagram, flattened in the transcript; recoverable structure:]
  • Prototyped on UNICORE (Uniform Interface to Computing Resources): WP3 PSE/workflow descriptions are converted to a UNICORE DAG and submitted over UPL (Unicore Protocol Layer) over SSL, passing from the Internet through the Gateway (U) into the intranet, secured by the WP5 NAREGI PKI [NEC]. GRIP (G) (Grid Interoperability Project) bridges to Globus.
  • Resource discovery, selection, and reservation: an NJS (Network Job Supervisor) for the SuperScheduler, a Broker NJS, and the UUDB (c.f. Imperial College, London); NAREGI BROKER-S [Fujitsu] acts as the resource broker (c.f. the EuroGrid broker [Manchester U]), with a policy engine using the “Ponder” policy description language (as a management app), analysis & prediction, and a policy DB (repository).
  • Resource requirements in RSL (or JSDL) are mapped onto CIM; the resource broker IF speaks CIM in XML over HTTP (or CIM-to-LDAP) to a CIMOM (CIM Object Manager), and CIM indications (events, e.g. queue-change events) feed Monitoring [Hitachi] via GMA sensors. CIM providers cover Condor (ClassAd), Globus (MDS/GARA), and batch queues (NQS). C.f. TOG OpenPegasus (derived from the SNIA CIMOM) and commercial products such as MS WMI (Windows Management Instrumentation), IBM Tivoli, and SUN WBEM Services.
  • Execution: CheckQoS and CheckQoS & SubmitJob requests flow to Execution NJSs and NAREGI BROKER-L [Fujitsu] for the local schedulers, then to TSIs (Target System Interface) over the TSI connection IF / FNTP (Fujitsu European Laboratories NJS-to-TSI Protocol). Being planned: CheckQoS / DRMAA / OGSI portTypes for Globus and Condor. Used in the CGS-WG demo at GGF7.

  15. WP1: Grid Virtual Machine (NEC & Titech)
  • “Portable” and thin VM layer for the Grid
  • Various VM functions: access control & virtualization, node virtualization & access transparency, secure resource access control, resource usage rate control, FT support (checkpoint support, job migration), job control
  • Also provides co-scheduling and co-allocation across clusters
  • Respects Grid standards, e.g., GSI, OGSA (future); a sketch of the resource-control idea follows
  • Various prototypes on Linux
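
As an illustration of the "resource usage rate control" function (not the actual GridVM code), the Python sketch below launches a job process on Linux under per-process CPU-time and memory caps. The command and the limit values are made-up examples.

```python
import resource
import subprocess

def run_with_limits(cmd, cpu_seconds, mem_bytes):
    """Launch a job under per-process resource caps, in the spirit of
    GridVM-style resource usage control (illustrative only, POSIX-only)."""
    def set_limits():
        # Cap CPU time and address space for the child process.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(cmd, preexec_fn=set_limits)

if __name__ == "__main__":
    # Example: run a short command under a 10 s CPU / 256 MB cap.
    run_with_limits(["echo", "hello from a sandboxed job"],
                    cpu_seconds=10, mem_bytes=256 * 1024 * 1024)
```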

  16. WP1: Grid Monitoring & Auditing/Accounting (Hitachi & Titech)
  • Scalable Grid monitoring, accounting, and logging
  • Define a CIM-based unified resource schema (illustrated below)
  • Distinguish end users vs. administrators
  • Prototype based on the GT3 Index Service, CIMOM, etc.
  • Self-configuring monitoring (Titech)
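
The sketch below shows what a CIM-style unified resource schema plus an accounting record might look like as plain data classes. The field names and the user/admin visibility flag are assumptions for illustration; they are not the NAREGI schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ComputeResource:
    """Toy analogue of a CIM-style class describing one managed system."""
    hostname: str
    site: str
    cpu_count: int
    cpu_mhz: int
    memory_mb: int
    scheduler: str          # e.g. "Condor", "Globus/GRAM", "NQS"

@dataclass
class UsageRecord:
    """Accounting entry, tagged so end-user and administrator views can differ."""
    user: str
    resource: str
    cpu_hours: float
    finished_at: str
    visibility: str = "user"   # "user" or "admin"

if __name__ == "__main__":
    node = ComputeResource("node001.example.org", "NII", 2, 3060, 4096, "NQS")
    rec = UsageRecord("alice", node.hostname, 12.5,
                      datetime.now(timezone.utc).isoformat())
    print(asdict(node))
    print(asdict(rec))
```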

  17. WP-2: Grid Programming
  • Grid Remote Procedure Call (RPC): Ninf-G2
  • Grid Message Passing Programming: GridMPI

  18. WP-2: Grid Programming – GridRPC / Ninf-G2 (AIST/GTRC)
  • GridRPC: a programming model using RPC on the Grid; high-level, tailored for scientific computing (c.f. SOAP-RPC); GridRPC API standardization by the GGF GridRPC WG
  • Ninf-G Version 2: a reference implementation of the GridRPC API, implemented on top of Globus Toolkit 2.0 (3.0 experimental); provides C and Java APIs
  • [Architecture diagram] On the server side, an IDL file describing a numerical library is processed by the IDL compiler to generate a remote executable and its interface information (an LDIF file registered with MDS). The client (1) issues an interface request (retrieved via MDS), (2) receives the interface reply, (3) invokes the remote executable via GRAM (fork), and (4) the remote executable connects back to the client.
  • Demo available at the AIST/Titech booth. http://ninf.apgrid.org/
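
To convey the lookup-then-call pattern without the C library, here is a toy Python sketch: a registry stands in for the information service (MDS/LDIF), and a function handle stands in for a GridRPC handle. This mimics the four steps above conceptually; it is not the Ninf-G API, whose real entry points are C/Java library calls.

```python
# Toy sketch of the GridRPC call pattern (interface lookup -> handle -> call).
import math

# Stand-in for the information service: maps a published function name
# to its interface description and (here, local) implementation.
REGISTRY = {
    "linpack/solve": {"arity": 1,
                      "impl": lambda n: sum(math.sqrt(i) for i in range(n))},
}

class FunctionHandle:
    """Handle bound to one remote function, like a GridRPC function handle."""
    def __init__(self, name):
        info = REGISTRY[name]                  # steps 1-2: interface request/reply
        self.name, self.arity, self._impl = name, info["arity"], info["impl"]

    def call(self, *args):
        if len(args) != self.arity:
            raise TypeError(f"{self.name} expects {self.arity} argument(s)")
        return self._impl(*args)               # steps 3-4: invoke and return result

if __name__ == "__main__":
    handle = FunctionHandle("linpack/solve")
    print(handle.call(10_000))
```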

  19. WP-2: Grid Programming – GridMPI (AIST and U-Tokyo)
  • GridMPI provides users an environment to run MPI applications efficiently on the Grid
  • Flexible and heterogeneous process invocation on each compute node (RSH, GRAM, SSH, vendor MPI)
  • Grid ADI and a latency-aware communication topology optimize communication over non-uniform latency and hide the differences between lower-level communication libraries (vendor MPI, TCP/IP, PMv2, others); see the sketch below
  • Extremely efficient implementation based on MPI on SCore (not MPICH-PM)
  • [Stack diagram] MPI core, IMPI, RIM, Grid ADI, latency-aware communication topology, other communication libraries, point-to-point communication
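
A minimal sketch of the latency-aware idea, under my own simplifying assumptions (two latency classes, WAN vs. LAN; site names made up): build a two-level broadcast plan so the root pays only one expensive inter-site hop per remote site, and site leaders fan out locally. This is illustrative of the concept, not GridMPI's actual algorithm.

```python
from collections import defaultdict

def two_level_broadcast_plan(rank_sites, root=0):
    """Return (link_kind, src, dst) steps for a latency-aware broadcast."""
    by_site = defaultdict(list)
    for rank, site in sorted(rank_sites.items()):
        by_site[site].append(rank)
    plan, root_site = [], rank_sites[root]
    for site, ranks in by_site.items():
        leader = root if site == root_site else ranks[0]
        if leader != root:
            plan.append(("WAN", root, leader))   # one inter-site hop per site
        for r in ranks:
            if r != leader:
                plan.append(("LAN", leader, r))  # cheap intra-site fan-out
    return plan

if __name__ == "__main__":
    ranks = {0: "titech", 1: "titech", 2: "tsukuba", 3: "tsukuba", 4: "tsukuba"}
    for kind, src, dst in two_level_broadcast_plan(ranks):
        print(f"{kind}: {src} -> {dst}")
```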

  20. WP-3: User-Level Grid Tools & PSE
  • Grid workflow: workflow language definition; GUI (task-flow representation); see the sketch below
  • Visualization tools: real-time volume visualization on the Grid
  • PSE / portals: multiphysics/coupled simulation, application pool, collaboration with the nanotech applications group
  • [Components] Problem Solving Environment: PSE Portal, PSE Toolkit, PSE application pool, Information Service, Workflow, Application Server, SuperScheduler
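
As a hint of what a task-flow description boils down to, the sketch below defines a small dependency graph and derives one valid dispatch order. The task names are hypothetical, and this is not the NAREGI workflow language, just a dependency-ordered task flow.

```python
from graphlib import TopologicalSorter   # Python 3.9+

# Hypothetical task flow for a coupled simulation: each task lists its
# predecessors, mirroring a task-flow (DAG) representation.
workflow = {
    "mesh":      [],
    "rism_run":  ["mesh"],
    "fmo_run":   ["mesh"],
    "couple":    ["rism_run", "fmo_run"],
    "visualize": ["couple"],
}

def execution_order(dag):
    """Return one valid order in which a scheduler could dispatch the tasks."""
    return list(TopologicalSorter(dag).static_order())

if __name__ == "__main__":
    print(execution_order(workflow))
```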

  21. WP-4: Packaging and Configuration Management
  • Collaboration with WP1 on management issues
  • Selection of packagers to use (RPM, GPTK?)
  • Interface with autonomous configuration management (WP1)
  • Test procedure and harness; testing infrastructure (c.f. NSF NMI packaging and testing)

  22. WP-5: Grid High-Performance Networking
  • Traffic measurement on SuperSINET
  • Optimal routing algorithms for Grids
  • Robust TCP/IP control for Grids
  • Grid CA / user Grid account management and deployment
  • Collaboration with WP-1

  23. WP-6: Adaptation of Nano-science Applications to the Grid Environment
  • Analysis of typical nanoscience applications: parallel structure, granularity, resource requirements, latency tolerance
  • Development of a coupled-simulation data exchange format and framework
  • Collaboration with IMS

  24. WP6 and Grid Nano-Science and Technology Applications: Overview
  • Participating organizations: Institute for Molecular Science, Institute for Solid State Physics, AIST, Tohoku University, Kyoto University, industry (materials, nano-scale devices), Consortium for Promotion of Grid Applications in Industry
  • Research topics and groups: electronic structure; magnetic properties; functional nano-molecules (CNT, fullerene, etc.); bio-molecules and molecular electronics; simulation software integration platform; etc.

  25. Example: WP6 and IMS Grid-Enabled Nanotechnology
  • IMS RISM-FMO Grid coupled simulation: RISM (Reference Interaction Site Model, solvent distribution) coupled with FMO (Fragment Molecular Orbital method, solute structure and in-sphere correlation)
  • WP6 will develop the application-level middleware, including the “Mediator” component that couples RISM (on an SMP SC) with FMO (on a Grid cluster) via GridMPI etc., exchanging solvent distribution and solute structure (see the sketch below)
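
A minimal sketch of the mediator's exchange loop, with trivial numeric stand-ins for the RISM and FMO solvers (the real codes and their data formats are far richer): the mediator alternates the two solvers and stops when the exchanged quantities settle.

```python
# Toy coupling loop in the spirit of the "Mediator": RISM yields a solvent
# distribution, FMO yields a solute structure, and the mediator shuttles
# them back and forth until the exchange converges.

def rism_step(solute_structure: float) -> float:
    # pretend the solvent relaxes halfway toward the solute value
    return 0.5 * (solute_structure + 1.0)

def fmo_step(solvent_distribution: float) -> float:
    # pretend the solute responds to the solvent field
    return 0.8 * solvent_distribution

def mediate(max_iter=50, tol=1e-6):
    solute, solvent = 0.0, 0.0
    for i in range(max_iter):
        new_solvent = rism_step(solute)       # would run on the SMP SC
        new_solute = fmo_step(new_solvent)    # would run on the Grid cluster
        if abs(new_solute - solute) < tol:
            return i + 1, new_solute, new_solvent
        solute, solvent = new_solute, new_solvent
    return max_iter, solute, solvent

if __name__ == "__main__":
    iters, solute, solvent = mediate()
    print(f"converged after {iters} iterations: "
          f"solute={solute:.6f}, solvent={solvent:.6f}")
```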

  26. SuperSINET: AON Production Research Network (separate funding)
  ■ 10 Gbps general backbone
  ■ GbE bridges for peer-connection
  ■ Very low latency – Titech-Tsukuba 3-4 ms roundtrip
  ■ Operation of Photonic Cross Connect (PXC) for fiber/wavelength switching
  ■ 6,000+ km dark fiber, 100+ end-to-end lambdas, 300+ Gb/s
  ■ Operational from January 2002 until March 2005
  [Map] Connected sites include Hokkaido U. (nanotechnology for Grid applications), Tohoku U., NIFS, Kyoto U., NAO (OC-48+ transmission for radio telescope data), KEK (DataGrid for high-energy science), Osaka U., Tsukuba U., U. of Tokyo, Kyushu U., Doshisha U., Tokyo Institute of Tech., Nagoya U., Waseda U. (bio-informatics), ISAS, Okazaki Research Institutes, and NIG, with NII providing operation and R&D of middleware for the computational Grid.

  27. SuperSINET: Network Topology (10 Gbps Photonic Backbone Network)
  [Topology map, as of October 2002; source: National Institute of Informatics] Hubs in Tokyo (NII Hitotsubashi, home of the NAREGI Grid R&D; NII Chiba), Nagoya, and Osaka connect Tohoku U, Tsukuba U, Kyushu U, Hokkaido U, Kyoto U (incl. Uji), U Tokyo, KEK, IMS (Okazaki), Osaka U, Doshisha U, Nagoya U, NAO, Titech, NIFS, ISAS, Waseda U, and NIG.

  28. The NAREGI Phase 1 Testbed ($45 mil, 1Q2004): ~3000 procs, ~17 TFlops
  • Total ~6500 procs, ~30 TFlops across sites connected by SuperSINET (10 Gbps MPLS), spanning ~400 km
  • NII (Tokyo), Center for Grid R&D: ~5 TFlops software testbed
  • IMS (Okazaki), Computational Nano-science Center: ~11 TFlops application testbed
  • Small test application clusters (x 6)
  • Partner resources: AIST SuperCluster (~11 TFlops), Titech Campus Grid (~1.8 TFlops), Osaka-U BioGrid, U-Tokyo
  • Note: NOT a production Grid system (c.f. TeraGrid)

  29. NAREGI Software R&D Grid Testbed (Phase 1)
  • Under procurement – installation March 2004; >5 Teraflops
  • 3 SMPs, 128 procs total (64 + 32 + 32): SPARC V + IA64 + Power4
  • 6 x 128-proc PC clusters: 2.8 GHz dual Xeon + GbE (blades) and 3.06 GHz dual Xeon + InfiniBand
  • 10+37 TB file server
  • Multi-gigabit networking and WAN simulation to simulate a Grid environment; NOT a production system (c.f. TeraGrid)
  • To form a Grid with the IMS NAREGI application testbed infrastructure (>10 Teraflops, March 2004) and other national centers via SuperSINET

  30. NAREGI R&D Grid Testbed @ NII

  31. AIST (National Institute of Advanced Industrial Science & Technology) Advanced Computing Center Supercluster
  • Supports Grid technology, life science, and nanotechnology collaborations across academia, government, and corporations (LAN/Internet connectivity to other research institutes)
  • Challenge: huge computing power to support various research, including life science and nanotechnology, within AIST
  • Solution: Linux cluster of IBM eServer 325 nodes – P32: 2116 CPU AMD Opteron; M64: 520 CPU Intel Madison; Myrinet networking; SCore cluster OS; Globus Toolkit 3.0 to allow shared resources
  • World’s most powerful Linux-based supercomputer: more than 11 TFLOPS, ranked as the third most powerful supercomputer in the world
  • Operational March 2004

  32. NII Center for Grid R&D (Jinbo-cho, Tokyo): Mitsui Office Bldg., 14th floor; 700 m² office space (100 m² machine room). [Location map showing Akihabara, the Imperial Palace, and Tokyo Station]

  33. Towards a Petascale Grid – a Proposal: Resource Diversity (松竹梅 “Shou-Chiku-Bai”, the pine-bamboo-plum grading)
  • 松 (“shou”, pine) – ES-like centers: 40-100 TeraFlops x (a few), 100-300 TeraFlops total
  • 竹 (“chiku”, bamboo) – medium-sized machines at supercomputing centers: 5-10 TeraFlops x 5, 25-50 TeraFlops aggregate per center, 250-500 TeraFlops total
  • 梅 (“bai”, plum) – small clusters and PCs spread throughout campus in a campus Grid: x 5k-10k, 50-100 TeraFlops per center, 500 TeraFlops - 1 PetaFlop total (the aggregate across tiers is worked out below)
  • Division of labor between “big” centers like the ES and university centers; large, medium, and small resources
  • Utilize the Grid software stack developed by NAREGI and other Grid projects
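
To make the "towards petascale" claim explicit, the small sketch below sums the three per-tier ranges stated above; the per-tier numbers come straight from the slide, and only the sum is added here.

```python
# Sum of the three tiers' stated ranges (TeraFlops).
TIERS = {
    "pine (ES-like centers)":       (100, 300),
    "bamboo (medium SC machines)":  (250, 500),
    "plum (campus clusters & PCs)": (500, 1000),
}

low = sum(lo for lo, _ in TIERS.values())
high = sum(hi for _, hi in TIERS.values())
print(f"aggregate: {low}-{high} TFlops (~{low/1000:.2f}-{high/1000:.1f} PFlops)")
```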

  34. Collaboration Ideas
  • Data (Grid): NAREGI deliberately does not handle data
  • Unicore components: “Unicondore” (Condor-U, Unicore-C)
  • NAREGI middleware: GridRPC, GridMPI
  • Networking
  • Resource management, e.g. the CIM resource schema
  • International testbed
  • Other ideas? Application areas as well
