Production Grid Challenges in Hungary Péter Stefán Ferenc Szalai Gábor Vitéz NIIF/HUNGARNET
Agenda • Brief introduction • Grid initiatives - ClusterGrid • Challenges in a production environment • Generic ClusterGrid operation model • Management issues • User support • Monitoring • ClusterGrid future challenges • Conclusions
Brief NIIF Introduction • Hungarian NREN. • Infrastructure layers: collaborative, computing, middleware, networking. • Services: videoconference; central HA cluster, grid, supercomputing; VPNs, VoIP, directory service; IP, IPv6, MPLS, lambda, etc. • 10G backbone, ~600,000 users, ~750 institutions.
Supercomputers • 2 SUN E15Ks and 2 SUN 10Ks located at two universities, with 276 CPUs and 300 GB of memory in total. • Used to be in the top 500. • In production since 2001. • Serve more than 200 users and 100 scientific projects.
Hungarian grid initiatives, MGKK • Hungarian grid initiatives fall into two classes: grid infrastructure projects and grid system development projects. • The key players have formed a grid collaboration, the Hungarian Grid Competence Center (MGKK), involving BUTE, ELUB, MTA-SZTAKI, NIIF/HUNGARNET, KFKI, and the University of Veszprém. • Intensive participation in many national and European grid initiatives: EGEE, NorduGrid, SEE-GRID, etc.
ClusterGrid initiative • A pool of 1400 PC nodes throughout the country, organized into more than 26 clusters. • Production infrastructure since July 2002. • Supercomputer clusters are planned to be included too. • Total compute capacity is roughly 600 Gflops. • Although it is much smaller than regional or continental grids, its complexity is in the same range.
Challenges in production environment • Grid definition - set clear objectives for what to build • Simplicity - keep the system transparent and usable • Completeness - cover more than just the application level • Security - using computer networking methods (MPLS, VLAN technologies) • Compatibility - with other grids (X509, LDAP) • Manageability - easy to maintain • Robustness - fault-tolerant behavior • Usability - cover many job classes, user support • Platform independence - to be able to execute on MS
Some new ideas… • MPLS, VLAN connected resources • Web-transaction based resource broker • Dynamic, separated run-time environment
Challenges in production … cont’d … • Management • physical compute resources (supercomputers, clusters), • virtual resources (virtual clusters), • storage nodes, • users, • services • User support • Grid architecture monitoring
Storage management • Low level: management of disks, volumes, and file systems (cost-efficient storage solutions using ATA over Ethernet, AoE). • Medium level: access management (GridFTP, FTPS). • High level: data brokering (extended SRM model).
User management • Users' personal data is kept in an LDAP-based directory service, separately from authentication data. • Registration is aided by a web interface. • Authentication: • X509 certificates, • LDAP-based authentication. • No authorization yet.
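As a minimal sketch of the separation described on this slide, the following Python keeps personal data and authentication data in two stores keyed by the same user ID. All names (`PersonalRecord`, `AuthRecord`, `UserDirectory`) and fields are hypothetical. In-memory dictionaries stand in for the LDAP directory, and the certificate check is reduced to comparing a subject DN string, which is not real X509 validation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PersonalRecord:
    """Personal data, as it might be kept in the directory (hypothetical fields)."""
    uid: str
    full_name: str
    institution: str


@dataclass
class AuthRecord:
    """Authentication data, stored apart from personal data (hypothetical fields)."""
    uid: str
    cert_subject_dn: Optional[str] = None  # subject of the user's certificate


class UserDirectory:
    """In-memory stand-in for the two stores; a real deployment would use LDAP."""

    def __init__(self) -> None:
        self._personal: Dict[str, PersonalRecord] = {}
        self._auth: Dict[str, AuthRecord] = {}

    def register(self, person: PersonalRecord, cert_subject_dn: str) -> None:
        # Mimics the web registration step: one entry goes into each store.
        self._personal[person.uid] = person
        self._auth[person.uid] = AuthRecord(person.uid, cert_subject_dn)

    def authenticate(self, presented_dn: str) -> Optional[str]:
        # Returns the uid whose registered subject matches, else None.
        # Only authentication is modelled; there is no authorization step.
        for record in self._auth.values():
            if record.cert_subject_dn == presented_dn:
                return record.uid
        return None

    def personal_data(self, uid: str) -> Optional[PersonalRecord]:
        return self._personal.get(uid)
```

Keeping the two stores apart means an authentication lookup never needs to read personal data, and vice versa.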
Service management (experimental) • Relatively new direction. • It is a special service. • It is based on well-established authorization. • It helps to start, stop, and (re)configure grid services.
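The start/stop/(re)configure idea could be sketched as follows. This is an illustration under assumptions, not the ClusterGrid implementation: `ServiceManager`, `State`, and the notion of a fixed set of authorized operators are all hypothetical, and services are tracked only as in-memory state.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


class ServiceManager:
    """Hypothetical manager: every operation is gated by an authorization check."""

    def __init__(self, operators: Set[str]) -> None:
        self._operators = set(operators)  # users allowed to manage services
        self._state: Dict[str, State] = {}
        self._config: Dict[str, Dict[str, str]] = {}

    def _check(self, user: str) -> None:
        if user not in self._operators:
            raise PermissionError(f"{user} may not manage services")

    def start(self, user: str, service: str) -> None:
        self._check(user)
        self._state[service] = State.RUNNING

    def stop(self, user: str, service: str) -> None:
        self._check(user)
        self._state[service] = State.STOPPED

    def reconfigure(self, user: str, service: str, settings: Dict[str, str]) -> None:
        # Merges new settings into the stored configuration; does not restart.
        self._check(user)
        self._config.setdefault(service, {}).update(settings)

    def state(self, service: str) -> State:
        # Services that were never started count as stopped.
        return self._state.get(service, State.STOPPED)

    def config(self, service: str) -> Dict[str, str]:
        return dict(self._config.get(service, {}))
```

For example, `ServiceManager({"admin"}).start("guest", "broker")` raises `PermissionError`, because `guest` is not in the operator set.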
User support • Grid service provider gives user support covering: • consultation about the benefits of grid usage, • code porting and optimization, • partial aid in code implementation, • job formation and execution, • generic grid usage. • Not yet covered: • model creation, • formal description, • algorithm creation.
ClusterGrid monitoring • Fluctuation of grid cluster resources between the day-shift and night-shift operation. • Blue line – total; Green area – occupied. • 2-layer hierarchical monitoring system.
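A two-layer hierarchy of this kind might look like the sketch below. It assumes, without confirmation from the slides, that the lower layer is one monitor per cluster and the upper layer is a grid-wide aggregator. `NodeStatus`, `ClusterMonitor`, and `GridMonitor` are hypothetical names. The two numbers returned correspond to the "total" and "occupied" series in the plot.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeStatus:
    name: str
    online: bool  # node is currently part of the grid (e.g. night shift)
    busy: bool    # node is running a grid job


class ClusterMonitor:
    """Lower layer (assumed): summarizes the nodes of a single cluster."""

    def __init__(self, cluster: str, nodes: List[NodeStatus]) -> None:
        self.cluster = cluster
        self.nodes = nodes

    def summary(self) -> Tuple[int, int]:
        # Returns (total online nodes, occupied online nodes).
        total = sum(1 for n in self.nodes if n.online)
        occupied = sum(1 for n in self.nodes if n.online and n.busy)
        return total, occupied


class GridMonitor:
    """Upper layer (assumed): aggregates the per-cluster summaries."""

    def __init__(self, clusters: List[ClusterMonitor]) -> None:
        self.clusters = clusters

    def summary(self) -> Tuple[int, int]:
        per_cluster = [c.summary() for c in self.clusters]
        total = sum(t for t, _ in per_cluster)
        occupied = sum(o for _, o in per_cluster)
        return total, occupied
```

Because the upper layer only consumes per-cluster summaries, it never needs to contact individual nodes, which keeps monitoring traffic proportional to the number of clusters.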
Future ClusterGrid (?) challenges • Continuously growing demand for reliable compute and data storage infrastructure. • Grid systems should conform to international standards and MUST interoperate with one another. • Platform independence is not an issue yet, but will be. • LEGO-based principles are of increasing importance. • Threats: solutions that prevent development; erosion of the belief in the power of “grid”.
Conclusions • One of the first production-level grids has been presented in a nutshell. • Special emphasis was placed on operation, management, and user support issues. • Management generally covers grid resource management, grid user management, and monitoring. • Some remarks regarding future development were also made.
Thanks for your attention! www.clustergrid.hu www.mgkk.hu grid-tech@niif.hu