Toward Third Generation Desktop Grids (Private Virtual Cluster) Ala Rezmerita, Franck Cappello INRIA Grand-Large.lri.fr
Agenda • Basic Concepts of DGrids • First and second generation Desktop Grids • Third generation concept • PVC • Early evaluation • Conclusion
Basic Concepts of Desktop Grids • Bag of tasks, master-worker, divide and conquer applications • Batch schedulers (clusters) • Virtual machines (Java, .NET) • Standard OS (Windows, Linux, Mac OS X) • Cycle stealing (Desktop PCs) • Condor (single administration domain)
First Generation DG • Single application / Single user • SETI@HOME (1998) • Search for Extra-Terrestrial Intelligence • 33.79 Teraflop/s (12.3 Teraflop/s for the ASCI White!), 2003 • DECRYPTHON • Protein sequence comparison • RSA-155 (1996?) • Breaking encryption keys • COSM
First Gen DG Architecture • Centralized architecture • Monolithic architecture [Figure: architecture diagram. Labels: Client application; Params. / results; Coordinator / Resource Disc.; Parameters; Results; PC; Firewall/NAT. Layer stack: User + Admin interface, Application, Scheduler, Task + Data + Net, OS + Sandbox, Protocols.]
Second Generation of DG • Multi-applications / “Multi-users” platforms: • BOINC (2002?) • SETI@home, Genome@home, XtremLab… • XTREMWEB (2001) • XtremWeb-CH, GTRS, XW-HEP, etc. • Platform (ActiveCluster), United Devices, Entropia, etc. • Alchemi (.NET based)
Second Gen DG Architecture • Centralized architecture (split task/data management, inter-node communication) • Monolithic architecture [Figure: architecture diagram. Labels: Client application; Params. / results; Coordinator / Scheduler (Tasks); Data Manager; Parameters; Results; PC; Firewall/NAT. Layer stack: User + Admin interface, Application, Scheduler, Task + Data + Net, OS + Sandbox, Protocols.]
What we have learned from history • Rigid architecture (Open source: Yes, Modularity: No) • Dedicated job scheduler • Dedicated data management/file system • Dedicated connection protocols • Dedicated transport protocols • Centralized architecture • No direct communication • Almost no security • Restricted application domain (essentially High Throughput Computing)
Third Generation Concept • Modular architecture • Pluggable / selectable job scheduler • Pluggable / selectable data management/file system • Pluggable / selectable connection protocols • Pluggable / selectable transport protocols • Decentralized architecture • Direct communications • Strong security • Unlimited application domain (restrictions imposed only by the platform performance) [Figure: three PCs and the layer stack: User + Admin interface, Applications, Schedulers, Task + Data + Net, OS + Sandbox, Protocols.]
3rd Gen Dgrid: design • User + Admin interface • Applications: binary codes • Scheduler/runtime: Condor, MPI • Data management: Lustre / BitTorrent • OS: Xen / Virtual PC • Connectivity / Security: IP virtualization • Network Virtualization at IP level! [Figure: three PCs sharing this layer stack.]
PVC (Private Virtual Cluster) A generic framework that dynamically turns a set of resources belonging to different administration domains into a cluster • Connectivity/Security • Dynamically connects firewall-protected nodes without breaking the security rules of the local domains • Compatibility • Creates an execution environment for existing cluster applications and tools (schedulers, file systems, etc.)
PVC Architecture Broker: • Collects connection requests • Forwards data between peers • Realizes the virtualization • Helps in the security negotiation Peer's modules: • Coordination • Communication interposition • Network virtualization • Security • Connectivity
Virtualization • Virtual network on top of the real one • Uses a virtual network interface • Provides virtual to real IP address translation • Features an IP range and DNS The proposed solution respects the IP standards • Virtual IP: class E (240.0.0.1 – 255.255.255.254) of IP addresses • Virtual Names: use of a virtual domain name (.pvc)
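The virtual-to-real address translation listed on this slide can be sketched as a small registry. This is a minimal illustration, not PVC's actual code: the `VirtualDirectory` class, its method names, and the sequential allocation policy are assumptions made for the example. Only the class E range and the `.pvc` domain come from the slide.

```python
import ipaddress

# Class E block (240.0.0.0/4), which the slide reserves for virtual addresses.
VIRTUAL_NET = ipaddress.ip_network("240.0.0.0/4")
VIRTUAL_DOMAIN = ".pvc"


class VirtualDirectory:
    """Hypothetical registry mapping virtual names and IPs to real addresses."""

    def __init__(self):
        self._next = int(VIRTUAL_NET.network_address) + 1  # 240.0.0.1
        self._by_name = {}  # "node1.pvc" -> virtual IP
        self._real = {}     # virtual IP -> real IP

    def register(self, host, real_ip):
        """Assign the next free virtual IP to a host and record its real IP."""
        last = int(VIRTUAL_NET.broadcast_address) - 1  # 255.255.255.254
        if self._next > last:
            raise RuntimeError("virtual address range exhausted")
        vip = ipaddress.ip_address(self._next)
        self._next += 1
        self._by_name[host + VIRTUAL_DOMAIN] = vip
        self._real[vip] = ipaddress.ip_address(real_ip)
        return vip

    def resolve(self, name):
        """Virtual DNS lookup: name in the .pvc domain -> virtual IP."""
        return self._by_name[name]

    def to_real(self, vip):
        """Virtual-to-real translation used when a connection is tunneled."""
        return self._real[ipaddress.ip_address(vip)]


def is_virtual(addr):
    """True when the address falls inside the class E virtual range."""
    return ipaddress.ip_address(addr) in VIRTUAL_NET
```

For example, registering `node1` with real address `192.168.1.10` yields the virtual address `240.0.0.1`, and `resolve("node1.pvc")` returns that same address.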
Interposition • Catch the applications’ communications and transform them for transparent tunneling • Interposition techniques: • LibC overload (IP masquerading, modified kernel) • Tun / Tap (encapsulation: IP over IP) • Netfilter (IP masquerading, kernel level) • Any other…
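Interposition techniques [Figure: flowchart of an application packet through the three numbered interposition paths, (1) to (3). In user space, if ld_preload is set, a modified LibC (LibC’) performs security check + connection + IP masquerading; otherwise the packet goes through the standard LibC socket interface. In the kernel, route selection sends 240.X.X.X destinations to the interposition modules: Tun/Tap (security check + connection + encapsulation) or Netfilter (security challenge + connection + IP masquerading). Other labels: Group check; Std network interface.]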
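The decision flow on this slide can be approximated by a small dispatch function. This is one reading of the flowchart, not PVC's implementation: the function name `select_path`, its parameters, and the returned labels are hypothetical.

```python
import ipaddress

# Destinations in the class E range are treated as virtual (240.X.X.X).
VIRTUAL_NET = ipaddress.ip_network("240.0.0.0/4")


def select_path(dest_ip, ld_preload=False, kernel_module="tun/tap"):
    """Return which component handles a packet sent to dest_ip.

    Assumption: non-virtual traffic is never touched, the overloaded LibC
    takes precedence when it is preloaded, and otherwise a kernel-level
    module (Tun/Tap or Netfilter) picks up the packet after route selection.
    """
    if ipaddress.ip_address(dest_ip) not in VIRTUAL_NET:
        return "standard network interface"
    if ld_preload:
        return "libc overload"  # user space: IP masquerading
    if kernel_module not in ("tun/tap", "netfilter"):
        raise ValueError("unknown interposition module: " + kernel_module)
    return kernel_module        # kernel: encapsulation or masquerading
```

A packet to `192.168.0.7` goes straight to the standard interface, while a packet to `240.0.0.9` is handled by whichever interposition technique is active.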
Connectivity Goal: Direct connections between the peers Firewall/NAT traversal techniques: • UPnP - firewall configuration technique • UDP/TCP hole punching - online gaming and voice over IP • Traversing TCP - novel technique • Any other…
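To illustrate why hole punching yields a direct connection, here is a toy simulation of two peers behind NATs. It is a simplified model under stated assumptions: the `Nat` class and `send` helper are hypothetical, ports are ignored, and each NAT accepts inbound traffic only from addresses the inside host has already sent to. It assumes a third party such as the broker has already told each peer the other's public address.

```python
class Nat:
    """Toy NAT: inbound traffic passes only from endpoints we already sent to."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.holes = set()  # remote addresses with an open mapping

    def outbound(self, remote_ip):
        """An outgoing datagram opens a hole for replies from remote_ip."""
        self.holes.add(remote_ip)

    def allows_inbound(self, remote_ip):
        return remote_ip in self.holes


def send(src_nat, dst_nat):
    """Send one datagram from behind src_nat to dst_nat's public address.

    Returns True when the datagram is delivered to the host behind dst_nat.
    """
    src_nat.outbound(dst_nat.public_ip)  # opens a hole on the sender's side
    return dst_nat.allows_inbound(src_nat.public_ip)


# Documentation-range addresses, chosen only for the example.
nat_a = Nat("203.0.113.1")
nat_b = Nat("198.51.100.2")

first = send(nat_a, nat_b)   # dropped by B's NAT, but opens a hole in A's NAT
second = send(nat_b, nat_a)  # passes through the hole A just opened
third = send(nat_a, nat_b)   # now passes too: both directions are open
```

In this model the first datagram is lost, which is why both peers send to each other at about the same time in practice.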
Security • Fulfil the security policy of every local domain • Enforce a cross-domain security policy Master peer in every virtual cluster: • Implements the global security policy • Registers new hosts A PVC peer must: • Check the target peer’s membership in the same virtual cluster • After connection establishment, authenticate the target peer’s identity [Figure: key exchange diagram. Labels: B: PkM[PKC]; Master M: PKM; Put PKC; Get PKM; C; S. Legend: PK = public key, Pk = private key.]
Security • The security protocol is based on a double asymmetric key mechanism • Steps (1)/(2): membership in the same virtual cluster • Steps (3)/(4)/(5): mutual authentication The PVC security protocol ensures that: • Only hosts of the same virtual cluster are connected • Only trusted connections become visible to the application
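The two phases named on this slide (membership check, then mutual authentication) can be sketched with toy RSA signatures. Everything below is an assumption made for illustration: the slides do not give the message format, so this sketch reads PkM[PKC] as the peer's public key signed with the master's private key, and uses a nonce challenge for authentication. The keys are built from small Mersenne primes and are not secure.

```python
import hashlib
import secrets

E = 65537  # common RSA public exponent


def make_key(p, q):
    """Build a toy RSA key pair from two primes (demo only, not secure)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)
    return {"public": (n, E), "private": (n, d)}


def digest(message, n):
    """Hash a byte string to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(private, message):
    n, d = private
    return pow(digest(message, n), d, n)


def verify(public, message, signature):
    n, e = public
    return pow(signature, e, n) == digest(message, n)


def encode(public):
    """Serialize a public key so that it can be signed."""
    return repr(public).encode()


# Mersenne primes keep the demo short; real keys need large random primes.
M61, M89, M107, M127 = (2**k - 1 for k in (61, 89, 107, 127))
master = make_key(M61, M89)
peer_c = make_key(M89, M107)
peer_s = make_key(M107, M127)

# Registration: the master signs each member's public key.
cert_c = sign(master["private"], encode(peer_c["public"]))
cert_s = sign(master["private"], encode(peer_s["public"]))


def same_cluster(master_public, peer_public, cert):
    """Membership check: was this public key registered by our master?"""
    return verify(master_public, encode(peer_public), cert)


def authenticate(peer):
    """Challenge-response: the peer proves it holds the private key."""
    nonce = secrets.token_bytes(16)
    response = sign(peer["private"], nonce)  # computed by the remote peer
    return verify(peer["public"], nonce, response)
```

A connection would be exposed to the application only if `same_cluster` succeeds for the target peer and `authenticate` succeeds in both directions.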
Performance Evaluation • PVC objectives: Minimal overhead for communications • Network performance with/without PVC • Connection establishment • Execution of real applications without modification in MPI • NAS benchmarks • MPIPOV program • Scientific application DOT • Bag of Tasks typical setting (BLAST on DNA database) • Condor flocking strategies • Broadcast protocol • Spot Checking / Replication + voting • Evaluation platforms • Grid’5000 (Grid eXplorer Cluster) • DSL-Lab (PC @ home, connected on Public ADSL network)
Communication perf. Connection overhead Direct Communication overhead
Connection overhead Performed on the DSL-Lab platform using a specific test suite Reasonable overhead in the context of P2P applications
Bandwidth overhead Technique: LibC overload Performed with Netperf on a local PC cluster with three different Ethernet networks: 1 Gbps, 100 Mbps and 10 Mbps [Chart: data labels 717 (5), 715 (5), 720 (5)]
Communication overhead Techniques: Tun / Tap, Netfilter
MPI applications Applications: • NAS benchmarks class A (EP, FT, CG and BT) • DOT • MPIPOV Results for NAS EP: • Measured acceleration is almost linear • Overhead lower than 5% Other experiments: performance losses due to the ADSL network
Typical configuration for Bag of tasks (Seti@home like) • Application: BLAST (DNA) • Result certification • Scheduler/runtime: Condor • Data management: BitTorrent • OS • Connectivity / Virtualization: PVC • Network Using PVC! [Figure: three PCs sharing this layer stack.]
Broadcast protocol? Question: BitTorrent instead of the Condor transport protocol? • BLAST application • DNA Database (3.2 GB) • 64 nodes Condor exec. time grows proportionally with #jobs Condor + BitTorrent exec. time stays almost constant
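A back-of-the-envelope model shows why the two curves differ in shape. It is an idealized sketch, not the measured experiment: the function names and the link rates are hypothetical, the central case assumes one server sends a full copy per transfer, and the swarm case assumes peers exchange pieces well enough that each node is limited only by its own link.

```python
import math


def central_transfer_time(size_gb, n_transfers, server_gbps):
    """Seconds for one server to push n full copies through its own uplink."""
    return n_transfers * size_gb * 8 / server_gbps


def swarm_transfer_time(size_gb, node_gbps):
    """Seconds in an idealized swarm: independent of the number of nodes."""
    return size_gb * 8 / node_gbps


# 3.2 GB database and 64 nodes come from the slide; 1 Gbps links are assumed.
central = central_transfer_time(3.2, 64, 1.0)
swarm = swarm_transfer_time(3.2, 1.0)
ratio = central / swarm
```

Under these assumptions the central transfer time scales linearly with the number of copies, while the swarm time does not depend on it.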
Distribution of job management? Question: how many job managers for a set of nodes? • Condor Flocking • 64 sequences of 10 BLAST jobs (between 40 and 160 seconds, with an average of 70 seconds)
Result certification? Question: how to detect bad results? • Spot Checking (black listing) • Replication + voting • Not implemented in Condor • 20 lines of script for both • Test: 70 machines • 10% saboteurs (randomly chosen) • How many jobs are required to detect the 7 saboteurs?
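Both certification techniques fit in a few lines, as the slide suggests. The sketch below is a hypothetical simulation, not the authors' script: `run_job`, the assumption that saboteurs always return a wrong result, and the random worker selection are all made up for the example. The count it produces depends on the random seed and says nothing about the measured results.

```python
import random
from collections import Counter


def run_job(worker, job, saboteurs):
    """Hypothetical job: honest workers return job * 2, saboteurs return -1."""
    return -1 if worker in saboteurs else job * 2


def majority_vote(results):
    """Replication + voting: accept the most common result, flag dissenters."""
    winner, _ = Counter(results.values()).most_common(1)[0]
    return winner, {w for w, r in results.items() if r != winner}


def spot_check(workers, saboteurs, seed=0):
    """Spot checking: send jobs with known answers, blacklist wrong replies."""
    rng = random.Random(seed)
    blacklist, jobs = set(), 0
    while blacklist != saboteurs:
        jobs += 1
        worker = rng.choice([w for w in workers if w not in blacklist])
        if run_job(worker, jobs, saboteurs) != jobs * 2:
            blacklist.add(worker)
    return jobs, blacklist


# Setting from the slide: 70 machines, 10% saboteurs chosen at random.
workers = list(range(70))
saboteurs = set(random.Random(1).sample(workers, 7))
jobs_needed, found = spot_check(workers, saboteurs)
```

Voting needs no known answers but can blacklist an honest worker when saboteurs hold the majority of a replica group, which is one reason to combine it with spot checking.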
Applications • Computational/Data Desktop Grids • BOINC/XtremWeb-like applications • User-selected scheduler (Condor, Torque, OAR, etc.) • Communication between workers (MPI, distributed file systems, distributed archive, etc.) • “Instant Grid” • Connecting resources effortlessly: family Grid, Inter-School Grids, etc. • Sharing resources and content. Example: Apple TV synchronized with a remote iTunes • “Passe-muraille” (walk-through-walls) runtime • OpenMPI • Extended Clusters • Run applications and manage resources beyond the limits of admin domains
Conclusion Third Generation Desktop Grids (2007)… • Break the rigidity, again! • Let users choose and run their favourite environments (Engineers may help) • PVC: connectivity + security + compatibility • Dynamically establishes virtual clusters • Modular, extensible architecture • Features properties required for 3rd Gen. Desktop Grids • Security model + Use of applications without any modification + Minimal communication overhead • Ongoing work (on Grid’5000) • Test the scalability and fault tolerance of cluster tools in the Dgrid context • Test more applications • Test & improve the scalability of the security system
Condor flocking with PVC Question: Can we use several schedulers? • Use of a synthetic job that consumes resources (1 sec.) • Sequence of 1000 submissions • Submits the synthetic jobs from the same host to the Condor pool • Future work: make Condor Flocking fault tolerant.