Clustering Technology Overview
Clustering with Linux
Sungho Kim, Ph.D., President, KESPER Inc.

Agenda
• Linux Overview
• Linux Kernel Features
• Linux Network Protocols Overview
• Linux Clustering Overview
• Network Protocols for Clustering
• HPC Cluster
• Internet Cluster
• HA Cluster
• Conclusions
Linux Overview
• Multi-user, multitasking, Unix-like OS
• Multi-architecture, multi-platform OS
• Freely distributable open source OS: GNU software
• IEEE POSIX compliance
• Support for a wide range of peripherals
• Wide configurability: from embedded systems to supercomputers
• X Window support
• Full network awareness, with support for various network protocols: TCP/IP, IPX/SPX, AppleTalk, Samba, NFS, Web, Mail, etc.
• Full 32/64-bit support
Linux Hardware Systems
Systems
• IBM PC and compatibles
• Apple Macintosh: from m68k to PowerPC
• Sun
• SGI
• Atari/Amiga
• Compaq Alpha
• Netwinder
CPU
• Intel x86, AMD, Cyrix
• Alpha EV5, EV6 (64-bit)
• PowerPC
• SPARC, UltraSPARC (64-bit)
• M68k
• StrongARM / ARM
• MIPS
Linux & Network
Network Applications
• Web Server: Apache, Netscape
• DHCP Server: dhcpd
• FTP Server: proftpd (ftp, ncftp)
• Mail Server: sendmail / pine, mutt, elm
• pop3 / imap / procmail
• Mailing list: majordomo
• Chatting Server: irc (bitchx, irc)
• File Server: Samba
• News Server: innd (tin, pine, trn)
• DNS Server: bind
• NIS Server: NIS
Network Interface Cards
• 10/100/1000 Mb/s Ethernet
• Myrinet
• ATM
• Token Ring / FDDI / HIPPI
• ARCnet
• ISDN
• X.25
• Frame Relay
• Fibre Channel
• WAN
Kernel Features
Kernel Options of 2.2.x
• Code maturity level options
• Processor type and features
• Loadable module support
• General setup
• Plug and Play support
• Block devices
• Networking options
• SCSI options
• SCSI low-level drivers
• Network device support
• Amateur Radio support
• ISDN subsystem
• CD-ROM drivers
• Character devices
• Mice
• Video for Linux
• Joystick support
• Ftape, the floppy tape device driver
• File systems
• Network File Systems
• Partition types
• Console drivers
• Sound
• Kernel hacking
Specific Features
Status of Kernel 2.4-test
• USB support
• Logical Volume Manager
• Ext3 journaling file system
• IrDA driver updates
• GNU as (gas) used instead of as86
• Athlon support
• QuickCam support
• XFree86 DRI (Direct Rendering Infrastructure)
• Kernel HTTPD support
• Direct decompression from Flash or ROM
• I2O driver updates
• DVD filesystem (UDF) support
Network Protocols
Supported Kernel Network Features
• TCP/IP protocol
• IPX
• Multicasting (MBONE)
• Tunneling (GRE / Mobile-IP)
• VPN
• Advanced router
• WAN router (WAN card + Linux)
• Frame Relay / X.25 / leased line
• HIPPI (cluster and supercomputer)
• Token Ring
• IP masquerading (NAT)
• IP alias (virtual IP / virtual domain)
• Bridging (bridging / load balancing)
• ISDN / xDSL
Network Protocols
Linux Networking: Other Network Protocols
• EQL (serial line load balancing)
• SLIP (Serial Line Internet Protocol)
• CSLIP (Compressed Serial Line Internet Protocol)
• PPP (Point-to-Point Protocol)
• PLIP (Parallel Line Internet Protocol)
• X.25: PLP (Packet Layer Protocol)
• HIPPI (High Performance Parallel Interface)
• FDDI (Fiber Distributed Data Interface)
• IPv6 (IPng): experimental
• ARCnet
• SNMP
Cluster System Overview
Category of Cluster Systems
Categories depend on the configuration method and the application area.
• HPC Cluster: computation-intensive
• Bulk Storage Cluster: stored data sharing and service
• Web/Internet Cluster: network load distribution and LB
• HA Cluster: increases the availability of systems
Components: Network + OS + Storage + API
HPC: High Performance Computing; HA: High Availability; LB: Load Balancer
Linux Cluster Network
Filtering
• IP packet filtering
• Linux socket filtering (BSD socket filtering)
• Unix domain socket filtering (X Window, syslog)
• Firewall packet filter / IP masquerading
IP: kernel-level autoconfiguration
• Network booting
• X terminal
• TFTP / BOOTP / RARP
IP Tunneling
• Encapsulating data of a protocol
• VPN (Virtual Private Network)
• GRE tunneling: Generic Routing Encapsulation, Cisco routers
• Mobile-IP for laptops
IP Firewalls / Masquerading
• NAT (Network Address Translation)
• Modified firewall
• IP auto forward
• IP port forward
Linux HPC Cluster
Clustering Technology
• A group of computers, connected by pre-configured networks, that executes jobs in parallel
• Beowulf: Linux-based cluster
Characteristics of Clusters
• High availability and expandability
• High performance/price ratio
• Personal supercomputer
Linux HPC Cluster
Components of Clusters
Hardware
• CPU: Intel Pentium, Digital Alpha, Mac G3
• Network: Ethernet, Myrinet, ATM, Gigabit Ethernet
• Storage: Fibre Channel / SCSI RAID
Software
• Operating System & Compiler: Linux, Windows NT, DEC OSF
• Communication Library: PVM, MPI
• Administration Tool: CMS
• Queuing Software: DQS, PBS
• Application Libraries: BLACS, ATLAS, ScaLAPACK, PBLAS
Linux HPC Cluster
AVALON - Los Alamos National Lab.
Configuration of Hardware Systems
Network Configuration
• 3Com SuperStack II 3900 36-port Fast Ethernet switches
• 3Com SuperStack II 9300 12-port Gigabit Ethernet switch
• 4 x 3900 + 1 x 9300 = switched network of 144 Fast Ethernet ports
• Cost: about $300 per port
• Cyclades multiport serial switches
Node Configuration (140 nodes)
• 533 MHz Alpha 21164A microprocessor
• DEC AlphaPC 164LX motherboard
• ECC SDRAM DIMMs (256 MB total per node)
• Quantum Fireball ST3.2A 3079 MB EIDE U-ATA drive
• Kingston Ethernet card with a DEC Tulip chipset
Linux HPC Cluster
[Figure: Avalon network layout. One 3Com 9300 Gigabit Ethernet switch connected to four 3900 switches; node labels in the figure: Node 0, Node 35, Node 70, Node 105, Node 140.]
Linux HPC Cluster
Software Configurations
• OS: Red Hat Linux 5.0, kernel 2.1.125
• MPICH and its own basic set of MPI routines
• Compiler: egcs 1.1b
Application Programs
• SPaSM
• Gravitational tree code
Linux HPC Cluster
Performance (113/500)
                          70 nodes       140 nodes
Linpack benchmark         19.7 GFlops    47.7 GFlops
SPaSM                     12.8 GFlops    29.6 GFlops
Gravitational treecode    10.0 GFlops    -
Price vs Performance
• Price of Avalon: $313,000
• Avalon’s performance = 64-CPU 195 MHz SGI Origin 2000 (SPaSM, tree code, and Linpack)
• Price of 64-CPU SGI Origin 2000 = over 100 M$
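The price/performance point can be made concrete with a little arithmetic. The sketch below uses only the figures quoted on this slide and assumes that the $313,000 price refers to the full 140-node system; the helper names are illustrative and not part of any Avalon tooling.

```python
# Figures copied from the slide; function and variable names are illustrative.
AVALON_PRICE_USD = 313_000  # assumed to be the price of the 140-node system
GFLOPS = {
    "Linpack": {70: 19.7, 140: 47.7},
    "SPaSM": {70: 12.8, 140: 29.6},
}


def usd_per_gflops(price_usd, gflops):
    """Dollars spent per sustained GFlops."""
    return price_usd / gflops


def speedup(results, small=70, large=140):
    """Ratio of measured performance going from `small` to `large` nodes."""
    return results[large] / results[small]


for name, results in GFLOPS.items():
    cost = usd_per_gflops(AVALON_PRICE_USD, results[140])
    print(f"{name}: {cost:,.0f} USD/GFlops at 140 nodes, "
          f"{speedup(results):.2f}x from 70 to 140 nodes")
```

The 70-node and 140-node runs were not necessarily made with the same problem size, so the speedup ratio is only a rough indicator of scaling.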
Network Configuration
Simple Network Connection
• Nodes have Internet IP addresses
[Figure: Internet / Intranet, DSU/Router, LAN/WAN, Server 1 ... Server n (Cluster Server Farm)]

Network Configuration
Double Network Connection
• Nodes have Internet IP addresses and local IP addresses
[Figure: Internet / Intranet, DSU/Router, LAN/WAN, Server 1 ... Server n, Second-layer Network (Cluster Server Farm)]

Network Configuration
Double Network Connection + Master-Slave (NAT) Configuration
• Nodes have local IP addresses
[Figure: Internet / Intranet, DSU/Router, Master Server, LAN/WAN, Slave Server 1 ... Slave Server n, Second-layer Network (Cluster Server Farm)]
Crossbar Inter-connection
Second-layer network with crossbar connection on a 32-node cluster (16 nodes + 16 nodes)
• 32 Host Bus Adapters
• 12 Switches
• 64 Cables
I/O Connection
Keyboard-Video-Mouse and Disk IO Connection
[Figure labels: Master and Node 0 to Node 6, each with an IO controller; SCSI / FC; RAID Controller(0), RAID Controller(1); Console Splitter / Switcher; Monitor, Keyboard, Mouse]
Internet Cluster
Virtual Internet Cluster Server
• A scalable and highly available server built on a cluster of real servers
• The architecture of the cluster is transparent to end users, who see only a single virtual server.
Methods to Build a Virtual Internet Cluster Server
• Virtual Server via NAT
• Virtual Server via IP tunneling
• Virtual Server via IP filtering
• Virtual Server via Direct Routing
Ref : www.linux-vs.org
Internet Cluster
Virtual Internet Cluster Server via NAT
• This is done by network address and port translation.
• The implementation reuses the Linux IP masquerading code and the port forwarding code.
• See the ipfwadm command.
• The whole process is shown in the figures that follow.
Internet Cluster
[Figure: Virtual Cluster Server via NAT. Labels: User, Internet, Intranet, DSU/Router, L4 Switch, Load Balancer (Linux Box), LAN/WAN, Real Server 1 ... Real Server n]
Internet Cluster
How This Cluster Works (Virtual Cluster Server via NAT)
(1) The user's requests arrive through the DSU/Router.
(2) The load balancer (Linux box) schedules the requests and rewrites the packets.
(3) A real server on the LAN/WAN processes the requests.
(4) The load balancer rewrites the replies.
(5) The replies go back to the user.
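The five steps can be sketched as a small in-memory simulation. This is a toy model and not the Linux masquerading code: the `Packet` and `NatLoadBalancer` names are hypothetical, the addresses are documentation examples, and round-robin scheduling is an assumption, since the slide does not say which scheduler is used.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src: tuple      # (ip, port)
    dst: tuple      # (ip, port)
    payload: str


class NatLoadBalancer:
    def __init__(self, virtual_addr, real_servers):
        self.virtual_addr = virtual_addr
        self._next_server = cycle(real_servers)  # round-robin: an assumption
        self._connections = {}                   # client addr -> real server

    def forward_request(self, pkt):
        """Step 2: pick a real server and rewrite the destination address."""
        if pkt.dst != self.virtual_addr:
            raise ValueError("packet is not addressed to the virtual server")
        server = self._connections.get(pkt.src)
        if server is None:
            server = next(self._next_server)
            self._connections[pkt.src] = server
        return replace(pkt, dst=server)

    def forward_reply(self, pkt):
        """Step 4: make the reply appear to come from the virtual server."""
        if self._connections.get(pkt.dst) != pkt.src:
            raise ValueError("no matching connection for this reply")
        return replace(pkt, src=self.virtual_addr)


lb = NatLoadBalancer(("203.0.113.10", 80), [("10.0.0.1", 80), ("10.0.0.2", 80)])
request = Packet(("198.51.100.7", 40000), ("203.0.113.10", 80), "GET /")  # step 1
to_server = lb.forward_request(request)                                   # step 2
reply = Packet(to_server.dst, to_server.src, "200 OK")                    # step 3
to_client = lb.forward_reply(reply)                                       # steps 4-5
print(to_server.dst, to_client.src)
```

Because every reply passes back through the load balancer, the real servers can keep private addresses, at the cost of the balancer handling traffic in both directions.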
Internet Cluster
Virtual Internet Cluster Server via IP Tunneling
• IP tunneling (IP encapsulation) is a technique that encapsulates an IP datagram within another IP datagram, which allows datagrams destined for one IP address to be wrapped and redirected to another IP address.
• IP encapsulation is now commonly used in extranets, Mobile-IP, IP multicast, and tunneled hosts or networks.
• The load balancer encapsulates the packet and forwards it to the server. When the server receives the encapsulated packet, it decapsulates the packet, processes the request, and finally returns the result directly to the user.
• See the NET-3-HOWTO document.
Internet Cluster
[Figure: Virtual Cluster Server via IP Tunneling. Labels: User, Internet, DSU/Router, Load Balancer (Linux Box), IP tunnels over the LAN/WAN to Real Server 1 ... Real Server n. The virtual IP address is assigned to the real servers, and replies go to the user directly.]

Internet Cluster
Virtual Cluster Server via IP Tunneling
(1) The user's requests arrive through the DSU/Router.
(2) The load balancer (Linux box) encapsulates the packets.
(3) The real server, which has the virtual IP address assigned, de-encapsulates the packets and replies to the user.
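Encapsulation and de-encapsulation can be sketched at the byte level. The code below builds minimal IPv4 headers by hand and wraps one datagram inside another using IP protocol number 4 (IP-in-IP). It is an illustration of the idea rather than the kernel's tunnel driver; the addresses and function names are made up for the example, and the inner payload stands in for a real TCP segment.

```python
import socket
import struct

IPPROTO_IPIP = 4  # protocol number for IP-in-IP encapsulation
IPPROTO_TCP = 6


def checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_packet(src: str, dst: str, proto: int, payload: bytes) -> bytes:
    """Build a 20-byte IPv4 header (no options) followed by the payload."""
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + len(payload), 0, 0, 64, proto, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    # Fill in the header checksum, which lives in bytes 10-11.
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + payload


def encapsulate(inner: bytes, balancer_ip: str, real_server_ip: str) -> bytes:
    """Load balancer side: wrap the original datagram in a new IP header."""
    return ipv4_packet(balancer_ip, real_server_ip, IPPROTO_IPIP, inner)


def decapsulate(outer: bytes) -> bytes:
    """Real server side: strip the outer header and recover the datagram."""
    if outer[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    header_len = (outer[0] & 0x0F) * 4
    return outer[header_len:]


# Client -> virtual IP; the payload is a placeholder, not a real TCP segment.
inner = ipv4_packet("198.51.100.7", "203.0.113.10", IPPROTO_TCP, b"GET /")
outer = encapsulate(inner, "192.0.2.1", "192.0.2.21")
print(decapsulate(outer) == inner)
```

The inner datagram still carries the virtual IP as its destination, which is why the real server needs that address configured and can answer the user without going back through the load balancer.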
Storage Cluster
• The network is configured with one of the virtual cluster server techniques.
• The disk storage is connected through Fibre Channel, including SAN file systems.
[Figure: Linux Storage Cluster with GFS or SAN. Labels: Internet, DSU/Router, LAN/WAN, FC Switch, Fibre RAID Storage; Fibre Channel half-duplex: 100 MBytes/sec, full-duplex: 200 MBytes/sec]

Storage Cluster
SAN (Storage Area Network)
• Scalability: 125 disks with one controller
• Ease of management
• Fast disk I/O speed: 100 MBytes/sec (half-duplex), 200 MBytes/sec (full-duplex)
• Long distance: over 10 km (fiber-optic cable)
[Figure: Linux Storage Cluster with GFS or SAN. Labels: Fibre Channel Switch, Fibre RAID Storage]
Storage Cluster
File Systems Supported by Linux
• ext2/ext3 file systems
• ISO 9660 (CD-FS)
• VFAT / FAT
• SMB (CIFS)
• UFS
• NTFS
• UDF (DVD-FS)
• NFS / Coda
• LVM (Logical Volume Manager)
• GFS (Global File System)
• ReiserFS (journaling file system), SGI XFS, IBM JFS
• RIO (Raw IO)
• RAMFS
• ROMFS
GFS Storage Cluster
Feature Overview of GFS
• The Global File System (GFS) allows multiple Linux machines to share storage devices over a network.
• Each machine sees the network disks as local, and GFS itself appears as a local file system.
• Writes to a file by one Linux machine are seen by another machine that later reads that file.
GFS Cluster Configuration Normal Configuration
GFS Cluster Configuration Complex Configuration Cross-bar FC connection
GFS Cluster Configuration NFS Configuration GFS Configuration Hybrid Configuration
High Availability Cluster
Enterprise Server Requirements
[Figure labels: Reliability, Availability, Serviceability; Stand-alone, HA Server, Cluster, Fault-Tolerant, Non-Stop]
High Availability Cluster
Server Downtime
Unplanned Downtime
• Hardware fault
• Software fault
Planned Downtime
• Hardware exchange
• Hardware upgrade
• O/S upgrade
• Software upgrade
Cost due to Downtime
High Availability Cluster Comparison of Availability
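The comparison figure for this slide is not recoverable from the text, so the following is only a generic illustration of how availability levels translate into downtime. It uses the standard definition availability = MTBF / (MTBF + MTTR); the percentages are common examples, not values taken from the slide.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours, mttr_hours):
    """Fraction of time a system is up, from mean time between failures
    and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail):
    """Expected yearly downtime for a given availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Illustrative availability levels (not from the slide).
for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct / 100)
    print(f"{pct}% available -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why both planned and unplanned downtime have to be counted.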
High Availability Cluster
Concept of HA System
[Figure labels: Dual network for response; Heartbeat; Active or standby systems; Dual IO connection for storage; Shared storage]

High Availability Cluster
Lines of Heartbeat
• Serial connection
• TCP/IP over LAN
• Shared SCSI
Components of HA
• Redundant systems
• All connectable lines
• Shared disks
• Filesystem
• Management software, including a heartbeat-checking daemon
[Figure labels: Dual network for response; Heartbeat; Active or standby systems; Dual IO connection for storage; Shared storage]
Ref : www.linux-ha.org
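The role of the heartbeat-checking daemon can be sketched as a timeout check on the standby node. This is a simplified model and not the linux-ha software: the `HeartbeatMonitor` class, its method names, and the 3-second timeout are assumptions made for the example, and the clock is injectable so the behaviour can be shown without waiting.

```python
import time


class HeartbeatMonitor:
    """Standby-side view of the active node's heartbeat."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()
        self.role = "standby"

    def beat(self):
        """Record a heartbeat received on any of the heartbeat lines."""
        self._last_seen = self._clock()

    def check(self):
        """Promote this node to active if the peer has been silent too long."""
        silent_for = self._clock() - self._last_seen
        if self.role == "standby" and silent_for > self.timeout_s:
            # Real HA software would also take over the service IP address
            # and the shared storage at this point.
            self.role = "active"
        return self.role


now = [0.0]  # fake clock so the example runs instantly
monitor = HeartbeatMonitor(timeout_s=3.0, clock=lambda: now[0])
now[0] = 2.0
monitor.beat()
now[0] = 4.0
print(monitor.check())  # standby: only 2 s since the last heartbeat
now[0] = 8.0
print(monitor.check())  # active: 6 s of silence exceeds the timeout
```

Using several independent heartbeat lines, as the slide lists, reduces the chance that a single broken cable is mistaken for a failed node.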
Concluding Remarks
Linux Internet Cluster Products
• Wyz Cluster
• DR Cluster
• Mission Critical Linux
• Red Hat: piranha, High Availability Server
• Turbo Linux: Turbo Cluster Server
• VA Linux: VACM
• Legato Cluster
• Veritas
• ...etc...
Linux clustering is a starting point from which Linux can enter the enterprise market. Until now, however, clustering technology has mainly been a concern of technical development groups such as research institutes and universities.
Concluding Remarks
Why Linux Cluster?
• Cost-effective and easy to configure
• Fast technical development with open source
• Many references in various fields
Future Needs
• New network configurability and TCP/IP stack performance
• High availability for enterprise markets
• Cluster filesystem and disk I/O performance
• High-performance peripheral drivers
• Stable management and scheduler
Concluding Remarks
Do Not Believe the Myths!
• Is clustering technology mature enough?
• Have ease of use and stability been achieved?
• Is clustering a big market? If so, in which fields?
• Is Linux in the enterprise market? If not, is it in backend systems?
• Can Linux vendors maintain their advantages?
KESPER Inc. RM 803 DongA Officetel BongMyeong-Dong YuSeong-Gu Taejeon 305-709 Republic of Korea (South Korea) Tel. 82-42-828-7458 Fax 82-42-828-7455 Sungho Kim, President/CEO shkim@kesper.co.kr or sungho@kesper.co.kr