High Performance Communication for Oracle using InfiniBand

Session id: #36568. High Performance Communication for Oracle using InfiniBand. Ross Schibler, CTO, Topspin Communications, Inc.; Peter Ogilvie, Principal Member of Technical Staff, Oracle Corporation. Session topics: Why the Interest in InfiniBand Clusters; InfiniBand Technical Primer

Presentation Transcript


  1. Session id: #36568. High Performance Communication for Oracle using InfiniBand. Ross Schibler, CTO, Topspin Communications, Inc.; Peter Ogilvie, Principal Member of Technical Staff, Oracle Corporation

  2. Session Topics • Why the Interest in InfiniBand Clusters • InfiniBand Technical Primer • Performance • Oracle 10g InfiniBand Support • Implementation details

  3. Why the Interest in InfiniBand • InfiniBand is a key new feature in Oracle 10g: it enhances price/performance and scalability and simplifies systems • InfiniBand fits the broad movement towards lower costs: horizontal scalability, converged networks, system virtualization...grid • Initial DB performance & scalability data is superb: network tests done; application-level benchmarks now in progress • InfiniBand is a widely supported standard, available today: Oracle, Dell, HP, IBM, Network Appliance, Sun and ~100 others involved • Tight alliance between Oracle and Topspin enables IB for 10g: integrated & tested; delivers the complete Oracle “wish list” for high speed interconnects

  4. System Transition Presents Opportunity [Chart: Server Revenue Mix. Share of revenues by price band ($0-2.9K through $3M+) for 1996, 2001 and 2002, with high-end, mid and entry segments labeled. Source: IDC Server Tracker, 12/2002] • Major shift to standard systems; blade impact not even factored in yet • Customer benefits from scaling horizontally across standard systems • Lower up-front costs, granular scalability, high availability

  5. The Near Future [Chart: Server Revenue Mix. Share of revenues by price band ($0-2.9K through $3M+), with regions labeled Web Services, Enterprise Apps, Database Clusters & Grids, and Legacy & Big Iron Apps, divided into Scale Out and Scale Up] • Market splits around scale-up vs. scale-out • Database grids provide the foundation for scale-out • InfiniBand switched computing interconnects are a critical enabler

  6. Traditional RAC Cluster [Diagram: Application Servers, Oracle RAC, Gigabit Ethernet, Fibre Channel, Shared Storage]

  7. OUCH! Three Pain Points [Diagram: Application Servers, Oracle RAC, Gigabit Ethernet, Fibre Channel, Shared Storage] • Scalability within the database tier limited by interconnect latency, bandwidth, and overhead • Throughput between the application tier and database tier limited by interconnect bandwidth and overhead • I/O requirements driven by number of servers instead of application performance requirements

  8. Clustering with Topspin InfiniBand [Diagram: Application Servers, Oracle RAC, Shared Storage]

  9. Removes all Three Bottlenecks [Diagram: Application Servers, Oracle RAC, Shared Storage] • InfiniBand provides a 10 Gigabit low-latency interconnect for the cluster • The application tier can run over InfiniBand, benefiting from the same high throughput and low latency as the cluster • Central server-to-storage I/O scalability through the InfiniBand switch removes I/O bottlenecks to storage and provides smoother scalability

  10. Example Cluster with Converged I/O [Diagram: Industry Standard Servers, Industry Standard Network, Industry Standard Storage] • Ethernet to InfiniBand gateway for LAN access • Four Gigabit Ethernet ports per gateway • Creates a virtual Ethernet pipe to each server • Fibre Channel to InfiniBand gateway for storage access • Two 2Gbps Fibre Channel ports per gateway • Creates a 10Gbps virtual storage pipe to each server • InfiniBand switches for cluster interconnect • Twelve 10Gbps InfiniBand ports per switch card • Up to 72 total ports with optional modules • Single fat pipe to each server for all network traffic

  11. Topspin InfiniBand Cluster Solution: Cluster Interconnect with Gateways for I/O Virtualization [Diagram: Ethernet or Fibre Channel gateway modules; family of switches; Host Channel Adapter with Upper Layer Protocols; integrated system and subnet management] • Protocols: uDAPL, SDP, SRP, IPoIB • Platform support • Linux: Red Hat, Red Hat AS, SuSE • Solaris: S10 • Windows: Win2k & 2003 • Processors: Xeon, Itanium, Opteron

  12. InfiniBand Primer • InfiniBand is a new technology used to interconnect servers, storage and networks within the datacenter • Runs over copper cables (<17m) or fiber optics (<10km) • Scalable interconnect: • 1X = 2.5Gb/s • 4X = 10Gb/s • 12X = 30Gb/s
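The link rates above are per-direction signaling rates for first-generation (single data rate) InfiniBand: each lane signals at 2.5 Gb/s, and 8b/10b line coding leaves 80% of that for data. The short sketch below just does that arithmetic for the 1X, 4X and 12X widths; the constant names are illustrative.

```python
# Per-lane signaling rate of single data rate (SDR) InfiniBand, in Gb/s.
SDR_LANE_GBPS = 2.5
# 8b/10b line coding carries 8 data bits in every 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(lanes: int) -> tuple[float, float]:
    """Return (signaling rate, data rate) in Gb/s for an SDR link of the given width."""
    signaling = lanes * SDR_LANE_GBPS
    return signaling, signaling * ENCODING_EFFICIENCY


if __name__ == "__main__":
    for width in (1, 4, 12):
        signaling, data = link_rates(width)
        print(f"{width}X: {signaling:g} Gb/s signaling, {data:g} Gb/s data")
```

So a 4X link, quoted as 10Gb/s, moves about 8 Gb/s of payload per direction before any protocol headers.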

  13. InfiniBand Nomenclature [Diagram: servers, each with CPUs, system memory, memory controller (MemCntlr), host interconnect and HCA, attach over IB links to a Topspin 360/90 switch running the subnet manager (SM); TCAs connect the fabric over IB links to storage (FC link) and to the network (Ethernet link)]

  14. InfiniBand Nomenclature • HCA – Host Channel Adapter • SM – Subnet Manager • TCA – Target Channel Adapter [Diagram: CPU, system memory, MemCntlr, host interconnect, HCA, IB links, switch with SM, TCAs with Ethernet and FC links]

  15. Kernel Bypass Model [Diagram layers: Application; uDAPL; async sockets; Sockets Layer; TCP/IP Transport; SDP Driver; Hardware; regions marked User and Kernel]

  16. Copy on Receive: Data traverses bus 3 times [Diagram: Server (Host) with CPUs, system memory, MemCntlr, host interconnect and NIC; received data passes through an OS Buffer before reaching the App Buffer]

  17. With RDMA and OS Bypass: Data traverses bus once, saving CPU and memory cycles [Diagram: Server (Host) with CPUs, system memory, MemCntlr, host interconnect and HCA; received data goes to the App Buffer without passing through the OS Buffer]
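The "3 times" versus "once" counts in slides 16 and 17 can be reproduced with a simplified accounting model. This sketch assumes each DMA write or CPU read/write of the payload counts as one memory-bus crossing and ignores CPU caches; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusModel:
    """Logs every movement of a received payload across the memory bus."""
    log: list[str] = field(default_factory=list)

    def transfer(self, what: str) -> None:
        # Each DMA write or CPU load/store of the payload is one crossing.
        self.log.append(what)

    @property
    def crossings(self) -> int:
        return len(self.log)


def copy_on_receive() -> int:
    bus = BusModel()
    bus.transfer("NIC DMA-writes the packet into an OS buffer")
    bus.transfer("CPU reads the payload from the OS buffer")
    bus.transfer("CPU writes the payload into the application buffer")
    return bus.crossings


def rdma_with_os_bypass() -> int:
    bus = BusModel()
    bus.transfer("HCA DMA-writes the payload directly into the application buffer")
    return bus.crossings


if __name__ == "__main__":
    print(copy_on_receive(), rdma_with_os_bypass())
```

The two extra crossings in the copy path are also CPU work, which is where the CPU savings claimed for RDMA come from.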

  18. APIs and Performance [Chart: throughput by API and transport stack. Layers labeled: Application, uDAPL, Async I/O extension, BSD Sockets, SDP, TCP, RDMA, IP, IPoIB, 1GE, 10G IB. Throughputs shown: 0.8Gb/s, 1.2Gb/s, 3.2Gb/s, 6.4Gb/s, 6.4Gb/s]

  19. Why SDP for OracleNet & uDAPL for RAC? • RAC IPC • Message based • Latency sensitive • Mixture of previous APIs → use of uDAPL • OracleNet • Streams based • Bandwidth intensive • Previously written to sockets → use of Sockets Direct Protocol API
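The message-versus-stream distinction matters because a stream transport (TCP, or SDP, which keeps stream-socket semantics) delivers bytes with no message boundaries, while a message-based interface such as uDAPL send/receive delivers whole messages. A message-oriented protocol carried over a stream has to add its own framing. The sketch below uses a 4-byte length prefix, one common choice, assumed here for illustration.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so it survives a byte stream."""
    return HEADER.pack(len(message)) + message


class Deframer:
    """Reassembles whole messages from arbitrarily split stream chunks."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this message
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages


if __name__ == "__main__":
    # A stream may split or coalesce sends arbitrarily; simulate 5-byte reads.
    stream = frame(b"lock request") + frame(b"block 42")
    deframer = Deframer()
    received = []
    for i in range(0, len(stream), 5):
        received.extend(deframer.feed(stream[i:i + 5]))
    print(received)
```

A message-based transport makes this layer unnecessary, which is one reason message-oriented RAC IPC maps naturally onto uDAPL while stream-oriented Oracle Net maps onto sockets and SDP.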

  20. InfiniBand Cluster Performance Benefits [Chart: Network Level Cluster Performance for Oracle RAC, Block Transfers/sec (16KB). Source: Oracle Corporation and Topspin, on dual Xeon processor nodes] InfiniBand delivers 2-3X higher block transfers/sec compared to GigE
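For scale, the link data rate alone puts a ceiling on how many 16KB blocks per second each interconnect could carry. This back-of-envelope sketch ignores headers, latency and CPU cost, and assumes 1 Gb/s of payload for Gigabit Ethernet and 8 Gb/s for a 4X SDR InfiniBand link after 8b/10b encoding. These are ceilings, not measurements, so they should not be compared directly with the measured 2-3X figure on the slide.

```python
BLOCK_BYTES = 16 * 1024  # 16KB database block, as on the slide


def max_blocks_per_sec(data_rate_gbps: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Upper bound on block transfers/sec from link data rate alone."""
    bytes_per_sec = data_rate_gbps * 1e9 / 8
    return bytes_per_sec / block_bytes


if __name__ == "__main__":
    gige = max_blocks_per_sec(1.0)   # Gigabit Ethernet payload rate
    ib4x = max_blocks_per_sec(8.0)   # 4X SDR InfiniBand payload rate
    print(round(gige), round(ib4x))  # 7629 61035
```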

  21. InfiniBand Application to Database Performance Benefits [Chart: percent. Source: Oracle Corporation and Topspin] InfiniBand delivers 30-40% lower CPU utilization and 100% higher throughput compared to Gigabit Ethernet

  22. Broad Scope of InfiniBand Benefits [Diagram: Application Servers, Oracle RAC, Shared Storage, SAN, Network, NAS; links labeled Intra RAC: IPC over uDAPL over IB; OracleNet: over SDP over IB; FC gateway: host/LUN mapping; Ethernet gateway; DAFS over IB] Results shown: • 2x improvement in throughput and 45% less CPU • 20% improvement in throughput • 3-4x improvement in block updates/sec • 30% improvement in DB performance

  23. uDAPL Optimization Timeline [Diagram layers: Workload, Database, CacheFusion, LM, skgxp, uDAPL, CM, IB HW/FW] • April-August 2003: Gathering OAST and industry standard workload performance metrics; fine tuning and optimization at skgxp, uDAPL and IB layers • Feb 2003: Cache block updates show fourfold performance improvement in 4-node RAC • Jan 2003: Added Topspin CM for improved scaling of number of connections and reduced setup times • Dec 2002: Oracle interconnect performance released, showing improvements in bandwidth (3x), latency (10x) and CPU reduction (3x) • Sept 2002: uDAPL functional with 6Gb/s throughput

  24. RAC Cluster Communication • High speed communication is key: it must be faster to fetch a block from a remote cache than to read the block from disk • Scalability is a function of communication CPU overhead • Two primary Oracle consumers • Lock manager / Oracle buffer cache • Inter-instance parallel query communication • SKGXP: Oracle’s IPC driver interface • Oracle is coded to skgxp • skgxp is coded to vendor high performance interfaces • IB support delivered as a shared library, libskgxp10.so
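The layering described above (database code written against one IPC interface, with transport-specific code behind it) is a general pluggable-driver pattern. The sketch below illustrates that pattern only; it is not Oracle's skgxp API. `IpcDriver`, `LoopbackDriver`, `DRIVERS` and `lock_manager_ping` are hypothetical names, and where the slide describes swapping in IB support through the shared library libskgxp10.so, this sketch simply looks up a backend in a dictionary.

```python
from typing import Callable, Protocol


class IpcDriver(Protocol):
    """Hypothetical transport-neutral interface, in the role skgxp plays for Oracle."""

    def send(self, peer: str, payload: bytes) -> None: ...

    def receive(self) -> tuple[str, bytes]: ...


class LoopbackDriver:
    """Hypothetical in-process backend; a real one would call a vendor API such as uDAPL."""

    def __init__(self) -> None:
        self._queue: list[tuple[str, bytes]] = []

    def send(self, peer: str, payload: bytes) -> None:
        self._queue.append((peer, payload))

    def receive(self) -> tuple[str, bytes]:
        return self._queue.pop(0)


# Hypothetical registry of backends, standing in for loading a driver library.
DRIVERS: dict[str, Callable[[], IpcDriver]] = {"loopback": LoopbackDriver}


def lock_manager_ping(driver: IpcDriver) -> bytes:
    # Database-side code depends only on the interface, never on the transport.
    driver.send("instance2", b"lock request")
    _, payload = driver.receive()
    return payload


if __name__ == "__main__":
    print(lock_manager_ping(DRIVERS["loopback"]()))
```

Adding a new interconnect then means writing one new backend, not touching the callers, which matches the slide's point that Oracle is coded to skgxp rather than to each vendor interface.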

  25. Cache Fusion Communication [Diagram labels: LMS, lock request, shadow processes, to client, RDMA, cache, cache]

  26. Parallel Query Communication [Diagram labels: PX servers, PX servers, msg, data, data, to client]

  27. Cluster Interconnect Wish List • OS bypass (user mode communication) • Protocol offload • Efficient asynchronous communication model • RDMA with high bandwidth and low latency • Huge memory registrations for Oracle buffer caches • Support for a large number of processes in an instance • Commodity hardware • Software interfaces based on open standards • Cross platform availability. InfiniBand is the first interconnect to meet all of these requirements

  28. Asynchronous Communication • Benefits • Reduces impact of latency • Improves robustness by avoiding communication deadlock • Increases bandwidth utilization • Drawback • Historically costly, as synchronous operations are broken into separate submit and reap operations
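The latency benefit of splitting an operation into submit and reap can be shown with a small simulation. This sketch uses Python threads and a hypothetical fixed per-request delay only to emulate asynchrony; on InfiniBand the adapter does the work, so submit is posting a request and reap is collecting its completion, with no helper threads involved. `remote_fetch` and the latency value are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

SIMULATED_LATENCY_S = 0.05  # hypothetical per-request interconnect latency


def remote_fetch(block_id: int) -> str:
    """Stand-in for fetching one block from a remote instance."""
    time.sleep(SIMULATED_LATENCY_S)
    return f"block-{block_id}"


def fetch_synchronously(block_ids: list[int]) -> list[str]:
    # Each request blocks until it completes, so latencies add up.
    return [remote_fetch(b) for b in block_ids]


def fetch_submit_then_reap(block_ids: list[int]) -> list[str]:
    with ThreadPoolExecutor(max_workers=max(1, len(block_ids))) as pool:
        # Submit: every request is issued up front and each call returns immediately.
        futures = [pool.submit(remote_fetch, b) for b in block_ids]
        # Useful computation could run here while the requests are in flight.
        wait(futures)  # Reap: collect the completions.
        return [f.result() for f in futures]


if __name__ == "__main__":
    for fetch in (fetch_synchronously, fetch_submit_then_reap):
        start = time.perf_counter()
        fetch([1, 2, 3, 4])
        print(f"{fetch.__name__}: {time.perf_counter() - start:.2f} s")
```

With four requests, the synchronous version waits for roughly four latencies and the submit/reap version for roughly one.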

  29. Protocol Offload & OS Bypass • Bypass makes submit cheap • Requests are queued directly to hardware from Oracle • Offload • Completions move from the hardware to Oracle’s memory • Oracle can overlap communication and computation without a trap to the OS or a context switch
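The queue-based model behind OS bypass and offload can be sketched as a toy simulation: the application appends requests to a work queue that the adapter reads, and the adapter appends results to a completion queue that the application polls from its own memory. The names below are hypothetical and only loosely modeled on the post/poll pattern of the InfiniBand verbs interface; `adapter_progress` stands in for work that real hardware does on its own.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int
    payload: bytes


@dataclass
class Completion:
    wr_id: int
    status: str


class SimulatedQueuePair:
    """Toy model of a work queue and completion queue shared by app and adapter."""

    def __init__(self) -> None:
        self.work_queue: deque[WorkRequest] = deque()
        self.completion_queue: deque[Completion] = deque()

    def post_send(self, wr: WorkRequest) -> None:
        # Submit is just a write into memory the adapter reads: no system call.
        self.work_queue.append(wr)

    def adapter_progress(self) -> None:
        # Stands in for the adapter processing requests itself (protocol offload).
        while self.work_queue:
            wr = self.work_queue.popleft()
            self.completion_queue.append(Completion(wr.wr_id, "success"))

    def poll_cq(self, max_entries: int) -> list[Completion]:
        # Reap reads application-visible memory: no trap, no context switch.
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


if __name__ == "__main__":
    qp = SimulatedQueuePair()
    for i in range(3):
        qp.post_send(WorkRequest(i, b"cache block"))
    # Oracle could compute here while the adapter works.
    qp.adapter_progress()
    print([c.wr_id for c in qp.poll_cq(max_entries=8)])
```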

  30. InfiniBand Benefits by Stress Area [Chart] Stress level varies over time with each query. InfiniBand provides substantial benefits in all three areas

  31. Benefits for Different Workloads • High bandwidth and low latency benefits for Decision Support (DSS) • Should enable serious DSS workloads on RAC clusters • Low latency benefits for scaling Online Transaction Processing (OLTP) • Our estimate: One IB Link replaces 6-8 Gigabit Ethernet links

  32. Commodity Hardware • Higher capabilities and lower cost than proprietary interconnects • InfiniBand’s large bandwidth capability means that a single link can replace multiple GigE and FC interconnects

  33. Memory Requirements • The Oracle buffer cache can consume 80% of a host’s physical memory • 64-bit addressing and decreasing memory prices mean ever larger buffer caches • InfiniBand provides… • Zero copy RDMA between very large buffer caches • Large shared registrations that move memory registration out of the performance path
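Moving registration out of the performance path amounts to registering the whole buffer cache once, so that each transfer only checks that it falls inside an already registered region. The sketch below counts registration calls under both strategies; `RegistrationCounter` is a hypothetical stand-in for the expensive pin-and-register step, and the 16KB block size and 1000 transfers are illustrative.

```python
class RegistrationCounter:
    """Hypothetical stand-in for registering (pinning) memory with the adapter.

    Real registration is expensive; here we only count how often it happens."""

    def __init__(self) -> None:
        self.calls = 0

    def register(self, start: int, length: int) -> tuple[int, int]:
        self.calls += 1
        return start, start + length


def register_per_transfer(reg: RegistrationCounter, transfers: list[tuple[int, int]]) -> None:
    # Every transfer pays for its own registration in the performance path.
    for start, length in transfers:
        reg.register(start, length)


def register_shared(reg: RegistrationCounter, cache_start: int, cache_len: int,
                    transfers: list[tuple[int, int]]) -> None:
    # Register the whole buffer cache once, outside the performance path.
    lo, hi = reg.register(cache_start, cache_len)
    for start, length in transfers:
        # Per-transfer work shrinks to a bounds check against the existing region.
        if not (lo <= start and start + length <= hi):
            raise ValueError("transfer lies outside the registered buffer cache")


if __name__ == "__main__":
    BLOCK = 16 * 1024
    transfers = [(i * BLOCK, BLOCK) for i in range(1000)]
    per_transfer, shared = RegistrationCounter(), RegistrationCounter()
    register_per_transfer(per_transfer, transfers)
    register_shared(shared, 0, len(transfers) * BLOCK, transfers)
    print(per_transfer.calls, shared.calls)
```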

  34. Two Efforts Coming Together: RAC/Cache Fusion and Oracle Net • Two Oracle engineering teams working at cluster and application tiers • 10g incorporates both efforts • Oracle Net benefits from many of the same capabilities as Cache Fusion • OS kernel bypass • CPU offload • New transport protocol (SDP) support • Efficient asynchronous communication model • RDMA with high bandwidth and low latency • Commodity hardware • Working on external and internal deployments

  35. Open Standard Software APIs: uDAPL and Async Sockets/SDP • Each new communication driver is a large investment for Oracle • One stack which works across multiple platforms means improved robustness • Oracle grows closer to the interfaces over time • Ready today for emerging technologies • Ubiquity and robustness of IP for high speed communication

  36. Summary • Oracle and major system & storage vendors are supporting InfiniBand • InfiniBand presents a superb opportunity for enhanced horizontal scalability and lower cost • Oracle Net’s InfiniBand support significantly improves performance for both the app server and the database in Oracle 10g • InfiniBand provides the performance to move applications to low cost Linux RAC databases

  37. Q & A: Questions & Answers

  38. Next Steps… • See InfiniBand demos first hand on the show floor • Dell, Intel, NetApp, Sun, Topspin (booth #620) • Includes clustering, app tier and storage over InfiniBand • InfiniBand whitepapers on both Oracle and Topspin websites • www.topspin.com • www.oracle.com