Pushing the Limits of Database Clusters
Jamie Shiers / CERN
Werner Schueler / Intel
Agenda • Trend to Intel Clusters … • Introduction to CERN & Data Volumes • Current 9iRAC / IA32 status • Performance / Scalability / Reliability • Future tests & timeline • Plans for Oracle tests on IA64 • Oracle9i RAC Performance • Oracle9i RAC on Itanium
Clustering for Performance
"It will be several years before the big machine dies, but inevitably the big machine will die." (Larry Ellison)
"Scale Up" by Scaling Out (InfoWorld, January 31, 2002)
[Chart: Top 10 TPC results; source: tpc.org]
Proprietary Solutions Lagging
[Chart: IDC (8/01), worldwide operating-environment installed base, server/host environments 2000-2005, units in millions (0-16M scale): Windows and Linux server volumes grow steadily while proprietary UNIX servers lag.]
CERN Large Hadron Collider
ATLAS Detector System for LHC. The detector is the size of a 6-floor building!
LHC: A Multi-Petabyte Problem!
[Chart: long-term tape storage estimates, 1995-2006, in terabytes (scale up to 14,000 TB): the LHC experiments dwarf the LEP experiments and COMPASS.]
Online data reduction chain:
• Detector output: 100 MHz (1000 TB/sec)
• Level 1 (special hardware): reduces to 75 kHz (75 GB/sec)
• Level 2 (embedded processors): reduces to 5 kHz (5 GB/sec)
• Level 3 (PCs): reduces to 100 Hz (100 MB/sec), written to the DB
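A quick consistency check on these rates; a minimal sketch in which the per-event sizes are inferred from the slide's own numbers (bandwidth divided by rate), not taken from any external source:

```python
# Sketch: verify that rate x event size matches the quoted bandwidth
# at each trigger level. Event sizes are derived from the slide's
# own figures (bandwidth / rate).

levels = [
    # (name,                rate in Hz,  bandwidth in bytes/sec)
    ("detector output",     100e6,       1000e12),  # 100 MHz, 1000 TB/s
    ("level 1 (hardware)",  75e3,        75e9),     # 75 kHz,  75 GB/s
    ("level 2 (embedded)",  5e3,         5e9),      # 5 kHz,   5 GB/s
    ("level 3 (PC farm)",   100,         100e6),    # 100 Hz,  100 MB/s
]

for name, rate, bandwidth in levels:
    event_size_mb = bandwidth / rate / 1e6
    print(f"{name:22s} ~{event_size_mb:6.1f} MB/event")

# After level 1 the event size settles at ~1 MB, so each trigger level
# reduces the data volume purely by cutting the event rate: overall,
# 100 MHz -> 100 Hz, a reduction of a factor of one million.
```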
LHC Data Volumes

  Data Category                           Annual       Total
  RAW                                     1-3 PB       10-30 PB
  Event Summary Data (ESD)                100-500 TB   1-5 PB
  Analysis Object Data (AOD)              10 TB        100 TB
  TAG                                     1 TB         10 TB
  Total per experiment                    ~4 PB        ~40 PB
  Grand totals (4 experiments, 10 years)  ~16 PB       ~160 PB
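These totals can be reproduced from the per-category rows; a minimal sketch, assuming the upper-range rates, 4 LHC experiments and a 10-year run (the experiment count and run length are assumptions consistent with the table, not stated on this slide):

```python
# Sketch: reproduce the table's totals from its per-category rows.
# Upper-range per-category rates, in PB per experiment per year.

annual_pb = {"RAW": 3.0, "ESD": 0.5, "AOD": 0.01, "TAG": 0.001}

per_experiment_annual = sum(annual_pb.values())    # ~3.5 PB  -> quoted as ~4 PB
per_experiment_total = 10 * per_experiment_annual  # ~35 PB   -> quoted as ~40 PB
grand_annual = 4 * per_experiment_annual           # ~14 PB   -> quoted as ~16 PB
grand_total = 4 * per_experiment_total             # ~140 PB  -> quoted as ~160 PB

print(per_experiment_annual, per_experiment_total, grand_annual, grand_total)
```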
LHC Summary • Multi-national research lab near Geneva • Building new accelerator: Large Hadron Collider • Will generate fantastic amounts of data: 1PB/second! • How can 9iRAC help?
LHC Computing Policy • Commodity solutions wherever possible • Extensive use of Grid technologies • Intel / Linux for processing nodes • Farms of many K nodes: 200K in today's terms • IA32 today, moving to IA64 prior to LHC startup • 9iRAC claims to extend commodity solutions to the database market • Does it live up to the promise? • DB needs: ~100PB total; a few GB/s per PB; many thousands of concurrent processes; distributed (world-wide) access
History and experience • Oracle Parallel Server since V7 • "Marketing clusters" (source: Larry Ellison, OOW SFO 2001) • OPS in production at CERN since 1996, mainly for high availability • Tests of 9iRAC started Autumn 2001 • Servers: 9 dual Pentium® III Xeon™ processor-based servers, 512MB • Storage: a single node as above • SuSE 7.2, Oracle 9.0.1 • Currently working with 9iR2 • Servers: 10 nodes as above • Storage: now 3TB via 2 Intel-based disk servers
CERN Computer Centre Today… has Intel inside
Benefits of 9iRAC • Scalability • Supports VLDBs using commodity h/w • Intel/Linux server nodes (target ~100TB / cluster) • Manageability • A small number of RACs is manageable • Tens / hundreds of single instances are a nightmare • Better Resource Utilization • Shared-disk architecture avoids hot-spots and idle / overworked nodes • Shared cache improves performance for frequently accessed read-only data
9iRAC benefits: ¥ € $ Cost • N x dual processors typically much cheaper than a single large multi-processor • Fewer DBAs • No need to oversize the system for peak loads
Tests on Linux • Initial goals: • Test that it works with commodity H/W + Linux • Understand the configuration issues • Check how it scales • Number of nodes • Network interconnect • CPU cost of cache coherency • Identify bottlenecks • Commodity? • Server + interconnect OK • Storage: outstanding question!
Conventional Oracle Cluster (e.g. Fibre Channel based solution)
[Diagram: clients (interactive, batch) connect to database servers, which share the disks]
Commodity Storage? • Critical issue for CERN • Massive amount of data • Extremely tight budget constraints • Long term (LHC: 2007): network-attached disks based on iSCSI? • Short/medium term: cost-effective disk servers (€7.5K for 1.5TB mirrored at >60MB/s)
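At that price point, the storage bill for a 100 TB cluster is easy to estimate; a sketch using only the figures above:

```python
# Sketch: cost per usable TB of the commodity disk servers, and the
# storage bill for one 100 TB 9iRAC built from them.
import math

server_cost_eur = 7_500   # one commodity disk server (from the slide)
usable_tb = 1.5           # mirrored capacity per server
throughput_mb_s = 60      # sustained rate per server (from the slide)

cost_per_tb = server_cost_eur / usable_tb   # 5,000 EUR per mirrored TB

target_tb = 100                              # target: 100 TB per 9iRAC
servers = math.ceil(target_tb / usable_tb)   # 67 servers
total_cost = servers * server_cost_eur       # ~0.5M EUR
aggregate_mb_s = servers * throughput_mb_s   # ~4 GB/s aggregate

print(f"{servers} servers, ~{total_cost/1e6:.1f}M EUR, "
      f"~{aggregate_mb_s/1000:.1f} GB/s aggregate")
```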
Commodity Oracle Cluster? • 3 interconnects, e.g. Gbit Ethernet, possibly different protocols: • General-purpose network • Intra-cluster communications • I/O network
[Diagram: clients (interactive, batch) connect to database servers and disks, all linked by commodity networks]
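Whether Gigabit Ethernet is enough for the I/O network can be checked against the DB need stated earlier (a few GB/s per PB); a sketch in which the ~70% usable wire efficiency is an assumption for protocol overhead, not a figure from the slides:

```python
# Sketch: does a GbE I/O network cover the stated DB need of a few
# GB/s per PB for one 100 TB cluster? Wire efficiency is an assumed
# ~70% to account for TCP/IP overhead.

gbe_mb_s = 125 * 0.70                  # one GbE link, ~87 MB/s usable
cluster_tb = 100                       # one 100 TB 9iRAC = 0.1 PB
need_gb_s = 3 * (cluster_tb / 1000)    # "few GB/s per PB" -> ~0.3 GB/s

links_needed = need_gb_s * 1000 / gbe_mb_s   # ~3.4 links
print(f"~{links_needed:.1f} GbE links for {need_gb_s:.1f} GB/s")

# A handful of GbE links per cluster meets the read-mostly I/O target;
# the intra-cluster (cache coherency) network is sized separately.
```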
Test & Deployment Goals • Short-term (summer 2002): • Continue tests on multi-node 9iRAC up to ~3-5TB • Based on realistic data model & access patterns • Understand in-house, then test in Valbonne • Medium-term (Q1 2003): • Production 9iRAC with up to 25TB of data • Modest I/O rate; primarily read-only data • Long-term (LHC production phase): • Multiple multi-hundred TB RACs • Distributed in World-wide Grid
9iRAC Direction • Strong & visible commitment from Oracle • Repeated message at OracleWorld • New features in 9iR2 • e.g. cluster file system for Windows and Linux • Scalability depends to a certain extent on the application • Our read-mostly data should be an excellent fit! • Multi-TB tests with "professional" storage • HP / COMPAQ centre in Valbonne, France • Target: 100TB per 9iRAC
Why 100TB? • Possible today • BT Enormous Proof of Concept: 37TB in 1999 • CERN ODBMS deployment: 3TB per node • Mainstream long before LHC • Winter 2000 VLDB survey: 100TB circa 2005 • How does this match the LHC need for 100PB? • Analysis data: 100TB OK for ~10 years • One 10-node 9iRAC per experiment • Intermediate: 100TB ≈ 1 year's data • ~40 10-node 9iRACs • RAW data: 100TB = 1 month's data • 400 10-node 9iRACs to handle all RAW data (see the sketch below) • 10 RACs / year, 10 years, 4 experiments
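A minimal sketch of the RAW-data arithmetic, assuming ~1 PB of RAW per experiment per year (the lower end of the earlier table, consistent with "100TB = 1 month's data"):

```python
# Sketch: check "400 ten-node 9iRACs for all RAW data" from the
# slide's own inputs.

raw_pb_per_year = 1.0    # ~1 PB RAW per experiment per year (assumed)
rac_capacity_tb = 100    # one 10-node 9iRAC
years = 10
experiments = 4

racs_per_exp_per_year = raw_pb_per_year * 1000 / rac_capacity_tb  # 10
total_racs = racs_per_exp_per_year * years * experiments          # 400
print(int(total_racs), "ten-node RACs for all RAW data")
```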
LHC Data Volumes Revisited

  Data Category                           Annual       Total
  RAW                                     1-3 PB       10-30 PB
  Event Summary Data (ESD)                100-500 TB   1-5 PB
  Analysis Object Data (AOD)              10 TB        100 TB
  TAG                                     1 TB         10 TB
  Total per experiment                    ~4 PB        ~40 PB
  Grand totals (4 experiments, 15 years)  ~16 PB       ~250 PB
RAW & ESD: >> 100TB • RAW: • Access pattern: sequential • Access frequency: ~once per year • Use time partitioning + (offline tablespaces?) • 100TB = 10 day time window • Current data (1 RAC) historic data (2nd RAC) • ESD: • Expect RAC scalability to continue to increase • VLDB prediction for 2020: 1000,000,000 TB (YB)
Data tiers and access patterns:
• TAG: 1 TB/yr
• AOD: 10 TB/yr
• ESD: 100 TB/yr (Tier1)
• RAW: 1 PB/yr at Tier0 (1 PB/s prior to reduction!)
Access is sequential for RAW at the bottom of the hierarchy and increasingly random towards TAG at the top, where the users are.
Oracle Tests on IA64 • 64-bit computing essential for LHC • Addressability: VLMs, 64-bit filesystems, VLDBs • Accuracy: need 64-bit precision to track sub-atomic particles over tens of metres • Migration IA32 → IA64 prior to LHC startup
A solid history of enterprise-class processor development (performance over time):
• i486™ processor: RISC techniques for 2X i386™ performance
• Pentium® processor: executes 2 instructions in parallel
• Pentium® Pro processor: multi-processor support
• Pentium® II/III Xeon™ processors
• Intel® Xeon™ processor
• Intel® Xeon™ processor MP: higher processing & data bandwidth for enterprise apps
Intel's technology innovations drive price/performance and scalability
Performance Via Technology Innovations • Balanced system performance through higher bandwidth and throughput • Intel® NetBurst™ microarchitecture • Integrated multi-level cache architecture: faster performance on business apps • Hyper-Threading Technology: up to 40% more efficient use of processor resources • Processor innovations for increased server performance and headroom
Matching Enterprise Requirements
[Diagram: the Itanium® processor family spans enterprise segments, from general-purpose front-end servers (throughput, performance, bandwidth) through the mid-tier to high-end back-end systems (scalability, reliability, availability), with the EPIC architecture providing high performance and high availability.]
Features and flexibility to span the enterprise
Best Performance… OLTP model (TPM)
• Example: calling-circle OLTP model, taken from a real-world insurance example
• 4 nodes x 4-way Pentium® III Xeon™ 700 MHz processor-based systems
• 128k TPM
• Over 90% scalability
Intel-based solution outperforms 32-way Sun solution by more than 2x
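What ">90% scalability" implies; a sketch in which the single-node baseline is back-computed from the quoted figures rather than taken from the slide:

```python
# Sketch: unpack ">90% scalability" for the 4-node, 128k TPM result.
# Perfect scaling would multiply one node's throughput by 4; the
# implied single-node baseline is derived, not quoted.

nodes = 4
cluster_tpm = 128_000
scalability = 0.90    # quoted lower bound

single_node_tpm = cluster_tpm / (nodes * scalability)  # ~35,600 TPM
ideal_tpm = nodes * single_node_tpm                    # ~142,200 TPM
print(f"baseline ~{single_node_tpm:,.0f} TPM, "
      f"ideal 4-node ~{ideal_tpm:,.0f} TPM")
```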
Best Performance… TPC-C • 8 nodes x 4-way database servers, Pentium III Xeon 900 MHz • 16 load-generating application servers, Pentium III 1 GHz
Best Performance… Price/Performance • 9iRAC on Red Hat on e.g. Dell: 69% faster and 85% less expensive than Oracle on RISC solutions
Itanium® Processor Family roadmap (common hardware; software scales across generations):
• Itanium® processor (2001): introduce the architecture; deliver competitive performance; focused target segments
• Itanium® 2 processor (2002): build out the architecture/platform; establish world-class performance; significantly increase deployment
• Madison* / Deerfield* (2003): extend performance leadership; broaden target applications
• Montecito* (future)
* Indicates Intel processor codenames. All products, dates and figures are preliminary, for planning purposes only, and subject to change without notice.
Itanium® 2 Processor
[Chart: performance relative to the Itanium® processor (800 MHz, 4 MB L3; baseline 1.00), using Itanium® 2 optimizations: roughly 1.7x to 2.1x across SPECint2000, SPECfp2000, Stream and Linpack 10K (technical computing / CPU & bandwidth) and ERP and OLTP (enterprise). Source: Intel Corporation]
• On track for mid-'02 releases from multiple OEMs and ISVs
• Substantial performance leadership vs. RISC
Delivering on the performance promise
Deployment Strategy: versatile server solutions for scaling right

  Strategy                                                   Examples
  Scale Up on 8-way and above servers                        SAS Enterprise Miner*, Oracle 9i*
  Scale Up on 4- and 8-way servers,
    then Scale Out on fail-over clusters                     Microsoft Exchange* Server, Oracle* 9iRAC
  Scale Out with fail-over clusters on 1- to 2-way servers   Inktomi*, Apache* Web Server

Positioned to Scale Right
Inflection point coming • Itanium® 2 will have a 75%** price/performance lead over UltraSPARC III at its Q3'02 introduction • Itanium® 2 will outperform USIII by 40% • Itanium® 2 will cost 20% less than USIII • Oracle and Intel are working to make 9i on Itanium a success • Joint performance goal of 100k TPM-C on a single 4-way Itanium® 2 server • 13 Intel engineers onsite and an additional 24 at Intel working to optimize 9i on Itanium® 2 • Intel supplying Oracle with large numbers of Itanium® 2 development systems
* McKinley is the next-generation Itanium™ processor (shipped as Itanium® 2)
** Estimated Q3'02 figures
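The three claims are mutually consistent; a quick sketch of the arithmetic:

```python
# Sketch: combine the two quoted deltas into the price/performance
# lead over UltraSPARC III.

relative_performance = 1.40   # Itanium 2 outperforms USIII by 40%
relative_price = 0.80         # Itanium 2 costs 20% less

price_perf_lead = relative_performance / relative_price - 1
print(f"{price_perf_lead:.0%}")   # 75% better price/performance
```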
Summary • Existing Oracle technologies can be used to build 100TB databases • Familiar data-warehousing techniques can be used to handle much larger volumes of historic data • Best price and performance through clusters vs. RISC • 9iRAC makes this possible on commodity server platforms • Standard high-volume servers offer great performance today and promise a safe investment for the future
Thank you Jamie.Shiers@cern.ch Werner.Schueler@Intel.com