VLDB 2005, 31st International Conference on Very Large Databases
Large Scale Data Warehouses on Grid: Oracle Database 10g and HP ProLiant Systems
Raghunath K. Othayoth, Hewlett-Packard Company
Meikel Poess, Oracle Corporation
Agenda
• Grid Computing
• Hardware Support
• Software Support
• TPC-H Result
Large Scale Data Warehouses on Grid: Oracle Database 10g and HP ProLiant Servers, VLDB 2005 - 31st International Conference - Trondheim, Norway, 4 January 2020
Grid Computing
1) Application and user perspective:
− Just like the power grid: have computing power delivered as requested
2) Implementation perspective:
− Data virtualization
− Resource provisioning
− High availability
From Research to Industry
• Research projects using grid technology:
− SETI@home
− World Community Grid
• Traditionally, companies used islands of systems to implement corporate data warehouses. Such islands:
− Are unable to share resources
− Are too rigid to answer rapidly changing business needs
− Cannot be scaled indefinitely
• HP and Oracle are applying the grid concept to industry data warehouses (DW)
Commercial Grid Market
• IDC calls grid computing the fifth generation of computing
• Commercial grid computing revenue:
− 2003: 1 billion USD
− 2008: 12 billion USD [estimate]
• Forrester Research:
− 37% of enterprises are piloting, rolling out or have implemented some form of grid computing
− 30% of firms are considering grid technology
(IDC, 2004. www.oracle.com/technology/tech/grid/collateral/idc_oracle10g.pdf)
(Forrester, 2004. www.forrester.com/go?docid=34449)
N-tier vs. Grid Computing
[Figure: two datacenter diagrams. Labels: Internet; Application Servers (middle tier); OLTP Database Servers and Direct Attach Storage; DSS Servers and Direct Attach Storage; Shared Pool of Commodity Servers; Resource Virtualization and Provisioning; Storage Area Network (SAN); Network Attached Storage (NAS)]
• Traditional multi-tier datacenter infrastructure: web servers, application servers and database servers are preconfigured and pre-allocated
• Grid computing: infrastructure is dynamically provisioned to applications that have been virtualized
Commercial Grid Components
• Commodity hardware (x86-based servers)
• Linux OS: cost effective
• SAN: highly scalable
• High-speed interconnect (Gigabit Ethernet, InfiniBand)
• Management software (manage as individual servers or as one large virtual server)
• Database layer (ties the resources together, dynamic resource allocation, parallel processing)
Commercial Grid Benefits
• High scalability
• High flexibility
• Low total cost of ownership
• High availability
• Easy manageability
Oracle Features for a Data Warehouse Grid
• Dynamic parallel processing
• Data virtualization and dynamic resource provisioning in DW
• Smart inter-node parallelism
Dynamic Parallel Processing
• Queries are automatically parallelized to maximize resource utilization
• The Degree of Parallelism (DOP) is adjusted at parse time according to resource availability and computing demands
• DOP is automatically adjusted when:
− The number of concurrent users changes
− Nodes are taken down for maintenance
− Nodes are added due to increased computing demand (scale-out)
− Nodes are assigned to a different application
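
To make the adjustment above concrete, here is a minimal Python sketch that derives a DOP from the number of active nodes and concurrent users. The function name `choose_dop`, its parameters and the "split the CPUs evenly among users" rule are assumptions made for illustration; the slides do not describe the rule Oracle actually applies.

```python
from typing import Optional


def choose_dop(active_nodes: int, cpus_per_node: int, concurrent_users: int,
               max_dop: Optional[int] = None) -> int:
    """Hypothetical rule: share the currently available CPUs evenly among users."""
    if active_nodes < 1 or cpus_per_node < 1:
        raise ValueError("need at least one node and one CPU per node")
    users = max(1, concurrent_users)
    total_cpus = active_nodes * cpus_per_node
    # Never drop below serial execution (DOP 1).
    dop = max(1, total_cpus // users)
    if max_dop is not None:
        dop = min(dop, max_dop)
    return dop


if __name__ == "__main__":
    # One user on 12 four-CPU nodes, then eight users, then two nodes down.
    print(choose_dop(12, 4, 1))
    print(choose_dop(12, 4, 8))
    print(choose_dop(10, 4, 8))
```

Each trigger on the slide (more users, nodes removed, nodes added, nodes reassigned) changes one of the inputs, so the DOP is recomputed from the current state instead of being fixed in advance.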
Data Virtualization and Dynamic Resource Provisioning in DW
• Oracle's shared-everything architecture provides data virtualization and provisioning in data warehouses
[Figure: nodes 1 to 8 connected by an interconnect and sharing one disk subsystem, with the workload types OLAP, Reports and ETL mapped onto the nodes]
• The figure shows the mapping of workload types to nodes in several situations:
− During peak working hours
− During the night
− During short intervals when the DW is synchronized with the OLTP system
− Without response time requirements, all types of workload can run on all nodes
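
The provisioning idea can be sketched in a few lines of Python. The node counts per situation are not recoverable from the slides, so the weights below are made-up values, and `assign_nodes` is a hypothetical helper, not part of any Oracle or HP tool. It splits the eight nodes among workload types in proportion to the weights, giving leftover nodes to the largest fractional remainders.

```python
def assign_nodes(nodes, weights):
    """Split node ids among workload types in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one workload needs a positive weight")
    # Ideal (fractional) number of nodes per workload.
    shares = {w: len(nodes) * v / total for w, v in weights.items()}
    counts = {w: int(s) for w, s in shares.items()}
    # Give the nodes lost to rounding to the largest fractional remainders.
    leftover = len(nodes) - sum(counts.values())
    by_remainder = sorted(weights, key=lambda w: shares[w] - counts[w], reverse=True)
    for w in by_remainder[:leftover]:
        counts[w] += 1
    # Hand out contiguous slices of the node list.
    assignment, start = {}, 0
    for w in weights:
        assignment[w] = nodes[start:start + counts[w]]
        start += counts[w]
    return assignment


if __name__ == "__main__":
    nodes = list(range(1, 9))
    # Assumed weights: interactive work dominates by day, ETL by night.
    print(assign_nodes(nodes, {"OLAP": 2, "Reports": 1, "ETL": 1}))
    print(assign_nodes(nodes, {"OLAP": 0, "Reports": 1, "ETL": 3}))
```

Because every node reaches the same disk subsystem, moving a node from one workload to another in this sketch is only a bookkeeping change; no data has to be redistributed. The last situation on the slide, where every workload may run on every node, corresponds to dropping the partitioning altogether.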
Data Virtualization and Dynamic Resource Provisioning in DW
• This concept can be extended to different applications
[Figure: the workload types OLTP, DW and DM mapped onto nodes 1 to 8, which share an interconnect and a disk subsystem]
Smart Inter-Node Parallelism
• The optimizer avoids inter-node parallelism when possible, which reduces interconnect traffic and shortens execution time
1) Node locality
− If possible, operations are executed on one node
− When the DOP of an operation can be satisfied with the resources of one server, it executes locally
2) Full partition-wise join
− If two tables are equipartitioned on their join key, the join can be divided into smaller joins between partitions
3) Partial partition-wise join
− If only one table is partitioned on the join key, the other table is dynamically repartitioned on the join key to break the large join into smaller joins
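
The two partition-wise join variants can be illustrated with a small in-memory Python sketch. It is a toy model under stated assumptions: rows are dictionaries, partitioning is by hash of the join key, and all function names are invented for this example. It does not reflect how Oracle implements these joins internally.

```python
from collections import defaultdict


def hash_partition(rows, key, n_parts):
    """Distribute rows into n_parts buckets by hashing the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def hash_join(left, right, key):
    """Plain hash join of two row lists on a common key."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def full_partition_wise_join(left_parts, right_parts, key):
    """Both inputs are partitioned identically: join partition i with partition i."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        # Each small join is independent and could run on a single node.
        out.extend(hash_join(lp, rp, key))
    return out


def partial_partition_wise_join(left_parts, right_rows, key):
    """Only the left input is partitioned: repartition the right input first."""
    right_parts = hash_partition(right_rows, key, len(left_parts))
    return full_partition_wise_join(left_parts, right_parts, key)


if __name__ == "__main__":
    orders = [{"orderkey": k, "cust": f"c{k}"} for k in range(6)]
    items = [{"orderkey": k % 6, "qty": k} for k in range(10)]
    item_parts = hash_partition(items, "orderkey", 4)
    order_parts = hash_partition(orders, "orderkey", 4)
    print(len(full_partition_wise_join(item_parts, order_parts, "orderkey")))
    print(len(partial_partition_wise_join(item_parts, orders, "orderkey")))
```

Matching rows always land in the same partition number, so no small join needs rows from another partition. In a grid, that is what allows each pair of partitions to be joined on one node without shipping rows across the interconnect.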
TPC-H Benchmark
• The industry standard benchmark for data warehouse applications
• Stresses grid-based data warehouses:
− Complex queries
  • Sequential scans of large amounts of data
  • Aggregations of large amounts of data
  • Multi-table joins
  • Extensive sorting of very large sets of data
− Single-user test
− Multi-user test
− Parallel insert operations
− Parallel delete operations
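
The single-user and multi-user tests feed the QphH figure reported for TPC-H results. In the TPC-H specification the composite metric is the geometric mean of the power metric (from the single-user test) and the throughput metric (from the multi-user test). The short Python sketch below shows that relationship; the input values are made up for illustration and are not taken from any published result.

```python
import math


def composite_qphh(power: float, throughput: float) -> float:
    """QphH@Size as the geometric mean of Power@Size and Throughput@Size."""
    if power < 0 or throughput < 0:
        raise ValueError("metrics must be non-negative")
    return math.sqrt(power * throughput)


if __name__ == "__main__":
    # Hypothetical power and throughput values, for illustration only.
    print(round(composite_qphh(40_000.0, 30_000.0), 1))
```

Because of the geometric mean, a system cannot reach a high QphH by excelling in only one of the two tests.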
Benchmarked Configuration
[Figure: configuration diagram of the hp ProLiant DL585 Cluster 48P and its Storage Area Network]
• 12 x hp ProLiant DL585:
− 4 x AMD 848 Opteron™ 2.2GHz/1MB
− 8GB
− 2 x on-board NICs
− 6 x hp fca 2214 DC
− 1 x InfiniCon Systems InfiniServ 7000 HCA
• InfiniCon Systems InfiIO3016
• 2 x hp ProCurve Switch 4148gl
• 12 x hp SAN Switch 2/16
• 48 x hp MSA1000
Current Results: 1,000 GB Results
Rank | System | QphH | Price/QphH | System Availability | Database | Operating System | Date Submitted | Cluster
1 | HP Integrity Superdome Enterprise Server | 68,100 | 59.00 US $ | 01/18/06 | Oracle Database 10g R2 Enterprise Edt w/Partitioning | HP UX 11.i V2 64 bit | 08/08/05 | N
2 | IBM eServer xSeries 346 | 53,451 | 32.80 US $ | 02/14/05 | IBM DB2 UDB 8.2 | SUSE LINUX Enterprise Server 9 | 02/14/05 | Y
3 | HP ProLiant DL585 Cluster 48P | 35,141 | 59.93 US $ | 10/21/04 | Oracle 10g RAC with Partitioning | Red Hat Enterprise Linux AS 3 | 10/22/04 | Y
4 | Sun PRIMEPOWER 2500 | 34,492 | 155.99 Euros | 03/08/04 | Oracle Database 10g Enterprise Edition | Solaris 9 | 09/08/03 | N
4 | Sun *** PRIMEPOWER 2500 | 34,492 | 140.96 US $ | 03/08/04 | Oracle Database 10g Enterprise Edition | Solaris 9 | 11/13/03 | N
5 | IBM eServer p5 570 with DB2 UDB | 26,156 | 53.43 US $ | 12/15/04 | IBM DB2 UDB 8.2 | IBM AIX 5L V5.3 | 09/15/04 | Y
6 | NEC Express5800/1320Xe (32SMP) | 22,967 | 68.51 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 07/19/05 | N
7 | Unisys ES7000 Orion 440 Enterprise Server | 21,505 | 41.92 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 06/27/05 | N
8 | NEC Express5800/1320Xe (32SMP) | 20,231 | 76.06 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 06/07/05 | N
9 | IBM eServer p655 with DB2 UDB | 20,221 | 69.41 US $ | 06/08/04 | IBM DB2 UDB 8.1 | IBM AIX 5L V5.2 | 12/08/03 | Y
10 | NovaScale 5160 | 15,069 | 44.32 US $ | 12/20/05 | Oracle Database 10g release2 Enterprise Edt | Microsoft Windows Server 2003 Datacenter Edition | 06/20/05 | N
Result Analysis
• Leadership performance
− Query performance of 35,141 QphH @ 1000GB
− Price-to-performance ratio of $60/QphH @ 1000GB
• A database grid of ProLiant systems with multiple Opteron x86 processors delivers performance comparable to large SMP systems
• The Linux operating system meets the throughput and processing demands necessary to achieve the benchmark result
• Oracle's 10g + RAC database delivers consistent, high-performance query execution in large grid environments
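
Since TPC-H price/performance is the priced system cost divided by QphH, multiplying the two published figures gives the approximate system price behind a result. The Python sketch below does this for two entries of the 1,000 GB results list (QphH and unrounded Price/QphH as listed there). The helper name is hypothetical, and the products are only approximations because the published ratios are rounded.

```python
def implied_system_price(qphh: float, price_per_qphh: float) -> float:
    """Approximate priced-system cost implied by QphH and Price/QphH."""
    return qphh * price_per_qphh


# (QphH, Price/QphH in US $) as listed in the 1,000 GB results.
results = {
    "HP Integrity Superdome Enterprise Server": (68_100, 59.00),
    "HP ProLiant DL585 Cluster 48P": (35_141, 59.93),
}

if __name__ == "__main__":
    for system, (qphh, ratio) in results.items():
        price = implied_system_price(qphh, ratio)
        print(f"{system}: about ${price:,.0f}")
```

Under these figures the grid configuration is priced at roughly 2.1 million US $ against roughly 4.0 million US $ for the large SMP system, at a nearly identical cost per QphH.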
Future Hardware for Grid: HP BladeSystems
Conclusion
• Grid is ready for prime time
• In grid computing, resources are provisioned on demand and virtualized for applications to meet today's challenging business needs
• Commodity x86-based servers and blade servers offer reduced total cost of ownership
• Grid overcomes the natural limitations of SMP systems, such as the number of processors, memory and disk arrays