Computer Architecture Support for Database Applications. Erhan Erdinç Pehlivan. Outline. Introduction Methodology of the Experiment Analysis of OLTP workloads Analysis of DSS workloads Conclusion. Introduction.
Computer Architecture Support for Database Applications Erhan Erdinç Pehlivan
Outline • Introduction • Methodology of the Experiment • Analysis of OLTP workloads • Analysis of DSS workloads • Conclusion
Introduction • Today, database workloads alone motivate the sale of vast quantities of symmetric multiprocessor (SMP) machines.
Introduction • Unfortunately, because of several challenges, commercial applications are often ignored in favor of technical benchmarks such as SPEC (Standard Performance Evaluation Corporation). • Reasons • Complex standardized benchmarks. • Large hardware requirements for full-scale runs. • Numerous configuration parameters. • Lack of useful proprietary information.
What is SMP? • A method of work management that treats all processors equally • Threads can run concurrently on any available processor • Improves the total throughput of the system • Requires applications that can take advantage of multithreaded parallelism
SMP (Continued) • Advantages of SMP • High performance • Simplicity to program • Easier load balancing • Disadvantages of SMP • Low availability • Low scalability
Database Workloads • OLTP (Online Transaction Processing) • Ex: Airline reservation systems • DSS (Decision Support Systems) • Ex: Data warehouse systems
Characteristics of OLTP and DSS • OLTP • Uses short, moderately complex queries that read and/or modify a relatively small portion of the overall database. • Has a high degree of multiprogramming. • DSS • Uses typically long-running, moderately to very complex queries that scan large portions of the database in a read-mostly fashion. • The multiprogramming level in DSS systems is typically much lower than that of OLTP systems.
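The contrast between the two workload types can be sketched with Python's built-in sqlite3 module. This is only an illustration of the access patterns: the `orders` table, its columns, and the data are hypothetical and have nothing to do with the TPC schemas or the database server used in the study.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(i, i % 10, float(i)) for i in range(1, 1001)],
)
conn.commit()

# OLTP-style access: a short transaction that touches a single row by key.
with conn:  # commits on success, rolls back on an exception
    conn.execute("UPDATE orders SET amount = amount + 1 WHERE id = ?", (42,))
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE id = ?", (42,)
).fetchone()

# DSS-style access: a read-only aggregate that scans the whole table.
dss_rows = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The first statement pair reads and modifies one row, while the aggregate reads every row once, which mirrors the small-footprint versus scan-heavy distinction on the slide.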
Motivation • Because conclusions drawn from SPEC evaluations do not necessarily hold for DBMS workloads, the architectural behavior of two standard database workloads is investigated in terms of • cycles per instruction (CPI) decomposition • cache miss rates • branch behavior • superscalar issue and retire • out-of-order execution
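As a rough illustration of what a CPI decomposition computes, the sketch below divides total cycles and per-source stall cycles by the number of retired instructions. The stall labels and all counter values are made up for the example and are not measurements from the study. On an out-of-order processor stall sources can overlap, so a simple additive split like this is an approximation.

```python
def cpi_breakdown(cycles, instructions, stall_cycles):
    """Split total CPI into per-source stall components plus a remainder.

    stall_cycles maps a stall source (hypothetical labels) to the cycles
    attributed to it. "other" holds the cycles not attributed to any
    listed source.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    attributed = sum(stall_cycles.values())
    if attributed > cycles:
        raise ValueError("stall cycles exceed total cycles")
    breakdown = {name: c / instructions for name, c in stall_cycles.items()}
    breakdown["other"] = (cycles - attributed) / instructions
    breakdown["total"] = cycles / instructions
    return breakdown


# Made-up counter values, for illustration only.
example = cpi_breakdown(
    cycles=3_000_000,
    instructions=1_000_000,
    stall_cycles={
        "instruction_fetch": 900_000,
        "branch_mispredict": 300_000,
        "resource": 600_000,
    },
)
```

With these hypothetical inputs the components sum to the total CPI, which is the property a decomposition is meant to preserve.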
Methodology: Experimental Platform • A commodity four-processor Intel-based SMP server running Windows NT was chosen.
Software Architecture (OLTP) • Transaction Processing Performance Council's TPC-C benchmark
Software Architecture (DSS) • Transaction Processing Performance Council's TPC-D benchmark • Models the activity of a wholesale supplier performing complex business analysis. • Analyses: pricing and promotions, market share study, shipping management, supply and demand management, profit and revenue management, and customer satisfaction study. • 17 read-only queries and 2 update queries.
Potential sources of stalls • misses to the L1 instruction cache • a branch misprediction • the instruction mix of the workload • the out-of-order execution engine
Measurement Methodology • NT performance monitor • Pentium Pro hardware counters • Intel tool called emon
Analysis of OLTP Workloads • OLTP does short, moderately complex transactions • Small, random I/O operations • Large number of concurrent users, a high degree of multiprogramming • Database implements locking and logging • The combination of these tasks leads to: • Large instruction working set • Larger data footprint
Experimental Results: Memory System Behavior • How do OLTP cache miss rates vary with L2 cache size?
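The question on this slide can be explored in miniature with a toy cache model. The sketch below is not the study's methodology (which used hardware counters on real machines). It simulates a fully associative LRU cache over a synthetic reference trace and reports the miss rate at several capacities. The trace generator, its hot/cold split, and all sizes are assumptions chosen only for illustration.

```python
import random
from collections import OrderedDict


def lru_miss_rate(trace, capacity_lines):
    """Miss rate of a fully associative LRU cache holding capacity_lines lines."""
    if not trace:
        raise ValueError("trace must not be empty")
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace)


def synthetic_trace(n_refs, hot_lines, cold_lines, hot_fraction, seed=0):
    """Hypothetical trace: most references hit a small hot set, the rest a large cold set."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n_refs):
        if rng.random() < hot_fraction:
            trace.append(rng.randrange(hot_lines))
        else:
            trace.append(hot_lines + rng.randrange(cold_lines))
    return trace


trace = synthetic_trace(20_000, hot_lines=256, cold_lines=16_384, hot_fraction=0.8)
sizes = [64, 256, 1024, 4096]
rates = {size: lru_miss_rate(trace, size) for size in sizes}
```

Because fully associative LRU has the inclusion property, the miss rate in this model never rises as capacity grows. Real set-associative caches and real OLTP traces can behave less smoothly.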
Experimental Results: Memory System • What effects do larger caches have on OLTP throughput and stall cycles?
Experimental Results: Processor Issues • How useful is superscalar issue and retire for OLTP?
Experimental Results: Processor Issues • How effective is branch prediction for OLTP?
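To show how prediction accuracy depends on branch behavior, here is a minimal sketch of a classic two-bit saturating-counter predictor. This is a deliberately simplified textbook scheme, not the Pentium Pro's actual predictor, and the two branch streams are hypothetical.

```python
def predict_accuracy(branches, table_size=16):
    """Fraction of correct predictions for a table of 2-bit saturating counters.

    branches is a sequence of (pc, taken) pairs. Counter values 0-1 predict
    not taken, 2-3 predict taken. Every counter starts at 1 (weakly not taken).
    """
    if not branches:
        raise ValueError("branches must not be empty")
    counters = [1] * table_size
    correct = 0
    for pc, taken in branches:
        idx = pc % table_size  # index the table by low-order address bits
        predicted_taken = counters[idx] >= 2
        if predicted_taken == taken:
            correct += 1
        # Train the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
    return correct / len(branches)


# Hypothetical streams: a loop branch taken 9 times out of 10, and a branch
# that alternates between taken and not taken.
loop_branch = [(0x40, i % 10 != 9) for i in range(1000)]
alternating = [(0x80, i % 2 == 0) for i in range(1000)]

loop_accuracy = predict_accuracy(loop_branch)
alternating_accuracy = predict_accuracy(alternating)
```

The loop branch is predicted well, while the alternating branch defeats this simple scheme from the chosen starting state. Hard-to-predict, data-dependent branches are one reason prediction accuracy varies so much between workloads.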
Experimental Results: Processor Issues • Is out-of-order execution successful at hiding stalls for OLTP?
Experimental Results: Multiprocessor Scaling Issues • How well does OLTP performance scale as the number of processors increases?
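Scaling questions like this one are usually answered by comparing throughput at each processor count against the smallest configuration. The sketch below computes speedup and parallel efficiency. The throughput numbers are invented for the example and are not results from the study.

```python
def scaling_table(throughput_by_cpus):
    """Return (cpus, speedup, efficiency) rows relative to the smallest configuration."""
    if not throughput_by_cpus:
        raise ValueError("need at least one measurement")
    base_cpus = min(throughput_by_cpus)
    base = throughput_by_cpus[base_cpus]
    rows = []
    for cpus in sorted(throughput_by_cpus):
        speedup = throughput_by_cpus[cpus] / base
        # Efficiency of 1.0 means perfectly linear scaling.
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows


# Hypothetical transactions-per-minute figures, for illustration only.
rows = scaling_table({1: 100.0, 2: 190.0, 4: 340.0})
for cpus, speedup, efficiency in rows:
    print(f"{cpus} CPUs: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```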
Experimental Results: Multiprocessor Scaling Issues • How do OLTP CPI components change as the number of processors is scaled?
Experimental Results: Multiprocessor Scaling Issues • How prevalent are cache misses to dirty data in other processors’ caches for OLTP?
Experimental Results: Multiprocessor Scaling Issues • Is the four-state (MESI) invalidation-based cache coherence protocol worthwhile for OLTP?
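What the fourth (Exclusive) state buys can be seen in a simplified model of one cache line shared by several processors. The sketch below counts bus transactions for a sequence of reads and writes under MESI and under a three-state MSI variant. It is a textbook-level abstraction with hypothetical access patterns, not a model of the Pentium Pro bus.

```python
def bus_transactions(accesses, n_cpus, use_exclusive):
    """Count bus transactions for one cache line under MESI or MSI.

    accesses is a sequence of (cpu, op) pairs with op "r" (read) or
    "w" (write). use_exclusive=True models MESI and False models MSI.
    Returns (transaction_count, final_states).
    """
    state = ["I"] * n_cpus
    transactions = 0
    for cpu, op in accesses:
        others = [i for i in range(n_cpus) if i != cpu]
        if op == "r":
            if state[cpu] == "I":
                transactions += 1  # read miss goes to the bus
                shared = any(state[i] != "I" for i in others)
                for i in others:
                    if state[i] != "I":
                        state[i] = "S"  # other holders downgrade (M supplies data)
                state[cpu] = "E" if use_exclusive and not shared else "S"
        elif op == "w":
            if state[cpu] in ("S", "I"):
                transactions += 1  # upgrade or read-for-ownership
                for i in others:
                    state[i] = "I"
            # A write in E or M needs no bus transaction.
            state[cpu] = "M"
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return transactions, state


# Private data: one processor reads a line and then writes it.
private = [(0, "r"), (0, "w")]
# Shared data: two processors read, one writes, the other reads again.
shared = [(0, "r"), (1, "r"), (0, "w"), (1, "r")]

mesi_private, _ = bus_transactions(private, n_cpus=2, use_exclusive=True)
msi_private, _ = bus_transactions(private, n_cpus=2, use_exclusive=False)
mesi_shared, _ = bus_transactions(shared, n_cpus=2, use_exclusive=True)
msi_shared, _ = bus_transactions(shared, n_cpus=2, use_exclusive=False)
```

In this model the Exclusive state saves one bus transaction only for the private read-then-write pattern. For the shared pattern both protocols generate the same traffic, so the value of the fourth state depends on how often the workload writes lines that it alone has read.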
Experimental Results: Multiprocessor Scaling Issues • How does OLTP memory system performance scale with increasing cache sizes and increasing processor count?
Analysis of Decision Support Workloads • DSS queries are typically long-running, moderately to very complex queries. • Scan large portions of the database in a read-mostly fashion. • Large sequential disk I/O read operations. • The multiprogramming level in DSS systems is typically lower than that of OLTP systems.
Experimental Results: Memory System Behavior • How do DSS cache miss rates vary with L2 cache size?
Experimental Results: Memory System Behavior • What impact do larger L2 caches have on DSS database performance and stall cycles?
Experimental Results: Memory System Behavior • How prevalent are cache misses to dirty data in other processors’ caches in DSS?
Experimental Results: Memory System Behavior • Is the four-state (MESI) invalidation-based cache coherence protocol worthwhile for DSS?
Experimental Results: Memory System Behavior • How does DSS memory system performance scale with increasing cache sizes?
Experimental Results: Processor Issues • How useful is superscalar issue and retire for DSS? • DSS behaves like OLTP in this respect.
Experimental Results: Processor Issues • How effective is branch prediction for DSS?
Experimental Results: Processor Issues • Is out-of-order execution successful at hiding stalls for DSS?
Conclusions for OLTP • Out-of-order execution is only somewhat effective for this database workload. • Increased superscalar width for the out-of-order engine may be helpful. • Innovation is needed in branch prediction algorithms and hardware structures to better support database workloads. • Caches are effective at reducing processor traffic to memory. • A three-state (MSI) cache coherence protocol would be better. • The amount of time when the memory system is unavailable decreases with larger caches and increases with the number of processors.
Conclusions for DSS • Out-of-order execution provides potentially more benefit for DSS than for OLTP. • DSS performance is less sensitive to L2 cache size than OLTP performance. • Existing branch prediction schemes are more effective for this workload. • Increasing the micro-operation retire width in the Pentium Pro’s out-of-order RISC core may provide performance improvements. • Dirty misses are less prevalent for DSS than for OLTP.