The Thoroughly Modern Mainframe Dr. Michael Salsburg NTSMF Users' Group Dec 9, 2002
Agenda • Large Scale WINTEL Servers • Disruptive technology or trend? • Scale Up or Scale Out? • A workload-motivated discussion of SMP and CC-NUMA • PCI-Based I/O • Consolidation • Emerging Technologies
Server Industry Trends (Source: IDC) • Intel will dominate the server chip market • Windows 2000 will be the pervasive server OS
A Comparison using Moore’s Law • Comparison of CPU speeds / tpmC for 4-CPU WINTEL systems
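To make the comparison concrete, here is a minimal sketch of the kind of projection Moore’s Law implies, assuming performance doubles roughly every 18 months; the baseline tpmC value and the time points are hypothetical and are not the figures from the original chart.

```python
# A minimal sketch of a Moore's-Law-style projection: performance doubling
# every ~18 months from a hypothetical baseline. The baseline tpmC value and
# the time points below are illustrative, not the original chart data.

def projected_relative_performance(months_elapsed, doubling_months=18):
    """Relative performance after `months_elapsed`, doubling every `doubling_months`."""
    return 2 ** (months_elapsed / doubling_months)

baseline_tpmc = 20_000          # hypothetical tpmC for a 4-CPU system at time zero
for years in (0, 1, 2, 3):
    factor = projected_relative_performance(years * 12)
    print(f"after {years} year(s): ~{baseline_tpmc * factor:,.0f} tpmC "
          f"({factor:.2f}x baseline)")
```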
Scale Up or Scale Out? • Two of the three tiers in current application architectures use scale-out for growth • Increase # of Web servers • Increase # of Application Servers • The database back end cannot be scaled out • Scale up is needed for large database applications • Scale out has some inherent downsides • Additional administrative/management attention • More “headroom” needed for heavy traffic
SMP / NUMA Workload Discussion • As code executes on the processor, memory is referenced. These references can be broken into three regions • High Locality of Reference – memory that is immediately re-referenced (> 95% of references) • Working Set – the set of addresses on which the software primarily focuses • Persistent Storage – addresses that are stored on physical devices
Scale Out – SMP or NUMA? Workload Interference • When two processes are running on the same system, their memory references will interfere • It is preferable that they interfere only at the persistent storage level • Interference at higher levels can decrease cache efficiency and slow down processing, effectively reducing the usable CPU power
SMP / NUMA SMP Topology • A bank of CPUs shares a bank of memory • Each CPU has a local cache to optimize for high locality of reference • A cache miss has a uniform latency to fetch data from main memory • “Dirty” memory references require fetching the updated data from another CPU’s cache • The CPU can “stall” while waiting for a memory reference
SMP / NUMA Workload Discussion • Percentages of references based on the TPC-C workload profile • Relative time units show the orders of magnitude between a cache hit and a persistent storage access

Reference Level       Percentage   Time Units
CPU Cache             98.0%        1
Main Memory           01.9%        100
Remote Cache          –            200
Persistent Storage    00.1%        10,000
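A small sketch of the effective (average) access time this mix implies. The reference percentages and time units are taken from the table; the degraded hit rate in the second case is a hypothetical illustration of the workload-interference effect described earlier.

```python
# Effective memory access time implied by the table above. The reference mix
# and relative time units come from the slide; the degraded hit rate used for
# comparison is a hypothetical illustration of workload interference.

def effective_access_time(mix):
    """mix: list of (fraction_of_references, time_units) pairs; fractions sum to 1.0."""
    return sum(fraction * time for fraction, time in mix)

tpcc_like_mix = [(0.980, 1), (0.019, 100), (0.001, 10_000)]   # cache, memory, storage
print(f"baseline:   {effective_access_time(tpcc_like_mix):.1f} time units per reference")

# Hypothetical: interference pushes the cache hit rate from 98% to 95%,
# with the extra misses split between memory and storage in the same ratio.
interfered_mix = [(0.950, 1), (0.0475, 100), (0.0025, 10_000)]
print(f"interfered: {effective_access_time(interfered_mix):.1f} time units per reference")
```

In this sketch, losing just a few points of cache hit rate more than doubles the average cost per reference, which is why interference effectively shrinks the usable CPU power.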
SMP / NUMA NUMA (Non-Uniform Memory Access) • Overcomes the bus congestion and physical fabrication limits found in a single-bus architecture • Two memory latencies – near and far • The NUMA ratio is the ratio of the far latency to the near latency • Originally the ratio was around 30; now it is around 3
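As a rough sketch of why the NUMA ratio matters, the calculation below mixes near and far references. The near latency (100 cycles) and the 25% far-reference share are hypothetical assumptions; the ratios 30 and 3 come from the slide.

```python
# How the NUMA ratio affects average memory latency. The near latency and the
# fraction of far (remote-node) references are hypothetical assumptions; the
# NUMA ratios of 30 and 3 are taken from the slide.

def average_latency(near_latency, numa_ratio, far_fraction):
    far_latency = near_latency * numa_ratio
    return (1 - far_fraction) * near_latency + far_fraction * far_latency

NEAR = 100            # hypothetical near-memory latency, in CPU cycles
FAR_FRACTION = 0.25   # hypothetical share of references that land on a far node

for ratio in (30, 3):
    avg = average_latency(NEAR, ratio, FAR_FRACTION)
    print(f"NUMA ratio {ratio:>2}: average latency ~{avg:.0f} cycles "
          f"({avg / NEAR:.1f}x the near latency)")
```

With a ratio of 30, a quarter of the references landing on a far node pushes the average latency to roughly 8x the near case; with a ratio of 3 the penalty shrinks to about 1.5x.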
SMP / NUMA Hybrid (Unisys ES7000) • Another level of cache is introduced • Memory accesses can be non-uniform when comparing Next Level Cache hits to main memory references • Overcomes the fabrication/congestion problems of a single-bus architecture
PCI-Based I/O Cellular MultiProcessing (CMP) Architecture
PCI-Based I/O [Chart: maximum bus bandwidth (GB/sec per direction), 2001 and earlier through 2005 – PCI at 0.8 GB/sec, PCI-X at 133/266/533 MHz, HyperTransport, Scalability Ports SP1 and SP2 at 6.4 GB/sec, and PCI Express 1X/4X/8X/16X]
Enterprise-Level Backup / Restore • Complete recovery of a 2.5 terabyte database: • From tape, the database was recovered in only 88 minutes with a sustained throughput during restore of 2.2 TB/hr. • From the hardware snapshot, the same database was recovered in only 11 minutes. • Complete backup of a 2.5 terabyte database: • Backup to tape took only 68 minutes with minimal impact on online operations and sustained throughput of 2.6 TB/hr.
Consolidation • "[Our] servers were multiplying like rabbits," says Jeff Smith, manager of corporate network services at La-Z-Boy Inc., a Monroe, Mich.-based residential furniture producer that just completed a Windows NT server consolidation project. "Our distributed environment was becoming more and more difficult to manage." • Thinning The Server RanksComputerworld Aug 26, 2002
Consolidation • How do you stuff over 130 CPUs’ worth of workload into a 32x CPU system? • Veeerrrry carefully…… • Why are current server farms filled with under-utilized servers? • Web Hosting Sites • “New web servers are installed when Peak CPU utilization reaches above 35%.” • “Speed and reliability are very important to your web site. All of our servers are maintained at less than 15% CPU utilization. This ensures that your web site downloads as fast as possible!”
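A sketch of the consolidation arithmetic behind that question. The 130-CPU farm and the 32-CPU target come from the slide; the 15% average utilization is borrowed from the hosting quote above as an illustrative assumption.

```python
# Consolidation arithmetic sketch. The 130-CPU farm and 32-CPU target are from
# the slide; the 15% average utilization is an illustrative assumption taken
# from the hosting quote above.

farm_cpus = 130
avg_utilization = 0.15            # assumed average utilization per farm CPU
target_cpus = 32

actual_demand = farm_cpus * avg_utilization           # CPU-equivalents of real work
target_utilization = actual_demand / target_cpus

print(f"aggregate demand: ~{actual_demand:.1f} CPUs of work")
print(f"on a {target_cpus}x system: ~{target_utilization:.0%} average utilization")
```

Under these assumptions the farm generates only about 20 CPUs' worth of real work, so the 32-way system absorbs it at roughly 60% utilization; the care goes into the peaks, which the averages hide.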
Consolidation – Responsive Consolidation • Which would you prefer – an average queue size of 0.2 on a 1x or a 32x system?
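The trade-off can be made concrete with standard M/M/c queueing formulas (Erlang C); this is an illustrative sketch, not an analysis from the original presentation. It solves for the per-CPU utilization at which the average number of waiting requests reaches 0.2 on a 1-CPU versus a 32-CPU system.

```python
# Illustrative M/M/c queueing sketch: at what per-CPU utilization does the
# average queue (number of waiting requests) reach 0.2 on 1 CPU vs 32 CPUs?

def avg_queue_length(c, rho):
    """Mean number waiting (Lq) in an M/M/c queue at per-server utilization rho."""
    a = c * rho                               # offered load in Erlangs
    b = 1.0                                   # Erlang B via the stable recurrence
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    erlang_c = b / (1.0 - rho * (1.0 - b))    # probability a request must wait
    return erlang_c * rho / (1.0 - rho)       # Lq

def utilization_for_queue(c, target_lq=0.2):
    """Bisect for the per-server utilization that yields the target Lq."""
    lo, hi = 0.0, 0.999999
    for _ in range(60):
        mid = (lo + hi) / 2
        if avg_queue_length(c, mid) < target_lq:
            lo = mid
        else:
            hi = mid
    return lo

for cpus in (1, 32):
    u = utilization_for_queue(cpus)
    print(f"{cpus:>2} CPUs: average queue of 0.2 is reached at ~{u:.0%} utilization")
```

In this sketch the uniprocessor has to stay at a much lower utilization than the 32-way system to keep the same small queue, which is the responsiveness argument for consolidating onto a larger SMP.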
Consolidation – Benefits • Simplified Management / Administration • Higher Utilization (less “headroom”) • Less Variability of Service • Less Overall CPU Overhead • Fewer Software Licenses
Itanium IIWhat’s so great about 64 bits? • For transaction processing, memory addressing is increased and therefore the amount of main memory increases • The top 5 TPC-C results were achieved using 64 bit computing • TPC-C is a large database application – this is a sweet spot for 64 bit commercial computing Bigger is DEFINITELY Better!!