Elements of Performance Jacques Roy jacquesr@us.ibm.com
Agenda
• Hardware Performance Concepts
• Operating Systems
• Block diagram
• File system
• I-Node
• Multi-threading
• Network Protocol
• Database Servers Design
• DB Technology Highlights
Hardware Performance
[Diagram labels: CPU, cache, System bus, Memory, Disk Controller]
IBM eServer P570
• Processor: 1.9 GHz dual-core CPU (~1900M instructions/sec per core)
  cache: 32-64 KB L1 (per core), 1.9 MB L2 (per core), 32 MB L3 (L3 cache transfer rate: 48 GB/sec)
• Memory: 10.6 GB/sec DDR1 or 24.98 GB/sec DDR2
• Up to 128 GB
• PCI-X bus: 1 GB/sec
• Disks: 15,000 RPM (average rotational latency: 0.002 sec)
  (during the latency time, one core can execute 3.8 million instructions)
• Network: 10/100/1000 Mbits/sec; 1000 Mbits/sec ≈ 100 MB/sec
• User: 120 words/min ≈ 600 chars/min = 0.00001 MB/sec
Performance goes down dramatically as we move away from the CPU
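The latency figures on this slide can be rechecked with a few lines of arithmetic. This is a minimal sketch: the 5 characters per word and 1 byte per character are assumptions implied by the slide's "120 words/min ≈ 600 chars/min", and the variable names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted on the slide.
RPM = 15_000
INSTRUCTIONS_PER_SEC = 1_900_000_000   # ~1900M instructions/sec per core

seconds_per_rotation = 60 / RPM                      # one full turn: 0.004 s
avg_rotational_latency = seconds_per_rotation / 2    # on average, half a turn
instructions_during_latency = INSTRUCTIONS_PER_SEC * avg_rotational_latency

# Typing rate: 120 words/min, assuming 5 characters per word, 1 byte each.
chars_per_sec = 120 * 5 / 60
typing_mb_per_sec = chars_per_sec / 1_000_000

print(f"rotational latency: {avg_rotational_latency:.3f} s")
print(f"instructions while waiting: {instructions_during_latency:,.0f}")
print(f"typing rate: {typing_mb_per_sec:.5f} MB/sec")
```

The gap between a core and a disk is roughly six orders of magnitude per access, which is why the rest of the deck keeps returning to buffering and block-sized reads.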
Operating Systems
• Extended Machine
  • Higher level of abstraction to the hardware components
  • Makes it easier to program
• Share CPU, Memory, Disks, etc.
  • Optimizes use of components
• Security
  • Protect users from each other
  • Virtual memory
• Optimize component utilization
  • Scheduling – batch processing vs. multi-processing
  • Multi-threading
  • Processor Affinity
Block Diagram of an Operating System
[Diagram labels: USER, System Calls, Context-switching, File Subsystem, Buffer Cache, Device Driver (Char, Block), Process Control Subsystem, Memory Management, Hardware Control]
Performance: Reading a Disk Device
• Issue a system call
• Context switch
• Ask the device for the information
• Seek to the proper cylinder
• Wait for the data to be under the read head
• Put the information in a buffer
• Context switch back to user space
• Need to optimize access to a disk
  • Read at least one block at a time
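The advice to read at least one block at a time can be illustrated by counting system calls. The sketch below is an assumption-laden illustration, not a benchmark: `count_read_calls` is a hypothetical helper, the 64 KiB sample file and the 4096-byte block size are arbitrary choices, and the operating system's buffer cache will hide most of the physical disk work. Each `os.read` call is still one trip through the system call interface.

```python
import os
import tempfile

def count_read_calls(path, chunk_size):
    """Read the whole file with os.read and return how many calls it took."""
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)   # one system call per iteration
            calls += 1
            if not data:                     # empty result signals end of file
                break
    finally:
        os.close(fd)
    return calls

# Create a 64 KiB sample file to read back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 65536)
    sample_path = f.name

byte_at_a_time = count_read_calls(sample_path, 1)
block_at_a_time = count_read_calls(sample_path, 4096)
os.remove(sample_path)

print(byte_at_a_time, block_at_a_time)
```

Both counts include the final call that detects end of file. Every call pays the context-switch cost listed in the steps above, so the block-sized loop does the same work with a small fraction of the overhead.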
File Systems
[Diagram: a disk laid out as cylinder groups; layout elements: Superblock, Bitmap, I-Nodes, Data blocks]
Block size:
• BSD Fast File System: 4096 bytes
• Original Unix: 1024-byte blocks
I-Node data pointers
[Diagram: an i-node holding Admin Info, direct pointers Direct0 ... Direct10, and pointers PTR 1, PTR 2, PTR 3, all leading to data blocks]
I-Node Take Away Points
• An i-node is a management structure
  • Contains pointers to data blocks
• Allocation is by block, not by byte
• Data block size may be configurable at the file system level
• Use of indirect pointers increases the number of I/Os required to get to the data
• Indirect block pointers add to the memory overhead
  • Less memory for real data, which impacts performance
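To see why indirect pointers cost extra I/O, the following sketch computes how many indirect blocks must be read before reaching the data block for a given file offset. The parameters are assumptions for illustration: 11 direct pointers (read off the Direct0 ... Direct10 labels in the diagram), 4096-byte blocks, 4-byte block pointers, and one single, one double, and one triple indirect pointer. `indirection_level` is a hypothetical helper, not an interface of any real file system.

```python
def indirection_level(offset, block_size=4096, n_direct=11, ptr_size=4):
    """Return how many indirect blocks are read before the data block
    holding byte `offset` (0 = direct pointer, 1 = single indirect, ...)."""
    ptrs_per_block = block_size // ptr_size
    block_index = offset // block_size
    if block_index < n_direct:
        return 0
    block_index -= n_direct
    for level in (1, 2, 3):
        capacity = ptrs_per_block ** level   # blocks reachable at this level
        if block_index < capacity:
            return level
        block_index -= capacity
    raise ValueError("offset beyond what triple indirection can address")

# First byte, last directly addressed byte, then the first byte of each
# deeper level.
for offset in (0, 11 * 4096 - 1, 11 * 4096, (11 + 1024) * 4096):
    print(offset, indirection_level(offset))
```

Under these assumptions only the first 44 KiB of a file is reachable without an extra read. Larger files pay one to three additional block reads per access unless the indirect blocks are already cached, and caching them is the memory overhead the last bullet refers to.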
Multiprocessing vs Multithreading
Process
• Kernel entity
• Virtual memory map
• File descriptor table
• Process privileges
• Program counter
• Stack
Thread
• User entity
• Program counter
• Stack
Benefits of Multithreading
• Lower Resource Consumption
  • Uses the process's resources
  • Code is trusted within the process
  • Scheduling without a full process context switch
• Code Simplification
  • Ex: asynchronous events
• Higher Performance
In Solaris, creating a process is about 30 times slower than creating a thread, synchronization variables are about 10 times slower, and context switching about 5 times slower.
“Threads Primer”, page 21
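The point that threads use the resources of their process can be shown with a small sketch in which several threads write into one dictionary that lives in the shared address space. The names `worker`, `results`, and `results_lock` are illustrative assumptions. The example shows sharing and synchronization only, and makes no claim about the Solaris timings quoted above.

```python
import threading

results = {}                      # lives in the process; visible to every thread
results_lock = threading.Lock()   # synchronization variable guarding the dict

def worker(name, n):
    total = sum(range(n))         # each thread has its own stack and locals
    with results_lock:            # serialize access to the shared structure
        results[name] = total

threads = [
    threading.Thread(target=worker, args=(f"t{i}", 1000 * (i + 1)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()                      # wait until every worker has finished

print(results)
```

Separate processes would need pipes, files, or shared memory segments to exchange the same results, which is part of the resource cost the slide refers to.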
Network Protocol
[Diagram labels: Conceptual Layering; Application; TCP, UDP; Internet (IP); Network Interface; OS; Hardware]
• IP
  • Internet Protocol
• UDP
  • User Datagram Protocol
  • Unreliable connectionless delivery service
• TCP
  • Transmission Control Protocol
  • Reliable stream transport
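Network Protocol
• Ethernet frame format (max size 1518 bytes, not counting the preamble; 26 bytes overhead, counting it)

  Field        Size
  preamble     8 bytes
  dest addr    6 bytes
  src addr     6 bytes
  frame type   2 bytes
  frame data   46-1500 bytes
  CRC          4 bytes

• Internet Datagram
  Header: 24 bytes
• TCP
  Header: 24 bytes
[Diagram: encapsulation. The frame data area carries the datagram header and the datagram data area; the datagram data area carries the TCP header and the data.]
Total overhead: 74 bytes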
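The 74 bytes of overhead matter most for small messages. The sketch below computes the fraction of bytes on the wire that are user data for a single frame, using the header sizes given on the slide. It is a simplification under stated assumptions: `payload_efficiency` is a hypothetical helper, padding of short frames to the Ethernet minimum is ignored, and so are acknowledgements and the inter-frame gap.

```python
ETHERNET_OVERHEAD = 26   # preamble 8 + dest 6 + src 6 + type 2 + CRC 4
IP_HEADER = 24           # datagram header size as given on the slide
TCP_HEADER = 24          # TCP header size as given on the slide
TOTAL_OVERHEAD = ETHERNET_OVERHEAD + IP_HEADER + TCP_HEADER   # 74 bytes
MAX_FRAME_DATA = 1500    # largest frame data area

def payload_efficiency(user_bytes):
    """Fraction of transmitted bytes that are user data, for one frame."""
    max_user = MAX_FRAME_DATA - IP_HEADER - TCP_HEADER
    if not 0 < user_bytes <= max_user:
        raise ValueError(f"user_bytes must be between 1 and {max_user}")
    return user_bytes / (user_bytes + TOTAL_OVERHEAD)

for size in (1, 100, 1452):
    print(size, f"{payload_efficiency(size):.1%}")
```

A single typed character sent in its own segment is almost entirely overhead, while a full frame spends only a few percent on headers.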
Other Network Considerations
• Protocol layering
• Fragment reassembly
• Routing (goes up the protocol stack)
• Collisions, timeouts, retransmission, acknowledgements
“Experiments at Berkeley have shown that the same TCP that operates efficiently over the global Internet can deliver 8Mbps of sustained throughput of user data between two workstations on a 10Mbps Ethernet.”
“Internetworking with TCP/IP”, page 222
Network communication can be much slower than hardware speed!
Overall System Consideration Example
• Late ’80s pharmacy system
• Remote system connected to the pharmacy
• Need a new system to handle a higher transaction volume
• Concern: Will the new computer be fast enough to handle the load?
• Target: 25 transactions per second
• What should you ask?
Programming and Algorithms
• Performance
  • Solving the right problem
  • Code path, space, I/O (disk, network)
  • Frequency of use
  • Function calls and arguments
  • Code duplication
• Data Structure
  • “Representation is the essence of programming”
  • Save space, simpler/faster access
• Algorithms
  • Scalability: bubble sort vs. quick sort, O(n²) vs. O(n log n)
  • Fundamentals: sorting, grouping, searching
Under pressure, brute force often replaces proper thinking
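The scalability gap between bubble sort and quick sort can be made concrete by counting comparisons. This is a minimal sketch with assumptions: both functions are simple textbook variants written for illustration, the quick sort uses the middle element as pivot and counts one comparison per element per partitioning pass, and the input is 1000 distinct values in random order. The O(n log n) behavior of quick sort is its average case, and its worst case is O(n²).

```python
import random

def bubble_sort(items):
    """Return (sorted list, number of comparisons); always ~n²/2 comparisons."""
    a = list(items)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

def quick_sort(items):
    """Return (sorted list, number of comparisons) using a middle pivot."""
    a = list(items)
    if len(a) <= 1:
        return a, 0
    pivot = a[len(a) // 2]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    left, left_cmp = quick_sort(less)
    right, right_cmp = quick_sort(greater)
    # Count one comparison against the pivot per element in this pass.
    return left + equal + right, len(a) + left_cmp + right_cmp

data = random.Random(0).sample(range(100_000), 1000)
bubble_sorted, bubble_cmp = bubble_sort(data)
quick_sorted, quick_cmp = quick_sort(data)
print(bubble_cmp, quick_cmp)
```

With 1000 items the bubble sort performs 499,500 comparisons, while the quick sort needs far fewer; the gap widens quickly as n grows.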
Programming and Algorithms
“The cheapest, fastest and most reliable components of a computer system are those that aren’t there”
Gordon Bell, Encore Computer Corporation
• Take a step back to look at your problem
  • General solution
• Put the problem in the proper context
  • End-to-end view
• Think about large volumes
  • The solution may work fine with a few users or records but does it scale?
• Limit your solution
  • Don’t add unneeded flexibility
Thomas Watson Sr.
Programming and Algorithms Example
This is from a real-life situation:
• Problem: Create monitoring agents with alarm thresholds to monitor database activities such as number of users, space utilization, CPU utilization, etc.
• Original solution: Create one agent per monitored activity
• New solution: Look at what the agents have in common, and create one agent that can replace all other agents by using SQL statements
• Benefits: Less code to maintain, smaller program, better use of memory, easier to add new agents
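One way to picture the single-agent design is a table of monitors, each row holding a SQL statement and a threshold, with one loop that runs them all. This is a hedged sketch only, not the system described on the slide: the `monitors` and `sessions` tables, their columns, and `run_agent` are hypothetical, and SQLite stands in for the monitored database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (user_name TEXT);
    CREATE TABLE monitors (name TEXT, query TEXT, threshold REAL);
    INSERT INTO sessions VALUES ('ann'), ('bob'), ('cy');
    INSERT INTO monitors VALUES
        ('user_count', 'SELECT COUNT(*) FROM sessions', 2),
        ('distinct_users',
         'SELECT COUNT(DISTINCT user_name) FROM sessions', 10);
""")

def run_agent(conn):
    """Run every configured monitor; return (name, value) for each alarm."""
    alarms = []
    monitors = conn.execute(
        "SELECT name, query, threshold FROM monitors ORDER BY name"
    ).fetchall()
    for name, query, threshold in monitors:
        (value,) = conn.execute(query).fetchone()   # each query returns one number
        if value > threshold:
            alarms.append((name, value))
    return alarms

alarms = run_agent(conn)
print(alarms)
```

Adding a new monitored activity becomes an `INSERT` into the configuration table rather than a new program. Because the agent executes whatever SQL is stored there, the table should be writable only by trusted administrators.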
Database Server
• Higher level abstraction for data access
  • Easier programming
• Shares database server resources between users
  • Processing, storage, communication
  • Optimizes access to data
• Security
  • Protects users from each other, protects data
• Optimize component utilization
  • When a user waits for something, the server can do work for another user
Similar to an operating system
DB Server System Optimization
• CPU
  • Process reuse, multi-threading, services (PIO, LIO, Logging)
• Memory
  • Shared memory, buffer mgmt, page cleaners, data structures
• Disk
  • Disk space mgmt, async I/O, pre-fetch, buffering, partitioning, indexes
• Network
  • Connection management
SQL Statements Processing
• Statement parsing, syntax validation
• Query plan generation
• Plan selection
• Query execution
• Reducing overhead by:
  • Not submitting unnecessary statements
  • Using prepared statements
  • Using static statements (not available in Informix)
• Additional possibility:
  • Shared statement cache
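The prepared-statement idea can be sketched with Python's built-in `sqlite3` module, which stands in here for a database server. One statement text with `?` placeholders is reused with different values, so the text does not have to be parsed and planned again for each execution. The table and data are invented for illustration, and other database drivers expose prepared statements through different calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor INTEGER, value REAL)")

# One statement text with placeholders, executed for many rows.
INSERT_SQL = "INSERT INTO readings (sensor, value) VALUES (?, ?)"
rows = [(i % 10, i * 0.5) for i in range(1000)]
conn.executemany(INSERT_SQL, rows)

# The same SELECT text is reused; only the bound parameter changes.
SELECT_SQL = "SELECT COUNT(*) FROM readings WHERE sensor = ?"
counts = [conn.execute(SELECT_SQL, (sensor,)).fetchone()[0]
          for sensor in range(10)]
print(counts)
```

Building a new SQL string for every value would instead present the server with 1000 distinct statements to parse, and would open the door to SQL injection as well.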
DB Usage Approach
• Database usage should be considered at the functional analysis time
• Carefully review any complex SQL statements
  • Could be poorly written SQL
  • Could point to schema changes
• Look at query plans
  • Table scans could be avoided by creating an index
• Limit data transfer
  • If possible, return the answer, not the data to compute the answer
We’ll revisit this multiple times
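Looking at a query plan before and after creating an index might go as follows. The sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module as a stand-in; other servers have their own plan-display commands, and the exact plan wording varies between SQLite versions. The `orders` table, the index name, and the `plan` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"

def plan(conn):
    """Return the human-readable plan text for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " ".join(row[-1] for row in rows)   # last column holds the detail

plan_before = plan(conn)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search that uses `idx_orders_customer`.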
Implication of DB proper usage
• Take advantage of optimized algorithms
  • Sorting, grouping, disk access (buffering, read-ahead), etc.
  • PERFORMANCE!
• Reduce network traffic
  • Return only the needed answer, not the raw data
  • PERFORMANCE!
• Reduce application complexity
  • Set processing in the database
  • Complex selection in the database (ex: spatial)
  • Concurrent access to data, transactions, etc.
  • Reduced application code
  • PERFORMANCE! TIME TO MARKET! MAINTENANCE COSTS!
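Returning the answer rather than the raw data can be illustrated by computing the same total two ways. In this sketch SQLite runs in-process, so no network is involved; with a remote server every fetched row would also cross the wire. The `sales` table and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east" if i % 2 else "west", i) for i in range(1, 1001)],
)

# Approach 1: ship every matching row to the application and add them there.
rows = conn.execute(
    "SELECT amount FROM sales WHERE region = ?", ("east",)
).fetchall()
total_in_app = sum(amount for (amount,) in rows)

# Approach 2: let the database compute the answer and return a single row.
(total_in_db,) = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("east",)
).fetchone()

print(len(rows), total_in_app, total_in_db)
```

Both approaches give the same total, but the first moves 500 rows to get it and the second moves one.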
Summary
• Computer components have different performance characteristics
  • Keeping processing closest to the CPU improves performance
• Databases must be an integral part of a solution design
  • A database should not be used only as persistent storage
  • Equivalent of not starting a car!
• Optimal DB use provides multiple benefits
  • Cost savings for licenses and hardware
  • Faster development (simplified environment and code)
  • Easier maintenance
  • BUSINESS ADVANTAGE!
Suggested Reading
• “The Design of the UNIX Operating System”, Maurice J. Bach, ISBN 0-13-201799-7
• “Internetworking with TCP/IP”, volume 1, Douglas E. Comer, ISBN 0-13-216987-8
• “Programming Pearls”, Jon Bentley, ISBN 0-201-65788-0