1. Glenn Paulley
Director, Engineering
http://iablog.sybase.com/paulley
Self-management features in SQL Anywhere server
3. Design goals of SQL Anywhere Ease of administration
Comprehensive yet comprehensible tools
Good out-of-the-box performance
“Embeddability” features → self-tuning
Many environments have no DBAs
Cross-platform support
32- and 64-bit Windows (XP, Server 2003, 2000, 9x), Windows CE/Pocket PC, 32- and 64-bit Linux, HP-UX, AIX, Solaris (SPARC and Intel), Mac OS X, Compaq Tru64
Interoperability
4. Why is self-management important? In a word: complexity
Application development is becoming more complex: new development paradigms such as ORM toolkits, distributed computation with synchronization amongst database replicas, and so on
Databases are now ubiquitous in IT because they solve a variety of difficult problems
Yet most companies continue to own and manage a variety of different DBMS products, which increases administrative costs
Ubiquity brings scale, in several ways
To keep TCO constant, one must improve the productivity of each developer
5. Embeddability
6. Physical database design
7. Physical database design Logical and physical database design tradeoffs have a significant impact on performance
The fundamental problem is that the potential benefits of design changes must be amortized over the execution characteristics of a workload
Representative workloads are difficult to create
Issues include data skew, correlation, mitigating artificial contention that occurs as a result of simulating the actual system
Most DBMS vendors now offer tools to assist in index or materialized view creation (more on this later)
However, none of these tools covers every design implication
8. Physical database design with SQL Anywhere A SQL Anywhere database is composed of up to 15 dbspaces
Each dbspace is an OS file
One each is used for the temporary file and the transaction log
Others are for user data
UNIX raw partitions are not supported
Image copy is done via simple OS file copy
No image copy utility is necessary
Easy to deploy a database to a separate machine or CE device
Files are interoperable on all supported platforms without user intervention
On-the-fly data conversion is performed when necessary
9. Physical database design By default, databases are created with
Case-insensitive searching enabled
Not blank-padded; no difference in the storage of CHAR vs. VARCHAR attributes on the disk
SQL Anywhere supports different character sets, collations for international contexts
Includes support for the NCHAR data type using the multi-byte UTF-8 character set
Supported page sizes are 2K, 4K, 8K, 16K, 32K
All dbspaces must use the same page size
No hard limits on row length, number of BLOB columns
BLOB values (up to 2GB each) stored in separate page arena
10. Physical database design Database files grow automatically as necessary to accommodate any increase in data volume
Server can execute a user call-out procedure when a disk-full panic occurs
Server offers a temporary file governor to ensure that connections cannot be starved when they require space for intermediate results
Indexes on primary, foreign keys created automatically
Server will automatically detect redundant (duplicate or subsumed) indexes
Two or more logical indexes can share the same physical structure
11. Physical database design Transaction log
SQL Anywhere uses a logical logging and recovery scheme at the row level (not physiological, or page-based)
Transaction log can be translated directly into INSERT, UPDATE, DELETE SQL statements
DBTRAN utility ships with the software
Assists in recovery from catastrophic failures
Provides the backbone for two-way synchronization with MobiLink and SQL Remote
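The point of a row-level logical log is that each record carries enough information to be rendered directly as SQL, which is what DBTRAN exploits. A minimal sketch of that kind of translation is below; the record format (a dict with `op`, `table`, and row images) is a hypothetical stand-in, not DBTRAN's actual on-disk format.

```python
# Sketch: translating row-level logical log records into SQL statements,
# in the spirit of the DBTRAN utility. The record format is hypothetical.

def log_to_sql(record):
    """Translate one logical log record (a dict) into an SQL statement."""
    op, table, row = record["op"], record["table"], record["row"]
    if op == "INSERT":
        cols = ", ".join(row)
        vals = ", ".join(repr(v) for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "DELETE":
        preds = " AND ".join(f"{c} = {v!r}" for c, v in row.items())
        return f"DELETE FROM {table} WHERE {preds}"
    if op == "UPDATE":
        # `row` is the old row image (for the WHERE clause); `new` the new values.
        sets = ", ".join(f"{c} = {v!r}" for c, v in record["new"].items())
        preds = " AND ".join(f"{c} = {v!r}" for c, v in row.items())
        return f"UPDATE {table} SET {sets} WHERE {preds}"
    raise ValueError(f"unknown log operation: {op}")
```

Because records are self-describing at the row level (rather than page-based), the same translation serves both catastrophic recovery and statement replication to MobiLink/SQL Remote subscribers.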
12. Self-management features
13. Memory management SQL Anywhere uses a single heterogeneous buffer pool with few predefined limits
Buffer pool can automatically grow and shrink depending on database workload and OS requirements for other applications
Buffer pool comprises
Table data pages
Index pages
Checkpoint log pages
Bitmap pages
Heap pages (data structures for connections, stored procedures, triggers)
Free (available) pages
All page frames are the same size
14. Page replacement scheme Buffer pool has no preset bounds for sort space, heap space, table pages
Buffer pool’s page replacement scheme must be able to manage various categories of pages and requests on a demand basis
Workloads change over time periods
Static configurations must trade off service levels against administrative effort
One basic requirement: avoid polluting the cache with the results of pages from sequential scans
Identify pages that are frequently used
15. Page replacement scheme Pages flow through the buffer pool over time
Window (the size of the cache) is stratified into eight segments based on time of reference
A page’s score is incremented as it moves from segment to segment
Need to differentiate reference locality (i.e. multiple references to a single page during a row fetch) from other forms of reuse
Replacement: based on individual page score
A modified ‘Clock’ algorithm visits each page in turn; if a page’s score is less than a threshold value, the page is reused
If page is retained, score is decayed
LRU-k and 2Q algorithms are frequently referenced in the literature—however we have found their overheads to be too high
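The score-and-decay idea can be sketched as follows. This is a loose illustration of the mechanism described on the slide, not SQL Anywhere's implementation: the scoring increments, decay rate, and threshold are invented constants, and the eight-segment stratification is collapsed into a single per-page score.

```python
# Sketch of a score-based 'Clock' page replacement scheme: a page's score
# rises with reuse and decays as the clock hand passes; pages below a
# threshold are reused. New pages start cold, so a sequential scan cannot
# pollute the cache. All constants are illustrative.

class ClockCache:
    def __init__(self, nframes, threshold=1):
        self.frames = [None] * nframes     # each frame: (page_id, score) or None
        self.where = {}                     # page_id -> frame index
        self.hand = 0
        self.threshold = threshold

    def reference(self, page_id):
        """Return True on a hit; on a miss, load the page, evicting if needed."""
        if page_id in self.where:
            idx = self.where[page_id]
            pid, score = self.frames[idx]
            self.frames[idx] = (pid, score + 1)   # reward reuse
            return True
        self._load(page_id)
        return False

    def _load(self, page_id):
        while True:
            slot = self.frames[self.hand]
            if slot is None:                      # free frame: use it
                break
            pid, score = slot
            if score < self.threshold:            # low score: reuse this frame
                del self.where[pid]
                break
            self.frames[self.hand] = (pid, score - 1)   # retained: decay score
            self.hand = (self.hand + 1) % len(self.frames)
        self.frames[self.hand] = (page_id, 0)     # new pages start cold
        self.where[page_id] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
```

With this shape, a once-scanned page is evicted the first time the hand returns to it, while a page touched repeatedly accumulates enough score to survive several passes, which is the scan-resistance property the slide calls out and the reason the deck cites LRU-k/2Q as heavier-weight alternatives.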
16. Task scheduling model Basic idea: reduce server resource requirements by establishing a limited pool of ‘worker threads’ to execute requests
Reduces address space, virtual and real memory requirements
Introduces possibility of thread deadlock
Task implementation differs depending on the platform; fibers are used on Windows and Linux platforms
Scheduler assigns work across a pool of workers
Worker thread pool typically small, default is 20
Size of pool establishes the multiprogramming level
Workers are assigned dynamically to connections on demand
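The core of the model is that the multiprogramming level is fixed by the pool size, not by the number of connections. A minimal sketch, using OS threads where SQL Anywhere would use fibers on Windows and Linux; the pool size and queue discipline are illustrative:

```python
# Sketch of a bounded worker pool: a small, fixed set of workers drains a
# shared request queue, so concurrency is capped by the pool size (the
# slides cite a default of 20) rather than by the number of connections.

import queue
import threading

def run_requests(requests, pool_size=4):
    """Execute callables from `requests` on a bounded pool; return results in order."""
    work = queue.Queue()
    for i, req in enumerate(requests):
        work.put((i, req))
    results = [None] * len(requests)

    def worker():
        while True:
            try:
                i, req = work.get_nowait()
            except queue.Empty:
                return                     # no unassigned requests remain
            results[i] = req()             # each index is written by one worker

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The design keeps address-space and memory costs proportional to the pool, at the price the slide notes: if every worker blocks waiting on another queued request, thread deadlock becomes possible, which is why the pool size must at least match the depth of nested requests.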
17. Task scheduling model Unassigned requests wait for an available thread
Server uses cooperative multitasking to switch between requests
Server supports intra-query parallelism
For queries, the optimizer determines the degree of parallelism at optimization time
18. Task scheduling model Waiting requests are candidates for swapping
Try to maintain constant memory footprint by using a virtual memory technique
Heaps for user connections can be ‘swapped’ to disk if server load demands it
Originally implemented in the first SQL Anywhere release in 1989
Pointers within the heap are swizzled to permit reloading of heaps on a demand basis
Diagnostic connections can use a special, reserved thread to execute their requests
Enables diagnosis of a ‘hung’ engine
19. Dynamic memory management A SQL Anywhere server will grow and shrink the buffer pool as necessary to accommodate both
Database server load
Physical memory requirements of other applications
Enabled by default on all supported platforms
User can set lower, upper bounds, default initial size
OS provides API to determine memory size (requires virtual memory support)
Implementation varies slightly by platform
20. Dynamic memory management Basic idea: match buffer pool size to operating system’s working set size
Feedback control loop
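One control step of such a feedback loop might look like the sketch below: each tick, nudge the buffer pool size toward the observed working set, bounded by the user's lower/upper limits and by what the OS reports as free physical memory. The gain and bounds are invented values for illustration, not SQL Anywhere's.

```python
# Sketch of the feedback-control idea: match the buffer pool size to the
# working set while respecting user bounds and leaving memory for other
# applications. Gain and default bounds are illustrative stand-ins.

def next_cache_size(current, working_set, free_physical,
                    lower=64, upper=4096, gain=0.5):
    """Return the next buffer pool size (in pages) from one control step."""
    step = gain * (working_set - current)   # move partway toward the target
    step = min(step, free_physical)         # never grow past free physical memory
    return int(max(lower, min(upper, current + step)))
```

Using a fractional gain rather than jumping straight to the target damps oscillation when the working set and other applications' demands fluctuate between ticks.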
21. Memory allocation to server threads Query optimizer assumes the availability of an amount of available memory for each computation
Some execution operators (notably hash-based operators) have the ability to free memory without changing the computation (just making it less efficient)
Release of memory is done partition-at-a-time so effect is not abrupt
Other operators can be constructed with alternative plans that offer a cheaper execution technique if either
the optimizer’s choices are found to be suboptimal at run-time, or
the operator requires a low-memory strategy at the behest of the memory governor
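Partition-at-a-time release can be sketched as below: a hash-style operator keeps its build side partitioned, and a memory-governor callback spills one partition per request, so memory drains gradually and the computation itself is unchanged (only slower, since spilled partitions must be re-read). The class and its methods are hypothetical illustrations, not SQL Anywhere APIs.

```python
# Sketch of partition-at-a-time memory release by a hash-based operator
# under pressure from a memory governor. Spilling is simulated by a flag;
# a real server would write the partition's pages to the temporary file.

class PartitionedBuildSide:
    def __init__(self, rows, npartitions=4):
        self.partitions = [[] for _ in range(npartitions)]
        self.spilled = [False] * npartitions
        for row in rows:
            self.partitions[hash(row) % npartitions].append(row)

    def memory_pages(self):
        """Rows currently held in memory (spilled partitions excluded)."""
        return sum(len(p) for p, s in zip(self.partitions, self.spilled) if not s)

    def release_some_memory(self):
        """Governor callback: spill the largest in-memory partition; return rows freed."""
        candidates = [i for i, s in enumerate(self.spilled) if not s]
        if not candidates:
            return 0                        # nothing left to give back
        victim = max(candidates, key=lambda i: len(self.partitions[i]))
        freed = len(self.partitions[victim])
        self.spilled[victim] = True
        return freed
```

Repeated governor calls release memory in several modest steps rather than one abrupt drop, which is exactly the smoothing behavior the slide describes.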
22. Other self-management features On startup, buffer pool is automatically loaded with those database pages that were first requested on the last startup
SQL Anywhere supports ‘events’
fired on a schedule, or when specific server events occur, such as
Idle time greater than a threshold
Backup
Connect/disconnect
File size
Disk space
23. Self-tuning query optimization and adaptive query execution strategies
24. Automatic statistics management A feature of SQL Anywhere since 1992
Early implementations used a hash-based structure to manage column density and frequent-value statistics
See Lynch, 14th VLDB (1988) for a similar approach
Today
Self-tuning column histograms
On both base and temporary tables
Statistics are updated on-the-fly automatically
Join histograms built for intermediate result analysis
Server maintains index statistics in real-time
Index sampling during optimization
25. Column histograms Self-tuning implementation
Incorporates both standard [range] buckets and frequent-value statistics
Updated in real-time with the results of predicate evaluation and update DML statements
By default, statistics are computed during the execution of every DML request
Novel technique used to capture statistics on strings for use in optimizing LIKE predicates
Histograms computed automatically on LOAD TABLE or CREATE INDEX statements
Can be created/dropped explicitly if necessary
But retained by default across unload/reload
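The self-tuning idea can be sketched with a simple range histogram whose bucket counts are corrected by the actual row counts observed while predicates execute. The bucket layout, blending factor, and uniform spreading of feedback across buckets are all illustrative choices; SQL Anywhere's histograms also carry frequent-value statistics, which this sketch omits.

```python
# Sketch of a self-tuning range histogram: estimates come from bucket
# counts, and observed predicate cardinalities are folded back into the
# buckets, so estimates improve as the workload runs.

class SelfTuningHistogram:
    def __init__(self, lo, hi, nbuckets=8):
        self.lo, self.width = lo, (hi - lo) / nbuckets
        self.counts = [0.0] * nbuckets          # estimated rows per bucket

    def _bucket(self, value):
        return min(int((value - self.lo) / self.width), len(self.counts) - 1)

    def estimate(self, lo, hi):
        """Estimated row count for the range predicate lo <= x <= hi."""
        return sum(self.counts[self._bucket(lo):self._bucket(hi) + 1])

    def feedback(self, lo, hi, actual, blend=0.5):
        """Fold the observed row count for [lo, hi] back into the buckets."""
        first, last = self._bucket(lo), self._bucket(hi)
        share = actual / (last - first + 1)     # spread uniformly (illustrative)
        for i in range(first, last + 1):
            self.counts[i] = (1 - blend) * self.counts[i] + blend * share
```

Because feedback arrives as a by-product of executing DML, no separate statistics-gathering pass is needed, which matches the slide's "updated in real-time with the results of predicate evaluation" behavior.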
26. Query optimizer SQL Anywhere optimizes requests each time they are executed
Optimizer takes into account server context:
Working set
Available cache
Values of procedure, host variables
Assumption: optimization is cheap
SQL Anywhere uses a proprietary, cost-based join enumeration algorithm that primarily constructs left-deep trees
Optimization process includes both heuristic and cost-based complex rewritings
Advantages: plans are responsive to server environment, buffer pool contents/size, data skew; no need to administer ‘packages’
27. Join enumeration Optimization space is over left-deep trees
Exceptions for complex, right-deep nested LEFT OUTER JOINS
Space pruning based on a ‘governor’ that allocates ‘quota’ to different portions of the search space
Join method, degree of parallelism, physical scan method (index selection), and expensive predicate evaluation are also part of search space
Each choice is re-evaluated at each step
Superb optimization times even for complex queries
No hard limits – tested with
500 quantifiers in a single block
100-way join on a CE device with 3MB buffer pool
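The quota idea can be illustrated with a toy left-deep enumerator: a shared budget bounds how many partial plans the search may explore, and dominated prefixes are pruned against the best complete plan found so far. This is a generic sketch of quota-limited enumeration, not SQL Anywhere's proprietary algorithm; the cost function is supplied by the caller.

```python
# Sketch of quota-limited, left-deep join enumeration: a 'governor' budget
# caps the number of partial plans explored, keeping optimization time low
# even for many-way joins, at the risk of missing the true optimum when
# the quota is exhausted.

def best_left_deep_order(tables, cost, quota=1000):
    """Search left-deep join orders; cost(prefix, t) prices joining t next."""
    best = [None, float("inf")]           # [best order, best total cost]
    budget = [quota]                      # shared exploration quota

    def extend(prefix, remaining, so_far):
        if budget[0] <= 0 or so_far >= best[1]:
            return                        # out of quota, or dominated: prune
        budget[0] -= 1
        if not remaining:
            best[0], best[1] = prefix, so_far
            return
        for t in sorted(remaining):
            extend(prefix + [t], remaining - {t}, so_far + cost(prefix, t))

    extend([], set(tables), 0)
    return best[0], best[1]
```

In a real optimizer the quota would be apportioned across promising regions of the search space rather than drained globally, and each extension step would also choose join method, scan method, and degree of parallelism, as the slide lists.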
28. Bypassing the query optimizer Simple, single-table queries are optimized heuristically
Access plans for queries in stored procedures/triggers/events are cached
Plans undergo a ‘training period’ where plan variance is determined
If no variance (even without variable values), plan is cached and reused
Query is periodically re-optimized on a logarithmic scale to ensure the plan does not become sub-optimal
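The training-then-cache policy might be sketched as follows; the training-period length, the doubling re-optimization schedule, and the plan representation are illustrative assumptions, with the optimizer itself reduced to a callable.

```python
# Sketch of plan caching with a training period and logarithmically spaced
# re-optimization: a plan is cached only after several optimizations agree,
# and the cached plan is re-checked at exponentially growing intervals.

class CachedPlanEntry:
    TRAINING = 4                          # illustrative training-period length

    def __init__(self, optimize):
        self.optimize = optimize          # callable: () -> plan
        self.seen = []                    # plans observed during training
        self.cached = None
        self.executions = 0
        self.next_reopt = 8               # re-optimize at 8, 16, 32, ... executions

    def get_plan(self):
        self.executions += 1
        if self.cached is None:           # still in the training period
            plan = self.optimize()
            self.seen.append(plan)
            if len(self.seen) >= self.TRAINING and len(set(self.seen)) == 1:
                self.cached = plan        # no plan variance observed: cache it
            return plan
        if self.executions >= self.next_reopt:
            self.next_reopt *= 2          # logarithmic re-check schedule
            self.cached = self.optimize()
        return self.cached
```

The doubling interval means the amortized optimization cost per execution shrinks over time, while data drift is still caught eventually, which is the tradeoff the slide's "logarithmic scale" phrasing points at.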
29. Adaptive query processing In some cases the optimizer will generate alternative access plans that can be executed if actual intermediate result sizes are poorly estimated
Server switches to alternative plan automatically at run time
Memory-intensive operators, such as hash join, have low-memory strategies that are used when buffer pool utilization is high
30. Adaptive query processing Some operations, such as database backup, contain sampling processes to determine the characteristics of the I/O device used for storage
Primary goal is to determine the number of disk heads available
Processes can utilize the ‘right’ number of CPUs to maximize throughput
Algorithms are sensitive to CPU requirements of the rest of the system, and automatically scale down CPU usage as necessary
31. Intra-query parallelism SQL Anywhere’s approach is to parallelize a left-deep plan when doing so is advantageous
There is an upper bound on the number of EXCHANGE operators in a data-flow tree; the user can override this bound
Work is partitioned independently of the availability of worker threads at run-time
Plans are largely self-tuning with respect to degree of parallelism
Prevents starvation of any specific subplan when the number of available workers is less than optimal for some period
32. Management tools
33. Management tools Framework
DBISQL – interactive SQL query tool
Sybase Central – administration/development tool
Tools
Graphical query plans
Stored procedure debugger
Stored procedure profiler
Index consultant
Request-level logging
Optimization logging
34. Graphical query plans
35. Stored procedure debugger Enables debugging of a PSM or Java stored procedure
Features:
set/unset breakpoints and watch settings
determine value of variables
traverse the call stack
execute queries with the values of variables at the time of the break
36. Application Profiling scenarios
37. Application Profiling architecture
38. Stored procedure profiler Offers the ability to analyze the performance of a stored procedure/trigger/event
Computes the cumulative invocations and elapsed time for each statement in a procedure
39. Index consultant Recommends indexes to improve query performance
Main idea is to improve application performance
Particularly useful when DBA has limited experience
Permits “what-if” analysis on an existing query load and database instance
Allows DBA to find statements that are the most sensitive to the presence of an index
Can be used to find indexes that are unnecessary, i.e. those that are not utilized by the optimizer
Can be used to estimate disk storage overhead of additional indexes
40. Future plans Increasingly difficult to keep optimization costs low in the face of increasingly more challenging workloads
Tradeoff time for space (cached access plans)
Advantages of plan caching are workload-specific; caching expense pays off only if optimization costs can be amortized over several executions
Improve statistics, cost model management to permit self-repair, which will improve access plan quality
Numerous other opportunities for self-tuning using feedback control mechanisms
Memory allocation, multiprogramming level, intra-query parallelism controls, additional adaptive query optimization techniques
Integrate server internals management with VM hypervisors to improve operation within guest VMs