E110 Best Practices for Implementing with Adaptive Server IQ

E110Best Practices for Implementing with Adaptive Server IQ • Mark Mumy • Principal Systems Consultant • Business Intelligence Division • mark.mumy@sybase.com

About This Presentation • Hardware Configuration • Creating the Database • IQ-M Server Settings • Connectivity • Data Model Recommendations • Database Programming • Data Manipulation • Maintenance Tasks • Memory Usage

Target Audience • Database Administrators • All Levels • Query Developers • All Levels

For This Presentation • Let’s Keep It Interactive • I will entertain questions on a subject • Would be happy to speak to you offline, if desired • No Question is too Basic • Chances are others may have the same question • Open Question Forum at the end • Time Permitting

Hardware Configuration –Drive Arrays • Volume Manager • Not a necessary or recommended component for IQ-M • Additional overhead and software that has no value add to an IQ-M installation • Acceptable for an IQ-M single node operation where hardware does not have the ability to apply RAID to the disk devices • RAID Level • Recommend using RAID 5 as a blend between performance, protection, and cost • Recommend using raw devices (a must for multi-node IQ-M) • For details on drive array specifications and configurations see the Sun Reference Architecture Whitepapers and EMC Whitepapers on array configurations with IQ-M

Hardware Configuration –Memory • How Much RAM for IQ-M • As much as possible! • Most systems should have at least 2 GB • Don’t forget additional RAM for each load being performed

Hardware Configuration –Processors and Disk Controllers • Optimal CPU configuration • Queries: 1 CPU per active query – more if queries are complex and can be run in parallel • Loads: 1 CPU per HG index plus 1 CPU per 2-5 columns being loaded • Disk Controllers • 1 fiber controller per 5 write CPU’s • 1 fiber controller per 10 read CPU’s • In mixed mode operation use the 5:1 ratio • Recommend a minimum of 2 controllers

Creating The Database –Creation Options • Creation Options • Block Size and Page Size • Minimum should be 128K Page Size (8K Page) • Use 256K Page for larger databases • Larger the page size the more RAM that will be required • CASE IGNORE vs. CASE RESPECT • Use CASE RESPECT whenever possible • Removes case comparison steps and improves performance • Java in the Database • Install if Java will be needed in the database • No impact if installed and not used

Creating The Database –File Placement • Use relative links – makes it easier to relocate files • Use symbolic links for IQ MAIN and IQ TEMP • /IQM_devs/IQ_MAIN_00 • /IQM_devs/IQ_TEMP_00 • Place the transaction log and transaction log mirror on significantly large file system • Recommend 5 GB of filesystem space per IQ-M node for database related storage • Transaction logs, IQ MSG, scripts, catalog file, etc

Creating The Database –Files Locations • Filename rules • Keep the names unique for each IQ-M node • Catalog: DATABASE_NAME.db • Transaction log: DATABASE_NAME.tlog • IQ MSG: DATABASE_NAME.iqmsg • IQ Main Store: DATABASE_NAME_iqmain_000 • 000 would be an incrementing number • IQ Temp Store: DATABASE_NAME_iqtmp_000 • 000 would be an incrementing number • IQ Configuration File: DATABASE_NAME.cfg

IQ-M Server Settings –Query Performance • FORCE_NO_SCROLL_CURSORS • Should always be set to ON • Very few applications require this to be OFF • Can improve query performance

IQ-M Server Settings –Query Plans • Query Plan Settings to Provide Optimal Query Information to DBA’s and Sybase Engineering • set temporary option query_plan='on'; • set temporary option query_plan_after_run='on'; • set temporary option query_timing='on'; • set temporary option query_detail='on'; • set temporary option dml_options10='on'; • set temporary option query_plan_as_html='on'; • set option QUERY_NAME = ‘Query Name‘ • Should not be set globally as the IQ MSG file will grow rapidly

IQ-M Server Settings –Storage • Append_Load • Can be used to improve load performance • Will not reuse Row ID’s or the space occupied by those Row ID’s • Great for systems where large, contiguous chunks of data are deleted • May not good if random rows are deleted as it can lead to fragmentation and allocated, but unused space

IQ-M Server Settings –Storage Continued • Disk_Striping • If ON, IQ-M will stripe writes to all available devices • If OFF, the first device must be full before the next is used • Disk_Striping_Packed • If ON, it forces better space usage and less fragmentation • Fragmentation is indicated when out of space messages are returned but the main dbspace used is less than 100% full • Allows Adaptive Server IQ to better utilize small pieces of unused space that remain after compression

IQ-M Server Settings –Data Loads • LOAD_MEMORY_MB • Set to 0 (default) on systems with enough RAM • Set to something <= 500 on systems where RAM is tight or there are simultaneous loads taking place

Client Access and Network Connectivity • Different internal environments and settings for ODBC, JDBC, and Open Client connections • JDBC and ODBC are recommended • JConnect is Sybase’s implementation of JDBC • Use the JDBC 2 driver (JConnect 5.2/5.5) com.sybase.jdbc2.jdbc.SybDriver • Open Client • Use with caution • Most applications written using Open Client expect an ASE server

Client Access and Network Connectivity –Open Client vs. ODBC • See Chapter 32 of the ASA Users Manual as well as the ASA Reference Section “Transact-SQL and SQL/92 compatibility options”for complete list of differences • If writing stored procedures or embedded application code, make sure to explicitly make settings for compatibility as these options will get set to different values for Open Client vs. ODBC connections • ALLOW_NULLS_BY_DEFAULT • QUOTED_IDENTIFIER • STRING_RTRUNCATION • ANSI_BLANKS • ANSINULL • CHAINDED

Client Access and Network Connectivity • AutoPreCommit Within ODBC • Set registry setting AutoPreCommit to Y • Forces applications to issue a COMMIT before each query • Go to the registry and update the corresponding Sybase Data Source Name (DSN) created, by adding a new value 'AutoPreCommit’ with a value of ‘Y' HKEY_LOCAL_MACHINE/SOFTWARE/ODBC/ODBC.INI/{DSN} • Packet Sizes • Larger packet sizes will help with large data retrieval • Use –p option in IQ-M configuration file to increase size • Use CommBufferSize parameter in ODBC connection string

Client Access and Network Connectivity • Network Speed • Data retrieval will be as fast as the network • 100 MB of data will take 80 seconds on a 10 Mbit LAN • 100 MB of data will take 8 seconds on a 100 Mbit LAN • 100 MB of data will take 0.8 seconds on a Gigabit LAN • LAN speed may be the performance bottleneck for queries that return large amounts of data • The faster the network cards and LAN the better off concurrency will be: more available bandwidth per user • Network Interface Cards • Adding multiple network cards to the IQ-M node will help with network

Client Access and Network Connectivity • ASE Component Integration Services (CIS) • Can have ASE CIS reference IQ-M tables (proxy tables) • Use ASE 12.5 CIS class of ASIQ • Prior to to ASE 12.5 ASAnywhere or ASEnterprise classes had to be used • Not a viable option if joining non-IQ-M tables with an IQ-M proxy table • Can map multiple ASE logins to a single IQ-M login • Data modifications should not be performed on proxy tables • Can map ASE proxy tables to stored procedure calls in IQ-M

Client Access and Network Connectivity • IQ-M Component Integration Services (CIS) • Can only be used on the Solaris and WinNT versions of IQ-M • Can map to tables in Oracle (ODBC), ASE (ODBC & JDBC), ASA (ODBC & JDBC), DB2 (ODBC), MS SQL Server (ODBC), and any ODBC source • Not a viable option if joining IQ-M tables with non-IQ-M proxy tables • Data modifications may be performed on proxy tables

Data Model Recommendations –IQ UNIQUE • Use whenever possible to help save storage space and possibly improve query performance • Not necessary for cardinality greater than 64K (65536) • Not necessary for datatypes of tinyint or char(1) • Does not need to be exact, but should be close to cardinality • If the value is <= 255 then IQ-M will place a 1-byte FP index on the column – 1 byte of storage per row • If the value is > 255 but <= 65536 then IQ-M will place a 2-byte FP index on the column – 2 bytes of storage per row • May slightly hinder data loads, but improve query speeds

Data Model Recommendations –NULL Values • Always specify NULL or NOT NULL • Open Client and ODBC connections have different defaults • Allows the optimizer a better guess at join criteria • NULL data does not save space on the database page as it would in ASE • Will be compressed out when stored on disk

Data Model Recommendations –Unsigned Datatypes • Use unsigned datatypes where possible • Use for surrogate keys and join columns • Unsigned data comparisons are quicker • The caveat to this is that Open Client may misinterpret the value if it is too large as it does not understand large unsigned data • Can convert to signed integer, numeric, or decimal if returning data to an Open Client application • This caveat applies to moving data between IQ-M servers with INSERT FROM LOCATION

Data Model Recommendations –Long Varchar and Long Varbinary • Can be used to store large amounts of text or binary data • VARCHAR() or VARBINARY() datatypes • Maximum width is 32K (32768) bytes • The WORD index is the only index allowed on data 256 bytes or wider • Storage will be allocated in 256 byte chunks • A 257 byte string will require 512 bytes of storage • Much less than the 2K requirement in ASE TEXT and IMAGE columns

Data Model Recommendations –Varchar vs. Char • Use CHAR() whenever possible • All data in IQ-M is stored fixed width • VARCHAR() types only add overhead • A VARCHAR(100) columns will require 101 bytes of storage • 100 bytes for data • 1 byte for the size of data • CHAR() data is blank padded, VARCHAR() is not

Data Model Recommendations –Proper Datatype Sizing • Use the smallest datatypes possible for data • Be aware of all datatypes in IQ-M – there may be more than you know • If hour, minute and second information is not necessary, use DATE instead of DATETIME • If the data will fit within a TINYINT or SMALLINT datatype use that rather than INTEGER or BIGINT • Allows the engine to store data in smaller units (1-byte TINYINT or 2-byte SMALLINT versus 4-byte INTEGER or 8-byte BIGINT • Don’t over allocate storage when defining NUMERIC() or DECIMAL() as it can be costly for data that doesn’t need all that space

Data Model Recommendations –When and Where to Use Indexes • Always use indexes on: • Join columns • Searchable columns • Aggregation columns • If uncertain, place an LF or HG index on the column • A column with an HNG, CMP, or WD index should have a corresponding LF or HG index (very rare circumstances will negate this) • Indexes are not needed on columns whose data is ONLY returned to the client (projected) • Use Primary Key or UNIQUE HG indexes where appropriate

Data Model Recommendations –Multi-column Indexes • Currently, only UNIQUE HG and PRIMARY KEY indexes support multiple columns • HG inserts are the most expensive in IQ-M • Try to guarantee that inserts will happen at the end of the index • Place generally incrementing data at the beginning of the index list • For instance, a transaction date or batch number • Something that will try to guarantee a sequential key

Data Model Recommendations –Join Column • Prefer joining integer datatypes (unsigned if possible) • Integer comparisons are quicker than character comparisons • Keep the datatypes as narrow as possible to improve join performance

Data Model Recommendations –Primary Keys • Multi-column primary keys should have an additional LF or HG index placed on each individual column • Must be done manually • UNIQUE CONSTRAINT, UNIQUE HG, and Primary Key are identical structures: HG index with no G-Array • Use when possible • Helps optimizer make more informed query path decisions even if index is not used in joins or searches • You get an HG Index created automatically • This HG index has no G-Array (uses less space)

Data Model Recommendations –Temporary Tables • 3 Types of Temporary Tables • “#” tables • create table #temp table ( col1 int ) • Local Temporary Tables • declare local temporary table temp table ( col1 int ) • Behave just like “#” tables • Global Temporary Tables • create global temporary table temp table ( col1 int ) • Tables structure is static across connections and reboots

Data Model Recommendations –Temporary Tables • On Commit Preserve Rows • Use this option so that rows in temporary tables remain after the transaction has been committed • Temporary tables are available at the current level (parent) and all of its children • A parent cannot see a child's temporary table(s)

Data Model Recommendations – Cursors • Avoid using cursors • Generally means row based processing • IQ-M was designed for set based processing • Sometimes they cannot be avoided • Open With Hold • Allows the cursor to remain open across transactions • If not used, the cursor is closed when a commit is issued

Database Programming Language –Watcom SQL vs. T-SQL • IQ-M (ASA) is not 100% T-SQL Compatible, but very close • Recommend using Watcom SQL • All system procedures written with it • Many more code examples and more IQ-M people versed in it • Watcom SQL has some extensions that T-SQL does not: • Dynamic SQL • Better Loop control • Full cursor movement rather than just read next • Batches and procedures must be written in the same dialect • Cannot mix T-SQL with Watcom SQL

DECLARE CURSOR GOTO IF PRINT RAISERROR SET WHILE (T-SQL) vs. LOOP Global variables Variable Names CALL FOR ASA requires variables to be declared immediately after a BEGIN Database Programming Language –Watcom SQL vs. T-SQL • Behavior differences include:

Database Programming Language –Commit and Rollback • Use transaction control around logical units of work, even read only queries • Should commit before a read/write batch is started to ensure latest version of data is available • Should issue commit and rollback after batch completion to release all query resources • Rollback will free memory resources in use by previous operations

Database Programming Language –Custom Functions • Custom functions can be written in either SQL or Java • Great way to encapsulate business logic for transforming data • Can have a significant performance impact on queries • Functions are executed in the catalog portion of the engine • All result rows may need to be moved to ASA • Can be time consuming for large result sets • Turn on query plans to see what impact the functions have on effective query plans

Database Programming Language –Outer Joins • T-SQL Outer Join: *= or =* • ANSI Standard/Watcom SQL Outer Join: [left|right|full] outer join • Be careful of outer join syntax • The T-SQL syntax can be very ambiguous or non-deterministic for IQ-M to translate • All T-SQL outer joins must be converted to ANSI outer joins and then processed • Use the ANSI standard instead as they are not ambiguous and are always clear in their meaning • Visit http://my.sybase.com/detail?id=1017447 for more details

Data Manipulation –Load Table • Parallel Load Table • Make sure to put column delimiter after last column • Must use ROW DELIMITED BY and DELIMITED BY options in load table command • Column and row delimiters must be a single character • 1 to 4 byte delimiters allowed for serial loads

Data Manipulation –Load Table • If possible, perform load table from binary datafiles • Can be 3 to 10 times faster than ASCII loads • Can use FILLER() clause with a delimiter or byte count • Better performance achieved by casting the date or datetime formats rather than letting IQ-M guess • If possible, issue a single load table with multiple files rather than 1 load table per file to be loaded into a table

Data Manipulation –Insert From Location • Great way to move data from any Open Client source • Syntax: insert into TABLE() location ‘SERVERNAME.DBNAME’ { select statement }; • IQ-M username and login must match on remote system • Interfaces entry must match the SERVERNAME • Can also be used to move data quickly from an ASA table to an IQ-M in the same server

Data Manipulation –Single Row Operations • Avoid at all costs for large data manipulation operations • Different from single statement operations that modify many rows • Individual INSERT … VALUES() will be slower than bulk load operations • Expect no more than 5,000 to 20,000 operations per hour

Data Manipulation –Named Pipes and Flat Files • Named pipes can be faster – no disk I/O • How to make a named pipe: mknod PIPE_NAME p • Can use BCP, GZIP, UNCOMPRESS, or applications to push IQ-M formatted data into named pipe • LOAD TABLE command can read from named pipe • Can also fast extract data to named pipes so that they can be read by another application or even compressed and stored

Data Manipulation –Partial Width Inserts • Can induce fragmentation and overly large space consumption if not watched • Row ID’s from deleted data are not reused during a partial width insert operation • Space from delete data is not reused (because Row ID’s are not reused) • Partial width inserts are analogous to APPEND_LOAD=‘on’ in terms of Row ID and space behavior • Only becomes a problem if partial width inserts are a way of life for a table

Data Loading –The Deep Fact Table • Many databases have a “deep” table • A growing table with tens of millions of rows • Rows typically ‘rolled out’ and replaced over time • HG indexes are very slow to load/delete • Loading time can steadily increases as rows are added • Solution (with a caveat) • Partition the table (Example – time: day, week, month) • Build a view that is a Union of all the tables

Partitioned t1 t2 t3 Data Loading –Partitioned Fact Table Big Fact Table Accessed by View Create View bigtable as Select * from t1 Union All Select * from t2 Union All Select * from t3

Data Loading – Why Partitioning? • Loads are faster and predictable • x million rows will load consistently • IQ may process the Union All in parallel • As long as cpu resources are available • To roll data out, truncate one table • Truncate table is much faster than Delete • No changes to DDL required • Load new data into the empty (truncated) table

Data Extraction • For fastest data unloads use the TEMP_EXTRACT options • Can unload data in ASCII or BINARY format • Recommend BINARY format as it is faster to reload into IQ-M • Great way to archive portions of the database • Can unload to one or more files, serially • Avoid using ISQL or DBISQL redirection to a file • Much slower than fast unloads

Maintenance Tasks –Database Consistency Checks • The routine to check the database for potential corruptions is SP_IQCHECKDB • There are no run-time options for this command, however the DBCC options control the behavior of the SP_IQCHECKDB command • Recommendations • Run Level 1 DBCC’s every 1 to 4 weeks (will process 1-2 GB per second) • Run Level 2 DBCC’s every 1 to 3 months (will process 50-100 MB per second)

E110 Best Practices for Implementing with Adaptive Server IQ