Memory – and Changing Data

8 Memory – and Changing Data

Introduction • Memory Sizing and consideration is a vital area for IQ performance • Get it wrong and your server(s) will run like a three legged dog! • Get it correct and you will achieve all that IQ is capable of • Tied to memory is how IQ loads and alters data – hence the reason that this is in this section

Catalog Page Size • Back in an earlier section we talked about the ASA page size, and the effect this has on the number of columns in an IQ table • To start a server that allows for a large ASA page size you have the –gp option set in the params.cfg file • So for a 32K ASA (catalog) page size: -gp 32768 needs to be in the start-up line • You can then create a database with a 32K Catalog Page Size

Memory • How Much RAM for IQ • As much as possible! • Most systems should have at least 2 GB • Rough rule of thumb is 2 GB per CPU • Don’t forget additional RAM for each load being performed • Shared memory is only needed on 32-bit systems and some 64-bit systems to wire memory into RAM • HP-UX 64-bit should not use –iqsmem

Processors and Disk Controllers • Optimal CPU configuration • Queries: 1 CPU per active query – more if queries are complex and can be run in parallel • Loads: 1 CPU per HG index plus 1 CPU per 2-5 columns being loaded • Disk Controllers • 1 fiber controller per 5 write CPU’s • 1 fiber controller per 10 read CPU’s • In mixed mode operation use the 5:1 ratio • Recommend a minimum of 2 controllers

Memory - 1 • This is the memory picture for an IQ machine: • Allow between 60 and 100 MB for the O/S and ASIQ Server • For a 64-bit system consider up to 10% of the available bRAM for O/S and ASIQ Executable • The bottom three boxes are controlled by the options on the following pages • Do not exceed the virtual (or real) memory limit for the system Operating System ASIQ Server ASIQ Overhead ASIQ Main Cache ASIQ Temp cache

Memory – 2 • Main_Cache_Memory_MB Def: 8 Mb • Used to set the size of the main cache • This must be set correctly • Temp_Cache_Memory_MB Def: 4 MB • Used to set the size of the temp. cache • Load_Memory_MB Def: 0 • Used to limit the “per user” memory used for loads (not inserts) – default is much too big for most systems • All these require a server re-boot as memory is not dynamically re-addressable in IQ-M

Memory – 3 • Virtual Memory limits for IQ operating systems: • SOLARIS (2.6 & 2.7) 3 GB • SOLARIS 2.7 2.8 64 bit 32 GB • Tru 64 27 GB (64 bit !) • AIX (4.2.1, 4.3.1, 4.3.2) 32bit 2.75 GB • AIX 64bit 128 GB • HPUX (11.0) 32bit 2.7 GB • HPUX (11.0) 64bit 64 GB • NT 2 GB (Cache limits) • Windows 2000 (NT5) 4 GB • SGI IRIX 64 128 G

Cache Memory • Ever since IQ 10 (or before) hit the streets we have said • You probably need more Main Cache than Temp. • WRONG ! • Start at 50/50 then move to more Temp. Cache • There are only a very few limited areas where you need more Main than Temp.

Adding Data to IQ - 1 • The fastest way to get data into IQ is to Load from a flat file (10 of thousands of rows per second) • The second fastest is to use the INSERT from LOCATION syntax to load from an existing server (thousands to hundreds of rows per second) • The (almost) slowest method is Replication Server (hundreds of rows per HOUR) • Each has it’s strengths…

Adding Data to IQ - 2 • If a table is very volatile (CDR, CC Trans.) then you should use the LOAD command from flat files to load • If the table changes during the day – but is not highly volatile (products, customers), then use INSERT … LOCATION, this is fast and does not have the complexity of the LOAD • If the table changes very infrequently – but there are a lot of them (decode, dimensions) then use Rep. Server • There is no right answer as to what to use for a load…

Loading • Back a few slides ago we mentioned Load_Memory_MB • I think that it is about time to shed some light on the IQ-M Load process • Even although there is only 1 parameter to affect the load/insert (and delete!) processes, we need to understand it

Loading – The Big Picture Main Cache ASCII Flat File Load Memory Building Simple Index Pages Single Thread Per HNG, LF and FP Single Thread Pass 2 Multi Thread Temp. Cache 1 Sorting Groups Building B-Trees Single Thread Per HG (maybe 2) Row Delimited! Or it is Single Thread Raw IO Buffers Intermediate Buffers 2 Pass 1 1 3 IO Operations to Temp Store indicate Not Enough Temp. Cache Write of Completed HG B-tree G-Arrays and Bitmaps to Main Cache 2 Write of completed HNG, LF and FP Pages to Main IQ Store 3 Main IQ Store Write of completed HG B-trees, G-Arrays and Bitmaps to Main IQ Store Temp. IQ Store

LOAD_MEMORY_MB • If this is set to zero the system can, if required use an UNLIMITED amount of heap memory. • The 500 MB limit for Load_Memory_MB is just a limit for this option. If you have set the option to 0 the system can use whatever it wants… • So if you have limited memory, and fairly wide tables, it is probably a really good idea to set Load_Memory_MB to a number (maybe even 500) to prevent a possible Operating System Out_of_Memory error.

Load Clause • Row Delimited By • Always,Always use this sub-clause • During the data read from disk, before the index creator threads are started the conversion process can be either parallel or serial • Without Row Delimited By the process will always be serial • This will triple the Load Pass1 times

Append Load • By default the system will always attempt to fill “deleted slots” in the table structure • This can dramatically increase the load times as the system has to “search” for the “holes” in the columns (rowids with row_existence off) • By setting the following option the server will “append” to the rowids – and ignore spaces in the table Append_Load Def OFF

Low Fast Load - 1 • LF_Bitmap_Cache_kb Def 4 • By default, during a load, each distinct value for each LF index in a load has a heap that is 4 Kbytes in size • If the distinct count for all LF indexes that are being loaded is greater than 20,000 (the manual says 10,000 – this is too low) then reduce this option as the system will start to use too much memory • Memory usage can be checked using the IQ Monitor • Memory = (LF_Bitmap_Cache_kb * 1024) * column_distinct_count

Low Fast Load - 2 • Remember this is in HEAP memory, not in the caches • So the heap can grow very large • For a Writer Node, this is not too much of a problem, but Simplex units may suffer from this

12.6 Changes – DML 1 • Implicit Conversion for Binary to Bit Data • Non-zero binary values are converted to 1, zero to 0 • ISDATE() and ISNUMERIC() functions • But they are ASA functions – so slow again… • Load Table • Column Delimiters May be Multi-byte (up to 255 bytes) • Load Table can be used for Temporary Tables andGlobal Temp Tables

12.6 Changes – DML 2 • IQ Message File Logging • Load and Delete details are no longer sent to theIQ Message File • Set DDL_Option2 if you want this data • 0 – no messages • 1 – insert only • 2 – delete only • 3 – insert and delete messages

Performance • Delete Performance Improved • At least 25% faster • Under certain circumstances up to 200% faster • Sybase IQ Replication Performance • From 3000 rows per hour (around 1 row per second) to • 300,000 rows per hour (90 rows per second)

Prefetch - 1 • Prefetch_Sort_PercentDef. 20 • Prefetch_Garray_Percent Def. 60 • Garray Percent is the percentage of buffers (allocated to a client) that are used for read-ahead during HG searches (and loads) • Sort Percent is the percentage of user buffers that can be used for sort buckets (over the whole system) This is a temp cache variable • These are the only prefetch options that will/may speed up loading, as the HG is the only index that has a large READ component when loading, and also an HG has a large sort component in Pass 2

Wash Day (?) • Remember the LRU MRU ? • The wash area and sweeper thread numbers can be controlled by • Sweeper_Threads_PercentDef 10 • Wash_Area_Buffers_PercentDef 20 • The threads is the percentage of currently used threads on the server (so this is variable) • The buffer percentage is the percentage of the WHOLE cache that is inside the wash area • Use IQ Monitor to see when these need to changed

Multiplex - Writer Configuration • Everything we have talked about in the course to date, regarding configuration, has been about balancing the servers for read and writer performance • With IQ-M that need has been removed, we can tune each instance of IQ-M for a specific task • The next few slides discuss what we would want to do to generate a write only server

Multiplex - Writer Memory • Three things are important here • The size of the Load Memory • The size of the Main Cache • The size of the Temp Cache • Basically we are only concerned with having enough Main Cache to hold the FP, LF and HNG active pages • We need s much Temp Cache as we can for the HG load • The Load Memory should be pegged as high as we can make it

Loading small as we need Set to Max 500MB Main Cache ASCII Flat File Load Memory Building Simple Index Pages Single Thread Per HNG, LF and FP Single Thread Pass 2 Multi Thread Temp. Cache 1 Sorting Groups Building B-Trees Single Thread Per HG (maybe 2) large as possible Row Delimited! Or it is Single Thread Raw IO Buffers Intermediate Buffers 2 Pass 1 1 3 IO Operations to Temp Store indicate Not Enough Temp. Cache Write of Completed HG B-tree G-Arrays and Bitmaps to Main Cache 2 Write of completed HNG, LF and FP Pages to Main IQ Store 3 Main IQ Store Write of completed HG B-trees, G-Arrays and Bitmaps to Main IQ Store Temp. IQ Store

Multiplex - Writer Main Cache - 1 • The main cache for the writer should be as small as we can get away with • We need space for the following pages/buffers • FP Block Maps, Identity Pages, last pages • LF Block Maps, Identity Pages, Bitmap Headers and last page of each bit map • HNG Block Maps, Identity Pages, Bitmap Headers and last page of each bit map

Multiplex - Writer Main Cache - 2 • Notice there is no space allocated for the HG indexes • The majority of the space requirements for the HG indexes is in Temp Cache • However we will need to read in the B-trees and the G-Arrays for the HG index so don’t be slavish as to the smallness of the Main Cache

Multiplex - Load and User Memory • Do not constrain this – unless you have a small amount of memory let IQ have as much Load Memory as it desires • User Overhead will also be minimal, generally you will be only running one writer task at a time • If you are running multiple table loads simultaneously, then remember each load could take over 500MB of Load Memory

Multiplex - Writer Temp Cache • Whatever is left… • And then check with IQ Monitor to confirm minimal disk activity into the Temp Store

Multiplex - Reader Configuration • There are three areas to consider here • What is the hardware configuration of the writer nodes • Are they the same or different • What is each reader going to run • How are you partitioning the readers • By User ID • By Query Type • Do you have Network Spoofing or Open Switch above the reader nodes?

Load Errors - 1 • When an error is detected in the incoming data stream during a load the default operation of the engine is to roll-back the entire load process. • This is the ANSI standard. • In 12.5 certain data issues during the load phase can be ignored and reported on. • The data errors that can be trapped are: • Uniqueness failures (duplicate rows) • Data Conversion Errors (like alphabetic where only numeric will do!) • Null errors (incoming NULL in a non-Null column) • Foreign Key errors (Referential Integrity Errors)

Load Errors - 2 • For each of these errors a clause in the load process can detail how many errors will be processed before the load is aborted and rolled back (up to an infinite number), and what to do with the row in error.

Load Errors - 3 • LOAD [INTO] TABLE [ owner ].table-name • .. <stuff omitted> • [IGNORE CONSTRAINT constrainttype [, ...]] • [MESSAGE LOG ‘string’ ROW LOG ‘string’ [ONLY LOG logwhat, [...]] • [LOG DELIMITED BY ‘string’] • logwhat: { ALL | NULL | UNIQUE | DATA VALUE | FOREIGN KEY } • constrainttype: { UNIQUE integer | NULL integer |FOREIGN KEY integer | • DATA VALUE integer |ALL integer }

Extracting Data - 1 • In 12.4.2 there are a series of options that can be used to direct the results of a select clause to a file • The following can be set: • The output filename • The type of output (binary [swapped] or ASCII) • Delimiter strings (column and row) • Null handling

Extraction Options - 1 • Temp_Extract_Name (1 thru 8) default blank • Temp_Extract_Size (1 thru 8) default 0 (128GB) • Temp_Extract_Binary default OFF • Temp_Extract_Swap default OFF • Temp_Extract_Column_Delimiter default ‘,’ • Temp_Extract_Row_Delimiter default ‘\n’ • Temp_Extract_Null_As_Zero default OFF • Temp_Extract_Quote default ‘’ • Temp_Extract_Quotes default OFF • Temp_Extract_Quotes_all default OFF • Note: A second select statement will overwrite the first output

Extraction Options - 2 • In 12.5 there is an APPEND option to allow for appending data to existing files • For the append option in the extract options the file must exist. • The append is a by product the primary reason for this is security – to allow the user to set up the file so it is only readable by Sybase and a specific user id. • Fully qualified pathnames can be specified for the location of this file – it does not need to be in the java directory(!)

Memory – and Changing Data

Memory – and Changing Data

Presentation Transcript