1 / 32

Database Design (for IQ-M)

3. Database Design (for IQ-M). Introduction. This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely for IQ This is to allow me to add more information directly related to IQ

fifi
Download Presentation

Database Design (for IQ-M)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 3 Database Design (for IQ-M)

  2. Introduction • This section has been re-vamped for the 12.4.3 course • I have removed all the design bits that are not absolutely for IQ • This is to allow me to add more information directly related to IQ • So this is not a Data WarehouseDesign or Methodology course, I have colleagues that do it much better then me!

  3. Aggregation • This was a dirty word in IQ-M • Not anymore • It is not sensible to trawl through 10 billion rows of data, when a simple aggregation table can reduce the queried data to 10 thousand rows – or less • Even with IQ-M we can improve performance by as much as 100 times using aggregation correctly

  4. A Statement Just because ASIQ-M is fast does not give us the right to generate slow databases or applications

  5. Primary Key/Foreign Key • IQ-M does not enforce Primary Key/Foreign Key relationships – but it will do soon (12.5 Q2 2002) • The optimiser does use the PK/FK relationship for query planning • Only specify this relationship if the relationship does exist • Incorrect specification can result in query plan errors (performance degradation) and possibly errors • ASA does modify a join that is defined as PK/FK to an ANSI NATURAL join – this can cause problems with orphan rows

  6. Key Specification • In a Data Warehouse the production key is not, generally, used as the warehouse key • It is more acceptable practice to use a generated key • Make this key an Unsigned INT or BIGINT • This is the absolutely most efficient key datatype in IQ-M

  7. Primary Keys • In IQ-M a Primary Key is an ANSI standard Primary Key • It is UNIQUE • It must not be null • If specified as a table or column constraint then a specialised form of the HG index is created

  8. Foreign Keys • Always generate an HG index on a Foreign Key • If the relationship is 1:1 then generate the Foreign Key column as a UNIQUE • This will force auto generation of a unique HG index • Again try to specify join columns as Unsigned INT or BIGINT

  9. A digression on Datatypes • There are some very important issues concerning datatypes • We have discussed the actions of the indexes – there are areas where an index can be forced to run slowly if the datatype is specified wrongly • Always consider the requirements for the datatype • In correct datatype specification is as bad as incorrect index selection

  10. Signed vs. Unsigned • If you don’t need signed data in an int or bigint – use UNSIGNED • This will speed up the accessing of the HNG index sometimes doubling the performance • HNG stores negative data as 1s complement • This means SUM() AVG() etc. run quickly • But range checks require another set of scans • If we stored as 2s complement then • Range checks would run with 1 scan • But SUM() AVG() would be slower!!

  11. Other Datatype Issues • Signed vs. Unsigned does not affect the other indexes to any great degree • But… • The selection of datatypes does • We have already discussed keys but some other areas are worth commenting on…

  12. Long Varchar() - 1 • A long varchar() is defined as a varchar() with a length greater than 255. • If you can avoid this please try to • Only FP and word index index is allowed • No enumerated indexes or HNG • We have seen a number of customers who use varchar(1024) as Primary Keys • please DO NOT DO THIS!!

  13. Long Varchar() - 2 • Long varchar() are stored as 256 byte chunks, so using 4 bytes in a varchar(32000) only uses 256 bytes • By default these 256 byte chunks are memset (set to zeros to improve compression) • There is an upgrade option to memset existing 12.4.0 varchar() – this is worth doing, if you have the time!

  14. Char() vs. Varchar() • Always, if you can use char() • Generally this will improve performance, at the modest cost of storing some small number of extra bytes • Query performance on retrieval of char() vs. Varchar() indicates that there can be a 2-3% performance hit per column, and we have seen 10% degradation on single columns

  15. Memset - 1 Initialize_Memory_To_Ones_On_Allocation • Some slides ago I mentioned memset • This is a function of IQ that allows varchar() – and other objects – to have their unused portions set to one on initialization • This will improve the performance of the compression algorithms (usually) • It may possible slow down loads – but I have not found this to be the case

  16. Memset - 2 • To memset existing varchar() when upgrading to 12.4.2 set the following option Convert_Varchar_To_1242 • To bring the HG indexes to 12.4.2 standard when upgrading a pre 12.4.2 database set the option Convert_HG_To_1242 • In the 12.4.3 upgrade, both these options happen by default

  17. Float, Real and Double • Unless you really need them – please do not use • FLOAT • REAL • DOUBLE • They can only have Flat FP indexes – no others • The do not store “exact” values – only approximate • Please try to use • NUMERIC • DECIMAL

  18. NUMERIC and DECIMAL • Numeric and Decimal are aliases of each other • Any numeric or decimal with a precision of less than 12 will be stored as an INT (with conversions) • Any numeric or decimal with a precision of between 12 and 20 will be stored as a BIGINT (with conversions)

  19. Join Columns • You must generate the database schema with the table join columns having the same datatype. • INT, UINT and BIGINT are best, but the column datatypes for each join must be the same • Conversion cost is horrendous

  20. Case and Collation Sequences • In terms of RAW performance the fastest IQ database is one where CASE is set to RESPECT and the collation sequence is BINARY (ISO_bineng) • This is probably not suitable for the general application of the database or warehouse server • CASE set to IGNORE is the next fastest, then changes in the collation sequence • The performance hits can be quite high (around 10-20% - we think!)

  21. String Searches • String Searches such as substr(1,3,col_name) are really very slow, they rely on FP searches • With low cardinality (1 and 2 byte FP) data the search is faster, but this can still be a restriction • Create a new column which is the first 3 characters of the col_name column, then search on this • This way there is no function call, so no projection, so the optimiser can use a fast index LF or HG (or if it is a range query an HNG)

  22. Telephone Numbers • A classic example of the above is the telephone number +1-301-896-1733 +1 -> Country Code 301 -> Area Code 896 -> Sub Area Code 1733 -> Local Number • Make this 4 columns (actually 5 - the whole number), then searches use fast indexes

  23. Date time • As with telephone numbers, try storing a data time as as series of columns (or a dimension table) • Try creating columns DD MM YY HH MM SS DoWeek DoYear Quarter etc.

  24. Date vs. Datetime • A slightly better solution to the above can be considered in the light of the 1 and 2 byte FP indexes • Try storing the date part of a datetime as a date and the time part as hh mm ss • So: Datetime -> date_col, hh_col, mm_col, ss_col

  25. Loading Dates • There is NO default date or datetime format for loads into IQ • The format must be explicitly set for the load/insert to get the best performance • However some formats are conversion enhanced

  26. DD/MM/YYYY DD.MM.YYYY DD-MM-YYYY HH:NN:SS HHNNSS HH:NN:SS.S HH:NN:SS.SS HH:NN:SS.SSS HH:NN:SS.SSSS HH:NN:SS.SSSSS YYYY-MM-DD HH:NN:SS YYYYMMDD HHNNSS YYYY-MM-DD HH:NN:SS YYYY-MM-DD HH:NN:SS.S YYYY-MM-DD HH:NN:SS.SS YYYY-MM-DD HH:NN:SS.SSS YYYY-MM-DD HH:NN:SS.SSSS YYYY-MM-DD HH:NN:SS.SSSSS Enhanced Conversion formats

  27. Date Load • So it is better to use Col1 DATE(‘YYYY-MM-DD’) • than Col1 ASCII(10) • The performance enhancement can be as much as a 100 fold speed up in loads (for small tables)

  28. UNION • In IQ-M 12.4.3 the UNION clause has very few disadvantages • Generally UNIONs are all processed in parallel • so if you have a low user count they work well • Also the delete question now can be solved • Do not use DISTINCT in the UNION clause, or in the SELECT statement

  29. UNION and Delete • If you are storing a fixed (in time) amount of data e.g.. 6 months • Then every month you delete 1/6th of the data in the table • This is expensive • It is better to split the fact table into 6 x one month tables • At the end of the month you truncate the oldest table • And possibly rename the table sets • Remember for Multiplex table rename is DDL and hence can only be done in simplex mode!

  30. What is wrong Here ? SELECT col1, col2 FROM table1, table2, table3 WHERE table1.col1 < table2.col1 AND table1.col1 > table3.col1

  31. Cartesian Joins • These are expensive – they involve the join of every row in one table to every row in a second table. • Table A 1,000,000 rows • Table B 100,000 rows • Worktable 100,000,000,000 rows Select * from T, R where T.a = 10 Cartesian Select * from T, R where T.a between R.b and T.b Cartesian Select * from T, R where ABS(T.a * R.b) = T.b Cartesian • But Select * from T, R where ABS(T.a * T.b) = R.bNot Cartesian

  32. Database Design - End

More Related