450 likes | 811 Views
Pete Jones Manager, Data & Applications Mgt Atradius. Presentation Agenda : Introductions. What ever happened to Jon Dartnell? Brief overview of Atradius. Implementing Oracle Cost Based Optimiser at Atradius. Next Oracle objectives.
E N D
Presentation Agenda : • Introductions. What ever happened to Jon Dartnell? • Brief overview of Atradius. • Implementing Oracle Cost Based Optimiser at Atradius. • Next Oracle objectives. • Oracle 9i Initial Impressions and… a few thoughts on problem • solving. (Paul Mansfield).
Introductions Robin Burge, Manager, Central Systems Infrastructure (CSI) Paul Mansfield, Senior z/Os Oracle Database Engineer (CSI) Pete Jones, Manager, Data & Applications Mgt. (D&AM) ATRADIUS: formerly GERLINGNCM, previously NCM.
Brief Overview of Atradius Main Datacentre: UK, Wales, Cardiff - 160 miles directly East of London, England. UK – houses the Datacentre and has the preponderance of IT resources for Atradius. UK is a Branch office (ITS plus business approx. 560 staff) Headquarters: Netherlands, Amsterdam. (ITS plus business approx. 640 staff) Other Europe: Cologne (Germany), Namur (Belgium), Paris (France), Italy (Rome), Nordic countries (Denmark, Sweden, Norway, Finland) Poland (other E. European countries). Rest of the World: USA, Canada, Mexico, Australia, New Zealand, Japan, China, India Total Staff: 3,500 (310 IT Staff). Main Lines of Business: Credit Insurance administered through a Group Wide System, Credit Management, Bonding, Factoring. Financials: Number two in the world credit insurance market with a total turnover of €1.3 billion, a 25% market share worldwide.
Technical Overview Platforms: • IBM z900 Series, 2064/2C5 (4 LPARS - 5 CPU’s, 10gb memory) • 12 Terabytes EMC , IBM 3494 Tape library • IFL – ready for VM/Linux • AIX – 1 x P690 (8 LPARS - 26 CPU’s, 81gb memory), 8 x P610’s . • Intel/Linux – New this year! http://www.atradius.com
Internal Users ‘SYMPHONY’ – Group System 10+yrs old! Oracle Forms 6 - access via Citrix, otherwise Forms on Desktop. dB Oracle 8.7.1.4.50 z/OS (size 207gb) Oracle Financials v11.5.8 Other systems supporting Symphony: Imaging system (Filenet) Cobol Batch (TWS8.2) Customer Reports dB (Oracle – AIX ) Pricing dB (Oracle - AIX) DWH (Oracle - AIX). Many Test environments (x16 full volume on z/OS) Core Business Applications
External Customers ‘SERV@NET’ - ‘http://www.atradius.com/serv@net’ IBM Apache and WebSphere v4.2 CTG, CICS Same Backend dB as Symphony Java development - outsourced (THBS) Core Business Applications
INTEGRATION, INTEGRATION, INTEGRATION, INTEGRATION, INTEGRATION… which has prevented us from do anything strategic with Oracle. But we did manage to squeeze in something…. Implemented Oracle Cost Base Optimiser (CBO). What have we achieved in the past year?
1. RBO is no longer supported under 10g. 2. Although it is not a physical prerequisite for migrating to 9i, it would be a logical and common sense move to migrate to CBO first and get it stable. 3. There are a host of CBO features that we cannot use or properly use and these are starting to rear their heads as Symphony is developed. SOLD! Do we have a Choice? that was the easy bit…… Why CBO for Atradius?
PRE Project planning: 1. Not an easy project! Goal was 3 months to do it in. 2. Lack of resource available right from the beginning (QA and developers) – Really needed more attention and commitment! 3. No clear strategy on how to implement CBO e.g. % of table to be analysed – there always conflicting information. 4. DBA Research! Test, analyse results and more research. Implemented CBO by ‘hook or by crook’ - a managers view…
Implement CBO by ‘hook or by crook’ a managers view… Some challenges we faced: 1. Impacts on testing for on-going projects – when to cut them over to CBO. 2. Impact on the release management process – weekly changes, when do we need to re-analyse a Table/index… how to implement that in Test > Production. 3. How to manage CBO once in production. 4. Impact on performance! No firm idea until we were in production. 5. Contingency! we missed the first implementation date.
Production preparation tips: 1. Finally received acceptance 6 months later (3 months late) – expect problems in Production! 2. For Week 1&2 ensure plenty of resource to react to production problems. 3. Communication and problem management - manage user expectations. 4. Implementation Timing: Choose a ‘quiet’ week e.g. school holiday, so system would be quieter by default. 5. A weekend implementation and perform as MUCH LIVE WORK as you can on that weekend. 6. Back-out plan!! Test it well first – requirement minimum down time… 7. As a manager book your vacation and get out of the country! ;0) Implement CBO by ‘hook or by crook’ a managers view…
Post Implementation First Day: Top SQL statements doing full table scans from disk – CPU up 100%. System was holding out! Skiing vacation was excellent! (Paul will walk through some of the issues in more detail). Week 1: Very bad statements were fixed and going into second week CPU had dropped down, but still an increase of 25%. Week 2: Skiing over :( and a month later we are still stable and managing the remaining/new performance issues. Implement CBO by ‘hook or by crook’ a managers view…
Conclusion: Overall not impressed and disappointed! Performance – yet to see any performance benefits. DBA’s tell me its more difficult to manage e.g. DB reorgs more complicated and always risks that execution paths will change after a reorg – more performance testing, overhead on changes. Two months on and we still need to reduce our CPU! Still need to remove rule hints, which were used to quickly fix CBO performance problems. Still a steep learning curve for all my DBA’s. CONCLUSION: A successful Project… CBO by hook or by crook… Implement CBO by ‘hook or by crook’ a managers view…
Upgrade to 9.2 – planning under way – Paul has some initial technical feedback on this. Web Forms – Proof Of Concept - strategic option is to go to a J2EE application, but have to take Web Forms as first step! Worrying factor is how a generated ‘web’ application will perform? Linux/VM – We are looking to push this forward starting with our web Infrastructure (Serv@net). Robin & I have yet to agree time scales for running Oracle on this platform. 24 by 6. Oracle Financials upgrade to 11.5.9 ……………….. what next ?
Oracle 9i Initial Impressions and… a few thoughts on problem solving Author: Paul Mansfield, IT Services Date: March 24, 2004
A few gripes before I start • CBO under 8i is pretty bad - hope it’s better under 9i • A lot of the 9i installation documentation is generic and it’s very confusing to know which bits apply to Z/OS • Patches very fragmented and...why do we end up with everything when we just selected a subset on the initial install • The ‘new’ route of going thru a middle level agent before speaking to mainframe support isn’t working - it just delays things
9i Initial Thoughts • We wanted to run 8.1.7.4 and 9.2.0.4 in same LPAR with a minimal number of IPLs during cutover and we wanted to keep the same service names. • Changing TYPE= in your SERVICE definitions • ( NET8 to NET and ORA8 to ORA) requires an IPL. • Documentation says you MUST change the TYPE= • but not true under 9.2.0.4. • Can get away with one IPL.
Oracle 8 and 9 services can run alongside each other under the same SSN but must use the new 9i LLA modules DEFINE SERVICE ORAB DESC('Oracle ORAB Database') - TYPE(ORA) PROC(ORAORAB) - MAXAS(1) - PARM('ORACLE.V9R2M0.PARMLIB(O9PARM)') DEFINE SERVICE NET9 PROCEDURE(ORANET9) TYPE(NET) - DESCRIPTION('NET9 V9.2.0 OSDI TEST') - PARM('HPNS PORT(1551) OSUSER - DUMP(ORACLE.NET9.DUMP.OUTPUT)') DEFINE SERVICE ORAX PROCEDURE(ORAORAX) TYPE(ORA8) - DESCRIPTION('ORACLE DB V8.1.7 ODSI ORAX TEST') - MAXAS(2) - PARM('ORACLE.V8R1M7.PARMLIB(O8PARM)') DEFINE SERVICE NET8 PROCEDURE(ORANET8) TYPE(NET8) - DESCRIPTION('NET8 V8.1.7 OSDI TEST') - PARM('HPNS PORT(1549) OSUSER - DUMP(ORACLE.NET8.DUMP2.OUTPUT)')
You could try writing a program to issue commands to switch from an 8i service to a 9i service //ORABSWAP JOB (0000,OR),'ORACLE SWAP',CLASS=A,// MSGCLASS=X,PRTY=15,MSGLEVEL=(1,1),NOTIFY=&SYSUID//STC EXEC PGM=COMMAND,PARM='OSDI STOP ORAB'//WAIT EXEC PGM=WAIT,PARM='10'//STC EXEC PGM=COMMAND,// PARM='OSDI ALTER SERVICE ORAB PROC(ORAB9204)'//STC EXEC PGM=COMMAND,// PARM='OSDI ALTER SERVICE ORAB PARM('ORACLE.V9R2M0.PARMLIB(O9PARM)')'//WAIT EXEC PGM=WAIT,PARM='10'//STC EXEC PGM=COMMAND,PARM='OSDI START ORAB'// (sample kindly provided by Oracle Support)
A few changes we have to make to old jobs/procs etc • No more SVRMGRL • Audit records can no longer be cut to SMF datasets • Construct CONNECT ABC/EDF@Z:ORAB no longer works
Precompilers • In the past we used both COBOL precompilers i.e PROCOB and PROCOB18. • We would like to cutover to one precompiler with 9i but ‘tightening up’ in PROCOB precompiler means program changes are needed. • Must use NODYNAM option and therefore programs should be changed to use CALL WS-VAR rather than CALL “PROG1”. (also applies to v8) • Use PICX=VARCHAR2 option (no longer the default) in precompiler step otherwise tests on fields with trailing blanks will include the blanks in comparisons. (also applies to v8) • Compilations now produce a PM3 object rather than a load module at Linkedit time. These must be stored in a PDSE. You can produce a traditional load module if required. • For the SYSPUNCH DD in the PROCOB step you must specify RECFM=FB as part of the DCB information i.e DCB=(RECFM=FB,LRECL=80,BLKSIZE=800) • All CICS programs have to be recompiled.
Access Manager for CICS • Documentation says you must set distributed_transactions parameter in INITORA. This is an obsolete parameter. • The LIBCLNTS module is massive (25meg) . I found I needed the CICS region size to be 40meg greater than the EDSA size in the SIT to get it to load (why so big?!!). • Some of our programs still had a ‘CONNECT /’ this no longer works.
Upgrade process • Documentation rather frustrating. Seems to assume everyone going from MPM to OSDI. • Couldn’t get upgrade to work with 9.2.0.2. (ORA-03113: end of file on communication channel) Tried using 9.2.0.4 and went very smoothly. • Control Files now larger • Don’t use SGA_MAX_SIZE • DB_CACHE_SIZE instead of DB_BLOCK_BUFFERS, i.e db_block_buffers=500 becomes db_cache_size=2000k • Even at 9.2.0.4 experienced hangs when issuing OSDI STOP commands (oh no! not more patches).
DO YOU EVER FEEL LIKE THIS?! DEVELOPMENT
A few thoughts on problem solving • Some of the tools available: • Statspack or V$ queries • IPCS • LOGMNR • Traces • RMF monitor (enclave resource consumption) • SMF stats
My favourite V$ query SELECT SQL_TEXT, BUFFER_GETS, EXECUTIONS, DECODE(EXECUTIONS,0,1,BUFFER_GETS/EXECUTIONS) BUFF_EXEC, OPTIMIZER_MODE, FIRST_LOAD_TIME, LOADS, DISK_READS, PARSING_USER_ID FROM V£SQLAREA WHERE BUFFER_GETS > 10000 ORDER BY BUFFER_GETS; Look out for any newcomers at the bottom of the heap. We recently found the following:
select procedure_catalog, procedure_schema, procedure_name, parameter_name, ordinal_position, parameter_type, parameter_hasdefault, parameter_default, is_nullable, data_type, character_maximum_length, character_maximum_length character_octet_length, numeric_precision, numeric_scale, description, type_name, type_name, overload from (select null procedure_catalog, owner procedure_schema, decode(package_name,NULL,object_name,package_name||'.'||object_name) procedure_name, decode(position,0,'RETURN_VALUE',nvl(argument_name,chr(0))) parameter_name, position ordinal_position, decode(in_out,'IN',1,'IN/OUT',2,'OUT',decode(argument_name,null,4,3),null) parameter_type, null parameter_hasdefault, null parameter_default, null is_nullable, decode(data_type, 'CHAR', 129, 'NCHAR', 129, 'DATE', 135, 'FLOAT', 139, 'LONG', 129, 'LONG RAW’ , 128, 'NUMBER', 139, 'RAW', 128, 'ROWID', 129, 'VARCHAR2', 129, 'NVARCHAR2', 129, 13) data_type, decode(data_type, 'CHAR', decode(data_length, null, 2000, data_length), 'LONG', 2147483647, 'LONG RAW', 2147483647, 'ROWID', 18, 'RAW', decode(data_length, null,2000, data_length), 'VARCHAR2', decode(data_length, null, 4000, data_length), 'DATE', null, 'FLOAT', null, 'NUMBER', null, null) character_maximum_length, decode(data_type, 'DATE', 19, 'FLOAT', 15, 'NUMBER', decode(data_precision, null, 0, data_precision), 'CHAR', null, 'NCHAR', null, 'LONG', null, 'LONG RAW', null, 'RAW', null, 'VARCHAR2', null, 'NVARCHAR2', null, null) numeric_precision, decode(data_type, 'DATE', 0, 'NUMBER', decode(data_scale, null, 0, data_scale), 'CHAR', null, 'NCHAR', null, 'FLOAT', null, 'LONG', null, 'LONG RAW', null, 'RAW', null, 'VARCHAR2', null, 'NVARCHAR2', null,null) numeric_scale, null description, data_type type_name, overload from all_arguments where data_level = 0 and data_type is not null) procedure_parameters where procedure_name = 'GS_INSERT_SEARCH_RESULTS_SP' order by 1,2,3,5 …..GENERATED BY THIRD PARTY VB APPLICATION!!
Just a thought • How about 2 new columns in V$SQLAREA • CPU_TIME • ELAPSED_TIME • I’m impressed - they are there in Oracle 9 (in microseconds) • Well done Oracle • Tells me how costly a statement is without using trace • --------------------------------------------------------------------------- • BUFFER_GETS EXECUTIONS BUFF_EXEC OPTIMIZER_MODE CPU TIME ELAPSED TIME • ----------- ---------- ---------- ------------------------- --------------- • UPDATE TBOR_NON_NCM_ORGANISATIONS SET D_ORCCT_ST_TOT_AMT=:b1 WHERE ID = :b2 • 3405291 67349 50.5618643 CHOOSE 12.46 56.52
IPCS (interactive problem control facility) Ask Oracle if you can have a set of their IPCS routines to run on your machine. When a DUMP occurs the first thing you want to know is who caused it. If you can find this out 5 minutes after the dump occurs and can contact the user involved they may actually remember what they were doing. Meanwhile you can be FTPing your 2 gig dump to Oracle Support.
IPCS (contd) use the MAPMIRS command ------------------------- IPCS Subcommand Entry ------------------------------- Enter a free-form IPCS subcommand or a CLIST or REXX exec invocation below: ===> mapmirs ----------------------- IPCS Subcommands and Abbreviations -------------------- ADDDUMP | DROPDUMP, DROPD | LISTMAP, LMAP | RUNCHAIN, RUNC ANALYZE | DROPMAP, DROPM | LISTSYM, LSYM | SCAN ARCHECK | DROPSYM, DROPS | LISTUCB, LISTU | SELECT ASCBEXIT, ASCBX | EQUATE, EQU, EQ | LITERAL | SETDEF, SETD ASMCHECK, ASMK | FIND, F | LPAMAP | STACK CBFORMAT, CBF | FINDMOD, FMOD | MERGE | STATUS, ST CBSTAT | FINDUCB, FINDU | NAME | SUMMARY, SUMM CLOSE | GTFTRACE, GTF | NAMETOKN | SYSTRACE COPYDDIR | INTEGER | NOTE, N | TCBEXIT, TCBX COPYDUMP | IPCS HELP, H | OPEN | VERBEXIT, VERBX COPYTRC | LIST, L | PROFILE, PROF | WHERE, W CTRACE | LISTDUMP, LDMP | RENUM, REN |
IPCS (contd) locate the entry with STAT=08xxxxxx MIRS=7EEE39D0 KEY=00800048 8388680 STAT=0202000A GATE=00000000 ATHP=0F8CB5AC >>JOBN=ORANET8 OUSR=OSUSR USRN= TERM= PGMN=MINBNDS SAVE=1D7CEB90 CEET=1D7C32C8 CSTO=0009E218 HWST=0009E218 ASID=005C HASN=0000 MIRS=7EEBE9D0 KEY=00760046 7733318 STAT=0202000A GATE=00000000 ATHP=0F1765AC >>JOBN=ORANET8 OUSR=OSUSR USRN= TERM= PGMN=MINBNDS SAVE=1D5A1B90 CEET=18AFD2C8 CSTO=0017B818 HWST=00186588 ASID=005C HASN=0000 MIRS=7EED01D0 KEY=00480042 4718658 STAT=08020002 GATE=FFEE0000 ATHP=0F8CEB2C >>JOBN=ORANET8 OUSR=OSUSR USRN= TERM= PGMN=MINBNDS SAVE=00000000 CEET=1A7AA2C8 CSTO=00313848 HWST=00000000 ASID=005C HASN=0000 use the MIRS= address 0EED01D0 in the IPCS browse option. Scroll down a page and you will hopefully see ............ORAN ET8 ........ MINBNDS G048 0042GEORGE01
Try writing your own IPCS routine Dump will contain block of stats e.g CPU used, Phys Reads, Log Reads etc for each user and pointer to SQL executed Sample output from 8.1.7.4 dump analysis below: >>USRID= ORAWPI5 SID= 9 LOGRD= 51208 CPU(MS)= 890 PHYRD= 12525 USER ROLLB= 0 SQL= BEGIN:b1:=DBMS_PIPE.RECEIVE_MESSAGE('BU_AGENCY_SEARCH');IF:b1=0 THEN:b2:=D >>USRID= ORAWPI6 SID= 10 LOGRD= 1406 CPU(MS)= 181 PHYRD= 179 USER ROLLB= 0 SQL= BEGIN:b1:=DBMS_PIPE.RECEIVE_MESSAGE('bu_get_is_buyer_data');IF:b1=0 THEN:b >>USRID= GLBFATW SID= 11 LOGRD= 0 CPU(MS)= 0 PHYRD= 0 USER ROLLB= 0 SQL= SELECT NGBTL.ID,NGBTL.CLA_BUCLT_ID,NGBTL.ORNNN_ID,NGBTL.WF_EVENT_TYPE FROM >>USRID= CFTELS1 SID= 12 LOGRD= 2904 CPU(MS)= 21 PHYRD= 162 USER ROLLB= 0 SQL= >>USRID= GLBFATW SID= 13 LOGRD= 0 CPU(MS)= 0 PHYRD= 0 USER ROLLB= 0 SQL= DELETE FROM TBEA_BATCH_CHECKPOINT_DATA WHERE(EABSM_PROCESS_CODE=:b1 AND EA >>USRID= SYS SID= 14 LOGRD= 0 CPU(MS)= 0 PHYRD= 0 USER ROLLB= 0 SQL= SELECT TO_CHAR(SYSDATE,'YYDDDHH24MISS'),TO_CHAR(SYSDATE,'DDMMYYYY'),TO_CHA ORABUP0 HI LOGR = 124717277 ORABUP0 HI PHYS READS = 6160749 ORABUP0 HI CPU = 219262 ZZNDN03 HI USER ROLLBACKS = 198
Log Miner • Copy log before use • Run queries against clone of our Prod system • Doesn’t work well with more than one large log • Useful not just as an audit trail but also as a summary of who has been doing all the updates
Log Miner (contd) sample1 connect / as sysdba; execute sys.dbms_logmnr.add_logfile(- logfilename => '/dsn/CFCICS1.BRINLOG.VSAM', - options => sys.dbms_logmnr.new); exec sys.dbms_logmnr.start_logmnr ( - dictfilename => '/oracle/logs/ORAZ/dictionary.ora', - STARTTIME => to_date('09-Oct-2003 13:25:00', 'DD-MON-YYYY HH24:MI:SS'), ENDTIME => to_date('09-Oct-2003 13:28:05', 'DD-MON-YYYY HH24:MI:SS')); SET HEADING OFF SET PAGESIZE 0 set arraysize 1 select username, session#, sql_undo, sql_redo from v£logmnr_contents; exec sys.dbms_logmnr.end_logmnr ( );
Log Miner (contd) sample 2 execute sys.dbms_logmnr.add_logfile(- logfilename => '/dsn/CFCICS1.BRINLOG.A0017639.LOG', - options => sys.dbms_logmnr.new); exec sys.dbms_logmnr.start_logmnr ( - dictfilename => '/oracle/logs/ORAZ/dictionary.ora'); SET PAGESIZE 0 SELECT USERNAME, SESSION#, OPERATION, COUNT(*) FROM V£LOGMNR_CONTENTS GROUP BY USERNAME, SESSION#, OPERATION; exec sys.dbms_logmnr.end_logmnr ( );
Tracing Logon Triggers CREATE OR REPLACE TRIGGER sys.LOG_TRIGGER AFTER LOGON ON DATABASE begin if user = 'ORAF012' or user = 'ORAFPI3' or user = 'ORAFPIP' then execute immediate 'alter session set sql_Trace=true'; end if; end;
Tracing (contd) Logon Trigger to show bind variables (beware abends) CREATE OR REPLACE TRIGGER LOG_TRIGGER_bind AFTER LOGON ON DATABASE begin if user = 'ORABPIP' or user = 'ORAB001' then execute immediate 'alter session set events ''' || '10046 trace name context forever, level 4'''; end if; end; /
Tracing (contd) Issues with EXPLAIN PLAN when using CBO Why does explain plan in the trace show different path to explain plan when run against the offending statement? I still want to close my traces - bring back CLOSETRACE command. If I extend into another trace dataset does first one close?
RMF Monitoring • RMFWDM Overview • Enclave Resource Consumption • Enclave Attribute CLS/GRP P Goal % D X EAppl% TCPU USG DLY IDL • *SUMMARY 8.468 • ENC00079 ORACLEHI 1 75 0.866 1.051 13 1.7 70 • ENC00051 ORACLEHI 1 75 0.450 0.772 9.0 2.0 20 • ENC00068 ORACLEHI 1 75 0.371 6.060 2.1 2.1 76 • ENC00072 ORACLEHI 1 75 0.268 6.490 4.1 0.0 86 • ENC00024 ORACLEHI 1 75 0.237 4.803 6.0 1.2 76 • Userid . . . . . . . . . . : GBSTHO3 • Subsystem Type: OSDI Owner: ORANET8 System: SYS1 • Can look back in time • Sometimes sessions running in a tight loop in Oracle do not show updates to CPU used figure in V$Sesstat
SMF Statistics • Some figures still a bit buggy but still can be very useful. • Standard report provided. • We extract top ten resource using sessions during the day. Summary of Oracle resource usage sent to IT managers • Any internal users using more than 250 secs CPU in one day receive an e-mail asking what they were doing. • Any developers found in production also get a mail.