250 likes | 412 Views
Oracle Wait Interface - 911. SURENDER SARA NCOAUG Email : SURENDER.SARA@ORABYTE.COM SURENDER.SARA@SERACONSULTING.US. Performance Methodology Goals. Always Consistent and Repeatable Results Cost Effective Corporate Wide Deployment Customizable Always Accurate and becomes a Standard .
E N D
Oracle Wait Interface - 911 SURENDER SARA NCOAUG Email : SURENDER.SARA@ORABYTE.COM SURENDER.SARA@SERACONSULTING.US
Performance Methodology Goals • Always Consistent and Repeatable Results • Cost Effective • Corporate Wide Deployment • Customizable • Always Accurate and becomes a Standard
Performance Tuning Categories • When we don’t know anything about the issue and need to zero in to a narrow time window and eventually to the set of problem statements. • We know the timings window when the problem seem to occur but not the problem. • We know what specific problem we need to resolve (end user response time perspective), users know what needs to be tuned – BEST START ! • Meeting ongoing performance SLA • Building Performance History ( STATSPACK like approach but more intelligent and specific )
Performance Tuning Goals (1) • To be able to answer questions about sudden slowness, localized or generalized when we receive a help desk call or someone walks in or we get a page • Ability to compare hourly performance horizontally, this will enable us to answer “ WHY ARE WE SLOW TODAY” type questions compared to ‘YESTERDAY WAS FINE” • Need to build historical performance data to compare the fluctuations vertically ( today) and horizontally ( same time frame but different days )
Performance Tuning Goals (2) • Look for signatures and patterns over time and to know what to expect in response time and when ( PROACTIVE GOAL ) • Baseline for tuning efforts • Translate the data into meaningful capacity planning needs • Quick, few clicks and the problem is exposed
Best Strategy, Always On! • The ideal way to collect data is “ALWAYS ON” strategy. This can be accomplished in two ways • 10046 always on • Init.ora parameter • Database logon trigger • Sampling v$session_wait every second • Direct SGA attach method where we can sample the data without using database resources can be the best way to accomplish this goal ( need c, pro*c expertise)
Issues with Best Strategy - 10046 • 10046 • Not possible to leave it on always, will have negative effect on your performance • Will fill your space in no time on a highly active system ( USER_DUMP_DEST )
Issues with Best Strategy – Sampling v$session_wait • If sampling is done per second on this table via direct attached SGA method that method will work as long as the data store is not in the same spot as monitored database
What to do - 10046 ? • 10046 • Use this for tracing programs once they are identified as a bottleneck. • Use this to trace all sessions when problematic time window is defined for ex. Put a database logon trigger at 9 am and stop at 10 am. We will get trace for all the users at that time interval • Use this to trace all given user connection for already active sessions ( will start tracing inactive sessions also when they become active) using PLSQL code • Have to be prudent to trace specific programs manually by eliminating the START and END empty wait times • EMAIL SURENDER.SARA@veritiesllc.com for the tracing paper
What to do – v$session_wait ? • Sample data every X seconds – 30 to 60 seconds recommended • We are after big bottlenecks during the first phase • This way we meet the goal of always on for all sessions but not creating problems of our own
Fitting It together - GOALS! • Need something that will capture what each session did in terms of WAITS and RESOURCES used before they disconnect • Need something to sample every session’s all the time • Meets our goal “always on” but is lightweight • Provide historical data • Provide a quick way to catch big issues every time • Cost effective and corporate standard
Data Collection • Collect Intelligent Data based on IO and Memory Related waits • Build holding tables, one for IO and couple for memory related waits • Build ssdba_extents and ssdba_segments tables and build indexes on them, depending on the segment changes may have to refresh them before running the diagnosis script • Sample data every 30 seconds to 1 minute from v$session_wait into holding tables, this is changeable. • Create logoff trigger to save v$session_event and sesstat statistics per user into logoff holding tables • Memory Related waits need to capture the problematic constructs at the time they occur • IO related waits can be taken care of using holding tables ( direct IO read/write can be an exception)
Logoff trigger • Creating logoff stats table in the same database that is being monitored • Keep 5-7 days data at minimum • Create one table for events and one for statistics
Loggoff Event table • drop table ssdba.ssdba_logoff_event_t; • create table ssdba.ssdba_logoff_event_t as • select • a.SID , • a.EVENT , • a.TOTAL_WAITS , • a.TOTAL_TIMEOUTS , • a.TIME_WAITED , • a.AVERAGE_WAIT , • a.MAX_WAIT , • sysdate as logoff_time , • b.SERIAL# , • b.PADDR , • b.USERNAME , • b.OSUSER , • b.PROCESS , • b.MACHINE , • b.TERMINAL , • b.TYPE , • b.logon_time • from • v$session b , • v$session_event a • where • 1=2;
Loggoff Stats table • drop table ssdba.ssdba_logoff_stats_t; • create table ssdba.ssdba_logoff_stats_t as • select • c.username , • c.osuser, • a.sid,c.serial#, • c.paddr, • c.process, • c.logon_time , • a.statistic# , • b.name, • a.value, • sysdate as logoff_time • from v$session c, v$sesstat a, v$statname b • where 1=2;
ssdba_io_dump (IO Capture) • HOUR NUMBER • DAY NUMBER • MONTH NUMBER • YEAR NUMBER • SNAP_DT DATE • SID NUMBER • SEQ# NUMBER • EVENT VARCHAR2(64) • P1TEXT VARCHAR2(64) • P1 NUMBER • P1RAW RAW(4) • P2TEXT VARCHAR2(64) • P2 NUMBER • P2RAW RAW(4) • P3TEXT VARCHAR2(64) • P3 NUMBER • P3RAW RAW(4) • WAIT_TIME NUMBER • SECONDS_IN_WAIT NUMBER • STATE VARCHAR2(19) • SERIAL# NUMBER • USERNAME VARCHAR2(30) • OSUSER VARCHAR2(30) • PADDR RAW(4) • LOGON_TIME DATE • PROCESS VARCHAR2(9) • SQL_HASH_VALUE NUMBER • SADDR RAW(4) • MODULE VARCHAR2(48) • ROW_WAIT_OBJ# NUMBER • ROW_WAIT_FILE# NUMBER • ROW_WAIT_BLOCK# NUMBER • ROW_WAIT_ROW# NUMBER
ssdba_library_dump • HOUR NUMBER • DAY NUMBER • MONTH NUMBER • YEAR NUMBER • SNAP_DT DATE • SID NUMBER • SERIAL# NUMBER • USERNAME VARCHAR2(30) • PADDR RAW(4) • LOGON_TIME DATE • SQL_HASH_VALUE NUMBER • KGLPNMOD NUMBER • KGLNAOBJ VARCHAR2(1000)
ssdba_buffer_dump • HOUR NUMBER • DAY NUMBER • MONTH NUMBER • YEAR NUMBER • SNAP_DT DATE • HLADDR RAW(4) • FILE# NUMBER • DBABLK NUMBER • TCH NUMBER • OBJ NUMBER • OBJECT_NAME VARCHAR2(128)
ssdba_sqltext_dump • HOUR NUMBER • DAY NUMBER • MONTH NUMBER • YEAR NUMBER • SNAP_DT DATE • HASH_VALUE NUMBER • ADDRESS RAW(4) • PIECE NUMBER • SQL_TEXT VARCHAR2(64)
FILE-BLOCK Related Waits • buffer busy due to global cache file# block# id • buffer busy waits file# block# id • buffer read retry file# block# not used • control file parallel write files blocks requests • control file sequential read file# block# Blocks • control file single write file# block# Blocks • conversion file read files block# Blocks • db file parallel read files blocks requests • db file parallel write files blocks requests • db file scattered read file# block# Blocks • db file sequential read file# block# Blocks • db file single write file# block# Blocks • direct path write file number first dba block cnt • enqueue name|mode id1 id2 • free buffer waits file# block# set-id# • write complete waits file# block# id
Memory Related Waits, need to capture wait as well as the details • library cache pin • Shared Pool Latch • Hot Blocks
Q&A regarding this Approach – Distorting the event by merely looking at it! • Are we aggravating the issue? No – To prove that lets sample v$session_wait for 10 minutes at per minute sample see the resource used • Test was done to take snapshot of sesstat and system_event for this activity • execute snap_events_start • execute snap_latch_start • set autotrace on explain statistics • declare • pp number; • begin • for i in 1..10 • loop • select count(*) into pp from v$session_wait ; • dbms_lock.sleep(60); • end loop; • end; • / • set autotrace off • execute snap_latch.end_snap • execute snap_events.end_snap
Reports • Will tell you exactly what you waited for and what caused the wait ! • Fry the big fish in days ! • Resort to 10046 once the TOP Bad code is found ! • Do typical performance fixes for events discovered. • Analyzing wait event data will yield a path toward a solution for almost any problem.