350 likes | 478 Views
Chapter 22. Database Troubleshooting. Introduction to Database Troubleshooting. DBAs need to be experts at database troubleshooting. There are a wide variety of topics covered in this chapter: Assessing database availability issues quickly.
E N D
Chapter 22 Database Troubleshooting
Introduction to Database Troubleshooting • DBAs need to be experts at database troubleshooting. • There are a wide variety of topics covered in this chapter: • Assessing database availability issues quickly. • Identifying system performance issues with operating system utilities. • Querying data dictionary views to display resource intensive SQL statements. • Using Oracle performance tools to identify resource consuming SQL statements. • Identifying and resolving locking issues. • Troubleshooting open cursor issues. • Investigating issues with the undo and temporary tablespaces. • Auditing database activities.
Checking Server and Database Availability • The following command establishes: • Is the database up and available • Is the listener up • Is the network up $ sqlplusbarts/l1sa@'dwdb1:1521/dwrep1' • Use ping to see if the box can be connected to over the network • Use telnet to establish server availability on a given port • Use tnsping to troubleshoot Oracle Net connectivity
Investigating Disk Fullness • When first logging onto a box, one issue that will cause a database to hang or have problems is a full mount point. • The df command with the human readable -h switch assists with verifying disk fullness: $ df -h • The du command is also useful for investigating where space is being consumed: • $ du -sh
Locating the Alert Log and Trace Files • Troubleshooting an unavailable database oftentimes begin with inspecting the alert.log. • Here’s a OS function that can help you navigate to the alert.log location: #-----------------------------------------------------------# # cd to bdump function bdump { echo $ORACLE_HOME | grep 11 >/dev/null if [ $? -eq 0 ]; then lower_sid=$(echo $ORACLE_SID | tr '[:upper:]' '[:lower:]') cd $ORACLE_BASE/diag/rdbms/$lower_sid/$ORACLE_SID/trace else cd $ORACLE_BASE/admin/$ORACLE_SID/bdump fi } # bdump#-----------------------------------------------------------#
Removing Files • Linux/Unix environments provide a wide variety of ways to identify the size and age of files. • When troubleshooting, oftentimes it’s useful to find files of over a certain size or age. $ find . -type f -mtime +2 -name "*.trc" • Then to delete files identified by the find command (be very careful when doing this): $ find . -type f -mtime +2 -name "*.trc" | xargsrm
Viewing the Alert Log via OS Tools • After navigating to the directory that contains the alert.log, you can view the most current messages by viewing the end (furthest down) in the file (in other words, the most current messages are written to the end of the file). • To view the last 50 lines, use the tail command: $ tail -50 alert_<SID>.log • You can continuously view the most current entries by using the -f switch: • $ tail -f alert_<SID>.log • You can also directly open the alert.log with an operating system editor (such as vi): $ vi alert_<SID>.log
Viewing the alert.log Using the ADRCIUtility • If you're using Oracle Database 11g or higher, you can use the ADRCI utility to view the contents of the alert.log file. • Run the following command from the operating system to start the ADRCI utility: $ adrci • You should be presented with a prompt: adrci> • Use the SHOW ALERT command to view the alert.log file: adrci> show alert • If there are multiple Oracle homes on the server, then you will be prompted to choose which alert.log you want to view.
Identifying System Bottlenecks Using vmstat • Linux/Unix environments provide a wide variety of command line tools for troubleshooting system bottlenecks. • The vmstat utility displays real-time performance information about processes, memory, paging, disk I/O, and CPU usage. $ vmstat 2 10
Identifying System Bottlenecks Using top • Another tool for identifying resource-intensive processes is the top command. Use this utility to quickly identify which processes are the highest consumers of resources on the server. • By default, top will repetitively refresh (every few seconds) information regarding the most CPU-intensive processes. • $ top
Mapping an Operating System Process to a SQL Statement • Oftentimes when you have multiple databases running on a single server, it’s difficult to pinpoint which database and process may be consuming inordinate amounts of system resources. • In these situations, use ps to identify the process and its associated database. • Then use a SQL statement to identify what the process is doing within the database.
Mapping an Operating System Process to a SQL Statement $ ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head • Then once the process and associated database are identified, then run a query similar to this one to identify details about the process: select 'USERNAME : ' || s.username|| chr(10) || 'OSUSER : ' || s.osuser || chr(10) || 'PROGRAM : ' || s.program || chr(10) || 'SPID : ' || p.spid || chr(10) || 'SID : ' || s.sid || chr(10) || 'SERIAL# : ' || s.serial# || chr(10) || 'MACHINE : ' || s.machine || chr(10) || 'TERMINAL : ' || s.terminal from v$session s, v$process p where s.paddr = p.addr and p.spid = '&PID_FROM_OS';
Monitoring Real-Time SQL Execution Statistics • The V$SQL_MONITOR view provides near real-time statistics on resource intenisve SQL queries: select * from ( select a.sidsession_id ,a.sql_id ,a.status ,a.cpu_time/1000000 cpu_sec ,a.buffer_gets ,a.disk_reads ,b.sql_textsql_text from v$sql_monitor a ,v$sql b where a.sql_id = b.sql_id order by a.cpu_timedesc) where rownum <=20;
Monitoring Real-Time SQL Execution Statistics • The V$SQL_MONITOR view provides near real-time statistics on resource intenisve SQL queries: select * from ( select a.sidsession_id ,a.sql_id ,a.status ,a.cpu_time/1000000 cpu_sec ,a.buffer_gets ,a.disk_reads ,b.sql_textsql_text from v$sql_monitor a ,v$sql b where a.sql_id = b.sql_id order by a.cpu_timedesc) where rownum <=20;
Displaying Resource Intensive SQL • To display queries currently executing: select * from ( select a.sidsession_id ,a.sql_id ,a.status ,a.cpu_time/1000000 cpu_sec ,a.buffer_gets ,a.disk_reads ,substr(b.sql_text,1,15) sql_text from v$sql_monitor a ,v$sql b where a.sql_id = b.sql_id and a.status='EXECUTING' order by a.disk_readsdesc) where rownum <=20;
Oracle provides several utilities for diagnosing database performance issues • Automatic Workload Repository (AWR) • Automatic Database Diagnostic Monitor (ADDM) • Active Session History (ASH) • Statspack
Using AWR • An AWR report is good for viewing the entire system performance and identifying the top resource-consuming SQL queries. • Run the following script to generate an AWR report: SQL> @?/rdbms/admin/awrrpt • You can also generate an AWR report for a specific SQL statement by running the awrsqrpt.sqlreport. • You will be prompted for the SQL_ID of the query of interest: SQL> @?/rdbms/admin/awrsqrpt.sql
Using ADDM • ADDM useful as a first place to look for database performance issues. • Use DBMS_ADDM PL/SQL package or Oracle provided SQL script to view the output of ADDM. • You need to provide as input (you’ll be prompted) a begin and end snapshot ID: SQL> @?/rdbms/admin/addmrpt
Using ASH • Useful for identifying short lived performance problems. • Use the Oracle provided script to manually run this utility: SQL> @?/rdbms/admin/ashrpt
Using Statspack • Statspack does not require an extra license. • Provides much of the same type of information as the AWR reports. • First create the repository and the PERFSTAT user: SQL> @?/rdbms/admin/spcreate.sql • To enable the automatic gathering of Statspack statistics, run this script: • SQL> @?/rdbms/admin/spauto.sql • After some snapshots have been gathered, you can run the following script as the PERFSTAT user to create a Statspack report: • SQL> @?/rdbms/admin/spreport.sql
Detecting and Resolving Locking Issues • Oracle provides several data dictionary views to help diagnose locking issues: V$LOCK V$LOCKED_OBJECT V$SQLAREA V$SESSION
Resolving Open Cursor Issues • Run a query such as the following to determine the number of open cursors each session has opened: select a.value, c.username, c.machine, c.sid, c.serial# from v$sesstat a ,v$statname b ,v$session c where a.statistic# = b.statistic# and c.sid = a.sid and b.name = 'opened cursors current' and a.value != 0 and c.username IS NOT NULL order by 1,2; • You may need to increase the value of OPEN_CURSORS to resolve issues. • Another common issue is code that doesn’t correctly close a cursor.
Determining if Undo is Correctly Sized • Undersized online redo log files can cause excessive switching which in turn causes modified database blocks to be more frequently flushed from memory. • To get an idea of switching frequency: select to_char(begin_time,'MM-DD-YYYY HH24:MI') begin_time ,ssolderrcnt ORA_01555_cnt ,nospaceerrcnt no_space_cnt ,txncountmax_num_txns ,maxquerylenmax_query_len ,expiredblksblck_in_expired from v$undostat where begin_time > sysdate - 1 order by begin_time;
Viewing SQL that is Consuming Undo Space • Sometimes you may face a situation where you keep running out of space in the UNDO tablespace. • In these situations it's helpful to identify which users are consuming space in the undo tablespace. • Run this query to report on basic information regarding space allocated on a per user basis: select s.sid, s.serial# ,s.osuser, s.logon_time,s.status, s.machine,t.used_ublk,t.used_ublk*16384/1024/1024 undo_usage_mbfrom v$session s ,v$transaction t where t.addr = s.taddr;
Determining if Temporary Tablespace is Sized Correctly • There is no exact formula to determine the optimal size of the temporary tablespace. • You’ll have to monitor your temporary tablespace while there is a load on your database to determine its usage patterns. • If you are using Oracle Database 11g or higher, run the following query to show both the allocated and free space within the temporary tablespace: select tablespace_name ,tablespace_size/1024/1024 mb_size ,allocated_space/1024/1024 mb_alloc ,free_space/1024/1024 mb_free from dba_temp_free_space;
Viewing SQL that is Consuming Temporary Space • When troubleshooting temporary tablespace issues, run a query such as this to show how resources are being consumed: SELECT s.sid, s.serial# ,s.username, p.spid,s.module, p.program,SUM(su.blocks) * tbsp.block_size/1024/1024 mb_used,su.tablespaceFROM v$sort_usagesu ,v$session s ,dba_tablespacestbsp ,v$process p WHERE su.session_addr = s.saddrAND su.tablespace = tbsp.tablespace_nameAND s.paddr = p.addrGROUP BY s.sid, s.serial#, s.username, s.osuser, p.spid, s.module, p.program, tbsp.block_size, su.tablespaceORDER BY s.sid;
Enabling Oracle Standard Auditing • Oracle's Standard Auditing feature is enabled through setting the AUDIT_TRAIL initialization parameter. • Set the AUDIT_TRAIL parameter to DB, which specifies that Oracle will write audit records to an internal database table named AUD$. • For example, when using an spfile, here's how to set the AUDIT_TRAIL parameter: SQL> alter system set audit_trail=db scope=spfile;
Auditing DML Usage • Auditing allows you to capture the types of SQL statements being used to access a table. • For example, the following statement enables auditing on all DML access to the EMP table owned by INV_MGMT: SQL> audit select, insert, update, delete on inv_mgmt.emp; • From this point on, any DML access to the INV_MGMT.EMP table will be recorded in the SYS.AUD$ table. • You can use a query such as this to report on DML access to a table: select username, obj_name, timestamp ,substr(ses_actions,4,1) del ,substr(ses_actions,7,1) ins ,substr(ses_actions,10,1) sel ,substr(ses_actions,11,1) upd from dba_audit_object;
Auditing Logon/Logoff Events • Use the BY ACCESS clause of AUDIT to enable auditing of logon and logoff activities, like so: SQL> audit create session by access; • Now every user that logs on and logs off the database will result in a record being inserted into the AUD$ table. • You can report on logon and logoff activity with a query such as this: select username ,action_name ,to_char(timestamp,'dd-mon-yyyy hh24:mi:ss') event_time ,to_char(logoff_time,'dd-mon-yyyy hh24:mi:ss') logoff_time from dba_audit_trail;
Viewing Enabled Audit Actions • There are several data dictionary views that allow you to view what auditing actions have been enabled. • Use the following query to enabled auditing at the privilege level: select user_name ,privilege ,success ,failure from dba_priv_audit_opts;
Turning Auditing Off • To permanently disable auditing, set the AUDIT_TRAIL initialization parameter to none: SQL> alter system set audit_trail='none' scope=spfile; • Now stop and restart your database to turn off Oracle's Standard Auditing feature. • If you only want to disable features that you have previously enabled with AUDIT, then use the NOAUDIT statement to turn off specific auditing events. • For example, the following statement turns off all statement auditing that was enabled via an AUDIT ALL statement: SQL> noaudit all;
Purging the Audit Table and Files • You should periodically purge the AUD$ table so that it doesn’t consume inordinate amounts of space in your SYSTEM tablespace. • The easiest way to purge audit records is to delete from or truncate the audit table. This example truncates the AUD$ table: SQL> truncate table aud$; • If you want to preserve newer records in the AUD$ table, then use a DELETE statement. • For example, this deletes records over 21 days old: SQL> delete from aud$ where timestamp# > sysdate + 21;
Moving the Audit Table to a Non-System Tablespace • If you find that the AUD$ table is growing large and consuming too much space in the SYSTEM tablespace, you can move the table to a non-SYSTEM tablespace. • Use the DBMS_AUDIT_MGMT package to move the AUD$ table to a separate tablespace. • This example moves the AUD$ table to a tablespace named AUD_TBSP: • BEGIN • DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION( • AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, • AUDIT_TRAIL_LOCATION_VALUE => 'AUD_TBSP'); • END; • /
Auditing at a Granular Level • Fine-grained auditing (FGA auditing) is a separate feature from regular database auditing. • Fine-grained auditing allows you to audit SQL at a more granular level than simple INSERT, UPDATE, DELETE, and SELECT operations. • FGA auditing allows you to audit for SQL activities that occur at the column level. • FGA auditing also allows you to perform a Boolean check on an operation
Summary • Database troubleshooting covers a wide variety of topics and subject areas. • This chapter focused on some common database problems you’ll encounter. • As a DBA you should be aware of what kinds of problems can happen, how to diagnose issues, and how to resolve them in a timely manner.