270 likes | 376 Views
Ten More Database Mysteries. Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com. 1. The Case of the Unwanted Services. DBA detects listener starting extra services NOT defined! > lsnrctl status Services Summary …
E N D
Ten More Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com
1 The Case of the Unwanted Services • DBA detects listener starting extra services NOT defined! > lsnrctl status • Services Summary… • database1 has 1 service handler(s) • database2 has 1 service handler(s) • databasez has 5 service handler(s) • They appear to be harmless, but what are these extra services? • Restarted Listener. At first all OK, then 5 extra services activated. • Confirmed that listener.ora file does NOT list the unwanted database. • DBA confirmed using correct listener.ora file. • Puzzle: There is no entry whatsoever for databasez.
Unwanted Services: Solution • Multiple listener services is indicative of MTS (Multi-threaded Server) Dispatchers; however, MTS was not being used! • DBA recalled that databasez was actually on a different server. • Solution: Database on different server started MTS services. • Key init.ora parameter: MTS_LISTENER_ADDRESS = "(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1526))" • Services match init.ora parameter MTS_DISPATCHERS. • How it happened: The remote init.ora file had originated on ‘our’ server, and the MTS parameter entries were not removed. • Resolution: On other server, DBA corrected the init.ora entry, bounced database. Extra services automatically stopped. • But - that’s not all...
Unwanted Services: The Rest of the Story • These extra services turned out to be not so “harmless” after all. • The ‘unwanted’ services are fully functional; they will intercept connection requests and redirect them to a different server! • Production users were redirected to a “clone” testing database on another server, and used it for several hours! • This occurs because each MTS service points to a particular dispatcher on a particular server. > lsnrctl services Services Summary... demo has 3 service handler(s) D001 <machine: hohp2, pid: 3385> (ADDRESS=(PROTOCOL=tcp)(DEV=17)(HOST=[1.2.3.4)(PORT=1179)) • Perhaps could be used for rapid failover to standby database?
2 Incognito DBA Privileges • Application testers need to “refresh” the test database often. • They will need to start/shut database as the oracle user. • DBA, being suspicious (justly) by nature, is reluctant to provide oracle account. • As punishment, users frequently ask DBA to run the refresh. • Question: How can developers perform tasks as oracle, without actually having the oracle account?
Incognito DBA Privileges: Solution • Create script to refresh database, including startup, shutdown, etc. • UNIX setuid feature allows programs to run with another identity. But setuid feature does not propagate to commands within the script. • Trick: Use a ‘wrapper’ C program that runs as oracle and calls the refresh script: # include <stdio.h> main () { system (“./refresh.ksh”); } • Activate setuid for the wrapper program: chmod 6711 refresh • Users simply enter refresh to run the refresh program as oracle. • Thanks to Brian Keating of Database Specialists for this solution.
3 NT Remote Control • NT box is running an important database. It appears that the Listener is down. • NT server is remote. • Question: How can you check the listener remotely?
NT Remote Control: Solution • Use ‘SC’ command. It is part of the NT Resource Kit. • The SC utility is very similar to the NET commands. • First enable a security "context" to the remote box; e.g. map a network drive to the remote server, and provide the administrator account/password. • The network drive may be disconnected after SC use is done. • Note: A mapped network drive is not required if you are connected locally as administrator, and the remote server uses the same administrator password. • Run SC using the format: SC \\[SERVER] [CMD] SERVICE
NT Remote Control: Solution (continued) Some common SC commands are: query--------Queries the status for a service start--------Starts a service pause--------Sends a PAUSE control request continue-----Sends a CONTINUE request stop---------Sends a STOP request to a service config-------Changes the configuration of a service qc-----------Queries the configuration information delete-------Deletes a service (from the registry) create-------Creates a service (add to registry)
NT Remote Control: Solution (continued) EXAMPLE: Check status of the Oracle 8 LISTENER service. NT server named ‘TSUNAMI’ SC \\TSUNAMI QUERY ORACLETNSLISTENER80 SERVICE_NAME: ORACLETNSLISTENER80 TYPE : 10 WIN32_OWN_PROCESS STATE : 4 RUNNING (STOPPABLE,NOT_PAUSABLE,IGNORES_SHUTDOWN) WIN32_EXIT_CODE : 0 (0x0) SERVICE_EXIT_CODE : 0 (0x0) CHECKPOINT : 0x0 WAIT_HINT : 0x0j
4 The Sad Case of the Failing Failover • Very critical Customer Support application for large HMO. • 24 x 7 crucial. Millions of monthly customers interactions. • IBM RS-6000 High-availability cluster (HACMP) • Scenario: Sudden hardware glitch causes failover to backup node. • Secondary node fails to start database! Database complains of missing datafile. • Critical application is down! • Objective: Early-rising DBA (from different project) not familiar with the setup needs to get database up!
The Failing Failover: Solution • Investigation shows all file systems mounted. Bad News: file in question is simply not there. • Good News: File is not supposed to be there--only a symbolic link. • Each node has symbolic links pointing from “nice-looking” paths to the actual location of raw device: /u04/oradata/data01.dbf > /dev/r123 • The primary node had all the links; the backup node was missing 2 symbolic links. Simply creating the links allowed database startup.
The Failing Failover: Solution(continued) • Moral of the story: High reliability hardware only as good as the weakest link--including maintenance requirements. • This strategy was very susceptible to human error. Every new .dbf file required new symbolic link on both nodes. One omission completely destroyed the high-availability plan.
5 The Case of the Impatient Insurance Agent • Query to find details for expired insurance policies only returns 50 rows, but takes 2 minutes to finish. • Query is simple join of 2 tables: Response and Policy 5 where-clause ‘filters’ 1 ‘filter’ result set: 20,000 rows result set: 15,000 rows RESP POL policy_id result: only 50 rows • Puzzle: How can query ever be quick, since the ‘super filter’ to reduce set to 50 rows is spread across two tables ?
Impatient Agent: Solution The Trick: 1) Perform pre-processing that uses filters, but retrieves only the key (policy_id) that is used as the join column. 2) This field is retrieved via index read only--thus avoiding the table reads that account for the delays. 3) Use this result set as the starting point for the original join. pre-processing driving “table” original join RESP POL RESP POL index reads only table reads on small set only
6 Daffy Database Links • Version 7.3.4 database. Database links working normally. Database has been running straight for 3 weeks. • Suddenly, the links fail: ORA-12154: TNS:could not resolve service name • Investigation shows tnsnames.ora file unchanged. Link definition unchanged. • Listener is running Oracle 8.0.4. Adding entry to tnsnames.ora file in Oracle 8 directory causes the links to work again! • Tests with new links show that the tnsnames.ora file for 7.3.4 is being ignored! • Further, if Oracle 8 tnsnames file is removed, the one in Oracle 7 is used instead!
Database Links: Solution • Clue: Listener had been restarted recently. But why should the Listener process (a server side function) affect finding the tns alias? • Metalink analyst suggested checking TNS_ADMIN; but how is that relevant? We are not creating a UNIX session. • Oracle Note 37808.1 clears-up the confusion: For database links, TNS_ADMIN takes its value from the value defined when the listener was started • That is, a client process (link) is influenced by a server-side process! • The listener had been started most recently with TNS_ADMIN set to 8.0.4, causing links to look in 8.0.4 tns file. • Unsetting TNS_ADMIN, then restarting listener caused all operations to run as expected.
7 All Primary Keys are Equal(But some are more equal than others) • Background: Application tuning often requires DBA to ‘lure’ optimizer into using certain indexes. Hints not always successful or possible. • Optimizer ‘likes’ certain indexes, because they typically are faster than others. The favorite: an index on Primary Key. • Scenario: 20 gigabyte insurance billing system. Oracle 8.0.4. • Particular query joins to a table called POLICY. Joinuses the PK index (policy_ID). • Query speed-up requires that join use new index on (PK, other col). • Created new double index--but optimizer would not cooperate. It foolishly insists on using the PK index, even if hint used. • Question: How can we get the optimizer to obey the DBA?
All Primary Keys are Equal: Solution • We need a way to ‘trick’ the optimizer into not using the PK index, but instead, use our ‘extra column’ index. • This is difficult, because the PK index is the #1 choice. • Trick: ‘Disguise’ the PK index as a unique index. • Oddity: If unique index already exists, addition of a matching PK will use the existing index. • In determining execution plan, optimizer will treat the new index as if it were a unique index, not a PK index. • So, drop the PK, build a matching unique index, then rebuild the PK. • Optimizer no longer stubbornly insists on using this ‘PK’ index, because it is not a true, pedigreed PK index--merely a unique index.
8 The Sad Case of the Homeless Archive Logs • Scenario: Large medical application; hundreds of connections into database. This is a critical, 24x7 server. • Users suddenly complain of database ‘hanging.’ • Alert log shows database unable to write archive log. ORA-00255: error archiving log 1 of thread 1, sequence # 200 ORA-00270: error creating archive log /demo/arch/1_200.dbf ORA-19504: failed to create file "/demo/arch/1_200.dbf" • Investigation reveals disk crash on disk housing archive logs. • No .dbf files on that disk--only archive logs. • Problem: How can database operation be resumed, with minimal disruption to hundreds of users?
Homeless Archive Logs: Solution • DBA decided to dynamicallyredirect the archive logs, thereby avoiding need to shutdown database. • To change destination of archive logs: alter system archive log start to ‘[new path]’ • For example: SVRMGRL> alter system archive log start to '/demo/arch2/' • Once the ‘backlog’ of archive logs is corrected, database automatically resumes normal operation. • Alert log now shows: Fri Feb 11 09:00:47 2000 ARCH: Archival started Archiver process freed from errors. No longer stopped.
9 The Puzzling Affair of the Old Archive Logs • With database in Archive Mode, old archive logs must be purged. • On Unix, this is simple, using the find command (in cron). • For example, to purge archive logs greater than 6 days old: find /logdir/arch -name “arch*” -mtime +6 -exec rm {} \; • Problem: How can we similarly purge the old archive logs on NT?
Archive Logs: Solution • On NT, there is apparently no ‘find’ command, as in Unix. • The archive logs can be found indirectly, because the database ‘knows’ when/where they were written. • Logs listed in V$ARCHIVED_LOG • Example, to remove logs older than 2 days: spool del_logs.sql select 'del ‘ || name from v$archived_log where completion_time < (sysdate - 2); spool off host del_logs • Similar query could be used to copy archive logs to backup.
10 SQL*Plus Madness • Scenario: Connection ‘hangs’ upon trying to connect in SQL*Plus. • Also get error message re DBMS_APPLICATION_INFO • DBA investigates. He finds: 1) Connect via Svrmgrl ok; 2) System user connects ok • Problem: How can DBA ‘fix’ SQL*Plus?
SQL*Plus Madness: Solution • Package DBMS_APPLICATION_INFO is used to ‘register’ a running application, for tracking/analysis purposes. • Once an application is ‘registered’ it will show up in v$session in ‘module’ field. • SQL*Plus is one of few applications that try to ‘register.’ This explains why svrmgrl still works. • This all points to problem with the ‘register’ of SQL*Plus. • The set-up script to build necessary tables is called ‘pupbld.sql’
SQL*Plus Madness: Solution(continued) • Action: Try to run pupbld.sql as SYSTEM but it hangs! • Investigation shows: product_profile table missing, but synonym still there! • Solution: drop public synonym for product_profile, then rebuild pupbld. All OK now.
Contact Information Chris Lawson clawson@dbspecialists.com http://www.dbspecialists.com Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA 94111