Ten More Database Mysteries

Ten More Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com

1 The Case of the Unwanted Services • DBA detects listener starting extra services NOT defined! > lsnrctl status • Services Summary… • database1 has 1 service handler(s) • database2 has 1 service handler(s) • databasez has 5 service handler(s) • They appear to be harmless, but what are these extra services? • Restarted Listener. At first all OK, then 5 extra services activated. • Confirmed that listener.ora file does NOT list the unwanted database. • DBA confirmed using correct listener.ora file. • Puzzle: There is no entry whatsoever for databasez.

Unwanted Services: Solution • Multiple listener services is indicative of MTS (Multi-threaded Server) Dispatchers; however, MTS was not being used! • DBA recalled that databasez was actually on a different server. • Solution: Database on different server started MTS services. • Key init.ora parameter: MTS_LISTENER_ADDRESS = "(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1526))" • Services match init.ora parameter MTS_DISPATCHERS. • How it happened: The remote init.ora file had originated on ‘our’ server, and the MTS parameter entries were not removed. • Resolution: On other server, DBA corrected the init.ora entry, bounced database. Extra services automatically stopped. • But - that’s not all...

Unwanted Services: The Rest of the Story • These extra services turned out to be not so “harmless” after all. • The ‘unwanted’ services are fully functional; they will intercept connection requests and redirect them to a different server! • Production users were redirected to a “clone” testing database on another server, and used it for several hours! • This occurs because each MTS service points to a particular dispatcher on a particular server. > lsnrctl services Services Summary... demo has 3 service handler(s) D001 <machine: hohp2, pid: 3385> (ADDRESS=(PROTOCOL=tcp)(DEV=17)(HOST=[1.2.3.4)(PORT=1179)) • Perhaps could be used for rapid failover to standby database?

2 Incognito DBA Privileges • Application testers need to “refresh” the test database often. • They will need to start/shut database as the oracle user. • DBA, being suspicious (justly) by nature, is reluctant to provide oracle account. • As punishment, users frequently ask DBA to run the refresh. • Question: How can developers perform tasks as oracle, without actually having the oracle account?

Incognito DBA Privileges: Solution • Create script to refresh database, including startup, shutdown, etc. • UNIX setuid feature allows programs to run with another identity. But setuid feature does not propagate to commands within the script. • Trick: Use a ‘wrapper’ C program that runs as oracle and calls the refresh script: # include <stdio.h> main () { system (“./refresh.ksh”); } • Activate setuid for the wrapper program: chmod 6711 refresh • Users simply enter refresh to run the refresh program as oracle. • Thanks to Brian Keating of Database Specialists for this solution.

3 NT Remote Control • NT box is running an important database. It appears that the Listener is down. • NT server is remote. • Question: How can you check the listener remotely?

NT Remote Control: Solution • Use ‘SC’ command. It is part of the NT Resource Kit. • The SC utility is very similar to the NET commands. • First enable a security "context" to the remote box; e.g. map a network drive to the remote server, and provide the administrator account/password. • The network drive may be disconnected after SC use is done. • Note: A mapped network drive is not required if you are connected locally as administrator, and the remote server uses the same administrator password. • Run SC using the format: SC \\[SERVER] [CMD] SERVICE

NT Remote Control: Solution (continued) Some common SC commands are: query--------Queries the status for a service start--------Starts a service pause--------Sends a PAUSE control request continue-----Sends a CONTINUE request stop---------Sends a STOP request to a service config-------Changes the configuration of a service qc-----------Queries the configuration information delete-------Deletes a service (from the registry) create-------Creates a service (add to registry)

NT Remote Control: Solution (continued) EXAMPLE: Check status of the Oracle 8 LISTENER service. NT server named ‘TSUNAMI’ SC \\TSUNAMI QUERY ORACLETNSLISTENER80 SERVICE_NAME: ORACLETNSLISTENER80 TYPE : 10 WIN32_OWN_PROCESS STATE : 4 RUNNING (STOPPABLE,NOT_PAUSABLE,IGNORES_SHUTDOWN) WIN32_EXIT_CODE : 0 (0x0) SERVICE_EXIT_CODE : 0 (0x0) CHECKPOINT : 0x0 WAIT_HINT : 0x0j

4 The Sad Case of the Failing Failover • Very critical Customer Support application for large HMO. • 24 x 7 crucial. Millions of monthly customers interactions. • IBM RS-6000 High-availability cluster (HACMP) • Scenario: Sudden hardware glitch causes failover to backup node. • Secondary node fails to start database! Database complains of missing datafile. • Critical application is down! • Objective: Early-rising DBA (from different project) not familiar with the setup needs to get database up!

The Failing Failover: Solution • Investigation shows all file systems mounted. Bad News: file in question is simply not there. • Good News: File is not supposed to be there--only a symbolic link. • Each node has symbolic links pointing from “nice-looking” paths to the actual location of raw device: /u04/oradata/data01.dbf > /dev/r123 • The primary node had all the links; the backup node was missing 2 symbolic links. Simply creating the links allowed database startup.

The Failing Failover: Solution(continued) • Moral of the story: High reliability hardware only as good as the weakest link--including maintenance requirements. • This strategy was very susceptible to human error. Every new .dbf file required new symbolic link on both nodes. One omission completely destroyed the high-availability plan.

5 The Case of the Impatient Insurance Agent • Query to find details for expired insurance policies only returns 50 rows, but takes 2 minutes to finish. • Query is simple join of 2 tables: Response and Policy 5 where-clause ‘filters’ 1 ‘filter’ result set: 20,000 rows result set: 15,000 rows RESP POL policy_id result: only 50 rows • Puzzle: How can query ever be quick, since the ‘super filter’ to reduce set to 50 rows is spread across two tables ?

Impatient Agent: Solution The Trick: 1) Perform pre-processing that uses filters, but retrieves only the key (policy_id) that is used as the join column. 2) This field is retrieved via index read only--thus avoiding the table reads that account for the delays. 3) Use this result set as the starting point for the original join. pre-processing driving “table” original join RESP POL RESP POL index reads only table reads on small set only

6 Daffy Database Links • Version 7.3.4 database. Database links working normally. Database has been running straight for 3 weeks. • Suddenly, the links fail: ORA-12154: TNS:could not resolve service name • Investigation shows tnsnames.ora file unchanged. Link definition unchanged. • Listener is running Oracle 8.0.4. Adding entry to tnsnames.ora file in Oracle 8 directory causes the links to work again! • Tests with new links show that the tnsnames.ora file for 7.3.4 is being ignored! • Further, if Oracle 8 tnsnames file is removed, the one in Oracle 7 is used instead!

Database Links: Solution • Clue: Listener had been restarted recently. But why should the Listener process (a server side function) affect finding the tns alias? • Metalink analyst suggested checking TNS_ADMIN; but how is that relevant? We are not creating a UNIX session. • Oracle Note 37808.1 clears-up the confusion: For database links, TNS_ADMIN takes its value from the value defined when the listener was started • That is, a client process (link) is influenced by a server-side process! • The listener had been started most recently with TNS_ADMIN set to 8.0.4, causing links to look in 8.0.4 tns file. • Unsetting TNS_ADMIN, then restarting listener caused all operations to run as expected.

7 All Primary Keys are Equal(But some are more equal than others) • Background: Application tuning often requires DBA to ‘lure’ optimizer into using certain indexes. Hints not always successful or possible. • Optimizer ‘likes’ certain indexes, because they typically are faster than others. The favorite: an index on Primary Key. • Scenario: 20 gigabyte insurance billing system. Oracle 8.0.4. • Particular query joins to a table called POLICY. Joinuses the PK index (policy_ID). • Query speed-up requires that join use new index on (PK, other col). • Created new double index--but optimizer would not cooperate. It foolishly insists on using the PK index, even if hint used. • Question: How can we get the optimizer to obey the DBA?

All Primary Keys are Equal: Solution • We need a way to ‘trick’ the optimizer into not using the PK index, but instead, use our ‘extra column’ index. • This is difficult, because the PK index is the #1 choice. • Trick: ‘Disguise’ the PK index as a unique index. • Oddity: If unique index already exists, addition of a matching PK will use the existing index. • In determining execution plan, optimizer will treat the new index as if it were a unique index, not a PK index. • So, drop the PK, build a matching unique index, then rebuild the PK. • Optimizer no longer stubbornly insists on using this ‘PK’ index, because it is not a true, pedigreed PK index--merely a unique index.

8 The Sad Case of the Homeless Archive Logs • Scenario: Large medical application; hundreds of connections into database. This is a critical, 24x7 server. • Users suddenly complain of database ‘hanging.’ • Alert log shows database unable to write archive log. ORA-00255: error archiving log 1 of thread 1, sequence # 200 ORA-00270: error creating archive log /demo/arch/1_200.dbf ORA-19504: failed to create file "/demo/arch/1_200.dbf" • Investigation reveals disk crash on disk housing archive logs. • No .dbf files on that disk--only archive logs. • Problem: How can database operation be resumed, with minimal disruption to hundreds of users?

Homeless Archive Logs: Solution • DBA decided to dynamicallyredirect the archive logs, thereby avoiding need to shutdown database. • To change destination of archive logs: alter system archive log start to ‘[new path]’ • For example: SVRMGRL> alter system archive log start to '/demo/arch2/' • Once the ‘backlog’ of archive logs is corrected, database automatically resumes normal operation. • Alert log now shows: Fri Feb 11 09:00:47 2000 ARCH: Archival started Archiver process freed from errors. No longer stopped.

9 The Puzzling Affair of the Old Archive Logs • With database in Archive Mode, old archive logs must be purged. • On Unix, this is simple, using the find command (in cron). • For example, to purge archive logs greater than 6 days old: find /logdir/arch -name “arch*” -mtime +6 -exec rm {} \; • Problem: How can we similarly purge the old archive logs on NT?

Archive Logs: Solution • On NT, there is apparently no ‘find’ command, as in Unix. • The archive logs can be found indirectly, because the database ‘knows’ when/where they were written. • Logs listed in V$ARCHIVED_LOG • Example, to remove logs older than 2 days: spool del_logs.sql select 'del ‘ || name from v$archived_log where completion_time < (sysdate - 2); spool off host del_logs • Similar query could be used to copy archive logs to backup.

10 SQL*Plus Madness • Scenario: Connection ‘hangs’ upon trying to connect in SQL*Plus. • Also get error message re DBMS_APPLICATION_INFO • DBA investigates. He finds: 1) Connect via Svrmgrl ok; 2) System user connects ok • Problem: How can DBA ‘fix’ SQL*Plus?

SQL*Plus Madness: Solution • Package DBMS_APPLICATION_INFO is used to ‘register’ a running application, for tracking/analysis purposes. • Once an application is ‘registered’ it will show up in v$session in ‘module’ field. • SQL*Plus is one of few applications that try to ‘register.’ This explains why svrmgrl still works. • This all points to problem with the ‘register’ of SQL*Plus. • The set-up script to build necessary tables is called ‘pupbld.sql’

SQL*Plus Madness: Solution(continued) • Action: Try to run pupbld.sql as SYSTEM but it hangs! • Investigation shows: product_profile table missing, but synonym still there! • Solution: drop public synonym for product_profile, then rebuild pupbld. All OK now.

Contact Information Chris Lawson clawson@dbspecialists.com http://www.dbspecialists.com Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA 94111

Ten More Database Mysteries

Ten More Database Mysteries

Presentation Transcript

Menopause’s Mysteries

Ten Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com clawson@dbspecialists.com

DATABASE MYSTERIES: BATCH JOB DIAGNOSTICS

Life’s Mysteries

Mysteries

Mysteries

Space Mysteries

Unsolved mysteries

Mysteries

MYSTERIES

Mysteries

Mysteries

More Object Database Concepts

Solving Mysteries

Mysteries

AFI Top Ten Database

history’s mysteries

Mysteries

Capital Mysteries

Mysteries

Top Ten Mysteries of the Universe

Mysteries