350 likes | 435 Views
Deleting LOTS of Data. Author: Randy Cunningham, OCP. Why Use DELETE?. Retention policies; good data hygiene Data de-duplication Corporate mergers & acquisitions Purge and archival processing Cleaning out temporary & work tables Reclaim storage space Obtain better performance?….
E N D
Deleting LOTSof Data Author: Randy Cunningham, OCP
Why Use DELETE? • Retention policies; good data hygiene • Data de-duplication • Corporate mergers & acquisitions • Purge and archival processing • Cleaning out temporary & work tables • Reclaim storage space • Obtain better performance?…
What Are The Issues? • DELETE never finishes • DELETE takes too long • Locking • Impact on application performance • Desired space not released by DELETE • UNDO space & errors
MVLog X 5 1 2 3 6 4 X X Undo Insert… Redo What Happens During a DELETE
Alternatives to DELETE • TRUNCATE TABLE • CTAS, truncate and reinsert data to keep • Use partitioning: • Truncate partition • Drop partition • Exchange partition • DELETE confined to one, or a few, partitions
TRUNCATE TABLE • Upside: typically many times faster than using an unqualified DELETE operation • Downsides: • Does not operate in transactional context (it’s DDL) • You can’t rollback a TRUNCATE • Cannot truncate parent rows in a R.I. Constraint • It is DDL and requires DROP TABLE privileges
Delegating TRUNCATE rights CREATE PROCEDURE trunc_do (fqtn VARCHAR2) IS BEGIN EXECUTE IMMEDIATE'TRUNCATE TABLE '||fqtn; END trunc_do; GRANT EXECUTE ON trunc_do TOPUBLIC;
Delegating TRUNCATE rights CREATE PROCEDURE trunc_do (fqtn VARCHAR2) IS BEGIN EXECUTE IMMEDIATE'TRUNCATE TABLE '||fqtn; END trunc_do; GRANT EXECUTE ON trunc_do TO PUBLIC;
Delegating TRUNCATE rights CREATE PROCEDURE trunc_do (fqtn VARCHAR2) IS BEGIN EXECUTE IMMEDIATE'TRUNCATE TABLE '||fqtn; END trunc_do; CREATE PROCEDURE trunc (owner IN VARCHAR2 DEFAULT USER, table_name IN VARCHAR) AUTHID CURRENT_USER IS fqtn VARCHAR2(80) := owner || '.' || table_name; BEGIN EXECUTE IMMEDIATE-- Ensure DELETE rights 'DELETE FROM ' || fqtn || ' WHERE 1=2'; trunc_do (fqtn); END trunc; GRANT EXECUTE ON trunc TOPUBLIC;
Create Table as Select… • Use where most of the data are deleted • Good for sidestepping UNDO issues • Provides quick & simple fault resilience • Downside: requires intermediate storage
Create Table as Select: Example CREATE TABLE student_temp AS SELECT * FROM student WHERE status_flag IN ('A','N','T') OR enroll_date >= to_date (’02/15/2006’, ‘MM/DD/YYYY’); TRUNCATE TABLE student; INSERT /*+APPEND*/ INTO student SELECT * FROM student_temp; DROP TABLE student_temp;
Create Table as Select: Observations • Run steps manually, or provide thorough error handling • All limitations of TRUNCATE TABLE apply • Ensure there is enough space for the CTAS temporary holding table • Be sure to use APPEND hint in the INSERT • Does not work for tables with LONG data type (use SQL*Plus COPY command instead)
Table Partitioning • DROP PARTITION – ideal for temporal data • TRUNCATE PARTITION – for cyclic data • EXCHANGE PARTITION – for archival • DELETE … [PARTITION] – a full scan of the partition is quicker than a full table scan
Dropping a Table Partition • Quickly eliminates an entire range or list • Example: ALTER TABLE enrollment DROP PARTITION Year2001 UPDATE GLOBAL INDEXES; 2002 2003 2004 2005 2006
Jun May Apr Mar Feb Jan Dec Nov Oct Sep Aug Jul Truncating a Table Partition • Quick, easy way to manage cyclic partitions • Example: ALTER TABLE enrollment TRUNCATE PARTITION mar_data UPDATE GLOBAL INDEXES; X
Exchanging a Table Partition • Quick way to excise data from the table, but retain it for archival or backup: CREATE TABLE enrolled03_2005 AS SELECT * FROM enrollment WHERE 1=2; ALTER TABLE enrollment EXCHANGE PARTITION mar_data WITH TABLE enrolled03_2005 UPDATE GLOBAL INDEXES;
DELETE on a Table Partition • If the predicate filters the partition key, then only specific partitions are scanned. • In this case, it is not necessary to specify PARTITION in the DELETE command. • Look for these in the EXPLAIN PLAN: • PARTITION RANGE SINGLE • PARTITION RANGE ITERATOR
If DELETE You Must… • Check the EXPLAIN PLAN for the DELETE! • DELETE in batches • Get indexes out of the way • Use ROWID to minimize interference • Use ROWID to implement parallelism
Check EXPLAIN PLAN for DELETE • If most rows are being deleted, you will want to see a full table scan. • If most blocks are being visited, you will want to see a full table scan. • Optimize subquery form DELETE to perform well as a stand-alone SELECT.
DELETE in batches • Delete a million (or so) rows at a time: DELETE FROM evt WHERE status = 'X' AND rownum <= 1000000; • Delete an unrelated partition at a time: DELETE FROM evt PARTITION div162 WHERE status = 'X';
Get Indexes Out of the Way • Old way: Drop the indexes and build them from scratch following deletion • New way: Alter the indexes unusable and rebuild them following deletion: ALTER INDEXu_name_ixUNUSABLE; DELETE FROM namesWHERE … ALTER INDEXu_name_ixREBUILD;
Before You Tamper With Indexes… • Be certain an index isn’t needed to facilitate the access path for your DELETE • Ensure that a significant proportion of the rows in the table are affected • Test to be sure that it is worthwhile • Be sure that other database operations are not relying on the index (best done during a scheduled maintenance window)
ROWID is your friend • Isolates query operation from DELETE • Minimizes number of block changes • Is blazingly fast and efficient • Overcomes key preserved table restrictions • Can facilitate DELETE workload mgmt: • Restartability • Home-brew parallelization • Batching
ROWID Case Study 1 • We have identified extraneous rows: SELECT * FROM ( SELECT domain_owner, latest_uce, MAX(latest_uce) OVER (PARTITION BY domain_owner) very_latest FROM spammers) WHERE latest_uce < very_latest
ROWID Case Study 1 DELETE FROM ( SELECT * FROM ( SELECT domain_owner, latest_uce, MAX(latest_uce) OVER (PARTITION BY domain_owner) very_latest FROM spammers) WHERE latest_uce < very_latest)
ROWID Case Study 1 Result: ORA-01752: cannot delete from view without exactly one key-preserved table
ROWID Case Study 1 DELETE FROM spammers WHERE ROWID IN ( SELECT ROWID FROM ( SELECT domain_owner, latest_uce, MAX(latest_uce) OVER (PARTITION BY domain_owner) very_latest FROM spammers) WHERE latest_uce < very_latest)
ROWID Case Study 2 - Deduplication DELETEFROM Customers WHERE ROWID IN ( SELECT DISTINCT ROWID FROM (SELECT ROWID, MIN (ROWID) OVER (PARTITION BY Cust_Last_Name, Cust_First_name) Best FROM Customers) WHERE ROWID <> Best);
ROWID Case Study 3 - Staging CREATE GLOBAL TEMPORARY TABLE Rowid_Tbl (Xrowid ROWID) ON COMMIT PRESERVE ROWS; INSERT INTO Rowid_Tbl SELECT DISTINCT ROWID FROM (SELECT ROWID, …FROMHuge_Table …); DELETEFROMHuge_Table WHERE ROWID IN (SELECT Xrowid FROMRowid_Tbl) ;
ROWID Case Study 4 – Workload DEFINE N=8 -- # buckets desired CREATE TABLE Delete_Driver (rid ROWID, pctile NUMBER, PRIMARY KEY (rid)); INSERT /*+APPEND*/ INTO Delete_Driver SELECT ROWID rid, ntile(&N) OVER (ORDER BY ROWID) pctile FROM Warranty WHERE Warranty_Expiration < TO_DATE('01/01/2002','MM/DD/YYYY');
ROWID Case Study 4 - Workload • Each batch, whether run in serial or parallel, operates this way: Define Job=4 -- set between 1 and 8 DELETE FROM (SELECT NULL FROM Warranty, Delete_Driver WHERE Warranty.ROWID = Delete_Driver.rid AND pctile=&Job); COMMIT;
ROWID Caveats • Not for use in portable SQL… it is specific to Oracle • Don’t use stale ROWIDs… don’t keep them around permanently • Not for use where rows migrate constantly