B3: Putting OpenEdge Auditing to Work: Dump and Load with (Almost) No Downtime David EDDY Senior Solution Consultant
Why? • Get access to Type-II storage areas • Reorganise database • Can’t afford to have DB off-line!
So What’s the Big Deal? • Dump and load with 2 small downtime slots: • A few minutes at front end • As little as 5 min at back end – YMMV • Old DB continues in live use while building new • No lost data
Agenda Introduction The Dump/Load Process Techniques: Turning Auditing to this Purpose Techniques: Offline Dump and Load Other Issues
Introduction • Why is this important? • The plan • Why it works • Why it might not work • Why auditing?
Why is this Important? • Type II Storage Areas • 15-20% performance boost • Removal of limits • On-line maintainability • Future capabilities (e.g. per-area backups)
Plan Timeline (Old DB and New DB tracks) • Add storage areas for auditing data • Enable auditing • Add auditing policy & enable • Copy live DB via backup/restore • Dump • Build new structure • Load & index build • Initial replication • Second replication • Changeover: • Final replication • Verification test (DB content comparison) • Signoff • Make new DB live
Why It Works • Bulk of DB activity is reads • Updates small proportion of total • Batch update can be done in << real time • Multiple updates per row • Only interested in the last • Further decreases total work replicating changes • ...for most databases
Why It Might Not Work • Update rate too high / no duplication of activity • Cost of change capture too high • No unique key • Record replication needs unique key
Why Auditing • Reliable, light-weight change capture • Implemented in DB engine core • Easy to set up • Can use GUI tool for policy set up if desired • No code impact • See OpenEdge Core Services manual for more detail about auditing
Agenda Introduction The Dump/Load Process Techniques: Turning Auditing to this Purpose Techniques: Offline Dump and Load Other Issues
The Dump/Load Process • Migrate to OpenEdge 10.1C in-situ • Prepare auditing policy • Initiate auditing • Offline dump/load • Repeated batch replication to “catch up” • Verification of database contents & signoff • Changeover
The Dump/Load Process: Migrate to OpenEdge 10.1C In-Situ • Auditing requires 10.1 • OpenEdge 10.1C+ ideally • Must run 10.1 in production during change capture • If this represents a migration, you must first: • Do a trial migration in your test environment • Test & get signoff • Migrate clients to 10.1 if needed
Migration In-Situ (continued) • Convert database to 10.1 if not already there • First disable JTA, Replication, AI if enabled • Then:
proutil db -C truncate bi
$DLC10/bin/proutil db -C conv910
• New DB will have 64-bit features enabled • So we will also enable them in migrated DB to catch problems early:
proutil db -C enableseq64
proutil db -C enablelargekeys
The Dump/Load Process: Prepare Auditing Policy • Nominate tables to be replicated • Omit any scratch/work tables • Specify unique key for each table • Ideally corresponding with an index • Create auditing policy • APMT (from DB Admin Tool) • Direct policy data creation • Automation • Extract key field list from DB • Load policy without using APMT
The Dump/Load Process: Initiation of Auditing • Add storage areas to contain auditing data:
d "AuditData",128;512 /path/to/extents
d "AuditIndex",16;64 /path/to/extents
prostrct add[online] db audit.st
• Enable auditing (and create auditing tables):
proutil db -C enableauditing area AuditData indexarea AuditIndex deactivateidx
• Consider enabling large file support:
proutil db -C enablelargefiles
• Load and enable the policies • APMT or bespoke tool
The Dump/Load Process: Offline Dump/Load • Take online backup of production DB • Restore in working area • Dump contents • Don’t dump scratch/work tables • Delete old DB • Build new structure for DB • Modify schema to suit (and load into new DB) • Load dumped data • Procedural details later
The Dump/Load Process: Batch Replication • Gather committed changes • Record timestamp of last change gathered • Eliminate duplicates • Only interested in last change for each row • Split list of changes into multiple lists • Use multiple loaders to load changes • Write a script to do all this
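The slides leave the replication driver as an exercise ("write a script to do all this"). A hypothetical shell outline of one replication pass is sketched below; it assumes two ABL programs, gather_changes.p and apply_changes.p, that implement the extraction/reduction and the loading described later in the Techniques sections. The program names, parameters and file layout are placeholders, not part of the original material.

#!/bin/sh
# One replication pass (sketch). LIVEDB is the old/live DB, NEWDB the new copy.
LIVEDB=/db/live/proddb
NEWDB=/db/new/proddb
THREADS=4

# 1. Gather committed changes since the last recorded timestamp,
#    eliminate duplicates, and split them into $THREADS change files.
$DLC/bin/_progres -b -db $LIVEDB -p gather_changes.p \
    -param "threads=$THREADS,outdir=/work/changes" || exit 1

# 2. Apply each change file to the new DB in its own loader session.
for f in /work/changes/changes.*; do
    $DLC/bin/_progres -b -db $LIVEDB -db $NEWDB -p apply_changes.p \
        -param "file=$f" &
done
wait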
Batch Replication – Catching Up • Initial run will take some time • 3 days work might take 4 hours to apply • Next run will be faster • 4 hours work might take 15 minutes to apply • Once caught up, replicate periodically • Keep replication execution time low • Replication interval is a trade-off • Less frequent means more data reduction through duplicate elimination • Too infrequent means replication takes too long
The Dump/Load Process: Verification of Contents • You should have a basis for certifying that the new DB is a good copy of the old • proutil tabanalys • Application reports e.g. trial balance • Everyone must agree that the tests are a valid basis for certification • Tests should not take too long to run
The Dump/Load Process: Changeover • Block user access to live DB • Perform final replication • Dump users, sequence values from old DB • Verify content equality (& signoff) • Make new DB live • Load users, sequence values into new DB • Shut down old DB • Change references in startup scripts, OR move or delete old DB and replace with new DB • Allow users access • Initiate backup
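Not spelled out on the slide: one way to carry sequence current values across is DYNAMIC-CURRENT-VALUE, reading each sequence from the old DB and setting it in the new one. A minimal ABL sketch, assuming both databases are connected with the logical names olddb and newdb (those names, and the assumption that every sequence exists in both schemas, are mine):

/* copyseq.p - sketch: copy sequence current values from olddb to newdb. */
DEFINE VARIABLE cSeq AS CHARACTER NO-UNDO.

FOR EACH olddb._Sequence NO-LOCK:
    cSeq = olddb._Sequence._Seq-Name.
    /* Assumes a sequence of the same name exists in the new schema. */
    DYNAMIC-CURRENT-VALUE(cSeq, "newdb") = DYNAMIC-CURRENT-VALUE(cSeq, "olddb").
END.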
Agenda Introduction The Dump/Load Process Techniques: Turning Auditing to this Purpose Techniques: Offline Dump and Load Other Issues
Techniques: Auditing for Dump/Load • Audit policy • Audit data tables • Extracting key data • Data reduction • Applying changes
Audit Policy • Defines auditing: • What tables/fields are audited • What data is recorded • Can be multiple policies active • DB will merge at runtime to generate “effective” policy • Report available through APMT • Contained in tables • _aud-*-policy
Audit Policy Starting Point • You must know the name of each table you will replicate • For each replicated table, you must know the list of fields used to uniquely identify each row • Suggest automatic generation from schema (unique indexes)
# Table keys for dump/load auditing policy –
# C:\oebin\oework\tmp\test
Salesrep: SalesRep
Vacation: StartDate
Department: DeptCode
State: State
...
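As an illustration of the suggested automatic generation, here is a minimal ABL sketch that derives a key list from the metaschema (_File, _Index, _Index-Field, _Field). It assumes each table's primary index is unique and usable as the replication key, and the output file name is a placeholder; tables without a unique primary index are flagged for manual attention.

/* genkeys.p - sketch: generate a table/key-field list from the schema. */
OUTPUT TO "tablekeys.txt".
FOR EACH _File NO-LOCK
    WHERE _File._File-Number > 0 AND _File._File-Number < 32000:
    FIND _Index WHERE RECID(_Index) = _File._Prime-Index NO-LOCK NO-ERROR.
    IF NOT AVAILABLE _Index OR NOT _Index._Unique THEN DO:
        PUT UNFORMATTED "# " _File._File-Name ": no unique primary index - choose a key by hand" SKIP.
        NEXT.
    END.
    PUT UNFORMATTED _File._File-Name ": ".
    FOR EACH _Index-Field OF _Index NO-LOCK BY _Index-Field._Index-Seq:
        FIND _Field WHERE RECID(_Field) = _Index-Field._Field-Recid NO-LOCK.
        PUT UNFORMATTED _Field._Field-Name " ".
    END.
    PUT UNFORMATTED SKIP.
END.
OUTPUT CLOSE.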
Audit Policy with APMT • GUI tool provided with OpenEdge distribution • How to use: • Define a new policy • Add tables • Add fields • Add events • Enable policy • Export new policy to XML (for later upload to live)
Audit Policy with APMT: Policy
These values are the defaults.
Audit Policy with APMT: Tables
Note the “standard” audit level.
Audit Policy with APMT: Fields
List of key fields (for the Customer table in this example). Don’t forget to set the identifying sequence number for each field in the unique key.
Audit Policy with APMT: Events
You need to define audit events, which describe the events used by this policy. Events 5100-5102 are the standard events for capturing creates, updates, and deletes in the database. Don’t forget to do this or no changes will be captured.
Audit Policy Tables • If you’re not using APMT, you need to set: • _aud-audit-policy • _aud-file-policy • _aud-field-policy • _aud-event-policy
Audit Policy Tables: _aud-audit-policy • One record per policy (i.e. we add 1 record)
_audit-policy-guid = base64-encode(generate-uuid)
_audit-data-security-level = 0
_audit-custom-detail-level = 0
_audit-policy-active = true
Audit Policy Tables: _aud-file-policy • One record for each table being audited
_audit-policy-guid = _aud-audit-policy._audit-policy-guid
_file-name = "table name"
_owner = "PUB" /* usually */
_audit-create-level = 12
_audit-update-level = 12
_audit-delete-level = 12
_create-event-id = 5100
_update-event-id = 5101
_delete-event-id = 5102
Audit Policy Tables: _aud-field-policy • One record for each key field of each table
_audit-policy-guid = _aud-audit-policy._audit-policy-guid
_file-name = "table name"
_owner = "PUB" /* usually */
_field-name = "field name"
_audit-identifying-field = seq_num_of_field_in_unique_key
_audit-create-level = 12
_audit-update-level = 12
_audit-delete-level = 12
Audit Policy Tables: _aud-event-policy • Three records per policy (one for each event used)
_audit-policy-guid = _aud-audit-policy._audit-policy-guid
_event-id = {5100|5101|5102}
_event-level = 1
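To make the four policy-table slides concrete, the ABL sketch below creates one policy directly in the _aud-*-policy tables for a single table, using the field values listed above. The table name, key list and policy name are placeholders, the _audit-policy-name field is assumed (it is not shown on the slides), and writing to these tables normally requires audit administrator privileges. Real code would loop over the generated key list rather than one hard-coded table.

/* mkpolicy.p - sketch: create a dump/load audit policy for one table. */
DEFINE VARIABLE cGuid  AS CHARACTER NO-UNDO.
DEFINE VARIABLE cTable AS CHARACTER NO-UNDO INITIAL "Customer".  /* placeholder */
DEFINE VARIABLE cKeys  AS CHARACTER NO-UNDO INITIAL "CustNum".   /* placeholder key list */
DEFINE VARIABLE i      AS INTEGER   NO-UNDO.

/* GUID expression as shown on the slide; wrap the UUID in a MEMPTR
   first if your ABL version insists on one. */
cGuid = BASE64-ENCODE(GENERATE-UUID).

DO TRANSACTION:
    CREATE _aud-audit-policy.
    ASSIGN _aud-audit-policy._audit-policy-guid         = cGuid
           _aud-audit-policy._audit-policy-name         = "DumpLoadReplication"  /* field assumed */
           _aud-audit-policy._audit-data-security-level = 0
           _aud-audit-policy._audit-custom-detail-level = 0
           _aud-audit-policy._audit-policy-active       = TRUE.

    CREATE _aud-file-policy.
    ASSIGN _aud-file-policy._audit-policy-guid  = cGuid
           _aud-file-policy._file-name          = cTable
           _aud-file-policy._owner              = "PUB"
           _aud-file-policy._audit-create-level = 12
           _aud-file-policy._audit-update-level = 12
           _aud-file-policy._audit-delete-level = 12
           _aud-file-policy._create-event-id    = 5100
           _aud-file-policy._update-event-id    = 5101
           _aud-file-policy._delete-event-id    = 5102.

    DO i = 1 TO NUM-ENTRIES(cKeys):
        CREATE _aud-field-policy.
        ASSIGN _aud-field-policy._audit-policy-guid       = cGuid
               _aud-field-policy._file-name               = cTable
               _aud-field-policy._owner                   = "PUB"
               _aud-field-policy._field-name              = ENTRY(i, cKeys)
               _aud-field-policy._audit-identifying-field = i
               _aud-field-policy._audit-create-level      = 12
               _aud-field-policy._audit-update-level      = 12
               _aud-field-policy._audit-delete-level      = 12.
    END.

    DO i = 5100 TO 5102:
        CREATE _aud-event-policy.
        ASSIGN _aud-event-policy._audit-policy-guid = cGuid
               _aud-event-policy._event-id          = i
               _aud-event-policy._event-level       = 1.
    END.
END.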
Audit Data Tables • Only interested in _aud-audit-data • Key fields: • _audit-date-time • _event-id • _event-context • Some indexes present even after deactivateidx: • _Audit-data-guid • _Audit-date-time
Audit Data Tables: _aud-audit-data • _audit-date-time is a datetime-tz field • _event-id tells you whether it was a create, update or delete • _audit-event-context contains the table name and key field values • _audit-event-data contains changed field values but can be safely ignored
_audit-event-context
_audit-event-context = Tablename + chr(6) + list-of-key-field-values
list-of-key-field-values = key-value-1 [ + chr(7) + key-value-2 ] …
The order of key field values is dictated by the identifying field sequence number.
Extracting key data • Pull data from _aud-audit-data in _audit-date-time order • Want only those rows which can be share-locked • Unlockable audit data means the transaction is still in progress • Must not replicate uncommitted data • Extract only up to the last row whose timestamp differs from that of the last lockable row • Order is not guaranteed within the same date/time value • Need only 3 fields: • _event-context • _event-id • _audit-date-time
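A hedged ABL sketch of the lock-based cut-off described above: scan _aud-audit-data in timestamp order, stop at the first record that cannot be share-locked (its transaction is still open), and treat everything strictly older than the last lockable row's timestamp as safe to extract. Field names follow the slides; check them against your schema (the slides refer to the context field both as _event-context and _audit-event-context).

/* Sketch: find the newest timestamp that is safe to replicate this pass. */
DEFINE VARIABLE dtCutoff AS DATETIME-TZ NO-UNDO.

FOR EACH _aud-audit-data NO-LOCK BY _aud-audit-data._Audit-date-time:
    FIND CURRENT _aud-audit-data SHARE-LOCK NO-WAIT NO-ERROR.
    IF LOCKED(_aud-audit-data) THEN LEAVE.       /* uncommitted transaction: stop  */
    dtCutoff = _aud-audit-data._Audit-date-time. /* last lockable row's timestamp  */
    FIND CURRENT _aud-audit-data NO-LOCK.        /* give the share lock back       */
END.
/* Only rows with _Audit-date-time strictly earlier than dtCutoff are extracted,
   because ordering within a single timestamp value is not guaranteed. */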
Data reduction • Need only last change for each row mentioned • Ignore or discard earlier changes
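One way to get the "last change wins" reduction almost for free, continuing the sketch above (dtCutoff and the field-name caveats carry over): gather contexts into a temp-table with a unique index on the context string, overwriting the event id whenever the same row shows up again, so that only the final event per row survives.

DEFINE TEMP-TABLE ttChange NO-UNDO
    FIELD cContext AS CHARACTER     /* table name + key values            */
    FIELD iEventId AS INTEGER       /* 5100/5101/5102 of the LAST change  */
    INDEX ix-context IS UNIQUE PRIMARY cContext.

FOR EACH _aud-audit-data NO-LOCK
    WHERE _aud-audit-data._Audit-date-time < dtCutoff
    BY _aud-audit-data._Audit-date-time:
    FIND ttChange WHERE ttChange.cContext = _aud-audit-data._Event-context NO-ERROR.
    IF NOT AVAILABLE ttChange THEN DO:
        CREATE ttChange.
        ttChange.cContext = _aud-audit-data._Event-context.
    END.
    ttChange.iEventId = _aud-audit-data._Event-id.  /* later events overwrite earlier ones */
END.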
Applying changes • Changes can be applied in any order • Recommend multi-threading the load • OpenEdge performance is best with multiple threads • Tune new DB for heavy transaction activity • Write a program to read the event data and emulate it in the new DB, fail-safe: • Create/Update = create if the row does not exist in the new DB; copy the row from the live DB if it does exist • Delete = do nothing if the target row does not exist • Batch transactions for performance • Run multiple threads e.g. 4-8
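A hedged sketch of the apply step for a single table, assuming both databases are connected (logical names livedb and newdb) and a hypothetical Customer table keyed on CustNum; real code would be generated per table or written with dynamic buffers, and transaction batching and multi-threading are omitted for brevity.

/* apply_change.p (hypothetical) - apply one reduced change, fail-safe.
   chr(6) separates table name from key values, chr(7) separates key values. */
DEFINE INPUT PARAMETER cContext AS CHARACTER NO-UNDO.
DEFINE INPUT PARAMETER iEventId AS INTEGER   NO-UNDO.

DEFINE VARIABLE cKeys    AS CHARACTER NO-UNDO.
DEFINE VARIABLE iCustNum AS INTEGER   NO-UNDO.

ASSIGN cKeys    = ENTRY(2, cContext, CHR(6))
       iCustNum = INTEGER(ENTRY(1, cKeys, CHR(7))).

DO TRANSACTION:
    IF iEventId = 5102 THEN DO:                          /* delete event */
        FIND newdb.Customer WHERE newdb.Customer.CustNum = iCustNum
            EXCLUSIVE-LOCK NO-ERROR.
        IF AVAILABLE newdb.Customer THEN DELETE newdb.Customer.
        /* target row already absent: do nothing */
    END.
    ELSE DO:                                             /* create or update event */
        FIND livedb.Customer WHERE livedb.Customer.CustNum = iCustNum
            NO-LOCK NO-ERROR.
        IF NOT AVAILABLE livedb.Customer THEN RETURN.    /* row gone from live DB; a
                                                            later delete event covers it */
        FIND newdb.Customer WHERE newdb.Customer.CustNum = iCustNum
            EXCLUSIVE-LOCK NO-ERROR.
        IF NOT AVAILABLE newdb.Customer THEN CREATE newdb.Customer.
        BUFFER-COPY livedb.Customer TO newdb.Customer.
    END.
END.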
Agenda Introduction The Dump/Load Process Techniques: Turning Auditing to this Purpose Techniques: Offline Dump and Load Other Issues
Techniques: Off-Line Dump and Load • Planning the new database • Adjusting the schema • Dumping • Loading • Index rebuild
Planning the New Database • Plan to use Type II storage areas everywhere • Rows per block not that critical • Type-II areas have no max size (>1PB @ 128) • Use 128 or 256 for all table areas • Use 8 or 16 for all indexes • Blocks per cluster more important • Use 512 bpc only for objects 20MB+ • Use 8 bpc otherwise • Distinct data & index areas • Try to equalise area sizes • Index build is parallel across areas
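To illustrate the sizing advice, a fragment of a hypothetical .st file for the new database, following the same d-line syntax used for the audit areas earlier: table areas at 128 rows per block, index areas at 8, with 512 blocks per cluster reserved for large objects and 8 for the rest. Area names and paths are placeholders.

d "DataLarge",128;512 /db/new/data
d "DataSmall",128;8 /db/new/data
d "IndexLarge",8;512 /db/new/idx
d "IndexSmall",8;8 /db/new/idx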
Adjusting the Schema • Old .df has old storage areas encoded • Manual editing is error prone • Recommended technique: • Build empty DB with both old & new areas (only 1 variable extent for each area) • Load old schema • Build & execute a script which tablemoves each table to the new areas • Dump new .df • New .st and .df define the new database
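A hedged sketch of the tablemove script, assuming a made-up mapping file areas.map that lists each table with its target data and index areas; proutil tablemove relocates the table and its indexes inside the scratch DB so the dumped .df carries the new area assignments.

#!/bin/sh
# Move every table (and its indexes) to its new storage area in the scratch DB.
SCRATCHDB=/work/scratchdb

# areas.map: <table> <new-table-area> <new-index-area>, one per line
while read table tablearea indexarea; do
    proutil $SCRATCHDB -C tablemove "$table" "$tablearea" "$indexarea" || exit 1
done < areas.map
# Then dump the new .df (Data Dictionary / Data Administration dump).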
Dumping • Use binary dump • No size limit • No data corruption risk • Self-balancing parallel technique: • List tables in descending order of size • Start multiple dump threads • Each thread takes the first table from the list and dumps it • Speed is nice but not critical • A multi-threaded dump might be useful for very large tables
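A sketch of that self-balancing technique in shell, assuming a tables.txt listing table names largest-first (for example taken from a tabanalys report) and a Linux-style flock for serialising access to the shared list; paths and thread count are placeholders.

#!/bin/sh
DB=/work/copydb          # restored copy of the live DB
DUMPDIR=/work/dump
THREADS=4

# tables.txt: one table name per line, largest first.
worker() {
    while :; do
        # Atomically take the next table off the shared list.
        tbl=$(flock tables.lock -c \
              'head -n 1 tables.txt; tail -n +2 tables.txt > t.$$ && mv t.$$ tables.txt')
        [ -z "$tbl" ] && break
        proutil $DB -C dump "$tbl" $DUMPDIR
    done
}

for i in $(seq $THREADS); do
    worker &
done
wait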
Loading • Use binary load • Difficult to go faster than an offline load as follows:
proutil db -C truncate bi -bi 512 -biblocksize 8
for f in *.bd* ; do
    proutil db -C load $f -r -B 500
done
• Don’t bother with “build indexes” • Multi-threaded idxbuild is faster
Index Rebuild • Set up sort file dbname.srt • Invoke index build in multi-threaded mode:
proutil db -C idxbuild all -TB 31 -TM 32 -SS db.srt -SG 64 -B 500 -thread 1 -threadnum cpus*1.5
• IDXBUILD is multi-threaded by default • No need to specify -thread, -threadnum if happy with 1 thread per CPU
Agenda Introduction The Dump/Load Process Techniques: Turning Auditing to this Purpose Techniques: Offline Dump and Load Other Issues