After Imaging The DBA’s Best Friend
A Few Words About The Speaker • Tom Bascom • Progress® User since 1987 • White Star Software, LLC • DBAppraise®, LLC • Consulting Services related to Progress Databases and Application Architecture. • tom@wss.com • tom@dbappraise.com
What is it? and Why Do I Need it?
What is After-Imaging? • A journal of transaction “notes” that can be replayed against a baseline backup to restore a database to the last completed transaction or a point in time or a specific transaction number. • This is the same concept that some other databases refer to as the “redo log”. • Differs from the before image file (undo log) as space is not reused without interaction or scripting.* * 10.1B AI Archiver improves this.
Why do I need after-imaging? • Protection from media loss -- such as bad tapes, a crashed disk, a destroyed data center or stolen servers…
I have backups. Do I still need after-imaging? • With a backup your potential exposure to data loss is the entire time period between backups. • For example: if you do nightly backups and your disk crashes at 4:45pm, you restore from backup and lose an entire day of work. If you have one or more bad tapes your data loss could be much worse. • With after-imaging you restore the same backup, roll forward your archived ai files and lose only uncommitted transactions.
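To put numbers on the exposure argument, here is a rough Python sketch. The function names, the 01:00 backup time and the 16:31 extent switch are illustrative assumptions, not figures from the talk. It also takes the pessimistic view that the busy extent is lost together with the database, so only extents archived before the crash count.

```python
from datetime import datetime, timedelta

def exposure_backup_only(last_backup: datetime, failure: datetime) -> timedelta:
    """Work lost when the only recovery source is the last good backup."""
    return failure - last_backup

def exposure_with_ai(last_archived_switch: datetime, failure: datetime) -> timedelta:
    """Worst-case work lost when archived AI extents can be rolled forward.

    Assumption: the busy extent is lost with the database, so only
    extents archived before the failure are usable.
    """
    return failure - last_archived_switch

last_backup = datetime(2024, 1, 15, 1, 0)    # hypothetical nightly backup at 01:00
failure = datetime(2024, 1, 15, 16, 45)      # disk crash at 4:45pm
last_switch = datetime(2024, 1, 15, 16, 31)  # hypothetical last archived extent switch

print(exposure_backup_only(last_backup, failure))  # 15:45:00
print(exposure_with_ai(last_switch, failure))      # 0:14:00
```

If the after-image extents live on separate disks and survive the crash, the busy extent can be rolled forward too and the loss shrinks to uncommitted transactions only.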
Why else do I need after-imaging? • Protection from human errors: • Human error is at least as big a risk as hardware problems.
    $ cd /db
    $ rm *

    for each order:
      delivered = yes.
    end.

    for each customer:
      delete customer.
    end.

    $ vi dbname.db
    …
    :x
Isn’t AI the same as disk mirroring? Or an Audit Log? • No, disk mirrors will happily delete both copies of your deleted database. • Or delete all of your customers on both mirrors. • No, an audit log cannot be replayed to reconstruct the missing data.
I have OpenEdge Replication. Do I still need after-imaging? • OE Replication is a superset of after-imaging. You still must configure and manage after-imaging. • After-imaging still provides an additional layer of protection, even with OE Replication in place. • OE Replication is aggressively real-time. You cannot build in a time delay like you can with after-imaging.
Are there downsides to after-imaging? • It is not automatically enabled. • You must manage archived logs. • Recovery is not automated. • What about performance? • There might be a very small penalty. • But you can usually only measure it under extremely high loads.
How Does After-Imaging Work?
How does after-imaging work? • First, make a backup! • AI extents: .a1 .a2 .a3 .a4
    probkup dbname dbname.pbk
How does after-imaging work? • Then, enable after-imaging, start the database and start an AI Writer. Extent .a1 will be “busy”. • AI extents: .a1 busy, .a2 empty, .a3 empty, .a4 empty
    rfutil dbname -C aimage begin
How does after-imaging work? • Switch extents. Extent .a1 will be marked “full” and extent .a2 will become “busy”. • AI extents: .a1 full, .a2 busy, .a3 empty, .a4 empty
    rfutil dbname -C aimage new
How does after-imaging work? • Switch extents again. Extent .a2 will be marked “full” and extent .a3 will become “busy”. • AI extents: .a1 full, .a2 full, .a3 busy, .a4 empty
    rfutil dbname -C aimage new
How does after-imaging work? • Once more, switch extents. Extent .a3 will be marked “full” and extent .a4 will become “busy”. • AI extents: .a1 full, .a2 full, .a3 full, .a4 busy
    rfutil dbname -C aimage new
How does after-imaging work? • Switch… Oops! There are no “empty” extents! All after-image extents are either “full” or “busy”! • AI extents: .a1 full, .a2 full, .a3 full, .a4 busy
    rfutil dbname -C aimage new
How does after-imaging work? • Copy full extents… Use the extent sequence number to name them. • AI extents: .a1 full, .a2 full, .a3 full, .a4 busy • Archived copies: .001 .002 .003
How does after-imaging work? • Mark the full extents as “empty”. • AI extents: .a1 empty, .a2 empty, .a3 empty, .a4 busy • Archived copies: .001 .002 .003
    rfutil dbname -C aimage extent empty
How does after-imaging work? • AI extents: .a1 busy, .a2 empty, .a3 empty, .a4 full • Archived copies: .001 .002 .003
    rfutil dbname -C aimage new
How does after-imaging work? • AI extents: .a1 busy, .a2 empty, .a3 empty, .a4 full • Archived copies: .001 .002 .003 .004 • Script: ai.sweep
How does after-imaging work? • AI extents: .a1 full, .a2 busy, .a3 empty, .a4 empty • Archived copies: .001 .002 .003 .004 .005 • Scripts: ai.new, ai.sweep
How does after-imaging work? • AI extents: .a1 empty, .a2 full, .a3 busy, .a4 empty • Archived copies: .001 .002 .003 .004 .005 .006 … • Scripts: ai.new, ai.sweep
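The "copy full extents and name them by sequence number" step can be sketched in Python as below. This is not the speaker's ai.sweep script. The function name, the three-digit zero padding (chosen to match the .001, .002 names on the slides) and the size check are assumptions, and finding out which extents are full and what their sequence numbers are is left to the caller.

```python
import shutil
from pathlib import Path

def archive_extent(extent: Path, archive_dir: Path, dbname: str, seq: int) -> Path:
    """Copy a full AI extent to archive_dir as <dbname>.<seq> (seq zero-padded to 3 digits)."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{dbname}.{seq:03d}"
    # Never overwrite an existing archive: a repeated sequence number means something is wrong.
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(extent, target)
    # Cheap sanity check before the caller marks the extent empty.
    if target.stat().st_size != extent.stat().st_size:
        raise IOError(f"size mismatch copying {extent} to {target}")
    return target
```

Only after the copy succeeds should the extent be marked empty; marking it empty first would discard the only copy of those transaction notes.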
How do I use after-imaging to recover? • Restore from backup. The preferred method is to restore to a dedicated recovery area. DO NOT DESTROY a damaged database without first backing it up. • Determine where to recover to (point in time, transaction id, last archived ai extent...) • Obtain the archived ai extents from the backup point through to the recovery point. • Roll forward the archived extents:
    rfutil dbname -C roll forward [-endtime yyyy:mm:dd:hh:mm:ss] -a archiveExtent
    ai.roll dbname startExtent [endExtent]
How do I recover using AI? • Archived extents in /ailogs: .001 .002 .003 .004 .005 .006 …
    prorest dbname dbname.pbk < backup.list
    rfutil dbname -C roll forward -a /ailogs/dbname.001
How do I recover using AI? • Archived extents in /ailogs: .001 .002 .003 .004 .005 .006 …
    rfutil dbname -C roll forward -a /ailogs/dbname.002
How do I recover using AI? • Archived extents in /ailogs: .001 .002 .003 .004 .005 .006 …
    rfutil dbname -C roll forward -a /ailogs/dbname.003
    …
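Rolling forward is one rfutil call per archived extent, in sequence order, with no gaps. The Python sketch below only builds those command lines. The function name and the three-digit file naming are assumptions, and this is not the speaker's ai.roll script.

```python
from pathlib import Path

def roll_forward_commands(dbname: str, archive_dir: Path, start: int, end: int) -> list[list[str]]:
    """Build one 'rfutil ... roll forward' command per archived extent, in sequence order."""
    commands = []
    for seq in range(start, end + 1):
        extent = archive_dir / f"{dbname}.{seq:03d}"
        # A gap in the sequence makes everything after it unusable, so fail early.
        if not extent.exists():
            raise FileNotFoundError(f"missing archived extent {extent}")
        commands.append(["rfutil", dbname, "-C", "roll", "forward", "-a", str(extent)])
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`, stopping at the first failure so the .lg file can be inspected before continuing.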
Post-recovery… • Remember to enable after-imaging. It is disabled on the roll-forward target!
What is “Log Based Replication”? • Log Based Replication is a fancy name for using after-image files (“logs”) to maintain a copy of your database. • Uses for Log Based Replication: • Verified Backup – make sure that your archived AI files are valid. • Reporting Database – use “norecover” to create a reporting database. • Warm Spare – keep a copy of your database (almost) ready to go in failover mode.
How does Log Based Replication work? • Staged in /stg: .001 • Archived in /arc: .001
    rfutil dbname -C roll forward -a /stg/dbname.001
    mv /stg/dbname.001 /arc/dbname.001
How does Log Based Replication work? • Staged in /stg: .002 • Archived in /arc: .001 .002
    rfutil dbname -C roll forward -a /stg/dbname.002
    mv /stg/dbname.002 /arc/dbname.002
How does Log Based Replication work? • Staged in /stg: .006 • Archived in /arc: .001 .002 .003 .004 .005 .006 …
    rfutil dbname -C roll forward -a /stg/dbname.seq#
    mv /stg/dbname.seq# /arc/dbname.seq#
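The apply-then-move cycle on the target server might look like the following Python sketch. It is not the speaker's ai.warm script. The function name, the injectable `run` callable and the numeric-suffix file naming are assumptions made so the loop can be exercised without a Progress installation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

def apply_staged_extents(
    dbname: str,
    stage_dir: Path,
    archive_dir: Path,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> list[str]:
    """Roll forward every staged extent in sequence order, then move it to the archive."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Only consider files named <dbname>.<digits> and order them numerically.
    staged = sorted(
        (p for p in stage_dir.glob(f"{dbname}.*") if p.suffix[1:].isdigit()),
        key=lambda p: int(p.suffix[1:]),
    )
    applied = []
    for extent in staged:
        # If run() raises, the extent stays in staging and the loop stops.
        run(["rfutil", dbname, "-C", "roll", "forward", "-a", str(extent)])
        shutil.move(str(extent), str(archive_dir / extent.name))
        applied.append(extent.name)
    return applied
```

Moving the file only after a successful roll forward means anything still sitting in the staging directory has not been applied yet.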
What about the New! AI Archiver? • The aiarchiver is a daemon that automates extent switching and archiving. • New startup parameters allow you to start, stop and configure the aiarchiver. • Does not handle off-site archiving, redundant archiving, compression or purging of archived logs. • Uses a hideous file naming convention. • Does not handle recovery. • Does not handle monitoring or alerting.
Practical Matters
How often should I switch extents? • How much data can you afford to lose? • Can users re-enter 5 minutes of data? 15? 60? • Can you “replay” external transactions? (EDI interfaces and so forth…) • Is your workload the same 24x7? • Do the answers above vary between a “batch window” and “online activity”? • How about weekends and holidays? • I often find hourly switches at night and every 15 minutes during the day to be a good starting point.
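As a rough illustration of the "hourly at night, every 15 minutes during the day" starting point, the Python sketch below picks an interval by hour and counts the switches per day. The 07:00 to 19:00 day window and the function names are assumptions to adjust to your own workload.

```python
def switch_interval_minutes(hour: int, day_start: int = 7, day_end: int = 19) -> int:
    """Minutes between extent switches for the given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    # Assumed policy: every 15 minutes during the day window, hourly otherwise.
    return 15 if day_start <= hour < day_end else 60

def switches_per_day(day_start: int = 7, day_end: int = 19) -> int:
    """Number of extent switches (and therefore archived files) in 24 hours."""
    return sum(60 // switch_interval_minutes(h, day_start, day_end) for h in range(24))

print(switches_per_day())  # 60
```

The daily count is also the number of archived files produced per day, which feeds into how many archived logs fit online.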
How should I set up after-imaging? • Add ai extents: • How many extents? • 4 is the absolute minimum: • 1 busy, 1 full, 1 empty (plus 1 “locked” if using OE Replication). • 8 is my recommended default: • The “extras” give you time to react to issues. • 16 is my suggested maximum; more is just awkward.
    # ai.st
    a /ai
    a /ai
    a /ai
    a /ai

    prostrct add dbname ai.st
    -or-
    prostrct addonline dbname ai.st
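Generating the structure file is simple enough to script. The Python sketch below emits one `a <directory>` line per extent, following the ai.st shown on the slide. The function name and the choice to reject fewer than 4 extents are assumptions.

```python
def ai_structure_lines(ai_dir: str, count: int = 8) -> str:
    """Return ai.st contents: one variable-length AI extent line per extent."""
    # The slide calls 4 the absolute minimum; 8 is the recommended default.
    if count < 4:
        raise ValueError("at least 4 after-image extents are needed")
    return "".join(f"a {ai_dir}\n" for _ in range(count))

print(ai_structure_lines("/ai"), end="")
```

Write the result to ai.st and pass that file to `prostrct add` (or `prostrct addonline`) as shown on the slide.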
Should I use fixed or variable extents? • Variable Length • More flexible. • Simpler scripting. • Easier monitoring. • More time to correct problems. • Fixed Length • Many legacy implementations still use them. • Fixed might be appropriate for very high volume sites. • Recommendation: Use variable length extents.
How much disk space do I need? • How much BI space do you use? (How many bi clusters do you close in a period of time?) • How many archived logs should you keep online? • Do you keep disk images of backups online? • What about off-site copies of backups and archived logs? • Do you plan to recover to dedicated recovery disk space or “on top of” the existing database?
What sort of disks should I use for AI? • Dedicated disks. • The primary job of after-imaging is to protect against media failure. • Storing after-image files on the same disks as the data extents nullifies that protection! • RAID5 (parity) is probably not your best option: • After-Imaging is, essentially, write-only. • RAID5 disks are performance-challenged when writing. • RAID10 (mirrored stripes) is probably not beneficial: • After-Imaging writes are sequential. • RAID1 (mirroring) is the best choice.
How do I start after-imaging? • Backup: • probkup is simpler because it marks the db as “backed up”. • OS backups require an extra manual step:
    rfutil dbname -C mark backedup
• Enable After Imaging:
    rfutil dbname -C aimage begin
• Start an AI Writer (AIW):
    proaiw dbname
After-Imaging on UNIX
    # crontab (source server)
    #
    1,16,31,46 * * * * ai.new cs608 base callbcallrinvpr >> /logs/ai.log 2>&1
    #
    2,17,32,47 * * * * ai.sweep cs608 base callbcallrinvpr >> /logs/ai.log 2>&1
    #
    0 20 * * * ai.purge cs608

    # crontab (target server)
    #
    10,25,40,55 * * * * ai.warm cs608 base > /dev/null
    #
    0 * * * * ai.ready cs608 base callbcallrinvpr > /tmp/ai.ready.log
    #
    0 20 * * * ai.purge cs608
How should I monitor after-imaging? • After-imaging should be enabled. • Busy extents should be 1. • Full extents should be less than or equal to 2. • Empty extents should be “most of them”. • The last messages in the .lg file of a replicated database should be (with appropriately recent date and time stamps):
    Roll forward completed. (334)
    rfutil -C roll forward session end.
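The extent-count rules above translate directly into a small check. In this Python sketch the function name is hypothetical, the extent states are assumed to have been collected already (how to obtain them is not covered here), and "most of them" is interpreted as "at least half", which is an assumption.

```python
from collections import Counter

def check_extent_states(states: list[str]) -> list[str]:
    """Return a list of problems for the given AI extent states; empty means healthy."""
    counts = Counter(s.strip().lower() for s in states)
    problems = []
    if counts["busy"] != 1:
        problems.append(f"expected exactly 1 busy extent, found {counts['busy']}")
    if counts["full"] > 2:
        problems.append(f"{counts['full']} full extents waiting to be archived (expected at most 2)")
    # Assumption: "most of them" means at least half of all extents.
    if counts["empty"] * 2 < len(states):
        problems.append(f"only {counts['empty']} of {len(states)} extents are empty")
    return problems

print(check_extent_states(["busy", "empty", "empty", "empty"]))  # []
```

A non-empty result is the cue to look at the switch and archive scripts before the remaining empty extents run out.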
Extents Stop Switching • You may have disabled cron, the cron job or the aiarchiver (if you are using it). • Or you may have introduced a scripting error. • You may have run out of disk space somewhere. • With variable extents in use and “large files” enabled disk space becomes the limiting factor. You have more time to detect, respond to and fix the problem. • With fixed extents the database may stall or crash much sooner. • If you are out of ideas try a manual extent switch.
Roll Forward Fails • You may have guessed the wrong extent – this is harmless. Try another. The message in the .lg file tells you which sequence# you need. • An archived extent might be missing or damaged – find a valid copy and try again. This is a good reason to make redundant copies of ai logs. • A more serious error may have occurred. Read the .lg file and check out the error on PSDN if necessary. Use “roll forward retry” after correcting the error.