Why Backup Your Production Database? I Never Do. Adam Backman adam@wss.com Partner, White Star Software
Backups Are for Sissies • No reason to back up – our stuff never fails • Just takes up resources • We already have redundancy • I hate changing the tapes • I’m tired • I’m hungry • I don’t feel good
Thank You Very Much for Your Time
Portions of an OpenEdge Database • Database table of contents (.db) • The data files (.d*) • Before image journals (.b*) • After image journals (.a*) • Modified buffers in memory
Other Important Stuff • Application • External data (GIS, photos, …) • User files • External systems (EDI, Data warehouse, …)
Reliability is Important • Loss of data is expensive • Many businesses now lack a paper trail • Redundancy does not equal reliability • Rogue program • 2 copies of bad data
What is High Availability? • Classic definition equals 24x7 operation • Examples: manufacturing/e-commerce/follow the sun • Little or no downtime • Maintenance is done in very specific windows • More common definition • Traditional business 8a-6p, single country 3 time zones 9-5 • Operational hours are critical • Maintenance windows on a regular basis • Unconventional definition • Is performance good enough to run the business
Backup Process's Impact on Production • Backing up production • Pause during backup of before image journal • Uses I/O capacity of production • Impacts the effectiveness of the buffer pool • Split mirror backup • Use of quiet point keeps pause to a minimum • Pause is non-zero • After image file backup • Still needs a backup to begin the process • Very little impact for backup process • Long recovery time
Cover All Sides • Everyone should be running after image journaling • Need removable backup periodically • Wide-scale events (fire, flood, …) • To recover from after image journals • Replication is becoming the new default • OE Replication • Log-based replication • Hardware replication
OpenEdge Replication • OpenEdge Replication is the only replication method supported by Progress • OpenEdge Replication is the only method that allows you to use the target database(s) for reporting • OpenEdge Replication requires that you have after image journaling enabled • Do not attempt to implement OpenEdge Replication until after you have a good AI management plan implemented
OpenEdge Replication [Diagram: Production (source) with Source DB, Shared Memory, and Replication Server, feeding Reporting (target) with Replication Agent, Shared Memory, and Target DB]
Log-Based Replication • Log-based replication has been used for years, whereas OE Replication is a fairly new product • Log-based replication provides a vehicle for replication without the licensing costs of OE Replication • Not real-time • Code for this type of replication must be maintained by the user, and there is no official support from the vendor
Hardware-Based Replication • Hardware-based replication is a function of the hardware vendors and thus supported directly by them • This method is NOT supported by Progress • ALL write operations must be guaranteed across the source and target disk systems
Archiving • Who does your archiving (Iron Mountain, third-party, someone’s house, …) • What do you keep • 2 weeks of dailies • 5 weeks of weeklies • 1 year of monthlies • How to label your backups • Who did the backup • Command to restore • Date and time
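The retention schedule above (2 weeks of dailies, 5 weeks of weeklies, 1 year of monthlies) can be sketched as a small keep-or-expire check. This is only an illustration, not part of the original talk: the function name `keep_backup`, the choice of Sunday as the weekly backup, and the choice of the first of the month as the monthly backup are all assumptions made for the example.

```python
from datetime import date, timedelta

# Retention windows taken from the slide.
DAILY_DAYS = 14        # two weeks of dailies
WEEKLY_DAYS = 5 * 7    # five weeks of weeklies
MONTHLY_DAYS = 365     # one year of monthlies

# Assumption for this sketch: Sunday backups count as the weeklies (Monday == 0).
WEEKLY_WEEKDAY = 6


def keep_backup(backup_day: date, today: date) -> bool:
    """Return True if a backup taken on backup_day should still be retained."""
    age = (today - backup_day).days
    if age < 0:
        raise ValueError("backup_day is in the future")
    if age <= DAILY_DAYS:
        return True  # every daily inside the two-week window
    if age <= WEEKLY_DAYS and backup_day.weekday() == WEEKLY_WEEKDAY:
        return True  # a weekly inside the five-week window
    if age <= MONTHLY_DAYS and backup_day.day == 1:
        return True  # assumption: first-of-month backups are the monthlies
    return False


if __name__ == "__main__":
    today = date(2024, 6, 30)
    for days_ago in (3, 20, 21, 60, 400):
        day = today - timedelta(days=days_ago)
        print(day.isoformat(), "keep" if keep_backup(day, today) else "expire")
```

Whatever rule you settle on, the point from the slide still applies: write the rule down so the person rotating media does not have to guess.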
Archiving (continued) • Data archiving • Archive/Delete? • Archive/Save historical • Archive/Save aggregates • After image file archiving • At least 2 backups’ worth • I recommend a week or more if possible
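The "at least 2 backups’ worth" rule for archived after image files can be expressed as a cutoff: keep every archived AI file created since the second-most-recent full backup. The sketch below assumes you already have the backup and archive timestamps from your own catalog; the function name `ai_archives_to_keep` and its parameters are hypothetical and not part of any OpenEdge tooling.

```python
from datetime import datetime
from typing import Iterable, List


def ai_archives_to_keep(
    backup_times: Iterable[datetime],
    archive_times: Iterable[datetime],
    backups_worth: int = 2,
) -> List[datetime]:
    """Return timestamps of archived AI files that must be retained.

    Keeps every archive created at or after the N-th most recent backup,
    so a roll-forward is still possible if the newest backup is unusable.
    """
    if backups_worth < 1:
        raise ValueError("backups_worth must be at least 1")
    backups = sorted(backup_times)
    archives = sorted(archive_times)
    if len(backups) < backups_worth:
        return archives  # not enough backups yet: keep everything
    cutoff = backups[-backups_worth]
    return [t for t in archives if t >= cutoff]


if __name__ == "__main__":
    backups = [datetime(2024, 6, d) for d in (1, 8, 15)]
    archives = [datetime(2024, 6, d) for d in (2, 7, 9, 14, 16)]
    for kept in ai_archives_to_keep(backups, archives):
        print(kept.date())
```

Raising `backups_worth`, or replacing the cutoff with a fixed number of days, moves the rule toward the "week or more" the slide recommends.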
Building a Good Recovery Strategy • Know your business • Components of the business: how people do business with you • Components of systems: tools (applications and physical) • Know your risks (fire, flood, hurricane, …) • Be inclusive • Technical people (network, phones, facilities, …) • Business people (people who own the data) • Build an execution plan with contingencies
Creating a Plan • Goals (event-based goals) • If we lose a disk (DB gone) • If we have a fire (machine gone) • If we have a natural disaster (facility gone) • Hardware • Software • Data • Other stuff
Creating a Plan - Goals • Acceptable downtime (generally cost based) • Everyone wants zero but it is generally cost prohibitive • Planned outages • Hardware install and maintenance • Software upgrade • O/S upgrade or patch • Notifications (both before and during outage) • Who • When • What do they do?
Creating a Plan – Other Stuff • What makes your business run? • Phones • Faxes • Business to business (EDI, XML feed, …) • Can people work from home? • Do you have/need another location? • Contact lists in case of major catastrophe • Kept up-to-date • Kept online and printed in an accessible location
How about if I am a SaaS user • Who is your provider • Verify their recovery plan • Perform a dry run of at least one recovery scenario • Have specific service level agreements • Time to recover • Maximum loss of data • Penalties for missing times
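The two measurable SLA terms above, time to recover and maximum loss of data, can be checked against the result of a dry run. The following is a minimal sketch under stated assumptions: the class names `RecoverySla` and `DryRunResult`, the function `sla_violations`, and the sample figures are all invented for illustration and do not come from any provider's contract.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class RecoverySla:
    """Agreed limits (hypothetical field names)."""
    max_time_to_recover: timedelta
    max_data_loss: timedelta


@dataclass(frozen=True)
class DryRunResult:
    """What a recovery dry run actually measured."""
    time_to_recover: timedelta
    data_loss: timedelta


def sla_violations(sla: RecoverySla, result: DryRunResult) -> List[str]:
    """Return a human-readable list of SLA terms the dry run missed."""
    problems = []
    if result.time_to_recover > sla.max_time_to_recover:
        problems.append(
            f"time to recover {result.time_to_recover} exceeds SLA {sla.max_time_to_recover}"
        )
    if result.data_loss > sla.max_data_loss:
        problems.append(
            f"data loss {result.data_loss} exceeds SLA {sla.max_data_loss}"
        )
    return problems


if __name__ == "__main__":
    # Illustrative numbers only.
    sla = RecoverySla(timedelta(hours=4), timedelta(minutes=15))
    run = DryRunResult(timedelta(hours=5), timedelta(minutes=10))
    for problem in sla_violations(sla, run):
        print("MISSED:", problem)
```

Recording each dry run this way gives you something concrete to put next to the penalty clause when a recovery time is missed.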
How about if I am a SaaS provider • Build a regular recovery plan • Unique concerns • Security • Compliance (HIPAA, SOX, …) • Build achievable SLAs for your users
Implementing Your Plan • First implementation should be a totally manual process to ensure the steps work and allow for documentation • Document the process as you go • Who are you logged in as? • Exactly what you typed • Where you were (console, remote, …) • Can things be done in parallel or only sequentially? • Where are the logs and what to look for in the logs
Documentation • All recovery documentation should be VERY specific • Create documents for normal maintenance • Backups • Database growth • Modification of OS, application, printers, … • Create scenario-based recovery plans • Lose a disk (or disk pair) • Fire • Flood
Testing Your Plan • Who does the test? • Not the person who wrote it • The backup person for the implementation • Someone who is “always” there regardless of technical ability • How often to test? • Material data change (10% increase is a good target) • Any change in database configuration • Do you have a second site or redundant hardware? • Do you have enough disk capacity (space and throughput)?
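The two "how often to test" triggers above, a material change in data volume (10% growth as the target) and any change in database configuration, can be turned into a simple check. This sketch assumes you record the database size at each recovery test; the function name `retest_needed` and its parameters are hypothetical.

```python
def retest_needed(
    size_at_last_test_gb: float,
    current_size_gb: float,
    config_changed: bool,
    growth_threshold: float = 0.10,
) -> bool:
    """Return True if the recovery plan should be tested again.

    Triggers, following the slide: any database configuration change,
    or data growth of at least growth_threshold (10% by default) since
    the last test.
    """
    if size_at_last_test_gb <= 0:
        raise ValueError("size_at_last_test_gb must be positive")
    growth = (current_size_gb - size_at_last_test_gb) / size_at_last_test_gb
    return config_changed or growth >= growth_threshold


if __name__ == "__main__":
    print(retest_needed(200.0, 230.0, config_changed=False))  # 15% growth
    print(retest_needed(200.0, 205.0, config_changed=False))  # 2.5% growth
    print(retest_needed(200.0, 205.0, config_changed=True))   # config change
```

These triggers add to the periodic test rather than replacing it, so a scheduled test is still due even when neither one fires.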
How to Test Your Plan • Fail over to your backup system • Fail back to your primary system • Contingency planning for personnel, physical plant, and equipment (lead time for resources)
Summary: Recovery Planning • Get over it. You still need to back up. • Back up your backup, not production, if possible • Be inclusive when building your team • Always back up what you have now, however little, before starting to recover • Create and maintain a comprehensive plan • Include everything needed to use the application: hardware, applications, and data • Create and maintain physical and online contact lists • Test your plan periodically (at least annually)
Still have questions? Please feel free to contact me directly. Adam Backman White Star Software (603)897-1010 adam@wss.com
THANK YOU Thank you for your time