1 / 7

A brief timeline

A brief timeline Due to a firmware bug both controllers of the Sun StorageTek 6540 array reboot within 90 minutes after each other on 18/08/2010. All attempts to restore and recover the data to the original hardware fail. The database is corrupted during the recovery process.

yon
Download Presentation

A brief timeline

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A brief timeline • Due to a firmware bug both controllers of the Sun StorageTek 6540 array reboot within 90 minutes after each other on 18/08/2010. • All attempts to restore and recover the data to the original hardware fail. The database is corrupted during the recovery process. • On 01/09/2010 the database is successfully restored to alternate hardware. • On 02/09/2010 preparations are started to synchronize the database from RAL. • RAL starts to copy LHCb and ATLAS data to SARA. 3D DBA Workshop 16-17 November 2010

  2. The data is imported into the database on 08/09/2010. • On the same day CERN brings the streams up. 3D DBA Workshop 16-17 November 2010

  3. Some minor issues • At SARA the COMPATIBLE parameter had to be changed from 10.2.0.3 to 10.2.0.4 to match the one at RAL. • There is an error in the Oracle Database Administrator's Guide on page 8-37 regarding the syntax of the parameter file of the impdp command. 3D DBA Workshop 16-17 November 2010

  4. Conclusions on the resynchronization • The resynchronization process went rather smoothly (at least from SARA’s point of view). • For SARA this was a learning opportunity. The procedure has been documented for possible future use. • The assistance we received from both RAL and CERN was amazing. 3D DBA Workshop 16-17 November 2010

  5. Conclusions on data corruption • We’ve been unable to determine the exact cause of the corruption. • An upgrade of the storage firmware and a rebuild of the LUNs solved the problem (but for how long?). • Always use "db_block_checking='TRUE'" in combination with db_block_checksum to detect logical corruption at a very early stage. 3D DBA Workshop 16-17 November 2010

  6. 3D DBA Workshop 16-17 November 2010

  7. 3D DBA Workshop 16-17 November 2010

More Related