Learn how to protect component files from damage and recover from unexpected system failures using journaling techniques. Available in APL version 12.0.3 and later.
Journaled Component Files
Or – How to never see FILE DAMAGED again!
John Scholes and Richard Smith
13 October, 2008
Purely linear file layout
• Free space
• Component data (APL arrays)
• Global file information (root)
Updating a linear file
• Replacing a component with a smaller one wastes space
• Replacing a component with a larger one is not possible ...
• ... unless you move potentially large amounts of data first
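The cost described on this slide can be made concrete with a small Python sketch. It is only an illustration under simple assumptions: components are modelled as byte strings stored back to back, and the function name `replace_in_linear_file` is hypothetical, not part of APL.

```python
def replace_in_linear_file(components, index, new_data):
    """Replace one component in a purely linear layout.

    `components` is a list of bytes objects stored back to back.
    Returns (new_components, bytes_moved, bytes_wasted).
    """
    old = components[index]
    updated = components[:index] + [new_data] + components[index + 1:]
    if len(new_data) <= len(old):
        # Fits in the old slot: nothing moves, but the unused tail is wasted.
        return updated, 0, len(old) - len(new_data)
    # Larger: every later component has to be shifted along first.
    moved = sum(len(c) for c in components[index + 1:])
    return updated, moved, 0


layout = [b"aaaa", b"bbbb", b"cccc"]
_, moved_small, wasted_small = replace_in_linear_file(layout, 0, b"xx")
_, moved_large, wasted_large = replace_in_linear_file(layout, 0, b"xxxxxx")
```

Replacing the first component with a larger one moves all the data behind it, which is why the layout on the next slide adds an index and free space nodes.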
Actual file layout
• Free space
• Global file information (root)
• Component index blocks
• Component data (APL arrays)
• Free space nodes
Updating a component
• Write the new data to free space
• (Note that the free space node is overwritten)
• Update the component index blocks
• Update the free space nodes
• Update the root
Adding a component
• Write the new data in free space
• (Note that a free space node is overwritten)
• Update the component index blocks
• Update the free space nodes
• Update the root
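The update and add sequences above can be sketched with a toy in-memory model. This is a minimal Python illustration, not the actual component file format: `ToyFile`, its `index` dictionary and its `free` list are hypothetical stand-ins for the component index blocks and the free space nodes, and the root is omitted.

```python
class ToyFile:
    """Toy model: data area plus an index and a list of free space nodes."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.index = {}            # component number -> (offset, length)
        self.free = [(0, size)]    # free space nodes as (offset, length)

    def _allocate(self, length):
        # First fit: take space from the first node that is large enough.
        for i, (offset, size) in enumerate(self.free):
            if size >= length:
                if size > length:
                    self.free[i] = (offset + length, size - length)
                else:
                    del self.free[i]
                return offset
        raise MemoryError("no free space node is large enough")

    def write(self, number, payload):
        """Add component `number`, or replace it if it already exists."""
        offset = self._allocate(len(payload))
        self.data[offset:offset + len(payload)] = payload  # new data goes into free space
        old = self.index.get(number)
        self.index[number] = (offset, len(payload))        # update the index
        if old is not None:
            self.free.append(old)                          # old copy becomes free space

    def read(self, number):
        offset, length = self.index[number]
        return bytes(self.data[offset:offset + length])


f = ToyFile()
f.write(1, b"old")     # add component 1
f.write(1, b"newer")   # replace it with a larger value, no data is moved
f.write(2, b"two")     # add component 2
```

Unlike this model, the real file keeps its free space nodes inside the free space itself, which is why writing new data can overwrite a node that is still referenced.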
Adding – and causing damage
• Write the new data in free space
• (Note that a free space node is overwritten)
• ** APL process is killed **
• The free space node is still referenced but has been corrupted
The solution - journaling
• The free space in a file can be safely updated
• The majority of an update occurs in this free space
• Updates to existing data are first written to a journal
• The update is then completed
Adding - journaled
• The free space can be updated
• The journal is put in free space
• Most of the component is written
• (The free space node was left intact)
• All remaining updates are journaled
• The journal is activated
Adding - journaled
• Only free space updated so far
• Entire update recorded in file
Adding - journaled
• The journal is executed
• The journal is removed
• The update is complete
Accessing the file – example 1
• Normal case - there is no journal
• Nothing needs to be done
Accessing the file – example 2
• Process killed before journal complete
• The updates were all in free space
• The file has been safely rolled back
Accessing the file – example 3
• Process killed after journal complete but before update finished
• The journal is (re-)executed
• The journal is removed
• The update has been completed and damage repaired
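The journaled update and the three access examples can be sketched together in Python. This is a simplified model under stated assumptions, not the real implementation: `JournaledStore`, `Crash` and the `crash_at` parameter are hypothetical names, the journal holds only index updates, and any block that the index does not reference is treated as free space.

```python
from itertools import count


class Crash(Exception):
    """Stands in for the APL process being killed part way through."""


class JournaledStore:
    def __init__(self):
        self.blocks = {}      # offset -> payload; unreferenced offsets are free space
        self.index = {}       # component number -> offset (the existing data)
        self.journal = None   # pending index updates, held in free space
        self.active = False   # True once the journal is marked as present

    def _free_offset(self):
        used = set(self.index.values())
        return next(i for i in count() if i not in used)

    def add(self, number, payload, crash_at=None):
        offset = self._free_offset()
        self.blocks[offset] = payload        # component data goes into free space
        self.journal = [(number, offset)]    # remaining updates are journaled
        if crash_at == "before_activate":
            raise Crash
        self.active = True                   # the journal is activated
        if crash_at == "after_activate":
            raise Crash
        self._finish()

    def _finish(self):
        for number, offset in self.journal:  # execute the journal
            self.index[number] = offset
        self.journal, self.active = None, False  # remove the journal

    def recover(self):
        """Run when the file is next accessed."""
        if self.active:
            self._finish()                   # example 3: re-execute, then remove
            return "completed"
        if self.journal is not None:
            self.journal = None              # example 2: only free space was touched
            return "rolled back"
        return "clean"                       # example 1: nothing to do

    def read(self, number):
        return self.blocks[self.index[number]]
```

Calling `add` with `crash_at="before_activate"` and then `recover()` leaves the index unchanged, while `crash_at="after_activate"` followed by `recover()` completes the update.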
Journaled files
• Are supported now in 12.0.3
• Have very little impact on performance and file size
• May be enabled on a per-file basis
• ⎕FPROPS converts a file to/from journaled
Journaled files
• Can only be accessed by 12.0.3 or later (but journaling can be switched off)
• Are not enabled by default
• Protect from file damage if APL is killed
• Do not currently always protect from file damage if the OS is killed
Disk caching
[Diagram: APL Process, O/S Kernel, Disk]
• Disk writes are held in memory and flushed efficiently (out of sequence)
• Data still flushed if APL killed
• But if the O/S is killed, out of sequence data may be lost
Why this matters - example
• 1. Write to free space (including the journal)
• 2. Mark journal as present
• O/S dies with write 1 still incomplete on disk
• Executing this broken journal would corrupt the file
• There are 4 such points in an update
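The failure on this slide can be reproduced with a very small simulation. It is only a sketch: the function `crash_after_partial_flush` and the dictionary standing in for the disk are hypothetical, and a real O/S cache works on blocks rather than named items.

```python
def crash_after_partial_flush(writes, flush_order, flushed_count):
    """Return what is on disk if the O/S dies part way through flushing.

    writes        -- list of (key, value) pairs in the order APL issued them
    flush_order   -- order in which the O/S chooses to flush them
    flushed_count -- number of writes that reached the disk before the crash
    """
    disk = {}
    for i in flush_order[:flushed_count]:
        key, value = writes[i]
        disk[key] = value
    return disk


writes = [("journal", b"entries"), ("journal_present", True)]
# The O/S flushes write 2 first and dies before write 1 is flushed.
disk = crash_after_partial_flush(writes, flush_order=[1, 0], flushed_count=1)
```

Here `disk` records that a journal is present although its contents never arrived, which is the broken journal the slide warns about.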
Critical update sequence
These must be done atomically:
1. Write to free space (including the journal)
2. Mark journal as present
3. Execute the journal
4. Remove the journal
fsync solution
[Diagram: APL Process, O/S Kernel, Disk]
• fsync causes APL to wait for the data to be committed to disk
• Could issue 4 fsyncs per update
fsync solution
• Slows the application considerably
• So we should reduce the number of fsyncs if possible
• Good news is that we can
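For readers who have not used fsync, the following Python sketch shows the "one fsync per step" approach. `os.fsync` is the standard library call; everything else (`durable_write`, `ordered_update`, `demo`, and the made-up offsets and payloads for the four steps) is a hypothetical illustration rather than what APL does internally.

```python
import os
import tempfile


def durable_write(f, offset, payload):
    """Write payload at offset and wait until the O/S reports it is on disk."""
    f.seek(offset)
    f.write(payload)
    f.flush()              # move Python's own buffer into the O/S cache
    os.fsync(f.fileno())   # block until the O/S has committed the data


def ordered_update(path, steps):
    """Apply (offset, payload) steps strictly in order, one fsync per step."""
    fsyncs = 0
    with open(path, "r+b") as f:
        for offset, payload in steps:
            durable_write(f, offset, payload)
            fsyncs += 1
    return fsyncs


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(16))   # a 16-byte file of zeros
        path = tmp.name
    try:
        # Made-up stand-ins for: write data and journal, mark the journal
        # present, execute the journal, remove the journal.
        steps = [(8, b"DATA"), (4, b"J"), (0, b"IDX"), (4, b"\0")]
        fsyncs = ordered_update(path, steps)
        with open(path, "rb") as f:
            return fsyncs, f.read()
    finally:
        os.remove(path)
```

Each call to `durable_write` blocks until the data is committed, so an update made this way waits on the disk four times.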
First fsync elimination
• 1. Write to free space (including the journal)
• 2. Mark journal as present
• O/S dies; update 1 incomplete
• Executing this broken journal would corrupt the file
• Solution: add checksums to detect
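A checksum lets recovery tell a complete journal from one that was only partly flushed. The slides do not say which checksum is used, so the Python sketch below assumes CRC-32 purely for illustration; `seal_journal` and `journal_is_intact` are hypothetical names.

```python
import struct
import zlib


def seal_journal(entries: bytes) -> bytes:
    """Prefix the journal entries with their CRC-32 (an assumed checksum)."""
    return struct.pack("<I", zlib.crc32(entries)) + entries


def journal_is_intact(record: bytes) -> bool:
    """Return True only if the stored checksum matches the entries."""
    if len(record) < 4:
        return False
    (stored,) = struct.unpack("<I", record[:4])
    return zlib.crc32(record[4:]) == stored


sealed = seal_journal(b"index[2] -> block 7")
# Simulate a journal whose last bytes never reached the disk.
torn = sealed[:-3] + b"\0\0\0"
```

Recovery would execute `sealed`, for which `journal_is_intact` returns True, and refuse `torn`, for which it returns False.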
Second fsync elimination
• 2. Mark journal as present
• 3. Start executing the journal
• O/S dies; journal no longer present
• No journal for recovery
• Solution: use the checksumming and redundancy to rebuild indices
Second fsync elimination
• Note: omitting this fsync does not prevent damage
• But we are able to fix it
Third fsync elimination
• 3. Execute the journal
• 4. Remove the journal
• O/S dies; earlier updates lost
• No journal for recovery
• Rebuild indices
Fourth fsync elimination
• 4. Remove the journal
• O/S dies; update lost
• If the journal is still present we may re-execute it on recovery
• Otherwise it will fail its checksum validation
Additional journaling options
• Two fsyncs eliminated by checksumming
• One further fsync eliminated if recovery tool used
• Last fsync eliminated if recovery tool used ...
• ... potential loss of more data
Additional journaling options
• Are planned for a future release
• Will have a greater impact on performance and file size
• Will offer a variety of options so that security and performance may be balanced
• Will be configured on a per-file basis