Formal Modeling and Analysis of a Flash Filesystem in Alloy
Eunsuk Kang
TDS Seminar, Mar. 14, 2008

What is flash memory?
• Non-volatile, high-performance storage
• Applications: MP3 players, laptop drives, digital cameras, etc.

NASA Mars Exploration Rover Spirit
On-board flash memory to store scientific data

Flash anomaly on Spirit
• System failure 18 days after landing (2004)
• Loss of communication with Earth; stuck in a “reboot” loop
• Cause: flaw in the flash filesystem
• Cost: 10 days of lost scientific activity

Testing for the unanticipated?
• The filesystem ran out of free space but still attempted to service file operations
• “There was a belief among the FSW development team that the system would not exhibit the behavior that is the root cause of the anomaly…” [Reeves, 2004]
• Testing is essential, but is it enough?

Answer: Formal methods?
• Allow exhaustive analysis
• BUT: verifying a poorly designed piece of code after the fact, in an ad hoc manner, is impractical
• Apply formal methods early to get the design right

Grand Challenge in Verification
• Long term
    • “Build a verifying compiler” – Tony Hoare
• Short term
    • “Build a verified flash filesystem” – Joshi & Holzmann (Jet Propulsion Laboratory)
• In this talk
    • “Build a verified design for a flash filesystem”

What is POSIX?
• IEEE standard for filesystem operations
• Adopted by UNIX, Mac OS X, etc.
• Reference model for the flash filesystem
• Function signatures & behaviors
    • e.g. write(fildes, *buf, nbyte, offset)
    • “The write() function shall attempt to write nbyte bytes from the buffer pointed to by buf to the file associated with the open file descriptor, fildes.”

POSIX filesystem in Alloy
• Alloy
    • First-order relational logic + transitive closure

    sig Data {}    // data element
    sig FID {}     // file identifier
    sig File {
        contents : seq Data
    }
    sig AbsFsys {  // abstract filesystem
        fileMap : FID -> lone File    // "lone" means one or zero
    }

Abstract read operation

    fun readAbs [fsys : AbsFsys, fid : FID, offset, size : Int] : seq Data {
        let file = fsys.fileMap[fid] |
            (file.contents).subseq[offset, offset + size - 1]
    }

    // simulation
    run {
        some fsys : AbsFsys, fid : FID, output : seq Data |
            output = readAbs[fsys, fid, 1, 3]
    } for 3

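To make the inclusive bounds of subseq concrete, here is a minimal Python sketch of the same read. It is not part of the talk: the dictionary standing in for fileMap, the integer data elements, and the name read_abs are assumptions, and offset and size are assumed to be non-negative.

```python
from typing import Dict, List

def read_abs(file_map: Dict[str, List[int]], fid: str, offset: int, size: int) -> List[int]:
    """Return the elements of file `fid` at positions offset .. offset + size - 1."""
    # fileMap is `FID -> lone File`: an unknown fid yields no file, read here as empty
    contents = file_map.get(fid, [])
    # Alloy's subseq[from, to] includes both ends; a Python slice excludes the end
    return contents[offset:offset + size]

fsys = {"fid0": [10, 20, 30, 40, 50]}
print(read_abs(fsys, "fid0", 1, 3))  # [20, 30, 40]
```

The call mirrors the simulation command above, readAbs[fsys, fid, 1, 3], which asks for three elements starting at index 1.
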
Abstract write operation

    pred writeAbs[fsys, fsys' : AbsFsys, fid : FID, buffer : seq Data, offset, size : Int] {
        let file = fsys.fileMap[fid],
            file' = fsys'.fileMap[fid],
            buffer' = buffer.subseq[0, size - 1] {
            (#buffer' = 0) => file' = file    // case 1
            (#buffer' != 0) =>                // case 2 & 3
                file'.contents = (zeros[offset] ++ file.contents) ++ shift[buffer', offset]
            writePromote[fsys, fsys', file, file', fid]
        }
    }

    // promotion
    pred writePromote[fsys, fsys' : AbsFsys, file, file' : File, fid : FID] {
        fsys'.fileMap = fsys.fileMap ++ (fid -> file')
    }

Alloy is pure logic
• No built-in syntax/semantics for state machines
• Transition as an explicit constraint between two states (fsys and fsys' in writeAbs)

Abstract write operation: Case 1
• Input buffer is empty (#buffer' = 0); no changes to the file

Abstract write operation: Case 2
• Offset is within the file
• Shift buffer by offset & override existing data

Abstract write operation: Case 3
• Offset is after the end of the file
• Fill in the gap with zeros

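The three cases can be traced with a small Python sketch of the expression (zeros[offset] ++ file.contents) ++ shift[buffer', offset]. This is an illustration under assumptions, not the talk's model: data elements are integers, 0 plays the role of the zero element, and write_abs is a hypothetical name.

```python
from typing import List

def write_abs(contents: List[int], buffer: List[int], offset: int, size: int) -> List[int]:
    """Return the new file contents after writing `size` elements of `buffer` at `offset`."""
    data = buffer[:size]                      # buffer' = buffer.subseq[0, size - 1]
    if not data:                              # case 1: nothing to write, file unchanged
        return list(contents)
    # zeros[offset] ++ file.contents: start from zeros, let the existing data override them
    new_len = max(len(contents), offset + len(data))
    result = [0] * new_len
    result[:len(contents)] = contents
    # ++ shift[buffer', offset]: the written data overrides positions offset onward
    result[offset:offset + len(data)] = data
    return result

print(write_abs([1, 2, 3, 4], [9, 9], 1, 2))  # case 2: [1, 9, 9, 4]
print(write_abs([1, 2], [7, 8], 4, 2))        # case 3: [1, 2, 0, 0, 7, 8]
```

Cases 2 and 3 share one code path, as in the Alloy predicate: the zero padding only becomes visible when the offset lies past the end of the file.
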
Promotion
• A style of modeling changes in system state
• Ensure all other files remain unchanged: writePromote overrides only the fileMap entry for fid

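The override fsys.fileMap ++ (fid -> file') behaves like a dictionary update applied to a copy. A minimal Python sketch, with hypothetical names and plain lists standing in for files:

```python
from typing import Dict, List

def write_promote(file_map: Dict[str, List[int]], fid: str, new_file: List[int]) -> Dict[str, List[int]]:
    """Return a new map that differs from `file_map` only at `fid`."""
    promoted = dict(file_map)   # the pre-state is left untouched, like fsys in the model
    promoted[fid] = new_file    # fileMap ++ (fid -> file')
    return promoted

before = {"a": [1, 2], "b": [3]}
after = write_promote(before, "a", [9, 9])
print(after)   # {'a': [9, 9], 'b': [3]}
print(before)  # {'a': [1, 2], 'b': [3]}
```

Returning a fresh map rather than mutating the argument keeps the pre-state and post-state separate, which is how the Alloy predicate relates fsys and fsys'.
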
What makes flash special?
• Two types: NOR and NAND
• Program (i.e. write) at the page level, erase at the block level
• Must erase before programming
• Block can be erased only a limited number of times (need wear-leveling)

Modeling memory hierarchy

    sig Page { data : seq Data } { #data = PAGE_SIZE }
    sig Block { pages : seq Page } { #pages = BLOCK_SIZE }
    sig LUN { blocks : seq Block } { #blocks = LUN_SIZE }
    sig Device { LUNs : seq LUN … } { #LUNs = DEVICE_SIZE }

    // simulation with constraints
    run {
        some Device
        DEVICE_SIZE = 1
        LUN_SIZE = 2
        BLOCK_SIZE = 2
        PAGE_SIZE = 4
    } for 4

Addressing mode
Row & column addresses:

    sig RowAddr {    // used to access a page
        lunIndex : Int,
        blockIndex : Int,
        pageIndex : Int
    }

A column address is an Int, and identifies a data element in a page
Example:

    rowAddr.lunIndex = 0
    rowAddr.blockIndex = 1
    rowAddr.pageIndex = 1
    columnAddr = 1

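The example address can be resolved against a toy device to see how the row and column parts combine. The sketch below is illustrative only: it reuses the sizes from the simulation on the memory-hierarchy slide, models the device as nested Python lists, and the lookup helper is a hypothetical name.

```python
from typing import NamedTuple

class RowAddr(NamedTuple):
    lun_index: int
    block_index: int
    page_index: int

DEVICE_SIZE, LUN_SIZE, BLOCK_SIZE, PAGE_SIZE = 1, 2, 2, 4

# device[lun][block][page][column]; every element is labelled with its own coordinates
device = [[[[(l, b, p, c) for c in range(PAGE_SIZE)]
            for p in range(BLOCK_SIZE)]
           for b in range(LUN_SIZE)]
          for l in range(DEVICE_SIZE)]

def lookup(dev, row: RowAddr, col: int):
    """Follow the row address to a page, then the column address to one data element."""
    return dev[row.lun_index][row.block_index][row.page_index][col]

print(lookup(device, RowAddr(0, 1, 1), 1))  # (0, 1, 1, 1)
```
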
Page status & data structures
• Each page is associated with its current status

    abstract sig PageStatus {}
    one sig Free, Allocated, Valid, Invalid extends PageStatus {}

• Auxiliary data structures*

    sig Device {
        LUNs : seq LUN,
        pageStatusMap : RowAddr -> one PageStatus,
        eraseCountMap : RowAddr -> one Int,    // wear-leveling
        reserveBlock : RowAddr                 // garbage collection
    } { #LUNs = DEVICE_SIZE }

(* disclaimers)

Flash API functions

    // reads data from page, starting at "colAddr"
    fun read[d : Device, colAddr : Int, rowAddr : RowAddr] : seq Data { … }

    // program data into page & set page status to "Allocated"
    pred program[d, d' : Device, colAddr : Int, rowAddr : RowAddr, data : seq Data] { … }

    // erase data in block & increase its erase count, and set status of every page in block to "Free"
    pred erase[d, d' : Device, rowAddr : RowAddr] { … }

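The bodies of these three operations are elided on the slide, so the following Python sketch is only one plausible reading of the comments, restricted to a single block. The class names, the use of None for erased cells, the in-place mutation (the Alloy predicates relate d and d' instead), and the rule that only a Free page may be programmed are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PAGE_SIZE = 4
BLOCK_SIZE = 2

@dataclass
class Page:
    data: List[Optional[int]] = field(default_factory=lambda: [None] * PAGE_SIZE)
    status: str = "Free"

@dataclass
class Block:
    pages: List[Page] = field(default_factory=lambda: [Page() for _ in range(BLOCK_SIZE)])
    erase_count: int = 0

def read(block: Block, page_index: int, col_addr: int) -> List[Optional[int]]:
    """Read a page's data, starting at col_addr."""
    return block.pages[page_index].data[col_addr:]

def program(block: Block, page_index: int, col_addr: int, data: List[int]) -> None:
    """Write data into a free page and mark the page Allocated."""
    page = block.pages[page_index]
    if page.status != "Free":
        raise ValueError("page must be erased before it is programmed")
    if col_addr + len(data) > PAGE_SIZE:
        raise ValueError("data does not fit in the page")
    page.data[col_addr:col_addr + len(data)] = data
    page.status = "Allocated"

def erase(block: Block) -> None:
    """Erase the whole block: every page becomes Free and the erase count goes up by one."""
    block.pages = [Page() for _ in range(BLOCK_SIZE)]
    block.erase_count += 1

blk = Block()
program(blk, 0, 0, [0, 1, 1, 0])
print(read(blk, 0, 0))   # [0, 1, 1, 0]
erase(blk)
print(blk.erase_count)   # 1
```

Programming page 0 a second time without an erase in between raises ValueError, which is the "must erase before programming" constraint in executable form.
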
Concrete filesystem in Alloy

    sig Inode {
        blockList : seq VBlock
    }
    sig VBlock {}    // virtual block
    sig ConcFsys {
        inodeMap : FID -> lone Inode,
        blockMap : VBlock one -> one RowAddr
    }

Concrete read operation (snippet)

    pred readConc[fsys : ConcFsys, d : Device, fid : FID, offset, size : Int, buffer : seq Data] {
        …
        all i : blocksToRead.inds {
            let vblock = blocksToRead[i],
                rowAddr = fsys.blockMap[vblock],
                from = PAGE_SIZE*i,
                to = from + PAGE_SIZE - 1 |
                buffer.subseq[from, to] = read[d, 0, rowAddr]
        }
      }
        …
    }

State of a flash filesystem
• State is represented by a pair (ConcFsys, Device): readConc takes both fsys and d

Read operation animated
Initially, buffer is empty

Read operation animated
Three calls to flash read in total

Concrete read operation: Step 1
• Extract blocks to read from inode using offset & size

Concrete read operation: Step 2
• Consider each index i in blocksToRead

Concrete read operation: Step 3
• Retrieve the address of the page for the ith virtual block

Concrete read operation: Step 4
• Calculate indices for the current buffer slot

Concrete read operation: Step 5
• Execute the flash API function, read

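The five steps can be followed in a short Python sketch. Step 1 is elided in the Alloy snippet, so the way blocks_to_read is derived from offset and size here is an assumption, as are the dictionary stand-ins for blockMap and the device and the name read_conc. Like the snippet, the sketch copies whole pages into the buffer.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4
RowAddr = Tuple[int, int, int]  # (lunIndex, blockIndex, pageIndex)

def read_conc(block_list: List[str], block_map: Dict[str, RowAddr],
              pages: Dict[RowAddr, List[int]], offset: int, size: int) -> List[int]:
    if size <= 0:
        return []
    # Step 1 (assumed): virtual blocks that cover positions offset .. offset + size - 1
    first = offset // PAGE_SIZE
    last = (offset + size - 1) // PAGE_SIZE
    blocks_to_read = block_list[first:last + 1]
    buffer = [0] * (PAGE_SIZE * len(blocks_to_read))
    for i, vblock in enumerate(blocks_to_read):      # Step 2: each index i
        row_addr = block_map[vblock]                 # Step 3: page address of the ith block
        start = PAGE_SIZE * i                        # Step 4: buffer slot [from, to]
        end = start + PAGE_SIZE - 1
        buffer[start:end + 1] = pages[row_addr]      # Step 5: flash read from column 0
    return buffer

inode = ["v0", "v1", "v2"]
block_map = {"v0": (0, 0, 1), "v1": (0, 1, 0), "v2": (0, 0, 0)}
pages = {(0, 0, 1): [1, 1, 1, 1], (0, 1, 0): [2, 2, 2, 2], (0, 0, 0): [3, 3, 3, 3]}
print(read_conc(inode, block_map, pages, 0, 12))
# [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
```

Reading all twelve elements touches three pages, matching the three flash reads in the animated example.
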
Wear-leveling example
Client sends a write request to overwrite data in VBlk1 with 0110
Simple approach: Erase Block2 & program Page5

Non-wear-leveling approach: Step 1
Step 1: Erase Block2

Non-wear-leveling approach: Step 2
Step 2: Program 0110 into Page5. Done.

Why wear-level?
What’s wrong with the simple approach?
1. Frequent requests on VBlk1: Block2 wears out quickly
2. H/W failure: Original data in Page5 is lost

Wear-leveling approach
Client sends a write request to overwrite data in VBlk1 with 0110
Wear-leveling approach: Search for a free page & program

Wear-leveling approach: Step 1
Step 1: Program 0110 into a free page, Page3

Wear-leveling approach: Step 2
Step 2: Invalidate Page5 & validate Page3

Wear-leveling approach: Step 3
Step 3: Update blockMap

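The three steps translate into a few dictionary updates. The Python sketch below is an assumption-laden illustration: pages are identified by plain integers, the lowest-numbered free page is chosen (the slides only say to search for a free page), and the old contents of Page5 are invented for the example.

```python
from typing import Dict, List, Optional

def wear_leveled_write(block_map: Dict[str, int], page_status: Dict[int, str],
                       page_data: Dict[int, Optional[List[int]]],
                       vblock: str, data: List[int]) -> int:
    old_page = block_map[vblock]
    free_pages = sorted(p for p, s in page_status.items() if s == "Free")
    if not free_pages:
        raise RuntimeError("no free page left: erase-unit reclamation is needed")
    new_page = free_pages[0]
    # Step 1: program the data into the free page (program marks it Allocated)
    page_data[new_page] = list(data)
    page_status[new_page] = "Allocated"
    # Step 2: invalidate the old page and validate the new one
    page_status[old_page] = "Invalid"
    page_status[new_page] = "Valid"
    # Step 3: point the virtual block at the new page
    block_map[vblock] = new_page
    return new_page

block_map = {"VBlk1": 5}
page_status = {3: "Free", 5: "Valid"}
page_data = {3: None, 5: [1, 0, 0, 1]}   # old contents of Page5 are made up
wear_leveled_write(block_map, page_status, page_data, "VBlk1", [0, 1, 1, 0])
print(block_map)    # {'VBlk1': 3}
print(page_status)  # {3: 'Valid', 5: 'Invalid'}
```

No block is erased during the write, and the old data stays in Page5 until that block is reclaimed, which addresses both problems listed for the simple approach.
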
Erase-unit reclamation example
Client sends a write request to append 0101 at the end of the inode
Problem: Flash is out of free pages (besides reserved ones)

Erase-unit reclamation: Step 1
Step 1: Pick the dirty block with the lowest erase count

Erase-unit reclamation: Step 2
Step 2: Relocate valid data to reserveBlock

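Steps 1 and 2 can be sketched in Python as a victim selection followed by a copy. The representation is hypothetical: each block is a dictionary with per-page status and data, and "dirty" is taken to mean "contains at least one Invalid page", which the slides do not define. The reserve block is assumed to be fully free and as large as the victim.

```python
from typing import Dict, List, Optional

def pick_victim(blocks: List[dict]) -> Optional[dict]:
    """Step 1: among dirty blocks, pick the one with the lowest erase count."""
    dirty = [b for b in blocks if "Invalid" in b["status"]]
    if not dirty:
        return None
    return min(dirty, key=lambda b: b["erase_count"])

def relocate_valid(victim: dict, reserve: dict) -> Dict[int, int]:
    """Step 2: copy every Valid page of the victim into free pages of the reserve block."""
    free_slots = [i for i, s in enumerate(reserve["status"]) if s == "Free"]
    moved = {}
    for i, status in enumerate(victim["status"]):
        if status == "Valid":
            j = free_slots.pop(0)
            reserve["data"][j] = victim["data"][i]
            reserve["status"][j] = "Valid"
            moved[i] = j                      # old page index -> new page index
    return moved

blocks = [
    {"name": "Block1", "erase_count": 3, "status": ["Valid", "Invalid"],
     "data": [[1, 0, 0, 1], [0, 0, 0, 0]]},
    {"name": "Block2", "erase_count": 1, "status": ["Invalid", "Valid"],
     "data": [[1, 1, 1, 1], [0, 1, 0, 1]]},
    {"name": "Block3", "erase_count": 0, "status": ["Valid", "Valid"],
     "data": [[0, 0, 1, 1], [1, 1, 0, 0]]},
]
reserve = {"name": "reserve", "erase_count": 0, "status": ["Free", "Free"],
           "data": [None, None]}

victim = pick_victim(blocks)
print(victim["name"])                   # Block2
print(relocate_valid(victim, reserve))  # {1: 0}
```

Block3 has the lowest erase count overall but is skipped because it holds no invalid pages, so erasing it would reclaim nothing.
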