A Wander through GHC’s New IO library
Simon Marlow

The 100-mile view
• the API changes:
  • Unicode
    • putStr "A légpárnás hajóm tele van angolnákkal" works! (if your editor is set up right…)
    • locale-encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
    • changing the encoding on the fly

    hSetEncoding   :: Handle -> TextEncoding -> IO ()
    hGetEncoding   :: Handle -> IO (Maybe TextEncoding)

    data TextEncoding
    latin1, utf8, utf16, utf32, … :: TextEncoding
    mkTextEncoding :: String -> IO TextEncoding
    localeEncoding :: TextEncoding

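As a small illustration of changing the encoding on a Handle, the sketch below writes the same string through two encodings and compares the resulting file sizes. The helper name encodedSize, the scratch-file name, and the use of System.Directory for cleanup are assumptions made for this example and are not part of the talk.

```haskell
import Control.Exception (finally)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical helper: write 'str' to a scratch file using 'enc' and
-- return the number of bytes that ended up on disk.
encodedSize :: TextEncoding -> String -> IO Integer
encodedSize enc str = do
  tmpDir    <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "encoding-demo.txt"
  (do hSetEncoding h enc                      -- change the encoding on the fly
      hSetNewlineMode h noNewlineTranslation  -- keep the byte count platform-independent
      hPutStr h str
      hClose h                                -- flush everything before measuring
      withBinaryFile path ReadMode hFileSize)
    `finally` (hClose h >> removeFile path)   -- hClose on a closed Handle is a no-op

-- | "légpárnás", written with escapes so the source file's own encoding
-- does not matter.
sample :: String
sample = "l\233gp\225rn\225s"
```

With these definitions, encodedSize utf8 sample yields 12 (three of the nine characters need two bytes each), while encodedSize latin1 sample yields 9.
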
The 100-mile view (cont.)
• Better newline support
  • teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

    hSetNewlineMode :: Handle -> NewlineMode -> IO ()

    data Newline = LF    {- "\n"   -}
                 | CRLF  {- "\r\n" -}

    nativeNewline :: Newline

    data NewlineMode = NewlineMode { inputNL  :: Newline,
                                     outputNL :: Newline }

    noNewlineTranslation = NewlineMode { inputNL = LF,   outputNL = LF }
    universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
    nativeNewlineMode    = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }

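The effect of the newline modes can be observed by writing with CRLF output translation and reading the file back under two different input modes. This is a minimal sketch; newlineDemo, readWith, and the scratch-file name are hypothetical names introduced here, and System.Directory is assumed to be available for cleanup.

```haskell
import Control.Exception (finally)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Read a whole file strictly under the given newline mode.
readWith :: NewlineMode -> FilePath -> IO String
readWith mode path = withFile path ReadMode $ \h -> do
  hSetNewlineMode h mode
  s <- hGetContents h
  length s `seq` return s   -- force the lazy read before the Handle closes

-- | Write two lines with "\n" translated to "\r\n" on output, then
-- return the contents as seen without and with input translation.
newlineDemo :: IO (String, String)
newlineDemo = do
  tmpDir    <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "newline-demo.txt"
  (do hSetNewlineMode h NewlineMode { inputNL = LF, outputNL = CRLF }
      hPutStr h "one\ntwo\n"
      hClose h
      raw    <- readWith noNewlineTranslation path  -- sees the "\r\n" pairs
      cooked <- readWith universalNewlineMode path  -- "\r\n" becomes "\n" again
      return (raw, cooked))
    `finally` (hClose h >> removeFile path)
```
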
The 10-mile view
• Unicode codecs:
  • built-in codecs for UTF-8, UTF-16(LE,BE), UTF-32(LE,BE).
  • Other codecs use iconv on Unix systems
  • Built-in codecs only on Windows (no code pages)
    • yet…
  • The pieces for building a codec are provided…

The 10-mile view
• Build your own codec: API in GHC.IO.Encoding

    data BufferCodec from to state = BufferCodec {
        encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
        close    :: IO (),
        getState :: IO state,
        setState :: state -> IO ()
      }

    type TextEncoder state = BufferCodec Char Word8 state
    type TextDecoder state = BufferCodec Word8 Char state

    data TextEncoding = forall dstate estate . TextEncoding {
        mkTextDecoder :: IO (TextDecoder dstate),
        mkTextEncoder :: IO (TextEncoder estate)
      }

• Saving and restoring state is important, since Handles support buffering, random access, and changing encodings.

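One way to see why a codec needs state: a multi-byte character can straddle two buffer fills, so the decoder has to remember the incomplete bytes between calls. The pure sketch below is not the BufferCodec API. Pending, decodeChunk, and the helpers are hypothetical names, and the code assumes well-formed UTF-8 input and performs no validation.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical decoder state: the bytes of an incomplete sequence.
type Pending = [Word8]

-- | Length of the UTF-8 sequence introduced by a lead byte
-- (assumes the byte really is a lead byte).
seqLen :: Word8 -> Int
seqLen b
  | b < 0x80  = 1
  | b < 0xE0  = 2
  | b < 0xF0  = 3
  | otherwise = 4

-- | Decode one complete sequence into a character.
decodeOne :: [Word8] -> Char
decodeOne []       = '\xFFFD'   -- not reached by decodeChunk; keeps the function total
decodeOne (b:rest) = chr (foldl step lead rest)
  where
    leadMask = case length rest of
      0 -> 0x7F
      1 -> 0x1F
      2 -> 0x0F
      _ -> 0x07
    lead       = fromIntegral b .&. leadMask
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)

-- | Decode as much of a chunk as possible, returning the decoded text
-- and the state to carry over to the next chunk.
decodeChunk :: Pending -> [Word8] -> (String, Pending)
decodeChunk pending chunk = go (pending ++ chunk)
  where
    go [] = ([], [])
    go bs@(b:_)
      | length bs < n = ([], bs)   -- incomplete sequence: keep it as state
      | otherwise     = let (cs, st) = go (drop n bs)
                        in (decodeOne (take n bs) : cs, st)
      where n = seqLen b
```

Feeding the bytes 0x68 0xC3 and then 0xA9 0x21 as two chunks leaves 0xC3 pending after the first call; only the second call can produce the 'é'. Dropping the pending state between the calls, as a naive seek or encoding switch would, loses or corrupts that character.
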
The 1-mile view
• Make your own Handles!
• why mkFileHandle, not mkHandle?

    mkFileHandle :: (IODevice dev,      -- type class providing I/O device operations: close, seek, getSize, …
                     BufferedIO dev,    -- type class providing buffered reading/writing
                     Typeable dev)      -- in case we need to take the Handle apart again later
                 => dev
                 -> FilePath            -- for error messages
                 -> IOMode              -- ReadMode/WriteMode/…
                 -> Maybe TextEncoding
                 -> NewlineMode
                 -> IO Handle

IODevice

    -- | I/O operations required for implementing a 'Handle'.
    class IODevice a where
      -- | closes the device. Further operations on the device should
      -- produce exceptions.
      close :: a -> IO ()

      -- | seek to the specified position in the data.
      seek :: a -> SeekMode -> Integer -> IO ()
      seek _ _ _ = ioe_unsupportedOperation

      -- | return the current position in the data.
      tell :: a -> IO Integer
      tell _ = ioe_unsupportedOperation

      -- | returns 'True' if the device is a terminal or console.
      isTerminal :: a -> IO Bool
      isTerminal _ = return False

      … etc …

• Default is for the operation to be unsupported

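From the user's side, these device operations surface through the ordinary System.IO functions. The sketch below exercises a few of them on a file-backed Handle, on the assumption that hSeek, hTell, and hFileSize are ultimately served by the device's seek, tell, and getSize. deviceOpsDemo and the scratch-file name are hypothetical.

```haskell
import Control.Exception (finally)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical demo: returns (seekable?, size in bytes,
-- position after seeking, character found at that position).
deviceOpsDemo :: IO (Bool, Integer, Integer, Char)
deviceOpsDemo = do
  tmpDir    <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "device-demo.txt"
  (do hSetBinaryMode h True      -- no encoding or newline translation
      hPutStr h "0123456789"
      hFlush h                   -- make sure the bytes reach the device
      seekable <- hIsSeekable h
      size     <- hFileSize h
      hSeek h AbsoluteSeek 3
      pos      <- hTell h
      c        <- hGetChar h
      return (seekable, size, pos, c))
    `finally` (hClose h >> removeFile path)
```

A device that leaves seek and tell at their defaults would make the hSeek and hTell calls fail with an unsupported-operation error instead.
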
BufferedIO

    class BufferedIO dev where
      newBuffer         :: dev -> BufferState -> IO (Buffer Word8)
      fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
      fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
      emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

• Device gets to allocate the buffer. This allows the device to choose the buffer to point directly at the data in memory, for example.
• 0-versions are non-blocking; non-0 versions must read or write at least one byte (but may transfer less than the whole buffer)

RawIO

    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
      read             :: a -> Ptr Word8 -> Int -> IO Int
      readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
      write            :: a -> Ptr Word8 -> Int -> IO ()
      writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

    readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
    readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
    writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
    writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

Example: a memory-mapped Handle
• Random-access read/write doesn’t perform very well with ordinary buffered I/O.
• Let’s implement a Handle backed by a memory-mapped file
• We need to
  • define our device type
  • make it an instance of IODevice and BufferedIO
  • provide a way to create instances

Example: memory-mapped files
• Define our device type

    data MemoryMappedFile =
      MemoryMappedFile {
        mmap_fd     :: FD,             -- ordinary file descriptor, provided by GHC.IO.FD
        mmap_addr   :: !(Ptr Word8),   -- address in memory where our file is mapped…
        mmap_length :: !Int,           -- …and its length
        mmap_ptr    :: !(IORef Int)    -- the current file pointer
      }
      deriving Typeable

• Handles have a built-in notion of the “current position” that we have to emulate
• Typeable is one of the requirements for making a Handle

aside: Buffers

    module GHC.IO.Buffer ( Buffer(..), .. ) where

    data Buffer e
      = Buffer {
          bufRaw   :: !(ForeignPtr e),
          bufState :: BufferState,   -- ReadBuffer | WriteBuffer
          bufSize  :: !Int,          -- in elements, not bytes
          bufL     :: !Int,          -- offset of first item in the buffer
          bufR     :: !Int           -- offset of last item + 1
        }

(Diagram: the buffer memory starting at bufRaw, with the data occupying the region between bufL and bufR, and bufSize marking the total capacity.)

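The offset bookkeeping can be played with in isolation. The toy model below keeps only the three integer fields and drops the memory itself. Buf, emptyBuf, elems, available, add, and remove are hypothetical names for this sketch, and resetting both offsets once the buffer drains is an assumption of the model rather than something stated on the slide.

```haskell
-- | Hypothetical toy version of the Buffer bookkeeping (no memory attached).
data Buf = Buf
  { size :: Int   -- capacity, in elements
  , l    :: Int   -- offset of first item in the buffer
  , r    :: Int   -- offset of last item + 1
  } deriving (Eq, Show)

emptyBuf :: Int -> Buf
emptyBuf n = Buf { size = n, l = 0, r = 0 }

-- | Number of items waiting to be consumed.
elems :: Buf -> Int
elems b = r b - l b

-- | Free space left after the last item.
available :: Buf -> Int
available b = size b - r b

-- | A producer appended n items.
add :: Int -> Buf -> Buf
add n b = b { r = r b + n }

-- | A consumer took n items from the front.
remove :: Int -> Buf -> Buf
remove n b
  | l b + n == r b = b { l = 0, r = 0 }    -- drained: start again from offset 0
  | otherwise      = b { l = l b + n }
```

For example, add 5 (emptyBuf 8) holds 5 items with 3 free slots; removing 2 and then 3 items returns it to the empty state.
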
Example: memory-mapped files
• (a) make it an instance of BufferedIO

    instance BufferedIO MemoryMappedFile where
      newBuffer m state = do
        fp <- newForeignPtr_ (mmap_addr m)
        return (emptyBuffer fp (mmap_length m) state)

      -- fillReadBuffer returns the entire file!
      fillReadBuffer m buf = do
        p <- readIORef (mmap_ptr m)
        let l = mmap_length m
        if (p >= l)
          then do return (0, buf{ bufL=p, bufR=p })
          else do writeIORef (mmap_ptr m) l
                  return (l-p, buf{ bufL=p, bufR=l })

      -- flush is a no-op: just remember where to read from next
      flushWriteBuffer m buf = do
        writeIORef (mmap_ptr m) (bufR buf)
        return buf{ bufL = bufR buf }

Example: memory-mapped files
• (b) make it an instance of IODevice

    instance IODevice MemoryMappedFile where
      close = IODevice.close . mmap_fd

      seek m mode val = do
        let sz = mmap_length m
        ptr <- readIORef (mmap_ptr m)
        let off = case mode of
                    AbsoluteSeek -> fromIntegral val
                    RelativeSeek -> ptr + fromIntegral val
                    SeekFromEnd  -> sz + fromIntegral val
        when (off < 0 || off >= sz) $ ioe_seekOutOfRange
        writeIORef (mmap_ptr m) off

      tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

      getSize = return . fromIntegral . mmap_length

      … etc …

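The offset arithmetic in seek can be checked on its own as a pure function. The sketch below mirrors the slide's case analysis and bounds check; seekTarget is a hypothetical name, and Nothing stands in for the ioe_seekOutOfRange failure.

```haskell
import System.IO (SeekMode (..))

-- | Hypothetical pure version of the offset computation: given the file
-- size and the current position, return the new position, or Nothing
-- where the device would report a seek out of range.
seekTarget :: Int -> Int -> SeekMode -> Integer -> Maybe Int
seekTarget sz cur mode val
  | off < 0 || off >= sz = Nothing
  | otherwise            = Just off
  where
    off = case mode of
      AbsoluteSeek -> fromIntegral val
      RelativeSeek -> cur + fromIntegral val
      SeekFromEnd  -> sz + fromIntegral val
```

Note that, as on the slide, an offset equal to the file size is rejected, so seekTarget 100 10 SeekFromEnd 0 is Nothing while seekTarget 100 10 SeekFromEnd (-1) is Just 99.
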
Example: memory-mapped files
• provide a way to create instances

    mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
    mmapFile filepath iomode binary = do
      -- Open the file and mmap() it
      (fd,_devtype) <- FD.openFile filepath iomode
      sz <- IODevice.getSize fd
      addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
      ptr <- newIORef 0
      let m = MemoryMappedFile {
                mmap_fd     = fd,
                mmap_addr   = castPtr addr,
                mmap_length = fromIntegral sz,
                mmap_ptr    = ptr }
      let (encoding, newline)
            | binary    = (Nothing, noNewlineTranslation)
            | otherwise = (Just localeEncoding, nativeNewlineMode)
      -- Call mkFileHandle to build the Handle
      mkFileHandle m filepath iomode encoding newline

Demo…

    $ ./Setup configure
    Configuring mmap-handle-0.0...
    $ ./Setup build
    Preprocessing library mmap-handle-0.0...
    Building mmap-handle-0.0...
    [1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
    Registering mmap-handle-0.0...
    $ ./Setup register --inplace --user
    Registering mmap-handle-0.0...
    $ ghc-pkg list --user
    /home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
        mmap-handle-0.0

Demo…

    $ cat test.hs
    import System.IO
    import System.Posix.IO.MMap
    import System.Environment
    import Data.Char

    main = do
      [file,test] <- getArgs
      h <- if test == "mmap"
              then mmapFile file ReadWriteMode True
              else openBinaryFile file ReadWriteMode
      sequence_ [ do hSeek h SeekFromEnd (-n)
                     c <- hGetChar h
                     hSeek h AbsoluteSeek n
                     hPutChar h c
                | n <- [ 1..10000] ]
      hClose h
      putStrLn "done"

    $ ghc test.hs --make
    [1 of 1] Compiling Main ( test.hs, test.o )
    Linking test ...

Timings…

    $ time ./test /tmp/words file
    done
    0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file

    $ time ./test /tmp/words mmap
    done
    0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap

    $ time ./test ./words file     # ./ is NFS-mounted
    done
    10.44s real 0.20s user 0.52s system 6% ./test tmp file

    $ time ./test ./words mmap     # ./ is NFS-mounted
    done
    0.10s real 0.09s user 0.00s system 93% ./test tmp mmap

More examples
• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• Handle that reads from a ByteString/text
• Handle that reads from text

The -1 mile view
• Inside the IO library
• The file-descriptor functionality is cleanly separated from the implementation of Handles:
  • GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
  • GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
  • GHC.IO.Handle has nothing to do with FDs

Implementation of Handle

    data Handle__
      = forall dev enc_state dec_state .
          (IODevice dev, BufferedIO dev, Typeable dev) =>
        Handle__ {
          haDevice     :: !dev,
          haType       :: HandleType,   -- read/write/append etc.
          haByteBuffer :: !(IORef (Buffer Word8)),
          haCharBuffer :: !(IORef (Buffer CharBufElem)),
          haEncoder    :: Maybe (TextEncoder enc_state),
          haDecoder    :: Maybe (TextDecoder dec_state),
          haCodec      :: Maybe TextEncoding,
          haInputNL    :: Newline,
          haOutputNL   :: Newline,
          .. some other things ..
        }
        deriving Typeable

• Existential: packs up the IODevice, BufferedIO, Typeable dictionaries, and codec state is existentially quantified
• Two buffers: one for bytes, one for Chars.

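The same packing trick can be shown in miniature. In the sketch below, Device is a hypothetical stand-in for IODevice, and AnyHandle, FileDev, MemDev, handleInfo, and asFileDev are names invented for illustration. It shows the two things the existential buys: code that works for any device through the class dictionary, and, thanks to Typeable, a way to take the wrapper apart again when the concrete device type matters.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

-- | Hypothetical stand-in for the IODevice class.
class Device d where
  describe :: d -> String

newtype FileDev = FileDev FilePath
newtype MemDev  = MemDev Int

instance Device FileDev where
  describe (FileDev p) = "file " ++ p

instance Device MemDev where
  describe (MemDev n) = "memory, " ++ show n ++ " bytes"

-- | The device type is hidden; only its dictionaries travel with it.
data AnyHandle = forall dev . (Device dev, Typeable dev) => AnyHandle dev

-- | Works for every device, using only the class methods.
handleInfo :: AnyHandle -> String
handleInfo (AnyHandle d) = describe d

-- | Recover the path if, and only if, the hidden device is a FileDev.
asFileDev :: AnyHandle -> Maybe FilePath
asFileDev (AnyHandle d) = case cast d of
  Just (FileDev p) -> Just p
  Nothing          -> Nothing
```
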
Where to go from here
• This is a step in the right direction, but there is still some obvious ugliness
  • We haven’t changed the external API, only added to it
• There should be a binary I/O layer
  • hPutBuf working on Handles is wrong: binary Handles should have a different type
  • in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
• FilePath should be an abstract type.
  • On Windows, FilePath = String, but on Unix, FilePath = [Word8].
• Should we rethink Handles entirely?
  • OO-style layers: binary IO, buffering, encoding
  • Separate read Handles from write Handles?
    • read/write Handles are a pain