Java NIO
NIO: New I/O • Prior to the J2SE 1.4 release of Java, I/O had become a bottleneck. • JIT performance was reaching the point where one could start to think of Java as a platform for High Performance computation, but the old java.io stream classes had too many software layers to be fast—the specification implied much copying of small chunks of data; there was no way to multiplex data from multiple sources without incurring thread context switches; also there was no way to exploit modern OS tricks for high performance I/O, like memory mapped files.
New I/O changes that by providing: • A hierarchy of dedicated buffer classes that allow data to be moved from the JVM to the OS with minimal memory-to-memory copying, and without expensive overheads like switching byte order; effectively buffer classes give Java a “window” on system memory. • A unified family of channel classes that allow data to be fed directly from buffers to files and sockets, without going through the intermediaries of the old stream classes. • A family of classes to directly implement selection (AKA readiness testing, AKA multiplexing) over a set of channels. • NIO also provides file locking for the first time in Java.
References • The Java NIO software is part of J2SE 1.4 and later, from http://java.sun.com/j2se/1.4 • Online documentation is at: http://java.sun.com/j2se/1.4/nio • There is an authoritative book from O’Reilly: “Java NIO”, Ron Hitchens, 2002
Buffers • A Buffer object is a container for a fixed amount of data. • It behaves something like a byte [] array, but is encapsulated in such a way that the internal storage can be a block of system memory. • Thus adding data to, or extracting it from, a buffer can be a very direct way of getting information between a Java program and the underlying operating system. • All modern OS’s provide virtual memory systems that allow memory space to be mapped to files, so this also enables a very direct and high-performance route to the file system. • The data in a buffer can also be efficiently read from, or written to, a socket or pipe, enabling high performance communication. • The buffer APIs allow you to read or write from a specific location in the buffer directly; they also allow relative reads and writes, similar to sequential file access.
The ByteBuffer Class • The most important buffer class in practice is probably the ByteBuffer class. This represents a fixed-size vector of primitive bytes. • Important methods on this class include: byte get() byte get(int index) ByteBuffer get(byte [] dst) ByteBuffer get(byte [] dst, int offset, int length) ByteBuffer put(byte b) ByteBuffer put(int index, byte b) ByteBuffer put(byte [] src) ByteBuffer put(byte [] src, int offset, int length) ByteBuffer put(ByteBuffer src)
File Position and Limit • Apart from forms with an index parameter, these are all relative operations: they get data from, or insert data into, the buffer starting at the current position in the buffer; they also update the position to point just past the data read or written. The position property is like the file pointer in sequential file access. • The superclass Buffer has methods for explicitly manipulating the position and related properties of buffers, e.g.: int position() Buffer position(int newPosition) int limit() Buffer limit(int newLimit) • The ByteBuffer or Buffer references returned by these various methods are simply references to this buffer object, not new buffers. They are provided to support cryptic invocation chaining. Feel free to ignore them. • The limit property defines either the end of the space available for writing, or how much data has been written to the buffer. • After finishing writing, the flip() method can be called to set limit to the current value of position, and reset position to zero, ready for reading. • Various operations implicitly work on the data between position and limit.
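The interplay of position, limit and flip() described above can be sketched as follows. This is a minimal illustration only; the class name FlipDemo and the values written are made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class FlipDemo {
    // Writes three bytes, flips, reads them back, and records position/limit along the way.
    static int[] run() {
        ByteBuffer buf = ByteBuffer.allocate(8);           // position=0, limit=8, capacity=8
        buf.put((byte) 10).put((byte) 20).put((byte) 30);  // relative puts advance position to 3
        int positionAfterWrite = buf.position();
        buf.flip();                                        // limit=3, position=0
        int limitAfterFlip = buf.limit();
        int sum = 0;
        while (buf.hasRemaining()) {                       // true while position < limit
            sum += buf.get();                              // relative get advances position
        }
        return new int[] { positionAfterWrite, limitAfterFlip, sum, buf.position() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));        // prints [3, 3, 60, 3]
    }
}
```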
Creating Buffers • Four interesting factory methods can be used to create a new ByteBuffer: ByteBuffer allocate(int capacity) ByteBuffer allocateDirect(int capacity) ByteBuffer wrap(byte [] array) ByteBuffer wrap(byte [] array, int offset, int length) These are all static methods of the ByteBuffer class. • allocate() creates a ByteBuffer with an ordinary Java backing array of size capacity. • allocateDirect(), perhaps the most interesting case, creates a direct ByteBuffer, backed by capacity bytes of system memory. • The wrap() methods create ByteBuffers backed by all or part of an array allocated by the user. • The other typed buffer classes (CharBuffer, etc) have similar factory methods, except they don’t support the important allocateDirect() method.
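A short sketch of the factory methods, assuming nothing beyond the standard ByteBuffer API; the class name BufferFactories and the sample array are illustrative. It shows that a wrapped buffer writes through to the user's array, and that wrap(array, offset, length) sets position and limit rather than shrinking the capacity.

```java
import java.nio.ByteBuffer;

class BufferFactories {
    static boolean[] run() {
        ByteBuffer heap = ByteBuffer.allocate(16);          // backed by an ordinary Java array
        ByteBuffer direct = ByteBuffer.allocateDirect(16);  // backed by system memory

        byte[] data = { 1, 2, 3, 4, 5, 6 };
        ByteBuffer wrapped = ByteBuffer.wrap(data, 2, 3);   // position=2, limit=5, capacity=6
        wrapped.put((byte) 99);                             // writes through to data[2]

        return new boolean[] {
            heap.hasArray(),                                // the backing array is accessible
            direct.isDirect(),
            data[2] == 99,
            wrapped.position() == 3 && wrapped.limit() == 5 && wrapped.capacity() == 6
        };
    }

    public static void main(String[] args) {
        for (boolean b : run()) System.out.println(b);
    }
}
```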
Other Primitive Types in ByteBuffers • It is possible to write other primitive types (char, int, double, etc) to a ByteBuffer by methods like: ByteBuffer putChar(char value) ByteBuffer putChar(int index, char value) ByteBuffer putInt(int value) ByteBuffer putInt(int index, int value) … The putChar() methods do relative or absolute writes of the two bytes in a Java char, the putInt() methods write 4 bytes, and so on. • Of course there are corresponding getChar(), getInt(), … methods. • These give you fun, unsafe ways of coercing bytes of one primitive type to another type, by writing data as one type and reading them as another. • But actually this isn’t the interesting bit: this was always possible with the old java.io data streams (DataInputStream, DataOutputStream). • The interesting bit is that the new ByteBuffer class has a method that allows you to set the byte order…
Endian-ness • When identifying a numeric type like int or double with a sequence of bytes in memory, one can either put the most significant byte first (big-endian), or the least significant byte first (little-endian). • Big Endian: Sun Sparc, PowerPC CPU, numeric fields in IP headers,… • Little Endian: Intel processors • In java.io, numeric types were always rendered to stream in big-endian order. • Creates a serious bottleneck when writing or reading numeric types. • Implementations typically must apply byte manipulation code to each item, to ensure bytes are written in the correct order. • In java.nio, the programmer specifies the byte order as a property of a ByteBuffer, by calling one of: myBuffer.order(ByteOrder.BIG_ENDIAN) myBuffer.order(ByteOrder.LITTLE_ENDIAN) myBuffer.order(ByteOrder.nativeOrder()) • Provided the programmer ensures the byte order set for the buffer agrees with the native representation for the local processor, numeric data can be copied between JVM (which will use the native order) and buffer by a straight block memory copy, which can be extremely fast—a big win for NIO.
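To make the effect of the order property concrete, here is a minimal sketch that encodes the same int under both byte orders. The class and method names are illustrative, not part of NIO.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteOrderDemo {
    // Encodes one int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(value);
        return buf.array();
    }

    public static void main(String[] args) {
        // Most significant byte first: [1, 2, 3, 4]
        System.out.println(Arrays.toString(encode(0x01020304, ByteOrder.BIG_ENDIAN)));
        // Least significant byte first: [4, 3, 2, 1]
        System.out.println(Arrays.toString(encode(0x01020304, ByteOrder.LITTLE_ENDIAN)));
        // The order of the local processor, which varies by machine
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

A newly created ByteBuffer is always big-endian, whatever the platform, so code that wants the fast path has to request ByteOrder.nativeOrder() explicitly.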
View Buffers • ByteBuffer has no methods for bulk transfer of arrays other than type byte[]. • Instead, create a view of (a portion of) a ByteBuffer as any other kind of typed buffer, then use the bulk transfer methods on that view. Following methods of ByteBuffer create views: CharBuffer asCharBuffer() IntBuffer asIntBuffer() … • To create a view of just a portion of a ByteBuffer, set position and limit appropriately beforehand—the created view only covers the region between these. • You cannot create views of typed buffers other than ByteBuffer. • You can create another buffer that represents a subsection of any buffer (without changing element type) by using the slice() method. • For example, writing an array of floats to a byte buffer, starting at the current position: float [] array ; … FloatBuffer floatBuf = byteBuf.asFloatBuffer() ; floatBuf.put(array) ;
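One detail worth seeing in a runnable form: a view shares storage with its ByteBuffer but keeps its own position and limit. The sketch below (class name ViewBufferDemo is illustrative) writes floats through a view and reads them back from the underlying byte buffer with absolute gets.

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

class ViewBufferDemo {
    static float[] run() {
        ByteBuffer byteBuf = ByteBuffer.allocate(16);
        FloatBuffer floatBuf = byteBuf.asFloatBuffer();   // view with room for 4 floats
        floatBuf.put(new float[] { 1.5f, 2.5f });         // bulk transfer through the view

        return new float[] {
            byteBuf.getFloat(0),      // 1.5: the data landed in the shared storage
            byteBuf.getFloat(4),      // 2.5
            byteBuf.position(),       // 0: the byte buffer's position did not move
            floatBuf.position()       // 2: the view's position counts floats, not bytes
        };
    }

    public static void main(String[] args) {
        for (float f : run()) System.out.println(f);
    }
}
```

Because the byte buffer's position is not advanced by writes through the view, code that goes on to write the byte buffer to a channel typically has to set position and limit by hand.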
Channels • A channel is a new abstraction in java.nio. • In the package java.nio.channels. • Channels are a high-level version of the file-descriptors familiar from POSIX-compliant operating systems. • So a channel is a handle for performing I/O operations and various control operations on an open file or socket. • For those familiar with conventional Java I/O, java.nio associates a channel with any RandomAccessFile, FileInputStream, FileOutputStream, Socket, ServerSocket or DatagramSocket object. • The channel becomes a peer to the conventional Java handle objects; the conventional objects still exist, and in general retain their role. The channel just provides extra NIO-specific functionality. • NIO buffer objects can be written to or read from channels directly. Channels also play an essential role in readiness selection, discussed in the next section.
Opening Channels • Socket channel classes have static factory methods called open(), e.g.: SocketChannel sc = SocketChannel.open() ; sc.connect(new InetSocketAddress(hostname, portnumber)) ; • In J2SE 1.4, file channels cannot be created directly (Java 7 later added a static FileChannel.open() method); first use conventional Java I/O mechanisms to create a FileInputStream, FileOutputStream, or RandomAccessFile, then apply the new getChannel() method to get an associated NIO channel, e.g.: RandomAccessFile raf = new RandomAccessFile(filename, "r") ; FileChannel fc = raf.getChannel() ;
Using Channels • Any channel that implements the ByteChannel interface (i.e. all channels except ServerSocketChannel) provides a read() and a write() instance method: int read(ByteBuffer dst) int write(ByteBuffer src) • These may look reminiscent of the read() and write() system calls in UNIX: int read(int fd, void* buf, int count) int write(int fd, void* buf, int count) • The Java read() attempts to read from the channel as many bytes as there is room for in the dst buffer (between position and limit). Returns number of bytes actually read, or -1 if end-of-stream. Also updates dst buffer position. • Similarly write() attempts to write to the channel as many bytes as there are remaining in the src buffer. Returns number of bytes actually written, and updates src buffer position.
This example assumes a source channel src and a destination channel dest:
ByteBuffer buffer = ByteBuffer.allocateDirect(BUF_SIZE) ;
while (src.read(buffer) != -1) {
    buffer.flip() ;   // Prepare read buffer for "draining"
    while (buffer.hasRemaining())
        dest.write(buffer) ;
    buffer.clear() ;  // Empty buffer, ready to read next chunk.
}
• Note a write() call (or a read() call) may or may not succeed in transferring the whole buffer in a single call. Hence the need for the inner while loop. • The example introduces two new methods on Buffer: hasRemaining() returns true if position < limit; clear() sets position to 0 and limit to the buffer’s capacity. • Because copying is a common operation on files, FileChannel provides a couple of special methods to do just this: long transferTo(long position, long count, WritableByteChannel target) long transferFrom(ReadableByteChannel src, long position, long count)
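Both copying styles can be tried end to end with temporary files. The sketch below is an illustration under stated assumptions: it opens channels with FileChannel.open() and java.nio.file.Files, which arrived in Java 7 and are not part of the J2SE 1.4 API the slides describe, and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class ChannelCopy {
    // Buffer-based copy: the read/flip/write/clear loop from the slide.
    static void copyWithBuffer(FileChannel src, FileChannel dest) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(4096);
        while (src.read(buffer) != -1) {
            buffer.flip();
            while (buffer.hasRemaining()) dest.write(buffer);
            buffer.clear();
        }
    }

    // transferTo() may move fewer bytes than requested, so loop until all are sent.
    static void copyWithTransfer(FileChannel src, FileChannel dest) throws IOException {
        long size = src.size();
        long done = 0;
        while (done < size) done += src.transferTo(done, size - done, dest);
    }

    // Copies a small temp file both ways; returns true if both copies match the input.
    static boolean demo() {
        try {
            byte[] data = "some bytes to copy".getBytes(StandardCharsets.US_ASCII);
            Path in = Files.createTempFile("nio-src", ".bin");
            Path out1 = Files.createTempFile("nio-dst1", ".bin");
            Path out2 = Files.createTempFile("nio-dst2", ".bin");
            Files.write(in, data);
            try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
                 FileChannel d1 = FileChannel.open(out1, StandardOpenOption.WRITE);
                 FileChannel d2 = FileChannel.open(out2, StandardOpenOption.WRITE)) {
                copyWithBuffer(src, d1);
                copyWithTransfer(src, d2);   // uses absolute positions in src
            }
            boolean same = Arrays.equals(data, Files.readAllBytes(out1))
                        && Arrays.equals(data, Files.readAllBytes(out2));
            Files.delete(in);
            Files.delete(out1);
            Files.delete(out2);
            return same;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```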
Memory-Mapped Files • In modern operating systems one can exploit the virtual memory system to map a physical file into a region of program memory. • Once the file is mapped, accesses to the file can be extremely fast: one doesn’t have to go through read() and write() system calls. • One application might be a Web Server, where you want to read a whole file quickly and send it to a socket. • Problems arise if the file structure is changed while it is mapped—use this technique only for fixed-size files. • This low-level optimization is now available in Java. FileChannel has a method: MappedByteBuffer map(MapMode mode, long position, long size) • mode should be one of MapMode.READ_ONLY, MapMode.READ_WRITE, MapMode.PRIVATE. • The returned MappedByteBuffer can be used wherever an ordinary ByteBuffer can.
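A minimal sketch of map() in use, assuming a throwaway temp file and the Java 7 FileChannel.open() convenience (neither is from the slides). Mapping a region beyond the current end of a file in READ_WRITE mode extends the file, which is why the empty temp file ends up five bytes long.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    // Writes "hello" through a mapping, then reads the file back by ordinary means.
    static String demo() {
        try {
            Path path = Files.createTempFile("mapped", ".bin");
            path.toFile().deleteOnExit();   // the mapping may outlive the channel, so delete lazily
            try (FileChannel fc = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer map = fc.map(FileChannel.MapMode.READ_WRITE, 0, 5);
                map.put("hello".getBytes(StandardCharsets.US_ASCII)); // plain buffer put, no write() call
                map.force();                // ask the OS to flush the mapped pages to the file
            }
            return new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the mode constants live in the nested class FileChannel.MapMode.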
Scatter/Gather • Often called vectored I/O, this just means you can pass an array of buffers to a read or write operation; the overloaded channel instance methods have signatures: long read(ByteBuffer [] dsts) long read(ByteBuffer [] dsts, int offset, int length) long write(ByteBuffer [] srcs) long write(ByteBuffer [] srcs, int offset, int length) • The first form of read() attempts to read enough data to fill all buffers in the array, and divides it between them, in order. • The first form of write() attempts to concatenate the remaining data in all buffers and write it. • The arguments offset and length select a subset of buffers from the arrays (not, say, an interval within buffers).
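As a sketch of vectored I/O, the example below gathers a fixed-size header and a body into one file with a single write(ByteBuffer[]) loop, then scatters them back into two separate buffers. FileChannel is used because it implements both the scattering and gathering interfaces; the file contents, class name and use of Java 7's FileChannel.open() are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ScatterGatherDemo {
    static String demo() {
        try {
            Path path = Files.createTempFile("vectored", ".bin");
            path.toFile().deleteOnExit();
            try (FileChannel fc = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Gathering write: the buffers are drained in array order.
                ByteBuffer header = ByteBuffer.wrap("HDR:".getBytes(StandardCharsets.US_ASCII));
                ByteBuffer body = ByteBuffer.wrap("payload".getBytes(StandardCharsets.US_ASCII));
                ByteBuffer[] srcs = { header, body };
                while (body.hasRemaining()) fc.write(srcs);

                // Scattering read: the first buffer is filled before the second.
                fc.position(0);
                ByteBuffer h = ByteBuffer.allocate(4);
                ByteBuffer b = ByteBuffer.allocate(7);
                ByteBuffer[] dsts = { h, b };
                while (b.hasRemaining() && fc.read(dsts) != -1) { /* keep reading */ }

                return new String(h.array(), StandardCharsets.US_ASCII) + "|"
                     + new String(b.array(), StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints HDR:|payload
    }
}
```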
SocketChannels • As mentioned at the beginning of this section, socket channels are created directly with their own factory methods. • If you want to manage a socket connection as an NIO channel this is the only option. Creating an NIO socket channel implicitly creates a peer java.net socket object, but (contrary to the situation with file handles) the converse is not true. • As with file channels, socket channels can be more complicated to work with than the traditional java.net socket classes, but provide much of the hard-boiled flexibility you get programming sockets in C. • The most notable new facilities are that now socket communications can be non-blocking, they can be interrupted, and there is a selection mechanism that allows a single thread to do multiplex servicing of any number of channels.
Basic Socket Channel Operations • Typical use of a server socket channel follows a pattern like: ServerSocketChannel ssc = ServerSocketChannel.open() ; ssc.socket().bind( new InetSocketAddress(port) ) ; while(true) { SocketChannel sc = ssc.accept() ; … process a transaction with client through sc… } • The client does something like: SocketChannel sc = SocketChannel.open() ; sc.connect( new InetSocketAddress(serverName, port) ) ; … initiate a transaction with server through sc… • The code above will typically be using read() and write() calls on the SocketChannel to exchange data between client and server. • So there are four important operations: accept(), connect(), write(), read() .
Nonblocking Operations • By calling the method channel.configureBlocking(false) ; you put a socket channel into nonblocking mode (calling again with argument true restores blocking mode, and so on). • In non-blocking mode: • A read() operation only transfers data that is immediately available. If no data is immediately available it returns 0. • Similarly, if data cannot be immediately written to a socket, a write() operation will immediately return 0. • For a server socket, if no client is currently trying to connect, the accept() method immediately returns null. • The connect() method is more complicated: in blocking mode a connection attempt generally blocks for some interval waiting for the server to respond. • In non-blocking mode connect() generally returns false. But the negotiation with the server is nevertheless started. The finishConnect() method on the same channel should be called later. It also returns immediately. Repeat until it returns true.
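The connect()/finishConnect() handshake can be sketched against a listener on the local loopback interface. This is an illustration under assumptions: it relies on the loopback network being available, uses the Java 7 bind() and getLocalAddress() methods on ServerSocketChannel, and polls in a tight loop purely for brevity (real code would do other work in between, or wait for OP_CONNECT with a selector).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NonBlockingConnect {
    static boolean demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open()) {
                client.configureBlocking(false);
                // May return true at once for a local connection, otherwise false.
                boolean connected = client.connect(server.getLocalAddress());
                while (!connected) {
                    connected = client.finishConnect();   // returns immediately either way
                }
                return client.isConnected();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```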
Interruptible Operations • The standard channels in NIO are all interruptible. • If a thread is blocked waiting on a channel, and the thread’s interrupt() method is called, the channel will be closed, and the thread will be woken and sent a ClosedByInterruptException. • To avoid race conditions, the same will happen if an operation on a channel is attempted by a thread whose interrupt status is already true. • See the lecture on threads for a discussion of interrupts. • This represents progress over traditional Java I/O, where interruption of blocking operations was not guaranteed.
Other Features of Channels • File channels provide a quite general file locking facility. This is presumably important to many applications (database applications), but less obviously so to HPC operations, so we don’t discuss it here. • There is a DatagramChannel for sending UDP–style messages. This may well be important for high performance communications, but we don’t have time to discuss it. • There is a special channel implementation representing a kind of pipe, which can be used for inter-thread communication.
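The pipe mentioned in the last bullet is java.nio.channels.Pipe. A minimal sketch of inter-thread communication through it follows; the message, class name and use of a Java 8 lambda are choices made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

class PipeDemo {
    static String demo() {
        try {
            Pipe pipe = Pipe.open();

            // Writer thread: sends a short message, then closes the sink end.
            Thread writer = new Thread(() -> {
                try (Pipe.SinkChannel sink = pipe.sink()) {
                    ByteBuffer out = ByteBuffer.wrap("ping".getBytes(StandardCharsets.US_ASCII));
                    while (out.hasRemaining()) sink.write(out);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            // Reader (this thread): read() returns -1 once the sink has been closed.
            ByteBuffer in = ByteBuffer.allocate(16);
            try (Pipe.SourceChannel source = pipe.source()) {
                while (source.read(in) != -1) { /* keep reading */ }
            }
            writer.join();
            in.flip();
            return StandardCharsets.US_ASCII.decode(in).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints ping
    }
}
```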
Readiness Selection • Prior to New I/O, Java provided no standard way of selecting, from a set of possible socket operations, just the ones that are currently ready to proceed, so the ready operations can be immediately serviced. • One application would be in implementing an MPI-like message passing system: in general incoming messages from multiple peers must be consumed as they arrive and fed into a message queue, until the user program is ready to handle them. • Previously one could achieve equivalent effects in Java by doing blocking I/O operations in separate threads, then merging the results through Java thread synchronization. But this can be inefficient because thread context switching and synchronization are quite slow. • One way of achieving the desired effect in New I/O would be to set all the channels involved to non-blocking mode, and use a polling loop to wait until some are ready to proceed. • A more structured, and potentially more efficient, approach is to use Selectors. • In many flavors of UNIX this is achieved by using the select() system call.
Classes Involved in Selection • Selection can be done on any channel extending SelectableChannel. Amongst the standard channels this means the three kinds of socket channel and the two ends of a pipe (file channels are not selectable). • The class that supports the select() operation itself is Selector. This is a sort of container class for the set of channels in which we are interested. • The last class involved is SelectionKey, which is said to represent the binding between a channel and a selector. • In some sense it is part of the internal representation of the Selector, but the NIO designers decided to make it an explicit part of the API.
Setting Up Selectors • A selector is created by the open() factory method. This is naturally a static method of the Selector class. • A channel is added to a selector by calling the method: SelectionKey register(Selector sel, int ops) • This, slightly oddly, is an instance method of the SelectableChannel class; you might have expected the register() method to be a member of Selector. • Here ops is a bit-set representing the interest set for this channel, composed by OR-ing together one or more of: SelectionKey.OP_READ SelectionKey.OP_WRITE SelectionKey.OP_CONNECT SelectionKey.OP_ACCEPT • A channel added to a selector must be in nonblocking mode! • The register() method returns the SelectionKey created. • Since this automatically gets stored in the Selector, in most cases you probably don’t need to save the result yourself.
Example • Here we create a selector, and register three pre-existing channels to the selector: Selector selector = Selector.open() ; channel1.register (selector, SelectionKey.OP_READ) ; channel2.register (selector, SelectionKey.OP_WRITE) ; channel3.register (selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE) ; • For channel1 the interest set is reads only, for channel2 it is writes only, for channel3 it is reads and writes. • Note channel1, channel2, channel3 must all be in non-blocking mode at this time, and must remain in that mode as long as they are registered in any selector. • You remove a channel from a selector by calling the cancel() method of the associated SelectionKey.
select() and the Selected Key Set • To inspect the set of channels, to see what operations are newly ready to proceed, you call the select() method on the selector. • The return value is an integer, which will be zero if no status changes occurred. • More interesting than the return value is the side effect this method has on the set of selected keys embedded in the selector. • To use selectors, you must understand that a selector maintains a Set object representing this selected keys set. • Because each key is associated with a channel, this is equivalent to a set of selected channels. • The set of selected keys is different from (presumably a subset of) the registered key set. • Each time the select() method is called it may add new keys to the selected key set, as operations become ready to proceed. • You, as the programmer, are responsible for explicitly removing keys from the selected key set belonging to the selector, as you deal with operations that have become ready.
Ready Sets • This is quite complicated already, but there is one more complication. • We saw that each key in the registered key set has an associated interest set, which is a subset of the 4 possible operations on sockets. • Similarly each key in the selected key set has an associated ready set, which is a subset of the interest set, representing the actual operations that have been found ready to proceed. • Besides adding new keys to the selected key set, a select() operation may add new operations to the ready set of a key already in the selected key set. • Assuming the selected key set was not cleared after a preceding select(). • You can extract the ready set from a SelectionKey as a bit-set, by using the method readyOps(). Or you can use the convenience methods: isReadable() isWritable() isConnectable() isAcceptable() which effectively return the bits of the ready set individually.
A Pattern for Using select()
… register some channels with selector …
while (true) {
    selector.select() ;
    Iterator<SelectionKey> it = selector.selectedKeys().iterator() ;
    while ( it.hasNext() ) {
        SelectionKey key = it.next() ;
        if ( key.isReadable() )
            … perform read() operation on key.channel()…
        if ( key.isWritable() )
            … perform write() operation on key.channel()…
        if ( key.isConnectable() )
            … complete the connection with finishConnect() on key.channel()…
        if ( key.isAcceptable() )
            … perform accept() operation on key.channel()…
        it.remove() ;
    }
}
Remarks • This general pattern will probably serve for most uses of select(): • Perform select() and extract the new selected key set • For each selected key, handle the actions in its ready set • Remove the processed key from the selected key set • Note the remove() operation on an Iterator removes the current item from the underlying container. • More generally, the code that handles a ready operation may also alter the set of channels registered with the selector • e.g after doing an accept() you may want to register the returned SocketChannel with the selector, to wait for read() or write() operations. • In many cases only a subset of the possible operations read, write, accept, connect are ever in interest sets of keys registered with the selector, so you won’t need all 4 tests.
Key Attachments • One problem with the pattern above is that when it.next() returns a key, there is no convenient way of getting information about the context in which the associated channel was registered with the selector. • For example channel1 and channel3 are both registered for OP_READ. But the action that should be taken when the read becomes ready may be quite different for the two channels. • You need a convenient way to determine which channel the returned key is bound to. • You can specify an arbitrary object as an attachment to the key when you create it; later when you get the key from the selected set, you can extract the attachment, and use its content to decide what to do. • At its most basic the attachment might just be an index identifying the channel.
Simplistic Use of Key Attachments
channel1.register (selector, SelectionKey.OP_READ,
                   Integer.valueOf(1) ) ;   // attachment
…
channel3.register (selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE,
                   Integer.valueOf(3) ) ;   // attachment
…
while (true) {
    …
    Iterator<SelectionKey> it = selector.selectedKeys().iterator() ;
    …
    SelectionKey key = it.next() ;
    if ( key.isReadable() )
        // The attachment belongs to the key, not to the channel.
        switch ( ((Integer) key.attachment() ).intValue() ) {
            case 1 : … action appropriate to channel1 … break ;
            case 3 : … action appropriate to channel3 … break ;
        }
    …
}
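The selection loop and the attachment mechanism can be exercised without any network by registering the readable ends of two pipes. The sketch below is illustrative only: the attachment strings "pipe-a" and "pipe-b", the byte value written, and the class name are invented, and String attachments stand in for the Integer indices used on the slide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class AttachmentDemo {
    static String demo() {
        try (Selector selector = Selector.open()) {
            Pipe a = Pipe.open();
            Pipe b = Pipe.open();
            // Channels must be non-blocking before they are registered.
            a.source().configureBlocking(false);
            b.source().configureBlocking(false);
            a.source().register(selector, SelectionKey.OP_READ, "pipe-a");  // attachment
            b.source().register(selector, SelectionKey.OP_READ, "pipe-b");  // attachment

            // Make only the second pipe readable.
            b.sink().write(ByteBuffer.wrap(new byte[] { 42 }));

            StringBuilder seen = new StringBuilder();
            selector.select();                       // blocks until at least one key is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();                         // we are responsible for clearing the selected set
                if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(8);
                    ((ReadableByteChannel) key.channel()).read(buf);
                    // The attachment tells us which registration this key came from.
                    seen.append(key.attachment()).append(':').append(buf.get(0));
                }
            }
            a.sink().close();
            a.source().close();
            b.sink().close();
            b.source().close();
            return seen.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints pipe-b:42
    }
}
```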
For Client/Server • Don’t want processor to wait too long on network • Traditional: Multiple threads generate data that is stored in buffers until network is ready • Overhead of thread syncing and thread creation • Need support by OS
import java.nio.*;
import java.nio.channels.*;
import java.net.*;
import java.io.IOException;

public class ChargenClient {

    public static int DEFAULT_PORT = 19;

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java ChargenClient host [port]");
            return;
        }
        int port;
        try {
            port = Integer.parseInt(args[1]);
        } catch (Exception ex) {
            // No port argument, or not a number: fall back to the default
            port = DEFAULT_PORT;
        }
        try {
            SocketAddress address = new InetSocketAddress(args[0], port);
            SocketChannel client = SocketChannel.open(address);
            ByteBuffer buffer = ByteBuffer.allocate(74);
            WritableByteChannel out = Channels.newChannel(System.out);
            while (client.read(buffer) != -1) {
                buffer.flip();
                out.write(buffer);
                buffer.clear();
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
import java.nio.*;
import java.nio.channels.*;
import java.net.*;
import java.util.*;
import java.io.IOException;

public class ChargenServer {

    public static int DEFAULT_PORT = 19;

    public static void main(String[] args) {
        int port;
        try {
            port = Integer.parseInt(args[0]);
        } catch (Exception ex) {
            port = DEFAULT_PORT;
        }
        System.out.println("Listening for connections on port " + port);

        byte[] rotation = new byte[95*2];
        for (byte i = ' '; i <= '~'; i++) {
            rotation[i-' '] = i;
            rotation[i+95-' '] = i;
        }

        ServerSocketChannel serverChannel;
        Selector selector;
        try {
            serverChannel = ServerSocketChannel.open();
            ServerSocket ss = serverChannel.socket();
            InetSocketAddress address = new InetSocketAddress(port);
            ss.bind(address);
            serverChannel.configureBlocking(false);
            selector = Selector.open();
            serverChannel.register(selector, SelectionKey.OP_ACCEPT);
        } catch (IOException ex) {
            ex.printStackTrace();
            return;
        }

        while (true) {
            try {
                selector.select();
            } catch (IOException ex) {
                ex.printStackTrace();
                break;
            }

            Set readyKeys = selector.selectedKeys();
            Iterator iterator = readyKeys.iterator();
            while (iterator.hasNext()) {
                SelectionKey key = (SelectionKey) iterator.next();
                iterator.remove();
                try {
                    if (key.isAcceptable()) {
                        ServerSocketChannel server = (ServerSocketChannel) key.channel();
                        SocketChannel client = server.accept();
                        System.out.println("Accepted connection from " + client);
                        client.configureBlocking(false);
                        SelectionKey key2 = client.register(selector, SelectionKey.OP_WRITE);
                        ByteBuffer buffer = ByteBuffer.allocate(74);
                        buffer.put(rotation, 0, 72);
                        buffer.put((byte) '\r');
                        buffer.put((byte) '\n');
                        buffer.flip();
                        key2.attach(buffer);
                    } else if (key.isWritable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buffer = (ByteBuffer) key.attachment();
                        if (!buffer.hasRemaining()) {
                            // Refill the buffer with the next line
                            buffer.rewind();
                            // Get the old first character
                            int first = buffer.get();
                            // Get ready to change the data in the buffer
                            buffer.rewind();
                            // Find the new first character's position in rotation
                            int position = first - ' ' + 1;
                            // Copy the data from rotation into the buffer
                            buffer.put(rotation, position, 72);
                            // Store a line break at the end of the buffer
                            buffer.put((byte) '\r');
                            buffer.put((byte) '\n');
                            // Prepare the buffer for writing
                            buffer.flip();
                        }
                        client.write(buffer);
                    }
                } catch (IOException ex) {
                    key.cancel();
                    try {
                        key.channel().close();
                    } catch (IOException cex) {}
                }
            }
        }
    }
}
The behavior of the file lock is platform-dependent. On some platforms, the file lock is advisory, which means that unless an application checks for a file lock, it will not be prevented from accessing the file. On other platforms, the file lock is mandatory, which means that a file lock prevents any other application from accessing the file.
try {
    // Get a file channel for the file
    File file = new File("filename");
    FileChannel channel = new RandomAccessFile(file, "rw").getChannel();

    // Use the file channel to create a lock on the file.
    // This method blocks until it can retrieve the lock.
    FileLock lock = channel.lock();

    // Alternatively, try acquiring the lock without blocking. tryLock() returns
    // null if another program holds an overlapping lock, and throws an exception
    // if this virtual machine already holds one (as it does here, after lock() above).
    try {
        lock = channel.tryLock();
    } catch (OverlappingFileLockException e) {
        // File is already locked in this thread or virtual machine
    }

    // Release the lock
    lock.release();

    // Close the file
    channel.close();
} catch (Exception e) {
}
By default, a file lock is exclusive, which means that once acquired, no other access is permitted. A shared file lock allows other shared locks on the file (but no exclusive locks). Note: Some platforms do not support shared locks, in which case the lock is automatically made into an exclusive lock. Use FileLock.isShared() to determine the type of the lock.
try {
    // Obtain a file channel
    File file = new File("filename");
    FileChannel channel = new RandomAccessFile(file, "rw").getChannel();

    // Create a shared lock on the file.
    // This method blocks until it can retrieve the lock.
    FileLock lock = channel.lock(0, Long.MAX_VALUE, true);

    // Alternatively, try acquiring a shared lock without blocking. tryLock() returns
    // null if another program holds an exclusive lock, and throws an exception
    // if this virtual machine already holds an overlapping lock (as it does here).
    try {
        lock = channel.tryLock(0, Long.MAX_VALUE, true);
    } catch (OverlappingFileLockException e) {
        // File is already locked in this thread or virtual machine
    }

    // Determine the type of the lock
    boolean isShared = lock.isShared();

    // Release the lock
    lock.release();

    // Close the file
    channel.close();
} catch (Exception e) {
}
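A compact variant of the locking examples, written as a sketch rather than a recommended implementation. It assumes Java 7 or later (FileChannel.open(), and FileLock usable in try-with-resources) and a local temp file; lock behavior on network file systems may differ. It also shows that a second overlapping request from the same virtual machine is rejected with OverlappingFileLockException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileLockDemo {
    // Returns { lock acquired, overlapping request rejected, lock is exclusive }.
    static boolean[] demo() {
        try {
            Path path = Files.createTempFile("lock", ".tmp");
            path.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(path,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
                 // Exclusive lock on the whole file; null if another program holds it.
                 FileLock lock = channel.tryLock()) {
                boolean acquired = lock != null && lock.isValid();
                boolean overlapRejected = false;
                try {
                    channel.tryLock();   // same JVM, overlapping region
                } catch (OverlappingFileLockException e) {
                    overlapRejected = true;
                }
                boolean exclusive = lock != null && !lock.isShared();
                return new boolean[] { acquired, overlapRejected, exclusive };
            }   // the lock is released, then the channel closed, automatically
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean b : demo()) System.out.println(b);
    }
}
```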
Direct vs. non-direct buffers A byte buffer is either direct or non-direct. Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations. A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance. Whether a byte buffer is direct or non-direct may be determined by invoking its isDirect method. This method is provided so that explicit buffer management can be done in performance-critical