Network Memory Servers: An idea whose time has come Glenford Mapp, David Silcott, Dhawal Thakker
Motivation • Networks are now much faster than disks • Should be quicker to get data from the memory of another computer compared to using local disk • Not a new idea - so what’s different?
What’s different? • Networks are faster and cheaper • Gigabit NICs are £35.00 • We could also see 10G NICs in the near future • Memory is also cheaper • 1 GB = £100.00 • Prices likely to remain stable • Availability of good “free” OSes • Linux and FreeBSD
Our approach is also different • Previous approaches • Dominated by the Distributed Shared Memory crowd (Apollo System) • DSM never became mainstream • lots of fundamental changes to the OS platform required • Exotic hardware (e.g. Scalable Coherent Interconnect, SCI) • Network memory became a casualty of this failure
Previous approaches cont’d • Remote paging was also one of the key areas (SAMSON project, NYU) • Idle machines approach • Use the memory of other machines in the network when no one is logged on, but get off when the person returns • Very complex • how do you give guarantees to everyone?
Our Approach • Applied Engineering Approach • what are the real numbers in this area • Use the power of the Network • use standard networking approach • No DSM, no virtual memory plug-ins • Client-Server approach • Dedicated servers with loads of memory
Design of the Network Memory Server (NMS) • NMS has an independent interface • Can interface with any OS • not like Network Block Device (NBD) in Linux • NMS is stateless • Does not keep track of previous interactions • Actions of the NMS are regarded as atomic • Either complete success or total failure
Design of NMS cont’d • NMS deals with blocks of data • Has no idea how the blocks are being used • Not like NFS • Each block is uniquely identified by a block_id allocated by the NMS • Each client is uniquely identified by a client_id
Block_ids • 64-bit entities • 32-bit minor index • 16-bit major index • 16-bit security tag • generated when the blocks are created • checked before any read/write operation on a block
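A minimal C sketch of how such a 64-bit block_id might be packed and unpacked. The field order and helper names are assumptions; the slides only give the field widths:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the 64-bit block_id:
 * [32-bit minor index][16-bit major index][16-bit security tag].
 * The actual field order is not specified in the slides. */
typedef uint64_t block_id_t;

static inline block_id_t make_block_id(uint32_t minor, uint16_t major,
                                       uint16_t tag)
{
    return ((block_id_t)minor << 32) |
           ((block_id_t)major << 16) |
           (block_id_t)tag;
}

/* The security tag is extracted and checked before any read/write. */
static inline uint16_t block_id_tag(block_id_t id)
{
    return (uint16_t)(id & 0xFFFF);
}
```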
NMS calls • GetBlockMemory(client_id, size, nblocks, options) • Creates a number of blocks of a certain size with consecutive block_ids • returns the starting block_id • options - backup • Release(client_id, block_id, nblocks) • Releases a number of consecutive block_ids
NMS calls cont’d • WriteBlockMemory(client_id, block_id, offset, length, *buf) • writes data in buffer to a block on the server • ReadBlockMemory(client_id, block_id, offset, length, *buf) • reads data from a block on the server into a buffer
NMS calls cont’d • GetClientid(password) • creates a new client • GetMasterBlock(password, client_id) • returns a number of blocks of sector/block_id mappings • StoreMasterBlock(block_id, client_id, password, nblocks) • stores a number of sector/block_id mappings
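Taken together, the allocation and data-transfer calls behave roughly like the following single-client, in-memory C mock. All names, types, and sizes here are illustrative assumptions, not the actual kernel-module implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy in-memory stand-in for the NMS: a fixed pool of blocks.
 * block_ids start at 1; no network, no security tags. */
#define POOL_BLOCKS 64
#define BLOCK_SIZE  4096

static unsigned char pool[POOL_BLOCKS][BLOCK_SIZE];
static int           in_use[POOL_BLOCKS];
static uint64_t      next_id = 1;

/* Create nblocks consecutive blocks; return the starting block_id. */
uint64_t GetBlockMemory(uint32_t client, unsigned nblocks)
{
    (void)client;
    uint64_t first = next_id;
    for (unsigned i = 0; i < nblocks; i++)
        in_use[next_id++ - 1] = 1;
    return first;
}

/* Atomic semantics: either the whole write succeeds or it fails. */
int WriteBlockMemory(uint32_t client, uint64_t id, size_t off,
                     size_t len, const void *buf)
{
    (void)client;
    if (!in_use[id - 1] || off + len > BLOCK_SIZE)
        return -1;
    memcpy(&pool[id - 1][off], buf, len);
    return 0;
}

int ReadBlockMemory(uint32_t client, uint64_t id, size_t off,
                    size_t len, void *buf)
{
    (void)client;
    if (!in_use[id - 1] || off + len > BLOCK_SIZE)
        return -1;
    memcpy(buf, &pool[id - 1][off], len);
    return 0;
}

/* Release a run of consecutive block_ids. */
void Release(uint32_t client, uint64_t first, unsigned nblocks)
{
    (void)client;
    for (unsigned i = 0; i < nblocks; i++)
        in_use[first - 1 + i] = 0;
}
```

Because the server is stateless, each call carries everything it needs (client_id, block_id, offset, length); nothing depends on a previous interaction.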
NMS Client • How does a client use the NMS? • What interface is presented to the OS? • The interface is the one used to support hard disks. In Linux, we use the block device interface • So the OS sees the NMS service as a fast hard disk
NMS Client cont’d • So the OS tells the NMS client to read and write sectors. • NMS client will take sectors and map them onto blocks which it gets from the NMS • When block device is unmounted, we must store the sector/block_id mappings on the NMS
NMS Cont’d • The StoreMasterBlock call stores these mappings on the NMS • When the device is remounted, it must first get the sector/block_id mappings from the NMS and rebuild the sector table. • The GetMasterBlock call retrieves the mappings from the NMS
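One way to picture the master block is as a flat array of sector/block_id pairs that is packed on unmount and unpacked on remount. The sketch below assumes that layout; the slides do not specify the actual format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical master-block payload: sector -> block_id mappings. */
struct mapping {
    uint64_t sector;
    uint64_t block_id;
};

/* Serialise n mappings into buf (caller ensures it is large enough);
 * returns the number of bytes written, as if for StoreMasterBlock. */
size_t pack_master_block(const struct mapping *maps, size_t n,
                         unsigned char *buf)
{
    memcpy(buf, maps, n * sizeof(*maps));
    return n * sizeof(*maps);
}

/* Rebuild the mapping table from a retrieved master block,
 * as if after GetMasterBlock; returns the number of mappings. */
size_t unpack_master_block(const unsigned char *buf, size_t nbytes,
                           struct mapping *maps)
{
    size_t n = nbytes / sizeof(*maps);
    memcpy(maps, buf, n * sizeof(*maps));
    return n;
}
```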
NMS Client Cache • Client also has a cache of blocks that are used to store recently used sectors • this is a secondary cache as the main caching is really done by the Unix Buffer Cache • Design decision to keep our cache as a simple round-robin cache - • replace the next item pointed to in the cache
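The round-robin policy can be sketched in a few lines of C: a fixed array of entries and a cursor that always points at the next victim. The entry fields and cache size are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 4

struct cache_entry {
    uint64_t sector;
    int      valid;
    int      dirty;
};

static struct cache_entry cache[CACHE_SLOTS];
static unsigned           cursor;   /* next slot to replace */

/* Install a sector in the slot under the cursor and advance it.
 * A real client would first flush the victim to the write-out
 * queue if it is dirty. Returns the slot used. */
unsigned cache_replace(uint64_t sector)
{
    unsigned slot = cursor;
    cursor = (cursor + 1) % CACHE_SLOTS;
    cache[slot].sector = sector;
    cache[slot].valid  = 1;
    cache[slot].dirty  = 0;
    return slot;
}
```

The appeal of round-robin here is that the Unix buffer cache already captures most temporal locality, so a cheap O(1) policy in the secondary cache is a reasonable trade-off.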
NMS Client Operations • Since we are not a normal disk, we do not need to rearrange read and write operations • So we attempt to read and write blocks as the requests come in • We also developed a write-out operation: a special thread, the write-out thread, writes modified blocks back to the NMS
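A write-out thread of this kind might look like the following pthread sketch, where the “write” simply counts drained blocks. The names and queue size are illustrative; a real client would issue the NMS write call for each dequeued block:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define QLEN 16

static uint64_t        queue[QLEN];
static int             head, tail, done;
static int             flushed;        /* blocks "written" so far */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Called when a modified cache entry is evicted. */
void queue_dirty_block(uint64_t block_id)
{
    pthread_mutex_lock(&lock);
    queue[tail++ % QLEN] = block_id;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Background thread: drain the queue, writing each block out. */
void *writeout_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (head == tail && !done)
            pthread_cond_wait(&cond, &lock);
        if (head == tail && done)
            break;
        uint64_t id = queue[head++ % QLEN];
        (void)id;      /* a real client would write block id to the NMS */
        flushed++;
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Drain remaining blocks, then stop the thread. */
void writeout_shutdown(pthread_t t)
{
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
}
```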
NMS Client Implementation (layered diagram) • Programs • Operating System • Unix Buffer Cache • Block Device Interface • Sector/Block_id Hash Table • NMS Block Device Cache (two levels) • Write-Out Queue
Getting a sector (flowchart) • Sector not in the hash table: get a block_id from the NMS and put an entry in the hash table; a read at this point returns rubbish (the sector has never been written) • Sector in the hash table and in the cache: read from / write to the cache entry • Sector in the hash table but not cached: if the cache is full, replace the next entry; if the replaced entry has been modified, put it on the write-out queue; otherwise get a new cache entry • Then, for a read, get the data from the NMS server and put it in the cache entry; for a write, write the data to the cache entry
Structures on NMS Server (diagram) • Allocated memory is split into memory for internal use by the NMS and memory for clients • Internal structures: Client_id hash table and a two-level Block_id hash table
Testing and Evaluation • What do we really want to know? • What does it take to operate faster than a hard disk? • Can you use standard hardware? (Middlesex) • Do you need special hardware? (Cambridge) • Level 5 Networks • What are the key parameters in this space?
What do you measure? • What happens if we change the block size of the data transfer? • What happens if we change the number of units transferred in one transfer? • Added a multi-write operation • Is local caching any good? • What is the network traffic like?
Using Iozone • Iozone is quite popular • Measures the memory hierarchy • Disk particulars • 60 GB, 2MB buffer, 7200 RPM, Seek Time 9.0 ms, Average latency 4.16ms • Network - • using Intel E1000 NICs and Netgear Gigabit Switch (GS 104); using UDP port 6111 • NMS client and server implemented as Linux kernel modules
Conclusions and Future • We can beat the disk • Will compare these results with those using Level 5 hardware (Rip Sohan, LCE) • Open source release planned • Developing a Network Storage Server • Building prototypes • running Linux and Windows using NMS