Project Mimir • A Distributed Filesystem • Uses Rateless Erasure Codes for Reliability • Uses Pastry's Multicast System, Scribe, for Resource Discovery and Utilization
Erasure Codes • An erasure code transforms a message of n blocks into a message with more than n blocks, such that the original message can be recovered from a subset of those blocks. The fraction of the blocks required for recovery is called the rate, denoted r.
Rateless Erasure Codes • Also called fountain codes • Rateless because they can produce an effectively infinite stream of encoded blocks • Most rateless codes are sub-optimal: a file of n blocks can be decoded from any m encoded blocks, where m ≥ (1 + ε)n
Luby Transform (LT) Codes • Low density • The first rateless erasure code discovered • Encodes a block by randomly selecting a degree d (1 ≤ d ≤ n) and XORing d randomly chosen un-encoded (source) blocks • Decodes blocks by the principle that if A XOR B = C, then A XOR C = B and B XOR C = A
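A minimal sketch of this encoding step. The degree here is drawn uniformly for simplicity (practical LT codes use a soliton distribution), and the block sizes and seed are only illustrative:

```python
import random

def lt_encode_block(source_blocks, rng):
    """Produce one LT-encoded block: pick a degree d in [1, n], choose d
    distinct source blocks, and XOR them together."""
    n = len(source_blocks)
    d = rng.randint(1, n)                 # degree of this encoded block
    indices = rng.sample(range(n), d)     # which source blocks to combine
    block = bytearray(source_blocks[indices[0]])
    for i in indices[1:]:
        for j, byte in enumerate(source_blocks[i]):
            block[j] ^= byte              # XOR the chosen blocks together
    return indices, bytes(block)

# Example: four 4-byte source blocks; returns the index list and the payload.
source = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]
indices, encoded = lt_encode_block(source, random.Random(42))
```

The decoding principle on the slide is simply XOR's invertibility: if C = A XOR B, then XORing C with either operand recovers the other.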
Encoded Block Identification • The decoder must be able to identify an encoded block's degree and which source blocks were used to encode it • Option 1: pass along the seed used by the random number generator so the decoder can regenerate an identical encoding (adds decoder overhead; all encoders and decoders must use the same random number generators) • Option 2: attach a binary header to each encoded block, where each bit indicates whether a specific source block was used in the encoding (adds network overhead; the header is n bits long)
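A sketch of the seed-based option, reusing lt_encode_block from above: the decoder replays the same random number generator calls to recover the degree and indices. Identical RNG implementations on both sides are assumed, as the slide notes.

```python
import random

def encode_with_seed(source_blocks, seed):
    """Encoder side: transmit (seed, payload) instead of an explicit header."""
    indices, block = lt_encode_block(source_blocks, random.Random(seed))
    return seed, block

def identify_block(n, seed):
    """Decoder side: replay the same RNG calls to recover the degree and the
    indices of the source blocks that were XORed into the payload."""
    rng = random.Random(seed)
    d = rng.randint(1, n)
    return rng.sample(range(n), d)
```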
Common Uses of LT Codes • One-way communication protocols over noisy channels • File streaming (IPTV and other media) • Long-distance communication (satellite communications, mobile phones, high-latency networks)
LT Codes in Mimir • Encoded blocks are striped evenly across a large network of computers • Generate x·n encoded blocks; decoding succeeds even when a fraction 1 − (2/x) of all network nodes fail • Distributed disk space equals real disk space / x • Example: 50 computers × 100 GB each = 5 TB raw; with x = 4, distributed disk space = 1.25 TB • 100% reliability while at least 25 of the 50 computers are online
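The slide's numbers follow from simple arithmetic; a small check of the capacity and failure-tolerance figures (function name and units are illustrative):

```python
def mimir_capacity(num_nodes, disk_per_node_gb, x):
    """Raw capacity, usable capacity with x-times encoding overhead, and the
    minimum number of nodes (a fraction 2/x of the network) that must stay online."""
    raw_gb = num_nodes * disk_per_node_gb
    usable_gb = raw_gb / x
    min_online = num_nodes * 2 // x
    return raw_gb, usable_gb, min_online

# 50 computers * 100 GB with x = 4  ->  (5000 GB raw, 1250.0 GB usable, 25 nodes online)
print(mimir_capacity(50, 100, 4))
```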
Challenges • New data cannot be encoded unless the file is reconstructed first, so a high-churn network requires a lot of computation • Decoding is still probabilistic, although the probability of success is extremely high: greater than 99.99999% at the failure limit
Modifications to the LT Code • Modified the LT code to guarantee each source block is encoded an equal number of times (evenly saturated); RobuStore also does this • Encoded blocks are distributed evenly according to block degree • Modified degree distribution: spiking (n unique blocks with degree 1) and an offset distribution
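One plausible way to realize the spiking and even-saturation ideas; the exact distribution used by Mimir and RobuStore is not given here, so the selection rule below is an assumption:

```python
import random
from collections import Counter

def spiked_even_encoder(source_blocks, num_encoded, rng):
    """Illustrative sketch only: emit a 'spike' of n degree-1 blocks, then
    prefer the least-used source blocks so every block is encoded roughly
    an equal number of times (even saturation)."""
    n = len(source_blocks)
    usage = Counter()
    encoded = []

    # Spike: n unique blocks with degree 1 (each source block sent once as-is).
    for i in range(n):
        usage[i] += 1
        encoded.append(([i], source_blocks[i]))

    # Remaining blocks: sample a degree, then pick the least-used sources first.
    for _ in range(num_encoded - n):
        d = rng.randint(1, n)
        candidates = sorted(range(n), key=lambda i: (usage[i], rng.random()))
        indices = candidates[:d]
        block = bytearray(source_blocks[indices[0]])
        for i in indices[1:]:
            for j, byte in enumerate(source_blocks[i]):
                block[j] ^= byte
        for i in indices:
            usage[i] += 1
        encoded.append((indices, bytes(block)))
    return encoded

# Three source blocks, nine encoded blocks: every source is used evenly.
blocks = spiked_even_encoder([b"AB", b"CD", b"EF"], 9, random.Random(7))
```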
Application-Level Anycast/Multicast • Issues with network-level multicast • Uses for multicast: publish/subscribe architecture, resource advertisement/discovery, mass content distribution
Issues with Network-Level Multicast • Difficult to set up • Does not handle large numbers of multicast and anycast groups • Does not handle very dynamic networks • Often will not work over the Internet
Content-Based Publish/Subscribe • Allows for expandable network architectures • Allows for conditional matching and event notification • Allows for fault-tolerant networks • Needs a Distributed Multidimensional Matching (DMM) algorithm to map the publish/subscribe problem to one of multidimensional indexing
Distributed Multidimensional Matching • Requires: each attribute has a known domain, a known finest granularity, and a known global order of the attributes • The mapping: a d-dimensional space S, where d = number of attributes • every attribute ai maps to a dimension di in S • S is managed by a binary search tree that is a recursive subdivision of S into regions through (d − 1)-dimensional hyperplanes • each hyperplane divides a region in half • each region r has a corresponding node n(r) in the search tree
DMM Continued • Each region is addressed by a bit string called a z-code and is associated with one node in the tree • a subscription s is stored at all leaf nodes n(ri) in the search tree such that ri intersects s • the information of each node in the tree is stored at the peer p(r) in the DHT • Subscriptions are sent to the root peer and flow down to the appropriate leaf nodes
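A sketch of the region mapping: the attribute space is treated here as a unit hypercube (an assumption; real attributes would be normalized to their known domains), split recursively by axis-aligned hyperplanes, and a subscription is matched to the z-codes of every leaf region it intersects. The DHT placement of p(r) is omitted.

```python
def zcode_regions(subscription, d, depth):
    """Return the z-codes of all leaf regions (at the given tree depth) that
    intersect the subscription, a list of (low, high) ranges per dimension."""
    results = []

    def recurse(region, zcode, level):
        # Stop if this region does not intersect the subscription.
        for (rlo, rhi), (slo, shi) in zip(region, subscription):
            if rhi <= slo or rlo >= shi:
                return
        if level == depth:
            results.append(zcode)
            return
        dim = level % d                    # dimension split at this level
        lo, hi = region[dim]
        mid = (lo + hi) / 2
        lower = region.copy(); lower[dim] = (lo, mid)   # bit 0: lower half
        upper = region.copy(); upper[dim] = (mid, hi)   # bit 1: upper half
        recurse(lower, zcode + "0", level + 1)
        recurse(upper, zcode + "1", level + 1)

    recurse([(0.0, 1.0)] * d, "", 0)
    return results

# A subscription covering x in [0.1, 0.3) and y in [0.6, 0.9) maps to the
# z-codes of every depth-4 leaf region it overlaps.
print(zcode_regions([(0.1, 0.3), (0.6, 0.9)], d=2, depth=4))
```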
Resource Discovery (Topic-Based Matching) • Manage dynamic, distributed resources • Nodes join groups when they have a desired resource and leave when it is no longer available • Other nodes can request nearby resources by anycasting/multicasting messages to the appropriate group
Implementation • Each group has a key called a groupId, which maps into the DHT's ID space • Creating a group: send a message to that group ID; the nearest node becomes the root of the spanning group tree; the root then adds the requesting node • Joining: send a message toward the root; if an intermediate node is already part of that spanning tree, it adds the requesting node as a child and stops; otherwise it keeps forwarding the message (see the sketch after the next slide)
Continued • Leaving the group: if a node has entries in its children table, mark it as not a member and stop; otherwise send a leave message to its parent in the tree, and the parent removes the node from its children table • Anycast: implemented as a DFS of the group tree; load-balanced since different requests start at different nodes
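A minimal, single-process sketch of the group-tree maintenance and anycast described on these two slides. Real Scribe routes join/leave/anycast messages through Pastry, which is omitted here, and the class and field names are illustrative:

```python
class GroupNode:
    """One node's view of a Scribe-style group tree."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.parent = None
        self.children = []          # children table
        self.is_member = False
        self.resource = None        # whatever this node advertises

    def join(self, new_node):
        # An on-tree node adds the requester as a child and stops forwarding.
        new_node.parent = self
        new_node.is_member = True
        self.children.append(new_node)

    def leave(self):
        if self.children:
            # Still needed to forward traffic: stay in the tree as a non-member.
            self.is_member = False
        elif self.parent is not None:
            # No children: the parent drops us from its children table.
            self.parent.children.remove(self)
            self.parent = None

    def anycast(self, wants):
        """DFS the subtree for the first member whose resource satisfies the
        request; starting at different nodes spreads the load."""
        if self.is_member and wants(self.resource):
            return self
        for child in self.children:
            found = child.anycast(wants)
            if found is not None:
                return found
        return None

# Tiny usage example.
root = GroupNode("root"); root.is_member = True; root.resource = "storage"
peer = GroupNode("peer-1"); peer.resource = "storage"
root.join(peer)
print(root.anycast(lambda r: r == "storage").node_id)   # "root"
```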
Multicast • Like anycast, but sends to all group members • Uses bandwidth of O(N) • Useful to request data (values) when most members will have some data to contribute
Pastry/Scribe in Mimir • Pastry's ID routing is used to provide a network-independent routing scheme • We have three topics • Metadata Controller topic: provides security and path-to-fileId mapping • Storage Node topic: provides a way to list available resources (file storage) • Client topic: provides information on client nodes
File Storage Request • Send a multicast request to the MDC to add a file to the system • The MDC sends back to the requesting client the new file's ID • The client then multicasts a store request message • All storage nodes respond with IP/port data • The client then connects directly to each storage node and stripes the encoded blocks over them evenly
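A client-side sketch of the final striping step, once the storage nodes have answered the multicast with their IP/port. The actual network transfer is omitted, and the names are illustrative:

```python
def stripe_blocks(encoded_blocks, storage_nodes):
    """Assign encoded blocks to storage nodes round-robin so each node
    receives an even share; 'storage_nodes' is a list of (ip, port) tuples."""
    assignment = {node: [] for node in storage_nodes}
    for i, block in enumerate(encoded_blocks):
        node = storage_nodes[i % len(storage_nodes)]
        assignment[node].append(block)
    return assignment

# Four encoded blocks over two nodes -> two blocks per node.
nodes = [("10.0.0.1", 9000), ("10.0.0.2", 9000)]
print(stripe_blocks([b"b0", b"b1", b"b2", b"b3"], nodes))
```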
File Retrieval Request • Multicast a GET FILE <FILE Id> message to the storage nodes • Storage nodes look up what data they have for that file and send it back to the requesting ID • The client then rebuilds the file as data is received • If the file cannot be rebuilt, print an error message
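The rebuild step can be done with a standard LT peeling decode over whatever (block indices, payload) pairs have arrived so far; a sketch under that assumption:

```python
def lt_decode(n, received):
    """Peeling decoder: any degree-1 block reveals a source block, which is
    then XORed out of every block that referenced it, possibly creating new
    degree-1 blocks. Returns the source blocks, or None if decoding stalls."""
    decoded = [None] * n
    pending = [(set(idx), bytearray(payload)) for idx, payload in received]

    progress = True
    while progress:
        progress = False
        # Record any block that now references exactly one source block.
        for idx, payload in pending:
            if len(idx) == 1:
                i = next(iter(idx))
                if decoded[i] is None:
                    decoded[i] = bytes(payload)
                    progress = True
        # Substitute every decoded block back into the remaining blocks.
        for idx, payload in pending:
            for i in list(idx):
                if len(idx) > 1 and decoded[i] is not None:
                    for j, byte in enumerate(decoded[i]):
                        payload[j] ^= byte
                    idx.discard(i)

    if any(b is None for b in decoded):
        return None            # not enough data yet: report an error
    return decoded
```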
Advantages of Pastry for Mimir • The P2P network provides a reliable way to handle a dynamic set of nodes and keeps communication open with no central point of communication • Multicast helps us quickly attempt to get and save files