This presentation discusses the use of intrinsic references in distributed systems, comparing them with physical references. It explores the storage and retrieval mechanisms using intrinsic references, as well as their use in hierarchical data structures. The presentation also covers terminology such as collision resistance and one-way hash functions.
Intrinsic References in Distributed Systems Presented by: Nimish Pachapurkar ScaLAB seminar 21st October 2002
Snapshot:
• To compare and contrast intrinsic references with physical references.
• Storage and retrieval mechanism using intrinsic references: Elephant Store.
• Use of intrinsic references in hierarchical data structures.
Terminology:
• Collision resistance:
  • It is extremely difficult to find two sequences with the same hash.
  • Implies that the hash is unique (sufficiently so…)
• One-way hash:
  • Given the hash of a sequence, it is difficult to reconstruct the sequence.
• Reference => hash AND referent => byte sequence
  • (Examples of reference/referent pairs: memory addresses and data, URLs and web pages, etc.)
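The reference/referent terminology above can be sketched in a few lines. This is only an illustration: the slides do not name a hash function, so SHA-256 from Python's standard library is an assumption, and `intrinsic_reference` is a hypothetical helper name.

```python
import hashlib


def intrinsic_reference(referent: bytes) -> str:
    """Return the hex digest used as the reference for a byte sequence.

    SHA-256 is an assumption; any collision-resistant one-way hash would do.
    """
    return hashlib.sha256(referent).hexdigest()


ref_a = intrinsic_reference(b"hello world")
ref_b = intrinsic_reference(b"hello world")
ref_c = intrinsic_reference(b"hello world!")

# The same byte sequence always yields the same reference,
# no matter which machine computes it.
print(ref_a == ref_b)

# A one-byte change in the referent yields a different reference.
print(ref_a != ref_c)
```

The digest reveals nothing practical about the referent (one-way), and finding a second byte sequence with the same digest is assumed to be infeasible (collision resistance).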
Physical References –
• The relationship between reference and referent is defined by the state of the physical system.
• A change in the state changes the referent.
• All accesses to the referent have to go through the system.
  • Bottleneck and potential failure point
Intrinsic References –
• The reference is a collision-resistant (unique), one-way hash value of the referent.
• State independence: the relationship between a byte sequence S and its reference R depends only on the hash function.
• Uniqueness: a given R refers only to the particular S from which it was obtained.
• Physical storage is still required to store and retrieve the referents.
Intrinsic References and Distributed Storage –
• Useful for distributed, replicated storage mechanisms.
• No reference-referent inconsistency (the hash gives the reference).
• Simple hashing can check the correctness of the data.
Opaque Storage –
• Used for storing an instance of a data structure in Elephant Store.
• Serialize the data structure, then store the byte sequence.
• Called the OPAQUE representation because the data structure is hidden behind the byte sequence.
• The hash of the sequence is the reference (digest).
• Retrieval: retrieve the byte sequence from the store, then de-serialize.
[Figure: Data Structure → Serialization (makes the structure opaque) → Opaque Reference (hash digest)]
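The store/retrieve cycle for opaque storage can be sketched as follows. This is a minimal in-memory stand-in, not Elephant Store itself: the class name `OpaqueStore`, the use of JSON for serialization, and SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib
import json


class OpaqueStore:
    """Toy in-memory store keyed by hash digest (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # digest -> serialized byte sequence

    def put(self, structure) -> str:
        # Serialize the data structure; sort_keys makes the bytes deterministic.
        data = json.dumps(structure, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest  # the opaque reference

    def get(self, digest: str):
        data = self._blobs[digest]
        # Re-hashing checks that the bytes really are the referent.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored bytes do not match the reference")
        return json.loads(data.decode("utf-8"))


store = OpaqueStore()
mailbox = {"inbox": ["msg1", "msg2"], "sent": []}
ref = store.put(mailbox)
print(store.get(ref) == mailbox)
```

Because the check in `get` needs only the bytes and the digest, any replica or cache can serve the data and the reader can still verify it.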
HDAGs –
• Hash-based Directed Acyclic Graph.
• Nodes are directories.
• Arcs are directory to sub-directory relationships.
• The root digest of a rooted HDAG is used as the intrinsic reference to the whole HDAG.
• Application: can be used to represent a file system or mail system.
• The root digest uniquely represents the state of the whole directory structure, not just the root directory.
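A rough sketch of how a root digest can cover a whole directory structure: each directory node lists its children by name and digest, so the digest of the root depends on everything beneath it. The node encoding (a JSON list of entries), the helper names `put_blob` and `put_tree`, and SHA-256 are assumptions, not details from the presentation.

```python
import hashlib
import json


def put_blob(store: dict, data: bytes) -> str:
    """Store a byte sequence under its digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def put_tree(store: dict, tree: dict) -> str:
    """Store a directory (name -> bytes or nested dict) and return its digest."""
    entries = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            entries.append([name, "dir", put_tree(store, child)])
        else:
            entries.append([name, "file", put_blob(store, child)])
    # The directory node holds only names and child digests.
    return put_blob(store, json.dumps(entries).encode("utf-8"))


store = {}
root = put_tree(store, {"a.txt": b"alpha", "sub": {"b.txt": b"beta"}})
changed = put_tree(store, {"a.txt": b"alpha", "sub": {"b.txt": b"BETA"}})

# A change deep inside a sub-directory propagates up to the root digest.
print(root != changed)
```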
Versions and Change (Problems with Opaque Representation) –
• For a file system, an example of an opaque representation is a tarball of the directory structure.
• A change in any file causes the opaque representation to change.
  • The hash digest also changes.
  • There is no relationship between the old and new representations.
• Solution: use HDAGs
  • Adding a file to a directory is the same as a new mail arriving in an Inbox.
  • The representation of all other files and directories is not changed.
  • More efficient than the opaque representation.
  • Saves communication cost among replicas in distributed storage.
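To make the saving concrete, the sketch below stores two versions of a small mail tree as hash-linked nodes and counts how many nodes the second version adds. The encoding, the helper names, and the sample mailbox are illustrative assumptions; SHA-256 stands in for whatever hash the real system uses.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def put_tree(store: dict, tree: dict) -> str:
    """Store a directory (name -> bytes or nested dict); return its digest."""
    entries = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            entries.append([name, put_tree(store, child)])
        else:
            store[digest(child)] = child
            entries.append([name, digest(child)])
    node = json.dumps(entries).encode("utf-8")
    store[digest(node)] = node
    return digest(node)


v1 = {"inbox": {"m1": b"hello", "m2": b"lunch?"}, "sent": {"s1": b"re: hello"}}
v2 = {"inbox": {"m1": b"hello", "m2": b"lunch?", "m3": b"new mail"},
      "sent": {"s1": b"re: hello"}}

store = {}
root1 = put_tree(store, v1)
nodes_v1 = set(store)
root2 = put_tree(store, v2)
new_nodes = set(store) - nodes_v1

print(len(nodes_v1))   # 6: three messages, two folders, one root
print(len(new_nodes))  # 3: the new message, the changed inbox, the new root
```

A replica that already holds version 1 would need only those three new nodes to reach version 2, whereas an opaque tarball would have to be shipped again in full.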
Advantages of HDAGs –
• Efficient for distributed systems (version management).
• Every version is represented by a unique intrinsic reference, which is independent of the physical system.
• Replication and caching will never lead to inconsistencies.
• Two versions of an object share most of their representation, and with it most of the storage and communication costs.
Conclusions –
• HDAGs promise to be a useful mechanism for building and maintaining distributed storage systems.
OS Support for P2P Programming: a Case for TPS Presented by: Nimish Pachapurkar ScaLAB seminar 21st October 2002
Introduction –
• Need for an RPC-like interaction mechanism for P2P infrastructures
  • Must be decoupled
  • Anonymous and asynchronous
  • Layers over RPC would certainly hamper performance
• Type-based Publish/Subscribe (TPS) as a candidate
  • Abstraction of a low-level P2P library – JXTA
• What’s in the paper:
  • Comparison of the implementation of TPS with pure JXTA
  • A “first” experience
  • Design and source code of applications
JXTA
• Three layers
  • Core layer: several protocols ensuring basic communication between peers, message routing, or peer group creation
  • Service layer: ready-made services such as a content management system and a wire service
  • Application layer: all the code written by the programmer
• Six concepts:
  • ID: for any resource (peer, pipe, peergroup, codat)
  • Peer: any device with an electronic pulse (normal and special)
    • Rendez-vous and routers
  • Pipe: virtual communication channel – asynchronous and uni-directional (wire for many-to-many) – independent of IP
  • PeerGroup: collection of peers
  • Advertisement: XML message with information about a new resource
  • Message: any kind of communication (using XML)
Protocols for JXTA –
• PDP – Peer Discovery Protocol
  • Allows different peers to find each other
• PRP – Peer Resolver Protocol
  • Just above the transport layer; dispatches each JXTA message to the right service
• PIP – Peer Information Protocol
  • Reports the status of a peer (time the peer has been up, channels available)
• PMP – Peer Membership Protocol
  • Obtains group membership requirements (credentials, password, etc.)
• PBP – Pipe Binding Protocol
  • Keeps the different peers in a pipe bound together (even when they move)
• ERP – Endpoint Routing Protocol
  • For routing messages between the peers
  • Enables communication between two peers even when they do not know how to connect to each other (due to a firewall, etc.)
TPS over JXTA –
• Publish/Subscribe paradigm
  • Time decoupling: publisher and subscriber do not need to be up at the same time.
  • Space decoupling: publisher and subscriber do not need to know each other.
  • Flow decoupling: sending or receiving messages does not block the participants.
  • This decoupling suits server-less architectures.
• Subscription based on subject and content
• Type-based:
  • Subject => event object type
  • Content => state of an instance of that type
• Type safety
  • Subscriber knows the event type in advance
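The type-based idea can be sketched without any P2P machinery: the event's type plays the role of the subject, and a predicate over the instance's state plays the role of the content filter. Everything here is hypothetical (the `TypeBasedBus` class, the `SkiRental` and `Weather` event types, the sample shops); it is a single-process toy, not the TPS or JXTA API, and it does not model time decoupling.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SkiRental:
    shop: str
    price_per_day: float


@dataclass
class Weather:
    report: str


class TypeBasedBus:
    """Toy single-process bus; subscriptions are keyed by event type."""

    def __init__(self):
        self._subs = []  # (event_type, predicate, mailbox)

    def subscribe(self, event_type, predicate=lambda event: True):
        # The subscriber gets a mailbox to read later (flow decoupling).
        mailbox = deque()
        self._subs.append((event_type, predicate, mailbox))
        return mailbox

    def publish(self, event):
        # The publisher never sees who the subscribers are (space decoupling).
        for event_type, predicate, mailbox in self._subs:
            if isinstance(event, event_type) and predicate(event):
                mailbox.append(event)


bus = TypeBasedBus()
cheap = bus.subscribe(SkiRental, lambda e: e.price_per_day < 30)

bus.publish(SkiRental("Alpine Skis", 25.0))
bus.publish(SkiRental("Luxury Skis", 80.0))
bus.publish(Weather("snow"))

offers = [e.shop for e in cheap]
print(offers)  # ['Alpine Skis']
```

The predicate can read `price_per_day` directly because the subscriber knows the event type in advance, which is the type-safety point made above.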
Example –
• Ski renting application
  • Need to find ski rentals with reasonable rates
  • Must surf the net for a long time
• Alternative: use the TPS-based P2P infrastructure
  • Subscribe to the ski-rental type and wait for answers
• Publisher: (a new shop is opened)
  • Search launched for a ski-rental advertisement
  • If not found, a new one is created
• Programming phases –
Performance –
• Invocation time: time for sendMessage()
  • Publisher produces 50 events
  • JXTA-WIRE is quicker
  • No difference between SR-JXTA and SR-TPS
• Throughput: similar trends!
Conclusion –
• TPS is a viable alternative abstraction to RPC for future Internet-wide operating systems to support P2P applications.
• Simple to use, type-safe, and preserves the decoupled nature of P2P.
• Makes programming easier than with pure JXTA.