290 likes | 381 Views
Semantic Small World (SSW). An Overlay Network and Index Structure for Semantic based P2P Search. Presented by : Raj Kumar Rasam. Semantic Small World.
E N D
Semantic Small World (SSW) An Overlay Network and Index Structure for Semantic based P2P Search Presented by: Raj Kumar Rasam
Semantic Small World • Presents the design of an overlay network called Semantic Small world (SSW) that facilitates efficient semantic based search in P2P systems. That is, given a query, which can be a point query or range query, we need to return a set of contents that are most relevant to the search criteria according to some semantic distance function. • The proposed overlay network supports the following searches • Semantic Search. • Supports both Point and Range Queries. • Support search for Top K Queries. • Source – Mei Li, Wang-Chien Lee, Anand Sivasubramaniam (PSU). Semantic Small World: An Overlay Network for Peer-to-PeerSearch, 12th IEEE International Conference on Network Protocols 2004 ( ICNP 2004) Berlin, Germany.
Semantic Small World • Based on three innovative Ideas: • Small World Network: overlay network scalable to large networks. • Semantic Clustering: Clustering strategy that places peer nodes based on the semantics of their data objects. • Dimension Reduction: Adaptive Space Linearization (ASL) – technique for construction of a one-dimensional SSW to address the challenges raised by high dimensionality of semantic space. • Dynamically cluster peers with semantically similar data closer to each other in the semantic space. • Organize these clusters into an overlay network and build a distributed index structure when a peer joins or leaves the network.
Semantic Small World Semantic Space and Vector: Each object is represented as a k-element vector, namely Semantic Vector (SV) that can be mapped to a point in a k- Dimensional Semantic Space. Euclidean distance is used to represent the semantic closeness between two SV’s. or Semantic Vector: Given N k-dimensional data points the “Centroid “ is Given Xo and Yo as two points then Euclidean Distance Do is :
Semantic Small World • There are two types of queries. They are • Point Query • Range Query • A point query is defined by the vector We expect to return those objects x such that the Euclidean distance between x and Q is minimum. • A simple range query is described by an hyper rectangular region We expect to return those objects that belong to the region Q
Semantic Small World • Characteristics of Small World Network: • Average Path Length between two nodes. • Cluster Coefficient defined as the probability that two neighbors of a node are neighbors themselves. • Small average path length and large cluster coefficient. • Searches can be efficiently conducted when network exhibits following properties: • Each node knows its local neighbors called short range contacts. • Each node knows a small number of randomly chosen distant nodes called long range contacts., with a probability proportional to C/d where d is the distance and C is the normalization constant that brings the total probability to 1. • A search can be performed in
Semantic Small World Construction of SSW network • Each cluster is owned by a maximum of M number of peers. • Peer nodes within each cluster know each other. • Have short range contacts and long range contacts • 3 steps in constructing network • Obtain semantic label to position a peer. • Form peer clusters in semantic space. • Construct overlay network across these clusters
Semantic Small World 0.25 0.4 0.2 (1,0) (1,1) 0.8 0.6 0.8 1 0.4 0.2 (1,0) (0,0) 0.2 0.4 0.8
Semantic Small World Semantic Labeling: • Executed before or when a peer node joins the network. • A peer clusters its local data objects into data clusters consisting of data objects with similar semantics and chooses the “centroid” of its largest data cluster as its semantic label or join point. Cluster Formation • SV’s of all data objects form a virtual search space • Semantic space is made up of clusters with a maximum size of M peers in each cluster. • If the size of cluster exceeds M, it is split into two clusters.
Semantic Small World Overlay Network / Index construction. • Each peer maintains a set of short range contacts and a certain number of long range contacts. • These long range contacts reduce network diameter and transform the network into a small world with poly-logarithmic search cost. Dimension Reduction • EX: SV for a document is made of 50-300 elements • Adopt a technique called Adaptive Space Linearization (ASL), for linearizing the clusters in high dimensional space into a one-dimensional SSW (SSW-1D) through cluster split process.
Semantic Small World Cluster Splitting process • If the cluster size exceeds pre-defined maximum size, M, it is partitioned. • Choose two peers such that they are semantically farthermost from each other. • Alternatively assign peers to the two sub-clusters based on the shortest distance to the seeds. • Finally the cluster space is partitioned at the middle point of the dimension that has the largest span between the centroids of the semantic labels of the two sub-clusters. • We need to describe a naming schema to maintain 1-1 mapping between the naming of clusters in SSW-1D and their semantic subspaces. • Use 128-bit binary number (called cluster ID) to name the cluster.
Semantic Small World Naming Scheme for clusters • Each peer maintains a variable, Par_Bit, which initially points to MSB of the cluster ID. • Par_Bit indicates the bit to be set (to 0/1) in the next cluster split. • After each split the two sub-clusters will rename their ID by resetting the bit pointed by Par_Bit separately, retaining all other bits the same as the ID of the original cluster and decrease their Par_Bit by one. • The cluster that has smaller centroid along the partition dimension obtains an ID with the bit pointed by Par_Bit set to 0 and the other one obtains an ID with the bit pointed by Par_Bit set to 1. • The process is repeated as more peers join the system to invoke more splits.
Semantic Small World 0.25 0.8 (1,1) 6 0110 12 1100 15 1111 (0,1) 5 0101 P=3 0.8 P=4 0.8 P=4 14 1110 P=2 0.6 P=3 4 0100 P=3 0.4 P=2 11 1011 0.2 P=1 P=4 P=3 0 0000 2 0010 8 1000 10 1010 (1,0) (0,0) 0.2 0.4 0.6
c0 c15 c6 c8 c12 c14 c11 c5 c2 c10 c4 Semantic Small World
Semantic Small World 0.25 0.4 0.2 (1,0) (1,1) 0.8 0.6 0.8 1 0.4 0.2 (1,0) (0,0) 0.2 0.4 0.8
Semantic Small World Data Structure At Each Peer ClusterState: { ClusterRange, ClusterSize, Par_His, Par_Bit } NeighborList: { NodeId } ShortContact: { NodeId, ClusterRange } LongContact: { NodeId, ClusterRange } ForeignIndex: { Semantic Vector, NodeId }
Semantic Small World Example: DS of Peer 1 that belongs to Cluster 4 ClusterState: ClusterRange = { (0, 0.25), (0.4, 0.8) } ClusterSize = 4 Par_His = { 1: (v, 0.4),2: (h, 0.4), 3: (v, 0.25), 4: (h,0.8) } Par_Bit = 4 NeighborList: { 3, 4, 5 } ShortContact: { ( 7, {(0.00, 0.25), (0.80, 1.00)} ), (12, {(0.25, 0.40), (0.40, 1.00)} ) } LongContact: { (25, {(0.60, 1.00), (0.00, 0.20)} ) } ForeignIndex: { Semantic Vector, NodeId }
Semantic Small World Search: • Search operation has two modes : 1. search-within-cluster 2. search-across-cluster • To initiate search, requester has to first generate a search semantic vector (SV) for the query Q. • Search Within Cluster: Peer checks whether the SV for query Q falls within its cluster range. If this is the case, it floods the request to peers in its NeighborList except for the one from whom the message was received. Object with highest similarity is returned as result.
Semantic Small World Search-Across-Cluster • If Q does not belong to ClusterRange of the peer, search across the cluster mode is invoked. • In this case, a pseudo-cluster-name (PCN), the estimated ID for the cluster covering SV is calculated for the query based on the partition history (Par_His) stored at that particular peer. • The search is continued by forwarding the message to the contact that has the closest naming distance to the PCN. • The above process is repeated until the cluster whose semantic subspace covering SV is reached.
Semantic Small World Estimation of PCN • Set all the bits of PCN to 0 • Iterate through Par_His of current Peer(i), the bits of PCN are set as the same value of corresponding bits of Peer i’s Cluster ID, Ci, as long as query confirms to the same Par_His entry. • Otherwise, corresponding bit is set to a different value and PCN estimation process at peer i stops, since this peer does not have further details about the PCN.
Semantic Small World Algorithm For PCN Estimation For x = B to Par_Bit +1 Do Obtain Partition Dimension d and Partition Point pfrom Peer i’sPar_Hisx. Ifi.ClusterRanged≤pand Qd≤p or i.ClusterRanged>pand Qd>pThen PCNx=Cix. Else PCNx=1 - Cix. Break. End If End For
Semantic Small World 0.25 0.8 (1,1) 6 0110 12 1100 15 1111 (0,1) 5 0101 P=3 0.8 P=4 0.8 P=4 14 1110 P=2 0.6 P=3 4 0100 P=3 0.4 P=2 11 1011 0.2 P=1 P=4 P=3 0 0000 2 0010 8 1000 10 1010 (1,0) (0,0) 0.2 0.4 0.6
Semantic Small World 1. General Process Peer Join 2. Cluster Splitting 3. Foreign Index Publishing General Process: • Generate x.label for the join point chosen by Peer x. • Sends a join message to existing Peer i. • Peer i performs search and directs the message to Peer j (contact peer) that covers the point. • If the cluster size of Peer j is below the maximum size M then Peer x simply joins the cluster. • If the size exceeds M, Cluster splitting is invoked.
Semantic Small World Cluster Splitting • If the cluster size > M contact peer j first obtains a complete list of the semantic labels of all the peers present in the cluster by polling the peers in the cluster through flooding. • Then it splits the cluster into two using the Cluster Splitting strategy as discussed earlier. • Peer j finishes the cluster splitting by informing all the other peers to update their ClusterState, ShortConatct, LongContact and ForeignIndex. • Cluster Splitting operation is invoked infrequently by choosing large M but large M will increase the search cost due to flooding within the cluster.
Semantic Small World Foreign Index Publishing: • After Joining the cluster , a peer may find that some of its local data objects do not belong to this cluster. • The newly joined peer publishes the locations of these data objects to their corresponding peer clusters (as foreign indexes). • The first node ( in a corresponding peer cluster) reached during the publishing process adds a tuple consisting of SV and the NodeId of source node for a data object into its foreign index store.
Semantic Small World Peer Removal 1. Peer Leave. 2. Peer Failure. Peer Leave: • Checks to see if it is the last peer in the cluster. • If its not the last peer then it simply informs its leaving by transferring its foreign index to a randomly selected peer in its cluster. • If it is the last peer in the cluster then the semantic subspace of this cluster needs to be merged with one of its neighboring clusters. • The leaving peer then transfers the foreign index. • The receiving peer updates the cluster range as well as short range contacts. Similarly all the other peers in the cluster update their range as well as their affected short range contacts.
Semantic Small World Peer Failure: • A failed peer is detected during routine operations such as search. • If there is a failure in long range contact then it simply re-establishes another long range contact. • If the peer detecting the peer failure is located in one of the neighboring clusters E of the failed peer, originally located in cluster F, it is likely that other peers in cluster E maintain short range contacts with other live peers in cluster F. At an expense of two messages, the short range contact of the detecting peer can be recovered. • If a short range contact of a peer in cluster E cannot be recovered by contacting other peers in the network, it implies that no live peer exits in cluster F and at this point cluster merging is invoked.
Semantic Small World Object Publication: let SVxbe the semantic vector of an object x, that a peer wants to publish, then • If SVx belongs to its cluster range then it includes the point corresponding to the semantic vector in its semantic subspace. • If SVx does not belong to its cluster range then it searches for the cluster which covers this point and publishes it as a foreign index. Object Removal: If the SVx belongs to its cluster range then it just removes the point from its cluster. If it is stored as a foreign index, it sends a message to delete the foreign index stored at the node where the object is being published.