10 likes | 125 Views
Latent Interest Group Discovery and Management by P2P OSNs. J. He, G. Kesidis and D.J. Miller – The Pennsylvania State University In collaboration with K. Levitt, J. Rowe, S.F. Wu – The University of California at Davis. Experimental Results
E N D
Latent Interest Group Discovery and Management by P2P OSNs J. He, G. Kesidis and D.J. Miller – The Pennsylvania State University In collaboration with K. Levitt,J. Rowe, S.F. Wu – The University of California at Davis • Experimental Results • EM-BIC uses 22% less hops than Closest, and 36% less than RW+Cache • Success Prob.: EM-BIC and Closest (>96% ) vs. RW+Cache (~80%) • Abstract • Recently, specialized P2P architectures have been proposed to alleviate privacy concerns of centralized Online Social Networks (OSNs). To resolve anycast queries in P2P OSNs, clustering techniques are employed to discover emerging latent Interest Groups (IGs). In this poster, we give some performance results for our proposed approach based on Emulab experiments. • Motivation • Centralized online social network (OSN) architecture, e.g. Facebook: • pros - centralized networking; • cons - privacy including cookies and stored information in the central server including location, targeted advertising by the central OSN itself, and threats from third parties (governments, hackers). • Privacy issues motivate consideration of a peer-to-peer (P2P) OSN architecture, e.g., Diaspora, Friendica. • But currently unsophisticated networking among pods/superpeers. • In particular, latent interest group (IG) discovery and anycasting. • Our approach • Assumptions for preliminary study: A standard dictionary used by all with λ uniquely defined keywords (KWs). All IGs are initially latent. Pods know their active local users (simply by checking local login data). A pod does not share “raw” user data for privacy concerns. No pod leaves/joins, i.e., static social graph. • Interest groups: An IG is a set of users who actively share a common interest. Queries are coded as bit vectors in standard keyword space, e.g., q ∈ {0,1}λ. IG identified as the centroid of a cluster of queries. E.g., Euclidean distance between two queries or IG-centroids. • Query forwarding: Upon receiving a query q, a pod takes the following steps (failing one, the next is tried): • Centralized lookup, e.g., via CCN; • Search local users; • Find the centroid, C, closest to q; • If q is closer to C than τ (< 1) times the distance between q and its second nearest centroid, then use the forwarding table at C; • Random Walk / Limited-Scope Flood • So, initially forward by RW/LSF • Query forwarding table: Forwarding a query q first matched to closest centroid C2, then to pod p22 associated withthe closest stored query q22within C’s cluster. • Query clustering: Bernoulli model involving prior probabilities that an arbitrary query is in cluster j and probabilities that the ath keyword is used in the query generated based on cluster j. These quantities are learned iteratively by Expectation-Maximization with model-order selection (BIC). • Experimental Set-Up • Benchmark query forwarding algorithms: • RW+Cache: next pod of the exactly matched cached query or choose a random one • Closest: next pod of closest cached query • Performance metrics: • Forwarding - Average number of query hops, Query success probability • Clustering - IG name accuracy (distance between a learned IG and the nearest true IG), IG number accuracy • Network and query configuration: • 256-KW dictionary, 50 IGs, 10-15 KWs per IG • KW overlap among IGs, and there are a limited number of KWs in queries, leading to query-to-IG ambiguity. • 256 pods, 1-2 users per pod, 3-4 neighbors per pod, 3 IGs per user • Query rate of IGs follows Zipf distribution • Cache size per pod: 300 queries • Re-clustering interval: 25 cache touches • Emulab experimental set-up: • 16 Emulab nodes, each has 16 pods running on it, so there are 256 pods in total. • 100Mbps link rate and no packet loss in link layer • Each pod has two modules: • More distant IGs can be discovered when # of cache touches increases • Monotonic improvement in clustering performance with cache touches • Clustering 300 IGs can take as long as 200s • . • Future Work • Explore different IG removal/identification methods • NLP issues: multiple definitions of each KW, KW misspelling, extraneous KWs, etc. • When privacy not an issue, clarify interfacing with centralized OSNs or futuristic ICNs/CCNs • Different query forwarding algorithms including privacy-preserving primitives • Cooperative learning via sharing centroids • Integrating our pod module to run on Emulab with Diaspora pod codebase. Acknowledgements This work supported by NSF CNS grant 1152320. References Details of this work (with a description of related work and other simulated experimental results) was recently published by us in Proc. ASE/IEEE SOCIALCOM, Washington, DC, Sept. 2013.