P2PR-tree: An R-tree-based Spatial Index for P2P Environments
ANIRBAN MONDAL, YI LIFU, MASARU KITSUREGAWA
University of Tokyo
E-mail: anirban@tkl.iis.u-tokyo.ac.jp
PRESENTATION OUTLINE
• Motivating Spatial Applications on P2P systems
• Existing Spatial Indexes
• Our proposal: The P2PR-tree
• Performance Analysis
• Conclusion and Future Work
Spatial Applications on P2P systems
• Spatial data occurs in several important and diverse applications
  • Geographic Information Systems (GIS)
  • Computer-aided design (CAD)
  • Resource management
  • Development planning, emergency planning and scientific research
• Unprecedented growth of available spatial data at geographically distributed locations
• Trend of increased globalization
• Popularity of P2P data sharing
→ Efficient global sharing of distributively owned spatial data in P2P systems
Application example: Searching for Real Estate information in Tokyo
[Figure: a query is issued with a Query MBR and the matching results are returned.]
Existing Spatial Indexes
• Centralized spatial indexes
  • R-tree, R*-tree, R+-tree
• Distributed spatial indexes
  • M-Rtree
  • MC-Rtree
MC-Rtree
[Figure: a Master node connected to four clients.]
• Master: R-tree which indexes the covering MBRs of the data stored at the clients
• Each client has its own R-tree for managing its own data
• Centralization
• Designed for clusters
• Optimizes disk I/Os
Why can’t we use existing R-tree-based approaches?
• They use centralized mechanisms → not scalable
  • All updates must pass through the Master Node
  • All searches need to be routed by the Master Node
  → Performance bottleneck at the Master Node
• They do not optimize communication time
GRID-Related Projects
• GRID Physics Network and European DataGrid
  • Aim to improve scientific research that requires efficient distributed handling of data in the petabyte range
• Earth Systems GRID (ESG)
  • Aims at facilitating detailed analysis of huge amounts of climate data by a geographically distributed community via high-bandwidth networks
• NASA Information Power GRID (IPG)
  • Aims to improve existing systems at NASA for solving complex scientific problems efficiently
How does our proposal differ from GRID-related spatial works?
• GRID
  • Restricts data sharing to scientific and research organizations
  • Individual nodes are usually dedicated and expected to be available most of the time
  • Some amount of centralized control is possible through collaboration between organizations
• Our proposal
  • Allows normal users to share/upload data
  • Individual nodes may join/leave at any time
  • Peers are distributively owned, hence centralized control is practically challenging
Existing Search mechanisms in P2P systems
• Broadcast (Gnutella)
• Centralized (Napster)
• Routing indices (RIs)
• Distributed hash tables (Chord, CAN, Tapestry)
Existing works on P2P systems mostly address file-sharing.
P2PR-tree (Peer-to-Peer R-tree)
• A distributed R-tree-based indexing scheme designed for P2P systems
• Parts of the distributed indexes are built autonomously by each peer
• Hierarchical and performs efficient pruning
• Completely decentralized
• Highly scalable
Dividing the Universe
[Figure: the universe is divided into Block 1 to Block 4. The corresponding tree has Blocks B1 to B4 at Level 0, Groups G1 to G4 at Level 1, Subgroups (SG1, SG2) and peers at Level 2, and peers (P1, P2, P20, P3, P4) at Level 3.]
Definitions
• Unit: A Block, Group, or Subgroup at any level, or a peer
• UnitMBR: Minimum Bounding Rectangle of a Unit
• Router: To route messages to a Unit X, a peer A needs to know at least one peer (say, peer B) that belongs to Unit X. We define peer B as peer A’s Router to Unit X.
• UnitRouterInfo: The addresses of the routers to a Unit
• UnitInfo: UnitMBR and UnitRouterInfo of a Unit
• ChildInfo (Level i): UnitInfo of the child Units at Level i+1 in the P2PR-tree
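The definitions above can be sketched as plain data types. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `MBR` and `UnitInfo`, their field names, the helper `covering_mbr`, and the use of plain strings as router addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (assumed 2-D)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersects(self, other: "MBR") -> bool:
        # Rectangles overlap unless one lies entirely to one side of the other.
        return not (
            self.x_max < other.x_min or other.x_max < self.x_min
            or self.y_max < other.y_min or other.y_max < self.y_min
        )


def covering_mbr(mbrs: List[MBR]) -> MBR:
    """UnitMBR of a Unit, computed as the rectangle covering its children."""
    if not mbrs:
        raise ValueError("need at least one MBR")
    return MBR(
        min(m.x_min for m in mbrs),
        min(m.y_min for m in mbrs),
        max(m.x_max for m in mbrs),
        max(m.y_max for m in mbrs),
    )


@dataclass
class UnitInfo:
    """UnitInfo = UnitMBR + UnitRouterInfo, following the definitions."""
    name: str                      # e.g. "B1", "G2", "SG1" (naming is assumed)
    unit_mbr: MBR                  # UnitMBR
    routers: List[str] = field(default_factory=list)  # UnitRouterInfo (addresses)


# ChildInfo at Level i is the list of UnitInfo for the child Units at Level i+1.
ChildInfo = List[UnitInfo]
```

Keeping `MBR` immutable makes it safe to share the same rectangle between a Unit and the `UnitInfo` copies that other peers hold about it.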
Data Structure at a peer
A peer of Level L can be specified as […], where […] maintains the following information: […]
Example of Data Structure
P2 can be specified as Peer(1.2.1.2).
[Figure: the P2PR-tree with Level 0 Units (Blocks B1 to B4), Level 1 Units (Groups G1 to G4), Level 2 Units (including Subgroups SG1 and SG2) and Level 3 Units (peers such as P1, P2, P20, P3 and P4).]
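The dotted identifier in Peer(1.2.1.2) can be decoded positionally. The sketch below is one possible reading, not something the slides spell out: it assumes the components are Block, Group, Subgroup and finally the peer's own index, and that the peer's level is the number of components minus one. The function name `parse_peer_path` and the "B"/"G"/"SG" labels for deeper levels are hypothetical.

```python
from typing import Dict, List, Union


def parse_peer_path(path: str) -> Dict[str, Union[int, str, List[str]]]:
    """Decode a dotted peer identifier such as "1.2.1.2".

    Assumption: every component except the last names an enclosing Unit
    (Block, Group, Subgroup, ...), and the last component is the peer itself.
    """
    parts = [int(p) for p in path.split(".")]
    if len(parts) < 2:
        raise ValueError("expected at least a Block index and a peer index")

    *unit_indexes, peer_index = parts
    units = []
    for depth, idx in enumerate(unit_indexes):
        # depth 0 -> "B", depth 1 -> "G", depth 2 -> "SG", depth 3 -> "SSG", ...
        prefix = "B" if depth == 0 else "S" * (depth - 1) + "G"
        units.append(f"{prefix}{idx}")

    return {
        "level": len(parts) - 1,   # Peer(1.2.1.2) is read as a Level 3 peer
        "units": units,
        "peer": f"P{peer_index}",
    }


info = parse_peer_path("1.2.1.2")
print(info["level"], info["units"], info["peer"])
```

Under these assumptions the example identifier decodes to Level 3, enclosing Units B1, G2 and SG1, and peer P2.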
P2PR-tree
BlockMBR information is stored at every peer.
[Figure: Block 1 divided into Groups G1 to G4, with the corresponding tree of Blocks B1 to B4 (Level 0), Groups G1 to G4 (Level 1) and peers (Level 2).]
Maintaining information: Peer Level = 2, (B1, B2, B3, B4), (G1, G2, G3, G4), (P6, P3)
Peer Join operation in P2PR-tree
BlockMBR information is stored at every peer.
[Figure: Block 1 with Groups G1 to G4 and peers including P20 and P30; Subgroups SG1 and SG2 are marked. The tree shows Levels 0 to 2.]
Maintaining information: Peer Level = 2, (B1, B2, B3, B4), (G1, G2, G3, G4), (P2, P3, P4)
Peer Join operation in P2PR-tree
BlockMBR information is stored at every peer.
[Figure: Block 1 with Groups G1 to G4 and Subgroups SG1 and SG2. The tree shows Levels 0 to 3, with peers P1, P2, P20, P3 and P4 at Level 3.]
Maintaining information: Peer Level = 3, (B1, B2, B3, B4), (G1, G2, G3, G4), (SG1, SG2), (P2, P20)
Routing Issues
• Assumption: A peer initially knows at least N routers for a Unit.
• Piggybacking is used to refresh the routers of each peer.
  • During piggybacking, a peer sends the addresses and reliability information of other peers in its own Unit.
• Each peer maintains the R most reliable routers for each Unit.
• What if all routers that a peer knows in a specific Unit are unavailable?
  • The peer contacts peers in other blocks to find new routers for that block.
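The router maintenance described above can be sketched as a small table that keeps only the R most reliable routers per Unit and merges piggybacked candidates as they arrive. This is a hedged illustration: the slides do not define the reliability measure, so a float where higher means more reliable is assumed, and the names `RouterTable`, `merge_piggyback`, `routers_for` and `pick_router` are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class RouterTable:
    """Per-peer table of routers, capped at R entries per Unit."""

    def __init__(self, max_routers: int) -> None:
        self.max_routers = max_routers            # the "R" in the slides
        # unit name -> {router address: reliability score (assumed float)}
        self._routers: Dict[str, Dict[str, float]] = {}

    def merge_piggyback(self, unit: str, candidates: Dict[str, float]) -> None:
        """Merge piggybacked (address, reliability) pairs and keep the best R."""
        known = self._routers.setdefault(unit, {})
        known.update(candidates)
        # Sort by reliability (descending), then by address for determinism.
        best = sorted(known.items(), key=lambda kv: (-kv[1], kv[0]))
        self._routers[unit] = dict(best[: self.max_routers])

    def routers_for(self, unit: str) -> List[str]:
        """Known routers for a Unit, most reliable first."""
        known = self._routers.get(unit, {})
        return [addr for addr, _ in sorted(known.items(), key=lambda kv: (-kv[1], kv[0]))]

    def pick_router(self, unit: str, is_available: Callable[[str], bool]) -> Optional[str]:
        """First available router, or None if every known router is down.

        A None result corresponds to the fallback case in the slides, where
        the peer must ask peers in other blocks for new routers.
        """
        for addr in self.routers_for(unit):
            if is_available(addr):
                return addr
        return None


table = RouterTable(max_routers=2)
table.merge_piggyback("B2", {"p25": 0.9, "p26": 0.5})
table.merge_piggyback("B2", {"p27": 0.7})      # evicts the least reliable entry
print(table.routers_for("B2"))
```

Capping the table at R entries bounds the per-peer state, while the availability callback keeps the sketch independent of any particular network layer.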
Searching the P2PR-tree (Query Level = 0)
Query comes to P60. BlockMBR information is stored at every peer.
[Figure: Block 4 and Block 1 with their Groups, Subgroups and peers, and the corresponding tree from Level 0 to Level 3.]
Maintaining information: Peer Level = 2, (P5→B1, P25→B2, P35→B3, B4), (P41→G1, G2, P43→G3, P49→G4), (P45, P46)
Searching the P2PR-tree (Query Level = 1)
Query comes to P60. BlockMBR information is stored at every peer.
[Figure: Block 4 and Block 1 with their Groups, Subgroups and peers, and the corresponding tree from Level 0 to Level 3.]
Maintaining information: Peer Level = 2, (B1, P26→B2, P36→B3, P42→B4), (P4→G1, G2, P8→G3, P9→G4), (P6, P30)
Searching the P2PR-tree (Query Level = 2)
Query comes to P60. BlockMBR information is stored at every peer.
[Figure: Block 4 and Block 1 with their Groups, Subgroups and peers, and the corresponding tree from Level 0 to Level 3.]
Maintaining information: Peer Level = 3, (B1, P27→B2, P37→B3, P43→B4), (G1, P6→G2, P8→G3, P10→G4), (P20→SG1, SG2), (P3)
Searching the P2PR-tree (Query Level = 3)
Query comes to P60. BlockMBR information is stored at every peer.
[Figure: Block 4 and Block 1 with their Groups, Subgroups and peers, and the corresponding tree from Level 0 to Level 3.]
Maintaining information: Peer Level = 3, (B1, P28→B2, P38→B3, P45→B4), (G1, P30→G2, P8→G3, P9→G4), (SG1, P3→SG2), (P1, P2)
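The level-by-level search shown on the preceding slides can be approximated in a single process by descending a tree of Units and pruning every Unit whose MBR does not intersect the query MBR. This is only a sketch under stated assumptions: in the real system each descent step would be a message to a router of the chosen Unit, whereas here it is a recursive call. The names `Unit`, `search` and `intersects`, and all coordinates, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Unit:
    name: str
    mbr: Rect
    children: List["Unit"] = field(default_factory=list)  # empty for a peer


def search(unit: Unit, query: Rect, tested: Optional[List[str]] = None) -> List[str]:
    """Return the names of peers whose MBR intersects the query MBR."""
    if tested is not None:
        tested.append(unit.name)       # record every Unit whose MBR is checked
    if not intersects(unit.mbr, query):
        return []                      # prune this Unit and everything below it
    if not unit.children:
        return [unit.name]             # reached a peer
    hits: List[str] = []
    for child in unit.children:
        hits.extend(search(child, query, tested))
    return hits


# Hypothetical layout loosely following the Unit names used in the slides.
sg1 = Unit("SG1", (0, 0, 10, 10), [
    Unit("P1", (0, 0, 5, 5)), Unit("P2", (6, 0, 10, 5)), Unit("P20", (0, 6, 5, 10)),
])
sg2 = Unit("SG2", (12, 0, 16, 10), [
    Unit("P3", (12, 0, 16, 4)), Unit("P4", (12, 6, 16, 10)),
])
g1 = Unit("G1", (0, 0, 16, 10), [sg1, sg2])
g2 = Unit("G2", (20, 20, 30, 25), [
    Unit("P5", (20, 20, 25, 25)), Unit("P6", (26, 20, 30, 25)),
])
b1 = Unit("B1", (0, 0, 30, 25), [g1, g2])
b4 = Unit("B4", (60, 60, 70, 70), [Unit("P60", (60, 60, 70, 70))])
universe = Unit("Universe", (0, 0, 70, 70), [b1, b4])

tested: List[str] = []
result = search(universe, (4, 4, 7, 7), tested)
print(result)
print(tested)
```

In this example the MBRs of SG2, G2 and B4 are checked and rejected, so none of the peers beneath them is ever contacted; that is the pruning effect the hierarchy is meant to provide.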
Performance Evaluation
• Investigates the following
  • Effect of variations in workload skew
• Performance metric
  • Average Response Time
• Comparison with the centralized MC-Rtree
• 1000 data-providing peers
Effect of variations in workload skew when the query arrival rate was fixed at 20 queries/second
Effect of variations in workload skew when the query arrival rate was fixed at 100 queries/second
Conclusion
• Investigation of the problem of spatial indexing in P2P environments
• Proposal of the P2PR-tree (Peer-to-Peer R-tree)
  • Scalable decentralized P2P data structure
  • Efficient routing scheme
Future Scope of Work
• Detailed simulation
• Replication
• Availability
• Load-balancing