Thinking in Gi*(d) calculation with Map-Reduce

2010-3-29

Preprocessing
• Generate Data Table
  • Divide the domain into cells and count the number of points in every cell;
  • Accumulate cells into quads;
  • Put all points into quads (I/O-intensive operation? Need Map-Reduce?)
• Generate Index Table: O(n2½)?
  • For every quad, expand its boundary step by step until it covers the whole domain.
  • At every step, find the quads that the expanded boundary intersects (need spatial index?);
  • Store the deduplicated index items in the index table.

Calculation of Gi*(d)
• Algorithm of Gi*(d) in M-R(?)
  • Use the index table to count how many neighbor quads are needed;
  • Copy the current quad to the nodes where its neighbor quads reside;
  • Run a map task to calculate Gi*(d) on all neighboring nodes;
  • Run a reduce task to calculate Gi*(d).
• C/C++ should be used for the Gi*(d) calculation
• A GPU may help with the calculation.
• Hotspot cells/quads should reside in memory / on most nodes
• How can the calculation be accelerated by tuning MR parameters / Gi*(d) algorithm parameters?
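The map and reduce steps above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the intended implementation: it assumes the standardized Getis-Ord form of Gi*(d) with binary weights, works on cells rather than quads, measures d in cell units between cell indices, and runs the "shuffle" in a plain dictionary instead of a real Map-Reduce framework. All function names (`count_points`, `map_cell`, `reduce_cell`, `gi_star`) are hypothetical.

```python
import math
from collections import defaultdict


def count_points(points, cell_size, nx, ny):
    """Preprocessing: count points per cell on an nx-by-ny grid (empty cells stay 0).

    Assumes every point lies inside the grid.
    """
    counts = {(ix, iy): 0 for ix in range(nx) for iy in range(ny)}
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


def map_cell(cell, value, d, all_cells):
    """Map: emit this cell's value to every cell i within distance d of it.

    Binary weights, and the cell itself is included (the star in Gi*).
    """
    cx, cy = cell
    r = int(math.floor(d))
    for ix in range(cx - r, cx + r + 1):
        for iy in range(cy - r, cy + r + 1):
            if (ix, iy) in all_cells and math.hypot(ix - cx, iy - cy) <= d:
                yield (ix, iy), value


def reduce_cell(values, n, mean, std):
    """Reduce: standardize the neighborhood sum collected for one cell.

    Gi* = (sum_j w_ij x_j - W x_mean) / (s * sqrt((n * S1 - W^2) / (n - 1))),
    where binary weights give S1 = W = number of neighbors.
    """
    w = len(values)
    denom = std * math.sqrt((n * w - w * w) / (n - 1))
    return (sum(values) - w * mean) / denom if denom > 0 else 0.0


def gi_star(counts, d):
    """Driver: global statistics, then map, shuffle by target cell, reduce."""
    n = len(counts)
    mean = sum(counts.values()) / n
    var = sum(v * v for v in counts.values()) / n - mean * mean
    std = math.sqrt(max(var, 0.0))
    shuffled = defaultdict(list)
    for cell, value in counts.items():
        for key, v in map_cell(cell, value, d, counts):
            shuffled[key].append(v)
    return {cell: reduce_cell(vs, n, mean, std) for cell, vs in shuffled.items()}
```

With d = 0 each neighborhood is the cell itself, so the statistic reduces to the cell's z-score; when d covers the whole grid the denominator vanishes and the sketch returns 0.0. In a real job the global mean and standard deviation would need their own pass before the map phase.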

Structure of Tables
• DATA_TABLE
  • Row: Quad_id
  • Family: data
    • Count: points in Quad
    • Body
      • point info: point1/point2/point3/……
      • Each point record: x/y/z (3 floating-point numbers, 12 bytes)
• INDEX_TABLE
  • Row: Quad_id
  • Family: border
    • XS
    • XE
    • YS
    • YE
  • Family: D
    • D1
    • D2
    • …
    • Dn
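The two row layouts can be mocked up with plain dictionaries to check the sizes. This is a sketch under assumptions: points are packed as three little-endian 32-bit floats (the byte order is not specified in the slides), each D column is assumed to hold the list of quad ids found at one boundary-expansion step, and the helper names are hypothetical. No particular table store or client API is implied.

```python
import struct

# One point record: x, y, z as little-endian float32 (3 * 4 = 12 bytes).
POINT = struct.Struct("<3f")


def pack_points(points):
    """Concatenate fixed-size point records into the Body value."""
    return b"".join(POINT.pack(x, y, z) for x, y, z in points)


def unpack_points(body):
    """Split a Body value back into (x, y, z) tuples."""
    return [POINT.unpack_from(body, off) for off in range(0, len(body), POINT.size)]


def data_row(quad_id, points):
    """DATA_TABLE row: key Quad_id, family 'data' with Count and Body."""
    return {"row": quad_id,
            "data": {"count": len(points), "body": pack_points(points)}}


def index_row(quad_id, xs, xe, ys, ye, neighbors_by_step):
    """INDEX_TABLE row: family 'border' plus family 'D' with columns D1..Dn.

    neighbors_by_step[k - 1] is assumed to be the quad ids recorded at step k.
    """
    return {"row": quad_id,
            "border": {"XS": xs, "XE": xe, "YS": ys, "YE": ye},
            "D": {f"D{k}": ids for k, ids in enumerate(neighbors_by_step, start=1)}}
```

A fixed 12-byte record means the point count can be cross-checked against `len(body) // 12`, and a single point can be read at a known offset without parsing the whole body.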

Storage model
• Data distribution strategies
  • Evenly distributed in all nodes
  • Locality distributed
• Data Cache Strategies
  • ??
  • ??

Application model
• Batch processing of Gi*(d) (per cell/per quad)
• Interactive processing of Gi*(d) (per point)
• Support for different storage strategies
  • Locality distributed
  • Evenly distributed
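One way to compare the two distribution strategies is to count how many adjacent quad pairs end up on different nodes, since those are the pairs that would require copying a quad across the network. The sketch below is an assumption-laden toy: "even" is modeled as round-robin by quad number, "locality" as contiguous vertical strips of the grid, and all function names are hypothetical.

```python
def even_node(qx, qy, nx, n_nodes):
    """Even distribution: round-robin on the row-major quad number."""
    return (qy * nx + qx) % n_nodes


def locality_node(qx, qy, nx, n_nodes):
    """Locality distribution: contiguous vertical strips of the grid per node."""
    return qx * n_nodes // nx


def cross_node_links(assign, nx, ny, n_nodes):
    """Count 4-neighbor quad pairs whose two quads live on different nodes."""
    crossings = 0
    for qx in range(nx):
        for qy in range(ny):
            here = assign(qx, qy, nx, n_nodes)
            # Check the right-hand neighbor, then the neighbor above.
            if qx + 1 < nx and assign(qx + 1, qy, nx, n_nodes) != here:
                crossings += 1
            if qy + 1 < ny and assign(qx, qy + 1, nx, n_nodes) != here:
                crossings += 1
    return crossings
```

On an 8 x 8 grid with 4 nodes, `cross_node_links(locality_node, 8, 8, 4)` gives 24 and `cross_node_links(even_node, 8, 8, 4)` gives 56. The toy ignores load balance, which is where an even distribution would be expected to help when points cluster in a few quads.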
