Grid-enabling geospatial applications methodologies & practices. Jibo Xie, Chaowei Yang Joint Center for Intelligent Spatial Computing George Mason University AAG 2008. Outline. Introduction Parallelizing geospatial applications Grid-enabling geospatial applications Practices Conclusion.
Grid-enabling geospatial applications: methodologies & practices Jibo Xie, Chaowei Yang Joint Center for Intelligent Spatial Computing George Mason University AAG 2008
Outline • Introduction • Parallelizing geospatial applications • Grid-enabling geospatial applications • Practices • Conclusion
Introduction • Grid computing • Parallel processing is a computing technique that emphasizes the feasible exploitation of available concurrency in a computational process (Hazra 1995) • Grid computing (Foster and Kesselman 1999, Foster et al. 2001), a new computing infrastructure, provides a potential solution for harnessing idle workstations and other resources in a flexible manner • Manipulate large databases, accelerate code execution, and resolve excessively time-consuming problems (Fadlallah and Dessaint 2000) • Why grid computing for GIS? • Data-intensive features • Time-consuming data models • Real-time applications
Characteristics of a grid environment • Bote-Lorenzo et al. (2004) • Large scale • Geographical distribution • Heterogeneity • Resource sharing & coordination • Multiple administrations
Principles for parallelizing geospatial applications • More specifically, parallelization requires consideration of the trade-offs between (Healey et al. 1998): • 1) the architecture of individual processing nodes; • 2) the granularity of the machine (number of nodes); • 3) communication bandwidth and latency (the time between the sending and receipt of messages between processors); • 4) the topology of the interconnection between processors; • 5) the extent to which communication and computation overlap (or do not).
Grid-enabling methods for geospatial applications • Shared memory (not popular now) • Distributed memory • 1) Message-passing-based distributed memory • 2) Batch-based parallelization
1) MPI-based parallel computing [Figure: parallel geocomputing. Labels: Input spatial data, Master, MPI_Scatter, Node1 to Node4 (Slave), MPI_Gather, Master, Output results]
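The scatter, compute, gather pattern named on this slide can be sketched without an MPI installation. The following is a minimal illustration, not the authors' code: it uses Python's standard-library thread pool as a stand-in for MPI ranks, and the names `scatter`, `process_chunk`, `gather`, and `run_master` are hypothetical. In a real MPI program the split and merge steps would correspond to `MPI_Scatter` and `MPI_Gather`.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(data, n_workers):
    """Split data into n_workers contiguous chunks of nearly equal size."""
    base, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def process_chunk(chunk):
    """Hypothetical per-node work: convert elevations from feet to metres."""
    return [round(value * 0.3048, 2) for value in chunk]


def gather(parts):
    """Concatenate per-worker results in rank order."""
    return [item for part in parts for item in part]


def run_master(data, n_workers=4):
    """Master role: scatter the input, let workers compute, gather the output."""
    chunks = scatter(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        parts = list(pool.map(process_chunk, chunks))
    return gather(parts)


if __name__ == "__main__":
    elevations_ft = [100, 250, 400, 1000, 3280, 50]
    print(run_master(elevations_ft))
```

Because each chunk is processed independently, the gathered result matches what a single serial pass over the whole input would produce.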
2) Batch-based parallel computing [Figure: workflow divided into a serial computing part and a parallel computing part]
Batch mode example in Condor [Figure: computing job, MITSIMLAB executable]
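The slide does not reproduce the submit file used for the Condor example, so the following is only a sketch of what a batch submission of independent jobs might look like. It builds the text of a submit description using standard keywords (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro. The function name `make_submit_description`, the wrapper script `./run_simulation.sh`, and the output file names are assumptions made for illustration.

```python
def make_submit_description(executable, n_jobs, arguments=""):
    """Build the text of a Condor submit description for n_jobs independent runs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # $(Process) expands to 0, 1, ..., n_jobs - 1, one value per queued job.
        "output = run_$(Process).out",
        "error = run_$(Process).err",
        "log = batch.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical wrapper script that launches one simulation scenario per job.
    print(make_submit_description("./run_simulation.sh", 8, "$(Process)"))
```

The generated text would be saved to a file and passed to `condor_submit`. Each queued job runs without talking to the others, which is the property that makes batch mode a fit for loosely coupled grid resources.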
Limitations to parallel performance • Amdahl's Law: $S_n = \frac{1}{s + \frac{1 - s}{n}}$, where $S_n$ is the speedup with $n$ processors, $n$ is the number of processors, and $s$ is the sequential fraction of the program • $\lim_{n \to \infty} S_n = \frac{1}{s}$; if $s = 10\%$, the maximum speedup is 10 • Others • Parallel hardware • Workload balance • Communication latency
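The Amdahl's Law bound on this slide can be checked numerically. This is a minimal sketch assuming the standard form of the law, speedup = 1 / (s + (1 - s) / n); the function names `amdahl_speedup` and `max_speedup` are hypothetical.

```python
def amdahl_speedup(sequential_fraction, n_processors):
    """Speedup predicted by Amdahl's Law for n_processors processors."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / n_processors)


def max_speedup(sequential_fraction):
    """Limit of the speedup as the number of processors grows without bound."""
    if not 0.0 < sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be in (0, 1]")
    return 1.0 / sequential_fraction


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 1024):
        print(n, amdahl_speedup(0.10, n))
    print("limit:", max_speedup(0.10))
```

With a 10% sequential fraction the predicted speedup approaches but never reaches 10, which is why reducing the sequential part matters more than adding processors once n is large.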
Grid computing environment (SURA grid homepage 2007)
Speedup of WRF_NMM dust model (MPI) • 2 servers, each with 8 CPU cores (2 quad-core processors)
Speedup of traffic simulation (Condor) • 4 servers (2 servers with 8 CPU cores each; 1 server with 4 CPU cores; 1 server with 2 CPU cores)
Performance enhancement • Use computing nodes with the same hardware • Select a suitable parallel computing mode • Reduce the sequential part of the parallel program as much as possible • Use data redundancy instead of communication
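"Data redundancy instead of communication" can be illustrated with overlapping blocks. The following is a sketch under stated assumptions, not the authors' implementation: a one-dimensional profile is smoothed with a 3-point moving mean, and each block carries one redundant neighbouring cell per side (a halo) so that every block can be processed as an independent job. All function names are hypothetical.

```python
def smooth(values):
    """3-point moving mean; edge cells use only the neighbours that exist."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out


def split_with_halo(values, n_blocks, halo=1):
    """Return (padded_block, lead, core) triples.

    padded_block holds the block's own cells plus up to `halo` redundant
    cells per side, lead is the number of redundant cells at the start,
    and core is the number of cells the block owns.
    """
    size = max(1, -(-len(values) // n_blocks))  # ceiling division
    blocks = []
    for start in range(0, len(values), size):
        stop = min(start + size, len(values))
        lo = max(0, start - halo)
        hi = min(len(values), stop + halo)
        blocks.append((values[lo:hi], start - lo, stop - start))
    return blocks


def smooth_in_blocks(values, n_blocks):
    """Smooth each padded block on its own, then keep only the owned cells."""
    result = []
    for padded, lead, core in split_with_halo(values, n_blocks):
        smoothed = smooth(padded)  # independent job, no messages exchanged
        result.extend(smoothed[lead:lead + core])
    return result


if __name__ == "__main__":
    profile = [1, 4, 2, 8, 5, 7, 3, 6, 9]
    print(smooth_in_blocks(profile, 3) == smooth(profile))
```

The cost is a few duplicated cells per block; the benefit is that no block has to wait for a neighbour's boundary values.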
Summary • Grid-enabling methods can enhance performance for geospatial applications • Batch computing mode is more scalable than MPI mode for grid-enabling geospatial applications • Little or no communication between jobs: jobs that exchange little or no data fit grid computing well. • Coarse granularity: relatively large amounts of computational work are done between communications. • Data decomposition and data redundancy: geospatial applications are data-intensive and time-consuming.
THANKS! Q&A