Explore profiling protocols and servers in grid computing to enhance performance and reduce failure rates. Analysis of GridFTP and NeST for optimization insights.
Profiling Grid Data Transfer Protocols and Servers • George Kola, Tevfik Kosar and Miron Livny • University of Wisconsin-Madison, USA
Motivation • Scientific experiments are generating large amounts of data • Education, research & commercial videos are not far behind • Data may be generated and stored at multiple sites • How to efficiently store and process this data? Source: GriPhyN Proposal, 2000
Motivation • Grid enables large-scale computation • Problems • Data-intensive applications have suboptimal performance • Scaling up creates problems • Storage servers thrash and crash • Users want to reduce failure rate and improve throughput
Profiling Protocols and Servers • Profiling is a first step • Enables us to understand how time is spent • Gives valuable insights • Helps • computer architects add processor features • OS designers add OS features • middleware developers optimize the middleware • application designers design adaptive applications
Profiling • We (middleware designers) are aiming for automated tuning • Tune protocol parameters and concurrency level • Depends on the dynamic state of the network and storage server • We are developing low-overhead online analysis • Detailed offline + online analysis would enable automated tuning
Profiling • Requirements • Should not alter system characteristics • Full system profile • Low overhead • Used OProfile • Based on Digital Continuous Profiling Infrastructure • Kernel profiling • No instrumentation • Low overhead/tunable overhead
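For reference, a minimal sketch of driving a whole-system OProfile run from Python. The opcontrol/opreport invocations follow the legacy OProfile command-line tools and are an assumption here (flags differ across versions); the transfer command in the usage note is hypothetical.

```python
import subprocess

def run(cmd):
    # Echo and execute a command, raising if it fails
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def profile_transfer(transfer_cmd):
    """Sample the whole system with OProfile while a transfer runs (needs root)."""
    run(["opcontrol", "--no-vmlinux"])   # configure sampling without kernel symbols
    run(["opcontrol", "--reset"])        # discard any previously collected samples
    run(["opcontrol", "--start"])        # begin system-wide sampling
    try:
        run(transfer_cmd)                # the GridFTP or NeST transfer under test
    finally:
        run(["opcontrol", "--stop"])     # stop sampling even if the transfer fails
    run(["opreport"])                    # summarize samples per binary image

# Hypothetical usage with a GridFTP client:
# profile_transfer(["globus-url-copy", "file:///tmp/src", "gsiftp://server/tmp/dst"])
```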
Profiling Setup • Two server machines • Moderate server: 1660 MHz Athlon XP CPU with 512 MB RAM • Powerful server: dual Pentium 4 Xeon 2.4 GHz CPU with 1 GB RAM • Client machines were more powerful – dual Xeons • To isolate server performance • 100 Mbps network connectivity • Linux kernel 2.4.20, GridFTP server 2.4.3, NeST prerelease
GridFTP Profile Read Rate = 6.45 MBPS, Write Rate = 7.83 MBPS => Writes to server faster than reads from it
GridFTP Profile • Writes to the network more expensive than reads • => Interrupt coalescing
GridFTP Profile IDE reads more expensive than writes
GridFTP Profile File system writes costlier than reads => Need to allocate disk blocks
GridFTP Profile More overhead for writes because of higher transfer rate
GridFTP Profile Summary • Writes to the network more expensive than reads • Interrupt coalescing • DMA would help • IDE reads more expensive than writes • Tuning the disk elevator algorithm would help • Writing to file system is costlier than reading • Need to allocate disk blocks • Larger block size would help
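To make the last two points concrete, here is a minimal sketch (assuming Linux and Python 3; this is an illustration, not the GridFTP server's actual write path) that reserves the file's disk blocks up front and writes in large blocks; the path and sizes are placeholders.

```python
import os

BLOCK_SIZE = 1 << 20  # 1 MB write blocks

def write_preallocated(path, total_size, blocks):
    """Write `total_size` bytes from an iterable of byte blocks.

    posix_fallocate reserves all disk blocks once, so the file system does
    not have to allocate blocks during every subsequent write call.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, total_size)  # allocate blocks up front
        for block in blocks:
            os.write(fd, block)
    finally:
        os.close(fd)

# Example with a synthetic 64 MB payload:
# write_preallocated("/tmp/out.dat", 64 * BLOCK_SIZE,
#                    (b"\0" * BLOCK_SIZE for _ in range(64)))
```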
NeST Profile Read Rate = 7.69 MBPS, Write Rate = 5.5 MBPS
NeST Profile Similar trend as GridFTP
NeST Profile More overhead for reads because of higher transfer rate
NeST Profile Metadata updates (space allocation) make NeST writes more expensive
GridFTP versus NeST • GridFTP • Read Rate = 6.45 MBPS, Write Rate = 7.83 MBPS • NeST • Read Rate = 7.69 MBPS, Write Rate = 5.5 MBPS • GridFTP is 16% slower on reads • GridFTP I/O block size 1 MB (NeST 64 KB) • Non-overlap of disk I/O & network I/O • NeST is 30% slower on writes • Lot management (space reservation/allocation)
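One way to remove the non-overlap noted above is to pipeline disk reads and network sends; the sketch below (plain Python, not GridFTP or NeST code; host, port and path are placeholders) reads the next block in a background thread while the current block is being sent.

```python
import queue
import socket
import threading

def send_file_overlapped(path, sock, block_size=1 << 20, depth=4):
    """Overlap disk reads with network sends via a small bounded queue."""
    blocks = queue.Queue(maxsize=depth)

    def reader():
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                blocks.put(block)
                if not block:          # empty block marks end of file
                    break

    threading.Thread(target=reader, daemon=True).start()
    while True:
        block = blocks.get()
        if not block:
            break
        sock.sendall(block)            # runs concurrently with the next disk read

# Hypothetical usage:
# s = socket.create_connection(("server.example.org", 5000))
# send_file_overlapped("/tmp/src.dat", s)
```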
Effect of Protocol Parameters • Different tunable parameters • I/O block size • TCP buffer size • Number of parallel streams • Number of concurrent transfers
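As an illustration of the first three knobs (a hedged sketch using plain Python sockets, not the GridFTP protocol or API; host, port and file name are placeholders), the code below sets the TCP send buffer, uses a configurable I/O block size, and splits one file across several parallel streams.

```python
import os
import socket
import threading

def send_slice(host, port, path, offset, length,
               io_block=1 << 20, tcp_buf=1 << 20):
    """Send bytes [offset, offset + length) of `path` over one TCP stream."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, tcp_buf)  # TCP buffer size
    s.connect((host, port))
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            block = f.read(min(io_block, remaining))  # I/O block size
            if not block:
                break
            s.sendall(block)
            remaining -= len(block)
    s.close()

def parallel_send(host, port, path, streams=4):
    """Split one file across `streams` parallel TCP connections."""
    size = os.path.getsize(path)
    slice_len = (size + streams - 1) // streams
    threads = []
    for i in range(streams):
        off = i * slice_len
        t = threading.Thread(target=send_slice,
                             args=(host, port, path, off,
                                   max(0, min(slice_len, size - off))))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
```

Concurrent transfers, the fourth knob, would correspond to running several such parallel_send calls for different files at the same time.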
L2 DTLB Misses Parallelism triggers the kernel to use a larger page size => lower DTLB miss rate
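For intuition, the benefit can be framed as TLB reach (entries × page size); the 64-entry data-TLB figure below is an assumed, illustrative value, not a measurement from this work.

```latex
\text{TLB reach} = N_{\text{entries}} \times \text{page size},
\qquad 64 \times 4\,\text{KB} = 256\,\text{KB}
\quad \text{vs.} \quad 64 \times 4\,\text{MB} = 256\,\text{MB}
```

With large pages the per-stream buffers created by parallelism remain mapped, so DTLB misses drop.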
Profiles on powerful server • The next set of graphs was obtained using the powerful server
Transfer Rate versus Parallelism in Short Latency (10 ms) Wide Area
Conclusion • Full system profile gives valuable insights • Larger I/O block size may lower transfer rate • Network and disk I/O not overlapped • Parallelism may reduce CPU load • May cause kernel to use larger page size • Processor feature for variable-sized pages would be useful • Operating system support for variable page size would be useful • Concurrency improves throughput at the cost of increased server load
Questions • Contact • kola@cs.wisc.edu • www.cs.wisc.edu/condor/publications.html