1 / 34

High Performance IO HPC Workshop – University of Kentucky May 9, 2007 – May 10, 2007

High Performance IO HPC Workshop – University of Kentucky May 9, 2007 – May 10, 2007 . Andrew Komornicki, Ph. D. Balaji Veeraraghavan, Ph. D. Agenda. Introduction General IO performance Results of some small tests. Modular IO libraries, Linux and AIX. I/O Optimization.

colum
Download Presentation

High Performance IO HPC Workshop – University of Kentucky May 9, 2007 – May 10, 2007

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. High Performance IO HPC Workshop – University of KentuckyMay 9, 2007 – May 10, 2007 Andrew Komornicki, Ph. D. Balaji Veeraraghavan, Ph. D.

  2. Agenda • Introduction • General IO performance • Results of some small tests. • Modular IO libraries, Linux and AIX

  3. I/O Optimization • Analyze the IO pattern • Determine optimization method • Optimize in user space • Minimize source code changes • Possibly relink with libtkio.so

  4. General I/O Performance • C: • Do not use fopen(), fread(), or fwrite(); • These are inefficient due to small (4KB) IO blocks and extra memory copies. • Use instead: • POSIX open(), read(), write() • Direct (raw) IO will eliminate an additional memory copy • FORTRAN: • Use unformatted IO

  5. Asynchronous IO, an example • Non Blocking IO • aio_read(), aio_write(), aio_return(); • Completion Notification • Polling with aio_error(); • Block until complete with aio_suspend(): • Cancellation of IO requests • aio_cancel(); • Large File enabled • Removes the 2GB file size limitation • POSIX conforming

  6. Results of Bonnie IO test • Run on Blade system in San Mateo Lab • System Memory, 5 Gbytes • File systems, ext2, and ext3 • All tests done in four stages: Writing with putc()...done Rewriting...done Writing intelligently...done Reading with getc()...done Reading intelligently... done

  7. Results of Bonnie IO test, Block IO performance Size (MB) Write (Kbytes/sec) Read(Kbytes/sec) __________________________________________ 2000 841,524 2,233,282 4000 83,237 1,658,013 8000 56,599 50,974 16000 49,656 50,677

  8. Results of Bonnie IO test Results for ext2 file system, time in seconds Size (MB) User System Elapsed _________________________________ 2000 48.7 10.8 84.3 4000 97.6 23.9 252.6 8000 194.9 56.4 1009.2 16000 388.4 111.8 2088.3

  9. Results of Bonnie IO test Results for ext3 file system, time in seconds Size (MB) User System Elapsed ______________________________________ 2000 48.7 19.2 96.3 4000 97.7 45.5 265.8 8000 194.6 90.1 1016.9 16000 396.9 201.8 2058.3

  10. Modular I/O (MIO) • Familiar and flexible runtime interface • MIO modules • mio • trace • pf • MIO available on both Linux and AIX

  11. MIO user code interface • open MIO_open • read MIO_read • write MIO_write • close MIO_close • lseek MIO_lseek • fcntl MIO_fcntl • ftruncate MIO_ftruncate

  12. MIO run time interface • MIO_STATS="file name" • MIO_FILES=" *.dat*[trace|pf ] *.inp [aix]" • MIO_DEBUG="ALL" • MIO_DEFAULTS="trace/mbytes , pf/cache=10m“

  13. trace module • summary of file activity • binary events file • low cpu overhead • typical options • /stats • /mbytes /gbytes /tbytes • /events=mio.evt

  14. pf module • User selectable cache size • User selectable page size • User selectable prefetch depth • Direct or system buffered IO • Global or private cache • Usage summary

  15. pf module • detects sequential I/O • user memory buffering • options • /global • /cache_size=10m • /page_size=1m • /prefetch=1 • /stride=1 • /direct • /stats

  16. Relink with libtkio.a • libtkio.a has shared object members • tkio.so 32 bit and 64 bit • Entry points for • open,open64,close,read,write,lseek,lseek64 • fcntl,ffinfo,fstat,fstat64,fstatfs,fsync • ftruncate,ftruncate64 • unlink,aio_...

  17. Default tkio behavior • Uses dlopen and dlsym for runtime linking

  18. tkio runtime interface • setenv TKIO_ALTLIB so_name/print/abort • export TKIO_ALTLIB=so_name/print/abort • so_name is name of shared library • Either name.so or libname.a(name.so) • tkio calls function in so_name that returns a structure filled with I/O entry points to replace default entry points • /print option outputs a print to stderr indicating success of load • /abort issues exit(-1) if load is not successfull

  19. tkio using MIO • setenv TKIO_ALTLIB get_mio_ptrs_64.so

  20. kernel libmio libc Application libtkio ->MIO_open64 ->MIO_write ->MIO_read ->MIO_lseek64 ->MIO_close open64 write read lseek64 close ->open64 ->write ->read ->lseek64 ->close Fortran I/O Demonstration only stdio fopen frwrite fread fclose X

  21. trace pf aix kernel libc libtkio libmio open64 write read lseek64 close ->MIO_open64 ->MIO_write ->MIO_read ->MIO_lseek64 ->MIO_close ->open64 ->write ->read ->lseek64 ->close

  22. System buffered Data Movement user space kernel MIO space system buffers 256kb

  23. pf cached Data Movement user space kernel MIO space 5 x 2mb system buffers 256kb

  24. O_DIRECT Data Movement user space kernel MIO space 5 x 2mb O_DIRECT system buffers 256kb

  25. Asynchronous Data Movement user space kernel MIO space 5 x 2mb O_DIRECT system buffers 256kb

  26. MSC.NASTRAN trace output from program <->pf Trace close : program <-> pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/2162.61)=130.37 mbytes/s current size=0 max_size=16277 mode =0777 sector size=4096 oflags =0x302=RDWR CREAT TRUNC open 1 0.01 write 478193 462.10 59774 59774 131072 131072 read 1777376 1700.48 222172 222172 131072 131072 seek 911572 2.83 fcntl 3 0.00 trunc 16 0.40 close 1 0.03 size 127787 Mbytes requested and Mbytes delivered Min/Max Request size in bytes Number of occurances

  27. MSC.NASTRAN trace output Trace close : pf <-> aix : /bmwfs/cdh108.T20536_13.SCR300 : (276645/1460.73)=189.39 mbytes/s current size=0 max_size=16276 mode =0777 sector size=4096 oflags =0x8000302=RDWR CREAT TRUNC DIRECT open 1 0.01 write 4382 154.86 684 684 131072 2097152 awrite 33390 1.42 58491 58491 131072 2097152 suspend 33390 240.00 242.27 mbytes/s read 5178 272.71 10354 10354 1048576 2097152 aread 103560 5.70 207115 207115 524288 2097152 suspend 103560 786.04 261.59 mbytes/s seek 136950 0.00 fcntl 3 0.00 trunc 16 0.40 close 1 0.00 size 11013 pages 138477

  28. MSC.NASTRAN pf output pf close for /bmwfs/cdh108.T20536_13.SCR300 global cache 0: 150 pages of 2097152 bytes 29739/29749 pages not preread for write 138316/139754 prefetches : prefetch=3 29576 write behinds 478193 writes 1777376 reads page writes 37772/33124 mbytes transferred program --> 59774 --> pf --> 59176 --> aix program <-- 222172 <-- pf <-- 217469 <-- aix

  29. DataView file activity plot file position ( bytes ) time ( seconds )

  30. DataView file activity plot file position ( bytes ) time ( seconds )

  31. Asynchronous I/O plotting file position ( bytes ) suspend time hidden time queuing time time ( seconds )

  32. cache page activity file position ( bytes ) time ( seconds )

  33. MSC.Nastran performance gains 16 cpu 32GB NH2 node 2.2M dof, 767GB I/O, 8 copies 2GB memory per copy 8 SSA, 16 loops, 4 disk/loop 114MB/sec 198MB/sec

  34. MIO Summary • Demonstrated performance gains • Simple to implement • Flexible run time interface • Delivered as a shared object library • Contact: bauerj@us.ibm.com

More Related