This evaluation discusses the use of local SSD caches to avoid network throughput limitations in batch system workers. It explores the usage of a union file system for transparent redirection and evaluates the performance of different cache access methods.
Data Access Evaluation • ekpsg01: storagea, ssda, aufs • HPDA Update Talk - Cache Access Methods • EKP Tuesday Computing Meeting
Brief Overview • Goal: machine-local reads on batch system workers • Avoid network throughput limitation with host-local SSD caches • Persistent data on remote file server, copies on local devices • Using union file system for transparent redirection • [Diagram labels: Current work item; Worker; Job (x4); union fs; SSD; File Server]
Cache Device Evaluation Test • Artus JEC analysis (250GB input) as reference • read from /storage/a, ssda or aufs (100% ssda) • via 32, 24, 16, 8, 4 or 2 concurrent processes • tracked by /usr/bin/time and dstat • [Diagram labels: Joram & Dominik; Worker; dstat; Job (x3); time; aufs; SSD; Manually placed; File Server]
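The measurement idea on this slide (read the same input with a varying number of concurrent readers and record the aggregate input rate) can be sketched roughly as follows. This is a minimal illustration, not the script used for the talk: the function names `read_file`, `measure` and `sweep` are hypothetical, plain sequential reads stand in for the Artus jobs, and threads stand in for the separate processes of the original test.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # read in 1 MiB blocks


def read_file(path):
    """Read one file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            block = handle.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total


def measure(paths, workers):
    """Read all paths with `workers` concurrent readers.

    Returns (bytes read, elapsed seconds, aggregate rate in MB/s).
    Threads are an assumption of this sketch; the original test used processes.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total, elapsed, total / elapsed / 1e6


def sweep(paths, counts=(32, 24, 16, 8, 4, 2)):
    """Repeat the measurement for each reader count from the slide."""
    return {workers: measure(paths, workers)[2] for workers in counts}
```

Calling `sweep` once with files below /storage/a and once with the same files below the ssda or aufs path would give one rate per reader count and source. Repeated reads of the same files can be served from the page cache, so the input should exceed RAM or the cache should be dropped between runs.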
Input Rate (Network + Drive) • Local reads (ssda & aufs) consistently faster, stall at the HT barrier • Remote read stalls at ~1Gb/s, scales badly after 4 cores • => Local cache is adequate for improving scalability • [Plot labels: physical cores; logical cores; 1Gb/s; Fileserver stalled]
Event Rate (the thing that counts) • Host input speed translates (almost) directly to job event rate • Local reads consistently faster, no loss from union filesystem • => Local cache delivers consistent performance improvement
Notable Conclusions • SSDs sufficient: enough space for J&D analysis; scaled nicely • aufs sufficient: no notable performance loss; no problem from dated version (used v2 vs. current v3) • Fileserver test problematic…: peak network speed slower than expected (x0.1); instantaneous network speed varied widely (2MB/s-115MB/s); output directory broken for hours… • Interpolation suggests 40-80 cores sufficient for saturation: got 64 in ekpsg0X, 150 in ekpblus
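A saturation estimate of this kind can be reproduced with simple arithmetic: divide the link capacity by the input rate one process demands. The sketch below assumes that "saturation" refers to the 10GBit/s link to the file server. The per-process rates are hypothetical placeholders, not values from the measurement, and the helper name `processes_to_saturate` is made up for illustration.

```python
import math


def processes_to_saturate(link_mb_per_s, per_process_mb_per_s):
    """Smallest number of processes whose combined demand reaches the link rate."""
    return math.ceil(link_mb_per_s / per_process_mb_per_s)


# 10 Gbit/s expressed in MB/s (decimal units): 10e9 bits / 8 / 1e6 = 1250 MB/s
LINK_10GBIT = 10e9 / 8 / 1e6

# Hypothetical per-process input rates in MB/s; replace with measured values.
for rate in (15.0, 20.0, 30.0):
    print(rate, "MB/s per process ->", processes_to_saturate(LINK_10GBIT, rate))
```

With per-process rates measured from the local-read runs, the same division gives the number of cores at which remote reads alone would fill the link.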
BACKUP - Analysis • Artus Analysis • CMSSW_5_3_22 • ~250GB 2012 Data
BACKUP - Test Machine • Host: EKPSG01 (64GB RAM; 32 Cores @ 2.60GHz, 16 physical + 16 logical; SL6, kernel 2.6.32-504.3.3.el6.aufs21.x86_64) • /scratch/ssda: Model=ADATA SX910; 512 GB, Read 550 MB/s, Write 530 MB/s • /storage/a: Fileserver A via 10GBit/s ethernet; dd read ~115 MB/s (1GBit/s from FSA?) • /hpda/storage/hpda: aufs 2.0 mount; br=/scratch/ssda/storage/a=ro:/storage/a=rw
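The branch list of the aufs mount explains the transparent redirection: a path is looked up in the branches in order, so a copy on the SSD branch shadows the original on the file server. The sketch below models only that first-match lookup for regular files. It is a simplified illustration, not how aufs is implemented (no whiteouts, directory merging or write handling), and `parse_branches` and `resolve` are hypothetical helper names.

```python
import os


def parse_branches(spec):
    """Split a 'path=perm:path=perm' branch list into ordered (path, perm) pairs."""
    branches = []
    for item in spec.split(":"):
        path, _, perm = item.partition("=")
        branches.append((path, perm))
    return branches


def resolve(relative_path, branches, exists=os.path.exists):
    """Return the copy in the first branch that contains relative_path, else None.

    `exists` is injectable so the lookup can be tried without real mounts.
    """
    for root, _perm in branches:
        candidate = os.path.join(root, relative_path)
        if exists(candidate):
            return candidate
    return None


# Branch list taken from the slide: SSD cache first (read-only), file server second.
BRANCHES = parse_branches("/scratch/ssda/storage/a=ro:/storage/a=rw")
```

For a file that was copied to the SSD, `resolve` returns the path below /scratch/ssda/storage/a. For any other file it falls through to /storage/a, which matches the slide's setup of manually placed cache copies in front of the file server.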