This seminar discusses the problem of client-side caching in data delivery and explores the use of storage virtualization, specifically FreeLoader Desktop Storage Cache, as a solution. It addresses the challenges of wide-area data movement, latency tolerance, and limited storage options, and proposes a virtual cache approach using desktop storage scavenging. The seminar also compares FreeLoader with other storage systems and explores client access pattern aware striping for optimizing cache access.
Optimizing End-User Data Delivery Using Storage Virtualization
Sudharshan Vazhkudai, Oak Ridge National Laboratory
Ohio State University Systems Group Seminar
October 20th, 2006, Columbus, Ohio
Outline • Problem space: Client-side caching • Storage Virtualization: • FreeLoader Desktop Storage Cache • A Virtual cache: Prefix caching • End on a funny note!!
Problem Domain • Data Deluge • Experimental facilities: SNS, LHC (PBs/yr) • Observatories: sky surveys, world-wide telescopes • Simulations from NLCF end-stations • Internet archives: NIH GenBank (serves 100 gigabases of sequence data) • Typical user access traits on large scientific data • Download remote datasets using favorite tools • FTP, GridFTP, hsi, wget • Shared interest among groups of researchers • A bioinformatics group collectively analyzes and visualizes a sequence database for a few days: locality of interest! • Original datasets are often discarded after interest dissipates
So, what’s the problem with this story? • Wide-area data movement is full of pitfalls • Server bottlenecks, BW/latency fluctuations • GridFTP-like tuned tools not widely available • Popular Internet repositories still served through modest transfer tools! • User applications are often latency intolerant • e.g., real-time viz rendering of a TerraServer map from Microsoft on ORNL’s tiled display! • Why can’t we address this with the current storage landscape? • Shared storage: Limited quotas • Dedicated storage: SAN storage is a non-trivial expense! (4 TB disk array ~ $40K) • Local storage: Usually not enough for such large datasets • Archive in mass storage for future accesses: High latency • Upshot • Retrieval rates significantly lower than local I/O or LAN throughput
Is there a silver lining at all? (Desktop Traits) • Desktop capabilities better than ever before • The ratio of used to available storage is low in academic and industry settings, i.e., plenty of free desktop space • Increasing numbers of workstations online most of the time • At ORNL-CSMD, ~600 machines are estimated to be online at any given time • At NCSU, > 90% availability across 500 machines • Well-connected, secure LAN settings • A high-speed LAN connection can stream data faster than local disk I/O
Storage Virtualization? • Can we use novel storage abstractions to provide: • More storage than locally available • Better performance than local or remote I/O • A seamless architecture for accessing and storing transient data
Desktop Storage Scavenging as a means to virtualize I/O access • FreeLoader • Imagine Condor for storage • Harness the collective storage potential of desktop workstations, much like harnessing idle CPU cycles • Increased throughput due to striping • Split large datasets into pieces (morsels) and stripe them across desktops • Scientific data trends • Usually write-once-read-many • Remote copy held elsewhere • Primarily sequential accesses • Data trends + LAN/desktop traits + user access patterns make collaborative caches built on storage scavenging a viable alternative!
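A minimal sketch of the striping idea: split a dataset into fixed-size morsels and place them round-robin across donor desktops. The morsel size and donor names here are illustrative assumptions, not FreeLoader's actual configuration.

```python
# Round-robin striping of a dataset into fixed-size "morsels" across donors.
# Morsel size and donor names are illustrative assumptions.

MORSEL_SIZE = 1 << 20  # 1 MB morsels (assumed)

def stripe_dataset(dataset_size, donors):
    """Return a mapping donor -> list of (morsel_index, offset, length)."""
    placement = {d: [] for d in donors}
    num_morsels = (dataset_size + MORSEL_SIZE - 1) // MORSEL_SIZE
    for i in range(num_morsels):
        offset = i * MORSEL_SIZE
        length = min(MORSEL_SIZE, dataset_size - offset)
        donor = donors[i % len(donors)]  # round-robin placement
        placement[donor].append((i, offset, length))
    return placement

# Example: a 10 MB dataset striped over three desktop donors.
print(stripe_dataset(10 * (1 << 20), ["desk01", "desk02", "desk03"]))
```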
Old wine in a new bottle…? • Key strategies derived from “best practices” across a broad range of storage paradigms… • Desktop Storage Scavenging from P2P systems • Striping, parallel I/O from parallel file systems • Caching from cooperative Web caching • And, applied to scientific data management for • Access locality, aggregating I/O, network bandwidth and data sharing • Posing new challenges and opportunities: heterogeneity, striping, volatility, donor impact, cache management and availability
FreeLoader Architecture • Lightweight UDP • Scavenger device: metadata bitmaps, morsel organization • Morsel service layer • Monitoring and Impact control • Global free space management • Metadata management • Soft-state registrations • Data placement • Cache management • Profiling
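One way to picture the scavenger's per-dataset metadata bitmap mentioned above: a bit per morsel recording whether that morsel is held locally. This is only a sketch of the idea, not FreeLoader's actual metadata format.

```python
# Illustrative per-dataset morsel bitmap, as a scavenger might keep it:
# one bit per morsel, set when the morsel is stored locally.
# This sketches the idea only; it is not FreeLoader's on-disk format.

class MorselBitmap:
    def __init__(self, num_morsels):
        self.bits = bytearray((num_morsels + 7) // 8)

    def mark_present(self, i):
        self.bits[i // 8] |= 1 << (i % 8)

    def is_present(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

bm = MorselBitmap(1024)
bm.mark_present(42)
assert bm.is_present(42) and not bm.is_present(43)
```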
Testbed and experiment setup: FreeLoader installed in a user’s HPC setting, with GridFTP access to NFS and PVFS, hsi access to HPSS (cold data from tapes, hot data from disk caches), and wget access to an Internet archive.
Optimizing access to the cache: Client Access-Pattern Aware Striping • The uploading client is likely to access the dataset most frequently • So, let’s optimize data placement for that client! • Overlap network I/O with local I/O • What is the optimal local:remote data ratio? • Model
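A minimal sketch of one way to reason about the local:remote split, assuming local disk reads and network reads from remote donors are fully overlapped and the split is balanced when both streams finish together. The rates below are made-up, not measured FreeLoader numbers, and this is not necessarily the seminar's exact model.

```python
# Balance overlapped local and remote retrieval times:
#     f / R_local = (1 - f) / R_net   =>   f = R_local / (R_local + R_net)
# Rates below are illustrative assumptions.

def local_fraction(r_local_mbps, r_net_mbps):
    """Fraction of a dataset to place on the uploading client's own disk."""
    return r_local_mbps / (r_local_mbps + r_net_mbps)

# Example: 60 MB/s local disk, 90 MB/s aggregate network throughput.
print(f"local fraction = {local_fraction(60.0, 90.0):.2f}")  # 0.40
```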
Philosophizing… • What the scavenged storage “is not”: • Not a file system, not a replacement to high-end storage • Not intended for wide-area resource integration • What it “is”: • Low-cost, best-effort storage cache for scientific data sources • Intended to facilitate • Transient access to large, read-only datasets • Data sharing within administrative domain • To be used in conjunction with higher-end storage systems
Towards a “virtual cache” • Scientific data caches typically host complete datasets • Not always feasible in our environment since: • Desktop workstations can fail or space contributions can be withdrawn, leaving partial datasets • Not enough space in the cache to host a new dataset in its entirety • Cache evictions can leave partial copies of datasets • Can we host partial copies of datasets and yet serve client accesses to the entire dataset? • Analogy: buffer cache is to disk as FreeLoader is to the remote data source
The Prefix Caching Problem: Impedance Matching on Steroids!! • HTTP prefix caching • Multimedia, streaming data delivery • BitTorrent P2P system: leechers can download and yet serve • Benefits • Bootstrapping the download process • Store more datasets • Allows for efficient cache management • Ah, those scientific data trends again (how convenient…) • Immutable data, remote source copy, primarily sequential accesses • Challenges • Clients should be oblivious to a dataset being only partially available • Performance hit? • How much of the prefix of a dataset to cache? • So that client accesses can progress seamlessly • Online patching issues • Mismatch between client access and remote patching I/O • Wide-area download vagaries
Virtual Cache Architecture • Capability-based resource aggregation • Persistent-storage and BW-only donors • Client serving: parallel get • Remote patching using URIs • Better cache management • Stripe a dataset entirely when space is available • When eviction is needed, stripe only a prefix of the dataset • Victims chosen by LRU: • Evict chunks from the tail until only a prefix remains • Entire datasets are evicted only after all such tails are evicted
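A sketch of the tail-first eviction idea: walk datasets in LRU order and drop chunks from the tail of each, preserving a prefix; whole datasets would only be removed in a later pass once tails are gone. The chunk size and prefix bound are assumptions for illustration.

```python
# Tail-first LRU eviction sketch; sizes and bounds are assumed values.
from collections import OrderedDict

CHUNK = 1 << 20  # 1 MB chunks (assumed)

def evict(datasets, need_bytes, prefix_chunks):
    """datasets: OrderedDict name -> list of chunk ids, least recently used first.
    Frees at least need_bytes by trimming tails down to prefix_chunks chunks."""
    freed = 0
    for name, chunks in datasets.items():            # LRU datasets first
        while freed < need_bytes and len(chunks) > prefix_chunks:
            chunks.pop()                             # drop a chunk from the tail
            freed += CHUNK
        if freed >= need_bytes:
            break
    return freed

# Example: free 3 MB from two cached datasets, keeping 2-chunk prefixes.
cache = OrderedDict(old=list(range(4)), recent=list(range(6)))
print(evict(cache, 3 * CHUNK, prefix_chunks=2), cache)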
Prefix Size Prediction • Goal: eliminate client-perceived delay in data access • What is the optimal prefix size to hide the cost of suffix patching? • Prefix size depends on: • Dataset size, S • In-cache data access rate by the client, R_client • Suffix patching rate, R_patch • Initial latency in suffix patching, L • The client access rate dictates the time available to patch: S/R_client = L + (S – S_prefix)/R_patch • Thus, S_prefix = S(1 – R_patch/R_client) + L·R_patch
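A short worked example plugging numbers into the slide's formula; the rates and latency below are made-up values for illustration only.

```python
# S_prefix = S * (1 - R_patch / R_client) + L * R_patch
# Illustrative numbers, not results from the seminar.

def prefix_size(S, r_client, r_patch, latency):
    """Smallest prefix (same units as S) that hides the cost of suffix patching."""
    return max(0.0, min(S, S * (1.0 - r_patch / r_client) + latency * r_patch))

# Example: 10 GB dataset, client reads the cache at 100 MB/s, the suffix
# patches in at 40 MB/s, with 2 s of patching start-up latency.
S = 10 * 1024.0  # MB
print(prefix_size(S, 100.0, 40.0, 2.0), "MB")  # ~6224 MB, i.e., roughly a 61% prefix
```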
Collective Download • Why? • Wide-area transfer reasons: • Storage systems and protocols for HEC are tuned for bulk transfers (GridFTP, HSI) • Wide-area transfer pitfalls: high latency, connection-establishment cost • Client’s local-area cache access reasons: • Client accesses to the cache use a smaller stripe size (e.g., 1 MB chunks in FreeLoader) • Finer granularity for better client access rates • Can we borrow from collective I/O in parallel I/O?
Collective Download Implementation • Patching nodes perform bulk remote I/O; ~256 MB per request • Reducing repeated authentication costs per dataset • Automated interactive session with “Expect” for single sign-on • FreeLoader patching framework instrumented with Expect • Protocol needs to allow sessions (GridFTP, HSI) • Need to reconcile the mismatch between the client access stripe size and the bulk remote I/O request size • Shuffling • The p patching nodes redistribute the downloaded chunks among themselves according to the client’s striping policy • Redistribution enables round-robin client access • Each patching node redistributes (p – 1)/p of the data it downloads • Shuffling is done in memory, which accommodates BW-only donors • Thus, client serving, collective download and shuffling are all overlapped
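A sketch of the shuffle step under the assumptions stated on the slide: each patching node fetches a large contiguous region via bulk remote I/O, then forwards each client-sized chunk to the node that owns it under round-robin striping, keeping only its own 1/p share. The routing rule and exact sizes are illustrative assumptions.

```python
# Shuffle planning sketch; sizes follow the slide (~256 MB bulk requests,
# 1 MB client-facing chunks), the round-robin ownership rule is assumed.

CHUNK = 1 << 20          # client-facing stripe size (1 MB)
BULK = 256 * (1 << 20)   # bulk remote-I/O request size (~256 MB)

def shuffle_plan(region_offset, region_len, p, my_rank):
    """For one downloaded region, decide which chunks to keep locally and
    which to forward; returns (kept_chunk_ids, {dest_rank: [chunk_ids]})."""
    keep, forward = [], {}
    first = region_offset // CHUNK
    last = (region_offset + region_len - 1) // CHUNK
    for chunk_id in range(first, last + 1):
        owner = chunk_id % p                 # round-robin striping policy
        if owner == my_rank:
            keep.append(chunk_id)
        else:
            forward.setdefault(owner, []).append(chunk_id)
    return keep, forward

# Example: node 1 of 4 downloaded a 256 MB region starting at offset 0.
kept, fwd = shuffle_plan(0, BULK, p=4, my_rank=1)
print(len(kept), "chunks kept,", sum(len(v) for v in fwd.values()), "forwarded")
# 64 kept, 192 forwarded: each node ships (p - 1)/p of what it downloads.
```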
Testbed and Experiment setup • UberFTP stateful client to GridFTP servers at TeraGrid-PSC and TeraGrid-ORNL • HSI access to HPSS • Cold data from tapes • FreeLoader patching framework deployed in this setting
Impact of Prefix Caching on Cache Hit Rate • Tera-ORNL sees improvements along the 0.2 and 0.4 prefix-ratio curves (308% and 176% for 20% and 40% prefix ratios, respectively) • Tera-PSC sees up to a 76% improvement in hit rate with an 80% prefix ratio
Let me philosophize again… • Novel storage abstractions as a means to: • Provide performance impedance matching • Overlap remote I/O, cache I/O and local I/O into a seamless “data pathway” (an intermediate data cache exploits this area) • Provide rich resource aggregation models • Provide a low-cost, best-effort architecture for “transient” data • A combination of best practices from parallel I/O, P2P scavenging, cooperative caching, and HTTP multimedia streaming, brought to bear on “scientific data caching”
Let me advertise… • http://www.csm.ornl.gov/~vazhkuda/Storage.html • Email: vazhkudaiss@ornl.gov • Collaborator: Xiaosong Ma (NCSU) • Funding: DOE ORNL LDRD (Terascale & Petascale initiatives) • Interested in joining our team? • Full time positions and summer internships available
More slides • Some performance numbers • Impact studies