Load Balancing in File Systems
Nadine Amsel, Dr. Carlos Maltzahn
Storage Systems Research Center (SSRC) at UCSC
http://ssrc.cse.ucsc.edu
Introduction
• A new breed of distributed, petabyte-scale file systems uses many Object Storage Devices (OSDs)
• Search in such file systems requires OSDs to store large indices and cope with ever-changing hot spots due to a diverse query stream
• What is the extent of query hot spots? How long do they persist?

Methods
• Time-stamped queries by 500,000 AOL users over 3 months were used to determine overload patterns
• Each term in a query maps to one OSD (i.e. assuming a term-distributed index)
• Two questions to answer:
  • How many OSDs are overloaded?
  • How long does an OSD stay overloaded?
• The OSD address is determined by taking the hash of the term and extracting the last n bits (where n is determined by the number of OSDs)
• An OSD's load is determined by the number of queries it receives per minute
• Query traces were analyzed using different numbers of OSDs and overload thresholds:
  • 128, 1K, and 64K OSDs
  • 10, 30, and 50 queries/minute overload thresholds

Results
Will more hardware prevent overload?
Overload occurs all the time. Just one overloaded OSD can slow down the whole storage system. The query workload leads to overload even if distributed over a large number of nodes. Increasing the number of nodes is not a solution.

What is the length of each period of overload time?
Most overload periods last only a few minutes. The distribution of period lengths follows a heavy-tailed power law so the variance is infinite (there is no stable average). The median overload length is ~4 minutes for 128 OSDs and ~2 minutes for 1K OSDs. In 99% of all cases, the overload period lasts no longer than an hour.

Conclusion
• Index query workloads cannot be effectively addressed by increasing the number of OSDs.
• A load-balancing mechanism needs to adapt on a minute-by-minute basis; any mechanism that takes longer than an hour to adapt will not be able to keep up with 99% of the workload changes.

This work was completed as part of UCSC's SURF-IT summer undergraduate research program, an NSF CISE REU Site. This material is based upon work supported by the National Science Foundation under Grant No. CCF-0552688.
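
The analysis steps listed under Methods (hash each query term, keep the last n bits as the OSD address, count queries per OSD per minute, and measure how long each OSD stays above a threshold) can be illustrated with a minimal sketch. This is not the authors' code. Several details are assumptions the poster does not specify: MD5 as the hash function, whitespace tokenization of queries, counting a query once per OSD it touches, and treating "overloaded" as strictly more queries than the threshold. All function names are hypothetical.

```python
import hashlib
from collections import Counter


def n_bits_for(num_osds: int) -> int:
    """Number of address bits needed for num_osds (assumed to be a power of two)."""
    return (num_osds - 1).bit_length()


def osd_for_term(term: str, n_bits: int) -> int:
    """Map a term to an OSD address by keeping the last n bits of its hash.

    MD5 is an assumption; the poster only says "the hash of the term".
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << n_bits) - 1)


def load_per_minute(queries, n_bits: int) -> Counter:
    """Count queries received by each OSD in each minute.

    queries: iterable of (timestamp_in_seconds, query_string).
    Returns a Counter keyed by (osd, minute).
    """
    load = Counter()
    for timestamp, query in queries:
        minute = int(timestamp // 60)
        # Assumption: a query counts once for every distinct OSD its terms map to.
        for osd in {osd_for_term(t, n_bits) for t in query.lower().split()}:
            load[(osd, minute)] += 1
    return load


def overload_periods(load: Counter, threshold: int) -> dict:
    """For each OSD, return the lengths (in minutes) of its overload periods.

    An overload period is a run of consecutive minutes whose load exceeds
    the threshold (strict comparison is an assumption).
    """
    minutes_by_osd = {}
    for (osd, minute), count in load.items():
        if count > threshold:
            minutes_by_osd.setdefault(osd, []).append(minute)

    periods = {}
    for osd, minutes in minutes_by_osd.items():
        minutes.sort()
        runs, run = [], 1
        for prev, cur in zip(minutes, minutes[1:]):
            if cur == prev + 1:
                run += 1
            else:
                runs.append(run)
                run = 1
        runs.append(run)
        periods[osd] = runs
    return periods
```

With a real trace, `n_bits_for(128)`, `n_bits_for(1024)`, and `n_bits_for(65536)` would give the address widths for the three cluster sizes studied, and the thresholds 10, 30, and 50 would be passed to `overload_periods`. The median and 99th percentile of the resulting period lengths correspond to the kind of figures reported under Results.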